Dynamic PDB: A New Dataset and a SE(3) Model Extension by Integrating Dynamic Behaviors and Physical Properties in Protein Structures

Ce Liu1*, Jun Wang1*, Zhiqiang Cai1*, Yingxu Wang1,3, Huizhen Kuang2, Kaihui Cheng2, Liwei Zhang1, Qingkun Su1, Yining Tang2, Fenglei Cao1, Limei Han2, Siyu Zhu2†, Yuan Qi2†
1Shanghai Academy of Artificial Intelligence for Science 2Fudan University 3Mohamed bin Zayed University of Artificial Intelligence
*Equal Contribution, †Corresponding Author

Abstract

Despite significant progress in static protein structure collection and prediction, the dynamic behavior of proteins, one of their most vital characteristics, has been largely overlooked in prior research. This oversight can be attributed to the limited availability, diversity, and heterogeneity of dynamic protein datasets. To address this gap, we propose to enhance existing prestigious static 3D protein structural databases, such as the Protein Data Bank (PDB), by integrating dynamic data and additional physical properties. Specifically, we introduce a large-scale dataset encompassing approximately 12.6K proteins, each subjected to all-atom molecular dynamics (MD) simulations lasting 1 microsecond to capture conformational changes. Furthermore, we provide a comprehensive suite of physical properties, including atomic velocities and forces, potential and kinetic energies of proteins, and the temperature of the simulation environment, recorded at 1 picosecond intervals throughout the simulations. For benchmarking purposes, we evaluate state-of-the-art methods on the proposed dataset for the task of trajectory prediction. To demonstrate the value of integrating richer physical properties in the study of protein dynamics and related model design, we base our approach on the SE(3) diffusion model and incorporate these physical properties into the trajectory prediction process. Preliminary results indicate that this straightforward extension of the SE(3) model yields improved accuracy, as measured by MAE and RMSD, when the proposed physical properties are taken into consideration.

Statistics of dynamic PDB

Protein examples from dynamic PDB

Longer simulations reveal more conformational changes

RMSD plots for proteins from dynamic PDB and ATLAS. Longer simulations can capture additional conformational changes, which are indicated by the red arrows.
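
As a rough illustration of how an RMSD-versus-time curve like the ones above can be computed, the following sketch aligns each frame of a trajectory to a reference frame with the Kabsch algorithm and then measures the RMSD. The function names `kabsch_rmsd` and `trajectory_rmsd`, the `(T, N, 3)` array layout, and the choice of the first frame as reference are assumptions made for this example, not details taken from the dataset or its tooling.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    # Remove translation by centering both point sets.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # Covariance matrix and its SVD give the optimal rotation (Kabsch).
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_rot = P @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Q) ** 2, axis=1))))

def trajectory_rmsd(traj, ref_index=0):
    """Per-frame RMSD of a (T, N, 3) trajectory against one reference frame."""
    ref = traj[ref_index]
    return np.array([kabsch_rmsd(frame, ref) for frame in traj])
```

Plotting the output of `trajectory_rmsd` against simulation time yields a curve in which sustained jumps suggest transitions between conformations.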

Higher temporal resolution enhances dynamic details

Applications integrating dynamic and physical properties

To demonstrate the advantages of incorporating comprehensive physical properties into the analysis of protein dynamics and model design, we propose an SE(3) model extension that conditions the generation process on the amino acid sequence and physical properties. The model consists of three steps:

  1. Extracting features with an amino acid encoder and a physical properties encoder, respectively.
  2. Refining node features with IPA and concatenating them with the physical condition embedding.
  3. Predicting the updated node features, torsion angles, and transformations after 2D convolution.

Compared with SE(3)-Trans, incorporating physical properties brings the predictions of our approach closer to the ground truth. The first row shows the results on the protein 2ERL_A, where our prediction of the alpha helices is more accurate. The second row shows the results on the protein 3TVJ_I, where our prediction of the beta sheets is closer to the ground truth.

BibTeX

@misc{liu2024dynamicpdbnewdataset,
      title={Dynamic PDB: A New Dataset and a SE(3) Model Extension by Integrating Dynamic Behaviors and Physical Properties in Protein Structures},
      author={Ce Liu and Jun Wang and Zhiqiang Cai and Yingxu Wang and Huizhen Kuang and Kaihui Cheng and Liwei Zhang and Qingkun Su and Yining Tang and Fenglei Cao and Limei Han and Siyu Zhu and Yuan Qi},
      year={2024},
      eprint={2408.12413},
      archivePrefix={arXiv},
      primaryClass={q-bio.BM},
}