Deep learning has transformed molecular structure prediction: AlphaFold and deep learning-based docking methods achieve high accuracy and support applications such as protein design and drug discovery. However, these methods predict only a single, static structure and neglect protein flexibility. DiG (Distributional Graphormer) uses a diffusion-based approach to predict the equilibrium distribution of dynamic proteins. It generates diverse conformations and estimates state densities orders of magnitude faster than traditional methods. It can be trained on experimental data or on molecular dynamics (MD) simulations.
Applications include:
A. Protein conformation sampling
B. Ligand structure sampling
C. Catalyst–adsorbate sampling
D. Property-guided structure generation
DiG demonstrates significant efficiency gains:
DiG achieves a roughly 1,000-fold speedup over Folding@Home for conformation sampling of the SARS-CoV-2 main protease, measured against a 2.6-ms MD simulation.
DiG needs about 18 GPU days, whereas the Folding@Home simulation used approximately 70 GPUs for 365 days.
This makes DiG much cheaper than current sampling methods.
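As a quick sanity check on the quoted speedup, the figures above can be converted to a common unit. This is a back-of-the-envelope sketch, assuming the Folding@Home cost means roughly 70 GPUs running for 365 days; the variable and function names are illustrative only.

```python
# Figures quoted in this post. Reading the Folding@Home cost as
# "about 70 GPUs for 365 days" is an assumption of this sketch.
FAH_GPUS = 70        # approximate number of GPUs
FAH_DAYS = 365       # wall-clock days
DIG_GPU_DAYS = 18    # DiG cost in GPU days


def gpu_days(n_gpus: float, days: float) -> float:
    """Total compute expressed in GPU days."""
    return n_gpus * days


fah_cost = gpu_days(FAH_GPUS, FAH_DAYS)
speedup = fah_cost / DIG_GPU_DAYS

print(f"Folding@Home: {fah_cost:.0f} GPU days")
print(f"DiG:          {DIG_GPU_DAYS} GPU days")
print(f"Speedup:      about {speedup:.0f}x")
```

Under that reading the ratio comes out at roughly 1,400, which matches the order of magnitude of the 1000-fold claim.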
Data, databases, and software used in DiG:
PDB version used for training: downloaded on 25 December 2020
Template search used PDB70 database: downloaded on 13 May 2020
MSA lookup used Uniclust30 v.2018_08
Simulation trajectories: 238 from the GPCRmd dataset
Protein–ligand docked complexes: CrossDocked2020 dataset v1.3
Programming languages and libraries: Python, PyTorch, NumPy, fairseq, torch-geometric, RDKit
MSA and PDB70 template searches: HHblits and HHsearch from HH-suite
MD simulations: GROMACS
Energy function training: OpenMM, pdbfixer, amber14 force field
DFT calculations for carbon polymorphs dataset: VASP
Protein conformation sampling
DiG's performance was assessed against two references:
Conformational distributions from extensive MD simulations of SARS-CoV-2 proteins (the receptor-binding domain, RBD, and the main protease).
Proteins with multiple experimentally determined conformations.
DiG-generated structures closely resembled the diverse conformations observed in the MD simulations; just 10,000 DiG-generated structures covered 70% of the RBD conformations sampled by simulation.
DiG also captured multiple functional states of several proteins, including adenylate kinase (r.m.s.d. < 1.0 Å), LmrP (r.m.s.d. < 2.0 Å), BRAF kinase, and D-ribose binding protein.
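To make the r.m.s.d. and coverage figures concrete, here is a minimal sketch of how such metrics can be computed. It is not the evaluation code used for DiG. It assumes the structures are already superimposed and share the same atom order, and both the cutoff and the toy coordinates are made up for illustration.

```python
import math

# One conformation = list of (x, y, z) atom positions in Å.
Coords = list[tuple[float, float, float]]


def rmsd(a: Coords, b: Coords) -> float:
    """Root-mean-square deviation between two pre-aligned conformations."""
    if len(a) != len(b) or not a:
        raise ValueError("conformations must be non-empty and the same size")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(squared / len(a))


def coverage(reference: list[Coords], generated: list[Coords], cutoff: float) -> float:
    """Fraction of reference conformations that have at least one
    generated structure within `cutoff` Å r.m.s.d."""
    if not reference:
        raise ValueError("reference set is empty")
    hits = sum(
        1 for ref in reference
        if any(rmsd(ref, gen) <= cutoff for gen in generated)
    )
    return hits / len(reference)


# Toy two-atom "conformations" (made-up data, not from the paper).
md_conformations = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 0.0, 5.0)],
]
dig_samples = [
    [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 0.9, 0.0)],
]

print(f"coverage = {coverage(md_conformations, dig_samples, cutoff=1.0):.2f}")
```

In practice the structures would first be superimposed (for example with a Kabsch alignment) before computing the r.m.s.d.; that step is left out here to keep the sketch short.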
Ligand structure sampling around binding sites
The DiG model was trained on 1,500 complexes from MD simulations.
It was evaluated on 409 protein–ligand systems not in the training dataset.
Inputs: protein pocket information (atom types and positions) and a ligand descriptor (SMILES string).
Outputs: atomic coordinate distributions of both the ligand and the protein pocket.
Protein pocket flexibility is reflected in changes of up to 1.0 Å r.m.s.d. in atomic positions.
Conformationally, the generated structures are highly similar to the crystal ligands (r.m.s.d. 1.74 Å).
When binding pose deviations are included, the generated structures fall within 2.0 Å r.m.s.d. of the experimental data for nearly all 409 systems.
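A threshold statement like the 2.0 Å one above is usually summarized as a success rate over all test systems. The sketch below shows one way to compute it, assuming each system is scored by its best (lowest r.m.s.d.) generated pose. That best-of-N choice, the system names, and the numbers are assumptions for illustration, not details taken from the paper.

```python
def success_rate(pose_rmsds: dict[str, list[float]], threshold: float = 2.0) -> float:
    """Fraction of systems whose best generated pose lies within
    `threshold` Å r.m.s.d. of the experimental structure."""
    if not pose_rmsds:
        raise ValueError("no systems to evaluate")
    # Score each system by its lowest r.m.s.d. pose (assumed criterion).
    best = {name: min(values) for name, values in pose_rmsds.items()}
    within = sum(1 for value in best.values() if value <= threshold)
    return within / len(best)


# Hypothetical r.m.s.d. values (Å) of generated poses per test system.
example = {
    "system_a": [2.5, 1.2],
    "system_b": [1.74],
    "system_c": [3.1, 2.2],
    "system_d": [0.9, 4.0],
}

print(f"success rate at 2.0 Å: {success_rate(example):.2f}")
```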
Catalyst–adsorbate sampling
DiG was trained on MD trajectories from the Open Catalyst Project.
It was evaluated on random combinations of adsorbates and surfaces not in the training set.
DiG predicts adsorption sites and stable adsorbate configurations, along with their probabilities.
As an example, DiG predicted the adsorption configurations of an acyl group on a stepped TiIr alloy surface.
It found all the stable sites identified by a grid search with DFT methods.
The adsorption configurations were close to the DFT results (r.m.s.d. 0.5–0.8 Å).
DiG also predicted adsorption sites and probabilities for single N or O atoms on ten metallic surfaces.
It achieved 81% site coverage compared with DFT grid search results.
The predicted probabilities agree closely with adsorption energies from DFT.
DiG is much faster than DFT (1 minute versus more than 2 hours for a single relaxation).
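Comparing predicted site probabilities with DFT adsorption energies makes sense because, at equilibrium, lower-energy sites are occupied more often. The sketch below shows the standard Boltzmann relation p_i ∝ exp(-E_i / k_B T). It is a textbook formula rather than DiG's own procedure, and the site names, energies, and temperature are made-up values.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def boltzmann_probabilities(energies_ev: list[float], temperature_k: float = 300.0) -> list[float]:
    """Equilibrium occupation probabilities for a set of site energies."""
    if not energies_ev:
        raise ValueError("need at least one energy")
    kt = K_B_EV * temperature_k
    e_min = min(energies_ev)  # shift by the minimum for numerical stability
    weights = [math.exp(-(e - e_min) / kt) for e in energies_ev]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical adsorption energies (eV) for three sites on one surface.
site_energies = {"top": -1.10, "bridge": -1.15, "hollow": -1.20}
probabilities = dict(
    zip(site_energies, boltzmann_probabilities(list(site_energies.values())))
)

for site, p in probabilities.items():
    print(f"{site:>6}: {p:.3f}")
```

With these toy numbers the hollow site, which has the lowest energy, receives most of the probability mass.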
Thank you for spending your time on my blog! I would love to hear which other topics, tools, tutorials, or discussions you would like me to cover in future posts.
Please do not hesitate to connect via LinkedIn.


