Steric clashes analysis

Mihaly Varadi - 14/04/2015 (last update 17/02/2017 - wow, almost 2 years!)

EXECUTIVE SUMMARY

The objective of this analysis was to investigate the prevalence of steric clashes in ensemble models. The complete dataset was divided into subsets based on the underlying experimental data (SAXS, NMR, NMR & SAXS) and on the conformer generation method (random pool-based and MD-based). The subsets were compared with each other using Wilcoxon-Mann-Whitney tests. This test was chosen because the variables are not normally distributed.


1 Data assembly

The complete PED database was used for the analysis. It consisted of:

  • 25466 conformers (PDB files)
  • of 60 ensembles
  • from 25 entries

2 Steric clash calculation

Steric clashes were identified with the Python script steric_clash.py. The script relies on the PDBParser module of Biopython. When executed, it visits every subfolder of the original folder (i.e. going 1 level below), looking for files with the “.pdb” extension. The algorithm goes through all the atoms in the chains and, for each atom, calculates the pairwise distances between that atom and all the other atoms. If the distance between two atoms is lower than or equal to the sum of their van der Waals radii, a steric clash is recorded. The output is a list of clashing atom pairs for each PDB file, saved in the stericClashLogs folder, and a main summary file, clash_stats.csv.
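
The file search and the clash criterion described above can be sketched as follows. This is not the steric_clash.py script itself: it avoids Biopython and takes atoms as plain tuples so that it runs on its own. The radii table (Bondi values), the fallback radius, the function names and the min_residue_separation parameter are all assumptions. In particular, the description does not say how covalently bonded neighbours are treated, so this sketch skips atom pairs from the same or sequence-adjacent residues.

```python
from itertools import combinations
from math import dist
from pathlib import Path

# Bondi van der Waals radii in angstroms (assumed; the radii used by the
# original script are not stated).
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}
DEFAULT_RADIUS = 1.70  # hypothetical fallback for elements not in the table


def find_pdb_files(root):
    """Return the .pdb files located exactly one level below root."""
    return sorted(Path(root).glob("*/*.pdb"))


def find_clashes(atoms, min_residue_separation=2):
    """Return the list of clashing atom pairs.

    atoms: list of (residue_index, element, (x, y, z)) tuples.
    A pair clashes when the distance between the two atoms is lower than
    or equal to the sum of their van der Waals radii. Pairs closer than
    min_residue_separation in sequence are skipped so that bonded
    neighbours are not counted (an assumption of this sketch).
    """
    clashes = []
    for a, b in combinations(atoms, 2):
        if abs(a[0] - b[0]) < min_residue_separation:
            continue
        limit = (VDW_RADII.get(a[1], DEFAULT_RADIUS)
                 + VDW_RADII.get(b[1], DEFAULT_RADIUS))
        if dist(a[2], b[2]) <= limit:
            clashes.append((a, b))
    return clashes


# Made-up coordinates, for illustration only.
example_atoms = [
    (1, "C", (0.0, 0.0, 0.0)),
    (2, "N", (1.3, 0.0, 0.0)),
    (5, "O", (2.0, 0.0, 0.0)),
    (9, "C", (10.0, 0.0, 0.0)),
]
for first, second in find_clashes(example_atoms):
    print(first, second)
```

The all-pairs loop mirrors the description and scales quadratically with the number of atoms. A spatial index would be the usual choice for larger structures.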

Sample from the clash_stats.csv file:

id                      num_of_residue_clashes  num_of_residues  clash_per_residue
PED5AAC_1/5AAC-1.pdb    0                       92               0.00
PED5AAC_1/5AAC-10.pdb   5                       92               0.05
PED5AAC_1/5AAC-11.pdb   0                       92               0.00
PED5AAC_1/5AAC-12.pdb   0                       92               0.00
PED5AAC_1/5AAC-13.pdb   0                       92               0.00
PED5AAC_1/5AAC-14.pdb   3                       92               0.03
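
Judging from the sample, clash_per_residue appears to be num_of_residue_clashes divided by num_of_residues, rounded to two decimals (5/92 ≈ 0.05, 3/92 ≈ 0.03). A minimal sketch for reading such rows and recomputing the ratio is given below. It assumes whitespace-separated columns, as in the sample; the function names are hypothetical.

```python
def parse_clash_stats(text):
    """Parse whitespace-separated clash_stats rows into dictionaries."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        record = dict(zip(header, line.split()))
        record["num_of_residue_clashes"] = int(record["num_of_residue_clashes"])
        record["num_of_residues"] = int(record["num_of_residues"])
        record["clash_per_residue"] = float(record["clash_per_residue"])
        rows.append(record)
    return rows


def recompute_ratio(row):
    """Clashes per residue, rounded to two decimals as in the sample."""
    return round(row["num_of_residue_clashes"] / row["num_of_residues"], 2)


sample = """\
id  num_of_residue_clashes  num_of_residues clash_per_residue
PED5AAC_1/5AAC-1.pdb    0   92  0.00
PED5AAC_1/5AAC-10.pdb   5   92  0.05
PED5AAC_1/5AAC-14.pdb   3   92  0.03
"""
for row in parse_clash_stats(sample):
    print(row["id"], row["clash_per_residue"], recompute_ratio(row))
```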

3 Explorative analysis


4 Statistical tests

4.1 SAXS, NMR and NMR & SAXS subsets

SAXS and NMR & SAXS entries have significantly higher ratios of steric clashes than NMR-only entries. Entries based only on SAXS data have the highest number of steric clashes relative to chain length.

Wilcoxon-test: SAXS only is significantly different from NMR only, p-value=0

Wilcoxon-test: NMR & SAXS is significantly different from NMR only, p-value=0

Wilcoxon-test: NMR & SAXS is significantly different from SAXS only, p-value=0.0030284

4.2 Random pool-based and MD-based subsets

MD-based entries have almost no steric clashes, whereas random pool-based entries have significantly more.

Wilcoxon-test: Random-pool is significantly different from MD-based, p-value=0
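
The comparisons in this section can be illustrated with a self-contained two-sided Wilcoxon-Mann-Whitney test. The sketch below uses the normal approximation with a tie correction and no continuity correction. It is not necessarily the implementation that produced the p-values above, and the function names are hypothetical. With samples as large as the subsets here, a very small p-value can easily be printed as 0.

```python
from math import erfc, sqrt


def rank_with_ties(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided p-value).

    Uses the normal approximation with tie correction, which is
    reasonable for large samples but not for very small ones.
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = rank_with_ties(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2

    # Tie correction: sum of (t^3 - t) over groups of tied values.
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())

    n = n1 + n2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # all observations identical
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / sqrt(variance)
    return u1, erfc(abs(z) / sqrt(2))


# Made-up clash_per_residue values, for illustration only.
md_based = [0.00, 0.00, 0.00, 0.01, 0.00, 0.00]
random_pool = [0.03, 0.05, 0.00, 0.12, 0.20, 0.08]
u_statistic, p_value = mann_whitney_u(md_based, random_pool)
print(u_statistic, p_value)
```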


CONCLUSION

Based on the analysis, the main difference is between the MD-based and random pool-based conformer generation methods. MD-based generation almost never produces steric clashes, while random pools can contain conformers with as many as 20% of their residues in clashes. The differences between SAXS, NMR and NMR & SAXS based entries could reflect that MD-based entries were mostly associated with NMR data, while SAXS and NMR & SAXS based entries relied on random pools.