Mihaly Varadi - 14/04/2015 (last update 17/02/2017 - wow, almost 2 years!)
The objective of this analysis was to investigate the prevalence of steric clashes in ensemble models. The complete dataset was divided into subsets based on the underlying experimental data (SAXS, NMR, NMR & SAXS) and on the conformer generation method (random pool-based and MD-based). The subsets were compared with each other using Wilcoxon-Mann-Whitney tests. This test was chosen because the variables are not normally distributed.
The complete PED database was used for the analysis. This consisted of:
Steric clashes were identified with the Python script steric_clash.py. The script relies on the PDBParser module of Biopython. When executed, it visits every subfolder of the original folder (i.e. going one level down), looking for files with the “.pdb” extension. The algorithm goes through all the atoms in the chains and, for each atom, calculates the pairwise distances to all the other atoms. If the distance between two atoms is less than or equal to the sum of their van der Waals radii, a steric clash is recorded. The output is a list of clashing atom pairs for each PDB file, saved in the stericClashLogs folder, and a main summary file, clash_stats.csv.
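The distance criterion described above can be sketched as follows. This is a minimal illustration, not the code of steric_clash.py: the atoms are hand-typed tuples rather than structures read with Biopython's PDBParser, the radii are the commonly quoted Bondi values (the radii used by the script are not stated here), and the function name find_clashes and the min_residue_separation parameter are hypothetical. Skipping atom pairs from the same or neighbouring residues is also an assumption, added because covalently bonded atoms always sit closer than the sum of their van der Waals radii.

```python
import math
from itertools import combinations

# Bondi van der Waals radii in angstroms (assumed; the script's values are not given).
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def find_clashes(atoms, min_residue_separation=2):
    """Return index pairs of atoms closer than the sum of their vdW radii.

    atoms: list of (residue_number, element, (x, y, z)) tuples.
    Pairs whose residue numbers differ by less than min_residue_separation
    are skipped (an assumption, to avoid flagging bonded neighbours).
    """
    clashes = []
    for (i, a), (j, b) in combinations(enumerate(atoms), 2):
        if abs(a[0] - b[0]) < min_residue_separation:
            continue
        limit = VDW_RADII[a[1]] + VDW_RADII[b[1]]
        if math.dist(a[2], b[2]) <= limit:
            clashes.append((i, j))
    return clashes


# Toy coordinates, invented for illustration only.
atoms = [
    (1, "C", (0.0, 0.0, 0.0)),
    (2, "N", (1.3, 0.0, 0.0)),
    (5, "O", (2.0, 0.0, 0.0)),
    (9, "C", (10.0, 0.0, 0.0)),
]
clashes = find_clashes(atoms)
print(clashes)
```

The all-against-all loop scales quadratically with the number of atoms, which is acceptable for single chains but may be slow for large ensembles; a spatial index would be the usual remedy.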
Sample from the clash_stats.csv file:
id num_of_residue_clashes num_of_residues clash_per_residue
PED5AAC_1/5AAC-1.pdb 0 92 0.00
PED5AAC_1/5AAC-10.pdb 5 92 0.05
PED5AAC_1/5AAC-11.pdb 0 92 0.00
PED5AAC_1/5AAC-12.pdb 0 92 0.00
PED5AAC_1/5AAC-13.pdb 0 92 0.00
PED5AAC_1/5AAC-14.pdb 3 92 0.03
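In the sample, clash_per_residue appears to be num_of_residue_clashes divided by num_of_residues (for example 5/92 ≈ 0.05). A short sketch of how such rows might be aggregated per entry is given below. It assumes whitespace-separated columns as shown in the sample (the real file may use a different delimiter) and that the folder name before the slash identifies the entry; the function name mean_clash_ratio_per_entry is hypothetical.

```python
from collections import defaultdict

# Three rows copied from the sample above.
SAMPLE = """\
id num_of_residue_clashes num_of_residues clash_per_residue
PED5AAC_1/5AAC-1.pdb 0 92 0.00
PED5AAC_1/5AAC-10.pdb 5 92 0.05
PED5AAC_1/5AAC-14.pdb 3 92 0.03
"""


def mean_clash_ratio_per_entry(text):
    """Average the per-conformer clash ratio for each entry folder."""
    per_entry = defaultdict(list)
    lines = text.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        path, clashes, residues, _rounded_ratio = line.split()
        entry = path.split("/")[0]  # assumed: folder name identifies the entry
        # Recompute the ratio from the counts to avoid the two-decimal rounding.
        per_entry[entry].append(int(clashes) / int(residues))
    return {entry: sum(v) / len(v) for entry, v in per_entry.items()}


result = mean_clash_ratio_per_entry(SAMPLE)
print(result)
```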
SAXS and NMR plus SAXS entries have significantly higher ratios of steric clashes than NMR-based entries. Entries based only on SAXS data have the highest number of steric clashes relative to chain length.
Wilcoxon-test: SAXS only is significantly different than NMR only, p-value=0
Wilcoxon-test: SAXS and NMR is significantly different than NMR only, p-value=0
Wilcoxon-test: SAXS and NMR is significantly different than SAXS only, p-value=0.0030284
MD-based entries have almost no steric clashes, and random pool-based entries have significantly more steric clashes compared to them.
Wilcoxon-test: Random-pool is significantly different than MD-based, p-value=0
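For reference, a two-sided Wilcoxon-Mann-Whitney comparison of two groups of clash_per_residue values can be sketched in plain Python as below. The software used for the tests reported above is not stated, so this is only an illustration: it uses the normal approximation with a tie correction and no continuity correction, the function name mann_whitney_u is hypothetical, and the two input lists are invented toy values, not PED data.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Wilcoxon-Mann-Whitney test via the normal approximation.

    Returns (U statistic for x, p-value). Ties receive mid-ranks and the
    variance is tie-corrected; no continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                      # size of this group of tied values
        midrank = (i + 1 + j) / 2      # average of ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t**3 - t
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return u, p


# Invented toy values for two groups of clash_per_residue ratios.
group_a = [0.00, 0.00, 0.01, 0.00, 0.02]
group_b = [0.05, 0.03, 0.10, 0.08, 0.04]
u_stat, p_value = mann_whitney_u(group_a, group_b)
print(u_stat, p_value)
```

With real data sets of thousands of conformers the p-value can fall below the precision of the printed output, which is one way a value of 0 can appear in a report.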
Based on the analysis, the main difference is between the MD-based and random pool-based conformer generation algorithms. MD-based generation almost never produces steric clashes, while random pools can contain conformers with as many as 20% of their residues in clashes. The differences between SAXS, NMR and NMR & SAXS based entries could reflect the fact that MD-based entries were mostly associated with NMR data, while SAXS and NMR & SAXS based entries relied on random pools.