Kappa and radius of gyration analysis

Mihaly Varadi - 02/03/2016 (last update 02/03/2016)

EXECUTIVE SUMMARY

The aim of this analysis was to investigate the correlation between the ensemble-averaged radius of gyration (Rg) and the corresponding kappa value.

1 Data assembly

Of the 72 available ensembles, we used only the 46 that passed both the steric clash analysis and the phi/psi angle analysis. Some ensembles describe the same protein.

The average radius of gyration values were extracted from PED using a simple SQL query:

SELECT EnsembleExperimentID, EnsembleRg FROM Ensemble 
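The query above can be run from a script and its result turned into a lookup table keyed by ensemble ID. The following is a minimal sketch only: the database engine behind PED is not stated here, so an in-memory sqlite3 database is used as a stand-in, and the column types, the function name fetch_average_rg and the example rows are assumptions made for illustration.

```python
import sqlite3


def fetch_average_rg(connection):
    """Return {EnsembleExperimentID: EnsembleRg} using the query from the text."""
    cursor = connection.execute(
        "SELECT EnsembleExperimentID, EnsembleRg FROM Ensemble"
    )
    return {ensemble_id: rg for ensemble_id, rg in cursor.fetchall()}


# Stand-in database with made-up rows (hypothetical IDs and Rg values).
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Ensemble (EnsembleExperimentID TEXT PRIMARY KEY, EnsembleRg REAL)"
)
connection.executemany(
    "INSERT INTO Ensemble VALUES (?, ?)",
    [("EXAMPLE_1", 25.0), ("EXAMPLE_2", 31.5)],
)

rg_by_id = fetch_average_rg(connection)
print(rg_by_id)
```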

The kappa values were calculated using the kappaLocalcider.py script, which relies on the localCIDER library developed by the Pappu lab (http://pappulab.github.io/localCIDER/).

The ensemble IDs, average Rg values and kappa values were combined and stored in kappa_rg_data.csv, along with the data type (saxs, nmr, both) and the ensemble calculation method (pool, md).
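One way such a combined table could be assembled is sketched below. The column names (ensemble_id, rg, kappa, data_type, method), the helper names and all example values are hypothetical; the actual layout of kappa_rg_data.csv is not described here. Ensembles missing a kappa value or an annotation are skipped rather than written with gaps.

```python
import csv
import io

# Hypothetical column layout; the real file may use different headers.
FIELDS = ["ensemble_id", "rg", "kappa", "data_type", "method"]


def combine(rg_by_id, kappa_by_id, annotation_by_id):
    """Join the three lookups on ensemble ID, keeping only complete records."""
    rows = []
    for ensemble_id in sorted(rg_by_id):
        if ensemble_id not in kappa_by_id or ensemble_id not in annotation_by_id:
            continue  # incomplete record: skip instead of guessing values
        data_type, method = annotation_by_id[ensemble_id]
        rows.append({
            "ensemble_id": ensemble_id,
            "rg": rg_by_id[ensemble_id],
            "kappa": kappa_by_id[ensemble_id],
            "data_type": data_type,
            "method": method,
        })
    return rows


def write_rows(rows, handle):
    """Write the combined records as CSV with a header line."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Made-up inputs for illustration only.
rows = combine(
    {"EXAMPLE_1": 25.0, "EXAMPLE_2": 31.5, "EXAMPLE_3": 18.2},
    {"EXAMPLE_1": 0.15, "EXAMPLE_2": 0.08},
    {"EXAMPLE_1": ("nmr", "pool"), "EXAMPLE_2": ("saxs", "md")},
)
buffer = io.StringIO()
write_rows(rows, buffer)
print(buffer.getvalue())
```

To write to disk instead of a string buffer, the same write_rows call can be given a file opened with open("kappa_rg_data.csv", "w", newline="").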

2 Data subsetting

The complete dataset (31 data points, i.e. ensembles) was divided into subsets by data type (nmr, saxs, both) and by ensemble calculation method (pool, md).

Sample sizes:

  • Complete set: 31
  • NMR: 24
  • SAXS: 5
  • BOTH: 2
  • POOL: 22
  • MD: 9

Note: The SAXS (5), BOTH (2) and MD (9) subsets are small, so statistical tests on them have low power and their results should be interpreted with caution.
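The subset sizes can be tallied, and checked for consistency, with a few lines of code. This is a sketch under assumptions: the row keys data_type and method are hypothetical column names, while the counts are the ones listed above. Each grouping should partition the complete set, so its sizes should sum to the total.

```python
from collections import Counter


def subset_sizes(rows):
    """Count records per data type and per ensemble calculation method."""
    by_type = Counter(row["data_type"] for row in rows)
    by_method = Counter(row["method"] for row in rows)
    return by_type, by_method


def is_partition(sizes, total):
    """True if the subset sizes add up to the size of the complete set."""
    return sum(sizes.values()) == total


# Sample sizes as listed in this section.
REPORTED_BY_TYPE = {"nmr": 24, "saxs": 5, "both": 2}
REPORTED_BY_METHOD = {"pool": 22, "md": 9}
REPORTED_TOTAL = 31

print(is_partition(REPORTED_BY_TYPE, REPORTED_TOTAL))
print(is_partition(REPORTED_BY_METHOD, REPORTED_TOTAL))
```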

3 Plotting and correlation

The Pearson product-moment correlation between kappa and radius of gyration in the complete dataset is -0.7143699 with a p-value of 6.3703099 × 10^{-6}, which indicates a strong, highly significant negative correlation.

  • SAXS dataset: -0.75753 with a p-value of 0.1379933; the coefficient is large, but with only 5 ensembles the correlation is not statistically significant.

  • NMR dataset: -0.8180331 with a p-value of 1.0471278 × 10^{-6}, which indicates a strong, highly significant negative correlation.

  • Pool-based dataset: -0.2794559 with a p-value of 0.2078393, which indicates a weak negative correlation that is not statistically significant.

  • MD dataset: -0.6823963 with a p-value of 0.0428395, which indicates a moderately strong negative correlation that is significant at the 0.05 level.
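For reference, the Pearson coefficient and the test statistic behind its p-value can be sketched in plain Python. This is an illustration and not necessarily the tool used for the numbers above. Under the usual test, t = r * sqrt((n - 2) / (1 - r^2)) follows a Student's t distribution with n - 2 degrees of freedom, and the two-sided p-value is read from that distribution with a statistics package. The function names are hypothetical; the r and n values in the loop are the ones reported in this document.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two equally long samples with at least 3 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for testing r against zero (n - 2 degrees of freedom).

    Undefined for |r| == 1, where the denominator is zero.
    """
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Coefficients and sample sizes as reported in this document.
reported = [
    ("complete", -0.7143699, 31),
    ("saxs", -0.75753, 5),
    ("nmr", -0.8180331, 24),
    ("pool", -0.2794559, 22),
    ("md", -0.6823963, 9),
]
for label, r, n in reported:
    print(label, "t =", round(t_statistic(r, n), 2), "df =", n - 2)
```

The sample size enters both the statistic and the degrees of freedom, which is why a large coefficient on 5 points (SAXS) can fail to reach significance while a similar one on 31 points does.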

CONCLUSION

The correlation between kappa and the average Rg value was expected to be ~0.7 according to previous observations by Pappu et al. In our dataset and the various subsets we observed different correlations:

  • The complete dataset and the NMR subset show highly significant negative correlations of approximately -0.71 and -0.82, respectively
  • SAXS and SAXS+NMR subsets do not show significant correlation
  • The pool-based ensembles show a weak negative correlation of approximately -0.28, which is not significant
  • The MD-based ensembles show a negative correlation of approximately -0.68, which is significant at the 0.05 level

The MD-based ensembles therefore appear to be in close agreement with the earlier findings; however, the sample size is very small (only 9 ensembles), which lowers confidence.