Mihaly Varadi - 02/03/2016 (last update 02/03/2016)
The aim of this analysis was to investigate the correlation between the radius of gyration averaged over an ensemble and the corresponding kappa-value.
Of the 72 available ensembles, we used only the 46 that passed both the steric clash analysis and the phi/psi angle analysis. Some ensembles describe the same protein.
The average radius of gyration values were extracted from PED using a simple SQL query:
SELECT EnsembleExperimentID, EnsembleRg FROM Ensemble
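As a rough illustration, the query can be run from Python and its result turned into a lookup table. This sketch uses an in-memory SQLite database as a stand-in, because the database engine behind PED is not stated here. The table and column names come from the query above; the helper name `fetch_average_rg`, the ensemble IDs and the Rg values are invented placeholders.

```python
import sqlite3

QUERY = "SELECT EnsembleExperimentID, EnsembleRg FROM Ensemble"


def fetch_average_rg(connection):
    """Return {ensemble_id: average_rg} for every row of the Ensemble table."""
    rows = connection.execute(QUERY).fetchall()
    return {ensemble_id: rg for ensemble_id, rg in rows}


# Stand-in database: the real PED schema and engine are assumptions here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Ensemble (EnsembleExperimentID TEXT, EnsembleRg REAL)")
conn.executemany(
    "INSERT INTO Ensemble VALUES (?, ?)",
    [("ENS_A", 25.1), ("ENS_B", 31.4)],  # placeholder values, not PED data
)
rg_by_ensemble = fetch_average_rg(conn)
print(rg_by_ensemble)
```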
The kappa values were calculated using the kappaLocalcider.py script, which relies on the localCIDER library developed by the Pappu lab (http://pappulab.github.io/localCIDER/).
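The contents of kappaLocalcider.py are not reproduced here, so the following is only a minimal sketch of how such a script might compute kappa per ensemble. It assumes localCIDER's `SequenceParameters(...).get_kappa()` interface; the function name `compute_kappa`, the `kappa_fn` hook and the input layout (a dict of ensemble ID to amino-acid sequence) are hypothetical.

```python
def compute_kappa(sequences, kappa_fn=None):
    """Map each ensemble ID to the kappa value of its amino-acid sequence.

    `sequences` is a dict {ensemble_id: sequence}. `kappa_fn` is a
    hypothetical hook for swapping in another calculator; when it is
    omitted, localCIDER is imported lazily and used instead.
    """
    if kappa_fn is None:
        # Assumes localCIDER is installed (pip install localcider).
        from localcider.sequenceParameters import SequenceParameters

        def kappa_fn(sequence):
            return SequenceParameters(sequence).get_kappa()

    # Normalise whitespace and case before handing the sequence over.
    return {
        ens_id: kappa_fn(seq.strip().upper())
        for ens_id, seq in sequences.items()
    }
```

With localCIDER available, a call such as `compute_kappa({"ENS_A": "MKEEDRKKEE"})` would return a one-entry dict; the ID and sequence here are placeholders.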
The ensemble IDs, average Rg values and kappa values were combined, and stored in kappa_rg_data.csv along with the data type (saxs, nmr, both) and the ensemble calculation method (pool, md).
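The combining step might look like the sketch below, which joins the three pieces of information on the ensemble ID and writes them out as CSV. The column names, the helper names and all values are assumptions for illustration; the actual layout of kappa_rg_data.csv is not shown in this report.

```python
import csv
import io

# Assumed column names; the real header of kappa_rg_data.csv may differ.
FIELDS = ["ensemble_id", "rg", "kappa", "data_type", "method"]


def combine_records(rg, kappa, annotations):
    """Join three {ensemble_id: ...} mappings, keeping IDs present in all."""
    rows = []
    for ens_id in sorted(rg.keys() & kappa.keys() & annotations.keys()):
        data_type, method = annotations[ens_id]
        rows.append({
            "ensemble_id": ens_id,
            "rg": rg[ens_id],
            "kappa": kappa[ens_id],
            "data_type": data_type,
            "method": method,
        })
    return rows


def write_csv(rows, handle):
    """Write the combined rows with a header line to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


rows = combine_records(
    {"ENS_A": 25.1, "ENS_B": 31.4},                       # placeholder Rg
    {"ENS_A": 0.21, "ENS_B": 0.12},                       # placeholder kappa
    {"ENS_A": ("nmr", "pool"), "ENS_B": ("saxs", "md")},  # placeholder labels
)
buffer = io.StringIO()  # swap for open("kappa_rg_data.csv", "w", newline="")
write_csv(rows, buffer)
```

Joining on the intersection of IDs silently drops ensembles that lack an Rg or kappa value, which may or may not match what was done for the original file.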
The complete dataset (46 data points, i.e. ensembles) was split into subsets by data type (nmr, saxs, both) and by ensemble calculation method (pool, md).
Sample sizes:
Note: The sample sizes of the SAXS and MD subsets are rather small, so confidence in the statistical tests is lower for these subsets.
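The subsetting and the sample-size count described above can be sketched as follows. The field names match the assumed CSV columns rather than a confirmed file layout, the records are placeholders, and the filter uses exact matching, so an ensemble labelled "both" is not counted in the "nmr" or "saxs" subset. Whether the original analysis treated "both" that way is not stated.

```python
from collections import Counter


def subset(rows, **criteria):
    """Return the rows whose fields equal every key=value pair in `criteria`."""
    return [
        row for row in rows
        if all(row[key] == value for key, value in criteria.items())
    ]


rows = [  # placeholder records, not the real dataset
    {"ensemble_id": "ENS_A", "data_type": "nmr", "method": "pool"},
    {"ensemble_id": "ENS_B", "data_type": "saxs", "method": "md"},
    {"ensemble_id": "ENS_C", "data_type": "both", "method": "pool"},
]

nmr_rows = subset(rows, data_type="nmr")
pool_rows = subset(rows, method="pool")

# Sample sizes per data type and per calculation method.
sizes_by_type = Counter(row["data_type"] for row in rows)
sizes_by_method = Counter(row["method"] for row in rows)
```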
The Pearson’s product-moment correlation of kappa and radius of gyration in the complete dataset is -0.7143699 with a p-value of 6.3703099 × 10^{-6}, which indicates a strong, statistically significant negative correlation.
SAXS dataset: -0.75753 with a p-value of 0.1379933; the coefficient is large, but the correlation is not statistically significant at the 0.05 level.
NMR dataset: -0.8180331 with a p-value of 1.0471278 × 10^{-6}, which indicates a strong, statistically significant negative correlation.
Pool-based dataset: -0.2794559 with a p-value of 0.2078393, which indicates a weak negative correlation that is not statistically significant.
MD dataset: -0.6823963 with a p-value of 0.0428395, which indicates a moderately strong negative correlation that is significant at the 0.05 level.
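For readers who want to reproduce this kind of test, here is a minimal standard-library sketch of Pearson's r and its two-sided p-value. It is not the code used for the numbers above. The p-value comes from the usual t statistic, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom, and the tail area is obtained by Simpson integration of the Student t density. In practice a statistics package would normally be used instead; all function names and the sample data are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equally long sequences with n >= 3")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_pdf(x, df):
    """Density of the Student t distribution with `df` degrees of freedom."""
    log_coeff = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coeff = math.exp(log_coeff) / math.sqrt(df * math.pi)
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(r, n, steps=2000):
    """Two-sided p-value for H0: no correlation, given r and sample size n."""
    if abs(r) >= 1:
        return 0.0
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule on [0, t]; `steps` must be even.
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1 - 2 * area)


# Toy data, unrelated to the ensembles in this report.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)
p = two_sided_p(r, len(x))
print(r, p)
```

Because the p-value depends on the sample size as well as on r, small subsets can show a large coefficient without reaching significance.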
The correlation between kappa and the average Rg value was expected to be ~0.7 according to previous observations by Pappu et al. In our dataset and the various subsets we observed different correlations:
The MD-based ensembles appear to be in best agreement with the earlier findings. However, the sample size is very small (only 10 ensembles), which lowers confidence in this result.