Well-defined core regions

Exploratory analysis and statistics

1) Data assembly

1.2) Data processing

For each PDB ID and chain, the corresponding NMRCore, FindCore, FindCore extended (aka FindCore2) and Cyrange data were retrieved.
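One way to combine the per-method data into a single table keyed by PDB ID and chain is sketched below. The data frames, column names (`pdb_id`, `chain`) and the placeholder identifiers are all hypothetical, and the inner join is an assumption: the original processing may have kept entries that lack data for some methods.

```r
# Hypothetical per-method tables; identifiers and values are placeholders.
nmrcore   <- data.frame(pdb_id = c("1ABC", "2XYZ"), chain = c("A", "A"), nmrcore   = c(3, 2))
findcore  <- data.frame(pdb_id = c("1ABC", "2XYZ"), chain = c("A", "A"), findcore  = c(1, 1))
findcore2 <- data.frame(pdb_id = c("1ABC", "2XYZ"), chain = c("A", "A"), findcore2 = c(1, 1))
cyrange   <- data.frame(pdb_id = "1ABC",            chain = "A",         cyrange   = 1)

# Inner join on (pdb_id, chain): keeps only chains present in all four tables.
merged <- Reduce(
  function(x, y) merge(x, y, by = c("pdb_id", "chain")),
  list(nmrcore, findcore, findcore2, cyrange)
)
print(merged)
```

Passing `all = TRUE` to `merge()` would instead keep every chain and fill missing methods with `NA`.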

2) Number of domains per method

2.1) Version 1

2.2) Version 2

2.3) Version 3

## Warning in ks.test(number_of_domains$findcore, number_of_domains
## $findcore2): p-value will be approximate in the presence of ties
## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  number_of_domains$findcore and number_of_domains$findcore2
## D = 0.00069134, p-value = 1
## alternative hypothesis: two-sided
## Warning in ks.test(number_of_domains$findcore, number_of_domains$cyrange):
## p-value will be approximate in the presence of ties
## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  number_of_domains$findcore and number_of_domains$cyrange
## D = 0.36307, p-value < 2.2e-16
## alternative hypothesis: two-sided
## Warning in ks.test(number_of_domains$findcore, number_of_domains$nmrcore):
## p-value will be approximate in the presence of ties
## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  number_of_domains$findcore and number_of_domains$nmrcore
## D = 0.92829, p-value < 2.2e-16
## alternative hypothesis: two-sided
## Warning in ks.test(number_of_domains$cyrange, number_of_domains$findcore2):
## p-value will be approximate in the presence of ties
## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  number_of_domains$cyrange and number_of_domains$findcore2
## D = 0.36238, p-value < 2.2e-16
## alternative hypothesis: two-sided
## Warning in ks.test(number_of_domains$cyrange, number_of_domains$nmrcore):
## p-value will be approximate in the presence of ties
## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  number_of_domains$cyrange and number_of_domains$nmrcore
## D = 0.75857, p-value < 2.2e-16
## alternative hypothesis: two-sided
## Warning in ks.test(number_of_domains$nmrcore, number_of_domains$findcore2):
## p-value will be approximate in the presence of ties
## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  number_of_domains$nmrcore and number_of_domains$findcore2
## D = 0.9276, p-value < 2.2e-16
## alternative hypothesis: two-sided
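
The six pairwise comparisons above could be produced in one pass along the following lines. This is a minimal sketch on simulated counts, not the original script: the toy `number_of_domains` table only mimics the column names seen in the output, and the Holm correction is an addition that does not appear in the original results.

```r
# Toy stand-in for the number_of_domains table (simulated counts, not the real data).
set.seed(1)
number_of_domains <- data.frame(
  findcore  = rpois(200, lambda = 1) + 1,
  findcore2 = rpois(200, lambda = 1) + 1,
  cyrange   = rpois(200, lambda = 2) + 1,
  nmrcore   = rpois(200, lambda = 4) + 1
)

# All unordered pairs of methods (6 pairs for 4 methods).
method_pairs <- combn(names(number_of_domains), 2, simplify = FALSE)

ks_results <- do.call(rbind, lapply(method_pairs, function(p) {
  # Counts contain ties, so ks.test() may warn that the p-value is approximate.
  res <- suppressWarnings(
    ks.test(number_of_domains[[p[1]]], number_of_domains[[p[2]]])
  )
  data.frame(x = p[1], y = p[2],
             D = unname(res$statistic),
             p_value = res$p.value)
}))

# Holm correction for the six comparisons (an addition, not in the original output).
ks_results$p_holm <- p.adjust(ks_results$p_value, method = "holm")
print(ks_results)
```

Because the number of domains is a small integer count, ties are unavoidable, which is why every test above carries the "approximate in the presence of ties" warning.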

3) Ratio of core residues per method

## 
##  Welch Two Sample t-test
## 
## data:  corr_number_of_residues$findcore/corr_number_of_residues$chain_length and corr_number_of_residues$findcore2/corr_number_of_residues$chain_length
## t = -119.88, df = 13630, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.3094396 -0.2994830
## sample estimates:
## mean of x mean of y 
## 0.5084798 0.8129411
## 
##  Welch Two Sample t-test
## 
## data:  corr_number_of_residues$findcore/corr_number_of_residues$chain_length and corr_number_of_residues$cyrange/corr_number_of_residues$chain_length
## t = -106.29, df = 13445, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.2790850 -0.2689783
## sample estimates:
## mean of x mean of y 
## 0.5084798 0.7825114
## 
##  Welch Two Sample t-test
## 
## data:  corr_number_of_residues$findcore/corr_number_of_residues$chain_length and corr_number_of_residues$nmrcore/corr_number_of_residues$chain_length
## t = -129.46, df = 11848, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.3114743 -0.3021825
## sample estimates:
## mean of x mean of y 
## 0.5084798 0.8153082
## 
##  Welch Two Sample t-test
## 
## data:  corr_number_of_residues$cyrange/corr_number_of_residues$chain_length and corr_number_of_residues$findcore2/corr_number_of_residues$chain_length
## t = -11.239, df = 13797, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.03573660 -0.02512276
## sample estimates:
## mean of x mean of y 
## 0.7825114 0.8129411
## 
##  Welch Two Sample t-test
## 
## data:  corr_number_of_residues$cyrange/corr_number_of_residues$chain_length and corr_number_of_residues$nmrcore/corr_number_of_residues$chain_length
## t = -12.867, df = 12043, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.03779316 -0.02780036
## sample estimates:
## mean of x mean of y 
## 0.7825114 0.8153082
## 
##  Welch Two Sample t-test
## 
## data:  corr_number_of_residues$nmrcore/corr_number_of_residues$chain_length and corr_number_of_residues$findcore2/corr_number_of_residues$chain_length
## t = 0.94297, df = 12134, p-value = 0.3457
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.002553404  0.007287564
## sample estimates:
## mean of x mean of y 
## 0.8153082 0.8129411

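According to the Welch two-sample t-tests, the ratios of core residues differ between methods as follows:

  • FindCore has a significantly lower ratio than all the other methods (mean 0.51 versus 0.78 to 0.82; p-value < 2.2e-16 in each case)
  • Cyrange has a significantly lower ratio than NMRCore and FindCore2 (mean 0.78 versus 0.82 and 0.81; p-value < 2.2e-16 in both cases)
  • NMRCore and FindCore2 are not significantly different (mean 0.82 versus 0.81; p-value = 0.3457)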
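
The comparison of core-residue ratios can be sketched as follows. The table is simulated: the column names follow the test output, but the per-method core fractions used to generate the toy data are assumptions chosen only to resemble the reported means.

```r
# Simulated stand-in for corr_number_of_residues (not the real data).
set.seed(2)
n <- 300
methods <- c("findcore", "findcore2", "cyrange", "nmrcore")
core_fraction <- c(findcore = 0.5, findcore2 = 0.8, cyrange = 0.78, nmrcore = 0.8)  # assumed

corr_number_of_residues <- data.frame(chain_length = sample(50:200, n, replace = TRUE))
for (m in methods) {
  corr_number_of_residues[[m]] <- rbinom(n,
                                         size = corr_number_of_residues$chain_length,
                                         prob = core_fraction[[m]])
}

# Ratio of core residues to chain length for one method.
ratio <- function(m) corr_number_of_residues[[m]] / corr_number_of_residues$chain_length

welch_results <- do.call(rbind, lapply(combn(methods, 2, simplify = FALSE), function(p) {
  # t.test() defaults to var.equal = FALSE, i.e. the Welch test.
  res <- t.test(ratio(p[1]), ratio(p[2]))
  data.frame(x = p[1], y = p[2],
             mean_x = unname(res$estimate[1]),
             mean_y = unname(res$estimate[2]),
             ci_low = res$conf.int[1],
             ci_high = res$conf.int[2],
             p_value = res$p.value)
}))
print(welch_results)
```

The confidence interval is for the difference `mean_x - mean_y`, so an interval entirely below zero means the first method of the pair has the lower ratio.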

4) Ratio of unique residues

Only 37 of the 5543 entries (0.67%) differ by more than 10% between the two methods, and only 5 entries (0.09%) differ by more than 20%.

About 75% of the entries (4162) are at least 90% identical between the two methods.
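
For a single chain, the quantities discussed here could be computed along these lines. This is a sketch under stated assumptions: the residue ranges are made up, normalising the unique residues by chain length is an assumption, and the overlap measure (shared residues divided by the union) is only one possible reading of "the same".

```r
# Hypothetical core residue numbers for one chain, as reported by two methods.
chain_length   <- 100
core_cyrange   <- 5:90
core_findcore2 <- 3:92

# Residues assigned to the core by one method but not by the other,
# expressed as a ratio of the chain length (assumed normalisation).
unique_to_cyrange   <- length(setdiff(core_cyrange, core_findcore2)) / chain_length
unique_to_findcore2 <- length(setdiff(core_findcore2, core_cyrange)) / chain_length

# Overlap of the two core definitions (assumed similarity measure).
overlap <- length(intersect(core_cyrange, core_findcore2)) /
  length(union(core_cyrange, core_findcore2))

print(c(unique_to_cyrange = unique_to_cyrange,
        unique_to_findcore2 = unique_to_findcore2,
        overlap = overlap))
```

In this toy case FindCore2 adds two residues at each end of the Cyrange core, so 4% of the chain is unique to FindCore2 and none is unique to Cyrange.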

FindCore2 uniquely identifies significantly more residues as core domain residues than Cyrange (mean ratio of unique residues 0.056 versus 0.021; Welch two-sample t-test p-value < 2.2e-16, Kolmogorov-Smirnov test p-value < 2.2e-16):

## 
##  Welch Two Sample t-test
## 
## data:  corr_unique_residues$unique_to_cyrange and corr_unique_residues$unique_to_findcore
## t = -31.058, df = 13684, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.03724686 -0.03282445
## sample estimates:
##  mean of x  mean of y 
## 0.02075680 0.05579246
## Warning in ks.test(corr_unique_residues$unique_to_cyrange,
## corr_unique_residues$unique_to_findcore): p-value will be approximate in
## the presence of ties
## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  corr_unique_residues$unique_to_cyrange and corr_unique_residues$unique_to_findcore
## D = 0.60865, p-value < 2.2e-16
## alternative hypothesis: two-sided