An advanced search was performed using these criteria:
This resulted in 5386 protein chains in 5492 PDB entries.
Entries were downloaded:
Each PDB ID and chain identifier was used to retrieve NMRCore, FindCore, FindCore extended (aka FindCore2), and Cyrange data.
## Warning in ks.test(number_of_domains$findcore, number_of_domains
## $findcore2): p-value will be approximate in the presence of ties
##
## Two-sample Kolmogorov-Smirnov test
##
## data: number_of_domains$findcore and number_of_domains$findcore2
## D = 0.00069134, p-value = 1
## alternative hypothesis: two-sided
## Warning in ks.test(number_of_domains$findcore, number_of_domains$cyrange):
## p-value will be approximate in the presence of ties
##
## Two-sample Kolmogorov-Smirnov test
##
## data: number_of_domains$findcore and number_of_domains$cyrange
## D = 0.36307, p-value < 2.2e-16
## alternative hypothesis: two-sided
## Warning in ks.test(number_of_domains$findcore, number_of_domains$nmrcore):
## p-value will be approximate in the presence of ties
##
## Two-sample Kolmogorov-Smirnov test
##
## data: number_of_domains$findcore and number_of_domains$nmrcore
## D = 0.92829, p-value < 2.2e-16
## alternative hypothesis: two-sided
## Warning in ks.test(number_of_domains$cyrange, number_of_domains$findcore2):
## p-value will be approximate in the presence of ties
##
## Two-sample Kolmogorov-Smirnov test
##
## data: number_of_domains$cyrange and number_of_domains$findcore2
## D = 0.36238, p-value < 2.2e-16
## alternative hypothesis: two-sided
## Warning in ks.test(number_of_domains$cyrange, number_of_domains$nmrcore):
## p-value will be approximate in the presence of ties
##
## Two-sample Kolmogorov-Smirnov test
##
## data: number_of_domains$cyrange and number_of_domains$nmrcore
## D = 0.75857, p-value < 2.2e-16
## alternative hypothesis: two-sided
## Warning in ks.test(number_of_domains$nmrcore, number_of_domains$findcore2):
## p-value will be approximate in the presence of ties
##
## Two-sample Kolmogorov-Smirnov test
##
## data: number_of_domains$nmrcore and number_of_domains$findcore2
## D = 0.9276, p-value < 2.2e-16
## alternative hypothesis: two-sided
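The pairwise Kolmogorov-Smirnov comparisons above could be produced along the following lines. This is only a sketch: the `number_of_domains` data frame here is simulated as a stand-in for the real one (its column names are taken from the output above, but its values are invented), and the `ks_results` summary table is a hypothetical convenience, not part of the original analysis.

```r
# Simulated stand-in for number_of_domains: one row per chain,
# one column of domain counts per method (values are NOT real data).
set.seed(1)
n <- 200
number_of_domains <- data.frame(
  findcore  = rpois(n, 1) + 1,
  findcore2 = rpois(n, 1) + 1,
  cyrange   = rpois(n, 2) + 1,
  nmrcore   = rpois(n, 4) + 1
)

# All unordered pairs of methods (6 pairs for 4 methods).
method_pairs <- combn(names(number_of_domains), 2, simplify = FALSE)

# Run a two-sample KS test per pair and collect D and the p-value.
ks_results <- do.call(rbind, lapply(method_pairs, function(p) {
  # Integer counts contain ties, so ks.test() may warn that the
  # p-value is approximate; the warning is silenced here.
  res <- suppressWarnings(
    ks.test(number_of_domains[[p[1]]], number_of_domains[[p[2]]])
  )
  data.frame(method_a = p[1], method_b = p[2],
             D = unname(res$statistic), p_value = res$p.value)
}))

print(ks_results)
```

Because the domain counts are small integers, ties are unavoidable, which is why every test in the output carries the "p-value will be approximate" warning.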
##
## Welch Two Sample t-test
##
## data: corr_number_of_residues$findcore/corr_number_of_residues$chain_length and corr_number_of_residues$findcore2/corr_number_of_residues$chain_length
## t = -119.88, df = 13630, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.3094396 -0.2994830
## sample estimates:
## mean of x mean of y
## 0.5084798 0.8129411
##
## Welch Two Sample t-test
##
## data: corr_number_of_residues$findcore/corr_number_of_residues$chain_length and corr_number_of_residues$cyrange/corr_number_of_residues$chain_length
## t = -106.29, df = 13445, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.2790850 -0.2689783
## sample estimates:
## mean of x mean of y
## 0.5084798 0.7825114
##
## Welch Two Sample t-test
##
## data: corr_number_of_residues$findcore/corr_number_of_residues$chain_length and corr_number_of_residues$nmrcore/corr_number_of_residues$chain_length
## t = -129.46, df = 11848, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.3114743 -0.3021825
## sample estimates:
## mean of x mean of y
## 0.5084798 0.8153082
##
## Welch Two Sample t-test
##
## data: corr_number_of_residues$cyrange/corr_number_of_residues$chain_length and corr_number_of_residues$findcore2/corr_number_of_residues$chain_length
## t = -11.239, df = 13797, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.03573660 -0.02512276
## sample estimates:
## mean of x mean of y
## 0.7825114 0.8129411
##
## Welch Two Sample t-test
##
## data: corr_number_of_residues$cyrange/corr_number_of_residues$chain_length and corr_number_of_residues$nmrcore/corr_number_of_residues$chain_length
## t = -12.867, df = 12043, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.03779316 -0.02780036
## sample estimates:
## mean of x mean of y
## 0.7825114 0.8153082
##
## Welch Two Sample t-test
##
## data: corr_number_of_residues$nmrcore/corr_number_of_residues$chain_length and corr_number_of_residues$findcore2/corr_number_of_residues$chain_length
## t = 0.94297, df = 12134, p-value = 0.3457
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.002553404 0.007287564
## sample estimates:
## mean of x mean of y
## 0.8153082 0.8129411
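The Welch tests above compare, for each pair of methods, the fraction of each chain assigned to the core (core residues divided by chain length). A minimal sketch of that comparison follows; the `corr_number_of_residues` data frame is simulated here (column names follow the output above, values are invented), and the `core_ratio` and `welch_results` objects are hypothetical helpers.

```r
# Simulated stand-in for corr_number_of_residues (values are NOT real data).
set.seed(2)
n <- 300
chain_length <- sample(50:250, n, replace = TRUE)
corr_number_of_residues <- data.frame(
  chain_length = chain_length,
  findcore  = round(chain_length * runif(n, 0.3, 0.7)),
  findcore2 = round(chain_length * runif(n, 0.6, 1.0)),
  cyrange   = round(chain_length * runif(n, 0.6, 0.95)),
  nmrcore   = round(chain_length * runif(n, 0.6, 1.0))
)

methods <- c("findcore", "findcore2", "cyrange", "nmrcore")

# Ratio of core residues to chain length, per chain and per method.
core_ratio <- as.data.frame(lapply(
  corr_number_of_residues[methods],
  function(x) x / corr_number_of_residues$chain_length
))

# t.test() defaults to var.equal = FALSE, i.e. the Welch two-sample test.
welch_results <- do.call(rbind, lapply(
  combn(methods, 2, simplify = FALSE),
  function(p) {
    res <- t.test(core_ratio[[p[1]]], core_ratio[[p[2]]])
    data.frame(method_a = p[1], method_b = p[2],
               mean_a = unname(res$estimate[1]),
               mean_b = unname(res$estimate[2]),
               ci_low = res$conf.int[1], ci_high = res$conf.int[2],
               p_value = res$p.value)
  }
))

print(welch_results)
```

Note that this is an unpaired test: each method's ratios are treated as an independent sample, even though the same chains underlie every column.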
According to the Welch two-sample t-tests, the differences in the ratios of core residues are the following:
There are 37 entries (out of 5543; 0.67%) that differ by more than 10% between the two methods, and only 5 entries (0.09%) that differ by more than 20%.
75% of the entries (4162) are at least 90% identical.
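The percentages quoted above follow directly from the entry counts, and threshold counts of this kind can be obtained by comparing per-entry core ratios. The sketch below first re-derives the reported percentages, then illustrates the counting on five made-up entries. How "different" was measured is not stated in the text, so the absolute difference of core ratios used here is an assumption, and `ratio_a` and `ratio_b` are hypothetical.

```r
# Re-derive the reported percentages from the reported counts.
total_entries <- 5543
pct_over_10  <- round(100 * 37 / total_entries, 2)    # 0.67
pct_over_20  <- round(100 * 5 / total_entries, 2)     # 0.09
pct_same_90  <- round(100 * 4162 / total_entries)     # 75

# Hypothetical per-entry core ratios from two methods (NOT real data).
ratio_a <- c(0.80, 0.82, 0.75, 0.90, 0.60)
ratio_b <- c(0.81, 0.70, 0.76, 0.65, 0.61)

# Assumption: "different" means the absolute difference of the ratios.
abs_diff <- abs(ratio_a - ratio_b)

n_over_10 <- sum(abs_diff > 0.10)  # entries differing by more than 10%
n_over_20 <- sum(abs_diff > 0.20)  # entries differing by more than 20%

cat(n_over_10, "entries differ by more than 10%;",
    n_over_20, "by more than 20%\n")
```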
FindCore2 uniquely identifies significantly more residues as core domain residues than Cyrange does (Welch two-sample t-test p-value < 0.01, Kolmogorov-Smirnov test p-value < 0.01).
##
## Welch Two Sample t-test
##
## data: corr_unique_residues$unique_to_cyrange and corr_unique_residues$unique_to_findcore
## t = -31.058, df = 13684, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.03724686 -0.03282445
## sample estimates:
## mean of x mean of y
## 0.02075680 0.05579246
## Warning in ks.test(corr_unique_residues$unique_to_cyrange,
## corr_unique_residues$unique_to_findcore): p-value will be approximate in
## the presence of ties
##
## Two-sample Kolmogorov-Smirnov test
##
## data: corr_unique_residues$unique_to_cyrange and corr_unique_residues$unique_to_findcore
## D = 0.60865, p-value < 2.2e-16
## alternative hypothesis: two-sided
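For a single chain, the "unique to" quantities compared above can be thought of as set differences between the residues each method assigns to the core. The sketch below shows one way to compute them; the residue ranges are invented, and normalizing by chain length is an assumption suggested by the fractional sample means in the output (about 0.02 and 0.06), not something the text states.

```r
# Hypothetical core residue numbers for one chain (NOT real data).
chain_length   <- 100
cyrange_core   <- 10:80
findcore2_core <- 12:90

# Residues assigned to the core by one method but not the other.
only_cyrange   <- setdiff(cyrange_core, findcore2_core)   # residues 10, 11
only_findcore2 <- setdiff(findcore2_core, cyrange_core)   # residues 81..90

# Assumption: unique residues are expressed as a fraction of chain length.
unique_to_cyrange   <- length(only_cyrange) / chain_length
unique_to_findcore2 <- length(only_findcore2) / chain_length

cat("unique to Cyrange:", unique_to_cyrange,
    " unique to FindCore2:", unique_to_findcore2, "\n")
```

Repeating this per chain yields two vectors of fractions, which can then be passed to `t.test()` and `ks.test()` as in the output above.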