Well-defined core regions

Explorative analysis and statistics

1) Data assembly

An advanced search of the PDB was performed using these criteria:

  • Protein molecule
  • No DNA/RNA
  • From 2000-01-01 to 2016-10-20
  • Between 10 and 100 models (10 <= x <= 100)
  • At least 50 residues (50 <= x)
  • Solution/Solid-state NMR

This resulted in 7046 protein chains in 6215 PDB entries.
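
The criteria above can be read as a single filter over entry records. The following is a minimal sketch, assuming each entry is available as a dictionary; the field names (`has_protein`, `has_nucleic_acid`, `date`, `n_models`, `n_residues`, `method`) and the method labels are hypothetical and do not correspond to any particular PDB API. Whether the date range refers to deposition or release date is not stated in the report, so the sketch uses a generic `date` field.

```python
from datetime import date

# Assumed labels for the two accepted experimental methods (illustrative only).
NMR_METHODS = {"SOLUTION NMR", "SOLID-STATE NMR"}


def passes_search_criteria(entry):
    """Return True if a (hypothetical) entry record meets all six criteria."""
    return (
        entry["has_protein"]
        and not entry["has_nucleic_acid"]
        and date(2000, 1, 1) <= entry["date"] <= date(2016, 10, 20)
        and 10 <= entry["n_models"] <= 100
        and entry["n_residues"] >= 50
        and entry["method"] in NMR_METHODS
    )


# Made-up example record, not taken from the data set.
example_entry = {
    "has_protein": True,
    "has_nucleic_acid": False,
    "date": date(2010, 5, 17),
    "n_models": 20,
    "n_residues": 120,
    "method": "SOLUTION NMR",
}
print(passes_search_criteria(example_entry))
```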

Entries were downloaded:

  • FASTA format

Each PDB ID and chain identifier was then used to retrieve NMRCore, FindCore, FindCore extended (also known as FindCore2) and Cyrange data.

2) Number of domains per method

NMRCore and FindCore find an unrealistic number of regions. FindCore2 and Cyrange find 1.8893611 and 1.6038773 regions on average, respectively.
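
As an illustration of how such per-method averages can be obtained, here is a minimal sketch that assumes the regions found for each chain are stored as lists of (start, end) residue ranges. The data layout, the chain identifiers and the values are hypothetical and are not taken from the data set.

```python
from statistics import mean

# Hypothetical layout: method -> chain -> list of (start, end) core regions.
regions = {
    "FindCore2": {
        "chainA": [(5, 40), (52, 90)],
        "chainB": [(3, 70)],
    },
    "Cyrange": {
        "chainA": [(6, 88)],
        "chainB": [(4, 69)],
    },
}


def mean_regions_per_chain(regions_by_method):
    """Average number of regions per chain, computed separately per method."""
    return {
        method: mean(len(chain_regions) for chain_regions in chains.values())
        for method, chains in regions_by_method.items()
    }


averages = mean_regions_per_chain(regions)
print(averages)
```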

3) Ratio of core residues per method

According to Welch's two-sample t-test, the ratios of core residues differ between methods as follows:

  • FindCore is very significantly different from (lower than) all the other methods
  • Cyrange is significantly different from (lower than) NMRCore and FindCore2
  • NMRCore and FindCore2 are not significantly different
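
For reference, the test statistic behind these comparisons can be sketched as follows. This is a minimal standard-library illustration of Welch's unequal-variance t statistic and the Welch-Satterthwaite degrees of freedom, not the code used for the report; the function name and the sample values are hypothetical. Obtaining a p-value additionally requires the t distribution, which is available for example through `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's two-sample t-test on samples a and b."""
    # Squared standard errors of the two sample means.
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Made-up core-residue ratios for two methods.
ratios_method_1 = [0.5, 0.6, 0.7]
ratios_method_2 = [0.7, 0.8, 0.9]
t_stat, dof = welch_t(ratios_method_1, ratios_method_2)
print(t_stat, dof)
```

A negative t statistic here means that the first sample has the lower mean ratio.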

4) Secondary structure of core domain residues

4.1) Ratio of core-domain residues with disorder propensities > 0.5

On average, a fraction of 0.1095728 of the core residues is disordered when identified by Cyrange, and 0.125645 when identified by FindCore2. This difference is significant according to Welch paired t-test, with a p-value of 1.7539126 × 10^{-218}. Significantly more of the identified core residues are disordered according to FindCore2 than according to Cyrange.
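
The per-chain ratio compared here can be sketched as below. This is a minimal illustration that assumes a per-residue disorder propensity is available as a mapping from residue number to a value in [0, 1]; the function name, the data layout and the values are hypothetical.

```python
def disordered_core_ratio(core_residues, disorder_propensity, threshold=0.5):
    """Fraction of core residues whose disorder propensity exceeds threshold."""
    if not core_residues:
        return 0.0  # assumption: a chain without core residues contributes 0
    n_disordered = sum(
        1 for residue in core_residues if disorder_propensity[residue] > threshold
    )
    return n_disordered / len(core_residues)


# Made-up example: residue number -> predicted disorder propensity.
propensity = {1: 0.1, 2: 0.6, 3: 0.5, 4: 0.9}
ratio = disordered_core_ratio([1, 2, 3, 4], propensity)
print(ratio)
```

The comparison is strict, so a residue with a propensity of exactly 0.5 is not counted as disordered.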

4.2) Ratio of core-domain residues with S2 < 0.85

4.3) Ratio of core-domain residues not in secondary structural elements

The core residues identified by FindCore2 are significantly less structured (i.e. less often found in secondary structural elements) than those identified by Cyrange. Welch paired t-test p-value: 7.4294173 × 10^{-205}
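
A minimal sketch of this ratio follows, assuming DSSP-style one-letter secondary structure codes per residue. Which codes count as secondary structural elements is an assumption made here (helix codes H, G, I and strand codes E, B); the report does not state the definition it used, and the function name and data are hypothetical.

```python
# Assumption: helix (H, G, I) and strand (E, B) codes count as SSEs.
SSE_CODES = frozenset("HGIEB")


def non_sse_core_ratio(core_residues, secondary_structure):
    """Fraction of core residues that are not in a secondary structural element."""
    if not core_residues:
        return 0.0  # assumption: a chain without core residues contributes 0
    n_outside = sum(
        1 for residue in core_residues
        if secondary_structure[residue] not in SSE_CODES
    )
    return n_outside / len(core_residues)


# Made-up example: residue number -> DSSP-style code ("-" and "T" are non-SSE here).
ss_codes = {1: "H", 2: "-", 3: "E", 4: "T"}
outside_ratio = non_sse_core_ratio([1, 2, 3, 4], ss_codes)
print(outside_ratio)
```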

5) Ratio of unique residues

There are 84 entries (out of 6692, 1.25523%) that are more than 10% different, and 44 entries (0.6575%) that are more than 20% different between the two methods.

68.4997% of the entries (4584) are at least 90% the same.

FindCore2 uniquely identifies significantly more residues as core domain residues than Cyrange (Welch paired test p-value = 1.1641653 × 10^{-223}).
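
Residues identified by only one of the two methods can be obtained with set differences. The sketch below is a minimal illustration that assumes the ratio is expressed relative to the chain length; the report does not state the denominator it used, and the function name and example values are hypothetical.

```python
def unique_core_ratios(findcore2_core, cyrange_core, chain_length):
    """Ratios of residues assigned to the core by exactly one of the two methods."""
    fc2, cyr = set(findcore2_core), set(cyrange_core)
    return {
        # Assumption: ratios are taken relative to the full chain length.
        "FindCore2_only": len(fc2 - cyr) / chain_length,
        "Cyrange_only": len(cyr - fc2) / chain_length,
    }


# Made-up example: a 10-residue chain with overlapping core assignments.
unique = unique_core_ratios(range(1, 9), range(3, 11), chain_length=10)
print(unique)
```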

6) Uniquely identified residues in disordered and well-defined protein chains

6.1) Binary-coloured scatter plots

6.2) Gradient-coloured scatter plots

6.3) Density plots

6.4) Density plots of unique core residues outside of SSEs