Here, I removed glycine (Gly) from the test data because it has no beta carbon and therefore no Cβ chemical shift.
## AA CA CB
## 1 SER 59.84 63.59
## 2 THR 62.47 69.49
## 3 ASP 54.54 41.14
## 4 ASP 53.98 41.14
## 5 SER 58.76 63.31
## 6 PRO 64.81 31.54
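The Gly filter described above can be sketched as follows. The sketch assumes each test record stores a three-letter residue name plus Cα and Cβ shifts in ppm, with a missing Cβ stored as `None`. The record layout, the function name `drop_missing_cb`, and the Gly row in the example are hypothetical and are not the code that produced the output above.

```python
from typing import Optional, TypedDict


class ShiftRecord(TypedDict):
    """One test-set residue: three-letter name plus CA/CB shifts in ppm (assumed layout)."""
    aa: str
    ca: float
    cb: Optional[float]


def drop_missing_cb(records: list[ShiftRecord]) -> list[ShiftRecord]:
    """Remove glycine and any residue whose CB shift is missing.

    Glycine has no beta carbon, so it never has a CB shift; the second
    condition also drops other residues whose CB value was not reported.
    """
    return [r for r in records if r["aa"] != "GLY" and r["cb"] is not None]


records: list[ShiftRecord] = [
    {"aa": "SER", "ca": 59.84, "cb": 63.59},
    {"aa": "GLY", "ca": 45.10, "cb": None},  # hypothetical Gly row, for illustration only
    {"aa": "THR", "ca": 62.47, "cb": 69.49},
]
filtered = drop_missing_cb(records)
print([r["aa"] for r in filtered])  # ['SER', 'THR']
```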
## AA freq
## [1,] "A" "0.0530973451327434"
## [2,] "R" "0.0442477876106195"
## [3,] "N" "0.0353982300884956"
## [4,] "D" "0.0973451327433628"
## [5,] "C" "0"
## [6,] "Q" "0.0530973451327434"
## [7,] "E" "0.106194690265487"
## [8,] "H" "0.00884955752212389"
## [9,] "I" "0.0353982300884956"
## [10,] "L" "0.115044247787611"
## [11,] "K" "0.0442477876106195"
## [12,] "M" "0.0530973451327434"
## [13,] "F" "0.0707964601769911"
## [14,] "P" "0.0619469026548673"
## [15,] "S" "0.0619469026548673"
## [16,] "T" "0.0530973451327434"
## [17,] "Y" "0.0265486725663717"
## [18,] "W" "0"
## [19,] "V" "0.079646017699115"
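The frequency table above lists 19 residue types (no Gly), and each value is a count divided by 113, the length of the sequence printed under Method A below. For example, 6/113 ≈ 0.0531 for A and 12/113 ≈ 0.1062 for E. The sketch below reproduces that calculation. It assumes residues outside the 19-letter alphabet are excluded from the denominator, and the name `aa_frequencies` is hypothetical.

```python
from collections import Counter

# The 19 residue types of the table above (glycine excluded), in the same order.
ALPHABET = "ARNDCQEHILKMFPSTYWV"


def aa_frequencies(sequence: str, alphabet: str = ALPHABET) -> dict[str, float]:
    """Fraction of each residue type among the residues that belong to `alphabet`.

    Letters outside the alphabet (e.g. G) are ignored in both the counts and
    the denominator; this denominator choice is an assumption.
    """
    counts = Counter(aa for aa in sequence if aa in alphabet)
    total = sum(counts.values())
    return {aa: (counts[aa] / total if total else 0.0) for aa in alphabet}


# Reference sequence of the test protein (113 residues, as printed below).
seq = ("STDDSPYKQAFSLFDRRIPKTSDLLRAQNPTLAEITEIESTLPAEVDMEQFLQVLNRPFD"
       "MDPEEFVFQVFDKDAMELRYVLTSEKLSNEEMDELLVPVKMVNYHDFVQMILA")
freq = aa_frequencies(seq)
print(len(seq), freq["A"], freq["E"], freq["W"])
```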
Method A:
\[Cov= \begin{pmatrix} Sd_{Ca} & cov \\ cov & Sd_{Cb} \end{pmatrix} \] Each amino acid has three covariance matrices, one per secondary-structure class. For example, the oxidized cysteine (Cys ox) covariance matrix for the \(\alpha\)-helix secondary structure:
## [,1] [,2]
## [1,] 2.430000 1.987452
## [2,] 1.987452 2.790000
Method B:
\[Cov= \begin{pmatrix} Sd_{Ca} & 0 \\ 0 & Sd_{Cb} \end{pmatrix} \] The covariance is set to 0, and there are again three covariance matrices for each residue, one per secondary-structure class. For example, the arginine covariance matrix for the \(\alpha\)-helix secondary structure:
## [,1] [,2]
## [1,] 2.43 0.00
## [2,] 0.00 2.79
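Method C:
\[Cov= \begin{pmatrix} Sd_{Ca} & cov_{avg} \\ cov_{avg} & Sd_{Cb} \end{pmatrix} \]
The standard deviations are kept per secondary-structure class, while the off-diagonal term is the average covariance, so there are still three matrices for each amino acid.
For example, the arginine covariance matrix for the \(\alpha\)-helix secondary structure: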
## [,1] [,2]
## [1,] 2.430000 -0.692594
## [2,] -0.692594 2.790000
Method D: the standard deviation is the average standard deviation and the covariance is the average covariance, so a single matrix covers all secondary structures. \[Cov= \begin{pmatrix} Sd_{Ca, avg} & cov_{avg} \\ cov_{avg} & Sd_{Cb, avg} \end{pmatrix} \] For example, the arginine covariance matrix for any secondary structure:
## [,1] [,2]
## [1,] 2.160000 -0.692594
## [2,] -0.692594 3.220000
Method E: similar to Method D, but with cov = 0. For example, for arginine:
## [,1] [,2]
## [1,] 2.16 0.00
## [2,] 0.00 3.22
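The five methods differ only in which standard deviations and which covariance go into the 2×2 matrix, so they can be sketched as one function. In this sketch, `per_ss` holds the statistics for one secondary-structure class (used by A, B and C) and `averaged` holds the values averaged over classes (used by C's covariance, D and E). All names are hypothetical. The example numbers are copied from the printouts above, but they come from different residues and are combined here purely for illustration. The formulas above place Sd on the diagonal. A conventional covariance matrix has variances (Sd²) there instead, so the hypothetical `square_sd` flag switches to that form if the Sd entries are true standard deviations.

```python
from typing import NamedTuple


class ShiftStats(NamedTuple):
    """Sd of CA, Sd of CB, and their covariance for one residue type."""
    sd_ca: float
    sd_cb: float
    cov: float


def covariance_matrix(method: str, per_ss: ShiftStats, averaged: ShiftStats,
                      square_sd: bool = False) -> list[list[float]]:
    """2x2 matrix for Methods A-E, following the formulas above.

    square_sd=True puts Sd**2 on the diagonal instead of Sd (hypothetical option).
    """
    if method == "A":    # per-class Sd, per-class covariance
        sd_ca, sd_cb, cov = per_ss
    elif method == "B":  # per-class Sd, covariance fixed at 0
        sd_ca, sd_cb, cov = per_ss.sd_ca, per_ss.sd_cb, 0.0
    elif method == "C":  # per-class Sd, averaged covariance
        sd_ca, sd_cb, cov = per_ss.sd_ca, per_ss.sd_cb, averaged.cov
    elif method == "D":  # averaged Sd, averaged covariance
        sd_ca, sd_cb, cov = averaged
    elif method == "E":  # averaged Sd, covariance fixed at 0
        sd_ca, sd_cb, cov = averaged.sd_ca, averaged.sd_cb, 0.0
    else:
        raise ValueError(f"unknown method: {method!r}")
    if square_sd:
        sd_ca, sd_cb = sd_ca ** 2, sd_cb ** 2
    return [[sd_ca, cov], [cov, sd_cb]]


helix = ShiftStats(2.43, 2.79, 1.987452)
avg = ShiftStats(2.16, 3.22, -0.692594)
for m in "ABCDE":
    print(m, covariance_matrix(m, helix, avg))
```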
For example, for Method A, the alanine covariance matrix and inverse covariance matrix are:
## [,1] [,2]
## [1,] 2.430000 1.987452
## [2,] 1.987452 2.790000
## [,1] [,2]
## [1,] 0.462963 0.000000
## [2,] 0.000000 0.310559
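Below is a sketch of inverting a 2×2 covariance matrix in closed form and using the inverse in a squared Mahalanobis distance for a (CA, CB) point. This is the usual way an inverse covariance enters a bivariate Gaussian score. Whether the notebook scores spin systems exactly this way is an assumption, and the function names are hypothetical. One thing worth checking against the printout: 0.462963 and 0.310559 are 1/2.16 and 1/3.22, so the second printed matrix is the inverse of diag(2.16, 3.22). Inverting the Method A matrix printed just above it instead gives non-zero off-diagonal entries.

```python
def inverse_2x2(m: list[list[float]]) -> list[list[float]]:
    """Closed-form inverse of [[a, b], [c, d]]; raises if the matrix is singular."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def mahalanobis_sq(x: list[float], mean: list[float], cov: list[list[float]]) -> float:
    """(x - mean)^T cov^{-1} (x - mean) for a 2-D point such as (CA, CB) shifts."""
    inv = inverse_2x2(cov)
    dx = [x[0] - mean[0], x[1] - mean[1]]
    return sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))


method_a = [[2.43, 1.987452], [1.987452, 2.79]]
print(inverse_2x2(method_a))                    # off-diagonal entries are non-zero
print(inverse_2x2([[2.16, 0.0], [0.0, 3.22]]))  # about [[0.462963, 0], [0, 0.310559]]
```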
## [1] 0.7252320 0.7313015 0.7293729 0.7274021 0.7193870 0.7119036 0.7050576
## [8] 0.7046656 0.6914656 0.6853443 0.6862799 0.6937346 0.6894009 0.6799994
## [15] 0.6775874 0.6761879 0.6622181 0.6346400 0.6452657 0.6237222 0.6207915
## [22] 0.6104276 0.6080978 0.6069231 0.6121106 0.6119336 0.6129610 0.6147677
## [29] 0.6191948 0.6241790 0.6227037 0.6201693 0.6249759 0.6301377 0.6206303
## [36] 0.6267864 0.6262982 0.6323001 0.6387994 0.6386078 0.6383700 0.6375337
## [43] 0.6438768 0.6428771 0.6489757 0.6552637 0.6613917 0.6675173 0.6731425
## [50] 0.6773740
The previous version returned a correction of -0.38. This version returns a correction of -2.220446e-16, which is zero up to floating-point rounding (its magnitude matches double-precision machine epsilon, 2^-52), and the correct reference is 0.
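Because a value like -2.220446e-16 is rounding noise rather than a real offset, one option is to snap near-zero corrections to exactly 0 before reporting them. This is a minimal sketch. The tolerance of 1e-9 ppm and the name `snap_to_zero` are arbitrary assumptions, not part of the original code.

```python
import math
import sys

# Double-precision machine epsilon: the gap between 1.0 and the next float (2**-52).
EPS = sys.float_info.epsilon

correction = -2.220446e-16  # value printed for Method A below


def snap_to_zero(x: float, abs_tol: float = 1e-9) -> float:
    """Return 0.0 when |x| <= abs_tol; the tolerance is an assumption."""
    return 0.0 if math.isclose(x, 0.0, abs_tol=abs_tol) else x


print(math.isclose(abs(correction), EPS, rel_tol=1e-6))  # True
print(snap_to_zero(correction))                          # 0.0
print(snap_to_zero(-0.38))                               # -0.38
```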
## [1] "bmr4851.txt"
## [1] "Method A"
## [1] 0.1020408
## [1] -2.220446e-16
## [1] "Method B"
## [1] 0.1020408
## [1] 0.08163265
## [1] "Method C"
## [1] 0.1020408
## [1] -0.1632653
## [1] "Method D"
## [1] -0.1020408
## [1] -0.04081633
## [1] "Method E"
## [1] 0.1020408
## [1] 0.1632653
This can later be changed to return the top three assignment choices for each spin system.
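Returning the top three choices instead of only the best one amounts to sorting each spin system's candidate residue types by score and keeping the first three. This sketch assumes the scores are stored as `scores[spin_system][residue_type]`, with higher values being better (for example, log-likelihoods). The data layout, the name `top_k_assignments`, and the example scores are hypothetical.

```python
def top_k_assignments(scores: dict[str, dict[str, float]],
                      k: int = 3) -> dict[str, list[tuple[str, float]]]:
    """For each spin system, the k residue types with the highest score.

    sorted() is stable, so residue types with equal scores keep their input order.
    """
    return {
        spin: sorted(by_residue.items(), key=lambda item: item[1], reverse=True)[:k]
        for spin, by_residue in scores.items()
    }


scores = {
    "ss1": {"A": -1.2, "S": -0.4, "T": -0.9, "L": -3.0},
    "ss2": {"D": -0.1, "N": -0.5},
}
top = top_k_assignments(scores)
print(top["ss1"])  # [('S', -0.4), ('T', -0.9), ('A', -1.2)]
print(top["ss2"])  # [('D', -0.1), ('N', -0.5)]
```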
Method A:
## [1] "STDDSPYKQAFSLFDRRIPKTSDLLRAQNPTLAEITEIESTLPAEVDMEQFLQVLNRPFDMDPEEFVFQVFDKDAMELRYVLTSEKLSNEEMDELLVPVKMVNYHDFVQMILA"
## [1] "STDDSPYKQAFSYFNWMYPKTSBFFRAHNPTBAHITEIWSTDPARVNMWQYBQPLNHPFDQNPWWBPYQPFNKDAMQLRYVLTSQWLSNWRMBHLFMPVMQVNYQBFVQMILA"
Method B:
## [1] "STDDSPYKQAFSLFDRRIPKTSDLLRAQNPTLAEITEIESTLPAEVDMEQFLQVLNRPFDMDPEEFVFQVFDKDAMELRYVLTSEKLSNEEMDELLVPVKMVNYHDFVQMILA"
## [1] "STDDSPYKQAISYFNWMYPKTSBBFRAHNPTBAHITEIWSTDVARVNMWQYBQPLNQPFDHNPWWBPYQPFNMDAMQLRYVLTSQELSYWRMBHLFMPVKQVNBWBIVQMYLA"
Method C:
## [1] "STDDSPYKQAFSLFDRRIPKTSDLLRAQNPTLAEITEIESTLPAEVDMEQFLQVLNRPFDMDPEEFVFQVFDKDAMELRYVLTSEKLSNEEMDELLVPVKMVNYHDFVQMILA"
## [1] "VVVCVVCVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVCCVVVVVVCVVVVVVVNCVVVCVVVVCVVVVVVVVCVVVVVVVCVVVVVVVVVVVVCVVVVVCVVVVV"
Method D:
## [1] "STDDSPYKQAFSLFDRRIPKTSDLLRAQNPTLAEITEIESTLPAEVDMEQFLQVLNRPFDMDPEEFVFQVFDKDAMELRYVLTSEKLSNEEMDELLVPVKMVNYHDFVQMILA"
## [1] "CCCCCCCCCCCCCCNCMCCCCCCCCCCCNCCCCCCCCCCCCCCCRCCCCQCCCCCCCCCCRCCCCCCCQCCCCCCMQCCCCCCCCCCCCCCCCCCCCCCCQCNCCCCCCCCCC"
Method E:
## [1] "STDDSPYKQAFSLFDRRIPKTSDLLRAQNPTLAEITEIESTLPAEVDMEQFLQVLNRPFDMDPEEFVFQVFDKDAMELRYVLTSEKLSNEEMDELLVPVKMVNYHDFVQMILA"
## [1] "STDDSPYKQAISYFNWMYPKTSBBFRAHNPTLAHITEIWSTDVARVDMWQYBQPLNQPFDHNPWWBPYQPFNMDAMQLRYVLTSQELSNWRMBHLFKPVKQVNYQBFVQMYLA"
This test file doesn’t contain cysteine, but I need to revise this code so that B is included.
## [1] "Total number of residues in original sequence is: 113"
## $Method_A
## $Method_A$`Match Sum`
## [1] 71
##
## $Method_A$`Match Percentage`
## [1] 62.83186
##
##
## $Method_B
## $Method_B$`Match Sum`
## [1] 65
##
## $Method_B$`Match Percentage`
## [1] 57.52212
##
##
## $Method_C
## $Method_C$`Match Sum`
## [1] 3
##
## $Method_C$`Match Percentage`
## [1] 2.654867
##
##
## $Method_D
## $Method_D$`Match Sum`
## [1] 5
##
## $Method_D$`Match Percentage`
## [1] 4.424779
##
##
## $Method_E
## $Method_E$`Match Sum`
## [1] 70
##
## $Method_E$`Match Percentage`
## [1] 61.9469
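Each match percentage above is the match sum divided by the 113 residues of the reference sequence. For example, 71 / 113 × 100 ≈ 62.83 for Method A and 3 / 113 × 100 ≈ 2.65 for Method C. Here is a minimal sketch of that position-by-position comparison. The optional `equivalent` mapping is a hypothetical way to include the B letter mentioned above by counting it as a match for another letter (here C). What B encodes is not stated in this section, so that mapping is an assumption.

```python
from typing import Optional


def match_stats(reference: str, predicted: str,
                equivalent: Optional[dict[str, str]] = None) -> dict[str, float]:
    """Position-by-position identity between two equal-length one-letter sequences.

    `equivalent` maps predicted letters onto reference letters before comparing,
    e.g. {"B": "C"} (hypothetical convention for the B letter).
    """
    if len(reference) != len(predicted):
        raise ValueError("sequences must have the same length")
    mapping = equivalent or {}
    matches = sum(ref == mapping.get(pred, pred) for ref, pred in zip(reference, predicted))
    return {"Match Sum": matches, "Match Percentage": 100 * matches / len(reference)}


print(match_stats("ACDC", "ABDE"))              # {'Match Sum': 2, 'Match Percentage': 50.0}
print(match_stats("ACDC", "ABDE", {"B": "C"}))  # {'Match Sum': 3, 'Match Percentage': 75.0}
```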