Covariance Method

1. Importing and declaring all the necessary variables

2. Generating Testing Data

Here, I removed glycine (Gly) from the testing data because it has no beta carbon and therefore no Cβ chemical shift.

##    AA    CA    CB
## 1 SER 59.84 63.59
## 2 THR 62.47 69.49
## 3 ASP 54.54 41.14
## 4 ASP 53.98 41.14
## 5 SER 58.76 63.31
## 6 PRO 64.81 31.54

3. Calculating the residue frequencies from the testing data

##       AA  freq                 
##  [1,] "A" "0.0530973451327434" 
##  [2,] "R" "0.0442477876106195" 
##  [3,] "N" "0.0353982300884956" 
##  [4,] "D" "0.0973451327433628" 
##  [5,] "C" "0"                  
##  [6,] "Q" "0.0530973451327434" 
##  [7,] "E" "0.106194690265487"  
##  [8,] "H" "0.00884955752212389"
##  [9,] "I" "0.0353982300884956" 
## [10,] "L" "0.115044247787611"  
## [11,] "K" "0.0442477876106195" 
## [12,] "M" "0.0530973451327434" 
## [13,] "F" "0.0707964601769911" 
## [14,] "P" "0.0619469026548673" 
## [15,] "S" "0.0619469026548673" 
## [16,] "T" "0.0530973451327434" 
## [17,] "Y" "0.0265486725663717" 
## [18,] "W" "0"                  
## [19,] "V" "0.079646017699115"
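The frequencies above are consistent with simple counts divided by the number of residues (for example, 6/113 ≈ 0.0531 for A and 11/113 ≈ 0.0973 for D). A minimal Python sketch of that calculation, assuming the sequence is a string of one-letter codes; `residue_frequencies` and `AMINO_ACIDS` are hypothetical names, not the original R code:

```python
from __future__ import annotations

from collections import Counter

# One-letter codes in the order of the table above; glycine (G) is excluded.
AMINO_ACIDS = "ARNDCQEHILKMFPSTYWV"


def residue_frequencies(sequence: str) -> dict[str, float]:
    """Fraction of each residue type in `sequence`.

    Letters outside AMINO_ACIDS (such as G) are dropped before counting,
    so they affect neither the counts nor the total.
    """
    kept = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    counts = Counter(kept)
    total = len(kept)
    return {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}


freqs = residue_frequencies("STDDSPYKQA")
print(round(freqs["D"], 4))  # 2 of the 10 residues are D -> 0.2
```

Residue types that never occur (C and W in this test file) simply get a frequency of 0, as in the table.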

4. Assembling the five different covariance matrices

Method A: build the covariance matrix separately for each secondary structure.

\[Cov= \begin{pmatrix} Sd_{Ca} & cov \\ cov & Sd_{Cb} \end{pmatrix} \]

For each amino acid there are three covariance matrices, one for each secondary structure. For example, the covariance matrix of cysteine (oxidized) for the \(\alpha\)-helix secondary structure:

##          [,1]     [,2]
## [1,] 2.430000 1.987452
## [2,] 1.987452 2.790000

Method B: build the covariance matrix with cov = 0, but keep separate standard deviations for each secondary structure.

\[Cov= \begin{pmatrix} Sd_{Ca} & 0 \\ 0 & Sd_{Cb} \end{pmatrix} \]

There are three covariance matrices for each residue. For example, the arginine covariance matrix for the \(\alpha\)-helix secondary structure:

##      [,1] [,2]
## [1,] 2.43 0.00
## [2,] 0.00 2.79
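Methods A and B differ only in the off-diagonal entry. The Python sketch below builds both from per-structure statistics, assuming a hypothetical input layout `stats[(residue, structure)] = (sd_ca, sd_cb, cov)`; the key labels (`"Cys(ox)"`, `"helix"`) are illustrative, and the numbers are the α-helix values printed above. The diagonal holds Sd exactly as written in the formulas.

```python
from __future__ import annotations

# Hypothetical input layout: stats[(residue, structure)] = (sd_ca, sd_cb, cov),
# holding the structure-specific standard deviations and CA/CB covariance.


def method_a_matrices(stats: dict) -> dict:
    """Method A: structure-specific Sd_Ca and Sd_Cb on the diagonal and the
    structure-specific covariance off the diagonal."""
    return {key: [[sd_ca, cov], [cov, sd_cb]]
            for key, (sd_ca, sd_cb, cov) in stats.items()}


def method_b_matrices(stats: dict) -> dict:
    """Method B: same diagonal as Method A, but the covariance is set to 0."""
    return {key: [[sd_ca, 0.0], [0.0, sd_cb]]
            for key, (sd_ca, sd_cb, _cov) in stats.items()}


# Values taken from the alpha-helix example matrices printed above.
stats = {("Cys(ox)", "helix"): (2.43, 2.79, 1.987452)}
print(method_a_matrices(stats)[("Cys(ox)", "helix")])  # [[2.43, 1.987452], [1.987452, 2.79]]
print(method_b_matrices(stats)[("Cys(ox)", "helix")])  # [[2.43, 0.0], [0.0, 2.79]]
```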

Method C: use the covariance averaged over the three secondary structures to fill the matrix.

\[Cov= \begin{pmatrix} Sd_{Ca} & cov_{avg} \\ cov_{avg} & Sd_{Cb} \end{pmatrix} \]

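  • First, calculate the average covariance of each amino acid over its three secondary structures.
  • Then use that average covariance to fill the off-diagonal entries of the matrix.

There are still three matrices for each amino acid, since the standard deviations remain specific to each secondary structure.

For example, the arginine covariance matrix for the \(\alpha\)-helix secondary structure: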

##           [,1]      [,2]
## [1,]  2.430000 -0.692594
## [2,] -0.692594  2.790000
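The two steps of Method C can be sketched in Python as follows, using the same hypothetical `stats[(residue, structure)] = (sd_ca, sd_cb, cov)` layout as before. The average is taken as a plain unweighted mean over the secondary structures, which is an assumption of this sketch.

```python
from __future__ import annotations

from statistics import mean

# Hypothetical input layout: stats[(residue, structure)] = (sd_ca, sd_cb, cov)


def method_c_matrices(stats: dict) -> dict:
    """Method C: structure-specific standard deviations on the diagonal,
    per-residue covariance averaged over the secondary structures off it."""
    # Step 1: collect each residue's covariances across its secondary structures.
    covs_by_residue: dict[str, list[float]] = {}
    for (residue, _structure), (_sd_ca, _sd_cb, cov) in stats.items():
        covs_by_residue.setdefault(residue, []).append(cov)
    cov_avg = {res: mean(vals) for res, vals in covs_by_residue.items()}

    # Step 2: fill the off-diagonal entries with that average.
    return {
        (residue, structure): [[sd_ca, cov_avg[residue]],
                               [cov_avg[residue], sd_cb]]
        for (residue, structure), (sd_ca, sd_cb, _cov) in stats.items()
    }
```

Each residue still yields one matrix per secondary structure; only the off-diagonal value is shared among them.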

Method D: using the covariance without secondary structure.

The standard deviations are the average standard deviations and the covariance is the average covariance.

\[Cov= \begin{pmatrix} Sd_{Ca, avg} & cov_{avg} \\ cov_{avg} & Sd_{Cb, avg} \end{pmatrix} \]

For example, the arginine covariance matrix for any secondary structure:

##           [,1]      [,2]
## [1,]  2.160000 -0.692594
## [2,] -0.692594  3.220000

Method E: using the average standard deviations with cov = 0

Similar to Method D, but with cov = 0. For example, for arginine:

##      [,1] [,2]
## [1,] 2.16 0.00
## [2,] 0.00 3.22
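Methods D and E both collapse the three secondary structures into a single matrix per residue. A Python sketch covering both, with the same hypothetical `stats` layout; the averages are assumed to be plain unweighted means, which the text does not state explicitly:

```python
from __future__ import annotations

from statistics import mean

# Hypothetical input layout: stats[(residue, structure)] = (sd_ca, sd_cb, cov)


def averaged_matrices(stats: dict, keep_covariance: bool = True) -> dict:
    """One matrix per residue, ignoring secondary structure.

    keep_covariance=True  -> Method D (averaged SDs, averaged covariance)
    keep_covariance=False -> Method E (averaged SDs, covariance set to 0)
    """
    rows_by_residue: dict[str, list[tuple[float, float, float]]] = {}
    for (residue, _structure), row in stats.items():
        rows_by_residue.setdefault(residue, []).append(row)

    matrices = {}
    for residue, rows in rows_by_residue.items():
        sd_ca = mean(r[0] for r in rows)
        sd_cb = mean(r[1] for r in rows)
        cov = mean(r[2] for r in rows) if keep_covariance else 0.0
        matrices[residue] = [[sd_ca, cov], [cov, sd_cb]]
    return matrices
```

Sharing one function keeps D and E identical except for the off-diagonal entry, mirroring how A and B relate.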

Calculating all the covariance and inverse matrices for methods A to E

For example, for Method A, the alanine covariance matrix and its inverse are:

##          [,1]     [,2]
## [1,] 2.430000 1.987452
## [2,] 1.987452 2.790000
##          [,1]     [,2]
## [1,] 0.462963 0.000000
## [2,] 0.000000 0.310559
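A 2×2 matrix has a closed-form inverse, sketched below in Python (`inverse_2x2` is a hypothetical helper, not the original code). As a consistency check: the printed inverse has diagonal entries 0.462963 = 1/2.16 and 0.310559 = 1/3.22, i.e. it is the inverse of diag(2.16, 3.22) from the Method D/E arginine examples, not of the Method A matrix printed just above it, whose inverse has nonzero off-diagonal entries. It may be worth confirming which matrix is actually being inverted.

```python
from __future__ import annotations


def inverse_2x2(m: list[list[float]]) -> list[list[float]]:
    """Closed-form inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det],
            [-c / det, a / det]]


print(inverse_2x2([[2.16, 0.0], [0.0, 3.22]]))
# diagonal entries are approximately 0.462963 and 0.310559
```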

5. Calculate the Probability

Calculating the \(\chi^{*}\)

Using \(\chi^{*}\) to get the corresponding probability
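The section does not spell out how \(\chi^{*}\) is defined. One common choice, assumed here, is the squared Mahalanobis distance \(\chi^2 = d^{T} Cov^{-1} d\), where \(d\) is the difference between the observed and expected (Cα, Cβ) shifts. Because \(d\) has two components, the tail probability follows a chi-square distribution with 2 degrees of freedom, which has the closed form \(P(X \ge \chi^2) = e^{-\chi^2/2}\). A Python sketch with hypothetical names:

```python
from __future__ import annotations

import math


def chi_square(observed: tuple[float, float],
               expected: tuple[float, float],
               inv_cov: list[list[float]]) -> float:
    """Squared Mahalanobis distance d^T * inv_cov * d, with
    d = observed - expected for the (CA, CB) shift pair."""
    d = (observed[0] - expected[0], observed[1] - expected[1])
    return sum(d[i] * inv_cov[i][j] * d[j] for i in range(2) for j in range(2))


def tail_probability_2dof(chi2: float) -> float:
    """P(X >= chi2) for a chi-square variable with 2 degrees of freedom,
    which has the closed form exp(-chi2 / 2)."""
    return math.exp(-chi2 / 2.0)
```

To rank candidate residue types for a spin system, each probability could additionally be weighted by the frequencies from step 3; whether the original code does this is not stated here.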

6. Finding the sum of the absolute differences between predicted and observed values

Reference correction calculation function: returns the sum of absolute differences.

##  [1] 0.7252320 0.7313015 0.7293729 0.7274021 0.7193870 0.7119036 0.7050576
##  [8] 0.7046656 0.6914656 0.6853443 0.6862799 0.6937346 0.6894009 0.6799994
## [15] 0.6775874 0.6761879 0.6622181 0.6346400 0.6452657 0.6237222 0.6207915
## [22] 0.6104276 0.6080978 0.6069231 0.6121106 0.6119336 0.6129610 0.6147677
## [29] 0.6191948 0.6241790 0.6227037 0.6201693 0.6249759 0.6301377 0.6206303
## [36] 0.6267864 0.6262982 0.6323001 0.6387994 0.6386078 0.6383700 0.6375337
## [43] 0.6438768 0.6428771 0.6489757 0.6552637 0.6613917 0.6675173 0.6731425
## [50] 0.6773740
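The 50 values above look like this objective evaluated over a grid of candidate corrections, with the smallest value (0.6069231) at position 24. A Python sketch of such a grid search, assuming the correction is added to the observed shifts; the candidate grid itself is not given in the text, so `offsets` is a hypothetical input:

```python
from __future__ import annotations


def sum_abs_difference(predicted: list[float], observed: list[float],
                       offset: float) -> float:
    """Sum of |predicted - (observed + offset)| over all paired shifts."""
    return sum(abs(p - (o + offset)) for p, o in zip(predicted, observed))


def best_offset(predicted: list[float], observed: list[float],
                offsets: list[float]) -> tuple[float, list[float]]:
    """Evaluate every candidate offset and return the one with the smallest
    sum of absolute differences, together with all the scores."""
    scores = [sum_abs_difference(predicted, observed, off) for off in offsets]
    best_index = min(range(len(offsets)), key=scores.__getitem__)
    return offsets[best_index], scores
```

Using absolute rather than squared differences keeps a few badly predicted residues from dominating the choice of correction.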

7. Overall Testing Function

The previous version returned a correction of -0.38. This version returns a correction of -2.220446e-16, which is zero to within floating-point round-off, and the correct reference correction is 0.
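The magnitude 2.220446e-16 is the double-precision machine epsilon (2^-52), so a result of that size is best treated as exactly zero. A small Python sketch that snaps such values to 0; `clean_correction` and its default tolerance of 1e-9 are hypothetical choices:

```python
import math
import sys

# Double-precision machine epsilon, 2**-52, which R prints as 2.220446e-16.
print(sys.float_info.epsilon)


def clean_correction(value: float, tol: float = 1e-9) -> float:
    """Snap a reference correction to exactly 0 when it lies within `tol`
    of zero, so round-off such as -2.220446e-16 is reported as 0."""
    return 0.0 if math.isclose(value, 0.0, abs_tol=tol) else value
```

An absolute tolerance is needed here because `math.isclose` with only a relative tolerance never treats a nonzero value as close to 0.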

## [1] "bmr4851.txt"
## [1] "Method A"
## [1] 0.1020408
## [1] -2.220446e-16
## [1] "Method B"
## [1] 0.1020408
## [1] 0.08163265
## [1] "Method C"
## [1] 0.1020408
## [1] -0.1632653
## [1] "Method D"
## [1] -0.1020408
## [1] -0.04081633
## [1] "Method E"
## [1] 0.1020408
## [1] 0.1632653

8. Use the Correction to find the Assignment

This can later be changed to return the top 3 assignment choices for each spin system.
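Picking the most probable residue type per spin system, and the top-3 extension mentioned above, can be sketched in Python as follows. The input layout (one dict of residue type to probability per spin system) is an assumption, and each spin system is assigned independently, with no constraint that the predicted composition matches the sequence:

```python
from __future__ import annotations


def top_choices(probabilities: list[dict[str, float]],
                k: int = 3) -> list[list[str]]:
    """For each spin system, the k residue types with the highest probability."""
    return [sorted(probs, key=probs.get, reverse=True)[:k]
            for probs in probabilities]


def best_assignment(probabilities: list[dict[str, float]]) -> str:
    """One-letter sequence built from the single most probable type per spin system."""
    return "".join(choices[0] for choices in top_choices(probabilities, k=1))
```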

Method A:

## [1] "STDDSPYKQAFSLFDRRIPKTSDLLRAQNPTLAEITEIESTLPAEVDMEQFLQVLNRPFDMDPEEFVFQVFDKDAMELRYVLTSEKLSNEEMDELLVPVKMVNYHDFVQMILA"
## [1] "STDDSPYKQAFSYFNWMYPKTSBFFRAHNPTBAHITEIWSTDPARVNMWQYBQPLNHPFDQNPWWBPYQPFNKDAMQLRYVLTSQWLSNWRMBHLFMPVMQVNYQBFVQMILA"

Method B:

## [1] "STDDSPYKQAFSLFDRRIPKTSDLLRAQNPTLAEITEIESTLPAEVDMEQFLQVLNRPFDMDPEEFVFQVFDKDAMELRYVLTSEKLSNEEMDELLVPVKMVNYHDFVQMILA"
## [1] "STDDSPYKQAISYFNWMYPKTSBBFRAHNPTBAHITEIWSTDVARVNMWQYBQPLNQPFDHNPWWBPYQPFNMDAMQLRYVLTSQELSYWRMBHLFMPVKQVNBWBIVQMYLA"

Method C:

## [1] "STDDSPYKQAFSLFDRRIPKTSDLLRAQNPTLAEITEIESTLPAEVDMEQFLQVLNRPFDMDPEEFVFQVFDKDAMELRYVLTSEKLSNEEMDELLVPVKMVNYHDFVQMILA"
## [1] "VVVCVVCVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVCCVVVVVVCVVVVVVVNCVVVCVVVVCVVVVVVVVCVVVVVVVCVVVVVVVVVVVVCVVVVVCVVVVV"

Method D:

## [1] "STDDSPYKQAFSLFDRRIPKTSDLLRAQNPTLAEITEIESTLPAEVDMEQFLQVLNRPFDMDPEEFVFQVFDKDAMELRYVLTSEKLSNEEMDELLVPVKMVNYHDFVQMILA"
## [1] "CCCCCCCCCCCCCCNCMCCCCCCCCCCCNCCCCCCCCCCCCCCCRCCCCQCCCCCCCCCCRCCCCCCCQCCCCCCMQCCCCCCCCCCCCCCCCCCCCCCCQCNCCCCCCCCCC"

Method E:

## [1] "STDDSPYKQAFSLFDRRIPKTSDLLRAQNPTLAEITEIESTLPAEVDMEQFLQVLNRPFDMDPEEFVFQVFDKDAMELRYVLTSEKLSNEEMDELLVPVKMVNYHDFVQMILA"
## [1] "STDDSPYKQAISYFNWMYPKTSBBFRAHNPTLAHITEIWSTDVARVDMWQYBQPLNQPFDHNPWWBPYQPFNMDAMQLRYVLTSQELSNWRMBHLFKPVKQVNYQBFVQMYLA"

9. Matching the results against the original sequence

Number of matches and match percentages

This test file doesn't contain cysteine, but I need to revise this code to include B.

## [1] "Total number of residues in original sequence is: 113"
## $Method_A
## $Method_A$`Match Sum`
## [1] 71
## 
## $Method_A$`Match Percentage`
## [1] 62.83186
## 
## 
## $Method_B
## $Method_B$`Match Sum`
## [1] 65
## 
## $Method_B$`Match Percentage`
## [1] 57.52212
## 
## 
## $Method_C
## $Method_C$`Match Sum`
## [1] 3
## 
## $Method_C$`Match Percentage`
## [1] 2.654867
## 
## 
## $Method_D
## $Method_D$`Match Sum`
## [1] 5
## 
## $Method_D$`Match Percentage`
## [1] 4.424779
## 
## 
## $Method_E
## $Method_E$`Match Sum`
## [1] 70
## 
## $Method_E$`Match Percentage`
## [1] 61.9469
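The percentages above are the match counts divided by the 113 residues (for example, 71/113 ≈ 62.83% for Method A). A Python sketch of the position-by-position comparison, with `match_stats` as a hypothetical helper that mirrors the names in the output:

```python
from __future__ import annotations


def match_stats(original: str, predicted: str) -> dict[str, float]:
    """Position-by-position comparison of two equal-length one-letter sequences."""
    if len(original) != len(predicted):
        raise ValueError("sequences must have the same length")
    matches = sum(o == p for o, p in zip(original, predicted))
    return {"Match Sum": matches,
            "Match Percentage": 100.0 * matches / len(original)}
```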

LACS Method

Recreate the LACS figures

Scores

LACS cross-validation