LACS Results

##        CA                  CB           
##  Min.   :-1.390000   Min.   :-1.390000  
##  1st Qu.:-0.110000   1st Qu.:-0.110000  
##  Median : 0.010000   Median : 0.010000  
##  Mean   : 0.002056   Mean   : 0.002056  
##  3rd Qu.: 0.110000   3rd Qu.: 0.110000  
##  Max.   : 4.840000   Max.   : 4.840000

BaMORC Results

##        CA                 CB          
##  Min.   :-4.15000   Min.   :-1.77344  
##  1st Qu.:-0.23573   1st Qu.:-0.12265  
##  Median :-0.05567   Median : 0.03180  
##  Mean   :-0.04364   Mean   : 0.03304  
##  3rd Qu.: 0.13093   3rd Qu.: 0.18657  
##  Max.   : 5.00000   Max.   : 4.35558

BaMORC Results (without the Inverse Determinant of the Covariance Matrix)

##        CA                 CB          
##  Min.   :-4.14999   Min.   :-2.24017  
##  1st Qu.:-0.35255   1st Qu.:-0.11886  
##  Median :-0.14256   Median : 0.04536  
##  Mean   :-0.13435   Mean   : 0.05825  
##  3rd Qu.: 0.07161   3rd Qu.: 0.20881  
##  Max.   : 5.00000   Max.   : 4.51184
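The three blocks above are R `summary()` output for the per-dataset CA and CB reference corrections. As a minimal Python sketch of how such a summary is computed (the function name `six_number_summary` is hypothetical, and missing values are assumed to have been removed beforehand): R takes its quartiles from `quantile()` with the default type 7 interpolation, which `statistics.quantiles(..., method="inclusive")` reproduces.

```python
import statistics

def six_number_summary(values):
    """Mimic R's summary() on a numeric vector (hypothetical helper; NAs already removed)."""
    xs = sorted(values)
    # method="inclusive" uses the same linear interpolation as R's quantile(type = 7)
    q1, med, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return {
        "Min.": xs[0],
        "1st Qu.": q1,
        "Median": med,
        "Mean": statistics.fmean(xs),
        "3rd Qu.": q3,
        "Max.": xs[-1],
    }
```

For example, `six_number_summary([1, 2, 3, 4])` gives quartiles 1.75, 2.5 and 3.25, matching `summary(c(1, 2, 3, 4))` in R.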

Outliers

BaMORC did not perform well on the following datasets, which have too many missing values and a monotonic secondary structure assignment (only coil). Data quality is the issue here. These outliers are similar to those of LACS.

Positive side:

##      ID  AA  SS  CA      CB      N   
## [1,] "1" "M" "C" "NA"    "NA"    "NA"
## [2,] "2" "S" "C" "NA"    "NA"    "NA"
## [3,] "3" "V" "C" "NA"    "NA"    "NA"
## [4,] "4" "N" "C" "NA"    "NA"    "NA"
## [5,] "5" "S" "C" "53.90" "66.60" "NA"
## [6,] "6" "N" "C" "48.60" "34.10" "NA"

For the negative side:

  • Alpha Carbon: most residues have an undetermined secondary structure, which we treat as missing values; the only other secondary structure present in appreciable amounts is coil.
##      ID  AA  SS  CA      CB      N       
## [1,] "1" "M" "U" "54.71" "32.73" "171.21"
## [2,] "2" "K" "U" "55.98" "32.96" "175.31"
## [3,] "3" "L" "U" "55.20" "42.58" "176.72"
## [4,] "4" "S" "U" "57.82" "63.93" "174.02"
## [5,] "5" "E" "U" "58.39" "29.26" "176.75"
## [6,] "6" "Y" "U" "60.14" "38.16" "176.47"
  • Beta Carbon: as with the alpha carbon, many residues have an undetermined secondary structure, which we treat as missing values.
##      ID  AA  SS  CA      CB      N       
## [1,] "1" "M" "U" "56.62" "33.09" "177.95"
## [2,] "2" "S" "U" "59.26" "64.58" "176.09"
## [3,] "3" "A" "U" "53.70" "20.34" "179.70"
## [4,] "4" "T" "U" "62.65" "70.64" "176.07"
## [5,] "5" "A" "U" "53.45" "20.33" "179.11"
## [6,] "6" "A" "U" "53.49" "20.33" "179.56"

Data Problems vs. Performance (BaMORC)

For Alpha Carbon

Chemical Shift Missing vs. Alpha Carbon Reference Correction

Undetermined Secondary Structure vs. Alpha Carbon Reference Correction

Glycine Frequency vs. Alpha Carbon Reference Correction

Data Length vs. Alpha Carbon Reference Correction

For Beta Carbon

Chemical Shift Missing vs. Beta Carbon Reference Correction

Undetermined Secondary Structure vs. Beta Carbon Reference Correction

Glycine Frequency vs. Beta Carbon Reference Correction

Data Length vs. Beta Carbon Reference Correction

Coil Percentage vs. Reference Correction

Alpha Carbon

Beta Carbon
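The plots above relate data-quality measures (missing shifts, undetermined secondary structure, glycine frequency, data length, coil percentage) to the correction. A minimal sketch of computing such per-dataset features follows. The keys `CA_NA_Freq`, `CB_NA_Freq`, `U_Freq`, `C_freq` and `Data_Len` mirror the `ill_data_stat` columns in the regression below, but their exact definitions here (fractions of residues) are assumptions; `Gly_Freq`, `Monotonic_SS` and the function name are hypothetical. Rows follow the `ID, AA, SS, CA, CB, N` matrices printed earlier, with the string `"NA"` marking a missing shift.

```python
def data_problem_stats(rows):
    """Hypothetical per-dataset data-quality features.

    Each row is (ID, AA, SS, CA, CB, N) as strings; "NA" marks a missing
    chemical shift and SS "U" marks an undetermined secondary structure.
    """
    n = len(rows)

    def frac(pred):
        # fraction of residues satisfying pred (assumed definition of *_Freq)
        return sum(1 for r in rows if pred(r)) / n

    determined = {r[2] for r in rows if r[2] != "U"}
    return {
        "CA_NA_Freq": frac(lambda r: r[3] == "NA"),
        "CB_NA_Freq": frac(lambda r: r[4] == "NA"),
        "U_Freq": frac(lambda r: r[2] == "U"),
        "C_freq": frac(lambda r: r[2] == "C"),
        "Gly_Freq": frac(lambda r: r[1] == "G"),
        "Data_Len": n,
        # True when at most one secondary structure type is assigned,
        # e.g. coil-only or entirely undetermined data
        "Monotonic_SS": len(determined) <= 1,
    }

# First six rows of the "positive side" outlier shown above
rows = [
    ("1", "M", "C", "NA", "NA", "NA"),
    ("2", "S", "C", "NA", "NA", "NA"),
    ("3", "V", "C", "NA", "NA", "NA"),
    ("4", "N", "C", "NA", "NA", "NA"),
    ("5", "S", "C", "53.90", "66.60", "NA"),
    ("6", "N", "C", "48.60", "34.10", "NA"),
]
stats = data_problem_stats(rows)
```

On these six rows (only the head of that dataset), two thirds of the CA shifts are missing and every residue is coil, which is exactly the pattern the outlier discussion describes.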

Error Model:

For Alpha Carbon

## [1] 1557
## [1] 1557
## 
## Call:
## lm(formula = DEoptim_results$CA ~ ill_data_stat$CA_NA_Freq + 
##     ill_data_stat$U_Freq + ill_data_stat$C_freq + ill_data_stat$Data_Len - 
##     1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5514 -0.1983 -0.0193  0.1666  4.9015 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## ill_data_stat$CA_NA_Freq  0.2869026  0.0808165   3.550 0.000397 ***
## ill_data_stat$U_Freq     -0.4998522  0.1165037  -4.290 1.90e-05 ***
## ill_data_stat$C_freq      0.0848052  0.0419326   2.022 0.043315 *  
## ill_data_stat$Data_Len   -0.0006439  0.0001390  -4.632 3.94e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3493 on 1459 degrees of freedom
##   (94 observations deleted due to missingness)
## Multiple R-squared:  0.04109,    Adjusted R-squared:  0.03846 
## F-statistic: 15.63 on 4 and 1459 DF,  p-value: 1.576e-12

For Beta Carbon

## 
## Call:
## lm(formula = DEoptim_results$CB ~ ill_data_stat$CB_NA_Freq + 
##     ill_data_stat$U_Freq + ill_data_stat$C_freq + ill_data_stat$Data_Len - 
##     1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9426 -0.1546 -0.0076  0.1598  4.2058 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## ill_data_stat$CB_NA_Freq  2.390e-01  6.160e-02   3.879 0.000109 ***
## ill_data_stat$U_Freq      5.324e-01  1.060e-01   5.022 5.74e-07 ***
## ill_data_stat$C_freq     -7.080e-02  4.190e-02  -1.690 0.091323 .  
## ill_data_stat$Data_Len    1.742e-05  1.327e-04   0.131 0.895566    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3258 on 1459 degrees of freedom
##   (94 observations deleted due to missingness)
## Multiple R-squared:  0.05821,    Adjusted R-squared:  0.05562 
## F-statistic: 22.54 on 4 and 1459 DF,  p-value: < 2.2e-16
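Both models are fitted without an intercept (the trailing `- 1` in the formula). For such models, R's `summary.lm()` computes R² from uncentered sums of squares, R² = Σŷ² / (Σŷ² + RSS), rather than from deviations around the mean, so the reported 0.041 (CA) and 0.058 (CB) are not on the same scale as an intercept model's R². Either way, the four features explain only a small part of the variation in the correction. The single-predictor Python sketch below (hypothetical function `fit_through_origin`) contrasts the two definitions.

```python
def fit_through_origin(x, y):
    """OLS fit of y = b * x with no intercept, like R's lm(y ~ x - 1)."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    b = sxy / sxx
    fitted = [b * xi for xi in x]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    # Uncentered model sum of squares: what summary.lm uses when there is no intercept
    mss = sum(fi * fi for fi in fitted)
    r2_uncentered = mss / (mss + rss)
    # Usual centered R^2; for a through-origin fit it can even be negative
    ybar = sum(y) / len(y)
    tss_centered = sum((yi - ybar) ** 2 for yi in y)
    r2_centered = 1 - rss / tss_centered
    return b, r2_uncentered, r2_centered
```

For `x = [1, 2, 3]`, `y = [2, 3, 5]` the slope is 23/14, the uncentered R² is 529/532 (about 0.994) and the centered R² is 187/196 (about 0.954): the same fit, two different R² values.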

Undetermined + Coil vs. Alpha Carbon Reference Correction

Undetermined + Coil vs. Beta Carbon Reference Correction

Non-missing count vs. Alpha Carbon Reference Correction

Non-missing count vs. Beta Carbon Reference Correction

U+C Count / Usable count

U+C Count / Non-missing count vs. Alpha Carbon Reference Correction

U+C Count / Non-missing count vs. Beta Carbon Reference Correction

Abs(CA) + Abs(CB) vs. Non-missing count

90% completion filter

95% completion filter

Using the Square of the Inverse Determinant of the Covariance Matrix (this is not usable!)

##        CA                CB          
##  Min.   :-5.0000   Min.   :-4.99990  
##  1st Qu.:-2.6388   1st Qu.:-2.84651  
##  Median :-0.3357   Median :-0.05903  
##  Mean   :-0.1561   Mean   : 0.02683  
##  3rd Qu.: 2.3782   3rd Qu.: 3.01497  
##  Max.   : 5.0000   Max.   : 5.00000

Remove the Max Determinant (not much improvement)

##        CA                 CB          
##  Min.   :-4.97568   Min.   :-4.16751  
##  1st Qu.:-0.23644   1st Qu.:-0.12351  
##  Median :-0.05458   Median : 0.03197  
##  Mean   :-0.04132   Mean   : 0.02433  
##  3rd Qu.: 0.13243   3rd Qu.: 0.18585  
##  Max.   : 5.00000   Max.   : 4.35506

Using the Square Root of the Inverse Determinant of the Covariance Matrix (looks better)

##        CA                 CB          
##  Min.   :-4.15000   Min.   :-1.75835  
##  1st Qu.:-0.22075   1st Qu.:-0.10384  
##  Median :-0.07436   Median : 0.03097  
##  Mean   :-0.05924   Mean   : 0.03304  
##  3rd Qu.: 0.08312   3rd Qu.: 0.15511  
##  Max.   : 5.00000   Max.   : 4.48326
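The variants above differ in the power applied to the inverse determinant 1/|Σ| of a covariance matrix: 2 for the square, 1/2 for the square root. For reference, the bivariate normal density itself contains the factor |Σ|^(-1/2), i.e. the square-root variant; this document does not say how BaMORC combines the determinant with its score, so the Python sketch below is only an illustration. `bivariate_normal_logpdf` is a standard 2-D normal log-density, while `det_weight` is a hypothetical stand-alone weight, not BaMORC's code.

```python
import math

def det2(cov):
    (a, b), (c, d) = cov
    return a * d - b * c

def bivariate_normal_logpdf(x, mean, cov):
    """Log-density of a 2-D normal; note the -0.5 * log|Sigma| term."""
    (a, b), (c, d) = cov
    det = det2(cov)
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mean[0], x[1] - mean[1])
    # Squared Mahalanobis distance dx^T Sigma^{-1} dx
    m = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
         + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * m

def det_weight(cov, power):
    """Hypothetical weight (1/|Sigma|)**power: 2 = 'square', 0.5 = 'square root'."""
    return (1.0 / det2(cov)) ** power
```

With |Σ| = 4 the square variant gives a weight of 0.0625 and the square root gives 0.5. The square spreads weights across matrices far more aggressively, which is one plausible (unverified) reason its results above look unusable.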

90% Completion: Square Root of the Inverse Determinant of the Covariance Matrix

Summary of Square Root Results (90% Completion)

##        CA                 CB          
##  Min.   :-0.73343   Min.   :-0.89546  
##  1st Qu.:-0.21132   1st Qu.:-0.11052  
##  Median :-0.06378   Median : 0.03027  
##  Mean   :-0.05287   Mean   : 0.01435  
##  3rd Qu.: 0.07108   3rd Qu.: 0.14018  
##  Max.   : 1.27954   Max.   : 0.66748

Summary of LACS 90% Completion

##        CA                  CB           
##  Min.   :-0.550000   Min.   :-0.550000  
##  1st Qu.:-0.090000   1st Qu.:-0.090000  
##  Median : 0.010000   Median : 0.010000  
##  Mean   : 0.009783   Mean   : 0.009783  
##  3rd Qu.: 0.110000   3rd Qu.: 0.110000  
##  Max.   : 0.490000   Max.   : 0.490000

Remove Two Highest Det of Covariance Matrix

##        CA                 CB          
##  Min.   :-0.70963   Min.   :-0.92587  
##  1st Qu.:-0.21322   1st Qu.:-0.10645  
##  Median :-0.06195   Median : 0.03487  
##  Mean   :-0.05116   Mean   : 0.01507  
##  3rd Qu.: 0.06299   3rd Qu.: 0.13816  
##  Max.   : 1.29250   Max.   : 0.67485

Remove Three Highest Det of Covariance Matrix

##        CA                 CB          
##  Min.   :-0.70863   Min.   :-0.93193  
##  1st Qu.:-0.21100   1st Qu.:-0.10983  
##  Median :-0.05815   Median : 0.03912  
##  Mean   :-0.05270   Mean   : 0.01649  
##  3rd Qu.: 0.06676   3rd Qu.: 0.14131  
##  Max.   : 1.30118   Max.   : 0.67709

Remove 4 Highest Det of Covariance Matrix

##        CA                 CB          
##  Min.   :-0.73359   Min.   :-0.88610  
##  1st Qu.:-0.20971   1st Qu.:-0.11929  
##  Median :-0.05710   Median : 0.04031  
##  Mean   :-0.05294   Mean   : 0.01788  
##  3rd Qu.: 0.07018   3rd Qu.: 0.14508  
##  Max.   : 1.32585   Max.   : 0.67154

Remove 5 Highest Det of Covariance Matrix

##        CA                 CB          
##  Min.   :-0.75626   Min.   :-0.86516  
##  1st Qu.:-0.21267   1st Qu.:-0.11347  
##  Median :-0.05519   Median : 0.03684  
##  Mean   :-0.04995   Mean   : 0.01669  
##  3rd Qu.: 0.07770   3rd Qu.: 0.14973  
##  Max.   : 1.32559   Max.   : 0.67093

Remove 6 Highest Det of Covariance Matrix

##        CA                 CB          
##  Min.   :-0.76516   Min.   :-0.85544  
##  1st Qu.:-0.20971   1st Qu.:-0.11109  
##  Median :-0.05808   Median : 0.03630  
##  Mean   :-0.04736   Mean   : 0.01536  
##  3rd Qu.: 0.08165   3rd Qu.: 0.14677  
##  Max.   : 1.33937   Max.   : 0.67026

Remove 7 Highest Det of Covariance Matrix

##        CA                 CB          
##  Min.   :-0.75674   Min.   :-0.81879  
##  1st Qu.:-0.20560   1st Qu.:-0.11030  
##  Median :-0.05803   Median : 0.03418  
##  Mean   :-0.04934   Mean   : 0.01456  
##  3rd Qu.: 0.07258   3rd Qu.: 0.14465  
##  Max.   : 1.33489   Max.   : 0.66557

Remove 8 Highest Det of Covariance Matrix

##        CA                 CB          
##  Min.   :-0.74338   Min.   :-0.83943  
##  1st Qu.:-0.20436   1st Qu.:-0.10594  
##  Median :-0.05363   Median : 0.03394  
##  Mean   :-0.04743   Mean   : 0.01497  
##  3rd Qu.: 0.08984   3rd Qu.: 0.13965  
##  Max.   : 1.35728   Max.   : 0.67879
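Across the runs that remove the 2 to 8 highest-determinant covariance matrices, every summary statistic stays within 0.1 of the 90%-completion square-root results. The removal step can be sketched as follows; the grouping key (for example amino acid and secondary structure) and the function name `drop_highest_det` are hypothetical, since this document does not say how BaMORC's covariance matrices are indexed.

```python
def drop_highest_det(covs, k):
    """Return covs without the k entries whose 2x2 covariance determinant is largest.

    covs maps a hypothetical group key, e.g. (amino_acid, ss), to ((a, b), (c, d)).
    """
    def det(cov):
        (a, b), (c, d) = cov
        return a * d - b * c

    # Rank keys by determinant, largest first, and drop the top k
    ranked = sorted(covs, key=lambda key: det(covs[key]), reverse=True)
    dropped = set(ranked[:k])
    return {key: cov for key, cov in covs.items() if key not in dropped}
```

Calling it with increasing `k` reproduces the sequence of experiments above, one refit per value of `k`.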

One Dimension: Combine CA and CB (Optimize Diagonally)

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -0.453152 -0.095690  0.001032 -0.011173  0.063432  0.381491
##   RefCorrValue         Type    
##  Min.   :-0.550000   LACS:277  
##  1st Qu.:-0.090000             
##  Median : 0.010000             
##  Mean   : 0.009783             
##  3rd Qu.: 0.110000             
##  Max.   : 0.490000
##           5%          50%          75%          95% 
## -0.225108591  0.001031718  0.063431464  0.186012343
##     5%    50%    75%    95% 
## -0.290  0.010  0.110  0.302
##       95% 
## 0.4111209
##   95% 
## 0.592
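The last two values are the widths of the central 90% interval (95th minus 5th percentile): 0.186012343 − (−0.225108591) = 0.4111209 for BaMORC and 0.302 − (−0.290) = 0.592 for LACS. The label "95%" is simply carried over because R keeps the name of the first operand in the subtraction. A minimal Python sketch of that spread, using R's default type 7 quantile (function names are hypothetical):

```python
import math

def quantile_type7(values, p):
    """R's default quantile (type 7): linear interpolation between order statistics."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

def central_90_width(values):
    """Width of the 5%-95% interval, the spread compared between BaMORC and LACS."""
    return quantile_type7(values, 0.95) - quantile_type7(values, 0.05)
```

By this measure the one-dimensional BaMORC corrections are about 30% narrower than the LACS corrections (0.411 vs. 0.592).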

Test Between Two Distributions

Here we want to show that the assigned BaMORC results are practically the same as, if not superior to, the LACS results.

## 
##  Welch Two Sample t-test
## 
## data:  DEoptim_results_narm_1D[, 1] and LACS_Carbon[, 1]
## t = -1.5972, df = 515.05, p-value = 0.1108
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.046732314  0.004820485
## sample estimates:
##    mean of x    mean of y 
## -0.011172521  0.009783394

DEoptim for BaMORC

## [1] "Max Iteration = 10"
##   RefCorrValue              Type    
##  Min.   :-0.61120   DEoptim_10:283  
##  1st Qu.:-0.06514                   
##  Median : 0.08099                   
##  Mean   : 0.07983                   
##  3rd Qu.: 0.20822                   
##  Max.   : 0.81431
## [1] "90% Confident Interval"
## [1] 0.7604993
## [1] "Max Iteration = 20"
##   RefCorrValue              Type    
##  Min.   :-0.60346   DEoptim_20:283  
##  1st Qu.:-0.06592                   
##  Median : 0.08138                   
##  Mean   : 0.07977                   
##  3rd Qu.: 0.20653                   
##  Max.   : 0.81938
## [1] "90% Confident Interval"
## [1] 0.7524359
## [1] "Max Iteration = 50"
##   RefCorrValue              Type    
##  Min.   :-0.60346   DEoptim_50:283  
##  1st Qu.:-0.06594                   
##  Median : 0.08137                   
##  Mean   : 0.07977                   
##  3rd Qu.: 0.20659                   
##  Max.   : 0.81935
## [1] "90% Confident Interval"
## [1] 0.7524119
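DEoptim is an R implementation of differential evolution. Raising the iteration cap from 10 to 20 moves the printed interval value from 0.7605 to 0.7524, and raising it to 50 changes it only in the fifth decimal place, which suggests the optimizer has essentially converged by 20 iterations. The sketch below is a generic DE/rand/1/bin minimizer in Python, not BaMORC's objective and not necessarily the strategy DEoptim is configured with; the function name and parameters (population 20, F = 0.5, CR = 0.9) are illustrative assumptions, and the ±5 bounds in the usage only echo the 5.00000 values at which several summaries above are capped.

```python
import random

def differential_evolution(f, lower, upper, pop_size=20, F=0.5, CR=0.9, iters=200, seed=0):
    """Minimize f over the box [lower, upper] with classic DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(lower)
    pop = [[rng.uniform(lower[d], upper[d]) for d in range(dim)] for _ in range(pop_size)]
    cost = [f(p) for p in pop]
    for _ in range(iters):
        for i in range(pop_size):
            # Pick three distinct members other than the target i
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = []
            for d in range(dim):
                if rng.random() < CR or d == j_rand:
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                    v = min(max(v, lower[d]), upper[d])  # clip to the bounds
                else:
                    v = pop[i][d]
                trial.append(v)
            trial_cost = f(trial)
            if trial_cost <= cost[i]:  # greedy one-to-one selection
                pop[i], cost[i] = trial, trial_cost
    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]

best, val = differential_evolution(
    lambda p: (p[0] - 0.3) ** 2 + (p[1] + 0.1) ** 2, [-5.0, -5.0], [5.0, 5.0]
)
```

Because selection is greedy, the best cost never increases between generations, so extra iterations past convergence (as in the 20 vs. 50 runs above) only polish the result.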

All the Data

## Warning: Removed 282 rows containing non-finite values (stat_ydensity).
## Warning: Removed 282 rows containing non-finite values (stat_boxplot).

RefCorr Value vs. LACS

## [1] "For all the data, the correlation between BaMORC and LACS:"
##                      BaMORC_assigned_corr LACS_results_corr
## BaMORC_assigned_corr           1.00000000        0.06184824
## LACS_results_corr              0.06184824        1.00000000

## [1] "***************************"
## [1] "For 90% completion, the correlation between BaMORC and LACS:"
##            [,1]       [,2]
## [1,] 1.00000000 0.08286666
## [2,] 0.08286666 1.00000000
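The off-diagonal entries of these `cor()` matrices are Pearson correlations: 0.062 on all data and 0.083 after the 90% completion filter, so the per-dataset BaMORC and LACS corrections are only weakly linearly related even though their overall distributions are similar. A minimal Python sketch (function names are hypothetical; missing values are represented as `None`):

```python
import math

def pearson(x, y):
    """Pearson correlation, the default method of R's cor()."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def paired_complete(x, y):
    """Drop pairs where either value is missing, like cor(..., use = "complete.obs")."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    return [a for a, _ in pairs], [b for _, b in pairs]
```

Dropping incomplete pairs first matters here, since many datasets lack one of the two corrections.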

Abs(RefCorr Value) vs. abs(RefDB[CA-CB])

## [1] "abs(RefCorr Value) vs. abs(CS_Diff_CA)+abs(CS_Diff_CB)"
##            absRefCorr   CS_Diff
## absRefCorr  1.0000000 0.1794524
## CS_Diff     0.1794524 1.0000000

## [1] "abs(RefCorr Value) vs. abs(RMSD_CA-RMSD_CB)"
##            absRefCorr       RMSD
## absRefCorr 1.00000000 0.09117283
## RMSD       0.09117283 1.00000000