## CA CB
## Min. :-1.390000 Min. :-1.390000
## 1st Qu.:-0.110000 1st Qu.:-0.110000
## Median : 0.010000 Median : 0.010000
## Mean : 0.002056 Mean : 0.002056
## 3rd Qu.: 0.110000 3rd Qu.: 0.110000
## Max. : 4.840000 Max. : 4.840000
## CA CB
## Min. :-4.15000 Min. :-1.77344
## 1st Qu.:-0.23573 1st Qu.:-0.12265
## Median :-0.05567 Median : 0.03180
## Mean :-0.04364 Mean : 0.03304
## 3rd Qu.: 0.13093 3rd Qu.: 0.18657
## Max. : 5.00000 Max. : 4.35558
## CA CB
## Min. :-4.14999 Min. :-2.24017
## 1st Qu.:-0.35255 1st Qu.:-0.11886
## Median :-0.14256 Median : 0.04536
## Mean :-0.13435 Mean : 0.05825
## 3rd Qu.: 0.07161 3rd Qu.: 0.20881
## Max. : 5.00000 Max. : 4.51184
BaMORC did not perform well on the following datasets, which have too many missing values and the same secondary-structure type for every residue (e.g. coil throughout). Data quality is the issue here. The outliers are similar to those seen with LACS.
## ID AA SS CA CB N
## [1,] "1" "M" "C" "NA" "NA" "NA"
## [2,] "2" "S" "C" "NA" "NA" "NA"
## [3,] "3" "V" "C" "NA" "NA" "NA"
## [4,] "4" "N" "C" "NA" "NA" "NA"
## [5,] "5" "S" "C" "53.90" "66.60" "NA"
## [6,] "6" "N" "C" "48.60" "34.10" "NA"
## ID AA SS CA CB N
## [1,] "1" "M" "U" "54.71" "32.73" "171.21"
## [2,] "2" "K" "U" "55.98" "32.96" "175.31"
## [3,] "3" "L" "U" "55.20" "42.58" "176.72"
## [4,] "4" "S" "U" "57.82" "63.93" "174.02"
## [5,] "5" "E" "U" "58.39" "29.26" "176.75"
## [6,] "6" "Y" "U" "60.14" "38.16" "176.47"
## ID AA SS CA CB N
## [1,] "1" "M" "U" "56.62" "33.09" "177.95"
## [2,] "2" "S" "U" "59.26" "64.58" "176.09"
## [3,] "3" "A" "U" "53.70" "20.34" "179.70"
## [4,] "4" "T" "U" "62.65" "70.64" "176.07"
## [5,] "5" "A" "U" "53.45" "20.33" "179.11"
## [6,] "6" "A" "U" "53.49" "20.33" "179.56"
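The features entering the regressions below (`CA_NA_Freq`, `CB_NA_Freq`, `U_Freq`, `C_freq`, `Data_Len`) can be computed directly from matrices like the three heads above. This sketch assumes the column layout shown there, with missing shifts stored as the string "NA"; the helper names and the 0.5 screening threshold are hypothetical, not taken from the analysis code.

```r
# Hypothetical helper: per-dataset quality features modelled on the
# ill_data_stat columns in the regressions. Exact definitions are assumptions.
ill_data_features <- function(m) {
  # m: character matrix with columns ID, AA, SS, CA, CB, N, where missing
  # shifts are stored as the string "NA" (as in the printed heads)
  ca <- suppressWarnings(as.numeric(m[, "CA"]))  # "NA" becomes NA_real_
  cb <- suppressWarnings(as.numeric(m[, "CB"]))
  ss <- m[, "SS"]
  data.frame(
    CA_NA_Freq = mean(is.na(ca)),   # fraction of residues without a CA shift
    CB_NA_Freq = mean(is.na(cb)),   # fraction of residues without a CB shift
    U_Freq     = mean(ss == "U"),   # fraction typed "U"
    C_freq     = mean(ss == "C"),   # fraction typed coil
    Data_Len   = nrow(m)            # number of residues
  )
}

# Hypothetical screening rule; the 0.5 threshold is illustrative only
is_suspect <- function(f, max_na = 0.5) {
  f$CA_NA_Freq > max_na || f$CB_NA_Freq > max_na || f$C_freq == 1
}

# First six residues of the first dataset shown above
toy <- rbind(
  c("1", "M", "C", "NA",    "NA",    "NA"),
  c("2", "S", "C", "NA",    "NA",    "NA"),
  c("3", "V", "C", "NA",    "NA",    "NA"),
  c("4", "N", "C", "NA",    "NA",    "NA"),
  c("5", "S", "C", "53.90", "66.60", "NA"),
  c("6", "N", "C", "48.60", "34.10", "NA")
)
colnames(toy) <- c("ID", "AA", "SS", "CA", "CB", "N")
feat <- ill_data_features(toy)
print(feat)
is_suspect(feat)   # TRUE: 4 of 6 residues lack CA and CB shifts
```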
## [1] 1557
## [1] 1557
##
## Call:
## lm(formula = DEoptim_results$CA ~ ill_data_stat$CA_NA_Freq +
## ill_data_stat$U_Freq + ill_data_stat$C_freq + ill_data_stat$Data_Len -
## 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.5514 -0.1983 -0.0193 0.1666 4.9015
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## ill_data_stat$CA_NA_Freq 0.2869026 0.0808165 3.550 0.000397 ***
## ill_data_stat$U_Freq -0.4998522 0.1165037 -4.290 1.90e-05 ***
## ill_data_stat$C_freq 0.0848052 0.0419326 2.022 0.043315 *
## ill_data_stat$Data_Len -0.0006439 0.0001390 -4.632 3.94e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3493 on 1459 degrees of freedom
## (94 observations deleted due to missingness)
## Multiple R-squared: 0.04109, Adjusted R-squared: 0.03846
## F-statistic: 15.63 on 4 and 1459 DF, p-value: 1.576e-12
##
## Call:
## lm(formula = DEoptim_results$CB ~ ill_data_stat$CB_NA_Freq +
## ill_data_stat$U_Freq + ill_data_stat$C_freq + ill_data_stat$Data_Len -
## 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9426 -0.1546 -0.0076 0.1598 4.2058
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## ill_data_stat$CB_NA_Freq 2.390e-01 6.160e-02 3.879 0.000109 ***
## ill_data_stat$U_Freq 5.324e-01 1.060e-01 5.022 5.74e-07 ***
## ill_data_stat$C_freq -7.080e-02 4.190e-02 -1.690 0.091323 .
## ill_data_stat$Data_Len 1.742e-05 1.327e-04 0.131 0.895566
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3258 on 1459 degrees of freedom
## (94 observations deleted due to missingness)
## Multiple R-squared: 0.05821, Adjusted R-squared: 0.05562
## F-statistic: 22.54 on 4 and 1459 DF, p-value: < 2.2e-16
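The two regressions above are fitted without an intercept (`- 1`) and refer to columns with `$`. A sketch of the same model form using `data =`, on synthetic data (the `stat` data frame and its values are invented for illustration). One caveat when reading the reported Multiple R-squared (0.04109 and 0.05821): for a no-intercept model, `summary.lm` measures R-squared against zero rather than against the mean, so it is not comparable with an intercept model. Even so, the four features explain only a small share of the variation despite the small p-values.

```r
set.seed(1)
n <- 200
# Synthetic stand-in for ill_data_stat and the fitted CA corrections
stat <- data.frame(
  CA_NA_Freq = runif(n, 0, 0.5),
  U_Freq     = runif(n),
  C_freq     = runif(n),
  Data_Len   = sample(50:500, n, replace = TRUE)
)
stat$CA_corr <- 0.3 * stat$CA_NA_Freq - 0.5 * stat$U_Freq + rnorm(n, sd = 0.35)

# "- 1" drops the intercept, as in the original calls; data = keeps the
# coefficient names short and lets predict() work on new data frames
fit_ca <- lm(CA_corr ~ CA_NA_Freq + U_Freq + C_freq + Data_Len - 1, data = stat)
summary(fit_ca)

# Without an intercept, summary() reports an uncentred R-squared,
# 1 - RSS / sum(y^2), rather than 1 - RSS / sum((y - mean(y))^2)
rss <- sum(residuals(fit_ca)^2)
r2_uncentred <- 1 - rss / sum(stat$CA_corr^2)
all.equal(summary(fit_ca)$r.squared, r2_uncentred)  # TRUE
```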
## CA CB
## Min. :-5.0000 Min. :-4.99990
## 1st Qu.:-2.6388 1st Qu.:-2.84651
## Median :-0.3357 Median :-0.05903
## Mean :-0.1561 Mean : 0.02683
## 3rd Qu.: 2.3782 3rd Qu.: 3.01497
## Max. : 5.0000 Max. : 5.00000
## CA CB
## Min. :-4.97568 Min. :-4.16751
## 1st Qu.:-0.23644 1st Qu.:-0.12351
## Median :-0.05458 Median : 0.03197
## Mean :-0.04132 Mean : 0.02433
## 3rd Qu.: 0.13243 3rd Qu.: 0.18585
## Max. : 5.00000 Max. : 4.35506
## CA CB
## Min. :-4.15000 Min. :-1.75835
## 1st Qu.:-0.22075 1st Qu.:-0.10384
## Median :-0.07436 Median : 0.03097
## Mean :-0.05924 Mean : 0.03304
## 3rd Qu.: 0.08312 3rd Qu.: 0.15511
## Max. : 5.00000 Max. : 4.48326
## CA CB
## Min. :-0.73343 Min. :-0.89546
## 1st Qu.:-0.21132 1st Qu.:-0.11052
## Median :-0.06378 Median : 0.03027
## Mean :-0.05287 Mean : 0.01435
## 3rd Qu.: 0.07108 3rd Qu.: 0.14018
## Max. : 1.27954 Max. : 0.66748
## CA CB
## Min. :-0.550000 Min. :-0.550000
## 1st Qu.:-0.090000 1st Qu.:-0.090000
## Median : 0.010000 Median : 0.010000
## Mean : 0.009783 Mean : 0.009783
## 3rd Qu.: 0.110000 3rd Qu.: 0.110000
## Max. : 0.490000 Max. : 0.490000
## CA CB
## Min. :-0.70963 Min. :-0.92587
## 1st Qu.:-0.21322 1st Qu.:-0.10645
## Median :-0.06195 Median : 0.03487
## Mean :-0.05116 Mean : 0.01507
## 3rd Qu.: 0.06299 3rd Qu.: 0.13816
## Max. : 1.29250 Max. : 0.67485
## CA CB
## Min. :-0.70863 Min. :-0.93193
## 1st Qu.:-0.21100 1st Qu.:-0.10983
## Median :-0.05815 Median : 0.03912
## Mean :-0.05270 Mean : 0.01649
## 3rd Qu.: 0.06676 3rd Qu.: 0.14131
## Max. : 1.30118 Max. : 0.67709
## CA CB
## Min. :-0.73359 Min. :-0.88610
## 1st Qu.:-0.20971 1st Qu.:-0.11929
## Median :-0.05710 Median : 0.04031
## Mean :-0.05294 Mean : 0.01788
## 3rd Qu.: 0.07018 3rd Qu.: 0.14508
## Max. : 1.32585 Max. : 0.67154
## CA CB
## Min. :-0.75626 Min. :-0.86516
## 1st Qu.:-0.21267 1st Qu.:-0.11347
## Median :-0.05519 Median : 0.03684
## Mean :-0.04995 Mean : 0.01669
## 3rd Qu.: 0.07770 3rd Qu.: 0.14973
## Max. : 1.32559 Max. : 0.67093
## CA CB
## Min. :-0.76516 Min. :-0.85544
## 1st Qu.:-0.20971 1st Qu.:-0.11109
## Median :-0.05808 Median : 0.03630
## Mean :-0.04736 Mean : 0.01536
## 3rd Qu.: 0.08165 3rd Qu.: 0.14677
## Max. : 1.33937 Max. : 0.67026
## CA CB
## Min. :-0.75674 Min. :-0.81879
## 1st Qu.:-0.20560 1st Qu.:-0.11030
## Median :-0.05803 Median : 0.03418
## Mean :-0.04934 Mean : 0.01456
## 3rd Qu.: 0.07258 3rd Qu.: 0.14465
## Max. : 1.33489 Max. : 0.66557
## CA CB
## Min. :-0.74338 Min. :-0.83943
## 1st Qu.:-0.20436 1st Qu.:-0.10594
## Median :-0.05363 Median : 0.03394
## Mean :-0.04743 Mean : 0.01497
## 3rd Qu.: 0.08984 3rd Qu.: 0.13965
## Max. : 1.35728 Max. : 0.67879
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.453152 -0.095690 0.001032 -0.011173 0.063432 0.381491
## RefCorrValue Type
## Min. :-0.550000 LACS:277
## 1st Qu.:-0.090000
## Median : 0.010000
## Mean : 0.009783
## 3rd Qu.: 0.110000
## Max. : 0.490000
## 5% 50% 75% 95%
## -0.225108591 0.001031718 0.063431464 0.186012343
## 5% 50% 75% 95%
## -0.290 0.010 0.110 0.302
## 95%
## 0.4111209
## 95%
## 0.592
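The two single values labelled `95%` (0.4111209 and 0.592) match the 95th minus the 5th percentile printed just before them: 0.186012343 - (-0.225108591) = 0.411120934 and 0.302 - (-0.290) = 0.592. In R, subtracting two named values keeps the first operand's name, which would explain the `95%` label. Reading them as the width of the central 90% range is an inference from this arithmetic, not something the output states. A small sketch that computes both summaries and drops the misleading name (function names are hypothetical):

```r
# Width of the central range between two percentiles (default 5% and 95%)
central_width <- function(x, probs = c(0.05, 0.95)) {
  q <- quantile(x, probs = probs, na.rm = TRUE)
  unname(q[2] - q[1])   # unname() drops the inherited "95%" label
}

# Percentile summary in the same layout as the output above
percentile_summary <- function(x) {
  quantile(x, probs = c(0.05, 0.50, 0.75, 0.95), na.rm = TRUE)
}

x <- 0:100                 # toy data with easy-to-check percentiles
percentile_summary(x)      # 5, 50, 75, 95
central_width(x)           # 90
```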
Here we want to show that the assigned BaMORC results are practically the same as the LACS results, if not superior.
##
## Welch Two Sample t-test
##
## data: DEoptim_results_narm_1D[, 1] and LACS_Carbon[, 1]
## t = -1.5972, df = 515.05, p-value = 0.1108
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.046732314 0.004820485
## sample estimates:
## mean of x mean of y
## -0.011172521 0.009783394
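The Welch test above (p = 0.1108) fails to reject equal means, but a non-significant difference is not by itself evidence that the two methods are practically the same. One standard way to support that claim is an equivalence test (two one-sided tests, TOST) against a margin chosen in advance. The sketch below uses base R's `t.test`, whose two-sample default is Welch's test; the 0.1 ppm margin, the helper name `tost_welch`, and the simulated data are assumptions for illustration, not values from this analysis.

```r
# Two one-sided Welch t-tests (TOST): are the means equal within +/- margin?
tost_welch <- function(x, y, margin, alpha = 0.05) {
  # H0: mean(x) - mean(y) <= -margin  vs  H1: difference > -margin
  lower <- t.test(x, y, mu = -margin, alternative = "greater")
  # H0: mean(x) - mean(y) >= +margin  vs  H1: difference < +margin
  upper <- t.test(x, y, mu = margin, alternative = "less")
  p <- max(lower$p.value, upper$p.value)
  list(p_value = p, equivalent = p < alpha)
}

set.seed(42)
# Synthetic stand-ins for the BaMORC and LACS corrections (values invented)
x <- rnorm(283, mean = -0.011, sd = 0.15)
y <- rnorm(277, mean = 0.010, sd = 0.15)

t.test(x, y)                            # ordinary Welch test, as above
tost <- tost_welch(x, y, margin = 0.1)  # hypothetical 0.1 ppm margin
tost
# Same decision as checking that the 90% Welch interval lies inside +/- margin
ci90 <- t.test(x, y, conf.level = 0.90)$conf.int
ci90
```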
## [1] "Max Iteration = 10"
## RefCorrValue Type
## Min. :-0.61120 DEoptim_10:283
## 1st Qu.:-0.06514
## Median : 0.08099
## Mean : 0.07983
## 3rd Qu.: 0.20822
## Max. : 0.81431
## [1] "90% Confidence Interval"
## [1] 0.7604993
## [1] "Max Iteration = 20"
## RefCorrValue Type
## Min. :-0.60346 DEoptim_20:283
## 1st Qu.:-0.06592
## Median : 0.08138
## Mean : 0.07977
## 3rd Qu.: 0.20653
## Max. : 0.81938
## [1] "90% Confidence Interval"
## [1] 0.7524359
## [1] "Max Iteration = 50"
## RefCorrValue Type
## Min. :-0.60346 DEoptim_50:283
## 1st Qu.:-0.06594
## Median : 0.08137
## Mean : 0.07977
## 3rd Qu.: 0.20659
## Max. : 0.81935
## [1] "90% Confidence Interval"
## [1] 0.7524119
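The three runs above differ only in the iteration cap, and the results for 20 and 50 iterations are almost identical (0.7524359 vs. 0.7524119), which suggests the optimiser has essentially converged by about 20 generations. To make the role of the cap concrete, here is a minimal DE/rand/1/bin loop in base R. It is a stand-in, not the DEoptim package and not BaMORC's objective: the squared-error offset objective, the synthetic shifts, the helper name `de_minimize`, and the ±5 bounds (suggested by the values clipped at -5 and 5 in the earlier summaries) are all assumptions.

```r
# Minimal DE/rand/1/bin in base R; an illustration, NOT the DEoptim package.
de_minimize <- function(fn, lower, upper, itermax = 50, np = 20,
                        f_weight = 0.8, cr = 0.9) {
  d <- length(lower)
  # Initial population drawn uniformly inside the box (one member per row)
  pop <- matrix(runif(np * d, lower, upper), nrow = np, byrow = TRUE)
  cost <- apply(pop, 1, fn)
  for (g in seq_len(itermax)) {
    for (i in seq_len(np)) {
      # Mutation: three distinct members, all different from i
      r <- sample(setdiff(seq_len(np), i), 3)
      mutant <- pop[r[1], ] + f_weight * (pop[r[2], ] - pop[r[3], ])
      mutant <- pmin(pmax(mutant, lower), upper)  # clip back into the box
      # Binomial crossover, forcing at least one coordinate from the mutant
      cross <- runif(d) < cr
      cross[sample.int(d, 1)] <- TRUE
      trial <- ifelse(cross, mutant, pop[i, ])
      # Greedy selection; members are replaced immediately (asynchronous DE)
      f_trial <- fn(trial)
      if (f_trial <= cost[i]) {
        pop[i, ] <- trial
        cost[i] <- f_trial
      }
    }
  }
  best <- which.min(cost)
  list(bestmem = pop[best, ], bestval = cost[best])
}

set.seed(7)
# Synthetic reference and observed shifts (invented); obs = ref + offset + noise
ref <- cbind(CA = rnorm(100, 56, 2), CB = rnorm(100, 38, 5))
obs <- sweep(ref, 2, c(-0.05, 0.03), "+") + matrix(rnorm(200, sd = 0.3), ncol = 2)
# Hypothetical objective: squared error after adding a per-nucleus offset
obj <- function(p) sum((obs - sweep(ref, 2, p, "+"))^2)
exact <- colMeans(obs - ref)   # closed-form minimiser of obj

for (it in c(10, 20, 50)) {
  set.seed(1)
  fit <- de_minimize(obj, lower = c(-5, -5), upper = c(5, 5), itermax = it)
  cat("itermax =", it, " offset =", round(fit$bestmem, 4),
      " error =", signif(max(abs(fit$bestmem - exact)), 3), "\n")
}
```

Because the toy objective has a closed-form minimiser, the printed error shows directly how much each extra batch of generations buys; with a real objective, comparing runs at increasing `itermax`, as done above, is the practical substitute.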
## Warning: Removed 282 rows containing non-finite values (stat_ydensity).
## Warning: Removed 282 rows containing non-finite values (stat_boxplot).
## [1] "For all the data, the correlation between BaMORC and LACS:"
## BaMORC_assigned_corr LACS_results_corr
## BaMORC_assigned_corr 1.00000000 0.06184824
## LACS_results_corr 0.06184824 1.00000000
## [1] "***************************"
## [1] "For 90% completion, the correlation between BaMORC and LACS:"
## [,1] [,2]
## [1,] 1.00000000 0.08286666
## [2,] 0.08286666 1.00000000
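Both correlation matrices above give small coefficients (0.062 for all data, 0.083 at 90% completion). A sketch of a paired correlation that drops non-finite pairs explicitly and also reports a confidence interval and p-value via `cor.test`, which helps judge whether coefficients this small differ from zero. The helper name `paired_cor` and the toy data are hypothetical.

```r
# Hypothetical helper: Pearson correlation on pairs where both values are finite
paired_cor <- function(x, y) {
  ok <- is.finite(x) & is.finite(y)
  ct <- cor.test(x[ok], y[ok])   # Pearson by default
  list(r = unname(ct$estimate), conf_int = ct$conf.int,
       p_value = ct$p.value, n = sum(ok))
}

set.seed(3)
x <- rnorm(50)
y <- 0.5 * x + rnorm(50)
x[c(2, 10)] <- NA   # simulate missing corrections
y[5] <- NA
res <- paired_cor(x, y)
res$n   # 47 complete pairs
```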
## [1] "abs(RefCorr Value) vs. abs(CS_Diff_CA)+abs(CS_Diff_CB)"
## absRefCorr CS_Diff
## absRefCorr 1.0000000 0.1794524
## CS_Diff 0.1794524 1.0000000
## Warning in plot.window(...): "Main" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "Main" is not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "Main" is not
## a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "Main" is not
## a graphical parameter
## Warning in box(...): "Main" is not a graphical parameter
## Warning in title(...): "Main" is not a graphical parameter
## [1] "abs(RefCorr Value) vs. abs(RMSD_CA-RMSD_CB)"
## absRefCorr RMSD
## absRefCorr 1.00000000 0.09117283
## RMSD 0.09117283 1.00000000
## Warning in plot.window(...): "Main" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "Main" is not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "Main" is not
## a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "Main" is not
## a graphical parameter
## Warning in box(...): "Main" is not a graphical parameter
## Warning in title(...): "Main" is not a graphical parameter
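The warnings after both scatter plots come from passing `Main =` instead of `main =`. R argument names are case-sensitive, so the title is not set and each low-level graphics call that receives the unknown argument warns about it. A sketch of the first plot with the argument fixed, using synthetic values (the variable names are hypothetical stand-ins for the quantities named in the printed headers):

```r
set.seed(11)
# Synthetic stand-ins for the per-dataset values (invented for illustration)
ref_corr   <- rnorm(100, sd = 0.2)
cs_diff_ca <- rnorm(100, sd = 0.3)
cs_diff_cb <- rnorm(100, sd = 0.3)

abs_ref <- abs(ref_corr)
cs_diff <- abs(cs_diff_ca) + abs(cs_diff_cb)
r <- cor(abs_ref, cs_diff)

# Lower-case `main`: an unknown `Main` is passed on to every low-level
# graphics call, which is what produces the repeated warnings above
plot(abs_ref, cs_diff,
     main = sprintf("abs(RefCorr) vs. abs(CS_Diff_CA) + abs(CS_Diff_CB), r = %.3f", r),
     xlab = "abs(RefCorr value)",
     ylab = "abs(CS_Diff_CA) + abs(CS_Diff_CB)")
```

The same one-letter fix applies to the RMSD plot.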