Some useful information

This is a summary of a set of 6 experiments run with a LONI pipeline workflow file that performs 3000 independent jobs, each one applying both the CBDA-SL and the knockoff filter feature mining strategies. Each experiment has a total of 9000 jobs and is uniquely identified by 6 input arguments: # of jobs [M], % of missing values [misValperc], min [Kcol_min] and max [Kcol_max] % for the FSR (Feature Sampling Range), and min [Nrow_min] and max [Nrow_max] % for the SSR (Subject Sampling Range).
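To make the role of the FSR and SSR arguments concrete, here is a minimal Python sketch of how a single job might draw its random subset of features and subjects. It is an illustration under stated assumptions, not the CBDA-SL code: the function name `sample_job`, the uniform draw of the sampling percentage inside each range, the feature count of 900 and the subject count of 300 are all hypothetical.

```python
import random


def sample_job(n_subjects, n_features, kcol_min, kcol_max, nrow_min, nrow_max, rng):
    """Draw one job's random subset of feature and subject indices.

    Percentages are on a 0-100 scale, as in the experiment tables.
    Assumption: the sampling percentage is drawn uniformly inside each range.
    """
    # Pick the sampling fractions inside the FSR and SSR ranges.
    k_frac = rng.uniform(kcol_min, kcol_max) / 100.0
    n_frac = rng.uniform(nrow_min, nrow_max) / 100.0
    # Convert fractions to counts, keeping at least one feature and one subject.
    k = max(1, round(k_frac * n_features))
    n = max(1, round(n_frac * n_subjects))
    # Sample without replacement; indices are 1-based to match the tables.
    features = sorted(rng.sample(range(1, n_features + 1), k))
    subjects = sorted(rng.sample(range(1, n_subjects + 1), n))
    return features, subjects


# Hypothetical data set size; the ranges are those of Experiment 1.
rng = random.Random(0)
features, subjects = sample_job(
    n_subjects=300, n_features=900,
    kcol_min=1, kcol_max=5, nrow_min=30, nrow_max=60, rng=rng,
)
print(len(features), "features and", len(subjects), "subjects sampled")
```

With these settings each job would see between 1% and 5% of the features and between 30% and 60% of the subjects, which is what distinguishes one experiment from another.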

This document reports the final results, by experiment. General documentation of the CBDA-SL project is available at https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true, and some of the code is on GitHub at https://github.com/SOCR/CBDA.

Features selected by both the knockoff filter and the CBDA-SL algorithm appear as spikes in the histograms below. I list the top features selected by each method; the number of top features is set to 15 here.
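The Count and Density columns in the tables can be read as a frequency table over all selections made across jobs. The Python sketch below shows one way such a ranking could be computed, assuming Density is the count expressed as a percentage of all selections. The Knockoff column of Experiment 3 appears consistent with this reading (5 selections at density 14.705882 implies 34 selections in total), but the function `top_features` and the toy input are hypothetical, not taken from the CBDA code.

```python
from collections import Counter


def top_features(selected, top=15):
    """Rank features by how often they were selected across jobs.

    Returns (feature, count, density) tuples, where density is assumed to be
    the count as a percentage of all selections.
    """
    counts = Counter(selected)
    total = sum(counts.values())
    # Sort by descending count, breaking ties by ascending feature index.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
    return [(feature, count, 100.0 * count / total) for feature, count in ranked]


# Toy input: feature indices selected over several hypothetical jobs.
selections = [300, 300, 300, 800, 800, 100]
for feature, count, density in top_features(selections):
    print(f"{feature:>5} {count:>5} {density:10.6f}")
```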

## [1] EXPERIMENT 1
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          1          5         30         60 
##  [1]   1 100 200 300 400 500 600 700 800 900

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "1"         
##  Accuracy Count Density   MSE Count Density   Knockoff Count Density  
##  300      53    0.3574560 687 33    0.2292622 800      144   8.4955752
##  900      53    0.3574560 392 30    0.2084202 600      132   7.7876106
##  800      49    0.3304782 380 29    0.2014728 900      124   7.3156342
##  400      43    0.2900115 835 29    0.2014728 100      100   5.8997050
##  409      32    0.2158225 95  28    0.1945255 500       88   5.1917404
##  500      32    0.2158225 356 28    0.1945255 300       75   4.4247788
##  706      32    0.2158225 430 28    0.1945255   1       69   4.0707965
##  32       31    0.2090780 546 27    0.1875782 200       58   3.4218289
##  100      31    0.2090780 54  26    0.1806308 400       39   2.3008850
##  257      30    0.2023336 438 26    0.1806308 108       36   2.1238938
##  386      30    0.2023336 482 26    0.1806308 574       22   1.2979351
##  600      30    0.2023336 846 26    0.1806308 700       19   1.1209440
##  611      30    0.2023336 858 26    0.1806308 424       16   0.9439528
##  511      29    0.1955891 38  25    0.1736835 650       16   0.9439528
##  482      28    0.1888447 151 25    0.1736835 139       15   0.8849558
## [1] "Nonzero Features"
##  [1]   1 100 200 300 400 500 600 700 800 900
## 
## [1] EXPERIMENT 2
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          5         15         30         60 
##  [1]   1 100 200 300 400 500 600 700 800 900

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "2"
##  Accuracy Count Density   MSE Count Density   Knockoff Count Density  
##  300      134   0.2798254 325 73    0.1682221 600      208   8.3467095
##  800      113   0.2359722 370 73    0.1682221 900      203   8.1460674
##  600      107   0.2234427 502 73    0.1682221 800      199   7.9855538
##  900      107   0.2234427 374 71    0.1636133 100      187   7.5040128
##  100       99   0.2067367 871 70    0.1613089   1      164   6.5810594
##  400       92   0.1921189 211 69    0.1590045 300      136   5.4574639
##  409       75   0.1566187 404 69    0.1590045 500      122   4.8956661
##  222       74   0.1545305 43  68    0.1567001 108       74   2.9695024
##  79        73   0.1524422 145 67    0.1543957 200       70   2.8089888
##  295       73   0.1524422 68  66    0.1520913 400       54   2.1669342
##  709       73   0.1524422 279 66    0.1520913 574       43   1.7255217
##  52        72   0.1503540 223 65    0.1497868 700       39   1.5650080
##  173       72   0.1503540 384 65    0.1497868 424       32   1.2841091
##  200       72   0.1503540 427 65    0.1497868 765       24   0.9630819
##  30        71   0.1482657 533 65    0.1497868 379       23   0.9229535
## [1] "Nonzero Features"
##  [1]   1 100 200 300 400 500 600 700 800 900
## 
## [1] EXPERIMENT 3
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0         15         30         30         60 
##  [1]   1 100 200 300 400 500 600 700 800 900

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "3"
##  Accuracy Count Density   MSE Count Density   Knockoff Count Density  
##  300      234   0.2187345 422 142   0.1460470 500      5     14.705882
##  800      230   0.2149955 412 141   0.1450185   1      3      8.823529
##  900      211   0.1972350 683 137   0.1409045 100      3      8.823529
##  100      196   0.1832135 96  136   0.1398760 800      3      8.823529
##  600      182   0.1701268 151 135   0.1388475 900      3      8.823529
##  400      166   0.1551706 338 134   0.1378190 139      2      5.882353
##  200      158   0.1476925 359 133   0.1367905 548      2      5.882353
##  1        155   0.1448882 728 133   0.1367905 574      2      5.882353
##  191      147   0.1374101 226 132   0.1357620  14      1      2.941176
##  377      145   0.1355406 370 132   0.1357620 112      1      2.941176
##  549      145   0.1355406 628 132   0.1357620 246      1      2.941176
##  222      144   0.1346059 390 131   0.1347335 563      1      2.941176
##  362      144   0.1346059 822 131   0.1347335 590      1      2.941176
##  500      144   0.1346059 572 130   0.1337050 596      1      2.941176
##  586      144   0.1346059 804 130   0.1337050 676      1      2.941176
## [1] "Nonzero Features"
##  [1]   1 100 200 300 400 500 600 700 800 900
## 
## [1] EXPERIMENT 4
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          1          5         60         80 
##  [1]   1 100 200 300 400 500 600 700 800 900

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "4"
##  Accuracy Count Density   MSE Count Density   Knockoff Count Density 
##  300      59    0.4052476 156 32    0.2279527 600      216   9.751693
##  496      42    0.2884814 874 31    0.2208292 800      205   9.255079
##  409      33    0.2266639 515 30    0.2137057 100      186   8.397291
##  718      33    0.2266639 8   28    0.1994586 900      185   8.352144
##  782      33    0.2266639 119 28    0.1994586 500      157   7.088036
##  556      32    0.2197953 712 28    0.1994586 300      151   6.817156
##  765      32    0.2197953 99  27    0.1923351   1      145   6.546275
##  244      31    0.2129267 249 27    0.1923351 200      104   4.695260
##  441      30    0.2060581 426 27    0.1923351 400       76   3.431151
##  611      30    0.2060581 4   26    0.1852116 108       72   3.250564
##  650      30    0.2060581 507 26    0.1852116 700       54   2.437923
##  845      30    0.2060581 555 26    0.1852116 574       33   1.489842
##  783      29    0.1991895 610 26    0.1852116 738       27   1.218962
##  386      28    0.1923209 638 26    0.1852116 424       24   1.083521
##  600      28    0.1923209 163 25    0.1780880 548       24   1.083521
## [1] "Nonzero Features"
##  [1]   1 100 200 300 400 500 600 700 800 900
## 
## [1] EXPERIMENT 5
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          5         15         60         80 
##  [1]   1 100 200 300 400 500 600 700 800 900

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "5"
##  Accuracy Count Density   MSE Count Density   Knockoff Count Density   
##  300      132   0.2769792 197 72    0.1720225 600      557   13.1027993
##  900      100   0.2098328 38  69    0.1648549 800      508   11.9501294
##  400       88   0.1846528 566 69    0.1648549 900      469   11.0326982
##  496       85   0.1783578 188 68    0.1624657 100      451   10.6092684
##  650       85   0.1783578 660 67    0.1600765   1      375    8.8214538
##  593       83   0.1741612 713 67    0.1600765 300      369    8.6803105
##  764       81   0.1699645 570 66    0.1576873 500      346    8.1392614
##  332       80   0.1678662 228 65    0.1552981 200      221    5.1987768
##  409       79   0.1657679 511 65    0.1552981 108      143    3.3639144
##  800       76   0.1594729 872 65    0.1552981 400      132    3.1051517
##  173       75   0.1573746 159 64    0.1529089 700       87    2.0465773
##  315       74   0.1552762 669 64    0.1529089 424       56    1.3173371
##  782       74   0.1552762 670 64    0.1529089 379       34    0.7998118
##  282       73   0.1531779 219 63    0.1505197 762       32    0.7527641
##  643       73   0.1531779 42  62    0.1481305 738       31    0.7292402
## [1] "Nonzero Features"
##  [1]   1 100 200 300 400 500 600 700 800 900
## 
## [1] EXPERIMENT 6
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0         15         30         60         80 
##  [1]   1 100 200 300 400 500 600 700 800 900

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "6"
##  Accuracy Count Density   MSE Count Density   Knockoff Count Density  
##  300      212   0.2026517 487 140   0.1471114 600      169   10.910265
##  800      192   0.1835336 242 139   0.1460606 100      159   10.264687
##  100      182   0.1739746 852 136   0.1429082 800      151    9.748225
##  900      181   0.1730186 149 134   0.1408066 900      135    8.715300
##  400      178   0.1701509 294 134   0.1408066   1      126    8.134280
##  600      165   0.1577242 620 133   0.1397558 300      100    6.455778
##  650      159   0.1519888 648 132   0.1387050 500      100    6.455778
##  700      156   0.1491210 740 132   0.1387050 200       89    5.745642
##  874      148   0.1414738 744 131   0.1376542 108       61    3.938025
##  892      147   0.1405179 877 131   0.1376542 400       57    3.679793
##  496      145   0.1386061 893 131   0.1376542 700       28    1.807618
##  93       142   0.1357384 232 130   0.1366034 574       26    1.678502
##  282      142   0.1357384 263 130   0.1366034 762       21    1.355713
##  500      142   0.1357384 286 130   0.1366034 379       19    1.226598
##  79       141   0.1347825 482 130   0.1366034 424       19    1.226598
## [1] "Nonzero Features"
##  [1]   1 100 200 300 400 500 600 700 800 900
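One way to read the tables is to check how many of the "Nonzero Features" show up in each top-15 list. The Python sketch below does this for the Accuracy and Knockoff columns of Experiment 1, copying the feature indices from that table. It assumes the "Nonzero Features" list is the ground-truth set of features with a nonzero effect; the helper `recovered` is hypothetical.

```python
# Assumed ground truth: the "Nonzero Features" printed for every experiment.
TRUE_FEATURES = {1, 100, 200, 300, 400, 500, 600, 700, 800, 900}

# Top-15 feature indices copied from the Experiment 1 table.
EXP1_ACCURACY_TOP15 = [300, 900, 800, 400, 409, 500, 706, 32, 100, 257,
                       386, 600, 611, 511, 482]
EXP1_KNOCKOFF_TOP15 = [800, 600, 900, 100, 500, 300, 1, 200, 400, 108,
                       574, 700, 424, 650, 139]


def recovered(top, truth):
    """Return the true features found in `top` and the fraction recovered."""
    hits = sorted(set(top) & truth)
    return hits, len(hits) / len(truth)


for name, top in [("Accuracy", EXP1_ACCURACY_TOP15),
                  ("Knockoff", EXP1_KNOCKOFF_TOP15)]:
    hits, rate = recovered(top, TRUE_FEATURES)
    print(f"{name}: {len(hits)} of {len(TRUE_FEATURES)} recovered ({rate:.0%})")
```

For Experiment 1 this gives 7 of 10 for the Accuracy ranking and 10 of 10 for the knockoff filter.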

The features listed above are then used to run a final analysis that applies both CBDA-SL and the knockoff filter; these are the ONLY features used in that analysis. A final summary of the accuracy of the overall procedure is obtained by applying the CBDA-SL object to the subset of subjects held out for prediction, and the resulting predictions are used to generate the confusion matrix. In short, we combine the CBDA-SL and knockoff filter algorithms in two stages: the first stage selects the top features, and the second stage uses those features to run a final predictive modeling step that can ultimately be tested for accuracy, sensitivity, ...
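As an illustration of the final evaluation step, the Python sketch below builds a confusion matrix from held-out predictions and derives accuracy, sensitivity and specificity from it. It assumes a binary outcome coded as 0/1; the labels and predictions are toy values, not results from these experiments, and the function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def summary_metrics(tp, fp, tn, fn):
    """Derive accuracy, sensitivity and specificity from the four counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else float("nan"),
        # Sensitivity: fraction of actual positives predicted as positive.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: fraction of actual negatives predicted as negative.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Toy held-out labels and predictions (hypothetical values).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
tp, fp, tn, fn = confusion_matrix(y_true, y_pred)
print(summary_metrics(tp, fp, tn, fn))
```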