Some useful information

This is a summary of a set of 3 experiments run with a LONI Pipeline workflow file that performs 3000 independent jobs, each applying both the CBDA-SL and the knockoff filter feature mining strategies. Each experiment has a total of 9000 jobs and is uniquely identified by 6 input arguments: the number of jobs [M], the % of missing values [misValperc], the min [Kcol_min] and max [Kcol_max] % for the Feature Sampling Range (FSR), and the min [Nrow_min] and max [Nrow_max] % for the Subject Sampling Range (SSR).
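
To make the six arguments concrete, here is a minimal Python sketch of how one job might draw its subset of subjects and features from the FSR and SSR ranges. The argument names come from this report; the `Experiment` class, the `draw_job_sample` helper, the uniform draws, and the 300 x 300 data size are assumptions made for illustration and are not taken from the CBDA code.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    """The six arguments that identify an experiment (names from the report)."""
    M: int             # number of jobs
    misValperc: float  # % of missing values
    Kcol_min: float    # FSR lower bound, % of features
    Kcol_max: float    # FSR upper bound, % of features
    Nrow_min: float    # SSR lower bound, % of subjects
    Nrow_max: float    # SSR upper bound, % of subjects


def draw_job_sample(exp, n_subjects, n_features, rng):
    """Pick rows and columns for one job (hypothetical helper).

    Assumption: the percentage is drawn uniformly within each range and
    the indices are sampled without replacement.
    """
    col_pct = rng.uniform(exp.Kcol_min, exp.Kcol_max)
    row_pct = rng.uniform(exp.Nrow_min, exp.Nrow_max)
    k = max(1, round(n_features * col_pct / 100))
    n = max(1, round(n_subjects * row_pct / 100))
    cols = sorted(rng.sample(range(1, n_features + 1), k))
    rows = sorted(rng.sample(range(1, n_subjects + 1), n))
    return rows, cols


# Experiment 1 settings from the report; the data size is an assumption.
exp1 = Experiment(M=9000, misValperc=0, Kcol_min=1, Kcol_max=5,
                  Nrow_min=30, Nrow_max=60)
rows, cols = draw_job_sample(exp1, n_subjects=300, n_features=300,
                             rng=random.Random(0))
```

Under these assumptions, each job in Experiment 1 would see between 3 and 15 features and between 90 and 180 subjects.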

This document reports the final results, by experiment. See https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true for general documentation of the CBDA-SL project, and the GitHub repository https://github.com/SOCR/CBDA for some of the code.

Features selected by both the knockoff filter and the CBDA-SL algorithm appear as spikes in the histograms. The tables below list the top features selected by each method; the number of top features is set to 15 here.
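
The Frequency and Density columns in the tables can be read as a ranking of selection counts. The sketch below is one way to build such a ranking in Python; it is not the project's code. The `top_features` helper and the toy `jobs` list are hypothetical. The density formula (100 x frequency / total selections) is an inference from the reported numbers: in Experiment 1, a Frequency of 15 with a Density of 2.7027027 matches 100 x 15 / 555.

```python
from collections import Counter


def top_features(selections, top=15):
    """Rank features by how often they were selected across jobs.

    selections: one list of selected feature indices per job.
    Returns (feature, frequency, density) tuples, where density is the
    frequency as a percentage of all selections (assumed definition).
    Ties are broken by ascending feature index.
    """
    counts = Counter(f for job in selections for f in job)
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(feat, freq, 100.0 * freq / total) for feat, freq in ranked[:top]]


# Toy input, not real results: four jobs and the features each selected.
jobs = [[300, 36], [300, 151], [300], [36, 49]]
table = top_features(jobs, top=3)
for feat, freq, dens in table:
    print(f"{feat:>5} {freq:>9} {dens:>10.7f}")
```

Sorting ties by feature index mirrors the order seen in the tables, where features with equal Frequency appear in ascending order.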

## [1] EXPERIMENT 1
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          1          5         30         60 
##  [1]   1  30  60 100 130 160 200 230 260 300

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "1"         
##  CBDA Frequency Density   Knockoff Density  
##  300  15        2.7027027  60      5.5408971
##  36    7        1.2612613 160      4.9252419
##  151   7        1.2612613 300      4.3975374
##  49    6        1.0810811  30      3.2248607
##  104   6        1.0810811 130      2.4626209
##  18    5        0.9009009 200      2.3746702
##  32    5        0.9009009   1      2.0815010
##  62    5        0.9009009 100      2.0228672
##  68    5        0.9009009 260      1.7003811
##  76    5        0.9009009  56      1.5537965
##  122   5        0.9009009 230      1.4658458
##  181   5        0.9009009 183      1.1726766
##  182   5        0.9009009 232      1.0847259
##  209   5        0.9009009 203      0.9381413
##  263   5        0.9009009 107      0.9088244
## [1] "Nonzero Features"
##  [1]   1  30  60 100 130 160 200 230 260 300
## 
## [1] EXPERIMENT 2
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          5         15         30         60 
##  [1]   1  30  60 100 130 160 200 230 260 300

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "2"
##  CBDA Frequency Density   Knockoff Density   
##  300  35        2.1021021  60      16.3090129
##  30   20        1.2012012 300      12.5751073
##  130  11        0.6606607 160      11.5879828
##  73   10        0.6006006  30       8.3690987
##  184  10        0.6006006 200       4.6351931
##  193  10        0.6006006 130       3.2618026
##  274  10        0.6006006  56       3.1759657
##  31    9        0.5405405 100       2.6609442
##  36    9        0.5405405   1       2.3605150
##  56    9        0.5405405 260       1.4592275
##  82    9        0.5405405 183       1.0729614
##  95    9        0.5405405 203       1.0729614
##  106   9        0.5405405 230       1.0729614
##  118   9        0.5405405 232       1.0300429
##  122   9        0.5405405  70       0.8154506
## [1] "Nonzero Features"
##  [1]   1  30  60 100 130 160 200 230 260 300
## 
## [1] EXPERIMENT 3
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0         15         30         30         60 
##  [1]   1  30  60 100 130 160 200 230 260 300

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "3"
##  CBDA Frequency Density   Knockoff Density   
##  300  41        1.1398388  60      21.1706975
##  223  21        0.5838198 300      15.5976165
##  39   20        0.5560189 160      13.0389064
##  59   20        0.5560189  30       7.0802664
##  38   19        0.5282180 200       4.2411497
##  82   19        0.5282180  56       2.8040659
##  100  19        0.5282180 130       2.4886085
##  207  19        0.5282180   1       2.3133544
##  245  19        0.5282180 100       2.2783035
##  266  19        0.5282180 260       1.6123379
##  279  19        0.5282180 230       1.2968805
##  17   18        0.5004170 203       0.8412198
##  36   18        0.5004170 232       0.6309148
##  231  18        0.5004170 183       0.5958640
##  260  18        0.5004170 205       0.5958640
## [1] "Nonzero Features"
##  [1]   1  30  60 100 130 160 200 230 260 300
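
One way to read these tables is to compare each method's top 15 against the "Nonzero Features" list. The Python sketch below does this for the Experiment 3 table, copying the feature indices from the output above. Treating the "Nonzero Features" as the set of truly informative features is an assumption; the report does not say so explicitly.

```python
# Feature indices copied from the Experiment 3 output above.
nonzero = {1, 30, 60, 100, 130, 160, 200, 230, 260, 300}
cbda_top = [300, 223, 39, 59, 38, 82, 100, 207, 245, 266, 279, 17, 36, 231, 260]
knockoff_top = [60, 300, 160, 30, 200, 56, 130, 1, 100, 260, 230, 203, 232, 183, 205]

# Features from the nonzero list that each method placed in its top 15.
cbda_hits = sorted(nonzero & set(cbda_top))
knockoff_hits = sorted(nonzero & set(knockoff_top))

# Features that appear in both top-15 lists.
both = sorted(set(cbda_top) & set(knockoff_top))

print("CBDA-SL hits:  ", cbda_hits)
print("Knockoff hits: ", knockoff_hits)
print("Selected by both:", both)
```

For this experiment, the knockoff filter's top 15 contains all 10 nonzero features, the CBDA-SL top 15 contains 3 of them (100, 260, 300), and those same 3 are the only features present in both lists.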

The features listed above are then used to run a final analysis that applies both CBDA-SL and the knockoff filter. These are the ONLY features used in that analysis. The accuracy of the overall procedure is then summarized by applying the CBDA-SL object to the subset of subjects held out for prediction, and the resulting predictions are used to generate the confusion matrix. In short, the procedure has two stages: the first combines the CBDA-SL and knockoff filter algorithms to select the top features, and the second uses those top features in a final predictive modeling step that can be tested for accuracy, sensitivity, ...
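
As an illustration of the last step, the Python sketch below turns held-out predictions into a confusion matrix and summary metrics. It assumes a binary outcome coded 0/1. The labels are toy values, not results from these experiments, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    """Accuracy, sensitivity and specificity from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Toy held-out labels and predictions (illustrative only).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = summarize(y_true, y_pred)
print(confusion_counts(y_true, y_pred), metrics)
```

The sketch assumes both classes occur among the held-out subjects; otherwise the sensitivity or specificity denominator would be zero.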