Some useful information

This is a summary of a single experiment run with a LONI pipeline workflow file that performs 3000 independent jobs, each applying both the CBDA-SL and the knockoff filter feature mining strategies. Each experiment has a total of 9000 jobs and is uniquely identified by 6 input arguments: number of jobs [M], % of missing values [misValperc], min [Kcol_min] and max [Kcol_max] % for the Feature Sampling Range (FSR), and min [Nrow_min] and max [Nrow_max] % for the Subject Sampling Range (SSR).
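To make the six arguments concrete, the following Python sketch shows one way a single job could draw its random subsample of subjects and features. It is an illustration under stated assumptions, not the CBDA-SL implementation: the `ExperimentSpec` class, the `draw_subsample` helper, the uniform draw of each percentage, and the data dimensions in the usage lines are all hypothetical. Only the argument names and the Experiment 1 values come from this report.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSpec:
    """The six arguments that identify an experiment (names from the text)."""
    M: int             # number of jobs
    misValperc: float  # % of missing values
    Kcol_min: float    # FSR lower bound, % of features
    Kcol_max: float    # FSR upper bound, % of features
    Nrow_min: float    # SSR lower bound, % of subjects
    Nrow_max: float    # SSR upper bound, % of subjects


def draw_subsample(spec, n_subjects, n_features, rng):
    """Pick row and column indices for one job (assumed sampling scheme)."""
    # Assumption: each percentage is drawn uniformly within its range.
    col_pct = rng.uniform(spec.Kcol_min, spec.Kcol_max)
    row_pct = rng.uniform(spec.Nrow_min, spec.Nrow_max)
    k = max(1, round(n_features * col_pct / 100))
    n = max(1, round(n_subjects * row_pct / 100))
    # Sample without replacement so no subject or feature repeats.
    cols = sorted(rng.sample(range(n_features), k))
    rows = sorted(rng.sample(range(n_subjects), n))
    return rows, cols


# Argument values of Experiment 1 as reported in this document.
spec = ExperimentSpec(M=9000, misValperc=0, Kcol_min=5, Kcol_max=15,
                      Nrow_min=60, Nrow_max=80)

# Hypothetical data size, chosen only for illustration.
rows, cols = draw_subsample(spec, n_subjects=200, n_features=1000,
                            rng=random.Random(0))
print(len(rows), len(cols))
```

With these hypothetical dimensions, each job would see between 120 and 160 subjects and between 50 and 150 features.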

This document reports the final results, by experiment. See https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true for general documentation of the CBDA-SL project and the GitHub repository https://github.com/SOCR/CBDA for some of the code.

Features selected by both the knockoff filter and the CBDA-SL algorithms appear as spikes in the histograms below. The top features selected by each method are listed; the number of top features is set to 15 here.

## [1] EXPERIMENT 1
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          5         15         60         80 
## [1]  90 900
##  [1]   1 100 200 300 400 500 600 700 800 900

## [1]  300 1501
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
##  CBDA Frequency Density   Knockoff Density  
##  800  22        0.4843681 800      21.219598
##  700  18        0.3963012 900      12.442721
##  353  14        0.3082343 400      12.266479
##  200  13        0.2862175   1       6.520973
##  44   12        0.2642008 840       5.992245
##  249  12        0.2642008   8       3.489602
##  472  12        0.2642008 808       3.172365
##  877  12        0.2642008 200       2.361650
##  6    11        0.2421841 471       1.797674
##  96   11        0.2421841  34       1.691928
##  129  11        0.2421841 700       1.480437
##  321  11        0.2421841 415       1.374692
##  415  11        0.2421841 737       1.339443
##  500  11        0.2421841 100       1.268946
##  741  11        0.2421841 317       1.092704
## [1] "Nonzero Features"
##  [1]   1 100 200 300 400 500 600 700 800 900
## [1] "Top Features Selected across multiple experiments,shared between CBDA-SL and Knockoff filter"
##  [1] 800 700 353 200  44 249 472 877   6  96 129 321 415 500 741 900 400
## [18]   1 840   8 808 471  34 737 100 317
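The 26-feature list printed above matches what an order-preserving union of the two top-15 columns would give: the CBDA-SL features first, followed by the knockoff features not already present. The Python sketch below reproduces it from the table values. The `ordered_union` helper is a hypothetical name, and the code is a reconstruction from the printed output rather than the project's own code.

```python
def ordered_union(*ranked_lists):
    """Concatenate ranked lists, keeping the first occurrence of each item."""
    seen = set()
    merged = []
    for ranked in ranked_lists:
        for feature in ranked:
            if feature not in seen:
                seen.add(feature)
                merged.append(feature)
    return merged


# Top-15 features per method, copied from the table above.
cbda_top = [800, 700, 353, 200, 44, 249, 472, 877, 6, 96, 129, 321, 415, 500, 741]
knockoff_top = [800, 900, 400, 1, 840, 8, 808, 200, 471, 34, 700, 415, 737, 100, 317]

# "Nonzero Features" as printed above.
nonzero = [1, 100, 200, 300, 400, 500, 600, 700, 800, 900]

combined = ordered_union(cbda_top, knockoff_top)
knockoff_set = set(knockoff_top)
shared = [f for f in cbda_top if f in knockoff_set]
recovered = [f for f in nonzero if f in combined]

print(len(combined))  # 26
print(shared)         # [800, 700, 200, 415]
print(recovered)      # [1, 100, 200, 400, 500, 700, 800, 900]
```

Four features (800, 700, 200, 415) appear in both top-15 lists, and the combined list contains 8 of the 10 nonzero features; 300 and 600 are absent.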

The features listed above are then used to run a final analysis applying both the CBDA-SL and the knockoff filter. These are the ONLY features used in that analysis. A final summary of the accuracy of the overall procedure is obtained by applying the CBDA-SL object to the subset of subjects held out for prediction. The predictions are then used to generate the confusion matrix. In short, the procedure has two stages: the first combines the CBDA-SL and knockoff filter algorithms to select the top features, and the second uses those top features in a final predictive modeling step that can be tested for accuracy, sensitivity, …
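As a minimal sketch of that last step, the Python code below builds the confusion-matrix counts for a binary outcome and derives accuracy and sensitivity from them. The helper names and the label vectors are hypothetical and serve only to illustrate the arithmetic; they are not results from this experiment.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, tn, fn) for a binary outcome."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of held-out subjects classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(tp, fn):
    """Fraction of true positives that were predicted positive."""
    return tp / (tp + fn)


# Hypothetical held-out labels and predictions (illustration only).
actual = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0]

tp, fp, tn, fn = confusion_counts(actual, predicted)
print(tp, fp, tn, fn)            # 4 1 4 1
print(accuracy(tp, fp, tn, fn))  # 0.8
print(sensitivity(tp, fn))       # 0.8
```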