Some useful information

This is a summary of one experiment run with a LONI pipeline workflow file that performs 3000 independent jobs, each applying both the CBDA-SL and the knockoff filter feature mining strategies. Each experiment has a total of 9000 jobs and is uniquely identified by 6 input arguments: the number of jobs [M], the % of missing values [misValperc], the min [Kcol_min] and max [Kcol_max] % for the FSR (Feature Sampling Range), and the min [Nrow_min] and max [Nrow_max] % for the SSR (Subject Sampling Range).

This document reports the final results, by experiment. See https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true for general documentation of the CBDA-SL project and GitHub (https://github.com/SOCR/CBDA) for some of the code.

Features selected by both the knockoff filter and the CBDA-SL algorithms appear as spikes in the histograms below. The top features selected by each method are listed; the number of top features is set to 15 here.

## [1] EXPERIMENT 2
##      M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 2 9000          0       15       30       60       80

## [1] "Nonzero features - Signal"
##  [1]   1  30  60 100 130 160 200 230 260 300
## [1] "TABLE with CBDA-SL and Knockoff RESULTS"
##  CBDA Frequency Density   Knockoff KO_Frequency KO_Density
##  49   20        0.6228589 160      1305         24.5901639
##  59   18        0.5605730 130       920         17.3355945
##  64   18        0.5605730  30       608         11.4565668
##  213  18        0.5605730 260       507          9.5534200
##  254  18        0.5605730 200       502          9.4592048
##  57   17        0.5294301 230       286          5.3891087
##  190  17        0.5294301 300       212          3.9947239
##  232  17        0.5294301 100       176          3.3163746
##  242  17        0.5294301 273       164          3.0902581
##  285  17        0.5294301 214       131          2.4684379
##  35   16        0.4982871   1        97          1.8277746
##  75   16        0.4982871 222        84          1.5828151
##  132  16        0.4982871 142        63          1.1871114
##  153  16        0.4982871  25        44          0.8290936
##  159  16        0.4982871 258        35          0.6595063
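
As a quick cross-check of the table above, the sketch below compares each method's top-15 list with the nonzero (signal) features. The feature indices are copied from the output above; the helper name `recovered` is hypothetical and not part of the CBDA code base. It only looks at the top-15 lists, not at the full selection frequencies.

```python
# Feature indices copied from the report output above.
signal = {1, 30, 60, 100, 130, 160, 200, 230, 260, 300}
cbda_top = [49, 59, 64, 213, 254, 57, 190, 232, 242, 285, 35, 75, 132, 153, 159]
knockoff_top = [160, 130, 30, 260, 200, 230, 300, 100, 273, 214, 1, 222, 142, 25, 258]


def recovered(selected, truth):
    """Return the selected features that are true signal features, sorted.

    Hypothetical helper for illustration only.
    """
    return sorted(set(selected) & set(truth))


cbda_hits = recovered(cbda_top, signal)
knockoff_hits = recovered(knockoff_top, signal)

print("CBDA-SL top 15 in signal: ", cbda_hits)
print("Knockoff top 15 in signal:", knockoff_hits)
print("Signal features missed by knockoff top 15:",
      sorted(signal - set(knockoff_top)))
```

Run on the lists above, this shows which signal features each top-15 list contains and which ones the knockoff list misses.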

The features listed above are then used to run a final analysis that applies both CBDA-SL and the knockoff filter. ONLY the features listed above are used in this analysis. A final summary of the accuracy of the overall procedure is obtained by applying the CBDA-SL object to the subset of subjects held out for prediction. The predictions are then used to generate the confusion matrix. In short, the procedure has two stages: the first combines the CBDA-SL and knockoff filter algorithms to select the top features, and the second uses those features in a final predictive modeling step that can be tested for accuracy, sensitivity, …
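
The evaluation step described above can be sketched as follows for a binary outcome. This is a minimal illustration, not the CBDA-SL implementation: the function names are hypothetical, and the labels `y_true` and `y_pred` are made-up toy values standing in for the held-out subjects and their predictions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary outcome."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn


def ratio(num, den):
    """Safe division: return NaN when the denominator is zero."""
    return num / den if den else float("nan")


def summarize(y_true, y_pred, positive=1):
    """Derive accuracy, sensitivity and specificity from the confusion matrix."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
    }


# Toy labels (assumption): NOT results from the experiment above.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

metrics = summarize(y_true, y_pred)
print(metrics)
```

In the actual workflow, `y_pred` would come from the model fitted on the selected features and `y_true` from the held-out subjects.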