Some useful information

This is a summary of a set of experiments (two are reported below) run with a LONI pipeline workflow file that performs 3000 independent jobs, each one applying both the CBDA-SL and the knockoff filter feature mining strategies. Each experiment has a total of 9000 jobs and is uniquely identified by 6 input arguments: number of jobs [M], % of missing values [misValperc], min [Kcol_min] and max [Kcol_max] % for the Feature Sampling Range (FSR), and min [Nrow_min] and max [Nrow_max] % for the Subject Sampling Range (SSR).
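As a rough illustration of how the six arguments could drive a single job, the sketch below draws one random subset of subjects and features from the FSR and SSR percentage ranges. It is not taken from the CBDA code base: the `ExperimentSpec` class, the `draw_subsample` helper, the uniform draw of the percentages, and the 300 x 300 data dimensions are all assumptions made for the example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSpec:
    """The six input arguments that identify an experiment."""
    M: int              # number of jobs
    misValperc: float   # % of missing values
    Kcol_min: float     # min % of features sampled (FSR)
    Kcol_max: float     # max % of features sampled (FSR)
    Nrow_min: float     # min % of subjects sampled (SSR)
    Nrow_max: float     # max % of subjects sampled (SSR)


def draw_subsample(spec, n_subjects, n_features, rng):
    """Draw one job's random subset of subject rows and feature columns.

    Assumption: the sampled percentage is uniform within each range.
    """
    col_frac = rng.uniform(spec.Kcol_min, spec.Kcol_max) / 100.0
    row_frac = rng.uniform(spec.Nrow_min, spec.Nrow_max) / 100.0
    k = max(1, round(col_frac * n_features))
    n = max(1, round(row_frac * n_subjects))
    cols = sorted(rng.sample(range(1, n_features + 1), k))  # 1-based indices
    rows = sorted(rng.sample(range(1, n_subjects + 1), n))
    return rows, cols


# Argument values of Experiment 1; the data dimensions are hypothetical.
spec1 = ExperimentSpec(M=9000, misValperc=0, Kcol_min=5, Kcol_max=15,
                       Nrow_min=60, Nrow_max=80)
rng = random.Random(0)
rows, cols = draw_subsample(spec1, n_subjects=300, n_features=300, rng=rng)
print(len(rows), "subjects and", len(cols), "features in this job")
```

With these settings each job would see between 5% and 15% of the features and between 60% and 80% of the subjects.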

This document reports the final results, by experiment. See https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true for general documentation of the CBDA-SL project, and the GitHub repository https://github.com/SOCR/CBDA for some of the code.

Features selected by both the knockoff filter and the CBDA-SL algorithms appear as spikes in the histograms below. For each experiment, the top features selected by each method are listed; the number of top features is set to 15 here.
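The Frequency and Density columns in the tables below can be read as a count per feature and that count as a percentage of all counted selections. The sketch below shows this tabulation on toy data. The `top_features` helper and the tie-breaking by feature index are assumptions, and the total of 1770 used in the final check is inferred from the Experiment 1 table (25 selections at a density of 1.4124294%), not stated in the report.

```python
from collections import Counter


def top_features(selected, top=15):
    """Return (feature, frequency, density %) for the most frequent features.

    `selected` is a flat list of feature indices, one entry per selection.
    Ties are broken by ascending feature index (an assumption).
    """
    counts = Counter(selected)
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(feat, freq, 100.0 * freq / total) for feat, freq in ranked[:top]]


# Toy input: feature 130 selected 5 times, 260 three times, 4 twice.
toy = [130] * 5 + [260] * 3 + [4] * 2
for feat, freq, dens in top_features(toy, top=3):
    print(feat, freq, round(dens, 4))

# Arithmetic check against the first CBDA row of Experiment 1,
# using the inferred total of 1770 selections.
density_130 = 100.0 * 25 / 1770
print(round(density_130, 7))
```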

## [1] EXPERIMENT 1
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          5         15         60         80 
##  [1]   1  30  60 100 130 160 200 230 260 300

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "1"         
##  CBDA Frequency Density   Knockoff Density  
##  130  25        1.4124294 160      17.038778
##  260  23        1.2994350 130      13.795535
##  273  15        0.8474576  30      10.975323
##  105  13        0.7344633 260       9.071680
##  95   12        0.6779661 200       9.048179
##  253  11        0.6214689 230       5.334900
##  297  11        0.6214689 300       4.770858
##  4    10        0.5649718 273       3.901293
##  18   10        0.5649718 214       3.807286
##  33   10        0.5649718 100       3.713278
##  47   10        0.5649718 222       2.655699
##  58   10        0.5649718   1       2.185664
##  87   10        0.5649718 142       1.997650
##  171  10        0.5649718  25       1.645123
##  190  10        0.5649718 129       1.480611
## [1] "Nonzero Features"
##  [1]   1  30  60 100 130 160 200 230 260 300
## 
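One way to read the Experiment 1 table is to count how many of the listed nonzero features appear in each method's top 15. The sketch below does this with the values copied from the table above. It assumes that "Nonzero Features" denotes the features carrying true signal, which the report does not state explicitly, and the `recovered` helper is introduced only for illustration.

```python
# Values copied from the Experiment 1 output above.
nonzero = {1, 30, 60, 100, 130, 160, 200, 230, 260, 300}
cbda_top = [130, 260, 273, 105, 95, 253, 297, 4, 18, 33, 47, 58, 87, 171, 190]
knockoff_top = [160, 130, 30, 260, 200, 230, 300, 273, 214, 100, 222, 1, 142,
                25, 129]


def recovered(top, truth):
    """Return the features in `top` that also belong to `truth`, sorted."""
    return sorted(set(top) & truth)


cbda_hits = recovered(cbda_top, nonzero)
knockoff_hits = recovered(knockoff_top, nonzero)
print("CBDA-SL:", cbda_hits)
print("Knockoff:", knockoff_hits)
print("Missed by both:", sorted(nonzero - set(cbda_hits) - set(knockoff_hits)))
```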
## [1] EXPERIMENT 2
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0         15         30         60         80 
##  [1]   1  30  60 100 130 160 200 230 260 300

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "2"         
##  CBDA Frequency Density   Knockoff Density   
##  130  33        0.9780676 160      24.5901639
##  260  28        0.8298755 130      17.3355945
##  217  23        0.6816835  30      11.4565668
##  264  23        0.6816835 260       9.5534200
##  297  22        0.6520451 200       9.4592048
##  299  21        0.6224066 230       5.3891087
##  52   20        0.5927682 300       3.9947239
##  122  20        0.5927682 100       3.3163746
##  188  18        0.5334914 273       3.0902581
##  206  18        0.5334914 214       2.4684379
##  225  18        0.5334914   1       1.8277746
##  43   17        0.5038530 222       1.5828151
##  118  17        0.5038530 142       1.1871114
##  128  17        0.5038530  25       0.8290936
##  300  17        0.5038530 258       0.6595063
## [1] "Nonzero Features"
##  [1]   1  30  60 100 130 160 200 230 260 300
## [1] "Top Features Selected across multiple experiments,shared between CBDA-SL and Knockoff filter"
##  [1] 130 260 273 105 217 264  95 297 299 253  52 122 160  30 200 230 300
## [18] 100

The features listed above are then used to run a final analysis that applies both CBDA-SL and the knockoff filter; these are the ONLY features used in that analysis. The accuracy of the overall procedure is summarized by applying the CBDA-SL object to the subset of subjects held out for prediction, and the resulting predictions are used to generate the confusion matrix. In short, the CBDA-SL and knockoff filter algorithms are combined in a first stage to select the top features. The second stage then uses those top features in a final predictive modeling step that can be tested for accuracy, sensitivity, …
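The evaluation step can be sketched as follows for a binary outcome: build the confusion matrix from held-out labels and predictions, then derive the summary metrics. The labels and predictions are made-up toy values, not results from these experiments, and the helper names are hypothetical. Specificity is included as one plausible additional metric; the report does not list the full set.

```python
def confusion_matrix(actual, predicted, positive=1):
    """Count TP, FP, TN, FN for a binary outcome."""
    cm = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            cm["TP" if a == positive else "FP"] += 1
        else:
            cm["FN" if a == positive else "TN"] += 1
    return cm


def summarize(cm):
    """Derive accuracy, sensitivity and specificity from the counts."""
    total = sum(cm.values())
    pos = cm["TP"] + cm["FN"]   # actual positives
    neg = cm["TN"] + cm["FP"]   # actual negatives
    return {
        "accuracy": (cm["TP"] + cm["TN"]) / total if total else float("nan"),
        "sensitivity": cm["TP"] / pos if pos else float("nan"),
        "specificity": cm["TN"] / neg if neg else float("nan"),
    }


# Toy held-out labels and predictions (illustrative only).
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
cm = confusion_matrix(actual, predicted)
metrics = summarize(cm)
print(cm)
print(metrics)
```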