Some useful information

This is a summary of a set of experiments using a LONI pipeline workflow file that performs 3000 independent jobs, each applying both the CBDA-SL and the knockoff filter feature mining strategies. Each experiment has a total of 9000 jobs and is uniquely identified by 6 input arguments: the number of jobs [M], the percentage of missing values [misValperc], the minimum [Kcol_min] and maximum [Kcol_max] percentages for the FSR (Feature Sampling Range), and the minimum [Nrow_min] and maximum [Nrow_max] percentages for the SSR (Subject Sampling Range).
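
To make the two sampling ranges concrete, here is a minimal Python sketch of how a single job might draw its subsample. It is an assumption, not taken from the CBDA-SL code, that each job picks a feature percentage uniformly between Kcol_min and Kcol_max and a subject percentage uniformly between Nrow_min and Nrow_max, and then samples that many columns and rows without replacement. The function name `sample_job` and the data dimensions in the example are hypothetical.

```python
import random

def sample_job(n_subjects, n_features, kcol_min, kcol_max, nrow_min, nrow_max, rng=None):
    """Hypothetical sketch: draw the subjects (rows) and features (columns) for one job.

    Assumption: percentages are drawn uniformly within [min, max]; the actual
    CBDA-SL workflow may sample differently.
    """
    rng = rng or random.Random()
    # Percentage of features to keep, within the Feature Sampling Range (FSR)
    kcol_pct = rng.uniform(kcol_min, kcol_max)
    # Percentage of subjects to keep, within the Subject Sampling Range (SSR)
    nrow_pct = rng.uniform(nrow_min, nrow_max)
    # Convert percentages to counts, keeping at least one row and one column
    n_cols = max(1, round(n_features * kcol_pct / 100))
    n_rows = max(1, round(n_subjects * nrow_pct / 100))
    # Sample indices without replacement; sorted only for readability
    cols = sorted(rng.sample(range(n_features), n_cols))
    rows = sorted(rng.sample(range(n_subjects), n_rows))
    return rows, cols

# Experiment 1 ranges (FSR 5-15%, SSR 60-80%) on a hypothetical
# data set with 1000 subjects and 100 features
rows, cols = sample_job(1000, 100, 5, 15, 60, 80, random.Random(0))
print(len(rows), len(cols))
```

Repeating this draw M times, with a fresh feature and subject subset for each job, is what lets features be ranked by how often they are selected across jobs.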

This document reports the final results for each experiment. See https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true for general documentation of the CBDA-SL project, and the GitHub repository https://github.com/SOCR/CBDA for some of the code.

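Features selected by both the knockoff filter and the CBDA-SL algorithms appear as spikes in the histograms below. The tables list the top features selected by each method (15 here): the CBDA and Knockoff columns give feature indices, Frequency gives how many times CBDA-SL selected each feature, and each Density column expresses selection frequency as a percentage of all selections made by that method.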
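
The ranking behind the tables can be sketched as a simple frequency count. The Python below assumes each job returns a list of selected feature indices. It counts how often each feature was selected across jobs, expresses each count as a percentage of all selections, and keeps the top `top_k` features. This matches how Density relates to Frequency in the CBDA-SL tables (for example, 16 and 8 selections map to densities of about 3.32 and 1.66 in Experiment 1). The function name `rank_features` and the example selections are hypothetical.

```python
from collections import Counter

def rank_features(selections_per_job, top_k=15):
    """Hypothetical sketch: rank features by how often they were selected.

    selections_per_job: iterable of lists, one list of selected feature
    indices per job.
    Returns (feature, frequency, density) tuples, where density is the
    frequency expressed as a percentage of all selections.
    """
    counts = Counter()
    for selected in selections_per_job:
        counts.update(selected)
    total = sum(counts.values())
    # most_common orders by count; equal counts keep first-encountered order
    ranked = counts.most_common(top_k)
    return [(feat, freq, 100.0 * freq / total) for feat, freq in ranked]

# Hypothetical selections from four jobs
jobs = [[41, 19, 37], [41, 37], [41, 76], [19, 41]]
for feat, freq, dens in rank_features(jobs, top_k=3):
    print(feat, freq, round(dens, 2))
```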

## [1] "C:/Users/simeonem/Documents/CBDA-SL/ExperimentsNov2016/NULL9000/NEW"
## [1] EXPERIMENT 1
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          5         15         60         80

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
##  CBDA Frequency Density  Knockoff Density  
##  41   16        3.319502 49       11.173184
##  19   15        3.112033 51        8.491620
##  37   14        2.904564 41        7.709497
##  76   10        2.074689 63        5.251397
##  26    9        1.867220  7        5.027933
##  34    9        1.867220  1        4.972067
##  50    9        1.867220 37        3.463687
##  73    9        1.867220 14        3.072626
##  2     8        1.659751 55        2.960894
##  7     8        1.659751 44        2.849162
##  8     8        1.659751 77        2.625698
##  13    8        1.659751 50        2.402235
##  49    8        1.659751  8        2.234637
##  84    8        1.659751 39        1.620112
##  90    8        1.659751 64        1.620112
## [1] EXPERIMENT 2
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0         15         30         60         80

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
##  CBDA Frequency Density  Knockoff Density  
##  41   24        2.203857 49       24.142012
##  15   19        1.744720 51       11.597633
##  19   18        1.652893 41        9.704142
##  7    17        1.561065  7        8.047337
##  37   17        1.561065 63        5.443787
##  90   17        1.561065  1        5.325444
##  8    16        1.469238 77        4.260355
##  20   16        1.469238 50        3.905325
##  59   16        1.469238 55        3.431953
##  63   16        1.469238 14        2.958580
##  17   15        1.377410 44        2.603550
##  66   15        1.377410  8        2.366864
##  84   15        1.377410 37        2.248521
##  2    14        1.285583 39        1.420118
##  26   14        1.285583 33        1.183432
## [1] "Top Features Selected across multiple experiments, shared between CBDA-SL and Knockoff filter"
## [1] "Combined set of features selected across multiple experiments"
##  [1]  1  2  7  8 13 14 15 19 26 34 37 41 44 49 50 51 55 63 73 76 77
## [1] "Top best features selected across multiple experiments"
## [1] 15
## [1] "Length of top best features selected across multiple experiments"
## [1] 21

The 21 features listed above are then used in a final analysis that applies both CBDA-SL and the knockoff filter; these are the ONLY features used in that analysis. The accuracy of the overall procedure is assessed by applying the resulting CBDA-SL object to the subset of subjects held out for prediction, and the predictions are then used to generate the confusion matrix. In short, the procedure has two stages: first, CBDA-SL and the knockoff filter are combined to select the top features; second, those top features are used in a final predictive modeling step whose results can be tested for accuracy, sensitivity, …
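
The final evaluation step can be illustrated with a small Python sketch that builds a 2×2 confusion matrix from held-out predictions and derives accuracy, sensitivity, and specificity. It assumes a binary outcome coded 0/1 with 1 as the positive class. The function name `confusion_summary` and the example labels are hypothetical, and the sketch does not reproduce the CBDA-SL prediction step itself.

```python
def confusion_summary(y_true, y_pred, positive=1):
    """Hypothetical sketch: confusion matrix and basic metrics for a binary outcome."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if truth == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    n = tp + fp + tn + fn
    return {
        # Rows: true class (negative, positive); columns: predicted class
        "matrix": [[tn, fp], [fn, tp]],
        "accuracy": (tp + tn) / n,
        # Guard against division by zero when a class is absent
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }

# Hypothetical held-out labels and predictions
summary = confusion_summary([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 0, 1])
print(summary)
```

Sensitivity here is the fraction of true positives among all actual positives, and specificity the fraction of true negatives among all actual negatives; other summaries can be read off the same matrix.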