Some useful information

This is a summary of a set of 2 experiments using a LONI pipeline workflow file that performs 3000 independent jobs, each one with the CBDA-SL and the knockoff filter feature mining strategies. Each experiment has a total of 9000 jobs and is uniquely identified by 6 input arguments: the number of jobs [M], the percentage of missing values [misValperc], the minimum [Kcol_min] and maximum [Kcol_max] percentages for the Feature Sampling Range (FSR), and the minimum [Nrow_min] and maximum [Nrow_max] percentages for the Subject Sampling Range (SSR).
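To make the six arguments concrete, here is a minimal sketch of how one job could draw its feature and subject subsets from the FSR and SSR. It is not the pipeline's code. The names `ExperimentSpec` and `sample_job` are hypothetical, the uniform draw of the sampling percentage is an assumption, and the dataset size (300 features, 1000 subjects) is illustrative only.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSpec:
    """The six input arguments that identify an experiment."""
    M: int              # number of jobs
    misValperc: float   # % of missing values
    Kcol_min: float     # min % of features sampled (FSR)
    Kcol_max: float     # max % of features sampled (FSR)
    Nrow_min: float     # min % of subjects sampled (SSR)
    Nrow_max: float     # max % of subjects sampled (SSR)


def sample_job(spec, n_features, n_subjects, rng):
    """Draw one job's feature and subject subsets (hypothetical helper)."""
    # Assumption: each job draws its sampling percentage uniformly
    # between the min and max of the corresponding range.
    k_pct = rng.uniform(spec.Kcol_min, spec.Kcol_max)
    n_pct = rng.uniform(spec.Nrow_min, spec.Nrow_max)
    k = max(1, round(n_features * k_pct / 100))
    n = max(1, round(n_subjects * n_pct / 100))
    # Sample 1-based indices without replacement.
    features = sorted(rng.sample(range(1, n_features + 1), k))
    subjects = sorted(rng.sample(range(1, n_subjects + 1), n))
    return features, subjects


# Argument values of Experiment 1; the dataset size is illustrative.
spec = ExperimentSpec(M=9000, misValperc=0, Kcol_min=5, Kcol_max=15,
                      Nrow_min=60, Nrow_max=80)
features, subjects = sample_job(spec, n_features=300, n_subjects=1000,
                                rng=random.Random(0))
print(len(features), len(subjects))
```

With these settings each job would see between 15 and 45 of the 300 features and between 600 and 800 of the 1000 subjects.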

This document reports the final results, by experiment. See https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true for general documentation of the CBDA-SL project and the GitHub repository https://github.com/SOCR/CBDA for some of the code.

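Features selected by both the knockoff filter and the CBDA-SL algorithm appear as spikes in the histograms shown below. For each experiment, the table lists the top features selected by each method, ranked by how often they were selected; the number of top features is set to 15 here.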
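The Density values in the tables are consistent with each feature's selection frequency expressed as a percentage of all selections. For Experiment 1, a frequency of 25 out of a total of 1386 gives 1.8037518. The total of 1386 is inferred from the table, not stated in the source. A minimal sketch of the ranking step follows, assuming the counts are available as a plain dictionary; the function name `top_features` is hypothetical, and this is not the project's code.

```python
def top_features(counts, k=15, total=None):
    """Rank features by selection frequency and attach a density in %.

    counts: dict mapping feature index -> selection frequency.
    total:  total number of selections; defaults to the sum of counts.
    """
    if total is None:
        total = sum(counts.values())
    # Sort by decreasing frequency; ties are broken by feature index,
    # which matches the ordering seen in the tables.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(feat, freq, 100.0 * freq / total) for feat, freq in ranked[:k]]


# First rows of the Experiment 1 CBDA table; the remaining counts are
# not listed in the report, so the inferred total is passed explicitly.
counts = {62: 11, 219: 11, 23: 13, 65: 12, 285: 20, 211: 25}
result = top_features(counts, k=3, total=1386)
for feat, freq, density in result:
    print(feat, freq, round(density, 7))
```

Passing `total` explicitly keeps the densities comparable with the table even when only part of the frequency counts is at hand.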

## [1] "C:/Users/simeonem/Documents/CBDA-SL/ExperimentsNov2016/NULL9000/NEW"
## [1] EXPERIMENT 1
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          5         15         60         80

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
##  CBDA Frequency Density   Knockoff Density  
##  211  25        1.8037518 157      13.940724
##  285  20        1.4430014  47       7.683864
##  23   13        0.9379509 245       6.915477
##  65   12        0.8658009 211       5.817783
##  62   11        0.7936508 219       4.720088
##  219  11        0.7936508  54       3.293085
##  4    10        0.7215007  97       3.293085
##  6    10        0.7215007 206       3.073546
##  10   10        0.7215007 128       2.963776
##  267  10        0.7215007 182       2.744237
##  51    9        0.6493506 299       2.744237
##  60    9        0.6493506 300       2.634468
##  133   9        0.6493506  42       2.414929
##  221   9        0.6493506 285       2.414929
##  298   9        0.6493506 236       2.085620
## 
## [1] EXPERIMENT 2
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0         15         30         60         80

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
##  CBDA Frequency Density   Knockoff Density  
##  285  22        0.6890072 157      20.588235
##  211  21        0.6576887 245       7.466063
##  20   19        0.5950517  47       6.674208
##  97   18        0.5637332 211       6.221719
##  157  18        0.5637332 219       4.638009
##  219  18        0.5637332  54       4.072398
##  237  18        0.5637332  97       4.072398
##  18   17        0.5324147 182       3.506787
##  137  17        0.5324147 206       3.506787
##  201  17        0.5324147 300       3.393665
##  226  17        0.5324147 299       2.941176
##  248  17        0.5324147 128       2.149321
##  296  17        0.5324147 285       1.809955
##  67   16        0.5010961 210       1.696833
##  100  16        0.5010961 296       1.696833
## [1] "Top Features Selected across multiple experiments, shared between CBDA-SL and Knockoff filter"
## [1] "Combined set of features selected across multiple experiments"
##  [1]   4   6  10  23  42  47  51  54  60  62  65  97 128 133 157 182 206
## [18] 211 219 245 267 285 299 300
## [1] "Top best features selected across multiple experiments"
## [1] 15
## [1] "Length of top best features selected across multiple experiments"
## [1] 24
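As an illustration of what "shared between CBDA-SL and Knockoff filter" means, the sketch below intersects the two top-15 lists of each experiment, copied from the tables above. It is only a set intersection on the printed values, not the code that produced this report, and it does not attempt to reproduce the combined set of 24 features printed above. The helper name `shared_features` is hypothetical.

```python
def shared_features(cbda_top, knockoff_top):
    """Return the features that appear in both top lists, sorted."""
    return sorted(set(cbda_top) & set(knockoff_top))


# Top-15 lists copied from the Experiment 1 table.
exp1_cbda = [211, 285, 23, 65, 62, 219, 4, 6, 10, 267, 51, 60, 133, 221, 298]
exp1_knockoff = [157, 47, 245, 211, 219, 54, 97, 206, 128, 182, 299, 300,
                 42, 285, 236]

# Top-15 lists copied from the Experiment 2 table.
exp2_cbda = [285, 211, 20, 97, 157, 219, 237, 18, 137, 201, 226, 248, 296,
             67, 100]
exp2_knockoff = [157, 245, 47, 211, 219, 54, 97, 182, 206, 300, 299, 128,
                 285, 210, 296]

shared_exp1 = shared_features(exp1_cbda, exp1_knockoff)
shared_exp2 = shared_features(exp2_cbda, exp2_knockoff)
shared_both = sorted(set(shared_exp1) & set(shared_exp2))

print(shared_exp1)   # [211, 219, 285]
print(shared_exp2)   # [97, 157, 211, 219, 285, 296]
print(shared_both)   # [211, 219, 285]
```

Features 211, 219 and 285 are in the top 15 of both methods in both experiments.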

The features listed above are then used to run a final analysis applying both the CBDA-SL and the knockoff filter. The ONLY features used for this analysis are the ones listed above. A final summary of the accuracy of the overall procedure is obtained by applying the CBDA-SL object to the subset of subjects held out for prediction. The predictions are then used to generate the confusion matrix. In short, the first stage combines the CBDA-SL and knockoff filter algorithms to select the top features. The second stage then uses those top features to run a final predictive modeling step that can ultimately be tested for accuracy, sensitivity, …
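The last step, from held-out predictions to a confusion matrix and summary measures, can be sketched as follows. This assumes a binary outcome coded 0/1, which the report does not state. The labels are made-up toy values, not results from the experiments, and the function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn}


def summary_metrics(cm):
    """Accuracy, sensitivity and specificity from a confusion matrix."""
    total = cm["tp"] + cm["tn"] + cm["fp"] + cm["fn"]
    return {
        "accuracy": (cm["tp"] + cm["tn"]) / total,
        "sensitivity": cm["tp"] / (cm["tp"] + cm["fn"]),  # true positive rate
        "specificity": cm["tn"] / (cm["tn"] + cm["fp"]),  # true negative rate
    }


# Toy held-out labels and predictions (illustrative only).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

cm = confusion_matrix(y_true, y_pred)
metrics = summary_metrics(cm)
print(cm)
print(metrics)
```

The sketch does not guard against a held-out set that lacks one of the two classes; in that case the sensitivity or specificity denominator would be zero.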