Some useful information

This is a summary of 1 experiment(s) run with a LONI pipeline workflow file that performs 3000 independent jobs, each one using the CBDA-SL and the knockoff filter feature mining strategies. Each experiment has a total of 9000 jobs and is uniquely identified by 6 input arguments: number of jobs [M], % of missing values [misValperc], min [Kcol_min] and max [Kcol_max] % for the Feature Sampling Range (FSR), and min [Nrow_min] and max [Nrow_max] % for the Subject Sampling Range (SSR).
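To make the six arguments concrete, here is a minimal Python sketch (the project itself is written in R) of how one job might draw its random subset of subjects and features. It assumes that each job samples a row percentage uniformly from [Nrow_min, Nrow_max] and a column percentage uniformly from [Kcol_min, Kcol_max]. The `ExperimentSpec` class, the `sample_job_subset` helper, and the data set size (300 subjects, 900 features) are hypothetical and used only for illustration. The argument values are those reported for Experiment 1 in the output below.

```python
from dataclasses import dataclass
import random


@dataclass(frozen=True)
class ExperimentSpec:
    """The six arguments that identify an experiment (names from the text)."""
    M: int             # number of jobs
    misValperc: float  # % of missing values
    Kcol_min: float    # FSR lower bound, % of features
    Kcol_max: float    # FSR upper bound, % of features
    Nrow_min: float    # SSR lower bound, % of subjects
    Nrow_max: float    # SSR upper bound, % of subjects


def sample_job_subset(spec, n_subjects, n_features, rng):
    """Draw one job's subject and feature indices (assumed sampling scheme)."""
    row_pct = rng.uniform(spec.Nrow_min, spec.Nrow_max)
    col_pct = rng.uniform(spec.Kcol_min, spec.Kcol_max)
    n_rows = max(1, round(n_subjects * row_pct / 100))
    n_cols = max(1, round(n_features * col_pct / 100))
    rows = sorted(rng.sample(range(n_subjects), n_rows))
    cols = sorted(rng.sample(range(n_features), n_cols))
    return rows, cols


# Argument values reported for Experiment 1; the data set size is hypothetical.
experiment_1 = ExperimentSpec(M=9000, misValperc=0, Kcol_min=5, Kcol_max=15,
                              Nrow_min=60, Nrow_max=80)
rows, cols = sample_job_subset(experiment_1, n_subjects=300, n_features=900,
                               rng=random.Random(0))
print(len(rows), len(cols))
```

With these settings every job would see between 60% and 80% of the subjects and between 5% and 15% of the features, so no single job works on the full data set.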

This document reports the final results, by experiment. See https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true for general documentation of the CBDA-SL project and the GitHub repository https://github.com/SOCR/CBDA for some of the code.

Features selected by both the knockoff filter and the CBDA-SL algorithm appear as spikes in the histograms below. The top features selected are listed for each experiment; the number of top features is set to 15 here.

## [1] "C:/Users/simeonem/Documents/CBDA-SL/ExperimentsNov2016/NULL9000/NEW"
## [1] EXPERIMENT 1
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          5         15         60         80

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
##  CBDA Frequency Density   Knockoff Density 
##  749  16        0.3254017 749      9.806157
##  285  15        0.3050641  32      5.074116
##  722  15        0.3050641 101      3.477765
##  32   14        0.2847265 324      3.078677
##  119  13        0.2643889 348      2.793615
##  63   12        0.2440513 526      2.508552
##  216  12        0.2440513 701      2.451539
##  315  12        0.2440513 250      2.052452
##  324  12        0.2440513 772      1.995439
##  546  12        0.2440513 471      1.938426
##  612  12        0.2440513 321      1.881414
##  790  12        0.2440513 368      1.824401
##  132  11        0.2237136 527      1.767389
##  268  11        0.2237136 653      1.767389
##  320  11        0.2237136 454      1.710376
## 
## [1] EXPERIMENT 2
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0         15         30         60         80

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
##  CBDA Frequency Density   Knockoff Density 
##  661  21        0.2114591 749      7.674944
##  699  20        0.2013896 324      3.837472
##  744  20        0.2013896 701      3.611738
##  271  19        0.1913201 348      3.386005
##  318  19        0.1913201 772      3.386005
##  324  19        0.1913201  32      3.160271
##  513  19        0.1913201 101      2.934537
##  296  18        0.1812506 321      2.708804
##  334  18        0.1812506 526      2.708804
##  345  18        0.1812506 454      2.483070
##  437  18        0.1812506  13      2.031603
##  749  18        0.1812506 177      2.031603
##  758  18        0.1812506 250      2.031603
##  30   17        0.1711811 388      2.031603
##  32   17        0.1711811 471      2.031603
## [1] "Top Features Selected across multiple experiments,shared between CBDA-SL and Knockoff filter"
## [1] "Combined set of features selected across multiple experiments"
##  [1]  32  63 101 119 132 216 250 268 285 315 320 321 324 348 368 454 471
## [18] 526 527 546 612 653 701 722 749 772 790
## [1] "Top best features selected across multiple experiments"
## [1] 15
## [1] "Length of top best features selected across multiple experiments"
## [1] 27
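The combined set can be cross-checked against the tables. The Python sketch below (Python is used here only for illustration) copies the two top-15 columns of the Experiment 1 table and computes their intersection and union. That the report builds the combined set as a plain union of these two columns is an assumption, although the union does reproduce the 27 indices printed above.

```python
# Top-15 feature indices for Experiment 1, copied from the table above.
cbda_top = [749, 285, 722, 32, 119, 63, 216, 315, 324, 546, 612, 790, 132, 268, 320]
knockoff_top = [749, 32, 101, 324, 348, 526, 701, 250, 772, 471, 321, 368, 527, 653, 454]

# Features that both methods rank in their top 15.
shared = sorted(set(cbda_top) & set(knockoff_top))

# Features that at least one method ranks in its top 15.
combined = sorted(set(cbda_top) | set(knockoff_top))

print(shared)         # [32, 324, 749]
print(len(combined))  # 27
```

Three features (32, 324 and 749) appear in both lists, which is why 15 + 15 candidates reduce to 27 distinct features.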

The features listed above are then used to run a final analysis that applies both CBDA-SL and the knockoff filter. These are the ONLY features used in that analysis. A final summary of the accuracy of the overall procedure is obtained by applying the CBDA-SL object to the subset of subjects held out for prediction, and the resulting predictions are used to generate the confusion matrix. In short, the procedure has two stages: the first combines the CBDA-SL and knockoff filter algorithms to select the top features, and the second uses those top features in a final predictive modeling step that can be tested for accuracy, sensitivity, …
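As an illustration of the final evaluation step, the following Python sketch builds a confusion matrix from held-out predictions and derives accuracy and sensitivity from it. It assumes a binary outcome coded as 0/1. The labels and predictions are made-up toy values, not results from this study, and the function names are hypothetical. Specificity is included only as a common companion metric.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FN, FP, TN for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    return {
        "TP": sum(1 for t, p in pairs if t == 1 and p == 1),
        "FN": sum(1 for t, p in pairs if t == 1 and p == 0),
        "FP": sum(1 for t, p in pairs if t == 0 and p == 1),
        "TN": sum(1 for t, p in pairs if t == 0 and p == 0),
    }


def summarize(cm):
    """Derive accuracy, sensitivity and specificity from the four counts."""
    total = cm["TP"] + cm["FN"] + cm["FP"] + cm["TN"]
    positives = cm["TP"] + cm["FN"]
    negatives = cm["TN"] + cm["FP"]
    return {
        "accuracy": (cm["TP"] + cm["TN"]) / total,
        "sensitivity": cm["TP"] / positives if positives else float("nan"),
        "specificity": cm["TN"] / negatives if negatives else float("nan"),
    }


# Toy held-out labels and predictions (hypothetical, for illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

cm = confusion_counts(y_true, y_pred)
metrics = summarize(cm)
print(cm)
print(metrics)
```

Keeping the counts separate from the derived metrics makes it easy to report the confusion matrix itself alongside any summary statistic computed from it.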