Some useful information

This is a summary of a set of experiments run with a LONI pipeline workflow file that performs 3000 independent jobs, each one applying the CBDA-SL and the knockoff filter feature mining strategies. Each experiment has a total of 9000 jobs and is uniquely identified by 6 input arguments: number of jobs [M], % of missing values [misValperc], min [Kcol_min] and max [Kcol_max] % for the FSR (Feature Sampling Range), and min [Nrow_min] and max [Nrow_max] % for the SSR (Subject Sampling Range).
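As a rough illustration of how the six arguments could drive a single job, the Python sketch below stores them in a small record and draws one subject subset and one feature subset within the SSR and FSR percentages. It is not the CBDA-SL code: the `ExperimentSpec` class, the `sample_job` helper, the uniform sampling without replacement, and the data set size (300 subjects, 1500 features) are all assumptions made for the example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSpec:
    """The six input arguments that identify one experiment."""
    M: int             # number of jobs
    misValperc: float  # % of missing values
    Kcol_min: float    # FSR lower bound, % of features
    Kcol_max: float    # FSR upper bound, % of features
    Nrow_min: float    # SSR lower bound, % of subjects
    Nrow_max: float    # SSR upper bound, % of subjects


def sample_job(spec, n_subjects, n_features, rng):
    """Draw one job's subsets (ASSUMED: uniform draws, without replacement)."""
    k_pct = rng.uniform(spec.Kcol_min, spec.Kcol_max)
    n_pct = rng.uniform(spec.Nrow_min, spec.Nrow_max)
    k = max(1, round(n_features * k_pct / 100))
    n = max(1, round(n_subjects * n_pct / 100))
    # 1-based indices, to match the feature numbering used in the report.
    cols = sorted(rng.sample(range(1, n_features + 1), k))
    rows = sorted(rng.sample(range(1, n_subjects + 1), n))
    return rows, cols


# Arguments of Experiment 1; the data set dimensions are hypothetical.
exp1 = ExperimentSpec(M=9000, misValperc=0, Kcol_min=5, Kcol_max=15,
                      Nrow_min=60, Nrow_max=80)
rows, cols = sample_job(exp1, n_subjects=300, n_features=1500,
                        rng=random.Random(0))
```

With these hypothetical dimensions, each job would see between 75 and 225 features and between 180 and 240 subjects.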

This document reports the final results for each experiment. See https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true for general documentation of the CBDA-SL project and GitHub https://github.com/SOCR/CBDA for some of the code.

Features selected by both the knockoff filter and the CBDA-SL algorithms appear as spikes in the histograms below. The top selected features are listed for each experiment; the cutoff is set to 15 here.
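The ranking behind such a top-15 list can be sketched in Python as a simple frequency count over the features each job selected. The `top_features` helper and the toy input are hypothetical. The source does not define the Density column, so the formula used here (100 × frequency / total number of selections) is an assumption; it does appear consistent with the reported values, since 13 selections out of 7070 would give 0.1838755.

```python
from collections import Counter


def top_features(selected_per_job, top=15):
    """Rank features by how often they were selected across jobs.

    Density is ASSUMED to be 100 * frequency / total number of selections.
    """
    counts = Counter(f for job in selected_per_job for f in job)
    total = sum(counts.values())
    # Descending frequency; ties broken by ascending feature index.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(feat, freq, 100.0 * freq / total) for feat, freq in ranked[:top]]


# Toy input: each inner list holds the features one job selected.
jobs = [[393, 49], [393, 607], [393, 49], [840]]
table = top_features(jobs, top=3)
for feat, freq, dens in table:
    print(f"{feat:>5} {freq:>3} {dens:.7f}")
```

Ties are ordered by feature index, which matches the ordering seen in the tables of this report.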

## [1] "C:/Users/simeonem/Documents/CBDA-SL/ExperimentsNov2016/NULL9000/NEW"
## [1] EXPERIMENT 1
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          5         15         60         80 
##  [1]    1  100  200  400  600  800 1000 1200 1400 1500

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
##  CBDA Frequency Density  
##  393  13        0.1838755
##  49   11        0.1555870
##  607  11        0.1555870
##  840  11        0.1555870
##  913  11        0.1555870
##  928  11        0.1555870
##  1104 11        0.1555870
##  1145 11        0.1555870
##  1307 11        0.1555870
##  1491 11        0.1555870
##  125  10        0.1414427
##  174  10        0.1414427
##  203  10        0.1414427
##  260  10        0.1414427
##  349  10        0.1414427
## [1] "EXPERIMENT" "1"         
## [1] "Nonzero Features"
##  [1]    1  100  200  400  600  800 1000 1200 1400 1500
##
## [1] EXPERIMENT 2
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0         15         30         60         80 
##  [1]    1  100  200  400  600  800 1000 1200 1400 1500

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
##  CBDA Frequency Density  
##  519  23        0.1387465
##  453  21        0.1266815
##  549  21        0.1266815
##  986  21        0.1266815
##  1257 21        0.1266815
##  652  20        0.1206491
##  10   19        0.1146166
##  174  19        0.1146166
##  867  19        0.1146166
##  970  19        0.1146166
##  1169 19        0.1146166
##  1422 19        0.1146166
##  168  18        0.1085842
##  176  18        0.1085842
##  319  18        0.1085842
## [1] "EXPERIMENT" "2"         
## [1] "Nonzero Features"
##  [1]    1  100  200  400  600  800 1000 1200 1400 1500

The features listed above are then used to run a final analysis applying both CBDA-SL and the knockoff filter. ONLY the features listed above are used for this analysis. The accuracy of the overall procedure is then summarized by applying the CBDA-SL object to the subset of subjects held out for prediction, and the resulting predictions are used to generate the confusion matrix. In short, the procedure has two stages: the first combines the CBDA-SL and knockoff filter algorithms to select the top features, and the second uses those top features to fit a final predictive model that can be tested for accuracy, sensitivity, ...
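To make the last step concrete, the Python sketch below builds a confusion matrix from held-out predictions and derives accuracy and sensitivity from it. It assumes a binary outcome coded 0/1, which the source does not state. The function names and the toy labels are hypothetical, and specificity is included only as a common companion metric.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary outcome (ASSUMED coding: 0/1)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def summarize(tp, fp, tn, fn):
    """Derive summary metrics from the four confusion matrix cells."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Toy held-out labels and predictions, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
cells = confusion_counts(y_true, y_pred)
metrics = summarize(*cells)
```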