Some useful information

This is a summary of a set of experiments run with a LONI pipeline workflow file that performs 3000 independent jobs, each applying the CBDA-SL and the knockoff filter feature mining strategies. Each experiment has a total of 9000 jobs and is uniquely identified by 6 input arguments: number of jobs [M], % of missing values [misValperc], min [Kcol_min] and max [Kcol_max] % for the Feature Sampling Range (FSR), and min [Nrow_min] and max [Nrow_max] % for the Subject Sampling Range (SSR).
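
As a rough illustration of how the six arguments could drive a single job, the sketch below draws one random subset of subjects and features within the SSR and FSR bounds. It is written in Python purely for illustration and is not the project's code. The `Experiment` class, the `sample_job` helper, the uniform draw of the percentages, and the data dimensions (`n_rows=300`, `n_cols=1500`) are all assumptions introduced here.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    """The six input arguments that identify one experiment."""
    M: int              # number of jobs
    misValperc: float   # % of missing values
    Kcol_min: float     # FSR lower bound, % of features
    Kcol_max: float     # FSR upper bound, % of features
    Nrow_min: float     # SSR lower bound, % of subjects
    Nrow_max: float     # SSR upper bound, % of subjects


def sample_job(exp, n_rows, n_cols, rng):
    """Draw the subject (row) and feature (column) indices for one job.

    Assumption: the sampled percentage is uniform within each range.
    """
    col_pct = rng.uniform(exp.Kcol_min, exp.Kcol_max)
    row_pct = rng.uniform(exp.Nrow_min, exp.Nrow_max)
    k = max(1, round(n_cols * col_pct / 100))
    n = max(1, round(n_rows * row_pct / 100))
    cols = sorted(rng.sample(range(1, n_cols + 1), k))  # 1-based indices
    rows = sorted(rng.sample(range(1, n_rows + 1), n))
    return rows, cols


# Parameters of EXPERIMENT 1; the data dimensions are hypothetical.
exp1 = Experiment(M=9000, misValperc=0, Kcol_min=5, Kcol_max=15,
                  Nrow_min=60, Nrow_max=80)
rng = random.Random(0)
rows, cols = sample_job(exp1, n_rows=300, n_cols=1500, rng=rng)
print(len(rows), "subjects and", len(cols), "features in this job")
```

Repeating `sample_job` once per job would give each of the M jobs its own subset of subjects and features.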

This document reports the final results, by experiment. See https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true for general documentation of the CBDA-SL project, and the GitHub repository https://github.com/SOCR/CBDA for some of the code.

Features selected by both the knockoff filter and the CBDA-SL algorithms appear as spikes in the histograms below. The tables list the top 15 features selected by each method.
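
The Frequency and Density columns in the tables can be read as a tally of how often each feature was selected across jobs. The Density values appear consistent with 100 × Frequency divided by the total number of selections; this is inferred from the printed numbers, not stated by the authors. The Python sketch below shows that tally under that assumption. The `frequency_table` helper and the toy `jobs` input are hypothetical.

```python
from collections import Counter


def frequency_table(selections, top=15):
    """Rank features by how often they were selected across jobs.

    Returns (feature, frequency, density) tuples, where density is
    assumed to be the percentage of all selections.
    """
    counts = Counter(f for job in selections for f in job)
    total = sum(counts.values())
    # Highest frequency first; ties broken by feature index.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(feat, freq, 100.0 * freq / total) for feat, freq in ranked[:top]]


# Toy input: the features selected by four hypothetical jobs.
jobs = [[1, 400, 7], [1, 400], [1, 22], [400, 1]]
table = frequency_table(jobs, top=3)
for feat, freq, dens in table:
    print(f"{feat:>5} {freq:>3} {dens:.4f}")
```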

## [1] EXPERIMENT 1
##      M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 1 9000          0        5       15       60       80

## [1] "Nonzero features - Signal"
##  [1]    1  100  200  400  600  800 1000 1200 1400 1500
## [1] "TABLE with CBDA-SL and Knockoff RESULTS"
##  CBDA Frequency Density   Knockoff KO_Frequency KO_Density
##  1    39        0.4757838  400     164          7.617278  
##  400  22        0.2683909    1     161          7.477938  
##  1161 13        0.1585946 1000     135          6.270320  
##  337  12        0.1463950  800     132          6.130980  
##  871  12        0.1463950  322      93          4.319554  
##  1135 12        0.1463950  251      63          2.926150  
##  1246 12        0.1463950 1500      61          2.833256  
##  226  11        0.1341954  384      58          2.693915  
##  282  11        0.1341954  537      53          2.461681  
##  413  11        0.1341954   25      49          2.275894  
##  476  11        0.1341954 1305      47          2.183000  
##  480  11        0.1341954 1200      45          2.090107  
##  505  11        0.1341954  200      42          1.950766  
##  570  11        0.1341954  496      42          1.950766  
##  800  11        0.1341954   83      41          1.904320  
## 
## [1] EXPERIMENT 2
##      M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 2 9000          0       15       30       60       80

## [1] "Nonzero features - Signal"
##  [1]    1  100  200  400  600  800 1000 1200 1400 1500
## [1] "KNOCKOFF FILTER returned no feature"
## [1] "TABLE with CBDA-SL RESULTS"
##  CBDA Frequency Density  
##  1    38        0.2142656
##  400  34        0.1917113
##  60   24        0.1353256
##  200  24        0.1353256
##  100  22        0.1240485
##  775  22        0.1240485
##  458  21        0.1184099
##  600  21        0.1184099
##  800  21        0.1184099
##  1174 21        0.1184099
##  361  20        0.1127714
##  588  20        0.1127714
##  716  20        0.1127714
##  992  20        0.1127714
##  2    19        0.1071328
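
One way to read the tables above is to check how many of the ten true signal features each top-15 list recovers. The Python sketch below does this with the feature indices copied from the tables. The `recovered` helper and the dictionary keys are names introduced here for illustration.

```python
# True signal features, as printed under "Nonzero features - Signal".
signal = {1, 100, 200, 400, 600, 800, 1000, 1200, 1400, 1500}

# Top-15 lists copied from the tables above.
top15 = {
    "exp1_cbda": [1, 400, 1161, 337, 871, 1135, 1246, 226,
                  282, 413, 476, 480, 505, 570, 800],
    "exp1_knockoff": [400, 1, 1000, 800, 322, 251, 1500, 384,
                      537, 25, 1305, 1200, 200, 496, 83],
    "exp2_cbda": [1, 400, 60, 200, 100, 775, 458, 600,
                  800, 1174, 361, 588, 716, 992, 2],
}


def recovered(selected, truth):
    """Return the selected features that are true signal features."""
    return sorted(set(selected) & truth)


hits = {name: recovered(feats, signal) for name, feats in top15.items()}
for name, found in hits.items():
    print(f"{name}: {len(found)} of {len(signal)} signal features {found}")
# exp1_cbda: 3 of 10 signal features [1, 400, 800]
# exp1_knockoff: 7 of 10 signal features [1, 200, 400, 800, 1000, 1200, 1500]
# exp2_cbda: 6 of 10 signal features [1, 100, 200, 400, 600, 800]
```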

The features listed above are then used to run a final analysis that applies both the CBDA-SL and the knockoff filter; they are the ONLY features used in that analysis. A final summary of the accuracy of the overall procedure is obtained by applying the CBDA-SL object to the subset of subjects held out for prediction, and the resulting predictions are used to generate the confusion matrix. In short, the first stage combines the CBDA-SL and knockoff filter algorithms to select the top features, and the second stage uses those features in a final predictive modeling step that can be tested for accuracy, sensitivity, …
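
To make the last step concrete, the Python sketch below builds a confusion matrix from held-out predictions and derives accuracy, sensitivity, and specificity from it. It assumes a binary outcome coded 0/1, which the text does not state. The labels are made up for illustration and are not results from these experiments; the helper names are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn


def summary_metrics(tp, fp, fn, tn):
    """Accuracy, sensitivity (true positive rate), specificity (true negative rate)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Made-up held-out labels and predictions, for illustration only.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)
metrics = summary_metrics(tp, fp, fn, tn)
print((tp, fp, fn, tn), metrics)
```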