Some useful information

This is a summary of two experiments using a LONI pipeline workflow file that performs 3000 independent jobs, each one applying the CBDA-SL and the knockoff filter feature mining strategies. Each experiment has a total of 9000 jobs and is uniquely identified by 6 input arguments: the number of jobs [M], the percentage of missing values [misValperc], the minimum [Kcol_min] and maximum [Kcol_max] percentages for the FSR (Feature Sampling Range), and the minimum [Nrow_min] and maximum [Nrow_max] percentages for the SSR (Subject Sampling Range).
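As an illustration of how the FSR and SSR arguments might shape a single job, here is a minimal sketch. It assumes each job draws its feature percentage uniformly from [Kcol_min, Kcol_max] and its subject percentage uniformly from [Nrow_min, Nrow_max], then samples feature and subject indices without replacement. The helper `sample_job` and the 300 x 1000 data size are hypothetical, and missing-value injection (misValperc) is left out.

```python
import random


def sample_job(n_subjects, n_features, kcol_min, kcol_max, nrow_min, nrow_max, rng):
    """Draw one job's random subset of subjects (rows) and features (columns).

    Hypothetical helper. ASSUMPTION: the feature percentage is drawn
    uniformly from [kcol_min, kcol_max] (FSR) and the subject percentage
    uniformly from [nrow_min, nrow_max] (SSR); indices are then sampled
    without replacement. Missing-value injection (misValperc) is omitted.
    """
    k_pct = rng.uniform(kcol_min, kcol_max)
    n_pct = rng.uniform(nrow_min, nrow_max)
    k = max(1, round(n_features * k_pct / 100))
    n = max(1, round(n_subjects * n_pct / 100))
    cols = sorted(rng.sample(range(1, n_features + 1), k))  # 1-based feature ids
    rows = sorted(rng.sample(range(1, n_subjects + 1), n))  # 1-based subject ids
    return rows, cols


# Experiment 1 settings: M = 9000 jobs, FSR 5-15%, SSR 60-80%.
# The 300 subjects x 1000 features data size is a hypothetical example.
rng = random.Random(42)
jobs = [sample_job(300, 1000, 5, 15, 60, 80, rng) for _ in range(9000)]
```

Under these assumptions every job works on its own random (rows, cols) subset, and CBDA-SL and the knockoff filter would then be run on that subset independently of all other jobs.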

This document presents the final results for each experiment. See https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true for general documentation of the CBDA-SL project, and the GitHub repository https://github.com/SOCR/CBDA for some of the code.

Features selected by both the knockoff filter and the CBDA-SL algorithms appear as spikes in the histograms below. For each method I list the top 15 selected features.
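The rankings in the tables can be reproduced by tallying, across all jobs, how often each feature is selected. The sketch below assumes each job reports a list of selected feature ids; the input format and the helper `top_features` are hypothetical. It also assumes Density is each feature's share of all selections, in percent. That reading appears consistent with the Experiment 1 table (100 × 19 / 4753 ≈ 0.3997475 and 100 × 595 / 2909 ≈ 20.453764), which would correspond to about 4,753 CBDA-SL and 2,909 knockoff selections in total.

```python
from collections import Counter


def top_features(selections_per_job, top=15):
    """Rank features by how often they were selected (hypothetical helper).

    Returns (feature, frequency, density) tuples. ASSUMPTION: density is
    the feature's share of all selections, in percent. Ties are broken by
    the smaller feature id, as in the tables.
    """
    counts = Counter(f for job in selections_per_job for f in job)
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(f, n, 100.0 * n / total) for f, n in ranked[:top]]


# Toy input: three jobs, each reporting the features it selected.
jobs = [[800, 200, 471], [800, 200], [800, 398]]
# 800 ranks first (3 of 7 selections), then 200; 398 beats 471 on the tie.
for feature, freq, density in top_features(jobs, top=3):
    print(feature, freq, round(density, 2))
```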

## [1] EXPERIMENT 1
##      M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 1 9000          0        5       15       60       80

## [1] "Nonzero features - Signal"
##  [1]   1 100 200 300 400 500 600 700 800 900
## [1] "TABLE with CBDA-SL and Knockoff RESULTS"
##  CBDA Frequency Density   Knockoff KO_Frequency KO_Density
##  800  19        0.3997475 800      595          20.453764 
##  200  16        0.3366295 400      361          12.409763 
##  471  14        0.2945508 900      275           9.453420 
##  398  13        0.2735115 840      167           5.740804 
##  520  13        0.2735115 737      144           4.950155 
##  132  12        0.2524721 200      141           4.847026 
##  196  12        0.2524721   1      115           3.953249 
##  403  12        0.2524721  34      101           3.471983 
##  625  12        0.2524721 100       93           3.196975 
##  65   11        0.2314328   6       64           2.200069 
##  133  11        0.2314328 808       48           1.650052 
##  291  11        0.2314328 342       47           1.615675 
##  405  11        0.2314328 462       43           1.478171 
##  431  11        0.2314328 700       43           1.478171 
##  462  11        0.2314328 537       40           1.375043 
## [1] EXPERIMENT 2
##      M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 2 9000          0       15       30       60       80

## [1] "Nonzero features - Signal"
##  [1]   1 100 200 300 400 500 600 700 800 900
## [1] "TABLE with CBDA-SL and Knockoff RESULTS"
##  CBDA Frequency Density   Knockoff KO_Frequency KO_Density
##  800  28        0.2659827 800      75           12.998267 
##  200  25        0.2374846 400      64           11.091854 
##  342  23        0.2184858 900      44            7.625650 
##  700  23        0.2184858 737      32            5.545927 
##  309  22        0.2089864 840      30            5.199307 
##  498  22        0.2089864 200      26            4.506066 
##  421  21        0.1994870 100      25            4.332756 
##  578  21        0.1994870   1      23            3.986135 
##  625  21        0.1994870   6      21            3.639515 
##  226  20        0.1899877  34      18            3.119584 
##  288  20        0.1899877 700      13            2.253033 
##  358  20        0.1899877 462      11            1.906412 
##  132  19        0.1804883 471      11            1.906412 
##  248  19        0.1804883 226       9            1.559792 
##  259  19        0.1804883 342       9            1.559792
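Because the nonzero (signal) features are listed for each experiment (1, 100, 200, ..., 900), one quick check is how many of them appear in each method's top-15 list. The sketch below simply transcribes the two tables above; the names `SIGNAL`, `TOP15` and `signal_recovery` are hypothetical.

```python
# Nonzero (signal) features, as listed for both experiments.
SIGNAL = {1, 100, 200, 300, 400, 500, 600, 700, 800, 900}

# Top-15 lists transcribed from the tables above, in ranked order.
TOP15 = {
    ("Experiment 1", "CBDA-SL"): [800, 200, 471, 398, 520, 132, 196, 403, 625, 65, 133, 291, 405, 431, 462],
    ("Experiment 1", "Knockoff"): [800, 400, 900, 840, 737, 200, 1, 34, 100, 6, 808, 342, 462, 700, 537],
    ("Experiment 2", "CBDA-SL"): [800, 200, 342, 700, 309, 498, 421, 578, 625, 226, 288, 358, 132, 248, 259],
    ("Experiment 2", "Knockoff"): [800, 400, 900, 737, 840, 200, 100, 1, 6, 34, 700, 462, 471, 226, 342],
}


def signal_recovery(top, signal=SIGNAL):
    """Return the signal features found in a ranked list, keeping rank order."""
    return [f for f in top if f in signal]


for (experiment, method), top in TOP15.items():
    hits = signal_recovery(top)
    print(f"{experiment} {method}: {len(hits)}/{len(SIGNAL)} signal features {hits}")
```

By this count, the knockoff top 15 contains 7 of the 10 signal features in both experiments, while the CBDA-SL top 15 contains 2 (Experiment 1) and 3 (Experiment 2).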

The features listed above are then used to run a final analysis that applies both CBDA-SL and the knockoff filter; these are the ONLY features used in that analysis. The accuracy of the overall procedure is summarized by applying the CBDA-SL object to the subset of subjects held out for prediction, and these predictions are used to generate the confusion matrix. In short, we combine the CBDA-SL and knockoff filter algorithms to select the top features in a first stage; the second stage then uses those top features in a final predictive modeling step that can ultimately be tested for accuracy, sensitivity, …
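Once the second-stage model has produced predictions for the held-out subjects, the confusion matrix and metrics such as accuracy and sensitivity follow directly from the counts of true and false positives and negatives. A minimal sketch, assuming binary labels; the helper `confusion_metrics` and the toy labels are hypothetical and are not taken from the CBDA-SL code.

```python
def confusion_metrics(y_true, y_pred, positive=1):
    """2x2 confusion matrix plus summary metrics (hypothetical helper).

    ASSUMPTION: binary labels; `positive` marks the positive class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        # Rows: true negative/positive class; columns: predicted negative/positive.
        "matrix": [[tn, fp], [fn, tp]],
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Toy held-out labels and predictions (hypothetical).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]
print(confusion_metrics(y_true, y_pred))
```

Here 2 of the 4 true positives are recovered (sensitivity 0.5) and 3 of the 4 true negatives (specificity 0.75), for an overall accuracy of 5/8.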