This is a summary of a set of experiments run with a LONI Pipeline workflow file that performs 3000 independent jobs, each applying both the CBDA-SL and the knockoff filter feature mining strategies. Each experiment has a total of 9000 jobs and is uniquely identified by 6 input arguments: number of jobs [M], % of missing values [misValperc], min [Kcol_min] and max [Kcol_max] % for the Feature Sampling Range (FSR), and min [Nrow_min] and max [Nrow_max] % for the Subject Sampling Range (SSR).
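As a rough illustration of how the FSR and SSR arguments could drive a single job, the Python sketch below draws a feature fraction and a subject fraction uniformly within the given ranges and then samples without replacement. The uniform draw, the function name `sample_job`, and the data set size (300 subjects, 900 features) are assumptions made for illustration; they are not taken from the CBDA code.

```python
import random


def sample_job(n_subjects, n_features, kcol_min, kcol_max, nrow_min, nrow_max, rng):
    """Draw the feature and subject subsets for one job (illustrative assumption)."""
    # Ranges are given in percent, so convert the drawn values to fractions.
    feature_frac = rng.uniform(kcol_min, kcol_max) / 100.0
    subject_frac = rng.uniform(nrow_min, nrow_max) / 100.0
    n_feat = max(1, round(n_features * feature_frac))
    n_subj = max(1, round(n_subjects * subject_frac))
    # Sample without replacement; features and subjects are labelled from 1.
    features = sorted(rng.sample(range(1, n_features + 1), n_feat))
    subjects = sorted(rng.sample(range(1, n_subjects + 1), n_subj))
    return features, subjects


# Hypothetical data set size; the ranges match Kcol = 5-15 % and Nrow = 60-80 %.
rng = random.Random(0)
features, subjects = sample_job(300, 900, 5, 15, 60, 80, rng)
print(len(features), "features and", len(subjects), "subjects sampled for this job")
```

Repeating such a draw once per job would give each job its own random subset of features and subjects, which is what makes the per-feature selection counts comparable across jobs.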
This document reports the final results for each experiment. See https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true for general documentation of the CBDA-SL project and the GitHub repository https://github.com/SOCR/CBDA for some of the code.
Features selected by both the knockoff filter and the CBDA-SL algorithms appear as spikes in the histograms below. The tables list the top features selected by each method; the number of top features is set to 15 here.
## [1] EXPERIMENT 1
##      M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 1 9000          0        5       15       60       80
## [1] "Nonzero features - Signal"
## [1] 1 100 200 300 400 500 600 700 800 900
## [1] "TABLE with CBDA-SL and Knockoff RESULTS"
## CBDA Frequency   Density Knockoff KO_Frequency KO_Density
##  800        19 0.3997475      800          595  20.453764
##  200        16 0.3366295      400          361  12.409763
##  471        14 0.2945508      900          275   9.453420
##  398        13 0.2735115      840          167   5.740804
##  520        13 0.2735115      737          144   4.950155
##  132        12 0.2524721      200          141   4.847026
##  196        12 0.2524721        1          115   3.953249
##  403        12 0.2524721       34          101   3.471983
##  625        12 0.2524721      100           93   3.196975
##   65        11 0.2314328        6           64   2.200069
##  133        11 0.2314328      808           48   1.650052
##  291        11 0.2314328      342           47   1.615675
##  405        11 0.2314328      462           43   1.478171
##  431        11 0.2314328      700           43   1.478171
##  462        11 0.2314328      537           40   1.375043
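The Density and KO_Density columns appear to be each feature's frequency expressed as a percentage of all selections made by that method. The Python sketch below checks this reading against a few rows of the Experiment 1 table. The totals (4753 for CBDA-SL, 2909 for the knockoff filter) are not reported in this document; they are back-calculated from the first row of each method and should be treated as an assumption.

```python
# A few (feature, frequency, reported density) rows from the Experiment 1 table.
cbda_rows = [(800, 19, 0.3997475), (200, 16, 0.3366295), (471, 14, 0.2945508)]
knockoff_rows = [(800, 595, 20.453764), (400, 361, 12.409763), (900, 275, 9.453420)]

# Assumed totals, inferred as frequency / (density / 100) from the first row.
CBDA_TOTAL = 4753
KNOCKOFF_TOTAL = 2909


def density(frequency, total):
    """Frequency as a percentage of all selections (assumed definition)."""
    return 100.0 * frequency / total


for label, rows, total in [("CBDA-SL", cbda_rows, CBDA_TOTAL),
                           ("Knockoff", knockoff_rows, KNOCKOFF_TOTAL)]:
    for feature, freq, reported in rows:
        computed = density(freq, total)
        # Compare the recomputed value with the one printed in the table.
        print(label, feature, freq, round(computed, 6), reported)
```

Under this reading, density values are comparable within a method but not across methods, because the two totals differ.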
## [1] EXPERIMENT 2
##      M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 2 9000          0       15       30       60       80
## [1] "Nonzero features - Signal"
## [1] 1 100 200 300 400 500 600 700 800 900
## [1] "TABLE with CBDA-SL and Knockoff RESULTS"
## CBDA Frequency   Density Knockoff KO_Frequency KO_Density
##  800        28 0.2659827      800           75  12.998267
##  200        25 0.2374846      400           64  11.091854
##  342        23 0.2184858      900           44   7.625650
##  700        23 0.2184858      737           32   5.545927
##  309        22 0.2089864      840           30   5.199307
##  498        22 0.2089864      200           26   4.506066
##  421        21 0.1994870      100           25   4.332756
##  578        21 0.1994870        1           23   3.986135
##  625        21 0.1994870        6           21   3.639515
##  226        20 0.1899877       34           18   3.119584
##  288        20 0.1899877      700           13   2.253033
##  358        20 0.1899877      462           11   1.906412
##  132        19 0.1804883      471           11   1.906412
##  248        19 0.1804883      226            9   1.559792
##  259        19 0.1804883      342            9   1.559792
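One way to read the tables is to check how many of the ten nonzero (signal) features each method places in its top 15. The Python sketch below does that with the values copied from the two experiments above; the helper name `recovered` is hypothetical.

```python
# True signal features, as listed under "Nonzero features - Signal".
signal = {1, 100, 200, 300, 400, 500, 600, 700, 800, 900}

# Top-15 features per experiment and method, copied from the tables.
top15 = {
    ("Experiment 1", "CBDA-SL"): [800, 200, 471, 398, 520, 132, 196, 403,
                                  625, 65, 133, 291, 405, 431, 462],
    ("Experiment 1", "Knockoff"): [800, 400, 900, 840, 737, 200, 1, 34,
                                   100, 6, 808, 342, 462, 700, 537],
    ("Experiment 2", "CBDA-SL"): [800, 200, 342, 700, 309, 498, 421, 578,
                                  625, 226, 288, 358, 132, 248, 259],
    ("Experiment 2", "Knockoff"): [800, 400, 900, 737, 840, 200, 100, 1,
                                   6, 34, 700, 462, 471, 226, 342],
}


def recovered(top_features, true_signal):
    """Return the signal features that appear among the top features."""
    return sorted(set(top_features) & true_signal)


for (experiment, method), features in top15.items():
    hits = recovered(features, signal)
    print(experiment, method, len(hits), "of", len(signal), "signal features:", hits)
```

With these tables, the CBDA-SL top 15 contains 2 signal features in Experiment 1 and 3 in Experiment 2, while the knockoff top 15 contains 7 in both experiments.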
The features listed above are then used to run a final analysis that applies both the CBDA-SL and the knockoff filter algorithms; they are the ONLY features used in that analysis. A final summary of the accuracy of the overall procedure is obtained by applying the CBDA-SL object to the subset of subjects held out for prediction. The predictions are then used to generate the confusion matrix. In short, the first stage combines the CBDA-SL and knockoff filter algorithms to select the top features, and the second stage uses those features in a final predictive modeling step that can then be tested for accuracy, sensitivity, …
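The step from held-out predictions to a confusion matrix and summary metrics can be sketched as follows. This is a minimal Python illustration that assumes a binary outcome coded 0/1; the labels and predictions are made-up placeholders, not results from these experiments, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for a binary 0/1 outcome."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, fn, tn


def summarize(tp, fp, fn, tn):
    """Accuracy and sensitivity derived from the confusion matrix counts."""
    total = tp + fp + fn + tn
    positives = tp + fn
    return {
        "accuracy": (tp + tn) / total,
        # Sensitivity is undefined when there are no true positives to find.
        "sensitivity": tp / positives if positives else float("nan"),
    }


# Placeholder held-out labels and predictions (not data from this report).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
metrics = summarize(tp, fp, fn, tn)
print("TP, FP, FN, TN:", tp, fp, fn, tn)
print(metrics)
```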