This is a summary of a set of experiments that use a LONI Pipeline workflow file to run 3000 independent jobs, each one applying the CBDA-SL and the knockoff filter feature mining strategies. Each experiment has a total of 9000 jobs and is uniquely identified by 6 input arguments: number of jobs [M], % of missing values [misValperc], min [Kcol_min] and max [Kcol_max] % for the Feature Sampling Range (FSR), and min [Nrow_min] and max [Nrow_max] % for the Subject Sampling Range (SSR).
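As a rough illustration of how the two sampling ranges might translate into a single job, the Python sketch below draws one feature subset and one subject subset. It assumes that each job picks a percentage uniformly within [Kcol_min, Kcol_max] and [Nrow_min, Nrow_max] and then samples without replacement. The function name `sample_job`, the dataset dimensions, and the uniform-draw rule are illustrative assumptions, not taken from the CBDA code.

```python
import random


def sample_job(n_subjects, n_features, kcol_min, kcol_max,
               nrow_min, nrow_max, seed=None):
    """Draw one (subjects, features) subsample for a single job.

    Assumption: the FSR and SSR percentages are drawn uniformly within
    their ranges and indices are sampled without replacement.
    """
    rng = random.Random(seed)
    feature_pct = rng.uniform(kcol_min, kcol_max)
    subject_pct = rng.uniform(nrow_min, nrow_max)
    n_cols = max(1, round(n_features * feature_pct / 100))
    n_rows = max(1, round(n_subjects * subject_pct / 100))
    # 1-based indices, to match the feature numbering used in the tables.
    features = sorted(rng.sample(range(1, n_features + 1), n_cols))
    subjects = sorted(rng.sample(range(1, n_subjects + 1), n_rows))
    return subjects, features


# Experiment 1 settings (FSR 5 to 15 %, SSR 60 to 80 %) applied to a
# hypothetical dataset with 300 subjects and 100 features.
subjects, features = sample_job(300, 100, 5, 15, 60, 80, seed=1)
print(len(subjects), len(features))
```

Passing a different `seed` per job would give each job its own independent subsample.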
This document reports the final results, by experiment. See https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true for general documentation of the CBDA-SL project and the GitHub repository https://github.com/SOCR/CBDA for some of the code.
Features selected by both the knockoff filter and the CBDA-SL algorithm appear as spikes in the histograms below. I list the top features selected by each method; the number of features listed is set to 15 here.
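The Frequency and Density columns reported for each experiment can be reproduced with a simple tally. The Density values are consistent with 100 × Frequency / (total number of selections): for example, a Frequency of 50 with a Density of 9.276438 implies 539 selections in total. The Python sketch below uses a made-up selection list, and the helper name `top_features` is hypothetical.

```python
from collections import Counter


def top_features(selected, top=15):
    """Return (feature, frequency, density %) rows, most frequent first.

    Density is the share of all selections, expressed in percent.
    """
    counts = Counter(selected)
    total = sum(counts.values())
    # Sort by descending frequency, then by feature index to break ties.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(f, n, 100.0 * n / total) for f, n in ranked[:top]]


# Made-up selections pooled across jobs (not data from the experiments).
selected = [80, 80, 80, 100, 100, 50, 13, 80, 100, 50]
for feature, freq, density in top_features(selected, top=3):
    print(f"{feature:>4} {freq:>9} {density:>9.4f}")
```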
## [1] EXPERIMENT 1
##      M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 1 9000          0        5       15       60       80
## [1] "Nonzero features - Signal"
## [1] 10 20 30 40 50 60 70 80 90 100
## [1] "TABLE with CBDA-SL and Knockoff RESULTS"
## CBDA Frequency  Density Knockoff KO_Frequency KO_Density
##   80        50 9.276438      100          833  10.107997
##  100        31 5.751391       80          826  10.023055
##   50        13 2.411874       90          808   9.804635
##   13        11 2.040816       30          786   9.537677
##   71        11 2.040816       70          557   6.758888
##   59        10 1.855288       94          486   5.897343
##   86        10 1.855288       60          456   5.533309
##   18         9 1.669759       20          419   5.084334
##   39         9 1.669759       46          414   5.023662
##   60         9 1.669759       50          343   4.162116
##   68         9 1.669759       68          253   3.070016
##   74         9 1.669759       13          177   2.147798
##   55         8 1.484230       39          164   1.990050
##   84         8 1.484230       58           87   1.055697
##    1         7 1.298701       33           85   1.031428
## [1] EXPERIMENT 2
##      M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 2 9000          0       15       30       60       80
## [1] "Nonzero features - Signal"
## [1] 10 20 30 40 50 60 70 80 90 100
## [1] "TABLE with CBDA-SL and Knockoff RESULTS"
## CBDA Frequency  Density Knockoff KO_Frequency KO_Density
##   80        48 3.924775       80         1955 17.5242022
##  100        42 3.434178      100         1735 15.5521692
##   39        24 1.962388       90         1649 14.7812836
##   50        24 1.962388       30         1534 13.7504482
##   38        20 1.635323       70          785  7.0365722
##   70        19 1.553557       60          642  5.7547508
##   64        18 1.471791       94          630  5.6471854
##   60        17 1.390025       20          467  4.1860882
##   66        17 1.390025       46          439  3.9351022
##   13        16 1.308258       50          359  3.2179993
##   27        16 1.308258       68          208  1.8644676
##   28        16 1.308258       13          141  1.2638939
##   40        16 1.308258       39          134  1.2011474
##   44        16 1.308258       62           44  0.3944066
##   45        16 1.308258       33           42  0.3764790
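One quick way to read a table like the one above is to check how many of the true signal features each top-15 list recovers, and which features both methods agree on. The Python sketch below does this for Experiment 2, with the feature indices copied from the CBDA and Knockoff columns. The helper name `recovered` is hypothetical, and this check is an illustration rather than part of the CBDA workflow.

```python
# True signal features, from the "Nonzero features - Signal" output.
signal = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100}

# Top-15 features for Experiment 2, copied from the table above.
cbda_top = [80, 100, 39, 50, 38, 70, 64, 60, 66, 13, 27, 28, 40, 44, 45]
knockoff_top = [80, 100, 90, 30, 70, 60, 94, 20, 46, 50, 68, 13, 39, 62, 33]


def recovered(top, truth):
    """Return the sorted signal features that appear in a top list."""
    return sorted(set(top) & truth)


cbda_hits = recovered(cbda_top, signal)
knockoff_hits = recovered(knockoff_top, signal)
# Features selected by both methods, whether or not they are signal.
selected_by_both = sorted(set(cbda_top) & set(knockoff_top))

print("CBDA-SL signal features:", cbda_hits)
print("Knockoff signal features:", knockoff_hits)
print("Selected by both:", selected_by_both)
```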
The features listed above are then used to run a final analysis that applies both CBDA-SL and the knockoff filter; they are the ONLY features used in that analysis. The accuracy of the overall procedure is then summarized by applying the CBDA-SL object to the subset of subjects held out for prediction, and the resulting predictions are used to generate the confusion matrix. In short, the procedure combines the CBDA-SL and knockoff filter algorithms in two stages: the first stage selects the top features, and the second stage uses those features in a final predictive modeling step that can ultimately be tested for accuracy, sensitivity, …
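To make the last step concrete, the Python sketch below builds a confusion matrix from held-out labels and predictions and derives summary metrics from it. It assumes a binary outcome coded 0/1 with both classes present in the held-out set. The labels and predictions are made up, the function names are hypothetical, and specificity is included only as a common companion to sensitivity.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary outcome."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def summarize(tp, fp, tn, fn):
    """Derive summary metrics; assumes both classes are present."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Made-up held-out labels and predictions (not results from the experiments).
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tp, fp, tn, fn = confusion_counts(actual, predicted)
metrics = summarize(tp, fp, tn, fn)
print(tp, fp, tn, fn)
print(metrics)
```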