This is a summary of a set of 6 experiments run with a LONI Pipeline workflow file that performs 3000 independent jobs, each one applying both the CBDA-SL and the knockoff filter feature mining strategies. Each experiment has a total of 9000 jobs and is uniquely identified by 6 input arguments: number of jobs [M], % of missing values [misValperc], min [Kcol_min] and max [Kcol_max] % for the FSR (Feature Sampling Range), and min [Nrow_min] and max [Nrow_max] % for the SSR (Subject Sampling Range).
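
As an illustration of how the 6 input arguments identify an experiment, the sketch below rebuilds the parameter combinations that appear in the result tables of this document. It is a minimal Python sketch, not the LONI workflow itself: the `ExperimentSpec` class and the `build_experiments` helper are hypothetical names introduced here, and the ordering (SSR range outermost, FSR range innermost) is inferred from the order of the tables.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ExperimentSpec:
    """The 6 input arguments that uniquely identify one experiment."""
    M: int            # number of jobs
    misValperc: int   # % of missing values
    Kcol_min: int     # min % for the Feature Sampling Range (FSR)
    Kcol_max: int     # max % for the Feature Sampling Range (FSR)
    Nrow_min: int     # min % for the Subject Sampling Range (SSR)
    Nrow_max: int     # max % for the Subject Sampling Range (SSR)


def build_experiments(M=9000, misValperc=0):
    """Enumerate the FSR x SSR combinations listed in this report."""
    ssr_ranges = [(30, 60), (60, 80)]          # (Nrow_min, Nrow_max)
    fsr_ranges = [(1, 5), (5, 15), (15, 30)]   # (Kcol_min, Kcol_max)
    return [
        ExperimentSpec(M, misValperc, kmin, kmax, nmin, nmax)
        for (nmin, nmax), (kmin, kmax) in product(ssr_ranges, fsr_ranges)
    ]


experiments = build_experiments()
for number, spec in enumerate(experiments, start=1):
    print(number, spec)
```

Because the dataclass is frozen, each specification is hashable and can serve as a dictionary key when collecting results by experiment.
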
This document reports the final results, by experiment. General documentation of the CBDA-SL project is available at https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true, and some of the code is on GitHub at https://github.com/SOCR/CBDA.
Features selected by both the knockoff filter and the CBDA-SL algorithms appear as spikes in the histograms below. For each experiment, the tables list the top features selected by each method; the number of top features is set to 15 here.
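
The ranking behind such a top-15 list can be sketched as follows. This is a minimal Python sketch under two assumptions that the text does not state explicitly: that Frequency counts how often a feature was selected, and that Density is that count expressed as a percentage of all selections (the table values are consistent with this, e.g. a count of 26 with density 4.770642 implies 545 selections in total). The function name `top_features` and the toy input are hypothetical.

```python
from collections import Counter


def top_features(selected, top=15):
    """Rank features by selection count and attach a percentage density.

    `selected` is a flat list of feature indices, one entry per selection.
    Ties are broken by ascending feature index.
    """
    counts = Counter(selected)
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(feature, count, 100.0 * count / total)
            for feature, count in ranked[:top]]


# Toy input (not data from the experiments): 6 selections over 3 features.
toy_selections = [218, 273, 218, 11, 273, 218]
for feature, count, density in top_features(toy_selections, top=2):
    print(feature, count, round(density, 4))
```
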
## [1] EXPERIMENT 1
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0        1        5       30       60
## [1] 1 30 60 100 130 160 200 230 260 300
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "1"
## CBDA Frequency  Density Knockoff   Density
##  218        26 4.770642      273 10.511478
##  273        21 3.853211      200  8.900524
##  279        19 3.486239      160  6.443818
##  120        16 2.935780      169  5.557793
##   11        15 2.752294      197  4.993959
##  121        15 2.752294      225  4.349577
##  161        15 2.752294       43  3.946839
##  186        15 2.752294      130  3.342731
##  219        15 2.752294       28  2.859444
##  227        15 2.752294      230  2.537253
##  252        15 2.752294      262  2.376158
##  277        15 2.752294      239  2.215062
##   31        12 2.201835       13  2.094241
##   98        12 2.201835      300  1.973419
##  267        12 2.201835       25  1.852598
## [1] "Nonzero Features"
## [1] 1 30 60 100 130 160 200 230 260 300
##
## [1] EXPERIMENT 2
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0        5       15       30       60
## [1] 1 30 60 100 130 160 200 230 260 300
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "2"
## CBDA Frequency   Density Knockoff   Density
##  289        19 1.2500000      160 15.011472
##   88        17 1.1184211       30 14.913143
##  102        17 1.1184211      130 10.291708
##   44        15 0.9868421      200  8.390692
##  211        15 0.9868421      260  8.292363
##   34        14 0.9210526      230  8.226811
##  125        14 0.9210526      129  4.195346
##  207        14 0.9210526      183  3.605375
##   48        13 0.8552632      300  3.572599
##   66        13 0.8552632      294  3.179285
##   72        13 0.8552632       23  2.687643
##  106        13 0.8552632      140  2.064897
##   22        12 0.7894737      214  2.032121
##  113        12 0.7894737      224  1.507702
##  157        12 0.7894737      273  1.245493
## [1] "Nonzero Features"
## [1] 1 30 60 100 130 160 200 230 260 300
##
## [1] EXPERIMENT 3
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0       15       30       30       60
## [1] 1 30 60 100 130 160 200 230 260 300
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "3"
## CBDA Frequency   Density Knockoff   Density
##  144        32 0.9137636      260 16.700164
##  273        28 0.7995431      130 14.251781
##   37        25 0.7138778      160 12.826603
##   51        25 0.7138778      300 11.748584
##  285        25 0.7138778       30  8.057738
##   55        24 0.6853227      230  5.426640
##  218        24 0.6853227      214  4.732322
##  231        24 0.6853227      100  3.252330
##  241        24 0.6853227      200  2.558012
##  242        24 0.6853227        4  2.283939
##  159        23 0.6567676        1  2.229125
##  274        23 0.6567676       43  2.137767
##  252        22 0.6282125      129  1.735794
##  263        22 0.6282125      183  1.425178
##   87        21 0.5996573       67  1.406907
## [1] "Nonzero Features"
## [1] 1 30 60 100 130 160 200 230 260 300
##
## [1] EXPERIMENT 4
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0        1        5       60       80
## [1] 1 30 60 100 130 160 200 230 260 300
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "4"
## CBDA Frequency  Density Knockoff   Density
##  291        13 2.691511       30 13.692390
##  256        12 2.484472      160  7.181136
##   99        11 2.277433      273  7.100750
##  224        11 2.277433      300  6.216506
##   59        10 2.070393      200  5.359057
##  148        10 2.070393      214  5.332262
##  167        10 2.070393       43  4.957128
##   35         9 1.863354      130  4.448017
##   76         9 1.863354      269  3.912111
##  122         9 1.863354      225  3.322615
##  239         9 1.863354      183  3.295820
##  287         9 1.863354      230  3.001072
##   34         8 1.656315      227  2.652733
##   64         8 1.656315      264  2.625938
##   70         8 1.656315      129  2.197213
## [1] "Nonzero Features"
## [1] 1 30 60 100 130 160 200 230 260 300
##
## [1] EXPERIMENT 5
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0        5       15       60       80
## [1] 1 30 60 100 130 160 200 230 260 300
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "5"
## CBDA Frequency   Density Knockoff   Density
##  212        18 1.2738854      160 11.578947
##   93        16 1.1323425       30 10.684211
##  201        16 1.1323425      130  8.438596
##  145        14 0.9907997        1  7.912281
##  261        14 0.9907997      300  7.315789
##  290        14 0.9907997      260  7.298246
##    4        13 0.9200283      200  6.912281
##   36        13 0.9200283      273  6.421053
##   89        13 0.9200283      100  6.157895
##  100        13 0.9200283      129  4.421053
##  148        13 0.9200283       25  4.228070
##  158        13 0.9200283      142  3.842105
##   27        12 0.8492569      230  3.385965
##   56        12 0.8492569       76  2.526316
##   99        12 0.8492569      279  2.526316
## [1] "Nonzero Features"
## [1] 1 30 60 100 130 160 200 230 260 300
##
## [1] EXPERIMENT 6
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0       15       30       60       80
## [1] 1 30 60 100 130 160 200 230 260 300
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "6"
## CBDA Frequency   Density Knockoff    Density
##   70        25 0.7409603       30 18.3901023
##  215        25 0.7409603      130 18.0693447
##   46        24 0.7113219      160 17.5042004
##  107        24 0.7113219      260 12.0360470
##  124        24 0.7113219      200  9.4394379
##   13        23 0.6816835      230  7.7287307
##   33        23 0.6816835      300  6.7664579
##   53        23 0.6816835      100  3.2992210
##  175        23 0.6816835      273  1.4815946
##  235        23 0.6816835      129  0.8859019
##  122        22 0.6520451      142  0.8400794
##  144        22 0.6520451      225  0.8248053
##  241        22 0.6520451      183  0.6567894
##  280        22 0.6520451      214  0.4734993
##  294        22 0.6520451       43  0.3207576
## [1] "Nonzero Features"
## [1] 1 30 60 100 130 160 200 230 260 300
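
One way to read the tables is to count how many of the "Nonzero Features" each method places in its top 15. The Python sketch below does this for the last experiment reported above (Kcol 15 to 30, Nrow 60 to 80), with the feature indices copied from its table. It assumes that the "Nonzero Features" line lists the features that truly carry signal, which the output suggests but does not state; the helper name `recovered` is hypothetical.

```python
# Feature indices copied from the table for Kcol 15-30 %, Nrow 60-80 %.
nonzero_features = {1, 30, 60, 100, 130, 160, 200, 230, 260, 300}
cbda_top = [70, 215, 46, 107, 124, 13, 33, 53, 175, 235,
            122, 144, 241, 280, 294]
knockoff_top = [30, 130, 160, 260, 200, 230, 300, 100, 273, 129,
                142, 225, 183, 214, 43]


def recovered(top, truth):
    """Return the sorted features that appear in both the top list and the truth set."""
    return sorted(set(top) & truth)


print("CBDA-SL :", recovered(cbda_top, nonzero_features))
print("Knockoff:", recovered(knockoff_top, nonzero_features))
```

For this experiment the knockoff top 15 contains 8 of the 10 listed nonzero features (1 and 60 are missing), while the CBDA-SL top 15 contains none of them.
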
The features listed above are then used to run a final analysis applying both the CBDA-SL and the knockoff filter; these are the ONLY features used in that analysis. In other words, the procedure has two stages: the first combines the CBDA-SL and knockoff filter algorithms to select the top features, and the second uses those top features to run a final predictive modeling step. The accuracy of the overall procedure is summarized by applying the CBDA-SL object to the subset of subjects held out for prediction. The resulting predictions are used to generate the confusion matrix, from which the final model can be tested for accuracy, sensitivity, …
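
The last step, turning held-out predictions into a confusion matrix and summary metrics, can be sketched as follows. This is a minimal Python sketch that assumes a binary outcome coded 0/1, which the text does not specify; the function names and the toy labels are hypothetical and are not results from the experiments.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def summarize(y_true, y_pred):
    """Compute accuracy, sensitivity and specificity from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else float("nan"),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Toy held-out labels and predictions (illustrative only).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))
print(summarize(y_true, y_pred))
```
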