This is a summary of a set of 6 experiments run with a LONI Pipeline workflow file that performs 3000 independent jobs, each applying both the CBDA-SL and the knockoff filter feature mining strategies. Each experiment has a total of 9000 jobs and is uniquely identified by 6 input arguments: the number of jobs [M], the percentage of missing values [misValperc], the minimum [Kcol_min] and maximum [Kcol_max] percentages for the Feature Sampling Range (FSR), and the minimum [Nrow_min] and maximum [Nrow_max] percentages for the Subject Sampling Range (SSR).
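To make the sampling arguments concrete, the following Python sketch shows one plausible way a single job could draw its feature and subject subsets from the FSR and SSR ranges. It is an illustration under stated assumptions, not the CBDA code: the function name `sample_job_subset`, the uniform draw of each percentage, and the data set size (300 subjects, 100 features) are all hypothetical.

```python
import random


def sample_job_subset(n_subjects, n_features, kcol_min, kcol_max,
                      nrow_min, nrow_max, rng):
    """Draw one job's subject and feature subsets (hypothetical scheme).

    Assumption: the sampling percentage is drawn uniformly from
    [min, max]; the actual CBDA sampling rule may differ.
    """
    feat_pct = rng.uniform(kcol_min, kcol_max)
    subj_pct = rng.uniform(nrow_min, nrow_max)
    # Convert percentages to counts, keeping at least one of each.
    k = max(1, round(n_features * feat_pct / 100))
    n = max(1, round(n_subjects * subj_pct / 100))
    # Sample without replacement; indices are 1-based, as in the tables.
    features = sorted(rng.sample(range(1, n_features + 1), k))
    subjects = sorted(rng.sample(range(1, n_subjects + 1), n))
    return subjects, features


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Ranges of Experiment 1: Kcol 1-5 %, Nrow 30-60 %.
subjects, features = sample_job_subset(300, 100, 1, 5, 30, 60, rng)
print(len(subjects), "subjects and", len(features), "features in this job")
```

Under this reading, narrow FSR values such as 1-5 % mean each job sees only a handful of features, so many jobs are needed before every feature has been sampled often enough to be ranked.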
This document reports the final results for each experiment. See https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true for general documentation of the CBDA-SL project and the GitHub repository https://github.com/SOCR/CBDA for some of the code.
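Features selected by both the knockoff filter and the CBDA-SL algorithm appear as spikes in the histograms below. For each experiment, the tables list the top 15 features, ranked separately by accuracy, by MSE, and by the knockoff filter. In each group of three columns, the first column (Accuracy, MSE, or Knockoff) holds the feature index, Count is how often that feature was selected, and Density appears to be Count expressed as a percentage of all selections under that criterion.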
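A top-15 table of this kind can be rebuilt from per-job selections with a simple tally. The Python sketch below is illustrative and not the project's code: `top_features` and the toy `selections` list are hypothetical, and Density is computed as 100 × Count / (total selections), which is the definition the Density columns in the tables appear to follow.

```python
from collections import Counter


def top_features(selections, top_n=15):
    """Rank features by how often they were selected across jobs.

    selections: one list of selected feature indices per job.
    Returns (feature, count, density) tuples, where density is the
    count as a percentage of all selections (assumed definition).
    """
    counts = Counter(f for job in selections for f in job)
    total = sum(counts.values())
    # Sort by descending count; break ties by feature index.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(f, c, 100.0 * c / total) for f, c in ranked[:top_n]]


# Toy data: four jobs, each reporting the features it selected.
selections = [[10, 20, 32], [10, 70], [70, 20, 10], [55]]
for feature, count, density in top_features(selections, top_n=3):
    print(f"{feature:>7} {count:>5} {density:>9.6f}")
```

With this definition the densities of a full table sum to 100, so the top-15 densities show what share of all selections the leading features account for.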
## [1] EXPERIMENT 1
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000 0 1 5 30 60
## [1] 10 20 30 40 50 60 70 80 90 100
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "1"
## Accuracy Count  Density MSE Count  Density Knockoff Count  Density
##       70   211 2.937900  63   114 1.766894       70   267 6.046196
##       32   193 2.687274  48   112 1.735896       40   240 5.434783
##       60   185 2.575884  43   105 1.627402      100   219 4.959239
##       80   167 2.325258   8    98 1.518909       30   216 4.891304
##       30   159 2.213868  49    94 1.456913       60   198 4.483696
##       90   152 2.116402   1    92 1.425914       80   183 4.144022
##      100   137 1.907547  47    89 1.379417       10   161 3.645833
##       10   130 1.810081  55    89 1.379417       90   158 3.577899
##       21   119 1.656920  52    86 1.332920       50   151 3.419384
##       57   119 1.656920  37    85 1.317421       32   118 2.672101
##       22   101 1.406294  33    84 1.301922       65    92 2.083333
##       79    98 1.364522  89    84 1.301922       20    79 1.788949
##       82    96 1.336675  93    84 1.301922       49    72 1.630435
##       39    94 1.308828  71    83 1.286423       94    61 1.381341
##       20    92 1.280980  88    82 1.270924       21    55 1.245471
## [1] "Nonzero Features"
## [1] 10 20 30 40 50 60 70 80 90 100
##
## [1] EXPERIMENT 2
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000 0 5 15 30 60
## [1] 10 20 30 40 50 60 70 80 90 100
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "2"
## Accuracy Count  Density MSE Count  Density Knockoff Count    Density
##       70   432 1.993447  48   268 1.431395       70   586 12.5724094
##       60   362 1.670435  88   244 1.303210      100   541 11.6069513
##       32   356 1.642748   7   242 1.292528       40   527 11.3065866
##       80   347 1.601218  52   239 1.276505       30   521 11.1778588
##       30   345 1.591989  71   238 1.271164       60   376  8.0669384
##       90   340 1.568917  77   237 1.265823       80   299  6.4149324
##       10   317 1.462784  43   236 1.260482       90   267  5.7283845
##      100   297 1.370495  87   233 1.244459       10   261  5.5996567
##       21   266 1.227447  44   232 1.239118       50   204  4.3767432
##       57   263 1.213603  38   231 1.233777       32   133  2.8534649
##       46   255 1.176688  63   231 1.233777       65    79  1.6949153
##       55   250 1.153615  64   227 1.212413       20    76  1.6305514
##       79   250 1.153615  75   227 1.212413       49    60  1.2872774
##       39   246 1.135158  84   226 1.207072       21    44  0.9440034
##       22   240 1.107471  26   225 1.201730       94    41  0.8796396
## [1] "Nonzero Features"
## [1] 10 20 30 40 50 60 70 80 90 100
##
## [1] EXPERIMENT 3
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000 0 15 30 30 60
## [1] 10 20 30 40 50 60 70 80 90 100
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "3"
## Accuracy Count  Density MSE Count  Density Knockoff Count    Density
##       70   784 1.673926  43   502 1.179068       70  1024 16.8089297
##       60   676 1.443334   8   497 1.167324      100   912 14.9704531
##       80   671 1.432659   1   493 1.157929       40   845 13.8706500
##       90   641 1.368605  23   492 1.155581       30   825 13.5423506
##       10   620 1.323768   9   488 1.146186       60   568  9.3237032
##      100   608 1.298147   2   481 1.129744       80   402  6.5988181
##       30   596 1.272525  38   480 1.127396       90   348  5.7124097
##       32   592 1.263985  91   479 1.125047       10   295  4.8424163
##       50   580 1.238364  44   478 1.122698       50   250  4.1037426
##       21   541 1.155094   7   477 1.120349       32   134  2.1996060
##       22   526 1.123068  63   477 1.120349       20    73  1.1982928
##       79   510 1.088906  81   476 1.118001       65    54  0.8864084
##       83   509 1.086771  33   475 1.115652       49    35  0.5745240
##       36   499 1.065420  18   474 1.113303       94    35  0.5745240
##       41   497 1.061150  78   474 1.113303       21    30  0.4924491
## [1] "Nonzero Features"
## [1] 10 20 30 40 50 60 70 80 90 100
##
## [1] EXPERIMENT 4
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000 0 1 5 60 80
## [1] 10 20 30 40 50 60 70 80 90 100
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "4"
## Accuracy Count  Density MSE Count  Density Knockoff Count  Density
##       32   219 2.959859  48   158 2.414057      100   337 6.643012
##       60   185 2.500338  63   129 1.970970       30   325 6.406466
##       70   184 2.486823  43   119 1.818182       70   315 6.209344
##       80   157 2.121908  42   112 1.711230       40   307 6.051646
##       30   149 2.013786  49   107 1.634836       60   281 5.539129
##       90   146 1.973240  38   101 1.543163       90   253 4.987187
##       21   135 1.824571   8   100 1.527884       80   249 4.908338
##       57   122 1.648871  29    98 1.497326       50   216 4.257836
##       61   110 1.486687  84    96 1.466769       10   208 4.100138
##       10   106 1.432626  89    96 1.466769       32   162 3.193377
##       39   105 1.419111  37    95 1.451490       20   129 2.542874
##       22   104 1.405595  55    93 1.420932       65   110 2.168342
##      100   101 1.365049  47    92 1.405653       49    99 1.951508
##       41    97 1.310988  88    91 1.390374       21    97 1.912084
##       15    96 1.297473  92    91 1.390374       94    84 1.655825
## [1] "Nonzero Features"
## [1] 10 20 30 40 50 60 70 80 90 100
##
## [1] EXPERIMENT 5
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000 0 5 15 60 80
## [1] 10 20 30 40 50 60 70 80 90 100
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "5"
## Accuracy Count  Density MSE Count  Density Knockoff Count    Density
##       80   338 1.566701  29   251 1.381246      100   831 12.3275478
##       60   317 1.469361  25   245 1.348228       70   799 11.8528408
##       10   315 1.460091   4   243 1.337222       30   758 11.2446225
##       21   306 1.418374   7   242 1.331719       40   754 11.1852841
##       90   303 1.404468  63   242 1.331719       60   665  9.8650052
##       55   300 1.390563  85   237 1.304204       80   525  7.7881620
##       70   299 1.385928  86   237 1.304204       10   453  6.7200712
##       32   289 1.339575  38   235 1.293198       90   434  6.4382139
##       79   272 1.260777  52   235 1.293198       50   405  6.0080107
##       67   268 1.242236  43   234 1.287695       32   191  2.8334075
##      100   268 1.242236  17   233 1.282192       20   125  1.8543243
##       83   259 1.200519  18   233 1.282192       65   100  1.4834594
##       35   258 1.195884  68   233 1.282192       94    67  0.9939178
##       22   255 1.181978  95   232 1.276689       21    65  0.9642486
##       15   254 1.177343  16   231 1.271186       49    64  0.9494140
## [1] "Nonzero Features"
## [1] 10 20 30 40 50 60 70 80 90 100
##
## [1] EXPERIMENT 6
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000 0 15 30 60 80
## [1] 10 20 30 40 50 60 70 80 90 100
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "6"
## Accuracy Count  Density MSE Count  Density Knockoff Count    Density
##       70   679 1.463173  89   518 1.225542       70  1694 15.4986276
##       80   663 1.428695  75   501 1.185322      100  1577 14.4281793
##       90   624 1.344654  18   498 1.178224       40  1511 13.8243367
##       55   596 1.284317  17   496 1.173492       30  1481 13.5498628
##       32   590 1.271387  96   496 1.173492       60  1157 10.5855444
##       60   586 1.262768  47   492 1.164029       80   893  8.1701738
##       10   572 1.232599  84   489 1.156931       90   774  7.0814273
##       21   551 1.187346  68   488 1.154565       10   632  5.7822507
##       62   537 1.157178  28   487 1.152199       50   589  5.3888381
##       34   534 1.150713  23   484 1.145101       32   233  2.1317475
##       76   525 1.131319  64   484 1.145101       20   108  0.9881061
##       99   523 1.127009  43   483 1.142735       65    76  0.6953339
##      100   518 1.116235  87   482 1.140370       49    45  0.4117109
##       22   517 1.114080   4   480 1.135638       94    32  0.2927722
##       39   516 1.111925   7   478 1.130906       21    26  0.2378774
## [1] "Nonzero Features"
## [1] 10 20 30 40 50 60 70 80 90 100
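One quick way to read these tables is to count how many of the ten nonzero features appear in each top-15 list. The Python sketch below does this for Experiment 1, using the feature indices copied from its table. The helper name `recovered` is hypothetical, and this check is an illustration rather than part of the CBDA-SL workflow.

```python
# Nonzero features reported in the output: 10, 20, ..., 100.
NONZERO = set(range(10, 101, 10))

# Top-15 feature indices for Experiment 1, copied from its table.
top15 = {
    "Accuracy": [70, 32, 60, 80, 30, 90, 100, 10, 21, 57, 22, 79, 82, 39, 20],
    "MSE": [63, 48, 43, 8, 49, 1, 47, 55, 52, 37, 33, 89, 93, 71, 88],
    "Knockoff": [70, 40, 100, 30, 60, 80, 10, 90, 50, 32, 65, 20, 49, 94, 21],
}


def recovered(selected, truth=NONZERO):
    """Return the nonzero features that appear in a ranked list."""
    return sorted(truth.intersection(selected))


for criterion, features in top15.items():
    hits = recovered(features)
    print(f"{criterion}: {len(hits)} of {len(NONZERO)} nonzero features in the top 15")
```

For Experiment 1 this gives 8 of 10 for the accuracy ranking (40 and 50 are missing), 0 of 10 for the MSE ranking, and 10 of 10 for the knockoff filter.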
The features listed above are then used to run a final analysis that applies both CBDA-SL and the knockoff filter; no other features enter this analysis. In short, the procedure has two stages. The first stage combines the CBDA-SL and knockoff filter algorithms to select the top features. The second stage uses those top features in a final predictive modeling step: the fitted CBDA-SL object is applied to the subset of subjects held out for prediction, and the resulting predictions are used to generate the confusion matrix, from which the overall procedure can be tested for accuracy, sensitivity, …
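The scoring step at the end of the second stage can be illustrated with a small Python sketch that builds a 2×2 confusion matrix from held-out labels and predictions and derives accuracy, sensitivity, and specificity. The labels and predictions are made-up toy values, the function names are hypothetical, and a binary outcome coded 0/1 is assumed; none of this is taken from the CBDA-SL code.

```python
def confusion_matrix(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    return {
        "TP": sum(1 for t, p in pairs if t == 1 and p == 1),
        "TN": sum(1 for t, p in pairs if t == 0 and p == 0),
        "FP": sum(1 for t, p in pairs if t == 0 and p == 1),
        "FN": sum(1 for t, p in pairs if t == 1 and p == 0),
    }


def summarize(cm):
    """Derive summary metrics; assumes both classes occur in y_true."""
    total = sum(cm.values())
    return {
        "accuracy": (cm["TP"] + cm["TN"]) / total,
        "sensitivity": cm["TP"] / (cm["TP"] + cm["FN"]),
        "specificity": cm["TN"] / (cm["TN"] + cm["FP"]),
    }


# Toy held-out subjects: true outcomes and model predictions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred)
print(cm)
print(summarize(cm))
```

Because the held-out subjects are never used to select features or fit the model, metrics computed this way estimate how the two-stage procedure would perform on new data.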