This is a summary of a set of 6 experiments run with a LONI pipeline workflow file that performs 3000 independent jobs, each applying the CBDA-SL and the knockoff filter feature mining strategies. Each experiment has a total of 9000 jobs and is uniquely identified by 6 input arguments: number of jobs [M], % of missing values [misValperc], min [Kcol_min] and max [Kcol_max] % for the Feature Sampling Range (FSR), and min [Nrow_min] and max [Nrow_max] % for the Subject Sampling Range (SSR).
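As a rough illustration of how the FSR and SSR arguments could drive a single job, the Python sketch below draws one random subset of subjects and features. It is not the project's code: the function name `draw_job_sample`, the uniform draw inside each range, and the data size (300 subjects, 1500 features) are assumptions made for the example. Only the range values are taken from Experiment 1.

```python
import random


def draw_job_sample(n_subjects, n_features, kcol_min, kcol_max,
                    nrow_min, nrow_max, rng):
    """Draw one job's subject and feature subsets (0-based indices)."""
    # Assumption: the sampling percentages are drawn uniformly inside
    # the FSR (features) and SSR (subjects) ranges.
    k_pct = rng.uniform(kcol_min, kcol_max)
    n_pct = rng.uniform(nrow_min, nrow_max)
    # Convert percentages to counts, keeping at least one row and column.
    k = max(1, round(n_features * k_pct / 100))
    n = max(1, round(n_subjects * n_pct / 100))
    features = sorted(rng.sample(range(n_features), k))
    subjects = sorted(rng.sample(range(n_subjects), n))
    return subjects, features


rng = random.Random(0)
# Hypothetical data size; Kcol and Nrow ranges as in Experiment 1.
subjects, features = draw_job_sample(300, 1500, 1, 5, 30, 60, rng)
print(len(subjects), len(features))
```

Under these assumptions each job sees between 15 and 75 features and between 90 and 180 subjects, which is why wider FSR ranges produce higher selection counts per feature.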
This document reports the final results, by experiment. See https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true for general documentation of the CBDA-SL project and the GitHub repository https://github.com/SOCR/CBDA for some of the code.
Features selected by both the knockoff filter and the CBDA-SL algorithms appear as spikes in the histograms below. For each experiment, the table lists the top selected features; the number of top features is set to 15 here.
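The ranking behind each table can be sketched as a simple tally of how often a feature is selected across jobs. The Python sketch below is illustrative only: the function name `top_features` and the toy input are hypothetical, and the density formula (a feature's share of all selections, in percent) is an assumption. It does agree with the tables below in one respect: there, the ratio of two Density values matches the ratio of their Counts.

```python
from collections import Counter


def top_features(selections, top=15):
    """Rank features by how often they were selected across jobs."""
    counts = Counter(f for job in selections for f in job)
    total = sum(counts.values())
    # Sort by descending count, breaking ties by feature index.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    # Assumption: density is the share of all selections, in percent.
    return [(f, c, 100.0 * c / total) for f, c in ranked[:top]]


# Toy input: each inner list is the feature set kept by one job.
jobs = [[100, 1500, 7], [100, 1000], [100, 1500], [42]]
ranking = top_features(jobs, top=3)
for feature, count, density in ranking:
    print(feature, count, round(density, 2))
```

A feature that is kept by many independent jobs rises to the top of this ranking, which is the "spike" in the corresponding histogram.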
## [1] EXPERIMENT 1
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0        1        5       30       60
## [1] 1 100 200 400 600 800 1000 1200 1400 1500
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "1"
## Accuracy Count    Density  MSE Count    Density Knockoff Count   Density
##      100   115 0.15737902  297    65 0.09720353      100   164 8.4318766
##     1500   103 0.14095686  565    65 0.09720353     1000   130 6.6838046
##     1000    97 0.13274578   62    63 0.09421265     1500   109 5.6041131
##      863    94 0.12864025  150    63 0.09421265     1400    84 4.3187661
##     1200    93 0.12727173  153    63 0.09421265     1200    47 2.4164524
##     1400    87 0.11906065  299    63 0.09421265      800    38 1.9537275
##      599    84 0.11495511  649    63 0.09421265     1156    31 1.5938303
##      708    81 0.11084957  309    62 0.09271721     1047    30 1.5424165
##      326    76 0.10400701 1325    62 0.09271721      694    26 1.3367609
##      400    73 0.09990147   20    61 0.09122177      138    21 1.0796915
##     1279    73 0.09990147  396    61 0.09122177      589    21 1.0796915
##        1    72 0.09853295  453    61 0.09122177      863    21 1.0796915
##     1217    72 0.09853295 1182    61 0.09122177      400    18 0.9254499
##     1439    72 0.09853295 1231    61 0.09122177     1015    18 0.9254499
##      818    71 0.09716444   74    60 0.08972633      200    15 0.7712082
## [1] "Nonzero Features"
## [1] 1 100 200 400 600 800 1000 1200 1400 1500
##
## [1] EXPERIMENT 2
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0        5       15       30       60
## [1] 1 100 200 400 600 800 1000 1200 1400 1500
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "1"
## Accuracy Count    Density  MSE Count    Density Knockoff Count   Density
##      100   297 0.12904627  848   195 0.08748671      100    69 6.9207623
##     1400   268 0.11644580  199   187 0.08389751     1400    59 5.9177533
##     1200   267 0.11601130 1069   186 0.08344886     1500    56 5.6168506
##     1000   257 0.11166630  893   185 0.08300021     1000    51 5.1153460
##     1500   242 0.10514882  427   181 0.08120561     1200    36 3.6108325
##      599   206 0.08950684 1370   181 0.08120561      800    27 2.7081244
##      179   203 0.08820335  907   180 0.08075696      200    16 1.6048144
##      372   193 0.08385835  956   180 0.08075696     1047    16 1.6048144
##      352   191 0.08298935  995   180 0.08075696     1404    16 1.6048144
##      852   191 0.08298935    2   179 0.08030831     1156    15 1.5045135
##      116   190 0.08255486  235   178 0.07985966      400    13 1.3039117
##      358   188 0.08168586  297   178 0.07985966      694    13 1.3039117
##      906   188 0.08168586  957   178 0.07985966     1015    13 1.3039117
##      955   187 0.08125136 1056   178 0.07985966     1270    13 1.3039117
##      996   187 0.08125136 1075   178 0.07985966      138     9 0.9027081
## [1] "Nonzero Features"
## [1] 1 100 200 400 600 800 1000 1200 1400 1500
##
## [1] EXPERIMENT 3
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0       15       30       30       60
## [1] 1 100 200 400 600 800 1000 1200 1400 1500
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "1"
## Accuracy Count    Density  MSE Count    Density
##      100   561 0.10748094  725   394 0.07976144
##     1200   528 0.10115853  323   386 0.07814192
##     1400   519 0.09943424  389   386 0.07814192
##     1500   506 0.09694359 1182   383 0.07753460
##     1000   505 0.09675201  264   377 0.07631996
##      599   423 0.08104178  649   376 0.07611752
##     1123   405 0.07759319  579   374 0.07571264
##      915   403 0.07721002  424   373 0.07551020
##     1396   402 0.07701843  135   372 0.07530776
##     1266   401 0.07682684  803   371 0.07510532
##      795   400 0.07663525  442   369 0.07470044
##     1367   400 0.07663525  244   368 0.07449800
##      400   397 0.07606049  817   368 0.07449800
##      112   396 0.07586890 1348   368 0.07449800
##      163   395 0.07567731  246   367 0.07429556
## [1] "Nonzero Features"
## [1] 1 100 200 400 600 800 1000 1200 1400 1500
##
## [1] EXPERIMENT 4
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0        1        5       60       80
## [1] 1 100 200 400 600 800 1000 1200 1400 1500
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "1"
## Accuracy Count    Density  MSE Count    Density Knockoff Count   Density
##      863   118 0.16908351 1288    71 0.10658100      100   242 10.886190
##     1000    97 0.13899238 1146    68 0.10207758     1000   228 10.256410
##      819    82 0.11749871  668    67 0.10057644     1500   209  9.401709
##      513    77 0.11033415  989    67 0.10057644     1400   190  8.547009
##      834    77 0.11033415 1298    67 0.10057644     1200   116  5.218174
##     1475    74 0.10603542  452    65 0.09757416      800   115  5.173189
##     1356    72 0.10316960   89    64 0.09607302     1156    57  2.564103
##     1200    71 0.10173669  962    64 0.09607302     1047    42  1.889339
##     1014    70 0.10030378 1001    64 0.09607302     1413    35  1.574449
##     1173    70 0.10030378 1405    64 0.09607302     1015    34  1.529465
##     1275    70 0.10030378  281    63 0.09457187      138    32  1.439496
##      304    69 0.09887087  555    63 0.09457187      589    31  1.394512
##      400    68 0.09743795  874    63 0.09457187     1266    31  1.394512
##      466    68 0.09743795 1062    63 0.09457187      400    29  1.304543
##     1340    68 0.09743795  106    62 0.09307073      694    29  1.304543
## [1] "Nonzero Features"
## [1] 1 100 200 400 600 800 1000 1200 1400 1500
##
## [1] EXPERIMENT 5
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0        5       15       60       80
## [1] 1 100 200 400 600 800 1000 1200 1400 1500
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "1"
## Accuracy Count    Density  MSE Count    Density Knockoff Count   Density
##      863   274 0.11939154  503   183 0.08618132      100   272 10.755239
##      400   233 0.10152638 1090   181 0.08523945     1000   242  9.569000
##      599   230 0.10021917 1260   180 0.08476851     1500   235  9.292210
##     1000   230 0.10021917    7   179 0.08429758     1400   225  8.896797
##      100   217 0.09455461  954   177 0.08335570      800   166  6.563859
##     1200   214 0.09324741  367   176 0.08288477     1200   158  6.247529
##     1500   213 0.09281167  215   175 0.08241383     1413    68  2.688810
##      304   207 0.09019726  291   175 0.08241383     1047    58  2.293397
##     1063   202 0.08801858 1133   175 0.08241383     1156    54  2.135231
##      112   195 0.08496843 1405   175 0.08241383     1015    53  2.095690
##      326   192 0.08366122 1374   174 0.08194289      138    49  1.937525
##      523   192 0.08366122  439   173 0.08147196      589    30  1.186240
##     1413   192 0.08366122 1179   173 0.08147196      599    28  1.107157
##      519   191 0.08322549 1268   173 0.08147196      200    27  1.067616
##      800   190 0.08278975  373   172 0.08100102      400    27  1.067616
## [1] "Nonzero Features"
## [1] 1 100 200 400 600 800 1000 1200 1400 1500
##
## [1] EXPERIMENT 6
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0       15       30       60       80
## [1] 1 100 200 400 600 800 1000 1200 1400 1500
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "1"
## Accuracy Count    Density  MSE Count    Density
##      100   513 0.09835934  335   376 0.07775439
##     1000   506 0.09701720 1299   376 0.07775439
##     1200   504 0.09663373 1483   376 0.07775439
##     1400   476 0.09126519  756   371 0.07672042
##     1500   454 0.08704705  522   370 0.07651363
##      599   446 0.08551318 1488   369 0.07630683
##      863   439 0.08417105  169   363 0.07506607
##      400   431 0.08263718 1154   363 0.07506607
##      282   422 0.08091158  632   362 0.07485928
##      179   412 0.07899424   88   361 0.07465248
##     1019   406 0.07784384  587   361 0.07465248
##      513   405 0.07765211  642   361 0.07465248
##        1   400 0.07669344  658   361 0.07465248
##      800   399 0.07650171  670   361 0.07465248
##     1014   399 0.07650171  726   361 0.07465248
## [1] "Nonzero Features"
## [1] 1 100 200 400 600 800 1000 1200 1400 1500
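One way to read the tables is to check how many of the "Nonzero Features" appear among the top 15 of each ranking. The Python sketch below does this for the Accuracy and Knockoff columns of Experiment 1, with the feature indices copied from that table. It assumes the "Nonzero Features" list marks the truly informative features, which this report does not state explicitly. The helper name `recovered` is hypothetical.

```python
# Assumption: these are the truly informative (nonzero-effect) features.
NONZERO = {1, 100, 200, 400, 600, 800, 1000, 1200, 1400, 1500}

# Top-15 feature indices copied from the Experiment 1 table.
accuracy_top = [100, 1500, 1000, 863, 1200, 1400, 599, 708, 326, 400,
                1279, 1, 1217, 1439, 818]
knockoff_top = [100, 1000, 1500, 1400, 1200, 800, 1156, 1047, 694, 138,
                589, 863, 400, 1015, 200]


def recovered(top, truth=NONZERO):
    """Return the sorted reference features that appear in a top list."""
    return sorted(truth.intersection(top))


acc_hits = recovered(accuracy_top)
ko_hits = recovered(knockoff_top)
# Features ranked in the top 15 by both methods.
both = sorted(set(accuracy_top) & set(knockoff_top))

print(len(acc_hits), "of", len(NONZERO), "recovered by Accuracy ranking")
print(len(ko_hits), "of", len(NONZERO), "recovered by Knockoff ranking")
print("selected by both:", both)
```

For Experiment 1 this gives 7 of 10 for the Accuracy ranking and 8 of 10 for the Knockoff ranking, with seven features shared by the two top-15 lists.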
The features listed above are then used to run a final analysis applying both the CBDA-SL and the knockoff filter; no other features are used. A final summary of the accuracy of the overall procedure is obtained by applying the CBDA-SL object to the subset of subjects held out for prediction. The predictions are then used to generate the confusion matrix. In short, the procedure combines the CBDA-SL and knockoff filter algorithms in two stages: the first stage selects the top features, and the second stage uses them in a final predictive modeling step that can ultimately be tested for accuracy, sensitivity, …
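The evaluation on held-out subjects can be sketched as follows, assuming a binary outcome. This Python example is a minimal illustration, not the CBDA-SL implementation: the function name `confusion_summary` and the labels are made up, and the predictions stand in for whatever the fitted model returns on the held-out subset.

```python
def confusion_summary(y_true, y_pred, positive=1):
    """Return the 2x2 confusion counts and basic rates for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Avoid dividing by zero when a class is absent from the hold-out set.
        return num / den if den else float("nan")

    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Made-up labels for eight held-out subjects (1 = case, 0 = control).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
summary = confusion_summary(y_true, y_pred)
print(summary)
```

Sensitivity is the fraction of true cases that are predicted as cases, and specificity is the fraction of true controls predicted as controls; both are read directly off the confusion matrix counts.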