This is a summary of a set of 6 experiments using a LONI Pipeline workflow file that performs 3000 independent jobs, each one applying both the CBDA-SL and the knockoff filter feature mining strategies. Each experiment has a total of 9000 jobs and is uniquely identified by 6 input arguments: the number of jobs [M], the % of missing values [misValperc], the min [Kcol_min] and max [Kcol_max] % for the Feature Sampling Range (FSR), and the min [Nrow_min] and max [Nrow_max] % for the Subject Sampling Range (SSR).
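The six parameter combinations that appear in the experiment headers below can be written as a small grid. This is only an illustrative sketch in Python: the `ExperimentSpec` class and `build_experiments` helper are hypothetical names, not part of the CBDA code, and the ranges are copied from the headers in this report.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ExperimentSpec:
    """The 6 input arguments that identify one experiment (hypothetical container)."""
    M: int           # number of jobs
    misValperc: int  # % of missing values
    Kcol_min: int    # min % for the Feature Sampling Range (FSR)
    Kcol_max: int    # max % for the Feature Sampling Range (FSR)
    Nrow_min: int    # min % for the Subject Sampling Range (SSR)
    Nrow_max: int    # max % for the Subject Sampling Range (SSR)


# Ranges as they appear in the per-experiment headers of this report.
FSR_RANGES = [(1, 5), (5, 15), (15, 30)]
SSR_RANGES = [(30, 60), (60, 80)]


def build_experiments(M=9000, misValperc=0):
    # SSR is the outer loop, which matches the order of Experiments 1 to 6.
    return [
        ExperimentSpec(M, misValperc, kmin, kmax, nmin, nmax)
        for (nmin, nmax), (kmin, kmax) in product(SSR_RANGES, FSR_RANGES)
    ]


experiments = build_experiments()
for number, spec in enumerate(experiments, start=1):
    print(number, spec)
```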
This document reports the final results, by experiment. General documentation of the CBDA-SL project is available at https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true, and some of the code is on GitHub at https://github.com/SOCR/CBDA.
Features selected by both the knockoff filter and the CBDA-SL algorithms appear as spikes in the histograms. For each experiment, the top selected features are listed; the number of features listed is set to 15 here.
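The Count and Density columns in the tables below can be reproduced from per-job selections roughly as follows. This is a sketch under assumptions: the source does not define Density, but the numbers are consistent with the count expressed as a percentage of all selections (in Experiment 1, a knockoff count of 206 with density 6.180618 implies 3333 selections in total). The function name `top_features` and the toy input are hypothetical.

```python
from collections import Counter


def top_features(selected_per_job, top=15):
    """Rank features by how often they were selected across jobs.

    Returns (feature, count, density) tuples, where density is assumed to be
    the count as a percentage of all selections.
    """
    counts = Counter(f for job in selected_per_job for f in job)
    total = sum(counts.values())
    # Sort by descending count; break ties by ascending feature index.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(f, c, 100.0 * c / total) for f, c in ranked[:top]]


# Toy input (hypothetical): each inner list is the feature set one job selected.
jobs = [[60, 160, 300], [60, 300], [60, 30], [160, 60]]
for feature, count, density in top_features(jobs, top=3):
    print(feature, count, round(density, 2))
```

Features that recur across many jobs rise to the top of this ranking, which is what the spikes in the histograms correspond to.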
## [1] EXPERIMENT 1
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000 0 1 5 30 60
## [1] 1 30 60 100 130 160 200 230 260 300
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "1"
## Accuracy Count   Density MSE Count   Density Knockoff Count  Density
##      300   212 1.0733634 170    92 0.5216898       60   206 6.180618
##      200   141 0.7138879  26    88 0.4990077      160   167 5.010501
##      151   139 0.7037618 117    87 0.4933371      300   165 4.950495
##       30   134 0.6784467 190    85 0.4819960       30   116 3.480348
##      130   126 0.6379424 254    82 0.4649844      200    99 2.970297
##      152   103 0.5214926 112    81 0.4593139      130    76 2.280228
##       60   100 0.5063035 264    81 0.4593139      100    61 1.830183
##       61    98 0.4961774 208    79 0.4479728       56    57 1.710171
##       36    95 0.4809883  14    77 0.4366317        1    56 1.680168
##       56    95 0.4809883  52    76 0.4309612      260    43 1.290129
##      154    94 0.4759253 110    76 0.4309612      203    39 1.170117
##      230    92 0.4657992 153    76 0.4309612      230    39 1.170117
##       35    91 0.4607362 226    76 0.4309612      152    35 1.050105
##      105    90 0.4556731 291    76 0.4309612      154    34 1.020102
##      135    87 0.4404840 119    75 0.4252906      107    33 0.990099
## [1] "Nonzero Features"
## [1] 1 30 60 100 130 160 200 230 260 300
##
## [1] EXPERIMENT 2
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000 0 5 15 30 60
## [1] 1 30 60 100 130 160 200 230 260 300
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "2"
## Accuracy Count   Density MSE Count   Density Knockoff Count   Density
##      300   483 0.7634794 265   238 0.4200049       60   402 16.763970
##      151   341 0.5390196 117   231 0.4076519      160   301 12.552127
##       30   332 0.5247933 252   231 0.4076519      300   292 12.176814
##      130   295 0.4663073  53   227 0.4005929       30   159  6.630525
##      200   293 0.4631459 186   227 0.4005929      200   103  4.295246
##       60   269 0.4252090 110   225 0.3970635      130    80  3.336113
##      154   265 0.4188862 215   225 0.3970635      100    68  2.835696
##       56   263 0.4157248  16   224 0.3952988       56    58  2.418682
##      251   263 0.4157248  82   224 0.3952988        1    52  2.168474
##      105   261 0.4125634  91   223 0.3935340      260    44  1.834862
##       83   257 0.4062406 175   222 0.3917693      230    40  1.668057
##       71   254 0.4014985 193   221 0.3900046      152    33  1.376147
##      298   252 0.3983371 169   220 0.3882399      203    29  1.209341
##       31   250 0.3951757 128   217 0.3829457      232    27  1.125938
##      134   250 0.3951757 174   217 0.3829457      183    24  1.000834
## [1] "Nonzero Features"
## [1] 1 30 60 100 130 160 200 230 260 300
##
## [1] EXPERIMENT 3
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000 0 15 30 30 60
## [1] 1 30 60 100 130 160 200 230 260 300
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "3"
## Accuracy Count   Density MSE Count   Density Knockoff Count    Density
##      300   870 0.6229504 271   489 0.3785767       60   596 20.3343569
##       30   667 0.4775953 194   488 0.3778026      300   447 15.2507677
##      151   575 0.4117201 187   483 0.3739316      160   402 13.7154555
##      200   569 0.4074238  24   482 0.3731574       30   212  7.2330263
##       71   552 0.3952513  89   482 0.3731574      200   139  4.7424087
##      130   549 0.3931032 118   479 0.3708349      130    95  3.2412146
##       56   543 0.3888069 143   479 0.3708349      100    83  2.8317980
##      260   533 0.3816466 157   478 0.3700607       56    82  2.7976800
##      134   531 0.3802145 196   477 0.3692865        1    69  2.3541453
##      152   521 0.3730542 224   477 0.3692865      260    48  1.6376663
##      105   516 0.3694740 175   476 0.3685123      203    34  1.1600136
##       60   515 0.3687580 276   476 0.3685123      230    33  1.1258956
##      154   514 0.3680419 125   475 0.3677381      107    30  1.0235415
##      166   511 0.3658938 270   474 0.3669640      183    25  0.8529512
##      209   511 0.3658938 264   473 0.3661898      152    22  0.7505971
## [1] "Nonzero Features"
## [1] 1 30 60 100 130 160 200 230 260 300
##
## [1] EXPERIMENT 4
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000 0 1 5 60 80
## [1] 1 30 60 100 130 160 200 230 260 300
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "4"
## Accuracy Count   Density MSE Count   Density Knockoff Count  Density
##      151   216 1.0892038 122    89 0.5024558       60   232 6.060606
##      300   174 0.8774141 284    85 0.4798735      160   225 5.877743
##       30   112 0.5647723  82    84 0.4742280      300   225 5.877743
##      135   101 0.5093036 175    84 0.4742280       30   186 4.858934
##      139   100 0.5042610 153    82 0.4629368      200   145 3.787879
##      166    99 0.4992184 180    81 0.4572913      130   125 3.265413
##      130    96 0.4840906  19    79 0.4460001      100   107 2.795193
##      295    96 0.4840906 117    79 0.4460001       56    92 2.403344
##        6    95 0.4790480 127    79 0.4460001        1    89 2.324974
##       83    95 0.4790480 223    79 0.4460001      260    87 2.272727
##        4    92 0.4639201 168    78 0.4403545      230    81 2.115987
##       36    92 0.4639201  59    77 0.4347090      203    52 1.358412
##      244    91 0.4588775  10    76 0.4290634      107    51 1.332288
##      200    90 0.4538349  48    76 0.4290634      152    45 1.175549
##      273    90 0.4538349  97    76 0.4290634      183    45 1.175549
## [1] "Nonzero Features"
## [1] 1 30 60 100 130 160 200 230 260 300
##
## [1] EXPERIMENT 5
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000 0 5 15 60 80
## [1] 1 30 60 100 130 160 200 230 260 300
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "5"
## Accuracy Count   Density MSE Count   Density Knockoff Count    Density
##      151   492 0.7767481 270   242 0.4403042       60   724 18.6597938
##      300   453 0.7151766 227   234 0.4257487      300   664 17.1134021
##       30   310 0.4894144 110   231 0.4202904      160   589 15.1804124
##      200   269 0.4246854  66   229 0.4166515       30   364  9.3814433
##      273   261 0.4120554 182   229 0.4166515      200   214  5.5154639
##       71   258 0.4073191 163   228 0.4148321      130   190  4.8969072
##      260   258 0.4073191  68   224 0.4075543      100   162  4.1752577
##      130   254 0.4010041 144   224 0.4075543       56   130  3.3505155
##      298   254 0.4010041 106   222 0.4039154        1   127  3.2731959
##      166   253 0.3994253 101   221 0.4020960      230    78  2.0103093
##      155   250 0.3946891 279   221 0.4020960      260    75  1.9329897
##       83   249 0.3931103  29   220 0.4002766      203    53  1.3659794
##      165   247 0.3899528   7   219 0.3984571      183    44  1.1340206
##      256   246 0.3883740  46   219 0.3984571      232    32  0.8247423
##       28   245 0.3867953  51   219 0.3984571      107    24  0.6185567
## [1] "Nonzero Features"
## [1] 1 30 60 100 130 160 200 230 260 300
##
## [1] EXPERIMENT 6
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000 0 15 30 60 80
## [1] 1 30 60 100 130 160 200 230 260 300
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "6"
## Accuracy Count   Density MSE Count   Density Knockoff Count    Density
##      300   830 0.5980473 186   498 0.3926051       60  1116 26.9305019
##      151   749 0.5396837  79   495 0.3902401      300   896 21.6216216
##       30   641 0.4618655 253   487 0.3839331      160   799 19.2808880
##      230   534 0.3847678  29   485 0.3823564       30   419 10.1110039
##      130   533 0.3840473 163   482 0.3799913      200   205  4.9469112
##      166   528 0.3804446  65   479 0.3776262      130   125  3.0164093
##      298   521 0.3754008 172   476 0.3752611      100   121  2.9198842
##       28   519 0.3739597 235   476 0.3752611       56    91  2.1959459
##       83   519 0.3739597 101   475 0.3744728        1    88  2.1235521
##      200   519 0.3739597  17   474 0.3736844      260    44  1.0617761
##      152   515 0.3710776 116   474 0.3736844      230    41  0.9893822
##      134   514 0.3703570  49   473 0.3728961      183    21  0.5067568
##      290   513 0.3696365 296   473 0.3728961      203    14  0.3378378
##      105   508 0.3660338 185   472 0.3721077      232    14  0.3378378
##      113   508 0.3660338 190   471 0.3713193      134    12  0.2895753
## [1] "Nonzero Features"
## [1] 1 30 60 100 130 160 200 230 260 300
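One way to read each table is to check how many of the listed "Nonzero Features" appear among the top 15 of each ranking. The sketch below does this for Experiment 6, with the indices copied from its table. It assumes that "Nonzero Features" is the reference set the rankings should be compared against; the helper name `recovered` is hypothetical.

```python
# Reference set printed under "Nonzero Features" (assumed to be the target set).
NONZERO = {1, 30, 60, 100, 130, 160, 200, 230, 260, 300}

# Top-15 feature indices copied from the Experiment 6 table.
TOP15 = {
    "Accuracy": [300, 151, 30, 230, 130, 166, 298, 28, 83, 200, 152, 134, 290, 105, 113],
    "MSE": [186, 79, 253, 29, 163, 65, 172, 235, 101, 17, 116, 49, 296, 185, 190],
    "Knockoff": [60, 300, 160, 30, 200, 130, 100, 56, 1, 260, 230, 183, 203, 232, 134],
}


def recovered(top, truth):
    """Return the ranked features that also appear in the reference set."""
    return [f for f in top if f in truth]


for name, top in TOP15.items():
    hits = recovered(top, NONZERO)
    print(f"{name}: {len(hits)}/{len(NONZERO)} nonzero features in top {len(top)}")
```

For Experiment 6 this gives 10 of 10 for the knockoff ranking, 5 of 10 for the accuracy ranking, and 0 of 10 for the MSE ranking.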
The features listed above are then used to run a final analysis applying both the CBDA-SL and the knockoff filter. The ONLY features used for that analysis are the ones listed above. A final summary of the accuracy of the overall procedure is obtained by applying the CBDA-SL object to the subset of subjects held out for prediction. The predictions are then used to generate the confusion matrix. In short, the CBDA-SL and knockoff filter algorithms are combined in a first stage to select the top features. The second stage then uses those top features to run a final predictive modeling step that can ultimately be tested for accuracy, sensitivity, …
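The last step, scoring the held-out predictions, can be sketched as follows. This is a minimal illustration that assumes a binary outcome coded as 0/1; the function names and the toy labels are hypothetical and do not come from the CBDA code or from any result in this report.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    """Accuracy and sensitivity computed from the confusion matrix."""
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        # Sensitivity is undefined when there are no positive cases.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
    }


# Toy held-out labels and predictions (hypothetical, for illustration only).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(confusion_matrix(y_true, y_pred))
print(summarize(y_true, y_pred))
```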