This is a summary of a set of 6 experiments run with a LONI pipeline workflow file that performs 3000 independent jobs, each applying both the CBDA-SL and the knockoff filter feature mining strategies. Each experiment has a total of 9000 jobs and is uniquely identified by 6 input arguments: the number of jobs [M], the percentage of missing values [misValperc], the minimum [Kcol_min] and maximum [Kcol_max] percentages for the Feature Sampling Range (FSR), and the minimum [Nrow_min] and maximum [Nrow_max] percentages for the Subject Sampling Range (SSR).
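As an illustration, the six argument values printed in the experiment headers of this report can be collected in a single table. This is only a sketch, not part of the LONI workflow: the object names `fsr`, `ssr`, `idx` and `experiments` are hypothetical, and the values are copied from the headers of this report.

```r
# Sampling ranges (in percent) copied from the experiment headers in this report.
fsr <- data.frame(Kcol_min = c(1, 5, 15), Kcol_max = c(5, 15, 30))  # Feature Sampling Range
ssr <- data.frame(Nrow_min = c(30, 60),   Nrow_max = c(60, 80))     # Subject Sampling Range

# Cross every SSR with every FSR. expand.grid varies its first argument
# fastest, so the FSR changes first, matching the order used in this report.
idx <- expand.grid(fsr = seq_len(nrow(fsr)), ssr = seq_len(nrow(ssr)))

experiments <- data.frame(
  experiment = seq_len(nrow(idx)),
  M          = 9000,  # number of jobs, as printed in each header
  misValperc = 0,     # percentage of missing values
  fsr[idx$fsr, ],
  ssr[idx$ssr, ],
  row.names  = NULL
)
print(experiments)
```

Each row of `experiments` corresponds to one experiment header (M, misValperc, Kcol_min, Kcol_max, Nrow_min, Nrow_max).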
This document reports the final results for each experiment. General documentation of the CBDA-SL project is available at https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true, and some of the code is on GitHub at https://github.com/SOCR/CBDA.
Features selected by both the knockoff filter and the CBDA-SL algorithms appear as spikes in the histograms below. For each experiment I list the top selected features; the number of top features is set to 15 here.
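The sketch below shows one way a ranked Count/Density table of this kind could be built from per-job selections. It is an assumption-laden illustration, not the CBDA code: the function `top_features` and the toy input `selected_by_job` are hypothetical. The Density definition (100 × Count / total number of selections) is inferred from the tables in this report. For example, in the Knockoff column of Experiment 1, counts of 144 and 132 correspond to densities 8.4955752 and 7.7876106, which is consistent with a total of 1695 selections.

```r
# Rank features by how often they were selected across jobs.
# `selected` is a list with one integer vector of feature indices per job.
top_features <- function(selected, top = 15) {
  counts <- sort(table(unlist(selected)), decreasing = TRUE)
  out <- data.frame(
    Feature = as.integer(names(counts)),
    Count   = as.integer(counts),
    # Assumed definition: share of all selections, in percent.
    Density = 100 * as.integer(counts) / sum(counts)
  )
  head(out, top)
}

# Toy input: feature indices chosen by four hypothetical jobs.
selected_by_job <- list(c(300, 800), c(300, 900), c(300, 800, 1), c(300))
res <- top_features(selected_by_job, top = 3)
print(res)
```

With real output, `selected` would hold the features retained by each of the jobs, and `top = 15` would match the setting used in this report.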
## [1] EXPERIMENT 1
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0        1        5       30       60
## [1] 1 100 200 300 400 500 600 700 800 900
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "1"
## Accuracy Count   Density MSE Count   Density Knockoff Count   Density
##      300    53 0.3574560 687    33 0.2292622      800   144 8.4955752
##      900    53 0.3574560 392    30 0.2084202      600   132 7.7876106
##      800    49 0.3304782 380    29 0.2014728      900   124 7.3156342
##      400    43 0.2900115 835    29 0.2014728      100   100 5.8997050
##      409    32 0.2158225  95    28 0.1945255      500    88 5.1917404
##      500    32 0.2158225 356    28 0.1945255      300    75 4.4247788
##      706    32 0.2158225 430    28 0.1945255        1    69 4.0707965
##       32    31 0.2090780 546    27 0.1875782      200    58 3.4218289
##      100    31 0.2090780  54    26 0.1806308      400    39 2.3008850
##      257    30 0.2023336 438    26 0.1806308      108    36 2.1238938
##      386    30 0.2023336 482    26 0.1806308      574    22 1.2979351
##      600    30 0.2023336 846    26 0.1806308      700    19 1.1209440
##      611    30 0.2023336 858    26 0.1806308      424    16 0.9439528
##      511    29 0.1955891  38    25 0.1736835      650    16 0.9439528
##      482    28 0.1888447 151    25 0.1736835      139    15 0.8849558
## [1] "Nonzero Features"
## [1] 1 100 200 300 400 500 600 700 800 900
##
## [1] EXPERIMENT 2
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0        5       15       30       60
## [1] 1 100 200 300 400 500 600 700 800 900
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "1"
## Accuracy Count   Density MSE Count   Density Knockoff Count   Density
##      300   134 0.2798254 325    73 0.1682221      600   208 8.3467095
##      800   113 0.2359722 370    73 0.1682221      900   203 8.1460674
##      600   107 0.2234427 502    73 0.1682221      800   199 7.9855538
##      900   107 0.2234427 374    71 0.1636133      100   187 7.5040128
##      100    99 0.2067367 871    70 0.1613089        1   164 6.5810594
##      400    92 0.1921189 211    69 0.1590045      300   136 5.4574639
##      409    75 0.1566187 404    69 0.1590045      500   122 4.8956661
##      222    74 0.1545305  43    68 0.1567001      108    74 2.9695024
##       79    73 0.1524422 145    67 0.1543957      200    70 2.8089888
##      295    73 0.1524422  68    66 0.1520913      400    54 2.1669342
##      709    73 0.1524422 279    66 0.1520913      574    43 1.7255217
##       52    72 0.1503540 223    65 0.1497868      700    39 1.5650080
##      173    72 0.1503540 384    65 0.1497868      424    32 1.2841091
##      200    72 0.1503540 427    65 0.1497868      765    24 0.9630819
##       30    71 0.1482657 533    65 0.1497868      379    23 0.9229535
## [1] "Nonzero Features"
## [1] 1 100 200 300 400 500 600 700 800 900
##
## [1] EXPERIMENT 3
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0       15       30       30       60
## [1] 1 100 200 300 400 500 600 700 800 900
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "1"
## Accuracy Count   Density MSE Count   Density Knockoff Count   Density
##      300   234 0.2187345 422   142 0.1460470      500     5 14.705882
##      800   230 0.2149955 412   141 0.1450185        1     3  8.823529
##      900   211 0.1972350 683   137 0.1409045      100     3  8.823529
##      100   196 0.1832135  96   136 0.1398760      800     3  8.823529
##      600   182 0.1701268 151   135 0.1388475      900     3  8.823529
##      400   166 0.1551706 338   134 0.1378190      139     2  5.882353
##      200   158 0.1476925 359   133 0.1367905      548     2  5.882353
##        1   155 0.1448882 728   133 0.1367905      574     2  5.882353
##      191   147 0.1374101 226   132 0.1357620       14     1  2.941176
##      377   145 0.1355406 370   132 0.1357620      112     1  2.941176
##      549   145 0.1355406 628   132 0.1357620      246     1  2.941176
##      222   144 0.1346059 390   131 0.1347335      563     1  2.941176
##      362   144 0.1346059 822   131 0.1347335      590     1  2.941176
##      500   144 0.1346059 572   130 0.1337050      596     1  2.941176
##      586   144 0.1346059 804   130 0.1337050      676     1  2.941176
## [1] "Nonzero Features"
## [1] 1 100 200 300 400 500 600 700 800 900
##
## [1] EXPERIMENT 4
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0        1        5       60       80
## [1] 1 100 200 300 400 500 600 700 800 900
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "1"
## Accuracy Count   Density MSE Count   Density Knockoff Count  Density
##      300    59 0.4052476 156    32 0.2279527      600   216 9.751693
##      496    42 0.2884814 874    31 0.2208292      800   205 9.255079
##      409    33 0.2266639 515    30 0.2137057      100   186 8.397291
##      718    33 0.2266639   8    28 0.1994586      900   185 8.352144
##      782    33 0.2266639 119    28 0.1994586      500   157 7.088036
##      556    32 0.2197953 712    28 0.1994586      300   151 6.817156
##      765    32 0.2197953  99    27 0.1923351        1   145 6.546275
##      244    31 0.2129267 249    27 0.1923351      200   104 4.695260
##      441    30 0.2060581 426    27 0.1923351      400    76 3.431151
##      611    30 0.2060581   4    26 0.1852116      108    72 3.250564
##      650    30 0.2060581 507    26 0.1852116      700    54 2.437923
##      845    30 0.2060581 555    26 0.1852116      574    33 1.489842
##      783    29 0.1991895 610    26 0.1852116      738    27 1.218962
##      386    28 0.1923209 638    26 0.1852116      424    24 1.083521
##      600    28 0.1923209 163    25 0.1780880      548    24 1.083521
## [1] "Nonzero Features"
## [1] 1 100 200 300 400 500 600 700 800 900
##
## [1] EXPERIMENT 5
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0        5       15       60       80
## [1] 1 100 200 300 400 500 600 700 800 900
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "1"
## Accuracy Count   Density MSE Count   Density Knockoff Count    Density
##      300   132 0.2769792 197    72 0.1720225      600   557 13.1027993
##      900   100 0.2098328  38    69 0.1648549      800   508 11.9501294
##      400    88 0.1846528 566    69 0.1648549      900   469 11.0326982
##      496    85 0.1783578 188    68 0.1624657      100   451 10.6092684
##      650    85 0.1783578 660    67 0.1600765        1   375  8.8214538
##      593    83 0.1741612 713    67 0.1600765      300   369  8.6803105
##      764    81 0.1699645 570    66 0.1576873      500   346  8.1392614
##      332    80 0.1678662 228    65 0.1552981      200   221  5.1987768
##      409    79 0.1657679 511    65 0.1552981      108   143  3.3639144
##      800    76 0.1594729 872    65 0.1552981      400   132  3.1051517
##      173    75 0.1573746 159    64 0.1529089      700    87  2.0465773
##      315    74 0.1552762 669    64 0.1529089      424    56  1.3173371
##      782    74 0.1552762 670    64 0.1529089      379    34  0.7998118
##      282    73 0.1531779 219    63 0.1505197      762    32  0.7527641
##      643    73 0.1531779  42    62 0.1481305      738    31  0.7292402
## [1] "Nonzero Features"
## [1] 1 100 200 300 400 500 600 700 800 900
##
## [1] EXPERIMENT 6
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0       15       30       60       80
## [1] 1 100 200 300 400 500 600 700 800 900
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "1"
## Accuracy Count   Density MSE Count   Density Knockoff Count   Density
##      300   212 0.2026517 487   140 0.1471114      600   169 10.910265
##      800   192 0.1835336 242   139 0.1460606      100   159 10.264687
##      100   182 0.1739746 852   136 0.1429082      800   151  9.748225
##      900   181 0.1730186 149   134 0.1408066      900   135  8.715300
##      400   178 0.1701509 294   134 0.1408066        1   126  8.134280
##      600   165 0.1577242 620   133 0.1397558      300   100  6.455778
##      650   159 0.1519888 648   132 0.1387050      500   100  6.455778
##      700   156 0.1491210 740   132 0.1387050      200    89  5.745642
##      874   148 0.1414738 744   131 0.1376542      108    61  3.938025
##      892   147 0.1405179 877   131 0.1376542      400    57  3.679793
##      496   145 0.1386061 893   131 0.1376542      700    28  1.807618
##       93   142 0.1357384 232   130 0.1366034      574    26  1.678502
##      282   142 0.1357384 263   130 0.1366034      762    21  1.355713
##      500   142 0.1357384 286   130 0.1366034      379    19  1.226598
##       79   141 0.1347825 482   130 0.1366034      424    19  1.226598
## [1] "Nonzero Features"
## [1] 1 100 200 300 400 500 600 700 800 900
The features listed above are then used to run a final analysis that applies both CBDA-SL and the knockoff filter; they are the ONLY features used in that analysis. The accuracy of the overall procedure is assessed by applying the CBDA-SL object to the subset of subjects held out for prediction, and the resulting predictions are used to generate the confusion matrix. In short, the procedure has two stages. The first stage combines the CBDA-SL and knockoff filter algorithms to select the top features. The second stage uses those top features in a final predictive modeling step that can ultimately be tested for accuracy, sensitivity, ...
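The evaluation step can be sketched as follows, assuming a binary outcome coded 0/1. The vectors `truth` and `pred` are hypothetical stand-ins for the held-out labels and the CBDA-SL predictions; they are not results from these experiments, and the code uses only base R rather than the CBDA package.

```r
# Hypothetical held-out labels and predictions (binary outcome coded 0/1).
truth <- c(1, 0, 1, 1, 0, 0, 1, 0, 1, 0)
pred  <- c(1, 0, 0, 1, 0, 1, 1, 0, 1, 0)

# Fix the factor levels so the table is always 2 x 2,
# even if one class is never predicted.
cm <- table(
  Predicted = factor(pred,  levels = c(0, 1)),
  Actual    = factor(truth, levels = c(0, 1))
)

tp <- cm["1", "1"]; tn <- cm["0", "0"]  # correct predictions
fp <- cm["1", "0"]; fn <- cm["0", "1"]  # errors

metrics <- c(
  accuracy    = (tp + tn) / sum(cm),
  sensitivity = tp / (tp + fn),
  specificity = tn / (tn + fp)
)
print(cm)
print(round(metrics, 3))
```

Rows of `cm` are predicted classes and columns are actual classes, so sensitivity and specificity are computed down the columns.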