This is a summary of a set of experiments using a LONI pipeline workflow file that performs 3000 independent jobs, each applying both the CBDA-SL and the knockoff filter feature-mining strategies. Each experiment has a total of 9000 jobs and is uniquely identified by six input arguments: the number of jobs [M], the percentage of missing values [misValperc], the minimum [Kcol_min] and maximum [Kcol_max] percentages for the FSR (Feature Sampling Range), and the minimum [Nrow_min] and maximum [Nrow_max] percentages for the SSR (Subject Sampling Range).
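The FSR and SSR arguments control how each job subsamples the data. The sketch below shows one plausible reading of that scheme for a single job, under these assumptions:

- Each job draws a feature percentage uniformly between Kcol_min and Kcol_max.
- Each job draws a subject percentage uniformly between Nrow_min and Nrow_max.
- Features and subjects are then sampled without replacement.

The function name `sample_job`, the uniform draws and the data dimensions in the usage line (1000 subjects, 300 features) are hypothetical and are not taken from the CBDA code. misValperc is not modeled.

```python
import random


def sample_job(n_subjects, n_features, kcol_min, kcol_max, nrow_min, nrow_max,
               rng=None):
    """Pick the subject (row) and feature (column) indices for one job.

    Hypothetical sketch: the percentages are drawn uniformly from the FSR
    [kcol_min, kcol_max] and SSR [nrow_min, nrow_max] ranges, and indices are
    sampled without replacement. misValperc is not modeled here.
    """
    if rng is None:
        rng = random.Random()
    k_pct = rng.uniform(kcol_min, kcol_max)   # % of features for this job (FSR)
    n_pct = rng.uniform(nrow_min, nrow_max)   # % of subjects for this job (SSR)
    k = max(1, round(n_features * k_pct / 100))
    n = max(1, round(n_subjects * n_pct / 100))
    cols = sorted(rng.sample(range(1, n_features + 1), k))
    rows = sorted(rng.sample(range(1, n_subjects + 1), n))
    return rows, cols


# Experiment 1 settings with hypothetical data dimensions (1000 subjects, 300 features)
rows, cols = sample_job(1000, 300, kcol_min=1, kcol_max=5,
                        nrow_min=30, nrow_max=60, rng=random.Random(0))
print(len(cols), "features,", len(rows), "subjects")
```

Under Experiment 1's ranges (Kcol 1 to 5%, Nrow 30 to 60%), each job in this sketch sees between 3 and 15 of the 300 features and between 300 and 600 of the 1000 subjects.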
This document presents the final results for each experiment. See https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true for general documentation of the CBDA-SL project, and the GitHub repository https://github.com/SOCR/CBDA for some of the code.
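Features selected by both the knockoff filter and the CBDA-SL algorithms appear as spikes in the histograms below. For each experiment, I list the top 15 selected features. Each table contains three groups of three columns: CBDA-SL results based on Accuracy, CBDA-SL results based on MSE, and knockoff filter results. In each group:

- The first column (labeled Accuracy, MSE, or Knockoff) gives the feature index.
- Count gives how often that feature was selected.
- Density is proportional to Count. In Experiment 1, for example, the Accuracy densities equal 100 × Count / 468.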
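The Count and Density columns can be reproduced from raw selection counts. The sketch below makes these assumptions:

- Each job reports a list of selected feature indices. This input format is hypothetical.
- Features are ranked by how often they were selected, with ties broken by ascending feature index, as the tables appear to do.
- Density is the count as a percentage of all selections.

That denominator is an assumption. It fits the tables, where Density is a constant multiple of Count (for example, 100 × 6 / 468 ≈ 1.2820513 for feature 210 in Experiment 1), but the original scripts are not shown here. The name `top_features` is hypothetical.

```python
from collections import Counter


def top_features(selections, top_n=15):
    """Rank features by selection frequency across jobs.

    selections: iterable of feature-index lists, one per job (hypothetical format).
    Returns (feature, count, density) tuples; density = 100 * count / total
    selections, an assumed denominator consistent with the tables' proportionality.
    """
    counts = Counter(f for job in selections for f in job)
    total = sum(counts.values())
    # Highest count first; ties broken by ascending feature index
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(f, c, 100 * c / total) for f, c in ranked[:top_n]]


# Three toy jobs: feature 2 is selected by all of them
for row in top_features([[1, 2], [2, 3], [2]]):
    print(row)
```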
## [1] EXPERIMENT 1
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000 0 1 5 30 60
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "1"
## Accuracy Count Density MSE Count Density Knockoff Count Density
## 210 6 1.2820513 108 7 1.1437908 80 62 1.892552
## 247 6 1.2820513 24 6 0.9803922 248 55 1.678877
## 275 6 1.2820513 42 6 0.9803922 209 51 1.556777
## 293 6 1.2820513 89 6 0.9803922 4 48 1.465201
## 197 5 1.0683761 182 6 0.9803922 150 43 1.312576
## 248 5 1.0683761 205 6 0.9803922 57 39 1.190476
## 259 5 1.0683761 281 6 0.9803922 169 39 1.190476
## 297 5 1.0683761 18 5 0.8169935 154 38 1.159951
## 21 4 0.8547009 22 5 0.8169935 170 37 1.129426
## 45 4 0.8547009 74 5 0.8169935 12 36 1.098901
## 48 4 0.8547009 77 5 0.8169935 147 36 1.098901
## 74 4 0.8547009 88 5 0.8169935 252 36 1.098901
## 100 4 0.8547009 98 5 0.8169935 68 35 1.068376
## 117 4 0.8547009 114 5 0.8169935 171 35 1.068376
## 127 4 0.8547009 121 5 0.8169935 33 34 1.037851
## [1] "Nonzero Features"
## [1] 1 30 60 100 130 160 200 230 260 300
## [1] EXPERIMENT 2
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000 0 5 15 30 60
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "2"
## Accuracy Count Density MSE Count Density Knockoff Count Density
## 235 13 0.8274984 124 13 0.7475561 4 59 4.100069
## 150 12 0.7638447 34 12 0.6900518 248 59 4.100069
## 273 12 0.7638447 82 12 0.6900518 80 50 3.474635
## 33 11 0.7001910 134 12 0.6900518 12 33 2.293259
## 75 11 0.7001910 200 12 0.6900518 150 33 2.293259
## 113 11 0.7001910 258 12 0.6900518 169 33 2.293259
## 221 11 0.7001910 13 11 0.6325474 209 32 2.223767
## 250 11 0.7001910 91 11 0.6325474 33 30 2.084781
## 280 11 0.7001910 98 11 0.6325474 147 28 1.945796
## 6 10 0.6365372 139 11 0.6325474 154 27 1.876303
## 13 10 0.6365372 159 11 0.6325474 253 27 1.876303
## 16 10 0.6365372 185 11 0.6325474 57 26 1.806810
## 85 10 0.6365372 227 11 0.6325474 68 26 1.806810
## 156 10 0.6365372 250 11 0.6325474 118 25 1.737318
## 239 10 0.6365372 289 11 0.6325474 203 25 1.737318
## [1] "Nonzero Features"
## [1] 1 30 60 100 130 160 200 230 260 300
## [1] EXPERIMENT 3
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000 0 15 30 30 60
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "3"
## Accuracy Count Density MSE Count Density Knockoff Count Density
## 151 24 0.6690828 300 21 0.5915493 4 77 4.171181
## 21 20 0.5575690 25 20 0.5633803 80 69 3.737811
## 98 20 0.5575690 239 20 0.5633803 248 69 3.737811
## 254 20 0.5575690 23 19 0.5352113 118 53 2.871073
## 74 19 0.5296905 62 19 0.5352113 209 48 2.600217
## 80 19 0.5296905 100 19 0.5352113 154 47 2.546046
## 146 19 0.5296905 232 19 0.5352113 253 44 2.383532
## 147 19 0.5296905 13 18 0.5070423 203 40 2.166847
## 12 18 0.5018121 78 18 0.5070423 57 39 2.112676
## 33 18 0.5018121 93 18 0.5070423 169 38 2.058505
## 81 18 0.5018121 94 18 0.5070423 150 37 2.004334
## 152 18 0.5018121 114 18 0.5070423 33 31 1.679307
## 203 18 0.5018121 190 18 0.5070423 147 30 1.625135
## 235 18 0.5018121 234 18 0.5070423 68 29 1.570964
## 245 18 0.5018121 7 17 0.4788732 70 29 1.570964
## [1] "Nonzero Features"
## [1] 1 30 60 100 130 160 200 230 260 300
## [1] EXPERIMENT 4
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000 0 1 5 60 80
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "4"
## Accuracy Count Density MSE Count Density Knockoff Count Density
## 113 8 1.5686275 8 7 1.1627907 248 83 2.705346
## 247 8 1.5686275 52 6 0.9966777 4 66 2.151239
## 88 7 1.3725490 79 6 0.9966777 209 65 2.118644
## 147 6 1.1764706 95 6 0.9966777 253 65 2.118644
## 191 6 1.1764706 138 6 0.9966777 169 61 1.988266
## 206 6 1.1764706 158 6 0.9966777 80 60 1.955671
## 255 6 1.1764706 290 6 0.9966777 118 58 1.890482
## 22 5 0.9803922 293 6 0.9966777 12 53 1.727510
## 69 5 0.9803922 67 5 0.8305648 154 51 1.662321
## 90 5 0.9803922 82 5 0.8305648 171 50 1.629726
## 92 5 0.9803922 90 5 0.8305648 203 49 1.597132
## 98 5 0.9803922 99 5 0.8305648 57 48 1.564537
## 273 5 0.9803922 184 5 0.8305648 33 46 1.499348
## 276 5 0.9803922 243 5 0.8305648 68 46 1.499348
## 12 4 0.7843137 246 5 0.8305648 293 43 1.401565
## [1] "Nonzero Features"
## [1] 1 30 60 100 130 160 200 230 260 300
## [1] EXPERIMENT 6
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000 0 15 30 60 80
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "6"
## Accuracy Count Density MSE Count Density Knockoff Count Density
## 235 26 0.7564737 42 22 0.6037322 248 73 10.138889
## 80 24 0.6982834 137 21 0.5762898 80 68 9.444444
## 30 21 0.6109980 205 21 0.5762898 4 64 8.888889
## 51 20 0.5819028 89 20 0.5488474 209 51 7.083333
## 63 20 0.5819028 50 19 0.5214050 118 29 4.027778
## 283 20 0.5819028 107 19 0.5214050 154 29 4.027778
## 165 19 0.5528077 155 19 0.5214050 253 29 4.027778
## 173 19 0.5528077 184 19 0.5214050 147 24 3.333333
## 282 19 0.5528077 1 18 0.4939627 57 23 3.194444
## 76 18 0.5237125 15 18 0.4939627 150 22 3.055556
## 150 18 0.5237125 131 18 0.4939627 169 21 2.916667
## 253 18 0.5237125 178 18 0.4939627 203 20 2.777778
## 292 18 0.5237125 222 18 0.4939627 12 18 2.500000
## 33 17 0.4946174 233 18 0.4939627 68 15 2.083333
## 125 17 0.4946174 269 18 0.4939627 171 15 2.083333
## [1] "Nonzero Features"
## [1] 1 30 60 100 130 160 200 230 260 300
The features listed above are then used in a final analysis that applies both CBDA-SL and the knockoff filter; only these features enter the final analysis. The overall accuracy of the procedure is assessed by applying the CBDA-SL object to the subset of subjects held out for prediction, and the resulting predictions are used to build the confusion matrix. In short, the CBDA-SL and knockoff filter algorithms are combined to select the top features in a first round. The second stage then uses those top features in a final predictive-modeling step that can ultimately be tested for accuracy, sensitivity, …
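The final evaluation step (predictions on held-out subjects turned into a confusion matrix and summary metrics) can be sketched as follows for a binary outcome. This is a minimal illustration, assuming labels are coded so that `positive` marks the class of interest. The function name `confusion_summary` and the toy labels are hypothetical, and the CBDA-SL scripts may compute these metrics differently.

```python
def confusion_summary(y_true, y_pred, positive=1):
    """Build a 2x2 confusion matrix and basic metrics for a binary outcome."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    n = tp + fp + tn + fn
    if n == 0:
        raise ValueError("no predictions to evaluate")
    nan = float("nan")
    return {
        # rows = true class (negative, positive); columns = predicted class
        "matrix": [[tn, fp], [fn, tp]],
        "accuracy": (tp + tn) / n,
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
    }


# Toy held-out labels and predictions (hypothetical)
summary = confusion_summary([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 0, 1])
print(summary)
```

In the toy example, two positives and two negatives are classified correctly, with one false negative and one false positive. That gives an accuracy of 4/6 and a sensitivity and specificity of 2/3 each.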