This is a summary of a set of experiments run with a LONI pipeline workflow file that performs 3000 independent jobs, each applying both the CBDA-SL and the knockoff filter feature mining strategies. Each experiment has a total of 9000 jobs and is uniquely identified by 6 input arguments: number of jobs [M], % of missing values [misValperc], min [Kcol_min] and max [Kcol_max] % for the FSR (Feature Sampling Range), and min [Nrow_min] and max [Nrow_max] % for the SSR (Subject Sampling Range).
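To make the six-argument identification concrete, here is a minimal Python sketch (not part of the CBDA code base) that stores one experiment as a small record. The field names follow the bracketed argument names above; the `ExperimentSpec` class, the `is_valid` helper, and the range checks it performs are illustrative assumptions. The values are copied from the parameter rows printed in this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSpec:
    """Hypothetical container for the six arguments that identify an experiment."""
    M: int           # number of jobs
    misValperc: int  # % of missing values
    Kcol_min: int    # lower bound of the Feature Sampling Range (FSR), in %
    Kcol_max: int    # upper bound of the FSR, in %
    Nrow_min: int    # lower bound of the Subject Sampling Range (SSR), in %
    Nrow_max: int    # upper bound of the SSR, in %


def is_valid(spec: ExperimentSpec) -> bool:
    """Assumed sanity checks: percentages within 0-100 and min <= max."""
    percentages = (spec.misValperc, spec.Kcol_min, spec.Kcol_max,
                   spec.Nrow_min, spec.Nrow_max)
    return (spec.M > 0
            and all(0 <= p <= 100 for p in percentages)
            and spec.Kcol_min <= spec.Kcol_max
            and spec.Nrow_min <= spec.Nrow_max)


# Experiment number -> settings, as printed in the report's parameter rows.
EXPERIMENTS = {
    1: ExperimentSpec(9000, 0, 1, 5, 30, 60),
    2: ExperimentSpec(9000, 0, 5, 15, 30, 60),
    4: ExperimentSpec(9000, 0, 1, 5, 60, 80),
    5: ExperimentSpec(9000, 0, 5, 15, 60, 80),
    7: ExperimentSpec(9000, 20, 1, 5, 30, 60),
    8: ExperimentSpec(9000, 20, 5, 15, 30, 60),
}

for number, spec in EXPERIMENTS.items():
    print(number, spec, "valid" if is_valid(spec) else "invalid")
```

Freezing the dataclass makes each specification hashable, so it could also serve as a dictionary key when grouping results by setting.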
This document reports the final results, by experiment. General documentation of the CBDA-SL project is available at https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true, and some of the code is on GitHub at https://github.com/SOCR/CBDA.
Features selected by both the knockoff filter and the CBDA-SL algorithms appear as spikes in the histograms below. For each experiment, the tables list the top features selected by each method; the number of top features is set to 15 here.
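The ranking behind such a top-15 list can be sketched in Python as a frequency count over the features selected across jobs. This is an illustrative sketch, not the CBDA-SL implementation: the `top_features` function and its toy input are hypothetical. The density formula (100 × Count / total count) is an inference from the printed tables, where, for instance, a count of 164 with density 8.4318766 would imply a total of about 1945 selections. Ties are broken by ascending feature index, which matches the order seen in the tables.

```python
from collections import Counter


def top_features(selections, top=15):
    """Rank features by how often they were selected across jobs.

    selections: iterable of per-job lists of selected feature indices
                (hypothetical input format).
    Returns a list of (feature, count, density) tuples, where density is
    assumed to be 100 * count / total number of selections.
    """
    counts = Counter(feature for job in selections for feature in job)
    total = sum(counts.values())
    # Highest count first; ties resolved by the smaller feature index.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(feature, count, 100.0 * count / total)
            for feature, count in ranked[:top]]


# Toy example: three jobs, each reporting the features it selected.
toy_selections = [[100, 1000], [100, 1500], [100, 1000, 863]]
for feature, count, density in top_features(toy_selections, top=3):
    print(feature, count, round(density, 4))
```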
## [1] EXPERIMENT 1
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000 0 1 5 30 60
## [1] 1 100 200 400 600 800 1000 1200 1400 1500
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "1"
## Accuracy Count   Density    MSE Count   Density   Knockoff Count   Density
##     1500    88 0.1812116    100    99 0.1981149        100   164 8.4318766
##      100    83 0.1709155   1500    91 0.1821056       1000   130 6.6838046
##     1200    78 0.1606194   1200    76 0.1520882       1500   109 5.6041131
##     1000    74 0.1523825   1000    73 0.1460847       1400    84 4.3187661
##      326    61 0.1256126    863    72 0.1440836       1200    47 2.4164524
##      863    61 0.1256126   1400    58 0.1160673        800    38 1.9537275
##      599    59 0.1214942   1047    54 0.1080627       1156    31 1.5938303
##      735    58 0.1194349    326    53 0.1060615       1047    30 1.5424165
##      818    55 0.1132573    599    52 0.1040604        694    26 1.3367609
##      683    53 0.1091388    683    52 0.1040604        138    21 1.0796915
##      304    52 0.1070796    429    51 0.1020592        589    21 1.0796915
##     1279    52 0.1070796    735    51 0.1020592        863    21 1.0796915
##     1400    52 0.1070796    909    51 0.1020592        400    18 0.9254499
##     1475    52 0.1070796    229    50 0.1000580       1015    18 0.9254499
##      909    51 0.1050204    818    50 0.1000580        200    15 0.7712082
## [1] "Nonzero Features"
## [1] 1 100 200 400 600 800 1000 1200 1400 1500
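As a quick check of the Experiment 1 table against the vector above, the following Python sketch counts how many of the listed nonzero features appear in each top-15 list, and which features the accuracy-ranked CBDA-SL list shares with the knockoff list. It assumes that the "Nonzero Features" vector lists the features that truly carry signal; the `recovered` helper and the variable names are illustrative. The feature indices are copied from the table.

```python
# Features printed under "Nonzero Features" (assumed to be the true signal).
NONZERO = {1, 100, 200, 400, 600, 800, 1000, 1200, 1400, 1500}

# Top-15 feature indices copied from the Experiment 1 table, by column group.
TOP15 = {
    "Accuracy": [1500, 100, 1200, 1000, 326, 863, 599, 735,
                 818, 683, 304, 1279, 1400, 1475, 909],
    "MSE": [100, 1500, 1200, 1000, 863, 1400, 1047, 326,
            599, 683, 429, 735, 909, 229, 818],
    "Knockoff": [100, 1000, 1500, 1400, 1200, 800, 1156, 1047,
                 694, 138, 589, 863, 400, 1015, 200],
}


def recovered(top, truth=NONZERO):
    """Return the sorted features that are both in `top` and in `truth`."""
    return sorted(set(top) & truth)


for method, top in TOP15.items():
    hits = recovered(top)
    print(f"{method}: {len(hits)} of {len(NONZERO)} nonzero features "
          f"in the top 15: {hits}")

# Features present in both the accuracy-ranked CBDA-SL list and the knockoff list.
both = sorted(set(TOP15["Accuracy"]) & set(TOP15["Knockoff"]))
print("In both Accuracy and Knockoff lists:", both)
```

Under that assumption, the sketch reports 5 of 10 nonzero features for each CBDA-SL ranking and 8 of 10 for the knockoff filter in Experiment 1; features 1 and 600 appear in none of the three lists.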
##
## [1] EXPERIMENT 2
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000 0 5 15 30 60
## [1] 1 100 200 400 600 800 1000 1200 1400 1500
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "2"
## Accuracy Count    Density    MSE Count    Density   Knockoff Count   Density
##      100   207 0.13468583    100   203 0.13131000        100    69 6.9207623
##     1400   185 0.12037139   1000   186 0.12031359       1400    59 5.9177533
##     1000   183 0.11907008   1200   184 0.11901990       1500    56 5.6168506
##     1200   180 0.11711811   1500   173 0.11190458       1000    51 5.1153460
##     1500   173 0.11256352    863   155 0.10026133       1200    36 3.6108325
##      179   143 0.09304383    599   141 0.09120546        800    27 2.7081244
##      599   139 0.09044121    852   141 0.09120546        200    16 1.6048144
##     1134   137 0.08913990   1400   140 0.09055862       1047    16 1.6048144
##     1337   135 0.08783859    186   139 0.08991177       1404    16 1.6048144
##      996   133 0.08653727    304   138 0.08926492       1156    15 1.5045135
##     1457   133 0.08653727    800   138 0.08926492        400    13 1.3039117
##      216   131 0.08523596    400   137 0.08861808        694    13 1.3039117
##      400   131 0.08523596   1266   137 0.08861808       1015    13 1.3039117
##      852   131 0.08523596    910   135 0.08732438       1270    13 1.3039117
##      326   129 0.08393465   1297   134 0.08667753        138     9 0.9027081
## [1] "Nonzero Features"
## [1] 1 100 200 400 600 800 1000 1200 1400 1500
##
## [1] EXPERIMENT 4
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000 0 1 5 60 80
## [1] 1 100 200 400 600 800 1000 1200 1400 1500
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "4"
## Accuracy Count   Density    MSE Count   Density   Knockoff Count   Density
##      863    80 0.1735509    863   123 0.2487109        100   242 10.886190
##     1000    64 0.1388407   1000    91 0.1840057       1000   228 10.256410
##     1475    57 0.1236550   1500    71 0.1435649       1500   209  9.401709
##      834    55 0.1193162    100    65 0.1314326       1400   190  8.547009
##      513    54 0.1171468    819    58 0.1172783       1200   116  5.218174
##     1200    54 0.1171468   1275    57 0.1152563        800   115  5.173189
##      304    51 0.1106387   1047    56 0.1132343       1156    57  2.564103
##      819    50 0.1084693    371    54 0.1091902       1047    42  1.889339
##     1036    50 0.1084693    800    54 0.1091902       1413    35  1.574449
##     1047    50 0.1084693    834    54 0.1091902       1015    34  1.529465
##     1128    50 0.1084693   1413    54 0.1091902        138    32  1.439496
##      276    48 0.1041305    854    52 0.1051461        589    31  1.394512
##      708    48 0.1041305    532    51 0.1031241       1266    31  1.394512
##     1367    48 0.1041305   1356    51 0.1031241        400    29  1.304543
##       63    47 0.1019611    142    50 0.1011020        694    29  1.304543
## [1] "Nonzero Features"
## [1] 1 100 200 400 600 800 1000 1200 1400 1500
##
## [1] EXPERIMENT 5
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000 0 5 15 60 80
## [1] 1 100 200 400 600 800 1000 1200 1400 1500
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "5"
## Accuracy Count    Density    MSE Count    Density   Knockoff Count   Density
##      863   190 0.12467519    863   279 0.18071587        100   272 10.755239
##     1200   168 0.11023911   1000   197 0.12760224       1000   242  9.569000
##      400   166 0.10892674    800   167 0.10817043       1500   235  9.292210
##     1000   165 0.10827056   1500   166 0.10752270       1400   225  8.896797
##      599   160 0.10498963    599   147 0.09521589        800   166  6.563859
##     1500   159 0.10433345   1014   145 0.09392043       1200   158  6.247529
##      100   145 0.09514685    400   140 0.09068180       1413    68  2.688810
##      304   138 0.09055356   1413   140 0.09068180       1047    58  2.293397
##     1063   138 0.09055356    100   137 0.08873862       1156    54  2.135231
##      112   137 0.08989737    819   135 0.08744316       1015    53  2.095690
##      326   136 0.08924119   1438   135 0.08744316        138    49  1.937525
##      917   134 0.08792882    304   134 0.08679543        589    30  1.186240
##      179   133 0.08727263    122   133 0.08614771        599    28  1.107157
##      519   132 0.08661645   1245   133 0.08614771        200    27  1.067616
##     1157   132 0.08661645   1284   133 0.08614771        400    27  1.067616
## [1] "Nonzero Features"
## [1] 1 100 200 400 600 800 1000 1200 1400 1500
##
## [1] EXPERIMENT 7
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000 20 1 5 30 60
## [1] 1 100 200 400 600 800 1000 1200 1400 1500
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "7"
## Accuracy Count   Density    MSE Count   Density   Knockoff Count   Density
##     1400    79 0.1600454   1500    93 0.1863727       1000   129 6.5349544
##     1500    77 0.1559936   1000    83 0.1663327        100   119 6.0283688
##     1000    70 0.1418124    863    77 0.1543086       1500   113 5.7244174
##      100    68 0.1377606    100    75 0.1503006       1400   105 5.3191489
##      863    62 0.1256052   1400    66 0.1322645        800    54 2.7355623
##     1200    62 0.1256052   1200    63 0.1262525       1200    54 2.7355623
##      941    59 0.1195276    599    55 0.1102204       1156    37 1.8743668
##      326    58 0.1175017   1063    55 0.1102204        138    26 1.3171226
##      599    57 0.1154758    400    52 0.1042084       1404    25 1.2664640
##     1232    55 0.1114240    816    52 0.1042084       1015    21 1.0638298
##     1063    54 0.1093981   1139    52 0.1042084       1047    21 1.0638298
##     1128    54 0.1093981   1232    52 0.1042084       1266    18 0.9118541
##     1367    54 0.1093981   1365    52 0.1042084        589    17 0.8611955
##      513    53 0.1073722   1367    52 0.1042084       1413    17 0.8611955
##     1390    53 0.1073722    507    51 0.1022044        400    14 0.7092199
## [1] "Nonzero Features"
## [1] 1 100 200 400 600 800 1000 1200 1400 1500
##
## [1] EXPERIMENT 8
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000 20 5 15 30 60
## [1] 1 100 200 400 600 800 1000 1200 1400 1500