Some useful information

This is a summary of a set of experiments (Experiments 1, 2, 4, 5, 7, 8, 10 and 11 are reported below) run with a LONI pipeline workflow file that performs 3000 independent jobs, each one applying both the CBDA-SL and the knockoff filter feature mining strategies. Each experiment has a total of 9000 jobs and is uniquely identified by 6 input arguments: the number of jobs [M], the % of missing values [misValperc], the min [Kcol_min] and max [Kcol_max] % for the FSR (Feature Sampling Range), and the min [Nrow_min] and max [Nrow_max] % for the SSR (Subject Sampling Range).
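As a reading aid, the six arguments can be held in a small record. The sketch below is in Python rather than the R used to produce this report, and the names `ExperimentSpec`, `EXPERIMENTS` and `find_experiments` are hypothetical, not part of the CBDA code. The values are transcribed from the experiment headers printed further down.

```python
from typing import NamedTuple


class ExperimentSpec(NamedTuple):
    """The six input arguments that identify one experiment (field names follow the R output)."""
    M: int           # number of jobs
    misValperc: int  # % of missing values
    Kcol_min: int    # FSR (Feature Sampling Range) lower bound, in %
    Kcol_max: int    # FSR upper bound, in %
    Nrow_min: int    # SSR (Subject Sampling Range) lower bound, in %
    Nrow_max: int    # SSR upper bound, in %


# Settings transcribed from the experiment headers in this report.
EXPERIMENTS = {
    1: ExperimentSpec(9000, 0, 1, 5, 30, 60),
    2: ExperimentSpec(9000, 0, 5, 15, 30, 60),
    4: ExperimentSpec(9000, 0, 1, 5, 60, 80),
    5: ExperimentSpec(9000, 0, 5, 15, 60, 80),
    7: ExperimentSpec(9000, 20, 1, 5, 30, 60),
    8: ExperimentSpec(9000, 20, 5, 15, 30, 60),
    10: ExperimentSpec(9000, 20, 1, 5, 60, 80),
    11: ExperimentSpec(9000, 20, 5, 15, 60, 80),
}


def find_experiments(**criteria):
    """Return the sorted IDs of experiments whose fields match every given criterion."""
    return sorted(
        exp_id
        for exp_id, spec in EXPERIMENTS.items()
        if all(getattr(spec, name) == value for name, value in criteria.items())
    )


# Example: experiments with 20% missing values and an SSR starting at 60%.
print(find_experiments(misValperc=20, Nrow_min=60))
```

Because each experiment has a distinct combination of values, the record can serve directly as a lookup key when comparing results across settings.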

This document reports the final results for each experiment. General documentation of the CBDA-SL project is available at https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true, and some of the code is on GitHub at https://github.com/SOCR/CBDA.

Features selected by both the knockoff filter and the CBDA-SL algorithms appear as spikes in the histograms below. For each experiment I list the top features selected by each method; the number of features listed is set to 15 here.
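The report does not define the Count and Density columns of the tables. The values are consistent with Density = 100 × Count / (total number of selections). For example, the first two knockoff rows of Experiment 1 (164 → 8.4318766 and 130 → 6.6838046) both imply a total of 1945 selections, a figure that is inferred here and not stated anywhere in the report. Under that assumption, a top-15 table can be sketched as follows; the function name `top_features` and the toy input are hypothetical.

```python
from collections import Counter


def top_features(selections, top=15):
    """Rank features by how often they were selected.

    Returns (feature, count, density) tuples, where density is assumed to be
    100 * count / total selections. Ties are broken by ascending feature
    index, which matches the ordering seen in the tables of this report.
    """
    counts = Counter(selections)
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(feature, count, 100.0 * count / total) for feature, count in ranked[:top]]


# Toy input: one entry per time a feature was selected by some job.
toy_selections = [100] * 4 + [1500] * 3 + [863] * 2 + [326]
toy_table = top_features(toy_selections, top=3)
for feature, count, density in toy_table:
    print(feature, count, density)

# Consistency check of the assumed formula against two printed knockoff rows
# of Experiment 1 (the total of 1945 is inferred, not taken from the report).
inferred_total = 1945
check_first = round(100 * 164 / inferred_total, 7)
check_second = round(100 * 130 / inferred_total, 7)
```

The same arithmetic applied to the Accuracy and MSE columns implies much larger totals, so the Density values are only comparable within a column, not across columns.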

## [1] EXPERIMENT 1
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          1          5         30         60 
##      M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 1 9000          0        1        5       30       60
##  [1]    1  100  200  400  600  800 1000 1200 1400 1500

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "1"         
##  Accuracy Count Density   MSE  Count Density   Knockoff Count Density  
##  1500     88    0.1812116 100  99    0.1981149  100     164   8.4318766
##  100      83    0.1709155 1500 91    0.1821056 1000     130   6.6838046
##  1200     78    0.1606194 1200 76    0.1520882 1500     109   5.6041131
##  1000     74    0.1523825 1000 73    0.1460847 1400      84   4.3187661
##  326      61    0.1256126 863  72    0.1440836 1200      47   2.4164524
##  863      61    0.1256126 1400 58    0.1160673  800      38   1.9537275
##  599      59    0.1214942 1047 54    0.1080627 1156      31   1.5938303
##  735      58    0.1194349 326  53    0.1060615 1047      30   1.5424165
##  818      55    0.1132573 599  52    0.1040604  694      26   1.3367609
##  683      53    0.1091388 683  52    0.1040604  138      21   1.0796915
##  304      52    0.1070796 429  51    0.1020592  589      21   1.0796915
##  1279     52    0.1070796 735  51    0.1020592  863      21   1.0796915
##  1400     52    0.1070796 909  51    0.1020592  400      18   0.9254499
##  1475     52    0.1070796 229  50    0.1000580 1015      18   0.9254499
##  909      51    0.1050204 818  50    0.1000580  200      15   0.7712082
## [1] "Nonzero Features"
##  [1]    1  100  200  400  600  800 1000 1200 1400 1500
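One way to read a table is to count how many of the "Nonzero Features" appear in each top-15 list. The sketch below does this for Experiment 1, with the three lists transcribed from the table above. It assumes that "Nonzero Features" is the set of features the methods are expected to recover, which the report implies but does not state; the helper name `recovered` is hypothetical.

```python
# Features printed under "Nonzero Features" (assumed to be the target set).
NONZERO = {1, 100, 200, 400, 600, 800, 1000, 1200, 1400, 1500}

# Top-15 feature indices for Experiment 1, transcribed column by column.
TOP15 = {
    "Accuracy": [1500, 100, 1200, 1000, 326, 863, 599, 735,
                 818, 683, 304, 1279, 1400, 1475, 909],
    "MSE": [100, 1500, 1200, 1000, 863, 1400, 1047, 326,
            599, 683, 429, 735, 909, 229, 818],
    "Knockoff": [100, 1000, 1500, 1400, 1200, 800, 1156, 1047,
                 694, 138, 589, 863, 400, 1015, 200],
}


def recovered(top, truth):
    """Return the sorted features that appear both in a top list and in the target set."""
    return sorted(set(top) & truth)


hits = {name: recovered(features, NONZERO) for name, features in TOP15.items()}
for name, found in hits.items():
    print(f"{name}: {len(found)} of {len(NONZERO)} recovered -> {found}")
```

For Experiment 1 this gives 5 of 10 for each of the two CBDA-SL rankings (100, 1000, 1200, 1400, 1500) and 8 of 10 for the knockoff filter, which misses only features 1 and 600 in its top 15.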
## 
## [1] EXPERIMENT 2
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          5         15         30         60 
##      M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 2 9000          0        5       15       30       60
##  [1]    1  100  200  400  600  800 1000 1200 1400 1500

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "2"         
##  Accuracy Count Density    MSE  Count Density    Knockoff Count Density  
##  100      207   0.13468583 100  203   0.13131000  100     69    6.9207623
##  1400     185   0.12037139 1000 186   0.12031359 1400     59    5.9177533
##  1000     183   0.11907008 1200 184   0.11901990 1500     56    5.6168506
##  1200     180   0.11711811 1500 173   0.11190458 1000     51    5.1153460
##  1500     173   0.11256352 863  155   0.10026133 1200     36    3.6108325
##  179      143   0.09304383 599  141   0.09120546  800     27    2.7081244
##  599      139   0.09044121 852  141   0.09120546  200     16    1.6048144
##  1134     137   0.08913990 1400 140   0.09055862 1047     16    1.6048144
##  1337     135   0.08783859 186  139   0.08991177 1404     16    1.6048144
##  996      133   0.08653727 304  138   0.08926492 1156     15    1.5045135
##  1457     133   0.08653727 800  138   0.08926492  400     13    1.3039117
##  216      131   0.08523596 400  137   0.08861808  694     13    1.3039117
##  400      131   0.08523596 1266 137   0.08861808 1015     13    1.3039117
##  852      131   0.08523596 910  135   0.08732438 1270     13    1.3039117
##  326      129   0.08393465 1297 134   0.08667753  138      9    0.9027081
## [1] "Nonzero Features"
##  [1]    1  100  200  400  600  800 1000 1200 1400 1500
## 
## [1] EXPERIMENT 4
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          1          5         60         80 
##      M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 4 9000          0        1        5       60       80
##  [1]    1  100  200  400  600  800 1000 1200 1400 1500

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "4"         
##  Accuracy Count Density   MSE  Count Density   Knockoff Count Density  
##  863      80    0.1735509 863  123   0.2487109  100     242   10.886190
##  1000     64    0.1388407 1000  91   0.1840057 1000     228   10.256410
##  1475     57    0.1236550 1500  71   0.1435649 1500     209    9.401709
##  834      55    0.1193162 100   65   0.1314326 1400     190    8.547009
##  513      54    0.1171468 819   58   0.1172783 1200     116    5.218174
##  1200     54    0.1171468 1275  57   0.1152563  800     115    5.173189
##  304      51    0.1106387 1047  56   0.1132343 1156      57    2.564103
##  819      50    0.1084693 371   54   0.1091902 1047      42    1.889339
##  1036     50    0.1084693 800   54   0.1091902 1413      35    1.574449
##  1047     50    0.1084693 834   54   0.1091902 1015      34    1.529465
##  1128     50    0.1084693 1413  54   0.1091902  138      32    1.439496
##  276      48    0.1041305 854   52   0.1051461  589      31    1.394512
##  708      48    0.1041305 532   51   0.1031241 1266      31    1.394512
##  1367     48    0.1041305 1356  51   0.1031241  400      29    1.304543
##  63       47    0.1019611 142   50   0.1011020  694      29    1.304543
## [1] "Nonzero Features"
##  [1]    1  100  200  400  600  800 1000 1200 1400 1500
## 
## [1] EXPERIMENT 5
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          5         15         60         80 
##      M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 5 9000          0        5       15       60       80
##  [1]    1  100  200  400  600  800 1000 1200 1400 1500

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "5"         
##  Accuracy Count Density    MSE  Count Density    Knockoff Count Density  
##  863      190   0.12467519 863  279   0.18071587  100     272   10.755239
##  1200     168   0.11023911 1000 197   0.12760224 1000     242    9.569000
##  400      166   0.10892674 800  167   0.10817043 1500     235    9.292210
##  1000     165   0.10827056 1500 166   0.10752270 1400     225    8.896797
##  599      160   0.10498963 599  147   0.09521589  800     166    6.563859
##  1500     159   0.10433345 1014 145   0.09392043 1200     158    6.247529
##  100      145   0.09514685 400  140   0.09068180 1413      68    2.688810
##  304      138   0.09055356 1413 140   0.09068180 1047      58    2.293397
##  1063     138   0.09055356 100  137   0.08873862 1156      54    2.135231
##  112      137   0.08989737 819  135   0.08744316 1015      53    2.095690
##  326      136   0.08924119 1438 135   0.08744316  138      49    1.937525
##  917      134   0.08792882 304  134   0.08679543  589      30    1.186240
##  179      133   0.08727263 122  133   0.08614771  599      28    1.107157
##  519      132   0.08661645 1245 133   0.08614771  200      27    1.067616
##  1157     132   0.08661645 1284 133   0.08614771  400      27    1.067616
## [1] "Nonzero Features"
##  [1]    1  100  200  400  600  800 1000 1200 1400 1500
## 
## [1] EXPERIMENT 7
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000         20          1          5         30         60 
##      M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 7 9000         20        1        5       30       60
##  [1]    1  100  200  400  600  800 1000 1200 1400 1500

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "7"         
##  Accuracy Count Density   MSE  Count Density   Knockoff Count Density  
##  1400     79    0.1600454 1500 93    0.1863727 1000     129   6.5349544
##  1500     77    0.1559936 1000 83    0.1663327  100     119   6.0283688
##  1000     70    0.1418124 863  77    0.1543086 1500     113   5.7244174
##  100      68    0.1377606 100  75    0.1503006 1400     105   5.3191489
##  863      62    0.1256052 1400 66    0.1322645  800      54   2.7355623
##  1200     62    0.1256052 1200 63    0.1262525 1200      54   2.7355623
##  941      59    0.1195276 599  55    0.1102204 1156      37   1.8743668
##  326      58    0.1175017 1063 55    0.1102204  138      26   1.3171226
##  599      57    0.1154758 400  52    0.1042084 1404      25   1.2664640
##  1232     55    0.1114240 816  52    0.1042084 1015      21   1.0638298
##  1063     54    0.1093981 1139 52    0.1042084 1047      21   1.0638298
##  1128     54    0.1093981 1232 52    0.1042084 1266      18   0.9118541
##  1367     54    0.1093981 1365 52    0.1042084  589      17   0.8611955
##  513      53    0.1073722 1367 52    0.1042084 1413      17   0.8611955
##  1390     53    0.1073722 507  51    0.1022044  400      14   0.7092199
## [1] "Nonzero Features"
##  [1]    1  100  200  400  600  800 1000 1200 1400 1500
## 
## [1] EXPERIMENT 8
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000         20          5         15         30         60 
##      M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 8 9000         20        5       15       30       60
##  [1]    1  100  200  400  600  800 1000 1200 1400 1500

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "8"         
##  Accuracy Count Density    MSE  Count Density    Knockoff Count Density  
##  100      216   0.13914567 100  212   0.13576950 1000     59    6.3714903
##  1500     191   0.12304085 1000 206   0.13192697 1500     58    6.2634989
##  1000     189   0.12175246 1500 188   0.12039937  100     56    6.0475162
##  1400     178   0.11466634 863  163   0.10438881 1400     41    4.4276458
##  1200     160   0.10307087 1400 159   0.10182712  800     26    2.8077754
##  400      148   0.09534055 1200 153   0.09798459 1200     22    2.3758099
##  863      139   0.08954282 186  146   0.09350164 1047     17    1.8358531
##  599      135   0.08696604 400  146   0.09350164 1015     16    1.7278618
##  179      133   0.08567766 1047 145   0.09286121  400     15    1.6198704
##  196      133   0.08567766 19   139   0.08901868  138     14    1.5118790
##  868      133   0.08567766 800  138   0.08837826 1156     12    1.2958963
##  979      132   0.08503347 1209 138   0.08837826 1413     12    1.2958963
##  1390     132   0.08503347 326  135   0.08645699  694     11    1.1879050
##  1047     131   0.08438927 819  134   0.08581657  871     10    1.0799136
##  1209     131   0.08438927 852  134   0.08581657  589      9    0.9719222
## [1] "Nonzero Features"
##  [1]    1  100  200  400  600  800 1000 1200 1400 1500
## 
## [1] EXPERIMENT 10
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000         20          1          5         60         80 
##       M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 10 9000         20        1        5       60       80
##  [1]    1  100  200  400  600  800 1000 1200 1400 1500

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "10"        
##  Accuracy Count Density   MSE  Count Density   Knockoff Count Density  
##  863      74    0.1576044 863  113   0.2288561  100     258   11.564321
##  1000     64    0.1363065 1000  86   0.1741737 1000     215    9.636934
##  1500     58    0.1235278 100   78   0.1579715 1500     205    9.188705
##  326      56    0.1192682 1500  74   0.1498704 1400     185    8.292246
##  100      55    0.1171384 819   58   0.1174660 1200     111    4.975347
##  599      53    0.1128788 122   57   0.1154407  800     109    4.885701
##  1047     53    0.1128788 478   57   0.1154407 1047      67    3.003138
##  519      52    0.1107490 550   57   0.1154407 1156      56    2.510085
##  83       51    0.1086193 800   57   0.1154407 1413      47    2.106679
##  819      51    0.1086193 1413  57   0.1154407  138      46    2.061856
##  1168     51    0.1086193 326   56   0.1134154  400      42    1.882564
##  26       50    0.1064895 728   56   0.1134154 1015      42    1.882564
##  521      50    0.1064895 535   55   0.1113901 1404      33    1.479157
##  947      50    0.1064895 1127  54   0.1093649  200      27    1.210220
##  1083     50    0.1064895 83    53   0.1073396  694      25    1.120574
## [1] "Nonzero Features"
##  [1]    1  100  200  400  600  800 1000 1200 1400 1500
## 
## [1] EXPERIMENT 11
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000         20          5         15         60         80 
##       M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 11 9000         20        5       15       60       80
##  [1]    1  100  200  400  600  800 1000 1200 1400 1500

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "11"        
##  Accuracy Count Density    MSE  Count Density    Knockoff Count Density  
##  1500     162   0.10505496 863  236   0.15229049  100     283   10.563643
##  1000     160   0.10375798 1000 189   0.12196145 1000     245    9.145203
##  1063     159   0.10310950 1500 172   0.11099137 1500     236    8.809257
##  1200     158   0.10246101 100  157   0.10131189 1400     223    8.324001
##  863      153   0.09921857 1063 154   0.09937600  800     132    4.927212
##  1475     152   0.09857009 819  148   0.09550420 1200     130    4.852557
##  1400     145   0.09403067 834  144   0.09292301 1047      77    2.874207
##  100      144   0.09338219 1047 141   0.09098711 1156      73    2.724897
##  400      143   0.09273370 216  137   0.08840592 1015      56    2.090332
##  1365     141   0.09143672 675  137   0.08840592 1413      54    2.015677
##  83       138   0.08949126 1475 137   0.08840592  138      53    1.978350
##  1077     138   0.08949126 513  136   0.08776062  400      46    1.717059
##  549      136   0.08819429 122  135   0.08711532 1404      45    1.679731
##  834      135   0.08754580 751  134   0.08647002  200      35    1.306458
##  1134     135   0.08754580 1215 134   0.08647002  863      34    1.269130
## [1] "Nonzero Features"
##  [1]    1  100  200  400  600  800 1000 1200 1400 1500