Some useful information

This is a summary of a set of 12 experiments using a LONI pipeline workflow file that performs 3000 independent jobs, each one with the CBDA-SL and the knockoff filter feature mining strategies. Each experiment has a total of 9000 jobs and is uniquely identified by 6 input arguments: number of jobs [M], % of missing values [misValperc], min [Kcol_min] and max [Kcol_max] % for the Feature Sampling Range (FSR), and min [Nrow_min] and max [Nrow_max] % for the Subject Sampling Range (SSR).
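As an illustration of how the 6 input arguments identify an experiment, the following Python sketch enumerates the parameter combinations that appear in the experiment headers of this report. The variable names and the nesting order (missing-value percentage, then subject sampling range, then feature sampling range) are assumptions chosen to reproduce the numbering used here; they are not taken from the CBDA-SL code.

```python
from itertools import product

# Values read off the experiment headers in this report.
M = 9000                                    # number of jobs per experiment
mis_val_perc = [0, 20]                      # % of missing values
nrow_ranges = [(30, 60), (60, 80)]          # SSR: (Nrow_min, Nrow_max) in %
kcol_ranges = [(1, 5), (5, 15), (15, 30)]   # FSR: (Kcol_min, Kcol_max) in %

# Assumed ordering: misValperc varies slowest, Kcol range varies fastest.
experiments = [
    {
        "M": M,
        "misValperc": mis,
        "Kcol_min": kcol[0],
        "Kcol_max": kcol[1],
        "Nrow_min": nrow[0],
        "Nrow_max": nrow[1],
    }
    for mis, nrow, kcol in product(mis_val_perc, nrow_ranges, kcol_ranges)
]

for number, params in enumerate(experiments, start=1):
    print(f"EXPERIMENT {number}: {params}")
```

With two missing-value settings, two subject sampling ranges and three feature sampling ranges, the grid has 2 x 2 x 3 = 12 combinations, one per experiment listed below.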

This document reports the final results for each experiment. General documentation of the CBDA-SL project is available at https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true, and some of the code is on GitHub at https://github.com/SOCR/CBDA.

Features selected by both the knockoff filter and the CBDA-SL algorithm appear as spikes in the histograms below. For each experiment, the top 15 selected features are listed.

## [1] EXPERIMENT 1
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          1          5         30         60 
##  [1]  10  20  30  40  50  60  70  80  90 100

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
##  Accuracy Count Density  MSE Count Density  Knockoff Count Density 
##  32       128   3.465079 70  179   4.809242  70      267   6.046196
##  70       114   3.086086 32  125   3.358409  40      240   5.434783
##  60        99   2.680022 30  111   2.982268 100      219   4.959239
##  30        89   2.409312 90  100   2.686728  30      216   4.891304
##  90        85   2.301029 100  94   2.525524  60      198   4.483696
##  80        83   2.246887 60   92   2.471789  80      183   4.144022
##  57        76   2.057390 10   87   2.337453  10      161   3.645833
##  21        69   1.867894 80   75   2.015046  90      158   3.577899
##  100       66   1.786681 21   66   1.773240  50      151   3.419384
##  22        63   1.705468 57   64   1.719506  32      118   2.672101
##  79        61   1.651326 22   61   1.638904  65       92   2.083333
##  39        59   1.597185 39   59   1.585169  20       79   1.788949
##  24        57   1.543043 82   58   1.558302  49       72   1.630435
##  15        54   1.461830 79   55   1.477700  94       61   1.381341
##  10        53   1.434759 61   54   1.450833  21       55   1.245471
## [1] "M"          "Top-ranked"
## [1] 9000 1000
## [1] "Accuracy Count"
## [1] 7
## [1] "MSE Count"
## [1] 7
## [1] "Knockoff Count"
## [1] 10
## [1] "Nonzero Features"
##  [1]  10  20  30  40  50  60  70  80  90 100
## 
## 
## 
## 
## 
## 
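The "Accuracy Count", "MSE Count" and "Knockoff Count" values reported for Experiment 1 are consistent with counting how many of the listed "Nonzero Features" appear among the top 15 features of each ranking. The report does not state this rule explicitly, so the Python sketch below is only a check under that assumption; the function name `count_recovered` is hypothetical.

```python
# "Nonzero Features" as printed in the report: 10, 20, ..., 100.
NONZERO_FEATURES = set(range(10, 101, 10))

# Top 15 features per ranking, copied from the Experiment 1 table.
top15 = {
    "Accuracy": [32, 70, 60, 30, 90, 80, 57, 21, 100, 22, 79, 39, 24, 15, 10],
    "MSE": [70, 32, 30, 90, 100, 60, 10, 80, 21, 57, 22, 39, 82, 79, 61],
    "Knockoff": [70, 40, 100, 30, 60, 80, 10, 90, 50, 32, 65, 20, 49, 94, 21],
}


def count_recovered(selected, nonzero=NONZERO_FEATURES):
    """Count how many selected features belong to the nonzero set."""
    return sum(1 for feature in selected if feature in nonzero)


recovered = {name: count_recovered(features) for name, features in top15.items()}
print(recovered)  # {'Accuracy': 7, 'MSE': 7, 'Knockoff': 10}
```

The three counts match the values printed for Experiment 1 (7, 7 and 10).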
## [1] EXPERIMENT 2
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          5         15         30         60 
##  [1]  10  20  30  40  50  60  70  80  90 100

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
##  Accuracy Count Density  MSE Count Density  Knockoff Count Density   
##  70       215   1.973201 70  282   2.512921  70      586   12.5724094
##  32       205   1.881424 60  218   1.942613 100      541   11.6069513
##  60       204   1.872247 30  214   1.906968  40      527   11.3065866
##  80       187   1.716226 32  209   1.862413  30      521   11.1778588
##  10       175   1.606094 90  208   1.853502  60      376    8.0669384
##  30       174   1.596916 10  197   1.755480  80      299    6.4149324
##  90       169   1.551028 80  193   1.719836  90      267    5.7283845
##  57       151   1.385830 100 191   1.702014  10      261    5.5996567
##  21       143   1.312408 50  156   1.390127  50      204    4.3767432
##  41       138   1.266520 57  155   1.381215  32      133    2.8534649
##  22       137   1.257342 79  140   1.247549  65       79    1.6949153
##  100      136   1.248164 39  139   1.238638  20       76    1.6305514
##  46       134   1.229809 15  131   1.167350  49       60    1.2872774
##  79       131   1.202276 21  131   1.167350  21       44    0.9440034
##  82       129   1.183921 20  130   1.158439  94       41    0.8796396
## [1] "M"          "Top-ranked"
## [1] 9000 1000
## [1] "Accuracy Count"
## [1] 7
## [1] "MSE Count"
## [1] 9
## [1] "Knockoff Count"
## [1] 10
## [1] "Nonzero Features"
##  [1]  10  20  30  40  50  60  70  80  90 100
## 
## 
## 
## 
## 
## 
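The "Density" columns are not defined in the report. The printed numbers are consistent with Density = 100 x Count / N, where N is the total number of selections over all features for that ranking. This is an inference from the table, not something the report states. A short Python check using rows from the Experiment 2 table, with the hypothetical helper `implied_total`:

```python
# (Count, Density) pairs copied from the Experiment 2 table.
accuracy_rows = [(215, 1.973201), (205, 1.881424), (204, 1.872247)]
knockoff_rows = [(586, 12.5724094), (541, 11.6069513)]


def implied_total(count, density_percent):
    """Total N implied by the assumption density = 100 * count / N."""
    return round(100 * count / density_percent)


accuracy_totals = [implied_total(c, d) for c, d in accuracy_rows]
knockoff_totals = [implied_total(c, d) for c, d in knockoff_rows]

print(accuracy_totals)
print(knockoff_totals)
```

If the assumption holds, every row of a given ranking implies the same total, which is the case for the rows used here.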
## [1] EXPERIMENT 3
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0         15         30         30         60 
##  [1]  10  20  30  40  50  60  70  80  90 100

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
##  Accuracy Count Density  MSE Count Density  Knockoff Count Density   
##  70       420   1.788680 70  484   2.012139  70      1024  16.8089297
##  80       369   1.571483 10  398   1.654610 100       912  14.9704531
##  60       363   1.545931 100 379   1.575622  40       845  13.8706500
##  10       342   1.456497 60  375   1.558992  30       825  13.5423506
##  90       332   1.413909 80  370   1.538206  60       568   9.3237032
##  32       324   1.379839 50  366   1.521576  80       402   6.5988181
##  100      322   1.371321 32  358   1.488318  90       348   5.7124097
##  30       306   1.303181 90  353   1.467531  10       295   4.8424163
##  50       300   1.277629 30  333   1.384385  50       250   4.1037426
##  21       278   1.183936 20  273   1.134946  32       134   2.1996060
##  22       266   1.132831 79  271   1.126632  20        73   1.1982928
##  57       263   1.120055 22  268   1.114160  65        54   0.8864084
##  79       262   1.115796 21  264   1.097531  49        35   0.5745240
##  72       260   1.107278 39  261   1.085059  94        35   0.5745240
##  41       259   1.103019 41  260   1.080901  21        30   0.4924491
## [1] "M"          "Top-ranked"
## [1] 9000 1000
## [1] "Accuracy Count"
## [1] 8
## [1] "MSE Count"
## [1] 9
## [1] "Knockoff Count"
## [1] 10
## [1] "Nonzero Features"
##  [1]  10  20  30  40  50  60  70  80  90 100
## 
## 
## 
## 
## 
## 
## [1] EXPERIMENT 4
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          1          5         60         80 
##  [1]  10  20  30  40  50  60  70  80  90 100

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
##  Accuracy Count Density  MSE Count Density  Knockoff Count Density 
##  32       133   3.488067 70  149   3.872141 100      337   6.643012
##  60        91   2.386572 30  119   3.092516  30      325   6.406466
##  21        76   1.993181 90  113   2.936590  70      315   6.209344
##  70        73   1.914503 32  108   2.806653  40      307   6.051646
##  30        69   1.809599 60   90   2.338877  60      281   5.539129
##  57        68   1.783373 100  88   2.286902  90      253   4.987187
##  90        68   1.783373 21   80   2.079002  80      249   4.908338
##  15        63   1.652242 80   79   2.053015  50      216   4.257836
##  80        63   1.652242 10   75   1.949064  10      208   4.100138
##  61        62   1.626016 57   66   1.715177  32      162   3.193377
##  34        61   1.599790 34   62   1.611227  20      129   2.542874
##  39        61   1.599790 22   58   1.507277  65      110   2.168342
##  22        58   1.521112 61   58   1.507277  49       99   1.951508
##  83        58   1.521112 39   57   1.481289  21       97   1.912084
##  41        56   1.468660 78   57   1.481289  94       84   1.655825
## [1] "M"          "Top-ranked"
## [1] 9000 1000
## [1] "Accuracy Count"
## [1] 5
## [1] "MSE Count"
## [1] 7
## [1] "Knockoff Count"
## [1] 10
## [1] "Nonzero Features"
##  [1]  10  20  30  40  50  60  70  80  90 100
## 
## 
## 
## 
## 
## 
## [1] EXPERIMENT 5
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          5         15         60         80 
##  [1]  10  20  30  40  50  60  70  80  90 100

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
##  Accuracy Count Density  MSE Count Density  Knockoff Count Density   
##  60       183   1.666667 70  238   2.085524 100      831   12.3275478
##  80       178   1.621129 80  215   1.883982  70      799   11.8528408
##  55       172   1.566485 60  208   1.822643  30      758   11.2446225
##  21       164   1.493625 100 205   1.796355  40      754   11.1852841
##  10       156   1.420765 10  201   1.761304  60      665    9.8650052
##  67       153   1.393443 90  196   1.717490  80      525    7.7881620
##  22       149   1.357013 32  174   1.524711  10      453    6.7200712
##  32       149   1.357013 21  167   1.463372  90      434    6.4382139
##  15       146   1.329690 55  166   1.454609  50      405    6.0080107
##  79       145   1.320583 22  157   1.375745  32      191    2.8334075
##  34       143   1.302368 79  143   1.253067  20      125    1.8543243
##  83       141   1.284153 83  143   1.253067  65      100    1.4834594
##  70       140   1.275046 76  139   1.218016  94       67    0.9939178
##  90       139   1.265938 62  135   1.182965  21       65    0.9642486
##  62       136   1.238616 30  133   1.165440  49       64    0.9494140
## [1] "M"          "Top-ranked"
## [1] 9000 1000
## [1] "Accuracy Count"
## [1] 5
## [1] "MSE Count"
## [1] 7
## [1] "Knockoff Count"
## [1] 10
## [1] "Nonzero Features"
##  [1]  10  20  30  40  50  60  70  80  90 100
## 
## 
## 
## 
## 
## 
## [1] EXPERIMENT 6
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0         15         30         60         80 
##  [1]  10  20  30  40  50  60  70  80  90 100

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
##  Accuracy Count Density  MSE Count Density  Knockoff Count Density   
##  80       360   1.535640 70  447   1.872800  70      1694  15.4986276
##  70       355   1.514311 80  403   1.688453 100      1577  14.4281793
##  90       320   1.365013 90  378   1.583710  40      1511  13.8243367
##  55       314   1.339419 100 356   1.491537  30      1481  13.5498628
##  32       309   1.318091 32  347   1.453829  60      1157  10.5855444
##  34       302   1.288231 60  335   1.403553  80       893   8.1701738
##  76       290   1.237043 50  327   1.370035  90       774   7.0814273
##  10       289   1.232777 10  326   1.365845  10       632   5.7822507
##  21       279   1.190121 55  317   1.328138  50       589   5.3888381
##  60       279   1.190121 21  288   1.206637  32       233   2.1317475
##  62       273   1.164527 30  287   1.202447  20       108   0.9881061
##  79       272   1.160261 34  284   1.189878  65        76   0.6953339
##  22       270   1.151730 76  279   1.168929  49        45   0.4117109
##  41       269   1.147464 41  273   1.143791  94        32   0.2927722
##  69       268   1.143198 62  269   1.127032  21        26   0.2378774
## [1] "M"          "Top-ranked"
## [1] 9000 1000
## [1] "Accuracy Count"
## [1] 5
## [1] "MSE Count"
## [1] 8
## [1] "Knockoff Count"
## [1] 10
## [1] "Nonzero Features"
##  [1]  10  20  30  40  50  60  70  80  90 100
## 
## 
## 
## 
## 
## 
## [1] EXPERIMENT 7
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000         20          1          5         30         60 
##  [1]  10  20  30  40  50  60  70  80  90 100

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
##  Accuracy Count Density  MSE Count Density  Knockoff Count Density 
##  60       131   3.580213 70  177   4.770889  70      249   5.856068
##  32       128   3.498224 30  129   3.477089 100      247   5.809031
##  70       123   3.361574 100 116   3.126685  30      237   5.573848
##  30        97   2.650998 32  114   3.072776  60      213   5.009407
##  80        74   2.022410 60  109   2.938005  40      195   4.586077
##  90        73   1.995081 10  103   2.776280  80      168   3.951082
##  57        71   1.940421 90   81   2.183288  10      162   3.809972
##  100       70   1.913091 57   74   1.994609  90      141   3.316087
##  39        65   1.776442 39   73   1.967655  50      114   2.681091
##  10        60   1.639792 50   59   1.590296  32      107   2.516463
##  21        53   1.448483 21   55   1.482480  20       86   2.022578
##  22        53   1.448483 80   54   1.455526  65       81   1.904986
##  35        50   1.366494 79   50   1.347709  49       66   1.552211
##  76        48   1.311834 82   50   1.347709  48       61   1.434619
##  79        48   1.311834 83   49   1.320755  94       61   1.434619
## [1] "M"          "Top-ranked"
## [1] 9000 1000
## [1] "Accuracy Count"
## [1] 7
## [1] "MSE Count"
## [1] 8
## [1] "Knockoff Count"
## [1] 10
## [1] "Nonzero Features"
##  [1]  10  20  30  40  50  60  70  80  90 100
## 
## 
## 
## 
## 
## 
## [1] EXPERIMENT 8
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000         20          5         15         30         60 
##  [1]  10  20  30  40  50  60  70  80  90 100

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
##  Accuracy Count Density  MSE Count Density  Knockoff Count Density  
##  70       256   2.315485 70  320   2.855358  70      606   12.752525
##  60       231   2.089363 60  227   2.025520  30      526   11.069024
##  80       203   1.836107 80  227   2.025520  40      514   10.816498
##  32       193   1.745658 90  211   1.882752 100      493   10.374579
##  90       185   1.673300 10  207   1.847060  60      402    8.459596
##  10       175   1.582851 30  198   1.766753  80      315    6.628788
##  30       168   1.519537 32  194   1.731061  90      250    5.260943
##  21       156   1.410999 100 192   1.713215  10      245    5.155724
##  57       137   1.239146 21  158   1.409833  50      207    4.356061
##  100      137   1.239146 50  142   1.267065  32      152    3.198653
##  22       133   1.202967 40  137   1.222450  20      100    2.104377
##  83       133   1.202967 41  137   1.222450  65       72    1.515152
##  41       132   1.193922 55  134   1.195681  94       55    1.157407
##  55       132   1.193922 39  132   1.177835  49       51    1.073232
##  78       132   1.193922 19  131   1.168912  21       48    1.010101
## [1] "M"          "Top-ranked"
## [1] 9000 1000
## [1] "Accuracy Count"
## [1] 7
## [1] "MSE Count"
## [1] 9
## [1] "Knockoff Count"
## [1] 10
## [1] "Nonzero Features"
##  [1]  10  20  30  40  50  60  70  80  90 100
## 
## 
## 
## 
## 
## 
## [1] EXPERIMENT 9
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000         20         15         30         30         60 
##  [1]  10  20  30  40  50  60  70  80  90 100

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
##  Accuracy Count Density  MSE Count Density  Knockoff Count Density   
##  70       395   1.677781 70  449   1.863844  70      928   15.7261481
##  60       363   1.541860 60  384   1.594022 100      907   15.3702762
##  80       362   1.537612 100 384   1.594022  30      813   13.7773259
##  30       329   1.397443 80  374   1.552511  40      781   13.2350449
##  90       329   1.397443 90  366   1.519303  60      552    9.3543467
##  100      310   1.316740 30  356   1.477792  80      447    7.5749873
##  32       306   1.299749 10  349   1.448734  90      362    6.1345535
##  10       298   1.265769 32  334   1.386467  10      268    4.5416031
##  21       274   1.163828 50  311   1.290992  50      238    4.0332147
##  39       266   1.129848 40  298   1.237028  32      127    2.1521776
##  79       264   1.121352 39  278   1.154006  65       67    1.1354008
##  22       263   1.117105 79  273   1.133250  20       55    0.9320454
##  41       263   1.117105 94  273   1.133250  49       37    0.6270124
##  20       260   1.104362 20  270   1.120797  21       31    0.5253347
##  50       259   1.100115 22  264   1.095890  94       29    0.4914421
## [1] "M"          "Top-ranked"
## [1] 9000 1000
## [1] "Accuracy Count"
## [1] 9
## [1] "MSE Count"
## [1] 10
## [1] "Knockoff Count"
## [1] 10
## [1] "Nonzero Features"
##  [1]  10  20  30  40  50  60  70  80  90 100
## 
## 
## 
## 
## 
## 
## [1] EXPERIMENT 10
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000         20          1          5         60         80 
##  [1]  10  20  30  40  50  60  70  80  90 100

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
##  Accuracy Count Density  MSE Count Density  Knockoff Count Density 
##  32       121   3.125807 70  159   4.048892  70      357   6.828615
##  60        84   2.169982 90  112   2.852050 100      337   6.446060
##  21        79   2.040816 32   99   2.521008  30      320   6.120888
##  57        74   1.911651 80   98   2.495544  40      300   5.738332
##  90        73   1.885818 60   96   2.444614  60      279   5.336649
##  80        71   1.834151 10   95   2.419149  90      247   4.724560
##  39        70   1.808318 30   95   2.419149  10      230   4.399388
##  70        66   1.704986 100  89   2.266361  80      218   4.169855
##  34        64   1.653320 21   82   2.088108  50      207   3.959449
##  30        63   1.627486 57   69   1.757066  32      193   3.691660
##  82        62   1.601653 39   63   1.604278  65      125   2.390972
##  24        59   1.524154 83   58   1.476954  20      108   2.065800
##  67        58   1.498321 82   54   1.375095  94       85   1.625861
##  22        56   1.446655 34   53   1.349631  49       81   1.549350
##  83        56   1.446655 67   53   1.349631  21       78   1.491966
## [1] "M"          "Top-ranked"
## [1] 9000 1000
## [1] "Accuracy Count"
## [1] 5
## [1] "MSE Count"
## [1] 7
## [1] "Knockoff Count"
## [1] 10
## [1] "Nonzero Features"
##  [1]  10  20  30  40  50  60  70  80  90 100
## 
## 
## 
## 
## 
## 
## [1] EXPERIMENT 11
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000         20          5         15         60         80 
##  [1]  10  20  30  40  50  60  70  80  90 100

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
##  Accuracy Count Density  MSE Count Density  Knockoff Count Density   
##  21       185   1.681360 70  241   2.119799  70      862   12.7609178
##  80       174   1.581387 80  220   1.935087 100      813   12.0355292
##  90       159   1.445060 100 212   1.864720  30      742   10.9844560
##  10       155   1.408707 10  202   1.776761  40      714   10.5699482
##  32       155   1.408707 90  191   1.680007  60      629    9.3116210
##  60       154   1.399618 60  189   1.662415  80      568    8.4085862
##  22       150   1.363265 21  180   1.583253  90      467    6.9133975
##  55       150   1.363265 32  176   1.548069  10      449    6.6469282
##  83       150   1.363265 55  171   1.504090  50      382    5.6550703
##  39       146   1.326911 83  161   1.416132  32      241    3.5677276
##  70       146   1.326911 30  154   1.354561  20      140    2.0725389
##  76       143   1.299646 22  151   1.328173  65      111    1.6432272
##  67       142   1.290557 41  149   1.310581  49       77    1.1398964
##  41       140   1.272380 79  142   1.249010  94       74    1.0954848
##  15       138   1.254203 67  141   1.240215  21       57    0.8438194
## [1] "M"          "Top-ranked"
## [1] 9000 1000
## [1] "Accuracy Count"
## [1] 5
## [1] "MSE Count"
## [1] 7
## [1] "Knockoff Count"
## [1] 10
## [1] "Nonzero Features"
##  [1]  10  20  30  40  50  60  70  80  90 100
## 
## 
## 
## 
## 
## 
## [1] EXPERIMENT 12
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000         20         15         30         60         80 
##  [1]  10  20  30  40  50  60  70  80  90 100

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
##  Accuracy Count Density  MSE Count Density  Knockoff Count Density   
##  80       359   1.544750 70  406   1.704593  70      1687  15.5197792
##  90       316   1.359725 80  403   1.691998  30      1582  14.5538178
##  21       301   1.295181 100 364   1.528256 100      1540  14.1674333
##  70       301   1.295181 10  345   1.448484  40      1478  13.5970561
##  60       298   1.282272 90  340   1.427492  60      1137  10.4599816
##  32       297   1.277969 50  325   1.364514  80       937   8.6200552
##  62       296   1.273666 60  325   1.364514  90       691   6.3569457
##  10       288   1.239243 21  320   1.343522  10       619   5.6945722
##  15       282   1.213425 32  307   1.288941  50       604   5.5565777
##  79       276   1.187608 30  292   1.225964  32       215   1.9779209
##  39       274   1.179002 20  287   1.204971  20       110   1.0119595
##  55       273   1.174699 62  286   1.200773  65        65   0.5979761
##  99       273   1.174699 99  275   1.154589  49        46   0.4231831
##  100      273   1.174699 39  274   1.150390  94        33   0.3035879
##  76       266   1.144578 55  274   1.150390  21        29   0.2667893
## [1] "M"          "Top-ranked"
## [1] 9000 1000
## [1] "Accuracy Count"
## [1] 6
## [1] "MSE Count"
## [1] 9
## [1] "Knockoff Count"
## [1] 10
## [1] "Nonzero Features"
##  [1]  10  20  30  40  50  60  70  80  90 100