Some useful information

This is a summary of a set of six experiments using a LONI pipeline workflow file that performs 3000 independent jobs, each applying both the CBDA-SL and the knockoff filter feature mining strategies. Each experiment has a total of 9000 jobs and is uniquely identified by 6 input arguments: the number of jobs [M], the percentage of missing values [misValperc], the minimum [Kcol_min] and maximum [Kcol_max] percentages for the Feature Sampling Range (FSR), and the minimum [Nrow_min] and maximum [Nrow_max] percentages for the Subject Sampling Range (SSR).
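
The six arguments and the per-job subsampling they control can be sketched as follows. This is a minimal illustration under stated assumptions, not the CBDA code: it assumes each job draws a feature percentage uniformly from [Kcol_min, Kcol_max] and a subject percentage uniformly from [Nrow_min, Nrow_max], then samples that many columns and rows without replacement. The names `ExperimentConfig` and `sample_job`, and the data sizes (100 features, 300 subjects), are hypothetical; `misValperc` is carried along but not used.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical container for the six input arguments named in the text."""
    M: int             # number of jobs
    misValperc: float  # % of missing values (not used in this sketch)
    Kcol_min: float    # FSR lower bound, % of features
    Kcol_max: float    # FSR upper bound, % of features
    Nrow_min: float    # SSR lower bound, % of subjects
    Nrow_max: float    # SSR upper bound, % of subjects


def sample_job(cfg, n_features, n_subjects, rng):
    """ASSUMED per-job draw (not taken from the CBDA code): pick a % uniformly
    in each range, then sample that many feature columns and subject rows
    without replacement. Feature indices are 1-based, as in the tables."""
    k = max(1, round(n_features * rng.uniform(cfg.Kcol_min, cfg.Kcol_max) / 100))
    n = max(1, round(n_subjects * rng.uniform(cfg.Nrow_min, cfg.Nrow_max) / 100))
    cols = sorted(rng.sample(range(1, n_features + 1), k))
    rows = sorted(rng.sample(range(n_subjects), n))
    return cols, rows


# Experiment 1's arguments; 100 features and 300 subjects are made-up sizes.
exp1 = ExperimentConfig(M=9000, misValperc=0, Kcol_min=1, Kcol_max=5,
                        Nrow_min=30, Nrow_max=60)
rng = random.Random(0)
cols, rows = sample_job(exp1, n_features=100, n_subjects=300, rng=rng)
print(f"job uses {len(cols)} features {cols} and {len(rows)} subjects")
```

Under these assumptions, an FSR of 1 to 5% means each job sees only a handful of features, which is why a feature must be selected across many jobs before it stands out.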

This document presents the final results for each experiment. See https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true for general documentation of the CBDA-SL project, and the GitHub repository https://github.com/SOCR/CBDA for some of the code.

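Features selected by both the knockoff filter and the CBDA-SL algorithms appear as spikes in the histograms. For each experiment, the table lists the top features selected (15 here). The Accuracy, MSE and Knockoff columns hold feature indices (CBDA-SL results under the Accuracy and MSE criteria, and knockoff filter results, respectively), each followed by Count, the number of times that feature was selected, and Density, that count as a percentage of all selections in the same column.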
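
The Count and Density columns can be reproduced from a pooled list of selected feature indices as sketched below. The percentage reading of Density is inferred from the tables rather than from the CBDA code: in Experiment 1, for instance, the Knockoff counts are consistent with 4416 selections in total, since 100 × 267 / 4416 ≈ 6.046196. The flat-list input format and the helper name `selection_table` are assumptions.

```python
from collections import Counter


def selection_table(selections, top=15):
    """Rank features by selection frequency (hypothetical helper).

    selections: flat list of feature indices, one entry per time a feature
    was selected (ASSUMED input format). Returns (feature, count, density)
    tuples, with density = 100 * count / total number of selections."""
    counts = Counter(selections)
    total = sum(counts.values())
    return [(feat, n, 100 * n / total) for feat, n in counts.most_common(top)]


print(selection_table([3, 1, 3, 2, 3, 1], top=2))
# Experiment 1, Knockoff column: Count 267 out of an inferred 4416 selections.
print(round(100 * 267 / 4416, 6))  # 6.046196
```

Because Density divides by the total over all features, not just the top 15, the Density values in a column do not sum to 100.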

## [1] EXPERIMENT 1
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          1          5         30         60 

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "1"         
##  Accuracy Count Density  MSE Count Density  Knockoff Count Density 
##  70       211   2.937900 63  114   1.766894  70      267   6.046196
##  32       193   2.687274 48  112   1.735896  40      240   5.434783
##  60       185   2.575884 43  105   1.627402 100      219   4.959239
##  80       167   2.325258 8    98   1.518909  30      216   4.891304
##  30       159   2.213868 49   94   1.456913  60      198   4.483696
##  90       152   2.116402 1    92   1.425914  80      183   4.144022
##  100      137   1.907547 47   89   1.379417  10      161   3.645833
##  10       130   1.810081 55   89   1.379417  90      158   3.577899
##  21       119   1.656920 52   86   1.332920  50      151   3.419384
##  57       119   1.656920 37   85   1.317421  32      118   2.672101
##  22       101   1.406294 33   84   1.301922  65       92   2.083333
##  79        98   1.364522 89   84   1.301922  20       79   1.788949
##  82        96   1.336675 93   84   1.301922  49       72   1.630435
##  39        94   1.308828 71   83   1.286423  94       61   1.381341
##  20        92   1.280980 88   82   1.270924  21       55   1.245471
## [1] "Nonzero Features"
##  [1]  10  20  30  40  50  60  70  80  90 100
## [1] EXPERIMENT 2
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          5         15         30         60 

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "2"         
##  Accuracy Count Density  MSE Count Density  Knockoff Count Density   
##  70       432   1.993447 48  268   1.431395  70      586   12.5724094
##  60       362   1.670435 88  244   1.303210 100      541   11.6069513
##  32       356   1.642748 7   242   1.292528  40      527   11.3065866
##  80       347   1.601218 52  239   1.276505  30      521   11.1778588
##  30       345   1.591989 71  238   1.271164  60      376    8.0669384
##  90       340   1.568917 77  237   1.265823  80      299    6.4149324
##  10       317   1.462784 43  236   1.260482  90      267    5.7283845
##  100      297   1.370495 87  233   1.244459  10      261    5.5996567
##  21       266   1.227447 44  232   1.239118  50      204    4.3767432
##  57       263   1.213603 38  231   1.233777  32      133    2.8534649
##  46       255   1.176688 63  231   1.233777  65       79    1.6949153
##  55       250   1.153615 64  227   1.212413  20       76    1.6305514
##  79       250   1.153615 75  227   1.212413  49       60    1.2872774
##  39       246   1.135158 84  226   1.207072  21       44    0.9440034
##  22       240   1.107471 26  225   1.201730  94       41    0.8796396
## [1] "Nonzero Features"
##  [1]  10  20  30  40  50  60  70  80  90 100
## [1] EXPERIMENT 3
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0         15         30         30         60 

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "3"         
##  Accuracy Count Density  MSE Count Density  Knockoff Count Density   
##  70       784   1.673926 43  502   1.179068  70      1024  16.8089297
##  60       676   1.443334 8   497   1.167324 100       912  14.9704531
##  80       671   1.432659 1   493   1.157929  40       845  13.8706500
##  90       641   1.368605 23  492   1.155581  30       825  13.5423506
##  10       620   1.323768 9   488   1.146186  60       568   9.3237032
##  100      608   1.298147 2   481   1.129744  80       402   6.5988181
##  30       596   1.272525 38  480   1.127396  90       348   5.7124097
##  32       592   1.263985 91  479   1.125047  10       295   4.8424163
##  50       580   1.238364 44  478   1.122698  50       250   4.1037426
##  21       541   1.155094 7   477   1.120349  32       134   2.1996060
##  22       526   1.123068 63  477   1.120349  20        73   1.1982928
##  79       510   1.088906 81  476   1.118001  65        54   0.8864084
##  83       509   1.086771 33  475   1.115652  49        35   0.5745240
##  36       499   1.065420 18  474   1.113303  94        35   0.5745240
##  41       497   1.061150 78  474   1.113303  21        30   0.4924491
## [1] "Nonzero Features"
##  [1]  10  20  30  40  50  60  70  80  90 100
## [1] EXPERIMENT 4
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          1          5         60         80 

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "4"         
##  Accuracy Count Density  MSE Count Density  Knockoff Count Density 
##  32       219   2.959859 48  158   2.414057 100      337   6.643012
##  60       185   2.500338 63  129   1.970970  30      325   6.406466
##  70       184   2.486823 43  119   1.818182  70      315   6.209344
##  80       157   2.121908 42  112   1.711230  40      307   6.051646
##  30       149   2.013786 49  107   1.634836  60      281   5.539129
##  90       146   1.973240 38  101   1.543163  90      253   4.987187
##  21       135   1.824571 8   100   1.527884  80      249   4.908338
##  57       122   1.648871 29   98   1.497326  50      216   4.257836
##  61       110   1.486687 84   96   1.466769  10      208   4.100138
##  10       106   1.432626 89   96   1.466769  32      162   3.193377
##  39       105   1.419111 37   95   1.451490  20      129   2.542874
##  22       104   1.405595 55   93   1.420932  65      110   2.168342
##  100      101   1.365049 47   92   1.405653  49       99   1.951508
##  41        97   1.310988 88   91   1.390374  21       97   1.912084
##  15        96   1.297473 92   91   1.390374  94       84   1.655825
## [1] "Nonzero Features"
##  [1]  10  20  30  40  50  60  70  80  90 100
## [1] EXPERIMENT 5
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          5         15         60         80 

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "5"         
##  Accuracy Count Density  MSE Count Density  Knockoff Count Density   
##  80       338   1.566701 29  251   1.381246 100      831   12.3275478
##  60       317   1.469361 25  245   1.348228  70      799   11.8528408
##  10       315   1.460091 4   243   1.337222  30      758   11.2446225
##  21       306   1.418374 7   242   1.331719  40      754   11.1852841
##  90       303   1.404468 63  242   1.331719  60      665    9.8650052
##  55       300   1.390563 85  237   1.304204  80      525    7.7881620
##  70       299   1.385928 86  237   1.304204  10      453    6.7200712
##  32       289   1.339575 38  235   1.293198  90      434    6.4382139
##  79       272   1.260777 52  235   1.293198  50      405    6.0080107
##  67       268   1.242236 43  234   1.287695  32      191    2.8334075
##  100      268   1.242236 17  233   1.282192  20      125    1.8543243
##  83       259   1.200519 18  233   1.282192  65      100    1.4834594
##  35       258   1.195884 68  233   1.282192  94       67    0.9939178
##  22       255   1.181978 95  232   1.276689  21       65    0.9642486
##  15       254   1.177343 16  231   1.271186  49       64    0.9494140
## [1] "Nonzero Features"
##  [1]  10  20  30  40  50  60  70  80  90 100
## [1] EXPERIMENT 6
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0         15         30         60         80 

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "6"         
##  Accuracy Count Density  MSE Count Density  Knockoff Count Density   
##  70       679   1.463173 89  518   1.225542  70      1694  15.4986276
##  80       663   1.428695 75  501   1.185322 100      1577  14.4281793
##  90       624   1.344654 18  498   1.178224  40      1511  13.8243367
##  55       596   1.284317 17  496   1.173492  30      1481  13.5498628
##  32       590   1.271387 96  496   1.173492  60      1157  10.5855444
##  60       586   1.262768 47  492   1.164029  80       893   8.1701738
##  10       572   1.232599 84  489   1.156931  90       774   7.0814273
##  21       551   1.187346 68  488   1.154565  10       632   5.7822507
##  62       537   1.157178 28  487   1.152199  50       589   5.3888381
##  34       534   1.150713 23  484   1.145101  32       233   2.1317475
##  76       525   1.131319 64  484   1.145101  20       108   0.9881061
##  99       523   1.127009 43  483   1.142735  65        76   0.6953339
##  100      518   1.116235 87  482   1.140370  49        45   0.4117109
##  22       517   1.114080 4   480   1.135638  94        32   0.2927722
##  39       516   1.111925 7   478   1.130906  21        26   0.2378774
## [1] "Nonzero Features"
##  [1]  10  20  30  40  50  60  70  80  90 100
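
As a worked check of the six tables, the sketch below counts how many of the ten "Nonzero Features" appear in each top-15 list. It assumes that line lists the true signal features in the data; the lists are transcribed by hand from the tables, and the helper name `hits` is illustrative.

```python
# ASSUMPTION: the "Nonzero Features" line lists the true signal features.
NONZERO = set(range(10, 101, 10))

# Top-15 feature indices per experiment, transcribed from the tables above,
# in rank order as (Accuracy, MSE, Knockoff) columns.
TOP15 = {
    1: ([70, 32, 60, 80, 30, 90, 100, 10, 21, 57, 22, 79, 82, 39, 20],
        [63, 48, 43, 8, 49, 1, 47, 55, 52, 37, 33, 89, 93, 71, 88],
        [70, 40, 100, 30, 60, 80, 10, 90, 50, 32, 65, 20, 49, 94, 21]),
    2: ([70, 60, 32, 80, 30, 90, 10, 100, 21, 57, 46, 55, 79, 39, 22],
        [48, 88, 7, 52, 71, 77, 43, 87, 44, 38, 63, 64, 75, 84, 26],
        [70, 100, 40, 30, 60, 80, 90, 10, 50, 32, 65, 20, 49, 21, 94]),
    3: ([70, 60, 80, 90, 10, 100, 30, 32, 50, 21, 22, 79, 83, 36, 41],
        [43, 8, 1, 23, 9, 2, 38, 91, 44, 7, 63, 81, 33, 18, 78],
        [70, 100, 40, 30, 60, 80, 90, 10, 50, 32, 20, 65, 49, 94, 21]),
    4: ([32, 60, 70, 80, 30, 90, 21, 57, 61, 10, 39, 22, 100, 41, 15],
        [48, 63, 43, 42, 49, 38, 8, 29, 84, 89, 37, 55, 47, 88, 92],
        [100, 30, 70, 40, 60, 90, 80, 50, 10, 32, 20, 65, 49, 21, 94]),
    5: ([80, 60, 10, 21, 90, 55, 70, 32, 79, 67, 100, 83, 35, 22, 15],
        [29, 25, 4, 7, 63, 85, 86, 38, 52, 43, 17, 18, 68, 95, 16],
        [100, 70, 30, 40, 60, 80, 10, 90, 50, 32, 20, 65, 94, 21, 49]),
    6: ([70, 80, 90, 55, 32, 60, 10, 21, 62, 34, 76, 99, 100, 22, 39],
        [89, 75, 18, 17, 96, 47, 84, 68, 28, 23, 64, 43, 87, 4, 7],
        [70, 100, 40, 30, 60, 80, 90, 10, 50, 32, 20, 65, 49, 94, 21]),
}


def hits(ranked, truth, k=15):
    """How many of the first k ranked features belong to the truth set."""
    return len(set(ranked[:k]) & truth)


for exp, (acc, mse, ko) in TOP15.items():
    print(f"Experiment {exp}: nonzero features in top 15 -> "
          f"Accuracy {hits(acc, NONZERO)}, MSE {hits(mse, NONZERO)}, "
          f"Knockoff {hits(ko, NONZERO)} (of {len(NONZERO)})")
```

Passing a smaller `k` (for example `k=10`) gives the corresponding count for a shorter top list.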

The features listed above are then used to run a final analysis that applies both the CBDA-SL and the knockoff filter; only these features are used in that analysis. The accuracy of the overall procedure is summarized by applying the CBDA-SL object to the subset of subjects held out for prediction, and the resulting predictions are used to generate the confusion matrix. In short, we combine the CBDA-SL and knockoff filter algorithms to select the top features in a first stage; the second stage then uses those top features in a final predictive modeling step whose performance can be tested for accuracy, sensitivity, …
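
The two-stage procedure described above can be sketched as below. This is a minimal illustration, not the CBDA-SL implementation: the final CBDA-SL fit is not reproduced (toy predictions stand in for it), binary outcomes coded 0/1 are assumed, and the helper names `top_features`, `restrict`, `confusion_matrix` and `summarize` are hypothetical.

```python
from collections import Counter


def top_features(counts, k=15):
    """Stage 1: keep the k most frequently selected feature indices."""
    return [feat for feat, _ in Counter(counts).most_common(k)]


def restrict(X, features):
    """Keep only the selected columns (1-based feature indices, as in the tables)."""
    return [[row[f - 1] for f in features] for row in X]


def confusion_matrix(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels coded 0/1 (an assumption)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    """Accuracy, sensitivity and specificity derived from the confusion matrix."""
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
    }


# Stage 1 on a few Experiment 1 Knockoff counts from the table above.
print(top_features({70: 267, 40: 240, 100: 219, 32: 118}, k=3))  # [70, 40, 100]
# Stage 2: predictions would come from the final CBDA-SL fit on restrict(X, top);
# toy held-out labels and predictions stand in for it here.
print(summarize([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```

Keeping the metric computation separate from the model makes it possible to score any final predictor on the same held-out subjects.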