Some useful information

This is a summary of a set of experiments run with a LONI Pipeline workflow file that performs 3000 independent jobs, each one using the CBDA-SL and the knockoff filter feature mining strategies. Each experiment has a total of 9000 jobs and is uniquely identified by 6 input arguments: number of jobs [M], % of missing values [misValperc], min [Kcol_min] and max [Kcol_max] % for the Feature Sampling Range (FSR), and min [Nrow_min] and max [Nrow_max] % for the Subject Sampling Range (SSR).
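To make the six arguments concrete, here is a minimal Python sketch of how one job might draw its subject and feature subsets from the SSR and FSR percentages. It is an illustration only, not the CBDA-SL code: `ExperimentSpec` and `sample_job` are hypothetical names, the uniform draw of the percentages is an assumption, and the data set size (300 subjects, 300 features) is assumed for the example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSpec:
    """The six input arguments that identify one experiment."""
    M: int              # number of jobs
    misValperc: float   # % of missing values
    Kcol_min: float     # FSR lower bound, % of features
    Kcol_max: float     # FSR upper bound, % of features
    Nrow_min: float     # SSR lower bound, % of subjects
    Nrow_max: float     # SSR upper bound, % of subjects


def sample_job(spec, n_subjects, n_features, rng):
    """Draw the subject and feature subsets for one job (hypothetical helper).

    Assumption: the sampling percentages are drawn uniformly within
    the FSR and SSR bounds.
    """
    col_pct = rng.uniform(spec.Kcol_min, spec.Kcol_max)
    row_pct = rng.uniform(spec.Nrow_min, spec.Nrow_max)
    k = max(1, round(n_features * col_pct / 100))
    n = max(1, round(n_subjects * row_pct / 100))
    # Indices are 1-based to match the feature numbering in the report.
    features = sorted(rng.sample(range(1, n_features + 1), k))
    subjects = sorted(rng.sample(range(1, n_subjects + 1), n))
    return subjects, features


# Argument values of the first experiment; data set size is assumed.
spec = ExperimentSpec(M=9000, misValperc=0, Kcol_min=1, Kcol_max=5,
                      Nrow_min=30, Nrow_max=60)
subjects, features = sample_job(spec, n_subjects=300, n_features=300,
                                rng=random.Random(0))
print(len(subjects), "subjects,", len(features), "features")
```

With these bounds, each job would see between 3 and 15 of the 300 features and between 90 and 180 of the 300 subjects.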

This document reports the final results for each experiment. See https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true for general documentation of the CBDA-SL project and GitHub (https://github.com/SOCR/CBDA) for some of the code.

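Features selected by both the knockoff filter and the CBDA-SL algorithms appear as spikes in the histograms below. For each experiment, the table lists the top features selected by each method (the top 15 here). In each table, the Accuracy, MSE, and Knockoff columns give feature indices, and each is followed by that feature's selection count and density.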
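The ranking behind such top-feature lists can be sketched in a few lines of Python. This is a hedged illustration rather than the project's code: `top_features` is a hypothetical helper, the toy selections are made up, and the density definition (count as a percentage of all selections) is inferred from the ratio of the Count and Density columns, not stated in the source.

```python
from collections import Counter


def top_features(selected, top=15):
    """Rank features by how often they were selected (hypothetical helper).

    `selected` is a flat list of feature indices, one entry per selection
    across all jobs. Returns (feature, count, density) rows, where density
    is the count as a percentage of all selections (inferred definition).
    """
    counts = Counter(selected)
    total = sum(counts.values())
    # Sort by descending count, then by ascending feature index for ties.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(f, c, 100.0 * c / total) for f, c in ranked[:top]]


# Toy selections, made up for illustration (not experiment data).
cbda_selected = [4, 4, 4, 80, 80, 248, 12]
knockoff_selected = [80, 80, 248, 248, 248, 57]

cbda_top = top_features(cbda_selected, top=3)
knockoff_top = top_features(knockoff_selected, top=3)

# Features that appear in the top list of both methods.
shared = sorted({f for f, _, _ in cbda_top} & {f for f, _, _ in knockoff_top})
print(shared)
```

Taking the intersection of the two top lists is one simple way to flag the features both methods agree on.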

## [1] EXPERIMENT 1
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          1          5         30         60 
##  [1]   1  30  60 100 130 160 200 230 260 300

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "1"         
##  Accuracy Count Density   MSE Count Density   Knockoff Count Density 
##  210      6     1.2820513 108 7     1.1437908  80      62    1.892552
##  247      6     1.2820513 24  6     0.9803922 248      55    1.678877
##  275      6     1.2820513 42  6     0.9803922 209      51    1.556777
##  293      6     1.2820513 89  6     0.9803922   4      48    1.465201
##  197      5     1.0683761 182 6     0.9803922 150      43    1.312576
##  248      5     1.0683761 205 6     0.9803922  57      39    1.190476
##  259      5     1.0683761 281 6     0.9803922 169      39    1.190476
##  297      5     1.0683761 18  5     0.8169935 154      38    1.159951
##  21       4     0.8547009 22  5     0.8169935 170      37    1.129426
##  45       4     0.8547009 74  5     0.8169935  12      36    1.098901
##  48       4     0.8547009 77  5     0.8169935 147      36    1.098901
##  74       4     0.8547009 88  5     0.8169935 252      36    1.098901
##  100      4     0.8547009 98  5     0.8169935  68      35    1.068376
##  117      4     0.8547009 114 5     0.8169935 171      35    1.068376
##  127      4     0.8547009 121 5     0.8169935  33      34    1.037851
## [1] "Nonzero Features"
##  [1]   1  30  60 100 130 160 200 230 260 300
## 
## [1] EXPERIMENT 2
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          5         15         30         60 
##  [1]   1  30  60 100 130 160 200 230 260 300

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "2"
##  Accuracy Count Density   MSE Count Density   Knockoff Count Density 
##  235      13    0.8274984 124 13    0.7475561   4      59    4.100069
##  150      12    0.7638447 34  12    0.6900518 248      59    4.100069
##  273      12    0.7638447 82  12    0.6900518  80      50    3.474635
##  33       11    0.7001910 134 12    0.6900518  12      33    2.293259
##  75       11    0.7001910 200 12    0.6900518 150      33    2.293259
##  113      11    0.7001910 258 12    0.6900518 169      33    2.293259
##  221      11    0.7001910 13  11    0.6325474 209      32    2.223767
##  250      11    0.7001910 91  11    0.6325474  33      30    2.084781
##  280      11    0.7001910 98  11    0.6325474 147      28    1.945796
##  6        10    0.6365372 139 11    0.6325474 154      27    1.876303
##  13       10    0.6365372 159 11    0.6325474 253      27    1.876303
##  16       10    0.6365372 185 11    0.6325474  57      26    1.806810
##  85       10    0.6365372 227 11    0.6325474  68      26    1.806810
##  156      10    0.6365372 250 11    0.6325474 118      25    1.737318
##  239      10    0.6365372 289 11    0.6325474 203      25    1.737318
## [1] "Nonzero Features"
##  [1]   1  30  60 100 130 160 200 230 260 300
## 
## [1] EXPERIMENT 3
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0         15         30         30         60 
##  [1]   1  30  60 100 130 160 200 230 260 300

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "3"
##  Accuracy Count Density   MSE Count Density   Knockoff Count Density 
##  151      24    0.6690828 300 21    0.5915493   4      77    4.171181
##  21       20    0.5575690 25  20    0.5633803  80      69    3.737811
##  98       20    0.5575690 239 20    0.5633803 248      69    3.737811
##  254      20    0.5575690 23  19    0.5352113 118      53    2.871073
##  74       19    0.5296905 62  19    0.5352113 209      48    2.600217
##  80       19    0.5296905 100 19    0.5352113 154      47    2.546046
##  146      19    0.5296905 232 19    0.5352113 253      44    2.383532
##  147      19    0.5296905 13  18    0.5070423 203      40    2.166847
##  12       18    0.5018121 78  18    0.5070423  57      39    2.112676
##  33       18    0.5018121 93  18    0.5070423 169      38    2.058505
##  81       18    0.5018121 94  18    0.5070423 150      37    2.004334
##  152      18    0.5018121 114 18    0.5070423  33      31    1.679307
##  203      18    0.5018121 190 18    0.5070423 147      30    1.625135
##  235      18    0.5018121 234 18    0.5070423  68      29    1.570964
##  245      18    0.5018121 7   17    0.4788732  70      29    1.570964
## [1] "Nonzero Features"
##  [1]   1  30  60 100 130 160 200 230 260 300
## 
## [1] EXPERIMENT 4
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          1          5         60         80 
##  [1]   1  30  60 100 130 160 200 230 260 300

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "4"
##  Accuracy Count Density   MSE Count Density   Knockoff Count Density 
##  113      8     1.5686275 8   7     1.1627907 248      83    2.705346
##  247      8     1.5686275 52  6     0.9966777   4      66    2.151239
##  88       7     1.3725490 79  6     0.9966777 209      65    2.118644
##  147      6     1.1764706 95  6     0.9966777 253      65    2.118644
##  191      6     1.1764706 138 6     0.9966777 169      61    1.988266
##  206      6     1.1764706 158 6     0.9966777  80      60    1.955671
##  255      6     1.1764706 290 6     0.9966777 118      58    1.890482
##  22       5     0.9803922 293 6     0.9966777  12      53    1.727510
##  69       5     0.9803922 67  5     0.8305648 154      51    1.662321
##  90       5     0.9803922 82  5     0.8305648 171      50    1.629726
##  92       5     0.9803922 90  5     0.8305648 203      49    1.597132
##  98       5     0.9803922 99  5     0.8305648  57      48    1.564537
##  273      5     0.9803922 184 5     0.8305648  33      46    1.499348
##  276      5     0.9803922 243 5     0.8305648  68      46    1.499348
##  12       4     0.7843137 246 5     0.8305648 293      43    1.401565
## [1] "Nonzero Features"
##  [1]   1  30  60 100 130 160 200 230 260 300
## 
## [1] EXPERIMENT 6
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0         15         30         60         80 
##  [1]   1  30  60 100 130 160 200 230 260 300

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "6"
##  Accuracy Count Density   MSE Count Density   Knockoff Count Density  
##  235      26    0.7564737 42  22    0.6037322 248      73    10.138889
##  80       24    0.6982834 137 21    0.5762898  80      68     9.444444
##  30       21    0.6109980 205 21    0.5762898   4      64     8.888889
##  51       20    0.5819028 89  20    0.5488474 209      51     7.083333
##  63       20    0.5819028 50  19    0.5214050 118      29     4.027778
##  283      20    0.5819028 107 19    0.5214050 154      29     4.027778
##  165      19    0.5528077 155 19    0.5214050 253      29     4.027778
##  173      19    0.5528077 184 19    0.5214050 147      24     3.333333
##  282      19    0.5528077 1   18    0.4939627  57      23     3.194444
##  76       18    0.5237125 15  18    0.4939627 150      22     3.055556
##  150      18    0.5237125 131 18    0.4939627 169      21     2.916667
##  253      18    0.5237125 178 18    0.4939627 203      20     2.777778
##  292      18    0.5237125 222 18    0.4939627  12      18     2.500000
##  33       17    0.4946174 233 18    0.4939627  68      15     2.083333
##  125      17    0.4946174 269 18    0.4939627 171      15     2.083333
## [1] "Nonzero Features"
##  [1]   1  30  60 100 130 160 200 230 260 300

The features listed above are then used to run a final analysis that applies both the CBDA-SL and the knockoff filter; no other features enter that analysis. The accuracy of the overall procedure is then assessed by applying the CBDA-SL object to the subset of subjects held out for prediction, and the resulting predictions are used to generate the confusion matrix. In short, the procedure has two stages: the first combines the CBDA-SL and knockoff filter algorithms to select the top features, and the second uses those features in a final predictive modeling step that can be tested for accuracy, sensitivity,…..
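As an illustration of the last step, the Python sketch below builds a confusion matrix from held-out predictions and derives accuracy, sensitivity, and specificity from it. It assumes a binary outcome coded 0/1, which the source does not state; `confusion_matrix` and `summarize` are hypothetical helpers, and the labels are toy values rather than experiment results.

```python
def confusion_matrix(actual, predicted, positive=1):
    """Return (tp, fp, fn, tn) counts for binary labels."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1   # predicted positive, truly positive
            else:
                fp += 1   # predicted positive, truly negative
        else:
            if a == positive:
                fn += 1   # predicted negative, truly positive
            else:
                tn += 1   # predicted negative, truly negative
    return tp, fp, fn, tn


def summarize(tp, fp, fn, tn):
    """Derive accuracy, sensitivity, and specificity from the counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Toy held-out labels and predictions (assumed binary outcome).
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]

counts = confusion_matrix(actual, predicted)
metrics = summarize(*counts)
print(counts, metrics)
```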