Some useful information

This is a summary of a set of 6 experiments using a LONI pipeline workflow file that performs 3000 independent jobs, each applying both the CBDA-SL and the knockoff filter feature mining strategies. Each experiment has a total of 9000 jobs and is uniquely identified by 6 input arguments: number of jobs [M], percentage of missing values [misValperc], minimum [Kcol_min] and maximum [Kcol_max] percentage for the Feature Sampling Range (FSR), and minimum [Nrow_min] and maximum [Nrow_max] percentage for the Subject Sampling Range (SSR).
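The role of the FSR and SSR arguments in a single job can be illustrated with a minimal Python sketch. It assumes that each job draws a feature percentage uniformly from [Kcol_min, Kcol_max] and a subject percentage uniformly from [Nrow_min, Nrow_max], then samples that many column and row indices without replacement. The function `sample_job`, the uniform draw, and the data sizes (300 features, 1000 subjects) are hypothetical and are not taken from the CBDA code; missing-value handling (misValperc) is not modeled.

```python
import random


def sample_job(n_features, n_subjects, kcol_min, kcol_max, nrow_min, nrow_max, rng):
    """Draw the feature and subject subsets for one hypothetical CBDA job.

    Assumption: each percentage is drawn uniformly within its range and
    indices (1-based) are sampled without replacement.
    """
    # Feature Sampling Range (FSR): percentage of columns used by this job
    kcol_pct = rng.uniform(kcol_min, kcol_max)
    # Subject Sampling Range (SSR): percentage of rows used by this job
    nrow_pct = rng.uniform(nrow_min, nrow_max)
    k = max(1, round(n_features * kcol_pct / 100))
    n = max(1, round(n_subjects * nrow_pct / 100))
    features = sorted(rng.sample(range(1, n_features + 1), k))
    subjects = sorted(rng.sample(range(1, n_subjects + 1), n))
    return features, subjects


if __name__ == "__main__":
    rng = random.Random(2017)
    # Hypothetical sizes (300 features, 1000 subjects) with the Experiment 1 ranges
    for job in range(3):
        feats, subs = sample_job(300, 1000, 1, 5, 30, 60, rng)
        print(f"job {job}: {len(feats)} features, {len(subs)} subjects")
```

With Kcol_min = 1 and Kcol_max = 5, each such job sees only 3 to 15 of the 300 hypothetical features, which is why many jobs (M) are needed before selection frequencies become informative.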

This document reports the final results for each experiment. See https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true for general documentation of the CBDA-SL project, and the GitHub repository https://github.com/SOCR/CBDA for some of the code.

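Features selected by the knockoff filter and by the CBDA-SL algorithm appear as spikes in the histograms. For each experiment, the table lists the top 15 features selected by each method. The CBDA column gives the index of a feature selected by CBDA-SL, Frequency the number of times it was selected, and Density its share (in %) of all CBDA-SL selections. The Knockoff column and the Density column next to it give the feature index and density for the knockoff filter.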
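The Frequency and Density columns can be reproduced with a short Python sketch, assuming the input is one list of selected feature indices per job. The function `top_features` is hypothetical; ties are broken by ascending feature index, which matches the ordering visible in the tables but is not confirmed to be what the original scripts do.

```python
from collections import Counter


def top_features(selections, top_k=15):
    """Rank features by how often they were selected across jobs.

    selections: iterable of per-job lists of selected feature indices.
    Returns (feature, frequency, density) tuples, where density is the
    feature's share (in %) of all selections.
    """
    counts = Counter(f for job in selections for f in job)
    total = sum(counts.values())
    # Highest frequency first; ties broken by ascending feature index
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(f, c, 100 * c / total) for f, c in ranked[:top_k]]


if __name__ == "__main__":
    # Three toy jobs, not real CBDA output
    jobs = [[218, 273], [218, 11], [273, 218]]
    for feature, freq, dens in top_features(jobs, top_k=3):
        print(feature, freq, round(dens, 2))
```

As a check against Experiment 1: feature 218 has Frequency 26 and Density 4.770642, which equals 26/545 × 100, so that table corresponds to 545 CBDA-SL selections in total.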

## [1] EXPERIMENT 1
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          1          5         30         60 

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "1"         
##  CBDA Frequency Density  Knockoff Density  
##  218  26        4.770642 273      10.511478
##  273  21        3.853211 200       8.900524
##  279  19        3.486239 160       6.443818
##  120  16        2.935780 169       5.557793
##  11   15        2.752294 197       4.993959
##  121  15        2.752294 225       4.349577
##  161  15        2.752294  43       3.946839
##  186  15        2.752294 130       3.342731
##  219  15        2.752294  28       2.859444
##  227  15        2.752294 230       2.537253
##  252  15        2.752294 262       2.376158
##  277  15        2.752294 239       2.215062
##  31   12        2.201835  13       2.094241
##  98   12        2.201835 300       1.973419
##  267  12        2.201835  25       1.852598
## [1] "Nonzero Features"
##  [1]   1  30  60 100 130 160 200 230 260 300
## [1] EXPERIMENT 2
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          5         15         30         60 

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "2"         
##  CBDA Frequency Density   Knockoff Density  
##  289  19        1.2500000 160      15.011472
##  88   17        1.1184211  30      14.913143
##  102  17        1.1184211 130      10.291708
##  44   15        0.9868421 200       8.390692
##  211  15        0.9868421 260       8.292363
##  34   14        0.9210526 230       8.226811
##  125  14        0.9210526 129       4.195346
##  207  14        0.9210526 183       3.605375
##  48   13        0.8552632 300       3.572599
##  66   13        0.8552632 294       3.179285
##  72   13        0.8552632  23       2.687643
##  106  13        0.8552632 140       2.064897
##  22   12        0.7894737 214       2.032121
##  113  12        0.7894737 224       1.507702
##  157  12        0.7894737 273       1.245493
## [1] "Nonzero Features"
##  [1]   1  30  60 100 130 160 200 230 260 300
## [1] EXPERIMENT 3
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0         15         30         30         60 

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "3"         
##  CBDA Frequency Density   Knockoff Density  
##  144  32        0.9137636 260      16.700164
##  273  28        0.7995431 130      14.251781
##  37   25        0.7138778 160      12.826603
##  51   25        0.7138778 300      11.748584
##  285  25        0.7138778  30       8.057738
##  55   24        0.6853227 230       5.426640
##  218  24        0.6853227 214       4.732322
##  231  24        0.6853227 100       3.252330
##  241  24        0.6853227 200       2.558012
##  242  24        0.6853227   4       2.283939
##  159  23        0.6567676   1       2.229125
##  274  23        0.6567676  43       2.137767
##  252  22        0.6282125 129       1.735794
##  263  22        0.6282125 183       1.425178
##  87   21        0.5996573  67       1.406907
## [1] "Nonzero Features"
##  [1]   1  30  60 100 130 160 200 230 260 300
## [1] EXPERIMENT 4
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          1          5         60         80 

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "4"         
##  CBDA Frequency Density  Knockoff Density  
##  291  13        2.691511  30      13.692390
##  256  12        2.484472 160       7.181136
##  99   11        2.277433 273       7.100750
##  224  11        2.277433 300       6.216506
##  59   10        2.070393 200       5.359057
##  148  10        2.070393 214       5.332262
##  167  10        2.070393  43       4.957128
##  35    9        1.863354 130       4.448017
##  76    9        1.863354 269       3.912111
##  122   9        1.863354 225       3.322615
##  239   9        1.863354 183       3.295820
##  287   9        1.863354 230       3.001072
##  34    8        1.656315 227       2.652733
##  64    8        1.656315 264       2.625938
##  70    8        1.656315 129       2.197213
## [1] "Nonzero Features"
##  [1]   1  30  60 100 130 160 200 230 260 300
## [1] EXPERIMENT 5
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          5         15         60         80 

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "5"         
##  CBDA Frequency Density   Knockoff Density  
##  212  18        1.2738854 160      11.578947
##  93   16        1.1323425  30      10.684211
##  201  16        1.1323425 130       8.438596
##  145  14        0.9907997   1       7.912281
##  261  14        0.9907997 300       7.315789
##  290  14        0.9907997 260       7.298246
##  4    13        0.9200283 200       6.912281
##  36   13        0.9200283 273       6.421053
##  89   13        0.9200283 100       6.157895
##  100  13        0.9200283 129       4.421053
##  148  13        0.9200283  25       4.228070
##  158  13        0.9200283 142       3.842105
##  27   12        0.8492569 230       3.385965
##  56   12        0.8492569  76       2.526316
##  99   12        0.8492569 279       2.526316
## [1] "Nonzero Features"
##  [1]   1  30  60 100 130 160 200 230 260 300
## [1] EXPERIMENT 6
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0         15         30         60         80 

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "6"         
##  CBDA Frequency Density   Knockoff Density   
##  70   25        0.7409603  30      18.3901023
##  215  25        0.7409603 130      18.0693447
##  46   24        0.7113219 160      17.5042004
##  107  24        0.7113219 260      12.0360470
##  124  24        0.7113219 200       9.4394379
##  13   23        0.6816835 230       7.7287307
##  33   23        0.6816835 300       6.7664579
##  53   23        0.6816835 100       3.2992210
##  175  23        0.6816835 273       1.4815946
##  235  23        0.6816835 129       0.8859019
##  122  22        0.6520451 142       0.8400794
##  144  22        0.6520451 225       0.8248053
##  241  22        0.6520451 183       0.6567894
##  280  22        0.6520451 214       0.4734993
##  294  22        0.6520451  43       0.3207576
## [1] "Nonzero Features"
##  [1]   1  30  60 100 130 160 200 230 260 300
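One way to read each table against its "Nonzero Features" line is to count how many of those features appear among each method's top 15. The Python sketch below does this for Experiment 6, with the feature indices copied from its table. It assumes, without confirmation from the source, that the "Nonzero Features" vector lists the features with nonzero effects in the (presumably simulated) data; the helper `recovered` is hypothetical.

```python
# "Nonzero Features" vector printed for each experiment
NONZERO = {1, 30, 60, 100, 130, 160, 200, 230, 260, 300}

# Top-15 feature indices for Experiment 6, copied from its table
EXP6_CBDA = [70, 215, 46, 107, 124, 13, 33, 53, 175, 235, 122, 144, 241, 280, 294]
EXP6_KNOCKOFF = [30, 130, 160, 260, 200, 230, 300, 100, 273, 129, 142, 225, 183, 214, 43]


def recovered(top, truth=NONZERO):
    """Return the selected features that are also in the nonzero set, in rank order."""
    return [f for f in top if f in truth]


if __name__ == "__main__":
    for name, top in [("CBDA-SL", EXP6_CBDA), ("Knockoff", EXP6_KNOCKOFF)]:
        hits = recovered(top)
        print(f"{name}: {len(hits)}/{len(NONZERO)} nonzero features in top {len(top)}: {hits}")
```

The same two lists can be swapped for any other experiment's columns.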

In short, the CBDA-SL and knockoff filter algorithms are combined in two stages. In the first stage, both algorithms select the top features listed above. In the second stage, ONLY those features are used to run a final analysis with both CBDA-SL and the knockoff filter. The accuracy of the overall procedure is then assessed by applying the CBDA-SL object to the subset of subjects held out for prediction; the resulting predictions are used to generate a confusion matrix, from which accuracy, sensitivity, … can be computed.
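The final evaluation step can be sketched in Python as follows. This is a minimal illustration assuming a binary outcome coded 0/1 and a CBDA-SL object that returns predicted probabilities, thresholded at 0.5 (both assumptions). The functions `to_labels`, `confusion_matrix` and `summarize` are hypothetical and do not come from the CBDA code.

```python
def to_labels(probs, threshold=0.5):
    """Convert predicted probabilities to 0/1 labels (assumed threshold 0.5)."""
    return [1 if p >= threshold else 0 for p in probs]


def confusion_matrix(y_true, y_pred):
    """Counts of true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    return {
        "TP": sum(1 for t, p in pairs if t == 1 and p == 1),
        "TN": sum(1 for t, p in pairs if t == 0 and p == 0),
        "FP": sum(1 for t, p in pairs if t == 0 and p == 1),
        "FN": sum(1 for t, p in pairs if t == 1 and p == 0),
    }


def _ratio(num, den):
    # Undefined metrics (empty denominator) are reported as NaN
    return num / den if den else float("nan")


def summarize(cm):
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    total = cm["TP"] + cm["TN"] + cm["FP"] + cm["FN"]
    return {
        "accuracy": _ratio(cm["TP"] + cm["TN"], total),
        "sensitivity": _ratio(cm["TP"], cm["TP"] + cm["FN"]),
        "specificity": _ratio(cm["TN"], cm["TN"] + cm["FP"]),
    }


if __name__ == "__main__":
    # Toy held-out outcomes and predicted probabilities, not real results
    y_true = [1, 0, 1, 1, 0, 0]
    probs = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1]
    cm = confusion_matrix(y_true, to_labels(probs))
    print(cm, summarize(cm))
```

Keeping the confusion matrix as raw counts makes it easy to add further metrics (for example, precision) later without rerunning the predictions.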