Some useful information

This is a summary of a set of 6 experiments run with a LONI pipeline workflow file that performs 3000 independent jobs, each applying the CBDA-SL and the knockoff filter feature mining strategies. Each experiment has a total of 9000 jobs and is uniquely identified by 6 input arguments: number of jobs [M], % of missing values [misValperc], min [Kcol_min] and max [Kcol_max] % for the Feature Sampling Range (FSR), and min [Nrow_min] and max [Nrow_max] % for the Subject Sampling Range (SSR).
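
To make the sampling arguments concrete, the following Python sketch shows one way a single job could draw its feature and subject subsets from the FSR and SSR ranges. The function name, the uniform draw of each percentage, and the dataset dimensions are illustrative assumptions, not the CBDA code itself.

```python
import random

def sample_job_subset(n_subjects, n_features, kcol_min, kcol_max,
                      nrow_min, nrow_max, seed=None):
    """Draw one job's random subset of feature and subject indices.

    Assumption: each percentage is drawn uniformly from its range and
    converted to a count; the actual CBDA sampling scheme may differ.
    """
    rng = random.Random(seed)
    feat_pct = rng.uniform(kcol_min, kcol_max)   # FSR, in percent
    subj_pct = rng.uniform(nrow_min, nrow_max)   # SSR, in percent
    n_feat = max(1, round(n_features * feat_pct / 100))
    n_subj = max(1, round(n_subjects * subj_pct / 100))
    # Sample without replacement; indices are 1-based to match the tables.
    features = sorted(rng.sample(range(1, n_features + 1), n_feat))
    subjects = sorted(rng.sample(range(1, n_subjects + 1), n_subj))
    return features, subjects

# Hypothetical dataset size; the Kcol/Nrow values are those of Experiment 1.
features, subjects = sample_job_subset(
    n_subjects=300, n_features=900,
    kcol_min=1, kcol_max=5, nrow_min=30, nrow_max=60, seed=0)
print(len(features), len(subjects))
```

With these settings each job would see between 1% and 5% of the features and between 30% and 60% of the subjects.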

This document reports the final results, by experiment. See https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true for general documentation of the CBDA-SL project and the GitHub repository https://github.com/SOCR/CBDA for some of the code.

Features selected by both the knockoff filter and the CBDA-SL algorithm appear as spikes in the accompanying histograms. The tables below list the top features selected by each method; the number of top features is set to 15 here.
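
The ranking behind the tables can be sketched as a simple tally. The Python example below counts how often each feature is selected across jobs, expresses each count as a percentage of all selections, and keeps the top entries. Reading the Density columns as such a percentage is an inference from the numbers (in Experiment 1, a frequency of 10 with density 0.6377551 implies 1568 selections in total), and the input lists are made up for illustration.

```python
from collections import Counter

def top_features(selections, top=15):
    """Rank features by selection frequency.

    `selections` is a list of per-job lists of selected feature indices.
    Returns (feature, frequency, density) tuples, where density is the
    frequency as a percentage of all selections (an assumption inferred
    from the tables, not taken from the CBDA code).
    """
    counts = Counter(f for job in selections for f in job)
    total = sum(counts.values())
    # Sort by frequency (descending), then by feature index to break ties.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(f, n, 100 * n / total) for f, n in ranked[:top]]

# Made-up selections from four jobs.
jobs = [[300, 400, 20], [300, 449], [300, 400], [73]]
for feature, freq, density in top_features(jobs, top=3):
    print(feature, freq, round(density, 2))
```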

## [1] EXPERIMENT 1
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          1          5         30         60 
##  [1]   1 100 200 300 400 500 600 700 800 900

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "1"         
##  CBDA Frequency Density   Knockoff Density  
##  300  10        0.6377551 800      8.4955752
##  400   9        0.5739796 600      7.7876106
##  449   8        0.5102041 900      7.3156342
##  747   7        0.4464286 100      5.8997050
##  20    6        0.3826531 500      5.1917404
##  73    6        0.3826531 300      4.4247788
##  409   6        0.3826531   1      4.0707965
##  643   6        0.3826531 200      3.4218289
##  671   6        0.3826531 400      2.3008850
##  706   6        0.3826531 108      2.1238938
##  11    5        0.3188776 574      1.2979351
##  30    5        0.3188776 700      1.1209440
##  31    5        0.3188776 424      0.9439528
##  34    5        0.3188776 650      0.9439528
##  78    5        0.3188776 139      0.8849558
## [1] "Nonzero Features"
##  [1]   1 100 200 300 400 500 600 700 800 900
## 
## [1] EXPERIMENT 2
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          5         15         30         60 
##  [1]   1 100 200 300 400 500 600 700 800 900

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "2"         
##  CBDA Frequency Density   Knockoff Density  
##  300  22        0.4352127 600      8.3467095
##  400  15        0.2967359 900      8.1460674
##  800  15        0.2967359 800      7.9855538
##  600  14        0.2769535 100      7.5040128
##  117  13        0.2571711   1      6.5810594
##  41   12        0.2373887 300      5.4574639
##  489  12        0.2373887 500      4.8956661
##  703  12        0.2373887 108      2.9695024
##  802  12        0.2373887 200      2.8089888
##  900  12        0.2373887 400      2.1669342
##  36   11        0.2176063 574      1.7255217
##  100  11        0.2176063 700      1.5650080
##  240  11        0.2176063 424      1.2841091
##  248  11        0.2176063 765      0.9630819
##  315  11        0.2176063 379      0.9229535
## [1] "Nonzero Features"
##  [1]   1 100 200 300 400 500 600 700 800 900
## 
## [1] EXPERIMENT 3
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0         15         30         30         60 
##  [1]   1 100 200 300 400 500 600 700 800 900

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "3"         
##  CBDA Frequency Density   Knockoff Density  
##  300  29        0.2576862 500      14.705882
##  400  26        0.2310290   1       8.823529
##  800  26        0.2310290 100       8.823529
##  100  22        0.1954860 800       8.823529
##  393  22        0.1954860 900       8.823529
##  781  22        0.1954860 139       5.882353
##  1    21        0.1866003 548       5.882353
##  335  21        0.1866003 574       5.882353
##  491  21        0.1866003  14       2.941176
##  519  21        0.1866003 112       2.941176
##  585  21        0.1866003 246       2.941176
##  586  21        0.1866003 563       2.941176
##  692  21        0.1866003 590       2.941176
##  847  21        0.1866003 596       2.941176
##  896  21        0.1866003 676       2.941176
## [1] "Nonzero Features"
##  [1]   1 100 200 300 400 500 600 700 800 900
## 
## [1] EXPERIMENT 4
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          1          5         60         80 
##  [1]   1 100 200 300 400 500 600 700 800 900

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "4"         
##  CBDA Frequency Density   Knockoff Density 
##  300  14        0.8668731 600      9.751693
##  795   8        0.4953560 800      9.255079
##  223   7        0.4334365 100      8.397291
##  398   7        0.4334365 900      8.352144
##  35    6        0.3715170 500      7.088036
##  171   6        0.3715170 300      6.817156
##  189   6        0.3715170   1      6.546275
##  539   6        0.3715170 200      4.695260
##  556   6        0.3715170 400      3.431151
##  606   6        0.3715170 108      3.250564
##  656   6        0.3715170 700      2.437923
##  688   6        0.3715170 574      1.489842
##  701   6        0.3715170 738      1.218962
##  4     5        0.3095975 424      1.083521
##  20    5        0.3095975 548      1.083521
## [1] "Nonzero Features"
##  [1]   1 100 200 300 400 500 600 700 800 900
## 
## [1] EXPERIMENT 5
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          5         15         60         80 
##  [1]   1 100 200 300 400 500 600 700 800 900

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "5"         
##  CBDA Frequency Density   Knockoff Density   
##  300  16        0.3485839 600      13.1027993
##  409  15        0.3267974 800      11.9501294
##  400  14        0.3050109 900      11.0326982
##  650  13        0.2832244 100      10.6092684
##  20   12        0.2614379   1       8.8214538
##  332  12        0.2614379 300       8.6803105
##  333  12        0.2614379 500       8.1392614
##  415  12        0.2614379 200       5.1987768
##  449  12        0.2614379 108       3.3639144
##  638  12        0.2614379 400       3.1051517
##  754  12        0.2614379 700       2.0465773
##  28   11        0.2396514 424       1.3173371
##  49   11        0.2396514 379       0.7998118
##  95   11        0.2396514 762       0.7527641
##  149  11        0.2396514 738       0.7292402
## [1] "Nonzero Features"
##  [1]   1 100 200 300 400 500 600 700 800 900
## 
## [1] EXPERIMENT 6
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0         15         30         60         80 
##  [1]   1 100 200 300 400 500 600 700 800 900

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "6"         
##  CBDA Frequency Density   Knockoff Density  
##  100  26        0.2414786 600      10.910265
##  800  25        0.2321910 100      10.264687
##  700  23        0.2136157 800       9.748225
##  300  22        0.2043280 900       8.715300
##  508  22        0.2043280   1       8.134280
##  18   21        0.1950404 300       6.455778
##  254  21        0.1950404 500       6.455778
##  209  20        0.1857528 200       5.745642
##  400  20        0.1857528 108       3.938025
##  590  20        0.1857528 400       3.679793
##  654  20        0.1857528 700       1.807618
##  659  20        0.1857528 574       1.678502
##  690  20        0.1857528 762       1.355713
##  900  20        0.1857528 379       1.226598
##  105  19        0.1764651 424       1.226598
## [1] "Nonzero Features"
##  [1]   1 100 200 300 400 500 600 700 800 900
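
One way to read these tables is to count how many of the listed nonzero features appear in each method's top 15. The Python sketch below does this for Experiment 6, with the three lists copied from the output above. Treating the "Nonzero Features" line as the set of truly informative features is an assumption, and the helper name is hypothetical.

```python
# Lists copied from the Experiment 6 output.
nonzero = {1, 100, 200, 300, 400, 500, 600, 700, 800, 900}
cbda_top = [100, 800, 700, 300, 508, 18, 254, 209, 400, 590,
            654, 659, 690, 900, 105]
knockoff_top = [600, 100, 800, 900, 1, 300, 500, 200, 108, 400,
                700, 574, 762, 379, 424]

def recovered(top, truth):
    """Return the features in `top` that also belong to `truth`, sorted."""
    return sorted(set(top) & truth)

cbda_hits = recovered(cbda_top, nonzero)
knockoff_hits = recovered(knockoff_top, nonzero)
# Features that both methods place in their top 15 and that are nonzero.
both = sorted(set(cbda_hits) & set(knockoff_hits))
print(len(cbda_hits), len(knockoff_hits), len(both))
```

For this experiment, 6 of the 10 nonzero features are in the CBDA-SL top 15 and all 10 are in the knockoff filter top 15.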

The features listed above are then used to run a final analysis with both CBDA-SL and the knockoff filter; no other features enter this analysis. The accuracy of the overall procedure is assessed by applying the CBDA-SL object to the subset of subjects held out for prediction, and the resulting predictions are used to generate the confusion matrix. In short, the procedure has two stages: the first combines the CBDA-SL and knockoff filter algorithms to select the top features, and the second uses those features in a final predictive modeling step that can be tested for accuracy, sensitivity, …
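
The metrics named above follow directly from the confusion matrix. A minimal Python sketch, assuming a binary outcome coded 0/1 and using made-up labels and predictions (the actual CBDA-SL prediction object is not shown here):

```python
def confusion_matrix(actual, predicted):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn

def summarize(tp, fp, fn, tn):
    """Compute accuracy, sensitivity and specificity from the four counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }

# Made-up outcomes for ten held-out subjects.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
counts = confusion_matrix(actual, predicted)
metrics = summarize(*counts)
print(counts, metrics)
```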