Some useful information

This is a summary of a set of experiments using a LONI pipeline workflow file that performs 3000 independent jobs, each one applying the CBDA-SL and the knockoff filter feature mining strategies. Each experiment has a total of 9000 jobs and is uniquely identified by 6 input arguments: the number of jobs [M], the percentage of missing values [misValperc], the minimum [Kcol_min] and maximum [Kcol_max] percentages for the Feature Sampling Range (FSR), and the minimum [Nrow_min] and maximum [Nrow_max] percentages for the Subject Sampling Range (SSR).
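To make the FSR and SSR arguments concrete, here is a minimal sketch of how a single job might draw its subjects and features. It assumes each job picks a sampling percentage uniformly within each range and then samples rows and columns without replacement; that rule, the helper `sample_job`, and the data dimensions (1000 subjects, 300 features) are hypothetical and not taken from the CBDA-SL code. Missing values (misValperc) are not modelled here.

```python
import random

def sample_job(n_rows, n_cols, nrow_min, nrow_max, kcol_min, kcol_max, rng):
    """Hypothetical helper: draw subject (row) and feature (column) indices
    for one job, using the SSR (Nrow_min/Nrow_max) and FSR (Kcol_min/Kcol_max)
    percentage bounds."""
    # Assumption: the sampling percentage is drawn uniformly within each range.
    row_pct = rng.uniform(nrow_min, nrow_max) / 100
    col_pct = rng.uniform(kcol_min, kcol_max) / 100
    n_sub = max(1, round(row_pct * n_rows))
    n_feat = max(1, round(col_pct * n_cols))
    # Sample without replacement so no subject or feature repeats in a job.
    rows = sorted(rng.sample(range(n_rows), n_sub))
    cols = sorted(rng.sample(range(n_cols), n_feat))
    return rows, cols

rng = random.Random(0)
# Experiment 1 ranges (FSR 5-15 %, SSR 60-80 %) on hypothetical data dimensions.
rows, cols = sample_job(n_rows=1000, n_cols=300, nrow_min=60, nrow_max=80,
                        kcol_min=5, kcol_max=15, rng=rng)
print(len(rows), len(cols))
```

With these settings every job sees between 600 and 800 subjects and between 15 and 45 features, which is why a feature has to be drawn in many jobs before it can accumulate a high selection frequency.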

This document reports the final results for each experiment. See https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true for general documentation of the CBDA-SL project, and the GitHub repository https://github.com/SOCR/CBDA for some of the code.

Features selected by both the knockoff filter and CBDA-SL are shown as spikes in the histograms below. The top features selected by each method are also listed; the number of top features is set to 15 here.
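A top-15 list of this kind can be produced by pooling the features selected in every job and ranking them by count. The sketch below is a hypothetical illustration (`top_features` and the toy job lists are not from the CBDA-SL code); it reports density as the frequency expressed as a percentage of all pooled selections, a convention that is an assumption consistent with the Density columns in the tables below.

```python
from collections import Counter

def top_features(selections, k=15):
    """Rank features by how often they were selected across jobs.

    selections: one list of selected feature indices per job (hypothetical input).
    Returns (feature, frequency, density) tuples, where density is the
    frequency as a percentage of all selections pooled across jobs.
    """
    counts = Counter(f for job in selections for f in job)
    total = sum(counts.values())
    # most_common breaks ties by first-seen order.
    return [(f, n, 100 * n / total) for f, n in counts.most_common(k)]

# Toy input: three jobs, each reporting the features it selected.
jobs = [[130, 260], [130, 60], [130, 260, 300]]
for feature, freq, dens in top_features(jobs, k=3):
    print(feature, freq, round(dens, 2))
```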

## [1] EXPERIMENT 1
##      M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 1 9000          0        5       15       60       80

## [1] "Nonzero features - Signal"
##  [1]   1  30  60 100 130 160 200 230 260 300
## [1] "TABLE with CBDA-SL and Knockoff RESULTS"
##  CBDA Frequency Density   Knockoff KO_Frequency KO_Density
##  130  37        2.3536896 160      731          17.529976 
##  260  21        1.3358779 130      584          14.004796 
##  258  20        1.2722646  30      449          10.767386 
##  60   14        0.8905852 200      384           9.208633 
##  273  14        0.8905852 260      383           9.184652 
##  300  14        0.8905852 230      239           5.731415 
##  67   12        0.7633588 100      189           4.532374 
##  96   11        0.6997455 300      188           4.508393 
##  179  11        0.6997455 273      174           4.172662 
##  267  11        0.6997455 214      151           3.621103 
##  39   10        0.6361323 222      111           2.661871 
##  55   10        0.6361323   1      100           2.398082 
##  83   10        0.6361323 142       66           1.582734 
##  87   10        0.6361323  25       55           1.318945 
##  141  10        0.6361323 258       55           1.318945 
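The Density columns carry no unit. A quick arithmetic check on the Experiment 1 rows suggests they are frequencies expressed as a percentage of a fixed total: every row implies the same total, 1572 for CBDA-SL and 4170 for the knockoff filter. What those totals count (for example, all feature selections over all jobs) is not stated in this report, so treat this as an inference rather than a documented definition.

```python
# (Frequency, Density) pairs copied from the Experiment 1 table above.
cbda = [(37, 2.3536896), (21, 1.3358779), (14, 0.8905852), (10, 0.6361323)]
knockoff = [(731, 17.529976), (584, 14.004796), (100, 2.398082), (55, 1.318945)]

def implied_total(rows):
    # If Density = 100 * Frequency / total, every row yields the same total.
    return {round(100 * freq / dens) for freq, dens in rows}

print(implied_total(cbda), implied_total(knockoff))  # prints {1572} {4170}
```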
## [1] EXPERIMENT 2
##      M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 2 9000          0       15       30       60       80

## [1] "Nonzero features - Signal"
##  [1]   1  30  60 100 130 160 200 230 260 300
## [1] "TABLE with CBDA-SL and Knockoff RESULTS"
##  CBDA Frequency Density   Knockoff KO_Frequency KO_Density
##  130  47        1.2929849 160      1249         23.9869407
##  260  30        0.8253095 130       904         17.3612445
##  273  27        0.7427785  30       644         12.3679662
##  300  26        0.7152682 200       528         10.1401959
##  160  23        0.6327373 260       487          9.3527943
##  65   20        0.5502063 230       258          4.9548684
##  258  20        0.5502063 300       179          3.4376800
##  83   19        0.5226960 273       156          2.9959670
##  136  19        0.5226960 100       153          2.9383522
##  293  19        0.5226960 214       129          2.4774342
##  187  18        0.4951857 222        89          1.7092376
##  201  18        0.4951857   1        87          1.6708277
##  3    17        0.4676754 142        70          1.3443442
##  7    17        0.4676754  25        52          0.9986557
##  13   17        0.4676754 258        46          0.8834262
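Because the true nonzero (signal) features are printed for each experiment, the top-15 lists can be checked against them directly. The snippet below does this for Experiment 2 using only the values shown above; it is a simple overlap count added for illustration, not a metric produced by the original pipeline.

```python
# True nonzero (signal) features, as listed for both experiments.
signal = {1, 30, 60, 100, 130, 160, 200, 230, 260, 300}

# Top-15 features from the Experiment 2 table.
cbda_top = [130, 260, 273, 300, 160, 65, 258, 83, 136, 293, 187, 201, 3, 7, 13]
ko_top = [160, 130, 30, 200, 260, 230, 300, 273, 100, 214, 222, 1, 142, 25, 258]

for name, top in [("CBDA-SL", cbda_top), ("Knockoff", ko_top)]:
    hits = sorted(signal.intersection(top))
    missed = sorted(signal.difference(top))
    print(f"{name}: {len(hits)}/{len(signal)} signal features in top 15; missed {missed}")
```

In Experiment 2 the knockoff list contains 9 of the 10 signal features (all but feature 60), while the CBDA-SL list contains 4 (130, 160, 260 and 300).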

The features listed above are then used to run a final analysis that applies both CBDA-SL and the knockoff filter; ONLY these features are used in that analysis. A final summary of the accuracy of the overall procedure is obtained by applying the CBDA-SL object to the subset of subjects held out for prediction, and the resulting predictions are used to generate the confusion matrix. In short, the procedure combines the CBDA-SL and knockoff filter algorithms in two stages: the first stage selects the top features, and the second stage uses only those top features in a final predictive modeling step whose accuracy, sensitivity, … can then be assessed.
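The evaluation step can be illustrated with a minimal sketch: given predictions for the held-out subjects, tabulate a confusion matrix and derive accuracy, sensitivity and specificity from it. It assumes a binary outcome coded 0/1 and uses plain Python rather than whatever R functions the CBDA-SL code calls; `confusion_metrics` is a hypothetical helper and the labels are made-up toy data, not results from this report.

```python
def confusion_metrics(y_true, y_pred, positive=1):
    """Binary confusion matrix and summary metrics for held-out predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "matrix": [[tn, fp], [fn, tp]],  # rows: true 0/1; columns: predicted 0/1
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }

# Hypothetical held-out labels and predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
print(confusion_metrics(y_true, y_pred))
```

Here 3 of the 4 positives and 2 of the 4 negatives are classified correctly, giving accuracy 0.625, sensitivity 0.75 and specificity 0.5.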