Some useful information

This is a summary of a set of experiments run with a LONI pipeline workflow file that performs 3000 independent jobs, each applying the CBDA-SL and knockoff filter feature mining strategies. Each experiment has a total of 9000 jobs and is uniquely identified by 6 input arguments: number of jobs [M], % of missing values [misValperc], min [Kcol_min] and max [Kcol_max] % for the Feature Sampling Range (FSR), and min [Nrow_min] and max [Nrow_max] % for the Subject Sampling Range (SSR).
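As a minimal sketch of how the six arguments could be carried around in code, the Python snippet below stores them in a small record and draws one feature/subject subsample size from the two ranges. The field names follow the report. The `draw_subsample_sizes` helper, the uniform draw, and the `n_features` / `n_subjects` values are assumptions made for illustration, not the workflow's actual implementation.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSpec:
    """The six input arguments that identify one experiment."""
    M: int              # number of jobs
    misValperc: float   # % of missing values
    Kcol_min: float     # FSR lower bound, % of features
    Kcol_max: float     # FSR upper bound, % of features
    Nrow_min: float     # SSR lower bound, % of subjects
    Nrow_max: float     # SSR upper bound, % of subjects

    def __post_init__(self):
        # Each range must be a valid percentage interval.
        for lo, hi in ((self.Kcol_min, self.Kcol_max),
                       (self.Nrow_min, self.Nrow_max)):
            if not 0 <= lo <= hi <= 100:
                raise ValueError(f"invalid percentage range: {lo}-{hi}")


def draw_subsample_sizes(spec, n_features, n_subjects, rng):
    """Hypothetical helper: pick how many features and subjects one job uses.

    Assumption: the percentage is drawn uniformly within each range.
    """
    k_pct = rng.uniform(spec.Kcol_min, spec.Kcol_max)
    n_pct = rng.uniform(spec.Nrow_min, spec.Nrow_max)
    k = max(1, round(n_features * k_pct / 100))
    n = max(1, round(n_subjects * n_pct / 100))
    return k, n


# Argument values as printed for Experiment 1 in this report.
experiment_1 = ExperimentSpec(M=9000, misValperc=0,
                              Kcol_min=5, Kcol_max=15,
                              Nrow_min=60, Nrow_max=80)

# n_features and n_subjects are placeholders, not values from the report.
k, n = draw_subsample_sizes(experiment_1, n_features=100, n_subjects=300,
                            rng=random.Random(0))
```

Keeping the arguments in one immutable record makes it harder to mix up the settings of two experiments when results are compared.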

This document reports the final results, by experiment. For general documentation of the CBDA-SL project, see https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true; some of the code is available on GitHub at https://github.com/SOCR/CBDA.

Features selected by both the knockoff filter and the CBDA-SL algorithm appear as spikes in the histograms below. The tables list the top features selected by each method; the number of top features is set to 15 here.

## [1] EXPERIMENT 1
##      M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 1 9000          0        5       15       60       80

## [1] "Nonzero features - Signal"
##  [1]  10  20  30  40  50  60  70  80  90 100
## [1] "TABLE with CBDA-SL and Knockoff RESULTS"
##  CBDA Frequency Density  Knockoff KO_Frequency KO_Density
##  80   50        9.276438 100      833          10.107997 
##  100  31        5.751391  80      826          10.023055 
##  50   13        2.411874  90      808           9.804635 
##  13   11        2.040816  30      786           9.537677 
##  71   11        2.040816  70      557           6.758888 
##  59   10        1.855288  94      486           5.897343 
##  86   10        1.855288  60      456           5.533309 
##  18    9        1.669759  20      419           5.084334 
##  39    9        1.669759  46      414           5.023662 
##  60    9        1.669759  50      343           4.162116 
##  68    9        1.669759  68      253           3.070016 
##  74    9        1.669759  13      177           2.147798 
##  55    8        1.484230  39      164           1.990050 
##  84    8        1.484230  58       87           1.055697 
##  1     7        1.298701  33       85           1.031428 
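The Density columns are not defined in the report. The Python check below tests one plausible reading, namely that Density is the percentage share `100 * Frequency / total`, by recovering the total implied by each row of the Experiment 1 table. The `implied_totals` function is a hypothetical helper; the numbers are copied from the table above.

```python
def implied_totals(frequencies, densities):
    """Return the set of totals implied by density = 100 * frequency / total."""
    return {round(100 * f / d) for f, d in zip(frequencies, densities)}


# Experiment 1, CBDA-SL columns (Frequency, Density).
cbda_frequency = [50, 31, 13, 11, 11, 10, 10, 9, 9, 9, 9, 9, 8, 8, 7]
cbda_density = [9.276438, 5.751391, 2.411874, 2.040816, 2.040816,
                1.855288, 1.855288, 1.669759, 1.669759, 1.669759,
                1.669759, 1.669759, 1.484230, 1.484230, 1.298701]

# Experiment 1, knockoff columns (KO_Frequency, KO_Density).
ko_frequency = [833, 826, 808, 786, 557, 486, 456, 419, 414, 343,
                253, 177, 164, 87, 85]
ko_density = [10.107997, 10.023055, 9.804635, 9.537677, 6.758888,
              5.897343, 5.533309, 5.084334, 5.023662, 4.162116,
              3.070016, 2.147798, 1.990050, 1.055697, 1.031428]

cbda_totals = implied_totals(cbda_frequency, cbda_density)
ko_totals = implied_totals(ko_frequency, ko_density)
print(cbda_totals, ko_totals)
```

If every row of a column implies the same total, the percentage reading is consistent with the table. The total would then count selections across all features, not only the 15 rows listed.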
## [1] EXPERIMENT 2
##      M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 2 9000          0       15       30       60       80

## [1] "Nonzero features - Signal"
##  [1]  10  20  30  40  50  60  70  80  90 100
## [1] "TABLE with CBDA-SL and Knockoff RESULTS"
##  CBDA Frequency Density  Knockoff KO_Frequency KO_Density
##  80   48        3.924775  80      1955         17.5242022
##  100  42        3.434178 100      1735         15.5521692
##  39   24        1.962388  90      1649         14.7812836
##  50   24        1.962388  30      1534         13.7504482
##  38   20        1.635323  70       785          7.0365722
##  70   19        1.553557  60       642          5.7547508
##  64   18        1.471791  94       630          5.6471854
##  60   17        1.390025  20       467          4.1860882
##  66   17        1.390025  46       439          3.9351022
##  13   16        1.308258  50       359          3.2179993
##  27   16        1.308258  68       208          1.8644676
##  28   16        1.308258  13       141          1.2638939
##  40   16        1.308258  39       134          1.2011474
##  44   16        1.308258  62        44          0.3944066
##  45   16        1.308258  33        42          0.3764790
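To read the table against the known signal, the short Python sketch below counts how many of the ten nonzero (signal) features appear in each top-15 list for Experiment 2, and which features both methods selected. The feature indices are copied from the output above; the variable names are illustrative.

```python
# Known signal features, as printed under "Nonzero features - Signal".
signal = set(range(10, 101, 10))

# Experiment 2: top-15 features from each column of the table.
cbda_top = [80, 100, 39, 50, 38, 70, 64, 60, 66, 13, 27, 28, 40, 44, 45]
knockoff_top = [80, 100, 90, 30, 70, 60, 94, 20, 46, 50, 68, 13, 39, 62, 33]

cbda_hits = sorted(signal & set(cbda_top))            # signal features found by CBDA-SL
knockoff_hits = sorted(signal & set(knockoff_top))    # signal features found by the knockoff filter
selected_by_both = sorted(set(cbda_top) & set(knockoff_top))

print(f"CBDA-SL recovers {len(cbda_hits)} of {len(signal)} signal features: {cbda_hits}")
print(f"Knockoff recovers {len(knockoff_hits)} of {len(signal)} signal features: {knockoff_hits}")
print(f"Selected by both methods: {selected_by_both}")
```

Note that the last six CBDA-SL rows in the Experiment 2 table are tied at a frequency of 16, so the membership of the CBDA-SL top 15 depends on how ties are broken.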

The features listed above are then used to run a final analysis that applies both CBDA-SL and the knockoff filter. Only the features listed above are used in this analysis. The accuracy of the overall procedure is summarized by applying the CBDA-SL object to the subset of subjects held out for prediction; the resulting predictions are then used to generate the confusion matrix. In short, the procedure has two stages: the first combines the CBDA-SL and knockoff filter algorithms to select the top features, and the second uses those features in a final predictive modeling step that can ultimately be tested for accuracy, sensitivity, ...
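The last step, turning held-out predictions into a confusion matrix and summary metrics, can be sketched in Python as follows. This is a generic illustration for a binary outcome, not the CBDA-SL code: the labels in `y_true` and `y_pred` are made up, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary outcome."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Share of all held-out subjects that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(tp, fn):
    """Share of true positives that the model detected."""
    return tp / (tp + fn)


# Made-up held-out labels and predictions, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, fp, tn, fn)
sens = sensitivity(tp, fn)
```

Because the metrics are computed only on subjects that were held out of the feature selection and model fitting, they estimate how the two-stage procedure would perform on new data.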