This is a summary of a set of 9 experiments I ran on Cranium using a single pipe workflow file that performs 3000 independent jobs, each one with the CBDA-SL and the knockoff filter feature mining strategies. Each experiment has a total of 9000 jobs and is uniquely identified by 6 input arguments: number of jobs [M], % of missing values [misValperc], min [Kcol_min] and max [Kcol_max] % for the Feature Sampling Range (FSR), and min [Nrow_min] and max [Nrow_max] % for the Subject Sampling Range (SSR).
This document reports the final results, by experiment. In this case the CBDA-SuperLearner has been adapted to a multinomial outcome distribution. See https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true for some general documentation of the CBDA-SL project, and GitHub (https://github.com/SOCR/CBDA) for some of the code [still in progress].
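The six arguments that identify an experiment can be collected in a small table. The sketch below is a hypothetical illustration, not the actual workflow code: it stores the settings of Experiments 6 to 9 as printed in this document and shows one way a single job could draw its subject and feature subsets from the SSR and FSR ranges. The helper name `draw_job_sample`, the uniform draw of the percentage within each range, and the dataset dimensions (`n_subjects`, `n_features`) are assumptions made for illustration only.

```r
# Settings of Experiments 6-9, copied from the output in this document
experiments <- data.frame(
  experiment = 6:9,
  M          = 9000,
  misValperc = 0,
  Kcol_min   = c(30, 5, 15, 30),
  Kcol_max   = c(50, 15, 30, 50),
  Nrow_min   = c(100, 40, 40, 40),
  Nrow_max   = c(100, 60, 60, 60)
)

# Hypothetical helper: draw the subjects (rows) and features (columns) of one job.
draw_job_sample <- function(spec, n_subjects, n_features) {
  # Assumption: the percentage is drawn uniformly inside each range
  k_perc <- runif(1, spec$Kcol_min, spec$Kcol_max)
  n_perc <- runif(1, spec$Nrow_min, spec$Nrow_max)
  k <- max(1, round(n_features * k_perc / 100))
  n <- max(1, round(n_subjects * n_perc / 100))
  list(
    rows = sort(sample.int(n_subjects, n)),  # sampled subjects (SSR)
    cols = sort(sample.int(n_features, k))   # sampled features (FSR)
  )
}

set.seed(123)
# The dimensions below are placeholders, not the real dataset dimensions
job <- draw_job_sample(experiments[experiments$experiment == 7, ],
                       n_subjects = 1000, n_features = 60)
length(job$rows)  # between 400 and 600 subjects (40-60% of 1000)
length(job$cols)  # between 3 and 9 features (5-15% of 60)
```

With `Nrow_min = Nrow_max = 100`, as in Experiment 6, every job would use all subjects and only the feature subset would vary.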
# # Here I load the dataset [not executed]
# ABIDE_dataset = read.csv("C:/Users/simeonem/Documents/CBDA-SL/Cranium/ABIDE_dataset.txt",header = TRUE)
Features selected by both the knockoff filter and the CBDA-SL algorithms appear as spikes in the histograms below. No False Discovery Rates are shown, since we have no information on the “true” features. I list the top features selected; the number of top features is set to 20 here.
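As a rough illustration of how a ranking table like the ones below can be built, the following sketch counts how often each feature index is selected across jobs and converts the counts to percentages. The printed Density values are consistent with 100 * Frequency / (total number of selections); for example, in Experiment 6 both 31 at 2.454473% and 27 at 2.137767% imply about 1263 selections in total. The exact computation used in the workflow is nevertheless an assumption here, and the input vector `selected` (simulated) and the helper `top_features` are hypothetical.

```r
# Hypothetical helper: rank features by selection frequency.
# `selected` is a vector of feature indices pooled over all jobs.
top_features <- function(selected, top = 20) {
  freq <- sort(table(selected), decreasing = TRUE)
  n <- min(top, length(freq))
  data.frame(
    Feature   = as.integer(names(freq))[seq_len(n)],
    Frequency = as.integer(freq)[seq_len(n)],
    # Assumed definition: share of all selections, in percent
    Density   = 100 * as.integer(freq)[seq_len(n)] / length(selected)
  )
}

# Simulated selections (not real results): feature 30 is made more likely
set.seed(1)
weights <- rep(1, 60)
weights[30] <- 3
selected <- sample(1:60, size = 1263, replace = TRUE, prob = weights)

ranking <- top_features(selected, top = 20)
head(ranking)
```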
## Loading required package: lattice
## Loading required package: ggplot2
## [1] EXPERIMENT 6
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0       30       50      100      100
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## CBDA Frequency  Density Knockoff   Density
##   30        31 2.454473       30 21.482738
##   58        27 2.137767        1 16.134504
##   19        26 2.058591        5  6.405907
##   50        26 2.058591       20  5.996807
##    4        25 1.979414       55  5.607663
##   20        25 1.979414       54  3.741768
##   51        25 1.979414       34  3.641988
##   54        25 1.979414       29  3.322690
##    8        24 1.900238       24  2.913590
##    9        24 1.900238       57  2.255039
##   21        24 1.900238       31  1.347037
##   23        24 1.900238       16  1.297146
##   61        24 1.900238        6  1.287168
##    7        23 1.821061       42  1.247256
##    6        22 1.741884       50  1.247256
##
## [1] EXPERIMENT 7
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0        5       15       40       60
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## CBDA Frequency  Density Knockoff   Density
##   30        11 3.313253       30  4.927607
##    1        10 3.012048        1  4.392262
##    4        10 3.012048       54  2.688892
##   43         9 2.710843        5  2.640224
##   54         9 2.710843       29  2.457720
##    8         8 2.409639       24  2.372551
##   12         8 2.409639       20  2.056211
##   29         8 2.409639       23  1.946709
##   58         8 2.409639       55  1.885874
##    3         7 2.108434       50  1.849373
##    9         7 2.108434        2  1.837206
##   16         7 2.108434        3  1.788539
##   20         7 2.108434       16  1.776372
##   21         7 2.108434        4  1.752038
##   44         7 2.108434       51  1.727704
##
## [1] EXPERIMENT 8
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0       15       30       40       60
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## CBDA Frequency  Density Knockoff   Density
##   14        19 2.653631       30  8.207028
##   30        19 2.653631        1  7.263950
##   16        16 2.234637        5  4.535758
##   23        16 2.234637       20  3.424273
##   33        16 2.234637       54  3.109914
##   43        16 2.234637       24  3.031324
##   50        16 2.234637       29  2.683283
##    8        15 2.094972       55  2.660829
##   12        15 2.094972       34  2.469967
##   36        15 2.094972        2  2.312788
##   48        15 2.094972       57  2.020882
##   49        15 2.094972        6  1.987201
##   51        15 2.094972        3  1.852476
##    9        14 1.955307       51  1.785113
##   10        14 1.955307       23  1.706523
##
## [1] EXPERIMENT 9
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0       30       50       40       60
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## CBDA Frequency  Density Knockoff   Density
##   30        30 2.347418       30 10.486932
##    1        28 2.190923        1  9.207244
##   16        28 2.190923        5  5.433250
##   44        27 2.112676       20  3.849908
##    5        26 2.034429       55  3.383581
##   20        26 2.034429        2  2.830496
##   34        26 2.034429       54  2.602755
##   15        25 1.956182        6  2.591910
##   24        25 1.956182       34  2.581065
##   52        25 1.956182       24  2.375014
##   55        25 1.956182       29  2.201497
##   26        24 1.877934        4  2.114738
##   27        24 1.877934        3  1.886997
##   29        24 1.877934       57  1.886997
##   35        24 1.877934       51  1.746015
## [1] "Top Features Selected across multiple experiments, shared between CBDA-SL and Knockoff filter"
## [1] 30 4 1 54 29 20 5 34
## [1] "R_precuneus" "PixelSpacingX"
## [3] "subjectSex" "R_cingulate_gyrus"
## [5] "L_precuneus" "R_gyrus_rectus"
## [7] "PixelSpacingY" "R_middle_occipital_gyrus"
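A minimal sketch of the "shared between CBDA-SL and Knockoff filter" step, using the two feature columns printed for Experiment 6. How the workflow pools the lists across all experiments to arrive at the indices above is not shown in this document, so the cross-experiment step is left out; `cbda_top` and `knockoff_top` are names chosen here for illustration.

```r
# Top feature indices for Experiment 6, copied from the table in this document
cbda_top     <- c(30, 58, 19, 50, 4, 20, 51, 54, 8, 9, 21, 23, 61, 7, 6)
knockoff_top <- c(30, 1, 5, 20, 55, 54, 34, 29, 24, 57, 31, 16, 6, 42, 50)

# Features ranked in the top list of both methods (order follows cbda_top)
shared <- intersect(cbda_top, knockoff_top)
shared
# [1] 30 50 20 54  6
```

For this single experiment, five features appear in both lists; features 30, 20 and 54 are also among the indices retained across experiments.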
The features listed above are then used to run a final analysis applying both the CBDA-SL and the knockoff filter; they are the ONLY features used in that analysis. A final summary of the accuracy of the overall procedure is obtained by applying the CBDA-SL object to the subset of subjects held out for prediction. The predictions (SL_Pred_Combined) are then used to generate the confusion matrix. In other words, the first stage combines the CBDA-SL and knockoff filter algorithms to select the top features, and the second stage uses the top common features to run a final predictive modeling step that can ultimately be tested for accuracy, sensitivity, specificity, and so on.
## Confusion Matrix and Statistics
##
##           Reference
## Prediction  0  1
##          0 99 90
##          1 66 75
##
## Accuracy : 0.5273
## 95% CI : (0.4719, 0.5822)
## No Information Rate : 0.5
## P-Value [Acc > NIR] : 0.17469
##
## Kappa : 0.0545
## Mcnemar's Test P-Value : 0.06555
##
## Sensitivity : 0.6000
## Specificity : 0.4545
## Pos Pred Value : 0.5238
## Neg Pred Value : 0.5319
## Prevalence : 0.5000
## Detection Rate : 0.3000
## Detection Prevalence : 0.5727
## Balanced Accuracy : 0.5273
##
## 'Positive' Class : 0
##
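The headline statistics in the output above can be recomputed by hand from the four cell counts. The sketch below uses base R only and treats class 0 as the positive class, as stated in the output; the layout matches what `caret::confusionMatrix` prints, but whether that function produced it is an assumption. Variable names are chosen here for illustration.

```r
# Counts copied from the confusion matrix above (rows: prediction, cols: reference)
cm <- matrix(c(99, 66, 90, 75), nrow = 2,
             dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1")))

n  <- sum(cm)
tp <- cm["0", "0"]  # predicted 0, truly 0 (class 0 is the positive class)
fn <- cm["1", "0"]  # predicted 1, truly 0
fp <- cm["0", "1"]  # predicted 0, truly 1
tn <- cm["1", "1"]  # predicted 1, truly 1

accuracy    <- (tp + tn) / n    # 0.5273
sensitivity <- tp / (tp + fn)   # 0.6000
specificity <- tn / (tn + fp)   # 0.4545
ppv         <- tp / (tp + fp)   # 0.5238
npv         <- tn / (tn + fn)   # 0.5319

# Cohen's kappa: agreement beyond what the marginals alone would give
expected    <- sum(rowSums(cm) * colSums(cm)) / n^2
cohen_kappa <- (accuracy - expected) / (1 - expected)  # 0.0545

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, ppv = ppv, npv = npv,
        kappa = cohen_kappa), 4)
```

Because the two reference classes are balanced (165 subjects each), the no-information rate is 0.5 and the balanced accuracy equals the plain accuracy.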