This is a summary of 9 experiments I ran on Cranium using a single pipe workflow file that performs 3000 independent jobs, each applying both the CBDA-SL and the knockoff filter feature mining strategies. Each experiment has a total of 9000 jobs and is uniquely identified by 6 input arguments: number of jobs [M], % of missing values [misValperc], min [Kcol_min] and max [Kcol_max] % for the Feature Sampling Range (FSR), and min [Nrow_min] and max [Nrow_max] % for the Subject Sampling Range (SSR).
This document reports the final results, by experiment. See https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true for general documentation of the CBDA-SL project and the GitHub repository https://github.com/SOCR/CBDA for some of the code [still in progress].
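The nine parameter combinations can be collected in one data frame, which makes the design easier to see: three feature sampling ranges crossed with three subject sampling ranges, with M and misValperc held fixed. This is only a sketch. The values are copied from the per-experiment parameter tables in this document, while the `experiments` data frame and its `experiment` column are illustrative names and not part of the CBDA code.

```r
# Sampling ranges (in %) copied from the per-experiment parameter tables.
# `experiments` and the `experiment` column are illustrative names only.
experiments <- data.frame(
  experiment = 1:9,
  M          = 9000,
  misValperc = 0,
  Kcol_min   = c(5, 5, 15, 15, 30, 30, 5, 15, 30),
  Kcol_max   = c(15, 15, 30, 30, 50, 50, 15, 30, 50),
  Nrow_min   = c(60, 100, 60, 100, 60, 100, 40, 40, 40),
  Nrow_max   = c(80, 100, 80, 100, 80, 100, 60, 60, 60)
)

# One row per experiment; every FSR/SSR combination appears exactly once.
print(experiments)
```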
# Load the dataset [not executed]
# ADNI_dataset = read.csv("C:/Users/simeonem/Documents/CBDA-SL/Cranium/ADNI_dataset.txt",header = TRUE)
Features selected by the knockoff filter and by the CBDA-SL algorithm appear as spikes in the histograms below. No False Discovery Rates are reported, since we have no information on the “true” features. For each experiment I list the top selected features (the top 15 here).
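As a rough illustration of how a top-feature table of this kind can be built, the sketch below counts how often each feature index is selected and converts the counts to percentages. It rests on an assumption: the CBDA Density column looks consistent with Frequency expressed as a percentage of all selection counts (in Experiment 1, a Frequency of 28 and a Density of 7.734807 imply a total of about 362), but the original code is not shown here. The helper `top_features` and the toy input are hypothetical.

```r
# Hypothetical helper (not from the CBDA code): rank features by how often
# they were selected and report each count as a percentage of all selections.
top_features <- function(selected, top = 15) {
  counts <- sort(table(selected), decreasing = TRUE)
  out <- data.frame(
    Feature   = as.integer(names(counts)),
    Frequency = as.integer(counts),
    Density   = 100 * as.numeric(counts) / sum(counts)
  )
  head(out, top)
}

# Toy input (not real results): feature indices pooled across jobs.
toy <- c(6, 6, 6, 4, 4, 2)
print(top_features(toy, top = 3))
```

With real output, `selected` would be the pooled vector of feature indices returned by all jobs of one experiment.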
## [1] EXPERIMENT 1
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0        5       15       60       80
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## CBDA Frequency  Density Knockoff  Density
##    6        28 7.734807        4 5.284801
##    4        26 7.182320        6 5.051403
##    2        12 3.314917        2 5.045846
##    5        10 2.762431        9 4.901361
##    9        10 2.762431        5 4.617949
##   12         9 2.486188       60 3.834398
##   27         9 2.486188        7 3.695471
##   66         9 2.486188       59 3.473187
##    7         8 2.209945        3 3.295360
##   26         8 2.209945        8 2.133926
##   53         8 2.209945       56 1.956099
##   13         7 1.933702       16 1.922756
##   55         7 1.933702       63 1.833843
##   60         7 1.933702       55 1.767158
##    1         6 1.657459       15 1.539316
##
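One quick way to compare the two methods is to intersect their top-15 lists. The sketch below does this for Experiment 1, with the feature indices copied from the table above; the variable names are illustrative.

```r
# Top-15 feature indices for Experiment 1, copied from the table above.
cbda_top     <- c(6, 4, 2, 5, 9, 12, 27, 66, 7, 26, 53, 13, 55, 60, 1)
knockoff_top <- c(4, 6, 2, 9, 5, 60, 7, 59, 3, 8, 56, 16, 63, 55, 15)

# Features reported by both methods, in CBDA rank order.
shared <- intersect(cbda_top, knockoff_top)
print(shared)
# [1]  6  4  2  5  9  7 55 60
```

For Experiment 1, eight of the top 15 features are shared, including the five highest-ranked features of each method (2, 4, 5, 6 and 9).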
## [1] EXPERIMENT 2
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0        5       15      100      100
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## CBDA Frequency  Density Knockoff  Density
##    4        26 7.027027        5 4.777456
##    6        26 7.027027        2 4.603634
##    2        18 4.864865        4 4.587832
##    9        11 2.972973        6 4.545694
##   29         9 2.432432        9 4.540427
##   39         9 2.432432        7 4.087437
##    3         8 2.162162       60 3.929418
##    5         8 2.162162        3 3.592310
##   21         7 1.891892       59 3.450092
##   37         7 1.891892        8 2.633658
##   41         7 1.891892       56 2.207006
##   42         7 1.891892       55 2.159600
##   64         7 1.891892       16 1.864630
##    1         6 1.621622       58 1.801422
##   16         6 1.621622       36 1.759284
##
## [1] EXPERIMENT 3
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0       15       30       60       80
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## CBDA Frequency  Density Knockoff   Density
##    2        32 4.102564        2 10.005750
##    4        30 3.846154        6  9.708645
##    6        29 3.717949        4  9.569676
##    7        19 2.435897        9  8.534598
##    9        18 2.307692        5  7.221583
##    8        16 2.051282        7  5.458118
##   13        16 2.051282       60  4.331992
##   24        16 2.051282       59  3.675484
##   66        16 2.051282        3  3.445467
##   41        15 1.923077        8  2.597278
##   49        15 1.923077       36  1.408856
##   55        15 1.923077       56  1.293847
##   61        15 1.923077       16  1.231551
##   14        14 1.794872       63  1.231551
##   56        14 1.794872       15  1.126126
##
## [1] EXPERIMENT 4
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0       15       30      100      100
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## CBDA Frequency  Density Knockoff  Density
##    4        31 4.116866        6 9.478456
##    6        25 3.320053        4 9.217455
##    2        24 3.187251        9 9.070928
##    7        19 2.523240        2 9.006823
##    9        18 2.390438        5 7.408764
##   11        17 2.257636        7 5.916022
##   25        16 2.124834       60 4.290489
##   41        15 1.992032        3 3.974541
##    5        14 1.859230       59 3.795961
##   35        14 1.859230        8 3.365539
##   55        14 1.859230       16 1.712533
##   57        14 1.859230       36 1.579743
##   13        13 1.726428       15 1.501900
##   22        13 1.726428       56 1.492742
##   39        13 1.726428       55 1.401163
##
## [1] EXPERIMENT 5
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0       30       50       60       80
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## CBDA Frequency  Density Knockoff   Density
##    4        40 2.923977        4 14.575684
##    6        36 2.631579        2 14.456042
##    2        31 2.266082        6 14.390033
##   55        30 2.192982        9 10.932794
##   23        28 2.046784        5  8.139775
##   14        27 1.973684        7  4.509262
##   66        27 1.973684       60  2.661001
##   17        26 1.900585        8  2.636247
##   42        26 1.900585        3  2.198936
##   32        25 1.827485       59  2.087545
##   35        25 1.827485       65  1.505838
##    5        24 1.754386       36  1.097405
##    7        24 1.754386       39  1.027270
##    9        24 1.754386       16  0.899377
##   26        24 1.754386       63  0.820991
##
## [1] EXPERIMENT 6
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0       30       50      100      100
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## CBDA Frequency  Density Knockoff    Density
##    4        38 2.783883        6 14.4776594
##    6        33 2.417582        2 14.2828749
##    2        31 2.271062        4 14.0483384
##    9        31 2.271062        9 12.1561457
##   51        28 2.051282        5  8.2723803
##   42        27 1.978022        7  5.5215456
##   36        24 1.758242        8  3.4146923
##   40        24 1.758242       60  2.7627604
##   43        24 1.758242        3  2.2857370
##   46        24 1.758242       59  2.0710765
##   59        24 1.758242       65  1.9080935
##   14        23 1.684982       39  1.4429957
##   16        23 1.684982       16  1.0176499
##   20        23 1.684982       36  0.9023692
##   24        23 1.684982       44  0.8586421
##
## [1] EXPERIMENT 7
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0        5       15       40       60
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## CBDA Frequency  Density Knockoff  Density
##    6        27 7.417582        6 5.399143
##    2        25 6.868132        4 5.074731
##    4        19 5.219780        2 4.958869
##    9        11 3.021978        5 4.918318
##   51        11 3.021978        9 4.906732
##   26         9 2.472527       60 3.765496
##   30         9 2.472527       59 3.493222
##   32         9 2.472527        7 3.354188
##   16         8 2.197802        3 3.336809
##   41         8 2.197802        8 1.946472
##    5         7 1.923077       63 1.807438
##   20         7 1.923077       56 1.714749
##   27         7 1.923077       16 1.622060
##    1         6 1.648352       64 1.552543
##   21         6 1.648352       36 1.488819
##
## [1] EXPERIMENT 8
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0       15       30       40       60
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## CBDA Frequency  Density Knockoff   Density
##    4        30 4.115226        6 10.199869
##    6        26 3.566529        2  9.746765
##    2        23 3.155007        4  9.676283
##    9        20 2.743484        9  8.226351
##   11        19 2.606310        5  7.440971
##   12        16 2.194787        7  4.344762
##    5        15 2.057613       60  4.002417
##   16        15 2.057613       59  3.483864
##   55        15 2.057613        3  3.217037
##   27        14 1.920439        8  2.205105
##   44        14 1.920439       36  1.369380
##   54        14 1.920439       63  1.293863
##   63        14 1.920439       16  1.263656
##   38        13 1.783265       56  1.263656
##   57        13 1.783265       15  1.077380
##
## [1] EXPERIMENT 9
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0       30       50       40       60
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## CBDA Frequency  Density Knockoff    Density
##    4        39 2.895323        6 15.6680411
##    6        39 2.895323        2 15.3609966
##    2        34 2.524128        4 14.5012720
##    9        31 2.301411        9  9.2332661
##   11        27 2.004454        5  7.5664532
##   60        26 1.930215        7  3.9740328
##   21        25 1.855976       60  2.3730152
##   41        25 1.855976        3  2.0835161
##   18        24 1.781737        8  2.0089482
##   24        24 1.781737       59  1.9212212
##   27        24 1.781737       65  1.2237916
##   30        24 1.781737       36  1.0264058
##    7        23 1.707498       39  1.0220195
##   16        23 1.707498       16  0.9079744
##   17        23 1.707498       63  0.7500658