This is a summary of a set of 30 experiments I ran on Cranium using a single pipe workflow file that performs 3000 independent jobs, each one with the CBDA-SL and the knockoff filter feature mining strategies. Each experiment is uniquely identified by 6 input arguments: number of jobs [M], % of missing values [misValperc], min [Kcol_min] and max [Kcol_max] % for the Feature Sampling Range (FSR), and min [Nrow_min] and max [Nrow_max] % for the Subject Sampling Range (SSR).
This document reports the final results, by experiment. See https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true for general documentation of the CBDA-SL project, and GitHub https://github.com/SOCR/CBDA for the code [still in progress]. The test dataset is defined below:
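To make the roles of the sampling arguments concrete, here is a minimal sketch of how a single job could draw its subsample from the six inputs. It assumes that the fraction of features (FSR) and of subjects (SSR) is drawn uniformly between the given min and max percentages, and that missing values are injected completely at random. The helper `sample_job` and its internals are hypothetical and are not taken from the CBDA code.

```r
# Hypothetical helper (not part of the CBDA code): draw one job's subsample.
sample_job <- function(X, Kcol_min, Kcol_max, Nrow_min, Nrow_max, misValperc = 0) {
  X <- as.matrix(X)
  n <- nrow(X)
  p <- ncol(X)

  # Assumed mechanism: set misValperc % of all cells to NA, completely at random
  n_missing <- round(length(X) * misValperc / 100)
  if (n_missing > 0) X[sample(length(X), n_missing)] <- NA

  # FSR: fraction of features drawn uniformly in [Kcol_min, Kcol_max] %
  k_frac <- runif(1, Kcol_min / 100, Kcol_max / 100)
  cols <- sort(sample(p, max(1, round(p * k_frac))))

  # SSR: fraction of subjects drawn uniformly in [Nrow_min, Nrow_max] %
  n_frac <- runif(1, Nrow_min / 100, Nrow_max / 100)
  rows <- sort(sample(n, max(1, round(n * n_frac))))

  list(rows = rows, cols = cols, data = X[rows, cols, drop = FALSE])
}

set.seed(1)  # seed chosen for illustration only
X <- matrix(rnorm(300 * 100), nrow = 300, ncol = 100)
job <- sample_job(X, Kcol_min = 5, Kcol_max = 15, Nrow_min = 60, Nrow_max = 80,
                  misValperc = 20)
dim(job$data)
```

With `Nrow_min = Nrow_max = 100`, the same helper keeps every subject, which corresponds to the experiments that use the full set of rows.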
# Problem parameters
n <- 300
p <- 100
# Binary outcome, generated independently of every feature
Ytemp <- rbinom(n, 1, 0.5)
# Five blocks of p/5 features each, drawn from different distributions
x1 <- matrix(rpois(n * p / 5, 4), nrow = n, ncol = p / 5)                # dim(x1); qr(x1)$rank
x2 <- matrix(runif(n * p / 5, 0.1, 0.99), nrow = n, ncol = p / 5)        # dim(x2); qr(x2)$rank
x3 <- matrix(rnorm(n * p / 5, mean = 5, sd = 1.5), nrow = n, ncol = p / 5)   # dim(x3); qr(x3)$rank
x4 <- matrix(rnorm(n * p / 5, mean = 25, sd = 4.5), nrow = n, ncol = p / 5)  # dim(x4); qr(x4)$rank
x5 <- matrix(rbinom(n * p / 5, 10, prob = 0.45), nrow = n, ncol = p / 5) # dim(x5); qr(x5)$rank
X2 <- cbind.data.frame(Ytemp, x1, x2, x3, x4, x5)
# Here I write the data to a text file [not executed]
# write.table(X2,"C:/Users/simeonem/Documents/CBDA-SL/Cranium/NULL_dataset.txt",sep=",")
# Here I load the dataset [not executed]
# NULL_dataset = read.csv("C:/Users/simeonem/Documents/CBDA-SL/Cranium/NULL_dataset.txt",header = TRUE)
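Thus, NO feature should be extracted by either the knockoff filter or the CBDA-SL algorithm, because the outcome Ytemp is generated independently of all 100 features. That translates into the flat histograms with no spikes shown below. No False Discovery Rates are shown. For each experiment I list the top features selected by each method; the number of top features listed is currently set to 11.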
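As a quick sanity check of the null construction, the sketch below regenerates a dataset with the same recipe and tests each feature against the outcome with a univariate correlation test. This is only a stand-in for illustration: it is neither the CBDA-SL nor the knockoff procedure, and the seed and the names `X` and `pvals` are assumptions introduced here.

```r
set.seed(2017)  # seed chosen for illustration only
n <- 300
p <- 100
Ytemp <- rbinom(n, 1, 0.5)
X <- cbind(
  matrix(rpois(n * p / 5, 4), nrow = n),
  matrix(runif(n * p / 5, 0.1, 0.99), nrow = n),
  matrix(rnorm(n * p / 5, mean = 5, sd = 1.5), nrow = n),
  matrix(rnorm(n * p / 5, mean = 25, sd = 4.5), nrow = n),
  matrix(rbinom(n * p / 5, 10, prob = 0.45), nrow = n)
)

# One Pearson correlation test per feature against the binary outcome
pvals <- apply(X, 2, function(col) cor.test(col, Ytemp)$p.value)

# Under the null, about 5% of features are expected to fall below 0.05 by chance
mean(pvals < 0.05)
```

Any feature that clears the threshold here does so by chance alone, which is why a ranked "top 11" list still exists for every experiment even though no feature is truly predictive.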
## [1] EXPERIMENT 1
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0        5       15       60       80
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## CBDA Frequency  Density Knockoff  Density
##    3        13 2.559055        6 4.008641
##   29        11 2.165354        1 3.480557
##   13        10 1.968504       58 3.048488
##   36         9 1.771654       15 2.784446
##   11         8 1.574803        3 2.736438
##   14         8 1.574803       80 2.304369
##   24         8 1.574803       87 2.256361
##   25         8 1.574803       84 2.136342
##   32         8 1.574803       46 2.040326
##   38         8 1.574803       53 1.776284
##   71         8 1.574803       18 1.632261
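The relation between the Frequency and Density columns is not stated explicitly. The numbers in the Experiment 1 table are consistent with Density being Frequency expressed as a percentage of a common total. The sketch below checks that reading for the CBDA side of the table; the interpretation is inferred from the printed values, and the variable names are illustrative.

```r
# Frequency and Density copied from the CBDA columns of the Experiment 1 table
frequency <- c(13, 11, 10, 9, 8, 8, 8, 8, 8, 8, 8)
density <- c(2.559055, 2.165354, 1.968504, 1.771654, rep(1.574803, 7))

# If Density = 100 * Frequency / total, every row should imply the same total
implied_total <- 100 * frequency / density
round(implied_total)  # each entry rounds to 508

# Recompute Density from that total and compare with the printed values
total <- round(implied_total[1])
recomputed <- 100 * frequency / total
max(abs(recomputed - density))  # differences stay at the rounding level
```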
##
## [1] EXPERIMENT 2
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000         20        5       15       60       80
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## CBDA Frequency  Density Knockoff  Density
##   72        11 2.131783        1 3.327747
##   17        10 1.937984        6 2.848935
##    6         9 1.744186       15 2.705291
##   48         9 1.744186        3 2.681350
##   90         9 1.744186       58 2.202538
##    1         8 1.550388       80 1.939191
##   27         8 1.550388        5 1.819488
##   44         8 1.550388       84 1.819488
##   47         8 1.550388       46 1.627963
##   54         8 1.550388       53 1.532200
##   65         8 1.550388       18 1.508260
##
## [1] EXPERIMENT 3
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0        5       15      100      100
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## CBDA Frequency  Density Knockoff  Density
##   33        12 2.448980        1 6.439394
##   53        11 2.244898        6 6.287879
##    6         9 1.836735       15 4.696970
##   18         9 1.836735        3 4.583333
##   22         9 1.836735       58 4.545455
##   29         9 1.836735       46 3.484848
##   56         9 1.836735       87 3.143939
##  100         9 1.836735       84 3.068182
##    3         8 1.632653       80 2.613636
##   38         8 1.632653        5 2.045455
##   40         8 1.632653       18 1.818182
##
## [1] EXPERIMENT 4
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000         20        5       15      100      100
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## CBDA Frequency  Density Knockoff  Density
##   12        11 2.272727        6 4.446816
##    6        10 2.066116        1 4.162220
##   26         9 1.859504       15 3.984347
##   47         9 1.859504        3 3.450729
##   71         9 1.859504       58 3.023835
##   84         9 1.859504       80 2.917111
##    5         8 1.652893       84 2.525792
##   70         8 1.652893       87 2.454642
##   89         8 1.652893        5 2.347919
##   94         8 1.652893       18 2.063323
##    1         7 1.446281       46 2.027748
##
## [1] EXPERIMENT 5
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0       15       30       60       80
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## CBDA Frequency  Density Knockoff  Density
##   22        17 1.593252        1 5.202644
##   31        16 1.499531        6 4.943949
##   82        16 1.499531       15 4.570279
##   29        15 1.405811       58 4.081633
##   32        15 1.405811        3 3.592986
##   45        15 1.405811       87 2.788158
##   46        15 1.405811       80 2.701926
##   86        15 1.405811       84 2.414487
##    3        14 1.312090       53 2.040816
##    4        14 1.312090       44 1.897097
##    5        14 1.312090       46 1.897097
##
## [1] EXPERIMENT 6
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000         20       15       30       60       80
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## CBDA Frequency  Density Knockoff  Density
##   17        21 1.805675        1 4.029937
##   38        21 1.805675        6 4.001151
##   13        17 1.461737       15 3.367876
##   51        17 1.461737        3 3.339090
##   58        17 1.461737       80 2.677029
##   65        17 1.461737       84 2.533103
##   74        17 1.461737       58 2.446747
##    1        16 1.375752       87 2.417962
##   36        16 1.375752       46 2.043754
##   50        16 1.375752       53 1.813472
##   87        16 1.375752        5 1.784686
##
## [1] EXPERIMENT 8
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000         20       15       30      100      100
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## CBDA Frequency  Density Knockoff  Density
##    1        19 1.672535        1 8.468560
##   56        19 1.672535       15 6.693712
##    3        18 1.584507        6 6.288032
##   90        17 1.496479        3 5.425963
##   10        16 1.408451       58 4.513185
##   46        16 1.408451       80 3.245436
##   49        16 1.408451       87 3.042596
##   84        16 1.408451       46 2.687627
##    6        15 1.320423       18 2.281947
##   13        15 1.320423       84 2.281947
##   14        15 1.320423       62 1.825558
##
## [1] EXPERIMENT 9
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0       30       50       60       80
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## CBDA Frequency  Density Knockoff  Density
##   67        29 1.429980        6 6.285714
##   49        28 1.380671        1 6.222222
##   52        28 1.380671       15 4.952381
##   72        27 1.331361        3 4.412698
##   36        26 1.282051       58 3.619048
##   73        26 1.282051       46 3.396825
##   97        26 1.282051       87 2.793651
##    1        25 1.232742       84 2.571429
##   15        25 1.232742       80 2.412698
##   17        25 1.232742        5 2.095238
##    5        24 1.183432       53 1.936508
##
## [1] EXPERIMENT 10
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000         20       30       50       60       80
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## CBDA Frequency  Density Knockoff  Density
##   40        28 1.404917        6 4.811982
##    1        27 1.354742        1 4.525175
##   65        27 1.354742       15 4.079031
##   89        27 1.354742        3 4.047164
##   69        26 1.304566       58 3.218611
##   33        25 1.254390       84 2.772467
##   38        25 1.254390       87 2.390057
##   60        25 1.254390       80 2.358190
##   64        25 1.254390       46 1.912046
##   74        25 1.254390        5 1.880178
##    7        24 1.204215       71 1.880178
##
## [1] EXPERIMENT 11
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0       30       50      100      100
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## CBDA Frequency  Density Knockoff   Density
##    1        30 1.558442        1 12.875536
##   14        28 1.454545        6 10.801144
##   27        26 1.350649       15  9.442060
##   24        25 1.298701        3  7.725322
##   32        25 1.298701       58  5.364807
##   95        25 1.298701       84  4.649499
##    3        24 1.246753       87  4.148784
##   36        24 1.246753       46  3.648069
##   62        24 1.246753       80  3.290415
##   69        24 1.246753       18  2.002861
##   74        24 1.246753        5  1.788269
##
## [1] EXPERIMENT 12
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000         20       30       50      100      100
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## CBDA Frequency  Density Knockoff  Density
##   55        29 1.439206        6 9.142142
##   71        28 1.389578       15 8.202880
##   75        28 1.389578        1 8.077646
##   12        27 1.339950        3 7.263619
##   80        27 1.339950       58 4.758923
##   84        27 1.339950       84 4.007514
##    1        26 1.290323       87 3.506575
##   24        26 1.290323       80 3.130870
##   45        26 1.290323       46 2.629931
##    3        25 1.240695        5 2.066374
##   51        25 1.240695       18 1.878522
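The experiment headers listed in this section combine two missing-value percentages, three feature sampling ranges, and two subject sampling ranges. The sketch below collects those settings into one table, assuming the experiment numbering follows a full factorial ordering with misValperc varying fastest, then the SSR, then the FSR. That ordering is inferred from the headers and is not stated in the source; the names `fsr`, `ssr`, `idx`, and `grid` are illustrative. Under this assumed ordering, row 7 corresponds to a combination for which no results are listed here.

```r
# Sampling ranges and missing-value percentages that appear in the headers above
fsr <- data.frame(Kcol_min = c(5, 15, 30), Kcol_max = c(15, 30, 50))
ssr <- data.frame(Nrow_min = c(60, 100), Nrow_max = c(80, 100))
misValperc <- c(0, 20)

# expand.grid varies its first argument fastest
idx <- expand.grid(mis = seq_along(misValperc),
                   ssr = seq_len(nrow(ssr)),
                   fsr = seq_len(nrow(fsr)))

grid <- data.frame(experiment = seq_len(nrow(idx)),
                   M = 9000,
                   misValperc = misValperc[idx$mis],
                   fsr[idx$fsr, ],
                   ssr[idx$ssr, ],
                   row.names = NULL)
print(grid)
```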