Some useful information

This is a summary of a set of 30 experiments I ran on Cranium using a single pipe workflow file that performs 3000 independent jobs, each using the CBDA-SL and the knockoff filter feature mining strategies. Each experiment is uniquely identified by 6 input arguments: the number of jobs [M], the percentage of missing values [misValperc], the minimum [Kcol_min] and maximum [Kcol_max] percentages for the FSR (Feature Sampling Range), and the minimum [Nrow_min] and maximum [Nrow_max] percentages for the SSR (Subject Sampling Range).
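To make the four sampling-range arguments more concrete, here is a minimal R sketch of how a single job could draw its random subsample of features and subjects. It assumes that each job draws a feature percentage uniformly in [Kcol_min, Kcol_max] and a subject percentage uniformly in [Nrow_min, Nrow_max]; the function `sample_job_subset` is hypothetical and is not taken from the CBDA code.

```r
# Hypothetical sketch: draw the feature (FSR) and subject (SSR) subsample for one job.
# Assumption: the percentages are drawn uniformly within the given ranges.
sample_job_subset <- function(n, p, Kcol_min, Kcol_max, Nrow_min, Nrow_max) {
  # Percentage of features and of subjects used by this job
  k_perc <- runif(1, Kcol_min, Kcol_max)
  n_perc <- runif(1, Nrow_min, Nrow_max)
  # Convert percentages to counts, keeping at least one feature and one subject
  k <- max(1, round(p * k_perc / 100))
  m <- max(1, round(n * n_perc / 100))
  # Sample column and row indices without replacement
  list(cols = sort(sample(seq_len(p), k)),
       rows = sort(sample(seq_len(n), m)))
}

set.seed(1)
# Experiment 1 settings on the 300 x 100 test dataset: FSR = 5-15%, SSR = 60-80%
job <- sample_job_subset(n = 300, p = 100, Kcol_min = 5, Kcol_max = 15,
                         Nrow_min = 60, Nrow_max = 80)
length(job$cols)  # between 5 and 15 features
length(job$rows)  # between 180 and 240 subjects
```

With Nrow_min = Nrow_max = 100 (as in Experiments 3, 4, 8, 11 and 12), every job in this sketch would use all subjects and only the feature subset would vary.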

This document reports the final results for each experiment. See https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true for general documentation of the CBDA-SL project, and the GitHub repository https://github.com/SOCR/CBDA for the code [still in progress]. The test dataset is generated as follows:

```r
# Problem parameters: n subjects, p features
n <- 300
p <- 100
# Binary outcome, drawn independently of every feature
Ytemp <- rbinom(n, 1, .5)

# Five blocks of p/5 = 20 features each, from different distributions
x1 <- matrix(rpois(n*p/5, 4), nrow=n, ncol=p/5)                        # Poisson(4); dim(x1); qr(x1)$rank
x2 <- matrix(runif(n*p/5, 0.1, .99), nrow=n, ncol=p/5)                 # Uniform(0.1, 0.99); dim(x2); qr(x2)$rank
x3 <- matrix(rnorm(n*p/5, mean=5, sd=1.5), nrow=n, ncol=p/5)           # Normal(5, 1.5); dim(x3); qr(x3)$rank
x4 <- matrix(rnorm(n*p/5, mean=25, sd=4.5), nrow=n, ncol=p/5)          # Normal(25, 4.5); dim(x4); qr(x4)$rank
x5 <- matrix(rbinom(n*p/5, 10, prob=0.45), nrow=n, ncol=p/5)           # Binomial(10, 0.45); dim(x5); qr(x5)$rank

# Outcome in the first column, followed by the 100 features
X2 <- cbind.data.frame(Ytemp, x1, x2, x3, x4, x5)
# Here I write the data in a text file [not executed]
# write.table(X2,"C:/Users/simeonem/Documents/CBDA-SL/Cranium/NULL_dataset.txt",sep=",")
# Here I load the dataset [not executed]
# NULL_dataset = read.csv("C:/Users/simeonem/Documents/CBDA-SL/Cranium/NULL_dataset.txt",header = TRUE)
```
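The dataset generated above has no missing values, while half of the experiments use misValperc = 20. One plausible way to inject that percentage of missing values is sketched below. It assumes the missing cells are placed completely at random in the feature columns only, leaving the outcome Ytemp intact; the helper `add_missing` and the toy data frame are hypothetical and only illustrate the idea.

```r
# Hypothetical helper: set misValperc % of the feature cells to NA, completely at random,
# leaving the outcome column (by default the first column) untouched.
add_missing <- function(df, misValperc, outcome_col = 1) {
  feat <- setdiff(seq_len(ncol(df)), outcome_col)
  # Logical mask over all feature cells; TRUE marks a cell to be blanked out
  mask <- matrix(FALSE, nrow = nrow(df), ncol = length(feat))
  n_na <- round(length(mask) * misValperc / 100)
  mask[sample(length(mask), n_na)] <- TRUE
  # Apply the mask column by column
  for (j in seq_along(feat)) {
    df[[feat[j]]][mask[, j]] <- NA
  }
  df
}

set.seed(2)
toy <- data.frame(Ytemp = rbinom(50, 1, 0.5), a = rnorm(50), b = rpois(50, 4))
toy_na <- add_missing(toy, misValperc = 20)
mean(is.na(toy_na[, -1]))  # 0.2: 20 of the 100 feature cells are missing
sum(is.na(toy_na$Ytemp))   # 0: the outcome is never blanked out
```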

Because the outcome Ytemp is generated independently of all 100 features, NO feature should be extracted by either the knockoff filter or the CBDA-SL algorithm. That translates into flat histograms with no spikes. No False Discovery Rates are shown. For each experiment, the tables list the top features selected by each method (currently set to 11).
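Even under this null design, some features will look associated with Ytemp purely by chance. The R sketch below regenerates a dataset with the same structure (a new random draw, so the numbers will not match the experiments) and runs a univariate two-sample Wilcoxon rank-sum test of each feature against the outcome. The choice of test is ours and is not part of the CBDA-SL or knockoff pipelines; it only illustrates that roughly 5% of features fall below p = 0.05 when no true signal exists.

```r
# Regenerate a dataset with the same design as the test dataset (new random draw)
set.seed(3)
n <- 300
p <- 100
Ytemp <- rbinom(n, 1, .5)
X <- cbind(matrix(rpois(n * p / 5, 4), nrow = n),
           matrix(runif(n * p / 5, 0.1, .99), nrow = n),
           matrix(rnorm(n * p / 5, mean = 5, sd = 1.5), nrow = n),
           matrix(rnorm(n * p / 5, mean = 25, sd = 4.5), nrow = n),
           matrix(rbinom(n * p / 5, 10, prob = 0.45), nrow = n))

# Wilcoxon rank-sum p-value for each feature, comparing Ytemp = 1 vs Ytemp = 0
# (exact = FALSE uses the normal approximation, which handles ties in the discrete features)
pvals <- apply(X, 2, function(x)
  wilcox.test(x[Ytemp == 1], x[Ytemp == 0], exact = FALSE)$p.value)

# Under the null the p-values are roughly uniform, so typically about 5 of 100 are below 0.05
sum(pvals < 0.05)
table(cut(pvals, breaks = seq(0, 1, by = 0.1)))
```

A rank-based test is used only because it needs no distributional assumptions across the five feature types; any reasonable univariate test would make the same point.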

## [1] EXPERIMENT 1
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          5         15         60         80

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
##  CBDA Frequency Density  Knockoff Density 
##  3    13        2.559055  6       4.008641
##  29   11        2.165354  1       3.480557
##  13   10        1.968504 58       3.048488
##  36    9        1.771654 15       2.784446
##  11    8        1.574803  3       2.736438
##  14    8        1.574803 80       2.304369
##  24    8        1.574803 87       2.256361
##  25    8        1.574803 84       2.136342
##  32    8        1.574803 46       2.040326
##  38    8        1.574803 53       1.776284
##  71    8        1.574803 18       1.632261
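In the CBDA columns, Density appears to be each feature's selection frequency expressed as a percentage of the total number of selections. For Experiment 1, 100 × 13 / 508 ≈ 2.559055 and 100 × 11 / 508 ≈ 2.165354, where the total of 508 is back-calculated from the table rather than reported. The R sketch below shows this computation on a hypothetical vector of selected feature indices; it does not cover the Knockoff Density column, whose definition is not stated here.

```r
# Hypothetical input: feature indices selected across the top-ranked CBDA-SL jobs
selected <- c(3, 3, 3, 3, 29, 29, 13, 36, 11)

# Frequency = number of times each feature was selected
freq <- sort(table(selected), decreasing = TRUE)
# Density = frequency as a percentage of all selections
density <- 100 * as.numeric(freq) / sum(freq)
data.frame(CBDA = as.integer(names(freq)),
           Frequency = as.integer(freq),
           Density = density)

# Reproduce the Experiment 1 densities from the reported frequencies (assumed total of 508)
round(100 * c(13, 11, 10, 9, 8) / 508, 6)
# [1] 2.559055 2.165354 1.968504 1.771654 1.574803
```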
## [1] EXPERIMENT 2
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000         20          5         15         60         80

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
##  CBDA Frequency Density  Knockoff Density 
##  72   11        2.131783  1       3.327747
##  17   10        1.937984  6       2.848935
##  6     9        1.744186 15       2.705291
##  48    9        1.744186  3       2.681350
##  90    9        1.744186 58       2.202538
##  1     8        1.550388 80       1.939191
##  27    8        1.550388  5       1.819488
##  44    8        1.550388 84       1.819488
##  47    8        1.550388 46       1.627963
##  54    8        1.550388 53       1.532200
##  65    8        1.550388 18       1.508260
## [1] EXPERIMENT 3
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          5         15        100        100

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
##  CBDA Frequency Density  Knockoff Density 
##  33   12        2.448980  1       6.439394
##  53   11        2.244898  6       6.287879
##  6     9        1.836735 15       4.696970
##  18    9        1.836735  3       4.583333
##  22    9        1.836735 58       4.545455
##  29    9        1.836735 46       3.484848
##  56    9        1.836735 87       3.143939
##  100   9        1.836735 84       3.068182
##  3     8        1.632653 80       2.613636
##  38    8        1.632653  5       2.045455
##  40    8        1.632653 18       1.818182
## [1] EXPERIMENT 4
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000         20          5         15        100        100

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
##  CBDA Frequency Density  Knockoff Density 
##  12   11        2.272727  6       4.446816
##  6    10        2.066116  1       4.162220
##  26    9        1.859504 15       3.984347
##  47    9        1.859504  3       3.450729
##  71    9        1.859504 58       3.023835
##  84    9        1.859504 80       2.917111
##  5     8        1.652893 84       2.525792
##  70    8        1.652893 87       2.454642
##  89    8        1.652893  5       2.347919
##  94    8        1.652893 18       2.063323
##  1     7        1.446281 46       2.027748
## [1] EXPERIMENT 5
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0         15         30         60         80

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
##  CBDA Frequency Density  Knockoff Density 
##  22   17        1.593252  1       5.202644
##  31   16        1.499531  6       4.943949
##  82   16        1.499531 15       4.570279
##  29   15        1.405811 58       4.081633
##  32   15        1.405811  3       3.592986
##  45   15        1.405811 87       2.788158
##  46   15        1.405811 80       2.701926
##  86   15        1.405811 84       2.414487
##  3    14        1.312090 53       2.040816
##  4    14        1.312090 44       1.897097
##  5    14        1.312090 46       1.897097
## [1] EXPERIMENT 6
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000         20         15         30         60         80

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
##  CBDA Frequency Density  Knockoff Density 
##  17   21        1.805675  1       4.029937
##  38   21        1.805675  6       4.001151
##  13   17        1.461737 15       3.367876
##  51   17        1.461737  3       3.339090
##  58   17        1.461737 80       2.677029
##  65   17        1.461737 84       2.533103
##  74   17        1.461737 58       2.446747
##  1    16        1.375752 87       2.417962
##  36   16        1.375752 46       2.043754
##  50   16        1.375752 53       1.813472
##  87   16        1.375752  5       1.784686
## [1] EXPERIMENT 8
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000         20         15         30        100        100

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
##  CBDA Frequency Density  Knockoff Density 
##  1    19        1.672535  1       8.468560
##  56   19        1.672535 15       6.693712
##  3    18        1.584507  6       6.288032
##  90   17        1.496479  3       5.425963
##  10   16        1.408451 58       4.513185
##  46   16        1.408451 80       3.245436
##  49   16        1.408451 87       3.042596
##  84   16        1.408451 46       2.687627
##  6    15        1.320423 18       2.281947
##  13   15        1.320423 84       2.281947
##  14   15        1.320423 62       1.825558
## [1] EXPERIMENT 9
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0         30         50         60         80

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
##  CBDA Frequency Density  Knockoff Density 
##  67   29        1.429980  6       6.285714
##  49   28        1.380671  1       6.222222
##  52   28        1.380671 15       4.952381
##  72   27        1.331361  3       4.412698
##  36   26        1.282051 58       3.619048
##  73   26        1.282051 46       3.396825
##  97   26        1.282051 87       2.793651
##  1    25        1.232742 84       2.571429
##  15   25        1.232742 80       2.412698
##  17   25        1.232742  5       2.095238
##  5    24        1.183432 53       1.936508
## [1] EXPERIMENT 10
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000         20         30         50         60         80

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
##  CBDA Frequency Density  Knockoff Density 
##  40   28        1.404917  6       4.811982
##  1    27        1.354742  1       4.525175
##  65   27        1.354742 15       4.079031
##  89   27        1.354742  3       4.047164
##  69   26        1.304566 58       3.218611
##  33   25        1.254390 84       2.772467
##  38   25        1.254390 87       2.390057
##  60   25        1.254390 80       2.358190
##  64   25        1.254390 46       1.912046
##  74   25        1.254390  5       1.880178
##  7    24        1.204215 71       1.880178
## [1] EXPERIMENT 11
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0         30         50        100        100

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
##  CBDA Frequency Density  Knockoff Density  
##  1    30        1.558442  1       12.875536
##  14   28        1.454545  6       10.801144
##  27   26        1.350649 15        9.442060
##  24   25        1.298701  3        7.725322
##  32   25        1.298701 58        5.364807
##  95   25        1.298701 84        4.649499
##  3    24        1.246753 87        4.148784
##  36   24        1.246753 46        3.648069
##  62   24        1.246753 80        3.290415
##  69   24        1.246753 18        2.002861
##  74   24        1.246753  5        1.788269
## [1] EXPERIMENT 12
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000         20         30         50        100        100

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
##  CBDA Frequency Density  Knockoff Density 
##  55   29        1.439206  6       9.142142
##  71   28        1.389578 15       8.202880
##  75   28        1.389578  1       8.077646
##  12   27        1.339950  3       7.263619
##  80   27        1.339950 58       4.758923
##  84   27        1.339950 84       4.007514
##  1    26        1.290323 87       3.506575
##  24   26        1.290323 80       3.130870
##  45   26        1.290323 46       2.629931
##  3    25        1.240695  5       2.066374
##  51   25        1.240695 18       1.878522