This is a summary of a set of 30 experiments I ran on Cranium using a single pipe workflow file that performs the following tasks:

* Loads a text file of arguments. Each line is one experiment, specified by M [number of jobs], misValperc [missing-value %], Kcol_min and Kcol_max [min and max of the Feature Sampling Range (FSR), in %], and Nrow_min and Nrow_max [min and max of the Subject Sampling Range (SSR), in %].
* Loads a dataset for machine learning.
* Sets the number of jobs/iterations of the CBDA-SL algorithm [j_global].
* Sets the experiment to run [i_exp, in a sequence constrained by the maximum number of jobs that can be submitted on Cranium through the LONI pipeline as a guest].
* Sets the working directory where every workspace is saved.
* Sets the R script(s) to be run.
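The argument-file step and the FSR/SSR sampling ranges can be illustrated with a short R sketch. This is a minimal sketch under stated assumptions, not the CBDA-SL code from the repository. Several details are hypothetical:

* the file layout (one comma-separated line per experiment, no header, columns in the order M, misValperc, Kcol_min, Kcol_max, Nrow_min, Nrow_max);
* the helper `draw_job_sample()`;
* the choice to blank out `misValperc` % of the sampled entries uniformly at random.

```r
# Hypothetical arguments file: one experiment per line, no header, columns
# M, misValperc, Kcol_min, Kcol_max, Nrow_min, Nrow_max (assumed layout).
args_text <- c("3000,0,5,15,60,80",
               "3000,40,5,15,60,80")
args_file <- tempfile(fileext = ".txt")
writeLines(args_text, args_file)

arguments <- read.table(args_file, sep = ",", header = FALSE,
                        col.names = c("M", "misValperc", "Kcol_min",
                                      "Kcol_max", "Nrow_min", "Nrow_max"))

# Select the experiment to run (i_exp) and read its specs.
i_exp <- 2
spec <- arguments[i_exp, ]

# Hypothetical helper: draw one job's subsample.
#  - FSR: keep a random fraction of features in [Kcol_min, Kcol_max] %
#  - SSR: keep a random fraction of subjects in [Nrow_min, Nrow_max] %
#  - then set misValperc % of the sampled entries to NA (assumed scheme)
draw_job_sample <- function(X, spec) {
  n <- nrow(X); p <- ncol(X)
  k_pct <- runif(1, spec$Kcol_min, spec$Kcol_max)  # feature %
  n_pct <- runif(1, spec$Nrow_min, spec$Nrow_max)  # subject %
  k <- max(1, round(p * k_pct / 100))
  m <- max(1, round(n * n_pct / 100))
  cols <- sort(sample.int(p, k))  # features, without replacement
  rows <- sort(sample.int(n, m))  # subjects, without replacement
  Xs <- X[rows, cols, drop = FALSE]
  n_miss <- round(length(Xs) * spec$misValperc / 100)
  if (n_miss > 0) Xs[sample.int(length(Xs), n_miss)] <- NA
  list(rows = rows, cols = cols, X = Xs)
}

set.seed(1)  # fixed seed, only to make this sketch reproducible
X <- matrix(rnorm(300 * 100), nrow = 300, ncol = 100)
job <- draw_job_sample(X, spec)
dim(job$X)
```

Each of the M jobs in an experiment would draw one such subsample. When Nrow_min equals Nrow_max (100 and 100 in experiments 6 to 10), `runif()` returns that bound, so every subject is kept.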
This document has the final results, by experiment. See https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true for general documentation of the CBDA-SL project, and the GitHub repository https://github.com/SOCR/CBDA for the code [still in progress]. The test dataset is defined as follows:
```r
# Problem parameters
n = 300 # number of observations
p = 100 # number of variables
nonzero=c(1,seq(10,p,10)) # variables with nonzero coefficients (fixed locations)
k = length(nonzero) # number of variables with nonzero coefficients
amplitude = 3.5 # signal amplitude (for noise level = 1)
X1 = matrix(rnorm(n*p), nrow=n, ncol=p)
beta = amplitude * (1:p %in% nonzero)
y.sample <- function() X1 %*% beta + rnorm(n)
Ytemp = y.sample()
# Here I write the data in a text file [not executed]
X2 <- cbind(Ytemp,X1)
#write.table(X2,"C:/Users/simeonem/Documents/CBDA-SL/Cranium/Gaussian_dataset.txt",sep=",")
# Here I load the dataset [not executed]
#Gaussian_dataset = read.csv("C:/Users/simeonem/Documents/CBDA-SL/Cranium/Gaussian_dataset.txt",header = TRUE)
# Here the X and Y matrix/vector are set for the CBDA-SL algorithm to proceed [not executed]
#Ytemp <- Gaussian_dataset[,1]
#Xtemp <- Gaussian_dataset[,-1]
```
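The commented-out lines write and reload the dataset through a hard-coded local path. A portable version of that round trip might look like the sketch below. It assumes a temporary file is an acceptable stand-in for the real location, and the names `data_file` and `Ytemp_read` are hypothetical. Passing `row.names = FALSE` keeps the file a plain rectangular table whose first column is the outcome.

```r
set.seed(123)  # assumption: fixed seed, only for reproducibility of the sketch
n <- 300; p <- 100
nonzero <- c(1, seq(10, p, 10))
amplitude <- 3.5
X1 <- matrix(rnorm(n * p), nrow = n, ncol = p)
beta <- amplitude * (1:p %in% nonzero)
Ytemp <- as.vector(X1 %*% beta + rnorm(n))
X2 <- cbind(Ytemp, X1)

# Write to a temporary file (hypothetical stand-in for the Cranium path),
# without row names so that column 1 of the file is the outcome.
data_file <- tempfile(fileext = ".txt")
write.table(X2, data_file, sep = ",", row.names = FALSE)

# Reload and split into outcome and predictors, as in the commented lines.
Gaussian_dataset <- read.csv(data_file, header = TRUE)
Ytemp_read <- Gaussian_dataset[, 1]
Xtemp <- Gaussian_dataset[, -1]
dim(Xtemp)
```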
Thus, the features that both the knockoff filter and the CBDA-SL algorithm should extract are 1, 10, 20, 30, 40, 50, 60, 70, 80, 90 and 100 (the 11 variables with nonzero coefficients). I don't list the false discovery rates.
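The sketch below does three things:

* recovers the true support from `beta`;
* tabulates a hypothetical vector of selected feature indices into Frequency and Density columns;
* scores a ranked feature list against the true support.

In the result tables, the CBDA Density values appear consistent with each Frequency expressed as a percentage of all selections. In experiment 1, for example, 100 × 12/495 ≈ 2.424242 and 100 × 9/495 ≈ 1.818182. That reading is an inference from the numbers, not something the output states, and the sketch does not attempt the Knockoff Density column. The `selected` vector and the `recovery_pct()` helper are hypothetical. The helper is not necessarily how the percentages printed after each table were computed.

```r
p <- 100
nonzero <- c(1, seq(10, p, 10))
beta <- 3.5 * (1:p %in% nonzero)
true_features <- which(beta != 0)  # 1, 10, 20, ..., 100 (11 features)

# Hypothetical input: feature indices selected across top-ranked jobs,
# pooled into one vector (the real pipeline's bookkeeping may differ).
selected <- c(59, 92, 59, 31, 92, 1, 10, 59)

# Frequency = number of times each feature was selected;
# Density   = that count as a percentage of all selections (assumed definition).
freq <- sort(table(selected), decreasing = TRUE)
summary_df <- data.frame(feature   = as.integer(names(freq)),
                         Frequency = as.integer(freq),
                         Density   = 100 * as.numeric(freq) / length(selected))

# Hypothetical helper: % of the true features found among the top-k ranked ones.
recovery_pct <- function(ranked, truth, k = length(truth)) {
  100 * length(intersect(head(ranked, k), truth)) / length(truth)
}

summary_df
recovery_pct(summary_df$feature, true_features)
```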
```
## [1] 1
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 1 3000 0 5 15 60 80
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 1 3000 0 5 15 60 80
## [1] "CBDA-SL results"
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 1 3000 0 5 15 60 80
## CBDA Frequency Density Knockoff Density
## 59 12 2.424242 30 5.028448
## 92 11 2.222222 40 4.936183
## 31 9 1.818182 20 4.813163
## 42 9 1.818182 10 4.582500
## 52 9 1.818182 80 4.567123
## 54 9 1.818182 100 4.459480
## 63 9 1.818182 60 4.336460
## 77 9 1.818182 70 4.321083
## 87 9 1.818182 1 4.167307
## 27 8 1.616162 50 4.151930
## 29 8 1.616162 90 3.829002
## [1] 100
## [1] 0
```
```
## [1] 5
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 5 3000 40 5 15 60 80
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 5 3000 40 5 15 60 80
## [1] "CBDA-SL results"
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 5 3000 40 5 15 60 80
## CBDA Frequency Density Knockoff Density
## 13 15 2.862595 70 5.053838
## 26 10 1.908397 30 4.810698
## 87 10 1.908397 40 4.637027
## 93 10 1.908397 50 4.150747
## 16 9 1.717557 10 4.081278
## 53 9 1.717557 80 4.046544
## 55 8 1.526718 90 3.942341
## 60 8 1.526718 20 3.751303
## 79 8 1.526718 60 3.647100
## 81 8 1.526718 1 3.299757
## 95 8 1.526718 100 2.830844
## [1] 90.90909
## [1] 0
```
```
## [1] 6
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 6 3000 0 5 15 100 100
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 6 3000 0 5 15 100 100
## [1] "CBDA-SL results"
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 6 3000 0 5 15 100 100
## CBDA Frequency Density Knockoff Density
## 43 12 2.564103 70 4.957116
## 26 10 2.136752 80 4.739061
## 92 10 2.136752 10 4.535543
## 18 9 1.923077 40 4.506469
## 31 9 1.923077 30 4.419247
## 48 9 1.923077 50 4.419247
## 80 8 1.709402 100 4.390173
## 84 8 1.709402 60 4.244803
## 5 7 1.495726 20 4.186655
## 14 7 1.495726 90 4.186655
## 22 7 1.495726 1 3.765082
## [1] 90.90909
## [1] 0
```
```
## [1] 7
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 7 3000 10 5 15 100 100
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 7 3000 10 5 15 100 100
## [1] "CBDA-SL results"
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 7 3000 10 5 15 100 100
## CBDA Frequency Density Knockoff Density
## 4 10 1.980198 30 4.933273
## 23 10 1.980198 70 4.798321
## 5 9 1.782178 50 4.708352
## 13 9 1.782178 80 4.423452
## 89 9 1.782178 100 4.288499
## 54 8 1.584158 40 4.273504
## 58 8 1.584158 90 4.213525
## 62 8 1.584158 60 4.183536
## 65 8 1.584158 1 4.168541
## 68 8 1.584158 20 4.033588
## 79 8 1.584158 10 3.958614
## [1] 100
## [1] 0
```
```
## [1] 8
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 8 3000 20 5 15 100 100
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 8 3000 20 5 15 100 100
## [1] "CBDA-SL results"
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 8 3000 20 5 15 100 100
## CBDA Frequency Density Knockoff Density
## 94 10 2.109705 30 4.718257
## 20 8 1.687764 50 4.688204
## 38 8 1.687764 40 4.583020
## 52 8 1.687764 70 4.583020
## 61 8 1.687764 20 4.537941
## 68 8 1.687764 10 4.492863
## 74 8 1.687764 80 4.477836
## 78 8 1.687764 60 4.297521
## 82 8 1.687764 90 4.282494
## 1 7 1.476793 100 4.042074
## 4 7 1.476793 1 3.906837
## [1] 81.81818
## [1] 0
```
```
## [1] 9
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9 3000 30 5 15 100 100
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9 3000 30 5 15 100 100
## [1] "CBDA-SL results"
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9 3000 30 5 15 100 100
## CBDA Frequency Density Knockoff Density
## 89 11 2.165354 30 4.959561
## 85 10 1.968504 70 4.593316
## 2 9 1.771654 60 4.547535
## 70 9 1.771654 90 4.547535
## 98 9 1.771654 10 4.410194
## 10 8 1.574803 50 4.303373
## 14 8 1.574803 80 4.288112
## 37 8 1.574803 40 4.272852
## 39 8 1.574803 1 4.028689
## 59 8 1.574803 20 3.982909
## 60 8 1.574803 100 3.830307
## [1] 72.72727
## [1] 0
```
```
## [1] 10
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 10 3000 40 5 15 100 100
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 10 3000 40 5 15 100 100
## [1] "CBDA-SL results"
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 10 3000 40 5 15 100 100
## CBDA Frequency Density Knockoff Density
## 3 11 2.315789 40 4.837959
## 90 10 2.105263 30 4.791441
## 40 9 1.894737 90 4.388277
## 5 8 1.684211 70 4.341758
## 42 8 1.684211 60 4.295240
## 71 8 1.684211 80 3.923089
## 81 8 1.684211 20 3.845557
## 31 7 1.473684 10 3.799039
## 33 7 1.473684 50 3.752520
## 38 7 1.473684 1 3.674988
## 44 7 1.473684 100 3.659482
## [1] 81.81818
## [1] 0
```
```
## [1] 11
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 11 3000 0 15 30 60 80
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 11 3000 0 15 30 60 80
## [1] "CBDA-SL results"
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 11 3000 0 15 30 60 80
## CBDA Frequency Density Knockoff Density
## 4 18 1.620162 30 6.570039
## 5 17 1.530153 50 6.398398
## 56 17 1.530153 40 6.303042
## 59 17 1.530153 60 6.255364
## 69 17 1.530153 90 6.179079
## 74 17 1.530153 70 6.131401
## 2 16 1.440144 20 5.978831
## 48 16 1.440144 10 5.835797
## 77 16 1.440144 80 5.749976
## 10 15 1.350135 100 5.578335
## 11 15 1.350135 1 5.559264
## [1] 90.90909
## [1] 0
```
```
## [1] 12
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 12 3000 10 15 30 60 80
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 12 3000 10 15 30 60 80
## [1] "CBDA-SL results"
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 12 3000 10 15 30 60 80
## CBDA Frequency Density Knockoff Density
## 3 18 1.588703 30 6.576629
## 27 17 1.500441 40 6.410256
## 59 17 1.500441 70 6.302603
## 82 17 1.500441 90 6.302603
## 26 16 1.412180 50 6.146017
## 34 16 1.412180 80 5.960070
## 64 16 1.412180 60 5.871991
## 65 16 1.412180 20 5.862204
## 7 15 1.323919 10 5.695831
## 36 15 1.323919 1 5.255432
## 41 15 1.323919 100 4.922685
## [1] 100
## [1] 0
```
```
## [1] 13
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 13 3000 20 15 30 60 80
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 13 3000 20 15 30 60 80
## [1] "CBDA-SL results"
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 13 3000 20 15 30 60 80
## CBDA Frequency Density Knockoff Density
## 47 19 1.636520 70 6.895165
## 35 18 1.550388 30 6.473012
## 43 18 1.550388 40 6.473012
## 50 18 1.550388 50 5.859885
## 59 18 1.550388 60 5.849834
## 1 17 1.464255 80 5.789527
## 58 17 1.464255 10 5.538245
## 71 17 1.464255 90 5.538245
## 4 16 1.378122 20 5.186451
## 33 16 1.378122 1 5.025631
## 68 16 1.378122 100 4.623580
## [1] 81.81818
## [1] 0
```
```
## [1] 14
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 14 3000 30 15 30 60 80
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 14 3000 30 15 30 60 80
## [1] "CBDA-SL results"
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 14 3000 30 15 30 60 80
## CBDA Frequency Density Knockoff Density
## 39 21 1.865009 70 7.062720
## 76 18 1.598579 30 6.752858
## 14 17 1.509769 40 6.186558
## 95 17 1.509769 90 6.111764
## 27 16 1.420959 80 5.577519
## 28 16 1.420959 50 5.566834
## 86 16 1.420959 10 5.278342
## 94 16 1.420959 60 4.925740
## 96 16 1.420959 20 4.915055
## 13 15 1.332149 1 4.754781
## 34 15 1.332149 100 4.306016
## [1] 100
## [1] 0
```
```
## [1] 15
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 15 3000 40 15 30 60 80
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 15 3000 40 15 30 60 80
## [1] "CBDA-SL results"
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 15 3000 40 15 30 60 80
## CBDA Frequency Density Knockoff Density
## 16 21 1.905626 70 7.469022
## 17 20 1.814882 30 7.033043
## 48 17 1.542650 40 6.620009
## 75 17 1.542650 90 5.828362
## 3 16 1.451906 50 5.610372
## 19 16 1.451906 80 5.564479
## 31 15 1.361162 20 5.059660
## 38 15 1.361162 60 4.703993
## 39 15 1.361162 10 4.290959
## 41 15 1.361162 1 3.946765
## 95 15 1.361162 100 3.843506
## [1] 100
## [1] 0
```
```
## [1] 30
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 30 3000 40 30 50 100 100
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 30 3000 40 30 50 100 100
## [1] "CBDA-SL results"
## M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 30 3000 40 30 50 100 100
## CBDA Frequency Density Knockoff Density
## 76 28 1.430761 70 8.339091
## 23 27 1.379663 30 8.021280
## 2 26 1.328564 40 7.793285
## 12 26 1.328564 50 7.171480
## 37 26 1.328564 80 6.839851
## 9 24 1.226367 90 6.819124
## 52 24 1.226367 60 6.591129
## 62 24 1.226367 10 6.100594
## 65 24 1.226367 20 5.782783
## 79 24 1.226367 1 5.534061
## 82 24 1.226367 100 4.843167
## [1] 100
## [1] 0
```