Some useful information

This is a summary of a set of experiments run with a LONI pipeline workflow file that performs 3000 independent jobs, each one applying both the CBDA-SL and the knockoff filter feature mining strategies. Each experiment has a total of 9000 jobs and is uniquely identified by 6 input arguments: the number of jobs [M], the percentage of missing values [misValperc], the minimum [Kcol_min] and maximum [Kcol_max] percentages for the Feature Sampling Range (FSR), and the minimum [Nrow_min] and maximum [Nrow_max] percentages for the Subject Sampling Range (SSR).
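
As a rough illustration of how the six arguments could drive a single job, the Python sketch below stores them in a small record and draws one feature subset and one subject subset. The field names and the Experiment 1 values come from this report. The uniform draw of a percentage within each range, the helper `sample_job_subset`, and the dataset size (1500 features, 300 subjects) are assumptions made for the example, not a description of the actual workflow.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSpec:
    """The six arguments that identify one experiment (names from the report)."""
    M: int            # number of jobs
    misValperc: int   # % of missing values
    Kcol_min: int     # FSR lower bound, % of features
    Kcol_max: int     # FSR upper bound, % of features
    Nrow_min: int     # SSR lower bound, % of subjects
    Nrow_max: int     # SSR upper bound, % of subjects


def sample_job_subset(spec, n_features, n_subjects, rng):
    """Draw one job's feature and subject subsets.

    Assumption: the percentage is drawn uniformly within each range and the
    subset is then sampled without replacement.
    """
    feat_pct = rng.uniform(spec.Kcol_min, spec.Kcol_max)
    subj_pct = rng.uniform(spec.Nrow_min, spec.Nrow_max)
    k = max(1, round(n_features * feat_pct / 100))
    n = max(1, round(n_subjects * subj_pct / 100))
    features = sorted(rng.sample(range(1, n_features + 1), k))
    subjects = sorted(rng.sample(range(1, n_subjects + 1), n))
    return features, subjects


# Values printed for Experiment 1 in this report.
exp1 = ExperimentSpec(M=9000, misValperc=0, Kcol_min=1, Kcol_max=5,
                      Nrow_min=30, Nrow_max=60)

# Hypothetical dataset size, chosen only for the example.
rng = random.Random(0)
features, subjects = sample_job_subset(exp1, n_features=1500,
                                       n_subjects=300, rng=rng)
print(len(features), len(subjects))
```

With these hypothetical sizes, each job would see between 15 and 75 features and between 90 and 180 subjects.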

This document reports the final results, by experiment. See https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true for general documentation of the CBDA-SL project and the GitHub repository https://github.com/SOCR/CBDA for some of the code.

Features selected by both the knockoff filter and the CBDA-SL algorithms appear as spikes in the histograms below. For each experiment I list the top selected features; the number listed is set to 15 here.
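
To make "selected by both" concrete, the following Python sketch intersects two top-15 lists. The feature indices are copied from the Accuracy and Knockoff columns of the Experiment 1 table in this report; the helper name `selected_by_both` is hypothetical.

```python
# Top-15 feature indices copied from the Experiment 1 table in this report.
cbda_accuracy_top = [1500, 100, 1200, 1000, 326, 863, 599, 735,
                     818, 683, 304, 1279, 1400, 1475, 909]
knockoff_top = [100, 1000, 1500, 1400, 1200, 800, 1156, 1047,
                694, 138, 589, 863, 400, 1015, 200]


def selected_by_both(first, second):
    """Return the features of `first` that also occur in `second`,
    keeping the ranking order of `first`."""
    second_set = set(second)
    return [feature for feature in first if feature in second_set]


both = selected_by_both(cbda_accuracy_top, knockoff_top)
print(both)  # [1500, 100, 1200, 1000, 863, 1400]
```

Six of the fifteen top-ranked CBDA-SL (Accuracy) features in Experiment 1 also appear in the knockoff filter's top 15.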

## [1] EXPERIMENT 1
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          1          5         30         60 
##      M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 1 9000          0        1        5       30       60
##  [1]    1  100  200  400  600  800 1000 1200 1400 1500

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "1"         
##  Accuracy Count Density   MSE  Count Density   Knockoff Count Density  
##  1500     88    0.1812116 100  99    0.1981149  100     164   8.4318766
##  100      83    0.1709155 1500 91    0.1821056 1000     130   6.6838046
##  1200     78    0.1606194 1200 76    0.1520882 1500     109   5.6041131
##  1000     74    0.1523825 1000 73    0.1460847 1400      84   4.3187661
##  326      61    0.1256126 863  72    0.1440836 1200      47   2.4164524
##  863      61    0.1256126 1400 58    0.1160673  800      38   1.9537275
##  599      59    0.1214942 1047 54    0.1080627 1156      31   1.5938303
##  735      58    0.1194349 326  53    0.1060615 1047      30   1.5424165
##  818      55    0.1132573 599  52    0.1040604  694      26   1.3367609
##  683      53    0.1091388 683  52    0.1040604  138      21   1.0796915
##  304      52    0.1070796 429  51    0.1020592  589      21   1.0796915
##  1279     52    0.1070796 735  51    0.1020592  863      21   1.0796915
##  1400     52    0.1070796 909  51    0.1020592  400      18   0.9254499
##  1475     52    0.1070796 229  50    0.1000580 1015      18   0.9254499
##  909      51    0.1050204 818  50    0.1000580  200      15   0.7712082
## [1] "Nonzero Features"
##  [1]    1  100  200  400  600  800 1000 1200 1400 1500
## 
## [1] EXPERIMENT 2
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          5         15         30         60 
##      M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 2 9000          0        5       15       30       60
##  [1]    1  100  200  400  600  800 1000 1200 1400 1500

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "2"         
##  Accuracy Count Density    MSE  Count Density    Knockoff Count Density  
##  100      207   0.13468583 100  203   0.13131000  100     69    6.9207623
##  1400     185   0.12037139 1000 186   0.12031359 1400     59    5.9177533
##  1000     183   0.11907008 1200 184   0.11901990 1500     56    5.6168506
##  1200     180   0.11711811 1500 173   0.11190458 1000     51    5.1153460
##  1500     173   0.11256352 863  155   0.10026133 1200     36    3.6108325
##  179      143   0.09304383 599  141   0.09120546  800     27    2.7081244
##  599      139   0.09044121 852  141   0.09120546  200     16    1.6048144
##  1134     137   0.08913990 1400 140   0.09055862 1047     16    1.6048144
##  1337     135   0.08783859 186  139   0.08991177 1404     16    1.6048144
##  996      133   0.08653727 304  138   0.08926492 1156     15    1.5045135
##  1457     133   0.08653727 800  138   0.08926492  400     13    1.3039117
##  216      131   0.08523596 400  137   0.08861808  694     13    1.3039117
##  400      131   0.08523596 1266 137   0.08861808 1015     13    1.3039117
##  852      131   0.08523596 910  135   0.08732438 1270     13    1.3039117
##  326      129   0.08393465 1297 134   0.08667753  138      9    0.9027081
## [1] "Nonzero Features"
##  [1]    1  100  200  400  600  800 1000 1200 1400 1500
## 
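
The Density columns are not defined in the output. The numbers are consistent with reading Density as 100 × Count divided by the total number of selections for that method, and the Python sketch below checks this reading on a few rows of the Experiment 2 table. The interpretation is an inference from the printed values, and the helper name `implied_total` is hypothetical.

```python
# (Count, Density) pairs copied from the first three rows of the
# Experiment 2 table, for the Accuracy and Knockoff column groups.
rows = {
    "Accuracy": [(207, 0.13468583), (185, 0.12037139), (183, 0.11907008)],
    "Knockoff": [(69, 6.9207623), (59, 5.9177533), (56, 5.6168506)],
}


def implied_total(count, density_pct):
    """Total number of selections implied by Density = 100 * Count / total."""
    return 100.0 * count / density_pct


# If the reading is right, every row of a column group implies the same total.
totals = {
    name: [round(implied_total(count, density)) for count, density in pairs]
    for name, pairs in rows.items()
}
for name, values in totals.items():
    print(name, values)
```

Under this reading, each column group implies a single total across its rows, which is why Density values are comparable within a method but not between CBDA-SL and the knockoff filter.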
## [1] EXPERIMENT 4
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          1          5         60         80 
##      M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 4 9000          0        1        5       60       80
##  [1]    1  100  200  400  600  800 1000 1200 1400 1500

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "4"         
##  Accuracy Count Density   MSE  Count Density   Knockoff Count Density  
##  863      80    0.1735509 863  123   0.2487109  100     242   10.886190
##  1000     64    0.1388407 1000  91   0.1840057 1000     228   10.256410
##  1475     57    0.1236550 1500  71   0.1435649 1500     209    9.401709
##  834      55    0.1193162 100   65   0.1314326 1400     190    8.547009
##  513      54    0.1171468 819   58   0.1172783 1200     116    5.218174
##  1200     54    0.1171468 1275  57   0.1152563  800     115    5.173189
##  304      51    0.1106387 1047  56   0.1132343 1156      57    2.564103
##  819      50    0.1084693 371   54   0.1091902 1047      42    1.889339
##  1036     50    0.1084693 800   54   0.1091902 1413      35    1.574449
##  1047     50    0.1084693 834   54   0.1091902 1015      34    1.529465
##  1128     50    0.1084693 1413  54   0.1091902  138      32    1.439496
##  276      48    0.1041305 854   52   0.1051461  589      31    1.394512
##  708      48    0.1041305 532   51   0.1031241 1266      31    1.394512
##  1367     48    0.1041305 1356  51   0.1031241  400      29    1.304543
##  63       47    0.1019611 142   50   0.1011020  694      29    1.304543
## [1] "Nonzero Features"
##  [1]    1  100  200  400  600  800 1000 1200 1400 1500
## 
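
One way to read a results table against the "Nonzero Features" line is to count how many of those ten features each method places in its top 15. The Python sketch below does this for Experiment 4, with all indices copied from the output above. It assumes that the "Nonzero Features" line lists the features expected to carry signal, which the output itself does not state.

```python
# Copied from the "Nonzero Features" line of the report.
nonzero = {1, 100, 200, 400, 600, 800, 1000, 1200, 1400, 1500}

# Top-15 feature indices copied from the Experiment 4 table.
top15 = {
    "Accuracy": [863, 1000, 1475, 834, 513, 1200, 304, 819,
                 1036, 1047, 1128, 276, 708, 1367, 63],
    "MSE": [863, 1000, 1500, 100, 819, 1275, 1047, 371,
            800, 834, 1413, 854, 532, 1356, 142],
    "Knockoff": [100, 1000, 1500, 1400, 1200, 800, 1156, 1047,
                 1413, 1015, 138, 589, 1266, 400, 694],
}

# For each ranking, keep the nonzero features that made the top 15.
recovered = {
    name: sorted(nonzero.intersection(features))
    for name, features in top15.items()
}
for name, hits in recovered.items():
    print(f"{name}: {len(hits)} of {len(nonzero)} -> {hits}")
```

For Experiment 4 this gives 2 of 10 for the Accuracy ranking, 4 of 10 for the MSE ranking, and 7 of 10 for the knockoff filter.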
## [1] EXPERIMENT 5
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0          5         15         60         80 
##      M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 5 9000          0        5       15       60       80
##  [1]    1  100  200  400  600  800 1000 1200 1400 1500

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "5"         
##  Accuracy Count Density    MSE  Count Density    Knockoff Count Density  
##  863      190   0.12467519 863  279   0.18071587  100     272   10.755239
##  1200     168   0.11023911 1000 197   0.12760224 1000     242    9.569000
##  400      166   0.10892674 800  167   0.10817043 1500     235    9.292210
##  1000     165   0.10827056 1500 166   0.10752270 1400     225    8.896797
##  599      160   0.10498963 599  147   0.09521589  800     166    6.563859
##  1500     159   0.10433345 1014 145   0.09392043 1200     158    6.247529
##  100      145   0.09514685 400  140   0.09068180 1413      68    2.688810
##  304      138   0.09055356 1413 140   0.09068180 1047      58    2.293397
##  1063     138   0.09055356 100  137   0.08873862 1156      54    2.135231
##  112      137   0.08989737 819  135   0.08744316 1015      53    2.095690
##  326      136   0.08924119 1438 135   0.08744316  138      49    1.937525
##  917      134   0.08792882 304  134   0.08679543  589      30    1.186240
##  179      133   0.08727263 122  133   0.08614771  599      28    1.107157
##  519      132   0.08661645 1245 133   0.08614771  200      27    1.067616
##  1157     132   0.08661645 1284 133   0.08614771  400      27    1.067616
## [1] "Nonzero Features"
##  [1]    1  100  200  400  600  800 1000 1200 1400 1500
## 
## [1] EXPERIMENT 7
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000         20          1          5         30         60 
##      M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 7 9000         20        1        5       30       60
##  [1]    1  100  200  400  600  800 1000 1200 1400 1500

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "7"         
##  Accuracy Count Density   MSE  Count Density   Knockoff Count Density  
##  1400     79    0.1600454 1500 93    0.1863727 1000     129   6.5349544
##  1500     77    0.1559936 1000 83    0.1663327  100     119   6.0283688
##  1000     70    0.1418124 863  77    0.1543086 1500     113   5.7244174
##  100      68    0.1377606 100  75    0.1503006 1400     105   5.3191489
##  863      62    0.1256052 1400 66    0.1322645  800      54   2.7355623
##  1200     62    0.1256052 1200 63    0.1262525 1200      54   2.7355623
##  941      59    0.1195276 599  55    0.1102204 1156      37   1.8743668
##  326      58    0.1175017 1063 55    0.1102204  138      26   1.3171226
##  599      57    0.1154758 400  52    0.1042084 1404      25   1.2664640
##  1232     55    0.1114240 816  52    0.1042084 1015      21   1.0638298
##  1063     54    0.1093981 1139 52    0.1042084 1047      21   1.0638298
##  1128     54    0.1093981 1232 52    0.1042084 1266      18   0.9118541
##  1367     54    0.1093981 1365 52    0.1042084  589      17   0.8611955
##  513      53    0.1073722 1367 52    0.1042084 1413      17   0.8611955
##  1390     53    0.1073722 507  51    0.1022044  400      14   0.7092199
## [1] "Nonzero Features"
##  [1]    1  100  200  400  600  800 1000 1200 1400 1500
## 
## [1] EXPERIMENT 8
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000         20          5         15         30         60 
##      M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 8 9000         20        5       15       30       60
##  [1]    1  100  200  400  600  800 1000 1200 1400 1500