Some useful information

This document summarizes a set of 10 replications of a single experiment designed to test the robustness of CBDA-SL. We use a LONI pipeline workflow file that performs 3000 independent jobs, each applying both the CBDA-SL and the knockoff filter feature mining strategies. Each replication comprises 9000 jobs in total and is specified by the same 6 input arguments: number of jobs [M], % of missing values [misValperc], min [Kcol_min] and max [Kcol_max] % for the Feature Sampling Range (FSR), and min [Nrow_min] and max [Nrow_max] % for the Subject Sampling Range (SSR).
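
To make the role of the four sampling-range arguments concrete, here is a minimal Python sketch of how a single job could draw its subjects and features. It is an illustration only, not the CBDA-SL code: the function name `draw_job_subsample`, the uniform draw of each percentage, and the dataset size of 300 subjects by 1000 features are all assumptions made for this example. The range values (15 to 30 % and 60 to 80 %) are the ones reported for every replication in this document.

```python
import random

def draw_job_subsample(n_rows, n_cols, kcol_min, kcol_max, nrow_min, nrow_max, seed):
    """Pick the subjects (rows) and features (columns) for one job.

    Assumption: each sampling percentage is drawn uniformly from its range,
    then that share of rows/columns is sampled without replacement.
    """
    rng = random.Random(seed)
    col_pct = rng.uniform(kcol_min, kcol_max)  # FSR: share of features, in %
    row_pct = rng.uniform(nrow_min, nrow_max)  # SSR: share of subjects, in %
    k = max(1, round(n_cols * col_pct / 100))
    n = max(1, round(n_rows * row_pct / 100))
    cols = sorted(rng.sample(range(1, n_cols + 1), k))
    rows = sorted(rng.sample(range(1, n_rows + 1), n))
    return rows, cols

# Hypothetical dataset size; the sampling ranges match the replication headers.
rows, cols = draw_job_subsample(300, 1000, 15, 30, 60, 80, seed=1)
print(len(rows), "subjects and", len(cols), "features in this job")
```

Changing `seed` gives a different subsample for the same six arguments, which is the sense in which replications can share their arguments and still differ.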

This document presents the final results, by replication. See https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true for general documentation of the CBDA-SL project, and the GitHub repository https://github.com/SOCR/CBDA for some of the code.

Features selected by both the knockoff filter and the CBDA-SL algorithms appear as spikes in the histograms. We rank all features by either the MSE or the Accuracy metric and show only the top 15 in the tables below. In each table, the Accuracy, MSE, and Knockoff columns list feature indices, and each is followed by its own Count and Density columns. The robustness of CBDA-SL is shown by the consistent selection of similar top features across replications. Each replication uses the same validation set for prediction, but the CBDA-SL protocol is run with a different seed for each replication.
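
One way to quantify "consistent selection of similar top features" is to count how many true features reach a top-15 list and to measure the overlap between lists from different replications. The Python sketch below does this for the Accuracy rankings of Replications 1 and 2, with the feature indices copied from the tables in this document. The helper names `recovered` and `jaccard`, and the choice of the Jaccard index as the overlap measure, are assumptions for illustration and are not part of the CBDA-SL protocol.

```python
TRUE_FEATURES = {1, 100, 200, 300, 400, 500, 600, 700, 800, 900}

# Top-15 features by Accuracy, copied from the Replication 1 and 2 tables.
top15_accuracy = {
    1: [300, 800, 400, 900, 650, 100, 700, 496, 500, 1, 805, 158, 611, 310, 698],
    2: [300, 800, 400, 100, 600, 900, 496, 845, 79, 623, 650, 322, 346, 700, 709],
}

def recovered(top, truth=TRUE_FEATURES):
    """True features that appear in a top-15 list."""
    return sorted(set(top) & truth)

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

for rep, top in top15_accuracy.items():
    hits = recovered(top)
    print(f"replication {rep}: {len(hits)} of {len(TRUE_FEATURES)} true features in the top 15")

overlap = jaccard(top15_accuracy[1], top15_accuracy[2])
print(f"Jaccard overlap of the two top-15 lists: {overlap:.3f}")
```

The same two helpers can be applied to the MSE and Knockoff columns, or to any other pair of replications, by copying the corresponding feature indices.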

## [1] REPLICATION 1
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0         15         30         60         80

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "REPLICATION" "1"          
## [1] "TRUE FEATURES"
##  [1]   1 100 200 300 400 500 600 700 800 900
##  Accuracy Count Density   MSE Count Density   Knockoff Count Density  
##  300      402   0.1934543 300 423   0.2013327 800      176   10.666667
##  800      346   0.1665055 800 393   0.1870538 600      162    9.818182
##  400      342   0.1645805 400 374   0.1780105 900      159    9.636364
##  900      313   0.1506249 900 364   0.1732508 100      146    8.848485
##  650      312   0.1501436 1   341   0.1623037   1      121    7.333333
##  100      301   0.1448501 100 332   0.1580200 300      109    6.606061
##  700      287   0.1381129 700 314   0.1494526 500      107    6.484848
##  496      283   0.1361880 600 311   0.1480248 200       91    5.515152
##  500      282   0.1357068 500 306   0.1456449 108       57    3.454545
##  1        278   0.1337818 845 296   0.1408853 400       47    2.848485
##  805      272   0.1308945 282 289   0.1375535 700       45    2.727273
##  158      271   0.1304132 650 288   0.1370776 424       30    1.818182
##  611      271   0.1304132 200 285   0.1356497 574       22    1.333333
##  310      270   0.1299320 765 279   0.1327939 379       21    1.272727
##  698      269   0.1294508 121 277   0.1318420 548       19    1.151515
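
The tables do not say how Density is derived from Count. A quick check on the first three rows of the Replication 1 table suggests that, within each method, Density is proportional to Count. The Python sketch below assumes Density = 100 * Count / total and computes the total that each row would imply. The formula and the function name `implied_total` are assumptions made for this check, not something stated in the source.

```python
# First three rows of the Replication 1 table: (Count, Density) per method.
rows = {
    "Accuracy": [(402, 0.1934543), (346, 0.1665055), (342, 0.1645805)],
    "MSE": [(423, 0.2013327), (393, 0.1870538), (374, 0.1780105)],
    "Knockoff": [(176, 10.666667), (162, 9.818182), (159, 9.636364)],
}

def implied_total(count, density_percent):
    """Total implied if Density = 100 * Count / total (an assumption)."""
    return 100.0 * count / density_percent

implied = {
    method: [implied_total(count, density) for count, density in pairs]
    for method, pairs in rows.items()
}

# If the assumption holds, the values agree within each method.
for method, totals in implied.items():
    print(method, [round(t) for t in totals])
```

Because the implied total differs between the methods, the Density values are comparable within a column but not between the CBDA-SL columns and the Knockoff column.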
## 
## [1] REPLICATION 2
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0         15         30         60         80

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "REPLICATION" "2"          
## [1] "TRUE FEATURES"
##  [1]   1 100 200 300 400 500 600 700 800 900
##  Accuracy Count Density   MSE Count Density   Knockoff Count Density  
##  300      435   0.2090000 300 444   0.2094804 900      170   10.631645
##  800      366   0.1758483 800 398   0.1877775 800      169   10.569106
##  400      354   0.1700827 400 379   0.1788132 600      164   10.256410
##  100      299   0.1436575 100 372   0.1755106 100      156    9.756098
##  600      299   0.1436575 600 358   0.1689054   1      132    8.255159
##  900      297   0.1426965 900 331   0.1561667 500      103    6.441526
##  496      281   0.1350092 500 316   0.1490897 200       95    5.941213
##  845      279   0.1340483 700 314   0.1481461 300       91    5.691057
##  79       274   0.1316460 1   307   0.1448434 108       67    4.190119
##  623      274   0.1316460 200 301   0.1420126 400       52    3.252033
##  650      274   0.1316460 496 293   0.1382382 700       38    2.376485
##  322      273   0.1311655 650 288   0.1358792 574       25    1.563477
##  346      272   0.1306850 523 287   0.1354074 424       20    1.250782
##  700      270   0.1297241 252 283   0.1335202 379       19    1.188243
##  709      270   0.1297241 556 282   0.1330484 765       16    1.000625
## 
## [1] REPLICATION 3
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0         15         30         60         80

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "REPLICATION" "3"          
## [1] "TRUE FEATURES"
##  [1]   1 100 200 300 400 500 600 700 800 900
##  Accuracy Count Density   MSE Count Density   Knockoff Count Density   
##  300      421   0.1991881 800 446   0.2095569 800      181   11.5213240
##  800      400   0.1892524 300 430   0.2020392 600      151    9.6117123
##  400      350   0.1655958 900 376   0.1766668 100      141    8.9751750
##  900      322   0.1523481 100 368   0.1729080 900      137    8.7205602
##  100      316   0.1495094 400 368   0.1729080 500      126    8.0203692
##  496      302   0.1428855 500 320   0.1503547   1      121    7.7021006
##  747      291   0.1376811 200 318   0.1494150 300      106    6.7472947
##  738      288   0.1362617 1   313   0.1470657 200       78    4.9649905
##  500      286   0.1353154 600 312   0.1465959 108       60    3.8192234
##  650      283   0.1338960 623 302   0.1418973 400       51    3.2463399
##  623      278   0.1315304 700 298   0.1400179 700       48    3.0553787
##  1        277   0.1310573 650 297   0.1395480 424       30    1.9096117
##  600      277   0.1310573 138 293   0.1376686 574       23    1.4640356
##  146      275   0.1301110 738 290   0.1362590 379       21    1.3367282
##  200      275   0.1301110 386 284   0.1334398 650       14    0.8911521
## 
## [1] REPLICATION 4
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0         15         30         60         80

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "REPLICATION" "4"          
## [1] "TRUE FEATURES"
##  [1]   1 100 200 300 400 500 600 700 800 900
##  Accuracy Count Density   MSE Count Density   Knockoff Count Density   
##  300      439   0.2076436 800 433   0.2022561 600      146   10.2312544
##  800      395   0.1868319 300 430   0.2008548 800      144   10.0911002
##  400      330   0.1560874 900 377   0.1760983 100      142    9.9509460
##  900      325   0.1537224 100 364   0.1700259 900      133    9.3202523
##  100      315   0.1489925 400 358   0.1672233   1      117    8.1990189
##  600      292   0.1381137 500 345   0.1611509 300      103    7.2179397
##  500      289   0.1366947 600 345   0.1611509 500       89    6.2368605
##  650      284   0.1343298 1   328   0.1532102 200       75    5.2557814
##  496      280   0.1324378 700 306   0.1429339 108       48    3.3637001
##  486      272   0.1286539 200 303   0.1415326 400       47    3.2936230
##  1        271   0.1281809 496 301   0.1405984 700       36    2.5227751
##  174      271   0.1281809 650 299   0.1396642 574       28    1.9621584
##  623      271   0.1281809 386 288   0.1345260 424       15    1.0511563
##  643      271   0.1281809 14  284   0.1326576 548       15    1.0511563
##  86       270   0.1277079 219 278   0.1298550 379       13    0.9110021
## 
## [1] REPLICATION 5
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0         15         30         60         80

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "REPLICATION" "5"          
## [1] "TRUE FEATURES"
##  [1]   1 100 200 300 400 500 600 700 800 900
##  Accuracy Count Density   MSE Count Density   Knockoff Count Density  
##  300      419   0.1999647 800 418   0.1972675 600      161   10.254777
##  800      391   0.1866019 300 404   0.1906605 900      148    9.426752
##  400      349   0.1665577 400 356   0.1680077 800      141    8.980892
##  650      303   0.1446045 900 349   0.1647042 100      140    8.917197
##  845      288   0.1374459 500 336   0.1585691   1      123    7.834395
##  900      288   0.1374459 100 322   0.1519621 300      115    7.324841
##  496      287   0.1369686 600 321   0.1514901 500      104    6.624204
##  527      287   0.1369686 700 312   0.1472427 200       77    4.904459
##  110      286   0.1364914 1   305   0.1439392 108       62    3.949045
##  700      285   0.1360142 496 302   0.1425234 400       60    3.821656
##  600      281   0.1341052 650 298   0.1406357 574       31    1.974522
##  493      279   0.1331507 200 283   0.1335567 700       27    1.719745
##  72       277   0.1321962 549 280   0.1321409 424       25    1.592357
##  338      277   0.1321962 322 277   0.1307251 379       21    1.337580
##  764      271   0.1293328 414 277   0.1307251 762       19    1.210191
## 
## [1] REPLICATION 6
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0         15         30         60         80

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "REPLICATION" "6"          
## [1] "TRUE FEATURES"
##  [1]   1 100 200 300 400 500 600 700 800 900
##  Accuracy Count Density   MSE Count Density   Knockoff Count Density  
##  300      399   0.1902872 300 412   0.1944139 600      161   10.502283
##  800      389   0.1855181 800 411   0.1939420 900      145    9.458578
##  100      329   0.1569035 100 381   0.1797857 800      142    9.262883
##  400      328   0.1564266 900 350   0.1651574   1      122    7.958252
##  900      308   0.1468884 400 349   0.1646856 500      120    7.827789
##  650      290   0.1383040 600 346   0.1632699 100      112    7.305936
##  1        286   0.1363964 1   334   0.1576074 300      106    6.914547
##  409      286   0.1363964 500 317   0.1495855 200       90    5.870841
##  556      283   0.1349656 200 307   0.1448667 400       67    4.370515
##  700      276   0.1316273 700 295   0.1392041 108       56    3.652968
##  282      275   0.1311504 424 285   0.1344853 700       44    2.870189
##  500      275   0.1311504 650 285   0.1344853 574       26    1.696021
##  600      275   0.1311504 409 282   0.1330697 424       23    1.500326
##  456      272   0.1297196 623 281   0.1325978 379       16    1.043705
##  246      270   0.1287658 899 279   0.1316541 548       16    1.043705
## 
## [1] REPLICATION 7
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0         15         30         60         80

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "REPLICATION" "7"          
## [1] "TRUE FEATURES"
##  [1]   1 100 200 300 400 500 600 700 800 900
##  Accuracy Count Density   MSE Count Density   Knockoff Count Density  
##  300      402   0.1937471 800 419   0.1973092 800      170   10.960671
##  800      369   0.1778425 300 401   0.1888330 600      157   10.122502
##  900      334   0.1609739 900 383   0.1803567 900      154    9.929078
##  400      321   0.1547085 600 348   0.1638750 100      138    8.897485
##  100      314   0.1513348 400 347   0.1634041   1      116    7.479046
##  600      289   0.1392858 100 331   0.1558696 500      106    6.834300
##  706      288   0.1388039 500 328   0.1544569 300      102    6.576402
##  650      287   0.1383219 1   323   0.1521024 200       88    5.673759
##  138      281   0.1354302 200 306   0.1440970 400       60    3.868472
##  764      281   0.1354302 700 306   0.1440970 108       55    3.546099
##  1        278   0.1339843 409 292   0.1375043 700       37    2.385558
##  853      276   0.1330204 650 290   0.1365625 424       22    1.418440
##  577      274   0.1320565 764 289   0.1360916 574       20    1.289491
##  496      273   0.1315745 138 287   0.1351498 548       17    1.096067
##  878      272   0.1310926 496 280   0.1318534 762       17    1.096067
## 
## [1] REPLICATION 8
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0         15         30         60         80

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "REPLICATION" "8"          
## [1] "TRUE FEATURES"
##  [1]   1 100 200 300 400 500 600 700 800 900
##  Accuracy Count Density   MSE Count Density   Knockoff Count Density  
##  300      405   0.1919204 800 409   0.1926156 800      162   10.622951
##  400      366   0.1734392 300 401   0.1888481 100      158   10.360656
##  800      350   0.1658571 900 385   0.1813130 600      155   10.163934
##  900      327   0.1549579 400 374   0.1761326 900      146    9.573770
##  100      299   0.1416894 100 324   0.1525855 300      113    7.409836
##  496      294   0.1393200 500 324   0.1525855   1      109    7.147541
##  878      287   0.1360028 1   320   0.1507017 500      101    6.622951
##  409      285   0.1350551 600 312   0.1469342 200       76    4.983607
##  500      283   0.1341073 222 297   0.1398700 400       58    3.803279
##  222      279   0.1322118 138 295   0.1389281 108       53    3.475410
##  127      278   0.1317379 200 293   0.1379862 700       38    2.491803
##  556      278   0.1317379 765 292   0.1375153 424       31    2.032787
##  650      278   0.1317379 409 288   0.1356315 574       22    1.442623
##  282      276   0.1307902 496 286   0.1346896 738       22    1.442623
##  687      276   0.1307902 422 283   0.1332768 379       17    1.114754
## 
## [1] REPLICATION 9
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0         15         30         60         80

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "REPLICATION" "9"          
## [1] "TRUE FEATURES"
##  [1]   1 100 200 300 400 500 600 700 800 900
##  Accuracy Count Density   MSE Count Density   Knockoff Count Density  
##  300      422   0.2023796 300 415   0.1969588 600      162   10.595160
##  800      355   0.1702483 900 398   0.1888906 800      157   10.268149
##  900      352   0.1688096 800 397   0.1884160 900      140    9.156311
##  400      329   0.1577794 400 373   0.1770256 100      134    8.763898
##  100      308   0.1477084 100 356   0.1689574   1      118    7.717462
##  700      296   0.1419535 600 338   0.1604146 300      109    7.128842
##  650      289   0.1385965 1   311   0.1476004 500      108    7.063440
##  765      281   0.1347599 500 311   0.1476004 200       82    5.362982
##  600      280   0.1342803 650 301   0.1428544 108       54    3.531720
##  222      279   0.1338008 700 298   0.1414306 400       53    3.466318
##  574      279   0.1338008 765 293   0.1390576 700       37    2.419882
##  718      278   0.1333212 574 286   0.1357354 574       28    1.831262
##  63       275   0.1318825 409 284   0.1347862 424       24    1.569653
##  623      274   0.1314029 556 282   0.1338370 762       19    1.242642
##  379      272   0.1304437 623 280   0.1328878 379       18    1.177240
## 
## [1] REPLICATION 10
##          M misValperc   Kcol_min   Kcol_max   Nrow_min   Nrow_max 
##       9000          0         15         30         60         80

## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "REPLICATION" "10"         
## [1] "TRUE FEATURES"
##  [1]   1 100 200 300 400 500 600 700 800 900
##  Accuracy Count Density   MSE Count Density   Knockoff Count Density  
##  300      422   0.2013522 300 451   0.2125183 600      169   10.242424
##  800      357   0.1703382 800 392   0.1847166 800      166   10.060606
##  400      340   0.1622269 900 380   0.1790620 900      154    9.333333
##  900      335   0.1598412 400 370   0.1743498 100      142    8.606061
##  100      296   0.1412328 600 346   0.1630407   1      139    8.424242
##  600      287   0.1369386 100 330   0.1555012 500      112    6.787879
##  529      278   0.1326443 1   319   0.1503178 300      110    6.666667
##  486      277   0.1321672 500 316   0.1489042 200       84    5.090909
##  764      276   0.1316901 529 294   0.1385374 400       68    4.121212
##  805      276   0.1316901 700 291   0.1371238 108       47    2.848485
##  700      275   0.1312129 623 284   0.1338253 700       45    2.727273
##  845      274   0.1307358 216 283   0.1333541 424       41    2.484848
##  1        273   0.1302587 138 282   0.1328829 574       22    1.333333
##  386      272   0.1297815 650 281   0.1324116 765       21    1.272727
##  510      271   0.1293044 845 281   0.1324116 548       20    1.212121
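
As a compact cross-replication summary, the Python sketch below tallies which feature is ranked first by each method in the ten tables above. The feature indices are copied from the first data row of each table, and the tally uses the standard library `collections.Counter`. It is only a reading aid for this document, not part of the CBDA-SL workflow.

```python
from collections import Counter

# Top-ranked feature per replication (1 to 10), from the first row of each table.
top_ranked = {
    "Accuracy": [300, 300, 300, 300, 300, 300, 300, 300, 300, 300],
    "MSE": [300, 300, 800, 800, 800, 300, 800, 800, 300, 300],
    "Knockoff": [800, 900, 800, 600, 600, 600, 800, 800, 600, 600],
}

tallies = {method: Counter(features) for method, features in top_ranked.items()}
for method, tally in tallies.items():
    print(method, dict(tally.most_common()))
```

In all ten replications, the first-ranked feature under every method is one of the ten true features.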