This report summarizes a set of 10 replications of a single experiment designed to test the robustness of CBDA-SL. We use a LONI pipeline workflow file that performs 3000 independent jobs, each one applying both the CBDA-SL and the knockoff filter feature mining strategies. Each replication has a total of 9000 jobs and is run with the same 6 input arguments: number of jobs [M], % of missing values [misValperc], min [Kcol_min] and max [Kcol_max] % for the Feature Sampling Range (FSR), and min [Nrow_min] and max [Nrow_max] % for the Subject Sampling Range (SSR).
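To make the role of the sampling-range arguments concrete, the sketch below shows one way a single job could draw its subjects and features. It is only an illustration under stated assumptions, not the CBDA-SL implementation: the function `sample_job`, the uniform draw of a percentage inside each range, and the data dimensions (300 subjects, 1000 features) are hypothetical and do not come from this report.

```python
import random

def sample_job(n_subjects, n_features, kcol_min, kcol_max,
               nrow_min, nrow_max, seed=None):
    """Draw one job's subject and feature subsets (hypothetical helper).

    Assumption: each job picks a percentage uniformly inside the FSR
    [kcol_min, kcol_max] and the SSR [nrow_min, nrow_max], then samples
    that share of features and subjects without replacement.
    """
    rng = random.Random(seed)
    feature_share = rng.uniform(kcol_min, kcol_max) / 100.0
    subject_share = rng.uniform(nrow_min, nrow_max) / 100.0
    n_feat = max(1, round(feature_share * n_features))
    n_subj = max(1, round(subject_share * n_subjects))
    # Features are numbered from 1, matching the feature labels in the tables.
    features = sorted(rng.sample(range(1, n_features + 1), n_feat))
    subjects = sorted(rng.sample(range(n_subjects), n_subj))
    return subjects, features

# Arguments used in every replication of this report:
# Kcol_min=15, Kcol_max=30, Nrow_min=60, Nrow_max=80.
# The data dimensions below are placeholders, not values from the report.
subjects, features = sample_job(300, 1000, 15, 30, 60, 80, seed=1)
print(len(subjects), "subjects and", len(features), "features in this job")
```

Changing `seed` changes which subjects and features are drawn while the ranges stay fixed, which is the sense in which replications can share arguments but differ in their random draws.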
This document reports the final results, by replication. See https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true for general documentation of the CBDA-SL project and the GitHub repository https://github.com/SOCR/CBDA for some of the code.
Features selected by both the knockoff filter and the CBDA-SL algorithms appear as spikes in the histograms below. We rank all the features by either the MSE or the Accuracy metric and show only the top 15 in the tables below. The robustness of CBDA-SL is shown by the consistent selection of similar top features across replications. Each replication uses the same validation set for prediction, but the CBDA-SL protocol is run with different seeds in each replication.
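One simple way to quantify "consistent selection of similar top features" is to count how many true features each top-15 list recovers and to measure the overlap between lists from different replications. The sketch below does this for the Accuracy rankings of Replications 1 and 2, with the feature lists copied from the tables in this report. The helper names and the choice of the Jaccard index are illustrative assumptions, not part of the CBDA-SL protocol.

```python
# True features, as printed under "TRUE FEATURES" in every replication.
TRUE_FEATURES = {1, 100, 200, 300, 400, 500, 600, 700, 800, 900}

# Top-15 features ranked by Accuracy, copied from the tables for
# Replication 1 and Replication 2.
top15_rep1 = [300, 800, 400, 900, 650, 100, 700, 496, 500, 1,
              805, 158, 611, 310, 698]
top15_rep2 = [300, 800, 400, 100, 600, 900, 496, 845, 79, 623,
              650, 322, 346, 700, 709]

def recovered(top_features):
    """Return the true features that appear in a top-k list, sorted."""
    return sorted(TRUE_FEATURES & set(top_features))

def jaccard(a, b):
    """Jaccard index between two feature lists (shared / union)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

for label, top in (("Replication 1", top15_rep1), ("Replication 2", top15_rep2)):
    hits = recovered(top)
    print(label, "recovers", len(hits), "of", len(TRUE_FEATURES), "true features:", hits)

print("Jaccard overlap of the two top-15 lists:", round(jaccard(top15_rep1, top15_rep2), 3))
```

The same two helpers can be applied to the MSE and Knockoff columns, or to any other pair of replications, by swapping in the corresponding lists.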
## [1] REPLICATION 1
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0       15       30       60       80
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "REPLICATION" "1"
## [1] "TRUE FEATURES"
## [1] 1 100 200 300 400 500 600 700 800 900
## Accuracy Count   Density   MSE Count   Density Knockoff Count   Density
##      300   402 0.1934543   300   423 0.2013327      800   176 10.666667
##      800   346 0.1665055   800   393 0.1870538      600   162  9.818182
##      400   342 0.1645805   400   374 0.1780105      900   159  9.636364
##      900   313 0.1506249   900   364 0.1732508      100   146  8.848485
##      650   312 0.1501436     1   341 0.1623037        1   121  7.333333
##      100   301 0.1448501   100   332 0.1580200      300   109  6.606061
##      700   287 0.1381129   700   314 0.1494526      500   107  6.484848
##      496   283 0.1361880   600   311 0.1480248      200    91  5.515152
##      500   282 0.1357068   500   306 0.1456449      108    57  3.454545
##        1   278 0.1337818   845   296 0.1408853      400    47  2.848485
##      805   272 0.1308945   282   289 0.1375535      700    45  2.727273
##      158   271 0.1304132   650   288 0.1370776      424    30  1.818182
##      611   271 0.1304132   200   285 0.1356497      574    22  1.333333
##      310   270 0.1299320   765   279 0.1327939      379    21  1.272727
##      698   269 0.1294508   121   277 0.1318420      548    19  1.151515
##
##
##
##
##
##
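The Density columns are not defined in this report. The short check below tests one plausible reading against the printed numbers: that Density is the count expressed as a percentage of a per-column total, so that `100 * Count / Density` should return the same total for every row of a column. The rows are copied from the Replication 1 table. The interpretation itself is an assumption, and `implied_totals` is a hypothetical helper.

```python
# (feature, count, density) rows copied from the Replication 1 table.
accuracy_rows = [(300, 402, 0.1934543), (800, 346, 0.1665055), (400, 342, 0.1645805)]
knockoff_rows = [(800, 176, 10.666667), (600, 162, 9.818182), (900, 159, 9.636364)]

def implied_totals(rows):
    """Total implied by each row, assuming density = 100 * count / total."""
    return [100.0 * count / density for _feature, count, density in rows]

accuracy_totals = implied_totals(accuracy_rows)
knockoff_totals = implied_totals(knockoff_rows)

# If the assumption holds, the values within each list agree up to the
# rounding of the printed densities.
print("Accuracy column, implied totals:", [round(t, 2) for t in accuracy_totals])
print("Knockoff column, implied totals:", [round(t, 2) for t in knockoff_totals])
```

Under this reading the Knockoff rows all point to a total close to 1650, while the Accuracy rows point to a much larger common total. That would explain why the two Density columns sit on such different scales, but it remains an inference from the printed values rather than a documented definition.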
## [1] REPLICATION 2
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0       15       30       60       80
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "REPLICATION" "2"
## [1] "TRUE FEATURES"
## [1] 1 100 200 300 400 500 600 700 800 900
## Accuracy Count   Density   MSE Count   Density Knockoff Count   Density
##      300   435 0.2090000   300   444 0.2094804      900   170 10.631645
##      800   366 0.1758483   800   398 0.1877775      800   169 10.569106
##      400   354 0.1700827   400   379 0.1788132      600   164 10.256410
##      100   299 0.1436575   100   372 0.1755106      100   156  9.756098
##      600   299 0.1436575   600   358 0.1689054        1   132  8.255159
##      900   297 0.1426965   900   331 0.1561667      500   103  6.441526
##      496   281 0.1350092   500   316 0.1490897      200    95  5.941213
##      845   279 0.1340483   700   314 0.1481461      300    91  5.691057
##       79   274 0.1316460     1   307 0.1448434      108    67  4.190119
##      623   274 0.1316460   200   301 0.1420126      400    52  3.252033
##      650   274 0.1316460   496   293 0.1382382      700    38  2.376485
##      322   273 0.1311655   650   288 0.1358792      574    25  1.563477
##      346   272 0.1306850   523   287 0.1354074      424    20  1.250782
##      700   270 0.1297241   252   283 0.1335202      379    19  1.188243
##      709   270 0.1297241   556   282 0.1330484      765    16  1.000625
##
##
##
##
##
##
## [1] REPLICATION 3
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0       15       30       60       80
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "REPLICATION" "3"
## [1] "TRUE FEATURES"
## [1] 1 100 200 300 400 500 600 700 800 900
## Accuracy Count   Density   MSE Count   Density Knockoff Count    Density
##      300   421 0.1991881   800   446 0.2095569      800   181 11.5213240
##      800   400 0.1892524   300   430 0.2020392      600   151  9.6117123
##      400   350 0.1655958   900   376 0.1766668      100   141  8.9751750
##      900   322 0.1523481   100   368 0.1729080      900   137  8.7205602
##      100   316 0.1495094   400   368 0.1729080      500   126  8.0203692
##      496   302 0.1428855   500   320 0.1503547        1   121  7.7021006
##      747   291 0.1376811   200   318 0.1494150      300   106  6.7472947
##      738   288 0.1362617     1   313 0.1470657      200    78  4.9649905
##      500   286 0.1353154   600   312 0.1465959      108    60  3.8192234
##      650   283 0.1338960   623   302 0.1418973      400    51  3.2463399
##      623   278 0.1315304   700   298 0.1400179      700    48  3.0553787
##        1   277 0.1310573   650   297 0.1395480      424    30  1.9096117
##      600   277 0.1310573   138   293 0.1376686      574    23  1.4640356
##      146   275 0.1301110   738   290 0.1362590      379    21  1.3367282
##      200   275 0.1301110   386   284 0.1334398      650    14  0.8911521
##
##
##
##
##
##
## [1] REPLICATION 4
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0       15       30       60       80
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "REPLICATION" "4"
## [1] "TRUE FEATURES"
## [1] 1 100 200 300 400 500 600 700 800 900
## Accuracy Count   Density   MSE Count   Density Knockoff Count    Density
##      300   439 0.2076436   800   433 0.2022561      600   146 10.2312544
##      800   395 0.1868319   300   430 0.2008548      800   144 10.0911002
##      400   330 0.1560874   900   377 0.1760983      100   142  9.9509460
##      900   325 0.1537224   100   364 0.1700259      900   133  9.3202523
##      100   315 0.1489925   400   358 0.1672233        1   117  8.1990189
##      600   292 0.1381137   500   345 0.1611509      300   103  7.2179397
##      500   289 0.1366947   600   345 0.1611509      500    89  6.2368605
##      650   284 0.1343298     1   328 0.1532102      200    75  5.2557814
##      496   280 0.1324378   700   306 0.1429339      108    48  3.3637001
##      486   272 0.1286539   200   303 0.1415326      400    47  3.2936230
##        1   271 0.1281809   496   301 0.1405984      700    36  2.5227751
##      174   271 0.1281809   650   299 0.1396642      574    28  1.9621584
##      623   271 0.1281809   386   288 0.1345260      424    15  1.0511563
##      643   271 0.1281809    14   284 0.1326576      548    15  1.0511563
##       86   270 0.1277079   219   278 0.1298550      379    13  0.9110021
##
##
##
##
##
##
## [1] REPLICATION 5
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0       15       30       60       80
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "REPLICATION" "5"
## [1] "TRUE FEATURES"
## [1] 1 100 200 300 400 500 600 700 800 900
## Accuracy Count   Density   MSE Count   Density Knockoff Count   Density
##      300   419 0.1999647   800   418 0.1972675      600   161 10.254777
##      800   391 0.1866019   300   404 0.1906605      900   148  9.426752
##      400   349 0.1665577   400   356 0.1680077      800   141  8.980892
##      650   303 0.1446045   900   349 0.1647042      100   140  8.917197
##      845   288 0.1374459   500   336 0.1585691        1   123  7.834395
##      900   288 0.1374459   100   322 0.1519621      300   115  7.324841
##      496   287 0.1369686   600   321 0.1514901      500   104  6.624204
##      527   287 0.1369686   700   312 0.1472427      200    77  4.904459
##      110   286 0.1364914     1   305 0.1439392      108    62  3.949045
##      700   285 0.1360142   496   302 0.1425234      400    60  3.821656
##      600   281 0.1341052   650   298 0.1406357      574    31  1.974522
##      493   279 0.1331507   200   283 0.1335567      700    27  1.719745
##       72   277 0.1321962   549   280 0.1321409      424    25  1.592357
##      338   277 0.1321962   322   277 0.1307251      379    21  1.337580
##      764   271 0.1293328   414   277 0.1307251      762    19  1.210191
##
##
##
##
##
##
## [1] REPLICATION 6
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0       15       30       60       80
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "REPLICATION" "6"
## [1] "TRUE FEATURES"
## [1] 1 100 200 300 400 500 600 700 800 900
## Accuracy Count   Density   MSE Count   Density Knockoff Count   Density
##      300   399 0.1902872   300   412 0.1944139      600   161 10.502283
##      800   389 0.1855181   800   411 0.1939420      900   145  9.458578
##      100   329 0.1569035   100   381 0.1797857      800   142  9.262883
##      400   328 0.1564266   900   350 0.1651574        1   122  7.958252
##      900   308 0.1468884   400   349 0.1646856      500   120  7.827789
##      650   290 0.1383040   600   346 0.1632699      100   112  7.305936
##        1   286 0.1363964     1   334 0.1576074      300   106  6.914547
##      409   286 0.1363964   500   317 0.1495855      200    90  5.870841
##      556   283 0.1349656   200   307 0.1448667      400    67  4.370515
##      700   276 0.1316273   700   295 0.1392041      108    56  3.652968
##      282   275 0.1311504   424   285 0.1344853      700    44  2.870189
##      500   275 0.1311504   650   285 0.1344853      574    26  1.696021
##      600   275 0.1311504   409   282 0.1330697      424    23  1.500326
##      456   272 0.1297196   623   281 0.1325978      379    16  1.043705
##      246   270 0.1287658   899   279 0.1316541      548    16  1.043705
##
##
##
##
##
##
## [1] REPLICATION 7
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0       15       30       60       80
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "REPLICATION" "7"
## [1] "TRUE FEATURES"
## [1] 1 100 200 300 400 500 600 700 800 900
## Accuracy Count   Density   MSE Count   Density Knockoff Count   Density
##      300   402 0.1937471   800   419 0.1973092      800   170 10.960671
##      800   369 0.1778425   300   401 0.1888330      600   157 10.122502
##      900   334 0.1609739   900   383 0.1803567      900   154  9.929078
##      400   321 0.1547085   600   348 0.1638750      100   138  8.897485
##      100   314 0.1513348   400   347 0.1634041        1   116  7.479046
##      600   289 0.1392858   100   331 0.1558696      500   106  6.834300
##      706   288 0.1388039   500   328 0.1544569      300   102  6.576402
##      650   287 0.1383219     1   323 0.1521024      200    88  5.673759
##      138   281 0.1354302   200   306 0.1440970      400    60  3.868472
##      764   281 0.1354302   700   306 0.1440970      108    55  3.546099
##        1   278 0.1339843   409   292 0.1375043      700    37  2.385558
##      853   276 0.1330204   650   290 0.1365625      424    22  1.418440
##      577   274 0.1320565   764   289 0.1360916      574    20  1.289491
##      496   273 0.1315745   138   287 0.1351498      548    17  1.096067
##      878   272 0.1310926   496   280 0.1318534      762    17  1.096067
##
##
##
##
##
##
## [1] REPLICATION 8
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0       15       30       60       80
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "REPLICATION" "8"
## [1] "TRUE FEATURES"
## [1] 1 100 200 300 400 500 600 700 800 900
## Accuracy Count   Density   MSE Count   Density Knockoff Count   Density
##      300   405 0.1919204   800   409 0.1926156      800   162 10.622951
##      400   366 0.1734392   300   401 0.1888481      100   158 10.360656
##      800   350 0.1658571   900   385 0.1813130      600   155 10.163934
##      900   327 0.1549579   400   374 0.1761326      900   146  9.573770
##      100   299 0.1416894   100   324 0.1525855      300   113  7.409836
##      496   294 0.1393200   500   324 0.1525855        1   109  7.147541
##      878   287 0.1360028     1   320 0.1507017      500   101  6.622951
##      409   285 0.1350551   600   312 0.1469342      200    76  4.983607
##      500   283 0.1341073   222   297 0.1398700      400    58  3.803279
##      222   279 0.1322118   138   295 0.1389281      108    53  3.475410
##      127   278 0.1317379   200   293 0.1379862      700    38  2.491803
##      556   278 0.1317379   765   292 0.1375153      424    31  2.032787
##      650   278 0.1317379   409   288 0.1356315      574    22  1.442623
##      282   276 0.1307902   496   286 0.1346896      738    22  1.442623
##      687   276 0.1307902   422   283 0.1332768      379    17  1.114754
##
##
##
##
##
##
## [1] REPLICATION 9
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0       15       30       60       80
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "REPLICATION" "9"
## [1] "TRUE FEATURES"
## [1] 1 100 200 300 400 500 600 700 800 900
## Accuracy Count   Density   MSE Count   Density Knockoff Count   Density
##      300   422 0.2023796   300   415 0.1969588      600   162 10.595160
##      800   355 0.1702483   900   398 0.1888906      800   157 10.268149
##      900   352 0.1688096   800   397 0.1884160      900   140  9.156311
##      400   329 0.1577794   400   373 0.1770256      100   134  8.763898
##      100   308 0.1477084   100   356 0.1689574        1   118  7.717462
##      700   296 0.1419535   600   338 0.1604146      300   109  7.128842
##      650   289 0.1385965     1   311 0.1476004      500   108  7.063440
##      765   281 0.1347599   500   311 0.1476004      200    82  5.362982
##      600   280 0.1342803   650   301 0.1428544      108    54  3.531720
##      222   279 0.1338008   700   298 0.1414306      400    53  3.466318
##      574   279 0.1338008   765   293 0.1390576      700    37  2.419882
##      718   278 0.1333212   574   286 0.1357354      574    28  1.831262
##       63   275 0.1318825   409   284 0.1347862      424    24  1.569653
##      623   274 0.1314029   556   282 0.1338370      762    19  1.242642
##      379   272 0.1304437   623   280 0.1328878      379    18  1.177240
##
##
##
##
##
##
## [1] REPLICATION 10
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0       15       30       60       80
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "REPLICATION" "10"
## [1] "TRUE FEATURES"
## [1] 1 100 200 300 400 500 600 700 800 900
## Accuracy Count   Density   MSE Count   Density Knockoff Count   Density
##      300   422 0.2013522   300   451 0.2125183      600   169 10.242424
##      800   357 0.1703382   800   392 0.1847166      800   166 10.060606
##      400   340 0.1622269   900   380 0.1790620      900   154  9.333333
##      900   335 0.1598412   400   370 0.1743498      100   142  8.606061
##      100   296 0.1412328   600   346 0.1630407        1   139  8.424242
##      600   287 0.1369386   100   330 0.1555012      500   112  6.787879
##      529   278 0.1326443     1   319 0.1503178      300   110  6.666667
##      486   277 0.1321672   500   316 0.1489042      200    84  5.090909
##      764   276 0.1316901   529   294 0.1385374      400    68  4.121212
##      805   276 0.1316901   700   291 0.1371238      108    47  2.848485
##      700   275 0.1312129   623   284 0.1338253      700    45  2.727273
##      845   274 0.1307358   216   283 0.1333541      424    41  2.484848
##        1   273 0.1302587   138   282 0.1328829      574    22  1.333333
##      386   272 0.1297815   650   281 0.1324116      765    21  1.272727
##      510   271 0.1293044   845   281 0.1324116      548    20  1.212121