This is a summary of a set of 6 experiments run with a LONI Pipeline workflow file that performs 3000 independent jobs, each one applying the CBDA-SL and the knockoff filter feature mining strategies. Each experiment has a total of 9000 jobs and is uniquely identified by 6 input arguments: the number of jobs [M], the percentage of missing values [misValperc], the minimum [Kcol_min] and maximum [Kcol_max] percentages for the Feature Sampling Range (FSR), and the minimum [Nrow_min] and maximum [Nrow_max] percentages for the Subject Sampling Range (SSR).
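To make the role of the sampling ranges concrete, here is a minimal Python sketch of how a single job might draw its subjects and features. It is not the CBDA code: the helper `draw_job_sample`, the uniform draw of the percentage within each range, and the dataset size (300 subjects by 900 features) are all assumptions made for illustration.

```python
import random

def draw_job_sample(n_subjects, n_features, kcol_min, kcol_max,
                    nrow_min, nrow_max, rng):
    """Draw the subject rows and feature columns for one job (hypothetical helper)."""
    # Assumption: the sampling percentage is drawn uniformly within each range.
    feat_pct = rng.uniform(kcol_min, kcol_max)   # FSR, in percent
    subj_pct = rng.uniform(nrow_min, nrow_max)   # SSR, in percent
    k = max(1, round(n_features * feat_pct / 100))
    n = max(1, round(n_subjects * subj_pct / 100))
    # Sample without replacement, using 1-based indices as in the report.
    features = sorted(rng.sample(range(1, n_features + 1), k))
    subjects = sorted(rng.sample(range(1, n_subjects + 1), n))
    return subjects, features

rng = random.Random(0)
# Experiment 1 ranges: Kcol 1-5 %, Nrow 30-60 %. The dataset size is assumed.
subjects, features = draw_job_sample(300, 900, 1, 5, 30, 60, rng)
print(len(subjects), "subjects and", len(features), "features in this job")
```

Under these assumptions every job sees between 9 and 45 features and between 90 and 180 subjects, which is why any individual feature is selected only a handful of times across thousands of jobs.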
This document reports the final results for each experiment. See https://drive.google.com/file/d/0B5sz_T_1CNJQWmlsRTZEcjBEOEk/view?ths=true for general documentation of the CBDA-SL project and the GitHub repository https://github.com/SOCR/CBDA for some of the code.
Features selected by both the knockoff filter and the CBDA-SL algorithm appear as spikes in the histograms below. For each method I list the top selected features; the number listed is set to 15 here.
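The Frequency and Density columns in the tables below can be read as a tally over jobs. The Density values appear consistent with the frequency expressed as a percentage of all selections (in Experiment 1, a frequency of 10 with density 0.6377551 implies about 1568 selections in total). The Python sketch below illustrates that reading; the function name `top_features` and the toy input are made up for illustration and are not taken from the CBDA code.

```python
from collections import Counter

def top_features(selections, top=15):
    """Rank features by how often they were selected across jobs."""
    counts = Counter(f for job in selections for f in job)
    total = sum(counts.values())
    # Density = share of all selections, expressed in percent.
    return [(feat, n, 100.0 * n / total) for feat, n in counts.most_common(top)]

# Toy input (made up): the features selected by four jobs.
jobs = [[300, 400], [300, 20], [300, 400, 73], [400]]
for feat, freq, dens in top_features(jobs, top=3):
    print(feat, freq, dens)
```

Ties are kept in the order in which features were first seen, so the ranking among equally frequent features carries no meaning.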
## [1] EXPERIMENT 1
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0        1        5       30       60
## [1] 1 100 200 300 400 500 600 700 800 900
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "1"
## CBDA Frequency   Density Knockoff   Density
##  300        10 0.6377551      800 8.4955752
##  400         9 0.5739796      600 7.7876106
##  449         8 0.5102041      900 7.3156342
##  747         7 0.4464286      100 5.8997050
##   20         6 0.3826531      500 5.1917404
##   73         6 0.3826531      300 4.4247788
##  409         6 0.3826531        1 4.0707965
##  643         6 0.3826531      200 3.4218289
##  671         6 0.3826531      400 2.3008850
##  706         6 0.3826531      108 2.1238938
##   11         5 0.3188776      574 1.2979351
##   30         5 0.3188776      700 1.1209440
##   31         5 0.3188776      424 0.9439528
##   34         5 0.3188776      650 0.9439528
##   78         5 0.3188776      139 0.8849558
## [1] "Nonzero Features"
## [1] 1 100 200 300 400 500 600 700 800 900
##
## [1] EXPERIMENT 2
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0        5       15       30       60
## [1] 1 100 200 300 400 500 600 700 800 900
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "1"
## CBDA Frequency   Density Knockoff   Density
##  300        22 0.4352127      600 8.3467095
##  400        15 0.2967359      900 8.1460674
##  800        15 0.2967359      800 7.9855538
##  600        14 0.2769535      100 7.5040128
##  117        13 0.2571711        1 6.5810594
##   41        12 0.2373887      300 5.4574639
##  489        12 0.2373887      500 4.8956661
##  703        12 0.2373887      108 2.9695024
##  802        12 0.2373887      200 2.8089888
##  900        12 0.2373887      400 2.1669342
##   36        11 0.2176063      574 1.7255217
##  100        11 0.2176063      700 1.5650080
##  240        11 0.2176063      424 1.2841091
##  248        11 0.2176063      765 0.9630819
##  315        11 0.2176063      379 0.9229535
## [1] "Nonzero Features"
## [1] 1 100 200 300 400 500 600 700 800 900
##
## [1] EXPERIMENT 3
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0       15       30       30       60
## [1] 1 100 200 300 400 500 600 700 800 900
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "1"
## CBDA Frequency   Density Knockoff   Density
##  300        29 0.2576862      500 14.705882
##  400        26 0.2310290        1  8.823529
##  800        26 0.2310290      100  8.823529
##  100        22 0.1954860      800  8.823529
##  393        22 0.1954860      900  8.823529
##  781        22 0.1954860      139  5.882353
##    1        21 0.1866003      548  5.882353
##  335        21 0.1866003      574  5.882353
##  491        21 0.1866003       14  2.941176
##  519        21 0.1866003      112  2.941176
##  585        21 0.1866003      246  2.941176
##  586        21 0.1866003      563  2.941176
##  692        21 0.1866003      590  2.941176
##  847        21 0.1866003      596  2.941176
##  896        21 0.1866003      676  2.941176
## [1] "Nonzero Features"
## [1] 1 100 200 300 400 500 600 700 800 900
##
## [1] EXPERIMENT 4
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0        1        5       60       80
## [1] 1 100 200 300 400 500 600 700 800 900
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "1"
## CBDA Frequency   Density Knockoff  Density
##  300        14 0.8668731      600 9.751693
##  795         8 0.4953560      800 9.255079
##  223         7 0.4334365      100 8.397291
##  398         7 0.4334365      900 8.352144
##   35         6 0.3715170      500 7.088036
##  171         6 0.3715170      300 6.817156
##  189         6 0.3715170        1 6.546275
##  539         6 0.3715170      200 4.695260
##  556         6 0.3715170      400 3.431151
##  606         6 0.3715170      108 3.250564
##  656         6 0.3715170      700 2.437923
##  688         6 0.3715170      574 1.489842
##  701         6 0.3715170      738 1.218962
##    4         5 0.3095975      424 1.083521
##   20         5 0.3095975      548 1.083521
## [1] "Nonzero Features"
## [1] 1 100 200 300 400 500 600 700 800 900
##
## [1] EXPERIMENT 5
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0        5       15       60       80
## [1] 1 100 200 300 400 500 600 700 800 900
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "1"
## CBDA Frequency   Density Knockoff    Density
##  300        16 0.3485839      600 13.1027993
##  409        15 0.3267974      800 11.9501294
##  400        14 0.3050109      900 11.0326982
##  650        13 0.2832244      100 10.6092684
##   20        12 0.2614379        1  8.8214538
##  332        12 0.2614379      300  8.6803105
##  333        12 0.2614379      500  8.1392614
##  415        12 0.2614379      200  5.1987768
##  449        12 0.2614379      108  3.3639144
##  638        12 0.2614379      400  3.1051517
##  754        12 0.2614379      700  2.0465773
##   28        11 0.2396514      424  1.3173371
##   49        11 0.2396514      379  0.7998118
##   95        11 0.2396514      762  0.7527641
##  149        11 0.2396514      738  0.7292402
## [1] "Nonzero Features"
## [1] 1 100 200 300 400 500 600 700 800 900
##
## [1] EXPERIMENT 6
##    M misValperc Kcol_min Kcol_max Nrow_min Nrow_max
## 9000          0       15       30       60       80
## [1] 1 100 200 300 400 500 600 700 800 900
## [1] "TABLE with CBDA-SL & KNOCKOFF FILTER RESULTS"
## [1] "EXPERIMENT" "1"
## CBDA Frequency   Density Knockoff   Density
##  100        26 0.2414786      600 10.910265
##  800        25 0.2321910      100 10.264687
##  700        23 0.2136157      800  9.748225
##  300        22 0.2043280      900  8.715300
##  508        22 0.2043280        1  8.134280
##   18        21 0.1950404      300  6.455778
##  254        21 0.1950404      500  6.455778
##  209        20 0.1857528      200  5.745642
##  400        20 0.1857528      108  3.938025
##  590        20 0.1857528      400  3.679793
##  654        20 0.1857528      700  1.807618
##  659        20 0.1857528      574  1.678502
##  690        20 0.1857528      762  1.355713
##  900        20 0.1857528      379  1.226598
##  105        19 0.1764651      424  1.226598
## [1] "Nonzero Features"
## [1] 1 100 200 300 400 500 600 700 800 900
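One way to read the tables above is to count how many of the "Nonzero Features" each method places in its top 15. The Python sketch below does this for Experiment 1, using the values copied from its table. It assumes that the "Nonzero Features" line lists the features that truly carry signal in the data; the helper name `recovered` is illustrative.

```python
# Features listed under "Nonzero Features" (assumed to be the true signal).
nonzero = {1, 100, 200, 300, 400, 500, 600, 700, 800, 900}

# Top-15 lists copied from the Experiment 1 table.
cbda_top = [300, 400, 449, 747, 20, 73, 409, 643, 671, 706, 11, 30, 31, 34, 78]
knockoff_top = [800, 600, 900, 100, 500, 300, 1, 200, 400, 108,
                574, 700, 424, 650, 139]

def recovered(selected, truth):
    """Return the selected features that are also in the truth set, sorted."""
    return sorted(set(selected) & truth)

cbda_hits = recovered(cbda_top, nonzero)
knockoff_hits = recovered(knockoff_top, nonzero)
print("CBDA-SL:", len(cbda_hits), "of", len(nonzero), cbda_hits)
print("Knockoff:", len(knockoff_hits), "of", len(nonzero), knockoff_hits)
```

For Experiment 1 this gives 2 of 10 for the CBDA-SL top 15 (features 300 and 400) and 10 of 10 for the knockoff filter top 15.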
The features listed above are then used to run a final analysis applying both the CBDA-SL and the knockoff filter. Only the features listed above are used in that analysis. The accuracy of the overall procedure is assessed by applying the CBDA-SL object to the subset of subjects held out for prediction, and the resulting predictions are used to generate the confusion matrix. In short, the first stage combines the CBDA-SL and knockoff filter algorithms to select the top features, and the second stage uses those features in a final predictive modeling step that can be tested for accuracy, sensitivity, …
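As an illustration of that last step, the Python sketch below builds a 2x2 confusion matrix from held-out predictions and derives accuracy, sensitivity and specificity from it. It assumes a binary outcome coded as 0/1; the function names and the toy labels are made up and are not output from the experiments.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def summarize(y_true, y_pred):
    """Accuracy, sensitivity and specificity from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else float("nan"),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }

# Toy held-out labels and predictions (made up for illustration).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))
print(summarize(y_true, y_pred))
```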