Effect of GDSTM-Based Decorrelation on Feature Discovery: The DARWIN Evaluation

Here I showcase how to use the BSWiMS feature selection/modeling function coupled with the Goal Driven Sparse Transformation Matrix (GDSTM) as a pre-processing step to decorrelate highly correlated features. The aims are:

  1. To improve model performance by uncovering the hidden information between correlated features.

  2. To simplify the interpretation of the machine learning models.

This demo will use:

Loading the libraries

library("FRESA.CAD")
library(readxl)
library(vioplot)
library(igraph)
library(stringr)   # str_replace_all(), str_detect() and str_remove() are used below

op <- par(no.readonly = TRUE)
pander::panderOptions('digits', 3)
pander::panderOptions('table.split.table', 400)
pander::panderOptions('keep.trailing.zeros',TRUE)

Material and Methods

Signed Log Transform

The following function is used to transform all the continuous features of the data:

signedlog <- function(x) { return (sign(x)*log(abs(x)+1.0e-12))}
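As a quick sanity check, the snippet below evaluates signedlog() (same definition as above) on a few values. Large magnitudes are compressed while the sign is kept. Note, however, that magnitudes below 1 have a negative logarithm, so 0.5 maps to about -0.69, -0.5 maps to about +0.69, and 0 maps to 0.

```r
# Same definition as above: sign-preserving log compression.
signedlog <- function(x) { return (sign(x)*log(abs(x)+1.0e-12))}

x <- c(-100, -0.5, 0, 0.5, 100)
round(signedlog(x), 3)
# approx. -4.605  0.693  0.000 -0.693  4.605
```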

Data: The DARWIN Data-Set

The data to process is described in:

Cilia, Nicole D., Giuseppe De Gregorio, Claudio De Stefano, Francesco Fontanella, Angelo Marcelli, and Antonio Parziale. “Diagnosing Alzheimer’s disease from on-line handwriting: a novel dataset and performance benchmarking.” Engineering Applications of Artificial Intelligence 111 (2022): 104822.

From the DARWIN_readme.rtf

The DARWIN dataset contains handwriting data collected according to the acquisition protocol described in [1], which is composed of 25 handwriting tasks. The protocol was specifically designed for the early detection of Alzheimer’s disease (AD). The dataset includes data from 174 participants (89 AD patients and 85 healthy people). The file “DARWIN.csv” contains the acquired data. The file consists of one row for each participant plus an additional header row. The first row is the header row, the next 89 rows collect patients data, whereas the remaining 84 [sic; 85] rows collect information from healthy people. The file consists of 452 columns. The first column shows participants’ identifiers, whereas the last column shows the class to which each participant belongs. This value can be equal to ‘P’ (Patient) or ‘H’ (Healthy). The remaining columns report the features extracted from a specific task. The tasks performed are 25, and for each task 18 features have been extracted. The column will be identified by the name of the features followed by a numeric identifier representing the task the feature is extracted. E.g., the column with the header “total_time8” collects the values for the “total time” feature extracted from task #8.
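Since every feature column is named as a feature name followed by a task number, the two parts can be separated with regular expressions. A minimal base-R sketch; splitFeatureTask() is a hypothetical helper, not a FRESA.CAD function, and it assumes every name passed to it ends in digits (so the ID and class columns must be excluded):

```r
# Hypothetical helper: split DARWIN-style names such as "total_time8" into the
# feature name ("total_time") and the task number (8). Assumes all names end in digits.
splitFeatureTask <- function(x) {
  data.frame(
    feature = sub("[0-9]+$", "", x),                            # drop the trailing task number
    task    = as.integer(regmatches(x, regexpr("[0-9]+$", x))), # keep only the task number
    stringsAsFactors = FALSE
  )
}

splitFeatureTask(c("total_time8", "mean_jerk_in_air19", "gmrt_on_paper15"))
```

Applied to all feature columns, and per the readme, each of the 25 task numbers should appear 18 times in the task column.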

DARWIN <- read.csv("~/GitHub/FCA/Data/DARWIN/DARWIN.csv")
rownames(DARWIN) <- DARWIN$ID
DARWIN$ID <- NULL
DARWIN$class <- 1*(DARWIN$class=="P")
print(table(DARWIN$class))
#> 
#>  0  1 
#> 85 89

DARWIN[,1:ncol(DARWIN)] <- sapply(DARWIN,as.numeric)

whof <- !(colnames(DARWIN) %in% c("class"));
DARWIN[,whof] <- signedlog(DARWIN[,whof])

## The fraction of the data used for training

trainFraction=0.65;

## The file with codes for creating shorter names
namecode <- read.csv("~/GitHub/FCA/Data/DARWIN/Darnames.csv")

Correlation Matrix of the DARWIN Data

cormat <- cor(DARWIN,method="spearman")
gplots::heatmap.2(abs(cormat),
                  trace = "none",
                  scale = "none",
                  mar = c(10,10),
                  col=rev(heat.colors(5)),
                  main = "Raw Correlation",
                  cexRow = 0.45,
                  cexCol = 0.45,
                  key.title=NA,
                  key.xlab="Spearman Correlation",
                  xlab="Feature", ylab="Feature")

Train and test set

set.seed(2)
caseSet <- subset(DARWIN, class == 1)
controlSet <- subset(DARWIN, class == 0)
caseTrainSize <- nrow(caseSet)*trainFraction;
controlTrainSize <- nrow(controlSet)*trainFraction;
sampleCaseTrain <- sample(nrow(caseSet),caseTrainSize)
sampleControlTrain <- sample(nrow(controlSet),controlTrainSize)
trainSet <- rbind(caseSet[sampleCaseTrain,], controlSet[sampleControlTrain,])
testSet <-  rbind(caseSet[-sampleCaseTrain,],controlSet[-sampleControlTrain,])
pander::pander(table(trainSet$class))
0 1
55 57
pander::pander(table(testSet$class))
0 1
30 32
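The split above samples trainFraction of the rows within each class separately, so the class balance is preserved; the non-integer sizes (89 × 0.65 = 57.85 and 85 × 0.65 = 55.25) are truncated by sample(), which gives the 57 and 55 training subjects shown. A compact version of the same idea follows; stratifiedSplit() is a hypothetical helper and, because it draws indices in a different order, it will not reproduce the exact rows selected above:

```r
# Hypothetical helper (not part of FRESA.CAD): stratified holdout split that
# draws the same fraction of rows from each class, as in the code above.
stratifiedSplit <- function(data, outcome, trainFraction = 0.65, seed = 2) {
  set.seed(seed)
  byClass <- split(seq_len(nrow(data)), data[[outcome]])
  trainIdx <- unlist(lapply(byClass, function(idx)
    idx[sample.int(length(idx), floor(length(idx) * trainFraction))]))
  list(train = data[trainIdx, , drop = FALSE],
       test  = data[-trainIdx, , drop = FALSE])
}

toy <- data.frame(x = rnorm(174), class = rep(c(1, 0), c(89, 85)))
s <- stratifiedSplit(toy, "class")
table(s$train$class)  # 55 controls, 57 cases
table(s$test$class)   # 30 controls, 32 cases
```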

Decorrelation: Training and Testing Sets Creation

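I compute decorrelated versions of the training and testing sets using the GDSTMDecorrelation() function of FRESA.CAD. The first decorrelation is driven by the features associated with the outcome (Outcome = "class"); the second finds the GDSTM without the outcome restriction. In both cases the transformation is estimated on the training set with a correlation threshold of thr = 0.8 and then applied to the testing set with predictDecorrelate().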
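GDSTMDecorrelation() itself is not reproduced here. For intuition only: the derived formulas printed later in this document (for example, De_paper_time10 = paper_time10 - 1.62 disp_index10) express each decorrelated feature as the original feature minus a scaled "base" feature. The toy sketch below assumes the scale is an ordinary least-squares slope (an assumption, not necessarily what GDSTM estimates) and shows that the resulting difference is uncorrelated with the base feature; decorrelatePair() is hypothetical.

```r
# Hypothetical helper: remove the linear dependence of `target` on `base`.
# The OLS slope is an assumption made for illustration only.
decorrelatePair <- function(target, base) {
  beta <- unname(coef(lm(target ~ base))[2])      # OLS slope of target on base
  list(beta = beta, de = target - beta * base)    # "De_" feature = target - beta * base
}

set.seed(1)
base   <- rnorm(200)
target <- 0.9 * base + rnorm(200, sd = 0.3)       # strongly correlated with base
res <- decorrelatePair(target, base)
cor(target, base)   # high
cor(res$de, base)   # essentially zero
```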

## The GDSTM transformation driven by the Outcome
deTrain <- GDSTMDecorrelation(trainSet,Outcome="class",thr=0.8,verbose = TRUE)

Included: 378 , Uni p: 0.005434231 To Outcome: 220 , Base: 33 , In Included: 33 , Base Cor: 51 1 , Top: 130 < 0.8 >( 1 ).1 : 0 : 0.792,<|>Tot Used: 294 , Added: 169 , Zero Std: 0 , Max Cor: 0.9968972 2 , Top: 33 < 0.8 >FALSE1 : 0 : 0,<|>Tot Used: 309 , Added: 35 , Zero Std: 0 , Max Cor: 0.9902345 3 , Top: 2 < 0.8 >( 1 )1 : 0 : 0.8,<|>Tot Used: 309 , Added: 2 , Zero Std: 0 , Max Cor: 0.7977194 [ 4 ], 0.7977194 . Cor to Base: 175 , ABase: 180

deTest <- predictDecorrelate(deTrain,testSet)

## The GDSTM transformation without outcome
deTrainU <- GDSTMDecorrelation(trainSet,thr=0.8,verbose = TRUE)

Included: 378 , Uni p: 0.005434231 To Outcome: 0 , Base: 0 , In Included: 0 , Base Cor: 0 1 , Top: 131 < 0.8 >( 4 ).1 : 0 : 0,<|>Tot Used: 301 , Added: 173 , Zero Std: 0 , Max Cor: 0.9683179 2 , Top: 28 < 0.8 >( 1 )1 : 0 : 0,<|>Tot Used: 313 , Added: 29 , Zero Std: 0 , Max Cor: 0.8045096 3 , Top: 2 < 0.8 >( 1 )1 : 0 : 0.8,<|>Tot Used: 313 , Added: 2 , Zero Std: 0 , Max Cor: 0.7970469 [ 4 ], 0.7921168 . Cor to Base: 173 , ABase: 180

deTestU <- predictDecorrelate(deTrainU,testSet)
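predictDecorrelate() presumably applies the transformation estimated on the training set to the testing set rather than re-estimating it, which keeps test information out of the transformation. Assuming the transformation can be written as a sparse matrix W (as the GDSTM name suggests), applying it is a matrix product. A toy three-feature sketch; only the -1.62 coefficient comes from the printed De_paper_time10 formula, and the data are random:

```r
# Hypothetical sparse transformation matrix: column j defines output feature j
# as a linear combination of the input features (identity except one entry).
features <- c("disp_index10", "paper_time10", "air_time10")
W <- diag(3)
dimnames(W) <- list(features, c("disp_index10", "De_paper_time10", "air_time10"))
W["disp_index10", "De_paper_time10"] <- -1.62

set.seed(4)
Xtrain <- matrix(rnorm(30), ncol = 3, dimnames = list(NULL, features))
Xtest  <- matrix(rnorm(12), ncol = 3, dimnames = list(NULL, features))

# The same W (learned on training data only) is applied to both sets.
deXtrain <- Xtrain %*% W
deXtest  <- Xtest %*% W
head(deXtest)
```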

Correlation Matrix of the Decorrelated Test Data

The heat map of the testing set.

cormat <- cor(deTest,method="spearman")
gplots::heatmap.2(abs(cormat),
                  trace = "none",
                  scale = "none",
                  mar = c(10,10),
                  col=rev(heat.colors(5)),
                  main = "Test Set Correlation after GDSTM",
                  cexRow = 0.45,
                  cexCol = 0.45,
                  key.title=NA,
                  key.xlab="Spearman Correlation",
                  xlab="Feature", ylab="Feature")
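The heat maps give a visual impression; a single number is also handy when comparing the raw and decorrelated sets. maxOffDiagCor() below is a hypothetical helper returning the largest absolute off-diagonal Spearman correlation. On the real data one could compare its value on the testing set with its value on deTest (excluding the class column); the verbose log of GDSTMDecorrelation() above reports a final "Max Cor" just below the 0.8 threshold for the training set.

```r
# Hypothetical helper: largest absolute off-diagonal Spearman correlation,
# a one-number summary of how strongly a set of features is still correlated.
maxOffDiagCor <- function(X) {
  cm <- abs(cor(X, method = "spearman"))
  diag(cm) <- 0                      # ignore each feature's correlation with itself
  max(cm, na.rm = TRUE)              # NA can arise from constant columns
}

set.seed(5)
a <- rnorm(100)
X <- cbind(a = a, b = a + rnorm(100, sd = 0.1), c = rnorm(100))
m1 <- maxOffDiagCor(X)                                   # close to 1: a and b
Xde <- cbind(a = X[, "a"], c = X[, "c"], De_b = X[, "b"] - X[, "a"])
m2 <- maxOffDiagCor(Xde)                                 # much lower after removing a from b
c(before = m1, after = m2)
```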

Holdout Cross-Validation

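Before doing the feature analysis, I’ll explore BSWiMS modeling using the holdout cross-validation function of FRESA.CAD, randomCV(), with 150 repetitions. The purpose of the cross-validation is to estimate the performance gain provided by decorrelation. The decorrelated runs reuse the training samples of the raw run (trainSampleSets), so the three sets of test predictions are paired.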
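randomCV() repeats stratified holdout splits and, judging from the Min/Max/Mean Tests counters in its log, each subject lands in the test set many times. medianTest then appears to hold, per subject, the outcome (column 1) and an aggregated, presumably median, test prediction (column 2), as its later use suggests. The toy sketch below mimics that idea with a plain logistic regression instead of BSWiMS; repeatedHoldout() is hypothetical and is not how FRESA.CAD implements randomCV().

```r
# Hypothetical sketch of repeated holdout CV with per-subject median test predictions.
repeatedHoldout <- function(data, outcome, trainFraction = 0.65, repetitions = 50) {
  n <- nrow(data)
  preds <- vector("list", n)                       # test predictions collected per subject
  form <- reformulate(setdiff(names(data), outcome), outcome)
  byClass <- split(seq_len(n), data[[outcome]])    # stratify by class
  for (r in seq_len(repetitions)) {
    trainIdx <- unlist(lapply(byClass, function(i)
      i[sample.int(length(i), floor(length(i) * trainFraction))]))
    testIdx <- setdiff(seq_len(n), trainIdx)
    fit <- glm(form, data = data[trainIdx, ], family = binomial)
    p <- predict(fit, newdata = data[testIdx, ], type = "response")
    for (k in seq_along(testIdx)) {
      preds[[testIdx[k]]] <- c(preds[[testIdx[k]]], p[[k]])
    }
  }
  tested <- lengths(preds) > 0
  data.frame(Outcome = data[[outcome]][tested],
             Median  = vapply(preds[tested], median, numeric(1)))
}

set.seed(3)
toy <- data.frame(x1 = rnorm(120), x2 = rnorm(120))
toy$class <- rbinom(120, 1, plogis(1.5 * toy$x1 - toy$x2))
mt <- repeatedHoldout(toy, "class")
mean(abs(mt$Outcome - mt$Median) < 0.5)   # accuracy of the median predictions
```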

par(op)
par(mfrow=c(1,3))

## The Raw validation
cvBSWiMSRaw <- randomCV(DARWIN,
                "class",
                fittingFunction= BSWiMS.model,
                classSamplingType = "Pro",
                trainFraction = trainFraction,
                repetitions = 150
)

150 Tested: 174 Avg. Selected: 40.26 Min Tests: 40 Max Tests: 68 Mean Tests: 53.44828 . MAD: 0.2521148


bpraw <- predictionStats_binary(cvBSWiMSRaw$medianTest,"BSWiMS RAW",cex=0.60)

BSWiMS RAW

pander::pander(bpraw$CM.analysis$tab)
  Outcome + Outcome - Total
Test + 73 11 84
Test - 16 74 90
Total 89 85 174
pander::pander(bpraw$accc)
est lower upper
0.845 0.782 0.895
pander::pander(bpraw$aucs)
est lower upper
0.947 0.918 0.976
pander::pander(bpraw$berror)
50% 2.5% 97.5%
0.153 0.102 0.209
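The point estimates in these tables follow from the confusion matrix. Using the RAW counts above, accuracy = (73 + 74)/174 ≈ 0.845, and the balanced error 1 - (sensitivity + specificity)/2 ≈ 0.155 is consistent with the reported median of 0.153 (the reported values come with 95% intervals, so they need not match the point estimate exactly; that berror is the balanced error rate is an assumption). The AUC of a set of scores equals the normalized Mann-Whitney statistic. A base-R sketch; aucMannWhitney() is a hypothetical helper:

```r
# Confusion-matrix counts reported above for the RAW model.
TP <- 73; FP <- 11; FN <- 16; TN <- 74

accuracy      <- (TP + TN) / (TP + FP + FN + TN)   # 147/174
sensitivity   <- TP / (TP + FN)                    # 73/89
specificity   <- TN / (TN + FP)                    # 74/85
balancedError <- 1 - (sensitivity + specificity) / 2
round(c(accuracy = accuracy, balancedError = balancedError), 3)  # 0.845, 0.155

# Rank-based (Mann-Whitney) AUC: the probability that a randomly chosen case
# receives a higher score than a randomly chosen control (ties count one half).
aucMannWhitney <- function(y, score) {
  r <- rank(score)
  n1 <- sum(y == 1); n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
aucMannWhitney(c(1, 1, 0, 0), c(0.9, 0.4, 0.6, 0.1))  # 0.75
```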

## The validation with Outcome-driven Decorrelation
cvBSWiMSDeCor <- randomCV(DARWIN,
                "class",
                trainSampleSets= cvBSWiMSRaw$trainSamplesSets,
                fittingFunction= filteredFit,
                fitmethod=BSWiMS.model,
                filtermethod=NULL,
                DECOR = TRUE,
                DECOR.control=list(Outcome="class",thr=0.8)
)

150 Tested: 174 Avg. Selected: 30.81333 Min Tests: 40 Max Tests: 68 Mean Tests: 53.44828 . MAD: 0.2514641


bpDecor <- predictionStats_binary(cvBSWiMSDeCor$medianTest,"Outcome-Driven GDSTM",cex=0.60)

Outcome-Driven GDSTM

pander::pander(bpDecor$CM.analysis$tab)
  Outcome + Outcome - Total
Test + 77 11 88
Test - 12 74 86
Total 89 85 174
pander::pander(bpDecor$accc)
est lower upper
0.868 0.808 0.914
pander::pander(bpDecor$aucs)
est lower upper
0.956 0.93 0.983
pander::pander(bpDecor$berror)
50% 2.5% 97.5%
0.132 0.0861 0.179

### One-sided DeLong test: is the AUC of the outcome-driven decorrelation ROC greater than that of the RAW ROC?
pander::pander(roc.test(bpDecor$ROC.analysis$roc.predictor,bpraw$ROC.analysis$roc.predictor,alternative = "greater"))
DeLong’s test for two correlated ROC curves: bpDecor$ROC.analysis$roc.predictor and bpraw$ROC.analysis$roc.predictor
Test statistic P value Alternative hypothesis AUC of roc1 AUC of roc2
1.8 0.0363 * greater 0.956 0.947

### Testing the probability improvement (IDI and NRI)
iprob <- .Call("improveProbCpp",cvBSWiMSRaw$medianTest[,2],
               cvBSWiMSDeCor$medianTest[,2],
               cvBSWiMSRaw$medianTest[,1]);
pander::pander(iprob)
  • z.idi: 0.15
  • z.nri: -1.3
  • idi: 0.0011
  • nri: -0.196
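improveProbCpp is a compiled routine inside FRESA.CAD. Its output names suggest the integrated discrimination improvement (IDI) and the net reclassification improvement (NRI), as reported by Hmisc::improveProb(). Assuming the usual category-free definitions, the two point estimates can be sketched in base R as follows; idiNri() is hypothetical and omits the z-scores. Here pOld plays the role of the raw predictions and pNew that of the decorrelated ones.

```r
# Hypothetical helper: category-free NRI and IDI for a new vs. an old model.
idiNri <- function(pOld, pNew, y) {
  d <- pNew - pOld
  cases <- y == 1
  controls <- y == 0
  idi <- mean(d[cases]) - mean(d[controls])               # gain in mean separation
  nri <- (mean(d[cases] > 0) - mean(d[cases] < 0)) +      # cases: moved up minus down
         (mean(d[controls] < 0) - mean(d[controls] > 0))  # controls: moved down minus up
  c(idi = idi, nri = nri)
}

y    <- c(1, 1, 1, 0, 0, 0)
pOld <- c(0.6, 0.7, 0.4, 0.3, 0.5, 0.2)
pNew <- c(0.8, 0.6, 0.5, 0.2, 0.5, 0.3)
idiNri(pOld, pNew, y)   # idi about 0.067, nri = 1/3
```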
### Testing the accuracy improvement
testRaw <- (cvBSWiMSRaw$medianTest[,1]-cvBSWiMSRaw$medianTest[,2])<0.5
testDecor <- (cvBSWiMSDeCor$medianTest[,1]-cvBSWiMSDeCor$medianTest[,2])<0.5
pander::pander(mcnemar.test(testRaw,testDecor))
McNemar’s Chi-squared test with continuity correction: testRaw and testDecor
Test statistic df P value
2.25 1 0.134
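A note on the accuracy comparison: assuming column 2 of medianTest (and the output of predict()) is a probability in [0, 1], the expression (outcome - prediction) < 0.5 used in this document is TRUE for every control, because 0 - p is never above 0. Only misclassified cases are therefore counted as errors, and the McNemar tests compare the models on the cases alone. A symmetric correctness indicator uses abs(), as in the toy example below (the data and the models A and B are hypothetical):

```r
# Hypothetical outcomes and predicted probabilities from two models A and B.
y  <- c(1, 1, 1, 0, 0, 0)
pA <- c(0.9, 0.4, 0.7, 0.2, 0.8, 0.1)
pB <- c(0.8, 0.6, 0.3, 0.6, 0.3, 0.2)

oneSided <- (y - pA) < 0.5      # form used above: TRUE for every control
correctA <- abs(y - pA) < 0.5   # symmetric: correct when |y - p| < 0.5
correctB <- abs(y - pB) < 0.5

rbind(oneSided, correctA)       # they disagree on subject 5 (a control with p = 0.8)

# Paired 2x2 table of correct/incorrect calls, then McNemar's test on it.
lv  <- c(FALSE, TRUE)
tab <- table(A = factor(correctA, levels = lv), B = factor(correctB, levels = lv))
tab
mcnemar.test(tab)
```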

## The validation of Decorrelation without the outcome restriction
cvBSWiMSDeCorU <- randomCV(DARWIN,
                "class",
                trainSampleSets= cvBSWiMSRaw$trainSamplesSets,
                fittingFunction= filteredFit,
                fitmethod=BSWiMS.model,
                filtermethod=NULL,
                DECOR = TRUE,
                DECOR.control=list(thr=0.8)
)

150 Tested: 174 Avg. Selected: 36.19333 Min Tests: 40 Max Tests: 68 Mean Tests: 53.44828 . MAD: 0.2573273


bpDecorU <- predictionStats_binary(cvBSWiMSDeCorU$medianTest,"Blind Decorrelation",cex=0.60)

Blind Decorrelation

pander::pander(bpDecorU$CM.analysis$tab)
  Outcome + Outcome - Total
Test + 80 9 89
Test - 9 76 85
Total 89 85 174
pander::pander(bpDecorU$accc)
est lower upper
0.897 0.841 0.938
pander::pander(bpDecorU$aucs)
est lower upper
0.958 0.933 0.983
pander::pander(bpDecorU$berror)
50% 2.5% 97.5%
0.103 0.0623 0.154

### One-sided DeLong test: is the AUC of the blind decorrelation ROC greater than that of the RAW ROC?

pander::pander(roc.test(bpDecorU$ROC.analysis$roc.predictor,bpraw$ROC.analysis$roc.predictor,alternative = "greater"))
DeLong’s test for two correlated ROC curves: bpDecorU$ROC.analysis$roc.predictor and bpraw$ROC.analysis$roc.predictor
Test statistic P value Alternative hypothesis AUC of roc1 AUC of roc2
1.97 0.0242 * greater 0.958 0.947
par(op)

## Testing probability improvement
iprob <- .Call("improveProbCpp",cvBSWiMSRaw$medianTest[,2],cvBSWiMSDeCorU$medianTest[,2],cvBSWiMSRaw$medianTest[,1]);
pander::pander(iprob)
  • z.idi: -1.23
  • z.nri: -2.65
  • idi: -0.0105
  • nri: -0.393

## Testing accuracy improvement
testDecorU <- (cvBSWiMSDeCorU$medianTest[,1]-cvBSWiMSDeCorU$medianTest[,2])<0.5
pander::pander(mcnemar.test(testRaw,testDecorU))
McNemar’s Chi-squared test with continuity correction: testRaw and testDecorU
Test statistic df P value
5.14 1 0.0233 *

The Raw Model vs. the Decorrelated-Based Model

After showing that decorrelation can improve BSWiMS model performance, I’ll focus on showcasing its ability to discover new features associated with the outcome.

First, I’ll fit BSWiMS models (NumberofRepeats = 20) to the original training set and to the two decorrelated training sets, and evaluate them on the holdout test set. After that, I’ll test the statistical difference between the ROC curves and between the test accuracies of the raw and decorrelated models.

par(op)
par(mfrow=c(1,3))

bm <- BSWiMS.model(class~.,trainSet,NumberofRepeats = 20)


bpraw <- predictionStats_binary(cbind(testSet$class,predict(bm,testSet)),"BSWiMS RAW",cex=0.60)

BSWiMS RAW


bmd <- BSWiMS.model(class~.,deTrain,NumberofRepeats = 20)


bpdecor <- predictionStats_binary(cbind(deTest$class,predict(bmd,deTest)),"Outcome-Driven Decor",cex=0.60)

Outcome-Driven Decor


## Comparing the two ROC curves
pander::pander(roc.test(bpdecor$ROC.analysis$roc.predictor,bpraw$ROC.analysis$roc.predictor,alternative = "greater"))
DeLong’s test for two correlated ROC curves: bpdecor$ROC.analysis$roc.predictor and bpraw$ROC.analysis$roc.predictor
Test statistic P value Alternative hypothesis AUC of roc1 AUC of roc2
1.85 0.0321 * greater 0.954 0.917
## Comparing the test accuracies
testRaw <- (testSet$class-predict(bm,testSet))<0.5
testDecor <- (deTest$class-predict(bmd,deTest))<0.5
pander::pander(mcnemar.test(testRaw,testDecor))
McNemar’s Chi-squared test with continuity correction: testRaw and testDecor
Test statistic df P value
1.33 1 0.248


bmdU <- BSWiMS.model(class~.,deTrainU,NumberofRepeats = 20)


bpdecorU <- predictionStats_binary(cbind(deTestU$class,predict(bmdU,deTestU)),"Blind Decorrelation",cex=0.60)

Blind Decorrelation


## Comparing the ROC curves
pander::pander(roc.test(bpdecorU$ROC.analysis$roc.predictor,bpraw$ROC.analysis$roc.predictor,alternative = "greater"))
DeLong’s test for two correlated ROC curves: bpdecorU$ROC.analysis$roc.predictor and bpraw$ROC.analysis$roc.predictor
Test statistic P value Alternative hypothesis AUC of roc1 AUC of roc2
1.63 0.0514 greater 0.947 0.917
## Comparing the accuracies
testDecorU <- (deTestU$class-predict(bmdU,deTestU))<0.5
pander::pander(mcnemar.test(testRaw,testDecorU))
McNemar’s Chi-squared test: testRaw and testDecorU
Test statistic df P value
0 1 1

par(op)

The Feature Associations

I’ll plot the graphs showing the associations between features. Each feature cluster represents a logistic regression formula (formula nugget) discovered by the BSWiMS method. The figure will plot:

  • Raw formula network

  • Outcome-driven network

  • Blind network

The plots show only the features with a formula-network occurrence of at least 0.5 (50%), and only the feature-to-feature associations of at least 0.25 (25%).
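The network code below applies a node filter (cmax >= 0.5, the strongest weight in each column of formulaNetwork) and an edge filter (weights below 0.25 set to 0); graph_from_adjacency_matrix(..., mode = "undirected", diag = FALSE, weighted = TRUE) then turns the remaining nonzero off-diagonal entries into weighted edges. A base-R sketch of the same filtering on a hypothetical 4 × 4 matrix, producing an edge list instead of an igraph object:

```r
# Hypothetical symmetric formula-network weights for four features.
feats <- c("f1", "f2", "f3", "f4")
fn <- matrix(c(1.0, 0.6, 0.1, 0.3,
               0.6, 1.0, 0.2, 0.0,
               0.1, 0.2, 0.4, 0.1,
               0.3, 0.0, 0.1, 0.9),
             nrow = 4, dimnames = list(feats, feats))

cmax <- apply(fn, 2, max)             # strongest weight in each column
keep <- names(cmax[cmax >= 0.5])      # node filter: keep features with cmax >= 0.5
adma <- fn[keep, keep]
adma[adma < 0.25] <- 0                # edge filter: drop weights below 0.25

# Undirected, weighted edges: nonzero upper-triangle entries (diagonal ignored).
idx <- which(upper.tri(adma) & adma > 0, arr.ind = TRUE)
edges <- data.frame(from = rownames(adma)[idx[, 1]],
                    to = colnames(adma)[idx[, 2]],
                    weight = adma[idx])
edges
```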

par(op)
par(mfrow=c(1,3))
### The raw model

pander::pander(nrow(bm$bagging$formulaNetwork))

112



cmax <- apply(bm$bagging$formulaNetwork,2,max)
cnames <- names(cmax[cmax>=0.5])
cmax <- cmax[cmax>=0.5]
adma <- bm$bagging$formulaNetwork[cnames,cnames]

for (cx in c(1:nrow(namecode)))
{
  cnames <- str_replace_all(cnames,namecode[cx,1],namecode[cx,2])
}
cnames <- str_replace_all(cnames,"_","")
cnames <- str_replace_all(cnames,"th","")
rownames(adma) <- cnames
colnames(adma) <- cnames
names(cmax) <- cnames
adma[adma<0.25] <- 0;
gr <- graph_from_adjacency_matrix(adma,mode = "undirected",diag = FALSE,weighted=TRUE)
gr$layout <- layout_with_fr

fc <- cluster_optimal(gr)
plot(fc, gr,
     vertex.size=20*cmax,
     vertex.label.cex=0.5,
     vertex.label.dist=0,
     main="Original Feature Association")



### The Outcome Driven Model

pander::pander(nrow(bmd$bagging$formulaNetwork))

103



cmax <- apply(bmd$bagging$formulaNetwork,2,max)
cnames <- names(cmax[cmax>=0.5])
outcomeNames <- cnames

cmax <- cmax[cmax>=0.5]
adma <- bmd$bagging$formulaNetwork[cnames,cnames]

for (cx in c(1:nrow(namecode)))
{
  cnames <- str_replace_all(cnames,namecode[cx,1],namecode[cx,2])
}
cnames <- str_replace_all(cnames,"_","")
cnames <- str_replace_all(cnames,"th","")
rownames(adma) <- cnames
colnames(adma) <- cnames
names(cmax) <- cnames
adma[adma<0.25] <- 0;
gr <- graph_from_adjacency_matrix(adma,mode = "undirected",diag = FALSE,weighted=TRUE)
gr$layout <- layout_with_fr

fc <- cluster_optimal(gr)
clusterOutcome <- fc
clusterOutcome$names <- outcomeNames

plot(fc, gr,
     vertex.size=20*cmax,
     vertex.label.cex=0.5,
     vertex.label.dist=0,
     main="Outcome-Driven Decorrelation")


### The Blind Decorrelation

pander::pander(nrow(bmdU$bagging$formulaNetwork))

83



cmax <- apply(bmdU$bagging$formulaNetwork,2,max)
cnames <- names(cmax[cmax>=0.5])
cmax <- cmax[cmax>=0.5]
adma <- bmdU$bagging$formulaNetwork[cnames,cnames]

for (cx in c(1:nrow(namecode)))
{
  cnames <- str_replace_all(cnames,namecode[cx,1],namecode[cx,2])
}
cnames <- str_replace_all(cnames,"_","")
cnames <- str_replace_all(cnames,"th","")
rownames(adma) <- cnames
colnames(adma) <- cnames
names(cmax) <- cnames
adma[adma<0.25] <- 0;
gr <- graph_from_adjacency_matrix(adma,mode = "undirected",diag = FALSE,weighted=TRUE)
gr$layout <- layout_with_fr

fc <- cluster_optimal(gr)
plot(fc, gr,
     vertex.size=20*cmax,
     vertex.label.cex=0.5,
     vertex.label.dist=0,
     main="Blind Decorrelation")

Feature Analysis of Models

The analysis of the features required to predict the outcome will use the following:

  1. Analysis of the BSWiMS bagged model using the summary function.

  2. Analysis of the sparse GDSTM

  3. Analysis of the univariate association of the model features of both models

  4. Report the new features not found by the original data analysis
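In the code that follows, names(selectedlist) is set to NULL before unlist() so that the coefficient names come out as bare original feature names instead of being prefixed by the derived-feature name; names containing "De_" are then dropped and a "Ba_" prefix is stripped. The base-R equivalents (grepl() and sub() with fixed = TRUE in place of str_detect() and str_remove()) are shown on two of the printed formulas; the other model terms, including the Ba_ term, are hypothetical:

```r
# Two coefficient vectors copied from the derived formulas printed below.
lst <- list(De_paper_time10     = c(disp_index10 = -1.62, paper_time10 = 1),
            De_num_of_pendown21 = c(air_time21 = -0.803, num_of_pendown21 = 1))

names(unlist(lst))   # prefixed: "De_paper_time10.disp_index10", ...
names(lst) <- NULL
names(unlist(lst))   # bare: "disp_index10" "paper_time10" "air_time21" "num_of_pendown21"

# Hypothetical model terms: one derived (De_) feature, one Ba_-prefixed feature, one raw feature.
modelTerms <- c("De_paper_time10", "Ba_pressure_mean5", "air_time9")
origNames <- unique(c(names(unlist(lst)), modelTerms))
origNames <- origNames[!grepl("De_", origNames, fixed = TRUE)]  # drop derived features
origNames <- unique(sub("Ba_", "", origNames, fixed = TRUE))    # strip the Ba_ prefix
origNames
```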

par(op)
par(mfrow=c(1,1))
## 1 Get the Model Features
smOriginal <- summary(bm)
rawnames <- rownames(smOriginal$coefficients)

### From the Outcome-Driven Decorrelation
smDecor <- summary(bmd)
decornames <- rownames(smDecor$coefficients)

### From Blind Decorrelation
smDecorU <- summary(bmdU)
decornamesU <- rownames(smDecorU$coefficients)



## 2 Get the decorrelation matrix formulas
dc <- getDerivedCoefficients(deTrain)
### 2a Get only the ones that were decorrelated by the decorrelation-based model
deNames_in_dc <- decornames[decornames %in% names(dc)]
selectedlist <- dc[deNames_in_dc]
theDeFormulas <- selectedlist
pander::pander(selectedlist)
  • De_gmrt_in_air19:

    gmrt_in_air19 mean_speed_in_air19
    1 -0.944
  • De_mean_jerk_in_air19:

    mean_acc_in_air19 mean_jerk_in_air19
    -1.14 1
  • De_mean_gmrt19:

    gmrt_on_paper19 mean_gmrt19 mean_speed_in_air19
    -0.398 1 -0.685
  • De_paper_time10:

    disp_index10 paper_time10
    -1.62 1
  • De_num_of_pendown21:

    air_time21 num_of_pendown21
    -0.803 1
  • De_pressure_mean5:

    max_y_extension5 pressure_mean5
    -0.889 1
  • De_mean_gmrt10:

    gmrt_in_air10 mean_gmrt10
    -0.706 1
  • De_mean_jerk_in_air7:

    mean_acc_in_air7 mean_jerk_in_air7
    -1.15 1
  • De_mean_acc_on_paper21:

    mean_acc_on_paper21 mean_speed_on_paper21
    1 0.61
  • De_mean_speed_on_paper15:

    gmrt_on_paper15 mean_speed_on_paper15
    -1.02 1
  • De_disp_index17:

    disp_index17 max_y_extension17
    1 -1.4
  • De_mean_jerk_on_paper5:

    max_y_extension5 mean_jerk_on_paper5
    0.513 1
  • De_mean_gmrt23:

    gmrt_in_air23 mean_gmrt23
    -0.845 1
  • De_mean_speed_on_paper9:

    gmrt_on_paper9 mean_speed_on_paper9
    -1.09 1
  • De_paper_time5:

    max_y_extension5 mean_speed_on_paper5 paper_time5
    -1.15 0.946 1
  • De_paper_time8:

    disp_index8 paper_time8
    -1.38 1
  • De_mean_jerk_on_paper4:

    mean_acc_on_paper4 mean_jerk_on_paper4
    -0.457 1
  • De_total_time10:

    air_time10 total_time10
    -0.632 1
  • De_max_y_extension24:

    max_x_extension24 max_y_extension24
    -0.987 1
  • De_mean_jerk_in_air2:

    mean_acc_in_air2 mean_jerk_in_air2
    -1.19 1
names(selectedlist) <- NULL
### 2b Get the names of the original features

allDevar <- unique(c(names(unlist(selectedlist)),decornames))
allDevar <- allDevar[!str_detect(allDevar,"De_")]
allDevar <- str_remove(allDevar,"Ba_")
allDevar <- unique(allDevar)


# The analysis of the blind decorrelation

dcU <- getDerivedCoefficients(deTrainU)
### 2a Get only the ones that were decorrelated by the decorrelation-based model
deNames_in_dcU <- decornamesU[decornamesU %in% names(dcU)]
selectedlistU <- dcU[deNames_in_dcU]
pander::pander(selectedlistU)
  • De_mean_gmrt23:

    gmrt_in_air23 mean_gmrt23
    -0.845 1
  • De_air_time9:

    air_time9 total_time9
    1 -1.58
  • De_disp_index17:

    disp_index17 max_y_extension17
    1 -1.4
  • De_paper_time10:

    disp_index10 paper_time10
    -1.62 1
  • De_mean_acc_on_paper21:

    mean_acc_on_paper21 mean_speed_on_paper21
    1 0.61
  • De_pressure_mean5:

    max_y_extension5 pressure_mean5
    -0.889 1
  • De_paper_time8:

    disp_index8 paper_time8
    -1.38 1
  • De_paper_time9:

    paper_time9 total_time9
    1 -0.737
  • De_paper_time1:

    disp_index1 paper_time1
    -1.22 1
  • De_mean_speed_in_air19:

    gmrt_in_air19 mean_speed_in_air19
    -1.03 1
  • De_total_time10:

    air_time10 total_time10
    -0.632 1
  • De_mean_jerk_in_air7:

    mean_acc_in_air7 mean_jerk_in_air7
    -1.15 1
  • De_mean_speed_in_air24:

    gmrt_in_air24 mean_acc_in_air24 mean_speed_in_air24
    -0.857 -0.0899 1
  • De_max_y_extension24:

    max_x_extension24 max_y_extension24
    -0.987 1
names(selectedlistU) <- NULL
### 2b Get the names of the original features

allDevarU <- unique(c(names(unlist(selectedlistU)),decornamesU))
allDevarU <- allDevarU[!str_detect(allDevarU,"De_")]
allDevarU <- str_remove(allDevarU,"Ba_")
allDevarU <- unique(allDevarU)

pander::pander(c(length(rawnames),length(decornames),length(decornamesU)))

95, 92 and 67

pander::pander(c(length(rawnames),length(allDevar),length(allDevarU)))

95, 104 and 76
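The counts above only report set sizes. A minimal sketch of how the overlaps between the three feature sets could be inspected with base-R set operations; `rawToy`, `devToy` and `devUToy` are hypothetical vectors standing in for `rawnames`, `allDevar` and `allDevarU`.

```r
# Hypothetical feature sets standing in for rawnames, allDevar and allDevarU
rawToy  <- c("total_time23", "paper_time9", "num_of_pendown19", "paper_time10")
devToy  <- c("paper_time9", "num_of_pendown19", "paper_time10", "disp_index10")
devUToy <- c("num_of_pendown19", "disp_index10", "air_time9")

# Features present in all three analyses
sharedAll <- Reduce(intersect, list(rawToy, devToy, devUToy))
# Features reached only through the decorrelated model
onlyDecor <- setdiff(devToy, rawToy)
# Features reached only through the blind decorrelation
onlyBlind <- setdiff(devUToy, union(rawToy, devToy))
print(c(raw = length(rawToy), decor = length(devToy), blind = length(devUToy)))
```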



### 2c Get only the new features not found in the original analysis
dvar <- allDevar[!(allDevar %in% rawnames)] 

### 2d Get the decorrelated variables that have new features
newvars <- character();
for (cvar in deNames_in_dc)
{
  lvar <- dc[cvar]
  names(lvar) <- NULL
  lvar <- names(unlist(lvar))
  if (length(lvar[lvar %in% dvar]) > 0)
  {
     newvars <- append(newvars,cvar)
  }
}
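The loop in step 2d keeps a decorrelated term whenever at least one of its constituent features belongs to `dvar`. The same test can be sketched without an explicit loop; here `dcToy` is a hypothetical stand-in for `dc[deNames_in_dc]` (coefficient values copied from the lists above) and `dvarToy` a hypothetical stand-in for `dvar`.

```r
# Hypothetical stand-in for dc[deNames_in_dc]
dcToy <- list(
  De_paper_time10 = c(disp_index10 = -1.62, paper_time10 = 1),
  De_total_time10 = c(air_time10 = -0.632, total_time10 = 1),
  De_paper_time8  = c(disp_index8 = -1.38, paper_time8 = 1)
)
dvarToy <- c("disp_index10", "air_time10")  # hypothetical "new" features

# TRUE when any constituent feature of the decorrelated term is new
hasNew <- vapply(dcToy, function(coefs) any(names(coefs) %in% dvarToy), logical(1))
newvarsToy <- names(dcToy)[hasNew]
print(newvarsToy)
```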

## 3 Here are the univariate z values of the original set
#pander::pander(bm$univariate[dvar,])
## 4 Here are the univariate z values of the decorrelated set
#pander::pander(bmd$univariate[newvars,])

## 4a The scatter plot of the decorrelated vs original univariate values

zvalueNew <- bmd$univariate[newvars,]
rownames(zvalueNew) <- str_remove(rownames(zvalueNew),"De_")
rownames(zvalueNew) <- str_remove(rownames(zvalueNew),"Ba_")

zvaluePrePost <- bm$univariate[rownames(zvalueNew),c(1,3)]
zvaluePrePost$Name <- NULL
zvaluePrePost$NewZ <- zvalueNew[rownames(zvaluePrePost),"ZUni"]
pander::pander(zvaluePrePost)
  ZUni NewZ
gmrt_in_air19 3.5727 2.82
mean_jerk_in_air19 2.5727 1.16
mean_gmrt19 3.1903 4.55
paper_time10 5.1654 3.70
num_of_pendown21 2.2029 2.30
pressure_mean5 5.5292 5.14
mean_gmrt10 4.5398 3.99
mean_jerk_in_air7 2.8557 5.28
mean_acc_on_paper21 4.4476 2.08
mean_speed_on_paper15 5.0906 3.46
disp_index17 4.2319 4.75
mean_jerk_on_paper5 2.9614 4.16
paper_time5 0.0963 3.31
mean_jerk_on_paper4 2.0792 2.24
total_time10 5.0610 3.08
max_y_extension24 0.3149 2.80
mean_jerk_in_air2 4.3791 2.93
plot(zvaluePrePost,
     xlim=c(-0.5,6.5),
     ylim=c(0,7),
     xlab="Original Z",
     ylab="Decorrelated Z",
     main="Univariate IDI Z Values",
     pch=3,cex=0.5,
     col="red")
abline(v=1.96,col="blue")
abline(h=1.96,col="blue")
text(zvaluePrePost$ZUni,zvaluePrePost$NewZ,rownames(zvaluePrePost),srt=65,cex=0.75)
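The blue lines in the plot mark the 1.96 cut-off (the two-sided 5% level of a standard normal). As a sketch, the values in the table above can be split into features that cross that cut-off only after decorrelation, features that drop below it, and features whose Z value increased. `zPP` simply re-enters the printed values by hand; in the analysis itself `zvaluePrePost` could be used directly. `NewZ` is the Z value of the decorrelated term whose name matches after stripping the prefix.

```r
# Z values copied from the zvaluePrePost table printed above
zPP <- data.frame(
  ZUni = c(3.5727, 2.5727, 3.1903, 5.1654, 2.2029, 5.5292, 4.5398, 2.8557,
           4.4476, 5.0906, 4.2319, 2.9614, 0.0963, 2.0792, 5.0610, 0.3149, 4.3791),
  NewZ = c(2.82, 1.16, 4.55, 3.70, 2.30, 5.14, 3.99, 5.28,
           2.08, 3.46, 4.75, 4.16, 3.31, 2.24, 3.08, 2.80, 2.93),
  row.names = c("gmrt_in_air19", "mean_jerk_in_air19", "mean_gmrt19", "paper_time10",
                "num_of_pendown21", "pressure_mean5", "mean_gmrt10", "mean_jerk_in_air7",
                "mean_acc_on_paper21", "mean_speed_on_paper15", "disp_index17",
                "mean_jerk_on_paper5", "paper_time5", "mean_jerk_on_paper4",
                "total_time10", "max_y_extension24", "mean_jerk_in_air2")
)
zcut <- 1.96  # same cut-off as the blue reference lines

# Significant only after decorrelation
gained   <- rownames(zPP)[zPP$ZUni < zcut & zPP$NewZ >= zcut]
# Significant only before decorrelation
lost     <- rownames(zPP)[zPP$ZUni >= zcut & zPP$NewZ < zcut]
# Any increase in the univariate Z value
improved <- rownames(zPP)[zPP$NewZ > zPP$ZUni]
```

With the printed values, `gained` holds `paper_time5` and `max_y_extension24`, the two features whose original univariate Z was close to zero, while `mean_jerk_in_air19` is the only feature that falls below the cut-off.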

The Summary of the Decorrelation-Based Models

Here I print the summary statistics of the logistic models found by BSWiMS on the original dataset (`smOriginal`), the decorrelated dataset (`smDecor`) and the blindly decorrelated dataset (`smDecorU`). After that, I show the characteristics of the features that the original analysis did not find.


pander::pander(smOriginal$coefficients)
  Estimate lower OR upper u.Accuracy r.Accuracy full.Accuracy u.AUC r.AUC full.AUC IDI NRI z.IDI z.NRI Delta.AUC Frequency
total_time23 0.12158 1.0787 1.129 1.182 0.812 0.718 0.857 0.813 0.719 0.857 0.1875 0.933 6.01 6.04 0.1385 1.00
paper_time9 0.26451 1.1812 1.303 1.437 0.732 0.785 0.876 0.732 0.785 0.876 0.2098 1.019 5.35 6.69 0.0907 1.00
gmrt_on_paper9 -0.09026 0.8793 0.914 0.949 0.723 0.780 0.860 0.724 0.781 0.860 0.1697 0.967 5.00 5.87 0.0795 0.35
num_of_pendown19 -1.00103 0.2366 0.368 0.571 0.643 0.847 0.920 0.643 0.847 0.921 0.1574 1.215 4.80 8.74 0.0737 1.00
air_time23 0.08090 1.0478 1.084 1.122 0.795 0.783 0.856 0.795 0.783 0.856 0.1292 0.896 4.67 5.45 0.0730 1.00
total_time22 0.40245 1.2412 1.495 1.802 0.714 0.840 0.905 0.715 0.840 0.905 0.1613 1.203 4.62 9.17 0.0642 1.00
mean_speed_in_air23 -0.56877 0.4452 0.566 0.720 0.714 0.829 0.899 0.715 0.830 0.899 0.1562 1.081 4.62 7.23 0.0694 1.00
paper_time23 0.50460 1.3264 1.656 2.068 0.821 0.847 0.906 0.821 0.847 0.906 0.1495 1.206 4.53 8.51 0.0590 1.00
paper_time8 0.42740 1.2551 1.533 1.873 0.634 0.847 0.911 0.634 0.848 0.911 0.1465 1.222 4.52 8.84 0.0639 0.95
gmrt_in_air23 -0.33782 0.6148 0.713 0.828 0.705 0.815 0.877 0.706 0.815 0.877 0.1472 0.854 4.47 5.11 0.0618 1.00
total_time9 0.14201 1.0770 1.153 1.234 0.741 0.814 0.872 0.742 0.815 0.872 0.1513 0.993 4.46 6.58 0.0577 1.00
paper_time25 0.15081 1.0885 1.163 1.242 0.696 0.877 0.942 0.698 0.878 0.943 0.1253 1.535 4.46 13.89 0.0649 0.10
gmrt_on_paper23 -0.01321 0.9809 0.987 0.993 0.688 0.685 0.790 0.688 0.685 0.790 0.1576 0.710 4.40 4.04 0.1050 0.15
total_time6 0.15618 1.0844 1.169 1.260 0.732 0.807 0.878 0.733 0.807 0.878 0.1374 1.096 4.36 7.18 0.0712 1.00
mean_jerk_on_paper8 0.35992 1.2129 1.433 1.694 0.598 0.811 0.876 0.598 0.811 0.876 0.1376 1.067 4.36 6.88 0.0650 0.55
disp_index6 0.23276 1.1363 1.262 1.402 0.625 0.802 0.867 0.626 0.802 0.867 0.1335 0.953 4.33 5.86 0.0654 0.85
total_time16 0.05891 1.0284 1.061 1.094 0.705 0.776 0.851 0.707 0.776 0.851 0.1395 0.989 4.33 6.17 0.0743 0.80
mean_speed_in_air19 -0.03078 0.9561 0.970 0.983 0.679 0.778 0.823 0.677 0.779 0.822 0.1303 0.783 4.25 4.52 0.0437 0.15
paper_time6 0.17195 1.0944 1.188 1.289 0.652 0.808 0.861 0.653 0.809 0.861 0.1366 0.998 4.20 6.38 0.0522 1.00
air_time6 0.07822 1.0394 1.081 1.125 0.732 0.802 0.865 0.732 0.803 0.865 0.1316 1.121 4.19 7.49 0.0623 0.95
total_time7 0.12701 1.0676 1.135 1.208 0.741 0.818 0.890 0.742 0.818 0.890 0.1209 1.039 4.15 6.93 0.0720 1.00
max_x_extension6 0.19735 1.1064 1.218 1.341 0.527 0.802 0.859 0.527 0.802 0.859 0.1293 0.975 4.12 5.97 0.0571 0.60
air_time15 0.10939 1.0590 1.116 1.175 0.759 0.828 0.866 0.759 0.828 0.866 0.1397 0.919 4.12 5.52 0.0378 1.00
air_time22 0.09774 1.0491 1.103 1.159 0.732 0.817 0.866 0.732 0.817 0.866 0.1304 0.992 4.09 6.26 0.0492 1.00
mean_speed_in_air25 -0.23477 0.7032 0.791 0.889 0.714 0.806 0.866 0.715 0.807 0.866 0.1152 0.947 4.07 5.83 0.0588 1.00
air_time16 0.05598 1.0276 1.058 1.088 0.732 0.792 0.852 0.733 0.792 0.852 0.1236 0.914 4.06 5.52 0.0601 0.90
max_x_extension21 -1.78665 0.0703 0.168 0.399 0.580 0.839 0.886 0.584 0.840 0.886 0.1242 1.240 4.04 8.81 0.0462 1.00
gmrt_in_air7 -0.20189 0.7368 0.817 0.906 0.714 0.824 0.884 0.715 0.824 0.884 0.1305 0.966 4.04 6.05 0.0598 1.00
pressure_mean5 -0.82731 0.2879 0.437 0.664 0.696 0.776 0.818 0.700 0.777 0.818 0.1207 0.818 4.03 5.12 0.0417 0.90
gmrt_in_air17 -0.05451 0.9222 0.947 0.972 0.741 0.752 0.817 0.741 0.753 0.817 0.1229 0.959 3.97 5.85 0.0637 0.75
total_time17 0.12493 1.0670 1.133 1.203 0.741 0.834 0.874 0.742 0.834 0.874 0.1103 0.975 3.96 6.20 0.0394 1.00
mean_acc_on_paper24 -0.04517 0.9341 0.956 0.978 0.545 0.804 0.854 0.545 0.804 0.854 0.1071 0.995 3.93 6.13 0.0503 0.10
num_of_pendown23 0.73082 1.4262 2.077 3.024 0.687 0.814 0.866 0.688 0.815 0.866 0.1104 0.988 3.92 6.19 0.0516 1.00
disp_index8 0.12083 1.0641 1.128 1.197 0.589 0.841 0.892 0.589 0.841 0.892 0.1176 1.044 3.91 6.63 0.0510 0.20
mean_speed_in_air7 -0.11521 0.8384 0.891 0.947 0.688 0.786 0.844 0.688 0.786 0.844 0.1229 0.773 3.91 4.51 0.0578 0.95
pressure_var19 -0.35082 0.5873 0.704 0.844 0.634 0.842 0.892 0.634 0.842 0.892 0.1133 0.999 3.87 6.28 0.0500 1.00
gmrt_in_air25 -0.17727 0.7616 0.838 0.921 0.696 0.805 0.855 0.697 0.805 0.855 0.1125 0.929 3.86 5.81 0.0501 0.90
gmrt_on_paper10 -0.06255 0.9099 0.939 0.970 0.670 0.801 0.853 0.670 0.802 0.853 0.1183 0.753 3.86 4.42 0.0518 0.40
pressure_mean9 -0.68024 0.3538 0.506 0.725 0.670 0.826 0.871 0.673 0.826 0.871 0.1079 0.955 3.86 6.23 0.0450 0.95
total_time24 0.06929 1.0337 1.072 1.111 0.687 0.786 0.857 0.688 0.787 0.857 0.1008 0.844 3.86 4.99 0.0702 0.60
mean_speed_on_paper9 -0.06752 0.9000 0.935 0.971 0.723 0.770 0.829 0.724 0.771 0.829 0.1182 0.790 3.84 4.63 0.0585 1.00
air_time13 0.07053 1.0354 1.073 1.112 0.670 0.797 0.854 0.670 0.798 0.854 0.1152 0.655 3.84 3.76 0.0562 1.00
paper_time22 0.07748 1.0372 1.081 1.126 0.679 0.773 0.827 0.680 0.773 0.827 0.1191 0.556 3.83 3.17 0.0540 0.60
air_time2 0.02548 1.0119 1.026 1.040 0.670 0.787 0.842 0.669 0.788 0.843 0.1096 0.792 3.83 4.61 0.0548 0.60
mean_acc_in_air25 -0.02256 0.9664 0.978 0.989 0.696 0.756 0.830 0.697 0.757 0.829 0.1024 0.642 3.83 3.66 0.0727 0.20
mean_gmrt23 -0.16300 0.7812 0.850 0.924 0.687 0.801 0.858 0.688 0.801 0.858 0.1177 0.722 3.82 4.16 0.0567 1.00
mean_gmrt7 -0.11298 0.8411 0.893 0.949 0.679 0.788 0.842 0.679 0.789 0.842 0.1195 0.697 3.82 3.99 0.0534 1.00
mean_gmrt25 -0.17986 0.7582 0.835 0.920 0.688 0.784 0.849 0.688 0.784 0.849 0.1124 0.902 3.79 5.74 0.0647 0.90
air_time24 0.01919 1.0095 1.019 1.029 0.688 0.801 0.851 0.688 0.801 0.851 0.0966 0.565 3.78 3.17 0.0501 0.20
num_of_pendown9 0.05588 1.0265 1.057 1.089 0.661 0.796 0.855 0.661 0.796 0.855 0.1100 0.927 3.77 5.82 0.0588 0.85
mean_speed_on_paper23 -0.10570 0.8507 0.900 0.952 0.670 0.783 0.842 0.670 0.783 0.842 0.1142 0.818 3.77 4.87 0.0593 0.70
total_time12 0.09709 1.0458 1.102 1.161 0.696 0.798 0.853 0.697 0.799 0.853 0.1098 0.878 3.76 5.38 0.0545 0.95
max_y_extension19 -0.58057 0.4079 0.560 0.768 0.482 0.808 0.858 0.485 0.808 0.858 0.1060 0.860 3.74 5.24 0.0505 0.70
air_time7 0.03479 1.0159 1.035 1.055 0.777 0.779 0.846 0.778 0.779 0.846 0.0861 0.766 3.73 4.54 0.0667 1.00
total_time18 0.01696 1.0080 1.017 1.026 0.643 0.700 0.771 0.644 0.700 0.771 0.0979 0.506 3.73 2.81 0.0710 0.60
total_time8 0.19419 1.0938 1.214 1.348 0.714 0.840 0.895 0.715 0.840 0.895 0.1098 1.085 3.70 6.95 0.0546 1.00
mean_gmrt17 -0.08232 0.8804 0.921 0.963 0.750 0.780 0.832 0.750 0.780 0.833 0.1023 0.864 3.69 5.16 0.0522 0.90
gmrt_on_paper7 -0.01252 0.9806 0.988 0.995 0.643 0.793 0.843 0.644 0.794 0.844 0.1087 0.853 3.69 5.12 0.0497 0.15
paper_time12 0.17618 1.0841 1.193 1.312 0.714 0.822 0.880 0.715 0.822 0.880 0.1048 0.871 3.68 5.33 0.0578 0.90
paper_time7 0.01978 1.0092 1.020 1.031 0.679 0.759 0.787 0.680 0.759 0.788 0.1119 0.753 3.67 4.38 0.0285 0.40
air_time17 0.13230 1.0630 1.141 1.226 0.795 0.867 0.899 0.795 0.867 0.900 0.0961 1.119 3.66 7.43 0.0330 1.00
mean_jerk_on_paper24 -0.33490 0.5951 0.715 0.860 0.616 0.819 0.867 0.615 0.819 0.867 0.0961 0.800 3.65 4.74 0.0478 0.90
gmrt_on_paper1 -0.05128 0.9231 0.950 0.978 0.661 0.851 0.905 0.660 0.851 0.905 0.0965 1.177 3.65 8.05 0.0538 0.15
air_time4 0.00485 1.0022 1.005 1.008 0.679 0.779 0.811 0.679 0.780 0.812 0.1023 0.758 3.65 4.34 0.0319 0.15
total_time13 0.06355 1.0296 1.066 1.103 0.723 0.769 0.825 0.725 0.769 0.825 0.1094 0.630 3.62 3.58 0.0563 1.00
num_of_pendown5 0.05472 1.0251 1.056 1.088 0.688 0.792 0.839 0.689 0.793 0.840 0.1011 0.769 3.61 4.53 0.0468 1.00
total_time2 0.08212 1.0374 1.086 1.136 0.687 0.800 0.850 0.688 0.800 0.850 0.1031 0.899 3.60 5.52 0.0498 1.00
disp_index22 0.02732 1.0113 1.028 1.044 0.705 0.764 0.808 0.706 0.764 0.808 0.0971 0.773 3.59 4.50 0.0440 0.20
total_time15 0.08073 1.0372 1.084 1.133 0.741 0.809 0.841 0.741 0.810 0.841 0.1285 0.611 3.59 3.48 0.0309 1.00
pressure_mean4 -0.19802 0.7366 0.820 0.914 0.670 0.768 0.816 0.673 0.768 0.816 0.0999 0.774 3.56 4.76 0.0490 0.30
paper_time17 0.11412 1.0533 1.121 1.193 0.723 0.807 0.840 0.724 0.807 0.840 0.1016 0.834 3.54 5.02 0.0328 0.85
mean_acc_on_paper9 0.02307 1.0098 1.023 1.037 0.598 0.791 0.848 0.599 0.792 0.849 0.0904 0.667 3.52 3.77 0.0572 0.10
num_of_pendown15 0.07591 1.0332 1.079 1.127 0.652 0.770 0.814 0.652 0.770 0.814 0.0979 0.718 3.50 4.13 0.0444 0.70
paper_time15 0.11441 1.0509 1.121 1.196 0.696 0.813 0.860 0.697 0.813 0.860 0.1000 0.892 3.49 5.57 0.0466 1.00
total_time3 0.01437 1.0060 1.014 1.023 0.679 0.756 0.806 0.678 0.756 0.806 0.0991 0.770 3.45 4.47 0.0492 0.25
mean_jerk_on_paper21 0.20534 1.0922 1.228 1.381 0.652 0.793 0.831 0.652 0.793 0.831 0.0937 0.692 3.44 3.96 0.0376 0.95
mean_speed_on_paper15 -0.03748 0.9431 0.963 0.984 0.652 0.788 0.832 0.652 0.789 0.832 0.0973 0.639 3.44 3.59 0.0432 0.45
air_time8 0.04018 1.0179 1.041 1.065 0.616 0.801 0.857 0.616 0.801 0.857 0.0965 0.786 3.43 4.60 0.0560 0.75
total_time19 -0.00846 0.9867 0.992 0.997 0.500 0.780 0.825 0.495 0.780 0.824 0.0953 0.517 3.39 3.06 0.0446 0.20
mean_acc_on_paper21 0.01319 1.0056 1.013 1.021 0.616 0.729 0.781 0.616 0.730 0.781 0.0952 0.629 3.38 3.52 0.0515 0.20
mean_jerk_on_paper9 0.25018 1.1121 1.284 1.483 0.643 0.799 0.850 0.643 0.799 0.850 0.0837 0.673 3.38 3.82 0.0510 0.85
mean_gmrt1 -0.02664 0.9585 0.974 0.989 0.634 0.764 0.821 0.635 0.764 0.821 0.0831 0.711 3.36 4.04 0.0568 0.50
mean_gmrt8 -0.02220 0.9658 0.978 0.990 0.679 0.771 0.824 0.678 0.771 0.824 0.0957 0.854 3.36 5.09 0.0535 0.35
air_time5 0.09902 1.0423 1.104 1.170 0.714 0.861 0.894 0.716 0.862 0.895 0.0891 1.058 3.35 6.86 0.0332 1.00
disp_index23 0.21958 1.0938 1.246 1.418 0.714 0.792 0.843 0.715 0.793 0.843 0.0911 0.754 3.31 4.45 0.0501 1.00
pressure_var5 0.01053 1.0041 1.011 1.017 0.723 0.812 0.843 0.723 0.813 0.843 0.0595 0.883 3.29 5.30 0.0304 0.25
total_time10 0.00218 1.0009 1.002 1.003 0.643 0.760 0.791 0.644 0.760 0.792 0.0858 0.513 3.28 2.83 0.0317 0.10
mean_jerk_in_air21 0.01770 1.0071 1.018 1.029 0.571 0.842 0.889 0.572 0.842 0.889 0.0844 1.010 3.28 6.24 0.0473 0.15
mean_jerk_in_air2 0.01618 1.0065 1.016 1.026 0.688 0.786 0.813 0.688 0.786 0.813 0.0847 0.778 3.27 4.54 0.0265 0.45
mean_jerk_on_paper22 0.12182 1.0485 1.130 1.217 0.464 0.806 0.859 0.463 0.806 0.859 0.0734 0.535 3.21 3.01 0.0531 0.25
gmrt_in_air6 -0.01596 0.9744 0.984 0.994 0.598 0.787 0.823 0.598 0.786 0.823 0.0863 0.620 3.18 3.48 0.0366 0.15
max_y_extension25 0.04756 1.0178 1.049 1.081 0.562 0.756 0.808 0.561 0.757 0.808 0.0746 0.628 3.16 3.51 0.0516 0.20
pressure_mean7 -0.01109 0.9822 0.989 0.996 0.688 0.758 0.787 0.690 0.759 0.788 0.0845 0.742 3.14 4.40 0.0294 0.10
pressure_var4 0.01154 1.0043 1.012 1.019 0.679 0.848 0.875 0.678 0.848 0.875 0.0685 0.795 3.14 4.60 0.0274 0.15
gmrt_in_air1 -0.00266 0.9957 0.997 0.999 0.616 0.787 0.812 0.616 0.788 0.813 0.0554 0.506 3.09 2.77 0.0247 0.10
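In these coefficient tables the `OR` column appears to be the exponentiated `Estimate`, as expected for a logistic model; `lower` and `upper` are read here as the bounds of its confidence interval, which is an interpretation rather than something the table states. A quick check on three rows copied from the table above:

```r
# Estimate and OR for three terms, copied from smOriginal$coefficients above
est <- c(total_time23 = 0.12158, num_of_pendown19 = -1.00103,
         max_x_extension21 = -1.78665)
orPrinted <- c(total_time23 = 1.129, num_of_pendown19 = 0.368,
               max_x_extension21 = 0.168)

# For a logistic model, the odds ratio per unit change is exp(coefficient)
orFromEst <- exp(est)
maxDiff <- max(abs(orFromEst - orPrinted))
print(round(orFromEst, 3))
```

The remaining differences stay below 0.001 and are attributable to the three-digit rounding used by pander.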

pander::pander(smDecor$coefficients)
  Estimate lower OR upper u.Accuracy r.Accuracy full.Accuracy u.AUC r.AUC full.AUC IDI NRI z.IDI z.NRI Delta.AUC Frequency
Ba_paper_time23 0.78326 1.7139 2.189 2.795 0.821 0.797 0.895 0.821 0.797 0.894 0.2576 1.178 6.29 8.02 0.09738 1.00
Ba_paper_time9 0.57976 1.4641 1.786 2.178 0.732 0.827 0.918 0.732 0.827 0.918 0.2251 1.366 5.80 10.23 0.09128 1.00
Ba_air_time22 0.45418 1.3374 1.575 1.855 0.732 0.828 0.906 0.732 0.828 0.906 0.2196 1.289 5.56 9.29 0.07790 1.00
Ba_disp_index6 0.36304 1.2493 1.438 1.655 0.625 0.761 0.853 0.626 0.762 0.853 0.1831 1.067 5.23 6.87 0.09153 1.00
num_of_pendown19 -0.99024 0.2470 0.371 0.559 0.643 0.827 0.900 0.643 0.827 0.900 0.1856 1.273 5.06 9.21 0.07321 1.00
Ba_mean_speed_in_air19 -0.27747 0.6695 0.758 0.858 0.679 0.785 0.860 0.677 0.785 0.860 0.1697 0.984 5.02 6.14 0.07476 0.75
Ba_total_time6 0.43002 1.2967 1.537 1.823 0.732 0.815 0.903 0.733 0.815 0.904 0.1747 1.199 5.01 8.39 0.08824 1.00
Ba_mean_gmrt14 -0.04290 0.9404 0.958 0.976 0.696 0.732 0.829 0.697 0.732 0.829 0.1703 0.966 4.84 5.89 0.09708 0.20
pressure_mean1 -0.25926 0.6873 0.772 0.866 0.616 0.816 0.889 0.617 0.816 0.889 0.1570 0.839 4.84 5.07 0.07287 0.25
De_gmrt_in_air19 4.28673 11.9647 72.728 442.083 0.571 0.818 0.900 0.572 0.818 0.900 0.1674 1.096 4.82 7.29 0.08167 1.00
Ba_mean_speed_in_air25 -0.75922 0.3405 0.468 0.643 0.714 0.856 0.897 0.715 0.857 0.897 0.1621 1.235 4.70 8.56 0.04016 1.00
Ba_max_x_extension21 -1.92130 0.0646 0.146 0.332 0.580 0.875 0.914 0.584 0.875 0.914 0.1441 1.373 4.63 10.73 0.03929 1.00
Ba_pressure_mean8 -0.13178 0.8271 0.877 0.929 0.679 0.749 0.822 0.682 0.749 0.822 0.1454 0.918 4.52 5.79 0.07359 0.25
Ba_gmrt_in_air7 -0.21840 0.7288 0.804 0.886 0.714 0.800 0.873 0.715 0.800 0.873 0.1577 0.860 4.52 5.11 0.07301 1.00
Ba_air_time23 0.05888 1.0335 1.061 1.088 0.795 0.749 0.843 0.795 0.750 0.844 0.1203 0.716 4.49 4.20 0.09391 1.00
Ba_num_of_pendown15 0.29472 1.1667 1.343 1.545 0.652 0.806 0.873 0.652 0.807 0.873 0.1540 0.962 4.49 6.01 0.06617 0.85
Ba_mean_speed_on_paper11 -0.17532 0.7719 0.839 0.912 0.670 0.782 0.851 0.670 0.783 0.851 0.1484 0.960 4.47 6.00 0.06860 0.85
Ba_total_time8 0.22112 1.1249 1.247 1.383 0.714 0.831 0.884 0.715 0.831 0.883 0.1448 1.165 4.40 7.87 0.05220 1.00
Ba_max_y_extension19 -1.28281 0.1521 0.277 0.505 0.482 0.825 0.884 0.485 0.825 0.884 0.1381 1.059 4.37 7.19 0.05820 0.70
disp_index7 0.02730 1.0145 1.028 1.041 0.661 0.712 0.826 0.661 0.712 0.827 0.1391 0.792 4.37 4.57 0.11539 0.20
air_time9 0.00605 1.0032 1.006 1.009 0.616 0.741 0.812 0.615 0.741 0.812 0.1380 0.844 4.37 4.95 0.07055 0.15
pressure_var19 -0.35742 0.5909 0.699 0.828 0.634 0.826 0.889 0.634 0.826 0.890 0.1377 1.047 4.34 6.66 0.06304 1.00
De_mean_jerk_in_air19 2.27760 3.4348 9.753 27.695 0.607 0.811 0.881 0.606 0.811 0.881 0.1349 0.976 4.30 6.01 0.07039 0.20
Ba_mean_jerk_on_paper21 0.39794 1.2225 1.489 1.813 0.652 0.791 0.867 0.652 0.791 0.867 0.1372 1.018 4.30 6.43 0.07597 0.85
Ba_gmrt_on_paper15 -0.00968 0.9858 0.990 0.995 0.634 0.716 0.792 0.634 0.717 0.792 0.1303 0.700 4.21 3.95 0.07531 0.15
Ba_air_time17 0.13728 1.0764 1.147 1.223 0.795 0.843 0.890 0.795 0.843 0.890 0.1264 0.976 4.20 6.04 0.04661 1.00
Ba_gmrt_in_air23 -0.64721 0.3876 0.524 0.707 0.705 0.865 0.915 0.706 0.866 0.914 0.1344 1.116 4.16 7.19 0.04862 1.00
Ba_mean_acc_in_air2 0.01266 1.0062 1.013 1.019 0.679 0.767 0.837 0.680 0.767 0.837 0.1261 0.896 4.14 5.33 0.07058 0.10
De_disp_index22 0.16888 1.0887 1.184 1.288 0.705 0.738 0.820 0.706 0.738 0.820 0.1264 0.913 4.10 5.50 0.08170 0.85
Ba_total_time15 0.09722 1.0514 1.102 1.155 0.741 0.800 0.846 0.741 0.800 0.846 0.1484 0.754 4.06 4.35 0.04532 1.00
De_mean_gmrt19 3.06929 4.8581 21.527 95.388 0.634 0.868 0.914 0.634 0.868 0.914 0.1131 1.023 4.04 6.32 0.04576 1.00
Ba_total_time24 0.15035 1.0789 1.162 1.252 0.688 0.813 0.860 0.688 0.813 0.860 0.1149 0.976 4.02 6.05 0.04725 0.95
Ba_mean_speed_on_paper25 -0.05956 0.9143 0.942 0.971 0.616 0.851 0.894 0.616 0.851 0.894 0.1254 1.200 4.02 8.32 0.04243 0.10
Ba_total_time18 0.12306 1.0588 1.131 1.208 0.643 0.843 0.885 0.644 0.843 0.885 0.1025 1.120 4.00 7.63 0.04237 0.70
Ba_pressure_mean18 -0.04003 0.9412 0.961 0.981 0.625 0.828 0.879 0.627 0.827 0.879 0.1180 0.886 3.99 5.60 0.05136 0.10
Ba_air_time19 -0.03771 0.9446 0.963 0.982 0.482 0.810 0.868 0.477 0.810 0.868 0.1153 0.870 3.99 5.47 0.05791 0.50
Ba_air_time13 0.05485 1.0275 1.056 1.086 0.670 0.776 0.825 0.670 0.776 0.825 0.1215 0.700 3.98 4.07 0.04887 1.00
Ba_total_time2 0.12460 1.0614 1.133 1.209 0.688 0.804 0.870 0.688 0.804 0.870 0.1175 0.856 3.98 5.10 0.06563 1.00
De_paper_time10 0.34638 1.1799 1.414 1.694 0.625 0.792 0.865 0.626 0.792 0.865 0.1222 0.975 3.95 6.20 0.07318 0.95
num_of_pendown23 0.81829 1.4931 2.267 3.441 0.687 0.824 0.868 0.688 0.824 0.869 0.1142 0.998 3.94 6.19 0.04453 1.00
Ba_num_of_pendown9 0.05651 1.0264 1.058 1.091 0.661 0.789 0.847 0.661 0.790 0.848 0.1169 0.854 3.94 5.10 0.05823 1.00
Ba_gmrt_on_paper7 -0.02962 0.9564 0.971 0.985 0.643 0.743 0.814 0.644 0.744 0.814 0.1211 0.630 3.93 3.56 0.07006 0.50
Ba_pressure_mean4 -1.33523 0.1313 0.263 0.527 0.670 0.812 0.866 0.673 0.813 0.866 0.1143 0.880 3.90 5.64 0.05357 0.90
Ba_gmrt_on_paper9 -0.09580 0.8629 0.909 0.957 0.723 0.797 0.845 0.724 0.797 0.845 0.1129 0.844 3.85 5.04 0.04763 1.00
De_num_of_pendown21 -0.12229 0.8290 0.885 0.945 0.589 0.821 0.865 0.590 0.821 0.865 0.1058 0.809 3.84 4.79 0.04347 0.85
Ba_paper_time12 0.16411 1.0828 1.178 1.282 0.714 0.806 0.868 0.715 0.807 0.868 0.1145 0.862 3.84 5.18 0.06132 1.00
Ba_total_time3 0.04000 1.0176 1.041 1.064 0.679 0.779 0.827 0.678 0.779 0.827 0.1104 0.758 3.83 4.38 0.04857 0.60
De_pressure_mean5 -0.33315 0.6003 0.717 0.856 0.688 0.815 0.871 0.691 0.815 0.871 0.1044 0.860 3.83 5.44 0.05573 0.90
paper_time17 0.10153 1.0492 1.107 1.168 0.723 0.777 0.827 0.724 0.778 0.827 0.1123 0.761 3.82 4.46 0.04903 1.00
Ba_air_time5 0.08009 1.0406 1.083 1.128 0.714 0.811 0.864 0.716 0.811 0.864 0.1129 1.060 3.82 6.90 0.05337 1.00
max_x_extension6 0.07872 1.0383 1.082 1.127 0.527 0.773 0.843 0.527 0.774 0.844 0.1157 0.743 3.80 4.28 0.07004 0.35
Ba_air_time4 0.03336 1.0152 1.034 1.053 0.679 0.792 0.842 0.679 0.792 0.842 0.1100 0.850 3.75 5.07 0.05036 0.80
Ba_gmrt_on_paper1 -0.01529 0.9769 0.985 0.993 0.661 0.792 0.823 0.660 0.792 0.823 0.1098 0.792 3.71 4.59 0.03155 0.10
Ba_total_time16 0.06661 1.0297 1.069 1.110 0.705 0.815 0.851 0.707 0.815 0.851 0.1098 0.916 3.70 5.59 0.03585 1.00
Ba_pressure_var4 0.02765 1.0127 1.028 1.044 0.679 0.805 0.858 0.678 0.805 0.858 0.1062 0.992 3.69 6.37 0.05325 0.25
De_mean_gmrt10 -0.02584 0.9615 0.974 0.988 0.625 0.795 0.839 0.625 0.796 0.839 0.0970 0.664 3.69 3.73 0.04336 0.10
Ba_mean_acc_in_air17 -0.00247 0.9962 0.998 0.999 0.687 0.716 0.782 0.688 0.718 0.784 0.1076 0.703 3.68 3.98 0.06564 0.10
De_mean_jerk_in_air7 0.75300 1.3893 2.123 3.245 0.696 0.809 0.848 0.694 0.810 0.848 0.1060 0.892 3.65 5.57 0.03820 1.00
pressure_mean7 -0.32207 0.6041 0.725 0.869 0.688 0.843 0.873 0.690 0.843 0.873 0.1019 0.932 3.63 5.84 0.03009 0.50
Ba_mean_jerk_on_paper9 0.23896 1.1112 1.270 1.451 0.643 0.779 0.828 0.643 0.779 0.828 0.0979 0.763 3.62 4.41 0.04877 0.80
De_mean_acc_on_paper21 0.31464 1.1525 1.370 1.628 0.625 0.805 0.861 0.624 0.806 0.861 0.0918 0.909 3.60 5.46 0.05563 0.70
De_mean_speed_on_paper15 -0.06263 0.9069 0.939 0.973 0.625 0.807 0.828 0.626 0.808 0.828 0.1007 1.026 3.57 6.57 0.02027 0.10
Ba_disp_index15 0.04628 1.0202 1.047 1.075 0.643 0.786 0.830 0.643 0.787 0.830 0.0923 0.676 3.55 3.83 0.04357 0.20
pressure_mean9 -0.79146 0.2905 0.453 0.707 0.670 0.844 0.880 0.673 0.844 0.880 0.0967 0.940 3.55 6.37 0.03553 1.00
paper_time15 0.12296 1.0552 1.131 1.212 0.696 0.820 0.859 0.697 0.820 0.859 0.1011 0.899 3.54 5.41 0.03938 1.00
De_disp_index17 0.11562 1.0538 1.123 1.196 0.723 0.755 0.809 0.723 0.755 0.809 0.0955 0.782 3.51 4.58 0.05376 0.65
De_mean_jerk_on_paper5 0.00763 1.0033 1.008 1.012 0.688 0.708 0.767 0.687 0.708 0.766 0.0917 0.610 3.50 3.40 0.05856 0.15
Ba_paper_time24 0.01797 1.0074 1.018 1.029 0.670 0.772 0.815 0.669 0.773 0.816 0.0940 0.961 3.50 5.87 0.04269 0.25
Ba_gmrt_on_paper22 -0.01150 0.9823 0.989 0.995 0.661 0.757 0.800 0.661 0.756 0.800 0.0941 0.679 3.50 3.89 0.04333 0.15
De_mean_gmrt23 0.64584 1.3262 1.908 2.744 0.482 0.846 0.898 0.481 0.846 0.897 0.0801 0.667 3.49 4.01 0.05114 0.70
De_mean_speed_on_paper9 -0.18389 0.7502 0.832 0.923 0.491 0.846 0.894 0.493 0.846 0.894 0.0841 1.015 3.49 6.61 0.04821 0.10
De_paper_time5 -0.05514 0.9174 0.946 0.976 0.643 0.832 0.856 0.645 0.832 0.856 0.0858 0.889 3.45 5.62 0.02403 0.15
Ba_num_of_pendown5 0.05411 1.0240 1.056 1.088 0.688 0.810 0.850 0.689 0.811 0.850 0.0963 0.756 3.45 4.51 0.03950 0.95
Ba_air_time7 0.07122 1.0331 1.074 1.116 0.777 0.829 0.875 0.778 0.829 0.875 0.0755 0.851 3.45 5.11 0.04586 1.00
Ba_gmrt_in_air8 -0.02915 0.9549 0.971 0.988 0.643 0.786 0.835 0.642 0.786 0.835 0.1014 0.936 3.45 5.64 0.04885 0.30
De_gmrt_on_paper17 -0.04668 0.9302 0.954 0.979 0.643 0.759 0.807 0.644 0.759 0.807 0.0972 0.621 3.44 3.49 0.04828 0.60
Ba_mean_gmrt1 -0.01907 0.9707 0.981 0.992 0.634 0.751 0.790 0.635 0.751 0.790 0.0880 0.565 3.39 3.14 0.03880 0.40
Ba_pressure_mean15 -0.05324 0.9187 0.948 0.979 0.598 0.794 0.838 0.600 0.795 0.839 0.0938 0.563 3.37 3.22 0.04395 0.25
De_paper_time8 0.01234 1.0053 1.012 1.020 0.679 0.728 0.782 0.680 0.729 0.782 0.0882 0.792 3.36 4.63 0.05330 0.15
Ba_mean_jerk_on_paper24 -0.30806 0.6150 0.735 0.878 0.616 0.824 0.873 0.615 0.824 0.873 0.0816 0.751 3.34 4.33 0.04945 0.95
paper_time13 0.04981 1.0216 1.051 1.081 0.679 0.797 0.836 0.679 0.796 0.836 0.0943 0.862 3.32 5.18 0.03961 0.55
pressure_var5 0.01292 1.0050 1.013 1.021 0.723 0.772 0.819 0.723 0.773 0.819 0.0649 0.868 3.30 5.14 0.04684 0.50
De_mean_jerk_on_paper4 0.30755 1.1385 1.360 1.625 0.643 0.835 0.869 0.642 0.835 0.869 0.0804 0.967 3.30 5.89 0.03350 0.25
max_y_extension25 0.02288 1.0093 1.023 1.037 0.562 0.713 0.795 0.561 0.714 0.795 0.0834 0.632 3.30 3.55 0.08108 0.10
De_total_time10 0.06314 1.0267 1.065 1.105 0.598 0.858 0.866 0.599 0.858 0.866 0.0836 0.606 3.27 3.41 0.00787 0.15
Ba_air_time11 0.00516 1.0020 1.005 1.008 0.688 0.812 0.804 0.688 0.812 0.804 0.0857 0.928 3.24 5.54 -0.00877 0.10
Ba_mean_acc_on_paper3 0.00638 1.0024 1.006 1.010 0.625 0.750 0.784 0.625 0.750 0.784 0.0802 0.645 3.22 3.61 0.03313 0.10
paper_time2 0.01183 1.0045 1.012 1.019 0.670 0.749 0.800 0.669 0.749 0.801 0.0859 0.693 3.21 3.93 0.05204 0.25
Ba_mean_speed_on_paper2 -0.00827 0.9867 0.992 0.997 0.643 0.742 0.801 0.642 0.743 0.801 0.0839 0.584 3.20 3.24 0.05813 0.20
De_max_y_extension24 -0.22207 0.6975 0.801 0.919 0.545 0.833 0.877 0.544 0.833 0.877 0.0806 0.824 3.20 4.96 0.04360 0.30
air_time12 0.00129 1.0005 1.001 1.002 0.625 0.684 0.751 0.625 0.684 0.751 0.0741 0.452 3.10 2.46 0.06712 0.10
De_mean_jerk_in_air2 0.01672 1.0060 1.017 1.028 0.643 0.747 0.793 0.641 0.747 0.793 0.0781 0.739 3.06 4.27 0.04579 0.10

pander::pander(smDecorU$coefficients)
  Estimate lower OR upper u.Accuracy r.Accuracy full.Accuracy u.AUC r.AUC full.AUC IDI NRI z.IDI z.NRI Delta.AUC Frequency
Ba_total_time9 0.34082 1.22366 1.4061 1.616 0.741 0.801 0.875 0.742 0.802 0.875 0.1996 1.176 5.22 8.43 0.0732 1.00
Ba_disp_index6 0.48646 1.34539 1.6265 1.966 0.625 0.793 0.868 0.626 0.794 0.869 0.1750 1.085 5.12 6.99 0.0745 0.75
Ba_air_time6 0.40498 1.27713 1.4993 1.760 0.732 0.832 0.900 0.732 0.833 0.900 0.1722 1.215 5.06 8.29 0.0677 1.00
num_of_pendown19 -1.34867 0.14577 0.2596 0.462 0.643 0.829 0.898 0.643 0.829 0.898 0.1659 1.116 4.88 7.32 0.0689 1.00
Ba_max_y_extension19 -0.97114 0.24490 0.3787 0.585 0.482 0.856 0.905 0.485 0.856 0.905 0.1391 1.374 4.74 10.48 0.0497 0.20
Ba_gmrt_on_paper23 -0.30209 0.64640 0.7393 0.845 0.688 0.789 0.885 0.688 0.790 0.885 0.1532 0.923 4.72 5.84 0.0957 0.45
Ba_air_time8 0.08039 1.04492 1.0837 1.124 0.616 0.793 0.873 0.616 0.793 0.874 0.1571 0.996 4.68 6.42 0.0809 0.50
Ba_air_time22 0.41048 1.26068 1.5075 1.803 0.732 0.837 0.898 0.732 0.837 0.898 0.1611 1.195 4.67 8.21 0.0613 1.00
Ba_air_time15 0.25117 1.15197 1.2855 1.435 0.759 0.837 0.890 0.759 0.838 0.890 0.1665 1.038 4.66 6.53 0.0521 1.00
pressure_var19 -0.53641 0.45346 0.5848 0.754 0.634 0.794 0.888 0.634 0.794 0.888 0.1548 1.113 4.63 7.76 0.0937 0.90
Ba_num_of_pendown9 0.13453 1.07500 1.1440 1.217 0.661 0.781 0.858 0.661 0.781 0.858 0.1591 0.924 4.62 5.67 0.0770 0.75
Ba_air_time23 0.17048 1.10159 1.1859 1.277 0.795 0.811 0.873 0.795 0.811 0.873 0.1341 0.847 4.52 4.98 0.0624 1.00
Ba_num_of_pendown15 0.51010 1.32143 1.6655 2.099 0.652 0.847 0.908 0.652 0.847 0.908 0.1492 1.023 4.49 6.55 0.0606 0.40
Ba_max_x_extension21 -2.85259 0.01530 0.0577 0.218 0.580 0.826 0.870 0.584 0.826 0.870 0.1458 1.196 4.48 8.17 0.0444 0.85
De_mean_gmrt23 0.38098 1.22770 1.4637 1.745 0.482 0.800 0.884 0.481 0.800 0.884 0.1229 1.057 4.47 6.68 0.0843 0.20
De_air_time9 -0.07474 0.89702 0.9280 0.960 0.607 0.825 0.879 0.608 0.825 0.879 0.1387 1.029 4.35 6.52 0.0535 0.15
Ba_air_time5 0.13404 1.07198 1.1434 1.220 0.714 0.815 0.869 0.716 0.816 0.869 0.1291 0.995 4.21 6.28 0.0535 0.90
Ba_air_time2 0.13647 1.07554 1.1462 1.222 0.670 0.809 0.867 0.669 0.809 0.867 0.1347 0.903 4.19 5.44 0.0581 0.80
Ba_gmrt_in_air23 -0.31803 0.62794 0.7276 0.843 0.705 0.821 0.875 0.706 0.821 0.875 0.1239 0.751 4.19 4.37 0.0539 1.00
Ba_pressure_mean18 -0.12720 0.82866 0.8806 0.936 0.625 0.839 0.881 0.627 0.839 0.881 0.1274 0.807 4.04 4.84 0.0424 0.15
Ba_paper_time12 0.31434 1.16973 1.3694 1.603 0.714 0.812 0.865 0.715 0.812 0.865 0.1249 0.980 4.03 6.14 0.0524 0.90
De_disp_index17 0.37527 1.19022 1.4554 1.780 0.723 0.824 0.879 0.723 0.825 0.879 0.1085 1.000 4.01 6.37 0.0538 0.35
num_of_pendown23 1.04273 1.69961 2.8370 4.735 0.688 0.821 0.864 0.688 0.821 0.865 0.1221 0.959 4.00 5.87 0.0431 0.80
De_paper_time10 0.56345 1.31971 1.7567 2.338 0.625 0.812 0.875 0.626 0.812 0.875 0.1204 1.018 3.99 6.39 0.0634 0.65
Ba_air_time4 0.04045 1.01956 1.0413 1.063 0.679 0.831 0.870 0.679 0.831 0.870 0.1167 0.903 3.98 5.48 0.0391 0.30
Ba_air_time18 0.06768 1.03332 1.0700 1.108 0.670 0.829 0.904 0.670 0.830 0.904 0.1084 1.105 3.97 7.13 0.0737 0.20
Ba_gmrt_in_air8 -0.00802 0.98797 0.9920 0.996 0.643 0.745 0.767 0.642 0.745 0.766 0.1261 0.829 3.96 4.87 0.0216 0.10
Ba_mean_acc_in_air25 -0.45053 0.50838 0.6373 0.799 0.696 0.843 0.886 0.697 0.844 0.886 0.1228 1.080 3.95 7.09 0.0428 0.60
Ba_air_time17 0.10653 1.05570 1.1124 1.172 0.795 0.816 0.862 0.795 0.816 0.862 0.1075 0.736 3.94 4.23 0.0461 1.00
De_mean_acc_on_paper21 0.74156 1.42795 2.0992 3.086 0.625 0.831 0.890 0.624 0.832 0.890 0.1037 0.950 3.90 5.82 0.0588 0.55
pressure_mean9 -0.73185 0.32302 0.4810 0.716 0.670 0.834 0.881 0.673 0.834 0.881 0.1109 1.012 3.90 6.91 0.0468 0.60
Ba_pressure_mean4 -1.98537 0.05193 0.1373 0.363 0.670 0.846 0.892 0.673 0.846 0.893 0.1035 0.969 3.87 6.12 0.0463 0.30
Ba_air_time7 0.06993 1.03591 1.0724 1.110 0.777 0.777 0.842 0.778 0.777 0.842 0.1047 0.865 3.87 5.21 0.0650 0.95
paper_time15 0.18002 1.09113 1.1972 1.314 0.696 0.800 0.846 0.697 0.800 0.846 0.1167 0.870 3.76 5.19 0.0457 0.95
Ba_mean_jerk_on_paper21 0.21270 1.10461 1.2370 1.385 0.652 0.755 0.828 0.652 0.756 0.829 0.1120 0.758 3.75 4.45 0.0731 0.45
Ba_air_time13 0.06734 1.03161 1.0697 1.109 0.670 0.779 0.829 0.670 0.779 0.829 0.1119 0.711 3.71 4.13 0.0500 1.00
De_pressure_mean5 -0.17175 0.76981 0.8422 0.921 0.688 0.789 0.836 0.691 0.789 0.836 0.1044 0.871 3.70 5.62 0.0471 0.40
De_paper_time8 0.01818 1.00816 1.0183 1.029 0.679 0.707 0.830 0.680 0.707 0.831 0.1087 0.757 3.70 4.42 0.1238 0.10
Ba_pressure_mean15 -0.06256 0.90837 0.9394 0.971 0.598 0.806 0.850 0.600 0.807 0.850 0.1060 0.538 3.70 3.03 0.0436 0.15
Ba_num_of_pendown5 0.07217 1.03410 1.0748 1.117 0.688 0.789 0.833 0.689 0.789 0.833 0.1075 0.753 3.69 4.41 0.0437 0.80
disp_index9 0.11135 1.05501 1.1178 1.184 0.661 0.774 0.823 0.660 0.774 0.823 0.1043 0.799 3.69 4.66 0.0486 0.45
De_paper_time9 0.15116 1.07681 1.1632 1.256 0.554 0.852 0.893 0.552 0.853 0.893 0.0966 0.884 3.66 5.34 0.0406 0.15
De_disp_index23 0.36573 1.18574 1.4416 1.753 0.714 0.787 0.843 0.715 0.788 0.843 0.1081 0.707 3.63 4.08 0.0551 0.90
pressure_var5 0.00782 1.00370 1.0079 1.012 0.723 0.782 0.810 0.723 0.782 0.810 0.0614 0.972 3.63 5.90 0.0277 0.15
Ba_mean_acc_on_paper24 -0.34220 0.58416 0.7102 0.863 0.545 0.847 0.893 0.545 0.847 0.893 0.1008 0.965 3.62 6.13 0.0461 0.30
Ba_gmrt_on_paper9 -0.09822 0.85998 0.9065 0.955 0.723 0.779 0.832 0.724 0.779 0.832 0.1009 0.846 3.61 4.99 0.0526 0.80
Ba_gmrt_on_paper1 -0.20921 0.72391 0.8112 0.909 0.661 0.841 0.872 0.660 0.841 0.873 0.1044 1.070 3.61 6.72 0.0319 0.35
De_disp_index22 0.14163 1.07310 1.1522 1.237 0.705 0.834 0.896 0.706 0.835 0.896 0.0908 0.946 3.59 5.71 0.0611 0.20
De_paper_time1 0.22689 1.10448 1.2547 1.425 0.652 0.791 0.831 0.652 0.791 0.832 0.0983 0.676 3.57 3.86 0.0403 0.45
Ba_air_time24 0.02039 1.00949 1.0206 1.032 0.688 0.790 0.828 0.688 0.791 0.828 0.0930 0.681 3.54 3.93 0.0368 0.15
Ba_gmrt_in_air7 -0.27133 0.65135 0.7624 0.892 0.714 0.831 0.874 0.715 0.832 0.874 0.1032 0.858 3.53 5.23 0.0420 0.90
Ba_mean_speed_on_paper11 -0.12303 0.82559 0.8842 0.947 0.670 0.846 0.884 0.670 0.846 0.884 0.0993 0.908 3.49 5.53 0.0383 0.35
Ba_air_time16 0.05666 1.02571 1.0583 1.092 0.732 0.806 0.836 0.733 0.806 0.837 0.1005 0.894 3.48 5.32 0.0302 0.75
Ba_gmrt_on_paper7 -0.08954 0.86773 0.9144 0.963 0.643 0.814 0.858 0.644 0.815 0.858 0.0985 0.711 3.48 4.12 0.0431 0.45
De_mean_speed_in_air19 -2.89272 0.00979 0.0554 0.314 0.545 0.865 0.890 0.545 0.866 0.890 0.0933 0.692 3.42 3.98 0.0240 1.00
De_total_time10 0.41874 1.19489 1.5200 1.934 0.598 0.882 0.917 0.599 0.883 0.917 0.0860 0.862 3.39 5.11 0.0343 0.30
Ba_mean_jerk_on_paper9 0.44595 1.21266 1.5620 2.012 0.643 0.842 0.885 0.643 0.842 0.886 0.0824 0.819 3.37 4.80 0.0433 0.50
paper_time2 0.01242 1.00518 1.0125 1.020 0.670 0.755 0.789 0.669 0.755 0.789 0.0917 0.792 3.36 4.59 0.0336 0.10
De_mean_jerk_in_air7 0.69635 1.30486 2.0064 3.085 0.696 0.807 0.837 0.694 0.807 0.837 0.0907 0.735 3.29 4.37 0.0300 0.80
Ba_mean_speed_in_air6 -0.02287 0.96418 0.9774 0.991 0.571 0.768 0.827 0.571 0.768 0.827 0.0826 0.782 3.28 4.51 0.0590 0.10
max_x_extension6 0.03151 1.01309 1.0320 1.051 0.527 0.843 0.863 0.527 0.843 0.863 0.0856 0.898 3.27 5.35 0.0204 0.10
paper_time17 0.22563 1.09611 1.2531 1.433 0.723 0.820 0.859 0.724 0.820 0.860 0.0867 0.798 3.23 4.70 0.0394 0.85
Ba_mean_acc_in_air2 0.02312 1.00907 1.0234 1.038 0.679 0.821 0.870 0.680 0.822 0.870 0.0833 0.919 3.19 5.57 0.0485 0.15
Ba_mean_acc_in_air21 0.01604 1.00609 1.0162 1.026 0.562 0.857 0.875 0.563 0.857 0.875 0.0721 0.824 3.15 4.81 0.0179 0.15
De_mean_speed_in_air24 -0.14999 0.78250 0.8607 0.947 0.527 0.788 0.806 0.526 0.788 0.806 0.0768 0.833 3.12 4.86 0.0176 0.10
De_gmrt_on_paper17 -0.04819 0.92332 0.9529 0.984 0.643 0.782 0.833 0.644 0.782 0.833 0.0707 0.469 3.05 2.62 0.0510 0.25
De_max_y_extension24 -0.16847 0.75791 0.8450 0.942 0.545 0.807 0.842 0.544 0.807 0.843 0.0723 0.605 2.98 3.40 0.0352 0.25

## Let's focus on the new features

decorCoeff <- smDecor$coefficients[newvars,];
ncoef <- dc[newvars]
cnames <- lapply(ncoef,names)
names(cnames) <- NULL;
decorCoeff$Elements <- lapply(cnames,paste,collapse="+")
pander::pander(decorCoeff)
  Estimate lower OR upper u.Accuracy r.Accuracy full.Accuracy u.AUC r.AUC full.AUC IDI NRI z.IDI z.NRI Delta.AUC Frequency Elements
De_gmrt_in_air19 4.28673 11.965 72.728 442.083 0.571 0.818 0.900 0.572 0.818 0.900 0.1674 1.096 4.82 7.29 0.08167 1.00 gmrt_in_air19+mean_speed_in_air19
De_mean_jerk_in_air19 2.27760 3.435 9.753 27.695 0.607 0.811 0.881 0.606 0.811 0.881 0.1349 0.976 4.30 6.01 0.07039 0.20 mean_acc_in_air19+mean_jerk_in_air19
De_mean_gmrt19 3.06929 4.858 21.527 95.388 0.634 0.868 0.914 0.634 0.868 0.914 0.1131 1.023 4.04 6.32 0.04576 1.00 gmrt_on_paper19+mean_gmrt19+mean_speed_in_air19
De_paper_time10 0.34638 1.180 1.414 1.694 0.625 0.792 0.865 0.626 0.792 0.865 0.1222 0.975 3.95 6.20 0.07318 0.95 disp_index10+paper_time10
De_num_of_pendown21 -0.12229 0.829 0.885 0.945 0.589 0.821 0.865 0.590 0.821 0.865 0.1058 0.809 3.84 4.79 0.04347 0.85 air_time21+num_of_pendown21
De_pressure_mean5 -0.33315 0.600 0.717 0.856 0.688 0.815 0.871 0.691 0.815 0.871 0.1044 0.860 3.83 5.44 0.05573 0.90 max_y_extension5+pressure_mean5
De_mean_gmrt10 -0.02584 0.961 0.974 0.988 0.625 0.795 0.839 0.625 0.796 0.839 0.0970 0.664 3.69 3.73 0.04336 0.10 gmrt_in_air10+mean_gmrt10
De_mean_jerk_in_air7 0.75300 1.389 2.123 3.245 0.696 0.809 0.848 0.694 0.810 0.848 0.1060 0.892 3.65 5.57 0.03820 1.00 mean_acc_in_air7+mean_jerk_in_air7
De_mean_acc_on_paper21 0.31464 1.152 1.370 1.628 0.625 0.805 0.861 0.624 0.806 0.861 0.0918 0.909 3.60 5.46 0.05563 0.70 mean_acc_on_paper21+mean_speed_on_paper21
De_mean_speed_on_paper15 -0.06263 0.907 0.939 0.973 0.625 0.807 0.828 0.626 0.808 0.828 0.1007 1.026 3.57 6.57 0.02027 0.10 gmrt_on_paper15+mean_speed_on_paper15
De_disp_index17 0.11562 1.054 1.123 1.196 0.723 0.755 0.809 0.723 0.755 0.809 0.0955 0.782 3.51 4.58 0.05376 0.65 disp_index17+max_y_extension17
De_mean_jerk_on_paper5 0.00763 1.003 1.008 1.012 0.688 0.708 0.767 0.687 0.708 0.766 0.0917 0.610 3.50 3.40 0.05856 0.15 max_y_extension5+mean_jerk_on_paper5
De_paper_time5 -0.05514 0.917 0.946 0.976 0.643 0.832 0.856 0.645 0.832 0.856 0.0858 0.889 3.45 5.62 0.02403 0.15 max_y_extension5+mean_speed_on_paper5+paper_time5
De_mean_jerk_on_paper4 0.30755 1.139 1.360 1.625 0.643 0.835 0.869 0.642 0.835 0.869 0.0804 0.967 3.30 5.89 0.03350 0.25 mean_acc_on_paper4+mean_jerk_on_paper4
De_total_time10 0.06314 1.027 1.065 1.105 0.598 0.858 0.866 0.599 0.858 0.866 0.0836 0.606 3.27 3.41 0.00787 0.15 air_time10+total_time10
De_max_y_extension24 -0.22207 0.698 0.801 0.919 0.545 0.833 0.877 0.544 0.833 0.877 0.0806 0.824 3.20 4.96 0.04360 0.30 max_x_extension24+max_y_extension24
De_mean_jerk_in_air2 0.01672 1.006 1.017 1.028 0.643 0.747 0.793 0.641 0.747 0.793 0.0781 0.739 3.06 4.27 0.04579 0.10 mean_acc_in_air2+mean_jerk_in_air2
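The Elements column above concatenates, with "+", the names of the original features that enter each decorrelated variable. Below is a minimal sketch of that step. It assumes, as the code above suggests, that `dc` is a list of named numeric vectors keyed by the decorrelated feature name. The toy list `dcToy` is hypothetical and only reuses two coefficient sets reported in the final table.

```r
## Hypothetical stand-in for `dc`: one named coefficient vector per
## decorrelated feature; the vector names are the contributing features
dcToy <- list(
  De_gmrt_in_air19 = c(gmrt_in_air19 = 1.000, mean_speed_in_air19 = -0.944),
  De_mean_jerk_in_air7 = c(mean_acc_in_air7 = -1.149, mean_jerk_in_air7 = 1.000)
)

## Join the contributing feature names with "+", as in the Elements column;
## vapply() keeps the list names and returns a plain character vector
elements <- vapply(dcToy, function(v) paste(names(v), collapse = "+"),
                   character(1))
print(elements)
```

Using `vapply()` instead of `lapply()` yields a character vector rather than a list, which is usually easier to store as a data-frame column.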

Differences Between Blind and Outcome-Driven Decorrelation

In this section I compare the unaltered basis features (those without the De_ prefix) selected by the model built on the outcome-driven transformation with those selected by the model built on the blind decorrelation.

par(op)
par(mfrow=c(1,1))


smDecorU <- summary(bmdU)
decornamesU <- rownames(smDecorU$coefficients)

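## Unaltered basis features (names without the "De_" prefix) in each model;
## base-R grepl() avoids depending on stringr, which is not loaded above
get_De_names <- decornames[!grepl("^De_", decornames)]
get_De_namesU <- decornamesU[!grepl("^De_", decornamesU)]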

unn <- bmd$univariate[,3]
names(unn) <- rownames(bmd$univariate)
pander::pander(as.matrix(unn[get_De_names]))
Ba_paper_time23 8.31
Ba_paper_time9 7.47
Ba_air_time22 6.20
Ba_disp_index6 4.42
num_of_pendown19 4.13
Ba_mean_speed_in_air19 4.20
Ba_total_time6 6.21
Ba_mean_gmrt14 4.60
pressure_mean1 3.73
Ba_mean_speed_in_air25 5.87
Ba_max_x_extension21 2.80
Ba_pressure_mean8 4.45
Ba_gmrt_in_air7 6.22
Ba_air_time23 9.33
Ba_num_of_pendown15 4.72
Ba_mean_speed_on_paper11 4.92
Ba_total_time8 5.24
Ba_max_y_extension19 3.22
disp_index7 4.45
air_time9 3.95
pressure_var19 2.94
Ba_mean_jerk_on_paper21 4.54
Ba_gmrt_on_paper15 4.15
Ba_air_time17 7.30
Ba_gmrt_in_air23 6.02
Ba_mean_acc_in_air2 3.74
Ba_total_time15 8.70
Ba_total_time24 4.46
Ba_mean_speed_on_paper25 3.91
Ba_total_time18 5.47
Ba_pressure_mean18 4.01
Ba_air_time19 1.26
Ba_air_time13 5.66
Ba_total_time2 5.81
num_of_pendown23 4.83
Ba_num_of_pendown9 4.89
Ba_gmrt_on_paper7 5.15
Ba_pressure_mean4 4.99
Ba_gmrt_on_paper9 5.82
Ba_paper_time12 5.59
Ba_total_time3 4.96
paper_time17 5.96
Ba_air_time5 5.22
max_x_extension6 1.91
Ba_air_time4 4.18
Ba_gmrt_on_paper1 3.59
Ba_total_time16 5.95
Ba_pressure_var4 4.19
Ba_mean_acc_in_air17 4.78
pressure_mean7 4.39
Ba_mean_jerk_on_paper9 3.30
Ba_disp_index15 4.13
pressure_mean9 4.41
paper_time15 5.90
Ba_paper_time24 3.72
Ba_gmrt_on_paper22 4.57
Ba_num_of_pendown5 5.21
Ba_air_time7 6.78
Ba_gmrt_in_air8 3.68
Ba_mean_gmrt1 4.12
Ba_pressure_mean15 4.31
Ba_mean_jerk_on_paper24 3.32
paper_time13 4.83
pressure_var5 4.07
max_y_extension25 1.92
Ba_air_time11 4.04
Ba_mean_acc_on_paper3 3.67
paper_time2 4.73
Ba_mean_speed_on_paper2 4.31
air_time12 3.95
pander::pander(summary(unn[get_De_names]))
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.26 4.02 4.5 4.77 5.56 9.33

unnU <- bmdU$univariate[,3]
names(unnU) <- rownames(bmdU$univariate)
pander::pander(as.matrix(unnU[get_De_namesU]))
Ba_total_time9 6.83
Ba_disp_index6 4.42
Ba_air_time6 5.86
num_of_pendown19 4.13
Ba_max_y_extension19 3.22
Ba_gmrt_on_paper23 5.00
Ba_air_time8 3.73
Ba_air_time22 6.20
Ba_air_time15 8.15
pressure_var19 2.94
Ba_num_of_pendown9 4.89
Ba_air_time23 9.33
Ba_num_of_pendown15 4.72
Ba_max_x_extension21 2.80
Ba_air_time5 5.22
Ba_air_time2 5.07
Ba_gmrt_in_air23 6.02
Ba_pressure_mean18 4.01
Ba_paper_time12 5.59
num_of_pendown23 4.83
Ba_air_time4 4.18
Ba_air_time18 4.43
Ba_gmrt_in_air8 3.68
Ba_mean_acc_in_air25 5.01
Ba_air_time17 7.30
pressure_mean9 4.41
Ba_pressure_mean4 4.99
Ba_air_time7 6.78
paper_time15 5.90
Ba_mean_jerk_on_paper21 4.54
Ba_air_time13 5.66
Ba_pressure_mean15 4.31
Ba_num_of_pendown5 5.21
disp_index9 4.37
pressure_var5 4.07
Ba_mean_acc_on_paper24 1.96
Ba_gmrt_on_paper9 5.82
Ba_gmrt_on_paper1 3.59
Ba_air_time24 3.97
Ba_gmrt_in_air7 6.22
Ba_mean_speed_on_paper11 4.92
Ba_air_time16 5.67
Ba_gmrt_on_paper7 5.15
Ba_mean_jerk_on_paper9 3.30
paper_time2 4.73
Ba_mean_speed_in_air6 2.13
max_x_extension6 1.91
paper_time17 5.96
Ba_mean_acc_in_air2 3.74
Ba_mean_acc_in_air21 1.49
pander::pander(summary(unnU[get_De_namesU]))
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.49 3.98 4.78 4.77 5.67 9.33
#boxplot(unn[get_De_names],unnU[get_De_namesU],xlab=c("Method"),ylab="Z",main="Z Values of Basis Features")

x1 <- unn[get_De_names]
x2 <- unnU[get_De_namesU]
X3 <- x1[!(get_De_names %in% get_De_namesU)]
X4 <- x2[!(get_De_namesU %in% get_De_names)]
vioplot(x1, x2, X3, X4,
        names = c("Outcome-Driven",
                  "Blind",
                  "Not in Blind",
                  "Not in Outcome-Driven"),
        ylab = "Z IDI",
        col = "gold")
title("Violin Plots of Unaltered-Basis")


sameFeatures <- get_De_names[get_De_names %in% get_De_namesU]
pander::pander(as.matrix(unn[sameFeatures]))
Ba_air_time22 6.20
Ba_disp_index6 4.42
num_of_pendown19 4.13
Ba_max_x_extension21 2.80
Ba_gmrt_in_air7 6.22
Ba_air_time23 9.33
Ba_num_of_pendown15 4.72
Ba_mean_speed_on_paper11 4.92
Ba_max_y_extension19 3.22
pressure_var19 2.94
Ba_mean_jerk_on_paper21 4.54
Ba_air_time17 7.30
Ba_gmrt_in_air23 6.02
Ba_mean_acc_in_air2 3.74
Ba_pressure_mean18 4.01
Ba_air_time13 5.66
num_of_pendown23 4.83
Ba_num_of_pendown9 4.89
Ba_gmrt_on_paper7 5.15
Ba_pressure_mean4 4.99
Ba_gmrt_on_paper9 5.82
Ba_paper_time12 5.59
paper_time17 5.96
Ba_air_time5 5.22
max_x_extension6 1.91
Ba_air_time4 4.18
Ba_gmrt_on_paper1 3.59
Ba_mean_jerk_on_paper9 3.30
pressure_mean9 4.41
paper_time15 5.90
Ba_num_of_pendown5 5.21
Ba_air_time7 6.78
Ba_gmrt_in_air8 3.68
Ba_pressure_mean15 4.31
pressure_var5 4.07
paper_time2 4.73
## Features selected by the outcome-driven model but not by the blind model
pander::pander(as.matrix(x1[!(get_De_names %in% get_De_namesU)]))
Ba_paper_time23 8.31
Ba_paper_time9 7.47
Ba_mean_speed_in_air19 4.20
Ba_total_time6 6.21
Ba_mean_gmrt14 4.60
pressure_mean1 3.73
Ba_mean_speed_in_air25 5.87
Ba_pressure_mean8 4.45
Ba_total_time8 5.24
disp_index7 4.45
air_time9 3.95
Ba_gmrt_on_paper15 4.15
Ba_total_time15 8.70
Ba_total_time24 4.46
Ba_mean_speed_on_paper25 3.91
Ba_total_time18 5.47
Ba_air_time19 1.26
Ba_total_time2 5.81
Ba_total_time3 4.96
Ba_total_time16 5.95
Ba_pressure_var4 4.19
Ba_mean_acc_in_air17 4.78
pressure_mean7 4.39
Ba_disp_index15 4.13
Ba_paper_time24 3.72
Ba_gmrt_on_paper22 4.57
Ba_mean_gmrt1 4.12
Ba_mean_jerk_on_paper24 3.32
paper_time13 4.83
max_y_extension25 1.92
Ba_air_time11 4.04
Ba_mean_acc_on_paper3 3.67
Ba_mean_speed_on_paper2 4.31
air_time12 3.95

## Features selected by the blind model but not by the outcome-driven model
pander::pander(as.matrix(x2[!(get_De_namesU %in% get_De_names)]))
Ba_total_time9 6.83
Ba_air_time6 5.86
Ba_gmrt_on_paper23 5.00
Ba_air_time8 3.73
Ba_air_time15 8.15
Ba_air_time2 5.07
Ba_air_time18 4.43
Ba_mean_acc_in_air25 5.01
disp_index9 4.37
Ba_mean_acc_on_paper24 1.96
Ba_air_time24 3.97
Ba_air_time16 5.67
Ba_mean_speed_in_air6 2.13
Ba_mean_acc_in_air21 1.49
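The comparison in this section reduces to set operations on the feature names of the two models. The sketch below assumes two short, hypothetical excerpts of the selected feature lists; the memberships shown do match the tables above. It uses `intersect()` and `setdiff()` on names. The code above instead uses `%in%` on the named z vectors, so that the z values are kept alongside the names.

```r
## Hypothetical excerpts of the features selected by each model
outcomeDriven <- c("Ba_paper_time23", "Ba_air_time22", "De_gmrt_in_air19",
                   "num_of_pendown19")
blind <- c("Ba_air_time22", "Ba_air_time15", "num_of_pendown19")

## Keep only the unaltered basis features, i.e. drop the "De_" features
basisOD <- outcomeDriven[!grepl("^De_", outcomeDriven)]
basisBlind <- blind[!grepl("^De_", blind)]

shared <- intersect(basisOD, basisBlind)   # selected by both models
onlyOD <- setdiff(basisOD, basisBlind)     # outcome-driven model only
onlyBlind <- setdiff(basisBlind, basisOD)  # blind model only
print(list(shared = shared, onlyOD = onlyOD, onlyBlind = onlyBlind))
```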

The Final Table

I’ll create a table that subsets the coefficients of the logistic model fitted to the outcome-driven decorrelated data.

The table will have:

  1. The top-associated features described by the feature network, together with the new features.

    1. For decorrelated features, it provides the decorrelation formula.
  2. Nugget labels

    1. The nugget label found by the clustering procedure (features that belong to no nugget are labelled "D").
  3. The feature coefficient

  4. The feature odds ratios and their corresponding 95% CI

  5. The z-value of the IDI (z.IDI)
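In a logistic model the odds ratio is the exponentiated coefficient, OR = exp(Estimate). The reported intervals are also consistent with a Wald-type interval on the log-odds scale. For example, the logs of the De_gmrt_in_air19 bounds (11.9647 and 442.083) average to its Estimate of 4.2867. The sketch below assumes that construction. `orWithCI()` and `seFromCI()` are hypothetical helpers, not FRESA.CAD functions.

```r
## Hypothetical helper (not part of FRESA.CAD): OR and Wald CI from a
## logistic-regression coefficient `beta` and its standard error `se`
orWithCI <- function(beta, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)   # about 1.96 for a 95% interval
  c(lower = exp(beta - z * se), OR = exp(beta), upper = exp(beta + z * se))
}

## Hypothetical helper: standard error implied by a reported Wald interval
seFromCI <- function(lower, upper, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  (log(upper) - log(lower)) / (2 * z)
}

## Values reported for De_gmrt_in_air19
se <- seFromCI(11.9647, 442.083)
ci <- orWithCI(4.28673, se)
print(round(ci, 3))
```

This shows why very large odds ratios, such as the one for De_gmrt_in_air19, come with wide intervals. A moderate standard error on the log scale becomes a large multiplicative range after exponentiation.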


## The features grouped into nuggets by the clustering procedure
clusterFeatures <- clusterOutcome$names
## The new features whose ZUni value (zvaluePrePost) is below 1.96
discoveredFeatures <- newvars[zvaluePrePost$ZUni<1.96]

tablefinal <- smDecor$coefficients[unique(c(clusterFeatures,discoveredFeatures)),
                                   c("Estimate","lower","OR","upper","z.IDI")]

nugget <- clusterOutcome$membership
names(nugget) <- clusterOutcome$names
tablefinal$Nugget <- nugget[rownames(tablefinal)]
tablefinal$Nugget[is.na(tablefinal$Nugget)] <- "D"
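## Build a readable formula for each decorrelated feature as a signed
## linear combination "coefficient*feature" of its component features
deFormula <- character(length(theDeFormulas))
names(deFormula) <- names(theDeFormulas)
for (dx in names(deFormula))
{
  coef <- theDeFormulas[[dx]]   # named vector of decorrelation coefficients
  cname <- names(coef)
  names(cname) <- cname
  for (cf in names(coef))
  {
    if (cf != dx)
    {
      if (coef[cf]>0)
      {
        ## Positive terms get an explicit "+" sign
        deFormula[dx] <- paste(deFormula[dx],
                               sprintf("+ %5.3f*%s",coef[cf],cname[cf]))
      }
      else
      {
        ## Negative terms already carry "-" from sprintf()
        deFormula[dx] <- paste(deFormula[dx],
                               sprintf("%5.3f*%s",coef[cf],cname[cf]))
      }
    }
  }
}
tablefinal$DecorFormula <- deFormula[rownames(tablefinal)]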
pander::pander(tablefinal)
  Estimate lower OR upper z.IDI Nugget DecorFormula
Ba_air_time23 0.0589 1.0335 1.061 1.088 4.49 1 NA
Ba_total_time15 0.0972 1.0514 1.102 1.155 4.06 1 NA
Ba_paper_time23 0.7833 1.7139 2.189 2.795 6.29 2 NA
De_mean_gmrt23 0.6458 1.3262 1.908 2.744 3.49 2 -0.845*gmrt_in_air23 + 1.000*mean_gmrt23
Ba_paper_time9 0.5798 1.4641 1.786 2.178 5.80 3 NA
Ba_air_time17 0.1373 1.0764 1.147 1.223 4.20 2 NA
Ba_gmrt_in_air23 -0.6472 0.3876 0.524 0.707 4.16 3 NA
De_mean_gmrt19 3.0693 4.8581 21.527 95.388 4.04 3 -0.398*gmrt_on_paper19 + 1.000*mean_gmrt19 -0.685*mean_speed_in_air19
Ba_air_time22 0.4542 1.3374 1.575 1.855 5.56 4 NA
Ba_total_time6 0.4300 1.2967 1.537 1.823 5.01 4 NA
Ba_max_x_extension21 -1.9213 0.0646 0.146 0.332 4.63 3 NA
num_of_pendown19 -0.9902 0.2470 0.371 0.559 5.06 4 NA
De_gmrt_in_air19 4.2867 11.9647 72.728 442.083 4.82 4 + 1.000*gmrt_in_air19 -0.944*mean_speed_in_air19
Ba_air_time7 0.0712 1.0331 1.074 1.116 3.45 4 NA
Ba_mean_speed_in_air25 -0.7592 0.3405 0.468 0.643 4.70 4 NA
Ba_total_time8 0.2211 1.1249 1.247 1.383 4.40 4 NA
Ba_gmrt_in_air7 -0.2184 0.7288 0.804 0.886 4.52 5 NA
Ba_total_time2 0.1246 1.0614 1.133 1.209 3.98 5 NA
Ba_air_time5 0.0801 1.0406 1.083 1.128 3.82 5 NA
pressure_var19 -0.3574 0.5909 0.699 0.828 4.34 5 NA
paper_time15 0.1230 1.0552 1.131 1.212 3.54 5 NA
pressure_mean9 -0.7915 0.2905 0.453 0.707 3.55 6 NA
Ba_air_time13 0.0548 1.0275 1.056 1.086 3.98 7 NA
Ba_total_time16 0.0666 1.0297 1.069 1.110 3.70 8 NA
Ba_gmrt_on_paper9 -0.0958 0.8629 0.909 0.957 3.85 7 NA
Ba_paper_time12 0.1641 1.0828 1.178 1.282 3.84 7 NA
Ba_mean_jerk_on_paper24 -0.3081 0.6150 0.735 0.878 3.34 5 NA
paper_time17 0.1015 1.0492 1.107 1.168 3.82 7 NA
De_mean_jerk_in_air7 0.7530 1.3893 2.123 3.245 3.65 5 -1.149*mean_acc_in_air7 + 1.000*mean_jerk_in_air7
num_of_pendown23 0.8183 1.4931 2.267 3.441 3.94 7 NA
Ba_num_of_pendown5 0.0541 1.0240 1.056 1.088 3.45 7 NA
Ba_disp_index6 0.3630 1.2493 1.438 1.655 5.23 8 NA
De_pressure_mean5 -0.3331 0.6003 0.717 0.856 3.83 5 -0.889*max_y_extension5 + 1.000*pressure_mean5
Ba_total_time18 0.1231 1.0588 1.131 1.208 4.00 9 NA
Ba_num_of_pendown9 0.0565 1.0264 1.058 1.091 3.94 7 NA
Ba_total_time24 0.1504 1.0789 1.162 1.252 4.02 8 NA
Ba_max_y_extension19 -1.2828 0.1521 0.277 0.505 4.37 9 NA
Ba_pressure_mean4 -1.3352 0.1313 0.263 0.527 3.90 5 NA
pressure_mean7 -0.3221 0.6041 0.725 0.869 3.63 10 NA
De_paper_time10 0.3464 1.1799 1.414 1.694 3.95 8 -1.620*disp_index10 + 1.000*paper_time10
Ba_mean_jerk_on_paper21 0.3979 1.2225 1.489 1.813 4.30 7 NA
Ba_num_of_pendown15 0.2947 1.1667 1.343 1.545 4.49 9 NA
De_mean_acc_on_paper21 0.3146 1.1525 1.370 1.628 3.60 5 + 1.000*mean_acc_on_paper21 + 0.610*mean_speed_on_paper21
De_num_of_pendown21 -0.1223 0.8290 0.885 0.945 3.84 7 -0.803*air_time21 + 1.000*num_of_pendown21
Ba_air_time4 0.0334 1.0152 1.034 1.053 3.75 7 NA
Ba_mean_jerk_on_paper9 0.2390 1.1112 1.270 1.451 3.62 8 NA
Ba_mean_speed_on_paper11 -0.1753 0.7719 0.839 0.912 4.47 9 NA
Ba_gmrt_on_paper7 -0.0296 0.9564 0.971 0.985 3.93 11 NA
De_disp_index22 0.1689 1.0887 1.184 1.288 4.10 9 NA
Ba_total_time3 0.0400 1.0176 1.041 1.064 3.83 12 NA
paper_time13 0.0498 1.0216 1.051 1.081 3.32 9 NA
De_gmrt_on_paper17 -0.0467 0.9302 0.954 0.979 3.44 13 NA
De_disp_index17 0.1156 1.0538 1.123 1.196 3.51 14 + 1.000*disp_index17 -1.396*max_y_extension17
Ba_mean_speed_in_air19 -0.2775 0.6695 0.758 0.858 5.02 15 NA
pressure_var5 0.0129 1.0050 1.013 1.021 3.30 16 NA
Ba_air_time19 -0.0377 0.9446 0.963 0.982 3.99 15 NA
De_paper_time5 -0.0551 0.9174 0.946 0.976 3.45 D -1.154*max_y_extension5 + 0.946*mean_speed_on_paper5 + 1.000*paper_time5
De_max_y_extension24 -0.2221 0.6975 0.801 0.919 3.20 D -0.987*max_x_extension24 + 1.000*max_y_extension24
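The nested loop that builds DecorFormula can be written more compactly. In `sprintf()`, the `+` flag always prints the sign of a number, so positive and negative terms need no separate branches. The sketch below assumes that each element of `theDeFormulas` is a named numeric vector of coefficients. `deFormulaString()` is a hypothetical helper. One difference from the table above: it prints "+1.000" rather than "+ 1.000".

```r
## Hypothetical compact alternative to the nested formula-building loop
deFormulaString <- function(coefs, dx) {
  coefs <- coefs[names(coefs) != dx]                 # skip the feature's own name
  terms <- sprintf("%+.3f*%s", coefs, names(coefs))  # sign always printed
  paste(terms, collapse = " ")
}

## Coefficients reported for De_mean_gmrt23 in the final table
example <- c(gmrt_in_air23 = -0.845, mean_gmrt23 = 1.000)
print(deFormulaString(example, "De_mean_gmrt23"))
```

Applied to the whole list, `mapply(deFormulaString, theDeFormulas, names(theDeFormulas))` would return one formula string per decorrelated feature.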

Saving all the generated data

save.image("~/GitHub/FCA/DARWINDemo.RData")