Foreword: About the Machine Learning in Medicine (MLM) project
The MLM project was initiated in 2016 and aims to:
Encourage the use of machine learning techniques in medical research in Vietnam, and
Promote the use of the R statistical programming language, an open-source and leading tool for practicing data science.
Introduction
H2O is an open-source, fast and scalable machine learning platform that can be run from within the R environment. Unlike other ML frameworks in R (caret, mlr), h2o provides its own library of machine learning algorithms, including powerful ones such as Random Forest and Deep Learning. H2O algorithms can also be used, with full compatibility, from other ML platforms such as mlr and sparklyr.
In contrast to many stand-alone ML packages in R, all of h2o's algorithms have been customised to provide good "out-of-the-box" performance: fast training, high stability and good accuracy on most tasks, even without any tuning.
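For readers who have not used h2o before, a minimal set-up looks roughly like this (a sketch only; the exact initialisation used in this tutorial appears later, and cluster details will differ between machines):
install.packages("h2o")     # install h2o from CRAN
library(h2o)
h2o.init(nthreads = -1)     # start a local H2O cluster on all available cores (needs a Java runtime)
h2o.clusterInfo()           # confirm that R is connected to the cluster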
In the present tutorial, we set up a benchmark study to compare the performance of two of the most powerful algorithms in h2o in their native forms (with neither tuning nor feature preprocessing): Random Forest versus a deep neural network.
Materials and methods
Our case study uses the UCI Mammographic Mass dataset. Mammography is the most effective method for breast cancer screening available today. However, the low positive predictive value of breast biopsies resulting from mammogram interpretation leads to approximately 70% unnecessary biopsies with benign outcomes. To reduce the high number of unnecessary breast biopsies, several computer-aided diagnosis (CAD) systems have been proposed in recent years. These systems help physicians decide whether to perform a breast biopsy on a suspicious lesion seen in a mammogram or to schedule a short-term follow-up examination instead.
The dataset was built to develop a model that predicts the severity (benign or malignant) of a mammographic mass lesion from BI-RADS attributes and the patient's age. It contains the patient's age and three BI-RADS attributes, together with the ground truth (the Severity field), for 516 benign and 445 malignant masses identified on full-field digital mammograms collected at the Institute of Radiology of the University Erlangen-Nuremberg between 2003 and 2006. Such data can indicate how well a CAD system performs compared with radiologists.
First, we prepare a custom ggplot theme for our figures:
library(tidyverse)
my_theme <- function(base_size = 10, base_family = "sans"){
  theme_minimal(base_size = base_size, base_family = base_family) +
    theme(
      axis.text = element_text(size = 10),
      axis.text.x = element_text(angle = 0, vjust = 0.5, hjust = 0.5),
      axis.title = element_text(size = 12),
      panel.grid.major = element_line(color = "grey"),
      panel.grid.minor = element_blank(),
      panel.background = element_rect(fill = "#faefff"),
      strip.background = element_rect(fill = "#400156", color = "#400156", size = 0.5),
      strip.text = element_text(face = "bold", size = 10, color = "white"),
      legend.position = "bottom",
      legend.justification = "center",
      legend.background = element_blank(),
      panel.border = element_rect(color = "grey30", fill = NA, size = 0.5)
    )
}
theme_set(my_theme())
mycolors = c("#ce002c", "#02afdb", "#ca20f9", "#18bf7f", "#f9b700")
Now we load the dataset from the UCI website and make some modifications:
df = read.table("https://archive.ics.uci.edu/ml/machine-learning-databases/mammographic-masses/mammographic_masses.data",
                sep = ",", na.strings = "?") %>%
  as_tibble() %>%
  .[, -1]   # drop the first column (the BI-RADS assessment)
names(df) = c("Age", "Shape", "Margin", "Density", "Severity")
df$Shape    = recode_factor(df$Shape,   `1` = "Round", `2` = "Oval", `3` = "Lobular", `4` = "Irregular")
df$Margin   = recode_factor(df$Margin,  `1` = "Circumscribed", `2` = "Microlobulated", `3` = "Obscured", `4` = "Illdefined", `5` = "Spiculated")
df$Density  = recode_factor(df$Density, `1` = "High", `2` = "Iso", `3` = "Low", `4` = "Fatcontaining")
df$Severity = recode_factor(df$Severity, `0` = "Benign", `1` = "Malignant")
df$Age      = as.numeric(df$Age)
df
## # A tibble: 961 × 5
## Age Shape Margin Density Severity
## <dbl> <fctr> <fctr> <fctr> <fctr>
## 1 67 Lobular Spiculated Low Malignant
## 2 43 Round Circumscribed NA Malignant
## 3 58 Irregular Spiculated Low Malignant
## 4 28 Round Circumscribed Low Benign
## 5 74 Round Spiculated NA Malignant
## 6 65 Round NA Low Benign
## 7 70 NA NA Low Benign
## 8 42 Round NA Low Benign
## 9 57 Round Spiculated Low Malignant
## 10 60 NA Spiculated High Malignant
## # ... with 951 more rows
There are some missing values in our data. The good news is that h2o's Random Forest and Deep Learning algorithms handle missing values automatically, so we don't need to impute anything.
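As a quick numeric check (a small sketch using base R; the helper functions that follow visualise the same pattern), the number of missing values per column can be counted directly:
sapply(df, function(x) sum(is.na(x)))   # number of NAs in each column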
bar_missing <- function(x){
  library(dplyr)
  library(reshape2)
  library(ggplot2)
  x %>%
    is.na %>%
    melt %>%
    ggplot(data = ., aes(x = Var2)) +
    geom_bar(aes(y = (..count..), fill = value), alpha = 0.7) +
    scale_fill_manual(values = c("skyblue", "red"), name = "",
                      labels = c("Available", "Missing")) +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, vjust = 0.5)) +
    labs(x = "Variables in Dataset",
         y = "Observations") +
    coord_flip()
}
matrix_missing <- function(x){
  library(dplyr)
  library(reshape2)
  library(ggplot2)
  x %>%
    is.na %>%
    melt %>%
    ggplot(data = ., aes(x = Var1, y = Var2)) +
    geom_tile(aes(fill = value), alpha = 0.6) +
    scale_fill_manual(values = c("skyblue", "red"), name = "",
                      labels = c("Available", "Missing")) +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, vjust = 0.5)) +
    labs(x = "Variables in Dataset",
         y = "Total observations") +
    coord_flip()
}
df%>%bar_missing()
df%>%matrix_missing()
Data exploration
Hmisc::describe(df)
## df
##
## 5 Variables 961 Observations
## ---------------------------------------------------------------------------
## Age
## n missing distinct Info Mean Gmd .05 .10
## 956 5 73 0.999 55.49 16.43 30.75 36.00
## .25 .50 .75 .90 .95
## 45.00 57.00 66.00 73.50 78.00
##
## lowest : 18 19 20 21 22, highest: 86 87 88 93 96
## ---------------------------------------------------------------------------
## Shape
## n missing distinct
## 930 31 4
##
## Value Round Oval Lobular Irregular
## Frequency 224 211 95 400
## Proportion 0.241 0.227 0.102 0.430
## ---------------------------------------------------------------------------
## Margin
## n missing distinct
## 913 48 5
##
## Value Circumscribed Microlobulated Obscured Illdefined
## Frequency 357 24 116 280
## Proportion 0.391 0.026 0.127 0.307
##
## Value Spiculated
## Frequency 136
## Proportion 0.149
## ---------------------------------------------------------------------------
## Density
## n missing distinct
## 885 76 4
##
## Value High Iso Low Fatcontaining
## Frequency 16 59 798 12
## Proportion 0.018 0.067 0.902 0.014
## ---------------------------------------------------------------------------
## Severity
## n missing distinct
## 961 0 2
##
## Value Benign Malignant
## Frequency 516 445
## Proportion 0.537 0.463
## ---------------------------------------------------------------------------
Data visualisation
p1 = df %>% ggplot(aes(x = Severity, fill = Shape)) + geom_bar(position = "fill", color = "black", alpha = 0.7) +
  scale_fill_manual(values = mycolors) + coord_flip() + ggtitle("Shape")
p2 = df %>% ggplot(aes(x = Severity, fill = Margin)) + geom_bar(position = "fill", color = "black", alpha = 0.7) +
  scale_fill_manual(values = mycolors) + coord_flip() + ggtitle("Margin")
p3 = df %>% ggplot(aes(x = Severity, fill = Density)) + geom_bar(position = "fill", color = "black", alpha = 0.7) +
  scale_fill_manual(values = mycolors) + coord_flip() + ggtitle("Density")
p4 = df %>% ggplot(aes(fill = Severity, x = Severity, y = Age)) + geom_boxplot(alpha = 0.7) +
  scale_fill_manual(values = c("#02afdb", "#ce002c")) + coord_flip() + ggtitle("Age")
library(gridExtra)
grid.arrange(p1, p2, p3, p4, ncol = 2)
Our data have a simple structure. To summarise, we have a binary classification problem: the target is tumour severity (Benign or Malignant), and there are 4 features, one numerical (Age) and three multilevel categorical features (Shape, Margin and Density). We can see clear contrasts in Age, Shape, Margin and Density between the two severity classes.
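The class balance of the target can also be checked with a one-line count (a sketch using dplyr, loaded with the tidyverse):
df %>% count(Severity)   # 516 Benign vs 445 Malignant, as reported above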
The caret package will be used for data splitting. The original dataset will be randomly split into 3 parts: a training subset of 761 cases used for model training, and validation and testing subsets of 100 cases each, used for model validation and testing. The proportion of the two Severity labels is well balanced across the three subsets.
library(caret)
set.seed(123)
idTrain=createDataPartition(y=df$Severity,p=760/961,list=FALSE)
trainset=df[idTrain,]
remain=df[-idTrain,]
idtest=createDataPartition(y=remain$Severity,p=99/200,list=FALSE)
validset=remain[idtest,]
testset=remain[-idtest,]
sp1 = df       %>% ggplot(aes(x = Severity, fill = Severity)) + stat_count(color = "black", alpha = 0.7, show.legend = FALSE) +
  scale_fill_manual(values = c("#02afdb", "#ce002c")) + coord_flip() + ggtitle("Origin")
sp2 = trainset %>% ggplot(aes(x = Severity, fill = Severity)) + stat_count(color = "black", alpha = 0.7, show.legend = FALSE) +
  scale_fill_manual(values = c("#02afdb", "#ce002c")) + coord_flip() + ggtitle("Train")
sp3 = validset %>% ggplot(aes(x = Severity, fill = Severity)) + stat_count(color = "black", alpha = 0.7, show.legend = FALSE) +
  scale_fill_manual(values = c("#02afdb", "#ce002c")) + coord_flip() + ggtitle("Valid")
sp4 = testset  %>% ggplot(aes(x = Severity, fill = Severity)) + stat_count(color = "black", alpha = 0.7, show.legend = FALSE) +
  scale_fill_manual(values = c("#02afdb", "#ce002c")) + coord_flip() + ggtitle("Test")
grid.arrange(sp4,sp3,sp2,sp1,ncol=1)
Initialising h2o and data splitting
The initialisation of the h2o package was covered in previous tutorials. In brief, we bring up h2o and then convert our data frames into H2O frames.
library(h2o)
h2o.init(nthreads = -1,max_mem_size ="4g")
##
## H2O is not running yet, starting it now...
##
## Note: In case of errors look at the following log files:
## C:\Users\Admin\AppData\Local\Temp\RtmpaGlRyh/h2o_Admin_started_from_r.out
## C:\Users\Admin\AppData\Local\Temp\RtmpaGlRyh/h2o_Admin_started_from_r.err
##
##
## Starting H2O JVM and connecting: ... Connection successful!
##
## R is connected to the H2O cluster:
## H2O cluster uptime: 11 seconds 541 milliseconds
## H2O cluster version: 3.10.3.6
## H2O cluster version age: 1 month and 30 days
## H2O cluster name: H2O_started_from_R_Admin_bqm621
## H2O cluster total nodes: 1
## H2O cluster total memory: 3.56 GB
## H2O cluster total cores: 0
## H2O cluster allowed cores: 0
## H2O cluster healthy: TRUE
## H2O Connection ip: localhost
## H2O Connection port: 54321
## H2O Connection proxy: NA
## R Version: R version 3.3.1 (2016-06-21)
wdata=as.h2o(df)
wtrain=as.h2o(trainset)
wtest=as.h2o(testset)
wvalid=as.h2o(validset)
response="Severity"
features=setdiff(colnames(wtrain),response)
A) Training RF and DL models in h2o
As mentioned above, we will apply the two ML algorithms in their native forms in h2o:
H2O's Random Forest (RF) is a powerful ensemble algorithm. On a given dataset, RF constructs a combination (forest) of many elementary decision trees (50 trees by default), each with a randomised structure: depth, input features and training instances are sampled for every tree. A single tree's performance may be weak, but combined as an ensemble their aggregated predictions are considerably stronger.
H2O's Deep Learning is a multi-layer feed-forward artificial neural network trained with stochastic gradient descent using back-propagation. A deep neural net can contain a large number of hidden layers of neurons, each characterised by an activation function, with or without regularisation/penalties.
#Out of the box RF vs DL
# RF
rfmod0=h2o.randomForest(
model_id = "RF",
x = features,
y = response,
training_frame = wtrain,
validation_frame = wvalid,
nfolds=10,
fold_assignment = "Stratified",
seed=123)
#DL
dlmod0=h2o.deeplearning (x = features,
y = response,
model_id = "DL",
training_frame = wtrain,
validation_frame = wvalid,
nfolds = 10,
fold_assignment = "Stratified",
replicate_training_data = TRUE,
keep_cross_validation_fold_assignment = TRUE,
keep_cross_validation_predictions=FALSE,
score_each_iteration = TRUE,
reproducible = TRUE,seed=123)
The following commands display the default structure of each trained model:
dlmod0@model$model_summary
## Status of Neuron Layers: predicting Severity, 2-class classification, bernoulli distribution, CrossEntropy loss, 44,202 weights/biases, 526.5 KB, 8,371 training samples, mini-batch size 1
## layer units type dropout l1 l2 mean_rate rate_rms
## 1 1 17 Input 0.00 %
## 2 2 200 Rectifier 0.00 % 0.000000 0.000000 0.004975 0.003049
## 3 3 200 Rectifier 0.00 % 0.000000 0.000000 0.128367 0.256157
## 4 4 2 Softmax 0.000000 0.000000 0.002692 0.003062
## momentum mean_weight weight_rms mean_bias bias_rms
## 1
## 2 0.000000 -0.017817 0.097536 0.227792 0.103061
## 3 0.000000 -0.007845 0.070605 0.957456 0.051503
## 4 0.000000 -0.029633 0.382043 -0.000000 0.000750
rfmod0@model$model_summary
## Model Summary:
## number_of_trees number_of_internal_trees model_size_in_bytes min_depth
## 1 50 50 80387 13
## max_depth mean_depth min_leaves max_leaves mean_leaves
## 1 20 15.90000 83 143 111.94000
Our Deep Learning model is a neural network of 4 layers: an input layer of 17 neurons (one per level of the categorical variables, including their missing-value levels, plus one for the numerical variable Age) with a 0% drop-out rate; 2 hidden layers of 200 neurons each; and an output layer of 2 neurons, one for the estimated probability of each class (Benign or Malignant). The activation function is the Rectifier, with no dropout and no regularisation.
Our Random Forest model is an ensemble of 50 trees (the default number). The maximum depth was capped at 20; the average tree depth was 15.9 (minimum 13). The number of leaves per tree ranged from 83 to 143, with an average of about 112. By default, the number of features sampled at each split is the square root of the number of predictors (2 of our 4 columns), and each tree is grown on a random sample of about 63% of the training rows.
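For reference, these defaults could also be written out explicitly. The calls below are only a sketch (they mirror the defaults described above rather than introduce any tuning); all argument names are standard h2o.randomForest() and h2o.deeplearning() parameters:
# Sketch: the same models with their main structural defaults spelled out
rf_explicit = h2o.randomForest(x = features, y = response, training_frame = wtrain,
                               ntrees = 50, max_depth = 20,       # forest size and tree-depth cap
                               mtries = -1, sample_rate = 0.632,  # sqrt(p) features per split, ~63% of rows per tree
                               seed = 123)
dl_explicit = h2o.deeplearning(x = features, y = response, training_frame = wtrain,
                               hidden = c(200, 200),              # two hidden layers of 200 neurons
                               activation = "Rectifier",          # no dropout
                               l1 = 0, l2 = 0, epochs = 10,       # no regularisation, default epochs
                               seed = 123)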
Evaluating the out-of-the-box performance
Here we evaluate the performance of the two algorithms in their native forms. First, we look at the distribution of the 10-fold cross-validation metrics for each model (the cv_*_valid columns hold the held-out fold results):
cvdl0 = dlmod0@model$cross_validation_metrics_summary %>% as_tibble() %>% mutate(Metric = rownames(.)) %>%
  gather(cv_1_valid:cv_10_valid, key = "Fold", value = "Result") %>% mutate(., model = "OOB_DL")
cvrf0 = rfmod0@model$cross_validation_metrics_summary %>% as_tibble() %>% mutate(Metric = rownames(.)) %>%
  gather(cv_1_valid:cv_10_valid, key = "Fold", value = "Result") %>% mutate(., model = "OOB_RF")
cv0 = rbind(cvdl0, cvrf0) %>% subset(., Metric != "err_count" & Metric != "lift_top_group")
cv0$Result = as.numeric(cv0$Result)
cv0 %>% ggplot(aes(x = Metric, y = Result, fill = model)) + geom_boxplot(alpha = 0.6) + coord_flip() +
  scale_y_continuous(breaks = seq(0, 1, 0.1)) + scale_fill_manual(values = mycolors) +
  ggtitle("Out of box models, 10x10CV")
The graph indicates that the average cross-validated performance of the DL model was slightly better than that of the RF model: the DL model shows higher accuracy, AUC, recall and specificity, together with lower MSE, error rate, MCC and logloss.
Should we believe that DL is really better than RF? I don't think so… We should also validate the two models on the testing subset and inspect their confusion matrices:
h2o.performance(dlmod0,wtest)
## H2OBinomialMetrics: deeplearning
##
## MSE: 0.1773853
## RMSE: 0.4211713
## LogLoss: 0.5871914
## Mean Per-Class Error: 0.2041063
## AUC: 0.8194444
## Gini: 0.6388889
##
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
## Benign Malignant Error Rate
## Benign 39 15 0.277778 =15/54
## Malignant 6 40 0.130435 =6/46
## Totals 45 55 0.210000 =21/100
##
## Maximum Metrics: Maximum metrics at their respective thresholds
## metric threshold value idx
## 1 max f1 0.396206 0.792079 49
## 2 max f2 0.215605 0.846774 58
## 3 max f0point5 0.651672 0.772358 44
## 4 max accuracy 0.651672 0.800000 44
## 5 max precision 0.982446 1.000000 0
## 6 max recall 0.042028 1.000000 88
## 7 max specificity 0.982446 1.000000 0
## 8 max absolute_mcc 0.651672 0.601929 44
## 9 max min_per_class_accuracy 0.710898 0.777778 43
## 10 max mean_per_class_accuracy 0.651672 0.801932 44
##
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
h2o.performance(rfmod0,wtest)
## H2OBinomialMetrics: drf
##
## MSE: 0.1833406
## RMSE: 0.4281829
## LogLoss: 0.6939248
## Mean Per-Class Error: 0.1964573
## AUC: 0.7882448
## Gini: 0.5764895
##
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
## Benign Malignant Error Rate
## Benign 41 13 0.240741 =13/54
## Malignant 7 39 0.152174 =7/46
## Totals 48 52 0.200000 =20/100
##
## Maximum Metrics: Maximum metrics at their respective thresholds
## metric threshold value idx
## 1 max f1 0.531768 0.795918 44
## 2 max f2 0.268992 0.840164 52
## 3 max f0point5 0.531768 0.767717 44
## 4 max accuracy 0.531768 0.800000 44
## 5 max precision 0.999200 1.000000 0
## 6 max recall 0.001488 1.000000 82
## 7 max specificity 0.999200 1.000000 0
## 8 max absolute_mcc 0.531768 0.605624 44
## 9 max min_per_class_accuracy 0.630907 0.760870 40
## 10 max mean_per_class_accuracy 0.531768 0.803543 44
##
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
The picture on the testing subset is unclear. Overall, the DL model misclassified 21 cases while the RF model committed only 20 errors. The RF model produced fewer false positives (13/54 benign lesions labelled malignant, versus 15/54 for DL), whereas the DL model produced one fewer false negative (6/46 malignant lesions labelled benign, versus 7/46 for RF). False negatives are the costlier error here: if a patient with a malignant tumour is classified as benign, a biopsy will not be considered and the disease may progress without treatment.
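If false negatives are the main concern, the decision threshold could be lowered so that more lesions are flagged as malignant, at the cost of extra false positives. A minimal sketch (the 0.3 cut-off is arbitrary and only for illustration; the column names come from the prediction frame returned by h2o.predict):
pred_dl = as.data.frame(h2o.predict(dlmod0, wtest))      # per-class probabilities for the test set
flagged = ifelse(pred_dl$Malignant > 0.3, "Malignant", "Benign")
table(Predicted = flagged, Actual = testset$Severity)     # confusion matrix at the lower cut-off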
Extending the evaluation with the mlr package
The mlr package provides additional metrics and functions for performance evaluation. Although mlr also supports h2o algorithms, we do not want to train the models again; we want to keep exactly the models that were generated in h2o. So we will use a trick: we train two dummy h2o models through mlr, then replace their learner.model slot with the models already trained in h2o. The result is our two h2o models, wrapped in mlr's model class.
library(mlr)
taskMammo=mlr::makeClassifTask(id="Mammography",data=na.omit(df),target="Severity",positive = "Malignant")
learnerDL = makeLearner(id="DL","classif.h2o.deeplearning", predict.type = "prob")
learnerRF = makeLearner(id="RF","classif.h2o.randomForest", predict.type = "prob")
mlrDL=mlr::train(learnerDL,taskMammo)
mlrDL$learner.model=dlmod0
mlrRF=mlr::train(learnerRF,taskMammo)
mlrRF$learner.model=rfmod0
mets=list(auc,bac,tpr,tnr,mmce,ber,fpr,fnr,ppv,npv)
predDL=predict(mlrDL,newdata=testset)
predRF=predict(mlrRF,newdata=testset)
pdfDL=predDL%>%performance(.,mets)%>%as_tibble()%>%mutate(.,Metric=row.names(.),Model="DeepLearning")
cbind(pdfDL$Metric,pdfDL$value)
## [,1] [,2]
## [1,] "auc" "0.819444444444444"
## [2,] "bac" "0.78341384863124"
## [3,] "tpr" "0.826086956521739"
## [4,] "tnr" "0.740740740740741"
## [5,] "mmce" "0.22"
## [6,] "ber" "0.21658615136876"
## [7,] "fpr" "0.259259259259259"
## [8,] "fnr" "0.173913043478261"
## [9,] "ppv" "0.730769230769231"
## [10,] "npv" "0.833333333333333"
pdfRF=predRF%>%performance(.,mets)%>%as_tibble()%>%mutate(.,Metric=row.names(.),Model="RandomForest")
cbind(pdfRF$Metric,pdfRF$value)
## [,1] [,2]
## [1,] "auc" "0.788244766505636"
## [2,] "bac" "0.803542673107891"
## [3,] "tpr" "0.847826086956522"
## [4,] "tnr" "0.759259259259259"
## [5,] "mmce" "0.2"
## [6,] "ber" "0.196457326892109"
## [7,] "fpr" "0.240740740740741"
## [8,] "fnr" "0.152173913043478"
## [9,] "ppv" "0.75"
## [10,] "npv" "0.854166666666667"
pdf = rbind(pdfDL, pdfRF)
pdf %>% ggplot(aes(x = Metric, y = value, fill = Model)) + geom_point(size = 5, shape = 21, color = "black") +
  coord_flip() + scale_fill_manual(values = mycolors) + scale_y_continuous(breaks = seq(0, 1, 0.1)) +
  geom_hline(yintercept = c(0.2, 0.8), linetype = 3, size = 1.2)
The extended evaluation with the mlr package on the testing subset confirms that RF may be slightly better than DL. Although the DL model has a higher AUC, its balanced accuracy (BAC) is lower and its balanced error rate (BER) is higher than those of RF. The RF model also achieves better true positive and true negative rates.
To confirm this observation, we will perform a bootstrap resampling on the test subset: from the original 100 instances, we draw 100 resamples with replacement (10,000 instances in total). This allows us to estimate confidence intervals for our performance metrics.
bootPRED = function(mlrmodel, data, i){
  d    = data[i, ]
  pred = predict(mlrmodel, newdata = d)
  perf = mlr::performance(pred, measures = list(auc, bac, ber, mmce))
  AUC  = perf[[1]]
  BAC  = perf[[2]]
  BER  = perf[[3]]
  MMCE = perf[[4]]
  cbind(AUC, BAC, BER, MMCE)
}
set.seed(123)
library(boot)
resDL=boot(statistic=bootPRED,mlrmodel=mlrDL,data=testset,R=100)%>%.$t%>%as_tibble()%>%mutate(model="DL")
resRF=boot(statistic=bootPRED,mlrmodel=mlrRF,data=testset,R=100)%>%.$t%>%as_tibble()%>%mutate(model="RF")
Next, we label the bootstrap metrics (AUC, balanced accuracy, balanced error rate and mean misclassification error), combine the results of both models and visualise the distribution of each metric:
# Name the metric columns and keep track of the bootstrap iteration
names(resDL)=c("AUC","BAC","BER","MMCE","Algorithm")
names(resRF)=c("AUC","BAC","BER","MMCE","Algorithm")
resDL$iteration=rownames(resDL)
resRF$iteration=rownames(resRF)
# Stack both models and reshape to long format for plotting
resboot=rbind(resDL,resRF)
resbootLong=resboot%>%gather(AUC:MMCE,key="Metric",value="Score")
# Density plot and boxplot of each metric, split by algorithm
resbootLong%>%ggplot(aes(x=Score,fill=Algorithm))+geom_density(alpha=0.6)+facet_wrap(~Metric,scales="free")+scale_fill_manual(values=mycolors)
resbootLong%>%ggplot(aes(x=Metric,y=Score,fill=Algorithm))+geom_boxplot(alpha=0.6)+facet_wrap(~Metric,scales="free")+scale_fill_manual(values=mycolors)+coord_flip()
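Before the formal description below, a compact numerical summary of the same bootstrap distributions can be obtained directly from the long table. This is only a sketch reusing the resbootLong object created above:
# Mean, SD and 2.5th/97.5th percentiles of each metric, per algorithm
resbootLong%>%group_by(Algorithm,Metric)%>%summarise(mean=mean(Score),sd=sd(Score),lower=quantile(Score,0.025),upper=quantile(Score,0.975))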
Hmisc::describe(resDL)
## resDL
##
## 6 Variables 100 Observations
## ---------------------------------------------------------------------------
## AUC
## n missing distinct Info Mean Gmd .05 .10
## 100 0 98 1 0.8216 0.05299 0.7452 0.7555
## .25 .50 .75 .90 .95
## 0.7979 0.8263 0.8541 0.8746 0.8913
##
## lowest : 0.6979167 0.6986795 0.6994798 0.7262905 0.7389163
## highest: 0.8942308 0.8968591 0.9002020 0.9146538 0.9172676
## ---------------------------------------------------------------------------
## BAC
## n missing distinct Info Mean Gmd .05 .10
## 100 0 94 1 0.7872 0.04849 0.7162 0.7334
## .25 .50 .75 .90 .95
## 0.7617 0.7841 0.8206 0.8388 0.8506
##
## lowest : 0.6802721 0.6855427 0.7006803 0.7053571 0.7070707
## highest: 0.8514610 0.8607810 0.8628247 0.8782051 0.8808374
## ---------------------------------------------------------------------------
## BER
## n missing distinct Info Mean Gmd .05 .10
## 100 0 95 1 0.2128 0.04849 0.1494 0.1612
## .25 .50 .75 .90 .95
## 0.1794 0.2159 0.2383 0.2666 0.2838
##
## lowest : 0.1191626 0.1217949 0.1371753 0.1392190 0.1485390
## highest: 0.2929293 0.2946429 0.2993197 0.3144573 0.3197279
## ---------------------------------------------------------------------------
## MMCE
## n missing distinct Info Mean Gmd .05 .10
## 100 0 19 0.994 0.2173 0.04934 0.1500 0.1600
## .25 .50 .75 .90 .95
## 0.1800 0.2200 0.2400 0.2700 0.2905
##
## Value 0.12 0.14 0.15 0.16 0.17 0.18 0.19 0.20 0.21 0.22 0.23 0.24
## Frequency 2 1 3 6 6 9 6 6 5 13 8 11
## Proportion 0.02 0.01 0.03 0.06 0.06 0.09 0.06 0.06 0.05 0.13 0.08 0.11
##
## Value 0.25 0.26 0.27 0.29 0.30 0.32 0.34
## Frequency 6 6 4 3 3 1 1
## Proportion 0.06 0.06 0.04 0.03 0.03 0.01 0.01
## ---------------------------------------------------------------------------
## Algorithm
## n missing distinct value
## 100 0 1 DL
##
## Value DL
## Frequency 100
## Proportion 1
## ---------------------------------------------------------------------------
## iteration
## n missing distinct
## 100 0 100
##
## lowest : 1 10 100 11 12 , highest: 95 96 97 98 99
## ---------------------------------------------------------------------------
Hmisc::describe(resRF)
## resRF
##
## 6 Variables 100 Observations
## ---------------------------------------------------------------------------
## AUC
## n missing distinct Info Mean Gmd .05 .10
## 100 0 98 1 0.7898 0.05929 0.7090 0.7272
## .25 .50 .75 .90 .95
## 0.7566 0.7874 0.8238 0.8539 0.8704
##
## lowest : 0.6385417 0.6682055 0.6743434 0.6848000 0.7081493
## highest: 0.8709936 0.8922276 0.8955857 0.8985594 0.9006061
## ---------------------------------------------------------------------------
## BAC
## n missing distinct Info Mean Gmd .05 .10
## 100 0 95 1 0.7999 0.04614 0.7298 0.7499
## .25 .50 .75 .90 .95
## 0.7758 0.8009 0.8240 0.8521 0.8617
##
## lowest : 0.6940988 0.6958333 0.7050505 0.7226891 0.7252747
## highest: 0.8627091 0.8706597 0.8731884 0.8969697 0.8995598
## ---------------------------------------------------------------------------
## BER
## n missing distinct Info Mean Gmd .05 .10
## 100 0 95 1 0.2001 0.04614 0.1383 0.1479
## .25 .50 .75 .90 .95
## 0.1760 0.1991 0.2242 0.2501 0.2702
##
## lowest : 0.1004402 0.1030303 0.1268116 0.1293403 0.1372909
## highest: 0.2747253 0.2773109 0.2949495 0.3041667 0.3059012
## ---------------------------------------------------------------------------
## MMCE
## n missing distinct Info Mean Gmd .05 .10
## 100 0 19 0.992 0.2048 0.04714 0.1400 0.1500
## .25 .50 .75 .90 .95
## 0.1800 0.2000 0.2300 0.2600 0.2705
##
## Value 0.10 0.13 0.14 0.15 0.16 0.17 0.18 0.19 0.20 0.21 0.22 0.23
## Frequency 2 1 4 5 3 8 8 9 16 7 8 5
## Proportion 0.02 0.01 0.04 0.05 0.03 0.08 0.08 0.09 0.16 0.07 0.08 0.05
##
## Value 0.24 0.25 0.26 0.27 0.28 0.30 0.31
## Frequency 5 7 4 3 2 2 1
## Proportion 0.05 0.07 0.04 0.03 0.02 0.02 0.01
## ---------------------------------------------------------------------------
## Algorithm
## n missing distinct value
## 100 0 1 RF
##
## Value RF
## Frequency 100
## Proportion 1
## ---------------------------------------------------------------------------
## iteration
## n missing distinct
## 100 0 100
##
## lowest : 1 10 100 11 12 , highest: 95 96 97 98 99
## ---------------------------------------------------------------------------
The bootstrapped validation on the test subset suggests that the RF model performs better than the DL model on the classification metrics (lower MMCE and BER, higher BAC), whereas the DL model reaches a higher AUC. To verify whether these differences are statistically significant, we will apply one-sided Wilcoxon rank sum tests (a Student's t-test could also be used, as the metrics were approximately normally distributed in both models; a short sketch follows the Wilcoxon output below):
wilcox.test(x=resDL$MMCE,y=resRF$MMCE,alternative = "greater")
##
## Wilcoxon rank sum test with continuity correction
##
## data: resDL$MMCE and resRF$MMCE
## W = 5802.5, p-value = 0.02471
## alternative hypothesis: true location shift is greater than 0
wilcox.test(x=resDL$BAC,y=resRF$BAC,alternative = "less")
##
## Wilcoxon rank sum test with continuity correction
##
## data: resDL$BAC and resRF$BAC
## W = 4124, p-value = 0.01621
## alternative hypothesis: true location shift is less than 0
wilcox.test(x=resDL$BER,y=resRF$BER,alternative = "greater")
##
## Wilcoxon rank sum test with continuity correction
##
## data: resDL$BER and resRF$BER
## W = 5875, p-value = 0.01631
## alternative hypothesis: true location shift is greater than 0
wilcox.test(x=resDL$AUC,y=resRF$AUC,alternative = "greater")
##
## Wilcoxon rank sum test with continuity correction
##
## data: resDL$AUC and resRF$AUC
## W = 6788, p-value = 6.282e-06
## alternative hypothesis: true location shift is greater than 0
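As mentioned above, a parametric alternative is a Student's t-test, optionally preceded by a normality check. This is a minimal sketch using AUC as an example; the other metrics would be tested in the same way:
# Optional normality check for one metric
shapiro.test(resDL$AUC)
shapiro.test(resRF$AUC)
# One-sided t-test mirroring the Wilcoxon hypothesis for AUC
t.test(x=resDL$AUC,y=resRF$AUC,alternative="greater")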
Based on these results, we can confirm that the RF model was significantly better than the DL model in terms of misclassification error, balanced accuracy and balanced error rate, while the DL model achieved a significantly higher AUC.
Conclusion
Through this tutorial, we have seen that both Deep Learning and Random Forest are powerful even in their basic training protocol, with no tuning. Both the RF and DL models classify the mammographic masses with good balanced accuracy.
Random Forest would be the better choice here, as the algorithm is faster and easier to tune. The RF model performed as well as the deep neural network overall, and even better where false negative misclassification is concerned. Although Random Forest won this match, we cannot draw any conclusion about its performance on other problems: according to the "No Free Lunch" theorem, no single algorithm is the best solution for every problem. If we tested the RF and DL algorithms on an unlimited number of tasks, we would likely find that neither is consistently better than the other.
So what you have seen here is not the end of the road. Hyper-parameter tuning, feature selection and larger datasets could further improve the performance of either algorithm; a sketch of a simple grid search is given below.
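For readers who want to go further, here is a minimal sketch of a random hyper-parameter grid search for the Random Forest with h2o. The names train.h2o, features and target are placeholders for the training frame, predictor column names and response column, not objects created in this tutorial, and the grid values are illustrative only:
library(h2o)
h2o.init(nthreads=-1)
# Placeholder objects: replace with the H2OFrame and column names used earlier
rf_grid=h2o.grid(algorithm="randomForest",grid_id="rf_grid",
                 x=features,y=target,training_frame=train.h2o,
                 nfolds=5,seed=123,
                 hyper_params=list(ntrees=c(100,300,500),max_depth=c(5,10,20),mtries=c(2,3,4)),
                 search_criteria=list(strategy="RandomDiscrete",max_models=20,seed=123))
# Rank the grid models by cross-validated AUC
h2o.getGrid("rf_grid",sort_by="auc",decreasing=TRUE)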
END
Reference:
M. Elter, R. Schulz-Wendtland and T. Wittenberg (2007) The prediction of breast cancer biopsy outcomes using two CAD approaches that both emphasize an intelligible decision process. Medical Physics 34(11), pp. 4164-4172