Improved Classification with Super Learner

Ensemble Modeling Overview

Imagine there is a jar of marbles and each person is asked to guess how many marbles are in the jar. Some individuals will guess over and some will guess under. What happens if you average the guesses? As technology has advanced, so have statistical models. There are now multiple choices of models for classification including: logistic regression, support vector machines, discriminant analysis, classification trees, etc. The issue arises when deciding which one to use and which tuning parameters to select for each given model. A solution that may improve the classification rates is to combine all predictions into one, more improved prediction (Steinki and Mohammad 2015). This is the basis idea behind ensemble modeling. In class, we discussed a variety of ensemble model methods including: model averaging, bagging, random forests, boosting, bumping, and stacking. The focus of this report is on stacking. I will introduce an R package, SuperLearner, and fit a few base models using classification methods we learned in class and evaluate the area under the curve (AUC) of the receiver operating characteristic (ROC). A ROC curve is a plot of the true positive rate (sensitivity) against the false positive rate (1-specificity) for the different possible cutpoints of a diagnostic test. This shows the tradeoff between sensitivity and specificity. To compare models, we calculate the AUC, a higher AUC indicates a better fit. After fitting the base models, I will combine the base learners into a super learner and compare the final model to the base learners.

####Stacking Stacking is a procedure where multiple learner modules are applied on a dataset to extract multiple predictions, which are then combined into one composite prediction. There are three main steps in stacking: ensemble generation (homogeneous if all base models belong to the same class of models or heterogeneous if all base models belong to a diverse set of models), ensemble pruning, and ensemble integration. Diversity is considered one of the key success factors of ensembles.

Suppose that \(M\) predictors are available, \(\hat{f_1}, \hat{f_2}, ..., \hat{f_M}\). Under a loss or risk function, a weight vector \(\textbf{w}\) is created and the predictor \[\hat{f}(x)=\sum_{m=1}^{M}w_m\hat{f_m}(x).\] These weights are then optimized over a given function such as minimizing the mean square error or maximizing area under the curve (AUC) of receiver operating characteristic (ROC) using cross validation techniques.

Vardeman (2018) explains the theory behind stacking. Consider \(M=2\), with the loss function defined as the mean squared error. Then \(E[y-\hat{f}_1(x)]=0\) and \(E[y-\hat{f}_2(x)]=0\). Define \(\hat{f}_\alpha = \alpha\hat{f}_1+(1-\alpha)\hat{f}_2.\) Then,

\[\begin{align} E[(y-\hat{f}_\alpha(x))^2] &= E[(\alpha\hat{f}_1+(1-\alpha)\hat{f}_2)^2]\\ &= Var(\alpha\hat{f}_1+(1-\alpha)\hat{f}_2)\\ &= (\alpha, 1-\alpha) \text{COV} \begin{pmatrix} y - \hat{f}_1(x) \\ y -\hat{f}_2(x) \end{pmatrix} \begin{pmatrix} \alpha \\ 1-\alpha \end{pmatrix} \end{align}\]

Since this is a quadratic function of \(\alpha,\) that has a minimum, there is a minimizing \(\alpha\) that produces a better expected loss function than either \(\hat{f}_1(x)\) or \(\hat{f}_2(x)\).

In general, \[\textbf{w}^{stack}=\underset{w}\arg\min\sum_{i=1}^N\left(y_i - \sum_{m=1}^M w_m\hat{f}_m^i(x_i)\right)^2\] for the \(m^th\) predictor fit to the training set with the \(i^th\) case removed. This optimizes a kind of leave-one-out cross validation error for a linear combination of \(\hat{f}_m's\). An ad hoc version of stacking-type averaging is more common and is based on an informal consideration of one’s (CV-supported) evaluation of the effectiveness of individual predictors correlations. Recall that a diverse set of base predictors is considered one of the key success factors of ensembles.

####Diabetes Dataset The dataset I have selected for my project is the Pima Indians Diabetes Database found on Kaggle through the UCI data repository, originally from the National Institute of Diabetes and Digestive and idney Diseases. The patients in this study are all females at least age 21 years old and of Pima Indian heritage. The target variable, Outcome, is binary and indicates whether or not a patient has diabetes. The objective is to correctly classify individuals to have diabetes or not based on several medical predictor variables such as the number of pregnancies the patient has had, their BMI, glucose level, insulin level, blood pressure, age, and diabeties pedigree function (DPF) (Smith et al. 1988).

Diagnostics

Testing for outliers, cooks distance, and influential points, the plots below indicate six points which were further removed from the dataset for modeling purposes.

## No Studentized residuals with Bonferroni p < 0.05
## Largest |rstudent|:
##     rstudent unadjusted p-value Bonferroni p
## 350 2.967294          0.0030043           NA

##       StudRes         Hat       CookD
## 14   1.002281 0.114022424 0.009536169
## 229 -2.634163 0.020610966 0.057205591
## 350  2.967294 0.003069042 0.024729204
## 454 -1.057796 0.078351507 0.007159288
## 503  2.786460 0.005137660 0.024505179
## 707  2.306667 0.023944183 0.031920741

There does not appear to be strong correlation between any predictors. The correlation between skin thickness and insulin is moderate at 0.45. We would expect there to be a correlation between age and number of pregnancies since the older a woman is, the more pregnancies she is likely to have, this correlation is moderate at 0.56.

Due to influential points, I removed the following observations: 14, 229, 350, 454, 502, and 707. I then created a training set including 60% of the dataset and held out the other 40% as a test set for cross validation.

Diabetes <- Diabetes[-c(14, 229, 350, 454, 503, 707),]
outcome  <- Diabetes$Outcome
data     <- subset(Diabetes, select = -Outcome)

# Create train & test sets
set.seed(904)
pct = 0.6
train_obs = sample(nrow(data), floor(nrow(data)*pct))
X_train = data[train_obs, ]
X_holdout = data[-train_obs, ]
Y_train = outcome[train_obs]
Y_holdout = outcome[-train_obs]

table(Y_train, useNA = "ifany")

## Y_train
##   0   1 
## 297 160

Exploring the dataset, we obtain the following coefficients from the logistic regression model using glm(). We observe that the chances of having diabietes increases as the number of pregnancies, glucose levels, BMI, diabetes pedigree function, and age increase while the chances of having diabetes decrease as blood pressure, skin thickness, and insulin levels increase.

##              (Intercept)              Pregnancies                  Glucose 
##             -9.033836090              0.120943571              0.038777313 
##            BloodPressure            SkinThickness                  Insulin 
##             -0.013052960             -0.001082154             -0.001094749 
##                      BMI DiabetesPedigreeFunction                      Age 
##              0.092884435              1.146457478              0.014083702

Looking at the variable importance and added variable plots, glucose is the most important predictor for the outcome of diabetes, followed by BMI, and age, diabetes pedigree function, blood pressure, pregnancies, insulin, and skin thickness.

Super Learner

SuperLearner was developed by Polley et al. (2018), a prediction algorithm that applies any set of candidate learners and uses cross-validation to select between them. The set of candiate learners can be viewed using listWrappers(). Each wrapper calls the library in which the original learner was developed. For example, if you want to run SL.randomForest, then library(randomForest) must be installed. The screening algorithms serve the purpose of dimension reduction. Naimi and Balzer (2018)

library(SuperLearner)
listWrappers()

## All prediction algorithm wrappers in SuperLearner:

##  [1] "SL.bartMachine"      "SL.bayesglm"         "SL.biglasso"        
##  [4] "SL.caret"            "SL.caret.rpart"      "SL.cforest"         
##  [7] "SL.earth"            "SL.extraTrees"       "SL.gam"             
## [10] "SL.gbm"              "SL.glm"              "SL.glm.interaction" 
## [13] "SL.glmnet"           "SL.ipredbagg"        "SL.kernelKnn"       
## [16] "SL.knn"              "SL.ksvm"             "SL.lda"             
## [19] "SL.leekasso"         "SL.lm"               "SL.loess"           
## [22] "SL.logreg"           "SL.mean"             "SL.nnet"            
## [25] "SL.nnls"             "SL.polymars"         "SL.qda"             
## [28] "SL.randomForest"     "SL.ranger"           "SL.ridge"           
## [31] "SL.rpart"            "SL.rpartPrune"       "SL.speedglm"        
## [34] "SL.speedlm"          "SL.step"             "SL.step.forward"    
## [37] "SL.step.interaction" "SL.stepAIC"          "SL.svm"             
## [40] "SL.template"         "SL.xgboost"

## 
## All screening algorithm wrappers in SuperLearner:

## [1] "All"
## [1] "screen.corP"           "screen.corRank"        "screen.glmnet"        
## [4] "screen.randomForest"   "screen.SIS"            "screen.template"      
## [7] "screen.ttest"          "write.screen.template"

Fit Base Learners

The first step in stacking is to select a number of base models, I selected generalized linear models, generalized additive models, lasso and ridge regression, ranger (a faster version of random forest), kernel support vector machines, and a simple mean. To fit one model in super learner, use SuperLearner() and specify the response, parameters, family, and the function to fit. Below, I fit the binomial generalized linear model and obtained a mean square error risk of 0.152 and an AUC of 0.8259. The coefficient is the weights for each model when combining them into the final super learner model. Since we are only fitting one base learner, it is given a weight of 1.

sl_glm = SuperLearner(Y = Y_train, X = X_train, family = binomial(), SL.library = "SL.glm")
sl_glm

## 
## Call:  
## SuperLearner(Y = Y_train, X = X_train, family = binomial(), SL.library = "SL.glm") 
## 
## 
## 
##                 Risk Coef
## SL.glm_All 0.1501103    1

pred_glm = predict(sl_glm, X_holdout)
AUC_eval("glm", pred_glm$pred, Y_holdout)

## [1] 0.8156812

Taking a look at the code for the SL.glmnet() wrapper, we can see that alpha is set to one, indicating that it defaults to lasso regression.

SL.glmnet

## function (Y, X, newX, family, obsWeights, id, alpha = 1, nfolds = 10, 
##     nlambda = 100, useMin = TRUE, loss = "deviance", ...) 
## {
##     .SL.require("glmnet")
##     if (!is.matrix(X)) {
##         X <- model.matrix(~-1 + ., X)
##         newX <- model.matrix(~-1 + ., newX)
##     }
##     fitCV <- glmnet::cv.glmnet(x = X, y = Y, weights = obsWeights, 
##         lambda = NULL, type.measure = loss, nfolds = nfolds, 
##         family = family$family, alpha = alpha, nlambda = nlambda, 
##         ...)
##     pred <- predict(fitCV, newx = newX, type = "response", s = ifelse(useMin, 
##         "lambda.min", "lambda.1se"))
##     fit <- list(object = fitCV, useMin = useMin)
##     class(fit) <- "SL.glmnet"
##     out <- list(pred = pred, fit = fit)
##     return(out)
## }
## <bytecode: 0x000000001d628808>
## <environment: namespace:SuperLearner>

Fitting the lasso regression model, we obtain a risk of 0.151 and an AUC value of 0.8244.

sl_lasso = SuperLearner(Y = Y_train, X = X_train, family = binomial(), SL.library = "SL.glmnet")

## Loading required namespace: glmnet

sl_lasso

## 
## Call:  
## SuperLearner(Y = Y_train, X = X_train, family = binomial(), SL.library = "SL.glmnet") 
## 
## 
## 
##                    Risk Coef
## SL.glmnet_All 0.1511311    1

pred_lasso = predict(sl_lasso, X_holdout)
AUC_eval("lasso", pred_lasso$pred, Y_holdout)

## [1] 0.8149636

In order to fit a ridge regression model, we can adjust the SL.glmnet() code to set alpha = 0 as shown below. Fitting the ridge regression model, we obtain a risk of 0.1529 and an AUC of 0.824.

set.seed(904)
learners    = create.Learner("SL.glmnet", params = list(alpha = 0))
learners$names

## [1] "SL.glmnet_1"

sl_ridge = SuperLearner(Y = Y_train, X = X_train, family = binomial(), SL.library = learners$names)
sl_ridge

## 
## Call:  
## SuperLearner(Y = Y_train, X = X_train, family = binomial(), SL.library = learners$names) 
## 
## 
## 
##                      Risk Coef
## SL.glmnet_1_All 0.1510155    1

pred_ridge = predict(sl_ridge, X_holdout)
AUC_eval("ridge", pred_ridge$pred, Y_holdout)

## [1] 0.8209912

####Stacked Model To combine base models, we define a library of base learners. Super learner then uses cross validation to minimize the selected method, method.NNLS(), to minimize the mean square error. Other optimization methods include: method.NNLS2(), method.NNloglik(), method.CC_LS(), and method.AUC(nlopt_method=NULL, optim_method="L-BFGS-B", bounds=c(0, Inf), normalize=TRUE). Running the initial stacked model below, we obtain the following risks and model weights.

SL.library <- c("SL.glm", "SL.gam", "SL.glmnet", "SL.glmnet_1", "SL.ranger", "SL.ksvm", "SL.mean")
set.seed(904)
sl_stacked = SuperLearner(Y = Y_train, X = X_train, family = binomial(), SL.library = SL.library, 
                          verbose = F, cvControl = list(V = 5), method = "method.NNLS")
sl_stacked

## 
## Call:  
## SuperLearner(Y = Y_train, X = X_train, family = binomial(), SL.library = SL.library,  
##     method = "method.NNLS", verbose = F, cvControl = list(V = 5)) 
## 
## 
##                      Risk        Coef
## SL.glm_All      0.1518958 0.055268351
## SL.gam_All      0.1518958 0.000000000
## SL.glmnet_All   0.1511216 0.000000000
## SL.glmnet_1_All 0.1504423 0.884281802
## SL.ranger_All   0.1624184 0.007314434
## SL.ksvm_All     0.1627421 0.053135414
## SL.mean_All     0.2287339 0.000000000

In order to compare models, the function CV.SuperLearner() implements five random assignments of observations to the five fold cross validations to obtain five different weighted models.

set.seed(904)
cv_stacked <- CV.SuperLearner(Y = Y_train, X = X_train, family = binomial(), SL.library = SL.library,
                              verbose = F, cvControl = list(V = 5), method = "method.NNLS")

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

summary(cv_stacked)

## 
## Call:  
## CV.SuperLearner(Y = Y_train, X = X_train, family = binomial(), SL.library = SL.library,  
##     method = "method.NNLS", verbose = F, cvControl = list(V = 5)) 
## 
## Risk is based on: Mean Squared Error
## 
## All risk estimates are based on V =  5 
## 
##        Algorithm     Ave        se     Min     Max
##    Super Learner 0.15364 0.0095270 0.12748 0.20017
##      Discrete SL 0.15096 0.0094375 0.12748 0.19571
##       SL.glm_All 0.15183 0.0104358 0.12493 0.19716
##       SL.gam_All 0.15183 0.0104358 0.12493 0.19716
##    SL.glmnet_All 0.15086 0.0100379 0.12388 0.19571
##  SL.glmnet_1_All 0.15037 0.0092181 0.12748 0.19274
##    SL.ranger_All 0.16443 0.0091860 0.12936 0.22053
##      SL.ksvm_All 0.16232 0.0095447 0.13524 0.19826
##      SL.mean_All 0.22866 0.0067506 0.21055 0.24468

The summary above and plot below, indicate that the super learner model performed the best with the smallest risk of 0.1518.

plot(cv_stacked) + theme_bw(base_size = 15)

Comparing the AUC of the test set across the different model predictions, the stacked model reaches the second heighest AUC of 0.8282 after the generalized additive model with an AUC of 0.8302.

####Modify Learners In attempts to improve the AUC of the super learner model, I selected the best parameters for the base ranger model, recall that ranger is a version of random forest. This can be seen from the code below.

SL.ranger

## function (Y, X, newX, family, obsWeights, num.trees = 500, mtry = floor(sqrt(ncol(X))), 
##     write.forest = TRUE, probability = family$family == "binomial", 
##     min.node.size = ifelse(family$family == "gaussian", 5, 1), 
##     replace = TRUE, sample.fraction = ifelse(replace, 1, 0.632), 
##     num.threads = 1, verbose = T, ...) 
## {
##     .SL.require("ranger")
##     if (family$family == "binomial") {
##         Y = as.factor(Y)
##     }
##     if (is.matrix(X)) {
##         X = data.frame(X)
##     }
##     fit <- ranger::ranger(`_Y` ~ ., data = cbind(`_Y` = Y, X), 
##         num.trees = num.trees, mtry = mtry, min.node.size = min.node.size, 
##         replace = replace, sample.fraction = sample.fraction, 
##         case.weights = obsWeights, write.forest = write.forest, 
##         probability = probability, num.threads = num.threads, 
##         verbose = verbose)
##     pred <- predict(fit, data = newX)$predictions
##     if (family$family == "binomial") {
##         pred = pred[, "1"]
##     }
##     fit <- list(object = fit, verbose = verbose)
##     class(fit) <- c("SL.ranger")
##     out <- list(pred = pred, fit = fit)
##     return(out)
## }
## <bytecode: 0x000000001d52e650>
## <environment: namespace:SuperLearner>

Adjusting the number of variables selected at each split and the number of trees fitted, I compare sixteen different ranger models.

mtry_seq    <- seq(2, 5, 1)
n.trees_seq <- seq(500, 800, 100) 
learners.ranger <- create.Learner("SL.ranger", tune = list(mtry = mtry_seq, num.trees = n.trees_seq))
learners.ranger$names

##  [1] "SL.ranger_1"  "SL.ranger_2"  "SL.ranger_3"  "SL.ranger_4" 
##  [5] "SL.ranger_5"  "SL.ranger_6"  "SL.ranger_7"  "SL.ranger_8" 
##  [9] "SL.ranger_9"  "SL.ranger_10" "SL.ranger_11" "SL.ranger_12"
## [13] "SL.ranger_13" "SL.ranger_14" "SL.ranger_15" "SL.ranger_16"

In order to compare the various combinations of models, I placed the learners defined above into my super learner library.

set.seed(904)
cv_stacked.ranger <- CV.SuperLearner(Y = Y_train, X = X_train, family = binomial(), SL.library = learners.ranger$names, verbose = F, cvControl = list(V = 5), method = "method.NNLS")
plot(cv_stacked.ranger) + theme_bw()

The best ranger model was SL.ranger_5 with mtry = 2 and num.trees = 600.

SL.ranger_5

## function (...) 
## SL.ranger(..., mtry = 2, num.trees = 600)
## <bytecode: 0x000000002600d1c8>

For a revised super learner model, I kept the logistic generalized linear model, adjusted the gernalized additive model as below, kept the ridge regression model, adjusted the ranger model as below, and kept the mean model.

## function (...) 
## SL.gam(..., deg.gam = 4, cts.num = 3)
## <bytecode: 0x0000000028f0f1b8>

## function (...) 
## SL.glmnet(..., alpha = 0)
## <bytecode: 0x000000001db0b200>

## function (...) 
## SL.ranger(..., mtry = 2, num.trees = 600)
## <bytecode: 0x000000002600d1c8>

Fitting the revised super learner model, the ridge model was given the most weight with 0.24, followed by the generlized additive model, ranger, generalized linear model, and mean.

SL.library2 <- c("SL.glm", "SL.gam_12", "SL.glmnet_1",  "SL.ranger_5", "SL.mean")
set.seed(904)
cv_stacked2 <- CV.SuperLearner(Y = Y_train, X = X_train, family = binomial(), SL.library = SL.library2,
                               verbose = F, cvControl = list(V = 5), method = "method.NNLS")

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

review_weights(cv_stacked2)

##                 mean(weight)        sd       min       max
## SL.glmnet_1_All    0.5478432 0.4228494 0.1127412 1.0000000
## SL.gam_12_All      0.1941200 0.2588027 0.0000000 0.5103122
## SL.ranger_5_All    0.1614616 0.1808809 0.0000000 0.4043823
## SL.glm_All         0.0965753 0.2159489 0.0000000 0.4828765
## SL.mean_All        0.0000000 0.0000000 0.0000000 0.0000000

The super learner model again, outperformed other models with the lowest risk of 0.152.

## 
## Call:  
## CV.SuperLearner(Y = Y_train, X = X_train, family = binomial(), SL.library = SL.library2,  
##     method = "method.NNLS", verbose = F, cvControl = list(V = 5)) 
## 
## Risk is based on: Mean Squared Error
## 
## All risk estimates are based on V =  5 
## 
##        Algorithm     Ave        se     Min     Max
##    Super Learner 0.15298 0.0094776 0.12740 0.19925
##      Discrete SL 0.15037 0.0092181 0.12748 0.19274
##       SL.glm_All 0.15183 0.0104358 0.12493 0.19716
##    SL.gam_12_All 0.15183 0.0104358 0.12493 0.19716
##  SL.glmnet_1_All 0.15037 0.0092181 0.12748 0.19274
##  SL.ranger_5_All 0.16308 0.0091416 0.13025 0.22119
##      SL.mean_All 0.22866 0.0067506 0.21055 0.24468

With the goal of improving the AUC of our super learner model, we now obtain an AUC of 0.8303, just slightly higher than that of our original generalized additive model.

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

## Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts
## argument ignored

A major limitation to super learner is the time constraint and computing power. Another limitation is that there is no capability to look into the components of the models through the super learner package. For example, I was unable to look at the coefficients for SL.glm(). Lastly, the optimization method for AUC did not work correctly as I kept obtaining errors, after searching online, it appeared that other users have run into the same issue. Overall, due to a statisticians toolbox having so many potential models in it, I found super learner to be a simple way to compare all the models.

References

Naimi, Ashley I., and Laura B. Balzer. 2018. “Stacked Generalization: An Introduction to Super Learning.” European Journal of Epidemiology 33 (5): 459–64. https://doi.org/10.1007/s10654-018-0390-z.

Polley, Eric, Erin LeDell, Chris Kennedy, and Mark van der Laan. 2018. SuperLearner: Super Learner Prediction. https://CRAN.R-project.org/package=SuperLearner.

Smith, Jack W., J. E. Everhart, W. C. Dickson, W. C. Knowler, and R. S. Johannes. 1988. “Using the Adap Learning Algorithm to Forecast the Onset of Diabetes Mellitus.” In Proceedings of the Annual Symposium on Computer Application in Medical Care, edited by NA, 261–65. American Medical Informatics Association.

Steinki, Oliver, and Ziad Mohammad. 2015. “Introduction to Ensemble Learning.” European Journal of Epidemiology. https://doi.org/10.2139/ssrn.2634092.

Vardeman, S. B. 2018. “Lecture Notes on Modern Multivariate Statistical Learning-Version Ii.” European Journal of Epidemiology, 109–12. http://www.analyticsiowa.com/wp-content/uploads/2018/07/StatLearningNotesII.pdf.