In this chapter, you’ll fit classification models with train() and evaluate their out-of-sample performance using cross-validation and area under the curve (AUC).
What is the point of making a train/test split for binary classification problems?
Answer the question
50 XP
Possible Answers
To make the problem harder for the model by reducing the dataset size.
To evaluate your models out-of-sample, on new data. [ans]
To reduce the dataset size, so your models fit faster.
There is no real reason; it is no different than evaluating your models in-sample.
As you saw in the video, you’ll be working with the Sonar dataset in this chapter, using a 60% training set and a 40% test set. We’ll practice making a train/test split one more time, just to be sure you have the hang of it. Recall that you can use the sample() function to get a random permutation of the row indices in a dataset, to use when making train/test splits, e.g.:
rows <- sample(nrow(my_data))
And then use those row indices to randomly reorder the dataset, e.g.:
my_data <- my_data[rows, ]
Once your dataset is randomly ordered, you can split off the first 60% as a training set and the last 40% as a test set.
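One practical note: sample() draws a new permutation every time it runs, so the split (and everything downstream of it) will change between sessions. If you want a reproducible split, a minimal sketch using base R's set.seed() looks like this (my_data stands in for whatever data frame you are splitting):
set.seed(42)                   # any fixed seed makes the shuffle repeatable
rows <- sample(nrow(my_data))  # same permutation every run with the same seed
my_data <- my_data[rows, ]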
Instructions
100 XP
Shuffle the row indices of Sonar and store the result in rows.
Use rows to randomly reorder the rows of Sonar.
Identify the proper row to split on for a 60/40 split. Store this row number as split.
Save the first 60% as a training set.
Save the last 40% as the test set.
# Load the Sonar data; caret (confusionMatrix(), train()) and caTools (colAUC()) are used later in this chapter
install.packages("miceadds")
library(caret)
library(caTools)
Sonar <- miceadds::load.Rdata2('Sonar.RData', path = getwd())
head(Sonar)
# Shuffle row indices: rows
rows <- sample(nrow(Sonar))
# Randomly order data: Sonar
Sonar <- Sonar[rows, ]
# Identify row to split on: split
split <- round(nrow(Sonar) * 0.60)
# Create train
train <- Sonar[1:split, ]
# Create test
test <- Sonar[(split + 1):nrow(Sonar), ]
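A quick sanity check (not required by the exercise, but cheap insurance) confirms that the two pieces add back up to the full dataset:
nrow(train)                               # roughly 60% of the rows
nrow(test)                                # the remaining rows
nrow(train) + nrow(test) == nrow(Sonar)   # should be TRUE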
Once you have your random training and test sets you can fit a logistic regression model to your training set using the glm() function. glm() is a more advanced version of lm() that allows for more varied types of regression models, aside from plain vanilla ordinary least squares regression.
Be sure to pass the argument family = "binomial" to glm() to specify that you want to do logistic (rather than linear) regression. For example:
glm(Target ~ ., family = "binomial", dataset)
Don't worry about warnings like glm.fit: algorithm did not converge or glm.fit: fitted probabilities numerically 0 or 1 occurred. These are common on smaller datasets and usually don't cause any issues. They typically mean your dataset is perfectly separable, which can cause problems for the math behind the model, but R's glm() function is almost always robust enough to handle this case with no problems.
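If you want to see where those warnings come from, a small sketch with made-up, perfectly separable data reproduces them; the fit still returns usable coefficients. The toy data frame here is purely illustrative and not part of the Sonar exercise.
# Two classes whose x values never overlap: perfectly separable
toy <- data.frame(
  y = factor(rep(c("A", "B"), each = 10)),
  x = c(rnorm(10, mean = 0), rnorm(10, mean = 100))
)
# glm() warns (fitted probabilities numerically 0 or 1 occurred) but still fits
toy_model <- glm(y ~ x, family = "binomial", data = toy)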
Once you have a glm() model fit to your dataset, you can predict the outcome (e.g. rock or mine) on the test set using the predict() function with the argument type = “response”:
predict(my_model, test, type = "response")
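For context (not required by the exercise): predict() on a glm object returns predictions on the log-odds scale by default, and type = "response" is what converts them to probabilities between 0 and 1. A minimal sketch of the difference, reusing the my_model and test placeholders from above:
log_odds <- predict(my_model, test)                     # default type = "link"
probs    <- predict(my_model, test, type = "response")  # probabilities; equals plogis(log_odds)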
Instructions
100 XP
Fit a logistic regression called model to predict Class using all other variables as predictors. Use the training set for Sonar.
Predict on the test set using that model. Call the result p like you’ve done before.
# Fit glm model: model
model <- glm(Class ~ ., family = "binomial", data = train)
# Predict on test: p
p <- predict(model, test, type = "response")
Warning message: prediction from a rank-deficient fit may be misleading
round(p)
3 13 179 158 19 105 124 206 48 80 116 45 160 10 133 57 114 197 127 143 202 12 47 175 153
0 0 0 1 1 1 0 1 0 0 0 0 1 0 1 0 0 1 1 1 0 1 0 1 0
207 166 112 18 42 162 66 113
0 1 0 1 1 1 1 0
As you saw in the video, a confusion matrix is a very useful tool for calibrating the output of a model and examining all possible outcomes of your predictions (true positive, true negative, false positive, false negative).
Before you make your confusion matrix, you need to “cut” your predicted probabilities at a given threshold to turn probabilities into a factor of class predictions. Combine ifelse() with factor() as follows:
pos_or_neg <- ifelse(probability_prediction > threshold, positive_class, negative_class)
p_class <- factor(pos_or_neg, levels = levels(test_values))
confusionMatrix() in caret improves on table() from base R by adding lots of useful ancillary statistics in addition to the base rates in the table. You can calculate the confusion matrix (and the associated statistics) using the predicted outcomes as well as the actual outcomes, e.g.:
confusionMatrix(p_class, test_values)
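A side note that can save some squinting later: confusionMatrix() returns a list, so individual statistics can be pulled out by name instead of read off the printout. A minimal sketch (cm is just a placeholder name):
cm <- confusionMatrix(p_class, test_values)
cm$table                      # the raw 2x2 confusion matrix
cm$overall["Accuracy"]        # overall accuracy
cm$byClass["Sensitivity"]     # true positive rate for the positive class
cm$byClass["Specificity"]     # true negative rate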
Instructions
Use ifelse() to create a character vector, m_or_r that is the positive class, “M”, when p is greater than 0.5, and the negative class, “R”, otherwise.
Convert m_or_r to be a factor, p_class, with levels the same as those of test[[“Class”]].
Make a confusion matrix with confusionMatrix(), passing p_class and the “Class” column from the test dataset.
# If p exceeds threshold of 0.5, M else R: m_or_r
m_or_r <- ifelse(p > 0.5, "M", "R")
# Convert to factor: p_class
p_class <- factor(m_or_r, levels = levels(test[["Class"]]))
# Create confusion matrix
confusionMatrix(p_class, test[["Class"]])
Confusion Matrix and Statistics
Reference
Prediction M R
M 11 5
R 9 8
Accuracy : 0.5758
95% CI : (0.3922, 0.7452)
No Information Rate : 0.6061
P-Value [Acc > NIR] : 0.7063
Kappa : 0.1569
Mcnemar's Test P-Value : 0.4227
Sensitivity : 0.5500
Specificity : 0.6154
Pos Pred Value : 0.6875
Neg Pred Value : 0.4706
Prevalence : 0.6061
Detection Rate : 0.3333
Detection Prevalence : 0.4848
Balanced Accuracy : 0.5827
'Positive' Class : M
Use confusionMatrix(p_class, test[["Class"]]) to calculate a confusion matrix on the test set, then read the following statistics off its output:
Test set accuracy: 0.5758 (57.58%)
Test set sensitivity (true positive rate): 0.55 (55%)
Test set specificity (true negative rate): 0.6154 (61.54%)
# Define Positive (+) to be "M", Negative (-) "R"
############################################################################
# Reference (Actual)
# Prediction M , R
# M(+) TP= 11 , FP=5
# R(-) FN= 9 , TN=8
###########################################################################
# 1. Test set Accuracy rate
# = (TP + TN) / Total ≈ 0.5758
sprintf("Test set Accuracy rate : %f ", (11+8)/(11+5+9+8))
[1] "Test set Accuracy rate : 0.575758 "
#########################################################################
# 2. test set true positive rate (or sensitivity) of this model
# i.e. the proportion of Actual (M) Positives classified correctly out of ALL Actual Positives (M)
# = TP/(TP+FN)~0.55
sprintf("Test set true positive rate (sensitivity): %f ",(11)/(11+9))
[1] "Test set true positive rate (sensitivity): 0.550000 "
###########################################################################
# 3. test set true negative rate (or specificity) of this model,
# i.e. the proportion of Actual (R) Negatives classified correctly out of ALL Actual Negatives (R)
# =TN/(TN+FP)~0.6154
sprintf("Test set true negative rate (specificity): %f ", 8/(8+5))
[1] "Test set true negative rate (specificity): 0.615385 "
In the previous exercises, you used a threshold of 0.50 to cut your predicted probabilities to make class predictions (rock vs mine). However, this classification threshold does not always align with the goals for a given modeling problem.
For example, pretend you want to identify the objects you are really certain are mines. In this case, you might want to use a probability threshold of 0.90 to get fewer predicted mines, but with greater confidence in each prediction.
The code pattern for cutting probabilities into predicted classes, then calculating a confusion matrix, was shown in Exercise 7 of this chapter.
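Before running the exercise, it can help to see the effect directly: counting how many test set objects get called a mine at a few thresholds shows that the count can only stay the same or shrink as the threshold rises. A small sketch, assuming p still holds the predicted probabilities from the previous exercise:
# Number of predicted mines at increasingly strict thresholds
sapply(c(0.1, 0.5, 0.9), function(threshold) sum(p > threshold))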
Instructions
100 XP
Use ifelse() to create a character vector, m_or_r that is the positive class, “M”, when p is greater than 0.9, and the negative class, “R”, otherwise.
Convert m_or_r to be a factor, p_class, with levels the same as those of test[[“Class”]].
Make a confusion matrix with confusionMatrix(), passing p_class and the “Class” column from the test dataset.
# If p exceeds threshold of 0.9, M else R: m_or_r
m_or_r <- ifelse(p > 0.9, "M", "R")
# Convert to factor: p_class
p_class <- factor(m_or_r, levels = levels(test[["Class"]]))
# Create confusion matrix
confusionMatrix(p_class, test[["Class"]])
Confusion Matrix and Statistics
Reference
Prediction M R
M 11 5
R 9 8
Accuracy : 0.5758
95% CI : (0.3922, 0.7452)
No Information Rate : 0.6061
P-Value [Acc > NIR] : 0.7063
Kappa : 0.1569
Mcnemar's Test P-Value : 0.4227
Sensitivity : 0.5500
Specificity : 0.6154
Pos Pred Value : 0.6875
Neg Pred Value : 0.4706
Prevalence : 0.6061
Detection Rate : 0.3333
Detection Prevalence : 0.4848
Balanced Accuracy : 0.5827
'Positive' Class : M
Remarks: Note that there are (slightly) fewer predicted mines with this higher threshold
Conversely, say you want to be really certain that your model correctly identifies all the mines as mines. In this case, you might use a prediction threshold of 0.10, instead of 0.90.
The code pattern for cutting probabilities into predicted classes, then calculating a confusion matrix, was shown in Exercise 7 of this chapter.
Instructions
100 XP
Use ifelse() to create a character vector, m_or_r that is the positive class, “M”, when p is greater than 0.1, and the negative class, “R”, otherwise.
Convert m_or_r to be a factor, p_class, with levels the same as those of test[[“Class”]].
Make a confusion matrix with confusionMatrix(), passing p_class and the “Class” column from the test dataset.
# If p exceeds threshold of 0.1, M else R: m_or_r
m_or_r <- ifelse(p > 0.1, "M", "R")
# Convert to factor: p_class
p_class <- factor(m_or_r, levels = levels(test[["Class"]]))
# Create confusion matrix
confusionMatrix(p_class, test[["Class"]])
Confusion Matrix and Statistics
Reference
Prediction M R
M 11 5
R 9 8
Accuracy : 0.5758
95% CI : (0.3922, 0.7452)
No Information Rate : 0.6061
P-Value [Acc > NIR] : 0.7063
Kappa : 0.1569
Mcnemar's Test P-Value : 0.4227
Sensitivity : 0.5500
Specificity : 0.6154
Pos Pred Value : 0.6875
Neg Pred Value : 0.4706
Prevalence : 0.6061
Detection Rate : 0.3333
Detection Prevalence : 0.4848
Balanced Accuracy : 0.5827
'Positive' Class : M
Remark: Note that there are (slightly) more predicted mines with this lower threshold
What is the primary value of an ROC curve?
Answer the question
50 XP
Possible Answers
It has a cool acronym.
It can be used to determine the true positive and false positive rates for a particular classification threshold.
It evaluates all possible thresholds for splitting predicted probabilities into predicted classes. [ans]
As you saw in the video, an ROC curve is a really useful shortcut for summarizing the performance of a classifier over all possible thresholds. This saves you a lot of tedious work computing class predictions for many different thresholds and examining the confusion matrix for each.
My favorite package for computing ROC curves is caTools, which contains a function called colAUC(). This function is very user-friendly and can actually calculate ROC curves for multiple predictors at once. In this case, you only need to calculate the ROC curve for one predictor, e.g.:
colAUC(predicted_probabilities, actual, plotROC = TRUE)
The function returns a score called AUC (more on that later), and the plotROC = TRUE argument also plots the ROC curve for visual inspection.
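Because colAUC() works column by column, you can also point it at several raw predictors at once to see how well each one separates the classes on its own. This isn't needed for the exercise, but it's a handy feature. A minimal sketch using the first three Sonar columns:
# AUC of each of the first three predictors, taken individually, against Class
colAUC(Sonar[, 1:3], Sonar[["Class"]])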
Instructions
100 XP
model, test, and train from the last exercise using the sonar data are loaded in your workspace.
Predict probabilities (i.e. type = “response”) on the test set, then store the result as p.
Make an ROC curve using the predicted test set probabilities.
# Predict on test: p
p <- predict(model, test, type = "response")
Warning message: prediction from a rank-deficient fit may be misleading
# Make ROC curve
colAUC(p, test[["Class"]], plotROC = TRUE)
[,1]
M vs. R 0.5538462
As you saw in the video, area under the ROC curve is a very useful, single-number summary of a model’s ability to discriminate the positive from the negative class (e.g. mines from rocks). An AUC of 0.5 is no better than random guessing, an AUC of 1.0 is a perfectly predictive model, and an AUC of 0.0 is perfectly anti-predictive (which rarely happens).
This is often a much more useful metric than simply ranking models by their accuracy at a set threshold, as different models might require different calibration steps (looking at a confusion matrix at each step) to find the optimal classification threshold for that model.
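To anchor those reference points, here is a tiny made-up sketch (not part of the exercise) showing colAUC() returning about 1 for a predictor that separates the classes perfectly and about 0.5 for pure noise:
set.seed(1)
actual  <- factor(rep(c("M", "R"), each = 50))
perfect <- ifelse(actual == "M", 0.9, 0.1)  # always higher for the positive class
noise   <- runif(100)                       # unrelated to the class
colAUC(perfect, actual)                     # 1.0
colAUC(noise, actual)                       # close to 0.5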
You can use the trainControl() function in caret to use AUC (instead of accuracy) to tune the parameters of your models. The twoClassSummary() convenience function allows you to do this easily.
When using twoClassSummary(), be sure to always include the argument classProbs = TRUE or your model will throw an error! (You cannot calculate AUC with just class predictions. You need to have class probabilities as well.)
Instructions
100 XP
Customize the trainControl object to use twoClassSummary rather than defaultSummary.
Use 10-fold cross-validation.
Be sure to tell trainControl() to return class probabilities.
# Create trainControl object: myControl
myControl <- trainControl(
method = "cv",
number = 10,
summaryFunction = twoClassSummary,
classProbs = TRUE, # IMPORTANT!
verboseIter = TRUE
)
Now that you have a custom trainControl object, it's easy to fit caret models that use AUC rather than accuracy to tune and evaluate the model. You can just pass your custom trainControl object to the train() function via the trControl argument, e.g.:
train(
  Class ~ .,
  data = Sonar,
  method = "glm",
  trControl = myControl
)
Instructions
100 XP
Use train() to fit a glm model (i.e. method = “glm”) to Sonar using your custom trainControl object, myControl. You want to predict Class from all other variables in the data (i.e. Class ~ .). Save the result to model.
Print the model to the console and examine its output.
# Train glm with custom trainControl: model
model <- train(
  Class ~ .,
  data = Sonar,
  method = "glm",
  trControl = myControl
)
The metric "Accuracy" was not in the result set. ROC will be used instead.
+ Fold01: parameter=none
Warning: glm.fit: algorithm did not converge
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
- Fold01: parameter=none
(the same pair of glm.fit warnings appears for Fold02 through Fold10)
Aggregating results
Fitting final model on full training set
Warning: glm.fit: algorithm did not converge
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
# Print model to console
model
Generalized Linear Model
83 samples
60 predictors
2 classes: 'M', 'R'
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 74, 75, 76, 75, 74, 74, ...
Resampling results:
ROC Sens Spec
0.588125 0.665 0.55
Remark: For now, don’t worry about the warning messages generated by your code.
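One last convenience worth knowing: the cross-validated AUC printed above is also stored on the train object itself, so you can grab it programmatically instead of reading the printout. A short sketch, assuming the model object fitted above:
model[["results"]]           # data frame with ROC (AUC), Sens, and Spec columns
model[["results"]][["ROC"]]  # the resampled AUC as a plain number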