Intro
In some applications, we would like to know the predicted class probabilities, which we can then use for other calculations (e.g. Bayesian inference).
This demo walks through a few visualization tools for measuring the performance of classification models, i.e. models with a binary categorical outcome. Here we will:
- Simulate train & test datasets
- Fit two models & predict train data
- Predict test data
- Compare the sensitivity and specificity of each model
- Build confusion matrices for each model
- Plot ROC curves
- Plot lift curves
- Assess the quality of the class probabilities with calibration curves
Dependencies:
library( dplyr )
library( ggplot2 )
library( gridExtra )
library( AppliedPredictiveModeling )
library( caret )
library( klaR )
library( MASS )
library( pROC )
library( randomForest )
Simulate predictors and outcomes
quadBoundaryFunc() creates a class boundary that is a function of both predictors. The probability values are based on a logistic regression model with linear predictor \(-1 - 2X_1 - 0.2X_1^2 + 2X_2^2\). The predictors are multivariate normal with mean (1, 0) and a moderate degree of positive correlation.
set.seed( 123 )
simulatedTrain <- quadBoundaryFunc( 500 )
simulatedTest <- quadBoundaryFunc( 1000 )
glimpse( simulatedTrain )
## Rows: 500
## Columns: 4
## $ X1 <dbl> 1.0340633, 1.5449120, 1.3654937, 0.5154079, 2.1642139, 2.2733339…
## $ X2 <dbl> -0.9859200, -0.6781224, 2.5050483, 0.3712043, -0.3757418, 2.3080…
## $ prob <dbl> 0.207854397, 0.025395130, 0.999785424, 0.140836503, 0.002515341,…
## $ class <fct> Class1, Class2, Class1, Class2, Class2, Class1, Class2, Class1, …
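As a quick sanity check, the prob column should be the inverse logit of the linear predictor above (an assumption about what quadBoundaryFunc stores, checked here against the first few rows):
#compute the linear predictor, invert the logit, and compare against the stored prob column
eta <- with( simulatedTrain, -1 - 2*X1 - 0.2*X1^2 + 2*X2^2 )
head( cbind( trueProb = simulatedTrain$prob, invLogit = 1/( 1 + exp( -eta ) ) ) )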
Fit Models & Predict train data
Here we will fit two different model classes to simulatedTrain.
Model 1: Random Forest:
rfMod <- randomForest( class ~ X1 + X2, data = simulatedTrain, ntree = 2000 )
Model 2: Quadratic Discriminant:
qdaMod <- qda( class ~ X1 + X2, data = simulatedTrain )
Now to predict the class probabilities for the training dataset with both models:
#predict train w/Mod1
rfTrain_Pred <- predict( rfMod, simulatedTrain, type = 'prob')
simulatedTrain$RFprob <- rfTrain_Pred[,'Class1']
#predict train w/Mod2
qdaTrain_Pred <- predict( qdaMod, simulatedTrain )
simulatedTrain$QDAprob <- qdaTrain_Pred$posterior[,'Class1']
Predict Class for test data
Now to predict classes and class probabilities for the test dataset with both models:
#predict test w/Mod1
rfTest_Pred <- predict( rfMod, simulatedTest, type = 'prob')
simulatedTest$RFprob <- rfTest_Pred[,'Class1']
simulatedTest$RFclass <- predict( rfMod, simulatedTest )
#predict test w/Mod2
qdaTest_Pred <- predict( qdaMod, simulatedTest )
simulatedTest$QDAprob <- qdaTest_Pred$posterior[,'Class1']
simulatedTest$QDAclass <- qdaTest_Pred$class
glimpse( simulatedTest )
## Rows: 1,000
## Columns: 8
## $ X1 <dbl> 2.6307086, 1.5170728, 2.3096284, 2.6816979, 1.3504838, -0.345…
## $ X2 <dbl> 0.97793876, 0.81016180, 0.07525156, -0.19835474, 2.05078460, …
## $ prob <dbl> 0.003227501, 0.039859327, 0.001260724, 0.000442332, 0.9872035…
## $ class <fct> Class2, Class2, Class2, Class2, Class1, Class1, Class1, Class…
## $ RFprob <dbl> 0.016, 0.006, 0.000, 0.000, 0.968, 0.782, 0.876, 0.002, 0.000…
## $ RFclass <fct> Class2, Class2, Class2, Class2, Class1, Class1, Class1, Class…
## $ QDAprob <dbl> 0.08862341, 0.20612883, 0.07281945, 0.05894790, 0.81889896, 0…
## $ QDAclass <fct> Class2, Class2, Class2, Class2, Class1, Class1, Class1, Class…
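As an aside, the forest's default class predictions should agree (up to vote ties) with a 0.5 cutoff on the Class1 vote fraction; a quick check of that assumption:
#cross-tabulate a manual 0.5 cutoff on RFprob against the stored RFclass predictions
table( cutoff = ifelse( simulatedTest$RFprob > 0.5, 'Class1', 'Class2' ),
       predicted = simulatedTest$RFclass )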
Sensitivity and Specificity
Sensitivity: the rate that the event of interest is predicted correctly
\[Sensitivity = \frac{\mbox{# pos samples that were predicted pos}}{\mbox{# pos samples}}\]
Specificity: the rate that nonevents are predicted as nonevents
\[Specificity = \frac{\mbox{# neg samples that were predicted neg}}{\mbox{# neg samples}}\]
Note that the false positive rate is 1 - Specificity.
We will use Class1 as the event of interest, i.e. in a signal detection theory framing, Class1 is treated as a signal-present event.
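To make the formulas concrete, here is a minimal manual cross-check for the random forest predictions; the caret helpers below compute the same quantities:
#tabulate predictions against the reference and apply the definitions directly
tab <- table( predicted = simulatedTest$RFclass, actual = simulatedTest$class )
tab[ 'Class1', 'Class1' ] / sum( tab[ , 'Class1' ] )   #sensitivity: true pos / all actual pos
tab[ 'Class2', 'Class2' ] / sum( tab[ , 'Class2' ] )   #specificity: true neg / all actual neg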
RF_sens <- sensitivity( data = simulatedTest$RFclass,
reference = simulatedTest$class,
positive = 'Class1' )
RF_spec <- specificity( data = simulatedTest$RFclass,
reference = simulatedTest$class,
negative = 'Class2' )
QDA_sens <- sensitivity( data = simulatedTest$QDAclass,
reference = simulatedTest$class,
positive = 'Class1' )
QDA_spec <- specificity( data = simulatedTest$QDAclass,
reference = simulatedTest$class,
negative = 'Class2' )
res <- paste( 'Random Forest: Sensitivity =', round( RF_sens, 4 ), 'Specificity =', round( RF_spec, 4 ),
'\nQuadratic Discriminant: Sensitivity =', round( QDA_sens, 4 ), 'Specificity =', round( QDA_spec, 4 ) )
cat( res, sep = '\n' )
## Random Forest: Sensitivity = 0.7836 Specificity = 0.902
## Quadratic Discriminant: Sensitivity = 0.7927 Specificity = 0.9483
Sensitivity and specificity are conditional measures of accuracy (conditioned on the true class). If we would like to take the prevalence of the event into account, we can look at:
Positive Predictive Value: the analogue of sensitivity that accounts for prevalence \[PPV = \frac{ \mbox{Sensitivity} \cdot \mbox{Prevalence} }{ (\mbox{Sensitivity} \cdot \mbox{Prevalence}) + ( (1 - \mbox{Specificity}) \cdot (1 - \mbox{Prevalence}) ) }\]
Negative Predictive Value: the analogue of specificity that accounts for prevalence \[NPV = \frac{ \mbox{Specificity} \cdot (1 - \mbox{Prevalence}) }{ (\mbox{Prevalence} \cdot (1 - \mbox{Sensitivity})) + ( \mbox{Specificity} \cdot (1 - \mbox{Prevalence}) ) }\]
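As a sketch, plugging the random forest's sensitivity, specificity, and the observed prevalence into these formulas should reproduce the values caret reports below:
#manual cross-check of the PPV/NPV formulas using quantities computed above
prev <- mean( simulatedTest$class == 'Class1' )
( RF_sens * prev ) / ( ( RF_sens * prev ) + ( ( 1 - RF_spec ) * ( 1 - prev ) ) )   #PPV
( RF_spec * ( 1 - prev ) ) / ( ( prev * ( 1 - RF_sens ) ) + ( RF_spec * ( 1 - prev ) ) )   #NPV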
RF_posPV <- posPredValue( data = simulatedTest$RFclass,
reference = simulatedTest$class,
positive = 'Class1' )
RF_negPV <- negPredValue( data = simulatedTest$RFclass,
reference = simulatedTest$class,
negative = 'Class2' )
QDA_posPV <- posPredValue( data = simulatedTest$QDAclass,
reference = simulatedTest$class,
positive = 'Class1' )
QDA_negPV <- negPredValue( data = simulatedTest$QDAclass,
reference = simulatedTest$class,
negative = 'Class2' )
res <- paste( 'Random Forest: PPV =', round( RF_posPV, 4 ), 'NPV =', round( RF_negPV, 4 ),
'\nQuadratic Discriminant: PPV =', round( QDA_posPV, 4 ), 'NPV =', round( QDA_negPV, 4 ) )
cat( res, sep = '\n' )
## Random Forest: PPV = 0.8622 NPV = 0.8419
## Quadratic Discriminant: PPV = 0.9231 NPV = 0.8539
Confusion Matrices
Model 1: Random Forest confusion matrix
confusionMatrix( data = simulatedTest$RFclass,
reference = simulatedTest$class,
positive = 'Class1' )
## Confusion Matrix and Statistics
##
## Reference
## Prediction Class1 Class2
## Class1 344 55
## Class2 95 506
##
## Accuracy : 0.85
## 95% CI : (0.8263, 0.8716)
## No Information Rate : 0.561
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.6924
##
## Mcnemar's Test P-Value : 0.001451
##
## Sensitivity : 0.7836
## Specificity : 0.9020
## Pos Pred Value : 0.8622
## Neg Pred Value : 0.8419
## Prevalence : 0.4390
## Detection Rate : 0.3440
## Detection Prevalence : 0.3990
## Balanced Accuracy : 0.8428
##
## 'Positive' Class : Class1
##
Model 2: Quadratic Discriminant confusion matrix
confusionMatrix( data = simulatedTest$QDAclass,
reference = simulatedTest$class,
positive = 'Class1' )
## Confusion Matrix and Statistics
##
## Reference
## Prediction Class1 Class2
## Class1 348 29
## Class2 91 532
##
## Accuracy : 0.88
## 95% CI : (0.8582, 0.8995)
## No Information Rate : 0.561
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.7526
##
## Mcnemar's Test P-Value : 2.569e-08
##
## Sensitivity : 0.7927
## Specificity : 0.9483
## Pos Pred Value : 0.9231
## Neg Pred Value : 0.8539
## Prevalence : 0.4390
## Detection Rate : 0.3480
## Detection Prevalence : 0.3770
## Balanced Accuracy : 0.8705
##
## 'Positive' Class : Class1
##
ROC curves
Evaluating class probabilities: we can use the predicted class probabilities to compare models.
Receiver Operating Characteristic (ROC) curves evaluate the trade-off between sensitivity and specificity. These plots are a great tool for choosing a probability threshold that appropriately balances sensitivity and specificity.
Importantly, ROC curves can also be used to quantitatively assess and compare models via the area under the curve (AUC).
roc_mod1 <- roc( response = simulatedTest$class,
predictor = simulatedTest$RFprob )
roc_mod2 <- roc( response = simulatedTest$class,
predictor = simulatedTest$QDAprob )
roc_plot <- plot( roc_mod1,
print.auc = TRUE, col = "blue")
roc_plot <- plot( roc_mod2,
print.auc = TRUE,
col = "green", print.auc.y = .4, add = TRUE)roc_plot##
## Call:
## roc.default(response = simulatedTest$class, predictor = simulatedTest$QDAprob)
##
## Data: simulatedTest$QDAprob in 439 controls (simulatedTest$class Class1) > 561 cases (simulatedTest$class Class2).
## Area under the curve: 0.9532
The quadratic discriminant (Model 2) has a slightly higher AUC. Before we interpret this as Model 2 having a classification edge over Model 1, let us inspect the confidence intervals for the two AUC values:
ci.auc( roc_mod1 )
## 95% CI: 0.9175-0.9472 (DeLong)
ci.auc( roc_mod2 )
## 95% CI: 0.9414-0.965 (DeLong)
Each model's AUC estimate lies outside the other's 95% CI, which suggests the difference in AUC between the two models is meaningful.
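For a more formal comparison (not run in the original demo), pROC's DeLong test for two correlated ROC curves could be applied; a minimal sketch:
#DeLong's test for a difference between the two correlated AUCs
roc.test( roc_mod1, roc_mod2, method = 'delong' )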
Lift Charts
Lift charts plot the cumulative gain (the fraction of events found) against the cumulative percentage of samples that have been screened, with samples ordered by decreasing predicted probability of the event.
labs <- c( RFprob = "Random Forest",
QDAprob = "Quadratic Discriminat" )
liftCurve <- lift( class ~ RFprob + QDAprob,
data = simulatedTest,
labels = labs )
liftCurve
##
## Call:
## lift.formula(x = class ~ RFprob + QDAprob, data = simulatedTest, labels = labs)
##
## Models: Random Forest, Quadratic Discriminant
## Event: Class1 (43.9%)
Plot the two curves:
xyplot( liftCurve,
auto.key = list( columns = 2, lines = T, points = F ) )
In the shaded region, the diagonal \(45^\circ\) edge of the triangle represents the performance of an uninformative model (chance), whereas the upper two bounds of the triangle represent the performance of a perfect classifier. Here, both models follow the envelope of the perfect classifier much more closely than the uninformative model, and the quadratic discriminant model seems to have a slight advantage over the random forest.
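To make explicit what the curve plots, here is a hand-rolled cumulative-gain calculation for the QDA probabilities, a sketch of the quantity lift() computes:
#order samples by decreasing predicted probability and track the fraction of events found
ord <- order( simulatedTest$QDAprob, decreasing = TRUE )
cumGain <- cumsum( simulatedTest$class[ ord ] == 'Class1' ) / sum( simulatedTest$class == 'Class1' )
pctSampled <- seq_along( ord ) / length( ord )
head( cbind( pctSampled, cumGain ) )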
Calibrating Probabilities
Calibration curves assess the quality of the class probabilities themselves: the predicted probabilities are binned, and the observed event rate within each bin is plotted against the bin midpoint. A well-calibrated model falls along the \(45^\circ\) unity line.
calCurve <- calibration( class ~ RFprob + QDAprob, data = simulatedTest )
xyplot( calCurve, auto.key = list( columns = 2 ) )
One way to recalibrate the probabilities is to fit a sigmoidal (logistic regression) model to the training-set predictions and then use it to transform the test-set probabilities:
sigmoidalCal <- glm( relevel( class, ref = "Class2" ) ~ RFprob,
data = simulatedTrain,
family = binomial )
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
sigmoidProbs <- predict( sigmoidalCal,
newdata = simulatedTest[, "RFprob", drop = F ],
type = "response" )
simulatedTest$RFsigmoid <- sigmoidProbs
sigmoidalCal <- glm( relevel( class, ref = "Class2" ) ~ QDAprob,
data = simulatedTrain,
family = binomial )
sigmoidProbs <- predict( sigmoidalCal,
newdata = simulatedTest[, "QDAprob", drop = F ],
type = "response" )
simulatedTest$QDAsigmoid <- sigmoidProbs
glimpse( simulatedTest )
## Rows: 1,000
## Columns: 10
## $ X1 <dbl> 2.6307086, 1.5170728, 2.3096284, 2.6816979, 1.3504838, -0.3…
## $ X2 <dbl> 0.97793876, 0.81016180, 0.07525156, -0.19835474, 2.05078460…
## $ prob <dbl> 0.003227501, 0.039859327, 0.001260724, 0.000442332, 0.98720…
## $ class <fct> Class2, Class2, Class2, Class2, Class1, Class1, Class1, Cla…
## $ RFprob <dbl> 0.016, 0.006, 0.000, 0.000, 0.968, 0.782, 0.876, 0.002, 0.0…
## $ RFclass <fct> Class2, Class2, Class2, Class2, Class1, Class1, Class1, Cla…
## $ QDAprob <dbl> 0.08862341, 0.20612883, 0.07281945, 0.05894790, 0.81889896,…
## $ QDAclass <fct> Class2, Class2, Class2, Class2, Class1, Class1, Class1, Cla…
## $ RFsigmoid <dbl> 2.220446e-16, 2.220446e-16, 2.220446e-16, 2.220446e-16, 1.0…
## $ QDAsigmoid <dbl> 0.01788140, 0.05605607, 0.01529288, 0.01332735, 0.96582349,…
We can also use a naive Bayes approach to recalibrate the probabilities:
BayesCal_RF <- NaiveBayes( class ~ RFprob, data = simulatedTrain,
usekernel = T )
BayesProbs_RF <- predict( BayesCal_RF,
newdata = simulatedTest[, "RFprob", drop = F ],
type = "response" )
simulatedTest$RFBayes <- BayesProbs_RF$posterior[, "Class1"]
BayesCal_QDA <- NaiveBayes( class ~ QDAprob, data = simulatedTrain,
usekernel = T )
BayesProbs_QDA <- predict( BayesCal_QDA,
newdata = simulatedTest[, "QDAprob", drop = F ],
type = "response" )
simulatedTest$QDABayes <- BayesProbs_QDA$posterior[, "Class1"]
glimpse( simulatedTest )
## Rows: 1,000
## Columns: 12
## $ X1 <dbl> 2.6307086, 1.5170728, 2.3096284, 2.6816979, 1.3504838, -0.3…
## $ X2 <dbl> 0.97793876, 0.81016180, 0.07525156, -0.19835474, 2.05078460…
## $ prob <dbl> 0.003227501, 0.039859327, 0.001260724, 0.000442332, 0.98720…
## $ class <fct> Class2, Class2, Class2, Class2, Class1, Class1, Class1, Cla…
## $ RFprob <dbl> 0.016, 0.006, 0.000, 0.000, 0.968, 0.782, 0.876, 0.002, 0.0…
## $ RFclass <fct> Class2, Class2, Class2, Class2, Class1, Class1, Class1, Cla…
## $ QDAprob <dbl> 0.08862341, 0.20612883, 0.07281945, 0.05894790, 0.81889896,…
## $ QDAclass <fct> Class2, Class2, Class2, Class2, Class1, Class1, Class1, Cla…
## $ RFsigmoid <dbl> 2.220446e-16, 2.220446e-16, 2.220446e-16, 2.220446e-16, 1.0…
## $ QDAsigmoid <dbl> 0.01788140, 0.05605607, 0.01529288, 0.01332735, 0.96582349,…
## $ RFBayes <dbl> 6.697126e-05, 5.781845e-05, 6.268862e-05, 6.268862e-05, 9.9…
## $ QDABayes <dbl> 0.01737412, 0.07520965, 0.01374626, 0.01103307, 0.99665493,…
Visualize the calibration curves for the original and recalibrated probabilities:
calCurve2 <- calibration( class ~ RFprob + RFBayes + RFsigmoid, data = simulatedTest )
p1 <- xyplot( calCurve2, auto.key = list( columns = 2 ) )
calCurve2 <- calibration( class ~ QDAprob + QDABayes + QDAsigmoid, data = simulatedTest )
p2 <- xyplot( calCurve2, auto.key = list( columns = 2 ) )
grid.arrange( p1, p2, ncol = 2 )
Something is fishy with the outcome for the random forest here. But for the quadratic discriminant we can see that the Bayes and sigmoid recalibrations improve the predictions (those points plot close to the unity line).
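For reference, a hand-rolled version of the binning behind these curves for the QDA probabilities (a sketch; caret's calibration() uses its own binning defaults, so the exact bins may differ):
#bin the predicted probabilities and compare each bin's midpoint to the observed Class1 rate
simulatedTest %>%
  mutate( bin = cut( QDAprob, breaks = seq( 0, 1, by = 0.1 ), include.lowest = TRUE ) ) %>%
  group_by( bin ) %>%
  summarise( midpoint = mean( QDAprob ), observedRate = mean( class == 'Class1' ) )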