Applied Predictive Modeling

Measuring Performance in Classification Models

Intro

In some applications, we would like to know the predicted class probabilities, which we can then use for other calculations (e.g. Bayesian inference).

This demo walks through a few visualization tools for measuring the performance of classification models, i.e. models with a binary categorical outcome. Here we will:

  1. Simulate train & test datasets
  2. Fit two models & predict train data
  3. Predict test data
  4. Compare the Sensitivity and Specificity of each model
  5. Confusion Matrices for each Model
  6. ROC Curves
  7. Lift Curves
  8. Assess the quality of the class probabilities: Calibration Curves

Dependencies:

library( dplyr )
library( ggplot2 )
library( gridExtra )
library( AppliedPredictiveModeling )
library( caret )
library( klaR )
library( MASS )
library( pROC )
library( randomForest )

Simulate predictors and outcomes

quadBoundaryFunc() creates a class boundary that is a function of both predictors. The probability values are based on a logistic regression model whose linear predictor (the log-odds of Class1) is \(-1 - 2X_1 - 0.2X_1^2 + 2X_2^2\). The predictors are multivariate normal with mean (1, 0) and a moderate degree of positive correlation.

set.seed( 123 )
simulatedTrain <- quadBoundaryFunc( 500 )
simulatedTest <- quadBoundaryFunc( 1000 )
glimpse( simulatedTrain )
## Rows: 500
## Columns: 4
## $ X1    <dbl> 1.0340633, 1.5449120, 1.3654937, 0.5154079, 2.1642139, 2.2733339…
## $ X2    <dbl> -0.9859200, -0.6781224, 2.5050483, 0.3712043, -0.3757418, 2.3080…
## $ prob  <dbl> 0.207854397, 0.025395130, 0.999785424, 0.140836503, 0.002515341,…
## $ class <fct> Class1, Class2, Class1, Class2, Class2, Class1, Class2, Class1, …
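
As a quick check of the equation above, the prob column should be the logistic transform of that linear predictor (assuming, as described, that the equation gives the log-odds of Class1); a minimal sketch:

#sanity check: recompute the simulated probabilities from the linear predictor
eta <- with( simulatedTrain, -1 - 2*X1 - 0.2*X1^2 + 2*X2^2 )
all.equal( simulatedTrain$prob, 1 / ( 1 + exp( -eta ) ) )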

Fit Models & Predict Class for train data

Here we will fit two different model classes to simulatedTrain.

Model 1: Random Forest:

rfMod <- randomForest( class ~ X1 + X2, data = simulatedTrain, ntree = 2000 )

Model 2: Quadratic Discriminant:

qdaMod <- qda( class ~ X1 + X2, data = simulatedTrain )

Now predict the class probabilities for the training data with each model:

#predict train w/Mod1
rfTrain_Pred <- predict( rfMod, simulatedTrain, type = 'prob')
simulatedTrain$RFprob <- rfTrain_Pred[,'Class1']

#predict train w/Mod2
qdaTrain_Pred <- predict( qdaMod, simulatedTrain )
simulatedTrain$QDAprob <- qdaTrain_Pred$posterior[,'Class1']

Predict Class for test data

Now predict the class probabilities and predicted classes for the test data with each model:

#predict test w/Mod1
rfTest_Pred <- predict( rfMod, simulatedTest, type = 'prob')
simulatedTest$RFprob <- rfTest_Pred[,'Class1']
simulatedTest$RFclass <- predict( rfMod, simulatedTest )

#predict test w/Mod2
qdaTest_Pred <- predict( qdaMod, simulatedTest )
simulatedTest$QDAprob <- qdaTest_Pred$posterior[,'Class1']
simulatedTest$QDAclass <- qdaTest_Pred$class

glimpse( simulatedTest )
## Rows: 1,000
## Columns: 8
## $ X1       <dbl> 2.6307086, 1.5170728, 2.3096284, 2.6816979, 1.3504838, -0.345…
## $ X2       <dbl> 0.97793876, 0.81016180, 0.07525156, -0.19835474, 2.05078460, …
## $ prob     <dbl> 0.003227501, 0.039859327, 0.001260724, 0.000442332, 0.9872035…
## $ class    <fct> Class2, Class2, Class2, Class2, Class1, Class1, Class1, Class…
## $ RFprob   <dbl> 0.016, 0.006, 0.000, 0.000, 0.968, 0.782, 0.876, 0.002, 0.000…
## $ RFclass  <fct> Class2, Class2, Class2, Class2, Class1, Class1, Class1, Class…
## $ QDAprob  <dbl> 0.08862341, 0.20612883, 0.07281945, 0.05894790, 0.81889896, 0…
## $ QDAclass <fct> Class2, Class2, Class2, Class2, Class1, Class1, Class1, Class…

Sensitivity and Specificity

Sensitivity: the rate at which the event of interest is predicted correctly
\[Sensitivity = \frac{\mbox{# pos samples that were predicted pos}}{\mbox{# pos samples}}\]
Specificity: the rate at which nonevents are predicted as nonevents
\[Specificity = \frac{\mbox{# neg samples that were predicted neg}}{\mbox{# neg samples}}\]
False positive rate = 1 - Specificity

We use Class1 as the event of interest; i.e. in a signal detection theory framing, Class1 is treated as a signal-present event.

RF_sens <- sensitivity( data = simulatedTest$RFclass,
                        reference = simulatedTest$class,
                        positive = 'Class1' )
RF_spec <- specificity( data = simulatedTest$RFclass,
                        reference = simulatedTest$class,
                        negative = 'Class2' )
QDA_sens <- sensitivity( data = simulatedTest$QDAclass,
                        reference = simulatedTest$class,
                        positive = 'Class1' )
QDA_spec <- specificity( data = simulatedTest$QDAclass,
                        reference = simulatedTest$class,
                        negative = 'Class2' )

res <- paste( 'Random Forest: Sensitivity =', round( RF_sens, 4 ), 'Specificity =', round( RF_spec, 4 ),
              '\nQuadratic Discriminant: Sensitivity =', round( QDA_sens, 4 ), 'Specificity =', round( QDA_spec, 4 ) )

cat( res, sep = '\n' )
## Random Forest: Sensitivity = 0.7836 Specificity = 0.902 
## Quadratic Discriminant: Sensitivity = 0.7927 Specificity = 0.9483
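
As a quick check of the definitions above, the same rates can be computed by hand from a table of predictions against the truth; a minimal sketch for the random forest (the values should match 0.7836 and 0.902):

#hand-computed sensitivity & specificity from the 2x2 prediction table
tab <- table( pred = simulatedTest$RFclass, truth = simulatedTest$class )
tab[ 'Class1', 'Class1' ] / sum( tab[ , 'Class1' ] )  #sensitivity: correct pos / all true Class1
tab[ 'Class2', 'Class2' ] / sum( tab[ , 'Class2' ] )  #specificity: correct neg / all true Class2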

Sensitivity and specificity are conditional measures of accuracy. If we would like to take the prevalence of the event into account, we can look at:

Positive Predicted Value: the unconditional analogue of sensitivity
\[PPV = \frac{ \mbox{Sensitivity} \cdot \mbox{Prevalence} }{ \mbox{Sensitivity} \cdot \mbox{Prevalence} + ( 1 - \mbox{Specificity} ) \cdot ( 1 - \mbox{Prevalence} ) }\]

Negative Predicted Value: the unconditional analogue of specificity
\[NPV = \frac{ \mbox{Specificity} \cdot ( 1 - \mbox{Prevalence} ) }{ ( 1 - \mbox{Sensitivity} ) \cdot \mbox{Prevalence} + \mbox{Specificity} \cdot ( 1 - \mbox{Prevalence} ) }\]
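
A minimal sketch of these formulas applied to the random forest results above; the values should agree with caret's posPredValue()/negPredValue() below, which use the observed prevalence by default:

#PPV/NPV by hand from sensitivity, specificity, and the observed prevalence
prev <- mean( simulatedTest$class == 'Class1' )
( RF_sens * prev ) / ( RF_sens * prev + ( 1 - RF_spec ) * ( 1 - prev ) )           #PPV
( RF_spec * ( 1 - prev ) ) / ( ( 1 - RF_sens ) * prev + RF_spec * ( 1 - prev ) )   #NPV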

RF_posPV <- posPredValue( data = simulatedTest$RFclass,
                        reference = simulatedTest$class,
                        positive = 'Class1' )
RF_negPV <- negPredValue( data = simulatedTest$RFclass,
                        reference = simulatedTest$class,
                        negative = 'Class2' )
QDA_posPV <- posPredValue( data = simulatedTest$QDAclass,
                        reference = simulatedTest$class,
                        positive = 'Class1' )
QDA_negPV <- negPredValue( data = simulatedTest$QDAclass,
                        reference = simulatedTest$class,
                        negative = 'Class2' )

res <- paste( 'Random Forest: PPV =', round( RF_posPV, 4 ), 'NPV =', round( RF_negPV, 4 ),
              '\nQuadratic Discriminant: PPV =', round( QDA_posPV, 4 ), 'NPV =', round( QDA_negPV, 4 ) )

cat( res, sep = '\n' )
## Random Forest: PPV = 0.8622 NPV = 0.8419 
## Quadratic Discriminant: PPV = 0.9231 NPV = 0.8539

Confusion Matrices

Model 1: Random Forest confusion matrix

confusionMatrix(  data = simulatedTest$RFclass,
                        reference = simulatedTest$class,
                        positive = 'Class1' )
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Class1 Class2
##     Class1    344     55
##     Class2     95    506
##                                           
##                Accuracy : 0.85            
##                  95% CI : (0.8263, 0.8716)
##     No Information Rate : 0.561           
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.6924          
##                                           
##  Mcnemar's Test P-Value : 0.001451        
##                                           
##             Sensitivity : 0.7836          
##             Specificity : 0.9020          
##          Pos Pred Value : 0.8622          
##          Neg Pred Value : 0.8419          
##              Prevalence : 0.4390          
##          Detection Rate : 0.3440          
##    Detection Prevalence : 0.3990          
##       Balanced Accuracy : 0.8428          
##                                           
##        'Positive' Class : Class1          
## 

Model 2: Quadratic Discriminant confusion matrix

confusionMatrix(  data = simulatedTest$QDAclass,
                        reference = simulatedTest$class,
                        positive = 'Class1' )
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Class1 Class2
##     Class1    348     29
##     Class2     91    532
##                                           
##                Accuracy : 0.88            
##                  95% CI : (0.8582, 0.8995)
##     No Information Rate : 0.561           
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.7526          
##                                           
##  Mcnemar's Test P-Value : 2.569e-08       
##                                           
##             Sensitivity : 0.7927          
##             Specificity : 0.9483          
##          Pos Pred Value : 0.9231          
##          Neg Pred Value : 0.8539          
##              Prevalence : 0.4390          
##          Detection Rate : 0.3480          
##    Detection Prevalence : 0.3770          
##       Balanced Accuracy : 0.8705          
##                                           
##        'Positive' Class : Class1          
## 

ROC curves

Evaluating Class Probabilities.
We can use the predicted class probabilities to compare models.

Receiver Operating Characteristic (ROC) curves evaluate the trade-off between sensitivity and specificity. These plots are a great tool for choosing a threshold that appropriately balances sensitivity against specificity.

Importantly, ROC curves can also be used to quantitatively assess and compare models.

roc_mod1 <- roc( response = simulatedTest$class, 
                       predictor = simulatedTest$RFprob )
roc_mod2 <- roc( response = simulatedTest$class, 
                       predictor = simulatedTest$QDAprob )
roc_plot <- plot( roc_mod1, 
                  print.auc = TRUE, col = "blue")
roc_plot <- plot( roc_mod2,
                  print.auc = TRUE, 
                 col = "green", print.auc.y = .4, add = TRUE)

roc_plot
## 
## Call:
## roc.default(response = simulatedTest$class, predictor = simulatedTest$QDAprob)
## 
## Data: simulatedTest$QDAprob in 439 controls (simulatedTest$class Class1) > 561 cases (simulatedTest$class Class2).
## Area under the curve: 0.9532
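
As noted above, the ROC curve can also suggest a probability cutoff. pROC's coords() can extract the threshold that best balances sensitivity and specificity (a minimal sketch; by default 'best' uses Youden's J statistic):

#candidate probability threshold for the QDA model
coords( roc_mod2, x = 'best', ret = c( 'threshold', 'sensitivity', 'specificity' ) )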

The Quadratic Discriminant (Model 2) has a slightly higher AUC. Before we interpret this as Model 2 having a classification edge over Model 1, let us inspect the confidence intervals for the two AUC values:

ci.auc( roc_mod1 )
## 95% CI: 0.9175-0.9472 (DeLong)
ci.auc( roc_mod2 )
## 95% CI: 0.9414-0.965 (DeLong)

Each AUC falls outside the other model's 95% CI, which suggests the difference in AUC between the two models is meaningful.
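
For a more formal comparison, pROC also provides roc.test(), which by default runs DeLong's test for two paired ROC curves; a minimal sketch:

#paired comparison of the two AUCs
roc.test( roc_mod1, roc_mod2 )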

Lift Charts

Lift charts plot the cumulative gain/lift against the cumulative percentage of samples that have been screened.

labs <- c( RFprob = "Random Forest",
           QDAprob = "Quadratic Discriminant" )
liftCurve <- lift( class ~ RFprob + QDAprob, 
                   data = simulatedTest,
                   labels = labs )
liftCurve
## 
## Call:
## lift.formula(x = class ~ RFprob + QDAprob, data = simulatedTest, labels = labs)
## 
## Models: Random Forest, Quadratic Discriminant 
## Event: Class1 (43.9%)

Plot the two lift curves:

xyplot( liftCurve, 
        auto.key = list( columns = 2, lines = T, points = F ) )

In the shaded region, the diagonal \(45^{\circ}\) edge of the triangle represents the performance of an uninformative model (chance), whereas the upper two bounds of the triangle represent the performance of a perfect classifier. Here we see that both models follow the envelope of the perfect classifier much more closely than the uninformative model; the quadratic discriminant model seems to have a slight advantage over the random forest.
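
The quantity being plotted can also be computed by hand; a minimal sketch for the QDA probabilities, ordering the test samples by decreasing predicted probability and accumulating the events found:

#cumulative % of Class1 events found vs. % of samples screened
ord <- order( simulatedTest$QDAprob, decreasing = TRUE )
isEvent <- simulatedTest$class[ ord ] == 'Class1'
gain <- data.frame( pctTested = 100 * seq_along( isEvent ) / length( isEvent ),
                    pctEventsFound = 100 * cumsum( isEvent ) / sum( isEvent ) )
head( gain )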

Calibration of Class Probabilities

Calibration curves assess the quality of the class probabilities: the predicted probabilities are binned, and the observed rate of Class1 in each bin is plotted against the bin midpoint. A well-calibrated model falls along the \(45^{\circ}\) diagonal.

calCurve <- calibration( class ~ RFprob + QDAprob, data = simulatedTest )
xyplot( calCurve, auto.key = list( columns = 2 ) )
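
A rough sketch of what the calibration plot summarizes, binning the QDA probabilities and looking at the observed rate of Class1 in each bin (the 0.1 bin width here is an arbitrary choice; caret uses its own default cuts):

#observed Class1 rate within bins of the predicted QDA probability
bins <- cut( simulatedTest$QDAprob, breaks = seq( 0, 1, by = 0.1 ), include.lowest = TRUE )
round( tapply( simulatedTest$class == 'Class1', bins, mean ), 3 )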

One way to recalibrate the probabilities is to fit a sigmoidal (logistic) function to the training-set probabilities and then apply it to the test-set predictions:

sigmoidalCal <- glm( relevel( class, ref = "Class2" ) ~ RFprob,
                     data = simulatedTrain,
                     family = binomial )
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
sigmoidProbs <- predict( sigmoidalCal,
                         newdata = simulatedTest[, "RFprob", drop = F ],
                         type = "response" )
simulatedTest$RFsigmoid <- sigmoidProbs

sigmoidalCal <- glm( relevel( class, ref = "Class2" ) ~ QDAprob,
                     data = simulatedTrain,
                     family = binomial )
sigmoidProbs <- predict( sigmoidalCal,
                         newdata = simulatedTest[, "QDAprob", drop = F ],
                         type = "response" )
simulatedTest$QDAsigmoid <- sigmoidProbs

glimpse( simulatedTest )
## Rows: 1,000
## Columns: 10
## $ X1         <dbl> 2.6307086, 1.5170728, 2.3096284, 2.6816979, 1.3504838, -0.3…
## $ X2         <dbl> 0.97793876, 0.81016180, 0.07525156, -0.19835474, 2.05078460…
## $ prob       <dbl> 0.003227501, 0.039859327, 0.001260724, 0.000442332, 0.98720…
## $ class      <fct> Class2, Class2, Class2, Class2, Class1, Class1, Class1, Cla…
## $ RFprob     <dbl> 0.016, 0.006, 0.000, 0.000, 0.968, 0.782, 0.876, 0.002, 0.0…
## $ RFclass    <fct> Class2, Class2, Class2, Class2, Class1, Class1, Class1, Cla…
## $ QDAprob    <dbl> 0.08862341, 0.20612883, 0.07281945, 0.05894790, 0.81889896,…
## $ QDAclass   <fct> Class2, Class2, Class2, Class2, Class1, Class1, Class1, Cla…
## $ RFsigmoid  <dbl> 2.220446e-16, 2.220446e-16, 2.220446e-16, 2.220446e-16, 1.0…
## $ QDAsigmoid <dbl> 0.01788140, 0.05605607, 0.01529288, 0.01332735, 0.96582349,…

Alternatively, use a naive Bayes approach to recalibrate the probabilities:

BayesCal_RF <- NaiveBayes( class ~ RFprob, data = simulatedTrain,
                            usekernel = T )
BayesProbs_RF <- predict( BayesCal_RF,
                           newdata = simulatedTest[, "RFprob", drop = F ],
                           type = "response" )
simulatedTest$RFBayes <- BayesProbs_RF$posterior[, "Class1"]

BayesCal_QDA <- NaiveBayes( class ~ QDAprob, data = simulatedTrain,
                            usekernel = T )
BayesProbs_QDA <- predict( BayesCal_QDA,
                           newdata = simulatedTest[, "QDAprob", drop = F ],
                           type = "response" )
simulatedTest$QDABayes <- BayesProbs_QDA$posterior[, "Class1"]

glimpse( simulatedTest )
## Rows: 1,000
## Columns: 12
## $ X1         <dbl> 2.6307086, 1.5170728, 2.3096284, 2.6816979, 1.3504838, -0.3…
## $ X2         <dbl> 0.97793876, 0.81016180, 0.07525156, -0.19835474, 2.05078460…
## $ prob       <dbl> 0.003227501, 0.039859327, 0.001260724, 0.000442332, 0.98720…
## $ class      <fct> Class2, Class2, Class2, Class2, Class1, Class1, Class1, Cla…
## $ RFprob     <dbl> 0.016, 0.006, 0.000, 0.000, 0.968, 0.782, 0.876, 0.002, 0.0…
## $ RFclass    <fct> Class2, Class2, Class2, Class2, Class1, Class1, Class1, Cla…
## $ QDAprob    <dbl> 0.08862341, 0.20612883, 0.07281945, 0.05894790, 0.81889896,…
## $ QDAclass   <fct> Class2, Class2, Class2, Class2, Class1, Class1, Class1, Cla…
## $ RFsigmoid  <dbl> 2.220446e-16, 2.220446e-16, 2.220446e-16, 2.220446e-16, 1.0…
## $ QDAsigmoid <dbl> 0.01788140, 0.05605607, 0.01529288, 0.01332735, 0.96582349,…
## $ RFBayes    <dbl> 6.697126e-05, 5.781845e-05, 6.268862e-05, 6.268862e-05, 9.9…
## $ QDABayes   <dbl> 0.01737412, 0.07520965, 0.01374626, 0.01103307, 0.99665493,…

Visualize the calibration curves for the raw and recalibrated probabilities:

calCurve2 <- calibration( class ~ RFprob + RFBayes + RFsigmoid, data = simulatedTest )
p1 <- xyplot( calCurve2, auto.key = list( columns = 2 ) )
calCurve2 <- calibration( class ~ QDAprob + QDABayes + QDAsigmoid, data = simulatedTest )
p2 <- xyplot( calCurve2, auto.key = list( columns = 2 ) )

grid.arrange( p1, p2, ncol = 2 )

Something is fishy with the random forest outcome here: the glm.fit warnings above indicate that the random forest's training-set probabilities almost perfectly separate the classes, so the sigmoidal recalibration is degenerate. For the quadratic discriminant, however, both the Bayes and sigmoid recalibrations improve the predictions: these points plot close to the unity line.
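
One way to quantify this is the Brier score, the mean squared error of the predicted probabilities (lower is better); a minimal sketch comparing the raw and recalibrated probabilities:

#Brier scores for the raw and recalibrated probabilities
brier <- function( p ) mean( ( p - as.numeric( simulatedTest$class == 'Class1' ) )^2 )
sapply( simulatedTest[ , c( 'RFprob', 'RFsigmoid', 'RFBayes',
                            'QDAprob', 'QDAsigmoid', 'QDABayes' ) ], brier )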