Overview
My last blog post looked into evaluating logistic regression coefficients and the concept of log odds. This post dives into evaluating logistic regression models. I’m sticking with the classification model theme because I expect to be using this tool frequently, so mastering it is essential.
Classification models place outcomes into one of two or more categories. The Confusion Matrix is a popular tool for evaluating classification model performance. I’m going to use a sample scenario to introduce the confusion matrix and explore how to use it to evaluate my horse racing classification models.
Example Scenario
I want to classify whether a horse will improve in its upcoming race.
Here are the steps I’d take to perform this analysis; a sketch in R follows the list.
1. Read in horse racing past performance data.
2. Split the data into training and test sets.
3. Build and fit the logistic regression model.
4. Make predictions on the training and test sets.
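Here’s a minimal sketch of those four steps in R. The file name, predictor columns, and 75/25 split are placeholders for illustration; my actual model uses my own past performance variables.
# 1. Read in past performance data (file name is a placeholder)
pastPerf <- read.csv("past_performance.csv")
pastPerf$improve <- factor(pastPerf$improve, levels = c("no", "yes"))
# 2. Split the data into training and test sets
set.seed(42)
trainRows <- sample(nrow(pastPerf), floor(0.75 * nrow(pastPerf)))
improveTrain <- pastPerf[trainRows, ]
improveTest  <- pastPerf[-trainRows, ]
# 3. Build and fit the logistic regression model (predictors are placeholders)
improveModel <- glm(improve ~ lastSpeedRating + daysSinceLastRace,
                    data = improveTrain, family = binomial)
# 4. Make predictions: the probability of improvement for each test horse
improveTest$pred <- predict(improveModel, newdata = improveTest, type = "response")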
Horse Improvement Classification Results
sample <- improveTest[c(7,21, 35, 49), c('improve','pred')]
print(sample)
## improve pred
## no 0.38565
## no 0.303246227
## yes 0.5800498077
## no 0.0006846551
## yes 0.51434345
# The improve column gives the actual outcome: whether the horse improved or not.
# The pred column gives the predicted probability that the horse will improve; if
# the probability is greater than 50%, the horse is classified as improving.
The Confusion Matrix
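Before reading the table, here’s a minimal sketch of how a confusion matrix like the cm_improve object used below might be built, assuming the improve and pred columns shown above and the 50% cutoff:
cm_improve <- table(truth = improveTest$improve,
                    prediction = ifelse(improveTest$pred > 0.5, "yes", "no"))
print(cm_improve)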
Here’s how you read the table in six easy steps:
1. The rows of the table (labeled truth) correspond to the actual label of the datum: whether the horse improved (yes) or not (no).
2. The columns of the table (labeled prediction) correspond to the prediction which the model makes.
3. The first cell of the table (truth = “no” and prediction = “no”) corresponds to the 264 horses in the test set which did not improve and which the model (correctly) predicted would not.
4. The second row, first column of the table (truth = “yes” and prediction = “no”) corresponds to the 22 horses in the test set which did improve but which the model (incorrectly) predicted would not.
5. The first row, second column (truth = “no” and prediction = “yes”) corresponds to the 14 horses in the test set which did not improve but which the model (incorrectly) predicted would.
6. Finally, the second row, second column (truth = “yes” and prediction = “yes”) corresponds to the 158 horses in the test set which did improve and which the model (correctly) predicted would.
Every cell in a 2 by 2 Confusion Matrix has a special name:
|             | Prediction = No | Prediction = Yes |
|-------------|-----------------|------------------|
| Truth = No  | True No (264)   | False Yes (14)   |
| Truth = Yes | False No (22)   | True Yes (158)   |
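In the code that follows, I pull these cells out of cm_improve by row and column index. Here’s the mapping, assuming the “no” row and column come first:
cm_improve[1, 1]  # True No:   truth = no,  prediction = no
cm_improve[1, 2]  # False Yes: truth = no,  prediction = yes
cm_improve[2, 1]  # False No:  truth = yes, prediction = no
cm_improve[2, 2]  # True Yes:  truth = yes, prediction = yes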
With an understanding of these concepts, we’re ready to start calculating some of the performance measures that make the confusion matrix so valuable and that will enable me to evaluate my horse racing models.
Accuracy
|             | Prediction = No | Prediction = Yes |
|-------------|-----------------|------------------|
| Truth = No  | True No         | False Yes        |
| Truth = Yes | False No        | True Yes         |
Accuracy = (True No + True Yes) / (True No + True Yes + False No + False Yes)
In words, Accuracy is the fraction of correct responses to total responses. Accuracy answers the question: when the model says the horse will improve or not improve, what’s the probability that it’s correct? In our example the accuracy is calculated as follows:
(cm_improve[1,1] + cm_improve[2,2]) / sum(cm_improve)
## [1] 0.9213974
An error rate of 8% in horse racing would be a Hall of Fame performance. Let’s hope my actual results measure up.
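The 8% figure is simply one minus Accuracy; here’s that arithmetic on the confusion matrix:
accuracy <- (cm_improve[1,1] + cm_improve[2,2]) / sum(cm_improve)
1 - accuracy  # error rate, roughly 0.08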
Precision And Recall
The Confusion Matrix doesn’t stop with Accuracy. Two more evaluation measures are Precision and Recall. Precision answers the question: “If the model says this horse will improve, what’s the probability that it does?” We therefore define Precision as the ratio of True Yes to predicted Yes. Its calculation is set forth below:
cm_improve[2,2] / (cm_improve[2,2]+ cm_improve[1,2])
## [1] 0.9186047
# Precision = True Yes / (True Yes + False Yes)
# Precision is about the second column of the Confusion Matrix
In this case Precision is close to Accuracy, but that’s not always the case. Coming into this blog post, I thought Accuracy was the most important metric for evaluating a model. Now, as I think about it, as a Horse Player high Precision would actually be more important to me. Precision tells me how accurate the model is when it predicts improvement. If you assume a horse stands a better chance to win when she improves, you would value high Precision over high Accuracy. Nice, I learned something.
The companion metric to Precision is Recall. Recall answers the question: “Of all the horses that improved, what fraction did the model get right?” Recall is about the second row of the confusion matrix (False No and True Yes). Here’s how we calculate it:
cm_improve[2,2] / (cm_improve[2,2]+ cm_improve[2,1])
## [1] 0.877
# Recall = True Yes / (True Yes + False No)
# Recall is about the second row of the Confusion Matrix
As a Horse Player, what does an 87.7% Recall mean to me? I believe it means that the model is missing approximately 12% of the horses that improve.
You can think of Precision as a measure of confirmation (when the model indicates improvement, how often is it correct?). Recall is more about productivity or utility: of all the opportunities to identify an improving horse, how many does the model get right? One minus Recall represents lost wagering opportunities.
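To put a number on those lost opportunities, one minus Recall is the fraction of improving horses the model fails to flag:
recall <- cm_improve[2,2] / (cm_improve[2,2] + cm_improve[2,1])
1 - recall  # fraction of improving horses the model misses, roughly 0.12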
So now I’ve learned about Accuracy, Precision, and Recall. Suppose I’ve developed several different models to predict improvement. Each model has a different combination of Accuracy, Precision, and Recall. How do I pick the best model?
F1 Score
One good way to pick the best model is the F1 score. The F1 score measures the trade-off between Precision and Recall. It’s defined as the harmonic mean of Precision and Recall. Here’s the calculation:
precision <- cm_improve[2,2] / (cm_improve[2,2]+ cm_improve[1,2])
recall <- cm_improve[2,2] / (cm_improve[2,2] + cm_improve[2,1])
(F1 <- 2 * precision * recall / (precision + recall) )
## [1] 0.897
Our horse improvement model has an F1 score of approximately 0.9. What does that mean? F1 will be 1.0 when a model has perfect Precision and Recall. Suppose you are using your model and determine that you are losing too many bets (because you thought the horse would improve). To improve this situation, you would try to improve Precision. Often when you improve Precision, it’s at the cost of lower Recall. If Recall falls too much, F1 may also fall. When this happens, it means you traded away too much Recall for the Precision you gained.
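One way to see this trade-off is to recompute Precision, Recall, and F1 at several probability cutoffs instead of the fixed 50% used above. A sketch, assuming improveTest holds the actual improve labels and the predicted probabilities from earlier; the cutoff values are arbitrary:
for (cutoff in c(0.4, 0.5, 0.6, 0.7)) {
  pred_class <- factor(ifelse(improveTest$pred > cutoff, "yes", "no"),
                       levels = c("no", "yes"))
  cm <- table(truth = improveTest$improve, prediction = pred_class)
  precision <- cm["yes", "yes"] / sum(cm[, "yes"])
  recall    <- cm["yes", "yes"] / sum(cm["yes", ])
  f1        <- 2 * precision * recall / (precision + recall)
  cat(sprintf("cutoff %.1f: precision %.3f, recall %.3f, F1 %.3f\n",
              cutoff, precision, recall, f1))
}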