Overview
My last blog post looked into evaluating logistic regression coefficients and the concept of log odds. This post dives into evaluating logistic regression models. I’m sticking with the classification model theme because I expect to be using this tool frequently, so mastering it is essential.
Classification models place outcomes into one of two or more categories. The Confusion Matrix is a popular tool for evaluating classification model performance. I’m going to use a sample scenario to introduce the confusion matrix and explore how to use it to evaluate my horse racing classification models.
Example Scenario
I want to classify whether a horse will improve in its upcoming race.
Here are the steps I’d take to perform this analysis; a sketch in R follows the list.
1. Read in horse racing past performance data.
2. Split the data into training and test sets.
3. Build and fit the logistic regression model.
4. Make predictions on the training and test sets.
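Here’s a minimal sketch of those four steps in R. The file name, predictor columns, and 75/25 split are placeholders for illustration; my actual model uses my own past performance variables.
# 1. Read in past performance data (file name is a placeholder)
pastPerf <- read.csv("past_performance.csv")
pastPerf$improve <- factor(pastPerf$improve, levels = c("no", "yes"))
# 2. Split the data into training and test sets
set.seed(42)
trainRows <- sample(nrow(pastPerf), floor(0.75 * nrow(pastPerf)))
improveTrain <- pastPerf[trainRows, ]
improveTest  <- pastPerf[-trainRows, ]
# 3. Build and fit the logistic regression model (predictors are placeholders)
improveModel <- glm(improve ~ lastSpeedRating + daysSinceLastRace,
                    data = improveTrain, family = binomial)
# 4. Make predictions: the probability of improvement for each test horse
improveTest$pred <- predict(improveModel, newdata = improveTest, type = "response")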
Horse Improvement Classification Results
sample <- improveTest[c(7,21, 35, 49), c('improve','pred')]
print(sample)
## improve pred
## no 0.38565
## no 0.303246227
## yes 0.5800498077
## no 0.0006846551
## yes 0.51434345
# The improve column gives the actual outcome: whether the horse improved or not.
# The pred column gives the predicted probability that the horse will improve; if
# the probability is greater than 50%, the horse is classified as improving.
The Confusion Matrix
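Before reading the table, here’s a minimal sketch of how a confusion matrix like the cm_improve object used below might be built, assuming the improve and pred columns shown above and the 50% cutoff:
cm_improve <- table(truth = improveTest$improve,
                    prediction = ifelse(improveTest$pred > 0.5, "yes", "no"))
print(cm_improve)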
Here’s how you read the table in six easy steps:
1. The rows of the table (labeled truth) correspond to the actual label of the datum: whether the horse improved (yes) or not (no).
2. The columns of the table (labeled prediction) correspond to the prediction which the model makes.
3. The first cell of the table (truth = “no” and prediction = “no”) corresponds to the 264 horses in the test set which did not improve and which the model (correctly) predicted would not.
4. The second row, first column of the table (truth = “yes” and prediction = “no”) corresponds to the 22 horses in the test set which did improve but which the model (incorrectly) predicted would not.
5. The first row, second column (truth = “no” and prediction = “yes”) corresponds to the 14 horses in the test set which did not improve but which the model (incorrectly) predicted would.
6. Finally, the second row, second column (truth = “yes” and prediction = “yes”) corresponds to the 158 horses in the test set which did improve and which the model (correctly) predicted would.
Every cell in a 2 by 2 Confusion Matrix has a special name:
|             | Prediction = No | Prediction = Yes |
|-------------|-----------------|------------------|
| Truth = No  | True No (264)   | False Yes (14)   |
| Truth = Yes | False No (22)   | True Yes (158)   |
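In the code that follows, I pull these cells out of cm_improve by row and column index. Here’s the mapping, assuming the “no” row and column come first:
cm_improve[1, 1]  # True No:   truth = no,  prediction = no
cm_improve[1, 2]  # False Yes: truth = no,  prediction = yes
cm_improve[2, 1]  # False No:  truth = yes, prediction = no
cm_improve[2, 2]  # True Yes:  truth = yes, prediction = yes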
With an understanding of these concepts, we’re ready to start calculating some of the performance measures that make the confusion matrix so valuable and that will enable me to evaluate my horse racing models.
Accuracy
|             | Prediction = No | Prediction = Yes |
|-------------|-----------------|------------------|
| Truth = No  | True No         | False Yes        |
| Truth = Yes | False No        | True Yes         |
Accuracy = (True No + True Yes) / (True No + True Yes + False No + False Yes)
In words, Accuracy is the fraction of correct responses to total responses. Accuracy answers the question: when the model says the horse will improve or not improve, what’s the probability that it’s correct? In our example the accuracy is calculated as follows:
(cm_improve[1,1] + cm_improve[2,2]) / sum(cm_improve)
## [1] 0.9213974
An error rate of 8% in horse racing would be a Hall of Fame performance. Let’s hope my actual results measure up.
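The 8% figure is simply one minus Accuracy; here’s that arithmetic on the confusion matrix:
accuracy <- (cm_improve[1,1] + cm_improve[2,2]) / sum(cm_improve)
1 - accuracy  # error rate, roughly 0.08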
Precision And Recall
The Confusion Matrix doesn’t stop with Accuracy. Two more evaluation measures are Precision and Recall. Precision answers the question: “If the model says this horse will improve, what’s the probability that it does?” We therefore define Precision as the ratio of True Yes to predicted Yes. Its calculation is set forth below:
cm_improve[2,2] / (cm_improve[2,2]+ cm_improve[1,2])
## [1] 0.9186047
# Precision = True Yes / (True Yes + False Yes)
# Precision is about the second column of the Confusion Matrix
In this case Precision is close to Accuracy, but that’s not always the case. Coming into this blog post, I thought Accuracy was the most important metric for evaluating a model. Now, as I think about it, as a Horse Player high Precision would actually be more important to me. Precision tells me how accurate the model is when it predicts improvement. If you assume a horse stands a better chance to win when she improves, you would value high Precision over high Accuracy. Nice, I learned something.
The companion metric to Precision is Recall. Recall answers the question: “Of all the horses that improved, what fraction did the model get right?” Recall is about the second row of the confusion matrix (False No and True Yes). Here’s how we calculate it:
cm_improve[2,2] / (cm_improve[2,2]+ cm_improve[2,1])
## [1] 0.877
# Recall = True Yes / (True Yes + False No)
# Recall is about the second row of the Confusion Matrix
As a Horse Player, what does an 87.7% Recall mean to me? I believe it means that the model is missing approximately 12% of the horses that improve.
You can think of Precision as a measure of confirmation (when the model indicates improvement, how often is it correct?). Recall is more about productivity or utility: of all the opportunities to identify an improving horse, how many does the model get right? One minus Recall represents lost wagering opportunities.
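To put a number on those lost opportunities, one minus Recall is the fraction of improving horses the model fails to flag:
recall <- cm_improve[2,2] / (cm_improve[2,2] + cm_improve[2,1])
1 - recall  # fraction of improving horses the model misses, roughly 0.12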
So now I’ve learned about Accuracy, Precision, and Recall. Suppose I’ve developed several different models to predict improvement. Each model has a different combination of Accuracy, Precision, and Recall. How do I pick the best model?
F1 Score
One good way to pick the best model is the F1 score. The F1 score measures the trade-off between Precision and Recall. It’s defined as the harmonic mean of Precision and Recall. Here’s the calculation:
precision <- cm_improve[2,2] / (cm_improve[2,2]+ cm_improve[1,2])
recall <- cm_improve[2,2] / (cm_improve[2,2] + cm_improve[2,1])
(F1 <- 2 * precision * recall / (precision + recall) )
## [1] 0.897
Our horse improvement model has an F1 score of approximately 0.9. What does that mean? F1 will be 1.0 when a model has perfect Precision and Recall. Suppose you are using your model and determine that you are losing too many bets (because you thought the horse would improve). To improve this situation, you would try to improve Precision. Often when you improve Precision, it’s at the cost of lower Recall. If Recall falls too much, F1 may also fall. When this happens, it means you traded away too much Recall for the Precision you gained.
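One way to see this trade-off is to recompute Precision, Recall, and F1 at several probability cutoffs instead of the fixed 50% used above. A sketch, assuming improveTest holds the actual improve labels and the predicted probabilities from earlier; the cutoff values are arbitrary:
for (cutoff in c(0.4, 0.5, 0.6, 0.7)) {
  pred_class <- factor(ifelse(improveTest$pred > cutoff, "yes", "no"),
                       levels = c("no", "yes"))
  cm <- table(truth = improveTest$improve, prediction = pred_class)
  precision <- cm["yes", "yes"] / sum(cm[, "yes"])
  recall    <- cm["yes", "yes"] / sum(cm["yes", ])
  f1        <- 2 * precision * recall / (precision + recall)
  cat(sprintf("cutoff %.1f: precision %.3f, recall %.3f, F1 %.3f\n",
              cutoff, precision, recall, f1))
}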