NFL Win Probability Predictor

November 22, 2015

David Rubinger

Purpose

Predicting the outcome of an NFL football game can be a fun (and potentially lucrative) exercise. The NFL Win Probability Predictor web application estimates the probability that the home team and the away team each win an upcoming game, based on season statistics that you enter for both teams. By adjusting the inputs, you can also see how a change in each variable affects the win probabilities.

Model

The predictive model used here is a logistic regression trained on regular season NFL games from 2010 to 2014 (data source: pro-football-reference.com). It predicts the probability that the home team wins the game (the away team's win probability is one minus this value) from the following predictors, each measured for both the home and away teams:

  • Average Point Differential This Season: the team's average net points per game (points scored minus points allowed) so far in the season
  • Winning Streak: the number of consecutive wins the team has coming into the game
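Both predictors must describe a team as it stands before kickoff, so the game being predicted cannot contribute to its own inputs. The sketch below shows one way to derive them from a team's chronological point margins within a season. The helper pregame_features is hypothetical, not the author's code, and it assumes the games are already sorted by date and that a team's first game, which has no prior games, gets NA for its average differential (the original does not say how week 1 was handled).

```r
# Hypothetical helper: derive the two pre-game predictors for one team from
# its point margins (points scored minus points allowed), in date order,
# within a single season. Assumption: the first game's average is NA.
pregame_features <- function(margins) {
  n <- length(margins)
  pts_diff_season <- rep(NA_real_, n)
  w_streak <- integer(n)
  streak <- 0L
  for (i in seq_len(n)) {
    if (i > 1) {
      # average net points per game over the games played before game i
      pts_diff_season[i] <- mean(margins[seq_len(i - 1)])
    }
    # consecutive wins coming into game i
    w_streak[i] <- streak
    # update the streak with game i's result (a loss or tie resets it)
    streak <- if (margins[i] > 0) streak + 1L else 0L
  }
  data.frame(game = seq_len(n), margin = margins,
             pts_diff_season = pts_diff_season, w_streak = w_streak)
}

# Example with made-up margins for a single team
pregame_features(c(10, -3, 7, 14, -21))
```

Because each row only uses earlier results, the features can be joined to the schedule as home_* and away_* columns without leaking the outcome of the game being predicted.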

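Fitting the model in R looks like this:

# Logistic regression of the home team's result on each team's
# season-to-date point differential and incoming winning streak
mod.glm <- glm(home_result ~ home_pts_diff_season + home_w_streak +
                 away_pts_diff_season + away_w_streak,
               data = train,
               family = binomial(logit))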
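Written out, the call above fits the standard logistic (logit) model. Here the symbols β0 through β4 are notation introduced for this sketch; they correspond, in order, to the intercept and the four coefficients reported in the model summary, and p is the probability that the home team wins.

```latex
\eta = \beta_0
     + \beta_1\,\mathrm{home\_pts\_diff\_season}
     + \beta_2\,\mathrm{home\_w\_streak}
     + \beta_3\,\mathrm{away\_pts\_diff\_season}
     + \beta_4\,\mathrm{away\_w\_streak}

p = \Pr(\text{home win}) = \frac{1}{1 + e^{-\eta}},
\qquad
\Pr(\text{away win}) = 1 - p
```

Each coefficient therefore acts on the log-odds of a home win, log(p / (1 - p)) = η, rather than on the probability directly.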

Model Summary

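The model summary below shows that the coefficients on the home and away point differential variables have the expected signs (positive for the home team, negative for the away team) and are statistically significant (p < 0.001). Neither winning streak variable is significant, and the away winning streak coefficient has the opposite of the expected sign: it is positive, implying that a longer away streak slightly raises the home team's win probability.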

##                         Estimate  Std. Error    z value     Pr(>|z|)
## (Intercept)           0.30574488 0.091885046  3.3274716 8.763791e-04
## home_pts_diff_season  0.05968274 0.009332071  6.3954449 1.600806e-10
## home_w_streak         0.01648228 0.048341494  0.3409550 7.331374e-01
## away_pts_diff_season -0.05151595 0.009294399 -5.5426871 2.978649e-08
## away_w_streak         0.01374611 0.045170621  0.3043152 7.608878e-01
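To see how the app could turn these estimates into a probability, the sketch below plugs the coefficients from the table into the inverse logit. The helper home_win_prob and the example matchup are hypothetical illustrations, not the app's actual code or real team data.

```r
# Coefficient estimates copied from the model summary above
coefs <- c(
  "(Intercept)"        =  0.30574488,
  home_pts_diff_season =  0.05968274,
  home_w_streak        =  0.01648228,
  away_pts_diff_season = -0.05151595,
  away_w_streak        =  0.01374611
)

# Hypothetical helper: home win probability for one matchup
home_win_prob <- function(home_pts_diff, home_streak, away_pts_diff, away_streak) {
  x <- c(1, home_pts_diff, home_streak, away_pts_diff, away_streak)
  eta <- sum(coefs * x)   # linear predictor (log-odds of a home win)
  plogis(eta)             # inverse logit: 1 / (1 + exp(-eta))
}

# Evenly matched teams with no streaks: only the intercept remains
home_win_prob(0, 0, 0, 0)            # about 0.576

# Hypothetical matchup: home team +7 per game on a 2-game streak,
# away team -3 per game with no streak
p_home <- home_win_prob(7, 2, -3, 0) # about 0.713
c(home = p_home, away = 1 - p_home)

# Odds ratios: multiplicative change in the home team's odds per unit increase
exp(coefs[-1])
```

The positive intercept works out to roughly a 58% home win probability when both teams look identical on paper, which is consistent with a home-field advantage. The odds ratios read naturally: each extra point of average home differential multiplies the home team's odds by about 1.06, while each extra point for the away team multiplies them by about 0.95. The streak odds ratios are close to 1 and, given their large p-values, should not be over-interpreted.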

Model Performance

Training this model on 70% of the sample data and testing it on the remaining 30% produced the correct result 62.2% of the time. That is a respectable hit rate for picking winners outright; however, it may not be good enough to beat the point spread!
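The 70/30 evaluation can be sketched as follows. Because the original 2010 to 2014 data set is not reproduced here, this example simulates a stand-in data frame, so its accuracy will not match the 62.2% reported above. The simulated coefficients, the random split, and the 0.5 decision threshold are all assumptions for illustration.

```r
set.seed(2015)

# Simulated stand-in for the real games (about five 256-game regular seasons)
n <- 1280
games <- data.frame(
  home_pts_diff_season = rnorm(n, 0, 7),
  home_w_streak        = rpois(n, 1),
  away_pts_diff_season = rnorm(n, 0, 7),
  away_w_streak        = rpois(n, 1)
)
# Assumed data-generating process: only the point differentials matter
eta <- 0.3 + 0.06 * games$home_pts_diff_season - 0.05 * games$away_pts_diff_season
games$home_result <- rbinom(n, 1, plogis(eta))  # 1 = home win

# Random 70/30 train/test split
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- games[train_idx, ]
test  <- games[-train_idx, ]

mod.glm <- glm(home_result ~ home_pts_diff_season + home_w_streak +
                 away_pts_diff_season + away_w_streak,
               data = train, family = binomial(logit))

# Predicted home win probabilities for the held-out games
p_test <- predict(mod.glm, newdata = test, type = "response")

# Pick the home team whenever its predicted probability exceeds 0.5
pred <- as.integer(p_test > 0.5)
accuracy <- mean(pred == test$home_result)
accuracy
```

Accuracy at a 0.5 threshold only scores which side is picked; it ignores how confident each prediction is, which is one reason a model that picks winners reasonably well can still fall short against the point spread.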