As cliché as it sounds, American football is a game of inches. Given the fact that the NFL is a multi-billion dollar industry, these inches and the decisions that precede them have an immense monetary impact. In an attempt to glean additional insight into the impact of decision-making within the sport, our group set out to determine which in-game factors significantly impact a team’s WPA (“Win Probability Added”) for a given play. Specifically, we wanted to look at fourth down plays exclusively, as these are typically the most “high-leverage” decisions within a game.
For our dataset, we were able to scrape data for every NFL regular season play from 2009 to 2020. We then cleaned the data to solely look at fourth down plays within that time frame, which totaled around 43,950 observations. After dealing with a few outliers, we ended up with 43,905 fourth down plays over the course of 12 NFL seasons. Included below is a table explaining each variable:
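The filtering step described above can be sketched roughly as follows. This is a minimal illustration on a toy data frame, not the actual cleaning script: the `down` column, the set of retained `play_type` values, and every number in it are assumptions.

```r
# Toy play-by-play data; the `down` column and all values are illustrative assumptions.
pbp <- data.frame(
  down      = c(1, 2, 3, 4, 4, 4, NA),
  play_type = c("run", "pass", "pass", "punt", "field_goal", "no_play", "kickoff"),
  wpa       = c(0.5, -1.2, 2.0, -0.3, 1.8, 0.0, 0.1),
  stringsAsFactors = FALSE
)

# Keep only fourth-down plays whose action is one of the four analyzed here.
actions <- c("punt", "field_goal", "pass", "run")
fourth  <- subset(pbp, down == 4 & play_type %in% actions)

nrow(fourth)
```

`subset()` drops rows where the condition is `NA`, so plays without a recorded down (such as kickoffs) fall out automatically.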
| Variable | Description |
|---|---|
| yardline_100 | How far the offensive team is from the other team’s endzone. For example, if the offense is at the 75 ‘yardline_100’, they are 75 yards from the opponent’s endzone (or technically on their own 25 yard line) |
| game_date | Calendar date on which the play occurred |
| game_seconds_remaining | How many seconds are remaining in the game |
| game_half | Which half of the game the play happens within (football has two halves, so values are ‘Half1’ and ‘Half2’) |
| ydstogo | How many yards the offensive team needs for a first down on this particular fourth down play |
| play_type | What action the offense took during this situation |
| field_goal_result | If the offense attempted a field goal, this variable gives insight into the result of the attempt |
| score_differential | The difference in score between the team in possession versus the defensive team. For example, if the defensive team is winning by 14 points before the fourth-down play is run, the ‘score_differential’ value is -14 |
| ep | Expected Points. The value of the current distance, field position, and down situation in terms of future expected net point advantage. Essentially, it is the net point value a team can expect given a particular combination of down, distance, and field position |
| epa | Expected Points Added. The difference between the Expected Points (EP) at the start of a play and the EP at the end of the play. EPA is a measure of a play’s impact on the score of the game |
| wpa | Win Probability Added. The difference between a team’s Win Probability (WP) at the start of a play and the WP at the end of the play. WPA is another measure of a play’s impact on the outcome of a game. Measured in percentages (e.g. a value of 6 is a 6 percent increase in WP for a play) |
| fourth_down_converted | Whether a team that ‘goes for it’ in a fourth down situation (doesn’t punt or kick a field goal) achieves a first down. Coded as 1 for a conversion, 0 for failure (or not applicable). |
| fourth_down_failed | Whether a team that ‘goes for it’ in a fourth down situation (doesn’t punt or kick a field goal) fails to get a first down. Coded as 1 for failure, 0 for success (or not applicable) |
| punt | Whether a team decides to punt in a fourth down situation. Coded as 1 for punt, 0 for no punt |
| field_goal_missed | Whether a team that kicks a field goal in a fourth down situation misses the kick. Coded as 1 for a missed field goal, 0 for a made field goal (or not applicable) |
| field_goal_good | Whether a team that kicks a field goal in a fourth down situation makes the kick. Coded as 1 for a made field goal, 0 for a missed field goal (or not applicable) |
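As a compact restatement of the two "added" measures defined in the table, using start-of-play and end-of-play subscripts as our own notation:

$$
\mathrm{EPA} = \mathrm{EP}_{\text{end}} - \mathrm{EP}_{\text{start}}, \qquad
\mathrm{WPA} = \mathrm{WP}_{\text{end}} - \mathrm{WP}_{\text{start}}
$$

For instance, if a team's WP were 48 before a play and 54 after it (hypothetical values), the play's WPA would be $54 - 48 = 6$.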
The following section contains analyses of variable distributions within our dataset:
| play_type | Count |
|---|---|
| field_goal | 10630 |
| pass | 2923 |
| punt | 28296 |
| run | 2056 |
| game_half | Count |
|---|---|
| Half1 | 23111 |
| Half2 | 20794 |
The majority of our data comes from punts and field goals. We have far fewer observations for runs and passes, which may skew the results of any future models. Our data is also spread relatively evenly across the two halves. Outliers, as shown by the boxplot, could have a disproportionate impact on the model.
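To put numbers on that imbalance, the counts from the play-type table can be converted to shares. The snippet below is only a small sketch; the counts are copied from the table and the variable names are ours.

```r
# Counts copied from the play-type table above.
play_counts <- c(field_goal = 10630, pass = 2923, punt = 28296, run = 2056)

total  <- sum(play_counts)                      # all fourth-down plays
shares <- round(100 * play_counts / total, 1)   # percent of all plays
go_for_it <- sum(play_counts[c("pass", "run")]) # passes and runs combined

shares
go_for_it
```

Punts and field goals together account for roughly 89% of the plays, leaving 4,979 pass and run plays for any "go for it" analysis.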
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 31.00 55.00 51.08 72.00 99.00
As one would expect, most punt decisions come farther from the opponent’s endzone (far left), while the other decisions cluster closer to the opponent’s endzone. As shown by the summary, the variable is roughly symmetric overall, with a median of 55 and a mean of 51.08.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 3.000 7.000 7.782 11.000 48.000
The distribution of our yards-to-go data shows that for 75% of our plays, teams were within 11 yards of a first down. The data is positively skewed (the mean of 7.78 exceeds the median of 7, and the maximum is 48), meaning that most of our data comes from situations in which teams were close to a first down or touchdown.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -59.000 -7.000 0.000 -1.076 4.000 59.000
Looking at the distribution of score differential, the middle 50% of plays occurred with the offense between 7 points behind and 4 points ahead, and the median differential is 0.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 121 931 1834 1803 2659 3588
Looking at the summary and histogram of the “game_seconds_remaining” variable, there seems to be a relatively even distribution for the observations.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -49.9524 -1.4213 0.0357 0.2273 2.1444 47.5490
Twenty random observations are selected for the scatter plot; when compared to the ellipse matrix, the significant correlations are still visible.
| | yardline_100 | game_seconds_remaining | ydstogo | score_differential | ep | epa | wpa | fourth_down_converted | fourth_down_failed | punt | field_goal_missed | field_goal_good |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| yardline_100 | 1.0000 | 0.0593 | 0.2410 | -0.0340 | -0.9651 | 0.0142 | 0.0439 | -0.1606 | -0.1562 | 0.7971 | -0.1885 | -0.6751 |
| game_seconds_remaining | 0.0593 | 1.0000 | -0.0206 | -0.0075 | -0.0819 | 0.0329 | 0.0453 | -0.0869 | -0.1461 | 0.1189 | 0.0023 | -0.0098 |
| ydstogo | 0.2410 | -0.0206 | 1.0000 | -0.0537 | -0.3027 | 0.0244 | 0.0847 | -0.2218 | -0.1190 | 0.2167 | 0.0152 | -0.0670 |
| score_differential | -0.0340 | -0.0075 | -0.0537 | 1.0000 | 0.0367 | -0.0020 | -0.0241 | -0.1114 | -0.1653 | 0.0566 | 0.0277 | 0.0775 |
| ep | -0.9651 | -0.0819 | -0.3027 | 0.0367 | 1.0000 | -0.0295 | -0.0353 | 0.2050 | 0.1788 | -0.8283 | 0.1933 | 0.6712 |
| epa | 0.0142 | 0.0329 | 0.0244 | -0.0020 | -0.0295 | 1.0000 | 0.7036 | 0.4601 | -0.4569 | -0.0288 | -0.4253 | 0.2147 |
| wpa | 0.0439 | 0.0453 | 0.0847 | -0.0241 | -0.0353 | 0.7036 | 1.0000 | 0.3285 | -0.2672 | -0.0280 | -0.3365 | 0.1446 |
| fourth_down_converted | -0.1606 | -0.0869 | -0.2218 | -0.1114 | 0.2050 | 0.4601 | 0.3285 | 1.0000 | -0.0600 | -0.3397 | -0.0488 | -0.1285 |
| fourth_down_failed | -0.1562 | -0.1461 | -0.1190 | -0.1653 | 0.1788 | -0.4569 | -0.2672 | -0.0600 | 1.0000 | -0.3198 | -0.0460 | -0.1211 |
| punt | 0.7971 | 0.1189 | 0.2167 | 0.0566 | -0.8283 | -0.0288 | -0.0280 | -0.3397 | -0.3198 | 1.0000 | -0.2606 | -0.6858 |
| field_goal_missed | -0.1885 | 0.0023 | 0.0152 | 0.0277 | 0.1933 | -0.4253 | -0.3365 | -0.0488 | -0.0460 | -0.2606 | 1.0000 | -0.0986 |
| field_goal_good | -0.6751 | -0.0098 | -0.0670 | 0.0775 | 0.6712 | 0.2147 | 0.1446 | -0.1285 | -0.1211 | -0.6858 | -0.0986 | 1.0000 |
Punting and field position are strongly positively correlated (0.80), as are WPA and EPA (0.70), which makes sense given that they are relatively similar measures. We can also see that field position and EP are strongly negatively correlated (-0.97), as are punts and EP (-0.83).
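A correlation matrix like the one above, together with a list of its strongest pairs, can be produced along these lines. The data here are simulated stand-ins for three of the variables, and the 0.7 cutoff is an arbitrary choice for illustration.

```r
set.seed(1)
n <- 500
# Simulated stand-ins for three of the variables; values are illustrative only.
yardline_100 <- sample(1:99, n, replace = TRUE)
ep   <- 6 - 0.06 * yardline_100 + rnorm(n, sd = 0.4)   # EP falls as distance grows
punt <- as.integer(yardline_100 + rnorm(n, sd = 15) > 55)
toy  <- data.frame(yardline_100, ep, punt)

r <- round(cor(toy), 4)

# List each variable pair once, keeping only |r| above the chosen cutoff.
idx    <- which(abs(r) > 0.7 & upper.tri(r), arr.ind = TRUE)
strong <- data.frame(var1 = rownames(r)[idx[, "row"]],
                     var2 = colnames(r)[idx[, "col"]],
                     r    = r[idx])
strong
```

Restricting to `upper.tri(r)` avoids reporting the diagonal and the mirrored duplicates of each pair.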
To better suit our research question, we split our data set into three separate data frames and models, one for each potential “action” that a team could take on a fourth down play. For each action, we fit a linear model to determine which situations and actions drive a fourth down play’s WPA (Win Probability Added) for the offensive team. The three actions are: punt, field goal, and going for it.
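The split-then-fit approach might look something like the sketch below. The data are simulated, the `action` column is a name introduced here, and treating passes and runs together as "going for it" is our reading of the setup rather than the exact code used.

```r
set.seed(42)
n <- 600
# Simulated fourth-down plays; every value here is made up for illustration.
plays <- data.frame(
  play_type              = sample(c("punt", "field_goal", "pass", "run"), n, replace = TRUE),
  yardline_100           = sample(1:99, n, replace = TRUE),
  ydstogo                = sample(1:20, n, replace = TRUE),
  game_seconds_remaining = sample(121:3588, n, replace = TRUE),
  wpa                    = rnorm(n, sd = 3),
  stringsAsFactors       = FALSE
)

# Passes and runs are both treated as "going for it".
plays$action <- ifelse(plays$play_type %in% c("pass", "run"), "goforit", plays$play_type)

# One data frame and one linear model per action.
by_action <- split(plays, plays$action)
fits <- lapply(by_action, function(d) {
  lm(wpa ~ yardline_100 + ydstogo + game_seconds_remaining, data = d)
})

sapply(fits, function(f) round(coef(f), 4))
```

Keeping the fitted models in a named list makes it easy to apply the same diagnostics to each action in turn.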
##
## Call:
## lm(formula = wpa ~ yardline_100 + ydstogo + score_differential +
## game_seconds_remaining, data = punt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -50.591 -1.425 -0.042 1.439 36.097
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.731e+00 1.000e-01 -27.305 <2e-16 ***
## yardline_100 2.374e-02 1.355e-03 17.514 <2e-16 ***
## ydstogo 7.234e-02 3.343e-03 21.636 <2e-16 ***
## score_differential 2.515e-03 1.872e-03 1.344 0.179
## game_seconds_remaining 3.576e-04 1.974e-05 18.116 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.263 on 28291 degrees of freedom
## Multiple R-squared: 0.04014, Adjusted R-squared: 0.04001
## F-statistic: 295.8 on 4 and 28291 DF, p-value: < 2.2e-16
Based on the p-values, all variables are significant except “score_differential” (p = 0.179), so we remove that variable from this model.
##
## Call:
## lm(formula = wpa ~ yardline_100 + ydstogo + game_seconds_remaining,
## data = punt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -50.569 -1.432 -0.044 1.441 36.090
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.716e+00 9.938e-02 -27.33 <2e-16 ***
## yardline_100 2.361e-02 1.352e-03 17.46 <2e-16 ***
## ydstogo 7.195e-02 3.331e-03 21.60 <2e-16 ***
## game_seconds_remaining 3.552e-04 1.966e-05 18.07 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.263 on 28292 degrees of freedom
## Multiple R-squared: 0.04008, Adjusted R-squared: 0.03998
## F-statistic: 393.8 on 3 and 28292 DF, p-value: < 2.2e-16
Removing score_differential increased the F-statistic from 295.8 to 393.8, while the overall p-value (< 2.2e-16) and the R-squared were essentially unchanged. With an R-squared of 0.04, only 4% of the variance in WPA is accounted for within our “Punt” model. With the ydstogo coefficient as an example, each increase of about 13.9 yards to go is associated with an expected increase of 1 in WPA for a punt.
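The "13.9 yards" figure follows directly from the ydstogo coefficient in the reduced punt model (0.07195), holding the other predictors fixed:

$$
\Delta \mathrm{WPA} = \hat{\beta}_{\text{ydstogo}} \cdot \Delta \text{ydstogo}
\quad\Longrightarrow\quad
\Delta \text{ydstogo} = \frac{1}{\hat{\beta}_{\text{ydstogo}}} = \frac{1}{0.07195} \approx 13.9
$$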
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 62.68649, Df = 1, p = 2.4237e-15
With such a low p-value (2.4e-15), we reject the null hypothesis of constant variance, so this model violates the linear regression assumption of homoscedasticity.
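For readers curious what the score test is doing, here is a base-R sketch of the standard Breusch-Pagan/Cook-Weisberg calculation against the fitted values. It runs on simulated data that is heteroscedastic by construction; that this matches the reported test step for step is our assumption.

```r
set.seed(7)
n <- 2000
x <- runif(n, 1, 20)
# Simulated response whose error spread grows with x (heteroscedastic by construction).
y <- 1 + 0.1 * x + rnorm(n, sd = 0.2 * x)
fit <- lm(y ~ x)

# Score test for non-constant variance against the fitted values:
# regress scaled squared residuals on the fitted values and halve the
# regression sum of squares.
e2      <- residuals(fit)^2
u       <- e2 / mean(e2)
aux     <- lm(u ~ fitted(fit))
reg_ss  <- sum((fitted(aux) - mean(u))^2)
chisq   <- reg_ss / 2
p_value <- pchisq(chisq, df = 1, lower.tail = FALSE)

c(Chisquare = chisq, p = p_value)
```

A small p-value here is evidence against constant variance; a large one means the test found no such evidence.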
## [1] 1500 21843
## Error in shapiro.test(linear$residuals): sample size must be between 3 and 5000
We see numerous outliers within the qqplot, and our dataset is too large to perform the Shapiro-Wilk test. The maximum number of rows for this test is 5000.
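One possible workaround for the 5000-observation limit is to run the test on a random subsample of the residuals. The sketch below uses simulated heavy-tailed values in place of the real residuals, and the subsample size is simply the maximum the function accepts.

```r
set.seed(123)
# Stand-in for model residuals: 28,296 heavy-tailed draws (simulated, not the real residuals).
res <- rt(28296, df = 3)

# shapiro.test() accepts between 3 and 5000 values, so test a random subsample.
res_sub <- sample(res, 5000)
sw <- shapiro.test(res_sub)
sw$p.value
```

With samples this large the test can flag even small departures from normality, so it is best read alongside the Q-Q plot rather than on its own.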
##
## Call:
## lm(formula = wpa ~ ydstogo + yardline_100 + epa + game_seconds_remaining,
## data = fd_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.347 -1.602 -0.025 1.532 46.709
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.513e+00 1.047e-01 -14.449 <2e-16 ***
## ydstogo 1.414e-01 7.651e-03 18.479 <2e-16 ***
## yardline_100 3.815e-02 3.617e-03 10.549 <2e-16 ***
## epa 2.265e+00 2.228e-02 101.676 <2e-16 ***
## game_seconds_remaining -8.486e-05 3.666e-05 -2.315 0.0207 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.981 on 7436 degrees of freedom
## Multiple R-squared: 0.5984, Adjusted R-squared: 0.5982
## F-statistic: 2770 on 4 and 7436 DF, p-value: < 2.2e-16
## RMSE_model RMSE_test
## 1 2.979561 2.931066
The signs of all coefficients match what we would expect, and the RMSEs for the training and test sets track each other closely (2.98 versus 2.93), which suggests the model is not overfitting.
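The train/test comparison can be reproduced in outline as below. This is a sketch on simulated data with a single predictor; the 70/30 split and the helper name `rmse` are assumptions, not details taken from the analysis.

```r
set.seed(2020)
n <- 1000
# Simulated data: wpa depends linearly on epa plus noise (illustrative values only).
d <- data.frame(epa = rnorm(n))
d$wpa <- 2.3 * d$epa + rnorm(n, sd = 3)

# Hold out 300 of the 1000 rows for testing (the split ratio is an assumption).
train_idx <- sample(seq_len(n), size = 700)
train <- d[train_idx, ]
test  <- d[-train_idx, ]

fit <- lm(wpa ~ epa, data = train)

# Root mean squared error between observed and predicted values.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
data.frame(
  RMSE_model = rmse(train$wpa, fitted(fit)),
  RMSE_test  = rmse(test$wpa, predict(fit, newdata = test))
)
```

If the test RMSE were much larger than the training RMSE, that would point to overfitting.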
59.84% of the variance in wpa is explained by the predictor variables in this model. With football there are an infinite number of human and environmental factors that can change the outcome of a play, which leaves wide room for error. This makes it essentially impossible to perfectly predict WPA within a model context.
The F-statistic of 2770 is the largest of our three untransformed action models and provides strong evidence against the null hypothesis that all slope coefficients are zero. The p-values for ydstogo, yardline_100, and epa are all highly significant. The p-value for game_seconds_remaining (0.0207) is still below 0.05, so it is significant, though less so than the other variables. The model performed worse when we removed it, so it still provides valuable information. With field position as an example, holding the other predictors constant, an increase of about 26 yards away from the opponent’s endzone (1/0.03815 ≈ 26.2) corresponds to an expected increase of 1 WPA if the team decides to kick a field goal on fourth down.
## ydstogo yardline_100 epa
## 1.113177 1.113888 1.001212
## game_seconds_remaining
## 1.000545
All VIF and GVIF values were below 10, indicating no problematic multicollinearity at any iteration.
There are a few outliers here, and the relationship appears to be non-linear.
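As a reminder of what a VIF measures, it can be computed by hand as 1 / (1 - R²), where R² comes from regressing one predictor on all the others. The sketch below does this in base R on simulated predictors; the helper name `vif_by_hand` and the data are ours.

```r
set.seed(99)
n <- 1000
# Simulated predictors; ydstogo is made mildly dependent on yardline_100.
yardline_100 <- runif(n, 1, 99)
ydstogo      <- 5 + 0.05 * yardline_100 + rnorm(n, sd = 4)
epa          <- rnorm(n)
X <- data.frame(yardline_100, ydstogo, epa)

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing predictor j on the others.
vif_by_hand <- sapply(names(X), function(v) {
  f  <- reformulate(setdiff(names(X), v), response = v)
  r2 <- summary(lm(f, data = X))$r.squared
  1 / (1 - r2)
})
round(vif_by_hand, 3)
```

Values near 1, like those reported above, mean each predictor is close to uncorrelated with the rest.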
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 444.1782, Df = 1, p = < 2.22e-16
Our current model here has a Chisquare of 444.2 with a very low p-value, which indicates non-constant variance. This was another signal that this data set might not be appropriate for linear regression analysis.
## [1] 159 4262
There are certainly some outliers outside the bounds of this plot, as shown by the black dots outside the blue lines.
## Error in shapiro.test(fd_fit$residuals): sample size must be between 3 and 5000
As shown by the error, the data set is too large for the Shapiro test, so we’re unable to test for normality in this manner.
After the first test (and several iterations), we decided to re-assess all variables, including ep and epa. Note: after a few more iterations, we discovered that epa was a key predictor variable within the “field goal” model.
##
## Call:
## lm(formula = wpa ~ yardline_100 + ydstogo + score_differential +
## game_seconds_remaining, data = goforit)
##
## Residuals:
## Min 1Q Median 3Q Max
## -36.165 -3.514 -0.101 4.312 44.257
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.1315508 0.2848813 -0.462 0.644
## yardline_100 0.0240009 0.0049738 4.825 1.44e-06 ***
## ydstogo -0.1109493 0.0262418 -4.228 2.40e-05 ***
## score_differential -0.0028924 0.0084565 -0.342 0.732
## game_seconds_remaining 0.0004442 0.0001094 4.062 4.94e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.165 on 4974 degrees of freedom
## Multiple R-squared: 0.01194, Adjusted R-squared: 0.01115
## F-statistic: 15.03 on 4 and 4974 DF, p-value: 3.252e-12
Our linear model for “go for it” shows that score_differential is not a statistically significant predictor of a team’s WPA (p = 0.732). However, the signs of the coefficients make logical sense. For example, we would expect a negative relationship between the number of yards a team needs for a 1st down and WPA, because teams with farther to go are less likely to convert.
Because of this, we create a new linear model without the score_differential variable.
##
## Call:
## lm(formula = wpa ~ yardline_100 + ydstogo + game_seconds_remaining,
## data = goforit)
##
## Residuals:
## Min 1Q Median 3Q Max
## -36.184 -3.512 -0.086 4.313 44.234
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.1045708 0.2737182 -0.382 0.702
## yardline_100 0.0240404 0.0049720 4.835 1.37e-06 ***
## ydstogo -0.1096648 0.0259694 -4.223 2.46e-05 ***
## game_seconds_remaining 0.0004347 0.0001058 4.110 4.02e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.165 on 4975 degrees of freedom
## Multiple R-squared: 0.01192, Adjusted R-squared: 0.01132
## F-statistic: 20 on 3 and 4975 DF, p-value: 6.973e-13
In our new linear model, all remaining predictor variables are statistically significant, but our adjusted R^2 is very small (0.01132), indicating that only 1.13% of the variation in WPA is explained by game seconds remaining, a team’s yard line position, and how many yards a team has to go before a 1st down. With yardline as an example, each increase of about 41.6 yards in field position is associated with an expected increase of 1 WPA when going for it on fourth down.
The above plot shows that the model’s residuals vary widely. The residuals are spread over a wide range around the middle of the residual line, which indicates a problem with our model.
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 0.5249524, Df = 1, p = 0.46874
The large p-value (0.469) means that the ncvTest found no evidence of non-constant variance in the residuals of this model (i.e., our residuals do not follow a pattern).
## [1] 278 3942
The problem with this model is also seen in the qqPlot. Whereas we would hope that our residuals would fall within the bounds of the red lines, there appear to be many outliers and other residual values that fall outside of them (i.e., the residuals from our model are all over the graph).
##
## Shapiro-Wilk normality test
##
## data: linear_gfi$residuals
## W = 0.97738, p-value < 2.2e-16
As shown by the minuscule p-value within the Shapiro-Wilk test, the residuals of our model violate the normality assumption.
From the statistical tests run so far on the model created from the “goforit” data, it is evident that a linear model does not fit the data for “go for it” fourth down plays. Given that all three of our attempted linear models violate assumptions, we thought to transform the “ydstogo” variable, as this variable is the only one that we found to be significantly skewed.
##
## Call:
## lm(formula = wpa ~ yardline_100 + log(ydstogo) + score_differential +
## game_seconds_remaining, data = goforit)
##
## Residuals:
## Min 1Q Median 3Q Max
## -36.368 -3.434 -0.060 4.246 44.443
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.0512686 0.2874628 -0.178 0.858457
## yardline_100 0.0245044 0.0049742 4.926 8.65e-07 ***
## log(ydstogo) -0.5723206 0.1219946 -4.691 2.79e-06 ***
## score_differential -0.0051723 0.0085133 -0.608 0.543506
## game_seconds_remaining 0.0004163 0.0001102 3.779 0.000159 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.163 on 4974 degrees of freedom
## Multiple R-squared: 0.01276, Adjusted R-squared: 0.01196
## F-statistic: 16.07 on 4 and 4974 DF, p-value: 4.426e-13
##
## Call:
## lm(formula = wpa ~ yardline_100 + log(ydstogo) + score_differential +
## game_seconds_remaining, data = punt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -50.739 -1.414 -0.059 1.437 36.198
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.272e+00 1.048e-01 -31.234 <2e-16 ***
## yardline_100 2.403e-02 1.346e-03 17.845 <2e-16 ***
## log(ydstogo) 6.099e-01 2.387e-02 25.549 <2e-16 ***
## score_differential 2.848e-03 1.865e-03 1.527 0.127
## game_seconds_remaining 3.580e-04 1.966e-05 18.204 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.253 on 28291 degrees of freedom
## Multiple R-squared: 0.04627, Adjusted R-squared: 0.04613
## F-statistic: 343.1 on 4 and 28291 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = wpa ~ yardline_100 + log(ydstogo) + epa + game_seconds_remaining,
## data = fd_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.478 -1.612 -0.047 1.523 46.610
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.222e+00 1.170e-01 -18.99 < 2e-16 ***
## yardline_100 3.672e-02 3.571e-03 10.28 < 2e-16 ***
## log(ydstogo) 1.022e+00 4.833e-02 21.14 < 2e-16 ***
## epa 2.267e+00 2.213e-02 102.42 < 2e-16 ***
## game_seconds_remaining -9.391e-05 3.641e-05 -2.58 0.00991 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.961 on 7436 degrees of freedom
## Multiple R-squared: 0.6038, Adjusted R-squared: 0.6036
## F-statistic: 2833 on 4 and 7436 DF, p-value: < 2.2e-16
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 4.123907, Df = 1, p = 0.042281
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 49.02377, Df = 1, p = 2.5288e-12
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 490.8003, Df = 1, p = < 2.22e-16
As shown above, a logarithmic transformation of our “ydstogo” variable doesn’t have much of an impact (if any). In fact, with the transformation, the “goforit” model now fails the constant-variance test at the 5% level (p = 0.042).
Due to all three of our “action” models still violating multiple linear assumptions, we find that none of them are necessarily equipped to predict WPA. This was somewhat surprising, especially since we had already split up the linear models into the different actions. A next step in research would be to apply a highly predictive model and combine it with situational action probabilities to create a “suggested” course of action for an offensive team for any fourth down scenario. In the long run, teams would potentially try to maximize WPA for each of their play calls, and this next step would go a long way towards satisfying that. However, given that our linear models aren’t a good fit, we thought to incorporate our question (predicting WPA) into a Random Forest machine-learning framework:
## meanImp decision
## yardline_100 33.33682 Confirmed
## ydstogo 29.13630 Confirmed
## score_differential 66.52185 Confirmed
## game_seconds_remaining 30.69461 Confirmed
## result 82.84026 Confirmed
Each independent variable is confirmed as important in our modeling of WPA.
## [1] 2.528733
Our RMSE for the train model is 2.53, which means that a given prediction of WPA is typically off by about 2.53 WPA in either direction. We are very happy with these results, as this compares well with the spread in WPA in our data.
## [1] 2.55113
Our RMSE for the final model is 2.55, which means that a given prediction of WPA is typically off by about 2.55 WPA in either direction. Because this tracks closely with the train model, we are very happy with these results, which also compare well with the spread in WPA in our data.
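For illustration, a random forest fit and its held-out RMSE could be obtained along the following lines. The report does not say which package was used, so the `randomForest` package, the simulated data, the 70/30 split, and `ntree = 200` are all assumptions; the model is only fitted if that package is installed.

```r
# Root mean squared error between observed and predicted values.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

set.seed(11)
n <- 800
# Simulated fourth-down plays; values are illustrative only.
d <- data.frame(
  yardline_100       = sample(1:99, n, replace = TRUE),
  ydstogo            = sample(1:20, n, replace = TRUE),
  score_differential = sample(-21:21, n, replace = TRUE)
)
d$wpa <- ifelse(d$ydstogo <= 3, 2, -1) + rnorm(n, sd = 2.5)

train_idx <- sample(seq_len(n), size = 560)   # 70% of rows for training

# Fit and evaluate only when the (assumed) randomForest package is available.
if (requireNamespace("randomForest", quietly = TRUE)) {
  rf <- randomForest::randomForest(wpa ~ ., data = d[train_idx, ], ntree = 200)
  rmse_test <- rmse(d$wpa[-train_idx], predict(rf, newdata = d[-train_idx, ]))
  print(rmse_test)
}
```

Comparing the held-out RMSE with the training RMSE, as done above, is what indicates whether the forest generalizes.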
As mentioned before, there are many factors unaccounted for in our models. There are environmental factors, personnel factors, and in-game offensive and defensive strategy factors to consider when trying to determine the best course of action on a given fourth down play. However, our research opens a new door into the analytics-based decision making that goes into each play call in the football world. As mentioned previously, a next step could be to create a model that “suggests” the action with the highest predicted WPA given a certain offensive fourth down situation.