The purpose of this project was to use a random forest model to project each NFL team’s win total before the season begins. Using five input variables, the model is trained on NFL team data from the 2009 through 2021 seasons. Historical preseason win totals from sportsbooks, used as a comparison, had a root mean squared error (RMSE) of 2.6 wins between a team’s predicted wins and actual wins. Running the model on the test data held out from the training data resulted in an RMSE of 2.7 wins.
The data used to create the model come from Pro Football Focus (PFF), NFLverse, and Sports Odds History. The raw data from PFF requires a subscription and cannot be shared. A big thank you to the NFLverse Discord for help with the code and model problems I experienced.
The following independent variables were used as inputs to train and test the model:
A team’s win-loss record is not always the best representation of its performance on the field. Ultimately it does decide who moves on to the playoffs, but wins in one season can only tell us so much about how many games the team will win in the next season. Since 2008, a team’s winning percentage in season N has had a correlation coefficient of 0.38 with its winning percentage in season N + 1, but this can be improved upon.
A common metric used to measure each NFL team’s performance is Expected Points Added (EPA). By combining a team’s EPA/play on offense and defense, the team’s total performance can be described by one metric, although it is not adjusted for strength of schedule. Similar to Ben Baldwin’s team tiers, offensive EPA is weighted 1.5 times as heavily as defensive EPA because it is more stable and predictive. In the data frame constructed using NFLverse play-by-play data, EPA/play is standardized within each NFL season, since the league-wide level of EPA can change from season to season. Thus the final metric, Relative Performance, shows how well a team performed compared to league average in a given season. A league-average team would have a Relative Performance value of zero, a team one standard deviation above average would have a value of 1, a team one standard deviation below average would have a value of -1, and so on. The plot below shows that a team’s Relative Performance is a strong indicator of its success. Using 2020 and 2021 as examples, only four teams finished inside the top 14 in Relative Performance but failed to make the playoffs. Three of these teams, the 2020 Dolphins, 2021 Chargers, and 2021 Colts, were eliminated from playoff contention in the last week of the regular season.
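The Relative Performance calculation described above can be sketched in base R. This is only an illustration under stated assumptions, not the project's actual code: the data frame `team_epa`, its column names, and its values are hypothetical; defensive EPA/play is assumed to be EPA allowed (lower is better, so it is subtracted); and the combined score is assumed to be re-standardized within each season so that league average is zero and one unit equals one standard deviation.

```r
# Hypothetical team-season EPA/play values (not real data).
team_epa <- data.frame(
  season  = rep(c(2020, 2021), each = 4),
  team    = rep(c("AAA", "BBB", "CCC", "DDD"), times = 2),
  off_epa = c(0.15, 0.05, -0.02, -0.10, 0.12, 0.00, 0.03, -0.08),
  def_epa = c(-0.05, 0.02, 0.06, 0.08, -0.03, 0.04, -0.01, 0.07)  # EPA/play allowed
)

# z-score helper: (x - mean) / standard deviation
zscore <- function(x) (x - mean(x)) / sd(x)

# Standardize offense and defense within each season.
team_epa$z_off <- ave(team_epa$off_epa, team_epa$season, FUN = zscore)
team_epa$z_def <- ave(team_epa$def_epa, team_epa$season, FUN = zscore)

# Offense weighted 1.5x; defense subtracted because allowing less EPA is better (assumption).
raw_score <- 1.5 * team_epa$z_off - team_epa$z_def

# Re-standardize within season so 0 is league average and 1 is one SD above it.
team_epa$rel_perf <- ave(raw_score, team_epa$season, FUN = zscore)

team_epa[, c("season", "team", "rel_perf")]
```

Standardizing by season, rather than across the whole data set, keeps a high-scoring year from making every team in that year look above average.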
The plots below show that, unsurprisingly, since 2009 a team’s Relative Performance has had a strong correlation coefficient of 0.87 with its win percentage in the same season. More importantly, a team’s Relative Performance had a correlation coefficient of 0.42 with its win percentage the following season, roughly 10% higher than the 0.38 obtained from winning percentage alone.
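A year-over-year correlation like the ones quoted above can be computed by pairing each team-season with the same team's previous season. The sketch below uses base R and a hypothetical `records` data frame with made-up values; swapping the prior-season winning percentage for prior-season Relative Performance would follow the same pattern.

```r
# Hypothetical team-season winning percentages (not real data).
records <- data.frame(
  team   = rep(c("AAA", "BBB", "CCC"), each = 3),
  season = rep(2019:2021, times = 3),
  wp     = c(0.750, 0.625, 0.688, 0.250, 0.375, 0.312, 0.500, 0.438, 0.562)
)

# Shift each record forward one season so it lines up with the following year.
prior <- data.frame(
  team    = records$team,
  season  = records$season + 1,
  last_wp = records$wp
)

# Inner join: keeps only team-seasons that have a prior season available.
paired <- merge(records, prior, by = c("team", "season"))

# Pearson correlation between season N (last_wp) and season N + 1 (wp).
r <- cor(paired$last_wp, paired$wp)
r
```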
The quarterback position is the most important position on a football team. Knowing this, variation in QB play from one season to the next can have a major impact on a team’s performance. The purpose of this independent variable is to capture the overall season-long QB play for a team in a given season. This metric is weighted by number of dropbacks. For example, if player A and player B both had 250 dropbacks on the season, the Team QB Grade would be the simple average of their two PFF grades. If player A had 300 dropbacks and player B had 200, the Team QB Grade would be closer to player A’s PFF grade. For teams that started a rookie QB in week 1, the average rookie QB PFF grade, 69.29, was used.
Strength of schedule (SOS) is the combined winning percentage of all of a team’s opponents in a given season. The purpose of this independent variable is to help determine whether a team overperformed its true ability because it played bad opponents, or underperformed its true ability because it played good opponents.
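The dropback weighting described above is a weighted mean, which base R provides as `weighted.mean()`. The grades and dropback counts below are hypothetical and are chosen only to mirror the player A and player B example.

```r
# Hypothetical QBs on one team (not real PFF grades).
qbs <- data.frame(
  player    = c("Player A", "Player B"),
  pff_grade = c(80, 60),
  dropbacks = c(300, 200)
)

# Dropback-weighted mean: sum(grade * dropbacks) / sum(dropbacks)
team_qb_grade <- weighted.mean(qbs$pff_grade, w = qbs$dropbacks)
team_qb_grade  # 72, closer to Player A's 80 than the simple average of 70

# With equal dropbacks the weighted mean reduces to the simple average.
equal_split <- weighted.mean(c(80, 60), w = c(250, 250))
equal_split    # 70
```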
Obviously, it is much easier to win games when facing easier opponents. By using sportsbooks’ preseason win totals, a team’s expected SOS can be calculated. While it is very difficult to predict win totals for every team, historically a team’s expected SOS has been moderately correlated with its actual SOS, with a correlation coefficient of 0.42, as illustrated by the plot below. The purpose of this independent variable is to estimate the difficulty of a team’s schedule in a given season.
The purpose of this independent variable is to help predict which teams will take a step forward or a step backward with a new quarterback. Recent examples of this are the 2020 Tampa Bay Buccaneers (Tom Brady) and the 2021 Los Angeles Rams (Matthew Stafford), both of which won the Super Bowl in large part due to their upgrade at the quarterback position.
After joining and cleaning the data, the data set used to create the model looks like this:
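One plausible way to turn preseason win totals into an expected SOS is sketched below. The exact method used in the project is not stated, so this is an assumption: each opponent's win total is divided by the number of games to get an expected winning percentage, and those values are averaged across the schedule. The win totals and the schedule are hypothetical.

```r
# Hypothetical preseason win totals (not real sportsbook lines).
win_totals <- c(AAA = 10.5, BBB = 8.5, CCC = 6.5, DDD = 9.0)
games_in_season <- 17

# Hypothetical schedule for team AAA; opponents faced twice appear twice.
aaa_opponents <- c("BBB", "BBB", "CCC", "CCC", "DDD")

# Expected winning percentage of each opponent on the schedule.
opp_exp_wp <- win_totals[aaa_opponents] / games_in_season

# Expected SOS: average expected winning percentage across all scheduled games.
exp_sos <- mean(opp_exp_wp)
exp_sos
```

Listing an opponent once per scheduled game weights repeated opponents correctly without any extra bookkeeping.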
## team season last_rel_perf last_sos last_grade exp_sos last_QBgrade wp
## 2 ARI 2009 0.2325 0.486 70.28 0.490 72.1 0.625
## 3 ARI 2010 0.1443 0.445 72.32 0.482 49.4 0.312
## 4 ARI 2011 -1.9479 0.465 48.56 0.475 54.0 0.500
## 5 ARI 2012 -0.5157 0.469 46.82 0.539 35.8 0.312
## 6 ARI 2013 -1.8758 0.559 47.29 0.543 65.7 0.625
The model uses the columns last_rel_perf, last_sos, last_grade, exp_sos, and last_QBgrade to predict a team’s winning percentage, shown in column wp. When viewing the model results, winning percentage is converted to wins by multiplying it by games played (17 for 2021 and 16 for all other seasons).
After splitting the data into training, validation, and testing sets, and running through a handful of tweaks and iterations of the model, the code below fits a random forest model and plots the importance of each independent variable.
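The conversion from winning percentage to wins can be sketched as follows. The helper name `games_played` is an assumption introduced for illustration; the first two rows reuse winning percentages from the data preview above, and the 2021 row is hypothetical.

```r
# Games per team: 17 in 2021, 16 in every other season covered by the data.
games_played <- function(season) ifelse(season == 2021, 17, 16)

results <- data.frame(
  team   = c("ARI", "ARI", "XXX"),  # "XXX" is a hypothetical team
  season = c(2010, 2013, 2021),
  wp     = c(0.312, 0.625, 0.500)
)

# Wins = winning percentage * games played that season.
results$wins <- results$wp * games_played(results$season)
results
```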
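# Model: random forest regression of winning percentage on the five inputs
library(randomForest)
rf1 <- randomForest(wp ~ last_rel_perf + last_sos +
                      exp_sos + last_grade + last_QBgrade,
                    data = train,
                    mtry = 4, ntree = 1000)
# Plot the importance of each independent variable
varImpPlot(rf1)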
As expected, the prior season’s Relative Performance was the most important factor. Good teams tend to stay good. Bad teams tend to stay bad. The other four variables were all of similar importance, with the week 1 starting QB’s PFF grade from the prior season slightly the most important of the group.
Using the model to predict the winning percentage for the teams in the test data set results in the plot below. The root mean squared error (RMSE) on the test data set is 2.7 wins. The Sports Odds History historical data, which used a variety of sportsbooks through the years, had an RMSE of 2.6 wins. Considering that only five variables were used in a very basic model, the results are promising. Looking into where the model made mistakes, as well as adding more independent variables, should bring better results.
The 2021 season will be used to evaluate the results and look at which teams the model missed on. The “NFL Teams’ Relative Performance” plot shown previously will help identify some differences between teams in 2020 and 2021. The three teams the model most underrated and the three it most overrated are shown below. For some of the teams there were clear reasons why the model missed on them, while for others there were not.
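For reference, RMSE in wins can be computed with a few lines of base R. The numbers below are hypothetical and are not the project's test results; in the project, the predicted winning percentages would presumably come from something like `predict(rf1, newdata = test)`, where `test` is an assumed name for the held-out data frame.

```r
# Root mean squared error between predictions and outcomes.
rmse <- function(predicted, actual) sqrt(mean((predicted - actual)^2))

# Hypothetical predicted winning percentages and actual wins (16-game seasons).
predicted_wp <- c(0.594, 0.375, 0.700, 0.488)
actual_wins  <- c(11, 4, 12, 7)
games        <- 16

# Convert winning percentage to wins before scoring, so the error is in wins.
predicted_wins <- predicted_wp * games

rmse(predicted_wins, actual_wins)
```

Because errors are squared before averaging, one large miss raises RMSE more than several small ones.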
Predicted Wins | Actual Wins | Difference (Predicted - Actual)
---|---|---
6.39 | 12 | -5.61
5.95 | 10 | -4.05
8.35 | 12 | -3.65
8.84 | 7 | 1.84
10.06 | 8 | 2.06
7.69 | 3 | 4.69
Looking ahead, there are a few tweaks to the model that may improve its results. First, adding a garbage-time filter when calculating the previous season’s Relative Performance may help identify a team’s true talent better. Blowouts often end in meaningless fourth quarters that can distort a team’s season-long EPA/play. Second, a better metric for predicting QB play could be used rather than just the QB’s previous PFF grade or the rookie average. This could take into account years of experience, expected O-line play, expected WR play, or draft pick for rookies. Third, other position groups’ PFF grades could be used as independent variables. Some examples were just mentioned, but others would be the team’s expected pass rush and coverage grades. Lastly, rather than trying to predict each win total directly, a model could be used to predict a team’s true ability. That metric could then be combined with the season’s schedule to simulate every game and arrive at a predicted win total for each team.
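The last idea, simulating every game from a true-ability metric, could look roughly like the sketch below. Everything here is an assumption made for illustration: the `ability` ratings, the tiny round-robin schedule, and the logistic link from the ability gap to a win probability (with home-field advantage ignored) are all hypothetical and are not part of the current model.

```r
set.seed(1)

# Hypothetical "true ability" ratings (not model output).
ability <- c(AAA = 1.0, BBB = 0.3, CCC = -0.4, DDD = -0.9)

# Hypothetical schedule: every pair of teams meets twice, once at each site.
pairs <- t(combn(names(ability), 2))
schedule <- data.frame(
  home = c(pairs[, 1], pairs[, 2]),
  away = c(pairs[, 2], pairs[, 1]),
  stringsAsFactors = FALSE
)

# Assumed link: home win probability is a logistic function of the ability gap.
p_home <- plogis(ability[schedule$home] - ability[schedule$away])

n_sims <- 5000
wins <- matrix(0, nrow = n_sims, ncol = length(ability),
               dimnames = list(NULL, names(ability)))

for (i in seq_len(n_sims)) {
  # Draw a winner for every game on the schedule.
  home_won <- runif(nrow(schedule)) < p_home
  winners  <- ifelse(home_won, schedule$home, schedule$away)
  # Count wins per team, keeping teams with zero wins in the tally.
  wins[i, ] <- as.numeric(table(factor(winners, levels = names(ability))))
}

# Projected win total: average wins across all simulated seasons.
projected_wins <- colMeans(wins)
projected_wins
```

A simulation of this kind also yields a full distribution of win totals per team, not just a single projected number.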