Predicting NFL Scores

The Emergence of Sports Gambling

If you follow professional or college sports in any capacity, you have probably noticed how popular sports gambling has become in the last few years. Fans have always gambled on sports, but only recently has it become legal in many states across the country. With the emergence of legal online sportsbooks, the activity has experienced a popularity boom. Lines and odds have gone from an unspoken reality to a publicized one. Leagues and teams have begun to endorse gambling, as it brings exposure to their sports. While it does have its downsides, sports gambling ultimately provides entertainment and interaction for lifelong and newly adopted fans alike.

Obsession with Projection

With gambling’s rise to public fame, fans and analysts have become obsessed with game lines and projections. Sports analytics as an industry has benefited from this phenomenon because gambling exposes people who otherwise wouldn’t care to the insightful analysis that goes on behind the scenes.

As a result, when certain games come around, many fans care about only one thing: “Which team is favored and what is the line?” For those who may not be familiar, the line (or point spread) of a game is essentially a prediction of the final margin: how many points one team is expected to win by. For a football example, if the Chicago Bears were favored to win a game by 4 points, their line would be -4. Conversely, if they were the underdog by 4 points, their line would be +4.
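To make the sign convention concrete, here is a minimal Python sketch of how a spread bet is typically settled: add the line to the team’s score and compare the result with the opponent’s score. The function name `spread_result` and the example scores are hypothetical, chosen only for illustration.

```python
def spread_result(line: float, points_for: int, points_against: int) -> str:
    """Settle a spread bet on one team.

    `line` follows the convention above: negative for the favorite (e.g. -4),
    positive for the underdog (e.g. +4).
    """
    adjusted = points_for + line  # apply the spread to this team's score
    if adjusted > points_against:
        return "cover"
    if adjusted < points_against:
        return "no cover"
    return "push"  # landed exactly on the number


# Hypothetical Bears games with the Bears at -4
print(spread_result(-4, 27, 20))  # won by 7: cover
print(spread_result(-4, 23, 20))  # won by 3: no cover
print(spread_result(-4, 24, 20))  # won by exactly 4: push
```

The same function handles the underdog case: at +4, a 17-20 loss still covers because 17 + 4 is greater than 20.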

This concept is fascinating to me, not because I am a gambler, but because I am curious about the analytics that go into generating this kind of prediction.

The Science Behind it All

Gambling lines and projections come from extremely detailed prediction models that account for nearly every conceivable variable. These obviously include the teams playing and their past results, but also where the game is played, what the weather is like, and which players are injured and not playing.

What I want to do in this analysis is to examine NFL game data and see if I can build a predictive model to effectively project the final score of a potential matchup. Let’s get started.

The Data

The data used in this analysis is individual game data from the 2021 regular season. Below is a data dictionary with information on each variable. For each variable, there is a separate column for the away team and the home team.

Variable	Description
away	Name of the away team
home	Name of the home team
first_downs	Number of first downs recorded
passing_yards	Number of passing yards recorded on offense
rushing_yards	Number of rushing yards recorded on offense
total_yards	Total number of offensive yards recorded
sacks	Number of sacks given up by the offense
sacks_yds_lost	Number of yards lost on sacks given up
rushing_attempts	Number of rushing plays run by the offense
fumbles	Number of fumbles lost by the offense
int	Number of interceptions thrown by the offense
turnovers	Number of turnovers committed by the offense
penalties	Number of penalties committed by the team as a whole
penalty_yds	Total number of yards lost as a result of penalties committed
drives	Number of drives the offense ran over the course of the game
def_st_td	Number of defensive or special teams touchdowns scored
possession	Amount of time the offense was in possession of the ball (in minutes)
score	Total number of points scored by the team as a whole
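As a sketch of how one row of this data might be represented in code, here is a Python dataclass for one team’s side of a game. The class name, the renaming of `int` to `interceptions`, and the example values are assumptions for illustration; the post doesn’t describe the file layout beyond the dictionary above.

```python
from dataclasses import dataclass


@dataclass
class TeamGameStats:
    """One team's side of a single game, mirroring the data dictionary.

    In the dataset every variable appears twice per game (away and home).
    `int` is renamed `interceptions` so it doesn't shadow Python's built-in.
    """
    team: str
    first_downs: int
    passing_yards: int
    rushing_yards: int
    total_yards: int
    sacks: int
    sacks_yds_lost: int
    rushing_attempts: int
    fumbles: int
    interceptions: int
    turnovers: int
    penalties: int
    penalty_yds: int
    drives: int
    def_st_td: int
    possession: float  # minutes
    score: int

    def turnovers_consistent(self) -> bool:
        """Check the usual box-score identity: turnovers = fumbles lost + interceptions."""
        return self.fumbles + self.interceptions == self.turnovers


# Hypothetical stat line, not taken from the 2021 data
row = TeamGameStats(
    team="Team A", first_downs=21, passing_yards=245, rushing_yards=110,
    total_yards=355, sacks=2, sacks_yds_lost=14, rushing_attempts=27,
    fumbles=1, interceptions=1, turnovers=2, penalties=6, penalty_yds=50,
    drives=11, def_st_td=0, possession=31.5, score=24,
)
print(row.turnovers_consistent())  # True
```

If that identity holds throughout the data, `turnovers` is an exact linear combination of `fumbles` and `int`, so a linear regression that includes all three cannot estimate a separate effect for each; regression software will typically drop or flag one of the terms.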

Summary Statistics

A few interesting notes on this past NFL season:

Home Team Average Score	Away Team Average Score
23.83824	22.125

First, home teams scored about 1.7 more points per game on average than away teams. We would expect home teams to outscore away teams, but a difference this small is a bit surprising.

Home Team Penalties	Home Penalty Yds	Away Team Penalties	Away Penalty Yds
5.783088	49.375	6.040441	52.62132

The NFL is known for some hostile stadium atmospheres. Places like Kansas City, Seattle, and Pittsburgh are renowned for crowds loud enough to confuse opposing offenses and force penalties like delay of game and false starts. However, based on league-wide averages, penalties affected home and away teams very similarly: away teams committed only about 0.26 more penalties and 3.2 more penalty yards per game than home teams.
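The comparisons in this section are simple differences between the reported averages. A short Python sketch that reproduces them from the two summary tables (the dictionary and function names are illustrative):

```python
# Per-game averages copied from the two summary tables above
HOME = {"score": 23.83824, "penalties": 5.783088, "penalty_yds": 49.375}
AWAY = {"score": 22.125, "penalties": 6.040441, "penalty_yds": 52.62132}


def home_minus_away(home: dict, away: dict) -> dict:
    """Gap for each stat; positive means the home average is higher."""
    return {stat: home[stat] - away[stat] for stat in home}


for stat, gap in home_minus_away(HOME, AWAY).items():
    print(f"{stat}: {gap:+.2f}")
```

This prints a gap of +1.71 points, -0.26 penalties, and -3.25 penalty yards per game for home teams relative to away teams.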

Most Points by a Home Team	Most Points by an Away Team
56	51

Lastly, let’s look at some high-scoring games. The most points scored by a home team during the 2021 season was 56, and the most scored by an away team was 51.

Trends from the 2021 Season

The basis of this analysis is to see if we can predict the final score of NFL games. In gambling terms, this most closely relates to the spread of the game, or how many points a team will win or lose by. However, there are plenty of other ways to bet on a game, so let’s look at some trends that might be helpful.

Moneyline

A popular sports bet is the moneyline. As we’ve learned, there is a spread for every game, where one team is favored by a certain number of points. But you can also simply bet on a team to win outright. It’s a much simpler bet and a popular choice for fans who just want a team to root for.

The graph below shows every team’s average points scored and average points allowed at home. Teams toward the bottom-right corner of the graph, which scored a lot and allowed little, would be great picks to win if you are looking for an easy bet.

If the Buccaneers, Cowboys, Packers, or Patriots are playing at home, they are a good bet to score a lot of points and win the game. In 2021, they all averaged over 30 points and gave up fewer than 22 in games that they played at home. Now let’s look at road games. Keep an eye on that bottom right corner:
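Reading the bottom-right corner of the graph amounts to a simple filter on two averages. Here is a minimal Python sketch using the home-game cutoffs quoted above; the function name and the team averages are made up for illustration and are not the 2021 numbers.

```python
def home_favorites(home_avgs, min_scored=30.0, max_allowed=22.0):
    """Return teams in the graph's bottom-right corner: high scoring, stingy.

    `home_avgs` maps team -> (avg points scored, avg points allowed).
    The default cutoffs match the ones quoted for home games.
    """
    return sorted(
        team
        for team, (scored, allowed) in home_avgs.items()
        if scored > min_scored and allowed < max_allowed
    )


# Made-up averages for illustration only (not 2021 data)
example_home_avgs = {
    "Team A": (31.2, 19.8),
    "Team B": (24.0, 18.5),
    "Team C": (33.5, 27.1),
}
print(home_favorites(example_home_avgs))  # ['Team A']
```

For road games, calling the same function with `min_scored=29` and `max_allowed=20` matches the cutoffs used below for the Bills and Cardinals.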

It’s always tougher to play on the road than at home, but these teams are good picks for such a game. The Bills and Cardinals look like the best picks in away games, as they both averaged over 29 points scored and fewer than 20 points allowed.

The Over/Under

Another popular bet is the over/under. Vegas sets a predicted number of combined points for both teams in a particular game, and bettors decide whether the actual total will be higher or lower.
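Settling an over/under only requires the combined final score. A minimal Python sketch follows; the function name and example lines are hypothetical. Half-point totals such as 44.5 are common precisely because they rule out a push (a tie with the line, where stakes are usually returned).

```python
def over_under_result(total_line: float, home_score: int, away_score: int) -> str:
    """Settle an over/under bet against the combined final score."""
    combined = home_score + away_score
    if combined > total_line:
        return "over"
    if combined < total_line:
        return "under"
    return "push"  # only possible when the line is a whole number


print(over_under_result(44.5, 24, 21))  # over (45 points)
print(over_under_result(44.5, 20, 17))  # under (37 points)
print(over_under_result(44, 24, 20))    # push (exactly 44)
```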

For overs, you are looking for games with high-powered offenses that can score a lot of points in a short amount of time. Or, if you think in terms of defense, you want games with lousy defenses that couldn’t stop a nosebleed. The graph below shows the teams whose home games averaged the most total points last season:

As you can see from the graph, the Cowboys show up again, along with a few new names. The Bengals, Chargers, Colts, Cowboys, and Eagles all averaged over 50 total points in their home games last season. If these teams are playing at home, there is a good chance the total score hits the over.

What if you like defense and want to bet the under? Let’s look at the opposite trend: the teams that average the fewest total points in their home games:

It looks like the Bears, Broncos, Browns, Giants, and Jaguars are the best picks if we are looking for an under. Their home games averaged 40 or fewer total points over the course of last season.
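Both rankings come from averaging the combined score of each team’s home games. Here is a Python sketch, assuming games are stored as (home team, home score, away score) tuples; the cutoffs of 50 and 40 follow the text, while the team names and results are invented.

```python
from statistics import mean


def avg_home_game_totals(games):
    """Average combined points in each team's home games.

    `games` is an iterable of (home_team, home_score, away_score) tuples.
    """
    totals = {}
    for home_team, home_score, away_score in games:
        totals.setdefault(home_team, []).append(home_score + away_score)
    return {team: mean(points) for team, points in totals.items()}


# Made-up results for two hypothetical teams
games = [
    ("Team A", 30, 27),
    ("Team A", 28, 24),
    ("Team B", 17, 13),
    ("Team B", 20, 16),
]
averages = avg_home_game_totals(games)
over_teams = [team for team, avg in averages.items() if avg > 50]
under_teams = [team for team, avg in averages.items() if avg <= 40]
print(over_teams, under_teams)  # ['Team A'] ['Team B']
```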

Prop Bets

A third popular option is the prop bet. This is a bet on a specific part of the game rather than the final score. For example, you can bet the over/under on a specific player or team statistic like passing or rushing yards. As a fan of high-scoring football, I am going to look at the teams with the highest averages in these categories.

Passing and Rushing

The following graph shows the average passing and rushing yards in home games last season:

Passing yards are on the x-axis and rushing yards are on the y-axis. The best passing teams are farther to the right and the best rushing teams are higher up. So the Buccaneers, Bengals, and Cowboys are the best passing teams, and the Ravens, Eagles, Colts, and Browns are the best rushing teams.

These are the results in away games:

In away games, the Buccaneers and Cowboys are both still really strong in the passing department. The Eagles, Colts, and Browns are still strong rushers, with the Titans leading the pack for away games.

This Is Not a Guarantee

Obviously, any of these scenarios depends on the context of the game and the matchup in question, but these trends give you a good idea of which teams excel in different areas.

Sentiment Analysis

I want to take a quick step away from gambling trends and predictions and talk about something that affects spreads and totals but is not included in this model: individual players. Individual players have a profound impact on the outcomes of games, and therefore on the predictions of games. If a star player like Aaron Rodgers is injured and not playing, the Packers are not going to be projected to win by as much as they would be if he were playing. One event in particular highlights individual players and will have a huge impact on next season: the draft.

The Draft

The NFL Draft plays a huge role in the development and success of NFL teams. Future stars and busts alike are drafted by every team, and there is a lot of buzz and excitement surrounding the event. What’s more, these individual players will bring value to their teams and contribute to wins or losses, something that could potentially be a variable in a prediction model. Even though I won’t be adding individual players to my model, I want to look at some conversation surrounding the draft using sentiment analysis.

Two teams that have been at the forefront of draft conversation are the New York Jets and the New York Giants. Both teams have been pretty bad for the last few years, and they are always talked about during the draft, both for good picks and bad picks.

New York Jets vs New York Giants

Both of these teams picked early in the first round of this year’s draft, and there was a lot of hype surrounding their picks. I want to do a sentiment analysis on tweets about these two teams and see how people were feeling about the respective picks.

With their first pick in the draft, the Jets selected Cincinnati cornerback Sauce Gardner at #4 overall. The Giants picked Oregon edge rusher Kayvon Thibodeaux with the very next pick. The first question I want to answer is: “Were the reactions on Twitter to the Jets’ and Giants’ first picks positive or negative?” The data for this analysis was pulled from Twitter in the minutes following each pick. Let’s look at the Jets’ pick:

Disclaimer: these graphs contain graphic language (Twitter is very vulgar)

Overall, it looks like Twitter’s reaction directly following the pick was more negative than positive. Keep in mind, these are only the tweets that tagged the Jets in the post. Now let’s look at the Giants’ pick:

For the Giants, the reaction seems to be about evenly split between positive and negative. So relative to the Jets, there was more positivity surrounding the Giants’ first-round pick of Kayvon Thibodeaux. These results are interesting. As a neutral fan watching the draft and scrolling through my own Twitter feed, I saw mostly positive reactions to these picks. This goes to show how vast the Twittersphere is and how much content there is that we don’t see.
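The post doesn’t say which sentiment method or tool produced these graphs. One common approach is lexicon-based scoring: count the positive and negative words in each tweet. The Python sketch below illustrates that idea with a tiny, made-up word list; it is a stand-in for the technique, not the analysis behind the graphs, and real work would use a published sentiment lexicon.

```python
import re

# Tiny hand-made lexicon, purely illustrative (an assumption, not the one used above)
POSITIVE = {"great", "love", "steal", "excited", "perfect"}
NEGATIVE = {"bust", "terrible", "hate", "reach", "awful"}


def tweet_sentiment(text: str) -> int:
    """Score = number of positive words minus number of negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def summarize(tweets):
    """Count positive, negative, and neutral tweets."""
    scores = [tweet_sentiment(t) for t in tweets]
    return {
        "positive": sum(s > 0 for s in scores),
        "negative": sum(s < 0 for s in scores),
        "neutral": sum(s == 0 for s in scores),
    }


# Invented example tweets
tweets = [
    "Love this pick, what a steal",
    "Total reach, this guy is a bust",
    "Pick is in at number 4",
]
print(summarize(tweets))  # {'positive': 1, 'negative': 1, 'neutral': 1}
```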

Both of these players could end up being stars who lead their teams to the Super Bowl, or they could just as easily never play an NFL game. Nobody knows what’s going to happen. However, it is still interesting to look at this sentiment data to get an idea of what NFL fans think about their favorite teams’ draft picks. Now let’s get back to modeling.

Predicting Game Scores

Now we are going to get into the nitty-gritty of the statistical analysis and try to predict the scores of potential matchups. A natural way to predict a numerical score is with a linear regression. I used the software JMP to build my model.

The Model

I entered all of the variables from the table at the top of this page into the model builder, except for away and home. These will come back later when we predict the score for specific teams. Naturally, we would expect certain stats to positively affect score, like passing and rushing yards, while some stats will negatively affect score, like turnovers.

I created one model for the home stats and one for the away stats, so in the end there were two prediction equations. It’s hard to create variables that capture the hostile stadium atmospheres I mentioned earlier, so separate home and away models were the best way I could find to account for teams playing at home as opposed to on the road. The dependent variables were the home and away final scores, respectively, and the independent variables were all of the remaining numeric variables.
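These models were fit in JMP. Purely to show what ordinary least squares does under the hood, here is a dependency-free Python sketch that solves the normal equations directly. It is not JMP’s implementation (production statistics software generally prefers more numerically stable decompositions, such as QR), and the toy data are invented.

```python
def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X: list of rows of predictor values (an intercept column is added here).
    y: list of responses, e.g. home scores.
    Returns [intercept, b1, ..., bk].
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    k = len(rows[0])
    # Augmented normal-equation matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; drop a redundant column")
        A[col], A[pivot] = A[pivot], A[col]
        for i in range(k):
            if i != col:
                factor = A[i][col] / A[col][col]
                for j in range(col, k + 1):
                    A[i][j] -= factor * A[col][j]
    return [A[i][k] / A[i][i] for i in range(k)]


# Toy data generated from y = 2 + 3*x1 - x2
X = [[1, 0], [0, 1], [1, 1], [2, 1], [3, 5]]
y = [5, 1, 4, 7, 6]
print(fit_ols(X, y))  # approximately [2.0, 3.0, -1.0]
```

In the setup described above, this would be called twice: once with the home scores as `y` and once with the away scores, each time with the remaining numeric columns for both teams as `X`.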

Both models ended up having significant variables from the home and away teams. For the Home model, there are significant stats for the home team in question, and the teams that visited them throughout the season. The significant variables in this model are:

– Home team stats: first downs, total yards, rushing attempts, turnovers, penalties, defense and special teams touchdowns, and possession time

– Away team stats: turnovers, penalty yards

The Away model has significant stats for the road team in question and the teams that they visited throughout the season. These significant variables are:

– Away team stats: rushing yards, total yards, turnovers, penalty yards, and defense and special teams touchdowns

– Home team stats: first downs, yards lost on sacks, turnovers, penalty yards, drives
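The post lists which terms were significant but not the cutoff used. JMP’s parameter estimates report gives a t ratio and a Prob>|t| for each term. As a rough, dependency-free illustration of where those numbers come from, the sketch below computes standard errors and t ratios, then approximates the two-sided p-value with the normal distribution instead of the t distribution. That approximation is only reasonable when the residual degrees of freedom are large; the 0.05 threshold and the data are assumptions.

```python
import math


def invert(M):
    """Invert a square matrix with Gauss-Jordan elimination (partial pivoting)."""
    n = len(M)
    A = [[float(v) for v in row] + [float(i == j) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for i in range(n):
            if i != col:
                f = A[i][col]
                A[i] = [a - f * b for a, b in zip(A[i], A[col])]
    return [row[n:] for row in A]


def coefficient_table(X, y):
    """Return (estimate, std error, t ratio, approx. two-sided p) for each term."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # intercept first
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    table = []
    for i in range(k):
        se = math.sqrt(sigma2 * inv[i][i])
        t = beta[i] / se
        p = math.erfc(abs(t) / math.sqrt(2))  # normal approximation to Prob>|t|
        table.append((beta[i], se, t, p))
    return table


# Invented data: y depends on x (slope 2) but not on z
x = list(range(20))
z = [i % 3 for i in range(20)]
noise = [((i * 7) % 5 - 2) * 0.1 for i in range(20)]
y = [1 + 2 * xi + e for xi, e in zip(x, noise)]
table = coefficient_table(list(zip(x, z)), y)
for name, (est, se, t, p) in zip(["intercept", "x", "z"], table):
    flag = "significant" if p < 0.05 else "not significant"
    print(f"{name}: estimate={est:.3f}, p={p:.4f} ({flag} at an assumed 0.05 level)")
```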

The Predictions

Once I had these variables and my prediction equations from JMP, I moved into Excel to create a dashboard for my final predictions. In order to calculate the score predictions, I pulled each team’s average statistics from home and away games. These are the numbers I plugged into the equations to calculate the predictions.

On the dashboard, I entered the coefficients for each variable in the prediction equations and lined them up with the team averages. I used some simple VLOOKUPs so that every time I changed the team names, the stats would update automatically and the score predictions would refresh. I used SUMPRODUCT to calculate the equations, since the statistics change with every team selected.
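The dashboard logic can be sketched in Python: dictionaries stand in for the VLOOKUP tables, and a sum of products stands in for SUMPRODUCT. Every coefficient, variable name, and team average below is hypothetical, since the actual JMP estimates aren’t given in this post; feeding the visiting team’s road averages into the away-team terms is also an assumption.

```python
# Hypothetical coefficients for a home-score model (not the real JMP estimates)
HOME_MODEL = {
    "intercept": 2.0,
    "home_first_downs": 0.8,
    "home_turnovers": -2.5,
    "away_turnovers": 1.5,
}

# Hypothetical per-team averages, split by home and road games (the "lookup tables")
HOME_GAME_AVGS = {"Team A": {"first_downs": 22, "turnovers": 1.0}}
ROAD_GAME_AVGS = {"Team B": {"first_downs": 18, "turnovers": 2.0}}


def matchup_features(home_team, away_team):
    """Like the VLOOKUPs: pull the host's home averages and the visitor's road averages."""
    features = {f"home_{k}": v for k, v in HOME_GAME_AVGS[home_team].items()}
    features.update({f"away_{k}": v for k, v in ROAD_GAME_AVGS[away_team].items()})
    return features


def predict(model, features):
    """Like SUMPRODUCT: intercept plus the sum of coefficient * team average."""
    return model["intercept"] + sum(
        coef * features[name] for name, coef in model.items() if name != "intercept"
    )


print(predict(HOME_MODEL, matchup_features("Team A", "Team B")))  # about 20.1
```

Changing the team names passed to `matchup_features` plays the role of changing the names on the dashboard; an away-score model would be evaluated the same way with its own coefficients.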

Divisional Matchup

Looking at division rivals is always fun, so let’s look at the predicted score if the Bengals played the Steelers, two fairly evenly matched teams in the AFC North.

If the Steelers came to Cincinnati to play the Bengals, this model predicts (with rounding) a 24-22 Bengals win. This would probably make the Bengals 1.5-point favorites before the game. But as we know, playing at home can produce a different result than playing on the road. If the game were played in Pittsburgh instead of Cincinnati, the prediction changes.
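Turning two predicted scores into a line takes a subtraction plus a rounding rule. The post doesn’t show the unrounded predictions behind 24-22, so the inputs below are hypothetical, and rounding to the nearest half point is an assumption about how the line would be quoted.

```python
def implied_line(team_pred: float, opp_pred: float) -> float:
    """Line for one team implied by two predicted scores.

    Negative means favored, positive means underdog (the convention described
    earlier). Rounding to the nearest half point is an assumption.
    """
    margin = team_pred - opp_pred
    return -round(margin * 2) / 2


print(implied_line(24.3, 22.8))  # -1.5 (favored by 1.5)
print(implied_line(19.6, 20.4))  # 1.0 (underdog by 1)
```

Applied to the rounded Packers-Texans prediction of 22-10, `implied_line(22, 10)` gives -12.0 for the Packers.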

As the away team, the Bengals are predicted to win by only 1 point, 20-19, instead of 2. Playing at the Steelers’ home field makes the predicted score slightly closer and leads to both teams scoring fewer points.

Uneven Matchup

A good way to see how the model works is to put a league leader against a team at the bottom of the standings. Let’s look at the predicted score if the Packers hosted the Texans.

This combination leads to a much larger spread. This model predicts the Packers to win 22-10, a 12-point difference. This is about what we would expect, since the Packers are an elite team, especially at home, and the Texans won only 4 games all of last season.

Further Analysis

The model built in this analysis is a fairly simple one as far as score predictions go. It takes hours upon hours of meticulous analysis to build a Las Vegas-level prediction model. Adding variables for weather conditions, or for individual players who are injured and not playing, could make this model more accurate.

I hope you enjoyed this analysis and learned something from it, whether you were interested in the football or the predictive modeling, or both. There are so many cool ways to combine sports and analytics, and I for one am excited to see the new ways that are discovered in the future.