The Emergence of Sports Gambling

If you follow professional or college sports in any capacity, you have probably noticed how popular sports gambling has become in the last few years. Fans have always gambled on sports, but it has recently become legal in various states across the country. With the emergence of legal online sportsbooks, the activity has boomed. Lines and odds have gone from an unspoken reality to a publicized one. Leagues and teams have begun to endorse gambling, as it brings exposure to their sports. While it does have its downsides, sports gambling ultimately provides entertainment and interaction for lifelong and newly adopted fans alike.

Obsession with Projection

With gambling’s rise to public prominence, fans and analysts have become obsessed with game lines and projections. Sports analytics as an industry has benefited from the phenomenon because it gains exposure to people who otherwise wouldn’t care about the insightful analysis that goes on behind the scenes.

As a result, when certain games come around, many fans only care about one thing: “Which team is favored and what is the line?” For those who may not be familiar, the gambling line of a game is essentially a prediction of the final margin of victory. For a football example, if the Chicago Bears were favored to win a game by 4 points, their line would be -4. Conversely, if they were the underdog by 4 points, their line would be +4.
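To make the sign convention concrete, here is a minimal Python sketch. The function names point_spread and format_line are made up for this illustration, and the projected scores are placeholders rather than real predictions.

```python
def point_spread(team_score: float, opponent_score: float) -> float:
    """Line from the team's point of view: negative if favored, positive if the underdog."""
    return opponent_score - team_score


def format_line(spread: float) -> str:
    """Render the spread with an explicit sign, the way lines are usually quoted."""
    return f"{spread:+g}"


# Placeholder projections: the Bears win by 4 in one case and lose by 4 in the other.
favored = point_spread(team_score=24, opponent_score=20)
underdog = point_spread(team_score=20, opponent_score=24)

print(format_line(favored))   # -4
print(format_line(underdog))  # +4
```

A favorite always carries the negative number, because the line is the handicap that would be applied to that team's final score.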

This concept is fascinating to me, not because I am a gambler, but because I am curious about the analytics that go into generating this kind of prediction.

The Science Behind it All

Gambling lines and projections are obtained through extremely detailed prediction models that account for nearly every conceivable variable. These obviously include the teams playing and their past results, but they also account for where the game is played, what the weather is like, and which players are injured and not playing.

What I want to do in this analysis is to examine NFL game data and see if I can build a predictive model to effectively project the final score of a potential matchup. Let’s get started.

The Data

The data used in this analysis is individual game data from the 2021 regular season. Below is a data dictionary with information on each variable. For each variable, there is a separate column for the away team and the home team.

Variable          Description
away              Name of the away team
home              Name of the home team
first_downs       Number of first downs recorded
passing_yards     Number of passing yards recorded on offense
rushing_yards     Number of rushing yards recorded on offense
total_yards       Total number of offensive yards recorded
sacks             Number of sacks given up by the offense
sacks_yds_lost    Number of yards lost on sacks given up
rushing_attempts  Number of rushing plays run by the offense
fumbles           Number of fumbles lost by the offense
int               Number of interceptions thrown by the offense
turnovers         Number of turnovers committed by the offense
penalties         Number of penalties committed by the team as a whole
penalty_yds       Total number of yards lost as a result of penalties committed
drives            Number of drives the offense ran over the course of the game
def_st_td         Number of defensive or special teams touchdowns scored
possession        Amount of time the offense was in possession of the ball (in minutes)
score             Total number of points scored by the team as a whole

Summary Statistics

A few interesting notes on this past NFL season:

Home Team Average Score    Away Team Average Score
23.83824                   22.125

First, home teams scored about 1.7 more points per game on average than away teams (23.84 versus 22.13). We would expect home teams to outscore away teams, but a difference this small is a bit surprising.

Home Team Penalties    Home Penalty Yds    Away Team Penalties    Away Penalty Yds
5.783088               49.375              6.040441               52.62132

The NFL is known for some hostile stadium atmospheres. Places like Kansas City, Seattle, and Pittsburgh are renowned for loud crowds that have a profound impact on the game by confusing the offense and forcing penalties like delay of game and false starts. However, based on the league-wide averages, penalties affected home and away teams very similarly: away teams averaged only about 0.26 more penalties and about 3.2 more penalty yards per game than home teams.

Most Points by a Home Team    Most Points by an Away Team
56                            51

Lastly, let’s look at some high-scoring games. The most points scored by a home team during the 2021 season was 56, and the most scored by an away team was 51.
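For anyone who wants to reproduce summaries like these in code, here is a rough Python sketch using only the standard library. It assumes each game is one record with separate home and away columns. The column names (score_home, penalties_away, and so on) and the three sample rows are invented for illustration and are not taken from the real 2021 data.

```python
from statistics import mean

# Made-up sample rows: one record per game, with hypothetical column names.
games = [
    {"score_home": 27, "score_away": 20, "penalties_home": 5, "penalties_away": 7},
    {"score_home": 17, "score_away": 24, "penalties_home": 6, "penalties_away": 4},
    {"score_home": 31, "score_away": 10, "penalties_home": 8, "penalties_away": 6},
]


def summarize(games):
    """Compute the same kinds of summaries shown above: averages and maximums by side."""
    summary = {}
    for side in ("home", "away"):
        scores = [g[f"score_{side}"] for g in games]
        penalties = [g[f"penalties_{side}"] for g in games]
        summary[f"avg_score_{side}"] = mean(scores)
        summary[f"max_score_{side}"] = max(scores)
        summary[f"avg_penalties_{side}"] = mean(penalties)
    return summary


summary = summarize(games)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With the full season loaded in the same shape, the same function would produce the league-wide numbers.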

Sentiment Analysis

I want to take a quick step away from gambling trends and predictions and talk about something that affects spreads and totals but is not included in this model: individual players. Individual players have a profound impact on the outcomes of games, and consequently on the predictions for those games. If a star player like Aaron Rodgers is injured and not playing, the Packers are not going to be projected to win by as much as they would be if he were playing. There is a particular event that highlights individual players and will have a huge impact on next season: the draft.

The Draft

The NFL Draft plays a huge role in the development and success of NFL teams. Future stars and busts alike are drafted by every team, and there is a lot of buzz and excitement surrounding the event. What’s more, these individual players will bring value to their teams and contribute to wins or losses, something that could potentially be a variable in a prediction model. Even though I won’t be adding individual players to my model, I want to look at some conversation surrounding the draft using sentiment analysis.

Two teams that have been at the forefront of draft conversation are the New York Jets and the New York Giants. Both teams have been pretty bad for the last few years, and they are always talked about during the draft, both for good picks and bad picks.

New York Jets vs New York Giants

Both of these teams picked early in the first round of this year’s draft, and there was a lot of hype surrounding their picks. I want to do a sentiment analysis on tweets about these two teams and see how people were feeling about the respective picks.

With their first pick in the draft, the Jets selected Cincinnati cornerback Sauce Gardner at #4 overall. The Giants picked Oregon edge rusher Kayvon Thibodeaux with the very next pick. The first question I want to answer is “Were the reactions on Twitter surrounding the Jets’ or Giants’ first pick positive or negative?” The data used for this analysis was pulled from Twitter in the minutes following each pick. Let’s look at the Jets’ pick:

Disclaimer: these graphs contain graphic language (Twitter is very vulgar)

Overall, it looks like Twitter’s reaction directly following the pick was more negative than positive. Keep in mind, these are only the tweets that tagged the Jets in the post. Now let’s look at the Giants’ pick:

For the Giants, the reaction seems to be about even between positive and negative. So relative to the Jets, there is somewhat more positivity surrounding the Giants’ first-round pick of Kayvon Thibodeaux. These results are interesting. As a neutral fan watching the draft and scrolling through my own Twitter feed, the reactions to these picks seemed mostly positive. This goes to show how vast the Twittersphere is and how much content there is that we don’t see.
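For readers curious how a positive-versus-negative tally like this can be produced, below is a very small Python sketch of lexicon-based sentiment scoring. It is only one simple approach and not necessarily the method behind the graphs above. The word lists, the function names, and the sample tweets are all invented for illustration.

```python
import re
from collections import Counter

# Tiny hypothetical lexicons; real sentiment lexicons contain thousands of words.
POSITIVE = {"great", "love", "steal", "excited", "best"}
NEGATIVE = {"bad", "hate", "reach", "bust", "worst"}


def tweet_sentiment(text: str) -> str:
    """Label a tweet by counting positive words minus negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def sentiment_counts(tweets):
    """Tally how many tweets fall into each sentiment label."""
    return Counter(tweet_sentiment(t) for t in tweets)


# Made-up tweets, not pulled from Twitter.
sample = [
    "Love this pick, great value",
    "What a reach, worst pick of the round",
    "He was the best corner available",
    "We picked at number four",
]
counts = sentiment_counts(sample)
print(counts["positive"], counts["negative"], counts["neutral"])  # 2 1 1
```

A word-count approach like this misses sarcasm and context, which is worth keeping in mind when reading any sentiment chart built from tweets.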

Both of these players could end up being stars who lead their teams to the Super Bowl, or they could just as likely never play an NFL game. Nobody knows what’s going to happen. However, it is still interesting to look at this sentiment data to give us an idea of what NFL fans are thinking about their favorite teams’ draft picks. Now let’s get back to modeling.

Predicting Game Scores

Now we are going to get into the nitty-gritty of the statistical analysis and try to predict the scores of potential matchups. A natural way to predict a numerical score is to build a linear regression, so that is the approach I took. I used the software JMP to build my model.

The Model

I entered all of the variables from the table at the top of this page into the model builder, except for away and home. These will come back later when we predict the score for specific teams. Naturally, we would expect certain stats to positively affect score, like passing and rushing yards, while some stats will negatively affect score, like turnovers.

I created one model for the home stats and one for the away stats, so in the end there were two prediction equations. It’s hard to include variables for the tough stadium atmospheres I mentioned earlier, but this was the best way I could account for teams playing at home as opposed to on the road. The dependent variables were the respective final scores for home and away, and the independent variables were all of the remaining numeric variables.
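The models here were built in JMP, but the underlying idea of a linear regression can be sketched in a few lines of Python. The example below is an illustration under stated assumptions, not a reproduction of the JMP output: it fits ordinary least squares by solving the normal equations, uses only two predictors, and runs on synthetic rows generated from a made-up equation (score = 3 + 0.06 × total yards − 2 × turnovers) so that the recovered coefficients can be checked.

```python
def fit_ols(rows, targets):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    Solves the normal equations (X'X) b = X'y with Gauss-Jordan elimination.
    Returns [b0, b1, ..., bk], where b0 is the intercept.
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]  # prepend intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry into the pivot row.
        pivot_row = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot_row] = A[pivot_row], A[col]
        pivot = A[col][col]
        if abs(pivot) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit")
        A[col] = [v / pivot for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Synthetic rows of (total_yards, turnovers); not real NFL data.
rows = [(300, 1), (400, 0), (350, 2), (250, 3), (420, 1)]
# Scores generated from the made-up equation, with no noise added.
scores = [3 + 0.06 * yards - 2 * turnovers for yards, turnovers in rows]

intercept, yards_coef, turnover_coef = fit_ols(rows, scores)
print(round(intercept, 3), round(yards_coef, 3), round(turnover_coef, 3))
```

Fitting one such equation with the home score as the target and another with the away score as the target gives the two-equation setup described above. Real game data is noisy, so the fit would not be exact the way it is on these synthetic rows.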

Both models ended up having significant variables from the home and away teams. For the Home model, there are significant stats for the home team in question, and the teams that visited them throughout the season. The significant variables in this model are:

– Home team stats: first downs, total yards, rushing attempts, turnovers, penalties, defense and special teams touchdowns, and possession time

– Away team stats: turnovers, penalty yards

The Away model has significant stats for the road team in question and the teams that they visited throughout the season. These significant variables are:

– Away team stats: rushing yards, total yards, turnovers, penalty yards, and defense and special teams touchdowns

– Home team stats: first downs, yards lost on sacks, turnovers, penalty yards, drives

The Predictions

Once I had these variables and my prediction equations from JMP, I moved into Excel to create a dashboard for my final predictions. In order to calculate the score predictions, I pulled each team’s average statistics from home and away games. These are the numbers I plugged into the equations to calculate the predictions.

On the dashboard, I entered the coefficients for each variable in the prediction equations and lined them up with the team averages. I used some simple VLOOKUPs so that every time I changed the team names, the stats would update automatically and the score predictions would refresh. I used SUMPRODUCT to evaluate each equation, multiplying every coefficient by the matching team average and adding up the results.
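The lookup-and-multiply logic of the dashboard translates fairly directly to code. Here is a rough Python sketch of the same idea, not the actual spreadsheet: a dictionary lookup plays the role of VLOOKUP, and a sum of coefficient-times-average products plays the role of SUMPRODUCT. The team names, averages, coefficients, and variable names are all placeholders invented for the example.

```python
# Hypothetical coefficients for a home-score equation (not the fitted JMP values).
HOME_MODEL = {
    "intercept": 5.0,
    "home_total_yards": 0.05,
    "home_turnovers": -1.5,
    "away_turnovers": 2.0,
}

# Hypothetical per-team averages, split by home games and away games.
HOME_AVERAGES = {"Team A": {"total_yards": 360.0, "turnovers": 1.0}}
AWAY_AVERAGES = {"Team B": {"total_yards": 320.0, "turnovers": 1.5}}


def predict_home_score(home_team: str, away_team: str) -> float:
    """Look up both teams' averages and evaluate the home-score equation."""
    # The dictionary lookups stand in for VLOOKUP.
    features = {
        **{f"home_{name}": value for name, value in HOME_AVERAGES[home_team].items()},
        **{f"away_{name}": value for name, value in AWAY_AVERAGES[away_team].items()},
    }
    # The sum of products stands in for SUMPRODUCT.
    weighted = sum(
        coef * features[name]
        for name, coef in HOME_MODEL.items()
        if name != "intercept"
    )
    return HOME_MODEL["intercept"] + weighted


prediction = predict_home_score("Team A", "Team B")
print(f"Predicted home score: {prediction:.1f}")
```

A second function built the same way around the away-score coefficients would give the other half of the predicted final score.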

Divisional Matchup

Looking at division rivals is always fun, so let’s look at the predicted score if the Bengals were playing the Steelers, two fairly evenly matched teams in the AFC North division.

If the Steelers came to Cincinnati to play the Bengals, this model predicts (after rounding) that the Bengals would win 24-22. That would probably make the Bengals roughly 1.5-point favorites before the game. But as we know, playing at home can lead to a different result than playing on the road. If the game were played in Pittsburgh instead of Cincinnati, the prediction changes.

As the away team, the Bengals are predicted to win by only one point, 20-19, instead of two. Playing at the Steelers’ home field makes the predicted score slightly closer and leads to fewer total points for both teams.

Uneven Matchup

A good way to see how the model works is to put a league leader against a team at the bottom of the standings. Let’s look at the predicted score if the Packers played at home against the Texans.

This combination leads to a much larger spread. The model predicts the Packers to win 22-10, a 12-point difference. This is about what we would expect, since the Packers are an elite team, especially at their home stadium, and the Texans won only 4 games all of last year.

Further Analysis

The model built in this analysis is a fairly simple one as far as score predictions go. It takes hours upon hours of meticulous analysis to build a Las Vegas-level prediction model. Some things that could be added to this model to make it more accurate would be variables regarding weather conditions or individual players that may be injured and not playing.

I hope you enjoyed this analysis and learned something from it, whether you were interested in the football or the predictive modeling, or both. There are so many cool ways to combine sports and analytics, and I for one am excited to see the new ways that are discovered in the future.