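import os

# Machine-specific: tell Qt where the Anaconda platform plugins live so a Qt-based
# matplotlib backend can start. Change or remove this path on other machines.
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = 'D:/Anaconda3/Library/plugins/platforms'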

Introduction

This report provides insight into National Football League (NFL) game results, statistics, and trends from 2010 to 2021. The NFL is undoubtedly one of the most popular sports in the United States, with over 35% of the population claiming it as their favorite sport to watch. In 2022, over 99 million people tuned in to watch the Super Bowl, 15 million shy of the most-watched Super Bowl, in 2015, which drew an astounding 114 million viewers. The NFL’s growth shows no sign of slowing; the league is now turning to the global market, playing four games in Europe this year in hopes of attracting new audiences. The objective of this report is to summarize NFL game and season results and to highlight relationships between variables affecting game results. NFL fans can use this report to see team rankings and results over the past decade, and sports bettors can use it to identify game and team trends that support their picks.


import warnings

import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Suppress all warnings (e.g. pandas SettingWithCopyWarning) to keep the output clean.
warnings.filterwarnings("ignore")


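# Local copies of the Kaggle files (paths are specific to the author's machine).
scores_path = r"U:\Data Visualization\NFL Scores\NFL Scores.csv"
teams_path = r"U:\Data Visualization\NFL Scores\nfl_teams.csv"

df = pd.read_csv(scores_path)
df_teams = pd.read_csv(teams_path)
df10 = pd.read_csv(scores_path)  # separate, unfiltered copy used for the division heatmap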

# Keep only games with a schedule date and a posted over/under line.
df = df[df['schedule_date'].notna()]
df = df[df['over_under_line'].notna()]

# Number of games per season at each point spread, restricted to 2010 onward.
x = df.groupby(['schedule_season', 'spread_favorite'])['schedule_season'].count().reset_index(name='count')
x = x.astype({'schedule_season': 'int'})
x3 = x[x['schedule_season'] >= 2010].reset_index(drop=True)

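# Bucket each spread (negative = points given by the favorite) into a range label.
# Labels match the conditions; assuming half-point spreads, the ranges leave no gaps.
result = []
for value in x3["spread_favorite"]:
    if value == 0:
        result.append("0")
    elif -4.5 <= value <= -0.5:
        result.append("0.5-4.5")
    elif -9.5 <= value <= -5:
        result.append("5-9.5")
    elif -14 <= value <= -10:
        result.append("10-14")
    elif value <= -14.5:
        result.append("14.5+")
    else:
        result.append("Other")  # unexpected value; keeps the list aligned with x3

x3["Result"] = result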
# Games from 2010 onward with total points and the over/under result.
total_game_points = df.loc[df['schedule_season'] >= 2010].copy()

total_game_points.drop(['schedule_date', 'schedule_week', 'schedule_playoff', 'stadium', 'stadium_neutral',
                        'weather_humidity', 'weather_detail'], axis=1, inplace=True)

total_game_points['total_points'] = total_game_points['score_home'] + total_game_points['score_away']

total_game_points = total_game_points.reset_index(drop=True)
total_game_points = total_game_points.astype({'over_under_line': 'float'})

# 'Yes' = total points went over the line, 'No' = under, 'Push' = exactly on the line.
total_game_points.loc[total_game_points['total_points'] > total_game_points['over_under_line'], 'total_points_over'] = 'Yes'
total_game_points.loc[total_game_points['total_points'] < total_game_points['over_under_line'], 'total_points_over'] = 'No'
total_game_points.loc[total_game_points['total_points'] == total_game_points['over_under_line'], 'total_points_over'] = 'Push'

# Label each game's winner ('Tie' when the scores are level). winner is an alias of
# total_game_points, so these columns are added to total_game_points as well.
winner = total_game_points

winner.loc[winner['score_home'] > winner['score_away'], 'game_winner'] = winner['team_home']
winner.loc[winner['score_home'] < winner['score_away'], 'game_winner'] = winner['team_away']
winner.loc[winner['score_home'] == winner['score_away'], 'game_winner'] = 'Tie'

# Map relocated or renamed franchises to their current names.
winner = winner.replace('St. Louis Rams', 'Los Angeles Rams')
winner = winner.replace('Washington Redskins', 'Washington Football Team')
winner = winner.replace('Oakland Raiders', 'Las Vegas Raiders')
winner = winner.replace('San Diego Chargers', 'Los Angeles Chargers')

# Same season filter and over/under labels, keeping every column for the analyses below.
main_data = df.loc[df['schedule_season'] >= 2010].copy()
main_data['total_points'] = main_data['score_home'] + main_data['score_away']
main_data = main_data.astype({'over_under_line': 'float'})
main_data.loc[main_data['total_points'] > main_data['over_under_line'], 'total_points_over'] = 'Yes'
main_data.loc[main_data['total_points'] < main_data['over_under_line'], 'total_points_over'] = 'No'
main_data.loc[main_data['total_points'] == main_data['over_under_line'], 'total_points_over'] = 'Push'

# Attach the favored team's metadata from nfl_teams.csv; games whose favorite has
# no match in df_teams are dropped by the inner join.
main_data2 = pd.merge(main_data, df_teams, on='team_favorite_id', how='inner')

# Home/away outcome for each game.
main_data2.loc[main_data2['score_home'] > main_data2['score_away'], 'home_team_win'] = 'Home Team Wins'
main_data2.loc[main_data2['score_home'] < main_data2['score_away'], 'home_team_win'] = 'Away Team Wins'
main_data2.loc[main_data2['score_home'] == main_data2['score_away'], 'home_team_win'] = 'Tie'

# Winning team for each game ('Tie' when the scores are level).
main_data2.loc[main_data2['score_home'] > main_data2['score_away'], 'game_winner'] = main_data2['team_home']
main_data2.loc[main_data2['score_home'] < main_data2['score_away'], 'game_winner'] = main_data2['team_away']
main_data2.loc[main_data2['score_home'] == main_data2['score_away'], 'game_winner'] = 'Tie'

# Normalize playoff round labels ('SuperBowl' -> 'Superbowl', 'WildCard' -> 'Wildcard')
# and store seasons as integers.
main_data2 = main_data2.replace('SuperBowl', 'Superbowl')
main_data2 = main_data2.replace('WildCard', 'Wildcard')
main_data2 = main_data2.astype({'schedule_season': 'int'})

Dataset

The dataset comprises data points from over 3,100 NFL games played between 2010 and 2021. It was sourced from a public data archive on Kaggle.com. For each game it records the season, schedule week, home and away teams, home and away scores, point spread, favored team, over/under line, and weather conditions. The dataset was last updated in early 2022 to incorporate all games from the 2021 season.
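The preparation code at the top of the notebook keeps seasons from 2010 onward, drops games without an over/under line, and labels each game's total as over ('Yes'), under ('No') or a push. A minimal sketch of those steps on a few invented rows follows; the column names mirror the Kaggle file, the values are made up, and `np.select` is swapped in for the notebook's three `.loc` assignments as one possible vectorized alternative.

```python
import numpy as np
import pandas as pd

# Invented rows; column names follow the Kaggle scores file.
games = pd.DataFrame({
    "schedule_season": [2009, 2010, 2015, 2021, 2021],
    "score_home": [24, 27, 20, 17, 30],
    "score_away": [10, 20, 24, 24, 14],
    "over_under_line": [41.0, 47.0, 41.5, None, 47.5],
})

# Seasons 2010 onward (same effect as ~isin(range(2010)) for integer seasons).
games = games[games["schedule_season"] >= 2010]

# Drop games without a line: comparisons with NaN are always False, so they
# would otherwise fall through to the 'Push' default below.
games = games[games["over_under_line"].notna()].copy()

games["total_points"] = games["score_home"] + games["score_away"]

# 'Yes' = over the line, 'No' = under, 'Push' = exactly on it.
games["total_points_over"] = np.select(
    [games["total_points"] > games["over_under_line"],
     games["total_points"] < games["over_under_line"]],
    ["Yes", "No"],
    default="Push",
)
print(games[["schedule_season", "total_points", "over_under_line", "total_points_over"]])
```

With `np.select` the ordering of the conditions matters only when they overlap; here "over" and "under" are mutually exclusive, so everything left over is a push.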

Findings

Total Points Per Game

The line graph below shows the average total points scored per game, year over year. The average has trended upward since 2011, peaking at 49.5 points per game in 2020, with a low of 43.5 points per game in 2017. Scoring in 2020 was more than 8% higher than in 2019. The increase in average points per game is likely driven by a shift toward more passing on offense and recent rule adjustments that adversely affect defenses.
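The notebook's season figure is the mean of weekly averages over regular-season weeks, which can differ slightly from a straight per-game average when weeks contain different numbers of games (bye weeks, for example). The sketch below, on invented numbers, computes both, plus the year-over-year percentage change behind comparisons such as 2020 versus 2019.

```python
import pandas as pd

# Invented regular-season games: two seasons with uneven games per week.
games = pd.DataFrame({
    "schedule_season": [2019, 2019, 2019, 2020, 2020, 2020],
    "schedule_week": ["1", "1", "2", "1", "2", "2"],
    "total_points": [40, 50, 42, 48, 52, 44],
})

# Per-game average: every game weighted equally.
per_game = games.groupby("schedule_season")["total_points"].mean()

# Mean of weekly means (the notebook's approach): every week weighted equally,
# so a week with fewer games counts for more per game.
weekly = games.groupby(["schedule_season", "schedule_week"])["total_points"].mean()
per_week = weekly.groupby(level="schedule_season").mean()

# Year-over-year percentage change of the per-game average.
yoy_pct = per_game.pct_change() * 100
print(per_game, per_week, yoy_pct.round(1), sep="\n\n")
```

Here 2019 averages 44.0 points per game but 43.5 as a mean of weekly means; with real weekly schedules the gap is usually small, but it is worth knowing which definition a quoted figure uses.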


# Average total points per game for each regular-season week (playoff rounds excluded).
total_points = main_data2.groupby(['schedule_season', 'schedule_week'])['total_points'].mean().reset_index()
total_points.columns = ['Season', 'Week', 'AvgPoints']
playoff_rounds = ['Wildcard', 'Division', 'Conference', 'Superbowl']
total_points = total_points[~total_points['Week'].isin(playoff_rounds)]

# Season average = mean of that season's weekly averages (each week weighted equally).
total_points_by_year = total_points.groupby('Season')['AvgPoints'].mean().reset_index()

plt.figure(figsize=(15, 10))
plt.plot(total_points_by_year['Season'], total_points_by_year['AvgPoints'], color='blue', marker='o')
plt.title('Average Total Points 2010-2021', fontsize=18)
plt.xlabel('Season', fontsize=18)
plt.ylabel('Average Points', fontsize=18)
plt.tick_params(axis='both', labelsize=14)

# One tick per season.
my_x_ticks = list(range(total_points_by_year['Season'].min(), total_points_by_year['Season'].max() + 1))
plt.xticks(my_x_ticks, fontsize=14, color='black')
plt.grid(True)
plt.show()

Home vs. Away Wins

The line graph below shows the number of home and away wins by season. Historically, teams playing at home have had an advantage over their opponents: the average home win percentage since the start of 2010 was 56%. In 2018, home teams won 158 games, a 60% win rate. In 2020, however, away teams matched home teams with 134 wins each. The data from 2019 to 2021 indicate that home-field advantage in the NFL is trending downward. The recent decline may be attributable to greater parity across the league and improved travel methods that reduce jet lag and fatigue.
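A home win rate depends on its denominator: all games, or only decided games with ties removed. The sketch below computes both with `pd.crosstab` on a handful of made-up results that use the notebook's `home_team_win` labels; it is illustrative only and is not the code that produced the 56% and 60% figures.

```python
import pandas as pd

# Made-up outcomes using the notebook's home_team_win labels.
results = pd.DataFrame({
    "schedule_season": [2018] * 5 + [2020] * 5,
    "home_team_win": [
        "Home Team Wins", "Home Team Wins", "Home Team Wins", "Away Team Wins", "Tie",
        "Home Team Wins", "Away Team Wins", "Home Team Wins", "Away Team Wins", "Tie",
    ],
})

# One row per season, one column per outcome.
counts = pd.crosstab(results["schedule_season"], results["home_team_win"])

all_games = counts.sum(axis=1)
decided = counts["Home Team Wins"] + counts["Away Team Wins"]

rates = pd.DataFrame({
    "home_pct_all": 100 * counts["Home Team Wins"] / all_games,
    "home_pct_decided": 100 * counts["Home Team Wins"] / decided,
})
print(counts, rates, sep="\n\n")
```

Because ties are rare in the NFL, the two rates are normally close, but stating which one is used avoids ambiguity when comparing seasons.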


# Home and away wins per season; ties are filtered out by label rather than by row position.
home_team_wins = main_data2.groupby(['schedule_season', 'home_team_win'])['home_team_win'].count().reset_index(name='Count')
home_team_wins = home_team_wins[home_team_wins['home_team_win'] != 'Tie']

fig = plt.figure(figsize=(18, 10))
ax = fig.add_subplot(1, 1, 1)

my_colors = {'Home Team Wins': 'blue',
             'Away Team Wins': 'green'}

# Group by the column name (not a list) so each key is a plain string usable as a dict key and label.
for key, grp in home_team_wins.groupby('home_team_win'):
    grp.plot(ax=ax, kind='line', x='schedule_season', y='Count',
             color=my_colors[key], label=key, marker='8')

plt.title('Home and Away Wins 2010-2021', fontsize=18)
ax.set_xlabel('Season', fontsize=14)
ax.set_ylabel('Total Wins', fontsize=14)

plt.show()

Total Wins by Team 2010-2021

The horizontal bar graph below shows each team’s total wins since 2010. The average over this period was 100 wins. The New England Patriots have been one of the league’s best teams, with 158 wins since 2010, the most in the NFL and 58 above the mean. During this period the Patriots made the playoffs an astounding 11 times and won three Super Bowls. Along with the Patriots, the Green Bay Packers, Seattle Seahawks, Kansas City Chiefs and Pittsburgh Steelers all rank in the top five, each winning more than 126 games since 2010. The Jacksonville Jaguars have won the fewest games over the same period, a league-low 56 wins, which is 44 below the mean and about 65% fewer than the New England Patriots. The Cleveland Browns, Washington Football Team, New York Jets and Detroit Lions round out the bottom five with fewer than 80 wins each. Coaching, quarterback play, defense and the front office are all major factors in a team’s ability to win consistently. The top five teams check the box in all four categories, while the bottom five have struggled with poor performance and leadership.
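The comparisons in this paragraph are differences from the league mean plus one relative shortfall. The sketch below reproduces that arithmetic for the two teams quoted in the text and applies the same plus-or-minus 1% band as `pick_colors_according_to_mean_count` in vectorized form. `Team A` and `Team B` are hypothetical fillers that bring the mean to exactly 100, and the `band` column name is an illustrative assumption.

```python
import numpy as np
import pandas as pd

# Win totals for two teams as quoted in the text; Team A and Team B are
# hypothetical fillers that bring the mean to exactly 100.
team_wins = pd.DataFrame({
    "game_winner": ["New England Patriots", "Jacksonville Jaguars", "Team A", "Team B"],
    "Count": [158, 56, 100, 86],
})

avg = team_wins["Count"].mean()
team_wins["vs_mean"] = team_wins["Count"] - avg  # positive = above the mean

# Same thresholds as pick_colors_according_to_mean_count: more than 1% above
# or below the mean; anything in between is treated as average.
team_wins["band"] = np.select(
    [team_wins["Count"] > avg * 1.01, team_wins["Count"] < avg * 0.99],
    ["Above Average", "Below Average"],
    default="Near Average",
)

# Relative shortfall of the Jaguars versus the Patriots.
pct_fewer = (1 - 56 / 158) * 100
print(team_wins)
print(round(pct_fewer, 1))  # 64.6
```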


# Total wins per team since 2010; ties are removed by label and teams sorted by wins.
team_wins = winner.groupby(['game_winner'])['game_winner'].count().reset_index(name='Count')
team_wins = team_wins[team_wins['game_winner'] != 'Tie']
team_wins = team_wins.sort_values(by=['Count'], ascending=True)

def pick_colors_according_to_mean_count(this_data):
    colors=[]
    avg = this_data.Count.mean()
    for each in this_data.Count:
        if each > avg*1.01:
            colors.append('blue')
        elif each < avg*0.99:
            colors.append('green')
        else:
            colors.append('black')
    return colors

d1 = team_wins
my_colors1 = pick_colors_according_to_mean_count(d1)


Above = mpatches.Patch(color = 'blue', label = "Above Average")
Below = mpatches.Patch(color = 'green', label = "Below Average")

fig = plt.figure(figsize=(18,16))

ax1 = fig.add_subplot(1,1,1)
ax1.barh(d1.game_winner, d1.Count, label = "Count", 
        color = my_colors1)
for row_counter, value_at_row_counter in enumerate(d1.Count):
    ax1.text(value_at_row_counter +2, row_counter, str(value_at_row_counter),
            color='black', size =12, fontweight='bold',
            ha='left', va= 'center', backgroundcolor= 'white')
plt.xlim(0,d1.Count.max()*1.1)
ax1.legend(handles=[Above, Below] , fontsize=14)
plt.axvline(d1.Count.mean(), color ='black', linestyle='dashed')
ax1.set_title('Most Wins by Team 2010-2021')
ax1.text(d1.Count.mean()+2, 0, 'Mean = ' + str(round(d1.Count.mean())), fontsize = 14)

plt.show()

Top 10 Teams as Favorite and Average Point Spread

The bar graph below shows the 10 teams that were favored most often since the start of 2010, along with each team’s average point spread as the favorite over the same period. Once again, the New England Patriots top the list. Over this period the Patriots were favored 184 times, 30 more than the next closest team, the Green Bay Packers. In games where they were favored, the Patriots’ average point spread was 7.5, more than a full point higher than the other teams in the top 10. During their dominant run in the 2010s, the Patriots covered the spread almost 60% of the time, the highest rate of any team in the league. This shows just how great Tom Brady and the Patriots were during this period: not only were they favored the most, but they also covered the spread the most.

One interesting data point from this graph is the average point spread of the Seattle Seahawks. The Seahawks were favored in the seventh-most games in the league since 2010, yet their average spread as the favorite, 6.5 points, was the third highest. Together, these two data points suggest that Seattle had a very good team for a shorter stretch between 2010 and 2021. The Seahawks reached the playoffs in five straight seasons from 2012 to 2016, making two Super Bowl appearances during that time and winning one.
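The notebook computes how often each team was favored and its mean spread, but not a cover rate, so the sketch below is a hypothetical way to derive one. It assumes, as in the Kaggle file, that `spread_favorite` is negative (the points the favorite gives): the favorite covers when its margin of victory plus `spread_favorite` is positive, and the game is a push at exactly zero. For simplicity the invented rows use abbreviations in every team column; in the real file `team_home`/`team_away` hold full names while `team_favorite_id` holds an ID, so a lookup table such as `nfl_teams.csv` would be needed to compare them. Pushes are counted as non-covers here; excluding them from the denominator is an equally defensible choice.

```python
import numpy as np
import pandas as pd

# Invented games; spread_favorite is negative (points given by the favorite).
games = pd.DataFrame({
    "team_home": ["NE", "NE", "SEA", "GB", "NE"],
    "team_away": ["NYJ", "MIA", "ARI", "CHI", "BUF"],
    "team_favorite_id": ["NE", "NE", "SEA", "GB", "BUF"],
    "spread_favorite": [-7.5, -3.0, -6.5, -3.0, -2.5],
    "score_home": [31, 20, 27, 28, 21],
    "score_away": [17, 17, 24, 21, 24],
})

# Favorite's margin of victory (negative if the favorite lost).
fav_is_home = games["team_favorite_id"] == games["team_home"]
margin = np.where(fav_is_home,
                  games["score_home"] - games["score_away"],
                  games["score_away"] - games["score_home"])

# Margin against the spread: > 0 covers, < 0 fails to cover, 0 is a push.
ats = margin + games["spread_favorite"]
games["cover"] = np.select([ats > 0, ats < 0], ["Cover", "No Cover"], default="Push")

# Times favored, average spread as a positive number, and cover rate (pushes count as non-covers).
summary = games.groupby("team_favorite_id").agg(
    Count=("spread_favorite", "count"),
    AvgSpread=("spread_favorite", lambda s: s.abs().mean()),
    CoverPct=("cover", lambda c: 100 * (c == "Cover").mean()),
).sort_values("Count", ascending=False)
print(summary)
```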


# Games each team was favored in, and its average spread as favorite (made positive for plotting).
top_spread = winner.groupby(['team_favorite_id']).agg({'team_favorite_id': ['count'],
                                                       'spread_favorite': ['mean']}).reset_index()
top_spread.columns = ['Team', 'Count', 'AvgSpread']
top_spread['AvgSpread'] = top_spread['AvgSpread'].abs()

# Ten most frequent favorites.
top_spread = top_spread.sort_values(by=['Count'], ascending=False).head(10).reset_index(drop=True)

def autolabel(these_bars, this_ax, place_of_decimals, symbol):
    """Write each bar's height just above the bar."""
    for each_bar in these_bars:
        height = each_bar.get_height()
        this_ax.text(each_bar.get_x() + each_bar.get_width() / 2, height * 1.01,
                     symbol + format(height, place_of_decimals), fontsize=11,
                     color='black', ha='center', va='bottom')

fig = plt.figure(figsize= (18, 10))
ax1 = fig.add_subplot(1,1,1)
ax2= ax1.twinx()
bar_width = 0.4

x_pos = np.arange(10)
count_bars = ax1.bar(x_pos-(0.5*bar_width), top_spread.Count, 
                    bar_width, color = 'gray', edgecolor= 'black'
                    , label = 'Favorite Count')

avg_spread_bars = ax2.bar(x_pos+(0.5*bar_width), top_spread.AvgSpread, 
                    bar_width, color = 'green', edgecolor= 'black'
                    , label = 'Average Spread')
ax1.set_xlabel('Team', fontsize = 18)
ax1.set_ylabel('Times Favored', fontsize=18, labelpad=20)
ax2.set_ylabel('Average Spread', fontsize = 18, rotation=270, labelpad = 20)
ax1.tick_params(axis='y', labelsize = 14)
ax2.tick_params(axis='y', labelsize = 14)

plt.title('Top 10 Teams as Favorite and Average Spread', fontsize=18)
ax1.set_xticks(x_pos)
ax1.set_xticklabels(top_spread.Team, fontsize =18)

count_color, count_label = ax1.get_legend_handles_labels()
fine_color, fine_label = ax2.get_legend_handles_labels()
legend = ax1.legend(count_color + fine_color, 
                    count_label + fine_label, loc = 'upper right',
                   frameon = True, ncol = 1, shadow = True, 
                   borderpad = 1, fontsize = 15)
ax1.set_ylim(0, top_spread.Count.max()*1.2)
autolabel(count_bars, ax1, '.0f', '')
autolabel(avg_spread_bars, ax2, '.2f', '')

plt.show()

Heatmap of the Number of Wins by Division and Year

The heatmap below shows the number of wins each division recorded in each season since 2010, indicating which divisions have been the most consistent and which have had the best seasons over the past decade. On average, the AFC East has won the most games per year, at approximately 35 wins, while the AFC South has won the fewest, at 29. The highest single-season total during this period was 47 wins, by the NFC West in 2013, 22 more than the AFC South’s league-low 25 wins that same year. The lowest single-season total was 23 wins, recorded by both the NFC South in 2014 and the NFC East in 2020.
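The heatmap is a divisions-by-seasons table of win counts. This sketch builds the same shape from a few invented winner rows with `pd.crosstab` (used here in place of the notebook's `groupby` plus `pivot_table`), then derives each division's average wins per season and the best single division-season, the two kinds of figures quoted above. One difference worth knowing: `crosstab` fills a division-season with no wins as 0, whereas `pivot_table` on pre-counted rows would leave it as NaN.

```python
import pandas as pd

# Invented winners, already mapped to divisions (the notebook gets the
# division by merging winners with nfl_teams.csv).
wins = pd.DataFrame({
    "schedule_season": [2013, 2013, 2013, 2013, 2014, 2014, 2014, 2014],
    "team_division": ["NFC West", "NFC West", "NFC West", "AFC South",
                      "NFC West", "AFC East", "AFC East", "AFC South"],
})

# Rows: division, columns: season, values: number of wins (0 where none).
hm = pd.crosstab(wins["team_division"], wins["schedule_season"])

# Average wins per season for each division, highest first.
avg_per_year = hm.mean(axis=1).sort_values(ascending=False)

# Division and season with the most wins in a single year.
best_division, best_season = hm.stack().idxmax()
print(hm, avg_per_year, (best_division, best_season), sep="\n\n")
```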


df10 = df10.loc[~df10['schedule_season'].isin(range(2010))]
df10 = df10[df10['schedule_date'].notna()]
df10 = df10.astype({'schedule_season':'int'})



df10.loc[df10['score_home'] > df10['score_away'], 'game_winner'] = df10['team_home']
df10.loc[df10['score_home'] < df10['score_away'], 'game_winner'] = df10['team_away']
df10.loc[df10['score_home'] == df10['score_away'], 'game_winner'] = 'Tie'

heatmap_df = df10[['schedule_season', 'game_winner']].copy()

# Map relocated or renamed franchises to their current names.
heatmap_df = heatmap_df.replace('St. Louis Rams', 'Los Angeles Rams')
heatmap_df = heatmap_df.replace('Washington Redskins', 'Washington Football Team')
heatmap_df = heatmap_df.replace('Oakland Raiders', 'Las Vegas Raiders')
heatmap_df = heatmap_df.replace('San Diego Chargers', 'Los Angeles Chargers')

# Attach each winner's division; 'Tie' has no match in df_teams, so the inner join drops ties.
heatmap_df = heatmap_df.rename(columns={'game_winner': 'team_name'})
heatmap_df2 = pd.merge(heatmap_df, df_teams, on='team_name', how='inner')

# Wins per division per season, reshaped to divisions (rows) x seasons (columns).
heatmap_df3 = heatmap_df2.groupby(['schedule_season', 'team_division'])['team_division'].count().reset_index(name='Count')
hm_df = pd.pivot_table(heatmap_df3, index='team_division', columns='schedule_season',
                       values='Count')

fig = plt.figure(figsize=(18, 10))
ax = fig.add_subplot(1, 1, 1)

# Draw on the axes created above and annotate each cell with its win count.
sns.heatmap(hm_df, ax=ax, linewidths=0.2, annot=True,
            cmap='coolwarm', fmt='.0f', square=True,
            annot_kws={'size': 16},
            cbar_kws={'orientation': 'vertical'})
plt.title('Heatmap of the Number of Wins by Division and Year',
          fontsize=18, pad=15)
plt.xlabel('Season', fontsize=18)
plt.ylabel('Division', fontsize=18)
plt.yticks(rotation=0, size=14)
plt.xticks(size=14)

plt.show()

Total Super Bowl Wins by Division and Team

The pie chart below shows total Super Bowl wins by division and team. Since 2010, teams from seven of the eight divisions have won a Super Bowl; the AFC South is the only division without a Super Bowl winner during this period. The AFC East has the most Super Bowl wins, all three belonging to the New England Patriots. The AFC West and NFC East also had great stretches, with two Super Bowl winners from each division. Super Bowls are hard to come by in the NFL; only the best and most consistent teams have a chance to compete for one each year. The teams that have won recent Super Bowls all rank in the top half of the league in wins per year and total games as the favorite. This indicates that teams with consistently good coaching, quarterbacks, defenses and front offices are well positioned to win championships.
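The chart's outer ring totals Super Bowl wins per division with `groupby`, which returns divisions in ascending order, while the inner ring draws one wedge per team in row order. The team wedges only nest inside their division's wedge if the team rows are sorted by division in that same ascending order. The sketch below checks this on a subset of the winners (AFC East, AFC West and NFC East, matching the counts given in the text) and omits the plotting itself.

```python
import pandas as pd

# Subset of 2010-2021 Super Bowl winners: AFC East (3), AFC West (2), NFC East (2).
sb = pd.DataFrame({
    "team_name_short": ["Patriots", "Giants", "Chiefs", "Eagles", "Broncos"],
    "team_division": ["AFC East", "NFC East", "AFC West", "NFC East", "AFC West"],
    "Count": [3, 1, 1, 1, 1],
})

# Outer ring: wins per division; groupby returns divisions in ascending order.
outer = sb.groupby("team_division")["Count"].sum()

# Inner ring: one wedge per team, sorted by division in the same ascending order
# (team name as a tie-breaker so the order is deterministic).
inner = sb.sort_values(["team_division", "team_name_short"])

# Divisions appear in the inner ring in the same sequence as the outer wedges.
assert list(inner["team_division"].drop_duplicates()) == list(outer.index)

# Percentage labels like the outer ring's autopct.
share = (outer / outer.sum() * 100).round(1)
print(outer, share, inner[["team_division", "team_name_short", "Count"]], sep="\n\n")
```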


# Super Bowl games only; game_winner was already assigned on main_data2.
superbowl = main_data2.query("schedule_week == 'Superbowl'").copy()
# Super Bowl wins per team, joined to team metadata (division, short name).
pie_df2 = superbowl.groupby(['game_winner'])['game_winner'].count().reset_index(name='Count')
pie_df2 = pie_df2.rename(columns={'game_winner': 'team_name'})
pie_df2 = pd.merge(pie_df2, df_teams, on='team_name', how='inner')


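# Sort ascending by division to match the order of the groupby totals in the outer ring,
# so each team's wedge sits inside its division's wedge.
pie_df2 = pie_df2.sort_values('team_division', ascending=True)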


outside_color_ref_number = [0, 16, 12, 4, 8, 9, 10]
inside_color_ref_number = [1, 2, 3, 17, 13, 14, 5, 6, 11]

fig = plt.figure(figsize = (10,10))
ax = fig.add_subplot(1,1,1)

colormap = plt.get_cmap('tab20')
outer_colors = colormap(outside_color_ref_number)

all_wins = pie_df2.Count.sum()

pie_df2.groupby(['team_division'])['Count'].sum().plot(
    kind = 'pie', radius = 1, colors = outer_colors, 
    pctdistance = 0.85, labeldistance = 1.1, 
    wedgeprops = dict(edgecolor = 'w'), 
    textprops = {'fontsize' : 18},
    autopct = lambda p: '{:.2f}%\n({:.0f})'.format(p,(p/100)*all_wins),
    startangle= 90)

inner_colors = colormap(inside_color_ref_number)
pie_df2.Count.plot(
    kind = 'pie', radius = 0.7, colors = inner_colors, 
    pctdistance = 0.55, labeldistance = 0.72, 
    wedgeprops = dict(edgecolor = 'w'), 
    textprops = {'fontsize' : 11},
    labels = pie_df2.team_name_short,
    autopct = '%.0f%%',
    startangle= 90)

# White circle in the middle turns the two pies into rings.
hole = plt.Circle((0, 0), 0.3, fc='white')
ax.add_artist(hole)

ax.yaxis.set_visible(False)
plt.title('Total Super Bowl Wins by Division and Team 2010-2021', fontsize=18)

ax.text(0, 0, 'Total Games ' + str(all_wins), ha='center', va='center')

ax.axis('equal')
plt.tight_layout()

plt.show()

Conclusion

The popularity of the NFL has risen at a tremendous rate over the past decade, and it has become one of the most popular sports to watch and bet on in the world. One reason for this rise is the fast-paced, high-scoring style of play. A shift toward more passing on offense and league rule changes that adversely affect defenses have likely contributed to the upward trend in scoring since 2011, and the NFL has benefited from that rise in total points per game, which has helped it retain and grow its audience.

Historically, NFL teams playing at home have had an advantage over their visiting opponents. However, according to the data, home-field advantage has trended downward over the past few years, with away teams winning as many games as home teams in 2020.

Staying on top in the NFL is difficult; teams usually have short windows to be great due to the nature of the league. The New England Patriots have been the best team by far, winning 158 games and three Super Bowls over this period. The Green Bay Packers, Seattle Seahawks, Kansas City Chiefs and Pittsburgh Steelers have also been consistent, each winning more than 126 games since 2010 and winning three Super Bowls between them. Good coaching, quarterback play, defense and the front office are all major factors in a team’s ability to win consistently. Over the past decade these teams have clearly managed these factors exceptionally well and are well positioned to continue winning in the future.