From a young age, I have watched NFL football and cheered for my team every week. Given that my dad is from Pittsburgh and my parents met in Pittsburgh, I was naturally raised a lifelong Steelers fan. With that said, I wanted to examine not only my own team but also the league as a whole. The past 20 years have seen successes and failures from every NFL team. When choosing an option for my final capstone project, I was instantly drawn to extending a project I had previously done on the NFL. The NFL has a plethora of data points publicly available. I thought, “What all goes into winning an NFL game, and what teams are historically successful in the final standings?” Using the past 20 years' worth of data, I sought to investigate this problem.
As mentioned, for my final capstone project, I am expanding upon my final project from Data Wrangling in R (BANA 7025) with Professor Tianhai Zu. I originally worked with a partner on this project; however, the extension will be my own individual work. I plan on using the functions in R to deliver overall summary statistics on games and standings. Additionally, I will use the data to explore potential correlations and plot the corresponding data visualizations. Using descriptive analysis of the past 20 years, I am looking to see whether NFL teams show predictive tendencies.
This analysis includes data from 2000 - 2019. I added 2020 season data to every dataset aside from nfl_attendance and nfl_games, as these would be skewed if 2020 data were added. This skew would be due to the impact of COVID-19, which caused games to be rescheduled to different days and times or cancelled, and led to little to no attendance depending on location.
This NFL analysis consists of eight individual datasets:
NFL Attendance (nfl_attendance)
NFL Standings (nfl_standings)
NFL Games (nfl_games)
NFL Weather (nfl_weather)
NFL Playoff Coaches and Quarterbacks (nfl_playoffs)
NFL Passing Yards Leaders (nfl_passing)
NFL Rushing Yards Leaders (nfl_rushing)
NFL Penalty Yards Per Game (nfl_penalty)
More detailed information about each dataset can be found in the Data Preparation tab.
The NFL is a multi-billion dollar industry. Millions of fans across the world cheer for these 32 teams every year. People are now looking for ways to understand the game better.
Coaches want to understand what makes a team more successful. Sports gamblers want to get an edge and make the correct picks based on more than just gut feelings. Fans want to know if their team is progressing in the right direction. This analysis is useful for all of these situations. Using descriptive analysis, past results can be better explained. As such, trends can be deduced to predict how NFL games and seasons will occur. Although no one can see into the future, understanding the data sheds a better light on the probability of certain results occurring in the NFL.
The goal of my analysis is to inform my readers on what all goes into winning an NFL game. My hope is that the audience will finish reading my report and better understand historic trends and performance from teams, players, and coaches alike. As a final capstone project, I hope to demonstrate proficiency in R using R Markdown as well as flexdashboard with Shiny components.
The datasets contain loads of information for the NFL. With a wide range of variables, many options are available to analytically investigate the NFL. With the eight datasets at hand, I looked to compare them to draw conclusions about team performances. To see if statistical significance or rational conclusions related to the NFL could be realized, the following situations were explored:
nfl_games dataset contains many variables for games. Turnovers, day of the week, points, etc. are shown for every match-up. Correlations into why teams win or lose will be the goal of this analysis. Using a plethora of variables, significance of certain variables will be essential for further understanding.
nfl_weather dataset contains the information of both the home and away teams from 2000 - 2013. This dataset also includes three weather-related variables: (1) temperature, (2) humidity, and (3) wind speed (in mph). I want to see which teams perform under certain weather conditions. Additionally, I hope to create a few linear models to see if weather conditions can predict whether or not the game will be high-scoring or low-scoring.
nfl_playoffs dataset includes information of teams who went to the playoffs from 2000 - 2020. This dataset also includes the Super Bowl Champions. I am curious to analyze trends regarding the coaches and quarterbacks who led the teams to success. Are certain quarterbacks consistently better-performing? Are there better head coaches than others?
nfl_passing dataset includes information from the past 20 years on the players with the most passing yards. Which player has performed consistently over the past 20 years? Who is the “best”?
nfl_rushing dataset has the same information as the nfl_passing information, except it focuses on rushing yards instead of passing yards. Which players had the most rushing yards each year from 2000 - 2020?
This project requires a variety of packages. Given there are over 10,000 packages in R, I want to focus on the ones that will provide me with the best results while cleaning and interpreting the data.
Some packages will be more useful than others. For example, ggplot2 allows for great visualizations that provide better understanding of the data. Additionally, dplyr can drill deeper into the eight datasets to come to conclusions that may be hidden at first. R has powerful functions that can derive explanations for questions to massive datasets. Please see below for all of the packages loaded for this analysis:
# Packages required
library(tidyverse) # Use to tidy data
library(dplyr) # Use to manipulate data
library(ggplot2) # Use to plot data and create visualizations
library(tibble) # Use to manipulate and re-imagine data
library(readr) # Use to import data cleanly and efficiently
library(DT) # Use to create comprehensive data tables with HTML output
library(knitr) # Use for dynamic report generation
library(base) # Contains Base R functions
library(ggthemes) # Use themes in data visualizations
library(plotly) # Use to plot data and create visualizations
library(ggpubr) # Use to show multiple plots at once
library(GGally) # Use to produce scatter plot matrix
library(rmarkdown) # Use to produce report
library(flexdashboard) # Use to produce flexdashboard
library(stringr) # Provides functions to work with strings
library(highcharter) # Includes shortcut functions to plot R objects
library(shinythemes) # Use to implement themes for output
Most of the data (nfl_attendance, nfl_standings, and nfl_games) was obtained from my professor, Tianhai Zu, for the Data Wrangling in R class. He had provided four different datasets from which to choose, and my partner and I chose the NFL option. These datasets can be found on GitHub. Reading the information on GitHub led me to find the original source of the data, which is Pro Football Reference Standings and Pro Football Reference Attendance.
This NFL analysis consists of eight individual datasets - (1) nfl_attendance, (2) nfl_standings, (3) nfl_games, (4) nfl_weather, (5) nfl_playoffs, (6) nfl_passing, (7) nfl_rushing, and (8) nfl_penalty.
I first merged three of the datasets (nfl_attendance, nfl_standings, and nfl_games) into one dataframe called nfl_df. I decided it might be beneficial to have multiple frames of reference, some utilizing individual datasets, and another by looking at the combined dataframe. Rather than using str() and summary() to show descriptive statistics for each variable, I decided to create comprehensive tables. Then, in the Data Preparation tab, I cleaned every dataset.
# Get the data
nfl_attendance <- readr::read_csv('attendance.csv')
nfl_standings <- readr::read_csv('updatedstandings.csv')
nfl_games <- readr::read_csv('games.csv')
nfl_weather <- readr::read_csv('weather.csv')
nfl_playoffs <- readr::read_csv('post_season.csv')
nfl_passing <- readr::read_csv('passing_yards_leaders.csv')
nfl_rushing <- readr::read_csv('rushing_yards_leaders.csv')
nfl_penalty <- readr::read_csv('penalty_yards_per_game.csv')
# To use 2020 data you need to update tidytuesdayR from GitHub
# Install via devtools::install_github("thebioengineer/tidytuesdayR")
tuesdata <- tidytuesdayR::tt_load('2020-02-04')
Downloading file 1 of 3: `attendance.csv`
Downloading file 2 of 3: `games.csv`
Downloading file 3 of 3: `standings.csv`
As mentioned, the nfl_attendance dataset was imported and obtained from Pro Football Reference. The original data contains 10,846 observations and eight variables. There are two character type variables, team and team_name. There are six numeric type variables: year, total, home, away, week, and weekly_attendance. The data was collected from 2000 - 2019, and the values for the columns were observed during the 17 weeks of the NFL season.
# Examine the structure of the dataset
datatable(head(nfl_attendance, 10))
# Create a data dictionary for attendance
var_names_att <- colnames(nfl_attendance)
var_types_att <- lapply(nfl_attendance, class) # lapply returns a list of the same length as X (a vector)
var_descriptions_att <- c("City or state in which the team originates", "Name or mascot of the team", "Year", "Total attendance per season", "Total attendance at home games per season", "Total attendance at away games per season", "Week in which game was played", "Attendance for given week")
data_dict_att <- as_tibble(cbind(var_names_att, var_types_att, var_descriptions_att))
colnames(data_dict_att) <- c("Variable Name", "Variable Data Type", "Variable Description")
kable(data_dict_att) # kable returns a single table for a single data object
Looking at the missing data values, the only column in which missing values exist is the weekly_attendance. This makes sense, as each NFL team has at least one bye week during the regular season. I decided to omit these values as they would skew the data and misrepresent the trends for each team.
colSums(is.na(nfl_attendance)) # Find the number of missing values per column
nfl_attendance <- na.omit(nfl_attendance)
colSums(is.na(nfl_attendance)) # Confirm there are no missing values
Looking at the original dataset above, I decided to first rename the columns to better describe the data.
nfl_attendance <- nfl_attendance %>% dplyr::rename(
team_location = team,
total_attendance = total,
total_home_attendance = home,
total_away_attendance = away
)
Additionally, I split the data into two dataframes, the first omitting the weekly data, and the second omitting the season totals. This decision was made largely to remove duplicates, and I knew it would bode well for the visualizations during the exploratory data analysis (EDA).
The first dataset, nfl_total_attendance, drops the two weekly columns, week and weekly_attendance, and shows the season attendance totals for each team. The second dataset, nfl_weekly_attendance, drops the season-total columns: total_attendance, total_home_attendance, and total_away_attendance.
nfl_total_attendance <- nfl_attendance[-c(7, 8)] # Remove weekly data
nfl_total_attendance <- nfl_total_attendance[!duplicated(nfl_total_attendance), ] # Remove duplicates
datatable(head(nfl_total_attendance, 10))
nfl_weekly_attendance <- nfl_attendance[-c(4, 5, 6)] # Remove season total attendance data
datatable(head(nfl_weekly_attendance, 10))
For a summary of the two CLEANED datasets and their associated tables, please see below.
NFL Total Attendance Dataset
Data Dictionary for the NFL Total Attendance Dataset
| Variable Name | Variable Data Type | Variable Description |
|---|---|---|
| team_location | character | City or state in which the team originates |
| team_name | character | Name or mascot of the team |
| year | numeric | Year |
| total_attendance | numeric | Total attendance per season |
| total_home_attendance | numeric | Total attendance at home games per season |
| total_away_attendance | numeric | Total attendance at away games per season |
NFL Weekly Attendance Dataset
Data Dictionary for the NFL Weekly Attendance Dataset
| Variable Name | Variable Data Type | Variable Description |
|---|---|---|
| team_location | character | City or state in which the team originates |
| team_name | character | Name or mascot of the team |
| year | numeric | Year |
| week | numeric | Week in which game was played |
| weekly_attendance | numeric | Attendance for given week |
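The split into season totals and weekly data described above can also be sketched by column name rather than by position, which keeps working if the column order ever changes. This is a minimal base R sketch on a made-up toy table; the object names `attendance`, `total_attendance_tbl`, and `weekly_attendance_tbl` and all values are assumptions for illustration, not the report's data.

```r
# Toy stand-in for the renamed attendance data (all values are made up)
attendance <- data.frame(
  team_location         = c("Pittsburgh", "Pittsburgh", "Dallas", "Dallas"),
  team_name             = c("Steelers", "Steelers", "Cowboys", "Cowboys"),
  year                  = c(2019, 2019, 2019, 2019),
  total_attendance      = c(1000, 1000, 1200, 1200),
  total_home_attendance = c(520, 520, 700, 700),
  total_away_attendance = c(480, 480, 500, 500),
  week                  = c(1, 2, 1, 2),
  weekly_attendance     = c(60, 65, 90, 88),
  stringsAsFactors      = FALSE
)

season_cols <- c("total_attendance", "total_home_attendance", "total_away_attendance")
weekly_cols <- c("week", "weekly_attendance")

# Season totals: drop the weekly columns by name, then keep one row per team and year
total_attendance_tbl <- unique(attendance[, !(names(attendance) %in% weekly_cols)])

# Weekly data: drop the season-total columns by name
weekly_attendance_tbl <- attendance[, !(names(attendance) %in% season_cols)]

nrow(total_attendance_tbl)   # one row per team-season in the toy data
names(weekly_attendance_tbl)
```

Selecting by name is slightly more verbose than `[-c(7, 8)]`, but it makes the intent visible in the code itself.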
The nfl_standings dataset was imported and obtained from Pro Football Reference. The original data contains 638 observations and 15 variables. There are four character type variables, team, team_name, playoffs, and sb_winner. There are 11 numeric type variables, year, wins, loss, points_for, points_against, points_differential, margin_of_victory, strength_of_schedule, simple_rating, offensive_ranking, and defensive_ranking. The data observed was collected from 2000 - 2020. The process of cleaning the ORIGINAL data can be seen below.
# Examine the structure of the dataset
datatable(head(nfl_standings, 10))
# Create a data dictionary for standings
var_names_st <- colnames(nfl_standings)
var_types_st <- lapply(nfl_standings, class) # lapply returns a list of the same length as X (a vector)
var_descriptions_st <- c("City or state in which the team originates", "Name or mascot of the team", "Year", "Total wins per season (0 to 16)", "Total losses per season (0 to 16)", "Total points the team scored per season", "Total points the opponent scored on the team per season", "The difference between the total points for the team and against the team", "Points differential divided by the total number of games per season", "Difficulty of schedule based on opponent records", "A rating for the team that takes into account points differential and strength of schedule (measured by Simple Rating System)", "A rating comparing how well the offense performs to opponent teams (measured by Simple Rating System)", "A rating comparing how well the defense performs to opponent teams (measured by Simple Rating System)", "Stating whether or not the team made it to the playoffs", "Stating whether or not the team won the Super Bowl for the season")
data_dict_st <- as_tibble(cbind(var_names_st, var_types_st, var_descriptions_st))
colnames(data_dict_st) <- c("Variable Name", "Variable Data Type", "Variable Description")
kable(data_dict_st) # kable returns a single table for a single data object
Looking at the above dataset, I first decided to change the column names to better describe the data.
nfl_standings <- nfl_standings %>% dplyr::rename(
team_location = team,
total_wins = wins,
total_losses = loss
)
It is important to note as well that a few of the variable names refer to calculated values. The calculated value for points_differential is: points_differential = points_for - points_against. Additionally, margin_of_victory is calculated by: (points_scored - points_allowed) / games_played.
Lastly, the simple_rating is calculated by: \[SRS = MoV + SoS = OSRS + DSRS\]
In layman’s terms, the simple rating system is equal to the margin of victory plus the strength of schedule. This is equal to the offensive simple rating standing plus the defensive simple rating standing.
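As a worked illustration of the calculated fields above, the arithmetic can be traced for a single hypothetical team-season. The numbers below, including the strength-of-schedule value, are made up for illustration and do not come from the standings data.

```r
# Hypothetical season line (illustrative values only)
points_for           <- 400
points_against       <- 320
games_played         <- 16
strength_of_schedule <- -0.5  # assumed value; the real figure comes from opponent records

# points_differential = points_for - points_against
points_differential <- points_for - points_against

# margin_of_victory = points differential per game
margin_of_victory <- points_differential / games_played

# SRS = MoV + SoS
simple_rating <- margin_of_victory + strength_of_schedule

c(points_differential = points_differential,
  margin_of_victory   = margin_of_victory,
  simple_rating       = simple_rating)
```

Here the team outscores its opponents by 80 points, or 5 points per game, and the slightly easy schedule pulls the rating down to 4.5.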
Next, I wanted to see what the sum of missing values was per column. As evident below, there are no missing values.
Moving forward, I decided to change both playoffs and sb_winner to binary variables, because each has only two unique values.
unique(nfl_standings$playoffs, incomparables = FALSE) # View the unique values for the playoffs column
unique(nfl_standings$sb_winner, incomparables = FALSE) # View the unique values for the sb_winner column
Knowing this, I changed the two columns to binary variables. For the playoffs column, a value of one stands for “Playoffs”, and a value of zero stands for “No Playoffs”.
nfl_standings$playoffs[nfl_standings$playoffs == "Playoffs"] <- "1"
nfl_standings$playoffs[nfl_standings$playoffs == "No Playoffs"] <- "0"
nfl_standings$playoffs <- as.numeric(nfl_standings$playoffs)
For the sb_winner column, a value of one denotes “Won Superbowl”, and a value of zero denotes “No Superbowl”.
nfl_standings$sb_winner[nfl_standings$sb_winner == "Won Superbowl"] <- "1"
nfl_standings$sb_winner[nfl_standings$sb_winner == "No Superbowl"] <- "0"
nfl_standings$sb_winner <- as.numeric(nfl_standings$sb_winner)
For a summary of the dataset and the associated table, please see the CLEANED dataset below.
NFL Standings Dataset
Data Dictionary for the NFL Standings Dataset
| Variable Name | Variable Data Type | Variable Description |
|---|---|---|
| team_location | character | City or state in which the team originates |
| team_name | character | Name or mascot of the team |
| year | numeric | Year |
| total_wins | numeric | Total wins per season (0 to 16) |
| total_losses | numeric | Total losses per season (0 to 16) |
| points_for | numeric | Total points the team scored per season |
| points_against | numeric | Total points the opponent scored on the team per season |
| points_differential | numeric | The difference between the total points for the team and against the team |
| margin_of_victory | numeric | Points differential divided by the total number of games per season |
| strength_of_schedule | numeric | Difficulty of schedule based on opponent records |
| simple_rating | numeric | A rating for the team that takes into account points differential and strength of schedule (measured by Simple Rating System) |
| offensive_ranking | numeric | A rating comparing how well the offense performs to opponent teams (measured by Simple Rating System) |
| defensive_ranking | numeric | A rating comparing how well the defense performs to opponent teams (measured by Simple Rating System) |
| playoffs | numeric | Stating whether or not the team made it to the playoffs |
| sb_winner | numeric | Stating whether or not the team won the Super Bowl for the season |
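The recoding of playoffs and sb_winner described above can also be sketched in one step per column, since a logical comparison converts directly to 1/0. This is a minimal sketch on a made-up toy table (the `standings` object and its rows are assumptions), and it yields integer rather than double columns, which differs slightly from the `as.numeric()` approach in the report.

```r
# Toy standings rows (made up for illustration)
standings <- data.frame(
  team_name = c("Steelers", "Browns", "Patriots"),
  playoffs  = c("Playoffs", "No Playoffs", "Playoffs"),
  sb_winner = c("No Superbowl", "No Superbowl", "Won Superbowl"),
  stringsAsFactors = FALSE
)

# TRUE/FALSE from the comparison becomes 1/0
standings$playoffs  <- as.integer(standings$playoffs == "Playoffs")
standings$sb_winner <- as.integer(standings$sb_winner == "Won Superbowl")

standings
```

One trade-off to keep in mind: any label other than the expected one (a typo, for example) silently becomes 0 here, so checking `unique()` first, as the report does, remains a sensible guard.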
Once again, the nfl_games data was imported and obtained from Pro Football Reference. The original data contains 5,324 observations and 19 variables. There are 11 character variables: week, home_team, away_team, winner, tie, day, date, home_team_name, home_team_city, away_team_name, and away_team_city. There are seven numeric type variables: year, pts_win, pts_loss, yds_win, turnovers_win, yds_loss, and turnovers_loss. The remaining variable, time, stores the time of day at which the game was played. See the ORIGINAL dataset below.
# Examine the structure of the dataset
datatable(head(nfl_games, 10))
# Create a data dictionary for games
var_names_games <- colnames(nfl_games)
var_types_games <- lapply(nfl_games, class) # lapply returns a list of the same length as X (a vector)
var_descriptions_games <- c("Year", "Week of the season in which the game was played", "Home team for the game", "Away team for the game", "Winner of the game", "Was there a tie? (if so, the other team will be listed in this column)", "Day of the week in which the game was played", "Date of the game", "Time of the day in which the game was played", "Number of points the winning team scored", "Number of points the losing team scored", "Total number of yards the winning team had", "Total number of turnovers the winning team had", "Total number of yards the losing team had", "Total number of turnovers the losing team had", "Name or mascot of the home team", "City of the home team", "Name or mascot of the away team", "City of the away team")
data_dict_games <- as_tibble(cbind(var_names_games, var_types_games, var_descriptions_games))
colnames(data_dict_games) <- c("Variable Name", "Variable Data Type", "Variable Description")
kable(data_dict_games) # kable returns a single table for a single data object
Looking at the above dataset, the first step I took to clean the data was to remove the last four unnecessary columns, as I felt they were redundant.
names(nfl_games)
nfl_games <- nfl_games[-c(16, 17, 18, 19)] # Remove redundant columns
names(nfl_games)
Then, I changed the week column to be numeric.
Looking at missing values, the only column which contained them was the tie column. This makes sense, as very few NFL games result in a tie.
Next, the way in which a tie was denoted was by listing one team name in the winner column, and the opponent team name in the tie column. To fix this, I identified any game that resulted in a tie. Then, for these specific games, I renamed the value in the winner column to “Tie”. The tie column was then erased.
colSums(is.na(nfl_games))
unique(nfl_games$tie, incomparables = FALSE)
nfl_games$winner[!is.na(nfl_games$tie)] <- "Tie" # Games with a non-missing tie entry ended in a tie
nfl_games <- nfl_games[-c(6)] # Remove the tie column
colSums(is.na(nfl_games)) # Confirm there are no missing values
To view the summary and structure of the CLEANED data:
NFL Games Dataset
Data Dictionary for the NFL Games Dataset
| Variable Name | Variable Data Type | Variable Description |
|---|---|---|
| year | numeric | Year |
| week | numeric | Week of the season in which the game was played |
| home_team | character | Home team for the game |
| away_team | character | Away team for the game |
| winner | character | Winner of the game |
| day | character | Day of the week in which the game was played |
| date | character | Date of the game |
| time | hms , difftime | Time of the day in which the game was played |
| pts_win | numeric | Number of points the winning team scored |
| pts_loss | numeric | Number of points the losing team scored |
| yds_win | numeric | Total number of yards the winning team had |
| turnovers_win | numeric | Total number of turnovers the winning team had |
| yds_loss | numeric | Total number of yards the losing team had |
| turnovers_loss | numeric | Total number of turnovers the losing team had |
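The tie handling described above can be traced on a small made-up example. The sketch below assumes the layout described in the text, where a tied game lists one team in winner and the other in tie, and uses placeholder team names; the `games` object is hypothetical.

```r
# Toy games table: the second game ended in a tie (placeholder teams)
games <- data.frame(
  home_team = c("Team A", "Team C", "Team E"),
  away_team = c("Team B", "Team D", "Team F"),
  winner    = c("Team A", "Team D", "Team E"),
  tie       = c(NA, "Team C", NA),
  stringsAsFactors = FALSE
)

# A game is a tie exactly when the tie column is not missing
is_tie <- !is.na(games$tie)

# Overwrite the listed "winner" for tied games, then drop the tie column by name
games$winner[is_tie] <- "Tie"
games$tie <- NULL

games$winner
colSums(is.na(games))  # no missing values remain in the toy data
```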
Incorporating weather data into my analysis is an interesting next step. I want to see how the weather impacts the outcome of individual games. The nfl_weather data is from NFLsavant.com. All data and statistics from this site are compiled from publicly available NFL play-by-play data on the Internet. The one drawback is that this data only runs through 2013; however, I thought 13 years of data was enough to see any significant trends.
The original data contains 3,521 observations and 13 variables. The variables are described in the data dictionary below. See the ORIGINAL NFL Weather data below.
# Examine the structure of the dataset
datatable(head(nfl_weather, 10))
# Create a data dictionary for weather
var_names_w <- colnames(nfl_weather)
var_types_w <- lapply(nfl_weather, class) # lapply returns a list of the same length as X (a vector)
var_descriptions_w <- c("Full home team name", "City or state in which the home team originates", "Name or mascot of the home team", "Total points scored by the home team", "Full away team name", "City or state in which the away team originates", "Name or mascot of the away team", "Total points scored by the away team", "Winner of the game", "Temperature during the game (in Fahrenheit)", "Humidity percentage during the game", "Wind speed in miles per hour (mph) during the game", "Date of the game played")
data_dict_w <- as_tibble(cbind(var_names_w, var_types_w, var_descriptions_w))
colnames(data_dict_w) <- c("Variable Name", "Variable Data Type", "Variable Description")
kable(data_dict_w) # kable returns a single table for a single data object
Looking at the above dataset, the first step I took to clean the data was to remove the home_team and away_team columns, as I felt they were redundant.
names(nfl_weather)
nfl_weather <- nfl_weather[-c(1, 5)] # Remove redundant columns
names(nfl_weather)
To view the summary and structure of the CLEANED data:
NFL Weather Dataset
Data Dictionary for the NFL Weather Dataset
| Variable Name | Variable Data Type | Variable Description |
|---|---|---|
| home_team_city | character | City or state in which the home team originates |
| home_team_name | character | Name or mascot of the home team |
| home_score | numeric | Total points scored by the home team |
| away_team_city | character | City or state in which the away team originates |
| away_team_name | character | Name or mascot of the away team |
| away_score | numeric | Total points scored by the away team |
| winning_team | character | Winner of the game |
| temperature | numeric | Temperature during the game (in Fahrenheit) |
| humidity | numeric | Humidity percentage during the game |
| wind_mph | numeric | Wind speed in miles per hour (mph) during the game |
| date | character | Date of the game played |
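Each dataset in this tab builds its data dictionary with the same sequence of calls, so the pattern could be wrapped in a small helper. The sketch below is one possible base R version; the function name `make_data_dict` and the toy table are assumptions for illustration and are not part of the original project.

```r
# Hypothetical helper: build a data dictionary from a data frame and
# a vector of hand-written descriptions (one per column, in column order)
make_data_dict <- function(df, descriptions) {
  stopifnot(length(descriptions) == ncol(df))
  data.frame(
    "Variable Name" = names(df),
    # Some columns carry several classes (e.g. a time column), so collapse them
    "Variable Data Type" = unname(vapply(
      df, function(col) paste(class(col), collapse = ", "), character(1)
    )),
    "Variable Description" = descriptions,
    check.names = FALSE,
    stringsAsFactors = FALSE
  )
}

# Toy table with two of the weather columns (values are made up)
toy_weather <- data.frame(
  temperature  = c(45, 72),
  winning_team = c("Team A", "Team B"),
  stringsAsFactors = FALSE
)

dict <- make_data_dict(
  toy_weather,
  c("Temperature during the game (in Fahrenheit)", "Winner of the game")
)
dict
```

The resulting data frame can be passed to `kable()` in the same way as the dictionaries above. The length check at the top catches the common mistake of dropping a column without also dropping its description.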
The next dataset within my analysis is the nfl_playoffs dataset. This looks into the coaches and quarterbacks for each team that went to the playoffs from 2000 - 2020. I created this dataset myself through research.
The original data contains nine variables, which are described in the data dictionary below. See the ORIGINAL NFL Playoffs data below.
# Examine the structure of the dataset
datatable(head(nfl_playoffs, 10))
# Create a data dictionary for playoffs
var_names_playoffs <- colnames(nfl_playoffs)
var_types_playoffs <- lapply(nfl_playoffs, class) # lapply returns a list of the same length as X (a vector)
var_descriptions_playoffs <- c("City or state in which the team originates", "Name or mascot of the team", "Year", "Total wins for the team", "Total losses for the team", "Whether or not the team went to the Playoffs", "Whether or not the team won the Super Bowl", "Head coach of the team", "Starting quarterback during the postseason")
data_dict_playoffs <- as_tibble(cbind(var_names_playoffs, var_types_playoffs, var_descriptions_playoffs))
colnames(data_dict_playoffs) <- c("Variable Name", "Variable Data Type", "Variable Description")
kable(data_dict_playoffs) # kable returns a single table for a single data object
Moving forward, I decided to change sb_winner to a binary variable, because it has only two unique values. Because the only unique value in the playoffs column is “Playoffs”, I decided to drop that column.
unique(nfl_playoffs$playoffs, incomparables = FALSE) # View the unique values for the playoffs column
unique(nfl_playoffs$sb_winner, incomparables = FALSE) # View the unique values for the sb_winner column
names(nfl_playoffs)
nfl_playoffs <- nfl_playoffs[-6] # Remove unnecessary column
names(nfl_playoffs)
For the sb_winner column, a value of one denotes “Won Superbowl”, and a value of zero denotes “No Superbowl”.
nfl_playoffs$sb_winner[nfl_playoffs$sb_winner == "Won Superbowl"] <- "1"
nfl_playoffs$sb_winner[nfl_playoffs$sb_winner == "No Superbowl"] <- "0"
nfl_playoffs$sb_winner <- as.numeric(nfl_playoffs$sb_winner)
To view the summary and structure of the CLEANED data:
NFL Playoffs Dataset
Data Dictionary for the NFL Playoffs Dataset
| Variable Name | Variable Data Type | Variable Description |
|---|---|---|
| team | character | City or state in which the team originates |
| team_name | character | Name or mascot of the team |
| year | numeric | Year |
| wins | numeric | Total wins for the team |
| loss | numeric | Total losses for the team |
| sb_winner | numeric | Whether or not the team won the Super Bowl |
| head_coach | character | Head coach of the team |
| qb | character | Starting quarterback during the postseason |
The nfl_passing dataset contains information regarding the league leader for passing yards from each year. Their respective team information is included. This data is from Pro Football Reference.
This dataset does not need to be cleaned or edited. To view the summary and structure of the data:
NFL Passing Dataset
Data Dictionary for the NFL Passing Dataset
| Variable Name | Variable Data Type | Variable Description |
|---|---|---|
| year | numeric | Year |
| player | character | Name of the player with the most passing yards |
| yds | numeric | Total yards |
| team | character | Location of the team the player is on |
| team_name | character | Name or mascot of the team the player is on |
The last dataset, nfl_rushing, contains information regarding the league leader for rushing yards from each year. Their respective team information is included. This data is also from Pro Football Reference.
Similar to the last dataset, this dataset does not need to be cleaned or edited. To view the summary and structure of the data:
NFL Rushing Dataset
Data Dictionary for the NFL Rushing Dataset
| Variable Name | Variable Data Type | Variable Description |
|---|---|---|
| year | numeric | Year |
| player | character | Name of the player with the most rushing yards |
| yds | numeric | Total yards |
| team | character | Location of the team the player is on |
| team_name | character | Name or mascot of the team the player is on |
The nfl_penalty dataset contains the average penalty yards per game per team from 2003 - 2020. The data is from TeamRankings.
This dataset did not need to be cleaned. To look at the summary and structure of the data:
NFL Penalty Dataset
Data Dictionary for the NFL Penalty Dataset
| Variable Name | Variable Data Type | Variable Description |
|---|---|---|
| team | character | City or state in which the team originates |
| team_name | character | Name or mascot of the team |
| 2020 | numeric | Average penalty yards per game from 2020 |
| 2019 | numeric | Average penalty yards per game from 2019 |
| 2018 | numeric | Average penalty yards per game from 2018 |
| 2017 | numeric | Average penalty yards per game from 2017 |
| 2016 | numeric | Average penalty yards per game from 2016 |
| 2015 | numeric | Average penalty yards per game from 2015 |
| 2014 | numeric | Average penalty yards per game from 2014 |
| 2013 | numeric | Average penalty yards per game from 2013 |
| 2012 | numeric | Average penalty yards per game from 2012 |
| 2011 | numeric | Average penalty yards per game from 2011 |
| 2010 | numeric | Average penalty yards per game from 2010 |
| 2009 | numeric | Average penalty yards per game from 2009 |
| 2008 | numeric | Average penalty yards per game from 2008 |
| 2007 | numeric | Average penalty yards per game from 2007 |
| 2006 | numeric | Average penalty yards per game from 2006 |
| 2005 | numeric | Average penalty yards per game from 2005 |
| 2004 | numeric | Average penalty yards per game from 2004 |
| 2003 | numeric | Average penalty yards per game from 2003 |
| total | numeric | Total penalty yards |
Now, I wanted to break attendance down on a division-basis. In order to do this, I added a column to the dataset, called “division”.
Once the division column was created, the breakdown of the strongest and weakest fan bases per division can be seen in the table below. Individual graphs for both the AFC and NFC can be seen under the tabs AFC Attendance Breakdown and NFC Attendance Breakdown.
| Division | Strongest Fan Base | Weakest Fan Base |
|---|---|---|
| AFC East | New York Jets | Miami Dolphins |
| AFC North | Baltimore Ravens | Cincinnati Bengals |
| AFC South | Houston Texans | Indianapolis Colts |
| AFC West | Kansas City Chiefs | Los Angeles Chargers |
| NFC East | Dallas Cowboys | Washington Redskins |
| NFC North | Green Bay Packers | Detroit Lions |
| NFC South | New Orleans Saints | Tampa Bay Buccaneers |
| NFC West | Los Angeles Rams | Arizona Cardinals |
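The text does not show how the division column was added or how the strongest and weakest fan bases were picked, so the following is only a sketch of one way it could be done. It assumes a named lookup vector from team to division and assumes that "strongest" means the highest home attendance; the attendance values are made up, and only five teams are included.

```r
# Toy season totals (attendance values are made up)
attendance <- data.frame(
  team_name             = c("Steelers", "Ravens", "Bengals", "Cowboys", "Giants"),
  total_home_attendance = c(500, 560, 420, 720, 630),
  stringsAsFactors = FALSE
)

# Lookup from team name to division (only the teams in the toy data)
division_lookup <- c(
  Steelers = "AFC North", Ravens = "AFC North", Bengals = "AFC North",
  Cowboys  = "NFC East",  Giants = "NFC East"
)
attendance$division <- unname(division_lookup[attendance$team_name])

# For each division, pick the teams with the highest and lowest home attendance
fan_base <- do.call(rbind, lapply(split(attendance, attendance$division), function(d) {
  data.frame(
    division  = d$division[1],
    strongest = d$team_name[which.max(d$total_home_attendance)],
    weakest   = d$team_name[which.min(d$total_home_attendance)],
    stringsAsFactors = FALSE
  )
}))
rownames(fan_base) <- NULL

fan_base
```

With real data spanning many seasons, the attendance would first need to be aggregated per team (for example, averaged across years) before taking the maximum and minimum.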
Given the attendance statistics discussed above, I wanted to see whether stronger home attendance is associated with a higher number of total wins. A team cannot necessarily control its away attendance, as its most loyal fans are assumed to be unlikely to attend away games.
First, I examined whether home attendance is related to total wins. To do so, I used the lm() function to fit a simple linear regression with total_wins as the response variable and total_home_attendance as the predictor variable. I also obtained the correlation coefficient between the two variables. To the right, in the Home Attendance tab, there appears to be a slight, positive linear relationship between the predictor variable (X or total_home_attendance) and the response variable (Y or total_wins). The correlation coefficient between the two variables is 0.1507, and the relationship is statistically significant at the 1% level, with a p-value of 0.000133. The relationship is weak, however: with an R-squared of 0.0227, home attendance explains only about 2.3% of the variation in total wins.
Next, I examined whether away attendance is related to total wins. I followed the same process as for home attendance, fitting a linear model with total_wins as the response variable and total_away_attendance as the predictor variable. From the visualization in the Away Attendance tab, there also appears to be a very slight, positive linear relationship between the predictor variable (X or total_away_attendance) and the response variable (Y or total_wins). The correlation coefficient between the two variables is 0.1274, and the relationship is statistically significant at the 1% level, with a p-value of 0.00126. Again the relationship is weak: with an R-squared of 0.0162, away attendance explains only about 1.6% of the variation in total wins.
# Attach the dataset
attach(joined_data)
# Create linear model for home attendance
home_attendance_model <- lm(total_wins ~ total_home_attendance)
summary(home_attendance_model)
Call:
lm(formula = total_wins ~ total_home_attendance)
Residuals:
Min 1Q Median 3Q Max
-7.7799 -2.2250 -0.0726 2.2478 7.9490
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.225e+00 9.851e-01 4.289 2.07e-05 ***
total_home_attendance 6.955e-06 1.809e-06 3.845 0.000133 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.051 on 636 degrees of freedom
Multiple R-squared: 0.02272, Adjusted R-squared: 0.02118
F-statistic: 14.78 on 1 and 636 DF, p-value: 0.0001327
# Correlation coefficient between total_wins and total_home_attendance
[1] 0.1507174
# Attach the dataset
attach(joined_data)
# Create linear model for away attendance
away_attendance_model <- lm(total_wins ~ total_away_attendance)
summary(away_attendance_model)
Call:
lm(formula = total_wins ~ total_away_attendance)
Residuals:
Min 1Q Median 3Q Max
-7.8922 -2.2928 -0.1177 2.2793 7.7255
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.310e-01 2.571e+00 -0.129 0.89758
total_away_attendance 1.539e-05 4.751e-06 3.238 0.00126 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.061 on 636 degrees of freedom
Multiple R-squared: 0.01622, Adjusted R-squared: 0.01468
F-statistic: 10.49 on 1 and 636 DF, p-value: 0.001265
# Correlation coefficient between total_wins and total_away_attendance
[1] 0.1273652
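The correlation coefficients and the R-squared values in the two summaries above are linked: in a regression with a single predictor, R-squared equals the squared Pearson correlation (0.1507174² ≈ 0.0227 and 0.1273652² ≈ 0.0162). The sketch below checks this on simulated data; the variables `x` and `y` are stand-ins, not the actual joined_data columns.

```r
set.seed(1)  # arbitrary seed so the simulated example is reproducible
n <- 200
x <- rnorm(n, mean = 540000, sd = 60000)  # stand-in for total_home_attendance
y <- 4 + 7e-06 * x + rnorm(n, sd = 3)     # stand-in for total_wins

fit <- lm(y ~ x)
r <- cor(x, y)

# In simple linear regression, R-squared equals the squared Pearson correlation.
stopifnot(isTRUE(all.equal(summary(fit)$r.squared, r^2)))

# cor.test() reports the same p-value as the slope's t-test in summary(fit).
p_slope <- summary(fit)$coefficients["x", "Pr(>|t|)"]
p_cor <- cor.test(x, y)$p.value
stopifnot(isTRUE(all.equal(p_slope, p_cor, tolerance = 1e-6)))

# The same check using the correlation values printed in the report.
reported_r_squared <- c(home = 0.1507174^2, away = 0.1273652^2)
print(round(reported_r_squared, 5))
```

This is why a statistically significant slope can still come with a very small R-squared: with 638 team-seasons, even a correlation of about 0.15 is unlikely to be due to chance, yet it accounts for very little of the variation in wins.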
This part of the analysis will look into the qualities of the division winners and the attributes that teams high in the standings have over teams in the lower portion of the standings. Furthermore, this approach will discover what separates Super Bowl Champions from the 31 other teams each season.
First, I wanted to see which division has brought home the most Super Bowl Championships over the past 20 years. I once again added a “division” column to nfl_standings. As evident in the visualization below, the AFC East has had the most Super Bowl wins between the 2000 and 2020 seasons. This can be largely attributed to the New England Patriots’ former quarterback Tom Brady and current head coach Bill Belichick, who won the championships for the 2001, 2003, 2004, 2014, 2016, and 2018 seasons (the Super Bowls played in 2002, 2004, 2005, 2015, 2017, and 2019). Additionally, the second-best division appears to be the AFC North, with the Pittsburgh Steelers and Baltimore Ravens winning two Super Bowl Championships each. Conversely, the AFC South, NFC North, and NFC West have each won only one Super Bowl over the past two decades.
Analyzing NFL standings with the given datasets is a bit tricky because standings are calculated using tie-breakers when necessary. Additionally, playoff qualification is largely based on division success. As a result, the team with the most wins might not be the team with the best standing. For this analysis, I decided to break the teams down by division and see which ones have been dominant over the years.
I analyzed their success using summary statistics showing the Average Total Wins, Average Total Losses, Average Points Per Game, and Average Opponent Points Per Game. The results can be seen in the tabs AFC Summaries and NFC Summaries. I also developed box plots of total wins per season by division to show the range of data for each team and any relevant outliers. These box plots can be seen in the tabs AFC Box Plots | Total Wins and NFC Box Plots | Total Wins. The box plots are grouped by conference (AFC vs. NFC).
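The per-team averages and box plots described above can be sketched in base R as follows. The data frame `standings`, its column names, and its values are illustrative assumptions, not the actual nfl_standings data.

```r
# Hypothetical miniature of nfl_standings; column names and values are illustrative.
standings <- data.frame(
  team_name = rep(c("Steelers", "Browns"), each = 3),
  year = rep(2017:2019, times = 2),
  wins = c(13, 9, 8, 0, 7, 6),
  losses = c(3, 6, 8, 16, 8, 10)
)

# Average wins and losses per team across seasons.
team_summary <- aggregate(cbind(wins, losses) ~ team_name,
                          data = standings, FUN = mean)
print(team_summary)

# Box plot of wins per season, one box per team.
boxplot(wins ~ team_name, data = standings,
        ylab = "Total wins per season")
```

Seasons with ties count as neither a win nor a loss, which is why average wins and average losses in the summary tables do not always add up to 16.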
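The most dominant team in each division, defined by the highest average total wins (as shown in the AFC Summaries and NFC Summaries tabs), is as follows:
AFC East: Patriots (11.62)
AFC North: Steelers (10.33)
AFC South: Colts (9.90)
AFC West: Broncos (8.90)
NFC East: Eagles (9.24)
NFC North: Packers (10.00)
NFC South: Saints (9.29)
NFC West: Seahawks (9.24)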
I also developed a table of the Super Bowl winners from the 2000 through 2020 seasons with their offensive and defensive rankings. This table can be found in the Rankings of Super Bowl Champions tab.
| Year | Super Bowl Champion | Offensive Ranking | Defensive Ranking |
|---|---|---|---|
| 2000 | Ravens | 0.0 | 8.0 |
| 2001 | Patriots | 1.2 | 3.1 |
| 2002 | Buccaneers | -1.0 | 9.8 |
| 2003 | Patriots | 2.1 | 4.9 |
| 2004 | Patriots | 6.4 | 6.5 |
| 2005 | Steelers | 3.8 | 4.0 |
| 2006 | Colts | 6.9 | -1.1 |
| 2007 | Giants | 2.8 | 0.4 |
| 2008 | Steelers | 1.6 | 8.2 |
| 2009 | Saints | 11.2 | -0.5 |
| 2010 | Packers | 3.1 | 7.9 |
| 2011 | Giants | 3.1 | -1.5 |
| 2012 | Ravens | 1.9 | 1.0 |
| 2013 | Seahawks | 4.1 | 8.9 |
| 2014 | Patriots | 7.5 | 3.5 |
| 2015 | Broncos | 0.3 | 5.5 |
| 2016 | Patriots | 4.3 | 5.0 |
| 2017 | Eagles | 7.0 | 2.5 |
| 2018 | Patriots | 3.1 | 2.1 |
| 2019 | Chiefs | 6.2 | 2.9 |
| 2020 | Buccaneers | 6.5 | 2.8 |
AFC East
| Team Name | Average Total Wins | Average Total Losses | Average Points Per Game | Average Opponent Points Per Game |
|---|---|---|---|---|
| Bills | 7.142857 | 8.857143 | 20.41369 | 22.33631 |
| Dolphins | 7.571429 | 8.428571 | 20.11607 | 21.64881 |
| Jets | 7.142857 | 8.857143 | 19.61607 | 21.72917 |
| Patriots | 11.619048 | 4.380952 | 26.93155 | 18.74405 |
AFC North
| Team Name | Average Total Wins | Average Total Losses | Average Points Per Game | Average Opponent Points Per Game |
|---|---|---|---|---|
| Bengals | 7.095238 | 8.714286 | 20.64583 | 22.41071 |
| Browns | 5.238095 | 10.714286 | 17.69345 | 23.16071 |
| Ravens | 9.571429 | 6.428571 | 22.75298 | 18.31250 |
| Steelers | 10.333333 | 5.571429 | 23.20536 | 18.50893 |
AFC South
| Team Name | Average Total Wins | Average Total Losses | Average Points Per Game | Average Opponent Points Per Game |
|---|---|---|---|---|
| Colts | 9.904762 | 6.095238 | 25.01488 | 22.25893 |
| Jaguars | 6.095238 | 9.904762 | 19.55357 | 22.34226 |
| Texans | 7.105263 | 8.894737 | 21.02632 | 23.01316 |
| Titans | 8.142857 | 7.857143 | 21.80060 | 22.60714 |
AFC West
| Team Name | Average Total Wins | Average Total Losses | Average Points Per Game | Average Opponent Points Per Game |
|---|---|---|---|---|
| Broncos | 8.904762 | 7.095238 | 23.33929 | 21.99405 |
| Chargers | 8.047619 | 7.952381 | 24.05655 | 21.98214 |
| Chiefs | 8.571429 | 7.428571 | 23.58333 | 22.04167 |
| Raiders | 6.333333 | 9.666667 | 20.43452 | 24.71429 |
NFC East
| Team Name | Average Total Wins | Average Total Losses | Average Points Per Game | Average Opponent Points Per Game |
|---|---|---|---|---|
| Cowboys | 8.285714 | 7.714286 | 22.41071 | 21.98810 |
| Eagles | 9.238095 | 6.666667 | 24.03571 | 20.80357 |
| Giants | 7.809524 | 8.190476 | 21.79762 | 22.42857 |
| Redskins | 6.619048 | 9.333333 | 19.55655 | 22.33929 |
NFC North
| Team Name | Average Total Wins | Average Total Losses | Average Points Per Game | Average Opponent Points Per Game |
|---|---|---|---|---|
| Bears | 7.857143 | 8.142857 | 20.38393 | 20.93750 |
| Lions | 5.666667 | 10.285714 | 20.72619 | 24.98810 |
| Packers | 10.000000 | 5.904762 | 25.55357 | 21.33929 |
| Vikings | 8.190476 | 7.714286 | 22.87798 | 22.39881 |
NFC South
| Team Name | Average Total Wins | Average Total Losses | Average Points Per Game | Average Opponent Points Per Game |
|---|---|---|---|---|
| Buccaneers | 7.095238 | 8.904762 | 21.03869 | 21.94643 |
| Falcons | 8.000000 | 7.952381 | 22.71429 | 22.90179 |
| Panthers | 7.714286 | 8.238095 | 21.19048 | 21.78274 |
| Saints | 9.285714 | 6.714286 | 26.13988 | 23.18452 |
NFC West
| Team Name | Average Total Wins | Average Total Losses | Average Points Per Game | Average Opponent Points Per Game |
|---|---|---|---|---|
| 49ers | 7.333333 | 8.619048 | 21.13393 | 22.48512 |
| Cardinals | 6.904762 | 9.000000 | 20.37202 | 23.61607 |
| Rams | 7.333333 | 8.619048 | 21.58631 | 23.45238 |
| Seahawks | 9.238095 | 6.714286 | 23.19643 | 20.69048 |
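Combining the tables from the previous tabs into one table of average statistics, the following leaders can be found:
Highest Average Total Wins: Patriots (11.62)
Highest Average Total Losses: Browns (10.71)
Highest Average Points Per Game: Patriots (26.93)
Lowest Average Opponent Points Per Game: Ravens (18.31)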
For a more in-depth look at each team, please refer to the table below.
| Team Name | Average Total Wins | Average Total Losses | Average Points Per Game | Average Opponent Points Per Game |
|---|---|---|---|---|
| 49ers | 7.333333 | 8.619048 | 21.13393 | 22.48512 |
| Bears | 7.857143 | 8.142857 | 20.38393 | 20.93750 |
| Bengals | 7.095238 | 8.714286 | 20.64583 | 22.41071 |
| Bills | 7.142857 | 8.857143 | 20.41369 | 22.33631 |
| Broncos | 8.904762 | 7.095238 | 23.33929 | 21.99405 |
| Browns | 5.238095 | 10.714286 | 17.69345 | 23.16071 |
| Buccaneers | 7.095238 | 8.904762 | 21.03869 | 21.94643 |
| Cardinals | 6.904762 | 9.000000 | 20.37202 | 23.61607 |
| Chargers | 8.047619 | 7.952381 | 24.05655 | 21.98214 |
| Chiefs | 8.571429 | 7.428571 | 23.58333 | 22.04167 |
| Colts | 9.904762 | 6.095238 | 25.01488 | 22.25893 |
| Cowboys | 8.285714 | 7.714286 | 22.41071 | 21.98810 |
| Dolphins | 7.571429 | 8.428571 | 20.11607 | 21.64881 |
| Eagles | 9.238095 | 6.666667 | 24.03571 | 20.80357 |
| Falcons | 8.000000 | 7.952381 | 22.71429 | 22.90179 |
| Giants | 7.809524 | 8.190476 | 21.79762 | 22.42857 |
| Jaguars | 6.095238 | 9.904762 | 19.55357 | 22.34226 |
| Jets | 7.142857 | 8.857143 | 19.61607 | 21.72917 |
| Lions | 5.666667 | 10.285714 | 20.72619 | 24.98810 |
| Packers | 10.000000 | 5.904762 | 25.55357 | 21.33929 |
| Panthers | 7.714286 | 8.238095 | 21.19048 | 21.78274 |
| Patriots | 11.619048 | 4.380952 | 26.93155 | 18.74405 |
| Raiders | 6.333333 | 9.666667 | 20.43452 | 24.71429 |
| Rams | 7.333333 | 8.619048 | 21.58631 | 23.45238 |
| Ravens | 9.571429 | 6.428571 | 22.75298 | 18.31250 |
| Redskins | 6.619048 | 9.333333 | 19.55655 | 22.33929 |
| Saints | 9.285714 | 6.714286 | 26.13988 | 23.18452 |
| Seahawks | 9.238095 | 6.714286 | 23.19643 | 20.69048 |
| Steelers | 10.333333 | 5.571429 | 23.20536 | 18.50893 |
| Texans | 7.105263 | 8.894737 | 21.02632 | 23.01316 |
| Titans | 8.142857 | 7.857143 | 21.80060 | 22.60714 |
| Vikings | 8.190476 | 7.714286 | 22.87798 | 22.39881 |
The last analysis takes a look at data from the individual NFL games. Using the nfl_games dataset, I investigated the different variables.
Next, to analyze the correlation between different variables, I used the GGally package to produce a detailed scatter plot matrix. The ggpairs() function produced histograms along the diagonal of the matrix, Pearson correlation coefficients in the upper-right, and scatter plots in the lower-left. I analyzed six variables: (1) Points Scored by Winning Team (pts_win); (2) Yards Gained by Winning Team (yds_win); (3) Turnovers Committed by Winning Team (turnovers_win); (4) Points Scored by Losing Team (pts_loss); (5) Yards Gained by Losing Team (yds_loss); and (6) Turnovers Committed by Losing Team (turnovers_loss).
I then grouped these variables by winning team vs. losing team. The correlation matrix for winning teams can be seen in the first tab to the right. As evident through both the scatter plots and the Pearson correlation coefficients, there is little to no relationship between Points Scored by Winning Team and Turnovers Committed by Winning Team, or between Yards Gained by Winning Team and Turnovers Committed by Winning Team. Both of these correlation coefficients are close to zero. On the other hand, there is a moderate, positive relationship between Points Scored by Winning Team and Yards Gained by Winning Team, with a Pearson correlation coefficient of 0.537.
The variables for losing teams, shown in the second tab to the right, behave very similarly. There is little to no relationship between Points Scored by Losing Team and Turnovers Committed by Losing Team, or between Yards Gained by Losing Team and Turnovers Committed by Losing Team. Both of these correlation coefficients are close to zero. On the other hand, there is a moderately strong, positive relationship between Points Scored by Losing Team and Yards Gained by Losing Team, with a Pearson correlation coefficient of 0.632.
The main takeaway from these correlation matrices is that the more yards a team gains, the more points it tends to score. To compare a winning team and a losing team, I wanted to see whether more turnovers by the losing team are associated with more points for the winning team. The third tab to the right shows the linear model with pts_win as the response variable and turnovers_loss as the predictor variable. In this graphic, there is a slight, positive relationship between the Turnovers Committed by Losing Team and Points Scored by Winning Team. The correlation coefficient between the two variables is 0.176.
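For readers without the GGally package, the correlation matrices described above can be approximated in base R. The sketch below is only an illustration on simulated data: the column names follow nfl_games, but the values are generated, and `pairs()` is used as a rough stand-in for `ggpairs()`.

```r
set.seed(42)  # arbitrary seed for a reproducible simulated example
n <- 300

# Simulated stand-ins for three nfl_games columns (not real game data).
yds_win <- round(rnorm(n, mean = 370, sd = 70))
turnovers_win <- rpois(n, lambda = 1)
pts_win <- round(5 + 0.06 * yds_win + rnorm(n, sd = 7))
games <- data.frame(pts_win, yds_win, turnovers_win)

# Pearson correlation matrix: the kind of numbers shown in the upper-right panels.
cor_matrix <- round(cor(games, method = "pearson"), 3)
print(cor_matrix)

# Base R scatter plot matrix, a rough analogue of the lower-left panels.
pairs(games)
```

Each off-diagonal entry of `cor_matrix` corresponds to one pair of variables, so the matrix can be read the same way as the panels in the tabs.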
Looking at the nfl_weather dataset, I wanted to see which teams performed well under certain weather conditions. To do this, I first wanted to observe the average temperature, humidity, and wind speed at each home location. In R, I utilized the dplyr package to tidy my data and create new columns with mutate. To visualize the average temperature, humidity, and wind speed at each location, I created bar graphs for each variable per city.
From the visualizations to the right, it appears that the following five cities have the highest average temperatures:
Miami, Florida – 76.70°F
Detroit, Michigan – 71.64°F
Tampa Bay, Florida – 71.51°F
New Orleans, Louisiana – 71.03°F
Houston, Texas – 71.03°F
The following five cities have the highest humidity percentage:
Seattle, Washington – 79%
San Francisco, California – 71%
Oakland, California – 71%
Green Bay, Wisconsin – 71%
Miami, Florida – 70%
Lastly, the following five cities have the highest winds (in mph):
New England, Massachusetts – 11.54 mph
New York, New York – 10.57 mph
Dallas, Texas – 10.27 mph
Denver, Colorado – 9.96 mph
Buffalo, New York – 9.95 mph
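The per-city averages listed above come from a group-and-average step. A minimal base R sketch of that step is shown below; the data frame `weather`, the `home_city` column name, and all values are assumptions for illustration (the report itself used dplyr).

```r
# Hypothetical miniature of nfl_weather; "home_city" is an assumed column name.
weather <- data.frame(
  home_city = c("Miami", "Miami", "Buffalo", "Buffalo"),
  temperature = c(80, 74, 40, 30),        # degrees Fahrenheit, made-up values
  humidity = c(0.72, 0.68, 0.60, 0.66),   # made-up values
  wind_mph = c(8, 10, 12, 9)              # made-up values
)

# Mean of each weather variable per home city.
city_means <- aggregate(cbind(temperature, humidity, wind_mph) ~ home_city,
                        data = weather, FUN = mean)

# Sort from warmest to coldest, as in a ranked bar chart.
city_means <- city_means[order(-city_means$temperature), ]
print(city_means)

# Bar chart of average temperature per city.
barplot(city_means$temperature, names.arg = city_means$home_city,
        ylab = "Average temperature (degrees F)")
```

The same pattern, with a different sort column, would give the humidity and wind rankings.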
attach(nfl_weather)
cor(total_score, temperature); cor(total_score, humidity); cor(total_score, wind_mph)
# Split the data into training and testing
sample_index <- sample(nrow(nfl_weather), nrow(nfl_weather)*0.70)
weather_train <- nfl_weather[sample_index,]
weather_test <- nfl_weather[-sample_index,]
# Create the linear model
weather_model <- lm(total_score ~ temperature + humidity + wind_mph, data = weather_train)
model_summary <- summary(weather_model)
model_summary
Call:
lm(formula = total_score ~ temperature + humidity + wind_mph,
data = weather_train)
Residuals:
Min 1Q Median 3Q Max
-37.371 -10.303 -1.084 8.629 65.488
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 47.48511 1.36569 34.770 < 2e-16 ***
temperature -0.01548 0.01849 -0.837 0.40268
humidity -3.12415 1.16805 -2.675 0.00753 **
wind_mph -0.28202 0.06450 -4.372 1.28e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 14.45 on 2460 degrees of freedom
Multiple R-squared: 0.02244, Adjusted R-squared: 0.02125
F-statistic: 18.82 on 3 and 2460 DF, p-value: 4.55e-12
# Out-of-sample performance
# Named pred_1 rather than pi to avoid masking R's built-in constant pi
pred_1 <- predict(object = weather_model, newdata = weather_test)
mean((pred_1 - weather_test$total_score)^2) # MSE
[1] 190.3678
# Drop all variables except wind_mph
weather_model_2 <- lm(total_score ~ wind_mph, data = weather_train)
model_summary_2 <- summary(weather_model_2)
model_summary_2
Call:
lm(formula = total_score ~ wind_mph, data = weather_train)
Residuals:
Min 1Q Median 3Q Max
-36.593 -10.377 -0.936 8.773 65.526
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 45.59288 0.47042 96.920 < 2e-16 ***
wind_mph -0.36566 0.05221 -7.004 3.2e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 14.46 on 2462 degrees of freedom
Multiple R-squared: 0.01953, Adjusted R-squared: 0.01914
F-statistic: 49.05 on 1 and 2462 DF, p-value: 3.205e-12
# Out-of-sample performance
pi_2 <- predict(object = weather_model_2, newdata = weather_test)
mean((pi_2 - weather_test$total_score)^2) # MSE
[1] 191.7339
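The two out-of-sample MSE values are easier to interpret on the original scale: taking square roots gives a typical prediction error of roughly 13.8 points per game for both models. The sketch below repeats the split, fit, and evaluation steps on simulated data and adds a mean-only baseline for comparison. The data frame `sim`, the seed, and the baseline are assumptions added for illustration and were not part of the original analysis.

```r
set.seed(2021)  # arbitrary seed; the split in the report was not seeded
n <- 1000

# Simulated stand-in for nfl_weather with a weak wind effect (not real data).
sim <- data.frame(wind_mph = runif(n, min = 0, max = 25))
sim$total_score <- 45 - 0.35 * sim$wind_mph + rnorm(n, sd = 14)

# 70/30 train/test split, mirroring the approach above.
train_idx <- sample(nrow(sim), size = floor(0.70 * nrow(sim)))
train <- sim[train_idx, ]
test <- sim[-train_idx, ]

fit <- lm(total_score ~ wind_mph, data = train)
pred <- predict(fit, newdata = test)

# Out-of-sample MSE of the model.
mse_model <- mean((pred - test$total_score)^2)

# Baseline: always predict the training-set mean score.
mse_baseline <- mean((mean(train$total_score) - test$total_score)^2)

print(c(rmse_model = sqrt(mse_model), rmse_baseline = sqrt(mse_baseline)))

# RMSE implied by the MSE values reported above, in points per game.
reported_rmse <- sqrt(c(full_model = 190.3678, wind_only = 191.7339))
print(round(reported_rmse, 2))
```

Comparing against a mean-only baseline is one way to judge whether a model with an R-squared near 0.02 adds meaningful predictive value beyond simply guessing the average score.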
top_10_teams_playoffs <- nfl_teams_playoffs %>% top_n(10, playoffs) %>%
arrange(desc(playoffs))
top_10_teams_playoffs <- top_10_teams_playoffs[1:10,]
kable(top_10_teams_playoffs)
| team_name | playoffs |
|---|---|
| Patriots | 17 |
| Colts | 15 |
| Packers | 15 |
| Seahawks | 14 |
| Eagles | 13 |
| Ravens | 13 |
| Steelers | 13 |
| Chiefs | 10 |
| Saints | 10 |
| Broncos | 9 |
top_10_coaches_playoffs <- nfl_coaches_playoffs %>% top_n(10, playoffs) %>% arrange(desc(playoffs))
top_10_coaches_playoffs <- top_10_coaches_playoffs[1:10,]
kable(top_10_coaches_playoffs)
| head_coach | playoffs |
|---|---|
| Bill Belichick | 17 |
| Andy Reid | 16 |
| John Harbaugh | 9 |
| Mike McCarthy | 9 |
| Mike Tomlin | 9 |
| Pete Carroll | 9 |
| Sean Payton | 9 |
| Tony Dungy | 9 |
| John Fox | 7 |
| Marvin Lewis | 7 |
top_10_coaches <- nfl_coaches_sb %>% top_n(10, total_sb_coach) %>% arrange(desc(total_sb_coach))
top_10_coaches <- top_10_coaches[1:10,]
kable(top_10_coaches)
| head_coach | total_sb_coach |
|---|---|
| Bill Belichick | 6 |
| Tom Coughlin | 2 |
| Brian Billick | 1 |
| Jon Gruden | 1 |
| Andy Reid | 1 |
| Tony Dungy | 1 |
| Bill Cowher | 1 |
| Sean Payton | 1 |
| Mike Tomlin | 1 |
| Mike McCarthy | 1 |
top_10_qb_playoffs <- nfl_qb_playoffs %>% top_n(10, playoffs) %>% arrange(desc(playoffs))
top_10_qb_playoffs <- top_10_qb_playoffs[1:10,]
kable(top_10_qb_playoffs)
| qb | playoffs |
|---|---|
| Tom Brady | 17 |
| Ben Roethlisberger | 9 |
| Drew Brees | 9 |
| Aaron Rodgers | 8 |
| Russell Wilson | 8 |
| Donovan McNabb | 7 |
| Peyton Manning | 7 |
| Joe Flacco | 6 |
| Matt Hasselbeck | 5 |
| Eli Manning | 5 |
top_10_qb <- nfl_qb_sb %>% top_n(10, total_sb_qb) %>% arrange(desc(total_sb_qb))
top_10_qb <- top_10_qb[1:10,]
kable(top_10_qb)
| qb | total_sb_qb |
|---|---|
| Tom Brady | 7 |
| Peyton Manning | 2 |
| Ben Roethlisberger | 2 |
| Eli Manning | 2 |
| Trent Dilfer | 1 |
| Brad Johnson | 1 |
| Drew Brees | 1 |
| Joe Flacco | 1 |
| Aaron Rodgers | 1 |
| Russell Wilson | 1 |
NFL Passing Yards Leaders
| year | player | yds | team | team_name |
|---|---|---|---|---|
| 2020 | Deshaun Watson | 4823 | Houston | Texans |
| 2019 | Jameis Winston | 5109 | Tampa Bay | Buccaneers |
| 2018 | Ben Roethlisberger | 5129 | Pittsburgh | Steelers |
| 2017 | Tom Brady | 4577 | New England | Patriots |
| 2016 | Drew Brees | 5208 | New Orleans | Saints |
| 2015 | Drew Brees | 4870 | New Orleans | Saints |
| 2014 | Drew Brees | 4952 | New Orleans | Saints |
| 2014 | Ben Roethlisberger | 4952 | Pittsburgh | Steelers |
| 2013 | Peyton Manning | 5477 | Denver | Broncos |
| 2012 | Drew Brees | 5177 | New Orleans | Saints |
| 2011 | Drew Brees | 5476 | New Orleans | Saints |
| 2010 | Philip Rivers | 4710 | San Diego | Chargers |
| 2009 | Matt Schaub | 4770 | Houston | Texans |
| 2008 | Drew Brees | 5069 | New Orleans | Saints |
| 2007 | Tom Brady | 4806 | New England | Patriots |
| 2006 | Drew Brees | 4418 | New Orleans | Saints |
| 2005 | Tom Brady | 4110 | New England | Patriots |
| 2004 | Daunte Culpepper | 4717 | Minnesota | Vikings |
| 2003 | Peyton Manning | 4267 | Indianapolis | Colts |
| 2002 | Rich Gannon | 4689 | Oakland | Raiders |
| 2001 | Kurt Warner | 4830 | St. Louis | Rams |
| 2000 | Peyton Manning | 4413 | Indianapolis | Colts |
NFL Rushing Yards Leaders
| year | player | yds | team | team_name |
|---|---|---|---|---|
| 2020 | Derrick Henry | 2027 | Tennessee | Titans |
| 2019 | Derrick Henry | 1540 | Tennessee | Titans |
| 2018 | Ezekiel Elliott | 1434 | Dallas | Cowboys |
| 2017 | Kareem Hunt | 1327 | Kansas City | Chiefs |
| 2016 | Ezekiel Elliott | 1631 | Dallas | Cowboys |
| 2015 | Adrian Peterson | 1485 | Minnesota | Vikings |
| 2014 | DeMarco Murray | 1845 | Dallas | Cowboys |
| 2013 | LeSean McCoy | 1607 | Philadelphia | Eagles |
| 2012 | Adrian Peterson | 2097 | Minnesota | Vikings |
| 2011 | Maurice Jones-Drew | 1606 | Jacksonville | Jaguars |
| 2010 | Arian Foster | 1616 | Houston | Texans |
| 2009 | Chris Johnson | 2006 | Tennessee | Titans |
| 2008 | Adrian Peterson | 1760 | Minnesota | Vikings |
| 2007 | LaDainian Tomlinson | 1474 | San Diego | Chargers |
| 2006 | LaDainian Tomlinson | 1815 | San Diego | Chargers |
| 2005 | Shaun Alexander | 1880 | Seattle | Seahawks |
| 2004 | Curtis Martin | 1697 | New York | Jets |
| 2003 | Jamal Lewis | 2066 | Baltimore | Ravens |
| 2002 | Ricky Williams | 1853 | Miami | Dolphins |
| 2001 | Priest Holmes | 1555 | Kansas City | Chiefs |
| 2000 | Edgerrin James | 1709 | Indianapolis | Colts |
For penalties, I found a dataset that records each team's average penalty yards per game from 2003 to 2020. I wanted to find out which team was the most penalized.
From the graphic below, it is evident that the Las Vegas Raiders have been the most penalized team in the NFL. The top five most penalized teams are:
Las Vegas Raiders
Baltimore Ravens
Detroit Lions
Tampa Bay Buccaneers
Los Angeles Rams
The least penalized team in the NFL is the Indianapolis Colts.
Which team has won more games in the past 20 years?
| team | games |
|---|---|
| Chicago Bears | 12 |
| Green Bay Packers | 29 |
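A head-to-head count like the one above can be produced by tabulating the winner column of the rivalry's games. The sketch below uses only the first three Bears vs. Packers games from the game table; the data frame name `rivalry` is an assumption, and the original code for this step is not shown in the report.

```r
# First three Bears vs. Packers games from the game table (winner column only).
rivalry <- data.frame(
  year = c(2000, 2000, 2001),
  winner = c("Chicago Bears", "Green Bay Packers", "Green Bay Packers"),
  stringsAsFactors = FALSE
)

# Count wins per team and return a two-column data frame (team, games).
wins <- as.data.frame(table(team = rivalry$winner), responseName = "games")
print(wins)
```

Applied to all 41 games in the table, the same tabulation would reproduce the 12 and 29 totals.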
Individual Game Statistics
| year | week | home_team | away_team | winner | day | date | time | pts_win | pts_loss | yds_win | turnovers_win | yds_loss | turnovers_loss |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000 | 5 | Green Bay Packers | Chicago Bears | Chicago Bears | Sun | October 1 | 16:15:00 | 27 | 24 | 370 | 0 | 364 | 3 |
| 2000 | 14 | Chicago Bears | Green Bay Packers | Green Bay Packers | Sun | December 3 | 20:35:00 | 28 | 6 | 304 | 0 | 330 | 2 |
| 2001 | 9 | Chicago Bears | Green Bay Packers | Green Bay Packers | Sun | November 11 | 13:02:00 | 20 | 12 | 368 | 2 | 262 | 1 |
| 2001 | 13 | Green Bay Packers | Chicago Bears | Green Bay Packers | Sun | December 9 | 13:02:00 | 17 | 7 | 352 | 1 | 189 | 1 |
| 2002 | 5 | Chicago Bears | Green Bay Packers | Green Bay Packers | Mon | October 7 | 21:08:00 | 34 | 21 | 457 | 1 | 380 | 4 |
| 2002 | 13 | Green Bay Packers | Chicago Bears | Green Bay Packers | Sun | December 1 | 13:02:00 | 30 | 20 | 396 | 2 | 304 | 4 |
| 2003 | 4 | Chicago Bears | Green Bay Packers | Green Bay Packers | Mon | September 29 | 21:09:00 | 38 | 23 | 380 | 1 | 361 | 2 |
| 2003 | 14 | Green Bay Packers | Chicago Bears | Green Bay Packers | Sun | December 7 | 13:02:00 | 34 | 21 | 307 | 1 | 275 | 5 |
| 2004 | 2 | Green Bay Packers | Chicago Bears | Chicago Bears | Sun | September 19 | 13:02:00 | 21 | 10 | 307 | 2 | 404 | 3 |
| 2004 | 17 | Chicago Bears | Green Bay Packers | Green Bay Packers | Sun | January 2 | 13:03:00 | 31 | 14 | 387 | 0 | 246 | 1 |
| 2005 | 13 | Chicago Bears | Green Bay Packers | Chicago Bears | Sun | December 4 | 13:05:00 | 19 | 7 | 190 | 2 | 358 | 4 |
| 2005 | 16 | Green Bay Packers | Chicago Bears | Chicago Bears | Sun | December 25 | 17:11:00 | 24 | 17 | 292 | 1 | 365 | 4 |
| 2006 | 1 | Green Bay Packers | Chicago Bears | Chicago Bears | Sun | September 10 | 16:15:00 | 26 | 0 | 361 | 1 | 267 | 3 |
| 2006 | 17 | Chicago Bears | Green Bay Packers | Green Bay Packers | Sun | December 31 | 20:15:00 | 26 | 7 | 373 | 1 | 316 | 6 |
| 2007 | 5 | Green Bay Packers | Chicago Bears | Chicago Bears | Sun | October 7 | 20:24:00 | 27 | 20 | 285 | 1 | 439 | 5 |
| 2007 | 16 | Chicago Bears | Green Bay Packers | Chicago Bears | Sun | December 23 | 13:03:00 | 35 | 7 | 240 | 0 | 274 | 2 |
| 2008 | 11 | Green Bay Packers | Chicago Bears | Green Bay Packers | Sun | November 16 | 13:02:00 | 37 | 3 | 427 | 1 | 234 | 1 |
| 2008 | 16 | Chicago Bears | Green Bay Packers | Chicago Bears | Mon | December 22 | 20:40:00 | 20 | 17 | 210 | 2 | 325 | 2 |
| 2009 | 1 | Green Bay Packers | Chicago Bears | Green Bay Packers | Sun | September 13 | 20:30:00 | 21 | 15 | 226 | 0 | 352 | 4 |
| 2009 | 14 | Chicago Bears | Green Bay Packers | Green Bay Packers | Sun | December 13 | 13:02:00 | 21 | 14 | 315 | 2 | 254 | 2 |
| 2010 | 3 | Chicago Bears | Green Bay Packers | Chicago Bears | Mon | September 27 | 20:40:00 | 20 | 17 | 276 | 1 | 379 | 2 |
| 2010 | 17 | Green Bay Packers | Chicago Bears | Green Bay Packers | Sun | January 2 | 16:15:00 | 10 | 3 | 284 | 2 | 227 | 2 |
| 2010 | NA | Chicago Bears | Green Bay Packers | Green Bay Packers | Sun | January 23 | 15:05:00 | 21 | 14 | 356 | 2 | 301 | 3 |
| 2011 | 3 | Chicago Bears | Green Bay Packers | Green Bay Packers | Sun | September 25 | 16:15:00 | 27 | 17 | 392 | 2 | 291 | 2 |
| 2011 | 16 | Green Bay Packers | Chicago Bears | Green Bay Packers | Sun | December 25 | 20:30:00 | 35 | 21 | 363 | 0 | 441 | 2 |
| 2012 | 2 | Green Bay Packers | Chicago Bears | Green Bay Packers | Thu | September 13 | 20:29:00 | 23 | 10 | 321 | 2 | 168 | 4 |
| 2012 | 15 | Chicago Bears | Green Bay Packers | Green Bay Packers | Sun | December 16 | 13:03:00 | 21 | 13 | 391 | 2 | 190 | 1 |
| 2013 | 9 | Green Bay Packers | Chicago Bears | Chicago Bears | Mon | November 4 | 20:40:00 | 27 | 20 | 442 | 0 | 312 | 1 |
| 2013 | 17 | Chicago Bears | Green Bay Packers | Green Bay Packers | Sun | December 29 | 16:25:00 | 33 | 28 | 473 | 2 | 345 | 2 |
| 2014 | 4 | Chicago Bears | Green Bay Packers | Green Bay Packers | Sun | September 28 | 13:02:00 | 38 | 17 | 358 | 0 | 496 | 2 |
| 2014 | 10 | Green Bay Packers | Chicago Bears | Green Bay Packers | Sun | November 9 | 20:30:00 | 55 | 14 | 451 | 1 | 311 | 3 |
| 2015 | 1 | Chicago Bears | Green Bay Packers | Green Bay Packers | Sun | September 13 | 13:00:00 | 31 | 23 | 322 | 0 | 402 | 1 |
| 2015 | 12 | Green Bay Packers | Chicago Bears | Chicago Bears | Thu | November 26 | 20:30:00 | 17 | 13 | 290 | 0 | 365 | 2 |
| 2016 | 7 | Green Bay Packers | Chicago Bears | Green Bay Packers | Thu | October 20 | 20:26:00 | 26 | 10 | 406 | 1 | 189 | 2 |
| 2016 | 15 | Chicago Bears | Green Bay Packers | Green Bay Packers | Sun | December 18 | 13:00:00 | 30 | 27 | 451 | 0 | 449 | 4 |
| 2017 | 4 | Green Bay Packers | Chicago Bears | Green Bay Packers | Thu | September 28 | 20:25:00 | 35 | 14 | 260 | 0 | 308 | 4 |
| 2017 | 10 | Chicago Bears | Green Bay Packers | Green Bay Packers | Sun | November 12 | 13:00:00 | 23 | 16 | 342 | 0 | 323 | 1 |
| 2018 | 1 | Green Bay Packers | Chicago Bears | Green Bay Packers | Sun | September 9 | 20:20:00 | 24 | 23 | 370 | 2 | 294 | 1 |
| 2018 | 15 | Chicago Bears | Green Bay Packers | Chicago Bears | Sun | December 16 | 13:00:00 | 24 | 17 | 332 | 1 | 323 | 1 |
| 2019 | 1 | Chicago Bears | Green Bay Packers | Green Bay Packers | Thu | September 5 | 20:20:00 | 10 | 3 | 213 | 0 | 254 | 1 |
| 2019 | 15 | Green Bay Packers | Chicago Bears | Green Bay Packers | Sun | December 15 | 13:00:00 | 21 | 13 | 292 | 0 | 415 | 3 |
Which team has won more games in the past 20 years?
| team | games |
|---|---|
| Dallas Cowboys | 19 |
| Philadelphia Eagles | 22 |
Individual Game Statistics
| year | week | home_team | away_team | winner | day | date | time | pts_win | pts_loss | yds_win | turnovers_win | yds_loss | turnovers_loss |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000 | 1 | Dallas Cowboys | Philadelphia Eagles | Philadelphia Eagles | Sun | September 3 | 16:05:00 | 41 | 14 | 425 | 3 | 167 | 2 |
| 2000 | 10 | Philadelphia Eagles | Dallas Cowboys | Philadelphia Eagles | Sun | November 5 | 13:03:00 | 16 | 13 | 357 | 2 | 295 | 2 |
| 2001 | 3 | Philadelphia Eagles | Dallas Cowboys | Philadelphia Eagles | Sun | September 30 | 20:38:00 | 40 | 18 | 276 | 3 | 242 | 5 |
| 2001 | 10 | Dallas Cowboys | Philadelphia Eagles | Philadelphia Eagles | Sun | November 18 | 13:03:00 | 36 | 3 | 227 | 1 | 213 | 4 |
| 2002 | 3 | Philadelphia Eagles | Dallas Cowboys | Philadelphia Eagles | Sun | September 22 | 13:02:00 | 44 | 13 | 447 | 2 | 304 | 4 |
| 2002 | 16 | Dallas Cowboys | Philadelphia Eagles | Philadelphia Eagles | Sat | December 21 | 20:39:00 | 27 | 3 | 359 | 2 | 146 | 3 |
| 2003 | 6 | Dallas Cowboys | Philadelphia Eagles | Dallas Cowboys | Sun | October 12 | 13:03:00 | 23 | 21 | 292 | 1 | 232 | 1 |
| 2003 | 14 | Philadelphia Eagles | Dallas Cowboys | Philadelphia Eagles | Sun | December 7 | 13:03:00 | 36 | 10 | 403 | 0 | 225 | 2 |
| 2004 | 10 | Dallas Cowboys | Philadelphia Eagles | Philadelphia Eagles | Mon | November 15 | 21:09:00 | 49 | 21 | 485 | 0 | 317 | 3 |
| 2004 | 15 | Philadelphia Eagles | Dallas Cowboys | Philadelphia Eagles | Sun | December 19 | 13:02:00 | 12 | 7 | 328 | 3 | 237 | 2 |
| 2005 | 5 | Dallas Cowboys | Philadelphia Eagles | Dallas Cowboys | Sun | October 9 | 16:15:00 | 33 | 10 | 456 | 1 | 129 | 0 |
| 2005 | 10 | Philadelphia Eagles | Dallas Cowboys | Dallas Cowboys | Mon | November 14 | 21:08:00 | 21 | 20 | 241 | 1 | 359 | 1 |
| 2006 | 5 | Philadelphia Eagles | Dallas Cowboys | Philadelphia Eagles | Sun | October 8 | 16:14:00 | 38 | 24 | 383 | 2 | 320 | 5 |
| 2006 | 16 | Dallas Cowboys | Philadelphia Eagles | Philadelphia Eagles | Mon | December 25 | 17:07:00 | 23 | 7 | 426 | 1 | 201 | 3 |
| 2007 | 9 | Philadelphia Eagles | Dallas Cowboys | Dallas Cowboys | Sun | November 4 | 20:23:00 | 38 | 17 | 434 | 1 | 316 | 3 |
| 2007 | 15 | Dallas Cowboys | Philadelphia Eagles | Philadelphia Eagles | Sun | December 16 | 16:15:00 | 10 | 6 | 315 | 1 | 240 | 3 |
| 2008 | 2 | Dallas Cowboys | Philadelphia Eagles | Dallas Cowboys | Mon | September 15 | 20:40:00 | 41 | 37 | 380 | 2 | 337 | 1 |
| 2008 | 17 | Philadelphia Eagles | Dallas Cowboys | Philadelphia Eagles | Sun | December 28 | 16:15:00 | 44 | 6 | 303 | 1 | 298 | 5 |
| 2009 | 9 | Philadelphia Eagles | Dallas Cowboys | Dallas Cowboys | Sun | November 8 | 20:31:00 | 20 | 16 | 358 | 1 | 297 | 2 |
| 2009 | 17 | Dallas Cowboys | Philadelphia Eagles | Dallas Cowboys | Sun | January 3 | 16:15:00 | 24 | 0 | 474 | 1 | 228 | 1 |
| 2009 | NA | Dallas Cowboys | Philadelphia Eagles | Dallas Cowboys | Sat | January 9 | 20:05:00 | 34 | 14 | 426 | 1 | 340 | 4 |
| 2010 | 14 | Dallas Cowboys | Philadelphia Eagles | Philadelphia Eagles | Sun | December 12 | 20:30:00 | 30 | 27 | 429 | 2 | 349 | 2 |
| 2010 | 17 | Philadelphia Eagles | Dallas Cowboys | Dallas Cowboys | Sun | January 2 | 16:15:00 | 14 | 13 | 272 | 1 | 244 | 4 |
| 2011 | 8 | Philadelphia Eagles | Dallas Cowboys | Philadelphia Eagles | Sun | October 30 | 20:28:00 | 34 | 7 | 495 | 0 | 267 | 1 |
| 2011 | 16 | Dallas Cowboys | Philadelphia Eagles | Philadelphia Eagles | Sat | December 24 | 16:15:00 | 20 | 7 | 386 | 1 | 238 | 0 |
| 2012 | 10 | Philadelphia Eagles | Dallas Cowboys | Dallas Cowboys | Sun | November 11 | 16:25:00 | 38 | 23 | 294 | 0 | 369 | 2 |
| 2012 | 13 | Dallas Cowboys | Philadelphia Eagles | Dallas Cowboys | Sun | December 2 | 20:20:00 | 38 | 33 | 417 | 0 | 423 | 1 |
| 2013 | 7 | Philadelphia Eagles | Dallas Cowboys | Dallas Cowboys | Sun | October 20 | 13:02:00 | 17 | 3 | 368 | 2 | 278 | 3 |
| 2013 | 17 | Dallas Cowboys | Philadelphia Eagles | Philadelphia Eagles | Sun | December 29 | 20:30:00 | 24 | 22 | 366 | 1 | 414 | 3 |
| 2014 | 13 | Dallas Cowboys | Philadelphia Eagles | Philadelphia Eagles | Thu | November 27 | 16:36:00 | 33 | 10 | 464 | 1 | 267 | 3 |
| 2014 | 15 | Philadelphia Eagles | Dallas Cowboys | Dallas Cowboys | Sun | December 14 | 20:35:00 | 38 | 27 | 364 | 1 | 294 | 3 |
| 2015 | 2 | Philadelphia Eagles | Dallas Cowboys | Dallas Cowboys | Sun | September 20 | 16:25:00 | 20 | 10 | 359 | 2 | 226 | 3 |
| 2015 | 9 | Dallas Cowboys | Philadelphia Eagles | Philadelphia Eagles | Sun | November 8 | 20:31:00 | 33 | 27 | 459 | 0 | 411 | 1 |
| 2016 | 8 | Dallas Cowboys | Philadelphia Eagles | Dallas Cowboys | Sun | October 30 | 20:31:00 | 29 | 23 | 460 | 1 | 291 | 1 |
| 2016 | 17 | Philadelphia Eagles | Dallas Cowboys | Philadelphia Eagles | Sun | January 1 | 13:00:00 | 27 | 13 | 346 | 0 | 195 | 2 |
| 2017 | 11 | Dallas Cowboys | Philadelphia Eagles | Philadelphia Eagles | Sun | November 19 | 20:30:00 | 37 | 9 | 383 | 0 | 225 | 4 |
| 2017 | 17 | Philadelphia Eagles | Dallas Cowboys | Dallas Cowboys | Sun | December 31 | 13:00:00 | 6 | 0 | 301 | 1 | 219 | 2 |
| 2018 | 10 | Philadelphia Eagles | Dallas Cowboys | Dallas Cowboys | Sun | November 11 | 20:20:00 | 27 | 20 | 410 | 0 | 421 | 1 |
| 2018 | 14 | Dallas Cowboys | Philadelphia Eagles | Dallas Cowboys | Sun | December 9 | 16:25:00 | 29 | 23 | 576 | 3 | 256 | 1 |
| 2019 | 7 | Dallas Cowboys | Philadelphia Eagles | Dallas Cowboys | Sun | October 20 | 20:20:00 | 37 | 10 | 402 | 1 | 283 | 4 |
| 2019 | 16 | Philadelphia Eagles | Dallas Cowboys | Philadelphia Eagles | Sun | December 22 | 16:25:00 | 17 | 9 | 431 | 0 | 311 | 1 |
Which team has won more games in the past 20 years?
| team | games |
|---|---|
| Kansas City Chiefs | 25 |
| Oakland Raiders | 15 |
Individual Game Statistics
| year | week | home_team | away_team | winner | day | date | time | pts_win | pts_loss | yds_win | turnovers_win | yds_loss | turnovers_loss |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000 | 7 | Kansas City Chiefs | Oakland Raiders | Oakland Raiders | Sun | October 15 | 13:00:00 | 20 | 17 | 391 | 1 | 346 | 1 |
| 2000 | 10 | Oakland Raiders | Kansas City Chiefs | Oakland Raiders | Sun | November 5 | 13:15:00 | 49 | 31 | 473 | 0 | 513 | 3 |
| 2001 | 1 | Kansas City Chiefs | Oakland Raiders | Oakland Raiders | Sun | September 9 | 12:01:00 | 27 | 24 | 427 | 3 | 254 | 3 |
| 2001 | 13 | Oakland Raiders | Kansas City Chiefs | Oakland Raiders | Sun | December 9 | 16:15:00 | 28 | 26 | 264 | 5 | 447 | 1 |
| 2002 | 8 | Kansas City Chiefs | Oakland Raiders | Kansas City Chiefs | Sun | October 27 | 13:00:00 | 20 | 10 | 323 | 2 | 417 | 2 |
| 2002 | 17 | Oakland Raiders | Kansas City Chiefs | Oakland Raiders | Sat | December 28 | 17:15:00 | 24 | 0 | 354 | 1 | 176 | 1 |
| 2003 | 7 | Oakland Raiders | Kansas City Chiefs | Kansas City Chiefs | Mon | October 20 | 21:05:00 | 17 | 10 | 319 | 1 | 357 | 3 |
| 2003 | 12 | Kansas City Chiefs | Oakland Raiders | Kansas City Chiefs | Sun | November 23 | 16:15:00 | 27 | 24 | 384 | 0 | 379 | 0 |
| 2004 | 13 | Oakland Raiders | Kansas City Chiefs | Kansas City Chiefs | Sun | December 5 | 16:05:00 | 34 | 27 | 500 | 1 | 364 | 0 |
| 2004 | 16 | Kansas City Chiefs | Oakland Raiders | Kansas City Chiefs | Sat | December 25 | 17:00:00 | 31 | 30 | 433 | 2 | 300 | 1 |
| 2005 | 2 | Oakland Raiders | Kansas City Chiefs | Kansas City Chiefs | Sun | September 18 | 20:30:00 | 23 | 17 | 354 | 1 | 327 | 2 |
| 2005 | 9 | Kansas City Chiefs | Oakland Raiders | Kansas City Chiefs | Sun | November 6 | 13:00:00 | 27 | 23 | 321 | 1 | 263 | 1 |
| 2006 | 11 | Kansas City Chiefs | Oakland Raiders | Kansas City Chiefs | Sun | November 19 | 13:00:00 | 17 | 13 | 292 | 0 | 326 | 1 |
| 2006 | 16 | Oakland Raiders | Kansas City Chiefs | Kansas City Chiefs | Sat | December 23 | 20:00:00 | 20 | 9 | 292 | 1 | 307 | 5 |
| 2007 | 7 | Oakland Raiders | Kansas City Chiefs | Kansas City Chiefs | Sun | October 21 | 16:05:00 | 12 | 10 | 290 | 1 | 268 | 2 |
| 2007 | 12 | Kansas City Chiefs | Oakland Raiders | Oakland Raiders | Sun | November 25 | 13:00:00 | 20 | 17 | 312 | 1 | 292 | 1 |
| 2008 | 2 | Kansas City Chiefs | Oakland Raiders | Oakland Raiders | Sun | September 14 | 13:00:00 | 23 | 8 | 355 | 2 | 190 | 2 |
| 2008 | 13 | Oakland Raiders | Kansas City Chiefs | Kansas City Chiefs | Sun | November 30 | 16:15:00 | 20 | 13 | 301 | 1 | 271 | 2 |
| 2009 | 2 | Kansas City Chiefs | Oakland Raiders | Oakland Raiders | Sun | September 20 | 13:00:00 | 13 | 10 | 166 | 0 | 409 | 2 |
| 2009 | 10 | Oakland Raiders | Kansas City Chiefs | Kansas City Chiefs | Sun | November 15 | 16:05:00 | 16 | 10 | 318 | 3 | 272 | 2 |
| 2010 | 9 | Oakland Raiders | Kansas City Chiefs | Oakland Raiders | Sun | November 7 | 16:15:00 | 23 | 20 | 321 | 3 | 304 | 2 |
| 2010 | 17 | Kansas City Chiefs | Oakland Raiders | Oakland Raiders | Sun | January 2 | 13:02:00 | 31 | 10 | 344 | 1 | 201 | 2 |
| 2011 | 7 | Oakland Raiders | Kansas City Chiefs | Kansas City Chiefs | Sun | October 23 | 19:05:00 | 28 | 0 | 300 | 2 | 322 | 6 |
| 2011 | 16 | Kansas City Chiefs | Oakland Raiders | Oakland Raiders | Sat | December 24 | 13:03:00 | 16 | 13 | 308 | 2 | 435 | 2 |
| 2012 | 8 | Kansas City Chiefs | Oakland Raiders | Oakland Raiders | Sun | October 28 | 16:06:00 | 26 | 16 | 344 | 1 | 299 | 4 |
| 2012 | 15 | Oakland Raiders | Kansas City Chiefs | Oakland Raiders | Sun | December 16 | 16:25:00 | 15 | 0 | 385 | 1 | 119 | 1 |
| 2013 | 6 | Kansas City Chiefs | Oakland Raiders | Kansas City Chiefs | Sun | October 13 | 13:02:00 | 24 | 7 | 216 | 1 | 274 | 3 |
| 2013 | 15 | Oakland Raiders | Kansas City Chiefs | Kansas City Chiefs | Sun | December 15 | 16:05:00 | 56 | 31 | 384 | 1 | 461 | 7 |
| 2014 | 12 | Oakland Raiders | Kansas City Chiefs | Oakland Raiders | Thu | November 20 | 20:26:00 | 24 | 20 | 351 | 1 | 313 | 0 |
| 2014 | 15 | Kansas City Chiefs | Oakland Raiders | Kansas City Chiefs | Sun | December 14 | 13:03:00 | 31 | 13 | 388 | 1 | 280 | 1 |
| 2015 | 13 | Oakland Raiders | Kansas City Chiefs | Kansas City Chiefs | Sun | December 6 | 16:05:00 | 34 | 20 | 232 | 2 | 361 | 3 |
| 2015 | 17 | Kansas City Chiefs | Oakland Raiders | Kansas City Chiefs | Sun | January 3 | 16:26:00 | 23 | 17 | 339 | 2 | 205 | 1 |
| 2016 | 6 | Oakland Raiders | Kansas City Chiefs | Kansas City Chiefs | Sun | October 16 | 16:05:00 | 26 | 10 | 406 | 0 | 285 | 2 |
| 2016 | 14 | Kansas City Chiefs | Oakland Raiders | Kansas City Chiefs | Thu | December 8 | 20:25:00 | 21 | 13 | 323 | 3 | 244 | 0 |
| 2017 | 7 | Oakland Raiders | Kansas City Chiefs | Oakland Raiders | Thu | October 19 | 20:25:00 | 31 | 30 | 505 | 0 | 425 | 0 |
| 2017 | 14 | Kansas City Chiefs | Oakland Raiders | Kansas City Chiefs | Sun | December 10 | 13:00:00 | 26 | 15 | 408 | 1 | 268 | 3 |
| 2018 | 13 | Oakland Raiders | Kansas City Chiefs | Kansas City Chiefs | Sun | December 2 | 16:05:00 | 40 | 33 | 469 | 1 | 442 | 3 |
| 2018 | 17 | Kansas City Chiefs | Oakland Raiders | Kansas City Chiefs | Sun | December 30 | 16:25:00 | 35 | 3 | 409 | 1 | 292 | 4 |
| 2019 | 2 | Oakland Raiders | Kansas City Chiefs | Kansas City Chiefs | Sun | September 15 | 16:05:00 | 28 | 10 | 467 | 1 | 307 | 2 |
| 2019 | 13 | Kansas City Chiefs | Oakland Raiders | Kansas City Chiefs | Sun | December 1 | 16:25:00 | 40 | 9 | 259 | 0 | 332 | 3 |
Which team has won more games in the past 20 years?
| team | games |
|---|---|
| Baltimore Ravens | 22 |
| Pittsburgh Steelers | 22 |
Individual Game Statistics
| year | week | home_team | away_team | winner | day | date | time | pts_win | pts_loss | yds_win | turnovers_win | yds_loss | turnovers_loss |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000 | 1 | Pittsburgh Steelers | Baltimore Ravens | Baltimore Ravens | Sun | September 3 | 13:02:00 | 16 | 0 | 336 | 0 | 223 | 1 |
| 2000 | 9 | Baltimore Ravens | Pittsburgh Steelers | Pittsburgh Steelers | Sun | October 29 | 13:02:00 | 9 | 6 | 231 | 1 | 274 | 3 |
| 2001 | 8 | Pittsburgh Steelers | Baltimore Ravens | Baltimore Ravens | Sun | November 4 | 13:01:00 | 13 | 10 | 183 | 1 | 348 | 1 |
| 2001 | 14 | Baltimore Ravens | Pittsburgh Steelers | Pittsburgh Steelers | Sun | December 16 | 20:35:00 | 26 | 21 | 476 | 0 | 207 | 1 |
| 2001 | NA | Pittsburgh Steelers | Baltimore Ravens | Pittsburgh Steelers | Sun | January 20 | 12:40:00 | 27 | 10 | 297 | 1 | 150 | 4 |
| 2002 | 8 | Baltimore Ravens | Pittsburgh Steelers | Pittsburgh Steelers | Sun | October 27 | 13:02:00 | 31 | 18 | 283 | 1 | 360 | 5 |
| 2002 | 17 | Pittsburgh Steelers | Baltimore Ravens | Pittsburgh Steelers | Sun | December 29 | 13:02:00 | 34 | 31 | 351 | 2 | 422 | 4 |
| 2003 | 1 | Pittsburgh Steelers | Baltimore Ravens | Pittsburgh Steelers | Sun | September 7 | 13:04:00 | 34 | 15 | 339 | 1 | 231 | 2 |
| 2003 | 17 | Baltimore Ravens | Pittsburgh Steelers | Baltimore Ravens | Sun | December 28 | 20:36:00 | 13 | 10 | 279 | 2 | 214 | 5 |
| 2004 | 2 | Baltimore Ravens | Pittsburgh Steelers | Baltimore Ravens | Sun | September 19 | 13:02:00 | 30 | 13 | 259 | 0 | 310 | 3 |
| 2004 | 16 | Pittsburgh Steelers | Baltimore Ravens | Pittsburgh Steelers | Sun | December 26 | 13:01:00 | 20 | 7 | 404 | 2 | 248 | 1 |
| 2005 | 8 | Pittsburgh Steelers | Baltimore Ravens | Pittsburgh Steelers | Mon | October 31 | 21:07:00 | 20 | 19 | 261 | 2 | 318 | 3 |
| 2005 | 11 | Baltimore Ravens | Pittsburgh Steelers | Baltimore Ravens | Sun | November 20 | 13:02:00 | 16 | 13 | 241 | 2 | 282 | 2 |
| 2006 | 12 | Baltimore Ravens | Pittsburgh Steelers | Baltimore Ravens | Sun | November 26 | 13:02:00 | 27 | 0 | 275 | 0 | 172 | 3 |
| 2006 | 16 | Pittsburgh Steelers | Baltimore Ravens | Baltimore Ravens | Sun | December 24 | 13:02:00 | 31 | 7 | 359 | 3 | 251 | 3 |
| 2007 | 9 | Pittsburgh Steelers | Baltimore Ravens | Pittsburgh Steelers | Mon | November 5 | 20:38:00 | 38 | 7 | 291 | 1 | 104 | 4 |
| 2007 | 17 | Baltimore Ravens | Pittsburgh Steelers | Baltimore Ravens | Sun | December 30 | 16:17:00 | 27 | 21 | 334 | 1 | 264 | 3 |
| 2008 | 4 | Pittsburgh Steelers | Baltimore Ravens | Pittsburgh Steelers | Mon | September 29 | 20:40:00 | 23 | 20 | 237 | 1 | 243 | 1 |
| 2008 | 15 | Baltimore Ravens | Pittsburgh Steelers | Pittsburgh Steelers | Sun | December 14 | 16:15:00 | 13 | 9 | 311 | 2 | 202 | 2 |
| 2008 | NA | Pittsburgh Steelers | Baltimore Ravens | Pittsburgh Steelers | Sun | January 18 | 18:43:00 | 23 | 14 | 275 | 1 | 198 | 4 |
| 2009 | 12 | Baltimore Ravens | Pittsburgh Steelers | Baltimore Ravens | Sun | November 29 | 20:30:00 | 20 | 17 | 393 | 2 | 298 | 1 |
| 2009 | 16 | Pittsburgh Steelers | Baltimore Ravens | Pittsburgh Steelers | Sun | December 27 | 13:02:00 | 23 | 20 | 286 | 2 | 323 | 3 |
| 2010 | 4 | Pittsburgh Steelers | Baltimore Ravens | Baltimore Ravens | Sun | October 3 | 13:02:00 | 17 | 14 | 320 | 2 | 210 | 1 |
| 2010 | 13 | Baltimore Ravens | Pittsburgh Steelers | Pittsburgh Steelers | Sun | December 5 | 20:30:00 | 13 | 10 | 288 | 1 | 269 | 1 |
| 2010 | NA | Pittsburgh Steelers | Baltimore Ravens | Pittsburgh Steelers | Sat | January 15 | 16:35:00 | 31 | 24 | 263 | 2 | 126 | 3 |
| 2011 | 1 | Baltimore Ravens | Pittsburgh Steelers | Baltimore Ravens | Sun | September 11 | 13:05:00 | 35 | 7 | 385 | 0 | 312 | 7 |
| 2011 | 9 | Pittsburgh Steelers | Baltimore Ravens | Baltimore Ravens | Sun | November 6 | 20:30:00 | 23 | 20 | 356 | 1 | 392 | 2 |
| 2012 | 11 | Pittsburgh Steelers | Baltimore Ravens | Baltimore Ravens | Sun | November 18 | 20:30:00 | 13 | 10 | 200 | 0 | 309 | 3 |
| 2012 | 13 | Baltimore Ravens | Pittsburgh Steelers | Pittsburgh Steelers | Sun | December 2 | 16:25:00 | 23 | 20 | 366 | 3 | 288 | 2 |
| 2013 | 7 | Pittsburgh Steelers | Baltimore Ravens | Pittsburgh Steelers | Sun | October 20 | 16:25:00 | 19 | 16 | 286 | 1 | 287 | 0 |
| 2013 | 13 | Baltimore Ravens | Pittsburgh Steelers | Baltimore Ravens | Thu | November 28 | 20:31:00 | 22 | 20 | 311 | 0 | 329 | 0 |
| 2014 | 2 | Baltimore Ravens | Pittsburgh Steelers | Baltimore Ravens | Thu | September 11 | 20:28:00 | 26 | 6 | 323 | 0 | 301 | 3 |
| 2014 | 9 | Pittsburgh Steelers | Baltimore Ravens | Pittsburgh Steelers | Sun | November 2 | 20:30:00 | 43 | 23 | 376 | 1 | 332 | 2 |
| 2014 | NA | Pittsburgh Steelers | Baltimore Ravens | Baltimore Ravens | Sat | January 3 | 20:15:00 | 30 | 17 | 299 | 1 | 387 | 3 |
| 2015 | 4 | Pittsburgh Steelers | Baltimore Ravens | Baltimore Ravens | Thu | October 1 | 20:26:00 | 23 | 20 | 356 | 2 | 263 | 0 |
| 2015 | 16 | Baltimore Ravens | Pittsburgh Steelers | Baltimore Ravens | Sun | December 27 | 13:02:00 | 20 | 17 | 386 | 0 | 308 | 3 |
| 2016 | 9 | Baltimore Ravens | Pittsburgh Steelers | Baltimore Ravens | Sun | November 6 | 13:00:00 | 21 | 14 | 274 | 1 | 277 | 1 |
| 2016 | 16 | Pittsburgh Steelers | Baltimore Ravens | Pittsburgh Steelers | Sun | December 25 | 16:30:00 | 31 | 27 | 406 | 2 | 368 | 1 |
| 2017 | 4 | Baltimore Ravens | Pittsburgh Steelers | Pittsburgh Steelers | Sun | October 1 | 13:00:00 | 26 | 9 | 381 | 1 | 288 | 3 |
| 2017 | 14 | Pittsburgh Steelers | Baltimore Ravens | Pittsburgh Steelers | Sun | December 10 | 20:30:00 | 39 | 38 | 545 | 0 | 413 | 1 |
| 2018 | 4 | Pittsburgh Steelers | Baltimore Ravens | Baltimore Ravens | Sun | September 30 | 20:20:00 | 26 | 14 | 451 | 1 | 284 | 2 |
| 2018 | 9 | Baltimore Ravens | Pittsburgh Steelers | Pittsburgh Steelers | Sun | November 4 | 13:00:00 | 23 | 16 | 395 | 0 | 265 | 0 |
| 2019 | 5 | Pittsburgh Steelers | Baltimore Ravens | Baltimore Ravens | Sun | October 6 | 13:00:00 | 26 | 23 | 277 | 3 | 269 | 2 |
| 2019 | 17 | Baltimore Ravens | Pittsburgh Steelers | Baltimore Ravens | Sun | December 29 | 16:25:00 | 28 | 10 | 304 | 2 | 168 | 2 |
Which team has won more games in the past 20 years?
| team | games |
|---|---|
| New York Giants | 27 |
| Washington Redskins | 13 |
Individual Game Statistics
| year | week | home_team | away_team | winner | day | date | time | pts_win | pts_loss | yds_win | turnovers_win | yds_loss | turnovers_loss |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000 | 4 | New York Giants | Washington Redskins | Washington Redskins | Sun | September 24 | 20:37:00 | 16 | 6 | 394 | 0 | 261 | 1 |
| 2000 | 14 | Washington Redskins | New York Giants | New York Giants | Sun | December 3 | 13:01:00 | 9 | 7 | 305 | 3 | 290 | 2 |
| 2001 | 4 | New York Giants | Washington Redskins | New York Giants | Sun | October 7 | 13:04:00 | 23 | 9 | 309 | 4 | 181 | 5 |
| 2001 | 7 | Washington Redskins | New York Giants | Washington Redskins | Sun | October 28 | 16:05:00 | 35 | 21 | 353 | 0 | 388 | 2 |
| 2002 | 11 | New York Giants | Washington Redskins | New York Giants | Sun | November 17 | 13:04:00 | 19 | 17 | 299 | 3 | 166 | 2 |
| 2002 | 14 | Washington Redskins | New York Giants | New York Giants | Sun | December 8 | 13:02:00 | 27 | 21 | 316 | 0 | 447 | 5 |
| 2003 | 3 | Washington Redskins | New York Giants | New York Giants | Sun | September 21 | 16:05:00 | 24 | 21 | 399 | 0 | 456 | 1 |
| 2003 | 14 | New York Giants | Washington Redskins | Washington Redskins | Sun | December 7 | 13:02:00 | 20 | 7 | 288 | 0 | 220 | 3 |
| 2004 | 2 | New York Giants | Washington Redskins | New York Giants | Sun | September 19 | 13:04:00 | 20 | 14 | 277 | 1 | 322 | 7 |
| 2004 | 13 | Washington Redskins | New York Giants | Washington Redskins | Sun | December 5 | 16:16:00 | 31 | 7 | 379 | 0 | 145 | 0 |
| 2005 | 8 | New York Giants | Washington Redskins | New York Giants | Sun | October 30 | 13:06:00 | 36 | 0 | 386 | 1 | 125 | 4 |
| 2005 | 16 | Washington Redskins | New York Giants | Washington Redskins | Sat | December 24 | 13:03:00 | 35 | 20 | 380 | 1 | 332 | 1 |
| 2006 | 5 | New York Giants | Washington Redskins | New York Giants | Sun | October 8 | 13:03:00 | 19 | 3 | 411 | 0 | 164 | 0 |
| 2006 | 17 | Washington Redskins | New York Giants | New York Giants | Sat | December 30 | 20:05:00 | 34 | 28 | 355 | 0 | 393 | 2 |
| 2007 | 3 | Washington Redskins | New York Giants | New York Giants | Sun | September 23 | 16:15:00 | 24 | 17 | 315 | 3 | 260 | 1 |
| 2007 | 15 | New York Giants | Washington Redskins | Washington Redskins | Sun | December 16 | 20:24:00 | 22 | 10 | 309 | 0 | 307 | 1 |
| 2008 | 1 | New York Giants | Washington Redskins | New York Giants | Thu | September 4 | 19:08:00 | 16 | 7 | 354 | 1 | 209 | 0 |
| 2008 | 13 | Washington Redskins | New York Giants | New York Giants | Sun | November 30 | 13:04:00 | 23 | 7 | 404 | 1 | 320 | 2 |
| 2009 | 1 | New York Giants | Washington Redskins | New York Giants | Sun | September 13 | 16:15:00 | 23 | 17 | 351 | 2 | 272 | 2 |
| 2009 | 15 | Washington Redskins | New York Giants | New York Giants | Mon | December 21 | 20:40:00 | 45 | 12 | 387 | 0 | 302 | 3 |
| 2010 | 13 | New York Giants | Washington Redskins | New York Giants | Sun | December 5 | 13:02:00 | 31 | 7 | 358 | 1 | 338 | 6 |
| 2010 | 17 | Washington Redskins | New York Giants | New York Giants | Sun | January 2 | 16:16:00 | 17 | 14 | 325 | 1 | 385 | 4 |
| 2011 | 1 | Washington Redskins | New York Giants | Washington Redskins | Sun | September 11 | 16:23:00 | 28 | 14 | 332 | 1 | 315 | 1 |
| 2011 | 15 | New York Giants | Washington Redskins | Washington Redskins | Sun | December 18 | 13:02:00 | 23 | 10 | 300 | 2 | 324 | 3 |
| 2012 | 7 | New York Giants | Washington Redskins | New York Giants | Sun | October 21 | 13:03:00 | 27 | 23 | 393 | 2 | 480 | 4 |
| 2012 | 13 | Washington Redskins | New York Giants | Washington Redskins | Mon | December 3 | 20:40:00 | 17 | 16 | 370 | 1 | 390 | 0 |
| 2013 | 13 | Washington Redskins | New York Giants | New York Giants | Sun | December 1 | 20:31:00 | 24 | 17 | 286 | 1 | 323 | 1 |
| 2013 | 17 | New York Giants | Washington Redskins | New York Giants | Sun | December 29 | 13:04:00 | 20 | 6 | 278 | 3 | 251 | 4 |
| 2014 | 4 | Washington Redskins | New York Giants | New York Giants | Thu | September 25 | 20:26:00 | 45 | 14 | 449 | 1 | 329 | 6 |
| 2014 | 15 | New York Giants | Washington Redskins | New York Giants | Sun | December 14 | 13:02:00 | 24 | 13 | 287 | 1 | 372 | 1 |
| 2015 | 3 | New York Giants | Washington Redskins | New York Giants | Thu | September 24 | 20:26:00 | 32 | 21 | 363 | 0 | 393 | 3 |
| 2015 | 12 | Washington Redskins | New York Giants | Washington Redskins | Sun | November 29 | 13:03:00 | 20 | 14 | 407 | 0 | 332 | 3 |
| 2016 | 3 | New York Giants | Washington Redskins | Washington Redskins | Sun | September 25 | 13:02:00 | 29 | 27 | 403 | 1 | 457 | 3 |
| 2016 | 17 | Washington Redskins | New York Giants | New York Giants | Sun | January 1 | 16:25:00 | 19 | 10 | 332 | 0 | 284 | 3 |
| 2017 | 12 | Washington Redskins | New York Giants | Washington Redskins | Thu | November 23 | 20:30:00 | 20 | 10 | 323 | 1 | 170 | 1 |
| 2017 | 17 | New York Giants | Washington Redskins | New York Giants | Sun | December 31 | 13:00:00 | 18 | 10 | 381 | 1 | 197 | 3 |
| 2018 | 8 | New York Giants | Washington Redskins | Washington Redskins | Sun | October 28 | 13:00:00 | 20 | 13 | 360 | 1 | 303 | 2 |
| 2018 | 14 | Washington Redskins | New York Giants | New York Giants | Sun | December 9 | 13:00:00 | 40 | 16 | 402 | 1 | 288 | 3 |
| 2019 | 4 | New York Giants | Washington Redskins | New York Giants | Sun | September 29 | 13:00:00 | 24 | 3 | 389 | 4 | 176 | 4 |
| 2019 | 16 | Washington Redskins | New York Giants | New York Giants | Sun | December 22 | 13:00:00 | 41 | 35 | 552 | 0 | 361 | 0 |
Which team has won more games in the past 20 years?
| team | games |
|---|---|
| Cincinnati Bengals | 9 |
| Pittsburgh Steelers | 33 |
Individual Game Statistics
| year | week | home_team | away_team | winner | day | date | time | pts_win | pts_loss | yds_win | turnovers_win | yds_loss | turnovers_loss |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000 | 7 | Pittsburgh Steelers | Cincinnati Bengals | Pittsburgh Steelers | Sun | October 15 | 13:01:00 | 15 | 0 | 274 | 0 | 232 | 3 |
| 2000 | 13 | Cincinnati Bengals | Pittsburgh Steelers | Pittsburgh Steelers | Sun | November 26 | 13:02:00 | 48 | 28 | 372 | 0 | 309 | 3 |
| 2001 | 4 | Pittsburgh Steelers | Cincinnati Bengals | Pittsburgh Steelers | Sun | October 7 | 13:02:00 | 16 | 7 | 413 | 2 | 214 | 1 |
| 2001 | 16 | Cincinnati Bengals | Pittsburgh Steelers | Cincinnati Bengals | Sun | December 30 | 13:02:00 | 26 | 23 | 544 | 3 | 313 | 5 |
| 2002 | 6 | Cincinnati Bengals | Pittsburgh Steelers | Pittsburgh Steelers | Sun | October 13 | 13:02:00 | 34 | 7 | 408 | 2 | 268 | 4 |
| 2002 | 12 | Pittsburgh Steelers | Cincinnati Bengals | Pittsburgh Steelers | Sun | November 24 | 13:01:00 | 29 | 21 | 391 | 0 | 352 | 1 |
| 2003 | 3 | Cincinnati Bengals | Pittsburgh Steelers | Pittsburgh Steelers | Sun | September 21 | 13:02:00 | 17 | 10 | 376 | 1 | 182 | 1 |
| 2003 | 13 | Pittsburgh Steelers | Cincinnati Bengals | Cincinnati Bengals | Sun | November 30 | 13:01:00 | 24 | 20 | 379 | 0 | 384 | 2 |
| 2004 | 4 | Pittsburgh Steelers | Cincinnati Bengals | Pittsburgh Steelers | Sun | October 3 | 13:01:00 | 28 | 17 | 333 | 2 | 293 | 3 |
| 2004 | 11 | Cincinnati Bengals | Pittsburgh Steelers | Pittsburgh Steelers | Sun | November 21 | 13:03:00 | 19 | 14 | 235 | 1 | 209 | 1 |
| 2005 | 7 | Cincinnati Bengals | Pittsburgh Steelers | Pittsburgh Steelers | Sun | October 23 | 13:02:00 | 27 | 13 | 304 | 2 | 302 | 2 |
| 2005 | 13 | Pittsburgh Steelers | Cincinnati Bengals | Cincinnati Bengals | Sun | December 4 | 13:02:00 | 38 | 31 | 324 | 0 | 474 | 4 |
| 2005 | NA | Cincinnati Bengals | Pittsburgh Steelers | Pittsburgh Steelers | Sun | January 8 | 16:36:00 | 31 | 17 | 346 | 0 | 327 | 2 |
| 2006 | 3 | Pittsburgh Steelers | Cincinnati Bengals | Cincinnati Bengals | Sun | September 24 | 13:02:00 | 28 | 20 | 246 | 3 | 365 | 5 |
| 2006 | 17 | Cincinnati Bengals | Pittsburgh Steelers | Pittsburgh Steelers | Sun | December 31 | 13:03:00 | 23 | 17 | 482 | 2 | 295 | 0 |
| 2007 | 8 | Cincinnati Bengals | Pittsburgh Steelers | Pittsburgh Steelers | Sun | October 28 | 13:02:00 | 24 | 13 | 390 | 1 | 296 | 1 |
| 2007 | 13 | Pittsburgh Steelers | Cincinnati Bengals | Pittsburgh Steelers | Sun | December 2 | 20:24:00 | 24 | 10 | 285 | 4 | 249 | 1 |
| 2008 | 7 | Cincinnati Bengals | Pittsburgh Steelers | Pittsburgh Steelers | Sun | October 19 | 13:04:00 | 38 | 10 | 375 | 0 | 212 | 1 |
| 2008 | 12 | Pittsburgh Steelers | Cincinnati Bengals | Pittsburgh Steelers | Thu | November 20 | 20:15:00 | 27 | 10 | 364 | 1 | 208 | 1 |
| 2009 | 3 | Cincinnati Bengals | Pittsburgh Steelers | Cincinnati Bengals | Sun | September 27 | 16:15:00 | 23 | 20 | 273 | 0 | 373 | 1 |
| 2009 | 10 | Pittsburgh Steelers | Cincinnati Bengals | Cincinnati Bengals | Sun | November 15 | 13:02:00 | 18 | 12 | 218 | 0 | 226 | 1 |
| 2010 | 9 | Cincinnati Bengals | Pittsburgh Steelers | Pittsburgh Steelers | Mon | November 8 | 20:40:00 | 27 | 21 | 314 | 2 | 272 | 2 |
| 2010 | 14 | Pittsburgh Steelers | Cincinnati Bengals | Pittsburgh Steelers | Sun | December 12 | 13:02:00 | 23 | 7 | 354 | 0 | 190 | 3 |
| 2011 | 10 | Cincinnati Bengals | Pittsburgh Steelers | Pittsburgh Steelers | Sun | November 13 | 13:02:00 | 24 | 17 | 328 | 1 | 279 | 2 |
| 2011 | 13 | Pittsburgh Steelers | Cincinnati Bengals | Pittsburgh Steelers | Sun | December 4 | 13:02:00 | 35 | 7 | 295 | 0 | 232 | 2 |
| 2012 | 7 | Cincinnati Bengals | Pittsburgh Steelers | Pittsburgh Steelers | Sun | October 21 | 20:30:00 | 24 | 17 | 431 | 2 | 185 | 1 |
| 2012 | 16 | Pittsburgh Steelers | Cincinnati Bengals | Cincinnati Bengals | Sun | December 23 | 13:02:00 | 13 | 10 | 267 | 3 | 280 | 3 |
| 2013 | 2 | Cincinnati Bengals | Pittsburgh Steelers | Cincinnati Bengals | Mon | September 16 | 20:41:00 | 20 | 10 | 407 | 0 | 278 | 2 |
| 2013 | 15 | Pittsburgh Steelers | Cincinnati Bengals | Pittsburgh Steelers | Sun | December 15 | 20:30:00 | 30 | 20 | 290 | 1 | 279 | 1 |
| 2014 | 14 | Cincinnati Bengals | Pittsburgh Steelers | Pittsburgh Steelers | Sun | December 7 | 13:02:00 | 42 | 21 | 543 | 0 | 408 | 2 |
| 2014 | 17 | Pittsburgh Steelers | Cincinnati Bengals | Pittsburgh Steelers | Sun | December 28 | 20:30:00 | 27 | 17 | 346 | 3 | 337 | 3 |
| 2015 | 8 | Pittsburgh Steelers | Cincinnati Bengals | Cincinnati Bengals | Sun | November 1 | 13:02:00 | 16 | 10 | 296 | 2 | 356 | 3 |
| 2015 | 14 | Cincinnati Bengals | Pittsburgh Steelers | Pittsburgh Steelers | Sun | December 13 | 13:02:00 | 33 | 20 | 354 | 1 | 385 | 3 |
| 2015 | NA | Cincinnati Bengals | Pittsburgh Steelers | Pittsburgh Steelers | Sat | January 9 | 20:15:00 | 18 | 16 | 369 | 2 | 279 | 4 |
| 2016 | 2 | Pittsburgh Steelers | Cincinnati Bengals | Pittsburgh Steelers | Sun | September 18 | 13:02:00 | 24 | 16 | 374 | 2 | 412 | 2 |
| 2016 | 15 | Cincinnati Bengals | Pittsburgh Steelers | Pittsburgh Steelers | Sun | December 18 | 13:00:00 | 24 | 20 | 382 | 0 | 222 | 1 |
| 2017 | 7 | Pittsburgh Steelers | Cincinnati Bengals | Pittsburgh Steelers | Sun | October 22 | 16:25:00 | 29 | 14 | 420 | 0 | 179 | 2 |
| 2017 | 13 | Cincinnati Bengals | Pittsburgh Steelers | Pittsburgh Steelers | Mon | December 4 | 20:30:00 | 23 | 20 | 374 | 1 | 353 | 0 |
| 2018 | 6 | Cincinnati Bengals | Pittsburgh Steelers | Pittsburgh Steelers | Sun | October 14 | 13:00:00 | 28 | 21 | 481 | 0 | 275 | 0 |
| 2018 | 17 | Pittsburgh Steelers | Cincinnati Bengals | Pittsburgh Steelers | Sun | December 30 | 16:25:00 | 16 | 13 | 343 | 1 | 196 | 0 |
| 2019 | 4 | Pittsburgh Steelers | Cincinnati Bengals | Pittsburgh Steelers | Mon | September 30 | 20:15:00 | 27 | 3 | 326 | 1 | 175 | 2 |
| 2019 | 12 | Cincinnati Bengals | Pittsburgh Steelers | Pittsburgh Steelers | Sun | November 24 | 13:00:00 | 16 | 10 | 338 | 1 | 244 | 2 |
In this analysis, the main goal was to understand what goes into winning an NFL game and which teams have been historically successful in the standings. I broke the analysis into several sections, including, but not limited to: (1) The Importance of Fan Attendance; (2) Standings over the Years; (3) Offense vs. Defense; and (4) Individual Game Observations.
Through extensive use of R, I investigated eight datasets containing various information about the NFL. I frequently used linear modeling to examine correlations between variables across these datasets. Additionally, the ggplot2 package delivered great visualizations to showcase this breakdown of the NFL. I also created new variables and tables to drill deeper into the raw data. One of my primary focuses was a breakdown of the divisions and their successes over the past 20 years. Box plot visualizations for the two conferences illuminated how teams have fared in the win column from their best season to their worst season.
My first analysis looked into NFL fan attendance. I created graphical representations to better understand which teams have a strong fan base and how consistently fans show up from year to year. From this analysis, it was evident that the Dallas Cowboys have the strongest fan base and the Los Angeles Chargers have the weakest. Additionally, higher game attendance correlated positively with a team’s total wins per season.
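To make the attendance-versus-wins check concrete, here is a minimal sketch in base R. The column names `total_attendance` and `wins`, the team labels, and every number below are invented for illustration; they are not values from `nfl_attendance` or `nfl_standings`, and this is not necessarily how the original analysis was coded.

```r
# Toy data, invented for illustration only (NOT real NFL figures).
attendance <- data.frame(
  team = c("A", "B", "C", "D", "E"),
  year = 2019,
  total_attendance = c(1000000, 1050000, 980000, 1100000, 1020000)
)
standings <- data.frame(
  team = c("A", "B", "C", "D", "E"),
  year = 2019,
  wins = c(6, 9, 4, 12, 8)
)

# Join attendance to wins on team and season.
team_seasons <- merge(attendance, standings, by = c("team", "year"))

# Pearson correlation between season attendance and season wins.
attendance_wins_cor <- cor(team_seasons$total_attendance, team_seasons$wins)

# cor.test() adds a p-value and confidence interval for the same estimate.
attendance_test <- cor.test(team_seasons$total_attendance, team_seasons$wins)
```

A positive coefficient only indicates association; winning teams may draw more fans rather than the other way around.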
Secondly, I focused on the divisional standings through the years. As mentioned above, box plot visualizations by division showed the range of success for NFL teams. Per division, these teams have had the most success based on the nfl_standings dataset:
Using geom_col(), I observed that the AFC East has won the most Super Bowl Championships. This is due to the phenomenal success of Tom Brady and the New England Patriots during this time period.
Next, I researched one of the most common arguments in football: is the offense or the defense more important? I ran linear models on several variables in the nfl_standings data. High offensive and defensive rankings both correlate with more wins. Even though a great offense and a great defense are both important, the correlation tests indicated that a better offense is slightly more important to a team’s success than a better defense. I created a table of the last 20 Super Bowl Champions showing each team’s offensive_ranking and defensive_ranking. As evidenced by this table, teams have been trending towards better offenses in the last few years.
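The offense-versus-defense comparison can be sketched as follows. The column names `offensive_ranking` and `defensive_ranking` come from the report; the `wins` column name and all of the values are made up for illustration, with higher ratings assumed to mean a better unit.

```r
# Toy standings, invented for illustration only (NOT rows from nfl_standings).
# Assumption: a higher rating means a better offense or defense.
standings <- data.frame(
  wins = c(13, 11, 10, 9, 8, 7, 5, 3),
  offensive_ranking = c(8.1, 5.0, 3.2, 1.5, -0.4, -1.0, -4.2, -6.5),
  defensive_ranking = c(3.0, 2.5, -1.0, 1.8, 0.5, -2.2, -1.5, -3.9)
)

# Linear model of wins on both ratings at once.
fit <- lm(wins ~ offensive_ranking + defensive_ranking, data = standings)

# Pairwise correlations of each rating with wins, for a side-by-side comparison.
cors <- c(
  offense = cor(standings$wins, standings$offensive_ranking),
  defense = cor(standings$wins, standings$defensive_ranking)
)

summary(fit)$coefficients
cors
```

Comparing the two correlations, or the two coefficients when the ratings share a scale, is one simple way to judge which side of the ball tracks wins more closely.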
I then observed individual game data in the NFL. Through graphs created by ggpairs(), I was able to view correlation coefficients for six variables. The main conclusion I deduced from this is that a positive correlation exists between yards gained and points scored.
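As a lightweight stand-in for the `ggpairs()` view, the same kind of correlation coefficients can be computed with base R's `cor()`. The six rows below are the 2017 to 2019 Steelers-Bengals games from the rivalry table in this report; treating these six columns as the "six variables" is an assumption, and six games are far too few to draw conclusions from.

```r
# Six Steelers-Bengals games (2017-2019) copied from the rivalry table in this report.
games <- data.frame(
  pts_win        = c(29, 23, 28, 16, 27, 16),
  pts_loss       = c(14, 20, 21, 13, 3, 10),
  yds_win        = c(420, 374, 481, 343, 326, 338),
  turnovers_win  = c(0, 1, 0, 1, 1, 1),
  yds_loss       = c(179, 353, 275, 196, 175, 244),
  turnovers_loss = c(2, 0, 0, 0, 2, 2)
)

# Pairwise Pearson correlations between all six variables.
cor_matrix <- cor(games)
round(cor_matrix, 2)

# The yards-versus-points relationship for the winning side.
cor_matrix["yds_win", "pts_win"]
```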
Extending my analysis, I looked into how weather conditions play a role in game outcomes. I found the average temperature, humidity, and wind for each location where teams play. I also trained and tested a model with a 70-30 split to see whether weather conditions predict if a game will be high-scoring or low-scoring. I concluded that weather conditions offer little predictive power for game outcomes, as both the in-sample and out-of-sample performance of my models was underwhelming.
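A 70-30 train/test workflow of this kind might look like the sketch below. The report does not name the model family, so logistic regression via `glm()` is an assumption here, as are the column names (`temperature`, `humidity`, `wind`, `high_scoring`). The data is simulated, with the outcome drawn independently of the weather, so it says nothing about real games.

```r
# Simulated data for illustration only (NOT the nfl_weather dataset).
set.seed(42)
n <- 200
weather_games <- data.frame(
  temperature  = rnorm(n, mean = 60, sd = 15),
  humidity     = runif(n, min = 20, max = 100),
  wind         = rexp(n, rate = 1 / 8),
  high_scoring = rbinom(n, size = 1, prob = 0.5)  # 1 = high-scoring, 0 = low-scoring
)

# 70-30 split: sample 70% of the row indices for training, keep the rest for testing.
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- weather_games[train_idx, ]
test  <- weather_games[-train_idx, ]

# Logistic regression (an assumed model choice) of the scoring label on weather.
model <- glm(high_scoring ~ temperature + humidity + wind,
             data = train, family = binomial)

# Share of games classified correctly at a 0.5 probability cutoff.
accuracy <- function(model, data) {
  predicted <- as.integer(predict(model, newdata = data, type = "response") > 0.5)
  mean(predicted == data$high_scoring)
}

in_sample_acc  <- accuracy(model, train)
out_sample_acc <- accuracy(model, test)
```

Comparing `in_sample_acc` with `out_sample_acc` shows whether the model generalizes; accuracy near the base rate on the test set suggests the predictors carry little signal.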
I also analyzed league leaders over the past 20 years – this includes teams, head coaches, quarterbacks, passing yards leaders, rushing yards leaders, and penalty leaders. Understanding who the game-changers are is important when trying to predict which team will win.
Lastly, I analyzed popular rivalries in the NFL to see which teams have been dominant. My personal favorite finding is that the Pittsburgh Steelers are 33-9 against the Cincinnati Bengals over the past 20 years.
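A head-to-head tally like the ones in the rivalry tables can be sketched in base R. The `home_team`, `away_team`, and `winner` columns match the game tables in this report; the helper name `head_to_head()` is hypothetical, and the five rows are a small sample of the Steelers-Bengals games listed above rather than the full series.

```r
# Five Steelers-Bengals games copied from the rivalry table in this report.
games <- data.frame(
  year = c(2015, 2018, 2018, 2019, 2019),
  week = c(8, 6, 17, 4, 12),
  home_team = c("Pittsburgh Steelers", "Cincinnati Bengals", "Pittsburgh Steelers",
                "Pittsburgh Steelers", "Cincinnati Bengals"),
  away_team = c("Cincinnati Bengals", "Pittsburgh Steelers", "Cincinnati Bengals",
                "Cincinnati Bengals", "Pittsburgh Steelers"),
  winner = c("Cincinnati Bengals", "Pittsburgh Steelers", "Pittsburgh Steelers",
             "Pittsburgh Steelers", "Pittsburgh Steelers"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count wins for each side across all games between two teams.
head_to_head <- function(games, team_a, team_b) {
  is_matchup <- (games$home_team == team_a & games$away_team == team_b) |
    (games$home_team == team_b & games$away_team == team_a)
  matchup <- games[is_matchup, ]
  # Fixing the factor levels keeps a team in the result even if it has zero wins.
  tally <- table(factor(matchup$winner, levels = c(team_a, team_b)))
  data.frame(team = names(tally), games = as.integer(tally))
}

record <- head_to_head(games, "Cincinnati Bengals", "Pittsburgh Steelers")
record
```

Filtering on both home/away orderings matters because each rivalry is played in both stadiums every season.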
As a big NFL fan, I found it incredibly interesting to see how the league has evolved over my lifetime. It was also intriguing to see my favorite team’s success over this time span. The NFL is a multibillion-dollar industry with implications on many levels. With fans across the globe, a deep dive into the NFL is exciting for many groups, and each could take something different from this analysis to better understand the league’s recent history. Coaches and players could prepare more effectively for their opponents, gamblers could make more educated bets, general managers could identify their team’s needs in the Draft, and the common fan could revel in their team’s history.
This data tells a phenomenal story of the state of the NFL. However, it is a game for a reason. No one will ever be able to fully predict NFL outcomes, and that is what makes the sport as intriguing as it is!
---
title: 'More than Touchdowns: An NFL Data Analysis'
output:
  flexdashboard::flex_dashboard:
    source_code: embed
    social: menu
    theme: flatly
    vertical_layout: fill
---
Project Introduction {data-navmenu="Background" data-orientation=rows}
=============================================================================
Row
-----------------------------------------------------------------------------
### **Breaking Down the NFL**
From a young age, I have watched NFL football and cheered for my team every week. Given my dad is from Pittsburgh and my parents met in Pittsburgh, I was naturally raised a lifelong Steelers fan. With that said, I not only wanted to compare my team individually, but also the league as a whole. The past 20 years have seen successes and failures from every NFL team. When choosing an option for my final capstone project, I was instantly drawn to extending a project I had previously done on the NFL. The NFL has a plethora of data points publicly available. I thought, "What all goes into winning an NFL game, and what teams are historically successful in the final standings?" Using the past 20 years worth of data, I sought to investigate this problem.
### **My Focus**
As mentioned, for my final capstone project, I am expanding upon my final project from Data Wrangling in R (BANA 7025) with Professor Tianhai Zu. I originally worked with a partner on this project; however, the extension will be my own individual work. I plan on using functions in R to deliver overall summary statistics on games and standings. Additionally, I will use the data to explore potential correlations and plot the corresponding data visualizations. Using descriptive analysis of the past 20 years, I am looking to see whether NFL teams show predictive tendencies.
This analysis covers the 2000-2019 seasons. I added 2020 season data to every dataset except `nfl_attendance` and `nfl_games`, as these would be skewed by the impact of COVID-19. The pandemic caused games to be rescheduled to different days and times or canceled, and it led to little or no attendance depending on location.
This NFL analysis consists of eight individual datasets:
1. **NFL Attendance** (`nfl_attendance`)
2. **NFL Standings** (`nfl_standings`)
3. **NFL Games** (`nfl_games`)
4. **NFL Weather** (`nfl_weather`)
5. **NFL Playoff Coaches and Quarterbacks** (`nfl_playoffs`)
6. **NFL Passing Yards Leaders** (`nfl_passing`)
7. **NFL Rushing Yards Leaders** (`nfl_rushing`)
8. **NFL Penalty Yards Per Game** (`nfl_penalty`)
More detailed information about each dataset can be found in the *Data Preparation* tab.
Row
-----------------------------------------------------------------------------
### **Goal**
The NFL is a multi-billion dollar industry. Millions of fans across the world cheer for these 32 teams every year. People are now looking for ways to understand the game better.
Coaches want to understand what makes a team more successful. Sports gamblers want to get an edge and make the correct picks based on more than just gut feelings. Fans want to know if their team is progressing in the right direction. This analysis is useful for all of these situations. Using descriptive analysis, past results can be better explained. From these trends, one can form expectations about how NFL games and seasons may unfold. Although no one can see into the future, understanding the data sheds a better light on the probability of certain results occurring in the NFL.
The goal of my analysis is to inform my readers on what all goes into winning an NFL game. My hope is that the audience will finish reading my report and better understand historic trends and performance from teams, players, and coaches alike. As a final capstone project, I hope to demonstrate proficiency in R using R Markdown as well as flexdashboard with Shiny components.
Analytical Technique and Approach {data-navmenu="Background"}
=============================================================================
### **Analytical Approach**
The datasets contain loads of information for the NFL. With a wide range of variables, many options are available to analytically investigate the NFL. With the eight datasets at hand, I looked to compare them to draw conclusions about team performances. To see if statistical significance or rational conclusions related to the NFL could be realized, the following situations were explored:
* **The Importance of Fan Attendance** - This data analysis will look into whether the number of fans in attendance correlates with a team's success (number of games won). Additionally, it will provide comparisons of how teams fare in home versus away games while keeping in mind the attendance at those games.
* **Standings over the Years** - The NFL has two conferences: the American Football Conference (AFC) and the National Football Conference (NFC). Each conference contains four divisions with four teams in each division. Each division then has a winner over the 16-game regular season. This analysis will look into the qualities of the division winners and the attributes that teams high in the standings have over teams in the lower portion of the standings. Furthermore, this approach will explore what separates Super Bowl Champions from the 31 other teams each season.
* **Offense vs. Defense** - The two main parts of an NFL team are the offense and defense. The goal for each team is to be great on both sides. However, this is rarely the case. Using individual game data and season-long statistics, a thorough breakdown of how having a great offense or defense improves teams will be given. I will also see whether having a better offense or a better defense has been more critical to success over the years.
* **Individual Game Observations** - The `nfl_games` dataset contains many variables for games. Turnovers, day of the week, points, etc. are shown for every match-up. Correlations into why teams win or lose will be the goal of this analysis. Using a plethora of variables, significance of certain variables will be essential for further understanding.
* **The Impact of Weather on Game Outcomes** - The `nfl_weather` dataset contains information on both the home and away teams from 2000 - 2013. This dataset also includes three weather-related variables: (1) temperature, (2) humidity, and (3) wind speed (in mph). I want to see which teams perform best under certain weather conditions. Additionally, I hope to create a few linear models to see if weather conditions can predict whether a game will be high-scoring or low-scoring.
* **Successful Teams, Head Coaches, and Quarterbacks** - The `nfl_playoffs` dataset includes information of teams who went to the playoffs from 2000 - 2020. This dataset also includes the Super Bowl Champions. I am curious to analyze trends regarding the coaches and quarterbacks who led the teams to success. Are certain quarterbacks consistently better-performing? Are there better head coaches than others?
* **Passing Yards Leaders** - The `nfl_passing` dataset includes information from the past 20 years on the players with the most passing yards. Which player has performed consistently over the past 20 years? Who is the "best"?
* **Rushing Yards Leaders** - The `nfl_rushing` dataset has the same information as the `nfl_passing` dataset, except it focuses on rushing yards instead of passing yards. Which players had the most rushing yards each year from 2000 - 2020?
* **Average Penalty Yards Per Game** - Penalties are game-changers when it comes to success in a football game. One mistake can lead to an automatic first down compared to what could have been a fourth down and ten yards. In this analysis, I want to see which teams have consistently lost yards in games due to penalties.
* **Rivalry Analysis** - Every sports fanatic knows the top rivalries in the NFL. Regardless of whether you are a fan of these teams, many will tune into the game as the level of intensity is typically higher. With that said, I wanted to see which teams have been dominant in their respective rivalries.
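As a rough illustration of the weather item in the list above, a linear model of total points on the three weather variables might look like the sketch below. The data frame `toy_weather` and all of its values are invented purely for illustration, and the column names (`temperature`, `humidity`, `wind_mph`, `total_points`) are assumptions rather than the actual `nfl_weather` columns.

```r
# Toy data: every value here is made up for illustration only
toy_weather <- data.frame(
  temperature  = c(35, 48, 62, 71, 80, 28, 55, 66),
  humidity     = c(60, 72, 55, 40, 65, 80, 50, 45),
  wind_mph     = c(18, 10, 5, 3, 7, 22, 9, 4),
  total_points = c(30, 41, 47, 52, 44, 27, 45, 50)
)

# Fit a linear model: total points explained by the weather conditions
weather_fit <- lm(total_points ~ temperature + humidity + wind_mph,
                  data = toy_weather)

# Each coefficient is the estimated change in points per unit of that variable
coef(weather_fit)

# Share of the variance in total points explained by the model
summary(weather_fit)$r.squared
```

With real data, the same call would be pointed at the cleaned weather dataset, and the coefficients would need to be read alongside their standard errors before drawing any conclusion.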
### **Packages Required**
This project requires a variety of packages. Given there are over 10,000 packages in R, I want to focus on the ones that will provide me with the best results while cleaning and interpreting the data.
Some packages will be more useful than others. For example, `ggplot2` allows for great visualizations that provide better understanding of the data. Additionally, `dplyr` can drill deeper into the eight datasets to come to conclusions that may be hidden at first. R has powerful functions that can derive explanations for questions to massive datasets. Please see below for all of the packages loaded for this analysis:
```{r, message = FALSE, warning = FALSE, echo = TRUE}
# Packages required
library(tidyverse) # Use to tidy data
library(dplyr) # Use to manipulate data
library(ggplot2) # Use to plot data and create visualizations
library(tibble) # Use to manipulate and re-imagine data
library(readr) # Use to import data cleanly and efficiently
library(DT) # Use to create comprehensive data tables with HTML output
library(knitr) # Use for dynamic report generation
library(base) # Contains Base R functions
library(ggthemes) # Use themes in data visualizations
library(plotly) # Use to plot data and create visualizations
library(ggpubr) # Use to show multiple plots at once
library(GGally) # Use to produce scatter plot matrix
library(rmarkdown) # Use to produce report
library(flexdashboard) # Use to produce flexdashboard
library(stringr) # Provides functions to work with strings
library(highcharter) # Includes shortcut functions to plot R objects
library(shinythemes) # Use to implement themes for output
```
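Before knitting, it can help to confirm that the packages are actually installed. The sketch below is one possible check; the vector `required_pkgs` lists only a subset of the packages above and is named here for illustration.

```r
# A subset of the packages loaded above, listed here for illustration
required_pkgs <- c("tidyverse", "DT", "knitr", "plotly", "flexdashboard")

# requireNamespace() returns TRUE or FALSE without attaching the package
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)

# Names of any packages that would still need install.packages()
missing_pkgs <- required_pkgs[!is_installed]
missing_pkgs
```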
Importing the Data {data-navmenu="Background" data-orientation=rows}
=============================================================================
### **Importing the Data**
Most of the data (`nfl_attendance`, `nfl_standings`, and `nfl_games`) was obtained from my professor, Tianhai Zu, for the Data Wrangling in R class. He had provided four different datasets from which to choose, and my partner and I chose the NFL option. These datasets can be found on [GitHub](https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-02-04/readme.md). Reading the information on GitHub led me to find the original source of the data, which is [Pro Football Reference Standings](https://www.pro-football-reference.com/years/2019/index.htm) and [Pro Football Reference Attendance](https://www.pro-football-reference.com/years/2019/attendance.htm).
This NFL analysis consists of eight individual datasets - (1) `nfl_attendance`, (2) `nfl_standings`, (3) `nfl_games`, (4) `nfl_weather`, (5) `nfl_playoffs`, (6) `nfl_passing`, (7) `nfl_rushing`, and (8) `nfl_penalty`.
I first merged two of the datasets (`nfl_attendance` and `nfl_standings`) into one dataframe called `nfl_df`. The `nfl_games` dataset is kept separate, because it is organized by home and away team rather than by the `team` and `team_name` columns used as join keys. I decided it might be beneficial to have multiple frames of reference, some utilizing individual datasets, and another looking at the combined dataframe. Rather than using `str()` and `summary()` to show descriptive statistics for each variable, I decided to create comprehensive tables. Then, in the *Data Preparation* tab, I cleaned every dataset.
```{r, message = FALSE, warning = FALSE, echo = TRUE, cache = TRUE}
# Get working directory
getwd()
# Get the data
nfl_attendance <- readr::read_csv('attendance.csv')
nfl_standings <- readr::read_csv('updatedstandings.csv')
nfl_games <- readr::read_csv('games.csv')
nfl_weather <- readr::read_csv('weather.csv')
nfl_playoffs <- readr::read_csv('post_season.csv')
nfl_passing <- readr::read_csv('passing_yards_leaders.csv')
nfl_rushing <- readr::read_csv('rushing_yards_leaders.csv')
nfl_penalty <- readr::read_csv('penalty_yards_per_game.csv')
# To use 2020 data you need to update tidytuesdayR from GitHub
# Install via devtools::install_github("thebioengineer/tidytuesdayR")
# Either call below loads the same week, so only one is needed
tuesdata <- tidytuesdayR::tt_load('2020-02-04')
# tuesdata <- tidytuesdayR::tt_load(2020, week = 6)
attendance <- tuesdata$attendance
# Join attendance and standings with dplyr
# left_join() takes exactly two tables; a third positional argument is not joined,
# and nfl_games has no team / team_name columns to join on
nfl_df <- dplyr::left_join(nfl_attendance, nfl_standings, by = c("year", "team_name", "team"))
```
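One way to sanity-check a join like this is to look for rows whose keys find no match. The sketch below uses two small invented tables (`toy_attendance` and `toy_standings`, with made-up values) and assumes the same three key columns used above.

```r
library(dplyr)

# Toy tables: values are made up for illustration only
toy_attendance <- tibble(
  team      = c("Pittsburgh", "Pittsburgh", "Cleveland"),
  team_name = c("Steelers", "Steelers", "Browns"),
  year      = c(2018, 2019, 2019),
  total     = c(1000, 1100, 900)
)
toy_standings <- tibble(
  team      = c("Pittsburgh", "Pittsburgh"),
  team_name = c("Steelers", "Steelers"),
  year      = c(2018, 2019),
  wins      = c(9, 8)
)

join_keys <- c("year", "team_name", "team")

# left_join() combines exactly two tables; a third table needs a second join
toy_joined <- left_join(toy_attendance, toy_standings, by = join_keys)

# anti_join() returns the attendance rows that have no matching standings row
unmatched <- anti_join(toy_attendance, toy_standings, by = join_keys)
unmatched
```

In the toy data, the Cleveland row has no standings match, so it appears in `unmatched` and carries `NA` for `wins` in `toy_joined`.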
Data Preparation {data-navmenu="Background" data-orientation=rows}
=============================================================================
Row {.tabset .tabset-fade}
-----------------------------------------------------------------------------
### **Attendance**
As mentioned above, the `nfl_attendance` dataset was imported and obtained from Pro Football Reference. The original data contains 10,846 observations and eight variables. There are two character type variables, `team` and `team_name`. There are six numeric type variables, `year`, `total`, `home`, `away`, `week`, and `weekly_attendance`. The data was collected from 2000 - 2019, and the values for the columns were observed during the 17 weeks of the NFL season.
```{r, message = FALSE, warning = FALSE, echo = TRUE, results = 'hide'}
# Examine the structure of the dataset
datatable(head(nfl_attendance, 10))
# Create a data dictionary for attendance
var_names_att <- colnames(nfl_attendance)
var_types_att <- lapply(nfl_attendance, class) # lapply returns a list of the same length as X (a vector)
var_descriptions_att <- c("City or state in which the team originates", "Name or mascot of the team", "Year", "Total attendance per season", "Total attendance at home games per season", "Total attendance at away games per season", "Week in which game was played", "Attendance for given week")
data_dict_att <- as_tibble(cbind(var_names_att, var_types_att, var_descriptions_att))
colnames(data_dict_att) <- c("Variable Name", "Variable Data Type", "Variable Description")
kable(data_dict_att) # kable returns a single table for a single data object
```
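Because the same data dictionary steps are repeated for every dataset, they could be wrapped in a small helper. The function `make_data_dict()` below is a hypothetical helper written for illustration, not part of any package, and `toy_df` is an invented data frame.

```r
# Hypothetical helper: builds a data dictionary from a data frame
# and a vector of descriptions (one per column)
make_data_dict <- function(df, descriptions) {
  # Fail early if the descriptions do not line up with the columns
  stopifnot(length(descriptions) == ncol(df))
  data.frame(
    "Variable Name" = names(df),
    "Variable Data Type" = unname(vapply(df, function(col) class(col)[1], character(1))),
    "Variable Description" = descriptions,
    check.names = FALSE,
    stringsAsFactors = FALSE
  )
}

# Toy data frame: values are made up for illustration only
toy_df <- data.frame(
  team = c("Pittsburgh", "Cleveland"),
  year = c(2019, 2019),
  stringsAsFactors = FALSE
)
toy_dict <- make_data_dict(toy_df, c("City or state of the team", "Year"))
toy_dict
```

The length check guards against a description vector silently falling out of step with the columns after a column is added or dropped.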
Looking at the missing data values, the only column in which missing values exist is `weekly_attendance`. This makes sense, as each NFL team has a bye week during the regular season. I decided to omit these rows, as they would skew the data and misrepresent the trends for each team.
```{r, message = FALSE, warning = FALSE, echo = TRUE, results = 'hide'}
colSums(is.na(nfl_attendance)) # Find the number of missing values per column
nfl_attendance <- na.omit(nfl_attendance)
colSums(is.na(nfl_attendance)) # Confirm there are no missing values
```
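Note that `na.omit()` drops a row when any column is missing. That is fine here because only one column contains missing values, but a filter on the single column makes the intent explicit. The sketch below uses an invented data frame `toy_att` with made-up values.

```r
# Toy data: values are made up for illustration only
toy_att <- data.frame(
  team_name = c("Steelers", "Steelers", "Steelers"),
  week = c(1, 2, 3),
  weekly_attendance = c(65000, NA, 64000),
  stringsAsFactors = FALSE
)

# na.omit() drops a row if ANY column in that row is NA
dropped_any <- na.omit(toy_att)

# Filtering on one column only drops rows where that column is NA
dropped_bye <- toy_att[!is.na(toy_att$weekly_attendance), ]

nrow(dropped_any)
nrow(dropped_bye)
```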
Looking at the original dataset above, I first decided to rename the columns to better describe the data.
```{r, message = FALSE, warning = FALSE, echo = TRUE, results = 'hide'}
nfl_attendance <- nfl_attendance %>% dplyr::rename(
team_location = team,
total_attendance = total,
total_home_attendance = home,
total_away_attendance = away
)
```
Additionally, I split it into two dataframes, the first omitting the weekly data, and the second omitting the season totals. This decision was made largely to remove duplicates, and I knew it would bode well for better visualizations during the exploratory data analysis (EDA).
The first dataset, `nfl_total_attendance`, drops the two weekly columns, `week` and `weekly_attendance`, and shows the season attendance totals for each team. The second dataset, `nfl_weekly_attendance`, drops the season total columns, `total_attendance`, `total_home_attendance`, and `total_away_attendance` (originally `total`, `home`, and `away`).
```{r, message = FALSE, warning = FALSE, echo = TRUE, results = 'hide'}
nfl_total_attendance <- nfl_attendance[-c(7, 8)] # Remove weekly data
nfl_total_attendance <- nfl_total_attendance[!duplicated(nfl_total_attendance), ] # Remove duplicates
datatable(head(nfl_total_attendance, 10))
nfl_weekly_attendance <- nfl_attendance[-c(4, 5, 6)] # Remove season total attendance data
datatable(head(nfl_weekly_attendance, 10))
```
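Removing columns by position works, but it breaks silently if the column order ever changes. As an alternative sketch, the same split can be written by column name with `dplyr`; the tibble `toy_att` below is invented for illustration and contains only a few of the real columns.

```r
library(dplyr)

# Toy data: values are made up for illustration only
toy_att <- tibble(
  team_name         = c("Steelers", "Steelers", "Browns", "Browns"),
  year              = c(2019, 2019, 2019, 2019),
  total_attendance  = c(1000, 1000, 900, 900),
  week              = c(1, 2, 1, 2),
  weekly_attendance = c(500, 500, 450, 450)
)

# Season totals: drop weekly columns by name, then keep one row per team-season
toy_total <- toy_att %>%
  select(-week, -weekly_attendance) %>%
  distinct()

# Weekly data: drop the season total column by name
toy_weekly <- toy_att %>%
  select(-total_attendance)
```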
Now, for a summary of the two datasets and associated tables of the ***CLEANED*** data, please see below.
**NFL Total Attendance Dataset**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Examine the final summary and structure of the nfl_total_attendance dataset
datatable(head(nfl_total_attendance, 10))
```
**Data Dictionary for the NFL Total Attendance Dataset**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Create a data dictionary for nfl_total_attendance
var_names_att <- colnames(nfl_total_attendance)
var_types_att <- lapply(nfl_total_attendance, class) # lapply returns a list of the same length as X (a vector)
var_descriptions_att <- c("City or state in which the team originates", "Name or mascot of the team", "Year", "Total attendance per season", "Total attendance at home games per season", "Total attendance at away games per season")
data_dict_total_att <- as_tibble(cbind(var_names_att, var_types_att, var_descriptions_att))
colnames(data_dict_total_att) <- c("Variable Name", "Variable Data Type", "Variable Description")
kable(data_dict_total_att) # kable returns a single table for a single data object
```
**NFL Weekly Attendance Dataset**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Examine the final summary and structure of the nfl_weekly_attendance dataset
datatable(head(nfl_weekly_attendance, 10))
```
**Data Dictionary for the NFL Weekly Attendance Dataset**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Create a data dictionary for nfl_weekly_attendance
var_names_att <- colnames(nfl_weekly_attendance)
var_types_att <- lapply(nfl_weekly_attendance, class) # lapply returns a list of the same length as X (a vector)
var_descriptions_att <- c("City or state in which the team originates", "Name or mascot of the team", "Year", "Week in which game was played", "Attendance for given week")
data_dict_weekly_att <- as_tibble(cbind(var_names_att, var_types_att, var_descriptions_att))
colnames(data_dict_weekly_att) <- c("Variable Name", "Variable Data Type", "Variable Description")
kable(data_dict_weekly_att) # kable returns a single table for a single data object
```
### **Standings**
The `nfl_standings` dataset was imported and obtained from Pro Football Reference. The original data contains 638 observations and 15 variables. There are four character type variables, `team`, `team_name`, `playoffs`, and `sb_winner`. There are 11 numeric type variables, `year`, `wins`, `loss`, `points_for`, `points_against`, `points_differential`, `margin_of_victory`, `strength_of_schedule`, `simple_rating`, `offensive_ranking`, and `defensive_ranking`. The data observed was collected from 2000 - 2020. The process of cleaning the ***ORIGINAL*** data can be seen below.
```{r, message = FALSE, warning = FALSE, echo = TRUE, results = 'hide'}
# Examine the structure of the dataset
datatable(head(nfl_standings, 10))
# Create a data dictionary for standings
var_names_st <- colnames(nfl_standings)
var_types_st <- lapply(nfl_standings, class) # lapply returns a list of the same length as X (a vector)
var_descriptions_st <- c("City or state in which the team originates", "Name or mascot of the team", "Year", "Total wins per season (0 to 16)", "Total losses per season (0 to 16)", "Total points the team scored per season", "Total points the opponent scored on the team per season", "The difference between the total points for the team and against the team", "Points differential divided by the total number of games per season", "Difficulty of schedule based on opponent records", "A rating for the team that takes into account points differential and strength of schedule (measured by Simple Rating System)", "A rating comparing how well the offense performs to opponent teams (measured by Simple Rating System)", "A rating comparing how well the defense performs to opponent teams (measured by Simple Rating System)", "Stating whether or not the team made it to the playoffs", "Stating whether or not the team won the Super Bowl for the season")
data_dict_st <- as_tibble(cbind(var_names_st, var_types_st, var_descriptions_st))
colnames(data_dict_st) <- c("Variable Name", "Variable Data Type", "Variable Description")
kable(data_dict_st) # kable returns a single table for a single data object
```
Looking at the above dataset, I first decided to change the column names to better describe the data.
```{r, message = FALSE, warning = FALSE, echo = TRUE, results = 'hide'}
nfl_standings <- nfl_standings %>% dplyr::rename(
team_location = team,
total_wins = wins,
total_losses = loss
)
```
It is important to note as well that a few of the variables are calculated values. The `points_differential` is calculated as `points_differential = points_for - points_against`. Additionally, `margin_of_victory` is calculated as `margin_of_victory = (points_for - points_against) / games_played`.
Lastly, the `simple_rating` is calculated by: $$SRS = MoV + SoS = OSRS + DSRS$$
In layman's terms, the simple rating is equal to the margin of victory (MoV) plus the strength of schedule (SoS). This is also equal to the offensive simple rating (OSRS) plus the defensive simple rating (DSRS).
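To make the arithmetic concrete, here is a small worked example. All of the numbers are invented for illustration and do not describe any real team or season.

```r
# Toy season line: every value is made up for illustration only
points_for <- 400
points_against <- 320
games_played <- 16
strength_of_schedule <- -0.5  # assumed SoS value

# Points differential, then margin of victory (per-game differential)
points_differential <- points_for - points_against
margin_of_victory <- points_differential / games_played

# Simple rating: SRS = MoV + SoS
simple_rating <- margin_of_victory + strength_of_schedule

points_differential  # 80
margin_of_victory    # 5
simple_rating        # 4.5
```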
Next, I wanted to see what the sum of missing values was per column. As evident below, there are no missing values.
```{r, message = FALSE, warning = FALSE, echo = TRUE, results = 'hide'}
colSums(is.na(nfl_standings))
```
Moving forward, I decided to change both the `playoffs` and `sb_winner` to binary variables. This is because they both only have two unique values.
```{r, message = FALSE, warning = FALSE, echo = TRUE, results = 'hide'}
unique(nfl_standings$playoffs, incomparables = FALSE) # View the unique values for the playoffs column
unique(nfl_standings$sb_winner, incomparables = FALSE) # View the unique values for the sb_winner column
```
Knowing this, I changed the two columns to binary variables. For the `playoffs` column, a value of one stands for "Playoffs", and a value of zero stands for "No Playoffs".
```{r, message = FALSE, warning = FALSE, echo = TRUE, results = 'hide'}
nfl_standings$playoffs[nfl_standings$playoffs == "Playoffs"] <- "1"
nfl_standings$playoffs[nfl_standings$playoffs == "No Playoffs"] <- "0"
nfl_standings$playoffs <- as.numeric(nfl_standings$playoffs)
```
For the `sb_winner` column, a value of one denotes "Won Superbowl", and a value of zero denotes "No Superbowl".
```{r, message = FALSE, warning = FALSE, echo = TRUE, results = 'hide'}
nfl_standings$sb_winner[nfl_standings$sb_winner == "Won Superbowl"] <- "1"
nfl_standings$sb_winner[nfl_standings$sb_winner == "No Superbowl"] <- "0"
nfl_standings$sb_winner <- as.numeric(nfl_standings$sb_winner)
```
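The same kind of recoding can also be done in a single step by converting a logical comparison to an integer. This is an alternative sketch on an invented vector, `toy_playoffs`, not the code used in this report.

```r
# Toy column: values mirror the two labels described above
toy_playoffs <- c("Playoffs", "No Playoffs", "Playoffs")

# A logical comparison converts directly to 1 (TRUE) or 0 (FALSE)
toy_binary <- as.integer(toy_playoffs == "Playoffs")
toy_binary
```

One caveat with this shortcut is that any value other than the exact label, including a misspelling, becomes zero, so it is worth checking the unique values first, as done above.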
Now, for a summary of the dataset and associated table of the data, please see the ***CLEANED*** dataset below.
**NFL Standings Dataset**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Examine the structure of the dataset
datatable(head(nfl_standings, 10))
```
**Data Dictionary for the NFL Standings Dataset**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Create a data dictionary for standings
var_names_st <- colnames(nfl_standings)
var_types_st <- lapply(nfl_standings, class) # lapply returns a list of the same length as X (a vector)
var_descriptions_st <- c("City or state in which the team originates", "Name or mascot of the team", "Year", "Total wins per season (0 to 16)", "Total losses per season (0 to 16)", "Total points the team scored per season", "Total points the opponent scored on the team per season", "The difference between the total points for the team and against the team", "Points differential divided by the total number of games per season", "Difficulty of schedule based on opponent records", "A rating for the team that takes into account points differential and strength of schedule (measured by Simple Rating System)", "A rating comparing how well the offense performs to opponent teams (measured by Simple Rating System)", "A rating comparing how well the defense performs to opponent teams (measured by Simple Rating System)", "Stating whether or not the team made it to the playoffs", "Stating whether or not the team won the Super Bowl for the season")
data_dict_st <- as_tibble(cbind(var_names_st, var_types_st, var_descriptions_st))
colnames(data_dict_st) <- c("Variable Name", "Variable Data Type", "Variable Description")
kable(data_dict_st) # kable returns a single table for a single data object
```
### **Games**
Once again, the `nfl_games` data was imported and obtained from Pro Football Reference. The original data contains 5,324 observations and 19 variables. There are 11 character variables, `week`, `home_team`, `away_team`, `winner`, `tie`, `day`, `date`, `home_team_name`, `home_team_city`, `away_team_name`, and `away_team_city`. There are seven numeric type variables, `year`, `pts_win`, `pts_loss`, `yds_win`, `turnovers_win`, `yds_loss`, and `turnovers_loss`. The remaining variable, `time`, holds the time of day at which the game was played. See the ***ORIGINAL*** dataset below.
```{r, message = FALSE, warning = FALSE, echo = TRUE, results = 'hide'}
# Examine the structure of the dataset
datatable(head(nfl_games, 10))
# Create a data dictionary for games
var_names_games <- colnames(nfl_games)
var_types_games <- lapply(nfl_games, class) # lapply returns a list of the same length as X (a vector)
var_descriptions_games <- c("Year", "Week of the season in which the game was played", "Home team for the game", "Away team for the game", "Winner of the game", "Was there a tie? (if so, the other team will be listed in this column)", "Day of the week in which the game was played", "Date of the game", "Time of the day in which the game was played", "Number of points the winning team scored", "Number of points the losing team scored", "Total number of yards the winning team had", "Total number of turnovers the winning team had", "Total number of yards the losing team had", "Total number of turnovers the losing team had", "Name or mascot of the home team", "City of the home team", "Name or mascot of the away team", "City of the away team")
data_dict_games <- as_tibble(cbind(var_names_games, var_types_games, var_descriptions_games))
colnames(data_dict_games) <- c("Variable Name", "Variable Data Type", "Variable Description")
kable(data_dict_games) # kable returns a single table for a single data object
```
Looking at the above dataset, the first step I took to clean the data was to remove the last four unnecessary columns, as I felt they were redundant.
```{r, message = FALSE, warning = FALSE, echo = TRUE, results = 'hide'}
names(nfl_games)
nfl_games <- nfl_games[-c(16, 17, 18, 19)] # Remove redundant columns
names(nfl_games)
```
Then, I changed the `week` column to be numeric.
```{r, message = FALSE, warning = FALSE, echo = TRUE, results = 'hide'}
nfl_games$week <- as.numeric(nfl_games$week)
```
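Since `week` was read in as a character column, it may contain values that are not numbers, and `as.numeric()` turns those into `NA`. A quick check like the sketch below shows which values would be lost. The vector `toy_week` and its non-numeric label are hypothetical.

```r
# Toy week column: the non-numeric label is hypothetical
toy_week <- c("1", "2", "17", "Playoff")

# Attempt the conversion quietly, then see which inputs failed to convert
week_num <- suppressWarnings(as.numeric(toy_week))
failed <- unique(toy_week[is.na(week_num) & !is.na(toy_week)])
failed
```

If `failed` is not empty on the real data, those rows will carry `NA` in `week` after the conversion and may need separate handling.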
Looking at missing values, the only column which contained them was the `tie` column. This makes sense, as very few NFL games result in a tie.
A tie was denoted by listing one team name in the `winner` column and the opposing team name in the `tie` column. To fix this, I identified every game that resulted in a tie and, for those games, changed the value in the `winner` column to "Tie". The `tie` column was then removed.
```{r, message = FALSE, warning = FALSE, echo = TRUE, results = 'hide'}
colSums(is.na(nfl_games))
unique(nfl_games$tie, incomparables = FALSE)
nfl_games$winner[!is.na(nfl_games$tie)] <- "Tie" # A non-missing tie value marks a tied game
nfl_games <- nfl_games[-c(6)] # Remove the tie column
colSums(is.na(nfl_games)) # Confirm there are no missing values
```
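The tie handling can be illustrated on a tiny example. The data frame `toy_games` below is invented, with made-up rows, and keeps only the two columns involved.

```r
# Toy data: rows are made up for illustration only
toy_games <- data.frame(
  winner = c("Pittsburgh Steelers", "Cleveland Browns"),
  tie = c(NA, "Pittsburgh Steelers"),
  stringsAsFactors = FALSE
)

# Rows with a non-missing tie value are tied games
is_tie <- !is.na(toy_games$tie)
toy_games$winner[is_tie] <- "Tie"

# Drop the tie column by name once it is no longer needed
toy_games$tie <- NULL
toy_games
```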
To view the summary and structure of the ***CLEANED*** data:
**NFL Games Dataset**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Examine the structure of the dataset
datatable(head(nfl_games, 10))
```
**Data Dictionary for the NFL Games Dataset**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Create a data dictionary for games
var_names_games <- colnames(nfl_games)
var_types_games <- lapply(nfl_games, class) # lapply returns a list of the same length as X (a vector)
var_descriptions_games <- c("Year", "Week of the season in which the game was played", "Home team for the game", "Away team for the game", "Winner of the game", "Day of the week in which the game was played", "Date of the game", "Time of the day in which the game was played", "Number of points the winning team scored", "Number of points the losing team scored", "Total number of yards the winning team had", "Total number of turnovers the winning team had", "Total number of yards the losing team had", "Total number of turnovers the losing team had")
data_dict_games <- as_tibble(cbind(var_names_games, var_types_games, var_descriptions_games))
colnames(data_dict_games) <- c("Variable Name", "Variable Data Type", "Variable Description")
kable(data_dict_games) # kable returns a single table for a single data object
```
### **Weather**
Incorporating weather data into my analysis is an interesting next step. I want to see how the weather impacts the outcome of individual games. The `nfl_weather` data is from [NFLsavant.com](http://nflsavant.com/about.php). All data and statistics from this site are compiled from publicly available NFL play-by-play data on the Internet. The one drawback is that this data only runs through 2013; however, I thought 13 years of data was enough to reveal any significant trends.
The original data contains 3,521 observations and 13 variables. The variables are described in the data dictionary below. See the ***ORIGINAL*** NFL Weather data below.
```{r, message = FALSE, warning = FALSE, echo = TRUE, results = 'hide'}
# Examine the structure of the dataset
datatable(head(nfl_weather, 10))
# Create a data dictionary for weather
var_names_w <- colnames(nfl_weather)
var_types_w <- lapply(nfl_weather, class) # lapply returns a list of the same length as X (a vector)
var_descriptions_w <- c("Full home team name", "City or state in which the home team originates", "Name or mascot of the home team", "Total points scored by the home team", "Full away team name", "City or state in which the away team originates", "Name or mascot of the away team", "Total points scored by the away team", "Winner of the game", "Temperature during the game (in Fahrenheit)", "Humidity percentage during the game", "Wind speed in miles per hour (mph) during the game", "Date of the game played")
data_dict_w <- as_tibble(cbind(var_names_w, var_types_w, var_descriptions_w))
colnames(data_dict_w) <- c("Variable Name", "Variable Data Type", "Variable Description")
kable(data_dict_w) # kable returns a single table for a single data object
```
Looking at the above dataset, the first step I took to clean the data was to remove the `home_team` and `away_team` columns, as I felt they were redundant.
```{r, message = FALSE, warning = FALSE, echo = TRUE, results = 'hide'}
names(nfl_weather)
nfl_weather <- nfl_weather[-c(1, 5)] # Remove redundant columns
names(nfl_weather)
```
To view the summary and structure of the ***CLEANED*** data:
**NFL Weather Dataset**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Examine the structure of the dataset
datatable(head(nfl_weather, 10))
```
**Data Dictionary for the NFL Weather Dataset**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Create a data dictionary for weather
var_names_w <- colnames(nfl_weather)
var_types_w <- lapply(nfl_weather, class) # lapply returns a list of the same length as X (a vector)
var_descriptions_w <- c("City or state in which the home team originates", "Name or mascot of the home team", "Total points scored by the home team", "City or state in which the away team originates", "Name or mascot of the away team", "Total points scored by the away team", "Winner of the game", "Temperature during the game (in Fahrenheit)", "Humidity percentage during the game", "Wind speed in miles per hour (mph) during the game", "Date of the game played")
data_dict_w <- as_tibble(cbind(var_names_w, var_types_w, var_descriptions_w))
colnames(data_dict_w) <- c("Variable Name", "Variable Data Type", "Variable Description")
kable(data_dict_w) # kable returns a single table for a single data object
```
### **Playoffs**
The next dataset within my analysis is the `nfl_playoffs` dataset. This looks into the coaches and quarterbacks for each team that went to the playoffs from 2000 - 2020. I created this dataset myself through research.
The original data contains nine variables, which are described in the data dictionary below. See the ***ORIGINAL*** NFL Playoffs data below.
```{r, message = FALSE, warning = FALSE, echo = TRUE, results = 'hide'}
# Examine the structure of the dataset
datatable(head(nfl_playoffs, 10))
# Create a data dictionary for playoffs
var_names_playoffs <- colnames(nfl_playoffs)
var_types_playoffs <- lapply(nfl_playoffs, class) # lapply returns a list of the same length as X (a vector)
var_descriptions_playoffs <- c("City or state in which the team originates", "Name or mascot of the team", "Year", "Total wins for the team", "Total losses for the team", "Whether or not the team went to the Playoffs", "Whether or not the team won the Super Bowl", "Head coach of the team", "Starting quarterback during the postseason")
data_dict_playoffs <- as_tibble(cbind(var_names_playoffs, var_types_playoffs, var_descriptions_playoffs))
colnames(data_dict_playoffs) <- c("Variable Name", "Variable Data Type", "Variable Description")
kable(data_dict_playoffs) # kable returns a single table for a single data object
```
Moving forward, I decided to change `sb_winner` to a binary variable, because it only has two unique values. Because the only unique value in the `playoffs` column is "Playoffs", I decided to drop that column.
```{r, message = FALSE, warning = FALSE, echo = TRUE, results = 'hide'}
unique(nfl_playoffs$playoffs, incomparables = FALSE) # View the unique values for the playoffs column
unique(nfl_playoffs$sb_winner, incomparables = FALSE) # View the unique values for the sb_winner column
names(nfl_playoffs)
nfl_playoffs <- nfl_playoffs[-6] # Remove unnecessary column
names(nfl_playoffs)
```
For the `sb_winner` column, a value of one denotes "Won Superbowl", and a value of zero denotes "No Superbowl".
```{r, message = FALSE, warning = FALSE, echo = TRUE, results = 'hide'}
nfl_playoffs$sb_winner[nfl_playoffs$sb_winner == "Won Superbowl"] <- "1"
nfl_playoffs$sb_winner[nfl_playoffs$sb_winner == "No Superbowl"] <- "0"
nfl_playoffs$sb_winner <- as.numeric(nfl_playoffs$sb_winner)
```
To view the summary and structure of the ***CLEANED*** data:
**NFL Playoffs Dataset**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Examine the structure of the dataset
datatable(head(nfl_playoffs, 10))
```
**Data Dictionary for the NFL Playoffs Dataset**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Create a data dictionary for playoffs
var_names_playoffs <- colnames(nfl_playoffs)
var_types_playoffs <- lapply(nfl_playoffs, class) # lapply returns a list of the same length as X (a vector)
var_descriptions_playoffs <- c("City or state in which the team originates", "Name or mascot of the team", "Year", "Total wins for the team", "Total losses for the team", "Whether or not the team won the Super Bowl", "Head coach of the team", "Starting quarterback during the postseason")
data_dict_playoffs <- as_tibble(cbind(var_names_playoffs, var_types_playoffs, var_descriptions_playoffs))
colnames(data_dict_playoffs) <- c("Variable Name", "Variable Data Type", "Variable Description")
kable(data_dict_playoffs) # kable returns a single table for a single data object
```
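Because the same data dictionary steps are repeated for every dataset, they could be wrapped in a small helper. The sketch below is only an illustration: `make_data_dict()` and the `toy` data frame are hypothetical names that are not part of the original analysis, and the helper uses base R rather than `as_tibble()`.

```r
# Hypothetical helper (not part of the original analysis): builds a data
# dictionary from a data frame and a vector of column descriptions
make_data_dict <- function(df, descriptions) {
  # Fail early if the descriptions do not line up with the columns
  stopifnot(length(descriptions) == ncol(df))
  data.frame(
    "Variable Name" = colnames(df),
    # Collapse multi-class columns into a single string per column
    "Variable Data Type" = vapply(
      df,
      function(col) paste(class(col), collapse = ", "),
      character(1),
      USE.NAMES = FALSE
    ),
    "Variable Description" = descriptions,
    check.names = FALSE,
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Toy data frame standing in for one of the NFL datasets
toy <- data.frame(year = 2019L, team_name = "Steelers", stringsAsFactors = FALSE)
toy_dict <- make_data_dict(toy, c("Year", "Name or mascot of the team"))
toy_dict
```

The length check catches the case where a column is added or dropped without the descriptions being updated, which `cbind()` would otherwise either recycle or reject with a less specific message.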
### **Passing Yards Leaders**
The `nfl_passing` dataset contains information regarding the league leader for passing yards from each year. Their respective team information is included. This data is from [Pro Football Reference](https://www.pro-football-reference.com/).
This dataset does not need to be cleaned or edited. To view the summary and structure of the data:
**NFL Passing Dataset**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Examine the structure of the dataset
datatable(head(nfl_passing, 10))
```
**Data Dictionary for the NFL Passing Dataset**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Create a data dictionary for passing yards leaders
var_names_passing <- colnames(nfl_passing)
var_types_passing <- lapply(nfl_passing, class) # lapply returns a list of the same length as X (a vector)
var_descriptions_passing <- c("Year", "Name of the player with the most passing yards", "Total yards", "Location of the player's team", "Name or mascot of the player's team")
data_dict_passing <- as_tibble(cbind(var_names_passing, var_types_passing, var_descriptions_passing))
colnames(data_dict_passing) <- c("Variable Name", "Variable Data Type", "Variable Description")
kable(data_dict_passing) # kable returns a single table for a single data object
```
### **Rushing Yards Leaders**
The last dataset, `nfl_rushing`, contains information regarding the league leader for rushing yards from each year. Their respective team information is included. This data is also from [Pro Football Reference](https://www.pro-football-reference.com/).
Similar to the last dataset, this dataset does not need to be cleaned or edited. To view the summary and structure of the data:
**NFL Rushing Dataset**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Examine the structure of the dataset
datatable(head(nfl_rushing, 10))
```
**Data Dictionary for the NFL Rushing Dataset**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Create a data dictionary for rushing yards leaders
var_names_rushing <- colnames(nfl_rushing)
var_types_rushing <- lapply(nfl_rushing, class) # lapply returns a list of the same length as X (a vector)
var_descriptions_rushing <- c("Year", "Name of the player with the most rushing yards", "Total yards", "Location of the player's team", "Name or mascot of the player's team")
data_dict_rushing <- as_tibble(cbind(var_names_rushing, var_types_rushing, var_descriptions_rushing))
colnames(data_dict_rushing) <- c("Variable Name", "Variable Data Type", "Variable Description")
kable(data_dict_rushing) # kable returns a single table for a single data object
```
### **Penalties Per Game**
The `nfl_penalty` dataset contains the average penalty yards per game for each team from 2003 - 2020. The data is from [TeamRankings](https://www.teamrankings.com/nfl/stat/penalty-yards-per-game?date=2020-02-03).
This dataset does not need to be cleaned or edited. To view the summary and structure of the data:
**NFL Penalty Dataset**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Examine the structure of the dataset
datatable(head(nfl_penalty, 10))
```
**Data Dictionary for the NFL Penalty Dataset**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Create a data dictionary for penalties
var_names_penalty <- colnames(nfl_penalty)
var_types_penalty <- lapply(nfl_penalty, class) # lapply returns a list of the same length as X (a vector)
var_descriptions_penalty <- c("City or state in which the team originates", "Name or mascot of the team", "Average penalty yards per game from 2020", "Average penalty yards per game from 2019", "Average penalty yards per game from 2018", "Average penalty yards per game from 2017", "Average penalty yards per game from 2016", "Average penalty yards per game from 2015", "Average penalty yards per game from 2014", "Average penalty yards per game from 2013", "Average penalty yards per game from 2012", "Average penalty yards per game from 2011", "Average penalty yards per game from 2010", "Average penalty yards per game from 2009", "Average penalty yards per game from 2008", "Average penalty yards per game from 2007", "Average penalty yards per game from 2006", "Average penalty yards per game from 2005", "Average penalty yards per game from 2004", "Average penalty yards per game from 2003", "Total penalty yards")
data_dict_penalty <- as_tibble(cbind(var_names_penalty, var_types_penalty, var_descriptions_penalty))
colnames(data_dict_penalty) <- c("Variable Name", "Variable Data Type", "Variable Description")
kable(data_dict_penalty) # kable returns a single table for a single data object
```
Total Attendance Breakdown {data-navmenu="Attendance" data-orientation=columns}
=============================================================================
Column {.sidebar data-width=450}
-----------------------------------------------------------------------------
#### **Total Attendance Breakdown**
As mentioned in the introduction, this analysis examines whether the number of fans in attendance correlates with a team's success (number of games won). Additionally, it compares how teams fare in home versus away games while keeping in mind the attendance at those games. Earlier in the data preparation, I split the `attendance` dataset into two separate datasets, `nfl_total_attendance` and `nfl_weekly_attendance`. To first understand the importance of fan attendance, it is critical to observe which teams have had the strongest fan bases over the past 20 years.
Instead of using the teams' total attendance numbers, I wanted to take an average of each team's weekly attendance, which I feel gives a more accurate representation of attendance. With that said, I added a column to the `nfl_weekly_attendance` dataset that holds each team's mean weekly attendance for each year.
```{r, message = FALSE, warning = FALSE, echo = FALSE, results = 'hide'}
# Add a column of yearly means for each team's attendance
nfl_weekly_attendance <- nfl_weekly_attendance %>%
group_by(team_name, year) %>%
mutate(avg_attendance = mean(weekly_attendance))
head(nfl_weekly_attendance, 20)
```
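As an illustration of the grouped mean, the sketch below reproduces the idea in base R on made-up numbers. The toy values and the `na.rm = TRUE` choice are assumptions for illustration only: if `weekly_attendance` were to contain `NA` values (for example, for bye weeks), `mean()` without `na.rm = TRUE` would return `NA` for that entire team-season.

```r
# Made-up weekly attendance for two team-seasons (illustration only)
toy_attendance <- data.frame(
  team_name = c("Steelers", "Steelers", "Steelers", "Browns", "Browns"),
  year = c(2019, 2019, 2019, 2019, 2019),
  weekly_attendance = c(60000, NA, 62000, 65000, 67000),
  stringsAsFactors = FALSE
)

# ave() returns one value per row, like group_by() followed by mutate()
toy_attendance$avg_attendance <- ave(
  toy_attendance$weekly_attendance,
  toy_attendance$team_name, toy_attendance$year,
  FUN = function(x) mean(x, na.rm = TRUE)
)
toy_attendance
```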
The `ggplotly` function makes the graph interactive. In the visualization to the right, you can hover over the lines to get a detailed description including the year, average weekly attendance, and team. You can click on a team in the legend once to *remove* it from the visualization, or you can double-click on a team in the legend to *isolate* that line. This interaction enables you to filter to specific teams in order to see their attendance trends since 2000.
In the visualization to the right, the **Dallas Cowboys** appear to have the strongest fan base, and the **Los Angeles Chargers** appear to have the weakest fan base. The top five teams with the current highest attendance records are:
1. **Dallas Cowboys**
2. **Green Bay Packers**
3. **Los Angeles Rams**
4. **New York Giants**
5. **Philadelphia Eagles**
It is also important to note that the spike in attendance for the Dallas Cowboys in 2009 can be attributed to the opening of their brand new AT&T Stadium. This stadium opened on May 27, 2009. The stadium holds 80,000 people in the stands but can be expanded to hold more than 100,000 individuals when standing room only areas are included.
Column {.tabset .tabset-fade}
-----------------------------------------------------------------------------
### **NFL Weekly Attendance**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Create visualization for average weekly attendance per year for all teams
team_avg_attendance <-
ggplot(data = nfl_weekly_attendance,
aes(x = year,
y = avg_attendance,
color = team_name)) +
geom_point(size = 1, alpha = .8) +
geom_smooth(size = .8, se = FALSE) +
scale_y_continuous(name = "Average Weekly Attendance") +
scale_x_continuous(name = "Year") +
ggtitle("Average Weekly Attendance Per Year") +
labs(col = "Team Name") +
theme_stata()
ggplotly(team_avg_attendance)
```
### **Division-Basis**
Next, I wanted to break attendance down by division. To do this, I added a column called `division` to the dataset.
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Attach the dataset to avoid calling on specific columns
attach(nfl_weekly_attendance)
# Create new column (teams not listed below receive NA)
nfl_weekly_attendance <- nfl_weekly_attendance %>%
  mutate(division = case_when(
    team_name %in% c("Patriots", "Bills", "Jets", "Dolphins") ~ "AFC East",
    team_name %in% c("Ravens", "Steelers", "Bengals", "Browns") ~ "AFC North",
    team_name %in% c("Texans", "Titans", "Colts", "Jaguars") ~ "AFC South",
    team_name %in% c("Chiefs", "Broncos", "Raiders", "Chargers") ~ "AFC West",
    team_name %in% c("Eagles", "Cowboys", "Giants", "Redskins") ~ "NFC East",
    team_name %in% c("Packers", "Vikings", "Bears", "Lions") ~ "NFC North",
    team_name %in% c("Saints", "Falcons", "Buccaneers", "Panthers") ~ "NFC South",
    team_name %in% c("49ers", "Seahawks", "Rams", "Cardinals") ~ "NFC West"
  ))
```
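Another way to express the team-to-division mapping is a lookup table. The following is a minimal sketch in base R, with only two divisions filled in and made-up input; `division_lookup` and `toy_teams` are hypothetical names used for illustration.

```r
# Named vector mapping each team to its division (two divisions shown)
division_lookup <- c(
  Patriots = "AFC East", Bills = "AFC East",
  Jets = "AFC East", Dolphins = "AFC East",
  Ravens = "AFC North", Steelers = "AFC North",
  Bengals = "AFC North", Browns = "AFC North"
)

# Made-up team names; "Unknown" is not in the lookup, so it maps to NA
toy_teams <- c("Steelers", "Jets", "Unknown")
toy_division <- unname(division_lookup[toy_teams])
toy_division
```

A lookup like this can be defined once and reused for every dataset that needs a `division` column, so the mapping only has to be maintained in one place.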
With the `division` column created, the table below shows the strongest and weakest fan bases per division. Individual graphs for both the AFC and NFC can be seen under the tabs *AFC Attendance Breakdown* and *NFC Attendance Breakdown*.
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Create table for attendance summary
attendance_summary <- matrix(c("New York Jets", "Miami Dolphins", "Baltimore Ravens", "Cincinnati Bengals", "Houston Texans", "Indianapolis Colts", "Kansas City Chiefs", "Los Angeles Chargers", "Dallas Cowboys", "Washington Redskins", "Green Bay Packers", "Detroit Lions", "New Orleans Saints", "Tampa Bay Buccaneers", "Los Angeles Rams", "Arizona Cardinals"), ncol = 2, byrow = TRUE)
colnames(attendance_summary) <- c("Strongest Fan Base","Weakest Fan Base")
rownames(attendance_summary) <- c("AFC East","AFC North","AFC South", "AFC West", "NFC East", "NFC North", "NFC South", "NFC West")
attendance_summary <- as.table(attendance_summary)
kable(attendance_summary)
```
AFC Attendance Breakdown {data-navmenu="Attendance" data-orientation=rows}
=============================================================================
Row
-----------------------------------------------------------------------------
### **AFC East**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# AFC East
ggplotly(
nfl_weekly_attendance %>%
filter(division == "AFC East") %>%
ggplot(aes(x = year,
y = avg_attendance,
color = team_name)) +
geom_point(size = 1, alpha = .8) +
geom_smooth(size = .8, se = FALSE) +
scale_y_continuous(name = "Average Weekly Attendance") +
scale_x_continuous(name = "Year") +
ggtitle("AFC East Average Weekly Attendance Per Year") +
labs(col = "Team Name") +
theme_stata()
)
```
### **AFC North**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# AFC North
ggplotly(
nfl_weekly_attendance %>%
filter(division == "AFC North") %>%
ggplot(aes(x = year,
y = avg_attendance,
color = team_name)) +
geom_point(size = 1, alpha = .8) +
geom_smooth(size = .8, se = FALSE) +
scale_y_continuous(name = "Average Weekly Attendance") +
scale_x_continuous(name = "Year") +
ggtitle("AFC North Average Weekly Attendance Per Year") +
labs(col = "Team Name") +
theme_stata()
)
```
Row
------------------------------------------------------------------------------
### **AFC South**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# AFC South
ggplotly(
nfl_weekly_attendance %>%
filter(division == "AFC South") %>%
ggplot(aes(x = year,
y = avg_attendance,
color = team_name)) +
geom_point(size = 1, alpha = .8) +
geom_smooth(size = .8, se = FALSE) +
scale_y_continuous(name = "Average Weekly Attendance") +
scale_x_continuous(name = "Year") +
ggtitle("AFC South Average Weekly Attendance Per Year") +
labs(col = "Team Name") +
theme_stata()
)
```
### **AFC West**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# AFC West
ggplotly(
nfl_weekly_attendance %>%
filter(division == "AFC West") %>%
ggplot(aes(x = year,
y = avg_attendance,
color = team_name)) +
geom_point(size = 1, alpha = .8) +
geom_smooth(size = .8, se = FALSE) +
scale_y_continuous(name = "Average Weekly Attendance") +
scale_x_continuous(name = "Year") +
ggtitle("AFC West Average Weekly Attendance Per Year") +
labs(col = "Team Name") +
theme_stata()
)
```
NFC Attendance Breakdown {data-navmenu="Attendance" data-orientation=rows}
=============================================================================
Row
-----------------------------------------------------------------------------
### **NFC East**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# NFC East
ggplotly(
nfl_weekly_attendance %>%
filter(division == "NFC East") %>%
ggplot(aes(x = year,
y = avg_attendance,
color = team_name)) +
geom_point(size = 1, alpha = .8) +
geom_smooth(size = .8, se = FALSE) +
scale_y_continuous(name = "Average Weekly Attendance") +
scale_x_continuous(name = "Year") +
ggtitle("NFC East Average Weekly Attendance Per Year") +
labs(col = "Team Name") +
theme_stata()
)
```
### **NFC North**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# NFC North
ggplotly(
nfl_weekly_attendance %>%
filter(division == "NFC North") %>%
ggplot(aes(x = year,
y = avg_attendance,
color = team_name)) +
geom_point(size = 1, alpha = .8) +
geom_smooth(size = .8, se = FALSE) +
scale_y_continuous(name = "Average Weekly Attendance") +
scale_x_continuous(name = "Year") +
ggtitle("NFC North Average Weekly Attendance Per Year") +
labs(col = "Team Name") +
theme_stata()
)
```
Row
-----------------------------------------------------------------------------
### **NFC South**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# NFC South
ggplotly(
nfl_weekly_attendance %>%
filter(division == "NFC South") %>%
ggplot(aes(x = year,
y = avg_attendance,
color = team_name)) +
geom_point(size = 1, alpha = .8) +
geom_smooth(size = .8, se = FALSE) +
scale_y_continuous(name = "Average Weekly Attendance") +
scale_x_continuous(name = "Year") +
ggtitle("NFC South Average Weekly Attendance Per Year") +
labs(col = "Team Name") +
theme_stata()
)
```
### **NFC West**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# NFC West
ggplotly(
nfl_weekly_attendance %>%
filter(division == "NFC West") %>%
ggplot(aes(x = year,
y = avg_attendance,
color = team_name)) +
geom_point(size = 1, alpha = .8) +
geom_smooth(size = .8, se = FALSE) +
scale_y_continuous(name = "Average Weekly Attendance") +
scale_x_continuous(name = "Year") +
ggtitle("NFC West Average Weekly Attendance Per Year") +
labs(col = "Team Name") +
theme_stata()
)
```
Impact on Wins {data-navmenu="Attendance" data-orientation=columns}
=============================================================================
Column
-----------------------------------------------------------------------------
### **What Impacts Total Wins?**
Knowing the previously discussed attendance statistics, I want to see if stronger home attendance is associated with a higher total number of wins. A team cannot necessarily control its away attendance, as its most loyal fans are assumed to be unlikely attendees at an away game.
First, I wanted to discover whether home attendance is related to total wins. To do so, I used the `lm()` function to fit a simple linear regression with `total_wins` as the response variable and `total_home_attendance` as the predictor variable. I also obtained the correlation coefficient between the two variables. To the right, in the **Home Attendance** tab, there appears to be a slight, positive linear relationship between the predictor variable (**X** or `total_home_attendance`) and the response variable (**Y** or `total_wins`). The correlation coefficient between the two variables is 0.1507, so home attendance accounts for only about 2% of the variation in total wins (squaring 0.1507 gives roughly 0.023). The relationship is nonetheless statistically significant at the 1% significance level, with a p-value of *0.000133*.
Next, I wanted to discover whether away attendance is related to total wins. I followed the same process as for home attendance, fitting a simple linear regression with `total_wins` as the response variable and `total_away_attendance` as the predictor variable. From the visualization in the **Away Attendance** tab, there also appears to be a very slight, positive linear relationship between the predictor variable (**X** or `total_away_attendance`) and the response variable (**Y** or `total_wins`). The correlation coefficient between the two variables is 0.1274, so away attendance accounts for less than 2% of the variation in total wins (squaring 0.1274 gives roughly 0.016). This relationship is also statistically significant at the 1% significance level, with a p-value of *0.00126*.
```{r, message = FALSE, warning = FALSE, echo = FALSE, results = 'hide'}
# Select columns from nfl_standings
nfl_wins <-
nfl_standings %>%
select(team_name, year, total_wins)
# Perform left join to get needed statistics
joined_data <- left_join(nfl_total_attendance, nfl_wins, by = c("team_name", "year"))
joined_data
```
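After a left join, it can be worth checking whether any attendance rows failed to find a matching record in the standings. The sketch below shows one way to do that in base R on made-up data; `merge(..., all.x = TRUE)` is used here as a stand-in for `left_join()`, and all `toy_` objects are hypothetical.

```r
# Toy stand-ins for nfl_total_attendance and nfl_wins (illustration only)
toy_total_attendance <- data.frame(
  team_name = c("Steelers", "Browns", "Ravens"),
  year = c(2019, 2019, 2019),
  total_home_attendance = c(500000, 540000, 560000),
  stringsAsFactors = FALSE
)
toy_wins <- data.frame(
  team_name = c("Steelers", "Browns"),
  year = c(2019, 2019),
  total_wins = c(8, 6),
  stringsAsFactors = FALSE
)

# merge(..., all.x = TRUE) keeps every left-hand row, like a left join
toy_joined <- merge(toy_total_attendance, toy_wins,
                    by = c("team_name", "year"), all.x = TRUE)

# Rows without a match get NA, which would make cor() return NA by default
unmatched <- toy_joined[is.na(toy_joined$total_wins), ]
unmatched$team_name
```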
```{r, message = FALSE, warning = FALSE, echo = TRUE}
# Attach the dataset
attach(joined_data)
# Create linear model for home attendance
home_attendance_model <- lm(total_wins ~ total_home_attendance)
summary(home_attendance_model)
cor(total_wins, total_home_attendance)
```
```{r, message = FALSE, warning = FALSE, echo = TRUE}
# joined_data is already attached in the previous chunk, so it is not attached again
# Create linear model for away attendance
away_attendance_model <- lm(total_wins ~ total_away_attendance)
summary(away_attendance_model)
cor(total_wins, total_away_attendance)
```
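The link between the correlation coefficient and the regression output can be checked on simulated data. The sketch below does not use the NFL data: the `sim_` variables are made up, and the numbers are chosen only to mimic a weak positive relationship. It illustrates that, in simple linear regression, R-squared equals the squared correlation and the slope's p-value matches the Pearson correlation test.

```r
# Simulated data (NOT the NFL data): a weak positive linear relationship
set.seed(42)
sim_attendance <- rnorm(200, mean = 540000, sd = 30000)
sim_wins <- 8 + 0.00001 * (sim_attendance - 540000) + rnorm(200, sd = 3)

sim_model <- lm(sim_wins ~ sim_attendance)
sim_r <- cor(sim_wins, sim_attendance)

# In simple linear regression, R-squared equals the squared correlation
r_squared <- summary(sim_model)$r.squared
all.equal(r_squared, sim_r^2)

# The slope's p-value matches the p-value of the Pearson correlation test
slope_p <- summary(sim_model)$coefficients["sim_attendance", "Pr(>|t|)"]
cor_p <- cor.test(sim_wins, sim_attendance)$p.value
all.equal(slope_p, cor_p)
```

This is why a small correlation coefficient can still come with a very small p-value: with enough observations, even a weak linear relationship is unlikely to be due to chance alone.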
Column {.tabset}
------------------------------------------------------------------------------
### **Home Attendance**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Plot data
home_atten_v_wins <- ggplot(data = joined_data, aes(total_home_attendance, total_wins)) +
geom_point(size = 1, alpha = .8, col = "red") +
geom_smooth(method = "lm", size = .8, se = FALSE) +
xlab("Total Home Attendance") +
ylab("Total Wins") +
ggtitle("Total Home Attendance vs. Total Wins")
ggplotly(home_atten_v_wins)
```
### **Away Attendance**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Plot data
away_atten_v_wins <- ggplot(data = joined_data, aes(total_away_attendance, total_wins)) +
geom_point(size = 1, alpha = .8, col = "magenta") +
geom_smooth(method = "lm", size = .8, se = FALSE) +
xlab("Total Away Attendance") +
ylab("Total Wins") +
ggtitle("Total Away Attendance vs. Total Wins")
ggplotly(away_atten_v_wins)
```
Super Bowl Champions {data-navmenu="Standings" data-orientation=rows}
=============================================================================
Row
-----------------------------------------------------------------------------
### **Standings Over the Years**
This part of the analysis will look into the qualities of the division winners and the attributes that teams high in the standings have over teams in the lower portion of the standings. Furthermore, this approach will discover what separates Super Bowl Champions from the 31 other teams each season.
First, I wanted to see which division has brought home the most Super Bowl Championships over the past 20 years. I once again added a `division` column to `nfl_standings`. As the visualization below shows, the **AFC East** has had the most Super Bowl wins between 2000 and 2020. This can be largely attributed to the New England Patriots' former quarterback Tom Brady and current head coach Bill Belichick bringing home championships in 2002, 2004, 2005, 2015, 2017, and 2019 (the calendar years in which those Super Bowls were played). Additionally, the second-best division appears to be the **AFC North**, with both the Pittsburgh Steelers and Baltimore Ravens winning at least one Super Bowl Championship each. Conversely, the AFC South, NFC North, and NFC West have each won only one Super Bowl over the past two decades.
Analyzing NFL standings with the given datasets is a bit tricky because standings are calculated using tie-breakers when necessary. Additionally, playoff qualification is largely based on division success. As a result, the team with the most wins might not be the team with the best standing. For this analysis, I decided to break the teams down by division and see which ones have been dominant over the years.
I analyzed their success by using summary statistics showing the *Average Total Wins*, *Average Total Losses*, *Average Points Per Game*, and *Average Opponent Points Per Game*. The results can be seen in the tabs *AFC Summaries* and *NFC Summaries*. I also developed box plots of each team's total wins per season, organized by division, to analyze the range of the data for each team and any relevant outliers. These box plots can be seen in the tabs *AFC Box Plots | Total Wins* and *NFC Box Plots | Total Wins*, which are grouped by conference (AFC vs. NFC).
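The per-division summaries described above could also be produced in a single grouped pass rather than one chunk per division. The following is a minimal sketch in base R on made-up standings rows; `toy_standings` and its values are hypothetical, and the division by 16 assumes a 16-game regular season.

```r
# Made-up standings rows (illustration only); points are season totals
toy_standings <- data.frame(
  division = c("AFC North", "AFC North", "AFC North", "AFC North"),
  team_name = c("Steelers", "Steelers", "Browns", "Browns"),
  total_wins = c(10, 12, 4, 6),
  total_losses = c(6, 4, 12, 10),
  points_for = c(400, 432, 288, 320),
  points_against = c(320, 304, 416, 384),
  stringsAsFactors = FALSE
)

# Average every statistic per team within each division in one call
division_summary <- aggregate(
  cbind(total_wins, total_losses, points_for, points_against) ~ division + team_name,
  data = toy_standings, FUN = mean
)

# Convert season point totals to per-game averages (16-game regular season)
division_summary$points_for <- division_summary$points_for / 16
division_summary$points_against <- division_summary$points_against / 16
division_summary
```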
The most dominant team in each division, defined by the highest average total wins (as found in the *AFC Summaries* and *NFC Summaries* tabs), is as follows:
* **AFC East**: New England Patriots
* **AFC North**: Pittsburgh Steelers
* **AFC South**: Indianapolis Colts
* **AFC West**: Denver Broncos
* **NFC East**: Philadelphia Eagles
* **NFC North**: Green Bay Packers
* **NFC South**: New Orleans Saints
* **NFC West**: Seattle Seahawks
I also developed a table of the last 20 Super Bowl winners with their offensive and defensive ranking. This table can be found in the **Rankings of Super Bowl Champions** tab.
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Attach the dataset
attach(nfl_standings)
# Create new column (teams not listed below receive NA)
nfl_standings <- nfl_standings %>%
  mutate(division = case_when(
    team_name %in% c("Patriots", "Bills", "Jets", "Dolphins") ~ "AFC East",
    team_name %in% c("Ravens", "Steelers", "Bengals", "Browns") ~ "AFC North",
    team_name %in% c("Texans", "Titans", "Colts", "Jaguars") ~ "AFC South",
    team_name %in% c("Chiefs", "Broncos", "Raiders", "Chargers") ~ "AFC West",
    team_name %in% c("Eagles", "Cowboys", "Giants", "Redskins") ~ "NFC East",
    team_name %in% c("Packers", "Vikings", "Bears", "Lions") ~ "NFC North",
    team_name %in% c("Saints", "Falcons", "Buccaneers", "Panthers") ~ "NFC South",
    team_name %in% c("49ers", "Seahawks", "Rams", "Cardinals") ~ "NFC West"
  ))
```
Row {.tabset .tabset-fade}
------------------------------------------------------------------------------
### **Super Bowl Champions Per Division**
```{r, message = FALSE, warning = FALSE, echo = FALSE, fig.width = 10, fig.height = 11}
# Plot Super Bowl Championships per division
sb_champions_division <- ggplot(data = nfl_standings,
aes(reorder(division, -sb_winner), sb_winner, col = team_name)) + geom_col() +
ggtitle("Super Bowl Winners by Division") +
xlab("Division") + ylab("Count of Super Bowl Championships Won") +
labs(col = "Team Name")
ggplotly(sb_champions_division)
```
### **Rankings of Super Bowl Champions**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Create table for all Super Bowl Champions
super_bowl_champs <- nfl_standings %>%
filter(sb_winner == 1) %>%
select(year, team_name, offensive_ranking, defensive_ranking)
colnames(super_bowl_champs) <- c("Year", "Super Bowl Champion", "Offensive Ranking", "Defensive Ranking")
kable(super_bowl_champs)
```
AFC Summaries {data-navmenu="Standings" data-orientation=rows}
=============================================================================
Row
-----------------------------------------------------------------------------
### **AFC East Summary**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# AFC East
afc_east <- nfl_standings %>%
filter(division == "AFC East") %>%
select(team_name, total_wins, points_for, total_losses, points_against) %>%
group_by(team_name) %>%
summarise(total_wins = mean(total_wins), total_losses = mean(total_losses), points_for = mean(points_for)/16, points_against = mean(points_against)/16)
colnames(afc_east) <- c("Team Name", "Average Total Wins", "Average Total Losses", "Average Points Per Game", "Average Opponent Points Per Game")
kable(afc_east)
```
### **AFC North Summary**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# AFC North
afc_north <- nfl_standings %>%
filter(division == "AFC North") %>%
select(team_name, total_wins, points_for, total_losses, points_against) %>%
group_by(team_name) %>%
summarise(total_wins = mean(total_wins), total_losses = mean(total_losses), points_for = mean(points_for)/16, points_against = mean(points_against)/16)
colnames(afc_north) <- c("Team Name", "Average Total Wins", "Average Total Losses", "Average Points Per Game", "Average Opponent Points Per Game")
kable(afc_north)
```
Row
-----------------------------------------------------------------------------
### **AFC South Summary**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# AFC South
afc_south <- nfl_standings %>%
filter(division == "AFC South") %>%
select(team_name, total_wins, points_for, total_losses, points_against) %>%
group_by(team_name) %>%
summarise(total_wins = mean(total_wins), total_losses = mean(total_losses), points_for = mean(points_for)/16, points_against = mean(points_against)/16)
colnames(afc_south) <- c("Team Name", "Average Total Wins", "Average Total Losses", "Average Points Per Game", "Average Opponent Points Per Game")
kable(afc_south)
```
### **AFC West Summary**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# AFC West
afc_west <- nfl_standings %>%
filter(division == "AFC West") %>%
select(team_name, total_wins, points_for, total_losses, points_against) %>%
group_by(team_name) %>%
summarise(total_wins = mean(total_wins), total_losses = mean(total_losses), points_for = mean(points_for)/16, points_against = mean(points_against)/16)
colnames(afc_west) <- c("Team Name", "Average Total Wins", "Average Total Losses", "Average Points Per Game", "Average Opponent Points Per Game")
kable(afc_west)
```
AFC Box Plots | Total Wins {data-navmenu="Standings" data-orientation=rows}
=============================================================================
Row
-----------------------------------------------------------------------------
### **AFC East Box Plot**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# AFC East box plot
afc_east_box <- nfl_standings %>%
filter(division == "AFC East") %>%
select(team_name, total_wins, points_for, total_losses, points_against)
boxplot_afceast <- ggplot(afc_east_box,
aes(team_name, total_wins)) +
geom_boxplot(col = "blue") +
ggtitle("AFC East") +
xlab("Team") + ylab("Total Wins") +
theme_stata()
ggplotly(boxplot_afceast)
```
### **AFC North Box Plot**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# AFC North box plot
afc_north_box <- nfl_standings %>%
filter(division == "AFC North") %>%
select(team_name, total_wins, points_for, total_losses, points_against)
boxplot_afcnorth <- ggplot(afc_north_box,
aes(team_name, total_wins)) +
geom_boxplot(col = "purple") +
ggtitle("AFC North") +
xlab("Team") + ylab("Total Wins") +
theme_stata()
ggplotly(boxplot_afcnorth)
```
Row
-----------------------------------------------------------------------------
### **AFC South Box Plot**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# AFC South box plot
afc_south_box <- nfl_standings %>%
filter(division == "AFC South") %>%
select(team_name, total_wins, points_for, total_losses, points_against)
boxplot_afcsouth <- ggplot(afc_south_box,
aes(team_name, total_wins)) +
geom_boxplot(col = "red") +
ggtitle("AFC South") +
xlab("Team") + ylab("Total Wins") +
theme_stata()
ggplotly(boxplot_afcsouth)
```
### **AFC West Box Plot**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# AFC West box plot
afc_west_box <- nfl_standings %>%
filter(division == "AFC West") %>%
select(team_name, total_wins, points_for, total_losses, points_against)
boxplot_afcwest <- ggplot(afc_west_box,
aes(team_name, total_wins)) +
geom_boxplot(col = "seagreen") +
ggtitle("AFC West") +
xlab("Team") + ylab("Total Wins") +
theme_stata()
ggplotly(boxplot_afcwest)
```
NFC Summaries {data-navmenu="Standings" data-orientation=rows}
=============================================================================
Row
-----------------------------------------------------------------------------
### **NFC East Summary**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# NFC East
nfc_east <- nfl_standings %>%
filter(division == "NFC East") %>%
select(team_name, total_wins, points_for, total_losses, points_against) %>%
group_by(team_name) %>%
summarise(total_wins = mean(total_wins), total_losses = mean(total_losses), points_for = mean(points_for)/16, points_against = mean(points_against)/16)
colnames(nfc_east) <- c("Team Name", "Average Total Wins", "Average Total Losses", "Average Points Per Game", "Average Opponent Points Per Game")
kable(nfc_east)
```
### **NFC North Summary**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# NFC North
nfc_north <- nfl_standings %>%
filter(division == "NFC North") %>%
select(team_name, total_wins, points_for, total_losses, points_against) %>%
group_by(team_name) %>%
summarise(total_wins = mean(total_wins), total_losses = mean(total_losses), points_for = mean(points_for)/16, points_against = mean(points_against)/16)
colnames(nfc_north) <- c("Team Name", "Average Total Wins", "Average Total Losses", "Average Points Per Game", "Average Opponent Points Per Game")
kable(nfc_north)
```
Row
-----------------------------------------------------------------------------
### **NFC South Summary**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# NFC South
nfc_south <- nfl_standings %>%
filter(division == "NFC South") %>%
select(team_name, total_wins, points_for, total_losses, points_against) %>%
group_by(team_name) %>%
summarise(total_wins = mean(total_wins), total_losses = mean(total_losses), points_for = mean(points_for)/16, points_against = mean(points_against)/16)
colnames(nfc_south) <- c("Team Name", "Average Total Wins", "Average Total Losses", "Average Points Per Game", "Average Opponent Points Per Game")
kable(nfc_south)
```
### **NFC West Summary**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# NFC West
nfc_west <- nfl_standings %>%
filter(division == "NFC West") %>%
select(team_name, total_wins, points_for, total_losses, points_against) %>%
group_by(team_name) %>%
summarise(total_wins = mean(total_wins), total_losses = mean(total_losses), points_for = mean(points_for)/16, points_against = mean(points_against)/16)
colnames(nfc_west) <- c("Team Name", "Average Total Wins", "Average Total Losses", "Average Points Per Game", "Average Opponent Points Per Game")
kable(nfc_west)
```
NFC Box Plots | Total Wins {data-navmenu="Standings" data-orientation=rows}
=============================================================================
Row
-----------------------------------------------------------------------------
### **NFC East Box Plot**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# NFC East box plot
nfc_east_box <- nfl_standings %>%
filter(division == "NFC East") %>%
select(team_name, total_wins, points_for, total_losses, points_against)
boxplot_nfceast <- ggplot(nfc_east_box,
aes(team_name, total_wins)) +
geom_boxplot() +
ggtitle("NFC East") +
xlab("Team") + ylab("Total Wins") +
theme_stata()
ggplotly(boxplot_nfceast)
```
### **NFC North Box Plot**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# NFC North box plot
nfc_north_box <- nfl_standings %>%
filter(division == "NFC North") %>%
select(team_name, total_wins, points_for, total_losses, points_against)
boxplot_nfcnorth <- ggplot(nfc_north_box,
aes(team_name, total_wins)) +
geom_boxplot(col = "brown") +
ggtitle("NFC North") +
xlab("Team") + ylab("Total Wins") +
theme_stata()
ggplotly(boxplot_nfcnorth)
```
Row
-----------------------------------------------------------------------------
### **NFC South Box Plot**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# NFC South box plot
nfc_south_box <- nfl_standings %>%
filter(division == "NFC South") %>%
select(team_name, total_wins, points_for, total_losses, points_against)
boxplot_nfcsouth <- ggplot(nfc_south_box,
aes(team_name, total_wins)) +
geom_boxplot(col = "magenta") +
ggtitle("NFC South") +
xlab("Team") + ylab("Total Wins") +
theme_stata()
ggplotly(boxplot_nfcsouth)
```
### **NFC West Box Plot**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# NFC West box plot
nfc_west_box <- nfl_standings %>%
filter(division == "NFC West") %>%
select(team_name, total_wins, points_for, total_losses, points_against)
boxplot_nfcwest <- ggplot(nfc_west_box,
aes(team_name, total_wins)) +
geom_boxplot(col = "goldenrod4") +
ggtitle("NFC West") +
xlab("Team") + ylab("Total Wins") +
theme_stata()
ggplotly(boxplot_nfcwest)
```
Division Leaders {data-navmenu="Standings"}
=============================================================================
### **Division Leaders Breakdown**
Combining the tables from the previous tabs into a single table of average statistics, the team with the highest value in each category is:
* **Average Total Wins**: New England Patriots
* **Average Total Losses**: Cleveland Browns
* **Average Points Per Game**: New England Patriots
* **Average Opponent Points Per Game**: Detroit Lions
For a more in-depth look at each team, please refer to the table below.
#### NFL Standings of all Teams
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Analysis of all teams in the NFL
division_leaders <- nfl_standings %>%
select(team_name, total_wins, points_for, total_losses, points_against) %>%
group_by(team_name) %>%
summarise(total_wins = mean(total_wins), total_losses = mean(total_losses),
points_for = mean(points_for)/16, points_against =
mean(points_against)/16)
colnames(division_leaders) <- c("Team Name", "Average Total Wins", "Average Total Losses", "Average Points Per Game", "Average Opponent Points Per Game")
kable(division_leaders)
```
Offense vs. Defense {data-navmenu="Standings" data-orientation=columns}
=============================================================================
Column {.sidebar data-width=450}
-----------------------------------------------------------------------------
#### **Offense vs. Defense**
In the next step of this analysis, I investigated the importance of an offense and a defense. The offense in football is the 11 players who are on the field for a team when they have the ball. Conversely, the defense is the 11 players on the field when the other team has the ball. Sports writers and analysts have argued over the years whether a better offense or defense is more critical to a team's success. Utilizing the `nfl_standings` dataset, I sought to analyze this discussion.
First, I created a linear model showcasing a team's wins in a season using `offensive_ranking` and `defensive_ranking` as the predictor variables.
```{r, message = FALSE, warning = FALSE, echo = TRUE}
# Attach the dataset
attach(nfl_standings)
# Create a linear model
rankings_model <- lm(total_wins ~ offensive_ranking + defensive_ranking)
summary(rankings_model)
```
The model shows that both `offensive_ranking` and `defensive_ranking` are significant predictors of a team's total wins at a 99% confidence level. To drill deeper, I computed the correlation coefficient between each predictor variable and total wins.
```{r, message = FALSE, warning = FALSE, echo = TRUE}
# Compute Pearson correlation coefficients
cor(offensive_ranking, total_wins)
cor(defensive_ranking, total_wins)
```
The `offensive_ranking` had a correlation coefficient of **0.7311** and the `defensive_ranking` had a correlation coefficient of **0.6379**. As such, it appears that a team's offense has a greater correlation with a team's wins than its defense. To visualize this, I plotted two graphs to further test this hypothesis. These can be seen in the tabs to the right.
These graphs confirm that total wins rise as the offensive or defensive ranking increases. Additionally, the confidence band around the defensive ranking curve is wider than the band around the offensive ranking curve. This agrees with my conclusion that the offensive ranking's correlation with wins is stronger than the defensive ranking's.
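As a minimal sketch of the same workflow without `attach()`, the example below fits the model with the `data =` argument and computes the correlations inside `with()`. The `toy_standings` data frame and all of its values are made up for illustration; only the column names mirror `nfl_standings`.

```r
# Hypothetical toy data standing in for nfl_standings (all values are made up)
set.seed(1)
toy_standings <- data.frame(
  offensive_ranking = rnorm(100, mean = 0, sd = 4),
  defensive_ranking = rnorm(100, mean = 0, sd = 4)
)
toy_standings$total_wins <- 8 +
  0.5 * toy_standings$offensive_ranking +
  0.4 * toy_standings$defensive_ranking +
  rnorm(100, sd = 1.5)

# Passing data = ... keeps variable lookup explicit, so attach() is not needed
toy_model <- lm(total_wins ~ offensive_ranking + defensive_ranking,
                data = toy_standings)

# with() evaluates the expression inside the data frame
r_offense <- with(toy_standings, cor(offensive_ranking, total_wins))
r_defense <- with(toy_standings, cor(defensive_ranking, total_wins))

coef(toy_model)
c(offense = r_offense, defense = r_defense)
```

Keeping the data frame explicit avoids name clashes when several datasets share column names, which can happen once more than one dataset has been attached.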
Column {.tabset .tabset-fade}
-----------------------------------------------------------------------------
### **Offensive Ranking**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
Off_Ranking_Plot <- ggplot(data = nfl_standings, aes(offensive_ranking, total_wins)) + geom_smooth() +
ggtitle("Total Wins | Offensive Ranking") +
xlab("Offensive Ranking") + ylab("Total Wins") +
xlim(-10, 10) + ylim(0, 16) +
theme_stata()
ggplotly(Off_Ranking_Plot)
```
### **Defensive Ranking**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Create rankings graphs with confidence bands
Def_Ranking_Plot <- ggplot(data = nfl_standings, aes(defensive_ranking, total_wins)) +
geom_smooth(col = "red") +
ggtitle("Total Wins | Defensive Ranking") +
xlab("Defensive Ranking") +
ylab("Total Wins") +
xlim(-10, 10) + ylim(0, 16) +
theme_stata()
ggplotly(Def_Ranking_Plot)
```
Points For vs. Points Against {data-navmenu="Standings" data-orientation=columns}
=============================================================================
Column {.sidebar data-width=450}
-----------------------------------------------------------------------------
#### **Points For vs. Points Against**
Next, I changed the predictor variables to `points_for` and `points_against`, as these represent offensive and defensive success, respectively. I also used the binary `playoffs` variable to color the scatter plots, showing how scoring or giving up points relates to making the playoffs. Otherwise, I took the same approach as with the ranking variables.
This model shows that `points_for` and `points_against` are both significant predictors of a team's total wins as well. Additionally, their correlations with total wins are **0.7276** and **-0.6667**, respectively. This indicates a strong, positive relationship for `points_for` and a strong, negative relationship for `points_against`. The offensive side, once again, has a slightly stronger relationship.
The graphical representations show these strong linear relationships as well, with playoff teams typically having a low Total Points Against and a high Total Points For on the season.
```{r, message = FALSE, warning = FALSE, echo = TRUE}
# Create linear model
points_model <- lm(total_wins ~ points_for + points_against)
summary(points_model)
# Compute Pearson correlation coefficients
cor(points_for, total_wins)
cor(points_against, total_wins)
```
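Because `playoffs` is binary, one way to estimate a playoff probability directly would be a logistic regression. The sketch below is not part of the analysis above; it uses a made-up `toy` data frame and a made-up rule linking point differential to playoff berths, purely to show the shape of such a model.

```r
# Hypothetical toy data (all values are made up for illustration)
set.seed(2)
n <- 200
toy <- data.frame(
  points_for = round(rnorm(n, mean = 350, sd = 70)),
  points_against = round(rnorm(n, mean = 350, sd = 60))
)

# Made-up rule: a larger point differential raises the chance of a playoff berth
differential <- toy$points_for - toy$points_against
toy$playoffs <- rbinom(n, size = 1, prob = plogis(differential / 40))

# Logistic regression: models the log-odds of making the playoffs
playoff_model <- glm(playoffs ~ points_for + points_against,
                     data = toy, family = binomial)

# Predicted playoff probability for a hypothetical 400-for, 320-against season
new_season <- data.frame(points_for = 400, points_against = 320)
p_playoffs <- predict(playoff_model, newdata = new_season, type = "response")
p_playoffs
```

With `type = "response"`, `predict()` returns a probability between 0 and 1 rather than log-odds, which is easier to read alongside the scatter plots.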
Column {.tabset .tabset-fade}
-----------------------------------------------------------------------------
### **Points For**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Create scatter plots
Points_For_Plot <- ggplot(data = nfl_standings, aes(points_for, total_wins, col = as.character(playoffs))) +
geom_point() +
labs(col = "Playoffs") +
ggtitle("Total Wins | Total Points Scored") +
xlab("Total Points Scored") +
ylab("Total Wins") +
ylim(0, 16) +
theme_stata()
ggplotly(Points_For_Plot)
```
### **Points Against**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
Points_Against_Plot <- ggplot(data = nfl_standings, aes(points_against, total_wins, col = as.character(playoffs))) +
geom_point() +
labs(col = "Playoffs") +
ggtitle("Total Wins | Total Points Against") +
xlab("Total Points Against") +
ylab("Total Wins") +
ylim(0, 16) +
theme_stata()
ggplotly(Points_Against_Plot)
```
Individual Game Observations {data-navmenu="Standings" data-orientation=columns}
=============================================================================
Column
-----------------------------------------------------------------------------
### **Individual Game Observations**
The last analysis takes a look at data from the individual NFL games. Using the `nfl_games` dataset, I investigated the different variables.
Now, to analyze the correlation between different variables, I used the GGally package to produce a detailed scatter plot matrix. The function `ggpairs()` produced histograms along the diagonal of the matrix. Pearson’s rho estimates, or statistics showing correlation, are seen in the upper-right. Scatter plots are seen in the lower-left. I analyzed six variables here - (1) Points Scored by Winning Team (`pts_win`); (2) Yards Gained by Winning Team (`yds_win`); (3) Turnovers Committed by Winning Team (`turnovers_win`); (4) Points Scored by Losing Team (`pts_loss`); (5) Yards Gained by Losing Team (`yds_loss`); and (6) Turnovers Committed by Losing Team (`turnovers_loss`).
I then grouped these variables by winning team vs. losing team. The correlation matrix for winning teams can be seen in the first tab to the right. As evident through both the scatter plots and Pearson’s rho estimates, there is little to no relationship between Points Scored by Winning Team and Turnovers Committed by Winning Team, or between Yards Gained by Winning Team and Turnovers Committed by Winning Team. Both of these correlation coefficients are close to zero. On the other hand, there is a moderately strong, positive relationship between Points Scored by Winning Team and Yards Gained by Winning Team, with a Pearson rho estimate of **0.537**.
The second tab to the right shows the same variables for losing teams. Very similar to the winning teams, there is little to no relationship between Points Scored by Losing Team and Turnovers Committed by Losing Team, or between Yards Gained by Losing Team and Turnovers Committed by Losing Team. Both of these correlation coefficients are close to zero. On the other hand, there is a strong, positive relationship between Points Scored by Losing Team and Yards Gained by Losing Team, with a Pearson rho estimate of **0.632**.
The main takeaway from these correlation matrices is that the more yards a team gains, the more points it tends to score. To compare a winning team and a losing team, I wanted to see if more turnovers by the losing team were associated with more points for the winning team. The third tab to the right shows the scatter plot behind the linear model with `pts_win` as the response variable and `turnovers_loss` as the predictor variable. In this graphic, there is a slight, positive relationship between the Turnovers Committed by Losing Team and Points Scored by Winning Team. The correlation coefficient between the two variables is **0.176**.
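The Pearson estimates discussed above can also be read from a plain correlation matrix. The sketch below uses base R's `cor()` on a made-up `toy_games` data frame; the column names mirror `nfl_games`, but every value is simulated for illustration.

```r
# Hypothetical toy data standing in for nfl_games (all values are simulated)
set.seed(3)
n <- 300
yds_win <- rnorm(n, mean = 370, sd = 70)
toy_games <- data.frame(
  pts_win = 5 + 0.06 * yds_win + rnorm(n, sd = 7),  # points loosely tied to yards
  yds_win = yds_win,
  turnovers_win = rpois(n, lambda = 1)              # unrelated to the other columns
)

# Pairwise Pearson correlations between every pair of columns
cor_matrix <- cor(toy_games, method = "pearson")
round(cor_matrix, 3)
```

The matrix is symmetric with ones on the diagonal, so each off-diagonal value corresponds to one of the coefficients that `ggpairs()` prints in the upper-right panels.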
Column {.tabset .tabset-fade}
-----------------------------------------------------------------------------
### **Variables by Winning Team**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Create correlation graphs for variables in nfl_games
cor_graphs1 <- ggpairs(nfl_games %>% select(pts_win, yds_win, turnovers_win))
ggplotly(cor_graphs1)
```
### **Variables by Losing Team**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Create correlation graphs for variables in nfl_games
cor_graphs2 <- ggpairs(nfl_games %>% select(pts_loss, yds_loss, turnovers_loss))
ggplotly(cor_graphs2)
```
### **Turnovers Committed by Losing Team vs. Points Scored by Winning Team**
```{r, message = FALSE, warning = FALSE, echo = FALSE, results = 'hide'}
# Attach the dataset
attach(nfl_games)
# Create linear model
games_model <- lm(pts_win ~ turnovers_loss)
summary(games_model)
cor(turnovers_loss, pts_win)
```
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Plot the data
turn_plot <- ggplot(nfl_games) + geom_point(aes(x = turnovers_loss, y = pts_win), color = "coral1") +
ggtitle("Turnovers by Losing Team vs. Points Scored by Winning Team")
ggplotly(turn_plot)
```
Average Weather Conditions {data-navmenu="Weather" data-orientation=columns}
=============================================================================
Column
-----------------------------------------------------------------------------
### **Understanding Weather Conditions**
Looking at the `nfl_weather` dataset, I wanted to see which teams performed well under certain weather conditions. To do this, I first wanted to observe the average temperature, humidity, and wind speed at each home location. In R, I utilized the `dplyr` package to tidy my data and create new columns with `mutate`. To visualize the average temperature, humidity, and wind speed at each location, I created bar graphs for each variable per city.
```{r, message = FALSE, warning = FALSE, echo = FALSE, results = 'hide'}
head(nfl_weather, 50)
# Add a column of weather averages for each team's location
nfl_weather <- nfl_weather %>%
group_by(home_team_city) %>%
mutate(avg_temperature = mean(temperature)) %>%
mutate(avg_humidity = mean(humidity)) %>%
mutate(avg_wind = mean(wind_mph))
names(nfl_weather)
nfl_weather_2 <- nfl_weather[-c(3, 4, 5, 6, 7, 8, 9, 10, 11)]
# Remove duplicates based on home_team_city column
nfl_weather_2 <- nfl_weather_2[!duplicated(nfl_weather_2$home_team_city), ]
# Round values to two decimal places
nfl_weather_2$avg_temperature <- round(nfl_weather_2$avg_temperature, digits = 2)
nfl_weather_2$avg_humidity <- round(nfl_weather_2$avg_humidity, digits = 2)
nfl_weather_2$avg_wind <- round(nfl_weather_2$avg_wind, digits = 2)
head(nfl_weather_2, 20)
```
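The per-city averages computed above with `mutate()` followed by de-duplication could also be produced in a single aggregation step. The sketch below uses base R's `aggregate()` on a made-up `toy_weather` data frame; the column names mirror `nfl_weather`, but the cities and readings are illustrative only.

```r
# Hypothetical toy data standing in for nfl_weather (values are made up)
toy_weather <- data.frame(
  home_team_city = c("Miami", "Miami", "Buffalo", "Buffalo", "Seattle"),
  temperature = c(80, 74, 40, 30, 55),
  humidity = c(70, 72, 60, 66, 79),
  wind_mph = c(8, 10, 12, 9, 6)
)

# One row per city, with the mean of each weather variable
city_averages <- aggregate(
  cbind(temperature, humidity, wind_mph) ~ home_team_city,
  data = toy_weather,
  FUN = mean
)

# Sort from warmest to coldest, as in the temperature ranking
city_averages <- city_averages[order(-city_averages$temperature), ]
city_averages
```

Aggregating directly returns one row per city, so no duplicate rows need to be removed and no columns need to be dropped by position.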
From the visualizations to the right, it appears that the following five cities have the highest average temperatures:
1. **Miami, Florida** -- 76.70°F
2. **Detroit, Michigan** -- 71.64°F
3. **Tampa Bay, Florida** -- 71.51°F
4. **New Orleans, Louisiana** -- 71.03°F
5. **Houston, Texas** -- 71.03°F
The following five cities have the highest average humidity:
1. **Seattle, Washington** -- 79%
2. **San Francisco, California** -- 71%
3. **Oakland, California** -- 71%
4. **Green Bay, Wisconsin** -- 71%
5. **Miami, Florida** -- 70%
Lastly, the following five cities have the highest winds (in mph):
1. **New England, Massachusetts** -- 11.54 mph
2. **New York, New York** -- 10.57 mph
3. **Dallas, Texas** -- 10.27 mph
4. **Denver, Colorado** -- 9.96 mph
5. **Buffalo, New York** -- 9.95 mph
Column {.tabset .tabset-fade}
-----------------------------------------------------------------------------
### **Average Temperature Per City**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Bar graph of average temperature per city
avg_weather <- ggplot(nfl_weather_2, aes(x = reorder(home_team_city, avg_temperature), y = avg_temperature)) +
ggtitle( "Average Temperature by City") +
xlab("City") + ylab("Temperature (in Fahrenheit)") +
geom_col(width = 0.7) + coord_flip()
ggplotly(avg_weather)
```
### **Average Humidity Per City**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Bar graph of average humidity per city
avg_humidity <- ggplot(nfl_weather_2, aes(x = reorder(home_team_city, avg_humidity), y =
avg_humidity)) +
ggtitle( "Average Humidity by City") +
xlab("City") + ylab("Humidity") +
geom_col(width = 0.7) + coord_flip()
ggplotly(avg_humidity)
```
### **Average Wind Per City**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Bar graph of average wind per city
avg_wind <- ggplot(nfl_weather_2, aes(x = reorder(home_team_city, avg_wind), y = avg_wind)) +
ggtitle( "Average Wind (mph) by City") +
xlab("City") + ylab("Wind (mph)") +
geom_col(width = 0.7) + coord_flip()
ggplotly(avg_wind)
```
Can Weather Predict Game Outcomes? {data-navmenu="Weather" data-orientation=columns}
=============================================================================
Column {.sidebar data-width=450}
-----------------------------------------------------------------------------
#### **Can Weather Predict Game Outcomes?**
Next, I wanted to see if high wind speeds were correlated with low-scoring games. To do this, I first summed the `home_score` and `away_score` variables into a new `total_score` column. With this column, I computed correlation coefficients between `total_score` and `temperature`, `total_score` and `humidity`, and `total_score` and `wind_mph`.
I also ran two linear models to see if weather conditions could predict whether a game would be high-scoring or low-scoring, using a random 70-30 training-testing split of the dataset.
The correlation coefficient between `total_score` and `temperature` is **0.0164**. Because a coefficient of plus or minus one indicates a perfect linear correlation and zero indicates none, there is little to no correlation between `total_score` and `temperature`.
The correlation coefficient between `total_score` and `humidity` is **-0.1207**. Here, I can see that there is a slight, negative correlation between the two variables. Similar to this is the relationship between `total_score` and `wind_mph`. The correlation coefficient is **-0.1328**, indicating a slight, negative correlation.
Of the three relationships tested, wind speed has the strongest (though still weak) correlation with total score. The higher the wind, the lower the combined score of the game tends to be.
Looking further, I created a linear model to see if temperature, humidity, and wind can predict the total combined score of the game. The first model included all three predictor variables; however, it did not perform well. The adjusted R-squared of this model is a meager 0.02 or so, indicating that the model explains only about 2% of the variance in `total_score`. The `humidity` and `wind_mph` variables were statistically significant at the 95% confidence level, as they had p-values less than 0.05. The out-of-sample Mean Squared Error (MSE) of this model is higher than 170, which corresponds to a typical prediction error of roughly 13 points per game (the square root of 170). It is important to note that these results will vary given the random training-testing split.
Using the above data and knowing that `wind_mph` was the most correlated with `total_score`, I created a second model with `wind_mph` as the sole predictor variable. This model performed even worse, with an adjusted R-squared of around 0.015, indicating that the model explains less than 2% of the variance. The `wind_mph` variable is still statistically significant at the 95% confidence level, with a p-value less than 0.05. Additionally, the MSE of this model is still higher than 170. It is important to note that these results will also vary given the random training-testing split.
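As a minimal sketch of the evaluation described above, the example below fixes the random seed so the 70-30 split is repeatable, then reports both the MSE and its square root (RMSE). The `toy` data frame, the seed value, and the simulated relationship between wind and score are all assumptions made for illustration, not values from `nfl_weather`.

```r
# Hypothetical toy data (all values are simulated for illustration)
set.seed(2021)  # fixing the seed makes the random split repeatable
n <- 500
toy <- data.frame(wind_mph = runif(n, min = 0, max = 25))
toy$total_score <- 45 - 0.3 * toy$wind_mph + rnorm(n, sd = 13)

# 70-30 training-testing split
train_index <- sample(nrow(toy), size = floor(0.70 * nrow(toy)))
toy_train <- toy[train_index, ]
toy_test <- toy[-train_index, ]

# Fit on the training rows only, then predict the held-out rows
toy_model <- lm(total_score ~ wind_mph, data = toy_train)
pred <- predict(toy_model, newdata = toy_test)

mse <- mean((pred - toy_test$total_score)^2)
rmse <- sqrt(mse)  # back on the original scale: points per game
c(mse = mse, rmse = rmse)
```

Reporting the RMSE alongside the MSE puts the error in the same units as `total_score`, which makes it easier to judge how far off a typical prediction is.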
In conclusion, weather does not accurately predict whether a game will be high-scoring or low-scoring. I originally thought higher wind speeds would make it noticeably more difficult for players to score; however, the weak correlations and low R-squared values did not support that expectation.
I also considered analyzing how teams fared in games played on the opposite side of the country (e.g., New England Patriots at Los Angeles Rams). I decided against that analysis because other factors, such as home-field advantage, would also contribute to a loss.
Column {.tabset .tabset-fade}
-----------------------------------------------------------------------------
### **Linear Model with all Weather Variables**
```{r, message = FALSE, warning = FALSE, echo = FALSE, results = 'hide'}
nfl_weather$total_score = nfl_weather$home_score + nfl_weather$away_score
head(nfl_weather)
```
```{r, message = FALSE, warning = FALSE, echo = TRUE, results = 'hide'}
attach(nfl_weather)
cor(total_score, temperature); cor(total_score, humidity); cor(total_score, wind_mph)
# Split the data into training and testing
sample_index <- sample(nrow(nfl_weather), nrow(nfl_weather)*0.70)
weather_train <- nfl_weather[sample_index,]
weather_test <- nfl_weather[-sample_index,]
```
```{r, message = FALSE, warning = FALSE, echo = TRUE}
# Create the linear model
weather_model <- lm(total_score ~ temperature + humidity + wind_mph, data = weather_train)
model_summary <- summary(weather_model)
model_summary
# Out-of-sample performance
# Named pred rather than pi to avoid masking R's built-in constant pi
pred <- predict(object = weather_model, newdata = weather_test)
mean((pred - weather_test$total_score)^2) # MSE
```
### **Linear Model with Wind Variable**
```{r, message = FALSE, warning = FALSE, echo = TRUE}
# Drop all variables except wind_mph
weather_model_2 <- lm(total_score ~ wind_mph, data = weather_train)
model_summary_2 <- summary(weather_model_2)
model_summary_2
# Out-of-sample performance
pred_2 <- predict(object = weather_model_2, newdata = weather_test)
mean((pred_2 - weather_test$total_score)^2) # MSE
```
Teams {data-navmenu="League Leaders" data-orientation=columns}
=============================================================================
Column {.sidebar data-width=450}
-----------------------------------------------------------------------------
#### **Successful Postseason Teams**
For the next part of my analysis, I decided to look at the best head coaches and quarterbacks from the past 20 years. I created this `nfl_playoffs` dataset myself by taking every team from the past 20 years that made the playoffs and then listing their head coach and starting postseason quarterback.
The goal of this portion of the analysis is to see if the head coach or quarterback really makes a difference in team success. For example, was it a coincidence that the Tampa Bay Buccaneers won the most recent Super Bowl -- conveniently, the first year with Tom Brady as quarterback?
Over the past 20 years, it appears that the top three teams are as follows:
1. **New England Patriots**
2. **Indianapolis Colts**
3. **Green Bay Packers**
The New England Patriots have been a dominant force over the past two decades. I would argue that nearly everyone who is not a New England Patriots fan roots against them simply because they have won so frequently. Fans typically attribute that success to head coach Bill Belichick and former players Tom Brady and Rob Gronkowski.
The Indianapolis Colts' success over the past 20 years can be attributed to their former quarterback Peyton Manning. Manning joined the Colts in 1998 and led the team to its first championship in 36 seasons at Super Bowl XLI.
The Green Bay Packers have also been a consistently strong team over the past 20 years. Their former general manager, Ted Thompson, was a key figure in their success. The Packers have also had strong leaders on their coaching staff and roster. Notably, the Packers have a famously loyal fan base.
Column {.tabset .tabset-fade}
-----------------------------------------------------------------------------
### **Successful Postseason Teams**
```{r, message = FALSE, warning = FALSE, echo = FALSE, results = 'hide'}
head(nfl_playoffs)
nfl_teams_playoffs <- nfl_playoffs %>% count(nfl_playoffs$team_name)
nfl_teams_playoffs <- as.data.frame(nfl_teams_playoffs)
nfl_teams_playoffs
names(nfl_teams_playoffs)
# Rename the columns
nfl_teams_playoffs <- nfl_teams_playoffs %>% rename(
team_name = "nfl_playoffs$team_name",
playoffs = "n"
)
nfl_teams_playoffs
```
```{r, message = FALSE, warning = FALSE, echo = TRUE}
top_10_teams_playoffs <- nfl_teams_playoffs %>% top_n(10, playoffs) %>%
arrange(desc(playoffs))
top_10_teams_playoffs <- top_10_teams_playoffs[1:10,]
kable(top_10_teams_playoffs)
```
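The counting step above can be illustrated on a small scale. The sketch below tallies playoff appearances per team with base R's `table()` on a made-up `toy_playoffs` data frame; the seasons and team lists are invented for illustration and do not reflect the real `nfl_playoffs` data.

```r
# Hypothetical toy data standing in for nfl_playoffs (rows are made up)
toy_playoffs <- data.frame(
  season = c(2018, 2018, 2019, 2019, 2019, 2020),
  team_name = c("Patriots", "Colts", "Patriots", "Colts", "Packers", "Patriots")
)

# One row per team with its number of playoff appearances
appearances <- as.data.frame(
  table(team_name = toy_playoffs$team_name),
  responseName = "playoffs",
  stringsAsFactors = FALSE
)

# Sort from most to fewest appearances and keep the top two
appearances <- appearances[order(-appearances$playoffs), ]
top_teams <- head(appearances, 2)
top_teams
```

Naming the count column through `responseName` produces the `playoffs` column directly, so no separate renaming step is needed.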
### **Top Ten Teams by Playoff Appearances**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
playoff_team_plot <- ggplot(data = top_10_teams_playoffs, aes(x = reorder(team_name, -playoffs),
y = playoffs)) +
geom_bar(stat = "identity", width = 0.5, fill = "black") +
scale_y_continuous(name = "Total Playoff Appearances") +
scale_x_discrete(name = "Team") +
ggtitle("Top Ten Teams in the NFL by Playoff Appearances") +
theme(axis.text.x = element_text(angle = 65, vjust = 0.6)) +
geom_text(aes(label = playoffs), position = position_dodge(width = 0.5),
vjust = 2,
color = "white", size = 3.5)
playoff_team_plot
```
Head Coaches {data-navmenu="League Leaders" data-orientation=columns}
=============================================================================
Column {.sidebar data-width=450}
-----------------------------------------------------------------------------
#### **Successful Postseason Head Coaches**
Now knowing the top-performing teams in the NFL, I want to see which coaches have led these teams to success.
Looking at the visualization **Top Ten Coaches by Playoff Appearances** to the right, it is evident that the top two coaches in the NFL over the past 20 years are:
1. **Bill Belichick**
2. **Andy Reid**
Bill Belichick, head coach of the New England Patriots, has been with the team since 2000. As head coach, he has six Super Bowl championships (XXXVI, XXXVIII, XXXIX, XLIX, LI, and LIII). He has won AP NFL Coach of the Year in 2003, 2007, and 2010. He has also won 31 playoff games. I cannot say I was surprised to see him listed as number one in this analysis. His career record as a coach is 311-148 (0.678).
Andy Reid is the current head coach for the Kansas City Chiefs. He has been with the team since 2013. Prior to that, Reid was the head coach for the Philadelphia Eagles (1999 - 2012). He has won two Super Bowl championships (XXXI and LIV) - one as an assistant coach and one as a head coach. His career record as a coach is 238-145-1 (0.621).
Next, I wanted to look at the head coaches with the most Super Bowl championships won. Do the coaches with the most playoff appearances also win the most championships? Given that Marvin Lewis, former head coach of the Cincinnati Bengals, is number ten in the graphic **Top Ten Coaches by Playoff Appearances**, I cannot be quite sure.
In the visualization **Top Ten Coaches by Super Bowl Championships** to the right, it is evident that Bill Belichick is still the dominant head coach from the past 20 years, with *six* Super Bowl championships.
The head coach with the next highest number of Super Bowl championships as head coach is Tom Coughlin. Tom Coughlin was the head coach of the Jacksonville Jaguars from 1995 - 2002, and he was the head coach of the New York Giants from 2004 - 2015. His two Super Bowl championships were as the head coach of the New York Giants (XLII and XLVI). His career record as a coach in the NFL was 182-157 (0.537).
Column {.tabset .tabset-fade}
-----------------------------------------------------------------------------
### **List of Top Coaches by Playoff Appearances**
```{r, message = FALSE, warning = FALSE, echo = FALSE, results = 'hide'}
head(nfl_playoffs)
nfl_coaches_playoffs <- nfl_playoffs %>% count(nfl_playoffs$head_coach)
nfl_coaches_playoffs <- as.data.frame(nfl_coaches_playoffs)
nfl_coaches_playoffs
# Rename the columns
nfl_coaches_playoffs <- nfl_coaches_playoffs %>% rename(
head_coach = "nfl_playoffs$head_coach",
playoffs = "n"
)
nfl_coaches_playoffs
```
```{r, message = FALSE, warning = FALSE, echo = TRUE}
top_10_coaches_playoffs <- nfl_coaches_playoffs %>% top_n(10, playoffs) %>% arrange(desc(playoffs))
top_10_coaches_playoffs <- top_10_coaches_playoffs[1:10,]
kable(top_10_coaches_playoffs)
```
### **Top Ten Coaches by Playoff Appearances**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
playoff_coach_plot <- ggplot(data = top_10_coaches_playoffs, aes(x = reorder(head_coach, -playoffs), y = playoffs)) +
geom_bar(stat = "identity", width = 0.5, fill = "red") +
scale_y_continuous(name = "Total Playoff Appearances") +
scale_x_discrete(name = "Head Coach") +
ggtitle("Top Ten Coaches in the NFL by Playoff Appearances") +
theme(axis.text.x = element_text(angle = 65, vjust = 0.6)) +
geom_text(aes(label = playoffs), position = position_dodge(width = 0.5), vjust = 2,
color = "white", size = 3.5)
playoff_coach_plot
```
### **List of Top Coaches by Super Bowl Championships**
```{r, message = FALSE, warning = FALSE, echo = FALSE, results = 'hide'}
head(nfl_playoffs)
names(nfl_playoffs)
# Add a column of count of Super Bowl wins for each head coach
nfl_playoffs <- nfl_playoffs %>%
group_by(head_coach) %>%
mutate(total_sb_coach = sum(sb_winner))
head(nfl_playoffs)
# Create new dataset for top coaches
nfl_coaches_sb <- nfl_playoffs[-c(1, 2, 3, 4, 5, 6, 8)]
head(nfl_coaches_sb)
# Remove duplicates based on head_coach column
nfl_coaches_sb <- nfl_coaches_sb[!duplicated(nfl_coaches_sb$head_coach), ]
```
```{r, message = FALSE, warning = FALSE, echo = TRUE}
top_10_coaches <- nfl_coaches_sb %>% top_n(10, total_sb_coach) %>% arrange(desc(total_sb_coach))
top_10_coaches <- top_10_coaches[1:10,]
kable(top_10_coaches)
```
### **Top Ten Coaches by Super Bowl Championships**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
sb_coach_plot <- ggplot(data = top_10_coaches, aes(x = reorder(head_coach, -total_sb_coach), y = total_sb_coach)) +
geom_bar(stat = "identity", width = 0.5, fill = "darkblue") +
scale_y_continuous(name = "Total Super Bowls Won") +
scale_x_discrete(name = "Head Coach") +
ggtitle("Top Ten Coaches in the NFL by Super Bowl Championships") +
theme(axis.text.x = element_text(angle = 65, vjust = 0.6)) +
geom_text(aes(label = total_sb_coach), position = position_dodge(width = 0.5), vjust = 2,
color = "white", size = 3.5)
sb_coach_plot
```
Quarterbacks {data-navmenu="League Leaders" data-orientation=columns}
=============================================================================
Column {.sidebar data-width=450}
-----------------------------------------------------------------------------
#### **Successful Postseason Quarterbacks**
An individual can be a great coach; however, they also need a great team. Quarterbacks are often described as the leaders of the NFL. With that said, I wanted to take a look at the best quarterbacks from the past 20 years.
From the visualization **Top Ten Quarterbacks by Playoff Appearances** to the right, it is evident that the top three quarterbacks in the NFL over the past 20 years are:
1. **Tom Brady**
2. **Ben Roethlisberger**
3. **Drew Brees**
It is no surprise that Tom Brady is number one, as he is often referred to as the Greatest Of All Time (GOAT). Tom Brady was drafted by the New England Patriots in the sixth round of the 2000 NFL Draft. Since then, he has become a seven-time Super Bowl champion (six with the New England Patriots and one with the Tampa Bay Buccaneers). He is still active in the NFL as the quarterback for the Tampa Bay Buccaneers. As of 2020, his completion percentage is 64% and his accolades are many. I am interested to see how much longer he will excel in the league.
Ben Roethlisberger, the long-time quarterback for the Pittsburgh Steelers, was drafted in the first round of the 2004 NFL Draft. He has won two Super Bowl championships, and his completion percentage is 64.4%. He just signed with the Steelers for another year, so (as a Steelers fan) I am hoping he will lead the team to a third championship this upcoming season.
Drew Brees started his career with the San Diego Chargers (2001 - 2005), but he is most known for his career as the quarterback for the New Orleans Saints (2006 - 2020). Brees has won one Super Bowl championship, and he just announced his retirement this year. His completion percentage was 67.7%.
These top three quarterbacks are some of the most famous players in the NFL. Their career statistics speak for themselves.
Next, I wanted to see which quarterbacks have won the most Super Bowl Championships over the past 20 years.
Once again, looking at the **Top Ten Quarterbacks by Super Bowl Championships** visualization to the right, Tom Brady is the most dominant quarterback in the NFL based on Super Bowl championships. Ben Roethlisberger is close behind. The other two quarterbacks whom I have not yet discussed are Eli Manning and Peyton Manning. Both have won two Super Bowl championships since 2000.
Eli Manning was the quarterback for the New York Giants from 2004 - 2019. He won two Super Bowl Championships (XLII and XLVI) and was the Super Bowl MVP for both games.
Peyton Manning was the quarterback for the Indianapolis Colts from 1998 - 2011 and the quarterback for the Denver Broncos from 2012 - 2015. He also won two Super Bowl Championships (XLI and 50) and was the Super Bowl MVP for Super Bowl XLI. He won one Super Bowl with the Colts in the 2006 season and one Super Bowl with the Broncos in the 2015 season.
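For readers curious how championship totals per quarterback can be tallied, here is a small sketch. It assumes `sb_winner` is a 0/1 indicator with one row per playoff appearance. The toy values are placeholders, and `summarise()` is shown as one possible approach that yields a single row per quarterback.

```r
library(dplyr)

# Toy stand-in for nfl_playoffs; sb_winner is assumed to be 1 when the
# quarterback's team won the Super Bowl that season and 0 otherwise.
toy_playoffs <- tibble(
  qb        = c("QB A", "QB A", "QB A", "QB B", "QB B", "QB C"),
  sb_winner = c(1, 0, 1, 0, 1, 0)
)

# Collapse to one row per quarterback with the total number of titles
qb_titles <- toy_playoffs %>%
  group_by(qb) %>%
  summarise(total_sb_qb = sum(sb_winner), .groups = "drop") %>%
  arrange(desc(total_sb_qb))

qb_titles
```

Because `summarise()` returns one row per group, no separate de-duplication step is needed in this version.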
Column {.tabset .tabset-fade}
-----------------------------------------------------------------------------
### **List of Top Quarterbacks by Playoff Appearances**
```{r, message = FALSE, warning = FALSE, echo = FALSE, results = 'hide'}
head(nfl_playoffs)
nfl_qb_playoffs <- nfl_playoffs %>% count(nfl_playoffs$qb)
nfl_qb_playoffs <- as.data.frame(nfl_qb_playoffs)
nfl_qb_playoffs <- nfl_qb_playoffs[-1]
nfl_qb_playoffs
# Rename the columns
nfl_qb_playoffs <- nfl_qb_playoffs %>% rename(
qb = "nfl_playoffs$qb",
playoffs = "n"
)
nfl_qb_playoffs
```
```{r, message = FALSE, warning = FALSE, echo = TRUE}
top_10_qb_playoffs <- nfl_qb_playoffs %>% top_n(10, playoffs) %>% arrange(desc(playoffs))
top_10_qb_playoffs <- top_10_qb_playoffs[1:10,]
kable(top_10_qb_playoffs)
```
### **Top Ten Quarterbacks by Playoff Appearances**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
playoff_qb_plot <- ggplot(data = top_10_qb_playoffs, aes(x = reorder(qb, -playoffs), y = playoffs)) +
geom_bar(stat = "identity", width = 0.5, fill = "slategray") +
scale_y_continuous(name = "Total Playoff Appearances") +
scale_x_discrete(name = "Quarterback") +
ggtitle("Top Ten Quarterbacks in the NFL by Playoff Appearances") +
theme(axis.text.x = element_text(angle = 65, vjust = 0.6)) +
geom_text(aes(label = playoffs), position = position_dodge(width = 0.5), vjust = 2,
color = "white", size = 3.5)
playoff_qb_plot
```
### **List of Top Quarterbacks by Super Bowl Championships**
```{r, message = FALSE, warning = FALSE, echo = FALSE, results = 'hide'}
# Add a column of count of Super Bowl wins for each quarterback
nfl_playoffs <- nfl_playoffs %>%
group_by(qb) %>%
mutate(total_sb_qb = sum(sb_winner))
head(nfl_playoffs)
# Create new dataset for top quarterbacks
nfl_qb_sb <- nfl_playoffs[-c(1, 2, 3, 4, 5, 6, 7, 9)]
head(nfl_qb_sb)
# Remove duplicates based on quarterback column
nfl_qb_sb <- nfl_qb_sb[!duplicated(nfl_qb_sb$qb), ]
```
```{r, message = FALSE, warning = FALSE, echo = TRUE}
top_10_qb <- nfl_qb_sb %>% top_n(10, total_sb_qb) %>% arrange(desc(total_sb_qb))
top_10_qb <- top_10_qb[1:10,]
kable(top_10_qb)
```
### **Top Ten Quarterbacks by Super Bowl Championships**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
sb_qb_plot <- ggplot(data = top_10_qb, aes(x = reorder(qb, -total_sb_qb), y = total_sb_qb)) +
geom_bar(stat = "identity", width = 0.5, fill = "darkgray") +
scale_y_continuous(name = "Total Super Bowls Won") +
scale_x_discrete(name = "Quarterback") +
ggtitle("Top Ten Quarterbacks in the NFL by Super Bowl Championships") +
theme(axis.text.x = element_text(angle = 65, vjust = 0.6)) +
geom_text(aes(label = total_sb_qb), position = position_dodge(width = 0.5), vjust = 2,
color = "white", size = 3.5)
sb_qb_plot
```
Passing Yards Leaders {data-navmenu="League Leaders" data-orientation=columns}
=============================================================================
Column {.sidebar data-width=450}
-----------------------------------------------------------------------------
#### **Passing Yards Leaders**
For the next portion of my analysis, I wanted to analyze the top passing yards leaders. Given these are always quarterbacks, I wanted to see if this was consistent with my previous analysis of postseason quarterback success.
From the visualizations to the right, it is evident that Drew Brees led the league in passing yards six times over the past 20 years. However, he did not have the most playoff appearances. It is interesting that Tom Brady is on the list only three times, yet he has been by far the most successful quarterback in the postseason.
Additionally, looking at the **Top Passing Yards Leaders** tab, the visualization is interactive. It is evident that the top passer over the past 20 years was Peyton Manning in 2013. The data points are colored based on player. The legend can be seen on the right.
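The two observations above (how often a player led the league, and which single season was the highest) can be checked with a short sketch like the one below. It assumes one row per season with `year`, `player`, and `yds` columns, as in `nfl_passing`. The values and player labels are made up for illustration.

```r
library(dplyr)

# Toy stand-in for nfl_passing: one league leader per season.
# Player labels and yardage are placeholders, not real data.
toy_passing <- tibble(
  year   = 2001:2006,
  player = c("Passer A", "Passer B", "Passer A",
             "Passer C", "Passer A", "Passer B"),
  yds    = c(4500, 4700, 4800, 5100, 4600, 4900)
)

# How many seasons did each player lead the league?
times_led <- toy_passing %>%
  count(player, name = "seasons_led", sort = TRUE)

# Which single season had the highest yardage?
best_season <- toy_passing %>%
  slice_max(yds, n = 1)

times_led
best_season
```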
Column {.tabset .tabset-fade}
-----------------------------------------------------------------------------
### **Passing Yards Summary**
```{r, message = FALSE, warning = FALSE, echo = TRUE}
kable(nfl_passing)
```
### **Top Passing Yards Leaders**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
passing_leaders <- nfl_passing %>%
ggplot(aes(x = year, y = yds)) +
geom_point(alpha = 0.8, aes(color = player)) +
ggtitle("Top Passing Yards Per Year")
ggplotly(passing_leaders)
```
Rushing Yards Leaders {data-navmenu="League Leaders" data-orientation=columns}
=============================================================================
Column {.sidebar data-width=450}
-----------------------------------------------------------------------------
#### **Rushing Yards Leaders**
Looking at the leaders for rushing yards, I performed the same analysis as above.
As seen in the visualizations to the right, the rushing yards leaders have much more variation than the passing yards leaders. Most recently, **Derrick Henry** of the Tennessee Titans has been the dominant running back, with 2,027 yards in 2020. He has been the top rusher two years in a row.
Additionally, looking at the **Top Rushing Yards Leaders** tab, the visualization is also interactive. It is evident that the top rusher over the past 20 years was Adrian Peterson in 2012. The data points are colored based on player. The legend can be seen on the right.
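One way to spot back-to-back leaders is to compare each season's leader with the previous season's. This is only a sketch on placeholder data, assuming one row per season with `year`, `player`, and `yds` columns as in `nfl_rushing`. The `repeat_leader` column is a name introduced here for illustration.

```r
library(dplyr)

# Toy stand-in for nfl_rushing: one league leader per season.
# Player labels and yardage are placeholders, not real data.
toy_rushing <- tibble(
  year   = 2016:2020,
  player = c("Rusher A", "Rusher B", "Rusher C", "Rusher D", "Rusher D"),
  yds    = c(1600, 1300, 1400, 1500, 2000)
)

# Flag seasons where the leader is the same player as the season before.
# The first season has no predecessor, so its flag is NA and filter() drops it.
repeat_leaders <- toy_rushing %>%
  arrange(year) %>%
  mutate(repeat_leader = player == dplyr::lag(player)) %>%
  filter(repeat_leader)

repeat_leaders
```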
Column {.tabset .tabset-fade}
-----------------------------------------------------------------------------
### **Rushing Yards Summary**
```{r, message = FALSE, warning = FALSE, echo = TRUE}
kable(nfl_rushing)
```
### **Top Rushing Yards Leaders**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
rushing_leaders <- nfl_rushing %>%
ggplot(aes(x = year, y = yds)) +
geom_point(alpha = 0.8, aes(color = player)) +
ggtitle("Top Rushing Yards Per Year")
ggplotly(rushing_leaders)
```
Penalty Yards Per Game {data-navmenu="League Leaders" data-orientation=rows}
=============================================================================
### **NFL Average Penalty Yards Per Game**
Looking at the average penalty yards per game, I was able to find a dataset that recorded the average penalty yards against each team from 2003 - 2020. I wanted to find out which team was the most penalized.
From the graphic below, it is evident that the **Las Vegas Raiders** have been the most penalized team in the NFL. The top five most penalized teams are:
1. **Las Vegas Raiders**
2. **Baltimore Ravens**
3. **Detroit Lions**
4. **Tampa Bay Buccaneers**
5. **Los Angeles Rams**
The least penalized team in the NFL is the **Indianapolis Colts**.
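A ranking like the one above can be pulled out of the data with a few lines. The sketch below assumes a table with `team_name` and `total` columns, as in `nfl_penalty`. The team labels and values are placeholders.

```r
library(dplyr)

# Toy stand-in for nfl_penalty; values are placeholders, not real data.
toy_penalty <- tibble(
  team_name = c("Team A", "Team B", "Team C", "Team D"),
  total     = c(58.2, 49.7, 61.5, 53.0)
)

# Most penalized teams, highest average first
most_penalized <- toy_penalty %>%
  slice_max(total, n = 2)

# Least penalized team
least_penalized <- toy_penalty %>%
  slice_min(total, n = 1)

most_penalized
least_penalized
```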
### **Top Penalized Teams**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
top_penalized <- ggplot(nfl_penalty, aes(x = reorder(team_name, total), y = total)) +
geom_bar(fill = "slategray", stat = "identity") +
coord_flip() +
ggtitle("Most Penalized Teams in the NFL") +
ylab("Average Penalty Yards") + xlab("Team Name")
ggplotly(top_penalized)
```
Rival Analysis {data-navmenu="League Leaders" data-orientation=columns}
=============================================================================
Column {.sidebar data-width=450}
-----------------------------------------------------------------------------
#### **Rival Analysis**
The last part of my analysis is to look at five of the top rivalries in the NFL and the results of these games over the past 20 years.
The term "rivalry" can be a bit subjective; however, I chose the following five rivalries to analyze:
1. **Green Bay Packers** vs. **Chicago Bears** -- It is evident that the Green Bay Packers have been dominant in this rivalry, winning 71% of the encounters.
2. **Dallas Cowboys** vs. **Philadelphia Eagles** -- This is a very good rivalry, as both teams have performed well. It is evident that the Philadelphia Eagles have been leading this rivalry, winning 54% of the encounters.
3. **Kansas City Chiefs** vs. **Las Vegas Raiders** (formerly Oakland Raiders) -- It is evident that the Kansas City Chiefs have been dominant in this rivalry, winning 63% of the encounters.
4. **Baltimore Ravens** vs. **Pittsburgh Steelers** -- This is a phenomenal rivalry to watch, as both teams have won 22 games out of 44 games total. Each team has won 50% of the encounters.
5. **Washington Football Team** (formerly Washington Redskins) vs. **New York Giants** -- It is evident that the New York Giants have been dominant in this rivalry, winning 68% of the encounters.
Then, just for fun, I decided to analyze the **Pittsburgh Steelers** vs. the **Cincinnati Bengals**. It is evident that the Pittsburgh Steelers have been dominant in this rivalry, winning 79% of the encounters.
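The win percentages quoted above come from counting winners in the head-to-head games. As a hedged sketch, the same steps can be wrapped in one helper. `rivalry_record()` is a hypothetical function written for this illustration. It assumes `home_team`, `away_team`, and `winner` columns as in `nfl_games`, and it does not handle tied games.

```r
library(dplyr)

# Hypothetical helper: head-to-head record between two teams.
# Assumes every game has a winner (ties are not handled here).
rivalry_record <- function(games_df, team_a, team_b) {
  games_df %>%
    # Keep only games where both the home and away team are in the pair
    filter(home_team %in% c(team_a, team_b),
           away_team %in% c(team_a, team_b)) %>%
    # One row per winner with the number of games won
    count(winner, name = "games") %>%
    # Share of the head-to-head games won, as a rounded percentage
    mutate(win_pct = round(100 * games / sum(games)))
}

# Toy stand-in for nfl_games; team labels are placeholders.
toy_games <- tibble(
  home_team = c("Team A", "Team B", "Team A", "Team B", "Team C"),
  away_team = c("Team B", "Team A", "Team B", "Team A", "Team A"),
  winner    = c("Team A", "Team A", "Team B", "Team A", "Team C")
)

rivalry_record(toy_games, "Team A", "Team B")
```

A helper like this would let each rivalry tab call the same code with a different pair of team names.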
Column {.tabset .tabset-fade}
-----------------------------------------------------------------------------
### **Packers vs. Bears**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Packers vs. Bears
rivalry1 <- nfl_games %>% filter(home_team %in% c("Green Bay Packers", "Chicago Bears")) %>% filter(away_team %in% c("Green Bay Packers", "Chicago Bears"))
```
```{r, message = FALSE, warning = FALSE, echo = FALSE, results = 'hide'}
# Add a column of count of wins for each team
rivalry1_freq <- rivalry1 %>% count(rivalry1$winner)
rivalry1_freq <- as.data.frame(rivalry1_freq)
rivalry1_freq
```
**Which team has won more games in the past 20 years?**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Rename columns
rivalry1_freq <- rivalry1_freq %>% rename(
team = "rivalry1$winner",
games = "n"
)
kable(rivalry1_freq)
```
**Individual Game Statistics**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
kable(rivalry1)
```
### **Cowboys vs. Eagles**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Cowboys vs. Eagles
rivalry2 <- nfl_games %>% filter(home_team %in% c("Dallas Cowboys", "Philadelphia Eagles")) %>% filter(away_team %in% c("Dallas Cowboys", "Philadelphia Eagles"))
```
```{r, message = FALSE, warning = FALSE, echo = FALSE, results = 'hide'}
# Add a column of count of wins for each team
rivalry2_freq <- rivalry2 %>% count(rivalry2$winner)
rivalry2_freq <- as.data.frame(rivalry2_freq)
rivalry2_freq
```
**Which team has won more games in the past 20 years?**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Rename columns
rivalry2_freq <- rivalry2_freq %>% rename(
team = "rivalry2$winner",
games = "n"
)
kable(rivalry2_freq)
```
**Individual Game Statistics**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
kable(rivalry2)
```
### **Chiefs vs. Raiders**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Chiefs vs. Raiders
rivalry3 <- nfl_games %>% filter(home_team %in% c("Kansas City Chiefs", "Oakland Raiders")) %>% filter(away_team %in% c("Kansas City Chiefs", "Oakland Raiders"))
```
```{r, message = FALSE, warning = FALSE, echo = FALSE, results = 'hide'}
# Add a column of count of wins for each team
rivalry3_freq <- rivalry3 %>% count(rivalry3$winner)
rivalry3_freq <- as.data.frame(rivalry3_freq)
rivalry3_freq
```
**Which team has won more games in the past 20 years?**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Rename columns
rivalry3_freq <- rivalry3_freq %>% rename(
team = "rivalry3$winner",
games = "n"
)
kable(rivalry3_freq)
```
**Individual Game Statistics**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
kable(rivalry3)
```
### **Ravens vs. Steelers**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Ravens vs. Steelers
rivalry4 <- nfl_games %>% filter(home_team %in% c("Baltimore Ravens", "Pittsburgh Steelers")) %>% filter(away_team %in% c("Baltimore Ravens", "Pittsburgh Steelers"))
```
```{r, message = FALSE, warning = FALSE, echo = FALSE, results = 'hide'}
# Add a column of count of wins for each team
rivalry4_freq <- rivalry4 %>% count(rivalry4$winner)
rivalry4_freq <- as.data.frame(rivalry4_freq)
rivalry4_freq
```
**Which team has won more games in the past 20 years?**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Rename columns
rivalry4_freq <- rivalry4_freq %>% rename(
team = "rivalry4$winner",
games = "n"
)
kable(rivalry4_freq)
```
**Individual Game Statistics**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
kable(rivalry4)
```
### **Washington vs. Giants**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Washington vs. Giants
rivalry5 <- nfl_games %>% filter(home_team %in% c("Washington Redskins", "New York Giants")) %>% filter(away_team %in% c("Washington Redskins", "New York Giants"))
```
```{r, message = FALSE, warning = FALSE, echo = FALSE, results = 'hide'}
# Add a column of count of wins for each team
rivalry5_freq <- rivalry5 %>% count(rivalry5$winner)
rivalry5_freq <- as.data.frame(rivalry5_freq)
rivalry5_freq
```
**Which team has won more games in the past 20 years?**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Rename columns
rivalry5_freq <- rivalry5_freq %>% rename(
team = "rivalry5$winner",
games = "n"
)
kable(rivalry5_freq)
```
**Individual Game Statistics**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
kable(rivalry5)
```
### **Steelers vs. Bengals**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Steelers vs. Bengals
rivalry6 <- nfl_games %>% filter(home_team %in% c("Cincinnati Bengals", "Pittsburgh Steelers")) %>% filter(away_team %in% c("Cincinnati Bengals", "Pittsburgh Steelers"))
```
```{r, message = FALSE, warning = FALSE, echo = FALSE, results = 'hide'}
# Add a column of count of wins for each team
rivalry6_freq <- rivalry6 %>% count(rivalry6$winner)
rivalry6_freq <- as.data.frame(rivalry6_freq)
rivalry6_freq
```
**Which team has won more games in the past 20 years?**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
# Rename columns
rivalry6_freq <- rivalry6_freq %>% rename(
team = "rivalry6$winner",
games = "n"
)
kable(rivalry6_freq)
```
**Individual Game Statistics**
```{r, message = FALSE, warning = FALSE, echo = FALSE}
kable(rivalry6)
```
Summary {data-navmenu="Summary" data-orientation=columns}
=============================================================================
#### **Summary**
In this analysis, the main goal was to understand what all goes into winning an NFL game and what teams are historically successful in the standings. I was able to break this analysis into multiple sections, including, but not limited to: (1) The Importance of Fan Attendance; (2) Standings over the Years; (3) Offense vs. Defense; and (4) Individual Game Observations.
Through extensive use of R, I investigated eight datasets with various information regarding the NFL. I frequently used linear modeling to examine the relationships between variables in these datasets. Additionally, the `ggplot2` package delivered great visualizations to showcase this breakdown of the NFL. I also created new variables and tables to drill deeper into the raw data. One of my primary focuses was a breakdown of the divisions and their successes over the past 20 years. Box plot visualizations between the two conferences illuminated how teams have fared in the win column from their best season to their worst season.
My first analysis looked into NFL Fan Attendance. Graphical representations were created to better understand which teams have a strong fan base and how consistently fans show up on a yearly basis. From this analysis, it was evident that the **Dallas Cowboys** have the strongest fan base and the **Los Angeles Chargers** have the weakest. Additionally, greater game attendance correlated positively with a team's total wins per season.
Secondly, I focused on the divisional standings through the years. As mentioned above, box plot visualizations by division showed the range of success for NFL teams. Per division, these teams have had the most success based on the `nfl_standings` dataset:
* **AFC East**: New England Patriots
* **AFC North**: Pittsburgh Steelers
* **AFC South**: Indianapolis Colts
* **AFC West**: Denver Broncos
* **NFC East**: Philadelphia Eagles
* **NFC North**: Green Bay Packers
* **NFC South**: New Orleans Saints
* **NFC West**: Seattle Seahawks
Using `geom_col()`, I observed that the **AFC East** has won the most Super Bowl Championships. This is due to the phenomenal success of Tom Brady and the New England Patriots during this time period.
Next, I researched one of the most common arguments in football: is the offense or the defense more important? Linear modeling of the `nfl_standings` data was completed on several variables. High offensive rankings and defensive rankings correlate with more wins for teams. Even though having a great offense and a great defense are both important, the correlation tests indicated that a better offense is slightly more important to a team's success than a better defense. I created a table of the last 20 Super Bowl Champions and showcased the `offensive_ranking` and `defensive_ranking`. Teams have been trending towards having better offenses in the last few years, as evidenced by this table.
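To illustrate the kind of comparison described here, the sketch below computes the correlation of wins with each rating and fits a linear model with both. It runs on simulated data, so its numbers say nothing about the real NFL. The `wins` column name and the assumption that higher `offensive_ranking` and `defensive_ranking` values mean a better unit are mine.

```r
# Simulated stand-in for nfl_standings. The effect sizes (2.0 and 1.5)
# are arbitrary choices for illustration, not estimates from real data.
set.seed(42)
n_rows <- 200
toy_standings <- data.frame(
  offensive_ranking = rnorm(n_rows),
  defensive_ranking = rnorm(n_rows)
)
toy_standings$wins <- 8 +
  2.0 * toy_standings$offensive_ranking +
  1.5 * toy_standings$defensive_ranking +
  rnorm(n_rows)

# Correlation of wins with each rating
cor_offense <- cor(toy_standings$wins, toy_standings$offensive_ranking)
cor_defense <- cor(toy_standings$wins, toy_standings$defensive_ranking)

# Linear model with both ratings as predictors
fit <- lm(wins ~ offensive_ranking + defensive_ranking, data = toy_standings)

c(offense = cor_offense, defense = cor_defense)
coef(fit)
```

Comparing the two correlations (or the two fitted coefficients, when the predictors are on the same scale) is one simple way to judge which side of the ball is more strongly associated with wins.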
I then observed individual game data in the NFL. Through graphs created by `ggpairs()`, I was able to view correlation coefficients for six variables. The main conclusion I deduced from this is that a positive correlation exists between yards gained and points scored.
Extending my analysis, I looked into how weather conditions play a role in game outcomes. I was able to find the average temperature, humidity, and wind for each location where teams may play. I was also able to train and test a model with a 70-30 split to see if weather conditions predict whether the game will be high-scoring or low-scoring. I deduced there is little predictability of game outcomes from weather conditions, as both the in-sample and out-of-sample performance of my models was underwhelming.
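A 70-30 split with in-sample and out-of-sample evaluation can be sketched as follows. Everything here is an assumption for illustration: the data are simulated, the column names (`temperature`, `humidity`, `wind`, `high_scoring`) are placeholders, and logistic regression is just one model that fits a high-scoring versus low-scoring outcome.

```r
# Simulated stand-in for game-level weather data. The outcome is drawn
# independently of the weather, so little predictive signal is expected.
set.seed(123)
n_games <- 300
toy_weather <- data.frame(
  temperature = runif(n_games, 20, 90),
  humidity    = runif(n_games, 20, 100),
  wind        = runif(n_games, 0, 25)
)
toy_weather$high_scoring <- rbinom(n_games, 1, 0.5)

# 70-30 split: sample 70% of row indices for training, keep the rest for testing
train_idx <- sample(seq_len(n_games), size = round(0.7 * n_games))
train <- toy_weather[train_idx, ]
test  <- toy_weather[-train_idx, ]

# Logistic regression fitted on the training rows only
fit <- glm(high_scoring ~ temperature + humidity + wind,
           data = train, family = binomial)

# Share of games classified correctly at a 0.5 probability cutoff
accuracy <- function(model, data) {
  pred <- as.integer(predict(model, newdata = data, type = "response") > 0.5)
  mean(pred == data$high_scoring)
}

c(in_sample = accuracy(fit, train), out_of_sample = accuracy(fit, test))
```

Accuracy close to 50% on both sets would be consistent with weather carrying little information about scoring.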
I also analyzed league leaders over the past 20 years -- this includes teams, head coaches, quarterbacks, passing yards leaders, rushing yards leaders, and penalty leaders. Understanding who the game-changers are is important when trying to predict which team will win.
Lastly, I analyzed popular rivalries in the NFL to see which teams have been dominant. My personal favorite finding is that the Pittsburgh Steelers are 33-9 against the Cincinnati Bengals over the past 20 years.
As a big NFL fan, it was incredibly interesting to see how the NFL has worked during my entire lifetime. Also, it was intriguing to see my favorite team's success over this time span. The NFL is a major industry with implications on many levels. People involved in sports gambling, the NFL Draft, and fantasy football, as well as the common fan, could all have different takeaways from this analysis that would help them better understand the recent history of the NFL. With fans across the globe, a deep dive into the NFL is exciting for many groups. Coaches and players would be able to more effectively prepare for their opponents, gamblers could make more educated bets, general managers could derive their team's needs in the Draft, and the common fan could revel in their team's history.
This data tells a phenomenal story of the state of the NFL. However, it is a game for a reason. No one will ever be able to fully predict NFL outcomes, and that is what makes the sport as intriguing as it is!