This analysis examines the regular-season performance statistics of NFL teams from 1999 to 2022. Each row of the dataset is one team in one season, and the variables are both categorical and quantitative: the team's abbreviation, season year, points scored, points allowed, wins, losses, completion percentage, yards gained, and many more. These variables describe the offensive and defensive capabilities of each team, allowing for a broad analysis of performance over the years. The data was sourced using the nflreadr package in R.
Loading data
library(tidyverse)
Warning: package 'ggplot2' was built under R version 4.3.3
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.0 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Rows: 765 Columns: 56
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): team
dbl (55): season, offense_completion_percentage, offense_total_yards_gained_...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
summary(nfl_data)
season team offense_completion_percentage
Min. :1999 Length:765 Min. :0.4128
1st Qu.:2005 Class :character 1st Qu.:0.5409
Median :2011 Mode :character Median :0.5730
Mean :2011 Mean :0.5732
3rd Qu.:2017 3rd Qu.:0.6061
Max. :2022 Max. :0.7043
offense_total_yards_gained_pass offense_total_yards_gained_run
Min. :1900 Min. :1001
1st Qu.:3117 1st Qu.:1612
Median :3541 Median :1808
Mean :3565 Mean :1843
3rd Qu.:3993 3rd Qu.:2044
Max. :5444 Max. :3326
offense_ave_yards_gained_pass offense_ave_yards_gained_run
Min. :3.896 Min. :2.979
1st Qu.:5.602 1st Qu.:3.992
Median :6.093 Median :4.278
Mean :6.143 Mean :4.300
3rd Qu.:6.718 3rd Qu.:4.596
Max. :8.577 Max. :5.784
offense_total_air_yards offense_ave_air_yards offense_total_yac
Min. : 0 Min. : 5.655 Min. : -1
1st Qu.: 0 1st Qu.: 7.711 1st Qu.: 0
Median :4180 Median : 8.278 Median :1557
Mean :3234 Mean : 8.312 Mean :1254
3rd Qu.:4800 3rd Qu.: 8.884 3rd Qu.:1882
Max. :6642 Max. :11.266 Max. :2850
NA's :221
offense_ave_yac offense_n_plays_pass offense_n_plays_run
Min. :-1.000 Min. :394.0 Min. :298.0
1st Qu.: 4.655 1st Qu.:541.0 1st Qu.:392.0
Median : 5.070 Median :580.0 Median :421.0
Mean : 5.092 Mean :579.5 Mean :426.8
3rd Qu.: 5.519 3rd Qu.:620.0 3rd Qu.:461.0
Max. :11.667 Max. :776.0 Max. :593.0
NA's :199
offense_n_interceptions offense_n_fumbles_lost_pass offense_n_fumbles_lost_run
Min. : 2.00 Min. : 0.000 Min. : 0.00
1st Qu.:12.00 1st Qu.: 4.000 1st Qu.: 2.00
Median :15.00 Median : 5.000 Median : 3.00
Mean :15.08 Mean : 5.339 Mean : 3.69
3rd Qu.:18.00 3rd Qu.: 7.000 3rd Qu.: 5.00
Max. :32.00 Max. :13.000 Max. :12.00
offense_total_epa_pass offense_total_epa_run offense_ave_epa_pass
Min. :-192.026 Min. :-107.835 Min. :-0.332342
1st Qu.: -48.991 1st Qu.: -43.718 1st Qu.:-0.087526
Median : 3.624 Median : -25.003 Median : 0.007097
Mean : 5.827 Mean : -22.281 Mean : 0.008385
3rd Qu.: 56.849 3rd Qu.: -5.015 3rd Qu.: 0.099770
Max. : 248.445 Max. : 84.928 Max. : 0.406900
offense_ave_epa_run offense_total_wpa_pass offense_total_wpa_run
Min. :-0.26904 Min. :-5.91666 Min. :-3.4171
1st Qu.:-0.10600 1st Qu.:-1.36412 1st Qu.:-1.1173
Median :-0.05740 Median :-0.10653 Median :-0.4679
Mean :-0.05591 Mean : 0.05851 Mean :-0.4312
3rd Qu.:-0.01138 3rd Qu.: 1.45503 3rd Qu.: 0.2319
Max. : 0.16437 Max. : 6.07264 Max. : 2.6223
offense_ave_wpa_pass offense_ave_wpa_run offense_success_rate_pass
Min. :-1.007e-02 Min. :-0.009572 Min. :0.3066
1st Qu.:-2.331e-03 1st Qu.:-0.002720 1st Qu.:0.4103
Median :-1.750e-04 Median :-0.001074 Median :0.4400
Mean : 4.066e-05 Mean :-0.001114 Mean :0.4412
3rd Qu.: 2.455e-03 3rd Qu.: 0.000533 3rd Qu.:0.4728
Max. : 9.903e-03 Max. : 0.004995 Max. :0.5799
offense_success_rate_run defense_completion_percentage
Min. :0.2837 Min. :0.4570
1st Qu.:0.3679 1st Qu.:0.5477
Median :0.3927 Median :0.5740
Mean :0.3953 Mean :0.5742
3rd Qu.:0.4211 3rd Qu.:0.6024
Max. :0.5238 Max. :0.6915
defense_total_yards_gained_pass defense_total_yards_gained_run
Min. :2413 Min. : 945
1st Qu.:3252 1st Qu.:1627
Median :3575 Median :1825
Mean :3565 Mean :1843
3rd Qu.:3857 3rd Qu.:2038
Max. :4800 Max. :2910
defense_ave_yards_gained_pass defense_ave_yards_gained_run
Min. :4.300 Min. :2.640
1st Qu.:5.747 1st Qu.:4.003
Median :6.125 Median :4.315
Mean :6.157 Mean :4.303
3rd Qu.:6.564 3rd Qu.:4.597
Max. :7.869 Max. :5.564
defense_total_air_yards defense_ave_air_yards defense_total_yac
Min. : 0 Min. : 5.761 Min. : 0
1st Qu.: 0 1st Qu.: 7.809 1st Qu.: 0
Median :4256 Median : 8.285 Median :1620
Mean :3234 Mean : 8.301 Mean :1254
3rd Qu.:4722 3rd Qu.: 8.793 3rd Qu.:1853
Max. :6235 Max. :10.511 Max. :2586
NA's :221
defense_ave_yac defense_n_plays_pass defense_n_plays_run
Min. : 0.000 Min. :433.0 Min. :317.0
1st Qu.: 4.763 1st Qu.:549.0 1st Qu.:397.0
Median : 5.093 Median :578.0 Median :424.0
Mean : 5.121 Mean :579.5 Mean :426.8
3rd Qu.: 5.479 3rd Qu.:612.0 3rd Qu.:454.0
Max. :11.667 Max. :728.0 Max. :584.0
NA's :202
defense_n_interceptions defense_n_fumbles_lost_pass defense_n_fumbles_lost_run
Min. : 2.00 Min. : 0.000 Min. : 0.00
1st Qu.:11.00 1st Qu.: 4.000 1st Qu.: 2.00
Median :15.00 Median : 5.000 Median : 3.00
Mean :15.08 Mean : 5.339 Mean : 3.69
3rd Qu.:18.00 3rd Qu.: 7.000 3rd Qu.: 5.00
Max. :33.00 Max. :18.000 Max. :14.00
defense_total_epa_pass defense_total_epa_run defense_ave_epa_pass
Min. :-184.562 Min. :-129.144 Min. :-0.33435
1st Qu.: -33.754 1st Qu.: -41.358 1st Qu.:-0.05870
Median : 8.190 Median : -23.425 Median : 0.01393
Mean : 5.827 Mean : -22.281 Mean : 0.01093
3rd Qu.: 45.419 3rd Qu.: -3.341 3rd Qu.: 0.07791
Max. : 174.264 Max. : 61.022 Max. : 0.29840
defense_ave_epa_run defense_total_wpa_pass defense_total_wpa_run
Min. :-0.360737 Min. :-4.72279 Min. :-3.3453
1st Qu.:-0.101735 1st Qu.:-1.01271 1st Qu.:-1.0272
Median :-0.055393 Median : 0.10270 Median :-0.4424
Mean :-0.055497 Mean : 0.05851 Mean :-0.4312
3rd Qu.:-0.007957 3rd Qu.: 1.14822 3rd Qu.: 0.2101
Max. : 0.129284 Max. : 5.13211 Max. : 2.2887
defense_ave_wpa_pass defense_ave_wpa_run defense_success_rate_pass
Min. :-0.0085558 Min. :-0.008906 Min. :0.3243
1st Qu.:-0.0017829 1st Qu.:-0.002522 1st Qu.:0.4192
Median : 0.0001650 Median :-0.001025 Median :0.4434
Mean : 0.0001062 Mean :-0.001112 Mean :0.4422
3rd Qu.: 0.0019818 3rd Qu.: 0.000460 3rd Qu.:0.4651
Max. : 0.0085110 Max. : 0.004849 Max. :0.5409
defense_success_rate_run points_scored points_allowed wins
Min. :0.2780 Min. :161.0 Min. :165.0 Min. : 0.000
1st Qu.:0.3689 1st Qu.:301.0 1st Qu.:314.0 1st Qu.: 6.000
Median :0.3963 Median :353.0 Median :351.0 Median : 8.000
Mean :0.3952 Mean :354.1 Mean :354.1 Mean : 8.022
3rd Qu.:0.4205 3rd Qu.:402.0 3rd Qu.:394.0 3rd Qu.:10.000
Max. :0.5024 Max. :606.0 Max. :519.0 Max. :16.000
losses ties score_differential
Min. : 0.000 Min. :0.0000 Min. :-261
1st Qu.: 6.000 1st Qu.:0.0000 1st Qu.: -74
Median : 8.000 Median :0.0000 Median : 1
Mean : 8.022 Mean :0.0366 Mean : 0
3rd Qu.:10.000 3rd Qu.:0.0000 3rd Qu.: 72
Max. :16.000 Max. :1.0000 Max. : 315
Creating new categorical columns with mutate()
# mutating and making categorical columns for high/low completion % teams and high/low yards gained teams.
# These are also only the offense stats.
nfl_data <- nfl_data %>%
  mutate(
    completion_category = ifelse(
      offense_completion_percentage >= median(offense_completion_percentage),
      "High Completion", "Low Completion"
    ),
    yards_category = ifelse(
      offense_total_yards_gained_pass >= median(offense_total_yards_gained_pass),
      "High Yards", "Low Yards"
    )
  )

# Preview the extra columns
head(nfl_data)
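The plots further down reference `outcome_category` and `score_category`, which the `mutate()` call above does not create. Here is a minimal sketch of how they might be defined. It assumes (this is a guess, not the original code) that a winning team has more wins than losses and that a high-scoring team is at or above the median of `points_scored`. The toy data frame holds invented values purely so the snippet runs on its own.

```r
library(dplyr)

# Toy rows with invented values, only so the sketch is runnable;
# with the real data, replace `toy` with `nfl_data`.
toy <- data.frame(
  team = c("AAA", "BBB", "CCC", "DDD"),
  wins = c(12, 4, 8, 10),
  losses = c(4, 12, 8, 6),
  points_scored = c(450, 250, 340, 400)
)

# Assumed definitions (not confirmed by the original analysis):
# - "Winning Team": more wins than losses (a .500 team counts as "Losing Team")
# - "High Scoring": at or above the median of points_scored
toy <- toy %>%
  mutate(
    outcome_category = if_else(wins > losses, "Winning Team", "Losing Team"),
    score_category = if_else(
      points_scored >= median(points_scored),
      "High Scoring", "Low Scoring"
    )
  )

toy
```

The category labels match the names used in the `colors` vector of the final chart. A different cutoff (for example, a separate group for teams with equal wins and losses) would change how the box plots split.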
A multiple linear regression of points scored on offensive passing yards gained and passing yards gained against the defense.
# Multiple linear regression
model <- lm(
  points_scored ~ offense_total_yards_gained_pass + defense_total_yards_gained_pass,
  data = nfl_data
)
summary(model)
Call:
lm(formula = points_scored ~ offense_total_yards_gained_pass +
defense_total_yards_gained_pass, data = nfl_data)
Residuals:
Min 1Q Median 3Q Max
-123.076 -39.143 -2.418 35.467 203.895
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 67.815318 17.088487 3.968 7.92e-05 ***
offense_total_yards_gained_pass 0.076527 0.003320 23.053 < 2e-16 ***
defense_total_yards_gained_pass 0.003761 0.004697 0.801 0.423
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 53.68 on 762 degrees of freedom
Multiple R-squared: 0.4456, Adjusted R-squared: 0.4442
F-statistic: 306.3 on 2 and 762 DF, p-value: < 2.2e-16
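To read the coefficients concretely, the fitted equation can be evaluated by hand. The sketch below plugs the estimates printed above into the regression equation for a team with average passing yardage on both sides of the ball (3,565 yards, the rounded means from the summary table). The helper name `predict_points` is made up for this illustration.

```r
# Coefficients copied from the summary(model) output above
b0 <- 67.815318      # (Intercept)
b_off <- 0.076527    # offense_total_yards_gained_pass
b_def <- 0.003761    # defense_total_yards_gained_pass

# Hypothetical helper: predicted season points for given passing yardage
predict_points <- function(off_pass_yards, def_pass_yards) {
  b0 + b_off * off_pass_yards + b_def * def_pass_yards
}

# A team at the (rounded) mean of both predictors
avg_team <- predict_points(3565, 3565)

# Change in predicted points from 100 extra offensive passing yards,
# holding the defensive figure fixed
gain_per_100 <- predict_points(3665, 3565) - avg_team

round(avg_team, 1)
round(gain_per_100, 2)
```

The average team comes out at roughly 354 points, in line with the mean of `points_scored` (354.1), and each additional 100 offensive passing yards is associated with about 7.7 more points over a season. The defensive coefficient is small and not statistically distinguishable from zero (p = 0.423), so it contributes little to the prediction. With the fitted object available, `predict(model, newdata = ...)` gives the same result without copying coefficients.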
Simple plots of winning/losing teams, high/low completion categories, and yards gained.
library(ggplot2)

# Box plot comparing the amount of yards gained to the completion rate
ggplot(nfl_data, aes(x = completion_category, y = offense_total_yards_gained_pass)) +
  geom_boxplot() +
  labs(x = "Completion Category", y = "Total Yards Gained Passing") +
  ggtitle("Distribution of Total Yards Gained Passing by Completion Category")

# Box plot of offensive passing yards by outcome (winning/losing teams)
ggplot(nfl_data, aes(x = outcome_category, y = offense_total_yards_gained_pass)) +
  geom_boxplot() +
  labs(x = "Outcome Category", y = "Total Yards Gained Passing") +
  ggtitle("Distribution of Total Yards Gained Passing by Outcome Category")

# Box plot of defensive passing yards by outcome (winning/losing teams)
ggplot(nfl_data, aes(x = outcome_category, y = defense_total_yards_gained_pass)) +
  geom_boxplot() +
  labs(x = "Outcome Category", y = "Total Defense Yards Gained Passing") +
  ggtitle("Distribution of Total Defense Yards Gained Passing by Outcome Category")
Maybe I haven't mutated enough, but in this box plot it seems like the winning teams had fewer defensive yards gained than the losing teams. In the future I would definitely want to look into comparing the defensive stats in more detail.
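One way to check the impression from the box plot is to put numbers next to it. The sketch below summarises defensive passing yards within each outcome group. It assumes `outcome_category` holds the labels "Winning Team" and "Losing Team", and the toy rows are invented only so the snippet runs on its own.

```r
library(dplyr)

# Toy rows with invented values; swap in nfl_data for the real comparison.
toy <- data.frame(
  outcome_category = c("Winning Team", "Winning Team", "Losing Team", "Losing Team"),
  defense_total_yards_gained_pass = c(3300, 3500, 3700, 3900)
)

# Group size, median, and interquartile range of defensive passing yards
# within each outcome group
defense_summary <- toy %>%
  group_by(outcome_category) %>%
  summarise(
    n_teams = n(),
    median_def_pass_yards = median(defense_total_yards_gained_pass),
    iqr_def_pass_yards = IQR(defense_total_yards_gained_pass)
  )

defense_summary
```

If the medians on the real data are close relative to the interquartile range, the visual gap in the box plot is probably not meaningful on its own.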
Last statistical plot on offensive yards gained passing compared to winning/losing teams.
library(dplyr)
library(highcharter)
Warning: package 'highcharter' was built under R version 4.3.3
Registered S3 method overwritten by 'quantmod':
method from
as.zoo.data.frame zoo
Highcharts (www.highcharts.com) is a Highsoft software product which is
not free for commercial and Governmental use
# ChatGPT helped
# Filtering the data for the top 15 teams based on offensive yards gained passing
top_teams <- nfl_data %>%
  arrange(desc(offense_total_yards_gained_pass)) %>%
  slice(1:15)

# Colors being used, I just found some random ones
colors <- c(
  "High Scoring" = "#FF5733",
  "Low Scoring" = "#33FF7E",
  "Winning Team" = "#337FFF",
  "Losing Team" = "#FF33F3"
)

# Bar chart using highcharter
top_teams %>%
  hchart(
    type = "column",
    hcaes(x = team, y = offense_total_yards_gained_pass, color = score_category, group = outcome_category)
  ) %>%
  hc_legend(enabled = TRUE) %>%
  hc_colors(colors) %>%
  hc_title(text = "Top 15 NFL Teams by Offensive Yards Gained Passing") %>%
  hc_xAxis(title = list(text = "Team")) %>%
  hc_yAxis(title = list(text = "Offensive Yards Gained")) %>%
  hc_tooltip(pointFormat = "Offensive Yards Gained: {point.y}") %>%
  hc_plotOptions(column = list(stacking = "normal")) %>%
  hc_chart(zoomType = "x") %>%
  hc_add_theme(hc_theme_flat()) # Making a flat theme
Input to asJSON(keep_vec_names=TRUE) is a named vector. In a future version of jsonlite, this option will not be supported, and named vectors will be translated into arrays instead of objects. If you want JSON object output, please use a named list instead. See ?toJSON.
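Because each row of the data is one team in one season, sorting and keeping the first 15 rows yields the top 15 team-seasons, so one franchise can appear more than once under the same `team` label. Here is a minimal sketch of one way to make that explicit, using `slice_max()` and a combined label. The `team_season` column is an illustrative name and the toy values are invented.

```r
library(dplyr)

# Toy rows with invented values; with the real data, use nfl_data and n = 15.
toy <- data.frame(
  team = c("AAA", "AAA", "BBB", "CCC"),
  season = c(2011, 2012, 2013, 2011),
  offense_total_yards_gained_pass = c(5000, 4800, 5200, 3900)
)

top_rows <- toy %>%
  # Keep the n rows with the largest passing yardage, sorted descending
  slice_max(offense_total_yards_gained_pass, n = 3, with_ties = FALSE) %>%
  # Hypothetical label so repeated teams stay distinguishable on the x axis
  mutate(team_season = paste(team, season))

top_rows$team_season
```

Mapping `x = team_season` in `hcaes()` would then give every bar its own label instead of stacking several seasons under one team abbreviation.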
Reason for choosing this dataset
I chose this topic and dataset because it is personally relevant: I have a background playing football. As a fan of American football, I find exploring the performance of NFL teams both interesting and informative. Additionally, understanding the factors that contribute to a team's success or failure can provide valuable insights into strategic decision-making and player performance evaluation, which are crucial aspects of the sport.
Background
The NFL (National Football League) is a professional American football league consisting of 32 teams, divided equally between the National Football Conference (NFC) and the American Football Conference (AFC). Each team plays a regular season (16 games through 2020 and 17 games since 2021), followed by playoffs and ultimately the Super Bowl, which is the championship game of the NFL. This dataset, however, only covers stats from the regular season.
Background research on this topic reveals the increasing importance of data analytics in sports, particularly in football. Teams and analysts can use statistical techniques to gain a competitive edge in areas ranging from player performance evaluation to game strategy. Variables such as completion percentage, yards gained, and scoring differential can provide valuable insight into a team's strengths and weaknesses, guiding decision-making both on and off the field.
Visualization and Analysis
The visualization is a bar chart of offensive passing yards gained for the top 15 team-seasons in the dataset, so a team can appear more than once. Each bar is color-coded by category, such as high/low scoring and winning/losing teams, giving an overview of each team's performance. Several patterns emerged from the visualization. The biggest is the clear variation in offensive yards gained, even among the top teams. One surprise is the presence of outliers: teams that perform exceptionally well or poorly compared to the others. These outliers may indicate unique strategies, exceptional talent, or underlying issues within the team. One aspect that could be extended is the interactive features, such as tooltips with more detailed information about each team's performance.