Project 2

What Is My Topic?

This analysis covers the regular-season performance statistics of NFL teams from 1999 to 2022. The dataset has 765 team-season rows and 56 columns: one categorical variable (the team’s abbreviation) and 55 quantitative ones, including season year, points scored, points allowed, wins, losses, completion percentage, yards gained, and many more. These variables describe the offensive and defensive capabilities of each team, allowing for a broad analysis of performance over the years. The data was sourced using the nflreadr package in R.

Loading data

library(tidyverse)
Warning: package 'ggplot2' was built under R version 4.3.3
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.0     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
setwd("C:/Users/Truly/OneDrive/Documents/Data work")
nfl_data <- read_csv("nfl stats.csv")
Rows: 765 Columns: 56
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (1): team
dbl (55): season, offense_completion_percentage, offense_total_yards_gained_...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
summary(nfl_data)
     season         team           offense_completion_percentage
 Min.   :1999   Length:765         Min.   :0.4128               
 1st Qu.:2005   Class :character   1st Qu.:0.5409               
 Median :2011   Mode  :character   Median :0.5730               
 Mean   :2011                      Mean   :0.5732               
 3rd Qu.:2017                      3rd Qu.:0.6061               
 Max.   :2022                      Max.   :0.7043               
                                                                
 offense_total_yards_gained_pass offense_total_yards_gained_run
 Min.   :1900                    Min.   :1001                  
 1st Qu.:3117                    1st Qu.:1612                  
 Median :3541                    Median :1808                  
 Mean   :3565                    Mean   :1843                  
 3rd Qu.:3993                    3rd Qu.:2044                  
 Max.   :5444                    Max.   :3326                  
                                                               
 offense_ave_yards_gained_pass offense_ave_yards_gained_run
 Min.   :3.896                 Min.   :2.979               
 1st Qu.:5.602                 1st Qu.:3.992               
 Median :6.093                 Median :4.278               
 Mean   :6.143                 Mean   :4.300               
 3rd Qu.:6.718                 3rd Qu.:4.596               
 Max.   :8.577                 Max.   :5.784               
                                                           
 offense_total_air_yards offense_ave_air_yards offense_total_yac
 Min.   :   0            Min.   : 5.655        Min.   :  -1     
 1st Qu.:   0            1st Qu.: 7.711        1st Qu.:   0     
 Median :4180            Median : 8.278        Median :1557     
 Mean   :3234            Mean   : 8.312        Mean   :1254     
 3rd Qu.:4800            3rd Qu.: 8.884        3rd Qu.:1882     
 Max.   :6642            Max.   :11.266        Max.   :2850     
                         NA's   :221                            
 offense_ave_yac  offense_n_plays_pass offense_n_plays_run
 Min.   :-1.000   Min.   :394.0        Min.   :298.0      
 1st Qu.: 4.655   1st Qu.:541.0        1st Qu.:392.0      
 Median : 5.070   Median :580.0        Median :421.0      
 Mean   : 5.092   Mean   :579.5        Mean   :426.8      
 3rd Qu.: 5.519   3rd Qu.:620.0        3rd Qu.:461.0      
 Max.   :11.667   Max.   :776.0        Max.   :593.0      
 NA's   :199                                              
 offense_n_interceptions offense_n_fumbles_lost_pass offense_n_fumbles_lost_run
 Min.   : 2.00           Min.   : 0.000              Min.   : 0.00             
 1st Qu.:12.00           1st Qu.: 4.000              1st Qu.: 2.00             
 Median :15.00           Median : 5.000              Median : 3.00             
 Mean   :15.08           Mean   : 5.339              Mean   : 3.69             
 3rd Qu.:18.00           3rd Qu.: 7.000              3rd Qu.: 5.00             
 Max.   :32.00           Max.   :13.000              Max.   :12.00             
                                                                               
 offense_total_epa_pass offense_total_epa_run offense_ave_epa_pass
 Min.   :-192.026       Min.   :-107.835      Min.   :-0.332342   
 1st Qu.: -48.991       1st Qu.: -43.718      1st Qu.:-0.087526   
 Median :   3.624       Median : -25.003      Median : 0.007097   
 Mean   :   5.827       Mean   : -22.281      Mean   : 0.008385   
 3rd Qu.:  56.849       3rd Qu.:  -5.015      3rd Qu.: 0.099770   
 Max.   : 248.445       Max.   :  84.928      Max.   : 0.406900   
                                                                  
 offense_ave_epa_run offense_total_wpa_pass offense_total_wpa_run
 Min.   :-0.26904    Min.   :-5.91666       Min.   :-3.4171      
 1st Qu.:-0.10600    1st Qu.:-1.36412       1st Qu.:-1.1173      
 Median :-0.05740    Median :-0.10653       Median :-0.4679      
 Mean   :-0.05591    Mean   : 0.05851       Mean   :-0.4312      
 3rd Qu.:-0.01138    3rd Qu.: 1.45503       3rd Qu.: 0.2319      
 Max.   : 0.16437    Max.   : 6.07264       Max.   : 2.6223      
                                                                 
 offense_ave_wpa_pass offense_ave_wpa_run offense_success_rate_pass
 Min.   :-1.007e-02   Min.   :-0.009572   Min.   :0.3066           
 1st Qu.:-2.331e-03   1st Qu.:-0.002720   1st Qu.:0.4103           
 Median :-1.750e-04   Median :-0.001074   Median :0.4400           
 Mean   : 4.066e-05   Mean   :-0.001114   Mean   :0.4412           
 3rd Qu.: 2.455e-03   3rd Qu.: 0.000533   3rd Qu.:0.4728           
 Max.   : 9.903e-03   Max.   : 0.004995   Max.   :0.5799           
                                                                   
 offense_success_rate_run defense_completion_percentage
 Min.   :0.2837           Min.   :0.4570               
 1st Qu.:0.3679           1st Qu.:0.5477               
 Median :0.3927           Median :0.5740               
 Mean   :0.3953           Mean   :0.5742               
 3rd Qu.:0.4211           3rd Qu.:0.6024               
 Max.   :0.5238           Max.   :0.6915               
                                                       
 defense_total_yards_gained_pass defense_total_yards_gained_run
 Min.   :2413                    Min.   : 945                  
 1st Qu.:3252                    1st Qu.:1627                  
 Median :3575                    Median :1825                  
 Mean   :3565                    Mean   :1843                  
 3rd Qu.:3857                    3rd Qu.:2038                  
 Max.   :4800                    Max.   :2910                  
                                                               
 defense_ave_yards_gained_pass defense_ave_yards_gained_run
 Min.   :4.300                 Min.   :2.640               
 1st Qu.:5.747                 1st Qu.:4.003               
 Median :6.125                 Median :4.315               
 Mean   :6.157                 Mean   :4.303               
 3rd Qu.:6.564                 3rd Qu.:4.597               
 Max.   :7.869                 Max.   :5.564               
                                                           
 defense_total_air_yards defense_ave_air_yards defense_total_yac
 Min.   :   0            Min.   : 5.761        Min.   :   0     
 1st Qu.:   0            1st Qu.: 7.809        1st Qu.:   0     
 Median :4256            Median : 8.285        Median :1620     
 Mean   :3234            Mean   : 8.301        Mean   :1254     
 3rd Qu.:4722            3rd Qu.: 8.793        3rd Qu.:1853     
 Max.   :6235            Max.   :10.511        Max.   :2586     
                         NA's   :221                            
 defense_ave_yac  defense_n_plays_pass defense_n_plays_run
 Min.   : 0.000   Min.   :433.0        Min.   :317.0      
 1st Qu.: 4.763   1st Qu.:549.0        1st Qu.:397.0      
 Median : 5.093   Median :578.0        Median :424.0      
 Mean   : 5.121   Mean   :579.5        Mean   :426.8      
 3rd Qu.: 5.479   3rd Qu.:612.0        3rd Qu.:454.0      
 Max.   :11.667   Max.   :728.0        Max.   :584.0      
 NA's   :202                                              
 defense_n_interceptions defense_n_fumbles_lost_pass defense_n_fumbles_lost_run
 Min.   : 2.00           Min.   : 0.000              Min.   : 0.00             
 1st Qu.:11.00           1st Qu.: 4.000              1st Qu.: 2.00             
 Median :15.00           Median : 5.000              Median : 3.00             
 Mean   :15.08           Mean   : 5.339              Mean   : 3.69             
 3rd Qu.:18.00           3rd Qu.: 7.000              3rd Qu.: 5.00             
 Max.   :33.00           Max.   :18.000              Max.   :14.00             
                                                                               
 defense_total_epa_pass defense_total_epa_run defense_ave_epa_pass
 Min.   :-184.562       Min.   :-129.144      Min.   :-0.33435    
 1st Qu.: -33.754       1st Qu.: -41.358      1st Qu.:-0.05870    
 Median :   8.190       Median : -23.425      Median : 0.01393    
 Mean   :   5.827       Mean   : -22.281      Mean   : 0.01093    
 3rd Qu.:  45.419       3rd Qu.:  -3.341      3rd Qu.: 0.07791    
 Max.   : 174.264       Max.   :  61.022      Max.   : 0.29840    
                                                                  
 defense_ave_epa_run defense_total_wpa_pass defense_total_wpa_run
 Min.   :-0.360737   Min.   :-4.72279       Min.   :-3.3453      
 1st Qu.:-0.101735   1st Qu.:-1.01271       1st Qu.:-1.0272      
 Median :-0.055393   Median : 0.10270       Median :-0.4424      
 Mean   :-0.055497   Mean   : 0.05851       Mean   :-0.4312      
 3rd Qu.:-0.007957   3rd Qu.: 1.14822       3rd Qu.: 0.2101      
 Max.   : 0.129284   Max.   : 5.13211       Max.   : 2.2887      
                                                                 
 defense_ave_wpa_pass defense_ave_wpa_run defense_success_rate_pass
 Min.   :-0.0085558   Min.   :-0.008906   Min.   :0.3243           
 1st Qu.:-0.0017829   1st Qu.:-0.002522   1st Qu.:0.4192           
 Median : 0.0001650   Median :-0.001025   Median :0.4434           
 Mean   : 0.0001062   Mean   :-0.001112   Mean   :0.4422           
 3rd Qu.: 0.0019818   3rd Qu.: 0.000460   3rd Qu.:0.4651           
 Max.   : 0.0085110   Max.   : 0.004849   Max.   :0.5409           
                                                                   
 defense_success_rate_run points_scored   points_allowed       wins       
 Min.   :0.2780           Min.   :161.0   Min.   :165.0   Min.   : 0.000  
 1st Qu.:0.3689           1st Qu.:301.0   1st Qu.:314.0   1st Qu.: 6.000  
 Median :0.3963           Median :353.0   Median :351.0   Median : 8.000  
 Mean   :0.3952           Mean   :354.1   Mean   :354.1   Mean   : 8.022  
 3rd Qu.:0.4205           3rd Qu.:402.0   3rd Qu.:394.0   3rd Qu.:10.000  
 Max.   :0.5024           Max.   :606.0   Max.   :519.0   Max.   :16.000  
                                                                          
     losses            ties        score_differential
 Min.   : 0.000   Min.   :0.0000   Min.   :-261      
 1st Qu.: 6.000   1st Qu.:0.0000   1st Qu.: -74      
 Median : 8.000   Median :0.0000   Median :   1      
 Mean   : 8.022   Mean   :0.0366   Mean   :   0      
 3rd Qu.:10.000   3rd Qu.:0.0000   3rd Qu.:  72      
 Max.   :16.000   Max.   :1.0000   Max.   : 315      
                                                     

Adding categorical columns with mutate()

# Mutate the data, doing high/low scoring and winning/losing teams
nfl_data <- nfl_data %>%
  mutate(
    score_category = ifelse(points_scored >= median(points_scored), "High Scoring", "Low Scoring"),
    outcome_category = ifelse(wins >= median(wins), "Winning Team", "Losing Team")
  )

head(nfl_data)
# A tibble: 6 × 58
  season team  offense_completion_percentage offense_total_yards_gained_pass
   <dbl> <chr>                         <dbl>                           <dbl>
1   2022 ARI                           0.607                            3632
2   2022 ATL                           0.571                            2703
3   2022 BAL                           0.575                            3042
4   2022 BUF                           0.595                            4131
5   2022 CAR                           0.544                            2996
6   2022 CHI                           0.508                            2221
# ℹ 54 more variables: offense_total_yards_gained_run <dbl>,
#   offense_ave_yards_gained_pass <dbl>, offense_ave_yards_gained_run <dbl>,
#   offense_total_air_yards <dbl>, offense_ave_air_yards <dbl>,
#   offense_total_yac <dbl>, offense_ave_yac <dbl>, offense_n_plays_pass <dbl>,
#   offense_n_plays_run <dbl>, offense_n_interceptions <dbl>,
#   offense_n_fumbles_lost_pass <dbl>, offense_n_fumbles_lost_run <dbl>,
#   offense_total_epa_pass <dbl>, offense_total_epa_run <dbl>, …
# mutating and making categorical columns for high/low completion % teams and high/low yards gained teams. These are also only the offense stats.
nfl_data <- nfl_data %>%
  mutate(
    completion_category = ifelse(offense_completion_percentage >= median(offense_completion_percentage), "High Completion", "Low Completion"),
    yards_category = ifelse(offense_total_yards_gained_pass >= median(offense_total_yards_gained_pass), "High Yards", "Low Yards")
  )
# Preview the extra columns
head(nfl_data)
# A tibble: 6 × 60
  season team  offense_completion_percentage offense_total_yards_gained_pass
   <dbl> <chr>                         <dbl>                           <dbl>
1   2022 ARI                           0.607                            3632
2   2022 ATL                           0.571                            2703
3   2022 BAL                           0.575                            3042
4   2022 BUF                           0.595                            4131
5   2022 CAR                           0.544                            2996
6   2022 CHI                           0.508                            2221
# ℹ 56 more variables: offense_total_yards_gained_run <dbl>,
#   offense_ave_yards_gained_pass <dbl>, offense_ave_yards_gained_run <dbl>,
#   offense_total_air_yards <dbl>, offense_ave_air_yards <dbl>,
#   offense_total_yac <dbl>, offense_ave_yac <dbl>, offense_n_plays_pass <dbl>,
#   offense_n_plays_run <dbl>, offense_n_interceptions <dbl>,
#   offense_n_fumbles_lost_pass <dbl>, offense_n_fumbles_lost_run <dbl>,
#   offense_total_epa_pass <dbl>, offense_total_epa_run <dbl>, …
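
A median split with `>=` puts every row tied with the median into the "high" group, so the two categories are not always the same size. The sketch below illustrates this on a small made-up data frame (the `toy` data and its values are assumptions for illustration, not rows from nfl_data), using the same rule as outcome_category.

```r
library(dplyr)

# Hypothetical toy data: seven made-up team-seasons, not taken from nfl_data
toy <- data.frame(
  team = c("A", "B", "C", "D", "E", "F", "G"),
  wins = c(4, 6, 8, 8, 8, 10, 12)
)

toy <- toy %>%
  mutate(
    # Same rule as outcome_category: rows tied with the median count as "Winning Team"
    outcome_category = if_else(wins >= median(wins), "Winning Team", "Losing Team")
  )

# Count rows per category to see how balanced the split is
split_counts <- toy %>% count(outcome_category)
print(split_counts)
```

Running the same `count()` on nfl_data would show how many team-seasons land in each category. Since the median of wins is 8, teams that finished with exactly 8 wins are labeled "Winning Team" under this rule.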

Multiple linear regression of points scored on passing yards gained by the offense and passing yards gained against the defense.

# Multiple linear regression
model <- lm(points_scored ~ offense_total_yards_gained_pass + defense_total_yards_gained_pass, data = nfl_data)

summary(model)

Call:
lm(formula = points_scored ~ offense_total_yards_gained_pass + 
    defense_total_yards_gained_pass, data = nfl_data)

Residuals:
     Min       1Q   Median       3Q      Max 
-123.076  -39.143   -2.418   35.467  203.895 

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)                     67.815318  17.088487   3.968 7.92e-05 ***
offense_total_yards_gained_pass  0.076527   0.003320  23.053  < 2e-16 ***
defense_total_yards_gained_pass  0.003761   0.004697   0.801    0.423    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 53.68 on 762 degrees of freedom
Multiple R-squared:  0.4456,    Adjusted R-squared:  0.4442 
F-statistic: 306.3 on 2 and 762 DF,  p-value: < 2.2e-16
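
To make the coefficients easier to read, the fitted equation can be evaluated by hand. The sketch below copies the estimates from the output above into a small helper (`predict_points` is a hypothetical name introduced here, not part of the original analysis) and evaluates it at the sample means of about 3565 yards reported by summary(nfl_data).

```r
# Coefficients copied from the summary(model) output above
b0    <- 67.815318  # intercept
b_off <- 0.076527   # offense_total_yards_gained_pass
b_def <- 0.003761   # defense_total_yards_gained_pass

# Hypothetical helper: fitted points for given season passing-yard totals
predict_points <- function(off_pass_yards, def_pass_yards) {
  b0 + b_off * off_pass_yards + b_def * def_pass_yards
}

# Fitted value at the sample means of both predictors (about 3565 yards each)
at_means <- predict_points(3565, 3565)

# Change in fitted points for 100 extra offensive passing yards,
# holding the defensive term fixed
gain_100 <- predict_points(3665, 3565) - at_means

print(round(at_means, 1))
print(round(gain_100, 2))
```

The fitted value at the means comes out near 354 points, in line with the mean of points_scored (354.1), and each additional 100 offensive passing yards is associated with about 7.65 more points over a season. The defensive coefficient is small and not statistically significant (p = 0.423), and with an R-squared of about 0.45 the model leaves more than half of the variation in points scored unexplained.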

Simple plots on the winning/losing teams, high/low completions, and yards gained.

library(ggplot2)
# Box plot comparing the amount of yards gained to the completion rate
ggplot(nfl_data, aes(x = completion_category, y = offense_total_yards_gained_pass)) +
  geom_boxplot() +
  labs(x = "Completion Category", y = "Total Yards Gained Passing") +
  ggtitle("Distribution of Total Yards Gained Passing by Completion Category")

ggplot(nfl_data, aes(x = outcome_category, y = offense_total_yards_gained_pass)) +
  geom_boxplot() +
  labs(x = "Outcome Category", y = "Total Yards Gained Passing") +
  ggtitle("Distribution of Total Yards Gained Passing by Outcome Category")

ggplot(nfl_data, aes(x = outcome_category, y = defense_total_yards_gained_pass)) +
  geom_boxplot() +
  labs(x = "Outcome Category", y = "Total Defense Yards Gained Passing") +
  ggtitle("Distribution of Defensive Passing Yards Gained by Outcome Category")

Maybe I haven’t mutated enough, but in this box plot the winning teams seem to have fewer defensive passing yards gained than the losing teams. In the future I would definitely want to look further into comparing the defensive stats.
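
One way to follow up on the defensive comparison is to put numbers on the group difference seen in the box plot. The sketch below does this with group medians and a Welch two-sample t-test. It runs on synthetic data generated with rnorm() (the means and standard deviations are made-up assumptions, not estimates from nfl_data), so only the method carries over, not the result.

```r
set.seed(1)

# Hypothetical synthetic data standing in for nfl_data
toy <- data.frame(
  outcome_category = rep(c("Winning Team", "Losing Team"), each = 50),
  defense_total_yards_gained_pass = c(
    rnorm(50, mean = 3500, sd = 400),  # made-up values for winning teams
    rnorm(50, mean = 3600, sd = 400)   # made-up values for losing teams
  )
)

# Median defensive passing yards in each outcome category
group_medians <- aggregate(
  defense_total_yards_gained_pass ~ outcome_category,
  data = toy,
  FUN = median
)
print(group_medians)

# Welch two-sample t-test for a difference in group means
welch <- t.test(defense_total_yards_gained_pass ~ outcome_category, data = toy)
print(welch$p.value)
```

Replacing `toy` with nfl_data would apply the same comparison to the real outcome_category column.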

Last statistical plot on offensive yards gained passing compared to winning/losing teams.

library(dplyr)
library(highcharter)
Warning: package 'highcharter' was built under R version 4.3.3
Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 
Highcharts (www.highcharts.com) is a Highsoft software product which is
not free for commercial and Governmental use
# ChatGPT helped
# Keep the 15 team-seasons with the most offensive passing yards gained
top_teams <- nfl_data %>%
  arrange(desc(offense_total_yards_gained_pass)) %>%
  slice(1:15)

# Colors being used, I just found some random ones
colors <- c("High Scoring" = "#FF5733", "Low Scoring" = "#33FF7E", "Winning Team" = "#337FFF", "Losing Team" = "#FF33F3")

# Bar chart using highcharter
top_teams %>%
  hchart(type = "column", hcaes(x = team, y = offense_total_yards_gained_pass, color = score_category, group = outcome_category)) %>%
  hc_legend(enabled = TRUE) %>%
  hc_colors(unname(colors)) %>%  # hc_colors() takes a plain (unnamed) color vector
  hc_title(text = "Top 15 NFL Teams by Offensive Yards Gained Passing") %>%
  hc_xAxis(title = list(text = "Team")) %>%
  hc_yAxis(title = list(text = "Offensive Yards Gained")) %>%
  hc_tooltip(pointFormat = "Offensive Yards Gained: {point.y}") %>%
  hc_plotOptions(column = list(stacking = "normal")) %>%
  hc_chart(zoomType = "x") %>%
  hc_add_theme(hc_theme_flat())  # Making a flat theme

Reason for choosing this dataset

I chose this topic and dataset because I have a background of playing football and am a fan of the sport, so exploring the performance of NFL teams is both interesting and informative to me. Additionally, understanding the factors that contribute to a team’s success or failure can provide insight into strategic decision-making and player performance evaluation, both crucial aspects of the sport.

Background

The NFL (National Football League) is a professional American football league that has consisted of 32 teams since 2002, divided equally between the National Football Conference (NFC) and the American Football Conference (AFC). Through 2020 each team played 16 games over a 17-week regular season; since 2021 each team plays 17 games over 18 weeks. The regular season is followed by the playoffs and ultimately the Super Bowl, which is the championship game of the NFL. This dataset only measures stats from the regular season.

Background research on this topic reveals the increasing importance of data analytics in sports, particularly in football. Teams and analysts can use statistical techniques to gain a competitive edge in areas ranging from player performance evaluation to game strategy. Understanding variables such as completion percentage, yards gained, and scoring differentials can provide insight into a team’s strengths and weaknesses, guiding decision-making both on and off the field.

Visualization and Analysis

The visualization is a column chart of offensive passing yards gained for the 15 team-seasons with the highest totals. Columns are grouped by winning/losing category and colored by high/low scoring category, giving an overview of each team’s performance. Upon analyzing the visualization, the clearest pattern is the variation in passing yards gained even among these top teams. One surprise is the presence of outliers, teams that perform exceptionally well or poorly compared to the others. These outliers may indicate unique strategies, exceptional talent, or underlying issues within the team. The chart already has a basic tooltip showing yards gained; one aspect that could have been included is a more detailed tooltip with further information about each team’s performance, such as season, wins, and points scored.
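
Because each row of the data is one team in one season, a "top 15" taken by row can contain the same franchise more than once, and the x-axis label alone will not tell those rows apart. The sketch below shows one possible way to handle this on made-up data (the `toy` values and the `team_season` label column are assumptions for illustration): take the top rows with slice_max() and build a label that combines team and season.

```r
library(dplyr)

# Hypothetical toy data: made-up team-seasons, not values from nfl_data
toy <- data.frame(
  season = c(2020, 2021, 2022, 2021, 2022, 2020),
  team = c("AAA", "AAA", "AAA", "BBB", "BBB", "CCC"),
  offense_total_yards_gained_pass = c(5000, 4800, 3900, 4700, 4100, 4600)
)

top_rows <- toy %>%
  # Keep the 4 rows with the most passing yards, highest first
  slice_max(offense_total_yards_gained_pass, n = 4) %>%
  # Label each row by team and season so repeated teams stay distinguishable
  mutate(team_season = paste(team, season))

# Number of distinct franchises among the top rows
n_franchises <- n_distinct(top_rows$team)

print(top_rows)
print(n_franchises)
```

In this toy example the four top rows come from only three franchises, because "AAA" appears twice. Using a label like team_season on the x-axis would make that visible in the chart.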