Football Divisions

Author

Asher Scott

Intro: My dataset covers the top 50 quarterbacks in the NFL. The stats are from the 2023-2024 regular season and playoffs. The variables in my dataset are division, passing yards, yards per attempt, total attempts, total completions, interceptions, completion percentage, and passer rating. The data comes from the official NFL passing stats page. I chose this topic because I am an avid football fan and I specifically wanted to see how the divisions compare to one another. https://www.nfl.com/stats/player-stats/

library(ggplot2)
setwd("/Users/asherscott/Desktop/Data 110")
NFL <- read.csv("AFB.csv")

Here I use piping and mutate() to create a new column and manually place each quarterback (QB) into his respective division. I knew most of them from memory, but I used the website where I got the dataset to double-check.

library(dplyr)

# Assign each QB to his 2023 division. Names must match the Player column
# exactly; case_when() uses the first matching condition, so each player
# should appear in only one list.
FD_data <- NFL %>%
  mutate(Division = case_when(
    Player %in% c("Tua Tagovailoa", "Bailey Zappe", "Josh Allen", "Trevor Siemian", "Mac Jones", "Zach Wilson") ~ "AFC East",
    Player %in% c("Lamar Jackson", "Jake Browning", "Joe Burrow", "Kenny Pickett", "Deshaun Watson", "Joe Flacco", "Mitchell Trubisky", "Mason Rudolph") ~ "AFC North",
    Player %in% c("Jared Goff", "Jordan Love", "Justin Fields", "Joshua Dobbs", "Kirk Cousins", "Nick Mullens", "Tyson Bagent") ~ "NFC North",
    Player %in% c("Dak Prescott", "Sam Howell", "Jalen Hurts", "Tommy Devito", "Daniel Jones", "Tyrod Taylor") ~ "NFC East",
    Player %in% c("Patrick Mahomes", "Justin Herbert", "Russell Wilson", "Jimmy Garapallo", "Easton Stick", "Aiden O'connell") ~ "AFC West",
    Player %in% c("Derek Carr", "Baker Mayfield", "Bryce Young", "PJ Walker", "Taylor Heinicke", "Desmond Ridder") ~ "NFC South",
    Player %in% c("Brock Purdy", "Matthew Stafford", "Geno Smith", "Kyler Murray", "Drew Lock") ~ "NFC West",
    Player %in% c("Trevor Lawrence", "C.J. Stroud", "Gardner Minshew", "Anthony Richardson", "Will Levis", "Ryan Tanehill") ~ "AFC South",
    TRUE ~ "Unknown"  # Players not listed above (or spelled differently in the CSV) end up here
  ))
head(FD_data)
           Player Pass.Yds Yds.Att Att Cmp Cmp.. TD INT  Rate X1st X1st. X20
1  Tua Tagovailoa     4624     8.3 560 388  69.3 29  14 101.1  222  39.6  58
2      Jared Goff     4575     7.6 605 407  67.3 30  12  97.9  227  37.5  69
3    Dak Prescott     4516     7.6 590 410  69.5 36   9 105.9  222  37.6  62
4      Josh Allen     4306     7.4 579 385  66.5 29  18  92.2  199  34.4  49
5     Brock Purdy     4280     9.6 444 308  69.4 31  11 113.0  192  43.2  72
6 Patrick Mahomes     4183     7.0 597 401  67.2 27  14  92.6  206  34.5  50
  X40 Lng Sck SckY  Division
1  11  78  29  171  AFC East
2   9  70  30  197 NFC North
3   7  92  39  255  NFC East
4   9  81  24  152  AFC East
5  14  76  28  153  NFC West
6   8  67  27  186  AFC West
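
Because the division labels are typed by hand, a quick check for duplicate or unmatched names can catch mistakes. The following is a minimal sketch on toy data, not the chunk used above: `toy`, `division_lookup`, and "Some Player" are made-up names for illustration, and the lookup-table-plus-join approach is an assumed alternative to `case_when()`.

```r
library(dplyr)

# Toy stand-in for the NFL data frame; only the Player column matters here.
toy <- data.frame(
  Player = c("Tua Tagovailoa", "Jared Goff", "Tyrod Taylor", "Some Player"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: one row per player, so a name cannot sit in two divisions.
division_lookup <- data.frame(
  Player   = c("Tua Tagovailoa", "Jared Goff", "Tyrod Taylor"),
  Division = c("AFC East", "NFC North", "NFC East"),
  stringsAsFactors = FALSE
)

# Stop early if a player was entered twice.
stopifnot(!any(duplicated(division_lookup$Player)))

# Attach divisions; players missing from the lookup become "Unknown".
toy_div <- toy %>%
  left_join(division_lookup, by = "Player") %>%
  mutate(Division = coalesce(Division, "Unknown"))

# Names that did not match (typos or missing entries show up here).
unmatched <- toy_div %>%
  filter(Division == "Unknown") %>%
  pull(Player)
print(unmatched)
```

Running the same kind of check on the real data would list every player who fell through to "Unknown".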

In this chunk I averaged each variable by division and then rescaled each average to a 0-1 range by subtracting the lowest division average and dividing by the difference between the highest and lowest averages. This puts every metric on the same relative scale so they can share one color scale. I then converted the data to long format so ggplot could draw it as a heat map. The AFC West had the highest scores overall, which makes sense considering the majority of those QBs are known as "pocket QBs," or players who rely on their arm strength. The AFC North had the worst stats, which also makes sense considering it has two teams that run the ball heavily: the Ravens and the Browns.

library(ggplot2)
library(viridis)
library(tidyverse)
division_stats <- FD_data %>%
  group_by(Division) %>%
  summarise(
    Avg_Pss = mean(Pass.Yds, na.rm = TRUE),
    Avg_Yds_Att = mean(Yds.Att, na.rm = TRUE),
    Avg_Att = mean(Att, na.rm = TRUE),
    Avg_Cmp = mean(Cmp, na.rm = TRUE),
    Avg_INT = mean(INT, na.rm = TRUE),
    Avg_Rate = mean(Rate, na.rm = TRUE),
    Avg_Cmp_Percent = mean(Cmp.., na.rm = TRUE)
  ) %>%
  ungroup() %>%
  mutate(
    Avg_Pss = (Avg_Pss - min(Avg_Pss)) / (max(Avg_Pss) - min(Avg_Pss)),
    Avg_Yds_Att = (Avg_Yds_Att - min(Avg_Yds_Att)) / (max(Avg_Yds_Att) - min(Avg_Yds_Att)),
    Avg_Att = (Avg_Att - min(Avg_Att)) / (max(Avg_Att) - min(Avg_Att)),
    Avg_Cmp = (Avg_Cmp - min(Avg_Cmp)) / (max(Avg_Cmp) - min(Avg_Cmp)),
    Avg_INT = (Avg_INT - min(Avg_INT)) / (max(Avg_INT) - min(Avg_INT)),
    Avg_Rate = (Avg_Rate - min(Avg_Rate)) / (max(Avg_Rate) - min(Avg_Rate)),
    Avg_Cmp_Percent = (Avg_Cmp_Percent - min(Avg_Cmp_Percent)) / (max(Avg_Cmp_Percent) - min(Avg_Cmp_Percent))
  )
division_stats_long <- division_stats %>%
  pivot_longer(cols = starts_with("Avg"), names_to = "Metric", values_to = "Value")
division_stats_long$Division <- factor(division_stats_long$Division, levels = unique(division_stats_long$Division))
ggplot(division_stats_long, aes(x = Division, y = Metric, fill = Value)) +
  geom_tile() +
  scale_fill_viridis(name = "Relative Average Value", option = "cividis") +
  labs(title = "Relative Average Statistics by Division", x = "Division", y = "Metric") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
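
The rescaling used in this chunk is min-max normalization. As a small worked sketch, assuming made-up averages for three placeholder divisions, the same step can be written once as a helper and applied to every `Avg` column with `across()`. The helper name `min_max` and the toy numbers are illustrative only.

```r
library(dplyr)

# Min-max normalization: the lowest value maps to 0, the highest to 1.
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

# Made-up division averages, for illustration only.
toy_avgs <- data.frame(
  Division = c("A", "B", "C"),
  Avg_Pss  = c(2000, 3000, 4000),
  Avg_INT  = c(8, 12, 10),
  stringsAsFactors = FALSE
)

# Apply the helper to every column whose name starts with "Avg".
scaled <- toy_avgs %>%
  mutate(across(starts_with("Avg"), min_max))

print(scaled)
```

Here `Avg_Pss` becomes 0, 0.5, 1 and `Avg_INT` becomes 0, 1, 0.5. Note that a value of 1 only means "highest among the groups," so for interceptions a high relative value is a worse result.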

For this chunk I chose the top 8 starters and top 8 backups from each conference (AFC/NFC) because there were more AFC QBs than NFC QBs in the dataset. Note: Aaron Rodgers was the starter for the Jets, but he was injured after four plays, so I counted Zach Wilson as a starter. I also know this because I watched the game live.

# Label each QB as a starter or backup within his conference.
# Note: Matthew Stafford plays for the Rams (NFC West) but is grouped with the AFC starters here.
FD2 <- NFL %>%
  mutate(Starting = case_when(
    Player %in% c("Patrick Mahomes", "Tua Tagovailoa", "Josh Allen", "C.J. Stroud", "Trevor Lawrence", "Matthew Stafford",  "Lamar Jackson", "Justin Herbert") ~ "AFC Starters",
    Player %in% c("Jared Goff", "Dak Prescott", "Brock Purdy", "Jordan Love", "Baker Mayfield", "Derek Carr", "Jalen Hurts", "Sam Howell") ~ "NFC Starters",
    Player %in% c("Gardner Minshew", "Joe Flacco", "Jake Browning", "Will Levis", "Bailey Zappe", "Aiden O'connell", "Easton Stick", "Trevor Siemian") ~ "AFC Backups",
    Player %in% c("Joshua Dobbs", "Tyrod Taylor", "Nick Mullens", "Tommy Devito", "Taylor Heinicke", "Tyson Bagent", "PJ Walker", "Drew Lock") ~ "NFC Backups",
    TRUE ~ "Unknown"  # QBs outside the four groups
  ))

head(FD2)
           Player Pass.Yds Yds.Att Att Cmp Cmp.. TD INT  Rate X1st X1st. X20
1  Tua Tagovailoa     4624     8.3 560 388  69.3 29  14 101.1  222  39.6  58
2      Jared Goff     4575     7.6 605 407  67.3 30  12  97.9  227  37.5  69
3    Dak Prescott     4516     7.6 590 410  69.5 36   9 105.9  222  37.6  62
4      Josh Allen     4306     7.4 579 385  66.5 29  18  92.2  199  34.4  49
5     Brock Purdy     4280     9.6 444 308  69.4 31  11 113.0  192  43.2  72
6 Patrick Mahomes     4183     7.0 597 401  67.2 27  14  92.6  206  34.5  50
  X40 Lng Sck SckY     Starting
1  11  78  29  171 AFC Starters
2   9  70  30  197 NFC Starters
3   7  92  39  255 NFC Starters
4   9  81  24  152 AFC Starters
5  14  76  28  153 NFC Starters
6   8  67  27  186 AFC Starters
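
Since each group is meant to hold eight QBs, counting the rows per label is a quick way to confirm the lists. This is a minimal sketch on toy data; `toy_roles` and the "QB 1" style names are placeholders, and the real check would run `count(Starting)` on `FD2`.

```r
library(dplyr)

# Toy stand-in for FD2 with a handful of labeled players.
toy_roles <- data.frame(
  Player   = paste("QB", 1:6),
  Starting = c("AFC Starters", "AFC Starters", "NFC Starters",
               "NFC Starters", "AFC Backups", "Unknown"),
  stringsAsFactors = FALSE
)

# One row per label with the number of players carrying it.
group_sizes <- toy_roles %>%
  count(Starting, name = "n_players")

print(group_sizes)
```

A group with fewer rows than expected usually points to a name that did not match the `Player` column.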

For the second visualization I wanted to compare starters versus backups by conference. I initially had it by division, but it became a bit too cluttered. From this visualization we can see that starting QBs from the NFC had the highest scores in each category, which is especially interesting considering that MVP, Super Bowl MVP, and Offensive Rookie of the Year all went to AFC QBs.

library(dplyr)
library(tidyr)
library(ggplot2)
library(viridis)

starter_stats <- FD2 %>%
  group_by(Starting) %>%
  summarise(
    Avg_Pss = mean(Pass.Yds, na.rm = TRUE),
    Avg_Yds_Att = mean(Yds.Att, na.rm = TRUE),
    Avg_Att = mean(Att, na.rm = TRUE),
    Avg_Cmp = mean(Cmp, na.rm = TRUE),
    Avg_INT = mean(INT, na.rm = TRUE),
    Avg_Rate = mean(Rate, na.rm = TRUE),
    Avg_Cmp_Percent = mean(Cmp.., na.rm = TRUE)
  ) %>%
  ungroup() %>%
  mutate(
    Avg_Pss = (Avg_Pss - min(Avg_Pss)) / (max(Avg_Pss) - min(Avg_Pss)),
    Avg_Yds_Att = (Avg_Yds_Att - min(Avg_Yds_Att)) / (max(Avg_Yds_Att) - min(Avg_Yds_Att)),
    Avg_Att = (Avg_Att - min(Avg_Att)) / (max(Avg_Att) - min(Avg_Att)),
    Avg_Cmp = (Avg_Cmp - min(Avg_Cmp)) / (max(Avg_Cmp) - min(Avg_Cmp)),
    Avg_INT = (Avg_INT - min(Avg_INT)) / (max(Avg_INT) - min(Avg_INT)),
    Avg_Rate = (Avg_Rate - min(Avg_Rate)) / (max(Avg_Rate) - min(Avg_Rate)),
    Avg_Cmp_Percent = (Avg_Cmp_Percent - min(Avg_Cmp_Percent)) / (max(Avg_Cmp_Percent) - min(Avg_Cmp_Percent))
  ) %>%
  pivot_longer(cols = starts_with("Avg"), names_to = "Metric", values_to = "Value")
# Drop QBs outside the four groups, then plot the filtered data
starter_stats_filtered <- starter_stats %>%
  filter(Starting != "Unknown")
ggplot(starter_stats_filtered, aes(x = Metric, y = Starting, fill = Value)) +
  geom_tile(color = "white") +
  scale_fill_viridis(name = "Relative Average Value", option = "cividis") +
  labs(title = "Relative Average Statistics by Starter Status", x = "Metrics", y = "Starter Status") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5))
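
One detail worth keeping in mind with the "Unknown" group: because the 0-1 rescaling depends on the lowest and highest group averages, filtering before or after the rescaling gives different values. The sketch below uses made-up passer ratings to show the difference; the numbers and the `min_max` helper are assumptions for illustration, not results from the dataset.

```r
library(dplyr)

# Min-max normalization helper (illustrative).
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

# Made-up average passer ratings per group.
toy <- data.frame(
  Starting = c("AFC Starters", "NFC Starters", "AFC Backups", "NFC Backups", "Unknown"),
  Avg_Rate = c(95, 100, 80, 85, 60),
  stringsAsFactors = FALSE
)

# Rescale with "Unknown" included: the range is 60 to 100.
scaled_all <- toy %>%
  mutate(Scaled = min_max(Avg_Rate))

# Rescale after dropping "Unknown": the range is 80 to 100.
scaled_known <- toy %>%
  filter(Starting != "Unknown") %>%
  mutate(Scaled = min_max(Avg_Rate))

print(scaled_all)
print(scaled_known)
```

In this toy case "AFC Backups" scores 0.5 when "Unknown" is included and 0 when it is removed first. The ranking of the groups stays the same either way; only the spacing of the colors changes.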