R Assignment-NFL Offensive Dataset 2019-2022

Introduction

Throughout NFL history, the quarterback has been the most crucial position on a winning team. Over the past few decades, we have seen some incredible offensive performances by quarterbacks. In the three seasons from 2019 to 2021 especially, offensive production across the league has been up dramatically thanks to a select group of quarterbacks. To understand how good or bad some of these quarterbacks are, I selected the data set nfl_offensive_stats, which lets me analyze the top quarterbacks from the 2019-2021 seasons. The question I wanted to answer with this analysis was: which quarterback played the best over those three seasons, and why?

Data set

The data set has 19,973 rows and 70 columns of offensive statistics from the 2019-2021 seasons. It covers every offensive position except the offensive line and reports statistics based on individual performance. I wanted to focus strictly on quarterbacks, so I used the columns pass_td, player, position, pass_sacked, pass_cmp, pass_yds, and pass_rating, along with game_date to assign each game to a year. With these columns I could narrow down, based on the numbers, who should be considered the best quarterback during those three seasons. My analysis was also aimed at identifying patterns and trends in the data that could provide insight into the factors contributing to their performances.

While munging the data, I noticed that the 2022 season was incomplete. To fix this, I removed that season and focused only on 2019-2021. After that, I was ready to prepare my visualizations. To select my top 10 QBs, I took the 10 quarterbacks with the most touchdown passes across the three seasons. I treated these as my top 10 because, to me, a good quarterback is one who throws a lot of touchdowns.
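
The completeness check described above can be sketched as follows. This is only an illustration on a made-up data frame, not the original munging code: the toy rows and the names `toy`, `games_per_year`, and `toy_complete` are assumptions, and only the `game_date` column name comes from the data set. It uses base R so it runs without extra packages.

```r
# Toy stand-in for nfl_offensive_stats (hypothetical rows; only the
# game_date column name comes from the real data set).
toy <- data.frame(
  player    = c("A", "A", "B", "B", "A", "B", "A"),
  game_date = c("9/8/2019", "12/1/2019", "9/13/2020", "11/22/2020",
                "9/12/2021", "10/3/2021", "1/2/2022"),
  stringsAsFactors = FALSE
)

# Parse month/day/year strings and pull out the calendar year.
toy$year <- as.integer(format(as.Date(toy$game_date, format = "%m/%d/%Y"), "%Y"))

# Count rows per year; a year with far fewer rows than the others
# is a candidate for being incomplete.
games_per_year <- table(toy$year)
print(games_per_year)

# Keep only the years used in the analysis.
toy_complete <- toy[toy$year >= 2019 & toy$year <= 2021, ]
```

In the toy data, 2022 has a single row, so it stands out in the per-year counts and is dropped by the final filter.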

Findings

My analysis revealed some interesting findings about the performance of the top 10 quarterbacks in the NFL from 2019-2021. First, in terms of passer rating, Aaron Rodgers had the highest average passer rating over the three seasons, with Kirk Cousins a close second. Second, in terms of touchdown passes, Rodgers had the most with a total of 116 over the three seasons, followed by Tom Brady with 111.

Third, when it came to completions, Tom Brady led the pack with an average of 432.3 completions per season, followed closely by Patrick Mahomes with an average of 406.3. Fourth, in terms of sacks, Russell Wilson was sacked the most, with 137 sacks over the three seasons, followed by Matt Ryan with 121. Patrick Mahomes and Tom Brady were sacked the least, with 72 and 74 total sacks respectively. Finally, in terms of passing yards, Aaron Rodgers had the most with a total of 13,201 yards over the three seasons, followed by Josh Allen and Kirk Cousins with 12,762 and 12,253 yards. Tom Brady had the single-season high with 6,040 yards, while Matthew Stafford had the single-season low with 2,499.

Visualizations

Tab 1

My first visualization shows how I generated my top 10 quarterback list. First, I found who had the most touchdown passes across the three seasons combined. With that list of quarterbacks, I plotted their touchdown passes by season on a horizontal stacked bar chart. At the end of each bar, I added the player's combined touchdown total for the three seasons. This chart showed me who had the most touchdowns in the past three seasons: Rodgers and Brady led the pack.

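# Packages used in this report
library(dplyr)
library(lubridate)
library(ggplot2)
library(ggrepel)
library(plotly)

# df is assumed to already hold the nfl_offensive_stats data.
# year is the calendar year of each game, so January playoff games fall in the following year.
df$year <- year(mdy(df$game_date))
top_10_players_total <- df %>%
  filter(position == "QB" & year >= 2019 & year <= 2021) %>%
  select(player, pass_td) %>%
  group_by(player) %>%
  summarize(total_pass_td = sum(pass_td), .groups = 'keep') %>%
  arrange(desc(total_pass_td)) %>%
  head(10) %>%
  data.frame()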


top_10_players <- df %>%
  filter(position == "QB" & player %in% top_10_players_total$player & year >= 2019 & year <= 2021) %>%
  select(player,game_date,pass_td) %>%
  mutate(year = year(mdy(game_date))) %>%
  group_by(player,year) %>%
  summarize(pass_td = sum(pass_td), .groups = 'keep') %>%
  arrange(player,desc(pass_td)) %>%
  data.frame()


top_10_players$year <- factor(top_10_players$year)



ggplot(top_10_players, aes(x = reorder(player, -pass_td), y = pass_td, fill = year)) +
  geom_bar(stat = "identity",position = position_stack(reverse = TRUE)) +
  coord_flip() +
  labs(title = "Top 10 Quarterback TD Passers - 3 Years (Including Playoffs)", x = "Player", y = "TD Passes",fill = "Season")+
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_fill_brewer(palette = "RdYlBu",guide = guide_legend(reverse = FALSE))+
  geom_text(data = top_10_players_total, aes(x = player , y = total_pass_td, label = total_pass_td, fill = NULL),hjust = -0.4, size=4) +
  scale_y_continuous(breaks = seq(0,140, by = 20), limits = c(0,140))
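
The code in this report groups games by the calendar year of game_date, so a playoff game played in January is counted with the following year. If grouping by NFL season were wanted instead, one possible approach is sketched below. The helper `nfl_season` is hypothetical and not part of the original analysis; it assumes month/day/year date strings and assumes that games played in January or February belong to the season that started the previous fall.

```r
# Hypothetical helper: map a game date to an NFL season label.
# Assumption: January and February games belong to the season that
# kicked off in the previous calendar year.
nfl_season <- function(game_date, fmt = "%m/%d/%Y") {
  d  <- as.Date(game_date, format = fmt)
  yr <- as.integer(format(d, "%Y"))
  mo <- as.integer(format(d, "%m"))
  ifelse(mo <= 2, yr - 1L, yr)
}

# A September game and two playoff games from the same season
seasons <- nfl_season(c("9/12/2021", "1/16/2022", "2/13/2022"))
print(seasons)  # 2021 2021 2021
```

Switching to a season label like this would change the per-year totals shown in the charts, so it is offered only as an alternative way to define a "season".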

Tab 2

My second visualization is a multi-line plot of the top 10 quarterbacks' passing yards for the past three seasons. It highlights which of the best touchdown throwers could also throw the ball down the field and generate offense. Tom Brady led the pack with the highest single-season total, 6,040 passing yards. Matthew Stafford had the lowest single-season total, 2,499 passing yards.

df$year <- year(mdy(df$game_date))

top_10_scorers_with_passing_yards <- df %>%
  filter(position == "QB" & player %in% top_10_players$player & year >= 2019 & year <= 2021) %>%
  select(player, game_date, pass_yds) %>%
  mutate(year = year(mdy(game_date))) %>%
  group_by(player,year) %>%
  summarize(pass_yds = sum(pass_yds), .groups = 'keep') %>%
  data.frame()



top_10_scorers_with_passing_yards$year <- factor(top_10_scorers_with_passing_yards$year)

hi_lo <- top_10_scorers_with_passing_yards %>%
  filter(pass_yds == min(pass_yds) | pass_yds == max(pass_yds)) %>%
  data.frame()




ggplot(top_10_scorers_with_passing_yards, aes(x = year, y = pass_yds, group = player)) +
  geom_line(aes(color = player), linewidth = 2) +
  labs(title = "Top 10 Quarterback touchdown throwers passing yards (Including Playoffs)", x = "Season", y = "Passing yards", color = "Player") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5))+
  geom_point(shape = 21, size = 3, color= "black", fill= "yellow") +
  scale_y_continuous(limits = c(0, max(top_10_scorers_with_passing_yards$pass_yds)), breaks = seq(0, max(top_10_scorers_with_passing_yards$pass_yds), by = 500), labels = scales::comma) +
  scale_color_brewer(palette = "Spectral", name = "Player", guide = guide_legend(reverse = TRUE)) +
  geom_point(data = hi_lo, aes(x = year, y = pass_yds), shape = 21, size = 4, fill = 'red', color = 'red') +
  geom_label_repel(aes(label = ifelse(pass_yds ==max(pass_yds) | pass_yds == min(pass_yds), scales::comma(pass_yds), "")), 
                   box.padding = 1, 
                   point.padding = 1, 
                   size = 4, 
                   color = 'red', 
                   segment.color = 'darkblue')

Tab 3

My third visualization is a heat map of the top 10 quarterbacks' total sacks over the past three seasons. This was important to generate because in the NFL a good offensive line can really change a quarterback's production. Patrick Mahomes and Tom Brady were sacked the least over the past three seasons, which played a factor in their performance. Russell Wilson, who was sacked the most, was somewhat average in his stats over the years. It makes you wonder whether his stats would be better behind a good offensive line.

sacks_df <- df %>%
  filter(player %in% top_10_players_total$player & year >= 2019 & year <= 2021) %>%
  select(player,game_date,pass_sacked) %>%
  mutate(year = year(mdy(game_date))) %>%
  group_by(player,year) %>%
  summarise(pass_sacked = sum(pass_sacked), .groups = 'keep') %>%
  data.frame()


sacks_df$year <- factor(sacks_df$year)

ggplot(sacks_df, aes(x = reorder(player, -pass_sacked), y = year, fill = pass_sacked)) +
  geom_tile(color = "black") +
  geom_text(aes(label= pass_sacked))+
  coord_equal(ratio = 1) +
  labs(title = "Heatmap : Total sacks per year by a Top 10 QB (Including Playoffs)",
       x = "Player",
       y = "Season",
       fill = "Total Sacks") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_fill_continuous(low = "white",high = "purple") +
  guides(fill = guide_legend(reverse = TRUE, override.aes = list(colour = "black")))

Tab 4

My fourth visualization is a trellis chart of the top 10 quarterbacks' completions over the past three seasons. Completions are another big part of generating offense, and a player with a high number of completions usually also has high yardage and touchdown totals. Tom Brady led the 2021 season with a high of 529 completions, which is a remarkable number. This contributed to his success in touchdown passes and passing yards over the past three seasons. Matthew Stafford had a low of 187 completions, which is reflected in his low passing-yard total for 2019.

pass_cmp_df <- df %>%
  filter(player %in% top_10_players_total$player & year >= 2019 & year <= 2021) %>%
  select(player,game_date,pass_cmp) %>%
  mutate(year = year(mdy(game_date))) %>%
  group_by(player, year) %>%
  summarise(pass_cmp = sum(pass_cmp), .groups = 'keep') %>%
  data.frame()


pass_cmp_df$year <- factor(pass_cmp_df$year)

ggplot(pass_cmp_df, aes(x = year, y = pass_cmp, fill = year)) +
  geom_bar(stat = "identity", position ="dodge") + 
  geom_text(aes(label = pass_cmp), position = position_dodge(width = 1), vjust = -0.5) +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5)) +
  labs(title = "Multiple Bar Charts-Total Passing Completions by Year and Player (Including Playoffs)",
       x = "Season",
       y = "Pass Completion Total",
       fill = "Season") +
  scale_fill_brewer(palette = "Blues") +
  scale_y_continuous(breaks = seq(0, 600, by = 100), limits = c(0, 600)) +
  facet_wrap(~player, ncol = 5, nrow = 2) +
  guides(fill = guide_legend(reverse = TRUE))

Tab 5

My final visualization is a nested donut chart of the top 10 quarterbacks' passer ratings. A QB's passer rating is calculated from passing attempts, completions, yards, touchdowns, and interceptions. Since I highlighted three of those five components in my prior visualizations, this nested donut chart gives us an idea of who had the best overall combination of the key QB stats. The league average in this data set is 72, which isn't the best, but the average among these top 10 quarterbacks is exactly 100. Even though some of these quarterbacks' stats might not jump out at you, their efficiency is off the charts, as their high passer ratings show.
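
To make the passer rating calculation concrete, here is a minimal sketch of the standard NFL passer rating formula. The data set already provides a per-game pass_rating column, so this function is not part of the original analysis; the function name `passer_rating`, its argument names, and the example stat line are all made up for illustration.

```r
# NFL passer rating from raw counting stats.
# Each of the four components is limited to the range [0, 2.375].
passer_rating <- function(att, cmp, yds, td, ints) {
  clamp <- function(x) pmin(pmax(x, 0), 2.375)
  cmp_part <- clamp((cmp / att - 0.3) * 5)      # completion percentage
  yds_part <- clamp((yds / att - 3) * 0.25)     # yards per attempt
  td_part  <- clamp((td / att) * 20)            # touchdown rate
  int_part <- clamp(2.375 - (ints / att) * 25)  # interception rate
  (cmp_part + yds_part + td_part + int_part) / 6 * 100
}

# Hypothetical stat line: 20 of 30 for 250 yards, 2 TD, 1 INT
example_rating <- passer_rating(att = 30, cmp = 20, yds = 250, td = 2, ints = 1)
print(round(example_rating, 1))  # 100.7
```

Because each component is a per-attempt rate with a cap, the mean of per-game ratings (which is what `round(mean(pass_rating))` computes in the code for this tab) will generally differ from a rating computed from season totals.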

pass_rating_df <- df %>%
  filter(player %in% top_10_players_total$player & year >= 2019 & year <= 2021) %>%
  select(player,game_date,pass_rating) %>%
  mutate(year = year(mdy(game_date))) %>%
  group_by(player,year) %>%
  summarise(pass_rating = round(mean(pass_rating)), .groups = 'keep') %>%
  data.frame()

pass_rating_league <- df %>%
  filter(position == "QB" & year >= 2019 & year <= 2021) %>%
  select(player,game_date,pass_rating) %>%
  mutate(year = year(mdy(game_date))) %>%
  group_by(player,year) %>%
  summarise(pass_rating = round(mean(pass_rating)), .groups = 'keep') %>%
  data.frame()

plot_ly(hole = 0.7) %>%
  layout(title = "Average Passer Rating by Top 10 QBs (Including Playoffs 2019-2021)") %>%
  layout(annotations = list(text = paste0("League avg pass rating: \n",
                                          round(mean(pass_rating_league$pass_rating))),
                            showarrow = FALSE, font = list(size = 8))) %>%
  add_trace(data = pass_rating_df[pass_rating_df$year == 2021,],
            labels = ~player,
            values = ~pass_rating_df[pass_rating_df$year == 2021, "pass_rating"],
            type = "pie",
            textinfo = "value",
            textposition = "inside",
            hovertemplate = "Season: 2021<br>Player: %{label}<br>Percent: %{percent}<br> Average Passer Rating: %{value}<extra></extra>") %>%
  add_trace(data = pass_rating_df[pass_rating_df$year == 2020,],
            labels = ~player,
            values = ~pass_rating_df[pass_rating_df$year == 2020, "pass_rating"],
            type = "pie",
            textinfo = "value",
            textposition = "inside",
            hovertemplate = "Season: 2020<br>Player: %{label}<br>Percent: %{percent}<br> Average Passer Rating: %{value}<extra></extra>",
            domain = list(
              x = c(0.16,0.84),
              y = c(0.16,0.84))) %>%
  add_trace(data = pass_rating_df[pass_rating_df$year == 2019,],
            labels = ~player,
            values = ~pass_rating_df[pass_rating_df$year == 2019, "pass_rating"],
            type = "pie",
            textinfo = "value",
            textposition = "inside",
            hovertemplate = "Season: 2019<br>Player: %{label}<br>Percent: %{percent}<br> Average Passer Rating: %{value}<extra></extra>",
            domain = list(
              x = c(0.27,0.73),
              y = c(0.27,0.73)))

Conclusion

In conclusion, my analysis of the top 10 quarterbacks in the NFL from 2019-2021 provided insights into their performances over the last three seasons. I found that the top quarterbacks performed consistently across multiple seasons, with different players leading in specific categories. In my opinion, one quarterback really led the pack across the analysis: Tom Brady had the best three seasons. When you look at his stats, it really shows that he is the greatest QB of all time. Still, the talent behind him, with players like Patrick Mahomes and Josh Allen, paves the way for new records to be broken and a new greatest player of all time to emerge.