Final Project STA 518 - Jared Goff New Contract Analysis

Preface

This project should help to answer the question: How much should Jared Goff be paid based on his statistics relative to other quarterbacks?

Jared Goff is in the last year of his contract with the Detroit Lions. He has led the Lions to two straight winning seasons and most recently to a runner-up finish in the NFC, a few plays away from a Super Bowl appearance.

This exploratory data analysis evaluates trends in the current NFL quarterback contract market and measures how Jared Goff's statistical performance stacks up against that of other NFL quarterbacks.

Statistical inference tools are then used to test whether Jared Goff is better than, or no different from, the average of the other top-paid NFL quarterbacks, using a composite score I created.

Lastly, I present an example of how bootstrapping can be used to estimate the median from a sample of quarterbacks' 2023 passing yard totals.

# Project Specific Libraries
library(nflreadr) # includes many NFL datasets
library(nflplotR) # includes geoms for NFL plots! https://nflplotr.nflverse.com/articles/nflplotR.html#lets-play-with-wordmarks-and-other-imagesj

# Dependencies
library(dplyr)
library(stringr)
library(ggplot2)
library(skimr)
library(flextable)
library(gt)
library(naniar)
library(tidyverse)

# Loading the data with the nflreadr package
years <- 2008:2023 # The 16 most recent seasons (2008 through 2023)

player_stats <- nflreadr::load_player_stats(years)
players <- nflreadr::load_players()
contracts <- load_contracts()
teams <- load_teams()

Data Dictionaries

While we create many data transformations for our different plots and tables we are largely deriving from the contracts and player_stats data sets.

contracts <- contracts |>
  select(player, year_signed, position, team, apy, years, value, guaranteed, apy_cap_pct)

dataDictionaryContracts <- tibble(Variable = colnames(contracts),
                         Description = c( "Name of the player",
                                          "Year the contract was signed",
                                          "Player's position on the team",
                                          "Team the player signed with",
                                          "Average annual salary in the contract (in millions of dollars)",
                                          "Total number of years in the contract",
                                          "Total value of the contract (in millions of dollars)",
                                          "Amount of guaranteed money in the contract (in millions of dollars)",
                                          "Average annual salary as a percentage of the salary cap"),
                         Type = map_chr(contracts, .f = function(x){typeof(x)[1]}),
                         Class = map_chr(contracts, .f = function(x){class(x)[1]}))

# Printing nicely in R Markdown
flextable::flextable(dataDictionaryContracts, cwidth = 2) |> theme_apa()

Variable	Description	Type	Class
player	Name of the player	character	character
year_signed	Year the contract was signed	integer	integer
position	Player's position on the team	character	character
team	Team the player signed with	character	character
apy	Average annual salary in the contract (in millions of dollars)	double	numeric
years	Total number of years in the contract	integer	integer
value	Total value of the contract (in millions of dollars)	double	numeric
guaranteed	Amount of guaranteed money in the contract (in millions of dollars)	double	numeric
apy_cap_pct	Average annual salary as a percentage of the salary cap	double	numeric

player_stats <- player_stats |>
  select(player_id, position, player_display_name, season, season_type, week, completions, attempts, passing_yards, passing_tds, passing_air_yards, passing_yards_after_catch, passing_first_downs, passing_epa, rushing_yards, rushing_tds, interceptions, sack_fumbles_lost)

dataDictionaryPlayerStats <- tibble(Variable = colnames(player_stats),
                         Description = c( "Player unique identifier (used to join other datasets provided by nflreadr)",
                                          "Position of a player on the field",
                                          "Full name of the player",
                                          "NFL season (2023 means the games played in 2023-2024 season)",
                                          "Denotes if the game was in the playoffs or the regular season",
                                          "Week of the season",
                                          "Number of completed passes by the player",
                                          "Number of attempted passes by the player",
                                          "Distance in yards gained by the player by passing",
                                          "Number of touchdown passes thrown by the player",
                                          "Distance in yards that a quarterback gained by only throwing the ball",
                                          "Distance in yards gained by the receiver after the catch credited to the passing player",
                                          "Number of first downs gained through passing plays",
                                          "Total Expected Points Added on the player's pass attempts and sacks in the game",
                                          "Distance in rushing yards gained by the player",
                                          "Number of rushing touchdowns scored by the player",
                                          "Number of interceptions thrown by the player",
                                          "Number of fumbles lost by the player when being sacked"),
                         Type = map_chr(player_stats, .f = function(x){typeof(x)[1]}),
                         Class = map_chr(player_stats, .f = function(x){class(x)[1]}))

# Printing nicely in R Markdown
flextable::flextable(dataDictionaryPlayerStats, cwidth = 2) |> theme_apa()

Variable	Description	Type	Class
player_id	Player unique identifier (used to join other datasets provided by nflreadr)	character	character
position	Position of a player on the field	character	character
player_display_name	Full name of the player	character	character
season	NFL season (2023 means the games played in 2023-2024 season)	integer	integer
season_type	Denotes if the game was in the playoffs or the regular season	character	character
week	Week of the season	integer	integer
completions	Number of completed passes by the player	integer	integer
attempts	Number of attempted passes by the player	integer	integer
passing_yards	Distance in yards gained by the player by passing	double	numeric
passing_tds	Number of touchdown passes thrown by the player	integer	integer
passing_air_yards	Distance in yards that a quarterback gained by only throwing the ball	double	numeric
passing_yards_after_catch	Distance in yards gained by the receiver after the catch credited to the passing player	double	numeric
passing_first_downs	Number of first downs gained through passing plays	double	numeric
passing_epa	Total Expected Points Added on the player's pass attempts and sacks in the game	double	numeric
rushing_yards	Distance in rushing yards gained by the player	double	numeric
rushing_tds	Number of rushing touchdowns scored by the player	integer	integer
interceptions	Number of interceptions thrown by the player	double	numeric
sack_fumbles_lost	Number of fumbles lost by the player when being sacked	integer	integer

Checking for Missingness

contracts |>
  dplyr::select(player:apy_cap_pct) |> #all variables not id
  gg_miss_fct(fct = position)

The only variable we have to be wary of is years, which is missing for several positions. Fortunately, it is not missing for quarterbacks, who are the focus of this analysis.

player_stats |>
  dplyr::select(position:sack_fumbles_lost) |> #all variables not id
  gg_miss_fct(fct = position)

Again, we have no missing values at the QB position. Missing passing EPA for other positions is fine, since this analysis only covers quarterbacks.
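
The plots above show missingness visually. As a numeric cross-check, the share of missing values per column within each position can also be tabulated. The sketch below uses base R on a small made-up data frame (`toy_contracts` and its values are illustrative assumptions, not rows from the real contracts data).

```r
# Toy data frame standing in for the contracts table (values are made up)
toy_contracts <- data.frame(
  position = c("QB", "QB", "WR", "WR", "K"),
  apy      = c(40.0, 33.5, 20.0, NA, 4.0),
  years    = c(4L, 4L, NA, 3L, NA)
)

# Turn each column into TRUE/FALSE "is missing" flags, then average the flags
# within each position to get the share of missing values per column
missing_flags <- transform(toy_contracts, apy = is.na(apy), years = is.na(years))
missing_share <- aggregate(cbind(apy, years) ~ position, data = missing_flags, FUN = mean)

missing_share
```

A share of 0 for the QB row in every column would correspond to what the missingness plots show for quarterbacks.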

Data Cleaning

Building a base Quarterback table

qb_player_stats <- player_stats |>
  filter(season > 2016, season_type == "REG") |> # regular-season games from the 2017 season onward (season > 2016)
  filter(position == "QB") # filter for only quarterbacks

Building Contracts Tables

qb_contracts <- contracts |>
  filter(year_signed >= 2009) |> # contracts signed in 2009 or later
  filter(position == "QB")

big_qb_contracts <- qb_contracts |> filter(apy > 5) # "big" means making more than 5 million a year

big_contracts <- contracts |> filter(apy > 5) |> filter(year_signed >= 2009)

# Let's add a column that multiplies the apy_cap_pct by the current cap and call it apy_cap_adj_2024

total_cap <- 255.4

top_qb_pay <- big_contracts |>
  filter(position == 'QB') |>
  filter(apy_cap_pct > 0.175) |> # Filter for yearly apy that is greater than 17.5% of the cap
  mutate(apy_cap_adj_2024 = total_cap * apy_cap_pct) |> # create new variable for what contract apy would be if signed with the same percent with the latest salary cap total 
  select(player, team, year_signed, years, value, apy_cap_pct, apy, apy_cap_adj_2024) |>
  dplyr::arrange(desc(apy_cap_adj_2024))

Merging Data Sets

Merging more information to our QB datasets to help with the visualizations

qbs <- players |> filter(position == "QB")

qb_full_info <- dplyr::left_join(qb_player_stats, qbs, join_by(player_id == gsis_id)) |>
  mutate(season = season.x, position = position.x) |>
  select(-c(season.x, season.y, position.x, position.y, jersey_number, draft_round, uniform_number, draft_number, smart_id, years_of_experience, team_seq, position_group, esb_id, status, entry_year, draft_club, status_description_abbr, college_conference, gsis_it_id, player_display_name)) # add additional information to the quarterbacks 

qb_full_info <- left_join(qb_full_info, teams, join_by(current_team_id == team_id)) # add in team information for the players current team (April 2024)

String Manipulation

I noticed one data set reported Matthew Stafford as Matt, so this is to account for that.

I build a composite score here for each QB’s weekly performance based on my domain knowledge.

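top_qb_pay$player <- str_replace(top_qb_pay$player, pattern = "^Matt Stafford$", replacement = "Matthew Stafford") # anchored so only this exact name changes (an unanchored "Matt" would also rewrite any other "Matt" or an existing "Matthew"); my only stringr function, since the data is too curated for other stringr use cases in this project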

top_qbs_data <- qb_full_info[qb_full_info$display_name %in% top_qb_pay$player, ] # take only the top paid players 

top_qb_weekly_score <- top_qbs_data |>
  mutate(composite_score = (passing_yards + rushing_yards) / 100.0 + completions + passing_tds + rushing_tds 
         + (passing_air_yards + passing_yards_after_catch + rushing_yards) / 100.0 - interceptions - sack_fumbles_lost) |> # create composite score column (my idea and formula)
  select(display_name, season, week, composite_score)
  
# create some filtered/grouped tables
average_top_qb_weekly_score <- top_qb_weekly_score |>
  filter(display_name != "Jared Goff") |> 
  group_by(week) |> 
  summarize(top_qb_average_score = mean(composite_score))

jared_goff_weekly_score <- top_qb_weekly_score |>
  filter(display_name == "Jared Goff")

jared_vs_qb_weekly_scores = merge(average_top_qb_weekly_score, jared_goff_weekly_score) |>
  mutate(goff_score = composite_score) |>
  select(season, week, top_qb_average_score, goff_score)
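
To make the composite score concrete, here is a worked example on a single made-up stat line. The helper name `composite_score_example` and the input numbers are assumptions for illustration only; the arithmetic mirrors the formula in the `mutate()` call above.

```r
# Hypothetical helper that mirrors the composite score formula used above
composite_score_example <- function(passing_yards, rushing_yards, completions,
                                    passing_tds, rushing_tds, passing_air_yards,
                                    passing_yards_after_catch, interceptions,
                                    sack_fumbles_lost) {
  (passing_yards + rushing_yards) / 100 + completions + passing_tds + rushing_tds +
    (passing_air_yards + passing_yards_after_catch + rushing_yards) / 100 -
    interceptions - sack_fumbles_lost
}

# Made-up game: 25 completions for 300 yards, 2 passing TDs, 1 interception
example_score <- composite_score_example(
  passing_yards = 300, rushing_yards = 20, completions = 25,
  passing_tds = 2, rushing_tds = 0, passing_air_yards = 180,
  passing_yards_after_catch = 120, interceptions = 1, sack_fumbles_lost = 0
)

example_score # 3.2 + 25 + 2 + 0 + 3.2 - 1 - 0 = 32.4
```

In this example completions contribute 25 of the 32.4 points, so under this formula the score is driven mostly by completion volume, with yards, touchdowns, and turnovers acting as smaller adjustments.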

Exploratory Data Analysis

Part 1 - Understanding the Quarterback Market

This part will examine the current QB Market and how it is trending in recent years.

The contract data was already filtered above to QB contracts signed in 2009 or later, keeping only the columns we need.

Let's create a couple of visualizations to understand quarterback salaries.

qb_contracts |>
  ggplot(aes(x = year_signed, y = apy)) +
  geom_point() +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  labs(title = "Figure 1: Yearly NFL Quarterback Contract Values",
       x = "Year Signed",
       y = "Contract Amount per year in Millions of Dollars",
       caption = "Since 2009, Data Source: overthecap.com") +
  theme_minimal()

We observe a positive, roughly linear trend in quarterback pay, indicating that the top of the quarterback market rises nearly every year. (2024 signings are not yet complete.)

big_qb_contracts |>
  ggplot(aes(x = apy)) +
  geom_histogram(color = "white") +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  labs(title = "Figure 2: Yearly Contract Distribution of NFL Quarterbacks",
       x = "Contract Amount per year in Millions of Dollars",
       y = "Count",
       caption = "Since 2009, Data Source: overthecap.com") +
  theme_minimal()

We see a right-skewed distribution here: most contracts sit at the low end, while the highest-paid quarterbacks stretch to over 50 million per year.

#How much do the best at valuable positions get paid?

#The filter we apply for this summary table keeps any position with at least 100 contracts worth more than 5 million per year.

big_contracts |>
  group_by(position) |>
  summarize(count = n(), mean = mean(apy, na.rm = TRUE), sd = sd(apy), min = min(apy), max = max(apy)) |> 
  filter(count >= 100) |>
  arrange(desc(mean)) |>
  flextable() |>
  add_header_lines(top = TRUE, value = "Table 1: NFL Contract Yearly Value Statistics by Position Since 2009 (In Millions USD)") |>
  theme_apa()

Table 1: NFL Contract Yearly Value Statistics by Position Since 2009 (In Millions USD)
position	count	mean	sd	min	max
QB	190	17.16	12.01	5.05	55.00
WR	231	11.42	5.68	5.04	32.00
ED	254	11.31	5.35	5.14	34.00
LT	117	10.91	4.73	5.12	25.00
IDL	203	10.83	5.37	5.12	31.75
CB	189	9.77	3.97	5.05	21.00
S	133	8.83	3.37	5.03	19.00
LB	153	8.63	3.14	5.01	20.00
TE	111	8.29	2.78	5.10	17.12

Since 2009, quarterbacks on these larger contracts have averaged about 5.7 million more per year than the next highest-paid position (17.16 million vs. 11.42 million for wide receivers).

big_contracts |> 
  ggplot(aes(x = position, y = apy)) +
  geom_boxplot(aes(fill = position)) +
  scale_fill_manual(values = teams$team_color) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  coord_flip() +
  labs(y = "Yearly Contract Amount In Millions",
       x = "Position",
       title = "Figure 3: NFL Distribution of Contract Amount by Position",
       caption = "Data Source: Overthecap.com Since 2009") +
  theme_minimal() + 
  theme(legend.position = "none")

The QB market has the highest average yearly contract value of any position.

big_contracts |> 
  ggplot(aes(x = position, y = apy_cap_pct)) +
  geom_boxplot(aes(fill = position)) +
  scale_fill_manual(values = teams$team_color) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  coord_flip() +
  labs(x = "Position",
       y = "Percentage of Salary Cap",
       title = "Figure 4: NFL Distribution of Contract Percent of Yearly Salary Cap by Position",
       caption = "Data Source: Overthecap.com Since 2009") +
  theme_minimal() +
  theme(legend.position = "none")

From this graph we can ascertain that 50% of quarterbacks are making over 10% of their team's salary cap, and that most of the QBs above the 3rd quartile are taking more cap space than the maximum at any other position.

From external sources we know that the salary cap was raised by roughly 30 million dollars at the start of the 2024 off-season, to 255.4 million.

For this experiment let’s assume that this increase will apply proportionally to the current quarterback market as contracts expire over the next several years, but keep in mind that this might not be the case and other positions like Edge, Wide Receiver, and Defensive Tackle may increase their market share of the salary cap.

top_qb_pay |> flextable() |> theme_apa() |> colformat_int(j = "year_signed", big.mark = "") # print years without a thousands separator

player	team	year_signed	years	value	apy_cap_pct	apy	apy_cap_adj_2024
Joe Burrow	Bengals	2023	5	275.00	0.24	55.00	62.57
Aaron Rodgers	GB/NYJ	2022	5	150.81	0.24	50.27	61.55
Josh Allen	Bills	2021	6	258.00	0.24	43.00	60.27
Russell Wilson	Broncos	2022	5	245.00	0.23	49.00	60.02
Justin Herbert	Chargers	2023	5	262.50	0.23	52.50	59.76
Lamar Jackson	Ravens	2023	5	260.00	0.23	52.00	59.00
Patrick Mahomes	Chiefs	2020	10	450.00	0.23	45.00	57.98
Jalen Hurts	Eagles	2023	5	255.00	0.23	51.00	57.98
Kyler Murray	Cardinals	2022	5	230.50	0.22	46.10	56.44
Deshaun Watson	Browns	2022	5	230.00	0.22	46.00	56.44
Dak Prescott	Cowboys	2021	4	160.00	0.22	40.00	55.93
Deshaun Watson	Texans	2020	4	156.00	0.20	39.00	50.31
Derek Carr	Raiders	2022	3	121.42	0.19	40.47	49.55
Matthew Stafford	Rams	2022	4	160.00	0.19	40.00	49.04
Aaron Rodgers	Packers	2018	4	134.00	0.19	33.50	48.27
Russell Wilson	SEA/DEN	2019	4	140.00	0.19	35.00	47.50
Ben Roethlisberger	Steelers	2019	2	68.00	0.18	34.00	46.23
Aaron Rodgers	Packers	2013	5	110.00	0.18	22.00	45.72
Jared Goff	LAR/DET	2019	4	134.00	0.18	33.50	45.46
Daniel Jones	Giants	2023	4	160.00	0.18	40.00	45.46
Kirk Cousins	Falcons	2024	4	180.00	0.18	45.00	44.95

Some conclusions to draw from this table:

- No team in this table has committed more than about 24% of its cap space to a quarterback.
- The very best quarterbacks are making 22-24% of their team's overall cap space.
- With the recent salary cap increase, if quarterbacks maintain their current cap share, new contracts for top quarterbacks could be worth 55-63 million dollars per year.
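
The arithmetic behind the projected contract range is simply cap share multiplied by the 2024 cap. A minimal sketch follows; the specific `cap_share` values are round numbers picked for illustration, and `total_cap` is the 255.4 million figure used earlier.

```r
total_cap <- 255.4                       # 2024 salary cap in millions of dollars
cap_share <- c(0.18, 0.20, 0.22, 0.24)   # illustrative shares of the cap

# Yearly pay a contract at each cap share would imply under the 2024 cap
projected_apy <- round(total_cap * cap_share, 2)

data.frame(cap_share, projected_apy)
```

Under these assumptions a 22% share works out to about 56.2 million per year and a 24% share to about 61.3 million, which sits inside the range quoted above.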

*Note: this table is included purely for the purpose of this analysis and is not titled, as it is not a requirement of this project.

Part 2 - How does Jared Goff statistically compare to other Quarterbacks

# Perform transformations for this plot 
top_qbs_summary <- top_qbs_data |>
  group_by(display_name) |>
  mutate(yards_per_game = passing_yards + rushing_yards) |>
  summarize(mean = mean(yards_per_game, na.rm = TRUE), median = median(yards_per_game), standard_deviation = sd(yards_per_game), min = min(yards_per_game), max = max(yards_per_game))

top_qbs_summary |> 
  arrange(desc(mean)) |>
  flextable() |>
  add_header_lines(top = TRUE, value = "Table 2: Weekly Yards Gained By Top Paid NFL Quarterbacks Since 2016") |>
  theme_apa()

Table 2: Weekly Yards Gained By Top Paid NFL Quarterbacks Since 2016
display_name	mean	median	standard_deviation	min	max
Patrick Mahomes	316.25	306.50	76.58	78.00	509.00
Justin Herbert	292.48	299.00	68.85	96.00	472.00
Deshaun Watson	284.09	292.00	88.34	5.00	473.00
Joe Burrow	282.46	279.00	92.61	81.00	536.00
Josh Allen	279.62	281.50	88.06	5.00	466.00
Dak Prescott	279.53	271.00	86.72	130.00	514.00
Kyler Murray	278.38	275.00	79.86	12.00	444.00
Ben Roethlisberger	272.53	266.50	77.14	75.00	511.00
Jared Goff	271.08	263.50	76.19	78.00	517.00
Kirk Cousins	270.35	268.50	76.70	97.00	460.00
Matthew Stafford	270.19	272.00	68.02	17.00	434.00
Aaron Rodgers	260.11	253.00	87.19	0.00	474.00
Russell Wilson	259.98	254.00	71.80	122.00	482.00
Derek Carr	254.38	252.50	75.92	53.00	441.00
Lamar Jackson	245.87	263.50	104.87	0.00	504.00
Daniel Jones	240.43	244.00	81.60	22.00	429.00
Jalen Hurts	233.89	243.00	120.16	-1.00	434.00

From this summary table we can ascertain that Jared Goff has fewer yards per game on average than Kyler Murray and Dak Prescott, but more than Aaron Rodgers, Lamar Jackson, and Russell Wilson.

# Perform transformations for this plot 
top_qbs_summary2 <- top_qbs_data |>
  group_by(player_id, team_color) |>
  mutate(yards_per_game = passing_yards + rushing_yards) |>
  summarize(mean = mean(yards_per_game, na.rm = TRUE), median = median(yards_per_game), standard_deviation = sd(yards_per_game), min = min(yards_per_game), max = max(yards_per_game))

top_qbs_summary2 |>
  ggplot(aes(x = fct_reorder(player_id, median, .desc = FALSE), y = median, fill = player_id)) +
  geom_col(color = 'black', position = position_dodge(width = 0.2)) +
  scale_fill_manual(values = top_qbs_summary2$team_color) +
  labs(title = "Figure 5: Top QB Yards Gains Since 2016",
       x = "Player",
       y = "Median Yards Gained Per Game",
       caption = "Data source: nflreadr package") +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  coord_flip() +
  theme_minimal() +
  theme(legend.position = "none",
        axis.text.y = element_nfl_headshot(size=1)) # Change the y-axis labels to the player's headshot

From this plot we can ascertain that Jared Goff is tied with Lamar Jackson for 11th in median yards gained per game (263.5) among the 17 top-paid quarterbacks in this comparison.

# Perform transformations for this plot 

qb_list <- c("Patrick Mahomes", "Jared Goff", "Josh Allen", "Joe Burrow", "Aaron Rodgers") #qb list to filter on for readability, handpicked with domain knowledge
top_qbs_summary3 <- top_qbs_data |>
  group_by(display_name, season) |>
  summarize(sum_epa = sum(passing_epa)) |>
  filter(display_name %in% qb_list) # filtering logic

ggplot(top_qbs_summary3, aes(x = season, y = sum_epa, color = display_name)) +
  geom_line(linewidth = 2) + # linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
  labs(title = "Figure 6: Top QB Expected Points Added Per Season Since 2016",
       x = "Year",
       y = "Total Passing Expected Points Added",
       caption = "Data source: nflreadr package",
       color = "Player") +
  theme_minimal()

This plot tells us that while Jared has had some great seasons, he still struggles to compete with some of the other top QBs in the expected points added metric.

# Perform transformations for this plot 
top_qbs_summary4 <- top_qbs_data |>
  group_by(display_name, player_id) |>
  summarize(avg_epa = mean(passing_epa), avg_air_yards = mean(passing_air_yards))


top_qbs_summary4 |>
  ggplot(aes(x = avg_air_yards, y = avg_epa)) +
  geom_point() +
  nflplotR::geom_nfl_headshots(aes(player_gsis = player_id), width = 0.1, vjust = 0.5) + # This geom adds the player's headshot picture as the point
  labs(title = "Figure 7: Quarterback Weekly Average Passing EPA by Average Air Yards ",
       x = "Average Air Yards Per Game",
       y = "Average Expected Points Added Per Game",
       caption = "Since 2016, Data Source: nflreadr package") +
  theme_minimal()

Jared stands out here: he has the most expected points added per game among quarterbacks with fewer than 260 air yards per game. This indicates he's a very efficient quarterback, while other quarterbacks need to pass for many more air yards to get similar expected points added.

Monte Carlo Methods of Inference

For this project and data set I wouldn't normally expect permutation tests to be used, but in this case we can generate a null distribution to check whether Jared Goff's composite score being better than the average of the other top quarterbacks is statistically significant or more likely due to random chance.

Alternative Hypothesis: (hA)

Jared Goff has a mean composite score better than the other top paid quarterbacks in the league since 2021.

Null Hypothesis: (h0)

Jared Goff has a mean composite score that is not different from the other top paid quarterbacks in the league since 2021.

observed_statistic <- mean(jared_vs_qb_weekly_scores$goff_score - jared_vs_qb_weekly_scores$top_qb_average_score)

set.seed(1999)

n_perms <- 1000 

permTs <- vector(length = n_perms)

# Calculating test statistic for each permutation
for(p in 1:n_perms) {
  combined_scores <- c(jared_vs_qb_weekly_scores$goff_score, jared_vs_qb_weekly_scores$top_qb_average_score)
  half <- length(jared_vs_qb_weekly_scores$goff_score)
  permutation <- sample(combined_scores)
  scores_A <- permutation[1:half]
  scores_B <- permutation[(half+1):length(permutation)]
  permTs[p] <- mean(scores_A - scores_B)
}

tibble(value = permTs) |>
  ggplot(aes(x = value)) + 
  geom_histogram(color = "white") +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  labs(title = "Permutation simulated null distribution",
       x = "Test Statistic",
       y = "Frequency") +
  geom_vline(xintercept = quantile(permTs, 0.95), color = "red", linetype = "dashed") + # adding the 95th percentile for statistical significance
  geom_vline(xintercept = observed_statistic, color = "blue") + # adding in our observed test statistic
  theme_minimal()

mean(permTs >= observed_statistic) # calculating p-value

## [1] 0.176

With a p-value of 0.176, we fail to reject the null hypothesis that Jared Goff's mean composite score is no different from that of the other top paid quarterbacks in the league since 2021.
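
Because each of Goff's weekly scores is matched to an average for the other quarterbacks, one alternative to shuffling the pooled scores is a paired (sign-flip) permutation test. The sketch below is not the test run above; it is a different technique shown for comparison, and it uses simulated numbers (`goff_score` and `top_qb_average_score` here are random stand-ins, not NFL data).

```r
set.seed(1999)

# Simulated stand-ins for the paired weekly scores (not real NFL data)
goff_score <- rnorm(18, mean = 33, sd = 6)
top_qb_average_score <- rnorm(18, mean = 32, sd = 3)

# Paired differences and the observed mean difference
diffs <- goff_score - top_qb_average_score
observed <- mean(diffs)

# Under the null of no difference, the sign of each paired difference is
# arbitrary, so we randomly flip signs and recompute the mean each time
n_perms <- 1000
perm_stats <- numeric(n_perms)
for (p in seq_len(n_perms)) {
  signs <- sample(c(-1, 1), length(diffs), replace = TRUE)
  perm_stats[p] <- mean(signs * diffs)
}

# One-sided p-value; the +1 terms count the observed statistic itself so the
# estimate can never be exactly zero
p_value <- (sum(perm_stats >= observed) + 1) / (n_perms + 1)
p_value
```

The sign-flip version keeps each pair together, which suits matched data; the pooled shuffle used above instead treats the two sets of scores as independent groups.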

Bootstrap Methods of Inference

Again, this isn't the best technique for this project, since we have the complete data set. However, we can run a bootstrapping exercise on a subset of the data and illustrate how useful the technique can be by comparing its result to the actual distribution.

Here is the distribution for quarterback passing yards in 2023:

qb_full_info_2023 <- qb_full_info |> filter(season == 2023)

passing_yards <- qb_full_info_2023 |> 
  group_by(display_name) |>
  summarize(season_yards = sum(passing_yards))

passing_yards |>
  ggplot(aes(x = season_yards)) + 
  geom_histogram(color = "white") +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  labs(title = "Actual QB Passing Yards Distribution",
       x = "Passing Yards",
       y = "Frequency") +
  geom_vline(xintercept = quantile(passing_yards$season_yards, 0.50), color = "red") + # median
  theme_minimal()

Now let's draw a random sample of 30 quarterbacks and calculate its median.

set.seed(2000)

random_indices <- sample(1:nrow(passing_yards), 30, replace = FALSE)

passing_yards_sample <- passing_yards[random_indices, ]

passing_yards_sample |>
  ggplot(aes(x = season_yards)) + 
  geom_histogram(color = "white") +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  labs(title = "Sample QB Passing Yards Distribution",
       x = "Passing Yards",
       y = "Frequency") +
  geom_vline(xintercept = quantile(passing_yards_sample$season_yards, 0.50), color = "red") + # median
  theme_minimal()

quantile(passing_yards$season_yards, 0.50) # actual median

## 50% 
## 909

quantile(passing_yards_sample$season_yards, 0.50) # sample median

##  50% 
## 1167

The actual median is 909 passing yards for 2023, but our sample median is 1167. If we bootstrap the sampled data, we should be able to estimate a confidence interval and standard error for the median, and that interval should include the actual median.

Let's pretend it's the early 1960s and we only have these 30 observations of passing yards, because we were only able to monitor and record data for the quarterbacks who played within 5 hours of travel time. Let's now use a bootstrapping technique to estimate the actual median with 95% confidence.

B <- 1000
set.seed(2000)

boot_medians <- vector(length = B)

for(b in 1:B){
  boot_medians[b] <- median(slice_sample(passing_yards_sample, prop = 1, replace = TRUE)$season_yards) # using with replacement creates variability
}

tibble(value = boot_medians) |>
  ggplot(aes(x = value)) + 
  geom_histogram(color = "white") +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  labs(title = "Bootstrapped Medians of NFL Quarterback Seasonal Passing Yard Totals",
       x = "Observed Median",
       y = "Frequency") +
  geom_vline(xintercept = quantile(boot_medians, 0.025), color = "red") + # Confidence interval lower bound
  geom_vline(xintercept = quantile(boot_medians, 0.975), color = "red") + # Confidence interval upper bound
  theme_minimal()

sd(boot_medians) # standard error

## [1] 429.2928

quantile(boot_medians, probs = c(0.025, 0.975)) # Confidence Interval 95%

##  2.5% 97.5% 
##   468  2028

Given our sample of 30 quarterbacks and a nonparametric bootstrap (resampling with replacement), we are 95% confident that the median passing yards for quarterbacks in 2023 is between 468 and 2028 yards, with a bootstrap standard error of about 429 yards. The interval does contain the actual median of 909.
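
The bootstrap steps can also be wrapped in a small reusable function. This is a minimal base R sketch under stated assumptions: `bootstrap_median_ci` is a hypothetical helper name, and `toy_yards` is simulated data standing in for a sample of season passing totals.

```r
# Hypothetical helper: percentile bootstrap for the median of a numeric vector
bootstrap_median_ci <- function(x, B = 1000, level = 0.95) {
  # Resample x with replacement B times and record the median of each resample
  boot_medians <- replicate(B, median(sample(x, size = length(x), replace = TRUE)))
  alpha <- 1 - level
  list(
    se = sd(boot_medians),                                            # bootstrap standard error
    ci = quantile(boot_medians, probs = c(alpha / 2, 1 - alpha / 2))  # percentile interval
  )
}

set.seed(2000)
toy_yards <- round(rexp(30, rate = 1 / 1500)) # simulated, right-skewed yardage totals

result <- bootstrap_median_ci(toy_yards)
result$se
result$ci
```

The percentile interval is the simplest choice and matches the approach used above; with a sample this small and this skewed, the interval will typically be wide.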

Conclusion

The market for top quarterbacks in the NFL is trending towards an astronomical level. With the salary cap increasing by a substantial amount this past year, quarterbacks are likely to see around 60 million dollars per year on their contracts in the next couple of years.

We have little evidence from our analysis that Jared Goff is a significantly better performer than the other top-paid quarterbacks in the NFL. We did observe a positive difference between Jared's composite score and the average of the other top quarterbacks, but our permutation test did not show that difference to be statistically significant. Figure 7 did indicate that Jared Goff is particularly efficient compared to other top quarterbacks, which might be a reason to pay him more even though he is middle of the pack in both mean and median yards per game. One observation is that Jalen Hurts is earning a top contract even though his metrics in Table 2 are markedly worse than Jared Goff's.

If we take into account that Jared Goff has led the Detroit Lions to two playoff wins in his 3 years, he may mean more to the Lions organization than Jalen Hurts means to the Eagles. Considering the salary cap increase of roughly 30 million dollars this off-season, it would not be unreasonable for Jared Goff's agent to ask for 55 million a year. That would tie him with Joe Burrow as the highest-paid quarterback today, but may end up looking like a bargain 4 years from now, considering it's less than 22% of the new salary cap. However, the Lions organization may have a great opportunity to sign Jared Goff at or near 50 million a year, which would still place him among the highest paid and is where he currently aligns performance-wise. In conclusion, I would suggest that Jared Goff may end up signing for about 53 million per year on his next contract.