Preface

This project aims to answer the question: how much should Jared Goff be paid, based on his statistics relative to other quarterbacks?

Jared Goff is in the last year of his contract with the Detroit Lions. He has led the Lions to two straight winning seasons and, most recently, to the NFC Championship Game, finishing as NFC runner-up a few plays away from a Super Bowl appearance.

This exploratory data analysis evaluates the trends and nature of the current NFL quarterback contract market, and measures how Jared Goff's statistical performance stacks up against other NFL quarterbacks.

Next, I use a statistical inference tool (a permutation test) to test whether Jared Goff performs better than, or the same as, the other top-paid NFL quarterbacks, using a composite score I created.

Lastly, I present an example of how bootstrapping can be used to estimate the median from a sample of quarterback season passing-yard totals for 2023.

# Project Specific Libraries
library(nflreadr) # includes many NFL datasets
library(nflplotR) # includes geoms for NFL plots! https://nflplotr.nflverse.com/articles/nflplotR.html#lets-play-with-wordmarks-and-other-imagesj

# Dependencies
library(dplyr)
library(stringr)
library(ggplot2)
library(skimr)
library(flextable)
library(gt)
library(naniar)
library(tidyverse)
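# Loading the data from the nflreadr package
years <- 2008:2023 # the 16 most recent complete seasons (2008-2023)

player_stats <- nflreadr::load_player_stats(years) # weekly player statistics
players <- nflreadr::load_players()                # player biographical information
contracts <- nflreadr::load_contracts()            # contract data from overthecap.com
teams <- nflreadr::load_teams()                    # team names, colors, and logos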

Data Dictionaries

Although we create many data transformations for our different plots and tables, they are largely derived from the contracts and player_stats data sets.

contracts <- contracts |>
  select(player, year_signed, position, team, apy, years, value, guaranteed, apy_cap_pct)

dataDictionaryContracts <- tibble(Variable = colnames(contracts),
                         Description = c( "Name of the player",
                                          "Year the contract was signed",
                                          "Player's position on the team",
                                          "Team the player signed with",
                                          "Average annual salary in the contract (in millions of dollars)",
                                          "Total number of years in the contract",
                                          "Total value of the contract (in millions of dollars)",
                                          "Amount of guaranteed money in the contract (in millions of dollars)",
                                          "Average annual salary as a percentage of the salary cap"),
                         Type = map_chr(contracts, .f = function(x){typeof(x)[1]}),
                         Class = map_chr(contracts, .f = function(x){class(x)[1]}))

# Printing nicely in R Markdown
flextable::flextable(dataDictionaryContracts, cwidth = 2) |> theme_apa()

| Variable | Description | Type | Class |
|---|---|---|---|
| player | Name of the player | character | character |
| year_signed | Year the contract was signed | integer | integer |
| position | Player's position on the team | character | character |
| team | Team the player signed with | character | character |
| apy | Average annual salary in the contract (in millions of dollars) | double | numeric |
| years | Total number of years in the contract | integer | integer |
| value | Total value of the contract (in millions of dollars) | double | numeric |
| guaranteed | Amount of guaranteed money in the contract (in millions of dollars) | double | numeric |
| apy_cap_pct | Average annual salary as a percentage of the salary cap | double | numeric |

player_stats <- player_stats |>
  select(player_id, position, player_display_name, season, season_type, week, completions, attempts, passing_yards, passing_tds, passing_air_yards, passing_yards_after_catch, passing_first_downs, passing_epa, rushing_yards, rushing_tds, interceptions, sack_fumbles_lost)

dataDictionaryPlayerStats <- tibble(Variable = colnames(player_stats),
                         Description = c( "Player unique identifier (used to join other datasets provided by nflreadr)",
                                          "Position of a player on the field",
                                          "Full name of the player",
                                          "NFL season (2023 means the games played in the 2023-2024 season)",
                                          "Denotes whether the game was in the playoffs or the regular season",
                                          "Week of the season",
                                          "Number of completed passes by the player",
                                          "Number of attempted passes by the player",
                                          "Passing yards gained by the player",
                                          "Number of touchdown passes thrown by the player",
                                          "Total air yards on the player's pass attempts (includes incomplete passes)",
                                          "Yards gained by receivers after the catch, credited to the passing player",
                                          "Number of first downs gained through passing plays",
                                          "Total Expected Points Added (EPA) on the player's pass attempts and sacks",
                                          "Rushing yards gained by the player",
                                          "Number of rushing touchdowns scored by the player",
                                          "Number of interceptions thrown by the player",
                                          "Number of fumbles lost by the player when being sacked"),
                         Type = map_chr(player_stats, .f = function(x){typeof(x)[1]}),
                         Class = map_chr(player_stats, .f = function(x){class(x)[1]}))

# Printing nicely in R Markdown
flextable::flextable(dataDictionaryPlayerStats, cwidth = 2) |> theme_apa()

| Variable | Description | Type | Class |
|---|---|---|---|
| player_id | Player unique identifier (used to join other datasets provided by nflreadr) | character | character |
| position | Position of a player on the field | character | character |
| player_display_name | Full name of the player | character | character |
| season | NFL season (2023 means the games played in the 2023-2024 season) | integer | integer |
| season_type | Denotes whether the game was in the playoffs or the regular season | character | character |
| week | Week of the season | integer | integer |
| completions | Number of completed passes by the player | integer | integer |
| attempts | Number of attempted passes by the player | integer | integer |
| passing_yards | Passing yards gained by the player | double | numeric |
| passing_tds | Number of touchdown passes thrown by the player | integer | integer |
| passing_air_yards | Total air yards on the player's pass attempts (includes incomplete passes) | double | numeric |
| passing_yards_after_catch | Yards gained by receivers after the catch, credited to the passing player | double | numeric |
| passing_first_downs | Number of first downs gained through passing plays | double | numeric |
| passing_epa | Total Expected Points Added (EPA) on the player's pass attempts and sacks | double | numeric |
| rushing_yards | Rushing yards gained by the player | double | numeric |
| rushing_tds | Number of rushing touchdowns scored by the player | integer | integer |
| interceptions | Number of interceptions thrown by the player | double | numeric |
| sack_fumbles_lost | Number of fumbles lost by the player when being sacked | integer | integer |
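Both dictionaries above pair a hand-typed Description vector with colnames(), so the descriptions must line up one-to-one with the columns. Below is a minimal base R sketch of a helper that checks the count first and fails with a clear message. make_data_dictionary() is a hypothetical name used for illustration, not a function from any package loaded here, and the toy data frame stands in for contracts or player_stats.

```r
# Hypothetical helper: build a data dictionary, refusing to run if the number
# of descriptions does not match the number of columns.
make_data_dictionary <- function(df, descriptions) {
  if (length(descriptions) != ncol(df)) {
    stop(sprintf("Got %d descriptions for %d columns",
                 length(descriptions), ncol(df)))
  }
  data.frame(
    Variable    = colnames(df),
    Description = descriptions,
    Type        = unname(vapply(df, function(x) typeof(x)[1], character(1))),
    Class       = unname(vapply(df, function(x) class(x)[1], character(1))),
    row.names   = NULL,
    stringsAsFactors = FALSE
  )
}

# Usage with a small hypothetical table
toy <- data.frame(player = c("A", "B"), apy = c(10.5, 55), years = c(4L, 5L),
                  stringsAsFactors = FALSE)
dict <- make_data_dictionary(toy, c("Name of the player",
                                    "Average annual salary (millions of dollars)",
                                    "Total number of years in the contract"))
print(dict)
```

On the real data the same description vectors used above would be passed in, and the result handed to flextable() as before.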

Checking for Missingness

contracts |>
  dplyr::select(player:apy_cap_pct) |> # all selected contract variables
  gg_miss_fct(fct = position)

The only variable we have to be wary of using is years, as it is missing for several positions; luckily, it is not missing for quarterbacks, which are the focus of this analysis.

player_stats |>
  dplyr::select(position:sack_fumbles_lost) |> #all variables not id
  gg_miss_fct(fct = position)

Again, no variables are missing at the QB position. Missing passing EPA for other positions is fine, because this analysis only uses passing EPA for quarterbacks.
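The gg_miss_fct() heat maps are quick to scan, but exact counts can help when deciding whether a variable is safe to use. Here is a base R sketch that counts missing values in every variable within each position. missing_by_position() is a hypothetical helper, and the toy data frame is made up for illustration.

```r
# Hypothetical helper: count missing values per variable within each group.
# Rows whose group value is NA are dropped by split().
missing_by_position <- function(df, group_col = "position") {
  groups <- split(df[setdiff(names(df), group_col)], df[[group_col]])
  # One row per group, one column per remaining variable
  do.call(rbind, lapply(groups, function(d) colSums(is.na(d))))
}

# Hypothetical weekly rows for two positions
toy <- data.frame(
  position      = c("QB", "QB", "RB", "RB"),
  passing_epa   = c(3.1, -0.4, NA, NA),
  rushing_yards = c(12, NA, 80, 45)
)
missing_by_position(toy)
# QB: passing_epa 0, rushing_yards 1; RB: passing_epa 2, rushing_yards 0
```

Calling missing_by_position(player_stats) would return the same kind of matrix for the real data.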

Data Cleaning

Building a base Quarterback table

qb_player_stats <- player_stats |>
  filter(season > 2016, season_type == "REG") |> # keep only regular-season games from 2017 onward
  filter(position == "QB") # keep only quarterbacks

Building Contracts Tables

qb_contracts <- contracts |>
  filter(year_signed >= 2009) |> # contracts signed in 2009 or later
  filter(position == "QB")

big_qb_contracts <- qb_contracts |> filter(apy > 5) # "big" means more than 5 million dollars a year

big_contracts <- contracts |> filter(apy > 5) |> filter(year_signed >= 2009)

# Add a column that multiplies apy_cap_pct by the current (2024) cap and call it apy_cap_adj_2024

total_cap <- 255.4 # 2024 NFL salary cap in millions of dollars

top_qb_pay <- big_contracts |>
  filter(position == 'QB') |>
  filter(apy_cap_pct > 0.175) |> # keep contracts whose APY is more than 17.5% of the cap
  mutate(apy_cap_adj_2024 = total_cap * apy_cap_pct) |> # APY the contract would carry at the same cap share under the 2024 cap
  select(player, team, year_signed, years, value, apy_cap_pct, apy, apy_cap_adj_2024) |>
  dplyr::arrange(desc(apy_cap_adj_2024))

Merging Data Sets

Here we merge additional player and team information into our QB data sets to help with the visualizations.

qbs <- players |> filter(position == "QB")

qb_full_info <- dplyr::left_join(qb_player_stats, qbs, join_by(player_id == gsis_id)) |>
  mutate(season = season.x, position = position.x) |>
  select(-c(season.x, season.y, position.x, position.y, jersey_number, draft_round, uniform_number, draft_number, smart_id, years_of_experience, team_seq, position_group, esb_id, status, entry_year, draft_club, status_description_abbr, college_conference, gsis_it_id, player_display_name)) # add additional information to the quarterbacks 

qb_full_info <- left_join(qb_full_info, teams, join_by(current_team_id == team_id)) # add in team information for the players current team (April 2024)

String Manipulation

One data set reports Matthew Stafford as "Matt", so the first line of code below accounts for that.

I also build a composite score here for each QB's weekly performance, based on my domain knowledge.

top_qb_pay$player <- str_replace(top_qb_pay$player, pattern = "^Matt Stafford$", replacement = "Matthew Stafford") # anchored so "Matthew" is never rewritten to "Matthewhew"; my only stringr function, the data is unfortunately too curated for stringr use-cases in this project

top_qbs_data <- qb_full_info[qb_full_info$display_name %in% top_qb_pay$player, ] # take only the top paid players 

top_qb_weekly_score <- top_qbs_data |>
  mutate(composite_score = (passing_yards + rushing_yards) / 100.0 + completions + passing_tds + rushing_tds 
         + (passing_air_yards + passing_yards_after_catch + rushing_yards) / 100.0 - interceptions - sack_fumbles_lost) |> # create composite score column (my idea and formula)
  select(display_name, season, week, composite_score)
  
# create some filtered/grouped tables
# average composite score of the other top-paid QBs for each week number (pooled across seasons)
average_top_qb_weekly_score <- top_qb_weekly_score |>
  filter(display_name != "Jared Goff") |> 
  group_by(week) |> 
  summarize(top_qb_average_score = mean(composite_score))

jared_goff_weekly_score <- top_qb_weekly_score |>
  filter(display_name == "Jared Goff")

# pair each Goff game with the other QBs' average for the same week number
jared_vs_qb_weekly_scores <- merge(average_top_qb_weekly_score, jared_goff_weekly_score, by = "week") |>
  mutate(goff_score = composite_score) |>
  select(season, week, top_qb_average_score, goff_score)
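To see how the terms of the composite score trade off, here is the same formula from the mutate() call above written as a plain R function and applied to one hypothetical stat line. The numbers are made up for illustration and do not come from any real game.

```r
# The composite score from top_qb_weekly_score, written as a plain function
composite_score <- function(passing_yards, rushing_yards, completions,
                            passing_tds, rushing_tds, passing_air_yards,
                            passing_yards_after_catch, interceptions,
                            sack_fumbles_lost) {
  (passing_yards + rushing_yards) / 100 +
    completions + passing_tds + rushing_tds +
    (passing_air_yards + passing_yards_after_catch + rushing_yards) / 100 -
    interceptions - sack_fumbles_lost
}

# Hypothetical game: 280 passing yards, 20 rushing yards, 25 completions,
# 2 passing TDs, 250 air yards, 130 yards after catch, 1 interception
score <- composite_score(passing_yards = 280, rushing_yards = 20,
                         completions = 25, passing_tds = 2, rushing_tds = 0,
                         passing_air_yards = 250, passing_yards_after_catch = 130,
                         interceptions = 1, sack_fumbles_lost = 0)
score
# 3 (yards) + 25 (completions) + 2 (TDs) + 4 (air yards + YAC + rushing) - 1 = 33
```

In this hypothetical game, completions contribute 25 of the 33 points while 300 total yards contribute only 3, so the score is driven largely by completion volume. Rushing yards also appear in both yardage terms, so they are effectively counted twice.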

Exploratory Data Analysis

Part 1 - Understanding the Quarterback Market

This part examines the current QB market and how it has trended in recent years.

Recall that in Data Cleaning we filtered the contract data to QB contracts signed in 2009 or later, keeping only the columns we need.

Let's create a couple of visualizations to understand quarterback salaries.

qb_contracts |>
  ggplot(aes(x = year_signed, y = apy)) +
  geom_point() +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  labs(title = "Figure 1: Yearly NFL Quarterback Contract Values",
       x = "Year Signed",
       y = "Contract Amount per year in Millions of Dollars",
       caption = "Since 2009, Data Source: overthecap.com") +
  theme_minimal()

We observe a positive, roughly linear trend in quarterback pay, indicating that quarterbacks are signing for more every year (2024 signings are not complete yet).
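To put a number on the trend in Figure 1, one option is a least-squares line of apy on year_signed. The sketch below wraps lm() in a hypothetical helper, apy_trend_per_year(), and checks it on a toy table where APY rises by exactly 2 million dollars per year.

```r
# Hypothetical helper: fitted yearly change in APY (millions of dollars per year).
# `contracts_df` is assumed to have numeric `apy` and `year_signed` columns.
apy_trend_per_year <- function(contracts_df) {
  fit <- lm(apy ~ year_signed, data = contracts_df) # rows with NA are dropped by default
  unname(coef(fit)["year_signed"])
}

# Toy table where APY rises by exactly 2 per year
toy <- data.frame(year_signed = 2019:2023, apy = c(30, 32, 34, 36, 38))
apy_trend_per_year(toy)
```

Calling apy_trend_per_year(qb_contracts) would give the fitted increase for the real data. A straight line is only a summary, and it will be pulled around by the many small backup contracts in Figure 1.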

big_qb_contracts |>
  ggplot(aes(x = apy)) +
  geom_histogram(color = "white") +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  labs(title = "Figure 2: Yearly Contract Distribution of NFL Quarterbacks",
       x = "Contract Amount per year in Millions of Dollars",
       y = "Count",
       caption = "Since 2009, Data Source: overthecap.com") +
  theme_minimal()

The distribution is right-skewed: most contracts cluster at the lower end, while the highest-paid quarterbacks stretch to over 50 million dollars per year.
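The direction of skew can be checked numerically as well as by eye. Below is a sketch of the moment-based sample skewness, where positive values indicate a long right tail, together with the simpler mean-versus-median check. sample_skewness() is a hypothetical helper, and the toy APY values are made up.

```r
# Hypothetical helper: moment-based sample skewness (positive = long right tail)
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^(3 / 2)
}

# Hypothetical APY values (millions): several small deals and one very large one
toy_apy <- c(6, 7, 8, 9, 55)
sample_skewness(toy_apy)         # positive
mean(toy_apy) > median(toy_apy)  # TRUE: the large deal pulls the mean up
```

On the real data, sample_skewness(big_qb_contracts$apy) would return a single number to report alongside Figure 2.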

# How much do the best players at the most valuable positions get paid?

# Keep only positions with at least 100 contracts worth more than 5 million dollars per year

big_contracts |>
  group_by(position) |>
  summarize(count = n(), mean = mean(apy, na.rm = TRUE), sd = sd(apy), min = min(apy), max = max(apy)) |> 
  filter(count >= 100) |>
  arrange(desc(mean)) |>
  flextable() |>
  add_header_lines(top = TRUE, value = "Table 1: NFL Contract Yearly Value Statistics by Position Since 2009 (In Millions USD)") |>
  theme_apa()

Table 1: NFL Contract Yearly Value Statistics by Position Since 2009 (In Millions USD)

| position | count | mean | sd | min | max |
|---|---|---|---|---|---|
| QB | 190 | 17.16 | 12.01 | 5.05 | 55.00 |
| WR | 231 | 11.42 | 5.68 | 5.04 | 32.00 |
| ED | 254 | 11.31 | 5.35 | 5.14 | 34.00 |
| LT | 117 | 10.91 | 4.73 | 5.12 | 25.00 |
| IDL | 203 | 10.83 | 5.37 | 5.12 | 31.75 |
| CB | 189 | 9.77 | 3.97 | 5.05 | 21.00 |
| S | 133 | 8.83 | 3.37 | 5.03 | 19.00 |
| LB | 153 | 8.63 | 3.14 | 5.01 | 20.00 |
| TE | 111 | 8.29 | 2.78 | 5.10 | 17.12 |

Since 2009, among contracts worth more than 5 million dollars per year, quarterbacks have averaged about 5.7 million dollars more per year than the next highest-paid position (wide receivers).

big_contracts |> 
  ggplot(aes(x = position, y = apy)) +
  geom_boxplot(aes(fill = position)) +
  scale_fill_manual(values = teams$team_color) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  coord_flip() +
  labs(y = "Yearly Contract Amount In Millions",
       x = "Position",
       title = "Figure 3: NFL Distribution of Contract Amount by Position",
       caption = "Data Source: Overthecap.com Since 2009") +
  theme_minimal() + 
  theme(legend.position = "none") 

The QB market has the highest average yearly contract value of any position.

big_contracts |> 
  ggplot(aes(x = position, y = apy_cap_pct)) +
  geom_boxplot(aes(fill = position)) +
  scale_fill_manual(values = teams$team_color) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  coord_flip() +
  labs(x = "Position",
       y = "Percentage of Salary Cap",
       title = "Figure 4: NFL Distribution of Contract Percent of Yearly Salary Cap by Position",
       caption = "Data Source: Overthecap.com Since 2009") +
  theme_minimal() +
  theme(legend.position = "none")

From this graph we can see that half of the quarterback contracts shown (those worth more than 5 million dollars per year) take up more than 10% of the team's salary cap, and that most of the QB contracts above the third quartile take up more cap space than the maximum at any other position.
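Readings taken off a boxplot can be confirmed with a small summary table. The sketch below computes the median, third quartile, and maximum cap share for each position. cap_share_quartiles() is a hypothetical helper, it assumes apy_cap_pct is stored as a proportion (0.10 = 10% of the cap) as in big_contracts, and the toy rows are made up.

```r
# Hypothetical helper: median, third quartile, and max of cap share by position
cap_share_quartiles <- function(df) {
  stats <- lapply(split(df$apy_cap_pct, df$position), function(x) {
    c(median = median(x, na.rm = TRUE),
      q3     = unname(quantile(x, 0.75, na.rm = TRUE)),
      max    = max(x, na.rm = TRUE))
  })
  do.call(rbind, stats) # one row per position
}

# Hypothetical contracts
toy <- data.frame(
  position    = c("QB", "QB", "QB", "QB", "WR", "WR"),
  apy_cap_pct = c(0.05, 0.12, 0.20, 0.24, 0.06, 0.10)
)
cap_share_quartiles(toy)
```

Running cap_share_quartiles(big_contracts) would let the QB median be compared directly with the 10% figure, and the QB third quartile with the other positions' maxima.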

From external sources we know that the salary cap was raised by about 30 million dollars at the start of the 2024 off-season, to 255.4 million dollars. Source

For this experiment let’s assume that this increase will apply proportionally to the current quarterback market as contracts expire over the next several years, but keep in mind that this might not be the case and other positions like Edge, Wide Receiver, and Defensive Tackle may increase their market share of the salary cap.
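Under the proportional assumption above, the 2024 adjustment in top_qb_pay is a single multiplication, and dividing by the cap reverses it. The sketch below restates both as small functions. cap_adjusted_apy() and cap_share() are hypothetical names, 255.4 is the 2024 cap quoted above, and 0.20 is an arbitrary example share.

```r
cap_2024 <- 255.4 # 2024 salary cap in millions of dollars, as quoted above

# APY a contract would carry if it kept the same share of the 2024 cap
cap_adjusted_apy <- function(apy_cap_pct, cap = cap_2024) cap * apy_cap_pct

# Share of the 2024 cap taken by a given APY (in millions)
cap_share <- function(apy, cap = cap_2024) apy / cap

cap_adjusted_apy(0.20)   # 51.08 million
round(cap_share(55), 3)  # 0.215, i.e. about 21.5% of the 2024 cap
```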

top_qb_pay |> flextable() |> theme_apa() |> colformat_int(j = "year_signed", big.mark = "") # print years without a thousands separator

| player | team | year_signed | years | value | apy_cap_pct | apy | apy_cap_adj_2024 |
|---|---|---|---|---|---|---|---|
| Joe Burrow | Bengals | 2023 | 5 | 275.00 | 0.24 | 55.00 | 62.57 |
| Aaron Rodgers | GB/NYJ | 2022 | 5 | 150.81 | 0.24 | 50.27 | 61.55 |
| Josh Allen | Bills | 2021 | 6 | 258.00 | 0.24 | 43.00 | 60.27 |
| Russell Wilson | Broncos | 2022 | 5 | 245.00 | 0.23 | 49.00 | 60.02 |
| Justin Herbert | Chargers | 2023 | 5 | 262.50 | 0.23 | 52.50 | 59.76 |
| Lamar Jackson | Ravens | 2023 | 5 | 260.00 | 0.23 | 52.00 | 59.00 |
| Patrick Mahomes | Chiefs | 2020 | 10 | 450.00 | 0.23 | 45.00 | 57.98 |
| Jalen Hurts | Eagles | 2023 | 5 | 255.00 | 0.23 | 51.00 | 57.98 |
| Kyler Murray | Cardinals | 2022 | 5 | 230.50 | 0.22 | 46.10 | 56.44 |
| Deshaun Watson | Browns | 2022 | 5 | 230.00 | 0.22 | 46.00 | 56.44 |
| Dak Prescott | Cowboys | 2021 | 4 | 160.00 | 0.22 | 40.00 | 55.93 |
| Deshaun Watson | Texans | 2020 | 4 | 156.00 | 0.20 | 39.00 | 50.31 |
| Derek Carr | Raiders | 2022 | 3 | 121.42 | 0.19 | 40.47 | 49.55 |
| Matthew Stafford | Rams | 2022 | 4 | 160.00 | 0.19 | 40.00 | 49.04 |
| Aaron Rodgers | Packers | 2018 | 4 | 134.00 | 0.19 | 33.50 | 48.27 |
| Russell Wilson | SEA/DEN | 2019 | 4 | 140.00 | 0.19 | 35.00 | 47.50 |
| Ben Roethlisberger | Steelers | 2019 | 2 | 68.00 | 0.18 | 34.00 | 46.23 |
| Aaron Rodgers | Packers | 2013 | 5 | 110.00 | 0.18 | 22.00 | 45.72 |
| Jared Goff | LAR/DET | 2019 | 4 | 134.00 | 0.18 | 33.50 | 45.46 |
| Daniel Jones | Giants | 2023 | 4 | 160.00 | 0.18 | 40.00 | 45.46 |
| Kirk Cousins | Falcons | 2024 | 4 | 180.00 | 0.18 | 45.00 | 44.95 |

Some conclusions to draw from this table include:

- NFL teams have not been willing to spend more than about 24% of their cap space on a quarterback.
- The very best quarterbacks are making 22-24% of their team's overall cap space.
- With the recent salary cap increase, if quarterbacks maintain their current cap share, new contracts for top quarterbacks could be worth 55-63 million dollars per year.

*Note: this table is included purely for the purposes of this analysis and is therefore not titled, as it is not a requirement of this project.

Part 2 - How does Jared Goff statistically compare to other quarterbacks?

# Perform transformations for this table
top_qbs_summary <- top_qbs_data |>
  group_by(display_name) |>
  mutate(yards_per_game = passing_yards + rushing_yards) |>
  summarize(mean = mean(yards_per_game, na.rm = TRUE), median = median(yards_per_game), standard_deviation = sd(yards_per_game), min = min(yards_per_game), max = max(yards_per_game))

top_qbs_summary |> 
  arrange(desc(mean)) |>
  flextable() |>
  add_header_lines(top = TRUE, value = "Table 2: Weekly Yards Gained By Top Paid NFL Quarterbacks Since 2016") |>
  theme_apa()

Table 2: Weekly Yards Gained By Top Paid NFL Quarterbacks Since 2016

| display_name | mean | median | standard_deviation | min | max |
|---|---|---|---|---|---|
| Patrick Mahomes | 316.25 | 306.50 | 76.58 | 78.00 | 509.00 |
| Justin Herbert | 292.48 | 299.00 | 68.85 | 96.00 | 472.00 |
| Deshaun Watson | 284.09 | 292.00 | 88.34 | 5.00 | 473.00 |
| Joe Burrow | 282.46 | 279.00 | 92.61 | 81.00 | 536.00 |
| Josh Allen | 279.62 | 281.50 | 88.06 | 5.00 | 466.00 |
| Dak Prescott | 279.53 | 271.00 | 86.72 | 130.00 | 514.00 |
| Kyler Murray | 278.38 | 275.00 | 79.86 | 12.00 | 444.00 |
| Ben Roethlisberger | 272.53 | 266.50 | 77.14 | 75.00 | 511.00 |
| Jared Goff | 271.08 | 263.50 | 76.19 | 78.00 | 517.00 |
| Kirk Cousins | 270.35 | 268.50 | 76.70 | 97.00 | 460.00 |
| Matthew Stafford | 270.19 | 272.00 | 68.02 | 17.00 | 434.00 |
| Aaron Rodgers | 260.11 | 253.00 | 87.19 | 0.00 | 474.00 |
| Russell Wilson | 259.98 | 254.00 | 71.80 | 122.00 | 482.00 |
| Derek Carr | 254.38 | 252.50 | 75.92 | 53.00 | 441.00 |
| Lamar Jackson | 245.87 | 263.50 | 104.87 | 0.00 | 504.00 |
| Daniel Jones | 240.43 | 244.00 | 81.60 | 22.00 | 429.00 |
| Jalen Hurts | 233.89 | 243.00 | 120.16 | -1.00 | 434.00 |

From this summary table we can see that Jared Goff ranks 9th of the 17 top-paid quarterbacks in mean yards per game: fewer than Kyler Murray and Dak Prescott, but more than Aaron Rodgers, Lamar Jackson, and Russell Wilson.

# Perform transformations for this plot 
top_qbs_summary2 <- top_qbs_data |>
  group_by(player_id, team_color) |>
  mutate(yards_per_game = passing_yards + rushing_yards) |>
  summarize(mean = mean(yards_per_game, na.rm = TRUE), median = median(yards_per_game), standard_deviation = sd(yards_per_game), min = min(yards_per_game), max = max(yards_per_game))

top_qbs_summary2 |>
  ggplot(aes(x = fct_reorder(player_id, median, .desc = FALSE), y = median, fill = player_id)) +
  geom_col(color = 'black', position = position_dodge(width = 0.2)) +
  scale_fill_manual(values = top_qbs_summary2$team_color) +
  labs(title = "Figure 5: Top QB Yards Gains Since 2016",
       x = "Player",
       y = "Median Yards Gained Per Game",
       caption = "Data source: nflreadr package") +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  coord_flip() +
  theme_minimal() +
  theme(legend.position = "none",
        axis.text.y = element_nfl_headshot(size=1)) # Change the y-axis labels to the player's headshot

From this plot we can see that Jared Goff's median of 263.5 yards gained per game places him 11th-12th among the 17 top-paid quarterbacks in the league, tied with Lamar Jackson.
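Because ties are easy to misread on a bar chart, the rank can also be computed directly. The sketch below copies the medians from Table 2 into a named vector and ranks them with rank(), using ties.method = "min" so tied players share the better rank.

```r
# Median yards gained per game, copied from Table 2 above
medians <- c(
  "Patrick Mahomes" = 306.5, "Justin Herbert" = 299.0, "Deshaun Watson" = 292.0,
  "Joe Burrow" = 279.0, "Josh Allen" = 281.5, "Dak Prescott" = 271.0,
  "Kyler Murray" = 275.0, "Ben Roethlisberger" = 266.5, "Jared Goff" = 263.5,
  "Kirk Cousins" = 268.5, "Matthew Stafford" = 272.0, "Aaron Rodgers" = 253.0,
  "Russell Wilson" = 254.0, "Derek Carr" = 252.5, "Lamar Jackson" = 263.5,
  "Daniel Jones" = 244.0, "Jalen Hurts" = 243.0
)

# Rank 1 = highest median; tied players share the smallest (best) rank
median_rank <- rank(-medians, ties.method = "min")
median_rank[c("Jared Goff", "Lamar Jackson")] # both 11
```

On the real data, the same call could be made on top_qbs_summary$median, named by top_qbs_summary$display_name.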

# Perform transformations for this plot 

qb_list <- c("Patrick Mahomes", "Jared Goff", "Josh Allen", "Joe Burrow", "Aaron Rodgers") #qb list to filter on for readability, handpicked with domain knowledge
top_qbs_summary3 <- top_qbs_data |>
  group_by(display_name, season) |>
  summarize(sum_epa = sum(passing_epa)) |>
  filter(display_name %in% qb_list) # filtering logic

ggplot(top_qbs_summary3, aes(x = season, y = sum_epa, color = display_name)) +
  geom_line(linewidth = 2) + # `linewidth` replaces the deprecated `size` for lines (ggplot2 >= 3.4.0)
  labs(title = "Figure 6: Top QB Expected Points Added Per Season Since 2016",
       x = "Year",
       y = "Total Passing Expected Points Added",
       caption = "Data source: nflreadr package",
       color = "Player") +
  theme_minimal()

This plot shows that while Jared Goff has had some great seasons, he still struggles to compete with some of the other top QBs in total passing expected points added per season.

# Perform transformations for this plot 
top_qbs_summary4 <- top_qbs_data |>
  group_by(display_name, player_id) |>
  summarize(avg_epa = mean(passing_epa), avg_air_yards = mean(passing_air_yards))


top_qbs_summary4 |>
  ggplot(aes(x = avg_air_yards, y = avg_epa)) +
  geom_point() +
  nflplotR::geom_nfl_headshots(aes(player_gsis = player_id), width = 0.1, vjust = 0.5) + # This geom adds the player's headshot picture as the point
  labs(title = "Figure 7: Quarterback Weekly Average Passing EPA by Average Air Yards ",
       x = "Average Air Yards Per Game",
       y = "Average Expected Points Added Per Game",
       caption = "Since 2016, Data Source: nflreadr package") +
  theme_minimal()

Jared Goff stands out here: he has the most expected points added per game while averaging fewer than 260 air yards per game. This indicates that he is a very efficient quarterback, while other quarterbacks need to pass for many more air yards to reach similar expected points added.
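One way to put the efficiency reading from Figure 7 into a single number is a ratio such as average passing EPA per 100 average air yards. This ratio is my own illustrative choice, not a standard nflverse metric, and epa_per_100_air_yards() is a hypothetical helper. It assumes positive air yards and reads most naturally when EPA is positive.

```r
# Hypothetical efficiency ratio: average passing EPA per 100 average air yards
epa_per_100_air_yards <- function(avg_epa, avg_air_yards) {
  100 * avg_epa / avg_air_yards
}

# Two hypothetical QBs with the same EPA but different air yards
epa_per_100_air_yards(avg_epa = c(6, 6), avg_air_yards = c(200, 300))
# 3 and 2: the QB who needs fewer air yards for the same EPA scores higher
```

On the real data it could be added as a column with dplyr::mutate(top_qbs_summary4, epa_per_100 = epa_per_100_air_yards(avg_epa, avg_air_yards)).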

Monte Carlo Methods of Inference

Permutation tests are not an obvious choice for this project and data set, but here we can use one to check whether Jared Goff's composite score being higher than the average of the other top quarterbacks is statistically significant or plausibly due to random chance, by generating a null distribution.

Alternative Hypothesis: (hA)

Jared Goff has a mean weekly composite score greater than the average weekly composite score of the other top-paid quarterbacks in the league over the 2017-2023 regular seasons.

Null Hypothesis: (h0)

Jared Goff has a mean weekly composite score that is not different from the average weekly composite score of the other top-paid quarterbacks in the league over the 2017-2023 regular seasons.

observed_statistic <- mean(jared_vs_qb_weekly_scores$goff_score - jared_vs_qb_weekly_scores$top_qb_average_score)

set.seed(1999)

n_perms <- 1000 

permTs <- numeric(n_perms)

# Pool both sets of scores once; each permutation reshuffles the pooled scores
combined_scores <- c(jared_vs_qb_weekly_scores$goff_score, jared_vs_qb_weekly_scores$top_qb_average_score)
half <- length(jared_vs_qb_weekly_scores$goff_score)

# Calculating test statistic for each permutation
for(p in 1:n_perms) {
  permutation <- sample(combined_scores)
  scores_A <- permutation[1:half]
  scores_B <- permutation[(half+1):length(permutation)]
  permTs[p] <- mean(scores_A - scores_B)
}
tibble(value = permTs) |>
  ggplot(aes(x = value)) + 
  geom_histogram(color = "white") +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  labs(title = "Permutation simulated null distribution",
       x = "Test Statistic",
       y = "Frequency") +
  geom_vline(xintercept = quantile(permTs, 0.95), color = "red", linetype = "dashed") + # adding the 95th percentile for statistical significance
  geom_vline(xintercept = observed_statistic, color = "blue") + # adding in our observed test statistic
  theme_minimal()

mean(permTs >= observed_statistic) # calculating p-value
## [1] 0.176

With a p-value of 0.176, above the conventional 0.05 threshold, we fail to reject the null hypothesis: the data do not provide significant evidence that Jared Goff has a greater mean composite score than the other top-paid quarterbacks in the league.
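Each of Goff's games is matched to the other quarterbacks' average for the same week number, so the scores come in pairs, while the test above shuffles all scores together as two independent samples. A paired alternative, swapped in here for illustration rather than as the analysis above, is a sign-flip permutation test on the weekly differences: under the null hypothesis each difference is equally likely to be positive or negative. sign_flip_test() is a hypothetical helper.

```r
# Paired (sign-flip) permutation test of H0: mean difference = 0
# against HA: mean difference > 0
sign_flip_test <- function(diffs, n_perms = 2000) {
  diffs <- diffs[!is.na(diffs)]
  observed <- mean(diffs)
  perm_stats <- replicate(n_perms, {
    # Randomly flip the sign of each paired difference
    signs <- sample(c(-1, 1), length(diffs), replace = TRUE)
    mean(signs * diffs)
  })
  # One-sided p-value with the +1 correction so it is never exactly 0
  p_value <- (sum(perm_stats >= observed) + 1) / (n_perms + 1)
  list(observed = observed, p_value = p_value)
}

# Usage on the real data (not run here):
# sign_flip_test(jared_vs_qb_weekly_scores$goff_score -
#                  jared_vs_qb_weekly_scores$top_qb_average_score)

set.seed(1999)
# Hypothetical differences that are all clearly positive: p-value is small
sign_flip_test(c(5, 6, 7, 8, 9, 10, 11, 12, 13, 14))$p_value
```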

Bootstrap Methods of Inference

Again, this isn't the best technique for this project, since we have the complete data set. However, we can run a bootstrapping exercise on a subset of the data and illustrate how useful bootstrapping can be by comparing its results with the actual distribution.

Here is the distribution of regular-season passing-yard totals for quarterbacks in 2023:

qb_full_info_2023 <- qb_full_info |> filter(season == 2023)

passing_yards <- qb_full_info_2023 |> 
  group_by(display_name) |>
  summarize(season_yards = sum(passing_yards))

passing_yards |>
  ggplot(aes(x = season_yards)) + 
  geom_histogram(color = "white") +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  labs(title = "Actual QB Passing Yards Distribution",
       x = "Passing Yards",
       y = "Frequency") +
  geom_vline(xintercept = quantile(passing_yards$season_yards, 0.50), color = "red") + # median
  theme_minimal()

Now let's draw a random sample of 30 quarterbacks and calculate the sample median.

set.seed(2000)

random_indices <- sample(1:nrow(passing_yards), 30, replace = FALSE)

passing_yards_sample <- passing_yards[random_indices, ]

passing_yards_sample |>
  ggplot(aes(x = season_yards)) + 
  geom_histogram(color = "white") +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  labs(title = "Sample QB Passing Yards Distribution",
       x = "Passing Yards",
       y = "Frequency") +
  geom_vline(xintercept = quantile(passing_yards_sample$season_yards, 0.50), color = "red") + # median
  theme_minimal()

quantile(passing_yards$season_yards, 0.50) # actual median
## 50% 
## 909
quantile(passing_yards_sample$season_yards, 0.50) # sample median
##  50% 
## 1167

The actual median is 909 passing yards for 2023, but our sample median is 1167. If we bootstrap this sample, we can estimate a standard error and a 95% confidence interval for the median, and check whether that interval captures the actual median.

Let's pretend it's the early 1960s and we only have these 30 samples of passing-yard data, because we were only able to monitor and record data for the quarterbacks who played within 5 hours of travel time. Let's now use a bootstrapping technique to estimate the actual median with 95% confidence.

B <- 1000
set.seed(2000)

boot_medians <- vector(length = B)

for(b in 1:B){
  boot_medians[b] <- median(slice_sample(passing_yards_sample, prop = 1, replace = TRUE)$season_yards) # using with replacement creates variability
}

tibble(value = boot_medians) |>
  ggplot(aes(x = value)) + 
  geom_histogram(color = "white") +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  labs(title = "Bootstrapped Medians of NFL Quarterback Seasonal Passing Yard Totals",
       x = "Observed Median",
       y = "Frequency") +
  geom_vline(xintercept = quantile(boot_medians, 0.025), color = "red") + # Confidence interval lower bound
  geom_vline(xintercept = quantile(boot_medians, 0.975), color = "red") + # Confidence interval upper bound
  theme_minimal()

sd(boot_medians) # standard error 
## [1] 429.2928
quantile(boot_medians, probs = c(0.025, 0.975)) # Confidence Interval 95%
##  2.5% 97.5% 
##   468  2028

Given our 30 samples and a nonparametric bootstrap (resampling with replacement), we are 95% confident that the median passing yards for quarterbacks in 2023 is between 468 and 2028 passing yards, with a bootstrap standard error of about 429.29 yards. This interval does contain the actual median of 909 yards.
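The bootstrap loop above can be wrapped in a reusable function. Here is a minimal base R sketch of a percentile bootstrap. bootstrap_ci() is a hypothetical helper, and it uses sample.int() rather than sample() on the values so that a length-one input is not silently treated as 1:n. Because its resampling calls differ from slice_sample(), it will not reproduce the exact interval above even with the same seed.

```r
# Hypothetical helper: nonparametric percentile bootstrap (median by default)
bootstrap_ci <- function(x, stat = median, B = 1000, level = 0.95) {
  x <- x[!is.na(x)]
  # Resample indices with replacement and recompute the statistic B times
  boot_stats <- replicate(B, stat(x[sample.int(length(x), replace = TRUE)]))
  alpha <- 1 - level
  list(
    estimate  = stat(x),
    std_error = sd(boot_stats), # bootstrap standard error
    ci        = unname(quantile(boot_stats, c(alpha / 2, 1 - alpha / 2)))
  )
}

# Hypothetical season totals, for illustration only
set.seed(42)
toy_yards <- c(100, 400, 800, 900, 1200, 1500, 2000, 3500, 4100, 4500)
bootstrap_ci(toy_yards)

# On the real sample (not run here):
# set.seed(2000); bootstrap_ci(passing_yards_sample$season_yards)
```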

Conclusion

The market for top quarterbacks in the NFL is trending toward an astronomical level. With the salary cap increasing by a substantial amount in the past year, top quarterbacks are likely to see around 60 million dollars per year on their contracts within the next couple of years.

Our analysis provides little evidence that Jared Goff is a significantly better performer than the other top-paid quarterbacks in the NFL. The permutation test did not find Goff's higher mean composite score, relative to the average top-paid quarterback, to be statistically significant, although the observed difference was positive. Figure 7 does indicate that Goff is particularly efficient compared with other top quarterbacks, which might be a reason to pay him more even though he is in the middle of the pack in both mean and median yards per game. Notably, Jalen Hurts is earning a top contract even though his metrics are markedly worse than Goff's in the data we analyzed.

If we take into account that Jared Goff has led the Detroit Lions to two playoff wins in his three seasons there, he may mean more to the Lions organization than Jalen Hurts means to the Eagles. Considering the roughly 30 million dollar salary cap increase this off-season, Goff's agent would not be unreasonable to ask for 55 million a year. That would tie Joe Burrow as the highest-paid quarterback today, but it may end up looking like a bargain four years from now, since it is less than 22% of the new salary cap (55 / 255.4 is about 21.5%). However, the Lions may have a great opportunity to sign Goff at or near 50 million a year, which would place him among the highest-paid quarterbacks, where he currently aligns performance-wise. In conclusion, I would suggest that Jared Goff may end up signing for about 53 million per year on his next contract.