This project should help to answer the question: How much should Jared Goff be paid based on his statistics relative to other quarterbacks?
Jared Goff is in the last year of his contract with the Detroit Lions. He has led the Lions to two straight winning seasons and most recently to a runner-up finish in the NFC, a few plays away from a Super Bowl appearance.
This exploratory data analysis evaluates the trends and nature of the current quarterback contract market in the NFL and attempts to measure how Jared Goff’s statistical performance stacks up against that of other NFL quarterbacks.
Then statistical inference tools are used to test whether Jared Goff performs better than, or about the same as, the other top-paid NFL quarterbacks, using a composite score I created.
Lastly, I present an example of how bootstrapping can be used to estimate the median from a sample of quarterbacks’ 2023 passing-yard totals.
# Project Specific Libraries
library(nflreadr) # includes many NFL datasets
library(nflplotR) # includes geoms for NFL plots! https://nflplotr.nflverse.com/articles/nflplotR.html#lets-play-with-wordmarks-and-other-imagesj
# Dependencies
library(dplyr)
library(stringr)
library(ggplot2)
library(skimr)
library(flextable)
library(gt)
library(naniar)
library(tidyverse)
# Loading the data from the nflreadr and nflplotR packages
years <- c(2008:2023) # Seasons 2008 through 2023 (16 seasons)
player_stats <- nflreadr::load_player_stats(years)
players <- nflreadr::load_players()
contracts <- load_contracts()
teams <- load_teams()
While we create many data transformations for our different plots and tables, we largely derive them from the contracts and player_stats data sets.
contracts <- contracts |>
select(player, year_signed, position, team, apy, years, value, guaranteed, apy_cap_pct)
dataDictionaryContracts <- tibble(Variable = colnames(contracts),
Description = c( "Name of the player",
"Year the contract was signed",
"Player's position on the team",
"Team the player signed with",
"Average annual salary in the contract (in millions of dollars)",
"Total number of years in the contract",
"Total value of the contract (in millions of dollars)",
"Amount of guaranteed money in the contract (in millions of dollars)",
"Average annual salary as a percentage of the salary cap"),
Type = map_chr(contracts, .f = function(x){typeof(x)[1]}),
Class = map_chr(contracts, .f = function(x){class(x)[1]}))
# Printing nicely in R Markdown
flextable::flextable(dataDictionaryContracts, cwidth = 2) |> theme_apa()
Variable | Description | Type | Class |
|---|---|---|---|
player | Name of the player | character | character |
year_signed | Year the contract was signed | integer | integer |
position | Player's position on the team | character | character |
team | Team the player signed with | character | character |
apy | Average annual salary in the contract (in millions of dollars) | double | numeric |
years | Total number of years in the contract | integer | integer |
value | Total value of the contract (in millions of dollars) | double | numeric |
guaranteed | Amount of guaranteed money in the contract (in millions of dollars) | double | numeric |
apy_cap_pct | Average annual salary as a percentage of the salary cap | double | numeric |
player_stats <- player_stats |>
select(player_id, position, player_display_name, season, season_type, week, completions, attempts, passing_yards, passing_tds, passing_air_yards, passing_yards_after_catch, passing_first_downs, passing_epa, rushing_yards, rushing_tds, interceptions, sack_fumbles_lost)
dataDictionaryPlayerStats <- tibble(Variable = colnames(player_stats),
Description = c( "Player unique identifier (used to join other datasets provided by nflreadr)",
"Position of a player on the field",
"Full name of the player",
"NFL season (2023 means the games played in 2023-2024 season)",
"Denotes if the game was in the playoffs or the regular season",
"Week of the season",
"Number of completed passes by the player",
"Number of attempted passes by the player",
"Distance in yards gained by the player by passing",
"Number of touchdown passes thrown by the player",
"Distance in yards that a quarterback gained by only throwing the ball",
"Distance in yards gained by the receiver after the catch credited to the passing player",
"Number of first downs gained through passing plays",
"Total Expected Points Added on the player's pass attempts and sacks that week",
"Distance in rushing yards gained by the player",
"Number of rushing touchdowns scored by the player",
"Number of interceptions thrown by the player",
"Number of fumbles lost by the player when being sacked"),
Type = map_chr(player_stats, .f = function(x){typeof(x)[1]}),
Class = map_chr(player_stats, .f = function(x){class(x)[1]}))
# Printing nicely in R Markdown
flextable::flextable(dataDictionaryPlayerStats, cwidth = 2) |> theme_apa()
Variable | Description | Type | Class |
|---|---|---|---|
player_id | Player unique identifier (used to join other datasets provided by nflreadr) | character | character |
position | Position of a player on the field | character | character |
player_display_name | Full name of the player | character | character |
season | NFL season (2023 means the games played in 2023-2024 season) | integer | integer |
season_type | Denotes if the game was in the playoffs or the regular season | character | character |
week | Week of the season | integer | integer |
completions | Number of completed passes by the player | integer | integer |
attempts | Number of attempted passes by the player | integer | integer |
passing_yards | Distance in yards gained by the player by passing | double | numeric |
passing_tds | Number of touchdown passes thrown by the player | integer | integer |
passing_air_yards | Distance in yards that a quarterback gained by only throwing the ball | double | numeric |
passing_yards_after_catch | Distance in yards gained by the receiver after the catch credited to the passing player | double | numeric |
passing_first_downs | Number of first downs gained through passing plays | double | numeric |
passing_epa | Total Expected Points Added on the player's pass attempts and sacks that week | double | numeric |
rushing_yards | Distance in rushing yards gained by the player | double | numeric |
rushing_tds | Number of rushing touchdowns scored by the player | integer | integer |
interceptions | Number of interceptions thrown by the player | double | numeric |
sack_fumbles_lost | Number of fumbles lost by the player when being sacked | integer | integer |
contracts |>
dplyr::select(player:apy_cap_pct) |> #all variables not id
gg_miss_fct(fct = position)
The only variable we have to be wary of is years, which is missing for several positions. Luckily it is not missing for quarterbacks, which are the focus of this analysis.
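The missingness plot gives a visual check; the same per-position share of missing values can also be tabulated numerically. The following is a minimal base R sketch on a hypothetical toy table (`toy_contracts` and its values are made up for illustration and are not drawn from the nflreadr data).

```r
# Hypothetical toy data standing in for the contracts table (values are made up)
toy_contracts <- data.frame(
  position = c("QB", "QB", "K", "K", "WR", "WR"),
  apy      = c(40, 5.5, 1.2, NA, 11, 9),
  years    = c(4L, 2L, NA, NA, 3L, NA)
)

# Columns to check: everything except the grouping variable
value_cols <- setdiff(names(toy_contracts), "position")

# Share of missing values per variable within each position
# (rows = positions, columns = variables)
miss_by_position <- sapply(toy_contracts[value_cols], function(col) {
  tapply(is.na(col), toy_contracts$position, mean)
})

print(round(miss_by_position, 2))
```

A share of 0 in the QB row for a variable means that variable is complete for quarterbacks in the toy table, which is the property the analysis depends on.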
player_stats |>
dplyr::select(position:sack_fumbles_lost) |> #all variables not id
gg_miss_fct(fct = position)
Again, we don’t have any missing values at the QB position, and the missing passing EPA for other positions is fine because I will not be using it in this analysis.
Building a base Quarterback table
qb_player_stats <- player_stats |>
filter(season > 2016, season_type == "REG") |> # filter for only regular season games after 2016
filter(position == "QB") # filter for only quarterbacks
Building Contracts Tables
qb_contracts <- contracts |>
filter(year_signed >= 2009) |> # Last 15 years
filter(position == "QB")
big_qb_contracts <- qb_contracts |> filter(apy > 5) # "big" means making more than 5 million a year
big_contracts <- contracts |> filter(apy > 5) |> filter(year_signed >= 2009)
# Let's add a column that multiplies the apy_cap_pct by the current cap and call it apy_cap_adj_2024
total_cap <- 255.4
top_qb_pay <- big_contracts |>
filter(position == 'QB') |>
filter(apy_cap_pct > 0.175) |> # Filter for yearly apy that is greater than 17.5% of the cap
mutate(apy_cap_adj_2024 = total_cap * apy_cap_pct) |> # create new variable for what contract apy would be if signed with the same percent with the latest salary cap total
select(player, team, year_signed, years, value, apy_cap_pct, apy, apy_cap_adj_2024) |>
dplyr::arrange(desc(apy_cap_adj_2024))
Merging more information to our QB datasets to help with the visualizations
qbs <- players |> filter(position == "QB")
qb_full_info <- dplyr::left_join(qb_player_stats, qbs, join_by(player_id == gsis_id)) |>
mutate(season = season.x, position = position.x) |>
select(-c(season.x, season.y, position.x, position.y, jersey_number, draft_round, uniform_number, draft_number, smart_id, years_of_experience, team_seq, position_group, esb_id, status, entry_year, draft_club, status_description_abbr, college_conference, gsis_it_id, player_display_name)) # add additional information to the quarterbacks
qb_full_info <- left_join(qb_full_info, teams, join_by(current_team_id == team_id)) # add in team information for the players current team (April 2024)
One data set lists Matthew Stafford as Matt Stafford, so the code below first normalizes that name.
I then build a composite score for each QB’s weekly performance based on my domain knowledge.
top_qb_pay$player <- str_replace(top_qb_pay$player, pattern = "^Matt Stafford$", replacement = "Matthew Stafford") # my only stringr function; the pattern is anchored so names already spelled "Matthew" and other players named Matt are left unchanged
top_qbs_data <- qb_full_info[qb_full_info$display_name %in% top_qb_pay$player, ] # take only the top paid players
top_qb_weekly_score <- top_qbs_data |>
mutate(composite_score = (passing_yards + rushing_yards) / 100.0 + completions + passing_tds + rushing_tds
+ (passing_air_yards + passing_yards_after_catch + rushing_yards) / 100.0 - interceptions - sack_fumbles_lost) |> # create composite score column (my idea and formula)
select(display_name, season, week, composite_score)
# create some filtered/grouped tables
average_top_qb_weekly_score <- top_qb_weekly_score |>
filter(display_name != "Jared Goff") |>
group_by(week) |>
summarize(top_qb_average_score = mean(composite_score))
jared_goff_weekly_score <- top_qb_weekly_score |>
filter(display_name == "Jared Goff")
jared_vs_qb_weekly_scores = merge(average_top_qb_weekly_score, jared_goff_weekly_score) |>
mutate(goff_score = composite_score) |>
select(season, week, top_qb_average_score, goff_score)
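To make the composite score concrete, here is a minimal sketch that wraps the same formula in a standalone function and applies it to one stat line. The function name `composite_score` and the example numbers are hypothetical and are not taken from the data.

```r
# Hypothetical helper reproducing the composite score formula used above
composite_score <- function(passing_yards, rushing_yards, completions,
                            passing_tds, rushing_tds, passing_air_yards,
                            passing_yards_after_catch, interceptions,
                            sack_fumbles_lost) {
  (passing_yards + rushing_yards) / 100 +
    completions + passing_tds + rushing_tds +
    (passing_air_yards + passing_yards_after_catch + rushing_yards) / 100 -
    interceptions - sack_fumbles_lost
}

# Made-up single-game stat line, for illustration only
score <- composite_score(
  passing_yards = 300, rushing_yards = 20, completions = 25,
  passing_tds = 2, rushing_tds = 0, passing_air_yards = 180,
  passing_yards_after_catch = 120, interceptions = 1,
  sack_fumbles_lost = 0
)

# 3.2 (total yards) + 25 + 2 + 0 + 3.2 (air + YAC + rushing) - 1 - 0
print(score)
```

Written this way, it is easy to see that completions dominate the score and that rushing yards enter both yardage terms.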
This part will examine the current QB Market and how it is trending in recent years.
The contract data was filtered above to quarterback contracts signed in 2009 or later, keeping only the columns we need.
Let’s create a couple of visualizations to understand quarterback salaries.
qb_contracts |>
ggplot(aes(x = year_signed, y = apy)) +
geom_point() +
scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
labs(title = "Figure 1: Yearly NFL Quarterback Contract Values",
x = "Year Signed",
y = "Contract Amount per year in Millions of Dollars",
caption = "Since 2009, Data Source: overthecap.com") +
theme_minimal()
We observe a positive linear trend here for quarterback pay, indicating that quarterbacks are signing for more every year. (2024 signings are not complete yet)
big_qb_contracts |>
ggplot(aes(x = apy)) +
geom_histogram(color = "white") +
scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
labs(title = "Figure 2: Yearly Contract Distribution of NFL Quarterbacks",
x = "Contract Amount per year in Millions of Dollars",
y = "Count",
caption = "Since 2009, Data Source: overthecap.com") +
theme_minimal()
We see a right-skewed distribution here: most contracts sit at the lower end, while the highest paid quarterbacks stretch the tail to over 50 million per year.
# How much do the best at valuable positions get paid?
# The filter for this summary table keeps any position with at least 100 contracts worth more than 5 million per year.
big_contracts |>
group_by(position) |>
summarize(count = n(), mean = mean(apy, na.rm = TRUE), sd = sd(apy), min = min(apy), max = max(apy)) |>
filter(count >= 100) |>
arrange(desc(mean)) |>
flextable() |>
add_header_lines(top = TRUE, value = "Table 1: NFL Contract Yearly Value Statistics by Position Since 2009 (In Millions USD)") |>
theme_apa()
Table 1: NFL Contract Yearly Value Statistics by Position Since 2009 (In Millions USD)
position | count | mean | sd | min | max |
|---|---|---|---|---|---|
QB | 190 | 17.16 | 12.01 | 5.05 | 55.00 |
WR | 231 | 11.42 | 5.68 | 5.04 | 32.00 |
ED | 254 | 11.31 | 5.35 | 5.14 | 34.00 |
LT | 117 | 10.91 | 4.73 | 5.12 | 25.00 |
IDL | 203 | 10.83 | 5.37 | 5.12 | 31.75 |
CB | 189 | 9.77 | 3.97 | 5.05 | 21.00 |
S | 133 | 8.83 | 3.37 | 5.03 | 19.00 |
LB | 153 | 8.63 | 3.14 | 5.01 | 20.00 |
TE | 111 | 8.29 | 2.78 | 5.10 | 17.12 |
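Among contracts worth more than 5 million per year signed since 2009, quarterbacks average about 5.7 million more per year than the next highest paid position, wide receiver (17.16 vs. 11.42 million).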
big_contracts |>
ggplot(aes(x = position, y = apy)) +
geom_boxplot(aes(fill = position)) +
scale_fill_manual(values = teams$team_color) +
scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
coord_flip() +
labs(y = "Yearly Contract Amount In Millions",
x = "Position",
title = "Figure 3: NFL Distribution of Contract Amount by Position",
caption = "Data Source: Overthecap.com Since 2009") +
theme_minimal() +
theme(legend.position = "none")
The QB market has the highest average yearly contract value of any position.
big_contracts |>
ggplot(aes(x = position, y = apy_cap_pct)) +
geom_boxplot(aes(fill = position)) +
scale_fill_manual(values = teams$team_color) +
scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
coord_flip() +
labs(x = "Position",
y = "Percentage of Salary Cap",
title = "Figure 4: NFL Distribution of Contract Percent of Yearly Salary Cap by Position",
caption = "Data Source: Overthecap.com Since 2009") +
theme_minimal() +
theme(legend.position = "none")
From this graph we can ascertain that 50% of quarterbacks are making over 10% of their team’s salary cap, and that most of the QBs above the third quartile are taking up more cap space than the maximum at any other position.
From external resources we know that the salary cap was raised by roughly 30 million dollars at the start of the 2024 off-season, to 255.4 million dollars. Source
For this experiment let’s assume that this increase will apply proportionally to the current quarterback market as contracts expire over the next several years, but keep in mind that this might not be the case and other positions like Edge, Wide Receiver, and Defensive Tackle may increase their market share of the salary cap.
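The proportional assumption boils down to multiplying each contract's cap share by the new cap. A minimal sketch of that arithmetic follows; the helper name `cap_adjusted_apy` and the example cap shares are hypothetical, and only the 255.4 million cap figure comes from the text.

```r
# 2024 salary cap in millions of dollars (figure quoted in the text)
cap_2024 <- 255.4

# Hypothetical helper: what a contract's yearly value would be if it took
# the same share of the cap under the 2024 cap
cap_adjusted_apy <- function(apy_cap_pct, cap = cap_2024) {
  apy_cap_pct * cap
}

# Illustrative cap shares (not values read from the contracts data)
shares <- c(0.18, 0.22, 0.245)
adjusted <- cap_adjusted_apy(shares)

print(data.frame(apy_cap_pct = shares, apy_cap_adj_2024 = adjusted))
```

Under this assumption, a quarterback holding an 18% share maps to roughly 46 million per year, and a share near 24.5% maps to roughly 62.6 million per year.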
top_qb_pay |> flextable() |> theme_apa() |> colformat_int(j = "year_signed", big.mark = "") # print year_signed without a thousands separator
player | team | year_signed | years | value | apy_cap_pct | apy | apy_cap_adj_2024 |
|---|---|---|---|---|---|---|---|
Joe Burrow | Bengals | 2023 | 5 | 275.00 | 0.24 | 55.00 | 62.57 |
Aaron Rodgers | GB/NYJ | 2022 | 5 | 150.81 | 0.24 | 50.27 | 61.55 |
Josh Allen | Bills | 2021 | 6 | 258.00 | 0.24 | 43.00 | 60.27 |
Russell Wilson | Broncos | 2022 | 5 | 245.00 | 0.23 | 49.00 | 60.02 |
Justin Herbert | Chargers | 2023 | 5 | 262.50 | 0.23 | 52.50 | 59.76 |
Lamar Jackson | Ravens | 2023 | 5 | 260.00 | 0.23 | 52.00 | 59.00 |
Patrick Mahomes | Chiefs | 2020 | 10 | 450.00 | 0.23 | 45.00 | 57.98 |
Jalen Hurts | Eagles | 2023 | 5 | 255.00 | 0.23 | 51.00 | 57.98 |
Kyler Murray | Cardinals | 2022 | 5 | 230.50 | 0.22 | 46.10 | 56.44 |
Deshaun Watson | Browns | 2022 | 5 | 230.00 | 0.22 | 46.00 | 56.44 |
Dak Prescott | Cowboys | 2021 | 4 | 160.00 | 0.22 | 40.00 | 55.93 |
Deshaun Watson | Texans | 2020 | 4 | 156.00 | 0.20 | 39.00 | 50.31 |
Derek Carr | Raiders | 2022 | 3 | 121.42 | 0.19 | 40.47 | 49.55 |
Matthew Stafford | Rams | 2022 | 4 | 160.00 | 0.19 | 40.00 | 49.04 |
Aaron Rodgers | Packers | 2018 | 4 | 134.00 | 0.19 | 33.50 | 48.27 |
Russell Wilson | SEA/DEN | 2019 | 4 | 140.00 | 0.19 | 35.00 | 47.50 |
Ben Roethlisberger | Steelers | 2019 | 2 | 68.00 | 0.18 | 34.00 | 46.23 |
Aaron Rodgers | Packers | 2013 | 5 | 110.00 | 0.18 | 22.00 | 45.72 |
Jared Goff | LAR/DET | 2019 | 4 | 134.00 | 0.18 | 33.50 | 45.46 |
Daniel Jones | Giants | 2023 | 4 | 160.00 | 0.18 | 40.00 | 45.46 |
Kirk Cousins | Falcons | 2024 | 4 | 180.00 | 0.18 | 45.00 | 44.95 |
Some conclusions to draw from this table:
- No team in this table has committed more than about 24% of its cap space to a quarterback
- The very best quarterbacks are making 22-24% of their team’s overall cap space
- With the recent salary cap increase, if quarterbacks maintain their current cap share, new contracts for top quarterbacks could be 55-63 million dollars per year
*Note: this table is included purely for the purpose of this analysis and thus is not titled, as it is not a requirement of this project.
# Perform transformations for this plot
top_qbs_summary <- top_qbs_data |>
group_by(display_name) |>
mutate(yards_per_game = passing_yards + rushing_yards) |>
summarize(mean = mean(yards_per_game, na.rm = TRUE), median = median(yards_per_game), standard_deviation = sd(yards_per_game), min = min(yards_per_game), max = max(yards_per_game))
top_qbs_summary |>
arrange(desc(mean)) |>
flextable() |>
add_header_lines(top = TRUE, value = "Table 2: Weekly Yards Gained By Top Paid NFL Quarterbacks Since 2017") |>
theme_apa()
Table 2: Weekly Yards Gained By Top Paid NFL Quarterbacks Since 2017
display_name | mean | median | standard_deviation | min | max |
|---|---|---|---|---|---|
Patrick Mahomes | 316.25 | 306.50 | 76.58 | 78.00 | 509.00 |
Justin Herbert | 292.48 | 299.00 | 68.85 | 96.00 | 472.00 |
Deshaun Watson | 284.09 | 292.00 | 88.34 | 5.00 | 473.00 |
Joe Burrow | 282.46 | 279.00 | 92.61 | 81.00 | 536.00 |
Josh Allen | 279.62 | 281.50 | 88.06 | 5.00 | 466.00 |
Dak Prescott | 279.53 | 271.00 | 86.72 | 130.00 | 514.00 |
Kyler Murray | 278.38 | 275.00 | 79.86 | 12.00 | 444.00 |
Ben Roethlisberger | 272.53 | 266.50 | 77.14 | 75.00 | 511.00 |
Jared Goff | 271.08 | 263.50 | 76.19 | 78.00 | 517.00 |
Kirk Cousins | 270.35 | 268.50 | 76.70 | 97.00 | 460.00 |
Matthew Stafford | 270.19 | 272.00 | 68.02 | 17.00 | 434.00 |
Aaron Rodgers | 260.11 | 253.00 | 87.19 | 0.00 | 474.00 |
Russell Wilson | 259.98 | 254.00 | 71.80 | 122.00 | 482.00 |
Derek Carr | 254.38 | 252.50 | 75.92 | 53.00 | 441.00 |
Lamar Jackson | 245.87 | 263.50 | 104.87 | 0.00 | 504.00 |
Daniel Jones | 240.43 | 244.00 | 81.60 | 22.00 | 429.00 |
Jalen Hurts | 233.89 | 243.00 | 120.16 | -1.00 | 434.00 |
From this summary table we can ascertain that Jared Goff has fewer yards per game on average than Kyler Murray and Dak Prescott, but more than Aaron Rodgers, Lamar Jackson, and Russell Wilson.
# Perform transformations for this plot
top_qbs_summary2 <- top_qbs_data |>
group_by(player_id, team_color) |>
mutate(yards_per_game = passing_yards + rushing_yards) |>
summarize(mean = mean(yards_per_game, na.rm = TRUE), median = median(yards_per_game), standard_deviation = sd(yards_per_game), min = min(yards_per_game), max = max(yards_per_game))
top_qbs_summary2 |>
ggplot(aes(x = fct_reorder(player_id, median, .desc = FALSE), y = median, fill = player_id)) +
geom_col(color = 'black', position = position_dodge(width = 0.2)) +
scale_fill_manual(values = top_qbs_summary2$team_color) +
labs(title = "Figure 5: Top QB Yards Gained Since 2017",
x = "Player",
y = "Median Yards Gained Per Game",
caption = "Data source: nflreadr package") +
scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
coord_flip() +
theme_minimal() +
theme(legend.position = "none",
axis.text.y = element_nfl_headshot(size=1)) # Change the y-axis labels to the player's headshot
From this plot we can ascertain that Jared Goff is tied with Lamar Jackson for 11th out of 17 in median yards gained per game among the top paid quarterbacks in the league.
# Perform transformations for this plot
qb_list <- c("Patrick Mahomes", "Jared Goff", "Josh Allen", "Joe Burrow", "Aaron Rodgers") #qb list to filter on for readability, handpicked with domain knowledge
top_qbs_summary3 <- top_qbs_data |>
group_by(display_name, season) |>
summarize(sum_epa = sum(passing_epa)) |>
filter(display_name %in% qb_list) # filtering logic
ggplot(top_qbs_summary3, aes(x = season, y = sum_epa, color = display_name)) +
geom_line(size = 2) +
labs(title = "Figure 6: Top QB Expected Points Added Per Season Since 2017",
x = "Year",
y = "Total Expected Points Added",
caption = "Data source: nflreadr package",
color = "Player") +
theme_minimal()
This plot tells us that while Jared has had some great seasons, he still struggles to compete with some of the other top QBs in the expected points added metric.
# Perform transformations for this plot
top_qbs_summary4 <- top_qbs_data |>
group_by(display_name, player_id) |>
summarize(avg_epa = mean(passing_epa), avg_air_yards = mean(passing_air_yards))
top_qbs_summary4 |>
ggplot(aes(x = avg_air_yards, y = avg_epa)) +
geom_point() +
nflplotR::geom_nfl_headshots(aes(player_gsis = player_id), width = 0.1, vjust = 0.5) + # This geom adds the player's headshot picture as the point
labs(title = "Figure 7: Quarterback Weekly Average Passing EPA by Average Air Yards",
x = "Average Air Yards Per Game",
y = "Average Expected Points Added Per Game",
caption = "Since 2017, Data Source: nflreadr package") +
theme_minimal()
Jared stands out here, posting the most expected points added per game while averaging fewer than 260 air yards per game. This indicates he’s a very efficient quarterback, while other quarterbacks need to pass for many more air yards to get similar expected points added.
I wouldn’t normally expect permutation tests to be used for this project and data set, but here we can check whether Jared Goff’s composite score being better than the average of the other top quarterbacks is statistically significant, or more likely due to random chance, by generating a null distribution.
Alternative Hypothesis: (HA)
Jared Goff has a mean composite score better than the other top paid quarterbacks in the league since 2017.
Null Hypothesis: (H0)
Jared Goff has a mean composite score that is not different from the other top paid quarterbacks in the league since 2017.
observed_statistic <- mean(jared_vs_qb_weekly_scores$goff_score - jared_vs_qb_weekly_scores$top_qb_average_score)
set.seed(1999)
n_perms <- 1000
permTs <- vector(length = n_perms)
# Calculating test statistic for each permutation
for(p in 1:n_perms) {
combined_scores <- c(jared_vs_qb_weekly_scores$goff_score, jared_vs_qb_weekly_scores$top_qb_average_score)
half <- length(jared_vs_qb_weekly_scores$goff_score)
permutation <- sample(combined_scores)
scores_A <- permutation[1:half]
scores_B <- permutation[(half+1):length(permutation)]
permTs[p] <- mean(scores_A - scores_B)
}
tibble(value = permTs) |>
ggplot(aes(x = value)) +
geom_histogram(color = "white") +
scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
labs(title = "Permutation simulated null distribution",
x = "Test Statistic",
y = "Frequency") +
geom_vline(xintercept = quantile(permTs, 0.95), color = "red", linetype = "dashed") + # adding the 95th percentile for statistical significance
geom_vline(xintercept = observed_statistic, color = "blue") + # adding in our observed test statistic
theme_minimal()
mean(permTs >= observed_statistic) # calculating p-value
## [1] 0.176
With a p-value of 0.176, we fail to reject the null hypothesis that Jared Goff’s mean composite score is no different from that of the other top paid quarterbacks in the league.
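The permutation procedure above can be wrapped in a small reusable function. The following is a minimal sketch, assuming two numeric vectors of scores; the function name `perm_test_mean_diff` and the simulated data are hypothetical stand-ins for the weekly composite scores. It uses the difference of group means as the test statistic, which for equal-length vectors equals the mean of the element-wise differences used above.

```r
# Hypothetical helper: one-sided permutation test for a difference in means
perm_test_mean_diff <- function(a, b, n_perms = 1000, seed = 1999) {
  set.seed(seed)
  observed <- mean(a) - mean(b)
  pooled <- c(a, b)
  idx_a <- seq_along(a)

  # Shuffle the pooled scores, split them into two groups of the original
  # sizes, and recompute the difference in means each time
  perm_stats <- replicate(n_perms, {
    shuffled <- sample(pooled)
    mean(shuffled[idx_a]) - mean(shuffled[-idx_a])
  })

  list(
    observed   = observed,
    p_value    = mean(perm_stats >= observed), # share of shuffles at least as extreme
    perm_stats = perm_stats
  )
}

# Simulated scores for illustration only (not real quarterback data)
set.seed(42)
group_a <- rnorm(40, mean = 31, sd = 6)
group_b <- rnorm(40, mean = 30, sd = 6)

result <- perm_test_mean_diff(group_a, group_b)
print(result$observed)
print(result$p_value)
```

Because the weekly scores in the analysis are paired by week, a paired variant that randomly flips the sign of each weekly difference would be another option; the pooled shuffle is kept here because it mirrors the code above.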
Again, bootstrapping isn’t the best technique for this project, as we do have the complete data set. However, we can do a bootstrapping exercise with a subset of the data and illustrate how useful the technique can be by comparing its results to the actual distribution.
Here is the distribution for quarterback passing yards in 2023:
qb_full_info_2023 <- qb_full_info |> filter(season == 2023)
passing_yards <- qb_full_info_2023 |>
group_by(display_name) |>
summarize(season_yards = sum(passing_yards))
passing_yards |>
ggplot(aes(x = season_yards)) +
geom_histogram(color = "white") +
scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
labs(title = "Actual QB Passing Yards Distribution",
x = "Passing Yards",
y = "Frequency") +
geom_vline(xintercept = quantile(passing_yards$season_yards, 0.50), color = "red") + # median
theme_minimal()
Now let’s take a random sample of 30 quarterbacks and calculate its median.
set.seed(2000)
random_indices <- sample(1:nrow(passing_yards), 30, replace = FALSE)
passing_yards_sample <- passing_yards[random_indices, ]
passing_yards_sample |>
ggplot(aes(x = season_yards)) +
geom_histogram(color = "white") +
scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
labs(title = "Sample QB Passing Yards Distribution",
x = "Passing Yards",
y = "Frequency") +
geom_vline(xintercept = quantile(passing_yards_sample$season_yards, 0.50), color = "red") + # median
theme_minimal()
quantile(passing_yards$season_yards, 0.50) # actual median
## 50%
## 909
quantile(passing_yards_sample$season_yards, 0.50) # sample median
## 50%
## 1167
We can see that the actual median is 909 passing yards for 2023, but our sample median is 1167. If we bootstrap this sampled data, we should be able to estimate a standard error and a confidence interval for the median that includes the actual median.
Let’s pretend it’s the early 1960s and we only have these 30 samples of passing-yard data because we were only able to monitor and record data for the quarterbacks that played within 5 hours of travel time. Let’s now use a bootstrapping technique to estimate the actual median with 95% confidence.
B <- 1000
set.seed(2000)
boot_medians <- vector(length = B)
for(b in 1:B){
boot_medians[b] <- median(slice_sample(passing_yards_sample, prop = 1, replace = TRUE)$season_yards) # using with replacement creates variability
}
tibble(value = boot_medians) |>
ggplot(aes(x = value)) +
geom_histogram(color = "white") +
scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
labs(title = "Bootstrapped Medians of NFL Quarterback Seasonal Passing Yard Totals",
x = "Observed Median",
y = "Frequency") +
geom_vline(xintercept = quantile(boot_medians, 0.025), color = "red") + # Confidence interval lower bound
geom_vline(xintercept = quantile(boot_medians, 0.975), color = "red") + # Confidence interval upper bound
theme_minimal()
sd(boot_medians) # standard error
## [1] 429.2928
quantile(boot_medians, probs = c(0.025, 0.975)) # Confidence Interval 95%
## 2.5% 97.5%
## 468 2028
Given our sample of 30 quarterbacks and using a nonparametric bootstrap (resampling with replacement), we are 95% confident that the median passing yards for quarterbacks in 2023 is between 468 and 2028 yards, with a bootstrap standard error of 429.29 yards.
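The bootstrap steps above generalize to any numeric sample. The following is a minimal base R sketch of a percentile bootstrap for the median; the function name `boot_median_ci` and the simulated yardage totals are hypothetical and only stand in for the sampled quarterback data.

```r
# Hypothetical helper: percentile bootstrap for the median of a numeric sample
boot_median_ci <- function(x, B = 1000, level = 0.95, seed = 2000) {
  set.seed(seed)

  # Resample the data with replacement B times and record each median
  boot_medians <- replicate(
    B,
    median(sample(x, size = length(x), replace = TRUE))
  )

  alpha <- 1 - level
  list(
    se = sd(boot_medians), # bootstrap standard error
    ci = quantile(boot_medians, probs = c(alpha / 2, 1 - alpha / 2)),
    boot_medians = boot_medians
  )
}

# Simulated right-skewed season totals for 30 quarterbacks (illustration only)
set.seed(1)
toy_yards <- round(rexp(30, rate = 1 / 1500))

res <- boot_median_ci(toy_yards)
print(res$se)
print(res$ci)
```

With a right-skewed sample like this one, the interval is typically wide and asymmetric around the sample median, which matches the wide interval seen in the analysis.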
The market for top quarterbacks in the NFL is trending toward an astronomical level. With the salary cap increasing substantially in the past year, top quarterbacks are likely to see contracts of around 60 million dollars per year within the next couple of years.
Our analysis provides little evidence that Jared Goff is a significantly better performer than the other top paid quarterbacks in the NFL. We observed a positive difference between Jared’s composite score and that of the average top quarterback, but our permutation test did not show that difference to be statistically significant. Figure 7 did indicate that Jared Goff is particularly efficient compared to other top quarterbacks, which might be a reason to pay him more even though he is middle of the pack in both mean and median yards per game. One observation is that Jalen Hurts is earning a top contract even though his metrics are markedly worse than Jared Goff’s in the data we analyzed.
If we take into account that Jared Goff has led the Detroit Lions to two playoff wins in his 3 years, he may mean more to the Lions organization than Jalen Hurts means to the Eagles. Considering the salary cap increase of 30 million dollars this off-season, it would not be unreasonable for Jared Goff’s agent to ask for a contract of 55 million a year. That would make him the highest paid quarterback now, but it may end up looking like a bargain 4 years from now, considering it’s less than 22% of the new salary cap. However, the Lions organization may have a great opportunity to sign Jared Goff at or near 50 million a year, which would place him among the highest paid quarterbacks and in line with where his performance currently ranks. In conclusion, I would suggest that Jared Goff may end up signing for about 53 million per year on his next contract.