What The Best Job In The World? Clipboard Holders.

Money for Nothing and Yardsticks for Free

Whats the best job in the world? Everyone has a dream job, but maybe there’s some middle ground and agreement for some professions. Is it those who travel the world as TV hosts - like Samantha Brown on the travel channel? Perhaps, its someone like Alex Trebek or Pat Sajak who get to host games for a living.

There’s no right or wrong answer in this icebreaker questions, but I think I’m close when I guess that backup QBs in the NFL are definitely up there. Surely they make some money holding a clipboard. I wondered, “Do they earn more per yard than the greats like Tom Brady and Drew Brees?”

Here I’ll be leveraging multiple R libraries, but the most important one here is the nflverse library. You can read up on all of the data available in the nflverse here.

Let’s pull the career earnings from overthecap.com for all players for the last few decades. By leveraging the read_html and html_table functions from rvest (like harvesting data), a library that allows us to scrape web pages. Let’s take a quick look at the data structure and the top 10 rows.

Over The Cap - Salary Data

library(tidyverse)
library(nflverse)
library(rvest)
library(janitor)
library(gt)
library(gtExtras)
library(vroom)

# Pull earning from over the cap using rvest
over_the_cap <- read_html("https://overthecap.com/career-earnings")

otc_tables <- over_the_cap %>%
  html_table(fill = TRUE)

nfl_career_earnings_otc <- otc_tables[[1]]

nfl_career_earnings_otc %>% 
  select(1:5) %>% 
  slice_head(n = 10) %>% 
  gt()
Player Pos. Years Active Career Earnings Cap-Inflated Earnings
Aaron Rodgers QB 2005‑ $343,531,094 $517,479,266
Matt Stafford QB 2009‑ $328,000,000 $481,260,983
Tom Brady QB 2000‑2022 $317,619,794 $568,977,577
Matt Ryan QB 2008‑2022 $306,205,882 $475,356,305
Drew Brees QB 2001‑2020 $273,933,000 $487,133,971
Ben Roethlisberger QB 2004‑2021 $266,724,382 $449,660,328
Russell Wilson QB 2012‑ $266,340,123 $350,049,301
Peyton Manning QB 1998‑2015 $247,714,000 $611,505,673
Philip Rivers QB 2004‑2020 $242,150,000 $430,861,370
Eli Manning QB 2004‑2019 $232,490,000 $427,087,671

Leveraging the nflverse

Now that we have our salary data in order, let’s look at what the nflverse has to offer for player data. The load_player_stats function provides over 50 columns of player stats by season. Looks like the player stats goes back to 1999. Lucky me, that’s when I really started to pay attention to the league as a kid.

nflreadr::load_player_stats(seasons=T) %>% 
  glimpse()
Rows: 129,682
Columns: 53
$ player_id                   <chr> "00-0000003", "00-0000003", "00-0000003", …
$ player_name                 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ player_display_name         <chr> "Abdul-Karim al-Jabbar", "Abdul-Karim al-J…
$ position                    <chr> "RB", "RB", "RB", "RB", "RB", "RB", "RB", …
$ position_group              <chr> "RB", "RB", "RB", "RB", "RB", "RB", "RB", …
$ headshot_url                <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ recent_team                 <chr> "MIA", "MIA", "MIA", "CLE", "CLE", "CLE", …
$ season                      <int> 1999, 1999, 1999, 1999, 1999, 1999, 1999, …
$ week                        <int> 1, 2, 4, 7, 8, 9, 10, 11, 12, 13, 14, 15, …
$ season_type                 <chr> "REG", "REG", "REG", "REG", "REG", "REG", …
$ completions                 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ attempts                    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ passing_yards               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ passing_tds                 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ interceptions               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ sacks                       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ sack_yards                  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ sack_fumbles                <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ sack_fumbles_lost           <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ passing_air_yards           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ passing_yards_after_catch   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ passing_first_downs         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ passing_epa                 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ passing_2pt_conversions     <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ pacr                        <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ dakota                      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ carries                     <int> 16, 9, 3, 6, 13, 9, 17, 7, 15, 10, 5, 13, …
$ rushing_yards               <dbl> 60, 33, 2, 27, 39, 23, 54, 11, 35, 29, 10,…
$ rushing_tds                 <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ rushing_fumbles             <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ rushing_fumbles_lost        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ rushing_first_downs         <dbl> 4, 1, 0, 0, 2, 1, 4, 0, 1, 1, 0, 2, 4, 0, …
$ rushing_epa                 <dbl> 6.24877114, -1.43495017, -1.53995173, 0.21…
$ rushing_2pt_conversions     <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ receptions                  <int> 1, 3, 0, 2, 0, 1, 1, 2, 1, 1, 2, 1, 2, 0, …
$ targets                     <int> 1, 4, 1, 2, 0, 2, 1, 3, 1, 1, 2, 1, 3, 0, …
$ receiving_yards             <dbl> 7, 18, 0, 8, 0, 2, 7, 2, 1, 21, 7, 4, 7, 0…
$ receiving_tds               <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, …
$ receiving_fumbles           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ receiving_fumbles_lost      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ receiving_air_yards         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ receiving_yards_after_catch <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ receiving_first_downs       <dbl> 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, …
$ receiving_epa               <dbl> 0.29237815, 0.37700888, -0.69957773, -0.22…
$ receiving_2pt_conversions   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ racr                        <dbl> 0, 0, 0, 0, NA, 0, 0, 0, 0, 0, 0, 0, 0, NA…
$ target_share                <dbl> 0.05263158, 0.11764706, 0.02380952, 0.0500…
$ air_yards_share             <dbl> 0, 0, 0, 0, NA, 0, 0, 0, 0, 0, 0, 0, 0, NA…
$ wopr                        <dbl> 0, 0, 0, 0, NA, 0, 0, 0, 0, 0, 0, 0, 0, NA…
$ special_teams_tds           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ fantasy_points              <dbl> 12.70, 5.10, 0.20, 3.50, 3.90, 2.50, 6.10,…
$ fantasy_points_ppr          <dbl> 13.70, 8.10, 0.20, 5.50, 3.90, 3.50, 7.10,…
$ opponent_team               <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
career_yards <- nflreadr::load_player_stats(seasons=T) %>% 
  filter(position == "QB", !is.na(player_display_name )) %>%
  group_by(player_display_name ) %>%
  summarize(
    passing_yards = sum(passing_yards, na.rm = TRUE),
    total_cmp = sum(completions, na.rm = TRUE)
    ) %>% 
  ungroup() %>% 
  filter(total_cmp >= 100) %>% 
  arrange(desc(passing_yards))

career_yards %>% 
  slice_head(n=10) %>% 
  gt()
player_display_name passing_yards total_cmp
Tom Brady 102618 8954
Drew Brees 85730 7624
Peyton Manning 75427 6439
Ben Roethlisberger 70092 5941
Philip Rivers 66399 5515
Matt Ryan 65464 5788
Aaron Rodgers 64948 5502
Eli Manning 59841 5137
Matthew Stafford 58510 5031
Joe Flacco 47466 4291

Trick Plays

We have a bit of an issue between our salary data and player data. The former data goes back further than 1999 which is where the player data begins. For example, Dan Marino and Troy Aikman, NFL Hall of Fame members, have tens of thousands of yards accumulated in their career, but it won’t show up since their careers ended around the time this data begins.

So I’m creating a filter that removes QBs who have only one season in the data. Also, I’d like to assign one primary team to the QBs who played on multiple teams. I’m choosing which ever team they spent the most time with.

# need to filter out marino and aikman, but want to avoid manual removal
QB_Filter <- nflreadr::load_player_stats(seasons=T) %>% 
  filter(position == "QB", !is.na(player_display_name )) %>% 
  group_by(player_display_name,recent_team) %>% 
    summarise(n_seasons = n_distinct(season)) %>% 
  ungroup() %>% 
  filter(n_seasons > 2) %>% 
  distinct(player_display_name, .keep_all = T)

# add primary team 
primary_team_add <- nflreadr::load_player_stats(seasons=T) %>% 
  filter(position == "QB", !is.na(player_display_name )) %>% 
  group_by(player_display_name,recent_team) %>% 
  summarise(n_seasons = n_distinct(season)) %>% 
  filter(n_seasons == max(n_seasons)) %>% 
  ungroup() %>% 
  distinct(player_display_name, .keep_all = TRUE) %>%  #break ties for players who had similar amount of years %>% 
  rename(primary_team = recent_team)

# combine all the data sources
which_QB_has_it_easiest <- left_join(career_yards, nfl_career_earnings_otc, by = c("player_display_name"="Player")) %>% 
  janitor::clean_names() %>% 
  mutate(career_earnings_numeric = as.numeric(gsub("[$,]","", career_earnings )),
         earnings_per_yard = round(career_earnings_numeric/passing_yards,0) 
         ) %>% 
  select(player_display_name,career_earnings_numeric,passing_yards,  earnings_per_yard) %>% 
  arrange(desc(earnings_per_yard)) %>%
  inner_join(QB_Filter, by = 'player_display_name') %>% 
  inner_join(primary_team_add, by = 'player_display_name')

Creating the final table

Awesome! We’re really close to a final product, however, I’d like to make this table as professional looking and easy to digest as possible. Let’s leverage the functionality of the library *gt* (great tables) to add some flair.

Luckily, the nflverse provides the team logo’s URL in an R Data Frame for us to utilize. Even better, the load_rosters function for us to pull headshot photos with a URL.

It looks like there’s multiple headshots per player, so we can filter the data down to one headshot per player.

teams <- "https://github.com/nflverse/nflfastR-data/raw/master/teams_colors_logos.rds"
team_df <- readRDS(url(teams)) %>% 
  select(team_abbr, team_logo_espn)

# grab one headshot per player
headshot_list <- nflreadr::load_rosters(1999:2022) %>%
  group_by(full_name, gsis_id) %>% 
  mutate(index = row_number()) %>%
  ungroup() %>% 
  group_by(full_name) %>% 
  mutate(index_flag = if_else(max(index) == index, 1,0)) %>% 
  ungroup() %>% 
  filter(index_flag == 1) %>% 
  select(full_name, headshot_url)

load_rosters(1999:2022) %>% 
  group_by(full_name) %>% 
  summarise(headshot_count = n_distinct(headshot_url),
            team_count = n_distinct(team)) %>% 
  arrange((desc(headshot_count))) %>% 
  slice_head(n=10) %>% 
  gt()
full_name headshot_count team_count
Chris Jones 11 9
Brandon Williams 8 14
Mike Williams 8 9
Josh Harris 7 4
Josh Johnson 7 13
Aaron Brewer 6 3
Andre Smith 6 9
Anthony Brown 6 3
Brian Allen 6 7
Jonah Williams 6 2

Green Flag instead of Yellow Flag: Clipboard Holding

I’ve been using many different libraries for tables in the past, but great tables has really impressed me with the syntax. I was able incorporate the NFL logo and the player’s headshot in this table which was exciting to build.

# create the gt table 
custom_format <- function(x) {
  paste0("$", format(x / 1000, nsmall = 1), "k")
}


which_QB_has_it_easiest_gt <- which_QB_has_it_easiest %>% 
  left_join(team_df, by = c("primary_team" = "team_abbr" )) %>% 
  left_join(headshot_list, by = c("player_display_name" = "full_name" )) %>% 
  distinct(player_display_name, .keep_all = T) %>% 
  select(player_display_name,headshot_url, career_earnings_numeric,passing_yards,
         earnings_per_yard, primary_team,team_logo_espn) %>% 
  arrange(desc(earnings_per_yard)) %>% 
  slice_head(n = 10) %>% # pipe to the great tables function  
  gt::gt() %>% 
  gt::tab_header(
    title = md("**Clipboard Holders: Which QB Earned Easy Money?**"),
    subtitle = md("*1999-2023  |  Earnings per Yard Passed*")) %>% 
  tab_stubhead(label = "Quarterback") %>%
  gt_img_rows(team_logo_espn , height = 40) %>% 
  gt_img_rows(headshot_url, height = 25)  %>% 
  cols_label(
    player_display_name = "Quarterback",
    headshot_url = "",
    career_earnings_numeric = "Career Earnings",
    passing_yards = "Career Passing Yards",
    earnings_per_yard = "Earnings Per Yard",
    primary_team = "Primary Team",
    team_logo_espn = "Team Logo") %>%
  cols_move_to_start(
    columns = headshot_url) %>%
  cols_align(align = "center", columns = c("player_display_name")) %>% 
  gt_color_box(earnings_per_yard, domain = 5000:30000,
               palette = "ggsci::blue_material",
               format = custom_format) %>% 
  fmt_currency(
    columns = c(career_earnings_numeric),   # Specify the column(s) to format
    currency = "USD",         # Specify the currency symbol
    decimals = 0                # Specify the number of decimal places
  ) %>%
  fmt_number(
    columns = c(passing_yards),  # Specify the column(s) to format
    use_seps = TRUE ,
    decimals = 0 # Enable thousands separators
  ) %>% 
  gtExtras::gt_theme_538() %>% 
  tab_source_note(
    source_note = md("*Special thanks to friends and family for the idea and Brad J. Congelio for his textbook on NFL analytics*"))  %>% 
  tab_footnote(
    footnote  = "Table created by: Tyler Otto"
  )

which_QB_has_it_easiest_gt
Clipboard Holders: Which QB Earned Easy Money?
1999-2023 | Earnings per Yard Passed
Quarterback Career Earnings Career Passing Yards Earnings Per Yard Primary Team
Chase Daniel $43,102,217 1,746
24 686
NO
Taysom Hill $42,670,000 2,158
19 773
NO
JaMarcus Russell $39,365,000 4,083
9 641
LV
Jimmy Garoppolo $148,806,839 16,456
9 043
SF
Deshaun Watson $142,761,982 17,626
8 100
HOU
Luke McCown $17,310,000 2,370
7 304
JAX
Kevin Kolb $37,680,375 5,206
7 238
PHI
Matt Flynn $17,758,587 2,541
6 989
GB
Sam Bradford $129,982,500 19,449
6 683
LA
Lamar Jackson $112,487,652 16,939
6 641
BAL
Special thanks to friends and family for the idea and Brad J. Congelio for his textbook on NFL analytics
Table created by: Tyler Otto