This series of workshops is designed to be able to teach the basics of sports analytics to anyone, especially those who have never coded anything in their life. If that’s you. no worries. If you’re unable to make it feel free to message me on GroupMe or email me (max.whalen@uga.edu) as you troublshoot.
To start today, we are just going to look at an intro to R and RStudio, how to use packages for data analysis.
Any data we use can be found in this folder: https://drive.google.com/drive/folders/17x3TZ0Y2bsk3jwYpH1dQ9lbB3AdHpb18?usp=sharing
To start, we just set up our environment. A good first step may be to create a folder somewhere easily accessible. This will be your working directory. The following chunk of code will include…
Markdown set-up
Setting working directory
Updating packages that need updating
knitr::opts_chunk$set(echo = TRUE)
#knitr::opts_knit$set(root.dir = "C:/Users/maxwh/Downloads/SSOUGA")
#setwd("C:/Users/maxwh/Downloads/SSOUGA")
# TODO Run ~once a week to keep up with package dependencies
#update.packages()
Our environment is now pretty much set up. Changing the directory gave us a place where all the data we need to import manually (not from packages) will live and where all our exports will go.
The foundation of these workshops will be built on packages, specifically one called the “tidyverse”, a collection of packages that aim to make R code more efficient and readable, and thus easier to learn. We also will use data from NFLFastR/NFLReadR. These packages essentially perform functions under the hood to save us time and eliminate technical barriers to entry.
This is a one time situation, you only need to install packages, the first time you use them. After that update.packages() will keep them up to date.
library(nflfastR)
## Warning: package 'nflfastR' was built under R version 4.1.3
library(scales)
## Warning: package 'scales' was built under R version 4.1.3
library(nflreadr)
## Warning: package 'nflreadr' was built under R version 4.1.3
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.1.3
## Warning: package 'ggplot2' was built under R version 4.1.3
## Warning: package 'tibble' was built under R version 4.1.3
## Warning: package 'tidyr' was built under R version 4.1.3
## Warning: package 'dplyr' was built under R version 4.1.3
## Warning: package 'stringr' was built under R version 4.1.3
## Warning: package 'forcats' was built under R version 4.1.3
library(ggthemes)
library(ggtext)
## Warning: package 'ggtext' was built under R version 4.1.3
library(ggimage)
## Warning: package 'ggimage' was built under R version 4.1.3
library(gghighlight)
## Warning: package 'gghighlight' was built under R version 4.1.3
library(gt)
## Warning: package 'gt' was built under R version 4.1.3
library(gtExtras)
## Warning: package 'gtExtras' was built under R version 4.1.3
The library([package_name]) function will load the packages into your environment each time you open RStudio.
There is a way of using functions without loading the whole package seen below.
carData::Mandel
## x1 x2 y
## 1 16.85 1.46 41.38
## 2 24.81 -4.61 31.01
## 3 18.85 -0.21 37.41
## 4 12.63 4.93 50.05
## 5 21.38 -1.36 39.17
## 6 18.78 -0.08 38.86
## 7 15.58 2.98 46.14
## 8 16.30 1.73 44.47
Here we called a dataframe from the carData package without loading the whole package.
This is useful for a few reasons…
Using a function/dataframe/etc from a package without loading the entire package
Writing more readable code, so people know what package your functions come from
Calling a function that may have the same name in two different packages
So we want to data analysis. The first step is to load our data. There are a few ways to do this. Here we will mostly try and do this by using packages. We can start by using CSVs and XLSX files for now though.
To do this, we use the tidyverse functions read_csv().
Before we use this, it might be helpful to have an idea how. When using a function that you are unfamiliar with or need a quick reminder, go to the help tab in the bottom right of your environment and type in the name of the function. If you have loaded the package, there should be documentation to help you out.
# Remember to make sure this data is in your working directory
#readr::read_csv("combine.csv")
And there it is, we now have some data to work with.
Let’s try again with an XLSX file.
Using the readxl package, we didn’t load it though because we will only use it this once.
# Make sure to clarify the sheet when loading data from xlsx.
#readxl::read_xlsx("hockeyplayerdata2018.xlsx", sheet = 2)
So we have options for loading data, these are a couple of them. For the time being we will primarily be using packages to load data.
Let’s get started with last year’s NFL Data and assign it to an object. R is flexible in the sense you don’t need to initialize anything to save it for those of you coming from a Java-ish background.
We can also use this moment as an introduction to vectors. Vectors can be thought of as a list essentially of similar objects, here we are going to create a vector of numbers to use as the seasons we want to collect data on. To create a vector use c([obj1, obj2, …])
# Vector of one season
szn <- c(2021)
#Vector of select seasons
szns <- c(2011, 2021)
#Vector of series of seasons (all seasons between 2011 & 2021)
decade <- c(2011:2021)
# The assignment operator can be '=' or '<-", I use the latter typically.
# Here we are just going to use last season so we don't actually need a vector
pbp21 <- load_pbp(seasons = 2021)
# This could take a second
pbp_decade <- load_pbp(seasons = decade)
So now we have the data, let’s take a peek.
Use the head() function to look at the top of data, by default head() calls 6 rows.
Use tail() for the converse.
You can change the argument in head to view as many rows as you want, or even to “trim” a data frame to x amount of rows.
# Two ways of calling
head(pbp21)
## -- nflverse play by play data --------------------------------------------------
## i Data updated: 2022-09-27 07:35:02 EDT
## # A tibble: 6 x 372
## play_id game_id old_g~1 home_~2 away_~3 seaso~4 week posteam poste~5 defteam
## <dbl> <chr> <chr> <chr> <chr> <chr> <int> <chr> <chr> <chr>
## 1 1 2021_01~ 202109~ TEN ARI REG 1 <NA> <NA> <NA>
## 2 40 2021_01~ 202109~ TEN ARI REG 1 TEN home ARI
## 3 55 2021_01~ 202109~ TEN ARI REG 1 TEN home ARI
## 4 76 2021_01~ 202109~ TEN ARI REG 1 TEN home ARI
## 5 100 2021_01~ 202109~ TEN ARI REG 1 TEN home ARI
## 6 122 2021_01~ 202109~ TEN ARI REG 1 TEN home ARI
## # ... with 362 more variables: side_of_field <chr>, yardline_100 <dbl>,
## # game_date <chr>, quarter_seconds_remaining <dbl>,
## # half_seconds_remaining <dbl>, game_seconds_remaining <dbl>,
## # game_half <chr>, quarter_end <dbl>, drive <dbl>, sp <dbl>, qtr <dbl>,
## # down <dbl>, goal_to_go <dbl>, time <chr>, yrdln <chr>, ydstogo <dbl>,
## # ydsnet <dbl>, desc <chr>, play_type <chr>, yards_gained <dbl>,
## # shotgun <dbl>, no_huddle <dbl>, qb_dropback <dbl>, qb_kneel <dbl>, ...
pbp21 %>% head()
## -- nflverse play by play data --------------------------------------------------
## i Data updated: 2022-09-27 07:35:02 EDT
## # A tibble: 6 x 372
## play_id game_id old_g~1 home_~2 away_~3 seaso~4 week posteam poste~5 defteam
## <dbl> <chr> <chr> <chr> <chr> <chr> <int> <chr> <chr> <chr>
## 1 1 2021_01~ 202109~ TEN ARI REG 1 <NA> <NA> <NA>
## 2 40 2021_01~ 202109~ TEN ARI REG 1 TEN home ARI
## 3 55 2021_01~ 202109~ TEN ARI REG 1 TEN home ARI
## 4 76 2021_01~ 202109~ TEN ARI REG 1 TEN home ARI
## 5 100 2021_01~ 202109~ TEN ARI REG 1 TEN home ARI
## 6 122 2021_01~ 202109~ TEN ARI REG 1 TEN home ARI
## # ... with 362 more variables: side_of_field <chr>, yardline_100 <dbl>,
## # game_date <chr>, quarter_seconds_remaining <dbl>,
## # half_seconds_remaining <dbl>, game_seconds_remaining <dbl>,
## # game_half <chr>, quarter_end <dbl>, drive <dbl>, sp <dbl>, qtr <dbl>,
## # down <dbl>, goal_to_go <dbl>, time <chr>, yrdln <chr>, ydstogo <dbl>,
## # ydsnet <dbl>, desc <chr>, play_type <chr>, yards_gained <dbl>,
## # shotgun <dbl>, no_huddle <dbl>, qb_dropback <dbl>, qb_kneel <dbl>, ...
This introduces us to the pipe operator ( %>% ), it’s a very helpful operator. It’s windows shortcut is ctrl+shift+m. The pipe operator takes the output of one function and pushes it into the object that’s going to be operated on in the next object.
Think about how we used the head function above, we took the output from the pbp21 object and then pushed it into the object slot for the head() function.
The filter function is used to identify the rows in a dataframe we do or do not need.
Let’s filter by games Philadelphia (my favorite team)
pbp21 %>%
filter(home_team == "PHI" | away_team == "PHI")
## -- nflverse play by play data --------------------------------------------------
## i Data updated: 2022-09-27 07:35:02 EDT
## # A tibble: 3,170 x 372
## play_id game_id old_g~1 home_~2 away_~3 seaso~4 week posteam poste~5 defteam
## <dbl> <chr> <chr> <chr> <chr> <chr> <int> <chr> <chr> <chr>
## 1 1 2021_0~ 202109~ ATL PHI REG 1 <NA> <NA> <NA>
## 2 41 2021_0~ 202109~ ATL PHI REG 1 ATL home PHI
## 3 56 2021_0~ 202109~ ATL PHI REG 1 ATL home PHI
## 4 80 2021_0~ 202109~ ATL PHI REG 1 ATL home PHI
## 5 101 2021_0~ 202109~ ATL PHI REG 1 ATL home PHI
## 6 122 2021_0~ 202109~ ATL PHI REG 1 ATL home PHI
## 7 146 2021_0~ 202109~ ATL PHI REG 1 ATL home PHI
## 8 176 2021_0~ 202109~ ATL PHI REG 1 ATL home PHI
## 9 210 2021_0~ 202109~ ATL PHI REG 1 ATL home PHI
## 10 231 2021_0~ 202109~ ATL PHI REG 1 ATL home PHI
## # ... with 3,160 more rows, 362 more variables: side_of_field <chr>,
## # yardline_100 <dbl>, game_date <chr>, quarter_seconds_remaining <dbl>,
## # half_seconds_remaining <dbl>, game_seconds_remaining <dbl>,
## # game_half <chr>, quarter_end <dbl>, drive <dbl>, sp <dbl>, qtr <dbl>,
## # down <dbl>, goal_to_go <dbl>, time <chr>, yrdln <chr>, ydstogo <dbl>,
## # ydsnet <dbl>, desc <chr>, play_type <chr>, yards_gained <dbl>,
## # shotgun <dbl>, no_huddle <dbl>, qb_dropback <dbl>, qb_kneel <dbl>, ...
So what’s the logic here?
The syntax of filter is filter(row_name [operator] target)
So what are the types of operators, below are a few:
“==”: This operator checks for equivalency
“>’,”<“,”<=“, or”>=“: When comparing quantitative data we can use greater than or less than
“|”: This is the “or” operator. When we use this we are saying for each row one of the conditions on each side of the or operator must be true
“&”: This is the “and” operator. When we use this we are saying for each row both of the conditions on each side of the and operator must be true
pbp21 %>%
filter(home_team == "PHI" | away_team == "PHI") %>%
select(home_team, away_team, desc, game_seconds_remaining, home_wp)
## -- nflverse play by play data --------------------------------------------------
## i Data updated: 2022-09-27 07:35:02 EDT
## # A tibble: 3,170 x 5
## home_team away_team desc game_~1 home_wp
## <chr> <chr> <chr> <dbl> <dbl>
## 1 ATL PHI GAME 3600 0.546
## 2 ATL PHI 4-J.Elliott kicks 65 yards from PHI 35 t~ 3600 0.546
## 3 ATL PHI (15:00) 2-M.Ryan pass deep middle to 18-~ 3600 0.546
## 4 ATL PHI (14:34) (No Huddle) 28-M.Davis right gua~ 3574 0.573
## 5 ATL PHI (14:05) (No Huddle) 28-M.Davis left tack~ 3545 0.584
## 6 ATL PHI (13:45) (No Huddle) 2-M.Ryan pass short ~ 3525 0.569
## 7 ATL PHI (13:08) (No Huddle, Shotgun) 2-M.Ryan pa~ 3488 0.542
## 8 ATL PHI (12:50) (No Huddle) 2-M.Ryan pass incomp~ 3470 0.632
## 9 ATL PHI (12:44) 28-M.Davis right guard to PHI 28~ 3464 0.618
## 10 ATL PHI (12:08) (Shotgun) 2-M.Ryan pass short mi~ 3428 0.601
## # ... with 3,160 more rows, and abbreviated variable name
## # 1: game_seconds_remaining
The select function is pretty simple. The select function simply selects the columns you want to keep in your dataset. This is helpful for a lot of reasons, but mostly it’s just helpful to clean up data that has a lot of extraneous data.
pbp21 %>%
filter(home_team == "PHI" | away_team == "PHI") %>%
select(home_team, away_team, desc, game_seconds_remaining, home_wp) %>%
mutate(away_wp = 1 - home_wp)
## -- nflverse play by play data --------------------------------------------------
## i Data updated: 2022-09-27 07:35:02 EDT
## # A tibble: 3,170 x 6
## home_team away_team desc game_~1 home_wp away_wp
## <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 ATL PHI GAME 3600 0.546 0.454
## 2 ATL PHI 4-J.Elliott kicks 65 yards from ~ 3600 0.546 0.454
## 3 ATL PHI (15:00) 2-M.Ryan pass deep middl~ 3600 0.546 0.454
## 4 ATL PHI (14:34) (No Huddle) 28-M.Davis r~ 3574 0.573 0.427
## 5 ATL PHI (14:05) (No Huddle) 28-M.Davis l~ 3545 0.584 0.416
## 6 ATL PHI (13:45) (No Huddle) 2-M.Ryan pas~ 3525 0.569 0.431
## 7 ATL PHI (13:08) (No Huddle, Shotgun) 2-M~ 3488 0.542 0.458
## 8 ATL PHI (12:50) (No Huddle) 2-M.Ryan pas~ 3470 0.632 0.368
## 9 ATL PHI (12:44) 28-M.Davis right guard t~ 3464 0.618 0.382
## 10 ATL PHI (12:08) (Shotgun) 2-M.Ryan pass ~ 3428 0.601 0.399
## # ... with 3,160 more rows, and abbreviated variable name
## # 1: game_seconds_remaining
Mutate is one of the most useful functions we have. Mutate is super helpful for creating new columns based on really anything, but especially based on data we already have in other columns. So here we created a column that gives us the away_wp based on the measured home_wp.
So let’s combine the 3 here and see if we can create some unique insight. Since I was at this game, I know the eagles won. Let’s see if we can find the play that was the biggest momentum swing.
pbp21 %>%
filter(home_team == "ATL" & away_team == "PHI") %>%
select(home_team, away_team, desc, game_seconds_remaining, home_wp) %>%
mutate(away_wp = 1 - home_wp) %>%
# This next line is straight from stack overflow, great resource
mutate(PHI_wp_change = home_wp-lag(away_wp,default=first(away_wp))) %>%
select(desc, PHI_wp_change)
## -- nflverse play by play data --------------------------------------------------
## i Data updated: 2022-09-27 07:35:02 EDT
## # A tibble: 191 x 2
## desc PHI_w~1
## <chr> <dbl>
## 1 GAME 0.0925
## 2 4-J.Elliott kicks 65 yards from PHI 35 to end zone, Touchback. 0.0925
## 3 (15:00) 2-M.Ryan pass deep middle to 18-C.Ridley to ATL 41 for 16 ya~ 0.0925
## 4 (14:34) (No Huddle) 28-M.Davis right guard to PHI 47 for 12 yards (5~ 0.119
## 5 (14:05) (No Huddle) 28-M.Davis left tackle to PHI 43 for 4 yards (57~ 0.157
## 6 (13:45) (No Huddle) 2-M.Ryan pass short right to 84-C.Patterson to P~ 0.154
## 7 (13:08) (No Huddle, Shotgun) 2-M.Ryan pass short right to 18-C.Ridle~ 0.112
## 8 (12:50) (No Huddle) 2-M.Ryan pass incomplete short right to 8-K.Pitt~ 0.175
## 9 (12:44) 28-M.Davis right guard to PHI 28 for 3 yards (94-J.Sweat; 58~ 0.250
## 10 (12:08) (Shotgun) 2-M.Ryan pass short middle to 18-C.Ridley to PHI 1~ 0.218
## # ... with 181 more rows, and abbreviated variable name 1: PHI_wp_change
So now we have all the plays from this game as well as how that changed the win probability for Philadelphia.
pbp21 %>%
filter(home_team == "ATL" & away_team == "PHI") %>%
select(home_team, away_team, down, ydstogo, yardline_100, desc, qtr ,time,
home_wp) %>%
mutate(away_wp = 1 - home_wp) %>%
# This next line is straight from stack overflow, great resource
mutate(PHI_wp_change = home_wp-lag(away_wp,default=first(away_wp))) %>%
select(desc, down, ydstogo, qtr, time, yardline_100, PHI_wp_change) %>%
drop_na(PHI_wp_change) %>%
arrange(-PHI_wp_change) %>%
head(5)
## # A tibble: 5 x 7
## desc down ydstogo qtr time yardl~1 PHI_w~2
## <chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 (10:11) (No Huddle) 2-M.Ryan pass i~ 2 3 1 10:11 3 0.374
## 2 (10:07) (Shotgun) 2-M.Ryan pass inc~ 3 3 1 10:07 3 0.363
## 3 (10:37) (No Huddle) 28-M.Davis left~ 1 6 1 10:37 6 0.341
## 4 (11:35) 28-M.Davis right guard to P~ 1 5 1 11:35 11 0.335
## 5 (11:03) 2-M.Ryan pass short left to~ 2 7 1 11:03 13 0.331
## # ... with abbreviated variable names 1: yardline_100, 2: PHI_wp_change
Now we have the top 5 most influential plays of the game in favor of the eagles.
The arrange function sorts the dataframe in descending order based on the specified column. The “-” sorts in descending order. Without the “-”, the table will be sorted in ascending order.
We also use the drop_na() function to get rid of any NA values our data might have.
Let’s try and create a table to see who the best RBs in the NFL are
pbp21 %>%
filter(rush_attempt == 1) %>%
group_by(rusher_player_name, posteam, rusher_id) %>%
drop_na(rusher_id) %>%
summarise(Rushes = sum(n()), Total_EPA = sum(epa), Yards = sum(yards_gained)) %>%
mutate(EPA_rush = Total_EPA/Rushes, Yds_rush = Yards/Rushes) %>%
filter(Rushes > 30) %>%
arrange(-Yds_rush)
## `summarise()` has grouped output by 'rusher_player_name', 'posteam'. You can
## override using the `.groups` argument.
## # A tibble: 110 x 8
## # Groups: rusher_player_name, posteam [110]
## rusher_player_name posteam rusher_id Rushes Total_EPA Yards EPA_rush Yds_r~1
## <chr> <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl>
## 1 R.Penny SEA 00-0034750 119 11.2 749 9.41e-2 6.29
## 2 D.Hilliard TEN 00-0034253 56 9.06 350 1.62e-1 6.25
## 3 D.Samuel SF 00-0035719 86 18.7 502 2.18e-1 5.84
## 4 N.Chubb CLE 00-0034791 228 10.6 1259 4.66e-2 5.52
## 5 T.Pollard DAL 00-0035261 134 12.4 733 9.25e-2 5.47
## 6 J.Taylor IND 00-0036223 332 34.5 1811 1.04e-1 5.45
## 7 D.Johnson CLE 00-0035628 100 6.44 534 6.44e-2 5.34
## 8 M.Sanders PHI 00-0035243 145 -1.77 772 -1.22e-2 5.32
## 9 T.Williams BAL 00-0036457 35 -0.730 185 -2.09e-2 5.29
## 10 J.Jackson LAC 00-0034440 69 0.0122 364 1.77e-4 5.28
## # ... with 100 more rows, and abbreviated variable name 1: Yds_rush
The group_by() function is used to group data, hence the name. It can be hard to see what exactly this looks like in practice until we get into the summarise() function.
The summarise() function takes the grouping items and then applies a function to them as a unit. So here we grouped by player and added a couple other variables that we knew would be the same for each instance of the overarching grouping variable, the player.
One of the really nice things the NFlFastR package offers is a dataframe we can attach to other dataframes we have built that helps us create custom graphics based on team colors among other things. We will see what that looks like after using the left_join() function.
pbp21 %>%
filter(rush_attempt == 1) %>%
group_by(rusher_player_name, posteam, rusher_id) %>%
drop_na(rusher_id) %>%
summarise(Rushes = sum(n()), Total_EPA = sum(epa), Yards = sum(yards_gained)) %>%
mutate(EPA_rush = Total_EPA/Rushes, Yds_rush = Yards/Rushes) %>%
filter(Rushes > 50) %>%
arrange(-Yds_rush) %>%
left_join(teams_colors_logos, by =c("posteam" = "team_abbr"))
## `summarise()` has grouped output by 'rusher_player_name', 'posteam'. You can
## override using the `.groups` argument.
## # A tibble: 83 x 22
## # Groups: rusher_player_name, posteam [83]
## rushe~1 posteam rushe~2 Rushes Total~3 Yards EPA_rush Yds_r~4 team_~5 team_id
## <chr> <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 R.Penny SEA 00-003~ 119 11.2 749 9.41e-2 6.29 Seattl~ 4600
## 2 D.Hill~ TEN 00-003~ 56 9.06 350 1.62e-1 6.25 Tennes~ 2100
## 3 D.Samu~ SF 00-003~ 86 18.7 502 2.18e-1 5.84 San Fr~ 4500
## 4 N.Chubb CLE 00-003~ 228 10.6 1259 4.66e-2 5.52 Clevel~ 1050
## 5 T.Poll~ DAL 00-003~ 134 12.4 733 9.25e-2 5.47 Dallas~ 1200
## 6 J.Tayl~ IND 00-003~ 332 34.5 1811 1.04e-1 5.45 Indian~ 2200
## 7 D.John~ CLE 00-003~ 100 6.44 534 6.44e-2 5.34 Clevel~ 1050
## 8 M.Sand~ PHI 00-003~ 145 -1.77 772 -1.22e-2 5.32 Philad~ 3700
## 9 J.Jack~ LAC 00-003~ 69 0.0122 364 1.77e-4 5.28 Los An~ 4400
## 10 C.Edmo~ ARI 00-003~ 124 8.68 620 7.00e-2 5 Arizon~ 3800
## # ... with 73 more rows, 12 more variables: team_nick <chr>, team_conf <chr>,
## # team_division <chr>, team_color <chr>, team_color2 <chr>,
## # team_color3 <chr>, team_color4 <chr>, team_logo_wikipedia <chr>,
## # team_logo_espn <chr>, team_wordmark <chr>, team_conference_logo <chr>,
## # team_league_logo <chr>, and abbreviated variable names
## # 1: rusher_player_name, 2: rusher_id, 3: Total_EPA, 4: Yds_rush,
## # 5: team_name
As we can see– this join added a bunch of columns with team information, links to logos, color codes, and more.
Joining different dataframes with left_join() is pretty simple. The syntax is
old_df %>% left_join(new_df, by = c(“old_df_col” = “new_df_col”))
There are more ways to do joins but we will start with this and explore more on a as needed basis. The idea here is that the column in our two columns have similar data. For each instance of an item in the original data frame that has an equivalent in the new dataframe that data will be matched to the original dataframe. This can happen in a one-to-one fashion or a one-to-many fashion. I hope this makes sense.
To practice these concepts of data manipulation… we will walk through the functions we mentioned above to see if we can contort the data in order to answer our questions.
The functions we have available to use are
filter()
group_by()
summarise()
arrange()
left_join() (and other joins)
mutate()
select()
Other useful functions: head(), slice(), ungroup(), drop_na()
Hint: Use the pbp_decade data
## # A tibble: 1,701 x 4
## receiver_player_name Yards TDs Total_EPA
## <chr> <dbl> <dbl> <dbl>
## 1 J.Jones 17432 99 720.
## 2 A.Brown 16393 118 654.
## 3 T.Kelce 10337 69 543.
## 4 R.Gronkowski 10066 98 540.
## 5 D.Adams 9051 82 424.
## 6 T.Hill 8137 71 391.
## 7 D.Baldwin 7297 55 391.
## 8 T.Hilton 10476 59 381.
## 9 J.Nelson 9202 81 379.
## 10 M.Evans 9757 80 375.
## # ... with 1,691 more rows
## `summarise()` has grouped output by 'receiver_player_name'. You can override
## using the `.groups` argument.
## # A tibble: 5,356 x 5
## # Groups: receiver_player_name [1,701]
## receiver_player_name season Yards TDs Total_EPA
## <chr> <int> <dbl> <dbl> <dbl>
## 1 C.Kupp 2021 2427 23 139.
## 2 J.Jones 2012 2192 17 117.
## 3 J.Jones 2015 2827 16 115.
## 4 R.Gronkowski 2011 1585 20 114.
## 5 W.Welker 2011 1737 10 111.
## 6 S.Diggs 2020 1846 10 103.
## 7 M.Thomas 2019 1809 9 101.
## 8 A.Boldin 2013 1406 8 101.
## 9 V.Cruz 2011 1805 11 99.9
## 10 A.Brown 2020 1722 19 98.8
## # ... with 5,346 more rows
hint: use the ‘wpa’ variable rather than the method we used above
## # A tibble: 10 x 4
## desc posteam defteam wpa
## <chr> <chr> <chr> <dbl>
## 1 (:10) (Shotgun) 7-C.Keenum pass deep right to 14-S.Dig~ MIN NO 0.699
## 2 (:26) (Field Goal formation) 3-B.Walsh 27 yard field g~ MIN SEA 0.588
## 3 (3:59) 4-S.Koch punts 52 yards to DEN 14, Center-46-M.~ BAL DEN 0.575
## 4 (:26) (Shotgun) 3-R.Wilson pass short right intended f~ SEA NE 0.573
## 5 (9:42) 5-S.Weatherford punts 37 yards to SF 19, Center~ NYG SF 0.540
## 6 (:17) (Shotgun) 17-J.Allen pass deep middle to 13-G.Da~ BUF KC 0.508
## 7 (15:00) (Shotgun) 15-T.Tebow pass deep middle to 88-D.~ DEN PIT 0.473
## 8 (:14) 11-Alex Smith pass short middle to 85-V.Davis fo~ SF NO 0.440
## 9 (:10) (Field Goal formation) 1-C.Parkey 43 yard field ~ CHI PHI 0.414
## 10 (15:00) 3-C.Palmer pass short left to 11-L.Fitzgerald ~ ARI GB 0.389
A little about the slice function: The slice function is useful for taking one instance that we specify. So here it’s taking one instance where the most wpa was gained or lost out of each group. I’m personally not super familiar with it, so that may be a representation of its true use, but that’s the way I used it here.
Full nflReadR functions: https://nflreadr.nflverse.com/reference/index.html
draft <- load_draft_picks()
snaps <- load_snap_counts(seasons = 2021)
head(draft)
## -- nflverse Draft Picks, via Pro Football Reference ----------------------------
## i Data updated: 2022-11-02 01:42:17 EDT
## # A tibble: 6 x 36
## season round pick team gsis_id pfr_p~1 cfb_p~2 pfr_p~3 hof posit~4 categ~5
## <int> <int> <int> <chr> <chr> <chr> <chr> <chr> <lgl> <chr> <chr>
## 1 1980 1 1 DET <NA> SimsBi~ billy-~ Billy ~ FALSE RB RB
## 2 1980 1 2 NYJ <NA> JoneLa~ lam-jo~ Lam Jo~ FALSE WR WR
## 3 1980 1 3 CIN <NA> MunoAn~ <NA> Anthon~ TRUE T OL
## 4 1980 1 4 GNB <NA> ClarBr~ bruce-~ Bruce ~ FALSE DE DL
## 5 1980 1 5 BAL <NA> DickCu~ curtis~ Curtis~ FALSE RB RB
## 6 1980 1 6 STL <NA> GreeCu~ curtis~ Curtis~ FALSE DE DL
## # ... with 25 more variables: side <chr>, college <chr>, age <dbl>, to <int>,
## # allpro <dbl>, probowls <dbl>, seasons_started <dbl>, w_av <dbl>,
## # car_av <dbl>, dr_av <dbl>, games <dbl>, pass_completions <dbl>,
## # pass_attempts <dbl>, pass_yards <dbl>, pass_tds <dbl>, pass_ints <dbl>,
## # rush_atts <dbl>, rush_yards <dbl>, rush_tds <dbl>, receptions <dbl>,
## # rec_yards <dbl>, rec_tds <dbl>, def_solo_tackles <dbl>, def_ints <dbl>,
## # def_sacks <dbl>, and abbreviated variable names 1: pfr_player_id, ...
draft %>%
filter(season == 2021) -> draft21
head(snaps)
## -- nflverse snap counts --------------------------------------------------------
## i Data updated: 2022-07-08 10:06:34 EDT
## # A tibble: 6 x 16
## game_id pfr_g~1 season game_~2 week player pfr_p~3 posit~4 team oppon~5
## <chr> <chr> <int> <chr> <int> <chr> <chr> <chr> <chr> <chr>
## 1 2021_01_DAL~ 202109~ 2021 REG 1 Trist~ WirfTr~ T TB DAL
## 2 2021_01_DAL~ 202109~ 2021 REG 1 Ali M~ MarpAl~ G TB DAL
## 3 2021_01_DAL~ 202109~ 2021 REG 1 Alex ~ CappAl~ G TB DAL
## 4 2021_01_DAL~ 202109~ 2021 REG 1 Donov~ SmitDo~ T TB DAL
## 5 2021_01_DAL~ 202109~ 2021 REG 1 Tom B~ BradTo~ QB TB DAL
## 6 2021_01_DAL~ 202109~ 2021 REG 1 Ryan ~ JensRy~ C TB DAL
## # ... with 6 more variables: offense_snaps <dbl>, offense_pct <dbl>,
## # defense_snaps <dbl>, defense_pct <dbl>, st_snaps <dbl>, st_pct <dbl>, and
## # abbreviated variable names 1: pfr_game_id, 2: game_type, 3: pfr_player_id,
## # 4: position, 5: opponent
production)
We know have the 2020 draft class loaded. Let’s load all snaps from last year.
Join them and summarise…
draft21 %>%
left_join(snaps) %>% # No by() bc they have common variables
group_by(player, team) %>%
mutate(Snaps = offense_snaps + defense_snaps) %>%
drop_na(Snaps) %>%
summarise(TotalSnaps = sum(Snaps)) %>%
ungroup() %>%
group_by(team) %>%
summarise(Snaps = sum(TotalSnaps)) %>%
arrange(-Snaps) -> production_draft_21
## Joining, by = c("season", "team", "pfr_player_id", "position")
## `summarise()` has grouped output by 'player'. You can override using the
## `.groups` argument.
production_draft_21
## # A tibble: 24 x 2
## team Snaps
## <chr> <dbl>
## 1 NYJ 2628
## 2 PIT 2039
## 3 WAS 1763
## 4 HOU 1691
## 5 CIN 1668
## 6 CAR 1603
## 7 PHI 1573
## 8 JAX 1557
## 9 ARI 1480
## 10 ATL 1265
## # ... with 14 more rows
production_draft_21 %>%
left_join(teams_colors_logos, by = c("team" = "team_abbr")) %>%
select(Team = team_wordmark, Snaps) %>%
head(10) %>%
gt() %>%
tab_header(
title = "Most Productive 2021 Draft Classes",
subtitle = "Measured in snaps played") %>%
cols_label(Snaps = "Rookie Snaps Played") %>%
opt_table_outline() %>% # Creating outline
data_color( # Update cell colors...
columns = c(Snaps), # ...for mean_len column
colors = scales::col_numeric(
palette = c(
"#FF6962", "white","#5FA777" ), # Overboard colors!
domain = c(min(production_draft_21$Snaps), max(Snaps))
)
) %>%
cols_align(
align = "center",
columns = Team # Selecting all columns
) %>%
gt_img_rows(columns = Team, height = 25)
| Most Productive 2021 Draft Classes | |
| Measured in snaps played | |
| Team | Rookie Snaps Played |
|---|---|
| 2628 | |
| 2039 | |
| 1763 | |
| 1691 | |
| 1668 | |
| 1603 | |
| 1573 | |
| 1557 | |
| 1480 | |
| 1265 | |
Here we will take a lot of what we learned, step-by-step
## Redundant but just for exercise sake...
pbp22 <- load_pbp(seasons = 2022)
We can go step by step here to really see what each function is doing
pbp22 %>%
filter(rush_attempt == 1) %>% #Including QB scrambles this time
group_by(rusher_player_name, posteam) %>%
summarise(epa_per_run = mean(epa), yards_per_run = mean(yards_gained),
carries = n(), yards = sum(yards_gained), TDs = sum(touchdown)) %>%
filter(carries > 12) %>%
ungroup() %>%
arrange(-epa_per_run) -> runners
## `summarise()` has grouped output by 'rusher_player_name'. You can override
## using the `.groups` argument.
runners
## # A tibble: 110 x 7
## rusher_player_name posteam epa_per_run yards_per_run carries yards TDs
## <chr> <chr> <dbl> <dbl> <int> <dbl> <dbl>
## 1 T.Hill NO 0.578 8.64 39 337 5
## 2 J.Brissett CLE 0.425 4.58 31 142 2
## 3 J.Allen BUF 0.394 5.77 53 306 2
## 4 L.Jackson BAL 0.378 7.37 75 553 2
## 5 K.Pickett PIT 0.335 5.16 19 98 2
## 6 J.Fields CHI 0.326 5.58 76 424 3
## 7 P.Mahomes KC 0.310 5.38 21 113 0
## 8 C.Hubbard CAR 0.308 6.47 15 97 1
## 9 T.Lance SF 0.304 4.19 16 67 0
## 10 J.Burrow CIN 0.272 4.26 31 132 3
## # ... with 100 more rows
Now let’s get rid of the non-qbs…
rosters <- load_rosters()
rosters %>%
mutate(rusher_player_name = paste(substr(first_name,1,1), last_name, sep = ".")) %>%
select(position, team, rusher_player_name, headshot_url) -> roster1
runners %>%
left_join(roster1, by = c("rusher_player_name" = "rusher_player_name",
"posteam" = "team")) %>% #Both must be the same
mutate(position = if_else(rusher_player_name == "T.Hill", "QB", position)) %>%
filter(position == "QB") -> gg_runners
gg_runners
## # A tibble: 27 x 9
## rusher_player_n~1 posteam epa_p~2 yards~3 carries yards TDs posit~4 heads~5
## <chr> <chr> <dbl> <dbl> <int> <dbl> <dbl> <chr> <chr>
## 1 T.Hill NO 0.578 8.64 39 337 5 QB https:~
## 2 J.Brissett CLE 0.425 4.58 31 142 2 QB https:~
## 3 J.Allen BUF 0.394 5.77 53 306 2 QB https:~
## 4 L.Jackson BAL 0.378 7.37 75 553 2 QB https:~
## 5 K.Pickett PIT 0.335 5.16 19 98 2 QB https:~
## 6 J.Fields CHI 0.326 5.58 76 424 3 QB https:~
## 7 P.Mahomes KC 0.310 5.38 21 113 0 QB https:~
## 8 T.Lance SF 0.304 4.19 16 67 0 QB https:~
## 9 J.Burrow CIN 0.272 4.26 31 132 3 QB https:~
## 10 D.Jones NYG 0.267 5.67 64 363 3 QB https:~
## # ... with 17 more rows, and abbreviated variable names 1: rusher_player_name,
## # 2: epa_per_run, 3: yards_per_run, 4: position, 5: headshot_url
Now we have a reasonable ranking of the most efficient runners
For this, we use the gt package.
gt has a ton of capabilites, very few of them I know off hand but there are loads of resources we can pull from the internet.
Here’s the one I use most often: https://themockup.blog/posts/2020-09-04-10-table-rules-in-r/
gg_runners %>%
gt()
| rusher_player_name | posteam | epa_per_run | yards_per_run | carries | yards | TDs | position | headshot_url |
|---|---|---|---|---|---|---|---|---|
| T.Hill | NO | 0.577800511 | 8.641026 | 39 | 337 | 5 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/pjp1aahvacigoh0sorpg |
| J.Brissett | CLE | 0.425142481 | 4.580645 | 31 | 142 | 2 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/z12an2n54jj6ycouwg9h |
| J.Allen | BUF | 0.393975384 | 5.773585 | 53 | 306 | 2 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/btfnqtymqsqgybnv4u6n |
| L.Jackson | BAL | 0.377596557 | 7.373333 | 75 | 553 | 2 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/gylnuxkxgm3zd4j0d0ku |
| K.Pickett | PIT | 0.334632485 | 5.157895 | 19 | 98 | 2 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/vbabaquryx2jw1uahay2 |
| J.Fields | CHI | 0.325976183 | 5.578947 | 76 | 424 | 3 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/wllk6sizrqmxhcjavard |
| P.Mahomes | KC | 0.310359993 | 5.380952 | 21 | 113 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/vs40h82nvqaqvyephwwu |
| T.Lance | SF | 0.304378472 | 4.187500 | 16 | 67 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/q4iahwof4dq9xf3vqdcb |
| J.Burrow | CIN | 0.272125841 | 4.258065 | 31 | 132 | 3 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/pbl27kxsr5ulgxmvtvfn |
| D.Jones | NYG | 0.266698466 | 5.671875 | 64 | 363 | 3 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/af6nbjqa6qubnu8oi4ms |
| K.Murray | ARI | 0.217541677 | 5.788462 | 52 | 301 | 2 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/nkb9x92lbea6ayknzoba |
| K.Cousins | MIN | 0.157339211 | 3.076923 | 13 | 40 | 2 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/h9ndf9ralxifgjvot2q4 |
| J.Hurts | PHI | 0.151733908 | 3.835443 | 79 | 303 | 6 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/iv04omcunr78zezpnf8t |
| T.Lawrence | JAX | 0.137825031 | 3.666667 | 27 | 99 | 3 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/nel2x2crpaasltsb6vwj |
| C.Wentz | WAS | 0.126443902 | 4.157895 | 19 | 79 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/indtay8ngoq8ubhhydbp |
| M.Jones | NE | 0.098671547 | 3.333333 | 24 | 80 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/hhexcwf5oe7onhlemzq9 |
| M.Mariota | ATL | 0.073667942 | 5.090909 | 55 | 280 | 3 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/sd59iwjj69atxaqjti1r |
| R.Wilson | DEN | 0.039562910 | 4.520000 | 25 | 113 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/gebzv2aljujgleboe4ns |
| D.Carr | LV | 0.013295016 | 4.466667 | 15 | 67 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/l1eqjrajbnisr91dmlcl |
| J.Herbert | LAC | 0.005741635 | 2.285714 | 21 | 48 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/qknkhe4w6liovjubzmnh |
| J.Garoppolo | SF | -0.086321370 | 1.769231 | 13 | 23 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/fuepwtifp7wd9grpzvwb |
| Z.Wilson | NYJ | -0.095414433 | 2.750000 | 16 | 44 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/gr2wm3rg1bcgtpj58i7l |
| J.Goff | DET | -0.275612063 | 2.461538 | 13 | 32 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/s4hzfab5iglq2tmjb0xy |
| M.Trubisky | PIT | -0.333272156 | 2.230769 | 13 | 29 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/wx17o0eu4hffytrbdjgl |
| T.Tagovailoa | MIA | -0.360181131 | 2.692308 | 13 | 35 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/dyrrl1knhcxqjwuvhcwm |
| B.Mayfield | CAR | -0.381053088 | 2.857143 | 14 | 40 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/aztmh1hzf253ntkzjgkn |
| R.Tannehill | TEN | -0.465958769 | 1.526316 | 19 | 29 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/ns30g7th6gkhjaeoivgj |
SO that’s maybe prettier, but not pretty yet.
gg_runners %>%
gt() %>%
tab_header(
title = "Most Efficient Runners", # ...with this title
subtitle = "Max Whalen") %>%
fmt_number( # A column (numeric data)
columns = c(epa_per_run, yards_per_run), # What column variable? BOD$Time
decimals = 2 # With two decimal places
)
| Most Efficient Runners | ||||||||
| Max Whalen | ||||||||
| rusher_player_name | posteam | epa_per_run | yards_per_run | carries | yards | TDs | position | headshot_url |
|---|---|---|---|---|---|---|---|---|
| T.Hill | NO | 0.58 | 8.64 | 39 | 337 | 5 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/pjp1aahvacigoh0sorpg |
| J.Brissett | CLE | 0.43 | 4.58 | 31 | 142 | 2 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/z12an2n54jj6ycouwg9h |
| J.Allen | BUF | 0.39 | 5.77 | 53 | 306 | 2 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/btfnqtymqsqgybnv4u6n |
| L.Jackson | BAL | 0.38 | 7.37 | 75 | 553 | 2 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/gylnuxkxgm3zd4j0d0ku |
| K.Pickett | PIT | 0.33 | 5.16 | 19 | 98 | 2 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/vbabaquryx2jw1uahay2 |
| J.Fields | CHI | 0.33 | 5.58 | 76 | 424 | 3 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/wllk6sizrqmxhcjavard |
| P.Mahomes | KC | 0.31 | 5.38 | 21 | 113 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/vs40h82nvqaqvyephwwu |
| T.Lance | SF | 0.30 | 4.19 | 16 | 67 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/q4iahwof4dq9xf3vqdcb |
| J.Burrow | CIN | 0.27 | 4.26 | 31 | 132 | 3 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/pbl27kxsr5ulgxmvtvfn |
| D.Jones | NYG | 0.27 | 5.67 | 64 | 363 | 3 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/af6nbjqa6qubnu8oi4ms |
| K.Murray | ARI | 0.22 | 5.79 | 52 | 301 | 2 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/nkb9x92lbea6ayknzoba |
| K.Cousins | MIN | 0.16 | 3.08 | 13 | 40 | 2 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/h9ndf9ralxifgjvot2q4 |
| J.Hurts | PHI | 0.15 | 3.84 | 79 | 303 | 6 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/iv04omcunr78zezpnf8t |
| T.Lawrence | JAX | 0.14 | 3.67 | 27 | 99 | 3 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/nel2x2crpaasltsb6vwj |
| C.Wentz | WAS | 0.13 | 4.16 | 19 | 79 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/indtay8ngoq8ubhhydbp |
| M.Jones | NE | 0.10 | 3.33 | 24 | 80 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/hhexcwf5oe7onhlemzq9 |
| M.Mariota | ATL | 0.07 | 5.09 | 55 | 280 | 3 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/sd59iwjj69atxaqjti1r |
| R.Wilson | DEN | 0.04 | 4.52 | 25 | 113 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/gebzv2aljujgleboe4ns |
| D.Carr | LV | 0.01 | 4.47 | 15 | 67 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/l1eqjrajbnisr91dmlcl |
| J.Herbert | LAC | 0.01 | 2.29 | 21 | 48 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/qknkhe4w6liovjubzmnh |
| J.Garoppolo | SF | −0.09 | 1.77 | 13 | 23 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/fuepwtifp7wd9grpzvwb |
| Z.Wilson | NYJ | −0.10 | 2.75 | 16 | 44 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/gr2wm3rg1bcgtpj58i7l |
| J.Goff | DET | −0.28 | 2.46 | 13 | 32 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/s4hzfab5iglq2tmjb0xy |
| M.Trubisky | PIT | −0.33 | 2.23 | 13 | 29 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/wx17o0eu4hffytrbdjgl |
| T.Tagovailoa | MIA | −0.36 | 2.69 | 13 | 35 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/dyrrl1knhcxqjwuvhcwm |
| B.Mayfield | CAR | −0.38 | 2.86 | 14 | 40 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/aztmh1hzf253ntkzjgkn |
| R.Tannehill | TEN | −0.47 | 1.53 | 19 | 29 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/ns30g7th6gkhjaeoivgj |
Better, but the columns are ugly
gg_runners %>%
gt() %>%
tab_header(
title = "Most Efficient Runners", # ...with this title
subtitle = "Max Whalen") %>%
fmt_number( # A column (numeric data)
columns = c(epa_per_run, yards_per_run), # What column variable? BOD$Time
decimals = 2 # With two decimal places
) %>%
cols_label(rusher_player_name = "Rusher", epa_per_run = "EPA/Carry", posteam = "Team",
yards_per_run = "Yards/Carry", carries = "Carries", yards = "Yards") %>%
cols_align(
align = "center",
columns = colnames(runners) # Selecting all columns
)
| Most Efficient Runners | ||||||||
| Max Whalen | ||||||||
| Rusher | Team | EPA/Carry | Yards/Carry | Carries | Yards | TDs | position | headshot_url |
|---|---|---|---|---|---|---|---|---|
| T.Hill | NO | 0.58 | 8.64 | 39 | 337 | 5 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/pjp1aahvacigoh0sorpg |
| J.Brissett | CLE | 0.43 | 4.58 | 31 | 142 | 2 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/z12an2n54jj6ycouwg9h |
| J.Allen | BUF | 0.39 | 5.77 | 53 | 306 | 2 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/btfnqtymqsqgybnv4u6n |
| L.Jackson | BAL | 0.38 | 7.37 | 75 | 553 | 2 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/gylnuxkxgm3zd4j0d0ku |
| K.Pickett | PIT | 0.33 | 5.16 | 19 | 98 | 2 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/vbabaquryx2jw1uahay2 |
| J.Fields | CHI | 0.33 | 5.58 | 76 | 424 | 3 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/wllk6sizrqmxhcjavard |
| P.Mahomes | KC | 0.31 | 5.38 | 21 | 113 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/vs40h82nvqaqvyephwwu |
| T.Lance | SF | 0.30 | 4.19 | 16 | 67 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/q4iahwof4dq9xf3vqdcb |
| J.Burrow | CIN | 0.27 | 4.26 | 31 | 132 | 3 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/pbl27kxsr5ulgxmvtvfn |
| D.Jones | NYG | 0.27 | 5.67 | 64 | 363 | 3 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/af6nbjqa6qubnu8oi4ms |
| K.Murray | ARI | 0.22 | 5.79 | 52 | 301 | 2 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/nkb9x92lbea6ayknzoba |
| K.Cousins | MIN | 0.16 | 3.08 | 13 | 40 | 2 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/h9ndf9ralxifgjvot2q4 |
| J.Hurts | PHI | 0.15 | 3.84 | 79 | 303 | 6 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/iv04omcunr78zezpnf8t |
| T.Lawrence | JAX | 0.14 | 3.67 | 27 | 99 | 3 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/nel2x2crpaasltsb6vwj |
| C.Wentz | WAS | 0.13 | 4.16 | 19 | 79 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/indtay8ngoq8ubhhydbp |
| M.Jones | NE | 0.10 | 3.33 | 24 | 80 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/hhexcwf5oe7onhlemzq9 |
| M.Mariota | ATL | 0.07 | 5.09 | 55 | 280 | 3 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/sd59iwjj69atxaqjti1r |
| R.Wilson | DEN | 0.04 | 4.52 | 25 | 113 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/gebzv2aljujgleboe4ns |
| D.Carr | LV | 0.01 | 4.47 | 15 | 67 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/l1eqjrajbnisr91dmlcl |
| J.Herbert | LAC | 0.01 | 2.29 | 21 | 48 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/qknkhe4w6liovjubzmnh |
| J.Garoppolo | SF | −0.09 | 1.77 | 13 | 23 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/fuepwtifp7wd9grpzvwb |
| Z.Wilson | NYJ | −0.10 | 2.75 | 16 | 44 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/gr2wm3rg1bcgtpj58i7l |
| J.Goff | DET | −0.28 | 2.46 | 13 | 32 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/s4hzfab5iglq2tmjb0xy |
| M.Trubisky | PIT | −0.33 | 2.23 | 13 | 29 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/wx17o0eu4hffytrbdjgl |
| T.Tagovailoa | MIA | −0.36 | 2.69 | 13 | 35 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/dyrrl1knhcxqjwuvhcwm |
| B.Mayfield | CAR | −0.38 | 2.86 | 14 | 40 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/aztmh1hzf253ntkzjgkn |
| R.Tannehill | TEN | −0.47 | 1.53 | 19 | 29 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/ns30g7th6gkhjaeoivgj |
This looks presentable; now let’s change the formatting within the columns
gg_runners %>%
gt() %>%
tab_header(
title = "Most Efficient Runners", # ...with this title
subtitle = "Max Whalen") %>%
fmt_number( # A column (numeric data)
columns = c(epa_per_run, yards_per_run), # What column variable? BOD$Time
decimals = 2 # With two decimal places
) %>%
cols_label(rusher_player_name = "Rusher", epa_per_run = "EPA/Carry", posteam = "Team",
yards_per_run = "Yards/Carry", carries = "Carries", yards = "Yards") %>%
cols_align(
align = "center",
columns = colnames(runners) # Selecting all columns
) %>%
opt_table_outline() %>% # Creating outline
data_color( # Update cell colors...
columns = c(TDs), # ...for mean_len column
colors = scales::col_numeric(
palette = c(
"#FF6962", "white","#5FA777" ), # Overboard colors!
domain = c(0,max(TDs)) # Column scale endpoints
)
) %>%
data_color( # Update cell colors...
columns = c(yards_per_run), # ...for mean_len column
colors = scales::col_numeric(
palette = c(
"#FF6962", "white","#5FA777" ), # Overboard colors!
domain = c(0,max(yards_per_run)) # Column scale endpoints
)
) %>%
data_color( # Update cell colors...
columns = c(epa_per_run), # ...for mean_len column
colors = scales::col_numeric(
palette = c(
"#FF6962", "white","#5FA777" ), # Overboard colors!
domain = c(min(epa_per_run),max(epa_per_run)) # Column scale endpoints
)
) %>%
data_color( # Update cell colors...
columns = c(carries), # ...for mean_len column
colors = scales::col_numeric(
palette = c(
"#FF6962", "white","#5FA777" ), # Overboard colors!
domain = c(0,max(carries)) # Column scale endpoints
)
) %>%
data_color( # Update cell colors...
columns = c(yards), # ...for mean_len column
colors = scales::col_numeric(
palette = c(
"#FF6962", "white","#5FA777" ), # Overboard colors!
domain = c(0,max(yards)) # Column scale endpoints
)
)
| Most Efficient Runners | ||||||||
| Max Whalen | ||||||||
| Rusher | Team | EPA/Carry | Yards/Carry | Carries | Yards | TDs | position | headshot_url |
|---|---|---|---|---|---|---|---|---|
| T.Hill | NO | 0.58 | 8.64 | 39 | 337 | 5 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/pjp1aahvacigoh0sorpg |
| J.Brissett | CLE | 0.43 | 4.58 | 31 | 142 | 2 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/z12an2n54jj6ycouwg9h |
| J.Allen | BUF | 0.39 | 5.77 | 53 | 306 | 2 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/btfnqtymqsqgybnv4u6n |
| L.Jackson | BAL | 0.38 | 7.37 | 75 | 553 | 2 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/gylnuxkxgm3zd4j0d0ku |
| K.Pickett | PIT | 0.33 | 5.16 | 19 | 98 | 2 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/vbabaquryx2jw1uahay2 |
| J.Fields | CHI | 0.33 | 5.58 | 76 | 424 | 3 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/wllk6sizrqmxhcjavard |
| P.Mahomes | KC | 0.31 | 5.38 | 21 | 113 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/vs40h82nvqaqvyephwwu |
| T.Lance | SF | 0.30 | 4.19 | 16 | 67 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/q4iahwof4dq9xf3vqdcb |
| J.Burrow | CIN | 0.27 | 4.26 | 31 | 132 | 3 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/pbl27kxsr5ulgxmvtvfn |
| D.Jones | NYG | 0.27 | 5.67 | 64 | 363 | 3 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/af6nbjqa6qubnu8oi4ms |
| K.Murray | ARI | 0.22 | 5.79 | 52 | 301 | 2 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/nkb9x92lbea6ayknzoba |
| K.Cousins | MIN | 0.16 | 3.08 | 13 | 40 | 2 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/h9ndf9ralxifgjvot2q4 |
| J.Hurts | PHI | 0.15 | 3.84 | 79 | 303 | 6 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/iv04omcunr78zezpnf8t |
| T.Lawrence | JAX | 0.14 | 3.67 | 27 | 99 | 3 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/nel2x2crpaasltsb6vwj |
| C.Wentz | WAS | 0.13 | 4.16 | 19 | 79 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/indtay8ngoq8ubhhydbp |
| M.Jones | NE | 0.10 | 3.33 | 24 | 80 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/hhexcwf5oe7onhlemzq9 |
| M.Mariota | ATL | 0.07 | 5.09 | 55 | 280 | 3 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/sd59iwjj69atxaqjti1r |
| R.Wilson | DEN | 0.04 | 4.52 | 25 | 113 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/gebzv2aljujgleboe4ns |
| D.Carr | LV | 0.01 | 4.47 | 15 | 67 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/l1eqjrajbnisr91dmlcl |
| J.Herbert | LAC | 0.01 | 2.29 | 21 | 48 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/qknkhe4w6liovjubzmnh |
| J.Garoppolo | SF | −0.09 | 1.77 | 13 | 23 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/fuepwtifp7wd9grpzvwb |
| Z.Wilson | NYJ | −0.10 | 2.75 | 16 | 44 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/gr2wm3rg1bcgtpj58i7l |
| J.Goff | DET | −0.28 | 2.46 | 13 | 32 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/s4hzfab5iglq2tmjb0xy |
| M.Trubisky | PIT | −0.33 | 2.23 | 13 | 29 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/wx17o0eu4hffytrbdjgl |
| T.Tagovailoa | MIA | −0.36 | 2.69 | 13 | 35 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/dyrrl1knhcxqjwuvhcwm |
| B.Mayfield | CAR | −0.38 | 2.86 | 14 | 40 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/aztmh1hzf253ntkzjgkn |
| R.Tannehill | TEN | −0.47 | 1.53 | 19 | 29 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/ns30g7th6gkhjaeoivgj |
For this we need to use a join…
First let’s look at the table we need to get our images
head(teams_colors_logos)
## # A tibble: 6 x 15
## team_abbr team_name team_id team_~1 team_~2 team_~3 team_~4 team_~5 team_~6
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 ARI Arizona Car~ 3800 Cardin~ NFC NFC We~ #97233F #000000 #ffb612
## 2 ATL Atlanta Fal~ 0200 Falcons NFC NFC So~ #A71930 #000000 #a5acaf
## 3 BAL Baltimore R~ 0325 Ravens AFC AFC No~ #241773 #9E7C0C #9e7c0c
## 4 BUF Buffalo Bil~ 0610 Bills AFC AFC Ea~ #00338D #C60C30 #0c2e82
## 5 CAR Carolina Pa~ 0750 Panthe~ NFC NFC So~ #0085CA #000000 #bfc0bf
## 6 CHI Chicago Bea~ 0810 Bears NFC NFC No~ #0B162A #C83803 #0b162a
## # ... with 6 more variables: team_color4 <chr>, team_logo_wikipedia <chr>,
## # team_logo_espn <chr>, team_wordmark <chr>, team_conference_logo <chr>,
## # team_league_logo <chr>, and abbreviated variable names 1: team_nick,
## # 2: team_conf, 3: team_division, 4: team_color, 5: team_color2,
## # 6: team_color3
Here, we only need the team logo for our purposes
teams_colors_logos %>%
select(posteam = team_abbr, logo = team_logo_espn) -> logos # Rename w/select
gg_runners %>%
left_join(logos) %>%
mutate(posteam = logo) %>% # Switching values in these columns
select(-logo) %>% # '-' drops entire column
gt() %>%
tab_header(
title = "Most Efficient Runners", # ...with this title
subtitle = "Max Whalen") %>%
fmt_number( # A column (numeric data)
columns = c(epa_per_run, yards_per_run), # What column variable? BOD$Time
decimals = 2 # With two decimal places
) %>%
cols_label(rusher_player_name = "Rusher", epa_per_run = "EPA/Carry", posteam = "Team",
yards_per_run = "Yards/Carry", carries = "Carries", yards = "Yards") %>%
cols_align(
align = "center",
columns = colnames(runners) # Selecting all columns
) %>%
opt_table_outline() %>% # Creating outline
gt_img_rows(columns = posteam, height = 25) %>%
data_color( # Update cell colors...
columns = c(TDs), # ...for mean_len column
colors = scales::col_numeric(
palette = c(
"#FF6962", "white","#5FA777" ), # Overboard colors!
domain = c(0,max(TDs)) # Column scale endpoints
)
) %>%
data_color( # Update cell colors...
columns = c(yards_per_run), # ...for mean_len column
colors = scales::col_numeric(
palette = c(
"#FF6962", "white","#5FA777" ), # Overboard colors!
domain = c(0,max(yards_per_run)) # Column scale endpoints
)
) %>%
data_color( # Update cell colors...
columns = c(epa_per_run), # ...for mean_len column
colors = scales::col_numeric(
palette = c(
"#FF6962", "white","#5FA777" ), # Overboard colors!
domain = c(min(epa_per_run),max(epa_per_run)) # Column scale endpoints
)
) %>%
data_color( # Update cell colors...
columns = c(carries), # ...for mean_len column
colors = scales::col_numeric(
palette = c(
"#FF6962", "white","#5FA777" ), # Overboard colors!
domain = c(0,max(carries)) # Column scale endpoints
)
) %>%
data_color( # Update cell colors...
columns = c(yards), # ...for mean_len column
colors = scales::col_numeric(
palette = c(
"#FF6962", "white","#5FA777" ), # Overboard colors!
domain = c(0,max(yards)) # Column scale endpoints
)
)
## Joining, by = "posteam"
| Most Efficient Runners | ||||||||
| Max Whalen | ||||||||
| Rusher | Team | EPA/Carry | Yards/Carry | Carries | Yards | TDs | position | headshot_url |
|---|---|---|---|---|---|---|---|---|
| T.Hill | 0.58 | 8.64 | 39 | 337 | 5 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/pjp1aahvacigoh0sorpg | |
| J.Brissett | 0.43 | 4.58 | 31 | 142 | 2 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/z12an2n54jj6ycouwg9h | |
| J.Allen | 0.39 | 5.77 | 53 | 306 | 2 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/btfnqtymqsqgybnv4u6n | |
| L.Jackson | 0.38 | 7.37 | 75 | 553 | 2 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/gylnuxkxgm3zd4j0d0ku | |
| K.Pickett | 0.33 | 5.16 | 19 | 98 | 2 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/vbabaquryx2jw1uahay2 | |
| J.Fields | 0.33 | 5.58 | 76 | 424 | 3 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/wllk6sizrqmxhcjavard | |
| P.Mahomes | 0.31 | 5.38 | 21 | 113 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/vs40h82nvqaqvyephwwu | |
| T.Lance | 0.30 | 4.19 | 16 | 67 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/q4iahwof4dq9xf3vqdcb | |
| J.Burrow | 0.27 | 4.26 | 31 | 132 | 3 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/pbl27kxsr5ulgxmvtvfn | |
| D.Jones | 0.27 | 5.67 | 64 | 363 | 3 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/af6nbjqa6qubnu8oi4ms | |
| K.Murray | 0.22 | 5.79 | 52 | 301 | 2 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/nkb9x92lbea6ayknzoba | |
| K.Cousins | 0.16 | 3.08 | 13 | 40 | 2 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/h9ndf9ralxifgjvot2q4 | |
| J.Hurts | 0.15 | 3.84 | 79 | 303 | 6 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/iv04omcunr78zezpnf8t | |
| T.Lawrence | 0.14 | 3.67 | 27 | 99 | 3 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/nel2x2crpaasltsb6vwj | |
| C.Wentz | 0.13 | 4.16 | 19 | 79 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/indtay8ngoq8ubhhydbp | |
| M.Jones | 0.10 | 3.33 | 24 | 80 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/hhexcwf5oe7onhlemzq9 | |
| M.Mariota | 0.07 | 5.09 | 55 | 280 | 3 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/sd59iwjj69atxaqjti1r | |
| R.Wilson | 0.04 | 4.52 | 25 | 113 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/gebzv2aljujgleboe4ns | |
| D.Carr | 0.01 | 4.47 | 15 | 67 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/l1eqjrajbnisr91dmlcl | |
| J.Herbert | 0.01 | 2.29 | 21 | 48 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/qknkhe4w6liovjubzmnh | |
| J.Garoppolo | −0.09 | 1.77 | 13 | 23 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/fuepwtifp7wd9grpzvwb | |
| Z.Wilson | −0.10 | 2.75 | 16 | 44 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/gr2wm3rg1bcgtpj58i7l | |
| J.Goff | −0.28 | 2.46 | 13 | 32 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/s4hzfab5iglq2tmjb0xy | |
| M.Trubisky | −0.33 | 2.23 | 13 | 29 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/wx17o0eu4hffytrbdjgl | |
| T.Tagovailoa | −0.36 | 2.69 | 13 | 35 | 0 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/dyrrl1knhcxqjwuvhcwm | |
| B.Mayfield | −0.38 | 2.86 | 14 | 40 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/aztmh1hzf253ntkzjgkn | |
| R.Tannehill | −0.47 | 1.53 | 19 | 29 | 1 | QB | https://static.www.nfl.com/image/private/f_auto,q_auto/league/ns30g7th6gkhjaeoivgj | |
Let’s continue with the RB Data
gg_runners %>%
left_join(logos) -> gg_data
## Joining, by = "posteam"
First, pipe the dataframe into the function. Most of the work happens in the aes() function.
gg_data %>%
ggplot(aes(x = yards_per_run, y = epa_per_run)) +
geom_point()
There we have a simple plot. One that doesn’t tell much of a story though
Note we use a ‘+’ instead of a ‘%>%’ within the ggplot world
gg_data %>%
ggplot(aes(x = yards_per_run, y = epa_per_run, size = carries,
label = rusher_player_name)) +
geom_point() +
geom_text(nudge_y = 0.04)
Better but still ugly and pretty useless. There’s a great package to resolve our labeling issue though
library(ggrepel)
gg_data %>%
ggplot(aes(x = yards_per_run, y = epa_per_run, size = carries,
label = rusher_player_name)) +
geom_point() +
geom_text_repel() # replace w/repel function
So it’s still noisy, but this is a lot better. Now let’s clean up the labels
gg_data %>%
ggplot(aes(x = yards_per_run, y = epa_per_run, size = carries,
label = rusher_player_name)) +
geom_point() +
geom_text_repel() +
labs( x = "Yards/Carry",
y = "EPA/Carry",
title = "Who are the NFL's best runners?",
subtitle = "EPA is used measure the value of a run in the context
of the game")
We can make our visualizations a lot cleaner, without much work, using the ggthemes package.
library(ggthemes)
It’s worth exploring the different options, but I always use the light theme or fivethirtyeight. Here’s a look…
gg_data %>%
ggplot(aes(x = yards_per_run, y = epa_per_run, size = carries,
label = rusher_player_name)) +
geom_point() +
geom_text_repel() +
labs( x = "Yards/Carry",
y = "EPA/Carry",
title = "Who are the NFL's best runners?",
subtitle = "EPA is used measure the value of a run in the context
of the game") +
theme_light()
gg_data %>%
ggplot(aes(x = yards_per_run, y = epa_per_run, size = carries,
label = rusher_player_name)) +
geom_point() +
geom_text_repel() +
labs( x = "Yards/Carry",
y = "EPA/Carry",
title = "Who are the NFL's best runners?",
subtitle = "EPA is used measure the value of a run in the context
of the game") +
theme_fivethirtyeight()
Here’s a cool package that helps us replace these black dots with something a little more visually appealling.
library(ggimage)
gg_data %>%
ggplot(aes(x = yards_per_run, y = epa_per_run, label = rusher_player_name)) +
geom_image(aes(image=headshot_url, size = I(0.1))) + # UPDATE
geom_text_repel() +
labs( x = "Yards/Carry",
y = "EPA/Carry",
title = "Who are the NFL's best running QBs?",
subtitle = "EPA is used measure the value of a run in the context
of the game") +
theme_fivethirtyeight()
This is a super helpful package for highlighting certain features of the data you want to highlight
# gg_data %>%
# ggplot(aes(x = yards_per_run, y = epa_per_run, label = rusher_player_name)) +
# geom_image(aes(image=logo, size = I(0.1))) +
# gghighlight(carries >= 25) + # UPDATE
# geom_text_repel() +
# labs( x = "Yards/Carry",
# y = "EPA/Carry",
# title = "Who are the NFL's best runners?",
# subtitle = "EPA is used measure the value of a run in the context
# of the game") +
# theme_fivethirtyeight()