For this assignment, we were prompted to expand upon and improve a traffic metric for a punt returner. The metric at its core simply calculated the distance between the returner and the defender and then took the mean of all these distances. A higher rating would indicate the ball carrier being surrounded by more people and a lower one suggests that they have more room to run.
To expand upon this metric, we wanted to make it so that it only included players who actually have a chance to make a play on the ball carrier, as the main issue we noticed with the previous metric was that it factored in everyone no matter their chance at getting to the ball, most notably, defenders who were behind him. In order to due this, we wanted to create a metric that filtered out trailing defenders based on the estimated amount of time until the defender reaches the ball carrier by comparing their speed to that of the ball carrier as well as how much distance they would have to make up to make the tackle. Initially, we wanted to create a metric that solely used the time to the ball carrier as we thought it would give a more accurate representation of traffic, but while trying to implement it we realized that it served better as a filter to ensure players who are too far away and too slow were not represented.
Using these numbers we calculated the difference in speed between the ball carrier and defender and if the defender was moving faster than the ball carrier, we would use the distance between the two in order to calculate how fast the defender could get to the ball. If the time it would take for the defender to get to the ball is shorter than the time it would take the ball carrier to reach the endzone, then we will account for that defender in the metric and factor in their proximity to the ball carrier.
#Buffalo (home) vs Miami Dolphins (away)
suppressWarnings(suppressMessages(library(data.table)))
suppressWarnings(suppressMessages(library(dplyr)))
suppressWarnings(suppressMessages(library(ggplot2)))
suppressWarnings(suppressMessages(library(gganimate)))
suppressWarnings(suppressMessages(library(gifski)))
setwd("/Users/mike/Desktop/nfl-big-data-bowl-2022")
#read in the necessary files for tracking and plays
player_tracking_df <- fread("tracking2020.csv")
plays_df <- fread("plays.csv")
# I am interested, for demo purposes, in play 1586 in game 2021010300
game_ID <- c("2021010300")
play_ID <- c("1586", "395", "1232", "3930")
df <- player_tracking_df %>%
filter(gameId %in% game_ID & playId %in% play_ID) %>%
left_join(plays_df, by = c("gameId" = "gameId", "playId" = "playId")) %>%
select(gameId, playId, time, x, y, dis, nflId, jerseyNumber, team,
frameId, playDescription, playDirection, s) %>%
data.frame()
rm(player_tracking_df)
rm(plays_df)
df19 <- df %>%
select(gameId, playId, time, jerseyNumber, x, y, frameId, dis, s) %>%
filter(jerseyNumber == 19) %>%
mutate(runnerDtoEndzone = 110-x) %>%
data.frame()
df10 <- left_join(df, df19, by = c("gameId" = "gameId",
"playId" = "playId",
"time" = "time",
"frameId" = "frameId")) %>%
select(gameId, playId, time, x.x, y.x, nflId, jerseyNumber.x,
frameId, x.y, y.y,team, playDescription, dis.y, s.y, runnerDtoEndzone, s.x) %>%
# compute the distance from player 19 to all opposing players, otherwise NA
mutate(dist = ifelse(team == "away", sqrt((x.x - x.y)^2 + (y.x - y.y)^2), NA )) %>%
data.frame()
df11 <- df10 %>%
mutate(secondsToTD = runnerDtoEndzone/s.x) %>%
data.frame()
df12 <- df11 %>%
mutate(secondsToTD = runnerDtoEndzone/s.x) %>%
data.frame()
df13 <- df12 %>%
mutate(secondsToRunner = 1/(s.y - s.x)*dist) %>%
data.frame()
df2 <- df13 %>%
mutate(dist = ifelse(x.x > x.y & dist <= 15,
ifelse(secondsToRunner < secondsToTD,
dist, NA),
ifelse(dist <= 15, dist, NA))) %>%
select(-c("x.y","y.y", "s.x")) %>%
rename(x = x.x,
y = y.x,
s = s.y,
jerseyNumber = jerseyNumber.x) %>%
# find the average distance from player 19 to opposing players
#113.33 = dist on field from lower left corner to upper right corner
#100 yards long and 53.3333 wide or 120 feet wide
group_by(gameId, playId, frameId) %>%
mutate(mean_dist_to_19 = mean(dist, na.rm = TRUE),
alpha = 1 - (mean_dist_to_19/113.33) ) %>%
ungroup() %>%
# compute the total distance traveled by player 19
arrange(jerseyNumber, time, frameId) %>%
group_by(gameId, playId, jerseyNumber) %>%
mutate(dist_running_total = cumsum(dis.y)) %>%
# create a width for a moving bar
mutate(width = round( 20*(1-(mean_dist_to_19/113.33)) ,0) ) %>%
# create each play level stats for the receiver
group_by(gameId, playId) %>%
mutate(play_stat = mean(unique(mean_dist_to_19))) %>%
ungroup() %>%
# create each game level stats for the receiver
group_by(gameId) %>%
mutate(game_stat = mean(unique(play_stat))) %>%
ungroup() %>%
# create each player stat over all plays and all games
mutate(player_stat = mean(unique(game_stat))) %>%
# create a new data frame
data.frame()
#print out play, game, and player stats
result1 = df2 %>% select(gameId, playId, play_stat) %>% distinct %>% data.frame()
result2 = df2 %>% select(gameId, game_stat) %>% distinct %>% data.frame()
result3 = df2 %>% select(player_stat) %>% distinct %>% data.frame()
# THE CODE BELOW IS DESIGNED FOR gameId = 2021010300 and playId = 1586
df3 <- df2 %>% filter(gameId == 2021010300 & playId == 1586) %>% data.frame()
To show this traffic meter in action, we animated a punt return that went for a touchdown in a game between the Bills and Dolphins. In this play, the ball carrier, Isaiah McKenzie, was set back deep for the return and then proceeds to weave in and out of traffic as he takes the ball all the way to the endzone. Above the animation, you can see a traffic meter showing his traffic rating at any given moment, and the circle around McKenzie becomes more solid with a higher traffic rating, and more transparent as the rating decreases.
# create a title for the plot using the play description
plot_title <- paste0(df3$playDescription[1],
"\n The halo around the receiver grows darker based on the proximity of players from the opposing team.")
#loading command to make NFL field in ggplot (credit to Marschall Furman)
source('https://raw.githubusercontent.com/mlfurman3/gg_field/main/gg_field.R')
g <- ggplot(data=df3) +
# sets up colors, shapes, and circle sizes
scale_size_manual(values = c(6, 4, 6), guide="none") +
scale_shape_manual(values = c(21, 16, 21), guide="none") +
scale_fill_manual(values = c("dodgerblue1", "#663300", "firebrick1"), guide="none") +
scale_colour_manual(values = c("#000000", "#663300", "#000000"), guide="none") +
# plot the field (full size)
gg_field(yardmin = -5, yardmax = 125) +
# add labels and background to the plot
labs(title = plot_title) +
theme(panel.background = element_rect(fill = 'forestgreen',
color = 'forestgreen'),
panel.grid = element_blank()) +
# add points for players on both teams and the football
geom_point(aes(x = x,
y = y,
shape = team,
fill = team,
group = nflId,
size = team,
colour = team)) +
# add a larger circle around the receiver and shade it
geom_point( data = df3%>% filter(jerseyNumber==19),
aes(x=x,y=y,alpha=alpha),
shape=21,
size=10,
stroke=1,
fill="black",
colour="black") +
# include jersey numbers in the center of all players' circles
geom_text( aes(x = x,
y = y,
group = nflId,
label=ifelse(is.na(jerseyNumber),"",jerseyNumber)),
colour="white", vjust=0.36, size=3.5) +
# add informational text at the top of the plot
geom_text(aes(x=-3, y=56, label="Traffic Meter Scale:"),
colour="white",vjust=0,hjust=0) +
geom_text(aes(x=38, y=56, label="Value:"),
colour="white",vjust=0,hjust=0) +
geom_text(aes(x=53, y=56, label="Distance traveled by receiver (yards):"),
colour="white",vjust=0,hjust=0) +
geom_text(aes(x=92.5, y=56, label="Speed (yards / sec):"),
colour="white",vjust=0,hjust=0) +
# draw an empty rectangle at the top of the plot, fill it in
geom_rect(aes(xmin = 15, xmax = 35, ymin = 56, ymax = 57),
color="black", fill="white") +
geom_rect(data=df3 %>% filter(jerseyNumber==19),
aes(xmin = 15, xmax = 15 + width,
ymin = 56, ymax = 57, alpha=alpha),
color="black", fill="black") +
# report out key metrics (mean dist to receiver)
geom_text(data = df3%>% filter(jerseyNumber==19),
aes(x = 44.5, y = 56,
#label=format(round(100-mean_dist_to_19,2),nsmall=2),
label = format( round( 100*(1-(mean_dist_to_19/113.33)) ,2), nsmall=2),
vjust=0, hjust=0),
color="white") +
# report out key metrics (total yards ran by receiver)
geom_text(data = df3%>% filter(jerseyNumber==19),
aes(x = 85.5, y = 56, label=format(dist_running_total, nsmall=2),vjust=0,hjust=0),
color="white") +
# report out key metrics (speed of receiver)
geom_text(data = df3%>% filter(jerseyNumber==19),
aes(x = 111.9, y = 56, label=format(s, nsmall=2),vjust=0,hjust=0),
color="white") +
# sketch the plot for each frameId
transition_time(frameId) +
# remove all color legends from the side of the plot
guides(alpha = "none")
# number of frames to animate
frames_to_display <- length(unique(df3$frameId))
# build the animation, specify fps, frame size
animate(g, fps=10, nframe = frames_to_display,
width = 800, height = 380,
renderer = gifski_renderer())
# if you want to save the plot as a gif file
anim_save("out.gif")
Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.