The focus of this assignment is modify our professor’s code that evaluates a punt return. In Lehman’s terms, we will be scraping his ‘traffic metric’ and upgrading it to our ‘circular traffic metric’ statistic. This statistics considers the average distance of the kicking team’s player defending the punt against the punt returner within a specified radius. We will be testing three radii throughout our code.
This GIF below is the product we are modifying.
knitr::include_graphics("C:/Users/raymo/OneDrive/Documents/Loyola Fall 2024/IS470/Data/Prof Tallon Original Punt Return Visualization.gif")
Prof Tallon Original Punt Return Visualization
The code below helps us complete a few things. First, we obtain access to the working files via the working directory. Next, we have two functions: a package loader and a game+playId checker. These functions were provided to us and are simplified for an easier coding experience.
setwd("C:/")
# ---------- supplied functions start here ----------------------------------
# ---------------------------------------------------------------------------
# --------------------------- Load Libraries --------------------------------
# ---------------------------------------------------------------------------
# supply a vector of desired packages. Packages will be installed if not
# already installed and available to RStudio
load_packages <- function(packages = c()) {
if (length(packages) == 0) {
print('You did not specify any packages/libraries to load')
}
else {
for (i in packages){
if(! i %in% installed.packages()){
install.packages(i, dependencies = TRUE)
}
suppressMessages(suppressWarnings(library(i, character.only=T)))
}
}
}
# -----------------------------------------------------------------------------------------------
# ------------ Checks for a valid playId and gameId; optionally check for valid frameId ---------
# -----------------------------------------------------------------------------------------------
check_playId_gameId <- function(game_Id,
play_Id,
frame_Id = c(),
data) {
if (!game_Id %in% unique(data$gameId)) {
return("the game_Id you have used is not in the data frame.")
}
if (!play_Id %in% unique(data$playId)) {
return("the play_Id you have used is not in the data frame.")
}
if(!missing(frame_Id)) {
if (!frame_Id %in% unique(data$frameId)) {
return("the frame_Id you have used is not in the data frame.")
}
}
return("ok")
}
# ---------------------------------------------------------------------------
# ------------------ Visualize an Individual Play (with player) -------------
# ---------------------------------------------------------------------------
# provide a merged data frame, a game Id, a play Id, and a player.
visualize_play <- function(df,
df_team1,
df_team2,
playerNumber = NA,
max_dist)
{
source('https://raw.githubusercontent.com/mlfurman3/gg_field/main/gg_field.R')
max_distance <- 4*max(df[[max_dist]], na.rm = TRUE)
distance_value <- as.numeric(sub("dist_", "", max_dist))
mean_distance_column <- switch(max_dist,
"dist_5" = "mean_dist_to_player_5",
"dist_10" = "mean_dist_to_player_10",
"dist_15" = "mean_dist_to_player_15",
"dist_113.33" = "mean_dist_to_player_113.33")
alpha_column <- switch(max_dist,
"dist_5" = "alpha_5",
"dist_10" = "alpha_10",
"dist_15" = "alpha_15",
"dist_113.33" = "alpha_113.33")
g <- ggplot(data = df, aes(x = x, y = y)) +
# customize colors, shapes, and sizes of players and the football
scale_size_manual(values = c(6, 4, 6), guide = "none") +
scale_shape_manual(values = c(21, 16, 21), guide = "none") +
scale_fill_manual(values = c("dodgerblue1", "#663300", "firebrick1"), guide = "none") +
scale_colour_manual(values = c("black", "#663300", "black"), guide = "none") +
scale_alpha_continuous(range = c(0, 1), guide = "none") +
gg_field(yardmin = -5, yardmax = 125) +
# add home team logo to the center point
geom_image(data = df_team1, aes(x = 59, y = 53.33/2, image = team_logo_wikipedia), size = 0.15) +
# add end zone team names
annotate("text", x = 4, y = 27, label = df_team2$team_name, angle = 90, size = 5, fontface = 2, color = "white") +
annotate("text", x = 115, y = 27, label = df_team1$team_name, angle = 270, size = 5, fontface = 2, color = "white") +
# add informational text at the top of the field
annotate("text", x = -3, y = 56, label = "Traffic Meter Scale:", colour = "white", vjust = 0, hjust = 0 ) +
annotate("text", x = 38, y = 56, label = "Value:" , colour = "white", vjust = 0, hjust = 0 ) +
annotate("text", x = 52, y = 56, label = "Distance traveled by receiver (yards):", colour = "white", vjust = 0, hjust = 0 ) +
annotate("text", x = 92.5, y = 56, label = "Speed (yards / sec):", colour = "white", vjust = 0, hjust = 0 ) +
# add convex hulls for both teams
geom_encircle( data = df %>% filter(team != "football"),
aes(fill = team,
group = team,
color = team),
s_shape = 1,
expand = 0,
alpha = 0.5,
show.legend = TRUE) +
# This will change depending on which distance we use, for now I will hard code the size
# It seems that how ever many yards we are going for should be mult. by 4
# add halo around receiver
geom_point(data = df %>% filter(jerseyNumber == playerNumber),
aes(x = x, y = y, alpha = get(alpha_column)),
shape = 21,
size = max_distance,
stroke = 1, # width of the circle border
fill = "pink",
colour = "pink") +
# add points to plot for all players and the football
geom_point(data = df, aes(x = x, y = y, shape = team, colour = team, size = team, fill = team) ) +
# insert jersey number for each player
geom_text( data = df %>% filter(team != "football"),
aes(x = x, y = y,
label = jerseyNumber), colour = "white", size = 3.5, vjust = 0.36 ) +
# report out key metrics (total yards ran by receiver)
geom_text( data = df %>% filter(jerseyNumber == playerNumber),
aes(x = 85.5, y = 56, label = format(dist_running_total, nsmall = 2),
vjust = 0, hjust = 0), color = "white" ) +
# report out key metrics (mean dist to player)
geom_text( data = df %>% filter(jerseyNumber == playerNumber),
aes(x = 44.5, y = 56, label = format
(round(100 * (1 - (get(mean_distance_column)/ distance_value)), 2), nsmall = 2),
vjust = 0, hjust = 0), color = "white" ) +
# report out key metrics (speed of receiver)
geom_text( data = df %>% filter(jerseyNumber == playerNumber),
aes(x = 111.9, y = 56, label = s, vjust = 0, hjust = 0), color = "white" ) +
# add some labels to report the play description
labs(title = df$playDescription) +
# set the theme to dark green to color the areas beyond the end zones
theme(panel.background = element_rect(fill = "forestgreen",
color = "forestgreen"), panel.grid = element_blank()) +
annotate("rect", xmin = 15, xmax = 35, ymin = 56, ymax = 57, color = "black", fill = "white") +
geom_rect(data = df %>% filter(jerseyNumber == playerNumber),
aes(xmin = 15, xmax = 15 + (get(alpha_column)*20), ymin = 56, ymax = 57,
alpha = get(alpha_column)), color = "black", fill = "black" ) +
guides(alpha = "none") +
transition_time(frameId)
g
gganimate::transition_time(frameId)
frames_to_display <- max(df$frameId)
df$hms <- as_hms(df$time)
df$seconds <- hour(df$hms) * 3600 + minute(df$hms) * 60 + second(df$hms)
fps <- frames_to_display / (max(df$seconds) - min(df$second))
animate(g,
fps = fps,
nframe = frames_to_display,
width = 720,
height = 440,
renderer = gifski_renderer())
anim_save("out.gif")
}
Instead of using the average of all defending players, we looked at all players within certain physical intervals of the punt returner. Those specific intervals were 5 yds, 10 yds, 15yds.
# ---------- main code starts here ----------------------------------------------------------------------
load_packages(c("ggplot2", "ggalt", "ggforce", "hms", "gganimate", "lubridate", "data.table", "dplyr", "nflfastR", "gifski", "png", "ggimage"))
df_tracking <- fread("C:/Users/raymo/OneDrive/Documents/Loyola Fall 2024/IS470/Data/tracking2020.csv")
df_plays <- fread("C:/Users/raymo/OneDrive/Documents/Loyola Fall 2024/IS470/Data/plays.csv")
df_games <- fread("C:/Users/raymo/OneDrive/Documents/Loyola Fall 2024/IS470/Data/games.csv")
# you are creating a metric for a player in a play (playId) in a game (gameId)
my_gameId <- 2021010300
my_playId <- 1586
my_playerNumber <- 19
df <- df_tracking %>%
filter(gameId == my_gameId & playId == my_playId ) %>%
left_join(df_plays, by = c("playId" = "playId", "gameId" = "gameId")) %>%
select(x, y, displayName, jerseyNumber, team, gameId, playId, frameId, time,
nflId, dis, playDescription, s) %>%
data.frame()
team1 <- as.character( df_games[df_games$gameId == my_gameId, "homeTeamAbbr"] )
team2 <- as.character( df_games[df_games$gameId == my_gameId, "visitorTeamAbbr"] )
df_team1 <- teams_colors_logos %>%
filter(team_abbr == team1)
df_team2 <- teams_colors_logos %>%
filter(team_abbr == team2)
df_player1 <- df %>%
filter(jerseyNumber == my_playerNumber) %>%
select(gameId, playId, frameId, x, y, dis, team) %>%
data.frame()
df <- df %>%
left_join(df_player1, by = c( "playId" = "playId", "gameId" = "gameId", "frameId" = "frameId")) %>%
# ----------------------------- here is the code that I created in class -----------------------------
# compute the distance to each player on the opposing team (not to your teammates and not to the ball)
mutate(dist_113.33 = ifelse( team.x != team.y & team.x != "football",
sqrt( (x.x - x.y)^2 + (y.x - y.y)^2),
NA)) %>%
group_by(gameId, playId, frameId) %>%
# compute the average of all distances
mutate(mean_dist_to_player_113.33 = mean(dist_113.33, na.rm = TRUE ),
alpha_113.33 = 1 - (mean_dist_to_player_113.33/113.33) ) %>%
ungroup() %>%
# -------------------------------------- my code ends here ------------------------------------------
# -------------------------------------start your code below ----------------------------------------
mutate(dist_5 = ifelse(dist_113.33 > 2.5, NA, dist_113.33)) %>%
group_by(gameId, playId, frameId) %>%
mutate(mean_dist_to_player_5 = mean(dist_5, na.rm = TRUE ),
alpha_5 = 1 - (mean_dist_to_player_5/2.5) ) %>%
ungroup() %>%
mutate(dist_10 = ifelse(dist_113.33 > 5, NA, dist_113.33)) %>%
group_by(gameId, playId, frameId) %>%
mutate(mean_dist_to_player_10 = mean(dist_10, na.rm = TRUE ),
alpha_10 = 1 - (mean_dist_to_player_10/5) ) %>%
ungroup() %>%
mutate(dist_15 = ifelse(dist_113.33 > 7.5, NA, dist_113.33)) %>%
group_by(gameId, playId, frameId) %>%
mutate(mean_dist_to_player_15 = mean(dist_15, na.rm = TRUE ),
alpha_15 = 1 - (mean_dist_to_player_15/7.5) ) %>%
ungroup() %>%
# Comparing the distance of the old metric and the three new ones
# We can see that the NA's decrease as the circle is expanded, this is expected
# When running the summary tests there seems to be some skewness to the left, the mode is greater than the mean
# ------------------------------------- your code ends here -----------------------------------------
# compute a running total of the distance traveled
arrange(nflId, frameId) %>%
group_by(nflId) %>%
mutate( dist_running_total = cumsum(dis.x)) %>%
rename( "x" = "x.x",
"y" = "y.x",
"team" = "team.x",
"dis" = "dis.x") %>%
data.frame()
This visualization is the 15 yard diameter area that intakes only the defenders with the pink circular area.
knitr::include_graphics("C:/Users/raymo/OneDrive/Documents/Loyola Fall 2024/IS470/Data/Modified Punt Return Visualization.gif")
Modified Punt Return Visualization
The PDF plot below indicates that probabilities of the distances for the defensive players in relation to the punt returner.
x <- df %>%
select(dist_113.33, dist_15) %>% stack() %>%
data.frame()
ggplot(data = x[!is.na(x$values),], aes(values, group = ind, fill = ind)) + geom_density(alpha=0.5) +
labs(x = "Distance away From Punt Returner (yards)",
y = "Probability",
title = "Circular Traffic Metric (15 yard diameter) PDF Plot",
fill = "Metric Results") +
theme(plot.title = element_text(hjust = 0.5))
Because “p-value < 2.2e-16” the p-value is quite small, this result is statistically significant. Statistical significance in this instance indicates that our metric is different than Prof. Tallon’s and it does offer something different that his metric did not offer before.
t.test(df$dist_113.33, df$dist_15)
##
## Welch Two Sample t-test
##
## data: df$dist_113.33 and df$dist_15
## t = 57.848, df = 2801.9, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 21.75841 23.28520
## sample estimates:
## mean of x mean of y
## 27.085621 4.563815