The purpose of this assignment is to improve on our professor’s metric to utilizing 2022 NFL Big Data Bowl Data. Our Professor created a metric in which he calculated the average distance of a punt returner (Isaiah McKenzie) and the oppossing team (Miami Dolphins). From this, we can see how McKenzie is able to avoid defenders with an animation, looking at this specific play frame by frame.
Using the data from the 2022 NFL Big Data Bowl and our Professor’s code, we improved on the metric by creating weighted distance. We created a rating system of determining how valuable an opposing player’s position is to the punt returner from within 20 yards. On a scale of 0 to 100, 100 being the highest rating, we created an equation that reduces the initial value of 100 by a factor that depends on the defender’s distance to the punt returner. The less that is being subtracted from 100, the higher the weighted distance is. We have also implemented some code to test correlation between our metric and our Professor’s metric.
The data visualization shows an animation of Isaiah McKenzie returning a punt for a touchdown against the Miami Dolphins.
# ------------------------ Libraries -------------------------------------------
setwd("//apporto.com/dfs/LOYOLA/Users/ssilguero_loyola/Desktop/IS470")
#setwd("C:/Users/brygn/Dropbox/Stuff/School/IS470")
library(tidyr)
# ---------- supplied functions start here -----------------------------------------------------------------
# ---------------------------------------------------------------------------
# --------------------------- Load Libraries --------------------------------
# ---------------------------------------------------------------------------
# supply a vector of desired packages. Packages will be installed if not
# already installed and available to RStudio
load_packages <- function(packages = c()) {
if (length(packages) == 0) {
print('You did not specify any packages/libraries to load')
}
else {
for (i in packages){
if(! i %in% installed.packages()){
install.packages(i, dependencies = TRUE)
}
suppressMessages(suppressWarnings(library(i, character.only=T)))
}
}
}
# -----------------------------------------------------------------------------------------------
# ------------ Checks for a valid playId and gameId; optionally check for valid frameId ---------
# -----------------------------------------------------------------------------------------------
check_playId_gameId <- function(game_Id,
play_Id,
frame_Id = c(),
data) {
if (!game_Id %in% unique(data$gameId)) {
return("the game_Id you have used is not in the data frame.")
}
if (!play_Id %in% unique(data$playId)) {
return("the play_Id you have used is not in the data frame.")
}
if(!missing(frame_Id)) {
if (!frame_Id %in% unique(data$frameId)) {
return("the frame_Id you have used is not in the data frame.")
}
}
return("ok")
}
# ---------------------------------------------------------------------------
# ------------------ Visualize an Individual Play (with player) -------------
# ---------------------------------------------------------------------------
# provide a merged data frame, a game Id, a play Id, and a player.
visualize_play <- function(df,
df_team1,
df_team2,
playerNumber = NA)
{
source('https://raw.githubusercontent.com/mlfurman3/gg_field/main/gg_field.R')
g <- ggplot(data = df, aes(x = x, y = y)) +
# customize colors, shapes, and sizes of players and the football
scale_size_manual(values = c(6, 4, 6), guide = "none") +
scale_shape_manual(values = c(21, 16, 21), guide = "none") +
scale_fill_manual(values = c("dodgerblue1", "#663300", "firebrick1"), guide = "none") +
scale_colour_manual(values = c("black", "#663300", "black"), guide = "none") +
gg_field(yardmin = -5, yardmax = 125) +
# add home team logo to the center point
geom_image(data = df_team1, aes(x = 59, y = 53.33/2, image = team_logo_wikipedia), size = 0.15) +
# add end zone team names
annotate("text", x = 4, y = 27, label = df_team2$team_name, angle = 90, size = 5, fontface = 2, color = "white") +
annotate("text", x = 115, y = 27, label = df_team1$team_name, angle = 270, size = 5, fontface = 2, color = "white") +
# add informational text at the top of the field
annotate("text", x = -3, y = 56, label = "Traffic Meter Scale:", colour = "white", vjust = 0, hjust = 0 ) +
annotate("text", x = 38, y = 56, label = "Value:" , colour = "white", vjust = 0, hjust = 0 ) +
annotate("text", x = 52, y = 56, label = "Distance traveled by receiver (yards):", colour = "white", vjust = 0, hjust = 0 ) +
annotate("text", x = 92.5, y = 56, label = "Speed (yards / sec):", colour = "white", vjust = 0, hjust = 0 ) +
# add convex hulls for both teams
geom_encircle( data = df %>% filter(team != "football"),
aes(fill = team,
group = team,
color = team),
s_shape = 1,
expand = 0,
alpha = 0.5,
show.legend = TRUE) +
# add halo around receiver
geom_point(data = df %>% filter(jerseyNumber == playerNumber),
aes(x = x, y = y, alpha = alpha),
shape = 21,
size = 10,
stroke = 1, # width of the circle border
fill = "black",
colour = "black") +
# add points to plot for all players and the football
geom_point(data = df, aes(x = x, y = y, shape = team, colour = team, size = team, fill = team) ) +
# insert jersey number for each player
geom_text( data = df %>% filter(team != "football"),
aes(x = x, y = y,
label = jerseyNumber), colour = "white", size = 3.5, vjust = 0.36 ) +
# report out key metrics (total yards ran by receiver)
geom_text( data = df %>% filter(jerseyNumber == playerNumber),
aes(x = 85.5, y = 56, label = format(dist_running_total, nsmall = 2),
vjust = 0, hjust = 0), color = "white" ) +
# report out key metrics (mean dist to player)
geom_text( data = df %>% filter(jerseyNumber == playerNumber),
aes(x = 44.5, y = 56, label = format(round(100 * (1 - (mean_dist_to_player/113.33)),2), nsmall = 2),
vjust = 0, hjust = 0), color = "white" ) +
# report out key metrics (speed of receiver)
geom_text( data = df %>% filter(jerseyNumber == playerNumber),
aes(x = 111.9, y = 56, label = s, vjust = 0, hjust = 0), color = "white" ) +
# add some labels to report the play description
labs(title = df$playDescription) +
# set the theme to dark green to color the areas beyond the end zones
theme(panel.background = element_rect(fill = "forestgreen",
color = "forestgreen"), panel.grid = element_blank()) +
annotate("rect", xmin = 15, xmax = 35, ymin = 56, ymax = 57, color = "black", fill = "white") +
geom_rect(data = df %>% filter(jerseyNumber == playerNumber),
aes(xmin = 15, xmax = 15 + (alpha*20), ymin = 56, ymax = 57,
alpha = alpha), color = "black", fill = "black" ) +
guides(alpha = "none") +
transition_time(frameId)
g
}
# ---------- main code starts here ----------------------------------------------------------------------
load_packages(c("ggplot2", "ggalt", "ggforce", "hms", "gganimate", "data.table", "dplyr", "nflfastR", "gifski", "png", "ggimage"))
df_tracking <- fread("Data/NFLBDB2022/tracking2020.csv")
df_plays <- fread("Data/NFLBDB2022/plays.csv")
df_games <- fread("Data/NFLBDB2022/games.csv")
# you are creating a metric for a player in a play (playId) in a game (gameId)
my_gameId <- 2021010300
my_playId <- 1586
my_playerNumber <- 19
df <- df_tracking %>%
filter(gameId == my_gameId & playId == my_playId ) %>%
left_join(df_plays, by = c("playId" = "playId", "gameId" = "gameId")) %>%
select(x, y, displayName, jerseyNumber, team, gameId, playId, frameId, time,
nflId, dis, playDescription, s) %>%
data.frame()
team1 <- as.character( df_games[df_games$gameId == my_gameId, "homeTeamAbbr"] )
team2 <- as.character( df_games[df_games$gameId == my_gameId, "visitorTeamAbbr"] )
df_team1 <- teams_colors_logos %>%
filter(team_abbr == team1)
df_team2 <- teams_colors_logos %>%
filter(team_abbr == team2)
df_player1 <- df %>%
filter(jerseyNumber == my_playerNumber) %>%
select(gameId, playId, frameId, x, y, dis, team) %>%
data.frame()
df <- df %>%
left_join(df_player1, by = c( "playId" = "playId", "gameId" = "gameId", "frameId" = "frameId")) %>%
# ----------------------------- here is the code that I created in class -----------------------------
# compute the distance to each player on the opposing team (not to your teammates and not to the ball)
mutate(dist = ifelse( team.x != team.y & team.x != "football",
sqrt( (x.x - x.y)^2 + (y.x - y.y)^2),
NA)) %>%
group_by(gameId, playId, frameId) %>%
# compute the average of all distances
mutate(mean_dist_to_player = mean(dist, na.rm = TRUE ),
alpha = 1 - (mean_dist_to_player/113.33) ) %>%
ungroup() %>%
# -------------------------------------- my code ends here ------------------------------------------
# -------------------------------------start your code below ----------------------------------------
mutate(closenessTest = ifelse(dist <= 20, TRUE, FALSE)) %>%
group_by(gameId, playId, frameId) %>%
arrange(frameId) %>%
#mutate(closenessSum = cumsum(replace_na(closenessTest, 0))) %>%
#mutate(alpha = closenessSum / 11) %>%
mutate(weightedDist = ifelse(closenessTest, (100 - (100/sqrt(20))*sqrt(dist)), NA)) %>%
mutate(meanWeightedDist = mean(weightedDist, na.rm= TRUE)) %>%
mutate(alpha = meanWeightedDist / 100) %>%
ungroup() %>%
# ------------------------------------- your code ends here -----------------------------------------
# compute a running total of the distance traveled
arrange(nflId, frameId) %>%
group_by(nflId) %>%
mutate( dist_running_total = cumsum(dis.x)) %>%
rename( "x" = "x.x",
"y" = "y.x",
"team" = "team.x",
"dis" = "dis.x") %>%
data.frame()
# run the line below if you want to visualize the play
# think about what statistical tests to run to compare my Euclidean measure with your measures
# Correlation, Ttest, probability density functions overlapping.
test <- cor.test(df$meanWeightedDist, df$mean_dist_to_player, method = "kendall", use = "complete.obs")
print(test)
##
## Kendall's rank correlation tau
##
## data: df$meanWeightedDist and df$mean_dist_to_player
## z = -30.841, p-value < 2.2e-16
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau
## -0.3358463
visualize_play(df, df_team1, df_team2, playerNumber = 19)
This new statistic metric is a better representation of how much of a threat a defender poses towards the punt returner. We are able to rate a defender’s positioning frame by frame more specifically and create a rating system based on the given distance. However, our correlation test displays that it is not significantly different than our Professor’s created metric.
If interested in more or have any questions or suggestions, please get in contact with me at brygnichols@gmail.com/bgnichols@loyola.edu and ssilguero@loyola.edu.