We were tasked with improving the metric provided to identify the defensive players who were closest to the player of interest/ had the shortest distance from them . We then were tasked with comparing this data and visualizing the results
To modify the metric used to calculate distance from the player, we used the arrange function to sort the players from greatest to least distance from the player of interest. We then isolated the last three players from these results, the players with the shortest distance to the player per frame. To analyze this metric and the metric provided to us, we used a t test to compare.
To visualize the comparison/ analysis done with the t test, we created a density plot to model the difference between the given and developed results
setwd("C:/Users/genny/OneDrive/Documents/IS470 Sports Analysis")
library(ggplot2)
library(data.table)
library(dplyr)
library(scales)
library(tidytext)
library(RColorBrewer)
library(kableExtra)
library(scales)
library(patchwork)
library(reshape2)
df_tracking <- fread("Data/NFLBDB2022/tracking2020.csv")
df_plays <- fread("Data/NFLBDB2022/plays.csv")
df_games <- fread("Data/NFLBDB2022/games.csv")
# you are creating a metric for a player in a play (playId) in a game (gameId)
my_gameId <- 2021010300
my_playId <- 1586
my_playerNumber <- 19
df <- df_tracking %>%
filter(gameId == my_gameId & playId == my_playId ) %>%
left_join(df_plays, by = c("playId" = "playId", "gameId" = "gameId")) %>%
select(x, y, displayName, jerseyNumber, team, gameId, playId, frameId, time,
nflId, dis, playDescription, s) %>%
data.frame()
df_player1 <- df %>%
filter(jerseyNumber == my_playerNumber) %>%
select(gameId, playId, frameId, x, y, dis, team) %>%
data.frame()
results <- df %>%
left_join(df_player1, by = c( "playId" = "playId", "gameId" = "gameId", "frameId" = "frameId")) %>%
# ----------------------------- here is the code that I created in class -----------------------------
# compute the distance to each player on the opposing team (not to your teammates and not to the ball)
mutate(dist = ifelse(team.x != team.y & team.x != "football",
sqrt((x.x - x.y)^2 + (y.x - y.y)^2),
NA)) %>%
group_by(gameId, playId, frameId) %>%
mutate(mean_dist_to_player = mean(dist, na.rm = TRUE),
alpha = 1 - (mean_dist_to_player / 113.33)) %>%
# -------------------------------------start your code below ----------------------------------------
# Arrange code by frameId and distance to determine closest three players
arrange(frameId, dist) %>% # Arrange by frameId and dist
group_by(frameId, team.x == "away") %>%
mutate(closest3_distance = nth(dist, 3, order_by = dist),
closest_players = ifelse(dist <= closest3_distance, 1, 0)) %>%
ungroup() %>%
data.frame()
t_test_data <- results %>%
select(mean_dist_to_player, closest3_distance)%>%
na.omit()
t_test_result <- t.test(t_test_data$mean_dist_to_player,
t_test_data$closest3_distance)
print(t_test_result)
##
## Welch Two Sample t-test
##
## data: t_test_data$mean_dist_to_player and t_test_data$closest3_distance
## t = 10.107, df = 5045.4, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 4.102183 6.076631
## sample estimates:
## mean of x mean of y
## 27.08562 21.99621
# correlation test
cor_results <- cor.test(t_test_data$mean_dist_to_player,
t_test_data$closest3_distance , method = "pearson")
print(cor_results)
##
## Pearson's product-moment correlation
##
## data: t_test_data$mean_dist_to_player and t_test_data$closest3_distance
## t = 371.09, df = 2539, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9901746 0.9915846
## sample estimates:
## cor
## 0.9909068
# ------------------------------------- your code ends here -----------------------------------------
#density graph histogram
d_df <- melt(results %>% select(mean_dist_to_player,closest3_distance, dist)%>% filter(!is.na(dist)), id = c())
# Plot
plot1 <- ggplot(d_df, aes(x = value, fill = variable)) +
geom_density(fill = "lightblue", alpha = 0.2) +
facet_wrap(~variable, scale = "free", nrow = 1)
# Print the plot
print(plot1)
After filtering and sorting the data to units of interest, we were able to plot and visualize the different densities of the mean distance vs the closest three players’ distance. We found that the Closest 3 players, as expected had shorter distances to the target player overall in comparison to the mean distance of the players.
Again as expected the mean distance has a larger distribution across the graph, peaking at 12-15 units and again at 50
There is significant overlap around 12-15 units, suggesting correlation between the mean and the closest three players around that time
These plots suggests that players mean distance includes farther players leading to a spread out distribution, but upon consolidating the data of interest to the closest three players, results in consistently shorter distances. This is relevant in certain tactical contexts like analyzing traffic in this example and/or close player interactions or pressure
We also used a Pearson correlation test to determine the correlation
between the average distance the three closest players. The results were
as follows: a strong positive correlation : .99 indicating a strong
positive correlation between the two variables p-value: <2.2e-16 very
low p value indicating strong likelihood that the correlation is not
random and supports the hypothesis that this is a meaningful
relationship
confidence interval of 95% from 0.9901746 to 0.9915846, which is very
close to 1 and indicates true correlation being close to .99