This project explores NFL player movement to provide insights into crowding and tackle probability. Building on an initial “Traffic Meter Scale,” We enhanced the analysis by incorporating new metrics, such as tackle probability, opponent density, and crowding indices. These additional calculations allow for a more comprehensive view of player behavior during a game, as they highlight both individual and situational dynamics. My goal was to make each metric inform a more precise analysis of in-game conditions that influence player movement and outcomes.
The dataset used in this project comes from the NFL Big Data Bowl, focusing on player tracking data. It includes real-time x-y coordinates of players throughout a play, allowing for dynamic analysis of distances, velocities, and positioning.
We centered the analysis on a specific play from a January 3, 2021, game between Buffalo and Miami, where Isaiah McKenzie completed an 84-yard touchdown run. The data captured each player’s movements, enabling me to calculate metrics that reflect the intensity of surrounding opponent traffic. Key metrics developed include:
Tackle Probability: This metric measures the likelihood of a tackle based on the distance to the nearest defender. Opponent Density: Counts the number of defenders within a 5-yard radius of McKenzie to assess the level of “crowding” around him. Crowding Index: Combines opponent density with the average distance to defenders, offering a refined measure of traffic intensity. These metrics were implemented to highlight not just how many defenders were nearby, but also the specific areas of highest tackle threat along McKenzie’s path.
We created several visualizations to display these metrics and demonstrate the defensive pressure McKenzie faced during the play. These visuals included:
Player Movement Map: A dynamic map of player movements, showing both McKenzie’s path and the positions of defenders. This map uses a color-coded “Traffic Meter Scale” to highlight crowd density around him as he advances downfield.
Tackle Probability Over Time: A line graph showing changes in tackle probability at each frame of the play. Spikes in this graph indicate moments where McKenzie is at a high risk of being tackled, allowing for a focused view of key interactions.
Density Halo Visualization: A halo surrounding McKenzie in the map visual adjusts in size and color based on the opponent density and crowding index, dynamically illustrating moments of high pressure.
Each visualization provided a clearer picture of McKenzie’s risk and movement strategy, with the density measures and tackle probability adding layers of tactical insight. These visualizations illustrate both quantitative and situational aspects of the play, leading to a more nuanced understanding of player movement and defensive tactics in the game.
setwd("C:/Users/jason/Desktop/F24 Classes/IS470")
# ---------------------------------------------------------------------------
# ------------------ Visualize an Individual Play (with player) -------------
# ---------------------------------------------------------------------------
# provide a merged data frame, a game Id, a play Id, and a player.
visualize_play <- function(df,
df_team1,
df_team2,
playerNumber = NA)
{
source('https://raw.githubusercontent.com/mlfurman3/gg_field/main/gg_field.R')
g <- ggplot(data = df, aes(x = x, y = y)) +
# customize colors, shapes, and sizes of players and the football
scale_size_manual(values = c(6, 4, 6), guide = "none") +
scale_shape_manual(values = c(21, 16, 21), guide = "none") +
scale_fill_manual(values = c("dodgerblue1", "#663300", "firebrick1"), guide = "none") +
scale_colour_manual(values = c("black", "#663300", "black"), guide = "none") +
gg_field(yardmin = -5, yardmax = 125) +
# add home team logo to the center point
geom_image(data = df_team1, aes(x = 59, y = 53.33/2, image = team_logo_wikipedia), size = 0.15) +
# add end zone team names
annotate("text", x = 4, y = 27, label = df_team2$team_name, angle = 90, size = 5, fontface = 2, color = "white") +
annotate("text", x = 115, y = 27, label = df_team1$team_name, angle = 270, size = 5, fontface = 2, color = "white") +
# add informational text at the top of the field
annotate("text", x = -3, y = 56, label = "Traffic Meter", colour = "white", vjust = 0, hjust = 0 ) +
annotate("text", x = 38, y = 56, label = "Value:" , colour = "white", vjust = 0, hjust = 0 ) +
annotate("text", x = 52, y = 54, label = "Dist traveled by receiver (yds):", colour = "white", vjust = 0, hjust = 0 ) +
annotate("text", x = 80, y = 56, label = "Speed(yds/sec):", colour = "white", vjust = 0, hjust = 0 ) +
annotate("text", x = -3, y = 54, label = "Tackle Probability:", colour = "white", vjust = 0, hjust = 0) +
# add convex hulls for both teams
geom_encircle( data = df %>% filter(team != "football"),
aes(fill = team,
group = team,
color = team),
s_shape = 1,
expand = 0,
alpha = 0.5,
show.legend = TRUE) +
# add halo around receiver
geom_point(data = df %>% filter(jerseyNumber == playerNumber),
aes(x = x, y = y, alpha = alpha),
shape = 21,
size = 10,
stroke = 1, # width of the circle border
fill = "black",
colour = "black") +
# add points to plot for all players and the football
geom_point(data = df, aes(x = x, y = y, shape = team, colour = team, size = team, fill = team) ) +
# insert jersey number for each player
geom_text( data = df %>% filter(team != "football"),
aes(x = x, y = y,
label = jerseyNumber), colour = "white", size = 3.5, vjust = 0.36 ) +
# report out key metrics (total yards ran by receiver)
geom_text( data = df %>% filter(jerseyNumber == playerNumber),
aes(x = 95, y = 54, label = format(dist_running_total, nsmall = 2),
vjust = 0, hjust = 0), color = "white" ) +
# report out key metrics (mean dist to player)
geom_text( data = df %>% filter(jerseyNumber == playerNumber),
aes(x = 48, y = 56, label = format(round(100 * (1 - (mean_dist_to_player/113.33)),2), nsmall = 2),
vjust = 0, hjust = 0), color = "white" ) +
# report out key metrics (speed of receiver)
geom_text( data = df %>% filter(jerseyNumber == playerNumber),
aes(x = 104, y = 56, label = s, vjust = 0, hjust = 0), color = "white" ) +
# Report out the tackle probability (displayed as a percentage)
geom_text(data = df %>% filter(jerseyNumber == playerNumber),
aes(x = 23, y = 54, label = format(round(tackle_probability * 100, 2), nsmall = 2)),
color = "white", vjust = 0, hjust = 0) +
# add some labels to report the play description
labs(title = df$playDescription) +
# set the theme to dark green to color the areas beyond the end zones
theme(panel.background = element_rect(fill = "forestgreen",
color = "forestgreen"), panel.grid = element_blank()) +
annotate("rect", xmin = 15, xmax = 35, ymin = 56, ymax = 57, color = "black", fill = "white") +
geom_rect(data = df %>% filter(jerseyNumber == playerNumber),
aes(xmin = 15, xmax = 15 + (alpha*20), ymin = 56, ymax = 57,
alpha = alpha), color = "black", fill = "black" ) +
guides(alpha = "none") +
transition_time(frameId)
g
}
load_packages <- function(packages = c()) {
if (length(packages) == 0) {
print('You did not specify any packages/libraries to load')
}
else {
for (i in packages){
if(! i %in% installed.packages()){
install.packages(i, dependencies = TRUE)
}
suppressMessages(suppressWarnings(library(i, character.only=T)))
}
}
}
required_packages <- c("ggplot2", "ggalt", "ggforce", "hms",
"gganimate", "lubridate", "data.table",
"dplyr", "nflfastR", "gifski", "png", "ggimage",
"RColorBrewer", "plotly", "reshape2", "gt")
load_packages(required_packages)
df_tracking <- fread("Data/NFLBDB2022/tracking2020.csv")
df_plays <- fread("Data/NFLBDB2022/plays.csv")
df_games <- fread("Data/NFLBDB2022/games.csv")
# you are creating a metric for a player in a play (playId) in a game (gameId)
my_gameId <- 2021010300
my_playId <- 1586
my_playerNumber <- 19
df <- df_tracking %>%
filter(gameId == my_gameId & playId == my_playId ) %>%
left_join(df_plays, by = c("playId" = "playId", "gameId" = "gameId")) %>%
select(x, y, displayName, jerseyNumber, team, gameId, playId, frameId, time,
nflId, dis, playDescription, s) %>%
data.frame()
team1 <- as.character( df_games[df_games$gameId == my_gameId, "homeTeamAbbr"] )
team2 <- as.character( df_games[df_games$gameId == my_gameId, "visitorTeamAbbr"] )
df_team1 <- teams_colors_logos %>%
filter(team_abbr == team1)
df_team2 <- teams_colors_logos %>%
filter(team_abbr == team2)
df_player1 <- df %>%
filter(jerseyNumber == my_playerNumber) %>%
select(gameId, playId, frameId, x, y, dis, team) %>%
data.frame()
df <- df %>%
left_join(df_player1, by = c( "playId" = "playId", "gameId" = "gameId", "frameId" = "frameId")) %>%
# ----------------------------- here is the code that I created in class -----------------------------
# compute the distance to each player on the opposing team (not to your teammates and not to the ball)
mutate(dist = ifelse( team.x != team.y & team.x != "football",
sqrt( (x.x - x.y)^2 + (y.x - y.y)^2),
NA)) %>%
group_by(gameId, playId, frameId) %>%
# compute the average of all distances
mutate(mean_dist_to_player = mean(dist, na.rm = TRUE ),
alpha = 1 - (mean_dist_to_player/113.33) ) %>%
ungroup() %>%
# -------------------------------------- my code ends here ------------------------------------------
# -------------------------------------start your code below ----------------------------------------
mutate(made_up_dist = 1 + sample(c(1:100), nrow(.), replace = TRUE) / 10) %>%
group_by(gameId, playId, frameId) %>%
mutate(made_up_mean_dist_to_player = 1,
alpha = 0.5) %>%
ungroup() %>%
mutate(dist = ifelse( team.x != team.y & team.x != "football",
sqrt( (x.x - x.y)^2 + (y.x - y.y)^2),
NA)) %>%
group_by(gameId, playId, frameId) %>%
# compute the average of all distances
mutate(min_dist_to_defender = min(dist, na.rm = TRUE),
# Compute the average of all distances for 'alpha' (if needed)
mean_dist_to_player = mean(dist, na.rm = TRUE),
# Define 'alpha' as a measure of distance
alpha = 1 - (mean_dist_to_player / 113.33),
# Calculate tackle probability metric based on nearest defender
tackle_probability = 1 - (min_dist_to_defender / 113.33)) %>%
ungroup() %>%
# Calculate distance to each opponent and crowding index
mutate(dist_to_opponent = ifelse(team.x != team.y & team.x != "football",
sqrt((x.x - x.y)^2 + (y.x - y.y)^2),
NA)) %>%
group_by(gameId, playId, frameId) %>%
# Count opponents within 5 yards
mutate(opponent_density = sum(dist_to_opponent < 5, na.rm = TRUE), # count of opponents within 5 yards
crowding_index = opponent_density / (113.33 - mean_dist_to_player + 1), # preventing division by zero
mean_dist_to_player = mean(dist, na.rm = TRUE),
alpha = 1 - (mean_dist_to_player / 113.33)) %>%
ungroup() %>%
# ------------------------------------- your code ends here -----------------------------------------
# compute a running total of the distance traveled
arrange(nflId, frameId) %>%
group_by(nflId) %>%
mutate( dist_running_total = cumsum(dis.x)) %>%
rename( "x" = "x.x",
"y" = "y.x",
"team" = "team.x",
"dis" = "dis.x") %>%
data.frame()
# run the line below if you want to visualize the play
visualize_play(df, df_team1, df_team2, playerNumber = 19)
# Calculate Pearson correlations
correlation_results <- cor(df[, c("dist", "mean_dist_to_player", "opponent_density", "crowding_index", "tackle_probability")], use = "complete.obs")
correlation_df <- as.data.frame(correlation_results)
correlation_df <- tibble::rownames_to_column(correlation_df, "Variable")
# Display with gt
correlation_df %>%
gt() %>%
tab_header(title = "Pearson Correlation Matrix for Key Variables") %>%
fmt_number(columns = -1, decimals = 2) %>%
tab_options(
table.font.size = "small",
heading.align = "center"
)
| Pearson Correlation Matrix for Key Variables | |||||
| Variable | dist | mean_dist_to_player | opponent_density | crowding_index | tackle_probability |
|---|---|---|---|---|---|
| dist | 1.00 | 0.90 | −0.64 | −0.65 | −0.83 |
| mean_dist_to_player | 0.90 | 1.00 | −0.71 | −0.72 | −0.92 |
| opponent_density | −0.64 | −0.71 | 1.00 | 1.00 | 0.61 |
| crowding_index | −0.65 | −0.72 | 1.00 | 1.00 | 0.62 |
| tackle_probability | −0.83 | −0.92 | 0.61 | 0.62 | 1.00 |
p <- ggplot(df, aes(x = dist, y = mean_dist_to_player)) +
geom_point(aes(color = as.factor(opponent_density), size = crowding_index), alpha = 0.7) +
scale_color_brewer(palette = "Set1") + # Use a color palette that enhances visibility
labs(title = "Distance Traveled vs. Mean Distance to Player",
x = "Distance Traveled (yards)",
y = "Mean Distance to Opponent (yards)",
color = "Opponent Density",
size = "Crowding Index") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5),
legend.title = element_text(size = 14),
legend.text = element_text(size = 12),
legend.key.size = unit(1.5, "cm"))
ggplotly(p)
ggplot() +
geom_density(data = df[!is.na(df$dist), ], aes(x = dist, fill = "Distance Traveled"), alpha = 0.5) +
geom_density(data = df[!is.na(df$mean_dist_to_player), ], aes(x = mean_dist_to_player, fill = "Mean Distance to Opponent"), alpha = 0.5) +
labs(title = "Density Plots of Distance Metrics",
x = "Distance (yards)",
fill = "Metrics") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5),
legend.title = element_text(size = 14),
legend.text = element_text(size = 12),
legend.key.size = unit(1.5, "cm"))
t <- ggplot(df, aes(x = mean_dist_to_player, y = tackle_probability, color = as.factor(opponent_density))) +
geom_point(alpha = 0.7, size = 3) +
labs(title = "Tackle Probability vs. Mean Distance to Opponent",
x = "Mean Distance to Opponent (yards)",
y = "Tackle Probability",
color = "Opponent Density") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5))
ggplotly(t)
This analysis provides a detailed look at the spatial and tactical dynamics within a single NFL play, demonstrating how tracking data can yield actionable insights. By enhancing the initial “Traffic Meter Scale” with metrics such as tackle probability, opponent density, and crowding indices, we created a more robust framework for evaluating defensive pressure and player movement.
The analysis shows that high tackle probability zones often coincide with crowded areas, indicating that defenders’ positioning plays a critical role in disrupting play progression. Additionally, the metrics developed here could be valuable for applications beyond a single play, potentially aiding teams in assessing player positioning strategies and defensive setups.
These metrics not only bring clarity to complex in-game scenarios but also open doors for further development in player performance and play analysis, ultimately providing a foundation for more predictive and descriptive modeling of player interactions in future plays.
```