Introduction

In this project, we analyzed red zone football plays to uncover strategic insights that can enhance offensive and defensive decision-making. By merging datasets from games, players, plays, and tracking, we filtered red zone plays (within the opponent’s 20-yard line) and evaluated key factors such as player movements, offensive formations, and expected vs. actual outcomes. Our analysis leveraged efficiency metrics, heatmaps, and play success trends to highlight actionable opportunities for improving red zone performance.

Description of Project

This project analyzes NFL red zone plays using comprehensive tracking and game data. By integrating various datasets (e.g., games, plays, players, and tracking data), we explore critical factors influencing play outcomes, offensive efficiency, and player movements in the red zone. The analysis focuses on:

  1. Pre-Snap and Post-Snap Efficiency: Evaluating offensive formations, player motion, and their impact on yards gained, expected points, and touchdown rates.
  2. Player Performance Metrics: Analyzing tracking data to assess speed, acceleration, and orientation for effectiveness.
  3. Red Zone Scoring Efficiency: Comparing expected vs. actual points to identify key contributors to success in the red zone.
  4. Formation Analysis by Depth: Examining formation preferences and success rates based on the red zone depth (goal line, 5-10 yards, etc.).
  5. Visualizing Insights: Using heatmaps, bar charts, and scatterplots to provide a clear understanding of trends and relationships in red zone plays.

The overarching goal is to uncover actionable insights for optimizing offensive strategies in the red zone, where scoring efficiency is critical to success in football.

Data Visualization

The analysis uses visualizations to highlight key trends and findings, including:

  • Heatmaps to illustrate player movements in the red zone.
  • Bar charts to compare offensive formations and scoring efficiency.
  • Scatterplots and boxplots to examine relationships between variables like down, formation, and expected points.
  • Depth-based visualizations to analyze red zone strategies by field position.
setwd("C:/Users/jason/Desktop/F24 Classes/IS470")
source('https://raw.githubusercontent.com/ptallon/SportsAnalytics_Fall2024/main/SharedCode.R') 
# From Prof's Assignment 3 code...
load_packages(c("httr", "ggplot2", "ggalt", "ggforce", "hms", "gganimate", "lubridate", "data.table", "dplyr", 
                "nflfastR", "gifski", "png", "ggimage", "randomForest", "DescTools", "progress", "tidyr", "caret", "scales", "viridis", "kableExtra"))


# 1. Merge Datasets for Comprehensive Analysis
# Load additional datasets to include context and player information.
games <- fread("Data/NFLBDB2025/games.csv")
player_play <- fread("Data/NFLBDB2025/player_play.csv")
plays <- fread("Data/NFLBDB2025/plays.csv")
players <- fread("Data/NFLBDB2025/players.csv")
tracking <- fread("Data/NFLBDB2025/tracking_week_1.csv")  # Replace with other weeks as needed

df <- left_join(tracking, games, by = c("gameId"))
df <- left_join(df, plays, by = c("gameId", "playId"))
df <- left_join(df, players, by = c("nflId"))
df <- left_join(df, player_play, by = c("gameId", "playId", "nflId"))

# 2. Filter Red Zone Plays
# We focus only on plays happening within the opponent's 20-yard line,
# which is considered the red zone, using the absoluteYardlineNumber column.
red_zone_plays <- df %>%
  filter(absoluteYardlineNumber <= 20)

# 3. Pre-Snap Efficiency Analysis
# Analyze offensive formations in the red zone to see which are most effective.
# We group by offenseFormation and calculate average yards gained, expected points,
# and the touchdown rate for each formation type.
formation_analysis <- df %>%
  mutate(offenseFormation = replace_na(offenseFormation, "Other")) %>%
  group_by(offenseFormation) %>%
  summarize(
    avg_yards_gained = mean(yardsToGo),
    avg_expected_points = mean(expectedPointsAdded),
    touchdown_rate = mean(ifelse(event == "touchdown", 1, 0))
  )

# Make the table prettier with kableExtra
table_data <- table(df$offenseFormation, useNA = "ifany")
kable(table_data, caption = "Offense Formation Distribution") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
Offense Formation Distribution
Var1 Freq
EMPTY 563086
I_FORM 491648
JUMBO 30176
PISTOL 298632
SHOTGUN 4002828
SINGLEBACK 1674653
WILDCAT 8418
NA 35259
# Examine pre-snap movements like player motions or shifts, 
# and assess their impact on the average yards gained and expected points.
pre_snap_analysis <- df %>%
  filter(inMotionAtBallSnap == TRUE | motionSinceLineset == TRUE) %>%
  summarize(
    avg_yards_gained = mean(yardsToGo, na.rm = TRUE),
    avg_expected_points = mean(expectedPointsAdded, na.rm = TRUE)
  )

# Print the results of the pre-snap analysis with improved formatting
kable(pre_snap_analysis, caption = "Pre-Snap Analysis of Movements and Expected Points") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = F)
Pre-Snap Analysis of Movements and Expected Points
avg_yards_gained avg_expected_points
8.327148 -0.0077426
kable(formation_analysis, caption = "Formation Analysis in Red Zone") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = F)
Formation Analysis in Red Zone
offenseFormation avg_yards_gained avg_expected_points touchdown_rate
EMPTY 8.261171 -0.0993501 NA
I_FORM 8.044910 -0.1050960 NA
JUMBO 3.629573 -0.1293538 NA
Other 10.172211 -0.1030225 NA
PISTOL 8.902110 0.0512716 NA
SHOTGUN 8.429796 0.0025405 NA
SINGLEBACK 8.437379 -0.0192747 NA
WILDCAT 3.989071 -2.1371462 NA
# 4. Post-Snap Analysis Using Tracking Data
# Assess player speed, acceleration, and orientation during plays to evaluate their effectiveness.
tracking_analysis <- df %>%
  group_by(nflId) %>%
  summarize(
    avg_speed = mean(s, na.rm = TRUE),
    avg_acceleration = mean(a, na.rm = TRUE),
    avg_orientation = mean(o, na.rm = TRUE)
  )

# Analyze key play events like ball_snap, pass_forward, and touchdown,
# and calculate the average yards gained and play success rate for these events.
event_analysis <- df %>%
  filter(event %in% c("ball_snap", "pass_forward", "touchdown")) %>%
  group_by(event) %>%
  summarize(
    avg_yards_gained = mean(yardsToGo, na.rm = TRUE),
    play_success_rate = mean(ifelse(event == "touchdown", 1, 0), na.rm = TRUE)
  )

# 5. Develop Efficiency Metrics
# Create new metrics to evaluate red zone play success and scoring efficiency.
# Calculate play success based on gaining the necessary yards or scoring a touchdown.
# Scoring efficiency is calculated as expected points added divided by yards to go.
efficiency_metrics <- df %>%
  mutate(
    play_success = ifelse(yardsGained >= yardsToGo | event == "touchdown", 1, 0),
    scoring_efficiency = expectedPointsAdded / yardsToGo
  ) %>%
  summarize(
    play_success_rate = mean(play_success, na.rm = TRUE),
    avg_scoring_efficiency = mean(scoring_efficiency, na.rm = TRUE)
  )

# Print the calculated efficiency metrics with improved formatting
kable(event_analysis, caption = "Event Analysis for Key Play Outcomes") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = F)
Event Analysis for Key Play Outcomes
event avg_yards_gained play_success_rate
ball_snap 8.355236 0
pass_forward 8.558442 0
touchdown 5.291667 1

Visualization 1 : The heatmap visualizes player movements in the red zone, revealing concentrated activity between the 15-25-yard markers along the central axis of the field. The data also highlights a hotspot near the 20-yard line, indicating intensified play activity as teams approach scoring opportunities. These patterns suggest heavily utilized areas that could be targeted for strategic offensive or defensive adjustments, helping teams refine their red zone approaches.

# 6. Visualize Key Insights
# Create a heatmap of player movements to show density in the red zone.

# Filter the data to include only red zone (x between 10 and 30)
red_zone_data <- tracking %>%
  filter(x >= 10 & x <= 30)

# Create the heatmap with density
g <- ggplot(red_zone_data, aes(x = x, y = y)) +
  geom_bin2d(bins = 30, aes(fill = ..count..)) +  # Bin data and fill based on count
  scale_fill_viridis(option = "turbo") +  # Use the Inferno color scale
  labs(title = "Heatmap of Player Movements in the Red Zone",
       x = "Field Position X (10 to 30 yards)",
       y = "Field Position Y") +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),  # Center the title
    axis.text.x = element_text(angle = 45, hjust = 1),  # Angle x-axis labels for better visibility
    axis.text.y = element_text(angle = 0, hjust = 1)  # Keep y-axis labels readable
  )

# Print the plot
print(g)


Visualization 2: This analysis evaluates the scoring efficiency of various offensive formations in the red zone using Expected Points Added (EPA). Formations such as Wildcat performed poorly, showcasing significantly lower efficiency. Conversely, standard formations like Shotgun and I-Form displayed consistent performance with minimal variation in EPA. These insights enable teams to prioritize effective formations while avoiding those with low efficiency, enhancing their red zone success rates.

n <- nrow(formation_analysis)
pb <- progress_bar$new(
  format = "  Processing [:bar] :percent in :elapsed",
  total = n, clear = FALSE, width = 60
)

# Loop through formations (simulating a calculation per formation)
for (i in 1:n) {
  Sys.sleep(0.1)  # Simulating a time-consuming calculation
  pb$tick()       # Update the progress bar
}

# Create a bar plot after calculations
# Horizontal bar chart for red zone efficiency
red_zone_plot <- ggplot(formation_analysis, aes(x = reorder(offenseFormation, avg_expected_points), y = avg_expected_points)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  coord_flip() +
  theme_minimal() +
  labs(
    title = "Red Zone Efficiency by Offensive Formation",
    x = "Offensive Formation",
    y = "Average Expected Points Added"
  ) +
  theme(
    plot.title = element_text(size = 20, hjust = 0.5),  # Center the title
    axis.text.x = element_text(size = 15), 
    axis.text.y = element_text(size = 15),
    axis.title = element_text(size = 18)
  )

print(red_zone_plot)


Visualization 3: Formation usage varies across different red zone depths, with Shotgun dominating at both 10-15 and 15-20-yard markers. Single back is the second most common, particularly at 15-20 yards, while specialized formations like Jumbo and Pistol serve specific strategic roles. This analysis highlights how teams adjust their formation preferences based on field position, helping optimize their play-calling strategies in the red zone. Understanding formation tendencies at different red zone depths provides valuable insights for defenses, enabling them to anticipate offensive strategies and adjust their alignments accordingly. This knowledge can help defenses better position themselves to disrupt plays and reduce scoring opportunities.

# Reworking formation preference summary to be based on unique plays and retaining necessary columns
formation_preference <- red_zone_plays %>%
  distinct(gameId, playId, offenseFormation, absoluteYardlineNumber, yardsToGo, event, expectedPointsAdded) %>%  # Retain necessary columns
  mutate(red_zone_depth = case_when(
    absoluteYardlineNumber <= 5 ~ "Goal Line",
    absoluteYardlineNumber <= 10 ~ "5-10 Yards",
    absoluteYardlineNumber <= 15 ~ "10-15 Yards",
    TRUE ~ "15-20 Yards"
  )) %>%
  group_by(offenseFormation, red_zone_depth) %>%
  summarize(
    count = n(),
    avg_yards_gained = mean(yardsToGo, na.rm = TRUE),
    touchdown_rate = mean(ifelse(event == "touchdown", 1, 0)),
    avg_expected_points = mean(expectedPointsAdded, na.rm = TRUE)
  ) %>%
  arrange(red_zone_depth, desc(count))

# Print the formation preference summary with kableExtra for better styling
kable(formation_preference, caption = "Formation Preference in Red Zone by Depth") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
Formation Preference in Red Zone by Depth
offenseFormation red_zone_depth count avg_yards_gained touchdown_rate avg_expected_points
SHOTGUN 10-15 Yards 213 4.352113 NA -0.0873441
SINGLEBACK 10-15 Yards 99 5.191919 NA -0.2274308
I_FORM 10-15 Yards 55 3.400000 NA 0.2396531
JUMBO 10-15 Yards 25 1.000000 NA 0.5331242
EMPTY 10-15 Yards 7 4.000000 NA 1.3909096
SHOTGUN 15-20 Yards 209 7.488038 NA 0.1717825
SINGLEBACK 15-20 Yards 60 7.733333 NA -0.3738211
I_FORM 15-20 Yards 32 8.687500 NA -0.3113056
EMPTY 15-20 Yards 12 5.500000 NA -0.5233404
PISTOL 15-20 Yards 8 9.000000 NA 0.3271988
# Visualize Formation Preferences by Red Zone Depth
formation_depth_plot <- ggplot(formation_preference, aes(x = red_zone_depth, y = count, fill = offenseFormation)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme_minimal() +
  labs(
    title = "Formation Preferences by Red Zone Depth (Based on Plays)",
    x = "Red Zone Depth",
    y = "Number of Plays",
    fill = "Formation"
  ) +
  scale_fill_viridis_d(option = "plasma", direction = -1) +  # Better color palette
  theme(
    plot.title = element_text(size = 20, face = "bold", hjust = 0.5),
    axis.title = element_text(size = 17),
    axis.text = element_text(size = 14),
    legend.title = element_text(size = 14),
    legend.text = element_text(size = 12)
  )

# Print the plot
print(formation_depth_plot)


Visualization 4: This comparison evaluates the performance of offensive formations against their expected outcomes in the red zone. The Jumbo formation consistently exceeded expectations, demonstrating exceptional reliability and effectiveness. Meanwhile, formations like Single back underperformed, indicating inefficiencies. These findings allow teams to focus on formations that consistently deliver results and refine underperforming strategies to improve overall efficiency.

# Expected vs Actual Points Analysis
expected_vs_actual <- red_zone_plays %>%
  mutate(
    actual_points = ifelse(event == "touchdown", 7, ifelse(event == "field_goal", 3, 0))
  ) %>%
  summarize(
    avg_expected_points = mean(expectedPointsAdded, na.rm = TRUE),
    avg_actual_points = mean(actual_points, na.rm = TRUE)
  )

# Print the expected vs actual points table with kableExtra
kable(expected_vs_actual, caption = "Expected vs Actual Points in Red Zone") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
Expected vs Actual Points in Red Zone
avg_expected_points avg_actual_points
-0.0720081 0.1674641
# Motion and Formation Impact on Points
motion_impact <- red_zone_plays %>%
  filter(inMotionAtBallSnap == TRUE | motionSinceLineset == TRUE) %>%
  summarize(
    avg_expected_points_motion = mean(expectedPointsAdded, na.rm = TRUE),
    avg_actual_points_motion = mean(ifelse(event == "touchdown", 7, ifelse(event == "field_goal", 3, 0)), na.rm = TRUE)
  )

formation_impact <- red_zone_plays %>%
  group_by(offenseFormation) %>%
  summarize(
    avg_expected_points = mean(expectedPointsAdded, na.rm = TRUE),
    avg_actual_points = mean(ifelse(event == "touchdown", 7, ifelse(event == "field_goal", 3, 0)), na.rm = TRUE)
  )

# Print motion and formation impact tables with kableExtra
kable(motion_impact, caption = "Impact of Motion on Expected and Actual Points") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
Impact of Motion on Expected and Actual Points
avg_expected_points_motion avg_actual_points_motion
-0.1384531 0.1761006
kable(formation_impact, caption = "Impact of Formation on Expected and Actual Points") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
Impact of Formation on Expected and Actual Points
offenseFormation avg_expected_points avg_actual_points
EMPTY -0.0064161 0.0000000
I_FORM 0.0071192 0.2763158
JUMBO 0.5139676 0.6363636
PISTOL 0.3271988 0.0000000
SHOTGUN -0.0327270 0.1338798
SINGLEBACK -0.3181078 0.1500000
# Visualize Comparison of Expected vs Actual Points by Offensive Formation
g2 <- ggplot(formation_impact, aes(x = reorder(offenseFormation, avg_actual_points))) +
  geom_bar(aes(y = avg_actual_points, fill = "Actual Points"), stat = "identity", position = "dodge", alpha = 0.8) +
  geom_bar(aes(y = avg_expected_points, fill = "Expected Points"), stat = "identity", position = "dodge", alpha = 0.8) +
  geom_text(aes(y = avg_actual_points, label = round(avg_actual_points, 2)), vjust = -0.5, color = "black", size = 4.5, fontface = "bold") +
  geom_text(aes(y = avg_expected_points, label = round(avg_expected_points, 2)), vjust = 1.5, color = "black", size = 4.5, fontface = "bold") +
  theme_minimal() +
  labs(
    title = "Comparison of Expected vs. Actual Points by Offensive Formation",
    subtitle = "Bars sorted by average actual points",
    x = "Offensive Formation",
    y = "Average Points",
    fill = "Points Type"
  ) +
  scale_fill_manual(values = c("Expected Points" = "green", "Actual Points" = "coral")) +
  theme(
    plot.title = element_text(size = 20, face = "bold", hjust = 0.5),
    plot.subtitle = element_text(size = 14, hjust = 0.5),
    axis.title = element_text(size = 18),
    axis.text.x = element_text(size = 14, angle = 45, hjust = 1),
    axis.text.y = element_text(size = 14),
    legend.title = element_text(size = 14),
    legend.text = element_text(size = 12)
  )

print(g2)

Visualization 6: This analysis delves into the reliability and variability of offensive formations in the red zone. Consistent formations like I-Form and Jumbo yielded positive EPA outcomes, proving to be reliable choices for critical plays. On the other hand, the Empty formation exhibited significant variability, demonstrating both high-risk and high-reward potential. These findings provide a framework for selecting formations tailored to specific game situations and risk tolerances.

# Box plot for Offense Formation vs Expected Points Added
ggplot(expected_points_analysis, aes(x = reorder(offenseFormation, -expectedPointsAdded), y = expectedPointsAdded)) +
  geom_boxplot(aes(fill = offenseFormation), show.legend = FALSE, alpha = 0.8, outlier.color = "red") +
  scale_fill_viridis_d(option = "turbo") +
  labs(
    title = "Offensive Formation vs Expected Points Added",
    x = "Offensive Formation",
    y = "Expected Points Added"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 20, face = "bold", hjust = 0.5),
    axis.title = element_text(size = 18),
    axis.text.x = element_text(size = 14, angle = 45, hjust = 1),
    axis.text.y = element_text(size = 14)
  )


Conclusion

This project represents an exciting step toward uncovering valuable insights into red zone football strategies, but it is far from complete. Over the upcoming break, we plan to expand and refine our analysis before submitting it to the NFL Big Data Bowl. Our future efforts include developing a predictive model to assess the likelihood of successful plays and creating dynamic animations to better visualize player movements and formations.

While our current focus on red zone plays has provided targeted insights, it also limits the broader applicability of our findings. To address this, we are considering expanding our analysis to include the entire football field, which would allow us to uncover patterns and trends across all areas of play. This broader approach could provide even deeper insights, enhancing our understanding of game dynamics and maximizing the potential impact of our work.

```