Introduction

In this report, we explore the performance of NFL kickers in field goals using data from the NFL Big Data Bowl 2022. We look into field goal success based on kick length, expected points, and “clutch” situations where the score difference is tight, and the game is in its final moments.

# Load required libraries
library(data.table)
library(ggplot2)
library(dplyr)
library(tidyr)
library(scales)  # For percentage formatting

Step 1: Data Filtering and Preprocessing

We first filter the data for field goal plays, create a binary variable indicating success or failure, and make necessary adjustments for blocked kicks.

# Read in and merge four data files
my_df1 <- fread("NFL Big Data Bowl 2022/plays.csv")
my_df2 <- fread("NFL Big Data Bowl 2022/players.csv")
my_df3 <- fread("NFL Big Data Bowl 2022/games.csv")
my_df4 <- fread("NFL Big Data Bowl 2022/PFFScoutingData.csv")

df <- left_join(my_df1, my_df2, by = c("kickerId" = "nflId"))
df <- left_join(df, my_df3, by = c("gameId"))
df <- left_join(df, my_df4, by = c("gameId", "playId"))

# Filter for field goal plays and relevant outcomes
field_goal_data_filtered <- df %>%
  filter(specialTeamsPlayType == "Field Goal", 
         specialTeamsResult %in% c("Kick Attempt Good", "Kick Attempt No Good", "Blocked Kick Attempt")) %>%
  mutate(success = ifelse(specialTeamsResult == "Kick Attempt Good", 1, 0),
         kickLength = ifelse(specialTeamsResult == "Blocked Kick Attempt", yardlineNumber + 18, kickLength))

Step 2: Logistic Regression for Field Goal Success Based on Kick Length

We fit a logistic regression model to predict field goal success based on kick length and plot the predicted probability of success.

# Fit a logistic regression model to predict success based on kick length
field_goal_model <- glm(success ~ kickLength, data = field_goal_data_filtered, family = binomial())

# Generate predicted probabilities of success for a range of kick lengths (20 to 67 yards)
new_data <- data.frame(kickLength = seq(20, 67, by = 1))
new_data$predicted_success <- predict(field_goal_model, newdata = new_data, type = "response")

# Extract model coefficients (intercept and slope)
intercept <- coef(field_goal_model)[1]
slope <- coef(field_goal_model)[2]

# Format the formula as a string for annotation
formula_text <- sprintf("log(p / (1 - p)) = %.2f + %.2f * kickLength", intercept, slope)

# Plot the predicted success rate by kick length with the formula
plot1 <- ggplot(new_data, aes(x = kickLength, y = predicted_success)) +
  geom_line() +
  labs(
    title = "Predicted Probability of Field Goal Success by Kick Length",
    x = "Kick Length (yards)",
    y = "Predicted Probability of Success"
  ) +
  theme_minimal() +
  scale_x_continuous(breaks = seq(20, 67, by = 5)) +
  
  # Add the formula as an annotation
  annotate("text", x = 30, y = 0.1, label = formula_text, hjust = 0, size = 5, color = "blue")

# Print the plot
plot1

Step 3: Calculating Expected Points and Expected Points Added (eXPA)

Here we calculate the expected points based on the predicted success probabilities and plot average expected points by kick length.

# Calculate expected points
field_goal_data_filtered <- field_goal_data_filtered %>%
  mutate(predicted_success_prob = predict(field_goal_model, newdata = field_goal_data_filtered, type = "response"),
         expected_points = 3 * predicted_success_prob,
         Points = ifelse(success == 1, 3, 0),
         eXPA = Points - expected_points)

# Summarize expected points by kick length
expected_points_summary <- field_goal_data_filtered %>%
  group_by(kickLength) %>%
  summarise(average_expected_points = mean(expected_points))

# Bar chart showing average expected points by kick length
ggplot(expected_points_summary, aes(x = factor(kickLength), y = average_expected_points)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  labs(title = "Expected Points by Kick Length", 
       x = "Kick Length (yards)", 
       y = "Expected Points") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Step 4: Visualizing Expected Points vs. Probability of Success

In this plot, we visualize both expected points and predicted probability of success on a dual y-axis to analyze their relationship

# Dual y-axis plot: expected points and probability of success
ggplot() +
  geom_bar(data = expected_points_summary, aes(x = kickLength, y = average_expected_points), 
           stat = "identity", fill = "steelblue", alpha = 0.6) +
  geom_line(data = new_data, aes(x = kickLength, y = predicted_success * 3), color = "orange", size = 1.2) +
  scale_y_continuous(name = "Expected Points", limits = c(0, 3), 
                     sec.axis = sec_axis(~./3, name = "Predicted Probability of Success", 
                                         labels = percent_format())) +
  scale_x_continuous(breaks = seq(20, 67, by = 5)) +
  labs(title = "Expected Points and Probability of Success by Kick Length", 
       x = "Kick Length (yards)") +
  theme_minimal() +
  theme(axis.title.y.left = element_text(color = "steelblue", size = 12), 
        axis.title.y.right = element_text(color = "orange", size = 12), 
        axis.text.x = element_text(angle = 90, hjust = 1))

The expected points follow a clear linear trend with expected points decreasing as attempted kick length increases, which make logical sense.

Step 5: Summarizing Points by Kicker

Here, we summarize the total points and expected points for the top 10 kickers.

# Summarize total points and expected points for each kicker
kicker_stats <- field_goal_data_filtered %>%
  group_by(displayName) %>%
  summarise(total_points = sum(Points), total_expected_points = sum(expected_points)) %>%
  arrange(desc(total_points)) %>%
  slice_head(n = 10)

# Reshape for side-by-side comparison of points
kicker_stats_long <- kicker_stats %>%
  pivot_longer(cols = c(total_points, total_expected_points), 
               names_to = "points_type", 
               values_to = "points")

# Parallel bar chart: total points vs expected points
ggplot(kicker_stats_long, aes(x = reorder(displayName, points), y = points, fill = points_type)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9), width = 0.4) +
  coord_flip() +
  scale_fill_manual(values = c("total_points" = "orange", "total_expected_points" = "steelblue"), 
                    labels = c("Total Expected Points", "Total Points")) +
  labs(title = "Total Points vs Expected Points by Top 10 Kickers", 
       x = "Kicker", 
       y = "Points", 
       fill = "Points Type") +
  theme_minimal() +
  geom_text(aes(label = round(points, 1)), position = position_dodge(width = 0.9), vjust = -0.5, size = 3)

Justin Tucker led all kickers with an additional 25.1 points over expected, followed by Jason Myers with 19.4, Josh Lambo with 18.1, and Brandon McManus with 17.65. Nineteen of 42 kickers produced more points than expected.

Step 6: Kicking Accuracy by Kick Length Range

We now analyze the kicking accuracy for the top 10 kickers by total points, broken down by kick length range. We also compare each kicker’s performance against the league-wide average.

# Step 1: Identify the top 10 kickers by total points
top_10_kickers <- field_goal_data_filtered %>%
  group_by(displayName) %>%
  summarise(total_points = sum(Points)) %>%
  arrange(desc(total_points)) %>%
  slice_head(n = 10)  # Select top 10

# Step 2: Filter the data for the top 10 kickers
top_kicker_data <- field_goal_data_filtered %>%
  filter(displayName %in% top_10_kickers$displayName)

# Step 3: Create kick length ranges
top_kicker_data <- top_kicker_data %>%
  mutate(kick_length_range = cut(kickLength, 
                                 breaks = c(10, 20, 30, 40, 50, 60, Inf), 
                                 labels = c("10-19", "20-29", "30-39", "40-49", "50-59", "60+"),
                                 right = FALSE))

# Step 4: Calculate accuracy by kicker and kick length range
accuracy_by_kicker_range <- top_kicker_data %>%
  group_by(displayName, kick_length_range) %>%
  summarise(
    total_attempts = n(),
    successful_attempts = sum(success),
    accuracy = mean(success)
  )

# Step 5: Calculate league-wide average accuracy by kick length range
league_average_accuracy <- field_goal_data_filtered %>%
  mutate(kick_length_range = cut(kickLength, 
                                 breaks = c(10, 20, 30, 40, 50, 60, Inf), 
                                 labels = c("10-19", "20-29", "30-39", "40-49", "50-59", "60+"),
                                 right = FALSE)) %>%
  group_by(kick_length_range) %>%
  summarise(
    league_accuracy = mean(success)
  )

# Step 6: Create the trellis chart with single dot markers for league average accuracy
ggplot(accuracy_by_kicker_range, aes(x = kick_length_range, y = accuracy)) +
  
  # Plot kicking accuracy by kicker
  geom_bar(stat = "identity", fill = "steelblue") +
  
  # Add league-wide average accuracy as a point marker (use the correct data mapping)
  geom_point(data = league_average_accuracy, aes(x = kick_length_range, y = league_accuracy), 
             color = "red", size = 3, shape = 18) +  # Red diamond marker for league average
  
  # Facet by kicker
  facet_wrap(~displayName, ncol = 2) +  # Facet by kicker, 2 columns of plots
  
  # Format y-axis as percentages
  scale_y_continuous(labels = scales::percent_format()) +
  
  # Add titles and labels
  labs(
    title = "Kicking Accuracy by Kick Length Range for Top 10 Kickers (with League Average)",
    x = "Kick Length Range (yards)",
    y = "Accuracy (Success Rate)"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels for readability

Here are the most accurate kickers by range:

20-29 Daniel Carlson 100% (27 for 27)
30-39 Ka’imi Fairbairn 100% (28 for 28)
40-49 Graham Gano 100% (13 for 13)
50-59 Josh Lambo 100% (10 for 11)
60+ Brett Maher 100% (3 for 3)

Step 7: Clutch Kicking Analysis

Lastly, we define clutch kicks as those occurring with a score difference of 6 points or less and less than 2 minutes remaining in the 4th quarter. We visualize the performance of kickers in clutch situations.

# Filter for clutch kicks
clutch_kicks <- field_goal_data_filtered %>%
  filter(abs(preSnapHomeScore - preSnapVisitorScore) <= 6, 
         quarter == 4, 
         as.numeric(sub(".*:", "", gameClock)) < 2)

# Clutch kicking stats by kicker (minimum 9 attempts)
clutch_kicking_by_kicker <- clutch_kicks %>%
  group_by(displayName) %>%
  summarise(clutch_attempts = n(), clutch_successful_attempts = sum(success), clutch_accuracy = mean(success)) %>%
  filter(clutch_attempts >= 9) %>%
  arrange(desc(clutch_accuracy))

# Bar chart: clutch accuracy by kicker
ggplot(clutch_kicking_by_kicker, aes(x = reorder(displayName, clutch_accuracy), y = clutch_accuracy)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  scale_y_continuous(labels = percent_format()) +
  labs(title = "Clutch Kicking Accuracy by Kicker (Min 9 Attempts)", 
       x = "Kicker", 
       y = "Clutch Accuracy (Success Rate)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

There were four kickers who made every “clutch kick attempt”: Sanders, Tucker, Gould, and Succop.

Conclusion

Determining the best kicker is a somewhat subjective task. Is the best kicker the most accurate kicker, or perhaps the most clutch, or should it be the kicker who delivered the most points above what they were expected to produce. When considering all three criteria, it is our asssement that Justin Tucker was the best NFL kicker from 2018 to 2020. Tucker connected on 92.5% of his kicks, ranking 3rd overall. He made 100% of his clutch kick attempts (14 of 14). And Tucker delivered more points over expected than any kicker in the league (+25.1)

NFL Field Goal Analysis