Introduction

For assignment 2, we were tasked with developing a way to find the best kickers in the NFL.

My solution: FG% Above Expected

Description of Project

To develop FG% Above Expected, I set out to build a logistic regression model.

Initially, I wanted to include several predictors: distance, point differential, quarter, and hash mark. After reviewing the output of that model, however, kick distance was the only statistically significant predictor.

From here, I built a logistic regression model to predict whether a kicker would make a field goal based on kick distance alone. Blocked kicks were excluded from this analysis.
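As a rough illustration of this step, the sketch below simulates a set of kicks, drops the blocked ones, and fits the distance-only model. The data are simulated, and the `result` column, its labels, and `dfAllKicks` are hypothetical stand-ins; only `numericResult`, `kickLength`, `dfKicking`, and `kickingLogit` are names taken from this report.

```r
# Simulated stand-in data: the real dataset is not shown in this report.
set.seed(42)
n <- 2000
kickLength <- sample(19:60, n, replace = TRUE)

# Assumed "true" make probability, used only to generate example outcomes
pMake <- plogis(6.5 - 0.11 * kickLength)
result <- ifelse(runif(n) < pMake, "made", "missed")

# Mark a small share of kicks as blocked (hypothetical label)
result[sample(n, 40)] <- "blocked"

dfAllKicks <- data.frame(kickLength = kickLength, result = result)

# Exclude blocked kicks, then code makes as 1 and misses as 0
dfKicking <- subset(dfAllKicks, result != "blocked")
dfKicking$numericResult <- as.integer(dfKicking$result == "made")

# Logistic regression of kick outcome on distance alone
kickingLogit <- glm(numericResult ~ kickLength, family = "binomial", data = dfKicking)
summary(kickingLogit)
```

Filtering before fitting keeps blocked kicks from being counted as misses, which would otherwise penalize kickers for failures of the blocking unit.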

Model

## 
## Call:
## glm(formula = numericResult ~ kickLength, family = "binomial", 
##     data = dfKicking)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  6.501392   0.325741   19.96   <2e-16 ***
## kickLength  -0.113773   0.007145  -15.92   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2193.1  on 2605  degrees of freedom
## Residual deviance: 1860.7  on 2604  degrees of freedom
## AIC: 1864.7
## 
## Number of Fisher Scoring iterations: 5
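To make the coefficients concrete, the fitted equation can be evaluated by hand. The sketch below plugs the estimates from the summary above into the inverse logit. The helper name `probMake` is made up for illustration, and kick length is assumed to be measured in yards.

```r
# Coefficients copied from the model summary
intercept <- 6.501392
slope     <- -0.113773

# Hypothetical helper: predicted make probability at a given kick length
probMake <- function(kickLength) {
  plogis(intercept + slope * kickLength)  # plogis() is the inverse logit
}

probMake(c(30, 40, 50))   # roughly 0.96, 0.88, 0.69

# Each additional unit of kick length multiplies the odds of a make by exp(slope)
exp(slope)                # about 0.89

# Kick length at which the model gives a 50% chance (log-odds = 0)
-intercept / slope        # about 57
```

Under this fit, the odds of a make fall by about 11% for each additional unit of kick length.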

Confusion Matrix

# Plotting confusion matrix
ggplot(confusion_df, aes(x = Actual, y = Predicted)) +
  geom_tile(aes(fill = Percentage), color = "black") +
  geom_text(aes(label = sprintf("%.1f%%", Percentage)), vjust = 1) +
  scale_fill_gradient(low = "white", high = "lightblue") +
  theme_minimal() +
  labs(title = "Confusion Matrix",
       x = "Actual Values",
       y = "Predicted Values")
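The objects `confusion_matrix` and `confusion_df` are used in this report, but their construction is not shown. One way they might be built is sketched below, assuming a 0.5 probability cutoff and a table with actual outcomes in rows and predictions in columns. The data are simulated and the cutoff is an assumption for illustration.

```r
# Simulated stand-in for the kicking data
set.seed(1)
n <- 1000
dfKicking <- data.frame(kickLength = sample(19:60, n, replace = TRUE))
dfKicking$numericResult <- rbinom(n, 1, plogis(6.5 - 0.11 * dfKicking$kickLength))

kickingLogit <- glm(numericResult ~ kickLength, family = "binomial", data = dfKicking)

# Classify each kick as a make (1) when the predicted probability is at least 0.5
predictedProb  <- predict(kickingLogit, newdata = dfKicking, type = "response")
predictedClass <- factor(as.integer(predictedProb >= 0.5), levels = c(0, 1))
actualClass    <- factor(dfKicking$numericResult, levels = c(0, 1))

# Rows = actual, columns = predicted, so [1, 2] is a false positive
confusion_matrix <- table(Actual = actualClass, Predicted = predictedClass)

# Long format for ggplot, with each cell as a percentage of all kicks
confusion_df <- as.data.frame(confusion_matrix)
confusion_df$Percentage <- 100 * confusion_df$Freq / sum(confusion_df$Freq)
confusion_df
```

Fixing the factor levels to `c(0, 1)` keeps the table 2 x 2 even if the model never predicts a miss, so indexing such as `confusion_matrix[1, 2]` stays valid.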

In the confusion matrix, we can see how well our model predicted when kickers made and missed field goals. Let’s look at the diagnostic metrics for our model:

Model Metrics

# Model Diagnostics
# (the indexing below assumes rows = actual, columns = predicted, negative class first)
TN <- confusion_matrix[1, 1]  # True Negatives
FP <- confusion_matrix[1, 2]  # False Positives
FN <- confusion_matrix[2, 1]  # False Negatives
TP <- confusion_matrix[2, 2]  # True Positives
Accuracy <- (TP + TN) / sum(confusion_matrix)
Precision <- TP / (TP + FP)
Recall <- TP / (TP + FN)
F1 <- 2 * ((Precision * Recall) / (Precision + Recall))

# Putting model diagnostics in df
metrics_df <- data.frame(
  Metric = c("Accuracy", "Precision", "Recall", "F1 Score"),
  Value = c(Accuracy, Precision, Recall, F1)
)

# Outputting model diagnostics to table
knitr::kable(head(metrics_df, 4), caption = "Model Metrics") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Model Metrics
Metric      Value
Accuracy    0.8511128
Precision   0.9923354
Recall      0.8557543
F1 Score    0.9189979

Given these metrics, the model looks like a reasonable basis for predicting field goals: it classifies about 85% of kicks correctly, with a precision of 0.99 and a recall of 0.86. Let’s look at our model’s predicted probability by distance:

Probability Plot

# Creating DF with predictions
dfKickingPredicted <- dfKicking %>%
  mutate(predictedResult = predict(kickingLogit, dfKicking, type = "response"))


# Plotting the logistic regression curve
ggplot(dfKickingPredicted, aes(x = kickLength, y = numericResult)) +
  geom_line(aes(y = predictedResult), color = "lightblue", linewidth = 1) +  # Logistic regression curve (linewidth replaces size for lines in ggplot2 >= 3.4.0)
  labs(title = "Probability of a successful kick by distance",
       x = "Kick Length",
       y = "Probability of Success") +
  theme_minimal()

Final Output

Now that we can estimate the probability of each kick being made, we can take the difference between a kicker’s actual FG% and their expected FG% to measure performance relative to the difficulty of their attempts. Here’s what the results look like:

FG% vs FG% Above Expected

# Final Output Plot
ggplot(dfKickingOutput, aes(x = actualFGPerc, y = percAboveExpected)) +
  geom_point(size = 1, color = "lightblue") +  # One point per kicker
  geom_text(aes(label = displayName), vjust = -1, hjust = 0.5, size = 2, color = "black") +
  labs(title = "Actual FG% against FG% Above Expected",
       x = "Actual FG%",
       y = "FG% Above Expected"
       ) +
  theme_minimal()

Top 20 Kickers by FG% Above Expected

# Final Output Table
knitr::kable(head(dfKickingOutput, 20), caption = "Top 20 Kickers by FG% above Expected") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Top 20 Kickers by FG% above Expected
displayName            actualFGPerc  expectedFGPerc  percAboveExpected
Graham Gano            93.33333      83.25496        10.0783723
Justin Tucker          94.56522      84.74019         9.8250287
Josh Lambo             94.73684      85.06711         9.6697333
Jason Myers            91.25000      84.15655         7.0934515
Brandon McManus        87.80488      81.07513         6.7297461
Nick Folk              92.68293      87.25926         5.4236712
Wil Lutz               90.58824      86.32991         4.2583251
Mason Crosby           88.73239      84.53271         4.1996859
Harrison Butker        91.66667      87.60192         4.0647422
Younghoe Koo           91.22807      87.27391         3.9541633
Randy Bullock          85.71429      84.06908         1.6452041
Jason Sanders          85.54217      83.90276         1.6394116
Dustin Hopkins         85.88235      84.29455         1.5878029
Matt Bryant            82.35294      80.77104         1.5819025
Chris Boswell          88.70968      88.16898         0.5406992
Greg Zuerlein          82.60870      82.66352        -0.0548229
Sebastian Janikowski   83.33333      83.58398        -0.2506468
Aldrick Rosas          86.53846      87.00913        -0.4706684
Ka’imi Fairbairn       84.37500      85.19309        -0.8180913
Joey Slye              79.24528      80.21382        -0.9685366
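A table like the one above could be assembled by averaging each kicker's outcomes and predicted probabilities. The sketch below is one possible version on made-up data: the kicker names and values are invented, and it assumes `dfKickingPredicted` carries a `displayName` column alongside `numericResult` and `predictedResult`.

```r
library(dplyr)

# Made-up example rows: one row per kick
dfKickingPredicted <- data.frame(
  displayName     = c("Kicker A", "Kicker A", "Kicker A", "Kicker A",
                      "Kicker B", "Kicker B", "Kicker B", "Kicker B"),
  numericResult   = c(1, 1, 1, 0, 1, 0, 1, 0),
  predictedResult = c(0.95, 0.80, 0.60, 0.45, 0.90, 0.85, 0.95, 0.70)
)

dfKickingOutput <- dfKickingPredicted %>%
  group_by(displayName) %>%
  summarise(
    actualFGPerc   = 100 * mean(numericResult),    # share of kicks made
    expectedFGPerc = 100 * mean(predictedResult),  # average model probability
    .groups = "drop"
  ) %>%
  mutate(percAboveExpected = actualFGPerc - expectedFGPerc) %>%
  arrange(desc(percAboveExpected))

dfKickingOutput
```

Because the expected rate is averaged over each kicker's own attempts, a kicker who takes longer kicks gets a lower expected FG%, and the same actual FG% then counts for more.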

Conclusion

Given our results, three kickers stand out as elite over the period covered by the data: Graham Gano, Justin Tucker, and Josh Lambo.

Everyone in this trio made roughly 93% to 95% of their field goals against an expected rate of about 83% to 85%, beating the model’s expectation by close to 10 percentage points.