For assignment 2, we were tasked with developing a way to find the best kickers in the NFL.
My solution: FG% Above Expected
To develop FG% Above Expected, I set out to build a logistic regression model.
Initially, I wanted to include several predictors: distance, point differential, quarter, and hash mark. After reviewing the output of that model, however, kick distance turned out to be the only statistically significant predictor.
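The comparison described above can be sketched roughly as follows. This is not the original analysis: the data are simulated so that only distance affects the outcome, and the column names `quarter` and `pointDiff` are hypothetical stand-ins for the extra predictors. It only shows one way to check whether extra terms earn their place, using the coefficient table and a likelihood-ratio test.

```r
set.seed(42)

# Simulated stand-in for the kick data (assumption: only distance matters)
n <- 1000
toy <- data.frame(
  kickLength = sample(18:60, n, replace = TRUE),
  quarter    = factor(sample(1:4, n, replace = TRUE)),  # hypothetical column
  pointDiff  = sample(-21:21, n, replace = TRUE)        # hypothetical column
)
toy$numericResult <- rbinom(n, 1, plogis(6.5 - 0.114 * toy$kickLength))

# Full model with several predictors vs. distance-only model
full    <- glm(numericResult ~ kickLength + quarter + pointDiff,
               family = "binomial", data = toy)
reduced <- glm(numericResult ~ kickLength, family = "binomial", data = toy)

# Per-coefficient Wald tests, as shown in summary() output
print(round(summary(full)$coefficients, 4))

# Likelihood-ratio test: do the extra predictors improve fit as a group?
lrt <- anova(reduced, full, test = "Chisq")
print(lrt)
```

A large p-value in the likelihood-ratio test would support dropping the extra predictors in favor of the simpler distance-only model.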
From here, I built a logistic regression model to predict whether a kicker would make a field goal based on kick distance alone. Blocked kicks were excluded from this analysis.
##
## Call:
## glm(formula = numericResult ~ kickLength, family = "binomial",
## data = dfKicking)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 6.501392 0.325741 19.96 <2e-16 ***
## kickLength -0.113773 0.007145 -15.92 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2193.1 on 2605 degrees of freedom
## Residual deviance: 1860.7 on 2604 degrees of freedom
## AIC: 1864.7
##
## Number of Fisher Scoring iterations: 5
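To make the fitted coefficients concrete, the sketch below plugs the estimates from the summary above into the inverse-logit function. The helper name `make_prob` and the chosen distances are illustrative assumptions, not part of the original analysis.

```r
# Coefficients copied from the model summary above
intercept <- 6.501392
slope     <- -0.113773

# Hypothetical helper: fitted make probability for a kick of a given length (yards)
make_prob <- function(kick_length) {
  plogis(intercept + slope * kick_length)  # plogis() is the inverse logit
}

# Fitted make probability at a few illustrative distances
distances <- c(20, 30, 40, 50, 60)
print(data.frame(kickLength = distances,
                 makeProb   = round(make_prob(distances), 3)))

# Multiplicative change in the odds of a make for each additional yard
odds_ratio_per_yard <- exp(slope)

# Distance at which the fitted make probability equals 50%
even_odds_distance <- -intercept / slope
```

With these coefficients, the fitted make probability is roughly 96% at 30 yards and 69% at 50 yards. Each additional yard multiplies the odds of a make by about 0.89, and the curve crosses 50% at about 57 yards.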
# Plotting confusion matrix
ggplot(confusion_df, aes(x = Actual, y = Predicted)) +
geom_tile(aes(fill = Percentage), color = "black") +
geom_text(aes(label = sprintf("%.1f%%", Percentage)), vjust = 1) +
scale_fill_gradient(low = "white", high = "lightblue") +
theme_minimal() +
labs(title = "Confusion Matrix",
x = "Actual Values",
y = "Predicted Values")
Here, we can see our model did a good job predicting when kickers missed and made field goals. Let's look at the diagnostic metrics for our model:
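# Model Diagnostics
# The labels below assume rows are actual values and columns are predicted values (0 = miss, 1 = make)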
TN <- confusion_matrix[1, 1] # True Negatives
FP <- confusion_matrix[1, 2] # False Positives
FN <- confusion_matrix[2, 1] # False Negatives
TP <- confusion_matrix[2, 2] # True Positives
Accuracy <- (TP + TN) / sum(confusion_matrix)
Precision <- TP / (TP + FP)
Recall <- TP / (TP + FN)
F1 <- 2 * ((Precision * Recall) / (Precision + Recall))
# Putting model diagnostics in df
metrics_df <- data.frame(
Metric = c("Accuracy", "Precision", "Recall", "F1 Score"),
Value = c(Accuracy, Precision, Recall, F1)
)
# Outputting model diagnostics to table
knitr::kable(head(metrics_df, 4), caption = "Model Metrics") %>%
kable_styling(bootstrap_options = c("striped", "hover"))
| Metric | Value |
|---|---|
| Accuracy | 0.8511128 |
| Precision | 0.9923354 |
| Recall | 0.8557543 |
| F1 Score | 0.9189979 |
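The diagnostics above depend on how `confusion_matrix` is oriented, and its construction is not shown here. The sketch below shows one way such a matrix could be built so that the `[row, column]` indexing matches the TN/FP/FN/TP labels. It uses simulated data and an assumed 0.5 classification threshold. It also computes the accuracy of always predicting a make, which is a useful reference point when most kicks are made.

```r
set.seed(1)

# Simulated stand-in for dfKicking (not the real data)
n <- 500
toy <- data.frame(kickLength = sample(18:60, n, replace = TRUE))
toy$numericResult <- rbinom(n, 1, plogis(6.5 - 0.114 * toy$kickLength))

fit <- glm(numericResult ~ kickLength, family = "binomial", data = toy)

# Classify a kick as a make when the fitted probability is at least 0.5 (assumed threshold)
predictedClass <- as.integer(predict(fit, type = "response") >= 0.5)

# Rows = actual, columns = predicted; fixed factor levels keep the table 2x2
# even if one class is never predicted
confusion_matrix <- table(
  Actual    = factor(toy$numericResult, levels = c(0, 1)),
  Predicted = factor(predictedClass, levels = c(0, 1))
)

TN <- confusion_matrix[1, 1]  # actual miss, predicted miss
FP <- confusion_matrix[1, 2]  # actual miss, predicted make
FN <- confusion_matrix[2, 1]  # actual make, predicted miss
TP <- confusion_matrix[2, 2]  # actual make, predicted make

Accuracy <- (TP + TN) / sum(confusion_matrix)

# Reference point: accuracy of always predicting the more common outcome
baseline <- max(mean(toy$numericResult), 1 - mean(toy$numericResult))
print(c(Accuracy = Accuracy, Baseline = baseline))
```

If the table were built as `table(Predicted, Actual)` instead, the `[1, 2]` and `[2, 1]` cells would swap roles, and with them precision and recall.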
With roughly 85% accuracy and an F1 score of about 0.92, the model gives us a solid basis for predicting field goals. Let's look at the model's predicted probability of a make by distance:
# Creating DF with predictions
dfKickingPredicted <- dfKicking %>%
mutate(predictedResult = predict(kickingLogit, dfKicking, type = "response"))
# Plotting the logistic regression curve
ggplot(dfKickingPredicted, aes(x = kickLength, y = numericResult)) +
geom_line(aes(y = predictedResult), color = "lightblue", linewidth = 1) + # Logistic regression curve (linewidth replaces size for lines in ggplot2 >= 3.4.0)
labs(title = "Probability of a successful kick by distance",
x = "Kick Length",
y = "Probability of Success") +
theme_minimal()
Now that we can estimate the probability that any given kick is made, we can subtract each kicker's expected FG% from their actual FG% to measure their performance relative to the difficulty of their attempts. Here's what the results look like:
# Final Output Plot
ggplot(dfKickingOutput, aes(x = actualFGPerc, y = percAboveExpected)) +
geom_point(size = 1, color = "lightblue") + # One point per kicker
geom_text(aes(label = displayName), vjust = -1, hjust = 0.5, size = 2, color = "black") +
labs(title = "Actual FG% against FG% Above Expected",
x = "Actual FG%",
y = "FG% Above Expected"
) +
theme_minimal()
# Final Output Table
knitr::kable(head(dfKickingOutput, 20), caption = "Top 20 Kickers by FG% above Expected") %>%
kable_styling(bootstrap_options = c("striped", "hover"))
| displayName | actualFGPerc | expectedFGPerc | percAboveExpected |
|---|---|---|---|
| Graham Gano | 93.33333 | 83.25496 | 10.0783723 |
| Justin Tucker | 94.56522 | 84.74019 | 9.8250287 |
| Josh Lambo | 94.73684 | 85.06711 | 9.6697333 |
| Jason Myers | 91.25000 | 84.15655 | 7.0934515 |
| Brandon McManus | 87.80488 | 81.07513 | 6.7297461 |
| Nick Folk | 92.68293 | 87.25926 | 5.4236712 |
| Wil Lutz | 90.58824 | 86.32991 | 4.2583251 |
| Mason Crosby | 88.73239 | 84.53271 | 4.1996859 |
| Harrison Butker | 91.66667 | 87.60192 | 4.0647422 |
| Younghoe Koo | 91.22807 | 87.27391 | 3.9541633 |
| Randy Bullock | 85.71429 | 84.06908 | 1.6452041 |
| Jason Sanders | 85.54217 | 83.90276 | 1.6394116 |
| Dustin Hopkins | 85.88235 | 84.29455 | 1.5878029 |
| Matt Bryant | 82.35294 | 80.77104 | 1.5819025 |
| Chris Boswell | 88.70968 | 88.16898 | 0.5406992 |
| Greg Zuerlein | 82.60870 | 82.66352 | -0.0548229 |
| Sebastian Janikowski | 83.33333 | 83.58398 | -0.2506468 |
| Aldrick Rosas | 86.53846 | 87.00913 | -0.4706684 |
| Ka’imi Fairbairn | 84.37500 | 85.19309 | -0.8180913 |
| Joey Slye | 79.24528 | 80.21382 | -0.9685366 |
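The construction of `dfKickingOutput` is not shown above, so here is a minimal sketch of how a table like this could be computed. It assumes expected FG% is the mean predicted make probability across a kicker's attempts. The kickers and values are invented for illustration, and base R's `aggregate()` is used so the example runs without extra packages.

```r
# Toy kick-level data: column names mirror the source, values are invented
kicks <- data.frame(
  displayName     = c("Kicker A", "Kicker A", "Kicker A",
                      "Kicker B", "Kicker B", "Kicker B"),
  numericResult   = c(1, 1, 1, 1, 0, 1),                   # 1 = made, 0 = missed
  predictedResult = c(0.95, 0.80, 0.65, 0.97, 0.90, 0.85)  # model make probability
)

# Per kicker: mean outcome (actual) and mean predicted probability (expected)
output <- aggregate(cbind(numericResult, predictedResult) ~ displayName,
                    data = kicks, FUN = mean)

# Convert to percentages and take the difference
output$actualFGPerc      <- 100 * output$numericResult
output$expectedFGPerc    <- 100 * output$predictedResult
output$percAboveExpected <- output$actualFGPerc - output$expectedFGPerc

# Sort best to worst and keep the columns shown in the table above
output <- output[order(-output$percAboveExpected),
                 c("displayName", "actualFGPerc", "expectedFGPerc", "percAboveExpected")]
print(output)
```

A kicker who makes every attempt from long range lands well above zero, while one who misses a kick the model rated as very likely falls below it.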
Given our results, it's clear that the NFL had three elite kickers in 2022: Graham Gano, Justin Tucker, and Josh Lambo, each of whom finished roughly 10 percentage points above expected.
Each member of this trio posted one of the highest field goal percentages in the league (about 93% to 95%) against an expected field goal percentage of only about 83% to 85%, and that gap is what puts them at the top of the metric.