Assignment 2

Introduction
Description of Project
Model
Final Output
- FG% vs FG% Above Expected
- Top 20 Kickers by FG% Above Expected
Conclusion

Introduction

For assignment 2, we were tasked with developing a way to find the best kickers in the NFL.

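My solution: FG% Above Expected, a kicker’s actual field goal percentage minus his expected field goal percentage, as predicted by a model from the distances of his attempts.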

Description of Project

To develop FG% Above Expected, I set out to build a logistic regression model.

Initially, I wanted to include several predictors: kick distance, point differential, quarter, and hash mark. However, the output of that model showed that kick distance was the only statistically significant predictor.
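The decision to keep only distance can be made explicit with a likelihood-ratio test that compares the full model against the distance-only model. The sketch below runs on simulated data, because the original data frame is not shown; the columns `scoreDiff`, `quarter`, and `hash` are hypothetical stand-ins for point differential, quarter, and hash mark, and the simulated outcomes depend on distance only.

```r
# Minimal sketch on SIMULATED data: scoreDiff, quarter and hash are
# hypothetical stand-ins for point differential, quarter and hash mark.
set.seed(42)
n <- 2000
sim <- data.frame(
  kickLength = sample(20:60, n, replace = TRUE),
  scoreDiff  = sample(-21:21, n, replace = TRUE),
  quarter    = factor(sample(1:4, n, replace = TRUE)),
  hash       = factor(sample(c("L", "M", "R"), n, replace = TRUE))
)
# Simulated makes (1) and misses (0) that depend on distance only
sim$numericResult <- rbinom(n, 1, plogis(6.5 - 0.114 * sim$kickLength))

fullLogit    <- glm(numericResult ~ kickLength + scoreDiff + quarter + hash,
                    family = "binomial", data = sim)
reducedLogit <- glm(numericResult ~ kickLength, family = "binomial", data = sim)

# Per-coefficient Wald tests (the Pr(>|z|) column of the summary)
print(summary(fullLogit)$coefficients)

# Likelihood-ratio test: do the extra predictors jointly improve the fit?
lrt <- anova(reducedLogit, fullLogit, test = "Chisq")
print(lrt)
```

A large p-value in the `Pr(>Chi)` column would indicate that the extra predictors do not jointly improve the fit, which complements the per-coefficient z tests.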

I then built a logistic regression model that predicts whether a kicker makes a field goal based on kick distance alone. Blocked kicks were excluded from this analysis.
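A minimal sketch of this setup, using a tiny made-up data frame: the column `kickResult` and its labels (`"Good"`, `"No Good"`, `"Blocked"`) are hypothetical, and `dfKickingSketch` stands in for the document’s `dfKicking`.

```r
# Tiny hypothetical data set; kickResult and its labels are assumptions
rawKicks <- data.frame(
  kickLength = c(22, 35, 41, 48, 53, 57, 30, 45, 50, 38),
  kickResult = c("Good", "Good", "No Good", "Good", "No Good",
                 "Blocked", "Good", "Good", "No Good", "Good")
)

# Exclude blocked kicks, as described above
dfKickingSketch <- subset(rawKicks, kickResult != "Blocked")

# Binary outcome for logistic regression: 1 = made, 0 = missed
dfKickingSketch$numericResult <- as.integer(dfKickingSketch$kickResult == "Good")

# Same model form as in the summary output: make/miss on kick distance
sketchLogit <- glm(numericResult ~ kickLength, family = "binomial",
                   data = dfKickingSketch)
print(coef(sketchLogit))
```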

Model

## 
## Call:
## glm(formula = numericResult ~ kickLength, family = "binomial", 
##     data = dfKicking)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  6.501392   0.325741   19.96   <2e-16 ***
## kickLength  -0.113773   0.007145  -15.92   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2193.1  on 2605  degrees of freedom
## Residual deviance: 1860.7  on 2604  degrees of freedom
## AIC: 1864.7
## 
## Number of Fisher Scoring iterations: 5
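The coefficients are on the log-odds scale. Converting them to make probabilities at a few distances makes the model easier to read; this sketch plugs in the two estimates reported above and applies the inverse logit (`plogis()` in base R). In the analysis itself, `predict(kickingLogit, type = "response")` performs the same conversion for each kick.

```r
# Coefficient estimates copied from the model summary
b0 <- 6.501392   # (Intercept)
b1 <- -0.113773  # kickLength, change in log-odds per yard

# Predicted make probability = inverse logit of (b0 + b1 * distance)
distances <- c(30, 40, 50, 60)
probs <- plogis(b0 + b1 * distances)
names(probs) <- distances
print(round(probs, 3))

# Multiplicative change in the odds of a make for each extra yard
oddsRatio <- exp(b1)

# Distance at which the predicted probability is 50% (log-odds = 0)
breakEven <- -b0 / b1

# Likelihood-ratio statistic: deviance drop from adding kickLength (1 df)
lrStat <- 2193.1 - 1860.7
pValue <- pchisq(lrStat, df = 1, lower.tail = FALSE)
```

With these estimates, the predicted make probability falls from about 96% at 30 yards to 88% at 40, 69% at 50, and 42% at 60, crossing 50% at roughly 57 yards; each extra yard multiplies the odds of a make by about 0.89. The deviance drop of 332.4 on 1 degree of freedom likewise indicates that distance improves substantially on an intercept-only model.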

Confusion Matrix
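The plotting code here expects a data frame `confusion_df` with columns `Actual`, `Predicted`, and `Percentage`, and the diagnostics further down index a `confusion_matrix`. How these were built is not shown; the following is a minimal sketch, assuming a 0.5 probability cutoff and percentages of all kicks, with toy vectors in place of the real outcomes and predictions.

```r
# Hypothetical toy vectors; in the real analysis these would be
# dfKicking$numericResult and the model's predicted probabilities
actual   <- c(1, 1, 1, 0, 1, 0, 1, 1, 1, 1)
predProb <- c(0.95, 0.90, 0.85, 0.60, 0.70, 0.40, 0.88, 0.92, 0.55, 0.97)

# Classify as a make when the predicted probability is at least 0.5 (assumed cutoff)
predicted <- as.integer(predProb >= 0.5)

# Rows = actual, columns = predicted; fixed levels keep both classes present
confusion_matrix <- table(Actual    = factor(actual,    levels = c(0, 1)),
                          Predicted = factor(predicted, levels = c(0, 1)))

# Long format for ggplot, with each cell as a percentage of all kicks
confusion_df <- as.data.frame(confusion_matrix)
confusion_df$Percentage <- 100 * confusion_df$Freq / sum(confusion_df$Freq)
print(confusion_df)
```

Building the table as `table(Actual = ..., Predicted = ...)` puts actual results on the rows, which is the layout the positional TN/FP/FN/TP lookups in the diagnostics code assume.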

# Plotting confusion matrix
# Assumes confusion_df has columns Actual, Predicted and Percentage (one row per tile)
library(ggplot2)
ggplot(confusion_df, aes(x = Actual, y = Predicted)) +
  geom_tile(aes(fill = Percentage), color = "black") +
  geom_text(aes(label = sprintf("%.1f%%", Percentage)), vjust = 1) +
  scale_fill_gradient(low = "white", high = "lightblue") +
  theme_minimal() +
  labs(title = "Confusion Matrix",
       x = "Actual Values",
       y = "Predicted Values")

Here, we can see how often the model’s predicted makes and misses matched the actual results. Let’s look at the diagnostic metrics for our model:

Model Metrics

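# Model Diagnostics
# Positional indexing assumes rows = actual results and columns = predictions,
# with 0 (miss) before 1 (make), e.g. table(Actual = ..., Predicted = ...).
# If the table was built as table(Predicted, Actual), FP and FN swap places,
# and so do Precision and Recall.
TN <- confusion_matrix[1, 1]  # True Negatives: actual miss, predicted miss
FP <- confusion_matrix[1, 2]  # False Positives: actual miss, predicted make
FN <- confusion_matrix[2, 1]  # False Negatives: actual make, predicted miss
TP <- confusion_matrix[2, 2]  # True Positives: actual make, predicted make
Accuracy <- (TP + TN) / sum(confusion_matrix)
Precision <- TP / (TP + FP)
Recall <- TP / (TP + FN)
F1 <- 2 * ((Precision * Recall) / (Precision + Recall))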

# Putting model diagnostics in df
metrics_df <- data.frame(
  Metric = c("Accuracy", "Precision", "Recall", "F1 Score"),
  Value = c(Accuracy, Precision, Recall, F1)
)

# Outputting model diagnostics to table (kable_styling() comes from kableExtra)
library(kableExtra)
knitr::kable(metrics_df, caption = "Model Metrics") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))

Model Metrics
Metric	Value
Accuracy	0.8511128
Precision	0.9923354
Recall	0.8557543
F1 Score	0.9189979
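Two checks are worth making on these numbers. First, the diagnostics code indexes `confusion_matrix` by position, which is only correct if actual results are on the rows. Assuming the matrix covers all 2,606 kicks in the model, the reported values under those labels imply 2,201 true positives, 17 false positives, 371 false negatives, and 17 true negatives, i.e. only 34 missed kicks, far fewer than the 80% to 88% expected FG% values in the kicker table would suggest. That may mean the rows and columns are swapped, in which case precision and recall trade places. Second, accuracy should be compared with a baseline that always predicts a make: under either orientation, the 85.1% accuracy does not beat that baseline (98.7% with the labels as written, 85.1% if swapped). The sketch below, on hypothetical counts, computes the metrics in a way that does not depend on orientation and reports the baseline alongside them.

```r
# Hypothetical counts with actual results on the rows; "0" = miss, "1" = make
cm <- matrix(c(6, 3, 9, 82), nrow = 2,
             dimnames = list(Actual = c("0", "1"), Predicted = c("0", "1")))

classMetrics <- function(cm) {
  # Put actual results on the rows if the table was built as (Predicted, Actual)
  if (isTRUE(names(dimnames(cm))[1] == "Predicted")) cm <- t(cm)
  TN <- cm["0", "0"]; FP <- cm["0", "1"]
  FN <- cm["1", "0"]; TP <- cm["1", "1"]
  precision <- TP / (TP + FP)
  recall    <- TP / (TP + FN)
  c(Accuracy  = (TP + TN) / sum(cm),
    Precision = precision,
    Recall    = recall,
    F1        = 2 * precision * recall / (precision + recall))
}

# Same metrics whichever way round the table is stored
byActual    <- classMetrics(cm)
byPredicted <- classMetrics(t(cm))

# Majority-class baseline: accuracy of always predicting the more common outcome
actualCounts     <- rowSums(cm)   # number of actual misses and makes
baselineAccuracy <- max(actualCounts) / sum(cm)

print(round(byActual, 3))
cat("Baseline accuracy:", baselineAccuracy, "\n")
```

Looking cells up by their `Actual`/`Predicted` dimension names, rather than by position alone, guards against the swap; the baseline line makes it obvious when a high accuracy mostly reflects how often kicks are made.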

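These metrics look strong at first glance, but accuracy should be judged against the share of kicks that were made: because most field goals are good, a model that always predicts a make is already highly accurate. For FG% Above Expected, what matters most is the model’s predicted probability for each kick rather than its make/miss calls. Let’s look at the model’s probability of success by distance: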

Probability Plot

# Creating a data frame with each kick's predicted probability of success
library(dplyr)
library(ggplot2)
dfKickingPredicted <- dfKicking %>%
  mutate(predictedResult = predict(kickingLogit, newdata = dfKicking, type = "response"))

# Plotting the logistic regression curve
# (linewidth replaces the deprecated size aesthetic for lines in ggplot2 >= 3.4.0)
ggplot(dfKickingPredicted, aes(x = kickLength, y = predictedResult)) +
  geom_line(color = "lightblue", linewidth = 1) +  # Logistic regression curve
  labs(title = "Probability of a successful kick by distance",
       x = "Kick Length",
       y = "Probability of Success") +
  theme_minimal()
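Because the curve above connects per-kick predictions, it only spans distances that appear in the data. An alternative is to predict on an evenly spaced grid of distances with `predict(..., newdata = ..., type = "response")`. The sketch below does this on simulated data with base R graphics; `simKicks` and `simLogit` are stand-ins, not the real data or model.

```r
# Simulated stand-in for dfKicking so the sketch runs on its own
set.seed(1)
simKicks <- data.frame(kickLength = sample(18:66, 1500, replace = TRUE))
simKicks$numericResult <- rbinom(nrow(simKicks), 1,
                                 plogis(6.5 - 0.114 * simKicks$kickLength))
simLogit <- glm(numericResult ~ kickLength, family = "binomial", data = simKicks)

# Evenly spaced distances give a smooth curve over a chosen range
distGrid <- data.frame(kickLength = seq(18, 66, by = 1))
distGrid$prob <- predict(simLogit, newdata = distGrid, type = "response")

plot(distGrid$kickLength, distGrid$prob, type = "l", ylim = c(0, 1),
     xlab = "Kick Length", ylab = "Probability of Success",
     main = "Probability of a successful kick by distance")
```

With ggplot2, the same `distGrid` data frame could be passed to `geom_line()` in place of the per-kick predictions.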

Final Output

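Now that we can estimate each kick’s probability of success, we can take the difference between a kicker’s actual FG% and his expected FG% (the FG% the model predicts for his attempts, given their distances) to measure his performance. A positive value means he made more field goals than the model expected. Here’s what the results look like: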
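Before the results, a minimal sketch of this calculation. The document does not show how expected FG% was computed; the sketch assumes it is the average predicted make probability over a kicker’s attempts, and uses two hypothetical kickers in place of the real `dfKickingPredicted`.

```r
# Hypothetical per-kick predictions; column names follow dfKickingPredicted
dfKickingPredicted <- data.frame(
  displayName     = c("Kicker A", "Kicker A", "Kicker A", "Kicker B", "Kicker B"),
  numericResult   = c(1, 1, 0, 1, 1),
  predictedResult = c(0.95, 0.80, 0.60, 0.90, 0.70)
)

# Actual FG% = share of attempts made; expected FG% = mean predicted probability
actualFGPerc   <- 100 * tapply(dfKickingPredicted$numericResult,
                               dfKickingPredicted$displayName, mean)
expectedFGPerc <- 100 * tapply(dfKickingPredicted$predictedResult,
                               dfKickingPredicted$displayName, mean)

dfKickingOutput <- data.frame(
  displayName    = names(actualFGPerc),
  actualFGPerc   = as.numeric(actualFGPerc),
  expectedFGPerc = as.numeric(expectedFGPerc)
)

# FG% Above Expected, in percentage points, sorted best first
dfKickingOutput$percAboveExpected <-
  dfKickingOutput$actualFGPerc - dfKickingOutput$expectedFGPerc
dfKickingOutput <- dfKickingOutput[order(-dfKickingOutput$percAboveExpected), ]
print(dfKickingOutput)
```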

FG% vs FG% Above Expected

# Final Output Plot
ggplot(dfKickingOutput, aes(x = actualFGPerc, y = percAboveExpected)) +
  geom_point(size = 1, color = "lightblue") +  # One point per kicker
  geom_text(aes(label = displayName), vjust = -1, hjust = 0.5, size = 2, color = "black") +  # Kicker names
  labs(title = "Actual FG% against FG% Above Expected",
       x = "Actual FG%",
       y = "FG% Above Expected") +
  theme_minimal()

Top 20 Kickers by FG% Above Expected

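# Final Output Table: sort by FG% Above Expected so head() returns the top 20
dfKickingOutput %>%
  arrange(desc(percAboveExpected)) %>%
  head(20) %>%
  knitr::kable(caption = "Top 20 Kickers by FG% above Expected") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))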

Top 20 Kickers by FG% above Expected
displayName	actualFGPerc	expectedFGPerc	percAboveExpected
Graham Gano	93.33333	83.25496	10.0783723
Justin Tucker	94.56522	84.74019	9.8250287
Josh Lambo	94.73684	85.06711	9.6697333
Jason Myers	91.25000	84.15655	7.0934515
Brandon McManus	87.80488	81.07513	6.7297461
Nick Folk	92.68293	87.25926	5.4236712
Wil Lutz	90.58824	86.32991	4.2583251
Mason Crosby	88.73239	84.53271	4.1996859
Harrison Butker	91.66667	87.60192	4.0647422
Younghoe Koo	91.22807	87.27391	3.9541633
Randy Bullock	85.71429	84.06908	1.6452041
Jason Sanders	85.54217	83.90276	1.6394116
Dustin Hopkins	85.88235	84.29455	1.5878029
Matt Bryant	82.35294	80.77104	1.5819025
Chris Boswell	88.70968	88.16898	0.5406992
Greg Zuerlein	82.60870	82.66352	-0.0548229
Sebastian Janikowski	83.33333	83.58398	-0.2506468
Aldrick Rosas	86.53846	87.00913	-0.4706684
Ka’imi Fairbairn	84.37500	85.19309	-0.8180913
Joey Slye	79.24528	80.21382	-0.9685366

Conclusion

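Given our results, three kickers stand out as elite over the seasons covered by this data: Graham Gano, Justin Tucker, and Josh Lambo. Each made roughly 10 percentage points more of his field goals than expected (9.7 to 10.1), while the next-best kicker, Jason Myers, was about 7 points above expected.

All three also had the highest actual field goal percentages in the table (93.3% to 94.7%), even though their expected FG% (83.3% to 85.1%) sat in the middle of the range for these kickers (80.2% to 88.2%). In other words, their success did not come from an unusually easy mix of kick distances.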