Kicks Above Replacement - A Better Way to Evaluate NFL Kickers

Introduction

The purpose of this project is to identify the best kicker in the NFL over the 2018-2020 seasons using concepts from common sabermetric statistics in MLB, such as Wins Above Replacement (WAR). Each field goal attempt is assigned a Weighted Kick Value (wKV) based on Distance+, the kick’s distance normalized to that season’s average attempt distance. Summed over a season, these values form a statistic I am calling “Kicks Above Average” (KAA): the greater the value, the better that kicker’s overall performance.

Description of Project

Using the data from the NFL Big Data Bowl, several data frames are built in sequence to generate the KAA for each kicker. First, a main data frame, dfMain, collects all the data needed to create the statistic. dfMain then feeds dfOverallPerformance, which holds per-season averages and other values that are referenced later. These are used to build dfKAAData, which contains the Distance+ and wKV for each kick attempt. Finally, the wKV values are summed in the dfKAA data frame, which gives each player’s Kicks Above Average for each season studied.
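Written out, the definitions implemented in the code further down look roughly as follows. The symbols are notation introduced here, not names from the code: d is the kick length in yards, and \bar{d}_s, d_s^{min}, and d_s^{max} are the mean, shortest, and longest attempts in season s. The code also rounds the average to 4 significant figures and Distance+ to 3, which is omitted below.

```latex
\begin{align*}
\text{Distance+}(d, s) &= 100 \cdot \frac{d}{\bar{d}_s} \\
\text{wKV}_{\text{made}}(d, s) &= \frac{d}{\bar{d}_s} - \frac{d_s^{\min}}{\bar{d}_s} \\
\text{wKV}_{\text{missed}}(d, s) &= -1 + \frac{d}{\bar{d}_s}
  - \left( \frac{d_s^{\min}}{\bar{d}_s} + \frac{d_s^{\max}}{\bar{d}_s} - 1.5 \right) \\
\text{KAA}(k, s) &= \sum_{i \,\in\, \text{kicks}(k, s)} \text{wKV}_i
\end{align*}
```

Under these definitions a made kick scores 0 at the season's shortest distance and rises linearly with distance, and the gap between a make and a miss from the same distance is the constant d_s^{max} / \bar{d}_s - 0.5.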

Data Visualization

The data visualization shows the overall KAA for each kicker across the three seasons. The tables also present this data alongside the traditional kicks-attempted and accuracy statistics.

Player KAA Tables

Top 10 Kickers by KAA for 2018 Season:

# ------------------------ Libraries -------------------------------------------

setwd("C:/Users/brygn/Dropbox/Stuff/School/IS470")

suppressMessages(library(ggplot2)) # Data viz tools
suppressMessages(library(ggalt)) # Extra geoms, coords, and stats for ggplot2
suppressMessages(library(ggforce)) # Additional ggplot2 geoms and facets
suppressMessages(library(hms)) # Tools for calculating Time
suppressMessages(library(data.table)) # Summarization of data tools
suppressMessages(library(dplyr)) # Tools for creating DF
suppressMessages(library(nflfastR)) # NFL team data
suppressMessages(library(ggimage)) # Tools for getting an image in plots
suppressMessages(library(tidyverse)) # Tools for formatting data
suppressMessages(library(scales)) # percent() formatting for accuracy
suppressMessages(library(kableExtra)) # kable_styling() for the output tables

# ------------------------ Data Formulation ------------------------------------

# Importation of data files referenced
gamesFile <- "Data/NFLBDB2022/games.csv"
dfGames <- fread(gamesFile)

playsFile <- "Data/NFLBDB2022/plays.csv"
dfPlays <- fread(playsFile)

dfPlays <- dfPlays %>%
  filter(specialTeamsPlayType == "Field Goal") %>%
  data.frame()

playersFile <- "Data/NFLBDB2022/players.csv"
dfPlayers <- fread(playersFile)

scoutingFile <- "Data/NFLBDB2022/PFFScoutingData.csv"
dfScouting <- fread(scoutingFile)


# Joins data sets and filters down to the play types and variables desired
dfMain <- dfGames %>%
  
  # Joins with plays, players, and scouting data
  left_join(dfPlays, by = c("gameId" = "gameId")) %>%
  left_join(dfPlayers, by = c("kickerId" = "nflId")) %>%
  left_join(dfScouting, by = c("gameId" = "gameId", "playId" = "playId")) %>%
  
  #Sorts by display name and gameId for easy reading
  arrange(displayName, gameId) %>%
  
  # Selection of the variables that are useful for the study
  select(displayName,gameId,
         season,week,homeTeamAbbr,visitorTeamAbbr,playId,playDescription,quarter, 
         possessionTeam, specialTeamsResult, kickerId, gameClock, kickLength
         ) %>%
  # Filters out plays that are neither a successful nor a failed field goal
  filter(
    specialTeamsResult != "Non-Special Teams Result",
    specialTeamsResult != "Out of Bounds",
    specialTeamsResult != "Downed",
    specialTeamsResult != "Blocked Kick Attempt"
  ) %>%
  data.frame()
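
One detail of the join-then-filter step is easy to miss: a left join from games leaves NA play columns for any game with no field goal plays, and a `!=` comparison against NA is itself NA, so those rows are dropped by the filter as well. A minimal base R sketch on invented toy data (the `games` and `plays` frames below are stand-ins, not the Big Data Bowl files):

```r
# Toy stand-ins for dfGames and dfPlays; all values are invented.
games <- data.frame(gameId = c(1, 2, 3), season = c(2018, 2018, 2019))
plays <- data.frame(
  gameId = c(1, 1, 2),
  playId = c(10, 11, 20),
  specialTeamsResult = c("Kick Attempt Good", "Blocked Kick Attempt",
                         "Kick Attempt No Good")
)

# merge(all.x = TRUE) is base R's left join: game 3 has no field goal plays,
# so its play columns come back as NA.
joined <- merge(games, plays, by = "gameId", all.x = TRUE)

# subset() drops rows where the condition is FALSE or NA (as dplyr::filter
# does), so the unmatched game and the blocked kick both disappear.
attempts <- subset(joined, specialTeamsResult != "Blocked Kick Attempt")

print(attempts)
```

Only the made and missed attempts remain, which is what the per-season counts in the next step rely on.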


# Creates an overall reference for stats that will be used in creation of KAA
dfOverallPerformance <- dfMain %>%
  group_by(season) %>%
  summarise(
    kicksAttempted = n(), # Sums total kick attempts for that season
    kicksMade = sum(specialTeamsResult == "Kick Attempt Good"), # Sum of all good kicks
    kicksMissed = sum(specialTeamsResult == "Kick Attempt No Good"), # Sum of all missed kicks
    minDisAttempted = min(kickLength), # Shortest kick of that season
    maxDisAttempted = max(kickLength), # Longest kick of that season 
    avgDisAttempted = signif(sum(kickLength) / kicksAttempted, 4), # Average distance tried
    minRationalized = minDisAttempted / avgDisAttempted, # normalizing the min to the average
    maxRationalized = maxDisAttempted / avgDisAttempted, # normalizing the max to the average
    firstQuartile = quantile(kickLength, probs = 0.25), # Quartile calculations, Deprecated but still a good reference
    secondQuartile = quantile(kickLength, probs = 0.50),
    thirdQuartile = quantile(kickLength, probs = 0.75)
    ) %>%
  mutate(
    kickAccuracy = signif((kicksMade / (kicksAttempted))*100, 4) # Calculates a percentage of overall accuracy
  ) %>%
  
  # Sort by season and descending kick accuracy
  arrange(season, -kickAccuracy) %>% 
  data.frame()
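
To make the per-season reference values concrete, here is a small base R sketch on invented kick lengths. It computes only the three values the later formulas depend on (avgDisAttempted, minRationalized, maxRationalized); the toy data frame and its numbers are assumptions for illustration.

```r
# Invented kick lengths in yards for two seasons; not taken from the data set.
kicks <- data.frame(
  season     = c(2018, 2018, 2018, 2019, 2019, 2019),
  kickLength = c(20, 40, 60, 25, 50, 75)
)

# One row of baselines per season, mirroring the columns used later on.
baselines <- do.call(rbind, lapply(split(kicks, kicks$season), function(df) {
  avg <- signif(mean(df$kickLength), 4)  # same rounding as the report
  data.frame(
    season          = df$season[1],
    avgDisAttempted = avg,
    minRationalized = min(df$kickLength) / avg,
    maxRationalized = max(df$kickLength) / avg
  )
}))

print(baselines)
```

With these numbers both seasons give minRationalized = 0.5 and maxRationalized = 1.5, since the shortest and longest kicks sit at half and one and a half times the average.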

# Using the above calculated averages, the weighted value of each kick is calculated
dfKAAData <- dfMain %>%
  select(displayName, gameId, playId, season, week, kickLength, specialTeamsResult) %>% # Filtering down to just the variables for wKV
  mutate(
    # Distance Ratio+ normalizes that kick distance as a measure above or below the average. A 10 point delta in DR+ is a 10% delta from average. 
    disRatioPlus = ifelse(season == "2018", signif(100*((kickLength / dfOverallPerformance$avgDisAttempted [1])), 3),
                             ifelse(season == "2019", signif(100*((kickLength / dfOverallPerformance$avgDisAttempted [2])), 3),
                                    ifelse(season == "2020", signif(100*((kickLength / dfOverallPerformance$avgDisAttempted [3])), 3), 0)
                                   )
                         ),
    # Weighted Kick Value assigns a value for kicks based on if it was made, and the distance. Kickers receive more points for making a further kick and lose more for missing a shorter kick
    wKickValue = ifelse(season == "2018", ifelse(specialTeamsResult == "Kick Attempt Good", ((disRatioPlus / 100)- dfOverallPerformance$minRationalized [1]), -1 + ((disRatioPlus / 100)- (dfOverallPerformance$minRationalized [1] + (dfOverallPerformance$maxRationalized [1] - 1.5)))),
                        ifelse(season == "2019", ifelse(specialTeamsResult == "Kick Attempt Good", (disRatioPlus / 100)- dfOverallPerformance$minRationalized [2], -1 + ((disRatioPlus / 100)- (dfOverallPerformance$minRationalized [2] + (dfOverallPerformance$maxRationalized [2] - 1.5)))),
                               ifelse(season == "2020", ifelse(specialTeamsResult == "Kick Attempt Good", (disRatioPlus / 100)- dfOverallPerformance$minRationalized [3], -1 + ((disRatioPlus / 100)- (dfOverallPerformance$minRationalized [3] + (dfOverallPerformance$maxRationalized [3] - 1.5)))), 0)
                              )
                  ),
  ) %>%
  
  # Sorted by kick length, season, and week
  arrange(kickLength, season, week) %>%
  data.frame()
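
The nested ifelse() calls above pick each season's baselines by row position ([1], [2], [3]), which depends on dfOverallPerformance staying sorted by season. As a sketch of an alternative, the hypothetical helper below (weightedKickValue is not part of the original code) applies the same formula but finds the season row with match(). It skips the signif() rounding and returns NA rather than 0 for an unknown season; the baseline numbers are invented so the arithmetic is easy to follow.

```r
# Hypothetical per-season baselines; in the report these come from
# dfOverallPerformance. The numbers are invented for illustration.
baselines <- data.frame(
  season          = c(2018, 2019),
  avgDisAttempted = c(40, 50),
  minRationalized = c(0.5, 0.4),
  maxRationalized = c(1.5, 1.4)
)

# Same formula as the nested ifelse() calls, with the season looked up by
# value instead of by row position.
weightedKickValue <- function(kickLength, made, season, baselines) {
  row   <- match(season, baselines$season)
  ratio <- kickLength / baselines$avgDisAttempted[row]  # Distance+ / 100
  minR  <- baselines$minRationalized[row]
  maxR  <- baselines$maxRationalized[row]
  ifelse(made, ratio - minR, -1 + (ratio - (minR + (maxR - 1.5))))
}

kicks <- data.frame(
  season     = c(2018, 2018, 2018, 2019),
  kickLength = c(60, 40, 20, 50),
  made       = c(TRUE, FALSE, FALSE, TRUE)
)
kicks$wKickValue <- weightedKickValue(kicks$kickLength, kicks$made,
                                      kicks$season, baselines)
print(kicks)
```

In the 2018 toy season a made 60-yarder is worth 1.0, a missed 40-yarder costs 0.5, and a missed 20-yarder costs a full 1.0, which matches the intent that long makes earn more and short misses cost more.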

# Sums the wKickValue for each kick attempt to generate KAA value 
dfKAA <- dfKAAData %>%
  filter(season == "2018") %>%
  group_by(displayName) %>%
  summarise(
    playerKicksAttempted = n(), # Shows that player's attempted kicks for the season
    playerKickAccuracy = percent((sum(specialTeamsResult == "Kick Attempt Good")) / playerKicksAttempted), #Represents their accuracy for that season
    kicksAboveAverage = signif(sum(wKickValue), 6) # Sums their wKickValue to find KAA
  ) %>%
  
  # Keeps only qualifying kickers: more than 0.75 attempts per game over 16 games (over 12 attempts)
  filter((playerKicksAttempted / (16*0.75)) > 1) %>%
  
  # Sorts by descending KAA
  arrange(-kicksAboveAverage) %>%
  data.frame()

knitr::kable(head(dfKAA, 10)) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))

displayName	playerKicksAttempted	playerKickAccuracy	kicksAboveAverage
Jason Myers	36	92%	16.70110
Justin Tucker	36	97%	16.36900
Ka’imi Fairbairn	41	88%	14.46000
Mason Crosby	35	83%	13.09640
Wil Lutz	26	100%	12.92870
Aldrick Rosas	31	97%	12.75710
Robbie Gould	31	97%	12.33110
Dustin Hopkins	28	89%	12.30260
Matt Bryant	21	95%	10.64030
Brett Maher	34	82%	9.70826

Top 10 Kickers by KAA for 2019 Season:

dfKAA <- dfKAAData %>%
  filter(season == "2019") %>%
  group_by(displayName) %>%
  summarise(
    playerKicksAttempted = n(), # Shows that player's attempted kicks for the season
    playerKickAccuracy = percent((sum(specialTeamsResult == "Kick Attempt Good")) / playerKicksAttempted), #Represents their accuracy for that season
    kicksAboveAverage = signif(sum(wKickValue), 6) # Sums their wKickValue to find KAA
  ) %>%
  
  # Keeps only qualifying kickers: more than 0.75 attempts per game over 16 games (over 12 attempts)
  filter((playerKicksAttempted / (16*0.75)) > 1) %>%
  
  # Sorts by descending KAA
  arrange(-kicksAboveAverage) %>%
  data.frame()

knitr::kable(head(dfKAA, 10)) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))

displayName	playerKicksAttempted	playerKickAccuracy	kicksAboveAverage
Brandon McManus	31	90%	13.83370
Harrison Butker	34	94%	13.43810
Josh Lambo	31	97%	12.28360
Justin Tucker	29	97%	12.17690
Wil Lutz	32	91%	11.88860
Joey Slye	27	81%	10.73060
Randy Bullock	30	87%	10.54600
Dan Bailey	27	93%	10.40840
Matt Gay	31	81%	10.30300
Chris Boswell	28	93%	9.61819

Top 10 Kickers by KAA for 2020 Season:

dfKAA <- dfKAAData %>%
  filter(season == "2020") %>%
  group_by(displayName) %>%
  summarise(
    playerKicksAttempted = n(), # Shows that player's attempted kicks for the season
    playerKickAccuracy = percent((sum(specialTeamsResult == "Kick Attempt Good")) / playerKicksAttempted), #Represents their accuracy for that season
    kicksAboveAverage = signif(sum(wKickValue), 6) # Sums their wKickValue to find KAA
  ) %>%
  
  # Keeps only qualifying kickers: more than 0.75 attempts per game over 16 games (over 12 attempts)
  filter((playerKicksAttempted / (16*0.75)) > 1) %>%
  
  # Sorts by descending KAA
  arrange(-kicksAboveAverage) %>%
  data.frame()

knitr::kable(head(dfKAA, 10)) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))

displayName	playerKicksAttempted	playerKickAccuracy	kicksAboveAverage
Jason Sanders	35	91%	15.51500
Graham Gano	30	97%	15.06150
Brandon McManus	31	87%	14.16790
Greg Zuerlein	36	83%	13.41440
Younghoe Koo	33	94%	13.17240
Jason Myers	20	100%	11.38360
Justin Tucker	27	89%	11.07870
Nick Folk	25	96%	10.90410
Cairo Santos	29	93%	10.84130
Ka’imi Fairbairn	28	89%	9.59302
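
The three per-season blocks above differ only in the season they filter on, so they could be folded into one function. The sketch below is a hypothetical helper (topKickersBySeason is not in the original code) written in base R. It assumes a data frame shaped like dfKAAData and returns accuracy as a proportion rather than a formatted percentage. The toy data are invented.

```r
# Hypothetical helper mirroring the per-season steps: filter to one season,
# summarise per kicker, apply the attempt threshold, sort by KAA.
topKickersBySeason <- function(kaaData, targetSeason,
                               minAttempts = 16 * 0.75, n = 10) {
  seasonKicks <- kaaData[kaaData$season == targetSeason, ]
  perKicker <- split(seasonKicks, seasonKicks$displayName)
  out <- do.call(rbind, lapply(perKicker, function(df) {
    data.frame(
      displayName          = df$displayName[1],
      playerKicksAttempted = nrow(df),
      playerKickAccuracy   = mean(df$specialTeamsResult == "Kick Attempt Good"),
      kicksAboveAverage    = sum(df$wKickValue)
    )
  }))
  out <- out[out$playerKicksAttempted > minAttempts, ]  # more than 12 attempts
  out <- out[order(-out$kicksAboveAverage), ]
  rownames(out) <- NULL
  head(out, n)
}

# Invented toy data: Kicker C has the highest per-kick value but too few attempts.
toy <- data.frame(
  displayName        = rep(c("Kicker A", "Kicker B", "Kicker C"), times = c(13, 14, 5)),
  season             = 2018,
  specialTeamsResult = "Kick Attempt Good",
  wKickValue         = rep(c(0.5, 0.3, 0.9), times = c(13, 14, 5))
)

topKickers <- topKickersBySeason(toy, 2018)
print(topKickers)
```

Calling the helper once per season would then replace the three copies, and a change to the threshold would only need to be made in one place.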

Visualization

Visualization 1: Overall Visualization Across 3 Seasons:

dfKAA <- dfKAAData %>%
  group_by(displayName, season) %>%
  summarise(
    playerKicksAttempted = n(), # Shows that player's attempted kicks for the season
    playerKickAccuracy = percent((sum(specialTeamsResult == "Kick Attempt Good")) / playerKicksAttempted), #Represents their accuracy for that season
    kicksAboveAverage = signif(sum(wKickValue), 6) # Sums their wKickValue to find KAA
  ) %>%
  
  # Keeps only qualifying kickers: more than 0.75 attempts per game over 16 games (over 12 attempts)
  filter((playerKicksAttempted / (16*0.75)) > 1) %>%
  
  # Sorts by season, and then descending KAA
  arrange(season, -kicksAboveAverage) %>%
  data.frame()

ggplot(data = dfKAA) +
  geom_bar(aes(x = reorder(displayName, kicksAboveAverage), y = kicksAboveAverage, fill = kicksAboveAverage), colour = "black", stat = "identity") +
  coord_flip() + 
  labs(x= "Kickers", y = "Kicks Above Average", title = "Overall Kicks Above Average by Player", fill = "KAA") +
  scale_fill_continuous(breaks = seq(0,17),
                        limits = c(0, 17),
                        labels = seq(0, 17),
                        low = "red",
                        high = "forestgreen") +
  theme(plot.title= element_text(hjust = 0.50))
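
Because dfKAA here has one row per player per season, geom_bar(stat = "identity") stacks a player's seasonal rows, so each bar's full length is that player's KAA summed over the seasons in which he qualified. Note that reorder() orders by the mean of each player's values unless told otherwise, so bar order need not match total length. The sketch below reproduces the summing step with the KAA values printed in the three top-10 tables. Seasons in which a kicker missed the top 10 are not available from those tables, so only Justin Tucker's total is complete.

```r
# KAA values copied from the 2018, 2019, and 2020 top-10 tables above.
# Myers (2019) and McManus (2018) did not appear in those tables, so their
# totals here are partial.
tableKAA <- data.frame(
  displayName = c("Justin Tucker", "Justin Tucker", "Justin Tucker",
                  "Jason Myers", "Jason Myers",
                  "Brandon McManus", "Brandon McManus"),
  season = c(2018, 2019, 2020, 2018, 2020, 2019, 2020),
  kicksAboveAverage = c(16.3690, 12.1769, 11.0787,
                        16.7011, 11.3836,
                        13.8337, 14.1679)
)

# Sum per player, then sort from highest to lowest total.
totals <- aggregate(kicksAboveAverage ~ displayName, data = tableKAA, FUN = sum)
totals <- totals[order(-totals$kicksAboveAverage), ]
print(totals)
```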

Conclusion

This new statistic ultimately provides a single number that captures what otherwise has to be pieced together from a kicker’s individual numbers. Using a statistical reporting website like Pro-Football-Reference.com, a careful viewer can see that one kicker may have a lower accuracy but more attempts from longer distances, or, inversely, that another may have a higher accuracy but more attempts from shorter distances. By rewarding kickers for making long-distance kicks and penalizing them for missing short ones, KAA produces one overall number that shows how far above average a kicker is at making field goals, beyond a simple accuracy value.

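With this concept in mind, it is clear that simple accuracy does not fully capture a kicker’s value. In every season shown, there are examples of a more accurate kicker having a lower KAA because he did not attempt kicks from longer distances. In 2018, for example, Justin Tucker and Jason Myers each attempted 36 kicks; Tucker was more accurate (97% to 92%), yet Myers finished with the higher KAA (16.70 to 16.37). KAA adjusts for this and gives a much more holistic view of kicker value.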

From the data, we can see that Justin Tucker is the most valuable kicker across the three seasons studied, with Brandon McManus and Jason Myers close behind him. While this does track with accuracy, it also shows that accuracy alone is not the best predictor of kicker value.

Note

If you are interested in more, or have any questions or suggestions, please contact me at brygnichols@gmail.com.