Introduction

The purpose of this project is to identify the best kicker in the NFL over the 2018-2020 seasons using concepts borrowed from common sabermetric statistics in MLB, such as Wins Above Replacement (WAR). Each field goal attempt is assigned a Weighted Kick Value (wKV) based on Distance+, the kick’s distance normalized to that season’s average attempt distance. Summed over a season, these values form a statistic I am calling “Kicks Above Average” (KAA): the greater the value, the better that kicker’s overall performance.
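Written out in symbols, the calculation performed by the code later in this report looks roughly like the following. The notation is introduced here only for illustration and does not appear in the data set: d is the kick length in yards, and \bar{d}_s, d_{\min,s}, and d_{\max,s} are the average, shortest, and longest attempt distances in season s. The code also rounds D^{+} to three significant figures, which is omitted here.

```latex
\begin{align*}
D^{+} &= 100 \cdot \frac{d}{\bar{d}_s} \\
m_s &= \frac{d_{\min,s}}{\bar{d}_s}, \qquad M_s = \frac{d_{\max,s}}{\bar{d}_s} \\
\mathrm{wKV} &=
\begin{cases}
\dfrac{D^{+}}{100} - m_s & \text{if the kick is good} \\[6pt]
-1 + \dfrac{D^{+}}{100} - \bigl(m_s + (M_s - 1.5)\bigr) & \text{if the kick is missed}
\end{cases} \\
\mathrm{KAA} &= \sum_{\text{kicks}} \mathrm{wKV}
\end{align*}
```

A made kick is worth more the longer it is, and a missed kick costs more the shorter it is, because D^{+} enters both branches with a positive sign.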

Description of Project

Using the data from the NFL Big Data Bowl, several data frames are built to generate the KAA for each kicker. First, the main data frame, dfMain, collects all of the data needed to create the statistic. dfMain then feeds dfOverallPerformance, which holds the per-season averages and other values referenced later. These are used to build dfKAAData, which contains the Distance+ and wKV for each kick attempt. Finally, the wKV values are summed in the dfKAA data frame, which shows each player’s Kicks Above Average for each season studied.

Data Visualization

The data visualization shows the overall KAA for each kicker during each of the three seasons. A chart also presents this data alongside traditional kicks attempted and accuracy statistics.

Player KAA Tables


Top 10 Kickers by KAA for 2018 Season:

# ------------------------ Libraries -------------------------------------------

setwd("C:/Users/brygn/Dropbox/Stuff/School/IS470")

suppressMessages(library(ggplot2)) # Data viz tools
suppressMessages(library(ggalt)) # Extra geoms, coords, and stats for ggplot2
suppressMessages(library(ggforce)) # Additional ggplot2 geoms and facets
suppressMessages(library(hms)) # Tools for calculating Time
suppressMessages(library(data.table)) # Summarization of data tools
suppressMessages(library(dplyr)) # Tools for creating DF
suppressMessages(library(nflfastR)) # NFL team data
suppressMessages(library(ggimage)) # Tools for getting an image in plots
suppressMessages(library(tidyverse)) # Tools for formatting data
suppressMessages(library(scales)) # percent() formatting for accuracy values
suppressMessages(library(kableExtra)) # kable_styling() for formatted tables

# ------------------------ Data Formulation ------------------------------------

# Importation of data files referenced
gamesFile <- "Data/NFLBDB2022/games.csv"
dfGames <- fread(gamesFile)

playsFile <- "Data/NFLBDB2022/plays.csv"
dfPlays <- fread(playsFile)

dfPlays <- dfPlays %>%
  filter(specialTeamsPlayType == "Field Goal") %>%
  data.frame()

playersFile <- "Data/NFLBDB2022/players.csv"
dfPlayers <- fread(playersFile)

scoutingFile <- "Data/NFLBDB2022/PFFScoutingData.csv"
dfScouting <- fread(scoutingFile)


# Joins data sets and filters down to the play types and variables desired
dfMain <- dfGames %>%
  
  # Joins with plays, players, and scouting data
  left_join(dfPlays, by = c("gameId" = "gameId")) %>%
  left_join(dfPlayers, by = c("kickerId" = "nflId")) %>%
  left_join(dfScouting, by = c("gameId" = "gameId", "playId" = "playId")) %>%
  
  # Sorts by display name and gameId for easy reading
  arrange(displayName, gameId) %>%
  
  # Selection of the variables that are useful for the study
  select(displayName,gameId,
         season,week,homeTeamAbbr,visitorTeamAbbr,playId,playDescription,quarter, 
         possessionTeam, specialTeamsResult, kickerId, gameClock, kickLength
         ) %>%
  # Filters out the data points that are neither a made nor a missed field goal
  filter(
    specialTeamsResult != "Non-Special Teams Result",
    specialTeamsResult != "Out of Bounds",
    specialTeamsResult != "Downed",
    specialTeamsResult != "Blocked Kick Attempt"
  ) %>%
  data.frame()


# Creates an overall reference for stats that will be used in creation of KAA
dfOverallPerformance <- dfMain %>%
  group_by(season) %>%
  summarise(
    kicksAttempted = n(), # Sums total kick attempts for that season
    kicksMade = sum(specialTeamsResult == "Kick Attempt Good"), # Sum of all good kicks
    kicksMissed = sum(specialTeamsResult == "Kick Attempt No Good"), # Sum of all missed kicks
    minDisAttempted = min(kickLength), # Shortest kick of that season
    maxDisAttempted = max(kickLength), # Longest kick of that season 
    avgDisAttempted = signif(sum(kickLength) / kicksAttempted, 4), # Average distance tried
    minRationalized = minDisAttempted / avgDisAttempted, # normalizing the min to the average
    maxRationalized = maxDisAttempted / avgDisAttempted, # normalizing the max to the average
    firstQuartile = quantile(kickLength, probs = 0.25), # Quartile calculations, Deprecated but still a good reference
    secondQuartile = quantile(kickLength, probs = 0.50),
    thirdQuartile = quantile(kickLength, probs = 0.75)
    ) %>%
  mutate(
    kickAccuracy = signif((kicksMade / (kicksAttempted))*100, 4) # Calculates a percentage of overall accuracy
  ) %>%
  
  # Sort by season and descending kick accuracy
  arrange(season, -kickAccuracy) %>% 
  data.frame()

# Using the above calculated averages, the weighted value of each kick is calculated
dfKAAData <- dfMain %>%
  select(displayName, gameId, playId, season, week, kickLength, specialTeamsResult) %>% # Filtering down to just the variables for wKV
  mutate(
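    # Distance Ratio+ (DR+, called Distance+ in the text) expresses the kick distance relative to that season's average attempt distance. 100 is average, so a 10 point delta in DR+ is a 10% delta from average.
    # The [1], [2], [3] lookups below rely on dfOverallPerformance being sorted by season (2018, 2019, 2020).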
    disRatioPlus = ifelse(season == "2018", signif(100*((kickLength / dfOverallPerformance$avgDisAttempted [1])), 3),
                             ifelse(season == "2019", signif(100*((kickLength / dfOverallPerformance$avgDisAttempted [2])), 3),
                                    ifelse(season == "2020", signif(100*((kickLength / dfOverallPerformance$avgDisAttempted [3])), 3), 0)
                                   )
                         ),
    # Weighted Kick Value assigns a value to each kick based on whether it was made and on its distance. Kickers earn more for making a longer kick and lose more for missing a shorter kick
    wKickValue = ifelse(season == "2018", ifelse(specialTeamsResult == "Kick Attempt Good", ((disRatioPlus / 100)- dfOverallPerformance$minRationalized [1]), -1 + ((disRatioPlus / 100)- (dfOverallPerformance$minRationalized [1] + (dfOverallPerformance$maxRationalized [1] - 1.5)))),
                        ifelse(season == "2019", ifelse(specialTeamsResult == "Kick Attempt Good", (disRatioPlus / 100)- dfOverallPerformance$minRationalized [2], -1 + ((disRatioPlus / 100)- (dfOverallPerformance$minRationalized [2] + (dfOverallPerformance$maxRationalized [2] - 1.5)))),
                               ifelse(season == "2020", ifelse(specialTeamsResult == "Kick Attempt Good", (disRatioPlus / 100)- dfOverallPerformance$minRationalized [3], -1 + ((disRatioPlus / 100)- (dfOverallPerformance$minRationalized [3] + (dfOverallPerformance$maxRationalized [3] - 1.5)))), 0)
                              )
                  ),
  ) %>%
  
  # Sorted by kick length, season, and week
  arrange(kickLength, season, week) %>%
  data.frame()

# Sums the wKickValue for each kick attempt to generate KAA value 
dfKAA <- dfKAAData %>%
  filter(season == "2018") %>%
  group_by(displayName) %>%
  summarise(
    playerKicksAttempted = n(), # Shows that player's attempted kicks for the season
    playerKickAccuracy = percent((sum(specialTeamsResult == "Kick Attempt Good")) / playerKicksAttempted), #Represents their accuracy for that season
    kicksAboveAverage = signif(sum(wKickValue), 6) # Sums their wKickValue to find KAA
  ) %>%
  
  # Keeps only players with more than 12 attempts, the cutoff for a qualified kicker (0.75 attempts per game over 16 games)
  filter((playerKicksAttempted / (16*0.75)) > 1) %>%
  
  # Sorts by descending KAA
  arrange(-kicksAboveAverage) %>%
  data.frame()

knitr::kable(head(dfKAA, 10)) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
displayName       playerKicksAttempted  playerKickAccuracy  kicksAboveAverage
Jason Myers                         36                 92%           16.70110
Justin Tucker                       36                 97%           16.36900
Ka’imi Fairbairn                    41                 88%           14.46000
Mason Crosby                        35                 83%           13.09640
Wil Lutz                            26                100%           12.92870
Aldrick Rosas                       31                 97%           12.75710
Robbie Gould                        31                 97%           12.33110
Dustin Hopkins                      28                 89%           12.30260
Matt Bryant                         21                 95%           10.64030
Brett Maher                         34                 82%            9.70826
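The wKV step above looks up each season's reference values by row position ([1], [2], [3]), which works only while dfOverallPerformance stays sorted by season. One alternative, sketched here on made-up numbers, is to attach the reference values to each kick with a join keyed on season. The data frames `kicks` and `season_ref` and all of their values are hypothetical. Only the column names mirror the ones used in this report, and base R's merge() stands in for a dplyr join.

```r
# Toy per-kick data (hypothetical values, not from the Big Data Bowl files)
kicks <- data.frame(
  season = c(2018, 2018, 2019, 2019),
  kickLength = c(30, 50, 40, 20),
  specialTeamsResult = c("Kick Attempt Good", "Kick Attempt No Good",
                         "Kick Attempt Good", "Kick Attempt No Good"),
  stringsAsFactors = FALSE
)

# Toy season reference table (hypothetical), shaped like dfOverallPerformance
season_ref <- data.frame(
  season = c(2018, 2019),
  avgDisAttempted = c(40, 40),
  minRationalized = c(0.5, 0.5),
  maxRationalized = c(1.5, 1.6)
)

# Attach each season's reference values by key instead of by row position
kicks <- merge(kicks, season_ref, by = "season")

# Same arithmetic as the nested ifelse() version, written once for all seasons
kicks$disRatioPlus <- signif(100 * kicks$kickLength / kicks$avgDisAttempted, 3)
made <- kicks$specialTeamsResult == "Kick Attempt Good"
ratio <- kicks$disRatioPlus / 100
kicks$wKickValue <- ifelse(
  made,
  ratio - kicks$minRationalized,
  -1 + (ratio - (kicks$minRationalized + (kicks$maxRationalized - 1.5)))
)

print(kicks[, c("season", "kickLength", "disRatioPlus", "wKickValue")])
```

With a join, adding a fourth season would need no new ifelse() branch, and reordering the reference table would not change the result.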

Top 10 Kickers by KAA for 2019 Season:

dfKAA <- dfKAAData %>%
  filter(season == "2019") %>%
  group_by(displayName) %>%
  summarise(
    playerKicksAttempted = n(), # Shows that player's attempted kicks for the season
    playerKickAccuracy = percent((sum(specialTeamsResult == "Kick Attempt Good")) / playerKicksAttempted), #Represents their accuracy for that season
    kicksAboveAverage = signif(sum(wKickValue), 6) # Sums their wKickValue to find KAA
  ) %>%
  
  # Keeps only players with more than 12 attempts, the cutoff for a qualified kicker (0.75 attempts per game over 16 games)
  filter((playerKicksAttempted / (16*0.75)) > 1) %>%
  
  # Sorts by descending KAA
  arrange(-kicksAboveAverage) %>%
  data.frame()

knitr::kable(head(dfKAA, 10)) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
displayName       playerKicksAttempted  playerKickAccuracy  kicksAboveAverage
Brandon McManus                     31                 90%           13.83370
Harrison Butker                     34                 94%           13.43810
Josh Lambo                          31                 97%           12.28360
Justin Tucker                       29                 97%           12.17690
Wil Lutz                            32                 91%           11.88860
Joey Slye                           27                 81%           10.73060
Randy Bullock                       30                 87%           10.54600
Dan Bailey                          27                 93%           10.40840
Matt Gay                            31                 81%           10.30300
Chris Boswell                       28                 93%            9.61819

Top 10 Kickers by KAA for 2020 Season:

dfKAA <- dfKAAData %>%
  filter(season == "2020") %>%
  group_by(displayName) %>%
  summarise(
    playerKicksAttempted = n(), # Shows that player's attempted kicks for the season
    playerKickAccuracy = percent((sum(specialTeamsResult == "Kick Attempt Good")) / playerKicksAttempted), #Represents their accuracy for that season
    kicksAboveAverage = signif(sum(wKickValue), 6) # Sums their wKickValue to find KAA
  ) %>%
  
  # Keeps only players with more than 12 attempts, the cutoff for a qualified kicker (0.75 attempts per game over 16 games)
  filter((playerKicksAttempted / (16*0.75)) > 1) %>%
  
  # Sorts by descending KAA
  arrange(-kicksAboveAverage) %>%
  data.frame()

knitr::kable(head(dfKAA, 10)) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
displayName       playerKicksAttempted  playerKickAccuracy  kicksAboveAverage
Jason Sanders                       35                 91%           15.51500
Graham Gano                         30                 97%           15.06150
Brandon McManus                     31                 87%           14.16790
Greg Zuerlein                       36                 83%           13.41440
Younghoe Koo                        33                 94%           13.17240
Jason Myers                         20                100%           11.38360
Justin Tucker                       27                 89%           11.07870
Nick Folk                           25                 96%           10.90410
Cairo Santos                        29                 93%           10.84130
Ka’imi Fairbairn                    28                 89%            9.59302
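The three per-season blocks above differ only in the season they filter on, so they could plausibly be folded into one function. The sketch below is one way to do that in base R. The helper `top_kickers()` and the toy data are hypothetical and not part of the original analysis, and accuracy is kept as a plain proportion rather than formatted with percent().

```r
# Hypothetical helper: top-n qualified kickers by KAA for a single season.
# kaa_data is assumed to have the columns used in dfKAAData:
# displayName, season, specialTeamsResult, wKickValue.
top_kickers <- function(kaa_data, target_season, n = 10, min_attempts = 16 * 0.75) {
  season_kicks <- kaa_data[kaa_data$season == target_season, , drop = FALSE]
  by_player <- split(season_kicks, season_kicks$displayName)

  # One summary row per kicker
  rows <- lapply(names(by_player), function(player) {
    kicks <- by_player[[player]]
    data.frame(
      displayName = player,
      playerKicksAttempted = nrow(kicks),
      playerKickAccuracy = mean(kicks$specialTeamsResult == "Kick Attempt Good"),
      kicksAboveAverage = signif(sum(kicks$wKickValue), 6),
      stringsAsFactors = FALSE
    )
  })
  if (length(rows) == 0) return(NULL) # no kicks recorded for that season

  out <- do.call(rbind, rows)
  # Same qualification rule as above: more than 12 attempts
  out <- out[out$playerKicksAttempted > min_attempts, , drop = FALSE]
  out <- out[order(-out$kicksAboveAverage), , drop = FALSE]
  rownames(out) <- NULL
  head(out, n)
}

# Toy data (made up): two qualified kickers and one with too few attempts
toy <- data.frame(
  displayName = rep(c("Kicker A", "Kicker B", "Kicker C"), times = c(13, 13, 5)),
  season = 2018,
  specialTeamsResult = "Kick Attempt Good",
  wKickValue = rep(c(0.5, 0.25, 1), times = c(13, 13, 5)),
  stringsAsFactors = FALSE
)

top_2018 <- top_kickers(toy, 2018)
print(top_2018)
```

Under these assumptions, each season table would come from a single call such as top_kickers(dfKAAData, 2019), with the result passed to knitr::kable() as before.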

Visualization


Visualization 1: Overall Visualization Across 3 Seasons:

dfKAA <- dfKAAData %>%
  group_by(displayName, season) %>%
  summarise(
    playerKicksAttempted = n(), # Shows that player's attempted kicks for the season
    playerKickAccuracy = percent((sum(specialTeamsResult == "Kick Attempt Good")) / playerKicksAttempted), #Represents their accuracy for that season
    kicksAboveAverage = signif(sum(wKickValue), 6) # Sums their wKickValue to find KAA
  ) %>%
  
  # Keeps only players with more than 12 attempts, the cutoff for a qualified kicker (0.75 attempts per game over 16 games)
  filter((playerKicksAttempted / (16*0.75)) > 1) %>%
  
  # Sorts by season, and then descending KAA
  arrange(season, -kicksAboveAverage) %>%
  data.frame()

ggplot(data = dfKAA) +
  geom_bar(aes(x = reorder(displayName, kicksAboveAverage), y = kicksAboveAverage, fill = kicksAboveAverage), colour = "black", stat = "identity") +
  coord_flip() + 
  labs(x= "Kickers", y = "Kicks Above Average", title = "Overall Kicks Above Average by Player", fill = "KAA") +
  scale_fill_continuous(breaks = seq(0,17),
                        limits = c(0, 17),
                        labels = seq(0, 17),
                        low = "red",
                        high = "forestgreen") +
  theme(plot.title= element_text(hjust = 0.50))

Conclusion

This new statistic ultimately creates a one-stop number that captures what otherwise has to be deciphered from a kicker’s individual numbers. Using a statistical reporting website like Pro-Football-Reference.com, a careful viewer can see that one kicker may have a lower accuracy but more attempts from longer distances, while another may have a higher accuracy but more attempts from shorter distances. By rewarding kickers for making long kicks and penalizing them for missing short ones, KAA produces one overall number that shows how far above average a kicker is at making field goals, beyond a simple accuracy value.

With this concept in mind, it is clear that simple accuracy does not fully capture a kicker’s value. In every season shown, there are examples of a more accurate kicker having a lower KAA value because he did not attempt kicks from longer distances. KAA adjusts for this and gives a much more holistic view of kicker value.
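A small worked example makes the effect concrete. The season reference values and both kickers below are invented for illustration, and `weighted_kick_value()` is a hypothetical helper that repeats the wKV arithmetic used in this report.

```r
# Hypothetical season reference values (not taken from the data set)
avg_dist <- 40 # average attempt distance, yards
min_dist <- 20 # shortest attempt
max_dist <- 60 # longest attempt
min_ratio <- min_dist / avg_dist # plays the role of minRationalized
max_ratio <- max_dist / avg_dist # plays the role of maxRationalized

# wKV for a vector of kicks, following the formula used for dfKAAData
weighted_kick_value <- function(kick_length, made) {
  ratio <- signif(100 * kick_length / avg_dist, 3) / 100
  ifelse(made,
         ratio - min_ratio,
         -1 + (ratio - (min_ratio + (max_ratio - 1.5))))
}

# Kicker 1: 20 attempts from 25 yards, all made (100% accuracy)
short_kaa <- sum(weighted_kick_value(rep(25, 20), rep(TRUE, 20)))

# Kicker 2: 20 attempts from 50 yards, 17 made and 3 missed (85% accuracy)
long_kaa <- sum(weighted_kick_value(rep(50, 20),
                                    rep(c(TRUE, FALSE), times = c(17, 3))))

print(c(short_range = short_kaa, long_range = long_kaa))
```

Under these assumed numbers the perfect short-range kicker finishes with a KAA of 2.5, while the 85% long-range kicker finishes with 12, which is the pattern described above.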

From the data, we can see that Justin Tucker is the most valuable kicker across the three seasons studied, with Brandon McManus and Jason Myers close behind him. While KAA does track with accuracy, the results show that accuracy alone is not the best predictor of kicker value.

Note

If interested in more or have any questions or suggestions, please get in contact with me at