The purpose of this project is to identify the best kicker in the NFL across the 2018-2020 seasons using concepts from common sabermetric statistics in MLB, such as Wins Above Replacement (WAR). Each field goal attempt is assigned a Weighted Kick Value (wKV) based on Distance+, the kick’s distance normalized to the season’s average attempt distance. Summed per kicker, these values form a statistic I am calling “Kicks Above Average” (KAA); the greater the value, the better that kicker’s overall performance.
Using the data from the NFL Big Data Bowl, several data frames are built to generate the KAA for each kicker. First, a main data frame, dfMain, collects all the data needed to create the statistic. dfMain then feeds dfOverallPerformance, which holds per-season averages and other values that are referenced later. These are used to build dfKAAData, which holds the Distance+ and wKV for each kick attempt. Finally, the wKV values are summed in the dfKAA data frame, which gives each player’s Kicks Above Average for each season studied.
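For reference, the per-kick quantities can be written out as read from the R code later in this notebook. The symbols are my own shorthand and not part of the original code: $d$ is the kick length, and $\bar{d}_s$, $d^{\min}_s$, and $d^{\max}_s$ are the average, shortest, and longest attempt in season $s$. The code also rounds the season average to 4 significant figures and Distance+ to 3, which is left out here.

$$
\text{Distance+} = 100 \cdot \frac{d}{\bar{d}_s}
$$

$$
\text{wKV} =
\begin{cases}
\dfrac{\text{Distance+}}{100} - \dfrac{d^{\min}_s}{\bar{d}_s} & \text{if the kick is good} \\[2ex]
-1 + \dfrac{\text{Distance+}}{100} - \left( \dfrac{d^{\min}_s}{\bar{d}_s} + \dfrac{d^{\max}_s}{\bar{d}_s} - 1.5 \right) & \text{if the kick is no good}
\end{cases}
$$

$$
\text{KAA} = \sum_{\text{kicks by a player in season } s} \text{wKV}
$$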
The data visualization illustrates the overall KAA for each kicker during each of the three seasons. A table for each season also shows this data alongside traditional attempts and accuracy statistics.
# ------------------------ Libraries -------------------------------------------
setwd("C:/Users/brygn/Dropbox/Stuff/School/IS470")
suppressMessages(library(ggplot2)) # Data viz tools
suppressMessages(library(ggalt)) # Extra geoms, coords, and stats for ggplot2
suppressMessages(library(ggforce)) # Additional ggplot2 geoms and facets
suppressMessages(library(hms)) # Tools for calculating Time
suppressMessages(library(data.table)) # Summarization of data tools
suppressMessages(library(dplyr)) # Tools for creating DF
suppressMessages(library(nflfastR)) # NFL team data
suppressMessages(library(ggimage)) # Tools for getting an image in plots
suppressMessages(library(tidyverse)) # Tools for formatting data
suppressMessages(library(scales)) # percent() formatting for accuracy values
suppressMessages(library(kableExtra)) # kable_styling() for the result tables
# ------------------------ Data Formulation ------------------------------------
# Importation of data files referenced
gamesFile <- "Data/NFLBDB2022/games.csv"
dfGames <- fread(gamesFile)
playsFile <- "Data/NFLBDB2022/plays.csv"
dfPlays <- fread(playsFile)
dfPlays <- dfPlays %>%
filter(specialTeamsPlayType == "Field Goal") %>%
data.frame()
playersFile <- "Data/NFLBDB2022/players.csv"
dfPlayers <- fread(playersFile)
scoutingFile <- "Data/NFLBDB2022/PFFScoutingData.csv"
dfScouting <- fread(scoutingFile)
# Joins data sets and filters down to the play types and variables desired
dfMain <- dfGames %>%
# Joins with plays, players, and scouting data
left_join(dfPlays, by = c("gameId" = "gameId")) %>%
left_join(dfPlayers, by = c("kickerId" = "nflId")) %>%
left_join(dfScouting, by = c("gameId" = "gameId", "playId" = "playId")) %>%
# Sorts by display name and gameId for easy reading
arrange(displayName, gameId) %>%
# Selection of the variables that are useful for the study
select(displayName,gameId,
season,week,homeTeamAbbr,visitorTeamAbbr,playId,playDescription,quarter,
possessionTeam, specialTeamsResult, kickerId, gameClock, kickLength
) %>%
# Keeps only made or missed field goals by dropping every other result
filter(
  specialTeamsResult != "Non-Special Teams Result",
  specialTeamsResult != "Out of Bounds",
  specialTeamsResult != "Downed",
  specialTeamsResult != "Blocked Kick Attempt"
) %>%
data.frame()
# Creates an overall reference for stats that will be used in creation of KAA
dfOverallPerformance <- dfMain %>%
group_by(season) %>%
summarise(
kicksAttempted = n(), # Sums total kick attempts for that season
kicksMade = sum(specialTeamsResult == "Kick Attempt Good"), # Sum of all good kicks
kicksMissed = sum(specialTeamsResult == "Kick Attempt No Good"), # Sum of all missed kicks
minDisAttempted = min(kickLength), # Shortest kick of that season
maxDisAttempted = max(kickLength), # Longest kick of that season
avgDisAttempted = signif(sum(kickLength) / kicksAttempted, 4), # Average distance tried
minRationalized = minDisAttempted / avgDisAttempted, # normalizing the min to the average
maxRationalized = maxDisAttempted / avgDisAttempted, # normalizing the max to the average
firstQuartile = quantile(kickLength, probs = 0.25), # Quartile calculations, Deprecated but still a good reference
secondQuartile = quantile(kickLength, probs = 0.50),
thirdQuartile = quantile(kickLength, probs = 0.75)
) %>%
mutate(
kickAccuracy = signif((kicksMade / (kicksAttempted))*100, 4) # Calculates a percentage of overall accuracy
) %>%
# Sort by season and descending kick accuracy
arrange(season, -kickAccuracy) %>%
data.frame()
# Using the above calculated averages, the weighted value of each kick is calculated
dfKAAData <- dfMain %>%
select(displayName, gameId, playId, season, week, kickLength, specialTeamsResult) %>% # Filtering down to just the variables for wKV
mutate(
# Distance Ratio+ (Distance+) normalizes the kick distance as a measure above or below the season average. A 10 point delta in DR+ is a 10% delta from average.
# Rows 1-3 of dfOverallPerformance are the 2018, 2019, and 2020 seasons.
disRatioPlus = ifelse(season == "2018", signif(100*((kickLength / dfOverallPerformance$avgDisAttempted [1])), 3),
ifelse(season == "2019", signif(100*((kickLength / dfOverallPerformance$avgDisAttempted [2])), 3),
ifelse(season == "2020", signif(100*((kickLength / dfOverallPerformance$avgDisAttempted [3])), 3), 0)
)
),
# Weighted Kick Value scores each kick by its result and its distance. Kickers gain more for making a longer kick and lose more for missing a shorter kick
wKickValue = ifelse(season == "2018", ifelse(specialTeamsResult == "Kick Attempt Good", ((disRatioPlus / 100)- dfOverallPerformance$minRationalized [1]), -1 + ((disRatioPlus / 100)- (dfOverallPerformance$minRationalized [1] + (dfOverallPerformance$maxRationalized [1] - 1.5)))),
ifelse(season == "2019", ifelse(specialTeamsResult == "Kick Attempt Good", (disRatioPlus / 100)- dfOverallPerformance$minRationalized [2], -1 + ((disRatioPlus / 100)- (dfOverallPerformance$minRationalized [2] + (dfOverallPerformance$maxRationalized [2] - 1.5)))),
ifelse(season == "2020", ifelse(specialTeamsResult == "Kick Attempt Good", (disRatioPlus / 100)- dfOverallPerformance$minRationalized [3], -1 + ((disRatioPlus / 100)- (dfOverallPerformance$minRationalized [3] + (dfOverallPerformance$maxRationalized [3] - 1.5)))), 0)
)
),
) %>%
# Sorted by kick length, season, and week
arrange(kickLength, season, week) %>%
data.frame()
# Sums the wKickValue for each kick attempt to generate KAA value
dfKAA <- dfKAAData %>%
filter(season == "2018") %>%
group_by(displayName) %>%
summarise(
playerKicksAttempted = n(), # Shows that player's attempted kicks for the season
playerKickAccuracy = percent((sum(specialTeamsResult == "Kick Attempt Good")) / playerKicksAttempted), #Represents their accuracy for that season
kicksAboveAverage = signif(sum(wKickValue), 6) # Sums their wKickValue to find KAA
) %>%
# Keeps only kickers with more than 12 attempts (over 0.75 attempts per game in a 16-game season)
filter((playerKicksAttempted / (16*0.75)) > 1) %>%
# Sorts by descending KAA
arrange(-kicksAboveAverage) %>%
data.frame()
knitr::kable(head(dfKAA, 10)) %>%
kable_styling(bootstrap_options = c("striped", "hover"))
| displayName | playerKicksAttempted | playerKickAccuracy | kicksAboveAverage |
|---|---|---|---|
| Jason Myers | 36 | 92% | 16.70110 |
| Justin Tucker | 36 | 97% | 16.36900 |
| Ka’imi Fairbairn | 41 | 88% | 14.46000 |
| Mason Crosby | 35 | 83% | 13.09640 |
| Wil Lutz | 26 | 100% | 12.92870 |
| Aldrick Rosas | 31 | 97% | 12.75710 |
| Robbie Gould | 31 | 97% | 12.33110 |
| Dustin Hopkins | 28 | 89% | 12.30260 |
| Matt Bryant | 21 | 95% | 10.64030 |
| Brett Maher | 34 | 82% | 9.70826 |
dfKAA <- dfKAAData %>%
filter(season == "2019") %>%
group_by(displayName) %>%
summarise(
playerKicksAttempted = n(), # Shows that player's attempted kicks for the season
playerKickAccuracy = percent((sum(specialTeamsResult == "Kick Attempt Good")) / playerKicksAttempted), #Represents their accuracy for that season
kicksAboveAverage = signif(sum(wKickValue), 6) # Sums their wKickValue to find KAA
) %>%
# Keeps only kickers with more than 12 attempts (over 0.75 attempts per game in a 16-game season)
filter((playerKicksAttempted / (16*0.75)) > 1) %>%
# Sorts by descending KAA
arrange(-kicksAboveAverage) %>%
data.frame()
knitr::kable(head(dfKAA, 10)) %>%
kable_styling(bootstrap_options = c("striped", "hover"))
| displayName | playerKicksAttempted | playerKickAccuracy | kicksAboveAverage |
|---|---|---|---|
| Brandon McManus | 31 | 90% | 13.83370 |
| Harrison Butker | 34 | 94% | 13.43810 |
| Josh Lambo | 31 | 97% | 12.28360 |
| Justin Tucker | 29 | 97% | 12.17690 |
| Wil Lutz | 32 | 91% | 11.88860 |
| Joey Slye | 27 | 81% | 10.73060 |
| Randy Bullock | 30 | 87% | 10.54600 |
| Dan Bailey | 27 | 93% | 10.40840 |
| Matt Gay | 31 | 81% | 10.30300 |
| Chris Boswell | 28 | 93% | 9.61819 |
dfKAA <- dfKAAData %>%
filter(season == "2020") %>%
group_by(displayName) %>%
summarise(
playerKicksAttempted = n(), # Shows that player's attempted kicks for the season
playerKickAccuracy = percent((sum(specialTeamsResult == "Kick Attempt Good")) / playerKicksAttempted), #Represents their accuracy for that season
kicksAboveAverage = signif(sum(wKickValue), 6) # Sums their wKickValue to find KAA
) %>%
# Keeps only kickers with more than 12 attempts (over 0.75 attempts per game in a 16-game season)
filter((playerKicksAttempted / (16*0.75)) > 1) %>%
# Sorts by descending KAA
arrange(-kicksAboveAverage) %>%
data.frame()
knitr::kable(head(dfKAA, 10)) %>%
kable_styling(bootstrap_options = c("striped", "hover"))
| displayName | playerKicksAttempted | playerKickAccuracy | kicksAboveAverage |
|---|---|---|---|
| Jason Sanders | 35 | 91% | 15.51500 |
| Graham Gano | 30 | 97% | 15.06150 |
| Brandon McManus | 31 | 87% | 14.16790 |
| Greg Zuerlein | 36 | 83% | 13.41440 |
| Younghoe Koo | 33 | 94% | 13.17240 |
| Jason Myers | 20 | 100% | 11.38360 |
| Justin Tucker | 27 | 89% | 11.07870 |
| Nick Folk | 25 | 96% | 10.90410 |
| Cairo Santos | 29 | 93% | 10.84130 |
| Ka’imi Fairbairn | 28 | 89% | 9.59302 |
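The three per-season blocks above differ only in the season passed to `filter()`, so they could be folded into one function. The following is a minimal sketch of that idea, not the code used to produce the tables: the name `top_kickers`, its parameters, and the toy data are all invented for illustration, and accuracy is formatted with base `sprintf()` instead of `percent()`.

```r
library(dplyr)

# Hypothetical helper: top kickers by KAA for a single season.
# kaa_data must have displayName, season, specialTeamsResult, wKickValue.
top_kickers <- function(kaa_data, target_season, min_attempts = 12, rows = 10) {
  kaa_data %>%
    filter(season == target_season) %>%
    group_by(displayName) %>%
    summarise(
      playerKicksAttempted = n(),
      playerKickAccuracy = sprintf(
        "%.0f%%",
        100 * sum(specialTeamsResult == "Kick Attempt Good") / playerKicksAttempted
      ),
      kicksAboveAverage = signif(sum(wKickValue), 6)
    ) %>%
    # Same cutoff as above: more than 12 attempts in the season
    filter(playerKicksAttempted > min_attempts) %>%
    arrange(desc(kicksAboveAverage)) %>%
    head(rows) %>%
    data.frame()
}

# Invented toy data standing in for dfKAAData
toy <- data.frame(
  displayName = c(rep("Kicker A", 13), rep("Kicker B", 13), rep("Kicker C", 5)),
  season = 2018,
  specialTeamsResult = c(
    rep("Kick Attempt Good", 13),
    rep(c("Kick Attempt Good", "Kick Attempt No Good"), times = c(10, 3)),
    rep("Kick Attempt Good", 5)
  ),
  wKickValue = c(rep(0.5, 13), rep(c(0.8, -0.6), times = c(10, 3)), rep(0.9, 5))
)

result <- top_kickers(toy, 2018)
print(result)
```

In the toy data, Kicker C is dropped by the attempts cutoff, and the remaining two kickers are returned in descending KAA order. With the real data, the call would be along the lines of `top_kickers(dfKAAData, 2019)`.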
dfKAA <- dfKAAData %>%
group_by(displayName, season) %>%
summarise(
playerKicksAttempted = n(), # Shows that player's attempted kicks for the season
playerKickAccuracy = percent((sum(specialTeamsResult == "Kick Attempt Good")) / playerKicksAttempted), #Represents their accuracy for that season
kicksAboveAverage = signif(sum(wKickValue), 6) # Sums their wKickValue to find KAA
) %>%
# Keeps only kickers with more than 12 attempts (over 0.75 attempts per game in a 16-game season)
filter((playerKicksAttempted / (16*0.75)) > 1) %>%
# Sorts by season, and then descending KAA
arrange(season, -kicksAboveAverage) %>%
data.frame()
ggplot(data = dfKAA) +
geom_bar(aes(x = reorder(displayName, kicksAboveAverage), y = kicksAboveAverage, fill = kicksAboveAverage), colour = "black", stat = "identity") +
coord_flip() +
labs(x= "Kickers", y = "Kicks Above Average", title = "Overall Kicks Above Average by Player", fill = "KAA") +
scale_fill_continuous(breaks = seq(0,17),
limits = c(0, 17),
labels = seq(0, 17),
low = "red",
high = "forestgreen") +
theme(plot.title= element_text(hjust = 0.50))
This new statistic ultimately creates a one-stop number that captures what can otherwise only be deciphered by studying a kicker’s numbers closely. On a statistics website like Pro-Football-Reference.com, a careful viewer can see that one kicker may have a lower accuracy but more attempts from long distance, or, inversely, a higher accuracy but more attempts from short distance. By rewarding kickers for making long kicks and penalizing them for missing short ones, KAA produces a single number that shows how far above average a kicker is at making field goals, beyond a simple accuracy value.
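To make the reward and penalty concrete, here is a small sketch of the per-kick value in base R. The season values (average 38, shortest 19, longest 57 yards) are invented so the arithmetic comes out in round numbers, the function name `weighted_kick_value` is hypothetical, and the `signif()` rounding from the notebook is left out.

```r
# Hypothetical season summary values, chosen for round numbers (not from the data)
avg_dis <- 38  # average attempt distance in yards
min_dis <- 19  # shortest attempt in yards
max_dis <- 57  # longest attempt in yards

# Weighted Kick Value for one or more kicks, without signif() rounding
weighted_kick_value <- function(kick_length, made) {
  dis_ratio <- kick_length / avg_dis  # Distance+ divided by 100
  min_ratio <- min_dis / avg_dis      # plays the role of minRationalized
  max_ratio <- max_dis / avg_dis      # plays the role of maxRationalized
  ifelse(
    made,
    dis_ratio - min_ratio,
    -1 + (dis_ratio - (min_ratio + (max_ratio - 1.5)))
  )
}

kicks <- data.frame(
  kick_length = c(57, 19, 19, 57),
  made        = c(TRUE, TRUE, FALSE, FALSE)
)
kicks$wKV <- weighted_kick_value(kicks$kick_length, kicks$made)
print(kicks)
```

With these invented values, a make from the longest distance is worth 1 and a make from the shortest is worth 0, while a miss from the shortest distance costs a full point and a miss from the longest costs nothing. In a real season the longest attempt is not exactly 1.5 times the average, so the `max_ratio - 1.5` term shifts every missed kick by a constant.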
With this concept in mind, it is clear that simple accuracy does not fully demonstrate a kicker’s value. In every season shown, there are examples of a more accurate kicker having a lower KAA value because he did not attempt kicks from longer distances. KAA adjusts for this and creates a much more holistic view of kicker value.
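The same effect can be reproduced with two invented kickers. This sketch uses base R only and assumes the same made-up season values as a round-number example (average 38, shortest 19, longest 57 yards); the kicker names and attempts are hypothetical and do not come from the data set.

```r
# Hypothetical season values (invented, not from dfOverallPerformance)
avg_dis <- 38
min_dis <- 19
max_dis <- 57

# Two invented kickers: perfect from 25 yards versus 9-for-10 from 50 yards
toy_kicks <- data.frame(
  displayName = rep(c("Short Range", "Long Range"), each = 10),
  kickLength  = rep(c(25, 50), each = 10),
  made        = c(rep(TRUE, 10), rep(TRUE, 9), FALSE)
)

# Per-kick value, following the made/missed branches of wKickValue
toy_kicks$wKickValue <- with(
  toy_kicks,
  ifelse(
    made,
    kickLength / avg_dis - min_dis / avg_dis,
    -1 + (kickLength / avg_dis - (min_dis / avg_dis + (max_dis / avg_dis - 1.5)))
  )
)

# Summarise accuracy and KAA per kicker, then sort by descending KAA
by_kicker <- split(toy_kicks, toy_kicks$displayName)
toy_kaa <- data.frame(
  displayName       = names(by_kicker),
  accuracy          = sapply(by_kicker, function(k) mean(k$made)),
  kicksAboveAverage = sapply(by_kicker, function(k) sum(k$wKickValue))
)
toy_kaa <- toy_kaa[order(-toy_kaa$kicksAboveAverage), ]
print(toy_kaa)
```

The long-range kicker finishes with the higher KAA despite the lower accuracy, because each made 50-yard kick is worth far more than a made 25-yard kick under these assumptions.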
From the data, we can see that Justin Tucker is the most valuable kicker across the three seasons studied, with Brandon McManus and Jason Myers close behind him. While KAA does track with accuracy, the results show that accuracy alone is not the best predictor of kicker value.
If you are interested in more, or have any questions or suggestions, please contact me at brygnichols@gmail.com.