In order to determine who the best kicker is, I started by brainstorming different situations in which a kicker would be able to prove his ability. I ultimately landed on high pressure situations, and determined that those would be during the 4th quarter or overtime of any given game. Once I had create a subset for that data, I knew I had to come up with some sort of points system in order to create my ranking. I gave 1 point to each player for a made extra point, and 3 points to each player for a made field goal. I calculated both their actual amount of points scored, and the total possible points they could have scored, so that I could calculate their percentage later on. Once I then created a data frame excluding all of the repeated playIDs (because you can only score once per play), I calculated that percentage of scored vs possible, and ranked the players by that percentage. However, looking at this percentage was a bit misleading in my opinion, because a player with a score of 100% that had the opportunity to score 1 point and made 1 point is different than a player that had the opportunity to score 40 points and made 40 points. I decided I needed some sort of weighted score, to give more power to players with the larger amounts of opportunity. I used a logarithmic weighted ranking system, and ended up with Jason Sanders being the best kicker according to my ranking.
The first thing I did to subset my data was filter by the player being a kicker, the special teams play type being either Extra Point or Field Goal, and the quarter being either 4 or 5. I then selected the columns that I thought would be relevant to my work, most importantly being the player’s name, the game ID, play ID, the special teams play type and the special teams result. I then used the mutate function to create a new column called “actual_points” that would keep track of how many points the kicker made (1 point for Extra Point and 3 points for Field Goal). I accomplished this using “case when” and in that, also assigned 0 for when the kicker didn’t make the points. I then used the mutate function once again to add another column called “possible_points” and gave each player 1 point for Extra Point and 3 points for Field Goal no matter if they made the point or not. Once I had created a new data frame that excluded all of the duplicate play ID’s using “!duplicated” from the tidyverse library, I began calculating the totals for each players actual and possible points using the sum function, and then using mutate once again to create a new variable called “percentage” that was the player’s actual points scored divided by their total possible points, multiplied by 100. I then arranged those values from greatest to least, and looked at my result. I wasn’t completely satisfied with these results, and knew I needed to create some sort of weighted ranking system to allow players with more opportunities to be ranked higher than players with only 1 or 2 data points. Once I implemented that ranking system, I concluded that my best 3 kickers were Jason Sanders, Graham Gano, and Daniel Carlson.
In order to visualize this data, I decided a scatter plot/bubble chart would be the best way to go. I put total possible points on the x-axis and score percentage on the y-axis. Each bubble represents a kicker, and both the size and color represent their weighted score percentage. Bigger, lighter bubbles represent those players with higher weighted score percentages. The top 3 players are labeled, appearing in the top right section of the graph. Players with high score percentages but low total possible points are in the top left section and those with low score percentages and low possible points are in the bottom left.
setwd("/Users/rubysullivan/Desktop/Sports Analytics")
file.path(getwd(), "Data/NFLBDB2022")
## [1] "/Users/rubysullivan/Desktop/Sports Analytics/Data/NFLBDB2022"
dir.exists(file.path(getwd(), "Data/NFLBDB2022"))
## [1] TRUE
dir.create(file.path(getwd(), "Data/NFLBDB2022"), showWarnings = F, recursive = T)
library(tidyverse)
library(data.table)
library(dplyr)
library(lubridate)
library(httr)
library(DescTools)
library(scales)
library(ggplot2)
library(tidytext)
library(ggplot2)
library(ggrepel)
library(forcats)
library(RColorBrewer)
library(ggplot2)
library(ggalt)
library(ggforce)
library(hms)
library(gganimate)
library(data.table)
library(dplyr)
library(nflfastR)
library(ggimage)
library(gifski)
tracking_k <- fread("Data/NFLBDB2022/NFL2022/tracking2020.csv")
games_k <- fread("Data/NFLBDB2022/NFL2022/games.csv")
players_k <- fread("Data/NFLBDB2022/NFL2022/players.csv")
pff_k <- fread("Data/NFLBDB2022/NFL2022/PFFScoutingData.csv")
plays_k <- fread("Data/NFLBDB2022/NFL2022/plays.csv")
merged_df <- tracking_k %>%
left_join(games_k, by = "gameId") %>%
left_join(pff_k, by = c("gameId", "playId")) %>%
left_join(players_k, by = c("nflId")) %>%
left_join(plays_k, by = c("gameId", "playId"))
kickers <- merged_df %>%
filter(position == "K", specialTeamsPlayType %in% c("Extra Point", "Field Goal"), quarter %in% c("4", "5")) %>%
select(position, displayName.x, jerseyNumber, gameId, playId, collegeName, quarter, down, yardsToGo, specialTeamsPlayType, specialTeamsResult, kickerId, gameClock) %>%
group_by(displayName.x, gameId, playId) %>%
mutate(actual_points = case_when(
specialTeamsPlayType == "Extra Point" & specialTeamsResult == "Kick Attempt Good" ~ 1,
specialTeamsPlayType == "Extra Point" & specialTeamsResult == "Kick Attempt No Good" ~ 0,
specialTeamsPlayType == "Field Goal" & specialTeamsResult == "Kick Attempt Good" ~ 3,
specialTeamsPlayType == "Field Goal" & specialTeamsResult == "Kick Attempt No Good" ~ 0,
TRUE ~ 0
)) %>%
mutate(possible_points = case_when(
specialTeamsPlayType == "Extra Point" ~ 1,
specialTeamsPlayType == "Field Goal" ~ 3,
TRUE ~ 0
)) %>%
arrange(displayName.x) %>%
data.frame()
unidff <- kickers[!duplicated(kickers$playId), ]
total_df <- unidff %>%
select(displayName.x, playId, actual_points, possible_points) %>%
group_by(displayName.x) %>%
summarise(
total_possible_points = sum(possible_points, na.rm = TRUE),
total_actual_points = sum(actual_points, na.rm = TRUE)
) %>%
mutate(percentage = (100 * (total_actual_points / total_possible_points))) %>%
arrange(-total_possible_points) %>%
data.frame()
kicker_stats <- total_df %>%
mutate(weighted_score = percentage * log1p(total_possible_points)) %>%
arrange(desc(weighted_score))
top_3 <- kicker_stats %>%
arrange(desc(weighted_score)) %>%
slice(1:3)
ggplot(kicker_stats, aes(x = total_possible_points, y = percentage, size = weighted_score, color = weighted_score)) +
geom_point(alpha = 0.7) +
scale_size(range = c(2, 10)) +
labs(title = "Kicker Performance: Efficiency vs Opportunity",
x = "Total Possible Points",
y = "Score Percentage",
size = "Weighted Score",
color = "Weighted Score") +
theme_minimal() +
theme(legend.position = "right") +
geom_text(data = top_3, aes(label = displayName.x), vjust = 0, size = 3, color = "black")
My friend Alita said my ranking was inaccurate because Justin Tucker was number 5 and not number 1. This opinion makes sense as he is the most accurate kicker in NFL history, and based on that knowledge, should probably be ranked number 1. However, this is a testament to my weighted ranking system, as Justin Tucker didn’t have as many opportunities to score like the 4 players above him in the ranking, so he was placed lower even with a 100% percentage. There are most likely ways that this ranking could be improved, like perhaps narrowing down the opportunites even further, like being solely at the end of the 4th quarter or the end of overtime, but for this project, I think my ranking system ended up working pretty well.