My Approach

I went into this assignment with two stats in mind, kick length and kicking percentage, because in my view the two most important things in kicking are how far you can kick the ball and how accurately you can do it. That gave me the idea of combining kicking percentage and average kick distance into a single statistic by simply multiplying them together (with the percentage expressed as a fraction), which I call the weighted kicker score. I also feel that clutch kicking is a large differentiating factor between kickers, since that is when the points from a field goal are needed most, so I limited my visualization to 4th quarter stats because of the high-pressure situations that generally occur in the 4th quarter.
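
As a quick illustration of how the score behaves, here is a minimal sketch using made-up numbers for two hypothetical kickers (these values are not taken from the data set). The column names mirror the ones I use later in the analysis.

```r
# Hypothetical 4th-quarter numbers for two made-up kickers (not real data).
kickers <- data.frame(
  kicker          = c("Kicker A", "Kicker B"),
  avg_dist        = c(40, 35),   # average attempt distance in yards
  kick_percentage = c(80, 100)   # percentage of attempts made
)

# Weighted kicker score = average distance * (kicking percentage / 100)
kickers$weighted_kicker_score <- kickers$avg_dist * (kickers$kick_percentage / 100)

print(kickers)
```

In this toy case Kicker B ends up with the higher score (35 versus 32) despite the shorter average distance, because the perfect percentage more than makes up for the 5 yards.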

Importing Data and Merging into a Data Frame

In the code below, I start by loading all the libraries I will need at the top, then set my working directory and read the files into data frames, keeping only the field goal plays from the plays file. With my 4 files loaded, I merge them into one main data frame, which I then use as the source for the data frame behind my visualization.

library(dplyr)
library(data.table)
library(lubridate)
library(ggplot2)
library(tidytext)

my_path <- "C:/Users/nrhar/OneDrive/Documents/SportsAnalytics"

setwd(my_path)

my_df1 <- fread("plays.csv")

my_df1 <- my_df1 %>%
  filter(specialTeamsPlayType %in% c("Field Goal")) %>%
  data.frame()

my_df2 <- fread("players.csv")

my_df3 <- fread("games.csv")

my_df4 <- fread("PFFScoutingData.csv")

my_df <- merge(my_df1, my_df2, by.x = c("kickerId"), by.y = c("nflId"), all.x = TRUE)
my_df <- merge(my_df, my_df3, by = c("gameId"), all.x = TRUE)
my_df <- merge(my_df, my_df4, by = c("gameId", "playId"), all.x = TRUE)
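
To show what the all.x = TRUE argument does in these merges, here is a small sketch on made-up data (the ids and names below are hypothetical). Rows from the left data frame are kept even when there is no match on the right, and the unmatched columns come back as NA, which is why a filter on !is.na(displayName) is useful later on.

```r
# Made-up stand-ins for the plays and players files (hypothetical ids and names).
plays <- data.frame(
  gameId   = c(1, 1, 2),
  playId   = c(10, 11, 10),
  kickerId = c(101, 102, 999)   # 999 has no matching player below
)
players <- data.frame(
  nflId       = c(101, 102),
  displayName = c("Kicker A", "Kicker B")
)

# Left join: keep every play, attach player info where the ids match.
merged <- merge(plays, players, by.x = "kickerId", by.y = "nflId", all.x = TRUE)

print(merged)
```

All 3 plays survive the merge, and the play with kickerId 999 has NA for displayName.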

Creating a Data Frame for the Visualization

With my main data frame holding all field goals from the 2018, 2019, and 2020 seasons, I then created 3 separate data frames, one per season, because I wanted to make a trellis chart of the top 10 kickers by weighted kicker score in the 4th quarter for each season. All of these data frames use the same basic code, with a few variations for different circumstances. Since kick distance is a main part of my metric, I excluded Brandon McManus from all seasons: he is the kicker for the Broncos, who play half their games in Denver, where the increased elevation allows kicks to travel farther. I also excluded Michael Badgley from the 2018 data frame and Austin Seibert from the 2020 data frame, because their weighted kicker scores tied with other kickers' scores and they had fewer total attempts. Finally, I set a minimum of 5 kicks, as there were a lot of kickers with as few as 1, 2, or 3 attempts in a season, which I felt was too low and skewed the results.

Once the data frames for the 3 seasons were ready, I combined them with the rbind function to create a single data frame with the top 10 kickers per season.

df <- my_df %>%
  select(displayName, season, specialTeamsPlayType, specialTeamsResult, kickLength, possessionTeam, quarter) %>%
  filter(specialTeamsPlayType == "Field Goal", !is.na(displayName),
         specialTeamsResult != "Blocked Kick Attempt", quarter == 4,
         displayName != "Brandon McManus", displayName != "Michael Badgley", season == 2018) %>%
  mutate(kick_good_yn = ifelse(specialTeamsResult == "Kick Attempt Good", 1, 0)) %>%
  group_by(displayName, season) %>%
  summarise(attempts = n(),
            kick_good = sum(kick_good_yn),
            kick_no_good = n() - kick_good,
            avg_dist = round(sum(kickLength)/(kick_good + kick_no_good)),
            kick_percentage = round(100 * kick_good/(kick_good + kick_no_good), 2),
            weighted_kicker_score = avg_dist * (kick_percentage/100),
            .groups = 'keep') %>%
  filter(attempts >= 5) %>%
  arrange(-weighted_kicker_score) %>%
  ungroup() %>%
  top_n(10, wt=weighted_kicker_score) %>%
  data.frame()

df1 <- my_df %>%
  select(displayName, season, specialTeamsPlayType, specialTeamsResult, kickLength, possessionTeam, quarter) %>%
  filter(specialTeamsPlayType == "Field Goal", !is.na(displayName),
         specialTeamsResult != "Blocked Kick Attempt", quarter == 4,
         displayName != "Brandon McManus", season == 2019) %>%
  mutate(kick_good_yn = ifelse(specialTeamsResult == "Kick Attempt Good", 1, 0)) %>%
  group_by(displayName, season) %>%
  summarise(attempts = n(),
            kick_good = sum(kick_good_yn),
            kick_no_good = n() - kick_good,
            avg_dist = round(sum(kickLength)/(kick_good + kick_no_good)),
            kick_percentage = round(100 * kick_good/(kick_good + kick_no_good), 2),
            weighted_kicker_score = avg_dist * (kick_percentage/100),
            .groups = 'keep') %>%
  filter(attempts >= 5) %>%
  arrange(-weighted_kicker_score) %>%
  ungroup() %>%
  top_n(10, wt=weighted_kicker_score) %>%
  data.frame()

df2 <- my_df %>%
  select(displayName, season, specialTeamsPlayType, specialTeamsResult, kickLength, possessionTeam, quarter) %>%
  filter(specialTeamsPlayType == "Field Goal", !is.na(displayName),
         specialTeamsResult != "Blocked Kick Attempt", quarter == 4,
         displayName != "Brandon McManus", displayName != "Austin Seibert", season == 2020) %>%
  mutate(kick_good_yn = ifelse(specialTeamsResult == "Kick Attempt Good", 1, 0)) %>%
  group_by(displayName, season) %>%
  summarise(attempts = n(),
            kick_good = sum(kick_good_yn),
            kick_no_good = n() - kick_good,
            avg_dist = round(sum(kickLength)/(kick_good + kick_no_good)),
            kick_percentage = round(100 * kick_good/(kick_good + kick_no_good), 2),
            weighted_kicker_score = avg_dist * (kick_percentage/100),
            .groups = 'keep') %>%
  filter(attempts >= 5) %>%
  arrange(-weighted_kicker_score) %>%
  ungroup() %>%
  top_n(10, wt=weighted_kicker_score) %>%
  data.frame()

dff <- rbind(df, df1)

df_final <- rbind(dff, df2)
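
Since the three season pipelines differ only in the season and the excluded names, the same steps could also be wrapped in one function. The sketch below is an alternative, not the code I actually ran: top_kickers and its arguments are hypothetical names, the toy data is made up, and instead of excluding tied kickers by hand it breaks ties by total attempts with arrange and slice_head.

```r
library(dplyr)

# Hypothetical helper (not part of the original analysis): top kickers for one
# season by weighted kicker score, with ties broken by number of attempts.
top_kickers <- function(data, yr, excluded = "Brandon McManus",
                        min_attempts = 5, n_top = 10) {
  data %>%
    filter(specialTeamsPlayType == "Field Goal", !is.na(displayName),
           specialTeamsResult != "Blocked Kick Attempt", quarter == 4,
           !(displayName %in% excluded), season == yr) %>%
    group_by(displayName, season) %>%
    summarise(attempts = n(),
              kick_good = sum(specialTeamsResult == "Kick Attempt Good"),
              avg_dist = round(mean(kickLength)),
              kick_percentage = round(100 * kick_good / attempts, 2),
              weighted_kicker_score = avg_dist * (kick_percentage / 100),
              .groups = "drop") %>%
    filter(attempts >= min_attempts) %>%
    # Highest score first; among equal scores, more attempts first.
    arrange(desc(weighted_kicker_score), desc(attempts)) %>%
    slice_head(n = n_top)
}

# Made-up plays: Kicker A goes 2/2 and Kicker B goes 3/3 from 40 yards,
# Kicker C goes 1/2 from 50 yards.
toy <- data.frame(
  displayName = rep(c("Kicker A", "Kicker B", "Kicker C"), times = c(2, 3, 2)),
  season = 2018,
  specialTeamsPlayType = "Field Goal",
  specialTeamsResult = c(rep("Kick Attempt Good", 6), "Kick Attempt No Good"),
  kickLength = c(40, 40, 40, 40, 40, 50, 50),
  quarter = 4
)

# Lower thresholds than in the real analysis so the toy data is not filtered out.
res <- top_kickers(toy, yr = 2018, excluded = character(0),
                   min_attempts = 2, n_top = 2)
print(res)
```

Kicker A and Kicker B tie on score here, and Kicker B is listed first because of the extra attempt. With the real data, something like bind_rows(lapply(2018:2020, function(yr) top_kickers(my_df, yr))) would stack the three seasons, although the resulting top 10 lists may not match the hand-tuned exclusions above exactly.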

# Bar chart of weighted kicker score, one panel per season.
# reorder_within() sorts the bars inside each season's panel; scale_x_reordered()
# then strips the season suffix it adds to the axis labels.
ggplot(df_final, aes(x = reorder_within(displayName, -weighted_kicker_score, season),
                     y = weighted_kicker_score, fill = kick_percentage)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = paste0(kick_percentage, "%")), vjust = -0.5, size = 3) +
  labs(x = "Kicker", y = "Weighted Kicker Score", title = "4th Quarter Kicking by Weighted Kicker Score",
       fill = "Kicking Percentage") +
  # Take the maximum over df_final (all seasons) so no bar falls outside the y limits
  scale_y_continuous(limits = c(0, max(df_final$weighted_kicker_score) * 1.1)) +
  scale_x_reordered() +
  scale_fill_continuous(limits = c(60, 100),
                        labels = paste0(seq(60, 100, 10), "%"),
                        breaks = seq(60, 100, 10),
                        low = "red",
                        high = "dark green") +
  facet_wrap(~season, ncol = 1, nrow = 3, scales = 'free') +
  theme(plot.title = element_text(hjust = 0.5), axis.text.x = element_text(size = 6))
