Introduction

For Assignment 2, the goal was to create a data visualization presenting the top kickers in the NFL. I defined the best kickers as those with the highest accuracy, measured as the ratio of successful field goals to total field goal attempts, among kickers with at least 10 attempts.
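
As a small illustration of this metric, the sketch below computes accuracy and applies the 10-attempt threshold on made-up data. The data frame `toy_kicks`, its columns, and the kicker names are hypothetical and are not part of the assignment data.

```r
library(dplyr)

# Hypothetical toy data: one row per field goal attempt (all values are made up).
toy_kicks <- data.frame(
  kicker = c(rep("Kicker A", 12), rep("Kicker B", 10), rep("Kicker C", 4)),
  made   = c(rep(1, 11), 0,       # Kicker A: 11 of 12
             rep(1, 8), 0, 0,     # Kicker B: 8 of 10
             rep(1, 4))           # Kicker C: 4 of 4, but too few attempts
)

toy_accuracy <- toy_kicks %>%
  group_by(kicker) %>%
  summarise(attempts = n(),
            made     = sum(made),
            accuracy = made / attempts,
            .groups  = "drop") %>%
  filter(attempts >= 10) %>%      # minimum-attempts threshold
  arrange(desc(accuracy))

print(toy_accuracy)
```

Kicker C is perfect but is dropped by the threshold, which is the reason for requiring a minimum number of attempts.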

#Libraries
library(data.table)
library(lubridate)
library(dplyr)
library(httr) 
library(ggplot2)
library(tidytext)
library(RColorBrewer)
library(knitr)
#Data Read
dfa <- fread("plays.csv")
dfb <- fread("players.csv")
dfc <- fread("games.csv")
dfd <- fread("PFFScoutingData.csv")
dfa <- dfa %>%
  filter(specialTeamsPlayType %in% c("Field Goal")) %>%
  data.frame()
#Data Merge
complete_df <- merge(dfa,dfb, by.x = c("kickerId"), by.y = c("nflId"),all.x = TRUE)
complete_df <- merge(complete_df,dfc, by = c("gameId"), all.x = TRUE)
complete_df <- merge(complete_df,dfd, by = c("gameId","playId"), all.x = TRUE)
#Filter Data
df <- complete_df %>%
  select(season, displayName, specialTeamsPlayType, specialTeamsResult, kickLength, possessionTeam) %>%
  filter(season >= 2019 & season <= 2021,
         specialTeamsPlayType == "Field Goal",
         !is.na(displayName),
         specialTeamsResult != "Blocked Kick Attempt")%>%
  mutate(score_yn = ifelse(specialTeamsResult == "Kick Attempt Good",1,0),
         team=recode(possessionTeam,'OAK'='LV','SD'='LAC','STL'='LA'),
         kl= case_when(
           kickLength <= 10 ~ "<10",
           kickLength > 10 & kickLength <= 20 ~ "10 - 20",
           kickLength > 20 & kickLength <= 30 ~ "20 - 30",
           kickLength > 30 & kickLength <= 40 ~ "30 - 40",
           kickLength > 40 & kickLength <= 50 ~ "40 - 50",
           kickLength > 50 & kickLength <= 60 ~ "50 - 60",
           kickLength > 60 & kickLength <= 70 ~ "60 - 70",
           kickLength > 70 ~ ">70",
           TRUE ~ "999")) %>%
  # Group by the recoded team so relocated franchises (e.g. OAK/LV) are not split
  group_by(displayName, team, kl) %>%
  summarise(total = n(),
            success =sum(score_yn),
            fail=(n() - success),
            avg_kick = mean(kickLength),
            success_rate = round(100 * success / total,2),
            .groups = 'keep')%>%
  group_by(displayName) %>%
  mutate(accuracy = sum(success) / sum(total),
         total_kicks = sum(total)) %>%
  ungroup() %>%
  filter(total_kicks >= 10) %>%
  mutate(rank = dense_rank(-accuracy)) %>%
  filter(rank <= 10) %>%
  data.frame()
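
One detail of the ranking step is worth noting: `dense_rank()` gives tied values the same rank and leaves no gaps, so `filter(rank <= 10)` can keep more than 10 kickers when accuracies tie. A minimal sketch with made-up accuracy values (the vector `acc` is hypothetical):

```r
library(dplyr)

# Hypothetical accuracies for four kickers; the first two are tied.
acc <- c(0.95, 0.95, 0.90, 0.85)

dense_ranks <- dense_rank(-acc)  # 1 1 2 3: ties share a rank, no gaps
min_ranks   <- min_rank(-acc)    # 1 1 3 4: ties share a rank, gaps follow

print(dense_ranks)
print(min_ranks)
```

If exactly 10 kickers were required, `min_rank()` or `slice_max()` on a per-kicker table would be possible alternatives.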

Bar Graph

For my data visualization, I created a grouped bar chart with one group of bars for each of the top 10 kickers. Each bar's color represents the distance range, in yards, from which the ball was kicked. The higher the bar, the more accurate the kicker is from that distance.

#Bar graph2
ggplot (df, aes(x=displayName, y=success_rate,group = kl, fill = kl)) + 
  geom_bar (stat="identity", position = position_dodge(width = 0.5))+
  labs(x='Kicker',y="Accuracy %",title="Kicker Accuracy by Distance")+
  guides(fill=guide_legend("Kick Length"))

Conclusion

The chart shows that the best kickers do not miss from 10-30 yards; beyond that range, accuracy varies between kickers. This visualization could be improved with more kicking data and by dividing the distances into 5-yard categories instead of 10-yard ones. Overall, I am satisfied with the visualization, as it is the foundation on which I will build my future work.
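
The 5-yard categories mentioned above could be built with base R's `cut()` instead of a longer `case_when()`. This is only a sketch on made-up kick lengths; the vector `kick_length` and the break points are assumptions, not values from the assignment data.

```r
# Hypothetical kick lengths in yards (made-up values for illustration).
kick_length <- c(19, 23, 25, 31, 38, 44, 47, 52, 58)

# Break points every 5 yards. With right = TRUE the intervals are (a, b],
# which matches the "> a & <= b" logic of the 10-yard bins in the report.
breaks <- seq(15, 60, by = 5)
kl5 <- cut(kick_length, breaks = breaks, right = TRUE)

# Count of kicks per 5-yard bin
print(table(kl5))
```

In the pipeline, the same call could replace the `case_when()` inside `mutate()`, with the break range widened to cover every kick length in the data.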