IS 470 Assignment #2

Introduction

To determine the best kicker, I examined the success rate on field goal attempts of 50 yards or more. In my eyes, the best kickers are the ones who make difficult kicks consistently: they combine a high success rate with a high number of attempts. Together, these traits show that a kicker is highly accurate and is seen as highly reliable by their coach. My bar chart captures this balance by using bar height to indicate success rate and color to indicate the number of attempts (darker color means more attempts). The best kicker is the one with the highest success rate, and the shade of each bar adds context about how many attempts that rate is based on.

Description of Project

For my data frame, I needed the total number of kicks for each kicker, the number of made kicks for each kicker, and the success rate, which is the number of made kicks divided by the total number of kicks. I first filtered out the extra point attempts, as well as any field goal under 50 yards. I then used a “group_by” line so that each kicker became a single data point. Within each group, I created the “total_kick_attempts” variable by counting the kicks and the “made_kicks” variable by adding up the made field goals. Lastly, I created the “success_rate” variable by dividing made_kicks by total_kick_attempts and multiplying by 100 to get a percentage. I also kept only the kickers with more than 5 attempts. This gave me the data frame that I used for the project.

Data Visualization

My visualization is a bar chart that orders the kickers from the lowest success_rate to the highest. It shows that Younghoe Koo was the best kicker in the NFL with a 100% success rate: he was 7 for 7 on kicks of 50 yards or more. The only downside to Koo is that a 7-kick sample is on the lower side compared to other kickers. Josh Lambo, for example, attempted 11 kicks and made 10 (about 91%), which is around the average number of attempts for this pool of kickers. So, depending on how you weigh success rate against number of attempts, one could argue that another kicker is best. However, I value success rate heavily, which makes Koo my clear number 1 kicker in the NFL.
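
One way to weigh success rate against the number of attempts is to put a confidence interval around each kicker's make percentage. The sketch below is not part of my analysis; it swaps in the Wilson score interval purely for illustration, uses base R only, and `wilson_lower` is a hypothetical helper name. It takes the made and attempted counts quoted above for Koo and Lambo and returns the lower end of a 95% interval, which moves further below the raw success rate as the sample gets smaller.

```r
# Lower bound of the Wilson score interval for a binomial proportion.
# `wilson_lower` is a hypothetical helper, not a function from any package.
wilson_lower <- function(made, attempts, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)  # about 1.96 for a 95% interval
  p_hat <- made / attempts
  centre <- p_hat + z^2 / (2 * attempts)
  margin <- z * sqrt(p_hat * (1 - p_hat) / attempts + z^2 / (4 * attempts^2))
  (centre - margin) / (1 + z^2 / attempts)
}

# Counts taken from the discussion above
kickers <- data.frame(
  kicker   = c("Koo", "Lambo"),
  made     = c(7, 10),
  attempts = c(7, 11)
)
kickers$success_rate <- kickers$made / kickers$attempts * 100
kickers$lower_95     <- wilson_lower(kickers$made, kickers$attempts) * 100
print(kickers)
```

Ranking on `lower_95` instead of `success_rate` would reward both accuracy and volume, so it is one possible way to address the sample-size concern.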

setwd("/Users/jonahgoodman/Desktop/IS 470/Data")

suppressMessages(library(data.table))
suppressMessages(library(dplyr))
suppressMessages(library(lubridate))
suppressMessages(library(httr))
suppressMessages(library(DescTools))
suppressMessages(library(scales))
suppressMessages(library(ggplot2))

my_df1 <- fread("Data/NFLBDB2022/plays.csv")

my_df1 <- my_df1 %>%
  filter(specialTeamsPlayType %in% c("Extra Point", "Field Goal")) %>%
  data.frame()
  
my_df2 <- fread("Data/NFLBDB2022/players.csv")

my_df3 <- fread("Data/NFLBDB2022/games.csv")

my_df4 <- fread("Data/NFLBDB2022/pffScoutingData.csv")

#Join data frames together into one data frame
my_df <- left_join(my_df1, my_df2, by = c("kickerId" = "nflId"))
my_df <- left_join(my_df, my_df3, by = c("gameId"))
my_df <- left_join(my_df, my_df4, by = c("gameId", "playId"))
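
Because a left join repeats a row from the left table whenever its key matches more than one row on the right, it can be worth confirming that the row count is unchanged after joining. The sketch below uses small made-up tables rather than the NFL files; `plays_toy`, `players_toy`, and `joined_toy` are hypothetical names, and the values are placeholders.

```r
library(dplyr)

# Made-up stand-ins for plays.csv and players.csv (values are illustrative only)
plays_toy <- data.frame(
  gameId   = c(1, 1, 2),
  playId   = c(10, 20, 10),
  kickerId = c(101, 102, 101)
)
players_toy <- data.frame(
  nflId       = c(101, 102),
  displayName = c("Kicker A", "Kicker B")
)

# The right-hand key should be unique, otherwise the join duplicates plays
stopifnot(!any(duplicated(players_toy$nflId)))

joined_toy <- left_join(plays_toy, players_toy, by = c("kickerId" = "nflId"))

# Same number of rows before and after means no play was counted twice
stopifnot(nrow(joined_toy) == nrow(plays_toy))
```

The same two checks could be run on the real tables before counting attempts, since a duplicated play would inflate a kicker's totals.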

kicker_df <- my_df %>%
  select(displayName, kickerId, specialTeamsResult, kickLength) %>%

  # Take out extra point attempts and field goals under 50 yards
  filter(!is.na(kickLength) & kickLength >= 50) %>%

  group_by(displayName) %>%

  # Create the summary variables for each kicker
  summarise(
    total_kick_attempts = n(),
    made_kicks = sum(specialTeamsResult == "Kick Attempt Good"),
    success_rate = (made_kicks / total_kick_attempts) * 100
  ) %>%

  # Keep only the last name so labels fit and are readable on the graph
  mutate(displayName = sub(".*\\s(.*)", "\\1", displayName)) %>%

  # Make sure the kicker has an acceptable number of attempts (more than 5)
  filter(total_kick_attempts > 5) %>%
  arrange(displayName)

View(kicker_df)

Visualization 1: A graph that shows …

ggplot(kicker_df, aes(x = reorder(displayName, success_rate), y = success_rate, fill = total_kick_attempts)) +

  # geom_col() draws bars whose height is the y value, same as geom_bar(stat = "identity")
  geom_col() +

  labs(x = "Kicker", y = "Success rate (%)", title = "Kicker Success Rate (50+ yards)") +

  scale_fill_gradient(low = "skyblue", high = "purple") +

  theme(
    axis.text.x = element_text(size = 7.5),
    axis.text.y = element_text(size = 20),
    axis.title = element_text(size = 20),
    plot.title = element_text(hjust = 0.5, size = 25)
  )

Conclusion

One general takeaway I had from the project is that creating a data frame is a meticulous process that requires great attention to detail. Joining data together can take time, especially when you must make sure you are matching the correct parts of each data frame. Another takeaway is that you must fine-tune your graph so that it is not only readable but also appealing and neat. A graph can be difficult to read because of the font size of the numbers and words, the shading, or even colors that aren’t soothing, so these are all things to at least think about when creating one. Lastly, it was very interesting to see the results of the data and graph I created. If I were to go further with the data, I would create an equation that takes the success_rate, the total_kick_attempts, and even the success rate on kicks under 50 yards into account. These inputs would add up to a score out of a fixed number of possible points, which would be another way to quantify who is really the best kicker. Overall, the project opened my mind to all sorts of possibilities for using NFL data going forward, and it was a great first project for expanding my skills in R.
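
The scoring idea above could be sketched as follows. Everything here is an assumption for illustration: the weights, the 20-attempt cap, the 100-point scale, the `short_success_rate` column, and the two placeholder kickers with made-up numbers. It is only one way such an equation might look.

```r
# Placeholder data; these are not real kicker statistics
score_df <- data.frame(
  displayName         = c("Kicker A", "Kicker B"),
  success_rate        = c(100, 90),  # 50+ yard success rate (%)
  total_kick_attempts = c(7, 11),    # 50+ yard attempts
  short_success_rate  = c(88, 95)    # under-50-yard success rate (%), hypothetical column
)

# Hypothetical weights that sum to 1, so the score is out of 100 points
w_long   <- 0.5
w_volume <- 0.2
w_short  <- 0.3
max_attempts <- 20  # assumed cap so volume is also on a 0-100 scale

# Convert attempts to a 0-100 volume score, capped at max_attempts
score_df$volume_score <- pmin(score_df$total_kick_attempts / max_attempts, 1) * 100

# Weighted sum of the three components
score_df$kicker_score <- w_long * score_df$success_rate +
  w_volume * score_df$volume_score +
  w_short * score_df$short_success_rate

# Highest score first
score_df[order(-score_df$kicker_score), c("displayName", "kicker_score")]
```

Changing the weights changes the ranking, so the choice of weights would need its own justification before a score like this is used to name a best kicker.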