For this project, I wanted to discover who the best kicker in the NFL is, using only raw data from past NFL Big Data Bowls.
I created the dataframe used in this analysis with the dplyr library and its select, filter, mutate, group_by, and summarise functions.
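# load the packages used below
library(data.table)  # fread
library(dplyr)       # filter, select, mutate, group_by, summarise
library(ggplot2)     # plotting
# set path and working directory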
mypath <- "/Users/mike/Desktop/nfl-big-data-bowl-2022"
setwd(mypath)
# read in datafiles
my_df1 <- fread("plays.csv")
my_df1 <- my_df1 %>%
filter(specialTeamsPlayType %in% c("Extra Point", "Field Goal")) %>%
data.frame()
my_df2 <- fread("players.csv")
my_df3 <- fread("games.csv")
my_df4 <- fread("PffScoutingData.csv")
my_df <- merge(my_df1, my_df2, by.x = c("kickerId"), by.y = c("nflId"), all.x = TRUE)
my_df <- merge(my_df, my_df3, by = c("gameId"), all.x = TRUE)
my_df <- merge(my_df, my_df4, by = c("gameId", "playId"), all.x = TRUE)
df <- my_df %>%
select(displayName, season, specialTeamsPlayType, specialTeamsResult,
kickLength, possessionTeam) %>%
filter(specialTeamsPlayType == "Field Goal",
!is.na(displayName),
specialTeamsResult != "Blocked Kick Attempt") %>%
mutate(score_yn = ifelse(specialTeamsResult == "Kick Attempt Good", 1, 0)) %>%
group_by(displayName, possessionTeam) %>%
summarise(kick_count = n(),
seasons = n_distinct(season),
success = sum(score_yn),
fail = n()-success,
median_yrds = median(kickLength),
avr_yrds = round(sum(kickLength[score_yn == 1])/success, 2),
max_success_yrds = if(success > 0) max(kickLength[score_yn == 1]) else 0,
min_success_yrds = if(success > 0) min(kickLength[score_yn == 1]) else 0,
third_q_success_yrds = if(success > 0) quantile(kickLength[score_yn == 1], probs = .75) else 0,
first_q_success_yrds = if(success > 0) quantile(kickLength[score_yn == 1], probs = .25) else 0,
max_fail_yrds = if(fail > 0) max(kickLength[score_yn == 0]) else 0,
min_fail_yrds = if(fail > 0) min(kickLength[score_yn == 0]) else 0,
third_q_fails_yrds = if(fail > 0) quantile(kickLength[score_yn == 0], probs = .75) else 0,
first_q_fails_yrds = if(fail > 0) quantile(kickLength[score_yn == 0], probs = .25) else 0,
avr_success_yrds = if(success > 0) round(sum(kickLength[score_yn == 1])/success, 2) else 0,
avr_fail_yrds = if (fail > 0) round(sum(kickLength[score_yn == 0])/fail, 2) else 0,
# note: a kicker with no misses is assigned 0 here rather than 100
success_rate = if(fail > 0) round(100*success/(success + fail), 2) else 0,
min_pt = min(min_success_yrds, min_fail_yrds),
max_pt = max(max_success_yrds, max_fail_yrds),
overall_max = max(max_fail_yrds, max_success_yrds),
.groups = 'keep') %>%
arrange(-success_rate) %>%
filter(seasons > 1) %>%
ungroup() %>%
top_n(10, wt = success_rate) %>%
mutate(id = n() - row_number()) %>%
data.frame()
# set labels equal to specific columns
labs1 = df$success_rate
labs2 = df$kick_count
labs3 = df$avr_fail_yrds
labs4 = df$avr_success_yrds
# add labels
my_labels = paste0(df$success, "/", df$kick_count)
my_labels1 = paste0(labs1, "%")
my_labels2 = paste0(labs3)
my_labels3 = paste0(labs4)
# add colors corresponding to the team colors to each bar
mycolors <- c("darkcyan", "purple3", "darkblue", "red", "red3", "darkgrey", "darkgoldenrod", "darkgreen", "gold2", "blue")
teams <- df$possessionTeam
xteams <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
# set locations for annotations
madex <- df$avr_success_yrds - 5
missx <- df$avr_fail_yrds + 5
scaleRight <- 1
To determine who the best kicker in the NFL is, I focused mainly on kick efficiency. I measured efficiency as the percentage of made field goals out of all field goal attempts by each kicker. To visualize the ranking of the top ten kickers from the 2018 through 2020 seasons, I created a bar chart of the ten players' field goal percentages. On made field goal percentage alone, Josh Lambo of the Jacksonville Jaguars leads the pack at 94.74%, with Justin Tucker of the Baltimore Ravens slightly behind at 94.57%. Lambo and Tucker separate themselves comfortably from the rest of the pack: third-place Nick Folk of the New England Patriots sits at 92.68%, about 2 percentage points behind the top two.
Although field goal percentage shows at face value how often a kicker has converted his attempts, it does not show the full picture. Kick distance can be a significant determinant of a kicker's efficiency, so the average distances these kickers typically attempt should be accounted for as well. To show this, I overlaid line plots of the average kick length for made field goals and for missed field goals.
The average for made field goals stayed between roughly 35 and 38 yards across all ten players, with no kicker standing out as a strong outlier. Lambo and Tucker both averaged 37 on makes; on missed field goals, however, Tucker separated himself from the pack, with an average of 52 yards out. Lambo’s average was 46.67, about 5 yards shorter than Tucker’s. Of the roughly 5 percent of kicks that each of the top two kickers missed, Tucker’s came from noticeably farther out, which suggests that Tucker was asked to attempt more difficult field goals.
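As a minimal sketch of the efficiency calculation and the make/miss distance averages described above, the snippet below works through a toy example in base R. The kicker names and kick results are made up for illustration and do not come from the Big Data Bowl files.

```r
# Hypothetical toy data: one row per field goal attempt (not real NFL data)
kicks <- data.frame(
  kicker = c("Kicker A", "Kicker A", "Kicker A", "Kicker A",
             "Kicker B", "Kicker B", "Kicker B", "Kicker B", "Kicker B"),
  made   = c(1, 1, 1, 0, 1, 1, 1, 1, 0),
  yards  = c(30, 42, 38, 51, 25, 33, 47, 40, 55)
)

# Makes and attempts per kicker
makes    <- tapply(kicks$made, kicks$kicker, sum)
attempts <- tapply(kicks$made, kicks$kicker, length)

# Efficiency = made field goals / all attempts, as a percentage
success_rate <- round(100 * makes / attempts, 2)

# Mean distance of makes and of misses, per kicker
is_make <- kicks$made == 1
avg_make_yards <- tapply(kicks$yards[is_make], kicks$kicker[is_make], mean)
avg_miss_yards <- tapply(kicks$yards[!is_make], kicks$kicker[!is_make], mean)

print(success_rate)
```

In this toy table the two kickers have similar percentages, but the second kicker's single miss comes from farther out, which is the kind of difference the line plots are meant to surface.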
Finally, I added the totals for makes and attempts for the top 10 kickers to show the sample size behind each percentage. A kicker with a small sample of kicks could have the best percentage, but it is debatable whether that percentage is sustainable. Notably, Tucker's number of attempts was far above that of most other kickers in the top 10, even though he was nearly tied for first place in percentage and, on average, had the most difficult missed attempts. Although Tucker falls short of first place in field goal percentage by a hair, his heavy and difficult workload is worth considering, and it makes a strong case that he is the best kicker in the NFL.
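One way to make the sample-size point concrete is to compare confidence intervals for the same make percentage at different attempt counts. The sketch below uses base R's binom.test with hypothetical counts that are not taken from the data set; it is an illustration of the idea and was not part of the original analysis.

```r
# Hypothetical make/attempt counts (not taken from the data set)
small_sample <- binom.test(x = 19, n = 20)    # 95% make rate on 20 attempts
large_sample <- binom.test(x = 95, n = 100)   # 95% make rate on 100 attempts

# Exact 95% confidence intervals for the underlying make probability
ci_small <- small_sample$conf.int
ci_large <- large_sample$conf.int

# Width of each interval: a wider interval means more uncertainty
width_small <- diff(ci_small)
width_large <- diff(ci_large)

print(ci_small)
print(ci_large)
```

The interval for the smaller sample is wider, so an identical percentage built on fewer attempts is weaker evidence of a kicker's true ability.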
ggplot(df, aes(x = reorder(displayName, -success_rate), y = success_rate)) +
geom_bar(stat = "identity", color = "black", fill = mycolors) +
geom_text(aes(label = my_labels1), vjust = -1, color = "black") +
geom_text(aes(label = my_labels), vjust = 2, color = "white", size = 3) +
theme_light() +
labs(title = "Rank of NFL Kickers by Field Goal Percentage\n who Played Two or More Seasons (2018-2020)", y = "FG Success Rate (%)\n", x = "") +
theme(plot.title = element_text(hjust = 0.5)) +
geom_line(aes(x = displayName, y = avr_success_yrds, group = 1), size = 1) +
geom_line(aes(x = displayName, y = avr_fail_yrds, group = 1), size = 1) +
scale_color_manual(NULL, values = "black") +
scale_y_continuous(sec.axis = sec_axis(~.*scaleRight, name = "Mean FG Distance for Makes (circles) & Misses (squares)\n")) +
geom_point(inherit.aes = FALSE, data = df,
aes(x = displayName, y = avr_success_yrds, group = 1),
size = 3, shape = 21, fill = "green", color = "black") +
geom_point(inherit.aes = FALSE, data = df,
aes(x = displayName, y = avr_fail_yrds, group = 1),
size = 3, shape = 22, fill = "red", color = "black") +  # shape 22 is a square that honors fill
annotate("text", xteams, y = -2, label = teams, color = "black") +
annotate("text", xteams, y = missx, label = my_labels2, color = "white") +
annotate("text", xteams, y = madex, label = my_labels3, color = "white") +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())