In data science, the ability to distill complex information into actionable insights is paramount. This project works through a data analysis in the R programming language, using dplyr for data manipulation and ggplot2 for data visualization to uncover trends and patterns that can inform strategic decisions.
Specifically, we use R, dplyr and ggplot2 to identify the best punters in the NFL.
We look at seven statistical categories that describe a punter's performance and combine them on a weighted scale to determine who the best punter is.
# Load necessary libraries
library(dplyr)
library(data.table)
# Read the CSV files
plays <- fread("plays.csv")
players <- fread("players.csv")
# Filter for punt plays that have a recorded kicker
punt_plays <- plays %>%
  filter(specialTeamsPlayType == "Punt", !is.na(kickerId))
Net punting average is the average punt distance minus return yardage. It gives a better sense of how effective a punt is in terms of field position.
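# Calculate Net Punting Average per punter.
# kickReturnYardage is NA when there was no return (fair catch, touchback, downed ball),
# so count those punts as 0 return yards instead of letting na.rm = TRUE drop them.
net_punting_average_per_punter <- punt_plays %>%
  group_by(kickerId) %>%
  summarise(NetPuntingAverage = mean(kickLength - replace(kickReturnYardage, is.na(kickReturnYardage), 0),
                                     na.rm = TRUE))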
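The na.rm = TRUE in the net-average formula has a side effect that is easiest to see on toy data. Assuming, as the dataset's documentation suggests, that kickReturnYardage is NA whenever a punt is not returned, kickLength - kickReturnYardage is NA for fair catches, touchbacks and downed punts, and those punts silently drop out of the average. A minimal base R sketch with hypothetical numbers compares the two versions:

```r
# Hypothetical punt data: kickReturnYardage is NA when the punt was not returned
toy <- data.frame(
  kickerId = c(1, 1, 1, 2, 2),
  kickLength = c(50, 40, 45, 48, 52),
  kickReturnYardage = c(10, NA, NA, 5, NA)
)

# Naive version: NA rows are removed, so only returned punts are averaged
naive_net <- tapply(toy$kickLength - toy$kickReturnYardage, toy$kickerId,
                    mean, na.rm = TRUE)

# Treat "no return" as zero return yards so every punt counts
return_yds <- ifelse(is.na(toy$kickReturnYardage), 0, toy$kickReturnYardage)
net <- tapply(toy$kickLength - return_yds, toy$kickerId, mean)

print(naive_net)
print(net)
```

For kicker 1 the naive average uses only the single returned punt, while the zero-filled version averages all three punts.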
Punts inside the 20 measure how often a punter can pin the opposing team deep in its own territory. More punts inside the 20 indicate better field position control.
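# Count punts that ended inside the receiving team's 20 (touchbacks excluded).
# absoluteYardlineNumber is the pre-snap ball position, so approximate the end spot instead (penalties ignored).
punts_inside_20_per_punter <- punt_plays %>%
  mutate(yardsToGoal = ifelse(!is.na(yardlineSide) & yardlineSide == possessionTeam,
                              100 - yardlineNumber, yardlineNumber),
         endSpot = yardsToGoal - kickLength + replace(kickReturnYardage, is.na(kickReturnYardage), 0)) %>%
  filter(endSpot <= 20, specialTeamsResult != "Touchback") %>%
  group_by(kickerId) %>%
  summarise(PuntsInside20 = n())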
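Whether a punt pinned the opponent inside the 20 depends on where the play ended, not on the line of scrimmage (which is what absoluteYardlineNumber appears to describe). The arithmetic below is a sketch under stated assumptions: the possessionTeam, yardlineSide and yardlineNumber columns are assumed to follow the NFL Big Data Bowl plays.csv layout, yardlineSide is assumed to be NA at midfield, and penalties are ignored. The rows are hypothetical.

```r
# Hypothetical plays; column names assume the Big Data Bowl plays.csv layout
toy <- data.frame(
  possessionTeam    = c("KC", "KC", "BUF"),
  yardlineSide      = c("KC", "BUF", NA),  # assumed NA when the ball is at midfield
  yardlineNumber    = c(30, 45, 50),
  kickLength        = c(55, 38, 45),
  kickReturnYardage = c(8, NA, NA),
  stringsAsFactors  = FALSE
)

# Yards from the line of scrimmage to the receiving team's goal line:
# on the kicking team's own side of the field this is 100 - yardlineNumber
own_side <- !is.na(toy$yardlineSide) & toy$yardlineSide == toy$possessionTeam
yards_to_goal <- ifelse(own_side, 100 - toy$yardlineNumber, toy$yardlineNumber)

# Where the play ended, measured from the receiving team's goal line
returned <- ifelse(is.na(toy$kickReturnYardage), 0, toy$kickReturnYardage)
end_spot <- yards_to_goal - toy$kickLength + returned

inside_20 <- end_spot <= 20
print(data.frame(yards_to_goal, end_spot, inside_20))
```

The first punt travels 55 yards from the kicking team's 30 but an 8-yard return brings the ball back to the 23, so it does not count. A touchback would produce an end spot at or below zero, which is why touchbacks should be excluded separately rather than counted as inside the 20.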
Return yards measure how many yards the receiving team gains on punt returns. Lower numbers are better, indicating the punter is effective at limiting return yardage.
# Calculate the total return yards per punter
return_yards_per_punter <- punt_plays %>%
  group_by(kickerId) %>%
  summarise(TotalReturnYards = sum(kickReturnYardage, na.rm = TRUE))
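Because TotalReturnYards is a season total, it also depends on how often a punter kicks: a punter with fewer punts will usually allow fewer total return yards. One possible refinement, not used in this analysis, is to divide by the number of punts. A base R sketch with hypothetical totals:

```r
# Hypothetical season totals for two punters
totals <- data.frame(
  punter = c("A", "B"),
  punts = c(80, 40),
  totalReturnYards = c(400, 300)
)

# A per-punt rate does not reward a punter simply for punting less often
totals$returnYardsPerPunt <- totals$totalReturnYards / totals$punts
print(totals)
```

Punter B allows fewer total return yards, but punter A allows fewer per punt (5 versus 7.5).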
Gross punting average is the average distance of all punts, regardless of any return. A higher average indicates that a punter kicks the ball farther.
# Calculate Gross Punting Average per punter
gross_punting_average_per_punter <- punt_plays %>%
  group_by(kickerId) %>%
  summarise(GrossPuntingAverage = mean(kickLength, na.rm = TRUE))
Fair catches count how often the returner signals for a fair catch, which ends the play without a return. More fair catches can indicate good hang time and placement.
# Calculate the number of fair catches per punter
fair_catches_per_punter <- punt_plays %>%
  filter(specialTeamsResult == "Fair Catch") %>%
  group_by(kickerId) %>%
  summarise(FairCatches = n())
Punt blocks count how often the punter's kicks are blocked. Fewer blocks are better.
# Calculate the number of punt blocks per punter
punt_blocks_per_punter <- punt_plays %>%
  filter(specialTeamsResult == "Blocked Punt") %>%
  group_by(kickerId) %>%
  summarise(PuntBlocks = n())
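Touchbacks count punts that go into the end zone and become dead there, giving the receiving team the ball at its own 20-yard line. Fewer touchbacks are generally better, because a punt downed short of the goal line can leave the receiving team with worse field position.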
# Calculate the number of touchbacks per punter
touchbacks_per_punter <- punt_plays %>%
  filter(specialTeamsResult == "Touchback") %>%
  group_by(kickerId) %>%
  summarise(Touchbacks = n())
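For reference, the NFL's official net punting average charges each touchback 20 yards, which is why a touchback lowers a punter's net average even when the kick itself traveled far. This analysis keeps touchbacks as a separate weighted metric instead; the base R sketch below, with made-up numbers for a single punter, only illustrates the official formula.

```r
# Hypothetical punts for one punter; 'touchback' flags punts that ended in a touchback
punts <- data.frame(
  kickLength  = c(50, 45, 60, 42),
  returnYards = c(10, 0, 0, 0),
  touchback   = c(FALSE, FALSE, TRUE, FALSE)
)

gross_avg <- mean(punts$kickLength)

# Official-style net average: subtract return yards and 20 yards per touchback
net_avg <- (sum(punts$kickLength) - sum(punts$returnYards) -
              20 * sum(punts$touchback)) / nrow(punts)

print(c(gross = gross_avg, net = net_avg))
```

Here the 60-yard touchback is the longest kick, yet the 20-yard charge pulls the net average down to 41.75 from a gross of 49.25.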
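Next, we combine these individual statistics into a single composite score. Because the metrics are measured on different scales, each one is min-max normalized to the range 0 to 1 and then weighted, since some are more impactful than others. The weights used are defined in the weights vector below.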
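# Combine all metrics into one data frame. The count tables only list punters with at least
# one such play, so use left joins and treat a missing count as zero (an inner merge would
# drop every punter who never had a punt blocked or a touchback).
punters <- net_punting_average_per_punter %>%
  left_join(punts_inside_20_per_punter, by = "kickerId") %>%
  left_join(return_yards_per_punter, by = "kickerId") %>%
  left_join(gross_punting_average_per_punter, by = "kickerId") %>%
  left_join(fair_catches_per_punter, by = "kickerId") %>%
  left_join(punt_blocks_per_punter, by = "kickerId") %>%
  left_join(touchbacks_per_punter, by = "kickerId") %>%
  mutate(across(c(PuntsInside20, FairCatches, PuntBlocks, Touchbacks), ~ coalesce(.x, 0L)))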
# Normalize the data
normalize <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}
normalized_punters <- as.data.frame(lapply(punters[, -1], normalize))
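# Weights: positive for higher-is-better metrics, negative for lower-is-better ones
# (return yards, blocks, touchbacks), so a punter is not rewarded for allowing returns or blocks
weights <- c(
  NetPuntingAverage   =  0.25,
  PuntsInside20       =  0.20,
  TotalReturnYards    = -0.20,
  GrossPuntingAverage =  0.15,
  FairCatches         =  0.15,
  PuntBlocks          = -0.10,
  Touchbacks          = -0.05
)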
# Calculate composite score (weights matched to columns by name, not by position)
composite_score <- as.vector(as.matrix(normalized_punters) %*% weights[names(normalized_punters)])
# Add composite score to the data frame
punters$composite_score <- composite_score
# Order by composite score
punters <- punters[order(-punters$composite_score), ]
# Join with player names
punters_with_names <- punters %>%
  left_join(players %>% select(nflId, displayName), by = c("kickerId" = "nflId"))
# Display only the relevant columns
selected_columns <- punters_with_names[c("displayName", "composite_score")]
print(selected_columns)
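The combine, normalize and weight steps above can be checked end to end on a tiny example. This base R sketch uses hypothetical data for three punters and illustrative weights (not the ones used in this analysis); it shows why a missing count should become zero rather than drop the punter, how min-max scaling maps each column to 0 to 1, and how a negative weight penalizes a lower-is-better metric. The constant-column guard in normalize() is an addition, not part of the original function.

```r
# Hypothetical per-punter metrics. Only punter 2 had a punt blocked,
# so the blocks table has a row for punter 2 alone.
net    <- data.frame(kickerId = c(1, 2, 3), NetPuntingAverage = c(41, 43, 39))
blocks <- data.frame(kickerId = 2, PuntBlocks = 1L)

# An inner merge keeps only punters present in both tables, dropping punters 1 and 3
inner <- merge(net, blocks, by = "kickerId")

# A left merge keeps every punter; a missing count means zero blocks
combined <- merge(net, blocks, by = "kickerId", all.x = TRUE)
combined$PuntBlocks[is.na(combined$PuntBlocks)] <- 0L

# Min-max normalization, with an added guard so a constant column gives 0 instead of NaN
normalize <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) return(rep(0, length(x)))
  (x - rng[1]) / (rng[2] - rng[1])
}
normalized <- as.data.frame(lapply(combined[, -1], normalize))

# Illustrative weights: positive where higher is better, negative where lower is better.
# Indexing by name keeps each weight attached to the right column.
weights <- c(NetPuntingAverage = 0.8, PuntBlocks = -0.2)
combined$composite_score <- as.vector(as.matrix(normalized) %*% weights[names(normalized)])
print(combined)
```

Punter 2 has the best net average, but the block costs 0.2, so the scores come out roughly 0.4, 0.6 and 0 for punters 1, 2 and 3.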
With the data categorized, normalized and weighted, we can now plot the composite scores to see who the best punter is.
source('https://raw.githubusercontent.com/mlfurman3/gg_field/main/gg_field.R')
# Bar chart of composite scores
library(ggplot2)
# Bars sorted from highest to lowest composite score (ggplot2 would otherwise sort names alphabetically)
ggplot(punters_with_names, aes(x = reorder(displayName, -composite_score), y = composite_score)) +
  geom_col() +
  labs(title = "Best Punter",
       x = "Punter",
       y = "Composite Score") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
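A ggplot2 bar chart places character x values in alphabetical order, so sorting the data frame by composite score does not change the bar order on its own. Base R's reorder() turns the names into a factor whose levels follow a numeric key, which ggplot2 then respects. A small sketch with hypothetical names and scores:

```r
# Hypothetical punter names and composite scores
scores <- data.frame(displayName = c("Punter A", "Punter B", "Punter C"),
                     composite_score = c(0.42, 0.71, 0.55),
                     stringsAsFactors = FALSE)

# Levels are ordered by -composite_score, i.e. from highest score to lowest
ordered_names <- reorder(scores$displayName, -scores$composite_score)
print(levels(ordered_names))
```

Using reorder(displayName, -composite_score) inside aes() therefore draws the highest-scoring punter on the left.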
In conclusion, this project used R to turn NFL punt data into a ranking of punters: dplyr computed seven per-punter statistics, a weighted composite score combined them, and ggplot2 visualized the result. Documenting the analysis in R Markdown keeps the code and the narrative together, so the analysis is reproducible and easy to interpret.