Data Dive Week 2 - Summary Data

Start by setting up the packages to manipulate data.

suppressPackageStartupMessages({
  library(tidyverse)
  library(rio)
  library(logger)
  source("aptheme.R") # Code that helps format graphs
})

Next, read in the data and generate a summary.

data <- import("plays.csv")

We will now look at the yards to go column, yardsToGo, which records the number of yards a team needs to gain to earn a new set of downs.

#Summarize yards to go 
print(paste("Mean of yards to go is", 
               mean(data$yardsToGo)))
## [1] "Mean of yards to go is 8.46129992557678"
print(paste("Minimum yards to go is", 
               min(data$yardsToGo), 
               "and the maximum yards to go is", 
               max(data$yardsToGo)))
## [1] "Minimum yards to go is 1 and the maximum yards to go is 38"
print("Quantiles are as such")
## [1] "Quantiles are as such"
quantile(data$yardsToGo)
##   0%  25%  50%  75% 100% 
##    1    6   10   10   38
hist(data$yardsToGo,  xlab = "Yards To Go", main = "Yards to Go Distribution")

This shows us the tendencies for the number of yards to go on any given play. On average, a team has to gain about 8.5 yards to reach a first down. The median is 10, the standard distance at the start of a new set of downs, so the mean sits below it: short-yardage plays pull the average down, even though long-yardage situations stretch the maximum out to 38.
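One quick way to see which direction the mean is being pulled is to compare it with the median. The sketch below does that on a small made-up vector; toy_yards_to_go is hypothetical and is not taken from plays.csv. With the real data, data$yardsToGo would take its place.

```r
# Hypothetical values for illustration only; not taken from plays.csv
toy_yards_to_go <- c(1, 2, 5, 10, 10, 10, 10, 10, 10, 20)

toy_mean <- mean(toy_yards_to_go)
toy_median <- median(toy_yards_to_go)

# A mean below the median says the low values are pulling the average down;
# a mean above the median says the high values are pulling it up
print(paste("Mean:", toy_mean, "Median:", toy_median))
print(paste("Mean is below median:", toy_mean < toy_median))
```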

The companion column, yardsGained, records the number of yards gained on a given play.

print(paste("Mean of yards gained", 
               mean(data$yardsGained)))
## [1] "Mean of yards gained 5.46061771272637"
print(paste("Minimum yards gained", 
               min(data$yardsGained), 
               "and the maximum yards gained", 
               max(data$yardsGained)))
## [1] "Minimum yards gained -68 and the maximum yards gained 98"
print("Quantiles are as such")
## [1] "Quantiles are as such"
quantile(data$yardsGained)
##   0%  25%  50%  75% 100% 
##  -68    0    3    8   98
hist(data$yardsGained, xlab = "Yards Gained", main = "Yards Gained Distribution")

Here we can see a couple of interesting things. On average, 5.5 yards are gained per play, but the distribution shows that play results are very spread out, ranging from a 68-yard loss to a 98-yard pickup. This is interesting because the yards to go have a much tighter distribution than the yards gained.
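The difference in spread can also be put into numbers rather than read off the histograms. Here is a minimal sketch using the standard deviation and interquartile range, run on a made-up data frame (toy is hypothetical and only borrows the two column names from plays.csv).

```r
# Hypothetical values for illustration only; not taken from plays.csv
toy <- data.frame(
  yardsToGo   = c(1, 5, 10, 10, 10, 10, 15),
  yardsGained = c(-8, 0, 2, 3, 7, 12, 40)
)

# Standard deviation and interquartile range for each column;
# larger values mean a more spread-out distribution
spread <- sapply(toy, function(x) c(sd = sd(x), iqr = IQR(x)))
print(spread)
```

With the real data, the same call on data[, c("yardsToGo", "yardsGained")] would give the comparison for the full set of plays.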

Data Questions

What are the differences in offensive formation and yards gained? Are there any particularly effective formations?

formation_data <- data %>%
  group_by(offenseFormation) %>%
  summarise(yardsGained = mean(yardsGained)) %>%
  arrange(desc(yardsGained)) %>%
  filter(!is.na(offenseFormation))

ggplot(data = formation_data,
       aes(x = reorder(offenseFormation, yardsGained), y = yardsGained)) + 
  geom_col(fill = "#669900") + 
  geom_hline(yintercept = mean(data$yardsGained)) + 
  labs(x = "Offensive Formation",
    y = "Yards Gained",
    title = "Yards Gained by Formation") + 
   theme_ap(family = "sans") + 
  geom_text(aes(label = round(yardsGained, digits = 1)), vjust = -0.5)

This suggests that Jumbo is one of the least effective formations, while shotgun and empty are the most effective, based solely on the number of yards gained on a play. However, this graph may be misleading if we are judging the effectiveness of a play. A Jumbo formation stacks a lot of linemen up front, with either the quarterback or running back following behind them, which suggests it is called mostly when only a short gain is needed. So another important question is the average yardage needed for each offensive formation.

What is the difference in plays called when looking at the number of yards needed?

yards_data <- data %>%
  group_by(offenseFormation) %>%
  summarise(yardsToGo = mean(yardsToGo)) %>%
  arrange(desc(yardsToGo)) %>%
  filter(!is.na(offenseFormation))

ggplot(data = yards_data,
       aes(x = reorder(offenseFormation, yardsToGo), y = yardsToGo)) + 
  geom_col(fill = "#669900") + 
  geom_hline(yintercept = mean(data$yardsToGo)) + 
  labs(x = "Offensive Formation",
    y = "Yards Needed",
    title = "Yards Needed by Formation") + 
   theme_ap(family = "sans") + 
  geom_text(aes(label = round(yardsToGo, digits = 1)), vjust = -0.5)

Here we can see that suspicion confirmed. On average, if a team is calling a jumbo formation, they don’t need as many yards as when they are calling an empty formation.
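Since the two charts come from separate summaries, it can help to put both averages side by side, together with the number of plays behind each one. The sketch below assumes the same column names as above but runs on a made-up data frame; toy_plays and its values are hypothetical.

```r
library(dplyr)

# Hypothetical plays for illustration only; not taken from plays.csv
toy_plays <- data.frame(
  offenseFormation = c("Jumbo", "Jumbo", "Empty", "Empty", "Shotgun", NA),
  yardsToGo        = c(1, 2, 10, 12, 10, 10),
  yardsGained      = c(1, 3, 6, 10, 7, 4)
)

formation_summary <- toy_plays %>%
  filter(!is.na(offenseFormation)) %>%   # drop plays with no recorded formation
  group_by(offenseFormation) %>%
  summarise(
    plays           = n(),               # how many plays each mean rests on
    meanYardsToGo   = mean(yardsToGo),
    meanYardsGained = mean(yardsGained)
  ) %>%
  arrange(meanYardsToGo)

print(formation_summary)
```

The plays column is there because a mean built on a handful of plays deserves less weight than one built on thousands.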

What offensive formations net the greatest yardage?

What we really need now is some way of controlling for the yards to go. Here we re-run the analysis using the net yards gained on a play: the yards picked up minus the yards needed. So if 5 yards were needed and 6 yards were gained, that would be a net of 1 yard.
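The sign convention described above can be checked with the worked example from the text. The variable names here are hypothetical and are not part of the analysis code.

```r
# Worked example from the text: 5 yards needed, 6 yards gained
yards_needed <- 5
yards_gained <- 6

# Net yards = yards gained minus yards needed, so 6 - 5 = 1
net_yards <- yards_gained - yards_needed
print(net_yards)

# A play that falls short of the distance needed gets a negative net
short_of_line <- 3 - 10
print(short_of_line)
```

Under this convention, a positive net means the play gained more than it needed and a negative net means it came up short.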

net_data <- data %>%
  mutate(net_yards = yardsGained - yardsToGo) %>% # yards gained minus yards needed
  group_by(offenseFormation) %>%
  summarise(yardsNet = mean(net_yards)) %>%
  arrange(desc(yardsNet)) %>%
  filter(!is.na(offenseFormation))

ggplot(data = net_data,
       aes(x = reorder(offenseFormation, yardsNet), y = yardsNet)) + 
  geom_col(fill = "#669900") + 
  labs(x = "Offensive Formation",
    y = "Net Yards",
    title = "Net Yards by Formation") + 
   theme_ap(family = "sans") + 
   geom_text(aes(label = round(yardsNet, digits = 1)), vjust = -0.5)

There are a couple of interesting things to highlight here. The first is the difference between wildcat and jumbo formations. Jumbo formations typically result in fewer yards than wildcat formations; however, jumbo formations gain, on average, 1.9 yards more than needed, while wildcat formations gain, on average, 1.6 yards more than needed. The second is the difference between pistol and shotgun formations. Shotgun formations average 0.7 yards more than pistol formations, but in terms of yards gained beyond what was needed, the pistol formation out-gains the shotgun formation by almost half a yard.

More research is needed, particularly on the type of formation the defense is showing, but two pieces of insight appear to come from this data: 1) in short yardage situations, the jumbo formation is more effective than wildcat, and 2) in longer yardage situations, pistol formations are more effective than shotgun formations.
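One way to test these two claims more directly would be to split plays into short and long yardage situations before averaging the net yards. The sketch below is only an illustration: toy_plays is made up, and the 3-yard cutoff for "short" is an arbitrary assumption rather than anything defined in the data.

```r
library(dplyr)

# Hypothetical plays for illustration only; not taken from plays.csv
toy_plays <- data.frame(
  offenseFormation = c("Jumbo", "Wildcat", "Jumbo", "Pistol",
                       "Shotgun", "Pistol", "Shotgun"),
  yardsToGo        = c(1, 2, 2, 10, 10, 8, 12),
  yardsGained      = c(2, 1, 4, 9, 6, 11, 12)
)

situation_summary <- toy_plays %>%
  mutate(
    # Assumed cutoff: 3 yards or fewer counts as a short yardage situation
    situation = if_else(yardsToGo <= 3, "short", "long"),
    net_yards = yardsGained - yardsToGo
  ) %>%
  group_by(situation, offenseFormation) %>%
  summarise(plays = n(), yardsNet = mean(net_yards), .groups = "drop")

print(situation_summary)
```

Comparing formations within the same situation keeps a short yardage formation from being judged against plays that needed far more yards.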