Start by setting up the packages to manipulate data.
suppressPackageStartupMessages({
library(tidyverse)
library(rio)
source("aptheme.R") #Code that helps format graphs
})
Next, read in the play-by-play data.
data <- import("plays.csv")
On a rushing play, a team can run to the right or to the left. I want to know whether the side the Indianapolis Colts run to becomes more predictable depending on the offensive formation.
indy_rush <- data %>%
filter(possessionTeam == "IND") %>%
filter(!is.na(rushLocationType)) %>%
filter(!is.na(offenseFormation)) %>%
filter(rushLocationType != "UNKNOWN") %>%
mutate(rush_direction = ifelse(grepl("RIGHT", rushLocationType), "right", "left")) %>%
group_by(offenseFormation,
rush_direction) %>%
summarise(plays = n()) %>%
mutate(low_probability = plays == 1)
## `summarise()` has grouped output by 'offenseFormation'. You can override using
## the `.groups` argument.
right_plays <- sum(indy_rush[indy_rush$rush_direction == "right",]$plays)
left_plays <- sum(indy_rush[indy_rush$rush_direction == "left",]$plays)
total_rush <- right_plays + left_plays
print(paste("Probability of a run to the right", round(right_plays/total_rush, digits = 2)))
## [1] "Probability of a run to the right 0.53"
indy_rush
## # A tibble: 12 × 4
## # Groups: offenseFormation [7]
## offenseFormation rush_direction plays low_probability
## <chr> <chr> <int> <lgl>
## 1 EMPTY left 1 TRUE
## 2 EMPTY right 1 TRUE
## 3 I_FORM left 4 FALSE
## 4 I_FORM right 1 TRUE
## 5 JUMBO left 1 TRUE
## 6 PISTOL left 1 TRUE
## 7 SHOTGUN left 55 FALSE
## 8 SHOTGUN right 61 FALSE
## 9 SINGLEBACK left 34 FALSE
## 10 SINGLEBACK right 44 FALSE
## 11 WILDCAT left 1 TRUE
## 12 WILDCAT right 1 TRUE
This provides some insight into the Indianapolis Colts' habits when running the ball. The probability that any given run goes to the right is about 0.53, only slightly above an even split. Breaking the runs out by formation gives a little more information. In the shotgun formation the share of runs to the right (61 of 116, about 0.53) matches the overall probability, but in the singleback formation it rises to 0.56 (44 of 78). This is potentially useful information for both the Colts organization and the teams the Colts play.
For opponents, it shows a slight preference for right-side runs. Against a singleback formation in particular, it may be a good idea to line up your better tacklers on the right side to account for a run in that direction.
For the Colts, it shows that, particularly in the singleback formation, there is a tendency that opponents can observe. It's possible that this is by chance, and if so, it might be good to reiterate to the running backs the importance of mixing it up, so defenses do not start to overload the right side of the formation to account for this right-sided tendency. If it is not by chance, but because the right tackle and guard tend to be better at blocking than the left tackle and guard, it may be something to lean into. If the line is winning its matchups more often on the right side, more runs should be designed to go to the right.
The testable hypothesis here is that the players on the right side of the line tend to win their matchups more than the players on the left side of the line.
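One way to gauge whether the singleback tendency could simply be chance is an exact binomial test on the counts in the table above (44 runs to the right out of 78 singleback runs). This is a minimal sketch using base R's `binom.test`; the 0.5 null share and the variable names are assumptions made for illustration, not part of the original analysis.

```r
# Counts taken from the indy_rush table: SINGLEBACK right = 44, left = 34.
right_runs <- 44
left_runs <- 34
total_runs <- right_runs + left_runs

# Exact two-sided binomial test against an assumed 50/50 split
# (the 0.5 null value is an assumption for this sketch).
singleback_test <- binom.test(right_runs, total_runs, p = 0.5)

# Observed share of singleback runs that went right
round(unname(singleback_test$estimate), 2)

# p-value and 95% confidence interval for the underlying share
singleback_test$p.value
singleback_test$conf.int
```

If the confidence interval comfortably includes 0.5, the sample is too small to separate this tendency from chance, and the matchup hypothesis would need more plays to test.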
#Generate the viz
viz_data <- indy_rush %>%
filter(offenseFormation %in% c("SHOTGUN", "SINGLEBACK"))
ggplot(data = viz_data, aes(x = offenseFormation, y = plays)) +
theme_ap(family = "sans") +
geom_col(aes(fill = rush_direction), position = "dodge") +
labs(x = "Offensive Formation",
y = "Number of Plays",
title = "Plays by Rush Direction") +
labs(fill = "Rush Direction") +
theme(legend.position = "right")
To address the idea that the right side of the Colts' offensive line is simply better at blocking than the left side, we look at the average number of yards gained on each play.
indy_rush_yards <- data %>%
filter(possessionTeam == "IND") %>%
filter(!is.na(rushLocationType)) %>%
filter(!is.na(offenseFormation)) %>%
filter(rushLocationType != "UNKNOWN") %>%
mutate(rush_direction = ifelse(grepl("RIGHT", rushLocationType), "right", "left")) %>%
group_by(offenseFormation,
rush_direction) %>%
summarise(avg_yards = mean(yardsGained))
## `summarise()` has grouped output by 'offenseFormation'. You can override using
## the `.groups` argument.
indy_rush_yards
## # A tibble: 12 × 3
## # Groups: offenseFormation [7]
## offenseFormation rush_direction avg_yards
## <chr> <chr> <dbl>
## 1 EMPTY left 0
## 2 EMPTY right 8
## 3 I_FORM left 3.75
## 4 I_FORM right 0
## 5 JUMBO left 3
## 6 PISTOL left 6
## 7 SHOTGUN left 3.78
## 8 SHOTGUN right 4.20
## 9 SINGLEBACK left 3.5
## 10 SINGLEBACK right 3.02
## 11 WILDCAT left -2
## 12 WILDCAT right 9
This presents some really interesting results. Looking first at the singleback formation, where the Colts have the strongest tendency to run to the right, the team averages about a half yard more when they run to the left. This would seem to support the idea that runs in this formation are becoming predictable, rather than the right side of the line being better at blocking. On the other hand, in the shotgun formation, running to the right leads to a small increase in yards picked up.
The testable hypothesis: running to the left in the singleback formation will lead to more yards than running to the right.
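Testing a hypothesis like this needs play-level yardage rather than the group averages shown above. The sketch below wraps a Welch two-sample t-test (base R's `t.test`) in a helper; `compare_rush_yards` is a hypothetical name, and the choice of a t-test is an assumption for illustration, since yards gained on a play are often skewed.

```r
# Hypothetical helper: compares per-play yards for left vs. right runs
# in one formation. Assumes `plays` has the columns used in this analysis
# (offenseFormation, rushLocationType, yardsGained).
compare_rush_yards <- function(plays, formation) {
  keep <- !is.na(plays$offenseFormation) &
    plays$offenseFormation == formation &
    !is.na(plays$rushLocationType) &
    plays$rushLocationType != "UNKNOWN"
  runs <- plays[keep, ]

  # Same left/right rule as the pipeline above
  runs$rush_direction <- ifelse(grepl("RIGHT", runs$rushLocationType),
                                "right", "left")

  # Welch two-sample t-test (t.test's default: unequal variances)
  t.test(yardsGained ~ rush_direction, data = runs)
}

# Example call, assuming `data` is the imported plays table;
# the formation can be e.g. "SHOTGUN" or "SINGLEBACK":
# compare_rush_yards(data[data$possessionTeam == "IND", ], "SINGLEBACK")
```

The returned object carries the two group means in `$estimate` and the test's p-value in `$p.value`.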
#Generate the viz
viz_data <- indy_rush_yards %>%
filter(offenseFormation %in% c("SHOTGUN", "SINGLEBACK"))
ggplot(data = viz_data, aes(x = offenseFormation, y = avg_yards)) +
theme_ap(family = "sans") +
geom_col(aes(fill = rush_direction), position = "dodge") +
labs(x = "Offensive Formation",
y = "Yards Gained",
title = "Yards Gained by Rush Direction") +
labs(fill = "Rush Direction") +
theme(legend.position = "right")
We have seen that the Colts tend to run to the right, but in some cases are more successful when they run to the left. This raises the question: are there any patterns when it comes to the defense? Here we look at similar groupings, but with Indianapolis as the defensive team.
indy_rush_defense <- data %>%
filter(defensiveTeam == "IND") %>%
filter(!is.na(rushLocationType)) %>%
filter(!is.na(offenseFormation)) %>%
filter(rushLocationType != "UNKNOWN") %>%
mutate(rush_direction = ifelse(grepl("RIGHT", rushLocationType), "right", "left")) %>%
group_by(offenseFormation,
rush_direction) %>%
summarise(avg_yards = mean(yardsGained),
plays = n())
## `summarise()` has grouped output by 'offenseFormation'. You can override using
## the `.groups` argument.
right_plays <- sum(indy_rush_defense[indy_rush_defense$rush_direction == "right",]$plays)
left_plays <- sum(indy_rush_defense[indy_rush_defense$rush_direction == "left",]$plays)
total_rush <- right_plays + left_plays
print(paste("Probability of a run to the right", round(right_plays/total_rush, digits = 2)))
## [1] "Probability of a run to the right 0.47"
indy_rush_defense
## # A tibble: 13 × 4
## # Groups: offenseFormation [7]
## offenseFormation rush_direction avg_yards plays
## <chr> <chr> <dbl> <int>
## 1 EMPTY left 1.5 2
## 2 EMPTY right 6 2
## 3 I_FORM left 3.12 16
## 4 I_FORM right 3.33 18
## 5 JUMBO left 1 2
## 6 PISTOL left 5.83 6
## 7 PISTOL right 2.5 4
## 8 SHOTGUN left 3.94 53
## 9 SHOTGUN right 4.06 32
## 10 SINGLEBACK left 4.17 46
## 11 SINGLEBACK right 4.57 54
## 12 WILDCAT left -3 1
## 13 WILDCAT right 0 2
Here we can see that the probability of a run to the right for teams facing Indianapolis is 0.47. Much like when the Colts are on offense, the biggest differences can be seen in the shotgun and singleback formations. The probability of a run to the right when the offense is set up in shotgun is only about 0.38 (32 of 85 runs). This shows a clear preference for a run to the left.
The shotgun formation is designed to allow the quarterback to either throw the ball or hand it off. On run plays, the quarterback has often decided that his receivers are covered, so it is better to run the ball, but this also means there are likely more defensive backs dropping into coverage, leaving the defensive line a little thinner. The fact that running backs go to the left so much more often than to the right suggests at least a perception that the left side of the Colts' defensive line is an easier matchup. There is not a marked difference in the yardage gained running left or right (3.94 versus 4.06 yards), so that perception is likely just that.
On the other hand, when the Colts' opponents are in the singleback formation (more of a designed run than shotgun), the probability of a run to the right is 0.54. It's a slight preference for the right side, but the interesting piece is that when teams run right on the Colts, they pick up about 0.4 yards more on average (4.57 versus 4.17). This suggests that it might be worth looking at 1) the players that tend to play on the right side of the defensive line and 2) how often the defensive backs line up on either side of the line.
The testable hypothesis: Running to the right against the Colts will lead to an increase in total yards gained on a given play.
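The per-formation probabilities discussed above are not computed anywhere in the code, so here is a minimal base R sketch of that arithmetic. The play counts are copied from the `indy_rush_defense` output; the object names (`defense_runs`, `right_share`) are made up for this example.

```r
# Play counts copied from the indy_rush_defense output above
# (only the two high-volume formations are included).
defense_runs <- data.frame(
  offenseFormation = c("SHOTGUN", "SHOTGUN", "SINGLEBACK", "SINGLEBACK"),
  rush_direction   = c("left", "right", "left", "right"),
  plays            = c(53, 32, 46, 54)
)

# Total runs per formation, then the share of those runs that went right.
totals <- tapply(defense_runs$plays, defense_runs$offenseFormation, sum)
right <- defense_runs[defense_runs$rush_direction == "right", ]
right_share <- right$plays / totals[right$offenseFormation]

round(right_share, 2)
```

The same calculation could be done on the full table by dividing `plays` by its per-formation total inside the existing dplyr pipeline.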
#Generate the viz
viz_data <- indy_rush_defense %>%
filter(offenseFormation %in% c("SHOTGUN", "SINGLEBACK"))
ggplot(data = viz_data, aes(x = offenseFormation, y = avg_yards)) +
theme_ap(family = "sans") +
geom_col(aes(fill = rush_direction), position = "dodge") +
labs(x = "Offensive Formation",
y = "Yards Gained",
title = "Yards Gained by Rush Direction",
subtitle = "By Colts' Opponents") +
labs(fill = "Rush Direction") +
theme(legend.position = "right")
Here we are looking at how often each offensive formation is used on each down, across all teams.
comb_data <- data %>%
filter(!is.na(offenseFormation)) %>%
group_by(down, offenseFormation) %>%
summarise(count = n()) %>%
arrange(desc(count))
## `summarise()` has grouped output by 'down'. You can override using the
## `.groups` argument.
comb_data
## # A tibble: 28 × 3
## # Groups: down [4]
## down offenseFormation count
## <int> <chr> <int>
## 1 1 SHOTGUN 3192
## 2 2 SHOTGUN 3074
## 3 1 SINGLEBACK 2357
## 4 3 SHOTGUN 2352
## 5 2 SINGLEBACK 1256
## 6 1 I_FORM 646
## 7 3 EMPTY 498
## 8 2 EMPTY 459
## 9 1 PISTOL 367
## 10 1 EMPTY 349
## # ℹ 18 more rows
All 28 possible combinations are present in the data (7 formations and 4 downs). The least common combinations are all fourth-down plays where either a pistol or a wildcat formation is used. This makes sense because teams rarely run anything other than a punt on fourth down and, unless they are losing in the final quarter, are not very likely to run a long-yardage play on fourth down.
The most common combination is the shotgun formation on first down. This also makes intuitive sense. Shotgun sets up a run-pass-option style play where the quarterback can either hand the ball off to the running back or pass to a receiver, depending on what the coverage looks like. This is ideal for first down because it increases the likelihood that some yards will be gained, so teams don't end up in a second-and-long situation. Gaining some yards on first down helps to open up the rest of the playbook.
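Because the number of plays differs from down to down, raw counts are hard to compare across downs. A possible complement is each formation's share of plays within its down. The helper below is a sketch in base R; `add_down_share` is a hypothetical name, and it assumes a table shaped like `comb_data` (columns `down`, `offenseFormation`, `count`).

```r
# Hypothetical helper: adds each formation's share of plays within its down.
# Assumes a data frame with the columns `down`, `offenseFormation`, `count`,
# as in comb_data.
add_down_share <- function(counts) {
  counts <- as.data.frame(counts)

  # ave() returns, for every row, the total count for that row's down
  counts$down_total <- ave(counts$count, counts$down, FUN = sum)
  counts$share <- counts$count / counts$down_total

  # Sort by down, then by descending share within each down
  counts[order(counts$down, -counts$share), ]
}

# Usage, assuming comb_data from the step above is in the workspace:
# add_down_share(comb_data)
```

Shares within a down sum to 1, so a formation's value on third down can be compared directly with its value on first down.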
viz_data <- comb_data %>%
filter(down == 3)
ggplot(data = viz_data, aes(x = reorder(offenseFormation, count), y = count)) +
geom_col(fill = "#669900") +
theme_ap(family = "sans") +
labs(x = "Offensive Formation",
y = "Number of Plays",
title = "Offensive Formations on Third Down")
The visualization above shows how often each formation is used on third down. This is particularly important because getting a stop on third down is the difference between getting off the field and having to play through a fresh set of downs. Shotgun is clearly the most common formation.