Main Analysis
Calculating league-wide tendencies.
Preliminaries taken care of, let’s calculate the league-wide tendencies for each down-and-distance.
One thing to be aware of is that the run/pass tendency is likely to be influenced by both field position and score differential. Controlling for this completely would be tricky, but it’s probably good enough to constrain the sample to plays between the 20-yard lines, in regulation, when neither team has yet run away with the game.
# We'll create the "runpassmix" dataframe by filtering the play-by-play data to
# only runs and passes (A1)
# in regulation (A2)
# with the win probability for both teams between 25 and 75% (A3)
# and between the 20-yard lines, i.e. out of the redzone and not backed up (A4)
# then divide the plays into groups by down and distance (A5)
# count the plays (A6)
# calculate a new variable "run.prop" with the observed run probability
# (or prop.ortion) (A7)
# finally, arrange everything nicely for readability (A8)
runpassmix <- data %>%
filter(play_type %in% c("run", "pass"), # A1
qtr <= 4, # A2
between(home_wp, 0.25, 0.75), # A3
between(yardline_100, 20, 80)) %>% # A4
group_by(down, ydstogo) %>% # A5
summarize(num = n(), # A6
run.prop = sum(play_type == "run", na.rm = T)/n()) %>% # A7
arrange(down, ydstogo) # A8
runpassmix
## # A tibble: 129 x 4
## # Groups: down [4]
## down ydstogo num run.prop
## <int> <int> <int> <dbl>
## 1 1 1 5 0.4
## 2 1 2 3 1
## 3 1 4 6 0.667
## 4 1 5 484 0.591
## 5 1 6 1 1
## 6 1 7 2 0
## 7 1 8 4 0.5
## 8 1 9 3 0.333
## 9 1 10 59837 0.512
## 10 1 11 4 0.5
## # ... with 119 more rows
This dataframe, runpassmix, already contains some interesting insights. For example, NFL teams are quite inclined to run on 1st down, and even more so on 2nd- or 3rd-and-short. Conversely, teams all but abandon the run on 3rd and 2 or more.
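If you want to eyeball those tendencies without scrolling through all 129 rows, a quick filter on runpassmix does the trick. (This is just a spot-check sketch; the exact proportions will depend on your own nflscrapR pull.)
# Hypothetical spot-check: short-yardage downs vs. 3rd-and-medium-or-long.
# Assumes the runpassmix dataframe built above.
runpassmix %>%
  filter((down %in% c(2, 3) & ydstogo <= 2) |   # 2nd/3rd-and-short
         (down == 3 & ydstogo >= 5)) %>%        # 3rd-and-longer
  arrange(down, ydstogo)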
Received wisdom in action.
# Let's set the ggplot theme for the rest of the analysis. ggthemr provides some
# great out-of-the-box themes. We'll use "pale".
ggthemr(palette = 'pale')
# Next, we'll plot runpassmix as points, with each point having:
# x = the observed run probability (B1)
# y = the down (B2)
# size = the frequency of the down and distance (B3)
# fill = the down, again (B4)
# Then we'll add some pretty labels with geom_text_repel() on only (B5)
# part of the data so the labels don't clutter too much (B6)
# Finally, we'll add titles, labels, and (B7)
# suppress the legend for the fill (B8)
runpassmix %>%
ggplot(aes(x = run.prop, # B1
y = factor(down))) + # B2
geom_point(aes(size = num, # B3
fill = factor(down)), # B4
shape = 21, # a circle with an outline in color
color = "black",
alpha = 0.6) +
scale_size_area(max_size = 18, # for a bit more range in the circle sizes
breaks = c(1000, 3000, 6000, 10000),
labels = c("1000", "3000", "6000", "10000")) +
geom_text_repel(aes(label = paste0(down, " and ", ydstogo)), # B5
data = subset(runpassmix, num > 2300), # B6
point.padding = 0.9) +
labs(title = "Observed run probability by down and distance", # B7
subtitle = "NFL Seasons 2009-2018",
x = "Observed run probability",
y = "Down") +
guides(fill = F, # B8
size = guide_legend(title = "Frequency")
) +
scale_x_continuous(labels = scales::percent)
Fitting a model
We now want to investigate whether the observed run probability (i.e. the received wisdom about down-and-distance, our run.prop variable) affects the efficiency of runs and passes. Specifically, we’re interested in:
- EPA on runs when run.prop is low, relative to all runs
- EPA on passes when run.prop is high, relative to all passes
Our tool of choice for this will be a linear regression. The two bullet points above (together, let’s call them the Deception Effect) will be captured by a single predictor of the model, called an interaction term.
WTF is a linear regression and where is my football? Unfurrow your brow, dear reader, and lookit this tl;dr: a linear regression is just a way to test whether one variable is linked to another variable in a consistent and regular way. It’s not much more complicated than the y = mx + b stuff we learned in 9th grade (in which y is linked to x in a consistent and regular way called ‘m’), except with some added matrix algebra that I’ll spare you. Linear models in R are useful for all manner of tests. Here’s a great resource to start you off: https://lindeloev.github.io/tests-as-linear/
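If you want to see the y = mx + b connection for yourself, here’s a toy sketch (simulated data, nothing to do with football; the numbers are made up purely for illustration):
# Toy example: simulate y = 2x + 1 plus some noise, then let lm() recover
# the slope (m) and the intercept (b). Purely illustrative.
set.seed(42)
toy <- data.frame(x = runif(100, 0, 10))
toy$y <- 2 * toy$x + 1 + rnorm(100, sd = 0.5)
coef(lm(y ~ x, data = toy)) # intercept should land near 1, slope near 2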
We’ll fit a linear model that says “a given play’s EPA depends on the play type (run or pass), the observed run probability, and whether or not the play was a run when the observed run probability indicated run”.
In R, that would be epa ~ run.prop + play_type + run.prop:play_type, or just epa ~ run.prop * play_type for short.
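You can check that the shorthand expands to the same three terms without fitting anything; a quick base-R sanity check:
# The * operator expands into both main effects plus the interaction term.
attr(terms(epa ~ run.prop * play_type), "term.labels")
# returns: "run.prop" "play_type" "run.prop:play_type"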
We’ll join our play-by-play data with runpassmix, first filtering it to the same win-probability and field-position constraints as before. Then we’ll fit the linear model with lm().
rp.model <- data %>%
filter(play_type %in% c("run", "pass"),
between(home_wp, 0.25, 0.75),
between(yardline_100, 20, 80)) %>%
left_join(runpassmix) %>%
lm(data = ., epa ~ run.prop * play_type)
summary(rp.model)
##
## Call:
## lm(formula = epa ~ run.prop * play_type, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.3411 -0.7041 -0.1529 0.7040 8.5620
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.14456 0.01074 -13.460 < 2e-16 ***
## run.prop 0.41421 0.02654 15.604 < 2e-16 ***
## play_typerun 0.15166 0.02521 6.015 1.81e-09 ***
## run.prop:play_typerun -0.59082 0.05247 -11.260 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.411 on 141133 degrees of freedom
## (38 observations deleted due to missingness)
## Multiple R-squared: 0.002675, Adjusted R-squared: 0.002654
## F-statistic: 126.2 on 3 and 141133 DF, p-value: < 2.2e-16
Heyo, look at those three pretty stars. They mean that the effect of each variable (run.prop, play_type, and the interaction between the two) on EPA is very consistent (technically, very unlikely to be due to chance alone).
However, the R2 is, to use a technical term, ratched. It means only about 0.3% of the variability of EPA is linked to the run.prop * play_type predictors. That’s really extraordinarily poor, but in the realm of nflscrapR it’s not really unexpected. EPA is a very, very noisy variable to work with. So we have to lower our standards a bit here.
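If you’d rather grab that number programmatically than squint at the printout, it’s tucked away in the summary object:
# Pull the R-squared straight from the fitted model; per the summary above
# it's about 0.0027, i.e. roughly 0.3% of the variability in EPA.
summary(rp.model)$r.squared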
Let’s investigate those three little stars some more. If the rp.model is correct, we should see
- EPA move with run.prop,
- EPA move with play_type, (this is a known thing already, passing is better in terms of EPA all-up)
- EPA move in different directions with run.prop on runs and passes, i.e. the Deception Effect.
So let’s plot it out and see if it tracks.
# We start with the same filter and join steps we used for the rp.model data,
# then we plot EPA against run.prop (D1)
# add a smoother that calculates a linear model (not the default smoother!) (D2)
# and separate the passes and runs into two different plots (D3)
# then prettify as we did above.
data %>%
filter(play_type %in% c("run", "pass"),
between(home_wp, 0.25, 0.75),
between(yardline_100, 20, 80)) %>%
left_join(runpassmix) %>%
ggplot(aes(x = run.prop, y = epa)) + # D1
geom_smooth(aes(color = play_type, fill = play_type), method = "lm") + # D2
facet_wrap(~ play_type) + # D3
labs(
title = "Estimate of the EPA of passes and runs according to observed run probability",
subtitle = "Linear estimate based on NFL seasons 2009-2018",
y = "EPA",
x = "Observed run probability"
) +
scale_x_continuous(labels = scales::percent) +
guides(color = guide_legend(title = "Play type"),
fill = guide_legend(title = "Play type"))
Reading the graph, the model estimates that a pass in a 75% run situation (when the defense is more likely to be expecting run) gains about 0.2 points of EPA vis-a-vis a pass in a 25% run situation (when the defense might be expecting pass). Likewise, a run when the defense expects run loses a little less than 0.1 point of EPA relative to a run when the defense expects pass.
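Rather than eyeballing the graph, we can also ask the fitted model directly with predict(). From the printed coefficients, the pass estimate moves by about 0.41421 * 0.5 ≈ +0.21 EPA between a 25% and a 75% run situation, and the run estimate by (0.41421 - 0.59082) * 0.5 ≈ -0.09 EPA. A small sketch (the grid of values below is just for illustration):
# Predicted EPA for each play type at 25% and 75% observed run probability,
# using the rp.model we already fit. The grid is purely illustrative.
grid <- expand.grid(run.prop = c(0.25, 0.75),
                    play_type = c("pass", "run"))
cbind(grid, epa.hat = predict(rp.model, newdata = grid))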
This is a very good result.
The ratched R2 is still bugging me, so I’m going to test it by fitting a null model (a cut-down model without the variable we’re interested in, in this case the run.prop * play_type interaction) and see what exactly we’re gaining in terms of explanatory power.
null.model <- data %>%
filter(play_type %in% c("run", "pass"), between(home_wp, 0.25, 0.75), between(yardline_100, 20, 80)) %>%
left_join(runpassmix) %>%
lm(data = ., epa ~ run.prop + play_type)
summary(null.model)
##
## Call:
## lm(formula = epa ~ run.prop + play_type, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.3317 -0.6921 -0.1567 0.7094 8.5240
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.090114 0.009594 -9.392 <2e-16 ***
## run.prop 0.262998 0.022907 11.481 <2e-16 ***
## play_typerun -0.116955 0.008170 -14.315 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.412 on 141134 degrees of freedom
## (38 observations deleted due to missingness)
## Multiple R-squared: 0.001779, Adjusted R-squared: 0.001765
## F-statistic: 125.8 on 2 and 141134 DF, p-value: < 2.2e-16
There are those three little stars again! It seems that run.prop and play_type have an effect on EPA even without the Deception Effect. But! The R2 is even more ratched, so it may well be the Deception Effect that’s doing the work.
We can test this by comparing the null model and our Deception model with an ANOVA (analysis of variance). For nested models like these, the ANOVA tests whether the extra predictor soaks up more of the unexplained variance than we’d expect by chance.
anova(null.model, rp.model)
## Analysis of Variance Table
##
## Model 1: epa ~ run.prop + play_type
## Model 2: epa ~ run.prop * play_type
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 141134 281263
## 2 141133 281011 1 252.46 126.79 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The anova() test says that Model 2 (with the Deception Effect) fits better than the null model at the cost of just one extra predictor (that’s what Df = 1, one degree of freedom, means). So for a slight increase in the complexity of the model, we gained a significantly better fit.
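If you’re curious where that F value comes from, it’s just the drop in residual sum of squares scaled by the leftover noise of the bigger model. Recomputing it by hand from the (rounded) numbers in the table above:
# F = (drop in RSS / extra df) / (RSS of the bigger model / its residual df)
(281263 - 281011) / 1 / (281011 / 141133)
# comes out around 126.6, matching the table's 126.79 once you account for
# the RSS values being rounded in the printout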
This is a really good result.
One last thing. Making really complicated models with a lot of predictors isn’t necessarily a good thing, even when our fit (R2) increases. The reason is that each added predictor dilutes the effect of the other predictors and makes the model (and the underlying mechanism it’s supposed to represent) just that much harder to interpret and describe.
The standard tool for checking whether a model has been made too complicated is the Akaike Information Criterion (AIC). No tl;dr, sorry, it comes from information theory, which is so very not my field. But you can wikipedia it if you’re really curious.
The rule of thumb for using AIC is: a model with a low AIC is more efficient than a model with a high AIC. So in our case, if the rp.model had a much higher AIC than the null model, we might prefer the null model even with its lower predictive power.
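For the curious, AIC() isn’t doing anything exotic: it’s 2k minus twice the log-likelihood, where k counts every estimated parameter (for rp.model that’s the four coefficients plus the residual variance, hence the df of 5 you’ll see below). A quick by-hand check:
# AIC by hand: 2 * (number of estimated parameters) - 2 * log-likelihood.
ll <- logLik(rp.model)
2 * attr(ll, "df") - 2 * as.numeric(ll) # should match AIC(rp.model)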
So, let’s test it.
AIC(null.model, rp.model)
## df AIC
## null.model 4 497859.8
## rp.model 5 497735.0
Now this is where, in my line of work, you lean back and crack a cold one. The rp.model not only fits the data better, it does so at a lower AIC! A model that fits better and is more information-efficient? That’s some good modelin’.
Conclusion
What have we learned?
- NFL teams choose whether to run or pass at rates that vary depending on (among other things) down and distance.
- Going against the run/pass trend on a given down-and-distance provides a systematic (if small) advantage.
- The advantage cannot be sufficiently explained by the run/pass trend and the run vs. pass difference alone. The interaction between the two, the Deception Effect, does matter.