Main Analysis
Calculating league-wide tendencies.
Preliminaries taken care of, let’s calculate the league-wide tendencies for each down-and-distance.
One thing to be aware of is that the run/pass tendency is likely to be influenced by both field position and score differential. Controlling for this completely would be tricky, but it’s probably good enough to constrain the sample to plays between the 20-yard lines, in regulation, when neither team has yet run away with the game.
# We'll create the "runpassmix" dataframe by filtering the play-by-play data to
# only runs and passes (A1)
# in regulation (A2)
# with the win probability for both teams between 25 and 75% (A3)
# and between the 20-yard lines, i.e. out of the redzone and not backed up (A4)
# then divide the plays into groups by down and distance (A5)
# count the plays (A6)
# calculate a new variable "run.prop" with the observed run probability
# (or prop.ortion) (A7)
# finally, arrange everything nicely for readability (A8)
runpassmix <- data %>%
filter(play_type %in% c("run", "pass"), # A1
qtr <= 4, # A2
between(home_wp, 0.25, 0.75), # A3
between(yardline_100, 20, 80)) %>% # A4
group_by(down, ydstogo) %>% # A5
summarize(num = n(), # A6
run.prop = sum(play_type == "run", na.rm = T)/n()) %>% # A7
arrange(down, ydstogo) # A8
runpassmix
## # A tibble: 129 x 4
## # Groups: down [4]
## down ydstogo num run.prop
## <int> <int> <int> <dbl>
## 1 1 1 5 0.4
## 2 1 2 3 1
## 3 1 4 6 0.667
## 4 1 5 484 0.591
## 5 1 6 1 1
## 6 1 7 2 0
## 7 1 8 4 0.5
## 8 1 9 3 0.333
## 9 1 10 59837 0.512
## 10 1 11 4 0.5
## # ... with 119 more rows
This dataframe, runpassmix, already contains some interesting insights. For example, NFL teams are quite inclined to run on 1st down, and even more so on 2nd- or 3rd-and-short. Conversely, teams all but abandon the run on 3rd and 2 or more.
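If you want to eyeball those tendencies without scrolling through all 129 rows, a quick filter on runpassmix does the trick. (This is just a spot-check sketch; the exact proportions will depend on your own nflscrapR pull.)
# Hypothetical spot-check: short-yardage downs vs. 3rd-and-medium-or-long.
# Assumes the runpassmix dataframe built above.
runpassmix %>%
  filter((down %in% c(2, 3) & ydstogo <= 2) |   # 2nd/3rd-and-short
         (down == 3 & ydstogo >= 5)) %>%        # 3rd-and-longer
  arrange(down, ydstogo)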
Received wisdom in action.
# Let's set the ggplot theme for the rest of the analysis. ggthemr provides some
# great out-of-the-box themes. We'll use "pale".
ggthemr(palette = 'pale')
# Next, we'll plot runpassmix as points, with each point having:
# x = the observed run probability (B1)
# y = the down (B2)
# size = the frequency of the down and distance (B3)
# fill = the down, again (B4)
# Then we'll add some pretty labels with geom_text_repel() on only (B5)
# part of the data so the labels don't clutter too much (B6)
# Finally, we'll add titles, labels, and (B7)
# suppress the legend for the fill (B8)
runpassmix %>%
ggplot(aes(x = run.prop, # B1
y = factor(down))) + # B2
geom_point(aes(size = num, # B3
fill = factor(down)), # B4
shape = 21, # a circle with an outline in color
color = "black",
alpha = 0.6) +
scale_size_area(max_size = 18, # for a bit more range in the circle sizes
breaks = c(1000, 3000, 6000, 10000),
labels = c("1000", "3000", "6000", "10000")) +
geom_text_repel(aes(label = paste0(down, " and ", ydstogo)), # B5
data = subset(runpassmix, num > 2300), # B6
point.padding = 0.9) +
labs(title = "Observed run probability by down and distance", # B7
subtitle = "NFL Seasons 2009-2018",
x = "Observed run probability",
y = "Down") +
guides(fill = F, # B8
size = guide_legend(title = "Frequency")
) +
scale_x_continuous(labels = scales::percent)
Fitting a model
We now want to investigate whether the observed run probability (i.e. the received wisdom about down-and-distance, our run.prop variable) affects the efficiency of runs and passes. Specifically, we’re interested in:
- EPA on runs when run.prop is low, relative to all runs
- EPA on passes when run.prop is high, relative to all passes
Our tool of choice for this will be a linear regression. The two bullet points above (together, let’s call them the Deception Effect) will be captured by a single predictor of the model, called an interaction term.
WTF is a linear regression and where is my football? Unfurrow your brow, dear reader, and lookit this tl;dr: a linear regression is just a way to test whether one variable is linked to another variable in a consistent and regular way. It’s not much more complicated than the y = mx + b stuff we learned in 9th grade (in which y is linked to x in a consistent and regular way called ‘m’), except with some added matrix algebra that I’ll spare you. Linear models in R are useful for all manner of tests. Here’s a great resource to start you off: https://lindeloev.github.io/tests-as-linear/
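If you want to see the y = mx + b connection for yourself, here’s a toy sketch (simulated data, nothing to do with football; the numbers are made up purely for illustration):
# Toy example: simulate y = 2x + 1 plus some noise, then let lm() recover
# the slope (m) and the intercept (b). Purely illustrative.
set.seed(42)
toy <- data.frame(x = runif(100, 0, 10))
toy$y <- 2 * toy$x + 1 + rnorm(100, sd = 0.5)
coef(lm(y ~ x, data = toy)) # intercept should land near 1, slope near 2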
We’ll fit a linear model that says “a given play’s EPA depends on the play type (run or pass), the observed run probability, and whether or not the play was a run when the observed run probability indicated run”.
In R, that would be epa ~ run.prop + play_type + run.prop:play_type, or just epa ~ run.prop * play_type for short.
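You can check that the shorthand expands to the same three terms without fitting anything; a quick base-R sanity check:
# The * operator expands into both main effects plus the interaction term.
attr(terms(epa ~ run.prop * play_type), "term.labels")
# returns: "run.prop" "play_type" "run.prop:play_type"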
We’ll join our play-by-play data with runpassmix, first filtering it to the same win-probability and field-position constraints as before. Then we’ll fit the linear model with lm().
rp.model <- data %>%
filter(play_type %in% c("run", "pass"),
between(home_wp, 0.25, 0.75),
between(yardline_100, 20, 80)) %>%
left_join(runpassmix) %>%
lm(data = ., epa ~ run.prop * play_type)
summary(rp.model)
##
## Call:
## lm(formula = epa ~ run.prop * play_type, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.3411 -0.7041 -0.1529 0.7040 8.5620
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.14456 0.01074 -13.460 < 2e-16 ***
## run.prop 0.41421 0.02654 15.604 < 2e-16 ***
## play_typerun 0.15166 0.02521 6.015 1.81e-09 ***
## run.prop:play_typerun -0.59082 0.05247 -11.260 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.411 on 141133 degrees of freedom
## (38 observations deleted due to missingness)
## Multiple R-squared: 0.002675, Adjusted R-squared: 0.002654
## F-statistic: 126.2 on 3 and 141133 DF, p-value: < 2.2e-16
Heyo, look at those three pretty stars. They mean that the effect of each variable (run.prop, play_type, and the interaction between the two) on EPA is very consistent (technically, very unlikely to be due to chance alone).
However, the R2 is, to use a technical term, ratched. It means only about 0.3% of the variability of EPA is linked to the run.prop * play_type predictors. That’s really extraordinarily poor, but in the realm of nflscrapR it’s not really unexpected. EPA is a very, very noisy variable to work with. So we have to lower our standards a bit here.
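If you’d rather grab that number programmatically than squint at the printout, it’s tucked away in the summary object:
# Pull the R-squared straight from the fitted model; per the summary above
# it's about 0.0027, i.e. roughly 0.3% of the variability in EPA.
summary(rp.model)$r.squared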
Let’s investigate those three little stars some more. If the rp.model is correct, we should see
- EPA move with run.prop,
- EPA move with play_type, (this is a known thing already, passing is better in terms of EPA all-up)
- EPA move in different directions with run.prop on runs and passes, i.e. the Deception Effect.
So let’s plot it out and see if it tracks.
# We start with the same filter and join steps we used for the rp.model data,
# then we plot EPA against run.prop (D1)
# add a smoother that calculates a linear model (not the default smoother!) (D2)
# and separate the passes and runs into two different plots (D3)
# then prettify as we did above.
data %>%
filter(play_type %in% c("run", "pass"),
between(home_wp, 0.25, 0.75),
between(yardline_100, 20, 80)) %>%
left_join(runpassmix) %>%
ggplot(aes(x = run.prop, y = epa)) + # D1
geom_smooth(aes(color = play_type, fill = play_type), method = "lm") + # D2
facet_wrap(~ play_type) + # D3
labs(
title = "Estimate of the EPA of passes and runs according to observed run probability",
subtitle = "Linear estimate based on NFL seasons 2009-2018",
y = "EPA",
x = "Observed run probability"
) +
scale_x_continuous(labels = scales::percent) +
guides(color = guide_legend(title = "Play type"),
fill = guide_legend(title = "Play type"))
Reading the graph, the model estimates that a pass in a 75% run situation (when the defense is more likely to be expecting run) gains about 0.2 points of EPA vis-a-vis a pass in a 25% run situation (when the defense might be expecting pass). Likewise, a run when the defense expects run loses a little less than 0.1 point of EPA relative to a run when the defense expects pass.
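Rather than eyeballing the graph, we can also ask the fitted model directly with predict(). From the printed coefficients, the pass estimate moves by about 0.41421 * 0.5 ≈ +0.21 EPA between a 25% and a 75% run situation, and the run estimate by (0.41421 - 0.59082) * 0.5 ≈ -0.09 EPA. A small sketch (the grid of values below is just for illustration):
# Predicted EPA for each play type at 25% and 75% observed run probability,
# using the rp.model we already fit. The grid is purely illustrative.
grid <- expand.grid(run.prop = c(0.25, 0.75),
                    play_type = c("pass", "run"))
cbind(grid, epa.hat = predict(rp.model, newdata = grid))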
This is a very good result.
The ratched R2 is still bugging me, so I’m going to test it by fitting a null model (a cut-down model without the variable we’re interested in, in this case the run.prop * play_type interaction) and see what exactly we’re gaining in terms of explanatory power.
null.model <- data %>%
filter(play_type %in% c("run", "pass"), between(home_wp, 0.25, 0.75), between(yardline_100, 20, 80)) %>%
left_join(runpassmix) %>%
lm(data = ., epa ~ run.prop + play_type)
summary(null.model)
##
## Call:
## lm(formula = epa ~ run.prop + play_type, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.3317 -0.6921 -0.1567 0.7094 8.5240
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.090114 0.009594 -9.392 <2e-16 ***
## run.prop 0.262998 0.022907 11.481 <2e-16 ***
## play_typerun -0.116955 0.008170 -14.315 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.412 on 141134 degrees of freedom
## (38 observations deleted due to missingness)
## Multiple R-squared: 0.001779, Adjusted R-squared: 0.001765
## F-statistic: 125.8 on 2 and 141134 DF, p-value: < 2.2e-16
There are those three little stars again! It seems that run.prop and play_type have an effect on EPA even without the Deception Effect. But! The R2 is even more ratched, so it may well be the Deception Effect that’s doing the work.
We can test this by comparing the null model and our Deception model with an ANOVA (analysis of variance). For nested models like these, the ANOVA tests whether the extra predictor soaks up more of the unexplained variance than we’d expect by chance.
anova(null.model, rp.model)
## Analysis of Variance Table
##
## Model 1: epa ~ run.prop + play_type
## Model 2: epa ~ run.prop * play_type
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 141134 281263
## 2 141133 281011 1 252.46 126.79 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The anova() test says that Model 2 (with the Deception Effect) fits better than the null model at the cost of just one extra predictor (that’s what Df = 1, one degree of freedom, means). So for a slight increase in the complexity of the model, we gained a significantly better fit.
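If you’re curious where that F value comes from, it’s just the drop in residual sum of squares scaled by the leftover noise of the bigger model. Recomputing it by hand from the (rounded) numbers in the table above:
# F = (drop in RSS / extra df) / (RSS of the bigger model / its residual df)
(281263 - 281011) / 1 / (281011 / 141133)
# comes out around 126.6, matching the table's 126.79 once you account for
# the RSS values being rounded in the printout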
This is a really good result.
One last thing. Making really complicated models with a lot of predictors isn’t necessarily a good thing, even when our fit (R2) increases. The reason is that each added predictor dilutes the effect of the other predictors and makes the model (and the underlying mechanism it’s supposed to represent) just that much harder to interpret and describe.
The standard tool for checking whether a model has been made too complicated is the Akaike Information Criterion (AIC). No tl;dr, sorry, it comes from information theory, which is so very not my field. But you can wikipedia it if you’re really curious.
The rule of thumb for using AIC is: a model with a low AIC is more efficient than a model with a high AIC. So in our case, if the rp.model had a much higher AIC than the null model, we might prefer the null model even with its lower predictive power.
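For the curious, AIC() isn’t doing anything exotic: it’s 2k minus twice the log-likelihood, where k counts every estimated parameter (for rp.model that’s the four coefficients plus the residual variance, hence the df of 5 you’ll see below). A quick by-hand check:
# AIC by hand: 2 * (number of estimated parameters) - 2 * log-likelihood.
ll <- logLik(rp.model)
2 * attr(ll, "df") - 2 * as.numeric(ll) # should match AIC(rp.model)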
So, let’s test it.
AIC(null.model, rp.model)
## df AIC
## null.model 4 497859.8
## rp.model 5 497735.0
Now this is where, in my line of work, you lean back and crack a cold one. The rp.model not only fits the data better, it does so at a lower AIC! A model that fits better and is more information-efficient? That’s some good modelin’.
Conclusion
What have we learned?
- NFL teams choose whether to run or pass at rates that vary depending on (among other things) down and distance.
- Going against the run/pass trend on a given down-and-distance provides a systematic (if small) advantage.
- The advantage cannot be sufficiently explained by the run/pass trend and the run vs. pass difference alone. The interaction between the two, the Deception Effect, does matter.