Experiments in bivariate EPA plotting

Introduction

If you’ve spent any amount of time following #nflscrapR or #nflfastR, more likely than not you’ve come across a plot that looks like this:

These plots are pretty neat, even moreso because you can generate them yourself from a Shiny app the fine fellows at RBSDM.com have put together and made available to the community for free.(side note: be very careful about how you spell R-B-S-D-M or you’ll end up in some pretty… interesting corners of the internet)

I’ve always had a bit of an itch about these plots. Some curiosity, if you’d like. So humor me as I try to explain what could be improved about these plots, and try to build an improvement myself.

Side note: the code in this analysis is perhaps a little over-explained. Veteran R users will find this annoying, however, I thought I would put a little extra effort into explaining how the fibbly bits work for the benefit of more novice users who are finding @nflscrapR @nflfastR data their gateway drug into R and the wild-and-wooly world of data science more broadly.

Let’s set up an environment and get going, shall we?

# LOAD REQUIRED LIBRARIES - INSTALL IF NOT PRESENT
#
# I stole this rather elegant piece of code from user Technophobe01 on SO:
# https://stackoverflow.com/questions/52574339/best-way-to-install-required-packages-in-r


requiredPackages <- c("devtools", "tidyverse", "nflfastR", "depth",
                      "ggthemes", "ggforce", "ggimage", "ggrepel")

## ipak() takes a vector of packages, iterates through them, and checks to see
## if they're installed. If not, it installs them.

ipak <- function(pkg) {
  new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
  if (length(new.pkg))
    install.packages(new.pkg, dependencies = TRUE)
  sapply(pkg, require, character.only = TRUE)
}

ipak(requiredPackages)

##  devtools tidyverse  nflfastR     depth  ggthemes   ggforce   ggimage   ggrepel 
##      TRUE      TRUE      TRUE      TRUE      TRUE      TRUE      TRUE      TRUE

Preliminaries done with, let’s fetch our data.

And do some filtering to keep things manageable. Instead of exactly reproducing the original graph I linked, let’s filter it down using the same conditions for the data in the plot to the 2019 seaason, to help make sense of the data as we go.

df <- dat %>% 
  filter(
    play_type %in% c("pass", "run"),
    wp >= 0.05, 
    wp <= 0.95,
    !is.na(epa)) 

liveries <- teams_colors_logos %>% 
  select(team_abbr, team_logo_espn, team_color, team_color2) %>% 
  filter(team_abbr %in% df$posteam)

Calculating ‘mean EPA’

As a first step, let’s calculate the mean EPA per play across all of each team’s plays during the season and put the results in a new dataframe (mean_epas).

# We take our initial 'source' dataframe (df) and cut it down a bit           [1]
# then pivot the tibble so that all EPA values are associated with a team a
# side of the ball                                                            [2]
# then we group and summarize using the mean() function                       [3]
# recode the 'side variable' so it's more readable                            [4]
# and pivot it back out into columns                                          [5]


mean_epas <- df %>% 
  select(game_id, posteam, defteam, epa) %>%                                 #[1]
  pivot_longer(posteam:defteam,                                              #[2]
               names_to = "side", 
               values_to = "team") %>% 
  group_by(team, side) %>% 
  summarize(mean_epa = mean(epa, na.rm = T)) %>%                             #[3]
  mutate(side = recode(side,                                                 #[4]
                       "posteam" = "mean_Offense", 
                       "defteam" = "mean_Defense")) %>% 
  pivot_wider(names_from = side, values_from = mean_epa)                     #[5]

mean_epas

## # A tibble: 32 x 3
## # Groups:   team [32]
##    team  mean_Defense mean_Offense
##    <chr>        <dbl>        <dbl>
##  1 ARI        0.140         0.0439
##  2 ATL        0.0995        0.0346
##  3 BAL       -0.0950        0.138 
##  4 BUF       -0.0713       -0.0124
##  5 CAR        0.0654       -0.0657
##  6 CHI       -0.0605       -0.0685
##  7 CIN        0.146        -0.110 
##  8 CLE        0.0262       -0.0213
##  9 DAL        0.0300        0.119 
## 10 DEN       -0.00302      -0.0330
## # ... with 22 more rows

Reproducing the original graph

Now that we have the season-wide means for offensive and defensive EPAs, we can use it to reproduce the bivariate plots one sees in RBDMS.com:

# This is straight up ggplot-ting, but first we join the liveries table     
#   to get the team logos                                                    [1]
# two calls to geom_hline() and geom_vline to plot the league-wide means     [2]
# and geom_image() places the logos on the grid                              [3]

mean_epas %>% 
    left_join(liveries, by = c("team" = "team_abbr")) %>%                   #[1]
    ggplot(aes(x = mean_Offense, y = mean_Defense)) + 
    scale_y_reverse() + 
    theme_bw() + 
    theme(plot.title = element_text(size = 12, 
                                    hjust = 0.5, 
                                    face = "bold")) + 
    geom_hline(aes(yintercept = mean(mean_Defense, na.rm = T)),             #[2]
               color = "red", 
               linetype = "dashed") + 
    geom_vline(aes(xintercept = mean(mean_Offense, na.rm = T)), 
               color = "red", 
               linetype = "dashed") + 
    geom_image(aes(image = team_logo_espn)) +                               #[3]
    labs(
        title = "Offensive EPA/play vs Defensive EPA/play, 2019 season",
        subtitle = "Win probability between 5% and 95%, including post-season",
        x = "Offensive EPA/play",
        y = "Defensive EPA/play",
        caption = "CC-BY-SA @GrumpyNFLStats - data by @nflfastR"
    ) + 
    coord_fixed() ## necessary for team logos to display correctly *table_flip*

What could be improved about this graph?

It’s hard to dislike the RBDMS.com graph, because it’s very satisfying to look at and it seems to convey a lot of data. At a glance, we can see (for example) that the strength of the Chiefs was their offense, and that the Bengals and Redskins struggled mightily on both sides of the ball.

So what’s the problem, Grumpy?, you ask.

Well, here’s the thing. Three things, actually.

Analyzing EPA/play at the season level not quite as straight forward as taking the mean(). For one thing, at the season level EPA is pretty skewed:

# Quartile-quartile plots are classical tools to examine data.
# If the distribution of EPA were normal at the season-level, we'd see these
# points mostly on a straight line. The 'heavy tail' on the left is a tell-tale
# sign that the distribution is not normal.
qqnorm(df$epa)

Using the mean to calculate the central tendency of a skewed distribution is one way insights can get hidden. There are more robust ways of detecting the central tendency, like the median… but this leads us to:

Season-level means are probably not appropriate, but game-level means are a whole ’nother hill of beans. Let’s think this through:

Suppose 2 teams have 40 really great plays on both sides of the ball over the season. Team A has all 40 high-EPA plays in one game. Team B has 2 or 3 per game. If both teams have the same mean(EPA), which one is more likely to have the better record?

Answer: Team B. If the season-level mean(epa)/play is the same, then it figures that the mean(epa)/play of each team was about equal without the high-EPA plays. If Team A had its high-EPA plays in a single game, then its EPA/play in the rest of their games was much lower than Team B. Lower EPA/play ~ fewer wins.

So from that perspective, EPA/play makes perfect sense as a statistic at the game level, just not the season level.

This is getting perhaps a little technical, but an important thing to note about the RBDMS.com type graph is that it’s not EPA is not actually bivariate at the play level. In other words, no single play gives you both an Offensive EPA and a Defensive EPA value. The only way to create a meaningful duple (off_EPA, def_EPA) is by summarizing at the drive-, game-, or season-level. Once we have those duples, we can put them on a grid and then find the central tendency.

Finding a better season-level mean(epa)

What I’m going to try to do now is to resolve (or at least address) points 1-3 and produce a plot that tells the same story as the initial one, plotting Offensive EPA vs. Defensive EPA, while also communicating any other little discoveries we make along the way.

The plan of work is as follows. For each team for the 2019 season:

We calculate game-level mean(epa) for offense and defense
We use those game-level means to find a season-level bivariate median, in this case a depth median or Tukey median.
We use the depth medians of the bivariate distributions to create bag plots, which do the same job as box plots but for 2 dimensions.
We use the Euclidean distance between each bivariate point (i.e. each game) to find a deviation score.
We plot each team’s depth median and deviation score on a 2-d grid and crack a cold one.

Now wait a gorram minute, Grumpy, you say, What in blazes is a depth median?. Well, the math is hard but the concept is simple. Imagine that you spread a bunch of marbles in a sandbox. You start drawing lines that cross the sandbox from one end to another, so that you have the same number of marbles on either side of the box. Do this enough times, and those lines will start forming polygons, which in fancy-math speak are also called hulls. The center of the smallest hull is the depth median for that scattering of marbles. This explanation has a 50% chance of being only half-right, but just roll with it.

Cool? Cool Alright, then, we ride!

Calculating the various game-level EPAs and the bivariate (Tukey) median

## # A tibble: 32 x 9
## # Groups:   team [32]
##    team  data  dm_Defense dm_Offense depth season_mean_Def~ season_mean_Off~
##    <chr> <lis>      <dbl>      <dbl> <dbl>            <dbl>            <dbl>
##  1 ARI   <tib~    0.123       0.0129 0.438          0.151            0.0119 
##  2 ATL   <tib~   -0.00722     0.0816 0.438          0.105            0.0323 
##  3 BAL   <tib~   -0.106       0.183  0.353         -0.115            0.226  
##  4 BUF   <tib~   -0.0588      0.0107 0.412         -0.0800          -0.00155
##  5 CAR   <tib~    0.114      -0.0771 0.438          0.0787          -0.136  
##  6 CHI   <tib~   -0.0661     -0.0957 0.438         -0.0904          -0.0726 
##  7 CIN   <tib~    0.140      -0.155  0.438          0.173           -0.129  
##  8 CLE   <tib~    0.0623     -0.0266 0.438          0.0131          -0.0229 
##  9 DAL   <tib~    0.00410     0.150  0.438          0.0176           0.136  
## 10 DEN   <tib~    0.00743    -0.0165 0.438         -0.00725         -0.00666
## # ... with 22 more rows, and 2 more variables: play_mean_Defense <dbl>,
## #   play_mean_Offense <dbl>

I came to learn about the depth median as I was looking for a math-correct 2d- version of the classic box plot. Turns out, making a 2d boxplot is not as simple as extending the box and whiskers into the y-axis. Fortunately, people smarter than I (including that Tukey fellow) came up with the answer: the bag plot.

The concept of a bag plot is pretty nifty. For a bivariate (i.e. 2-dimensional) distribution, find its depth median, then draw a hull around that point such that the hull touches exactly half of the points in the distribution. This is called the bag. Then draw another hull of that touches the maximum possible number of points not in the bag that has a maximum area of 3x the bag. That’s called the fence. Compared to a box plot, the depth median is equivalent to the median, the bag is the box, and the fence are the whiskers. Any point outside the fence are outliers.

Makes a lot of sense, if you ask me. Me gusta.gif

Let’s take the list column of the all_mean_epas dataframe we just made (which has the game-level mean(epa)’s, remember) and use that data to construct bag plots for each team.

bagplots <- all_mean_epas %>% 
  left_join(liveries, by = c("team" = "team_abbr")) %>%
  ggplot(aes(group = team_color, fill = team_color, color = team_color)) + 
  geom_bag(data = . %>% 
             select(team, data, team_color) %>% 
             unnest(data), 
           aes(x = game_mean_Offense, 
               y = game_mean_Defense,
               fill = team_color)) + 
  geom_point(data = . %>% 
               select(team, data, team_color) %>% 
               unnest(data), 
             aes(x = game_mean_Offense, 
                 y = game_mean_Defense, 
                 color = team_color), 
             alpha = 0.5) + 
  scale_fill_identity() +
  scale_color_identity() +
  geom_point(aes(x = season_mean_Offense, y = season_mean_Defense)) + 
  theme_bw() + 
  theme(plot.title = element_text(size = 12, hjust = 0.5, face = "bold")) +
  geom_hline(aes(yintercept = play_mean_Defense), 
             color = "red", 
             linetype = "dashed") + 
  geom_vline(aes(xintercept = play_mean_Offense), 
             color = "red", 
             linetype = "dashed") + 
  geom_hline(aes(yintercept = dm_Defense), 
             color = "dodgerblue", 
             linetype = "dashed") + 
  geom_vline(aes(xintercept = dm_Offense), 
             color = "dodgerblue", 
             linetype = "dashed") + 
  scale_y_reverse(lim = c(0.75, -0.75)) +
  xlim(-0.75, 0.75) +
  coord_fixed() +
  labs(
    title = "Median Game Performance by Team",
    subtitle = "2019 season - each point is a game - cross indicates 
    depth median of the bivariate distribution",
    x = "Offensive EPA/game",
    y = "Defensive EPA/game",
    caption = "dashed lines are season-level means per play (red) & 
    season-level means per game (blue)\n
    CC-BY-SA @GrumpyNFLStats - data by @nflfastR"
  )

for( i in 1:6 ) {
  p <- bagplots + facet_wrap_paginate(
                      ~team,
                      ncol = 3,
                      nrow = 2,
                      page =  i)

  print(p)
  }

Looking real good so far.

It’s evident from the bag plots that some teams are well-represented by the more classical season level mean(epa) but some are absolutely not. Check out the difference between EPA/play and the depth median for BAL, CIN, NE, NO, PIT, SF, and WAS. Clearly, EPA/play is not telling us the whole story for those teams.

One cool thing you can do with bagplots is to compare 2 teams, much like you might do in one dimension with density plots. Here’s how different the Super Bowl champion Chiefs were from my beloved Cowboys (hint: not very):

It’s also evident that some teams marbles, er, games, are not as scattered as others. Most teams have no games that could be counted as outliers, while ATL, CHI, DET, IND, JAX and NYG have two. And the hulls are not all the same size, indicating that for some teams, the game-level means are much more close together than for others. This points to a metric of consistency, e.g. how close was each teams game-to-game performance to its overall median.

If we calculate this distance on the 2-dimensional plane using some good-ole Euclid, we can show how much each game deviated from the eventual median for each team:

# Some more ggplot-ting :) Nothing too fancy happening here
# We mutate the all_mean_epas dataframe to calculate the linear distance
#  between each point on the plane and the depth median                      [1]
# Then we join the liveries dataframe as before to get the colors we need    [2]
# Order the teams by ascending mean deviation                                [3]
# Plot first the points, then the text labels for the min and max deviations [4]

all_mean_epas %>% 
  unnest(data) %>% 
  mutate(game_dev = sqrt(
      (game_mean_Defense - dm_Defense)^2 + 
      (game_mean_Offense - dm_Offense)^2)                                   #[1]
  ) %>% 
  left_join(liveries, by = c("team" = "team_abbr")) %>%                     #[2]
  ggplot(aes(x = fct_reorder(team, game_dev, mean, .desc = T),              #[3]
             y = game_dev, 
             group = team_color)) + 
  geom_point(aes(color = team_color), 
             alpha = 0.4) + 
  scale_y_reverse(limits = c(2.3, -1)) + 
  scale_color_identity() + 
  geom_text_repel(data = . %>% group_by(team) %>% 
                    slice(which.max(game_dev)),                             #[4] 
                  aes(label = game_id), 
                  box.padding = 1, 
                  point.padding = 0.6,
                  size = 3,
                  direction = "y",
                  nudge_y = -0.5,
                  color = "red",
                  hjust = 0,
                  angle = 90) +
  geom_text_repel(data = . %>% group_by(team) %>% 
                    slice(which.min(game_dev)),                             #[4]
                  aes(label = game_id),
                  point.padding = 0.6,
                  direction = "y", 
                  nudge_y = 0.5, 
                  angle = 90, 
                  color = "dodgerblue",
                  hjust = 1,
                  size = 3) + 
  theme_bw() +
  theme(plot.title = element_text(size = 12, hjust = 0.5, face = "bold")) +
  labs(
    title = "Deviation from the depth median per game, by team",
    subtitle = "2019 Season - most/least representative games indicated",
    x = "Team (in ascending order of consistency)",
    y = "Deviation from the depth median",
    caption = "CC-BY-SA @GrumpyNFLStats - data by @nflfastR"
  )

For all their underperformance, it looks like IND, BUF, and PIT were very consistent teams. While BAL, NE, SF and NO were far more inconsistent - more prone to wildly over/underperforming their observed medians - than the rest of the league.

Building the bivariate EPA plot

So now we’re close to the end. We’ve found the bivariate medians for each teams’ mean(epa), we’ve investigated what those medians mean, and come up with a way to quantify the consistency of each team’s performance.

Now it’s time to build the Offensive EPA vs. Defensive EPA as before, but better. YMMV on the ‘better’ part

Unfortunately we can’t really pile the bag plots on each other and call it a day. I tried. It’s cool for 2-3 teams but is an undecipherable mess when you pile 64 polygons on each other, no matter how you manipulate the axes.

So let’s try something else.

We have the coordinates of the depth median from the all_mean_epas data frame. We also have the deviation scores, and a neat thing we get from the deviation scores is that we can use those as indicators of uncertainty around the depth median.

So let’s use the depth median as points, and circles with radius equal to the sqrt(deviation) as our indicators of week-to-week variation.

# One more ggplot :) 
# We start off with the same pipeline that we used to build the previous
#  plot, calculating deviation scores etc.                                   [1]
# Then we calculate the team-level mean of those deviation scores            [2]  
# Build the ggplot(), first plotting the
# Circles on the depth medians with radius equal to sqrt(game_dev)           [3]
# Then place the team icons on top of the depth medians.                     [4]


all_mean_epas %>% 
    unnest(data) %>% 
    mutate(game_dev = sqrt(
        (game_mean_Defense - dm_Defense)^2 + 
        (game_mean_Offense - dm_Offense)^2)
    ) %>% 
    left_join(liveries, by = c("team" = "team_abbr")) %>%                   #[1]
  group_by(team, dm_Offense, dm_Defense, 
           team_logo_espn, team_color, 
           team_color2) %>% 
  summarize(mean_game_dev = mean(game_dev)) %>%                             #[2]
    ggplot() +
    geom_circle(aes(x0 = dm_Offense,                                        #[3]
                    y0 = dm_Defense, 
                    r = mean_game_dev^2, 
                    color = "grey90", 
                    fill = team_color), 
                alpha = 0.1) +
    geom_image(aes(x = dm_Offense,                                          #[4]
                   y = dm_Defense, 
                   image = team_logo_espn)) + 
    scale_y_reverse() + 
    theme_bw() + 
    theme(plot.title = element_text(size = 12, hjust = 0.5, face = "bold")) + 
    geom_hline(aes(yintercept = mean(dm_Defense, na.rm = T)), 
               color = "red", 
               linetype = "dashed") + 
    geom_vline(aes(xintercept = mean(dm_Offense, na.rm = T)), 
               color = "red", 
               linetype = "dashed") + 
    labs(
        title = "Offensive EPA/game vs Defensive EPA/game, 2019 season",
        subtitle = "Win probability between 5% and 95%, including post-season",
        x = "Offensive EPA/game",
        y = "Defensive EPA/game",
        caption = "Circles indicate root mean deviation across all games\n
        CC-BY-SA @GrumpyNFLStats - data by @nflfastR"
    ) + scale_fill_identity(guide = F) + scale_color_identity(guide = F) +
    coord_fixed() ## necessary for team logos AND circles *2nd table_flip*

The thing I notice most about this plot is how much clearer the “any given Sunday” effect is on this plot than in the original. If two teams’ logos are inside each others’ bubbles, it’s likely they are closely matched - moreso than the difference in their graph positions might suggest.

It also suggests - and I apologize to the Kansas City fans reading this - that NE, SF, NO, and BAL wuz robbed. Week-to-week, they were better teams than KC. The big advantage KC had was consistency: maybe this is what paid off for them in the end.

Afterword

We did what we set out to do, mostly. We used a more resonable way to summarize EPA and a more rigorous way to find the central tendency of team’s game-time performance (i.e. the (Offensive_EPA, Defensive_EPA duple)). And we came up with a variant of the RBSDM.com-type plot that contains our work and supports a different, hopefully better, take on the data.

Clearly, the work is not perfect. I would have liked to add things like controls for common opponents, leverage weighting, and things like that. But we’ll leave that for another time.

Take care, y’all.

@Grumpy

Experiments in bivariate EPA plotting

Mario Martinez (@grumpyNFLstats)

6/15/2020

Experiments in bivariate EPA plotting

Introduction

Calculating ‘mean EPA’

Reproducing the original graph

What could be improved about this graph?

Finding a better season-level mean(epa)

Calculating the various game-level EPAs and the bivariate (Tukey) median

Building the bivariate EPA plot

Afterword