The Hidden Game of Football introduced the idea of “Expected Points”. Its authors calculated the expected value of possessions at different places on the field. Our class used those expected values to evaluate football strategies (“Should they go for it?” “Should they kick a field goal?”). In this lab, we’ll calculate our own expected values using data from the 2015 season.

Loading the Data and good old dplyr

n <- read.csv('/home/rstudioshared/shared_files/data/NFLPlaybyPlay2015.csv')
library(dplyr)

Points by Drive

The data we’re using in this lab should be familiar from a lab where we looked at NFL penalties. This data set has one line for every NFL play from the 2015 season. The first line of the following code creates a table, drives_list, with one line for every drive. The next section of code creates a table, scoring_drives, with the number of points scored on every scoring drive (scoring plays have sp==1). The last section of code joins scoring_drives with drives_list to create a table called drives with the number of points scored on every drive of every game; drives without a scoring play get 0 points. Note that touchdowns count as 7 points and safeties count as -2 points scored. Take a look at each table as it’s created to make sure that you understand what this code is doing.

drives_list <- n %>% filter(posteam!="") %>% group_by(GameID, Drive, posteam) %>% summarize(secs_remaining=min(TimeSecs))

scoring_drives <- n %>% filter(sp==1) %>% 
  group_by(GameID, Drive) %>% 
  summarize(TDpoints = sum(7*Touchdown,na.rm=TRUE), 
            FGpoints = sum(ifelse(FieldGoalResult=="Good",3,0),na.rm=TRUE), 
            SAFpoints = sum(-2*Safety,na.rm=TRUE),
            points = TDpoints+FGpoints+SAFpoints
            ) 

drives <- left_join(drives_list, scoring_drives, by=c("GameID", "Drive"))
drives[is.na(drives$points), 5:8] <- 0
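To see what these three steps do without the full data set, here is a minimal sketch on a made-up game. The toy table and its values are hypothetical; only the column names match the real data. It collapses the three point columns into a single points column to stay short, so it is an illustration of the idea rather than the lab's exact code.

```r
library(dplyr)

# Hypothetical toy data: one game, three drives, five plays.
toy <- data.frame(
  GameID = 1,
  Drive = c(1, 1, 2, 3, 3),
  posteam = c("A", "A", "B", "A", "A"),
  TimeSecs = c(3600, 3550, 3400, 3200, 3100),
  sp = c(0, 1, 0, 0, 1),
  Touchdown = c(0, 1, 0, 0, 0),
  FieldGoalResult = c(NA, NA, NA, NA, "Good"),
  Safety = c(0, 0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# One row per drive.
toy_drives_list <- toy %>%
  filter(posteam != "") %>%
  group_by(GameID, Drive, posteam) %>%
  summarize(secs_remaining = min(TimeSecs), .groups = "drop")

# One row per scoring drive, with the points scored on it.
toy_scoring <- toy %>%
  filter(sp == 1) %>%
  group_by(GameID, Drive) %>%
  summarize(points = sum(7 * Touchdown, na.rm = TRUE) +
              sum(ifelse(FieldGoalResult == "Good", 3, 0), na.rm = TRUE) +
              sum(-2 * Safety, na.rm = TRUE),
            .groups = "drop")

# Drives with no scoring play have no match in the join, so they come back as NA.
toy_drives <- left_join(toy_drives_list, toy_scoring, by = c("GameID", "Drive"))
toy_drives$points[is.na(toy_drives$points)] <- 0
print(toy_drives)
```

In this toy game, drive 1 ends in a touchdown (7 points), drive 2 has no scoring play (0 points after the NA is filled in), and drive 3 ends in a field goal (3 points).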

We actually need to know more than just how many points were scored on each drive itself. If a drive pins an opponent deep in their own end and creates a field position advantage (that may well lead to points later on), that should count for something. Meanwhile, if a drive results in great field position for the other team, we should account for that too. The following code, which gets a bit complicated, calculates the “next points scored” for each drive. These points may or may not be scored on the drive itself. If the next points are scored by the opposing team, they are recorded as negative points and will be counted as negative points in our expected value calculation. If no more points are scored in the game, the next points scored are 0. Here, I don’t expect you to understand all of the code (we can discuss the particulars in class), but I do want you to understand what this code is achieving.

drives$nextpoints <- 0

for (g in unique(drives$GameID)){
  tmp <- drives %>% filter(GameID==g)
  for (d in 1:nrow(tmp)){
    i <- d
    while(tmp[i, ]$points==0 & i < nrow(tmp)){i <- i + 1}
    drives[drives$GameID==g & drives$Drive==tmp[d, ]$Drive, ]$nextpoints <- tmp[i, ]$points*(2*(tmp[i, ]$posteam==tmp[d, ]$posteam)-1)
  }
}

Take a look at the drives table paying careful attention to the nextpoints column.
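If the nextpoints column is hard to follow in the full table, the same logic can be traced by hand on a small example. The sketch below applies the loop's rule to a hypothetical five-drive game; the toy table and the helper next_points are made up for illustration and are not part of the lab code.

```r
# Hypothetical toy drives table for a single game.
# Drive 3 is a touchdown by team A; drive 5 is a safety against team A (-2).
toy <- data.frame(
  Drive   = 1:5,
  posteam = c("A", "B", "A", "B", "A"),
  points  = c(0, 0, 7, 0, -2),
  stringsAsFactors = FALSE
)

# Hypothetical helper that mirrors the loop above for one game.
next_points <- function(tmp) {
  out <- numeric(nrow(tmp))
  for (d in seq_len(nrow(tmp))) {
    i <- d
    # Walk forward to the first drive (d or later) with a non-zero score,
    # stopping at the last drive of the game if nobody scores again.
    while (tmp$points[i] == 0 && i < nrow(tmp)) i <- i + 1
    # +1 if the scoring drive belongs to the team that has the ball on drive d,
    # -1 if it belongs to the opponent.
    sign <- ifelse(tmp$posteam[i] == tmp$posteam[d], 1, -1)
    out[d] <- sign * tmp$points[i]
  }
  out
}

toy$nextpoints <- next_points(toy)
print(toy)
```

Drive 2 belongs to team B, but the next score is team A's touchdown, so its nextpoints is -7. Drive 4 also belongs to team B, and the next score is a safety charged to team A, so its nextpoints is +2.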

Joining Drives and Plays

Next, we match each play in the full data set with the drives table we created so that for each play we have the number of next points scored (with a positive sign if the team in possession scored next and a negative sign otherwise).

np <- left_join(n %>% select(GameID, Drive, down, yrdline100, ydstogo, qtr, TimeSecs), drives %>% select(GameID, Drive, points, nextpoints))
## Joining, by = c("GameID", "Drive")

This table will allow us to calculate expected points for each yard line.

Let’s start by looking only at 1st and 10 plays and find the mean value of nextpoints for every position on the field. The following code makes that calculation and graphs the data:

library(ggplot2)

np %>% filter(down==1, ydstogo==10) %>% group_by(yrdline100) %>% summarize(expected_points=mean(nextpoints), num_plays = length(nextpoints)) %>% 
  ggplot(aes(yrdline100, expected_points, size=num_plays)) +geom_point() + geom_smooth() + ggtitle("Expected Points v. Yards from Touchdown")
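Reading exact values off a graph can be imprecise, so one option is to keep the summarized table and look up a yard line directly. The sketch below does this on hypothetical toy data (the rows and values are made up); the same filter could be applied to the summary built from np. It assumes, as the plot title suggests, that yrdline100 measures yards from the end zone the offense is attacking, so a team's own 25-yard line would be yrdline100 == 75.

```r
library(dplyr)

# Hypothetical toy version of np: six 1st-and-10 plays at two yard lines.
toy_np <- data.frame(
  down = 1,
  ydstogo = 10,
  yrdline100 = c(50, 50, 50, 75, 75, 75),
  nextpoints = c(7, 3, -7, 7, 0, -7)
)

# Same summary as above, stored in a table instead of piped into ggplot.
toy_ep <- toy_np %>%
  filter(down == 1, ydstogo == 10) %>%
  group_by(yrdline100) %>%
  summarize(expected_points = mean(nextpoints),
            num_plays = n(),
            .groups = "drop")

# Look up a single yard line.
toy_ep %>% filter(yrdline100 == 50)
```

Yard lines with few plays give noisy averages, which is why the graph sizes each point by num_plays and adds a smoothed curve.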

Q1: What is the value of a first and 10 on the 50-yard line?
Q2: What is the value of a first and 10 on your own 25-yard line?
Q3: What is the value of a first and 10 on your opponent’s 25-yard line?
Q4: What is the value of gaining 10 yards?
Q5: What is the cost of a turnover?

While we’re at it, let’s look at the value of a down. In the code below, we limit ourselves to 1st, 2nd and 3rd downs (instead of just 1st downs) but still use only plays with 10 yards to go for a first down. This time we’ll make three lines – one for each down.

np %>% filter(down<4, ydstogo==10) %>% group_by(down, yrdline100) %>% summarize(expected_points=mean(nextpoints), num_plays = length(nextpoints)) %>% 
  ggplot(aes(yrdline100, expected_points, size=num_plays, color=as.factor(down))) +geom_point() + geom_smooth(method="lm") + ggtitle("Expected Points v. Yards from Touchdown")
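The straight lines drawn by geom_smooth(method="lm") come from an ordinary linear regression, so a value can also be estimated numerically by fitting that regression yourself. The sketch below is a rough illustration on hypothetical summarized values for a single down (the numbers are invented and exactly linear so the result is easy to check). With real data you would fit one model per down on the summarized table.

```r
# Hypothetical summarized values for one down: expected points by yard line.
toy_ep <- data.frame(
  yrdline100 = c(20, 40, 60, 80),
  expected_points = c(4.0, 3.0, 2.0, 1.0)
)

# Fit a straight line through the summarized points.
fit <- lm(expected_points ~ yrdline100, data = toy_ep)

# Estimate expected points at a yard line that is not in the table.
pred <- predict(fit, newdata = data.frame(yrdline100 = 50))
print(unname(pred))
```

Comparing the predictions from two such fits at the same yard line (for example, the 1st-down fit and the 2nd-down fit) gives an estimate of the difference in value between the two downs.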

Q6: Estimate the value of a 3rd and 10 at the 50-yard line.
Q7: What does it cost, in expected points, to throw an incomplete pass on 1st and 10? Does the cost appear to depend on the yard line?
Q8: What does it cost, in expected points, to throw an incomplete pass on 2nd and 10? Does the cost appear to depend on the yard line?

Q9: We would like to know the values of different downs and distances. For example, how many more points is a 3rd and 1 worth than a 3rd and 10? Try to devise and describe a method for placing point values on downs and distances. Are there any complications or problems we might encounter when using your method? We will discuss our proposed methods in class and try them out in a future lab.