In the National Football League (NFL), teams are given four chances, or downs, to move the ball ten yards towards their opponent’s goal. Recently, teams’ strategy on fourth down has come under increasing scrutiny. The traditional options are to:
Punt the ball. This gives the ball to the other team, but puts them in a position that should be further away from the goal your team is defending.
Kick a field goal. Provided the team is within the placekicker’s range, it could try to kick a field goal and score three points.
Attempt to convert the fourth down. This has the reward of keeping possession of the ball with the risk of giving it to the other team at the current spot on the field.
The traditional strategy is to punt the ball away unless within field goal range. Conversion attempts typically occur only when a team is either very close to the opponent’s goal or desperate to maintain possession. Analytics-driven thinking disagrees with this strategy and claims that in many situations, attempting the conversion is superior to kicking or punting. This analysis looks at fourth down plays in the NFL in 2020 to get an idea of the current state of the league with regard to fourth down. It looks at overall trends in the league and also examines what individual teams are doing.
The data comes from the nflfastR R package. The team that maintains the package also maintains play-by-play data for NFL games dating back to 1999. More information about the package can be found at https://www.nflfastr.com/.
When parsing the data for a season, nflfastR creates a dataframe. The data in each row relates to an individual play from the season. There are 340 columns that contain a variety of information. For the purposes of this analysis, I am only looking at plays that occurred on fourth down in the 2020 regular season. I am also limiting the columns to the following:
| Column Name | Description |
|---|---|
| posteam | The team that has possession of the ball |
| ydstogo | The number of yards needed to get a first down |
| yardline_100 | The number of yards away from the opponent’s goal |
| fixed_drive | A one-up counter of the drives (possessions) in the game |
| fixed_drive_result | The result of the drive |
| fourth_down_converted | A binary (1 or 0) to indicate if the team succeeded in converting the fourth down |
| play_type | A categorical variable indicating the type of play |
| special_teams_play | A binary (1 or 0) to indicate if the play was a special teams play |
| game_seconds_remaining | The number (0-3600) of seconds left in the game |
| qtr | The quarter of the game the play occurred in |
| score_differential | The difference between the possessing team’s score and its opponents’ score |
library(ggplot2)
library(ggrepel)
library(ggimage)
library(nflfastR)
library(scales)
library(dplyr)
library(cowplot)
data <- readRDS(
url('https://raw.githubusercontent.com/guga31bb/nflfastR-data/master/data/play_by_play_2020.rds')
)
fourth_down <- data %>%
filter(down==4 & season_type=="REG") %>%
select(posteam, ydstogo,
yardline_100, fixed_drive_result,
fixed_drive, fourth_down_converted, play_type,
season, special_teams_play,
game_seconds_remaining, qtr, score_differential) %>%
data.frame()
The maintainers of this package are pretty good at returning clean and reasonable data. Of the 3640 fourth down plays in the dataset, only 14 (~0.4%) have NA values. Given their scarcity, they can easily be removed from the analysis. The maintainers are also good at returning the data as appropriate types. Only a few character types need to be converted to factors.
dim(fourth_down)
## [1] 3640 12
# Keep only rows with no missing values in any column
fourth_down <- fourth_down[complete.cases(fourth_down), ]
dim(fourth_down)
## [1] 3626 12
str(fourth_down)
## 'data.frame': 3626 obs. of 12 variables:
## $ posteam : chr "SF" "ARI" "ARI" "SF" ...
## $ ydstogo : num 3 10 7 5 3 9 1 5 9 2 ...
## $ yardline_100 : num 34 65 72 64 68 77 1 34 36 6 ...
## $ fixed_drive_result : chr "Field goal" "Punt" "Punt" "Punt" ...
## $ fixed_drive : num 1 2 4 5 7 8 9 10 11 13 ...
## $ fourth_down_converted : num 0 0 0 0 0 0 0 0 0 0 ...
## $ play_type : chr "field_goal" "punt" "punt" "punt" ...
## $ season : int 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
## $ special_teams_play : num 1 1 1 1 1 1 0 1 1 1 ...
## $ game_seconds_remaining: num 3416 3317 3219 3134 3022 ...
## $ qtr : num 1 1 1 1 1 1 2 2 2 2 ...
## $ score_differential : num 0 -3 -10 10 3 -3 3 -3 3 3 ...
fourth_down$posteam <- as.factor(fourth_down$posteam)
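The missing-value check above can be sketched on a small made-up data frame. The `toy` object and its values are hypothetical and only illustrate the counting logic; this is not the real nflfastR data.

```r
# Toy stand-in for the fourth-down data frame (values are made up).
toy <- data.frame(
  posteam      = c("SF", "ARI", NA, "SF"),
  ydstogo      = c(3, 10, 7, NA),
  yardline_100 = c(34, 65, 72, 64)
)

# Number of rows with at least one missing value.
n_incomplete <- sum(!complete.cases(toy))

# Share of rows affected, as a percentage.
pct_incomplete <- 100 * n_incomplete / nrow(toy)

# NA counts per column show where the gaps are.
na_by_column <- colSums(is.na(toy))

# Keep only the complete rows.
toy_clean <- toy[complete.cases(toy), ]
```

Comparing `nrow(toy)` with `nrow(toy_clean)` is a quick way to confirm how many rows were dropped.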
Looking at the summary data, we can also see that the values make sense in the context of football.
summary(fourth_down)
## posteam ydstogo yardline_100 fixed_drive_result
## WAS : 137 Min. : 1.000 Min. : 1.00 Length:3626
## CIN : 135 1st Qu.: 3.000 1st Qu.:28.00 Class :character
## PHI : 133 Median : 6.000 Median :51.00 Mode :character
## CHI : 130 Mean : 7.466 Mean :48.36
## NYG : 127 3rd Qu.:10.000 3rd Qu.:69.00
## NYJ : 127 Max. :42.000 Max. :99.00
## (Other):2837
## fixed_drive fourth_down_converted play_type season
## Min. : 1.00 Min. :0.00000 Length:3626 Min. :2020
## 1st Qu.: 6.00 1st Qu.:0.00000 Class :character 1st Qu.:2020
## Median :11.00 Median :0.00000 Mode :character Median :2020
## Mean :11.68 Mean :0.09983 Mean :2020
## 3rd Qu.:17.00 3rd Qu.:0.00000 3rd Qu.:2020
## Max. :30.00 Max. :1.00000 Max. :2020
##
## special_teams_play game_seconds_remaining qtr score_differential
## Min. :0.0000 Min. : 1 Min. :1.000 Min. :-45.000
## 1st Qu.:1.0000 1st Qu.: 755 1st Qu.:2.000 1st Qu.: -7.000
## Median :1.0000 Median :1684 Median :3.000 Median : 0.000
## Mean :0.8003 Mean :1680 Mean :2.596 Mean : -1.039
## 3rd Qu.:1.0000 3rd Qu.:2594 3rd Qu.:4.000 3rd Qu.: 5.000
## Max. :1.0000 Max. :3584 Max. :5.000 Max. : 45.000
##
All of the teams are present, the yards and yardlines make sense, the time values are reasonable, and everything else is well within reasonable bounds for what would be expected for football. This is a good dataset!
One of the first things to look at is how many yards there are to go when teams run their fourth down plays. That is, if they wanted to try to convert and keep possession of the ball, how many more yards would the team need to gain?
From the histogram of all fourth down plays, we can see that the most common situation is a fourth down with only a yard to go. We can also see that there is a sharp drop off between 4th & 10 and plays with more than ten yards to go. That makes some sense, as every set of downs starts off at 1st & 10. The only way the number of yards would increase is via penalty or negative plays.
ydstogo_tot <- fourth_down %>%
select(ydstogo) %>%
group_by(ydstogo) %>%
summarise(n=length(ydstogo), .groups='keep') %>%
data.frame()
# by=1 belongs to seq(), not max()
xbreaks <- seq(1, max(fourth_down$ydstogo), by=1)
all_hist <- ggplot(fourth_down, aes(x=ydstogo)) +
geom_histogram(bins=max(fourth_down$ydstogo), color="black", fill="steelblue2") +
labs(title="Histogram of Yards to Go on 4th Down, 2020 Regular Season", x="Yards To Go", y="Number of Instances") +
theme_light() +
theme(plot.title = element_text(hjust=.5)) +
scale_x_continuous(breaks=xbreaks, labels=xbreaks) +
geom_label_repel(inherit.aes=FALSE,
data=ydstogo_tot,
aes(x=ydstogo, y=n, label= ifelse(n == max(n), n, "")),
box.padding = 2,
point.padding = 2,
nudge_x=1.5,
size=4,
color="Grey50",
segment.color="black") +
annotate("rect", xmin=40, xmax=43, ymin=-10, ymax=40, alpha=0.2) +
annotate("text", x=38, y=120, label= paste(sum(ydstogo_tot[ydstogo_tot$ydstogo>39, "n"]), "plays at 40 or more\nyards to go"))
conv_attempt <- fourth_down %>%
filter(play_type %in% c("run", "pass")) %>%
select(ydstogo) %>%
data.frame()
conv_ydstogo_tot <- conv_attempt %>%
select(ydstogo) %>%
group_by(ydstogo) %>%
summarise(n=length(ydstogo), .groups='keep') %>%
data.frame()
# by=1 belongs to seq(), not max()
conv_xbreaks <- seq(1, max(conv_attempt$ydstogo), by=1)
conv_hist <- ggplot(conv_attempt, aes(x=ydstogo)) +
geom_histogram(bins=max(conv_attempt$ydstogo), color="black", fill="steelblue2") +
labs(title="Histogram of Yards to Go on 4th Down, Conversion Attempts", x="Yards To Go", y="Number of Instances") +
theme_light() +
theme(plot.title = element_text(hjust=.5)) +
scale_x_continuous(breaks=conv_xbreaks, labels=conv_xbreaks) +
geom_label_repel(inherit.aes=FALSE,
data=conv_ydstogo_tot,
aes(x=ydstogo, y=n, label= ifelse(n == max(n), n, "")),
box.padding = 2,
point.padding = 2,
nudge_x=1,
size=4,
color="Grey50",
segment.color="black")
plot_grid(all_hist, conv_hist, nrow=2)
Whereas the number of yards to go on all fourth down plays was a spike at one yard, a fairly flat level up to ten yards, and then a gradual decline, when teams decide to convert, it is almost always with one yard to go. Occasionally it will also be with two yards to go, but after that teams are very unlikely to try to convert the fourth down. What is also interesting is that almost 60% of all 4th & 1 plays are attempts to convert. Teams actually are being aggressive in situations where they are likely to convert (one yard is usually easy to gain on a play) instead of choosing to act more conservatively and kick/punt the ball.
There are three options for a team on fourth down: a field goal attempt, a punt, or an attempt to convert. These options produce four result types: a punt, a field goal, a conversion of downs, or a failure. Below are pie charts, broken out by team, showing what the results of fourth down play selections are.
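The share of plays at each distance that are conversion attempts can be computed directly. The sketch below uses a small made-up data frame (`toy` is hypothetical) and base R so that it runs without extra packages. As in the code above, it treats run and pass plays as conversion attempts.

```r
# Made-up fourth down plays: yards to go and the play that was called.
toy <- data.frame(
  ydstogo   = c(1, 1, 1, 1, 1, 2, 2, 2, 5, 10),
  play_type = c("run", "pass", "run", "punt", "field_goal",
                "pass", "punt", "punt", "punt", "punt")
)

# Flag conversion attempts the same way the analysis does (run or pass).
toy$attempt <- toy$play_type %in% c("run", "pass")

# Mean of a logical vector is the proportion of TRUE values,
# so this gives the attempt rate for each yards-to-go value.
attempt_rate <- tapply(toy$attempt, toy$ydstogo, mean)

# Express as percentages for reading.
attempt_pct <- round(100 * attempt_rate, 1)
```

Applied to the real data, the same calculation would give the figure quoted above for 4th & 1.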
play_selection <- fourth_down %>%
filter(play_type != "no_play") %>%
select(posteam, play_type, fourth_down_converted, fixed_drive_result) %>%
mutate(my_play_type = ifelse(
fourth_down_converted==1, "Convert", ifelse(
fixed_drive_result=="Field goal", fixed_drive_result, ifelse(
fixed_drive_result=="Punt", fixed_drive_result, "Fail"
)
)
)
) %>%
group_by(posteam, my_play_type) %>%
summarize(n=length(fixed_drive_result), .groups='keep') %>%
group_by(posteam)%>%
mutate(percent_of_total=round(100*n/sum(n),1))%>%
ungroup() %>%
data.frame()
afc_teams <- c("BUF", "MIA", "NE", "NYJ", "BAL", "CIN", "CLE", "PIT", "HOU", "IND", "JAX", "TEN", "DEN", "KC", "LV", "LAC")
nfc_teams <- c("DAL", "NYG", "PHI", "WAS", "CHI", "DET", "GB", "MIN", "ATL", "CAR", "NO", "TB", "ARI", "LA", "SF", "SEA")
afc_pies <- ggplot(data=play_selection[play_selection$posteam %in% afc_teams,], aes(x="", y=n, fill=my_play_type)) +
geom_bar(stat="identity", position="fill", width=1) +
coord_polar(theta="y", start=0) +
theme_light()+
labs(fill="Play Type", x=NULL, y=NULL, title="4th Down Play Results by AFC Team") +
theme(plot.title=element_text(hjust=0.5),
axis.text=element_blank(),
axis.ticks = element_blank(),
panel.grid = element_blank()) +
scale_fill_brewer(palette="RdYlBu") +
facet_wrap(~posteam, ncol=4, nrow=4)+
geom_text(aes(x=1.75, label=paste0(percent_of_total, "%")),
size=3,
position=position_fill(vjust=.5))
nfc_pies <- ggplot(data=play_selection[play_selection$posteam %in% nfc_teams,], aes(x="", y=n, fill=my_play_type)) +
geom_bar(stat="identity", position="fill", width=1) +
coord_polar(theta="y", start=0) +
theme_light()+
labs(fill="Play Type", x=NULL, y=NULL,title="4th Down Play Results by NFC Team") +
theme(plot.title=element_text(hjust=0.5),
axis.text=element_blank(),
axis.ticks = element_blank(),
panel.grid = element_blank())+
scale_fill_brewer(palette="RdYlBu") +
facet_wrap(~posteam, ncol=4, nrow=4)+
geom_text(aes(x=1.75, label=paste0(percent_of_total, "%")),
size=3,
position=position_fill(vjust=.5))
pie_grid <- plot_grid(nfc_pies + theme(legend.position = "none"), afc_pies + theme(legend.position = "none"), nrow=1)
legend <- get_legend(afc_pies + theme(legend.box.margin=margin(0,0,0,12)))
plot_grid(pie_grid, legend, rel_widths=c(2,.18))
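The outcome classification behind the pie charts can be checked on a few hand-built rows. This is a minimal sketch: `classify_result` is a hypothetical helper, not part of nflfastR, and the drive result strings in `toy` are illustrative. It applies the same priority as the nested `ifelse` above: a conversion first, then a field goal, then a punt, and everything else counts as a failure.

```r
# Hypothetical helper mirroring the nested ifelse() logic.
classify_result <- function(converted, drive_result) {
  out <- rep("Fail", length(converted))             # default bucket
  out[drive_result == "Punt"] <- "Punt"
  out[drive_result == "Field goal"] <- "Field goal"
  out[converted == 1] <- "Convert"                  # conversion takes priority
  out
}

# Made-up plays for two teams (drive result strings are illustrative).
toy <- data.frame(
  posteam = c("CLE", "CLE", "CLE", "CLE", "JAX", "JAX"),
  fourth_down_converted = c(1, 0, 0, 0, 0, 0),
  fixed_drive_result = c("Touchdown", "Punt", "Field goal",
                         "Turnover on downs", "Punt", "Turnover on downs")
)

toy$result <- classify_result(toy$fourth_down_converted,
                              toy$fixed_drive_result)

# Counts per team and result, then row-wise percentages.
counts <- table(toy$posteam, toy$result)
shares <- round(100 * prop.table(counts, margin = 1), 1)
```

Under this rule, any drive result that is not a punt or a made field goal lands in "Fail" unless the fourth down itself was converted, which is worth keeping in mind when reading the failure percentages.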
We can see from this that teams typically punt about half the time, kick a field goal about a quarter of the time, and attempt to convert, with varying degrees of success, the rest of the time.
Of note, Seattle, San Francisco, NY Jets (NYJ) and Pittsburgh are the only four teams to punt over 60% of the time. San Francisco and the Jets had poor offenses in the 2020 season, so it isn’t surprising that they punted so often. What is surprising is that Pittsburgh and Seattle, two playoff teams with excellent wide receiver corps, punted so often. One would think that a team with great offensive players would not need to punt on so many of their drives, but perhaps that is not the case.
Carolina, Dallas, Minnesota, Philadelphia, and the LA Chargers (LAC) had the highest percentage of conversion attempts, all over 30%. All five of them were teams with a losing record in 2020. Teams that are trailing in games have to become more aggressive on fourth down, so that may explain why they lead the league in their rate of conversion attempts.
Cleveland, a good team that won 11 games en route to the 6th seed in the AFC playoffs, appears to be fairly unlucky. One fifth of their fourth down plays resulted in a failure. That is the second-highest failure rate in the league, behind only Jacksonville, which had the worst record in the league in 2020.
Knowing that on average teams punt on about half of all their fourth down plays, we can take a look at how many punts each team executed in the 2020 season. Punting can be seen as an indicator of the quality of a team’s offense. A good offense should be able to move the ball down the field and get close to the other team’s goal, increasing the chances of a field goal or touchdown. Good offenses should also be able to gain or very nearly gain ten yards in three plays. Bad offenses struggle to do those things, and so they have to punt more often.
Below is a chart showing the number of punts by each team in the regular season, ordered by number of punts and color coded with team colors.
punts <- fourth_down %>%
filter(play_type=="punt") %>%
select(fixed_drive, posteam, season, play_type) %>%
group_by(posteam, season) %>%
summarise(n=length(fixed_drive), .groups='keep') %>%
data.frame()
team_punts <- merge(x=punts, y=teams_colors_logos, by.x="posteam", by.y="team_abbr")
ggplot(team_punts, aes(x=reorder(posteam, n), y=n)) +
geom_bar(stat="identity", aes(fill=posteam)) +
coord_flip() +
labs(title="Punts by Team in 2020 NFL Regular Season",
x="Team", y="Number of Punts",
fill="Team") +
scale_color_manual(values=team_punts$team_color) +
scale_fill_manual(values=setNames(team_punts$team_color, as.character(team_punts$posteam))) +
theme_light() +
theme(plot.title = element_text(hjust=.5)) +
geom_text(data=team_punts, aes(x=posteam, y=n, label=n, fill=NULL),hjust=-0.1, size=4)
The NY Jets (NYJ), the team with the most punts, were one of the worst teams in 2020. Their offense ranked near the bottom of the league. It is no surprise that they had the most punts. The Buffalo Bills were one of the best offensive teams in 2020, so it is no surprise they had the fewest. Most of the order of this chart makes sense, with worse offenses/more conservative coaches toward the top and better offenses/more aggressive coaches toward the bottom.
The biggest surprise is the team with the second most punts, Pittsburgh. They have good to decent players on their offense and so it would be expected that they could move the ball and avoid punts. That was not the case. Maybe the coach of that team doesn’t trust his offense or has a more conservative philosophy. On the other end of the graph are the Las Vegas Raiders (LV). Their head coach is known to eschew the more progressive analytic thinking and use more conservative (i.e., punting) play calling. They had a slightly above average offense in 2020, but to see them have so few punts relative to the rest of the league is interesting. Perhaps their coach has turned over a new leaf.
When teams do decide to “go for it” and attempt to convert the fourth down, the number of yards to gain for the first down and the team’s position on the field play an important role in the team’s decision. Below is a heatmap showing the share of fourth down plays that are conversion attempts, by yards to go and yards from the opponent’s goal. Any play with 15 or more yards to go is grouped together and marked as “15+”.
go_for_it <- fourth_down %>%
filter(play_type == "run" | play_type == "pass") %>%
select(yardline_100, ydstogo, play_type) %>%
mutate(my_ydstogo = ifelse(ydstogo > 15, 15, ydstogo),
my_yardline = cut(yardline_100, breaks=seq(0,100, by=5))
) %>%
group_by(my_ydstogo, my_yardline) %>%
summarize(n=length(play_type), .groups='keep')%>%
data.frame()
all_for_it <- fourth_down %>%
filter(play_type %in% c("run", "pass", "punt", "field_goal")) %>%
select(yardline_100, ydstogo, play_type) %>%
mutate(my_ydstogo = ifelse(ydstogo > 15, 15, ydstogo),
my_yardline = cut(yardline_100, breaks=seq(0,100, by=5))
) %>%
group_by(my_ydstogo, my_yardline) %>%
summarize(all_n=length(play_type), .groups='keep')%>%
data.frame()
total <- merge(go_for_it,all_for_it, by=c("my_ydstogo", "my_yardline"))
total$percent <- total$n/total$all_n
ylabs <- c("0-5", "5-10", "10-15", "15-20", "20-25", "25-30", "30-35", "35-40", "40-45",
"45-50", "50-55", "55-60", "60-65", "65-70", "70-75", "75-80", "80-85", "85-90",
"90-95")
ggplot(total, aes(x=my_ydstogo, y=my_yardline, fill=percent)) +
geom_tile(color="grey") +
geom_text(aes(label=scales::percent(accuracy=0.1, percent)), size=3)+
labs(title="4th Down Conversion Attempts Percentage\nby Goal Distance and Yards to Go\n2020 Regular Season",
x="Yards to go",
y="Yards from Opponent Endzone",
fill="Percentage of Plays\nThat Are Conversion\nAttempts",
caption="Any play of 15 or more yards to go labeled as 15+") +
theme_classic() +
theme(plot.title=element_text(hjust=0.5)) +
scale_x_continuous(labels=append(seq(1,14,by=1), "15+"), breaks=seq(1,15,by=1)) +
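# Derive labels from the bin names, e.g. "(0,5]" becomes "0-5", so they stay aligned even if some bins are empty
scale_y_discrete(labels=function(x) gsub("^\\(([0-9]+),([0-9]+)\\]$", "\\1-\\2", x)) +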
scale_fill_continuous(low="blue", high="red", breaks=seq(0,1, by=.1))
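The binning behind the heatmap can be illustrated with a handful of made-up plays. This is a sketch only (`toy` is hypothetical), and it uses `pmin()` as a shorthand for the `ifelse(ydstogo > 15, 15, ydstogo)` cap used above.

```r
# Made-up field positions and distances.
toy <- data.frame(
  yardline_100 = c(3, 5, 6, 34, 35, 99),
  ydstogo      = c(1, 2, 16, 15, 22, 4)
)

# Cap yards to go at 15, so 15 and anything longer share one column.
toy$my_ydstogo <- pmin(toy$ydstogo, 15)

# Five-yard bins. cut() intervals are right-closed by default,
# so a play exactly 5 yards out falls in "(0,5]", not "(5,10]".
toy$my_yardline <- cut(toy$yardline_100, breaks = seq(0, 100, by = 5))

bin_names <- as.character(toy$my_yardline)
n_bins <- nlevels(toy$my_yardline)
```

Note that `seq(0, 100, by = 5)` defines 20 bins in total, one more than the 19 labels in `ylabs`, so the labels only line up when the last bin is empty.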
As we can see, teams are very likely to go for it inside the five yard line, especially if there are only one or two yards to gain. The risk/reward decision of going for it in this situation is very heavily tilted towards the rewards of going for it. Teams will likely score a touchdown or field goal, and if they fail their opponent is at least 95 yards away from scoring themselves. This is an easy decision for teams to make.
In fact, there are many instances of teams going for it with short yardage to gain. In those situations, especially when they only have one yard to gain, teams very often try to convert. There is a warm red strip on the left side of the heatmap. Up to about 55 yards from the opponent’s goal, teams are over 50% likely to attempt a conversion on 4th & 1.
When teams are far away from their opponent’s goal and/or they have many yards to gain for a first down, they are not as apt to go for it. The risk of not converting and giving the ball to their opponent close to the goal they are defending is too great. As we see, when teams are 75 or more yards away from their opponent’s goal, there are almost no fourth down conversion attempts, indicated by the white space. Likewise, at distances greater than five or so yards to go, teams very rarely attempted a conversion.
If you look at the area between the 30-35 and 50-55 rows (yards from the opponent’s end zone), you’ll see that teams are going for it even if they have more than one yard to gain for a first down. This area of the field is right at the edge of most field goal kickers’ range. It is also too close to the opponent’s goal for a punt to have much of an effect on the opponent’s field position. In areas of the field further away from the opponent’s goal, a punt makes more sense strategically. In areas of the field closer to the opponent’s end zone, a field goal is a much better choice. Given that this is an area of the field where the choice isn’t as straightforward, teams may think this is a good opportunity to be aggressive, and that shows in the chart.
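To see why this part of the field sits at the edge of field goal range, it helps to translate field position into kick length. The sketch below assumes the usual rule of thumb that an attempt is about 17 yards longer than the distance to the goal line: 10 yards for the end zone plus roughly 7 yards for the snap and hold. `kick_distance` is a hypothetical helper, and the 7-yard snap depth is an assumption, not something taken from the data.

```r
# Approximate field goal length from yards to the opponent's goal line.
# Assumes a 10-yard end zone and a snap/hold depth of about 7 yards.
kick_distance <- function(yardline_100, snap_depth = 7, end_zone = 10) {
  yardline_100 + end_zone + snap_depth
}

kick_distance(c(30, 35, 40))  # 47 52 57
```

Under that assumption, a fourth down 35 yards from the goal line corresponds to a 52-yard attempt, which is a long kick for most kickers.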
Fourth down play type selection can also depend on the game situation. That is, teams may make different selections depending on how much time is left in the game and whether or not they are leading, trailing, or tied.
probs <- fourth_down %>%
filter(play_type %in% c("run", "pass", "punt", "field_goal") & qtr != 5) %>%
select(play_type, game_seconds_remaining, score_differential) %>%
mutate(minutes = ceiling(game_seconds_remaining/60),
my_playtype = ifelse(play_type %in% c("run", "pass"), "attempt", play_type),
leading = ifelse(score_differential > 0, "Leading", ifelse(score_differential <0, "Trailing","Tied"))) %>%
group_by(my_playtype, minutes, leading) %>%
summarise(n=length(game_seconds_remaining), .groups='keep') %>%
group_by(minutes, leading)%>%
mutate(percent_of_total=round(100*n/sum(n),1))%>%
ungroup() %>%
data.frame()
minute_levels <-seq(60, 1, by=-1)
minute_breaks <- seq(60, 0, by=-5)
probs$minutes <- factor(probs$minutes, levels=minute_levels)
probs$leading <- factor(probs$leading, levels=c("Leading", "Trailing", "Tied"))
ggplot(probs, aes(x=minutes, y=percent_of_total, group=my_playtype)) +
geom_point(aes(color=my_playtype), size=2) +
geom_line(aes(color=my_playtype), linetype="dotted") +
labs(title = "Probability of Play Type by Time Remaining, 2020 Regular Season",
x="Minutes Remaining",
y="Probability",
fill="Play Type",
caption="Overtime is excluded")+
theme_light() +
theme(plot.title=element_text(hjust=0.5)) +
geom_vline(xintercept=31, linetype="solid", size=1) +
scale_color_brewer(palette="Set2", name="Play Type") +
scale_x_discrete(labels=minute_breaks, breaks=minute_breaks ) +
facet_wrap(~leading, ncol=1)
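The two derived variables used in this chart, the minute bucket and the score situation, can be checked on a few made-up rows. This is a sketch with hypothetical values in `toy`; the last line also shows why `xintercept=31` marks halftime on an axis whose levels run from 60 down to 1.

```r
# Made-up game clock and score states.
toy <- data.frame(
  game_seconds_remaining = c(3600, 3541, 3540, 1800, 61, 1),
  score_differential     = c(0, -3, 7, 0, -10, 3)
)

# Round up to the next whole minute, so 3541 seconds counts as minute 60.
toy$minutes <- ceiling(toy$game_seconds_remaining / 60)

# Score situation from the possessing team's point of view.
toy$leading <- ifelse(toy$score_differential > 0, "Leading",
                      ifelse(toy$score_differential < 0, "Trailing", "Tied"))

# Position of the minute-30 bucket on a discrete axis ordered 60, 59, ..., 1.
halftime_position <- match(30, seq(60, 1, by = -1))
```

Because the axis levels are reversed, minute 30 is the 31st position from the left, which is where the vertical line is drawn.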
In all score situations, at the beginning of a game or at the beginning of the second half, teams are very likely to punt. Even teams that are trailing or tied punt nearly 100% of the time at the beginning of the second half. Perhaps they think there is still enough time to outscore their opponent and do not worry about being aggressive and trying to keep the drive alive.
At almost any given time in a game, the leading team has a 50% or greater probability of punting the ball away. Towards the end of the first half, the probability of kicking a field goal and padding the lead rises to around 25%, and it remains there throughout most of the game. Leading teams almost never elect to attempt a conversion. Perhaps they are confident in their lead and the ability of their defense to hold the lead or their offense to regain it.
For trailing teams, punting is the most favored play type at the beginning of the halves, but it quickly drops down to about a 50%-60% probability. The probability of a field goal hovers around 25% for most of the game until the fourth quarter. With about 25 minutes left in the game, trailing teams start to increase their probability of attempting a fourth down conversion. Once there are about 10 minutes left in the game, trailing teams sharply increase the probability that they will attempt a conversion. Punting and giving the ball to the other team is not productive at this point, and teams need to be aggressive in order to keep drives alive and attempt to close the scoring gap. Interestingly, field goal probability dips as well. Maybe teams are trailing by more than a field goal, or maybe teams want to win outright with a touchdown instead of a field goal.
For tied teams, punting is still the dominant strategy, but field goals are increasingly likely. A three-point score will break a tie, so teams are less likely to attempt a conversion to continue a drive and will take a “safer” field goal attempt. In fact, at the end of games where a field goal will win the game for a team, field goals become the dominant play type called.
A good way to think about fourth down play calls is that, in general, some of the worst teams in the league punt the most often and some of the best teams in the league punt the least. Some of the worst teams also punt on a larger share of their fourth downs than they kick or attempt to convert. However, teams generally punt about half the time they get to fourth down, so it is likely that bad teams simply get to fourth down more often than good teams.
In terms of strategy decisions, in some areas teams appear to be moving towards the more progressive way of thinking. Teams are attempting to convert a fourth down in short yardage situations. They are also attempting to convert in short yardage situations in areas of the field where a punt or field goal kick might not make the most sense. However, in other areas teams are choosing a more conservative strategy. Leading teams almost always punt on fourth down. If they were more aggressive and tried to extend drives, they might extend their leads and increase their chances of winning. Tied teams could also benefit from this strategy. Trailing teams are indeed aggressive in endgame situations, when they are most desperate to score. However, if they were more aggressive earlier in the game, that could lead to more scoring opportunities and perhaps they wouldn’t need to be so aggressive late in games.