Staggered Difference-in-Differences Design using R

Mark Bounthavong

28 July 2024

Introduction

In health policy analysis, evaluating the impact of an evidence-based program (EBP) in the real world is complicated due to variations in the timing of implementation. For instance, the United States (US) Department of Veterans Affairs (VA) implemented academic detailing, a peer-to-peer educational outreach delivered by trained clinicians to other clinicians, at different times across sites between 2010 and 2016. Some VA sites implemented academic detailing early, and some VA sites implemented academic detailing later. Adding to the issue is the potential for the treatment effect of the EBP to vary across time. For instance, VA sites that implemented academic detailing early may have a larger impact on their outcomes compared to VA sites that implemented academic detailing later. These issues introduce challenges to health policy analysts to evaluate an EBP due to the difference in implementation time and time-varying treatment effects.

We can think of this kind of situation as a “staggered” implementation of an EBP across time. Assuming that certain assumptions hold, we can take advantage of this situation to apply a difference-in-differences (DID) framework to estimate the average treatment effect of the treated (ATT). This “staggered DID” approach can provide us with an ATT estimate of the impact of the EBP across time where implementation time varies along with treatment effect.

There are several methods to perform a staggered DID, but this article will focus on the Callaway & Sant’Anna approach.[1]

Callaway & Sant’Anna staggered DID

The framework for the Callaway & Sant’Anna staggered DID is nicely presented in their paper.[1] The basic premise is that we can estimate the group-time average treatment effect by looking at the combinations of group \((g)\) and time \((t)\) across the time period. The group \((g)\) denotes the timing of the EBP implementation and time \((t)\) denotes the time period. For instance, the ATT at time = 4 \((t = 4)\) for an observation that implemented the EBP at time period = 2 \((g = 2)\), the \(ATT_{(g, t)}\) is denoted as \(ATT_{(g = 2, t = 4)}\)

Therefore, we can denote the ATT for observations for a particular group \((g)\) at a particular point in time \((t)\) as:

\[\begin{align*} ATT_{(g, t)} = E[{Y_t}(g) - Y_{t}(0) | G_{g} = 1], \end{align*}\]

where \({Y_t}(g)\) denotes the outcome for group \(g\), \(Y_{t}(0)\) denotes the outcome for the control, and \(G_{g}\) denotes a binary variable to indicate if the observation first implemented the EBP at time period \(g\) or \(G_{i,g} = 1 \lbrace G_{i} = g \rbrace\).

There are two important conditions about DID estimation:

Parallel trends: If there was no treatment, the trends between the two groups are the same.
No anticipation effect: In the periods before the adoption of the EBP, the average differences are zero between the two groups.

Assuming that these conditions apply, we can apply the staggered DID method proposed by Callaway & Sant’Anna (“cs staggered DID”).

The last step in the cs staggered DID approach requires an aggregation of the various group x time combinations. Once the \(ATT_{(g, t)}\) is estimated for each \(g\) and \(t\) combination, we will need to aggregate these into a single weighted \(ATT_{(g, t)}\) estimate. This is weighted by the group (when the implementation occurred) and time, \(w(g, t)\).

\[\begin{align*} ATT_{aggregate} = \sum_{g \in G} \sum_{t = 2}^{\tau} \omega(g, t)* ATT_{(g, t)} \end{align*}\]

(Note: This is a simplification of the Callaway & Sant’Anna staggered DID framework. It is highly recommended that interested readers review their paper.)

Motivating example

Let’s generate a simulated data that is in the long format and appropriate for a staggered DID design.

Make sure to load the did and panelView packages. The did package will be the main package to create our simulated data and to perform the staggered DID estimations. The panelView package will be used to visualize the variation in treatment implementation or adoption across time.

# Load libraries
library("did")
library("panelView")

Data generating process (DGP)

We will create a simulated data with 6 periods after the implementation of the hypothetical EBP with 1000 observations (units) setting the ipw option to TRUE and the reg option to TRUE. The ipw option allows for the data generating process to use inverse probability weights and the reg options allows the data generating process to use regressions.

I slightly modified the code provided by Callaway on his website to generate this data. We want to drop the observations where the EBP implementation occurred at first time period because these do not contribute to ATT estimation, \(ATT_{(g, t)}\).

# Set seed
set.seed(12345)

# Data generating process with 6 time periods after adoption and 1000 observations
dgp <- reset.sim(
                time.periods <- 6,
                n = 1000,
                ipw = TRUE,
                reg = TRUE
                )
dgp$te <- 0

# Add dynamic effects
dgp$te.e <- 1:time.periods

# Drop observations where the implementation of the EBP was on period 1. According to Callaway & Sant'Anna, these observations do not help in estimating the ATT(g, t). 
data1 <- build_sim_dataset(dgp)

# Generate the indicator for adoption across time. If the observation adopts the EBP, they will be coded as 1 at the time of adoption and 1 for all periods after. 
data1$long[data1$period < data1$G] = 0
data1$long[data1$period >= data1$G] = 1

# How many observations remained after dropping the observations that had adopted the EBP at time period = 1
nrow(data1)

## [1] 5124

head(data1)

## # A tibble: 6 × 8
##       G     X    id cluster period     Y treat  long
##   <dbl> <dbl> <int>   <int>  <dbl> <dbl> <dbl> <dbl>
## 1     2 0.709     1      21      1  1.08     1     0
## 2     2 0.709     1      21      2  4.42     1     1
## 3     2 0.709     1      21      3  6.04     1     1
## 4     2 0.709     1      21      4  8.20     1     1
## 5     2 0.709     1      21      5 14.1      1     1
## 6     2 0.709     1      21      6 13.7      1     1

There are several important variables in the data.

id: unique identifier for the observation (subject-level)
treat: Grouping variable (time-invariant)
long: Time-varying grouping variable
Y: outcome variable as a continuous data type
period: time variable ranging from 1 to 6
G: time when the adoption of the EBP occurred
X: covariate (will control for in the model)
cluster: group cluster (not needed for this tutorial)

Visualize treatment adoption patterns

We can visualize when the EBP was implemented or adopted using the panelView package. There are two ways to do this:

Method 1 - Visualize EBP adoption

### Method 1:
panelview(
          data = data1,
          formula = Y ~ long,            ## Use the formula option
          index = c("id", "period"),
          xlab = "Year",
          ylab = "Unit",
          display.all = FALSE,
          gridOff = TRUE,
          by.timing = TRUE,
          pre.post = FALSE
          )

Method 2 - Visualize EBP adoption

### Method 2:
panelview(
          data = data1,
          Y = "Y",                      ## Define the outcome 
          D = "long",                   ## Define the longitudinal exposure
          index = c("id", "period"),
          xlab = "Year",
          ylab = "Unit",
          display.all = FALSE,
          gridOff = TRUE,
          by.timing = TRUE,
          pre.post = FALSE
          )

Visualize the groups that get treated before and after EBP adoption

We can also visualize the adoption patterns by looking at this with the pre-post adoption periods.

### Using the outcomes
panelview(
          data = data1,
          formula = Y ~ long,            ## Use the formula option
          index = c("id", "period"),
          xlab = "Year",
          ylab = "Unit",
          display.all = FALSE,
          gridOff = TRUE,
          by.timing = TRUE,
          pre.post = TRUE,
          type = "outcome",
          by.cohort = TRUE,
          color = c("gray", "blue", "red")
)

## Specified colors in the order of "treated (pre)", "treated (post)", "control".
## Number of unique treatment histories: 6

Estimate group-time ATT

We will estimate the group-time average treatment effect of the treated (ATT) using the att_gt function. This output includes the group x time combinations. For instance, the ATT for the group when EBP was implemented at time period = 4 and at time period = 5, is 2.0280 (95% CI: 1.4894, 2.5667).

Not only do we get the various combinations between group x time effects, we also have the Wald test for pre-test of parallel trends, which is p = 0.88028 suggesting that the parallel trends assumption holds.

# Estimate the group-time ATT of implementing the EBP controlling for the X covariate
att_grouptime <- att_gt(yname = "Y",
                        tname = "period",
                        idname = "id",
                        gname = "G",
                        xformla = ~ X,
                        data = data1
)

# Summarize the results
summary(att_grouptime)

## 
## Call:
## att_gt(yname = "Y", tname = "period", idname = "id", gname = "G", 
##     xformla = ~X, data = data1)
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
## 
## Group-Time Average Treatment Effects:
##  Group Time ATT(g,t) Std. Error [95% Simult.  Conf. Band]  
##      2    2   1.0889     0.1611        0.5951      1.5828 *
##      2    3   1.7708     0.1686        1.2539      2.2878 *
##      2    4   2.8899     0.1668        2.3785      3.4013 *
##      2    5   3.8359     0.1719        3.3089      4.3629 *
##      2    6   5.0540     0.1749        4.5176      5.5903 *
##      3    2   0.1012     0.1696       -0.4188      0.6212  
##      3    3   0.8082     0.1754        0.2703      1.3461 *
##      3    4   1.8602     0.1730        1.3296      2.3908 *
##      3    5   2.8019     0.1575        2.3188      3.2849 *
##      3    6   4.0831     0.1675        3.5695      4.5968 *
##      4    2  -0.0436     0.1562       -0.5226      0.4354  
##      4    3  -0.0378     0.1800       -0.5898      0.5141  
##      4    4   0.9011     0.1650        0.3951      1.4071 *
##      4    5   2.0280     0.1756        1.4894      2.5667 *
##      4    6   3.1085     0.1868        2.5356      3.6813 *
##      5    2   0.1178     0.1693       -0.4015      0.6371  
##      5    3  -0.2093     0.1849       -0.7764      0.3578  
##      5    4  -0.0799     0.1874       -0.6544      0.4946  
##      5    5   1.3767     0.1717        0.8502      1.9032 *
##      5    6   2.4207     0.1894        1.8399      3.0016 *
##      6    2   0.0663     0.1635       -0.4351      0.5678  
##      6    3  -0.2224     0.1992       -0.8333      0.3885  
##      6    4  -0.0099     0.1765       -0.5512      0.5313  
##      6    5   0.2411     0.1913       -0.3456      0.8278  
##      6    6   0.9987     0.1750        0.4620      1.5354 *
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## P-value for pre-test of parallel trends assumption:  0.88028
## Control Group:  Never Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust

We can plot the results from the group-time average treatment effect. The ggdid function plots each of the groups based on when the EBP was implemented. In this example, there are 5 time points when the EBP could be implemented (time period = 2, 3, 4, 5, and 6). Recall that we dropped EBP implementation at time period 1 since those observations do not contributed to the ATT.

# plot the results
ggdid(att_grouptime, ylim = c(-1, 6)) # Standardize the y-axis range

Callaway & Sant’Anna DID estimator

We can visualize the ATTs across time for the different groups when the EBP is implemented at various time periods. However, we want to aggregate all of these various subexperiments into a single weighted ATT. To do that, we will need to use the aggte function. In total, we have 5 such groups when the EBP was implemented (time periods 2, 3, 4, 5, and 6). Moreover, the aggte function will weight each of these by group (when the implementation occurred) and time, \(w(g, t)\). By default, the att_gt function uses the double robust estimation method. (Note: The standard errors from the cs staggered DID method are estimated using bootstrapping methods.)

### Estimate the group-time ATT of implementing the EBP controlling for the X covariate
att_grouptime <- att_gt(yname = "Y",
                        tname = "period",
                        idname = "id",
                        gname = "G",
                        xformla = ~ X,
                        data = data1
                        )

### Aggregate the ATT for the different groups with various implementation time periods. 
att_aggregate <- aggte(att_grouptime, type = "dynamic")
summary(att_aggregate)

## 
## Call:
## aggte(MP = att_grouptime, type = "dynamic")
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
## 
## 
## Overall summary of ATT's based on event-study/dynamic aggregation:  
##     ATT    Std. Error     [ 95%  Conf. Int.]  
##  2.9992        0.0875     2.8277      3.1706 *
## 
## 
## Dynamic Effects:
##  Event time Estimate Std. Error [95% Simult.  Conf. Band]  
##          -4   0.0663     0.1789       -0.4241      0.5567  
##          -3  -0.0523     0.1045       -0.3386      0.2341  
##          -2  -0.0883     0.0784       -0.3031      0.1264  
##          -1   0.0557     0.0680       -0.1306      0.2420  
##           0   1.0436     0.0603        0.8783      1.2090 *
##           1   2.0189     0.0760        1.8105      2.2273 *
##           2   2.9352     0.0964        2.6710      3.1994 *
##           3   3.9441     0.1184        3.6196      4.2686 *
##           4   5.0540     0.1665        4.5977      5.5102 *
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## Control Group:  Never Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust

Based on the cs staggered DID, the overall ATT after aggregation is 2.9992 (95% CI: 2.8277, 3.1706). In other words, the implementation of the EBP was significantly associated with an increase of 2.9992 units over the entire time period.

We can plot the result of the cs staggered DID using the ggdid function. The x-axis denotes the length of the exposure or the time since implementation of the EBP. At time period = 0, this is when the observation implemented the EBP. At time period = -1, this is the period before the implementation of the EBP.

Visually, the parallel trends seems to hold based on the point estimates in the pre-implementation period is at or near zero and not statistically significant. Overall, the ATT in the periods after implementation of EBP are positive and statistically significant.

## Plot the cs staggered DID results
ggdid(att_aggregate)

Note: The comparison group is “not ever treated.” However, one can also use “not yet treated” as the control group, which will include both the “not ever treated” and “not yet treated” control groups. This can be done using the control_group option.

### Estimate the group-time ATT of implementing the EBP controlling for the X covariate
att_grouptime_notyet <- att_gt(yname = "Y",
                        tname = "period",
                        idname = "id",
                        gname = "G",
                        xformla = ~ X,
                        control_group = "notyettreated",
                        data = data1
                        )

### Aggregate the ATT for the different groups with various implementation time periods. 
att_aggregate_notyet <- aggte(att_grouptime_notyet, type = "dynamic")
summary(att_aggregate_notyet)

## 
## Call:
## aggte(MP = att_grouptime_notyet, type = "dynamic")
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
## 
## 
## Overall summary of ATT's based on event-study/dynamic aggregation:  
##     ATT    Std. Error     [ 95%  Conf. Int.]  
##  3.0041        0.0788     2.8497      3.1585 *
## 
## 
## Dynamic Effects:
##  Event time Estimate Std. Error [95% Simult.  Conf. Band]  
##          -4  -0.0402     0.1212       -0.3697      0.2892  
##          -3   0.0131     0.0899       -0.2311      0.2573  
##          -2  -0.0698     0.0878       -0.3085      0.1688  
##          -1   0.0793     0.0744       -0.1228      0.2813  
##           0   1.0302     0.0674        0.8469      1.2134 *
##           1   2.0456     0.0758        1.8396      2.2515 *
##           2   2.9693     0.0899        2.7251      3.2135 *
##           3   3.9216     0.1090        3.6253      4.2179 *
##           4   5.0540     0.1834        4.5557      5.5523 *
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## Control Group:  Not Yet Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust

ggdid(att_aggregate_notyet)

The differences in ATT between the two types of controls are not very different (see Table).

Table: Comparision between “not ever treated” and “not yet treated”
Not ever treated	Not yet treated
2.999	3.004

Note: It is recommended to use the “Not ever treated” as the default.

Conclusions

This is a simple overview of the Callaway & Sant’Anna staggered DID approach. It serves as a very useful framework when dealing with health policy research where the implementation of EBP varies across the time period along with variations in the treatment effect. Future articles on this subject will be forthcoming as I learn more about these various approaches to staggered difference-in-differences estimations.

References

An excellent paper by Callaway & Sant’Anna on staggered difference-in-differences framework

Callaway B, Sant’Anna PHC. Difference-in-differences with multiple time periods. J Econom. 2021;225(2):200-230.

Wing and colleagues provide an excellent background on the issues and approaches to evaluating a staggered DID design.[2]

Wing C, Yozwiak M, Hollingsworth A, Freedman S, Simon K. Designing difference-in-difference studies with staggered treatment adoption: Key concepts and practical guidelines. Annu Rev Public Health. 2024;45:485-505.

The Stata version of the Callaway & Sant’Anna staggered DID approach was presented at the Stata 2021 conference. The slides for their presentation is avaialble here.

Websites

Callaway has a website that provides vignettes on how to use the did package. Callaway also has GitHub site where one can report any bugs and learn about updates to the did package. This is where I learned to use the did package to perform the Callaway & Sant’Anna staggered DID approach.

Another website that I used to help write this article was by the Tilburg Science Hub. They wrote a great article using the Callaway & Sant’Anna staggered DID using an example. Their article can be viewed here.

A video by Sant’Anna on the staggered DID framework was a great resource and available here.

Disclaimers and Disclosures

This is for educational purposes only.

This is a work in progress and may be updated in the future.