Introduction
In health policy analysis, evaluating the real-world impact of an evidence-based program (EBP) is complicated by variation in the timing of implementation. For instance, the United States (US) Department of Veterans Affairs (VA) implemented academic detailing, a peer-to-peer educational outreach delivered by trained clinicians to other clinicians, at different times across sites between 2010 and 2016. Some VA sites implemented academic detailing early, and others implemented it later. Adding to the issue, the treatment effect of the EBP may vary over time. For instance, VA sites that implemented academic detailing early may see a larger impact on their outcomes than VA sites that implemented it later. Differences in implementation timing and time-varying treatment effects make it challenging for health policy analysts to evaluate an EBP.
We can think of this kind of situation as a “staggered” implementation of an EBP across time. Provided that certain assumptions hold, we can take advantage of this situation and apply a difference-in-differences (DID) framework to estimate the average treatment effect on the treated (ATT). This “staggered DID” approach provides an ATT estimate of the impact of the EBP when both the implementation time and the treatment effect vary.
There are several methods to perform a staggered DID, but this article will focus on the Callaway & Sant’Anna approach.[1]
Callaway & Sant’Anna staggered DID
The framework for the Callaway & Sant’Anna staggered DID is nicely presented in their paper.[1] The basic premise is that we can estimate a group-time average treatment effect for each combination of group \((g)\) and time \((t)\). The group \((g)\) denotes the time period when the EBP was first implemented, and time \((t)\) denotes the time period of observation. For instance, the ATT at time period 4 \((t = 4)\) for observations that implemented the EBP at time period 2 \((g = 2)\) is denoted as \(ATT_{(g = 2, t = 4)}\).
Therefore, we can denote the ATT for observations for a particular group \((g)\) at a particular point in time \((t)\) as:
\[\begin{align*} ATT_{(g, t)} = E[{Y_t}(g) - Y_{t}(0) | G_{g} = 1], \end{align*}\]
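where \({Y_t}(g)\) denotes the potential outcome at time \(t\) for observations that first implemented the EBP at time period \(g\), \(Y_{t}(0)\) denotes the untreated potential outcome at time \(t\) (the outcome had the EBP not been implemented), and \(G_{g}\) denotes a binary variable that indicates whether the observation first implemented the EBP at time period \(g\), or \(G_{i,g} = 1 \lbrace G_{i} = g \rbrace\).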
There are two important conditions about DID estimation:
- Parallel trends: In the absence of treatment, the average outcomes of the treated and comparison groups would have followed the same trend.
- No anticipation effect: The EBP has no effect on the outcome in the periods before it is implemented, so observations do not change their behavior in anticipation of adoption.
Assuming that these conditions apply, we can apply the staggered DID method proposed by Callaway & Sant’Anna (“cs staggered DID”).
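To make the definition more concrete, here is a minimal sketch in base R that computes \(ATT_{(g = 2, t = 4)}\) by hand. It assumes the simplest case (no covariates, a never-treated comparison group, and period \(g - 1\) as the baseline), where the ATT is the change in the mean outcome for group \(g\) minus the change for the never-treated group. The data and the helper att_gt_by_hand are made up for illustration and are not part of the did package.

```r
# Toy long-format panel (made-up numbers): 2 units first treated in period 2
# (G = 2) and 2 never-treated units (G = 0), observed over periods 1 to 4.
toy <- data.frame(
  id     = rep(1:4, each = 4),
  period = rep(1:4, times = 4),
  G      = rep(c(2, 2, 0, 0), each = 4),
  Y      = c(1, 3, 4, 6,   # unit 1 (G = 2)
             2, 4, 5, 7,   # unit 2 (G = 2)
             1, 2, 2, 3,   # unit 3 (never treated)
             2, 3, 3, 4)   # unit 4 (never treated)
)

# Hypothetical helper: unconditional ATT(g, t) with a never-treated comparison
# group, using period g - 1 as the baseline period.
att_gt_by_hand <- function(df, g, t) {
  # Change in the mean outcome between the baseline period and period t
  change <- function(d) {
    mean(d$Y[d$period == t]) - mean(d$Y[d$period == g - 1])
  }
  change(df[df$G == g, ]) - change(df[df$G == 0, ])
}

att_g2_t4 <- att_gt_by_hand(toy, g = 2, t = 4)
print(att_g2_t4)
```

In this toy panel, the treated group changes by 5 units between periods 1 and 4 and the never-treated group changes by 2 units, so the hand-computed ATT is 3.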
The last step in the cs staggered DID approach is to aggregate the various group x time combinations. Once \(ATT_{(g, t)}\) is estimated for each combination of \(g\) and \(t\), we combine these estimates into a single weighted ATT estimate. The weights, \(\omega(g, t)\), depend on the group (when the implementation occurred) and time.
\[\begin{align*} ATT_{aggregate} = \sum_{g \in G} \sum_{t = 2}^{\tau} \omega(g, t)* ATT_{(g, t)} \end{align*}\]
(Note: This is a simplification of the Callaway & Sant’Anna staggered DID framework. It is highly recommended that interested readers review their paper.)
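The aggregation formula is a weighted sum, which can be sketched in a few lines of base R. The ATT values and weights below are invented for illustration; in practice the weights depend on the type of aggregation chosen and are computed by the software.

```r
# Hypothetical ATT(g, t) estimates for three post-implementation cells (t >= g)
att_cells <- data.frame(
  g   = c(2, 2, 3),
  t   = c(2, 3, 3),
  att = c(1.0, 2.0, 1.5)
)

# Hypothetical weights w(g, t), one per cell, chosen to sum to 1
att_cells$w <- c(0.25, 0.25, 0.50)

# Aggregate ATT: sum over cells of w(g, t) * ATT(g, t)
att_aggregate_toy <- sum(att_cells$w * att_cells$att)
print(att_aggregate_toy)
```

With these made-up inputs, the aggregate is 0.25 x 1.0 + 0.25 x 2.0 + 0.50 x 1.5 = 1.5.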
Motivating example
Let’s generate a simulated dataset in the long format that is appropriate for a staggered DID design.
Make sure to load the did and panelView packages. The did package is the main package used to create our simulated data and to perform the staggered DID estimations. The panelView package is used to visualize the variation in treatment implementation or adoption across time.
# Load libraries
library("did")
library("panelView")
Data generating process (DGP)
We will create a simulated dataset with 6 time periods and 1000 observations (units), setting the ipw option to TRUE and the reg option to TRUE. As I understand the did documentation, ipw = TRUE sets the simulation parameters so that the data generating process is compatible with recovering \(ATT_{(g, t)}\) using inverse probability weights, and reg = TRUE does the same for regression-based estimation.
I slightly modified the code provided by Callaway on his website to generate this data. Observations where the EBP was implemented in the first time period are excluded because, with no pre-implementation period, they do not contribute to the estimation of \(ATT_{(g, t)}\).
# Set seed
set.seed(12345)
# Data generating process with 6 time periods and 1000 observations
# time.periods is defined first because it is reused for the dynamic effects below
time.periods <- 6
dgp <- reset.sim(
  time.periods = time.periods,
  n = 1000,
  ipw = TRUE,
  reg = TRUE
)
dgp$te <- 0
# Add dynamic effects
dgp$te.e <- 1:time.periods
# Build the simulated dataset. Observations where the implementation of the EBP was on period 1 are not in the resulting data. According to Callaway & Sant'Anna, these observations do not help in estimating the ATT(g, t).
data1 <- build_sim_dataset(dgp)
# Generate the indicator for adoption across time. If the observation adopts the EBP, they will be coded as 1 at the time of adoption and 1 for all periods after.
# The G > 0 condition keeps never-treated observations (coded G = 0) at 0 in every period.
data1$long <- as.numeric(data1$G > 0 & data1$period >= data1$G)
# How many rows (observation-period records) remain after excluding the observations that had adopted the EBP at time period = 1
nrow(data1)
## [1] 5124
head(data1)
## # A tibble: 6 × 8
## G X id cluster period Y treat long
## <dbl> <dbl> <int> <int> <dbl> <dbl> <dbl> <dbl>
## 1 2 0.709 1 21 1 1.08 1 0
## 2 2 0.709 1 21 2 4.42 1 1
## 3 2 0.709 1 21 3 6.04 1 1
## 4 2 0.709 1 21 4 8.20 1 1
## 5 2 0.709 1 21 5 14.1 1 1
## 6 2 0.709 1 21 6 13.7 1 1
There are several important variables in the data.
- id: unique identifier for the observation (subject-level)
- treat: grouping variable (time-invariant)
- long: time-varying grouping variable
- Y: outcome variable as a continuous data type
- period: time variable ranging from 1 to 6
- G: time period when the adoption of the EBP occurred
- X: covariate (will control for in the model)
- cluster: group cluster (not needed for this tutorial)
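The relationship between G, period, treat, and long can be illustrated with a small made-up panel. This sketch assumes that never-treated observations are coded with G = 0, which is the convention I understand the did package to use; the object toy_panel is hypothetical.

```r
# Toy panel (made-up): observation 1 adopts the EBP in period 3 and
# observation 2 never adopts it (coded G = 0, an assumed convention).
toy_panel <- expand.grid(period = 1:4, id = 1:2)
toy_panel$G <- ifelse(toy_panel$id == 1, 3, 0)

# Time-invariant indicator: 1 if the observation ever adopts the EBP
toy_panel$treat <- as.numeric(toy_panel$G > 0)

# Time-varying indicator: 1 from the adoption period onward, 0 otherwise.
# The G > 0 condition keeps never-treated observations at 0 in every period.
toy_panel$long <- as.numeric(toy_panel$G > 0 & toy_panel$period >= toy_panel$G)

print(toy_panel)
```

Observation 1 has long equal to 0, 0, 1, 1 across the four periods, while observation 2 stays at 0 throughout. Without the G > 0 condition, a never-treated observation with G = 0 would be coded as exposed in every period.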
Visualize treatment adoption patterns
We can visualize when the EBP was implemented or adopted using the panelView package. There are two ways to do this:
Method 1 - Visualize EBP adoption
### Method 1:
panelview(
  data = data1,
  formula = Y ~ long, ## Use the formula option
  index = c("id", "period"),
  xlab = "Year",
  ylab = "Unit",
  display.all = FALSE,
  gridOff = TRUE,
  by.timing = TRUE,
  pre.post = FALSE
)
Method 2 - Visualize EBP adoption
### Method 2:
panelview(
  data = data1,
  Y = "Y", ## Define the outcome
  D = "long", ## Define the longitudinal exposure
  index = c("id", "period"),
  xlab = "Year",
  ylab = "Unit",
  display.all = FALSE,
  gridOff = TRUE,
  by.timing = TRUE,
  pre.post = FALSE
)
Visualize the groups that get treated before and after EBP adoption
We can also visualize the adoption patterns by plotting the outcomes for each group and distinguishing the periods before and after EBP adoption.
### Using the outcomes
panelview(
  data = data1,
  formula = Y ~ long, ## Use the formula option
  index = c("id", "period"),
  xlab = "Year",
  ylab = "Unit",
  display.all = FALSE,
  gridOff = TRUE,
  by.timing = TRUE,
  pre.post = TRUE,
  type = "outcome",
  by.cohort = TRUE,
  color = c("gray", "blue", "red")
)
## Specified colors in the order of "treated (pre)", "treated (post)", "control".
## Number of unique treatment histories: 6
Estimate group-time ATT
We will estimate the group-time average treatment effect on the treated (ATT) using the att_gt function. The output includes every group x time combination. For instance, the ATT at time period = 5 for the group that implemented the EBP at time period = 4 is 2.0280 (95% simultaneous confidence band: 1.4894, 2.5667).
In addition to the group x time effects, the output reports a Wald pre-test of the parallel trends assumption. Here, p = 0.88028, so we fail to reject the null hypothesis that the pre-implementation effects are zero. This is consistent with the parallel trends assumption holding, although it does not prove it.
# Estimate the group-time ATT of implementing the EBP controlling for the X covariate
att_grouptime <- att_gt(
  yname = "Y",
  tname = "period",
  idname = "id",
  gname = "G",
  xformla = ~ X,
  data = data1
)
# Summarize the results
summary(att_grouptime)
##
## Call:
## att_gt(yname = "Y", tname = "period", idname = "id", gname = "G",
## xformla = ~X, data = data1)
##
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
##
## Group-Time Average Treatment Effects:
## Group Time ATT(g,t) Std. Error [95% Simult. Conf. Band]
## 2 2 1.0889 0.1611 0.5951 1.5828 *
## 2 3 1.7708 0.1686 1.2539 2.2878 *
## 2 4 2.8899 0.1668 2.3785 3.4013 *
## 2 5 3.8359 0.1719 3.3089 4.3629 *
## 2 6 5.0540 0.1749 4.5176 5.5903 *
## 3 2 0.1012 0.1696 -0.4188 0.6212
## 3 3 0.8082 0.1754 0.2703 1.3461 *
## 3 4 1.8602 0.1730 1.3296 2.3908 *
## 3 5 2.8019 0.1575 2.3188 3.2849 *
## 3 6 4.0831 0.1675 3.5695 4.5968 *
## 4 2 -0.0436 0.1562 -0.5226 0.4354
## 4 3 -0.0378 0.1800 -0.5898 0.5141
## 4 4 0.9011 0.1650 0.3951 1.4071 *
## 4 5 2.0280 0.1756 1.4894 2.5667 *
## 4 6 3.1085 0.1868 2.5356 3.6813 *
## 5 2 0.1178 0.1693 -0.4015 0.6371
## 5 3 -0.2093 0.1849 -0.7764 0.3578
## 5 4 -0.0799 0.1874 -0.6544 0.4946
## 5 5 1.3767 0.1717 0.8502 1.9032 *
## 5 6 2.4207 0.1894 1.8399 3.0016 *
## 6 2 0.0663 0.1635 -0.4351 0.5678
## 6 3 -0.2224 0.1992 -0.8333 0.3885
## 6 4 -0.0099 0.1765 -0.5512 0.5313
## 6 5 0.2411 0.1913 -0.3456 0.8278
## 6 6 0.9987 0.1750 0.4620 1.5354 *
## ---
## Signif. codes: `*' confidence band does not cover 0
##
## P-value for pre-test of parallel trends assumption: 0.88028
## Control Group: Never Treated, Anticipation Periods: 0
## Estimation Method: Doubly Robust
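The output reports 95% simultaneous confidence bands, which are wider than the usual pointwise 95% confidence intervals. As a rough illustration, the sketch below backs out the critical value implied by the band for ATT(g = 4, t = 5) and compares it with the pointwise normal critical value. The numbers are copied from the output above; because the band comes from a bootstrap, the implied critical value will likely differ slightly from run to run.

```r
# Values copied from the att_gt() output above for ATT(g = 4, t = 5)
est   <- 2.0280
se    <- 0.1756
upper <- 2.5667

# Critical value implied by the simultaneous confidence band
crit_simult <- (upper - est) / se

# Critical value for a pointwise 95% interval based on the normal distribution
crit_point <- qnorm(0.975)

# Pointwise 95% interval for comparison with the simultaneous band
ci_point <- est + c(-1, 1) * crit_point * se

print(round(c(crit_simult = crit_simult, crit_point = crit_point), 2))
print(round(ci_point, 4))
```

The implied critical value is a little above 3, compared with about 1.96 for a pointwise interval, so the pointwise interval sits inside the reported band.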
We can plot the results from the group-time average treatment effect. The ggdid function plots each of the groups based on when the EBP was implemented. In this example, there are 5 time points when the EBP could be implemented (time period = 2, 3, 4, 5, and 6). Recall that observations with EBP implementation at time period 1 were excluded since they do not contribute to the ATT.
# plot the results
ggdid(att_grouptime, ylim = c(-1, 6)) # Standardize the y-axis range
Callaway & Sant’Anna DID estimator
We can visualize the ATTs across time for the different groups when the EBP is implemented at various time periods. However, we want to aggregate all of these subexperiments into a single weighted ATT. To do that, we will use the aggte function. In total, we have 5 groups defined by when the EBP was implemented (time periods 2, 3, 4, 5, and 6). The aggte function weights each estimate by group (when the implementation occurred) and time, \(\omega(g, t)\). Here, we set type = "dynamic", which aggregates the estimates by length of exposure to the EBP. By default, the att_gt function uses the doubly robust estimation method.
(Note: The standard errors from the cs staggered DID method are estimated using bootstrapping methods.)
### Estimate the group-time ATT of implementing the EBP controlling for the X covariate
att_grouptime <- att_gt(
  yname = "Y",
  tname = "period",
  idname = "id",
  gname = "G",
  xformla = ~ X,
  data = data1
)
### Aggregate the ATT for the different groups with various implementation time periods.
att_aggregate <- aggte(att_grouptime, type = "dynamic")
summary(att_aggregate)
##
## Call:
## aggte(MP = att_grouptime, type = "dynamic")
##
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
##
##
## Overall summary of ATT's based on event-study/dynamic aggregation:
## ATT Std. Error [ 95% Conf. Int.]
## 2.9992 0.0875 2.8277 3.1706 *
##
##
## Dynamic Effects:
## Event time Estimate Std. Error [95% Simult. Conf. Band]
## -4 0.0663 0.1789 -0.4241 0.5567
## -3 -0.0523 0.1045 -0.3386 0.2341
## -2 -0.0883 0.0784 -0.3031 0.1264
## -1 0.0557 0.0680 -0.1306 0.2420
## 0 1.0436 0.0603 0.8783 1.2090 *
## 1 2.0189 0.0760 1.8105 2.2273 *
## 2 2.9352 0.0964 2.6710 3.1994 *
## 3 3.9441 0.1184 3.6196 4.2686 *
## 4 5.0540 0.1665 4.5977 5.5102 *
## ---
## Signif. codes: `*' confidence band does not cover 0
##
## Control Group: Never Treated, Anticipation Periods: 0
## Estimation Method: Doubly Robust
Based on the cs staggered DID, the overall ATT after aggregation is 2.9992 (95% CI: 2.8277, 3.1706). In other words, averaged across the lengths of exposure after implementation, the EBP was associated with a statistically significant increase of 2.9992 units in the outcome.
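As a quick check on how the overall ATT relates to the event-time estimates, the sketch below averages the five post-implementation estimates copied from the output above. My understanding is that, with type = "dynamic", the overall ATT is the average of the effects at event times 0 and later; treat this as an assumption to confirm against the did documentation.

```r
# Event-time estimates for event times 0 to 4, copied from the aggte() output
post_estimates <- c(1.0436, 2.0189, 2.9352, 3.9441, 5.0540)

# Simple average across the post-implementation event times
overall_att <- mean(post_estimates)
print(round(overall_att, 4))
```

The average matches the reported overall ATT of 2.9992 to four decimal places.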
We can plot the result of the cs staggered DID using the ggdid function. The x-axis denotes the length of exposure, or the time since implementation of the EBP (event time). Event time = 0 is the period when the observation implemented the EBP. Event time = -1 is the period before the implementation of the EBP.
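To see where the nine event times come from, it may help to map each group x time cell to its event time, which is the time period minus the implementation period. The sketch below uses only the group and period values from this example (groups 2 to 6, periods 2 to 6); the object names are hypothetical.

```r
# All group x time cells reported by att_gt() in this example
cells <- expand.grid(t = 2:6, g = 2:6)

# Event time (length of exposure): periods since implementation
cells$event_time <- cells$t - cells$g

# Number of group x time cells that contribute to each event time
cells_per_event_time <- table(cells$event_time)
print(cells_per_event_time)
```

Only one cell contributes to event time -4 (g = 6, t = 2) and to event time 4 (g = 2, t = 6). This is consistent with the never-treated results, where those two event-time estimates (0.0663 and 5.0540) equal the corresponding group-time estimates.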
Visually, the parallel trends assumption seems to hold because the point estimates in the pre-implementation periods are at or near zero and not statistically significant. Overall, the ATTs in the periods after implementation of the EBP are positive and statistically significant.
## Plot the cs staggered DID results
ggdid(att_aggregate)
Note: The comparison group above is the “not ever treated” group. However, one can also use “not yet treated” as the control group, which includes both the observations that are never treated and the observations that have not yet implemented the EBP in a given period. This can be done using the control_group option.
### Estimate the group-time ATT of implementing the EBP controlling for the X covariate
att_grouptime_notyet <- att_gt(
  yname = "Y",
  tname = "period",
  idname = "id",
  gname = "G",
  xformla = ~ X,
  control_group = "notyettreated",
  data = data1
)
### Aggregate the ATT for the different groups with various implementation time periods.
att_aggregate_notyet <- aggte(att_grouptime_notyet, type = "dynamic")
summary(att_aggregate_notyet)
##
## Call:
## aggte(MP = att_grouptime_notyet, type = "dynamic")
##
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
##
##
## Overall summary of ATT's based on event-study/dynamic aggregation:
## ATT Std. Error [ 95% Conf. Int.]
## 3.0041 0.0788 2.8497 3.1585 *
##
##
## Dynamic Effects:
## Event time Estimate Std. Error [95% Simult. Conf. Band]
## -4 -0.0402 0.1212 -0.3697 0.2892
## -3 0.0131 0.0899 -0.2311 0.2573
## -2 -0.0698 0.0878 -0.3085 0.1688
## -1 0.0793 0.0744 -0.1228 0.2813
## 0 1.0302 0.0674 0.8469 1.2134 *
## 1 2.0456 0.0758 1.8396 2.2515 *
## 2 2.9693 0.0899 2.7251 3.2135 *
## 3 3.9216 0.1090 3.6253 4.2179 *
## 4 5.0540 0.1834 4.5557 5.5523 *
## ---
## Signif. codes: `*' confidence band does not cover 0
##
## Control Group: Not Yet Treated, Anticipation Periods: 0
## Estimation Method: Doubly Robust
ggdid(att_aggregate_notyet)
The overall ATTs from the two types of control groups are very similar, differing by about 0.005 units (see Table).
Control group | Overall ATT |
---|---|
Not ever treated | 2.999 |
Not yet treated | 3.004 |
Note: It is recommended to use the “not ever treated” group as the control group. This is also the default setting of the control_group option in the att_gt function.
Conclusions
This is a simple overview of the Callaway & Sant’Anna staggered DID approach. It is a very useful framework for health policy research where the timing of EBP implementation varies and the treatment effect changes over time. I plan to write more articles on this subject as I learn more about the various approaches to staggered difference-in-differences estimation.
References
An excellent paper by Callaway & Sant’Anna on the staggered difference-in-differences framework.[1]
- Callaway B, Sant’Anna PHC. Difference-in-differences with multiple time periods. J Econom. 2021;225(2):200-230.
Wing and colleagues provide an excellent background on the issues and approaches to evaluating a staggered DID design.[2]
- Wing C, Yozwiak M, Hollingsworth A, Freedman S, Simon K. Designing difference-in-difference studies with staggered treatment adoption: Key concepts and practical guidelines. Annu Rev Public Health. 2024;45:485-505.
The Stata version of the Callaway & Sant’Anna staggered DID approach was presented at the Stata 2021 conference. The slides for the presentation are available here.
Websites
Callaway has a website that provides vignettes on how to use the did package.
Callaway also has a GitHub site where one can report any bugs and learn about updates to the did package. This is where I learned to use the did package to perform the Callaway & Sant’Anna staggered DID approach.
Another website that I used to help write this article was the Tilburg Science Hub. They wrote a great article that walks through the Callaway & Sant’Anna staggered DID with an example. Their article can be viewed here.
A video by Sant’Anna on the staggered DID framework was a great resource and is available here.
Disclaimers and Disclosures
This is for educational purposes only.
This is a work in progress and may be updated in the future.