1 How to use this website

1.1 Citations

This website was developed by Corinne Riddell and Dana Goin, epidemiologists at the University of California, Berkeley (UC Berkeley) and San Francisco (UCSF), respectively. If you use this website to inform your research, please use the following citations:

  1. Goin DE, Riddell CA. Comparing two-way fixed effects and new estimators for difference-in-differences: A simulation study and empirical example. Epidemiology. 2023;34(4):535-543. doi: 10.1097/EDE.0000000000001611.

  2. Riddell CA, Goin DE. Guide for comparing estimators of policy change effects on health [Letter to the Editor]. Epidemiology. 2023;34(3):e21-22. doi: 10.1097/EDE.0000000000001586.

On this website we refer to new estimators and tools from the econometric literature. If you use any of the new estimators or tools, you will also want to cite the following:

  • Paper about the Goodman-Bacon Decomposition: Goodman-Bacon A. Difference-in-differences with variation in treatment timing. J Econom. 2021;225(2):254-77.

  • Paper about the Group-Time ATT Estimator: Callaway B, Sant’Anna PHC. Difference-in-Differences with multiple time periods. J Econom. 2021;225(2):200-30.

  • Paper about the Cohort ATT Estimator: Sun L, Abraham S. Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. J Econom. 2021;225(2):175–99.

  • Paper about the Target Trial Estimator: Ben-Michael E, Feller A, Stuart EA. A Trial Emulation Approach for Policy Evaluations with Group-level Longitudinal Data. Epidemiology. 2021 Jul;32(4):533–40.

1.2 License

Guide to DID estimators © 2022 by Corinne A Riddell and Dana E Goin is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

1.3 Contribute fixes and improvements

This website is under development. If you find an error, or would like to suggest an improvement, please let us know by opening an issue on GitHub or submitting a bug report to: c.riddell [AT] berkeley [DOT] edu.

Thank you to @scottlyden for contributing a fix via pull request.

2 Overview

This guide illustrates scenarios in which difference-in-differences (DID) analysis can be applied. We assume readers know what a DID design is, have some background knowledge in policy analysis, and have working knowledge of R and RStudio. Reviewing the manuscript by Goin and Riddell and its references is a great place to start if this guide leaves you with more questions than answers.

We start with the simplest of DID scenarios and slowly amp up the complexity. For each scenario we describe a target parameter to estimate and the parameter estimated by a two-way fixed effects (TWFE) model. We apply the Goodman-Bacon decomposition to determine if the TWFE estimate is influenced by comparisons that are “forbidden” (e.g., ones that compare a newly treated state to a previously treated state). The goal is to illustrate when the usual TWFE method of estimation provides suitable results and when the TWFE approach is biased and/or aggregates the individual ATTs in an unintuitive way. In the latter case, we show alternative methods of estimation that overcome these issues.

In the following examples, the effect of state policy changes on a health outcome is the effect of interest.

2.1 Download data and code

To get a local version of the data and code used for this analysis, you can run the following code within RStudio. This will download a local copy of all the files contained in the GitHub repository “Guide-to-DID-estimators” on Corinne Riddell’s GitHub.

install.packages("usethis") #run this line if you need to install the usethis package
usethis::use_course("corinne-riddell/Guide-to-DID-estimators")

By default, this creates a new folder on your desktop. You can specify a different directory using the destdir argument in the use_course() function.
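
For example, to download into a different folder (the path below is only an illustration and must already exist on your machine), you could run:

usethis::use_course("corinne-riddell/Guide-to-DID-estimators",
                    destdir = "~/projects")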

The files will open in RStudio when you run the code. In the future, you can open the Guide-to-DID-estimators.Rproj file found in the downloaded folder by double clicking it or by selecting “File” > “Open Project” within RStudio and choosing the Guide-to-DID-estimators project.

2.2 Load R packages and functions

First, load the packages for reading and plotting the data. The packages bacondecomp,
did, and staggered are specific to DID analyses.

# you will need to install these packages if you don't have them already
# install.packages(c("here", "readxl", "tidyverse", "patchwork", "magrittr", 
# "broom", "ggrepel", "lubridate", "bacondecomp", "staggered", "did"))
# You may also need to install the development version of `DRDID` if you
# get an error. To do so, use:
# install.packages("devtools")
# devtools::install_github("pedrohcgs/DRDID")
# Finally, install the package emo from Hadley Wickham's GitHub:
# devtools::install_github("hadley/emo")

library(here) #nice file paths
library(readxl) #read in excel data
library(tidyverse) #collection of packages for data science
library(patchwork) #"stitches" together ggplots
library(magrittr) #pipes
library(broom) #tidy displays
library(ggrepel) #for labelling points on a ggplot
library(lubridate) #helps with date objects
library(bacondecomp) #Goodman Bacon Decomposition
library(did) #Callaway and Sant'Anna estimator for DID
library(staggered) #Sun and Abraham estimator for DID
library(emo) #used to insert emojis into this document

One estimator we use in the guide, the Target Trial estimator, was implemented using code revised from the creator’s GitHub repository. We use source() to read in the functions below. This code will only work if you downloaded the files using the use_course() function in the previous step.

source("./helper_func_ed.R")
source("./helper_func_ed_sum.R")
source("./helper_func_ed_sum_hte.R")
source("./helper_func_ed_sum_C.R")

The first file is a slightly edited version of the original helper_funcs.R file found in Eli Ben-Michael’s linked GitHub repository. The remaining helper functions are ones that Dana Goin made to aggregate the estimates under different settings.

2.3 Package Versions

Since the packages used for these analyses are relatively new, we anticipate they will change over time and that these changes may affect readers’ ability to reproduce the results. For these analyses we used bacondecomp version 0.1.1, staggered version 1.2.1, and did version 2.1.2. If your results differ or you get an error running any of the code that uses functions from those packages, it may be because you are using different versions of the packages.
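
If you need to match the package versions used here, one option (shown only as an illustration) is to install specific versions with the remotes package:

# install.packages("remotes") # if you don't have it already
# remotes::install_version("bacondecomp", version = "0.1.1")
# remotes::install_version("did", version = "2.1.2")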

packageVersion("bacondecomp")
## [1] '0.1.1'
packageVersion("staggered")
## [1] '1.2.1'
packageVersion("did")
## [1] '2.1.2'

3 Scenario 1 (2 states by 2 time points)

We first consider the simplest case in which there is one state that never adopts the policy (the never-treated group), and one state that does implement the policy (the treated group). The groups are observed at two time periods only.

Below, the data are read in from an Excel spreadsheet. You can open the spreadsheet in Excel to view it if that comes naturally to you, or use R’s View() window after you have imported it. The data contain four variables: state, time, policy, and outcome.

  • state: ID for the grouping variable
  • time: Time index, begins at 1
  • policy: Indicator variable for exposure to the policy change
  • outcome: The outcome variable

s1 <- read_xlsx(here("data", "scenarios.xlsx"), sheet = "scen1",
                col_types = c("text", "text", "text", "numeric"))
# str(s1)
# View(s1)

Note that I specified the column types for each variable. I read in the state, time, and policy variables as “text” so that R will consider them as categorical/factor variables in the analysis (or, using econ-speak, as “fixed effects”).
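
If you instead imported these columns as numeric, a quick way to convert them after import (a sketch; not needed if you follow the code above) is:

s1 <- s1 %>% mutate(across(c(state, time, policy), as.character))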

3.1 Visualization of Scenario 1

The following can be read from the labelled plot:

  • Pre-treatment difference of 1
  • Post-treatment difference of 3
  • Causal effect of policy = 3-1 = 2 or equivalently,
  • State 1 difference of 5
  • State 2 difference of 3
  • Causal effect of policy = 5-3 = 2 under the following assumptions

Three standard assumptions are typically stated in DID papers and are required for the DID estimator to recover the causal effect of interest:

  1. Parallel trends: Had the treated group not been treated, its outcome trend would have been the same as the never-treated state’s trend during the post-treatment time period.
  2. No anticipation: The effect of the policy change begins after it is introduced; there is no anticipatory effect before the policy is introduced. Anticipatory effects occur if individuals change their behavior in anticipation of a policy change.
  3. No coincident other changes: The policy is the only change introduced at the specific time it is adopted that causally affects the outcome.

If you’re familiar with causal inference theory, you will also know about the main assumptions required when estimating causal effects. Ben-Michael et al. list these assumptions in their paper, and we include the assumptions here for reference:

  1. Consistency: There are no multiple versions of treatment that are unknown to the investigator. For example, if states introduced a texting ban while driving and these bans were associated with different penalties, then the policy change would not meet the consistency assumption because the different penalties may have varied effects on the outcome and the investigator is analyzing these policies as though they were the same. However, if the investigator knows that the penalties differed and estimated the effects accounting for this heterogeneity, then the consistency assumption would not be violated.

  2. No interference: Policies that affect individuals in one state should not affect the outcomes of individuals in other states.

  3. Exchangeability: Had the treated states been untreated, they would have experienced the trend in outcomes experienced by the untreated group. This causal assumption is the same as the parallel trends assumption.

  4. Positivity: Each state has a non-zero probability of being treated. According to Ben-Michael et al, this assumption is not required for standard DID models.

  5. Correct model specification: The parametric model accurately represents the underlying causal model.

Lastly, Athey and Imbens outline assumptions required for the standard DID estimator to be an unbiased estimator of a certain weighted average causal effect, including random assignment of the policy’s introduction date.

3.2 Two-way fixed effects (TWFE) regression model

Under the canonical TWFE design, we can estimate the policy effect by including a fixed effect (FE) for state, a FE for time, and an indicator for the policy change. The indicator should equal 1 when a treated state is in the post-treatment period, and 0 otherwise. The effect estimate is the regression coefficient on the policy variable.

s1_mod <- lm(outcome ~ state + time + policy, data = s1)
tidy(s1_mod)
## # A tibble: 4 × 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)     9          NaN       NaN     NaN
## 2 state2          1.00       NaN       NaN     NaN
## 3 time2           3          NaN       NaN     NaN
## 4 policy1         2.00       NaN       NaN     NaN

The coefficient on the policy term is 2, showing that the regression model estimate equaled the causal effect.
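
As a sanity check, the same DID can be computed by hand from the imported data. This is a minimal sketch that assumes state 1 is the treated state, as in the plot above:

# reshape to one row per state with a column per time point
s1_wide <- s1 %>% 
  pivot_wider(id_cols = state, names_from = time, 
              values_from = outcome, names_prefix = "t") %>% 
  mutate(change = t2 - t1) # within-state post-minus-pre change

# DID = treated state's change minus never-treated state's change
s1_wide$change[s1_wide$state == "1"] - s1_wide$change[s1_wide$state == "2"]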

Note that in this scenario and the other simple scenarios we cover, there is no variability in the data, so the model cannot estimate a standard error (as indicated by the NaN in the regression output). In a later scenario, we add noise to the data and show how to estimate the standard error.

3.3 Summary

When there are only two treatment groups and two time points, where one of the groups becomes treated, you can calculate the DID estimate by hand or using TWFE regression.

4 Scenario 2A and 2B (3 states, homogeneous and heterogeneous treatment effects)

This scenario is similar to Scenario 1, except now there are two states that undergo treatment. In Scenario 2a, the magnitude of the treatment effect is the same in both states. In Scenario 2b, the magnitude of the treatment effect varies by state.

s2a <- read_xlsx(here("data", "scenarios.xlsx"), sheet = "scen2a",  
                 col_types = c("text", "text", "text", "numeric"))

s2b <- read_xlsx(here("data", "scenarios.xlsx"), sheet = "scen2b",
                  col_types = c("text", "text", "text", "numeric"))

s2a %<>% mutate(ever_trt = case_when(state %in% c("2", "3") ~ "treated",
                                     state == "1" ~ "never-treated"))

s2b %<>% mutate(ever_trt = case_when(state %in% c("2", "3") ~ "treated",
                                     state == "1" ~ "never-treated"))

4.1 Visualization of Scenario 2

The following can be read from the labelled plot for Scenario 2a:

  • Post-Pre Difference for never-treated state 1: 12-9=3
  • Post-Pre Difference for treated state 2: 15-10=5
  • Post-Pre Difference for treated state 3: 16-11=5
  • Causal effect of policy = 5-3 for both states for an average treatment effect on the treated (ATT) of 2

The following can be read from the labelled plot for Scenario 2b:

  • Post-Pre Difference for never-treated state 1: 12-9=3
  • Post-Pre Difference for treated state 2: 15-10=5
  • Post-Pre Difference for treated state 3: 17-11=6
  • Causal effect of policy = 2 for state 2 vs 1 and 3 for state 3 vs 1 for an ATT of 2.5

4.2 TWFE Regression model

s2a_mod <- lm(outcome ~ policy + state + time, data = s2a)
tidy(s2a_mod)
## Warning in summary.lm(x): essentially perfect fit: summary may be unreliable
## # A tibble: 5 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)     9.00  2.22e-16   4.05e16 1.57e-17
## 2 policy1         2.00  3.85e-16   5.20e15 1.22e-16
## 3 state2          1.00  2.94e-16   3.40e15 1.87e-16
## 4 state3          2.00  2.94e-16   6.81e15 9.35e-17
## 5 time2           3.00  3.14e-16   9.55e15 6.66e-17

The coefficient estimate for the policy variable equals 2, which is the ATT for Scenario 2a.

s2b_mod <- lm(outcome ~ policy + state + time, data = s2b)
tidy(s2b_mod)
## # A tibble: 5 × 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)    9         0.5       18     0.0353
## 2 policy1        2.50      0.866      2.89  0.212 
## 3 state2         0.750     0.661      1.13  0.460 
## 4 state3         2.25      0.661      3.40  0.182 
## 5 time2          3.00      0.707      4.24  0.147

The coefficient estimate for the policy variable equals 2.5, which is the ATT for Scenario 2b.

4.3 Summary

When there are multiple groups and two time points, and more than one group becomes treated, you can calculate the DID estimate by hand or using TWFE regression. If treatment effects are heterogeneous across states, then the estimated effect is the average ATT across the treated states.

5 Scenario 3A and 3B (Two states, multiple time points)

In Scenario 3, the number of time periods is increased to incorporate 5 time points before and after a policy change. One never-treated and one treated group are considered. In Scenario 3a, the effect of treatment is constant once treatment is introduced. In Scenario 3b, the effect of treatment increases with time (i.e., it is dynamic).

Because there are multiple time points, when we read in the data we update the column type for the time variable to be numeric. This will help with plotting the data.

s3a <- read_xlsx(here("data", "scenarios.xlsx"), sheet = "scen3a",
                 col_types = c("text", "numeric", "text", "numeric", "text"))

s3b <- read_xlsx(here("data", "scenarios.xlsx"), sheet = "scen3b",
                 col_types = c("text", "numeric", "text", "numeric", "text"))

In addition to state, time, policy, and outcome, the data contain another variable, time_since_policy, which equals 0 before treatment (and is always 0 for the never-treated state) and counts the time since treatment in the treated state.
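
To see how this variable is coded alongside policy, you can print the relevant columns (an optional check):

s3a %>% select(state, time, policy, time_since_policy) %>% print(n = Inf)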

5.1 Visualization of Scenario 3

The following can be read from the labelled plot for Scenario 3a:

  • Pre-treatment difference is 17-15=2
  • Post-treatment difference is 34-30=4
  • DID = 2

The following can be read from the labelled plot for Scenario 3b:

  • Pre-treatment difference is 17-15=2
  • Post-treatment difference is 38-30=8
  • DID = 6

For Scenario 3b the effect is dynamic, and rather than calculating an ATT that aggregates over post-time, we may want to calculate the effect separately by time since treatment. Here, the effect increases over time. This could happen if the policy change takes time to “kick-in” (so it might level off after more time passes) or if there is a positive feedback loop.

In contrast, some policies may be associated with large initial effects that attenuate over time. Knowing if a policy introduction is associated with a changing effect over time is therefore important for understanding the real-world effects of the policy.

Thus, in this scenario we first estimate the average treatment effect using a TWFE model, and then we estimate the dynamic effect by extending the TWFE model to include a categorical indicator for time since the policy’s introduction.

5.2 TWFE Regression Model

Estimation of the average treatment effect in Scenario 3a. Unlike in the previous scenarios, we need to use factor(time) to ensure time is modeled using indicator variables (i.e., time fixed effects), since we imported it as a numeric variable in this scenario to make plotting easier.

s3a_mod <- lm(outcome ~ policy + state + factor(time), 
             data = s3a)
tidy(s3a_mod)
## Warning in summary.lm(x): essentially perfect fit: summary may be unreliable
## # A tibble: 12 × 5
##    term           estimate std.error statistic   p.value
##    <chr>             <dbl>     <dbl>     <dbl>     <dbl>
##  1 (Intercept)        9.00  3.73e-15   2.41e15 9.71e-121
##  2 policy1            2.00  4.30e-15   4.65e14 5.16e-115
##  3 state2             2.00  3.04e-15   6.57e14 3.23e-116
##  4 factor(time)2      3.00  4.81e-15   6.23e14 4.92e-116
##  5 factor(time)3      6.00  4.81e-15   1.25e15 1.92e-118
##  6 factor(time)4      9     4.81e-15   1.87e15 7.49e-120
##  7 factor(time)5     12     4.81e-15   2.49e15 7.50e-121
##  8 factor(time)6     15.0   5.27e-15   2.84e15 2.61e-121
##  9 factor(time)7     18     5.27e-15   3.41e15 6.07e-122
## 10 factor(time)8     21     5.27e-15   3.98e15 1.77e-122
## 11 factor(time)9     24     5.27e-15   4.55e15 6.08e-123
## 12 factor(time)10    27.0   5.27e-15   5.12e15 2.37e-123

The coefficient of the policy term for Scenario 3a is 2, showing that the regression model estimate equaled the causal effect.

Estimation of the average treatment effect in Scenario 3b, where the true effect is dynamic:

s3b_mod <- lm(outcome ~ policy + state + factor(time), 
             data = s3b)
tidy(s3b_mod)
## # A tibble: 12 × 5
##    term           estimate std.error statistic     p.value
##    <chr>             <dbl>     <dbl>     <dbl>       <dbl>
##  1 (Intercept)        9.00      1.22      7.35 0.0000801  
##  2 policy1            6.00      1.41      4.24 0.00283    
##  3 state2             2.00      1         2.00 0.0805     
##  4 factor(time)2      3.00      1.58      1.90 0.0943     
##  5 factor(time)3      6.00      1.58      3.79 0.00528    
##  6 factor(time)4      9.00      1.58      5.69 0.000459   
##  7 factor(time)5     12         1.58      7.59 0.0000637  
##  8 factor(time)6     13.0       1.73      7.51 0.0000689  
##  9 factor(time)7     17         1.73      9.81 0.00000976 
## 10 factor(time)8     21         1.73     12.1  0.00000198 
## 11 factor(time)9     25.0       1.73     14.4  0.000000519
## 12 factor(time)10    29.0       1.73     16.7  0.000000164

The coefficient of the policy term for Scenario 3b is 6, showing that the regression model estimate equaled the causal effect.

5.3 TWFE Regression Model with time_since_policy specification

Rather than using a binary indicator for the policy change, it can be coded using the time_since_policy variable. If we include this as a factor variable in the model, then separate effects will be estimated for each time since treatment.

For Scenario 3a, using this variable should yield 2 for each time since treatment, since the treatment effect is constant over time.

s3a_mod2 <- lm(outcome ~ time_since_policy + state + factor(time), 
             data = s3a)
tidy(s3a_mod2)
## Warning in summary.lm(x): essentially perfect fit: summary may be unreliable
## # A tibble: 16 × 5
##    term               estimate std.error statistic  p.value
##    <chr>                 <dbl>     <dbl>     <dbl>    <dbl>
##  1 (Intercept)            9.00  7.72e-15   1.17e15 3.24e-60
##  2 time_since_policy1     2.00  1.54e-14   1.30e14 2.13e-56
##  3 time_since_policy2     2.00  1.54e-14   1.30e14 2.13e-56
##  4 time_since_policy3     2.00  1.54e-14   1.30e14 2.13e-56
##  5 time_since_policy4     2.00  1.54e-14   1.30e14 2.13e-56
##  6 time_since_policy5     2.00  1.54e-14   1.30e14 2.13e-56
##  7 state2                 2     6.30e-15   3.17e14 5.91e-58
##  8 factor(time)2          3.00  9.96e-15   3.01e14 7.30e-58
##  9 factor(time)3          6.00  9.96e-15   6.02e14 4.56e-59
## 10 factor(time)4          9.00  9.96e-15   9.03e14 9.01e-60
## 11 factor(time)5         12.0   9.96e-15   1.20e15 2.85e-60
## 12 factor(time)6         15.0   1.26e-14   1.19e15 2.99e-60
## 13 factor(time)7         18.0   1.26e-14   1.43e15 1.44e-60
## 14 factor(time)8         21.0   1.26e-14   1.67e15 7.78e-61
## 15 factor(time)9         24.0   1.26e-14   1.90e15 4.56e-61
## 16 factor(time)10        27.0   1.26e-14   2.14e15 2.85e-61

Indeed, this is what we see.

For Scenario 3b, the effect increases by 2 units for every unit of time since treatment is initiated so this should be reflected in the coefficient estimates.

s3b_mod2 <- lm(outcome ~ time_since_policy + state + factor(time), 
             data = s3b)
tidy(s3b_mod2)
## Warning in summary.lm(x): essentially perfect fit: summary may be unreliable
## # A tibble: 16 × 5
##    term               estimate std.error statistic  p.value
##    <chr>                 <dbl>     <dbl>     <dbl>    <dbl>
##  1 (Intercept)            9.00  8.46e-15   1.06e15 4.68e-60
##  2 time_since_policy1     2.00  1.69e-14   1.18e14 3.07e-56
##  3 time_since_policy2     4.00  1.69e-14   2.37e14 1.92e-57
##  4 time_since_policy3     6.00  1.69e-14   3.55e14 3.79e-58
##  5 time_since_policy4     8.00  1.69e-14   4.73e14 1.20e-58
##  6 time_since_policy5    10.0   1.69e-14   5.91e14 4.91e-59
##  7 state2                 2     6.90e-15   2.90e14 8.52e-58
##  8 factor(time)2          3.00  1.09e-14   2.75e14 1.05e-57
##  9 factor(time)3          6.00  1.09e-14   5.50e14 6.57e-59
## 10 factor(time)4          9.00  1.09e-14   8.24e14 1.30e-59
## 11 factor(time)5         12.0   1.09e-14   1.10e15 4.11e-60
## 12 factor(time)6         15.0   1.38e-14   1.09e15 4.31e-60
## 13 factor(time)7         18.0   1.38e-14   1.30e15 2.08e-60
## 14 factor(time)8         21.0   1.38e-14   1.52e15 1.12e-60
## 15 factor(time)9         24.0   1.38e-14   1.74e15 6.57e-61
## 16 factor(time)10        27.0   1.38e-14   1.96e15 4.10e-61

For this scenario, the effect estimate is 2 in the first time after treatment, then 4, and so on.

Note that we could have modeled time_since_policy linearly (e.g., by including a numeric variable for time_since_policy). Here, we modeled time_since_policy as a factor variable, which allows the effect estimates to change non-linearly over time.
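
A minimal sketch of that linear specification (the model object name is arbitrary), assuming you convert the character time_since_policy variable to a numeric count of periods:

s3b_mod3 <- lm(outcome ~ as.numeric(time_since_policy) + state + factor(time),
               data = s3b)
tidy(s3b_mod3)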

5.4 Summary

When there is one treated group and one control group, with multiple pre- and post-treatment time points, you can calculate the DID estimate using TWFE regression whether the effect is constant or dynamic. Using a “time since treatment” indicator will estimate effects separately by time since the policy change, allowing you to see how the effect changes over time.

6 Scenario 4 (4 states, heterogeneous treatment effects)

Scenario 4 extends Scenario 3 to the multiple group setting. State 1 is never-treated, while states 2-4 undergo treatment at time=6. The effects are not dynamic in time, but they are heterogeneous, with some states having a larger treatment effect than others.

Note there are two new columns time_as_date and time_first_trt_date that we will use later in this example.

s4 <- read_xlsx(here("data", "scenarios.xlsx"), sheet = "scen4",
                col_types = c("text", "numeric", "text", "numeric", "text", "numeric", "date", "date"))

6.1 Visualization of Scenario 4

The following can be read from the labelled plot:

  • Pre-post difference for never-treated state 1: 30-15 = 15

  • Pre-post difference for treated state 2: 34-17 = 17

  • Pre-post difference for treated state 3: 35-19 = 16

  • Pre-post difference for treated state 4: 39-21 = 18

  • \(DID_{2 vs 1} = 17-15 = 2\)

  • \(DID_{3 vs 1} = 16-15 = 1\)

  • \(DID_{4 vs 1} = 18-15 = 3\)

  • \(\text{Average ATT} = (2 + 1 + 3)/3 = 2\)

6.2 TWFE Regression Model

s4_mod <- lm(outcome ~ policy + state + factor(time), 
             data = s4)
tidy(s4_mod)
## # A tibble: 14 × 5
##    term           estimate std.error statistic  p.value
##    <chr>             <dbl>     <dbl>     <dbl>    <dbl>
##  1 (Intercept)        9        0.277     32.4  1.45e-22
##  2 policy1            2.00     0.320      6.24 1.31e- 6
##  3 state2             2.00     0.253      7.90 2.24e- 8
##  4 state3             3.50     0.253     13.8  1.71e-13
##  5 state4             6.5      0.253     25.7  5.34e-20
##  6 factor(time)2      3.00     0.310      9.67 4.20e-10
##  7 factor(time)3      6        0.310     19.3  5.83e-17
##  8 factor(time)4      9.00     0.310     29.0  2.44e-21
##  9 factor(time)5     12        0.310     38.7  1.63e-24
## 10 factor(time)6     15.0      0.392     38.2  2.20e-24
## 11 factor(time)7     18        0.392     45.9  2.06e-26
## 12 factor(time)8     21        0.392     53.5  3.89e-28
## 13 factor(time)9     24        0.392     61.2  1.24e-29
## 14 factor(time)10    27        0.392     68.8  5.91e-31

The coefficient of the policy term is 2, showing that the regression model estimates the ATT.

In this scenario and the ones considered before it, we have not needed to use the new estimators because the TWFE model uncovers the true effect – this is the case because the policy change was introduced at a single time point in each scenario. Below, we show how to use the Goodman-Bacon decomposition function bacon() to see its results in a case where the standard DID approach works, so that you can contrast this with later scenarios where the standard approach fails.

6.3 Goodman Bacon Decomposition

Before using the bacon() function, we need to create numeric-coded versions of some of the variables that are stored as characters:

s4 %<>% mutate(state_n = as.numeric(as.character(state)),
               policy_n = as.numeric(as.character(policy)))

The bacon() function has an argument called formula, where you list the outcome as a function of the policy change, i.e., formula = outcome ~ policy_n. Note that no fixed effects for state or time are included in the formula. Instead, the time and state fixed effects are specified by the time_var and id_var arguments.

s4_bacon <- bacon(formula = outcome ~ policy_n, 
                  data = s4,
                  id_var = "state_n",
                  time_var = "time")
##                   type weight avg_est
## 1 Treated vs Untreated      1       2

This small table illustrates that 100% of the weight is put on comparisons between treated and untreated states and the ATT equals 2.

For more detail we print the object:

s4_bacon 
##   treated untreated estimate weight                 type
## 2       6     99999        2      1 Treated vs Untreated

treated=6 says that all treated states start treatment at time = 6 and are compared to untreated states (with a fictitious treatment time of 99999).

The results from the Goodman-Bacon Decomposition are reassuring and tell us we can trust the TWFE estimate. For pedagogical purposes, we also calculate the Group-Time ATT estimator using the att_gt() function.

6.4 Group-Time ATT Estimator

Like bacon(), att_gt() requires all numeric variables. It also requires an argument called gname, which expects a variable encoding, for each state, the time of the first policy change. For never-treated state 1, this variable equals 0, and for treated states 2-4 it equals 6. Another important argument is control_group, which you can set to "nevertreated" or "notyettreated". When specifying control_group = "nevertreated", only states that never receive the policy change are used as control states. When specifying control_group = "notyettreated", the control group expands to include states that have not yet received treatment at the time of the policy’s introduction. In this guide, we use "notyettreated" as the control group since these states satisfy the parallel trends assumption and including them would increase statistical precision (if there were noise in the data).

s4_cs <- att_gt(yname = "outcome", 
             tname = "time", 
             idname = "state_n", 
             gname = "time_first_treat", 
             data = s4, 
             control_group = "notyettreated",
             anticipation = 0)
## No pre-treatment periods to test
m2_ag <- aggte(s4_cs, type="simple")
summary(m2_ag)
## 
## Call:
## aggte(MP = s4_cs, type = "simple")
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
## 
## 
##  ATT    Std. Error     [ 95%  Conf. Int.]  
##    2        0.9884     0.0628      3.9372 *
## 
## 
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## Control Group:  Not Yet Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust

Here, we see the ATT is equal to 2 as estimated by the TWFE model.

6.5 Target Trial Estimator

We can also calculate the estimate using the Target Trial Estimator introduced by Ben-Michael, Feller, and Stuart. We employ slightly edited versions of the code that these authors provide with their published article. The code calculates estimates of the effect for every event time (also called time since treatment). We built upon this code to further aggregate across event times to produce one overall estimate using fit_event_jack_sum(), fit_event_jack_sum_hte(), and fit_event_jack_sum_C() functions. Use fit_event_jack_sum() when there is only one adoption cohort, as shown here.

Note that when you run this function, messages are printed to the screen of the form “Dropping #”. These messages are output by the jackknife procedure, which drops one unit at a time when estimating the standard error. You will see them printed below and each time we use one of the fit_event_jack*() functions.

s4_bm <- fit_event_jack_sum(outcome_var = "outcome", 
                            date_var = "time_as_date", 
                            unit_var = "state",
                            policy_var = "time_first_trt_date", 
                            data = s4, 
                            max_time_to = 10000)
## Dropping 1 
## Dropping 2 
## Dropping 3 
## Dropping 4
s4_bm %<>% mutate(lb = estimate - 1.96*se,
                      ub = estimate + 1.96*se)

s4_bm
##    se estimate  lb  ub
## 1 NaN        2 NaN NaN

The ATT is equal to the effect estimated by the TWFE and Group-Time estimators.

Ideally, we would also show you how to run the Cohort ATT estimator for this scenario, but the function to do so, staggered_sa(), throws an error when trying to estimate the standard error. Thus, we will show you how to use this function in the scenario that incorporates noise into the data.

7 Scenario 5 (4 states, staggered treatment timing)

Scenario 5 introduces staggered timing. In this example, the treated states introduce the policy at two different time points. The treatment effects are neither heterogeneous nor dynamic. There are two never-treated states and two ever-treated states.

s5 <- read_xlsx(here("data", "scenarios.xlsx"), sheet = "scen5",
                col_types = c("text", "numeric", "text", "numeric", "text", "numeric", 
                              "date", "date"))

7.1 Visualization of Scenario 5

Visualization of all comparisons made by TWFE regression

Contrast 1:

  • Pre-post difference for never-treated state 1 is: 43.5-13.5=30
  • Pre-post difference for never-treated state 2 is: 44.5-14.5=30
  • Implying the average pre-post difference equals 30 for the never-treated states
  • Pre-post difference for treated state 3 is 49.5-16.5 = 33
  • \(DID_{3 vs 1,2} = 33-30=3\)

Contrast 2:

  • Pre-post difference for never-treated state 1 is: 54-24=30
  • Pre-post difference for never-treated state 2 is: 55-25=30
  • Implying the average pre-post difference equals 30 for the never-treated states
  • Pre-post difference for treated state 4 is 66-33= 33
  • \(DID_{4 vs 1,2} = 33-30=3\)

Contrast 3:

  • Pre-post difference for later-treated state 4 is: 39-22.5=16.5
  • Pre-post difference for earlier-treated state 3 is: 36-16.5=19.5
  • \(DID_{3 vs 4} = 19.5-16.5=3\)

Contrast 4:

  • Pre-post difference for earlier-treated state 3 is: 60-36=24
  • Pre-post difference for later-treated state 4 is: 66-39=27
  • \(DID_{4 vs 3} = 27-24=3\)

Since the causal effect equals 3 across all contrasts, and all contrasts are valid, this is the quantity that we want the TWFE and new estimators to estimate.

7.2 TWFE Regression Model

s5_mod <- lm(outcome ~  policy + state + factor(time), data = s5)
tidy(s5_mod)
## Warning in summary.lm(x): essentially perfect fit: summary may be unreliable
## # A tibble: 24 × 5
##    term          estimate std.error statistic p.value
##    <chr>            <dbl>     <dbl>     <dbl>   <dbl>
##  1 (Intercept)       9.00  1.41e-14   6.39e14       0
##  2 policy1           3.00  1.11e-14   2.69e14       0
##  3 state2            1.00  8.04e-15   1.24e14       0
##  4 state3            3.00  1.20e-14   2.50e14       0
##  5 state4            9     9.48e-15   9.50e14       0
##  6 factor(time)2     3.00  1.80e-14   1.67e14       0
##  7 factor(time)3     6.00  1.80e-14   3.34e14       0
##  8 factor(time)4     9.00  1.80e-14   5.00e14       0
##  9 factor(time)5    12.0   1.82e-14   6.59e14       0
## 10 factor(time)6    15.0   1.82e-14   8.24e14       0
## # ℹ 14 more rows

The coefficient of the policy term is 3, showing that the regression model estimates the ATT.

7.3 Goodman Bacon Decomposition

Using the bacon() function, we can see the four comparisons being made and the effect estimates for each comparison. The weight column in the output shows how much weight each estimate contributes to the ATT. Here, the treated vs untreated comparisons are the most heavily weighted.

s5 %<>% mutate(state_n = as.numeric(as.character(state)),
               policy_n = as.numeric(as.character(policy)))
#bacon decomp says no problems:
#every comparison type estimates the same effect of 3
s5_bacon <- bacon(outcome ~ policy_n,
      data = s5,
      id_var = "state_n",
      time_var = "time")
##                       type  weight avg_est
## 1 Earlier vs Later Treated 0.06715       3
## 2 Later vs Earlier Treated 0.15108       3
## 3     Treated vs Untreated 0.78177       3

We can also see a list of each comparison and its weight:

s5_bacon
##   treated untreated estimate     weight                     type
## 2       5     99999        3 0.30695444     Treated vs Untreated
## 3      12     99999        3 0.47482014     Treated vs Untreated
## 6      12         5        3 0.15107914 Later vs Earlier Treated
## 8       5        12        3 0.06714628 Earlier vs Later Treated

7.4 Group-Time ATT Estimator

For pedagogical purposes, we also show the estimate from applying the Callaway and Sant’Anna estimator, using the att_gt() function to estimate group-time ATTs.

s5_cs <- att_gt(yname = "outcome", 
             tname = "time", 
             idname = "state_n", 
             gname = "time_first_treat", 
             data = s5, 
             control_group = "notyettreated",
             anticipation = 0)
## No pre-treatment periods to test

Note that I suppressed the warnings by using warning=FALSE in the R markdown chunk header for this specific chunk. This is because it prints more than 100 lines of the same warning: “Warning in max(abs(b/bSigma), na.rm = TRUE): no non-missing arguments to max; returning -Inf”. This warning occurs because there is no variability in the data, so the model cannot estimate a standard error. This is not realistic, and in a later example we incorporate variability. I suppress the warnings in this document whenever they are too numerous. In practice, you will want to heed warnings and understand why they occur before suppressing them.
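
If you are rendering your own R Markdown document, two illustrative ways to quiet a noisy chunk are to set warning=FALSE in that chunk’s header or to wrap a call in suppressWarnings():

# in the chunk header: {r, warning=FALSE}
# or around a single call, for example:
# s5_cs <- suppressWarnings(att_gt(...))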

s5_cs_ag <- aggte(s5_cs, type="simple")
summary(s5_cs_ag)
## 
## Call:
## aggte(MP = s5_cs, type = "simple")
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
## 
## 
##  ATT    Std. Error     [ 95%  Conf. Int.] 
##    3            NA         NA          NA 
## 
## 
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## Control Group:  Not Yet Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust

Here, we see the ATT is equal to 3 as estimated by the TWFE model.

7.5 Target Trial Estimator

We can also calculate the estimate using the Target Trial Estimator. Here we use fit_event_jack_sum() because the effect is not heterogeneous.

s5_bm <- fit_event_jack_sum(outcome_var = "outcome", 
                            date_var = "time_as_date", 
                            unit_var = "state",
                            policy_var = "time_first_trt_date", 
                            data = s5, 
                            max_time_to = 10000)
## Dropping 1 
## Dropping 2 
## Dropping 3 
## Dropping 4
s5_bm %<>% mutate(lb = estimate - 1.96*se,
                      ub = estimate + 1.96*se)

s5_bm
##   se estimate lb ub
## 1  0        3  3  3

This method also estimates an effect of 3, the true effect, matching the other estimators.

7.6 Summary

When a policy’s introduction is staggered in time and the treatment effect is constant (not dynamic or heterogeneous), TWFE can be used to estimate the treatment effect.

8 Scenario 6 (4 states, staggered timing, dynamic effects)

Scenario 6 is similar to Scenario 5 except now the treatment effect increases as time goes on, i.e., it is dynamic. The magnitude of the treatment effect is the same for both treated groups.

# here we also specify the range of the data to be read into R because we don't
# want to include other information included in this sheet
s6 <- read_xlsx(here("data", "scenarios.xlsx"), sheet = "scen6", range = "A1:H81",
                col_types = c("text", "numeric", "text", "numeric", "text", "numeric",
                              "date", "date"))

8.1 Visualization of Scenario 6

Visualization of all comparisons made by TWFE regression

Contrast 1:

  • Pre-post difference for never-treated state 1 is: 43.5-13.5=30
  • Pre-post difference for never-treated state 2 is: 44.5-14.5=30
  • Implying the average pre-post difference equals 30 for the never-treated states
  • Pre-post difference for treated state 3 is 57-16.5 = 40.5
  • \(DID_{3 vs 1,2} = 40.5-30 = 10.5\)

Contrast 2:

  • Pre-post difference for never-treated state 1 is: 54-24=30
  • Pre-post difference for never-treated state 2 is: 55-25=30
  • Implying the average pre-post difference equals 30 for the never-treated states
  • Pre-post difference for treated state 4 is 70-33= 37
  • \(DID_{4 vs 1,2} = 37-30 = 7\)

Contrast 3:

  • Pre-post difference for later-treated state 4 is: 39-22.5=16.5
  • Pre-post difference for earlier-treated state 3 is: 39-16.5=22.5
  • \(DID_{3 vs 4} = 22.5-16.5 = 6\)

Contrast 4:

  • Pre-post difference for earlier-treated state 3 is: 71-39=32
  • Pre-post difference for later-treated state 4 is: 70-39=31
  • \(DID_{4 vs 3} = 31-32 = -1\)

Contrasts 1 and 2 are okay because they are clean comparisons between never-treated and treated states. Contrasts 3 and 4 are trickier. Contrast 3 compares the earlier treated state to the later treated state. It is okay because parallel trends holds and it only uses the time before the later state is treated to inform the estimation. Contrast 4 is the problem – it uses post-treatment time from state 3 as the “pre” time for the comparison. The issue is that this post time includes the effects of the treatment, and since the effect is dynamic it does not satisfy the parallel trends assumption. However this contrast still contributes to the TWFE regression estimate. To see this, we first calculate the TWFE regression estimate and then use the Goodman Bacon decomposition to see how much weight each contrast contributes to the TWFE estimate.

8.2 Calculation of summary measures of the causal effect

In the previous scenarios, it has been relatively straightforward to identify the true causal effect that we would like to estimate. With dynamic effects, there are a few possibilities:

Time-specific dynamic effects: Calculate the causal effect of the policy at each time post-policy change. To see how these are calculated, see columns I and J in the “scen6” sheet of the Excel spreadsheet containing the imported data. Column J corresponds to the difference between the observed outcome (after the policy change) and the counterfactual outcome had the treated state not been treated. This table summarizes the causal effect over time since treatment:

Time since policy change   Causal effect of the policy
1                          3
2                          4
...                        ...
16                         18

This is the case for both the earlier- and the later-treated state, implying that the ATT at the first time point equals 3, and so on (i.e., these are averages across the two treated states). When estimating the effects dynamically, these are the time-since-treatment-specific parameters we aim to estimate.

You may prefer a summary estimate that aggregates across all post-treatment time into one number. Three possible ways to do this are as follows:

📘 Causal Effect A

Take an average of the estimated effects across the treated units:

\(\frac{\text{Number in adoption cohort 1}}{\text{Number of treated units}} \times ATT_{\text{adoption cohort 1}} + \frac{\text{Number in adoption cohort 2}}{\text{Number of treated units}} \times ATT_{\text{adoption cohort 2}}\)

\(\frac{1}{2}\times 10.5 + \frac{1}{2}\times 7 = 8.75\)



📙 Causal Effect B

Take the average of the causal effects across all the time points:

\[\frac{ATT_{t=1} + ATT_{t=2} +...+ ATT_{t=16}}{\text{Number of post-treatment times}} = \frac{3 + 4 + ... + 18}{16} = \frac{168}{16} = 10.5\]

📗 Causal Effect C

Take a weighted average of the causal effects, where the weights correspond to the number of units treated at each time point:

\[\frac{2\times ATT_{t=1} + 2\times ATT_{t=2} + ... + 2\times ATT_{t=9} + ATT_{t=10} + ... + ATT_{t=16}}{\text{Total number of state-time points}}\]

\[\frac{2\times(3+4+...+11) + (12+13+...+18)}{25} = 9.24\]
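
As a quick check of the arithmetic behind these three summaries (plain R arithmetic only; nothing beyond the numbers above is assumed):

att_by_event <- 3:18              # ATTs at times 1 through 16 since the policy change
(10.5 + 7) / 2                    # Causal Effect A: 8.75
mean(att_by_event)                # Causal Effect B: 10.5
(2 * sum(3:11) + sum(12:18)) / 25 # Causal Effect C: 9.24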

8.3 TWFE Regression Model

s6_mod <- lm(outcome ~ policy + state + factor(time), 
             data = s6)
tidy(s6_mod)
## # A tibble: 24 × 5
##    term          estimate std.error statistic  p.value
##    <chr>            <dbl>     <dbl>     <dbl>    <dbl>
##  1 (Intercept)       8.24     1.28       6.44 2.90e- 8
##  2 policy1           6.80     1.01       6.72 1.02e- 8
##  3 state2            1.00     0.731      1.37 1.77e- 1
##  4 state3            5.96     1.09       5.46 1.11e- 6
##  5 state4            9.09     0.861     10.6  6.23e-15
##  6 factor(time)2     3.00     1.63       1.84 7.17e- 2
##  7 factor(time)3     6.00     1.63       3.67 5.41e- 4
##  8 factor(time)4     9.00     1.63       5.51 9.51e- 7
##  9 factor(time)5    11.1      1.65       6.68 1.16e- 8
## 10 factor(time)6    14.3      1.65       8.65 6.76e-12
## # ℹ 14 more rows

The coefficient of the policy term equals 6.8. This is smaller than any of 📘 Causal Effect A, 📙 Causal Effect B, or 📗 Causal Effect C. To see how the TWFE came to be this number, we use the Goodman-Bacon decomposition to see how much weight each of the contrasts contributes to the effect estimate:

8.4 Goodman Bacon Decomposition

s6 %<>% mutate(state_n = as.numeric(as.character(state)),
               policy_n = as.numeric(as.character(policy)))

s6_bacon <- bacon(outcome ~ policy_n,
      data = s6,
      id_var = "state_n",
      time_var = "time")
##                       type  weight  avg_est
## 1 Earlier vs Later Treated 0.06715  6.00000
## 2 Later vs Earlier Treated 0.15108 -1.00000
## 3     Treated vs Untreated 0.78177  8.37423

It is more helpful to view the four contrasts separately:

s6_bacon
##   treated untreated estimate     weight                     type
## 2       5     99999     10.5 0.30695444     Treated vs Untreated
## 3      12     99999      7.0 0.47482014     Treated vs Untreated
## 6      12         5     -1.0 0.15107914 Later vs Earlier Treated
## 8       5        12      6.0 0.06714628 Earlier vs Later Treated

The estimates of the treatment effects for each contrast are as we calculated them above. Note that 15% of the weight of the TWFE estimate is on Contrast 4 – the one making the improper comparison between an earlier-treated and a later-treated state, where parallel trends does not hold.

We can also confirm that you get the TWFE estimate by taking the weighted average of the Goodman-Bacon decomposition components:

#see the TWFE estimate
sum(s6_bacon$estimate*s6_bacon$weight)
## [1] 6.798561

The decomposition shows that the TWFE regression estimate is biased because it incorporates a contrast that we would not want to make in practice between an earlier treated and a later treated group. To overcome this, we use one of the new estimators. Consider the Group-Time ATT to start.

8.5 Group-Time ATT Estimator

Like in the earlier examples, we start by running the att_gt() function. Unlike before, we display the output from this step using summary(). This output shows the estimated ATT for each combination of treated state and time. The table is long because it includes estimates for pre-policy time, which helps with the evaluation of the parallel trends assumption and with seeing whether there are any lead effects (“anticipation”) of the treatment on the outcome. You wouldn’t usually report this entire table, but it is worth showing here to see how the estimated effect is dynamic and that decisions need to be made about how to report dynamic effects, including whether or not to aggregate the effect, and if so, at what level.

s6_cs <- att_gt(yname = "outcome", 
             tname = "time", 
             idname = "state_n", 
             gname = "time_first_treat", 
             data = s6, 
             control_group = "notyettreated",
             anticipation = 0)
## No pre-treatment periods to test
summary(s6_cs)
## 
## Call:
## att_gt(yname = "outcome", tname = "time", idname = "state_n", 
##     gname = "time_first_treat", data = s6, control_group = "notyettreated", 
##     anticipation = 0)
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
## 
## Group-Time Average Treatment Effects:
##  Group Time ATT(g,t) Std. Error [95% Simult.  Conf. Band] 
##      5    2        0         NA            NA          NA 
##      5    3        0         NA            NA          NA 
##      5    4        0         NA            NA          NA 
##      5    5        3         NA            NA          NA 
##      5    6        4         NA            NA          NA 
##      5    7        5         NA            NA          NA 
##      5    8        6         NA            NA          NA 
##      5    9        7         NA            NA          NA 
##      5   10        8         NA            NA          NA 
##      5   11        9         NA            NA          NA 
##      5   12       10         NA            NA          NA 
##      5   13       11         NA            NA          NA 
##      5   14       12         NA            NA          NA 
##      5   15       13         NA            NA          NA 
##      5   16       14         NA            NA          NA 
##      5   17       15         NA            NA          NA 
##      5   18       16         NA            NA          NA 
##      5   19       17         NA            NA          NA 
##      5   20       18         NA            NA          NA 
##     12    2        0         NA            NA          NA 
##     12    3        0         NA            NA          NA 
##     12    4        0         NA            NA          NA 
##     12    5        0         NA            NA          NA 
##     12    6        0         NA            NA          NA 
##     12    7        0         NA            NA          NA 
##     12    8        0         NA            NA          NA 
##     12    9        0         NA            NA          NA 
##     12   10        0         NA            NA          NA 
##     12   11        0         NA            NA          NA 
##     12   12        3         NA            NA          NA 
##     12   13        4         NA            NA          NA 
##     12   14        5         NA            NA          NA 
##     12   15        6         NA            NA          NA 
##     12   16        7         NA            NA          NA 
##     12   17        8         NA            NA          NA 
##     12   18        9         NA            NA          NA 
##     12   19       10         NA            NA          NA 
##     12   20       11         NA            NA          NA 
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## Control Group:  Not Yet Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust

Callaway and Sant’Anna provide many options for aggregating the group-time effects. The simplest option is to specify type = "simple" in the aggte() function. This estimate considers only the contrasts with a never-treated state (i.e., Comparisons 1 and 2 in the figure above) and combines them into a weighted average, where the weights correspond to each adoption cohort’s time spent in the post-treatment period. For contrast 1, there are 16 time periods post-treatment and for contrast 2, there are 9 periods post-treatment. Thus the weighted average is: \(10.5\times(16/25) + 7\times(9/25)=\) 9.24. Note that this corresponds to 📗 Causal Effect C calculated in Section 8.2 above.
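
You can verify this weighted average directly:

10.5 * (16/25) + 7 * (9/25)
## [1] 9.24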

#aggregate the group time average treatment effects
#type = "simple": weighted average of all group-time average treatment effects
#with weights proportional to the group size
s6_cs_ag <- aggte(s6_cs, type="simple")
summary(s6_cs_ag)
## 
## Call:
## aggte(MP = s6_cs, type = "simple")
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
## 
## 
##   ATT    Std. Error     [ 95%  Conf. Int.]  
##  9.24        1.1956     6.8967     11.5833 *
## 
## 
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## Control Group:  Not Yet Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust

Alternatively, we can specify type = "group" to get separate effect estimates according to time of implementation. Note that group denotes the time of the policy’s introduction for the treated states.

s6_cs_ag2 <- aggte(s6_cs, type = "group")
## Warning in compute.aggte(MP = MP, type = type, balance_e = balance_e, min_e =
## min_e, : Simultaneous critival value is NA. This probably happened because we
## cannot compute t-statistic (std errors are NA). We then report pointwise conf.
## intervals.
summary(s6_cs_ag2)
## 
## Call:
## aggte(MP = s6_cs, type = "group")
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
## 
## 
## Overall summary of ATT's based on group/cohort aggregation:  
##   ATT    Std. Error     [ 95%  Conf. Int.] 
##  8.75            NA         NA          NA 
## 
## 
## Group Effects:
##  Group Estimate Std. Error [95% Pointwise  Conf. Band] 
##      5     10.5         NA              NA          NA 
##     12      7.0         NA              NA          NA 
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## Control Group:  Not Yet Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust

The output also displays an estimate of the Overall ATT, equal to 8.75. This estimate is different from the one estimated where type = "simple", and is equal to 📘 Causal Effect A. Thus, the researcher may want to estimate the ATT using this method if they prefer these weights over the weights specified by the previous model.

You can also estimate the dynamic effect separately for each time since the treatment was introduced. Remember that this effect estimation is also done for pre-treatment time, so don’t be surprised by all the rows in the outputted table!

s6_cs_ag3 <- aggte(s6_cs, type="dynamic")
## Warning in compute.aggte(MP = MP, type = type, balance_e = balance_e, min_e =
## min_e, : Simultaneous critival value is NA. This probably happened because we
## cannot compute t-statistic (std errors are NA). We then report pointwise conf.
## intervals.
summary(s6_cs_ag3)
## 
## Call:
## aggte(MP = s6_cs, type = "dynamic")
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
## 
## 
## Overall summary of ATT's based on event-study/dynamic aggregation:  
##   ATT    Std. Error     [ 95%  Conf. Int.] 
##  10.5            NA         NA          NA 
## 
## 
## Dynamic Effects:
##  Event time Estimate Std. Error [95% Pointwise  Conf. Band] 
##         -10        0         NA              NA          NA 
##          -9        0         NA              NA          NA 
##          -8        0         NA              NA          NA 
##          -7        0         NA              NA          NA 
##          -6        0         NA              NA          NA 
##          -5        0         NA              NA          NA 
##          -4        0         NA              NA          NA 
##          -3        0         NA              NA          NA 
##          -2        0         NA              NA          NA 
##          -1        0         NA              NA          NA 
##           0        3         NA              NA          NA 
##           1        4         NA              NA          NA 
##           2        5         NA              NA          NA 
##           3        6         NA              NA          NA 
##           4        7         NA              NA          NA 
##           5        8         NA              NA          NA 
##           6        9         NA              NA          NA 
##           7       10         NA              NA          NA 
##           8       11         NA              NA          NA 
##           9       12         NA              NA          NA 
##          10       13         NA              NA          NA 
##          11       14         NA              NA          NA 
##          12       15         NA              NA          NA 
##          13       16         NA              NA          NA 
##          14       17         NA              NA          NA 
##          15       18         NA              NA          NA 
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## Control Group:  Not Yet Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust

Again, the Overall ATT effect estimate is different from the other two. Here, it is a simple average of all effects in the post-treatment time. This is equal to \((3 + 4 + ... + 18)/16 = 10.5\), which equals 📙 Causal Effect B.

If you are more interested in the dynamic effect over time, it is helpful to plot the estimates using the ggdid() function:

ggdid(s6_cs_ag3)

Lastly, you can use type = "calendar" to aggregate the effects over calendar time. Epidemiologically, this is not sensible in the context of dynamic effects because it mixes effect estimates across different times since treatment was introduced. In some calendar time periods, only one comparison contributes to the estimate (e.g., periods where only one group has introduced the treatment), while at other points two comparisons contribute, but this “knowledge” is lost in the presentation. We don’t show the output from using type = "calendar" but include the code below in case it makes sense for your setting.

#calendar=time specific
#not recommended for this setting...but Callaway and Sant'Anna say they prefer
#it here https://bcallaway11.github.io/did/articles/did-basics.html
s6_cs_ag4 <- aggte(s6_cs, type = "calendar")
summary(s6_cs_ag4)
ggdid(s6_cs_ag4)

8.6 Target Trial Estimator

We can also estimate the effects using the Target Trial estimator. This time, we start with the fit_event_jack() function, which gives an estimate for each time since treatment. This estimator yields the same estimates as the Group-Time ATT approach.

s6_bm <- fit_event_jack(outcome_var = "outcome", 
                            date_var = "time_as_date", 
                            unit_var = "state",
                            policy_var = "time_first_trt_date", 
                            data = s6, 
                            max_time_to = 10e7)
## Dropping 1 
## Dropping 2 
## Dropping 3 
## Dropping 4
## `summarise()` has grouped output by 'cohort'. You can override using the
## `.groups` argument.
## Joining with `by = join_by(cohort, event_time)`
s6_bm_ave <- s6_bm %>% filter(cohort == "average")

s6_bm_ave %<>% mutate(lb = estimate - 1.96*se,
                      ub = estimate + 1.96*se)

ggplot(s6_bm_ave, aes(x = event_time, y = estimate)) + 
  geom_hline(yintercept = 0, lty = 2) +
  geom_point(aes(col = event_time >= 0)) + 
  geom_linerange(aes(ymin = lb, ymax = ub, col = event_time >= 0)) + 
  labs(y = "Estimate", x = "Event time") + 
  theme_bw() + 
  scale_color_discrete(labels=c('pre', 'post')) +
  theme(legend.title=element_blank(), legend.position = "bottom")

To summarize the dynamic effects into one aggregated effect estimate we use the fit_event_jack_sum() function:

s6_bm2 <- fit_event_jack_sum(outcome_var = "outcome", 
                            date_var = "time_as_date", 
                            unit_var = "state",
                            policy_var = "time_first_trt_date", 
                            data = s6, 
                            max_time_to = 10e7)
## Dropping 1 
## Dropping 2 
## Dropping 3 
## Dropping 4
s6_bm2$estimate
## [1] 10.5

The aggregated effect estimate equals 10.5, equivalent to the Overall ATT from the Group-Time estimator when type = "dynamic" and to 📙 Causal Effect B.

8.7 TWFE with time_since_policy specification

But what if we model TWFE with the time_since_policy indicators?

s6_mod2 <- lm(outcome ~ time_since_policy + state + factor(time), 
             data = s6)
tidy(s6_mod2) %>% filter(str_detect(term, "time_since")) %>% 
  mutate(time = as.numeric(gsub("time_since_policy", "", term))) %>% 
  arrange(time) 
## Warning in summary.lm(x): essentially perfect fit: summary may be unreliable
## # A tibble: 16 × 6
##    term                estimate std.error statistic p.value  time
##    <chr>                  <dbl>     <dbl>     <dbl>   <dbl> <dbl>
##  1 time_since_policy1      3.00  2.42e-14   1.24e14       0     1
##  2 time_since_policy2      4.00  2.42e-14   1.65e14       0     2
##  3 time_since_policy3      5.00  2.46e-14   2.03e14       0     3
##  4 time_since_policy4      6.00  2.46e-14   2.44e14       0     4
##  5 time_since_policy5      7.00  2.46e-14   2.84e14       0     5
##  6 time_since_policy6      8.00  2.46e-14   3.25e14       0     6
##  7 time_since_policy7      9.00  2.46e-14   3.65e14       0     7
##  8 time_since_policy8     10.0   2.55e-14   3.92e14       0     8
##  9 time_since_policy9     11.0   2.55e-14   4.31e14       0     9
## 10 time_since_policy10    12.0   3.56e-14   3.37e14       0    10
## 11 time_since_policy11    13.0   3.56e-14   3.66e14       0    11
## 12 time_since_policy12    14.0   3.56e-14   3.94e14       0    12
## 13 time_since_policy13    15.0   3.56e-14   4.22e14       0    13
## 14 time_since_policy14    16.0   3.56e-14   4.50e14       0    14
## 15 time_since_policy15    17.0   3.58e-14   4.75e14       0    15
## 16 time_since_policy16    18.0   3.58e-14   5.03e14       0    16

The regression model still works! The estimates from the policy indicator variables equal those estimated by the Group-Time estimator when specified using type = "dynamic" and the Target Trial approach, which are all equivalent to the time-since-treatment-specific parameters we identified earlier.

8.8 Summary

When the treatment effect is staggered and dynamic, you can still capture the effect estimate using a TWFE model so long as the policy effect is modeled using the time_since_policy variable. The key question for the researcher is to identify the parameter of interest – are you interested in estimating a dynamic effect, or does a parameter that summarizes over treatment time make sense?

Recommendation: When there are multiple time periods, start by estimating a dynamic effect to see if the model supports its presence (i.e., does the effect change over time or is it constant?). If the effect appears dynamic, this is important because it is indicative of how the policy operates after being rolled out. If there is no strong support for a dynamic effect, then consider estimating the effects separately by timing of introduction if that is sensible for the policy change under study (i.e., if there are only a few separate time points), or estimating one overall summary parameter.

9 Scenario 7 (4 states, staggered timing, heterogeneous effects)

Scenario 7 is similar to Scenario 5 except now the treatment effect is different for the two treated states, i.e., it is heterogeneous.

# here we also specify the range of the data to be read into R because we don't
# want to include other information included in this sheet
s7 <- read_xlsx(here("data", "scenarios.xlsx"), sheet = "scen7", range = "A1:H81",
                col_types = c("text", "numeric", "text", "numeric", "text", "numeric",
                              "date", "date"))

9.1 Visualization of Scenario 7

Visualization of all comparisons made by TWFE regression

Contrast 1:

  • Pre-post difference for never-treated state 1 is: 43.5-13.5=30
  • Pre-post difference for never-treated state 2 is: 44.5-14.5=30
  • Implying the average pre-post difference equals 30 for the never-treated states
  • Pre-post difference for treated state 3 is 52.5-16.5 = 36
  • \(DID_{3 vs 1,2} = 36-30 = 6\)

Contrast 2:

  • Pre-post difference for never-treated state 1 is: 54-24=30
  • Pre-post difference for never-treated state 2 is: 55-25=30
  • Implying the average pre-post difference equals 30 for the never-treated states
  • Pre-post difference for treated state 4 is 67-33= 34
  • \(DID_{4 vs 1,2} = 34-30 = 4\)

Contrast 3:

  • Pre-post difference for later-treated state 4 is: 39-22.5=16.5
  • Pre-post difference for earlier-treated state 3 is: 39-16.5=22.5
  • \(DID_{3 vs 4} = 22.5-16.5 = 6\)

Contrast 4:

  • Pre-post difference for earlier-treated state 3 is: 63-39=24
  • Pre-post difference for later-treated state 4 is: 67-39=28
  • \(DID_{4 vs 3} = 28-24 = 4\)
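
These four contrasts can be verified with a few lines of arithmetic in R. This is a small sketch added for illustration (it is not part of the original code); the pre/post means are taken from the bullets above.

# DID contrasts computed from the pre/post means listed above
did_3_vs_never <- (52.5 - 16.5) - mean(c(43.5 - 13.5, 44.5 - 14.5))  # Contrast 1 = 6
did_4_vs_never <- (67 - 33) - mean(c(54 - 24, 55 - 25))              # Contrast 2 = 4
did_3_vs_4     <- (39 - 16.5) - (39 - 22.5)                          # Contrast 3 = 6
did_4_vs_3     <- (67 - 39) - (63 - 39)                              # Contrast 4 = 4
c(did_3_vs_never, did_4_vs_never, did_3_vs_4, did_4_vs_3)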

Contrasts 1 and 2 are definitely okay because they are clean comparisons between never-treated and treated states. Contrasts 3 and 4 are trickier; however, we can see from the diagram that parallel trends is still satisfied. So, even though the model uses a previously treated state as a control, this is okay: the causal effect of the treatment, although heterogeneous across states, is constant over time, so using a previously treated state as a control does not lead to a violation of the parallel trends assumption. Thus, we are comfortable with all contrasts contributing to the TWFE regression estimate. To see how much weight is put on each one, we use the Goodman-Bacon decomposition, but first we consider the different causal effects that can be calculated.

9.2 Calculation of summary measures of the causal effect

Once the setting includes both heterogeneous effects and staggered timing, there is a question of how to aggregate the causal effect estimates across multiple treated units into one estimate. Here are three causal effects that could be calculated:

📘 Causal Effect A

Take an average of the estimated effects across the treated units:

\(\frac{\text{Number in adoption cohort 1}}{\text{Number of treated units}} \times ATT_{\text{adoption cohort 1}} + \frac{\text{Number in adoption cohort 2}}{\text{Number of treated units}} \times ATT_{\text{adoption cohort 2}}\)

\(\frac{1}{2} \times 6 + \frac{1}{2} \times 4 = 5\)

📙 Causal Effect B

Take an average of the effects across all post-treatment time points:

\(\frac{ATT_{t=1} + ATT_{t=2} + ... + ATT_{t=16}}{\text{Number of post-treatment times}}\)

\(\frac{[5\times9] + [6\times7]}{16} = \frac{87}{16} = 5.4375\)

📗 Causal Effect C

Take a weighted average of the post-treatment-time-specific effects, where the weights correspond to the number of units treated at each time point:

\(\frac{2\times[ATT_{t=1} + ATT_{t=2} + ... + ATT_{t=9}] + 1\times[ATT_{t=10} + ... + ATT_{t=16}]}{\text{Total number of state-time points}}\)

\(\frac{2\times(5+5+...+5) + 1\times(6+...+6)}{25} = 5.28\)
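
The three summary measures can also be reproduced with a few lines of R. This is a small sketch added for illustration (it is not part of the original analysis code); the cohort-specific ATTs of 6 and 4 are taken from the contrasts above.

# hand-calculation of the three summary measures (sketch)
att_early <- 6                                   # ATT for the earlier-treated state
att_late  <- 4                                   # ATT for the later-treated state
(att_early + att_late) / 2                       # Causal Effect A = 5
att_t <- c(rep((att_early + att_late) / 2, 9),   # times 1-9: both treated states observed
           rep(att_early, 7))                    # times 10-16: only the earlier-treated state
mean(att_t)                                      # Causal Effect B = 5.4375
(2 * sum(att_t[1:9]) + sum(att_t[10:16])) / 25   # Causal Effect C = 5.28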

Each of these causal effects is a valid measure to calculate, and each comes with strengths and weaknesses. The researcher needs to decide which one to calculate. Let’s first see how the estimate from the TWFE model compares.

9.3 TWFE Regression Model

s7_mod <- lm(outcome ~ policy + state + factor(time), 
             data = s7)
tidy(s7_mod)
## # A tibble: 24 × 5
##    term          estimate std.error statistic  p.value
##    <chr>            <dbl>     <dbl>     <dbl>    <dbl>
##  1 (Intercept)       8.83     0.191     46.2  2.75e-46
##  2 policy1           4.75     0.151     31.4  3.16e-37
##  3 state2            1.00     0.109      9.15 1.02e-12
##  4 state3            4.00     0.163     24.5  1.20e-31
##  5 state4            8.66     0.129     67.3  2.87e-55
##  6 factor(time)2     3.00     0.244     12.3  1.59e-17
##  7 factor(time)3     6.00     0.244     24.6  1.14e-31
##  8 factor(time)4     9.00     0.244     36.8  5.94e-41
##  9 factor(time)5    12.3      0.247     49.8  4.53e-48
## 10 factor(time)6    15.3      0.247     62.0  2.79e-53
## # ℹ 14 more rows

The coefficient of the policy term equals 4.75. This is smaller than any of 📘 Causal Effect A , 📙 Causal Effect B , or 📗 Causal Effect C. To see how the TWFE model arrived at this number, we use the Goodman-Bacon decomposition to determine how much weight each of the contrasts contributes to the effect estimate.

9.4 Goodman-Bacon Decomposition

s7 %<>% mutate(state_n = as.numeric(as.character(state)),
               policy_n = as.numeric(as.character(policy)))

s7_bacon <- bacon(outcome ~ policy_n,
      data = s7,
      id_var = "state_n",
      time_var = "time")
##                       type  weight avg_est
## 1 Earlier vs Later Treated 0.06715 6.00000
## 2 Later vs Earlier Treated 0.15108 4.00000
## 3     Treated vs Untreated 0.78177 4.78528

It is more helpful to view the four contrasts separately:

s7_bacon
##   treated untreated estimate     weight                     type
## 2       5     99999        6 0.30695444     Treated vs Untreated
## 3      12     99999        4 0.47482014     Treated vs Untreated
## 6      12         5        4 0.15107914 Later vs Earlier Treated
## 8       5        12        6 0.06714628 Earlier vs Later Treated

The estimates of the treatment effects for each contrast are as we calculated them above. However, the TWFE estimate of 4.75 is pulled towards 4 because the combined weight put on the effect estimate of 4 is higher than the combined weight put on the effect estimate of 6. This leads to an effect estimate from TWFE that is smaller than any of the causal effect estimates.

The TWFE regression is estimating a different causal estimand than any of the three we initially proposed. The newly proposed estimators overcome this issue and are considered next.

9.5 Group-Time ATT Estimator

Like in the earlier examples we start by running the att_gt() function.

s7_cs <- att_gt(yname = "outcome", 
             tname = "time", 
             idname = "state_n", 
             gname = "time_first_treat", 
             data = s7, 
             control_group = "notyettreated",
             anticipation = 0)
## No pre-treatment periods to test

Callaway and Sant’Anna provide many options for aggregating the group-time effects. The simplest option is to specify type = "simple" in the aggte() function. This estimate considers only the contrasts with a never-treated state (i.e., Contrasts 1 and 2 in the figure above) and combines them into a weighted average, where the weights correspond to each adoption cohort’s time spent in the post-treatment period. For contrast 1, there are 16 time periods post-treatment and for contrast 2, there are 9 periods post-treatment. Thus the weighted average is: \(6\times(16/25) + 4\times(9/25)=\) 5.28. This is the same as 📗 Causal Effect C calculated above.
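
Before running aggte(), we can verify this weighted average by hand (a quick check added here; not part of the original code):

6 * (16/25) + 4 * (9/25)
## [1] 5.28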

#aggregate the group time average treatment effects
#type = "simple": weighted average of all group-time average treatment effects
#with weights proportional to the group size
s7_cs_ag <- aggte(s7_cs, type="simple")
summary(s7_cs_ag)
## 
## Call:
## aggte(MP = s7_cs, type = "simple")
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
## 
## 
##   ATT    Std. Error     [ 95%  Conf. Int.]  
##  5.28        0.6832      3.941       6.619 *
## 
## 
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## Control Group:  Not Yet Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust

Alternatively, we can specify type = "group" to get separate effect estimates according to time of implementation. In the output Group denotes the time of the policy’s introduction for the treated states.

s7_cs_ag2 <- aggte(s7_cs, type = "group")
## Warning in compute.aggte(MP = MP, type = type, balance_e = balance_e, min_e =
## min_e, : Simultaneous critival value is NA. This probably happened because we
## cannot compute t-statistic (std errors are NA). We then report pointwise conf.
## intervals.
summary(s7_cs_ag2)
## 
## Call:
## aggte(MP = s7_cs, type = "group")
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
## 
## 
## Overall summary of ATT's based on group/cohort aggregation:  
##  ATT    Std. Error     [ 95%  Conf. Int.] 
##    5            NA         NA          NA 
## 
## 
## Group Effects:
##  Group Estimate Std. Error [95% Pointwise  Conf. Band] 
##      5        6         NA              NA          NA 
##     12        4         NA              NA          NA 
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## Control Group:  Not Yet Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust

The output also displays an estimate of the Overall ATT, equal to 5. This estimate is different from the one estimated where type = "simple". Here, the estimate is a weighted average, with weights proportional to the number of units in each adoption cohort (e.g., \((1/2)\times 6 + (1/2) \times 4=\) 5). This is equal to 📘 Causal Effect A calculated above.

Alternatively, we can specify type = "dynamic" to get separate effect estimates for each time period. We can also see that the Overall ATT from this call is equal to 📙 Causal Effect B calculated above.

s7_cs_ag2 <- aggte(s7_cs, type = "dynamic")
## Warning in compute.aggte(MP = MP, type = type, balance_e = balance_e, min_e =
## min_e, : Simultaneous conf. band is somehow smaller than pointwise one using
## normal approximation. Since this is unusual, we are reporting pointwise
## confidence intervals
summary(s7_cs_ag2)
## 
## Call:
## aggte(MP = s7_cs, type = "dynamic")
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
## 
## 
## Overall summary of ATT's based on event-study/dynamic aggregation:  
##     ATT    Std. Error     [ 95%  Conf. Int.]  
##  5.4375         0.417     4.6202      6.2548 *
## 
## 
## Dynamic Effects:
##  Event time Estimate Std. Error [95% Pointwise  Conf. Band]  
##         -10        0         NA              NA          NA  
##          -9        0         NA              NA          NA  
##          -8        0         NA              NA          NA  
##          -7        0         NA              NA          NA  
##          -6        0         NA              NA          NA  
##          -5        0         NA              NA          NA  
##          -4        0         NA              NA          NA  
##          -3        0         NA              NA          NA  
##          -2        0         NA              NA          NA  
##          -1        0         NA              NA          NA  
##           0        5     1.4826          2.0942      7.9058 *
##           1        5     0.7413          3.5471      6.4529 *
##           2        5     0.7413          3.5471      6.4529 *
##           3        5     0.7413          3.5471      6.4529 *
##           4        5     0.7413          3.5471      6.4529 *
##           5        5     1.4826          2.0942      7.9058 *
##           6        5     0.7413          3.5471      6.4529 *
##           7        5         NA              NA          NA  
##           8        5     0.7413          3.5471      6.4529 *
##           9        6         NA              NA          NA  
##          10        6         NA              NA          NA  
##          11        6         NA              NA          NA  
##          12        6         NA              NA          NA  
##          13        6         NA              NA          NA  
##          14        6         NA              NA          NA  
##          15        6         NA              NA          NA  
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## Control Group:  Not Yet Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust

9.6 Target Trial Estimator

We can also estimate the effects using the Target Trial estimator. This time, we start with the fit_event_jack() function which gives an estimate for each time since treatment.

s7_bm <- fit_event_jack(outcome_var = "outcome", 
                            date_var = "time_as_date", 
                            unit_var = "state",
                            policy_var = "time_first_trt_date", 
                            data = s7, 
                            max_time_to = 10e7)
## Dropping 1 
## Dropping 2 
## Dropping 3 
## Dropping 4
## `summarise()` has grouped output by 'cohort'. You can override using the
## `.groups` argument.
## Joining with `by = join_by(cohort, event_time)`
s7_bm_ave <- s7_bm %>% filter(cohort == "average")

s7_bm_ave %<>% mutate(lb = estimate - 1.96*se,
                      ub = estimate + 1.96*se)

The plot below shows that the time-specific estimates equal 5 when there are two states contributing to the estimate (one with a causal effect of 4 and the other with a causal effect of 6) and 6 when only the early-introduction state contributes to the causal effect estimate.

ggplot(s7_bm_ave, aes(x = event_time, y = estimate)) + 
  geom_hline(yintercept = 0, lty = 2) +
  geom_point(aes(col = event_time >= 0)) + 
  geom_linerange(aes(ymin = lb, ymax = ub, col = event_time >= 0)) + 
  labs(y = "Estimate", x = "Event time") + 
  theme_bw() + 
  scale_color_discrete(labels=c('pre', 'post')) +
  theme(legend.title=element_blank(), legend.position = "bottom")

To summarize the effects into one aggregated effect estimate we use one of the fit_event_jack_sum(), fit_event_jack_sum_hte(), or fit_event_jack_sum_C() functions. The functions estimate different causal effects, so choose the one that corresponds to the effect estimate you are interested in.

Let’s calculate the aggregated estimate using fit_event_jack_sum() first:

s7_bm2 <- fit_event_jack_sum(outcome_var = "outcome", 
                            date_var = "time_as_date", 
                            unit_var = "state",
                            policy_var = "time_first_trt_date", 
                            data = s7, 
                            max_time_to = 10e7)
## Dropping 1 
## Dropping 2 
## Dropping 3 
## Dropping 4
s7_bm2$estimate
## [1] 5.4375

The aggregated effect estimate using fit_event_jack_sum equals 5.4375, equivalent to the 📙 Causal Effect B.

We can compare this to the aggregated effect estimate from fit_event_jack_sum_hte():

s7_bm3 <- fit_event_jack_sum_hte(outcome_var = "outcome", 
                            date_var = "time_as_date", 
                            unit_var = "state",
                            policy_var = "time_first_trt_date", 
                            data = s7, 
                            max_time_to = 10e7)
## Dropping 1 
## Dropping 2 
## Dropping 3 
## Dropping 4
s7_bm3$estimate
## [1] 5

The aggregated effect estimate using fit_event_jack_sum_hte() equals 5, equivalent to the 📘 Causal Effect A.

Finally, we can compare this to the aggregated effect estimate from fit_event_jack_sum_C():

s7_bm4 <- fit_event_jack_sum_C(outcome_var = "outcome", 
                            date_var = "time_as_date", 
                            unit_var = "state",
                            policy_var = "time_first_trt_date", 
                            data = s7, 
                            max_time_to = 10e7)
## Dropping 1 
## Dropping 2 
## Dropping 3 
## Dropping 4
s7_bm4$estimate
## [1] 5.28

The aggregated effect estimate using fit_event_jack_sum_C() equals 5.28, equivalent to the 📗 Causal Effect C.

9.7 Summary

When the treatment effect is staggered and heterogeneous, the TWFE estimate aggregates the heterogeneous effects using weights that are calculated by the model and are not concordant with the weights researchers would intuitively choose. In this example, this led to an effect estimate that was lower than any of the three causal effects we aimed to estimate. Thus, TWFE performs unfavorably when effects are heterogeneous across adoption cohorts.

Recommendation: Use the aggte() function with type = "group" to estimate effects separately for each adoption cohort alongside confidence intervals. Examine the difference in effects across adoption cohorts and their associated precision. Remember that the number of units contributing to each adoption cohort affects the width of its confidence interval, which may make it difficult to draw conclusive statements about heterogeneity across adoption cohorts when data are limited.

10 Scenario 8 (4 states, staggered timing, dynamic and heterogeneous treatment effects)

Scenario 8 combines the complexities across Scenarios 6 and 7. In this scenario, the state that implements the policy change first has a stronger treatment effect (i.e., a steeper change in slope).

s8 <- read_xlsx(here("data", "scenarios.xlsx"), sheet = "scen8", range = "A1:H81", 
                col_types = c("text", "numeric", "text", "numeric", "text", "numeric",
                              "date", "date"))

10.1 Visualization of Scenario 8

Visualization of all comparisons made by TWFE regression

Contrast 1:

  • Pre-post difference for never-treated state 1 is: 43.5-13.5=30
  • Pre-post difference for never-treated state 2 is: 44.5-14.5=30
  • Implying the average pre-post difference equals 30 for the never-treated states
  • Pre-post difference for treated state 3 is 64.5-16.5 = 48
  • \(DID_{3 vs 1,2} = 48-30 = 18\)

Contrast 2:

  • Pre-post difference for never-treated state 1 is: 54-24=30
  • Pre-post difference for never-treated state 2 is: 55-25=30
  • Implying the average pre-post difference equals 30 for the never-treated states
  • Pre-post difference for treated state 4 is 68-33= 35
  • \(DID_{4 vs 1,2} = 35-30 = 5\)

Contrast 3:

  • Pre-post difference for later-treated state 4 is: 39-22.5=16.5
  • Pre-post difference for earlier-treated state 3 is: 42-16.5=25.5
  • \(DID_{3 vs 4} = 25.5-16.5 = 9\)

Contrast 4:

  • Pre-post difference for earlier-treated state 3 is: 82-42=40
  • Pre-post difference for later-treated state 4 is: 68-39=29
  • \(DID_{4 vs 3} = 29-40 = -11\)
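
As before, these contrasts can be verified with a short R sketch (added for illustration; not part of the original code), using the pre/post means listed in the bullets above.

# DID contrasts computed from the pre/post means listed above
did_3_vs_never <- (64.5 - 16.5) - mean(c(43.5 - 13.5, 44.5 - 14.5))  # Contrast 1 = 18
did_4_vs_never <- (68 - 33) - mean(c(54 - 24, 55 - 25))              # Contrast 2 = 5
did_3_vs_4     <- (42 - 16.5) - (39 - 22.5)                          # Contrast 3 = 9
did_4_vs_3     <- (68 - 39) - (82 - 42)                              # Contrast 4 = -11
c(did_3_vs_never, did_4_vs_never, did_3_vs_4, did_4_vs_3)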

Parallel trends is satisfied for contrasts 1-3, so we can be okay with these contrasts contributing to the average effect estimate. As in Scenario 6, contrast 4 is the problem, and here the problem is intensified because the “control” state (which was treated in an earlier period) is undergoing a large treatment effect in the “pre” period (times 5-11). This makes the pre-post difference in the control state larger than the pre-post difference in the comparison state, because the control state is still experiencing a stronger (i.e., steeper-slope) treatment effect than the state that is treated at time 12. This leads to a DID estimate for this group and time that is relatively large but negative, even though the true effect is to increase the outcome.

10.2 Calculation of summary measures of the causal effect

  1. Time-specific dynamic effects: To see how these are calculated, see columns I and J in the “scen8” sheet of the Excel spreadsheet containing the imported data. Column J corresponds to the difference between the observed outcome (after the policy change) and the counterfactual outcome had the treated state not been treated. This table summarizes the causal effect over time since treatment separately for each treated state:
Time since policy change | Causal effect of the policy in earlier-treated state | Causal effect of the policy in later-treated state | Time-specific ATT
1 | 3 | 1 | 2
2 | 5 | 2 | 3.5
3 | 7 | 3 | 5
4 | 9 | 4 | 6.5
... | ... | ... | ...
9 | 19 | 9 (last observed time for later-treated) | 14
10 | 21 | - | 21
... | ... | - | ...
16 | 33 | - | 33

Here, the earlier- and later-treated states had different ATTs, so we take their average at each time point in the last column. At times 10 and later, there is only one treated state, so the average is equal to that state’s causal effect. When estimating the effects dynamically, these are the time-since-treatment-specific parameters we aim to estimate.

You may prefer a summary estimate that aggregates across all post-treatment time into one number. Three possible ways to do this are as follows:

📘 Causal Effect A

Take a simple average of the heterogeneous treatment effects:

\[\frac{ATT_{\text{3 vs 1,2}} + ATT_{\text{4 vs 1,2}}}{2} = \frac{18 + 5}{2} = 11.5\]

📙 Causal Effect B

Take the average of the causal effects across all the time points:

\[\frac{ATT_{t=1} + ATT_{t=2} + ... + ATT_{t=16}}{\text{Number of post-treatment times}} = \frac{2 + 3.5 + ... + 33}{16} = \frac{261}{16} = 16.3125\]

📗 Causal Effect C

Take a weighted average of the causal effects, where the weights correspond to the number of units treated at each time point:

\[\frac{2\times ATT_{t=1} + 2\times ATT_{t=2} + ... + 2 \times ATT_{t=9} + ATT_{t=10} + ... + ATT_{t=16}}{\text{Total number of state-time points}}\]

\[\frac{2 \times (2 + 3.5 + ... + 14) + (21 + 23 + ... + 33)}{25} = 13.32\]
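
As with Scenario 7, these summary measures can be reproduced with a short R sketch (added for illustration; not part of the original code). It assumes the per-period effects follow the pattern in the table above: \(2t+1\) in the earlier-treated state and \(t\) in the later-treated state.

# hand-calculation of the three summary measures for Scenario 8 (sketch)
t_post    <- 1:16
att_early <- 2 * t_post + 1                      # per-period effects 3, 5, ..., 33 (earlier-treated)
att_late  <- 1:9                                 # per-period effects 1, 2, ..., 9 (later-treated)
mean(c(mean(att_early), mean(att_late)))         # Causal Effect A = (18 + 5)/2 = 11.5
att_t <- c((att_early[1:9] + att_late) / 2,      # times 1-9: average over both treated states
           att_early[10:16])                     # times 10-16: only the earlier-treated state
mean(att_t)                                      # Causal Effect B = 16.3125
(2 * sum(att_t[1:9]) + sum(att_t[10:16])) / 25   # Causal Effect C = 13.32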

10.3 TWFE Regression Model

Let’s take a look at the regression output:

s8_mod <- lm(outcome ~ policy + state + factor(time), 
             data = s8)
tidy(s8_mod)
## # A tibble: 24 × 5
##    term          estimate std.error statistic    p.value
##    <chr>            <dbl>     <dbl>     <dbl>      <dbl>
##  1 (Intercept)       6.98      2.76     2.53  0.0143    
##  2 policy1           6.84      2.18     3.13  0.00276   
##  3 state2            1.00      1.58     0.634 0.529     
##  4 state3           11.9       2.35     5.07  0.00000472
##  5 state4            8.17      1.86     4.40  0.0000496 
##  6 factor(time)2     3         3.53     0.851 0.399     
##  7 factor(time)3     6.00      3.53     1.70  0.0944    
##  8 factor(time)4     9.00      3.53     2.55  0.0135    
##  9 factor(time)5    11.0       3.57     3.09  0.00308   
## 10 factor(time)6    14.5       3.57     4.07  0.000147  
## # ℹ 14 more rows

The coefficient of the policy term equals 6.84. This is smaller than any of 📘 Causal Effect A , 📙 Causal Effect B , or 📗 Causal Effect C. Why is the TWFE estimate so much lower? In the previous scenario we learned how the TWFE estimate considers each possible contrast between a treated state and those untreated at the same time. The fourth contrast in this setting did not satisfy the parallel trends assumption, but the other three contrasts did. How much weight did the TWFE estimator put on each of the contrasts? Let’s consider the Goodman-Bacon decomposition to see how much each contrast contributed to the TWFE estimate:

10.4 Goodman-Bacon Decomposition

s8 %<>% mutate(state_n = as.numeric(as.character(state)),
               policy_n = as.numeric(as.character(policy)))

s8_bacon <- bacon(outcome ~ policy_n,
      data = s8,
      id_var = "state_n",
      time_var = "time")
##                       type  weight   avg_est
## 1 Earlier vs Later Treated 0.06715   9.00000
## 2 Later vs Earlier Treated 0.15108 -11.00000
## 3     Treated vs Untreated 0.78177  10.10429
s8_bacon
##   treated untreated estimate     weight                     type
## 2       5     99999       18 0.30695444     Treated vs Untreated
## 3      12     99999        5 0.47482014     Treated vs Untreated
## 6      12         5      -11 0.15107914 Later vs Earlier Treated
## 8       5        12        9 0.06714628 Earlier vs Later Treated

Notice that our fourth contrast (here shown in row 3 of the second table) contributes 15% of the weight to the estimate. We want an estimator that does not use this contrast!

Below we confirm that the weighted average of the contrast-specific estimates equals the TWFE estimate from the regression:

#see the TWFE estimate
sum(s8_bacon$estimate*s8_bacon$weight)
## [1] 6.841727

The TWFE estimator is influenced by the forbidden fourth contrast that doesn’t meet the parallel trends assumption, so we’d like an estimator that doesn’t suffer from this concern.

10.5 Group-Time ATT Estimator

Now that we know how to use the Callaway and Sant’Anna code, we can apply it to this example as well. Let’s start by running the att_gt() function. This time we won’t display the output from summary(), but feel free to uncomment that line in the R Markdown document to take a peek:

s8_cs <- att_gt(yname = "outcome", 
             tname = "time", 
             idname = "state_n", 
             gname = "time_first_treat", 
             data = s8, 
             control_group = "notyettreated",
             anticipation = 0)
## No pre-treatment periods to test
#summary(s8_cs)

Let’s aggregate the ATTs using type = "simple". Callaway and Sant’Anna’s estimate considers only the clean contrasts 1 and 2 and combines them by weighting by the amount of time in the post-treatment period. For contrast 1, there are 16 time periods post-treatment and for contrast 2, there are 9 periods post-treatment. Thus this estimate is the weighted average: \(18\times(16/25) + 5\times(9/25)=\) 13.32 . Note that this is equal to 📗 Causal Effect C.
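
As a quick arithmetic check (added here; not in the original code):

18 * (16/25) + 5 * (9/25)
## [1] 13.32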

#aggregate the group time average treatment effects
#type = "simple": weighted average of all group-time average treatment effects
#with weights proportional to the group size
s8_cs_ag <- aggte(s8_cs, type="simple")
summary(s8_cs_ag)
## 
## Call:
## aggte(MP = s8_cs, type = "simple")
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
## 
## 
##    ATT    Std. Error     [ 95%  Conf. Int.] 
##  13.32            NA         NA          NA 
## 
## 
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## Control Group:  Not Yet Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust

Alternatively, we can specify type = "group" to get separate effect estimates according to time of implementation and see that these match our by-hand calculations:

s8_cs_ag2 <- aggte(s8_cs, type = "group")
## Warning in compute.aggte(MP = MP, type = type, balance_e = balance_e, min_e =
## min_e, : Simultaneous critival value is NA. This probably happened because we
## cannot compute t-statistic (std errors are NA). We then report pointwise conf.
## intervals.
summary(s8_cs_ag2)
## 
## Call:
## aggte(MP = s8_cs, type = "group")
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
## 
## 
## Overall summary of ATT's based on group/cohort aggregation:  
##   ATT    Std. Error     [ 95%  Conf. Int.] 
##  11.5            NA         NA          NA 
## 
## 
## Group Effects:
##  Group Estimate Std. Error [95% Pointwise  Conf. Band] 
##      5       18         NA              NA          NA 
##     12        5         NA              NA          NA 
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## Control Group:  Not Yet Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust

Note here that the Overall ATT is a simple average of the two groups (e.g., (18 + 5)/2 = 11.5), equal to 📘 Causal Effect A.

Alternatively, you can estimate the dynamic effect separately for each time since the treatment has been introduced. Remember that this effect estimation is also done for pre-treatment time, so don’t be alarmed by all the rows in this table!

s8_cs_ag3 <- aggte(s8_cs, type="dynamic")
## Warning in compute.aggte(MP = MP, type = type, balance_e = balance_e, min_e =
## min_e, : Simultaneous conf. band is somehow smaller than pointwise one using
## normal approximation. Since this is unusual, we are reporting pointwise
## confidence intervals
summary(s8_cs_ag3)
## 
## Call:
## aggte(MP = s8_cs, type = "dynamic")
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
## 
## 
## Overall summary of ATT's based on event-study/dynamic aggregation:  
##      ATT    Std. Error     [ 95%  Conf. Int.] 
##  16.3125            NA         NA          NA 
## 
## 
## Dynamic Effects:
##  Event time Estimate Std. Error [95% Pointwise  Conf. Band]  
##         -10      0.0         NA              NA          NA  
##          -9      0.0         NA              NA          NA  
##          -8      0.0         NA              NA          NA  
##          -7      0.0         NA              NA          NA  
##          -6      0.0         NA              NA          NA  
##          -5      0.0         NA              NA          NA  
##          -4      0.0         NA              NA          NA  
##          -3      0.0         NA              NA          NA  
##          -2      0.0         NA              NA          NA  
##          -1      0.0         NA              NA          NA  
##           0      2.0     0.7413          0.5471      3.4529 *
##           1      3.5     1.1120          1.3206      5.6794 *
##           2      5.0     1.4826          2.0942      7.9058 *
##           3      6.5     1.8533          2.8677     10.1323 *
##           4      8.0         NA              NA          NA  
##           5      9.5     2.5946          4.4148     14.5852 *
##           6     11.0     2.9652          5.1883     16.8117 *
##           7     12.5     3.3359          5.9618     19.0382 *
##           8     14.0     3.7065          6.7354     21.2646 *
##           9     21.0         NA              NA          NA  
##          10     23.0         NA              NA          NA  
##          11     25.0         NA              NA          NA  
##          12     27.0         NA              NA          NA  
##          13     29.0         NA              NA          NA  
##          14     31.0         NA              NA          NA  
##          15     33.0         NA              NA          NA  
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## Control Group:  Not Yet Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust

Note here that the Overall ATT is a simple average of all effects in the post-treatment time. This is equal to (2 + 3.5 + 5 + 6.5 + … + 33)/16=16.31, which is equal to 📙 Causal Effect B.

The treatment effect is heterogeneous (the later-treated state has a smaller treatment effect than the earlier-treated state) and there is less observed treated time for the later-treated state. As a result, the treatment effect estimates for event times 0 through 8 combine information across the two treated states, while the estimates for times 9 and over are based only on the first treated state. Without knowing this, it looks like the treatment effect “jumps” between event times 8 and 9 and that there is a slope increase, but this is an artifact of combining states with heterogeneous effects and different lengths of observed treatment time.

We can also plot the effect estimates over time and see this jump and slope increase between event times 8 and 9. Note also the confidence interval estimates for times 0 through 8, where two data points were available to estimate standard errors.

ggdid(s8_cs_ag3)

Here again we don’t display the results using type = "calendar", but we include the code below in case you want to run it.

s8_cs_ag4 <- aggte(s8_cs, type = "calendar")
summary(s8_cs_ag4)
ggdid(s8_cs_ag4)

10.6 Target Trial Estimator

The code and output below show that the dynamic estimates using the Target Trial approach are the same as those produced by the Group-Time ATT approach.

s8_bm <- fit_event_jack(outcome_var = "outcome", 
                        date_var = "time_as_date", 
                        unit_var = "state",
                        policy_var = "time_first_trt_date", 
                        data = s8, 
                        max_time_to = 10e7)
## Dropping 1 
## Dropping 2 
## Dropping 3 
## Dropping 4
## `summarise()` has grouped output by 'cohort'. You can override using the
## `.groups` argument.
## Joining with `by = join_by(cohort, event_time)`
s8_bm_ave <- s8_bm %>% filter(cohort == "average")

s8_bm_ave %<>% mutate(lb = estimate - 1.96*se,
                      ub = estimate + 1.96*se)

ggplot(s8_bm_ave, aes(x = event_time, y = estimate)) + 
  geom_hline(yintercept = 0, lty = 2) +
  geom_point(aes(col = event_time >= 0)) + 
  geom_linerange(aes(ymin = lb, ymax = ub, col = event_time >= 0)) + 
  labs(y = "Estimate", x = "Event time") + 
  theme_bw() + 
  scale_color_discrete(labels=c('pre', 'post')) +
  theme(legend.title=element_blank(), legend.position = "bottom")

To get one summary estimate across time points we use the fit_event_jack_sum() function because the effects are dynamic across adoption cohorts:

s8_bm2 <- fit_event_jack_sum(outcome_var = "outcome", 
                        date_var = "time_as_date", 
                        unit_var = "state",
                        policy_var = "time_first_trt_date", 
                        data = s8, 
                        max_time_to = 10e7)
## Dropping 1 
## Dropping 2 
## Dropping 3 
## Dropping 4
s8_bm2$estimate
## [1] 16.3125

This gives the same estimate as produced by the Group-Time ATT estimator when specifying type = "dynamic" in the aggte() function, and also equals 📙 Causal Effect B.

If instead you want to estimate the overarching ATT using the same approach as used when type = "group", you can use the fit_event_jack_sum_hte() function:

s8_bm3 <- fit_event_jack_sum_hte(outcome_var = "outcome", 
                        date_var = "time_as_date", 
                        unit_var = "state",
                        policy_var = "time_first_trt_date", 
                        data = s8, 
                        max_time_to = 10e7)
## Dropping 1 
## Dropping 2 
## Dropping 3 
## Dropping 4
s8_bm3$estimate
## [1] 11.5

This is equal to 📘 Causal Effect A.

Finally, if you want to estimate the overarching ATT using the same approach as used when type = "simple", you can use the fit_event_jack_sum_C() function:

s8_bm4 <- fit_event_jack_sum_C(outcome_var = "outcome", 
                        date_var = "time_as_date", 
                        unit_var = "state",
                        policy_var = "time_first_trt_date", 
                        data = s8, 
                        max_time_to = 10e7)
## Dropping 1 
## Dropping 2 
## Dropping 3 
## Dropping 4
s8_bm4$estimate
## [1] 13.32

This is equal to 📗 Causal Effect C.

10.7 TWFE with time_since_policy specification

But what if we run the TWFE model with the time_since_policy indicators? Here, we run the regression and use filter() to keep only the time_since_policy estimates for easy viewing, since the full regression table is large:

s8_mod2 <- lm(outcome ~ time_since_policy + state + factor(time), 
             data = s8)
tidy(s8_mod2) %>% filter(str_detect(term, "time_since")) %>% 
  mutate(time = as.numeric(gsub("time_since_policy", "", term))) %>% 
  arrange(time) 
## # A tibble: 16 × 6
##    term                estimate std.error statistic  p.value  time
##    <chr>                  <dbl>     <dbl>     <dbl>    <dbl> <dbl>
##  1 time_since_policy1     0.621      1.13     0.547 5.87e- 1     1
##  2 time_since_policy2     2.06       1.13     1.81  7.73e- 2     2
##  3 time_since_policy3     4.11       1.16     3.56  9.55e- 4     3
##  4 time_since_policy4     5.64       1.16     4.88  1.63e- 5     4
##  5 time_since_policy5     7.17       1.16     6.21  2.19e- 7     5
##  6 time_since_policy6     8.70       1.16     7.53  2.96e- 9     6
##  7 time_since_policy7    10.2        1.16     8.85  4.59e-11     7
##  8 time_since_policy8    11.7        1.20     9.77  2.89e-12     8
##  9 time_since_policy9    13.3        1.20    11.1   6.07e-14     9
## 10 time_since_policy10   18.7        1.67    11.2   5.00e-14    10
## 11 time_since_policy11   20.8        1.67    12.5   1.48e-15    11
## 12 time_since_policy12   23.0        1.67    13.8   5.41e-17    12
## 13 time_since_policy13   25.2        1.67    15.1   2.41e-18    13
## 14 time_since_policy14   27.4        1.67    16.4   1.29e-19    14
## 15 time_since_policy15   29.5        1.68    17.6   1.04e-20    15
## 16 time_since_policy16   31.7        1.68    18.9   7.43e-22    16

The effect estimates on the policy indicators do not equal an average of the ATTs from the two states for each time period. For this example, this specification gives estimates that are systematically lower than the parameters we aim to estimate and than the estimates produced by the Group-Time estimator and the Target Trial estimator.

10.8 Summary

When the treatment effect is staggered, dynamic, and heterogeneous, the TWFE regression specification can produce biased results, as will the specification using a categorical variable for time since treatment. In this case, use one of the newer estimators.

11 Scenario 9 (24 states, staggered timing, dynamic and heterogeneous treatment effects)

So far, we have considered simpler examples where the outcomes didn’t incorporate noise. This was helpful for getting started, but doesn’t reflect the imprecision in estimates based on real-world data. Furthermore, we couldn’t estimate standard errors because there was no error in the estimation.

This scenario is more realistic. It increases the number of treated and control units, as is common in analyses across several geographic areas like states. We also added variability to the outcome, with different amounts of variability by state to reflect units containing varying sample sizes.

s9_start <- read_xlsx(here("data", "scenarios.xlsx"), sheet = "scen9", 
                col_types = c("text", "numeric", "text", "numeric", "text", "numeric"))

# we hid some code here from displaying in the html version of this file where
# we build up this scenario and add noise. Please view the Rmd document to see
# these lines of code!

11.1 Visualization of Scenario 9

Visualization of all comparisons made by TWFE regression

We start by running the linear model as we have done previously:

s9_mod <- lm(outcome ~ policy + factor(state) + factor(time), 
             data = all_states)
tidy(s9_mod)
## # A tibble: 54 × 5
##    term            estimate std.error statistic  p.value
##    <chr>              <dbl>     <dbl>     <dbl>    <dbl>
##  1 (Intercept)       7.70       1.55     4.98   8.06e- 7
##  2 policy1          10.1        0.806   12.5    2.59e-32
##  3 factor(state)11   0.126      1.46     0.0862 9.31e- 1
##  4 factor(state)12  -0.449      1.46    -0.307  7.59e- 1
##  5 factor(state)13  -1.04       1.46    -0.711  4.77e- 1
##  6 factor(state)14  -0.430      1.46    -0.294  7.69e- 1
##  7 factor(state)15  -0.0183     1.46    -0.0125 9.90e- 1
##  8 factor(state)20   0.713      1.46     0.488  6.26e- 1
##  9 factor(state)21  -1.75       1.46    -1.20   2.32e- 1
## 10 factor(state)22   0.668      1.46     0.457  6.48e- 1
## # ℹ 44 more rows

Up until now, we haven’t worried about the estimation of the 95% confidence interval because our previous datasets didn’t contain any noise. In practice, we want to estimate confidence intervals, and to do that correctly we need to account for the clustering of observations over time within state. To do that we use the vcovHC() function from the sandwich package.

#install.packages("sandwich")
library(sandwich)
m2_var <- vcovHC(s9_mod)

We can then pull out the variance of the policy1 term from the variance-covariance matrix and use it to compute Wald-type confidence intervals:

est = summary(s9_mod)$coefficients["policy1", "Estimate"] 
lb = summary(s9_mod)$coefficients["policy1", "Estimate"] - 1.96*sqrt(m2_var["policy1","policy1"]) 
ub = summary(s9_mod)$coefficients["policy1", "Estimate"] + 1.96*sqrt(m2_var["policy1","policy1"])
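
If you would like to print these three values, you could add the following line (a small addition, not in the original code):

round(c(est, lb, ub), 1)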

The estimated effect is 10.1 with a 95% CI from 8.4 to 11.8.

11.2 Goodman-Bacon Decomposition

We can use the Goodman-Bacon decomposition to examine the contrasts contributing to the TWFE estimate.

The DID contrasts for the four comparisons are:

  1. Treated at time 15 vs. never-treated: 18
  2. Treated at time 22 vs. never-treated: 5
  3. Treated at time 22 vs. post-treatment time for states treated at time 15: -11
  4. Treated at time 15 vs. pre-treatment time for states treated at time 22: 9

We can compare this to the output from the Goodman-Bacon decomposition:

all_states %<>% mutate(policy_n = as.numeric(policy))

s9_bacon <- bacon(outcome ~ policy_n,
      data = all_states,
      id_var = "state",
      time_var = "time")
##                       type  weight  avg_est
## 1 Earlier vs Later Treated 0.09929  9.48462
## 2 Later vs Earlier Treated 0.06383 -9.69867
## 3     Treated vs Untreated 0.83688 11.64355
s9_bacon
##   treated untreated  estimate     weight                     type
## 2      15     99999 17.488998 0.45390071     Treated vs Untreated
## 3      22     99999  4.715617 0.38297872     Treated vs Untreated
## 6      22        15 -9.698670 0.06382979 Later vs Earlier Treated
## 8      15        22  9.484617 0.09929078 Earlier vs Later Treated

We can see that the estimates displayed in the second table are not too far off from the DID contrasts listed above. For this scenario, 84% of the weight is on the treated vs. untreated and about 10% of the weight is on the earlier vs. the later treated, with only 6% of the weight on the forbidden contrast. In the next section we will see how these weights change if pre-treatment time is shortened.
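
As in Scenario 8, we could also confirm that the weighted sum of the contrast-specific estimates equals the TWFE coefficient on the policy term (a quick check, not run in the original document):

sum(s9_bacon$estimate * s9_bacon$weight)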

11.3 Group-Time ATT Estimator

Let’s compare this to Callaway and Sant’Anna’s approach. We will start by specifying type="simple" in the aggte() function:

s9_cs <- att_gt(yname = "outcome", 
             tname = "time", 
             idname = "state", 
             gname = "time_first_treat", 
             data = all_states, 
             control_group = "nevertreated",
             anticipation = 0)
## Warning in att_gt(yname = "outcome", tname = "time", idname = "state", gname =
## "time_first_treat", : Not returning pre-test Wald statistic due to singular
## covariance matrix
s9_cs_ag <- aggte(s9_cs, type="simple")
summary(s9_cs_ag)
## 
## Call:
## aggte(MP = s9_cs, type = "simple")
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
## 
## 
##      ATT    Std. Error     [ 95%  Conf. Int.]  
##  13.0551        3.1213     6.9375     19.1727 *
## 
## 
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## Control Group:  Never Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust
lb = s9_cs_ag$overall.att - 1.96*s9_cs_ag$overall.se 
ub = s9_cs_ag$overall.att + 1.96*s9_cs_ag$overall.se
round(c(s9_cs_ag$overall.att, lb, ub), 2)
## [1] 13.06  6.94 19.17

The Group-Time estimate is 13.06 with 95% CI 6.94 to 19.17.

The simple ATT is a weighted average (based on the amount of time in the post-treatment period) of the estimated effects for the treated groups calculated like this:

(19.09088*(16/25)) + (2.324934*(9/25))
## [1] 13.05514

To figure out what the group-specific estimates are, we use type = "group":

s9_cs_ag2 <- aggte(s9_cs, type = "group")
summary(s9_cs_ag2)
## 
## Call:
## aggte(MP = s9_cs, type = "group")
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
## 
## 
## Overall summary of ATT's based on group/cohort aggregation:  
##      ATT    Std. Error     [ 95%  Conf. Int.]  
##  10.7079        1.9083     6.9678     14.4481 *
## 
## 
## Group Effects:
##  Group Estimate Std. Error [95% Simult.  Conf. Band]  
##     15  19.0909     2.9566       13.2694     24.9124 *
##     22   2.3249     2.4761       -2.5505      7.2003  
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## Control Group:  Never Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust

Here the overall ATT is a simple average of the group-specific ATTs (10.7079).
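
Analogous to the hand check above, this can be verified directly (a small addition, not in the original code):

(19.0909 + 2.3249) / 2
## [1] 10.7079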

Finally, we can look at these effect estimates separately for each time since the intervention began and graph them:

s9_cs_ag3 <- aggte(s9_cs, type="dynamic")
summary(s9_cs_ag3)
## 
## Call:
## aggte(MP = s9_cs, type = "dynamic")
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
## 
## 
## Overall summary of ATT's based on event-study/dynamic aggregation:  
##      ATT    Std. Error     [ 95%  Conf. Int.]  
##  16.0589        2.5464    11.0681     21.0497 *
## 
## 
## Dynamic Effects:
##  Event time Estimate Std. Error [95% Simult.  Conf. Band]  
##         -20  -0.6142     2.3984       -6.8829      5.6545  
##         -19  -2.0235     2.2139       -7.8097      3.7627  
##         -18   2.5619     2.3638       -3.6163      8.7401  
##         -17  -1.1189     2.2896       -7.1030      4.8653  
##         -16  -2.7910     2.5026       -9.3319      3.7498  
##         -15   0.6003     1.5647       -3.4893      4.6898  
##         -14   2.6788     3.0229       -5.2220     10.5796  
##         -13  -0.5176     2.0436       -5.8588      4.8235  
##         -12   0.5020     2.7170       -6.5993      7.6034  
##         -11  -1.0797     2.3658       -7.2631      5.1037  
##         -10   1.2770     3.0557       -6.7094      9.2634  
##          -9  -1.1832     2.3450       -7.3122      4.9458  
##          -8   1.9790     2.2550       -3.9147      7.8726  
##          -7   1.4156     2.0807       -4.0226      6.8537  
##          -6  -2.3957     3.2051      -10.7727      5.9812  
##          -5   0.6336     1.7616       -3.9705      5.2376  
##          -4  -1.6087     2.8316       -9.0094      5.7920  
##          -3  -1.8200     1.8068       -6.5425      2.9024  
##          -2   1.4925     2.5145       -5.0794      8.0643  
##          -1   1.3875     2.0779       -4.0435      6.8184  
##           0   2.4347     3.4790       -6.6581     11.5275  
##           1   4.4729     3.0790       -3.5745     12.5203  
##           2   5.2589     2.3329       -0.8386     11.3563  
##           3   5.8989     2.6967       -1.1494     12.9472  
##           4   6.2301     2.1717        0.5540     11.9062 *
##           5   6.3651     2.8537       -1.0935     13.8238  
##           6  11.8416     2.9819        4.0481     19.6352 *
##           7  11.9494     3.4280        2.9899     20.9089 *
##           8  14.9849     2.9229        7.3455     22.6242 *
##           9  20.6918     7.1567        1.9867     39.3968 *
##          10  20.7157     2.1385       15.1264     26.3051 *
##          11  25.9967     3.4426       16.9991     34.9943 *
##          12  27.7174     4.2341       16.6510     38.7839 *
##          13  30.3829     3.6997       20.7132     40.0526 *
##          14  31.0607     2.7300       23.9256     38.1958 *
##          15  30.9403     4.8280       18.3216     43.5590 *
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## Control Group:  Never Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust
ggdid(s9_cs_ag3)

Now that we have confidence intervals, we can assess the parallel trends assumption by investigating whether the pre-treatment estimates are randomly distributed around the zero line and whether their confidence intervals overlap the null. This is true in our example, so we wouldn’t be concerned about violating the parallel trends assumption with these data.

11.4 Cohort ATT Estimator

Now we can also calculate effects using the Cohort ATT estimator. We couldn’t run this estimator for the other scenarios because the function throws an error if it is unable to estimate standard errors. To use the staggered_sa function, we need to add a new variable where the time_first_treat variable is set to Inf for all states that were never-treated:

table(all_states$time_first_treat, useNA = "always")
## 
##    0   15   22 <NA> 
##  360  180  180    0
all_states$time_first_treat2 <- all_states$time_first_treat
all_states$time_first_treat2[all_states$time_first_treat == 0] <- Inf

#check that the recoding is as planned:
table(all_states$time_first_treat, all_states$time_first_treat2, useNA = "always")
##       
##         15  22 Inf <NA>
##   0      0   0 360    0
##   15   180   0   0    0
##   22     0 180   0    0
##   <NA>   0   0   0    0

For this function, we need to specify the data frame in the df argument, the clustering unit in the i argument, and the time unit in the t argument. Here we use the newly-defined time_first_treat2 as the grouping variable g. We start by setting estimand = "simple":

s9_sa <- staggered_sa(df = all_states,
                      i = "state",
                      t = "time",
                      g = "time_first_treat2",
                      y = "outcome", 
                      estimand = "simple")

The estimate of the treatment effect is 13.06, which is equivalent to the effect estimate from the Group-Time ATT estimator when type = "simple". We can calculate its confidence interval using either of the reported standard errors (se or se_neyman):

#95% CI
round(c(s9_sa$estimate, s9_sa$estimate - 1.96*s9_sa$se, s9_sa$estimate + 1.96*s9_sa$se), 2)
## [1] 13.06  8.90 17.21
round(c(s9_sa$estimate, s9_sa$estimate - 1.96*s9_sa$se_neyman, s9_sa$estimate + 1.96*s9_sa$se_neyman), 2)
## [1] 13.06  8.71 17.40

To calculate an average weighted by group size (or cohort size, using the terminology employed in this function), we set estimand = "cohort":

s9_sa_group <- staggered_sa(df = all_states,
                      i = "state",
                      t = "time",
                      g = "time_first_treat2",
                      y = "outcome", 
                      estimand = "cohort")

s9_sa_group$estimate
## [1] 10.70791
#95% CI
round(c(s9_sa_group$estimate, 
        s9_sa_group$estimate - 1.96*s9_sa_group$se, 
        s9_sa_group$estimate + 1.96*s9_sa_group$se), 2)
## [1] 10.71  7.02 14.39
round(c(s9_sa_group$estimate, 
        s9_sa_group$estimate - 1.96*s9_sa_group$se_neyman, 
        s9_sa_group$estimate + 1.96*s9_sa_group$se_neyman), 2)
## [1] 10.71  6.81 14.61

Here, the Cohort ATT effect estimate is equivalent to the Group-Time ATT estimate with type = "group".
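For a direct side-by-side check of this equivalence, the group-time results can be aggregated by group with aggte(). A minimal sketch, again assuming the att_gt object from this scenario was named s9_cs:

# aggregate the group-time ATTs by treatment group; compare with s9_sa_group$estimate
s9_cs_ag_group <- aggte(s9_cs, type = "group")
summary(s9_cs_ag_group)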

Finally, we can also use the Cohort ATT function to calculate dynamic effect estimates by setting estimand = "eventstudy". This function also requires that we specify which event times we would like estimates for. To produce estimates comparable to the Group-Time ATT effects, we specify eventTime = c(-20:15), which requests estimates from 20 time points before the policy change up to 15 time points after it.

times <- c(-20:15)

s9_sa_dynamic <- staggered_sa(df = all_states,
                      i = "state",
                      t = "time",
                      g = "time_first_treat2",
                      y = "outcome", 
                      estimand = "eventstudy",
                      eventTime = times)

# gather the estimates and CI information into a data frame for plotting
sa_dynamic_estimates <- tibble(
  `Event Time` = times,
  `Estimate` = s9_sa_dynamic$estimate, 
  `Std. Error` = s9_sa_dynamic$se,
  `95% lower bound` = s9_sa_dynamic$estimate - 1.96*s9_sa_dynamic$se,
  `95% upper bound` = s9_sa_dynamic$estimate + 1.96*s9_sa_dynamic$se,
  `Std. Error (Neyman)` = s9_sa_dynamic$se_neyman,
  `95% lower bound (Neyman)` = s9_sa_dynamic$estimate - 1.96*s9_sa_dynamic$se_neyman,
  `95% upper bound (Neyman)` = s9_sa_dynamic$estimate + 1.96*s9_sa_dynamic$se_neyman)
sa_dynamic_estimates
## # A tibble: 36 × 8
##    `Event Time` Estimate `Std. Error` `95% lower bound` `95% upper bound`
##           <int>    <dbl>        <dbl>             <dbl>             <dbl>
##  1          -20  -1.38           2.22             -5.72             2.97 
##  2          -19  -2.39           2.35             -7.00             2.21 
##  3          -18  -1.11           2.24             -5.50             3.27 
##  4          -17  -1.67           1.85             -5.30             1.96 
##  5          -16  -3.07           2.02             -7.03             0.900
##  6          -15  -2.77           1.98             -6.65             1.11 
##  7          -14  -0.0820         2.00             -3.99             3.83 
##  8          -13  -0.600          3.30             -7.07             5.87 
##  9          -12  -0.0976         2.33             -4.67             4.48 
## 10          -11  -1.18           2.18             -5.44             3.09 
## # ℹ 26 more rows
## # ℹ 3 more variables: `Std. Error (Neyman)` <dbl>,
## #   `95% lower bound (Neyman)` <dbl>, `95% upper bound (Neyman)` <dbl>

We can then create a plot similar to the one produced by the ggdid() command:

ggplot(sa_dynamic_estimates, aes(x = `Event Time`, y = Estimate)) + 
  geom_hline(yintercept = 0) + 
  geom_linerange(aes(ymin = `95% lower bound (Neyman)`, 
                     ymax = `95% upper bound (Neyman)`,
                     col = `Event Time` >= 0)) + 
  geom_point(aes(col = `Event Time` >= 0)) +
  theme_bw() +
  scale_color_discrete(labels=c('pre', 'post')) +
  theme(legend.title=element_blank(), legend.position = "bottom")

Overall, the Group-Time estimates and Cohort ATT estimates are quite close and have overlapping confidence intervals. Favorably, the summary estimates are both higher than the TWFE estimate because they do not use information based on the forbidden contrast.

11.5 Target Trial Estimator

We also consider the Target Trial estimator. This function requires the time variable to be a date object rather than a numeric index, so we add dates to the dataset, arbitrarily choosing January 1, 2015 as the start date (time 0) and incrementing time in one-month intervals:

all_states$time2 <- as.Date("2015-01-01") #date 0
all_states$period1 <- paste(all_states$time, " month")
all_states$time3 <- all_states$time2 %m+% period(all_states$period1) #new time variable

all_states %<>% select(-period1, -time2)
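A quick check confirms that the numeric treatment times map onto the dates we hard-code in the next step (time 15 corresponds to 2016-04-01 and time 22 to 2016-11-01):

# confirm the numeric-to-date mapping at the two treatment times
all_states %>%
  filter(time %in% c(15, 22)) %>%
  distinct(time, time3)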

This estimator also requires the policy-change date to be missing for states that never adopted the policy, so we create a date version of the treatment time, time_first_treat3, that is NA for the never-treated states:

all_states %<>% 
  mutate(time_first_treat3 = case_when(time_first_treat2 == 15 ~ as.Date("2016-04-01"),
                                       time_first_treat2 == 22 ~ as.Date("2016-11-01"),
                                       is.infinite(time_first_treat2) ~ NA_Date_))

#check the coding
#table(all_states$time_first_treat2, all_states$time_first_treat3, useNA = "ifany")

Then we compute the estimates and plot them. The results are very similar to the Group-Time ATT and Cohort ATT plots, showing no discernible trends in the pre-exposure period and an increasing effect of treatment in the post-exposure period.

s9_bm <- fit_event_jack(outcome_var = "outcome", 
                        date_var = "time3", 
                        unit_var = "state",
                        policy_var = "time_first_treat3", 
                        data = all_states, 
                        max_time_to = 10e7)
## Dropping 10 
## Dropping 11 
## Dropping 12 
## Dropping 13 
## Dropping 14 
## Dropping 15 
## Dropping 20 
## Dropping 21 
## Dropping 22 
## Dropping 23 
## Dropping 24 
## Dropping 25 
## Dropping 30 
## Dropping 31 
## Dropping 32 
## Dropping 33 
## Dropping 34 
## Dropping 35 
## Dropping 40 
## Dropping 41 
## Dropping 42 
## Dropping 43 
## Dropping 44 
## Dropping 45
## `summarise()` has grouped output by 'cohort'. You can override using the
## `.groups` argument.
## Joining with `by = join_by(cohort, event_time)`
s9_bm_ave <- s9_bm %>% filter(cohort == "average")

s9_bm_ave %<>% mutate(lb = estimate - 1.96*se,
                      ub = estimate + 1.96*se)

ggplot(s9_bm_ave, aes(x = event_time, y = estimate)) + 
  geom_hline(yintercept = 0, lty = 2) +
  geom_point(aes(col = event_time >= 0)) + 
  geom_linerange(aes(ymin = lb, ymax = ub, col = event_time >= 0)) + 
  labs(y = "Estimate", x = "Event time") + 
  theme_bw() + 
  scale_color_discrete(labels=c('pre', 'post')) +
  theme(legend.title=element_blank(), legend.position = "bottom")

We then use the fit_event_jack_sum family of functions (fit_event_jack_sum, fit_event_jack_sum_hte, and fit_event_jack_sum_C) to calculate a summary ATT in three different ways:

s9_bm_sum <- fit_event_jack_sum(outcome_var = "outcome", 
                        date_var = "time3", 
                        unit_var = "state",
                        policy_var = "time_first_treat3", 
                        data = all_states, 
                        max_time_to = 10e7)
## Dropping 10 
## Dropping 11 
## Dropping 12 
## Dropping 13 
## Dropping 14 
## Dropping 15 
## Dropping 20 
## Dropping 21 
## Dropping 22 
## Dropping 23 
## Dropping 24 
## Dropping 25 
## Dropping 30 
## Dropping 31 
## Dropping 32 
## Dropping 33 
## Dropping 34 
## Dropping 35 
## Dropping 40 
## Dropping 41 
## Dropping 42 
## Dropping 43 
## Dropping 44 
## Dropping 45
s9_bm_sum %<>% mutate(lb = estimate - 1.96*se,
                      ub = estimate + 1.96*se)

s9_bm_sum
##         se estimate       lb       ub
## 1 2.686804 16.05888 10.79274 21.32501
s9_bm_sum_hte <- fit_event_jack_sum_hte(outcome_var = "outcome", 
                        date_var = "time3", 
                        unit_var = "state",
                        policy_var = "time_first_treat3", 
                        data = all_states, 
                        max_time_to = 10e7)
## Dropping 10 
## Dropping 11 
## Dropping 12 
## Dropping 13 
## Dropping 14 
## Dropping 15 
## Dropping 20 
## Dropping 21 
## Dropping 22 
## Dropping 23 
## Dropping 24 
## Dropping 25 
## Dropping 30 
## Dropping 31 
## Dropping 32 
## Dropping 33 
## Dropping 34 
## Dropping 35 
## Dropping 40 
## Dropping 41 
## Dropping 42 
## Dropping 43 
## Dropping 44 
## Dropping 45
s9_bm_sum_hte %<>% mutate(lb = estimate - 1.96*se,
                      ub = estimate + 1.96*se)

s9_bm_sum_hte
##         se estimate      lb       ub
## 1 3.243503 10.70791 4.35064 17.06517
s9_bm_sum_simple <- fit_event_jack_sum_C(outcome_var = "outcome", 
                        date_var = "time3", 
                        unit_var = "state",
                        policy_var = "time_first_treat3", 
                        data = all_states, 
                        max_time_to = 10e7)
## Dropping 10 
## Dropping 11 
## Dropping 12 
## Dropping 13 
## Dropping 14 
## Dropping 15 
## Dropping 20 
## Dropping 21 
## Dropping 22 
## Dropping 23 
## Dropping 24 
## Dropping 25 
## Dropping 30 
## Dropping 31 
## Dropping 32 
## Dropping 33 
## Dropping 34 
## Dropping 35 
## Dropping 40 
## Dropping 41 
## Dropping 42 
## Dropping 43 
## Dropping 44 
## Dropping 45
s9_bm_sum_simple %<>% mutate(lb = estimate - 1.96*se,
                      ub = estimate + 1.96*se)

s9_bm_sum_simple
##         se estimate       lb       ub
## 1 3.254358 13.05514 6.676597 19.43368
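To make the three summaries easier to compare at a glance, we can stack them into a single data frame labeled by the function that produced each one (a convenience step only; all values come from the objects above):

# stack the three Target Trial summaries into one table
bind_rows(fit_event_jack_sum     = s9_bm_sum,
          fit_event_jack_sum_hte = s9_bm_sum_hte,
          fit_event_jack_sum_C   = s9_bm_sum_simple,
          .id = "summary_function") %>%
  select(summary_function, estimate, se, lb, ub)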

12 Scenario 10 (same as previous but with less pre-exposure time)

In this scenario, we use the same data as in Scenario 9, except we limit the pre-intervention period to 5 time units rather than the 15 in Scenario 9. The amount of pre-intervention time should not affect the size of the effect estimate, because the underlying change in the outcome caused by the policy is the same as in the previous scenario. The goal of this scenario is to see how the different estimators are affected by this change in the amount of pre-intervention time.

We start by restricting the data to times t = 10 and later:

all_states2 <- all_states %>% filter(time >= 10)

12.1 Comparison of TWFE Estimates

Then we re-fit the TWFE model:

s10_mod <- lm(outcome ~ policy + factor(state) + factor(time), 
             data = all_states2)
tidy(s10_mod)
## # A tibble: 45 × 5
##    term            estimate std.error statistic  p.value
##    <chr>              <dbl>     <dbl>     <dbl>    <dbl>
##  1 (Intercept)       33.9        1.82    18.7   3.35e-58
##  2 policy1            7.62       1.04     7.32  1.11e-12
##  3 factor(state)11    1.04       1.87     0.556 5.79e- 1
##  4 factor(state)12   -0.669      1.87    -0.358 7.21e- 1
##  5 factor(state)13   -1.18       1.87    -0.629 5.29e- 1
##  6 factor(state)14   -0.475      1.87    -0.254 8.00e- 1
##  7 factor(state)15    0.451      1.87     0.241 8.09e- 1
##  8 factor(state)20    0.261      1.87     0.140 8.89e- 1
##  9 factor(state)21   -1.88       1.87    -1.00  3.16e- 1
## 10 factor(state)22    0.397      1.87     0.212 8.32e- 1
## # ℹ 35 more rows

Again, we use the vcovHC function to estimate the SE and obtain the 95% confidence interval for the policy coefficient:

m2_var <- vcovHC(s10_mod)
est2 <- summary(s10_mod)$coefficients["policy1", "Estimate"]
lb2 <- est2 - 1.96*sqrt(m2_var["policy1", "policy1"])
ub2 <- est2 + 1.96*sqrt(m2_var["policy1", "policy1"])
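Alternatively, the coefci() function from the lmtest package can compute the robust confidence interval directly from the same variance matrix. A minimal sketch, assuming lmtest is installed; note that coefci() uses a t reference distribution, so the bounds may differ slightly from the normal-approximation interval above:

library(lmtest)

# robust 95% CI for the policy coefficient using the HC variance estimate
coefci(s10_mod, parm = "policy1", vcov. = m2_var)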

The estimated effect is 7.6 with a 95% CI from 5.4 to 9.8. Let’s compare this with the estimate from Scenario 9:

              Estimate (95% CI)
Scenario 9    10.1 (6.9, 19.2)
Scenario 10   7.6 (5.4, 9.8)

The estimate for Scenario 10 is much lower than that for Scenario 9.

12.2 Comparison of Goodman Bacon decomposition weights

Let’s see how the weights from the Bacon decomposition for Scenario 10 compare to Scenario 9.

Scenario 10 weights:

all_states2 %<>% mutate(policy_n = as.numeric(policy))

s10_bacon <- bacon(outcome ~ policy_n,
      data = all_states2,
      id_var = "state",
      time_var = "time")
##                       type  weight  avg_est
## 1 Earlier vs Later Treated 0.07384  9.78211
## 2 Later vs Earlier Treated 0.13291 -9.69867
## 3     Treated vs Untreated 0.79325 10.32571
s10_bacon
##   treated untreated  estimate     weight                     type
## 2      15     99999 17.822890 0.33755274     Treated vs Untreated
## 3      22     99999  4.772245 0.45569620     Treated vs Untreated
## 6      22        15 -9.698670 0.13291139 Later vs Earlier Treated
## 8      15        22  9.782109 0.07383966 Earlier vs Later Treated

Scenario 9 weights:

s9_bacon
##   treated untreated  estimate     weight                     type
## 2      15     99999 17.488998 0.45390071     Treated vs Untreated
## 3      22     99999  4.715617 0.38297872     Treated vs Untreated
## 6      22        15 -9.698670 0.06382979 Later vs Earlier Treated
## 8      15        22  9.484617 0.09929078 Earlier vs Later Treated

Comparing the weights:

##   treated untreated estimate Scenario 9 weight Scenario 10 weight
## 1      15     99999    17.82              0.45               0.34
## 2      22     99999     4.77              0.38               0.46
## 3      22        15    -9.70              0.06               0.13
## 4      15        22     9.78              0.10               0.07
##   Weight difference                     type
## 1             -0.12     Treated vs Untreated
## 2              0.07     Treated vs Untreated
## 3              0.07 Later vs Earlier Treated
## 4             -0.03 Earlier vs Later Treated

In Scenario 10, more weight is put on the states treated at time 22 and less on the states treated at time 15 than in Scenario 9. This is because the TWFE estimator is a weighted combination of contrasts, and the weights partly depend on how close a group's treatment change falls to the middle of the panel: groups whose treatment turns on near the middle receive more weight. In Scenario 9, the states treated at time 15 change treatment exactly in the middle of the panel (which had 30 time units). In Scenario 10, with the shortened pre-treatment period, that group sits further from the middle of the panel and receives less weight, while the group that introduced treatment at time 22 is now closer to the middle and receives more.
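A back-of-the-envelope calculation illustrates the mechanism. For the treated-vs-untreated comparisons, the Goodman-Bacon weight on a timing group scales (among other things) with the variance of its treatment indicator across the panel, D-bar * (1 - D-bar), where D-bar is the share of panel periods the group spends treated; this variance is largest when treatment turns on in the middle of the panel. The sketch below computes that term for each treated group under the two panel lengths (assuming, as throughout this scenario, that treatment is in place from time_first_treat onward):

# share of periods each treated group spends treated, and the implied variance
# term (d_bar * (1 - d_bar)) that enters the Goodman-Bacon weights
bacon_variance <- function(dat) {
  dat %>%
    filter(time_first_treat > 0) %>%
    group_by(time_first_treat) %>%
    summarise(d_bar = mean(time >= time_first_treat), .groups = "drop") %>%
    mutate(variance_term = d_bar * (1 - d_bar))
}

bacon_variance(all_states)   # Scenario 9: the group treated at t = 15 sits mid-panel
bacon_variance(all_states2)  # Scenario 10: the group treated at t = 22 moves toward the middle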

This implicit weighting is unlikely to correspond to any weighting a researcher would choose deliberately. Concerning!

12.3 Comparison of Group-Time Estimates

Let’s see how the other estimators are affected by this change:

s10_cs <- att_gt(yname = "outcome", 
             tname = "time", 
             idname = "state", 
             gname = "time_first_treat", 
             data = all_states2, 
             control_group = "notyettreated",
             anticipation = 0)

s10_cs_ag <- aggte(s10_cs, type="simple")
summary(s10_cs_ag)
## 
## Call:
## aggte(MP = s10_cs, type = "simple")
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
## 
## 
##      ATT    Std. Error     [ 95%  Conf. Int.]  
##  12.9667        3.4804     6.1452     19.7882 *
## 
## 
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## Control Group:  Not Yet Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust

The Callaway–Sant'Anna estimate of the ATT is essentially the same as in Scenario 9 (with a slightly lower standard error).
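If we also want to compare the full event-study pattern rather than just the summary ATT, the same dynamic aggregation and plot used in Scenario 9 can be repeated on the Scenario 10 object. A minimal sketch:

# dynamic (event-study) aggregation for Scenario 10, for comparison with Scenario 9
s10_cs_dyn <- aggte(s10_cs, type = "dynamic")
ggdid(s10_cs_dyn)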

12.4 Comparison of Cohort ATT estimates

What about the Cohort ATT estimate?

s10_sa <- staggered_sa(df = all_states2,
                      i = "state",
                      t = "time",
                      g = "time_first_treat2",
                      y = "outcome", 
                      estimand = "simple")

s10_sa$estimate
## [1] 13.05514
#Scenario 9 confidence intervals, shown for comparison
round(c(s9_sa$estimate, s9_sa$estimate - 1.96*s9_sa$se, s9_sa$estimate + 1.96*s9_sa$se), 2)
## [1] 13.06  8.90 17.21
round(c(s9_sa$estimate, s9_sa$estimate - 1.96*s9_sa$se_neyman, s9_sa$estimate + 1.96*s9_sa$se_neyman), 2)
## [1] 13.06  8.71 17.40

Like the Group-Time ATT estimate, the Cohort ATT estimate stays exactly the same.

12.5 Comparison of the Target trial estimates

Lastly, what happens to the Target Trial estimates? We compare them below:

s9_bm_hte <- fit_event_jack_sum_hte(outcome_var = "outcome", 
                        date_var = "time3", 
                        unit_var = "state",
                        policy_var = "time_first_treat3", 
                        data = all_states, 
                        max_time_to = 10e7)
## Dropping 10 
## Dropping 11 
## Dropping 12 
## Dropping 13 
## Dropping 14 
## Dropping 15 
## Dropping 20 
## Dropping 21 
## Dropping 22 
## Dropping 23 
## Dropping 24 
## Dropping 25 
## Dropping 30 
## Dropping 31 
## Dropping 32 
## Dropping 33 
## Dropping 34 
## Dropping 35 
## Dropping 40 
## Dropping 41 
## Dropping 42 
## Dropping 43 
## Dropping 44 
## Dropping 45
s9_bm_hte %<>% mutate(lb = estimate - 1.96*se,
                      ub = estimate + 1.96*se)

s9_bm_hte
##         se estimate      lb       ub
## 1 3.243503 10.70791 4.35064 17.06517
s10_bm_hte <- fit_event_jack_sum_hte(outcome_var = "outcome", 
                        date_var = "time3", 
                        unit_var = "state",
                        policy_var = "time_first_treat3", 
                        data = all_states2, 
                        max_time_to = 10e7)
## Dropping 10 
## Dropping 11 
## Dropping 12 
## Dropping 13 
## Dropping 14 
## Dropping 15 
## Dropping 20 
## Dropping 21 
## Dropping 22 
## Dropping 23 
## Dropping 24 
## Dropping 25 
## Dropping 30 
## Dropping 31 
## Dropping 32 
## Dropping 33 
## Dropping 34 
## Dropping 35 
## Dropping 40 
## Dropping 41 
## Dropping 42 
## Dropping 43 
## Dropping 44 
## Dropping 45
s10_bm_hte %<>% mutate(lb = estimate - 1.96*se,
                      ub = estimate + 1.96*se)

s10_bm_hte
##         se estimate      lb       ub
## 1 3.243503 10.70791 4.35064 17.06517

The Target Trial estimates stay exactly the same as well.

12.6 Summary

The TWFE estimate is affected by the length of the panel. It places more weight on contrasts that are closer to the middle of the panel. The newer estimators do not have this issue.