In this application, I replicate parts of Arkhangelsky, Athey, Hirshberg, Imbens, and Wager (2021), “Synthetic Difference-in-Differences,” American Economic Review 111 (12): 4088-4118. This is purely a replication exercise.
# Loading packages
knitr::opts_chunk$set(echo = TRUE, eval=TRUE, message=FALSE, warning=FALSE, fig.height=4)
necessaryPackages <- c("foreign","reshape","rvest","tidyverse","dplyr","stringr","ggplot2","stargazer","readr","haven","Synth","devtools","SCtools","augsynth","synthdid")
# SCtools, augsynth, and synthdid are installed from GitHub further down, so skip them here
githubPackages <- c("SCtools","augsynth","synthdid")
new.packages <- setdiff(necessaryPackages, c(githubPackages, installed.packages()[,"Package"]))
if(length(new.packages)) install.packages(new.packages)
invisible(lapply(necessaryPackages, require, character.only = TRUE)) # invisible() suppresses the printed list of TRUE/FALSE load results
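# Install the GitHub packages if they are missing, then load them
if(!require(SCtools)) { devtools::install_github("bcastanho/SCtools"); library(SCtools) }
if(!require(augsynth)) { devtools::install_github("ebenmichael/augsynth"); library(augsynth) }
if(!require(synthdid)) { devtools::install_github("synth-inference/synthdid"); library(synthdid) }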
# Importing the dataset
data("california_prop99")
# Describing the dataset and setting it up as a panel
applicationdata = panel.matrices(california_prop99)
summary(applicationdata)
##    Length Class  Mode
## Y  1209   -none- numeric
## N0    1   -none- numeric
## T0    1   -none- numeric
## W  1209   -none- numeric
str(applicationdata)
## List of 4
## $ Y : num [1:39, 1:31] 89.8 100.3 124.8 120 155 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:39] "Alabama" "Arkansas" "Colorado" "Connecticut" ...
## .. ..$ : chr [1:31] "1970" "1971" "1972" "1973" ...
## $ N0: int 38
## $ T0: num 19
## $ W : int [1:39, 1:31] 0 0 0 0 0 0 0 0 0 0 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:39] "Alabama" "Arkansas" "Colorado" "Connecticut" ...
## .. ..$ : chr [1:31] "1970" "1971" "1972" "1973" ...
The data source is Abadie, Diamond, and Hainmueller (2010), “Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program,” Journal of the American Statistical Association 105 (490): 493-505.
The dataset is a panel of 39 units (states) observed over 31 time periods (years 1970 through 2000). The treated unit is California, and the control units are the remaining 38 states. There are 19 pre-treatment periods (years 1970 through 1988) and 12 post-treatment periods (years 1989 through 2000).
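The layout that `panel.matrices` returns (an outcome matrix `Y`, a treatment matrix `W`, and the counts `N0` and `T0`) can be illustrated on a toy panel in base R. This is a minimal sketch, not the package's implementation: the data frame `long`, its column names, and the outcome values are all hypothetical placeholders.

```r
# Toy long-format panel: 3 units, 4 years; unit "C" is treated from 2003 on.
# All names and values here are made up for illustration.
long <- expand.grid(unit = c("A", "B", "C"), year = 2001:2004,
                    stringsAsFactors = FALSE)
long$treated <- as.integer(long$unit == "C" & long$year >= 2003)
long$y <- seq_len(nrow(long))  # placeholder outcome values

# Outcome and treatment matrices: one row per unit, one column per year.
Y <- tapply(long$y, list(long$unit, long$year), sum)
W <- tapply(long$treated, list(long$unit, long$year), sum)

# N0 = number of never-treated units, T0 = number of pre-treatment periods.
N0 <- sum(rowSums(W) == 0)
T0 <- sum(colSums(W) == 0)

dim(Y)  # units x periods
c(N0 = N0, T0 = T0)
```

In the California data the same quantities are 39 x 31 for `Y`, with `N0` equal to 38 and `T0` equal to 19.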
# SDID-estimated average treatment effect for the treated
tau.hat = synthdid_estimate(applicationdata$Y, applicationdata$N0, applicationdata$T0)
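se = sqrt(vcov(tau.hat, method='placebo')) # Placebo variance: the method that works with a single treated unit; it relies on random draws, so the value can vary slightly between runs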
sprintf('SDID Estimate: %1.2f', tau.hat) # It matches -15.6 in the second column and second row of Table 1 in Arkhangelsky, Athey, Hirshberg, Imbens, and Wager (2021)
## [1] "SDID Estimate: -15.60"
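For a design with one block of treated units and one block of post-treatment periods, the point estimate can be read as a weighted double difference: treated minus weighted controls, and post-treatment average minus weighted pre-treatment periods. The sketch below assumes the unit weights `omega` and time weights `lambda` are already given. The matrix and the weights are made up, and `weighted_did` is a hypothetical helper rather than a function from `synthdid`, which estimates the weights from the pre-treatment data. With uniform weights the expression reduces to ordinary difference-in-differences.

```r
# Toy outcome matrix: 3 control units (rows 1-3), 1 treated unit (row 4),
# 3 pre-treatment periods and 2 post-treatment periods. Values are made up.
Y <- rbind(c(10, 11, 12, 13, 14),
           c(20, 21, 22, 23, 24),
           c(30, 31, 32, 33, 34),
           c(15, 16, 17, 13, 14))
N0 <- 3
T0 <- 3

# Hypothetical helper: weighted double difference for a block design.
weighted_did <- function(Y, N0, T0, omega, lambda) {
  N1 <- nrow(Y) - N0
  T1 <- ncol(Y) - T0
  unit_contrast <- c(-omega, rep(1 / N1, N1))   # treated minus weighted controls
  time_contrast <- c(-lambda, rep(1 / T1, T1))  # post minus weighted pre
  drop(unit_contrast %*% Y %*% time_contrast)
}

# Uniform weights: this is plain difference-in-differences.
did_uniform <- weighted_did(Y, N0, T0,
                            omega = rep(1 / N0, N0),
                            lambda = rep(1 / T0, T0))
did_uniform
```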
sprintf('SDID Standard error: %1.2f', se) # It is close to 8.4 in the second column and third row of Table 1 in Arkhangelsky, Athey, Hirshberg, Imbens, and Wager (2021)
## [1] "SDID Standard error: 8.47"
sprintf('95%% CI (%1.2f, %1.2f)', tau.hat - 1.96 * se, tau.hat + 1.96 * se)
## [1] "95% CI (-32.20, 0.99)"
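The idea behind a placebo standard error can be sketched in base R: pretend that a control unit was treated, re-estimate on the control-only panel, and use the spread of those placebo estimates as the measure of noise. The sketch below makes two simplifications that are assumptions of this example and not the package's procedure: it uses plain difference-in-differences as a stand-in estimator (the hypothetical helper `did`), and it cycles through every control unit once instead of drawing random placebo assignments. The data are simulated.

```r
set.seed(1)  # the toy data are random; fix the seed for reproducibility

# Toy control-only panel: 10 units, 4 pre-treatment and 2 post-treatment periods.
N0 <- 10
T0 <- 4
T1 <- 2
Y_co <- matrix(rnorm(N0 * (T0 + T1)), nrow = N0)

# Stand-in estimator (hypothetical): plain DID with one placebo-treated unit.
did <- function(Y, treated_row, T0) {
  post <- seq_len(ncol(Y)) > T0
  gap <- Y[treated_row, ] - colMeans(Y[-treated_row, , drop = FALSE])
  mean(gap[post]) - mean(gap[!post])
}

# Assign the placebo treatment to each control unit in turn and re-estimate.
placebo_estimates <- vapply(seq_len(N0),
                            function(i) did(Y_co, i, T0),
                            numeric(1))

# Placebo standard error: spread of the placebo estimates around their mean.
se_placebo <- sqrt(mean((placebo_estimates - mean(placebo_estimates))^2))
se_placebo
```

Since no unit in the control-only panel was actually treated, any nonzero placebo estimate reflects noise, which is why its spread is used to build the interval reported above.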
plot(tau.hat) # It matches the plot in the first row and third column of Figure 1 in Arkhangelsky, Athey, Hirshberg, Imbens, and Wager (2021)
# Control unit weights and contribution plot
top18unitwgt = synthdid_controls(tau.hat)[1:18, , drop=FALSE]
synthdid_units_plot(tau.hat, units = rownames(top18unitwgt)) # It matches the plot in the second row and third column of Figure 1 in Arkhangelsky, Athey, Hirshberg, Imbens, and Wager (2021)
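The selection step above keeps the control units with the largest weights and passes their names to the plotting function. A minimal base R sketch of that ranking logic follows; the weight vector `omega` and its unit names are hypothetical, not the weights estimated for the California data.

```r
# Hypothetical unit weights (non-negative, summing to one).
omega <- c(unit_a = 0.05, unit_b = 0.40, unit_c = 0.00,
           unit_d = 0.30, unit_e = 0.25)

# Rank units by weight and keep the three largest as a one-column matrix,
# so that the unit names are available as row names.
ranked <- sort(omega, decreasing = TRUE)
top3 <- as.matrix(head(ranked, 3))

rownames(top3)  # names of the highest-weight units
sum(top3)       # share of the total weight carried by those units
```

Checking the share of total weight carried by the selected units is a quick way to judge whether the units left out of the plot matter.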