Augmented Synthetic Control Method –
augsynth R package
knitr::opts_chunk$set(
warning = FALSE,
message = FALSE,
echo = TRUE,
dpi = 300,
tidy = "styler",
fig.width = 8,
fig.height = 5
)Import Packages
Check: augsynth📦 R package tutorial.
Data
To show the usage and features of augsynth, we’ll use
data on the impact of personal income tax cuts in Kansas that comes with
the AugSynth package. Our interest is in estimating the
effect of income tax cuts on gross state product (GSP) per capita.
## # A tibble: 6 × 32
## fips year qtr state gdp revenuepop rev_state_total rev_local_total
## <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1 1990 1 Alabama 71610 NA NA NA
## 2 1 1990 2 Alabama 72718. NA NA NA
## 3 1 1990 3 Alabama 73826. NA NA NA
## 4 1 1990 4 Alabama 74935. NA NA NA
## 5 1 1991 1 Alabama 76043 NA NA NA
## 6 1 1991 2 Alabama 77347. NA NA NA
## # ℹ 24 more variables: popestimate <dbl>, qtrly_estabs_count <dbl>,
## # month1_emplvl <dbl>, month2_emplvl <dbl>, month3_emplvl <dbl>,
## # total_qtrly_wages <dbl>, taxable_qtrly_wages <dbl>, avg_wkly_wage <dbl>,
## # year_qtr <dbl>, treated <dbl>, gdpcapita <dbl>, lngdp <dbl>,
## # lngdpcapita <dbl>, revstatecapita <dbl>, revlocalcapita <dbl>,
## # emplvl1capita <dbl>, emplvl2capita <dbl>, emplvl3capita <dbl>,
## # emplvlcapita <dbl>, totalwagescapita <dbl>, taxwagescapita <dbl>, …
## Rows: 5,250
## Columns: 32
## $ fips <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ year <dbl> 1990, 1990, 1990, 1990, 1991, 1991, 1991, 1991, 19…
## $ qtr <dbl> 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1,…
## $ state <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alaba…
## $ gdp <dbl> 71610.00, 72718.25, 73826.50, 74934.75, 76043.00, …
## $ revenuepop <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 1443.286, 1443.286…
## $ rev_state_total <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ rev_local_total <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ popestimate <dbl> 4050055, 4062330, 4074606, 4086881, 4099156, 41128…
## $ qtrly_estabs_count <dbl> 86586, 86188, 86799, 87915, 91626, 91880, 93501, 9…
## $ month1_emplvl <dbl> 1571410, 1599077, 1599355, 1612133, 1581388, 15988…
## $ month2_emplvl <dbl> 1572535, 1610310, 1604376, 1614907, 1580574, 16079…
## $ month3_emplvl <dbl> 1581994, 1618397, 1611040, 1615504, 1590768, 16110…
## $ total_qtrly_wages <dbl> 7832159837, 8107228382, 8102930357, 8725347965, 81…
## $ taxable_qtrly_wages <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ avg_wkly_wage <dbl> 382, 388, 388, 416, 396, 404, 406, 432, 414, 416, …
## $ year_qtr <dbl> 1990.00, 1990.25, 1990.50, 1990.75, 1991.00, 1991.…
## $ treated <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ gdpcapita <dbl> 17681.24, 17900.62, 18118.69, 18335.44, 18550.89, …
## $ lngdp <dbl> 11.17899, 11.19435, 11.20947, 11.22437, 11.23905, …
## $ lngdpcapita <dbl> 9.780260, 9.792591, 9.804699, 9.816591, 9.828273, …
## $ revstatecapita <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ revlocalcapita <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ emplvl1capita <dbl> 0.3879972, 0.3936354, 0.3925178, 0.3944654, 0.3857…
## $ emplvl2capita <dbl> 0.3882750, 0.3964006, 0.3937500, 0.3951441, 0.3855…
## $ emplvl3capita <dbl> 0.3906105, 0.3983913, 0.3953855, 0.3952902, 0.3880…
## $ emplvlcapita <dbl> 0.3889609, 0.3961424, 0.3938844, 0.3949666, 0.3864…
## $ totalwagescapita <dbl> 1933.840, 1995.709, 1988.642, 2134.965, 1992.086, …
## $ taxwagescapita <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ avgwklywagecapita <dbl> 382, 388, 388, 416, 396, 404, 406, 432, 414, 416, …
## $ estabscapita <dbl> 0.02137897, 0.02121639, 0.02130243, 0.02151152, 0.…
## $ abb <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "A…
kansas %>%
select(year, qtr, year_qtr, state, treated, gdp, lngdpcapita) %>%
filter(state == "Kansas" & year_qtr >= 2012 & year_qtr < 2013)## # A tibble: 4 × 7
## year qtr year_qtr state treated gdp lngdpcapita
## <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
## 1 2012 1 2012 Kansas 0 143844 10.8
## 2 2012 2 2012. Kansas 1 141518 10.8
## 3 2012 3 2012. Kansas 1 138890 10.8
## 4 2012 4 2013. Kansas 1 139603 10.8
🎯 Data Requirement
The data should be long panel with columns:
- unit
- time
- outcome
- treatment
In this example,
- unit =
fips - time =
year_qtr - outcome =
lngdpcapita - treatment =
treated
Visusalization
Check treatment status
# check treatment status
panelView::panelview(
data = kansas,
formula = lngdpcapita ~ treated,
index = c("fips", "year_qtr"), # index = (unit, time)
type = "treat",
axis.lab = "time",
xlab = "Time",
ylab = "Unit",
axis.adjust = TRUE,
by.timing = TRUE,
background = "white",
main = "Treatment Status"
)Check outcome variable
# check outcome variable
panelView::panelview(
data = kansas,
formula = lngdpcapita ~ treated,
index = c("fips", "year_qtr"), # index = (unit, time)
type = "outcome", # change this to outcome
axis.lab = "time",
xlab = "Time",
ylab = "Unit",
axis.adjust = TRUE,
by.timing = TRUE,
background = "white",
main = "Outcome variable"
)Synthetic Control
Now to find a synthetic control using the entire series of
pre-intervention outcomes (and no auxiliary covariates), we can use
augsynth. To do so we just need to do the following:
Specify a formula:
outcome ~ treatmentSpecify the unit variable and time variable. Note: optionally provide when intervention took place (the code will automatically determine this if
t_intis not provided)For synthetic control, we need to specify that we don’t want to fit an outcome model
syn <- augsynth(
form = lngdpcapita ~ treated,
unit = fips,
time = year_qtr,
data = kansas,
progfunc = "None", # we don't want to fit an outcome model
scm = TRUE # whether the SCM weighting function is used
)
# summary
res <- summary(syn)
res##
## Call:
## single_augsynth(form = form, unit = !!enquo(unit), time = !!enquo(time),
## t_int = t_int, data = data, progfunc = "None", scm = TRUE)
##
## Average ATT Estimate (p Value for Joint Null): -0.0294 ( 0.33 )
## L2 Imbalance: 0.083
## Percent improvement from uniform weights: 79.5%
##
## Avg Estimated Bias: NA
##
## Inference type: Conformal inference
##
## Time Estimate 95% CI Lower Bound 95% CI Upper Bound p Value
## 2012.25 -0.018 -0.045 0.008 0.140
## 2012.50 -0.041 -0.068 -0.007 0.019
## 2012.75 -0.033 -0.062 -0.004 0.041
## 2013.00 -0.019 -0.046 0.010 0.120
## 2013.25 -0.029 -0.053 0.000 0.043
## 2013.50 -0.046 -0.073 -0.022 0.019
## 2013.75 -0.032 -0.056 -0.008 0.021
## 2014.00 -0.045 -0.074 -0.018 0.020
## 2014.25 -0.043 -0.074 -0.011 0.018
## 2014.50 -0.029 -0.066 0.000 0.034
## 2014.75 -0.018 -0.053 0.016 0.142
## 2015.00 -0.029 -0.074 0.010 0.075
## 2015.25 -0.019 -0.058 0.013 0.116
## 2015.50 -0.022 -0.063 0.007 0.104
## 2015.75 -0.019 -0.060 0.013 0.164
## 2016.00 -0.028 -0.072 0.008 0.090
## Value Estimate Std.Error p_val lower_bound
## 1 Average Post-Treatment Effect -0.02943473 NA 0.326 NA
## upper_bound
## 1 NA
## Time Estimate lower_bound upper_bound p_val
## 2014.75 2014.75 -0.01849616 -0.05262764 0.015635328 0.142
## 2015 2015.00 -0.02930564 -0.07355016 0.009882359 0.075
## 2015.25 2015.25 -0.01908377 -0.05827177 0.012519454 0.116
## 2015.5 2015.50 -0.02162342 -0.06333968 0.007451545 0.104
## 2015.75 2015.75 -0.01856676 -0.06028302 0.013036471 0.164
## 2016 2016.00 -0.02816962 -0.07241414 0.008490122 0.090
We can obtain:
ATT estimates for each post-intervention time period and overall
The quality of the synthetic control fit measured by the L2 distance between Kansas and its synthetic control
The percent improvement over uniform weights
By default, we’ll also see point-wise confidence intervals using a conformal inference procedure.
Statistical Inference
The default test statistic is the sum of the absolute treatment
effects function(x) sum(abs(x)). We can change the test
statistic via the stat_func argument. For instance, if we
want to perform a one-way test against positive effects, we can set the
test statistic to be the negative sum
function(x) -sum(x):
##
## Call:
## single_augsynth(form = form, unit = !!enquo(unit), time = !!enquo(time),
## t_int = t_int, data = data, progfunc = "None", scm = TRUE)
##
## Average ATT Estimate (p Value for Joint Null): -0.0294 ( 0.15 )
## L2 Imbalance: 0.083
## Percent improvement from uniform weights: 79.5%
##
## Avg Estimated Bias: NA
##
## Inference type: Conformal inference
##
## Time Estimate 95% CI Lower Bound 95% CI Upper Bound p Value
## 2012.25 -0.018 -0.080 0.006 0.057
## 2012.50 -0.041 -0.103 -0.015 0.015
## 2012.75 -0.033 -0.095 -0.007 0.020
## 2013.00 -0.019 -0.081 0.002 0.047
## 2013.25 -0.029 -0.091 -0.005 0.023
## 2013.50 -0.046 -0.108 -0.022 0.022
## 2013.75 -0.032 -0.094 -0.010 0.023
## 2014.00 -0.045 -0.107 -0.018 0.017
## 2014.25 -0.043 -0.105 -0.011 0.023
## 2014.50 -0.029 -0.091 0.000 0.029
## 2014.75 -0.018 -0.080 0.011 0.087
## 2015.00 -0.029 -0.091 0.005 0.057
## 2015.25 -0.019 -0.081 0.005 0.074
## 2015.50 -0.022 -0.084 0.007 0.056
## 2015.75 -0.019 -0.081 0.013 0.115
## 2016.00 -0.028 -0.090 0.008 0.071
Or if we want to prioritize testing the average post-treatment effect, we can set it to be the absolute sum:
##
## Call:
## single_augsynth(form = form, unit = !!enquo(unit), time = !!enquo(time),
## t_int = t_int, data = data, progfunc = "None", scm = TRUE)
##
## Average ATT Estimate (p Value for Joint Null): -0.0294 ( 0.3 )
## L2 Imbalance: 0.083
## Percent improvement from uniform weights: 79.5%
##
## Avg Estimated Bias: NA
##
## Inference type: Conformal inference
##
## Time Estimate 95% CI Lower Bound 95% CI Upper Bound p Value
## 2012.25 -0.018 -0.045 0.006 0.114
## 2012.50 -0.041 -0.070 -0.015 0.027
## 2012.75 -0.033 -0.060 -0.004 0.041
## 2013.00 -0.019 -0.046 0.005 0.106
## 2013.25 -0.029 -0.053 -0.005 0.043
## 2013.50 -0.046 -0.073 -0.015 0.018
## 2013.75 -0.032 -0.056 -0.005 0.018
## 2014.00 -0.045 -0.074 -0.016 0.024
## 2014.25 -0.043 -0.074 -0.006 0.017
## 2014.50 -0.029 -0.061 0.000 0.043
## 2014.75 -0.018 -0.053 0.011 0.122
## 2015.00 -0.029 -0.074 0.010 0.085
## 2015.25 -0.019 -0.056 0.010 0.113
## 2015.50 -0.022 -0.056 0.007 0.108
## 2015.75 -0.019 -0.055 0.013 0.192
## 2016.00 -0.028 -0.077 0.008 0.096
Visusalize Results
It’s easier to see this information visually. Below we plot the difference between Kansas and it’s synthetic control. Before the tax cuts (to the left of the dashed line) we expect these to be close, and after the tax cuts we measure the effect (with point-wise confidence intervals).
We can also compute point-wise confidence intervals using the Jackknife+ procedure by
changing the inf_type argument, although this requires
additional assumptions.
🎯 Augmenting Synthetic Control with Outcome Model
In this example the pre-intervention synthetic control fit has an L2 imbalance of 0.083, about 20% of the imbalance between Kansas and the average of the other states. We can reduce this by augmenting synthetic control with ridge regression.
To do this we do the following:
Change
progfuncto"Ridge"We can also choose the ridge hyper-parameter by setting
lambda, while not specifyinglambdawill determine one through cross validation
asyn <- augsynth(
form = lngdpcapita ~ treated,
unit = fips,
time = year_qtr,
data = kansas,
progfunc = "Ridge", # outcome model
scm = TRUE
)We can plot the cross-validation MSE when dropping pre-treatment time
periods by setting cv = T in the plot
function:
By default, the CV procedure chooses the maximal value of
lambda with MSE within one standard deviation of the
minimal MSE. To instead choose the lambda that minizes the
cross validation MSE, set min_1se = FALSE.
We can look at the summary. Now in the summary output we see an estimate of the overall bias of synth; we measure this with the average amount that augmentation changes the synth estimate. Notice that the estimates become somewhat larger in magnitude, and the standard errors are tighter.
##
## Call:
## single_augsynth(form = form, unit = !!enquo(unit), time = !!enquo(time),
## t_int = t_int, data = data, progfunc = "None", scm = TRUE)
##
## Average ATT Estimate (p Value for Joint Null): -0.0294 ( 0.33 )
## L2 Imbalance: 0.083
## Percent improvement from uniform weights: 79.5%
##
## Avg Estimated Bias: NA
##
## Inference type: Conformal inference
##
## Time Estimate 95% CI Lower Bound 95% CI Upper Bound p Value
## 2012.25 -0.018 -0.045 0.008 0.140
## 2012.50 -0.041 -0.068 -0.007 0.019
## 2012.75 -0.033 -0.062 -0.004 0.041
## 2013.00 -0.019 -0.046 0.010 0.120
## 2013.25 -0.029 -0.053 0.000 0.043
## 2013.50 -0.046 -0.073 -0.022 0.019
## 2013.75 -0.032 -0.056 -0.008 0.021
## 2014.00 -0.045 -0.074 -0.018 0.020
## 2014.25 -0.043 -0.074 -0.011 0.018
## 2014.50 -0.029 -0.066 0.000 0.034
## 2014.75 -0.018 -0.053 0.016 0.142
## 2015.00 -0.029 -0.074 0.010 0.075
## 2015.25 -0.019 -0.058 0.013 0.116
## 2015.50 -0.022 -0.063 0.007 0.104
## 2015.75 -0.019 -0.060 0.013 0.164
## 2016.00 -0.028 -0.072 0.008 0.090
# compare att
rbind(res2$average_att, res$average_att) %>%
tibble() %>%
transmute(model = c("ASC", "SC"), Estimate, p_val)## # A tibble: 2 × 3
## model Estimate p_val
## <chr> <dbl> <dbl>
## 1 ASC -0.0401 0.083
## 2 SC -0.0294 0.326
# compare L2 imbalance
rbind(res2$l2_imbalance, res$l2_imbalance) %>%
as.vector() %>%
tibble() %>%
rename(L2_imbalance = 1) %>%
mutate(model = c("ASC", "SC"), .before = 1)## # A tibble: 2 × 2
## model L2_imbalance
## <chr> <dbl>
## 1 ASC 0.0615
## 2 SC 0.0826
Adding Covariates
We can add the covariates into the formula after |.
By default this will create time invariant covariates by averaging
the auxiliary covariates over the pre-intervention period, dropping
NA values.
We can use a custom aggregation function by setting the
cov_agg argument. Then the lagged outcomes and the
auxiliary covariates are jointly balanced by SCM and the ridge outcome
model includes both.
covsyn <- augsynth(
form = lngdpcapita ~ treated | lngdpcapita + log(revstatecapita) +
log(revlocalcapita) + log(avgwklywagecapita) +
estabscapita + emplvlcapita, # now adding covariates
unit = fips,
time = year_qtr,
data = kansas,
progfunc = "ridge",
scm = TRUE
)Again we can look at the summary and plot the results.
##
## Call:
## single_augsynth(form = form, unit = !!enquo(unit), time = !!enquo(time),
## t_int = t_int, data = data, progfunc = "ridge", scm = TRUE)
##
## Average ATT Estimate (p Value for Joint Null): -0.0609 ( 0.14 )
## L2 Imbalance: 0.054
## Percent improvement from uniform weights: 86.6%
##
## Covariate L2 Imbalance: 0.005
## Percent improvement from uniform weights: 97.7%
##
## Avg Estimated Bias: 0.027
##
## Inference type: Conformal inference
##
## Time Estimate 95% CI Lower Bound 95% CI Upper Bound p Value
## 2012.25 -0.021 -0.044 0.002 0.065
## 2012.50 -0.047 -0.081 -0.014 0.026
## 2012.75 -0.050 -0.083 -0.007 0.024
## 2013.00 -0.045 -0.074 -0.012 0.036
## 2013.25 -0.055 -0.088 -0.022 0.020
## 2013.50 -0.071 -0.110 -0.028 0.020
## 2013.75 -0.058 -0.091 -0.025 0.030
## 2014.00 -0.081 -0.119 -0.043 0.021
## 2014.25 -0.078 -0.121 -0.029 0.021
## 2014.50 -0.065 -0.114 -0.011 0.035
## 2014.75 -0.057 -0.110 -0.013 0.039
## 2015.00 -0.075 -0.124 -0.022 0.042
## 2015.25 -0.063 -0.106 -0.014 0.037
## 2015.50 -0.067 -0.106 -0.019 0.023
## 2015.75 -0.063 -0.101 -0.019 0.025
## 2016.00 -0.078 -0.127 -0.019 0.045
Now we can additionally fit ridge ASCM on the residuals, look at the summary, and plot the results.
covsyn_resid <- augsynth(
form = lngdpcapita ~ treated | lngdpcapita + log(revstatecapita) +
log(revlocalcapita) + log(avgwklywagecapita) +
estabscapita + emplvlcapita,
unit = fips,
time = year_qtr,
data = kansas,
progfunc = "ridge",
scm = TRUE,
lambda = asyn$lambda,
residualize = T
)##
## Call:
## single_augsynth(form = form, unit = !!enquo(unit), time = !!enquo(time),
## t_int = t_int, data = data, progfunc = "ridge", scm = TRUE,
## lambda = ..3, residualize = ..4)
##
## Average ATT Estimate (p Value for Joint Null): -0.0548 ( 0.27 )
## L2 Imbalance: 0.067
## Percent improvement from uniform weights: 83.4%
##
## Covariate L2 Imbalance: 0.000
## Percent improvement from uniform weights: 100%
##
## Avg Estimated Bias: 0.006
##
## Inference type: Conformal inference
##
## Time Estimate 95% CI Lower Bound 95% CI Upper Bound p Value
## 2012.25 -0.025 -0.046 0.000 0.044
## 2012.50 -0.051 -0.076 -0.026 0.006
## 2012.75 -0.045 -0.070 -0.020 0.015
## 2013.00 -0.044 -0.069 -0.019 0.006
## 2013.25 -0.051 -0.077 -0.026 0.015
## 2013.50 -0.069 -0.094 -0.048 0.016
## 2013.75 -0.051 -0.072 -0.031 0.016
## 2014.00 -0.069 -0.095 -0.040 0.014
## 2014.25 -0.067 -0.097 -0.037 0.012
## 2014.50 -0.053 -0.083 -0.024 0.011
## 2014.75 -0.045 -0.075 -0.015 0.029
## 2015.00 -0.064 -0.089 -0.034 0.005
## 2015.25 -0.051 -0.076 -0.026 0.004
## 2015.50 -0.059 -0.089 -0.034 0.013
## 2015.75 -0.058 -0.087 -0.023 0.011
## 2016.00 -0.074 -0.103 -0.044 0.010
Difference Outcome Models
Finally, we can augment synth with many different outcome models. The
simplest outcome model is a unit fixed effect model, which we can
include by setting fixedeff = T.
desyn <- augsynth(lngdpcapita ~ treated,
fips, year_qtr, kansas,
progfunc = "none", scm = T,
fixedeff = T
)We can incorporate other outcome models by changing the
progfunc. Several outcome models are available, including,
fitting the factor model directly with gsynth, general
elastic net regression, Bayesian structural time series estimation with
CausalImpact, and matrix completion with
MCPanel. For each outcome model you can supply an optional
set of parameters, see documentation for details.