Augmented Synthetic Control Method – augsynth R package

knitr::opts_chunk$set(
  warning = FALSE, 
  message = FALSE,
  echo = TRUE, 
  dpi = 300, 
  tidy = "styler", 
  fig.width = 8, 
  fig.height = 5
)

Import Packages

Check: augsynth📦 R package tutorial.

## Install devtools if noy already installed
# install.packages("devtools", repos='http://cran.us.r-project.org')

## Install augsynth from github
# devtools::install_github("ebenmichael/augsynth")

library(augsynth)
library(tidyverse)
library(fect)

Data

To show the usage and features of augsynth, we’ll use data on the impact of personal income tax cuts in Kansas that comes with the AugSynth package. Our interest is in estimating the effect of income tax cuts on gross state product (GSP) per capita.

# import data
data(kansas)

# check
head(kansas)
## # A tibble: 6 × 32
##    fips  year   qtr state      gdp revenuepop rev_state_total rev_local_total
##   <dbl> <dbl> <dbl> <chr>    <dbl>      <dbl>           <dbl>           <dbl>
## 1     1  1990     1 Alabama 71610          NA              NA              NA
## 2     1  1990     2 Alabama 72718.         NA              NA              NA
## 3     1  1990     3 Alabama 73826.         NA              NA              NA
## 4     1  1990     4 Alabama 74935.         NA              NA              NA
## 5     1  1991     1 Alabama 76043          NA              NA              NA
## 6     1  1991     2 Alabama 77347.         NA              NA              NA
## # ℹ 24 more variables: popestimate <dbl>, qtrly_estabs_count <dbl>,
## #   month1_emplvl <dbl>, month2_emplvl <dbl>, month3_emplvl <dbl>,
## #   total_qtrly_wages <dbl>, taxable_qtrly_wages <dbl>, avg_wkly_wage <dbl>,
## #   year_qtr <dbl>, treated <dbl>, gdpcapita <dbl>, lngdp <dbl>,
## #   lngdpcapita <dbl>, revstatecapita <dbl>, revlocalcapita <dbl>,
## #   emplvl1capita <dbl>, emplvl2capita <dbl>, emplvl3capita <dbl>,
## #   emplvlcapita <dbl>, totalwagescapita <dbl>, taxwagescapita <dbl>, …
glimpse(kansas)
## Rows: 5,250
## Columns: 32
## $ fips                <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ year                <dbl> 1990, 1990, 1990, 1990, 1991, 1991, 1991, 1991, 19…
## $ qtr                 <dbl> 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1,…
## $ state               <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alaba…
## $ gdp                 <dbl> 71610.00, 72718.25, 73826.50, 74934.75, 76043.00, …
## $ revenuepop          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 1443.286, 1443.286…
## $ rev_state_total     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ rev_local_total     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ popestimate         <dbl> 4050055, 4062330, 4074606, 4086881, 4099156, 41128…
## $ qtrly_estabs_count  <dbl> 86586, 86188, 86799, 87915, 91626, 91880, 93501, 9…
## $ month1_emplvl       <dbl> 1571410, 1599077, 1599355, 1612133, 1581388, 15988…
## $ month2_emplvl       <dbl> 1572535, 1610310, 1604376, 1614907, 1580574, 16079…
## $ month3_emplvl       <dbl> 1581994, 1618397, 1611040, 1615504, 1590768, 16110…
## $ total_qtrly_wages   <dbl> 7832159837, 8107228382, 8102930357, 8725347965, 81…
## $ taxable_qtrly_wages <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ avg_wkly_wage       <dbl> 382, 388, 388, 416, 396, 404, 406, 432, 414, 416, …
## $ year_qtr            <dbl> 1990.00, 1990.25, 1990.50, 1990.75, 1991.00, 1991.…
## $ treated             <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ gdpcapita           <dbl> 17681.24, 17900.62, 18118.69, 18335.44, 18550.89, …
## $ lngdp               <dbl> 11.17899, 11.19435, 11.20947, 11.22437, 11.23905, …
## $ lngdpcapita         <dbl> 9.780260, 9.792591, 9.804699, 9.816591, 9.828273, …
## $ revstatecapita      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ revlocalcapita      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ emplvl1capita       <dbl> 0.3879972, 0.3936354, 0.3925178, 0.3944654, 0.3857…
## $ emplvl2capita       <dbl> 0.3882750, 0.3964006, 0.3937500, 0.3951441, 0.3855…
## $ emplvl3capita       <dbl> 0.3906105, 0.3983913, 0.3953855, 0.3952902, 0.3880…
## $ emplvlcapita        <dbl> 0.3889609, 0.3961424, 0.3938844, 0.3949666, 0.3864…
## $ totalwagescapita    <dbl> 1933.840, 1995.709, 1988.642, 2134.965, 1992.086, …
## $ taxwagescapita      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ avgwklywagecapita   <dbl> 382, 388, 388, 416, 396, 404, 406, 432, 414, 416, …
## $ estabscapita        <dbl> 0.02137897, 0.02121639, 0.02130243, 0.02151152, 0.…
## $ abb                 <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "A…
kansas %>%
  select(year, qtr, year_qtr, state, treated, gdp, lngdpcapita) %>%
  filter(state == "Kansas" & year_qtr >= 2012 & year_qtr < 2013)
## # A tibble: 4 × 7
##    year   qtr year_qtr state  treated    gdp lngdpcapita
##   <dbl> <dbl>    <dbl> <chr>    <dbl>  <dbl>       <dbl>
## 1  2012     1    2012  Kansas       0 143844        10.8
## 2  2012     2    2012. Kansas       1 141518        10.8
## 3  2012     3    2012. Kansas       1 138890        10.8
## 4  2012     4    2013. Kansas       1 139603        10.8

🎯 Data Requirement

The data should be long panel with columns:

  • unit
  • time
  • outcome
  • treatment

In this example,

  • unit = fips
  • time = year_qtr
  • outcome = lngdpcapita
  • treatment = treated

Visusalization

Check treatment status

# check treatment status
panelView::panelview(
  data = kansas,
  formula = lngdpcapita ~ treated,
  index = c("fips", "year_qtr"), # index = (unit, time)
  type = "treat",
  axis.lab = "time",
  xlab = "Time",
  ylab = "Unit",
  axis.adjust = TRUE,
  by.timing = TRUE,
  background = "white",
  main = "Treatment Status"
)

Check outcome variable

# check outcome variable
panelView::panelview(
  data = kansas,
  formula = lngdpcapita ~ treated,
  index = c("fips", "year_qtr"), # index = (unit, time)
  type = "outcome", # change this to outcome
  axis.lab = "time",
  xlab = "Time",
  ylab = "Unit",
  axis.adjust = TRUE,
  by.timing = TRUE,
  background = "white",
  main = "Outcome variable"
)

Synthetic Control

Now to find a synthetic control using the entire series of pre-intervention outcomes (and no auxiliary covariates), we can use augsynth. To do so we just need to do the following:

  1. Specify a formula: outcome ~ treatment

  2. Specify the unit variable and time variable. Note: optionally provide when intervention took place (the code will automatically determine this if t_int is not provided)

  3. For synthetic control, we need to specify that we don’t want to fit an outcome model

syn <- augsynth(
  form = lngdpcapita ~ treated,
  unit = fips,
  time = year_qtr,
  data = kansas,
  progfunc = "None", # we don't want to fit an outcome model
  scm = TRUE # whether the SCM weighting function is used
)

# summary
res <- summary(syn)

res
## 
## Call:
## single_augsynth(form = form, unit = !!enquo(unit), time = !!enquo(time), 
##     t_int = t_int, data = data, progfunc = "None", scm = TRUE)
## 
## Average ATT Estimate (p Value for Joint Null):  -0.0294   ( 0.33 )
## L2 Imbalance: 0.083
## Percent improvement from uniform weights: 79.5%
## 
## Avg Estimated Bias: NA
## 
## Inference type: Conformal inference
## 
##     Time Estimate 95% CI Lower Bound 95% CI Upper Bound p Value
##  2012.25   -0.018             -0.045              0.008   0.140
##  2012.50   -0.041             -0.068             -0.007   0.019
##  2012.75   -0.033             -0.062             -0.004   0.041
##  2013.00   -0.019             -0.046              0.010   0.120
##  2013.25   -0.029             -0.053              0.000   0.043
##  2013.50   -0.046             -0.073             -0.022   0.019
##  2013.75   -0.032             -0.056             -0.008   0.021
##  2014.00   -0.045             -0.074             -0.018   0.020
##  2014.25   -0.043             -0.074             -0.011   0.018
##  2014.50   -0.029             -0.066              0.000   0.034
##  2014.75   -0.018             -0.053              0.016   0.142
##  2015.00   -0.029             -0.074              0.010   0.075
##  2015.25   -0.019             -0.058              0.013   0.116
##  2015.50   -0.022             -0.063              0.007   0.104
##  2015.75   -0.019             -0.060              0.013   0.164
##  2016.00   -0.028             -0.072              0.008   0.090
res$average_att
##                           Value    Estimate Std.Error p_val lower_bound
## 1 Average Post-Treatment Effect -0.02943473        NA 0.326          NA
##   upper_bound
## 1          NA
tail(res$att)
##            Time    Estimate lower_bound upper_bound p_val
## 2014.75 2014.75 -0.01849616 -0.05262764 0.015635328 0.142
## 2015    2015.00 -0.02930564 -0.07355016 0.009882359 0.075
## 2015.25 2015.25 -0.01908377 -0.05827177 0.012519454 0.116
## 2015.5  2015.50 -0.02162342 -0.06333968 0.007451545 0.104
## 2015.75 2015.75 -0.01856676 -0.06028302 0.013036471 0.164
## 2016    2016.00 -0.02816962 -0.07241414 0.008490122 0.090

We can obtain:

  • ATT estimates for each post-intervention time period and overall

  • The quality of the synthetic control fit measured by the L2 distance between Kansas and its synthetic control

  • The percent improvement over uniform weights

By default, we’ll also see point-wise confidence intervals using a conformal inference procedure.

Statistical Inference

The default test statistic is the sum of the absolute treatment effects function(x) sum(abs(x)). We can change the test statistic via the stat_func argument. For instance, if we want to perform a one-way test against positive effects, we can set the test statistic to be the negative sum function(x) -sum(x):

?augsynth:::summary.augsynth()

summary(syn, stat_func = function(x) -sum(x))
## 
## Call:
## single_augsynth(form = form, unit = !!enquo(unit), time = !!enquo(time), 
##     t_int = t_int, data = data, progfunc = "None", scm = TRUE)
## 
## Average ATT Estimate (p Value for Joint Null):  -0.0294   ( 0.15 )
## L2 Imbalance: 0.083
## Percent improvement from uniform weights: 79.5%
## 
## Avg Estimated Bias: NA
## 
## Inference type: Conformal inference
## 
##     Time Estimate 95% CI Lower Bound 95% CI Upper Bound p Value
##  2012.25   -0.018             -0.080              0.006   0.057
##  2012.50   -0.041             -0.103             -0.015   0.015
##  2012.75   -0.033             -0.095             -0.007   0.020
##  2013.00   -0.019             -0.081              0.002   0.047
##  2013.25   -0.029             -0.091             -0.005   0.023
##  2013.50   -0.046             -0.108             -0.022   0.022
##  2013.75   -0.032             -0.094             -0.010   0.023
##  2014.00   -0.045             -0.107             -0.018   0.017
##  2014.25   -0.043             -0.105             -0.011   0.023
##  2014.50   -0.029             -0.091              0.000   0.029
##  2014.75   -0.018             -0.080              0.011   0.087
##  2015.00   -0.029             -0.091              0.005   0.057
##  2015.25   -0.019             -0.081              0.005   0.074
##  2015.50   -0.022             -0.084              0.007   0.056
##  2015.75   -0.019             -0.081              0.013   0.115
##  2016.00   -0.028             -0.090              0.008   0.071

Or if we want to prioritize testing the average post-treatment effect, we can set it to be the absolute sum:

summary(syn, stat_func = function(x) abs(sum(x)))
## 
## Call:
## single_augsynth(form = form, unit = !!enquo(unit), time = !!enquo(time), 
##     t_int = t_int, data = data, progfunc = "None", scm = TRUE)
## 
## Average ATT Estimate (p Value for Joint Null):  -0.0294   ( 0.3 )
## L2 Imbalance: 0.083
## Percent improvement from uniform weights: 79.5%
## 
## Avg Estimated Bias: NA
## 
## Inference type: Conformal inference
## 
##     Time Estimate 95% CI Lower Bound 95% CI Upper Bound p Value
##  2012.25   -0.018             -0.045              0.006   0.114
##  2012.50   -0.041             -0.070             -0.015   0.027
##  2012.75   -0.033             -0.060             -0.004   0.041
##  2013.00   -0.019             -0.046              0.005   0.106
##  2013.25   -0.029             -0.053             -0.005   0.043
##  2013.50   -0.046             -0.073             -0.015   0.018
##  2013.75   -0.032             -0.056             -0.005   0.018
##  2014.00   -0.045             -0.074             -0.016   0.024
##  2014.25   -0.043             -0.074             -0.006   0.017
##  2014.50   -0.029             -0.061              0.000   0.043
##  2014.75   -0.018             -0.053              0.011   0.122
##  2015.00   -0.029             -0.074              0.010   0.085
##  2015.25   -0.019             -0.056              0.010   0.113
##  2015.50   -0.022             -0.056              0.007   0.108
##  2015.75   -0.019             -0.055              0.013   0.192
##  2016.00   -0.028             -0.077              0.008   0.096

Visusalize Results

It’s easier to see this information visually. Below we plot the difference between Kansas and it’s synthetic control. Before the tax cuts (to the left of the dashed line) we expect these to be close, and after the tax cuts we measure the effect (with point-wise confidence intervals).

plot(syn)

We can also compute point-wise confidence intervals using the Jackknife+ procedure by changing the inf_type argument, although this requires additional assumptions.

plot(syn, inf_type = "jackknife+")

🎯 Augmenting Synthetic Control with Outcome Model

In this example the pre-intervention synthetic control fit has an L2 imbalance of 0.083, about 20% of the imbalance between Kansas and the average of the other states. We can reduce this by augmenting synthetic control with ridge regression.

To do this we do the following:

  • Change progfunc to "Ridge"

  • We can also choose the ridge hyper-parameter by setting lambda, while not specifying lambda will determine one through cross validation

asyn <- augsynth(
  form = lngdpcapita ~ treated,
  unit = fips,
  time = year_qtr,
  data = kansas,
  progfunc = "Ridge", # outcome model
  scm = TRUE
)

We can plot the cross-validation MSE when dropping pre-treatment time periods by setting cv = T in the plot function:

plot(asyn, cv = TRUE)

By default, the CV procedure chooses the maximal value of lambda with MSE within one standard deviation of the minimal MSE. To instead choose the lambda that minizes the cross validation MSE, set min_1se = FALSE.

res2 <- summary(asyn)

We can look at the summary. Now in the summary output we see an estimate of the overall bias of synth; we measure this with the average amount that augmentation changes the synth estimate. Notice that the estimates become somewhat larger in magnitude, and the standard errors are tighter.

# check
res
## 
## Call:
## single_augsynth(form = form, unit = !!enquo(unit), time = !!enquo(time), 
##     t_int = t_int, data = data, progfunc = "None", scm = TRUE)
## 
## Average ATT Estimate (p Value for Joint Null):  -0.0294   ( 0.33 )
## L2 Imbalance: 0.083
## Percent improvement from uniform weights: 79.5%
## 
## Avg Estimated Bias: NA
## 
## Inference type: Conformal inference
## 
##     Time Estimate 95% CI Lower Bound 95% CI Upper Bound p Value
##  2012.25   -0.018             -0.045              0.008   0.140
##  2012.50   -0.041             -0.068             -0.007   0.019
##  2012.75   -0.033             -0.062             -0.004   0.041
##  2013.00   -0.019             -0.046              0.010   0.120
##  2013.25   -0.029             -0.053              0.000   0.043
##  2013.50   -0.046             -0.073             -0.022   0.019
##  2013.75   -0.032             -0.056             -0.008   0.021
##  2014.00   -0.045             -0.074             -0.018   0.020
##  2014.25   -0.043             -0.074             -0.011   0.018
##  2014.50   -0.029             -0.066              0.000   0.034
##  2014.75   -0.018             -0.053              0.016   0.142
##  2015.00   -0.029             -0.074              0.010   0.075
##  2015.25   -0.019             -0.058              0.013   0.116
##  2015.50   -0.022             -0.063              0.007   0.104
##  2015.75   -0.019             -0.060              0.013   0.164
##  2016.00   -0.028             -0.072              0.008   0.090
# compare att
rbind(res2$average_att, res$average_att) %>%
  tibble() %>%
  transmute(model = c("ASC", "SC"), Estimate, p_val)
## # A tibble: 2 × 3
##   model Estimate p_val
##   <chr>    <dbl> <dbl>
## 1 ASC    -0.0401 0.083
## 2 SC     -0.0294 0.326
# compare L2 imbalance
rbind(res2$l2_imbalance, res$l2_imbalance) %>%
  as.vector() %>%
  tibble() %>%
  rename(L2_imbalance = 1) %>%
  mutate(model = c("ASC", "SC"), .before = 1)
## # A tibble: 2 × 2
##   model L2_imbalance
##   <chr>        <dbl>
## 1 ASC         0.0615
## 2 SC          0.0826

Plot Result

plot(asyn)

Adding Covariates

We can add the covariates into the formula after |.

By default this will create time invariant covariates by averaging the auxiliary covariates over the pre-intervention period, dropping NA values.

We can use a custom aggregation function by setting the cov_agg argument. Then the lagged outcomes and the auxiliary covariates are jointly balanced by SCM and the ridge outcome model includes both.

covsyn <- augsynth(
  form = lngdpcapita ~ treated | lngdpcapita + log(revstatecapita) +
    log(revlocalcapita) + log(avgwklywagecapita) +
    estabscapita + emplvlcapita, # now adding covariates
  unit = fips,
  time = year_qtr,
  data = kansas,
  progfunc = "ridge",
  scm = TRUE
)

Again we can look at the summary and plot the results.

res3 <- summary(covsyn)

res3
## 
## Call:
## single_augsynth(form = form, unit = !!enquo(unit), time = !!enquo(time), 
##     t_int = t_int, data = data, progfunc = "ridge", scm = TRUE)
## 
## Average ATT Estimate (p Value for Joint Null):  -0.0609   ( 0.14 )
## L2 Imbalance: 0.054
## Percent improvement from uniform weights: 86.6%
## 
## Covariate L2 Imbalance: 0.005
## Percent improvement from uniform weights: 97.7%
## 
## Avg Estimated Bias: 0.027
## 
## Inference type: Conformal inference
## 
##     Time Estimate 95% CI Lower Bound 95% CI Upper Bound p Value
##  2012.25   -0.021             -0.044              0.002   0.065
##  2012.50   -0.047             -0.081             -0.014   0.026
##  2012.75   -0.050             -0.083             -0.007   0.024
##  2013.00   -0.045             -0.074             -0.012   0.036
##  2013.25   -0.055             -0.088             -0.022   0.020
##  2013.50   -0.071             -0.110             -0.028   0.020
##  2013.75   -0.058             -0.091             -0.025   0.030
##  2014.00   -0.081             -0.119             -0.043   0.021
##  2014.25   -0.078             -0.121             -0.029   0.021
##  2014.50   -0.065             -0.114             -0.011   0.035
##  2014.75   -0.057             -0.110             -0.013   0.039
##  2015.00   -0.075             -0.124             -0.022   0.042
##  2015.25   -0.063             -0.106             -0.014   0.037
##  2015.50   -0.067             -0.106             -0.019   0.023
##  2015.75   -0.063             -0.101             -0.019   0.025
##  2016.00   -0.078             -0.127             -0.019   0.045
plot(covsyn)

Now we can additionally fit ridge ASCM on the residuals, look at the summary, and plot the results.

covsyn_resid <- augsynth(
  form = lngdpcapita ~ treated | lngdpcapita + log(revstatecapita) +
    log(revlocalcapita) + log(avgwklywagecapita) +
    estabscapita + emplvlcapita,
  unit = fips,
  time = year_qtr,
  data = kansas,
  progfunc = "ridge",
  scm = TRUE,
  lambda = asyn$lambda,
  residualize = T
)
summary(covsyn_resid)
## 
## Call:
## single_augsynth(form = form, unit = !!enquo(unit), time = !!enquo(time), 
##     t_int = t_int, data = data, progfunc = "ridge", scm = TRUE, 
##     lambda = ..3, residualize = ..4)
## 
## Average ATT Estimate (p Value for Joint Null):  -0.0548   ( 0.27 )
## L2 Imbalance: 0.067
## Percent improvement from uniform weights: 83.4%
## 
## Covariate L2 Imbalance: 0.000
## Percent improvement from uniform weights: 100%
## 
## Avg Estimated Bias: 0.006
## 
## Inference type: Conformal inference
## 
##     Time Estimate 95% CI Lower Bound 95% CI Upper Bound p Value
##  2012.25   -0.025             -0.046              0.000   0.044
##  2012.50   -0.051             -0.076             -0.026   0.006
##  2012.75   -0.045             -0.070             -0.020   0.015
##  2013.00   -0.044             -0.069             -0.019   0.006
##  2013.25   -0.051             -0.077             -0.026   0.015
##  2013.50   -0.069             -0.094             -0.048   0.016
##  2013.75   -0.051             -0.072             -0.031   0.016
##  2014.00   -0.069             -0.095             -0.040   0.014
##  2014.25   -0.067             -0.097             -0.037   0.012
##  2014.50   -0.053             -0.083             -0.024   0.011
##  2014.75   -0.045             -0.075             -0.015   0.029
##  2015.00   -0.064             -0.089             -0.034   0.005
##  2015.25   -0.051             -0.076             -0.026   0.004
##  2015.50   -0.059             -0.089             -0.034   0.013
##  2015.75   -0.058             -0.087             -0.023   0.011
##  2016.00   -0.074             -0.103             -0.044   0.010
plot(covsyn_resid)

Difference Outcome Models

Finally, we can augment synth with many different outcome models. The simplest outcome model is a unit fixed effect model, which we can include by setting fixedeff = T.

desyn <- augsynth(lngdpcapita ~ treated,
  fips, year_qtr, kansas,
  progfunc = "none", scm = T,
  fixedeff = T
)

We can incorporate other outcome models by changing the progfunc. Several outcome models are available, including, fitting the factor model directly with gsynth, general elastic net regression, Bayesian structural time series estimation with CausalImpact, and matrix completion with MCPanel. For each outcome model you can supply an optional set of parameters, see documentation for details.