Synthetic Difference-in-Differences: Prop 99

Introduction

This project is a synopsis of a replication paper I wrote in grad school on Synthetic Difference-in-Differences, a recent causal inference method developed by recent Nobel Prize winner Guido Imbens, Susan Athey, et al. Their paper can be found here, and Athey gives a wonderful presentation breaking down the new method, which looks to increase robustness by combining components of the Difference-in-Differences and Synthetic Control methodologies.

The dataset covers California’s Proposition 99 and contains annual cigarette consumption data for 39 states (California plus 38 control states) from 1970 to 2000. Prop 99 instituted a tax on cigarette sales in 1989 in an effort to decrease cigarette consumption. While intuition supports the notion that this worked, proving causation requires more. Lacking a true counterfactual, we must rely on research designs and mathematics to construct one.

To evaluate the efficacy of Prop 99, three different, yet related, causal methods will be used and compared for robustness.

Difference-in-Differences (DiD)
Synthetic Control (SC)
Synthetic Difference-in-Differences (SDiD)

devtools::install_github("synth-inference/synthdid")

## Skipping install of 'synthdid' from a github remote, the SHA1 (b839f2cc) has not changed since last install.
##   Use `force = TRUE` to force installation

library(synthdid)
library(rngtools)
library(future)
library(doFuture)

## Loading required package: foreach

library(future.batchtools)
library(xtable)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(tidyr)
library(tibble)
library(ggplot2)

Define Estimators

The initial step is simply to set up a list of estimators for the three causal methods being evaluated. Each entry is set to the corresponding function in the synthdid package, which provides a separate function for each methodology. The second half of this publication goes into more of the nuances between the estimation methods.

estimators = list(did=did_estimate,
                  sc=sc_estimate,
                  sdid=synthdid_estimate)
str(synthdid_estimate)

## function (Y, N0, T0, X = array(dim = c(dim(Y), 0)), noise.level = sd(apply(Y[1:N0, 
##     1:T0], 1, diff)), eta.omega = ((nrow(Y) - N0) * (ncol(Y) - T0))^(1/4), 
##     eta.lambda = 1e-06, zeta.omega = eta.omega * noise.level, zeta.lambda = eta.lambda * 
##         noise.level, omega.intercept = TRUE, lambda.intercept = TRUE, weights = list(omega = NULL, 
##         lambda = NULL), update.omega = is.null(weights$omega), update.lambda = is.null(weights$lambda), 
##     min.decrease = 1e-05 * noise.level, max.iter = 10000, sparsify = sparsify_function, 
##     max.iter.pre.sparsify = 100)

str(sc_estimate)

## function (Y, N0, T0, eta.omega = 1e-06, ...)

str(did_estimate)

## function (Y, N0, T0, ...)

Data

One of the beauties of the Prop 99 data is that it requires minimal processing and is widely available. As seen below, this is panel data organized by year for each state, with cigarette packs per capita as the variable of interest. A column of dummy variables denotes treatment, and only California is in the treatment group. Panel data typically includes:

Units (n) -> states
Time periods (t) -> years
Outcomes (Y) -> packs per capita
Treatment Indicators -> binary dummy variable

data('california_prop99')
head(california_prop99)

##         State Year PacksPerCapita treated
## 1     Alabama 1970           89.8       0
## 2    Arkansas 1970          100.3       0
## 3    Colorado 1970          124.8       0
## 4 Connecticut 1970          120.0       0
## 5    Delaware 1970          155.0       0
## 6     Georgia 1970          109.9       0

Enter the Matrix

Here we are simply converting the data from long panel format into the matrices required by synthdid, using the package’s helper function panel.matrices(). Notice the difference in the output of head() from the previous step: setup is a list, so head() prints each of its elements in full. We end up with a list of four elements:

$Y is a 39 x 31 matrix of outcome data, with one row per state (the 38 control states first and California last) and one column per year
$N0 is the number of control units, here the 38 untreated states
$T0 is the number of pre-treatment time periods, here the 19 years from 1970 to 1988
$W is a 39 x 31 matrix of treatment indicators, all 0’s except for California’s row from 1989 onward

setup = panel.matrices(california_prop99)
head(setup)

## $Y
##                 1970  1971  1972  1973  1974  1975  1976  1977  1978  1979
## Alabama         89.8  95.4 101.1 102.9 108.2 111.7 116.2 117.1 123.0 121.4
## Arkansas       100.3 104.1 103.9 108.0 109.7 114.8 119.1 122.6 127.3 126.5
## Colorado       124.8 125.5 134.3 137.9 132.8 131.0 134.2 132.0 129.2 131.5
## Connecticut    120.0 117.6 110.8 109.3 112.4 110.2 113.4 117.3 117.5 117.4
## Delaware       155.0 161.1 156.3 154.7 151.3 147.6 153.0 153.3 155.5 150.2
## Georgia        109.9 115.7 117.0 119.8 123.7 122.9 125.9 127.9 130.6 131.0
## Idaho          102.4 108.5 126.1 121.8 125.6 123.3 125.1 125.0 122.8 117.5
## Illinois       124.8 125.6 126.6 124.4 131.9 131.8 134.4 134.0 136.7 135.3
## Indiana        134.6 139.3 149.2 156.0 159.6 162.4 166.6 173.0 150.9 148.9
## Iowa           108.5 108.4 109.4 110.6 116.1 120.5 124.4 125.5 127.1 124.2
## Kansas         114.0 102.8 111.0 115.2 118.6 123.4 127.7 127.9 127.1 126.4
## Kentucky       155.8 163.5 179.4 201.9 212.4 223.0 230.9 229.4 224.7 214.9
## Louisiana      115.9 119.8 125.3 126.7 129.9 133.6 139.6 140.0 142.7 140.1
## Maine          128.5 133.2 136.5 138.0 142.1 140.7 144.9 145.6 143.9 138.5
## Minnesota      104.3 116.4  96.8 106.8 110.6 111.5 116.7 117.2 118.9 118.3
## Mississippi     93.4 105.4 112.1 115.0 117.1 116.8 120.9 122.1 124.9 123.9
## Missouri       121.3 127.6 130.0 132.1 135.4 135.6 139.5 140.8 141.8 140.2
## Montana        111.2 115.6 122.2 119.9 121.9 123.7 124.9 127.0 127.2 120.3
## Nebraska       108.1 108.6 104.9 106.6 110.5 114.1 118.1 117.7 117.4 116.1
## Nevada         189.5 190.5 198.6 201.5 204.7 205.2 201.4 190.8 187.0 183.3
## New Hampshire  265.7 278.0 296.2 279.0 269.8 269.1 290.5 278.8 269.6 254.6
## New Mexico      90.0  92.6  99.3  98.9 100.3 103.1 102.4 102.4 103.1 101.0
## North Carolina 172.4 187.6 214.1 226.5 227.3 226.0 230.2 217.0 205.5 197.3
## North Dakota    93.8  98.5 103.8 108.7 110.5 117.9 125.4 122.2 121.9 121.3
## Ohio           121.6 124.6 124.4 120.5 122.1 122.5 124.6 127.3 131.3 130.9
## Oklahoma       108.4 115.4 121.7 124.1 130.5 132.9 138.6 140.4 143.6 141.6
## Pennsylvania   107.3 106.3 109.0 110.7 114.2 114.6 118.8 120.1 122.3 122.6
## Rhode Island   123.9 123.2 134.4 142.0 146.1 154.7 150.2 148.8 146.8 145.8
## South Carolina 103.6 115.0 118.7 125.5 129.7 130.5 136.8 137.2 140.4 135.7
## South Dakota    92.7  96.7 103.0 103.5 108.4 113.5 116.7 115.6 116.9 117.4
## Tennessee       99.8 106.3 111.5 109.7 114.8 117.4 121.7 124.6 127.3 127.2
## Texas          106.4 108.9 108.6 110.4 114.7 116.0 121.4 124.2 126.6 126.4
## Utah            65.5  67.7  71.3  72.7  75.6  75.8  77.9  78.0  79.6  79.1
## Vermont        122.6 124.4 138.0 146.8 151.8 155.5 171.1 169.4 162.4 160.9
## Virginia       124.3 128.4 137.0 143.1 149.6 152.7 158.1 157.7 155.9 151.8
## West Virginia  114.5 111.5 117.5 116.6 119.9 123.2 129.7 133.9 131.6 122.1
## Wisconsin      106.4 105.4 108.8 109.5 111.8 113.5 115.4 117.2 116.7 117.1
## Wyoming        132.2 131.7 140.0 141.2 145.8 160.7 161.5 160.4 160.3 168.6
## California     123.0 121.0 123.5 124.4 126.7 127.1 128.0 126.4 126.1 121.9
##                 1980  1981  1982  1983  1984  1985  1986  1987  1988  1989
## Alabama        123.2 119.6 119.1 116.3 113.0 114.5 116.3 114.0 112.1 105.6
## Arkansas       131.8 128.7 127.4 128.0 123.1 125.8 126.0 122.3 121.5 118.3
## Colorado       131.0 133.8 130.5 125.3 119.7 112.4 109.9 102.4  94.6  88.8
## Connecticut    118.0 116.4 114.7 114.1 112.5 111.0 108.5 109.0 104.8 100.6
## Delaware       150.5 152.6 154.1 149.6 144.0 144.5 142.4 141.0 137.1 131.7
## Georgia        134.0 131.7 131.2 128.6 126.3 128.8 129.0 129.3 124.1 117.1
## Idaho          115.2 114.1 111.5 111.3 103.6 100.7  96.7  95.0  84.5  78.4
## Illinois       135.2 133.0 130.7 127.9 124.0 121.6 118.2 109.5 107.6 104.6
## Indiana        146.9 148.5 147.7 143.0 137.8 135.3 137.6 134.0 134.0 132.5
## Iowa           124.6 132.9 116.2 115.6 111.2 109.4 104.1 101.1 100.2  94.4
## Kansas         127.1 132.0 130.9 127.6 121.7 115.7 109.4 105.2 103.2  96.5
## Kentucky       215.3 209.7 210.6 201.1 183.2 182.4 179.8 171.2 173.2 171.6
## Louisiana      143.8 144.0 143.9 133.7 128.9 125.0 121.2 116.5 110.9 103.6
## Maine          141.2 138.9 139.5 135.4 135.5 127.9 119.0 125.0 125.0 122.4
## Minnesota      117.7 120.8 119.4 113.2 110.8 113.0 104.3 108.8  94.1  92.3
## Mississippi    127.0 125.3 125.8 122.3 116.4 115.3 113.2 110.0 109.0 108.3
## Missouri       142.1 140.5 139.7 134.1 130.0 129.2 128.8 128.7 127.4 122.8
## Montana        122.0 121.1 122.4 113.7 110.1 103.6  97.8  91.7  87.1  86.2
## Nebraska       116.3 117.0 117.1 110.8 107.7 105.1 103.1 101.3  92.9  93.8
## Nevada         177.7 171.9 165.1 159.2 136.6 146.7 142.6 147.7 141.9 137.9
## New Hampshire  247.8 245.4 239.8 232.9 215.1 201.1 195.9 195.1 180.4 172.9
## New Mexico     102.7 103.0  97.5  96.3  88.9  88.0  88.2  82.3  77.7  74.4
## North Carolina 187.8 179.3 179.0 169.8 160.6 156.3 154.4 150.5 146.0 139.3
## North Dakota   123.7 125.7 126.8 119.6 109.4 103.2  99.8  92.3  87.1  84.1
## Ohio           133.5 132.8 134.0 130.0 127.1 126.7 126.3 124.6 122.4 118.6
## Oklahoma       141.6 143.7 147.0 140.0 128.1 124.2 119.9 113.1 103.6  97.5
## Pennsylvania   124.0 125.2 123.3 125.3 115.3 115.8 113.9 110.6 107.6 107.1
## Rhode Island   149.3 151.2 146.3 135.8 136.9 133.4 136.3 124.4 138.0 120.8
## South Carolina 138.3 136.1 136.0 131.1 127.0 125.4 126.6 126.6 124.4 122.4
## South Dakota   114.7 115.7 113.0 109.8 105.7 104.4  97.0  95.8  91.9  87.4
## Tennessee      130.4 129.1 131.4 129.0 125.1 128.7 129.0 130.6 125.3 124.7
## Texas          129.7 129.0 131.2 126.4 117.2 115.9 113.7 105.8  96.5  94.5
## Utah            74.8  77.6  73.6  69.0  66.3  66.5  64.4  67.7  55.0  57.0
## Vermont        161.6 163.8 162.3 153.8 144.3 144.5 131.2 128.3 128.7 120.9
## Virginia       148.9 149.9 147.4 144.7 136.8 134.6 135.8 133.0 129.5 122.5
## West Virginia  122.3 120.5 119.8 115.7 111.9 109.1 112.1 107.5 109.1 104.0
## Wisconsin      117.6 119.9 115.6 106.3 105.6 107.0 105.4 106.0 102.6 100.3
## Wyoming        158.1 163.1 157.7 141.2 128.9 125.7 124.8 110.4 114.3 111.4
## California     120.2 118.6 115.4 110.8 104.8 102.8  99.7  97.5  90.1  82.4
##                 1990  1991  1992  1993  1994  1995  1996  1997  1998  1999
## Alabama        108.6 107.9 109.1 108.5 107.1 102.6 101.4 104.9 106.2 100.7
## Arkansas       113.1 116.8 126.0 113.8 108.8 113.0 110.7 108.7 109.5 104.8
## Colorado        87.4  90.2  88.3  88.6  89.1  85.4  83.1  81.3  81.2  79.6
## Connecticut     91.5  86.7  83.5  79.1  76.6  79.3  76.0  75.9  75.5  73.4
## Delaware       127.2 118.8 120.0 123.8 126.1 127.2 128.3 124.1 132.8 139.5
## Georgia        113.8 109.6 109.2 109.2 107.8 100.3 102.7 100.6 100.5  97.1
## Idaho           90.1  85.4  85.1  86.7  93.0  78.2  73.6  75.0  78.9  75.1
## Illinois        94.1  96.1  94.8  94.6  85.7  84.3  81.8  79.6  80.3  72.2
## Indiana        128.3 127.2 128.2 126.8 128.2 135.4 135.1 135.3 135.9 133.3
## Iowa            95.4  97.1  95.2  92.5  93.4  93.0  94.0  93.9  94.0  91.7
## Kansas          94.3  91.8  90.0  89.9  89.1  90.1  88.7  89.2  87.6  83.3
## Kentucky       182.5 170.4 167.6 167.6 170.1 175.3 179.0 186.8 171.3 165.3
## Louisiana      101.5 107.2 108.5 106.2 105.3 105.7 106.8 105.3 103.2 101.0
## Maine          117.5 116.1 114.5 108.5 101.6 102.3 100.0 101.1  94.5  85.5
## Minnesota       90.7  86.2  83.8  81.6  83.4  84.1  81.7  84.1  83.2  80.7
## Mississippi    101.8 105.6 103.9 105.4 106.0 107.5 106.9 106.3 107.0 103.9
## Missouri       119.1 119.9 122.3 121.6 119.4 124.0 124.1 120.6 120.1 118.0
## Montana         84.7  82.9  86.6  86.0  88.2  90.5  87.3  88.9  89.1  82.6
## Nebraska        89.9  92.4  90.6  91.1  85.9  88.5  86.2  85.5  83.1  86.6
## Nevada         137.3 115.5 110.0 108.1 105.2 100.9  99.0  95.6 102.4 103.9
## New Hampshire  152.4 144.8 143.7 148.9 153.8 158.5 158.0 174.4 173.8 171.7
## New Mexico      70.8  69.9  71.4  69.0  68.2  67.0  65.7  61.8  62.6  59.7
## North Carolina 133.7 132.7 128.9 129.7 112.7 124.9 129.7 125.6 126.0 113.1
## North Dakota    77.1  85.2  74.3  83.0  81.0  80.6  80.8  77.5  79.1  74.7
## Ohio           115.5 113.2 112.3 108.9 108.6 111.7 107.6 108.6 106.4 104.0
## Oklahoma        88.4  87.8  86.3  86.2 104.8 109.5 110.8 111.8 112.2 111.4
## Pennsylvania   101.3 102.5  96.2  94.7  95.4  95.4  93.3  92.9  92.1  91.1
## Rhode Island   101.4 103.6 100.1  94.1  91.9  90.8  87.5  90.0  88.7  86.9
## South Carolina 118.6 121.5 112.8 115.2 112.2 109.2 102.9 124.5 126.9 109.4
## South Dakota    88.3  91.8  93.0  91.6  94.8  98.6  92.3  88.8  88.3  83.5
## Tennessee      121.8 120.6 121.0 120.8 118.8 125.4 119.2 118.9 119.7 115.6
## Texas           85.6  79.4  77.2  81.3  78.8  75.2  74.6  72.6  73.2  67.6
## Utah            53.4  53.5  55.0  56.2  55.8  52.0  54.0  57.0  42.3  43.9
## Vermont        124.3 120.9 126.5 117.2 120.3 123.2 102.5  97.7  97.0  94.1
## Virginia       118.9 109.1 108.2 105.4 106.2 106.7 104.6 108.0 105.6 102.1
## West Virginia  104.1 100.1  97.9 111.0 104.2 115.2 112.7 114.5 114.6 112.4
## Wisconsin       94.0  95.5  96.2  91.2  91.8  93.5  92.1  91.9  88.7  84.4
## Wyoming         96.9 109.1 110.8 108.4 111.2 115.0 110.3 108.8 102.9 104.8
## California      77.8  68.7  67.5  63.4  58.6  56.4  54.5  53.8  52.3  47.2
##                 2000
## Alabama         96.2
## Arkansas        99.4
## Colorado        73.0
## Connecticut     71.4
## Delaware       140.7
## Georgia         88.4
## Idaho           66.9
## Illinois        70.0
## Indiana        125.5
## Iowa            88.9
## Kansas          79.8
## Kentucky       156.2
## Louisiana      104.3
## Maine           82.9
## Minnesota       76.0
## Mississippi     97.2
## Missouri       113.8
## Montana         75.5
## Nebraska        77.6
## Nevada          93.2
## New Hampshire  147.3
## New Mexico      53.8
## North Carolina 109.0
## North Dakota    72.5
## Ohio            99.9
## Oklahoma       108.9
## Pennsylvania    87.9
## Rhode Island    83.1
## South Carolina 103.9
## South Dakota    75.1
## Tennessee      108.7
## Texas           69.3
## Utah            40.7
## Vermont         88.9
## Virginia        96.7
## West Virginia  107.9
## Wisconsin       80.1
## Wyoming         90.5
## California      41.6
## 
## $N0
## [1] 38
## 
## $T0
## [1] 19
## 
## $W
##                1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982
## Alabama           0    0    0    0    0    0    0    0    0    0    0    0    0
## Arkansas          0    0    0    0    0    0    0    0    0    0    0    0    0
## Colorado          0    0    0    0    0    0    0    0    0    0    0    0    0
## Connecticut       0    0    0    0    0    0    0    0    0    0    0    0    0
## Delaware          0    0    0    0    0    0    0    0    0    0    0    0    0
## Georgia           0    0    0    0    0    0    0    0    0    0    0    0    0
## Idaho             0    0    0    0    0    0    0    0    0    0    0    0    0
## Illinois          0    0    0    0    0    0    0    0    0    0    0    0    0
## Indiana           0    0    0    0    0    0    0    0    0    0    0    0    0
## Iowa              0    0    0    0    0    0    0    0    0    0    0    0    0
## Kansas            0    0    0    0    0    0    0    0    0    0    0    0    0
## Kentucky          0    0    0    0    0    0    0    0    0    0    0    0    0
## Louisiana         0    0    0    0    0    0    0    0    0    0    0    0    0
## Maine             0    0    0    0    0    0    0    0    0    0    0    0    0
## Minnesota         0    0    0    0    0    0    0    0    0    0    0    0    0
## Mississippi       0    0    0    0    0    0    0    0    0    0    0    0    0
## Missouri          0    0    0    0    0    0    0    0    0    0    0    0    0
## Montana           0    0    0    0    0    0    0    0    0    0    0    0    0
## Nebraska          0    0    0    0    0    0    0    0    0    0    0    0    0
## Nevada            0    0    0    0    0    0    0    0    0    0    0    0    0
## New Hampshire     0    0    0    0    0    0    0    0    0    0    0    0    0
## New Mexico        0    0    0    0    0    0    0    0    0    0    0    0    0
## North Carolina    0    0    0    0    0    0    0    0    0    0    0    0    0
## North Dakota      0    0    0    0    0    0    0    0    0    0    0    0    0
## Ohio              0    0    0    0    0    0    0    0    0    0    0    0    0
## Oklahoma          0    0    0    0    0    0    0    0    0    0    0    0    0
## Pennsylvania      0    0    0    0    0    0    0    0    0    0    0    0    0
## Rhode Island      0    0    0    0    0    0    0    0    0    0    0    0    0
## South Carolina    0    0    0    0    0    0    0    0    0    0    0    0    0
## South Dakota      0    0    0    0    0    0    0    0    0    0    0    0    0
## Tennessee         0    0    0    0    0    0    0    0    0    0    0    0    0
## Texas             0    0    0    0    0    0    0    0    0    0    0    0    0
## Utah              0    0    0    0    0    0    0    0    0    0    0    0    0
## Vermont           0    0    0    0    0    0    0    0    0    0    0    0    0
## Virginia          0    0    0    0    0    0    0    0    0    0    0    0    0
## West Virginia     0    0    0    0    0    0    0    0    0    0    0    0    0
## Wisconsin         0    0    0    0    0    0    0    0    0    0    0    0    0
## Wyoming           0    0    0    0    0    0    0    0    0    0    0    0    0
## California        0    0    0    0    0    0    0    0    0    0    0    0    0
##                1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995
## Alabama           0    0    0    0    0    0    0    0    0    0    0    0    0
## Arkansas          0    0    0    0    0    0    0    0    0    0    0    0    0
## Colorado          0    0    0    0    0    0    0    0    0    0    0    0    0
## Connecticut       0    0    0    0    0    0    0    0    0    0    0    0    0
## Delaware          0    0    0    0    0    0    0    0    0    0    0    0    0
## Georgia           0    0    0    0    0    0    0    0    0    0    0    0    0
## Idaho             0    0    0    0    0    0    0    0    0    0    0    0    0
## Illinois          0    0    0    0    0    0    0    0    0    0    0    0    0
## Indiana           0    0    0    0    0    0    0    0    0    0    0    0    0
## Iowa              0    0    0    0    0    0    0    0    0    0    0    0    0
## Kansas            0    0    0    0    0    0    0    0    0    0    0    0    0
## Kentucky          0    0    0    0    0    0    0    0    0    0    0    0    0
## Louisiana         0    0    0    0    0    0    0    0    0    0    0    0    0
## Maine             0    0    0    0    0    0    0    0    0    0    0    0    0
## Minnesota         0    0    0    0    0    0    0    0    0    0    0    0    0
## Mississippi       0    0    0    0    0    0    0    0    0    0    0    0    0
## Missouri          0    0    0    0    0    0    0    0    0    0    0    0    0
## Montana           0    0    0    0    0    0    0    0    0    0    0    0    0
## Nebraska          0    0    0    0    0    0    0    0    0    0    0    0    0
## Nevada            0    0    0    0    0    0    0    0    0    0    0    0    0
## New Hampshire     0    0    0    0    0    0    0    0    0    0    0    0    0
## New Mexico        0    0    0    0    0    0    0    0    0    0    0    0    0
## North Carolina    0    0    0    0    0    0    0    0    0    0    0    0    0
## North Dakota      0    0    0    0    0    0    0    0    0    0    0    0    0
## Ohio              0    0    0    0    0    0    0    0    0    0    0    0    0
## Oklahoma          0    0    0    0    0    0    0    0    0    0    0    0    0
## Pennsylvania      0    0    0    0    0    0    0    0    0    0    0    0    0
## Rhode Island      0    0    0    0    0    0    0    0    0    0    0    0    0
## South Carolina    0    0    0    0    0    0    0    0    0    0    0    0    0
## South Dakota      0    0    0    0    0    0    0    0    0    0    0    0    0
## Tennessee         0    0    0    0    0    0    0    0    0    0    0    0    0
## Texas             0    0    0    0    0    0    0    0    0    0    0    0    0
## Utah              0    0    0    0    0    0    0    0    0    0    0    0    0
## Vermont           0    0    0    0    0    0    0    0    0    0    0    0    0
## Virginia          0    0    0    0    0    0    0    0    0    0    0    0    0
## West Virginia     0    0    0    0    0    0    0    0    0    0    0    0    0
## Wisconsin         0    0    0    0    0    0    0    0    0    0    0    0    0
## Wyoming           0    0    0    0    0    0    0    0    0    0    0    0    0
## California        0    0    0    0    0    0    1    1    1    1    1    1    1
##                1996 1997 1998 1999 2000
## Alabama           0    0    0    0    0
## Arkansas          0    0    0    0    0
## Colorado          0    0    0    0    0
## Connecticut       0    0    0    0    0
## Delaware          0    0    0    0    0
## Georgia           0    0    0    0    0
## Idaho             0    0    0    0    0
## Illinois          0    0    0    0    0
## Indiana           0    0    0    0    0
## Iowa              0    0    0    0    0
## Kansas            0    0    0    0    0
## Kentucky          0    0    0    0    0
## Louisiana         0    0    0    0    0
## Maine             0    0    0    0    0
## Minnesota         0    0    0    0    0
## Mississippi       0    0    0    0    0
## Missouri          0    0    0    0    0
## Montana           0    0    0    0    0
## Nebraska          0    0    0    0    0
## Nevada            0    0    0    0    0
## New Hampshire     0    0    0    0    0
## New Mexico        0    0    0    0    0
## North Carolina    0    0    0    0    0
## North Dakota      0    0    0    0    0
## Ohio              0    0    0    0    0
## Oklahoma          0    0    0    0    0
## Pennsylvania      0    0    0    0    0
## Rhode Island      0    0    0    0    0
## South Carolina    0    0    0    0    0
## South Dakota      0    0    0    0    0
## Tennessee         0    0    0    0    0
## Texas             0    0    0    0    0
## Utah              0    0    0    0    0
## Vermont           0    0    0    0    0
## Virginia          0    0    0    0    0
## West Virginia     0    0    0    0    0
## Wisconsin         0    0    0    0    0
## Wyoming           0    0    0    0    0
## California        1    1    1    1    1
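
To make the four elements concrete, here is a minimal base R sketch that builds the same kind of objects by hand from a toy long-format panel. The toy data, the column names and the variable names are all made up for illustration, and this is not how panel.matrices() is implemented; it only mirrors the shape of its output under the assumption that treatment starts once and stays on.

```r
# Toy long-format panel (hypothetical data): 3 units, 4 years.
# Unit "A" is treated from 2003 onward.
panel <- data.frame(
  unit    = rep(c("A", "B", "C"), times = 4),
  year    = rep(2001:2004, each = 3),
  outcome = c(30, 10, 20,  31, 12, 21,  29, 14, 22,  28, 16, 23),
  treated = c(0, 0, 0,     0, 0, 0,     1, 0, 0,     1, 0, 0)
)

# Wide matrices: one row per unit, one column per year.
Y <- with(panel, tapply(outcome, list(unit, year), sum))
W <- with(panel, tapply(treated, list(unit, year), sum))

# Put control units first and ever-treated units last.
ever_treated <- rowSums(W) > 0
row_order <- order(ever_treated)
Y <- Y[row_order, , drop = FALSE]
W <- W[row_order, , drop = FALSE]

# N0: number of control units; T0: number of periods with no treatment at all.
N0 <- sum(!ever_treated)
T0 <- sum(colSums(W) == 0)

print(Y)
print(c(N0 = N0, T0 = T0))
```

In the toy panel the treated unit "A" ends up in the last row, just as California sits in the last row of setup$Y.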

Estimates

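Here estimator is just the argument name of the anonymous function passed to lapply(), which applies each of the three functions in estimators to the newly created outcome matrix together with N0 and T0. None of the three is Bayesian. They differ in how they weight the data: DiD weights all control states and all pre-treatment years equally, SC chooses weights on the control states so that their weighted pre-treatment outcomes track California, and SDiD chooses weights on both the control states and the pre-treatment years. In the printed output, N1,T1 = 1,12 indicates one treated unit and 12 post-treatment years.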

estimates = lapply(estimators, function(estimator) { estimator(setup$Y,
                                                               setup$N0, setup$T0) } )

head(estimates)

## $did
## synthdid: -27.349 +- NA. Effective N0/N0 = 38.0/38~1.0. Effective T0/T0 = 19.0/19~1.0. N1,T1 = 1,12. 
## 
## $sc
## synthdid: -19.620 +- NA. Effective N0/N0 = 3.8/38~0.1. Effective T0/T0 = Inf/19~Inf. N1,T1 = 1,12. 
## 
## $sdid
## synthdid: -15.604 +- NA. Effective N0/N0 = 16.4/38~0.4. Effective T0/T0 = 2.8/19~0.1. N1,T1 = 1,12.
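
As a sanity check on what the DiD number means, here is a small sketch of the textbook equal-weight calculation: the change in the treated group’s mean outcome minus the change in the control group’s mean outcome. The function name did_by_hand and the toy matrix are hypothetical and are not part of synthdid.

```r
# Equal-weight difference-in-differences on a matrix laid out like setup$Y:
# the first N0 rows are controls, the first T0 columns are pre-treatment.
did_by_hand <- function(Y, N0, T0) {
  control <- 1:N0
  treated <- (N0 + 1):nrow(Y)
  pre     <- 1:T0
  post    <- (T0 + 1):ncol(Y)

  treated_change <- mean(Y[treated, post]) - mean(Y[treated, pre])
  control_change <- mean(Y[control, post]) - mean(Y[control, pre])
  treated_change - control_change
}

# Toy outcome matrix (hypothetical numbers): 2 controls, 1 treated, 4 periods.
toy_Y <- rbind(
  control_1 = c(10, 12, 14, 16),
  control_2 = c(20, 21, 22, 23),
  treated   = c(30, 31, 29, 28)
)

toy_did <- did_by_hand(toy_Y, N0 = 2, T0 = 2)
print(toy_did)
```

Since did_estimate weights every control state and every pre-treatment year equally, I would expect did_by_hand(setup$Y, setup$N0, setup$T0) to land very close to the did value printed above.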

Standard Errors

With estimates in hand for all three methods, standard errors are calculated. The if branch can basically be ignored: the actual replication also compared the matrix completion (mc) causal method, which is not among our estimators, so only the else branch ever runs here. The bulk of the lifting is done by the vcov() function, which returns the estimated variance of each estimate; taking the square root gives the standard error. Note the use of “placebo” as the method. The synthdid package also offers “bootstrap” and “jackknife”, but as far as I understand the package documentation, the placebo method is the one that works when there is only a single treated unit, as is the case with California.

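standard.errors = mapply(function(estimate, name) {
  set.seed(12345)  # fix the seed so the placebo resampling is reproducible
  # 'mc' (matrix completion) is not in estimators, so the first branch never runs;
  # vcov() returns a variance and sqrt() turns it into a standard error
  if(name == 'mc') { mc_placebo_se(setup$Y, setup$N0, setup$T0) }
  else {             sqrt(vcov(estimate, method='placebo'))     }
}, estimates, names(estimators))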

head(standard.errors)

##       did        sc      sdid 
## 17.740267  9.917426  8.367993
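
The idea behind the placebo method can be sketched in a few lines of base R: drop the treated unit, pretend each control unit in turn was treated, re-estimate, and use the spread of those placebo estimates as the standard error. This is a simplified illustration under my own assumptions, not the package’s implementation: the names did_by_hand and placebo_se_sketch are hypothetical, the sketch enumerates every control once instead of drawing random placebo assignments, and it uses a plain equal-weight DiD as the estimator.

```r
# Equal-weight DiD (hypothetical helper); controls are the first N0 rows.
did_by_hand <- function(Y, N0, T0) {
  control <- 1:N0
  treated <- (N0 + 1):nrow(Y)
  pre     <- 1:T0
  post    <- (T0 + 1):ncol(Y)
  (mean(Y[treated, post]) - mean(Y[treated, pre])) -
    (mean(Y[control, post]) - mean(Y[control, pre]))
}

# Placebo standard error sketch: only control rows are used.
placebo_se_sketch <- function(Y, N0, T0, estimator) {
  controls <- Y[1:N0, , drop = FALSE]
  placebo_estimates <- sapply(1:N0, function(i) {
    # Move control i to the last row and treat it as the pseudo-treated unit.
    reordered <- rbind(controls[-i, , drop = FALSE],
                       controls[i, , drop = FALSE])
    estimator(reordered, N0 - 1, T0)
  })
  # Spread of the placebo estimates around their mean.
  sqrt(mean((placebo_estimates - mean(placebo_estimates))^2))
}

# Toy outcome matrix (hypothetical numbers): 3 controls, 1 treated, 4 periods.
toy_Y <- rbind(
  control_1 = c(1, 1, 1, 1),
  control_2 = c(1, 1, 3, 3),
  control_3 = c(1, 1, 5, 5),
  treated   = c(1, 1, 0, 0)
)

toy_se <- placebo_se_sketch(toy_Y, N0 = 3, T0 = 2, estimator = did_by_hand)
print(toy_se)
```

Because no control unit actually received the treatment, any nonzero placebo estimate reflects noise rather than a real effect, which is why the spread of those estimates is a reasonable stand-in for the uncertainty of the real one.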

Creating Output Table

Standard errors and estimates for each of the three methods are merged into a table via rbind() after being unlisted. Row and column names are also defined with results rounded to one decimal place.

Comparative analysis of the standard errors bears stronger evidence in favor of SDiD or SC over DiD. Proportionally, DiD’s standard error is 65% of the magnitude of its estimate, while SDiD’s is 54%. SDiD is a more flexible method and as such might be expected to have higher variance and standard errors than the less flexible DiD method. However, as Table 1 (below) shows, the standard error for SDiD is smaller than that of DiD both in absolute terms and proportionally. The outperformance of SDiD and SC over DiD can be attributed to the use of weighting, which offers a more localized approach by emphasizing control units that were more similar to the treated unit before treatment.

california.table = rbind(unlist(estimates), unlist(standard.errors))
rownames(california.table) = c('estimate', 'standard error')
colnames(california.table) = toupper(names(estimators))
round(california.table, digits=1)

##                  DID    SC  SDID
## estimate       -27.3 -19.6 -15.6
## standard error  17.7   9.9   8.4
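
The proportions quoted above follow directly from the table. A short sketch of the arithmetic, with the rounded table values typed in by hand:

```r
# Values copied from the output table above.
estimate <- c(DID = -27.3, SC = -19.6, SDID = -15.6)
std_err  <- c(DID = 17.7,  SC = 9.9,   SDID = 8.4)

# Standard error as a percentage of the magnitude of the estimate.
se_share <- round(100 * std_err / abs(estimate))
print(se_share)
```

This gives 65 for DiD, 51 for SC and 54 for SDiD, so on this measure SC and SDiD are close to each other and both are well ahead of DiD.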

Figure 1

By utilizing the synthdid_plot function from the synthdid package, we are able to get a tremendous glimpse into both the effects of Prop 99 and the nuances between these three methods. More information on the arguments involved with synthdid_plot can be found here.

DiD Plot

The first plot is the DiD plot. The dashed pre-treatment trend line for California is a great way of visualizing the parallel trends assumption key to DiD. The general idea with this assumption is that, absent treatment, the treated unit would have followed the same trend as the control group, so the control trend is shifted to California’s pre-treatment level to form the dashed line. As the highlighted red area at the bottom of the first facet shows, the entire pre-treatment period was weighted equally in determining the trends. The causal estimate of -27.3 can be seen as the average gap between the dashed parallel-trends line and the actual California outcome after treatment begins in 1989.

SC Plot

The middle plot, showing the SC methodology, does not include dashed trend lines. This is because SC does not rely on the parallel trends assumption. Instead, a synthetic control unit is built by weighting a group of states such that their pre-treatment trajectory closely matches that of the treated unit. The synthetic control unit does not receive treatment, so it continues on and serves as the counterfactual. The average post-treatment gap between California and the synthetic control is the estimate of -19.6.

SDiD Plot

In the final facet we see SDiD in action. Like its parent DiD, SDiD uses the parallel trends assumption. But as seen in the DiD plot, the actual pre-treatment data for California and the other states is far from parallel. We can force trend lines onto the DiD plot to satisfy the assumption, but clearly the two are not parallel. To remedy this and better satisfy the parallel trends assumption, the idea of a synthetic control is applied. By building a control unit from the most applicable states, as the SC method does, we get a control unit whose actual pre-treatment outcomes are far more parallel to California’s than in the DiD plot. Also worth noting is the difference in the weighting of time periods between DiD and SDiD: SDiD puts much more of the weight on the years immediately before Prop 99 went into effect in 1989.

synthdid_plot(estimates[1:3], facet.vertical=FALSE,
              control.name='control', treated.name='california',
              lambda.comparable=TRUE, se.method = 'none',
              trajectory.linetype = 1, line.width=.75, effect.curvature=-.4,
              trajectory.alpha=.7, effect.alpha=.7,
              diagram.alpha=1, onset.alpha=.7) +
  theme(legend.position=c(.26,.07), legend.direction='horizontal',
        legend.key=element_blank(), legend.background=element_blank(),
        strip.background=element_blank(), strip.text.x = element_blank())
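
One way to see how the three panels relate is to write the estimate as a weighted difference-in-differences, where the only things that change between methods are the weights on control units (omega) and on pre-treatment periods (lambda). The sketch below is my own illustration of that idea on toy numbers: weighted_did is a hypothetical helper, not a synthdid function, and the weights are typed in by hand, whereas the package estimates them from the data.

```r
# Weighted DiD on a matrix laid out like setup$Y: the first N0 rows are
# controls and the first T0 columns are pre-treatment periods.
# omega:  weights on the N0 control units (summing to 1)
# lambda: weights on the T0 pre-treatment periods (summing to 1)
weighted_did <- function(Y, N0, T0, omega, lambda) {
  N1 <- nrow(Y) - N0
  T1 <- ncol(Y) - T0
  unit_contrast <- c(-omega,  rep(1 / N1, N1))  # treated minus weighted controls
  time_contrast <- c(-lambda, rep(1 / T1, T1))  # post minus weighted pre
  drop(t(unit_contrast) %*% Y %*% time_contrast)
}

# Toy outcome matrix (hypothetical numbers): 2 controls, 1 treated, 4 periods.
toy_Y <- rbind(
  control_1 = c(10, 12, 14, 16),
  control_2 = c(20, 21, 22, 23),
  treated   = c(30, 31, 29, 28)
)
N0 <- 2
T0 <- 2

# Equal weights everywhere reproduce plain DiD.
equal_weights <- weighted_did(toy_Y, N0, T0,
                              omega = rep(1 / N0, N0),
                              lambda = rep(1 / T0, T0))

# All weight on control_2 and on the last pre-treatment period, in the
# spirit of SDiD concentrating on similar units and recent years.
concentrated <- weighted_did(toy_Y, N0, T0,
                             omega = c(0, 1),
                             lambda = c(0, 1))

print(c(equal_weights = equal_weights, concentrated = concentrated))
```

With equal weights the toy estimate is -5, while concentrating the weights moves it to -4, which is the same mechanism that separates the DiD and SDiD estimates for California.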

Figure 2

Figure 2 does not tell us much about the effects of Prop 99, but it does tell us a lot about the performance of these three methods. These plots are in the same order as Figure 1 (DiD, SC, SDiD). Understanding this visualization can best be done by trying to answer these questions:

Which plot has the most uniform dot sizes?
Which plot has the fewest dots far away from the dark horizontal line, which represents the estimate for each method?
Which plot offers the best combination of the prior two questions?

Clearly the final plot, SDiD, offers by far the best combination of uniform dot sizes and dots falling close to the estimate line. Each of these dots represents an untreated state used to create the control unit, with the dot size reflecting its weight. The DiD plot has a nice even weighting on each state but suffers from outliers like Nebraska and Montana falling very far from the estimate, with several others moderately far. The SC plot actually does a little bit better, as the heavily weighted states contributing to its control unit fall close to the estimate line, with a few weaker contributors as outliers. SDiD, on the other hand, has a much neater, tighter fit of its control unit contributors to the estimate line.

synthdid_units_plot(rev(estimates[1:3]), se.method='none') +
  theme(legend.background=element_blank(), legend.title = element_blank(),
        legend.direction='horizontal', legend.position=c(.17,.07),
        strip.background=element_blank(), strip.text.x = element_blank())
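
How uniform the dot sizes are can also be put into a single number. The estimates printed earlier report an “Effective N0” of 38.0 for DiD, 3.8 for SC and 16.4 for SDiD. A common way to define such an effective sample size is one over the sum of squared weights, and I am assuming that is what the package reports; the helper effective_n below is hypothetical.

```r
# Effective number of units implied by a weight vector that sums to 1.
effective_n <- function(weights) 1 / sum(weights^2)

# Four units weighted equally count as four effective units.
uniform_n <- effective_n(rep(1 / 4, 4))

# Weight piled onto two of the four units counts as fewer than two.
concentrated_n <- effective_n(c(0.7, 0.3, 0, 0))

print(c(uniform = uniform_n, concentrated = concentrated_n))
```

Read this way, the DiD control unit draws on all 38 states equally, SC leans on the equivalent of roughly four states, and SDiD sits in between.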

Conclusions

My paper only went as far as outlining SDiD and the methods from which it draws inspiration, and replicating the analysis of California Proposition 99 from Synthetic Difference-in-Differences. As such it cannot properly assert SDiD’s dominance over DiD or, to a lesser degree, SC. The results certainly do lend credence to the idea that SDiD may be an incredible addition to the tool bag of anyone working in applied microeconomics and could be a superior causal estimation method when using panel data in the social sciences.

Deeper analysis of the superiority of SDiD is presented in the actual paper; that analysis was beyond the scope of my paper and its author’s computational abilities. The authors ran placebo simulations to further support SDiD’s dominance. Using data from the Current Population Survey, a placebo simulation was run gauging DiD and SC against SDiD. SDiD outperformed its recent ancestors with a root-mean-square deviation of 0.28 compared to 0.37 for SC and 0.49 for DiD. SDiD also registered lower bias readings at 0.10 as opposed to 0.20 for SC and 0.21 for DiD. A second simulation was run based on the Penn World Table dataset. Its findings concur with our findings for Prop 99, with SDiD marginally outperforming SC as measured by root-mean-square deviation and bias while significantly outperforming DiD.