This project is a synopsis of a replication paper I wrote in grad school on the newest causal method developed by recent Nobel Prize winner Guido Imbens, Susan Athey, et al. Their paper can be found here, and Athey gives a wonderful presentation breaking down the new method, which aims to increase robustness by combining components of the Difference-in-Differences and Synthetic Control methodologies.
The dataset concerns California’s Proposition 99 and includes per-capita cigarette consumption for 39 states (California plus 38 control states) from 1970 to 2000. Prop 99 raised the tax on cigarettes, effective 1989, in an effort to decrease cigarette consumption. While intuition supports the notion that this worked, proving causation requires more: because a true counterfactual (California without Prop 99) is never observed, we must rely on research designs and mathematics to construct one.
To evaluate the efficacy of Prop 99, three different, yet related, causal methods will be used and compared for robustness.
Difference-in-Differences (DiD)
Synthetic Control (SC)
Synthetic Difference-in-Differences (SDiD)
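For reference, the Synthetic Difference-in-Differences paper writes all three methods as (weighted) two-way fixed-effects least-squares problems. The block below restates them, as I understand the paper's formulation: Y_it is the outcome for state i in year t, W_it the treatment indicator, α_i and β_t state and year fixed effects, and ω̂_i and λ̂_t the estimated unit and time weights. DiD weights every unit and period equally, SC drops the unit fixed effect and weights units only, and SDiD keeps both fixed effects and uses both sets of weights.

```latex
\begin{aligned}
(\hat\tau^{did},\hat\mu,\hat\alpha,\hat\beta) &= \arg\min_{\tau,\mu,\alpha,\beta}\sum_{i=1}^{N}\sum_{t=1}^{T}\left(Y_{it}-\mu-\alpha_i-\beta_t-W_{it}\tau\right)^2 \\
(\hat\tau^{sc},\hat\mu,\hat\beta) &= \arg\min_{\tau,\mu,\beta}\sum_{i=1}^{N}\sum_{t=1}^{T}\left(Y_{it}-\mu-\beta_t-W_{it}\tau\right)^2\,\hat\omega_i^{sc} \\
(\hat\tau^{sdid},\hat\mu,\hat\alpha,\hat\beta) &= \arg\min_{\tau,\mu,\alpha,\beta}\sum_{i=1}^{N}\sum_{t=1}^{T}\left(Y_{it}-\mu-\alpha_i-\beta_t-W_{it}\tau\right)^2\,\hat\omega_i^{sdid}\,\hat\lambda_t^{sdid}
\end{aligned}
```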
devtools::install_github("synth-inference/synthdid")
## Skipping install of 'synthdid' from a github remote, the SHA1 (b839f2cc) has not changed since last install.
## Use `force = TRUE` to force installation
library(synthdid)
library(rngtools)
library(future)
library(doFuture)
## Loading required package: foreach
library(future.batchtools)
library(xtable)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(tibble)
library(ggplot2)
The first step is simply to set up a list of estimators for the three causal methods being evaluated. Each entry points to the corresponding function in the synthdid package (did_estimate, sc_estimate and synthdid_estimate), which is how the package distinguishes between the methodologies. The second half of this publication goes into more of the nuances between the estimation methods.
estimators = list(did=did_estimate,
sc=sc_estimate,
sdid=synthdid_estimate)
str(synthdid_estimate)
## function (Y, N0, T0, X = array(dim = c(dim(Y), 0)), noise.level = sd(apply(Y[1:N0,
## 1:T0], 1, diff)), eta.omega = ((nrow(Y) - N0) * (ncol(Y) - T0))^(1/4),
## eta.lambda = 1e-06, zeta.omega = eta.omega * noise.level, zeta.lambda = eta.lambda *
## noise.level, omega.intercept = TRUE, lambda.intercept = TRUE, weights = list(omega = NULL,
## lambda = NULL), update.omega = is.null(weights$omega), update.lambda = is.null(weights$lambda),
## min.decrease = 1e-05 * noise.level, max.iter = 10000, sparsify = sparsify_function,
## max.iter.pre.sparsify = 100)
str(sc_estimate)
## function (Y, N0, T0, eta.omega = 1e-06, ...)
str(did_estimate)
## function (Y, N0, T0, ...)
One of the beauties of the Prop 99 data is that it is widely available and requires minimal processing. As seen below, it is panel data organized by year for each state, with cigarette packs per capita as the variable of interest. A treatment column holds a dummy variable, which equals 1 only for California from 1989 onward; every other state is in the control group. Panel data typically includes:
Units (n) -> states
Time periods (t) -> years
Outcomes (Y) -> packs per capita
Treatment Indicators -> binary dummy variable
data('california_prop99')
head(california_prop99)
## State Year PacksPerCapita treated
## 1 Alabama 1970 89.8 0
## 2 Arkansas 1970 100.3 0
## 3 Colorado 1970 124.8 0
## 4 Connecticut 1970 120.0 0
## 5 Delaware 1970 155.0 0
## 6 Georgia 1970 109.9 0
Here we simply convert the data from long panel format into the matrices required by synthdid, using the package's panel.matrices() function. Notice the difference in the output of head() compared with the previous step. We end up with a list of four components:
$Y is a 39 x 31 outcome matrix with one row per state (the 38 control states followed by California) and one column per year from 1970 to 2000
$N0 is the number of control (untreated) units, here 38 states
$T0 is the number of pre-treatment time periods, here the 19 years from 1970 through 1988
$W is the 39 x 31 treatment indicator matrix, equal to 1 for California from 1989 onward and 0 everywhere else
setup = panel.matrices(california_prop99)
head(setup)
## $Y
## 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979
## Alabama 89.8 95.4 101.1 102.9 108.2 111.7 116.2 117.1 123.0 121.4
## Arkansas 100.3 104.1 103.9 108.0 109.7 114.8 119.1 122.6 127.3 126.5
## Colorado 124.8 125.5 134.3 137.9 132.8 131.0 134.2 132.0 129.2 131.5
## Connecticut 120.0 117.6 110.8 109.3 112.4 110.2 113.4 117.3 117.5 117.4
## Delaware 155.0 161.1 156.3 154.7 151.3 147.6 153.0 153.3 155.5 150.2
## Georgia 109.9 115.7 117.0 119.8 123.7 122.9 125.9 127.9 130.6 131.0
## Idaho 102.4 108.5 126.1 121.8 125.6 123.3 125.1 125.0 122.8 117.5
## Illinois 124.8 125.6 126.6 124.4 131.9 131.8 134.4 134.0 136.7 135.3
## Indiana 134.6 139.3 149.2 156.0 159.6 162.4 166.6 173.0 150.9 148.9
## Iowa 108.5 108.4 109.4 110.6 116.1 120.5 124.4 125.5 127.1 124.2
## Kansas 114.0 102.8 111.0 115.2 118.6 123.4 127.7 127.9 127.1 126.4
## Kentucky 155.8 163.5 179.4 201.9 212.4 223.0 230.9 229.4 224.7 214.9
## Louisiana 115.9 119.8 125.3 126.7 129.9 133.6 139.6 140.0 142.7 140.1
## Maine 128.5 133.2 136.5 138.0 142.1 140.7 144.9 145.6 143.9 138.5
## Minnesota 104.3 116.4 96.8 106.8 110.6 111.5 116.7 117.2 118.9 118.3
## Mississippi 93.4 105.4 112.1 115.0 117.1 116.8 120.9 122.1 124.9 123.9
## Missouri 121.3 127.6 130.0 132.1 135.4 135.6 139.5 140.8 141.8 140.2
## Montana 111.2 115.6 122.2 119.9 121.9 123.7 124.9 127.0 127.2 120.3
## Nebraska 108.1 108.6 104.9 106.6 110.5 114.1 118.1 117.7 117.4 116.1
## Nevada 189.5 190.5 198.6 201.5 204.7 205.2 201.4 190.8 187.0 183.3
## New Hampshire 265.7 278.0 296.2 279.0 269.8 269.1 290.5 278.8 269.6 254.6
## New Mexico 90.0 92.6 99.3 98.9 100.3 103.1 102.4 102.4 103.1 101.0
## North Carolina 172.4 187.6 214.1 226.5 227.3 226.0 230.2 217.0 205.5 197.3
## North Dakota 93.8 98.5 103.8 108.7 110.5 117.9 125.4 122.2 121.9 121.3
## Ohio 121.6 124.6 124.4 120.5 122.1 122.5 124.6 127.3 131.3 130.9
## Oklahoma 108.4 115.4 121.7 124.1 130.5 132.9 138.6 140.4 143.6 141.6
## Pennsylvania 107.3 106.3 109.0 110.7 114.2 114.6 118.8 120.1 122.3 122.6
## Rhode Island 123.9 123.2 134.4 142.0 146.1 154.7 150.2 148.8 146.8 145.8
## South Carolina 103.6 115.0 118.7 125.5 129.7 130.5 136.8 137.2 140.4 135.7
## South Dakota 92.7 96.7 103.0 103.5 108.4 113.5 116.7 115.6 116.9 117.4
## Tennessee 99.8 106.3 111.5 109.7 114.8 117.4 121.7 124.6 127.3 127.2
## Texas 106.4 108.9 108.6 110.4 114.7 116.0 121.4 124.2 126.6 126.4
## Utah 65.5 67.7 71.3 72.7 75.6 75.8 77.9 78.0 79.6 79.1
## Vermont 122.6 124.4 138.0 146.8 151.8 155.5 171.1 169.4 162.4 160.9
## Virginia 124.3 128.4 137.0 143.1 149.6 152.7 158.1 157.7 155.9 151.8
## West Virginia 114.5 111.5 117.5 116.6 119.9 123.2 129.7 133.9 131.6 122.1
## Wisconsin 106.4 105.4 108.8 109.5 111.8 113.5 115.4 117.2 116.7 117.1
## Wyoming 132.2 131.7 140.0 141.2 145.8 160.7 161.5 160.4 160.3 168.6
## California 123.0 121.0 123.5 124.4 126.7 127.1 128.0 126.4 126.1 121.9
## 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989
## Alabama 123.2 119.6 119.1 116.3 113.0 114.5 116.3 114.0 112.1 105.6
## Arkansas 131.8 128.7 127.4 128.0 123.1 125.8 126.0 122.3 121.5 118.3
## Colorado 131.0 133.8 130.5 125.3 119.7 112.4 109.9 102.4 94.6 88.8
## Connecticut 118.0 116.4 114.7 114.1 112.5 111.0 108.5 109.0 104.8 100.6
## Delaware 150.5 152.6 154.1 149.6 144.0 144.5 142.4 141.0 137.1 131.7
## Georgia 134.0 131.7 131.2 128.6 126.3 128.8 129.0 129.3 124.1 117.1
## Idaho 115.2 114.1 111.5 111.3 103.6 100.7 96.7 95.0 84.5 78.4
## Illinois 135.2 133.0 130.7 127.9 124.0 121.6 118.2 109.5 107.6 104.6
## Indiana 146.9 148.5 147.7 143.0 137.8 135.3 137.6 134.0 134.0 132.5
## Iowa 124.6 132.9 116.2 115.6 111.2 109.4 104.1 101.1 100.2 94.4
## Kansas 127.1 132.0 130.9 127.6 121.7 115.7 109.4 105.2 103.2 96.5
## Kentucky 215.3 209.7 210.6 201.1 183.2 182.4 179.8 171.2 173.2 171.6
## Louisiana 143.8 144.0 143.9 133.7 128.9 125.0 121.2 116.5 110.9 103.6
## Maine 141.2 138.9 139.5 135.4 135.5 127.9 119.0 125.0 125.0 122.4
## Minnesota 117.7 120.8 119.4 113.2 110.8 113.0 104.3 108.8 94.1 92.3
## Mississippi 127.0 125.3 125.8 122.3 116.4 115.3 113.2 110.0 109.0 108.3
## Missouri 142.1 140.5 139.7 134.1 130.0 129.2 128.8 128.7 127.4 122.8
## Montana 122.0 121.1 122.4 113.7 110.1 103.6 97.8 91.7 87.1 86.2
## Nebraska 116.3 117.0 117.1 110.8 107.7 105.1 103.1 101.3 92.9 93.8
## Nevada 177.7 171.9 165.1 159.2 136.6 146.7 142.6 147.7 141.9 137.9
## New Hampshire 247.8 245.4 239.8 232.9 215.1 201.1 195.9 195.1 180.4 172.9
## New Mexico 102.7 103.0 97.5 96.3 88.9 88.0 88.2 82.3 77.7 74.4
## North Carolina 187.8 179.3 179.0 169.8 160.6 156.3 154.4 150.5 146.0 139.3
## North Dakota 123.7 125.7 126.8 119.6 109.4 103.2 99.8 92.3 87.1 84.1
## Ohio 133.5 132.8 134.0 130.0 127.1 126.7 126.3 124.6 122.4 118.6
## Oklahoma 141.6 143.7 147.0 140.0 128.1 124.2 119.9 113.1 103.6 97.5
## Pennsylvania 124.0 125.2 123.3 125.3 115.3 115.8 113.9 110.6 107.6 107.1
## Rhode Island 149.3 151.2 146.3 135.8 136.9 133.4 136.3 124.4 138.0 120.8
## South Carolina 138.3 136.1 136.0 131.1 127.0 125.4 126.6 126.6 124.4 122.4
## South Dakota 114.7 115.7 113.0 109.8 105.7 104.4 97.0 95.8 91.9 87.4
## Tennessee 130.4 129.1 131.4 129.0 125.1 128.7 129.0 130.6 125.3 124.7
## Texas 129.7 129.0 131.2 126.4 117.2 115.9 113.7 105.8 96.5 94.5
## Utah 74.8 77.6 73.6 69.0 66.3 66.5 64.4 67.7 55.0 57.0
## Vermont 161.6 163.8 162.3 153.8 144.3 144.5 131.2 128.3 128.7 120.9
## Virginia 148.9 149.9 147.4 144.7 136.8 134.6 135.8 133.0 129.5 122.5
## West Virginia 122.3 120.5 119.8 115.7 111.9 109.1 112.1 107.5 109.1 104.0
## Wisconsin 117.6 119.9 115.6 106.3 105.6 107.0 105.4 106.0 102.6 100.3
## Wyoming 158.1 163.1 157.7 141.2 128.9 125.7 124.8 110.4 114.3 111.4
## California 120.2 118.6 115.4 110.8 104.8 102.8 99.7 97.5 90.1 82.4
## 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999
## Alabama 108.6 107.9 109.1 108.5 107.1 102.6 101.4 104.9 106.2 100.7
## Arkansas 113.1 116.8 126.0 113.8 108.8 113.0 110.7 108.7 109.5 104.8
## Colorado 87.4 90.2 88.3 88.6 89.1 85.4 83.1 81.3 81.2 79.6
## Connecticut 91.5 86.7 83.5 79.1 76.6 79.3 76.0 75.9 75.5 73.4
## Delaware 127.2 118.8 120.0 123.8 126.1 127.2 128.3 124.1 132.8 139.5
## Georgia 113.8 109.6 109.2 109.2 107.8 100.3 102.7 100.6 100.5 97.1
## Idaho 90.1 85.4 85.1 86.7 93.0 78.2 73.6 75.0 78.9 75.1
## Illinois 94.1 96.1 94.8 94.6 85.7 84.3 81.8 79.6 80.3 72.2
## Indiana 128.3 127.2 128.2 126.8 128.2 135.4 135.1 135.3 135.9 133.3
## Iowa 95.4 97.1 95.2 92.5 93.4 93.0 94.0 93.9 94.0 91.7
## Kansas 94.3 91.8 90.0 89.9 89.1 90.1 88.7 89.2 87.6 83.3
## Kentucky 182.5 170.4 167.6 167.6 170.1 175.3 179.0 186.8 171.3 165.3
## Louisiana 101.5 107.2 108.5 106.2 105.3 105.7 106.8 105.3 103.2 101.0
## Maine 117.5 116.1 114.5 108.5 101.6 102.3 100.0 101.1 94.5 85.5
## Minnesota 90.7 86.2 83.8 81.6 83.4 84.1 81.7 84.1 83.2 80.7
## Mississippi 101.8 105.6 103.9 105.4 106.0 107.5 106.9 106.3 107.0 103.9
## Missouri 119.1 119.9 122.3 121.6 119.4 124.0 124.1 120.6 120.1 118.0
## Montana 84.7 82.9 86.6 86.0 88.2 90.5 87.3 88.9 89.1 82.6
## Nebraska 89.9 92.4 90.6 91.1 85.9 88.5 86.2 85.5 83.1 86.6
## Nevada 137.3 115.5 110.0 108.1 105.2 100.9 99.0 95.6 102.4 103.9
## New Hampshire 152.4 144.8 143.7 148.9 153.8 158.5 158.0 174.4 173.8 171.7
## New Mexico 70.8 69.9 71.4 69.0 68.2 67.0 65.7 61.8 62.6 59.7
## North Carolina 133.7 132.7 128.9 129.7 112.7 124.9 129.7 125.6 126.0 113.1
## North Dakota 77.1 85.2 74.3 83.0 81.0 80.6 80.8 77.5 79.1 74.7
## Ohio 115.5 113.2 112.3 108.9 108.6 111.7 107.6 108.6 106.4 104.0
## Oklahoma 88.4 87.8 86.3 86.2 104.8 109.5 110.8 111.8 112.2 111.4
## Pennsylvania 101.3 102.5 96.2 94.7 95.4 95.4 93.3 92.9 92.1 91.1
## Rhode Island 101.4 103.6 100.1 94.1 91.9 90.8 87.5 90.0 88.7 86.9
## South Carolina 118.6 121.5 112.8 115.2 112.2 109.2 102.9 124.5 126.9 109.4
## South Dakota 88.3 91.8 93.0 91.6 94.8 98.6 92.3 88.8 88.3 83.5
## Tennessee 121.8 120.6 121.0 120.8 118.8 125.4 119.2 118.9 119.7 115.6
## Texas 85.6 79.4 77.2 81.3 78.8 75.2 74.6 72.6 73.2 67.6
## Utah 53.4 53.5 55.0 56.2 55.8 52.0 54.0 57.0 42.3 43.9
## Vermont 124.3 120.9 126.5 117.2 120.3 123.2 102.5 97.7 97.0 94.1
## Virginia 118.9 109.1 108.2 105.4 106.2 106.7 104.6 108.0 105.6 102.1
## West Virginia 104.1 100.1 97.9 111.0 104.2 115.2 112.7 114.5 114.6 112.4
## Wisconsin 94.0 95.5 96.2 91.2 91.8 93.5 92.1 91.9 88.7 84.4
## Wyoming 96.9 109.1 110.8 108.4 111.2 115.0 110.3 108.8 102.9 104.8
## California 77.8 68.7 67.5 63.4 58.6 56.4 54.5 53.8 52.3 47.2
## 2000
## Alabama 96.2
## Arkansas 99.4
## Colorado 73.0
## Connecticut 71.4
## Delaware 140.7
## Georgia 88.4
## Idaho 66.9
## Illinois 70.0
## Indiana 125.5
## Iowa 88.9
## Kansas 79.8
## Kentucky 156.2
## Louisiana 104.3
## Maine 82.9
## Minnesota 76.0
## Mississippi 97.2
## Missouri 113.8
## Montana 75.5
## Nebraska 77.6
## Nevada 93.2
## New Hampshire 147.3
## New Mexico 53.8
## North Carolina 109.0
## North Dakota 72.5
## Ohio 99.9
## Oklahoma 108.9
## Pennsylvania 87.9
## Rhode Island 83.1
## South Carolina 103.9
## South Dakota 75.1
## Tennessee 108.7
## Texas 69.3
## Utah 40.7
## Vermont 88.9
## Virginia 96.7
## West Virginia 107.9
## Wisconsin 80.1
## Wyoming 90.5
## California 41.6
##
## $N0
## [1] 38
##
## $T0
## [1] 19
##
## $W
## 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982
## Alabama 0 0 0 0 0 0 0 0 0 0 0 0 0
## Arkansas 0 0 0 0 0 0 0 0 0 0 0 0 0
## Colorado 0 0 0 0 0 0 0 0 0 0 0 0 0
## Connecticut 0 0 0 0 0 0 0 0 0 0 0 0 0
## Delaware 0 0 0 0 0 0 0 0 0 0 0 0 0
## Georgia 0 0 0 0 0 0 0 0 0 0 0 0 0
## Idaho 0 0 0 0 0 0 0 0 0 0 0 0 0
## Illinois 0 0 0 0 0 0 0 0 0 0 0 0 0
## Indiana 0 0 0 0 0 0 0 0 0 0 0 0 0
## Iowa 0 0 0 0 0 0 0 0 0 0 0 0 0
## Kansas 0 0 0 0 0 0 0 0 0 0 0 0 0
## Kentucky 0 0 0 0 0 0 0 0 0 0 0 0 0
## Louisiana 0 0 0 0 0 0 0 0 0 0 0 0 0
## Maine 0 0 0 0 0 0 0 0 0 0 0 0 0
## Minnesota 0 0 0 0 0 0 0 0 0 0 0 0 0
## Mississippi 0 0 0 0 0 0 0 0 0 0 0 0 0
## Missouri 0 0 0 0 0 0 0 0 0 0 0 0 0
## Montana 0 0 0 0 0 0 0 0 0 0 0 0 0
## Nebraska 0 0 0 0 0 0 0 0 0 0 0 0 0
## Nevada 0 0 0 0 0 0 0 0 0 0 0 0 0
## New Hampshire 0 0 0 0 0 0 0 0 0 0 0 0 0
## New Mexico 0 0 0 0 0 0 0 0 0 0 0 0 0
## North Carolina 0 0 0 0 0 0 0 0 0 0 0 0 0
## North Dakota 0 0 0 0 0 0 0 0 0 0 0 0 0
## Ohio 0 0 0 0 0 0 0 0 0 0 0 0 0
## Oklahoma 0 0 0 0 0 0 0 0 0 0 0 0 0
## Pennsylvania 0 0 0 0 0 0 0 0 0 0 0 0 0
## Rhode Island 0 0 0 0 0 0 0 0 0 0 0 0 0
## South Carolina 0 0 0 0 0 0 0 0 0 0 0 0 0
## South Dakota 0 0 0 0 0 0 0 0 0 0 0 0 0
## Tennessee 0 0 0 0 0 0 0 0 0 0 0 0 0
## Texas 0 0 0 0 0 0 0 0 0 0 0 0 0
## Utah 0 0 0 0 0 0 0 0 0 0 0 0 0
## Vermont 0 0 0 0 0 0 0 0 0 0 0 0 0
## Virginia 0 0 0 0 0 0 0 0 0 0 0 0 0
## West Virginia 0 0 0 0 0 0 0 0 0 0 0 0 0
## Wisconsin 0 0 0 0 0 0 0 0 0 0 0 0 0
## Wyoming 0 0 0 0 0 0 0 0 0 0 0 0 0
## California 0 0 0 0 0 0 0 0 0 0 0 0 0
## 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995
## Alabama 0 0 0 0 0 0 0 0 0 0 0 0 0
## Arkansas 0 0 0 0 0 0 0 0 0 0 0 0 0
## Colorado 0 0 0 0 0 0 0 0 0 0 0 0 0
## Connecticut 0 0 0 0 0 0 0 0 0 0 0 0 0
## Delaware 0 0 0 0 0 0 0 0 0 0 0 0 0
## Georgia 0 0 0 0 0 0 0 0 0 0 0 0 0
## Idaho 0 0 0 0 0 0 0 0 0 0 0 0 0
## Illinois 0 0 0 0 0 0 0 0 0 0 0 0 0
## Indiana 0 0 0 0 0 0 0 0 0 0 0 0 0
## Iowa 0 0 0 0 0 0 0 0 0 0 0 0 0
## Kansas 0 0 0 0 0 0 0 0 0 0 0 0 0
## Kentucky 0 0 0 0 0 0 0 0 0 0 0 0 0
## Louisiana 0 0 0 0 0 0 0 0 0 0 0 0 0
## Maine 0 0 0 0 0 0 0 0 0 0 0 0 0
## Minnesota 0 0 0 0 0 0 0 0 0 0 0 0 0
## Mississippi 0 0 0 0 0 0 0 0 0 0 0 0 0
## Missouri 0 0 0 0 0 0 0 0 0 0 0 0 0
## Montana 0 0 0 0 0 0 0 0 0 0 0 0 0
## Nebraska 0 0 0 0 0 0 0 0 0 0 0 0 0
## Nevada 0 0 0 0 0 0 0 0 0 0 0 0 0
## New Hampshire 0 0 0 0 0 0 0 0 0 0 0 0 0
## New Mexico 0 0 0 0 0 0 0 0 0 0 0 0 0
## North Carolina 0 0 0 0 0 0 0 0 0 0 0 0 0
## North Dakota 0 0 0 0 0 0 0 0 0 0 0 0 0
## Ohio 0 0 0 0 0 0 0 0 0 0 0 0 0
## Oklahoma 0 0 0 0 0 0 0 0 0 0 0 0 0
## Pennsylvania 0 0 0 0 0 0 0 0 0 0 0 0 0
## Rhode Island 0 0 0 0 0 0 0 0 0 0 0 0 0
## South Carolina 0 0 0 0 0 0 0 0 0 0 0 0 0
## South Dakota 0 0 0 0 0 0 0 0 0 0 0 0 0
## Tennessee 0 0 0 0 0 0 0 0 0 0 0 0 0
## Texas 0 0 0 0 0 0 0 0 0 0 0 0 0
## Utah 0 0 0 0 0 0 0 0 0 0 0 0 0
## Vermont 0 0 0 0 0 0 0 0 0 0 0 0 0
## Virginia 0 0 0 0 0 0 0 0 0 0 0 0 0
## West Virginia 0 0 0 0 0 0 0 0 0 0 0 0 0
## Wisconsin 0 0 0 0 0 0 0 0 0 0 0 0 0
## Wyoming 0 0 0 0 0 0 0 0 0 0 0 0 0
## California 0 0 0 0 0 0 1 1 1 1 1 1 1
## 1996 1997 1998 1999 2000
## Alabama 0 0 0 0 0
## Arkansas 0 0 0 0 0
## Colorado 0 0 0 0 0
## Connecticut 0 0 0 0 0
## Delaware 0 0 0 0 0
## Georgia 0 0 0 0 0
## Idaho 0 0 0 0 0
## Illinois 0 0 0 0 0
## Indiana 0 0 0 0 0
## Iowa 0 0 0 0 0
## Kansas 0 0 0 0 0
## Kentucky 0 0 0 0 0
## Louisiana 0 0 0 0 0
## Maine 0 0 0 0 0
## Minnesota 0 0 0 0 0
## Mississippi 0 0 0 0 0
## Missouri 0 0 0 0 0
## Montana 0 0 0 0 0
## Nebraska 0 0 0 0 0
## Nevada 0 0 0 0 0
## New Hampshire 0 0 0 0 0
## New Mexico 0 0 0 0 0
## North Carolina 0 0 0 0 0
## North Dakota 0 0 0 0 0
## Ohio 0 0 0 0 0
## Oklahoma 0 0 0 0 0
## Pennsylvania 0 0 0 0 0
## Rhode Island 0 0 0 0 0
## South Carolina 0 0 0 0 0
## South Dakota 0 0 0 0 0
## Tennessee 0 0 0 0 0
## Texas 0 0 0 0 0
## Utah 0 0 0 0 0
## Vermont 0 0 0 0 0
## Virginia 0 0 0 0 0
## West Virginia 0 0 0 0 0
## Wisconsin 0 0 0 0 0
## Wyoming 0 0 0 0 0
## California 1 1 1 1 1
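To make the reshaping concrete, here is a minimal base-R sketch of a long-to-wide conversion like the one panel.matrices() performs, run on a made-up three-state panel. The helper to_matrices() and the toy numbers are hypothetical; the real panel.matrices() may differ in its details and validates its input (for example, balance of the panel), which this sketch does not. It follows the ordering seen in the output above, with the treated unit placed last, and derives N0 and T0 from the treatment matrix.

```r
# Hypothetical toy panel in long format: 3 states x 4 years (values are made up)
panel <- data.frame(
  State = rep(c("StateA", "StateB", "California"), each = 4),
  Year = rep(1987:1990, times = 3),
  PacksPerCapita = c(100, 98, 96, 94,
                     120, 119, 117, 116,
                     97, 90, 82, 78),
  treated = c(0, 0, 0, 0,
              0, 0, 0, 0,
              0, 0, 1, 1)
)

# Hypothetical helper: a simplified stand-in for panel.matrices()
to_matrices <- function(df) {
  # Pivot long -> wide: one row per state, one column per year
  Y <- tapply(df$PacksPerCapita, list(df$State, df$Year), sum)
  W <- tapply(df$treated, list(df$State, df$Year), sum)
  # Put never-treated (control) units first and treated units last
  ever_treated <- rowSums(W) > 0
  ord <- c(which(!ever_treated), which(ever_treated))
  Y <- Y[ord, , drop = FALSE]
  W <- W[ord, , drop = FALSE]
  # N0: number of control units; T0: number of periods before treatment starts
  N0 <- sum(!ever_treated)
  T0 <- min(which(colSums(W) > 0)) - 1
  list(Y = Y, N0 = N0, T0 = T0, W = W)
}

m <- to_matrices(panel)
m$N0  # 2 control states
m$T0  # 2 pre-treatment years (1987, 1988)
```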
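Calling lapply() over the estimators list applies each function to the newly created matrices (setup$Y, setup$N0 and setup$T0) and returns the estimates for all three methods. Inside the call, estimator is just the argument of the anonymous function, so it takes the values did_estimate, sc_estimate and synthdid_estimate in turn. did_estimate applies uniform unit and time weights, sc_estimate fits unit weights only, and synthdid_estimate fits both unit and time weights before computing the treatment effect.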
estimates = lapply(estimators, function(estimator) { estimator(setup$Y,
setup$N0, setup$T0) } )
head(estimates)
## $did
## synthdid: -27.349 +- NA. Effective N0/N0 = 38.0/38~1.0. Effective T0/T0 = 19.0/19~1.0. N1,T1 = 1,12.
##
## $sc
## synthdid: -19.620 +- NA. Effective N0/N0 = 3.8/38~0.1. Effective T0/T0 = Inf/19~Inf. N1,T1 = 1,12.
##
## $sdid
## synthdid: -15.604 +- NA. Effective N0/N0 = 16.4/38~0.4. Effective T0/T0 = 2.8/19~0.1. N1,T1 = 1,12.
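Because DiD weights every control state and every pre-treatment year equally, its estimate reduces to a difference of averages: California's pre-to-post change minus the control states' pre-to-post change. The sketch below computes that by hand on a small made-up matrix; did_by_hand() is a hypothetical helper, not a synthdid function. Run on setup$Y, setup$N0 and setup$T0, I would expect it to reproduce the DiD point estimate above, since did_estimate uses uniform weights, but that is an assumption I have not re-checked here.

```r
# Hypothetical helper: plain difference-in-differences with treated unit(s)
# in the rows after the first N0 and treatment starting after column T0
did_by_hand <- function(Y, N0, T0) {
  pre  <- seq_len(T0)
  post <- (T0 + 1):ncol(Y)
  ctrl <- seq_len(N0)
  trt  <- (N0 + 1):nrow(Y)
  treated_change <- mean(Y[trt, post]) - mean(Y[trt, pre])
  control_change <- mean(Y[ctrl, post]) - mean(Y[ctrl, pre])
  treated_change - control_change
}

# Made-up 2 control units + 1 treated unit, 2 pre and 2 post periods
Y <- rbind(c1 = c(10, 12, 14, 16),
           c2 = c(20, 21, 30, 31),
           tr = c(15, 17, 15, 17))
did_by_hand(Y, N0 = 2, T0 = 2)  # -7: treated change 0, control change 7
```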
With estimates in hand for all three methods, standard errors are calculated next. The if/else branch can largely be ignored: it exists because the original replication also compared the matrix completion (MC) causal method, which needs its own placebo routine, so for the three methods here only the else branch runs. Most of the work is done by vcov(), which returns the estimated variance of each estimate as a 1 x 1 matrix; its square root is the standard error. Note the use of “placebo” as the method. The synthdid package also offers “bootstrap” and “jackknife”, but those are not defined when only one unit is treated, as California is here.
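standard.errors = mapply(function(estimate, name) {
  # Reset the seed so each placebo standard error is reproducible
  set.seed(12345)
  # The 'mc' (matrix completion) branch is kept from the original replication and
  # never runs here; vcov() returns a 1 x 1 variance matrix, so sqrt() gives the SE
  if(name == 'mc') { mc_placebo_se(setup$Y, setup$N0, setup$T0) }
  else { sqrt(vcov(estimate, method='placebo')) }
}, estimates, names(estimators))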
head(standard.errors)
## did sc sdid
## 17.740267 9.917426 8.367993
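The idea behind a placebo standard error is to pretend, one at a time, that a control state was the treated one, re-estimate the "effect" (which should be zero), and use the spread of those fake effects as the standard error. synthdid's placebo method samples random permutations of the controls over many replications and re-estimates with its own weights; the sketch below is a simplified, deterministic variant that gives every control unit exactly one turn and uses plain DiD. placebo_se_sketch() and did_by_hand() are hypothetical helpers, and the toy data are made up.

```r
# Hypothetical helper: plain DiD, treated unit(s) in the rows after the first N0
did_by_hand <- function(Y, N0, T0) {
  pre  <- seq_len(T0)
  post <- (T0 + 1):ncol(Y)
  ctrl <- seq_len(N0)
  trt  <- (N0 + 1):nrow(Y)
  (mean(Y[trt, post]) - mean(Y[trt, pre])) -
    (mean(Y[ctrl, post]) - mean(Y[ctrl, pre]))
}

# Hypothetical placebo SE: each control unit takes one turn as the "treated" unit
placebo_se_sketch <- function(Y_control, T0) {
  N0 <- nrow(Y_control)
  placebo <- vapply(seq_len(N0), function(j) {
    # Move control unit j to the last row; the remaining units act as controls
    Yp <- Y_control[c(setdiff(seq_len(N0), j), j), , drop = FALSE]
    did_by_hand(Yp, N0 = N0 - 1, T0 = T0)
  }, numeric(1))
  # Population standard deviation of the placebo effects
  sqrt(mean((placebo - mean(placebo))^2))
}

# Made-up controls observed for 1 pre and 1 post period
Y_control <- rbind(a = c(0, 0),
                   b = c(0, 3),
                   c = c(0, 6))
placebo_se_sketch(Y_control, T0 = 1)  # placebo effects -4.5, 0, 4.5
```

If all control states moved in perfect parallel, every placebo effect would be zero and so would the standard error; the noisier the controls, the wider the placebo spread.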
Standard errors and estimates for each of the three methods are merged into a table via rbind() after being unlisted. Row and column names are also defined with results rounded to one decimal place.
Comparative analysis of the standard errors bears stronger evidence in favor of SDiD or SC over DiD. Proportionally, DiD’s standard error is 65% of the magnitude of its estimate, while SC’s is 51% and SDiD’s is 54%. Because SDiD is the more flexible method, one might expect it to have relatively higher variance and larger standard errors than the less flexible DiD method. However, as Table 1 shows, the standard error for SDiD is smaller than that of DiD both nominally and proportionally. The outperformance of SDiD and SC over DiD can plausibly be attributed to their use of weighting, which offers a more localized approach by emphasizing control units whose pre-treatment outcomes are more similar to those of the treated unit.
california.table = rbind(unlist(estimates), unlist(standard.errors))
rownames(california.table) = c('estimate', 'standard error')
colnames(california.table) = toupper(names(estimators))
round(california.table, digits=1)
## DID SC SDID
## estimate -27.3 -19.6 -15.6
## standard error 17.7 9.9 8.4
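The proportional comparison in the discussion of Table 1 is simple arithmetic on the printed output: each standard error divided by the absolute value of its estimate. A short sketch, with the numbers copied by hand from the estimates and standard errors above:

```r
# Point estimates and placebo standard errors copied from the output above
est <- c(DID = -27.349, SC = -19.620, SDID = -15.604)
se  <- c(DID = 17.740267, SC = 9.917426, SDID = 8.367993)

# Standard error as a percentage of the magnitude of each estimate
se_share <- round(100 * se / abs(est))
se_share  # DID 65, SC 51, SDID 54
```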
Using the synthdid_plot() function from the synthdid package, we get a tremendous glimpse into both the effects of Prop 99 and the nuances between these three methods. More information on the arguments of synthdid_plot can be found here.
The first plot is the DiD plot. The pre-treatment dashed trend line for California is a great way of visualizing the parallel trends assumption that is key to DiD. The assumption is that, absent treatment, California's outcomes would have moved in parallel with the average of the control states, so the control trend shifted to California's level serves as the counterfactual. As the highlighted red area at the bottom of the first facet shows, the entire pre-treatment period was weighted equally in determining the trends. The causal estimate of -27.3 appears as the gap between California's actual post-1989 outcomes and the dashed parallel-trends counterfactual line.
The middle plot, for the SC methodology, does not include dashed trend lines. This is because SC does not rely on the parallel trends assumption. Instead, a synthetic control unit is built by weighting a group of states so that their weighted pre-treatment trajectory closely matches that of the treated unit. The synthetic control does not receive treatment and continues on as the counterfactual. The gap between California and the synthetic control after 1989 represents the estimate of -19.6.
In the final facet we see SDiD in action. Like its parent DiD, SDiD relies on a parallel trends assumption. But as the DiD plot shows, the pre-treatment data for California and the other states are far from parallel: we can force trend lines onto the DiD plot to satisfy the assumption, but the two series clearly do not move together. To remedy this and better satisfy the parallel trends assumption, SDiD borrows the synthetic control idea. By building a control unit from the most comparable states, as the SC method does, it produces a control whose pre-treatment outcomes run far more parallel to California's than in the DiD plot. Also worth noting is the difference in how DiD and SDiD weight time periods: SDiD places much more of its weight on the years immediately before Prop 99 took effect in 1989.
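The unit and time weights enter the estimate in a simple way. As I read the synthdid source, the point estimate is a double contrast τ = ω̃ᵀ Y λ̃, where ω̃ stacks the negated control-unit weights with 1/N1 for each treated unit, and λ̃ stacks the negated pre-period weights with 1/T1 for each post period. In the sketch below, weighted_did() is a hypothetical helper and the weights are supplied by hand rather than estimated; it shows that uniform weights recover plain DiD, and that moving weight onto particular states or onto the last pre-treatment year (as SDiD tends to do for Prop 99) changes the baseline the post-treatment outcomes are compared against.

```r
# Hypothetical helper: weighted DiD as a double contrast omega_tilde' Y lambda_tilde
weighted_did <- function(Y, N0, T0, omega, lambda) {
  N1 <- nrow(Y) - N0
  T1 <- ncol(Y) - T0
  # Controls get -omega, treated units share weight 1/N1
  omega_tilde  <- c(-omega,  rep(1 / N1, N1))
  # Pre-periods get -lambda, post-periods share weight 1/T1
  lambda_tilde <- c(-lambda, rep(1 / T1, T1))
  drop(t(omega_tilde) %*% Y %*% lambda_tilde)
}

# Made-up 2 control units + 1 treated unit, 2 pre and 2 post periods
Y <- rbind(c1 = c(10, 12, 14, 16),
           c2 = c(20, 21, 30, 31),
           tr = c(15, 17, 15, 17))

weighted_did(Y, 2, 2, omega = c(0.5, 0.5), lambda = c(0.5, 0.5))  # -7, plain DiD
weighted_did(Y, 2, 2, omega = c(1, 0),     lambda = c(0.5, 0.5))  # -4, only c1 as control
weighted_did(Y, 2, 2, omega = c(0.5, 0.5), lambda = c(0, 1))      # -7.25, last pre-year only
```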
synthdid_plot(estimates[1:3], facet.vertical=FALSE,
control.name='control', treated.name='california',
lambda.comparable=TRUE, se.method = 'none',
trajectory.linetype = 1, line.width=.75, effect.curvature=-.4,
trajectory.alpha=.7, effect.alpha=.7,
diagram.alpha=1, onset.alpha=.7) +
theme(legend.position=c(.26,.07), legend.direction='horizontal',
legend.key=element_blank(), legend.background=element_blank(),
strip.background=element_blank(), strip.text.x = element_blank())
Figure 2 does not tell us much about the effects of Prop 99, but it does tell us a lot about the performance of these three methods. These plots are in the same order as Figure 1 (DiD, SC, SDiD). This visualization is best understood by trying to answer these questions:
Which plot has the most uniform dot sizes?
Which plot has the fewest dots far away from the dark horizontal line that represents each method's estimate?
Which plot offers the best combination of the prior two questions?
Clearly, the final plot, SDiD, offers by far the best combination of uniform dot sizes and proximity to the estimate line. Each dot represents a non-treated state used to create the control unit, with its size reflecting that state's weight. The DiD plot has a nice even weighting across states but suffers from outliers like Nebraska and Montana, which fall very far from the estimate, with several others moderately far. The SC plot actually does a little better, as the heavily weighted states contributing to its control unit fall close to the estimate line, with only a few weaker contributors as outliers. SDiD, on the other hand, has a much neater, tighter fit of its control unit contributors around the estimate line.
synthdid_units_plot(rev(estimates[1:3]), se.method='none') +
theme(legend.background=element_blank(), legend.title = element_blank(),
legend.direction='horizontal', legend.position=c(.17,.07),
strip.background=element_blank(), strip.text.x = element_blank())
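One way to read Figure 2, based on my reading of how synthdid_units_plot() builds it, is that each dot is the estimate obtained by comparing California against a single control state (using the method's time weights), and the dot's size is that state's unit weight. Because the unit weights sum to one, the overall estimate is then the weighted average of those per-state estimates, which is why small dots far from the line matter less than large ones. The sketch below checks that identity on made-up data; per_unit_did() and the weights are hypothetical.

```r
# Hypothetical helper: one DiD estimate per control unit, each using that unit alone
per_unit_did <- function(Y, N0, T0, lambda) {
  T1 <- ncol(Y) - T0
  lambda_tilde <- c(-lambda, rep(1 / T1, T1))
  # Average trajectory of the treated unit(s)
  treated <- colMeans(Y[(N0 + 1):nrow(Y), , drop = FALSE])
  # For control i: gap between treated and control i, contrasted post vs. pre
  vapply(seq_len(N0), function(i) sum((treated - Y[i, ]) * lambda_tilde), numeric(1))
}

# Made-up 2 control units + 1 treated unit, 2 pre and 2 post periods
Y <- rbind(c1 = c(10, 12, 14, 16),
           c2 = c(20, 21, 30, 31),
           tr = c(15, 17, 15, 17))
lambda <- c(0.5, 0.5)     # hypothetical time weights
omega  <- c(0.25, 0.75)   # hypothetical unit weights (the "dot sizes")

tau_i <- per_unit_did(Y, N0 = 2, T0 = 2, lambda)
tau_i                 # -4 -10: one "dot" per control unit
sum(omega * tau_i)    # -8.5: the weighted average is the overall estimate
```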
My paper only went as far as outlining SDiD and the methods from which it draws inspiration, and replicating the California Proposition 99 analysis from Synthetic Difference-in-Differences. As such, it cannot properly assert SDiD’s dominance over DiD or, to a lesser degree, SC. Still, its results lend credence to the idea that SDiD may be an incredible addition to the toolbag of anyone working in applied microeconomics, and that it could be a superior causal estimation method for panel data in the social sciences.
The original paper presents deeper analysis of SDiD's superiority that was beyond the scope of my paper and its author’s computational abilities. The authors ran placebo simulations to further test SDiD against its predecessors. In a placebo simulation based on data from the Current Population Survey, SDiD outperformed its recent ancestors with a root-mean-square error of 0.28, compared to 0.37 for SC and 0.49 for DiD. SDiD also registered lower bias, at 0.10 versus 0.20 for SC and 0.21 for DiD. A second simulation was based on the Penn World Table dataset. Its findings agree with ours for Prop 99: SDiD outperformed SC marginally as measured by root-mean-square error and bias, while significantly outperforming DiD.