require(data.table)
library(tidyverse)
Scenario: Suppose that, as of 1990, recreational marijuana consumption and sales are legal only in the state of Georgia. In the “state” variable, Georgia is coded as 1.
state <- 1:50
year <- 1970:2019
frame <- data.frame(state, year)
dt <- data.table(frame, key=c("state", "year"))
#make panel data
comb <- CJ(1:50, 1970:2019)
ans <- dt[comb]
#put values for marijuana, cigarette, and beer
ans$marij <- rnorm(n = 2500, mean = 10, sd = 2)
ans$ciga <- rnorm(n = 2500, mean = 119, sd = 33)
ans$beer <- rnorm(n = 2500, mean = 23, sd = 4)
#increase marijuana sales from 1990 onward in Georgia (state == 1)
#Georgia has 30 post-policy rows (1990-2019), so draw exactly 30 values
ans$marij[ans$year >= 1990 & ans$state == 1] <- rnorm(n = 30, mean = 20, sd = 3)
head(ans,5)
##    state year     marij     ciga     beer
## 1:     1 1970  9.178839 176.4733 18.22219
## 2:     1 1971  8.163153 119.5286 23.22194
## 3:     1 1972 10.894528 114.7572 12.94722
## 4:     1 1973  8.371598 112.8960 26.87022
## 5:     1 1974  7.476360 150.3786 26.08536
ans_g <- filter(ans, state == 1)
plot <- ggplot(ans_g, aes(x = year, y = marij, group = state, color = state)) +
  geom_line() +
  geom_vline(aes(xintercept = 1990))
plot
As the plot shows, marijuana sales in Georgia rise sharply once sales become legal in 1990. After the jump, sales fluctuate from year to year around the new, higher level.
ans_by <- filter(ans, state <= 10)
plot2 <- ggplot(ans_by, aes(x = year, y = marij, group = state, color = state)) +
  geom_line() +
  geom_vline(aes(xintercept = 1990)) +
  labs(
    colour = "state"
  ) +
  theme(legend.position = 'none')
plot2
Explanation: To compare Georgia with other states, I plot nine other states alongside it as an example. The nine states fluctuate randomly over the years with no visible trend, while only Georgia shows a jump after the policy is implemented. This motivates a study of the effect of the marijuana policy on marijuana sales in Georgia.
I chose the Difference in Differences (DiD) method. Because the data have clear pre-policy and post-policy periods, as well as a policy-affected group and an unaffected group, DiD is an appropriate model.
Main Specification: \[y_{it} = \beta_1 + \beta_2 treat_i + \beta_3 post_t + \beta_4 (treat_i \times post_t) + \beta_5 beerSales_{it} + \beta_6 cigaSales_{it} + e_{it}\]
where y_{it} is marijuana sales in state i and year t, treat_i is a dummy variable indicating whether the state belongs to the policy-affected group, post_t is a dummy variable indicating whether the year falls in the post-policy period, and treat_i × post_t is their interaction (the variable treat_post in the code). I also control for beer and cigarette sales because the trajectory of marijuana sales may be related to beer or cigarette sales.
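To see why \beta_4 is the DiD estimate, it may help to write out the four group means implied by the specification. The derivation below is a sketch that holds the two covariates fixed (an assumption made only to keep the expressions short):

\[
\begin{aligned}
E[y_{it} \mid treat_i = 0,\ post_t = 0] &= \beta_1 \\
E[y_{it} \mid treat_i = 1,\ post_t = 0] &= \beta_1 + \beta_2 \\
E[y_{it} \mid treat_i = 0,\ post_t = 1] &= \beta_1 + \beta_3 \\
E[y_{it} \mid treat_i = 1,\ post_t = 1] &= \beta_1 + \beta_2 + \beta_3 + \beta_4
\end{aligned}
\]

The change over time in the treated group is \beta_3 + \beta_4 and the change in the control group is \beta_3, so the difference between the two changes is

\[
(\beta_3 + \beta_4) - \beta_3 = \beta_4 .
\]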
#create the treatment and post dummy variables.
ans$treat<-ifelse(ans$state== 1, 1,0)
ans$post<-ifelse(ans$year>= 1990, 1,0)
ans$treat_post<-ans$treat*ans$post
#regression
reg <- summary(lm(marij~ treat+post+treat_post+beer+ciga ,data=ans))
reg
##
## Call:
## lm(formula = marij ~ treat + post + treat_post + beer + ciga,
##     data = ans)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -6.8639 -1.3185 -0.0046  1.3287  7.3450
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  9.5923047  0.2808965  34.149   <2e-16 ***
## treat       -0.4561177  0.4548691  -1.003    0.316
## post         0.0024202  0.0830803   0.029    0.977
## treat_post  11.5352875  0.5872420  19.643   <2e-16 ***
## beer         0.0134121  0.0099008   1.355    0.176
## ciga         0.0007817  0.0011954   0.654    0.513
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.014 on 2494 degrees of freedom
## Multiple R-squared:  0.2655, Adjusted R-squared:  0.2641
## F-statistic: 180.3 on 5 and 2494 DF,  p-value: < 2.2e-16
The coefficient on treat_post is the DiD estimate; under the parallel-trends assumption it identifies the average treatment effect on the treated (ATT). The coefficient is large and statistically significant: marijuana sales in Georgia rose by about 11.5 units from the pre-policy to the post-policy period, relative to the change in the other states over the same period. The covariates, beer and cigarette sales, do not have statistically significant coefficients.
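As a check on this interpretation, the DiD estimate can also be computed by hand from the four group means. The sketch below is self-contained: it simulates a fresh panel in base R rather than reusing `ans`, and the seed, the object names (`panel`, `cell_means`, `did_by_hand`, `did_by_lm`) and the omission of the beer and cigarette covariates are all assumptions made for illustration, so the numbers will not match the regression output above.

```r
# Minimal sketch: 2x2 difference in differences computed by hand.
# Assumptions: new simulated data, arbitrary seed, no covariates.
set.seed(42)

# Balanced panel: 50 states x 50 years
panel <- expand.grid(year = 1970:2019, state = 1:50)
panel$marij <- rnorm(nrow(panel), mean = 10, sd = 2)

# Raise sales in state 1 from 1990 onward (30 rows)
treated_post <- panel$state == 1 & panel$year >= 1990
panel$marij[treated_post] <- rnorm(sum(treated_post), mean = 20, sd = 3)

panel$treat <- as.integer(panel$state == 1)
panel$post  <- as.integer(panel$year >= 1990)

# Mean outcome in each of the four treat x post cells
cell_means <- tapply(panel$marij,
                     list(treat = panel$treat, post = panel$post),
                     mean)

# (treated post - treated pre) - (control post - control pre)
did_by_hand <- unname(
  (cell_means["1", "1"] - cell_means["1", "0"]) -
  (cell_means["0", "1"] - cell_means["0", "0"])
)

# The interaction coefficient of the saturated regression is the same quantity
did_by_lm <- coef(lm(marij ~ treat * post, data = panel))[["treat:post"]]

print(c(by_hand = did_by_hand, by_lm = did_by_lm))
```

Without covariates the regression is saturated in the two dummies, so the two numbers agree up to floating-point error; adding beer and ciga as controls, as in the main specification, can move the regression estimate slightly away from the raw difference of means.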
The synthetic control method provides a complementary check on the causal effect. A synthetic control is a weighted average of the control units, with weights chosen so that it resembles the treated unit in the pre-intervention period; the weights also make explicit how much each control unit contributes to the counterfactual (Abadie et al., 2010). The synthetic control therefore approximates the marijuana sales that would have been observed in Georgia over 1970-2019 in the absence of the policy.
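In the notation of Abadie et al. (2010), which is not used elsewhere in this write-up and is introduced here only for illustration, let unit 1 be Georgia, let units j = 2, ..., J+1 be the donor states (here J = 49), and let Y_{jt} be marijuana sales in state j and year t. The synthetic control and the estimated effect can then be sketched as:

\[
\hat{Y}^{N}_{1t} = \sum_{j=2}^{J+1} w_j^{*} Y_{jt},
\qquad w_j^{*} \ge 0,
\qquad \sum_{j=2}^{J+1} w_j^{*} = 1,
\]

\[
\hat{\alpha}_{1t} = Y_{1t} - \sum_{j=2}^{J+1} w_j^{*} Y_{jt}
\quad \text{for post-policy years } t .
\]

The weights w_j^{*} are chosen so that the weighted donor states match Georgia's pre-policy predictors as closely as possible.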
require(tidysynth)
ma_out <-
  ans %>%
  # initialize the synthetic control object
  synthetic_control(outcome = marij, # outcome
                    unit = state, # unit index in the panel data
                    time = year, # time index in the panel data
                    i_unit = 1, # unit where the intervention occurred
                    i_time = 1990, # time period when the intervention occurred
                    generate_placebos = TRUE # generate placebo synthetic controls (for inference)
  ) %>%
  # Generate the aggregate predictors used to fit the weights:
  # average beer and cigarette sales for each state over 1970-1990
  # (note: 1990 is already a treated year in the simulated data)
  generate_predictor(time_window = 1970:1990,
                     beer_sales = mean(beer, na.rm = TRUE),
                     ciga_sales = mean(ciga, na.rm = TRUE)) %>%
  # Lagged marijuana sales (the outcome); the cigsale_* names hold marij values
  generate_predictor(time_window = 1975,
                     cigsale_1975 = marij) %>%
  generate_predictor(time_window = 1980,
                     cigsale_1980 = marij) %>%
  generate_predictor(time_window = 1985,
                     cigsale_1985 = marij) %>%
  generate_predictor(time_window = 1990,
                     cigsale_1990 = marij) %>%
  # Generate the fitted weights for the synthetic control
  generate_weights(optimization_window = 1970:1990, # time to use in the optimization task
                   margin_ipop = .02, sigf_ipop = 7, bound_ipop = 6 # optimizer options
  ) %>%
  # Generate the synthetic control
  generate_control()
ma_out %>% plot_trends()
The figure shows marijuana sales for Georgia and its synthetic control
over the 1970-2019 period. Marijuana sales in Georgia and in synthetic
Georgia track each other fairly closely in the pre-policy period. After
the 1990 policy implementation, the gap in sales between Georgia and its
synthetic counterpart becomes large. The pre-policy fit suggests that
synthetic Georgia provides a reasonable approximation to the marijuana
sales that would have occurred in Georgia in the absence of the policy.
ma_out %>% plot_differences()
The figure shows the gap in marijuana sales between Georgia and its
synthetic counterpart.
ma_out %>% grab_balance_table()
## # A tibble: 6 x 4
##   variable        `1` synthetic_1 donor_sample
##   <chr>         <dbl>       <dbl>        <dbl>
## 1 beer_sales    23.2        23.6         23.3
## 2 ciga_sales   121.        124.         120.
## 3 cigsale_1975  13.8         9.51        10.2
## 4 cigsale_1980   8.04       10.3         10.3
## 5 cigsale_1985  11.3        11.5          9.94
## 6 cigsale_1990  22.5        14.6         10.2
ma_out %>% plot_weights()
ma_out %>% plot_placebos()
The figure shows the marijuana sales gap for Georgia together with the placebo gaps for the other states. Georgia's gap is clearly the most distinctive in the post-policy period.
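The visual comparison with the placebos can be summarized in a single number. One common approach, following Abadie et al. (2010), is to compute for each unit the ratio of post-policy to pre-policy mean squared prediction error (MSPE) and to ask how many units have a ratio at least as large as the treated unit's. The sketch below illustrates the idea in base R on hypothetical gap data; the matrix `gaps`, the seed, and the size of the simulated shift are assumptions for illustration and are not taken from `ma_out`.

```r
# Minimal sketch: permutation-style p-value from post/pre MSPE ratios.
# Assumption: `gaps` holds hypothetical (actual - synthetic) gaps,
# one column per unit; it is simulated here, not extracted from ma_out.
set.seed(1)
years   <- 1970:2019
n_units <- 50

gaps <- matrix(rnorm(length(years) * n_units, sd = 2),
               nrow = length(years),
               dimnames = list(years, paste0("unit_", 1:n_units)))

# Add a +10 shift to unit 1 from 1990 onward to mimic a treatment effect
post <- years >= 1990
gaps[post, "unit_1"] <- gaps[post, "unit_1"] + 10

# Post-policy MSPE divided by pre-policy MSPE, one value per unit
mspe_ratio <- colMeans(gaps[post, ]^2) / colMeans(gaps[!post, ]^2)

# Share of units whose ratio is at least as large as the treated unit's
p_value <- mean(mspe_ratio >= mspe_ratio["unit_1"])

print(sort(mspe_ratio, decreasing = TRUE)[1:3])
print(p_value)
```

With one treated unit among 50, the smallest value this p-value can take is 1/50, which is reached when the treated unit has the largest ratio of all units.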
Reference: Abadie, A., Diamond, A., & Hainmueller, J. (2010). Synthetic control methods for comparative case studies: Estimating the effect of California’s tobacco control program. Journal of the American Statistical Association, 105(490), 493-505.