require(data.table)
library(tidyverse)

1. Create fake data to simulate a relevant situation

Scenario: Suppose that, as of 1990, recreational marijuana consumption and sales are legal only in the state of Georgia. In the “state” variable, Georgia is coded as 1.

state <- 1:50
year <- 1970:2019
frame <- data.frame(state, year)

dt <- data.table(frame, key=c("state", "year"))

#make panel data
comb <- CJ(1:50, 1970:2019) 
ans <- dt[comb]

#put values for marijuana, cigarette, and beer
ans$marij <- rnorm(n = 2500, mean = 10, sd = 2)
ans$ciga <- rnorm(n = 2500, mean = 119, sd = 33)
ans$beer <- rnorm(n = 2500, mean = 23, sd = 4)

#increase marijuana sales from 1990 onward in Georgia (state == 1); this covers 30 state-year rows
ans$marij[ans$year >= 1990 & ans$state == 1] <- rnorm(n = 30, mean = 20, sd = 3) 
head(ans,5)
##    state year     marij     ciga     beer
## 1:     1 1970  9.178839 176.4733 18.22219
## 2:     1 1971  8.163153 119.5286 23.22194
## 3:     1 1972 10.894528 114.7572 12.94722
## 4:     1 1973  8.371598 112.8960 26.87022
## 5:     1 1974  7.476360 150.3786 26.08536
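
As a quick sanity check on the panel construction, the sketch below rebuilds the same 50-state by 50-year grid in base R and counts the cells that receive the policy shock. The object names (`panel`, `treated`) and the seed are assumptions introduced only for this illustration; the point is that the number of replacement draws should match the number of treated state-year cells.

```r
# Hypothetical re-creation of the panel: one row per state-year pair
panel <- expand.grid(year = 1970:2019, state = 1:50)

# Treated cells: Georgia (state 1) from 1990 onward
treated   <- panel$state == 1 & panel$year >= 1990
n_rows    <- nrow(panel)   # 50 states x 50 years
n_treated <- sum(treated)  # years 1990 to 2019 for one state

set.seed(1)  # assumption: seed added only to make the sketch reproducible
panel$marij <- rnorm(n_rows, mean = 10, sd = 2)

# Draw exactly as many post-policy values as there are treated cells
panel$marij[treated] <- rnorm(n_treated, mean = 20, sd = 3)
```

Using `sum(treated)` instead of a hard-coded count keeps the assignment valid if the policy year or the set of treated states changes.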

2. Plot the data to show what pattern you are trying to estimate

Plot 2. Marijuana sales in Georgia compared with nine other states (states 2 to 10, chosen for illustration).

ans_by<-filter(ans, state<=10)

plot2 <- ggplot(ans_by, aes(x=year, y=marij, group=state, color=state)) +
  geom_line() +
  geom_vline(aes(xintercept=1990))+
  labs(
         colour = "state"
        )+
 theme(legend.position='none')

plot2

Explanation: To compare Georgia with other states, I plot nine other states as examples. The nine comparison states fluctuate randomly from year to year with no trend, while only Georgia shows a jump after the policy takes effect in 1990. This pattern motivates estimating the effect of the marijuana policy on sales in Georgia.

3 and 4. Run the regression and present the results

Method 1. Difference in Differences

I chose the difference-in-differences (DID) method. Because the data have clear pre-policy and post-policy periods, and clear policy-affected and unaffected groups, DID is an appropriate model.

Main Specification: \[y_{it} = \beta_1 + \beta_2 \, \text{treat}_i + \beta_3 \, \text{post}_t + \beta_4 \, (\text{treat}_i \times \text{post}_t) + \beta_5 \, \text{beerSales}_{it} + \beta_6 \, \text{cigaSales}_{it} + e_{it}\]

where y is marijuana sales, treat is a dummy variable equal to 1 for the policy-affected group (Georgia), post is a dummy variable equal to 1 in the post-policy period (1990 onward), and the interaction term (treat_post in the code) is the product of treat and post. I also control for beer and cigarette sales because marijuana sales may be related to beer or cigarette sales.
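
As a sketch of why \(\beta_4\) is the coefficient of interest, set the covariates aside (an assumption made only for this illustration). The specification then implies four group means:

\[
\begin{aligned}
E[y_{it} \mid \text{treat}_i = 0,\ \text{post}_t = 0] &= \beta_1 \\
E[y_{it} \mid \text{treat}_i = 1,\ \text{post}_t = 0] &= \beta_1 + \beta_2 \\
E[y_{it} \mid \text{treat}_i = 0,\ \text{post}_t = 1] &= \beta_1 + \beta_3 \\
E[y_{it} \mid \text{treat}_i = 1,\ \text{post}_t = 1] &= \beta_1 + \beta_2 + \beta_3 + \beta_4
\end{aligned}
\]

Taking the pre-to-post change for the treated group and subtracting the pre-to-post change for the control group leaves only the interaction coefficient:

\[
\big[(\beta_1 + \beta_2 + \beta_3 + \beta_4) - (\beta_1 + \beta_2)\big] - \big[(\beta_1 + \beta_3) - \beta_1\big] = \beta_4
\]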

#create the treatment and post dummy variables.
ans$treat<-ifelse(ans$state== 1, 1,0)
ans$post<-ifelse(ans$year>= 1990, 1,0)
ans$treat_post<-ans$treat*ans$post
#regression
reg <- summary(lm(marij~ treat+post+treat_post+beer+ciga ,data=ans))

reg
## 
## Call:
## lm(formula = marij ~ treat + post + treat_post + beer + ciga, 
##     data = ans)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.8639 -1.3185 -0.0046  1.3287  7.3450 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  9.5923047  0.2808965  34.149   <2e-16 ***
## treat       -0.4561177  0.4548691  -1.003    0.316    
## post         0.0024202  0.0830803   0.029    0.977    
## treat_post  11.5352875  0.5872420  19.643   <2e-16 ***
## beer         0.0134121  0.0099008   1.355    0.176    
## ciga         0.0007817  0.0011954   0.654    0.513    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.014 on 2494 degrees of freedom
## Multiple R-squared:  0.2655, Adjusted R-squared:  0.2641 
## F-statistic: 180.3 on 5 and 2494 DF,  p-value: < 2.2e-16

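The coefficient on treat_post is the DID estimator, which identifies the average treatment effect on the treated (ATT) under the parallel-trends assumption. It is statistically significant and large: marijuana sales in Georgia rose by about 11.5 units from the pre-policy to the post-policy period, relative to the change in the other states over the same period. For comparison, the simulation raised Georgia's mean from 10 to 20. The coefficients on the covariates, beer and cigarette sales, are not statistically significant.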
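
To make the interpretation concrete, the following sketch computes the difference in differences by hand from the four treat-by-post cell means and compares it with the interaction coefficient from a regression without covariates. It uses freshly simulated data that mirrors the scenario; the object names and the seed are assumptions for illustration, so the numbers will not match the output above.

```r
set.seed(42)  # assumption: seed chosen only for reproducibility of this sketch

# Simulated panel mirroring the scenario: 50 states, 1970-2019, state 1 treated from 1990
d <- expand.grid(year = 1970:2019, state = 1:50)
d$treat <- as.integer(d$state == 1)
d$post  <- as.integer(d$year >= 1990)
d$marij <- rnorm(nrow(d), mean = 10, sd = 2)
is_treated_cell <- d$treat == 1 & d$post == 1
d$marij[is_treated_cell] <- rnorm(sum(is_treated_cell), mean = 20, sd = 3)

# Mean outcome in each of the four treat x post cells
cell_means <- tapply(d$marij, list(treat = d$treat, post = d$post), mean)

# Difference in differences: (treated post - treated pre) - (control post - control pre)
did_by_hand <- (cell_means["1", "1"] - cell_means["1", "0"]) -
               (cell_means["0", "1"] - cell_means["0", "0"])

# The same quantity from the regression without covariates
did_by_lm <- unname(coef(lm(marij ~ treat * post, data = d))["treat:post"])

c(by_hand = did_by_hand, by_lm = did_by_lm)
```

Without covariates the model is saturated in treat and post, so the two numbers agree up to floating-point error. Adding beer and ciga, as in the main specification, can shift the estimate slightly.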

5. Add additional empirical implementation

Method 2. Synthetic Control

Synthetic control is a useful complementary method for supporting a causal interpretation. A synthetic control is a weighted average of the control units, with weights chosen so that it resembles the treated unit in the pre-intervention period; the weights also show how much each control unit contributes to the counterfactual (Abadie et al., 2010). The synthetic control therefore approximates the marijuana sales that would have been observed in Georgia after 1990 in the absence of the policy.
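
The idea of a weighted average of control units can be sketched in base R. The example below is not the algorithm used by tidysynth; it is a minimal illustration that finds non-negative weights summing to one by minimizing the pre-period gap in the outcome alone. The data, the object names, and the softmax reparameterization are all assumptions made for this sketch.

```r
set.seed(7)  # assumption: seed chosen only for reproducibility of this sketch

# Hypothetical pre-period outcomes: 20 years x 5 donor units
n_years  <- 20
n_donors <- 5
donors <- matrix(rnorm(n_years * n_donors, mean = 10, sd = 2), nrow = n_years)

# Hypothetical treated unit built as a known mix of donors 1 and 2 plus small noise
true_w  <- c(0.6, 0.4, 0, 0, 0)
treated <- as.vector(donors %*% true_w) + rnorm(n_years, sd = 0.1)

# Softmax maps unconstrained parameters to weights that are >= 0 and sum to 1
softmax <- function(theta) {
  z <- exp(theta - max(theta))
  z / sum(z)
}

# Pre-period mean squared gap between the treated unit and the weighted donors
loss <- function(theta) {
  mean((treated - as.vector(donors %*% softmax(theta)))^2)
}

# Start from equal weights and minimize the pre-period gap
fit <- optim(rep(0, n_donors), loss, method = "BFGS")
w   <- softmax(fit$par)

round(w, 2)
```

The fitted weights should put most of their mass on the first two donors. Full synthetic control implementations also weight the predictors themselves, which this sketch leaves out.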

library(tidysynth)
ma_out <-
  ans %>%

  # initialize the synthetic control object
  synthetic_control(outcome = marij, # outcome
                    unit = state, # unit index in the panel data
                    time = year, # time index in the panel data
                    i_unit = 1, # unit where the intervention occurred
                    i_time = 1990, # time period when the intervention occurred
                    generate_placebos = TRUE # generate placebo synthetic controls (for inference)
  ) %>%

  # Generate the aggregate predictors used to fit the weights

  # average beer and cigarette sales of each unit over 1970 - 1990
  generate_predictor(time_window = 1970:1990,
                     beer_sales = mean(beer, na.rm = TRUE),
                     ciga_sales = mean(ciga, na.rm = TRUE)) %>%

  # Lagged marijuana sales (the cigsale_* predictor names hold marij values)
  generate_predictor(time_window = 1975,
                     cigsale_1975 = marij) %>%
  generate_predictor(time_window = 1980,
                     cigsale_1980 = marij) %>%
  generate_predictor(time_window = 1985,
                     cigsale_1985 = marij) %>%
  generate_predictor(time_window = 1990,
                     cigsale_1990 = marij) %>%

  # Generate the fitted weights for the synthetic control
  generate_weights(optimization_window = 1970:1990, # time to use in the optimization task
                   margin_ipop = .02, sigf_ipop = 7, bound_ipop = 6 # optimizer options
  ) %>%

  # Generate the synthetic control
  generate_control()
ma_out %>% plot_trends()

The figure shows marijuana sales for Georgia and its synthetic control over 1970-2019. In the pre-policy period, sales in Georgia and synthetic Georgia are quite close, which suggests that synthetic Georgia is a reasonable approximation to the sales Georgia would have had in the absence of the policy. After the policy takes effect in 1990, the gap between Georgia and its synthetic counterpart becomes large.

ma_out %>% plot_differences()

The figure shows the gap in marijuana sales between Georgia and its synthetic counterpart.

ma_out %>% grab_balance_table()
## # A tibble: 6 x 4
##   variable        `1` synthetic_1 donor_sample
##   <chr>         <dbl>       <dbl>        <dbl>
## 1 beer_sales    23.2        23.6         23.3 
## 2 ciga_sales   121.        124.         120.  
## 3 cigsale_1975  13.8         9.51        10.2 
## 4 cigsale_1980   8.04       10.3         10.3 
## 5 cigsale_1985  11.3        11.5          9.94
## 6 cigsale_1990  22.5        14.6         10.2
ma_out %>% plot_weights()

ma_out %>% plot_placebos()

The figure shows the gap in marijuana sales for Georgia alongside the placebo gaps for the other states. Georgia's gap is clearly the most distinctive in the post-policy period.
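
The visual comparison with the placebo gaps can be turned into a permutation-style p-value. The sketch below uses hypothetical simulated gaps (actual minus synthetic), not the fitted `ma_out` object, and ranks units by the ratio of post-period to pre-period mean squared prediction error (MSPE). All object names and the seed are assumptions for illustration. For a fitted object, tidysynth's `grab_significance()` reports a comparable statistic.

```r
set.seed(3)  # assumption: seed chosen only for reproducibility of this sketch

years   <- 1970:2019
pre     <- years < 1990
n_units <- 50

# Hypothetical gaps (actual minus synthetic), one column per unit; unit 1 is treated
gaps <- matrix(rnorm(length(years) * n_units, mean = 0, sd = 2),
               nrow = length(years))
gaps[!pre, 1] <- gaps[!pre, 1] + 10  # simulated policy effect for the treated unit

# Ratio of post-period to pre-period mean squared prediction error for each unit
mspe_ratio <- colMeans(gaps[!pre, , drop = FALSE]^2) /
              colMeans(gaps[pre, , drop = FALSE]^2)

# Permutation-style p-value: share of units whose ratio is at least the treated unit's
p_value <- mean(mspe_ratio >= mspe_ratio[1])
p_value
```

With 50 units, the smallest attainable p-value is 1/50, which occurs when the treated unit has the largest ratio of all.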

Reference

Abadie, A., Diamond, A., & Hainmueller, J. (2010). Synthetic control methods for comparative case studies: Estimating the effect of California’s tobacco control program. Journal of the American Statistical Association, 105(490), 493-505.