Regression Discontinuity

a. Download and read the paper

In this exercise, I explore regression discontinuity designs, based the following paper: Carpenter and Dobkin (2009) The Effect of Alcohol Consumption on Mortality: Regression Discontinuity Evidence from the Minimum Drinking Age AEJ: Applied Economics 1(1)164-82

b. Get the Carpenter & Dobkin and data

c. Reproduce Figure 3 of the paper

ggplot(my_data_HW6, aes(x=agecell)) +
  geom_point(aes(y=all, shape="All"))+
  geom_point(aes(y=internal, shape="Internal"))+
  geom_point(aes(y=external, shape = "External"))+
  scale_shape_manual(values=c("All" = 18, 
                              "Internal" = 0, 
                              "External" = 17))+
  labs(title = "Age Profiles for Age Deaths")+
  xlab("Age")+
  ylab("Death Per 100.000 person-years")+
  geom_line(data = filter(my_data_HW6, over21 == 0), aes(y=allfitted, linetype = "All Fitted"))+
  geom_line(data = filter(my_data_HW6, over21 == 1), aes(y=allfitted, linetype = "All Fitted"))+
  geom_line(data = filter(my_data_HW6, over21 == 0), aes(y=internalfitted, linetype = "Internal Fitted"))+
  geom_line(data = filter(my_data_HW6, over21 == 1), aes(y=internalfitted, linetype = "Internal Fitted"))+
  geom_line(data = filter(my_data_HW6, over21 == 0), aes(y=externalfitted, linetype = "External Fitted"))+
  geom_line(data = filter(my_data_HW6, over21 == 1), aes(y=externalfitted, linetype = "External Fitted"))

d. Simplest regression-discontinuity design

Qd <-lm(all ~ agecell + over21, data = my_data_HW6)
stargazer(Qd, title="Simplest RD", align=TRUE,  type='html')
Simplest RD
Dependent variable:
all
agecell -0.975
(0.632)
over21 7.663***
(1.440)
Constant 112.310***
(12.668)
Observations 48
R2 0.595
Adjusted R2 0.577
Residual Std. Error 2.493 (df = 45)
F Statistic 32.995*** (df = 2; 45)
Note: p<0.1; p<0.05; p<0.01

Being over 21 is a significant variable in order to explain the probability of death. Once a person turns 21 the amount of deaths per 100,000 goes up by 7.63 on average ceteris paribus. Given the fact that when they turn 21 they have access to drinking this could explain in part why the probability of death goes up. However it should be noted that there might be other variables affecting this results. For example it could be that the probability of owning a car goes up, and this makes them more likely to have a car accident. For this reason it is important to analyze the cause of death to see if there is something in particular that is pushing the numbers.

e. Plot the results

ggplot(my_data_HW6, aes(x=agecell)) +
  geom_point(aes(y=allfitted))+
  labs(title = "Results Simple RD")+
  xlab("Age")+
  ylab("Death Per 100.000 person-years")+
  geom_line(data = filter(fortify(Qd), over21==0), aes(x = agecell, y = .fitted))+
  geom_line(data = filter(fortify(Qd), over21==1), aes(x = agecell, y = .fitted))

f. Allow more flexibility to your RD

Qf <-lm(all ~ agecell + over21 + agecell*over21, data = my_data_HW6)
stargazer(Qf, title="Flexible RD", align=TRUE,  type='html')
Flexible RD
Dependent variable:
all
agecell 0.827
(0.819)
over21 83.333***
(24.357)
agecell:over21 -3.603***
(1.158)
Constant 76.251***
(16.396)
Observations 48
R2 0.668
Adjusted R2 0.645
Residual Std. Error 2.283 (df = 44)
F Statistic 29.466*** (df = 3; 44)
Note: p<0.1; p<0.05; p<0.01

Now that the interaction term has been added, agents are allowed to exhibit different slopes for the periods before and after 21. The estimate of interest is beta_over21 + 21*beta_over21_and_age = 7.67. The effect now is a little bigger than before, due to the fact that slopes get apart around 21. There are 7.63 more deaths per 100,000 people on average ceteris paribus.

g. Again, plot the data

ggplot(my_data_HW6, aes(x=agecell)) +
  geom_point(aes(y=allfitted))+
  labs(title = "Results Flexible RD")+
  xlab("Age")+
  ylab("Death Per 100.000 person-years")+
  geom_line(data = filter(fortify(Qf), over21==0), aes(x = agecell, y = .fitted))+
  geom_line(data = filter(fortify(Qf), over21==1), aes(x = agecell, y = .fitted))

h. Compare to pre-packaged RDD

Qh <- RDestimate(all ~ agecell, cutpoint = 21, data = my_data_HW6)

Qh_table <- matrix(c(Qh$obs, Qh$est, Qh$se, Qh$z, Qh$p, Qh$bw), nrow = 3, ncol = 6)
Qh_table <- round(Qh_table, digits = 3)
rownames(Qh_table) <- c("LATE", "Half-Bw", "Double-Bw")
colnames(Qh_table) <- c("Observations", "Coeff.", "Std. Error", "Z value", "P-value", "Bandwidth")
Qh_table %>%
  kbl() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
Observations Coeff. Std. Error Z value P-value Bandwidth
LATE 40 9.001 1.480 6.080 0 1.656
Half-Bw 20 9.579 1.914 5.004 0 0.828
Double-Bw 48 7.953 1.278 6.223 0 3.312
plot(Qh)

The LATE coefficient now is 9.001. Similar to our first estimation. This gives us the Local Average Treatment Effect. The average of deaths augments by 9 over 100,000 ceteris paribus after someone turns 21.

i. Prepackaged linear RDD

Qi <- RDestimate(all ~ agecell, cutpoint = 21, kernel = "rectangular", bw=2, data = my_data_HW6)

Qi_table <- matrix(c(Qi$obs, Qi$est, Qi$se, Qi$z, Qi$p, Qi$bw), nrow = 3, ncol = 6)
Qi_table <- round(Qh_table, digits = 3)
rownames(Qi_table) <- c("LATE", "Half-Bw", "Double-Bw")
colnames(Qi_table) <- c("Observations", "Coeff.", "Std. Error", "Z value", "P-value", "Bandwidth")
Qi_table %>%
  kbl() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
Observations Coeff. Std. Error Z value P-value Bandwidth
LATE 40 9.001 1.480 6.080 0 1.656
Half-Bw 20 9.579 1.914 5.004 0 0.828
Double-Bw 48 7.953 1.278 6.223 0 3.312
plot(Qi)

If we allow for the interaction term LATE = 7.663. Now the effect is smaller than in part h, this is probably due to the fact that we are using a rectangular kernell which gives the same weights to all the observations (hence the linear lines in the plot). There are 7.663 per 100,000 more deaths after turning 21 on average, ceteris paribus.