In this exercise, I explore regression discontinuity designs, based the following paper: Carpenter and Dobkin (2009) The Effect of Alcohol Consumption on Mortality: Regression Discontinuity Evidence from the Minimum Drinking Age AEJ: Applied Economics 1(1)164-82
ggplot(my_data_HW6, aes(x=agecell)) +
geom_point(aes(y=all, shape="All"))+
geom_point(aes(y=internal, shape="Internal"))+
geom_point(aes(y=external, shape = "External"))+
scale_shape_manual(values=c("All" = 18,
"Internal" = 0,
"External" = 17))+
labs(title = "Age Profiles for Age Deaths")+
xlab("Age")+
ylab("Death Per 100.000 person-years")+
geom_line(data = filter(my_data_HW6, over21 == 0), aes(y=allfitted, linetype = "All Fitted"))+
geom_line(data = filter(my_data_HW6, over21 == 1), aes(y=allfitted, linetype = "All Fitted"))+
geom_line(data = filter(my_data_HW6, over21 == 0), aes(y=internalfitted, linetype = "Internal Fitted"))+
geom_line(data = filter(my_data_HW6, over21 == 1), aes(y=internalfitted, linetype = "Internal Fitted"))+
geom_line(data = filter(my_data_HW6, over21 == 0), aes(y=externalfitted, linetype = "External Fitted"))+
geom_line(data = filter(my_data_HW6, over21 == 1), aes(y=externalfitted, linetype = "External Fitted"))
Qd <-lm(all ~ agecell + over21, data = my_data_HW6)
stargazer(Qd, title="Simplest RD", align=TRUE, type='html')
| Dependent variable: | |
| all | |
| agecell | -0.975 |
| (0.632) | |
| over21 | 7.663*** |
| (1.440) | |
| Constant | 112.310*** |
| (12.668) | |
| Observations | 48 |
| R2 | 0.595 |
| Adjusted R2 | 0.577 |
| Residual Std. Error | 2.493 (df = 45) |
| F Statistic | 32.995*** (df = 2; 45) |
| Note: | p<0.1; p<0.05; p<0.01 |
Being over 21 is a significant variable in order to explain the probability of death. Once a person turns 21 the amount of deaths per 100,000 goes up by 7.63 on average ceteris paribus. Given the fact that when they turn 21 they have access to drinking this could explain in part why the probability of death goes up. However it should be noted that there might be other variables affecting this results. For example it could be that the probability of owning a car goes up, and this makes them more likely to have a car accident. For this reason it is important to analyze the cause of death to see if there is something in particular that is pushing the numbers.
ggplot(my_data_HW6, aes(x=agecell)) +
geom_point(aes(y=allfitted))+
labs(title = "Results Simple RD")+
xlab("Age")+
ylab("Death Per 100.000 person-years")+
geom_line(data = filter(fortify(Qd), over21==0), aes(x = agecell, y = .fitted))+
geom_line(data = filter(fortify(Qd), over21==1), aes(x = agecell, y = .fitted))
Qf <-lm(all ~ agecell + over21 + agecell*over21, data = my_data_HW6)
stargazer(Qf, title="Flexible RD", align=TRUE, type='html')
| Dependent variable: | |
| all | |
| agecell | 0.827 |
| (0.819) | |
| over21 | 83.333*** |
| (24.357) | |
| agecell:over21 | -3.603*** |
| (1.158) | |
| Constant | 76.251*** |
| (16.396) | |
| Observations | 48 |
| R2 | 0.668 |
| Adjusted R2 | 0.645 |
| Residual Std. Error | 2.283 (df = 44) |
| F Statistic | 29.466*** (df = 3; 44) |
| Note: | p<0.1; p<0.05; p<0.01 |
Now that the interaction term has been added, agents are allowed to exhibit different slopes for the periods before and after 21. The estimate of interest is beta_over21 + 21*beta_over21_and_age = 7.67. The effect now is a little bigger than before, due to the fact that slopes get apart around 21. There are 7.63 more deaths per 100,000 people on average ceteris paribus.
ggplot(my_data_HW6, aes(x=agecell)) +
geom_point(aes(y=allfitted))+
labs(title = "Results Flexible RD")+
xlab("Age")+
ylab("Death Per 100.000 person-years")+
geom_line(data = filter(fortify(Qf), over21==0), aes(x = agecell, y = .fitted))+
geom_line(data = filter(fortify(Qf), over21==1), aes(x = agecell, y = .fitted))
Qh <- RDestimate(all ~ agecell, cutpoint = 21, data = my_data_HW6)
Qh_table <- matrix(c(Qh$obs, Qh$est, Qh$se, Qh$z, Qh$p, Qh$bw), nrow = 3, ncol = 6)
Qh_table <- round(Qh_table, digits = 3)
rownames(Qh_table) <- c("LATE", "Half-Bw", "Double-Bw")
colnames(Qh_table) <- c("Observations", "Coeff.", "Std. Error", "Z value", "P-value", "Bandwidth")
Qh_table %>%
kbl() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
| Observations | Coeff. | Std. Error | Z value | P-value | Bandwidth | |
|---|---|---|---|---|---|---|
| LATE | 40 | 9.001 | 1.480 | 6.080 | 0 | 1.656 |
| Half-Bw | 20 | 9.579 | 1.914 | 5.004 | 0 | 0.828 |
| Double-Bw | 48 | 7.953 | 1.278 | 6.223 | 0 | 3.312 |
plot(Qh)
The LATE coefficient now is 9.001. Similar to our first estimation. This gives us the Local Average Treatment Effect. The average of deaths augments by 9 over 100,000 ceteris paribus after someone turns 21.
Qi <- RDestimate(all ~ agecell, cutpoint = 21, kernel = "rectangular", bw=2, data = my_data_HW6)
Qi_table <- matrix(c(Qi$obs, Qi$est, Qi$se, Qi$z, Qi$p, Qi$bw), nrow = 3, ncol = 6)
Qi_table <- round(Qh_table, digits = 3)
rownames(Qi_table) <- c("LATE", "Half-Bw", "Double-Bw")
colnames(Qi_table) <- c("Observations", "Coeff.", "Std. Error", "Z value", "P-value", "Bandwidth")
Qi_table %>%
kbl() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
| Observations | Coeff. | Std. Error | Z value | P-value | Bandwidth | |
|---|---|---|---|---|---|---|
| LATE | 40 | 9.001 | 1.480 | 6.080 | 0 | 1.656 |
| Half-Bw | 20 | 9.579 | 1.914 | 5.004 | 0 | 0.828 |
| Double-Bw | 48 | 7.953 | 1.278 | 6.223 | 0 | 3.312 |
plot(Qi)
If we allow for the interaction term LATE = 7.663. Now the effect is smaller than in part h, this is probably due to the fact that we are using a rectangular kernell which gives the same weights to all the observations (hence the linear lines in the plot). There are 7.663 per 100,000 more deaths after turning 21 on average, ceteris paribus.