For instance, from the AEA website. ### b. Get the Carpenter & Dobkin and data: Can be downloaded straight here http://masteringmetrics.com/wp-content/uploads/2015/01/AEJfigs.dta, from A&P Mastering Metrics resources webpage. (If that link is dead just ask me for the file.)
No need to bother trying to get their exact formatting here (unless you love doing that). Just make sure you show the data and the discontinuities.
hw6<-read.dta("AEJfigs.dta")
hw61<-subset(hw6,agecell<21)
hw62<-subset(hw6,agecell>=21)
F1<-ggplot() +
expand_limits(x = 23, y = 120)+
geom_point(data = hw6, aes(x = agecell, y = all),color="grey", size=1.5)+
geom_line(data = hw61, aes(x = agecell, y = allfitted),color="black", size=0.5)+
geom_line(data = hw62, aes(x = agecell, y = allfitted),color="black", size=0.5)+
geom_point(data = hw6, aes(x = agecell, y = internal),color="grey", size=1.5)+
geom_line(data = hw61, aes(x = agecell, y = internalfitted),color="black", size=0.5)+
geom_line(data = hw62, aes(x = agecell, y = internalfitted),color="black", size=0.5)+
geom_point(data = hw6, aes(x = agecell, y = external),color="grey", size=1.5)+
geom_line(data = hw61, aes(x = agecell, y = externalfitted),color="black", size=0.5)+
geom_line(data = hw62, aes(x = agecell, y = externalfitted),color="black", size=0.5)+
scale_y_continuous(limits = c(0,120), expand = c(0, 0),breaks = seq(from = 0, to = 120, by = 20))+
scale_x_continuous(limits = c(19, 23), expand = c(0, 0),breaks = seq(from = 19, to = 23, by = 0.5))+
ggtitle("Age Profile for Death Rates")+
ylab('Deaths per 100,000 person-years')+ theme_classic()+
xlab('Age')+
theme(text = element_text(face = "bold",size = 6),plot.title = element_text(face = "bold",size=12))+
theme(plot.title = element_text(hjust =0.5))
F1
Run a simple RD regression (same slope, with a jump) for “all” deaths by age (I don’t mean all the variables, just the one variable that is called “all”). Note: You are NOT trying to reproduce the paper’s tables. That is because you only have access to a collapsed dataset with <50 observations. Don’t expect to closely reproduce the results of the tables in the paper, which uses a full 1500 observations dataset. Also, you don’t have the same variables either. Analyse the results. How do you use these results to estimate the relationship between drinking and mortality?
hw6$over21 <- ifelse(hw6$agecell >=21, 1, 0)
reg1<-lm(all~agecell+over21,data=hw6)
stargazer(reg1, type = "html",
title = "RD Regression results")
| Dependent variable: | |
| all | |
| agecell | -0.975 |
| (0.632) | |
| over21 | 7.663*** |
| (1.440) | |
| Constant | 112.310*** |
| (12.668) | |
| Observations | 48 |
| R2 | 0.595 |
| Adjusted R2 | 0.577 |
| Residual Std. Error | 2.493 (df = 45) |
| F Statistic | 32.995*** (df = 2; 45) |
| Note: | p<0.1; p<0.05; p<0.01 |
Answer: The RD regression equation is as following: \[ Death_{it} = b_0+b_1Age_i +b_2Over21_t+ \epsilon_{it} \] According to the estimation results, we found out that the deaths would increase 7.66(b2) people in 100,000 people per year when they achieve the drinking age. And with one unit age increase if they didn’t change group by increasing age, the deaths would decrease 0.97(b1) people in 100,000 people per year. The intercept is b0=112.31 which indicate that the deaths would be 112 in 100,000 people peryear if all people age at zero and don’t achieve drinking age. So, b2 estimate result shows that the drinking age has a significant positive relationship with mortality. Reaching drinking age can increase people’s death rate as we have discussed in the analysis part.
Plot the all variable by age, and add the regression lines defined by your regression output.(It’s ok if the lines extend throughout the whole figure. Don’t worry too much about formatting.)
hw6$predicty<-112.3097-0.9747*hw6$agecell +hw6$over21 *7.6627;
hw61$predicty<-112.3097-0.9747*hw61$agecell +0 *7.6627;
hw62$predicty<-112.3097 -0.9747*hw62$agecell +1 *7.6627;
F2<-ggplot() +
expand_limits(x = 23, y = 120)+
geom_point(data = hw6, aes(x = agecell, y = all),color="grey", size=1.5)+
geom_line(data = hw61, aes(x = agecell, y = predicty),color="black", size=0.5)+
geom_line(data = hw62, aes(x = agecell, y = predicty),color="black", size=0.5)+
scale_y_continuous(limits = c(0,120), expand = c(0, 0),breaks = seq(from = 0, to = 120, by = 20))+
scale_x_continuous(limits = c(19, 23), expand = c(0, 0),breaks = seq(from = 19, to = 23, by = 0.5))+
ggtitle("regression output plot")+
ylab('Deaths per 100,000 person-years')+ theme_classic()+
xlab('Age')+
theme(text = element_text(face = "bold",size = 6),plot.title = element_text(face = "bold",size=12))+
theme(plot.title = element_text(hjust =0.5))
F2
Run another RD where not only the intercept, but also the slope can differ between the under- and over-21 subsamples. Analyse the results. What is your estimate of interest. Does it differ from the previous one?
reg2<-lm(all~agecell+over21+agecell*over21,data=hw6)
stargazer(reg2, type = "html",
title = "RD Regression results with different slopes")
| Dependent variable: | |
| all | |
| agecell | 0.827 |
| (0.819) | |
| over21 | 83.333*** |
| (24.357) | |
| agecell:over21 | -3.603*** |
| (1.158) | |
| Constant | 76.251*** |
| (16.396) | |
| Observations | 48 |
| R2 | 0.668 |
| Adjusted R2 | 0.645 |
| Residual Std. Error | 2.283 (df = 44) |
| F Statistic | 29.466*** (df = 3; 44) |
| Note: | p<0.1; p<0.05; p<0.01 |
Answer: The RD regression equation is as following: \[ Death_{it} = b_0+b_1Age_i +b_2Over21_t +b_3Over21_tAge_i + \epsilon_{it} \] According to the estimation results, we found out that the deaths would increase 83.33(b2) people in 100,000 people per year when they reach the drinking age. And with one unit age increase after drinking age, the deaths would decrease 3.603(b3) people in 100,000 people per year. And with one unit age increase before drinking age, the deaths would increase 0.827(b1) people in 100,000 people per year. The intercept is b0=76.251 which indicate that the deaths would be 76.251 in 100,000 people per year if all people age at zero and don’t achieve drinking age. So, b2 and b3 are estimates of interests that different with in the last question. They show that the drinking age has a significant positive relationship with mortality. Reaching drinking age can increase people’s death rate as we have discussed in the analysis part. But after reaching the drinking age, the deaths rate would decrease with the increase of age.
hw6$predicty2<- 76.251 +0.827*hw6$agecell +hw6$over21 *83.333+hw6$over21*hw6$agecell*(-3.603);
hw61$predicty2<-76.251 +0.827*hw61$agecell +0*83.333+0*hw61$agecell*(-3.603);
hw62$predicty2<-76.251 +0.827*hw62$agecell +1 *83.333+1*hw62$agecell*(-3.603);
F3<-ggplot() +
expand_limits(x = 23, y = 120)+
geom_point(data = hw6, aes(x = agecell, y = all),color="grey", size=1.5)+
geom_line(data = hw61, aes(x = agecell, y = predicty2),color="black", size=0.5)+
geom_line(data = hw62, aes(x = agecell, y = predicty2),color="black", size=0.5)+
scale_y_continuous(limits = c(0,120), expand = c(0, 0),breaks = seq(from = 0, to = 120, by = 20))+
scale_x_continuous(limits = c(19, 23), expand = c(0, 0),breaks = seq(from = 19, to = 23, by = 0.5))+
ggtitle("regression output plot")+
ylab('Deaths per 100,000 person-years')+ theme_classic()+
xlab('Age')+
theme(text = element_text(face = "bold",size = 6),plot.title = element_text(face = "bold",size=12))+
theme(plot.title = element_text(hjust =0.5))
F3
Use the rdd package.Run the RDestimate command on all~agecell, specifying your data and the cutoff option at 21.Print the output of your regression. Plot the output of your regression (You can just write plot(name). RD estimate outputs plotable objects). Discuss the results. Do they differ from what you found? Why?
reg3<-RDestimate(all~agecell,data=hw6,cutpoint=21)
summary(reg3)
Call: RDestimate(formula = all ~ agecell, data = hw6, cutpoint = 21)
Type: sharp
Estimates: Bandwidth Observations Estimate Std. Error z value Pr(>|z|) LATE 1.6561 40 9.001 1.480 6.080 1.199e-09 Half-BW 0.8281 20 9.579 1.914 5.004 5.609e-07 Double-BW 3.3123 48 7.953 1.278 6.223 4.882e-10
LATE Half-BW Double-BW *** — Signif. codes: 0 ‘’ 0.001 ’’ 0.01 ’’ 0.05 ‘.’ 0.1 ’ ’ 1
F-statistics: F Num. DoF Denom. DoF p
LATE 33.08 3 36 3.799e-10 Half-BW 29.05 3 16 2.078e-06 Double-BW 32.54 3 44 6.129e-11
plot(reg3)
Answer:
LATE estimate results is 9.001 which indicate the local average treatment effect is 9.001 that the expected value of deaths of treatment group who reach 21 drinking age is 9.001 more than control group under 21 years old.
Re-run an RDestimate command, this time adding the options kernel = “rectangular” and bw=2. Output the results. Plot them. Compare to previous output and discuss.
reg4<-RDestimate(all~agecell,data=hw6,cutpoint=21,kernel = "rectangular",bw=2)
summary(reg4)
Call: RDestimate(formula = all ~ agecell, data = hw6, cutpoint = 21, bw = 2, kernel = “rectangular”)
Type: sharp
Estimates: Bandwidth Observations Estimate Std. Error z value Pr(>|z|) LATE 2 48 7.663 1.273 6.017 1.776e-09 Half-BW 1 24 9.753 1.902 5.128 2.929e-07 Double-BW 4 48 7.663 1.273 6.017 1.776e-09
LATE Half-BW Double-BW *** — Signif. codes: 0 ‘’ 0.001 ’’ 0.01 ’’ 0.05 ‘.’ 0.1 ’ ’ 1
F-statistics: F Num. DoF Denom. DoF p
LATE 29.47 3 44 2.651e-10 Half-BW 16.82 3 20 2.167e-05 Double-BW 29.47 3 44 2.651e-10
plot(reg4)
Answer:
LATE estimate results is 7.663 which indicate the local average treatment effect is 7.663 that the expected value of deaths of treatment group who reach 21 drinking age is 7.663 more than control group under 21 years old. The estimate is lower than the previous estimate that shows a smaller difference between treatment group and control group.