a. Download and read the paper:

For instance, from the AEA website. ### b. Get the Carpenter & Dobkin and data: Can be downloaded straight here http://masteringmetrics.com/wp-content/uploads/2015/01/AEJfigs.dta, from A&P Mastering Metrics resources webpage. (If that link is dead just ask me for the file.)

c. Reproduce Figure 3 of the paper

No need to bother trying to get their exact formatting here (unless you love doing that). Just make sure you show the data and the discontinuities.

hw6<-read.dta("AEJfigs.dta")
hw61<-subset(hw6,agecell<21)
hw62<-subset(hw6,agecell>=21)
F1<-ggplot() + 
   expand_limits(x = 23, y = 120)+
   geom_point(data = hw6, aes(x = agecell, y = all),color="grey", size=1.5)+ 
  geom_line(data = hw61, aes(x = agecell, y = allfitted),color="black", size=0.5)+ 
  geom_line(data = hw62, aes(x = agecell, y = allfitted),color="black", size=0.5)+ 
  geom_point(data = hw6, aes(x = agecell, y = internal),color="grey", size=1.5)+ 
  geom_line(data = hw61, aes(x = agecell, y = internalfitted),color="black", size=0.5)+ 
    geom_line(data = hw62, aes(x = agecell, y = internalfitted),color="black", size=0.5)+ 
  geom_point(data = hw6, aes(x = agecell, y = external),color="grey", size=1.5)+ 
  geom_line(data = hw61, aes(x = agecell, y = externalfitted),color="black", size=0.5)+ 
   geom_line(data = hw62, aes(x = agecell, y = externalfitted),color="black", size=0.5)+ 
  scale_y_continuous(limits = c(0,120), expand = c(0, 0),breaks = seq(from = 0, to = 120, by = 20))+
   scale_x_continuous(limits = c(19, 23), expand = c(0, 0),breaks = seq(from = 19, to = 23, by = 0.5))+
   ggtitle("Age Profile for Death Rates")+
   ylab('Deaths per 100,000 person-years')+ theme_classic()+
   xlab('Age')+
   theme(text = element_text(face = "bold",size = 6),plot.title = element_text(face = "bold",size=12))+
  theme(plot.title = element_text(hjust =0.5))
F1

d. Simplest regression-discontinuity design

Run a simple RD regression (same slope, with a jump) for “all” deaths by age (I don’t mean all the variables, just the one variable that is called “all”). Note: You are NOT trying to reproduce the paper’s tables. That is because you only have access to a collapsed dataset with <50 observations. Don’t expect to closely reproduce the results of the tables in the paper, which uses a full 1500 observations dataset. Also, you don’t have the same variables either. Analyse the results. How do you use these results to estimate the relationship between drinking and mortality?

hw6$over21 <- ifelse(hw6$agecell >=21, 1, 0)
reg1<-lm(all~agecell+over21,data=hw6)
stargazer(reg1, type = "html",  
          title = "RD Regression results")
RD Regression results
Dependent variable:
all
agecell -0.975
(0.632)
over21 7.663***
(1.440)
Constant 112.310***
(12.668)
Observations 48
R2 0.595
Adjusted R2 0.577
Residual Std. Error 2.493 (df = 45)
F Statistic 32.995*** (df = 2; 45)
Note: p<0.1; p<0.05; p<0.01

Answer: The RD regression equation is as following: \[ Death_{it} = b_0+b_1Age_i +b_2Over21_t+ \epsilon_{it} \] According to the estimation results, we found out that the deaths would increase 7.66(b2) people in 100,000 people per year when they achieve the drinking age. And with one unit age increase if they didn’t change group by increasing age, the deaths would decrease 0.97(b1) people in 100,000 people per year. The intercept is b0=112.31 which indicate that the deaths would be 112 in 100,000 people peryear if all people age at zero and don’t achieve drinking age. So, b2 estimate result shows that the drinking age has a significant positive relationship with mortality. Reaching drinking age can increase people’s death rate as we have discussed in the analysis part.

e. Plot the results.

Plot the all variable by age, and add the regression lines defined by your regression output.(It’s ok if the lines extend throughout the whole figure. Don’t worry too much about formatting.)

hw6$predicty<-112.3097-0.9747*hw6$agecell +hw6$over21 *7.6627; 
hw61$predicty<-112.3097-0.9747*hw61$agecell +0 *7.6627;
hw62$predicty<-112.3097 -0.9747*hw62$agecell +1 *7.6627;
F2<-ggplot() + 
   expand_limits(x = 23, y = 120)+
   geom_point(data = hw6, aes(x = agecell, y = all),color="grey", size=1.5)+ 
  geom_line(data = hw61, aes(x = agecell, y = predicty),color="black", size=0.5)+ 
  geom_line(data = hw62, aes(x = agecell, y = predicty),color="black", size=0.5)+ 
  scale_y_continuous(limits = c(0,120), expand = c(0, 0),breaks = seq(from = 0, to = 120, by = 20))+
   scale_x_continuous(limits = c(19, 23), expand = c(0, 0),breaks = seq(from = 19, to = 23, by = 0.5))+
   ggtitle("regression output plot")+
   ylab('Deaths per 100,000 person-years')+ theme_classic()+
   xlab('Age')+
   theme(text = element_text(face = "bold",size = 6),plot.title = element_text(face = "bold",size=12))+
  theme(plot.title = element_text(hjust =0.5))
F2

f. Allow more flexibility to your RD

Run another RD where not only the intercept, but also the slope can differ between the under- and over-21 subsamples. Analyse the results. What is your estimate of interest. Does it differ from the previous one?

reg2<-lm(all~agecell+over21+agecell*over21,data=hw6)
stargazer(reg2, type = "html",  
          title = "RD Regression results with different slopes")
RD Regression results with different slopes
Dependent variable:
all
agecell 0.827
(0.819)
over21 83.333***
(24.357)
agecell:over21 -3.603***
(1.158)
Constant 76.251***
(16.396)
Observations 48
R2 0.668
Adjusted R2 0.645
Residual Std. Error 2.283 (df = 44)
F Statistic 29.466*** (df = 3; 44)
Note: p<0.1; p<0.05; p<0.01

Answer: The RD regression equation is as following: \[ Death_{it} = b_0+b_1Age_i +b_2Over21_t +b_3Over21_tAge_i + \epsilon_{it} \] According to the estimation results, we found out that the deaths would increase 83.33(b2) people in 100,000 people per year when they reach the drinking age. And with one unit age increase after drinking age, the deaths would decrease 3.603(b3) people in 100,000 people per year. And with one unit age increase before drinking age, the deaths would increase 0.827(b1) people in 100,000 people per year. The intercept is b0=76.251 which indicate that the deaths would be 76.251 in 100,000 people per year if all people age at zero and don’t achieve drinking age. So, b2 and b3 are estimates of interests that different with in the last question. They show that the drinking age has a significant positive relationship with mortality. Reaching drinking age can increase people’s death rate as we have discussed in the analysis part. But after reaching the drinking age, the deaths rate would decrease with the increase of age.

g. Again, plot the data again, plot the data and the regression lines defined by the regression.

hw6$predicty2<- 76.251  +0.827*hw6$agecell +hw6$over21 *83.333+hw6$over21*hw6$agecell*(-3.603); 
hw61$predicty2<-76.251  +0.827*hw61$agecell +0*83.333+0*hw61$agecell*(-3.603); 
hw62$predicty2<-76.251  +0.827*hw62$agecell +1 *83.333+1*hw62$agecell*(-3.603); 
F3<-ggplot() + 
   expand_limits(x = 23, y = 120)+
   geom_point(data = hw6, aes(x = agecell, y = all),color="grey", size=1.5)+ 
  geom_line(data = hw61, aes(x = agecell, y = predicty2),color="black", size=0.5)+ 
  geom_line(data = hw62, aes(x = agecell, y = predicty2),color="black", size=0.5)+ 
  scale_y_continuous(limits = c(0,120), expand = c(0, 0),breaks = seq(from = 0, to = 120, by = 20))+
   scale_x_continuous(limits = c(19, 23), expand = c(0, 0),breaks = seq(from = 19, to = 23, by = 0.5))+
   ggtitle("regression output plot")+
   ylab('Deaths per 100,000 person-years')+ theme_classic()+
   xlab('Age')+
   theme(text = element_text(face = "bold",size = 6),plot.title = element_text(face = "bold",size=12))+
  theme(plot.title = element_text(hjust =0.5))
F3

h. Compare to pre-packaged RDD:

Use the rdd package.Run the RDestimate command on all~agecell, specifying your data and the cutoff option at 21.Print the output of your regression. Plot the output of your regression (You can just write plot(name). RD estimate outputs plotable objects). Discuss the results. Do they differ from what you found? Why?

reg3<-RDestimate(all~agecell,data=hw6,cutpoint=21)
summary(reg3)

Call: RDestimate(formula = all ~ agecell, data = hw6, cutpoint = 21)

Type: sharp

Estimates: Bandwidth Observations Estimate Std. Error z value Pr(>|z|) LATE 1.6561 40 9.001 1.480 6.080 1.199e-09 Half-BW 0.8281 20 9.579 1.914 5.004 5.609e-07 Double-BW 3.3123 48 7.953 1.278 6.223 4.882e-10

LATE Half-BW Double-BW *** — Signif. codes: 0 ‘’ 0.001 ’’ 0.01 ’’ 0.05 ‘.’ 0.1 ’ ’ 1

F-statistics: F Num. DoF Denom. DoF p
LATE 33.08 3 36 3.799e-10 Half-BW 29.05 3 16 2.078e-06 Double-BW 32.54 3 44 6.129e-11

plot(reg3)

Answer:

LATE estimate results is 9.001 which indicate the local average treatment effect is 9.001 that the expected value of deaths of treatment group who reach 21 drinking age is 9.001 more than control group under 21 years old.

i. Prepackaged linear RDD:

Re-run an RDestimate command, this time adding the options kernel = “rectangular” and bw=2. Output the results. Plot them. Compare to previous output and discuss.

reg4<-RDestimate(all~agecell,data=hw6,cutpoint=21,kernel = "rectangular",bw=2)
summary(reg4)

Call: RDestimate(formula = all ~ agecell, data = hw6, cutpoint = 21, bw = 2, kernel = “rectangular”)

Type: sharp

Estimates: Bandwidth Observations Estimate Std. Error z value Pr(>|z|) LATE 2 48 7.663 1.273 6.017 1.776e-09 Half-BW 1 24 9.753 1.902 5.128 2.929e-07 Double-BW 4 48 7.663 1.273 6.017 1.776e-09

LATE Half-BW Double-BW *** — Signif. codes: 0 ‘’ 0.001 ’’ 0.01 ’’ 0.05 ‘.’ 0.1 ’ ’ 1

F-statistics: F Num. DoF Denom. DoF p
LATE 29.47 3 44 2.651e-10 Half-BW 16.82 3 20 2.167e-05 Double-BW 29.47 3 44 2.651e-10

plot(reg4)

Answer:

LATE estimate results is 7.663 which indicate the local average treatment effect is 7.663 that the expected value of deaths of treatment group who reach 21 drinking age is 7.663 more than control group under 21 years old. The estimate is lower than the previous estimate that shows a smaller difference between treatment group and control group.