HW10
a.Donwload and read the paper:
b.Get the Carpenter & Dobkin and data:
c.Reproduce Figure 3 of the paper
attach(HW10)
plot(agecell,all,pch=18,col="red",ylim=c(0,120),
main="Age Profile for Death Rates",
xlab="Age",ylab="Deaths per 100,000 person-years",xaxt="n")
points(agecell,external,pch=17,col="blue")
points(agecell,internal,pch=22,col="orange")
lines(agecell,allfitted,lty=2,col="black")
lines(agecell,externalfitted,lty=1,col="black")
lines(agecell,internalfitted,lty=3,col="black")
axis(1,at=seq(19,23,0.5))
axis(2,at=seq(0,120,20))d.Simplest regression-discountinuity design
The coefficient of the over age 21 indicator implies the change in all causes of death at age 21. The estimate shows evidence of a discrete increase of 7.66 and it is statistically significant. This result provides evidence of a positive causal relationship between alcohol consumption and deaths.
Regression in level
##
## Call:
## lm(formula = all ~ agecell + over21, data = HW10)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.0559 -1.8483 0.1149 1.4909 5.8043
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 112.3097 12.6681 8.866 1.96e-11 ***
## agecell -0.9747 0.6325 -1.541 0.13
## over21 7.6627 1.4403 5.320 3.15e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.493 on 45 degrees of freedom
## Multiple R-squared: 0.5946, Adjusted R-squared: 0.5765
## F-statistic: 32.99 on 2 and 45 DF, p-value: 1.508e-09
Regression in percentage
##
## Call:
## lm(formula = lnall ~ agecell + over21, data = HW10)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.054506 -0.018848 0.001014 0.016217 0.058464
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.717290 0.132047 35.724 < 2e-16 ***
## agecell -0.009350 0.006592 -1.418 0.163
## over21 0.078428 0.015013 5.224 4.35e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.02599 on 45 degrees of freedom
## Multiple R-squared: 0.5944, Adjusted R-squared: 0.5764
## F-statistic: 32.97 on 2 and 45 DF, p-value: 1.52e-09
e.Plot the results
attach(HW10)
plot(agecell,all,pch=18,col="red",ylim=c(0,120),
main="All Causes of Death",
xlab="Age",ylab="Deaths per 100,000 person-years",xaxt="n")
abline(lm(all~agecell+over21))
axis(1,at=seq(19,23,0.5))f.Allow more flexibility to your RD
The results provide the evidence that model is not well fitted in the linear regression under age 21, while it can be better fitted in the linear regression. The statistical significance implies that the deaths has a negative causal relationship with age when the age is over 21.
under21<-HW10[HW10$over21==0,]
under21$all.1<-under21$all
under21<-under21[,-c(2)]
over21<-HW10[HW10$over21==1,]
over21$all.2<-over21$all
over21<-over21[,-c(2)]
model.1<-lm(all.1~agecell,under21)
model.2<-lm(all.2~agecell,over21)
library(stargazer)| Dependent variable: | ||
| Deaths under Age 21 | Deaths over Age 21 | |
| (1) | (2) | |
| Age | 0.827 | -2.776*** |
| (0.857) | (0.779) | |
| Constant | 76.251*** | 159.585*** |
| (17.153) | (17.140) | |
| Observations | 24 | 24 |
| R2 | 0.041 | 0.366 |
| Adjusted R2 | -0.003 | 0.337 |
| Residual Std. Error (df = 22) | 2.388 | 2.172 |
| F Statistic (df = 1; 22) | 0.932 | 12.692*** |
| Note: | p<0.1; p<0.05; p<0.01 | |
g.Plot the data again
library(ggplot2)
predicted_model1<-data.frame(under21_pred=predict(model.1,under21),
age1=under21$agecell)
predicted_model2<-data.frame(over21_pred=predict(model.2,over21),
age2=over21$agecell)
All<-cbind(predicted_model1,predicted_model2,under21$all.1,over21$all.2)
names(All)<-c("under21_pred","age_under21",
"over21_pred","age_over21",
"under21","over21")
s<-c("under21_pred"="red","over21_pred"="blue","under21"="black","over21"="orange")
ggplot(All)+
geom_point(aes(age_under21,under21,color="under21"))+
geom_line(aes(age_under21,under21_pred,color="under21_pred"),size=1)+
geom_point(aes(age_over21,over21,color="over21"),pch=17)+
geom_line(aes(age_over21,over21_pred,color="over21_pred"),size=1)+
scale_x_continuous(breaks=seq(19,23,0.5),limits=c(19,23),expand=c(0,0))+
scale_y_continuous(breaks=seq(0,120,20),limits=c(0,120),expand=c(0,0))+
labs(x="Age", y="Deaths per 100,000 person-years",color="Legend")+
theme_classic()+
scale_color_manual(values=s)+
theme(legend.text= element_text(face="bold",size=8))+
theme(legend.background = element_rect(fill= alpha("cornsilk3",0.3)))+
theme(legend.title = element_blank())+
theme(legend.position = c(0.8, 0.2))h.Compare to pre-packaged RDD
The LATE is 9.001 and it is statistically significant. The result is very different with what we caculated in the simplest RDD regression.
##
## Call:
## RDestimate(formula = all ~ agecell, data = HW10, cutpoint = 21)
##
## Type:
## sharp
##
## Estimates:
## Bandwidth Observations Estimate Std. Error z value Pr(>|z|)
## LATE 1.6561 40 9.001 1.480 6.080 1.199e-09
## Half-BW 0.8281 20 9.579 1.914 5.004 5.609e-07
## Double-BW 3.3123 48 7.953 1.278 6.223 4.882e-10
##
## LATE ***
## Half-BW ***
## Double-BW ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## F-statistics:
## F Num. DoF Denom. DoF p
## LATE 33.08 3 36 3.799e-10
## Half-BW 29.05 3 16 2.078e-06
## Double-BW 32.54 3 44 6.129e-11
i.Prepackaged linear RDD
When adding options for kernel and bandwidth, the estimate of LATE is more closed to the estimate of the simplest RDD regression. Because the kernel is set up “triangular” in part h and now it is “rectangular”. In this part, we set up the RDD with a different way to give weights, since triangular kernel function gives a higher weight on values closed to the mean while rectangular function gives the same weight to all values.
##
## Call:
## RDestimate(formula = all ~ agecell, data = HW10, cutpoint = 21,
## bw = 2, kernel = "rectangular")
##
## Type:
## sharp
##
## Estimates:
## Bandwidth Observations Estimate Std. Error z value Pr(>|z|)
## LATE 2 48 7.663 1.273 6.017 1.776e-09
## Half-BW 1 24 9.753 1.902 5.128 2.929e-07
## Double-BW 4 48 7.663 1.273 6.017 1.776e-09
##
## LATE ***
## Half-BW ***
## Double-BW ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## F-statistics:
## F Num. DoF Denom. DoF p
## LATE 29.47 3 44 2.651e-10
## Half-BW 16.82 3 20 2.167e-05
## Double-BW 29.47 3 44 2.651e-10