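library(ggplot2)    # plotting
library(stargazer)  # regression tables
data <- read.csv('G:/My Drive/Seans Drive/PhD/Classes/AQME/HW/AEJfigs.csv')
data <- na.omit(data)
# Treatment indicator: 1 at or above the minimum legal drinking age of 21
data$over21 <- ifelse(data$agecell >= 21, 1, 0)
# Split the sample at the cutoff so each fitted line is drawn separately on its own side
data21u <- subset(data, over21 == 0)
data21o <- subset(data, over21 == 1)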
# Each series is mapped to a legend label inside aes(); scale_color_manual() assigns
# the grey shades by name, so the legend does not depend on hex codes sorting alphabetically.
g1 <- ggplot(NULL) +
  geom_line(data = data21u, aes(x = agecell, y = internalfitted, color = "Internal"), linetype = 4) +
  geom_point(data = data21u, aes(x = agecell, y = internal, color = "Internal"), shape = 2) +
  geom_line(data = data21u, aes(x = agecell, y = externalfitted, color = "External"), linetype = 1) +
  geom_point(data = data21u, aes(x = agecell, y = external, color = "External"), shape = 1) +
  geom_line(data = data21u, aes(x = agecell, y = allfitted, color = "All"), linetype = 5) +
  geom_point(data = data21u, aes(x = agecell, y = all, color = "All"), shape = 0) +
  geom_line(data = data21o, aes(x = agecell, y = internalfitted, color = "Internal"), linetype = 4) +
  geom_point(data = data21o, aes(x = agecell, y = internal, color = "Internal"), shape = 2) +
  geom_line(data = data21o, aes(x = agecell, y = externalfitted, color = "External"), linetype = 1) +
  geom_point(data = data21o, aes(x = agecell, y = external, color = "External"), shape = 1) +
  geom_line(data = data21o, aes(x = agecell, y = allfitted, color = "All"), linetype = 5) +
  geom_point(data = data21o, aes(x = agecell, y = all, color = "All"), shape = 0) +
  scale_color_manual(values = c(All = '#4F4F4F', External = '#767676', Internal = '#9F9F9F')) +
  labs(title = "", x = "Age", y = "Deaths per 100,000 person-years", color = NULL)

g1

RD1<-lm(all~agecell+as.factor(over21),data=data)
stargazer(RD1,type='html')
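===================================================
                            Dependent variable:
                        ---------------------------
                                    all
---------------------------------------------------
agecell                           -0.975
                                  (0.632)
as.factor(over21)1               7.663***
                                  (1.440)
Constant                        112.310***
                                 (12.668)
---------------------------------------------------
Observations                        48
R2                                0.595
Adjusted R2                       0.577
Residual Std. Error          2.493 (df = 45)
F Statistic              32.995*** (df = 2; 45)
===================================================
Note:                   *p<0.1; **p<0.05; ***p<0.01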
data$fit_all<-fitted(RD1)

The simple regression shows that total mortality jumps when individuals reach the legal drinking age, which we interpret as the effect of increased alcohol consumption. Turning 21, the minimum legal drinking age in the United States, leads to 7.66 additional deaths per 100,000 person-years (significant at the 1% level).
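For reference, the RD1 specification can be written as below, with $D_a$ standing for the `over21` indicator (the notation is ours, not from the assignment). Plugging in the reported coefficients gives the fitted mortality rate just below and just above the cutoff:

$$
\begin{aligned}
\text{all}_a &= \beta_0 + \beta_1\,\text{agecell}_a + \rho\,D_a + \varepsilon_a, \qquad D_a = \mathbf{1}[\text{agecell}_a \ge 21] \\
\widehat{\text{all}}(21^-) &= 112.310 - 0.975 \times 21 = 91.835 \\
\widehat{\text{all}}(21^+) &= 91.835 + 7.663 = 99.498
\end{aligned}
$$

Because the slope $\beta_1$ is common to both sides, it cancels at the cutoff and the jump between the two fitted values is exactly $\hat\rho = 7.663$.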

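# group = over21 draws the fitted line separately on each side instead of joining it across the cutoff
g2 <- ggplot(data) + geom_point(aes(x = agecell, y = all)) + geom_line(aes(x = agecell, y = fit_all, group = over21))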
g2

data$agecell2 <- data$agecell^2  # quadratic age term
data$agecell3 <- data$agecell^3  # cubic age term
RD2 <- lm(all ~ over21 + agecell + agecell2 + agecell3, data = data)
stargazer(RD2,type='html')
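===================================================
                            Dependent variable:
                        ---------------------------
                                    all
---------------------------------------------------
over21                           8.941***
                                  (1.785)
agecell                          583.080
                                 (508.804)
agecell2                         -27.063
                                 (24.288)
agecell3                          0.417
                                  (0.385)
Constant                        -4,075.580
                                (3,544.608)
---------------------------------------------------
Observations                        48
R2                                0.666
Adjusted R2                       0.635
Residual Std. Error          2.314 (df = 43)
F Statistic              21.460*** (df = 4; 43)
===================================================
Note:                   *p<0.1; **p<0.05; ***p<0.01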
data$fit_all2<-fitted(RD2)

Adding flexibility, in the form of quadratic and cubic age terms that let the fitted curve follow the data more closely, increases the estimated effect: turning 21 leads, on average, to 8.94 additional deaths per 100,000 person-years. The larger estimate reflects either "overfitting" by the cubic model or "underfitting" by the previous linear model. Raising the polynomial degree can change the estimate substantially, which is why Gelman and Imbens (2017) argue against using high-order polynomials in regression discontinuity designs.
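A minimal sketch of how the sensitivity to polynomial degree could be checked in one place. The helper `rd_poly_jump()` is hypothetical (not part of the homework or any package). It centers age at the cutoff and fits `all ~ over21 + poly(xc, degree, raw = TRUE)`, which spans the same columns as the raw `agecell`, `agecell2`, `agecell3` terms but keeps the polynomial coefficients on a sensible scale (the raw cubic coefficients of 583 and -4,075 above reflect strongly collinear regressors). The demonstration uses simulated, noise-free data, not the AEJ data.

```r
# Hypothetical helper: jump at the cutoff from a global polynomial in age that is
# common to both sides (as in RD1 and RD2), with age centered at the cutoff.
rd_poly_jump <- function(df, degree, cutoff = 21) {
  df$xc <- df$agecell - cutoff                  # age measured relative to 21
  fit <- lm(all ~ over21 + poly(xc, degree, raw = TRUE), data = df)
  unname(coef(fit)["over21"])                   # estimated discontinuity
}

# Simulated, noise-free data (NOT the AEJ data): 48 evenly spaced monthly age
# cells, a linear age trend, and a true jump of 5 deaths at age 21.
sim <- data.frame(agecell = 19 + (1:48 - 0.5) / 12)
sim$over21 <- ifelse(sim$agecell >= 21, 1, 0)
sim$all <- 100 - 1 * sim$agecell + 5 * sim$over21

jumps <- sapply(1:3, function(p) rd_poly_jump(sim, p))
print(round(jumps, 6))  # prints [1] 5 5 5
```

Because centering only reparameterizes the age terms, calling `sapply(1:3, function(p) rd_poly_jump(data, p))` on the homework data should return the RD1 estimate (7.663) for degree 1 and the RD2 estimate (8.941) for degree 3.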

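# group = over21 draws the fitted curve separately on each side instead of joining it across the cutoff
g3 <- ggplot(data) + geom_point(aes(x = agecell, y = all)) + geom_line(aes(x = agecell, y = fit_all2, group = over21))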
g3

library(rdd)
RD3<-RDestimate(all~agecell,data=data,cutpoint=21)
stargazer(RD3$est,type='html')
LATE      Half-BW   Double-BW
9.001     9.579     7.953

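The rdd package gives a LATE of about 9.00 additional deaths per 100,000 person-years at age 21, which we attribute to the increase in alcohol consumption. The difference from the simple regression comes from how RDestimate fits the data: it runs a local linear regression on each side of the cutoff (so the slope may differ across the cutoff), restricts the sample to a bandwidth around 21 (the Imbens-Kalyanaraman bandwidth by default), and weights observations with a triangular kernel, so that cells closer to 21 count more. The LATE from the triangular kernel is close to the estimate from the third-degree polynomial regression in the “flexible” model (8.941).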
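The kernel weighting described above can be illustrated by fitting the local linear regression directly with `lm()`. This is a minimal sketch, not the rdd package's code: the helper `rd_local_linear()` and its bandwidth argument `h` are our own illustrative names. Unlike RDestimate it does not choose a bandwidth, so `h` must be supplied (for instance the bandwidth reported by the fitted object, which we believe is stored in `RD3$bw`). It is checked on simulated, noise-free data with different slopes on each side, where both kernels should recover the true jump of 5.

```r
# Hypothetical helper: sharp RD estimate from a kernel-weighted local linear
# regression with a separate slope on each side of the cutoff.
rd_local_linear <- function(df, h, kernel = c("triangular", "rectangular"),
                            cutoff = 21) {
  kernel <- match.arg(kernel)
  df$xc <- df$agecell - cutoff
  u <- abs(df$xc) / h                            # distance from the cutoff in bandwidths
  # Triangular: weight falls linearly to 0 at the bandwidth edge; rectangular: equal weights
  df$w <- if (kernel == "triangular") pmax(0, 1 - u) else as.numeric(u <= 1)
  sub <- df[df$w > 0, ]                          # keep only cells inside the bandwidth
  # over21 * xc expands to over21 + xc + over21:xc, i.e. a different slope on each side
  fit <- lm(all ~ over21 * xc, data = sub, weights = w)
  unname(coef(fit)["over21"])                    # jump in the intercept at the cutoff
}

# Simulated, noise-free data (NOT the AEJ data): slope -1 below 21, +1 above,
# and a true jump of 5 at age 21.
sim <- data.frame(agecell = 19 + (1:48 - 0.5) / 12)
sim$over21 <- ifelse(sim$agecell >= 21, 1, 0)
sim$all <- 100 - 1 * (sim$agecell - 21) + 5 * sim$over21 +
  2 * (sim$agecell - 21) * sim$over21

tri  <- rd_local_linear(sim, h = 1.5, kernel = "triangular")
rect <- rd_local_linear(sim, h = 1.5, kernel = "rectangular")
```

On real, noisy data the two kernels generally give different estimates, as the 9.001 versus 7.663 LATEs in this homework show; results from this sketch should be close to RDestimate's output but are not guaranteed to match it exactly.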

plot(RD3)

RD4<-RDestimate(all~agecell,data=data,cutpoint=21,kernel='rectangular')
stargazer(RD4$est,type='html')
LATE      Half-BW   Double-BW
7.663     9.094     7.663

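With a rectangular (uniform) kernel, every observation inside the bandwidth receives equal weight, so each side's local linear fit is simply OLS on the cells within the bandwidth. The resulting LATE (7.663) is identical to the estimate from the simple RD regression above. Because the LATE and double-bandwidth estimates coincide, the selected bandwidth appears to already include the whole sample; if the age cells are also spaced symmetrically around 21, fitting separate slopes on each side gives exactly the same jump as the common-slope regression.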
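Why the two estimates can match exactly: with equal weights, each side is fit by OLS with its own slope, and when the same number of cells sit at mirror-image distances from 21, the pooled slope in the common-slope model equals the average of the two side slopes, so both models imply the same intercept jump at 21. The sketch below checks this on simulated data with noise and unequal slopes; whether the homework's age cells are exactly symmetric after `na.omit()` is an assumption we have not verified here.

```r
# Illustration on simulated data (NOT the AEJ data): evenly spaced cells placed
# symmetrically around 21, all inside the bandwidth, so a rectangular kernel is
# just unweighted OLS on the full sample.
set.seed(1)
sim <- data.frame(agecell = 19 + (1:48 - 0.5) / 12)   # 24 cells on each side of 21
sim$over21 <- ifelse(sim$agecell >= 21, 1, 0)
sim$xc <- sim$agecell - 21
sim$all <- 95 - 0.5 * sim$xc + 7 * sim$over21 + 1.5 * sim$xc * sim$over21 +
  rnorm(48, sd = 2.5)

# Common slope on both sides, as in RD1
common <- unname(coef(lm(all ~ agecell + over21, data = sim))["over21"])
# Separate slopes on each side with equal weights, as a rectangular kernel would give
separate <- unname(coef(lm(all ~ over21 * xc, data = sim))["over21"])

print(c(common = common, separate = separate))
```

Dropping a single cell from one side breaks the symmetry, and the two estimates then generally differ.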