I use the Carpenter and Dobkin (2009) The Effect of Alcohol Consumption on Mortality: Regression Discontinuity Evidence from the Minimum Drinking Age AEJ: Applied Economics 1(1)164-82
library(haven)
AEJ <- read_dta("AEJfigs.dta")
#Create a dummy variable for 21 and more
AEJ$age21 <- 0
AEJ$age21[AEJ$agecell >21] <- 1
plot(AEJ$agecell, AEJ$all, pch=16, xlim=c(19,23), ylim=c(0,120), xlab = "Age",
ylab = "Deaths per 100,000 person-years", main = "Figure 3. Age Profile for Death Rates", col="gray")
points(AEJ$agecell, AEJ$internal, pch=0)
points(AEJ$agecell, AEJ$external, pch=17)
#To show the discontinuity at age 21, I seperately draw two lines.
lines(AEJ$agecell[AEJ$agecell<21], AEJ$allfitted[AEJ$agecell<21], lty=3, lwd =1, col="gray")
lines(AEJ$agecell[AEJ$agecell>=21], AEJ$allfitted[AEJ$agecell>=21], lty=3, lwd=1, col="gray")
lines(AEJ$agecell[AEJ$agecell<21], AEJ$internalfitted[AEJ$agecell<21], lty=2, lwd=1)
lines(AEJ$agecell[AEJ$agecell>=21], AEJ$internalfitted[AEJ$agecell>=21], lty=2, lwd=1)
lines(AEJ$agecell[AEJ$agecell<21], AEJ$externalfitted[AEJ$agecell<21], lty=1, lwd=1)
lines(AEJ$agecell[AEJ$agecell>=21], AEJ$externalfitted[AEJ$agecell>=21], lty=1, lwd=1)
legend(x=20.8,y=60, ncol=2,
c("All","Internal","External","All fitted", "Internal fitted", "External fitted"),
pch=c(16,0,17,NA,NA,NA), lty=c(0,0,0,3,2,1))
model <- lm(all ~ agecell + age21, AEJ, na.action = "na.exclude")
summary(model)
##
## Call:
## lm(formula = all ~ agecell + age21, data = AEJ, na.action = "na.exclude")
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.0559 -1.8483 0.1149 1.4909 5.8043
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 112.3097 12.6681 8.866 1.96e-11 ***
## agecell -0.9747 0.6325 -1.541 0.13
## age21 7.6627 1.4403 5.320 3.15e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.493 on 45 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.5946, Adjusted R-squared: 0.5765
## F-statistic: 32.99 on 2 and 45 DF, p-value: 1.508e-09
The age21 variable is statistically significant at 99% level. It implies that people who are 21 and more have 7.773 more deaths per 100,000 person year than people who are less than 21.
In order to estimate the relationship between drinking and mortality by using age21, we need two regressions on drinking and mortality. We already have the coefficient of age21 in the mortality model. We can have the coefficient of age21 in the drinking model. Let’s say, B21.
The ratio of these two coefficients is the elasticity of drinking on mortality.
plot(AEJ$agecell, AEJ$all, pch=16, xlim=c(19,23), ylim=c(80,120), xlab= "Age", ylab="Deaths per 100,000 person-years"
, main = "RD regression")
f <- fitted(model)
lines(AEJ$agecell, f,lty=2, lwd=1, col="orange")
less21 <- subset(AEJ, age21==0)
over21 <- subset(AEJ, age21==1)
model1 <- lm(all~agecell, data = less21, na.action = "na.exclude")
model2 <- lm(all~agecell, data = over21, na.action = "na.exclude")
summary(model1)
##
## Call:
## lm(formula = all ~ agecell, data = less21, na.action = "na.exclude")
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.7972 -1.2565 -0.0694 0.9600 5.3409
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 76.2515 17.1528 4.445 0.000203 ***
## agecell 0.8270 0.8567 0.965 0.344879
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.388 on 22 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.04064, Adjusted R-squared: -0.002972
## F-statistic: 0.9318 on 1 and 22 DF, p-value: 0.3449
summary(model2)
##
## Call:
## lm(formula = all ~ agecell, data = over21, na.action = "na.exclude")
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.3683 -1.8255 0.2738 1.1077 4.1014
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 159.5847 17.1402 9.311 4.36e-09 ***
## agecell -2.7764 0.7793 -3.563 0.00174 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.172 on 22 degrees of freedom
## Multiple R-squared: 0.3658, Adjusted R-squared: 0.337
## F-statistic: 12.69 on 1 and 22 DF, p-value: 0.001742
Both the intercept and the slope from model1 and model2 are different, respectively.
age21 <- data.frame(agecell = c(21))
p1 <- predict(model1, age21)
p2 <- predict(model2, age21)
p2-p1
## 1
## 7.662709
Our estimate of interest is the predicted value at cutoff point (age 21), which can be obtained from difference between the predicted values at age 21 from model 1 and model 2.
This number is the same as one from Part d.
plot(AEJ$agecell, AEJ$all, pch=16, xlim=c(19,23), ylim=c(80,120), xlab= "Age", ylab="Deaths per 100,000 person-years")
f1 <- fitted(model1)
f2 <- fitted(model2)
lines(less21$agecell, f1,lty=3, lwd=1, col="red")
lines(over21$agecell, f2,lty=3, lwd=1, col="blue")
library(rdd)
rdd21 <- RDestimate(all~agecell, data=AEJ, cutpoint = 21)
rdd21
##
## Call:
## RDestimate(formula = all ~ agecell, data = AEJ, cutpoint = 21)
##
## Coefficients:
## LATE Half-BW Double-BW
## 9.001 9.579 7.953
plot(rdd21)
plot(AEJ$agecell, AEJ$all, pch=16, xlim=c(19,23), ylim=c(90,105), xlab= "Age", ylab="Deaths per 100,000 person-years")
f1 <- fitted(model1)
f2 <- fitted(model2)
lines(less21$agecell, f1,lty=3, lwd=1)
lines(over21$agecell, f2,lty=3, lwd=1)
We need to compare the coefficient of LATE with the coefficient of age 21. They are different (7.66 vs. 9.00). I also plot the output of RDD regression and the adjusted output of simple RD regression. You can confirm the differences.
RDD estimate uses a triangular kernel to fit local linear regression on each sie of age 21, while simple RD regression estimate does not use kernel. As a result, the RDD plot has a bigger discountinuity than the simple RD plot.
rdd21_k <- RDestimate(all~agecell, data=AEJ, cutpoint = 21, kernel = "rectangular", bw=2)
rdd21_k
##
## Call:
## RDestimate(formula = all ~ agecell, data = AEJ, cutpoint = 21,
## bw = 2, kernel = "rectangular")
##
## Coefficients:
## LATE Half-BW Double-BW
## 7.663 9.753 7.663
plot(rdd21_k)
The coefficient of LATE is 7.663, which is the same as the coefficient of age 21 in the simple RD regression. As a result, we can figure out that a rectangular kernel and bandwith of 2 have no weight like simple RD regression.