In this exercise, we will explore regression discontinuity designs:

a. Download the following paper: Carpenter and Dobkin (2009) The Effect of Alcohol Consumption on Mortality: Regression Discontinuity Evidence from the Minimum Drinking Age AEJ: Applied Economics 1(1)164-82

b. Data can be downloaded from: Click Here

setwd("~/OneDrive - University of Georgia/4th Sem PhD/Adv Econometric Applications_Filipski/Assignments/HW6")
library(haven)
CDdata <- read_dta("AEJfigs.dta")

#Dummy for age<21 and age>=21:
CDdata$age21 <- as.factor(ifelse(CDdata$agecell>=21, 1, 0))

c. Reproducing Figure 3 of the paper:

Answer:

library(ggplot2)

#lets split data to two halves (over and under 21 age) to draw fitted lines:
CDdata_over21 <- subset(CDdata, CDdata$age21 == 1)
CDdata_under21 <- subset(CDdata, CDdata$age21 == 0)

fig3 <- ggplot()+
  geom_point(data = CDdata, aes(x = agecell, y = all), shape = 1) +
  geom_line(data = CDdata_under21, aes(x=agecell, y = allfitted, linetype = "All fitted"))+
  geom_line(data = CDdata_over21, aes(x=agecell, y = allfitted, linetype = "All fitted")) +
  geom_point(data = CDdata, aes(x = agecell, y = internal), shape = 2) +
  geom_line(data = CDdata_under21, aes(x=agecell, y = internalfitted, linetype = "Internal fitted")) +
  geom_line(data = CDdata_over21, aes(x=agecell, y = internalfitted, linetype = "Internal fitted")) +
  geom_point(data = CDdata, aes(x = agecell, y = external), shape = 3) +
  geom_line(data = CDdata_under21, aes(x=agecell, y = externalfitted, linetype = "External fitted")) +
  geom_line(data = CDdata_over21, aes(x=agecell, y = externalfitted, linetype = "External fitted")) +
  scale_x_continuous(breaks=seq(19, 23, by=0.5)) +  scale_y_continuous(breaks=seq(0, 120, by= 20)) +
  xlab("Age") + 
  ylab("Deaths per 100,000 person-years") +
  ggtitle("FIGURE 3: AGE PROFILE FOR DEATH RATES") 
fig3

d. Simplest RDD:

Answer: Using the dataset, we try to model the jump observed in above graph: only for “all” deaths by age: (Note: notations adopted here are not exactly similar to the original paper)

\(y_{ai} = \alpha + \beta \times age_{ai} + \gamma \times D_{ai} + v_{ai}\)

an individual \(i\) of age \(a\) has an outcome (all deaths) represented as \(y\). Similarly, \(age_{ai}\) captures the slope, while \(D_{ai}\) is the dummy =1 if age>=21 represents the jump at 21 years of age. \(v_{ai}\) is the unobserved error component and \(\alpha\) is an intercept.

simpleRDD <- lm(all ~ agecell + age21, CDdata)
library(stargazer)
stargazer (simpleRDD,
           type="text",
           align=TRUE,
           no.space=TRUE,
           column.labels=c("All Deaths"),
           covariate.labels = c("Age", "Age Above 21"),
           title="Table: Simple RDD")
## 
## Table: Simple RDD
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                                 all            
##                             All Deaths         
## -----------------------------------------------
## Age                           -0.975           
##                               (0.632)          
## Age Above 21                 7.663***          
##                               (1.440)          
## Constant                    112.310***         
##                              (12.668)          
## -----------------------------------------------
## Observations                    48             
## R2                             0.595           
## Adjusted R2                    0.577           
## Residual Std. Error       2.493 (df = 45)      
## F Statistic           32.995*** (df = 2; 45)   
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

Result illustrates that there is the significant jump in all types of deaths at the age 21. We can use this age threshold relation to predict the alcohol consumption and see if drinking has an effect on mortality. This break helps us to estimate the effect just above and below the threshold.

e. Plotting Results:

#lets predict xb from above regression and draw the fitted line
CDdata$predict <- predict(simpleRDD, CDdata)

#lets update these datasets
CDdata_over21 <- subset(CDdata, CDdata$age21 == 1)
CDdata_under21 <- subset(CDdata, CDdata$age21 == 0)

#now we can draw plot with dots from original data and fitted line from above regression:
library(ggplot2)
plot_simpleRDD <- ggplot()+
  geom_point(data = CDdata, aes(x = agecell, y = all), shape = 1) +
  geom_line(data = CDdata_under21, aes(x=agecell, y = predict, linetype = "Fitted Line from Simple RDD"))+
  geom_line(data = CDdata_over21, aes(x=agecell, y = predict, linetype = "Fitted Line from Simple RDD")) +
  xlab("Age") + 
  ylab("PREDICTED 'ALL' DEATHS") +
  ggtitle("FIGURE: PLOTTING PREDICTED VALUE FROM SIMPLE RDD") 
plot_simpleRDD

f. Allowing more flexibility to our RD: Allowing for slopes to vary

Answer: We can add an interaction term to vary slopes of lines on either sides of threshold: \(y_{ai} = \alpha + \beta \times age_{ai} + \gamma \times D_{ai} + \delta (age_{ai} \times D_{ai}) + v_{ai}\)

an individual \(i\) of age \(a\) has an outcome (all deaths) represented as \(y\). Similarly, \(age_{ai}\) captures the slope, while \(D_{ai}\) is the dummy =1 if age>=21 represents the jump at 21 years of age. \(v_{ai}\) is the unobserved error component and \(\alpha\) is an intercept.

#create interaction
CDdata$age21 <- ifelse(CDdata$agecell>=21, 1, 0)
CDdata$interaction <- CDdata$agecell*CDdata$age21
  
flexibleRDD <- lm(all ~ agecell + age21 + interaction, CDdata)
library(stargazer)
stargazer (flexibleRDD,
           type="text",
           align=TRUE,
           no.space=TRUE,
           column.labels=c("All Deaths"),
           covariate.labels = c("Age", "Age Above 21", "Age * Age Above 21"),
           title="Table: Flexible RDD")
## 
## Table: Flexible RDD
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                                 all            
##                             All Deaths         
## -----------------------------------------------
## Age                            0.827           
##                               (0.819)          
## Age Above 21                 83.333***         
##                              (24.357)          
## Age * Age Above 21           -3.603***         
##                               (1.158)          
## Constant                     76.251***         
##                              (16.396)          
## -----------------------------------------------
## Observations                    48             
## R2                             0.668           
## Adjusted R2                    0.645           
## Residual Std. Error       2.283 (df = 44)      
## F Statistic           29.466*** (df = 3; 44)   
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

g. Again, plotting the result from flexible RDD:

Answer:

#lets predict xb from above flexible RDD regression and draw the fitted line
CDdata$predict2 <- predict(flexibleRDD, CDdata)

#lets update these datasets
CDdata_over21 <- subset(CDdata, CDdata$age21 == 1)
CDdata_under21 <- subset(CDdata, CDdata$age21 == 0)

#now we can draw plot with dots from original data and fitted line from above regression:
library(ggplot2)
plot_flexibleRDD <- ggplot()+
  geom_point(data = CDdata, aes(x = agecell, y = all), shape = 1) +
  geom_line(data = CDdata_under21, aes(x=agecell, y = predict2, linetype = "Fitted Line from Flexible RDD"))+
  geom_line(data = CDdata_over21, aes(x=agecell, y = predict2, linetype = "Fitted Line from Flexible RDD")) +
  xlab("Age") + 
  ylab("PREDICTED 'ALL' DEATHS") +
  ggtitle("FIGURE: PLOTTING PREDICTED VALUE FROM FLEXIBLE RDD") 
plot_flexibleRDD

Notice that the slope until 21 years of age is increasing. After a significant jump at 21, the slope decreases linearly.

h. Compare to pre-packaged RDD:

Answer:

library(rdd)
RDD_package <- RDestimate(all ~ agecell, data = CDdata, cutpoint = 21)   #recall 21 is the age cutoff
RDD_package
## 
## Call:
## RDestimate(formula = all ~ agecell, data = CDdata, cutpoint = 21)
## 
## Coefficients:
##      LATE    Half-BW  Double-BW  
##     9.001      9.579      7.953
plot (RDD_package)      #plotting the estimates from packaged RDD

Comparison with simple RDD: the LATE estimate using simple RDD equation (7.663) is smaller than with the packaged version (see also for half bandwidth or double bandwidth). It differs because default RDestimate have some specific bandwidth and weighting. But OLS weights all sample equally ending up giving lower estimate.

i. Pre-packaged Linear RDD with different bandwidth and weighting:

Answer: Now we use bw=2 and kernel = “rectangular” in above estimation process and see the changes (if any):

library(rdd)
RDD_package2 <- RDestimate(all ~ agecell, data = CDdata, cutpoint = 21, kernel= "rectangular", bw=2)   #recall 21 is the age cutoff
RDD_package2
## 
## Call:
## RDestimate(formula = all ~ agecell, data = CDdata, cutpoint = 21, 
##     bw = 2, kernel = "rectangular")
## 
## Coefficients:
##      LATE    Half-BW  Double-BW  
##     7.663      9.753      7.663
plot (RDD_package2)      #plotting the estimates from packaged RDD with kernel and bw specified

Comparison with earlier outputs: Everything else unchanged, changing the bw to 2 and using rectangular kernel weights resulted in exactly same LATE estimate as we obtained from OLS (i.e., simpleRDD).