Regression Discontinuity

a. Download and read the paper

In this exercise, I explore regression discontinuity designs, based the following paper: Carpenter and Dobkin (2009) The Effect of Alcohol Consumption on Mortality: Regression Discontinuity Evidence from the Minimum Drinking Age AEJ: Applied Economics 1(1)164-82

b. Get the Carpenter & Dobkin and data

c. Reproduce Figure 3 of the paper

ggplot(my_data_HW6, aes(x=agecell)) +
  geom_point(aes(y=all, shape="All"))+
  geom_point(aes(y=internal, shape="Internal"))+
  geom_point(aes(y=external, shape = "External"))+
  scale_shape_manual(values=c("All" = 18, 
                              "Internal" = 0, 
                              "External" = 17))+
  labs(title = "Age Profiles for Age Deaths")+
  xlab("Age")+
  ylab("Death Per 100.000 person-years")+
  geom_line(data = filter(my_data_HW6, over21 == 0), aes(y=allfitted, linetype = "All Fitted"))+
  geom_line(data = filter(my_data_HW6, over21 == 1), aes(y=allfitted, linetype = "All Fitted"))+
  geom_line(data = filter(my_data_HW6, over21 == 0), aes(y=internalfitted, linetype = "Internal Fitted"))+
  geom_line(data = filter(my_data_HW6, over21 == 1), aes(y=internalfitted, linetype = "Internal Fitted"))+
  geom_line(data = filter(my_data_HW6, over21 == 0), aes(y=externalfitted, linetype = "External Fitted"))+
  geom_line(data = filter(my_data_HW6, over21 == 1), aes(y=externalfitted, linetype = "External Fitted"))

d. Simplest regression-discontinuity design

Qd <-lm(all ~ agecell + over21, data = my_data_HW6)
stargazer(Qd, title="Simplest RD", align=TRUE,  type='html')

**Simplest RD**

	Dependent variable:

	all

agecell	-0.975
	(0.632)

over21	7.663^***
	(1.440)

Constant	112.310^***
	(12.668)


Observations	48
R²	0.595
Adjusted R²	0.577
Residual Std. Error	2.493 (df = 45)
F Statistic	32.995^*** (df = 2; 45)

Note:	p<0.1; p<0.05; p<0.01

Being over 21 is a significant variable in order to explain the probability of death. Once a person turns 21 the amount of deaths per 100,000 goes up by 7.63 on average ceteris paribus. Given the fact that when they turn 21 they have access to drinking this could explain in part why the probability of death goes up. However it should be noted that there might be other variables affecting this results. For example it could be that the probability of owning a car goes up, and this makes them more likely to have a car accident. For this reason it is important to analyze the cause of death to see if there is something in particular that is pushing the numbers.

e. Plot the results

ggplot(my_data_HW6, aes(x=agecell)) +
  geom_point(aes(y=allfitted))+
  labs(title = "Results Simple RD")+
  xlab("Age")+
  ylab("Death Per 100.000 person-years")+
  geom_line(data = filter(fortify(Qd), over21==0), aes(x = agecell, y = .fitted))+
  geom_line(data = filter(fortify(Qd), over21==1), aes(x = agecell, y = .fitted))

f. Allow more flexibility to your RD

Qf <-lm(all ~ agecell + over21 + agecell*over21, data = my_data_HW6)
stargazer(Qf, title="Flexible RD", align=TRUE,  type='html')

**Flexible RD**

	Dependent variable:

	all

agecell	0.827
	(0.819)

over21	83.333^***
	(24.357)

agecell:over21	-3.603^***
	(1.158)

Constant	76.251^***
	(16.396)


Observations	48
R²	0.668
Adjusted R²	0.645
Residual Std. Error	2.283 (df = 44)
F Statistic	29.466^*** (df = 3; 44)

Note:	p<0.1; p<0.05; p<0.01

Now that the interaction term has been added, agents are allowed to exhibit different slopes for the periods before and after 21. The estimate of interest is beta_over21 + 21*beta_over21_and_age = 7.67. The effect now is a little bigger than before, due to the fact that slopes get apart around 21. There are 7.63 more deaths per 100,000 people on average ceteris paribus.

g. Again, plot the data

ggplot(my_data_HW6, aes(x=agecell)) +
  geom_point(aes(y=allfitted))+
  labs(title = "Results Flexible RD")+
  xlab("Age")+
  ylab("Death Per 100.000 person-years")+
  geom_line(data = filter(fortify(Qf), over21==0), aes(x = agecell, y = .fitted))+
  geom_line(data = filter(fortify(Qf), over21==1), aes(x = agecell, y = .fitted))

h. Compare to pre-packaged RDD

Qh <- RDestimate(all ~ agecell, cutpoint = 21, data = my_data_HW6)

Qh_table <- matrix(c(Qh$obs, Qh$est, Qh$se, Qh$z, Qh$p, Qh$bw), nrow = 3, ncol = 6)
Qh_table <- round(Qh_table, digits = 3)
rownames(Qh_table) <- c("LATE", "Half-Bw", "Double-Bw")
colnames(Qh_table) <- c("Observations", "Coeff.", "Std. Error", "Z value", "P-value", "Bandwidth")
Qh_table %>%
  kbl() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))

	Observations	Coeff.	Std. Error	Z value	Bandwidth
LATE	40	9.001	1.480	6.080	1.656
Half-Bw	20	9.579	1.914	5.004	0.828
Double-Bw	48	7.953	1.278	6.223	3.312

plot(Qh)

The LATE coefficient now is 9.001. Similar to our first estimation. This gives us the Local Average Treatment Effect. The average of deaths augments by 9 over 100,000 ceteris paribus after someone turns 21.

i. Prepackaged linear RDD

Qi <- RDestimate(all ~ agecell, cutpoint = 21, kernel = "rectangular", bw=2, data = my_data_HW6)

Qi_table <- matrix(c(Qi$obs, Qi$est, Qi$se, Qi$z, Qi$p, Qi$bw), nrow = 3, ncol = 6)
Qi_table <- round(Qh_table, digits = 3)
rownames(Qi_table) <- c("LATE", "Half-Bw", "Double-Bw")
colnames(Qi_table) <- c("Observations", "Coeff.", "Std. Error", "Z value", "P-value", "Bandwidth")
Qi_table %>%
  kbl() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))

	Observations	Coeff.	Std. Error	Z value	Bandwidth
LATE	40	9.001	1.480	6.080	1.656
Half-Bw	20	9.579	1.914	5.004	0.828
Double-Bw	48	7.953	1.278	6.223	3.312

plot(Qi)

If we allow for the interaction term LATE = 7.663. Now the effect is smaller than in part h, this is probably due to the fact that we are using a rectangular kernell which gives the same weights to all the observations (hence the linear lines in the plot). There are 7.663 per 100,000 more deaths after turning 21 on average, ceteris paribus.

Regression Discontinuity

Aguiar28D

2/18/2021

Regression Discontinuity

a. Download and read the paper

b. Get the Carpenter & Dobkin and data

c. Reproduce Figure 3 of the paper

d. Simplest regression-discontinuity design

e. Plot the results

f. Allow more flexibility to your RD

g. Again, plot the data

h. Compare to pre-packaged RDD

i. Prepackaged linear RDD