i. Example of an IV paper.
a. Full reference to the paper: authors, title of the paper,
journal, year, and DOI.
# Acemoglu, Daron, Simon Johnson, and James A. Robinson. "The Colonial Origins of Comparative Development: An Empirical Investigation." American Economic Review 91.5 (2001): 1369-1401. DOI: 10.1257/aer.91.5.1369.
b. Very short description of the main research question the paper
tries to answer (max 3 lines).
# How much of the cross-country difference in income per capita is caused by differences in institutions?
c. Outcome variable, treatment variable, and main argument of why
the treatment is likely endogenous in the causal relationship of
interest (max 5 lines).
# The outcome variable is log GDP per capita (PPP) in 1995 in former European colonies. The treatment variable is the average protection against expropriation risk over 1985-1995, a proxy for institutions. It is likely endogenous: richer countries may be able to afford, or may choose, better institutions (reverse causality); unobserved factors such as geography or culture may affect both institutions and income (omitted variables); and the index measures institutions with error.
d. Instrumental variable(s).
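# The instrumental variable (IV) is the log of European settler mortality in the colonies, based on the historical death rates of soldiers, bishops, and sailors stationed there between the seventeenth and nineteenth centuries.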
e. Draw the directed acyclic graph (DAG) corresponding to the
identification strategy.
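# Packages for drawing DAGs
library(ggplot2)
library(ggdag)

# The endogenous relationship: an unobserved confounder U (e.g., geography,
# culture) affects both institutions (Risk) and income (GDP). Because a DAG
# must be acyclic, the feedback between GDP and Risk is represented by U.
dagify(
  GDP ~ Risk + U,
  Risk ~ U,
  latent = "U") %>%
  ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_dag_point() +
  geom_dag_edges_arc() +
  geom_dag_text() +
  theme_dag()

# The identification strategy with the IV: settler mortality shaped early
# institutions (Institution), which persist into current institutions (Risk).
# Mortality has no direct arrow into GDP and no link to U (exclusion restriction).
dagify(
  GDP ~ Risk + U,
  Risk ~ Institution + U,
  Institution ~ Mortality,
  latent = "U") %>%
  ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_dag_point() +
  geom_dag_edges_arc() +
  geom_dag_text() +
  theme_dag()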
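A small simulation makes the DAG logic concrete. The sketch below is a hypothetical data-generating process loosely mirroring the IV DAG above, with an unobserved confounder U affecting both Risk and GDP; every coefficient is made up for illustration. OLS of GDP on Risk absorbs part of the effect of U, while the IV ratio Cov(Mortality, GDP) / Cov(Mortality, Risk) recovers the assumed effect `beta = 1`, because Mortality is independent of U and reaches GDP only through Risk.

```r
# Hypothetical DGP (illustration only): Mortality -> Institution -> Risk -> GDP,
# with an unobserved confounder U -> Risk and U -> GDP
set.seed(1)
n <- 10000
U           <- rnorm(n)                          # unobserved confounder
Mortality   <- rnorm(n)                          # instrument, independent of U
Institution <- -0.8 * Mortality + rnorm(n)       # early institutions
Risk        <- 0.7 * Institution + U + rnorm(n)  # current institutions
beta        <- 1                                 # assumed causal effect of Risk on GDP
GDP         <- beta * Risk + 2 * U + rnorm(n)

# OLS is biased upward because U raises both Risk and GDP
b_ols <- unname(coef(lm(GDP ~ Risk))["Risk"])

# IV (Wald ratio) estimator with Mortality as the instrument
b_iv <- cov(Mortality, GDP) / cov(Mortality, Risk)

round(c(OLS = b_ols, IV = b_iv), 3)
```

In this setup the OLS estimate lands well above 1, while the IV estimate is close to 1 up to sampling noise.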

f. Main argument of why the instrument(s) is/are relevant (max 3
lines).
# Where settler mortality was low, Europeans settled in large numbers and set up institutions that protected private property and checked the power of the state; where it was high, they set up extractive institutions. Because institutions persist, lower settler mortality predicts lower expropriation risk today.
g. Main argument of why the instrument(s) is/are valid (max 3
lines).
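# Settler mortality was determined long before 1995, so current income cannot affect it (no reverse causality). The exclusion restriction requires that mortality affect current income only through institutions; the authors argue this is plausible because the diseases that killed Europeans, mainly malaria and yellow fever, had much smaller effects on the local population, who had largely developed immunity.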
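A short derivation (in our own notation, not the paper's) shows what the exclusion restriction buys. Let $Y_i$ be log GDP per capita, $D_i$ the institutions measure, and $Z_i$ settler mortality, with a linear first stage whose error $v_i$ is uncorrelated with $Z_i$ by construction:

```latex
\begin{aligned}
Y_i &= \alpha + \beta D_i + u_i, \qquad D_i = \pi_0 + \pi Z_i + v_i, \\
\hat\beta_{IV} &= \frac{\widehat{\operatorname{Cov}}(Z_i, Y_i)}{\widehat{\operatorname{Cov}}(Z_i, D_i)}
  \;\xrightarrow{p}\; \beta + \frac{\operatorname{Cov}(Z_i, u_i)}{\operatorname{Cov}(Z_i, D_i)}, \\
u_i &= \gamma Z_i + \varepsilon_i,\ \operatorname{Cov}(Z_i, \varepsilon_i) = 0
  \;\Rightarrow\; \operatorname{plim}\hat\beta_{IV}
  = \beta + \frac{\gamma \operatorname{Var}(Z_i)}{\pi \operatorname{Var}(Z_i)}
  = \beta + \frac{\gamma}{\pi}.
\end{aligned}
```

If settler mortality affected income directly, for instance through a disease environment that also burdens today's population, then $\gamma \neq 0$ and the IV estimate is off by $\gamma/\pi$; a weak first stage (small $\pi$) magnifies that bias. The timing argument rules out reverse causality but says nothing about $\gamma$.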
h. Main finding of the paper (max 3 lines).
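# Lower settler mortality led to better institutions, and institutions have a large causal effect on income per capita: the 2SLS estimates are larger than the OLS estimates, and once the effect of institutions is controlled for, countries in Africa or closer to the equator do not have lower incomes.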
iii. ITT and LATE.
# Load the Job Corps data (JC) from the causalweight package
library(causalweight)
data(JC)
table(JC$assignment)
##
## 0 1
## 3663 5577
table(JC$trainy1)
##
## 0 1
## 2666 6574
table(JC$assignment, JC$trainy1)
##
## 0 1
## 0 1809 1854
## 1 857 4720
prop.table(table(JC$assignment, JC$trainy1))
##
## 0 1
## 0 0.19577922 0.20064935
## 1 0.09274892 0.51082251
as.data.frame(table(JC$assignment, JC$trainy1))
## Var1 Var2 Freq
## 1 0 0 1809
## 2 1 0 857
## 3 0 1 1854
## 4 1 1 4720
a. Intent-To-Treat (ITT) effect.
# Split the sample by random assignment and compare mean earny4 (the ITT)
JCx0 <- JC[which(JC$assignment == 0), ]
JCx1 <- JC[which(JC$assignment == 1), ]
mean(JCx1$earny4) - mean(JCx0$earny4)
## [1] 16.05513
m_itt <- lm(earny4 ~ assignment, data = JC)
summary(m_itt)
##
## Call:
## lm(formula = earny4 ~ assignment, data = JC)
##
## Residuals:
## Min 1Q Median 3Q Max
## -213.98 -164.65 -24.02 99.25 2211.98
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 197.926 3.212 61.620 < 2e-16 ***
## assignment 16.055 4.134 3.883 0.000104 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 194.4 on 9238 degrees of freedom
## Multiple R-squared: 0.00163, Adjusted R-squared: 0.001522
## F-statistic: 15.08 on 1 and 9238 DF, p-value: 0.0001038
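The `lm()` standard error above assumes equal error variance in the two arms. A minimal base-R sketch of the difference in means with a Neyman (unequal-variance) standard error follows; the function name `diff_in_means` and the toy data are hypothetical. The point estimate is identical to the `lm()` coefficient on a binary assignment; only the standard error can differ.

```r
# Difference in means with a Neyman (unequal-variance) standard error
diff_in_means <- function(y, z) {
  y1 <- y[z == 1]
  y0 <- y[z == 0]
  est <- mean(y1) - mean(y0)
  se  <- sqrt(var(y1) / length(y1) + var(y0) / length(y0))
  c(estimate = est, se = se)
}

# Hypothetical toy data (not the Job Corps sample)
set.seed(42)
z <- rbinom(2000, 1, 0.6)
y <- 200 + 16 * z + rnorm(2000, sd = 190)

res <- diff_in_means(y, z)
ols <- summary(lm(y ~ z))$coefficients["z", ]
rbind(neyman = res, lm = ols[c("Estimate", "Std. Error")])
```

With the Job Corps data this would be called as `diff_in_means(JC$earny4, JC$assignment)`.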
b. Complier share.
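# First stage: effect of assignment on training in year 1, i.e., the complier share under monotonicity
mean(JCx1$trainy1) - mean(JCx0$trainy1)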
## [1] 0.3401906
m_cs <- lm(trainy1 ~ assignment, data = JC)
summary(m_cs)
##
## Call:
## lm(formula = trainy1 ~ assignment, data = JC)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.8463 -0.5061 0.1537 0.1537 0.4939
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.506143 0.006964 72.68 <2e-16 ***
## assignment 0.340191 0.008963 37.95 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4215 on 9238 degrees of freedom
## Multiple R-squared: 0.1349, Adjusted R-squared: 0.1348
## F-statistic: 1440 on 1 and 9238 DF, p-value: < 2.2e-16
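Under random assignment and monotonicity (no defiers, that is, nobody who would train only if not assigned), the cross-tabulation of `assignment` and `trainy1` also identifies the shares of always-takers, never-takers, and compliers. A short base-R sketch using the counts printed earlier:

```r
# Counts from table(JC$assignment, JC$trainy1)
n_z0_d0 <- 1809; n_z0_d1 <- 1854   # assignment = 0
n_z1_d0 <-  857; n_z1_d1 <- 4720   # assignment = 1

p_d1_z1 <- n_z1_d1 / (n_z1_d0 + n_z1_d1)  # P(trainy1 = 1 | assignment = 1)
p_d1_z0 <- n_z0_d1 / (n_z0_d0 + n_z0_d1)  # P(trainy1 = 1 | assignment = 0)

# Compliance types under random assignment and monotonicity
shares <- c(
  always_takers = p_d1_z0,           # train even when not assigned
  never_takers  = 1 - p_d1_z1,       # do not train even when assigned
  compliers     = p_d1_z1 - p_d1_z0  # train only when assigned
)
round(shares, 4)
```

About half of the control group received training in year 1 anyway, which is why the complier share is only about a third even though most assigned individuals trained.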
c. Complier/Local Average Treatment Effect (LATE).
# Naive comparison by actual training status (not causal: trainees self-select)
JCz0 <- JC[which(JC$trainy1 == 0), ]
JCz1 <- JC[which(JC$trainy1 == 1), ]
mean(JCz1$earny4) - mean(JCz0$earny4)
## [1] 14.22828
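library(AER)
# 2SLS: instrument trainy1 with assignment; with a single binary instrument and
# no covariates this equals the Wald estimator ITT / complier share
m_iv <- ivreg(earny4 ~ trainy1 | assignment, data = JC)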
summary(m_iv)
##
## Call:
## ivreg(formula = earny4 ~ trainy1 | assignment, data = JC)
##
## Residuals:
## Min 1Q Median 3Q Max
## -221.23 -165.43 -22.55 100.01 2235.87
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 174.039 8.909 19.536 < 2e-16 ***
## trainy1 47.194 12.192 3.871 0.000109 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 195 on 9238 degrees of freedom
## Multiple R-Squared: -0.004797, Adjusted R-squared: -0.004905
## Wald test: 14.98 on 1 and 9238 DF, p-value: 0.0001092
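With a single binary instrument and no covariates, the 2SLS coefficient equals the Wald ratio of the ITT to the complier share. A quick arithmetic check with the rounded estimates printed above:

```r
# Rounded estimates from the output above
itt <- 16.05513    # reduced form: effect of assignment on earny4
cs  <- 0.3401906   # first stage: effect of assignment on trainy1 (complier share)

# Wald estimator: LATE = ITT / complier share
late <- itt / cs
round(late, 2)
```

The ratio matches the `trainy1` coefficient from `ivreg()` up to rounding of the inputs. With the fitted models in memory, `coef(m_itt)[["assignment"]] / coef(m_cs)[["assignment"]]` gives the same ratio without rounding.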
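# Interpretation: The ITT estimate (16.06, significant at the 0.1% level) is the effect of being randomly assigned to Job Corps on earny4, regardless of actual participation. Because assignment raised the probability of training in year 1 by only 0.34, the ITT averages the effect of training over many people whose training status did not respond to assignment. The IV estimate of 47.19 (= 16.055 / 0.340) is the LATE: the effect of training on earny4 for compliers, who train if and only if assigned. The gap between the two estimates reflects imperfect compliance, not a failure of random assignment. The naive comparison by actual training status (14.23) is not causal because individuals self-select into training.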
iv. Replication and extension.
library(haven)
df <- read_dta("Card1995.dta")
a. OLS, IV, First-stage, and Reduced form estimations using
proximity to college.
# Potential experience = age - years of schooling - 6; its square is scaled by 100
df$exp <- df$age76 - df$ed76 - 6
df$exp2 <- (df$exp ^ 2) / 100
library(fixest)
library(ivreg)  # masks AER::ivreg(); supports the three-part formula y ~ exogenous | endogenous | instruments used in the endogeneity test below
m_ols <- feols(lwage76 ~ black + reg76r + smsa76r + exp + exp2 + ed76, data = df)
## NOTE: 603 observations removed because of NA values (LHS: 603).
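# First stage: ed76 on the instrument nearc4 and the exogenous controls.
# No observations are dropped here, so this stage also uses the 603 rows with missing lwage76.
m_fs <- feols(ed76 ~ nearc4 + black + reg76r + smsa76r + exp + exp2, data = df)
df$ed76_fitted <- fitted(m_fs)
# Manual second stage. The two stages use different samples, and the reported standard
# errors are not valid 2SLS standard errors (the residuals use ed76_fitted instead of ed76).
# fixest's IV syntax handles both: feols(lwage76 ~ black + reg76r + smsa76r + exp + exp2 | ed76 ~ nearc4, data = df)
m_iv <- feols(lwage76 ~ black + reg76r + smsa76r + exp + exp2 + ed76_fitted, data = df)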
## NOTE: 603 observations removed because of NA values (LHS: 603).
m_rf <- feols(lwage76 ~ black + reg76r + smsa76r + exp + exp2 + nearc4, data = df)
## NOTE: 603 observations removed because of NA values (LHS: 603).
tab <- etable(m_ols, m_iv, m_rf, se.below = TRUE)
tab
## m_ols m_iv m_rf
## Dependent Var.: lwage76 lwage76 lwage76
##
## Constant 4.734*** 3.547*** 5.957***
## (0.0676) (0.9280) (0.0364)
## black -0.1896*** -0.1126 -0.2639***
## (0.0176) (0.0607) (0.0185)
## reg76r -0.1249*** -0.0954*** -0.1435***
## (0.0151) (0.0264) (0.0163)
## smsa76r 0.1614*** 0.1278*** 0.1848***
## (0.0156) (0.0320) (0.0175)
## exp 0.0836*** 0.1055*** 0.0533***
## (0.0066) (0.0211) (0.0069)
## exp2 -0.2241*** -0.1872*** -0.2187***
## (0.0318) (0.0361) (0.0340)
## ed76 0.0740***
## (0.0035)
## ed76_fitted 0.1457**
## (0.0555)
## nearc4 0.0446**
## (0.0170)
## _______________ __________ __________ __________
## S.E. type IID IID IID
## Observations 3,010 3,010 3,010
## R2 0.29051 0.18706 0.18706
## Adj. R2 0.28909 0.18543 0.18543
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
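The manual two-step approach gives 2SLS-type point estimates but not valid standard errors. The sketch below uses simulated, hypothetical data (base R only; variable names loosely mirror Card's) to compare the second-stage `lm()` standard errors with homoskedastic 2SLS standard errors, and to check that in a just-identified model the 2SLS slope equals the reduced-form coefficient divided by the first-stage coefficient (indirect least squares).

```r
# Hypothetical simulated data (illustration only)
set.seed(123)
n <- 5000
x <- rnorm(n)                                   # exogenous control (cf. black, exp, ...)
z <- rbinom(n, 1, 0.5)                          # binary instrument (cf. nearc4)
u <- rnorm(n)                                   # unobserved ability
d <- 12 + 1.0 * z + 0.3 * x + u + rnorm(n)      # endogenous regressor (cf. ed76)
y <- 1 + 0.1 * d + 0.2 * x + 0.5 * u + rnorm(n, sd = 0.4)  # outcome (cf. lwage76)

# Stage 1 and manual stage 2
d_hat  <- fitted(lm(d ~ z + x))
stage2 <- lm(y ~ d_hat + x)
b      <- coef(stage2)
se_manual <- summary(stage2)$coefficients[, "Std. Error"]

# Valid (homoskedastic) 2SLS standard errors: same coefficients, but the
# residuals are computed with the actual regressor d instead of d_hat
X_hat   <- cbind(1, d_hat, x)
e_2sls  <- drop(y - cbind(1, d, x) %*% b)
sigma2  <- sum(e_2sls^2) / (n - ncol(X_hat))
se_2sls <- sqrt(diag(sigma2 * solve(crossprod(X_hat))))

# Indirect least squares: reduced-form slope / first-stage slope = 2SLS slope
ils <- unname(coef(lm(y ~ z + x))["z"] / coef(lm(d ~ z + x))["z"])

round(rbind(manual = se_manual, two_sls = se_2sls), 4)
```

With the real data, `feols(lwage76 ~ black + reg76r + smsa76r + exp + exp2 | ed76 ~ nearc4, data = df)` estimates both stages on the same sample and reports valid standard errors.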
c. Endogeneity discussion.
df$interaction1 <- df$nearc4a * df$age76
df$interaction2 <- df$nearc4a * df$age76 ^ 2 / 100
m_ols <- feols(lwage76 ~ black + reg76r + smsa76r + ed76 + exp + exp2, data = df)
## NOTE: 603 observations removed because of NA values (LHS: 603).
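# First stages: ed76, exp, and exp2 are all treated as endogenous and regressed
# on the instruments (nearc4a, nearc4b, interaction1, interaction2) and the controls
m_fs_ed76 <- feols(ed76 ~ nearc4a + nearc4b + interaction1 + interaction2 + black + reg76r + smsa76r, data = df)
df$ed76_fitted <- fitted(m_fs_ed76)
m_fs_exp <- feols(exp ~ nearc4a + nearc4b + interaction1 + interaction2 + black + reg76r + smsa76r, data = df)
df$exp_fitted <- fitted(m_fs_exp)
m_fs_exp2 <- feols(exp2 ~ nearc4a + nearc4b + interaction1 + interaction2 + black + reg76r + smsa76r, data = df)
df$exp2_fitted <- fitted(m_fs_exp2)
# Manual second stage: the first stages use all observations while this regression drops
# the 603 rows with missing lwage76, and the reported standard errors are not valid 2SLS
# standard errors because the residuals use the fitted regressors instead of ed76, exp, and exp2
m_iv <- feols(lwage76 ~ black + reg76r + smsa76r + ed76_fitted + exp_fitted + exp2_fitted, data = df)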
## NOTE: 603 observations removed because of NA values (LHS: 603).
m_rf <- feols(lwage76 ~ black + reg76r + smsa76r + nearc4a + nearc4b + interaction1 + interaction2, data = df)
## NOTE: 603 observations removed because of NA values (LHS: 603).
tab <- etable(m_ols, m_iv, m_rf, se.below = TRUE)
tab
## m_ols m_iv m_rf
## Dependent Var.: lwage76 lwage76 lwage76
##
## Constant 4.734*** 3.857*** 6.215***
## (0.0676) (0.3790) (0.0178)
## black -0.1896*** -0.0418 -0.2413***
## (0.0176) (0.0569) (0.0180)
## reg76r -0.1249*** -0.0813*** -0.1371***
## (0.0151) (0.0235) (0.0160)
## smsa76r 0.1614*** 0.0777* 0.1791***
## (0.0156) (0.0380) (0.0170)
## ed76 0.0740***
## (0.0035)
## exp 0.0836***
## (0.0066)
## exp2 -0.2241***
## (0.0318)
## ed76_fitted 0.1725***
## (0.0353)
## exp_fitted -0.0251
## (0.0343)
## exp2_fitted 0.3419*
## (0.1719)
## nearc4a -1.488
## (0.9554)
## nearc4b -0.0024
## (0.0214)
## interaction1 0.0642
## (0.0668)
## interaction2 -0.0323
## (0.1155)
## _______________ __________ __________ __________
## S.E. type IID IID IID
## Observations 3,010 3,010 3,010
## R2 0.29051 0.22267 0.22286
## Adj. R2 0.28909 0.22111 0.22105
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Interpretation: Instrumenting raises the estimated return to schooling relative to OLS (0.074). With nearc4 alone (part a), the ed76 coefficient is 0.146; with ed76, exp, and exp2 all instrumented, it rises further to 0.173 and remains significant at the 0.1% level, while the instrumented experience terms become imprecise (exp) or change sign (exp2).
# Endogeneity test
m_iv <- ivreg(lwage76 ~ black + reg76r + smsa76r + exp + exp2 | ed76 | nearc4a + nearc4b + interaction1 + interaction2, data = df)
summary(m_iv, diagnostics = TRUE)
##
## Call:
## ivreg(formula = lwage76 ~ black + reg76r + smsa76r + exp + exp2 |
## ed76 | nearc4a + nearc4b + interaction1 + interaction2, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.61638 -0.22444 0.02206 0.24233 1.34656
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.590107 0.106727 43.008 < 2e-16 ***
## ed76 0.082539 0.006030 13.688 < 2e-16 ***
## black -0.181022 0.018325 -9.878 < 2e-16 ***
## reg76r -0.121940 0.015226 -8.009 1.64e-15 ***
## smsa76r 0.157018 0.015793 9.942 < 2e-16 ***
## exp 0.087094 0.006952 12.529 < 2e-16 ***
## exp2 -0.224720 0.031817 -7.063 2.02e-12 ***
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 4 3000 384.015 <2e-16 ***
## Wu-Hausman 1 3002 3.034 0.0817 .
## Sargan 3 NA 10.978 0.0118 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3746 on 3003 degrees of freedom
## Multiple R-Squared: 0.2891, Adjusted R-squared: 0.2877
## Wald test: 161.6 on 6 and 3003 DF, p-value: < 2.2e-16
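# Test result: The Wu-Hausman test rejects the null hypothesis that ed76 is exogenous at the 10% level (p = 0.082) but not at the 5% level, so the evidence that ed76 is endogenous is only marginal. The weak-instrument F statistic (384) indicates strong instruments, while the Sargan test rejects the overidentifying restrictions at the 5% level (p = 0.012), which casts doubt on the joint validity of the four instruments.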
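The Wu-Hausman and Sargan diagnostics reported by `summary(..., diagnostics = TRUE)` can be illustrated by hand in regression form. The sketch below uses simulated, hypothetical data in which `d` is endogenous by construction and both instruments are valid; it shows a control-function version of the Wu-Hausman test and the n·R² form of the Sargan test. The internal implementation in `ivreg` may differ in details (it reports Wu-Hausman as an F statistic, for example), so treat this as an illustration of the logic rather than a replication of the output above.

```r
# Hypothetical simulated data: two valid instruments, one endogenous regressor
set.seed(7)
n  <- 3000
x  <- rnorm(n)                       # exogenous control
z1 <- rbinom(n, 1, 0.5)              # instrument 1 (cf. nearc4a)
z2 <- rbinom(n, 1, 0.3)              # instrument 2 (cf. nearc4b)
u  <- rnorm(n)                       # unobserved ability, makes d endogenous
d  <- 12 + 1.0 * z1 + 0.6 * z2 + 0.3 * x + u + rnorm(n)
y  <- 1 + 0.1 * d + 0.2 * x + 0.5 * u + rnorm(n, sd = 0.4)

# Wu-Hausman, control-function form: add the first-stage residual to the
# structural OLS regression; H0 (d exogenous) means its coefficient is zero
first  <- lm(d ~ z1 + z2 + x)
v_hat  <- resid(first)
cf_tab <- summary(lm(y ~ d + x + v_hat))$coefficients
p_wu_hausman <- cf_tab["v_hat", "Pr(>|t|)"]

# Sargan: regress the 2SLS residuals on all exogenous variables; under H0
# (all instruments valid), n * R^2 ~ chi-squared, df = 2 instruments - 1 endogenous
b_2sls   <- coef(lm(y ~ fitted(first) + x))
e_2sls   <- drop(y - cbind(1, d, x) %*% b_2sls)
sargan   <- n * summary(lm(e_2sls ~ z1 + z2 + x))$r.squared
p_sargan <- pchisq(sargan, df = 1, lower.tail = FALSE)

round(c(p_wu_hausman = p_wu_hausman, sargan = sargan, p_sargan = p_sargan), 4)
```

In the Card model above there are four excluded instruments and one endogenous regressor, so the Sargan statistic has 4 - 1 = 3 degrees of freedom, which matches the `df1` column of the diagnostics.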