i. Example of an IV paper.

a. Full reference to the paper: authors, title of the paper, journal, year, and DOI.

# Acemoglu, Daron, Simon Johnson, and James A. Robinson. "The Colonial Origins of Comparative Development: An Empirical Investigation." American Economic Review 91.5 (2001): 1369-1401. DOI: 10.1257/aer.91.5.1369.

b. Very short description of the main research question the paper tries to answer (max 3 lines).

# How much of the difference in income per capita across countries can be explained by differences in their institutions?

c. Outcome variable, treatment variable, and main argument of why the treatment is likely endogenous in the causal relationship of interest (max 5 lines).

# The outcome variable is the log of Gross Domestic Product (GDP) per capita in 1995 in each former colony. The treatment variable is institutional quality, measured by the average protection against expropriation risk over 1985-1995. The treatment is likely endogenous because richer economies can afford, or may prefer, better institutions (reverse causality), and because unobserved factors may drive both institutions and income.

d. Instrumental variable(s).

# The instrumental variable (IV) is the (log) mortality rate of European settlers in the colony during the colonial period.

e. Draw the directed acyclic graph (DAG) corresponding with the identification strategy.

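# The endogenous relationship
# Note: the two arrows encode reverse causality, so this graph is cyclic rather than a true DAG.
library(ggdag)    # dagify(), geom_dag_*(), theme_dag()
library(ggplot2)  # ggplot(), aes()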
dagify(
  "GDP" ~ "Risk",
  "Risk" ~ "GDP") %>%
  ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_dag_point() +
  geom_dag_edges_arc() +
  geom_dag_text() +
  theme_dag()

# The exogenous relationship with IV
dagify(
  "GDP" ~ "Risk",
  "Risk" ~ "Institution",
  "Institution" ~ "Mortality") %>%
  ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_dag_point() +
  geom_dag_edges_arc() +
  geom_dag_text() +
  theme_dag()

f. Main argument of why the instrument(s) is/are relevant (max 3 lines).

# Where settler mortality was lower, Europeans were more likely to settle and to establish Western legal institutions that protect private property. These institutions persisted, so enterprises in those countries face a lower expropriation risk today.

g. Main argument of why the instrument(s) is/are valid (max 3 lines).

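# Economic outcomes in 1995 cannot have caused settler mortality more than a century earlier, so reverse causality is ruled out. The paper further argues that settler mortality affects current income only through its effect on institutions (the exclusion restriction).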

h. Main finding of the paper (max 3 lines).

# Lower settler mortality led to better institutions (lower expropriation risk), and better institutions lead to substantially better economic performance in the country today.

ii. Hausman-Nevo instrument.

Discussion.

# Evidence: DellaVigna and Gentzkow (2019) find that differences in average prices between chains are broadly consistent with the optimal benchmark, which suggests that such an instrument may satisfy the exclusion restriction.

iii. ITT and LATE.

# Load data (the JC Job Corps data set ships with the causalweight package)
library(causalweight)
data(JC)
table(JC$assignment)

## 
##    0    1 
## 3663 5577

table(JC$trainy1)

## 
##    0    1 
## 2666 6574

table(JC$assignment, JC$trainy1)

##    
##        0    1
##   0 1809 1854
##   1  857 4720

prop.table(table(JC$assignment, JC$trainy1))

##    
##              0          1
##   0 0.19577922 0.20064935
##   1 0.09274892 0.51082251

as.data.frame(table(JC$assignment, JC$trainy1))

##   Var1 Var2 Freq
## 1    0    0 1809
## 2    1    0  857
## 3    0    1 1854
## 4    1    1 4720

a. Intent-To-Treat (ITT) effect.

JCx0 <- JC[which(JC$assignment == 0), ]
JCx1 <- JC[which(JC$assignment == 1), ]
mean(JCx1$earny4) - mean(JCx0$earny4)

## [1] 16.05513

m_itt <- lm(earny4 ~ assignment, data = JC)
summary(m_itt)

## 
## Call:
## lm(formula = earny4 ~ assignment, data = JC)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -213.98 -164.65  -24.02   99.25 2211.98 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  197.926      3.212  61.620  < 2e-16 ***
## assignment    16.055      4.134   3.883 0.000104 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 194.4 on 9238 degrees of freedom
## Multiple R-squared:  0.00163,    Adjusted R-squared:  0.001522 
## F-statistic: 15.08 on 1 and 9238 DF,  p-value: 0.0001038

b. Complier share.

mean(JCx1$trainy1) - mean(JCx0$trainy1)

## [1] 0.3401906

m_cs <- lm(trainy1 ~ assignment, data = JC)
summary(m_cs)

## 
## Call:
## lm(formula = trainy1 ~ assignment, data = JC)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.8463 -0.5061  0.1537  0.1537  0.4939 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.506143   0.006964   72.68   <2e-16 ***
## assignment  0.340191   0.008963   37.95   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4215 on 9238 degrees of freedom
## Multiple R-squared:  0.1349, Adjusted R-squared:  0.1348 
## F-statistic:  1440 on 1 and 9238 DF,  p-value: < 2.2e-16

c. Complier/Local Average Treatment Effect (LATE).

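# Naive comparison by actual training status (take-up is self-selected, so this is not a causal estimate)
JCz0 <- JC[which(JC$trainy1 == 0), ]
JCz1 <- JC[which(JC$trainy1 == 1), ]
mean(JCz1$earny4) - mean(JCz0$earny4)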

## [1] 14.22828

library(AER)

m_iv <- ivreg(earny4 ~ trainy1 | assignment, data = JC)
summary(m_iv)

## 
## Call:
## ivreg(formula = earny4 ~ trainy1 | assignment, data = JC)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -221.23 -165.43  -22.55  100.01 2235.87 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  174.039      8.909  19.536  < 2e-16 ***
## trainy1       47.194     12.192   3.871 0.000109 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 195 on 9238 degrees of freedom
## Multiple R-Squared: -0.004797,   Adjusted R-squared: -0.004905 
## Wald test: 14.98 on 1 and 9238 DF,  p-value: 0.0001092

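# Interpretation: The ITT estimate, 16.055, is the effect of being randomly assigned to the program on earny4, regardless of whether training was actually taken up. Because assignment raises training take-up by only about 0.340, the ITT understates the effect of training itself. The IV estimate rescales the ITT by this complier share (16.055 / 0.340 = 47.194) and is the LATE, i.e. the effect of training for compliers. Both coefficients are significant at the 0.001 level. The naive difference in means by training status (14.228) is not a causal estimate, because take-up is self-selected.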
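The link between parts a, b, and c can be checked by hand. The sketch below recomputes the complier share from the cross-tabulation counts reported earlier and divides the reported ITT by it (the Wald estimator). The numbers are copied from the output above. The variable names are illustrative, and reading the ratio as a LATE assumes a valid instrument and no defiers.

```r
# Counts copied from table(JC$assignment, JC$trainy1) reported above
n_assigned_trained <- 4720
n_assigned_total   <- 857 + 4720
n_control_trained  <- 1854
n_control_total    <- 1809 + 1854

# First stage: difference in training take-up between the two arms
complier_share <- n_assigned_trained / n_assigned_total -
  n_control_trained / n_control_total

# ITT on earny4, copied from the difference in means reported above
itt <- 16.05513

# Wald estimator: ITT rescaled by the complier share
late <- itt / complier_share

print(round(complier_share, 4))
print(round(late, 3))
```

The ratio agrees with the trainy1 coefficient from ivreg up to rounding, which is expected with one binary instrument and no covariates.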

iv. Replication and extension.

library(haven)

df <- read_dta("Card1995.dta")

a. OLS, IV, First-stage, and Reduced form estimations using proximity to college.

df$exp <- df$age76 - df$ed76 - 6
df$exp2 <- (df$exp ^ 2) / 100

library(fixest)

## 
## Attaching package: 'fixest'

## The following object is masked from 'package:scales':
## 
##     pvalue

library(ivreg)

## Warning: package 'ivreg' was built under R version 4.4.3

## Registered S3 methods overwritten by 'ivreg':
##   method              from
##   anova.ivreg         AER 
##   hatvalues.ivreg     AER 
##   model.matrix.ivreg  AER 
##   predict.ivreg       AER 
##   print.ivreg         AER 
##   print.summary.ivreg AER 
##   summary.ivreg       AER 
##   terms.ivreg         AER 
##   update.ivreg        AER 
##   vcov.ivreg          AER

## 
## Attaching package: 'ivreg'

## The following objects are masked from 'package:AER':
## 
##     ivreg, ivreg.fit

m_ols <- feols(lwage76 ~ black + reg76r + smsa76r + exp + exp2 + ed76, data = df)

## NOTE: 603 observations removed because of NA values (LHS: 603).

m_fs <- feols(ed76 ~ nearc4 + black + reg76r + smsa76r + exp + exp2, data = df)
df$ed76_fitted <- fitted(m_fs)
m_iv <- feols(lwage76 ~ black + reg76r + smsa76r + exp + exp2 + ed76_fitted, data = df)

## NOTE: 603 observations removed because of NA values (LHS: 603).

m_rf <- feols(lwage76 ~ black + reg76r + smsa76r + exp + exp2 + nearc4, data = df)

## NOTE: 603 observations removed because of NA values (LHS: 603).

tab <- etable(m_ols, m_iv, m_rf, se.below = T)
tab

##                      m_ols       m_iv       m_rf
## Dependent Var.:    lwage76    lwage76    lwage76
##                                                 
## Constant         4.734***   3.547***   5.957*** 
##                 (0.0676)   (0.9280)   (0.0364)  
## black           -0.1896*** -0.1126    -0.2639***
##                 (0.0176)   (0.0607)   (0.0185)  
## reg76r          -0.1249*** -0.0954*** -0.1435***
##                 (0.0151)   (0.0264)   (0.0163)  
## smsa76r          0.1614***  0.1278***  0.1848***
##                 (0.0156)   (0.0320)   (0.0175)  
## exp              0.0836***  0.1055***  0.0533***
##                 (0.0066)   (0.0211)   (0.0069)  
## exp2            -0.2241*** -0.1872*** -0.2187***
##                 (0.0318)   (0.0361)   (0.0340)  
## ed76             0.0740***                      
##                 (0.0035)                        
## ed76_fitted                 0.1457**            
##                            (0.0555)             
## nearc4                                 0.0446** 
##                                       (0.0170)  
## _______________ __________ __________ __________
## S.E. type              IID        IID        IID
## Observations         3,010      3,010      3,010
## R2                 0.29051    0.18706    0.18706
## Adj. R2            0.28909    0.18543    0.18543
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
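The m_iv column above is built by hand: fitted values from the first stage are plugged into a second regression. Two caveats apply. First, the first stage appears to be fitted on all rows of df, while the second stage drops the 603 rows with missing lwage76, so the two stages use different samples. Second, the standard errors of a hand-run second stage are not the 2SLS standard errors, because its residuals use the fitted values rather than the actual regressor. A minimal sketch of the second point on simulated data (all names and parameter values are hypothetical, not Card's data):

```r
set.seed(1)
n <- 2000
z <- rbinom(n, 1, 0.5)             # instrument (hypothetical)
u <- rnorm(n)                      # unobserved confounder
x <- 1.0 * z + 0.8 * u + rnorm(n)  # endogenous regressor
y <- 1 + 0.5 * x + u + rnorm(n)    # outcome, true effect of x is 0.5

# Manual two-stage procedure, both stages on the same sample
first  <- lm(x ~ z)
x_hat  <- fitted(first)
second <- lm(y ~ x_hat)
b      <- unname(coef(second))     # 2SLS point estimates

# Naive standard error reported by the second-stage regression
se_naive <- summary(second)$coefficients["x_hat", "Std. Error"]

# 2SLS standard error: residuals use the actual x, not x_hat
res     <- y - b[1] - b[2] * x
sigma2  <- sum(res^2) / (n - 2)
Xh      <- cbind(1, x_hat)
se_2sls <- sqrt(sigma2 * solve(crossprod(Xh))[2, 2])

print(c(estimate = b[2], se_naive = se_naive, se_2sls = se_2sls))
```

The point estimate is the same either way; only the standard error changes. With fixest, writing the model as feols(lwage76 ~ black + reg76r + smsa76r + exp + exp2 | ed76 ~ nearc4, data = df) should estimate both stages on the same sample and report 2SLS standard errors. That call was not run for this write-up.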

b. IV, First-stage, and Reduced form estimations using proximity to public and private colleges as instruments.

m_ols <- feols(lwage76 ~ black + reg76r + smsa76r + exp + exp2 + ed76, data = df)

## NOTE: 603 observations removed because of NA values (LHS: 603).

m_fs <- feols(ed76 ~ nearc4a + nearc4b + black + reg76r + smsa76r + exp + exp2, data = df)
df$ed76_fitted <- fitted(m_fs)
m_iv <- feols(lwage76 ~ black + reg76r + smsa76r + exp + exp2 + ed76_fitted, data = df)

## NOTE: 603 observations removed because of NA values (LHS: 603).

m_rf <- feols(lwage76 ~ black + reg76r + smsa76r + exp + exp2 + nearc4a + nearc4b, data = df)

## NOTE: 603 observations removed because of NA values (LHS: 603).

tab <- etable(m_ols, m_iv, m_rf, se.below = T)
tab

##                      m_ols       m_iv       m_rf
## Dependent Var.:    lwage76    lwage76    lwage76
##                                                 
## Constant         4.734***   3.073***   5.956*** 
##                 (0.0676)   (0.7116)   (0.0363)  
## black           -0.1896*** -0.0827    -0.2639***
##                 (0.0176)   (0.0481)   (0.0185)  
## reg76r          -0.1249*** -0.0850*** -0.1384***
##                 (0.0151)   (0.0227)   (0.0164)  
## smsa76r          0.1614***  0.1136***  0.1839***
##                 (0.0156)   (0.0268)   (0.0175)  
## exp              0.0836***  0.1155***  0.0526***
##                 (0.0066)   (0.0167)   (0.0069)  
## exp2            -0.2241*** -0.1797*** -0.2146***
##                 (0.0318)   (0.0353)   (0.0340)  
## ed76             0.0740***                      
##                 (0.0035)                        
## ed76_fitted                 0.1742***           
##                            (0.0426)             
## nearc4a                                0.0640***
##                                       (0.0180)  
## nearc4b                               -0.0001   
##                                       (0.0219)  
## _______________ __________ __________ __________
## S.E. type              IID        IID        IID
## Observations         3,010      3,010      3,010
## R2                 0.29051    0.18971    0.18987
## Adj. R2            0.28909    0.18809    0.18798
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

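# Interpretation: One additional year of education is estimated to raise wages by about 17.4 log points (coefficient 0.1742 on lwage76, roughly 19% since exp(0.1742) - 1 = 0.190), significant at the 0.001 level according to the reported standard errors. These standard errors come from a manually run second stage, so they are not the proper 2SLS standard errors.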

c. Endogeneity discussion.

df$interaction1 <- df$nearc4a * df$age76
df$interaction2 <- df$nearc4a * df$age76 ^ 2 / 100

m_ols <- feols(lwage76 ~ black + reg76r + smsa76r + ed76 + exp + exp2, data = df)

## NOTE: 603 observations removed because of NA values (LHS: 603).

m_fs_ed76 <- feols(ed76 ~ nearc4a + nearc4b + interaction1 + interaction2 + black + reg76r + smsa76r, data = df)
df$ed76_fitted <- fitted(m_fs_ed76)
m_fs_exp <- feols(exp ~ nearc4a + nearc4b + interaction1 + interaction2 + black + reg76r + smsa76r, data = df)
df$exp_fitted <- fitted(m_fs_exp)
m_fs_exp2 <- feols(exp2 ~ nearc4a + nearc4b + interaction1 + interaction2 + black + reg76r + smsa76r, data = df)
df$exp2_fitted <- fitted(m_fs_exp2)
m_iv <- feols(lwage76 ~ black + reg76r + smsa76r + ed76_fitted + exp_fitted + exp2_fitted, data = df)

## NOTE: 603 observations removed because of NA values (LHS: 603).

m_rf <- feols(lwage76 ~ black + reg76r + smsa76r + nearc4a + nearc4b + interaction1 + interaction2, data = df)

## NOTE: 603 observations removed because of NA values (LHS: 603).

tab <- etable(m_ols, m_iv, m_rf, se.below = T)
tab

##                      m_ols       m_iv       m_rf
## Dependent Var.:    lwage76    lwage76    lwage76
##                                                 
## Constant         4.734***   3.857***   6.215*** 
##                 (0.0676)   (0.3790)   (0.0178)  
## black           -0.1896*** -0.0418    -0.2413***
##                 (0.0176)   (0.0569)   (0.0180)  
## reg76r          -0.1249*** -0.0813*** -0.1371***
##                 (0.0151)   (0.0235)   (0.0160)  
## smsa76r          0.1614***  0.0777*    0.1791***
##                 (0.0156)   (0.0380)   (0.0170)  
## ed76             0.0740***                      
##                 (0.0035)                        
## exp              0.0836***                      
##                 (0.0066)                        
## exp2            -0.2241***                      
##                 (0.0318)                        
## ed76_fitted                 0.1725***           
##                            (0.0353)             
## exp_fitted                 -0.0251              
##                            (0.0343)             
## exp2_fitted                 0.3419*             
##                            (0.1719)             
## nearc4a                               -1.488    
##                                       (0.9554)  
## nearc4b                               -0.0024   
##                                       (0.0214)  
## interaction1                           0.0642   
##                                       (0.0668)  
## interaction2                          -0.0323   
##                                       (0.1155)  
## _______________ __________ __________ __________
## S.E. type              IID        IID        IID
## Observations         3,010      3,010      3,010
## R2                 0.29051    0.22267    0.22286
## Adj. R2            0.28909    0.22111    0.22105
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Interpretation: As in the previous models, the coefficient on education rises once it is instrumented. In this model, where ed76, exp, and exp2 are all instrumented, it increases from 0.0740 under OLS to 0.1725 and remains significant at the 0.001 level.

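# Endogeneity test (three-part formula: exogenous | endogenous | instruments; only ed76 is treated as endogenous here)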
m_iv <- ivreg(lwage76 ~ black + reg76r + smsa76r + exp + exp2 | ed76 | nearc4a + nearc4b + interaction1 + interaction2, data = df)
summary(m_iv, diagnostics = TRUE)

## 
## Call:
## ivreg(formula = lwage76 ~ black + reg76r + smsa76r + exp + exp2 | 
##     ed76 | nearc4a + nearc4b + interaction1 + interaction2, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.61638 -0.22444  0.02206  0.24233  1.34656 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.590107   0.106727  43.008  < 2e-16 ***
## ed76         0.082539   0.006030  13.688  < 2e-16 ***
## black       -0.181022   0.018325  -9.878  < 2e-16 ***
## reg76r      -0.121940   0.015226  -8.009 1.64e-15 ***
## smsa76r      0.157018   0.015793   9.942  < 2e-16 ***
## exp          0.087094   0.006952  12.529  < 2e-16 ***
## exp2        -0.224720   0.031817  -7.063 2.02e-12 ***
## 
## Diagnostic tests:
##                   df1  df2 statistic p-value    
## Weak instruments    4 3000   384.015  <2e-16 ***
## Wu-Hausman          1 3002     3.034  0.0817 .  
## Sargan              3   NA    10.978  0.0118 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3746 on 3003 degrees of freedom
## Multiple R-Squared: 0.2891,  Adjusted R-squared: 0.2877 
## Wald test: 161.6 on 6 and 3003 DF,  p-value: < 2.2e-16

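# Test result: The Wu-Hausman (Durbin-Wu-Hausman) test has a p-value of 0.0817, so the null hypothesis that ed76 is exogenous is rejected at the 0.1 level but not at the 0.05 level. This is only weak evidence that ed76 is endogenous. The Sargan test (p-value 0.0118) rejects the overidentifying restrictions at the 0.05 level, which casts some doubt on the validity of the instrument set.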

EEFE 530, Spring 2025

Problem Set 4

Timothy (Yuan) Zhuang

25 March 2025
