i

Reference:

Huang, R., & Gale, F. (2009). Food Demand in China: Income, Quality, and Nutrient Effects. China Agricultural Economic Review, 1(4), 395–409.

Research Question:

How do income levels influence food demand, particularly concerning food quality and nutrient intake, among Chinese households?

Outcome Variable, Treatment Variable, and Endogeneity Concern:

Outcome Variable: Household food expenditure on various food categories.

Treatment Variable: Household income.

Endogeneity Concern: Household income may be endogenous due to measurement errors or unobserved factors (e.g., household preferences or regional economic conditions) that simultaneously affect income and food expenditure.

Instrumental Variable(s):

Regional average income, excluding the household’s own income, to instrument for household income.

Directed Acyclic Graph (DAG):

I will generate and display a DAG to illustrate the identification strategy.

Relevance of the Instrument:

Regional average income is correlated with individual household income due to shared economic environments, making it a relevant instrument.

Validity of the Instrument:

Assuming that regional average income affects a household’s food expenditure only through its impact on the household’s own income, it satisfies the exclusion restriction, thereby serving as a valid instrument.

Main Finding:

Higher household income leads to increased expenditure on higher-quality food items and improved nutrient intake, highlighting the role of income in dietary choices.

ii

Oh & Vukina (2022) justify the validity of the Hausman-Nevo instrument by emphasizing that price variations across markets primarily stem from correlated marginal cost shocks rather than demand-side fluctuations. They argue that since egg prices are influenced by shared supply factors like feed costs and wholesale price trends, using the price of the same product in other markets as an instrument is appropriate. To further mitigate concerns about correlated demand shocks across regions, the authors include seasonal controls (e.g., for Easter and Christmas) and retail chain-specific fixed effects to capture store-level pricing strategies.

To reinforce the exclusion restriction, the authors conduct robustness checks by testing alternative instruments, such as wholesale egg prices and feed costs, and find similar results. They also explore different ways of constructing the Hausman-Nevo instrument and confirm that their findings remain consistent. These steps strengthen the argument that price variations across markets are driven by supply-side factors rather than unobserved local demand shocks, supporting the instrument’s validity in their demand estimation framework.

iii

library(causalweight)

## Warning: 程序包'causalweight'是用R版本4.4.3 来建造的

## 载入需要的程序包：ranger

## Warning: 程序包'ranger'是用R版本4.4.3 来建造的

data(JC)
head(JC)

##    assignment female age white black hispanic educ educmis geddegree hsdegree
## 2           0      0  24     0     1        0   12       0         0        1
## 3           1      1  18     1     0        0    8       0         1        0
## 5           0      1  18     0     1        0   10       0         0        0
## 7           1      1  17     1     0        0   10       0         0        0
## 9           1      0  21     0     0        1   12       0         0        1
## 10          1      0  17     0     0        1    0       1         0        0
##    english cohabmarried haschild everwkd mwearn hhsize hhsizemis educmum
## 2        1            0        1       0    350      2         0      12
## 3        1            0        0       0      0      7         0      12
## 5        1            0        0       0    472      2         0      12
## 7        1            0        0       0      0      5         0      11
## 9        0            0        0       0      0      5         0       0
## 10       0            0        0       0      0      0         1       0
##    educmummis educdad educdadmis welfarechild welfarechildmis health healthmis
## 2           0      12          0            1               0      1         0
## 3           0      12          0            1               0      1         0
## 5           0      12          0            1               0      1         0
## 7           0      12          0            2               0      3         0
## 9           1       0          1            3               0      3         0
## 10          1       0          1            0               1      0         1
##    smoke smokemis alcohol alcoholmis everwkdy1    earnq4 earnq4mis   pworky1
## 2      1        0       4          0         0   0.00000         0 51.923077
## 3      1        0       3          0         1 308.00000         0 21.153847
## 5      0        1       0          1         0   0.00000         0 13.461538
## 7      1        0       4          0         1 175.00000         0 65.384613
## 9      2        0       4          0         1  13.84615         0  1.923077
## 10     0        1       0          1         0   0.00000         0  0.000000
##    pworky1mis health12 health12mis trainy1 trainy2   pworky2  pworky3   pworky4
## 2           0        2           0       0       0  26.92308 86.53846  65.38461
## 3           0        2           0       1       0  30.76923 26.92308  71.15385
## 5           0        3           0       1       0   0.00000 36.53846  11.53846
## 7           0        3           0       1       1 100.00000 32.69231  15.38461
## 9           0        3           0       1       0  44.23077 96.15385 100.00000
## 10          0        1           0       0       1  40.38462 59.61538 100.00000
##       earny2    earny3    earny4 health30 health48
## 2   63.38769 238.99600 265.18832        2        2
## 3   95.60345  41.87040 216.83582        1        2
## 5    0.00000  52.75866  11.85605        3        2
## 7  285.62827  71.36455  18.29637        2        3
## 9   86.73231 247.79420 221.14769        2        2
## 10 116.47488 230.20717 671.80804        2        2

a

ITT_manual <- mean(JC$earny4[JC$assignment == 1], na.rm = TRUE) - 
              mean(JC$earny4[JC$assignment == 0], na.rm = TRUE)
print(ITT_manual)

## [1] 16.05513

ITT_model <- lm(earny4 ~ assignment, data = JC)
summary(ITT_model)

## 
## Call:
## lm(formula = earny4 ~ assignment, data = JC)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -213.98 -164.65  -24.02   99.25 2211.98 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  197.926      3.212  61.620  < 2e-16 ***
## assignment    16.055      4.134   3.883 0.000104 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 194.4 on 9238 degrees of freedom
## Multiple R-squared:  0.00163,    Adjusted R-squared:  0.001522 
## F-statistic: 15.08 on 1 and 9238 DF,  p-value: 0.0001038

b

take_up_treatment <- mean(JC$trainy1[JC$assignment == 1], na.rm = TRUE)
take_up_control <- mean(JC$trainy1[JC$assignment == 0], na.rm = TRUE)


complier_share <- take_up_treatment - take_up_control
print(complier_share)

## [1] 0.3401906

complier_model <- lm(trainy1 ~ assignment, data = JC)
summary(complier_model)

## 
## Call:
## lm(formula = trainy1 ~ assignment, data = JC)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.8463 -0.5061  0.1537  0.1537  0.4939 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.506143   0.006964   72.68   <2e-16 ***
## assignment  0.340191   0.008963   37.95   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4215 on 9238 degrees of freedom
## Multiple R-squared:  0.1349, Adjusted R-squared:  0.1348 
## F-statistic:  1440 on 1 and 9238 DF,  p-value: < 2.2e-16

c

LATE_manual <- ITT_manual / complier_share
print(LATE_manual)

## [1] 47.1945

library(AER)

## Warning: 程序包'AER'是用R版本4.4.3 来建造的

## 载入需要的程序包：car

## Warning: 程序包'car'是用R版本4.4.1 来建造的

## 载入需要的程序包：carData

## Warning: 程序包'carData'是用R版本4.4.1 来建造的

## 
## 载入程序包：'car'

## The following object is masked from 'package:dplyr':
## 
##     recode

## The following object is masked from 'package:purrr':
## 
##     some

## 载入需要的程序包：lmtest

## Warning: 程序包'lmtest'是用R版本4.4.1 来建造的

## 载入需要的程序包：zoo

## Warning: 程序包'zoo'是用R版本4.4.1 来建造的

## 
## 载入程序包：'zoo'

## The following objects are masked from 'package:data.table':
## 
##     yearmon, yearqtr

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

## 载入需要的程序包：sandwich

## Warning: 程序包'sandwich'是用R版本4.4.1 来建造的

## 载入需要的程序包：survival

LATE_model <- ivreg(earny4 ~ trainy1 | assignment, data = JC)
summary(LATE_model)

## 
## Call:
## ivreg(formula = earny4 ~ trainy1 | assignment, data = JC)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -221.23 -165.43  -22.55  100.01 2235.87 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  174.039      8.909  19.536  < 2e-16 ***
## trainy1       47.194     12.192   3.871 0.000109 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 195 on 9238 degrees of freedom
## Multiple R-Squared: -0.004797,   Adjusted R-squared: -0.004905 
## Wald test: 14.98 on 1 and 9238 DF,  p-value: 0.0001092

The LATE estimate from the 2SLS regression is 47.194, meaning that for compliers—individuals whose participation in training was influenced by random assignment—the JC program increased their yearly earnings by approximately $47.19 in the fourth year post-treatment. This effect is statistically significant (p = 0.000109), indicating strong evidence that the program had a positive and meaningful impact on earnings for those who were induced to participate due to the random assignment.

iv

OLS, IV, First-stage, and Reduced form estimations using proximity to college.

library(foreign)
library(AER)
library(stargazer)
library(texreg)

## Warning: 程序包'texreg'是用R版本4.4.3 来建造的

## Version:  1.39.4
## Date:     2024-07-23
## Author:   Philip Leifeld (University of Manchester)
## 
## Consider submitting praise using the praise or praise_interactive functions.
## Please cite the JSS article in your publications -- see citation("texreg").

## 
## 载入程序包：'texreg'

## The following object is masked from 'package:tidyr':
## 
##     extract

library(haven)

## Warning: 程序包'haven'是用R版本4.4.3 来建造的

data_path <- "C:/Users/mohan/Dropbox/Mohan_files/530/PS4/Card1995.dta"
nls <- read_dta(data_path)

# OLS Regression (Baseline)
# age76−ed76−6
nls$exp <- nls$age76 - nls$ed76 -6
ols_model <- lm(lwage76 ~ ed76 + exp + I(exp^2 / 100) + black + reg76r + smsa76r, data = nls)
summary(ols_model)

## 
## Call:
## lm(formula = lwage76 ~ ed76 + exp + I(exp^2/100) + black + reg76r + 
##     smsa76r, data = nls)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.59297 -0.22315  0.01893  0.24223  1.33190 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.733664   0.067603  70.022  < 2e-16 ***
## ed76          0.074009   0.003505  21.113  < 2e-16 ***
## exp           0.083596   0.006648  12.575  < 2e-16 ***
## I(exp^2/100) -0.224088   0.031784  -7.050 2.21e-12 ***
## black        -0.189632   0.017627 -10.758  < 2e-16 ***
## reg76r       -0.124862   0.015118  -8.259  < 2e-16 ***
## smsa76r       0.161423   0.015573  10.365  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3742 on 3003 degrees of freedom
##   (因为不存在，603个观察量被删除了)
## Multiple R-squared:  0.2905, Adjusted R-squared:  0.2891 
## F-statistic: 204.9 on 6 and 3003 DF,  p-value: < 2.2e-16

The coefficient on ed76 measures the correlation between schooling and wages, but it may be biased due to endogeneity

#First-Stage Regression

first_stage <- lm(ed76 ~ nearc4 + exp + I(exp^2 / 100) + black + reg76r + smsa76r, data = nls)
summary(first_stage)

## 
## Call:
## lm(formula = ed76 ~ nearc4 + exp + I(exp^2/100) + black + reg76r + 
##     smsa76r, data = nls)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.6389 -1.4325 -0.1028  1.3268  6.2332 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  16.53964    0.16286 101.559  < 2e-16 ***
## nearc4        0.30628    0.07666   3.995 6.59e-05 ***
## exp          -0.35881    0.03040 -11.805  < 2e-16 ***
## I(exp^2/100) -0.21620    0.14590  -1.482    0.138    
## black        -1.03873    0.08358 -12.428  < 2e-16 ***
## reg76r       -0.32964    0.07385  -4.464 8.29e-06 ***
## smsa76r       0.39091    0.07788   5.019 5.44e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.982 on 3606 degrees of freedom
## Multiple R-squared:  0.4813, Adjusted R-squared:  0.4805 
## F-statistic: 557.7 on 6 and 3606 DF,  p-value: < 2.2e-16

The coefficient on nearc4 is positive and significant, indicating that individuals near a 4-year college tend to get more schooling.

 # IV Regression Using nearc4 as an Instrument for ed76

iv_model <- ivreg(lwage76 ~ ed76 + exp + I(exp^2 / 100) + black + reg76r + smsa76r | 
                     nearc4 + exp + I(exp^2 / 100) + black + reg76r + smsa76r, data = nls)
summary(iv_model)

## Warning in printHypothesis(L, rhs, names(b)): one or more coefficients in the hypothesis include
##      arithmetic operators in their names;
##   the printed representation of the hypothesis will be omitted

## 
## Call:
## ivreg(formula = lwage76 ~ ed76 + exp + I(exp^2/100) + black + 
##     reg76r + smsa76r | nearc4 + exp + I(exp^2/100) + black + 
##     reg76r + smsa76r, data = nls)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.82125 -0.24065  0.02368  0.25469  1.43205 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.75278    0.82934   4.525 6.27e-06 ***
## ed76          0.13229    0.04923   2.687  0.00725 ** 
## exp           0.10750    0.02130   5.047 4.76e-07 ***
## I(exp^2/100) -0.22841    0.03341  -6.836 9.84e-12 ***
## black        -0.13080    0.05287  -2.474  0.01342 *  
## reg76r       -0.10490    0.02307  -4.546 5.67e-06 ***
## smsa76r       0.13132    0.03013   4.359 1.35e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.391 on 3003 degrees of freedom
## Multiple R-Squared: 0.2252,  Adjusted R-squared: 0.2237 
## Wald test: 120.8 on 6 and 3003 DF,  p-value: < 2.2e-16

# Reduced-Form Regression
reduced_form <- lm(lwage76 ~ nearc4 + exp + I(exp^2 / 100) + black + reg76r + smsa76r, data = nls)
summary(reduced_form)

## 
## Call:
## lm(formula = lwage76 ~ nearc4 + exp + I(exp^2/100) + black + 
##     reg76r + smsa76r, data = nls)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.56525 -0.24771  0.01465  0.27091  1.38743 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   5.956604   0.036371 163.775  < 2e-16 ***
## nearc4        0.044624   0.017011   2.623  0.00876 ** 
## exp           0.053258   0.006948   7.666 2.38e-14 ***
## I(exp^2/100) -0.218720   0.034021  -6.429 1.49e-10 ***
## black        -0.263903   0.018485 -14.277  < 2e-16 ***
## reg76r       -0.143458   0.016336  -8.782  < 2e-16 ***
## smsa76r       0.184752   0.017503  10.555  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4005 on 3003 degrees of freedom
##   (因为不存在，603个观察量被删除了)
## Multiple R-squared:  0.1871, Adjusted R-squared:  0.1854 
## F-statistic: 115.2 on 6 and 3003 DF,  p-value: < 2.2e-16

IV, First-stage, and Reduced form estimations using proximity to public and private colleges as instruments

iv_model2 <- ivreg(
  formula = lwage76 ~ ed76 + exp + I(exp^2 / 100) + black + reg76r + smsa76r | 
            nearc4a + nearc4b + exp + I(exp^2 / 100) + black + reg76r + smsa76r, 
  data = nls
)
summary(iv_model2)

## Warning in printHypothesis(L, rhs, names(b)): one or more coefficients in the hypothesis include
##      arithmetic operators in their names;
##   the printed representation of the hypothesis will be omitted

## 
## Call:
## ivreg(formula = lwage76 ~ ed76 + exp + I(exp^2/100) + black + 
##     reg76r + smsa76r | nearc4a + nearc4b + exp + I(exp^2/100) + 
##     black + reg76r + smsa76r, data = nls)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.93985 -0.25152  0.01722  0.27365  1.48154 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.26801    0.68718   4.756 2.07e-06 ***
## ed76          0.16109    0.04077   3.951 7.96e-05 ***
## exp           0.11931    0.01818   6.564 6.16e-11 ***
## I(exp^2/100) -0.23054    0.03503  -6.582 5.46e-11 ***
## black        -0.10173    0.04531  -2.245   0.0248 *  
## reg76r       -0.09504    0.02165  -4.389 1.18e-05 ***
## smsa76r       0.11645    0.02705   4.305 1.73e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4108 on 3003 degrees of freedom
## Multiple R-Squared: 0.1447,  Adjusted R-squared: 0.143 
## Wald test:   111 on 6 and 3003 DF,  p-value: < 2.2e-16

The IV regression shows that education (ed76) has a positive and statistically significant causal effect on wages. Urban residence (smsa76r) increases wages, while living in the South (reg76r) lowers them. Experience exhibits the expected concave relationship with wages, and being Black is associated with significantly lower wages.

library(AER)
library(modelsummary)

## Warning: 程序包'modelsummary'是用R版本4.4.1 来建造的

## `modelsummary` 2.0.0 now uses `tinytable` as its default table-drawing
##   backend. Learn more at: https://vincentarelbundock.github.io/tinytable/
## 
## Revert to `kableExtra` for one session:
## 
##   options(modelsummary_factory_default = 'kableExtra')
##   options(modelsummary_factory_latex = 'kableExtra')
##   options(modelsummary_factory_html = 'kableExtra')
## 
## Silence this message forever:
## 
##   config_modelsummary(startup_message = FALSE)

modelsummary(
  list(
    "First Stage" = first_stage,
    "Reduced Form" = reduced_form,
    "IV 2SLS" = iv_model2
  ),
  title = "IV Analysis",
  statistic = "({statistic}) {p.value}", 
  stars = TRUE
)

IV Analysis
	First Stage	Reduced Form	IV 2SLS
+ p < 0.1, * p < 0.05, p < 0.01, * p < 0.001
(Intercept)	16.540***	5.957***	3.268***
	(101.559) <0.001	(163.775) <0.001	(4.756) <0.001
nearc4	0.306***	0.045**
	(3.995) <0.001	(2.623) 0.009
exp	-0.359***	0.053***	0.119***
	(-11.805) <0.001	(7.666) <0.001	(6.564) <0.001
I(exp^2/100)	-0.216	-0.219***	-0.231***
	(-1.482) 0.138	(-6.429) <0.001	(-6.582) <0.001
black	-1.039***	-0.264***	-0.102*
	(-12.428) <0.001	(-14.277) <0.001	(-2.245) 0.025
reg76r	-0.330***	-0.143***	-0.095***
	(-4.464) <0.001	(-8.782) <0.001	(-4.389) <0.001
smsa76r	0.391***	0.185***	0.116***
	(5.019) <0.001	(10.555) <0.001	(4.305) <0.001
ed76			0.161***
			(3.951) <0.001
Num.Obs.	3613	3010	3010
R2	0.481	0.187	0.145
R2 Adj.	0.480	0.185	0.143
AIC	15205.5	3043.1	3196.0
BIC	15255.0	3091.2	3244.0
Log.Lik.	-7594.738	-1513.546
F	557.743	115.164
RMSE	1.98	0.40	0.41

In the first-stage regression, (nearc4) has a significantly positive effect on education, indicating that it is a strong instrumental variable. In the reduced-form regression, (nearc4) also has a significantly positive effect on wages. In the IV (2SLS) regression, the coefficient on (ed76) is 0.161 and statistically significant, suggesting that education has a significant causal effect on wages. This supports the conclusion that education is endogenous in the wage regression, and using (nearc4) as an instrument provides a more reliable estimate of the return to schooling.

Endogeneity discussion.

Test if Education (ed76) is Endogenous

library(plm)

## Warning: 程序包'plm'是用R版本4.4.3 来建造的

## 
## 载入程序包：'plm'

## The following object is masked from 'package:data.table':
## 
##     between

## The following objects are masked from 'package:dplyr':
## 
##     between, lag, lead

library(lmtest)
library(AER)

ols_model <- lm(lwage76 ~ ed76  + exp + I(exp^2 / 100) + black + reg76r + smsa76r, data = nls)
iv_model <- ivreg(lwage76 ~ ed76 + exp + I(exp^2 / 100) + black + reg76r + smsa76r | 
                     nearc4a + nearc4b + exp + I(exp^2 / 100) + black + reg76r + smsa76r, 
                   data = nls)

residuals_iv <- residuals(lm(ed76 ~ nearc4a + nearc4b + exp + I(exp^2 / 100) + black + reg76r + smsa76r, data = nls))

# Run OLS again with residuals as an additional regressor
hausman_test <- lm(lwage76 ~ ed76 + residuals_iv + exp + I(exp^2 / 100) + black + reg76r + smsa76r, data = nls)

# Check if residuals are significant
summary(hausman_test)

## 
## Call:
## lm(formula = lwage76 ~ ed76 + residuals_iv + exp + I(exp^2/100) + 
##     black + reg76r + smsa76r, data = nls)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.60480 -0.22131  0.02145  0.24337  1.31871 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.15091    0.66535   4.736 2.28e-06 ***
## ed76          0.16887    0.03982   4.240 2.30e-05 ***
## residuals_iv -0.09565    0.04000  -2.391   0.0169 *  
## exp           0.11745    0.01564   7.510 7.74e-14 ***
## I(exp^2/100) -0.20269    0.03300  -6.143 9.18e-10 ***
## black        -0.09071    0.04496  -2.017   0.0437 *  
## reg76r       -0.08934    0.02119  -4.217 2.55e-05 ***
## smsa76r       0.11458    0.02502   4.580 4.84e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3739 on 3002 degrees of freedom
##   (因为不存在，603个观察量被删除了)
## Multiple R-squared:  0.2919, Adjusted R-squared:  0.2902 
## F-statistic: 176.7 on 7 and 3002 DF,  p-value: < 2.2e-16

Yes, the result indicates that education is endogenous in the wage regression. The coefficient on the IV residual (residuals_iv) is statistically significant (p = 0.0169), which suggests that the OLS estimate of education is correlated with the error term and therefore biased due to endogeneity. In contrast, experience (exp) remains significant and its coefficient does not suffer from such bias, suggesting that experience is not endogenous in this model.

nls$nearc4a_age <- nls$nearc4a * nls$age76
nls$nearc4a_age2 <- nls$nearc4a * (nls$age76^2 / 100)

iv_model_interact <- ivreg(lwage76 ~ ed76 + exp + I(exp^2 / 100) + black + reg76r + smsa76r | 
                             nearc4a + nearc4b + nearc4a_age + nearc4a_age2 + exp + I(exp^2 / 100) + black + reg76r + smsa76r, 
                           data = nls)

summary(iv_model_interact)

## Warning in printHypothesis(L, rhs, names(b)): one or more coefficients in the hypothesis include
##      arithmetic operators in their names;
##   the printed representation of the hypothesis will be omitted

## 
## Call:
## ivreg(formula = lwage76 ~ ed76 + exp + I(exp^2/100) + black + 
##     reg76r + smsa76r | nearc4a + nearc4b + nearc4a_age + nearc4a_age2 + 
##     exp + I(exp^2/100) + black + reg76r + smsa76r, data = nls)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.61638 -0.22444  0.02206  0.24233  1.34656 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.590107   0.106727  43.008  < 2e-16 ***
## ed76          0.082539   0.006030  13.688  < 2e-16 ***
## exp           0.087094   0.006952  12.529  < 2e-16 ***
## I(exp^2/100) -0.224720   0.031817  -7.063 2.02e-12 ***
## black        -0.181022   0.018325  -9.878  < 2e-16 ***
## reg76r       -0.121940   0.015226  -8.009 1.64e-15 ***
## smsa76r       0.157018   0.015793   9.942  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3746 on 3003 degrees of freedom
## Multiple R-Squared: 0.2891,  Adjusted R-squared: 0.2877 
## Wald test: 161.6 on 6 and 3003 DF,  p-value: < 2.2e-16

Compared to the previous IV model using only nearc4a and nearc4b as instruments, the results from this extended model—which includes interactions between nearc4a and age76 and its square (nearc4_age, nearc4_age2)—show a slight decrease in the coefficient on ed76 (from 0.161 to 0.0852), though it remains highly statistically significant (p < 2e-16). This indicates that when allowing for heterogeneous effects of the instruments (by interacting with age), the estimated return to education becomes more conservative. The other coefficients remain qualitatively similar, suggesting that the core relationships are robust, but the strength of the instrumented effect of education on wages becomes less pronounced when accounting for potential effect modification by age.

library(AER)
library(car)
hansen_test <- summary(iv_model_interact, diagnostics = TRUE)

## Warning in printHypothesis(L, rhs, names(b)): one or more coefficients in the hypothesis include
##      arithmetic operators in their names;
##   the printed representation of the hypothesis will be omitted

print(hansen_test$diagnostics)

##                  df1  df2 statistic       p-value
## Weak instruments   4 3000 384.01518 2.345667e-267
## Wu-Hausman         1 3002   3.03355  8.166180e-02
## Sargan             3   NA  10.97760  1.184763e-02

I examine the Wu-Hausman test result. The null hypothesis of this test is that ed76 is exogenous (i.e., OLS is consistent), while the alternative is that ed76 is endogenous (i.e., IV is required). The p-value for the Wu-Hausman test is 0.0817, which is marginally above the typical 5% significance threshold but below 10%. This suggests weak evidence against exogeneity—we cannot strongly reject the null at the 5% level, but there may still be endogeneity concerns at the 10% level. Thus, while the evidence is not definitive, it leans toward treating ed76 as potentially endogenous.

PS4

Mohan

2025-03-22

i

ii

iii

a

b

c

iv

OLS, IV, First-stage, and Reduced form estimations using proximity to college.

IV, First-stage, and Reduced form estimations using proximity to public and private colleges as instruments

Endogeneity discussion.