Reference:
Huang, K. S., & Gale, F. (2009). Food Demand in China: Income, Quality, and Nutrient Effects. China Agricultural Economic Review, 1(4), 395–409.
Research Question:
How do income levels influence food demand, particularly concerning food quality and nutrient intake, among Chinese households?
Outcome Variable, Treatment Variable, and Endogeneity Concern:
Outcome Variable: Household food expenditure on various food categories.
Treatment Variable: Household income.
Endogeneity Concern: Household income may be endogenous due to measurement errors or unobserved factors (e.g., household preferences or regional economic conditions) that simultaneously affect income and food expenditure.
Instrumental Variable(s):
Regional average income, excluding the household’s own income, to instrument for household income.
Directed Acyclic Graph (DAG):
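In text form, the DAG has four nodes: Z (regional average income, excluding the household's own income), D (household income), Y (food expenditure), and U (unobserved household factors such as preferences). Its edges are Z → D → Y, U → D, and U → Y. The identification strategy rests on the absence of any edge from Z to Y and of any path connecting Z and U.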
Relevance of the Instrument:
Regional average income is correlated with individual household income due to shared economic environments, making it a relevant instrument.
Validity of the Instrument:
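Validity requires two assumptions that cannot be tested directly. First, regional average income must be unrelated to the household's unobserved factors (U). Second, it must affect a household's food expenditure only through the household's own income (the exclusion restriction). Under these assumptions it is a valid instrument. They are strong here: because regional economic conditions were listed above as a possible confounder, the instrument fails if regions with higher average income also differ in food prices or retail access that shift food spending directly.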
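The logic of a leave-one-out regional instrument can be illustrated with a small simulation. This is a sketch under hypothetical parameters, not Huang and Gale's data or model: 200 regions of 50 households, household income made of a regional component plus an idiosyncratic part correlated with an unobserved preference `u`, and food spending that depends on income and `u` but not directly on the regional component. OLS is then biased, while the IV (Wald) ratio cov(Z, Y) / cov(Z, D) with the leave-one-out regional mean as Z should recover the assumed effect.

```r
# Simulation sketch with hypothetical parameters; not Huang & Gale's data or model.
set.seed(2009)
n_regions <- 200
hh_per_region <- 50
region <- rep(seq_len(n_regions), each = hh_per_region)
n <- length(region)

beta_true <- 0.5                                   # assumed effect of income on food spending
regional_shock <- rnorm(n_regions)[region]         # shared regional component of income
u <- rnorm(n)                                      # unobserved household preference (confounder)
income <- regional_shock + u + rnorm(n)
food_exp <- beta_true * income + 2 * u + rnorm(n)  # no direct regional effect (exclusion)

# Leave-one-out regional mean: (regional total - own income) / (households in region - 1)
region_total <- ave(income, region, FUN = sum)
region_size <- ave(income, region, FUN = length)
z_loo <- (region_total - income) / (region_size - 1)

ols_est <- unname(coef(lm(food_exp ~ income))["income"])
iv_est <- cov(z_loo, food_exp) / cov(z_loo, income)  # just-identified IV (Wald) ratio
round(c(ols = ols_est, iv = iv_est), 3)
```

Adding `+ regional_shock` to `food_exp` gives the regional component a direct effect on spending; `iv_est` is then biased upward as well, which is the exclusion-restriction failure described above.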
Main Finding:
Higher household income leads to increased expenditure on higher-quality food items and improved nutrient intake, highlighting the role of income in dietary choices.
Oh & Vukina (2022) justify the validity of the Hausman-Nevo instrument by emphasizing that price variations across markets primarily stem from correlated marginal cost shocks rather than demand-side fluctuations. They argue that since egg prices are influenced by shared supply factors like feed costs and wholesale price trends, using the price of the same product in other markets as an instrument is appropriate. To further mitigate concerns about correlated demand shocks across regions, the authors include seasonal controls (e.g., for Easter and Christmas) and retail chain-specific fixed effects to capture store-level pricing strategies.
To reinforce the exclusion restriction, the authors conduct robustness checks by testing alternative instruments, such as wholesale egg prices and feed costs, and find similar results. They also explore different ways of constructing the Hausman-Nevo instrument and confirm that their findings remain consistent. These steps strengthen the argument that price variations across markets are driven by supply-side factors rather than unobserved local demand shocks, supporting the instrument’s validity in their demand estimation framework.
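A Hausman-Nevo instrument is typically built as the average price of the same product in other markets during the same period. The sketch below assumes a long data frame with hypothetical columns `product`, `market`, `week`, and `price` (one row per product-market-week); the helper name and the toy data are illustrative, not Oh and Vukina's code or data.

```r
# Hypothetical helper: leave-one-market-out mean price for each product-week cell.
hausman_nevo_iv <- function(df) {
  key <- interaction(df$product, df$week, drop = TRUE)  # product-week cells
  cell_sum <- ave(df$price, key, FUN = sum)
  cell_n <- ave(df$price, key, FUN = length)
  iv <- (cell_sum - df$price) / (cell_n - 1)            # mean over the *other* markets
  iv[cell_n < 2] <- NA                                   # undefined when only one market
  iv
}

toy <- data.frame(
  product = "eggs_dozen",
  market  = c("A", "B", "C", "A", "B", "C"),
  week    = c(1, 1, 1, 2, 2, 2),
  price   = c(1.00, 1.20, 1.40, 2.00, 2.10, 2.30)
)
toy$hn_iv <- hausman_nevo_iv(toy)
toy
```

Excluding the own market keeps the own-market demand shock out of the instrument by construction; correlated demand shocks across markets are what the seasonal and retail-chain controls described above are meant to absorb in the second stage.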
library(causalweight)
data(JC)
head(JC)
## assignment female age white black hispanic educ educmis geddegree hsdegree
## 2 0 0 24 0 1 0 12 0 0 1
## 3 1 1 18 1 0 0 8 0 1 0
## 5 0 1 18 0 1 0 10 0 0 0
## 7 1 1 17 1 0 0 10 0 0 0
## 9 1 0 21 0 0 1 12 0 0 1
## 10 1 0 17 0 0 1 0 1 0 0
## english cohabmarried haschild everwkd mwearn hhsize hhsizemis educmum
## 2 1 0 1 0 350 2 0 12
## 3 1 0 0 0 0 7 0 12
## 5 1 0 0 0 472 2 0 12
## 7 1 0 0 0 0 5 0 11
## 9 0 0 0 0 0 5 0 0
## 10 0 0 0 0 0 0 1 0
## educmummis educdad educdadmis welfarechild welfarechildmis health healthmis
## 2 0 12 0 1 0 1 0
## 3 0 12 0 1 0 1 0
## 5 0 12 0 1 0 1 0
## 7 0 12 0 2 0 3 0
## 9 1 0 1 3 0 3 0
## 10 1 0 1 0 1 0 1
## smoke smokemis alcohol alcoholmis everwkdy1 earnq4 earnq4mis pworky1
## 2 1 0 4 0 0 0.00000 0 51.923077
## 3 1 0 3 0 1 308.00000 0 21.153847
## 5 0 1 0 1 0 0.00000 0 13.461538
## 7 1 0 4 0 1 175.00000 0 65.384613
## 9 2 0 4 0 1 13.84615 0 1.923077
## 10 0 1 0 1 0 0.00000 0 0.000000
## pworky1mis health12 health12mis trainy1 trainy2 pworky2 pworky3 pworky4
## 2 0 2 0 0 0 26.92308 86.53846 65.38461
## 3 0 2 0 1 0 30.76923 26.92308 71.15385
## 5 0 3 0 1 0 0.00000 36.53846 11.53846
## 7 0 3 0 1 1 100.00000 32.69231 15.38461
## 9 0 3 0 1 0 44.23077 96.15385 100.00000
## 10 0 1 0 0 1 40.38462 59.61538 100.00000
## earny2 earny3 earny4 health30 health48
## 2 63.38769 238.99600 265.18832 2 2
## 3 95.60345 41.87040 216.83582 1 2
## 5 0.00000 52.75866 11.85605 3 2
## 7 285.62827 71.36455 18.29637 2 3
## 9 86.73231 247.79420 221.14769 2 2
## 10 116.47488 230.20717 671.80804 2 2
ITT_manual <- mean(JC$earny4[JC$assignment == 1], na.rm = TRUE) -
mean(JC$earny4[JC$assignment == 0], na.rm = TRUE)
print(ITT_manual)
## [1] 16.05513
ITT_model <- lm(earny4 ~ assignment, data = JC)
summary(ITT_model)
##
## Call:
## lm(formula = earny4 ~ assignment, data = JC)
##
## Residuals:
## Min 1Q Median 3Q Max
## -213.98 -164.65 -24.02 99.25 2211.98
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 197.926 3.212 61.620 < 2e-16 ***
## assignment 16.055 4.134 3.883 0.000104 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 194.4 on 9238 degrees of freedom
## Multiple R-squared: 0.00163, Adjusted R-squared: 0.001522
## F-statistic: 15.08 on 1 and 9238 DF, p-value: 0.0001038
take_up_treatment <- mean(JC$trainy1[JC$assignment == 1], na.rm = TRUE)
take_up_control <- mean(JC$trainy1[JC$assignment == 0], na.rm = TRUE)
complier_share <- take_up_treatment - take_up_control
print(complier_share)
## [1] 0.3401906
complier_model <- lm(trainy1 ~ assignment, data = JC)
summary(complier_model)
##
## Call:
## lm(formula = trainy1 ~ assignment, data = JC)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.8463 -0.5061 0.1537 0.1537 0.4939
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.506143 0.006964 72.68 <2e-16 ***
## assignment 0.340191 0.008963 37.95 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4215 on 9238 degrees of freedom
## Multiple R-squared: 0.1349, Adjusted R-squared: 0.1348
## F-statistic: 1440 on 1 and 9238 DF, p-value: < 2.2e-16
LATE_manual <- ITT_manual / complier_share
print(LATE_manual)
## [1] 47.1945
library(AER)
LATE_model <- ivreg(earny4 ~ trainy1 | assignment, data = JC)
summary(LATE_model)
##
## Call:
## ivreg(formula = earny4 ~ trainy1 | assignment, data = JC)
##
## Residuals:
## Min 1Q Median 3Q Max
## -221.23 -165.43 -22.55 100.01 2235.87
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 174.039 8.909 19.536 < 2e-16 ***
## trainy1 47.194 12.192 3.871 0.000109 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 195 on 9238 degrees of freedom
## Multiple R-Squared: -0.004797, Adjusted R-squared: -0.004905
## Wald test: 14.98 on 1 and 9238 DF, p-value: 0.0001092
The LATE estimate from the 2SLS regression is 47.194, identical to the manual Wald ratio (the ITT divided by the complier share). It applies to compliers, that is, individuals who enroll in education or training in the first year only because they were randomly assigned to Job Corps. For them, first-year training raised weekly earnings in the fourth year after assignment by approximately $47.19. The effect is statistically significant (p = 0.000109). Its causal interpretation rests on monotonicity (assignment never discourages anyone from training) and on the exclusion restriction that assignment affects earnings only through first-year training.
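The manual calculation above is the Wald estimator. A minimal base-R sketch that packages it as a function, restricted to complete cases so the numerator and denominator use the same rows, follows; the name `wald_late` and the simulated data (complier shares loosely chosen to resemble the take-up rates above) are illustrative assumptions. The simulation gives compliers a larger effect than always-takers, so the Wald ratio recovers the complier effect while a naive treated-versus-untreated comparison does not.

```r
# Hypothetical helper: Wald (LATE) estimator for a binary instrument z and binary treatment d.
wald_late <- function(y, z, d) {
  ok <- complete.cases(y, z, d)                     # one common sample for all three
  y <- y[ok]; z <- z[ok]; d <- d[ok]
  itt <- mean(y[z == 1]) - mean(y[z == 0])          # reduced form: ITT on the outcome
  first_stage <- mean(d[z == 1]) - mean(d[z == 0])  # ITT on take-up (complier share)
  c(itt = itt, first_stage = first_stage, late = itt / first_stage)
}

# Simulated check with hypothetical complier types and heterogeneous effects.
set.seed(1)
n <- 20000
z <- rbinom(n, 1, 0.5)
type <- sample(c("always", "complier", "never"), n, replace = TRUE,
               prob = c(0.5, 0.34, 0.16))
d <- as.integer(type == "always" | (type == "complier" & z == 1))
effect <- ifelse(type == "complier", 50, 10)       # compliers gain 50, everyone else 10
y <- 200 + effect * d + rnorm(n, sd = 50)

est <- wald_late(y, z, d)
naive <- mean(y[d == 1]) - mean(y[d == 0])         # treated vs. untreated comparison
round(c(est, naive = naive), 2)
```

When none of `JC$earny4`, `JC$assignment`, and `JC$trainy1` has missing values, `wald_late(JC$earny4, JC$assignment, JC$trainy1)` computes the same ratio as `LATE_manual`.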
library(foreign)
library(AER)
library(stargazer)
library(texreg)
library(haven)
data_path <- "Card1995.dta"  # adjust to the local location of the Card (1995) extract
nls <- read_dta(data_path)
# OLS Regression (Baseline)
# Potential experience: exp = age76 - ed76 - 6
nls$exp <- nls$age76 - nls$ed76 - 6
ols_model <- lm(lwage76 ~ ed76 + exp + I(exp^2 / 100) + black + reg76r + smsa76r, data = nls)
summary(ols_model)
##
## Call:
## lm(formula = lwage76 ~ ed76 + exp + I(exp^2/100) + black + reg76r +
## smsa76r, data = nls)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.59297 -0.22315 0.01893 0.24223 1.33190
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.733664 0.067603 70.022 < 2e-16 ***
## ed76 0.074009 0.003505 21.113 < 2e-16 ***
## exp 0.083596 0.006648 12.575 < 2e-16 ***
## I(exp^2/100) -0.224088 0.031784 -7.050 2.21e-12 ***
## black -0.189632 0.017627 -10.758 < 2e-16 ***
## reg76r -0.124862 0.015118 -8.259 < 2e-16 ***
## smsa76r 0.161423 0.015573 10.365 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3742 on 3003 degrees of freedom
##   (603 observations deleted due to missingness)
## Multiple R-squared: 0.2905, Adjusted R-squared: 0.2891
## F-statistic: 204.9 on 6 and 3003 DF, p-value: < 2.2e-16
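The coefficient on ed76 (0.074) implies that an additional year of schooling is associated with roughly 7.4% higher wages, holding the controls fixed. This is a conditional correlation rather than a causal effect: it may be biased because schooling is endogenous, for example if unobserved ability raises both schooling and wages.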
#First-Stage Regression
first_stage <- lm(ed76 ~ nearc4 + exp + I(exp^2 / 100) + black + reg76r + smsa76r, data = nls)
summary(first_stage)
##
## Call:
## lm(formula = ed76 ~ nearc4 + exp + I(exp^2/100) + black + reg76r +
## smsa76r, data = nls)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.6389 -1.4325 -0.1028 1.3268 6.2332
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 16.53964 0.16286 101.559 < 2e-16 ***
## nearc4 0.30628 0.07666 3.995 6.59e-05 ***
## exp -0.35881 0.03040 -11.805 < 2e-16 ***
## I(exp^2/100) -0.21620 0.14590 -1.482 0.138
## black -1.03873 0.08358 -12.428 < 2e-16 ***
## reg76r -0.32964 0.07385 -4.464 8.29e-06 ***
## smsa76r 0.39091 0.07788 5.019 5.44e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.982 on 3606 degrees of freedom
## Multiple R-squared: 0.4813, Adjusted R-squared: 0.4805
## F-statistic: 557.7 on 6 and 3606 DF, p-value: < 2.2e-16
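The coefficient on nearc4 is positive and significant (0.306, t = 4.0): individuals who lived near a four-year college complete about 0.31 more years of schooling, holding the controls fixed. With a single excluded instrument, the relevant first-stage F-statistic is t² ≈ 16, above the common rule-of-thumb threshold of 10. The F-statistic of 557.7 printed above tests all regressors jointly and says nothing about instrument strength.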
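The first stage above is fit on every row with non-missing schooling and controls, including rows without a wage. A minimal base-R sketch that refits it only on rows that also have the outcome, and F-tests the excluded instrument(s) against a controls-only model, is below; the helper name `first_stage_F` is hypothetical.

```r
# Hypothetical helper: F-test of the excluded instrument(s) in the first stage,
# fit only on rows that also have a non-missing outcome (the 2SLS sample).
first_stage_F <- function(data, full, restricted, outcome) {
  vars <- unique(c(outcome, all.vars(full)))   # outcome + every first-stage variable
  est <- data[complete.cases(data[vars]), ]
  fit_full  <- lm(full, data = est)             # with the excluded instrument(s)
  fit_restr <- lm(restricted, data = est)       # controls only
  test <- anova(fit_restr, fit_full)            # F-test that instrument coefficients are zero
  c(F = test$F[2], df = test$Df[2], p = test[["Pr(>F)"]][2], n = nrow(est))
}

# Example call on the Card data loaded above (not run here):
# first_stage_F(nls,
#   full       = ed76 ~ nearc4 + exp + I(exp^2 / 100) + black + reg76r + smsa76r,
#   restricted = ed76 ~ exp + I(exp^2 / 100) + black + reg76r + smsa76r,
#   outcome    = "lwage76")
```

With one excluded instrument this F equals the squared t-statistic on that instrument; the nested-model form also covers several instruments, such as nearc4a and nearc4b together.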
# IV Regression Using nearc4 as an Instrument for ed76
iv_model <- ivreg(lwage76 ~ ed76 + exp + I(exp^2 / 100) + black + reg76r + smsa76r |
nearc4 + exp + I(exp^2 / 100) + black + reg76r + smsa76r, data = nls)
summary(iv_model)
## Warning in printHypothesis(L, rhs, names(b)): one or more coefficients in the hypothesis include
## arithmetic operators in their names;
## the printed representation of the hypothesis will be omitted
##
## Call:
## ivreg(formula = lwage76 ~ ed76 + exp + I(exp^2/100) + black +
## reg76r + smsa76r | nearc4 + exp + I(exp^2/100) + black +
## reg76r + smsa76r, data = nls)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.82125 -0.24065 0.02368 0.25469 1.43205
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.75278 0.82934 4.525 6.27e-06 ***
## ed76 0.13229 0.04923 2.687 0.00725 **
## exp 0.10750 0.02130 5.047 4.76e-07 ***
## I(exp^2/100) -0.22841 0.03341 -6.836 9.84e-12 ***
## black -0.13080 0.05287 -2.474 0.01342 *
## reg76r -0.10490 0.02307 -4.546 5.67e-06 ***
## smsa76r 0.13132 0.03013 4.359 1.35e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.391 on 3003 degrees of freedom
## Multiple R-Squared: 0.2252, Adjusted R-squared: 0.2237
## Wald test: 120.8 on 6 and 3003 DF, p-value: < 2.2e-16
# Reduced-Form Regression
reduced_form <- lm(lwage76 ~ nearc4 + exp + I(exp^2 / 100) + black + reg76r + smsa76r, data = nls)
summary(reduced_form)
##
## Call:
## lm(formula = lwage76 ~ nearc4 + exp + I(exp^2/100) + black +
## reg76r + smsa76r, data = nls)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.56525 -0.24771 0.01465 0.27091 1.38743
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.956604 0.036371 163.775 < 2e-16 ***
## nearc4 0.044624 0.017011 2.623 0.00876 **
## exp 0.053258 0.006948 7.666 2.38e-14 ***
## I(exp^2/100) -0.218720 0.034021 -6.429 1.49e-10 ***
## black -0.263903 0.018485 -14.277 < 2e-16 ***
## reg76r -0.143458 0.016336 -8.782 < 2e-16 ***
## smsa76r 0.184752 0.017503 10.555 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4005 on 3003 degrees of freedom
##   (603 observations deleted due to missingness)
## Multiple R-squared: 0.1871, Adjusted R-squared: 0.1854
## F-statistic: 115.2 on 6 and 3003 DF, p-value: < 2.2e-16
iv_model2 <- ivreg(
formula = lwage76 ~ ed76 + exp + I(exp^2 / 100) + black + reg76r + smsa76r |
nearc4a + nearc4b + exp + I(exp^2 / 100) + black + reg76r + smsa76r,
data = nls
)
summary(iv_model2)
## Warning in printHypothesis(L, rhs, names(b)): one or more coefficients in the hypothesis include
## arithmetic operators in their names;
## the printed representation of the hypothesis will be omitted
##
## Call:
## ivreg(formula = lwage76 ~ ed76 + exp + I(exp^2/100) + black +
## reg76r + smsa76r | nearc4a + nearc4b + exp + I(exp^2/100) +
## black + reg76r + smsa76r, data = nls)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.93985 -0.25152 0.01722 0.27365 1.48154
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.26801 0.68718 4.756 2.07e-06 ***
## ed76 0.16109 0.04077 3.951 7.96e-05 ***
## exp 0.11931 0.01818 6.564 6.16e-11 ***
## I(exp^2/100) -0.23054 0.03503 -6.582 5.46e-11 ***
## black -0.10173 0.04531 -2.245 0.0248 *
## reg76r -0.09504 0.02165 -4.389 1.18e-05 ***
## smsa76r 0.11645 0.02705 4.305 1.73e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4108 on 3003 degrees of freedom
## Multiple R-Squared: 0.1447, Adjusted R-squared: 0.143
## Wald test: 111 on 6 and 3003 DF, p-value: < 2.2e-16
The IV regression, which instruments ed76 with nearc4a and nearc4b, shows that education (ed76) has a positive and statistically significant causal effect on wages: 0.161 log points per year of schooling (SE 0.041). Urban residence (smsa76r) increases wages, while living in the South (reg76r) lowers them. Experience exhibits the expected concave relationship with wages, and being Black is associated with significantly lower wages.
library(AER)
library(modelsummary)
modelsummary(
list(
"First Stage" = first_stage,
"Reduced Form" = reduced_form,
"IV 2SLS" = iv_model2
),
title = "IV Analysis",
statistic = "({statistic}) {p.value}",
stars = TRUE
)
|  | First Stage | Reduced Form | IV 2SLS |
|---|---|---|---|
| (Intercept) | 16.540*** | 5.957*** | 3.268*** |
|  | (101.559) <0.001 | (163.775) <0.001 | (4.756) <0.001 |
| nearc4 | 0.306*** | 0.045** |  |
|  | (3.995) <0.001 | (2.623) 0.009 |  |
| exp | -0.359*** | 0.053*** | 0.119*** |
|  | (-11.805) <0.001 | (7.666) <0.001 | (6.564) <0.001 |
| I(exp^2/100) | -0.216 | -0.219*** | -0.231*** |
|  | (-1.482) 0.138 | (-6.429) <0.001 | (-6.582) <0.001 |
| black | -1.039*** | -0.264*** | -0.102* |
|  | (-12.428) <0.001 | (-14.277) <0.001 | (-2.245) 0.025 |
| reg76r | -0.330*** | -0.143*** | -0.095*** |
|  | (-4.464) <0.001 | (-8.782) <0.001 | (-4.389) <0.001 |
| smsa76r | 0.391*** | 0.185*** | 0.116*** |
|  | (5.019) <0.001 | (10.555) <0.001 | (4.305) <0.001 |
| ed76 |  |  | 0.161*** |
|  |  |  | (3.951) <0.001 |
| Num.Obs. | 3613 | 3010 | 3010 |
| R2 | 0.481 | 0.187 | 0.145 |
| R2 Adj. | 0.480 | 0.185 | 0.143 |
| AIC | 15205.5 | 3043.1 | 3196.0 |
| BIC | 15255.0 | 3091.2 | 3244.0 |
| Log.Lik. | -7594.738 | -1513.546 |  |
| F | 557.743 | 115.164 |  |
| RMSE | 1.98 | 0.40 | 0.41 |

Significance: + p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001. t statistics in parentheses, followed by p-values.
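In the first-stage regression, nearc4 has a significantly positive effect on education, indicating that it is a relevant instrument. In the reduced-form regression, nearc4 also has a significantly positive effect on wages (0.045, p = 0.009). The IV 2SLS column, however, reports iv_model2, which instruments ed76 with nearc4a and nearc4b rather than nearc4; there the coefficient on ed76 is 0.161 and statistically significant. The just-identified estimate that uses nearc4 alone (iv_model) is 0.132 (SE 0.049). Both exceed the OLS estimate of 0.074 and point to a sizable return to schooling, but a larger IV estimate does not by itself show that education is endogenous; the test below addresses that question. Finally, the first stage uses 3,613 observations while the reduced form and 2SLS use 3,010, so the ratio of the two nearc4 coefficients (0.045 / 0.306 ≈ 0.146) does not exactly reproduce the IV estimate of 0.132; on a common sample the two would coincide.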
Test if Education (ed76) is Endogenous
library(plm)
library(lmtest)
library(AER)
ols_model <- lm(lwage76 ~ ed76 + exp + I(exp^2 / 100) + black + reg76r + smsa76r, data = nls)
iv_model <- ivreg(lwage76 ~ ed76 + exp + I(exp^2 / 100) + black + reg76r + smsa76r |
nearc4a + nearc4b + exp + I(exp^2 / 100) + black + reg76r + smsa76r,
data = nls)
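# First-stage residuals; na.exclude pads dropped rows with NA so the column aligns with nls
nls$residuals_iv <- residuals(lm(ed76 ~ nearc4a + nearc4b + exp + I(exp^2 / 100) + black + reg76r + smsa76r,
                                 data = nls, na.action = na.exclude))
# Run OLS again with the first-stage residuals as an additional regressor (control-function test)
hausman_test <- lm(lwage76 ~ ed76 + residuals_iv + exp + I(exp^2 / 100) + black + reg76r + smsa76r, data = nls)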
# Check if residuals are significant
summary(hausman_test)
##
## Call:
## lm(formula = lwage76 ~ ed76 + residuals_iv + exp + I(exp^2/100) +
## black + reg76r + smsa76r, data = nls)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.60480 -0.22131 0.02145 0.24337 1.31871
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.15091 0.66535 4.736 2.28e-06 ***
## ed76 0.16887 0.03982 4.240 2.30e-05 ***
## residuals_iv -0.09565 0.04000 -2.391 0.0169 *
## exp 0.11745 0.01564 7.510 7.74e-14 ***
## I(exp^2/100) -0.20269 0.03300 -6.143 9.18e-10 ***
## black -0.09071 0.04496 -2.017 0.0437 *
## reg76r -0.08934 0.02119 -4.217 2.55e-05 ***
## smsa76r 0.11458 0.02502 4.580 4.84e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3739 on 3002 degrees of freedom
##   (603 observations deleted due to missingness)
## Multiple R-squared: 0.2919, Adjusted R-squared: 0.2902
## F-statistic: 176.7 on 7 and 3002 DF, p-value: < 2.2e-16
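Yes, the result indicates that education is endogenous in the wage regression. The coefficient on the first-stage residual (residuals_iv) is statistically significant (p = 0.0169), so the null that ed76 is exogenous is rejected at the 5% level: the OLS estimate of the return to education is correlated with the error term and therefore biased. This test speaks only to ed76. The significance of experience (exp) says nothing about whether exp is exogenous, and because exp is constructed as age76 - ed76 - 6, it inherits any endogeneity in ed76. Note also that the first stage here was fit on all 3,613 observations rather than the 3,010 used in the wage regression, which is why the ed76 coefficient (0.169) differs slightly from the 2SLS estimate in iv_model2 (0.161); on a common sample the two would be identical.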
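A version of the control-function test that fits the first stage and the augmented wage regression on the same rows can be sketched in base R as follows. The helper name `cf_endog_test` is hypothetical, and it reports conventional (non-robust) standard errors; a heteroskedasticity-robust variant could pass the fitted model to `lmtest::coeftest` with `sandwich::vcovHC`.

```r
# Hypothetical helper: control-function (regression-based Durbin-Wu-Hausman) test,
# with the first stage and the augmented outcome regression fit on the same rows.
cf_endog_test <- function(data, outcome, endog, instruments, controls) {
  fs_formula <- as.formula(paste(endog, "~", paste(c(instruments, controls), collapse = " + ")))
  vars <- unique(c(outcome, all.vars(fs_formula)))
  est <- data[complete.cases(data[vars]), ]               # common estimation sample
  est$v_hat <- residuals(lm(fs_formula, data = est))      # first-stage residuals
  cf_formula <- as.formula(
    paste(outcome, "~", paste(c(endog, "v_hat", controls), collapse = " + ")))
  tab <- summary(lm(cf_formula, data = est))$coefficients  # conventional (non-robust) SEs
  c(coef_endog = tab[endog, "Estimate"],
    coef_vhat  = tab["v_hat", "Estimate"],
    p_vhat     = tab["v_hat", "Pr(>|t|)"])
}

# Example call (not run here); coef_endog should equal the ed76 coefficient of iv_model2 (0.161):
# cf_endog_test(nls, "lwage76", "ed76", c("nearc4a", "nearc4b"),
#               c("exp", "I(exp^2 / 100)", "black", "reg76r", "smsa76r"))
```

Because the first-stage residuals are orthogonal to every first-stage regressor, the coefficient on the endogenous variable in this augmented regression reproduces the 2SLS point estimate exactly when both steps share one sample.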
nls$nearc4a_age <- nls$nearc4a * nls$age76
nls$nearc4a_age2 <- nls$nearc4a * (nls$age76^2 / 100)
iv_model_interact <- ivreg(lwage76 ~ ed76 + exp + I(exp^2 / 100) + black + reg76r + smsa76r |
nearc4a + nearc4b + nearc4a_age + nearc4a_age2 + exp + I(exp^2 / 100) + black + reg76r + smsa76r,
data = nls)
summary(iv_model_interact)
## Warning in printHypothesis(L, rhs, names(b)): one or more coefficients in the hypothesis include
## arithmetic operators in their names;
## the printed representation of the hypothesis will be omitted
##
## Call:
## ivreg(formula = lwage76 ~ ed76 + exp + I(exp^2/100) + black +
## reg76r + smsa76r | nearc4a + nearc4b + nearc4a_age + nearc4a_age2 +
## exp + I(exp^2/100) + black + reg76r + smsa76r, data = nls)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.61638 -0.22444 0.02206 0.24233 1.34656
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.590107 0.106727 43.008 < 2e-16 ***
## ed76 0.082539 0.006030 13.688 < 2e-16 ***
## exp 0.087094 0.006952 12.529 < 2e-16 ***
## I(exp^2/100) -0.224720 0.031817 -7.063 2.02e-12 ***
## black -0.181022 0.018325 -9.878 < 2e-16 ***
## reg76r -0.121940 0.015226 -8.009 1.64e-15 ***
## smsa76r 0.157018 0.015793 9.942 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3746 on 3003 degrees of freedom
## Multiple R-Squared: 0.2891, Adjusted R-squared: 0.2877
## Wald test: 161.6 on 6 and 3003 DF, p-value: < 2.2e-16
Compared to the previous IV model using only nearc4a and nearc4b as instruments, this extended model, which adds interactions of nearc4a with age76 and its square (nearc4a_age, nearc4a_age2), produces a substantially smaller coefficient on ed76: 0.083 instead of 0.161, close to the OLS estimate of 0.074. It remains highly statistically significant (p < 2e-16), and its standard error shrinks from 0.041 to 0.006, suggesting that the added instruments supply most of the identifying variation. One reading is that allowing the effect of college proximity on schooling to vary with age yields a more conservative estimate of the return to education. Another is that the interaction instruments are correlated with the error term and pull the estimate toward OLS, which the overidentification test below can help assess. The other coefficients remain qualitatively similar, suggesting that the core relationships are robust.
library(AER)
library(car)
# AER reports a Sargan overidentification test (which assumes homoskedasticity), not Hansen's J
hansen_test <- summary(iv_model_interact, diagnostics = TRUE)
## Warning in printHypothesis(L, rhs, names(b)): one or more coefficients in the hypothesis include
## arithmetic operators in their names;
## the printed representation of the hypothesis will be omitted
print(hansen_test$diagnostics)
## df1 df2 statistic p-value
## Weak instruments 4 3000 384.01518 2.345667e-267
## Wu-Hausman 1 3002 3.03355 8.166180e-02
## Sargan 3 NA 10.97760 1.184763e-02
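I examine the Wu-Hausman test result. The null hypothesis is that ed76 is exogenous (OLS is consistent), and the alternative is that ed76 is endogenous (IV is required). The p-value is 0.0817, above the conventional 5% threshold but below 10%. This is weak evidence against exogeneity: we cannot reject the null at the 5% level, although we can at the 10% level. It is also weaker than the control-function result above (p = 0.0169), which used only nearc4a and nearc4b as instruments. Thus, while the evidence is not definitive, it leans toward treating ed76 as potentially endogenous. The other diagnostics bear on how much weight this model deserves: the weak-instruments F-statistic (384.0) rules out weak instruments, but the Sargan test (10.98 on 3 degrees of freedom, p = 0.012) rejects the overidentifying restrictions at the 5% level, suggesting that at least one instrument, possibly one of the age interactions, violates the exclusion restriction.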
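To make the Sargan line less of a black box, the textbook n·R² version can be computed by hand: obtain the 2SLS residuals (using the actual, not fitted, endogenous regressor), regress them on all exogenous variables, and compare n·R² with a chi-squared distribution whose degrees of freedom equal the number of excluded instruments minus the number of endogenous regressors. The helper `sargan_test` below is a hypothetical base-R sketch for one endogenous regressor; it should closely track, though it is not guaranteed to be identical to, the Sargan statistic AER prints.

```r
# Hypothetical helper: Sargan (n * R^2) test of overidentifying restrictions,
# for one endogenous regressor and two or more excluded instruments.
sargan_test <- function(data, outcome, endog, instruments, controls) {
  exog_rhs <- paste(c(instruments, controls), collapse = " + ")
  fs_formula <- as.formula(paste(endog, "~", exog_rhs))
  vars <- unique(c(outcome, all.vars(fs_formula)))
  est <- data[complete.cases(data[vars]), ]

  # Manual 2SLS point estimates via the fitted-value second stage.
  est$endog_hat <- fitted(lm(fs_formula, data = est))
  ss_formula <- as.formula(paste(outcome, "~", paste(c("endog_hat", controls), collapse = " + ")))
  ss_fit <- lm(ss_formula, data = est)
  # 2SLS residuals must use the actual regressor, not its fitted value.
  est$u_2sls <- residuals(ss_fit) - coef(ss_fit)[["endog_hat"]] * (est[[endog]] - est$endog_hat)

  # Regress the 2SLS residuals on all exogenous variables; n * R^2 ~ chi^2(#instruments - 1).
  aux <- lm(as.formula(paste("u_2sls ~", exog_rhs)), data = est)
  stat <- nrow(est) * summary(aux)$r.squared
  dof <- length(instruments) - 1
  c(statistic = stat, df = dof, p = pchisq(stat, dof, lower.tail = FALSE))
}

# Example call mirroring iv_model_interact (not run here):
# sargan_test(nls, "lwage76", "ed76",
#             c("nearc4a", "nearc4b", "nearc4a_age", "nearc4a_age2"),
#             c("exp", "I(exp^2 / 100)", "black", "reg76r", "smsa76r"))
```

A rejection says the instruments imply mutually inconsistent estimates; it cannot say which instrument is at fault, so it is a signal to examine the added interaction instruments rather than proof that any particular one is invalid.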