\(~\) \(~\) \(~\)
\(~\)
data <- read.table("./Week 4 - Test.txt", header = TRUE, sep = ",")
data$exper2 <- data$exper^2
reg <- lm(logw ~ educ + exper + exper2 + smsa + south, data = data)
summary(reg)
##
## Call:
## lm(formula = logw ~ educ + exper + exper2 + smsa + south, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.71487 -0.22987 0.02268 0.24898 1.38552
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.6110144 0.0678950 67.914 < 2e-16 ***
## educ 0.0815797 0.0034990 23.315 < 2e-16 ***
## exper 0.0838357 0.0067735 12.377 < 2e-16 ***
## exper2 -0.0022021 0.0003238 -6.800 1.26e-11 ***
## smsa 0.1508006 0.0158360 9.523 < 2e-16 ***
## south -0.1751761 0.0146486 -11.959 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3813 on 3004 degrees of freedom
## Multiple R-squared: 0.2632, Adjusted R-squared: 0.2619
## F-statistic: 214.6 on 5 and 3004 DF, p-value: < 2.2e-16
\(~\)
The estimated β2 coefficient indicates that each one-unit increase in the education level is associated with an increase of 0.0816 in predicted log wage, that is, a wage roughly 8% higher, with all other variables held constant. This does not, however, necessarily mean that there is a causal relation between the two variables, given the possibility of endogeneity in the model.
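As a small worked check of the percentage interpretation, the sketch below converts the log-point coefficient into an exact percentage change in wage. The variable names are illustrative; the only input is the `educ` estimate from the output above.

```r
# Coefficient on educ copied from the OLS output above
b_educ <- 0.0815797

# Approximation: 100 * beta, valid for small coefficients
pct_approx <- 100 * b_educ

# Exact change implied by a log-linear model: 100 * (exp(beta) - 1)
pct_exact <- 100 * (exp(b_educ) - 1)

round(c(approx = pct_approx, exact = pct_exact), 2)
```

The two figures are close (about 8.2% against about 8.5%), which is why the coefficient is usually read directly as a percentage effect.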
\(~\)
\(~\)
This may be the case because factors like motivation, work ethic, efficiency and intelligence can affect both the education level and the wage of a person, but are not present in the model. They end up in the error term, which is then correlated with education, so OLS is biased and inconsistent. The coefficients of part A thus cannot be interpreted as causal effects.
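The mechanism can be illustrated with a toy simulation. This is only a sketch on made-up data: `ability`, `educ_sim` and `logw_sim` are hypothetical variables, and the direction of the bias is fixed by the way the data are generated, so it need not match the direction found in the actual data set.

```r
set.seed(1)
n <- 5000

# Hypothetical unobserved factor that raises both education and wage
ability  <- rnorm(n)
educ_sim <- 12 + 2 * ability + rnorm(n)
logw_sim <- 4 + 0.08 * educ_sim + 0.3 * ability + rnorm(n, sd = 0.3)

# Regression that omits ability (what can be estimated in practice)
short <- lm(logw_sim ~ educ_sim)

# Regression that includes ability (infeasible when it is unobserved)
long <- lm(logw_sim ~ educ_sim + ability)

b_short <- coef(short)[["educ_sim"]]
b_long  <- coef(long)[["educ_sim"]]
c(omitting_ability = b_short, including_ability = b_long)
```

Only the second regression recovers a coefficient close to the true value of 0.08 used in the simulation; the first one attributes part of the effect of `ability` to education.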
\(~\)
\(~\)
Age is strongly related to years of working experience, but is assumed to have no direct effect on wage beyond that channel, which makes it an adequate instrumental variable for experience.
\(~\)
\(~\)
data$age2 <- data$age^2
reg2 <- lm(formula = educ ~ age + age2 + smsa + south + nearc + daded + momed, data = data)
summary(reg2)
##
## Call:
## lm(formula = educ ~ age + age2 + smsa + south + nearc + daded +
## momed, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.2777 -1.5450 -0.2224 1.6957 7.2250
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.652354 3.976343 -1.421 0.155277
## age 0.989610 0.278714 3.551 0.000390 ***
## age2 -0.017019 0.004838 -3.518 0.000441 ***
## smsa 0.529566 0.101504 5.217 1.94e-07 ***
## south -0.424851 0.091037 -4.667 3.19e-06 ***
## nearc 0.264554 0.099085 2.670 0.007626 **
## daded 0.190443 0.015611 12.199 < 2e-16 ***
## momed 0.234515 0.017028 13.773 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.326 on 3002 degrees of freedom
## Multiple R-squared: 0.2466, Adjusted R-squared: 0.2448
## F-statistic: 140.4 on 7 and 3002 DF, p-value: < 2.2e-16
\(~\)
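These values indicate that the added variables are relevant instruments for education: each of them is statistically significant in the first-stage regression, with the most significant ones being education of the father and education of the mother (t-values of 12.2 and 13.8). Significance here establishes relevance only, not exogeneity.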
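Instrument relevance is usually judged jointly rather than coefficient by coefficient, with a first-stage F-test on the excluded instruments. A common rule of thumb treats F above 10 as a sign that the instruments are not weak. The sketch below shows the idea on simulated data; `z1`, `z2`, `w` and `x` are hypothetical names, not variables from the data set.

```r
set.seed(2)
n <- 2000

z1 <- rnorm(n)  # hypothetical instrument 1
z2 <- rnorm(n)  # hypothetical instrument 2
w  <- rnorm(n)  # hypothetical exogenous control

# Endogenous regressor that depends on the instruments and the control
x <- 0.5 * z1 + 0.3 * z2 + 0.2 * w + rnorm(n)

# First stage without and with the excluded instruments
restricted   <- lm(x ~ w)
unrestricted <- lm(x ~ w + z1 + z2)

# Joint F-test for the instruments (nested model comparison)
f_test <- anova(restricted, unrestricted)
f_stat <- f_test$F[2]
f_stat
```

Applied to this document's data, the analogous comparison for education would be `reg2` against a first stage containing only `smsa` and `south`.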
\(~\)
\(~\)
library(ivreg)  # ivreg() with a three-part formula: exogenous | endogenous | instruments
reg3 <- ivreg(logw ~ smsa + south | educ + exper + exper2 | age + age2 + nearc + daded + momed, data = data)
summary(reg3)
##
## Call:
## ivreg(formula = logw ~ smsa + south | educ + exper + exper2 |
## age + age2 + nearc + daded + momed, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7494 -0.2360 0.0266 0.2498 1.3468
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.4169039 0.1154208 38.268 < 2e-16 ***
## educ 0.0998429 0.0065738 15.188 < 2e-16 ***
## exper 0.0728669 0.0167134 4.360 1.35e-05 ***
## exper2 -0.0016393 0.0008381 -1.956 0.0506 .
## smsa 0.1349370 0.0167695 8.047 1.21e-15 ***
## south -0.1589869 0.0156854 -10.136 < 2e-16 ***
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments (educ) 5 3002 145.511 < 2e-16 ***
## Weak instruments (exper) 5 3002 1257.258 < 2e-16 ***
## Weak instruments (exper2) 5 3002 1098.430 < 2e-16 ***
## Wu-Hausman 2 3002 5.709 0.00335 **
## Sargan 2 NA 3.702 0.15705
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3844 on 3004 degrees of freedom
## Multiple R-Squared: 0.2512, Adjusted R-squared: 0.2499
## Wald test: 175.9 on 5 and 3004 DF, p-value: < 2.2e-16
\(~\)
Education now has a larger estimated effect on log wage (0.0998 against 0.0816 under OLS) and years of experience a smaller one (0.0729 against 0.0838), meaning that, if the instruments are valid, OLS was underestimating the former and overestimating the latter.
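The comparison can be laid out side by side. The sketch below simply copies the OLS and 2SLS estimates from the two outputs above into named vectors (the object names are illustrative) and computes the differences.

```r
# Estimates copied from the OLS and 2SLS outputs above
ols <- c(educ = 0.0815797, exper = 0.0838357, exper2 = -0.0022021)
iv  <- c(educ = 0.0998429, exper = 0.0728669, exper2 = -0.0016393)

# One row per coefficient: OLS, IV and the change when moving to IV
comparison <- cbind(ols, iv, diff = iv - ols)
round(comparison, 4)
```

A positive `diff` for `educ` and a negative one for `exper` correspond to the under- and overestimation described above.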
\(~\)
\(~\)
As seen in the output above, the Sargan test yields a statistic of 3.702 and a p-value of 0.15705. The null hypothesis that the instruments are exogenous is therefore not rejected at the 5% level, so there is no evidence against their validity.
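The reported p-value can be reproduced by hand. Under the null hypothesis the Sargan statistic follows a chi-squared distribution whose degrees of freedom equal the number of overidentifying restrictions: here 5 instruments minus 3 endogenous regressors, which matches the df of 2 in the output. A minimal sketch, with illustrative variable names:

```r
sargan_stat <- 3.702  # statistic from the diagnostics above
df_overid   <- 5 - 3  # instruments minus endogenous regressors

# Upper-tail probability of the chi-squared distribution
p_value <- pchisq(sargan_stat, df = df_overid, lower.tail = FALSE)
p_value

# Decision at the 5% level: TRUE would mean rejecting instrument exogeneity
p_value < 0.05
```

The result agrees with the reported 0.15705 up to rounding of the statistic.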
\(~\)