6.27 It is often claimed that an extra year of work experience raises wages by 0.8% and that a year of education is worth 14 years of experience. If so, an additional year of education should raise wages by 14 × 0.8% = 11.2%. We test this claim using the data in cps5_small, keeping only observations with more than 7 years of education. All tests are performed at the 5% level of significance.
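library(car)  # provides linearHypothesis(); cps5_small is assumed to be loaded
# Keep only observations with more than 7 years of education
wage_data <- subset(cps5_small, educ > 7)
model <- lm(log(wage) ~ educ + exper, data = wage_data)
# Joint F-test of H0: both slope coefficients are zero
linearHypothesis(model, c("educ = 0", "exper = 0"), test = "F")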
##
## Linear hypothesis test:
## educ = 0
## exper = 0
##
## Model 1: restricted model
## Model 2: log(wage) ~ educ + exper
##
##   Res.Df    RSS Df Sum of Sq   F    Pr(>F)
## 1   1177 368.38
## 2   1175 261.86  2    106.52 239 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value (< 2.2e-16) is far below 0.05, so we reject the null hypothesis that the coefficients of EDUC and EXPER are both zero: the two variables are jointly significant in explaining log wages. Imposing the restriction raises the residual sum of squares (RSS) from 261.86 to 368.38, which is what produces the large F-statistic of 239. Note that this test addresses joint significance only; the claim itself corresponds to the null hypothesis that the coefficient of EDUC is 0.112 and the coefficient of EXPER is 0.008.
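The claim itself can be written as the joint null hypothesis that the coefficient of EDUC equals 0.112 and the coefficient of EXPER equals 0.008. The following is a minimal sketch of that test in base R, not part of the original solution. The helper name test_claim is hypothetical, and it assumes a data frame with columns wage, educ and exper. Under the null both slopes are known, so the restricted model estimates only the intercept and the fixed slopes enter through offset().

```r
# Hypothetical helper (an illustration, not the original solution):
# F-test of H0: beta_educ = b_educ and beta_exper = b_exper
# in the model log(wage) ~ educ + exper.
test_claim <- function(data, b_educ = 0.112, b_exper = 0.008) {
  # Unrestricted model: both slopes are estimated freely
  unrestricted <- lm(log(wage) ~ educ + exper, data = data)
  # Restricted model: slopes fixed at the claimed values via offset(),
  # so only the intercept is estimated
  restricted <- lm(log(wage) ~ 1 + offset(b_educ * educ + b_exper * exper),
                   data = data)
  # anova() compares the two residual sums of squares with an F-test
  anova(restricted, unrestricted)
}
```

With the data used here, test_claim(wage_data) would report the F-statistic on 2 and 1175 degrees of freedom. With car loaded, linearHypothesis(model, c("educ = 0.112", "exper = 0.008"), test = "F") should state the same hypothesis.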
(b) Use RESET to test the adequacy of the model. Perform the test with the squares of the predictions, and then with the squares and cubes of the predictions.
# Calculate the squared and cubed predictions
wage_data$predictions <- fitted(model)
wage_data$predictions_squared <- wage_data$predictions^2
wage_data$predictions_cubed <- wage_data$predictions^3
# Estimate a model with the squared and cubed predictions
reset_model <- lm(log(wage) ~ educ + exper + predictions_squared + predictions_cubed, data = wage_data)
# RESET test
linearHypothesis(reset_model, c("predictions_squared = 0", "predictions_cubed = 0"), test = "F")
##
## Linear hypothesis test:
## predictions_squared = 0
## predictions_cubed = 0
##
## Model 1: restricted model
## Model 2: log(wage) ~ educ + exper + predictions_squared + predictions_cubed
##
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)
## 1   1175 261.86
## 2   1173 259.85  2    2.0121 4.5415 0.01085 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value of this RESET test, which uses the squares and cubes of the predictions, is 0.01085. Since this is below 0.05, we reject the null hypothesis that the model is correctly specified. The log-linear model in EDUC and EXPER is therefore inadequate: it may omit relevant variables or need a more flexible functional form.
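Part (b) also asks for the version of RESET that uses only the squares of the predictions. A minimal base-R sketch covering both versions is given below. The helper name reset_test and the column names yhat_p2, yhat_p3 are hypothetical, and the data frame is assumed to contain wage, educ and exper with no missing values.

```r
# Hypothetical helper (an illustration, not the original solution):
# RESET for log(wage) ~ educ + exper, adding the chosen powers of the
# fitted values and testing them jointly with an F-test.
reset_test <- function(data, powers = 2:3) {
  base_fit <- lm(log(wage) ~ educ + exper, data = data)
  yhat <- fitted(base_fit)  # assumes no rows were dropped for missing values
  # Add one column per requested power of the fitted values
  extra <- paste0("yhat_p", powers)
  for (i in seq_along(powers)) {
    data[[extra[i]]] <- yhat^powers[i]
  }
  # Augmented model: original regressors plus the powers of the predictions
  aug_formula <- reformulate(c("educ", "exper", extra),
                             response = quote(log(wage)))
  aug_fit <- lm(aug_formula, data = data)
  # F-test of H0: all added powers have zero coefficients
  anova(base_fit, aug_fit)
}
```

Here reset_test(wage_data, powers = 2) would give the squares-only test, and reset_test(wage_data, powers = 2:3) states the same hypothesis as the test reported above. The lmtest package offers resettest(model, power = 2:3, type = "fitted") as a ready-made alternative.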
# Levels of education and experience at which the marginal effects are evaluated
educ_level <- 10
exper_level <- 5
# Estimate the extended model with quadratic and interaction terms
extended_model <- lm(log(wage) ~ educ + exper + I(educ^2) + I(exper^2) +
                       I(educ * exper), data = wage_data)
# Joint null: the marginal effects of educ and exper are both zero at these levels
hypothesis <- c(
  paste0("educ + ", 2 * educ_level, " * I(educ^2) + ", exper_level,
         " *I(educ * exper)=0"),
  paste0("exper + ", 2 * exper_level, " * I(exper^2) + ", educ_level,
         " *I(educ * exper)=0")
)
test_result <- linearHypothesis(extended_model, hypothesis, test = "F")
print(test_result)
##
## Linear hypothesis test:
## educ + 20 * I(educ^2) + 5 *I(educ * exper) = 0
## exper + 10 * I(exper^2) + 10 *I(educ * exper) = 0
##
## Model 1: restricted model
## Model 2: log(wage) ~ educ + exper + I(educ^2) + I(exper^2) + I(educ *
## exper)
##
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)
## 1   1174 273.34
## 2   1172 252.71  2    20.625 47.827 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
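The p-value is far below 0.05, so we reject the null hypothesis. The null here is that the marginal effects of EDUC and EXPER on log(wage), evaluated at EDUC = 10 and EXPER = 5, are both zero, so the rejection means that at least one of these marginal effects is nonzero at that point. The test does not compare the extended model with the simpler one; judging whether the quadratic and interaction terms improve the fit would require testing that their three coefficients are jointly zero. Testing the claim itself would set the two marginal effects equal to 0.112 and 0.008 rather than to zero.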
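In the extended model the marginal effect of EDUC on log(wage) is the coefficient of educ plus 2 × EDUC times the coefficient of I(educ^2) plus EXPER times the coefficient of I(educ * exper), and similarly for EXPER. The sketch below tests whether these two marginal effects equal the claimed 0.112 and 0.008 at a chosen point, using a Wald F-statistic computed in base R. It is an illustration under stated assumptions, not the original solution: the helper name marginal_effect_test and its arguments are hypothetical, and the data frame is assumed to contain wage, educ and exper.

```r
# Hypothetical helper (an illustration, not the original solution):
# Wald F-test that the marginal effects of educ and exper, evaluated at
# (educ_level, exper_level), equal the claimed values.
marginal_effect_test <- function(data, educ_level, exper_level,
                                 claim = c(0.112, 0.008)) {
  fit <- lm(log(wage) ~ educ + exper + I(educ^2) + I(exper^2) +
              I(educ * exper), data = data)
  b <- coef(fit)
  # Restriction matrix: one row per marginal effect, filled in by name
  R <- matrix(0, nrow = 2, ncol = length(b),
              dimnames = list(NULL, names(b)))
  R[1, c("educ", "I(educ^2)", "I(educ * exper)")] <-
    c(1, 2 * educ_level, exper_level)
  R[2, c("exper", "I(exper^2)", "I(educ * exper)")] <-
    c(1, 2 * exper_level, educ_level)
  # Distance between the estimated marginal effects and the claim
  gap <- R %*% b - claim
  J <- nrow(R)  # number of restrictions
  F_stat <- as.numeric(t(gap) %*% solve(R %*% vcov(fit) %*% t(R)) %*% gap) / J
  p_value <- pf(F_stat, J, df.residual(fit), lower.tail = FALSE)
  c(F = F_stat, df1 = J, df2 = df.residual(fit), p_value = p_value)
}
```

For the point used above, the call would be marginal_effect_test(wage_data, educ_level = 10, exper_level = 5). Setting claim = c(0, 0) states the same null as the linearHypothesis() call above; with car, the claim can also be tested by replacing "=0" in the two hypothesis strings with "=0.112" and "=0.008".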