Make sure to include the unit of the values whenever appropriate.
Hint: The variables are available in the CPS85 data set from the mosaicData package.
data(CPS85, package="mosaicData")
wage_predict <- lm(wage ~ educ + exper + sex,
                   data = CPS85)
# View summary of model 1
summary(wage_predict)
##
## Call:
## lm(formula = wage ~ educ + exper + sex, data = CPS85)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
##  -9.571  -2.746  -0.653   1.893  37.724
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.50451    1.20985  -5.376 1.14e-07 ***
## educ         0.94051    0.07886  11.926  < 2e-16 ***
## exper        0.11330    0.01671   6.781 3.19e-11 ***
## sexM         2.33763    0.38806   6.024 3.19e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.454 on 530 degrees of freedom
## Multiple R-squared: 0.2532, Adjusted R-squared: 0.2489
## F-statistic: 59.88 on 3 and 530 DF, p-value: < 2.2e-16
No. The coefficient of education is statistically significant at the 5% level: its p-value (< 2e-16) is far below 0.05.
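The p-value in the summary comes from a two-sided t test of the null hypothesis that the coefficient is zero. The sketch below recomputes it from the printed t value and the residual degrees of freedom; `two_sided_p` is a hypothetical helper name, and the inputs are the rounded values copied from the output, so results match the table only up to rounding.

```r
# Two-sided p-value for a coefficient's t test: P(|T| >= |t|) with T ~ t(df)
two_sided_p <- function(t_value, df) {
  2 * pt(-abs(t_value), df = df)
}

p_educ <- two_sided_p(11.926, df = 530)  # educ in model 1 (530 residual df)
p_educ < 0.05                            # TRUE: significant at the 5% level

# The same helper reproduces a p-value that is printed in full,
# e.g. sexM in model 1 (t = 6.024)
two_sided_p(6.024, df = 530)
```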
Hint: Discuss both its sign and magnitude. The coefficient of education is 0.94051, which is positive: more education is associated with a higher wage. Holding experience and sex constant, each additional year of education increases the predicted hourly wage by about $0.94, so every extra year of schooling is worth roughly 94 cents per hour.
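Because the model is linear in `educ`, the magnitude scales directly with the number of extra years. A minimal sketch, using the estimate and standard error copied from the model 1 summary, converts the slope into dollar amounts and adds a 95% confidence interval for the per-year effect (the variable names are illustrative only):

```r
b_educ   <- 0.94051  # dollars per hour per extra year of education (model 1)
se_educ  <- 0.07886  # standard error of that estimate
df_resid <- 530      # residual degrees of freedom of model 1

# Predicted hourly wage difference for 1 and 4 extra years of education,
# holding experience and sex fixed
gain <- b_educ * c(1, 4)
sprintf("$%.2f per hour", gain)   # "$0.94 per hour" "$3.76 per hour"

# 95% confidence interval for the per-year effect (dollars per hour)
ci_educ <- b_educ + c(-1, 1) * qt(0.975, df = df_resid) * se_educ
round(ci_educ, 2)                 # about 0.79 to 1.10
```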
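Hint: Discuss all three aspects of the relevant predictor: 1) statistical significance, 2) sign, and 3) magnitude. Yes, there is evidence of gender discrimination in wages, and all three aspects of the sexM coefficient point the same way. 1) It is statistically significant: its p-value (3.19e-09, marked ***) is far below 0.05. 2) Its sign is positive, so men earn more than women. 3) Its magnitude is 2.33763, meaning that a man earns about $2.34 more per hour than a woman with the same education and experience. A wage gap of this size that remains after controlling for education and experience is consistent with gender discrimination, although the model cannot rule out other factors it does not include.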
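The three aspects named in the hint can be read off a single coefficient mechanically. The sketch below does this for `sexM` using the estimate and standard error copied from the model 1 summary; `describe_coef` is a hypothetical helper, and it ties the significance level to the confidence level (alpha = 1 - level).

```r
describe_coef <- function(estimate, std_error, df, level = 0.95) {
  alpha   <- 1 - level
  t_value <- estimate / std_error
  p_value <- 2 * pt(-abs(t_value), df = df)
  margin  <- qt(1 - alpha / 2, df = df) * std_error
  list(
    p_value     = p_value,
    significant = p_value < alpha,                                    # 1) significance
    sign        = c("negative", "zero", "positive")[sign(estimate) + 2], # 2) sign
    conf_int    = estimate + c(-1, 1) * margin                        # 3) magnitude with uncertainty
  )
}

# sexM in model 1: dollars per hour, men relative to women (the baseline level)
sex_effect <- describe_coef(2.33763, 0.38806, df = 530)
str(sex_effect)
```

The interval (roughly $1.58 to $3.10 per hour) gives a sense of how precisely the gap is estimated, which a single point estimate does not.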
If a woman (sexM = 0) had 15 years of education and 5 years of experience, model 1 predicts -6.50451 + 0.94051 × 15 + 0.11330 × 5 ≈ $8.17 an hour.
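The prediction is just the fitted equation evaluated at the given values. A minimal sketch with the coefficients copied from the model 1 summary (`predict_wage` is a hypothetical helper; `male` is 1 for men and 0 for women, mirroring the `sexM` dummy):

```r
# Model 1 coefficients, copied from the summary above
b <- c(intercept = -6.50451, educ = 0.94051, exper = 0.11330, sexM = 2.33763)

# Predicted hourly wage (dollars) from education, experience and sex
predict_wage <- function(educ, exper, male) {
  unname(b["intercept"] + b["educ"] * educ + b["exper"] * exper + b["sexM"] * male)
}

woman <- predict_wage(educ = 15, exper = 5, male = 0)
sprintf("$%.2f per hour", woman)   # "$8.17 per hour"
```

With the fitted model in memory, `predict(wage_predict, newdata = data.frame(educ = 15, exper = 5, sex = "F"))` on model 1 should give the same value up to rounding of the printed coefficients (this assumes the female level of `sex` is coded `"F"`, as the `sexM` label suggests).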
Hint: Provide a technical interpretation. The intercept is -6.50451: it is the predicted hourly wage (about -$6.50 an hour) for a woman (the baseline sex category) with zero years of education and zero years of experience. A negative wage is impossible, so the intercept has no meaningful practical interpretation here; it mainly fixes the level of the regression line.
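One way to see why the intercept is not meaningful on its own is to ask how much education a woman with no experience would need before her predicted wage even turns positive. A short sketch using the model 1 coefficients (the variable names are illustrative):

```r
b0     <- -6.50451  # intercept: predicted wage (dollars/hour), female, educ = 0, exper = 0
b_educ <-  0.94051  # dollars per hour per year of education

# Solve b0 + b_educ * educ = 0 for educ (experience held at zero)
break_even_educ <- -b0 / b_educ
round(break_even_educ, 2)   # 6.92
```

So the fitted line predicts a negative wage for anyone with fewer than about 7 years of schooling and no experience; a prediction at zero education is an extrapolation if few or no workers in the sample have that little schooling.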
Hint: Discuss in terms of both residual standard error and reported adjusted R squared.
data(CPS85, package="mosaicData")
wage_predict <- lm(wage ~ educ + exper + sex + union,
                   data = CPS85)
# View summary of model 2
summary(wage_predict)
##
## Call:
## lm(formula = wage ~ educ + exper + sex + union, data = CPS85)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
##  -9.496  -2.708  -0.712   1.909  37.784
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.48023    1.20159  -5.393 1.05e-07 ***
## educ         0.93495    0.07835  11.934  < 2e-16 ***
## exper        0.10692    0.01674   6.387 3.70e-10 ***
## sexM         2.14765    0.39097   5.493 6.14e-08 ***
## unionUnion   1.47111    0.50932   2.888  0.00403 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.423 on 529 degrees of freedom
## Multiple R-squared: 0.2648, Adjusted R-squared: 0.2592
## F-statistic: 47.62 on 4 and 529 DF, p-value: < 2.2e-16
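Adjusted R-squared penalizes the multiple R-squared for the number of predictors, which is what makes it suitable for comparing the two models. The sketch below reproduces both reported values from the multiple R-squared, the sample size n and the number of predictors k; `adj_r2` is a hypothetical helper, and n = 534 is inferred from the 530 residual degrees of freedom plus the 4 estimated coefficients of model 1.

```r
# Adjusted R-squared from multiple R-squared, sample size n and
# number of predictors k (excluding the intercept)
adj_r2 <- function(r2, n, k) {
  1 - (1 - r2) * (n - 1) / (n - k - 1)
}

n <- 534  # 530 residual df + 4 coefficients in model 1 (529 + 5 in model 2)
adj_r2(0.2532, n, k = 3)  # model 1: close to the reported 0.2489
adj_r2(0.2648, n, k = 4)  # model 2: close to the reported 0.2592
```

Small differences in the fourth decimal place come from the rounded R-squared values used as inputs.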
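In the first model, the adjusted R-squared was 0.2489, while in the second model it was 0.2592. The residual standard error was $4.454 per hour in the first model and $4.423 per hour in the second. Adding union therefore raises the adjusted R-squared slightly and lowers the residual standard error by about 3 cents per hour, and the union coefficient is itself statistically significant (p = 0.00403). Because both changes are small, neither model is dramatically better, but the new model fits somewhat better and would be the one to choose.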
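Because model 1 is nested in model 2, the comparison can also be framed as a partial F test of whether adding `union` improves the fit. A minimal sketch computed from the printed R-squared values (so only approximate); with a single added predictor, the F statistic should roughly equal the square of the union t value:

```r
r2_small <- 0.2532  # model 1: educ + exper + sex
r2_big   <- 0.2648  # model 2: adds union
df_big   <- 529     # residual degrees of freedom of model 2
q        <- 1       # number of predictors added

f_stat <- ((r2_big - r2_small) / q) / ((1 - r2_big) / df_big)
p_val  <- pf(f_stat, q, df_big, lower.tail = FALSE)
c(F = f_stat, p = p_val)

# With one added predictor, F should be close to the squared t value of union
2.888^2
```

If the two fits were stored under separate names (for example, hypothetical `model1` and `model2` instead of reusing `wage_predict`), `anova(model1, model2)` would run the same test on the exact sums of squares.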
Hint: Use message, echo and results in the chunk options. Refer to the RMarkdown Reference Guide.
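As a sketch of how these options are typically combined, document-wide defaults can be set once in the setup chunk with `knitr::opts_chunk$set()` and overridden per chunk in the chunk header. Which chunks should hide code or output is an assumption here; adjust it to what the assignment asks for.

```r
# In the setup chunk of the .Rmd file: suppress package-loading messages
# everywhere, but keep code and printed results visible by default.
knitr::opts_chunk$set(message = FALSE, echo = TRUE, results = "markup")

# Individual chunks can override these defaults in their header, e.g.
#   {r, echo=FALSE}                   show output but not the code
#   {r, results='hide'}               show the code but hide printed output
#   {r, echo=FALSE, results='hide'}   run the chunk silently
```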