library(tidyverse)
library(scales)
options(scipen=999)
Make sure to include the unit of the values whenever appropriate.
Hint: The variables are available in the CPS85 data set from the mosaicData package.
data(CPS85, package="mosaicData")
wages_lm <- lm(wage ~ educ + exper + sex,
data = CPS85)
# View summary of model 1
summary(wages_lm)
##
## Call:
## lm(formula = wage ~ educ + exper + sex, data = CPS85)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.571 -2.746 -0.653 1.893 37.724
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.50451 1.20985 -5.376 0.0000001141795 ***
## educ 0.94051 0.07886 11.926 < 0.0000000000000002 ***
## exper 0.11330 0.01671 6.781 0.0000000000319 ***
## sexM 2.33763 0.38806 6.024 0.0000000031877 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.454 on 530 degrees of freedom
## Multiple R-squared: 0.2532, Adjusted R-squared: 0.2489
## F-statistic: 59.88 on 3 and 530 DF, p-value: < 0.00000000000000022
The coefficient of education is statistically significant at the 5% level because its p-value (< 0.0000000000000002) is far below 0.05.
Hint: Discuss both its sign and magnitude.
The coefficient of education is positive: each additional year of education is associated with an increase of about 0.94 dollars (94 cents) per hour in wage, holding experience and sex constant.
Hint: Discuss all three aspects of the relevant predictor: 1) statistical significance, 2) sign, and 3) magnitude.
There is evidence for gender discrimination in wages. The coefficient of sexM is statistically significant at 5% (its p-value is far below 0.05), its sign is positive, and its magnitude is about 2.34: men are predicted to earn about 2.34 dollars per hour more than women with the same education and experience.
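The predicted wage of a woman with 15 years of education and 5 years of experience would be about 8.17 dollars per hour. This number is obtained by multiplying the education and experience coefficients by the respective number of years and adding the (negative) intercept: -6.50451 + 0.94051 × 15 + 0.11330 × 5 = 8.17. The sexM term drops out because it is zero for a woman.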
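The arithmetic behind this prediction can be checked with a short sketch. The coefficients are copied by hand from the model 1 summary above; the names `coefs`, `worker` and `predicted_wage` are introduced here only for illustration.

```r
# Coefficients copied from the summary(wages_lm) output for model 1
coefs <- c(intercept = -6.50451, educ = 0.94051, exper = 0.11330, sexM = 2.33763)

# Hypothetical worker: a woman (sexM = 0) with 15 years of education
# and 5 years of experience; the intercept always enters with value 1
worker <- c(intercept = 1, educ = 15, exper = 5, sexM = 0)

# Predicted wage in dollars per hour: sum of coefficient * predictor value
predicted_wage <- sum(coefs * worker)
round(predicted_wage, 2)  # 8.17
```

With the fitted model still in memory, `predict(wages_lm, newdata = data.frame(educ = 15, exper = 5, sex = "F"))` should give the same value, assuming the female level of `sex` in CPS85 is coded `"F"`.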
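Hint: Provide a technical interpretation.
The intercept is the predicted wage when all predictors are zero, that is, for a woman (sexM = 0) with 0 years of education and 0 years of experience: -6.50451 dollars per hour. A negative wage has no practical meaning; the intercept only anchors the regression line.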
Hint: Discuss in terms of both residual standard error and reported adjusted R squared.
data(CPS85, package="mosaicData")
wages_lm <- lm(wage ~ educ + exper + sex + union,
data = CPS85)
# View summary of model 2
summary(wages_lm)
##
## Call:
## lm(formula = wage ~ educ + exper + sex + union, data = CPS85)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.496 -2.708 -0.712 1.909 37.784
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.48023 1.20159 -5.393 0.00000010459 ***
## educ 0.93495 0.07835 11.934 < 0.0000000000000002 ***
## exper 0.10692 0.01674 6.387 0.00000000037 ***
## sexM 2.14765 0.39097 5.493 0.00000006145 ***
## unionUnion 1.47111 0.50932 2.888 0.00403 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.423 on 529 degrees of freedom
## Multiple R-squared: 0.2648, Adjusted R-squared: 0.2592
## F-statistic: 47.62 on 4 and 529 DF, p-value: < 0.00000000000000022
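Rather than reading the fit statistics off two separate summaries, the two models can be placed side by side. The following is a minimal sketch, assuming the mosaicData package is installed; the object names `model_1`, `model_2` and `comparison` are introduced here for illustration and are not part of the original code, which reuses `wages_lm` for both fits.

```r
# Run only if the mosaicData package (source of CPS85) is available
if (requireNamespace("mosaicData", quietly = TRUE)) {
  data(CPS85, package = "mosaicData")

  # Fit both models under separate names so neither overwrites the other
  model_1 <- lm(wage ~ educ + exper + sex, data = CPS85)
  model_2 <- lm(wage ~ educ + exper + sex + union, data = CPS85)

  # Collect residual standard error (dollars per hour) and adjusted R-squared
  comparison <- data.frame(
    model = c("model 1", "model 2"),
    residual_se = c(sigma(model_1), sigma(model_2)),
    adj_r_squared = c(summary(model_1)$adj.r.squared,
                      summary(model_2)$adj.r.squared)
  )
  print(comparison)
}
```

A lower `residual_se` and a higher `adj_r_squared` both point toward the better-fitting model.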
The second model is better. Its residual standard error is lower (4.423 versus 4.454 dollars per hour), meaning its predictions are, on average, closer to the actual wages in the data. Its adjusted R-squared is also higher (0.2592 versus 0.2489), meaning model 2 explains a larger share of the variation in wage even after accounting for the extra predictor.

## Q8 Hide the messages, but display the code and its results on the webpage.
Hint: Use message, echo and results in the chunk options. Refer to the RMarkdown Reference Guide.
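One way the three chunk options named in the hint might be combined is sketched below. The chunk body is only a placeholder; `message=FALSE` suppresses package start-up messages, `echo=TRUE` keeps the code visible, and `results='markup'` (the knitr default) prints the output.

````markdown
```{r, message=FALSE, echo=TRUE, results='markup'}
library(tidyverse)
summary(wages_lm)
```
````

The same options can instead be set once for the whole document with `knitr::opts_chunk$set(message = FALSE, echo = TRUE, results = 'markup')` in a setup chunk.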