Make sure to include the unit of the values whenever appropriate.

Q1 Build a regression model to predict wages using the following predictors: 1) years of education, 2) years of experience, and 3) sex.

Hint: The variables are available in the CPS85 data set from the mosaicData package.

data(CPS85, package="mosaicData")
wage_predict <- lm(wage ~ educ + exper + sex, 
                data = CPS85)

# View summary of model 1
summary(wage_predict)
## 
## Call:
## lm(formula = wage ~ educ + exper + sex, data = CPS85)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.571 -2.746 -0.653  1.893 37.724 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -6.50451    1.20985  -5.376 1.14e-07 ***
## educ         0.94051    0.07886  11.926  < 2e-16 ***
## exper        0.11330    0.01671   6.781 3.19e-11 ***
## sexM         2.33763    0.38806   6.024 3.19e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.454 on 530 degrees of freedom
## Multiple R-squared:  0.2532, Adjusted R-squared:  0.2489 
## F-statistic: 59.88 on 3 and 530 DF,  p-value: < 2.2e-16
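Reading the coefficients off the table gives the fitted equation for Model 1. Here wage is in dollars per hour, educ and exper are in years, and sexM is an indicator equal to 1 for men and 0 for women (the reference level):

```latex
\widehat{\text{wage}} = -6.50451 + 0.94051\,\text{educ} + 0.11330\,\text{exper} + 2.33763\,\text{sexM}
```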

Q2 Is the coefficient of education statistically significant at 5%?

Yes, the coefficient of education is statistically significant at the 5% level. Its p-value (< 2e-16) is far below 0.05.
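As a rough hand check, the t statistic and its two-sided p-value can be recomputed with base R's `pt()` and `qt()`. This sketch hard-codes the estimate and standard error printed in the Model 1 summary rather than reading them from the fitted object:

```r
# Estimate and standard error copied from the Model 1 summary above.
est_educ <- 0.94051
se_educ  <- 0.07886
df_resid <- 530                      # residual degrees of freedom

t_educ  <- est_educ / se_educ        # about 11.93, matching the reported t value
p_value <- 2 * pt(-abs(t_educ), df = df_resid)  # two-sided p-value
t_crit  <- qt(0.975, df = df_resid)  # critical value for a 5% two-sided test

cat("t =", round(t_educ, 3), " critical t =", round(t_crit, 3),
    " p =", signif(p_value, 3), "\n")
cat("Significant at 5%:", p_value < 0.05, "\n")
```

Because |t| is far larger than the critical value of roughly 1.96, the p-value is far below 0.05.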

Q3 Interpret the coefficient of education.

Hint: Discuss both its sign and magnitude.

The coefficient of education is 0.94051. Its positive sign means that more education is associated with a higher wage. Holding experience and sex constant, each additional year of education is associated with an increase of about $0.94 (94 cents) in the predicted hourly wage.
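A small sketch of what "holding experience and sex constant" means. The hypothetical helper `fitted_wage()` below rebuilds Model 1's equation from the printed coefficients. Adding one year of education changes the fitted wage by the same amount regardless of the starting point:

```r
# Coefficients copied from the Model 1 summary (wage in dollars per hour).
b0 <- -6.50451; b_educ <- 0.94051; b_exper <- 0.11330; b_male <- 2.33763

# Hypothetical helper: Model 1's fitted wage for given predictor values
# (male = 1 for men, 0 for women).
fitted_wage <- function(educ, exper, male) {
  b0 + b_educ * educ + b_exper * exper + b_male * male
}

# Two different baselines. Each comparison adds one year of education
# while keeping experience and sex fixed.
gain_woman <- fitted_wage(13, 5, 0) - fitted_wage(12, 5, 0)
gain_man   <- fitted_wage(17, 20, 1) - fitted_wage(16, 20, 1)

cat(sprintf("Gain for one extra year: $%.2f and $%.2f per hour\n",
            gain_woman, gain_man))
```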

Q4 Is there evidence for gender discrimination in wages? Make your argument using the relevant test results.

Hint: Discuss all three aspects of the relevant predictor: 1) statistical significance, 2) sign, and 3) magnitude.

Yes, there is evidence of gender discrimination in wages. 1) The sexM coefficient is statistically significant: its p-value (3.19e-09) is far below 0.05. 2) Its sign is positive, so men are predicted to earn more than women, who are the reference level. 3) Its magnitude is 2.33763. Holding education and experience constant, men are predicted to earn about $2.34 more per hour than women.
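A 95% confidence interval tells the same story as the p-value. The sketch below builds the interval by hand from the printed estimate and standard error. With the fitted model, `confint(wage_predict)` should give the same interval up to rounding:

```r
# sexM estimate and standard error copied from Model 1 (male minus female
# predicted wage, in dollars per hour).
est_male <- 2.33763
se_male  <- 0.38806
df_resid <- 530

margin  <- qt(0.975, df = df_resid) * se_male
ci_male <- c(lower = est_male - margin, upper = est_male + margin)
print(round(ci_male, 2))

# The interval lies entirely above zero, which agrees with the small
# p-value and the positive sign of the estimate.
```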

Q5 Predict wage for a woman who has 15 years of education, 5 years of experience.

A woman with 15 years of education and 5 years of experience is predicted to earn about $8.17 per hour (-6.50451 + 0.94051 × 15 + 0.11330 × 5 ≈ 8.17).
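The prediction can be checked with the printed coefficients, as in the sketch below. With the fitted object itself, `predict(wage_predict, newdata = data.frame(educ = 15, exper = 5, sex = "F"))` should give the same value up to rounding of the coefficients:

```r
# Model 1 coefficients copied from the summary output (dollars per hour).
b <- c(intercept = -6.50451, educ = 0.94051, exper = 0.11330, sexM = 2.33763)

# Predictor values for the woman in Q5. sexM = 0 because female is the
# reference level, and the leading 1 multiplies the intercept.
x <- c(intercept = 1, educ = 15, exper = 5, sexM = 0)

wage_hat <- sum(b * x)
cat(sprintf("Predicted wage: $%.2f per hour\n", wage_hat))
```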

Q6 Interpret the Intercept.

Hint: Provide a technical interpretation.

The intercept is -6.50451. This is the predicted hourly wage, about -$6.50, for a woman (the reference level of sex) with zero years of education and zero years of experience. A negative wage is impossible, so the intercept has no practical meaning here and only serves to position the regression line.
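One way to see why the intercept is not meaningful is that it comes from extending the straight line to a point where wages cannot exist. The sketch below uses the printed intercept and education slope to find how many years of education a woman with no experience needs before Model 1's fitted wage reaches zero:

```r
# Model 1 intercept and education slope, copied from the summary output.
b0     <- -6.50451   # fitted wage ($/hour) for a woman with educ = 0, exper = 0
b_educ <-  0.94051   # dollars per hour per year of education

# Solve b0 + b_educ * educ = 0 for educ (experience = 0, female).
educ_at_zero_wage <- -b0 / b_educ
cat(sprintf("Fitted wage is negative below %.1f years of education\n",
            educ_at_zero_wage))
```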

Q7 Build another model by adding a predictor to the model above. The additional predictor is whether the person is a union member. Which of the two models is better?

Hint: Discuss in terms of both residual standard error and reported adjusted R squared.

data(CPS85, package="mosaicData")
# Store model 2 under a new name so model 1 is still available for comparison
wage_predict2 <- lm(wage ~ educ + exper + sex + union, 
                data = CPS85)

# View summary of model 2
summary(wage_predict2)
## 
## Call:
## lm(formula = wage ~ educ + exper + sex + union, data = CPS85)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.496 -2.708 -0.712  1.909 37.784 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -6.48023    1.20159  -5.393 1.05e-07 ***
## educ         0.93495    0.07835  11.934  < 2e-16 ***
## exper        0.10692    0.01674   6.387 3.70e-10 ***
## sexM         2.14765    0.39097   5.493 6.14e-08 ***
## unionUnion   1.47111    0.50932   2.888  0.00403 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.423 on 529 degrees of freedom
## Multiple R-squared:  0.2648, Adjusted R-squared:  0.2592 
## F-statistic: 47.62 on 4 and 529 DF,  p-value: < 2.2e-16

In the first model, the adjusted R-squared was 0.2489. In the second model it was 0.2592. The residual standard error fell from 4.454 to 4.423 dollars per hour. The second model is therefore better on both criteria: after adjusting for the extra predictor, it explains slightly more of the variation in wages, and its typical prediction error is slightly smaller. The improvement is modest, but the union coefficient is itself statistically significant (p = 0.00403), so the new model is preferred.
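Adjusted R-squared penalizes extra predictors, yet it can still rise. The sketch below shows this by recomputing adjusted R-squared from the printed multiple R-squared values. It also recovers the partial F-test for adding union from its t value. All inputs are copied from the two summaries, so the results match the printed values only up to rounding (Model 1's adjusted R-squared comes out as about 0.2490 rather than 0.2489). With both fitted models available, calling `anova()` on the two nested models should report the same F test:

```r
# Adjusted R-squared from the reported multiple R-squared:
#   adj R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)
# n = 534 observations (530 residual df + 4 coefficients in Model 1).
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 534
adj1 <- adj_r2(0.2532, n, p = 3)   # Model 1: educ, exper, sex
adj2 <- adj_r2(0.2648, n, p = 4)   # Model 2: adds union

# Model 2 differs from Model 1 by one coefficient, so the partial
# F statistic equals the squared t value of unionUnion.
f_union <- 2.888^2
p_union <- pf(f_union, df1 = 1, df2 = 529, lower.tail = FALSE)

cat(sprintf("Adjusted R2: %.4f vs %.4f\n", adj1, adj2))
cat(sprintf("Partial F = %.2f, p = %.4f\n", f_union, p_union))
```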

Q8 Hide the messages, but display the code and its results on the webpage.

Hint: Use message, echo and results in the chunk options. Refer to the RMarkdown Reference Guide.
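These options can go in an individual chunk header, for example `{r, message=FALSE, echo=TRUE, results='markup'}`. To apply them to every chunk instead, one option is a setup chunk like the sketch below. It assumes the knitr package, which R Markdown uses to run chunks, and calls knitr's `opts_chunk$set()`. Because `results = "markup"` is knitr's default, it could be left out. It is spelled out here only to mirror the hint.

```r
# Setup-chunk sketch: set chunk options once for the whole document.
# message = FALSE hides messages, echo = TRUE shows the code, and
# results = "markup" shows the printed output.
knitr::opts_chunk$set(message = FALSE, echo = TRUE, results = "markup")
```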

Q9 Display the title and your name correctly at the top of the webpage.

Q10 Use the correct slug.