library(tidyverse)
library(scales)
options(scipen=999)

Make sure to include the unit of the values whenever appropriate.

Q1 Build a regression model to predict wages using the following predictors: 1) years of education, 2) years of experience, and 3) sex.

Hint: The variables are available in the CPS85 data set from the mosaicData package.

data(CPS85, package = "mosaicData")

wages_lm <- lm(wage ~ sex + exper + educ,
                data = CPS85)

# View summary of model 1
summary(wages_lm)
## 
## Call:
## lm(formula = wage ~ sex + exper + educ, data = CPS85)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.571 -2.746 -0.653  1.893 37.724 
## 
## Coefficients:
##             Estimate Std. Error t value             Pr(>|t|)    
## (Intercept) -6.50451    1.20985  -5.376      0.0000001141795 ***
## sexM         2.33763    0.38806   6.024      0.0000000031877 ***
## exper        0.11330    0.01671   6.781      0.0000000000319 ***
## educ         0.94051    0.07886  11.926 < 0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.454 on 530 degrees of freedom
## Multiple R-squared:  0.2532, Adjusted R-squared:  0.2489 
## F-statistic: 59.88 on 3 and 530 DF,  p-value: < 0.00000000000000022

Q2 Is the coefficient of education statistically significant at 5%?

The coefficient of education is statistically significant at the 5% level because its p-value (less than 0.0000000000000002) is well below 0.05.
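As a quick check, the p-value can be recomputed from the t value and the residual degrees of freedom printed in the summary. This is a minimal sketch: the numbers are copied by hand from the output above, and the variable names `t_educ`, `df_resid`, and `p_educ` are made up for illustration.

```r
# Values copied from the educ row of the model summary above.
t_educ   <- 11.926   # t value for educ
df_resid <- 530      # residual degrees of freedom

# Two-sided p-value from the t distribution.
p_educ <- 2 * pt(-abs(t_educ), df = df_resid)

# TRUE means the coefficient is significant at the 5% level.
p_educ < 0.05
```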

Q3 Interpret the coefficient of education.

Hint: Discuss both its sign and magnitude.

The coefficient of education is positive (0.94051): holding sex and experience constant, each additional year of education is associated with an increase of about 94 cents per hour in the predicted wage.

Q4 Is there evidence for gender discrimination in wages? Make your argument using the relevant test results.

Hint: Discuss all three aspects of the relevant predictor: 1) statistical significance, 2) sign, and 3) magnitude.

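There is evidence for gender discrimination in wages. The coefficient of sexM is statistically significant at 5% (its p-value, 0.0000000031877, is far below 0.05), its sign is positive, and its magnitude is 2.33763: holding education and experience constant, men are predicted to earn about 2.34 dollars per hour more than women.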
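One way to see significance, sign, and magnitude together is a 95% confidence interval for the sexM coefficient. The sketch below rebuilds it from the estimate and standard error printed in the summary; the variable names are hypothetical, and calling `confint()` on the fitted model should give essentially the same interval.

```r
# Values copied from the sexM row of the model summary above.
est_sexM <- 2.33763   # estimated wage gap, dollars per hour
se_sexM  <- 0.38806   # standard error of the estimate
df_resid <- 530       # residual degrees of freedom

# Critical t value for a two-sided 95% interval.
t_crit <- qt(0.975, df = df_resid)

# Lower and upper bounds of the interval, in dollars per hour.
ci_sexM <- est_sexM + c(-1, 1) * t_crit * se_sexM
round(ci_sexM, 2)
```

If the whole interval lies above zero, the positive gap is statistically significant at 5%.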

Q5 Predict wage for a woman who has 15 years of education, 5 years of experience.

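The predicted wage of a woman with 15 years of education and 5 years of experience is about 8.17 dollars per hour. You get this number by multiplying the education and experience coefficients by the respective years and adding the (negative) intercept; nothing is added for sex because female is the baseline category: -6.50451 + 0.11330 * 5 + 0.94051 * 15 = 8.16964.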
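The prediction can be reproduced by hand from the coefficients in the summary. This is a sketch using values copied from the output above; the names `b0`, `b_sexM`, `b_exper`, `b_educ`, and `wage_hat` are chosen only for illustration.

```r
# Coefficients copied from the model summary above.
b0      <- -6.50451   # intercept
b_sexM  <-  2.33763   # added only when sex is M
b_exper <-  0.11330   # per year of experience
b_educ  <-  0.94051   # per year of education

# Woman (sexM = 0), 5 years of experience, 15 years of education.
wage_hat <- b0 + b_sexM * 0 + b_exper * 5 + b_educ * 15

round(wage_hat, 2)   # 8.17 dollars per hour
```

With the fitted model object available, `predict()` with a one-row `newdata` data frame should return the same value up to rounding of the copied coefficients.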

Q6 Interpret the Intercept.

Hint: Provide a technical interpretation.

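The intercept is the predicted wage when all predictors are at 0, that is, for a woman (the baseline category of sex) with 0 years of education and 0 years of experience: -6.50451 dollars per hour. A negative wage has no practical meaning, so the intercept mainly serves to anchor the regression line.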

Q7 Build another model by adding a predictor to the model above. The additional predictor is whether the person is a union member. Which of the two models is better?

Hint: Discuss in terms of both residual standard error and reported adjusted R squared.

data(CPS85, package = "mosaicData")

# Store the second model under its own name so model 1 is not overwritten
wages_lm2 <- lm(wage ~ sex + exper + educ + union,
                data = CPS85)

# View summary of model 2
summary(wages_lm2)
## 
## Call:
## lm(formula = wage ~ sex + exper + educ + union, data = CPS85)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.496 -2.708 -0.712  1.909 37.784 
## 
## Coefficients:
##             Estimate Std. Error t value             Pr(>|t|)    
## (Intercept) -6.48023    1.20159  -5.393        0.00000010459 ***
## sexM         2.14765    0.39097   5.493        0.00000006145 ***
## exper        0.10692    0.01674   6.387        0.00000000037 ***
## educ         0.93495    0.07835  11.934 < 0.0000000000000002 ***
## unionUnion   1.47111    0.50932   2.888              0.00403 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.423 on 529 degrees of freedom
## Multiple R-squared:  0.2648, Adjusted R-squared:  0.2592 
## F-statistic: 47.62 on 4 and 529 DF,  p-value: < 0.00000000000000022

The second model is better. Its residual standard error is smaller (4.423 versus 4.454 dollars per hour), meaning the typical difference between the actual and the predicted wage is smaller. It also explains more of the variability in wages, with an adjusted R-squared of 25.92% compared with 24.89% for the first model.
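Adjusted R-squared penalizes the extra predictor, so it is the fairer number for comparing the two models. The sketch below recomputes it from the multiple R-squared values in the two summaries. The helper `adj_r2` is hypothetical, and the sample size of 534 is inferred from the residual degrees of freedom (530 + 4 coefficients in the first model, 529 + 5 in the second).

```r
# Adjusted R-squared from multiple R-squared, sample size n,
# and number of predictors p (not counting the intercept).
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

n <- 534   # inferred: residual df plus number of estimated coefficients

adj_model1 <- adj_r2(0.2532, n, p = 3)   # sex, exper, educ
adj_model2 <- adj_r2(0.2648, n, p = 4)   # sex, exper, educ, union

# TRUE means the model with union keeps its edge after the penalty.
adj_model2 > adj_model1
```

Because the inputs are rounded to four decimals, the recomputed values can differ from the reported ones in the last digit.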

Q8 Hide the messages, but display the code and its results on the webpage.

Hint: Use message, echo and results in the chunk options. Refer to the RMarkdown Reference Guide.
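One possible way to get this behaviour is to set the three options once for the whole document in a setup chunk. This is only a sketch, assuming the knitr package that R Markdown relies on for rendering; the same options can instead go in an individual chunk header, for example `{r, message=FALSE, echo=TRUE, results="markup"}`.

```r
# Intended for a setup chunk near the top of the .Rmd file.
knitr::opts_chunk$set(
  message = FALSE,     # hide messages such as package start-up notes
  echo    = TRUE,      # display the code
  results = "markup"   # display the results
)
```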

Q9 Display the title and your name correctly at the top of the webpage.

Q10 Use the correct slug.