Make sure to include the unit of the values whenever appropriate.
Hint: The variables are available in the CPS85 data set from the mosaicData package.
library(tidyverse)
data(CPS85, package="mosaicData")
wages_lm <- lm(wage ~ educ + exper + sex,
data = CPS85)
# View summary of model 1
summary(wages_lm)
##
## Call:
## lm(formula = wage ~ educ + exper + sex, data = CPS85)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.571 -2.746 -0.653 1.893 37.724
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.50451 1.20985 -5.376 1.14e-07 ***
## educ 0.94051 0.07886 11.926 < 2e-16 ***
## exper 0.11330 0.01671 6.781 3.19e-11 ***
## sexM 2.33763 0.38806 6.024 3.19e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.454 on 530 degrees of freedom
## Multiple R-squared: 0.2532, Adjusted R-squared: 0.2489
## F-statistic: 59.88 on 3 and 530 DF, p-value: < 2.2e-16
Since the coefficient of education has a p-value < 5% (in fact < 2e-16), it is statistically significant at the 5% significance level.
Hint: Discuss both its sign and magnitude.
The coefficient of education is highly significant (at the 0.1% significance level), its sign is positive, and its magnitude is about 0.94: each additional year of education is associated with an estimated increase of about $0.94 in hourly wage, holding experience and sex constant.
Hint: Discuss all three aspects of the relevant predictor: 1) statistical significance, 2) sign, and 3) magnitude.
Yes. The coefficient of sexM is statistically significant at the 0.1% significance level (p ≈ 3.2e-09), its sign is positive, and its magnitude is about 2.34: holding education and experience constant, males are estimated to earn about $2.34 more per hour than females. Based on this finding, there is evidence of a gender wage gap favoring males, which is consistent with gender discrimination.
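As a quick cross-check, a minimal sketch computing 95% confidence intervals for the coefficients, reusing the wages_lm object fitted above; intervals for educ and sexM that exclude zero are consistent with the conclusions drawn here.
# 95% confidence intervals for all coefficients of the first model
confint(wages_lm, level = 0.95)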
The predicted average wage of a female with 15 years of education and 5 years of experience is -6.50451 + 0.94051 × 15 + 0.11330 × 5 ≈ $8.17 per hour.
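The same figure can be obtained directly from the fitted model; a minimal sketch, assuming the first model object above is still named wages_lm:
# Predicted average hourly wage (in dollars) for a female with
# 15 years of education and 5 years of experience
predict(wages_lm, newdata = data.frame(educ = 15, exper = 5, sex = "F"))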
Hint: Provide a technical interpretation.
Technically, the intercept (-6.50) is the predicted hourly wage for a person at the baseline of every predictor, i.e., a female with 0 years of education and 0 years of experience. A wage of -$6.50 per hour is not meaningful on its own (wages cannot be negative, and such a person lies outside the range of the data); the intercept mainly serves to anchor the regression plane.
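As a small illustration of this interpretation (again reusing wages_lm from above), predicting at zero education and experience for a female returns exactly the intercept:
# The intercept is the model's prediction for educ = 0, exper = 0, sex = "F"
predict(wages_lm, newdata = data.frame(educ = 0, exper = 0, sex = "F"))
coef(wages_lm)["(Intercept)"]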
Hint: Discuss in terms of both residual standard error and reported adjusted R squared.
library(tidyverse)
data(CPS85, package="mosaicData")
wages_lm <- lm(wage ~ educ + exper + sex + union,
data = CPS85)
# View summary of model 2
summary(wages_lm)
##
## Call:
## lm(formula = wage ~ educ + exper + sex + union, data = CPS85)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.496 -2.708 -0.712 1.909 37.784
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.48023 1.20159 -5.393 1.05e-07 ***
## educ 0.93495 0.07835 11.934 < 2e-16 ***
## exper 0.10692 0.01674 6.387 3.70e-10 ***
## sexM 2.14765 0.39097 5.493 6.14e-08 ***
## unionUnion 1.47111 0.50932 2.888 0.00403 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.423 on 529 degrees of freedom
## Multiple R-squared: 0.2648, Adjusted R-squared: 0.2592
## F-statistic: 47.62 on 4 and 529 DF, p-value: < 2.2e-16
The residual standard error is 4.423, meaning that observed hourly wages typically deviate from the wages predicted by the model by about $4.42. The adjusted R-squared is 0.2592, meaning that roughly 25.9% of the variability in individuals' wages is explained by the model after adjusting for the number of predictors. Both measures improve slightly on the first model (residual standard error 4.454, adjusted R-squared 0.2489), so adding union membership yields a modest improvement in fit.
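To put the two fits side by side, a minimal sketch that refits the models under separate names (wages_lm1 and wages_lm2 are illustrative names, since the original code reuses wages_lm for both):
# Refit both models so residual standard error (sigma) and
# adjusted R-squared can be compared directly
wages_lm1 <- lm(wage ~ educ + exper + sex, data = CPS85)
wages_lm2 <- lm(wage ~ educ + exper + sex + union, data = CPS85)
c(sigma1 = summary(wages_lm1)$sigma, sigma2 = summary(wages_lm2)$sigma)
c(adj_r2_1 = summary(wages_lm1)$adj.r.squared, adj_r2_2 = summary(wages_lm2)$adj.r.squared)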
Hint: Use message, echo, and results in the chunk options. Refer to the R Markdown Reference Guide.
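For example, a chunk header along the following lines (the chunk label setup is only illustrative) suppresses package startup messages, hides the code itself, and hides printed results:
```{r setup, message=FALSE, echo=FALSE, results='hide'}
library(tidyverse)
data(CPS85, package = "mosaicData")
```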