library(tidyverse)
library(scales)
options(scipen = 999)
Make sure to include the unit of the values whenever appropriate.
Hint: The variables are available in the CPS85 data set from the mosaicData package.
data(CPS85, package="mosaicData")
wages_lm <- lm(wage ~ educ + exper + sex,
data = CPS85)
# View summary of model 1
summary(wages_lm)
##
## Call:
## lm(formula = wage ~ educ + exper + sex, data = CPS85)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.571 -2.746 -0.653 1.893 37.724
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.50451 1.20985 -5.376 0.0000001141795 ***
## educ 0.94051 0.07886 11.926 < 0.0000000000000002 ***
## exper 0.11330 0.01671 6.781 0.0000000000319 ***
## sexM 2.33763 0.38806 6.024 0.0000000031877 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.454 on 530 degrees of freedom
## Multiple R-squared: 0.2532, Adjusted R-squared: 0.2489
## F-statistic: 59.88 on 3 and 530 DF, p-value: < 0.00000000000000022
The coefficient on education is statistically significant at the 5% level because its p-value (reported as < 0.0000000000000002) is far below 0.05. You can see this in the Pr(>|t|) column of the educ row in the regression output above.
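The significance check above can also be done programmatically. The following is a minimal sketch, assuming the mosaicData package is installed; the names `m1`, `coef_tab`, `alpha`, and `p_educ` are illustrative and not part of the original analysis.

```r
# Refit model 1 so this snippet runs on its own
data(CPS85, package = "mosaicData")
m1 <- lm(wage ~ educ + exper + sex, data = CPS85)

# Coefficient table as a matrix: one row per term, with columns
# "Estimate", "Std. Error", "t value" and "Pr(>|t|)"
coef_tab <- coef(summary(m1))

alpha  <- 0.05                            # 5% level written as a proportion
p_educ <- coef_tab["educ", "Pr(>|t|)"]    # p-value for the education coefficient

p_educ < alpha                            # TRUE means significant at the 5% level
```

Note that the p-value is compared with 0.05, not with 5, because p-values are proportions between 0 and 1.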
Hint: Discuss both its sign and magnitude.
The coefficient on education is positive: holding experience and sex constant, each additional year of education is associated with an increase of about $0.94 (94 cents) in hourly wage.
Hint: Discuss all three aspects of the relevant predictor: 1) statistical significance, 2) sign, and 3) magnitude.
The model points to a gender gap in wages. The sexM coefficient is statistically significant at the 5% level (p-value 0.0000000031877), positive, and sizeable: holding education and experience constant, men are predicted to earn about $2.34 more per hour than women. This is consistent with gender discrimination in wages.
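One way to gauge the magnitude of the sexM estimate is a 95% confidence interval. This is a hedged sketch rather than part of the original assignment; it assumes mosaicData is installed, and `m1` and `ci_sex` are illustrative names.

```r
# Refit model 1 so this snippet runs on its own
data(CPS85, package = "mosaicData")
m1 <- lm(wage ~ educ + exper + sex, data = CPS85)

# 95% confidence interval for the male-female difference in hourly wage
# (in dollars per hour), holding education and experience constant
ci_sex <- confint(m1, parm = "sexM", level = 0.95)
ci_sex
```

If the whole interval lies above 0, the data support a positive wage difference for men at the 5% level, which agrees with the small p-value in the summary table.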
Using the model 1 coefficients, the predicted wage of a woman with 15 years of education and 5 years of work experience is -6.50451 + 0.94051(15) + 0.11330(5) + 2.33763(0) ≈ $8.17/hr.
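The prediction above can be reproduced by plugging the values into the fitted equation. The sketch below copies the coefficients from the model 1 summary output; the names `b` and `pred_wage` are illustrative.

```r
# Coefficients copied from the model 1 summary output
b <- c(intercept = -6.50451, educ = 0.94051, exper = 0.11330, sexM = 2.33763)

# Woman (sexM = 0) with 15 years of education and 5 years of experience
pred_wage <- unname(b["intercept"] + b["educ"] * 15 + b["exper"] * 5 + b["sexM"] * 0)

round(pred_wage, 2)   # 8.17 dollars per hour
```

With the fitted model object available, `predict()` with a `newdata` data frame should give the same value without rounding the coefficients first.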
Hint: Provide a technical interpretation.
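The intercept is the predicted hourly wage when all predictors are 0, that is, for a woman (the reference category of sex) with 0 years of education and 0 years of work experience. Here it is -$6.50 per hour, which has no practical meaning because a wage cannot be negative.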
Hint: Discuss in terms of both residual standard error and reported adjusted R squared.
data(CPS85, package="mosaicData")
wages_lm <- lm(wage ~ educ + exper + sex + union,
data = CPS85)
# View summary of model 2
summary(wages_lm)
##
## Call:
## lm(formula = wage ~ educ + exper + sex + union, data = CPS85)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.496 -2.708 -0.712 1.909 37.784
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.48023 1.20159 -5.393 0.00000010459 ***
## educ 0.93495 0.07835 11.934 < 0.0000000000000002 ***
## exper 0.10692 0.01674 6.387 0.00000000037 ***
## sexM 2.14765 0.39097 5.493 0.00000006145 ***
## unionUnion 1.47111 0.50932 2.888 0.00403 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.423 on 529 degrees of freedom
## Multiple R-squared: 0.2648, Adjusted R-squared: 0.2592
## F-statistic: 47.62 on 4 and 529 DF, p-value: < 0.00000000000000022
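The second model fits slightly better than the first. Its residual standard error is lower ($4.423 vs. $4.454 per hour), so its predictions miss the actual wage by a little less on average. Its adjusted R-squared is higher (0.2592 vs. 0.2489), so it explains about 25.9% rather than 24.9% of the variation in wages after adjusting for the number of predictors.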
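The comparison above can be laid out side by side. This is a minimal sketch, assuming mosaicData is installed; it stores the two fits under separate illustrative names (`m1`, `m2`) instead of overwriting `wages_lm`, so both remain available.

```r
# Fit both models under separate names so neither overwrites the other
data(CPS85, package = "mosaicData")
m1 <- lm(wage ~ educ + exper + sex, data = CPS85)
m2 <- lm(wage ~ educ + exper + sex + union, data = CPS85)

s1 <- summary(m1)
s2 <- summary(m2)

# Residual standard error (dollars per hour) and adjusted R-squared per model
fit_stats <- data.frame(
  model    = c("educ + exper + sex", "educ + exper + sex + union"),
  resid_se = c(s1$sigma, s2$sigma),
  adj_r2   = c(s1$adj.r.squared, s2$adj.r.squared)
)
fit_stats
```

A lower `resid_se` together with a higher `adj_r2` favors the second model, though the differences here are small.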
Hint: Use message, echo and results in the chunk options. Refer to the RMarkdown Reference Guide.