library(tidyverse)
library(scales)
options(scipen = 999)
Make sure to include the unit of the values whenever appropriate.
Hint: The variables are available in the CPS85 data set from the mosaicData package.
data(CPS85, package="mosaicData")
wages_lm <- lm(wage ~ educ + exper + sex,
data = CPS85)
#View Summary of model 1
summary(wages_lm)
##
## Call:
## lm(formula = wage ~ educ + exper + sex, data = CPS85)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.571 -2.746 -0.653 1.893 37.724
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.50451 1.20985 -5.376 0.0000001141795 ***
## educ 0.94051 0.07886 11.926 < 0.0000000000000002 ***
## exper 0.11330 0.01671 6.781 0.0000000000319 ***
## sexM 2.33763 0.38806 6.024 0.0000000031877 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.454 on 530 degrees of freedom
## Multiple R-squared: 0.2532, Adjusted R-squared: 0.2489
## F-statistic: 59.88 on 3 and 530 DF, p-value: < 0.00000000000000022
The coefficient on education is statistically significant at the 5% level because its p-value (less than 0.0000000000000002) is far below 0.05.
Hint: Discuss both its sign and magnitude.
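Each additional year of education is associated with an hourly wage that is about $0.94 (94 cents) higher, holding experience and sex constant. The sign is positive, so more education goes with higher pay. The coefficient also carries three stars, meaning it is statistically significant at the 0.1% level.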
Hint: Discuss all three aspects of the relevant predictor: 1) statistical significance, 2) sign, and 3) magnitude.
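Yes, there is evidence consistent with gender discrimination in wages. The coefficient on sexM is statistically significant at the 5% level (its p-value is about 0.000000003), and its sign is positive: men earn about $2.34 more per hour than women with the same education and experience.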
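A 95% confidence interval is one way to show the sign, size, and significance of the sex coefficient at once. The sketch below is only illustrative: the estimate, standard error, and residual degrees of freedom are copied from the summary output above, and the names `est`, `se`, and `df_resid` are made up for this example.

```r
# Values copied from the sexM row of the first model's summary output
est <- 2.33763      # estimated coefficient, dollars per hour
se <- 0.38806       # standard error of the estimate
df_resid <- 530     # residual degrees of freedom

# Two-sided 95% critical value from the t distribution
t_crit <- qt(0.975, df = df_resid)

# Confidence interval: estimate plus or minus critical value times standard error
ci <- c(lower = est - t_crit * se, upper = est + t_crit * se)
round(ci, 2)
```

If the whole interval lies above zero, the coefficient is positive and significant at the 5% level.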
To do this, first multiply 15 (years of education) by the coefficient on education (0.94), which equals 14.1. Then multiply 5 (years of experience) by the coefficient on experience (0.11), which equals 0.55. Add the two together for 14.65. Women are the reference category for sex, so the sexM term contributes nothing. Finally, add the intercept (-6.5), so the predicted hourly wage for a woman with 15 years of education and 5 years of experience is about $8.15 per hour (about $8.17 with the unrounded coefficients).
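The same prediction can be checked with a few lines of R. This is a minimal sketch that hard-codes the coefficients from the first model's summary output; the vector names `b` and `x` are illustrative, and it assumes that female is the reference level of `sex`, as the `sexM` label in the output suggests.

```r
# Coefficients copied from the first model's summary output
b <- c(intercept = -6.50451, educ = 0.94051, exper = 0.11330, sexM = 2.33763)

# Predictor values for a woman with 15 years of education and 5 of experience.
# sexM is 0 because female is assumed to be the reference level.
x <- c(intercept = 1, educ = 15, exper = 5, sexM = 0)

# Predicted hourly wage in dollars: sum of coefficient times predictor value
pred_wage <- sum(b * x)
round(pred_wage, 2)
```

With the fitted model object available, `predict(wages_lm, newdata = data.frame(educ = 15, exper = 5, sex = "F"))` should give the same figure, provided the female level in the data is labelled `"F"`.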
Hint: Provide a technical interpretation.
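The intercept is -6.5. It is the predicted hourly wage when every predictor equals zero, that is, for a woman with 0 years of education and 0 years of experience. The predicted value of -$6.50 per hour is not meaningful in practice because a wage cannot be negative.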
Hint: Discuss in terms of both residual standard error and reported adjusted R squared.
data(CPS85, package="mosaicData")
wages_lm <- lm(wage ~ educ + exper + sex + union,
data = CPS85)
# View summary of model 2
summary(wages_lm)
##
## Call:
## lm(formula = wage ~ educ + exper + sex + union, data = CPS85)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.496 -2.708 -0.712 1.909 37.784
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.48023 1.20159 -5.393 0.00000010459 ***
## educ 0.93495 0.07835 11.934 < 0.0000000000000002 ***
## exper 0.10692 0.01674 6.387 0.00000000037 ***
## sexM 2.14765 0.39097 5.493 0.00000006145 ***
## unionUnion 1.47111 0.50932 2.888 0.00403 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.423 on 529 degrees of freedom
## Multiple R-squared: 0.2648, Adjusted R-squared: 0.2592
## F-statistic: 47.62 on 4 and 529 DF, p-value: < 0.00000000000000022
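The residual standard error and adjusted R-squared printed in the two summaries can also be extracted and placed side by side. The sketch below assumes the mosaicData package is installed; the object names `model_1`, `model_2`, and `fit_stats` are hypothetical and are used only so that the second fit does not overwrite the first.

```r
# Load the data and fit both models under separate names
data(CPS85, package = "mosaicData")
model_1 <- lm(wage ~ educ + exper + sex, data = CPS85)
model_2 <- lm(wage ~ educ + exper + sex + union, data = CPS85)

# summary() stores the residual standard error in $sigma
# and the adjusted R-squared in $adj.r.squared
fit_stats <- data.frame(
  model = c("model_1", "model_2"),
  rse = c(summary(model_1)$sigma, summary(model_2)$sigma),
  adj_r2 = c(summary(model_1)$adj.r.squared, summary(model_2)$adj.r.squared)
)
fit_stats
```

A lower `rse` and a higher `adj_r2` both point toward the better-fitting model.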
The second model is better because its residual standard error is lower than the first model's (4.42 compared to 4.45 dollars per hour). This means the actual wages and the wages predicted by the model are closer, on average, in the second model than in the first. The adjusted R-squared is also higher in the second model (0.2592 compared to 0.2489), meaning it explains a larger share of the variation in wages even after accounting for the extra predictor.
## Q8 Hide the messages, but display the code and its results on the webpage.
Hint: Use message, echo and results in the chunk options. Refer to the RMarkdown Reference Guide.
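One possible way to apply the three options named in the hint is to set them once for the whole document in a setup chunk. This is a sketch, not the only valid answer: it assumes the document is knit with knitr, and the same options could instead be written in an individual chunk header, for example `{r, message=FALSE, echo=TRUE, results='markup'}`.

```r
# Set default chunk options for every chunk that follows
knitr::opts_chunk$set(
  message = FALSE,     # hide messages, such as package startup notes
  echo = TRUE,         # display the code
  results = "markup"   # display the printed results
)
```

Options set this way can still be overridden in the header of any single chunk.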