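library(haven)  # read_dta()
library(dplyr)  # %>% and mutate()
library(car)    # crPlots() and carPalette()

sa <- read_dta('SA.dta')
head(sa)
summary(sa)
sa <- sa %>%
  mutate(logwage = log(wage))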
The raw wage variable has a long right tail (a positive skew), which is consistent with an approximately lognormal distribution. After the log transformation, its distribution becomes approximately normal.
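The skewness claim can be checked numerically. The sketch below uses simulated lognormal data, not SA.dta, and `skewness()` is a hypothetical moment-based helper written here for illustration (the moments and e1071 packages provide similar functions). It shows a strongly positive skew for the raw variable and a skew close to zero after taking logs.

```r
# Minimal sketch on simulated data (not SA.dta): a lognormal variable is
# right-skewed, while its logarithm is symmetric (normally distributed).
set.seed(42)

# Hypothetical helper: sample skewness as the mean cubed z-score
skewness <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}

# Hypothetical wages drawn from a lognormal distribution
wage_sim <- rlnorm(10000, meanlog = 2, sdlog = 0.8)

raw_skew <- skewness(wage_sim)       # large and positive: long right tail
log_skew <- skewness(log(wage_sim))  # close to zero: roughly symmetric
print(c(raw = raw_skew, log = log_skew))
```

On the real data the same helper could be applied as `skewness(sa$wage)` and `skewness(sa$logwage)`.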
Interpretation: both predictors have a positive, statistically significant effect (p < 2e-16). Holding experience constant, a one-unit increase in education is associated with a 0.15 increase in log wages; holding education constant, a one-unit increase in experience is associated with a 0.015 increase in log wages.
summary(mod1 <- lm(logwage ~ educ + exper, data = sa))
##
## Call:
## lm(formula = logwage ~ educ + exper, data = sa)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.2228 -0.4612 0.0557 0.5014 3.9545
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.2001317 0.0207020 9.667 <2e-16 ***
## educ 0.1503643 0.0017467 86.086 <2e-16 ***
## exper 0.0154358 0.0005441 28.369 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7832 on 19945 degrees of freedom
## Multiple R-squared: 0.2766, Adjusted R-squared: 0.2765
## F-statistic: 3813 on 2 and 19945 DF, p-value: < 2.2e-16
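To make the "holding the other predictor constant" interpretation concrete, the sketch below plugs the coefficients printed in the mod1 output into the fitted equation for a hypothetical worker (educ = 12 and exper = 10 are illustrative values, not taken from the data) and compares it with the same worker plus one unit of education.

```r
# Coefficients copied from the summary(mod1) output
b <- c(intercept = 0.2001317, educ = 0.1503643, exper = 0.0154358)

# Predicted log wage from the fitted equation
pred_logwage <- function(educ, exper) {
  unname(b["intercept"] + b["educ"] * educ + b["exper"] * exper)
}

# Hypothetical worker, then the same worker with one more unit of education
base     <- pred_logwage(educ = 12, exper = 10)
more_edu <- pred_logwage(educ = 13, exper = 10)

# With experience fixed, the log-wage difference equals the educ coefficient
print(more_edu - base)

# On the wage scale the effect is multiplicative: exp(difference), about +16%
pct <- 100 * (exp(more_edu - base) - 1)
print(pct)
```

With the fitted model object, `predict(mod1, newdata = data.frame(educ = c(12, 13), exper = 10))` would give the same two log-wage predictions.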
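To answer the question, we exponentiate the coefficients. For experience, exp(0.0154) ≈ 1.0156, so each additional unit of work experience raises a worker's wage by about 1.6%, or roughly 2% after rounding. To find how many years a worker must work, on average, to double his or her wage, we solve 1.02^X = 2, which gives X = log(2)/log(1.02) ≈ 35 years (the rule-of-69 approximation gives 69/2 = 34.5). The rounding matters, though: with the unrounded coefficient, X = log(2)/0.0154 ≈ 45 years.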
exp(coef(mod1))
## (Intercept) educ exper
## 1.221564 1.162258 1.015556
# print(betaeduc <- (1.16-1)*100)
print(betaexper <- (1.02-1)*100)
## [1] 2
log(2, 1.02)
## [1] 35.00279
# or, using the rule-of-69 approximation (69 / percentage growth rate)
69/2
## [1] 34.5
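The calculation above rounds the growth factor to 1.02. A sketch that keeps the unrounded experience coefficient from the mod1 output instead is shown below; since exp(b)^X = 2 implies X = log(2)/b, the doubling time follows directly from the log-point coefficient.

```r
# Unrounded experience coefficient copied from summary(mod1)
b_exper <- 0.0154358

# Percentage wage change per unit of experience
pct_per_unit <- 100 * (exp(b_exper) - 1)

# Solve exp(b_exper)^X = 2  =>  X = log(2) / b_exper
years_to_double <- log(2) / b_exper

# Same answer written like log(2, 1.02), but with base exp(b_exper)
years_to_double_alt <- log(2, base = exp(b_exper))

print(c(pct = pct_per_unit, years = years_to_double, years_alt = years_to_double_alt))
```

With the model object available, `b_exper <- coef(mod1)["exper"]` avoids copying the number by hand. The gap between about 35 and about 45 years comes entirely from rounding 1.0156 up to 1.02.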
The component-plus-residual (partial residual) plot suggests that the effect of work experience on log wages is approximately linear.
crPlots(mod1,
        terms = ~ exper,
        ylab = "Partial residuals",
        col = carPalette()[1],
        col.lines = carPalette()[3:4])
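As a sketch of what a component-plus-residual plot shows for exper, the code below uses simulated data whose variable names mimic SA.dta (all values are made up). It builds the partial residuals by hand as the residuals plus the exper term centred at its mean, which is how base R's `residuals(..., type = "partial")` defines them, and plots them against exper.

```r
# Sketch on simulated data (made-up values, not SA.dta)
set.seed(1)
n <- 2000
sim <- data.frame(educ = sample(8:18, n, replace = TRUE),
                  exper = runif(n, 0, 40))
sim$logwage <- 0.2 + 0.15 * sim$educ + 0.015 * sim$exper + rnorm(n, sd = 0.8)

fit <- lm(logwage ~ educ + exper, data = sim)

# Component + residual: the exper term (centred at its mean) plus the residuals
comp_exper <- coef(fit)["exper"] * (sim$exper - mean(sim$exper))
partial_manual <- unname(comp_exper + residuals(fit))

# Base R's partial residuals for the same term, for comparison
partial_base <- unname(residuals(fit, type = "partial")[, "exper"])

# If exper enters linearly, the points scatter evenly around a straight line;
# the least-squares slope of this line equals the exper coefficient in fit
plot(sim$exper, partial_manual, xlab = "exper", ylab = "Partial residuals")
abline(lm(partial_manual ~ sim$exper), col = "red")
```

A systematic bend of the points away from the straight line would be the visual sign of non-linearity that motivates the polynomial model.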
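Adding a second-degree polynomial in experience suggests that the relationship is somewhat non-linear: the curve resembles an inverted U (a "mountain" shape), although this is only weakly visible. The fitted equation is logwage = 0.542 + 0.150(educ) + 25.28(P1) - 7.29(P2), where P1 and P2 are the first- and second-degree orthogonal polynomials in exper that poly() builds by default, not exper and exper^2 themselves. The significant negative quadratic term means that the function first grows (the higher the experience, the higher the wage) and then, beyond some level of experience, the relationship becomes negative. Because the terms are orthogonal, however, 25.28 and -7.29 cannot be plugged into the vertex formula -b1/(2*b2) to locate that turning point; it has to be computed from a refit with raw polynomials, e.g. poly(exper, 2, raw = TRUE) or exper + I(exper^2).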
summary(mod2 <- lm(logwage ~ educ + poly(exper,2), data = sa))
##
## Call:
## lm(formula = logwage ~ educ + poly(exper, 2), data = sa)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.2563 -0.4620 0.0525 0.4996 3.9251
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.542403 0.012434 43.62 <2e-16 ***
## educ 0.150200 0.001743 86.17 <2e-16 ***
## poly(exper, 2)1 25.275032 0.890463 28.38 <2e-16 ***
## poly(exper, 2)2 -7.292723 0.781605 -9.33 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7816 on 19944 degrees of freedom
## Multiple R-squared: 0.2797, Adjusted R-squared: 0.2796
## F-statistic: 2582 on 3 and 19944 DF, p-value: < 2.2e-16
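The difference between orthogonal and raw polynomials can be illustrated with a sketch on simulated data (made-up values, not SA.dta) whose true experience profile peaks at 0.05 / (2 * 0.001) = 25. Both parameterisations produce the same fitted values, but only the raw coefficients are on the exper and exper^2 scale, so only they can be used in -b1/(2*b2).

```r
# Sketch on simulated data (made-up values, not SA.dta) with a concave
# experience profile whose true peak is at 0.05 / (2 * 0.001) = 25
set.seed(2)
n <- 20000
sim <- data.frame(educ = sample(8:18, n, replace = TRUE),
                  exper = runif(n, 0, 40))
sim$logwage <- 0.2 + 0.15 * sim$educ + 0.05 * sim$exper -
  0.001 * sim$exper^2 + rnorm(n, sd = 0.3)

# Orthogonal polynomials (the poly() default) vs raw polynomials
fit_orth <- lm(logwage ~ educ + poly(exper, 2), data = sim)
fit_raw  <- lm(logwage ~ educ + poly(exper, 2, raw = TRUE), data = sim)

# Both parameterisations give the same fitted values
same_fit <- isTRUE(all.equal(fitted(fit_orth), fitted(fit_raw)))
print(same_fit)

# Raw coefficients in order: (Intercept), educ, exper, exper^2
b <- unname(coef(fit_raw))
turning_point <- -b[3] / (2 * b[4])  # peak of b[3]*exper + b[4]*exper^2
print(turning_point)
```

On the real data, the analogous (hypothetically named) refit would be `mod2_raw <- lm(logwage ~ educ + poly(exper, 2, raw = TRUE), data = sa)`, followed by the same `-b[3] / (2 * b[4])` calculation on `coef(mod2_raw)`.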
Being female is negatively associated with wages. Compared with men (the reference category), women's wages are 0.25 log points lower (about 22% lower, since exp(-0.247) ≈ 0.78); because the model includes an education × female interaction, this gap refers to educ = 0 and narrows as education increases. At the same time, education matters more for women's wages than for men's: the return to one unit of education is 0.011 log points (about 1.1%) higher for women than for men.
sa$female <- as.factor(sa$female)
summary(mod3 <- lm(logwage ~ educ*female + exper, data = sa))
##
## Call:
## lm(formula = logwage ~ educ * female + exper, data = sa)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.1197 -0.4595 0.0527 0.5029 3.8961
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.2694726 0.0217713 12.377 < 2e-16 ***
## educ 0.1502080 0.0020571 73.021 < 2e-16 ***
## female1 -0.2472828 0.0251589 -9.829 < 2e-16 ***
## exper 0.0151128 0.0005415 27.911 < 2e-16 ***
## educ:female1 0.0107683 0.0032883 3.275 0.00106 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7787 on 19943 degrees of freedom
## Multiple R-squared: 0.285, Adjusted R-squared: 0.2849
## F-statistic: 1987 on 4 and 19943 DF, p-value: < 2.2e-16
exp(coef(mod3))
## (Intercept) educ female1 exper educ:female1
## 1.3092738 1.1620759 0.7809199 1.0152276 1.0108265
print(betafemale <- (0.78-1)*100)
## [1] -22
print(betainteraction <- (1.01-1)*100)
## [1] 1
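Because of the interaction, the gender gap in mod3 depends on education: gap(educ) = b_female + b_interaction * educ, with experience held fixed. The sketch below evaluates this from the coefficients printed in the mod3 output at a few illustrative education levels and converts the log-point gaps to percentages. The level at which the estimated gap would close is an extrapolation; whether it lies within the observed range can be checked with `summary(sa$educ)`.

```r
# Coefficients copied from the summary(mod3) output
b_female <- -0.2472828  # female1: female-male gap at educ = 0
b_inter  <-  0.0107683  # educ:female1: extra return to education for women

# Female-male log-wage gap at a given education level (experience held fixed)
gap <- function(educ) b_female + b_inter * educ

educ_levels <- c(0, 8, 12, 16)  # illustrative values
gaps <- gap(educ_levels)

# Convert log-point gaps to percentage differences
pct <- 100 * (exp(gaps) - 1)
print(data.frame(educ = educ_levels, log_gap = round(gaps, 4), pct_gap = round(pct, 1)))

# Education level at which the estimated gap would be zero
print(-b_female / b_inter)
```

The gap shrinks steadily with education, which is the numerical content of the statement that education matters more for women's wages than for men's.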