In this lab exercise, you will learn:
The gender pay gap, or gender wage gap, is the average difference in remuneration between working men and women. Women are generally considered to be paid less than men. In this exercise, we revisit this issue using the WAGE1 data set (wage1), which is part of the R package wooldridge.
rm(list=ls())
Let’s load all the packages needed for this exercise (this assumes you’ve already installed them).
#install.packages("wooldridge") # install R package "wooldridge"
library(wooldridge) # load package; to get data
library(sandwich) # to obtain robust standard errors
attach(wage1) # Allowing objects in the database to be accessed by simply giving their names
str(wage1)
## 'data.frame': 526 obs. of 24 variables:
## $ wage : num 3.1 3.24 3 6 5.3 ...
## $ educ : int 11 12 11 8 12 16 18 12 12 17 ...
## $ exper : int 2 22 2 44 7 9 15 5 26 22 ...
## $ tenure : int 0 2 0 28 2 8 7 3 4 21 ...
## $ nonwhite: int 0 0 0 0 0 0 0 0 0 0 ...
## $ female : int 1 1 0 0 0 0 0 1 1 0 ...
## $ married : int 0 1 0 1 1 1 0 0 0 1 ...
## $ numdep : int 2 3 2 0 1 0 0 0 2 0 ...
## $ smsa : int 1 1 0 1 0 1 1 1 1 1 ...
## $ northcen: int 0 0 0 0 0 0 0 0 0 0 ...
## $ south : int 0 0 0 0 0 0 0 0 0 0 ...
## $ west : int 1 1 1 1 1 1 1 1 1 1 ...
## $ construc: int 0 0 0 0 0 0 0 0 0 0 ...
## $ ndurman : int 0 0 0 0 0 0 0 0 0 0 ...
## $ trcommpu: int 0 0 0 0 0 0 0 0 0 0 ...
## $ trade : int 0 0 1 0 0 0 1 0 1 0 ...
## $ services: int 0 1 0 0 0 0 0 0 0 0 ...
## $ profserv: int 0 0 0 0 0 1 0 0 0 0 ...
## $ profocc : int 0 0 0 0 0 1 1 1 1 1 ...
## $ clerocc : int 0 0 0 1 0 0 0 0 0 0 ...
## $ servocc : int 0 1 0 0 0 0 0 0 0 0 ...
## $ lwage : num 1.13 1.18 1.1 1.79 1.67 ...
## $ expersq : int 4 484 4 1936 49 81 225 25 676 484 ...
## $ tenursq : int 0 4 0 784 4 64 49 9 16 441 ...
## - attr(*, "time.stamp")= chr "25 Jun 2011 23:03"
Description of main variables:
Consider a multiple regression model as follows: \[wage_i = \beta_0 + \beta_1\cdot female_i + \beta_2 \cdot married_i + \beta_3 \cdot (FeMar_i) + u_i,\] where \(FeMar_i\) is an interaction term of \(female\) and \(married\): \[FeMar_i = female_i \times married_i.\]
Interaction effects often occur when the effect of one variable depends on the value of another variable. In this example, the interaction, \(FeMar_i\), is used because the gender wage gap (i.e., gender effect on wage) might depend on marital status.
On average, is the gender wage gap larger among married workers than among unmarried workers? Which coefficient(s) capture the difference in the marriage premium between women and men?
The OLS estimation of the model:
fit.m1 <- lm(wage ~ female + married + I(female*married))
summary(fit.m1)
##
## Call:
## lm(formula = wage ~ female + married + I(female * married))
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.7530 -1.7327 -0.9973 1.2566 17.0184
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.1680 0.3614 14.299 < 2e-16 ***
## female -0.5564 0.4736 -1.175 0.241
## married 2.8150 0.4363 6.451 2.53e-10 ***
## I(female * married) -2.8607 0.6076 -4.708 3.20e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.352 on 522 degrees of freedom
## Multiple R-squared: 0.181, Adjusted R-squared: 0.1763
## F-statistic: 38.45 on 3 and 522 DF, p-value: < 2.2e-16
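Note: To build the interaction term inside the formula, we wrap the product in I(). Within a formula, operators such as * and + have special meanings (female*married on its own would expand to female + married + female:married); I() tells R to treat its argument ‘as is’, so the product is computed by ordinary multiplication.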
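As a minimal sketch of what I() does in a formula, the example below uses simulated data; the variables f, m, and y are made up for illustration and are not part of wage1. The two specifications should produce the same estimates and differ only in how the interaction coefficient is labeled.

```r
# Simulated data: f, m, y are hypothetical names, for illustration only
set.seed(1)
n <- 200
f <- rbinom(n, 1, 0.5)   # dummy playing the role of female
m <- rbinom(n, 1, 0.5)   # dummy playing the role of married
y <- 5 - 0.5 * f + 3 * m - 3 * f * m + rnorm(n)

fit.I    <- lm(y ~ f + m + I(f * m))  # product computed "as is"
fit.star <- lm(y ~ f * m)             # formula operator: expands to f + m + f:m

names(coef(fit.I))     # interaction is labeled "I(f * m)"
names(coef(fit.star))  # interaction is labeled "f:m"

# Same numbers, different labels
same.fit <- all.equal(unname(coef(fit.I)), unname(coef(fit.star)))
same.fit
```

Either form can be used; the I() version keeps the coefficient name that appears in the hypothesis tests in this lab.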
(a) Is the gender wage gap for the unmarried significantly different from 0? (b) How about the gender wage gap for the married?
# Gender wage gap for the unmarried:
# Test the significance of beta_1
# t-test using robust se
library(lmtest)
library(sandwich)
coeftest(fit.m1, vcov=vcovHC, type="HC3")
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.16802 0.29533 17.4989 < 2.2e-16 ***
## female -0.55644 0.40327 -1.3798 0.1682
## married 2.81501 0.43702 6.4413 2.692e-10 ***
## I(female * married) -2.86068 0.54565 -5.2427 2.302e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Gender wage gap for the married
# Test the significance of beta_1+beta_3
# F-test using robust se
library(car)
linearHypothesis(fit.m1, c("female + I(female * married) = 0"), white.adjust=c("hc3"))
## Linear hypothesis test
##
## Hypothesis:
## female + I(female * married) = 0
##
## Model 1: restricted model
## Model 2: wage ~ female + married + I(female * married)
##
## Note: Coefficient covariance matrix supplied.
##
## Res.Df Df F Pr(>F)
## 1 523
## 2 522 1 86.425 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
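To connect the tests to the size of the gaps, here is a small sketch that rebuilds the four group means and the two gender gaps from the point estimates. The coefficients are copied by hand from the summary(fit.m1) output above, and the object names (est.m1, cell.means, and so on) are illustrative choices, not part of the original lab code.

```r
# Coefficients copied from the summary(fit.m1) output (rounded as printed)
est.m1 <- c(intercept = 5.1680, female = -0.5564,
            married = 2.8150, femar = -2.8607)

# Predicted mean wage in each of the four groups
cell.means <- c(
  single.men    = est.m1[["intercept"]],
  single.women  = est.m1[["intercept"]] + est.m1[["female"]],
  married.men   = est.m1[["intercept"]] + est.m1[["married"]],
  married.women = sum(est.m1)
)

gap.unmarried <- est.m1[["female"]]                      # beta_1
gap.married   <- est.m1[["female"]] + est.m1[["femar"]]  # beta_1 + beta_3
premium.men   <- est.m1[["married"]]                     # beta_2
premium.women <- est.m1[["married"]] + est.m1[["femar"]] # beta_2 + beta_3

round(cell.means, 4)
round(c(gap.unmarried = gap.unmarried, gap.married = gap.married,
        premium.men = premium.men, premium.women = premium.women), 4)
```

With these estimates the gap among the unmarried is about -0.56 and the gap among the married is about -3.42, which matches the pattern in the tests: the first is not statistically significant, while the second is.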
In this exercise, we are going to study the relationship between wage and years of work experience using WAGE1.
Consider the following multiple regression model: \[wage_i = \beta_0 + \beta_1 \cdot exper_i + \beta_2 \cdot exper^2_i + \beta_3 \cdot educ_i + \beta_4 \cdot female_i + \beta_5 \cdot tenure_i + \beta_6 \cdot tenure^2_i + u_i.\] Description of main variables:
fit.m2 <- lm(wage ~ exper + expersq + educ + female + tenure + tenursq)
summary(fit.m2)
##
## Call:
## lm(formula = wage ~ exper + expersq + educ + female + tenure +
## tenursq)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.9361 -1.6524 -0.4524 1.1177 13.3086
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.1097498 0.7106694 -2.969 0.00313 **
## exper 0.1878383 0.0357401 5.256 2.16e-07 ***
## expersq -0.0037975 0.0007708 -4.927 1.13e-06 ***
## educ 0.5262551 0.0485421 10.841 < 2e-16 ***
## female -1.7831998 0.2572159 -6.933 1.23e-11 ***
## tenure 0.2116944 0.0491737 4.305 2.00e-05 ***
## tenursq -0.0029460 0.0016860 -1.747 0.08118 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.872 on 519 degrees of freedom
## Multiple R-squared: 0.4022, Adjusted R-squared: 0.3953
## F-statistic: 58.19 on 6 and 519 DF, p-value: < 2.2e-16
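The marginal effect of experience on wage is \(\frac{\partial wage}{\partial exper} = \beta_1 + 2\beta_2 \cdot exper\), which is estimated as \(\hat\beta_1 + 2\hat\beta_2 \cdot exper = 0.188 + 2\times (-0.0038) \times exper\).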
So the marginal effect of experience on wage depends on how much experience a person already has. Because \(\hat\beta_2 < 0\), each additional year of experience adds less to wage than the previous one, and beyond a certain point the marginal effect becomes negative.
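As a quick illustration, the marginal effect can be evaluated at a few experience levels. This is a sketch that hard-codes the two estimates from the summary(fit.m2) output; the names b.exper, b.expersq, marg.effect, and exper.grid are illustrative, and the grid values are arbitrary.

```r
# Estimates copied from the summary(fit.m2) output
b.exper   <- 0.1878383
b.expersq <- -0.0037975

# Estimated marginal effect of one more year of experience
marg.effect <- function(exper) b.exper + 2 * b.expersq * exper

exper.grid <- c(1, 10, 20, 30, 40)  # arbitrary illustration points
data.frame(exper = exper.grid, marginal.effect = marg.effect(exper.grid))
```

The effect falls steadily as experience grows and changes sign somewhere between 20 and 30 years.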
Based on the OLS estimates, \(\hat\beta_1 > 0\) and \(\hat\beta_2 < 0\), so the estimated relationship between wage and experience is an inverted U-shaped curve.
Using the estimates \(\hat\beta_1\) and \(\hat\beta_2\), find the value of \(exper\), call it \(exper^*\), where \(wage\) is maximized: \[exper^* = - \frac{\hat\beta_1}{2\hat\beta_2}.\]
Find the years of experience at which predicted wage is maximized.
b1 <- fit.m2$coefficients["exper"]
b2 <- fit.m2$coefficients["expersq"]
-b1/(2*b2) # years of experience that maximizes wage
## exper
## 24.73163
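The closed-form turning point can be cross-checked numerically. The sketch below hard-codes the estimates from summary(fit.m2) and maximizes the experience part of the fitted equation with base R's optimize(); the search interval of 0 to 50 years is an assumption chosen for illustration, and the object names are not from the original lab.

```r
# Estimates copied from the summary(fit.m2) output
b.exper   <- 0.1878383
b.expersq <- -0.0037975

# Experience-related part of predicted wage (other regressors held fixed)
exper.part <- function(x) b.exper * x + b.expersq * x^2

closed.form <- -b.exper / (2 * b.expersq)
numeric.max <- optimize(exper.part, interval = c(0, 50),  # assumed range
                        maximum = TRUE)$maximum

c(closed.form = closed.form, numeric.max = numeric.max)
```

Both approaches should agree to within the numerical tolerance of optimize(), at roughly 24.7 years.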
Here is the R code used to create the plot of wage vs. years of experience:
x.exper <- min(exper):max(exper) # generate a sequence of years of experience
# experience-related part of predicted wage (intercept and other regressors omitted)
y.wage <- b2*(x.exper^2) + b1*x.exper
plot(x=x.exper, y=y.wage, type="l", xlab="experience (in years)",
     ylab="wage", main="Wage vs. Experience")
abline(v=-b1/(2*b2), col="red") # mark the experience level that maximizes wage
First, let’s define a dummy variable, college, which equals 1 if educ >= 16 and 0 otherwise. Then create a plot to visualize how the relationship between experience and wage differs by education level.
library(ggplot2)
wage1$college <- as.numeric(wage1$educ >= 16) # 1 if educ >= 16, 0 otherwise
ggplot(data = wage1) +
  geom_point(mapping = aes(x = exper, y = wage, color = factor(college))) # factor() gives a discrete two-color legend
Second, let’s build a model that captures the heterogeneous effect of experience across education levels. Examine the coefficients on the interaction terms. What do the coefficients imply about the relationship between experience and wage at different education levels?
fit.m3 <- lm(wage ~ exper + expersq + educ + female + tenure + tenursq + I(exper*educ) + I(expersq*educ))
summary(fit.m3)
##
## Call:
## lm(formula = wage ~ exper + expersq + educ + female + tenure +
## tenursq + I(exper * educ) + I(expersq * educ))
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.9474 -1.6176 -0.4741 1.0892 13.4757
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.0079731 1.4793193 -0.005 0.99570
## exper -0.1621227 0.1481098 -1.095 0.27420
## expersq 0.0040167 0.0029319 1.370 0.17127
## educ 0.3493568 0.1118879 3.122 0.00189 **
## female -1.7401801 0.2576597 -6.754 3.89e-11 ***
## tenure 0.2077880 0.0489699 4.243 2.61e-05 ***
## tenursq -0.0028162 0.0016826 -1.674 0.09479 .
## I(exper * educ) 0.0298167 0.0117112 2.546 0.01119 *
## I(expersq * educ) -0.0006824 0.0002406 -2.837 0.00474 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.854 on 517 degrees of freedom
## Multiple R-squared: 0.4118, Adjusted R-squared: 0.4027
## F-statistic: 45.24 on 8 and 517 DF, p-value: < 2.2e-16
b1.m3 <- fit.m3$coefficients["exper"]
b2.m3 <- fit.m3$coefficients["expersq"]
b3.m3 <- fit.m3$coefficients["educ"]
b7.m3 <- fit.m3$coefficients["I(exper * educ)"]
b8.m3 <- fit.m3$coefficients["I(expersq * educ)"]
summary(educ)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 12.00 12.00 12.56 14.00 18.00
# Experience-related part of predicted wage at two education levels
# (intercept, female, and tenure terms are held fixed and omitted)
educ.fix <- 12 # sample median of educ
y.wage.educ12 <- b1.m3*x.exper + b2.m3*(x.exper^2) + b3.m3*educ.fix +
  b7.m3*x.exper*educ.fix + b8.m3*(x.exper^2)*educ.fix
educ.fix <- 16 # college threshold used above
y.wage.educ16 <- b1.m3*x.exper + b2.m3*(x.exper^2) + b3.m3*educ.fix +
  b7.m3*x.exper*educ.fix + b8.m3*(x.exper^2)*educ.fix
plot(x=x.exper, y=y.wage.educ12, type="l", xlab="experience (in years)",
     ylab="wage", main="Wage vs. Experience",
     ylim=range(c(y.wage.educ12, y.wage.educ16)), col="red") # range() keeps both curves in view
lines(x=x.exper, y=y.wage.educ16, type="l", col="blue")
abline(v= - (b1.m3 + b7.m3 * 12) / (2 * b2.m3 + 2 * b8.m3 * 12), col="red") # experience that maximizes wage for educ=12
abline(v= - (b1.m3 + b7.m3 * 16) / (2 * b2.m3 + 2 * b8.m3 * 16), col="blue") # experience that maximizes wage for educ=16
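In this model the marginal effect of experience depends on education: \(\frac{\partial wage}{\partial exper} = (\beta_1 + \beta_7 \cdot educ) + 2(\beta_2 + \beta_8 \cdot educ)\cdot exper\), so the turning point is \(exper^*(educ) = -\frac{\beta_1 + \beta_7 \cdot educ}{2(\beta_2 + \beta_8 \cdot educ)}\), numbering the coefficients in the order they appear in the fit.m3 output. The sketch below evaluates both expressions with the estimates copied by hand from summary(fit.m3); the function and object names are illustrative and not part of the original lab code.

```r
# Estimates copied from the summary(fit.m3) output
g.exper        <- -0.1621227  # exper
g.expersq      <-  0.0040167  # expersq
g.exper.educ   <-  0.0298167  # I(exper * educ)
g.expersq.educ <- -0.0006824  # I(expersq * educ)

# Education-specific linear and quadratic experience coefficients
slope <- function(educ) g.exper + g.exper.educ * educ
curv  <- function(educ) g.expersq + g.expersq.educ * educ

# Marginal effect of experience, and the experience level where it is zero
marg.effect.m3 <- function(exper, educ) slope(educ) + 2 * curv(educ) * exper
turning.point  <- function(educ) -slope(educ) / (2 * curv(educ))

tp <- turning.point(c(12, 16))
me.early <- marg.effect.m3(exper = 5, educ = c(12, 16))

round(c(educ12 = tp[1], educ16 = tp[2]), 2)
round(c(educ12 = me.early[1], educ16 = me.early[2]), 4)
```

With these estimates, the return to an extra year of experience early in a career is larger at educ = 16 than at educ = 12, while the wage profile peaks slightly earlier for the more educated group. The turning point is a maximum only where curv(educ) is negative, which with these estimates holds for educ of about 6 years or more.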