L8 (Lab) - The Nonlinear Regression

In this lab exercise, you will learn:

Nonlinear Regression Using R
- How to generate an interaction term or a polynomial term: \(I(x_1*x_2)\) or \(I(x^2)\)
- How to interpret results of multiple regressions, esp. with interaction terms.
Multiple Regression with \(X\) and \(X^2\) as Independent Variables
- Find the value of \(X\) that minimizes or maximizes \(Y\).

Exercise 1: Gender Wage Gap

The gender pay gap or gender wage gap is the average difference between the remuneration for men and women who are working. Women are generally considered to be paid less than men. In this exercise, we are going to revisit this issue using WAGE1. This data set is part of the R package wooldridge.

Clear the Workspace

rm(list=ls())

Install and Load Needed Packages

Let’s load all the packages needed for this exercise (this assumes you’ve already installed them).

#install.packages("wooldridge")        # install R package "wooldridge"
library(wooldridge)                   # load package; to get data 
library(sandwich)                     # to obtain robust standard errors

Import Data: WAGE1

attach(wage1)   # Allowing objects in the database to be accessed by simply giving their names
str(wage1)

## 'data.frame':    526 obs. of  24 variables:
##  $ wage    : num  3.1 3.24 3 6 5.3 ...
##  $ educ    : int  11 12 11 8 12 16 18 12 12 17 ...
##  $ exper   : int  2 22 2 44 7 9 15 5 26 22 ...
##  $ tenure  : int  0 2 0 28 2 8 7 3 4 21 ...
##  $ nonwhite: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ female  : int  1 1 0 0 0 0 0 1 1 0 ...
##  $ married : int  0 1 0 1 1 1 0 0 0 1 ...
##  $ numdep  : int  2 3 2 0 1 0 0 0 2 0 ...
##  $ smsa    : int  1 1 0 1 0 1 1 1 1 1 ...
##  $ northcen: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ south   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ west    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ construc: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ ndurman : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ trcommpu: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ trade   : int  0 0 1 0 0 0 1 0 1 0 ...
##  $ services: int  0 1 0 0 0 0 0 0 0 0 ...
##  $ profserv: int  0 0 0 0 0 1 0 0 0 0 ...
##  $ profocc : int  0 0 0 0 0 1 1 1 1 1 ...
##  $ clerocc : int  0 0 0 1 0 0 0 0 0 0 ...
##  $ servocc : int  0 1 0 0 0 0 0 0 0 0 ...
##  $ lwage   : num  1.13 1.18 1.1 1.79 1.67 ...
##  $ expersq : int  4 484 4 1936 49 81 225 25 676 484 ...
##  $ tenursq : int  0 4 0 784 4 64 49 9 16 441 ...
##  - attr(*, "time.stamp")= chr "25 Jun 2011 23:03"

Description of main variables:

wage: average hourly earnings.
female: \(=1\) if female.
married: \(=1\) if married.

Estimation and Interpretation of a Multiple Regression Model

Consider a multiple regression model as follows: \[wage_i = \beta_0 + \beta_1\cdot female_i + \beta_2 \cdot married_i + \beta_3 \cdot (FeMar_i) + u_i,\] where \(FeMar_i\) is an interaction term of \(female\) and \(married\): \[FeMar_i = female_i \times married_i.\]

Interaction effects often occur when the effect of one variable depends on the value of another variable. In this example, the interaction, \(FeMar_i\), is used because the gender wage gap (i.e., gender effect on wage) might depend on marital status.

On average, is the gender wage gap among the married larger than that among the unmarried? Which coefficient(s) represent the gap in the marriage premium between women and men?

Interpretation of coefficients:

average wage for unmarried male: \(\beta_0\)
average wage for married male: \(\beta_0 + \beta_2\)
average wage for unmarried female: \(\beta_0 + \beta_1\)
average wage for married female: \(\beta_0 + \beta_1 + \beta_2 + \beta_3\)
marriage premium for male: \(\beta_2\)
marriage premium for female: \(\beta_2 + \beta_3\)
gender marriage premium gap (between female and male): \(\beta_3\)
gender wage gap for the unmarried: \(\beta_1\)
gender wage gap for the married: \(\beta_1 + \beta_3\)
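
The list above can be checked on simulated data. The sketch below is only an illustration: the data are made up (the sample size, group shares and coefficients are assumptions, not the wage1 sample). Because the model has one parameter for each of the four gender-by-marital-status groups, the coefficient combinations should reproduce the four group means exactly.

```r
# Simulated data (hypothetical values, NOT the wage1 sample)
set.seed(42)
n <- 1000
female  <- rbinom(n, 1, 0.5)
married <- rbinom(n, 1, 0.6)
wage <- 5 - 0.5 * female + 3 * married - 3 * female * married + rnorm(n)

fit <- lm(wage ~ female + married + I(female * married))
b <- unname(coef(fit))   # b[1] = beta_0, b[2] = beta_1, b[3] = beta_2, b[4] = beta_3

# Group means implied by the coefficients
implied <- c(unmarried.male   = b[1],
             married.male     = b[1] + b[3],
             unmarried.female = b[1] + b[2],
             married.female   = b[1] + b[2] + b[3] + b[4])

# Sample means of the same four groups, computed directly
direct <- c(unmarried.male   = mean(wage[female == 0 & married == 0]),
            married.male     = mean(wage[female == 0 & married == 1]),
            unmarried.female = mean(wage[female == 1 & married == 0]),
            married.female   = mean(wage[female == 1 & married == 1]))

print(rbind(implied, direct))   # the two rows should agree
```

The same comparison can be run on wage1 by replacing the simulated vectors with the attached variables.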

The OLS estimation of the model:

fit.m1 <- lm(wage ~ female + married + I(female*married))
summary(fit.m1)

## 
## Call:
## lm(formula = wage ~ female + married + I(female * married))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.7530 -1.7327 -0.9973  1.2566 17.0184 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           5.1680     0.3614  14.299  < 2e-16 ***
## female               -0.5564     0.4736  -1.175    0.241    
## married               2.8150     0.4363   6.451 2.53e-10 ***
## I(female * married)  -2.8607     0.6076  -4.708 3.20e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.352 on 522 degrees of freedom
## Multiple R-squared:  0.181,  Adjusted R-squared:  0.1763 
## F-statistic: 38.45 on 3 and 522 DF,  p-value: < 2.2e-16

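Note: To generate an interaction term, we wrap the product in the function I(x), which tells R to treat the expression ‘as is’. Inside a model formula, operators such as * and ^ have a special formula meaning, so I(female*married) makes R compute the ordinary arithmetic product of the two variables.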
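
A small sketch of what I() does inside a formula, again on simulated data (the variable names x1, x2, z and all numbers are assumptions made for illustration). It shows that I(x1 * x2) gives the same estimates as the formula interaction x1:x2, and that a squared term needs I() because a bare ^ is read as a formula operator.

```r
# Simulated data (hypothetical)
set.seed(1)
n  <- 200
x1 <- rbinom(n, 1, 0.5)
x2 <- rbinom(n, 1, 0.5)
z  <- runif(n, 0, 10)
y  <- 1 + 2 * x1 - x2 + 0.5 * x1 * x2 + 0.3 * z - 0.02 * z^2 + rnorm(n)

# Interaction term: I(x1 * x2) and x1:x2 give the same estimates (only the labels differ)
fit.I     <- lm(y ~ x1 + x2 + I(x1 * x2))
fit.colon <- lm(y ~ x1 + x2 + x1:x2)
same.estimates <- all.equal(unname(coef(fit.I)), unname(coef(fit.colon)))

# Polynomial term: without I(), z^2 is treated as a formula operation and collapses to z
n.coef.bare    <- length(coef(lm(y ~ z + z^2)))      # intercept and z only
n.coef.wrapped <- length(coef(lm(y ~ z + I(z^2))))   # intercept, z and z squared

print(c(bare = n.coef.bare, wrapped = n.coef.wrapped))
```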

Interpretation of OLS Estimation:

average wage for unmarried male: \(\hat\beta_0 = 5.17\)
average wage for married male: \(\hat\beta_0 + \hat\beta_2 = 7.98\)
average wage for unmarried female: \(\hat\beta_0 + \hat\beta_1 = 4.61\)
average wage for married female: \(\hat\beta_0 + \hat\beta_1 + \hat\beta_2 + \hat\beta_3 = 4.57\)
marriage premium for male: \(\hat\beta_2 = 2.82\)
marriage premium for female: \(\hat\beta_2 + \hat\beta_3 = -0.05\)
gender marriage premium gap (between female and male): \(\hat\beta_3 = -2.86\)
gender wage gap for the unmarried: \(\hat\beta_1 = -0.56\)
gender wage gap for the married: \(\hat\beta_1 + \hat\beta_3 = -3.42\)

(a) Is the gender wage gap for the unmarried significantly different from 0? (b) How about the gender wage gap for the married?

# Gender wage gap for the unmarried: 
# Test the significance of beta_1
# t-test using robust se
library(lmtest)
library(sandwich)

coeftest(fit.m1, vcov=vcovHC, type="HC3")

## 
## t test of coefficients:
## 
##                     Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)          5.16802    0.29533 17.4989 < 2.2e-16 ***
## female              -0.55644    0.40327 -1.3798    0.1682    
## married              2.81501    0.43702  6.4413 2.692e-10 ***
## I(female * married) -2.86068    0.54565 -5.2427 2.302e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Gender wage gap for the married
# Test the significance of beta_1+beta_3
# F-test using robust se
library(car)

linearHypothesis(fit.m1, c("female + I(female * married) = 0"), white.adjust=c("hc3"))

## Linear hypothesis test
## 
## Hypothesis:
## female + I(female * married) = 0
## 
## Model 1: restricted model
## Model 2: wage ~ female + married + I(female * married)
## 
## Note: Coefficient covariance matrix supplied.
## 
##   Res.Df Df      F    Pr(>F)    
## 1    523                        
## 2    522  1 86.425 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
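
For a single linear restriction such as \(\beta_1 + \beta_3 = 0\), the F statistic is the square of a t statistic. The sketch below illustrates this on simulated data with heteroskedastic errors (all numbers are assumptions). To keep it free of extra packages, the HC3 covariance matrix is built by hand from its usual formula; on real data, vcovHC(fit, type = "HC3") from sandwich should give the same matrix.

```r
# Simulated data with heteroskedastic errors (hypothetical, NOT the wage1 sample)
set.seed(123)
n <- 500
female  <- rbinom(n, 1, 0.5)
married <- rbinom(n, 1, 0.6)
wage <- 5 - 0.5 * female + 3 * married - 3 * female * married +
        rnorm(n, sd = 1 + married)

fit <- lm(wage ~ female + married + I(female * married))

# HC3 covariance: (X'X)^{-1} X' diag(e_i^2 / (1 - h_i)^2) X (X'X)^{-1}
X <- model.matrix(fit)
e <- residuals(fit)
h <- hatvalues(fit)
bread <- solve(crossprod(X))
meat  <- crossprod(X * (e / (1 - h)))   # each row of X scaled by e_i / (1 - h_i)
V.hc3 <- bread %*% meat %*% bread

# Gender wage gap for the married: beta_1 + beta_3
r <- c(0, 1, 0, 1)                      # picks out female and the interaction
gap.married <- sum(r * coef(fit))
se.gap <- sqrt(drop(t(r) %*% V.hc3 %*% r))

t.stat  <- gap.married / se.gap
F.stat  <- t.stat^2                     # F statistic of the single restriction
p.value <- 2 * pt(-abs(t.stat), df = df.residual(fit))

print(c(estimate = gap.married, se = se.gap, t = t.stat, F = F.stat, p = p.value))
```

Working with the t statistic has one practical advantage over the F test: it also reports the estimate and its standard error, so a confidence interval for the married gender wage gap follows directly.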

Exercise 2: The Wage Returns to Work Experience

In this exercise, we are going to study the relationship between wage and years of work experience using WAGE1.

Consider the following multiple regression model: \[wage_i = \beta_0 + \beta_1 \cdot exper_i + \beta_2 \cdot exper^2_i + \beta_3 \cdot educ_i + \beta_4 \cdot female_i + \beta_5 \cdot tenure_i + \beta_6 \cdot tenure^2_i + u_i.\]

Description of main variables:

wage: average hourly earnings.
exper: years of potential experience.
expersq: \(exper^2\).
educ: years of education.
female: \(=1\) if female.
tenure: years with current employer.
tenursq: \(tenure^2\).

fit.m2 <- lm(wage ~ exper + expersq + educ + female + tenure + tenursq)
summary(fit.m2)

## 
## Call:
## lm(formula = wage ~ exper + expersq + educ + female + tenure + 
##     tenursq)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9361 -1.6524 -0.4524  1.1177 13.3086 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.1097498  0.7106694  -2.969  0.00313 ** 
## exper        0.1878383  0.0357401   5.256 2.16e-07 ***
## expersq     -0.0037975  0.0007708  -4.927 1.13e-06 ***
## educ         0.5262551  0.0485421  10.841  < 2e-16 ***
## female      -1.7831998  0.2572159  -6.933 1.23e-11 ***
## tenure       0.2116944  0.0491737   4.305 2.00e-05 ***
## tenursq     -0.0029460  0.0016860  -1.747  0.08118 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.872 on 519 degrees of freedom
## Multiple R-squared:  0.4022, Adjusted R-squared:  0.3953 
## F-statistic: 58.19 on 6 and 519 DF,  p-value: < 2.2e-16

What is the marginal effect of work experience on wage, holding other variables fixed?

\(\frac{\partial wage}{\partial exper} = \beta_1 + 2\beta_2 \cdot exper\), which is estimated as \(\hat\beta_1 + 2\hat\beta_2 \cdot exper = 0.188 + 2\times (-0.0038) \times exper\).
So the marginal effect of experience on wage differs across individuals with different years of work experience: each additional year of experience adds less to the wage than the previous one.
Because \(\hat\beta_1 > 0\) and \(\hat\beta_2 < 0\), the estimated relationship between wage and experience is an inverted U-shaped curve.
Using the estimates \(\hat\beta_1\) and \(\hat\beta_2\), find the value of \(exper\), call it \(exper^*\), where \(wage\) is maximized: \[exper^* = - \frac{\hat\beta_1}{2\hat\beta_2}.\]

Find the optimal years of experience that maximizes wage.

Obtain the estimated coefficients

b1 <- fit.m2$coefficients["exper"]
b2 <- fit.m2$coefficients["expersq"]

Compute the optimal years of experience

-b1/(2*b2)    # years of experience that maximizes wage

##    exper 
## 24.73163
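
As a cross-check, the sketch below redoes the calculation from the rounded estimates printed in summary(fit.m2) and confirms the turning point numerically. The object names (b1.print, b2.print, marginal.effect, wage.part) and the search interval of 0 to 50 years are assumptions introduced for this illustration.

```r
# Rounded estimates copied from the summary(fit.m2) output
b1.print <- 0.1878383    # coefficient on exper
b2.print <- -0.0037975   # coefficient on expersq

# Marginal effect of one more year of experience at a given level of experience
marginal.effect <- function(exper) b1.print + 2 * b2.print * exper
print(marginal.effect(c(1, 10, 30)))   # positive early on, negative past the turning point

# Turning point from the first-order condition
exper.star <- -b1.print / (2 * b2.print)

# Numerical check: maximize the experience part of predicted wage directly
wage.part <- function(exper) b1.print * exper + b2.print * exper^2
opt <- optimize(wage.part, interval = c(0, 50), maximum = TRUE)

print(c(formula = exper.star, numerical = opt$maximum))
```

Both approaches should give a turning point of roughly 24.7 years; small differences from the value above come from rounding the coefficients.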

Here is the R code used to create the plot of wage vs. years of experience:

x.exper <- min(exper):max(exper)    # generate a sequence of years of experience
y.wage <- b2*(x.exper^2) + b1*x.exper   # experience part of predicted wage (other regressors omitted)
plot(x=x.exper, y=y.wage, type="l", xlab="experience (in years)",
     ylab="wage", main="Wage vs. Experience")
abline(v=-b1/(2*b2), col="red")  # mark the experience level at which wage is maximized

Do you agree that the marginal effect of work experience on wage depends on years of education, holding other variables fixed? How would you build a model to test this hypothesis?

First, let’s define a dummy variable, college, which =1 if educ >= 16, =0 otherwise. Create a plot to visualize how the relationship between experience and wages changes based on education level.

library(ggplot2)
wage1$college <- as.numeric(educ>=16) 
ggplot(data = wage1) + 
  geom_point(mapping = aes(x = exper, y = wage, color = factor(college)))  # factor() gives two discrete colors

Second, let’s build a model to capture the heterogeneous effect of experience across different levels of education. Examine the coefficients on the interaction terms. What do the coefficients imply about the relationship between experience and wages at different education levels?
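
One way to write the model estimated as fit.m3 is shown here. The numbering of the two interaction coefficients as \(\beta_7\) and \(\beta_8\) is an assumption chosen to match the object names b7.m3 and b8.m3 used in the code: \[wage_i = \beta_0 + \beta_1 \cdot exper_i + \beta_2 \cdot exper^2_i + \beta_3 \cdot educ_i + \beta_4 \cdot female_i + \beta_5 \cdot tenure_i + \beta_6 \cdot tenure^2_i + \beta_7 \cdot (exper_i \times educ_i) + \beta_8 \cdot (exper^2_i \times educ_i) + u_i.\]

Differentiating with respect to \(exper\) gives a marginal effect that depends on both experience and education: \[\frac{\partial wage}{\partial exper} = (\beta_1 + \beta_7 \cdot educ) + 2(\beta_2 + \beta_8 \cdot educ)\cdot exper.\]

Setting this to zero gives the turning point for a given level of education, which is a maximum when \(\beta_2 + \beta_8 \cdot educ < 0\): \[exper^*(educ) = -\frac{\beta_1 + \beta_7 \cdot educ}{2(\beta_2 + \beta_8 \cdot educ)}.\]

Under this specification, the hypothesis that the marginal effect of experience does not depend on education corresponds to \(H_0: \beta_7 = 0\) and \(\beta_8 = 0\).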

fit.m3 <- lm(wage ~ exper + expersq + educ + female + tenure + tenursq + I(exper*educ) + I(expersq*educ))
summary(fit.m3)

## 
## Call:
## lm(formula = wage ~ exper + expersq + educ + female + tenure + 
##     tenursq + I(exper * educ) + I(expersq * educ))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9474 -1.6176 -0.4741  1.0892 13.4757 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -0.0079731  1.4793193  -0.005  0.99570    
## exper             -0.1621227  0.1481098  -1.095  0.27420    
## expersq            0.0040167  0.0029319   1.370  0.17127    
## educ               0.3493568  0.1118879   3.122  0.00189 ** 
## female            -1.7401801  0.2576597  -6.754 3.89e-11 ***
## tenure             0.2077880  0.0489699   4.243 2.61e-05 ***
## tenursq           -0.0028162  0.0016826  -1.674  0.09479 .  
## I(exper * educ)    0.0298167  0.0117112   2.546  0.01119 *  
## I(expersq * educ) -0.0006824  0.0002406  -2.837  0.00474 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.854 on 517 degrees of freedom
## Multiple R-squared:  0.4118, Adjusted R-squared:  0.4027 
## F-statistic: 45.24 on 8 and 517 DF,  p-value: < 2.2e-16

b1.m3 <- fit.m3$coefficients["exper"]
b2.m3 <- fit.m3$coefficients["expersq"]

b3.m3 <- fit.m3$coefficients["educ"]
b7.m3 <- fit.m3$coefficients["I(exper * educ)"]
b8.m3 <- fit.m3$coefficients["I(expersq * educ)"]

summary(educ)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   12.00   12.00   12.56   14.00   18.00

# Experience- and education-related part of predicted wage, at two fixed education levels (educ = 12 and educ = 16)
educ.fix <- 12 
y.wage.mineduc <- b1.m3*x.exper + b2.m3*(x.exper^2) + b3.m3*educ.fix + 
                  b7.m3*x.exper*educ.fix + b8.m3*(x.exper^2)*educ.fix

educ.fix <- 16 
y.wage.maxeduc <- b1.m3*x.exper + b2.m3*(x.exper^2) + b3.m3*educ.fix + 
                  b7.m3*x.exper*educ.fix + b8.m3*(x.exper^2)*educ.fix

plot(x=x.exper, y=y.wage.mineduc, type="l", xlab="experience (in years)",
     ylab="wage", main="Wage vs. Experience",
     ylim=c(min(y.wage.mineduc), max(y.wage.maxeduc)), col="red")

lines(x=x.exper, y=y.wage.maxeduc, type="l", col="blue")

abline(v= - (b1.m3 + b7.m3 * 12) / (2 * b2.m3 + 2 * b8.m3 * 12), col="red")  # experience level that maximizes wage when educ=12

abline(v= - (b1.m3 + b7.m3 * 16) / (2 * b2.m3 + 2 * b8.m3 * 16), col="blue")  # experience level that maximizes wage when educ=16
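
The two vertical lines can also be computed as numbers. The sketch below wraps the turning-point formula used in the abline() calls in a small function, using the rounded estimates printed in summary(fit.m3). The names ending in .print and the helper functions are assumptions introduced for this illustration.

```r
# Rounded estimates copied from the summary(fit.m3) output
b1.print <- -0.1621227   # exper
b2.print <-  0.0040167   # expersq
b7.print <-  0.0298167   # I(exper * educ)
b8.print <- -0.0006824   # I(expersq * educ)

# Marginal effect of experience at given experience and education
slope.exper <- function(exper, educ) {
  (b1.print + b7.print * educ) + 2 * (b2.print + b8.print * educ) * exper
}

# Coefficient on exper^2 at a given education level; negative means an inverted U
curvature <- function(educ) b2.print + b8.print * educ

# Experience level at which the marginal effect is zero
turning.point <- function(educ) {
  -(b1.print + b7.print * educ) / (2 * (b2.print + b8.print * educ))
}

educ.levels <- c(12, 16)
print(data.frame(educ       = educ.levels,
                 curvature  = curvature(educ.levels),
                 exper.star = turning.point(educ.levels)))
```

With these rounded estimates, both turning points come out at roughly 23 years, so the main difference between the two curves lies in how steeply wage rises before the peak and falls after it. The curvature column is worth checking first: the turning point is a maximum only where it is negative.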