How does fertility affect labor supply? That is, how much does a woman’s labor supply fall when she has an additional child? In this exercise, you will estimate this effect using data for married women from the 1980 U.S. Census. The data are available on the text website, http://www.pearsonglobaleditions.com, in the file Fertility and described in the file Fertility_Description. The data set contains information on married women aged 21–35 with two or more children.
rm(list=ls())
library(openxlsx)
library(lmtest)
library(sandwich)
library(AER)
id <- "1F0XVHnHEBR5OQyiV7xy2el8efMPbrPQn"
fertility <- read.xlsx(sprintf("https://docs.google.com/uc?id=%s&export=download",id),sheet=1,startRow=1,colNames=TRUE,rowNames=FALSE)
str(fertility)
## 'data.frame': 254654 obs. of 9 variables:
## $ morekids: num 0 0 0 0 0 0 0 0 0 0 ...
## $ boy1st : num 1 0 1 1 0 1 0 1 0 1 ...
## $ boy2nd : num 0 1 0 0 0 0 1 1 1 0 ...
## $ samesex : num 0 0 0 0 1 0 0 1 0 0 ...
## $ agem1 : num 27 30 27 35 30 26 29 33 29 27 ...
## $ black : num 0 0 0 1 0 0 0 0 0 0 ...
## $ hispan : num 0 0 0 0 0 0 0 0 0 0 ...
## $ othrace : num 0 0 0 0 0 0 0 0 0 0 ...
## $ weeksm1 : num 0 30 0 0 22 40 0 52 0 0 ...
- Regress weeksm1 on the indicator variable morekids, using OLS. On average, do women with more than two children work less than women with two children? How much less?
model.a <- weeksm1 ~ morekids
fit.a <- lm(model.a, data=fertility)
coeftest(fit.a, vcov=vcovHC, type="HC1") #use heteroskedasticity robust SE
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21.068428 0.056068 375.765 < 2.2e-16 ***
## morekids -5.386996 0.087149 -61.813 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
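[Ans] The coefficient is \(-5.39\): on average, women with more than two children work \(5.39\) fewer weeks per year than women with exactly two children (the sample contains only women with two or more children). The difference is highly statistically significant (\(t = -61.8\)).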
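Because \(morekids\) is a 0/1 indicator, the OLS slope is simply the difference in mean weeks worked between the two groups, and the intercept is the mean for the \(morekids = 0\) group. The following is a minimal sketch on simulated data; the variable names and the values 21 and -5 are illustrative assumptions, not the Census data.

```r
# Simulated illustration only; these are NOT the Census data.
set.seed(1)
n <- 1000
morekids <- rbinom(n, 1, 0.4)                        # hypothetical 0/1 indicator
weeks    <- 21 - 5 * morekids + rnorm(n, sd = 10)    # hypothetical outcome

fit <- lm(weeks ~ morekids)
slope     <- unname(coef(fit)["morekids"])
intercept <- unname(coef(fit)["(Intercept)"])

# Difference in group means and mean of the baseline group
mean_diff <- mean(weeks[morekids == 1]) - mean(weeks[morekids == 0])
mean_base <- mean(weeks[morekids == 0])

# Both comparisons agree up to floating-point error
all.equal(slope, mean_diff)
all.equal(intercept, mean_base)
```

Applied to the output above, the implied mean for women with more than two children is \(21.07 - 5.39 \approx 15.68\) weeks per year.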
- Explain why the OLS regression estimated in (a) is inappropriate for estimating the causal effect of fertility (morekids) on labor supply (weeksm1).
[Ans] Both fertility and weeks worked are choice variables. A woman with a positive labor supply regression error (a woman who works more than average) may also be a woman who is less likely to have an additional child. This would imply that \(morekids\) is negatively correlated with the error, so that the OLS estimator of \(\beta_{morekids}\) is negatively biased: it overstates the reduction in weeks worked caused by an additional child.
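The link between this correlation and the bias can be made explicit with the standard large-sample expression for the OLS slope. Writing the labor supply equation as \(weeksm1_i = \beta_0 + \beta_1 morekids_i + u_i\) (notation assumed here for illustration):

\[
\hat\beta_1^{OLS} \xrightarrow{p} \beta_1 + \frac{\operatorname{cov}(morekids_i, u_i)}{\operatorname{var}(morekids_i)}
\]

The second term vanishes only if \(morekids\) is uncorrelated with \(u\). Otherwise OLS is inconsistent, and the sign of the bias is the sign of that covariance.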
- The data set contains the variable samesex, which is equal to 1 if the first two children are of the same sex (boy–boy or girl–girl) and equal to 0 otherwise. Are couples whose first two children are of the same sex more likely to have a third child? Is the effect large? Is it statistically significant?
model.c <- morekids ~ samesex
fit.c <- lm(model.c, data=fertility)
summary(fit.c)
##
## Call:
## lm(formula = model.c, data = fertility)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.4139 -0.4139 -0.3464 0.5860 0.6536
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.346425 0.001365 253.79 <2e-16 ***
## samesex 0.067525 0.001920 35.17 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4844 on 254652 degrees of freedom
## Multiple R-squared: 0.004835, Adjusted R-squared: 0.004831
## F-statistic: 1237 on 1 and 254652 DF, p-value: < 2.2e-16
coeftest(fit.c, vcov=vcovHC, type="HC1") #use heteroskedasticity robust SE
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.346425 0.001341 258.335 < 2.2e-16 ***
## samesex 0.067525 0.001919 35.188 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[Ans] Regress \(morekids\) on \(samesex\). We get \(\hat\beta_{samesex}=0.068\), so couples with \(samesex = 1\) are \(6.8\) percentage points more likely to have an additional child than couples with \(samesex = 0\). Relative to the baseline probability of \(0.346\), this is a modest effect, but it is highly statistically significant (\(t = 35.19\), with a p-value of essentially \(0\)).
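Since the dependent variable is binary, this regression is a linear probability model and its coefficients can be read as probabilities. A short sketch of the arithmetic, using the coefficients copied from the output above (the object names are illustrative):

```r
# Coefficients copied from the first-stage regression output above
b0 <- 0.346425   # P(morekids = 1) when the first two children differ in sex
b1 <- 0.067525   # change in that probability when samesex = 1

p_mixed  <- b0          # predicted probability, samesex = 0
p_same   <- b0 + b1     # predicted probability, samesex = 1
relative <- b1 / b0     # effect relative to the baseline probability

round(c(p_mixed = p_mixed, p_same = p_same, relative = relative), 3)
```

The predicted probability rises from about 0.346 to about 0.414, which is a relative increase of roughly one fifth.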
- Explain why samesex is a valid instrument for the IV regression of weeksm1 on morekids.
[Ans] The sex composition of the first two children is essentially random, so \(samesex\) is plausibly unrelated to the other determinants of labor supply, including the error term in the labor supply equation. Thus, the instrument is exogenous. From (c), the first-stage F-statistic is far larger than 10 (\(F = 1237\)); equivalently, the t-statistic is far larger than \(\sqrt{10}\) (\(t=35.19\)). The instrument is therefore relevant and strong. Together, these imply that \(samesex\) is a valid instrument.
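The \(F = 1237\) reported by `summary()` is the homoskedasticity-only statistic. With a single instrument, a heteroskedasticity-robust first-stage F-statistic is just the square of the robust t-statistic, so it can be recovered from the `coeftest()` output in (c). A small sketch of that arithmetic (object names are illustrative):

```r
# Robust t-statistic on samesex, copied from the coeftest() output in (c)
t_robust <- 35.188

# With one instrument, the first-stage F-statistic equals t^2
F_robust <- t_robust^2

# Rule-of-thumb check for a weak instrument: F > 10
F_robust > 10
```

The robust and homoskedasticity-only statistics are nearly identical here, so the conclusion does not depend on which one is used.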
- Is samesex a weak instrument?
[Ans] No. The first-stage F-statistic (\(F = 1237\)) far exceeds the rule-of-thumb threshold of 10; see the answer to (d).
- Estimate the IV regression of weeksm1 on morekids, using samesex as an instrument. How large is the fertility effect on labor supply?
# perform TSLS using 'ivreg()'
fit.f <- ivreg(weeksm1 ~ morekids | samesex, data = fertility)
coeftest(fit.f, vcov = vcovHC, type = "HC1")
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21.42109 0.48725 43.9632 < 2.2e-16 ***
## morekids -6.31369 1.27469 -4.9531 7.308e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
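[Ans] The IV estimate is \(\hat\beta_{morekids} = -6.31\) (standard error \(1.27\)): having more than two children is estimated to reduce labor supply by about \(6.3\) weeks per year. This is somewhat larger in magnitude than the OLS estimate of \(-5.39\) from (a), but it is much less precisely estimated.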
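To see what TSLS computes with one binary instrument, the point estimate can be reproduced by hand in two ways: by running the two stages with `lm()`, or as the ratio of the instrument's effect on the outcome to its effect on the regressor (the Wald estimator). The sketch below uses simulated data; all names and parameter values are hypothetical and are not the Census data.

```r
# Simulated illustration only; names and parameter values are hypothetical.
set.seed(42)
n     <- 5000
z     <- rbinom(n, 1, 0.5)                            # binary instrument
taste <- rnorm(n)                                     # unobserved confounder
x     <- rbinom(n, 1, plogis(-0.5 + 0.8 * z - 0.5 * taste))  # endogenous regressor
y     <- 21 - 6 * x + 5 * taste + rnorm(n, sd = 10)   # outcome

# Stage 1: regress the endogenous regressor on the instrument
x_hat <- fitted(lm(x ~ z))

# Stage 2: regress the outcome on the first-stage fitted values
b_2sls <- unname(coef(lm(y ~ x_hat))["x_hat"])

# Wald estimator: reduced-form difference over first-stage difference
wald <- (mean(y[z == 1]) - mean(y[z == 0])) /
        (mean(x[z == 1]) - mean(x[z == 0]))

# The two point estimates coincide up to floating-point error
all.equal(b_2sls, wald)
```

The standard errors from the second-stage `lm()` are not valid, which is why the estimate above is computed with `ivreg()` and robust standard errors. By the same ratio logic, the estimates above imply a reduced-form effect of \(samesex\) on weeks worked of about \(-6.31 \times 0.0675 \approx -0.43\) weeks.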
- Do the results change when you include the variables agem1, black, hispan, and othrace in the labor supply regression (treating these variables as exogenous)? Explain why or why not.
fit.g <- ivreg(weeksm1 ~ morekids + agem1 + black + hispan + othrace | samesex + agem1 + black + hispan + othrace, data = fertility)
coeftest(fit.g, vcov = vcovHC, type = "HC1")
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.791894 0.389791 -12.2935 < 2.2e-16 ***
## morekids -5.821051 1.246401 -4.6703 3.009e-06 ***
## agem1 0.831598 0.022641 36.7300 < 2.2e-16 ***
## black 11.623273 0.231798 50.1440 < 2.2e-16 ***
## hispan 0.404180 0.260799 1.5498 0.1212
## othrace 2.130962 0.210988 10.0999 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[Ans] The estimated value is \(\hat\beta_{morekids} = -5.82\), compared with \(-6.31\) in (f). The results do not change in an important way. The reason is that \(samesex\) is unrelated to \(agem1\), \(black\), \(hispan\), and \(othrace\), so omitting these variables causes no omitted variable bias in the IV regression in (f).
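As an informal gauge of how small the change is, the difference between the two IV estimates can be compared with their standard errors. This is only a rough check, not a formal test, because both estimates come from the same sample and are therefore correlated. The numbers are copied from the two `coeftest()` outputs; the object names are illustrative.

```r
# IV estimates and robust standard errors copied from the outputs above
b_f <- -6.31369;  se_f <- 1.27469    # without controls
b_g <- -5.821051; se_g <- 1.246401   # with agem1, black, hispan, othrace

# Change in the point estimate when the controls are added
change <- b_g - b_f

# Change expressed in units of each standard error (informal comparison only)
c(change = change, in_se_f = change / se_f, in_se_g = change / se_g)
```

The estimate moves by about half a week, which is less than half of either standard error.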