Lab12

R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

rm(list=ls()) 
library(openxlsx)
library(AER)             # load package; to run IV regression; also contains data

## Loading required package: car

## Loading required package: carData

## Loading required package: lmtest

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

## Loading required package: sandwich

## Loading required package: survival

library(stargazer)        # load package; to put regression results into a single stargazer table

## 
## Please cite as:

##  Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.

##  R package version 5.2.2. https://CRAN.R-project.org/package=stargazer

id <- "1F0XVHnHEBR5OQyiV7xy2el8efMPbrPQn"
fertility <- read.xlsx(sprintf("https://docs.google.com/uc?id=%s&export=download",id),
                 sheet=1,startRow=1,colNames=TRUE,rowNames=FALSE)

Exercise R12.1

How does fertility affect labor supply? That is, how much does a woman’s labor supply fall when she has an additional child? In this exercise you will estimate this effect using data for married women from the 1980 U.S. Census. The data are available on the textbook website, http://www.pearsonhighered.com/stock_watson, in the file Fertility and described in the file Fertility_Description. The data set contains information on married women aged 21–35 with two or more children.

Regress weeksworked on the indicator variable morekids, using OLS. On average, do women with more than two children work less than women with two children? How much less?

str(fertility)

## 'data.frame':    254654 obs. of  9 variables:
##  $ morekids: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ boy1st  : num  1 0 1 1 0 1 0 1 0 1 ...
##  $ boy2nd  : num  0 1 0 0 0 0 1 1 1 0 ...
##  $ samesex : num  0 0 0 0 1 0 0 1 0 0 ...
##  $ agem1   : num  27 30 27 35 30 26 29 33 29 27 ...
##  $ black   : num  0 0 0 1 0 0 0 0 0 0 ...
##  $ hispan  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ othrace : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ weeksm1 : num  0 30 0 0 22 40 0 52 0 0 ...

model1 <- (weeksm1~morekids)
fit.1 <- lm(model1, data = fertility)
coeftest(fit.1, vcov = vcovHC, type = "HC1")

## 
## t test of coefficients:
## 
##              Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) 21.068428   0.056068 375.765 < 2.2e-16 ***
## morekids    -5.386996   0.087149 -61.813 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Answer:

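Yes. According to the regression, women with more than two children work about 5.39 fewer weeks on average than women with two children, and the difference is statistically significant.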
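Because morekids is a 0/1 indicator, the intercept is the average for women with two children and the intercept plus the slope is the average for women with more than two. A small sketch of that arithmetic, using the coefficients copied from the output above (the variable names are illustrative only):

```r
# Coefficients copied from the coeftest() output above
intercept <- 21.068428   # average weeks worked, women with exactly two children
slope     <- -5.386996   # difference for women with more than two children

weeks_two_kids  <- intercept
weeks_more_kids <- intercept + slope   # average weeks worked when morekids = 1

weeks_more_kids              # about 15.68 weeks
slope / weeks_two_kids       # relative change, about -0.26
```

In relative terms the gap is roughly a quarter of the weeks worked by women with two children.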

Explain why the OLS regression estimated in (a) is inappropriate for estimating the causal effect of fertility (morekids) on labor supply (weeksworked).

Answer:

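The regressor morekids is likely endogenous. The regression omits variables that affect both fertility and labor supply, so the coefficient suffers from omitted variable bias. It also assumes that having more children is what makes mothers work less, which is not always the case: causality can run the other way, with women who plan to work less choosing to have more children.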
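The concern can be stated with the standard large-sample expression for the OLS slope. The notation is introduced here for illustration and is not part of the lab: X_i stands for morekids and u_i for the regression error, which collects every omitted determinant of weeks worked.

```latex
\hat{\beta}_1 \xrightarrow{\;p\;} \beta_1 + \frac{\operatorname{cov}(X_i, u_i)}{\operatorname{var}(X_i)}
```

OLS recovers the causal effect only when the covariance term is zero, which fails if omitted factors or reverse causality link morekids to the error.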

The data set contains the variable samesex, which is equal to 1 if the first two children are of the same sex (boy–boy or girl–girl) and equal to 0 otherwise. Are couples whose first two children are of the same sex more likely to have a third child? Is the effect large? Is it statistically significant?

model2 <- (weeksm1~morekids+samesex)
model3 <- (morekids~samesex)

fit.2 <- lm(model3, data = fertility) #linear
coeftest(fit.2, vcov = vcovHC, type = "HC1")

## 
## t test of coefficients:
## 
##             Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) 0.346425   0.001341 258.335 < 2.2e-16 ***
## samesex     0.067525   0.001919  35.188 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

fit.2b <- glm(model3, data = fertility, family = binomial(link="probit"))
coeftest(fit.2b, vcov = vcovHC, type = "HC1")

## 
## z test of coefficients:
## 
##               Estimate Std. Error  z value  Pr(>|z|)    
## (Intercept) -0.3949909  0.0036341 -108.691 < 2.2e-16 ***
## samesex      0.1775954  0.0050615   35.087 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

pnorm(-0.1776)

## [1] 0.4295186

Answer:

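Yes. In the linear probability model, couples whose first two children are of the same sex are about 6.75 percentage points more likely to have a third child (41.4% versus 34.6%). The effect is modest in size but highly statistically significant (t = 35.2).

The probit model gives the same answer. The value pnorm(-0.1776) = 0.4295 is not the effect of samesex; the implied change in probability is pnorm(-0.3950 + 0.1776) - pnorm(-0.3950), which is about 0.068.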
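A probit coefficient is a shift in the index, not a change in probability, so the effect has to be computed as a difference of two fitted probabilities. A minimal sketch using the probit coefficients copied from the output above (the variable names are illustrative only):

```r
# Probit coefficients copied from the glm() output above
b0 <- -0.3949909   # intercept
b1 <-  0.1775954   # coefficient on samesex

p_mixed   <- pnorm(b0)        # P(morekids = 1 | samesex = 0)
p_samesex <- pnorm(b0 + b1)   # P(morekids = 1 | samesex = 1)

p_samesex - p_mixed           # change in probability, roughly 0.068
```

With a single binary regressor the probit and the linear probability model both fit the two group proportions, so they imply the same effect.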


Explain why samesex is a valid instrument for the instrumental variable regression of weeksworked on morekids.

cor(fertility$morekids, fertility$samesex)

## [1] 0.06953403

Answer:

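A valid instrument must be relevant and exogenous. Relevance: samesex is correlated with morekids (correlation of about 0.069), because couples whose first two children are of the same sex are more likely to have a third. Exogeneity: the sex of a child is essentially random, so samesex should be uncorrelated with the other factors that determine the mother's labor supply and should affect weeks worked only through the decision to have more children.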
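The two requirements can be written compactly. The notation is an assumption made here for illustration: Z_i is samesex, X_i is morekids, and u_i is the error in the labor supply equation.

```latex
\operatorname{corr}(Z_i, X_i) \neq 0 \quad \text{(relevance)}, \qquad
\operatorname{corr}(Z_i, u_i) = 0 \quad \text{(exogeneity)}
```

Only the first condition can be checked directly in the data; with a single instrument the second rests on the argument that child sex is random.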

Is samesex a weak instrument?

fit.3 <- lm(model3, data = fertility)
linearHypothesis(fit.3, c("samesex=0"), white.adjust=c("hc1"))

## Linear hypothesis test
## 
## Hypothesis:
## samesex = 0
## 
## Model 1: restricted model
## Model 2: morekids ~ samesex
## 
## Note: Coefficient covariance matrix supplied.
## 
##   Res.Df Df      F    Pr(>F)    
## 1 254653                        
## 2 254652  1 1238.2 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

coeftest(fit.3, vcov = vcovHC, type = "HC1")

## 
## t test of coefficients:
## 
##             Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) 0.346425   0.001341 258.335 < 2.2e-16 ***
## samesex     0.067525   0.001919  35.188 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Answer:

No. The first-stage F statistic is 1238.2, far above the rule-of-thumb threshold of 10, so samesex is a strong instrument.
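With a single instrument, the first-stage F statistic is the square of the robust t statistic on that instrument, which is why the two outputs above tell the same story. A quick check using the t value copied from the coeftest() output (the variable names are illustrative only):

```r
# Robust t statistic on samesex, copied from the first-stage output above
t_samesex <- 35.188

f_stat <- t_samesex^2   # F statistic for the single restriction samesex = 0
f_stat                  # about 1238.2, matching linearHypothesis()
f_stat > 10             # compare with the rule-of-thumb threshold of 10
```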

Estimate the regression of weeksworked on morekids, using samesex as an instrument. How large is the fertility effect on labor supply?

fit.4 <- ivreg(weeksm1~morekids+samesex |morekids+ samesex, data = fertility) # morekids is listed on both sides of |, so it is treated as exogenous and this fit is OLS, not IV
coeftest(fit.4, vcov = vcovHC, type = "HC1")

## 
## t test of coefficients:
## 
##              Estimate Std. Error  t value Pr(>|t|)    
## (Intercept) 21.098504   0.069626 303.0278   <2e-16 ***
## morekids    -5.382493   0.087374 -61.6027   <2e-16 ***
## samesex     -0.062879   0.086277  -0.7288   0.4661    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Answer:

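The regression above does not instrument morekids: because morekids appears on both sides of |, it is treated as exogenous, and the estimate of -5.38 is the OLS coefficient with samesex added as a control. With morekids removed from the instrument part, which I think is the correct specification, the coefficient is -6.31 and is statistically significant, meaning that having more than two children reduces labor supply by about 6.31 weeks.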
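To illustrate why the instrument list matters, here is a sketch on simulated data, not the census sample. All names and parameter values are hypothetical. It compares OLS with the single-instrument IV estimator cov(y, z) / cov(x, z) when an unobserved factor drives both the regressor and the outcome.

```r
set.seed(1)
n <- 50000

# Simulated data (hypothetical, not the census sample)
z     <- rbinom(n, 1, 0.5)                                # instrument, randomly assigned
taste <- rnorm(n)                                         # unobserved confounder
x     <- as.numeric(0.8 * z - 0.4 + taste + rnorm(n) > 0) # endogenous binary regressor
y     <- 20 - 6 * x + 5 * taste + rnorm(n, sd = 5)        # true effect of x is -6

ols <- unname(coef(lm(y ~ x))["x"])   # biased: x is correlated with taste
iv  <- cov(y, z) / cov(x, z)          # IV estimator with one instrument

c(ols = ols, iv = iv)

# On the lab data, the matching AER call would list only the instrument
# to the right of the bar:
#   ivreg(weeksm1 ~ morekids | samesex, data = fertility)
```

In this simulation OLS lands far from -6 while the IV estimate is close to it, which is the pattern the instrument is meant to deliver when it is valid.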

Do the results change when you include the variables agem1, black, hispan, and othrace in the labor supply regression (treating these variables as exogenous)? Explain why or why not.

# morekids is listed on both sides of |, so it is again treated as exogenous (OLS, not IV)
fit.4 <- ivreg(weeksm1 ~ morekids + samesex + agem1 + black + hispan + othrace |
                 morekids + samesex + agem1 + black + hispan + othrace,
               data = fertility)
coeftest(fit.4, vcov = vcovHC, type = "HC1")

## 
## t test of coefficients:
## 
##              Estimate Std. Error  t value  Pr(>|t|)    
## (Intercept) -4.849300   0.369975 -13.1071 < 2.2e-16 ***
## morekids    -6.232476   0.086472 -72.0753 < 2.2e-16 ***
## samesex      0.027980   0.084990   0.3292  0.741991    
## agem1        0.837929   0.012118  69.1453 < 2.2e-16 ***
## black       11.664631   0.195532  59.6559 < 2.2e-16 ***
## hispan       0.466398   0.180707   2.5810  0.009853 ** 
## othrace      2.142279   0.208278  10.2857 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Answer:

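In the output above, the coefficient on morekids moves from -5.38 to -6.23 once the controls are added. Because morekids appears on both sides of |, both fits are effectively OLS, and a change of this kind points to omitted variable bias in the regression without controls. To answer the question for the IV estimate, samesex should appear only to the right of |, with agem1, black, hispan, and othrace on both sides. If samesex is essentially random, it should be roughly uncorrelated with age and race, so the IV estimate would be expected to change little when these controls are included.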
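The reasoning can be sketched on simulated data, not the census sample. All names and parameter values below are hypothetical. The control w affects both the regressor and the outcome but is independent of the randomly assigned instrument z, so two-stage least squares gives similar point estimates with and without it. The stages are run by hand with lm(), which reproduces the 2SLS point estimate but not valid standard errors.

```r
set.seed(2)
n <- 50000

# Simulated data (hypothetical): z is random, so it is independent of w
z     <- rbinom(n, 1, 0.5)   # instrument
w     <- rnorm(n)            # observed exogenous control
taste <- rnorm(n)            # unobserved confounder
x     <- as.numeric(0.8 * z - 0.4 + 0.5 * w + taste + rnorm(n) > 0)
y     <- 20 - 6 * x + 2 * w + 5 * taste + rnorm(n, sd = 5)   # true effect is -6

# 2SLS without the control: first stage on z, second stage on fitted x
xhat_no_w <- fitted(lm(x ~ z))
iv_no_w   <- unname(coef(lm(y ~ xhat_no_w))[2])

# 2SLS with the control: w enters both stages
xhat_w <- fitted(lm(x ~ z + w))
iv_w   <- unname(coef(lm(y ~ xhat_w + w))[2])

c(without_control = iv_no_w, with_control = iv_w)

# On the lab data, the matching AER call would keep the controls on both sides:
#   ivreg(weeksm1 ~ morekids + agem1 + black + hispan + othrace |
#           samesex + agem1 + black + hispan + othrace, data = fertility)
```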