rm(list=ls())
library(openxlsx)
library(AER) # load package; to run IV regression; also contains data
## Loading required package: car
## Loading required package: carData
## Loading required package: lmtest
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: sandwich
## Loading required package: survival
library(stargazer) # load package; to put regression results into a single stargazer table
##
## Please cite as:
## Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.2. https://CRAN.R-project.org/package=stargazer
id <- "1F0XVHnHEBR5OQyiV7xy2el8efMPbrPQn"
fertility <- read.xlsx(sprintf("https://docs.google.com/uc?id=%s&export=download",id),
sheet=1,startRow=1,colNames=TRUE,rowNames=FALSE)
How does fertility affect labor supply? That is, how much does a woman’s labor supply fall when she has an additional child? In this exercise you will estimate this effect using data for married women from the 1980 U.S. Census. The data are available on the textbook website, http://www.pearsonhighered.com/stock_watson, in the file Fertility and described in the file Fertility_Description. The data set contains information on married women aged 21–35 with two or more children.
str(fertility)
## 'data.frame': 254654 obs. of 9 variables:
## $ morekids: num 0 0 0 0 0 0 0 0 0 0 ...
## $ boy1st : num 1 0 1 1 0 1 0 1 0 1 ...
## $ boy2nd : num 0 1 0 0 0 0 1 1 1 0 ...
## $ samesex : num 0 0 0 0 1 0 0 1 0 0 ...
## $ agem1 : num 27 30 27 35 30 26 29 33 29 27 ...
## $ black : num 0 0 0 1 0 0 0 0 0 0 ...
## $ hispan : num 0 0 0 0 0 0 0 0 0 0 ...
## $ othrace : num 0 0 0 0 0 0 0 0 0 0 ...
## $ weeksm1 : num 0 30 0 0 22 40 0 52 0 0 ...
model1 <- (weeksm1~morekids)
fit.1 <- lm(model1, data = fertility)
coeftest(fit.1, vcov = vcovHC, type = "HC1")
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21.068428 0.056068 375.765 < 2.2e-16 ***
## morekids -5.386996 0.087149 -61.813 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
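Yes. According to the OLS regression, women with more than two children work about 5.39 fewer weeks on average than women with exactly two children, and the difference is statistically significant.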
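As a quick arithmetic check on this reading, the sketch below plugs the reported coefficients into the fitted line to get the implied average weeks worked in each group. The variable names are illustrative and not part of the original analysis.

```r
# Coefficients copied from the coeftest output above
b0 <- 21.068428   # intercept: average weeks worked when morekids = 0
b1 <- -5.386996   # change in average weeks worked when morekids = 1

# With a single binary regressor, the OLS fitted values are the two group means
weeks_two_kids  <- b0
weeks_more_kids <- b0 + b1

round(c(two_kids = weeks_two_kids, more_kids = weeks_more_kids), 2)
```

The slope is just the gap between the two group averages, which is why it cannot by itself separate the effect of children from other differences between the groups.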
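The OLS estimate is likely to suffer from omitted variable bias: factors that affect both fertility and labor supply are left in the error term and are correlated with morekids. The regression also assumes that having more children is what makes mothers work less, which is not always the case.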
model2 <- (weeksm1~morekids+samesex)
model3 <- (morekids~samesex)
fit.2 <- lm(model3, data = fertility) #linear
coeftest(fit.2, vcov = vcovHC, type = "HC1")
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.346425 0.001341 258.335 < 2.2e-16 ***
## samesex 0.067525 0.001919 35.188 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
fit.2b <- glm(model3, data = fertility, family = binomial(link="probit"))
coeftest(fit.2b, vcov = vcovHC, type = "HC1")
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.3949909 0.0036341 -108.691 < 2.2e-16 ***
## samesex 0.1775954 0.0050615 35.087 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
pnorm(-0.1776)
## [1] 0.4295186
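Yes. According to the linear model, couples whose first two children are of the same sex are about 6.75 percentage points more likely to have another child. The effect is small but statistically significant.
The probit coefficient (0.178) is on the index scale and is not itself a probability, so pnorm(-0.1776) = 0.4295 should not be read as a 42.95% higher chance. The implied change in probability is pnorm(-0.3950 + 0.1776) - pnorm(-0.3950), roughly 0.068, in line with the linear model.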
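To turn the probit coefficients into a change in probability, the predicted probability has to be evaluated at samesex = 1 and at samesex = 0 and the two compared. The sketch below does this with the coefficients reported above; the variable names are illustrative.

```r
# Probit coefficients copied from the coeftest output above
a0 <- -0.3949909   # intercept
a1 <-  0.1775954   # samesex

p_mixed <- pnorm(a0)         # P(morekids = 1 | samesex = 0)
p_same  <- pnorm(a0 + a1)    # P(morekids = 1 | samesex = 1)
effect  <- p_same - p_mixed  # change in probability from samesex

round(c(p_mixed = p_mixed, p_same = p_same, effect = effect), 4)
```

Because samesex is the only regressor and it is binary, the probit and the linear model should both reproduce the two group proportions, so the two estimated effects are expected to agree closely.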
cor(fertility$morekids, fertility$samesex)
## [1] 0.06953403
samesex is plausibly a valid instrument. The sex of a child is essentially random, so samesex should not be correlated with the other factors that determine labor supply (exogeneity). It is, however, correlated with morekids (correlation of about 0.069), which makes it relevant.
fit.3 <- lm(model3, data = fertility)
linearHypothesis(fit.3, c("samesex=0"), white.adjust=c("hc1"))
## Linear hypothesis test
##
## Hypothesis:
## samesex = 0
##
## Model 1: restricted model
## Model 2: morekids ~ samesex
##
## Note: Coefficient covariance matrix supplied.
##
## Res.Df Df F Pr(>F)
## 1 254653
## 2 254652 1 1238.2 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coeftest(fit.3, vcov = vcovHC, type = "HC1")
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.346425 0.001341 258.335 < 2.2e-16 ***
## samesex 0.067525 0.001919 35.188 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
According to the F test it is a strong instrument: the first-stage F statistic of 1238.2 is far above the usual rule-of-thumb threshold of 10.
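With a single instrument, the first-stage F statistic should equal the square of the t statistic on that instrument when both use the same robust covariance matrix. The sketch below checks this using the t value reported above; the threshold of 10 is the common rule of thumb, not a formal test.

```r
# Robust first-stage t statistic on samesex, copied from the coeftest output
t_samesex <- 35.188

# With one restriction, the F statistic is the squared t statistic
F_first_stage <- t_samesex^2

round(F_first_stage, 1)   # compare with the F reported by linearHypothesis
F_first_stage > 10        # rule-of-thumb check for a weak instrument
```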
fit.4 <- ivreg(weeksm1~morekids+samesex |morekids+ samesex, data = fertility) #should I use morekids?
coeftest(fit.4, vcov = vcovHC, type = "HC1")
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21.098504 0.069626 303.0278 <2e-16 ***
## morekids -5.382493 0.087374 -61.6027 <2e-16 ***
## samesex -0.062879 0.086277 -0.7288 0.4661
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
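The output above lists morekids among the instruments, so it simply reproduces OLS with samesex as an extra control. I think the correct specification drops morekids from the instrument part and uses samesex as the only instrument, that is, ivreg(weeksm1 ~ morekids | samesex, data = fertility). With that specification we obtain a coefficient of -6.31, and it is statistically significant, meaning about 6.31 fewer weeks of work.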
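To illustrate why the instrument list matters, here is a minimal sketch on simulated data (not the census data) that runs two-stage least squares by hand in base R. The data-generating process and all variable names are assumptions made up for the illustration; z plays the role of samesex and x the role of morekids.

```r
set.seed(1)
n <- 50000

# Hypothetical data-generating process
z <- rbinom(n, 1, 0.5)                                # instrument
u <- rnorm(n)                                         # unobserved confounder
x <- as.numeric(0.5 * z + 0.5 * u + rnorm(n) > 0.5)   # endogenous regressor
y <- 20 - 6 * x + 5 * u + rnorm(n)                    # true effect of x is -6

# OLS is biased because x is correlated with u
ols <- unname(coef(lm(y ~ x))["x"])

# Two-stage least squares by hand
x_hat <- fitted(lm(x ~ z))                      # first stage: x on the instrument
tsls  <- unname(coef(lm(y ~ x_hat))["x_hat"])   # second stage: y on fitted x

# With one instrument and one regressor this equals the Wald ratio
wald <- cov(y, z) / cov(x, z)

round(c(ols = ols, tsls = tsls, wald = wald), 3)

# Equivalent call on the real data with AER (not run here):
# ivreg(weeksm1 ~ morekids | samesex, data = fertility)
```

The hand-rolled second stage gives the right point estimate but not the right standard errors, so in practice ivreg followed by coeftest is the better route for inference.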
fit.4 <- ivreg(weeksm1~morekids+ samesex+ agem1+ black+ hispan+ othrace |morekids+ samesex+ agem1+ black+ hispan+ othrace, data = fertility) #should I use morekids?
coeftest(fit.4, vcov = vcovHC, type = "HC1")
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.849300 0.369975 -13.1071 < 2.2e-16 ***
## morekids -6.232476 0.086472 -72.0753 < 2.2e-16 ***
## samesex 0.027980 0.084990 0.3292 0.741991
## agem1 0.837929 0.012118 69.1453 < 2.2e-16 ***
## black 11.664631 0.195532 59.6559 < 2.2e-16 ***
## hispan 0.466398 0.180707 2.5810 0.009853 **
## othrace 2.142279 0.208278 10.2857 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
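Yes, the result changes: the morekids coefficient moves from -5.38 to -6.23 once agem1, black, hispan, and othrace are added, which may point to omitted variable bias in the specification without controls. Note that morekids is again listed among the instruments, so this fit is also numerically OLS. An IV version would keep the controls on both sides of the bar and use samesex as the excluded instrument.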
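When controls are included, they belong in both stages, while the endogenous regressor appears only in the outcome equation. The sketch below shows this on simulated data (not the census data) with two-stage least squares done by hand; the data-generating process and the names z, w, x, y are assumptions for illustration only.

```r
set.seed(2)
n <- 50000

# Hypothetical data-generating process
z <- rbinom(n, 1, 0.5)   # excluded instrument, in the role of samesex
w <- rnorm(n)            # exogenous control, in the role of agem1
u <- rnorm(n)            # unobserved confounder
x <- as.numeric(0.5 * z + 0.3 * w + 0.5 * u + rnorm(n) > 0.5)
y <- 20 - 6 * x + 2 * w + 5 * u + rnorm(n)   # true effect of x is -6

# OLS with the control is still biased by u
ols_controls <- unname(coef(lm(y ~ x + w))["x"])

# First stage: endogenous regressor on the instrument AND the control
x_hat <- fitted(lm(x ~ z + w))

# Second stage: outcome on fitted x AND the same control
tsls_controls <- unname(coef(lm(y ~ x_hat + w))["x_hat"])

round(c(ols = ols_controls, tsls = tsls_controls), 3)

# Equivalent call on the real data with AER (not run here):
# ivreg(weeksm1 ~ morekids + agem1 + black + hispan + othrace |
#         samesex + agem1 + black + hispan + othrace, data = fertility)
```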