In this part of the problem set, we are going to replicate part of the results of Joshua Angrist and William Evans’ article “Children and Their Parents’ Labor Supply: Evidence from Exogenous Variation in Family Size.” Here is the abstract of the study:
Research on the labor-supply consequences of childbearing is complicated by the endogeneity of fertility. This study uses parental preferences for a mixed sibling-sex composition to construct instrumental variables (IV) estimates of the effect of childbearing on labor supply. IV estimates for women are significant but smaller than ordinary least-squares estimates. The IV are also smaller for more educated women and show no impact of family size on husbands’ labor supply. A comparison of estimates using sibling-sex composition and twins instruments implies that the impact of a third child disappears when the child reaches age 13. (JEL J13, J22)
The purpose of this exercise is to study how fertility affects female labor supply. In order to do this, we are going to compare female labor supply in households with two children versus households with three children. Since fertility decisions are endogenous, we are going to use two sets of instruments: whether there is a multiple pregnancy in the second pregnancy and sex composition of the first two children. This latter instrument was the one proposed by Angrist & Evans (1998). Intuitively, parents are more likely to have a third child when the first two have the same sex. Assuming that whether the first two children have the same sex is random, we can use this variable as an instrument for the number of children in the household.
We are going to use the census80.csv dataset that corresponds to an extract of the 1980 US Census. It has been restricted to the set of families with two or three children and with mother’s age between 21 and 35 years. The data set contains the following variables:
Setting my working directory and loading the data set.
setwd("~/EDX courses/MicroMaster MIT/14.310x-Data Analysis for Social Scientists/Programs")
mydata <- read.csv("census80.csv")
summary(mydata)
workedm weeksm whitem blackm
Min. :0.0000 Min. : 0.00 Min. :0.0000 Min. :0.0000
1st Qu.:0.0000 1st Qu.: 0.00 1st Qu.:1.0000 1st Qu.:0.0000
Median :1.0000 Median :12.00 Median :1.0000 Median :0.0000
Mean :0.5716 Mean :20.82 Mean :0.8314 Mean :0.1125
3rd Qu.:1.0000 3rd Qu.:48.00 3rd Qu.:1.0000 3rd Qu.:0.0000
Max. :1.0000 Max. :52.00 Max. :1.0000 Max. :1.0000
hispm othracem sex1st sex2nd
Min. :0.00000 Min. :0.00000 Min. :0.0000 Min. :0.0000
1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.0000
Median :0.00000 Median :0.00000 Median :0.0000 Median :0.0000
Mean :0.02725 Mean :0.02886 Mean :0.4871 Mean :0.4881
3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:1.0000 3rd Qu.:1.0000
Max. :1.00000 Max. :1.00000 Max. :1.0000 Max. :1.0000
ageq2nd ageq3rd numberkids
Min. : 0.00 Min. : 0.00 Min. :2.000
1st Qu.: 9.00 1st Qu.: 5.00 1st Qu.:2.000
Median :19.00 Median :13.00 Median :2.000
Mean :21.75 Mean :16.59 Mean :2.286
3rd Qu.:33.00 3rd Qu.:26.00 3rd Qu.:3.000
Max. :71.00 Max. :67.00 Max. :3.000
NA's :305132
#Loading Required Library
library("AER")
#Constructing an indicator variable using the ageq2nd and ageq3rd variables
#ageq3rd is NA for two-child families, so those rows are coded as 0
mydata$multiple <- as.integer(!is.na(mydata$ageq3rd) & mydata$ageq2nd == mydata$ageq3rd)
summary(mydata$multiple)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00000 0.00000 0.00000 0.00729 0.00000 1.00000
Based on this output, the proportion of households with a multiple pregnancy at the second pregnancy is 0.00729.
#Constructing an indicator variable using the sex1st and sex2nd variables
mydata$samesex <- as.integer(mydata$sex1st == mydata$sex2nd)
summary(mydata$samesex)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.0000 1.0000 0.5019 1.0000 1.0000
The proportion of households in which the first two children have the same sex is 0.5019.
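As a quick sanity check on the two indicators, here is a minimal sketch on a toy data frame. The four rows are hypothetical values I made up, not observations from census80.csv. The point is to confirm that a missing ageq3rd (no third child) ends up as 0 rather than NA.

```r
# Toy data (hypothetical values, not taken from census80.csv)
toy <- data.frame(
  ageq2nd = c(10, 20, 30, 15),
  ageq3rd = c(10, NA, 12, 15),   # NA means there is no third child
  sex1st  = c(0, 1, 0, 1),
  sex2nd  = c(0, 0, 1, 1)
)

# Multiple pregnancy: second and third child have the same age in quarters.
# The !is.na() guard turns the NA comparison into FALSE, hence 0.
toy$multiple <- as.integer(!is.na(toy$ageq3rd) & toy$ageq2nd == toy$ageq3rd)

# Same sex: first two children share the same sex code
toy$samesex <- as.integer(toy$sex1st == toy$sex2nd)

toy$multiple  # 1 0 0 1
toy$samesex   # 1 0 0 1
```

Because both variables are coded 0/1, their means are the proportions of households with the indicator switched on.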
Now let’s set up the model we want to estimate. In particular, we are interested in estimating the following equation:
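$$\text{laborsupply}_h = \alpha_0 + \alpha_1\,\mathbb{1}\{\text{3children}\}_h + \alpha_2\,\text{blackmother}_h + \alpha_3\,\text{hispanicmother}_h + \alpha_4\,\text{otherrace}_h + \varepsilon_h \quad \text{(equation 1)}$$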
where,
# Creating a variable 'three' that flags the families with three children
mydata$three <- as.integer(mydata$numberkids == 3)
summary(mydata$three)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.0000 0.0000 0.2861 1.0000 1.0000
# Running the OLS Model
ols1 <- lm(workedm ~ three + blackm + hispm + othracem, data = mydata)
# Creating an empty matrix
OLS <- matrix(ncol = 2, nrow = 2, data = NA) #creating an empty matrix to input my answer
#Inputting the values of my interest in the empty matrix
OLS[1, 1] <- ols1$coefficients[2] #1st value, in first row X first column
pvalue <- summary(ols1)
OLS[2, 1] <- pvalue$coefficients[2, 4]#2nd value in second row X first column
ols2 <- lm(weeksm ~ three + blackm + hispm + othracem, data = mydata)
OLS[1, 2] <- ols2$coefficients[2]# 3rd value in 1st row second column
pvalue <- summary(ols2)
OLS[2, 2] <- pvalue$coefficients[2, 4] #4th value in 2nd row second column
OLS
[,1] [,2]
[1,] -0.0839132 -3.940177
[2,] 0.0000000 0.000000
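Based on the results, having a third child is associated with a lower probability that the mother worked (coefficient of -0.0839) and with about 3.94 fewer weeks worked. Both p-values print as zero, so both coefficients are statistically significant at conventional levels.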
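Filling a matrix cell by cell works, but the same numbers can be pulled out by name. The sketch below uses a hypothetical helper, coef_row, and simulated stand-in data (not the census extract) to show one way of extracting the estimate, standard error, and p-value for a single regressor.

```r
# Hypothetical helper: estimate, standard error, and p-value for one term
coef_row <- function(model, term) {
  summary(model)$coefficients[term, c("Estimate", "Std. Error", "Pr(>|t|)")]
}

# Simulated stand-in data (assumed data-generating process, not census80.csv)
set.seed(1)
n <- 1000
sim <- data.frame(three = rbinom(n, 1, 0.3))
sim$workedm <- rbinom(n, 1, 0.6 - 0.1 * sim$three)

fit <- lm(workedm ~ three, data = sim)
res <- coef_row(fit, "three")
res
```

Indexing by term name rather than by position avoids picking up the wrong row if the order of the regressors changes.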
Since fertility is an endogenous variable, we want to use the multiple pregnancy and the same sex variables as instruments for having three children in the household. We are going to estimate the first-stage using each variable separately. Run a regression for each of these instruments using the indicator of having three children as the dependent variable and controlling for the race of the mother.
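Required model: $$\mathbb{1}\{\text{3children}\}_h = \beta_0 + \beta_1\,\text{multiple}_h + \beta_2\,\text{blackmother}_h + \beta_3\,\text{hispanicmother}_h + \beta_4\,\text{otherrace}_h + \nu_h \quad \text{(equation 2)}$$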
ols3 <- lm(three ~ multiple + blackm + hispm + othracem, data = mydata)
myanswer <- matrix(ncol = 1, nrow = 2, data = NA)
myanswer[1,1] <- ols3$coefficients[2]
pvalue <- summary(ols3)
myanswer[2,1] <- pvalue$coefficients[2,4]
myanswer
[,1]
[1,] 0.7179404
[2,] 0.0000000
## Or we can simply print the summary statistics
summary(ols3)
Call:
lm(formula = three ~ multiple + blackm + hispm + othracem, data = mydata)
Residuals:
Min 1Q Median 3Q Max
-0.3528 -0.2710 -0.2710 0.6641 0.7290
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.2710109 0.0007523 360.242 < 2e-16 ***
multiple 0.7179404 0.0080400 89.296 < 2e-16 ***
blackm 0.0648870 0.0021730 29.860 < 2e-16 ***
hispm 0.0817475 0.0042103 19.416 < 2e-16 ***
othracem 0.0115414 0.0040954 2.818 0.00483 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4472 on 427413 degrees of freedom
Multiple R-squared: 0.02108, Adjusted R-squared: 0.02107
F-statistic: 2301 on 4 and 427413 DF, p-value: < 2.2e-16
In either case, the estimate shows that having a multiple pregnancy at the second pregnancy increases the probability of having a third child by 71.79 percentage points.
ols4 <- lm(three ~ samesex + blackm + hispm + othracem, data = mydata)
myanswer1 <- matrix(ncol = 1, nrow = 2, data = NA)
myanswer1[1,1] <- ols4$coefficients[2]
pvalue <- summary(ols4)
myanswer1[2,1] <- pvalue$coefficients[2,4]
myanswer1
[,1]
[1,] 4.901816e-02
[2,] 1.669377e-276
Based on these findings, having two children of the same sex increases the probability of having a third child by 4.90 percentage points.
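A common way to judge instrument strength is the first-stage F statistic on the excluded instrument. With a single instrument, that F equals the square of the instrument's t statistic, so the t value of 89.296 reported for multiple corresponds to an F of roughly 7974, far above the usual rule-of-thumb threshold of 10. The sketch below checks the t-squared identity on simulated data. The variables z and d and the probabilities are assumptions for illustration only.

```r
set.seed(2)
n <- 5000
z <- rbinom(n, 1, 0.5)                 # simulated binary instrument
d <- rbinom(n, 1, 0.25 + 0.05 * z)     # simulated treatment (third child)

fs  <- lm(d ~ z)                       # first stage
t_z <- summary(fs)$coefficients["z", "t value"]

# F test of the first stage against the model without the instrument
F_z <- anova(lm(d ~ 1), fs)$F[2]

c(t_squared = t_z^2, F = F_z)          # the two numbers coincide
```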
iv1 <- ivreg(workedm ~ three + blackm + hispm + othracem | blackm + hispm + othracem + multiple, data = mydata)
iv2 <- ivreg(weeksm ~ three + blackm + hispm + othracem | blackm + hispm + othracem + multiple, data = mydata)
iv <- matrix(ncol=2, nrow=2, data="Not Yet")
iv[1,1] <- iv1$coefficients[2]
pvalue <- summary(iv1)
iv[2,1] <- pvalue$coefficients[2,4]
iv[1,2] <- iv2$coefficients[2]
pvalue1 <- summary(iv2)
iv[2,2] <- pvalue1$coefficients[2,4]
iv
[,1] [,2]
[1,] "-0.064125589077173" "-3.13765416218266"
[2,] "1.93810863508335e-07" "1.1639267997331e-08"
The results show that, when we use the multiple-pregnancy variable as the instrument, having a third child decreases the probability that the mother works by 6.41 percentage points and reduces weeks worked by about 3.14.
iv3 <- ivreg(workedm ~ three + blackm + hispm + othracem | blackm + hispm + othracem + samesex, data = mydata)
iv4 <- ivreg(weeksm ~ three + blackm + hispm + othracem | blackm + hispm + othracem + samesex, data = mydata)
iv_1 <- matrix(ncol=2, nrow=2, data="Not Yet")
iv_1[1,1] <- iv3$coefficients[2]
pvalue2 <- summary(iv3)
iv_1[2,1] <- pvalue2$coefficients[2,4]
iv_1[1,2] <- iv4$coefficients[2]
pvalue3 <- summary(iv4)
iv_1[2,2] <- pvalue3$coefficients[2,4]
iv_1
[,1] [,2]
[1,] "-0.0982205358125773" "-4.99429876267641"
[2,] "0.00137589272606202" "0.000268763736611791"
The results show that, if we use the same-sex variable as the instrument, having a third child decreases the probability that the mother works by 9.82 percentage points and reduces weeks worked by about 4.99.
IV estimates are local treatment effects. Thus, we are identifying the effect of fertility on women who have a third child when the relevant instrument changes.
Why?
Under heterogeneous effects, IV estimates correspond to LATE (local average treatment effects). Thus, we are able to identify the average effect for the population that decides to have a third child when the instrument is switched on. This implies that $\hat{\alpha}_1^{IV\text{-}multiple} = -0.06412559$ is the treatment effect on those who have a third child due to a multiple pregnancy. In general, for most of the population, having a multiple pregnancy would imply having a third child. On the other hand, $\hat{\alpha}_1^{IV\text{-}samesex} = -0.098220536$ corresponds to the treatment effect on those who decide to have a third child when the first two children have the same sex.
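To make the LATE interpretation concrete, here is a small simulation sketch in base R. The population shares, the effect sizes (-0.10 for compliers, -0.30 for always-takers), and all variable names are assumptions I chose for illustration; none of them come from the census data. With one binary instrument and no covariates, the IV estimate is the Wald ratio of the reduced form to the first stage.

```r
set.seed(3)
n <- 200000

z <- rbinom(n, 1, 0.5)  # binary instrument, randomly assigned
type <- sample(c("always", "never", "complier"), n, replace = TRUE,
               prob = c(0.2, 0.6, 0.2))

# Always-takers are treated regardless of z; compliers only when z == 1
d <- as.integer(type == "always" | (type == "complier" & z == 1))

# Heterogeneous effects (assumed values)
effect <- ifelse(type == "complier", -0.10,
                 ifelse(type == "always", -0.30, 0))
y <- 0.6 + effect * d + rnorm(n, sd = 0.1)

first_stage  <- unname(coef(lm(d ~ z))["z"])   # effect of z on treatment
reduced_form <- unname(coef(lm(y ~ z))["z"])   # effect of z on outcome
wald <- reduced_form / first_stage             # IV (Wald) estimate

ols <- unname(coef(lm(y ~ d))["d"])            # mixes in always-takers
c(first_stage = first_stage, wald = wald, ols = ols)
```

In this setup the Wald ratio lands near the complier effect of -0.10, while OLS also averages in the always-takers and comes out more negative. Different instruments move different groups of compliers, which is why the multiple-pregnancy and same-sex estimates need not agree.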
Thinking clearly about experimental design allows us to identify parameters beyond treatment effects, for example, general equilibrium effects as in the French unemployment experiment. Another potential advantage of carefully designing experiments is the identification of potential mechanisms that drive a causal relationship. In this set of questions, we are going to discuss the identification of mechanisms. We are going to study Bursztyn et al.’s (2014) article “Understanding Mechanisms Underlying Peer Effects: Evidence from a Field Experiment on Financial Decisions.”
For now, assume you are interested in establishing whether there is social influence on financial decisions, and that you have the following experimental design:
Using this experimental design, you decide to estimate the following model:
$$\text{decision}_p = \beta_0 + \beta_1\,\text{information}_p + \varepsilon_p \quad \text{(equation 4)}$$
where,
- $\text{decision}_p$ is a dummy variable that indicates whether investor 2 in pair p takes the same decision as her peer;
- $\text{information}_p$ indicates whether pair p belongs to the treatment group and investor 2 received information on the decision of investor 1;
- $\varepsilon_p$ is an error term.
Yes. Because I have conducted an RCT that randomizes whether an investor learns about the decision of her peer, I can identify a causal treatment effect in the parameter β1. If I see an effect on the decision, it means that investor 2's decision was influenced by knowing what investor 1 did.
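The logic of equation 4 can be sketched with simulated data: when the only regressor is a randomized dummy, the OLS slope equals the difference in mean outcomes between treated and control pairs. The sample size and the assumed 0.15 treatment effect below are made up for illustration.

```r
set.seed(4)
n_pairs <- 2000

information <- rbinom(n_pairs, 1, 0.5)   # randomized treatment assignment
# Assumed data-generating process: information raises P(same decision) by 0.15
decision <- rbinom(n_pairs, 1, 0.50 + 0.15 * information)

fit4 <- lm(decision ~ information)
beta1 <- unname(coef(fit4)["information"])

# Difference in means between treatment and control pairs
diff_means <- mean(decision[information == 1]) - mean(decision[information == 0])

c(beta1 = beta1, diff_means = diff_means)  # identical up to rounding
```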
A researcher points out that equation 4 is not exploiting all the information in the data. She suggests that I can estimate the following model, which will allow me to identify not only the causal effect of knowing the peer’s decision, but also the causal effect of having a peer who doesn’t purchase the asset:
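$$\text{purchase}_{p2} = \beta_0 + \beta_1\,\text{purchase}_{p1} + \beta_2\,\text{information}_p + \beta_3\,\text{purchase}_{p1} \times \text{information}_p + \varepsilon_p \quad \text{(equation 5)}$$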
where,
- $\text{purchase}_{p2}$ is a dummy variable that indicates whether investor 2 in pair p purchased the asset;
- $\text{purchase}_{p1}$ indicates whether investor 1 purchased the asset;
- $\text{information}_p$ indicates whether pair p belongs to the treatment group of sharing information;
- $\text{purchase}_{p1} \times \text{information}_p$ is the interaction;
- $\varepsilon_p$ is an error term.
It is not possible to tell in this setting.
Why? In this setting, the researcher has randomized whether the second investor knows about the decision of the first one. However, pairs are endogenously formed and thus it is not possible to identify the causal effect of having a peer who declined to purchase the asset.
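A small simulation sketch can illustrate the problem with endogenously formed pairs. Here I assume, purely for illustration, that both investors in a pair share a latent taste for the asset and that there is no peer effect at all: investor 2 decides on her own taste only. All names and parameters are hypothetical.

```r
set.seed(5)
n_pairs <- 50000

taste <- rnorm(n_pairs)                    # shared latent taste within a pair
information <- rbinom(n_pairs, 1, 0.5)     # randomized information treatment

# Both purchases depend on the shared taste plus independent noise.
# By construction, investor 2 never reacts to investor 1 (no peer effect).
purchase1 <- as.integer(taste + rnorm(n_pairs) > 0)
purchase2 <- as.integer(taste + rnorm(n_pairs) > 0)

fit5 <- lm(purchase2 ~ purchase1 * information)
round(coef(fit5), 3)
```

Even though there is no peer effect in this simulated world, the coefficient on purchase1 is clearly positive, because peers who sort into pairs resemble each other. The coefficients involving the randomized information dummy stay close to zero, which is the part of equation 5 that the experiment can actually identify.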