Question 1

a) Plot a scatter plot of labour force participation against labour market experience. Can you tell from this graph whether there is a relationship between labour market experience and the probability of a woman being in the labour force? Why or Why not?

plot(PS4DATA$exper, PS4DATA$lfp, xlab="LFP (1=in LF, 0 otherwise)", ylab="Labour market experience")

I cannot tell whether there is a relationship between LFP and experience form this graph, because it is difficult to access the counts of participating women for each bin of experience.

b) Write OLS regression you would use to estimate the relationship between LFP and labour market experience. Estimate this regression and interpret your estimates of both the intercept and the slope.

If labour market participation were a continuous variable (say a probability observed between [0,1]) I might estimate the regression: \[y_i = \alpha + \beta_1 x_i + \epsilon_i,\] where \(y\) is LFP, \(\alpha\) and \(\beta_1\) constants, \(x\) labour market experience, \(\epsilon\) an error term, and \(i\) the observation’s index. We would run

fit1 <- lm(lfp ~ exper, PS4DATA) # note, I am not using heteroskedasticity-corrected errors
fit1_robust <- lm_robust(lfp ~ exper, PS4DATA, se_type="HC1") # I save this for later use
se_fit1 <- sqrt(diag(vcovHC.default(fit1, type = "HC1")))
stargazer(fit1, se = list(se_fit1, NULL), type = "text")
## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                                 lfp            
## -----------------------------------------------
## exper                        0.021***          
##                               (0.002)          
##                                                
## Constant                     0.345***          
##                               (0.029)          
##                                                
## -----------------------------------------------
## Observations                    753            
## R2                             0.117           
## Adjusted R2                    0.116           
## Residual Std. Error      0.466 (df = 751)      
## F Statistic           99.795*** (df = 1; 751)  
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

Where the coefficient .345 is the significant increase in the probability of LFP for a unit increase labour market experience. The intercept is the mean probability of LFP for individuals with no labour market experience.

c) Using the estimate in 1.b), compute the probability that a woman with 45 years of work experience is in the labour force. Interpret.

fit1_45 <- predict(fit1_robust, newdata = list("exper" = c(45)))
fit1_45
##        1 
## 1.291397

A woman with 45 years of work experience is associated a probability greater than 1 of participating in the work force. This is absurd, and results from the fact that the linear probability model(LPM) is unbounded. The plot below makes this clear. We can interpret this as by bounding all computed LPM to [0,1], such that a woman with 45 years of labour market experience has a probability 1 of LFP.

fit1_robust.stats <- summary(fit1_robust)
plot(PS4DATA$exper, PS4DATA$lfp, xlab="LFP (1=in LF, 0 otherwise)", ylab="Labour market experience")
abline(a=fit1_robust.stats$coefficients[1,1], 
      b=fit1_robust.stats$coefficients[2,1],
      col="red", lwd=2)
legend(x="bottomright", legend="linear probability model", col="red", lty=1, lwd=2)

d) Write the probit model you would use to estimate the relationship between labour force participation and labour market experience. Estimate this regression and calculate the probability that a woman with 45 yeras of work experience is in the labour force.

fit2 <- glm(lfp~exper,binomial(link = "probit"),data=PS4DATA)
fit2_45 <- predict(fit2, newdata = list("exper" = c(45)), type = "response")
fit2_45
##         1 
## 0.9886487
sort.exper <- sort(PS4DATA$exper)
fit2_preds <- predict(fit2, newdata = list("exper" = sort.exper), type = "response")

e) Generate a graph that plots the original data, the fitted values from the linear probability model and the probit model. Mark the fitted value when when Exper_i=45 in both LPM and probit specifications

plot(PS4DATA$exper, PS4DATA$lfp, xlab="LFP (1=in LF, 0 otherwise)", ylab="Labour market experience", xlim = c(-.25,50), ylim=c(-.25,1.5), pch=1)
abline(a=fit1_robust.stats$coefficients[1,1], 
      b=fit1_robust.stats$coefficients[2,1],
      col="red", lwd=2)
lines(sort.exper, fit2_preds, col="cornflowerblue", lwd=2)
points(x=c(45, 45),y=c(fit1_45, fit2_45), pch="+", col=c("red", "blue"))
text(x=c(45-5, 45-5),y=c(fit1_45+.1, fit2_45-.1), c("LPM|Exper=45","probit|Exper=45"))
legend(x="bottomright", legend=c("LPM", "probit model"), col=c("red", "cornflowerblue"), lty=c(1,1), lwd=c(2,2))

Question 2

a) Regress the LFP dummy on the dummy youngkids. Interpret you results.

fit3 <- lm(lfp~youngkids, data=PS4DATA)
se_fit3 <- sqrt(diag(vcovHC.default(fit3, type = "HC1")))
stargazer(fit3, se = list(se_fit3, NULL), type = "text")
## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                                 lfp            
## -----------------------------------------------
## youngkids                    -0.258***         
##                               (0.044)          
##                                                
## Constant                     0.619***          
##                               (0.020)          
##                                                
## -----------------------------------------------
## Observations                    753            
## R2                             0.043           
## Adjusted R2                    0.041           
## Residual Std. Error      0.485 (df = 751)      
## F Statistic           33.512*** (df = 1; 751)  
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

The mean uniform proportion of labour force participants for the group of females with young children (youngkids=1) is significantly 25.8 percentage points lower than the group of females without.

b) Provide a potential OVB

In the Becker model, parents choose to have less children as the opportunity cost of child bearing increases. If discrimination lowers the labour-market prospects of subgroups, then discriminated subgroups of the population are both more likely to participate less in the labour market and have a lower opportunity cost of – and thus higher incentives to – bearing young children.

The OV would be discrimiation (D). LFP would be a negative function of D. Youngkids would be a positive function of D.

c) add women’s education level and reestimate 2a). Interpret the coefficient on youngkids.

fit4 <- lm(lfp~youngkids + educ, data=PS4DATA)
se_fit4 <- sqrt(diag(vcovHC.default(fit4, type = "HC1")))
stargazer(fit4, se = list(se_fit4, NULL), type = "text")
## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                                 lfp            
## -----------------------------------------------
## youngkids                    -0.281***         
##                               (0.043)          
##                                                
## educ                         0.045***          
##                               (0.007)          
##                                                
## Constant                       0.070           
##                               (0.091)          
##                                                
## -----------------------------------------------
## Observations                    753            
## R2                             0.085           
## Adjusted R2                    0.083           
## Residual Std. Error      0.475 (df = 750)      
## F Statistic           34.967*** (df = 2; 750)  
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

Raising young children decreases the mean probability of LFP for women by percentage points, all else equal.

d) Using the estimates in regression 2c), what is the predicted effect on Pr(LFP=1) of increasing education from 10 to 20 years holding youngkids constant at 1?

fit4.preds <- predict(fit4, newdata=data.frame("youngkids"=c(1,1), "educ"=c(10,20)))
fit4.preds[2]-fit4.preds[1]
##         2 
## 0.4502317

Holding youngkids=1 constant, increasing from 10 to 20 years of education increases the probability of LFP by 45.02 percentage points in the 2c) LPM.

e) Now estimate the probit specification. Provide a 95% CI on your beta 1 estimate. Calculate the predicted effect on Pr(LFP=1) of increasing education from 10 to 20 years holding constant youngkids=1.

fit5 <- glm(lfp~educ + youngkids, family=binomial("probit"), data=PS4DATA)
se_fit5 <- sqrt(diag(vcovHC.default(fit5, type = "HC1")))
stargazer(fit5, se = list(se_fit5, NULL), type = "text", 
          column.labels = c("robust se", "normal se"))
## 
## =============================================
##                       Dependent variable:    
##                   ---------------------------
##                               lfp            
##                            robust se         
## ---------------------------------------------
## educ                       0.125***          
##                             (0.021)          
##                                              
## youngkids                  -0.748***         
##                             (0.120)          
##                                              
## Constant                   -1.205***         
##                             (0.262)          
##                                              
## ---------------------------------------------
## Observations                  753            
## Log Likelihood             -481.589          
## Akaike Inf. Crit.           969.179          
## =============================================
## Note:             *p<0.1; **p<0.05; ***p<0.01
fit5.stats <- summary(fit5)
fit5.coefstats <- fit5.stats$coefficients
beta1 <- fit5.coefstats[3,1]
beta1.sd <- fit5.coefstats[3,2]
beta1.ci <- c(beta1-1.96*beta1.sd,
              beta1+1.96*beta1.sd)
fit5.preds <- predict(fit5, newdata=data.frame("youngkids"=c(1,1), "educ"=c(10,20)), type="response")

The 95% CI for beta 1 is

## [1] -0.9846188 -0.5112666

and increasing years of educated from 10 to 20 results in a percentage point increase of LFP of

## [1] 46.55717

```