Question 1
We will examine a paper by Anastasia Semykina entitled, “Self-employment among women: Do children matter more than we previously thought?”. You are provided the following data.
March CPS white women (NLSY_white_women_JAE.csv): 12,624 women observed over several years. You observe the following variables:
| Variable | Definition |
|---|---|
| id | unique individual ID |
| year | year |
| working | =1 if working, 0 otherwise |
| self_empl | =1 if self-employed, 0 otherwise |
| age | age in years |
| agesq | age squared |
| educ | years of schooling, truncated at 20 years |
| edu_0_11 | =1 if has 0-11 years of schooling, 0 otherwise |
| edu_12 | =1 if has 12 years of schooling, 0 otherwise |
| edu_13_15 | =1 if has 13-15 years of schooling, 0 otherwise |
| edu_16plus | =1 if has 16 or more years of schooling, 0 otherwise |
| married | =1 if married, 0 if not |
| d_ch_1_5 | =1 if has children ages 0 to 5, 0 otherwise |
| d_ch_0 | =1 if has a newborn (<1 year old), 0 otherwise |
| d_ch_1_5_alt | =1 if has children ages 1 to 5, 0 otherwise |
| d_ch_6_17 | =1 if has children ages 6 to 17, 0 otherwise |
| rotter_score | locus of control |
| sesteem_score1 | self esteem score |
| urban | =1 if urban location, 0 otherwise |
| afqt_1 | AFQT score |
| south | =1 if South region, 0 otherwise |
| northeast | =1 if Northeast region, 0 otherwise |
| northcen | =1 if North Central region, 0 otherwise |
| west | =1 if West region, 0 otherwise |
| sp_inc1000 | spouse’s income in thousands of dollars |
| samesex | =1 if the first two children have the same gender, 0 otherwise |
| policever | =1 if ever stopped by police for other than minor traffic offense in 1980, 0 otherwise |
| unemp_rate | unemployment rate in percentage points |
| m_sp_inc1000 | individual time mean of sp_inc1000 |
| m_married | individual time mean of married |
library(tidyverse)
library(texreg)
library(sampleSelection)
library(readr)
library(mfx)
library(tinytex)
NLSY <- read_csv("NLSY_white_women_JAE.csv")
# Use kids as the instrument; start with linear models, with years of schooling and with education dummies
ols.work <- lm(working ~ age + agesq + educ + married + d_ch_1_5 + d_ch_0 + d_ch_6_17 + rotter_score + sesteem_score1 + afqt_1 + urban + south + northeast + northcen + sp_inc1000 + policever + unemp_rate, data=NLSY)
# Education dummies: edu_16plus is the reference category (all four dummies plus the intercept are perfectly collinear, so lm() would report NA for it anyway)
ols.work2 <- lm(working ~ age + agesq + edu_0_11 + edu_12 + edu_13_15 + married + d_ch_1_5 + d_ch_0 + d_ch_6_17 + rotter_score + sesteem_score1 + afqt_1 + urban + south + northeast + northcen + sp_inc1000 + policever + unemp_rate, data=NLSY)
ols.self <- lm(self_empl ~ age + agesq + educ + married + d_ch_1_5 + d_ch_0 + d_ch_6_17 + rotter_score + sesteem_score1 + afqt_1 + urban + south + northeast + northcen + sp_inc1000 + policever + unemp_rate, data=NLSY)
ols.self2 <- lm(self_empl ~ age + agesq + edu_0_11 + edu_12 + edu_13_15 + married + d_ch_1_5 + d_ch_0 + d_ch_6_17 + rotter_score + sesteem_score1 + afqt_1 + urban + south + northeast + northcen + sp_inc1000 + policever + unemp_rate, data=NLSY)
#htmlreg(list(ols.work,ols.self), digits = 4)
#these labeling functions work but need to be put in a list
htmlreg(ols.work, custom.coef.names = c("Intercept", "Age", "Age squared", "Education", "Married", "Has a Child aged 1 to 5", "Has a Newborn", "Has a Child 6 to 17", "Locus of Control score", "Self-esteem score", "Intelligence test score", "Urban location", "South location", "Northeast location", "North Central location", "Spouse's Income", "Ever stopped by police", "Unemployment rate"), custom.model.names = "OLS-Working", digits = 4)
| Variable | OLS-Working | (Std. error) |
|---|---|---|
| Intercept | 0.4729*** | (0.0565) |
| Age | 0.0154*** | (0.0030) |
| Age squared | -0.0003*** | (0.0000) |
| Education | 0.0141*** | (0.0011) |
| Married | 0.0521*** | (0.0047) |
| Has a Child aged 1 to 5 | -0.1708*** | (0.0045) |
| Has a Newborn | -0.0138 | (0.0074) |
| Has a Child 6 to 17 | -0.0517*** | (0.0044) |
| Locus of Control score | 0.0013 | (0.0008) |
| Self-esteem score | 0.0023*** | (0.0005) |
| Intelligence test score | 0.0012*** | (0.0001) |
| Urban location | 0.0176*** | (0.0043) |
| South location | 0.0139* | (0.0058) |
| Northeast location | 0.0083 | (0.0064) |
| North Central location | 0.0078 | (0.0057) |
| Spouse’s Income | -0.0016*** | (0.0001) |
| Ever stopped by police | -0.0118 | (0.0066) |
| Unemployment rate | -0.0126*** | (0.0016) |
| R² | 0.1170 | |
| Adj. R² | 0.1166 | |
| Num. obs. | 33365 | |
| ***p < 0.001; **p < 0.01; *p < 0.05 | | |
htmlreg(ols.self, custom.coef.names = c("Intercept","Age", "Age squared", "Education", "Married", "Has a Child aged 1 to 5", "Has a Newborn", "Has a Child 6 to 17", "Locus of Control score", "Self-esteem score", "Intelligence test score", "Urban location", "South location", "Northeast location", "North Central location", "Spouse's Income", "Ever stopped by police", "Unemployment rate"), custom.model.names = "OLS-Self Employed", digits = 4)
| Variable | OLS-Self Employed | (Std. error) |
|---|---|---|
| Intercept | -0.0998* | (0.0460) |
| Age | 0.0082*** | (0.0025) |
| Age squared | -0.0001** | (0.0000) |
| Education | -0.0016 | (0.0009) |
| Married | 0.0020 | (0.0039) |
| Has a Child aged 1 to 5 | 0.0450*** | (0.0038) |
| Has a Newborn | -0.0055 | (0.0064) |
| Has a Child 6 to 17 | 0.0211*** | (0.0037) |
| Locus of Control score | -0.0020** | (0.0007) |
| Self-esteem score | 0.0011** | (0.0004) |
| Intelligence test score | 0.0001 | (0.0001) |
| Urban location | 0.0034 | (0.0035) |
| South location | -0.0294*** | (0.0047) |
| Northeast location | -0.0304*** | (0.0052) |
| North Central location | -0.0283*** | (0.0047) |
| Spouse’s Income | 0.0006*** | (0.0001) |
| Ever stopped by police | 0.0272*** | (0.0054) |
| Unemployment rate | 0.0007 | (0.0013) |
| R² | 0.0254 | |
| Adj. R² | 0.0248 | |
| Num. obs. | 28228 | |
| ***p < 0.001; **p < 0.01; *p < 0.05 | | |
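# Older women are more likely to work, but at a decreasing rate (the age-squared coefficient is negative).
# The more educated, the more likely to work, but the less likely to be self-employed (the latter is not significant in OLS).
# Married women are more likely to work.
# With young children (d_ch_1_5), women are less likely to work, but those who do work are more likely to be self-employed. The same holds for children aged 6 to 17.
# With children younger than a year, women are less likely to work, but the effect is very small and not significant.
# Rotter score and self-esteem are positive for working (only self-esteem is significant); AFQT is positive.
# Regions are relative to the West.
# The higher the spouse's income, the less likely a woman is to be working.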
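The "more likely to work, but at a decreasing rate" reading of the age terms can be made concrete by computing the marginal effect of age and the age at which it reaches zero. This is a minimal sketch that plugs in the rounded OLS-Working coefficients from the table above; the names `b_age`, `b_agesq`, `age_effect`, and `turning_point` are illustrative, and because the age-squared coefficient is reported to only four digits the resulting turning point is a rough approximation at best.

```r
# Rounded coefficients copied from the OLS-Working table (approximate)
b_age   <- 0.0154
b_agesq <- -0.0003

# Marginal effect of one more year of age: b_age + 2 * b_agesq * age
age_effect <- function(age) b_age + 2 * b_agesq * age

# Age at which the marginal effect is zero (peak of the quadratic profile)
turning_point <- -b_age / (2 * b_agesq)

print(age_effect(c(20, 30)))
print(turning_point)
```

Re-running the calculation with unrounded values from `coef(ols.work)` would give a more reliable turning point.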
probit.work <- probitmfx(working ~ age + agesq + educ + married + d_ch_1_5 + d_ch_0 + d_ch_6_17 + rotter_score + sesteem_score1 + afqt_1 + urban + south + northeast + northcen + sp_inc1000 + policever + unemp_rate, data=NLSY)
probit.self <- probitmfx(self_empl ~ age + agesq + educ + married + d_ch_1_5 + d_ch_0 + d_ch_6_17 + rotter_score + sesteem_score1 + afqt_1 + urban + south + northeast + northcen + sp_inc1000 + policever + unemp_rate, data=NLSY)
#htmlreg(list(probit.work,probit.self), digits = 4)
htmlreg(probit.work, custom.coef.names = c( "Age", "Age squared", "Education", "Married", "Has a Child aged 1 to 5", "Has a Newborn", "Has a Child 6 to 17", "Locus of Control score", "Self-esteem score", "Intelligence test score", "Urban location", "South location", "Northeast location", "North Central location", "Spouse's Income", "Ever stopped by police", "Unemployment rate"), custom.model.names = "Probit-Working", digits = 4)
| Variable | Probit-Working | (Std. error) |
|---|---|---|
| Age | 0.0107*** | (0.0029) |
| Age squared | -0.0002*** | (0.0000) |
| Education | 0.0163*** | (0.0011) |
| Married | 0.0239*** | (0.0047) |
| Has a Child aged 1 to 5 | -0.1730*** | (0.0050) |
| Has a Newborn | -0.0166* | (0.0065) |
| Has a Child 6 to 17 | -0.0474*** | (0.0043) |
| Locus of Control score | 0.0011 | (0.0008) |
| Self-esteem score | 0.0020*** | (0.0005) |
| Intelligence test score | 0.0012*** | (0.0001) |
| Urban location | 0.0190*** | (0.0041) |
| South location | 0.0169** | (0.0053) |
| Northeast location | 0.0108 | (0.0059) |
| North Central location | 0.0087 | (0.0053) |
| Spouse’s Income | -0.0012*** | (0.0001) |
| Ever stopped by police | -0.0095 | (0.0064) |
| Unemployment rate | -0.0121*** | (0.0016) |
| Num. obs. | 33365 | |
| Log Likelihood | -12297.3329 | |
| Deviance | 24594.6659 | |
| AIC | 24630.6659 | |
| BIC | 24782.1406 | |
| ***p < 0.001; **p < 0.01; *p < 0.05 | | |
htmlreg(probit.self, custom.coef.names = c("Age", "Age squared", "Education", "Married", "Has a Child aged 1 to 5", "Has a Newborn", "Has a Child 6 to 17", "Locus of Control score", "Self-esteem score", "Intelligence test score", "Urban location", "South location", "Northeast location", "North Central location", "Spouse's Income", "Ever stopped by police", "Unemployment rate"), custom.model.names = "Probit-Self-Employed", digits = 4)
| Variable | Probit-Self-Employed | (Std. error) |
|---|---|---|
| Age | 0.0101*** | (0.0024) |
| Age squared | -0.0001*** | (0.0000) |
| Education | -0.0019* | (0.0008) |
| Married | 0.0109** | (0.0035) |
| Has a Child aged 1 to 5 | 0.0437*** | (0.0040) |
| Has a Newborn | -0.0017 | (0.0054) |
| Has a Child 6 to 17 | 0.0184*** | (0.0035) |
| Locus of Control score | -0.0019** | (0.0007) |
| Self-esteem score | 0.0011** | (0.0004) |
| Intelligence test score | 0.0001 | (0.0001) |
| Urban location | 0.0034 | (0.0033) |
| South location | -0.0238*** | (0.0038) |
| Northeast location | -0.0247*** | (0.0038) |
| North Central location | -0.0227*** | (0.0038) |
| Spouse’s Income | 0.0003*** | (0.0000) |
| Ever stopped by police | 0.0298*** | (0.0062) |
| Unemployment rate | 0.0006 | (0.0013) |
| Num. obs. | 28228 | |
| Log Likelihood | -6881.4622 | |
| Deviance | 13762.9243 | |
| AIC | 13798.9243 | |
| BIC | 13947.3896 | |
| ***p < 0.001; **p < 0.01; *p < 0.05 | | |
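The linear probability and probit models show broadly similar results. The main differences are in marriage, education, and policever.
In the working equation, the marriage effect in the probit model (0.0239) is about half the size of the linear estimate (0.0521). Education is slightly more important in the probit model than in the linear probability model (0.0163 versus 0.0141 for working; -0.0019 versus -0.0016 for self-employment, where only the probit estimate is significant at the 5% level). For policever in the working equation, the linear estimate (-0.0118, standard error 0.0066) is closer to conventional significance than the probit estimate (-0.0095, standard error 0.0064), although neither reaches the 5% level.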
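All betas are read as changes in probability, so multiplying by 100 gives percentage points. For example, if spouse's income increases by $1,000, the likelihood of working falls by about 0.12 percentage points in the probit model (-0.0012) and by about 0.16 percentage points in the linear model (-0.0016).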
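To see where a probability-scale reading of a probit estimate comes from, the sketch below computes marginal effects by hand with base R on simulated data. Everything in it (the data, `x`, `y`, `fit`, `ame`, `mem`) is hypothetical and only illustrates the mechanics: the marginal effect of a continuous regressor is the normal density at the linear index times the coefficient. To my understanding, `probitmfx` reports effects evaluated at the sample means by default and uses discrete changes for dummy variables, so this is an illustration of the idea rather than a reproduction of its output.

```r
set.seed(1)

# Simulated data (hypothetical): binary outcome driven by one regressor
n <- 2000
x <- rnorm(n)
y <- as.integer(0.3 + 0.8 * x + rnorm(n) > 0)

# Probit fit with base R
fit <- glm(y ~ x, family = binomial(link = "probit"))

# Linear index x'b for every observation
xb <- predict(fit, type = "link")

# Observation-level marginal effect of x: phi(x'b) * b_x
me_i <- dnorm(xb) * coef(fit)["x"]

# Average marginal effect, and the effect evaluated at the sample mean of x
ame <- mean(me_i)
mem <- unname(dnorm(coef(fit)["(Intercept)"] + coef(fit)["x"] * mean(x)) * coef(fit)["x"])

print(c(coefficient = unname(coef(fit)["x"]), AME = ame, at_mean = mem))
```

Multiplying either marginal effect by 100 gives the change in percentage points for a one-unit increase in `x`.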
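Children are the variables that most strongly affect the likelihood of working (-0.1730 for young children in the probit model). Their estimated effects on the likelihood of being self-employed are much smaller (0.0437 and 0.0184), although still statistically significant.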
probit.work_children <- probitmfx(working ~ age + agesq + educ + married + d_ch_1_5 + d_ch_0 + d_ch_6_17 + rotter_score + sesteem_score1 + afqt_1 + urban + south + northeast + northcen + sp_inc1000 + policever + unemp_rate, data=NLSY)
probit.self_nochildren <- probitmfx(self_empl ~ age + agesq + educ + married + rotter_score + sesteem_score1 + afqt_1 + urban + south + northeast + northcen + sp_inc1000 + policever + unemp_rate, data=NLSY)
#htmlreg(list(probit.work_children,probit.self_nochildren), digits = 4)
#needs labels here
htmlreg(probit.work_children, custom.coef.names = c( "Age", "Age squared", "Education", "Married", "Has a Child aged 1 to 5", "Has a Newborn", "Has a Child 6 to 17", "Locus of Control score", "Self-esteem score", "Intelligence test score", "Urban location", "South location", "Northeast location", "North Central location", "Spouse's Income", "Ever stopped by police", "Unemployment rate"), custom.model.names = "Probit-Working with children",digits = 4)
| Variable | Probit-Working with children | (Std. error) |
|---|---|---|
| Age | 0.0107*** | (0.0029) |
| Age squared | -0.0002*** | (0.0000) |
| Education | 0.0163*** | (0.0011) |
| Married | 0.0239*** | (0.0047) |
| Has a Child aged 1 to 5 | -0.1730*** | (0.0050) |
| Has a Newborn | -0.0166* | (0.0065) |
| Has a Child 6 to 17 | -0.0474*** | (0.0043) |
| Locus of Control score | 0.0011 | (0.0008) |
| Self-esteem score | 0.0020*** | (0.0005) |
| Intelligence test score | 0.0012*** | (0.0001) |
| Urban location | 0.0190*** | (0.0041) |
| South location | 0.0169** | (0.0053) |
| Northeast location | 0.0108 | (0.0059) |
| North Central location | 0.0087 | (0.0053) |
| Spouse’s Income | -0.0012*** | (0.0001) |
| Ever stopped by police | -0.0095 | (0.0064) |
| Unemployment rate | -0.0121*** | (0.0016) |
| Num. obs. | 33365 | |
| Log Likelihood | -12297.3329 | |
| Deviance | 24594.6659 | |
| AIC | 24630.6659 | |
| BIC | 24782.1406 | |
| ***p < 0.001; **p < 0.01; *p < 0.05 | | |
htmlreg(probit.self_nochildren, custom.coef.names = c( "Age", "Age squared", "Education", "Married", "Locus of Control score", "Self-esteem score", "Intelligence test score", "Urban location", "South location", "Northeast location", "North Central location", "Spouse's Income", "Ever stopped by police", "Unemployment rate"), custom.model.names = "Probit - Self-employed without children", digits = 4)
| Variable | Probit - Self-employed without children | (Std. error) |
|---|---|---|
| Age | 0.0167*** | (0.0023) |
| Age squared | -0.0002*** | (0.0000) |
| Education | -0.0033*** | (0.0008) |
| Married | 0.0247*** | (0.0034) |
| Locus of Control score | -0.0017** | (0.0007) |
| Self-esteem score | 0.0010** | (0.0004) |
| Intelligence test score | 0.0001 | (0.0001) |
| Urban location | 0.0026 | (0.0033) |
| South location | -0.0260*** | (0.0039) |
| Northeast location | -0.0261*** | (0.0039) |
| North Central location | -0.0228*** | (0.0039) |
| Spouse’s Income | 0.0004*** | (0.0000) |
| Ever stopped by police | 0.0287*** | (0.0062) |
| Unemployment rate | 0.0007 | (0.0013) |
| Num. obs. | 28228 | |
| Log Likelihood | -6973.1138 | |
| Deviance | 13946.2277 | |
| AIC | 13976.2277 | |
| BIC | 14099.9487 | |
| ***p < 0.001; **p < 0.01; *p < 0.05 | | |
Having children may affect the decision to work but not the decision to be self-employed. A high unemployment rate might affect the decision to work but not the decision to be self-employed. Having been stopped by the police might affect the ability to be employed but not the decision to work. So some variables that could be important for the selection step are: policever, unemp_rate, d_ch_0.
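One way to gauge how much the child dummies matter in the self-employment equation is a likelihood-ratio comparison of the two self-employment probits, using the log-likelihoods reported in their tables. This is a sketch under the assumption that both models were fit on the same 28,228 observations (as the tables indicate) and differ only by the three child dummies; the variable names are illustrative.

```r
# Log-likelihoods copied from the two self-employment probit tables
ll_with_children    <- -6881.4622  # Probit-Self-Employed (includes child dummies)
ll_without_children <- -6973.1138  # Probit - Self-employed without children

# The restricted model drops d_ch_1_5, d_ch_0 and d_ch_6_17: 3 restrictions
df <- 3

# LR statistic = 2 * (logLik unrestricted - logLik restricted)
lr_stat <- 2 * (ll_with_children - ll_without_children)

# p-value from the chi-squared distribution with df degrees of freedom
p_value <- pchisq(lr_stat, df = df, lower.tail = FALSE)

print(lr_stat)
print(p_value)
```

A large statistic indicates that the child dummies carry information about self-employment among working women, which is worth keeping in mind when they are left out of the outcome equation.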
heck.1<-heckit(working ~ age + agesq + educ + married + d_ch_1_5 + d_ch_0 + d_ch_6_17 + rotter_score + sesteem_score1 + afqt_1 + urban + south + northeast + northcen + sp_inc1000 + policever + unemp_rate,self_empl ~ age + agesq + educ + married + rotter_score + sesteem_score1 + afqt_1 + urban + south + northeast + northcen + sp_inc1000 + policever, data = NLSY, method = "2step")
# s = selection equation, o = outcome equation
# The inverse Mills ratio is significant, meaning there is some bias coming from the selection choice. A positive coefficient means OLS is overstating its betas.
#htmlreg(list(heck.1$lm),digits = 5)
htmlreg(heck.1$lm, custom.coef.names = c("Intercept", "Age", "Age squared", "Education", "Married", "Locus of Control score", "Self-esteem score", "Intelligence test score", "Urban location", "South location", "Northeast location", "North Central location", "Spouse's Income", "Ever stopped by police", "Inverse Mills Ratio"), custom.model.names = "Heckman Model", digits = 4)
| Variable | Heckman Model | (Std. error) |
|---|---|---|
| Intercept | -0.2607*** | (0.0402) |
| Age | 0.0144*** | (0.0023) |
| Age squared | -0.0002*** | (0.0000) |
| Education | 0.0006 | (0.0009) |
| Married | 0.0149*** | (0.0037) |
| Locus of Control score | -0.0017* | (0.0007) |
| Self-esteem score | 0.0015*** | (0.0004) |
| Intelligence test score | 0.0004*** | (0.0001) |
| Urban location | 0.0066 | (0.0035) |
| South location | -0.0261*** | (0.0047) |
| Northeast location | -0.0277*** | (0.0051) |
| North Central location | -0.0257*** | (0.0047) |
| Spouse’s Income | 0.0002*** | (0.0001) |
| Ever stopped by police | 0.0251*** | (0.0054) |
| Inverse Mills Ratio | 0.1354*** | (0.0116) |
| R² | 0.0928 | |
| Adj. R² | 0.0923 | |
| Num. obs. | 28228 | |
| ***p < 0.001; **p < 0.01; *p < 0.05 | | |
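In the Heckman model, the coefficient on the Inverse Mills Ratio (0.1354, standard error 0.0116) is statistically significant, which indicates that there is bias coming from the selection choice. Because the coefficient is positive, this suggests that OLS on the working sample is overstating its betas.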
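The role of the Inverse Mills Ratio is easier to see when the two steps are written out by hand. The sketch below uses base R and simulated data only; the variables `x`, `z`, `selected`, `imr`, `step1`, `step2`, and `naive` are all hypothetical, with `z` playing the part of an exclusion restriction that enters the selection equation but not the outcome equation. It is not a replacement for `heckit`, which also corrects the second-step standard errors.

```r
set.seed(42)
n <- 5000

# Simulated data (hypothetical): z affects selection only (exclusion restriction)
x <- rnorm(n)
z <- rnorm(n)

# Correlated errors in the selection and outcome equations (correlation 0.6)
u <- rnorm(n)
e <- 0.6 * u + sqrt(1 - 0.6^2) * rnorm(n)

selected <- as.integer(0.5 + 0.5 * x + 1.0 * z + u > 0)
y <- ifelse(selected == 1, 1 + 2 * x + e, NA)  # outcome seen only if selected

# Step 1: probit for selection, then the inverse Mills ratio phi(.) / Phi(.)
step1 <- glm(selected ~ x + z, family = binomial(link = "probit"))
index <- predict(step1, type = "link")
imr <- dnorm(index) / pnorm(index)

# Step 2: OLS on the selected sample, adding the inverse Mills ratio
d <- data.frame(y, x, imr)[selected == 1, ]
step2 <- lm(y ~ x + imr, data = d)
naive <- lm(y ~ x, data = d)  # ignores selection, for comparison

print(coef(naive))
print(coef(step2))
```

Comparing the two sets of coefficients shows how much the uncorrected regression moves once the selection term is included; a significant coefficient on `imr` is the signal that selection matters.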