※ 補助金政策(子供の人数に応じて保育補助金を給付する政策)が女性の就労に与える影響の構造推定(Roy model).コードは小西先生の授業 ( https://kdb.tsukuba.ac.jp/syllabi/2018/FH24012/jpn/ ) で使用されているStataプログラムを黒田が自己流でRに書き換えたもの.
library(foreign) # for reading 'dta' file
library(sampleSelection) # for heckit estimation
Using Mroz (1987 ECTA) PSID data [wave = 1975], DL from https://www.uam.es/personal_pdi/economicas/rsmanga/mef_aplications_0809.html .
mroz <- foreign::read.dta(file = "c://sbs2018/mroz.dta")
お行儀が悪いが,以下のように直接アクセスしてimportすることも可.
# url_mroz <- "https://www.uam.es/personal_pdi/economicas/rsmanga/docs/mroz.dta"
# mroz <- foreign::read.dta(file = url_mroz)
パッケージ sampleSelection にも用意されているのでそちらを利用してもよい.ただし変数名が異なるので要注意(例:mroz$inlf に対応するのは Mroz87$lfp).
# data(Mroz87) # in 'sampleSelection' package
Structure of mroz data frame
str(mroz)
## 'data.frame': 753 obs. of 22 variables:
## $ inlf : int 1 1 1 1 1 1 1 1 1 1 ...
## $ hours : int 1610 1656 1980 456 1568 2032 1440 1020 1458 1600 ...
## $ kidslt6 : int 1 0 1 0 1 0 0 0 0 0 ...
## $ kidsge6 : int 0 2 3 3 2 0 2 0 2 2 ...
## $ age : int 32 30 35 34 31 54 37 54 48 39 ...
## $ educ : int 12 12 12 12 14 12 16 12 12 12 ...
## $ wage : num 3.35 1.39 4.55 1.1 4.59 ...
## $ repwage : num 2.65 2.65 4.04 3.25 3.6 ...
## $ hushrs : int 2708 2310 3072 1920 2000 1040 2670 4120 1995 2100 ...
## $ husage : int 34 30 40 53 32 57 37 53 52 43 ...
## $ huseduc : int 12 9 12 10 12 11 12 8 4 12 ...
## $ huswage : num 4.03 8.44 3.58 3.54 10 ...
## $ faminc : num 16310 21800 21040 7300 27300 ...
## $ mtr : num 0.721 0.661 0.692 0.781 0.622 ...
## $ motheduc: int 12 7 12 7 12 14 14 3 7 7 ...
## $ fatheduc: int 7 7 7 7 14 7 7 3 7 7 ...
## $ unem : num 5 11 5 5 9.5 7.5 5 5 3 5 ...
## $ city : int 0 1 0 0 1 1 0 0 0 0 ...
## $ exper : int 14 5 15 6 7 33 11 35 24 21 ...
## $ nwifeinc: num 10.9 19.5 12 6.8 20.1 ...
## $ lwage : num 1.2102 0.3285 1.5141 0.0921 1.5243 ...
## $ expersq : int 196 25 225 36 49 1089 121 1225 576 441 ...
## - attr(*, "datalabel")= chr ""
## - attr(*, "time.stamp")= chr "10 Feb 2009 01:31"
## - attr(*, "formats")= chr "%9.0g" "%9.0g" "%9.0g" "%9.0g" ...
## - attr(*, "types")= int 98 105 98 98 98 98 102 102 105 98 ...
## - attr(*, "val.labels")= chr "" "" "" "" ...
## - attr(*, "var.labels")= chr "=1 if in lab frce, 1975" "hours worked, 1975" "# kids < 6 years" "# kids 6-18" ...
## - attr(*, "version")= int 7
head(cbind(colnames(mroz), attr(mroz, "var.labels"))) # short description
## [,1] [,2]
## [1,] "inlf" "=1 if in lab frce, 1975"
## [2,] "hours" "hours worked, 1975"
## [3,] "kidslt6" "# kids < 6 years"
## [4,] "kidsge6" "# kids 6-18"
## [5,] "age" "woman's age in yrs"
## [6,] "educ" "years of schooling"
mroz$winc <- with(mroz, wage * hours) # annual income
mroz$linc <- log(mroz$winc) # logarithm
Step 1: Heckman’s first stage (probit) \[ \Pr(D_i = 1 | X_i) = \Phi \left( \frac{z_i' \beta - (\mu + \alpha) n_i - x_i' \gamma}{\sigma_\eta} \right), \quad \tilde{\beta} = \frac{\beta}{\sigma_\eta} \]
Step 2: Heckman’s second stage (linear) \[ E [w_i | X_i, D_i = 1] = z_i' \beta + \rho \sigma_\epsilon \lambda (X_i' \tilde{\theta}_V) \]
fm_selection <- inlf ~ educ + exper + expersq + unem + nwifeinc + kidslt6 + kidsge6
fm_outcome <- winc ~ educ + exper + expersq + unem
heckit00 <- heckit(selection = fm_selection, outcome = fm_outcome, data = mroz, method = "2step")
summary(heckit00)
## --------------------------------------------
## Tobit 2 model (sample selection model)
## 2-step Heckman / heckit estimation
## 753 observations (325 censored and 428 observed)
## 16 free parameters (df = 738)
## Probit selection equation:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.070824 0.332255 -6.233 7.71e-10 ***
## educ 0.152485 0.024807 6.147 1.29e-09 ***
## exper 0.126984 0.018440 6.886 1.23e-11 ***
## expersq -0.002464 0.000594 -4.148 3.75e-05 ***
## unem -0.026340 0.016335 -1.612 0.107286
## nwifeinc -0.016772 0.004751 -3.530 0.000441 ***
## kidslt6 -0.558067 0.103770 -5.378 1.01e-07 ***
## kidsge6 0.132657 0.040041 3.313 0.000968 ***
## Outcome equation:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2547.224 2262.864 -1.126 0.2607
## educ 505.569 102.217 4.946 9.38e-07 ***
## exper 246.804 111.542 2.213 0.0272 *
## expersq -2.869 2.822 -1.017 0.3096
## unem -89.686 64.254 -1.396 0.1632
## Multiple R-Squared:0.1975, Adjusted R-Squared:0.188
## Error terms:
## Estimate Std. Error t value Pr(>|t|)
## invMillsRatio -786.0960 1104.6538 -0.712 0.477
## sigma 3888.8738 NA NA NA
## rho -0.2021 NA NA NA
## --------------------------------------------
str(heckit00$coefficients) # first step: probit selection eq.
## Named num [1:16] -2.07082 0.15248 0.12698 -0.00246 -0.02634 ...
## - attr(*, "names")= chr [1:16] "(Intercept)" "educ" "exper" "expersq" ...
str(heckit00$lm$coefficients) # second step: outcome (wage) eq.
## Named num [1:6] -2547.22 505.57 246.8 -2.87 -89.69 ...
## - attr(*, "names")= chr [1:6] "XO(Intercept)" "XOeduc" "XOexper" "XOexpersq" ...
cf.
probit_sel <- glm(formula = fm_selection, family = binomial(probit), data = mroz)
summary(probit_sel)
##
## Call:
## glm(formula = fm_selection, family = binomial(probit), data = mroz)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.2936 -0.9601 0.5195 0.8852 2.4616
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.0708235 0.3329048 -6.220 4.96e-10 ***
## educ 0.1524850 0.0249703 6.107 1.02e-09 ***
## exper 0.1269837 0.0183895 6.905 5.01e-12 ***
## expersq -0.0024639 0.0005907 -4.171 3.03e-05 ***
## unem -0.0263399 0.0162702 -1.619 0.105468
## nwifeinc -0.0167724 0.0048504 -3.458 0.000544 ***
## kidslt6 -0.5580667 0.1031070 -5.412 6.22e-08 ***
## kidsge6 0.1326571 0.0406115 3.266 0.001089 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1029.7 on 752 degrees of freedom
## Residual deviance: 841.0 on 745 degrees of freedom
## AIC: 857
##
## Number of Fisher Scoring iterations: 4
\[ \lambda (\cdot) = \frac{\phi(\cdot)}{\Phi(\cdot)} \]
mroz$imr <- dnorm(probit_sel$linear.predictors) / pnorm(probit_sel$linear.predictors) # inv Mills ratio
# plot(x = mroz$imr, y = sampleSelection::invMillsRatio(probit_sel)$IMR1) # check
summary(mroz$imr)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.05675 0.39292 0.63470 0.73644 0.99112 3.09369
wage_eq <- lm(formula = update.formula(fm_outcome, . ~ . + imr), data = mroz, subset = winc > 0)
summary(wage_eq)
##
## Call:
## lm(formula = update.formula(fm_outcome, . ~ . + imr), data = mroz,
## subset = winc > 0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9265 -2649 -460 2409 16382
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2547.239 2270.136 -1.122 0.262
## educ 505.569 102.444 4.935 1.16e-06 ***
## exper 246.804 111.909 2.205 0.028 *
## expersq -2.869 2.830 -1.014 0.311
## unem -89.686 64.375 -1.393 0.164
## imr -786.088 1109.462 -0.709 0.479
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3876 on 422 degrees of freedom
## Multiple R-squared: 0.1975, Adjusted R-squared: 0.188
## F-statistic: 20.78 on 5 and 422 DF, p-value: < 2.2e-16
cf.
\[ \sigma_\eta = \frac{\beta}{\tilde{\beta}} = \frac{\mbox{(coef. of wage eq.)}}{\mbox{(coef. of selection eq.)}} \]
sigma2 <- heckit00$lm$coefficients["XOunem"] / heckit00$coefficients["unem"] # county unemployment rate [%]
sigma2
## XOunem
## 3404.972
Other \(\sigma_\eta\) candidates:
heckit00$lm$coefficients["XOeduc"] / heckit00$coefficients["educ"] # years of schooling
## XOeduc
## 3315.533
heckit00$lm$coefficients["XOexper"] / heckit00$coefficients["exper"] # labor market experience [year]
## XOexper
## 1943.587
\[ \Pr(D_i = 1 | X_i, \mbox{subsidy} q) - \Pr(D_i = 1 | X_i) = \Phi \left( \frac{z_i' \beta - (\mu - q + \alpha) n_i - x_i' \gamma}{\sigma_\eta} \right) - \Phi \left( \frac{z_i' \beta - (\mu + \alpha) n_i - x_i' \gamma}{\sigma_\eta} \right) \]
cf.
# heckit00$coefficients["kidslt6"] = - (mu + alpha)/sigma_eta
cf1 <- heckit00$coefficients["kidslt6"] + (500 / sigma2) # q = 500 USD/y per one child (age < 6)
cf2 <- heckit00$coefficients["kidslt6"] + (1000 / sigma2) # q = 1,000
cf3 <- heckit00$coefficients["kidslt6"] + (1500 / sigma2) # q = 1,500
heckit00_sel_coef <- heckit00$coefficients[c("(Intercept)", "educ", "exper", "expersq", "unem", "nwifeinc", "kidslt6", "kidsge6")]
heckit00_sel_coef_cf1 <- c(heckit00_sel_coef[1:6], cf1, heckit00_sel_coef[8])
heckit00_sel_coef_cf2 <- c(heckit00_sel_coef[1:6], cf2, heckit00_sel_coef[8])
heckit00_sel_coef_cf3 <- c(heckit00_sel_coef[1:6], cf3, heckit00_sel_coef[8])
mroz_sel <- cbind(1, mroz[, c("educ", "exper", "expersq", "unem", "nwifeinc", "kidslt6", "kidsge6")])
head(t(heckit00_sel_coef * t(mroz_sel)))
## 1 educ exper expersq unem nwifeinc
## 1 -2.070824 1.829819 1.7777707 -0.48292874 -0.1316986 -0.1829867
## 2 -2.070824 1.829819 0.6349181 -0.06159805 -0.2897369 -0.3270593
## 3 -2.070824 1.829819 1.9047543 -0.55438248 -0.1316986 -0.2019369
## 4 -2.070824 1.829819 0.7619017 -0.08870120 -0.1316986 -0.1140515
## 5 -2.070824 2.134789 0.8888854 -0.12073218 -0.2502274 -0.3371240
## 6 -2.070824 1.829819 4.1904595 -2.68321120 -0.1975479 -0.1653589
## kidslt6 kidsge6
## 1 -0.5580674 0.0000000
## 2 0.0000000 0.2653132
## 3 -0.5580674 0.3979698
## 4 0.0000000 0.3979698
## 5 -0.5580674 0.2653132
## 6 0.0000000 0.0000000
mroz$pk0 <- pnorm(q = apply(t(heckit00_sel_coef * t(mroz_sel)), 1, sum))
mroz$pk1 <- pnorm(q = apply(t(heckit00_sel_coef_cf1 * t(mroz_sel)), 1, sum))
mroz$pk2 <- pnorm(q = apply(t(heckit00_sel_coef_cf2 * t(mroz_sel)), 1, sum))
mroz$pk3 <- pnorm(q = apply(t(heckit00_sel_coef_cf3 * t(mroz_sel)), 1, sum))
summary(mroz$pk0)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.002591 0.385960 0.606166 0.568381 0.770972 0.976533
summary(mroz$pk3)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.02716 0.44248 0.63576 0.60277 0.78261 0.97653
sum(mroz$pk0 > .5) / nrow(mroz) # without subsidy
## [1] 0.6215139
sum(mroz$pk1 > .5) / nrow(mroz) # with 0.5K USD/y subsidy
## [1] 0.6334661
sum(mroz$pk2 > .5) / nrow(mroz) # with 1.0K USD/y subsidy
## [1] 0.6586985
sum(mroz$pk3 > .5) / nrow(mroz) # with 1.5K USD/y subsidy
## [1] 0.6733068
# bandwidth
bw0 <- bw.SJ(mroz$pk0) # Sheather & Jones (1991)
# bw0 <- bw.nrd0(mroz$pk0) # [default] Silverman's rule of thumb
# bw0 <- bw.nrd(mroz$pk0) # Scott (1992)
# bw0 <- bw.ucv(mroz$pk0) # unbiased cross-validation
plot(density(mroz$pk3, bw = bw0), col = 4, lty = 2, axes = F, main = "Density of Pr(working)")
axis(side = 1); axis(side = 2)
lines(density(mroz$pk2, bw = bw0), col = 3, lty = 2)
lines(density(mroz$pk1, bw = bw0), col = 2, lty = 2)
lines(density(mroz$pk0, bw = bw0), col = 1)
legend(x = "topleft", legend = c("CF: 1,500 USD/year", "CF: 1000", "CF: 500", "No subsidy"), col = 4:1, lty = c(2, 2, 2, 1))
Same options as ‘Lec 5.do’, modified version, DL from manaba
plot(density(mroz$pk3, bw = .1, kernel = "epanechnikov"), col = 4, lty = 2, axes = F, main = "Density of Pr(working)")
axis(side = 1); axis(side = 2)
lines(density(mroz$pk2, bw = .1, kernel = "epanechnikov"), col = 3, lty = 2)
lines(density(mroz$pk1, bw = .1, kernel = "epanechnikov"), col = 2, lty = 2)
lines(density(mroz$pk0, bw = .1, kernel = "epanechnikov"), col = 1)
legend(x = "topleft", legend = c("CF: 1,500 USD/year", "CF: 1000", "CF: 500", "No subsidy"), col = 4:1, lty = c(2, 2, 2, 1))
cf.
mu0 <- c(0, 0) # mu
varcov_el <- with(heckit00, c(sigma^2, rep(sigma*rho, 2), 1))
varcov0 <- matrix(varcov_el, nrow = 2) # variance-covariance matrix
library(MASS) # for 'mvrnorm' function
eps_eta <- MASS::mvrnorm(n = nrow(mroz), mu = mu0, Sigma = varcov0)
cor(eps_eta)
## [,1] [,2]
## [1,] 1.0000000 -0.2098448
## [2,] -0.2098448 1.0000000
mroz$eps0 <- eps_eta[, 1] # epsilon (error term of wage eq.)
mroz$eta0 <- eps_eta[, 2] # eta (error term of selection eq.)
mroz$x0 <- apply(t(heckit00_sel_coef * t(mroz_sel)), 1, sum) + mroz$eta0
mroz$x3 <- apply(t(heckit00_sel_coef_cf3 * t(mroz_sel)), 1, sum) + mroz$eta0
# summary(mroz$x0)
# summary(mroz$x3)
table(round(mroz$x3 - mroz$x0, digits = 3))
##
## 0 0.441 0.881 1.322
## 606 118 26 3
子供のいる女性に対してのみ効果が生じる(という理論モデルを仮定している)ことに注意.よって,上記(効果の頻度)は以下(子供の人数)と対応している.
table(mroz$kidslt6)
##
## 0 1 2 3
## 606 118 26 3
xl <- range(with(mroz, c(x0, x3)))
yl <- range(mroz$eps0)
plot(x =mroz$x0 , y = mroz$eps0, col = 3, pch = 19, xlim = xl, ylim = yl, axes = F, xlab = "V*", ylab = "epsilon")
axis(side = 1); axis(side = 2)
par(new = T)
plot(x = mroz$x3, y = mroz$eps0, col = 2, pch = 3, xlim = xl, ylim = yl, axes = F, ann = F)
legend(x = "topright", legend = c("CF3", "No subsidy"), col = 2:3, pch = c(3, 19))
wage_eq_name <- c("educ", "exper", "expersq", "unem")
mroz$w <- apply(t(heckit00$lm$coefficients[1:5] * t(cbind(1, mroz[, wage_eq_name]))), 1, sum) + mroz$eps0
mroz$w0 <- ifelse(mroz$x0 >= 0, mroz$w, NA)
mroz$w3 <- ifelse(mroz$x3 >= 0, mroz$w, NA)
hist(mroz$w3[mroz$w3 > 0], breaks = 30, xlab = "Income")
hist(mroz$w0[mroz$w0 > 0], breaks = 30, add = T, col = 3)
legend(x = "topright", legend = c("CF3", "No subsidy"), col = 1, pt.bg = c(0, 3), pch = 22)
EOS