Jackie Finik
11/06/19
Heckman proposed a two-step approach to correct for sample selection bias
\( y_i^* = x_i'\beta + \epsilon_i \) [Outcome Equation]
\( d_i^* = z_i'\gamma + \upsilon_i \) [Selection Equation]
\( y_i = x_i'\beta + \mu\hat{\lambda}_i + \epsilon_i \) [Final Equation with Heckman Correction]
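For reference, a sketch of where the correction term comes from. It assumes the standard setup: \( y_i \) is observed only when \( d_i = 1 \) (that is, when \( d_i^* > 0 \)), and \( (\epsilon_i, \upsilon_i) \) are bivariate normal with \( \mathrm{Var}(\upsilon_i) = 1 \), \( \mathrm{sd}(\epsilon_i) = \sigma \) and correlation \( \rho \). The symbols \( \sigma \), \( \rho \), \( \phi \) (standard normal density) and \( \Phi \) (standard normal CDF) are not defined on the slide and are introduced here for illustration.
\[ E[y_i \mid d_i = 1] = x_i'\beta + E[\epsilon_i \mid \upsilon_i > -z_i'\gamma] = x_i'\beta + \rho\sigma\,\lambda_i, \qquad \lambda_i = \frac{\phi(z_i'\gamma)}{\Phi(z_i'\gamma)} \]
Under these assumptions \( \mu = \rho\sigma \), and \( \hat{\lambda}_i \) is the inverse Mills ratio \( \lambda_i \) evaluated at the probit estimate \( \hat{\gamma} \) from the selection equation.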
Heckman model assumptions
  lfp hours kids5 kids618 age educ   wage repwage hushrs husage huseduc
1   1  1610     1       0  32   12 3.3540    2.65   2708     34      12
2   1  1656     0       2  30   12 1.3889    2.65   2310     30       9
3   1  1980     1       3  35   12 4.5455    4.04   3072     40      12
4   1   456     0       3  34   12 1.0965    3.25   1920     53      10
5   1  1568     1       2  31   14 4.5918    3.60   2000     32      12
6   1  2032     0       0  54   12 4.7421    4.70   1040     57      11
  huswage faminc    mtr motheduc fatheduc unem city exper  nwifeinc
1  4.0288  16310 0.7215       12        7  5.0    0    14 10.910060
2  8.4416  21800 0.6615        7        7 11.0    1     5 19.499981
3  3.5807  21040 0.6915       12        7  5.0    0    15 12.039910
4  3.5417   7300 0.7815        7        7  5.0    0     6  6.799996
5 10.0000  27300 0.6215       12       14  9.5    1     7 20.100058
6  6.7106  19495 0.6915       14        7  7.5    1    33  9.859054
  wifecoll huscoll
1    FALSE   FALSE
2    FALSE   FALSE
3    FALSE   FALSE
4    FALSE   FALSE
5     TRUE   FALSE
6    FALSE   FALSE
library(sampleSelection)
data("Mroz87")
# kids = TRUE if there are any children (kids5 or kids618) in the household
Mroz87$kids <- (Mroz87$kids5 + Mroz87$kids618 > 0)
# regular OLS model, fitted on labor-force participants only (lfp == 1)
ols1 <- lm(wage ~ educ + exper + I(exper^2) + city, data = subset(Mroz87, lfp == 1))
summary(ols1)
Call:
lm(formula = wage ~ educ + exper + I(exper^2) + city, data = subset(Mroz87,
    lfp == 1))
Residuals:
    Min      1Q  Median      3Q     Max
-5.6021 -1.6012 -0.4787  0.8950 21.2762
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.5609920  0.9288390  -2.757  0.00608 **
educ         0.4809623  0.0668679   7.193 2.91e-12 ***
exper        0.0324982  0.0615864   0.528  0.59800
I(exper^2)  -0.0002602  0.0018378  -0.142  0.88747
city         0.4492741  0.3177735   1.414  0.15815
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.111 on 423 degrees of freedom
Multiple R-squared:  0.1248,  Adjusted R-squared:  0.1165
F-statistic: 15.08 on 4 and 423 DF,  p-value: 1.569e-11
# estimate the selection equation, then the outcome equation (two-step)
greeneTS <- selection(lfp ~ age + I(age^2) + faminc + kids + educ,
                      wage ~ exper + I(exper^2) + educ + city,
                      data = Mroz87, method = "2step")
# exclusion restriction: the selection equation includes variable(s) that are not in the outcome equation (satisfied here by age, faminc, kids)
summary(greeneTS)
--------------------------------------------
Tobit 2 model (sample selection model)
2-step Heckman / heckit estimation
753 observations (325 censored and 428 observed)
14 free parameters (df = 740)
Probit selection equation:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.157e+00  1.402e+00  -2.965 0.003127 **
age          1.854e-01  6.597e-02   2.810 0.005078 **
I(age^2)    -2.426e-03  7.735e-04  -3.136 0.001780 **
faminc       4.580e-06  4.206e-06   1.089 0.276544
kidsTRUE    -4.490e-01  1.309e-01  -3.430 0.000638 ***
educ         9.818e-02  2.298e-02   4.272 2.19e-05 ***
Outcome equation:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.9712003  2.0593505  -0.472    0.637
exper        0.0210610  0.0624646   0.337    0.736
I(exper^2)   0.0001371  0.0018782   0.073    0.942
educ         0.4170174  0.1002497   4.160 3.56e-05 ***
city         0.4438379  0.3158984   1.405    0.160
Multiple R-Squared:0.1264,  Adjusted R-Squared:0.116
Error terms:
              Estimate Std. Error t value Pr(>|t|)
invMillsRatio   -1.098      1.266  -0.867    0.386
sigma            3.200         NA      NA       NA
rho             -0.343         NA      NA       NA
--------------------------------------------
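# rho > 0 would mean the observed outcomes are 'better' than average (positive selection)
# here rho = -0.343 < 0, but the invMillsRatio term is not significant (p = 0.386); sigma is a standard deviation and is always positive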
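The two-step procedure can also be written out by hand, which makes the role of the inverse Mills ratio explicit. The following is a minimal sketch, assuming the sampleSelection package is installed (it is used here only for the Mroz87 data). The names probit1, index, imr and step2 are illustrative and do not come from the original analysis.

```r
library(sampleSelection)  # only needed here for the Mroz87 data
data("Mroz87")
Mroz87$kids <- (Mroz87$kids5 + Mroz87$kids618 > 0)

# Step 1: probit model for labor-force participation (selection equation)
probit1 <- glm(lfp ~ age + I(age^2) + faminc + kids + educ,
               family = binomial(link = "probit"), data = Mroz87)

# Inverse Mills ratio evaluated at the fitted linear index z'gamma-hat
index <- predict(probit1, type = "link")
Mroz87$imr <- dnorm(index) / pnorm(index)

# Step 2: OLS on participants only, with the inverse Mills ratio as an extra regressor
step2 <- lm(wage ~ exper + I(exper^2) + educ + city + imr,
            data = subset(Mroz87, lfp == 1))
coef(step2)
```

The coefficient on imr should be close to the invMillsRatio estimate reported by the two-step fit (-1.098), which also equals rho times sigma: -0.343 x 3.200 is approximately -1.098. The standard errors from the plain lm() call in step 2 ignore the fact that the inverse Mills ratio is itself estimated, so they are expected to differ from those reported by selection().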
# same model, estimated by maximum likelihood (BHHH maximisation)
greeneML <- selection(lfp ~ age + I(age^2) + faminc + kids + educ,
                      wage ~ exper + I(exper^2) + educ + city,
                      data = Mroz87, maxMethod = "BHHH")
summary(greeneML)
--------------------------------------------
Tobit 2 model (sample selection model)
Maximum Likelihood estimation
BHHH maximisation, 62 iterations
Return code 2: successive function values within tolerance limit
Log-Likelihood: -1581.259
753 observations (325 censored and 428 observed)
13 free parameters (df = 740)
Probit selection equation:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.120e+00  1.410e+00  -2.921  0.00359 **
age          1.840e-01  6.584e-02   2.795  0.00532 **
I(age^2)    -2.409e-03  7.735e-04  -3.115  0.00191 **
faminc       5.676e-06  3.890e-06   1.459  0.14493
kidsTRUE    -4.507e-01  1.367e-01  -3.298  0.00102 **
educ         9.533e-02  2.400e-02   3.973  7.8e-05 ***
Outcome equation:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.9537242  1.6745690  -1.167    0.244
exper        0.0284295  0.0753989   0.377    0.706
I(exper^2)  -0.0001151  0.0023339  -0.049    0.961
educ         0.4562471  0.0959626   4.754 2.39e-06 ***
city         0.4451424  0.4255420   1.046    0.296
Error terms:
      Estimate Std. Error t value Pr(>|t|)
sigma  3.10350    0.08368  37.088   <2e-16 ***
rho   -0.13328    0.22296  -0.598     0.55
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
--------------------------------------------
Simulation Study (Binary Outcome)
Simulation Study (Continuous Outcome)