Causal Inference
Introduction
In this analysis, we use a simple difference in means, propensity score matching, and genetic matching to investigate the treatment effect in the Lalonde sample.
Data Setup
library(foreign)
nsw_dw <- read.dta("~/Causal Inference/nsw_dw.dta")
nsw_dw.treated <- subset(nsw_dw, treat==1)
nsw_dw.control <- subset(nsw_dw, treat==0)
cps_control <- read.dta("~/Causal Inference/cps_controls.dta")
Treatment Effect by Simple Difference in Mean
nsw_dw.treated.mean <- mean(nsw_dw.treated$re78)
nsw_dw.control.mean <- mean(nsw_dw.control$re78)
treatmentEffect.differenceInMean <- nsw_dw.treated.mean - nsw_dw.control.mean
treatmentEffect.differenceInMean
## [1] 1794.342
# 95% confidence interval
t.test(nsw_dw.treated$re78, nsw_dw.control$re78, conf.level=0.95)
##
## Welch Two Sample t-test
##
## data: nsw_dw.treated$re78 and nsw_dw.control$re78
## t = 2.6741, df = 307.13, p-value = 0.007893
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 474.0105 3114.6743
## sample estimates:
## mean of x mean of y
## 6349.144 4554.801
A simple difference in means shows a positive treatment effect of 1794.342, with a 95% confidence interval from 474.0105 to 3114.6743. This is a naive way of analyzing the data: the estimate equals the treatment effect plus any selection bias. Because we are not treating this sample as a randomized controlled trial (RCT), we cannot assume that the selection bias is zero, so the result of this method should be treated with caution.
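As a rough illustration of what t.test computes above, the Welch interval can be reproduced by hand. The sketch below uses simulated earnings rather than the Lalonde data; the vectors treated and control, their sizes, and their distribution parameters are made up for illustration.

```r
# Minimal sketch on simulated data (not the Lalonde sample).
set.seed(1)
treated <- rnorm(185, mean = 6300, sd = 7800)  # hypothetical treated outcomes
control <- rnorm(260, mean = 4500, sd = 5500)  # hypothetical control outcomes

# Point estimate: difference in sample means
diff_in_means <- mean(treated) - mean(control)

# Welch standard error (group variances are not pooled)
v_t <- var(treated) / length(treated)
v_c <- var(control) / length(control)
se  <- sqrt(v_t + v_c)

# Welch-Satterthwaite degrees of freedom
df <- (v_t + v_c)^2 /
  (v_t^2 / (length(treated) - 1) + v_c^2 / (length(control) - 1))

# 95% confidence interval from the t distribution
welch_ci <- diff_in_means + c(-1, 1) * qt(0.975, df) * se

# Compare with t.test(), which uses Welch's method by default
builtin_ci <- as.numeric(t.test(treated, control, conf.level = 0.95)$conf.int)
print(rbind(by_hand = welch_ci, t.test = builtin_ci))
```

The interval only quantifies sampling uncertainty in the two means; it says nothing about whether the two groups are comparable.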
Treatment Effect by Difference in Mean using CPS Control
cps_control.mean <- mean(cps_control$re78)
treatmentEffect.cpsDifferenceInMean <- nsw_dw.treated.mean - cps_control.mean
treatmentEffect.cpsDifferenceInMean
## [1] -8497.516
# 95% confidence interval
t.test(nsw_dw.treated$re78, cps_control$re78, conf.level=0.95)
##
## Welch Two Sample t-test
##
## data: nsw_dw.treated$re78 and cps_control$re78
## t = -14.565, df = 190.46, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -9648.335 -7346.698
## sample estimates:
## mean of x mean of y
## 6349.144 14846.660
By replacing the control group of the Lalonde sample with the CPS comparison group (cps_controls.dta), we get a much larger picture of the comparison population: the CPS file has 15992 control observations, whereas the Lalonde sample contains only 260. If the 185 treated observations were drawn from the same population as the CPS controls, using these 15992 observations as the control group in the difference-in-means analysis would give a more precise estimate of the control mean. However, the estimated effect here is -8497.516 (95% confidence interval from -9648.335 to -7346.698), and if there is any selection bias in who receives the treatment, a simple difference in means cannot remove it.
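To make the selection-bias concern concrete, here is a small simulation sketch. Nothing in it comes from the NSW or CPS files: the covariate x, the assignment rule, and the true effect of 2 are all assumptions chosen for illustration.

```r
# Minimal simulation sketch (all quantities are hypothetical).
set.seed(42)
n <- 5000
x <- rnorm(n)                                # stand-in for standardized prior earnings
treat <- rbinom(n, 1, plogis(-x))            # low-x units are more likely to be treated
true_effect <- 2
y <- true_effect * treat + 3 * x + rnorm(n)  # outcome depends on both treat and x

# Naive estimate: ignores that the two groups differ in x
naive <- mean(y[treat == 1]) - mean(y[treat == 0])

# Adjusting for x (here via linear regression) removes the bias in this setup
adjusted <- unname(coef(lm(y ~ treat + x))["treat"])

print(c(true = true_effect, naive = naive, adjusted = adjusted))
```

In this setup the treated group starts from lower x, so the naive difference understates the true effect, much as a comparison against a better-off control population can turn a positive effect negative.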
Propensity Score Matching
library(Matching)
## Loading required package: MASS
## ##
## ## Matching (Version 4.9-2, Build Date: 2015-12-25)
## ## See http://sekhon.berkeley.edu/matching for additional documentation.
## ## Please cite software as:
## ## Jasjeet S. Sekhon. 2011. ``Multivariate and Propensity Score Matching
## ## Software with Automated Balance Optimization: The Matching package for R.''
## ## Journal of Statistical Software, 42(7): 1-52.
## ##
# propensity score matching
propensityScoreModel <- glm(treat ~ age + I(age^2) + education + I(education^2) + black +
hispanic + married + nodegree + re74 + I(re74^2) + re75 + I(re75^2),
family=binomial, data=nsw_dw)
propensityScoreMatching.ATT <- Match(Y=nsw_dw$re78, Tr=nsw_dw$treat, X=propensityScoreModel$fitted, M=1)
summary(propensityScoreMatching.ATT)
##
## Estimate... 1379.6
## AI SE...... 777.76
## T-stat..... 1.7738
## p.val...... 0.076096
##
## Original number of observations.............. 445
## Original number of treated obs............... 185
## Matched number of observations............... 185
## Matched number of observations (unweighted). 494
# 95% confidence interval (normal approximation: estimate +/- 1.96 * AI SE)
with(propensityScoreMatching.ATT, c(est - 1.96*se, est + 1.96*se))
## [1] -144.8159 2904.0129
# balance test
propensityScoreMatching.balance <- MatchBalance(treat ~ age + I(age^2) + education + I(education^2) +
black + hispanic + married + nodegree + re74 +
I(re74^2) + re75 + I(re75^2), data=nsw_dw,
match.out=propensityScoreMatching.ATT, nboots=10)
##
## ***** (V1) age *****
## Before Matching After Matching
## mean treatment........ 25.816 25.816
## mean control.......... 25.054 26.189
## std mean diff......... 10.655 -5.2154
##
## mean raw eQQ diff..... 0.94054 0.85628
## med raw eQQ diff..... 1 1
## max raw eQQ diff..... 7 9
##
## mean eCDF diff........ 0.025364 0.024169
## med eCDF diff........ 0.022193 0.020243
## max eCDF diff........ 0.065177 0.080972
##
## var ratio (Tr/Co)..... 1.0278 0.91034
## T-test p-value........ 0.26594 0.58434
## KS Bootstrap p-value.. 0.7 < 2.22e-16
## KS Naive p-value...... 0.7481 0.078412
## KS Statistic.......... 0.065177 0.080972
##
##
## ***** (V2) I(age^2) *****
## Before Matching After Matching
## mean treatment........ 717.39 717.39
## mean control.......... 677.32 741.82
## std mean diff......... 9.2937 -5.6629
##
## mean raw eQQ diff..... 56.076 52.844
## med raw eQQ diff..... 43 43
## max raw eQQ diff..... 721 909
##
## mean eCDF diff........ 0.025364 0.024169
## med eCDF diff........ 0.022193 0.020243
## max eCDF diff........ 0.065177 0.080972
##
## var ratio (Tr/Co)..... 1.0115 0.76029
## T-test p-value........ 0.33337 0.58978
## KS Bootstrap p-value.. 0.7 < 2.22e-16
## KS Naive p-value...... 0.7481 0.078412
## KS Statistic.......... 0.065177 0.080972
##
##
## ***** (V3) education *****
## Before Matching After Matching
## mean treatment........ 10.346 10.346
## mean control.......... 10.088 10.252
## std mean diff......... 12.806 4.6502
##
## mean raw eQQ diff..... 0.40541 0.10931
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 2 2
##
## mean eCDF diff........ 0.028698 0.007808
## med eCDF diff........ 0.012682 0.0040486
## max eCDF diff........ 0.12651 0.038462
##
## var ratio (Tr/Co)..... 1.5513 0.84297
## T-test p-value........ 0.15017 0.64164
## KS Bootstrap p-value.. < 2.22e-16 0.5
## KS Naive p-value...... 0.062873 0.85831
## KS Statistic.......... 0.12651 0.038462
##
##
## ***** (V4) I(education^2) *****
## Before Matching After Matching
## mean treatment........ 111.06 111.06
## mean control.......... 104.37 109.88
## std mean diff......... 17.012 2.9945
##
## mean raw eQQ diff..... 8.7189 1.9393
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 60 60
##
## mean eCDF diff........ 0.028698 0.007808
## med eCDF diff........ 0.012682 0.0040486
## max eCDF diff........ 0.12651 0.038462
##
## var ratio (Tr/Co)..... 1.6625 0.9462
## T-test p-value........ 0.053676 0.74728
## KS Bootstrap p-value.. < 2.22e-16 0.5
## KS Naive p-value...... 0.062873 0.85831
## KS Statistic.......... 0.12651 0.038462
##
##
## ***** (V5) black *****
## Before Matching After Matching
## mean treatment........ 0.84324 0.84324
## mean control.......... 0.82692 0.81171
## std mean diff......... 4.4767 8.6493
##
## mean raw eQQ diff..... 0.016216 0.0080972
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 1 1
##
## mean eCDF diff........ 0.0081601 0.0040486
## med eCDF diff........ 0.0081601 0.0040486
## max eCDF diff........ 0.01632 0.0080972
##
## var ratio (Tr/Co)..... 0.92503 0.86488
## T-test p-value........ 0.64736 0.32299
##
##
## ***** (V6) hispanic *****
## Before Matching After Matching
## mean treatment........ 0.059459 0.059459
## mean control.......... 0.10769 0.088288
## std mean diff......... -20.341 -12.158
##
## mean raw eQQ diff..... 0.048649 0.0060729
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 1 1
##
## mean eCDF diff........ 0.024116 0.0030364
## med eCDF diff........ 0.024116 0.0030364
## max eCDF diff........ 0.048233 0.0060729
##
## var ratio (Tr/Co)..... 0.58288 0.69476
## T-test p-value........ 0.064043 0.14352
##
##
## ***** (V7) married *****
## Before Matching After Matching
## mean treatment........ 0.18919 0.18919
## mean control.......... 0.15385 0.16351
## std mean diff......... 8.9995 6.5379
##
## mean raw eQQ diff..... 0.037838 0.0080972
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 1 1
##
## mean eCDF diff........ 0.017672 0.0040486
## med eCDF diff........ 0.017672 0.0040486
## max eCDF diff........ 0.035343 0.0080972
##
## var ratio (Tr/Co)..... 1.1802 1.1215
## T-test p-value........ 0.33425 0.39942
##
##
## ***** (V8) nodegree *****
## Before Matching After Matching
## mean treatment........ 0.70811 0.70811
## mean control.......... 0.83462 0.71807
## std mean diff......... -27.751 -2.1852
##
## mean raw eQQ diff..... 0.12432 0.01417
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 1 1
##
## mean eCDF diff........ 0.063254 0.007085
## med eCDF diff........ 0.063254 0.007085
## max eCDF diff........ 0.12651 0.01417
##
## var ratio (Tr/Co)..... 1.4998 1.021
## T-test p-value........ 0.0020368 0.72722
##
##
## ***** (V9) re74 *****
## Before Matching After Matching
## mean treatment........ 2095.6 2095.6
## mean control.......... 2107 1497.1
## std mean diff......... -0.23437 12.247
##
## mean raw eQQ diff..... 487.98 163.88
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 8413 9319.2
##
## mean eCDF diff........ 0.019501 0.009145
## med eCDF diff........ 0.016112 0.0060729
## max eCDF diff........ 0.047089 0.038462
##
## var ratio (Tr/Co)..... 0.7381 1.9637
## T-test p-value........ 0.98186 0.17766
## KS Bootstrap p-value.. 0.6 0.3
## KS Naive p-value...... 0.97023 0.85831
## KS Statistic.......... 0.047089 0.038462
##
##
## ***** (V10) I(re74^2) *****
## Before Matching After Matching
## mean treatment........ 28141412 28141412
## mean control.......... 36667400 14335932
## std mean diff......... -7.4722 12.099
##
## mean raw eQQ diff..... 13311768 3403402
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 365146183 566240806
##
## mean eCDF diff........ 0.019501 0.009145
## med eCDF diff........ 0.016112 0.0060729
## max eCDF diff........ 0.047089 0.038462
##
## var ratio (Tr/Co)..... 0.50382 4.4094
## T-test p-value........ 0.51322 0.14459
## KS Bootstrap p-value.. 0.6 0.3
## KS Naive p-value...... 0.97023 0.85831
## KS Statistic.......... 0.047089 0.038462
##
##
## ***** (V11) re75 *****
## Before Matching After Matching
## mean treatment........ 1532.1 1532.1
## mean control.......... 1266.9 1242.1
## std mean diff......... 8.2363 9.0073
##
## mean raw eQQ diff..... 367.61 62.761
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 2110.3 2510.5
##
## mean eCDF diff........ 0.051061 0.006862
## med eCDF diff........ 0.064657 0.0040486
## max eCDF diff........ 0.10748 0.036437
##
## var ratio (Tr/Co)..... 1.0763 1.3081
## T-test p-value........ 0.38527 0.35054
## KS Bootstrap p-value.. < 2.22e-16 0.5
## KS Naive p-value...... 0.16449 0.89829
## KS Statistic.......... 0.10748 0.036437
##
##
## ***** (V12) I(re75^2) *****
## Before Matching After Matching
## mean treatment........ 12654750 12654750
## mean control.......... 11196524 9422619
## std mean diff......... 2.6024 5.7682
##
## mean raw eQQ diff..... 2840847 885385
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 101660120 101660120
##
## mean eCDF diff........ 0.051061 0.006862
## med eCDF diff........ 0.064657 0.0040486
## max eCDF diff........ 0.10748 0.036437
##
## var ratio (Tr/Co)..... 1.4609 2.0326
## T-test p-value........ 0.77178 0.5263
## KS Bootstrap p-value.. < 2.22e-16 0.5
## KS Naive p-value...... 0.16449 0.89829
## KS Statistic.......... 0.10748 0.036437
##
##
## Before Matching Minimum p.value: < 2.22e-16
## Variable Name(s): education I(education^2) re75 I(re75^2) Number(s): 3 4 11 12
##
## After Matching Minimum p.value: < 2.22e-16
## Variable Name(s): age I(age^2) Number(s): 1 2
In matching, we want the matched control and treatment groups to have similar covariate distributions, so the balance tests should show no statistically significant difference between them. Large p-values after matching do not prove that the null hypothesis of equal distributions is true, but they indicate that the data give no evidence of imbalance. This default propensity score model uses all the covariates plus squared terms for age, education, re74, and re75. After matching, the minimum p-value is still < 2.22e-16 (the KS bootstrap p-value for age and I(age^2)), so the covariates are not yet balanced; note that with only nboots=10 bootstrap replications these KS bootstrap p-values are very coarse. The ATT estimated by this model is 1379.6 (AI SE 777.76), with a 95% confidence interval from -144.8159 to 2904.0129, which includes zero.
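The logic of one-to-one propensity score matching and of the "std mean diff" balance measure can be sketched in base R. This is an illustration on simulated data, not a reimplementation of Match() or MatchBalance(): the data, the helper std_mean_diff, and the tie handling (first nearest control only) are assumptions, and the standardized difference uses one common definition (difference in means divided by the treated-group standard deviation, times 100).

```r
# Minimal sketch of 1:1 nearest-neighbour matching on the propensity score,
# with replacement, on simulated data (all names and values are hypothetical).
set.seed(7)
n <- 2000
x <- rnorm(n)                          # single confounder
treat <- rbinom(n, 1, plogis(-x))      # treatment probability depends on x
y <- 2 * treat + 3 * x + rnorm(n)      # true treatment effect is 2

# Step 1: estimate propensity scores with logistic regression
ps <- fitted(glm(treat ~ x, family = binomial))

# Step 2: for each treated unit, pick the control with the closest score
treated_idx <- which(treat == 1)
control_idx <- which(treat == 0)
matched_idx <- sapply(treated_idx, function(i) {
  control_idx[which.min(abs(ps[control_idx] - ps[i]))]
})

# Step 3: ATT estimate = mean outcome gap within matched pairs
att <- mean(y[treated_idx] - y[matched_idx])

# Balance check: standardized mean difference (in percent) before and after
std_mean_diff <- function(x_t, x_c) 100 * (mean(x_t) - mean(x_c)) / sd(x_t)
smd_before <- std_mean_diff(x[treated_idx], x[control_idx])
smd_after  <- std_mean_diff(x[treated_idx], x[matched_idx])

print(c(att = att, smd_before = smd_before, smd_after = smd_after))
```

A standardized difference close to zero after matching is the pattern one hopes to see for every covariate in the balance output; covariates that stay far from zero suggest the propensity score model needs to be respecified.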
Genetic Matching
# genetic matching
attach(nsw_dw)
X = cbind(age, I(age^2), education, I(education^2), I(age*education), black, hispanic, married,
nodegree, re74, I(re74^2), re75, I(re75^2), I(re74*re75))
geneticMatching <- GenMatch(Tr=treat, X=X, estimand="ATE", M=1, pop.size=16,
max.generations=10, wait.generations=1)
## Loading required namespace: rgenoud
##
##
## Sun Mar 25 01:25:41 2018
## Domains:
## 0.000000e+00 <= X1 <= 1.000000e+03
## 0.000000e+00 <= X2 <= 1.000000e+03
## 0.000000e+00 <= X3 <= 1.000000e+03
## 0.000000e+00 <= X4 <= 1.000000e+03
## 0.000000e+00 <= X5 <= 1.000000e+03
## 0.000000e+00 <= X6 <= 1.000000e+03
## 0.000000e+00 <= X7 <= 1.000000e+03
## 0.000000e+00 <= X8 <= 1.000000e+03
## 0.000000e+00 <= X9 <= 1.000000e+03
## 0.000000e+00 <= X10 <= 1.000000e+03
## 0.000000e+00 <= X11 <= 1.000000e+03
## 0.000000e+00 <= X12 <= 1.000000e+03
## 0.000000e+00 <= X13 <= 1.000000e+03
## 0.000000e+00 <= X14 <= 1.000000e+03
##
## Data Type: Floating Point
## Operators (code number, name, population)
## (1) Cloning........................... 1
## (2) Uniform Mutation.................. 2
## (3) Boundary Mutation................. 2
## (4) Non-Uniform Mutation.............. 2
## (5) Polytope Crossover................ 2
## (6) Simple Crossover.................. 2
## (7) Whole Non-Uniform Mutation........ 2
## (8) Heuristic Crossover............... 2
## (9) Local-Minimum Crossover........... 0
##
## SOFT Maximum Number of Generations: 10
## Maximum Nonchanging Generations: 1
## Population size : 16
## Convergence Tolerance: 1.000000e-03
##
## Not Using the BFGS Derivative Based Optimizer on the Best Individual Each Generation.
## Not Checking Gradients before Stopping.
## Using Out of Bounds Individuals.
##
## Maximization Problem.
## GENERATION: 0 (initializing the population)
## Lexical Fit..... 4.626480e-02 7.675792e-02 1.570667e-01 1.570667e-01 2.057512e-01 2.057512e-01 2.357651e-01 4.601406e-01 5.551581e-01 5.638495e-01 5.638495e-01 6.697056e-01 7.814812e-01 8.097642e-01 8.333200e-01 8.838634e-01 9.034362e-01 9.446503e-01 9.446503e-01 9.446503e-01 9.658319e-01 9.658319e-01 9.906462e-01 9.906462e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
## #unique......... 16, #Total UniqueCount: 16
## var 1:
## best............ 1.517827e+02
## mean............ 5.102241e+02
## variance........ 9.031596e+04
## var 2:
## best............ 7.779011e+01
## mean............ 4.195485e+02
## variance........ 7.127398e+04
## var 3:
## best............ 3.563625e+02
## mean............ 2.512126e+02
## variance........ 3.154963e+04
## var 4:
## best............ 5.433395e+02
## mean............ 6.151383e+02
## variance........ 9.154968e+04
## var 5:
## best............ 7.803356e+02
## mean............ 5.479739e+02
## variance........ 8.734486e+04
## var 6:
## best............ 2.541555e+02
## mean............ 4.671601e+02
## variance........ 7.365312e+04
## var 7:
## best............ 9.780296e+02
## mean............ 4.167427e+02
## variance........ 1.068744e+05
## var 8:
## best............ 9.259881e+02
## mean............ 5.966286e+02
## variance........ 1.069254e+05
## var 9:
## best............ 5.359330e+02
## mean............ 4.427521e+02
## variance........ 8.164095e+04
## var 10:
## best............ 2.056684e+02
## mean............ 3.085875e+02
## variance........ 4.174271e+04
## var 11:
## best............ 1.168066e+02
## mean............ 5.045664e+02
## variance........ 7.225396e+04
## var 12:
## best............ 4.666732e+02
## mean............ 4.946183e+02
## variance........ 6.340009e+04
## var 13:
## best............ 9.381770e+02
## mean............ 4.716813e+02
## variance........ 7.910594e+04
## var 14:
## best............ 4.410991e+02
## mean............ 4.084476e+02
## variance........ 7.996912e+04
##
## GENERATION: 1
## Lexical Fit..... 5.050974e-02 8.262683e-02 1.570667e-01 1.570667e-01 2.057512e-01 2.057512e-01 2.663691e-01 5.385514e-01 5.638495e-01 5.638495e-01 5.898220e-01 6.697056e-01 7.859962e-01 8.043389e-01 8.333200e-01 8.838634e-01 9.356593e-01 9.446503e-01 9.446503e-01 9.446503e-01 9.658319e-01 9.658319e-01 9.906462e-01 9.906462e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
## #unique......... 12, #Total UniqueCount: 28
## var 1:
## best............ 1.536530e+02
## mean............ 3.388658e+02
## variance........ 5.960838e+04
## var 2:
## best............ 7.048902e+01
## mean............ 3.225549e+02
## variance........ 3.766512e+04
## var 3:
## best............ 3.583915e+02
## mean............ 3.463198e+02
## variance........ 1.305464e+04
## var 4:
## best............ 5.391866e+02
## mean............ 6.949173e+02
## variance........ 2.157672e+04
## var 5:
## best............ 7.809516e+02
## mean............ 5.841643e+02
## variance........ 4.473077e+04
## var 6:
## best............ 2.526467e+02
## mean............ 2.737395e+02
## variance........ 6.246159e+04
## var 7:
## best............ 9.835427e+02
## mean............ 4.886302e+02
## variance........ 1.213228e+05
## var 8:
## best............ 9.342610e+02
## mean............ 6.995705e+02
## variance........ 5.021819e+04
## var 9:
## best............ 5.313971e+02
## mean............ 5.617117e+02
## variance........ 4.010984e+04
## var 10:
## best............ 2.028814e+02
## mean............ 2.691915e+02
## variance........ 1.045162e+04
## var 11:
## best............ 1.120078e+02
## mean............ 2.487344e+02
## variance........ 3.464837e+04
## var 12:
## best............ 4.608774e+02
## mean............ 4.395572e+02
## variance........ 7.402288e+04
## var 13:
## best............ 9.436336e+02
## mean............ 6.403012e+02
## variance........ 4.058389e+04
## var 14:
## best............ 4.445557e+02
## mean............ 3.849501e+02
## variance........ 1.353245e+04
##
## GENERATION: 2
## Lexical Fit..... 2.057512e-01 2.057512e-01 2.110219e-01 2.411855e-01 3.173114e-01 3.173114e-01 4.735241e-01 4.799121e-01 6.458307e-01 6.576197e-01 6.735115e-01 6.901604e-01 7.820381e-01 8.165199e-01 8.453031e-01 8.453031e-01 8.453031e-01 9.658319e-01 9.658319e-01 9.658319e-01 9.906462e-01 9.906462e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
## #unique......... 13, #Total UniqueCount: 41
## var 1:
## best............ 1.889952e+02
## mean............ 1.591668e+02
## variance........ 1.770360e+02
## var 2:
## best............ 6.434273e+01
## mean............ 8.616392e+01
## variance........ 1.668142e+03
## var 3:
## best............ 3.440148e+02
## mean............ 3.557315e+02
## variance........ 1.762982e+02
## var 4:
## best............ 5.898816e+02
## mean............ 5.393199e+02
## variance........ 3.677210e+02
## var 5:
## best............ 7.828007e+02
## mean............ 7.757473e+02
## variance........ 4.420380e+02
## var 6:
## best............ 2.768641e+02
## mean............ 2.567741e+02
## variance........ 1.550717e+02
## var 7:
## best............ 9.859199e+02
## mean............ 9.453758e+02
## variance........ 1.666109e+04
## var 8:
## best............ 9.359220e+02
## mean............ 9.235027e+02
## variance........ 1.035936e+03
## var 9:
## best............ 6.133072e+02
## mean............ 5.403610e+02
## variance........ 4.901364e+02
## var 10:
## best............ 1.893726e+02
## mean............ 2.425050e+02
## variance........ 2.292123e+04
## var 11:
## best............ 1.344795e+02
## mean............ 1.214234e+02
## variance........ 7.421132e+02
## var 12:
## best............ 3.916647e+02
## mean............ 4.602420e+02
## variance........ 3.712739e+02
## var 13:
## best............ 9.173538e+02
## mean............ 9.365062e+02
## variance........ 2.404518e+02
## var 14:
## best............ 3.615960e+02
## mean............ 4.178555e+02
## variance........ 4.743726e+03
##
## GENERATION: 3
## Lexical Fit..... 2.057512e-01 2.057512e-01 2.705633e-01 2.796699e-01 3.015794e-01 3.173114e-01 3.173114e-01 3.173114e-01 3.173114e-01 4.939436e-01 5.630807e-01 7.022789e-01 7.801215e-01 8.453031e-01 8.788599e-01 8.797346e-01 8.969526e-01 9.172177e-01 9.172177e-01 9.172177e-01 9.446503e-01 9.446503e-01 9.658319e-01 9.658319e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
## #unique......... 13, #Total UniqueCount: 54
## var 1:
## best............ 1.798465e+02
## mean............ 2.310484e+02
## variance........ 2.122945e+04
## var 2:
## best............ 6.755342e+01
## mean............ 8.431936e+01
## variance........ 2.179058e+03
## var 3:
## best............ 3.493640e+02
## mean............ 3.528277e+02
## variance........ 3.469362e+02
## var 4:
## best............ 5.706795e+02
## mean............ 5.564649e+02
## variance........ 2.046527e+03
## var 5:
## best............ 7.820135e+02
## mean............ 7.670474e+02
## variance........ 3.416214e+03
## var 6:
## best............ 2.726131e+02
## mean............ 2.837747e+02
## variance........ 2.320737e+03
## var 7:
## best............ 9.247469e+02
## mean............ 9.052096e+02
## variance........ 3.064334e+04
## var 8:
## best............ 9.343419e+02
## mean............ 8.769643e+02
## variance........ 4.562244e+04
## var 9:
## best............ 5.820357e+02
## mean............ 5.807763e+02
## variance........ 2.097102e+03
## var 10:
## best............ 1.949297e+02
## mean............ 1.981626e+02
## variance........ 1.226096e+02
## var 11:
## best............ 1.263023e+02
## mean............ 1.384642e+02
## variance........ 2.622175e+03
## var 12:
## best............ 4.191775e+02
## mean............ 4.216002e+02
## variance........ 2.368424e+03
## var 13:
## best............ 9.269344e+02
## mean............ 9.259202e+02
## variance........ 1.526832e+02
## var 14:
## best............ 3.153829e+02
## mean............ 3.311244e+02
## variance........ 1.009938e+04
##
## GENERATION: 4
## Lexical Fit..... 2.057512e-01 2.057512e-01 2.705633e-01 2.796699e-01 3.015794e-01 3.173114e-01 3.173114e-01 3.173114e-01 3.173114e-01 4.939436e-01 5.630807e-01 7.022789e-01 7.801215e-01 8.453031e-01 8.788599e-01 8.797346e-01 8.969526e-01 9.172177e-01 9.172177e-01 9.172177e-01 9.446503e-01 9.446503e-01 9.658319e-01 9.658319e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
## #unique......... 13, #Total UniqueCount: 67
## var 1:
## best............ 1.798465e+02
## mean............ 1.755787e+02
## variance........ 8.514986e+01
## var 2:
## best............ 6.755342e+01
## mean............ 8.824120e+01
## variance........ 5.359183e+03
## var 3:
## best............ 3.493640e+02
## mean............ 3.379118e+02
## variance........ 5.499746e+03
## var 4:
## best............ 5.706795e+02
## mean............ 5.510933e+02
## variance........ 2.158019e+03
## var 5:
## best............ 7.820135e+02
## mean............ 7.635109e+02
## variance........ 2.396517e+03
## var 6:
## best............ 2.726131e+02
## mean............ 2.842887e+02
## variance........ 1.784583e+03
## var 7:
## best............ 9.247469e+02
## mean............ 8.412525e+02
## variance........ 3.471076e+04
## var 8:
## best............ 9.343419e+02
## mean............ 9.312602e+02
## variance........ 3.092140e+02
## var 9:
## best............ 5.820357e+02
## mean............ 5.789174e+02
## variance........ 1.357159e+02
## var 10:
## best............ 1.949297e+02
## mean............ 2.400511e+02
## variance........ 1.359206e+04
## var 11:
## best............ 1.263023e+02
## mean............ 1.238253e+02
## variance........ 1.905438e+02
## var 12:
## best............ 4.191775e+02
## mean............ 4.012987e+02
## variance........ 5.559249e+03
## var 13:
## best............ 9.269344e+02
## mean............ 9.223748e+02
## variance........ 1.576272e+02
## var 14:
## best............ 3.153829e+02
## mean............ 3.220871e+02
## variance........ 1.884053e+02
##
## GENERATION: 5
## Lexical Fit..... 2.057512e-01 2.057512e-01 2.705633e-01 2.796699e-01 3.015794e-01 3.173114e-01 3.173114e-01 3.173114e-01 3.173114e-01 4.939436e-01 5.630807e-01 7.022789e-01 7.801215e-01 8.453031e-01 8.788599e-01 8.797346e-01 8.969526e-01 9.172177e-01 9.172177e-01 9.172177e-01 9.446503e-01 9.446503e-01 9.658319e-01 9.658319e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
## #unique......... 13, #Total UniqueCount: 80
## var 1:
## best............ 1.798465e+02
## mean............ 2.201109e+02
## variance........ 1.766000e+04
## var 2:
## best............ 6.755342e+01
## mean............ 6.750422e+01
## variance........ 2.004267e+00
## var 3:
## best............ 3.493640e+02
## mean............ 3.388136e+02
## variance........ 1.623358e+03
## var 4:
## best............ 5.706795e+02
## mean............ 5.717444e+02
## variance........ 1.681180e+01
## var 5:
## best............ 7.820135e+02
## mean............ 7.919621e+02
## variance........ 1.236508e+03
## var 6:
## best............ 2.726131e+02
## mean............ 2.722081e+02
## variance........ 1.434388e+00
## var 7:
## best............ 9.247469e+02
## mean............ 9.236964e+02
## variance........ 3.596068e+03
## var 8:
## best............ 9.343419e+02
## mean............ 9.344642e+02
## variance........ 8.678065e-01
## var 9:
## best............ 5.820357e+02
## mean............ 5.819482e+02
## variance........ 1.515294e+02
## var 10:
## best............ 1.949297e+02
## mean............ 1.952928e+02
## variance........ 1.373523e+01
## var 11:
## best............ 1.263023e+02
## mean............ 1.273412e+02
## variance........ 1.773052e+01
## var 12:
## best............ 4.191775e+02
## mean............ 4.254978e+02
## variance........ 1.203550e+03
## var 13:
## best............ 9.269344e+02
## mean............ 9.258438e+02
## variance........ 4.180780e+00
## var 14:
## best............ 3.153829e+02
## mean............ 3.153620e+02
## variance........ 9.430142e-01
##
## 'wait.generations' limit reached.
## No significant improvement in 1 generations.
##
## Solution Lexical Fitness Value:
## 2.057512e-01 2.057512e-01 2.705633e-01 2.796699e-01 3.015794e-01 3.173114e-01 3.173114e-01 3.173114e-01 3.173114e-01 4.939436e-01 5.630807e-01 7.022789e-01 7.801215e-01 8.453031e-01 8.788599e-01 8.797346e-01 8.969526e-01 9.172177e-01 9.172177e-01 9.172177e-01 9.446503e-01 9.446503e-01 9.658319e-01 9.658319e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
##
## Parameters at the Solution:
##
## X[ 1] : 1.798465e+02
## X[ 2] : 6.755342e+01
## X[ 3] : 3.493640e+02
## X[ 4] : 5.706795e+02
## X[ 5] : 7.820135e+02
## X[ 6] : 2.726131e+02
## X[ 7] : 9.247469e+02
## X[ 8] : 9.343419e+02
## X[ 9] : 5.820357e+02
## X[10] : 1.949297e+02
## X[11] : 1.263023e+02
## X[12] : 4.191775e+02
## X[13] : 9.269344e+02
## X[14] : 3.153829e+02
##
## Solution Found Generation 3
## Number of Generations Run 5
##
## Sun Mar 25 01:25:42 2018
## Total run time : 0 hours 0 minutes and 1 seconds
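# Note: estimand="ATE" is used here, so despite the object name this is an ATE estimate, not an ATT
geneticMatching.ATT <- Match(Y=re78, Tr=treat, X=X, estimand="ATE", Weight.matrix=geneticMatching)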
summary(geneticMatching.ATT)
##
## Estimate... 1870.7
## AI SE...... 736.51
## T-stat..... 2.5399
## p.val...... 0.011088
##
## Original number of observations.............. 445
## Original number of treated obs............... 185
## Matched number of observations............... 445
## Matched number of observations (unweighted). 585
detach(nsw_dw)
# 95% confidence interval (normal approximation: estimate +/- 1.96 * AI SE)
with(geneticMatching.ATT, c(est - 1.96*se, est + 1.96*se))
## [1] 427.1013 3314.2132
# balance test
geneticMatching.balance <- MatchBalance(treat ~ age + I(age^2) + education + I(education^2) + black +
hispanic + married + nodegree + re74 + I(re74^2) + re75 + I(re75^2),
data=nsw_dw, match.out=geneticMatching.ATT, nboots=10)
##
## ***** (V1) age *****
## Before Matching After Matching
## mean treatment........ 25.816 25.016
## mean control.......... 25.054 25.063
## std mean diff......... 10.655 -0.71156
##
## mean raw eQQ diff..... 0.94054 0.25983
## med raw eQQ diff..... 1 0
## max raw eQQ diff..... 7 8
##
## mean eCDF diff........ 0.025364 0.0068879
## med eCDF diff........ 0.022193 0.0051282
## max eCDF diff........ 0.065177 0.032479
##
## var ratio (Tr/Co)..... 1.0278 0.91874
## T-test p-value........ 0.26594 0.78012
## KS Bootstrap p-value.. 0.4 0.8
## KS Naive p-value...... 0.7481 0.91722
## KS Statistic.......... 0.065177 0.032479
##
##
## ***** (V2) I(age^2) *****
## Before Matching After Matching
## mean treatment........ 717.39 669.67
## mean control.......... 677.32 675.92
## std mean diff......... 9.2937 -1.6151
##
## mean raw eQQ diff..... 56.076 17.012
## med raw eQQ diff..... 43 0
## max raw eQQ diff..... 721 800
##
## mean eCDF diff........ 0.025364 0.0068879
## med eCDF diff........ 0.022193 0.0051282
## max eCDF diff........ 0.065177 0.032479
##
## var ratio (Tr/Co)..... 1.0115 0.86003
## T-test p-value........ 0.33337 0.56308
## KS Bootstrap p-value.. 0.4 0.8
## KS Naive p-value...... 0.7481 0.91722
## KS Statistic.......... 0.065177 0.032479
##
##
## ***** (V3) education *****
## Before Matching After Matching
## mean treatment........ 10.346 10.204
## mean control.......... 10.088 10.2
## std mean diff......... 12.806 0.26425
##
## mean raw eQQ diff..... 0.40541 0.075214
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 2 2
##
## mean eCDF diff........ 0.028698 0.0053724
## med eCDF diff........ 0.012682 0.0042735
## max eCDF diff........ 0.12651 0.013675
##
## var ratio (Tr/Co)..... 1.5513 1.0432
## T-test p-value........ 0.15017 0.87886
## KS Bootstrap p-value.. < 2.22e-16 1
## KS Naive p-value...... 0.062873 1
## KS Statistic.......... 0.12651 0.013675
##
##
## ***** (V4) I(education^2) *****
## Before Matching After Matching
## mean treatment........ 111.06 107.02
## mean control.......... 104.37 106.81
## std mean diff......... 17.012 0.63649
##
## mean raw eQQ diff..... 8.7189 1.4188
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 60 60
##
## mean eCDF diff........ 0.028698 0.0053724
## med eCDF diff........ 0.012682 0.0042735
## max eCDF diff........ 0.12651 0.013675
##
## var ratio (Tr/Co)..... 1.6625 1.0984
## T-test p-value........ 0.053676 0.70228
## KS Bootstrap p-value.. < 2.22e-16 1
## KS Naive p-value...... 0.062873 1
## KS Statistic.......... 0.12651 0.013675
##
##
## ***** (V5) black *****
## Before Matching After Matching
## mean treatment........ 0.84324 0.83596
## mean control.......... 0.82692 0.84494
## std mean diff......... 4.4767 -2.4246
##
## mean raw eQQ diff..... 0.016216 0.0068376
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 1 1
##
## mean eCDF diff........ 0.0081601 0.0034188
## med eCDF diff........ 0.0081601 0.0034188
## max eCDF diff........ 0.01632 0.0068376
##
## var ratio (Tr/Co)..... 0.92503 1.0467
## T-test p-value........ 0.64736 0.20575
##
##
## ***** (V6) hispanic *****
## Before Matching After Matching
## mean treatment........ 0.059459 0.08764
## mean control.......... 0.10769 0.08764
## std mean diff......... -20.341 0
##
## mean raw eQQ diff..... 0.048649 0
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 1 0
##
## mean eCDF diff........ 0.024116 0
## med eCDF diff........ 0.024116 0
## max eCDF diff........ 0.048233 0
##
## var ratio (Tr/Co)..... 0.58288 1
## T-test p-value........ 0.064043 1
##
##
## ***** (V7) married *****
## Before Matching After Matching
## mean treatment........ 0.18919 0.17079
## mean control.......... 0.15385 0.16854
## std mean diff......... 8.9995 0.59647
##
## mean raw eQQ diff..... 0.037838 0.0017094
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 1 1
##
## mean eCDF diff........ 0.017672 0.0008547
## med eCDF diff........ 0.017672 0.0008547
## max eCDF diff........ 0.035343 0.0017094
##
## var ratio (Tr/Co)..... 1.1802 1.0106
## T-test p-value........ 0.33425 0.31731
##
##
## ***** (V8) nodegree *****
## Before Matching After Matching
## mean treatment........ 0.70811 0.78202
## mean control.......... 0.83462 0.78427
## std mean diff......... -27.751 -0.54367
##
## mean raw eQQ diff..... 0.12432 0.0017094
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 1 1
##
## mean eCDF diff........ 0.063254 0.0008547
## med eCDF diff........ 0.063254 0.0008547
## max eCDF diff........ 0.12651 0.0017094
##
## var ratio (Tr/Co)..... 1.4998 1.0075
## T-test p-value........ 0.0020368 0.31731
##
##
## ***** (V9) re74 *****
## Before Matching After Matching
## mean treatment........ 2095.6 1851.6
## mean control.......... 2107 1867.5
## std mean diff......... -0.23437 -0.3388
##
## mean raw eQQ diff..... 487.98 241.36
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 8413 7870.3
##
## mean eCDF diff........ 0.019501 0.0089929
## med eCDF diff........ 0.016112 0.008547
## max eCDF diff........ 0.047089 0.02906
##
## var ratio (Tr/Co)..... 0.7381 0.83463
## T-test p-value........ 0.98186 0.89695
## KS Bootstrap p-value.. 0.8 0.4
## KS Naive p-value...... 0.97023 0.96583
## KS Statistic.......... 0.047089 0.02906
##
##
## ***** (V10) I(re74^2) *****
## Before Matching After Matching
## mean treatment........ 28141412 25209598
## mean control.......... 36667400 29583990
## std mean diff......... -7.4722 -4.2548
##
## mean raw eQQ diff..... 13311768 6319599
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 365146183 470088278
##
## mean eCDF diff........ 0.019501 0.0089929
## med eCDF diff........ 0.016112 0.008547
## max eCDF diff........ 0.047089 0.02906
##
## var ratio (Tr/Co)..... 0.50382 0.58969
## T-test p-value........ 0.51322 0.27056
## KS Bootstrap p-value.. 0.8 0.4
## KS Naive p-value...... 0.97023 0.96583
## KS Statistic.......... 0.047089 0.02906
##
##
## ***** (V11) re75 *****
## Before Matching After Matching
## mean treatment........ 1532.1 1268.4
## mean control.......... 1266.9 1199
## std mean diff......... 8.2363 2.3033
##
## mean raw eQQ diff..... 367.61 102.6
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 2110.3 2510.5
##
## mean eCDF diff........ 0.051061 0.0093411
## med eCDF diff........ 0.064657 0.0068376
## max eCDF diff........ 0.10748 0.030769
##
## var ratio (Tr/Co)..... 1.0763 1.0502
## T-test p-value........ 0.38527 0.30158
## KS Bootstrap p-value.. 0.1 0.5
## KS Naive p-value...... 0.16449 0.94465
## KS Statistic.......... 0.10748 0.030769
##
##
## ***** (V12) I(re75^2) *****
## Before Matching After Matching
## mean treatment........ 12654750 10685430
## mean control.......... 11196524 10079958
## std mean diff......... 2.6024 1.172
##
## mean raw eQQ diff..... 2840847 1383849
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 101660120 101660120
##
## mean eCDF diff........ 0.051061 0.0093411
## med eCDF diff........ 0.064657 0.0068376
## max eCDF diff........ 0.10748 0.030769
##
## var ratio (Tr/Co)..... 1.4609 1.2575
## T-test p-value........ 0.77178 0.49394
## KS Bootstrap p-value.. 0.1 0.5
## KS Naive p-value...... 0.16449 0.94465
## KS Statistic.......... 0.10748 0.030769
##
##
## Before Matching Minimum p.value: < 2.22e-16
## Variable Name(s): education I(education^2) Number(s): 3 4
##
## After Matching Minimum p.value: 0.20575
## Variable Name(s): black Number(s): 5
Genetic matching is a better way to obtain balance in the covariates. It uses a genetic algorithm to raise the lowest p-value (which belongs to the least balanced covariate) and so produce an overall balanced matching model. In this run, the minimum p-value rises from less than 2.22e-16 before matching (education and its square) to 0.20575 after matching (black). With its default settings, genetic matching in R can on average raise the minimum p-value to about 0.1, which is high enough for us to accept that the matching balanced the covariates. This genetic matching model estimates an ATT of 1683.3 with a 95% confidence interval between 193.8664 and 3172.6476.
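The balance criterion described above can be illustrated with a small sketch. It is not GenMatch's internal code: it uses simulated data, unpaired tests, and a hypothetical helper named balance_pvalues. It only shows the idea of collecting one t-test and one KS-test p-value per covariate and sorting them so that the least balanced covariate comes first, which is what the "Lexical Fit" lines in the GenMatch output appear to report.

```r
# Toy illustration of a "lexical" balance criterion.
# All data are simulated; nothing here comes from the Lalonde sample.
set.seed(1)
n <- 200
treat <- rep(c(1, 0), each = n / 2)
X <- cbind(
  age       = rnorm(n, mean = 25 + 2 * treat, sd = 7),  # shifted, so imbalanced
  education = rnorm(n, mean = 10, sd = 2)               # same distribution in both groups
)

# Hypothetical helper: two p-values per covariate, sorted from worst to best.
balance_pvalues <- function(X, treat) {
  pvals <- unlist(lapply(seq_len(ncol(X)), function(j) {
    x_t <- X[treat == 1, j]
    x_c <- X[treat == 0, j]
    c(t.test(x_t, x_c)$p.value, ks.test(x_t, x_c)$p.value)
  }))
  sort(pvals)  # smallest (least balanced) first
}

fit <- balance_pvalues(X, treat)
length(fit)  # two p-values per covariate
fit[1]       # the minimum p-value, which the search tries to raise first
```

A genetic search would repeat this evaluation for many candidate covariate weightings and keep the weighting whose sorted p-values are best, starting from the smallest one.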
Genetic Matching Improvement 1
# bind the cps_control observations to the treated observations
cps_nsw <- rbind(cps_control, nsw_dw.treated)
# genetic matching on the combined dataset
attach(cps_nsw)
X = cbind(age, I(age^2), education, I(education^2), I(age*education), black, hispanic, married,
nodegree, re74, I(re74^2), re75, I(re75^2), I(re74*re75))
cps_nsw.geneticMatching <- GenMatch(Tr=treat, X=X, estimand="ATE", M=1, pop.size=16,
ties=FALSE, # ties is set to false to speed up the code
max.generations=10, wait.generations=1)
##
##
## Sun Mar 25 01:25:47 2018
## Domains:
## 0.000000e+00 <= X1 <= 1.000000e+03
## 0.000000e+00 <= X2 <= 1.000000e+03
## 0.000000e+00 <= X3 <= 1.000000e+03
## 0.000000e+00 <= X4 <= 1.000000e+03
## 0.000000e+00 <= X5 <= 1.000000e+03
## 0.000000e+00 <= X6 <= 1.000000e+03
## 0.000000e+00 <= X7 <= 1.000000e+03
## 0.000000e+00 <= X8 <= 1.000000e+03
## 0.000000e+00 <= X9 <= 1.000000e+03
## 0.000000e+00 <= X10 <= 1.000000e+03
## 0.000000e+00 <= X11 <= 1.000000e+03
## 0.000000e+00 <= X12 <= 1.000000e+03
## 0.000000e+00 <= X13 <= 1.000000e+03
## 0.000000e+00 <= X14 <= 1.000000e+03
##
## Data Type: Floating Point
## Operators (code number, name, population)
## (1) Cloning........................... 1
## (2) Uniform Mutation.................. 2
## (3) Boundary Mutation................. 2
## (4) Non-Uniform Mutation.............. 2
## (5) Polytope Crossover................ 2
## (6) Simple Crossover.................. 2
## (7) Whole Non-Uniform Mutation........ 2
## (8) Heuristic Crossover............... 2
## (9) Local-Minimum Crossover........... 0
##
## SOFT Maximum Number of Generations: 10
## Maximum Nonchanging Generations: 1
## Population size : 16
## Convergence Tolerance: 1.000000e-03
##
## Not Using the BFGS Derivative Based Optimizer on the Best Individual Each Generation.
## Not Checking Gradients before Stopping.
## Using Out of Bounds Individuals.
##
## Maximization Problem.
## GENERATION: 0 (initializing the population)
## Lexical Fit..... 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 4.113154e-12 4.113154e-12 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
## #unique......... 16, #Total UniqueCount: 16
## var 1:
## best............ 9.275107e-01
## mean............ 4.280594e+02
## variance........ 1.002552e+05
## var 2:
## best............ 5.540849e+02
## mean............ 5.465021e+02
## variance........ 1.121796e+05
## var 3:
## best............ 6.201271e+02
## mean............ 5.386414e+02
## variance........ 7.914807e+04
## var 4:
## best............ 1.317797e+02
## mean............ 5.066893e+02
## variance........ 1.325341e+05
## var 5:
## best............ 6.613368e+02
## mean............ 4.586361e+02
## variance........ 7.496608e+04
## var 6:
## best............ 9.685073e+02
## mean............ 4.975909e+02
## variance........ 1.143074e+05
## var 7:
## best............ 3.437448e+02
## mean............ 5.151082e+02
## variance........ 6.896991e+04
## var 8:
## best............ 3.401326e+01
## mean............ 4.673805e+02
## variance........ 9.919997e+04
## var 9:
## best............ 7.443439e+02
## mean............ 6.156917e+02
## variance........ 7.603490e+04
## var 10:
## best............ 1.510109e+02
## mean............ 4.672184e+02
## variance........ 7.776797e+04
## var 11:
## best............ 4.210445e+02
## mean............ 5.393222e+02
## variance........ 9.230107e+04
## var 12:
## best............ 6.263855e+02
## mean............ 5.011024e+02
## variance........ 1.111816e+05
## var 13:
## best............ 1.920704e+02
## mean............ 3.961493e+02
## variance........ 8.682024e+04
## var 14:
## best............ 5.752659e+01
## mean............ 4.998127e+02
## variance........ 7.721434e+04
##
## GENERATION: 1
## Lexical Fit..... 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 2.011820e-07 2.011820e-07 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
## #unique......... 13, #Total UniqueCount: 29
## var 1:
## best............ 9.275107e-01
## mean............ 1.706820e+02
## variance........ 7.686751e+04
## var 2:
## best............ 5.540849e+02
## mean............ 6.115401e+02
## variance........ 1.379646e+04
## var 3:
## best............ 5.570844e+02
## mean............ 5.421841e+02
## variance........ 1.883358e+04
## var 4:
## best............ 1.722150e+02
## mean............ 1.928798e+02
## variance........ 7.377514e+04
## var 5:
## best............ 6.613368e+02
## mean............ 5.895555e+02
## variance........ 2.619710e+04
## var 6:
## best............ 9.685073e+02
## mean............ 8.261733e+02
## variance........ 4.530463e+04
## var 7:
## best............ 4.300470e+02
## mean............ 4.189584e+02
## variance........ 4.223083e+04
## var 8:
## best............ 3.401326e+01
## mean............ 2.082935e+02
## variance........ 7.235118e+04
## var 9:
## best............ 7.816016e+02
## mean............ 8.016537e+02
## variance........ 1.296299e+04
## var 10:
## best............ 1.510109e+02
## mean............ 2.989919e+02
## variance........ 4.935110e+04
## var 11:
## best............ 4.010951e+02
## mean............ 5.330253e+02
## variance........ 1.690894e+04
## var 12:
## best............ 6.263855e+02
## mean............ 6.774050e+02
## variance........ 6.917801e+03
## var 13:
## best............ 1.679291e+02
## mean............ 1.772045e+02
## variance........ 5.299291e+03
## var 14:
## best............ 5.752659e+01
## mean............ 1.925354e+02
## variance........ 3.038729e+04
##
## GENERATION: 2
## Lexical Fit..... 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 2.198094e-05 2.198094e-05 5.637069e-01 5.637069e-01 1.000000e+00 1.000000e+00
## #unique......... 13, #Total UniqueCount: 42
## var 1:
## best............ 9.275107e-01
## mean............ 6.069794e+00
## variance........ 3.195081e+02
## var 2:
## best............ 5.540849e+02
## mean............ 5.725737e+02
## variance........ 6.576686e+03
## var 3:
## best............ 8.920537e+01
## mean............ 3.915143e+02
## variance........ 5.200025e+04
## var 4:
## best............ 1.317797e+02
## mean............ 1.709342e+02
## variance........ 9.136457e+03
## var 5:
## best............ 4.693917e+02
## mean............ 6.485712e+02
## variance........ 2.159074e+03
## var 6:
## best............ 9.685073e+02
## mean............ 9.685594e+02
## variance........ 2.002155e-02
## var 7:
## best............ 3.437448e+02
## mean............ 3.874747e+02
## variance........ 1.991199e+03
## var 8:
## best............ 3.401326e+01
## mean............ 3.756305e+01
## variance........ 2.144090e+02
## var 9:
## best............ 7.443439e+02
## mean............ 7.577680e+02
## variance........ 5.924252e+02
## var 10:
## best............ 1.913898e+02
## mean............ 1.744536e+02
## variance........ 6.558745e+03
## var 11:
## best............ 4.268644e+02
## mean............ 4.149565e+02
## variance........ 2.209661e+02
## var 12:
## best............ 8.620674e+02
## mean............ 6.428866e+02
## variance........ 3.249749e+03
## var 13:
## best............ 1.920704e+02
## mean............ 1.766784e+02
## variance........ 4.042992e+02
## var 14:
## best............ 3.301506e+02
## mean............ 8.755586e+01
## variance........ 6.097591e+03
##
## 'wait.generations' limit reached.
## No significant improvement in 1 generations.
##
## Solution Lexical Fitness Value:
## 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 2.198094e-05 2.198094e-05 5.637069e-01 5.637069e-01 1.000000e+00 1.000000e+00
##
## Parameters at the Solution:
##
## X[ 1] : 9.275107e-01
## X[ 2] : 5.540849e+02
## X[ 3] : 8.920537e+01
## X[ 4] : 1.317797e+02
## X[ 5] : 4.693917e+02
## X[ 6] : 9.685073e+02
## X[ 7] : 3.437448e+02
## X[ 8] : 3.401326e+01
## X[ 9] : 7.443439e+02
## X[10] : 1.913898e+02
## X[11] : 4.268644e+02
## X[12] : 8.620674e+02
## X[13] : 1.920704e+02
## X[14] : 3.301506e+02
##
## Solution Found Generation 1
## Number of Generations Run 2
##
## Sun Mar 25 01:28:31 2018
## Total run time : 0 hours 2 minutes and 44 seconds
cps_nsw.geneticMatching.ATT <- Match(Y=re78, Tr=treat, X=X, estimand="ATE",
Weight.matrix=cps_nsw.geneticMatching)
summary(cps_nsw.geneticMatching.ATT)
##
## Estimate... -5380.8
## AI SE...... 2883.7
## T-stat..... -1.8659
## p.val...... 0.062049
##
## Original number of observations.............. 16177
## Original number of treated obs............... 185
## Matched number of observations............... 16177
## Matched number of observations (unweighted). 16290
detach(cps_nsw)
# 95% confidence interval
attach(cps_nsw.geneticMatching.ATT)
## The following object is masked from package:base:
##
## version
c(est-1.96*se, est+1.96*se)
## [1] -11032.7568 271.2242
detach(cps_nsw.geneticMatching.ATT)
# balance test
cps_nsw.geneticMatching.balance <- MatchBalance(treat ~ age + I(age^2) + education + I(education^2) +
black + hispanic + married + nodegree + re74 +
I(re74^2) + re75 + I(re75^2), data=cps_nsw,
match.out=cps_nsw.geneticMatching.ATT, nboots=10)
##
## ***** (V1) age *****
## Before Matching After Matching
## mean treatment........ 25.816 26.894
## mean control.......... 33.225 33.138
## std mean diff......... -103.55 -78.487
##
## mean raw eQQ diff..... 7.4865 6.4125
## med raw eQQ diff..... 6 7
## max raw eQQ diff..... 16 14
##
## mean eCDF diff........ 0.18628 0.16031
## med eCDF diff........ 0.19692 0.16249
## max eCDF diff........ 0.34274 0.38643
##
## var ratio (Tr/Co)..... 0.41964 0.51926
## T-test p-value........ < 2.22e-16 < 2.22e-16
## KS Bootstrap p-value.. < 2.22e-16 < 2.22e-16
## KS Naive p-value...... < 2.22e-16 < 2.22e-16
## KS Statistic.......... 0.34274 0.38643
##
##
## ***** (V2) I(age^2) *****
## Before Matching After Matching
## mean treatment........ 717.39 786.57
## mean control.......... 1225.9 1220
## std mean diff......... -117.92 -89.722
##
## mean raw eQQ diff..... 513.91 437.94
## med raw eQQ diff..... 336 477
## max raw eQQ diff..... 1312 1344
##
## mean eCDF diff........ 0.18628 0.16031
## med eCDF diff........ 0.19692 0.16249
## max eCDF diff........ 0.34274 0.38643
##
## var ratio (Tr/Co)..... 0.302 0.38011
## T-test p-value........ < 2.22e-16 < 2.22e-16
## KS Bootstrap p-value.. < 2.22e-16 < 2.22e-16
## KS Naive p-value...... < 2.22e-16 < 2.22e-16
## KS Statistic.......... 0.34274 0.38643
##
##
## ***** (V3) education *****
## Before Matching After Matching
## mean treatment........ 10.346 11.521
## mean control.......... 12.028 12.008
## std mean diff......... -83.633 -30.046
##
## mean raw eQQ diff..... 1.7351 0.92492
## med raw eQQ diff..... 2 0
## max raw eQQ diff..... 4 5
##
## mean eCDF diff........ 0.090791 0.04868
## med eCDF diff........ 0.037581 0.026826
## max eCDF diff........ 0.41227 0.16839
##
## var ratio (Tr/Co)..... 0.49052 0.31951
## T-test p-value........ < 2.22e-16 < 2.22e-16
## KS Bootstrap p-value.. < 2.22e-16 < 2.22e-16
## KS Naive p-value...... < 2.22e-16 < 2.22e-16
## KS Statistic.......... 0.41227 0.16839
##
##
## ***** (V4) I(education^2) *****
## Before Matching After Matching
## mean treatment........ 111.06 135.36
## mean control.......... 152.9 152.42
## std mean diff......... -106.46 -47.582
##
## mean raw eQQ diff..... 42.168 23.029
## med raw eQQ diff..... 40 0
## max raw eQQ diff..... 128 128
##
## mean eCDF diff........ 0.090791 0.04868
## med eCDF diff........ 0.037581 0.026826
## max eCDF diff........ 0.41227 0.16839
##
## var ratio (Tr/Co)..... 0.34243 0.28564
## T-test p-value........ < 2.22e-16 < 2.22e-16
## KS Bootstrap p-value.. < 2.22e-16 < 2.22e-16
## KS Naive p-value...... < 2.22e-16 < 2.22e-16
## KS Statistic.......... 0.41227 0.16839
##
##
## ***** (V5) black *****
## Before Matching After Matching
## mean treatment........ 0.84324 0.082339
## mean control.......... 0.073537 0.082339
## std mean diff......... 211.13 0
##
## mean raw eQQ diff..... 0.76757 0
## med raw eQQ diff..... 1 0
## max raw eQQ diff..... 1 0
##
## mean eCDF diff........ 0.38485 0
## med eCDF diff........ 0.38485 0
## max eCDF diff........ 0.76971 0
##
## var ratio (Tr/Co)..... 1.9506 1
## T-test p-value........ < 2.22e-16 1
##
##
## ***** (V6) hispanic *****
## Before Matching After Matching
## mean treatment........ 0.059459 0.07078
## mean control.......... 0.072036 0.071892
## std mean diff......... -5.3038 -0.43386
##
## mean raw eQQ diff..... 0.016216 0.001105
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 1 1
##
## mean eCDF diff........ 0.0062883 0.00055249
## med eCDF diff........ 0.0062883 0.00055249
## max eCDF diff........ 0.012577 0.001105
##
## var ratio (Tr/Co)..... 0.84109 0.9857
## T-test p-value........ 0.47458 2.1981e-05
##
##
## ***** (V7) married *****
## Before Matching After Matching
## mean treatment........ 0.18919 0.13501
## mean control.......... 0.71173 0.706
## std mean diff......... -133.06 -167.08
##
## mean raw eQQ diff..... 0.51892 0.56703
## med raw eQQ diff..... 1 1
## max raw eQQ diff..... 1 1
##
## mean eCDF diff........ 0.26127 0.28352
## med eCDF diff........ 0.26127 0.28352
## max eCDF diff........ 0.52254 0.56703
##
## var ratio (Tr/Co)..... 0.75167 0.56262
## T-test p-value........ < 2.22e-16 < 2.22e-16
##
##
## ***** (V8) nodegree *****
## Before Matching After Matching
## mean treatment........ 0.70811 0.30049
## mean control.......... 0.29584 0.30055
## std mean diff......... 90.437 -0.013483
##
## mean raw eQQ diff..... 0.41081 6.1387e-05
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 1 1
##
## mean eCDF diff........ 0.20614 3.0694e-05
## med eCDF diff........ 0.20614 3.0694e-05
## max eCDF diff........ 0.41227 6.1387e-05
##
## var ratio (Tr/Co)..... 0.99753 0.99988
## T-test p-value........ < 2.22e-16 0.56371
##
##
## ***** (V9) re74 *****
## Before Matching After Matching
## mean treatment........ 2095.6 4057.6
## mean control.......... 14017 13884
## std mean diff......... -243.96 -222.72
##
## mean raw eQQ diff..... 12014 9761.2
## med raw eQQ diff..... 13276 10815
## max raw eQQ diff..... 23256 19102
##
## mean eCDF diff........ 0.45911 0.37406
## med eCDF diff........ 0.50015 0.33892
## max eCDF diff........ 0.60309 0.58656
##
## var ratio (Tr/Co)..... 0.26074 0.21101
## T-test p-value........ < 2.22e-16 < 2.22e-16
## KS Bootstrap p-value.. < 2.22e-16 < 2.22e-16
## KS Naive p-value...... < 2.22e-16 < 2.22e-16
## KS Statistic.......... 0.60309 0.58656
##
##
## ***** (V10) I(re74^2) *****
## Before Matching After Matching
## mean treatment........ 28141412 35929801
## mean control.......... 288045960 285021708
## std mean diff......... -227.78 -381.53
##
## mean raw eQQ diff..... 266205831 247439915
## med raw eQQ diff..... 221783269 200270349
## max raw eQQ diff..... 658838222 623162091
##
## mean eCDF diff........ 0.45911 0.37406
## med eCDF diff........ 0.50015 0.33892
## max eCDF diff........ 0.60309 0.58656
##
## var ratio (Tr/Co)..... 0.19219 0.06285
## T-test p-value........ < 2.22e-16 < 2.22e-16
## KS Bootstrap p-value.. < 2.22e-16 < 2.22e-16
## KS Naive p-value...... < 2.22e-16 < 2.22e-16
## KS Statistic.......... 0.60309 0.58656
##
##
## ***** (V11) re75 *****
## Before Matching After Matching
## mean treatment........ 1532.1 4609.4
## mean control.......... 13651 13513
## std mean diff......... -376.45 -207.66
##
## mean raw eQQ diff..... 12112 8843.2
## med raw eQQ diff..... 13837 9499.6
## max raw eQQ diff..... 22438 16788
##
## mean eCDF diff........ 0.4751 0.34805
## med eCDF diff........ 0.51248 0.29702
## max eCDF diff........ 0.6509 0.60454
##
## var ratio (Tr/Co)..... 0.12059 0.212
## T-test p-value........ < 2.22e-16 < 2.22e-16
## KS Bootstrap p-value.. < 2.22e-16 < 2.22e-16
## KS Naive p-value...... < 2.22e-16 < 2.22e-16
## KS Statistic.......... 0.6509 0.60454
##
##
## ***** (V12) I(re75^2) *****
## Before Matching After Matching
## mean treatment........ 12654750 39629696
## mean control.......... 272279442 269310626
## std mean diff......... -463.34 -413.34
##
## mean raw eQQ diff..... 259843916 228089774
## med raw eQQ diff..... 206883640 171289008
## max raw eQQ diff..... 629191089 565741310
##
## mean eCDF diff........ 0.4751 0.34805
## med eCDF diff........ 0.51248 0.29702
## max eCDF diff........ 0.6509 0.60454
##
## var ratio (Tr/Co)..... 0.051503 0.050575
## T-test p-value........ < 2.22e-16 < 2.22e-16
## KS Bootstrap p-value.. < 2.22e-16 < 2.22e-16
## KS Naive p-value...... < 2.22e-16 < 2.22e-16
## KS Statistic.......... 0.6509 0.60454
##
##
## Before Matching Minimum p.value: < 2.22e-16
## Variable Name(s): age I(age^2) education I(education^2) black married nodegree re74 I(re74^2) re75 I(re75^2) Number(s): 1 2 3 4 5 7 8 9 10 11 12
##
## After Matching Minimum p.value: < 2.22e-16
## Variable Name(s): age I(age^2) education I(education^2) married re74 I(re74^2) re75 I(re75^2) Number(s): 1 2 3 4 7 9 10 11 12
By substituting the 260 control observations in the nsw dataset with the 15992 observations in the cps dataset, the genetic matching algorithm has more control observations to match on, which means a greater likelihood of finding a “perfect match.” The estimate produced by this model is in line with the simple difference in means between the nsw treated and cps control, giving a negative treatment effect of -5380.8 with a 95% confidence interval between -11032.7568 and 271.2242. The interval includes zero (p = 0.062), so the effect is not significant at the 5% level. Note that the code sets estimand="ATE", so this figure is an average treatment effect over the combined sample rather than an ATT. However, the balance did not improve significantly even though we have more observations to match on: the minimum p-value after matching is still below 2.22e-16 for age, education, married, re74, re75 and their squared terms. This might be an indication that using the control observations in the cps dataset is inappropriate for the analysis.
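As a check on the interval for this model, the sketch below recomputes it from the estimate and AI SE printed in the summary() output above (-5380.8 and 2883.7), without using attach(). The values are typed in as rounded in that output, so the last digits can differ slightly from the interval computed on the unrounded Match object.

```r
# Values copied from the summary() output above (rounded as printed).
est <- -5380.8
se  <- 2883.7

# Normal-approximation 95% interval; qnorm(0.975) is about 1.96.
z  <- qnorm(0.975)
ci <- c(lower = est - z * se, upper = est + z * se)
ci

# Two-sided p-value from the same normal approximation.
p_val <- 2 * pnorm(-abs(est / se))
p_val
```

With a Match object at hand, the same calculation can read the est and se components directly (for example cps_nsw.geneticMatching.ATT$est), which avoids the masking message that attach() prints.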
Genetic Matching Improvement 2
# genetic matching with 0.25 std caliper
attach(nsw_dw)
X = cbind(age, I(age^2), education, I(education^2), I(age*education), black, hispanic, married,
nodegree, re74, I(re74^2), re75, I(re75^2), I(re74*re75))
caliper.geneticMatching <- GenMatch(Tr=treat, X=X, estimand="ATE", M=1, pop.size=16, caliper=0.25,
max.generations=10, wait.generations=1)
##
##
## Sun Mar 25 01:28:46 2018
## Domains:
## 0.000000e+00 <= X1 <= 1.000000e+03
## 0.000000e+00 <= X2 <= 1.000000e+03
## 0.000000e+00 <= X3 <= 1.000000e+03
## 0.000000e+00 <= X4 <= 1.000000e+03
## 0.000000e+00 <= X5 <= 1.000000e+03
## 0.000000e+00 <= X6 <= 1.000000e+03
## 0.000000e+00 <= X7 <= 1.000000e+03
## 0.000000e+00 <= X8 <= 1.000000e+03
## 0.000000e+00 <= X9 <= 1.000000e+03
## 0.000000e+00 <= X10 <= 1.000000e+03
## 0.000000e+00 <= X11 <= 1.000000e+03
## 0.000000e+00 <= X12 <= 1.000000e+03
## 0.000000e+00 <= X13 <= 1.000000e+03
## 0.000000e+00 <= X14 <= 1.000000e+03
##
## Data Type: Floating Point
## Operators (code number, name, population)
## (1) Cloning........................... 1
## (2) Uniform Mutation.................. 2
## (3) Boundary Mutation................. 2
## (4) Non-Uniform Mutation.............. 2
## (5) Polytope Crossover................ 2
## (6) Simple Crossover.................. 2
## (7) Whole Non-Uniform Mutation........ 2
## (8) Heuristic Crossover............... 2
## (9) Local-Minimum Crossover........... 0
##
## SOFT Maximum Number of Generations: 10
## Maximum Nonchanging Generations: 1
## Population size : 16
## Convergence Tolerance: 1.000000e-03
##
## Not Using the BFGS Derivative Based Optimizer on the Best Individual Each Generation.
## Not Checking Gradients before Stopping.
## Using Out of Bounds Individuals.
##
## Maximization Problem.
## GENERATION: 0 (initializing the population)
## Lexical Fit..... 1.154871e-01 1.538568e-01 2.223815e-01 2.801223e-01 5.124101e-01 7.930811e-01 8.879696e-01 9.389695e-01 9.999994e-01 9.999994e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
## #unique......... 16, #Total UniqueCount: 16
## var 1:
## best............ 5.174626e+02
## mean............ 4.183607e+02
## variance........ 8.588364e+04
## var 2:
## best............ 2.517747e+02
## mean............ 5.616477e+02
## variance........ 8.151522e+04
## var 3:
## best............ 5.423325e+02
## mean............ 5.091512e+02
## variance........ 8.353513e+04
## var 4:
## best............ 8.959062e+02
## mean............ 5.195744e+02
## variance........ 9.982392e+04
## var 5:
## best............ 6.916058e+02
## mean............ 5.752450e+02
## variance........ 9.761023e+04
## var 6:
## best............ 3.234974e+02
## mean............ 3.557615e+02
## variance........ 5.405763e+04
## var 7:
## best............ 9.283781e+02
## mean............ 5.471011e+02
## variance........ 1.329645e+05
## var 8:
## best............ 9.993378e+01
## mean............ 5.028265e+02
## variance........ 8.870994e+04
## var 9:
## best............ 1.287469e+02
## mean............ 4.710720e+02
## variance........ 7.529871e+04
## var 10:
## best............ 2.668056e+01
## mean............ 3.327088e+02
## variance........ 5.780442e+04
## var 11:
## best............ 9.778556e+02
## mean............ 3.952060e+02
## variance........ 9.840715e+04
## var 12:
## best............ 6.885289e+02
## mean............ 4.208080e+02
## variance........ 7.460026e+04
## var 13:
## best............ 9.448080e+02
## mean............ 4.509797e+02
## variance........ 8.334366e+04
## var 14:
## best............ 5.331021e+02
## mean............ 4.198698e+02
## variance........ 7.637572e+04
##
## GENERATION: 1
## Lexical Fit..... 1.154871e-01 1.538568e-01 2.223815e-01 2.801223e-01 5.124101e-01 7.930811e-01 8.879696e-01 9.389695e-01 9.999994e-01 9.999994e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
## #unique......... 13, #Total UniqueCount: 29
## var 1:
## best............ 5.174626e+02
## mean............ 5.521987e+02
## variance........ 1.638715e+04
## var 2:
## best............ 2.517747e+02
## mean............ 4.626389e+02
## variance........ 8.729209e+04
## var 3:
## best............ 5.423325e+02
## mean............ 6.548286e+02
## variance........ 9.865970e+03
## var 4:
## best............ 8.959062e+02
## mean............ 7.013638e+02
## variance........ 5.099725e+04
## var 5:
## best............ 6.916058e+02
## mean............ 5.686821e+02
## variance........ 2.581440e+04
## var 6:
## best............ 3.234974e+02
## mean............ 5.118322e+02
## variance........ 6.283828e+04
## var 7:
## best............ 9.283781e+02
## mean............ 6.071086e+02
## variance........ 1.466464e+05
## var 8:
## best............ 9.993378e+01
## mean............ 2.197013e+02
## variance........ 6.821922e+04
## var 9:
## best............ 1.287469e+02
## mean............ 3.756282e+02
## variance........ 7.588449e+04
## var 10:
## best............ 2.668056e+01
## mean............ 1.539071e+02
## variance........ 1.692154e+04
## var 11:
## best............ 9.778556e+02
## mean............ 5.816245e+02
## variance........ 1.544968e+05
## var 12:
## best............ 6.885289e+02
## mean............ 6.686229e+02
## variance........ 6.514926e+03
## var 13:
## best............ 9.448080e+02
## mean............ 7.814687e+02
## variance........ 3.030432e+04
## var 14:
## best............ 5.331021e+02
## mean............ 4.580865e+02
## variance........ 1.066286e+04
##
## GENERATION: 2
## Lexical Fit..... 1.154871e-01 1.538568e-01 2.223815e-01 2.801223e-01 5.124101e-01 7.930811e-01 8.879696e-01 9.389695e-01 9.999994e-01 9.999994e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
## #unique......... 12, #Total UniqueCount: 41
## var 1:
## best............ 5.174626e+02
## mean............ 6.122315e+02
## variance........ 2.415755e+04
## var 2:
## best............ 2.517747e+02
## mean............ 4.516073e+02
## variance........ 6.589793e+04
## var 3:
## best............ 5.423325e+02
## mean............ 6.500472e+02
## variance........ 1.136768e+04
## var 4:
## best............ 8.959062e+02
## mean............ 7.118880e+02
## variance........ 4.088882e+04
## var 5:
## best............ 6.916058e+02
## mean............ 5.930714e+02
## variance........ 2.269066e+04
## var 6:
## best............ 3.234974e+02
## mean............ 4.792907e+02
## variance........ 5.398690e+04
## var 7:
## best............ 9.283781e+02
## mean............ 6.253168e+02
## variance........ 1.214839e+05
## var 8:
## best............ 9.993378e+01
## mean............ 3.031994e+02
## variance........ 7.139733e+04
## var 9:
## best............ 1.287469e+02
## mean............ 3.618528e+02
## variance........ 6.408257e+04
## var 10:
## best............ 2.668056e+01
## mean............ 1.432764e+02
## variance........ 9.984163e+03
## var 11:
## best............ 9.778556e+02
## mean............ 5.749167e+02
## variance........ 1.317945e+05
## var 12:
## best............ 6.885289e+02
## mean............ 6.825786e+02
## variance........ 6.364565e+03
## var 13:
## best............ 9.448080e+02
## mean............ 7.546748e+02
## variance........ 3.145562e+04
## var 14:
## best............ 5.331021e+02
## mean............ 4.659258e+02
## variance........ 1.614830e+04
##
## 'wait.generations' limit reached.
## No significant improvement in 1 generations.
##
## Solution Lexical Fitness Value:
## 1.154871e-01 1.538568e-01 2.223815e-01 2.801223e-01 5.124101e-01 7.930811e-01 8.879696e-01 9.389695e-01 9.999994e-01 9.999994e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
##
## Parameters at the Solution:
##
## X[ 1] : 5.174626e+02
## X[ 2] : 2.517747e+02
## X[ 3] : 5.423325e+02
## X[ 4] : 8.959062e+02
## X[ 5] : 6.916058e+02
## X[ 6] : 3.234974e+02
## X[ 7] : 9.283781e+02
## X[ 8] : 9.993378e+01
## X[ 9] : 1.287469e+02
## X[10] : 2.668056e+01
## X[11] : 9.778556e+02
## X[12] : 6.885289e+02
## X[13] : 9.448080e+02
## X[14] : 5.331021e+02
##
## Solution Found Generation 1
## Number of Generations Run 2
##
## Sun Mar 25 01:28:47 2018
## Total run time : 0 hours 0 minutes and 1 seconds
caliper.geneticMatching.ATT <- Match(Y=re78, Tr=treat, X=X, estimand="ATE", caliper=0.25,
Weight.matrix=caliper.geneticMatching)
summary(caliper.geneticMatching.ATT)
##
## Estimate... 1395.3
## AI SE...... 382.45
## T-stat..... 3.6484
## p.val...... 0.0002639
##
## Original number of observations.............. 445
## Original number of treated obs............... 185
## Matched number of observations............... 204
## Matched number of observations (unweighted). 328
##
## Caliper (SDs)........................................ 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25
## Number of obs dropped by 'exact' or 'caliper' 241
detach(nsw_dw)
# 95% confidence interval
attach(caliper.geneticMatching.ATT)
## The following object is masked from package:base:
##
## version
c(est-1.96*se, est+1.96*se)
## [1] 645.7264 2144.9453
detach(caliper.geneticMatching.ATT)
# balance test
caliper.geneticMatching.balance <- MatchBalance(treat ~ age + I(age^2) + education + I(education^2) +
black + hispanic + married + nodegree + re74 + I(re74^2) +
re75 + I(re75^2), data=nsw_dw,
match.out=caliper.geneticMatching.ATT, nboots=10)
##
## ***** (V1) age *****
## Before Matching After Matching
## mean treatment........ 25.816 23.446
## mean control.......... 25.054 23.456
## std mean diff......... 10.655 -0.16652
##
## mean raw eQQ diff..... 0.94054 0.14939
## med raw eQQ diff..... 1 0
## max raw eQQ diff..... 7 1
##
## mean eCDF diff........ 0.025364 0.0057458
## med eCDF diff........ 0.022193 0.0030488
## max eCDF diff........ 0.065177 0.018293
##
## var ratio (Tr/Co)..... 1.0278 0.97584
## T-test p-value........ 0.26594 0.79308
## KS Bootstrap p-value.. 0.5 1
## KS Naive p-value...... 0.7481 1
## KS Statistic.......... 0.065177 0.018293
##
##
## ***** (V2) I(age^2) *****
## Before Matching After Matching
## mean treatment........ 717.39 584.21
## mean control.......... 677.32 585.52
## std mean diff......... 9.2937 -0.40371
##
## mean raw eQQ diff..... 56.076 8.1982
## med raw eQQ diff..... 43 0
## max raw eQQ diff..... 721 89
##
## mean eCDF diff........ 0.025364 0.0057458
## med eCDF diff........ 0.022193 0.0030488
## max eCDF diff........ 0.065177 0.018293
##
## var ratio (Tr/Co)..... 1.0115 0.96054
## T-test p-value........ 0.33337 0.51241
## KS Bootstrap p-value.. 0.5 1
## KS Naive p-value...... 0.7481 1
## KS Statistic.......... 0.065177 0.018293
##
##
## ***** (V3) education *****
## Before Matching After Matching
## mean treatment........ 10.346 10.23
## mean control.......... 10.088 10.23
## std mean diff......... 12.806 0
##
## mean raw eQQ diff..... 0.40541 0
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 2 0
##
## mean eCDF diff........ 0.028698 0
## med eCDF diff........ 0.012682 0
## max eCDF diff........ 0.12651 0
##
## var ratio (Tr/Co)..... 1.5513 1
## T-test p-value........ 0.15017 1
## KS Bootstrap p-value.. < 2.22e-16 1
## KS Naive p-value...... 0.062873 1
## KS Statistic.......... 0.12651 0
##
##
## ***** (V4) I(education^2) *****
## Before Matching After Matching
## mean treatment........ 111.06 106.73
## mean control.......... 104.37 106.73
## std mean diff......... 17.012 0
##
## mean raw eQQ diff..... 8.7189 0
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 60 0
##
## mean eCDF diff........ 0.028698 0
## med eCDF diff........ 0.012682 0
## max eCDF diff........ 0.12651 0
##
## var ratio (Tr/Co)..... 1.6625 1
## T-test p-value........ 0.053676 1
## KS Bootstrap p-value.. < 2.22e-16 1
## KS Naive p-value...... 0.062873 1
## KS Statistic.......... 0.12651 0
##
##
## ***** (V5) black *****
## Before Matching After Matching
## mean treatment........ 0.84324 0.95588
## mean control.......... 0.82692 0.95588
## std mean diff......... 4.4767 0
##
## mean raw eQQ diff..... 0.016216 0
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 1 0
##
## mean eCDF diff........ 0.0081601 0
## med eCDF diff........ 0.0081601 0
## max eCDF diff........ 0.01632 0
##
## var ratio (Tr/Co)..... 0.92503 1
## T-test p-value........ 0.64736 1
##
##
## ***** (V6) hispanic *****
## Before Matching After Matching
## mean treatment........ 0.059459 0.019608
## mean control.......... 0.10769 0.019608
## std mean diff......... -20.341 0
##
## mean raw eQQ diff..... 0.048649 0
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 1 0
##
## mean eCDF diff........ 0.024116 0
## med eCDF diff........ 0.024116 0
## max eCDF diff........ 0.048233 0
##
## var ratio (Tr/Co)..... 0.58288 1
## T-test p-value........ 0.064043 1
##
##
## ***** (V7) married *****
## Before Matching After Matching
## mean treatment........ 0.18919 0.034314
## mean control.......... 0.15385 0.034314
## std mean diff......... 8.9995 0
##
## mean raw eQQ diff..... 0.037838 0
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 1 0
##
## mean eCDF diff........ 0.017672 0
## med eCDF diff........ 0.017672 0
## max eCDF diff........ 0.035343 0
##
## var ratio (Tr/Co)..... 1.1802 1
## T-test p-value........ 0.33425 1
##
##
## ***** (V8) nodegree *****
## Before Matching After Matching
## mean treatment........ 0.70811 0.82353
## mean control.......... 0.83462 0.82353
## std mean diff......... -27.751 0
##
## mean raw eQQ diff..... 0.12432 0
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 1 0
##
## mean eCDF diff........ 0.063254 0
## med eCDF diff........ 0.063254 0
## max eCDF diff........ 0.12651 0
##
## var ratio (Tr/Co)..... 1.4998 1
## T-test p-value........ 0.0020368 1
##
##
## ***** (V9) re74 *****
## Before Matching After Matching
## mean treatment........ 2095.6 48.641
## mean control.......... 2107 62.902
## std mean diff......... -0.23437 -5.3426
##
## mean raw eQQ diff..... 487.98 17.552
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 8413 1199
##
## mean eCDF diff........ 0.019501 0.0051829
## med eCDF diff........ 0.016112 0.0060976
## max eCDF diff........ 0.047089 0.0091463
##
## var ratio (Tr/Co)..... 0.7381 0.4801
## T-test p-value........ 0.98186 0.28012
## KS Bootstrap p-value.. 0.6 0.7
## KS Naive p-value...... 0.97023 1
## KS Statistic.......... 0.047089 0.0091463
##
##
## ***** (V10) I(re74^2) *****
## Before Matching After Matching
## mean treatment........ 28141412 73273
## mean control.......... 36667400 151650
## std mean diff......... -7.4722 -14.591
##
## mean raw eQQ diff..... 13311768 55879
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 365146183 6964969
##
## mean eCDF diff........ 0.019501 0.0051829
## med eCDF diff........ 0.016112 0.0060976
## max eCDF diff........ 0.047089 0.0091463
##
## var ratio (Tr/Co)..... 0.50382 0.19069
## T-test p-value........ 0.51322 0.11549
## KS Bootstrap p-value.. 0.6 0.7
## KS Naive p-value...... 0.97023 1
## KS Statistic.......... 0.047089 0.0091463
##
##
## ***** (V11) re75 *****
## Before Matching After Matching
## mean treatment........ 1532.1 78.926
## mean control.......... 1266.9 90.454
## std mean diff......... 8.2363 -2.7335
##
## mean raw eQQ diff..... 367.61 8.3696
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 2110.3 345.03
##
## mean eCDF diff........ 0.051061 0.0079849
## med eCDF diff........ 0.064657 0.0060976
## max eCDF diff........ 0.10748 0.021341
##
## var ratio (Tr/Co)..... 1.0763 0.97985
## T-test p-value........ 0.38527 0.22238
## KS Bootstrap p-value.. < 2.22e-16 0.7
## KS Naive p-value...... 0.16449 1
## KS Statistic.......... 0.10748 0.021341
##
##
## ***** (V12) I(re75^2) *****
## Before Matching After Matching
## mean treatment........ 12654750 183206
## mean control.......... 11196524 188797
## std mean diff......... 2.6024 -0.44451
##
## mean raw eQQ diff..... 2840847 7233.8
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 101660120 267110
##
## mean eCDF diff........ 0.051061 0.0079849
## med eCDF diff........ 0.064657 0.0060976
## max eCDF diff........ 0.10748 0.021341
##
## var ratio (Tr/Co)..... 1.4609 1.011
## T-test p-value........ 0.77178 0.88797
## KS Bootstrap p-value.. < 2.22e-16 0.7
## KS Naive p-value...... 0.16449 1
## KS Statistic.......... 0.10748 0.021341
##
##
## Before Matching Minimum p.value: < 2.22e-16
## Variable Name(s): education I(education^2) re75 I(re75^2) Number(s): 3 4 11 12
##
## After Matching Minimum p.value: 0.11549
## Variable Name(s): I(re74^2) Number(s): 10
Caliper matching drops observations that cannot be closely matched. In this implementation, a caliper of 0.25 standard deviations on every covariate dropped 241 of the 445 observations. This raises two concerns: 1) more than half of the observations are dropped, so the matched sample we analyze may differ fundamentally from the population we set out to study; 2) the fact that so many observations cannot be closely matched challenges the validity of the matching method on this dataset, because the assumption that we can find “identical twins” to stand in for Y(0) and Y(1) does not hold for much of the sample. The caliper-matched estimate is 1395.3, with a 95% confidence interval between 645.7264 and 2144.9453. Note that the Match call above sets estimand="ATE", so this figure is strictly an average treatment effect rather than an ATT. It is not significantly different from the result produced by the default genetic matching model. Because unmatched observations are dropped in caliper matching, the SE is smaller, which makes the 95% confidence interval tighter.
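To make the dropping rule concrete, here is a minimal base-R sketch of caliper matching on a single simulated covariate. It is not how `Match` is implemented: the data are simulated, the variable names are hypothetical, and measuring the caliper against the pooled standard deviation of the covariate is an assumption made for this illustration.

```r
# Toy caliper matching on one covariate; simulated data, base R only.
set.seed(1)
x_treated <- rnorm(20, mean = 1)  # covariate values for treated units
x_control <- rnorm(40, mean = 0)  # covariate values for control units

caliper_sd <- 0.25
# Caliper width in covariate units (assumption: pooled SD as the scale)
width <- caliper_sd * sd(c(x_treated, x_control))

# Nearest control for each treated unit (matching with replacement)
nearest <- sapply(x_treated, function(x) which.min(abs(x_control - x)))
distance <- abs(x_control[nearest] - x_treated)

# Keep only the pairs that fall inside the caliper; the rest are dropped
keep <- distance <= width
matched <- data.frame(treated  = x_treated[keep],
                      control  = x_control[nearest[keep]],
                      distance = distance[keep])
n_dropped <- sum(!keep)

n_dropped
nrow(matched)
```

The tighter the caliper, the closer the retained pairs but the more treated units are discarded, which is the trade-off discussed above.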
Summary
In conclusion, we obtained a balanced match using genetic matching and estimated an average treatment effect on the treated (ATT) of 1683.3, with a 95% confidence interval between 193.8664 and 3172.6476. However, the extended analysis points to two questions for further investigation: how the Lalonde Sample relates to the Current Population Survey data, and whether the matching model found close enough matches in the Lalonde Sample to produce a meaningful result.