Causal Inference

Introduction

In this analysis, we use a simple difference in means, propensity score matching, and genetic matching to estimate the treatment effect in the Lalonde sample.

Data Setup

library(foreign)
nsw_dw <- read.dta("~/Causal Inference/nsw_dw.dta")
nsw_dw.treated <- subset(nsw_dw, treat==1)
nsw_dw.control <- subset(nsw_dw, treat==0)
cps_control <- read.dta("~/Causal Inference/cps_controls.dta")

Treatment Effect by Simple Difference in Mean

nsw_dw.treated.mean <- mean(nsw_dw.treated$re78)
nsw_dw.control.mean <- mean(nsw_dw.control$re78)
treatmentEffect.differenceInMean <- nsw_dw.treated.mean - nsw_dw.control.mean
treatmentEffect.differenceInMean
## [1] 1794.342
# 95% confidence interval
t.test(nsw_dw.treated$re78, nsw_dw.control$re78, conf.level=0.95)
## 
##  Welch Two Sample t-test
## 
## data:  nsw_dw.treated$re78 and nsw_dw.control$re78
## t = 2.6741, df = 307.13, p-value = 0.007893
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   474.0105 3114.6743
## sample estimates:
## mean of x mean of y 
##  6349.144  4554.801

A simple difference in means shows a positive treatment effect of 1794.342, with a 95% confidence interval from 474.0105 to 3114.6743. This is a naive analysis: the estimate may contain selection bias, and unless treatment was assigned at random we cannot assume that bias is zero, so the result should be treated with caution.
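To make the interval reported by t.test() less of a black box, the following sketch recomputes a Welch 95% confidence interval by hand. It runs on simulated earnings, not the Lalonde data; the vectors treated and control and their distribution parameters are assumptions chosen only for illustration.

```r
# Hypothetical simulated earnings; NOT the Lalonde data.
set.seed(1)
treated <- rnorm(185, mean = 6300, sd = 7800)
control <- rnorm(260, mean = 4500, sd = 5500)

# Point estimate: difference in sample means
diff_means <- mean(treated) - mean(control)

# Welch standard error and Welch-Satterthwaite degrees of freedom
v_t <- var(treated) / length(treated)
v_c <- var(control) / length(control)
se  <- sqrt(v_t + v_c)
df  <- (v_t + v_c)^2 /
  (v_t^2 / (length(treated) - 1) + v_c^2 / (length(control) - 1))

# 95% confidence interval from the t distribution
ci_manual <- diff_means + c(-1, 1) * qt(0.975, df) * se

# t.test() uses the Welch test by default (var.equal = FALSE)
ci_ttest <- as.numeric(t.test(treated, control, conf.level = 0.95)$conf.int)
print(rbind(manual = ci_manual, t.test = ci_ttest))
```

The two rows should agree, which shows that the interval is only a statement about sampling variability; it says nothing about whether the two groups are comparable.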

Treatment Effect by Difference in Mean using CPS Control

cps_control.mean <- mean(cps_control$re78)
treatmentEffect.cpsDifferenceInMean <- nsw_dw.treated.mean - cps_control.mean
treatmentEffect.cpsDifferenceInMean
## [1] -8497.516
# 95% confidence interval
t.test(nsw_dw.treated$re78, cps_control$re78, conf.level=0.95)
## 
##  Welch Two Sample t-test
## 
## data:  nsw_dw.treated$re78 and cps_control$re78
## t = -14.565, df = 190.46, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -9648.335 -7346.698
## sample estimates:
## mean of x mean of y 
##  6349.144 14846.660

Here we replace the 260 control observations in the Lalonde sample with the 15992 observations in the CPS control file (cps_controls.dta). Assuming the 185 treated observations are drawn from the same population as these controls, the much larger control group gives a more precise estimate of the mean outcome in the (control) population. However, if there is any selection bias in who receives the treatment, a simple difference in means cannot remove it.
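The point about selection bias can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of the actual data: the variable names, the enrolment rule (lower earners are more likely to be treated), and the true effect of 1500 are all hypothetical.

```r
set.seed(42)
n <- 20000

# Hypothetical baseline earnings; lower earners are more likely to enrol
baseline <- rexp(n, rate = 1 / 10000)
p_treat  <- plogis(1 - baseline / 4000)
treat    <- rbinom(n, 1, p_treat)

# Outcome = baseline earnings + a constant treatment effect + noise
true_effect <- 1500
outcome <- baseline + true_effect * treat + rnorm(n, sd = 2000)

# Naive estimate: ignores that treated units started with lower earnings
naive <- mean(outcome[treat == 1]) - mean(outcome[treat == 0])

# Adjusting for the confounder should land close to the true effect
adjusted <- unname(coef(lm(outcome ~ treat + baseline))["treat"])

cat("naive:", round(naive), " adjusted:", round(adjusted), "\n")
```

In this setup the naive difference comes out negative even though the true effect is positive, because the treated group starts from lower earnings. A larger control group would tighten the interval around that biased number without removing the bias.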

Propensity Score Matching

library(Matching)
## Loading required package: MASS
## ## 
## ##  Matching (Version 4.9-2, Build Date: 2015-12-25)
## ##  See http://sekhon.berkeley.edu/matching for additional documentation.
## ##  Please cite software as:
## ##   Jasjeet S. Sekhon. 2011. ``Multivariate and Propensity Score Matching
## ##   Software with Automated Balance Optimization: The Matching package for R.''
## ##   Journal of Statistical Software, 42(7): 1-52. 
## ##
# propensity score matching
propensityScoreModel <- glm(treat ~ age + I(age^2) + education + I(education^2) + black +
                            hispanic + married + nodegree + re74 + I(re74^2) + re75 + I(re75^2),
                            family=binomial, data=nsw_dw)
propensityScoreMatching.ATT <- Match(Y=nsw_dw$re78, Tr=nsw_dw$treat, X=propensityScoreModel$fitted, M=1)
summary(propensityScoreMatching.ATT)
## 
## Estimate...  1379.6 
## AI SE......  777.76 
## T-stat.....  1.7738 
## p.val......  0.076096 
## 
## Original number of observations..............  445 
## Original number of treated obs...............  185 
## Matched number of observations...............  185 
## Matched number of observations  (unweighted).  494
# 95% confidence interval
with(propensityScoreMatching.ATT, c(est - 1.96*se, est + 1.96*se))
## [1] -144.8159 2904.0129

# balance test
propensityScoreMatching.balance <- MatchBalance(treat ~ age + I(age^2) + education + I(education^2) +
                                                black + hispanic + married + nodegree + re74 +
                                                I(re74^2) + re75 + I(re75^2), data=nsw_dw,
                                                match.out=propensityScoreMatching.ATT, nboots=10)
## 
## ***** (V1) age *****
##                        Before Matching        After Matching
## mean treatment........     25.816             25.816 
## mean control..........     25.054             26.189 
## std mean diff.........     10.655            -5.2154 
## 
## mean raw eQQ diff.....    0.94054            0.85628 
## med  raw eQQ diff.....          1                  1 
## max  raw eQQ diff.....          7                  9 
## 
## mean eCDF diff........   0.025364           0.024169 
## med  eCDF diff........   0.022193           0.020243 
## max  eCDF diff........   0.065177           0.080972 
## 
## var ratio (Tr/Co).....     1.0278            0.91034 
## T-test p-value........    0.26594            0.58434 
## KS Bootstrap p-value..        0.7         < 2.22e-16 
## KS Naive p-value......     0.7481           0.078412 
## KS Statistic..........   0.065177           0.080972 
## 
## 
## ***** (V2) I(age^2) *****
##                        Before Matching        After Matching
## mean treatment........     717.39             717.39 
## mean control..........     677.32             741.82 
## std mean diff.........     9.2937            -5.6629 
## 
## mean raw eQQ diff.....     56.076             52.844 
## med  raw eQQ diff.....         43                 43 
## max  raw eQQ diff.....        721                909 
## 
## mean eCDF diff........   0.025364           0.024169 
## med  eCDF diff........   0.022193           0.020243 
## max  eCDF diff........   0.065177           0.080972 
## 
## var ratio (Tr/Co).....     1.0115            0.76029 
## T-test p-value........    0.33337            0.58978 
## KS Bootstrap p-value..        0.7         < 2.22e-16 
## KS Naive p-value......     0.7481           0.078412 
## KS Statistic..........   0.065177           0.080972 
## 
## 
## ***** (V3) education *****
##                        Before Matching        After Matching
## mean treatment........     10.346             10.346 
## mean control..........     10.088             10.252 
## std mean diff.........     12.806             4.6502 
## 
## mean raw eQQ diff.....    0.40541            0.10931 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          2                  2 
## 
## mean eCDF diff........   0.028698           0.007808 
## med  eCDF diff........   0.012682          0.0040486 
## max  eCDF diff........    0.12651           0.038462 
## 
## var ratio (Tr/Co).....     1.5513            0.84297 
## T-test p-value........    0.15017            0.64164 
## KS Bootstrap p-value.. < 2.22e-16                0.5 
## KS Naive p-value......   0.062873            0.85831 
## KS Statistic..........    0.12651           0.038462 
## 
## 
## ***** (V4) I(education^2) *****
##                        Before Matching        After Matching
## mean treatment........     111.06             111.06 
## mean control..........     104.37             109.88 
## std mean diff.........     17.012             2.9945 
## 
## mean raw eQQ diff.....     8.7189             1.9393 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....         60                 60 
## 
## mean eCDF diff........   0.028698           0.007808 
## med  eCDF diff........   0.012682          0.0040486 
## max  eCDF diff........    0.12651           0.038462 
## 
## var ratio (Tr/Co).....     1.6625             0.9462 
## T-test p-value........   0.053676            0.74728 
## KS Bootstrap p-value.. < 2.22e-16                0.5 
## KS Naive p-value......   0.062873            0.85831 
## KS Statistic..........    0.12651           0.038462 
## 
## 
## ***** (V5) black *****
##                        Before Matching        After Matching
## mean treatment........    0.84324            0.84324 
## mean control..........    0.82692            0.81171 
## std mean diff.........     4.4767             8.6493 
## 
## mean raw eQQ diff.....   0.016216          0.0080972 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........  0.0081601          0.0040486 
## med  eCDF diff........  0.0081601          0.0040486 
## max  eCDF diff........    0.01632          0.0080972 
## 
## var ratio (Tr/Co).....    0.92503            0.86488 
## T-test p-value........    0.64736            0.32299 
## 
## 
## ***** (V6) hispanic *****
##                        Before Matching        After Matching
## mean treatment........   0.059459           0.059459 
## mean control..........    0.10769           0.088288 
## std mean diff.........    -20.341            -12.158 
## 
## mean raw eQQ diff.....   0.048649          0.0060729 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........   0.024116          0.0030364 
## med  eCDF diff........   0.024116          0.0030364 
## max  eCDF diff........   0.048233          0.0060729 
## 
## var ratio (Tr/Co).....    0.58288            0.69476 
## T-test p-value........   0.064043            0.14352 
## 
## 
## ***** (V7) married *****
##                        Before Matching        After Matching
## mean treatment........    0.18919            0.18919 
## mean control..........    0.15385            0.16351 
## std mean diff.........     8.9995             6.5379 
## 
## mean raw eQQ diff.....   0.037838          0.0080972 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........   0.017672          0.0040486 
## med  eCDF diff........   0.017672          0.0040486 
## max  eCDF diff........   0.035343          0.0080972 
## 
## var ratio (Tr/Co).....     1.1802             1.1215 
## T-test p-value........    0.33425            0.39942 
## 
## 
## ***** (V8) nodegree *****
##                        Before Matching        After Matching
## mean treatment........    0.70811            0.70811 
## mean control..........    0.83462            0.71807 
## std mean diff.........    -27.751            -2.1852 
## 
## mean raw eQQ diff.....    0.12432            0.01417 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........   0.063254           0.007085 
## med  eCDF diff........   0.063254           0.007085 
## max  eCDF diff........    0.12651            0.01417 
## 
## var ratio (Tr/Co).....     1.4998              1.021 
## T-test p-value........  0.0020368            0.72722 
## 
## 
## ***** (V9) re74 *****
##                        Before Matching        After Matching
## mean treatment........     2095.6             2095.6 
## mean control..........       2107             1497.1 
## std mean diff.........   -0.23437             12.247 
## 
## mean raw eQQ diff.....     487.98             163.88 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....       8413             9319.2 
## 
## mean eCDF diff........   0.019501           0.009145 
## med  eCDF diff........   0.016112          0.0060729 
## max  eCDF diff........   0.047089           0.038462 
## 
## var ratio (Tr/Co).....     0.7381             1.9637 
## T-test p-value........    0.98186            0.17766 
## KS Bootstrap p-value..        0.6                0.3 
## KS Naive p-value......    0.97023            0.85831 
## KS Statistic..........   0.047089           0.038462 
## 
## 
## ***** (V10) I(re74^2) *****
##                        Before Matching        After Matching
## mean treatment........   28141412           28141412 
## mean control..........   36667400           14335932 
## std mean diff.........    -7.4722             12.099 
## 
## mean raw eQQ diff.....   13311768            3403402 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....  365146183          566240806 
## 
## mean eCDF diff........   0.019501           0.009145 
## med  eCDF diff........   0.016112          0.0060729 
## max  eCDF diff........   0.047089           0.038462 
## 
## var ratio (Tr/Co).....    0.50382             4.4094 
## T-test p-value........    0.51322            0.14459 
## KS Bootstrap p-value..        0.6                0.3 
## KS Naive p-value......    0.97023            0.85831 
## KS Statistic..........   0.047089           0.038462 
## 
## 
## ***** (V11) re75 *****
##                        Before Matching        After Matching
## mean treatment........     1532.1             1532.1 
## mean control..........     1266.9             1242.1 
## std mean diff.........     8.2363             9.0073 
## 
## mean raw eQQ diff.....     367.61             62.761 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....     2110.3             2510.5 
## 
## mean eCDF diff........   0.051061           0.006862 
## med  eCDF diff........   0.064657          0.0040486 
## max  eCDF diff........    0.10748           0.036437 
## 
## var ratio (Tr/Co).....     1.0763             1.3081 
## T-test p-value........    0.38527            0.35054 
## KS Bootstrap p-value.. < 2.22e-16                0.5 
## KS Naive p-value......    0.16449            0.89829 
## KS Statistic..........    0.10748           0.036437 
## 
## 
## ***** (V12) I(re75^2) *****
##                        Before Matching        After Matching
## mean treatment........   12654750           12654750 
## mean control..........   11196524            9422619 
## std mean diff.........     2.6024             5.7682 
## 
## mean raw eQQ diff.....    2840847             885385 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....  101660120          101660120 
## 
## mean eCDF diff........   0.051061           0.006862 
## med  eCDF diff........   0.064657          0.0040486 
## max  eCDF diff........    0.10748           0.036437 
## 
## var ratio (Tr/Co).....     1.4609             2.0326 
## T-test p-value........    0.77178             0.5263 
## KS Bootstrap p-value.. < 2.22e-16                0.5 
## KS Naive p-value......    0.16449            0.89829 
## KS Statistic..........    0.10748           0.036437 
## 
## 
## Before Matching Minimum p.value: < 2.22e-16 
## Variable Name(s): education I(education^2) re75 I(re75^2)  Number(s): 3 4 11 12 
## 
## After Matching Minimum p.value: < 2.22e-16 
## Variable Name(s): age I(age^2)  Number(s): 1 2

In matching, we want the matched control and treated groups to show no statistically significant differences in their covariates. Large balance-test p-values mean we fail to reject the null hypothesis of no difference, which is consistent with (though not proof of) good overlap between the two groups. This propensity score model uses all the covariates plus squared terms for age, education, re74, and re75. Matching did not raise the p-values across the board: the minimum p-value after matching is still below 2.22e-16 (the KS bootstrap p-value for age and I(age^2)), so the covariates are still not balanced. The ATT estimated by this model is 1379.6 with a standard error of 777.76, giving a 95% confidence interval from -144.8159 to 2904.0129.
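The "std mean diff" rows in the balance output above can be reproduced in spirit with a few lines of base R. The sketch below uses simulated data and a single hypothetical covariate; it assumes the convention of scaling the mean difference by the treated-group standard deviation and expressing it in percent, which appears consistent with the values reported above. The matching step is a simplified 1-to-1 nearest-neighbour match with replacement that keeps only the first control on ties, so it is not a reimplementation of Match().

```r
set.seed(7)
n <- 1000

# Hypothetical covariate and treatment; older units are more likely treated
age   <- round(runif(n, 18, 55))
treat <- rbinom(n, 1, plogis(-3 + 0.07 * age))

# Propensity score from a logistic regression
ps <- fitted(glm(treat ~ age, family = binomial))

# 1-to-1 nearest-neighbour matching on the propensity score, with replacement
treated_idx <- which(treat == 1)
control_idx <- which(treat == 0)
matched_idx <- sapply(treated_idx, function(i) {
  control_idx[which.min(abs(ps[control_idx] - ps[i]))]
})

# Standardized mean difference in percent, scaled by the treated-group SD
std_diff <- function(x_t, x_c) 100 * (mean(x_t) - mean(x_c)) / sd(x_t)

smd_before <- std_diff(age[treated_idx], age[control_idx])
smd_after  <- std_diff(age[treated_idx], age[matched_idx])
cat("before:", round(smd_before, 1), " after:", round(smd_after, 1), "\n")
```

A standardized difference close to zero after matching indicates that the matched controls resemble the treated units on that covariate. Unlike a p-value, it does not shrink or grow simply because the sample size changes.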

Genetic Matching

# genetic matching
attach(nsw_dw)
X <- cbind(age, I(age^2), education, I(education^2), I(age*education), black, hispanic, married,
          nodegree, re74, I(re74^2), re75, I(re75^2), I(re74*re75))
geneticMatching <- GenMatch(Tr=treat, X=X, estimand="ATE", M=1, pop.size=16,
                            max.generations=10, wait.generations=1)
## Loading required namespace: rgenoud
## 
## 
## Sun Mar 25 01:25:41 2018
## Domains:
##  0.000000e+00   <=  X1   <=    1.000000e+03 
##  0.000000e+00   <=  X2   <=    1.000000e+03 
##  0.000000e+00   <=  X3   <=    1.000000e+03 
##  0.000000e+00   <=  X4   <=    1.000000e+03 
##  0.000000e+00   <=  X5   <=    1.000000e+03 
##  0.000000e+00   <=  X6   <=    1.000000e+03 
##  0.000000e+00   <=  X7   <=    1.000000e+03 
##  0.000000e+00   <=  X8   <=    1.000000e+03 
##  0.000000e+00   <=  X9   <=    1.000000e+03 
##  0.000000e+00   <=  X10  <=    1.000000e+03 
##  0.000000e+00   <=  X11  <=    1.000000e+03 
##  0.000000e+00   <=  X12  <=    1.000000e+03 
##  0.000000e+00   <=  X13  <=    1.000000e+03 
##  0.000000e+00   <=  X14  <=    1.000000e+03 
## 
## Data Type: Floating Point
## Operators (code number, name, population) 
##  (1) Cloning...........................  1
##  (2) Uniform Mutation..................  2
##  (3) Boundary Mutation.................  2
##  (4) Non-Uniform Mutation..............  2
##  (5) Polytope Crossover................  2
##  (6) Simple Crossover..................  2
##  (7) Whole Non-Uniform Mutation........  2
##  (8) Heuristic Crossover...............  2
##  (9) Local-Minimum Crossover...........  0
## 
## SOFT Maximum Number of Generations: 10
## Maximum Nonchanging Generations: 1
## Population size       : 16
## Convergence Tolerance: 1.000000e-03
## 
## Not Using the BFGS Derivative Based Optimizer on the Best Individual Each Generation.
## Not Checking Gradients before Stopping.
## Using Out of Bounds Individuals.
## 
## Maximization Problem.
## GENERATION: 0 (initializing the population)
## Lexical Fit..... 4.626480e-02  7.675792e-02  1.570667e-01  1.570667e-01  2.057512e-01  2.057512e-01  2.357651e-01  4.601406e-01  5.551581e-01  5.638495e-01  5.638495e-01  6.697056e-01  7.814812e-01  8.097642e-01  8.333200e-01  8.838634e-01  9.034362e-01  9.446503e-01  9.446503e-01  9.446503e-01  9.658319e-01  9.658319e-01  9.906462e-01  9.906462e-01  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  
## #unique......... 16, #Total UniqueCount: 16
## var 1:
## best............ 1.517827e+02
## mean............ 5.102241e+02
## variance........ 9.031596e+04
## var 2:
## best............ 7.779011e+01
## mean............ 4.195485e+02
## variance........ 7.127398e+04
## var 3:
## best............ 3.563625e+02
## mean............ 2.512126e+02
## variance........ 3.154963e+04
## var 4:
## best............ 5.433395e+02
## mean............ 6.151383e+02
## variance........ 9.154968e+04
## var 5:
## best............ 7.803356e+02
## mean............ 5.479739e+02
## variance........ 8.734486e+04
## var 6:
## best............ 2.541555e+02
## mean............ 4.671601e+02
## variance........ 7.365312e+04
## var 7:
## best............ 9.780296e+02
## mean............ 4.167427e+02
## variance........ 1.068744e+05
## var 8:
## best............ 9.259881e+02
## mean............ 5.966286e+02
## variance........ 1.069254e+05
## var 9:
## best............ 5.359330e+02
## mean............ 4.427521e+02
## variance........ 8.164095e+04
## var 10:
## best............ 2.056684e+02
## mean............ 3.085875e+02
## variance........ 4.174271e+04
## var 11:
## best............ 1.168066e+02
## mean............ 5.045664e+02
## variance........ 7.225396e+04
## var 12:
## best............ 4.666732e+02
## mean............ 4.946183e+02
## variance........ 6.340009e+04
## var 13:
## best............ 9.381770e+02
## mean............ 4.716813e+02
## variance........ 7.910594e+04
## var 14:
## best............ 4.410991e+02
## mean............ 4.084476e+02
## variance........ 7.996912e+04
## 
## GENERATION: 1
## Lexical Fit..... 5.050974e-02  8.262683e-02  1.570667e-01  1.570667e-01  2.057512e-01  2.057512e-01  2.663691e-01  5.385514e-01  5.638495e-01  5.638495e-01  5.898220e-01  6.697056e-01  7.859962e-01  8.043389e-01  8.333200e-01  8.838634e-01  9.356593e-01  9.446503e-01  9.446503e-01  9.446503e-01  9.658319e-01  9.658319e-01  9.906462e-01  9.906462e-01  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  
## #unique......... 12, #Total UniqueCount: 28
## var 1:
## best............ 1.536530e+02
## mean............ 3.388658e+02
## variance........ 5.960838e+04
## var 2:
## best............ 7.048902e+01
## mean............ 3.225549e+02
## variance........ 3.766512e+04
## var 3:
## best............ 3.583915e+02
## mean............ 3.463198e+02
## variance........ 1.305464e+04
## var 4:
## best............ 5.391866e+02
## mean............ 6.949173e+02
## variance........ 2.157672e+04
## var 5:
## best............ 7.809516e+02
## mean............ 5.841643e+02
## variance........ 4.473077e+04
## var 6:
## best............ 2.526467e+02
## mean............ 2.737395e+02
## variance........ 6.246159e+04
## var 7:
## best............ 9.835427e+02
## mean............ 4.886302e+02
## variance........ 1.213228e+05
## var 8:
## best............ 9.342610e+02
## mean............ 6.995705e+02
## variance........ 5.021819e+04
## var 9:
## best............ 5.313971e+02
## mean............ 5.617117e+02
## variance........ 4.010984e+04
## var 10:
## best............ 2.028814e+02
## mean............ 2.691915e+02
## variance........ 1.045162e+04
## var 11:
## best............ 1.120078e+02
## mean............ 2.487344e+02
## variance........ 3.464837e+04
## var 12:
## best............ 4.608774e+02
## mean............ 4.395572e+02
## variance........ 7.402288e+04
## var 13:
## best............ 9.436336e+02
## mean............ 6.403012e+02
## variance........ 4.058389e+04
## var 14:
## best............ 4.445557e+02
## mean............ 3.849501e+02
## variance........ 1.353245e+04
## 
## GENERATION: 2
## Lexical Fit..... 2.057512e-01  2.057512e-01  2.110219e-01  2.411855e-01  3.173114e-01  3.173114e-01  4.735241e-01  4.799121e-01  6.458307e-01  6.576197e-01  6.735115e-01  6.901604e-01  7.820381e-01  8.165199e-01  8.453031e-01  8.453031e-01  8.453031e-01  9.658319e-01  9.658319e-01  9.658319e-01  9.906462e-01  9.906462e-01  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  
## #unique......... 13, #Total UniqueCount: 41
## var 1:
## best............ 1.889952e+02
## mean............ 1.591668e+02
## variance........ 1.770360e+02
## var 2:
## best............ 6.434273e+01
## mean............ 8.616392e+01
## variance........ 1.668142e+03
## var 3:
## best............ 3.440148e+02
## mean............ 3.557315e+02
## variance........ 1.762982e+02
## var 4:
## best............ 5.898816e+02
## mean............ 5.393199e+02
## variance........ 3.677210e+02
## var 5:
## best............ 7.828007e+02
## mean............ 7.757473e+02
## variance........ 4.420380e+02
## var 6:
## best............ 2.768641e+02
## mean............ 2.567741e+02
## variance........ 1.550717e+02
## var 7:
## best............ 9.859199e+02
## mean............ 9.453758e+02
## variance........ 1.666109e+04
## var 8:
## best............ 9.359220e+02
## mean............ 9.235027e+02
## variance........ 1.035936e+03
## var 9:
## best............ 6.133072e+02
## mean............ 5.403610e+02
## variance........ 4.901364e+02
## var 10:
## best............ 1.893726e+02
## mean............ 2.425050e+02
## variance........ 2.292123e+04
## var 11:
## best............ 1.344795e+02
## mean............ 1.214234e+02
## variance........ 7.421132e+02
## var 12:
## best............ 3.916647e+02
## mean............ 4.602420e+02
## variance........ 3.712739e+02
## var 13:
## best............ 9.173538e+02
## mean............ 9.365062e+02
## variance........ 2.404518e+02
## var 14:
## best............ 3.615960e+02
## mean............ 4.178555e+02
## variance........ 4.743726e+03
## 
## GENERATION: 3
## Lexical Fit..... 2.057512e-01  2.057512e-01  2.705633e-01  2.796699e-01  3.015794e-01  3.173114e-01  3.173114e-01  3.173114e-01  3.173114e-01  4.939436e-01  5.630807e-01  7.022789e-01  7.801215e-01  8.453031e-01  8.788599e-01  8.797346e-01  8.969526e-01  9.172177e-01  9.172177e-01  9.172177e-01  9.446503e-01  9.446503e-01  9.658319e-01  9.658319e-01  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  
## #unique......... 13, #Total UniqueCount: 54
## var 1:
## best............ 1.798465e+02
## mean............ 2.310484e+02
## variance........ 2.122945e+04
## var 2:
## best............ 6.755342e+01
## mean............ 8.431936e+01
## variance........ 2.179058e+03
## var 3:
## best............ 3.493640e+02
## mean............ 3.528277e+02
## variance........ 3.469362e+02
## var 4:
## best............ 5.706795e+02
## mean............ 5.564649e+02
## variance........ 2.046527e+03
## var 5:
## best............ 7.820135e+02
## mean............ 7.670474e+02
## variance........ 3.416214e+03
## var 6:
## best............ 2.726131e+02
## mean............ 2.837747e+02
## variance........ 2.320737e+03
## var 7:
## best............ 9.247469e+02
## mean............ 9.052096e+02
## variance........ 3.064334e+04
## var 8:
## best............ 9.343419e+02
## mean............ 8.769643e+02
## variance........ 4.562244e+04
## var 9:
## best............ 5.820357e+02
## mean............ 5.807763e+02
## variance........ 2.097102e+03
## var 10:
## best............ 1.949297e+02
## mean............ 1.981626e+02
## variance........ 1.226096e+02
## var 11:
## best............ 1.263023e+02
## mean............ 1.384642e+02
## variance........ 2.622175e+03
## var 12:
## best............ 4.191775e+02
## mean............ 4.216002e+02
## variance........ 2.368424e+03
## var 13:
## best............ 9.269344e+02
## mean............ 9.259202e+02
## variance........ 1.526832e+02
## var 14:
## best............ 3.153829e+02
## mean............ 3.311244e+02
## variance........ 1.009938e+04
## 
## GENERATION: 4
## Lexical Fit..... 2.057512e-01  2.057512e-01  2.705633e-01  2.796699e-01  3.015794e-01  3.173114e-01  3.173114e-01  3.173114e-01  3.173114e-01  4.939436e-01  5.630807e-01  7.022789e-01  7.801215e-01  8.453031e-01  8.788599e-01  8.797346e-01  8.969526e-01  9.172177e-01  9.172177e-01  9.172177e-01  9.446503e-01  9.446503e-01  9.658319e-01  9.658319e-01  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  
## #unique......... 13, #Total UniqueCount: 67
## var 1:
## best............ 1.798465e+02
## mean............ 1.755787e+02
## variance........ 8.514986e+01
## var 2:
## best............ 6.755342e+01
## mean............ 8.824120e+01
## variance........ 5.359183e+03
## var 3:
## best............ 3.493640e+02
## mean............ 3.379118e+02
## variance........ 5.499746e+03
## var 4:
## best............ 5.706795e+02
## mean............ 5.510933e+02
## variance........ 2.158019e+03
## var 5:
## best............ 7.820135e+02
## mean............ 7.635109e+02
## variance........ 2.396517e+03
## var 6:
## best............ 2.726131e+02
## mean............ 2.842887e+02
## variance........ 1.784583e+03
## var 7:
## best............ 9.247469e+02
## mean............ 8.412525e+02
## variance........ 3.471076e+04
## var 8:
## best............ 9.343419e+02
## mean............ 9.312602e+02
## variance........ 3.092140e+02
## var 9:
## best............ 5.820357e+02
## mean............ 5.789174e+02
## variance........ 1.357159e+02
## var 10:
## best............ 1.949297e+02
## mean............ 2.400511e+02
## variance........ 1.359206e+04
## var 11:
## best............ 1.263023e+02
## mean............ 1.238253e+02
## variance........ 1.905438e+02
## var 12:
## best............ 4.191775e+02
## mean............ 4.012987e+02
## variance........ 5.559249e+03
## var 13:
## best............ 9.269344e+02
## mean............ 9.223748e+02
## variance........ 1.576272e+02
## var 14:
## best............ 3.153829e+02
## mean............ 3.220871e+02
## variance........ 1.884053e+02
## 
## GENERATION: 5
## Lexical Fit..... 2.057512e-01  2.057512e-01  2.705633e-01  2.796699e-01  3.015794e-01  3.173114e-01  3.173114e-01  3.173114e-01  3.173114e-01  4.939436e-01  5.630807e-01  7.022789e-01  7.801215e-01  8.453031e-01  8.788599e-01  8.797346e-01  8.969526e-01  9.172177e-01  9.172177e-01  9.172177e-01  9.446503e-01  9.446503e-01  9.658319e-01  9.658319e-01  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  
## #unique......... 13, #Total UniqueCount: 80
## var 1:
## best............ 1.798465e+02
## mean............ 2.201109e+02
## variance........ 1.766000e+04
## var 2:
## best............ 6.755342e+01
## mean............ 6.750422e+01
## variance........ 2.004267e+00
## var 3:
## best............ 3.493640e+02
## mean............ 3.388136e+02
## variance........ 1.623358e+03
## var 4:
## best............ 5.706795e+02
## mean............ 5.717444e+02
## variance........ 1.681180e+01
## var 5:
## best............ 7.820135e+02
## mean............ 7.919621e+02
## variance........ 1.236508e+03
## var 6:
## best............ 2.726131e+02
## mean............ 2.722081e+02
## variance........ 1.434388e+00
## var 7:
## best............ 9.247469e+02
## mean............ 9.236964e+02
## variance........ 3.596068e+03
## var 8:
## best............ 9.343419e+02
## mean............ 9.344642e+02
## variance........ 8.678065e-01
## var 9:
## best............ 5.820357e+02
## mean............ 5.819482e+02
## variance........ 1.515294e+02
## var 10:
## best............ 1.949297e+02
## mean............ 1.952928e+02
## variance........ 1.373523e+01
## var 11:
## best............ 1.263023e+02
## mean............ 1.273412e+02
## variance........ 1.773052e+01
## var 12:
## best............ 4.191775e+02
## mean............ 4.254978e+02
## variance........ 1.203550e+03
## var 13:
## best............ 9.269344e+02
## mean............ 9.258438e+02
## variance........ 4.180780e+00
## var 14:
## best............ 3.153829e+02
## mean............ 3.153620e+02
## variance........ 9.430142e-01
## 
## 'wait.generations' limit reached.
## No significant improvement in 1 generations.
## 
## Solution Lexical Fitness Value:
## 2.057512e-01  2.057512e-01  2.705633e-01  2.796699e-01  3.015794e-01  3.173114e-01  3.173114e-01  3.173114e-01  3.173114e-01  4.939436e-01  5.630807e-01  7.022789e-01  7.801215e-01  8.453031e-01  8.788599e-01  8.797346e-01  8.969526e-01  9.172177e-01  9.172177e-01  9.172177e-01  9.446503e-01  9.446503e-01  9.658319e-01  9.658319e-01  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  
## 
## Parameters at the Solution:
## 
##  X[ 1] : 1.798465e+02
##  X[ 2] : 6.755342e+01
##  X[ 3] : 3.493640e+02
##  X[ 4] : 5.706795e+02
##  X[ 5] : 7.820135e+02
##  X[ 6] : 2.726131e+02
##  X[ 7] : 9.247469e+02
##  X[ 8] : 9.343419e+02
##  X[ 9] : 5.820357e+02
##  X[10] : 1.949297e+02
##  X[11] : 1.263023e+02
##  X[12] : 4.191775e+02
##  X[13] : 9.269344e+02
##  X[14] : 3.153829e+02
## 
## Solution Found Generation 3
## Number of Generations Run 5
## 
## Sun Mar 25 01:25:42 2018
## Total run time : 0 hours 0 minutes and 1 seconds
# note: estimand="ATE" here, so despite the .ATT suffix this object holds an ATE estimate
geneticMatching.ATT <- Match(Y=re78, Tr=treat, X=X, estimand="ATE", Weight.matrix=geneticMatching)
summary(geneticMatching.ATT)
## 
## Estimate...  1870.7 
## AI SE......  736.51 
## T-stat.....  2.5399 
## p.val......  0.011088 
## 
## Original number of observations..............  445 
## Original number of treated obs...............  185 
## Matched number of observations...............  445 
## Matched number of observations  (unweighted).  585
detach(nsw_dw)

# 95% confidence interval
with(geneticMatching.ATT, c(est - 1.96*se, est + 1.96*se))
## [1]  427.1013 3314.2132

# balance test
geneticMatching.balance <- MatchBalance(treat ~ age + I(age^2) + education + I(education^2) + black +
                                        hispanic + married + nodegree + re74 + I(re74^2) + re75 + I(re75^2),
                                        data=nsw_dw, match.out=geneticMatching.ATT, nboots=10)
## 
## ***** (V1) age *****
##                        Before Matching        After Matching
## mean treatment........     25.816             25.016 
## mean control..........     25.054             25.063 
## std mean diff.........     10.655           -0.71156 
## 
## mean raw eQQ diff.....    0.94054            0.25983 
## med  raw eQQ diff.....          1                  0 
## max  raw eQQ diff.....          7                  8 
## 
## mean eCDF diff........   0.025364          0.0068879 
## med  eCDF diff........   0.022193          0.0051282 
## max  eCDF diff........   0.065177           0.032479 
## 
## var ratio (Tr/Co).....     1.0278            0.91874 
## T-test p-value........    0.26594            0.78012 
## KS Bootstrap p-value..        0.4                0.8 
## KS Naive p-value......     0.7481            0.91722 
## KS Statistic..........   0.065177           0.032479 
## 
## 
## ***** (V2) I(age^2) *****
##                        Before Matching        After Matching
## mean treatment........     717.39             669.67 
## mean control..........     677.32             675.92 
## std mean diff.........     9.2937            -1.6151 
## 
## mean raw eQQ diff.....     56.076             17.012 
## med  raw eQQ diff.....         43                  0 
## max  raw eQQ diff.....        721                800 
## 
## mean eCDF diff........   0.025364          0.0068879 
## med  eCDF diff........   0.022193          0.0051282 
## max  eCDF diff........   0.065177           0.032479 
## 
## var ratio (Tr/Co).....     1.0115            0.86003 
## T-test p-value........    0.33337            0.56308 
## KS Bootstrap p-value..        0.4                0.8 
## KS Naive p-value......     0.7481            0.91722 
## KS Statistic..........   0.065177           0.032479 
## 
## 
## ***** (V3) education *****
##                        Before Matching        After Matching
## mean treatment........     10.346             10.204 
## mean control..........     10.088               10.2 
## std mean diff.........     12.806            0.26425 
## 
## mean raw eQQ diff.....    0.40541           0.075214 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          2                  2 
## 
## mean eCDF diff........   0.028698          0.0053724 
## med  eCDF diff........   0.012682          0.0042735 
## max  eCDF diff........    0.12651           0.013675 
## 
## var ratio (Tr/Co).....     1.5513             1.0432 
## T-test p-value........    0.15017            0.87886 
## KS Bootstrap p-value.. < 2.22e-16                  1 
## KS Naive p-value......   0.062873                  1 
## KS Statistic..........    0.12651           0.013675 
## 
## 
## ***** (V4) I(education^2) *****
##                        Before Matching        After Matching
## mean treatment........     111.06             107.02 
## mean control..........     104.37             106.81 
## std mean diff.........     17.012            0.63649 
## 
## mean raw eQQ diff.....     8.7189             1.4188 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....         60                 60 
## 
## mean eCDF diff........   0.028698          0.0053724 
## med  eCDF diff........   0.012682          0.0042735 
## max  eCDF diff........    0.12651           0.013675 
## 
## var ratio (Tr/Co).....     1.6625             1.0984 
## T-test p-value........   0.053676            0.70228 
## KS Bootstrap p-value.. < 2.22e-16                  1 
## KS Naive p-value......   0.062873                  1 
## KS Statistic..........    0.12651           0.013675 
## 
## 
## ***** (V5) black *****
##                        Before Matching        After Matching
## mean treatment........    0.84324            0.83596 
## mean control..........    0.82692            0.84494 
## std mean diff.........     4.4767            -2.4246 
## 
## mean raw eQQ diff.....   0.016216          0.0068376 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........  0.0081601          0.0034188 
## med  eCDF diff........  0.0081601          0.0034188 
## max  eCDF diff........    0.01632          0.0068376 
## 
## var ratio (Tr/Co).....    0.92503             1.0467 
## T-test p-value........    0.64736            0.20575 
## 
## 
## ***** (V6) hispanic *****
##                        Before Matching        After Matching
## mean treatment........   0.059459            0.08764 
## mean control..........    0.10769            0.08764 
## std mean diff.........    -20.341                  0 
## 
## mean raw eQQ diff.....   0.048649                  0 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  0 
## 
## mean eCDF diff........   0.024116                  0 
## med  eCDF diff........   0.024116                  0 
## max  eCDF diff........   0.048233                  0 
## 
## var ratio (Tr/Co).....    0.58288                  1 
## T-test p-value........   0.064043                  1 
## 
## 
## ***** (V7) married *****
##                        Before Matching        After Matching
## mean treatment........    0.18919            0.17079 
## mean control..........    0.15385            0.16854 
## std mean diff.........     8.9995            0.59647 
## 
## mean raw eQQ diff.....   0.037838          0.0017094 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........   0.017672          0.0008547 
## med  eCDF diff........   0.017672          0.0008547 
## max  eCDF diff........   0.035343          0.0017094 
## 
## var ratio (Tr/Co).....     1.1802             1.0106 
## T-test p-value........    0.33425            0.31731 
## 
## 
## ***** (V8) nodegree *****
##                        Before Matching        After Matching
## mean treatment........    0.70811            0.78202 
## mean control..........    0.83462            0.78427 
## std mean diff.........    -27.751           -0.54367 
## 
## mean raw eQQ diff.....    0.12432          0.0017094 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........   0.063254          0.0008547 
## med  eCDF diff........   0.063254          0.0008547 
## max  eCDF diff........    0.12651          0.0017094 
## 
## var ratio (Tr/Co).....     1.4998             1.0075 
## T-test p-value........  0.0020368            0.31731 
## 
## 
## ***** (V9) re74 *****
##                        Before Matching        After Matching
## mean treatment........     2095.6             1851.6 
## mean control..........       2107             1867.5 
## std mean diff.........   -0.23437            -0.3388 
## 
## mean raw eQQ diff.....     487.98             241.36 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....       8413             7870.3 
## 
## mean eCDF diff........   0.019501          0.0089929 
## med  eCDF diff........   0.016112           0.008547 
## max  eCDF diff........   0.047089            0.02906 
## 
## var ratio (Tr/Co).....     0.7381            0.83463 
## T-test p-value........    0.98186            0.89695 
## KS Bootstrap p-value..        0.8                0.4 
## KS Naive p-value......    0.97023            0.96583 
## KS Statistic..........   0.047089            0.02906 
## 
## 
## ***** (V10) I(re74^2) *****
##                        Before Matching        After Matching
## mean treatment........   28141412           25209598 
## mean control..........   36667400           29583990 
## std mean diff.........    -7.4722            -4.2548 
## 
## mean raw eQQ diff.....   13311768            6319599 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....  365146183          470088278 
## 
## mean eCDF diff........   0.019501          0.0089929 
## med  eCDF diff........   0.016112           0.008547 
## max  eCDF diff........   0.047089            0.02906 
## 
## var ratio (Tr/Co).....    0.50382            0.58969 
## T-test p-value........    0.51322            0.27056 
## KS Bootstrap p-value..        0.8                0.4 
## KS Naive p-value......    0.97023            0.96583 
## KS Statistic..........   0.047089            0.02906 
## 
## 
## ***** (V11) re75 *****
##                        Before Matching        After Matching
## mean treatment........     1532.1             1268.4 
## mean control..........     1266.9               1199 
## std mean diff.........     8.2363             2.3033 
## 
## mean raw eQQ diff.....     367.61              102.6 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....     2110.3             2510.5 
## 
## mean eCDF diff........   0.051061          0.0093411 
## med  eCDF diff........   0.064657          0.0068376 
## max  eCDF diff........    0.10748           0.030769 
## 
## var ratio (Tr/Co).....     1.0763             1.0502 
## T-test p-value........    0.38527            0.30158 
## KS Bootstrap p-value..        0.1                0.5 
## KS Naive p-value......    0.16449            0.94465 
## KS Statistic..........    0.10748           0.030769 
## 
## 
## ***** (V12) I(re75^2) *****
##                        Before Matching        After Matching
## mean treatment........   12654750           10685430 
## mean control..........   11196524           10079958 
## std mean diff.........     2.6024              1.172 
## 
## mean raw eQQ diff.....    2840847            1383849 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....  101660120          101660120 
## 
## mean eCDF diff........   0.051061          0.0093411 
## med  eCDF diff........   0.064657          0.0068376 
## max  eCDF diff........    0.10748           0.030769 
## 
## var ratio (Tr/Co).....     1.4609             1.2575 
## T-test p-value........    0.77178            0.49394 
## KS Bootstrap p-value..        0.1                0.5 
## KS Naive p-value......    0.16449            0.94465 
## KS Statistic..........    0.10748           0.030769 
## 
## 
## Before Matching Minimum p.value: < 2.22e-16 
## Variable Name(s): education I(education^2)  Number(s): 3 4 
## 
## After Matching Minimum p.value: 0.20575 
## Variable Name(s): black  Number(s): 5

Genetic matching is a better way to obtain covariate balance. It uses a genetic algorithm to search for covariate weights that raise the lowest balance p-value (the p-value of the least balanced covariate), which produces a better balanced match overall. With R's default settings, genetic matching here raises the minimum p-value to about 0.1 on average; in the run shown above it reaches 0.20575 (for black), up from < 2.22e-16 before matching. That is high enough to accept that the matching did balance the covariates. Because the calls above use estimand="ATE", the reported estimate is an average treatment effect rather than an ATT: 1870.7, with a 95% confidence interval between 427.1013 and 3314.2132.
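The "Lexical Fit" lines in the GenMatch log list the balance p-values sorted from smallest to largest, which suggests that candidate weightings are compared worst covariate first. The sketch below illustrates that comparison in base R. The helper `lexical_better` and the p-values are hypothetical and only meant to show the idea; this is not the code the Matching package runs internally.

```r
# Compare two sets of balance p-values "worst first":
# sort each ascending, then decide on the first position where they differ.
lexical_better <- function(p_new, p_old, tol = 1e-12) {
  a <- sort(p_new)
  b <- sort(p_old)
  stopifnot(length(a) == length(b))
  diffs <- a - b
  first <- which(abs(diffs) > tol)[1]   # first position with a real difference
  if (is.na(first)) return(FALSE)       # identical fits: no improvement
  diffs[first] > 0                      # better if the new value is larger there
}

# Hypothetical p-values for three covariates under two candidate weightings
p_old <- c(0.03, 0.60, 0.90)
p_new <- c(0.20, 0.35, 0.90)

lexical_better(p_new, p_old)
```

Under this criterion the new weighting wins because its worst p-value (0.20) beats the old worst (0.03), even though its second-worst value is lower. That is the sense in which the search focuses on the least balanced covariate.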

Genetic Matching Improvement 1

# bind the cps_control observations to the treated observations
cps_nsw <- rbind(cps_control, nsw_dw.treated)

# genetic matching on the combined dataset
attach(cps_nsw)
X = cbind(age, I(age^2), education, I(education^2), I(age*education), black, hispanic, married,
          nodegree, re74, I(re74^2), re75, I(re75^2), I(re74*re75))
cps_nsw.geneticMatching <- GenMatch(Tr=treat, X=X, estimand="ATE", M=1, pop.size=16,
                                    ties=FALSE,   # ties is set to false to speed up the code
                                    max.generations=10, wait.generations=1)
## 
## 
## Sun Mar 25 01:25:47 2018
## Domains:
##  0.000000e+00   <=  X1   <=    1.000000e+03 
##  0.000000e+00   <=  X2   <=    1.000000e+03 
##  0.000000e+00   <=  X3   <=    1.000000e+03 
##  0.000000e+00   <=  X4   <=    1.000000e+03 
##  0.000000e+00   <=  X5   <=    1.000000e+03 
##  0.000000e+00   <=  X6   <=    1.000000e+03 
##  0.000000e+00   <=  X7   <=    1.000000e+03 
##  0.000000e+00   <=  X8   <=    1.000000e+03 
##  0.000000e+00   <=  X9   <=    1.000000e+03 
##  0.000000e+00   <=  X10  <=    1.000000e+03 
##  0.000000e+00   <=  X11  <=    1.000000e+03 
##  0.000000e+00   <=  X12  <=    1.000000e+03 
##  0.000000e+00   <=  X13  <=    1.000000e+03 
##  0.000000e+00   <=  X14  <=    1.000000e+03 
## 
## Data Type: Floating Point
## Operators (code number, name, population) 
##  (1) Cloning...........................  1
##  (2) Uniform Mutation..................  2
##  (3) Boundary Mutation.................  2
##  (4) Non-Uniform Mutation..............  2
##  (5) Polytope Crossover................  2
##  (6) Simple Crossover..................  2
##  (7) Whole Non-Uniform Mutation........  2
##  (8) Heuristic Crossover...............  2
##  (9) Local-Minimum Crossover...........  0
## 
## SOFT Maximum Number of Generations: 10
## Maximum Nonchanging Generations: 1
## Population size       : 16
## Convergence Tolerance: 1.000000e-03
## 
## Not Using the BFGS Derivative Based Optimizer on the Best Individual Each Generation.
## Not Checking Gradients before Stopping.
## Using Out of Bounds Individuals.
## 
## Maximization Problem.
## GENERATION: 0 (initializing the population)
## Lexical Fit..... 0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  4.113154e-12  4.113154e-12  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  
## #unique......... 16, #Total UniqueCount: 16
## var 1:
## best............ 9.275107e-01
## mean............ 4.280594e+02
## variance........ 1.002552e+05
## var 2:
## best............ 5.540849e+02
## mean............ 5.465021e+02
## variance........ 1.121796e+05
## var 3:
## best............ 6.201271e+02
## mean............ 5.386414e+02
## variance........ 7.914807e+04
## var 4:
## best............ 1.317797e+02
## mean............ 5.066893e+02
## variance........ 1.325341e+05
## var 5:
## best............ 6.613368e+02
## mean............ 4.586361e+02
## variance........ 7.496608e+04
## var 6:
## best............ 9.685073e+02
## mean............ 4.975909e+02
## variance........ 1.143074e+05
## var 7:
## best............ 3.437448e+02
## mean............ 5.151082e+02
## variance........ 6.896991e+04
## var 8:
## best............ 3.401326e+01
## mean............ 4.673805e+02
## variance........ 9.919997e+04
## var 9:
## best............ 7.443439e+02
## mean............ 6.156917e+02
## variance........ 7.603490e+04
## var 10:
## best............ 1.510109e+02
## mean............ 4.672184e+02
## variance........ 7.776797e+04
## var 11:
## best............ 4.210445e+02
## mean............ 5.393222e+02
## variance........ 9.230107e+04
## var 12:
## best............ 6.263855e+02
## mean............ 5.011024e+02
## variance........ 1.111816e+05
## var 13:
## best............ 1.920704e+02
## mean............ 3.961493e+02
## variance........ 8.682024e+04
## var 14:
## best............ 5.752659e+01
## mean............ 4.998127e+02
## variance........ 7.721434e+04
## 
## GENERATION: 1
## Lexical Fit..... 0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  2.011820e-07  2.011820e-07  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  
## #unique......... 13, #Total UniqueCount: 29
## var 1:
## best............ 9.275107e-01
## mean............ 1.706820e+02
## variance........ 7.686751e+04
## var 2:
## best............ 5.540849e+02
## mean............ 6.115401e+02
## variance........ 1.379646e+04
## var 3:
## best............ 5.570844e+02
## mean............ 5.421841e+02
## variance........ 1.883358e+04
## var 4:
## best............ 1.722150e+02
## mean............ 1.928798e+02
## variance........ 7.377514e+04
## var 5:
## best............ 6.613368e+02
## mean............ 5.895555e+02
## variance........ 2.619710e+04
## var 6:
## best............ 9.685073e+02
## mean............ 8.261733e+02
## variance........ 4.530463e+04
## var 7:
## best............ 4.300470e+02
## mean............ 4.189584e+02
## variance........ 4.223083e+04
## var 8:
## best............ 3.401326e+01
## mean............ 2.082935e+02
## variance........ 7.235118e+04
## var 9:
## best............ 7.816016e+02
## mean............ 8.016537e+02
## variance........ 1.296299e+04
## var 10:
## best............ 1.510109e+02
## mean............ 2.989919e+02
## variance........ 4.935110e+04
## var 11:
## best............ 4.010951e+02
## mean............ 5.330253e+02
## variance........ 1.690894e+04
## var 12:
## best............ 6.263855e+02
## mean............ 6.774050e+02
## variance........ 6.917801e+03
## var 13:
## best............ 1.679291e+02
## mean............ 1.772045e+02
## variance........ 5.299291e+03
## var 14:
## best............ 5.752659e+01
## mean............ 1.925354e+02
## variance........ 3.038729e+04
## 
## GENERATION: 2
## Lexical Fit..... 0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  2.198094e-05  2.198094e-05  5.637069e-01  5.637069e-01  1.000000e+00  1.000000e+00  
## #unique......... 13, #Total UniqueCount: 42
## var 1:
## best............ 9.275107e-01
## mean............ 6.069794e+00
## variance........ 3.195081e+02
## var 2:
## best............ 5.540849e+02
## mean............ 5.725737e+02
## variance........ 6.576686e+03
## var 3:
## best............ 8.920537e+01
## mean............ 3.915143e+02
## variance........ 5.200025e+04
## var 4:
## best............ 1.317797e+02
## mean............ 1.709342e+02
## variance........ 9.136457e+03
## var 5:
## best............ 4.693917e+02
## mean............ 6.485712e+02
## variance........ 2.159074e+03
## var 6:
## best............ 9.685073e+02
## mean............ 9.685594e+02
## variance........ 2.002155e-02
## var 7:
## best............ 3.437448e+02
## mean............ 3.874747e+02
## variance........ 1.991199e+03
## var 8:
## best............ 3.401326e+01
## mean............ 3.756305e+01
## variance........ 2.144090e+02
## var 9:
## best............ 7.443439e+02
## mean............ 7.577680e+02
## variance........ 5.924252e+02
## var 10:
## best............ 1.913898e+02
## mean............ 1.744536e+02
## variance........ 6.558745e+03
## var 11:
## best............ 4.268644e+02
## mean............ 4.149565e+02
## variance........ 2.209661e+02
## var 12:
## best............ 8.620674e+02
## mean............ 6.428866e+02
## variance........ 3.249749e+03
## var 13:
## best............ 1.920704e+02
## mean............ 1.766784e+02
## variance........ 4.042992e+02
## var 14:
## best............ 3.301506e+02
## mean............ 8.755586e+01
## variance........ 6.097591e+03
## 
## 'wait.generations' limit reached.
## No significant improvement in 1 generations.
## 
## Solution Lexical Fitness Value:
## 0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  2.198094e-05  2.198094e-05  5.637069e-01  5.637069e-01  1.000000e+00  1.000000e+00  
## 
## Parameters at the Solution:
## 
##  X[ 1] : 9.275107e-01
##  X[ 2] : 5.540849e+02
##  X[ 3] : 8.920537e+01
##  X[ 4] : 1.317797e+02
##  X[ 5] : 4.693917e+02
##  X[ 6] : 9.685073e+02
##  X[ 7] : 3.437448e+02
##  X[ 8] : 3.401326e+01
##  X[ 9] : 7.443439e+02
##  X[10] : 1.913898e+02
##  X[11] : 4.268644e+02
##  X[12] : 8.620674e+02
##  X[13] : 1.920704e+02
##  X[14] : 3.301506e+02
## 
## Solution Found Generation 1
## Number of Generations Run 2
## 
## Sun Mar 25 01:28:31 2018
## Total run time : 0 hours 2 minutes and 44 seconds
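# note: estimand="ATE" is requested here, so the estimate below is an ATE even though the object name says ATT
cps_nsw.geneticMatching.ATT <- Match(Y=re78, Tr=treat, X=X, estimand="ATE", 
                                     Weight.matrix=cps_nsw.geneticMatching)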
summary(cps_nsw.geneticMatching.ATT)
## 
## Estimate...  -5380.8 
## AI SE......  2883.7 
## T-stat.....  -1.8659 
## p.val......  0.062049 
## 
## Original number of observations..............  16177 
## Original number of treated obs...............  185 
## Matched number of observations...............  16177 
## Matched number of observations  (unweighted).  16290
detach(cps_nsw)

# 95% confidence interval (normal approximation), read directly from the Match object
c(cps_nsw.geneticMatching.ATT$est - 1.96*cps_nsw.geneticMatching.ATT$se,
  cps_nsw.geneticMatching.ATT$est + 1.96*cps_nsw.geneticMatching.ATT$se)
## [1] -11032.7568    271.2242

# balance test
cps_nsw.geneticMatching.balance <- MatchBalance(treat ~ age + I(age^2) + education + I(education^2) +
                                                black + hispanic + married + nodegree + re74 +
                                                I(re74^2) + re75 + I(re75^2), data=cps_nsw,
                                                match.out=cps_nsw.geneticMatching.ATT, nboots=10)
## 
## ***** (V1) age *****
##                        Before Matching        After Matching
## mean treatment........     25.816             26.894 
## mean control..........     33.225             33.138 
## std mean diff.........    -103.55            -78.487 
## 
## mean raw eQQ diff.....     7.4865             6.4125 
## med  raw eQQ diff.....          6                  7 
## max  raw eQQ diff.....         16                 14 
## 
## mean eCDF diff........    0.18628            0.16031 
## med  eCDF diff........    0.19692            0.16249 
## max  eCDF diff........    0.34274            0.38643 
## 
## var ratio (Tr/Co).....    0.41964            0.51926 
## T-test p-value........ < 2.22e-16         < 2.22e-16 
## KS Bootstrap p-value.. < 2.22e-16         < 2.22e-16 
## KS Naive p-value...... < 2.22e-16         < 2.22e-16 
## KS Statistic..........    0.34274            0.38643 
## 
## 
## ***** (V2) I(age^2) *****
##                        Before Matching        After Matching
## mean treatment........     717.39             786.57 
## mean control..........     1225.9               1220 
## std mean diff.........    -117.92            -89.722 
## 
## mean raw eQQ diff.....     513.91             437.94 
## med  raw eQQ diff.....        336                477 
## max  raw eQQ diff.....       1312               1344 
## 
## mean eCDF diff........    0.18628            0.16031 
## med  eCDF diff........    0.19692            0.16249 
## max  eCDF diff........    0.34274            0.38643 
## 
## var ratio (Tr/Co).....      0.302            0.38011 
## T-test p-value........ < 2.22e-16         < 2.22e-16 
## KS Bootstrap p-value.. < 2.22e-16         < 2.22e-16 
## KS Naive p-value...... < 2.22e-16         < 2.22e-16 
## KS Statistic..........    0.34274            0.38643 
## 
## 
## ***** (V3) education *****
##                        Before Matching        After Matching
## mean treatment........     10.346             11.521 
## mean control..........     12.028             12.008 
## std mean diff.........    -83.633            -30.046 
## 
## mean raw eQQ diff.....     1.7351            0.92492 
## med  raw eQQ diff.....          2                  0 
## max  raw eQQ diff.....          4                  5 
## 
## mean eCDF diff........   0.090791            0.04868 
## med  eCDF diff........   0.037581           0.026826 
## max  eCDF diff........    0.41227            0.16839 
## 
## var ratio (Tr/Co).....    0.49052            0.31951 
## T-test p-value........ < 2.22e-16         < 2.22e-16 
## KS Bootstrap p-value.. < 2.22e-16         < 2.22e-16 
## KS Naive p-value...... < 2.22e-16         < 2.22e-16 
## KS Statistic..........    0.41227            0.16839 
## 
## 
## ***** (V4) I(education^2) *****
##                        Before Matching        After Matching
## mean treatment........     111.06             135.36 
## mean control..........      152.9             152.42 
## std mean diff.........    -106.46            -47.582 
## 
## mean raw eQQ diff.....     42.168             23.029 
## med  raw eQQ diff.....         40                  0 
## max  raw eQQ diff.....        128                128 
## 
## mean eCDF diff........   0.090791            0.04868 
## med  eCDF diff........   0.037581           0.026826 
## max  eCDF diff........    0.41227            0.16839 
## 
## var ratio (Tr/Co).....    0.34243            0.28564 
## T-test p-value........ < 2.22e-16         < 2.22e-16 
## KS Bootstrap p-value.. < 2.22e-16         < 2.22e-16 
## KS Naive p-value...... < 2.22e-16         < 2.22e-16 
## KS Statistic..........    0.41227            0.16839 
## 
## 
## ***** (V5) black *****
##                        Before Matching        After Matching
## mean treatment........    0.84324           0.082339 
## mean control..........   0.073537           0.082339 
## std mean diff.........     211.13                  0 
## 
## mean raw eQQ diff.....    0.76757                  0 
## med  raw eQQ diff.....          1                  0 
## max  raw eQQ diff.....          1                  0 
## 
## mean eCDF diff........    0.38485                  0 
## med  eCDF diff........    0.38485                  0 
## max  eCDF diff........    0.76971                  0 
## 
## var ratio (Tr/Co).....     1.9506                  1 
## T-test p-value........ < 2.22e-16                  1 
## 
## 
## ***** (V6) hispanic *****
##                        Before Matching        After Matching
## mean treatment........   0.059459            0.07078 
## mean control..........   0.072036           0.071892 
## std mean diff.........    -5.3038           -0.43386 
## 
## mean raw eQQ diff.....   0.016216           0.001105 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........  0.0062883         0.00055249 
## med  eCDF diff........  0.0062883         0.00055249 
## max  eCDF diff........   0.012577           0.001105 
## 
## var ratio (Tr/Co).....    0.84109             0.9857 
## T-test p-value........    0.47458         2.1981e-05 
## 
## 
## ***** (V7) married *****
##                        Before Matching        After Matching
## mean treatment........    0.18919            0.13501 
## mean control..........    0.71173              0.706 
## std mean diff.........    -133.06            -167.08 
## 
## mean raw eQQ diff.....    0.51892            0.56703 
## med  raw eQQ diff.....          1                  1 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........    0.26127            0.28352 
## med  eCDF diff........    0.26127            0.28352 
## max  eCDF diff........    0.52254            0.56703 
## 
## var ratio (Tr/Co).....    0.75167            0.56262 
## T-test p-value........ < 2.22e-16         < 2.22e-16 
## 
## 
## ***** (V8) nodegree *****
##                        Before Matching        After Matching
## mean treatment........    0.70811            0.30049 
## mean control..........    0.29584            0.30055 
## std mean diff.........     90.437          -0.013483 
## 
## mean raw eQQ diff.....    0.41081         6.1387e-05 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........    0.20614         3.0694e-05 
## med  eCDF diff........    0.20614         3.0694e-05 
## max  eCDF diff........    0.41227         6.1387e-05 
## 
## var ratio (Tr/Co).....    0.99753            0.99988 
## T-test p-value........ < 2.22e-16            0.56371 
## 
## 
## ***** (V9) re74 *****
##                        Before Matching        After Matching
## mean treatment........     2095.6             4057.6 
## mean control..........      14017              13884 
## std mean diff.........    -243.96            -222.72 
## 
## mean raw eQQ diff.....      12014             9761.2 
## med  raw eQQ diff.....      13276              10815 
## max  raw eQQ diff.....      23256              19102 
## 
## mean eCDF diff........    0.45911            0.37406 
## med  eCDF diff........    0.50015            0.33892 
## max  eCDF diff........    0.60309            0.58656 
## 
## var ratio (Tr/Co).....    0.26074            0.21101 
## T-test p-value........ < 2.22e-16         < 2.22e-16 
## KS Bootstrap p-value.. < 2.22e-16         < 2.22e-16 
## KS Naive p-value...... < 2.22e-16         < 2.22e-16 
## KS Statistic..........    0.60309            0.58656 
## 
## 
## ***** (V10) I(re74^2) *****
##                        Before Matching        After Matching
## mean treatment........   28141412           35929801 
## mean control..........  288045960          285021708 
## std mean diff.........    -227.78            -381.53 
## 
## mean raw eQQ diff.....  266205831          247439915 
## med  raw eQQ diff.....  221783269          200270349 
## max  raw eQQ diff.....  658838222          623162091 
## 
## mean eCDF diff........    0.45911            0.37406 
## med  eCDF diff........    0.50015            0.33892 
## max  eCDF diff........    0.60309            0.58656 
## 
## var ratio (Tr/Co).....    0.19219            0.06285 
## T-test p-value........ < 2.22e-16         < 2.22e-16 
## KS Bootstrap p-value.. < 2.22e-16         < 2.22e-16 
## KS Naive p-value...... < 2.22e-16         < 2.22e-16 
## KS Statistic..........    0.60309            0.58656 
## 
## 
## ***** (V11) re75 *****
##                        Before Matching        After Matching
## mean treatment........     1532.1             4609.4 
## mean control..........      13651              13513 
## std mean diff.........    -376.45            -207.66 
## 
## mean raw eQQ diff.....      12112             8843.2 
## med  raw eQQ diff.....      13837             9499.6 
## max  raw eQQ diff.....      22438              16788 
## 
## mean eCDF diff........     0.4751            0.34805 
## med  eCDF diff........    0.51248            0.29702 
## max  eCDF diff........     0.6509            0.60454 
## 
## var ratio (Tr/Co).....    0.12059              0.212 
## T-test p-value........ < 2.22e-16         < 2.22e-16 
## KS Bootstrap p-value.. < 2.22e-16         < 2.22e-16 
## KS Naive p-value...... < 2.22e-16         < 2.22e-16 
## KS Statistic..........     0.6509            0.60454 
## 
## 
## ***** (V12) I(re75^2) *****
##                        Before Matching        After Matching
## mean treatment........   12654750           39629696 
## mean control..........  272279442          269310626 
## std mean diff.........    -463.34            -413.34 
## 
## mean raw eQQ diff.....  259843916          228089774 
## med  raw eQQ diff.....  206883640          171289008 
## max  raw eQQ diff.....  629191089          565741310 
## 
## mean eCDF diff........     0.4751            0.34805 
## med  eCDF diff........    0.51248            0.29702 
## max  eCDF diff........     0.6509            0.60454 
## 
## var ratio (Tr/Co).....   0.051503           0.050575 
## T-test p-value........ < 2.22e-16         < 2.22e-16 
## KS Bootstrap p-value.. < 2.22e-16         < 2.22e-16 
## KS Naive p-value...... < 2.22e-16         < 2.22e-16 
## KS Statistic..........     0.6509            0.60454 
## 
## 
## Before Matching Minimum p.value: < 2.22e-16 
## Variable Name(s): age I(age^2) education I(education^2) black married nodegree re74 I(re74^2) re75 I(re75^2)  Number(s): 1 2 3 4 5 7 8 9 10 11 12 
## 
## After Matching Minimum p.value: < 2.22e-16 
## Variable Name(s): age I(age^2) education I(education^2) married re74 I(re74^2) re75 I(re75^2)  Number(s): 1 2 3 4 7 9 10 11 12

By substituting the 260 control observations in the nsw dataset with the 15992 observations in the cps dataset, the genetic matching algorithm has more control observations to match on, which means a greater likelihood of finding a “perfect match.” The estimate produced by this model (an ATE, given estimand="ATE" in the calls above) is in line with the simple difference-in-means analysis between the nsw treated and cps control groups: it gives a negative treatment effect of -5380.8, with a 95% confidence interval between -11032.7568 and 271.2242. The interval includes zero, so the effect is not significant at the 5% level (p = 0.062). However, the balance did not improve much even though there are more observations to match on: after matching, the minimum p-value is still < 2.22e-16 for age, education, married, re74 and re75 and their squared terms. This might be an indication that using the control observations in the cps dataset is inappropriate for the analysis.
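The interval quoted for this model is the estimate plus or minus roughly 1.96 standard errors. As a minimal sketch of that arithmetic, the helper below (`normal_ci`, a hypothetical name) recomputes it from the rounded estimate and AI SE printed by summary() above, so the result agrees with the reported interval only up to rounding.

```r
# Normal-approximation confidence interval: estimate +/- z * standard error
normal_ci <- function(est, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)   # about 1.96 for a 95% interval
  c(lower = est - z * se, upper = est + z * se)
}

# Rounded values copied from the summary() output above
ci <- normal_ci(est = -5380.8, se = 2883.7)
ci

# Does the interval contain zero?
unname(ci["lower"] < 0 & ci["upper"] > 0)
```

An interval that straddles zero matches the printed p-value of 0.062, which is just above the 5% threshold.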

Genetic Matching Improvement 2

# genetic matching with a 0.25 SD caliper (estimand is ATE in this run)
attach(nsw_dw)
X = cbind(age, I(age^2), education, I(education^2), I(age*education), black, hispanic, married,
          nodegree, re74, I(re74^2), re75, I(re75^2), I(re74*re75))
caliper.geneticMatching <- GenMatch(Tr=treat, X=X, estimand="ATE", M=1, pop.size=16, caliper=0.25,
                            max.generations=10, wait.generations=1)
## 
## 
## Sun Mar 25 01:28:46 2018
## Domains:
##  0.000000e+00   <=  X1   <=    1.000000e+03 
##  0.000000e+00   <=  X2   <=    1.000000e+03 
##  0.000000e+00   <=  X3   <=    1.000000e+03 
##  0.000000e+00   <=  X4   <=    1.000000e+03 
##  0.000000e+00   <=  X5   <=    1.000000e+03 
##  0.000000e+00   <=  X6   <=    1.000000e+03 
##  0.000000e+00   <=  X7   <=    1.000000e+03 
##  0.000000e+00   <=  X8   <=    1.000000e+03 
##  0.000000e+00   <=  X9   <=    1.000000e+03 
##  0.000000e+00   <=  X10  <=    1.000000e+03 
##  0.000000e+00   <=  X11  <=    1.000000e+03 
##  0.000000e+00   <=  X12  <=    1.000000e+03 
##  0.000000e+00   <=  X13  <=    1.000000e+03 
##  0.000000e+00   <=  X14  <=    1.000000e+03 
## 
## Data Type: Floating Point
## Operators (code number, name, population) 
##  (1) Cloning...........................  1
##  (2) Uniform Mutation..................  2
##  (3) Boundary Mutation.................  2
##  (4) Non-Uniform Mutation..............  2
##  (5) Polytope Crossover................  2
##  (6) Simple Crossover..................  2
##  (7) Whole Non-Uniform Mutation........  2
##  (8) Heuristic Crossover...............  2
##  (9) Local-Minimum Crossover...........  0
## 
## SOFT Maximum Number of Generations: 10
## Maximum Nonchanging Generations: 1
## Population size       : 16
## Convergence Tolerance: 1.000000e-03
## 
## Not Using the BFGS Derivative Based Optimizer on the Best Individual Each Generation.
## Not Checking Gradients before Stopping.
## Using Out of Bounds Individuals.
## 
## Maximization Problem.
## GENERATION: 0 (initializing the population)
## Lexical Fit..... 1.154871e-01  1.538568e-01  2.223815e-01  2.801223e-01  5.124101e-01  7.930811e-01  8.879696e-01  9.389695e-01  9.999994e-01  9.999994e-01  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  
## #unique......... 16, #Total UniqueCount: 16
## var 1:
## best............ 5.174626e+02
## mean............ 4.183607e+02
## variance........ 8.588364e+04
## var 2:
## best............ 2.517747e+02
## mean............ 5.616477e+02
## variance........ 8.151522e+04
## var 3:
## best............ 5.423325e+02
## mean............ 5.091512e+02
## variance........ 8.353513e+04
## var 4:
## best............ 8.959062e+02
## mean............ 5.195744e+02
## variance........ 9.982392e+04
## var 5:
## best............ 6.916058e+02
## mean............ 5.752450e+02
## variance........ 9.761023e+04
## var 6:
## best............ 3.234974e+02
## mean............ 3.557615e+02
## variance........ 5.405763e+04
## var 7:
## best............ 9.283781e+02
## mean............ 5.471011e+02
## variance........ 1.329645e+05
## var 8:
## best............ 9.993378e+01
## mean............ 5.028265e+02
## variance........ 8.870994e+04
## var 9:
## best............ 1.287469e+02
## mean............ 4.710720e+02
## variance........ 7.529871e+04
## var 10:
## best............ 2.668056e+01
## mean............ 3.327088e+02
## variance........ 5.780442e+04
## var 11:
## best............ 9.778556e+02
## mean............ 3.952060e+02
## variance........ 9.840715e+04
## var 12:
## best............ 6.885289e+02
## mean............ 4.208080e+02
## variance........ 7.460026e+04
## var 13:
## best............ 9.448080e+02
## mean............ 4.509797e+02
## variance........ 8.334366e+04
## var 14:
## best............ 5.331021e+02
## mean............ 4.198698e+02
## variance........ 7.637572e+04
## 
## GENERATION: 1
## Lexical Fit..... 1.154871e-01  1.538568e-01  2.223815e-01  2.801223e-01  5.124101e-01  7.930811e-01  8.879696e-01  9.389695e-01  9.999994e-01  9.999994e-01  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  
## #unique......... 13, #Total UniqueCount: 29
## var 1:
## best............ 5.174626e+02
## mean............ 5.521987e+02
## variance........ 1.638715e+04
## var 2:
## best............ 2.517747e+02
## mean............ 4.626389e+02
## variance........ 8.729209e+04
## var 3:
## best............ 5.423325e+02
## mean............ 6.548286e+02
## variance........ 9.865970e+03
## var 4:
## best............ 8.959062e+02
## mean............ 7.013638e+02
## variance........ 5.099725e+04
## var 5:
## best............ 6.916058e+02
## mean............ 5.686821e+02
## variance........ 2.581440e+04
## var 6:
## best............ 3.234974e+02
## mean............ 5.118322e+02
## variance........ 6.283828e+04
## var 7:
## best............ 9.283781e+02
## mean............ 6.071086e+02
## variance........ 1.466464e+05
## var 8:
## best............ 9.993378e+01
## mean............ 2.197013e+02
## variance........ 6.821922e+04
## var 9:
## best............ 1.287469e+02
## mean............ 3.756282e+02
## variance........ 7.588449e+04
## var 10:
## best............ 2.668056e+01
## mean............ 1.539071e+02
## variance........ 1.692154e+04
## var 11:
## best............ 9.778556e+02
## mean............ 5.816245e+02
## variance........ 1.544968e+05
## var 12:
## best............ 6.885289e+02
## mean............ 6.686229e+02
## variance........ 6.514926e+03
## var 13:
## best............ 9.448080e+02
## mean............ 7.814687e+02
## variance........ 3.030432e+04
## var 14:
## best............ 5.331021e+02
## mean............ 4.580865e+02
## variance........ 1.066286e+04
## 
## GENERATION: 2
## Lexical Fit..... 1.154871e-01  1.538568e-01  2.223815e-01  2.801223e-01  5.124101e-01  7.930811e-01  8.879696e-01  9.389695e-01  9.999994e-01  9.999994e-01  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  
## #unique......... 12, #Total UniqueCount: 41
## var 1:
## best............ 5.174626e+02
## mean............ 6.122315e+02
## variance........ 2.415755e+04
## var 2:
## best............ 2.517747e+02
## mean............ 4.516073e+02
## variance........ 6.589793e+04
## var 3:
## best............ 5.423325e+02
## mean............ 6.500472e+02
## variance........ 1.136768e+04
## var 4:
## best............ 8.959062e+02
## mean............ 7.118880e+02
## variance........ 4.088882e+04
## var 5:
## best............ 6.916058e+02
## mean............ 5.930714e+02
## variance........ 2.269066e+04
## var 6:
## best............ 3.234974e+02
## mean............ 4.792907e+02
## variance........ 5.398690e+04
## var 7:
## best............ 9.283781e+02
## mean............ 6.253168e+02
## variance........ 1.214839e+05
## var 8:
## best............ 9.993378e+01
## mean............ 3.031994e+02
## variance........ 7.139733e+04
## var 9:
## best............ 1.287469e+02
## mean............ 3.618528e+02
## variance........ 6.408257e+04
## var 10:
## best............ 2.668056e+01
## mean............ 1.432764e+02
## variance........ 9.984163e+03
## var 11:
## best............ 9.778556e+02
## mean............ 5.749167e+02
## variance........ 1.317945e+05
## var 12:
## best............ 6.885289e+02
## mean............ 6.825786e+02
## variance........ 6.364565e+03
## var 13:
## best............ 9.448080e+02
## mean............ 7.546748e+02
## variance........ 3.145562e+04
## var 14:
## best............ 5.331021e+02
## mean............ 4.659258e+02
## variance........ 1.614830e+04
## 
## 'wait.generations' limit reached.
## No significant improvement in 1 generations.
## 
## Solution Lexical Fitness Value:
## 1.154871e-01  1.538568e-01  2.223815e-01  2.801223e-01  5.124101e-01  7.930811e-01  8.879696e-01  9.389695e-01  9.999994e-01  9.999994e-01  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  
## 
## Parameters at the Solution:
## 
##  X[ 1] : 5.174626e+02
##  X[ 2] : 2.517747e+02
##  X[ 3] : 5.423325e+02
##  X[ 4] : 8.959062e+02
##  X[ 5] : 6.916058e+02
##  X[ 6] : 3.234974e+02
##  X[ 7] : 9.283781e+02
##  X[ 8] : 9.993378e+01
##  X[ 9] : 1.287469e+02
##  X[10] : 2.668056e+01
##  X[11] : 9.778556e+02
##  X[12] : 6.885289e+02
##  X[13] : 9.448080e+02
##  X[14] : 5.331021e+02
## 
## Solution Found Generation 1
## Number of Generations Run 2
## 
## Sun Mar 25 01:28:47 2018
## Total run time : 0 hours 0 minutes and 1 seconds
# note: estimand="ATE", so this object holds an ATE estimate despite the .ATT suffix
caliper.geneticMatching.ATT <- Match(Y=re78, Tr=treat, X=X, estimand="ATE", caliper=0.25,
                                     Weight.matrix=caliper.geneticMatching)
summary(caliper.geneticMatching.ATT)
## 
## Estimate...  1395.3 
## AI SE......  382.45 
## T-stat.....  3.6484 
## p.val......  0.0002639 
## 
## Original number of observations..............  445 
## Original number of treated obs...............  185 
## Matched number of observations...............  204 
## Matched number of observations  (unweighted).  328 
## 
## Caliper (SDs)........................................   0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 
## Number of obs dropped by 'exact' or 'caliper'  241
detach(nsw_dw)

# 95% confidence interval
attach(caliper.geneticMatching.ATT)
## The following object is masked from package:base:
## 
##     version
c(est-1.96*se, est+1.96*se)
## [1]  645.7264 2144.9453
detach(caliper.geneticMatching.ATT)

# balance test
caliper.geneticMatching.balance <- MatchBalance(treat ~ age + I(age^2) + education + I(education^2) +
                                                black + hispanic + married + nodegree + re74 + I(re74^2) +
                                                re75 + I(re75^2), data=nsw_dw,
                                                match.out=caliper.geneticMatching.ATT, nboots=10)
## 
## ***** (V1) age *****
##                        Before Matching        After Matching
## mean treatment........     25.816             23.446 
## mean control..........     25.054             23.456 
## std mean diff.........     10.655           -0.16652 
## 
## mean raw eQQ diff.....    0.94054            0.14939 
## med  raw eQQ diff.....          1                  0 
## max  raw eQQ diff.....          7                  1 
## 
## mean eCDF diff........   0.025364          0.0057458 
## med  eCDF diff........   0.022193          0.0030488 
## max  eCDF diff........   0.065177           0.018293 
## 
## var ratio (Tr/Co).....     1.0278            0.97584 
## T-test p-value........    0.26594            0.79308 
## KS Bootstrap p-value..        0.5                  1 
## KS Naive p-value......     0.7481                  1 
## KS Statistic..........   0.065177           0.018293 
## 
## 
## ***** (V2) I(age^2) *****
##                        Before Matching        After Matching
## mean treatment........     717.39             584.21 
## mean control..........     677.32             585.52 
## std mean diff.........     9.2937           -0.40371 
## 
## mean raw eQQ diff.....     56.076             8.1982 
## med  raw eQQ diff.....         43                  0 
## max  raw eQQ diff.....        721                 89 
## 
## mean eCDF diff........   0.025364          0.0057458 
## med  eCDF diff........   0.022193          0.0030488 
## max  eCDF diff........   0.065177           0.018293 
## 
## var ratio (Tr/Co).....     1.0115            0.96054 
## T-test p-value........    0.33337            0.51241 
## KS Bootstrap p-value..        0.5                  1 
## KS Naive p-value......     0.7481                  1 
## KS Statistic..........   0.065177           0.018293 
## 
## 
## ***** (V3) education *****
##                        Before Matching        After Matching
## mean treatment........     10.346              10.23 
## mean control..........     10.088              10.23 
## std mean diff.........     12.806                  0 
## 
## mean raw eQQ diff.....    0.40541                  0 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          2                  0 
## 
## mean eCDF diff........   0.028698                  0 
## med  eCDF diff........   0.012682                  0 
## max  eCDF diff........    0.12651                  0 
## 
## var ratio (Tr/Co).....     1.5513                  1 
## T-test p-value........    0.15017                  1 
## KS Bootstrap p-value.. < 2.22e-16                  1 
## KS Naive p-value......   0.062873                  1 
## KS Statistic..........    0.12651                  0 
## 
## 
## ***** (V4) I(education^2) *****
##                        Before Matching        After Matching
## mean treatment........     111.06             106.73 
## mean control..........     104.37             106.73 
## std mean diff.........     17.012                  0 
## 
## mean raw eQQ diff.....     8.7189                  0 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....         60                  0 
## 
## mean eCDF diff........   0.028698                  0 
## med  eCDF diff........   0.012682                  0 
## max  eCDF diff........    0.12651                  0 
## 
## var ratio (Tr/Co).....     1.6625                  1 
## T-test p-value........   0.053676                  1 
## KS Bootstrap p-value.. < 2.22e-16                  1 
## KS Naive p-value......   0.062873                  1 
## KS Statistic..........    0.12651                  0 
## 
## 
## ***** (V5) black *****
##                        Before Matching        After Matching
## mean treatment........    0.84324            0.95588 
## mean control..........    0.82692            0.95588 
## std mean diff.........     4.4767                  0 
## 
## mean raw eQQ diff.....   0.016216                  0 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  0 
## 
## mean eCDF diff........  0.0081601                  0 
## med  eCDF diff........  0.0081601                  0 
## max  eCDF diff........    0.01632                  0 
## 
## var ratio (Tr/Co).....    0.92503                  1 
## T-test p-value........    0.64736                  1 
## 
## 
## ***** (V6) hispanic *****
##                        Before Matching        After Matching
## mean treatment........   0.059459           0.019608 
## mean control..........    0.10769           0.019608 
## std mean diff.........    -20.341                  0 
## 
## mean raw eQQ diff.....   0.048649                  0 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  0 
## 
## mean eCDF diff........   0.024116                  0 
## med  eCDF diff........   0.024116                  0 
## max  eCDF diff........   0.048233                  0 
## 
## var ratio (Tr/Co).....    0.58288                  1 
## T-test p-value........   0.064043                  1 
## 
## 
## ***** (V7) married *****
##                        Before Matching        After Matching
## mean treatment........    0.18919           0.034314 
## mean control..........    0.15385           0.034314 
## std mean diff.........     8.9995                  0 
## 
## mean raw eQQ diff.....   0.037838                  0 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  0 
## 
## mean eCDF diff........   0.017672                  0 
## med  eCDF diff........   0.017672                  0 
## max  eCDF diff........   0.035343                  0 
## 
## var ratio (Tr/Co).....     1.1802                  1 
## T-test p-value........    0.33425                  1 
## 
## 
## ***** (V8) nodegree *****
##                        Before Matching        After Matching
## mean treatment........    0.70811            0.82353 
## mean control..........    0.83462            0.82353 
## std mean diff.........    -27.751                  0 
## 
## mean raw eQQ diff.....    0.12432                  0 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  0 
## 
## mean eCDF diff........   0.063254                  0 
## med  eCDF diff........   0.063254                  0 
## max  eCDF diff........    0.12651                  0 
## 
## var ratio (Tr/Co).....     1.4998                  1 
## T-test p-value........  0.0020368                  1 
## 
## 
## ***** (V9) re74 *****
##                        Before Matching        After Matching
## mean treatment........     2095.6             48.641 
## mean control..........       2107             62.902 
## std mean diff.........   -0.23437            -5.3426 
## 
## mean raw eQQ diff.....     487.98             17.552 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....       8413               1199 
## 
## mean eCDF diff........   0.019501          0.0051829 
## med  eCDF diff........   0.016112          0.0060976 
## max  eCDF diff........   0.047089          0.0091463 
## 
## var ratio (Tr/Co).....     0.7381             0.4801 
## T-test p-value........    0.98186            0.28012 
## KS Bootstrap p-value..        0.6                0.7 
## KS Naive p-value......    0.97023                  1 
## KS Statistic..........   0.047089          0.0091463 
## 
## 
## ***** (V10) I(re74^2) *****
##                        Before Matching        After Matching
## mean treatment........   28141412              73273 
## mean control..........   36667400             151650 
## std mean diff.........    -7.4722            -14.591 
## 
## mean raw eQQ diff.....   13311768              55879 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....  365146183            6964969 
## 
## mean eCDF diff........   0.019501          0.0051829 
## med  eCDF diff........   0.016112          0.0060976 
## max  eCDF diff........   0.047089          0.0091463 
## 
## var ratio (Tr/Co).....    0.50382            0.19069 
## T-test p-value........    0.51322            0.11549 
## KS Bootstrap p-value..        0.6                0.7 
## KS Naive p-value......    0.97023                  1 
## KS Statistic..........   0.047089          0.0091463 
## 
## 
## ***** (V11) re75 *****
##                        Before Matching        After Matching
## mean treatment........     1532.1             78.926 
## mean control..........     1266.9             90.454 
## std mean diff.........     8.2363            -2.7335 
## 
## mean raw eQQ diff.....     367.61             8.3696 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....     2110.3             345.03 
## 
## mean eCDF diff........   0.051061          0.0079849 
## med  eCDF diff........   0.064657          0.0060976 
## max  eCDF diff........    0.10748           0.021341 
## 
## var ratio (Tr/Co).....     1.0763            0.97985 
## T-test p-value........    0.38527            0.22238 
## KS Bootstrap p-value.. < 2.22e-16                0.7 
## KS Naive p-value......    0.16449                  1 
## KS Statistic..........    0.10748           0.021341 
## 
## 
## ***** (V12) I(re75^2) *****
##                        Before Matching        After Matching
## mean treatment........   12654750             183206 
## mean control..........   11196524             188797 
## std mean diff.........     2.6024           -0.44451 
## 
## mean raw eQQ diff.....    2840847             7233.8 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....  101660120             267110 
## 
## mean eCDF diff........   0.051061          0.0079849 
## med  eCDF diff........   0.064657          0.0060976 
## max  eCDF diff........    0.10748           0.021341 
## 
## var ratio (Tr/Co).....     1.4609              1.011 
## T-test p-value........    0.77178            0.88797 
## KS Bootstrap p-value.. < 2.22e-16                0.7 
## KS Naive p-value......    0.16449                  1 
## KS Statistic..........    0.10748           0.021341 
## 
## 
## Before Matching Minimum p.value: < 2.22e-16 
## Variable Name(s): education I(education^2) re75 I(re75^2)  Number(s): 3 4 11 12 
## 
## After Matching Minimum p.value: 0.11549 
## Variable Name(s): I(re74^2)  Number(s): 10

Caliper matching drops observations that cannot be closely matched. In this implementation, a caliper of 0.25 standard deviations dropped 241 of the 445 observations. This raises two concerns: 1) more than half of the observations are dropped, so the matched sample we analyze is fundamentally different from the population we set out to study; 2) the fact that so many observations cannot be closely matched challenges the validity of the non-caliper matching methods on this dataset, because the assumption that we can find “identical twins” to stand in for Y(0) and Y(1) does not hold here. Caliper matching gives an estimate of 1395.3 with a 95% confidence interval from 645.7264 to 2144.9453. Note that this run sets estimand="ATE", so the figure is an average treatment effect rather than an ATT, despite the .ATT suffix on the object name. The result is not significantly different from the one produced by the default genetic matching model. Because poorly matched observations are dropped, the SE is smaller (382.45), which makes the 95% confidence interval tighter.
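The mechanics of a caliper can be sketched on a single covariate in base R. This is a toy illustration, not the `Match` implementation: the data are hypothetical, matching is one-to-one nearest neighbour with replacement, and the caliper is scaled by the standard deviation of the pooled covariate, which is an assumption made here for simplicity.

```r
# Toy covariate values (hypothetical, not taken from the NSW sample).
x_treated <- c(20, 25, 30, 45)
x_control <- c(21, 24, 26, 33)

caliper_sd <- 0.25
# Assumption: the caliper is expressed in SDs of the pooled covariate.
tolerance <- caliper_sd * sd(c(x_treated, x_control))

# Nearest control (with replacement) for each treated unit.
nearest <- sapply(x_treated, function(x) which.min(abs(x_control - x)))
distance <- abs(x_control[nearest] - x_treated)

# A match is kept only if it falls within the caliper.
kept <- distance <= tolerance

data.frame(x_treated, matched_control = x_control[nearest], distance, kept)
sum(!kept)  # number of treated units dropped by the caliper
```

In this toy sample the two treated units with the largest covariate values have no control within the caliper and are dropped, which is the same trade-off as in the run above: better balance among the retained matches, at the cost of estimating the effect for a narrower group.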

Summary

In conclusion, we were able to obtain a balanced match using genetic matching and to estimate an average treatment effect on the treated (ATT) of 1683.3, with a 95% confidence interval from 193.8664 to 3172.6476. However, the extended analyses suggest that two questions need further investigation: how the Lalonde sample relates to the Current Population Survey data, and whether the matching model was able to find close enough matches within the Lalonde sample to produce a meaningful result.
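The interval comparison made in this report can be checked with a few lines of arithmetic. The base R sketch below uses only figures printed in the report. It assumes the default genetic matching interval was also built as estimate ± 1.96 × SE, so that its half-width divided by 1.96 gives the implied SE. All variable names are illustrative.

```r
# Figures copied from the summaries in this report.
default_ci  <- c(193.8664, 3172.6476)  # default genetic matching
caliper_ci  <- c(645.7264, 2144.9453)  # 0.25 SD caliper run
caliper_est <- 1395.3
caliper_se  <- 382.45

# Rebuild the caliper interval from the rounded estimate and SE.
caliper_rebuilt <- caliper_est + c(-1, 1) * 1.96 * caliper_se

# Assumption: the default interval is estimate +/- 1.96 * SE,
# so half its width divided by 1.96 is the implied SE.
default_se_implied <- diff(default_ci) / (2 * 1.96)

# Share of the 445 observations dropped by the caliper.
share_dropped <- 241 / 445

round(caliper_rebuilt, 1)
round(default_se_implied, 1)
round(diff(caliper_ci) / diff(default_ci), 2)
round(share_dropped, 3)
```

On these figures the caliper interval is roughly half as wide as the default one, while a little over half of the observations are dropped. The two runs also target different estimands, so the comparison is indicative only.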