This demonstration is largely a summary of the following literature.
References
1) Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41-55.
2) Morgan, S. L., & Winship, C. (2015). Counterfactuals and causal inference. Cambridge University Press.
3) Iacus, S. M., King, G., & Porro, G. (2009). cem: Software for coarsened exact matching. Journal of Statistical Software, 30(9), 1-27.
4) Ho, D. E., Imai, K., King, G., & Stuart, E. A. (2011). MatchIt: Nonparametric preprocessing for parametric causal inference. Journal of Statistical Software, http://gking.harvard.edu/matchit.
Lalonde data set (Lalonde, 1986): “This program provided training to selected individuals for 12-18 months and help finding a job in the hopes of increasing their earnings. The treatment variable, treated, is 1 for participants (the treatment group) and 0 for nonparticipants (the control group). The key outcome variable is earnings in 1978 (re78).” (Iacus et al. 2009)
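Before matching, it helps to look at the raw data. A minimal sketch, assuming the Lalonde data frame Le has already been loaded earlier in the document:
#inspect the structure of the data and the size of the treatment and control groups
str(Le)
table(Le$treat)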
The goal of matching: create a dataset that looks closer to one we could have obtained from a randomized experiment. This means we want the distribution of covariates to be the same between the treatment and control groups. Put differently, we want the treatment and control groups to be as similar as possible in every respect other than the treatment.
How do we know we might have a self-selection problem (i.e., whether the two groups are similar in all other respects)? 1) by theory, 2) by an imbalance test.
Variables for the other aspects (pre-treatment variables): 1) age (age), 2) years of education (educ), 3) marital status (married), 4) lack of a high school diploma (nodegree), 5) race (black, hispan), 6) indicators for unemployment in 1974 (u74) and 1975 (u75), 7) real earnings in 1974 (re74) and 1975 (re75). A simple imbalance check is sketched below.
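One simple version of the imbalance test in 2) is to compare covariate means across the two groups. A sketch, assuming Le contains the covariates used in the matching formula later on:
#covariate means by treatment status; large gaps (e.g., in re74 or black) signal imbalance
covs <- c("re74", "re75", "educ", "black", "hispan", "nodegree", "married")
round(aggregate(Le[, covs], by = list(treat = Le$treat), FUN = mean), 3)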
#the default distance model is logistic regression, but it can be changed (e.g., to a probit model; see the sketch below)
#nearest-neighbor matching
library(MatchIt)  #MatchIt may already be loaded earlier; Le is the Lalonde data frame
matched1 <- matchit(treat~re74+re75+educ+black+hispan+nodegree+married, data=Le, method="nearest", distance = "logit")
#fit the same propensity score model with glm() so it can be reported with stargazer below
ps_model <- glm(treat~re74+re75+educ+black+hispan+nodegree+married, data=Le, family = binomial())
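If you prefer a probit propensity score model, only the distance argument changes. A sketch, assuming "probit" is among the distance options in this version of MatchIt:
#same specification, probit instead of logit propensity score
matched1_probit <- matchit(treat~re74+re75+educ+black+hispan+nodegree+married, data=Le, method="nearest", distance = "probit")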
install.packages("stargazer")
library(stargazer)
#report the propensity score model; type="text" prints a readable table (use "html" or "latex" when knitting to those formats)
stargazer(ps_model, type="text")
##
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                                treat           
## -----------------------------------------------
## re74                         -0.0001**         
##                              (0.00003)         
## re75                          0.0001           
##                              (0.00005)         
## educ                          0.147**          
##                               (0.064)          
## black                        3.067***          
##                               (0.286)          
## hispan                        0.950**          
##                               (0.423)          
## nodegree                      0.653*           
##                               (0.334)          
## married                     -0.731***          
##                               (0.276)          
## Constant                    -4.164***          
##                               (0.889)          
## -----------------------------------------------
## Observations                    614            
## Log Likelihood               -244.594          
## Akaike Inf. Crit.             505.187          
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01
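The fitted propensity scores are also stored inside the matchit object (in its distance component), so they can be inspected directly. A sketch, assuming matched1 was created as above and the row order of Le is unchanged:
#distribution of estimated propensity scores in each group
summary(matched1$distance[Le$treat == 1])  #treated
summary(matched1$distance[Le$treat == 0])  #control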
#optimal matching (contrast with greedy/nearest-neighbor matching, which picks the closest control one treated unit at a time and does not try to minimize a global distance measure; optimal matching instead minimizes the total distance across all matched pairs)
install.packages("optmatch")
library("optmatch")
matched2<-matchit(treat~re74+re75+educ+black+hispan+nodegree+married, data=Le, method="optimal", ratio = 2)
## Warning in optmatch::fullmatch(d, min.controls = ratio, max.controls = ratio, : Without 'data' argument the order of the match is not guaranteed
## to be the same as your original data.
#ratio: how many controls are matched to each treated unit
#Full matching: each matched set contains 1 treated unit and >=1 controls (or 1 control and >=1 treated units); see the subclass check after the call below
matched3<-matchit(treat~re74+re75+educ+black+hispan+nodegree+married, data=Le, method="full", distance = "logit")
## Warning in optmatch::fullmatch(d, ...): Without 'data' argument the order of the match is not guaranteed
## to be the same as your original data.
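Full matching places every unit into a matched set (a "subclass" in MatchIt's terminology). A quick check of how large those sets are, sketched under the assumption that the subclass component holds the set labels in this MatchIt version:
#distribution of matched-set sizes produced by full matching
table(table(matched3$subclass))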
#genetic matching: a genetic (evolutionary) search algorithm looks for the covariate weights that give the best balance
install.packages("rgenoud")
library(rgenoud)
matched4<-matchit(treat~re74+re75+educ+black+hispan+nodegree+married, data=Le, method="genetic")
## Warning in Matching::GenMatch(tt, cbind(dd, xx), M = ratio, ...): The key
## tuning parameters for optimization were are all left at their default values.
## The 'pop.size' option in particular should probably be increased for optimal
## results. For details please see the help page and http://sekhon.berkeley.edu/
## papers/MatchingJSS.pdf
##
##
## Thu Feb 06 15:01:02 2020
## Domains:
## 0.000000e+00 <= X1 <= 1.000000e+03
## 0.000000e+00 <= X2 <= 1.000000e+03
## 0.000000e+00 <= X3 <= 1.000000e+03
## 0.000000e+00 <= X4 <= 1.000000e+03
## 0.000000e+00 <= X5 <= 1.000000e+03
## 0.000000e+00 <= X6 <= 1.000000e+03
## 0.000000e+00 <= X7 <= 1.000000e+03
## 0.000000e+00 <= X8 <= 1.000000e+03
##
## Data Type: Floating Point
## Operators (code number, name, population)
## (1) Cloning........................... 15
## (2) Uniform Mutation.................. 12
## (3) Boundary Mutation................. 12
## (4) Non-Uniform Mutation.............. 12
## (5) Polytope Crossover................ 12
## (6) Simple Crossover.................. 12
## (7) Whole Non-Uniform Mutation........ 12
## (8) Heuristic Crossover............... 12
## (9) Local-Minimum Crossover........... 0
##
## SOFT Maximum Number of Generations: 100
## Maximum Nonchanging Generations: 4
## Population size : 100
## Convergence Tolerance: 1.000000e-03
##
## Not Using the BFGS Derivative Based Optimizer on the Best Individual Each Generation.
## Not Checking Gradients before Stopping.
## Using Out of Bounds Individuals.
##
## Maximization Problem.
## GENERATION: 0 (initializing the population)
## Lexical Fit..... 3.396340e-02 3.396340e-02 3.503340e-02 1.164679e-01 2.029828e-01 3.013736e-01 7.208510e-01 8.880345e-01 9.173240e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
## #unique......... 100, #Total UniqueCount: 100
## var 1:
## best............ 5.315025e+02
## mean............ 4.273368e+02
## variance........ 7.711331e+04
## var 2:
## best............ 2.572754e+02
## mean............ 5.263473e+02
## variance........ 8.404341e+04
## var 3:
## best............ 7.016413e+02
## mean............ 5.036422e+02
## variance........ 7.671305e+04
## var 4:
## best............ 1.960964e+02
## mean............ 4.991992e+02
## variance........ 8.426061e+04
## var 5:
## best............ 9.602479e+02
## mean............ 5.251414e+02
## variance........ 8.447557e+04
## var 6:
## best............ 5.326748e+02
## mean............ 4.913269e+02
## variance........ 9.541267e+04
## var 7:
## best............ 6.242898e+00
## mean............ 4.853063e+02
## variance........ 7.785240e+04
## var 8:
## best............ 7.320737e+02
## mean............ 4.876887e+02
## variance........ 8.788518e+04
##
## GENERATION: 1
## Lexical Fit..... 7.398092e-02 3.173158e-01 3.173158e-01 3.173158e-01 3.173158e-01 3.807949e-01 5.325129e-01 6.174349e-01 6.174349e-01 8.638974e-01 9.676572e-01 9.938631e-01 9.981865e-01 9.996418e-01 1.000000e+00 1.000000e+00
## #unique......... 76, #Total UniqueCount: 176
## var 1:
## best............ 7.253393e+01
## mean............ 4.455872e+02
## variance........ 6.503582e+04
## var 2:
## best............ 8.534252e+01
## mean............ 3.179878e+02
## variance........ 3.816075e+04
## var 3:
## best............ 6.623792e+02
## mean............ 6.973400e+02
## variance........ 1.560474e+04
## var 4:
## best............ 1.969720e+01
## mean............ 4.114144e+02
## variance........ 7.073010e+04
## var 5:
## best............ 3.197347e+02
## mean............ 6.549663e+02
## variance........ 1.042805e+05
## var 6:
## best............ 8.878277e+02
## mean............ 5.354805e+02
## variance........ 5.479737e+04
## var 7:
## best............ 2.219195e+02
## mean............ 2.319605e+02
## variance........ 8.969955e+04
## var 8:
## best............ 2.097229e+01
## mean............ 5.143216e+02
## variance........ 8.165956e+04
##
## GENERATION: 2
## Lexical Fit..... 1.445030e-01 3.173158e-01 3.173158e-01 3.173158e-01 3.173158e-01 3.831725e-01 5.246157e-01 6.550725e-01 6.550725e-01 9.528564e-01 9.934606e-01 9.996066e-01 9.999980e-01 1.000000e+00 1.000000e+00 1.000000e+00
## #unique......... 69, #Total UniqueCount: 245
## var 1:
## best............ 6.494892e+01
## mean............ 2.727333e+02
## variance........ 4.823605e+04
## var 2:
## best............ 8.250112e+01
## mean............ 3.060292e+02
## variance........ 4.464851e+04
## var 3:
## best............ 6.617304e+02
## mean............ 6.579395e+02
## variance........ 5.128937e+03
## var 4:
## best............ 1.678199e+01
## mean............ 3.425171e+02
## variance........ 9.236491e+04
## var 5:
## best............ 3.091495e+02
## mean............ 4.977477e+02
## variance........ 1.022763e+05
## var 6:
## best............ 8.936970e+02
## mean............ 6.975545e+02
## variance........ 3.680804e+04
## var 7:
## best............ 2.254838e+02
## mean............ 1.623683e+02
## variance........ 2.469322e+04
## var 8:
## best............ 9.220477e+00
## mean............ 3.295178e+02
## variance........ 1.014724e+05
##
## GENERATION: 3
## Lexical Fit..... 1.445030e-01 3.173158e-01 3.173158e-01 3.173158e-01 3.173158e-01 3.831725e-01 5.246157e-01 6.550725e-01 6.550725e-01 9.528564e-01 9.934606e-01 9.996066e-01 9.999980e-01 1.000000e+00 1.000000e+00 1.000000e+00
## #unique......... 73, #Total UniqueCount: 318
## var 1:
## best............ 6.494892e+01
## mean............ 8.936200e+01
## variance........ 6.348079e+03
## var 2:
## best............ 8.250112e+01
## mean............ 1.646348e+02
## variance........ 3.930713e+04
## var 3:
## best............ 6.617304e+02
## mean............ 6.564856e+02
## variance........ 9.315599e+03
## var 4:
## best............ 1.678199e+01
## mean............ 1.223402e+02
## variance........ 6.700397e+04
## var 5:
## best............ 3.091495e+02
## mean............ 3.240043e+02
## variance........ 1.175587e+04
## var 6:
## best............ 8.936970e+02
## mean............ 8.475179e+02
## variance........ 1.959195e+04
## var 7:
## best............ 2.254838e+02
## mean............ 2.474241e+02
## variance........ 1.014500e+04
## var 8:
## best............ 9.220477e+00
## mean............ 3.006410e+01
## variance........ 3.211978e+03
##
## GENERATION: 4
## Lexical Fit..... 2.105432e-01 3.173158e-01 3.173158e-01 3.764898e-01 5.179586e-01 5.787422e-01 6.550725e-01 6.550725e-01 8.187753e-01 8.187753e-01 9.980422e-01 9.996066e-01 9.999980e-01 1.000000e+00 1.000000e+00 1.000000e+00
## #unique......... 65, #Total UniqueCount: 383
## var 1:
## best............ 6.494892e+01
## mean............ 1.048929e+02
## variance........ 1.681064e+04
## var 2:
## best............ 8.250112e+01
## mean............ 1.191354e+02
## variance........ 1.151890e+04
## var 3:
## best............ 6.617304e+02
## mean............ 6.670835e+02
## variance........ 3.670704e+03
## var 4:
## best............ 1.678199e+01
## mean............ 6.493873e+01
## variance........ 1.922298e+04
## var 5:
## best............ 3.091495e+02
## mean............ 3.188328e+02
## variance........ 5.443162e+03
## var 6:
## best............ 1.233238e+02
## mean............ 7.150800e+02
## variance........ 9.571211e+04
## var 7:
## best............ 5.954648e+01
## mean............ 2.434320e+02
## variance........ 6.583166e+03
## var 8:
## best............ 9.220477e+00
## mean............ 3.385269e+01
## variance........ 6.027339e+03
##
## GENERATION: 5
## Lexical Fit..... 2.105432e-01 3.173158e-01 3.173158e-01 3.764898e-01 5.179586e-01 5.787422e-01 6.550725e-01 6.550725e-01 8.187753e-01 8.187753e-01 9.980422e-01 9.996066e-01 9.999980e-01 1.000000e+00 1.000000e+00 1.000000e+00
## #unique......... 65, #Total UniqueCount: 448
## var 1:
## best............ 6.494892e+01
## mean............ 1.010840e+02
## variance........ 1.692230e+04
## var 2:
## best............ 8.250112e+01
## mean............ 9.419134e+01
## variance........ 4.602902e+03
## var 3:
## best............ 6.617304e+02
## mean............ 6.895726e+02
## variance........ 9.959121e+03
## var 4:
## best............ 1.678199e+01
## mean............ 4.197203e+01
## variance........ 1.191596e+04
## var 5:
## best............ 3.091495e+02
## mean............ 3.217066e+02
## variance........ 3.603063e+03
## var 6:
## best............ 1.233238e+02
## mean............ 5.326069e+02
## variance........ 1.363705e+05
## var 7:
## best............ 5.954648e+01
## mean............ 1.751241e+02
## variance........ 1.260205e+04
## var 8:
## best............ 9.220477e+00
## mean............ 2.964725e+01
## variance........ 8.501880e+03
##
## GENERATION: 6
## Lexical Fit..... 2.105432e-01 3.173158e-01 3.173158e-01 3.764898e-01 5.179586e-01 5.787422e-01 6.550725e-01 6.550725e-01 8.187753e-01 8.187753e-01 9.980422e-01 9.996066e-01 9.999980e-01 1.000000e+00 1.000000e+00 1.000000e+00
## #unique......... 45, #Total UniqueCount: 493
## var 1:
## best............ 6.494892e+01
## mean............ 9.299642e+01
## variance........ 1.231366e+04
## var 2:
## best............ 8.250112e+01
## mean............ 1.086721e+02
## variance........ 1.163918e+04
## var 3:
## best............ 6.617304e+02
## mean............ 6.779170e+02
## variance........ 5.894540e+03
## var 4:
## best............ 1.678199e+01
## mean............ 5.066718e+01
## variance........ 1.423387e+04
## var 5:
## best............ 3.091495e+02
## mean............ 3.175484e+02
## variance........ 7.128679e+03
## var 6:
## best............ 1.233238e+02
## mean............ 2.298056e+02
## variance........ 6.690428e+04
## var 7:
## best............ 5.954648e+01
## mean............ 1.001395e+02
## variance........ 1.085742e+04
## var 8:
## best............ 9.220477e+00
## mean............ 3.938751e+01
## variance........ 7.015339e+03
##
## GENERATION: 7
## Lexical Fit..... 2.177730e-01 3.072059e-01 3.173158e-01 3.173158e-01 5.004755e-01 5.509946e-01 6.550725e-01 6.550725e-01 8.187753e-01 8.187753e-01 9.934606e-01 9.996066e-01 9.999561e-01 1.000000e+00 1.000000e+00 1.000000e+00
## #unique......... 48, #Total UniqueCount: 541
## var 1:
## best............ 6.494892e+01
## mean............ 9.346770e+01
## variance........ 1.299263e+04
## var 2:
## best............ 8.250112e+01
## mean............ 9.488193e+01
## variance........ 3.123320e+03
## var 3:
## best............ 6.617304e+02
## mean............ 6.464627e+02
## variance........ 5.451270e+03
## var 4:
## best............ 1.592755e+01
## mean............ 3.475770e+01
## variance........ 7.675494e+03
## var 5:
## best............ 3.091495e+02
## mean............ 3.263303e+02
## variance........ 7.175456e+03
## var 6:
## best............ 1.233238e+02
## mean............ 1.346450e+02
## variance........ 4.042272e+03
## var 7:
## best............ 5.954648e+01
## mean............ 8.781489e+01
## variance........ 1.138825e+04
## var 8:
## best............ 9.220477e+00
## mean............ 3.333908e+01
## variance........ 9.617490e+03
##
## GENERATION: 8
## Lexical Fit..... 2.177730e-01 3.072059e-01 3.173158e-01 3.173158e-01 5.004755e-01 5.509946e-01 6.550725e-01 6.550725e-01 8.187753e-01 8.187753e-01 9.934606e-01 9.996066e-01 9.999561e-01 1.000000e+00 1.000000e+00 1.000000e+00
## #unique......... 61, #Total UniqueCount: 602
## var 1:
## best............ 6.494892e+01
## mean............ 8.621336e+01
## variance........ 5.846311e+03
## var 2:
## best............ 8.250112e+01
## mean............ 1.032231e+02
## variance........ 7.795116e+03
## var 3:
## best............ 6.617304e+02
## mean............ 6.590687e+02
## variance........ 5.707771e+03
## var 4:
## best............ 1.592755e+01
## mean............ 4.662520e+01
## variance........ 1.222112e+04
## var 5:
## best............ 3.091495e+02
## mean............ 3.240598e+02
## variance........ 6.628350e+03
## var 6:
## best............ 1.233238e+02
## mean............ 1.379894e+02
## variance........ 4.750683e+03
## var 7:
## best............ 5.954648e+01
## mean............ 8.973167e+01
## variance........ 1.283038e+04
## var 8:
## best............ 9.220477e+00
## mean............ 3.822396e+01
## variance........ 1.202743e+04
##
## GENERATION: 9
## Lexical Fit..... 2.566608e-01 2.566608e-01 2.607368e-01 3.113079e-01 3.129637e-01 3.173158e-01 3.173158e-01 3.712278e-01 3.712278e-01 3.891086e-01 9.839229e-01 9.999586e-01 9.999982e-01 1.000000e+00 1.000000e+00 1.000000e+00
## #unique......... 66, #Total UniqueCount: 668
## var 1:
## best............ 6.494892e+01
## mean............ 8.607135e+01
## variance........ 5.599003e+03
## var 2:
## best............ 8.250112e+01
## mean............ 1.127894e+02
## variance........ 1.327988e+04
## var 3:
## best............ 9.096788e+02
## mean............ 6.539117e+02
## variance........ 3.809070e+03
## var 4:
## best............ 1.592755e+01
## mean............ 3.187847e+01
## variance........ 5.839050e+03
## var 5:
## best............ 3.091495e+02
## mean............ 3.750585e+02
## variance........ 2.228332e+04
## var 6:
## best............ 1.233238e+02
## mean............ 1.997542e+02
## variance........ 2.891876e+04
## var 7:
## best............ 5.954648e+01
## mean............ 7.505885e+01
## variance........ 4.472620e+03
## var 8:
## best............ 9.220477e+00
## mean............ 4.216782e+01
## variance........ 1.329217e+04
##
## GENERATION: 10
## Lexical Fit..... 2.566608e-01 2.566608e-01 2.607368e-01 3.113079e-01 3.129637e-01 3.173158e-01 3.173158e-01 3.712278e-01 3.712278e-01 3.891086e-01 9.839229e-01 9.999586e-01 9.999982e-01 1.000000e+00 1.000000e+00 1.000000e+00
## #unique......... 64, #Total UniqueCount: 732
## var 1:
## best............ 6.494892e+01
## mean............ 8.530821e+01
## variance........ 7.804339e+03
## var 2:
## best............ 8.250112e+01
## mean............ 1.040018e+02
## variance........ 9.188159e+03
## var 3:
## best............ 9.096788e+02
## mean............ 7.629706e+02
## variance........ 1.562005e+04
## var 4:
## best............ 1.592755e+01
## mean............ 5.742078e+01
## variance........ 1.992875e+04
## var 5:
## best............ 3.091495e+02
## mean............ 3.493890e+02
## variance........ 1.713717e+04
## var 6:
## best............ 1.233238e+02
## mean............ 1.550039e+02
## variance........ 7.711068e+03
## var 7:
## best............ 5.954648e+01
## mean............ 8.753525e+01
## variance........ 1.076327e+04
## var 8:
## best............ 9.220477e+00
## mean............ 2.903032e+01
## variance........ 4.878449e+03
##
## GENERATION: 11
## Lexical Fit..... 2.566608e-01 2.566608e-01 2.607368e-01 3.113079e-01 3.129637e-01 3.173158e-01 3.173158e-01 3.712278e-01 3.712278e-01 3.891086e-01 9.839229e-01 9.999586e-01 9.999982e-01 1.000000e+00 1.000000e+00 1.000000e+00
## #unique......... 61, #Total UniqueCount: 793
## var 1:
## best............ 6.494892e+01
## mean............ 8.133978e+01
## variance........ 7.376730e+03
## var 2:
## best............ 8.250112e+01
## mean............ 9.132185e+01
## variance........ 2.690943e+03
## var 3:
## best............ 9.096788e+02
## mean............ 8.443158e+02
## variance........ 1.351107e+04
## var 4:
## best............ 1.592755e+01
## mean............ 3.182144e+01
## variance........ 2.904055e+03
## var 5:
## best............ 3.091495e+02
## mean............ 3.346625e+02
## variance........ 1.146929e+04
## var 6:
## best............ 1.233238e+02
## mean............ 1.334060e+02
## variance........ 3.155043e+03
## var 7:
## best............ 5.954648e+01
## mean............ 7.981277e+01
## variance........ 3.720432e+03
## var 8:
## best............ 9.220477e+00
## mean............ 3.563045e+01
## variance........ 1.010448e+04
##
## GENERATION: 12
## Lexical Fit..... 2.566608e-01 2.566608e-01 2.607368e-01 3.113079e-01 3.129637e-01 3.173158e-01 3.173158e-01 3.712278e-01 3.712278e-01 3.891086e-01 9.839229e-01 9.999586e-01 9.999982e-01 1.000000e+00 1.000000e+00 1.000000e+00
## #unique......... 49, #Total UniqueCount: 842
## var 1:
## best............ 6.494892e+01
## mean............ 9.802504e+01
## variance........ 1.471808e+04
## var 2:
## best............ 8.250112e+01
## mean............ 9.117046e+01
## variance........ 5.093049e+03
## var 3:
## best............ 9.096788e+02
## mean............ 8.950236e+02
## variance........ 4.932840e+03
## var 4:
## best............ 1.592755e+01
## mean............ 4.649374e+01
## variance........ 1.363803e+04
## var 5:
## best............ 3.091495e+02
## mean............ 3.315039e+02
## variance........ 1.070818e+04
## var 6:
## best............ 1.233238e+02
## mean............ 1.496509e+02
## variance........ 1.282953e+04
## var 7:
## best............ 5.954648e+01
## mean............ 7.619424e+01
## variance........ 4.285898e+03
## var 8:
## best............ 9.220477e+00
## mean............ 1.191792e+01
## variance........ 2.213646e+02
##
## GENERATION: 13
## Lexical Fit..... 2.566608e-01 2.566608e-01 2.607368e-01 3.113079e-01 3.129637e-01 3.173158e-01 3.173158e-01 3.712278e-01 3.712278e-01 3.891086e-01 9.839229e-01 9.999586e-01 9.999982e-01 1.000000e+00 1.000000e+00 1.000000e+00
## #unique......... 60, #Total UniqueCount: 902
## var 1:
## best............ 6.494892e+01
## mean............ 9.934597e+01
## variance........ 1.018825e+04
## var 2:
## best............ 8.250112e+01
## mean............ 1.148174e+02
## variance........ 1.509695e+04
## var 3:
## best............ 9.096788e+02
## mean............ 8.872010e+02
## variance........ 7.440467e+03
## var 4:
## best............ 1.592755e+01
## mean............ 4.505442e+01
## variance........ 1.145151e+04
## var 5:
## best............ 3.091495e+02
## mean............ 3.220020e+02
## variance........ 5.439656e+03
## var 6:
## best............ 1.233238e+02
## mean............ 1.574995e+02
## variance........ 1.427489e+04
## var 7:
## best............ 5.954648e+01
## mean............ 7.933658e+01
## variance........ 5.552569e+03
## var 8:
## best............ 9.220477e+00
## mean............ 3.710313e+01
## variance........ 1.215456e+04
##
## GENERATION: 14
## Lexical Fit..... 2.566608e-01 2.566608e-01 2.607368e-01 3.113079e-01 3.129637e-01 3.173158e-01 3.173158e-01 3.712278e-01 3.712278e-01 3.891086e-01 9.839229e-01 9.999586e-01 9.999982e-01 1.000000e+00 1.000000e+00 1.000000e+00
## #unique......... 59, #Total UniqueCount: 961
## var 1:
## best............ 6.494892e+01
## mean............ 9.725432e+01
## variance........ 1.092909e+04
## var 2:
## best............ 8.250112e+01
## mean............ 9.038513e+01
## variance........ 1.394343e+03
## var 3:
## best............ 9.096788e+02
## mean............ 8.915411e+02
## variance........ 8.372658e+03
## var 4:
## best............ 1.592755e+01
## mean............ 3.053020e+01
## variance........ 3.639095e+03
## var 5:
## best............ 3.091495e+02
## mean............ 3.175427e+02
## variance........ 4.918956e+03
## var 6:
## best............ 1.233238e+02
## mean............ 1.339058e+02
## variance........ 3.316032e+03
## var 7:
## best............ 5.954648e+01
## mean............ 8.289378e+01
## variance........ 1.122149e+04
## var 8:
## best............ 9.220477e+00
## mean............ 2.823948e+01
## variance........ 7.039777e+03
##
## 'wait.generations' limit reached.
## No significant improvement in 4 generations.
##
## Solution Lexical Fitness Value:
## 2.566608e-01 2.566608e-01 2.607368e-01 3.113079e-01 3.129637e-01 3.173158e-01 3.173158e-01 3.712278e-01 3.712278e-01 3.891086e-01 9.839229e-01 9.999586e-01 9.999982e-01 1.000000e+00 1.000000e+00 1.000000e+00
##
## Parameters at the Solution:
##
## X[ 1] : 6.494892e+01
## X[ 2] : 8.250112e+01
## X[ 3] : 9.096788e+02
## X[ 4] : 1.592755e+01
## X[ 5] : 3.091495e+02
## X[ 6] : 1.233238e+02
## X[ 7] : 5.954648e+01
## X[ 8] : 9.220477e+00
##
## Solution Found Generation 9
## Number of Generations Run 14
##
## Thu Feb 06 15:01:08 2020
## Total run time : 0 hours 0 minutes and 6 seconds
What do you need to do after matching? Check whether the matching process generated matched samples in which the treated and control groups really do look similar. Two ways to check: 1) summary statistics; 2) plots: Q-Q plots (if the points do not lie on the 45-degree line, the match is poor), jitter plots (showing the overall distribution of propensity scores; you want the dots of the two groups to overlap as much as possible), and histograms (the shapes of the treated and control histograms should look as similar as possible).
summary(matched2)
##
## Call:
## matchit(formula = treat ~ re74 + re75 + educ + black + hispan +
## nodegree + married, data = Le, method = "optimal", ratio = 2)
##
## Summary of balance for all data:
## Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean
## distance 0.5762 0.1828 0.2301 0.3935 0.5113 0.3935
## re74 2095.5737 5619.2365 6788.7508 -3523.6628 2425.5720 3620.9240
## re75 1532.0553 2466.4844 3291.9962 -934.4291 981.0968 1060.6582
## educ 10.3459 10.2354 2.8552 0.1105 1.0000 0.7027
## black 0.8432 0.2028 0.4026 0.6404 1.0000 0.6432
## hispan 0.0595 0.1422 0.3497 -0.0827 0.0000 0.0811
## nodegree 0.7081 0.5967 0.4911 0.1114 0.0000 0.1135
## married 0.1892 0.5128 0.5004 -0.3236 0.0000 0.3243
## eQQ Max
## distance 0.5979
## re74 9216.5000
## re75 6795.0100
## educ 4.0000
## black 1.0000
## hispan 1.0000
## nodegree 1.0000
## married 1.0000
##
##
## Summary of balance for matched data:
## Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean
## distance 0.5762 0.2090 0.2375 0.3672 0.4860 0.3682
## re74 2095.5737 3986.6420 5274.9661 -1891.0683 1469.4500 2006.5518
## re75 1532.0553 2323.7468 3223.1627 -791.6915 857.5645 914.9357
## educ 10.3459 10.2730 2.7710 0.0730 0.0000 0.6000
## black 0.8432 0.2351 0.4247 0.6081 1.0000 0.6108
## hispan 0.0595 0.1649 0.3716 -0.1054 0.0000 0.1027
## nodegree 0.7081 0.6243 0.4850 0.0838 0.0000 0.0865
## married 0.1892 0.4378 0.4968 -0.2486 0.0000 0.2486
## eQQ Max
## distance 0.5821
## re74 9177.7500
## re75 6795.0100
## educ 4.0000
## black 1.0000
## hispan 1.0000
## nodegree 1.0000
## married 1.0000
##
## Percent Balance Improvement:
## Mean Diff. eQQ Med eQQ Mean eQQ Max
## distance 6.6735 4.9393 6.4347 2.6390
## re74 46.3323 39.4184 44.5845 0.4204
## re75 15.2754 12.5912 13.7389 0.0000
## educ 33.9699 100.0000 14.6154 0.0000
## black 5.0493 0.0000 5.0420 0.0000
## hispan -27.4063 0.0000 -26.6667 0.0000
## nodegree 24.7709 0.0000 23.8095 0.0000
## married 23.1692 0.0000 23.3333 0.0000
##
## Sample sizes:
## Control Treated
## All 429 185
## Matched 370 185
## Unmatched 59 0
## Discarded 0 0
summary(matched3)
##
## Call:
## matchit(formula = treat ~ re74 + re75 + educ + black + hispan +
## nodegree + married, data = Le, method = "full", distance = "logit")
##
## Summary of balance for all data:
## Means Treated Means Control Mean Diff eQQ Med eQQ Mean eQQ Max
## distance 0.5762 0.1828 0.3935 0.5113 0.3935 0.5979
## re74 2095.5737 5619.2365 -3523.6628 2425.5720 3620.9240 9216.5000
## re75 1532.0553 2466.4844 -934.4291 981.0968 1060.6582 6795.0100
## educ 10.3459 10.2354 0.1105 1.0000 0.7027 4.0000
## black 0.8432 0.2028 0.6404 1.0000 0.6432 1.0000
## hispan 0.0595 0.1422 -0.0827 0.0000 0.0811 1.0000
## nodegree 0.7081 0.5967 0.1114 0.0000 0.1135 1.0000
## married 0.1892 0.5128 -0.3236 0.0000 0.3243 1.0000
##
##
## Summary of balance for matched data:
## Means Treated Means Control Mean Diff eQQ Med eQQ Mean eQQ Max
## distance 0.5762 0.5759 0.0003 0.0015 0.0055 0.0727
## re74 2095.5737 1824.1690 271.4047 0.0000 498.7641 17212.7000
## re75 1532.0553 1203.1728 328.8826 42.9677 476.1530 12746.0500
## educ 10.3459 10.2219 0.1241 0.0000 0.3876 4.0000
## black 0.8432 0.8350 0.0082 0.0000 0.0112 1.0000
## hispan 0.0595 0.0576 0.0019 0.0000 0.0000 0.0000
## nodegree 0.7081 0.7033 0.0048 0.0000 0.0060 1.0000
## married 0.1892 0.1293 0.0599 0.0000 0.0604 1.0000
##
## Percent Balance Improvement:
## Mean Diff. eQQ Med eQQ Mean eQQ Max
## distance 99.9249 99.7019 98.6081 87.8474
## re74 92.2977 100.0000 86.2255 -86.7596
## re75 64.8039 95.6204 55.1078 -87.5796
## educ -12.2722 100.0000 44.8415 0.0000
## black 98.7139 100.0000 98.2588 0.0000
## hispan 97.7274 0.0000 100.0000 100.0000
## nodegree 95.7157 0.0000 94.7143 0.0000
## married 81.4834 0.0000 81.3767 0.0000
##
## Sample sizes:
## Control Treated
## All 429 185
## Matched 429 185
## Unmatched 0 0
## Discarded 0 0
summary(matched4)
##
## Call:
## matchit(formula = treat ~ re74 + re75 + educ + black + hispan +
## nodegree + married, data = Le, method = "genetic")
##
## Summary of balance for all data:
## Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean
## distance 0.5762 0.1828 0.2301 0.3935 0.5113 0.3935
## re74 2095.5737 5619.2365 6788.7508 -3523.6628 2425.5720 3620.9240
## re75 1532.0553 2466.4844 3291.9962 -934.4291 981.0968 1060.6582
## educ 10.3459 10.2354 2.8552 0.1105 1.0000 0.7027
## black 0.8432 0.2028 0.4026 0.6404 1.0000 0.6432
## hispan 0.0595 0.1422 0.3497 -0.0827 0.0000 0.0811
## nodegree 0.7081 0.5967 0.4911 0.1114 0.0000 0.1135
## married 0.1892 0.5128 0.5004 -0.3236 0.0000 0.3243
## eQQ Max
## distance 0.5979
## re74 9216.5000
## re75 6795.0100
## educ 4.0000
## black 1.0000
## hispan 1.0000
## nodegree 1.0000
## married 1.0000
##
##
## Summary of balance for matched data:
## Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean
## distance 0.5762 0.5707 0.2268 0.0055 0.0648 0.1344
## re74 2095.5737 1885.3442 4244.4136 210.2295 0.0000 787.8090
## re75 1532.0553 1479.4258 2951.0193 52.6295 0.0000 403.2667
## educ 10.3459 10.2649 2.0247 0.0811 0.0000 0.2857
## black 0.8432 0.8378 0.3706 0.0054 0.0000 0.1978
## hispan 0.0595 0.0595 0.2378 0.0000 0.0000 0.0330
## nodegree 0.7081 0.6919 0.4643 0.0162 0.0000 0.0769
## married 0.1892 0.2108 0.4101 -0.0216 0.0000 0.0989
## eQQ Max
## distance 0.4497
## re74 13121.7500
## re75 6795.0100
## educ 2.0000
## black 1.0000
## hispan 1.0000
## nodegree 1.0000
## married 1.0000
##
## Percent Balance Improvement:
## Mean Diff. eQQ Med eQQ Mean eQQ Max
## distance 98.6023 87.3305 65.8595 24.7848
## re74 94.0338 100.0000 78.2429 -42.3724
## re75 94.3677 100.0000 61.9796 0.0000
## educ 26.6332 100.0000 59.3407 50.0000
## black 99.1560 100.0000 69.2492 0.0000
## hispan 100.0000 0.0000 59.3407 0.0000
## nodegree 85.4395 0.0000 32.2344 0.0000
## married 93.3191 0.0000 69.5055 0.0000
##
## Sample sizes:
## Control Treated
## All 429 185
## Matched 91 185
## Unmatched 338 0
## Discarded 0 0
plot(matched2)  #default plot: covariate Q-Q plots, treated vs. control
plot(matched3)
plot(matched4)
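The jitter plots and histograms mentioned above are available through the type argument of plot(). A sketch, assuming this MatchIt version accepts "jitter" and "hist" (the default shown above is the Q-Q plot):
plot(matched2, type = "jitter")  #propensity score distributions for matched/unmatched units
plot(matched2, type = "hist")    #histograms of propensity scores before and after matching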
What to do after matching? 1) assign the matched observations to a data frame and conduct the analysis in R, or 2) export the matched dataset to other software for further analysis.
#1) keep the matched observations in an R data frame
m.data1 <- match.data(matched1)
#2) or export the matched dataset to a CSV file for analysis elsewhere
write.csv(m.data1, file = "matchLe.csv")
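After step 1), the analysis itself might be a simple comparison of 1978 earnings between the two groups on the matched sample. A sketch (one of several reasonable options), using the weights column that match.data() attaches; for 1:1 nearest-neighbor matching these weights are all 1, but they matter for methods such as full matching:
#estimate the association between the training program and re78 on the matched data
eff <- lm(re78 ~ treat, data = m.data1, weights = weights)
summary(eff)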