This notebook demonstrates how to use the MatchIt package in R to perform propensity score matching on the classic lalonde dataset.
library(MatchIt)
## Warning: package 'MatchIt' was built under R version 4.3.3
library(cobalt)
## cobalt (Version 4.6.0, Build Date: 2025-04-15)
##
## Attaching package: 'cobalt'
## The following object is masked from 'package:MatchIt':
##
## lalonde
# Load MatchIt's copy of lalonde explicitly, since cobalt's lalonde masks it
data("lalonde", package = "MatchIt")
head(lalonde)
- treat: 1 = treated, 0 = control
- re78: 1978 earnings (outcome)
- Covariates: age, educ, race, married, nodegree, re74, re75 (re74 and re75 are 1974 and 1975 earnings)
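# 1:1 nearest-neighbor matching (without replacement, the default) on a
# propensity score estimated by logistic regression (MatchIt's default distance)
m.out <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = lalonde,
                 method = "nearest",
                 ratio = 1)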
summary(m.out)
##
## Call:
## matchit(formula = treat ~ age + educ + race + married + nodegree +
## re74 + re75, data = lalonde, method = "nearest", ratio = 1)
##
## Summary of Balance for All Data:
## Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
## distance 0.5774 0.1822 1.7941 0.9211 0.3774
## age 25.8162 28.0303 -0.3094 0.4400 0.0813
## educ 10.3459 10.2354 0.0550 0.4959 0.0347
## raceblack 0.8432 0.2028 1.7615 . 0.6404
## racehispan 0.0595 0.1422 -0.3498 . 0.0827
## racewhite 0.0973 0.6550 -1.8819 . 0.5577
## married 0.1892 0.5128 -0.8263 . 0.3236
## nodegree 0.7081 0.5967 0.2450 . 0.1114
## re74 2095.5737 5619.2365 -0.7211 0.5181 0.2248
## re75 1532.0553 2466.4844 -0.2903 0.9563 0.1342
## eCDF Max
## distance 0.6444
## age 0.1577
## educ 0.1114
## raceblack 0.6404
## racehispan 0.0827
## racewhite 0.5577
## married 0.3236
## nodegree 0.1114
## re74 0.4470
## re75 0.2876
##
## Summary of Balance for Matched Data:
## Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
## distance 0.5774 0.3629 0.9739 0.7566 0.1321
## age 25.8162 25.3027 0.0718 0.4568 0.0847
## educ 10.3459 10.6054 -0.1290 0.5721 0.0239
## raceblack 0.8432 0.4703 1.0259 . 0.3730
## racehispan 0.0595 0.2162 -0.6629 . 0.1568
## racewhite 0.0973 0.3135 -0.7296 . 0.2162
## married 0.1892 0.2108 -0.0552 . 0.0216
## nodegree 0.7081 0.6378 0.1546 . 0.0703
## re74 2095.5737 2342.1076 -0.0505 1.3289 0.0469
## re75 1532.0553 1614.7451 -0.0257 1.4956 0.0452
## eCDF Max Std. Pair Dist.
## distance 0.4216 0.9740
## age 0.2541 1.3938
## educ 0.0757 1.2474
## raceblack 0.3730 1.0259
## racehispan 0.1568 1.0743
## racewhite 0.2162 0.8390
## married 0.0216 0.8281
## nodegree 0.0703 1.0106
## re74 0.2757 0.7965
## re75 0.2054 0.7381
##
## Sample Sizes:
## Control Treated
## All 429 185
## Matched 185 185
## Unmatched 244 0
## Discarded 0 0
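To make the `matchit()` step less of a black box, here is a simplified base-R sketch of greedy 1:1 nearest-neighbor matching without replacement on a logistic-regression propensity score. It is not MatchIt's implementation: `nn_match_ps()` is a hypothetical helper, it ignores ties and calipers, it assumes no missing data, and it matches treated units in decreasing order of propensity score. The toy data are simulated purely for illustration.

```r
# Hypothetical helper, not MatchIt's implementation: greedy 1:1 nearest-neighbor
# matching without replacement on a logistic-regression propensity score.
# Assumes no missing values, so fitted scores line up with the rows of `data`.
nn_match_ps <- function(formula, data) {
  # Propensity score = fitted probability of treatment
  fit <- glm(formula, data = data, family = binomial())
  ps <- unname(fitted(fit))
  treat <- unname(model.response(model.frame(formula, data)))
  treated <- which(treat == 1)
  controls <- which(treat == 0)
  # Match treated units from the highest propensity score to the lowest
  treated <- treated[order(ps[treated], decreasing = TRUE)]
  available <- rep(TRUE, length(controls))
  matched_control <- rep(NA_integer_, length(treated))
  for (k in seq_along(treated)) {
    if (!any(available)) break  # ran out of controls
    # Propensity-score distance to each control; used controls are excluded
    d <- abs(ps[controls] - ps[treated[k]])
    d[!available] <- Inf
    j <- which.min(d)
    available[j] <- FALSE
    matched_control[k] <- controls[j]
  }
  keep <- !is.na(matched_control)
  list(ps = ps,
       pairs = data.frame(treated = treated[keep], control = matched_control[keep]))
}

# Usage on simulated toy data (illustration only)
set.seed(1)
toy <- data.frame(x = rnorm(40))
toy$treat <- rbinom(40, 1, plogis(toy$x))
res <- nn_match_ps(treat ~ x, data = toy)
print(head(res$pairs))
```

Greedy matching is order-dependent. MatchIt exposes the order through its `m.order` argument (as we understand it, the default for an estimated propensity score is "largest", which this sketch imitates) and also supports `caliper`, `replace`, and `ratio` values above 1, none of which the sketch handles.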
# Balance plot before and after matching, with a threshold of 0.1 for mean differences
love.plot(m.out, thresholds = c(m = 0.1), stars = "std")
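The love plot displays each covariate's mean difference before and after matching; with stars = "std", an asterisk marks the covariates whose differences are standardized. A common rule of thumb treats absolute standardized mean differences below 0.1 as adequate balance. By that standard, the matched-data summary above shows age, married, re74, and re75 falling below the threshold, while educ (-0.1290), nodegree (0.1546), and especially the race indicators (1.0259 for raceblack, -0.6629 for racehispan, -0.7296 for racewhite) remain imbalanced.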
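The standardized mean differences in the MatchIt summary can be reproduced by hand. The printed numbers are consistent with dividing the difference in group means by the treated group's standard deviation (the usual choice for the ATT), using sqrt(p(1 - p)) for a binary covariate with treated proportion p; that denominator is our reading of the output, not something this notebook states. Both helpers below, `smd_att()` and `smd_from_props()`, are hypothetical, and the check uses the raceblack proportions printed in the summary.

```r
# Hypothetical helper: standardized mean difference for the ATT, scaling the
# difference in means by the treated group's standard deviation
smd_att <- function(x, treat) {
  xt <- x[treat == 1]
  xc <- x[treat == 0]
  # Binary covariates: use sqrt(p * (1 - p)) for the treated proportion p
  s <- if (all(x %in% c(0, 1))) sqrt(mean(xt) * (1 - mean(xt))) else sd(xt)
  (mean(xt) - mean(xc)) / s
}

# Same formula for a binary covariate, written in terms of group proportions
smd_from_props <- function(p_treated, p_control) {
  (p_treated - p_control) / sqrt(p_treated * (1 - p_treated))
}

before <- smd_from_props(0.8432, 0.2028)  # summary reports 1.7615
after  <- smd_from_props(0.8432, 0.4703)  # summary reports 1.0259
print(round(c(before = before, after = after), 2))

# Rule of thumb: flag |SMD| >= 0.1 as imbalanced
print(abs(c(before, after)) >= 0.1)
```

The small discrepancies in the third decimal come from the summary using unrounded means.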
# Extract the matched sample; MatchIt adds distance, weights, and subclass columns
matched_data <- match.data(m.out)
head(matched_data)
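With 1:1 matching without replacement, every unit in the matched data should carry a weight of 1, and each subclass should hold exactly one treated and one control unit. The base-R sketch below checks that structure; `check_pairs()` is a hypothetical helper, and the toy data frame only mimics the treat, subclass, and weights columns of matched_data.

```r
# Hypothetical helper: TRUE if every subclass contains exactly one treated
# and one control unit (i.e. the matched data consists of 1:1 pairs)
check_pairs <- function(df) {
  # Fix the treat levels so both columns exist even if one group is absent
  counts <- table(df$subclass, factor(df$treat, levels = c(0, 1)))
  all(counts[, "0"] == 1) && all(counts[, "1"] == 1)
}

# Toy stand-in for matched_data (illustration only)
toy_matched <- data.frame(
  treat    = c(1, 0, 1, 0),
  subclass = factor(c(1, 1, 2, 2)),
  weights  = c(1, 1, 1, 1)
)
print(check_pairs(toy_matched))
print(all(toy_matched$weights == 1))
```

On the real output, the same checks would be check_pairs(matched_data) and all(matched_data$weights == 1).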
# Welch two-sample t-test comparing mean 1978 earnings (re78) between the matched groups
with(matched_data, t.test(re78 ~ treat))
##
## Welch Two Sample t-test
##
## data: re78 by treat
## t = -1.2247, df = 345.59, p-value = 0.2215
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -2330.7493 542.0143
## sample estimates:
## mean in group 0 mean in group 1
## 5454.776 6349.144
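Interpreting the result: in the matched sample, mean 1978 earnings were 6349.14 for treated participants and 5454.78 for their matched controls, a difference of about 894 in favor of the treated group. The difference is not statistically significant, however: the Welch t-test gives p = 0.22, and the 95% confidence interval for the control-minus-treated difference (-2330.75 to 542.01) includes 0. A significant difference would suggest that the treatment (job training) affected earnings; here the evidence is inconclusive, and because some covariates (notably the race indicators) remain imbalanced in the matched-data summary, the estimate may still be confounded.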
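The Welch test treats the matched groups as independent samples and ignores the pairing that nearest-neighbor matching produces. One alternative some analysts use is a paired analysis of within-pair differences. The base-R sketch below illustrates it; `paired_effect()` is a hypothetical helper, it assumes the subclass column added by match.data() identifies the pairs, and it is not presented as the method MatchIt recommends.

```r
# Hypothetical helper: within-pair (treated minus control) differences in the
# outcome, pairing units by the subclass column, then a one-sample t-test
paired_effect <- function(df, outcome = "re78") {
  treated <- df[df$treat == 1, c("subclass", outcome)]
  control <- df[df$treat == 0, c("subclass", outcome)]
  # One row per pair; merge() appends the suffixes to the shared outcome column
  pairs <- merge(treated, control, by = "subclass", suffixes = c(".t", ".c"))
  diffs <- pairs[[paste0(outcome, ".t")]] - pairs[[paste0(outcome, ".c")]]
  t.test(diffs)
}

# Toy stand-in for matched_data with three pairs (illustration only)
toy_pairs <- data.frame(
  subclass = factor(c(1, 1, 2, 2, 3, 3)),
  treat    = c(1, 0, 1, 0, 1, 0),
  re78     = c(5000, 4000, 7000, 6500, 3000, 3500)
)
res <- paired_effect(toy_pairs)
print(res$estimate)  # mean within-pair difference
print(res$conf.int)
```

With complete 1:1 pairs, the mean within-pair difference equals the difference in matched-group means (6349.144 - 5454.776 = 894.368 in the Welch output), so only the uncertainty estimate changes. On the real data the call would be paired_effect(matched_data).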
The MatchIt package simplifies this process in R.
The cobalt package provides additional balance diagnostics: https://cran.r-project.org/web/packages/cobalt