A Practical Guide to Propensity Score Designs in R: Exploring MatchIt and WeightIt on the Lalonde Dataset

Author

Dinesh Kumar

Published

November 23, 2025

1. Introduction

In randomized trials, treatment allocation is determined by design, so treated and control groups are comparable at baseline in expectation, on both measured and unmeasured characteristics. In observational settings, however, treatment is driven by clinical judgment, access, socioeconomic factors, or patient characteristics. As a result, the treated and control groups often differ even before treatment begins.

If outcomes are compared without addressing these differences, the result mixes the effect of treatment with pre-existing differences between the groups. The purpose of propensity score designs is to first make the treated and control populations comparable on observed covariates, and only then to study the outcome. These designs cannot remove imbalance in confounders that were not measured.

This tutorial focuses on the design stage and compares multiple methods from MatchIt and WeightIt. We use the well-known Lalonde dataset, which combines participants in a job-training program (the National Supported Work Demonstration) with a non-experimental comparison group drawn from a national survey (the PSID). The perspective here is practical: rather than recommending a single best method, we show how different methods alter covariate balance and sample structure.

The outcome analysis at the end is minimal by design, because the primary goal is to understand how each design handles confounding before modeling.

Choosing the right design based on the estimand

The target population (estimand) should drive the selection of the propensity score method.
The table below summarizes the commonly used estimands and the recommended designs.

Estimand	Scientific Question	Target Population	Recommended Methods	Notes
ATT (Average Treatment Effect on the Treated)	What is the treatment effect among those who actually received the treatment?	Treated subjects	Nearest neighbor matching, Full matching, Exact + hybrid matching, Matching weights (strictly the ATM; identical to ATT weights only when every propensity score is below 0.5)	Mimics a matched cohort. May discard controls when no close matches exist.
ATE (Average Treatment Effect)	What would happen if everyone in the population were treated vs. untreated?	Full study population	IPTW via logistic regression (PS), IPTW via GBM, Entropy balancing, CBPS weighting	Uses full dataset. Sensitive to extreme PS values and weight variability; needs diagnostics.
ATO (Average Treatment Effect in the Overlap Population)	What is the treatment effect among patients who realistically could have received either treatment?	Region of good overlap	Overlap weighting	Weights are bounded between 0 and 1, which gives highly stable estimates. Often the most robust choice when PS distributions do not overlap well.

The estimand determines the scientific meaning of the treatment effect.
ATT focuses on the patients who received treatment, ATE focuses on the entire population, and ATO focuses on the subgroup where the treatment decision could plausibly have gone either way. The method should always be selected to match the estimand rather than the other way around.
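To make the link between estimand and weighting concrete, the sketch below writes out the weight each unit receives under the ATE, ATT, ATO, and ATM (matching weight) estimands, given its propensity score `e` and treatment indicator `z`. These are the standard textbook formulas; WeightIt computes equivalent weights internally through its `estimand` argument, and its normalization may differ. The function name `estimand_weights` is hypothetical.

```r
# Sketch: unit-level weights implied by each estimand (binary treatment).
# e: propensity score P(z = 1 | X); z: treatment indicator (0/1).
estimand_weights <- function(e, z, estimand = c("ATE", "ATT", "ATO", "ATM")) {
  estimand <- match.arg(estimand)
  switch(estimand,
    # Inverse probability of the treatment actually received
    ATE = z / e + (1 - z) / (1 - e),
    # Treated keep weight 1; controls get odds weights e / (1 - e)
    ATT = z + (1 - z) * e / (1 - e),
    # Overlap weights: probability of receiving the *other* treatment
    ATO = z * (1 - e) + (1 - z) * e,
    # Matching weights: min(e, 1 - e) divided by P(received treatment)
    ATM = pmin(e, 1 - e) / (z * e + (1 - z) * (1 - e))
  )
}

e <- c(0.5, 0.8, 0.8)
z <- c(1,   1,   0)
sapply(c("ATE", "ATT", "ATO", "ATM"), function(est) estimand_weights(e, z, est))
```

The control unit with `e = 0.8` receives a weight of 5 under the ATE but only 0.8 under the ATO: overlap weights can never exceed 1, whereas inverse probability weights grow without bound as `e` approaches 0 or 1.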

Selecting the appropriate method: why and when to use each design

The table below summarizes the practical motivations and ideal use-cases for commonly used matching and weighting techniques.

Method	Why use it	When it is most appropriate	Considerations
Nearest Neighbor Matching	Creates a sample that closely resembles a randomized experiment by pairing similar treated and untreated individuals	When the clinical or analytical team prefers a “matched cohort” framework and the sample size is sufficiently large for good matches	Loss of sample if good matches are unavailable; results apply to the treated population (ATT)
Optimal Matching	Finds globally optimal pairs rather than local greedy matches	When nearest neighbor matching results in poor balance or inefficient pairs	Slightly more complex; still susceptible to sample loss
Full Matching	Forms subclasses that each contain one treated and one or more control individuals (or one control and several treated), improving efficiency and preserving sample size	When maintaining sample size is important while still achieving good covariate balance	Produces weights rather than strict pairs; more complex to describe
Subclassification	Divides subjects into strata based on the propensity score and compares within each stratum	When a simple and interpretable design is preferred, often as an exploratory analysis	Balance may not be perfect; strata must contain both treatment groups
Exact / Coarsened Exact Matching	Prevents unrealistic comparisons by forcing treated and untreated subjects to match within predefined covariate categories	When key categorical covariates should not be compared across levels (for example, race, disease stage)	May drop large numbers of participants when covariates are many or finely stratified
Mahalanobis Matching	Matches based on multivariate distance between covariates rather than the PS	When covariates are few and continuous and treatment assignment is not too imbalanced	Not scalable when many covariates or categorical variables exist
IPTW (Logistic Regression PS)	Creates a pseudo-population where treatment assignment is independent of covariates	When full sample retention and population-level (ATE) estimates are desired	Requires diagnostics for extreme weights; model misspecification can harm balance
Boosted IPTW (GBM)	Captures nonlinear and higher-order relationships automatically	When logistic regression weighting does not yield acceptable balance	More computationally intensive; requires tuning
Entropy Balancing	Directly balances covariate moments without explicitly modeling treatment	When strict covariate balance is required for ATE with minimal weight variability	Requires continuous or discretized covariates; less intuitive to communicate
CBPS Weighting	Integrates balance constraints into the PS estimation step	When logistic regression PS is inadequate and boosting is not preferred	Useful compromise between PS and balancing methods
Overlap Weighting	Focuses on the region where treated and control groups are most comparable	When the interest lies in a realistic treatment population and extreme PS values exist	Produces highly stable estimates; targets the ATO estimand
Matching Weights	Emulates 1:1 matching using weights rather than discarding subjects	When a matched-cohort style estimate is desired without the sample loss of strict pair matching	Targets the ATM (roughly, the population that 1:1 matching would retain); identical to ATT weighting only when every propensity score is below 0.5

In practice, method selection should be guided by the estimand, the structure of the dataset, and analytical goals. Matching is generally preferred when the scientific audience relates well to a “matched cohort” interpretation. Weighting is preferred when sample retention and statistical efficiency are priorities, particularly for survival analysis. Regardless of method, covariate balance diagnostics must be reviewed before any treatment effect is interpreted.

2. Data setup and causal question

The simplified causal question from the Lalonde study is:

Does participation in the training program (treat) improve earnings in 1978 (re78)?

We first inspect baseline differences to see how imbalanced the two groups are before any adjustment.

library(MatchIt)
library(WeightIt)
library(cobalt)
library(tidyverse)
library(gtsummary)
library(survey)
library(sandwich)
library(lmtest)

data("lalonde")

lalonde %>%
  select(treat, age, educ, race, married, nodegree, re74, re75, re78) %>%
  tbl_summary(by = treat) %>%
  add_overall() %>%
  add_p()

Characteristic	Overall N = 614¹	0 N = 429¹	1 N = 185¹	p-value²
age	25 (20, 32)	25 (19, 35)	25 (20, 29)	0.5
educ	11 (9, 12)	11 (9, 12)	11 (9, 12)	0.8
race				<0.001
black	243 (40%)	87 (20%)	156 (84%)
hispan	72 (12%)	61 (14%)	11 (5.9%)
white	299 (49%)	281 (66%)	18 (9.7%)
married	255 (42%)	220 (51%)	35 (19%)	<0.001
nodegree	387 (63%)	256 (60%)	131 (71%)	0.009
re74	1,042 (0, 7,892)	2,547 (0, 9,277)	0 (0, 1,291)	<0.001
re75	602 (0, 3,255)	1,087 (0, 3,881)	0 (0, 1,817)	<0.001
re78	4,759 (238, 10,923)	4,976 (220, 11,689)	4,232 (485, 9,643)	0.3
¹ Median (Q1, Q3); n (%)
² Wilcoxon rank sum test; Pearson’s Chi-squared test

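As expected, the treated and control groups differ substantially, most notably in pre-treatment earnings (re74, re75), race (84% of treated participants are Black versus 20% of controls), and marital status (19% versus 51% married). Imbalances of this size can produce misleading estimates if the data are analyzed without adjustment. The p-values above reflect sample size as well as the size of each difference, so the balance checks that follow rely on standardized mean differences instead.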
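The balance tables from MatchIt and cobalt below report standardized mean differences (SMDs). A minimal sketch of the calculation for one covariate follows. The denominator options reflect common conventions (treated-group SD for the ATT, pooled SD for the ATE), and the denominator is always taken from the unweighted sample so that before/after values share one scale; the function name `smd` and the toy data are hypothetical, and the packages' exact defaults may differ.

```r
# Sketch: standardized mean difference for one covariate.
# x: covariate; z: treatment indicator (0/1); w: optional weights.
smd <- function(x, z, w = rep(1, length(x)), denom = c("treated", "pooled")) {
  denom <- match.arg(denom)
  # Numerator uses the (possibly weighted) group means
  diff <- weighted.mean(x[z == 1], w[z == 1]) - weighted.mean(x[z == 0], w[z == 0])
  # Denominator uses the unweighted sample, so adjusted and unadjusted SMDs are comparable
  s <- switch(denom,
    treated = sd(x[z == 1]),                               # common choice for the ATT
    pooled  = sqrt((var(x[z == 1]) + var(x[z == 0])) / 2)  # common choice for the ATE
  )
  diff / s
}

# Toy data (not the Lalonde data): treated mean 2.5, control mean 3.5
x <- c(1, 2, 3, 4, 2, 3, 4, 5)
z <- c(1, 1, 1, 1, 0, 0, 0, 0)
smd(x, z)                    # -1 / sd(1:4), about -0.775
smd(x, z, denom = "pooled")  # same here, because both groups have equal variance
```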

We define a consistent propensity score formula to be used across all methods:

ps_formula <- treat ~ age + educ + race + married + nodegree + re74 + re75
covariates <- c("age", "educ", "race", "married", "nodegree", "re74", "re75")

3. MatchIt: Matching designs

Different matching approaches offer different trade-offs. We explore several to highlight how design decisions influence balance.

3.1 Nearest neighbor matching

This is the most commonly used approach because the matched dataset resembles the sample of a randomized trial: each treated participant is paired with one similar control.

m_nearest <- matchit(
  ps_formula,
  data = lalonde,
  method = "nearest",
  distance = "logit",
  ratio = 1
)

summary(m_nearest)


Call:
matchit(formula = ps_formula, data = lalonde, method = "nearest", 
    distance = "logit", ratio = 1)

Summary of Balance for All Data:
           Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
distance          0.5774        0.1822          1.7941     0.9211    0.3774
age              25.8162       28.0303         -0.3094     0.4400    0.0813
educ             10.3459       10.2354          0.0550     0.4959    0.0347
raceblack         0.8432        0.2028          1.7615          .    0.6404
racehispan        0.0595        0.1422         -0.3498          .    0.0827
racewhite         0.0973        0.6550         -1.8819          .    0.5577
married           0.1892        0.5128         -0.8263          .    0.3236
nodegree          0.7081        0.5967          0.2450          .    0.1114
re74           2095.5737     5619.2365         -0.7211     0.5181    0.2248
re75           1532.0553     2466.4844         -0.2903     0.9563    0.1342
           eCDF Max
distance     0.6444
age          0.1577
educ         0.1114
raceblack    0.6404
racehispan   0.0827
racewhite    0.5577
married      0.3236
nodegree     0.1114
re74         0.4470
re75         0.2876

Summary of Balance for Matched Data:
           Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
distance          0.5774        0.3629          0.9739     0.7566    0.1321
age              25.8162       25.3027          0.0718     0.4568    0.0847
educ             10.3459       10.6054         -0.1290     0.5721    0.0239
raceblack         0.8432        0.4703          1.0259          .    0.3730
racehispan        0.0595        0.2162         -0.6629          .    0.1568
racewhite         0.0973        0.3135         -0.7296          .    0.2162
married           0.1892        0.2108         -0.0552          .    0.0216
nodegree          0.7081        0.6378          0.1546          .    0.0703
re74           2095.5737     2342.1076         -0.0505     1.3289    0.0469
re75           1532.0553     1614.7451         -0.0257     1.4956    0.0452
           eCDF Max Std. Pair Dist.
distance     0.4216          0.9740
age          0.2541          1.3938
educ         0.0757          1.2474
raceblack    0.3730          1.0259
racehispan   0.1568          1.0743
racewhite    0.2162          0.8390
married      0.0216          0.8281
nodegree     0.0703          1.0106
re74         0.2757          0.7965
re75         0.2054          0.7381

Sample Sizes:
          Control Treated
All           429     185
Matched       185     185
Unmatched     244       0
Discarded       0       0

love.plot(
  m_nearest,
  stat = "mean.diffs",
  thresholds = c(m = 0.1),
  abs = TRUE,
  title = "Nearest neighbor matching"
)

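Nearest neighbor matching is often a first choice because it is intuitive and easy to communicate clinically. Here, however, 1:1 matching without a caliper leaves serious imbalance: the propensity score SMD is still 0.97, and the race indicators remain far above the 0.1 threshold (1.03 for raceblack). Every treated participant receives a match, but not necessarily a close one, and the 244 unmatched controls are discarded.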
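The greedy logic behind `method = "nearest"` can be sketched in a few lines of base R. This is a simplified illustration, not MatchIt's implementation: it matches on the propensity score itself (MatchIt's default distance scale, as the `distance` rows above show), uses no caliper, and processes treated units from the highest score down, which is an assumption here (MatchIt exposes the ordering through its `m.order` argument). The function name `nn_match` is hypothetical.

```r
# Sketch: greedy 1:1 nearest-neighbor matching without replacement, no caliper.
# ps: propensity scores; z: treatment indicator (0/1).
nn_match <- function(ps, z) {
  treated  <- which(z == 1)
  controls <- which(z == 0)
  stopifnot(length(controls) >= length(treated))
  # Assumed order: treated units with the highest PS are matched first
  treated <- treated[order(ps[treated], decreasing = TRUE)]
  available <- rep(TRUE, length(controls))
  matched_control <- integer(length(treated))
  for (k in seq_along(treated)) {
    d <- abs(ps[controls] - ps[treated[k]])
    d[!available] <- Inf          # each control can be used only once
    j <- which.min(d)             # closest remaining control, however far away
    matched_control[k] <- controls[j]
    available[j] <- FALSE
  }
  data.frame(treated = treated, control = matched_control)
}

ps <- c(0.80, 0.60, 0.75, 0.30, 0.55, 0.10)
z  <- c(1,    1,    0,    0,    0,    0)
nn_match(ps, z)   # unit 1 pairs with unit 3, unit 2 pairs with unit 5
```

Because there is no caliper, `which.min` always returns a partner even when the closest available control is far away; that is how all 185 treated units can be matched while balance stays poor.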

3.2 Optimal matching

Optimal matching looks globally across all possible pairings instead of making greedy decisions pair by pair.

m_optimal <- matchit(
  ps_formula,
  data = lalonde,
  method = "optimal",
  distance = "logit"
)

summary(m_optimal)


Call:
matchit(formula = ps_formula, data = lalonde, method = "optimal", 
    distance = "logit")

Summary of Balance for All Data:
           Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
distance          0.5774        0.1822          1.7941     0.9211    0.3774
age              25.8162       28.0303         -0.3094     0.4400    0.0813
educ             10.3459       10.2354          0.0550     0.4959    0.0347
raceblack         0.8432        0.2028          1.7615          .    0.6404
racehispan        0.0595        0.1422         -0.3498          .    0.0827
racewhite         0.0973        0.6550         -1.8819          .    0.5577
married           0.1892        0.5128         -0.8263          .    0.3236
nodegree          0.7081        0.5967          0.2450          .    0.1114
re74           2095.5737     5619.2365         -0.7211     0.5181    0.2248
re75           1532.0553     2466.4844         -0.2903     0.9563    0.1342
           eCDF Max
distance     0.6444
age          0.1577
educ         0.1114
raceblack    0.6404
racehispan   0.0827
racewhite    0.5577
married      0.3236
nodegree     0.1114
re74         0.4470
re75         0.2876

Summary of Balance for Matched Data:
           Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
distance          0.5774        0.3629          0.9739     0.7566    0.1321
age              25.8162       25.3892          0.0597     0.4536    0.0853
educ             10.3459       10.5514         -0.1022     0.5831    0.0222
raceblack         0.8432        0.4703          1.0259          .    0.3730
racehispan        0.0595        0.2162         -0.6629          .    0.1568
racewhite         0.0973        0.3135         -0.7296          .    0.2162
married           0.1892        0.2108         -0.0552          .    0.0216
nodegree          0.7081        0.6541          0.1189          .    0.0541
re74           2095.5737     2461.2839         -0.0748     1.2413    0.0520
re75           1532.0553     1697.7677         -0.0515     1.3694    0.0497
           eCDF Max Std. Pair Dist.
distance     0.4216          0.9740
age          0.2486          1.4105
educ         0.0649          1.2420
raceblack    0.3730          1.0259
racehispan   0.1568          1.0743
racewhite    0.2162          0.8390
married      0.0216          0.8281
nodegree     0.0541          1.0225
re74         0.2811          0.8209
re75         0.2054          0.7697

Sample Sizes:
          Control Treated
All           429     185
Matched       185     185
Unmatched     244       0
Discarded       0       0

love.plot(
  m_optimal,
  stat = "mean.diffs",
  thresholds = c(m = 0.1),
  abs = TRUE,
  title = "Optimal matching"
)

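Optimal matching is useful when the nearest neighbor algorithm produces poor pairings. In this dataset it yields almost the same balance as nearest neighbor matching: the propensity score SMD is 0.97 in both, and the race SMDs are unchanged. This is typical. Optimal pair matching minimizes the total within-pair distance, but it usually selects a very similar set of controls, so it seldom improves overall balance.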
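A tiny example shows how greedy and optimal pairing can disagree. MatchIt delegates `method = "optimal"` to the optmatch package; the brute-force search below is only a sketch that works for a handful of units, and the scores and function names are hypothetical.

```r
# Total absolute PS distance for a treated -> control assignment
total_distance <- function(ps_t, ps_c, match_to) sum(abs(ps_t - ps_c[match_to]))

# Greedy: highest-PS treated unit picks first, controls used once
greedy_pairs <- function(ps_t, ps_c) {
  avail <- rep(TRUE, length(ps_c))
  match_to <- integer(length(ps_t))
  for (i in order(ps_t, decreasing = TRUE)) {
    d <- abs(ps_c - ps_t[i]); d[!avail] <- Inf
    j <- which.min(d); match_to[i] <- j; avail[j] <- FALSE
  }
  match_to
}

# Optimal: enumerate every assignment without reuse and keep the cheapest
optimal_pairs <- function(ps_t, ps_c) {
  grid <- as.matrix(expand.grid(rep(list(seq_along(ps_c)), length(ps_t))))
  grid <- grid[apply(grid, 1, function(r) !anyDuplicated(r)), , drop = FALSE]
  cost <- apply(grid, 1, function(r) total_distance(ps_t, ps_c, r))
  grid[which.min(cost), ]
}

ps_t <- c(A = 0.50, B = 0.60)
ps_c <- c(C1 = 0.55, C2 = 0.90)
total_distance(ps_t, ps_c, greedy_pairs(ps_t, ps_c))   # 0.45 (B-C1, A-C2)
total_distance(ps_t, ps_c, optimal_pairs(ps_t, ps_c))  # 0.35 (A-C1, B-C2)
```

Greedy matching lets B claim C1 first and leaves A with a poor partner; the optimal solution accepts a slightly worse pair for B to improve the total. Note that both solutions use the same two controls, which is why the balance of the matched sample is often unchanged.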

3.3 Full matching

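Instead of strict 1:1 matching, full matching forms subclasses that each contain one treated unit and one or more controls, or one control and several treated units. Every subject is kept and receives a weight, which tends to make the design efficient.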

m_full <- matchit(
  ps_formula,
  data = lalonde,
  method = "full",
  distance = "logit"
)

summary(m_full)


Call:
matchit(formula = ps_formula, data = lalonde, method = "full", 
    distance = "logit")

Summary of Balance for All Data:
           Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
distance          0.5774        0.1822          1.7941     0.9211    0.3774
age              25.8162       28.0303         -0.3094     0.4400    0.0813
educ             10.3459       10.2354          0.0550     0.4959    0.0347
raceblack         0.8432        0.2028          1.7615          .    0.6404
racehispan        0.0595        0.1422         -0.3498          .    0.0827
racewhite         0.0973        0.6550         -1.8819          .    0.5577
married           0.1892        0.5128         -0.8263          .    0.3236
nodegree          0.7081        0.5967          0.2450          .    0.1114
re74           2095.5737     5619.2365         -0.7211     0.5181    0.2248
re75           1532.0553     2466.4844         -0.2903     0.9563    0.1342
           eCDF Max
distance     0.6444
age          0.1577
educ         0.1114
raceblack    0.6404
racehispan   0.0827
racewhite    0.5577
married      0.3236
nodegree     0.1114
re74         0.4470
re75         0.2876

Summary of Balance for Matched Data:
           Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
distance          0.5774        0.5762          0.0054     0.9930    0.0041
age              25.8162       24.8095          0.1407     0.4976    0.0795
educ             10.3459       10.3452          0.0004     0.5830    0.0206
raceblack         0.8432        0.8347          0.0236          .    0.0086
racehispan        0.0595        0.0657         -0.0266          .    0.0063
racewhite         0.0973        0.0996         -0.0078          .    0.0023
married           0.1892        0.1368          0.1338          .    0.0524
nodegree          0.7081        0.7056          0.0056          .    0.0025
re74           2095.5737     2363.4473         -0.0548     1.1080    0.0424
re75           1532.0553     1632.4020         -0.0312     1.8588    0.0704
           eCDF Max Std. Pair Dist.
distance     0.0486          0.0192
age          0.3131          1.3111
educ         0.0548          1.2390
raceblack    0.0086          0.0324
racehispan   0.0063          0.5400
racewhite    0.0023          0.3911
married      0.0524          0.4715
nodegree     0.0025          0.9593
re74         0.2492          0.8654
re75         0.2366          0.8099

Sample Sizes:
              Control Treated
All            429.       185
Matched (ESS)   52.11     185
Matched        429.       185
Unmatched        0.         0
Discarded        0.         0

love.plot(
  m_full,
  stat = "mean.diffs",
  thresholds = c(m = 0.1),
  abs = TRUE,
  title = "Full matching"
)

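Full matching gives far better balance than the two 1:1 designs: the propensity score SMD drops to 0.005, and every covariate is below 0.1 except age (0.14). It retains all 429 controls, but retaining subjects is not the same as retaining information: the control effective sample size (ESS) is only 52.11 because the control weights are highly variable. Full matching is a good choice when the goal is to keep the sample intact, provided the ESS is reported alongside it.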
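The "Matched (ESS)" row summarizes weight variability with Kish's effective sample size, which is also what cobalt reports. A minimal sketch, with `ess` as a hypothetical helper name:

```r
# Kish effective sample size: (sum of weights)^2 / (sum of squared weights)
ess <- function(w) sum(w)^2 / sum(w^2)

ess(rep(1, 10))         # 10: equal weights lose no information
ess(c(10, rep(1, 9)))   # 19^2 / 109, about 3.31: one heavy weight dominates
```

Applied to the control weights of `m_full` (for example `ess(m_full$weights[lalonde$treat == 0])`), this should reproduce the 52.11 shown above, assuming MatchIt uses the same formula.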

3.4 Subclassification

Subclassification divides participants into groups (for example quintiles) based on the propensity score.

m_sub <- matchit(
  ps_formula,
  data = lalonde,
  method = "subclass",
  subclass = 5
)

summary(m_sub)


Call:
matchit(formula = ps_formula, data = lalonde, method = "subclass", 
    subclass = 5)

Summary of Balance for All Data:
           Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
distance          0.5774        0.1822          1.7941     0.9211    0.3774
age              25.8162       28.0303         -0.3094     0.4400    0.0813
educ             10.3459       10.2354          0.0550     0.4959    0.0347
raceblack         0.8432        0.2028          1.7615          .    0.6404
racehispan        0.0595        0.1422         -0.3498          .    0.0827
racewhite         0.0973        0.6550         -1.8819          .    0.5577
married           0.1892        0.5128         -0.8263          .    0.3236
nodegree          0.7081        0.5967          0.2450          .    0.1114
re74           2095.5737     5619.2365         -0.7211     0.5181    0.2248
re75           1532.0553     2466.4844         -0.2903     0.9563    0.1342
           eCDF Max
distance     0.6444
age          0.1577
educ         0.1114
raceblack    0.6404
racehispan   0.0827
racewhite    0.5577
married      0.3236
nodegree     0.1114
re74         0.4470
re75         0.2876

Summary of Balance Across Subclasses
           Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
distance          0.5774        0.5487          0.1306     0.7824    0.0475
age              25.8162       26.1733         -0.0499     0.4018    0.0962
educ             10.3459       10.3500         -0.0020     0.6294    0.0189
raceblack         0.8432        0.8079          0.0973          .    0.0354
racehispan        0.0595        0.0343          0.1065          .    0.0252
racewhite         0.0973        0.1579         -0.2044          .    0.0606
married           0.1892        0.2430         -0.1375          .    0.0538
nodegree          0.7081        0.6719          0.0797          .    0.0362
re74           2095.5737     2857.7899         -0.1560     0.8637    0.0647
re75           1532.0553     1646.2761         -0.0355     1.2884    0.0373
           eCDF Max
distance     0.1130
age          0.2706
educ         0.0529
raceblack    0.0354
racehispan   0.0252
racewhite    0.0606
married      0.0538
nodegree     0.0362
re74         0.2749
re75         0.1595

Sample Sizes:
              Control Treated
All            429.       185
Matched (ESS)  103.35     185
Matched        429.       185
Unmatched        0.         0
Discarded        0.         0

love.plot(
  m_sub,
  stat = "mean.diffs",
  thresholds = c(m = 0.1),
  abs = TRUE,
  title = "Propensity score subclassification"
)

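Subclassification is simple and interpretable and often serves as a diagnostic tool to judge whether a more sophisticated design is required. With five subclasses, residual imbalance remains here for racewhite (-0.20), re74 (-0.16), married (-0.14), and the propensity score itself (0.13), which suggests that more subclasses or a finer design may be needed.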
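The mechanics can be sketched in base R: cut the propensity score into subclasses, then reweight controls so that within each subclass they count as much as the treated units there. Placing cut points at quantiles of the treated units' scores is an assumption of this sketch, and MatchIt's own cut-point rule and weight normalization may differ. The function name and the simulated data are hypothetical.

```r
# Sketch: PS subclassification with ATT-style weights.
# ps: propensity scores; z: treatment indicator (0/1); k: number of subclasses.
subclass_att <- function(ps, z, k = 5) {
  # Assumed rule: cut points at quantiles of the treated units' scores
  brks <- quantile(ps[z == 1], probs = seq(0, 1, length.out = k + 1), names = FALSE)
  brks[1] <- -Inf
  brks[k + 1] <- Inf
  s <- cut(ps, breaks = brks, labels = FALSE, include.lowest = TRUE)
  s_f <- factor(s, levels = seq_len(k))
  n_t <- as.vector(tapply(z == 1, s_f, sum))   # treated count per subclass
  n_c <- as.vector(tapply(z == 0, s_f, sum))   # control count per subclass
  if (any(n_c == 0, na.rm = TRUE)) warning("a subclass has no controls")
  # Treated keep weight 1; controls in subclass s share weight n_t[s] / n_c[s]
  w <- ifelse(z == 1, 1, n_t[s] / n_c[s])
  data.frame(subclass = s, weight = w)
}

set.seed(42)
x  <- rnorm(500)                              # simulated confounder, not Lalonde
z  <- rbinom(500, 1, plogis(-1 + x))
ps <- fitted(glm(z ~ x, family = binomial))
sc <- subclass_att(ps, z)
table(sc$subclass, z)                         # both groups should appear in every row
```

A subclass that contains treated units but no controls cannot be used, which is the "strata must contain both treatment groups" caveat from the method table.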

3.5 Exact matching with a hybrid approach

Exact matching can be added to prevent inappropriate comparisons across categorical strata.

m_exact <- matchit(
  ps_formula,
  data = lalonde,
  method = "nearest",
  distance = "logit",
  exact = ~ race
)

summary(m_exact)


Call:
matchit(formula = ps_formula, data = lalonde, method = "nearest", 
    distance = "logit", exact = ~race)

Summary of Balance for All Data:
           Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
distance          0.5774        0.1822          1.7941     0.9211    0.3774
age              25.8162       28.0303         -0.3094     0.4400    0.0813
educ             10.3459       10.2354          0.0550     0.4959    0.0347
raceblack         0.8432        0.2028          1.7615          .    0.6404
racehispan        0.0595        0.1422         -0.3498          .    0.0827
racewhite         0.0973        0.6550         -1.8819          .    0.5577
married           0.1892        0.5128         -0.8263          .    0.3236
nodegree          0.7081        0.5967          0.2450          .    0.1114
re74           2095.5737     5619.2365         -0.7211     0.5181    0.2248
re75           1532.0553     2466.4844         -0.2903     0.9563    0.1342
           eCDF Max
distance     0.6444
age          0.1577
educ         0.1114
raceblack    0.6404
racehispan   0.0827
racewhite    0.5577
married      0.3236
nodegree     0.1114
re74         0.4470
re75         0.2876

Summary of Balance for Matched Data:
           Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
distance          0.5851        0.4899          0.4319     1.2527    0.0902
age              25.5517       26.4397         -0.1241     0.4439    0.0804
educ             10.4914       10.3879          0.0515     0.3517    0.0417
raceblack         0.7500        0.7500          0.0000          .    0.0000
racehispan        0.0948        0.0948          0.0000          .    0.0000
racewhite         0.1552        0.1552          0.0000          .    0.0000
married           0.0603        0.2759         -0.5503          .    0.2155
nodegree          0.7845        0.6121          0.3792          .    0.1724
re74            887.7150     3090.7096         -0.4508     0.2353    0.1553
re75           1293.6827     1924.4269         -0.1959     1.0263    0.0941
           eCDF Max Std. Pair Dist.
distance     0.4224          0.4338
age          0.1724          1.4085
educ         0.1724          1.1405
raceblack    0.0000          0.0000
racehispan   0.0000          0.0000
racewhite    0.0000          0.0000
married      0.2155          0.5943
nodegree     0.1724          0.8722
re74         0.3879          0.7036
re75         0.2155          0.7532

Sample Sizes:
          Control Treated
All           429     185
Matched       116     116
Unmatched     313      69
Discarded       0       0

love.plot(
  m_exact,
  stat = "mean.diffs",
  thresholds = c(m = 0.1),
  abs = TRUE,
  title = "Nearest neighbor with exact matching on race"
)

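This design forces matching only within race categories, which prevents comparisons across demographic groups. Race is now perfectly balanced, but at a cost: only 116 of the 185 treated participants are matched, because there are just 87 Black controls for 156 Black treated participants and 1:1 matching without replacement uses each control once. Dropping 69 treated participants changes the estimand, since the result no longer describes all treated participants. Balance on the remaining covariates is also poor (married -0.55, re74 -0.45, nodegree 0.38).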
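The 116 matched pairs follow directly from the baseline counts in the summary table: with 1:1 matching without replacement inside each race stratum, the number of pairs cannot exceed the smaller of the two group sizes. The arithmetic, using only those published counts:

```r
# Counts by race from the baseline table (treated: treat = 1, control: treat = 0)
treated_n <- c(black = 156, hispan = 11, white = 18)
control_n <- c(black = 87,  hispan = 61, white = 281)

# Each stratum can form at most min(treated, controls) pairs
matched_per_stratum <- pmin(treated_n, control_n)
matched_per_stratum                         # black 87, hispan 11, white 18
sum(matched_per_stratum)                    # 116 matched treated participants
sum(treated_n) - sum(matched_per_stratum)   # 69 unmatched treated participants
```

The shortage is entirely in the Black stratum, which also explains why 75% of the matched treated sample is Black (87 of 116) instead of the 84% in the full treated group.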

3.6 Mahalanobis matching

Instead of using propensity scores, Mahalanobis matching pairs units based on multivariate covariate distance.

m_mahal <- matchit(
  treat ~ age + educ + re74 + re75,
  data = lalonde,
  method = "nearest",
  distance = "mahalanobis"
)

summary(m_mahal)


Call:
matchit(formula = treat ~ age + educ + re74 + re75, data = lalonde, 
    method = "nearest", distance = "mahalanobis")

Summary of Balance for All Data:
     Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean eCDF Max
age        25.8162       28.0303         -0.3094     0.4400    0.0813   0.1577
educ       10.3459       10.2354          0.0550     0.4959    0.0347   0.1114
re74     2095.5737     5619.2365         -0.7211     0.5181    0.2248   0.4470
re75     1532.0553     2466.4844         -0.2903     0.9563    0.1342   0.2876

Summary of Balance for Matched Data:
     Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean eCDF Max
age        25.8162       25.3459          0.0657     0.7704    0.0345   0.1351
educ       10.3459       10.4865         -0.0699     0.8509    0.0137   0.1081
re74     2095.5737     2542.9694         -0.0916     1.2578    0.0742   0.3351
re75     1532.0553     1651.6935         -0.0372     1.3200    0.0408   0.1838
     Std. Pair Dist.
age           0.3120
educ          0.2312
re74          0.2019
re75          0.1648

Sample Sizes:
          Control Treated
All           429     185
Matched       185     185
Unmatched     244       0
Discarded       0       0

love.plot(
  m_mahal,
  stat = "mean.diffs",
  thresholds = c(m = 0.1),
  abs = TRUE,
  title = "Mahalanobis distance matching"
)

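Mahalanobis matching is particularly useful when covariates are few and continuous. In this specification the four continuous covariates are well balanced (all absolute SMDs below 0.1), but race, married, and nodegree were left out of the formula, so they do not appear in the balance table and were not used in the matching. These results are therefore not directly comparable with the propensity score designs above.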
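One reason Mahalanobis distance suits covariates like age (years) and re74 (dollars) is that it standardizes by the covariance matrix, so changing units does not change the distances. The sketch below checks this with base R's `mahalanobis()`, which returns squared distances. The data are simulated, and the full-sample covariance used here is an assumption; MatchIt's choice of covariance matrix may differ.

```r
set.seed(1)
# Simulated covariates on very different scales (not the Lalonde data)
X <- cbind(age = rnorm(20, 27, 8), re74 = rexp(20, 1 / 3000))

# Squared Mahalanobis distance from unit 1 to every unit
d_dollars <- mahalanobis(X, center = X[1, ], cov = cov(X))

# Express earnings in thousands of dollars and recompute
X_k <- X
X_k[, "re74"] <- X_k[, "re74"] / 1000
d_thousands <- mahalanobis(X_k, center = X_k[1, ], cov = cov(X_k))

all.equal(d_dollars, d_thousands)   # TRUE: distances do not depend on units
```

A plain Euclidean distance on the same columns would be driven almost entirely by re74, simply because dollars are large numbers.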

3.7 Coarsened exact matching (CEM)

Coarsened exact matching first coarsens continuous covariates into bins (here, using analyst-supplied cutpoints) and then matches exactly on the combination of bins, so comparisons occur only within the same region of covariate space.

m_cem <- matchit(
  ps_formula,
  data = lalonde,
  method = "cem",
  cutpoints = list(
    age = c(20, 30, 40, 50, 60),
    re74 = c(0, 5000, 10000, 20000),
    re75 = c(0, 5000, 10000, 20000)
  )
)

summary(m_cem)


Call:
matchit(formula = ps_formula, data = lalonde, method = "cem", 
    cutpoints = list(age = c(20, 30, 40, 50, 60), re74 = c(0, 
        5000, 10000, 20000), re75 = c(0, 5000, 10000, 20000)))

Summary of Balance for All Data:
           Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
age              25.8162       28.0303         -0.3094     0.4400    0.0813
educ             10.3459       10.2354          0.0550     0.4959    0.0347
raceblack         0.8432        0.2028          1.7615          .    0.6404
racehispan        0.0595        0.1422         -0.3498          .    0.0827
racewhite         0.0973        0.6550         -1.8819          .    0.5577
married           0.1892        0.5128         -0.8263          .    0.3236
nodegree          0.7081        0.5967          0.2450          .    0.1114
re74           2095.5737     5619.2365         -0.7211     0.5181    0.2248
re75           1532.0553     2466.4844         -0.2903     0.9563    0.1342
           eCDF Max
age          0.1577
educ         0.1114
raceblack    0.6404
racehispan   0.0827
racewhite    0.5577
married      0.3236
nodegree     0.1114
re74         0.4470
re75         0.2876

Summary of Balance for Matched Data:
           Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
age              23.5798       22.9714          0.0850     0.7062    0.0322
educ             10.7311       10.6671          0.0318     0.8910    0.0049
raceblack         0.8487        0.8487          0.0000          .    0.0000
racehispan        0.0252        0.0252          0.0000          .    0.0000
racewhite         0.1261        0.1261          0.0000          .    0.0000
married           0.0588        0.0588          0.0000          .    0.0000
nodegree          0.6303        0.6303          0.0000          .    0.0000
re74           1377.3583     1622.3136         -0.0501     0.7671    0.0329
re75            866.5467     1221.9493         -0.1104     0.9496    0.0691
           eCDF Max Std. Pair Dist.
age          0.2033          0.2734
educ         0.0601          0.1949
raceblack    0.0000          0.0000
racehispan   0.0000          0.0000
racewhite    0.0000          0.0000
married      0.0000          0.0000
nodegree     0.0000          0.0000
re74         0.2256          0.1632
re75         0.1709          0.2707

Sample Sizes:
              Control Treated
All            429.       185
Matched (ESS)   30.87     119
Matched        130.       119
Unmatched      299.        66
Discarded        0.         0

love.plot(
  m_cem,
  stat = "mean.diffs",
  thresholds = c(m = 0.1),
  abs = TRUE,
  title = "Coarsened exact matching"
)

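The analyst controls how strict the comparisons are by setting the cutpoints. Finer bins give closer matches but leave more strata without both a treated and a control participant. Here 66 of the 185 treated participants fall into such strata and are dropped, so, as with exact matching on race, the estimate no longer applies to the full treated group. The control ESS of 30.87 also shows that the 130 retained controls carry very unequal weights.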
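The core of CEM is easy to sketch: coarsen, build a stratum key from all covariates, and keep only strata that contain both groups. This base-R version is a simplified illustration with a hypothetical function name. It treats each cutpoint vector as interior break points with open-ended outer bins, and it matches unlisted variables exactly; MatchIt's binning conventions (including its automatic binning of continuous variables without cutpoints) may differ.

```r
# Sketch: coarsened exact matching strata.
# covs: data frame of covariates; z: treatment indicator (0/1);
# cutpoints: named list of interior break points for the variables to coarsen.
cem_strata <- function(covs, z, cutpoints) {
  coarse <- covs
  for (v in names(cutpoints)) {
    # Open-ended outer bins so that every value falls into some bin
    coarse[[v]] <- cut(covs[[v]], breaks = c(-Inf, cutpoints[[v]], Inf), right = FALSE)
  }
  # A stratum is the combination of all coarsened (or original) values
  key <- do.call(paste, c(unname(as.list(coarse)), sep = "|"))
  has_t <- tapply(z == 1, key, any)
  has_c <- tapply(z == 0, key, any)
  # Keep a unit only if its stratum contains both treated and controls
  data.frame(stratum = key, keep = as.vector(has_t[key] & has_c[key]))
}

covs <- data.frame(age = c(22, 25, 35, 38, 45), married = c(0, 0, 1, 1, 0))
z    <- c(1, 0, 1, 0, 1)
cem_strata(covs, z, cutpoints = list(age = c(20, 30, 40)))
# The treated 45-year-old has no control in its stratum and is dropped
```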

4. WeightIt: Weighting designs

Where matching creates pairs or sets, weighting creates a pseudo-population in which treatment is independent of observed covariates. We now explore several types of weight construction.

4.1 Logistic regression IPTW

w_ps <- weightit(
  ps_formula,
  data = lalonde,
  method = "ps",
  estimand = "ATE"
)

summary(w_ps$weights)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.009   1.052   1.170   1.905   1.623  40.077

bal.tab(w_ps, thresholds = c(m = 0.1))

Balance Measures
                Type Diff.Adj        M.Threshold
prop.score  Distance   0.1360                   
age          Contin.  -0.1676 Not Balanced, >0.1
educ         Contin.   0.1296 Not Balanced, >0.1
race_black    Binary   0.0499     Balanced, <0.1
race_hispan   Binary   0.0047     Balanced, <0.1
race_white    Binary  -0.0546     Balanced, <0.1
married       Binary  -0.0944     Balanced, <0.1
nodegree      Binary  -0.0547     Balanced, <0.1
re74         Contin.  -0.2740 Not Balanced, >0.1
re75         Contin.  -0.1579 Not Balanced, >0.1

Balance tally for mean differences
                   count
Balanced, <0.1         5
Not Balanced, >0.1     4

Variable with the greatest mean difference
 Variable Diff.Adj        M.Threshold
     re74   -0.274 Not Balanced, >0.1

Effective sample sizes
           Control Treated
Unadjusted  429.    185.  
Adjusted    329.01   58.33

love.plot(
  w_ps,
  thresholds = c(m = 0.1),
  abs = TRUE,
  title = "IPTW via logistic regression"
)

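This is the most commonly used weighting method, but weight variability must be checked. Here the largest weight is 40.08, the effective sample size of the treated group falls from 185 to 58.33, and the four continuous covariates (age, educ, re74, re75) remain above the 0.1 threshold, with re74 the worst at -0.27. This suggests that a main-effects logistic model is not flexible enough for this dataset.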
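The steps behind logistic-regression IPTW for the ATE can be written out in base R: fit the propensity model, invert the probability of the treatment each unit received, and then inspect the weights. The data below are simulated with a single confounder, and the 99th-percentile truncation at the end is shown only as a common option, not something used in this tutorial; WeightIt's internal handling may differ in detail.

```r
set.seed(2025)
# Simulated data (not Lalonde): x raises the probability of treatment
n <- 1000
x <- rnorm(n)
z <- rbinom(n, 1, plogis(-1 + 1.5 * x))

# Step 1: propensity score from a logistic regression
e <- fitted(glm(z ~ x, family = binomial))

# Step 2: ATE weights, 1 / P(treatment actually received)
w <- ifelse(z == 1, 1 / e, 1 / (1 - e))

# Step 3: diagnostics
summary(w)                                                 # look for a long right tail
ess_by_group <- tapply(w, z, function(v) sum(v)^2 / sum(v^2))
ess_by_group                                               # compare with table(z)
wm <- c(treated = weighted.mean(x[z == 1], w[z == 1]),
        control = weighted.mean(x[z == 0], w[z == 0]),
        overall = mean(x))
wm                                                         # both arms should sit near the overall mean

# Optional: cap weights at the 99th percentile (trades some bias for lower variance)
w_trunc <- pmin(w, quantile(w, 0.99))
```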

4.2 Boosted IPTW

set.seed(404)
w_gbm <- weightit(
  ps_formula,
  data = lalonde,
  method = "gbm",
  estimand = "ATE"
)

summary(w_gbm$weights)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.008   1.045   1.093   1.528   1.419  21.175

bal.tab(w_gbm, thresholds = c(m = 0.1))

Balance Measures
                Type Diff.Adj        M.Threshold
prop.score  Distance   1.2300                   
age          Contin.  -0.2854 Not Balanced, >0.1
educ         Contin.   0.1153 Not Balanced, >0.1
race_black    Binary   0.2534 Not Balanced, >0.1
race_hispan   Binary   0.0186     Balanced, <0.1
race_white    Binary  -0.2720 Not Balanced, >0.1
married       Binary  -0.1579 Not Balanced, >0.1
nodegree      Binary   0.0322     Balanced, <0.1
re74         Contin.  -0.2953 Not Balanced, >0.1
re75         Contin.  -0.0960     Balanced, <0.1

Balance tally for mean differences
                   count
Balanced, <0.1         3
Not Balanced, >0.1     6

Variable with the greatest mean difference
 Variable Diff.Adj        M.Threshold
     re74  -0.2953 Not Balanced, >0.1

Effective sample sizes
           Control Treated
Unadjusted  429.    185.  
Adjusted    278.81   78.94

love.plot(
  w_gbm,
  thresholds = c(m = 0.1),
  abs = TRUE,
  title = "IPTW via boosted models"
)

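GBM can capture nonlinear confounding patterns without the analyst specifying them. In this run, however, balance is worse than with logistic IPTW: six of nine covariates exceed the 0.1 threshold, including race_black (0.25) and race_white (-0.27). Flexibility does not guarantee balance; the result depends on tuning choices such as the stopping criterion and the number of trees, and balance should be re-checked after any tuning.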

4.3 Entropy balancing

w_ebal <- weightit(
  ps_formula,
  data = lalonde,
  method = "ebal",
  estimand = "ATE"
)

summary(w_ebal$weights)

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
 0.07148  0.59725  0.74882  1.00000  0.92261 16.04209

bal.tab(w_ebal, thresholds = c(m = 0.1))

Balance Measures
               Type Diff.Adj    M.Threshold
age         Contin.       -0 Balanced, <0.1
educ        Contin.       -0 Balanced, <0.1
race_black   Binary        0 Balanced, <0.1
race_hispan  Binary       -0 Balanced, <0.1
race_white   Binary        0 Balanced, <0.1
married      Binary       -0 Balanced, <0.1
nodegree     Binary        0 Balanced, <0.1
re74        Contin.        0 Balanced, <0.1
re75        Contin.        0 Balanced, <0.1

Balance tally for mean differences
                   count
Balanced, <0.1         9
Not Balanced, >0.1     0

Variable with the greatest mean difference
 Variable Diff.Adj    M.Threshold
      age       -0 Balanced, <0.1

Effective sample sizes
           Control Treated
Unadjusted  429.    185.  
Adjusted    342.55   40.36

love.plot(
  w_ebal,
  thresholds = c(m = 0.1),
  abs = TRUE,
  title = "Entropy balancing"
)

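Entropy balancing does not model treatment at all, which is why no prop.score row appears above. Instead, it solves directly for the weights that are as close as possible to uniform, in an entropy sense, while exactly matching the specified covariate moments (here, the means) to the target population. That is why every difference above is 0. The price is weight concentration: the treated effective sample size falls from 185 to about 40.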
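The base-R sketch below makes the idea concrete for the ATE. It assumes uniform base weights and matches only the means. Each group's weights take the form exp(x'λ). λ is found by minimizing a convex log-sum-exp objective, whose gradient is the gap between the weighted means and the target means. The simulated covariates and the `ebal_group()` helper are hypothetical. WeightIt's own implementation may differ in details such as moment options and numerical safeguards.

```r
# Entropy balancing for one group (sketch): weights closest to uniform, in the
# entropy sense, whose weighted covariate means equal `target`
ebal_group <- function(X, target) {
  Xc <- sweep(X, 2, target)              # centre covariates at the target means
  obj <- function(lambda) {              # convex dual objective: log-sum-exp
    eta <- drop(Xc %*% lambda)
    m <- max(eta)
    m + log(sum(exp(eta - m)))           # numerically stable log-sum-exp
  }
  grad <- function(lambda) {             # weighted mean of Xc; zero at the optimum
    eta <- drop(Xc %*% lambda)
    p <- exp(eta - max(eta))
    p <- p / sum(p)
    drop(crossprod(Xc, p))
  }
  fit <- optim(rep(0, ncol(X)), obj, grad, method = "BFGS",
               control = list(reltol = 1e-12, maxit = 1000))
  eta <- drop(Xc %*% fit$par)
  w <- exp(eta - max(eta))
  w / mean(w)                            # rescale so weights average 1 in the group
}

set.seed(2)
n <- 600
X <- cbind(x1 = rnorm(n), x2 = rbinom(n, 1, 0.4))   # hypothetical covariates
treat <- rbinom(n, 1, plogis(-0.5 + 0.8 * X[, "x1"] + 0.7 * X[, "x2"]))

# ATE target: full-sample covariate means, matched separately in each group
target <- colMeans(X)
w <- numeric(n)
w[treat == 1] <- ebal_group(X[treat == 1, , drop = FALSE], target)
w[treat == 0] <- ebal_group(X[treat == 0, , drop = FALSE], target)

rbind(target,
      treated = colSums(X[treat == 1, ] * w[treat == 1]) / sum(w[treat == 1]),
      control = colSums(X[treat == 0, ] * w[treat == 0]) / sum(w[treat == 0]))
```

Because no treatment model is fitted, balance on the chosen moments is guaranteed whenever a solution exists. Weight stability is not guaranteed, and has to be checked separately.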

4.4 Covariate balancing propensity score

w_cbps <- weightit(
  ps_formula,
  data = lalonde,
  method = "cbps",
  estimand = "ATE"
)

summary(w_cbps$weights)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.001   1.029   1.179   2.118   1.529  60.190

bal.tab(w_cbps, thresholds = c(m = 0.1))

Balance Measures
                Type Diff.Adj    M.Threshold
prop.score  Distance  -0.2231               
age          Contin.   0.0001 Balanced, <0.1
educ         Contin.   0.0002 Balanced, <0.1
race_black    Binary   0.0000 Balanced, <0.1
race_hispan   Binary   0.0000 Balanced, <0.1
race_white    Binary  -0.0000 Balanced, <0.1
married       Binary   0.0000 Balanced, <0.1
nodegree      Binary   0.0000 Balanced, <0.1
re74         Contin.   0.0000 Balanced, <0.1
re75         Contin.   0.0000 Balanced, <0.1

Balance tally for mean differences
                   count
Balanced, <0.1         9
Not Balanced, >0.1     0

Variable with the greatest mean difference
 Variable Diff.Adj    M.Threshold
     educ   0.0002 Balanced, <0.1

Effective sample sizes
           Control Treated
Unadjusted  429.    185.  
Adjusted    280.01   44.21

love.plot(
  w_cbps,
  thresholds = c(m = 0.1),
  abs = TRUE,
  title = "CBPS weighting"
)

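CBPS integrates balance criteria into the PS estimation step. The logistic model for treatment is fitted so that the implied weights balance the covariates, rather than only maximizing the likelihood. Mean balance is essentially exact here. The weights, however, are highly variable: the maximum is about 60 and the treated effective sample size is about 44. Balance and weight stability therefore need to be checked together.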
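The following base-R sketch illustrates the just-identified form of CBPS for the ATE. The logistic coefficients are chosen so that the IPTW-weighted sums of each design-matrix column are equal in the two groups. The system is solved by Newton's method, starting from the maximum-likelihood fit. The helper `cbps_ate_just()` and the simulated data are hypothetical. WeightIt's "cbps" method may use a different formulation, for example an over-identified one.

```r
# Just-identified CBPS for the ATE (sketch): solve the balance equations
# sum_i x_i * (T_i / e_i - (1 - T_i) / (1 - e_i)) = 0 for the logistic coefficients
cbps_ate_just <- function(Xd, treat, tol = 1e-10, maxit = 50) {
  beta <- glm.fit(Xd, treat, family = binomial())$coefficients  # MLE start
  converged <- FALSE
  for (i in seq_len(maxit)) {
    e <- plogis(drop(Xd %*% beta))
    # Balance moments: weighted treated sums minus weighted control sums
    g <- drop(crossprod(Xd, treat / e - (1 - treat) / (1 - e))) / nrow(Xd)
    if (max(abs(g)) < tol) { converged <- TRUE; break }
    # Jacobian of g with respect to beta (negative definite)
    d <- treat * (1 - e) / e + (1 - treat) * e / (1 - e)
    J <- -crossprod(Xd, Xd * d) / nrow(Xd)
    beta <- beta - solve(J, g)                 # Newton step
  }
  if (!converged) warning("CBPS sketch did not converge")
  e <- plogis(drop(Xd %*% beta))
  list(beta = beta, ps = e, weights = ifelse(treat == 1, 1 / e, 1 / (1 - e)))
}

set.seed(3)
n <- 800
x1 <- rnorm(n); x2 <- rbinom(n, 1, 0.4)        # hypothetical covariates
treat <- rbinom(n, 1, plogis(-0.3 + 0.7 * x1 + 0.6 * x2))
Xd <- cbind(intercept = 1, x1 = x1, x2 = x2)
fit <- cbps_ate_just(Xd, treat)
w <- fit$weights

# Weighted mean differences between groups (essentially zero)
sapply(c("x1", "x2"), function(v)
  weighted.mean(Xd[treat == 1, v], w[treat == 1]) -
  weighted.mean(Xd[treat == 0, v], w[treat == 0]))
```

The intercept column forces the two weight totals to be equal, so equal weighted sums imply equal weighted means. Nothing in these equations limits how large individual weights become.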

4.5 Overlap weights

w_overlap <- weightit(
  ps_formula,
  data     = lalonde,
  method   = "glm",
  estimand = "ATO"
)


summary(w_overlap$weights)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.00908 0.04906 0.14511 0.25464 0.38398 0.97505

bal.tab(w_overlap, thresholds = c(m = 0.1))

Balance Measures
                Type Diff.Adj    M.Threshold
prop.score  Distance  -0.0284 Balanced, <0.1
age          Contin.  -0.0000 Balanced, <0.1
educ         Contin.  -0.0000 Balanced, <0.1
race_black    Binary   0.0000 Balanced, <0.1
race_hispan   Binary   0.0000 Balanced, <0.1
race_white    Binary  -0.0000 Balanced, <0.1
married       Binary  -0.0000 Balanced, <0.1
nodegree      Binary   0.0000 Balanced, <0.1
re74         Contin.  -0.0000 Balanced, <0.1
re75         Contin.  -0.0000 Balanced, <0.1

Balance tally for mean differences
                   count
Balanced, <0.1        10
Not Balanced, >0.1     0

Variable with the greatest mean difference
 Variable Diff.Adj    M.Threshold
      age       -0 Balanced, <0.1

Effective sample sizes
           Control Treated
Unadjusted   429.   185.  
Adjusted     166.1  145.64

love.plot(
  w_overlap,
  thresholds = c(m = 0.1),
  abs = TRUE,
  title = "Overlap weighting"
)

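Overlap weights focus on the region where treatment assignments are most comparable. Each treated unit is weighted by 1 - e and each control by e, where e is the propensity score. Units with scores near 0.5 therefore count the most, and units with scores near 0 or 1 contribute little. The estimand is the ATO, the effect in this overlap population, not the ATE. With a logistic PS model the weighted covariate means match exactly. Because the weights stay between 0 and 1, the effective sample sizes remain high (about 166 control and 146 treated).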
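The exact zeros in the overlap-weighting balance table are not a coincidence. The logistic-regression score equations, sum((T - e) * x) = 0, can be rewritten as sum(T * (1 - e) * x) = sum((1 - T) * e * x). These are exactly the overlap-weighted balance conditions for every covariate in the PS model. The base-R check below uses simulated data from an assumed data-generating model.

```r
set.seed(4)
n <- 700
x1 <- rnorm(n); x2 <- rbinom(n, 1, 0.5)        # hypothetical covariates
treat <- rbinom(n, 1, plogis(-0.4 + 1.0 * x1 - 0.8 * x2))

# Logistic propensity score with a tight convergence tolerance
ps <- fitted(glm(treat ~ x1 + x2, family = binomial,
                 control = glm.control(epsilon = 1e-12, maxit = 100)))

# Overlap (ATO) weights: treated units get 1 - e, controls get e
w_ato <- ifelse(treat == 1, 1 - ps, ps)

# Weighted mean differences are zero up to the glm convergence tolerance
c(x1 = weighted.mean(x1[treat == 1], w_ato[treat == 1]) -
       weighted.mean(x1[treat == 0], w_ato[treat == 0]),
  x2 = weighted.mean(x2[treat == 1], w_ato[treat == 1]) -
       weighted.mean(x2[treat == 0], w_ato[treat == 0]))
```

This exact-balance property holds only for terms included in the logistic model. Covariates or transformations left out of ps_formula are not guaranteed to balance.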

4.6 Matching weights

w_matching <- weightit(
  ps_formula,
  data     = lalonde,
  method   = "glm",
  estimand = "ATM"
)


summary(w_matching$weights)

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
0.009163 0.051588 0.169745 0.359548 0.623320 1.000000

bal.tab(w_matching, thresholds = c(m = 0.1))

Balance Measures
                Type Diff.Adj    M.Threshold
prop.score  Distance  -0.0315 Balanced, <0.1
age          Contin.  -0.0095 Balanced, <0.1
educ         Contin.  -0.0121 Balanced, <0.1
race_black    Binary   0.0008 Balanced, <0.1
race_hispan   Binary  -0.0015 Balanced, <0.1
race_white    Binary   0.0007 Balanced, <0.1
married       Binary   0.0077 Balanced, <0.1
nodegree      Binary   0.0102 Balanced, <0.1
re74         Contin.   0.0152 Balanced, <0.1
re75         Contin.   0.0007 Balanced, <0.1

Balance tally for mean differences
                   count
Balanced, <0.1        10
Not Balanced, >0.1     0

Variable with the greatest mean difference
 Variable Diff.Adj    M.Threshold
     re74   0.0152 Balanced, <0.1

Effective sample sizes
           Control Treated
Unadjusted  429.    185.  
Adjusted    147.32  154.18

love.plot(
  w_matching,
  thresholds = c(m = 0.1),
  abs = TRUE,
  title = "Matching weights"
)

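Matching weights can be viewed as a weighting analogue of 1:1 propensity score matching (PSM). Each unit's weight is min(e, 1 - e) divided by the probability of the treatment it actually received, so no weight exceeds 1. Here they achieve balance comparable to overlap weighting (the largest difference is 0.015), with effective sample sizes of about 147 control and 154 treated.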
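The base-R sketch below shows how matching weights are formed from a propensity score, using simulated data under illustrative assumptions. The numerator min(e, 1 - e) never exceeds the denominator, so the weights are bounded by 1. This matches the maximum of exactly 1 in the weight summary above.

```r
set.seed(5)
n <- 700
x1 <- rnorm(n)                                 # hypothetical covariate
treat <- rbinom(n, 1, plogis(-0.5 + 1.2 * x1))
ps <- fitted(glm(treat ~ x1, family = binomial))

# Matching (ATM) weights: min(e, 1 - e) over the probability of the
# treatment actually received
w_atm <- pmin(ps, 1 - ps) / ifelse(treat == 1, ps, 1 - ps)

summary(w_atm)   # bounded above by 1, like a 1:1 match indicator
```

A treated unit with e at or below 0.5 gets weight exactly 1, as does a control with e at or above 0.5. These are the units that would almost always find a match in 1:1 PSM.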

5. Outcome illustration for two designs

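To avoid diluting the focus of the tutorial, outcome models are illustrated only for:

- nearest neighbor matching, and
- overlap weighting.

The two estimates target different populations (nearest neighbor matching here targets the ATT, while overlap weighting targets the ATO), so they are not expected to coincide.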

5.1 After nearest neighbor matching

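# Extract the matched sample, including the `weights` and `subclass` columns
dat_m <- match.data(m_nearest)

# Regress the outcome on treatment in the matched sample, using the matching weights.
# Note: plain lm() standard errors ignore pair membership; the MatchIt documentation
# recommends cluster-robust standard errors clustered on `subclass`.
fit_match <- lm(
  re78 ~ treat,
  data = dat_m,
  weights = weights
)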

tbl_regression(fit_match, estimate_fun = purrr::partial(style_sigfig, digits = 3)) %>%
  as_gt() %>%
  gt::tab_header(title = "Treatment effect after nearest neighbor matching")

Treatment effect after nearest neighbor matching
Characteristic	Beta	95% CI	p-value
treat	894	-542, 2,330	0.2
Abbreviation: CI = Confidence Interval

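With a single binary regressor, the coefficient on treat from a weighted regression equals the difference between the weighted group means. The estimate above is therefore a weighted mean difference in re78. The quick base-R check below uses made-up data; the outcome model and weights are arbitrary assumptions.

```r
set.seed(6)
n <- 200
treat <- rbinom(n, 1, 0.4)
y <- 1000 + 900 * treat + rnorm(n, sd = 500)   # hypothetical outcome
w <- runif(n, 0.5, 2)                          # arbitrary positive weights

fit <- lm(y ~ treat, weights = w)

# Difference in weighted means between treated and control units
diff_w <- weighted.mean(y[treat == 1], w[treat == 1]) -
          weighted.mean(y[treat == 0], w[treat == 0])

c(lm_coef = unname(coef(fit)["treat"]), weighted_diff = diff_w)
```

Without additional covariates, the outcome model mostly determines how the standard error is computed, not the point estimate itself.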
5.2 After overlap weighting

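# Attach the overlap weights to the analysis data
dat_w <- lalonde %>%
  mutate(weight_overlap = w_overlap$weights)

# Survey design using the overlap weights; svyglm() then reports
# design-based (robust) standard errors
design_overlap <- svydesign(
  ids = ~1,
  weights = ~weight_overlap,
  data = dat_w
)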

fit_overlap <- svyglm(
  re78 ~ treat,
  design = design_overlap
)

tbl_regression(fit_overlap, estimate_fun = purrr::partial(style_sigfig, digits = 3)) %>%
  as_gt() %>%
  gt::tab_header(title = "Treatment effect after overlap weighting")

Treatment effect after overlap weighting
Characteristic	Beta	95% CI	p-value
treat	1,242	-281, 2,766	0.11
Abbreviation: CI = Confidence Interval

6. Interpretation and choosing a design

No single method is universally optimal. Good practice is to compare designs with respect to:

1. Covariate balance (an absolute standardized mean difference below 0.1 is a common threshold).
2. Effective sample size (matching may drop observations, and variable weights shrink the effective sample size).
3. Weight stability (avoid extreme or highly variable weights).
4. Target population (ATT, ATE, ATO, or ATM).
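For criterion 1, the balance statistic can be computed by hand. The sketch below divides the weighted mean difference by the pooled unweighted standard deviation of the two groups, so the denominator is the same for every design being compared. cobalt uses a similar convention for continuous covariates, but by default it reports raw differences in proportions for binary ones. The `smd_w()` helper and the simulated data are hypothetical.

```r
# Weighted standardized mean difference for one covariate: weighted mean
# difference divided by the pooled *unweighted* SD of the two groups
smd_w <- function(x, treat, w = rep(1, length(x))) {
  m1 <- weighted.mean(x[treat == 1], w[treat == 1])
  m0 <- weighted.mean(x[treat == 0], w[treat == 0])
  s_pooled <- sqrt((var(x[treat == 1]) + var(x[treat == 0])) / 2)
  (m1 - m0) / s_pooled
}

set.seed(7)
n <- 500
x <- rnorm(n)                                  # hypothetical confounder
treat <- rbinom(n, 1, plogis(-0.5 + x))
ps <- fitted(glm(treat ~ x, family = binomial))
w_ate <- ifelse(treat == 1, 1 / ps, 1 / (1 - ps))

# Balance before and after IPTW
c(unadjusted = smd_w(x, treat), iptw = smd_w(x, treat, w_ate))
```

Keeping the denominator fixed means that any change in the SMD reflects a change in the mean difference, not a change in the scale used to standardize it.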

In real-world evidence (RWE) studies, overlap weighting and full matching are often strong choices because they combine good balance with efficient use of the sample. Nearest neighbor matching remains common because it closely resembles the familiar idea of a matched cohort.

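While nearest neighbor matching is the most popular method, notice how entropy balancing and overlap weighting achieved near-perfect mean balance (SMDs \(\approx\) 0) in this dataset without discarding any observations. Their costs differ, though. Entropy balancing cut the treated effective sample size from 185 to about 40, while overlap weighting kept it near 146, partly because it targets the ATO rather than the ATE. In my own practice, I am increasingly moving toward these weighting methods for survival analysis to preserve statistical power.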

7. References and additional material

MatchIt documentation
https://cran.r-project.org/web/packages/MatchIt/vignettes/MatchIt.html
https://kosukeimai.github.io/MatchIt/

WeightIt documentation
https://cran.r-project.org/web/packages/WeightIt/vignettes/WeightIt.html
https://github.com/ngreifer/WeightIt

cobalt documentation
https://cran.r-project.org/web/packages/cobalt/vignettes/cobalt.html
https://github.com/ngreifer/cobalt

Lalonde, R. J. (1986). Evaluating the econometric evaluations of training programs with experimental data. American Economic Review, 76(4), 604-620.