Propensity Matching and the dm2200 data

Author

Thomas E. Love, Ph.D.

Published

2024-01-09

1 Setup

This document demonstrates multiple matching strategies incorporating the propensity score, including the assessment of covariate balance before and after matching. We focus on binary and quantitative outcomes in a (simulated) electronic health records data setting. It uses the cobalt package extensively. See the Key References section at the end of the document.

Code

library(knitr)

opts_chunk$set(comment = NA) # do not remove this
options(max.print="250")
opts_knit$set(width=75)

library(janitor) # load other packages as desired
library(skimr)
library(tableone)
library(broom)
library(Epi)
library(survival)
library(Matching)
library(MatchIt)
library(cobalt)
library(lme4)
library(survey)
library(rbounds)
library(tidyverse) # load tidyverse last

## Note that we will also use the broom.mixed package
## but we won't load it here

theme_set(theme_bw()) # set theme for ggplots

2 The `dm2200` data set

I’ve simulated data to match real information we’ve collected over the years at Better Health Partnership on adults who live with diabetes. These data mirror some of the real data collected from electronic health records across the region by Better Health Partnership, but individual values have been permuted across patients, so the results are not applicable to any population. The data I simulated from was a subset of Better Health data that required that the subject fall into exactly one of the two exposure groups we’ll study, that they live in Cuyahoga County, prefer English for health-related communications, and have no missing data on the variables we’ll study.

The exposure we’ll study is called exposure and consists of two levels: A and B. I won’t specify the details further on how the exposure is determined, except to say that it is uniquely determinable for each subject.
We’ll study a binary outcome, specifically whether the subject’s blood pressure is in control, in the sense that both their systolic blood pressure is below 140 mm Hg, and their diastolic blood pressure is below 90 mm Hg.
We’ll also study a continuous outcome, the subject’s body-mass index or bmi.

Code

dm2200 <- read_csv("data/dm2200.csv", 
                   show_col_types = FALSE) %>% 
    type.convert(as.is = FALSE) %>% # characters to factors
    mutate(subject = as.character(subject),
           bp_good = as.numeric(sbp < 140 & dbp < 90))

dm2200

# A tibble: 2,200 × 26
   subject exposure   age race      hisp sex   insur      nincome nhsgrad cleve
   <chr>   <fct>    <int> <fct>    <int> <fct> <fct>        <dbl>   <int> <int>
 1 S-0001  A           67 Black_AA     0 F     Medicare     24700      72     1
 2 S-0002  B           56 White        0 F     Commercial   62300      85     0
 3 S-0003  B           41 Black_AA     0 M     Medicare      6500      49     1
 4 S-0004  A           56 Black_AA     0 M     Medicaid     45400      86     0
 5 S-0005  B           69 Black_AA     0 F     Medicare     15400      72     0
 6 S-0006  B           44 Black_AA     0 M     Medicaid     34100      87     0
 7 S-0007  B           47 Black_AA     0 F     Medicaid     13900      42     1
 8 S-0008  B           60 Black_AA     0 M     Medicaid     30800      91     0
 9 S-0009  B           67 Other        1 F     Medicare     47100      90     1
10 S-0010  B           70 Black_AA     0 F     Medicare     20800      69     1
# ℹ 2,190 more rows
# ℹ 16 more variables: height_cm <int>, weight_kg <int>, bmi <dbl>, a1c <dbl>,
#   sbp <int>, dbp <int>, ldl <int>, visits <int>, tobacco <fct>, statin <int>,
#   ace_arb <int>, betab <int>, depr_dx <int>, eyeex <int>, pneumo <int>,
#   bp_good <dbl>

2.1 Codebook

Note: I used paste(colnames(dm2200), collapse = " | ") to help me make this list.

Variable	Type	Description
`subject`	character	subject identifier (S-0001 to S-2200)
`exposure`	factor (2 levels)	A or B
`age`	integer	age in years
`race`	factor (4 levels)	White, Black_AA, Asian, Other
`hisp`	1/0	1 = Hispanic or Latinx, 0 = not
`sex`	F/M	F = Female, M = Male
`insur`	factor (4 levels)	Insurance: Medicare, Commercial, Medicaid or Uninsured
`nincome`	integer	est. Neighborhood Median Income, in $
`nhsgrad`	integer	est. % of adults in Neighborhood who are High School graduates
`cleve`	1/0	1 = Cleveland resident, 0 = resident of suburbs
`height_cm`	integer	height in cm
`weight_kg`	integer	weight in kg
`bmi`	numeric	body mass index (kg/m²)
`a1c`	numeric	most recent Hemoglobin A1c (in %)
`sbp`	numeric	most recent systolic blood pressure (in mm Hg)
`dbp`	numeric	most recent diastolic blood pressure (in mm Hg)
`bp_good`	1/0	1 if `sbp` < 140 and `dbp` < 90, 0 otherwise
`ldl`	numeric	most recent LDL cholesterol (in mg/dl)
`visits`	integer	primary care office visits in past year
`tobacco`	factor (3 levels)	Tobacco use: Current, Former, Never
`statin`	1/0	1 if subject had a statin prescription in the past year
`ace_arb`	1/0	1 if subject had an ACE inhibitor or ARB prescription in the past year
`betab`	1/0	1 if subject had a beta-blocker prescription in the past year
`depr_dx`	1/0	1 if the subject has a depression diagnosis
`eyeex`	1/0	1 if the subject has had a retinal eye exam in the past year
`pneumo`	1/0	1 if the subject has had a pneumococcal vaccination in the past 10 years

2.2 Comparing Exposure Groups with `tableone`

Code

t1 <- CreateTableOne(
    vars = c("age", "race", "hisp", "sex", "insur", 
             "nincome", "nhsgrad", "cleve", "sbp", "dbp",
             "ldl", "visits", "tobacco", "statin", 
             "ace_arb", "betab", "depr_dx", "eyeex", 
             "pneumo", "bmi", "bp_good"), 
    factorVars = c("hisp", "cleve", "statin",
                   "ace_arb", "betab", "depr_dx", 
                   "eyeex", "pneumo", "bp_good"),
    strata = "exposure", 
    data = dm2200)

t1

                     Stratified by exposure
                      A                   B                   p      test
  n                        200                2000                       
  age (mean (SD))        54.90 (8.55)        58.90 (9.83)     <0.001     
  race (%)                                                    <0.001     
     Asian                   2 ( 1.0)           25 ( 1.2)                
     Black_AA              166 (83.0)         1065 (53.2)                
     Other                   1 ( 0.5)           74 ( 3.7)                
     White                  31 (15.5)          836 (41.8)                
  hisp = 1 (%)               4 ( 2.0)          104 ( 5.2)      0.068     
  sex = M (%)              128 (64.0)          842 (42.1)     <0.001     
  insur (%)                                                   <0.001     
     Commercial              7 ( 3.5)          510 (25.5)                
     Medicaid              125 (62.5)          501 (25.0)                
     Medicare               37 (18.5)          913 (45.6)                
     Uninsured              31 (15.5)           76 ( 3.8)                
  nincome (mean (SD)) 26927.50 (11642.18) 37992.55 (20854.20) <0.001     
  nhsgrad (mean (SD))    77.74 (11.50)       82.23 (11.35)    <0.001     
  cleve = 1 (%)            181 (90.5)         1109 (55.5)     <0.001     
  sbp (mean (SD))       135.88 (20.60)      132.44 (16.17)     0.005     
  dbp (mean (SD))        82.47 (9.08)        72.41 (11.94)    <0.001     
  ldl (mean (SD))        94.34 (38.62)       97.22 (36.66)     0.293     
  visits (mean (SD))      4.81 (2.62)         3.69 (1.78)     <0.001     
  tobacco (%)                                                 <0.001     
     Current               108 (54.0)          445 (22.2)                
     Former                 46 (23.0)          775 (38.8)                
     Never                  46 (23.0)          780 (39.0)                
  statin = 1 (%)           154 (77.0)         1603 (80.2)      0.334     
  ace_arb = 1 (%)          160 (80.0)         1531 (76.6)      0.310     
  betab = 1 (%)             53 (26.5)          781 (39.1)      0.001     
  depr_dx = 1 (%)           53 (26.5)          753 (37.6)      0.002     
  eyeex = 1 (%)            104 (52.0)         1225 (61.3)      0.013     
  pneumo = 1 (%)           116 (58.0)         1766 (88.3)     <0.001     
  bmi (mean (SD))        32.83 (8.22)        35.09 (8.17)     <0.001     
  bp_good = 1 (%)          117 (58.5)         1431 (71.6)     <0.001

3 Propensity for Exposure

We’ll fit a logistic regression model to predict propensity for exposure A (as compared to B), on the basis of these 18 covariates:

age, race, hisp, sex, insur, nincome, nhsgrad,
cleve, a1c, ldl, visits, tobacco, statin,
ace_arb, betab, depr_dx, eyeex, pneumo

Practically, we might well fit something more complex than a simple model with main effects, but that’s what we’ll limit ourselves to in this setting. Note that we’re not including any direct information on either of our outcomes, or the elements that go into them. In practical work, we might fit different propensity scores for each outcome, but we’re keeping things simple here.

3.1 Fitting a Propensity Model

We’ll use the f.build tool from the cobalt package here.

Code

dm2200 <- dm2200 %>%
    mutate(treat = as.logical(exposure == "A"))

covs_1 <- dm2200 %>%
    select(age, race, hisp, sex, insur, nincome,
           nhsgrad, cleve, a1c, ldl, visits, tobacco,
           statin, ace_arb, betab, depr_dx, eyeex, pneumo)

prop_model <- glm(f.build("treat", covs_1), data = dm2200,
                  family = binomial)

tidy(prop_model, conf.int = TRUE) %>%
    select(term, estimate, std.error, conf.low, conf.high, p.value) %>%
    knitr::kable(digits = 3)

term	estimate	std.error	conf.low	conf.high	p.value
(Intercept)	-4.285	1.628	-7.645	-1.225	0.008
age	0.012	0.012	-0.011	0.037	0.305
raceBlack_AA	-0.048	1.011	-1.787	2.257	0.962
raceOther	-2.176	1.513	-5.585	0.789	0.150
raceWhite	-1.180	1.028	-2.963	1.147	0.251
hisp	0.040	0.633	-1.347	1.183	0.949
sexM	0.831	0.190	0.461	1.208	0.000
insurMedicaid	2.422	0.418	1.669	3.330	0.000
insurMedicare	0.961	0.452	0.126	1.922	0.034
insurUninsured	3.246	0.480	2.351	4.256	0.000
nincome	0.000	0.000	0.000	0.000	0.803
nhsgrad	-0.004	0.010	-0.023	0.015	0.677
cleve	1.812	0.320	1.208	2.468	0.000
a1c	-0.038	0.047	-0.132	0.052	0.412
ldl	-0.004	0.003	-0.009	0.001	0.104
visits	0.268	0.041	0.186	0.349	0.000
tobaccoFormer	-1.134	0.227	-1.586	-0.695	0.000
tobaccoNever	-1.023	0.225	-1.471	-0.587	0.000
statin	0.006	0.228	-0.434	0.461	0.978
ace_arb	0.384	0.229	-0.055	0.844	0.093
betab	-0.260	0.204	-0.665	0.135	0.202
depr_dx	-0.664	0.208	-1.078	-0.263	0.001
eyeex	-0.325	0.191	-0.699	0.050	0.089
pneumo	-1.612	0.211	-2.027	-1.200	0.000

Code

glance(prop_model)

# A tibble: 1 × 8
  null.deviance df.null logLik   AIC   BIC deviance df.residual  nobs
          <dbl>   <int>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1         1340.    2199  -416.  880. 1016.     832.        2176  2200

3.1.1 Storing the Propensity Scores

Code

dm2200 <- dm2200 %>%
    mutate(ps = prop_model$fitted,
           linps = prop_model$linear.predictors)

ggplot(dm2200, aes(x = exposure, y = linps)) +
    geom_violin() +
    geom_boxplot(width = 0.3)

4 `match_1` 1:1 greedy matching without replacement with the `Matching` package

We’re going to match on the linear propensity score, and define our treat (treatment) as occurring when exposure is A.

Code

match_1 <- Match(Tr = dm2200$treat, X = dm2200$linps, 
                 M = 1, replace = FALSE, ties = FALSE,
                 estimand = "ATT")

summary(match_1)


Estimate...  0 
SE.........  0 
T-stat.....  NaN 
p.val......  NA 

Original number of observations..............  2200 
Original number of treated obs...............  200 
Matched number of observations...............  200 
Matched number of observations  (unweighted).  200

4.1 ATT vs. ATE vs. ATC estimates

Note that in each of the matched samples we build, we’ll focus on ATT estimates (average treated effect on the treated) rather than ATE estimates. This means that in our matching we’re trying to mirror the population represented by the “treated” sample we observed.

To obtain ATE estimates rather than ATT with the Match function from the Matching package, use estimand = "ATE" in the process of developing the matched sample.
To obtain ATC estimates (average treatment effect on the controls), use estimand = "ATC".

I encourage the use of ATT estimates in your projects, where possible. I suggest also that you define the “treated” group (the one that the propensity score is estimating) to be the smaller of the two groups you have, to facilitate this approach. If you estimate ATE or ATC instead of ATT, of course, you are answering a different question than what ATT resolves.

4.2 Obtaining the Matched Sample

Now, we build a new matched sample data frame in order to do some of the analyses to come. This will contain only the matched subjects.

Code

match1_matches <- factor(rep(match_1$index.treated, 2))
dm2200_matched1 <- cbind(match1_matches, 
                         dm2200[c(match_1$index.control, 
                                  match_1$index.treated),])

Some sanity checks:

Code

dm2200_matched1 %>% count(exposure)

  exposure   n
1        A 200
2        B 200

Code

dm2200_matched1 %>% head()

  match1_matches subject exposure age     race hisp sex    insur nincome
1              1  S-0236        B  74 Black_AA    0   M Medicare   28900
2              4  S-1650        B  48 Black_AA    0   M Medicaid   22100
3             17  S-0834        B  61 Black_AA    0   M Medicaid   31500
4             26  S-0476        B  52 Black_AA    0   M Medicare   24900
5             27  S-0678        B  36 Black_AA    0   F Medicaid   16600
6             37  S-1007        B  49 Black_AA    0   M Medicaid   29800
  nhsgrad cleve height_cm weight_kg  bmi  a1c sbp dbp ldl visits tobacco statin
1      75     1       178       105 33.1  8.4 138  79  81      2  Former      1
2      80     1       172       134 45.3  5.1  98  66 121      3   Never      1
3      80     1       183        94 28.1  6.5 127  79 137      3 Current      1
4      94     1       191        84 23.0 12.4 151  87  68      2 Current      1
5      79     1       160        80 31.3 14.5 142  95 157      5 Current      1
6      82     1       170        74 25.6 12.1 138  80 126      3 Current      1
  ace_arb betab depr_dx eyeex pneumo bp_good treat         ps      linps
1       0     0       0     1      1       1 FALSE 0.03787384 -3.2348848
2       1     0       0     0      0       1 FALSE 0.62855155  0.5260079
3       1     0       1     0      1       1 FALSE 0.33963493 -0.6649215
4       1     0       1     0      1       0 FALSE 0.07490778 -2.5136357
5       0     1       0     1      0       0 FALSE 0.40466703 -0.3860563
6       1     0       0     0      1       1 FALSE 0.41799059 -0.3310277

4.3 Checking Covariate Balance for our 1:1 Greedy Match

4.3.1 Using `bal.tab` to obtain a balance table

Code

covs_1plus <- dm2200 %>%
    select(age, race, hisp, sex, insur, nincome,
           nhsgrad, cleve, a1c, ldl, visits, tobacco,
           statin, ace_arb, betab, depr_dx, eyeex, pneumo,
           ps, linps)

bal1 <- bal.tab(match_1,
                treat = dm2200$exposure,
                covs = covs_1plus, quick = FALSE,
                data = dm2200, stats = c("m", "v"),
                un = TRUE, disp.v.ratio = TRUE)
bal1

Balance Measures
                    Type Diff.Un V.Ratio.Un Diff.Adj V.Ratio.Adj
age              Contin.  0.4076     1.3195  -0.0570      1.1678
race_Asian        Binary  0.0025          .   0.0000           .
race_Black_AA     Binary -0.2975          .   0.0150           .
race_Other        Binary  0.0320          .   0.0050           .
race_White        Binary  0.2630          .  -0.0200           .
hisp              Binary  0.0320          .  -0.0050           .
sex_M             Binary -0.2190          .  -0.0450           .
insur_Commercial  Binary  0.2200          .  -0.0150           .
insur_Medicaid    Binary -0.3745          .   0.0250           .
insur_Medicare    Binary  0.2715          .   0.0200           .
insur_Uninsured   Binary -0.1170          .  -0.0300           .
nincome          Contin.  0.5306     3.2086  -0.0071      1.3496
nhsgrad          Contin.  0.3958     0.9727   0.0595      0.8375
cleve             Binary -0.3505          .  -0.0050           .
a1c              Contin. -0.0419     0.8190  -0.0627      1.0215
ldl              Contin.  0.0783     0.9012  -0.0681      0.8558
visits           Contin. -0.6304     0.4602  -0.3316      0.7673
tobacco_Current   Binary -0.3175          .  -0.0250           .
tobacco_Former    Binary  0.1575          .   0.0150           .
tobacco_Never     Binary  0.1600          .   0.0100           .
statin            Binary  0.0315          .  -0.0650           .
ace_arb           Binary -0.0345          .  -0.0350           .
betab             Binary  0.1255          .   0.0250           .
depr_dx           Binary  0.1115          .   0.0250           .
eyeex             Binary  0.0925          .   0.0100           .
pneumo            Binary  0.3030          .   0.0850           .
ps               Contin. -2.9189     0.1576  -0.7381      0.4789
linps            Contin. -1.7842     1.3395  -0.2125      0.5375

Sample sizes
            A    B
All       200 2000
Matched   200  200
Unmatched   0 1800

4.3.2 Checking Rubin’s Rules 1 and 2

We’ll build a little table of the Rubin’s Rules (1 and 2) before and after our match_1 is applied.

Code

covs_for_rubin <- dm2200 %>%
    select(linps)

rubin_m1 <- bal.tab(match_1,
                treat = dm2200$treat,
                covs = covs_for_rubin,
                data = dm2200, stats = c("m", "v"),
                un = TRUE, disp.v.ratio = TRUE)[1]

rubin_report_m1 <- tibble(
    status = c("Rule1", "Rule2"),
    Unmatched = c(rubin_m1$Balance$Diff.Un,
                  rubin_m1$Balance$V.Ratio.Un),
    Matched = c(rubin_m1$Balance$Diff.Adj,
               rubin_m1$Balance$V.Ratio.Adj))

rubin_report_m1 %>% knitr::kable(digits = 2)

status	Unmatched	Matched
Rule1	2.07	0.25
Rule2	0.75	1.86

The Rule 1 results tell us about the standardized differences expressed as proportions, so we’d like to be certain that our results are as close to zero as possible, and definitely below 0.5 in absolute value.
- Multiply these by 100 to describe them as percentages, adjusting the cutoff to below 50 in absolute value.
- Here, before matching we have a bias of 206.5005338%, and this is reduced to 24.5906223% after 1:1 greedy matching.
The Rule 2 results tell us about the variance ratio of the linear propensity scores. We want this to be within (0.5, 2) and ideally within (0.8, 1.25).
- Here, before matching we have a variance ratio of 74.6560482%, and this becomes 186.0447169% after 1:1 greedy matching.

4.3.3 Using `bal.plot` from `cobalt`

We can look at any particular variable with this approach, for example, age:

Code

bal.plot(match_1,
         treat = dm2200$exposure,
         covs = covs_1plus,
         var.name = "age", 
         which = "both",
         sample.names = 
             c("Unmatched Sample", "Matched Sample"))

We could also look at the propensity scores in each group, perhaps in mirrored histograms, with …

Code

bal.plot(match_1,
         treat = dm2200$exposure,
         covs = covs_1plus,
         var.name = "ps", 
         which = "both",
         sample.names = 
             c("Unmatched Sample", "Matched Sample"),
         type = "histogram", mirror = TRUE)

Can we look at a categorical variable this way?

Code

bal.plot(match_1,
         treat = dm2200$exposure,
         covs = covs_1plus,
         var.name = "insur", 
         which = "both",
         sample.names = 
             c("Unmatched Sample", "Matched Sample"))

4.3.4 Using `love.plot` to look at Standardized Differences

Code

love.plot(bal1, 
          threshold = .1, size = 3,
          var.order = "unadjusted",
          stats = "mean.diffs",
          stars = "raw",
          sample.names = c("Unmatched", "Matched"),
          title = "Love Plot for our 1:1 Match") +
    labs(caption = "* indicates raw mean differences (for binary variables)")

Code

love.plot(bal1, 
          threshold = .1, size = 3,
          var.order = "unadjusted",
          stats = "mean.diffs",
          stars = "raw",
          abs = TRUE,
          sample.names = c("Unmatched", "Matched"),
          title = "Absolute Differences for 1:1 Match") +
    labs(caption = "* indicates raw mean differences (for binary variables)")

4.3.5 Using `love.plot` to look at Variance Ratios

Note that this will only include the variables (and summaries like ps and linps) that describe quantities. Categorical variables are dropped.

Code

love.plot(bal1, 
          threshold = .5, size = 3,
          stats = "variance.ratios",
          sample.names = c("Unmatched", "Matched"),
          title = "Variance Ratios for our 1:1 Match")

5 `match_2` 1:2 greedy matching without replacement with the `Matching` package

Again, we’ll match on the linear propensity score, and define our treat (treatment) as occurring when exposure is A. The only difference will be that we’ll allow each subject with exposure A to be matched to exactly two subjects with exposure B.

Code

match_2 <- Match(Tr = dm2200$treat, X = dm2200$linps, 
                 M = 2, replace = FALSE, ties = FALSE,
                 estimand = "ATT")

summary(match_2)


Estimate...  0 
SE.........  0 
T-stat.....  NaN 
p.val......  NA 

Original number of observations..............  2200 
Original number of treated obs...............  200 
Matched number of observations...............  200 
Matched number of observations  (unweighted).  400

Note that we now have 400 matched exposure “B” subjects in our matched sample.

5.1 Obtaining the Matched Sample

As before,

Code

match2_matches <- factor(rep(match_2$index.treated, 2))
dm2200_matched2 <- cbind(match2_matches, 
                         dm2200[c(match_2$index.control, 
                                  match_2$index.treated),])

How many unique subjects are in our matched sample?

Code

n_distinct(dm2200_matched2$subject)

[1] 600

This match repeats each exposure A subject twice, to match up with the 400 exposure B subjects.

Code

dm2200_matched2 %>% count(exposure)

  exposure   n
1        A 400
2        B 400

Code

dm2200_matched2 %>% count(subject, exposure) %>% head()

  subject exposure n
1  S-0001        A 2
2  S-0004        A 2
3  S-0007        B 1
4  S-0008        B 1
5  S-0014        B 1
6  S-0017        A 2

5.2 Checking Covariate Balance for our 1:2 Greedy Match

5.2.1 Using `bal.tab` to obtain a balance table

Code

covs_2plus <- dm2200 %>%
    select(age, race, hisp, sex, insur, nincome,
           nhsgrad, cleve, a1c, ldl, visits, tobacco,
           statin, ace_arb, betab, depr_dx, eyeex, pneumo,
           ps, linps)

bal2 <- bal.tab(match_2,
                treat = dm2200$exposure,
                covs = covs_2plus, quick = FALSE,
                data = dm2200, stats = c("m", "v"),
                un = TRUE, disp.v.ratio = TRUE)
bal2

Balance Measures
                    Type Diff.Un V.Ratio.Un Diff.Adj V.Ratio.Adj
age              Contin.  0.4076     1.3195   0.0743      1.2677
race_Asian        Binary  0.0025          .   0.0025           .
race_Black_AA     Binary -0.2975          .  -0.0225           .
race_Other        Binary  0.0320          .   0.0025           .
race_White        Binary  0.2630          .   0.0175           .
hisp              Binary  0.0320          .   0.0075           .
sex_M             Binary -0.2190          .  -0.0725           .
insur_Commercial  Binary  0.2200          .  -0.0025           .
insur_Medicaid    Binary -0.3745          .  -0.0400           .
insur_Medicare    Binary  0.2715          .   0.0825           .
insur_Uninsured   Binary -0.1170          .  -0.0400           .
nincome          Contin.  0.5306     3.2086  -0.0219      1.3953
nhsgrad          Contin.  0.3958     0.9727   0.0198      1.0726
cleve             Binary -0.3505          .  -0.0225           .
a1c              Contin. -0.0419     0.8190  -0.0385      0.8367
ldl              Contin.  0.0783     0.9012  -0.0010      0.9078
visits           Contin. -0.6304     0.4602  -0.3344      0.6921
tobacco_Current   Binary -0.3175          .  -0.1100           .
tobacco_Former    Binary  0.1575          .   0.0650           .
tobacco_Never     Binary  0.1600          .   0.0450           .
statin            Binary  0.0315          .   0.0150           .
ace_arb           Binary -0.0345          .  -0.0050           .
betab             Binary  0.1255          .   0.0275           .
depr_dx           Binary  0.1115          .   0.0400           .
eyeex             Binary  0.0925          .   0.0425           .
pneumo            Binary  0.3030          .   0.1500           .
ps               Contin. -2.9189     0.1576  -1.4655      0.3428
linps            Contin. -1.7842     1.3395  -0.4322      0.4094

Sample sizes
            A    B
All       200 2000
Matched   200  400
Unmatched   0 1600

5.2.2 Checking Rubin’s Rules 1 and 2

We’ll build a little table of the Rubin’s Rules (1 and 2) before and after our 1:2 greedy match_2 is applied, and compare these to the results we found in match_1 (the 1:1 match).

Code

covs_for_rubin <- dm2200 %>%
    select(linps)

rubin_m2 <- bal.tab(match_2,
                treat = dm2200$treat,
                covs = covs_for_rubin, 
                data = dm2200, stats = c("m", "v"),
                un = TRUE, disp.v.ratio = TRUE)[1]

rubin_report_m12 <- tibble(
    status = c("Rule1", "Rule2"),
    Unmatched = c(rubin_m2$Balance$Diff.Un,
                  rubin_m2$Balance$V.Ratio.Un),
    Match1 = c(rubin_m1$Balance$Diff.Adj,
               rubin_m1$Balance$V.Ratio.Adj),
    Match2 = c(rubin_m2$Balance$Diff.Adj,
               rubin_m2$Balance$V.Ratio.Adj))

rubin_report_m12 %>% knitr::kable(digits = 2)

status	Unmatched	Match1	Match2
Rule1	2.07	0.25	0.50
Rule2	0.75	1.86	2.44

Again, we’d like to see Rule 1 as close to zero as possible, and definitely below 0.5 in absolute value. Unsurprisingly, when we have to match two exposure B subjects to each exposure A subject, we don’t get matches that are as close.
The Rule 2 results tell us about the variance ratio of the linear propensity scores. We want this to be within (0.5, 2) and ideally within (0.8, 1.25). Again, here the results are a bit disappointing in comparison to what we saw in our 1:1 match.

5.2.3 Using `bal.plot` from `cobalt`

Looking at the propensity scores in each group, perhaps in mirrored histograms, we have …

Code

bal.plot(match_2,
         treat = dm2200$exposure,
         covs = covs_2plus,
         var.name = "ps", 
         which = "both",
         sample.names = 
             c("Unmatched Sample", "match_2 Sample"),
         type = "histogram", mirror = TRUE)

5.2.4 Using `love.plot` to look at Standardized Differences

Code

love.plot(bal2, 
          threshold = .1, size = 3,
          var.order = "unadjusted",
          stats = "mean.diffs",
          stars = "raw",
          sample.names = c("Unmatched", "Matched"),
          title = "Love Plot for our 1:2 Match") +
    labs(caption = "* indicates raw mean differences (for binary variables)")

5.2.5 Using `love.plot` to look at Variance Ratios

Again, the categorical variables are dropped.

Code

love.plot(bal2, 
          threshold = .5, size = 3,
          stats = "variance.ratios",
          sample.names = c("Unmatched", "Matched"),
          title = "Variance Ratios for our 1:2 Match")

6 `match_3` 1:3 matching, with replacement with the `Matching` package

Again, we’ll match on the linear propensity score, and define our treat (treatment) as occurring when exposure is A. But now, we’ll match with replacement (which means that multiple subject with exposure A can be matched to the same subject with exposure B) and we’ll also match each subject with exposure A to be matched to exactly three subjects with exposure B.

Code

match_3 <- Match(Tr = dm2200$treat, X = dm2200$linps, 
                 M = 3, replace = TRUE, ties = FALSE,
                 estimand = "ATT")

summary(match_3)


Estimate...  0 
SE.........  0 
T-stat.....  NaN 
p.val......  NA 

Original number of observations..............  2200 
Original number of treated obs...............  200 
Matched number of observations...............  200 
Matched number of observations  (unweighted).  600

Note that we now have 600 matched exposure “B” subjects in our matched sample.

6.1 Obtaining the Matched Sample

As before,

Code

match3_matches <- factor(rep(match_3$index.treated, 2))
dm2200_matched3 <- cbind(match3_matches, 
                         dm2200[c(match_3$index.control, 
                                  match_3$index.treated),])

If this was being done without replacement, this would repeat each exposure A subject three times, to match up with the 600 exposure B subjects. But here, we have a different result.

How many unique subjects are in our matched sample?

Code

n_distinct(dm2200_matched3$subject)

[1] 501

How many of those are in Exposure A?

Code

temp <- dm2200_matched3 %>% filter(exposure == "A") 
n_distinct(temp$subject)

[1] 200

How many of those are in Exposure B?

Code

temp <- dm2200_matched3 %>% filter(exposure == "B")
n_distinct(temp$subject)

[1] 301

Among those exposure A subjects, how many times were they used in the matches?

Code

dm2200_matched3 %>% filter(exposure == "A") %>% 
    count(subject) %>%
    tabyl(n)

 n n_n percent
 3 200       1

Among those exposure B subjects, how many times were they used in the matches?

Code

dm2200_matched3 %>% filter(exposure == "B") %>% 
    count(subject) %>%
    tabyl(n)

  n n_n     percent
  1 194 0.644518272
  2  61 0.202657807
  3  20 0.066445183
  4   9 0.029900332
  5   3 0.009966777
  6   2 0.006644518
  7   1 0.003322259
  8   1 0.003322259
  9   2 0.006644518
 10   2 0.006644518
 13   2 0.006644518
 15   1 0.003322259
 20   1 0.003322259
 23   1 0.003322259
 24   1 0.003322259

6.2 Checking Covariate Balance for our 1:3 Match

6.2.1 Using `bal.tab` to obtain a balance table

Code

covs_3plus <- dm2200 %>%
    select(age, race, hisp, sex, insur, nincome,
           nhsgrad, cleve, a1c, ldl, visits, tobacco,
           statin, ace_arb, betab, depr_dx, eyeex, pneumo,
           ps, linps)

bal3 <- bal.tab(match_3,
                treat = dm2200$exposure,
                covs = covs_3plus, quick = FALSE,
                data = dm2200, stats = c("m", "v"),
                un = TRUE, disp.v.ratio = TRUE)
bal3

Balance Measures
                    Type Diff.Un V.Ratio.Un Diff.Adj V.Ratio.Adj
age              Contin.  0.4076     1.3195  -0.0768      1.3089
race_Asian        Binary  0.0025          .   0.0000           .
race_Black_AA     Binary -0.2975          .   0.0117           .
race_Other        Binary  0.0320          .   0.0000           .
race_White        Binary  0.2630          .  -0.0117           .
hisp              Binary  0.0320          .  -0.0033           .
sex_M             Binary -0.2190          .   0.0033           .
insur_Commercial  Binary  0.2200          .  -0.0033           .
insur_Medicaid    Binary -0.3745          .  -0.0200           .
insur_Medicare    Binary  0.2715          .   0.0133           .
insur_Uninsured   Binary -0.1170          .   0.0100           .
nincome          Contin.  0.5306     3.2086   0.0230      1.2569
nhsgrad          Contin.  0.3958     0.9727  -0.0282      0.8211
cleve             Binary -0.3505          .  -0.0017           .
a1c              Contin. -0.0419     0.8190  -0.0946      0.9794
ldl              Contin.  0.0783     0.9012   0.0926      1.0342
visits           Contin. -0.6304     0.4602  -0.0778      0.8324
tobacco_Current   Binary -0.3175          .  -0.0133           .
tobacco_Former    Binary  0.1575          .   0.0517           .
tobacco_Never     Binary  0.1600          .  -0.0383           .
statin            Binary  0.0315          .  -0.0100           .
ace_arb           Binary -0.0345          .  -0.0033           .
betab             Binary  0.1255          .  -0.0150           .
depr_dx           Binary  0.1115          .   0.0050           .
eyeex             Binary  0.0925          .  -0.0067           .
pneumo            Binary  0.3030          .  -0.0050           .
ps               Contin. -2.9189     0.1576  -0.0389      0.9513
linps            Contin. -1.7842     1.3395  -0.0268      0.8893

Sample sizes
                       A       B
All                  200 2000.  
Matched (ESS)        200  104.29
Matched (Unweighted) 200  301.  
Unmatched              0 1699.

6.2.2 Checking Rubin’s Rules 1 and 2

We’ll build a little table of the Rubin’s Rules (1 and 2) before and after our 1:2 greedy match_2 is applied, and compare these to the results we found in match_1 (the 1:1 match).

Code

covs_for_rubin <- dm2200 %>%
    select(linps)

rubin_m3 <- bal.tab(match_3,
                treat = dm2200$treat,
                covs = covs_for_rubin, 
                data = dm2200, stats = c("m", "v"),
                un = TRUE, disp.v.ratio = TRUE)[1]

rubin_report_m123 <- tibble(
    status = c("Rule1", "Rule2"),
    Unmatched = c(rubin_m2$Balance$Diff.Un,
                  rubin_m2$Balance$V.Ratio.Un),
    Match1 = c(rubin_m1$Balance$Diff.Adj,
               rubin_m1$Balance$V.Ratio.Adj),
    Match2 = c(rubin_m2$Balance$Diff.Adj,
               rubin_m2$Balance$V.Ratio.Adj),
    Match3 = c(rubin_m3$Balance$Diff.Adj,
               rubin_m3$Balance$V.Ratio.Adj))


rubin_report_m123 %>% knitr::kable(digits = 2)

status	Unmatched	Match1	Match2	Match3
Rule1	2.07	0.25	0.50	0.03
Rule2	0.75	1.86	2.44	1.12

Again, we’d like to see Rule 1 results as close to zero as possible, and definitely below 0.5 in absolute value.
In Rule 2, we want the variance ratio of the linear propensity scores to be within (0.5, 2) and ideally within (0.8, 1.25).
It appears that (in these data) allowing the same exposure B subject to be used for multiple matches (matching with replacement) more than makes up for the fact that matching 3 exposure B’s for each exposure A (1:3 matching) is a tougher job than pair (1:1) matching, as seen in the results for Rubin’s Rule 1 and Rule 2.

6.2.3 Using `bal.plot` from `cobalt`

Looking at the propensity scores in each group, perhaps in mirrored histograms, we have …

Code

bal.plot(match_3,
         treat = dm2200$exposure,
         covs = covs_3plus,
         var.name = "ps", 
         which = "both",
         sample.names = 
             c("Unmatched Sample", "match_3 Sample"),
         type = "histogram", mirror = TRUE)

6.2.4 Using `love.plot` to look at Standardized Differences

Code

love.plot(bal3, 
          threshold = .1, size = 3,
          var.order = "unadjusted",
          stats = "mean.diffs",
          stars = "raw",
          abs = TRUE,
          sample.names = c("Unmatched", "Matched"),
          title = "Love Plot of |Mean Differences| for our 1:3 Match") +
    labs(caption = "* indicates raw mean differences (for binary variables)")

6.2.5 Using `love.plot` to look at Variance Ratios

Again, the categorical variables are dropped.

Code

love.plot(bal3, 
          threshold = .5, size = 3,
          stats = "variance.ratios",
          sample.names = c("Unmatched", "Matched"),
          title = "Variance Ratios for our 1:3 Match")

7 `match_4` Caliper Matching (1:1 without replacement) with the `Matching` package

The Match function in the Matching package allows you to specify a caliper. From the Matching help file:

A caliper is the maximum acceptable distance (on a covariate) which we are willing to accept in any match. Observations for which we cannot find a match within the caliper are dropped.Dropping observations generally changes the quantity being estimated.
The caliper is interpreted to be in standardized units. For example, caliper=.25 means that all matches not equal to or within .25 standard deviations of each covariate in X are dropped, and not matched.
- If a scalar caliper is provided to the caliper setting in the Match function, this caliper is used for all covariates in X.
- If a vector of calipers is provided, a caliper value should be provided for each covariate in X.

We’ll again perform a 1:1 match without replacement, but now we’ll do so while only accepting matches where the linear propensity score of each match is within 0.2 standard deviations of the linear PS.

Code

match_4 <- Match(Tr = dm2200$treat, X = dm2200$linps, 
                 M = 1, replace = FALSE, ties = FALSE,
                 caliper = 0.2, estimand = "ATT")

summary(match_4)


Estimate...  0 
SE.........  0 
T-stat.....  NaN 
p.val......  NA 

Original number of observations..............  2200 
Original number of treated obs...............  200 
Matched number of observations...............  162 
Matched number of observations  (unweighted).  162 

Caliper (SDs)........................................   0.2 
Number of obs dropped by 'exact' or 'caliper'  38

Note that we have now dropped 38 of the exposure “A” subjects, and reduced our sample to the 168 remaining exposure “A” subjects, who are paired with 162 unique matched exposure “B” subjects in our matched sample.

7.1 Obtaining the Matched Sample

As before,

Code

match4_matches <- factor(rep(match_4$index.treated, 2))
dm2200_matched4 <- cbind(match4_matches, 
                         dm2200[c(match_4$index.control, 
                                  match_4$index.treated),])

How many unique subjects are in our matched sample?

Code

n_distinct(dm2200_matched4$subject)

[1] 324

This match includes 162 pairs so 324 subjects, since we’ve done matching without replacement.

Code

dm2200_matched4 %>% count(exposure)

  exposure   n
1        A 162
2        B 162

7.2 Checking Covariate Balance for our 1:1 Caliper Match

7.2.1 Using `bal.tab` to obtain a balance table

Code

covs_4plus <- dm2200 %>%
    select(age, race, hisp, sex, insur, nincome,
           nhsgrad, cleve, a1c, ldl, visits, tobacco,
           statin, ace_arb, betab, depr_dx, eyeex, pneumo,
           ps, linps)

bal4 <- bal.tab(match_4,
                treat = dm2200$exposure,
                covs = covs_4plus, quick = FALSE,
                data = dm2200, stats = c("m", "v"),
                un = TRUE, disp.v.ratio = TRUE)
bal4

Balance Measures
                    Type Diff.Un V.Ratio.Un Diff.Adj V.Ratio.Adj
age              Contin.  0.4076     1.3195   0.0245      1.1621
race_Asian        Binary  0.0025          .   0.0062           .
race_Black_AA     Binary -0.2975          .   0.0000           .
race_Other        Binary  0.0320          .   0.0000           .
race_White        Binary  0.2630          .  -0.0062           .
hisp              Binary  0.0320          .   0.0062           .
sex_M             Binary -0.2190          .  -0.0247           .
insur_Commercial  Binary  0.2200          .  -0.0123           .
insur_Medicaid    Binary -0.3745          .   0.0062           .
insur_Medicare    Binary  0.2715          .   0.0000           .
insur_Uninsured   Binary -0.1170          .   0.0062           .
nincome          Contin.  0.5306     3.2086  -0.0179      1.3446
nhsgrad          Contin.  0.3958     0.9727  -0.0348      1.0047
cleve             Binary -0.3505          .   0.0185           .
a1c              Contin. -0.0419     0.8190  -0.0633      0.9344
ldl              Contin.  0.0783     0.9012  -0.0173      0.9583
visits           Contin. -0.6304     0.4602  -0.0867      0.9154
tobacco_Current   Binary -0.3175          .  -0.0185           .
tobacco_Former    Binary  0.1575          .   0.0370           .
tobacco_Never     Binary  0.1600          .  -0.0185           .
statin            Binary  0.0315          .  -0.0494           .
ace_arb           Binary -0.0345          .  -0.0123           .
betab             Binary  0.1255          .   0.0494           .
depr_dx           Binary  0.1115          .   0.0062           .
eyeex             Binary  0.0925          .   0.0123           .
pneumo            Binary  0.3030          .  -0.0062           .
ps               Contin. -2.9189     0.1576  -0.0306      0.9596
linps            Contin. -1.7842     1.3395  -0.0072      0.9779

Sample sizes
            A    B
All       200 2000
Matched   162  162
Unmatched   0 1838
Discarded  38    0

7.2.2 Checking Rubin’s Rules 1 and 2

We’ll build a little table of the Rubin’s Rules (1 and 2) before and after our 1:2 greedy match_4 is applied, and compare these to the results we found in match_1 (the 1:1 match).

Code

covs_for_rubin <- dm2200 %>%
    select(linps)

rubin_m4 <- bal.tab(match_4,
                treat = dm2200$treat,
                covs = covs_for_rubin, 
                data = dm2200, stats = c("m", "v"),
                un = TRUE, disp.v.ratio = TRUE)[1]

rubin_report_m1234 <- tibble(
    status = c("Rule1", "Rule2"),
    Unmatched = c(rubin_m2$Balance$Diff.Un,
                  rubin_m2$Balance$V.Ratio.Un),
    Match1 = c(rubin_m1$Balance$Diff.Adj,
               rubin_m1$Balance$V.Ratio.Adj),
    Match2 = c(rubin_m2$Balance$Diff.Adj,
               rubin_m2$Balance$V.Ratio.Adj),
    Match3 = c(rubin_m3$Balance$Diff.Adj,
               rubin_m3$Balance$V.Ratio.Adj),
    Match4 = c(rubin_m4$Balance$Diff.Adj,
               rubin_m4$Balance$V.Ratio.Adj))

rubin_report_m1234 %>% knitr::kable(digits = 2)

status	Unmatched	Match1	Match2	Match3	Match4
Rule1	2.07	0.25	0.50	0.03	0.01
Rule2	0.75	1.86	2.44	1.12	1.02

This approach produces an exceptionally strong match in terms of balance, with Rubin’s Rule 1 being very close to 0, and Rule 2 being very close to 1.
Unfortunately, we’ve only done this by dropping the 38 “hardest to match” subjects receiving exposure “A”.

7.2.3 Using `bal.plot` from `cobalt`

Looking at the propensity scores in each group, perhaps in mirrored histograms, we have …

Code

bal.plot(match_4,
         treat = dm2200$exposure,
         covs = covs_4plus,
         var.name = "ps", 
         which = "both",
         sample.names = 
             c("Unmatched Sample", "match_4 Sample"),
         type = "histogram", mirror = TRUE)

7.2.4 Using `love.plot` to look at Standardized Differences

Code

love.plot(bal4, 
          threshold = .1, size = 3,
          var.order = "unadjusted",
          stats = "mean.diffs",
          stars = "raw",
          sample.names = c("Unmatched", "Matched"),
          title = "Love Plot for our 1:1 Caliper Match") +
    labs(caption = "* indicates raw mean differences (for binary variables)")

7.2.5 Using `love.plot` to look at Variance Ratios

Again, the categorical variables are dropped.

Code

love.plot(bal4, 
          threshold = .5, size = 3,
          stats = "variance.ratios",
          sample.names = c("Unmatched", "Matched"),
          title = "Variance Ratios for our 1:1 Caliper Match")

8 `match_5` 1:1 Nearest Neighbor Matching with `Matchit`

The MatchIt package implements the suggestions of Ho, Imai, King, and Stuart (2007) for improving parametric statistical models by pre-processing data with nonparametric matching methods. Read more about MatchIt at https://gking.harvard.edu/matchit.

With MatchIt, the matching is done using the matchit(treat ~ X, ...) function, where treat is the vector of treatment assignments and X are the covariates to be used in the matching.

We’ll start our exploration of the MatchIt approach to developing matches with nearest neighbor matching, which (quoting the MatchIt manual…)

… selects the r (default = 1, specified by the ratio option) best control matches for each individual in the treated group, using a distance measure specified by the distance option (default = logit). Matches are chosen for each treated unit one at a time, with the order specified by the m.order command (default=largest to smallest). At each matching step we choose the control unit that is not yet matched but is closest to the treated unit on the distance measure.

The full syntax (with default choices indicated) is

matchit(formula, data=NULL, discard=0, exact=FALSE, 
         replace=FALSE, ratio=1, model="logit", 
         reestimate=FALSE, nearest=TRUE, m.order=2, 
         caliper=0, calclosest=FALSE, mahvars=NULL, 
         subclass=0, sub.by="treat", counter=TRUE, 
         full=FALSE, full.options=list(), ...)

Here, we’ll match on the linear propensity score (by specifying the distance to use the logistic link with the linear propensity score via "linear.logit"), and we’ll perform 1:1 nearest neighbor matching without replacement using the default ordering (largest to smallest). Since we’ve already seen that greedy 1:1 matching without replacement doesn’t work well in this setting, we’re not expecting a strong result in terms of balance here, either.

Code

dm2200 <- dm2200 %>%
    mutate(treat = as.logical(exposure == "A"))

covs_1 <- dm2200 %>%
    select(age, race, hisp, sex, insur, nincome,
           nhsgrad, cleve, a1c, ldl, visits, tobacco,
           statin, ace_arb, betab, depr_dx, eyeex, pneumo)

match_5 <- matchit(f.build("treat", covs_1), data = dm2200,
                   distance = "glm", link = "linear.logit",
                   method = "nearest", 
                   ratio = 1, replace = FALSE)

match_5

A matchit object
 - method: 1:1 nearest neighbor matching without replacement
 - distance: Propensity score
             - estimated with logistic regression and linearized
 - number of obs.: 2200 (original), 400 (matched)
 - target estimand: ATT
 - covariates: age, race, hisp, sex, insur, nincome, nhsgrad, cleve, a1c, ldl, visits, tobacco, statin, ace_arb, betab, depr_dx, eyeex, pneumo

8.1 Obtaining the Matched Sample

There is just one tricky part to doing this in MatchIt. The main work can be done with a very simple command.

Code

dm2200_matched5 <- match.data(match_5)

This leaves only the job of creating a matching number, for which we have to develop some additional R code.

Code

# Thanks to Robert McDonald at Mayo
# https://lists.gking.harvard.edu/pipermail/matchit/2012-April/000458.html

len <- dim(dm2200)[1]
matchx <- rep(NA,len)
len2 <- length(match_5$match.matrix)
count <- 1
for(i in 1:len2){
    
    match1 <- match_5$match.matrix[i]
    match2 <- row.names(match_5$match.matrix)[i]
    
    if(!is.na(match1)){
    matchx[as.numeric(match1)] <- count
    matchx[as.numeric(match2)] <- count
    count <- count+1}
    
}

dm2200_matched5 <- dm2200_matched5 %>%
    mutate(match5_matches = 
               matchx[as.numeric
                      (row.names(dm2200_matched5))]) %>%
    mutate(match5_matches = factor(match5_matches))

How many unique subjects are in our matched sample?

Code

n_distinct(dm2200_matched5$subject)

[1] 400

This match includes 200 pairs so 400 subjects, as we’ve done matching without replacement.

Code

dm2200_matched5 %>% count(exposure)

# A tibble: 2 × 2
  exposure     n
  <fct>    <int>
1 A          200
2 B          200

8.2 Checking Covariate Balance for our 1:1 Nearest Neighbor Match

8.2.1 Default Numerical Balance Summary from `MatchIt`

Code

summary(match_5)


Call:
matchit(formula = f.build("treat", covs_1), data = dm2200, method = "nearest", 
    distance = "glm", link = "linear.logit", replace = FALSE, 
    ratio = 1)

Summary of Balance for All Data:
                Means Treated Means Control Std. Mean Diff. Var. Ratio
distance              -0.6577       -4.0979          2.0650     0.7466
age                   54.8950       58.9000         -0.4682     0.7579
raceAsian              0.0100        0.0125         -0.0251          .
raceBlack_AA           0.8300        0.5325          0.7920          .
raceOther              0.0050        0.0370         -0.4537          .
raceWhite              0.1550        0.4180         -0.7267          .
hisp                   0.0200        0.0520         -0.2286          .
sexF                   0.3600        0.5790         -0.4562          .
sexM                   0.6400        0.4210          0.4563          .
insurCommercial        0.0350        0.2550         -1.1971          .
insurMedicaid          0.6250        0.2505          0.7736          .
insurMedicare          0.1850        0.4565         -0.6992          .
insurUninsured         0.1550        0.0380          0.3233          .
nincome            26927.5000    37992.5500         -0.9504     0.3117
nhsgrad               77.7350       82.2260         -0.3904     1.0281
cleve                  0.9050        0.5545          1.1954          .
a1c                    7.7655        7.6868          0.0379     1.2209
ldl                   94.3450       97.2160         -0.0743     1.1096
visits                 4.8100        3.6885          0.4276     2.1732
tobaccoCurrent         0.5400        0.2225          0.6370          .
tobaccoFormer          0.2300        0.3875         -0.3743          .
tobaccoNever           0.2300        0.3900         -0.3802          .
statin                 0.7700        0.8015         -0.0749          .
ace_arb                0.8000        0.7655          0.0863          .
betab                  0.2650        0.3905         -0.2844          .
depr_dx                0.2650        0.3765         -0.2526          .
eyeex                  0.5200        0.6125         -0.1851          .
pneumo                 0.5800        0.8830         -0.6139          .
                eCDF Mean eCDF Max
distance           0.4093   0.6595
age                0.0801   0.2350
raceAsian          0.0025   0.0025
raceBlack_AA       0.2975   0.2975
raceOther          0.0320   0.0320
raceWhite          0.2630   0.2630
hisp               0.0320   0.0320
sexF               0.2190   0.2190
sexM               0.2190   0.2190
insurCommercial    0.2200   0.2200
insurMedicaid      0.3745   0.3745
insurMedicare      0.2715   0.2715
insurUninsured     0.1170   0.1170
nincome            0.1442   0.3445
nhsgrad            0.0724   0.2295
cleve              0.3505   0.3505
a1c                0.0146   0.0465
ldl                0.0189   0.0670
visits             0.0743   0.2150
tobaccoCurrent     0.3175   0.3175
tobaccoFormer      0.1575   0.1575
tobaccoNever       0.1600   0.1600
statin             0.0315   0.0315
ace_arb            0.0345   0.0345
betab              0.1255   0.1255
depr_dx            0.1115   0.1115
eyeex              0.0925   0.0925
pneumo             0.3030   0.3030

Summary of Balance for Matched Data:
                Means Treated Means Control Std. Mean Diff. Var. Ratio
distance              -0.6577       -1.0679          0.2462     1.8590
age                   54.8950       54.2500          0.0754     0.8059
raceAsian              0.0100        0.0100          0.0000          .
raceBlack_AA           0.8300        0.8300          0.0000          .
raceOther              0.0050        0.0000          0.0709          .
raceWhite              0.1550        0.1600         -0.0138          .
hisp                   0.0200        0.0150          0.0357          .
sexF                   0.3600        0.4000         -0.0833          .
sexM                   0.6400        0.6000          0.0833          .
insurCommercial        0.0350        0.0400         -0.0272          .
insurMedicaid          0.6250        0.6450         -0.0413          .
insurMedicare          0.1850        0.1800          0.0129          .
insurUninsured         0.1550        0.1350          0.0553          .
nincome            26927.5000    25385.5000          0.1324     1.1096
nhsgrad               77.7350       77.4650          0.0235     1.0794
cleve                  0.9050        0.9050          0.0000          .
a1c                    7.7655        7.8350         -0.0334     0.9035
ldl                   94.3450       94.4900         -0.0038     1.0865
visits                 4.8100        4.3400          0.1792     1.3150
tobaccoCurrent         0.5400        0.4950          0.0903          .
tobaccoFormer          0.2300        0.2650         -0.0832          .
tobaccoNever           0.2300        0.2400         -0.0238          .
statin                 0.7700        0.7150          0.1307          .
ace_arb                0.8000        0.7650          0.0875          .
betab                  0.2650        0.3100         -0.1020          .
depr_dx                0.2650        0.2850         -0.0453          .
eyeex                  0.5200        0.5300         -0.0200          .
pneumo                 0.5800        0.6600         -0.1621          .
                eCDF Mean eCDF Max Std. Pair Dist.
distance           0.0120    0.230          0.2469
age                0.0198    0.060          1.1755
raceAsian          0.0000    0.000          0.0200
raceBlack_AA       0.0000    0.000          0.2700
raceOther          0.0050    0.005          0.0709
raceWhite          0.0050    0.005          0.7322
hisp               0.0050    0.005          0.2500
sexF               0.0400    0.040          0.9583
sexM               0.0400    0.040          0.9583
insurCommercial    0.0050    0.005          0.2993
insurMedicaid      0.0200    0.020          0.8882
insurMedicare      0.0050    0.005          0.6310
insurUninsured     0.0200    0.020          0.6355
nincome            0.0290    0.140          1.0179
nhsgrad            0.0183    0.050          1.0605
cleve              0.0000    0.000          0.1500
a1c                0.0161    0.070          1.0819
ldl                0.0099    0.045          1.0769
visits             0.0320    0.125          0.9456
tobaccoCurrent     0.0450    0.045          0.8929
tobaccoFormer      0.0350    0.035          0.8436
tobaccoNever       0.0100    0.010          0.8079
statin             0.0550    0.055          0.9861
ace_arb            0.0350    0.035          0.8625
betab              0.0450    0.045          0.8950
depr_dx            0.0200    0.020          0.8837
eyeex              0.0100    0.010          0.8807
pneumo             0.0800    0.080          0.8510

Sample Sizes:
          Control Treated
All          2000     200
Matched       200     200
Unmatched    1800       0
Discarded       0       0

8.2.2 Using `bal.tab` to obtain a balance table

Code

bal5 <- bal.tab(match_5, un = TRUE, disp.v.ratio = TRUE)

bal5

Balance Measures
                     Type Diff.Un V.Ratio.Un Diff.Adj V.Ratio.Adj
distance         Distance  2.0650     0.7466   0.2462      1.8590
age               Contin. -0.4682     0.7579   0.0754      0.8059
race_Asian         Binary -0.0025          .   0.0000           .
race_Black_AA      Binary  0.2975          .   0.0000           .
race_Other         Binary -0.0320          .   0.0050           .
race_White         Binary -0.2630          .  -0.0050           .
hisp               Binary -0.0320          .   0.0050           .
sex_M              Binary  0.2190          .   0.0400           .
insur_Commercial   Binary -0.2200          .  -0.0050           .
insur_Medicaid     Binary  0.3745          .  -0.0200           .
insur_Medicare     Binary -0.2715          .   0.0050           .
insur_Uninsured    Binary  0.1170          .   0.0200           .
nincome           Contin. -0.9504     0.3117   0.1324      1.1096
nhsgrad           Contin. -0.3904     1.0281   0.0235      1.0794
cleve              Binary  0.3505          .   0.0000           .
a1c               Contin.  0.0379     1.2209  -0.0334      0.9035
ldl               Contin. -0.0743     1.1096  -0.0038      1.0865
visits            Contin.  0.4276     2.1732   0.1792      1.3150
tobacco_Current    Binary  0.3175          .   0.0450           .
tobacco_Former     Binary -0.1575          .  -0.0350           .
tobacco_Never      Binary -0.1600          .  -0.0100           .
statin             Binary -0.0315          .   0.0550           .
ace_arb            Binary  0.0345          .   0.0350           .
betab              Binary -0.1255          .  -0.0450           .
depr_dx            Binary -0.1115          .  -0.0200           .
eyeex              Binary -0.0925          .  -0.0100           .
pneumo             Binary -0.3030          .  -0.0800           .

Sample sizes
          Control Treated
All          2000     200
Matched       200     200
Unmatched    1800       0

8.2.3 Checking Rubin’s Rules 1 and 2

We’ll build a little table of the Rubin’s Rules (1 and 2) results before and after match_5 is applied.

Code

rubin_report_m5 <- tibble(
    status = c("Rule1", "Rule2"),
    Unmatched = c(bal5$Balance$Diff.Un[1],
                  bal5$Balance$V.Ratio.Un[1]),
    Match5 = c(bal5$Balance$Diff.Adj[1],
               bal5$Balance$V.Ratio.Adj[1]))

rubin_report_m5 %>% knitr::kable(digits = 2)

status	Unmatched	Match5
Rule1	2.07	0.25
Rule2	0.75	1.86

As we’d expect based on our previous greedy 1:1 match, this isn’t great sufficiently strong balance.

8.2.4 Using `bal.plot` from `cobalt`

Looking at the linear propensity scores in each group, in mirrored histograms, we have …

Code

bal.plot(match_5,
         var.name = "distance", 
         which = "both",
         sample.names = 
             c("Unmatched Sample", "match_5 Sample"),
         type = "histogram", mirror = TRUE)

8.2.5 Using `love.plot` to look at Standardized Differences

Code

love.plot(bal5, 
          threshold = .1, size = 3,
          var.order = "unadjusted",
          stats = "mean.diffs",
          stars = "raw",
          sample.names = c("Unmatched", "Matched"),
          title = "Love Plot for our 1:1 Nearest Neighbor Match") +
    labs(subtitle = "distance = linear propensity score",
         caption = "* indicates raw mean differences (for binary variables)")

8.2.6 Using `love.plot` to look at Variance Ratios

Again, the categorical variables are dropped.

Code

love.plot(bal5, 
          threshold = .5, size = 3,
          stats = "variance.ratios",
          sample.names = c("Unmatched", "Matched"),
          title = "Variance Ratios for our 1:1 Nearest Neighbor Match") +
    labs(subtitle = "distance = linear propensity score",
         caption = "* indicates raw mean differences (for binary variables)")

9 `match_6` 1:1 Nearest Neighbor Caliper Matching with `Matchit`

As in match_5, we’ll match on the linear propensity score (by specifying the distance to use the logistic link with the linear propensity score via "linear.logit"), and we’ll perform 1:1 nearest neighbor matching without replacement using the default ordering (largest to smallest), but we’ll add a some arguments to build a specific kind caliper match. Specifically, we’ll require our matches to be within 0.25 standard deviations of each other on the linear propensity score.

Code

match_6 <- matchit(f.build("treat", covs_1), data = dm2200,
                   distance = "glm", link = "linear.logit", 
                   method = "nearest", caliper = 0.25,
                   ratio = 1, replace = FALSE)

match_6

A matchit object
 - method: 1:1 nearest neighbor matching without replacement
 - distance: Propensity score [caliper]
             - estimated with logistic regression and linearized
 - caliper: <distance> (0.537)
 - number of obs.: 2200 (original), 344 (matched)
 - target estimand: ATT
 - covariates: age, race, hisp, sex, insur, nincome, nhsgrad, cleve, a1c, ldl, visits, tobacco, statin, ace_arb, betab, depr_dx, eyeex, pneumo

9.1 Obtaining the Matched Sample

There is just one tricky part to doing this in MatchIt. The main work can be done with a very simple command.

Code

dm2200_matched6 <- match.data(match_6)

This leaves only the job of creating a matching number, for which we have to develop some additional R code.

Code

# Thanks to Robert McDonald at Mayo
# https://lists.gking.harvard.edu/pipermail/matchit/2012-April/000458.html

len <- dim(dm2200)[1]
matchx <- rep(NA,len)
len2 <- length(match_6$match.matrix)
count <- 1
for(i in 1:len2){
    
    match1 <- match_6$match.matrix[i]
    match2 <- row.names(match_6$match.matrix)[i]
    
    if(!is.na(match1)){
    matchx[as.numeric(match1)] <- count
    matchx[as.numeric(match2)] <- count
    count <- count+1}
    
}

dm2200_matched6 <- dm2200_matched6 %>%
    mutate(match6_matches = 
               matchx[as.numeric
                      (row.names(dm2200_matched6))]) %>%
    mutate(match6_matches = factor(match6_matches))

How many unique subjects are in our matched sample?

Code

n_distinct(dm2200_matched6$subject)

[1] 344

This match includes 172 pairs so 344 subjects, as we’ve done matching without replacement.

Code

dm2200_matched6 %>% count(exposure)

# A tibble: 2 × 2
  exposure     n
  <fct>    <int>
1 A          172
2 B          172

9.2 Checking Covariate Balance for our 1:1 Nearest Neighbor Caliper Match

9.2.1 Using `bal.tab` to obtain a balance table

Code

bal6 <- bal.tab(match_6, un = TRUE, disp.v.ratio = TRUE)

bal6

Balance Measures
                     Type Diff.Un V.Ratio.Un Diff.Adj V.Ratio.Adj
distance         Distance  2.0650     0.7466   0.0612      1.2154
age               Contin. -0.4682     0.7579   0.0068      0.8700
race_Asian         Binary -0.0025          .   0.0000           .
race_Black_AA      Binary  0.2975          .  -0.0116           .
race_Other         Binary -0.0320          .   0.0058           .
race_White         Binary -0.2630          .   0.0058           .
hisp               Binary -0.0320          .   0.0116           .
sex_M              Binary  0.2190          .   0.0233           .
insur_Commercial   Binary -0.2200          .  -0.0058           .
insur_Medicaid     Binary  0.3745          .   0.0000           .
insur_Medicare     Binary -0.2715          .   0.0000           .
insur_Uninsured    Binary  0.1170          .   0.0058           .
nincome           Contin. -0.9504     0.3117   0.1677      1.1339
nhsgrad           Contin. -0.3904     1.0281   0.1137      0.9119
cleve              Binary  0.3505          .   0.0000           .
a1c               Contin.  0.0379     1.2209  -0.0011      0.9429
ldl               Contin. -0.0743     1.1096  -0.0188      1.1241
visits            Contin.  0.4276     2.1732   0.0887      1.2449
tobacco_Current    Binary  0.3175          .   0.0000           .
tobacco_Former     Binary -0.1575          .  -0.0174           .
tobacco_Never      Binary -0.1600          .   0.0174           .
statin             Binary -0.0315          .   0.0349           .
ace_arb            Binary  0.0345          .   0.0174           .
betab              Binary -0.1255          .  -0.0465           .
depr_dx            Binary -0.1115          .   0.0174           .
eyeex              Binary -0.0925          .  -0.0233           .
pneumo             Binary -0.3030          .   0.0000           .

Sample sizes
          Control Treated
All          2000     200
Matched       172     172
Unmatched    1828      28

9.2.2 Checking Rubin’s Rules 1 and 2

We’ll build a little table of the Rubin’s Rules (1 and 2) results before and after match_6 is applied, and compare these to the match_5 results.

Code

rubin_report_m56 <- tibble(
    status = c("Rule1", "Rule2"),
    Unmatched = c(bal6$Balance$Diff.Un[1],
                  bal6$Balance$V.Ratio.Un[1]),
    Match5 = c(bal5$Balance$Diff.Adj[1],
               bal5$Balance$V.Ratio.Adj[1]),
    Match6 = c(bal6$Balance$Diff.Adj[1],
               bal6$Balance$V.Ratio.Adj[1])
    )

rubin_report_m56 %>% knitr::kable(digits = 2)

status	Unmatched	Match5	Match6
Rule1	2.07	0.25	0.06
Rule2	0.75	1.86	1.22

So the caliper appears to help quite a bit to improve the results that we saw in 1:1 nearest neighbor matching without replacement, at the cost of not including 28 of the treated subjects in the match.

9.2.3 Using `bal.plot` from `cobalt`

Looking at the linear propensity scores in each group, in mirrored histograms, we have …

Code

bal.plot(match_6,
         var.name = "distance", 
         which = "both",
         sample.names = 
             c("Unmatched Sample", "match_6 Sample"),
         type = "histogram", mirror = TRUE)

9.2.4 Using `love.plot` to look at Standardized Differences

Code

love.plot(bal6, 
          threshold = .1, size = 3,
          var.order = "unadjusted",
          stats = "mean.diffs",
          stars = "raw",
          sample.names = c("Unmatched", "Matched"),
          title = "Love Plot for 1:1 Nearest Neighbor Caliper Match") +
    labs(subtitle = "distance = linear propensity score",
         caption = "* indicates raw mean differences (for binary variables)")

9.2.5 Using `love.plot` to look at Variance Ratios

Again, the categorical variables are dropped.

Code

love.plot(bal6, 
          threshold = .5, size = 3,
          stats = "variance.ratios",
          sample.names = c("Unmatched", "Matched"),
          title = "Variance Ratios for 1:1 Nearest Neighbor Caliper Match") +
    labs(subtitle = "distance = linear propensity score",
         caption = "* indicates raw mean differences (for binary variables)")

10 `match_7` 1:1 Optimal Matching with `Matchit`

As in match_5, we’ll match on the linear propensity score (by specifying the distance to use the logistic link with the linear propensity score via "linear.logit"), but now we’ll use an optimal 1:1 matching (which is always done in matchit without replacement).

Code

match_7 <- matchit(f.build("treat", covs_1), data = dm2200,
                   distance = "glm", link = "linear.logit",
                   method = "optimal", ratio = 1)

match_7

A matchit object
 - method: 1:1 optimal pair matching
 - distance: Propensity score
             - estimated with logistic regression and linearized
 - number of obs.: 2200 (original), 400 (matched)
 - target estimand: ATT
 - covariates: age, race, hisp, sex, insur, nincome, nhsgrad, cleve, a1c, ldl, visits, tobacco, statin, ace_arb, betab, depr_dx, eyeex, pneumo

10.1 Obtaining the Matched Sample

As before, much of the work can be done with match.data.

Code

dm2200_matched7 <- match.data(match_7)

This leaves only the job of creating a matching number, for which we have to develop some additional R code.

Code

# Thanks to Robert McDonald at Mayo
# https://lists.gking.harvard.edu/pipermail/matchit/2012-April/000458.html

len <- dim(dm2200)[1]
matchx <- rep(NA,len)
len2 <- length(match_7$match.matrix)
count <- 1
for(i in 1:len2){
    
    match1 <- match_7$match.matrix[i]
    match2 <- row.names(match_7$match.matrix)[i]
    
    if(!is.na(match1)){
    matchx[as.numeric(match1)] <- count
    matchx[as.numeric(match2)] <- count
    count <- count+1}
    
}

dm2200_matched7 <- dm2200_matched7 %>%
    mutate(match7_matches = 
               matchx[as.numeric
                      (row.names(dm2200_matched7))]) %>%
    mutate(match7_matches = factor(match7_matches))

How many unique subjects are in our matched sample?

Code

n_distinct(dm2200_matched7$subject)

[1] 400

This match includes 200 pairs so 400 subjects, as we’ve done matching without replacement.

Code

dm2200_matched7 %>% count(exposure)

# A tibble: 2 × 2
  exposure     n
  <fct>    <int>
1 A          200
2 B          200

10.2 Checking Covariate Balance for our 1:1 Optimal Match

10.2.1 Using `bal.tab` to obtain a balance table

Code

bal7 <- bal.tab(match_7, un = TRUE, disp.v.ratio = TRUE)

bal7

Balance Measures
                     Type Diff.Un V.Ratio.Un Diff.Adj V.Ratio.Adj
distance         Distance  2.0650     0.7466   0.2460      1.8596
age               Contin. -0.4682     0.7579   0.0737      0.8136
race_Asian         Binary -0.0025          .  -0.0050           .
race_Black_AA      Binary  0.2975          .   0.0100           .
race_Other         Binary -0.0320          .   0.0000           .
race_White         Binary -0.2630          .  -0.0050           .
hisp               Binary -0.0320          .   0.0000           .
sex_M              Binary  0.2190          .   0.0200           .
insur_Commercial   Binary -0.2200          .   0.0100           .
insur_Medicaid     Binary  0.3745          .  -0.0250           .
insur_Medicare     Binary -0.2715          .   0.0000           .
insur_Uninsured    Binary  0.1170          .   0.0150           .
nincome           Contin. -0.9504     0.3117   0.1099      1.0224
nhsgrad           Contin. -0.3904     1.0281  -0.0022      1.0744
cleve              Binary  0.3505          .   0.0000           .
a1c               Contin.  0.0379     1.2209  -0.0214      0.9134
ldl               Contin. -0.0743     1.1096   0.0154      1.0683
visits            Contin.  0.4276     2.1732   0.1964      1.3016
tobacco_Current    Binary  0.3175          .   0.0450           .
tobacco_Former     Binary -0.1575          .  -0.0400           .
tobacco_Never      Binary -0.1600          .  -0.0050           .
statin             Binary -0.0315          .   0.0400           .
ace_arb            Binary  0.0345          .   0.0450           .
betab              Binary -0.1255          .  -0.0600           .
depr_dx            Binary -0.1115          .  -0.0300           .
eyeex              Binary -0.0925          .   0.0150           .
pneumo             Binary -0.3030          .  -0.0950           .

Sample sizes
          Control Treated
All          2000     200
Matched       200     200
Unmatched    1800       0

10.2.2 Checking Rubin’s Rules 1 and 2

We’ll build a little table of the Rubin’s Rules (1 and 2) results before and after match_6 is applied, and compare these to the match_5 results.

Code

rubin_report_m567 <- tibble(
    status = c("Rule1", "Rule2"),
    Unmatched = c(bal6$Balance$Diff.Un[1],
                  bal6$Balance$V.Ratio.Un[1]),
    Match5 = c(bal5$Balance$Diff.Adj[1],
               bal5$Balance$V.Ratio.Adj[1]),
    Match6 = c(bal6$Balance$Diff.Adj[1],
               bal6$Balance$V.Ratio.Adj[1]),
    Match7 = c(bal7$Balance$Diff.Adj[1],
               bal7$Balance$V.Ratio.Adj[1])
    )

rubin_report_m567 %>% knitr::kable(digits = 2)

status	Unmatched	Match5	Match6	Match7
Rule1	2.07	0.25	0.06	0.25
Rule2	0.75	1.86	1.22	1.86

We note here that the optimal matching here does very little, if anything, to improved on the nearest neighbor match.

10.2.3 Are the Optimal and Nearest Neighbor Matches the Same?

Code

m5_subs <- dm2200_matched5 %>% select(subject)
m7_subs <- dm2200_matched7 %>% select(subject)

all.equal(m5_subs, m7_subs)

[1] "Component \"subject\": 337 string mismatches"

Apparently not, but they are very similar.

10.2.4 Using `bal.plot` from `cobalt`

Looking at the linear propensity scores in each group, in mirrored histograms, we have …

Code

bal.plot(match_7,
         var.name = "distance", 
         which = "both",
         sample.names = 
             c("Unmatched Sample", "match_7 Sample"),
         type = "histogram", mirror = TRUE)

10.2.5 Using `love.plot` to look at Standardized Differences

Code

love.plot(bal7, 
          threshold = .1, size = 3,
          var.order = "unadjusted",
          stats = "mean.diffs",
          stars = "raw",
          sample.names = c("Unmatched", "Matched"),
          title = "Love Plot for 1:1 Optimal Match") +
    labs(subtitle = "distance = linear propensity score",
         caption = "* indicates raw mean differences (for binary variables)")

10.2.6 Using `love.plot` to look at Variance Ratios

Again, the categorical variables are dropped.

Code

love.plot(bal7, 
          threshold = .5, size = 3,
          stats = "variance.ratios",
          sample.names = c("Unmatched", "Matched"),
          title = "Variance Ratios for 1:1 Optimal Match") +
    labs(subtitle = "distance = linear propensity score",
         caption = "* indicates raw mean differences (for binary variables)")

11 `match_8` 1:2 Optimal Matching with `Matchit`

Again, we’ll match on the linear propensity score (by specifying the distance to use the logistic link with the linear propensity score via "linear.logit"), but now we’ll perform 1:2 (so ratio = 2) optimal (so `method = “optimal”) matching without replacement using the default ordering (largest to smallest).

Code

match_8 <- matchit(f.build("treat", covs_1), data = dm2200,
                   distance = "glm", link = "linear.logit", 
                   method = "optimal", ratio = 2)

match_8

A matchit object
 - method: 2:1 optimal pair matching
 - distance: Propensity score
             - estimated with logistic regression and linearized
 - number of obs.: 2200 (original), 600 (matched)
 - target estimand: ATT
 - covariates: age, race, hisp, sex, insur, nincome, nhsgrad, cleve, a1c, ldl, visits, tobacco, statin, ace_arb, betab, depr_dx, eyeex, pneumo

11.1 Obtaining the Matched Sample

As before, much of the work can be done with a very simple command.

Code

dm2200_matched8 <- match.data(match_8)

This leaves only the job of creating a matching number, for which we have to develop some additional R code, and this needs some modification because we now have two matched controls for each treated subject.

Code

# Thanks to Robert McDonald at Mayo
# https://lists.gking.harvard.edu/pipermail/matchit/2012-April/000458.html

mm1 <- match_8$match.matrix[,1]
mm2 <- match_8$match.matrix[,2]

len <- dim(dm2200)[1]
matchx <- rep(NA,len)
len2 <- length(match_8$match.matrix)
count <- 1
for(i in 1:len2){
    
    match_tr <- row.names(match_8$match.matrix)[i]
    match_con1 <- mm1[i]
    match_con2 <- mm2[i]
    
    if(!is.na(match_tr) & !is.na(match_con1) & !is.na(match_con2)){
    matchx[as.numeric(match_con1)] <- count
    matchx[as.numeric(match_con2)] <- count
    matchx[as.numeric(match_tr)] <- count
    count <- count+1}
    
}

dm2200_matched8 <- dm2200_matched8 %>%
     mutate(match8_matches = 
            matchx[as.numeric(row.names(dm2200_matched8))])

How many unique subjects are in our matched sample?

Code

n_distinct(dm2200_matched8$subject)

[1] 600

This match includes 200 triplets (1 treated and 2 control) so 600 subjects, and again we’ve done matching without replacement.

Code

dm2200_matched8 %>% count(exposure)

# A tibble: 2 × 2
  exposure     n
  <fct>    <int>
1 A          200
2 B          400

11.2 Checking Covariate Balance for our 1:2 Optimal Match

11.2.1 Using `bal.tab` to obtain a balance table

Code

bal8 <- bal.tab(match_8, un = TRUE, disp.v.ratio = TRUE)

bal8

Balance Measures
                     Type Diff.Un V.Ratio.Un Diff.Adj V.Ratio.Adj
distance         Distance  2.0650     0.7466   0.5002      2.4424
age               Contin. -0.4682     0.7579  -0.0605      0.8144
race_Asian         Binary -0.0025          .   0.0000           .
race_Black_AA      Binary  0.2975          .   0.0225           .
race_Other         Binary -0.0320          .   0.0000           .
race_White         Binary -0.2630          .  -0.0225           .
hisp               Binary -0.0320          .  -0.0050           .
sex_M              Binary  0.2190          .   0.0700           .
insur_Commercial   Binary -0.2200          .  -0.0025           .
insur_Medicaid     Binary  0.3745          .   0.0350           .
insur_Medicare     Binary -0.2715          .  -0.0800           .
insur_Uninsured    Binary  0.1170          .   0.0475           .
nincome           Contin. -0.9504     0.3117   0.0429      0.7310
nhsgrad           Contin. -0.3904     1.0281  -0.0139      0.9187
cleve              Binary  0.3505          .   0.0175           .
a1c               Contin.  0.0379     1.2209   0.0463      1.1943
ldl               Contin. -0.0743     1.1096  -0.0223      1.0845
visits            Contin.  0.4276     2.1732   0.2097      1.3355
tobacco_Current    Binary  0.3175          .   0.1075           .
tobacco_Former     Binary -0.1575          .  -0.0550           .
tobacco_Never      Binary -0.1600          .  -0.0525           .
statin             Binary -0.0315          .  -0.0025           .
ace_arb            Binary  0.0345          .   0.0075           .
betab              Binary -0.1255          .  -0.0375           .
depr_dx            Binary -0.1115          .  -0.0600           .
eyeex              Binary -0.0925          .  -0.0225           .
pneumo             Binary -0.3030          .  -0.1475           .

Sample sizes
          Control Treated
All          2000     200
Matched       400     200
Unmatched    1600       0

11.2.2 Checking Rubin’s Rules 1 and 2

We’ll build a little table of the Rubin’s Rules (1 and 2) results before and after match_8 is applied, and compare these to the other MatchIt results we’ve developed.

Code

rubin_report_m5678 <- tibble(
    status = c("Rule1", "Rule2"),
    Unmatched = c(bal8$Balance$Diff.Un[1],
                  bal8$Balance$V.Ratio.Un[1]),
    Match5 = c(bal5$Balance$Diff.Adj[1],
               bal5$Balance$V.Ratio.Adj[1]),
    Match6 = c(bal6$Balance$Diff.Adj[1],
               bal6$Balance$V.Ratio.Adj[1]),
    Match7 = c(bal7$Balance$Diff.Adj[1],
               bal7$Balance$V.Ratio.Adj[1]),
    Match8 = c(bal8$Balance$Diff.Adj[1],
               bal8$Balance$V.Ratio.Adj[1])
    )

rubin_report_m5678 %>% knitr::kable(digits = 2)

status	Unmatched	Match5	Match6	Match7	Match8
Rule1	2.07	0.25	0.06	0.25	0.50
Rule2	0.75	1.86	1.22	1.86	2.44

So the optimal matching doesn’t look very strong here. Of course, we’re matching 1:2 in this situation.

11.2.3 Using `bal.plot` from `cobalt`

Looking at the linear propensity scores in each group, in mirrored histograms, we have …

Code

bal.plot(match_8,
         var.name = "distance", 
         which = "both",
         sample.names = 
             c("Unmatched Sample", "match_8 Sample"),
         type = "histogram", mirror = TRUE)

11.2.4 Using `love.plot` to look at Standardized Differences

Code

love.plot(bal8, 
          threshold = .1, size = 3,
          var.order = "unadjusted",
          stats = "mean.diffs",
          stars = "raw",
          sample.names = c("Unmatched", "Matched"),
          title = "Love Plot for 1:2 Optimal Match") +
    labs(subtitle = "distance = linear propensity score",
         caption = "* indicates raw mean differences (for binary variables)")

11.2.5 Using `love.plot` to look at Variance Ratios

Again, the categorical variables are dropped.

Code

love.plot(bal8, 
          threshold = .5, size = 3,
          stats = "variance.ratios",
          sample.names = c("Unmatched", "Matched"),
          title = "Variance Ratios for 1:2 Optimal Match") +
    labs(subtitle = "distance = linear propensity score",
         caption = "* indicates raw mean differences (for binary variables)")

12 Outcome Models

We’ll fit two (overly simplistic) outcome models, one for bp_good (our binary outcome) and another for bmi (our quantitative outcome.) Later, we’ll compare the exposure effect estimates made here to the estimates we obtain after propensity matching. In each case, we’ll focus on ATT estimates (average treated effect on the treated) rather than ATE estimates.

12.1 Potential Fitting Problems

When you fit mixed effect models as I have done below for binary and for quantitative outcomes, you may find that the model fit appears to be singular, which means that some element of the variance-covariance matrix estimated by lmer is zero. This may be an ignorable problem in this setting, since the problem is usually with the separation of residual effects from random match effects. We’re focusing right now on the fixed effects, which can be summarized as indicated below.

But, there’s certainly an argument to be made that an alternative strategy for estimating the match might be more appropriate. We won’t worry about that in this example, and just display what we get.

12.2 Unadjusted Models prior to Propensity Matching

12.2.1 Unadjusted Outcome Model for `bp_good`

Code

unadj_mod1 <- glm(bp_good == 1 ~ exposure == "A", data = dm2200, 
                  family = binomial())

tidy(unadj_mod1, exponentiate = TRUE, 
     conf.int = TRUE, conf.level = 0.95) %>%
    select(term, estimate, std.error, 
           conf.low, conf.high) %>%
    knitr::kable(digits = 3)

term	estimate	std.error	conf.low	conf.high
(Intercept)	2.515	0.050	2.284	2.773
exposure == “A”TRUE	0.561	0.152	0.417	0.757

12.2.2 Unadjusted Outcome Model for `bmi`

Code

unadj_mod2 <- lm(bmi ~ exposure == "A", data = dm2200)

tidy(unadj_mod2, conf.int = TRUE, conf.level = 0.95) %>%
    select(term, estimate, std.error, 
           conf.low, conf.high) %>%
    knitr::kable(digits = 3)

term	estimate	std.error	conf.low	conf.high
(Intercept)	35.087	0.183	34.729	35.446
exposure == “A”TRUE	-2.260	0.606	-3.449	-1.071

12.3 Adjusted Outcome Models after `match1`

12.3.1 Binary Outcome: `bp_good`

Code

result_match1_bp <- clogit(bp_good ~ (exposure == "A") + 
                          strata(match1_matches),
                      data = dm2200_matched1)

tidy(result_match1_bp, exponentiate = TRUE, conf.int = TRUE, conf.level = 0.95) %>%
    select(term, estimate, std.error, conf.low, conf.high) %>%
    knitr::kable(digits = 3)

term	estimate	std.error	conf.low	conf.high
exposure == “A”TRUE	0.632	0.213	0.416	0.959

12.3.2 Quantitative Outcome: `bmi`

We’ll use a mixed model to account for our 1:1 matching. The matches here are treated as a random factor, with the exposure a fixed factor, in the lme4 package.

Code

dm2200_matched1 <- dm2200_matched1 %>% 
    mutate(match1_matches_f = as.factor(match1_matches))

result_match1_bmi <- lmer(bmi ~ (exposure == "A") + 
                              (1 | match1_matches_f), 
                          data = dm2200_matched1)

boundary (singular) fit: see help('isSingular')

Code

broom.mixed::tidy(result_match1_bmi, 
     conf.int = TRUE, conf.level = 0.95) %>% 
    filter(effect == "fixed") %>%
    select(term, estimate, std.error, 
           conf.low, conf.high) %>%
    knitr::kable(digits = 3)

term	estimate	std.error	conf.low	conf.high
(Intercept)	34.972	0.606	33.783	36.160
exposure == “A”TRUE	-2.145	0.858	-3.825	-0.464

12.4 Adjusted Outcome Models after `match2`

12.4.1 Binary Outcome: `bp_good`

Code

result_match2_bp <- clogit(bp_good ~ (exposure == "A") + 
                          strata(match2_matches),
                      data = dm2200_matched2)

tidy(result_match2_bp, exponentiate = TRUE, conf.int = TRUE, conf.level = 0.95) %>%
    select(term, estimate, std.error, conf.low, conf.high) %>%
    knitr::kable(digits = 3)

term	estimate	std.error	conf.low	conf.high
exposure == “A”TRUE	0.441	0.175	0.313	0.621

12.4.2 Quantitative Outcome: `bmi`

We’ll use a mixed model to account for our 1:1 matching. The matches here are treated as a random factor, with the exposure a fixed factor, in the lme4 package.

Code

dm2200_matched2 <- dm2200_matched2 %>% 
    mutate(match2_matches_f = as.factor(match2_matches))

result_match2_bmi <- lmer(bmi ~ (exposure == "A") + 
                              (1 | match2_matches_f), 
                          data = dm2200_matched2)

broom.mixed::tidy(result_match2_bmi, 
     conf.int = TRUE, conf.level = 0.95) %>% 
    filter(effect == "fixed") %>%
    select(term, estimate, std.error, 
           conf.low, conf.high) %>%
    knitr::kable(digits = 3)

term	estimate	std.error	conf.low	conf.high
(Intercept)	34.665	0.460	33.763	35.568
exposure == “A”TRUE	-1.838	0.557	-2.930	-0.747

12.5 Adjusted Outcome Models after `match3`

12.5.1 Binary Outcome: `bp_good`

Code

result_match3_bp <- clogit(bp_good ~ (exposure == "A") + 
                          strata(match3_matches),
                      data = dm2200_matched3)

tidy(result_match3_bp, exponentiate = TRUE, conf.int = TRUE, conf.level = 0.95) %>%
    select(term, estimate, std.error, conf.low, conf.high) %>%
    knitr::kable(digits = 3)

term	estimate	std.error	conf.low	conf.high
exposure == “A”TRUE	0.607	0.139	0.463	0.797

12.5.2 Quantitative Outcome: `bmi`

We’ll use a mixed model to account for our 1:1 matching. The matches here are treated as a random factor, with the exposure a fixed factor, in the lme4 package.

Code

dm2200_matched3 <- dm2200_matched3 %>% 
    mutate(match3_matches_f = as.factor(match3_matches))

result_match3_bmi <- lmer(bmi ~ (exposure == "A") + 
                              (1 | match3_matches_f), 
                          data = dm2200_matched3)

broom.mixed::tidy(result_match3_bmi, 
     conf.int = TRUE, conf.level = 0.95) %>% 
    filter(effect == "fixed") %>%
    select(term, estimate, std.error, 
           conf.low, conf.high) %>%
    knitr::kable(digits = 3)

term	estimate	std.error	conf.low	conf.high
(Intercept)	34.654	0.416	33.839	35.468
exposure == “A”TRUE	-1.827	0.441	-2.691	-0.962

12.6 Adjusted Outcome Models after `match4`

12.6.1 Binary Outcome: `bp_good`

Code

result_match4_bp <- clogit(bp_good ~ (exposure == "A") + 
                          strata(match4_matches),
                      data = dm2200_matched4)

tidy(result_match4_bp, exponentiate = TRUE, conf.int = TRUE, conf.level = 0.95) %>%
    select(term, estimate, std.error, conf.low, conf.high) %>%
    knitr::kable(digits = 3)

term	estimate	std.error	conf.low	conf.high
exposure == “A”TRUE	0.619	0.25	0.38	1.01

12.6.2 Quantitative Outcome: `bmi`

We’ll use a mixed model to account for our 1:1 matching. The matches here are treated as a random factor, with the exposure a fixed factor, in the lme4 package.

Code

dm2200_matched4 <- dm2200_matched4 %>% 
    mutate(match4_matches_f = as.factor(match4_matches))

result_match4_bmi <- lmer(bmi ~ (exposure == "A") + 
                              (1 | match4_matches_f), 
                          data = dm2200_matched4)

boundary (singular) fit: see help('isSingular')

Code

broom.mixed::tidy(result_match4_bmi, 
     conf.int = TRUE, conf.level = 0.95) %>% 
    filter(effect == "fixed") %>%
    select(term, estimate, std.error, 
           conf.low, conf.high) %>%
    knitr::kable(digits = 3)

term	estimate	std.error	conf.low	conf.high
(Intercept)	34.780	0.680	33.448	36.113
exposure == “A”TRUE	-1.529	0.962	-3.414	0.356

12.7 Adjusted Outcome Models after `match5`

12.7.1 Binary Outcome: `bp_good`

Code

result_match5_bp <- clogit(bp_good ~ (exposure == "A") + 
                          strata(match5_matches),
                      data = dm2200_matched5)

Warning in coxexact.fit(X, Y, istrat, offset, init, control, weights = weights,
: Loglik converged before variable 1 ; beta may be infinite.

Code

tidy(result_match5_bp, exponentiate = TRUE, 
     conf.int = TRUE, conf.level = 0.95) %>%
    select(term, estimate, std.error, conf.low, conf.high) %>%
    knitr::kable(digits = 3)

term	estimate	std.error	conf.low	conf.high
exposure == “A”TRUE	1615474899	40192.97	0	Inf

12.7.2 Quantitative Outcome: `bmi`

We’ll use a mixed model to account for our 1:1 matching. The matches here are treated as a random factor, with the exposure a fixed factor, in the lme4 package.

Code

result_match5_bmi <- lmer(bmi ~ (exposure == "A") + 
                              (1 | match5_matches), 
                          data = dm2200_matched5)

boundary (singular) fit: see help('isSingular')

Code

broom.mixed::tidy(result_match5_bmi, 
     conf.int = TRUE, conf.level = 0.95) %>% 
    filter(effect == "fixed") %>%
    select(term, estimate, std.error, 
           conf.low, conf.high) %>%
    knitr::kable(digits = 3)

term	estimate	std.error	conf.low	conf.high
(Intercept)	33.258	1.354	30.605	35.911
exposure == “A”TRUE	1.683	2.037	-2.310	5.676

12.8 Adjusted Outcome Models after `match6`

12.8.1 Binary Outcome: `bp_good`

Code

result_match6_bp <- clogit(bp_good ~ (exposure == "A") + 
                          strata(match6_matches),
                      data = dm2200_matched6)

Warning in coxexact.fit(X, Y, istrat, offset, init, control, weights = weights,
: Ran out of iterations and did not converge

Code

tidy(result_match6_bp, exponentiate = TRUE, 
     conf.int = TRUE, conf.level = 0.95) %>%
    select(term, estimate, std.error, conf.low, conf.high) %>%
    knitr::kable(digits = 3)

term	estimate	std.error	conf.low	conf.high
exposure == “A”TRUE	0	24378.27	0	Inf

12.8.2 Quantitative Outcome: `bmi`

We’ll use a mixed model to account for our 1:1 matching. The matches here are treated as a random factor, with the exposure a fixed factor, in the lme4 package.

Code

result_match6_bmi <- lmer(bmi ~ (exposure == "A") + 
                              (1 | match6_matches), 
                          data = dm2200_matched6)

boundary (singular) fit: see help('isSingular')

Code

broom.mixed::tidy(result_match6_bmi, 
     conf.int = TRUE, conf.level = 0.95) %>% 
    filter(effect == "fixed") %>%
    select(term, estimate, std.error, 
           conf.low, conf.high) %>%
    knitr::kable(digits = 3)

term	estimate	std.error	conf.low	conf.high
(Intercept)	34.338	1.982	30.453	38.223
exposure == “A”TRUE	1.097	2.567	-3.934	6.129

12.9 Adjusted Outcome Models after `match7`

12.9.1 Binary Outcome: `bp_good`

Code

result_match7_bp <- clogit(bp_good ~ (exposure == "A") + 
                          strata(match7_matches),
                      data = dm2200_matched7)

Warning in coxexact.fit(X, Y, istrat, offset, init, control, weights = weights,
: Loglik converged before variable 1 ; beta may be infinite.

Code

tidy(result_match7_bp, exponentiate = TRUE, 
     conf.int = TRUE, conf.level = 0.95) %>%
    select(term, estimate, std.error, conf.low, conf.high) %>%
    knitr::kable(digits = 3)

term	estimate	std.error	conf.low	conf.high
exposure == “A”TRUE	0	40192.97	0	Inf

12.9.2 Quantitative Outcome: `bmi`

We’ll use a mixed model to account for our 1:1 matching. The matches here are treated as a random factor, with the exposure a fixed factor, in the lme4 package.

Code

result_match7_bmi <- lmer(bmi ~ (exposure == "A") + 
                              (1 | match7_matches), 
                          data = dm2200_matched7)

broom.mixed::tidy(result_match7_bmi, 
     conf.int = TRUE, conf.level = 0.95) %>% 
    filter(effect == "fixed") %>%
    select(term, estimate, std.error, 
           conf.low, conf.high) %>%
    knitr::kable(digits = 3)

term	estimate	std.error	conf.low	conf.high
(Intercept)	36.097	1.669	32.826	39.367
exposure == “A”TRUE	-1.111	2.365	-5.746	3.525

12.10 Adjusted Outcome Models after `match8`

12.10.1 Binary Outcome: `bp_good`

Code

result_match8_bp <- clogit(bp_good ~ (exposure == "A") + 
                          strata(match8_matches),
                      data = dm2200_matched8)

tidy(result_match8_bp, exponentiate = TRUE, 
     conf.int = TRUE, conf.level = 0.95) %>%
    select(term, estimate, std.error, conf.low, conf.high) %>%
    knitr::kable(digits = 3)

term	estimate	std.error	conf.low	conf.high
exposure == “A”TRUE	1.073	0.65	0.3	3.839

12.10.2 Quantitative Outcome: `bmi`

We’ll use a mixed model to account for our 1:1 matching. The matches here are treated as a random factor, with the exposure a fixed factor, in the lme4 package.

Code

dm2200_matched8 <- dm2200_matched8 %>%
    mutate(match8_f = factor(match8_matches))

result_match8_bmi <- lmer(bmi ~ (exposure == "A") + 
                              (1 | match8_f),  
                          data = dm2200_matched8)

broom.mixed::tidy(result_match8_bmi, 
     conf.int = TRUE, conf.level = 0.95) %>% 
    filter(effect == "fixed") %>%
    select(term, estimate, std.error, 
           conf.low, conf.high) %>%
    knitr::kable(digits = 3)

term	estimate	std.error	conf.low	conf.high
(Intercept)	34.597	0.786	33.057	36.137
exposure == “A”TRUE	-3.797	1.329	-6.402	-1.192

13 Cleanup

We’ve created a lot of variables here that we don’t actually need going forward. So I’ll remove them here:

Code

rm(bal1, bal2, bal3, bal4,
   covs_1, covs_1plus, covs_2plus, covs_3plus, covs_4plus,
   covs_for_rubin, dm2200, 
   dm2200_matched1, dm2200_matched2, dm2200_matched3,
   dm2200_matched4,
   match_1, match_2, match_3, match_4,
   prop_model, 
   result_match1_bmi, result_match2_bmi, result_match3_bmi,
   result_match4_bmi,
   result_match1_bp, result_match2_bp, result_match3_bp,
   result_match4_bp,
   rubin_m1, rubin_m2, rubin_m3, rubin_m4,
   rubin_report_m1, rubin_report_m12, rubin_report_m123,
   rubin_report_m1234, rubin_report_m5, 
   rubin_report_m56, rubin_report_m567, rubin_report_m5678, 
   t1, unadj_mod1, unadj_mod2,
   match1_matches, match2_matches, match3_matches,
   count, i, len, len2, match_con1, match_con2, match_tr,
   match1, match2, match4_matches, matchx, mm1, mm2, 
   bal5, bal6, bal7, bal8, dm2200_matched5, 
   dm2200_matched6, dm2200_matched7, dm2200_matched8,
   m5_subs, m7_subs, match_5, match_6, match_7, match_8,
   result_match5_bmi, result_match6_bmi, result_match7_bmi,
   result_match8_bmi,
   result_match5_bp, result_match6_bp, result_match7_bp,
   result_match8_bp)

14 Key References

Matching in these examples was performed using the Matching package (Sekhon, 2011), and the MatchIt package and covariate balance was assessed using cobalt (Greifer, 2020), both in R (R Core Team, 2019).

Greifer, N. (2020). cobalt: Covariate Balance Tables and Plots. R package version 4.3.0.
MatchIt: Nonparametric Preprocessing for Parametric Causal Inference, R package version 4.1.0.
- See also MatchIt: Getting Started by Noah Greifer.
Sekhon, J.S. (2011) Multivariate and Propensity Score Matching Software with Automated Balance Optimization: The Matching Package for R, J of Statistical Software, 2011, 42: 7, http://www.jstatsoft.org/. R package version 4.9-6.
R Core Team (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

15 Session Information

Code

xfun::session_info()

R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22621)

Locale:
  LC_COLLATE=English_United States.utf8 
  LC_CTYPE=English_United States.utf8   
  LC_MONETARY=English_United States.utf8
  LC_NUMERIC=C                          
  LC_TIME=English_United States.utf8    

Package version:
  askpass_1.2.0            backports_1.4.1          base64enc_0.1-3         
  bit_4.0.5                bit64_4.0.5              blob_1.2.4              
  boot_1.3-28.1            brio_1.1.4               broom_1.0.5             
  broom.mixed_0.2.9.4      bslib_0.6.1              cachem_1.0.8            
  callr_3.7.3              cellranger_1.1.0         chk_0.9.1               
  class_7.3-22             cli_3.6.2                clipr_0.8.0             
  cmprsk_2.2-11            cobalt_4.5.3             coda_0.19.4             
  codetools_0.2-19         colorspace_2.1-0         compiler_4.3.2          
  conflicted_1.2.0         cpp11_0.4.7              crayon_1.5.2            
  curl_5.2.0               data.table_1.14.10       DBI_1.2.1               
  dbplyr_2.4.0             desc_1.4.3               diffobj_0.3.5           
  digest_0.6.34            dplyr_1.1.4              dtplyr_1.3.1            
  e1071_1.7-14             ellipsis_0.3.2           Epi_2.47.1              
  etm_1.1.1                evaluate_0.23            fansi_1.0.6             
  farver_2.1.1             fastmap_1.1.1            fontawesome_0.5.2       
  forcats_1.0.0            fs_1.6.3                 furrr_0.3.1             
  future_1.33.1            gargle_1.5.2             gdata_3.0.0             
  generics_0.1.3           ggplot2_3.4.4            globals_0.16.2          
  glue_1.7.0               gmodels_2.18.1.1         googledrive_2.1.1       
  googlesheets4_1.1.1      graphics_4.3.2           grDevices_4.3.2         
  grid_4.3.2               gridExtra_2.3            gtable_0.3.4            
  gtools_3.9.5             haven_2.5.4              highr_0.10              
  hms_1.1.3                htmltools_0.5.7          htmlwidgets_1.6.4       
  httr_1.4.7               ids_1.0.1                isoband_0.2.7           
  janitor_2.2.0            jquerylib_0.1.4          jsonlite_1.8.8          
  knitr_1.45               labeling_0.4.3           labelled_2.12.0         
  lattice_0.22-5           lifecycle_1.0.4          listenv_0.9.0           
  lme4_1.1-35.1            lubridate_1.9.3          magrittr_2.0.3          
  MASS_7.3-60.0.1          Matching_4.10-14         MatchIt_4.5.5           
  Matrix_1.6-5             memoise_2.0.1            methods_4.3.2           
  mgcv_1.9-0               mime_0.12                minqa_1.2.6             
  mitools_2.4              modelr_0.1.11            munsell_0.5.0           
  nlme_3.1-163             nloptr_2.0.3             numDeriv_2016.8-1.1     
  openssl_2.1.1            optmatch_0.10.7          parallel_4.3.2          
  parallelly_1.36.0        pillar_1.9.0             pkgbuild_1.4.3          
  pkgconfig_2.0.3          pkgload_1.3.3            plyr_1.8.9              
  praise_1.0.0             prettyunits_1.2.0        processx_3.8.3          
  progress_1.2.3           proxy_0.4-27             ps_1.7.5                
  purrr_1.0.2              R6_2.5.1                 ragg_1.2.7              
  rappdirs_0.3.3           rbounds_2.2              RColorBrewer_1.1.3      
  Rcpp_1.0.12              RcppArmadillo_0.12.6.6.1 RcppEigen_0.3.3.9.4     
  RcppProgress_0.4.2       readr_2.1.5              readxl_1.4.3            
  rematch_2.0.0            rematch2_2.1.2           repr_1.1.6              
  reprex_2.1.0             rlang_1.1.3              rlemon_0.2.1            
  rmarkdown_2.25           rprojroot_2.0.4          rstudioapi_0.15.0       
  rvest_1.0.3              sass_0.4.8               scales_1.3.0            
  selectr_0.4.2            skimr_2.1.5              snakecase_0.11.1        
  splines_4.3.2            stats_4.3.2              stringi_1.8.3           
  stringr_1.5.1            survey_4.2-1             survival_3.5-7          
  sys_3.4.2                systemfonts_1.0.5        tableone_0.13.2         
  testthat_3.2.1           textshaping_0.3.7        tibble_3.2.1            
  tidyr_1.3.0              tidyselect_1.2.0         tidyverse_2.0.0         
  timechange_0.2.0         tinytex_0.49             tools_4.3.2             
  tzdb_0.4.0               utf8_1.2.4               utils_4.3.2             
  uuid_1.2.0               vctrs_0.6.5              viridisLite_0.4.2       
  vroom_1.6.5              waldo_0.5.2              withr_2.5.2             
  xfun_0.41                xml2_1.3.6               yaml_2.3.8              
  zoo_1.8-12

1 Setup

2 The dm2200 data set

2.1 Codebook

2.2 Comparing Exposure Groups with tableone

3 Propensity for Exposure

3.1 Fitting a Propensity Model

3.1.1 Storing the Propensity Scores

4 match_1 1:1 greedy matching without replacement with the Matching package

4.1 ATT vs. ATE vs. ATC estimates

4.2 Obtaining the Matched Sample

4.3 Checking Covariate Balance for our 1:1 Greedy Match

4.3.1 Using bal.tab to obtain a balance table

4.3.2 Checking Rubin’s Rules 1 and 2

4.3.3 Using bal.plot from cobalt

4.3.4 Using love.plot to look at Standardized Differences

4.3.5 Using love.plot to look at Variance Ratios

5 match_2 1:2 greedy matching without replacement with the Matching package

5.1 Obtaining the Matched Sample

5.2 Checking Covariate Balance for our 1:2 Greedy Match

5.2.1 Using bal.tab to obtain a balance table

5.2.2 Checking Rubin’s Rules 1 and 2

5.2.3 Using bal.plot from cobalt

5.2.4 Using love.plot to look at Standardized Differences

5.2.5 Using love.plot to look at Variance Ratios

6 match_3 1:3 matching, with replacement with the Matching package

6.1 Obtaining the Matched Sample

6.2 Checking Covariate Balance for our 1:3 Match

6.2.1 Using bal.tab to obtain a balance table

6.2.2 Checking Rubin’s Rules 1 and 2

6.2.3 Using bal.plot from cobalt

6.2.4 Using love.plot to look at Standardized Differences

6.2.5 Using love.plot to look at Variance Ratios

7 match_4 Caliper Matching (1:1 without replacement) with the Matching package

7.1 Obtaining the Matched Sample

7.2 Checking Covariate Balance for our 1:1 Caliper Match

7.2.1 Using bal.tab to obtain a balance table

7.2.2 Checking Rubin’s Rules 1 and 2

7.2.3 Using bal.plot from cobalt

7.2.4 Using love.plot to look at Standardized Differences

7.2.5 Using love.plot to look at Variance Ratios

8 match_5 1:1 Nearest Neighbor Matching with Matchit

8.1 Obtaining the Matched Sample

8.2 Checking Covariate Balance for our 1:1 Nearest Neighbor Match

8.2.1 Default Numerical Balance Summary from MatchIt

8.2.2 Using bal.tab to obtain a balance table

8.2.3 Checking Rubin’s Rules 1 and 2

8.2.4 Using bal.plot from cobalt

8.2.5 Using love.plot to look at Standardized Differences

8.2.6 Using love.plot to look at Variance Ratios

9 match_6 1:1 Nearest Neighbor Caliper Matching with Matchit

9.1 Obtaining the Matched Sample

9.2 Checking Covariate Balance for our 1:1 Nearest Neighbor Caliper Match

9.2.1 Using bal.tab to obtain a balance table

9.2.2 Checking Rubin’s Rules 1 and 2

9.2.3 Using bal.plot from cobalt

9.2.4 Using love.plot to look at Standardized Differences

9.2.5 Using love.plot to look at Variance Ratios

10 match_7 1:1 Optimal Matching with Matchit

10.1 Obtaining the Matched Sample

10.2 Checking Covariate Balance for our 1:1 Optimal Match

10.2.1 Using bal.tab to obtain a balance table

10.2.2 Checking Rubin’s Rules 1 and 2

10.2.3 Are the Optimal and Nearest Neighbor Matches the Same?

10.2.4 Using bal.plot from cobalt

10.2.5 Using love.plot to look at Standardized Differences

10.2.6 Using love.plot to look at Variance Ratios

11 match_8 1:2 Optimal Matching with Matchit

11.1 Obtaining the Matched Sample

11.2 Checking Covariate Balance for our 1:2 Optimal Match

11.2.1 Using bal.tab to obtain a balance table

11.2.2 Checking Rubin’s Rules 1 and 2

11.2.3 Using bal.plot from cobalt

11.2.4 Using love.plot to look at Standardized Differences

11.2.5 Using love.plot to look at Variance Ratios

12 Outcome Models

12.1 Potential Fitting Problems

12.2 Unadjusted Models prior to Propensity Matching

12.2.1 Unadjusted Outcome Model for bp_good

12.2.2 Unadjusted Outcome Model for bmi

12.3 Adjusted Outcome Models after match1

2 The `dm2200` data set

2.2 Comparing Exposure Groups with `tableone`

4 `match_1` 1:1 greedy matching without replacement with the `Matching` package

4.3.1 Using `bal.tab` to obtain a balance table

4.3.3 Using `bal.plot` from `cobalt`

4.3.4 Using `love.plot` to look at Standardized Differences

4.3.5 Using `love.plot` to look at Variance Ratios

5 `match_2` 1:2 greedy matching without replacement with the `Matching` package

5.2.1 Using `bal.tab` to obtain a balance table

5.2.3 Using `bal.plot` from `cobalt`

5.2.4 Using `love.plot` to look at Standardized Differences

5.2.5 Using `love.plot` to look at Variance Ratios

6 `match_3` 1:3 matching, with replacement with the `Matching` package

6.2.1 Using `bal.tab` to obtain a balance table

6.2.3 Using `bal.plot` from `cobalt`

6.2.4 Using `love.plot` to look at Standardized Differences

6.2.5 Using `love.plot` to look at Variance Ratios

7 `match_4` Caliper Matching (1:1 without replacement) with the `Matching` package

7.2.1 Using `bal.tab` to obtain a balance table

7.2.3 Using `bal.plot` from `cobalt`

7.2.4 Using `love.plot` to look at Standardized Differences

7.2.5 Using `love.plot` to look at Variance Ratios

8 `match_5` 1:1 Nearest Neighbor Matching with `Matchit`

8.2.1 Default Numerical Balance Summary from `MatchIt`

8.2.2 Using `bal.tab` to obtain a balance table

8.2.4 Using `bal.plot` from `cobalt`

8.2.5 Using `love.plot` to look at Standardized Differences

8.2.6 Using `love.plot` to look at Variance Ratios

9 `match_6` 1:1 Nearest Neighbor Caliper Matching with `Matchit`

9.2.1 Using `bal.tab` to obtain a balance table

9.2.3 Using `bal.plot` from `cobalt`

9.2.4 Using `love.plot` to look at Standardized Differences

9.2.5 Using `love.plot` to look at Variance Ratios

10 `match_7` 1:1 Optimal Matching with `Matchit`

10.2.1 Using `bal.tab` to obtain a balance table

10.2.4 Using `bal.plot` from `cobalt`

10.2.5 Using `love.plot` to look at Standardized Differences

10.2.6 Using `love.plot` to look at Variance Ratios

11 `match_8` 1:2 Optimal Matching with `Matchit`

11.2.1 Using `bal.tab` to obtain a balance table

11.2.3 Using `bal.plot` from `cobalt`

11.2.4 Using `love.plot` to look at Standardized Differences

11.2.5 Using `love.plot` to look at Variance Ratios

12.2.1 Unadjusted Outcome Model for `bp_good`

12.2.2 Unadjusted Outcome Model for `bmi`

12.3 Adjusted Outcome Models after `match1`

12.3.1 Binary Outcome: `bp_good`

12.3.2 Quantitative Outcome: `bmi`

12.4 Adjusted Outcome Models after `match2`

12.4.1 Binary Outcome: `bp_good`

12.4.2 Quantitative Outcome: `bmi`

12.5 Adjusted Outcome Models after `match3`

12.5.1 Binary Outcome: `bp_good`

12.5.2 Quantitative Outcome: `bmi`

12.6 Adjusted Outcome Models after `match4`

12.6.1 Binary Outcome: `bp_good`

12.6.2 Quantitative Outcome: `bmi`

12.7 Adjusted Outcome Models after `match5`

12.7.1 Binary Outcome: `bp_good`

12.7.2 Quantitative Outcome: `bmi`

12.8 Adjusted Outcome Models after `match6`

12.8.1 Binary Outcome: `bp_good`

12.8.2 Quantitative Outcome: `bmi`

12.9 Adjusted Outcome Models after `match7`

12.9.1 Binary Outcome: `bp_good`

12.9.2 Quantitative Outcome: `bmi`

12.10 Adjusted Outcome Models after `match8`

12.10.1 Binary Outcome: `bp_good`

12.10.2 Quantitative Outcome: `bmi`