Propensity Score Matching in R

26 February 2025; updated: 04 March 2025

Introduction

This is a tutorial on how to perform propensity score matching in R.

Propensity score matching is a statistical approach to balancing the observed covariates between groups.[1] In essence, the propensity score is the probability that an individual will be given the exposure conditional on their observable characteristics. A propensity score is estimated using regression model methods (e.g., logistic or probit) conditioned on the observed baseline covariates.

In randomized controlled trials, the subject is randomized into the treatment or control arms of a clinical trial. After randomization, the observed characteristics of the treatment and control groups are assessed for balance. If randomized is done correctly, then not only are the observed covariates balanced, but the unobserved covariates are equally balanced between the groups. In observational studies, this balanced is often not observed introducing potential bias or confounding.

To address this problem in observational studies, we can used multiple regression model or use propensity score matching techniques. Multiple regression model control (or adjust) for confounding by including the observable covariates into the regression model. With propensity score matching, the observable characteristics are used to estimate the probability of being in the exposure arm conditional on the observable characteristics. In theory, if this is done correctly, than the balanced observed with the observable characteristics would extend to the unobservable characteristics (e.g., similar to a randomized control trial).

In this article, we will review some of the propensity score matching methods to balanced the observable characteristics between groups.

The propensity score matching methods we will review are:

Nearest neighbor approach
Inverse probability weight (IPW) for average treatment effect (ATE)
Inverse probability weight (IPW) for average treatment effect of the treated (ATT)

Causal Framework (ATE and ATT)

In a randomized control trial, we can estimate the average treatment effect (ATE), which is the difference in outcome $(Y)$ between the exposure $(Z = 1)$ and unexposed groups $(Z = 0)$. In reality, the individual is randomized to either the exposure or control arm. When we observe what happens to the individual, we can only observe what occurred to the individual in the arm they were assigned to. For example, an individual $i$ can be assigned to the exposure arm; hence, we can only observe their outcome in the exposure arm $Y_{i}(1)$. Conversely, an individual $i$ in the control arm can only have their outcome in the control arm observed $Y_{i}(0)$. Thus, we can only report on the outcome for that individual in that specific arm. This can be represented as:

\[\begin{aligned} Y_{i}(Y_{i} = Z_{i}Y_{i}(1) + (1 - Z_{i})Y_{i}(0)) \end{aligned}\]

This expression states that a single outcome for the individual $i$ is based on whether they were assigned to the exposure arm $Z_{i}$, which generates the outcome in the exposure arm $Y{i}(1)$, or the unexposed arm $(1 - Z_{i})$, which generate the outcome in the unexposed arm $Y_{i}(0)$.

In other words, if $Z = 1$, the expression reduces to:

\[\begin{aligned} Y_{i}(Y_{i} = Y_{i}(1)) \end{aligned}\]

If $Z = 0$, the expression reduces to:

\[\begin{aligned} Y_{i}(Y_{i} = Y_{i}(0)) \end{aligned}\]

Thus, in this framework, the ATE is estimated as the:

\[\begin{aligned} ATE = E[Y_{i}(1) - Y_{i}(0)] \end{aligned}\]

The ATE is defined as the treatment effect at the population level. We think of this as the difference in outcome when the everyone in the population who were in the unexposed group were instead in the exposure group. In reality this is an impossibility because we can only observe what happens to an individual based on their treatment assignment. However, if we can maintain this framework, we could have a situation where we can make a causal interpretation.

Similarly, we can also estimate the average treatment effect of the treated (ATT). The ATT is the difference in outcomes $(Y(1) - Y(0))$ among those individual who received treatment $Z = 1$. It can be expressed as:

\[\begin{aligned} ATT = E[Y_{i}(1) - Y_{i}(0)|Z_{i} = 1] \end{aligned}\]

In an ideal randomized controlled trial where the individual is in perfect adherence to their treatment, the ATE and ATT are exactly the same because the exposure and unexposed arms are the same.

There are other types of treatment effects such as the average treatment effect of the control group (ATC), which is the opposite of the ATT, sample average treatment effect (SATE), population average treatment effect (PATE), local average treatment effect (LATE), and conditional average treatment effect (CATE).[2]

In this article, we will only focus on the ATE and ATT since these are the two common types of analyses performed with propensity score matching techniques.

Motivating example

We will use the Agency for Healthcare Research and Quality (AHRQ) Medical Expenditure Panel Survey (MEPS) to evaluate the difference in total healthcare expenditures (costs) between responders who have a diagnosis of type 2 diabetes and responders who do not have a diagnosis of diabetes. (Note: Since MEPS is a survey administered to individual, we call them responders or respondents.)

Load libraries

These are the essential packages you need to install and load.

if (!require("pacman")) install.packages("pacman"); library("pacman")

p_load("MEPS",
       "tidyverse",
       "MatchIt",
       "cobalt",
       "broom",
       "modelsummary",
       "sandwich")

AHRQ MEPS data - Full-Year Consolidated File from 2021

We will use the 2021 Full-Year Consolidated File from MEPS to perform a propensity score match. This data contains information about the total healthcare expenditure of the responder and socioeconomic characteristics.

### Load data
### Load 2021 Full Year Consolidated Data File
hc2021 = read_MEPS(file = "h233")
names(hc2021) <- tolower(names(hc2021))   ## Convert column headers to lower case


## View first fix rows
head(hc2021)

## # A tibble: 6 × 1,488
##      duid   pid dupersid   panel         famid31 famid42 famid53 famid21 famidyr
##     <dbl> <dbl> <chr>      <dbl+lbl>     <chr>   <chr>   <chr>   <chr>   <chr>  
## 1 2320005   101 2320005101 23 [23 PANEL… A       A       A       A       A      
## 2 2320005   102 2320005102 23 [23 PANEL… A       A       A       A       A      
## 3 2320006   101 2320006101 23 [23 PANEL… A       A       A       A       A      
## 4 2320006   102 2320006102 23 [23 PANEL… B       B       B       B       B      
## 5 2320006   103 2320006103 23 [23 PANEL… A       A       A       A       A      
## 6 2320012   102 2320012102 23 [23 PANEL… A       A       A       A       A      
## # ℹ 1,479 more variables: cpsfamid <chr>, fcsz1231 <dbl+lbl>,
## #   fcrp1231 <dbl+lbl>, ruletr31 <chr>, ruletr42 <chr>, ruletr53 <chr>,
## #   ruletr21 <chr>, rusize31 <dbl+lbl>, rusize42 <dbl+lbl>, rusize53 <dbl+lbl>,
## #   rusize21 <dbl+lbl>, ruclas31 <dbl+lbl>, ruclas42 <dbl+lbl>,
## #   ruclas53 <dbl+lbl>, ruclas21 <dbl+lbl>, famsze31 <dbl+lbl>,
## #   famsze42 <dbl+lbl>, famsze53 <dbl+lbl>, famsze21 <dbl+lbl>,
## #   fmrs1231 <dbl+lbl>, fams1231 <dbl+lbl>, famszeyr <dbl+lbl>, …

## Rename columns / Reduce data to only a few variables
hc2021p = hc2021 %>%
  rename(
    age = age21x, 
    totexp = totexp21,
    ertexp = ertexp21,
    diabetes = diabdx_m18,
    race = racev1x,
    poverty = povcat21,
    marital_status = marry21x) %>%
  select(
    dupersid, 
    age,
    sex,
    race,
    poverty,
    diabetes,
    marital_status,
    totexp,
    ertexp)
hc2021p$year <- 2021

Subset data for adults (age >= 18 years)

We are interested in the adult population, so we will use the subset() function to restrict our sample to responders who are 18 years and older.

hc2021p <- subset(hc2021p, age >= 18)

Subset data for adults with marital status information

There are some missing data on responders in regards to their marital status. We will remove these using the subset() function.

hc2021p <- subset(hc2021p, marital_status >= 1)

Label values for variables (Sex, Race, Poverty, and Marital Status)

Since the MEPS data uses specific codes for the values with each variable, we will need to label these so that they are easier to interpret. For example, the sex variable in MEPS is coded as = male and 2 = female. We will change that to 1 = male and 0 = female. We also will change each varaible to a factor so that we can use these in our propensity score matching models.

### SEX
hc2021$sex <- factor(hc2021$sex, levels = c(1, 2), labels = c("Male", "Female"))

### RACE
hc2021p$race <- factor(hc2021p$race, levels = c(1, 2, 3, 4, 6), labels = c("White", "Black", "AI/AN", "Asian", "Multiple"))

### POVERTY
hc2021p$poverty <- factor(hc2021p$poverty, levels = c(1, 2, 3, 4, 5), labels = c("Poor", "Near Poor", "Low Income", "Middle Income", "High Income"))

### MARITAL STATUS
hc2021p$marital_status <- factor(hc2021p$marital_status, levels = c(1, 2, 3, 4, 5), labels = c("Married", "Widowed", "Divorced", "Separated", "Never Married"))

Keep respondents with diabetes indicator

The diabetes variable will need to be recoded from 1 = diabetes and 2 = no diabetes to 1 = diabetes and 0 = no diabetes. In our dataset, there are 3184 respondents with diabetes and 19,262 respondents without diabetes.

hc2021p <- hc2021p[hc2021p$diabetes %in% c(1, 2), ]

## Change diabetes indicator to 1 = Yes, 0 = No.
hc2021p$diabetes[hc2021p$diabetes == 1] = 1
hc2021p$diabetes[hc2021p$diabetes == 2] = 0
table(hc2021p$diabetes)

## 
##     0     1 
## 19262  3184

Propensity score matching

Once we have our data, we can perform the various propensity score matching approaches.

Recall, that we want to evaluate the difference in total expenditures between respondents with and without a diagnosis of diabetes in 2021. Hence, we will need to balance the observed characteristics between responders with and without diabetes.

Propensity score matching with nearest neighbor approach

The first propensity score matching technique is the nearest neighbor approach. This is where the algorithm will match an individual with another individual with the closest propensity score. We can refine how close using the caliper size. If there are ties, then the algorithm will randomly select a match. For those individuals that do not match, they are dropped from the data.

There are many ways to perform the nearest neighbor approach (e.g., Mahalanobis, Greedy, optimal). In this example, we will use the Greedy method (also known as nearest in R).

We will use a caliper size of 0.01 and perform matching without replacement. Matching can also be done with replacement where individuals in the control group will be used more than once in a match with the treatment group. This will result in an imbalanced in the number of individuals in the treatment and control group.

In our example, we will not apply the replacement option. Instead, we want to have 1:1 matching without replacement. This will give us equal number of responders who have and do not have a diagnosis of diabetes.

Lastly, we will estimate the propensity score using several observed characteristics (age, sex, race, poverty, and martial status).

For this exercise, we will use the MatchIt package and the matchit() function to generate propensity scores.

##########################################################
## Propensity Score Matching - Nearest Neighbor Approach
##########################################################
## Propensity score matching
match1 <- matchit(diabetes ~ age + sex + race + poverty + marital_status,
                  data = hc2021p,
                  method = "nearest",
                  discard = "both",
                  caliper = 0.01)

Visual inspection

After we performed the matching, we will visually inspect to see how well the matching was done. We will generate a Love plot, which will have show us the standardized difference for each covariate used in the propensity score model between the responders with and without diabetes. Our aim is to have these covariates within an acceptable threshold. In this example, our threshold is a standardized difference of < 0.1. Additionally, we estimate the Kolmogorov-Smirnov test to see if the difference was statistically significant; our aim is to have a value that is less than 0.05.

## Inspect matching
summary(match1)

## 
## Call:
## matchit(formula = diabetes ~ age + sex + race + poverty + marital_status, 
##     data = hc2021p, method = "nearest", discard = "both", caliper = 0.01)
## 
## Summary of Balance for All Data:
##                             Means Treated Means Control Std. Mean Diff.
## distance                           0.2193        0.1290          0.8088
## age                               63.3841       49.3954          1.0276
## sex                                1.5380        1.5378          0.0004
## raceWhite                          0.6982        0.7685         -0.1531
## raceBlack                          0.2048        0.1389          0.1633
## raceAI/AN                          0.0154        0.0077          0.0626
## raceAsian                          0.0496        0.0573         -0.0352
## raceMultiple                       0.0320        0.0277          0.0245
## povertyPoor                        0.2296        0.1469          0.1966
## povertyNear Poor                   0.0700        0.0458          0.0948
## povertyLow Income                  0.1583        0.1285          0.0815
## povertyMiddle Income               0.2795        0.2797         -0.0004
## povertyHigh Income                 0.2626        0.3990         -0.3100
## marital_statusMarried              0.4491        0.4652         -0.0323
## marital_statusWidowed              0.1674        0.0754          0.2463
## marital_statusDivorced             0.1947        0.1348          0.1514
## marital_statusSeparated            0.0377        0.0217          0.0840
## marital_statusNever Married        0.1511        0.3029         -0.4241
##                             Var. Ratio eCDF Mean eCDF Max
## distance                        1.1806    0.2438   0.3785
## age                             0.5344    0.2057   0.3576
## sex                             1.0002    0.0001   0.0002
## raceWhite                            .    0.0703   0.0703
## raceBlack                            .    0.0659   0.0659
## raceAI/AN                            .    0.0077   0.0077
## raceAsian                            .    0.0076   0.0076
## raceMultiple                         .    0.0043   0.0043
## povertyPoor                          .    0.0827   0.0827
## povertyNear Poor                     .    0.0242   0.0242
## povertyLow Income                    .    0.0297   0.0297
## povertyMiddle Income                 .    0.0002   0.0002
## povertyHigh Income                   .    0.1364   0.1364
## marital_statusMarried                .    0.0160   0.0160
## marital_statusWidowed                .    0.0920   0.0920
## marital_statusDivorced               .    0.0600   0.0600
## marital_statusSeparated              .    0.0160   0.0160
## marital_statusNever Married          .    0.1519   0.1519
## 
## Summary of Balance for Matched Data:
##                             Means Treated Means Control Std. Mean Diff.
## distance                           0.2177        0.2177          0.0001
## age                               63.2943       63.7663         -0.0347
## sex                                1.5390        1.5349          0.0082
## raceWhite                          0.7013        0.7379         -0.0798
## raceBlack                          0.2030        0.1850          0.0446
## raceAI/AN                          0.0139        0.0098          0.0333
## raceAsian                          0.0496        0.0433          0.0291
## raceMultiple                       0.0322        0.0240          0.0466
## povertyPoor                        0.2280        0.2327         -0.0113
## povertyNear Poor                   0.0695        0.0565          0.0507
## povertyLow Income                  0.1582        0.1547          0.0095
## povertyMiddle Income               0.2804        0.2877         -0.0162
## povertyHigh Income                 0.2640        0.2684         -0.0100
## marital_statusMarried              0.4490        0.4531         -0.0083
## marital_statusWidowed              0.1670        0.1601          0.0186
## marital_statusDivorced             0.1955        0.2097         -0.0359
## marital_statusSeparated            0.0369        0.0319          0.0265
## marital_statusNever Married        0.1516        0.1452          0.0176
##                             Var. Ratio eCDF Mean eCDF Max Std. Pair Dist.
## distance                        1.0004    0.0001   0.0019          0.0003
## age                             0.9761    0.0070   0.0256          0.1621
## sex                             0.9988    0.0021   0.0041          0.2121
## raceWhite                            .    0.0366   0.0366          0.2476
## raceBlack                            .    0.0180   0.0180          0.2793
## raceAI/AN                            .    0.0041   0.0041          0.1513
## raceAsian                            .    0.0063   0.0063          0.2152
## raceMultiple                         .    0.0082   0.0082          0.2582
## povertyPoor                          .    0.0047   0.0047          0.2380
## povertyNear Poor                     .    0.0129   0.0129          0.1868
## povertyLow Income                    .    0.0035   0.0035          0.2206
## povertyMiddle Income                 .    0.0073   0.0073          0.1625
## povertyHigh Income                   .    0.0044   0.0044          0.1378
## marital_statusMarried                .    0.0041   0.0041          0.1835
## marital_statusWidowed                .    0.0069   0.0069          0.2098
## marital_statusDivorced               .    0.0142   0.0142          0.2544
## marital_statusSeparated              .    0.0051   0.0051          0.1956
## marital_statusNever Married          .    0.0063   0.0063          0.1675
## 
## Sample Sizes:
##           Control Treated
## All         19262    3184
## Matched      3167    3167
## Unmatched   15974      16
## Discarded     121       1

## Visual inspection using Love plot
love.plot(match1, 
          stats = c("m", "ks"), ### m = mean difference; ks = Kolmogorov-Smirnov
          thresholds = c(m = 0.1, ks = 0.05), 
          drop.distance = TRUE, 
          colors = c("dodgerblue4", "firebrick"),
          shapes = c(16, 17),
          stars = "none")

In the summary(match1) output, we can review the number of respondents who were matched. We started out with 3184 respondents who reported having a diagnosis of diabetes. After matching, this number was reduced to 3167.

We also see that there were an equal number of respondents without diabetes. However, 15,974 respondents without diabetes were not matched.

Moreover, based on the love plot, the covariates appear to be well-balanced between responders who have and do not have a diagnosis of diabetes.

Convert Propensity Score Matched results into an analyzable data

Now that we have the propensity score matched groups, we need to create a dataset with these individuals. We will call this dataset match_data.

### Convert to data
match_data <- match.data(match1)
head(match_data)

## # A tibble: 6 × 13
##   dupersid     age sex       race  poverty diabetes marital_status totexp ertexp
##   <chr>      <dbl> <dbl+lbl> <fct> <fct>   <dbl+lb> <fct>           <dbl>  <dbl>
## 1 2320012102    81 2 [2 FEM… White Low In… 1 [1 YE… Widowed          9813      0
## 2 2320024101    44 2 [2 FEM… White Low In… 0        Married          2106      0
## 3 2320028101    43 2 [2 FEM… White Middle… 1 [1 YE… Married           377      0
## 4 2320038101    83 1 [1 MAL… White High I… 0        Married          7677      0
## 5 2320038102    83 2 [2 FEM… White High I… 0        Married          3953      0
## 6 2320040101    63 1 [1 MAL… White High I… 1 [1 YE… Widowed         18740      0
## # ℹ 4 more variables: year <dbl>, distance <dbl>, weights <dbl>, subclass <fct>

Inverse Probability Weight (IPW) Methods

The nearest neighbor approach does a great job of identifying the best match, which can be displayed on a Table 1 demographics similar to a randomized controlled trial. However, there are a lot of individuals who were not matched and dropped from the data.

In some cases, this is undesirable.

Another method uses the inverse probability weight (IPW) method to keep as much of the data as possible by providing weights to each individual in the data. We use these weights (which are based on the propensity score) in the analysis to provide us with the weighted estimates of the treatment effect.

Using logistic regression to estimate the propensity score

We estimate the propensity score using a logistic regression model with the observable covariates. The dependent variable will be the diabetes variable.

Once we perform the logistic regression model, we will need to estimated the predicted values of the probability of having a diagnosis of diabetes. This is essentially the propensity score.

There are two ways to estimate the predicted values. The first uses the uagment_columns() function. The second uses the mutate() function. I show how you can do both below.

After the predicted probability of having a diagnosis of diabetes, we can plot these distributions between the respondents with and without diabetes.

##########################################################
## Propensity Score Estimation
##########################################################
### Construct a logistic regression model
logit1 <- glm(diabetes ~ age + sex + race + poverty + marital_status, 
              data = hc2021p,
              family = "binomial"(link = "logit"))
summary(logit1)

## 
## Call:
## glm(formula = diabetes ~ age + sex + race + poverty + marital_status, 
##     family = binomial(link = "logit"), data = hc2021p)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.4945  -0.5963  -0.4024  -0.2621   2.8965  
## 
## Coefficients:
##                              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                 -3.810998   0.124969 -30.496  < 2e-16 ***
## age                          0.045751   0.001448  31.604  < 2e-16 ***
## sex                         -0.141087   0.041395  -3.408 0.000654 ***
## raceBlack                    0.474756   0.053747   8.833  < 2e-16 ***
## raceAI/AN                    0.933197   0.177984   5.243 1.58e-07 ***
## raceAsian                    0.288761   0.093057   3.103 0.001915 ** 
## raceMultiple                 0.566715   0.118194   4.795 1.63e-06 ***
## povertyNear Poor            -0.140514   0.090899  -1.546 0.122149    
## povertyLow Income           -0.287948   0.068344  -4.213 2.52e-05 ***
## povertyMiddle Income        -0.407309   0.059286  -6.870 6.41e-12 ***
## povertyHigh Income          -0.853575   0.061451 -13.890  < 2e-16 ***
## marital_statusWidowed       -0.121386   0.065844  -1.844 0.065248 .  
## marital_statusDivorced       0.016357   0.056997   0.287 0.774125    
## marital_statusSeparated      0.354424   0.114800   3.087 0.002020 ** 
## marital_statusNever Married -0.197674   0.063481  -3.114 0.001846 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 18330  on 22445  degrees of freedom
## Residual deviance: 16263  on 22431  degrees of freedom
## AIC: 16293
## 
## Number of Fisher Scoring iterations: 5

## Method 1: Using augment_columns
prob_fitted <- augment_columns(logit1, 
                               data = hc2021p,
                               type.predict = "response") %>%
                rename(propensity_score = .fitted)

## Inspect propensity scores (propensity_score)
summary(prob_fitted$propensity_score)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.01311 0.05468 0.10972 0.14185 0.20192 0.68263

## Method 2: Using mutate
prob_fitted2 <- hc2021p %>%
    mutate(
        propensity_score =  glm(diabetes ~ age + sex + race + poverty + marital_status, 
                                data = hc2021p,
                                family = "binomial"(link = "logit")) %>%
          predict(type = "response")
          )

## Inspect propensity scores (propensity_score)
summary(prob_fitted2$propensity_score)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.01311 0.05468 0.10972 0.14185 0.20192 0.68263

## Visualize the propensity score distributions between the diabetes and no-diabetes groups
ps_fitted <- prob_fitted2 %>%
    tidyr::spread(diabetes, propensity_score, sep = "_")

ggplot(ps_fitted) + 
  geom_histogram(bins = 50, aes(diabetes_0)) + 
  geom_histogram(bins = 50, aes(x = diabetes_1, y = -after_stat(count))) + 
  ylab("count") + xlab("p") +
  geom_hline(yintercept = 0, lwd = 0.5) +
  scale_y_continuous(label = abs)

# Note: The code for the graph came from:  https://livefreeordichotomize.com/posts/2019-01-17-understanding-propensity-score-weighting/

IPW for ATE

Once we have the propensity scores, we can estimate the IPW for the ATE analysis. The propensity score for the respondent $ps_{i}$ is used to estimate the IPW of the ATE $ipw_{ATE}$ based on whether they have a diagnosis of diabetes $Z_{i} = 1$ or not $Z_{i} = 0$. The formula to estimate the IPW for the ATE is:

\[\begin{aligned} ipw_{ATE} = \frac{Z_{i}}{ps_{i}} + \frac{1 - Z_{i}}{1 - ps_{i}} \end{aligned}\]

Once the IPW for the ATE is estimated, we will save this as the ate_fitted dataset for our analysis.

##############
#### ATE IPW
##############
ate_fitted <- prob_fitted2 %>%
    mutate(
        ipw_ate = (diabetes / propensity_score) + ((1 - diabetes) / (1 - propensity_score))
          )

## Visualize the propensity scores between the diabetes and no-diabetes groups
ps_ate <- ate_fitted %>%
  tidyr::spread(diabetes, propensity_score, sep = "_")

ggplot(ps_ate) +
  geom_histogram(bins = 50, aes(diabetes_1), alpha = 0.5) + 
  geom_histogram(bins = 50, aes(diabetes_1, weight = ipw_ate), fill = "green", alpha = 0.5) + 
  geom_histogram(bins = 50, alpha = 0.5, aes(x = diabetes_0, y = -..count..)) + 
  geom_histogram(bins = 50, aes(x = diabetes_0, weight = ipw_ate, y = -..count..), fill = "blue", alpha = 0.5) + 
  ylab("count") + xlab("p") +
  geom_hline(yintercept = 0, lwd = 0.5) +
  scale_y_continuous(label = abs)

# Note: The code for the graph came from:  https://livefreeordichotomize.com/posts/2019-01-17-understanding-propensity-score-weighting/

We can visualize how well the IPW ATE does in providing weights to the group with {green and without blue a diagnosis of diabetes.

IPW for ATT

Once we have the propensity scores, we can estimate the IPW for the ATT analysis. The propensity score for the respondent $ps_{i}$ is used to estimate the IPW of the ATT $ipw_{ATT}$ based on whether they have a diagnosis of diabetes $Z_{i} = 1$ or not $Z_{i} = 0$. The formula to estimate the IPW for the ATT is:

\[\begin{aligned} ipw_{ATT} = \frac{ps_{i}Z_{i}}{ps_{i}} + \frac{ps_{i}(1 - Z_{i})}{1 - ps_{i}} \end{aligned}\]

Once the IPW for the ATT is estimated, we will save this as the att_fitted dataset for our analysis.

##############
#### ATT IPW
##############
att_fitted <- prob_fitted2 %>%
  mutate(
    ipw_att = ((propensity_score * diabetes) / propensity_score) + ((propensity_score * (1 - diabetes)) / (1 - propensity_score))
        )

## Visualize the propensity scores between the diabetes and no-diabetes groups
ps_att <- att_fitted %>%
  tidyr::spread(diabetes, propensity_score, sep = "_")

ggplot(ps_att) +
  geom_histogram(bins = 50, aes(diabetes_1), alpha = 0.5) + 
  geom_histogram(bins = 50, aes(diabetes_1, weight = ipw_att), fill = "green", alpha = 0.5) + 
  geom_histogram(bins = 50, alpha = 0.5, aes(x = diabetes_0, y = -..count..)) + 
  geom_histogram(bins = 50, aes(x = diabetes_0, weight = ipw_att, y = -..count..), fill = "blue", alpha = 0.5) + 
  ylab("count") + xlab("p") +
  geom_hline(yintercept = 0, lwd = 0.5) +
  scale_y_continuous(label = abs)

# Note: The code for the graph came from:  https://livefreeordichotomize.com/posts/2019-01-17-understanding-propensity-score-weighting/

The IPW for ATT focused on those respondents who had diabetes. Thus, the weights are generated based on those individuals who had a diagnosis of diabetes. This is why there is a large grey area with the respondents who did not have a diagnosis of diabetes. It’s because their weights have been reduced as denoted by the blue area.

Comparison of propensity score matched methods

Let’s recap. So far, we have estimated the propensity scores for each respondents in our sample. We used three propensity score matching approaches. The first used the nearest neighbor approach where we matched the respondents without diabetes to respondents with diabetes in a 1:1 matching ratio without replacement. The second approach used the propensity score to estimate the inverse probability weights for respondents based on the ATE analysis. Last, we used the propensity score to estimate the inverse probability weights for respondents based on the the ATT analysis.

We can now estimate the total healthcare expenditures between respondents with and without a diagnosis of diabetes.

We will use a linear regression model where the total healthcare expenditure totexp is the dependent variable and the diabetes variable as the predictor of interest. We will not need to include the observed characteristics (age, sex, race, poverty, and marital status) since those were used to estimate the propensity scores. (Note: In some cases, these characteristics may be added to adjust for the model results.)

\[\begin{aligned} Expenditures_{i} = \beta_{0} + \beta_{1}Diabetes_{i} + \epsilon_{i} \end{aligned}\]

Compare the propensity score matching approaches

The first analysis will use the MEPS data without any adjustments or propensity score matching.
The second analysis will use the nearest neighbor matching approach.
The third analysis will use the IPW with ATE weights.
The fourth analysis will use the IPW with ATT weights.

In R, we can apply the weights from the IPW using the weight = option in the glm() function.

## No matching
model0 <- glm(totexp ~ diabetes,
              data = hc2021p)
summary(model0)

## 
## Call:
## glm(formula = totexp ~ diabetes, data = hc2021p)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
##  -16703    -7519    -6096    -1224  2179645  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   7644.8      212.3   36.00   <2e-16 ***
## diabetes      9057.8      563.8   16.07   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 868430410)
## 
##     Null deviance: 1.9715e+13  on 22445  degrees of freedom
## Residual deviance: 1.9491e+13  on 22444  degrees of freedom
## AIC: 525691
## 
## Number of Fisher Scoring iterations: 2

## Nearest neighbor matching
model1 <- glm(totexp ~ diabetes,
              data = match_data)
summary(model1)

## 
## Call:
## glm(formula = totexp ~ diabetes, data = match_data)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -16733  -10565   -7864    1186  816647  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  10564.8      494.9  21.346   <2e-16 ***
## diabetes      6167.8      699.9   8.812   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 775792433)
## 
##     Null deviance: 4.9726e+12  on 6333  degrees of freedom
## Residual deviance: 4.9123e+12  on 6332  degrees of freedom
## AIC: 147632
## 
## Number of Fisher Scoring iterations: 2

## IPW with ATE
model2 <- glm(totexp ~ diabetes, 
              data = ate_fitted, 
              weight = ipw_ate)
summary(model2)

## 
## Call:
## glm(formula = totexp ~ diabetes, data = ate_fitted, weights = ipw_ate)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -100962    -8266    -7037    -1649  2208362  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   8085.1      258.9   31.23   <2e-16 ***
## diabetes      6980.4      373.6   18.68   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 1508361886)
## 
##     Null deviance: 3.4380e+13  on 22445  degrees of freedom
## Residual deviance: 3.3854e+13  on 22444  degrees of freedom
## AIC: 529952
## 
## Number of Fisher Scoring iterations: 2

## IPW with ATT
model3 <- glm(totexp ~ diabetes, 
              data = att_fitted, 
              weight = ipw_att)
summary(model3)

## 
## Call:
## glm(formula = totexp ~ diabetes, data = att_fitted, weights = ipw_att)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
##  -16703    -3885    -2399    -1234  1001355  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  10704.8      283.5   37.76   <2e-16 ***
## diabetes      5997.8      402.6   14.90   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 260161980)
## 
##     Null deviance: 5.8968e+12  on 22445  degrees of freedom
## Residual deviance: 5.8391e+12  on 22444  degrees of freedom
## AIC: 541393
## 
## Number of Fisher Scoring iterations: 2

#### Comparing the models together
model_list <- list("No matching" = model0,
                   "Nearest neighbor matching" = model1,
                   "IPW wih ATE" = model2,
                   "IPW with ATT" = model3)

modelsummary1 <- modelsummary(model_list,
             stars = TRUE,
             gof_omit = ".IC",
             fmt = fmt_decimal(digits = 0, pdigits = 3),
             statistic = NULL,
             conf_level = 0.95,
             vcov = "robust",
             estimate = "{estimate} [{conf.low}, {conf.high}] {stars}",
             notes = list("*** P<0.001")) 
modelsummary1

	No matching	Nearest neighbor matching	IPW wih ATE	IPW with ATT
*** P<0.001
(Intercept)	7645 [7221, 8069] ***	10565 [9534, 11596] ***	8085 [7639, 8531] ***	10705 [9978, 11432] ***
diabetes	9058 [8062, 10054] ***	6168 [4795, 7540] ***	6980 [5888, 8073] ***	5998 [4840, 7156] ***
Num.Obs.	22446	6334	22446	22446
R2	0.011	0.012	0.011	0.002
Log.Lik.	-262842.503	-73813.132	-299527.066	-266362.351
F	317.532	77.625	156.822	103.017
RMSE	29467.83	27848.65	29477.10	29603.86
Std.Errors	HC3	HC3	HC3	HC3

Note that the No matching, IPW with ATE, and IPW with ATT had the complete number of respondents (N = 22,246) in the analysis. Whereas, the Nearest neighbor matching only had N = 6334 in the analysis. This is because the Nearest neighbor matching approach discarded respondents who did not match.

The results varied based on the methods used.

In the no matching approach, respondents with a diagnosis of diabetes were associated with a greater increase in healthcare expenditures compared to respondents without a diagnosis of diabetes (+$9058; 95% CI: $8062, $10,054).

In the nearest neighbor matching approach, respondents with a diagnosis of diabetes were associated with a greater increase in healthcare expenditures compared to respondents without a diagnosis of diabetes (+$6184; 95% CI: $4810, $7557).

In the IPW with ATE approach, respondents with a diagnosis of diabetes were associated with a greater increase in healthcare expenditures compared to respondents without a diagnosis of diabetes (+$6980; 95% CI: $5888, $8073).

In the IPW with ATT approach, the interpretation is a little different. If the respondents who had an actual diagnosis of diabetes did not have diabetes, the difference in total healthcare expenditure would have been +$5998 greater (95% CI: $4840, $7156).

Conclusions

Propensity score matching is a useful technique that can mitigate or eliminate confounding from observational studies. However, the application of these techniques will depend on your assumptions and knowledge about the causal framework of your research endeavors. For example, which covariates used in estimating the propensity score will depend on your causal framework, which can be illustrated by a directed acyclic graph (DAG) diagram. It’s not recommended that you include all observable characteristics into the propensity score estimation. Each covariate should be carefully considered and justified for inclusion to estimate a propensity score based on some kind of causal framework. How ever you use these methods, keep in mind that they are based on assumptions, and it is our responsibility to ensure that we are transparent and explicit about these in our methods.

I have provided a couple of references below to help expand on topics related to propensity score matching.

References

1. Austin PC. An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies. Multivariate Behav Res. 2011;46(3):399-424. doi:10.1080/00273171.2011.568786.
1. Garrido MM, Dowd B, Hebert PL, Maciejewski ML. Understanding Treatment Effect Terminology in Pain and Symptom Management Research. J Pain Symptom Manage. 2016;52(3):446-452. doi:10.1016/j.jpainsymman.2016.01.016.

Acknowledgements

There are numerous resources that were used to develop this tutorial.

The MatchIt package GitHub site is a great resource on how to use this in estimating the propensity scores in R. URL: https://kosukeimai.github.io/MatchIt/

Dr. Andrew Heiss has a wealth of tutorials on propensity matching and IPW. This was a great resource and inspiration for my article: URL: https://evalsp21.classes.andrewheiss.com/example/matching-ipw/

The “Live Free or Dichotomize” blog had excellent explanations of the IPW for various treatment effects by author Lucy D’Agostino McGowan: URL: https://livefreeordichotomize.com/posts/2019-01-17-understanding-propensity-score-weighting/

Francisco Yirá’s blog on Matching was an inspiration for this article: URL: https://www.franciscoyira.com/post/matching-in-r-3-propensity-score-iptw/

The cobalt package was used to generate the Love plots: URL: https://ngreifer.github.io/WeightIt/articles/WeightIt.html

Disclaimers

This is for educational purposes only.

This is a work in progress and updates may occur in the future.

Update notes

Fixed an error where the IPW equations were displayed incorrectly.