Propensity Score Matching in R

R Markdown

Cornelia Parker - Thirty Pieces of Silver 1988-9 Detail - Tate Purchased

This blog will analyse the effect of going on a Family Treatment Therapy Sessions, as opposed to not attending a session, on SDQ scores for a groups of children. Because children who attend sessions might be different to those who do not , propensity score mating will be used to get more credible causal estimates of Family Treatment Therapy Sessions.

Propensity Score Matching is a class of statistical methods developed for estimating treatment effects with non-experimental data. In social and health sciences evaluators often face a fundamental task of drawing conditional causal inferences from quasi-experimental studies. PSM has been used in a variety of disciplines and professions, including epidemiology, medice, psychology, social work and sociology.

In 1983, Rosenbaum and Rubin published a seminal article that proposed to use a propensity score to address the Endogeneity Problem. The Endogeneity problem is synonymous with “Whenever other reasons exist that give rise to a correlation between a treatment and outcome, the overall correlation cannot be interpreted as a causal effect. THis can occur under a variety of conditions, but two cases are especially common in research: “omitted variable bias” and “simultaneity bias”.

On the whole, Randomised Control Trials (experimental design) is not always feasible in the world of human services. Propensity Score Matching is a method to implement randomized conditions to analyse service effects using non-experimental data.

A central question in social care program evaluation is whether children and families receiving various kinds of service fare better than those who do not receive services. Answering a question such as this requires a strategy that addresses selection bias: how to account for the common finding that families with the greatest need receive the most services, and the outcomes for these families often remain worse than those for families receiving less care (Barth and Mccrae 2007).

The motivation to develop the PSM approach stems from the need to analyse causal effects of treatment from observational data and the inherent need to reduce selection biases in program evaluation.

Heckman and Smith (1995) stipulate that there are three major challenges to the assumption that randomisation can result in no selection bias:

Influences on who becomes an applicant for the program
Influences on which applicants are accepted into the program
Influences on retention in the program

The Counterfactual Framework

One of the seminal developments in the conceptualisation of PSM is the counterfactual framework. The key assumption is that individuals selected into treatment and non-treatment groups have potential outcomes in both states: the one in which they are observed and the one in which they are not observed.

Steps in conducting PSM

There are essentially three steps involved in this procedure.

Step 1: Estimating Propensity Scores Using Logistic Regression

A propensity score is an estimated probability that a unit might be exposed to the program; it is constructed using the units observed characteristics (eg gender, age, number of siblings, length of involvement). Logit or probit regression is used to estimate the probability of a units exposure to a programme (Better Evaluation 2022)

The characteristics included should be well-considered and as extensive as possible.
It is very important that characteristics which may have been affected by the treatment

The challenge in this analysis primarily to do with model specification. Sometimes it is possible that other variables exist that distinguish between recipients and non-recipients.

Step 2: Matching

After obtaining estimated propensity scores, one matches cases to create a new sample of cases that share approximate similar likelihoods of being assigned to the treatment condition.

No matching algorithm that is superior in every context. Each involves a trade-off between efficiency and bias.

Step 3: Bivariate or multivariate analysis based on the new sample

With the new sample derived from matching we have obtained comparable or balanced groups with regard to propensity of treatment participation. THe hope is that through this procedure the selection into treatment has become ignorable.

Assumptions

PSM requires a deep understanding of the observable covariate that drive participation in an intervention and requires that there is substantial overlap between the propensity scores of those subjects or units which have benefitted from the program and those that have not; this is called the “common support”. If either of these two factors are lacking, PSM is not a suitable methodology for estimating causal effects of an intervention.
PSM also requires a large sample size in order to gain statistically reliable results.
PSM is not a panacea - because it matches only on observed information, it may not eliminate bias from unobserved differences between treatment and comparison groups.

(source: Better Evaluation website)

The following packages are required for this analysis.

library(wakefield) # This package generates random data sets including: data.frames, lists, and vectors.
library(MatchIt) # This package includes several popular approaches to matching 
library(pacman) # This package  is an R package management tool
library(knitr) # Provides a general-purpose tool for dynamic report generation in R
library(tableone) # This package can summarize both continuous and categorical variables mixed within one table
library(captioner) # This package allows you to store figure and table captions and print them later.
library(zoo)
library(lmtest) # this is required for coeftest function 
library(sandwich) # this is required for the vcovCL function
library(optmatch)

Step 1.1: Create a pseudo dataset of children who have and have not attended Family Treatment Therapy Sessions

The wakefield package will be used to generate a random dataset. The independent variable of interest is treat, this indicates if the child attended a session.

## Using the wakefield package to easily generate reproducible "treatment" sample data
set.seed(9999)
df.treatment <- 
  r_data_frame(
    n = 225,
    r_sample(n = 255, x = 1, prob = NULL, name = "treat"),
    id, 
    age(x = 5:18, prob = NULL, name = "age"), 
    sex,
    r_sample(x = c("White", "black", "asian","other"),
             name = "ethnicity"), 
    normal(mean = 20, sd = 2, min = NULL, max = NULL, name = "SDQ_Before"), 
    normal(mean = 10, sd = 2, min = NULL, max = NULL, name = "SDQ_After"),
    r_sample( x = c("CIN", "CP", "LAC"),
              prob = c(0.5, 0.3, 0.2),
              name = "CSS_type")
  )

## Using the wakefield package to easily generate reproducible "population" sample data
df.population <- 
  r_data_frame(
    n = 1000,
    r_sample(x = 0, prob = NULL, name = "treat"),
    id, 
    age(x = 0:18, prob = NULL, name = "age"), 
    sex,
    r_sample(x = c("White", "black", "asian","other"),
             name = "ethnicity"), 
    normal(mean = 20, sd = 2, min = NULL, max = NULL, name = "SDQ_Before"), 
    normal(mean = 20, sd = 2, min = NULL, max = NULL, name = "SDQ_After"),
    r_sample( x = c("CIN", "CP", "LAC"),
              prob = c(0.3, 0.3, 0.4),
              name = "CSS_type")
  )
# Merging the dataframes
mydata <- rbind(df.treatment, df.population)

head(mydata)

## # A tibble: 6 × 8
##   treat ID      age Sex    ethnicity SDQ_Before SDQ_After CSS_type
##   <dbl> <chr> <int> <fct>  <chr>          <dbl>     <dbl> <chr>   
## 1     1 001      18 Male   other           21.9      6.75 LAC     
## 2     1 002       7 Male   White           24.3      8.35 CIN     
## 3     1 003      16 Male   other           16.4      9.78 CP      
## 4     1 004       6 Female black           22.3     10.3  CIN     
## 5     1 005       8 Female White           20.8      6.04 CIN     
## 6     1 006      18 Female other           20.2     11.2  CIN

Step 1.2: Propensity score estimation

To estimate the propensity score a logit model is run where the outcome variable is a binary variable indicating the treatment status.

For the matching to provide a causal estimate it is important to include any covariate that is related to both the treatment assignment and potential outcomes.

Using this model it is now possible to calculate the propensity score for each child. It is simply the child’s predicted probability of being treated (attend a session), given the estimates from the logit model.

The predict() command can be used to create a dataframe that has the propensity score as well as the child’s actual treatment status.

Step 1.2.1: No Matching & Compare Covariate Imbalance Here, we construct a propensity score for each observation based on the covariates (Age, ethnicity, CSS_Type, SDQ_Before), representing its propensity to attend a session.

result_0 <- matchit(treat ~ age + ethnicity + Sex + CSS_type + SDQ_Before, data = mydata, method = NULL, distance = 'glm')
result_0

## A matchit object
##  - method: None (no matching)
##  - distance: Propensity score
##              - estimated with logistic regression
##  - number of obs.: 1225 (original)
##  - target estimand: ATT
##  - covariates: age, ethnicity, Sex, CSS_type, SDQ_Before

summary(result_0)

## 
## Call:
## matchit(formula = treat ~ age + ethnicity + Sex + CSS_type + 
##     SDQ_Before, data = mydata, method = NULL, distance = "glm")
## 
## Summary of Balance for All Data:
##                Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
## distance              0.2421        0.1705          0.7306     0.8529    0.2043
## age                  11.3378        8.9180          0.6183     0.5128    0.1274
## ethnicityasian        0.2222        0.2580         -0.0861          .    0.0358
## ethnicityblack        0.2400        0.2430         -0.0070          .    0.0030
## ethnicityother        0.3156        0.2400          0.1626          .    0.0756
## ethnicityWhite        0.2222        0.2590         -0.0885          .    0.0368
## SexMale               0.5200        0.5290         -0.0180          .    0.0090
## SexFemale             0.4800        0.4710          0.0180          .    0.0090
## CSS_typeCIN           0.5067        0.3070          0.3994          .    0.1997
## CSS_typeCP            0.3244        0.2880          0.0778          .    0.0364
## CSS_typeLAC           0.1689        0.4050         -0.6302          .    0.2361
## SDQ_Before           20.2212       20.0532          0.0792     1.1409    0.0251
##                eCDF Max
## distance         0.3742
## age              0.2600
## ethnicityasian   0.0358
## ethnicityblack   0.0030
## ethnicityother   0.0756
## ethnicityWhite   0.0368
## SexMale          0.0090
## SexFemale        0.0090
## CSS_typeCIN      0.1997
## CSS_typeCP       0.0364
## CSS_typeLAC      0.2361
## SDQ_Before       0.0803
## 
## 
## Sample Sizes:
##           Control Treated
## All          1000     225
## Matched      1000     225
## Unmatched       0       0
## Discarded       0       0

In total, there are 225 treated observations and 1000 non-treated observations. Standard Mean Difference and eCDF statistics approaching 0 or Variance Ratio approximating to 1 indicate good balance; Otherwise, imbalance.

With no matching method applied, there are severe imbalances between the treated and non-treated groups. They look quite different in terms of Standard Mean Difference (Std. Mean Diff.), Variance Ratio (Var. Ratio), and Empirical Cumulative Density Function (eCDF).

Step 1.2.2: Nearest Neighbor

Nearest neighbor matching is also known as greedy matching. It involves running through the list of treated units and selecting the closest eligible control unit to be paired with each treated unit.

result_1 <- matchit(treat ~ age + ethnicity + Sex + CSS_type + SDQ_Before, data = mydata, method = "nearest", distance = 'glm')
result_1

## A matchit object
##  - method: 1:1 nearest neighbor matching without replacement
##  - distance: Propensity score
##              - estimated with logistic regression
##  - number of obs.: 1225 (original), 450 (matched)
##  - target estimand: ATT
##  - covariates: age, ethnicity, Sex, CSS_type, SDQ_Before

summary(result_1, un=FALSE)

## 
## Call:
## matchit(formula = treat ~ age + ethnicity + Sex + CSS_type + 
##     SDQ_Before, data = mydata, method = "nearest", distance = "glm")
## 
## Summary of Balance for Matched Data:
##                Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
## distance              0.2421        0.2418          0.0032     1.0071    0.0012
## age                  11.3378       11.1378          0.0511     0.6757    0.0358
## ethnicityasian        0.2222        0.2489         -0.0641          .    0.0267
## ethnicityblack        0.2400        0.2356          0.0104          .    0.0044
## ethnicityother        0.3156        0.2933          0.0478          .    0.0222
## ethnicityWhite        0.2222        0.2222          0.0000          .    0.0000
## SexMale               0.5200        0.5467         -0.0534          .    0.0267
## SexFemale             0.4800        0.4533          0.0534          .    0.0267
## CSS_typeCIN           0.5067        0.4844          0.0444          .    0.0222
## CSS_typeCP            0.3244        0.3911         -0.1424          .    0.0667
## CSS_typeLAC           0.1689        0.1244          0.1186          .    0.0444
## SDQ_Before           20.2212       20.0199          0.0948     1.2580    0.0323
##                eCDF Max Std. Pair Dist.
## distance         0.0222          0.0064
## age              0.0933          0.9120
## ethnicityasian   0.0267          0.7911
## ethnicityblack   0.0044          0.7597
## ethnicityother   0.0222          0.7938
## ethnicityWhite   0.0000          0.3289
## SexMale          0.0267          1.0141
## SexFemale        0.0267          1.0141
## CSS_typeCIN      0.0222          0.7912
## CSS_typeCP       0.0667          0.9019
## CSS_typeLAC      0.0444          0.2847
## SDQ_Before       0.1067          1.0203
## 
## Sample Sizes:
##           Control Treated
## All          1000     225
## Matched       225     225
## Unmatched     775       0
## Discarded       0       0

plot(result_1, type = "jitter", interactive = FALSE)

After 1:1 matching, the two groups have a better balance compared to no matching in terms of Std. Mean Diff., Var. Ratio, and eCDF statistics.

However, there still exists group imbalance. Let’s check other matching methods and compare the results.

Step 1.2.3: Optimal Full Matching Optimal full matching (often just called full matching) assigns every treated and control unit in the sample to one subclass each (Hansen 2004; Stuart and Green 2008). Each subclass contains one treated unit and one or more control units or one control units and one or more treated units. It is optimal in the sense that the chosen number of subclasses and the assignment of units to subclasses minimize the sum of the absolute within-subclass distances in the matched sample.

result_2 <- matchit(treat ~ age + ethnicity + Sex + CSS_type + SDQ_Before, data = mydata, method = "full", distance = "glm", link = "probit")
result_2

## A matchit object
##  - method: Optimal full matching
##  - distance: Propensity score
##              - estimated with probit regression
##  - number of obs.: 1225 (original), 1225 (matched)
##  - target estimand: ATT
##  - covariates: age, ethnicity, Sex, CSS_type, SDQ_Before

summary(result_2, un = FALSE)

## 
## Call:
## matchit(formula = treat ~ age + ethnicity + Sex + CSS_type + 
##     SDQ_Before, data = mydata, method = "full", distance = "glm", 
##     link = "probit")
## 
## Summary of Balance for Matched Data:
##                Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
## distance              0.2435        0.2434          0.0010     0.9989    0.0024
## age                  11.3378       11.0871          0.0641     0.6546    0.0385
## ethnicityasian        0.2222        0.2211          0.0026          .    0.0011
## ethnicityblack        0.2400        0.2474         -0.0173          .    0.0074
## ethnicityother        0.3156        0.3115          0.0087          .    0.0041
## ethnicityWhite        0.2222        0.2200          0.0054          .    0.0022
## SexMale               0.5200        0.5165          0.0071          .    0.0035
## SexFemale             0.4800        0.4835         -0.0071          .    0.0035
## CSS_typeCIN           0.5067        0.4637          0.0859          .    0.0429
## CSS_typeCP            0.3244        0.4053         -0.1727          .    0.0808
## CSS_typeLAC           0.1689        0.1310          0.1012          .    0.0379
## SDQ_Before           20.2212       20.3282         -0.0504     1.1637    0.0252
##                eCDF Max Std. Pair Dist.
## distance         0.0133          0.0225
## age              0.1091          1.0219
## ethnicityasian   0.0011          0.8248
## ethnicityblack   0.0074          0.9097
## ethnicityother   0.0041          0.6867
## ethnicityWhite   0.0022          0.9154
## SexMale          0.0035          1.0097
## SexFemale        0.0035          1.0097
## CSS_typeCIN      0.0429          0.5134
## CSS_typeCP       0.0808          0.7536
## CSS_typeLAC      0.0379          0.4947
## SDQ_Before       0.0655          1.0530
## 
## Sample Sizes:
##               Control Treated
## All            1000.      225
## Matched (ESS)   326.8     225
## Matched        1000.      225
## Unmatched         0.        0
## Discarded         0.        0

1.2.4: Choosing a Matching Method

Choosing the best matching method for one’s data depends on the unique characteristics of the dataset as well as the goals of the analysis. For example, because different matching methods can target different estimands, when certain estimands are desired, specific methods must be used. On the other hand, some methods may be more effective than others when retaining the target estimand is less important.

There are many matching methods. These include: Nearest Neighbor Matching, Optimal Pair Matching, Optimal Full Matching, Genetic Matching, Exact Matching, Coarsened Exact Matching, Subclassification and Cardinality and Template Matching.

For the convenience of blog Optimal Full Matching has been chosen.

# Distribution of propensity scores
plot(result_2, type='hist')

plot(summary(result_2))

Step 3: Estimate the Effect and Standard Error

After assessing the balance of the matched data, we can assign the matched data to a variable and take a look at the matched dataset.

mydata2 <- match.data(result_2)
mydata2

## # A tibble: 1,225 × 11
##    treat ID      age Sex    ethnicity SDQ_Before SDQ_A…¹ CSS_t…² dista…³ weights
##    <dbl> <chr> <int> <fct>  <chr>          <dbl>   <dbl> <chr>     <dbl>   <dbl>
##  1     1 001      18 Male   other           21.9    6.75 LAC       0.223       1
##  2     1 002       7 Male   White           24.3    8.35 CIN       0.230       1
##  3     1 003      16 Male   other           16.4    9.78 CP        0.315       1
##  4     1 004       6 Female black           22.3   10.3  CIN       0.215       1
##  5     1 005       8 Female White           20.8    6.04 CIN       0.221       1
##  6     1 006      18 Female other           20.2   11.2  CIN       0.488       1
##  7     1 007       9 Male   asian           23.9   12.0  CIN       0.264       1
##  8     1 008      13 Female other           23.4    8.10 LAC       0.165       1
##  9     1 009      10 Female asian           23.3    8.45 CP        0.206       1
## 10     1 010      11 Male   black           15.8   12.4  CIN       0.238       1
## # … with 1,215 more rows, 1 more variable: subclass <fct>, and abbreviated
## #   variable names ¹SDQ_After, ²CSS_type, ³distance
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

Step 4: An alternative way is to estimate the treatment effect using a model

In the model, the dependent variable is the outcome, and the independent variables are the covariates plus the treatment. The coefficient of the treatment variable is the causal impact.

For the purposes of this blog the dependent variable is an SDQ score taken by all children. The Strengths and Difficulties Questionnaire (SDQ) is a brief behavioural screening questionnaire about 2-17 year olds. It exists in several versions to meet the needs of researchers, clinicians and educationalists.

fit2 <- lm(SDQ_After ~ treat + age + ethnicity + Sex + CSS_type + SDQ_Before, data = mydata2, weights = weights)

coeftest(fit2, vcov. = vcovCL, cluster = ~subclass)

## 
## t test of coefficients:
## 
##                   Estimate  Std. Error  t value Pr(>|t|)    
## (Intercept)     20.0075507   0.8372754  23.8960  < 2e-16 ***
## treat          -10.1227281   0.1679161 -60.2844  < 2e-16 ***
## age             -0.0017570   0.0176085  -0.0998  0.92053    
## ethnicityblack  -0.2184938   0.2485973  -0.8789  0.37963    
## ethnicityother  -0.0081277   0.2464382  -0.0330  0.97370    
## ethnicityWhite  -0.0170951   0.2588987  -0.0660  0.94736    
## SexFemale       -0.2442664   0.1723113  -1.4176  0.15657    
## CSS_typeCP      -0.0011790   0.1879208  -0.0063  0.99500    
## CSS_typeLAC     -0.4042603   0.2302054  -1.7561  0.07933 .  
## SDQ_Before       0.0175669   0.0400622   0.4385  0.66111    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Based on this model, the effect of attending a Family Treatment Therapy Session is a reduction of 10 in the SDQ score with a standard error 0.17 and statistically significant at 0.001.

Where the total SDQ score is Low, no further action is required. Where the total SDQ score is Medium or High asocial worker should consider how best to meet the emotional needs of the child or young person and take action.

Limitations

Propensity score analysis is often used to address selection bias in program evaluation with observational data. However, a recent study suggested that propensity score matching may accomplish the opposite of its intended goal - increasing imbalance, inefficiency, model dependence and bias (Guo et al 2020). This is a problem some authors refer to as the PSM paradox (King and Nielsen 2019)

References

Guo, S., Fraser, M., Chen., Q (2020) Propensity Score Analysis: Recent Debate and Discussion , Journal for Social Work and Research, Volume 11 Number 3

King, G., Nielsen R., (2019) WHy Propensity Scores Should Not Be Used for Matching,

Heckman, J., Smith, J. (1995) Assessing the case for social experiments. Journal of Economic Perspectives, 9 85-100

Barth, R., Mccrae, J (2007) Propensity Score Matching Strategies for Evaluating the Success of Child and Family Service Programs, Research on Social Work Practice

Useful websites

An Ultimate Guide to Matching and Propensity Score Matching

Propensity Score Matching in R

Paul Rucki

2022-08-22

R Markdown