Propensity Scores and the Right Heart Catheterization Data

Author

Thomas E. Love, Ph.D.

Published

2023-03-10

1 Preliminaries

1.1 Study Background

This example is based on the Right Heart Catheterization data set available at Vanderbilt University and described here. The key reference is Connors AF et al. 1996 The effectiveness of RHC in the initial care of critically ill patients. JAMA 276: 889-897. Connors et al. used a logistic regression model to develop a propensity score then: [a] matched RHC to non-RHC patients and [b] adjusted for propensity score in models for outcomes, followed by a sensitivity analysis. The key conclusions were that RHC patients had decreased survival time, and any unmeasured confounder would have to be somewhat strong to explain away the results.

1.2 Loading Packages

Code

knitr::opts_chunk$set(comment = NA)

library(here); library(janitor); library(naniar)
library(tableone); library(broom); library(survival)
library(Matching); library(cobalt); library(rgenoud)
library(lme4); library(rbounds); library(conflicted)
library(twang); library(survey)
library(tidyverse)

conflict_prefer("select", "dplyr")
conflict_prefer("filter", "dplyr")

theme_set(theme_bw())

1.3 Importing the Data

The observations in the rhc data set describe 5,735 subjects who are critically ill, and most variables were obtained during day 1 of a hospitalization. We’ll begin by importing the rhc data into a tibble (a lazy and surly data frame - see R’s help file on tibbles for more details) in R that contains 5,735 observations (rows) on 63 variables (columns).

In the import process, we’re going to do a few things to simplify the process of getting a functional data set.

The automated column specification in R’s readr package uses the first 1000 rows of the data set - with the rhc data, this creates parsing failures for several measures which are assumed to be integers but actually should be parsed as doubles (continuous variables). So we specify the column types for those variables here then allow R to identify the type for the other columns.
By default, readr’s read_csv command treats all string variables as characters, and not as factors. For several multicategorical variables, we want a factor representation in a specific order, so we’ll specify that, too, also using the cols function.

Code

column_types_rhc <- 
    cols(urin1 = "d", meanbp1 = "d", resp1 = "d",
    swang1 = col_factor(c("RHC", "No RHC")),
    death = col_factor(c("No", "Yes")),
    sex = col_factor(c("Male", "Female")),
    cat1 = col_factor(c("ARF", "CHF", "Cirrhosis", "Colon Cancer", "Coma", "COPD",
                        "Lung Cancer", "MOSF w/Malignancy", "MOSF w/Sepsis")),
    dnr1 = col_factor(c("No", "Yes")),
    card = col_factor(c("No", "Yes")),
    gastr = col_factor(c("No", "Yes")),
    hema = col_factor(c("No", "Yes")),
    meta = col_factor(c("No", "Yes")),
    neuro = col_factor(c("No", "Yes")),
    ortho = col_factor(c("No", "Yes")),
    renal = col_factor(c("No", "Yes")),
    resp = col_factor(c("No", "Yes")),
    seps = col_factor(c("No", "Yes")),
    trauma = col_factor(c("No", "Yes")),
    income = col_factor(c("Under $11k", "$11-$25k", "$25-$50k", "> $50k")),
    ninsclas = col_factor(c("Private", "Private & Medicare", "Medicare", 
                             "Medicare & Medicaid", "Medicaid", "No insurance")),
    race = col_factor(c("white", "black", "other")),
    ca = col_factor(c("No", "Yes", "Metastatic"))
)

The data are available at the link below.

Code

rhc_raw <- read_csv("https://biostat.app.vumc.org/wiki/pub/Main/DataSets/rhc.csv", 
                    col_types = column_types_rhc)

New names:
• `` -> `...1`

This will throw a message indicating that the `` variable (which I assume is a set of row names) will have a new name, but we’ll ignore that since we won’t use the row names in our subsequent work.

After we import the data, we re-order the columns to present the SUPPORT patient identification code first, then the exposure (swang1), then information on the outcomes, then information on the covariates of interest for our work, dropping the other variables.

Code

rhc_cleaning <- rhc_raw %>% 
    select(ptid, swang1, 
           death, sadmdte, dschdte, dthdte, lstctdte,
           age, sex, edu, income, ninsclas, race,
           cat1, dnr1, wtkilo1, hrt1, meanbp1, resp1, temp1,
           card, gastr, hema, meta, neuro, ortho, renal, 
           resp, seps, trauma,
           amihx, ca, cardiohx, chfhx, chrpulhx, dementhx, 
           gibledhx, immunhx, liverhx, malighx, psychhx, 
           renalhx, transhx, aps1, das2d3pc, scoma1, 
           surv2md1, alb1, bili1, crea1, hema1, paco21, 
           pafi1, ph1, pot1, sod1, wblc1)

1.4 The Data Set after Initial Import

The rhc_cleaning data at this stage is a tibble (a lazy and surly data frame) containing 5735 observations (rows) on 57 variables (columns). The observations describe subjects who are critically ill, and most variables were obtained during day 1 of a hospitalization.

Code

rhc_cleaning

# A tibble: 5,735 × 57
   ptid  swang1 death sadmdte dschdte dthdte lstctdte   age sex      edu income 
   <chr> <fct>  <fct>   <dbl>   <dbl>  <dbl>    <dbl> <dbl> <fct>  <dbl> <fct>  
 1 00005 No RHC No      11142   11151     NA    11382  70.3 Male   12    Under …
 2 00007 RHC    Yes     11799   11844  11844    11844  78.2 Female 12    Under …
 3 00009 RHC    No      12083   12143     NA    12400  46.1 Female 14.1  $25-$5…
 4 00010 No RHC Yes     11146   11183  11183    11182  75.3 Female  9    $11-$2…
 5 00011 RHC    Yes     12035   12037  12037    12036  67.9 Male    9.95 Under …
 6 00012 No RHC No      12389   12396     NA    12590  86.1 Female  8    Under …
 7 00013 No RHC No      12381   12423     NA    12616  55.0 Male   14    $25-$5…
 8 00014 No RHC Yes     11453   11487  11491    11490  43.6 Male   12    $25-$5…
 9 00016 No RHC No      12426   12437     NA    12560  18.0 Female 12.8  Under …
10 00017 RHC    No      11381   11400     NA    11590  48.4 Female 11.0  Under …
# … with 5,725 more rows, and 46 more variables: ninsclas <fct>, race <fct>,
#   cat1 <fct>, dnr1 <fct>, wtkilo1 <dbl>, hrt1 <dbl>, meanbp1 <dbl>,
#   resp1 <dbl>, temp1 <dbl>, card <fct>, gastr <fct>, hema <fct>, meta <fct>,
#   neuro <fct>, ortho <fct>, renal <fct>, resp <fct>, seps <fct>,
#   trauma <fct>, amihx <dbl>, ca <fct>, cardiohx <dbl>, chfhx <dbl>,
#   chrpulhx <dbl>, dementhx <dbl>, gibledhx <dbl>, immunhx <dbl>,
#   liverhx <dbl>, malighx <dbl>, psychhx <dbl>, renalhx <dbl>, …

1.5 Any missingness?

Code

rhc_cleaning %>%
    miss_var_summary()

# A tibble: 57 × 3
   variable n_miss pct_miss
   <chr>     <int>    <dbl>
 1 dthdte     2013  35.1   
 2 dschdte       1   0.0174
 3 ptid          0   0     
 4 swang1        0   0     
 5 death         0   0     
 6 sadmdte       0   0     
 7 lstctdte      0   0     
 8 age           0   0     
 9 sex           0   0     
10 edu           0   0     
# … with 47 more rows

We could also graph this with gg_miss_var(rhc_cleaning). As it turns out, there were three variables in the raw rhc data that have missing values, but we’ve omitted one (about day 1 urine output) in creating this rhc_cleaning file. The remaining two are dates, and we’ll address those in the material on creating our outcomes.

2 The Treatment / Exposure We’ll Study

The treatment/exposure we are studying is the presence of a Swan-Ganz (right heart) catheter on day 1, captured in the variable swang1. The Medline Plus description of right heart catheterization is here. Basically, the procedure passes a thin tube into the right side of the heart and the arteries leading to the lungs, to measure the heart’s function and blood flow. It’s done in very ill patients for several reasons, and is also referred to as right heart catheterization and pulmonary artery catheterization.

Code

rhc_cleaning %>% tabyl(swang1)

 swang1    n   percent
    RHC 2184 0.3808195
 No RHC 3551 0.6191805

Of course, the most important issue to us is that each subject was not randomly allocated to either the “RHC” or “No RHC” group. Instead, this is an observational study, where each subject received their treatment (RHC or no) on the basis of what their providers thought would be best for them.

3 The 3 Outcomes We’ll Study

We will define three outcomes here: one binary, one quantitative, and one time-to-event.

Our first outcome is in-study mortality, captured as a binary variable.
We will then define hospital length of stay, a quantitative variable, and then
Time to death, a time-to-event variable that will be right-censored for those who did not die during the study.

3.1 Mortality, a binary outcome: `death`

One outcome is death during the study, as captured in the death variable. This is a binary outcome.

Code

rhc_cleaning %>% tabyl(death)

 death    n   percent
    No 2013 0.3510026
   Yes 3722 0.6489974

Those people without a value for dthdte turn out to be the people with death values of “No” and this makes sense, because dthdte is a code for date of death.

3.2 Working with Dates to Define the Time-Related Outcomes

3.2.1 The Raw Data on Dates

Variable	Definition	Values
`sadmdte`	Study Admission Date	Integer ranging from 10754 to 12441, indicating the # of days after 1960-01-01, so study admission date range is actually 1989-06-11 to 1994-01-23.
`dschdte`	Hospital Discharge Date	Integer ranging from 10757 to 12560, indicating the # of days after 1960-01-01, so hospital discharge date range is actually 1989-06-14 to 1994-05-22. (Note: 1 missing value)
`dthdte`	Death Date	Integer ranging from 10757 to 12783, indicating the # of days after 1960-01-01, so death date range is actually 1989-06-14 to 1994-12-31. The 2013 people with missing values are those with No for `death`.
`lstctdte`	Date of Last Contact	Integer ranging from 10756 to 12644, indicating the # of days after 1960-01-01, so death date range is actually 1989-06-13 to 1994-08-14.

Next, we will develop the two other primary outcomes: survival time (which we’ll store in survdays), and hospital length of stay (which will go in hospdays).

It turns out that for three subjects, their dates of last contact appear to occur after their death dates, which is irritating, but doesn’t affect us much since the only way in which we’ll use the last contact date is for calculating survival time for patients without a death date.

3.2.2 Impute hospital discharge date for the one missing value

We could just impute a random value, but that’s not a great idea, since hospital discharge date is tied to other information that is available to us for this patient. The subject with a missing hospital discharge date turns out to be ptid 08382.

Code

rhc_cleaning %>% filter(is.na(dschdte)) %>% 
    select(ptid, death, sadmdte, dschdte, dthdte, lstctdte)

# A tibble: 1 × 6
  ptid  death sadmdte dschdte dthdte lstctdte
  <chr> <fct>   <dbl>   <dbl>  <dbl>    <dbl>
1 08382 Yes     12386      NA  12780    12582

So the patient was admitted to the study on day 12386, had their last contact with the study on day 12582 (i.e. 196 days after study entry), and died on day 12780 (i.e. 394 days after study entry.) It seems rational to impute a value of hospital discharge date that falls in between their date of study admission and their date of last contact.

Among the RHC patients, what is a reasonable guess as to the time from study admission to hospital discharge that we might apply to this patient? We could see what the distribution of these differences are in the available data, like so…

Code

mosaic::favstats(dschdte - sadmdte ~ swang1, data = rhc_cleaning)

  swang1 min Q1 median Q3 max     mean       sd    n missing
1    RHC   2  8     16 31 342 24.69125 27.80502 2183       1
2 No RHC   2  7     12 22 338 19.52915 23.58672 3551       0

I’ll arbitrarily choose to impute a discharge date for ptid 08382 so as to yield a 25 day hospitalization, so as to affect the mean stay length within the RHC group as little as possible.

This implies that we’ll impute a value of 12386 + 25 = 12411 for this patient.

Code

rhc_cleaning <- rhc_cleaning %>%
    mutate(dschdte = replace_na(dschdte, 12411))

rhc_cleaning %>% select(dschdte) %>% miss_var_summary()

# A tibble: 1 × 3
  variable n_miss pct_miss
  <chr>     <int>    <dbl>
1 dschdte       0        0

3.3 Length of initial hospital stay, a quantitative outcome: `hospdays`

Our third outcome is length of initial hospital stay, which is captured by subtracting the study admission date (sadmdte) from the hospital discharge date (dschdte). This is a quantitative outcomes, and we will create it soon, and name it hospdays in our data set.

Code

rhc_cleaning <- rhc_cleaning %>%
    mutate(hospdays = dschdte - sadmdte)

mosaic::favstats(hospdays ~ swang1, data = rhc_cleaning)

  swang1 min Q1 median Q3 max     mean       sd    n missing
1    RHC   2  8     16 31 342 24.69139 27.79865 2184       0
2 No RHC   2  7     12 22 338 19.52915 23.58672 3551       0

It looks like all of these values are reasonable (in that they are all positive) and we have no missing values.

3.4 Survival Time in Days, a time-to-event outcome: `survdays`

Another outcome is survival time in days (which is censored for all patients who did not die during the study), and which is captured by subtracting the study admission date (sadmdte) from the death date (dthdte) for the patients who died in study, and the study admission date (sadmdte) from the date of last contact (lstctdte) for the patients who did not die in study. This is a time-to-event outcome, and we will create it soon, and name it survdays in our data set.

For the subjects who died, this is straightforward, in that we take the death date and subtract the study admission date.
For those who survived through the study, we take the date of last contact and subtract the study admission date, in order to get their (right censored) survival times.

Code

rhc_cleaning <- rhc_cleaning %>%
    mutate(survdays = ifelse(death == "Yes", 
                             dschdte - sadmdte,
                             lstctdte - sadmdte))

mosaic::favstats(survdays ~ death, data = rhc_cleaning)

  death min  Q1 median  Q3  max      mean        sd    n missing
1    No   2 187    210 236 1351 231.18082 112.24989 2013       0
2   Yes   2   6     12  23  338  19.99893  24.59443 3722       0

Again, this looks reasonable. All of these values are strictly positive, for instance, and none seem wildly out of line, and we have no missing values. We also see that those who died had much shorter survival times (typically) than those who survived, which makes sense.

We might also look at the survdays variable broken down by RHC status.

Code

mosaic::favstats(survdays ~ swang1, data = rhc_cleaning)

  swang1 min Q1 median  Q3  max     mean       sd    n missing
1    RHC   2  9   25.5 187 1351 91.75962 129.5763 2184       0
2 No RHC   2  9   25.0 191 1243 95.57871 117.7174 3551       0

These seem quite comparable, so far, and again, they make sense (all are positive, and there are no missing values.)

4 The 50 Covariates We’ll Study

Our propensity model will eventually include 50 covariates, described below.

4.1 Group 1: Socio-demographics (6 covariates)

Variable	Definition
`age`	Age in years
`sex`	Sex
`edu`	Years of Education
`income`	Income Category (4 levels)
`ninsclas`	Insurance Category (6 levels)
`race`	Self-Reported Race (3 levels)

Here’s a Table 1 for these covariates.

Code

vars01 <- c("age", "sex", "edu", "income", "ninsclas", "race")

table1_01 <- 
    CreateTableOne(vars = vars01, strata = "swang1", 
                   data = rhc_cleaning, test = FALSE)

print(table1_01, smd = TRUE)

                        Stratified by swang1
                         RHC           No RHC        SMD   
  n                       2184          3551               
  age (mean (SD))        60.75 (15.63) 61.76 (17.29)  0.061
  sex = Female (%)         906 (41.5)   1637 (46.1)   0.093
  edu (mean (SD))        11.86 (3.16)  11.57 (3.13)   0.091
  income (%)                                          0.142
     Under $11k           1145 (52.4)   2081 (58.6)        
     $11-$25k              452 (20.7)    713 (20.1)        
     $25-$50k              393 (18.0)    500 (14.1)        
     > $50k                194 ( 8.9)    257 ( 7.2)        
  ninsclas (%)                                        0.194
     Private               731 (33.5)    967 (27.2)        
     Private & Medicare    490 (22.4)    746 (21.0)        
     Medicare              511 (23.4)    947 (26.7)        
     Medicare & Medicaid   123 ( 5.6)    251 ( 7.1)        
     Medicaid              193 ( 8.8)    454 (12.8)        
     No insurance          136 ( 6.2)    186 ( 5.2)        
  race (%)                                            0.036
     white                1707 (78.2)   2753 (77.5)        
     black                 335 (15.3)    585 (16.5)        
     other                 142 ( 6.5)    213 ( 6.0)

Two of these covariates show standardized mean differences larger than 0.10.

4.2 Group 2: Presentation at Admission (7 covariates)

Variable	Definition
`cat1`	Primary disease category (9 levels)
`dnr1`	Do Not Resuscitate status, day 1
`wtkilo1`	Weight in kg, day 1
`hrt1`	Heart rate, day 1
`meanbp1`	Mean Blood Pressure, day 1
`resp1`	Respiratory rate, day 1
`temp1`	Temperature, C, day 1

Code

vars02 <- c("cat1", "dnr1", "wtkilo1", "hrt1", "meanbp1", 
          "resp1", "temp1")

table1_02 <- 
    CreateTableOne(vars = vars02, strata = "swang1", 
                   data = rhc_cleaning, test = FALSE)

print(table1_02, smd = TRUE)

                      Stratified by swang1
                       RHC            No RHC         SMD   
  n                      2184           3551               
  cat1 (%)                                            0.583
     ARF                  909 (41.6)    1581 (44.5)        
     CHF                  209 ( 9.6)     247 ( 7.0)        
     Cirrhosis             49 ( 2.2)     175 ( 4.9)        
     Colon Cancer           1 ( 0.0)       6 ( 0.2)        
     Coma                  95 ( 4.3)     341 ( 9.6)        
     COPD                  58 ( 2.7)     399 (11.2)        
     Lung Cancer            5 ( 0.2)      34 ( 1.0)        
     MOSF w/Malignancy    158 ( 7.2)     241 ( 6.8)        
     MOSF w/Sepsis        700 (32.1)     527 (14.8)        
  dnr1 = Yes (%)          155 ( 7.1)     499 (14.1)   0.228
  wtkilo1 (mean (SD))   72.36 (27.73)  65.04 (29.50)  0.256
  hrt1 (mean (SD))     118.93 (41.47) 112.87 (40.94)  0.147
  meanbp1 (mean (SD))   68.20 (34.24)  84.87 (38.87)  0.455
  resp1 (mean (SD))     26.65 (14.17)  28.98 (13.95)  0.165
  temp1 (mean (SD))     37.59 (1.83)   37.63 (1.74)   0.021

Six of these covariates show standardized mean differences larger than 0.10.

4.3 Group 3: Admission Diagnosis Categories (10 covariates)

Variable	Definition
`card`	Cardiovascular diagnosis
`gastr`	Gastrointestinal diagnosis
`hema`	Hematologic diagnosis
`meta`	Metabolic diagnosis
`neuro`	Neurological diagnosis
`ortho`	Orthopedic diagnosis
`renal`	Renal diagnosis
`resp`	Respiratory diagnosis
`seps`	Sepsis diagnosis
`trauma`	Trauma diagnosis

Code

vars03 <- c("card", "gastr", "hema", "meta", "neuro", "ortho",
          "renal", "resp", "seps", "trauma")

table1_03 <- 
    CreateTableOne(vars = vars03, strata = "swang1", 
                   data = rhc_cleaning, test = FALSE)

print(table1_03, smd = TRUE)

                  Stratified by swang1
                   RHC          No RHC       SMD   
  n                2184         3551               
  card = Yes (%)    924 (42.3)  1007 (28.4)   0.295
  gastr = Yes (%)   420 (19.2)   522 (14.7)   0.121
  hema = Yes (%)    115 ( 5.3)   239 ( 6.7)   0.062
  meta = Yes (%)     93 ( 4.3)   172 ( 4.8)   0.028
  neuro = Yes (%)   118 ( 5.4)   575 (16.2)   0.353
  ortho = Yes (%)     4 ( 0.2)     3 ( 0.1)   0.027
  renal = Yes (%)   148 ( 6.8)   147 ( 4.1)   0.116
  resp = Yes (%)    632 (28.9)  1481 (41.7)   0.270
  seps = Yes (%)    516 (23.6)   515 (14.5)   0.234
  trauma = Yes (%)   34 ( 1.6)    18 ( 0.5)   0.104

Seven of these covariates show standardized mean differences larger than 0.10.

4.4 Group 4: Comorbid Illness and Transfer Status (13 covariates)

Variable	Definition
`amihx`	Definite Myocardial Infarction
`ca`	Cancer (3 levels)
`cardiohx`	Acute MI, Peripheral Vascular Disease, Severe Cardiovascular Symptoms (NYHA-Class III), Very Severe Cardiovascular Symptoms (NYHA- IV)
`chfhx`	Congestive Heart Failure
`chrpulhx`	Chronic Pulmonary Disease, Severe or Very Severe Pulmonary Disease
`dementhx`	Dementia, Stroke or Cerebral Infarct, Parkinson’s Disease
`gibledhx`	Upper GI Bleeding
`immunhx`	Immunosupperssion, Organ Transplant, HIV Positivity, Diabetes Mellitus Without End Organ Damage, Diabetes Mellitus With End Organ Damage, Connective Tissue Disease
`liverhx`	Cirrhosis, Hepatic Failure
`malighx`	Solid Tumor, Metastatic Disease, Chronic Leukemia/Myeloma, Acute Leukemia, Lymphoma
`psychhx`	Psychiatric History, Active Psychosis or Severe Depression
`renalhx`	Chronic Renal Disease, Chronic Hemodialysis or Peritoneal Dialysis
`transhx`	Transfer (> 24 Hours) from Another Hospital

Code

vars04 <- c("amihx", "ca", "cardiohx", "chfhx", "chrpulhx", "dementhx",
          "gibledhx", "immunhx", "liverhx", "malighx", "psychhx", "renalhx", "transhx")

table1_04 <- 
    CreateTableOne(vars = vars04, strata = "swang1", 
                   data = rhc_cleaning, test = FALSE)

print(table1_04, smd = TRUE)

                      Stratified by swang1
                       RHC          No RHC       SMD   
  n                    2184         3551               
  amihx (mean (SD))    0.04 (0.20)  0.03 (0.17)   0.074
  ca (%)                                          0.107
     No                1727 (79.1)  2652 (74.7)        
     Yes                334 (15.3)   638 (18.0)        
     Metastatic         123 ( 5.6)   261 ( 7.4)        
  cardiohx (mean (SD)) 0.20 (0.40)  0.16 (0.37)   0.116
  chfhx (mean (SD))    0.19 (0.40)  0.17 (0.37)   0.069
  chrpulhx (mean (SD)) 0.14 (0.35)  0.22 (0.41)   0.192
  dementhx (mean (SD)) 0.07 (0.25)  0.12 (0.32)   0.163
  gibledhx (mean (SD)) 0.02 (0.16)  0.04 (0.19)   0.070
  immunhx (mean (SD))  0.29 (0.45)  0.26 (0.44)   0.080
  liverhx (mean (SD))  0.06 (0.24)  0.07 (0.26)   0.049
  malighx (mean (SD))  0.20 (0.40)  0.25 (0.43)   0.101
  psychhx (mean (SD))  0.05 (0.21)  0.08 (0.27)   0.143
  renalhx (mean (SD))  0.05 (0.21)  0.04 (0.20)   0.032
  transhx (mean (SD))  0.15 (0.36)  0.09 (0.29)   0.170

Seven of these covariates show standardized mean differences larger than 0.10.

4.5 Group 5: Day 1 Summary Measures of Presentation / Severity of Illness (4 covariates)

Variable	Definition
`aps1`	APACHE III score ignoring Coma, day 1
`das2d3pc`	DASI (Duke Activity Status Index prior to admission)
`scoma1`	Support Coma score based on Glasgow, day 1
`surv2md1`	SUPPORT model estimate of Prob(surviving 2 months)

Code

vars05 <- c("aps1", "das2d3pc", "scoma1", "surv2md1")

table1_05 <- 
    CreateTableOne(vars = vars05, strata = "swang1", 
                   data = rhc_cleaning, test = FALSE)

print(table1_05, smd = TRUE)

                      Stratified by swang1
                       RHC           No RHC        SMD   
  n                     2184          3551               
  aps1 (mean (SD))     60.74 (20.27) 50.93 (18.81)  0.501
  das2d3pc (mean (SD)) 20.70 (5.03)  20.37 (5.48)   0.063
  scoma1 (mean (SD))   18.97 (28.26) 22.25 (31.37)  0.110
  surv2md1 (mean (SD))  0.57 (0.20)   0.61 (0.19)   0.198

Three of these covariates show standardized mean differences larger than 0.10.

4.6 Group 6: Day 1 Lab Results (10 covariates)

Variable	Definition
`alb1`	Albumin
`bili1`	Bilirubin
`crea1`	Serum Creatinine
`hema1`	Hematocrit
`paco21`	PaCo2
`pafi1`	PaO2/(.01 FIO2)
`ph1`	Serum PH (arterial)
`pot1`	Serum Potassium
`sod1`	Serum Sodium
`wblc1`	White blood cell count (4 subjects with value 0)

Code

vars06 <- c("alb1", "bili1", "crea1", "hema1", "paco21",
            "pafi1", "pot1", "sod1", "wblc1")

table1_06 <- 
    CreateTableOne(vars = vars06, strata = "swang1", 
                   data = rhc_cleaning, test = FALSE)

print(table1_06, smd = TRUE)

                    Stratified by swang1
                     RHC             No RHC          SMD   
  n                    2184            3551                
  alb1 (mean (SD))     2.98 (0.93)     3.16 (0.67)    0.230
  bili1 (mean (SD))    2.71 (5.33)     2.00 (4.43)    0.145
  crea1 (mean (SD))    2.47 (2.05)     1.92 (2.03)    0.270
  hema1 (mean (SD))   30.51 (7.42)    32.70 (8.79)    0.269
  paco21 (mean (SD))  36.79 (10.97)   39.95 (14.24)   0.249
  pafi1 (mean (SD))  192.43 (105.54) 240.63 (116.66)  0.433
  pot1 (mean (SD))     4.05 (1.01)     4.08 (1.04)    0.027
  sod1 (mean (SD))   136.33 (7.60)   137.04 (7.68)    0.092
  wblc1 (mean (SD))   16.27 (12.55)   15.26 (11.41)   0.084

Six of these covariates show standardized mean differences larger than 0.10.

So, in all, 31 of the 50 covariates show standardized mean differences that exceed 0.10 before any propensity score adjustment.

5 A Little Refactoring

For our outcome, and our treatment, it will be useful to have 1/0 versions of the variables, and it will also be helpful to have the factor versions releveled to put the outcome of interest (death) first and the treatment of interest (RHC) first.

Code

rhc <- rhc_cleaning %>%
    mutate(treat_rhc = as.numeric(swang1 == "RHC"),
           swang1 = fct_relevel(swang1, "RHC"),
           death = fct_relevel(death, "Yes"),
           died = as.numeric(death == "Yes"))

5.1 Saving the Data File

Code

saveRDS(rhc, here("data", "rhc.Rds"))

6 Setting a Seed

I’m going to set a seed for random numbers before we start, which I can change later if I like.

Code

set.seed(50012345)

7 Unadjusted Outcome Assessments

Before we estimate a propensity score, we’ll perform unadjusted assessments to describe the impact of RHC (or not) on our three outcomes, without adjustment for any covariates at all.

7.1 Outcome A: In-Study Mortality (binary)

Code

rhc %>% tabyl(swang1, death) %>% 
    adorn_totals() %>%
    adorn_percentages() %>%
    adorn_pct_formatting() %>%
    adorn_ns(position = "front") %>%
    adorn_title

                death              
 swang1           Yes            No
    RHC 1,486 (68.0%)   698 (32.0%)
 No RHC 2,236 (63.0%) 1,315 (37.0%)
  Total 3,722 (64.9%) 2,013 (35.1%)

The odds ratio of death associated with RHC as opposed to No RHC should be larger than 1, specifically, it should be

\[ \frac{1486 \times 1315}{2236 \times 698} = 1.25 \]

Let’s see if that’s what we get.

7.1.1 Fitting the Model with the 1/0 treatment and outcome

Code

death_unadj <- glm(died ~ treat_rhc, data = rhc, family = binomial())

death_unadj


Call:  glm(formula = died ~ treat_rhc, family = binomial(), data = rhc)

Coefficients:
(Intercept)    treat_rhc  
     0.5309       0.2248  

Degrees of Freedom: 5734 Total (i.e. Null);  5733 Residual
Null Deviance:      7433 
Residual Deviance: 7418     AIC: 7422

Code

tidy(death_unadj, conf.int = TRUE, exponentiate = TRUE)

# A tibble: 2 × 7
  term        estimate std.error statistic  p.value conf.low conf.high
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
1 (Intercept)     1.70    0.0348     15.3  1.11e-52     1.59      1.82
2 treat_rhc       1.25    0.0576      3.90 9.43e- 5     1.12      1.40

7.1.2 Fitting the Model with the treatment and outcome as factors requires care

Alternatively, we could just work with the factors, but then, it’s important to be sure what is actually being modeled. Do this by making the outcome and treatment logical results, as follows.

Code

modelA_unadj1 <- glm((death == "Yes") ~ (swang1 == "RHC"), data = rhc, family = binomial())

tidy(modelA_unadj1, conf.int = TRUE, exponentiate = TRUE)

# A tibble: 2 × 7
  term                    estimate std.error statistic  p.value conf.low conf.…¹
  <chr>                      <dbl>     <dbl>     <dbl>    <dbl>    <dbl>   <dbl>
1 "(Intercept)"               1.70    0.0348     15.3  1.11e-52     1.59    1.82
2 "swang1 == \"RHC\"TRUE"     1.25    0.0576      3.90 9.43e- 5     1.12    1.40
# … with abbreviated variable name ¹conf.high

Failing to do this can cause trouble:

Code

modelA_unadj_bad <- glm(death ~ swang1, data = rhc, family = binomial())

tidy(modelA_unadj_bad, conf.int = TRUE, exponentiate = TRUE)

# A tibble: 2 × 7
  term         estimate std.error statistic  p.value conf.low conf.high
  <chr>           <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
1 (Intercept)     0.470    0.0459    -16.5  6.32e-61    0.429     0.514
2 swang1No RHC    1.25     0.0576      3.90 9.43e- 5    1.12      1.40

Note that this last version is actually predicting “Death = No” on the basis of “swang1 = No RHC”, which can be extremely confusing.

And if you did this:

Code

modelA_unadj_bad2 <- glm(death ~ swang1 == "RHC", data = rhc, family = binomial())

tidy(modelA_unadj_bad2, conf.int = TRUE, exponentiate = TRUE)

# A tibble: 2 × 7
  term                    estimate std.error statistic  p.value conf.low conf.…¹
  <chr>                      <dbl>     <dbl>     <dbl>    <dbl>    <dbl>   <dbl>
1 "(Intercept)"              0.588    0.0348    -15.3  1.11e-52    0.549   0.629
2 "swang1 == \"RHC\"TRUE"    0.799    0.0576     -3.90 9.43e- 5    0.713   0.894
# … with abbreviated variable name ¹conf.high

Now, you’ve inverted the odds ratio! Not good. So, either use 0 and 1 (with 1 being the result you want to study), or use the factors but specify the direction for both treatment and outcome as a logical statement.

7.2 Outcome B: Length of Stay (quantitative)

Code

hospdays_unadj <- lm(hospdays ~ treat_rhc, data = rhc)

hospdays_unadj


Call:
lm(formula = hospdays ~ treat_rhc, data = rhc)

Coefficients:
(Intercept)    treat_rhc  
     19.529        5.162

Code

tidy(hospdays_unadj, conf.int = TRUE)

# A tibble: 2 × 7
  term        estimate std.error statistic  p.value conf.low conf.high
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
1 (Intercept)    19.5      0.424     46.0  0           18.7      20.4 
2 treat_rhc       5.16     0.687      7.51 6.76e-14     3.81      6.51

7.3 Outcome C: Time to death (time-to-event)

Here’s the data on survdays and death for our first six patients:

Code

rhc %>% select(survdays, death) %>% head()

# A tibble: 6 × 2
  survdays death
     <dbl> <fct>
1      240 No   
2       45 Yes  
3      317 No   
4       37 Yes  
5        2 Yes  
6      201 No

Here, we have a (right-censored) time to event outcome, called survdays, and an indicator of censoring (death which is “Yes” if the patient’s time is real and not censored). So the first subject should have a time to death of 240 or more days, because they were still alive when the study ended, while the second should have a time that is exactly 45 days, because that person died during the study. The survival object we need, then, is:

Code

Surv(rhc$survdays, rhc$death == "Yes") %>% head()

[1] 240+  45  317+  37    2  201+

Code

survdays_unadj <- coxph(Surv(survdays, death == "Yes") ~ treat_rhc, data = rhc)

survdays_unadj

Call:
coxph(formula = Surv(survdays, death == "Yes") ~ treat_rhc, data = rhc)

             coef exp(coef) se(coef)     z      p
treat_rhc 0.07841   1.08156  0.03347 2.342 0.0192

Likelihood ratio test=5.46  on 1 df, p=0.0195
n= 5735, number of events= 3722

Code

tidy(survdays_unadj, conf.int = TRUE, exponentiate = TRUE)

# A tibble: 1 × 7
  term      estimate std.error statistic p.value conf.low conf.high
  <chr>        <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>
1 treat_rhc     1.08    0.0335      2.34  0.0192     1.01      1.15

8 Estimate the Propensity Score with Logistic Regression

Code

propensity_model <- 
    glm(swang1 == "RHC"  ~ 
            age + sex + edu + income + ninsclas + race +
            cat1 + dnr1 + wtkilo1 + hrt1 + meanbp1 + 
            resp1 + temp1 +
            card + gastr + hema + meta + neuro + ortho + 
            renal + resp + seps + trauma +
            amihx + ca + cardiohx + chfhx + chrpulhx +
            dementhx + gibledhx + immunhx + liverhx +
            malighx + psychhx + renalhx + transhx +
            aps1 + das2d3pc + scoma1 + surv2md1 +
            alb1 + bili1 + crea1 + hema1 + paco21 + pafi1 +
            ph1 + pot1 + sod1 + wblc1,
        family = binomial(link = "logit"), data = rhc)

8.1 Store the propensity scores and linear propensity scores

Code

rhc <- rhc %>%
    mutate(ps = fitted(propensity_model),
           linps = propensity_model$linear.predictors)

rhc %>% select(ptid, ps, linps) %>% head()

# A tibble: 6 × 3
  ptid      ps  linps
  <chr>  <dbl>  <dbl>
1 00005 0.351  -0.615
2 00007 0.669   0.704
3 00009 0.634   0.547
4 00010 0.369  -0.537
5 00011 0.446  -0.218
6 00012 0.0553 -2.84

8.2 Plot the Propensity Scores to Verify that they Match Your Expectations

Code

ggplot(rhc, aes(x = ps, fill = swang1)) +
    geom_density(alpha = 0.5) +
    scale_fill_viridis_d(option = "plasma") +
    theme_bw()

Code

ggplot(rhc, aes(x = swang1, y = ps)) +
    geom_violin(aes(fill = swang1)) + 
    geom_boxplot(width = 0.2) +
    scale_fill_viridis_d(option = "plasma") + 
    guides(fill = "none") + 
    coord_flip() +
    theme_bw()

OK. We’re trying to use ps to estimate the probability of having a RHC, and it looks the people who actually had RHC have higher PS, on average, so that’s promising.

9 Checking Rubin’s Rules Prior to Propensity Adjustment

Rubin Rule 1 result should ideally be near 0, and certainly between -50 and +50.

Code

rubin1.unadj <- with(rhc, 
                  abs(100*(mean(linps[swang1=="RHC"]) - 
                               mean(linps[swang1=="No RHC"]))/
                          sd(linps)))
rubin1.unadj

[1] 101.583

Rubin Rule 2 result should ideally be near 1, and certainly between 1/2 and 2.

Code

rubin2.unadj <- with(rhc, 
                  var(linps[swang1 == "RHC"]) / 
                      var(linps[swang1 == "No RHC"]))
rubin2.unadj

[1] 0.7359463

So, we’ve got some work to do. Let’s try matching. In several different ways.

10 Match 1: 1:1 Greedy Matching on the linear PS, without Replacement

Code

X <- rhc$linps
Tr <- as.logical(rhc$swang1 == "RHC")
match1 <- Match(Tr = Tr, X = X, M = 1, 
                estimand = "ATT", replace = FALSE, 
                ties = FALSE)
summary(match1)


Estimate...  0 
SE.........  0 
T-stat.....  NaN 
p.val......  NA 

Original number of observations..............  5735 
Original number of treated obs...............  2184 
Matched number of observations...............  2184 
Matched number of observations  (unweighted).  2184

10.1 Love Plot for Match 1

Code

b1 <- bal.tab(match1, 
              swang1 == "RHC"  ~ 
                  age + sex + edu + income + ninsclas + race +
                  cat1 + dnr1 + wtkilo1 + hrt1 + meanbp1 + 
                  resp1 + temp1 +
                  card + gastr + hema + meta + neuro + ortho + 
                  renal + resp + seps + trauma +
                  amihx + ca + cardiohx + chfhx + chrpulhx +
                  dementhx + gibledhx + immunhx + liverhx +
                  malighx + psychhx + renalhx + transhx +
                  aps1 + das2d3pc + scoma1 + surv2md1 +
                  alb1 + bili1 + crea1 + hema1 + paco21 + pafi1 +
                  ph1 + pot1 + sod1 + wblc1 + ps + linps,
              data=rhc, un = TRUE)

love.plot(b1, threshold = .1, size = 1.5, stars = "raw",
               var.order = "unadjusted",
               title = "Standardized Differences and Match 1") +
    theme_bw()

10.2 Rubin’s Rules 1 and 2 for Match 1

Create a new data frame, containing only the matched sample.

Code

matches_1 <- factor(rep(match1$index.treated, 2))
rhc.matches_1 <- cbind(matches_1, 
          rhc[c(match1$index.control, match1$index.treated),])

Rubin Rule 1 result after matching should ideally be near 0, and certainly between -50 and +50.

Code

rubin1.m1 <- with(rhc.matches_1, 
                  abs(100*(mean(linps[swang1=="RHC"]) - 
                               mean(linps[swang1=="No RHC"]))/
                          sd(linps)))
rubin1.m1

[1] 61.88315

Rubin Rule 2 result after matching should ideally be near 1, and certainly between 1/2 and 2.

Code

rubin2.m1 <- with(rhc.matches_1, 
                  var(linps[swang1 == "RHC"]) / 
                      var(linps[swang1 == "No RHC"]))
rubin2.m1

[1] 1.687288

11 Match 2: 1:2 Greedy Matching on the linear PS, without Replacement

Code

X <- rhc$linps
Tr <- as.logical(rhc$swang1 == "RHC")
match2 <- Match(Tr = Tr, X = X, M = 2, 
                estimand = "ATT", replace = FALSE, 
                ties = FALSE)
summary(match2)


Estimate...  0 
SE.........  0 
T-stat.....  NaN 
p.val......  NA 

Original number of observations..............  5735 
Original number of treated obs...............  2184 
Matched number of observations...............  1775 
Matched number of observations  (unweighted).  3550

11.1 Love Plot for Match 2

Code

b2 <- bal.tab(match2, 
              swang1 == "RHC"  ~ 
                  age + sex + edu + income + ninsclas + race +
                  cat1 + dnr1 + wtkilo1 + hrt1 + meanbp1 + 
                  resp1 + temp1 +
                  card + gastr + hema + meta + neuro + ortho + 
                  renal + resp + seps + trauma +
                  amihx + ca + cardiohx + chfhx + chrpulhx +
                  dementhx + gibledhx + immunhx + liverhx +
                  malighx + psychhx + renalhx + transhx +
                  aps1 + das2d3pc + scoma1 + surv2md1 +
                  alb1 + bili1 + crea1 + hema1 + paco21 + pafi1 +
                  ph1 + pot1 + sod1 + wblc1 + ps + linps,
              data=rhc, un = TRUE)

love.plot(b2, threshold = .1, size = 1.5, stars = "raw",
               var.order = "unadjusted",
               title = "Standardized Differences and Match 2") +
    theme_bw()

11.2 Rubin’s Rules 1 and 2 for Match 2

Create a new data frame, containing only the matched sample.

Code

matches_2 <- factor(rep(match2$index.treated, 2))
rhc.matches_2 <- cbind(matches_2, 
          rhc[c(match2$index.control, match2$index.treated),])

Rubin Rule 1 result after matching should ideally be near 0, and certainly between -50 and +50.

Code

rubin1.m2 <- with(rhc.matches_2, 
                  abs(100*(mean(linps[swang1=="RHC"]) - 
                               mean(linps[swang1=="No RHC"]))/
                          sd(linps)))
rubin1.m2

[1] 101.2148

Rubin Rule 2 result after matching should ideally be near 1, and certainly between 1/2 and 2.

Code

rubin2.m2 <- with(rhc.matches_2, 
                  var(linps[swang1 == "RHC"]) / 
                      var(linps[swang1 == "No RHC"]))
rubin2.m2

[1] 0.7569187

12 Match 3: Genetic Search Matching to do 1:1 Matching without replacement

The GenMatch function finds optimal balance using multivariate matching where a genetic search algorithm determines the weight that each covariate is given.

Please note that I am using way too small values of pop.size (especially) and probably max.generations, too, so that things run quickly. I am also only matching on the propensity score and linear propensity score, rather than on the individual covariates. For a course project, I recommend you try matching on the individual covariates if you can, but this reduction in the paramater values is fine.

For actual publications, follow the recommendations of the GenMatch algorithm, as described in the Matching and genoud package help files. The default values (which may be too small themselves) are 100 for both pop.size and max.generations.

Code

X <- cbind(rhc$linps, rhc$ps)
Tr <- as.logical(rhc$swang1 == "RHC")
genout3 <- GenMatch(Tr = Tr, X = X,
                    estimand = "ATT", M = 1,
                    pop.size = 10, max.generations = 10,
                    wait.generations = 4, verbose = FALSE)



Fri Mar 10 07:39:32 2023
Domains:
 0.000000e+00   <=  X1   <=    1.000000e+03 
 0.000000e+00   <=  X2   <=    1.000000e+03 

Data Type: Floating Point
Operators (code number, name, population) 
    (1) Cloning...........................  0
    (2) Uniform Mutation..................  1
    (3) Boundary Mutation.................  1
    (4) Non-Uniform Mutation..............  1
    (5) Polytope Crossover................  1
    (6) Simple Crossover..................  2
    (7) Whole Non-Uniform Mutation........  1
    (8) Heuristic Crossover...............  2
    (9) Local-Minimum Crossover...........  0

SOFT Maximum Number of Generations: 10
Maximum Nonchanging Generations: 4
Population size       : 10
Convergence Tolerance: 1.000000e-03

Not Using the BFGS Derivative Based Optimizer on the Best Individual Each Generation.
Not Checking Gradients before Stopping.
Using Out of Bounds Individuals.

Maximization Problem.
GENERATION: 0 (initializing the population)
Lexical Fit..... 1.311905e-01  2.248375e-01  1.000000e+00  1.000000e+00  
#unique......... 10, #Total UniqueCount: 10
var 1:
best............ 2.514367e+01
mean............ 3.122722e+02
variance........ 5.498279e+04
var 2:
best............ 6.998688e+02
mean............ 4.798079e+02
variance........ 1.318615e+05

GENERATION: 1
Lexical Fit..... 1.313094e-01  2.252163e-01  1.000000e+00  1.000000e+00  
#unique......... 6, #Total UniqueCount: 16
var 1:
best............ 3.190046e+01
mean............ 4.206356e+01
variance........ 1.654869e+03
var 2:
best............ 5.221507e+02
mean............ 7.154282e+02
variance........ 5.651104e+04

GENERATION: 2
Lexical Fit..... 1.313094e-01  2.252163e-01  1.000000e+00  1.000000e+00  
#unique......... 7, #Total UniqueCount: 23
var 1:
best............ 3.190046e+01
mean............ 6.657013e+01
variance........ 4.258032e+03
var 2:
best............ 5.221507e+02
mean............ 6.198933e+02
variance........ 1.196544e+04

GENERATION: 3
Lexical Fit..... 1.313094e-01  2.252163e-01  1.000000e+00  1.000000e+00  
#unique......... 6, #Total UniqueCount: 29
var 1:
best............ 3.190046e+01
mean............ 9.278428e+01
variance........ 2.429210e+04
var 2:
best............ 5.221507e+02
mean............ 5.796450e+02
variance........ 1.073255e+04

GENERATION: 4
Lexical Fit..... 1.313266e-01  2.252830e-01  1.000000e+00  1.000000e+00  
#unique......... 4, #Total UniqueCount: 33
var 1:
best............ 4.099422e+01
mean............ 3.764262e+01
variance........ 1.006961e+02
var 2:
best............ 5.231225e+02
mean............ 4.814444e+02
variance........ 1.694430e+04

GENERATION: 5
Lexical Fit..... 1.313266e-01  2.252830e-01  1.000000e+00  1.000000e+00  
#unique......... 6, #Total UniqueCount: 39
var 1:
best............ 4.099422e+01
mean............ 3.515675e+01
variance........ 2.348079e+01
var 2:
best............ 5.231225e+02
mean............ 5.324653e+02
variance........ 1.354162e+03

'wait.generations' limit reached.
No significant improvement in 4 generations.

Solution Lexical Fitness Value:
1.313266e-01  2.252830e-01  1.000000e+00  1.000000e+00  

Parameters at the Solution:

 X[ 1] :    4.099422e+01
 X[ 2] :    5.231225e+02

Solution Found Generation 1
Number of Generations Run 5

Fri Mar 10 07:39:40 2023
Total run time : 0 hours 0 minutes and 8 seconds

Code

match3 <- Match(Tr = Tr, X = X, estimand = "ATT", 
                 Weight.matrix = genout3)
summary(match3)


Estimate...  0 
SE.........  0 
T-stat.....  NaN 
p.val......  NA 

Original number of observations..............  5735 
Original number of treated obs...............  2184 
Matched number of observations...............  2184 
Matched number of observations  (unweighted).  2297

12.1 Love Plot for Match 3

Code

b3 <- bal.tab(match3, 
              swang1 == "RHC"  ~ 
                  age + sex + edu + income + ninsclas + race +
                  cat1 + dnr1 + wtkilo1 + hrt1 + meanbp1 + 
                  resp1 + temp1 +
                  card + gastr + hema + meta + neuro + ortho + 
                  renal + resp + seps + trauma +
                  amihx + ca + cardiohx + chfhx + chrpulhx +
                  dementhx + gibledhx + immunhx + liverhx +
                  malighx + psychhx + renalhx + transhx +
                  aps1 + das2d3pc + scoma1 + surv2md1 +
                  alb1 + bili1 + crea1 + hema1 + paco21 + pafi1 +
                  ph1 + pot1 + sod1 + wblc1 + ps + linps,
              data=rhc, un = TRUE)

love.plot(b3, threshold = .1, size = 1.5, stars = "raw",
               var.order = "unadjusted",
               title = "Standardized Differences and Match 3") +
    theme_bw()

12.2 Rubin’s Rules 1 and 2 for Match 3

Create a new data frame, containing only the matched sample.

Code

matches_3 <- factor(rep(match3$index.treated, 2))
rhc.matches_3 <- cbind(matches_3, 
          rhc[c(match3$index.control, match3$index.treated),])

Rubin Rule 1 result after matching should ideally be near 0, and certainly between -50 and +50.

Code

rubin1.m3 <- with(rhc.matches_3, 
                  abs(100*(mean(linps[swang1=="RHC"]) - 
                               mean(linps[swang1=="No RHC"]))/
                          sd(linps)))
rubin1.m3

[1] 0.04837903

Rubin Rule 2 result after matching should ideally be near 1, and certainly between 1/2 and 2.

Code

rubin2.m3 <- with(rhc.matches_3, 
                  var(linps[swang1 == "RHC"]) / 
                      var(linps[swang1 == "No RHC"]))
rubin2.m3

[1] 1.002522

13 Match 4: 1:1 Greedy Matching on the linear PS, WITH Replacement

Code

X <- rhc$linps
Tr <- as.logical(rhc$swang1 == "RHC")
match4 <- Match(Tr = Tr, X = X, M = 1, 
                estimand = "ATT", replace = TRUE, 
                ties = FALSE)
summary(match4)


Estimate...  0 
SE.........  0 
T-stat.....  NaN 
p.val......  NA 

Original number of observations..............  5735 
Original number of treated obs...............  2184 
Matched number of observations...............  2184 
Matched number of observations  (unweighted).  2184

13.1 Love Plot for Match 4

Code

b4 <- bal.tab(match4, 
              swang1 == "RHC"  ~ 
                  age + sex + edu + income + ninsclas + race +
                  cat1 + dnr1 + wtkilo1 + hrt1 + meanbp1 + 
                  resp1 + temp1 +
                  card + gastr + hema + meta + neuro + ortho + 
                  renal + resp + seps + trauma +
                  amihx + ca + cardiohx + chfhx + chrpulhx +
                  dementhx + gibledhx + immunhx + liverhx +
                  malighx + psychhx + renalhx + transhx +
                  aps1 + das2d3pc + scoma1 + surv2md1 +
                  alb1 + bili1 + crea1 + hema1 + paco21 + pafi1 +
                  ph1 + pot1 + sod1 + wblc1 + ps + linps,
              data=rhc, un = TRUE)

love.plot(b4, threshold = .1, size = 1.5, stars = "raw",
               var.order = "unadjusted",
               title = "Standardized Differences and Match 4") +
    theme_bw()

13.2 Rubin’s Rules 1 and 2 for Match 4

Create a new data frame, containing only the matched sample.

Code

matches_4 <- factor(rep(match4$index.treated, 2))
rhc.matches_4 <- cbind(matches_4, 
          rhc[c(match4$index.control, match4$index.treated),])

Rubin Rule 1 result after matching should ideally be near 0, and certainly between -50 and +50.

Code

rubin1.m4 <- with(rhc.matches_4, 
                  abs(100*(mean(linps[swang1=="RHC"]) - 
                               mean(linps[swang1=="No RHC"]))/
                          sd(linps)))
rubin1.m4

[1] 0.05521448

Rubin Rule 2 result after matching should ideally be near 1, and certainly between 1/2 and 2.

Code

rubin2.m4 <- with(rhc.matches_4, 
                  var(linps[swang1 == "RHC"]) / 
                      var(linps[swang1 == "No RHC"]))
rubin2.m4

[1] 1.002868

14 Match 5: 1:1 Caliper Matching on the linear PS, WITHOUT Replacement

Code

X <- rhc$linps
Tr <- as.logical(rhc$swang1 == "RHC")
match5 <- Match(Tr = Tr, X = X, M = 1, 
                caliper = 0.2,
                estimand = "ATT", replace = FALSE, 
                ties = FALSE)
summary(match5)


Estimate...  0 
SE.........  0 
T-stat.....  NaN 
p.val......  NA 

Original number of observations..............  5735 
Original number of treated obs...............  2184 
Matched number of observations...............  1563 
Matched number of observations  (unweighted).  1563 

Caliper (SDs)........................................   0.2 
Number of obs dropped by 'exact' or 'caliper'  621

14.1 Love Plot for Match 5

Code

b5 <- bal.tab(match5, 
              swang1 == "RHC"  ~ 
                  age + sex + edu + income + ninsclas + race +
                  cat1 + dnr1 + wtkilo1 + hrt1 + meanbp1 + 
                  resp1 + temp1 +
                  card + gastr + hema + meta + neuro + ortho + 
                  renal + resp + seps + trauma +
                  amihx + ca + cardiohx + chfhx + chrpulhx +
                  dementhx + gibledhx + immunhx + liverhx +
                  malighx + psychhx + renalhx + transhx +
                  aps1 + das2d3pc + scoma1 + surv2md1 +
                  alb1 + bili1 + crea1 + hema1 + paco21 + pafi1 +
                  ph1 + pot1 + sod1 + wblc1 + ps + linps,
              data=rhc, un = TRUE)

love.plot(b5, threshold = .1, size = 1.5, stars = "raw",
               var.order = "unadjusted",
               title = "Standardized Differences and Match 5") +
    theme_bw()

14.2 Rubin’s Rules 1 and 2 for Match 5

Create a new data frame, containing only the matched sample.

Code

matches_5 <- factor(rep(match5$index.treated, 2))
rhc.matches_5 <- cbind(matches_5, 
          rhc[c(match5$index.control, match5$index.treated),])

Rubin Rule 1 result after matching should ideally be near 0, and certainly between -50 and +50.

Code

rubin1.m5 <- with(rhc.matches_5, 
                  abs(100*(mean(linps[swang1=="RHC"]) - 
                               mean(linps[swang1=="No RHC"]))/
                          sd(linps)))
rubin1.m5

[1] 1.697439

Rubin Rule 2 result after matching should ideally be near 1, and certainly between 1/2 and 2.

Code

rubin2.m5 <- with(rhc.matches_5, 
                  var(linps[swang1 == "RHC"]) / 
                      var(linps[swang1 == "No RHC"]))
rubin2.m5

[1] 1.027527

15 Match 6: 1:2 Greedy Matching on the linear PS, WITH Replacement

Code

X <- rhc$linps
Tr <- as.logical(rhc$swang1 == "RHC")
match6 <- Match(Tr = Tr, X = X, M = 2, 
                estimand = "ATT", replace = TRUE, 
                ties = FALSE)
summary(match6)


Estimate...  0 
SE.........  0 
T-stat.....  NaN 
p.val......  NA 

Original number of observations..............  5735 
Original number of treated obs...............  2184 
Matched number of observations...............  2184 
Matched number of observations  (unweighted).  4368

15.1 Love Plot for Match 6

Code

b6 <- bal.tab(match6, 
              swang1 == "RHC"  ~ 
                  age + sex + edu + income + ninsclas + race +
                  cat1 + dnr1 + wtkilo1 + hrt1 + meanbp1 + 
                  resp1 + temp1 +
                  card + gastr + hema + meta + neuro + ortho + 
                  renal + resp + seps + trauma +
                  amihx + ca + cardiohx + chfhx + chrpulhx +
                  dementhx + gibledhx + immunhx + liverhx +
                  malighx + psychhx + renalhx + transhx +
                  aps1 + das2d3pc + scoma1 + surv2md1 +
                  alb1 + bili1 + crea1 + hema1 + paco21 + pafi1 +
                  ph1 + pot1 + sod1 + wblc1 + ps + linps,
              data=rhc, un = TRUE)

love.plot(b6, threshold = .1, size = 1.5, stars = "raw",
               var.order = "unadjusted",
               title = "Standardized Differences and Match 6") +
    theme_bw()

15.2 Rubin’s Rules 1 and 2 for Match 6

Create a new data frame, containing only the matched sample.

Code

matches_6 <- factor(rep(match6$index.treated, 2))
rhc.matches_6 <- cbind(matches_6, 
          rhc[c(match6$index.control, match6$index.treated),])

Rubin Rule 1 result after matching should ideally be near 0, and certainly between -50 and +50.

Code

rubin1.m6 <- with(rhc.matches_6, 
                  abs(100*(mean(linps[swang1=="RHC"]) - 
                               mean(linps[swang1=="No RHC"]))/
                          sd(linps)))
rubin1.m6

[1] 0.1186399

Rubin Rule 2 result after matching should ideally be near 1, and certainly between 1/2 and 2.

Code

rubin2.m6 <- with(rhc.matches_6, 
                  var(linps[swang1 == "RHC"]) / 
                      var(linps[swang1 == "No RHC"]))
rubin2.m6

[1] 1.005432

16 Ranking the Matches

Match	Rubin 1	Rubin 2	Love Plot	Matched Sets	Description
None	101.6	0.74	–	–	No Matching
1	61.9	1.69	Pretty Bad	2,184	1:1 greedy w/o repl
2	101.2	0.76	Dreadful	1,775	1:2 greedy w/o repl
3	0	1	Very Good	2,184	Genetic Search 1:1 w/o repl
4	0.1	1	Very Good	2,184	1:1 greedy with repl
5	1.7	1.03	Very Good	1,562	1:1 caliper without repl
6	0.1	1.01	Very Good	2,184	1:2 greedy with repl

17 ATT Matching Estimates for the Binary Outcome, Death

17.1 With Match 1, in detail

Note that the variable which identifies the matches (called matches_1) is set up as a factor, as it needs to be, and that I’m using the 1/0 versions of both the outcome (so died rather than death) and treatment (so treat_rhc rather than swang1).

Code

death_adj_m1 <- clogit(died ~ treat_rhc + strata(matches_1), data = rhc.matches_1)

summary(death_adj_m1)

Call:
coxph(formula = Surv(rep(1, 4368L), died) ~ treat_rhc + strata(matches_1), 
    data = rhc.matches_1, method = "exact")

  n= 4368, number of events= 2861 

             coef exp(coef) se(coef)     z Pr(>|z|)    
treat_rhc 0.23156   1.26056  0.06488 3.569 0.000358 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

          exp(coef) exp(-coef) lower .95 upper .95
treat_rhc     1.261     0.7933      1.11     1.432

Concordance= 0.558  (se = 0.023 )
Likelihood ratio test= 12.82  on 1 df,   p=3e-04
Wald test            = 12.74  on 1 df,   p=4e-04
Score (logrank) test = 12.79  on 1 df,   p=3e-04

The odds ratio in the exp(coef) section above is the average causal effect estimate. It describes the odds of having an event (died) occur associated with being a RHC subject (as identified by treat_rhc), as compared to the odds of that event when a non-RHC subject. We can tidy this with:

Code

tidy(death_adj_m1, exponentiate = TRUE, conf.level = 0.95)

# A tibble: 1 × 5
  term      estimate std.error statistic  p.value
  <chr>        <dbl>     <dbl>     <dbl>    <dbl>
1 treat_rhc     1.26    0.0649      3.57 0.000358

17.2 With Matches 2-6, in much less detail

Code

death_adj_m2 <- clogit(died ~ treat_rhc + strata(matches_2), data = rhc.matches_2)
tidy(death_adj_m2, exponentiate = TRUE, conf.level = 0.95)

# A tibble: 1 × 5
  term      estimate std.error statistic     p.value
  <chr>        <dbl>     <dbl>     <dbl>       <dbl>
1 treat_rhc     1.33    0.0548      5.27 0.000000137

Code

death_adj_m3 <- clogit(died ~ treat_rhc + strata(matches_3), data = rhc.matches_3)
tidy(death_adj_m3, exponentiate = TRUE, conf.level = 0.95)

# A tibble: 1 × 5
  term      estimate std.error statistic    p.value
  <chr>        <dbl>     <dbl>     <dbl>      <dbl>
1 treat_rhc     1.34    0.0630      4.60 0.00000418

Code

death_adj_m4 <- clogit(died ~ treat_rhc + strata(matches_4), data = rhc.matches_4)
tidy(death_adj_m4, exponentiate = TRUE, conf.level = 0.95)

# A tibble: 1 × 5
  term      estimate std.error statistic p.value
  <chr>        <dbl>     <dbl>     <dbl>   <dbl>
1 treat_rhc     1.23    0.0636      3.22 0.00127

Code

death_adj_m5 <- clogit(died ~ treat_rhc + strata(matches_5), data = rhc.matches_5)
tidy(death_adj_m5, exponentiate = TRUE, conf.level = 0.95)

# A tibble: 1 × 5
  term      estimate std.error statistic  p.value
  <chr>        <dbl>     <dbl>     <dbl>    <dbl>
1 treat_rhc     1.33    0.0754      3.76 0.000170

Code

death_adj_m6 <- clogit(died ~ treat_rhc + strata(matches_6), data = rhc.matches_6)
tidy(death_adj_m6, exponentiate = TRUE, conf.level = 0.95)

# A tibble: 1 × 5
  term      estimate std.error statistic  p.value
  <chr>        <dbl>     <dbl>     <dbl>    <dbl>
1 treat_rhc     1.36    0.0491      6.25 4.18e-10

17.3 Graphing the Results Across Multiple Matching Strategies

Code

temp0 <- tidy(death_unadj, conf.int = TRUE, exponentiate = TRUE) %>% 
    filter(term == "treat_rhc")
temp1 <- tidy(death_adj_m1, exponentiate = TRUE, conf.int = T, conf.level = 0.95)
temp2 <- tidy(death_adj_m2, exponentiate = TRUE, conf.int = T, conf.level = 0.95)
temp3 <- tidy(death_adj_m3, exponentiate = TRUE, conf.int = T, conf.level = 0.95)
temp4 <- tidy(death_adj_m4, exponentiate = TRUE, conf.int = T, conf.level = 0.95)
temp5 <- tidy(death_adj_m5, exponentiate = TRUE, conf.int = T, conf.level = 0.95)
temp6 <- tidy(death_adj_m6, exponentiate = TRUE, conf.int = T, conf.level = 0.95)

death_match_results <- rbind(temp0, temp1, temp2, temp3, temp4, temp5, temp6)

death_match_results <- death_match_results %>%
    mutate(approach = c("Unmatched", "Match 1", "Match 2", "Match 3", "Match 4", "Match 5", "Match 6")) %>%
    mutate(description = c("No Matching", "1:1 greedy matching without replacement", 
                           "1:2 greedy matching without replacement",
                           "1:1 genetic search matching without replacement",
                           "1:1 greedy matching with replacement",
                           "1:1 caliper matching on the linear PS without replacement",
                           "1:2 greedy matching with replacement"))

death_match_results

# A tibble: 7 × 9
  term      estimate std.error statis…¹  p.value conf.…² conf.…³ appro…⁴ descr…⁵
  <chr>        <dbl>     <dbl>    <dbl>    <dbl>   <dbl>   <dbl> <chr>   <chr>  
1 treat_rhc     1.25    0.0576     3.90 9.43e- 5    1.12    1.40 Unmatc… No Mat…
2 treat_rhc     1.26    0.0649     3.57 3.58e- 4    1.11    1.43 Match 1 1:1 gr…
3 treat_rhc     1.33    0.0548     5.27 1.37e- 7    1.20    1.49 Match 2 1:2 gr…
4 treat_rhc     1.34    0.0630     4.60 4.18e- 6    1.18    1.51 Match 3 1:1 ge…
5 treat_rhc     1.23    0.0636     3.22 1.27e- 3    1.08    1.39 Match 4 1:1 gr…
6 treat_rhc     1.33    0.0754     3.76 1.70e- 4    1.15    1.54 Match 5 1:1 ca…
7 treat_rhc     1.36    0.0491     6.25 4.18e-10    1.23    1.50 Match 6 1:2 gr…
# … with abbreviated variable names ¹statistic, ²conf.low, ³conf.high,
#   ⁴approach, ⁵description

Code

ggplot(death_match_results, aes(x = approach, y = estimate, ymin = conf.low, ymax = conf.high)) +
    geom_text(aes(label = round_half_up(estimate,2)), vjust = -1.25) +
    geom_pointrange() +
    geom_hline(yintercept = 1, col = "red", linetype = "dashed") +
    theme_bw() +
    coord_flip() +
    labs(title = "Comparing ATT Estimates for Death using RHC Propensity Matching",
         x = "Propensity Adjustment Approach", 
         y = "Estimated Odds Ratio for Death associated with RHC")

Let’s look at just the four strategies that achieved stronger balance.

Code

death_match_results %>%
    filter(approach %in% c("Match 3", "Match 4", "Match 5", "Match 6")) %>%
ggplot(., aes(x = description, y = estimate, ymin = conf.low, ymax = conf.high)) +
    geom_text(aes(label = round_half_up(estimate,2)), vjust = -1.25) +
    geom_pointrange() +
    geom_hline(yintercept = 1, col = "red", linetype = "dashed") +
    theme_bw() +
    coord_flip() +
    labs(title = "ATT Matching Estimates in RHC (Death)",
         x = "Propensity Adjustment Approach", 
         y = "Estimated Odds Ratio for Death associated with RHC")

18 ATT Matching Estimates for the Quantitative Outcome, Hospital Length of Stay

18.1 With Match 1, in detail

Again, the variable which identifies the matches (called matches_1) is set up as a factor, as it needs to be, and that I’m using the 1/0 version of treatment (so treat_rhc rather than swang1).

Our result for this quantitative outcome (hospdays) comes from a mixed model where the matches are treated as a random factor, but the treatment group is treated as a fixed factor, using the lme4 package.

Code

hospdays_adj_m1 <- lmer(hospdays ~ treat_rhc + (1 | matches_1),
                        data = rhc.matches_1)

summary(hospdays_adj_m1); confint(hospdays_adj_m1)

Linear mixed model fit by REML ['lmerMod']
Formula: hospdays ~ treat_rhc + (1 | matches_1)
   Data: rhc.matches_1

REML criterion at convergence: 40934.5

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-0.9292 -0.5540 -0.3137  0.1747 11.9768 

Random effects:
 Groups    Name        Variance Std.Dev.
 matches_1 (Intercept)  14.5     3.809  
 Residual              674.0    25.962  
Number of obs: 4368, groups:  matches_1, 2184

Fixed effects:
            Estimate Std. Error t value
(Intercept)  20.8255     0.5615  37.090
treat_rhc     3.8658     0.7857   4.921

Correlation of Fixed Effects:
          (Intr)
treat_rhc -0.700

Computing profile confidence intervals ...

                2.5 %    97.5 %
.sig01       0.000000  6.594183
.sigma      25.205111 26.717184
(Intercept) 19.725069 21.926030
treat_rhc    2.325669  5.406016

The estimated effect of treat_rhc in the fixed effects section, combined with the 95% confidence intervals material, provides the average causal effect estimate. It describes the change in the outcome hospdays associated with being a RHC subject (as identified by treat_rhc), as compared to being a non-RHC subject. We can tidy this with:

Code

broom.mixed::tidy(hospdays_adj_m1, conf.int = TRUE, conf.level = 0.95)

# A tibble: 4 × 8
  effect   group     term            estimate std.error statis…¹ conf.…² conf.…³
  <chr>    <chr>     <chr>              <dbl>     <dbl>    <dbl>   <dbl>   <dbl>
1 fixed    <NA>      (Intercept)        20.8      0.561    37.1    19.7    21.9 
2 fixed    <NA>      treat_rhc           3.87     0.786     4.92    2.33    5.41
3 ran_pars matches_1 sd__(Intercept)     3.81    NA        NA      NA      NA   
4 ran_pars Residual  sd__Observation    26.0     NA        NA      NA      NA   
# … with abbreviated variable names ¹statistic, ²conf.low, ³conf.high

or more specifically, we can look at just the treatment effect with:

Code

broom.mixed::tidy(hospdays_adj_m1, conf.int = TRUE, conf.level = 0.95) %>% 
    filter(term == "treat_rhc")

# A tibble: 1 × 8
  effect group term      estimate std.error statistic conf.low conf.high
  <chr>  <chr> <chr>        <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
1 fixed  <NA>  treat_rhc     3.87     0.786      4.92     2.33      5.41

18.2 With Matches 2-6, in much less detail

Code

hospdays_adj_m2 <- lmer(hospdays ~ treat_rhc + (1 | matches_2),
                        data = rhc.matches_2)
broom.mixed::tidy(hospdays_adj_m2, conf.int = TRUE, conf.level = 0.95) %>% 
    filter(term == "treat_rhc")

# A tibble: 1 × 8
  effect group term      estimate std.error statistic conf.low conf.high
  <chr>  <chr> <chr>        <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
1 fixed  <NA>  treat_rhc     4.98     0.545      9.14     3.91      6.05

Code

hospdays_adj_m3 <- lmer(hospdays ~ treat_rhc + (1 | matches_3),
                        data = rhc.matches_3)
broom.mixed::tidy(hospdays_adj_m3, conf.int = TRUE, conf.level = 0.95) %>% 
    filter(term == "treat_rhc")

# A tibble: 1 × 8
  effect group term      estimate std.error statistic conf.low conf.high
  <chr>  <chr> <chr>        <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
1 fixed  <NA>  treat_rhc     1.40     0.824      1.70   -0.216      3.01

Code

hospdays_adj_m4 <- lmer(hospdays ~ treat_rhc + (1 | matches_4),
                        data = rhc.matches_4)

boundary (singular) fit: see help('isSingular')

Code

broom.mixed::tidy(hospdays_adj_m4, conf.int = TRUE, conf.level = 0.95) %>% 
    filter(term == "treat_rhc")

# A tibble: 1 × 8
  effect group term      estimate std.error statistic conf.low conf.high
  <chr>  <chr> <chr>        <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
1 fixed  <NA>  treat_rhc     1.27     0.859      1.48   -0.411      2.96

Code

hospdays_adj_m5 <- lmer(hospdays ~ treat_rhc + (1 | matches_5),
                        data = rhc.matches_5)

boundary (singular) fit: see help('isSingular')

Code

broom.mixed::tidy(hospdays_adj_m5, conf.int = TRUE, conf.level = 0.95) %>% 
    filter(term == "treat_rhc")

# A tibble: 1 × 8
  effect group term      estimate std.error statistic conf.low conf.high
  <chr>  <chr> <chr>        <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
1 fixed  <NA>  treat_rhc     2.30     0.919      2.50    0.496      4.10

Code

hospdays_adj_m6 <- lmer(hospdays ~ treat_rhc + (1 | matches_6),
                        data = rhc.matches_6)
broom.mixed::tidy(hospdays_adj_m6, conf.int = TRUE, conf.level = 0.95) %>% 
    filter(term == "treat_rhc")

# A tibble: 1 × 8
  effect group term      estimate std.error statistic conf.low conf.high
  <chr>  <chr> <chr>        <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
1 fixed  <NA>  treat_rhc     1.22     0.556      2.18    0.125      2.31

18.3 Graphing the Results Across Multiple Matching Strategies

Code

temp0 <- tidy(hospdays_unadj, conf.int = TRUE) %>% 
    filter(term == "treat_rhc") %>% 
    select(term, estimate, std.error, conf.low, conf.high)

temp1 <- tidy(hospdays_adj_m1, conf.int = TRUE, conf.level = 0.95) %>% 
    filter(term == "treat_rhc") %>% 
    select(term, estimate, std.error, conf.low, conf.high)

temp2 <- tidy(hospdays_adj_m2, conf.int = TRUE, conf.level = 0.95) %>% 
    filter(term == "treat_rhc") %>% 
    select(term, estimate, std.error, conf.low, conf.high)

temp3 <- tidy(hospdays_adj_m3, conf.int = TRUE, conf.level = 0.95) %>% 
    filter(term == "treat_rhc") %>% 
    select(term, estimate, std.error, conf.low, conf.high)

temp4 <- tidy(hospdays_adj_m4, conf.int = TRUE, conf.level = 0.95) %>% 
    filter(term == "treat_rhc") %>% 
    select(term, estimate, std.error, conf.low, conf.high)

temp5 <- tidy(hospdays_adj_m5, conf.int = TRUE, conf.level = 0.95) %>% 
    filter(term == "treat_rhc") %>% 
    select(term, estimate, std.error, conf.low, conf.high)

temp6 <- tidy(hospdays_adj_m6, conf.int = TRUE, conf.level = 0.95) %>% 
    filter(term == "treat_rhc") %>% 
    select(term, estimate, std.error, conf.low, conf.high)

hospdays_match_results <- rbind(temp0, temp1, temp2, temp3, temp4, temp5, temp6)

hospdays_match_results <- hospdays_match_results %>%
    mutate(approach = c("Unmatched", "Match 1", "Match 2", "Match 3", "Match 4", "Match 5", "Match 6")) %>%
    mutate(description = c("No Matching", "1:1 greedy matching without replacement", 
                           "1:2 greedy matching without replacement",
                           "1:1 genetic search matching without replacement",
                           "1:1 greedy matching with replacement",
                           "1:1 caliper matching on the linear PS without replacement",
                           "1:2 greedy matching with replacement"))

hospdays_match_results

# A tibble: 7 × 7
  term      estimate std.error conf.low conf.high approach  description         
  <chr>        <dbl>     <dbl>    <dbl>     <dbl> <chr>     <chr>               
1 treat_rhc     5.16     0.687    3.81       6.51 Unmatched No Matching         
2 treat_rhc     3.87     0.786    2.33       5.41 Match 1   1:1 greedy matching…
3 treat_rhc     4.98     0.545    3.91       6.05 Match 2   1:2 greedy matching…
4 treat_rhc     1.40     0.824   -0.216      3.01 Match 3   1:1 genetic search …
5 treat_rhc     1.27     0.859   -0.411      2.96 Match 4   1:1 greedy matching…
6 treat_rhc     2.30     0.919    0.496      4.10 Match 5   1:1 caliper matchin…
7 treat_rhc     1.22     0.556    0.125      2.31 Match 6   1:2 greedy matching…

Code

ggplot(hospdays_match_results, aes(x = approach, y = estimate, ymin = conf.low, ymax = conf.high)) +
    geom_text(aes(label = round_half_up(estimate,2)), vjust = -1.25) +
    geom_pointrange() +
    geom_hline(yintercept = 0, col = "red", linetype = "dashed") +
    theme_bw() +
    coord_flip() +
    labs(title = "Comparing ATT Estimates for LOS using RHC Propensity Matching",
         x = "Propensity Adjustment Approach", 
         y = "Estimated Change in Hospital Length of Stay associated with RHC")

Let’s look at just the four strategies that achieved stronger balance.

Code

hospdays_match_results %>%
    filter(approach %in% c("Match 3", "Match 4", "Match 5", "Match 6")) %>%
ggplot(., aes(x = description, y = estimate, ymin = conf.low, ymax = conf.high)) +
    geom_text(aes(label = round_half_up(estimate,2)), vjust = -1.25) +
    geom_pointrange() +
    geom_hline(yintercept = 0, col = "red", linetype = "dashed") +
    theme_bw() +
    coord_flip() +
    labs(title = "ATT Matching Estimates in RHC (LOS)",
         x = "Propensity Adjustment Approach", 
         y = "Est. Change in Hospital LOS associated with RHC")

19 ATT Matching Estimates for the Time-to-Event Outcome, In-Study Survival

We’ll use a stratified Cox proportional hazards model to compare the RHC/No RHC groups on our time-to-event outcome survdays, while accounting for the matched pairs. The main result will be a relative hazard rate estimate, with 95% CI. I will use the 0/1 numeric versions of the event indicator (died), and of the treatment indicator (treat_rhc).

19.1 With Match 1, in detail

Code

survdays_adj_m1 <- coxph(Surv(survdays, died) ~ treat_rhc + strata(matches_1), data=rhc.matches_1)
summary(survdays_adj_m1)

Call:
coxph(formula = Surv(survdays, died) ~ treat_rhc + strata(matches_1), 
    data = rhc.matches_1)

  n= 4368, number of events= 2861 

             coef exp(coef) se(coef)     z Pr(>|z|)
treat_rhc 0.04768   1.04883  0.04554 1.047    0.295

          exp(coef) exp(-coef) lower .95 upper .95
treat_rhc     1.049     0.9534    0.9593     1.147

Concordance= 0.512  (se = 0.016 )
Likelihood ratio test= 1.1  on 1 df,   p=0.3
Wald test            = 1.1  on 1 df,   p=0.3
Score (logrank) test = 1.1  on 1 df,   p=0.3

The hazard ratio in the exp(coef) section is the average causal effect estimate. It describes the hazard for having an event (died) occur associated with being a RHC subject (as identified by treat_rhc), as compared to the hazard of that event when a non-RHC subject. We can tidy this with:

Code

tidy(survdays_adj_m1, exponentiate = TRUE, conf.int = TRUE, conf.level = 0.95)

# A tibble: 1 × 7
  term      estimate std.error statistic p.value conf.low conf.high
  <chr>        <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>
1 treat_rhc     1.05    0.0455      1.05   0.295    0.959      1.15

19.2 With Matches 2-6, in much less detail

Code

survdays_adj_m2 <- coxph(Surv(survdays, died) ~ treat_rhc + strata(matches_2), data=rhc.matches_2)
tidy(survdays_adj_m2, exponentiate = TRUE, conf.int = TRUE, conf.level = 0.95)

# A tibble: 1 × 7
  term      estimate std.error statistic  p.value conf.low conf.high
  <chr>        <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
1 treat_rhc     1.27    0.0343      7.02 2.17e-12     1.19      1.36

Code

survdays_adj_m3 <- coxph(Surv(survdays, died) ~ treat_rhc + strata(matches_3), data=rhc.matches_3)
tidy(survdays_adj_m3, exponentiate = TRUE, conf.int = TRUE, conf.level = 0.95)

# A tibble: 1 × 7
  term      estimate std.error statistic p.value conf.low conf.high
  <chr>        <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>
1 treat_rhc     1.11    0.0441      2.44  0.0145     1.02      1.21

Code

survdays_adj_m4 <- coxph(Surv(survdays, died) ~ treat_rhc + strata(matches_4), data=rhc.matches_4)
tidy(survdays_adj_m4, exponentiate = TRUE, conf.int = TRUE, conf.level = 0.95)

# A tibble: 1 × 7
  term      estimate std.error statistic p.value conf.low conf.high
  <chr>        <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>
1 treat_rhc     1.06    0.0452      1.20   0.231    0.966      1.15

Code

survdays_adj_m5 <- coxph(Surv(survdays, died) ~ treat_rhc + strata(matches_5), data=rhc.matches_5)
tidy(survdays_adj_m5, exponentiate = TRUE, conf.int = TRUE, conf.level = 0.95)

# A tibble: 1 × 7
  term      estimate std.error statistic p.value conf.low conf.high
  <chr>        <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>
1 treat_rhc     1.12    0.0535      2.05  0.0399     1.01      1.24

Code

survdays_adj_m6 <- coxph(Surv(survdays, died) ~ treat_rhc + strata(matches_6), data=rhc.matches_6)
tidy(survdays_adj_m6, exponentiate = TRUE, conf.int = TRUE, conf.level = 0.95)

# A tibble: 1 × 7
  term      estimate std.error statistic  p.value conf.low conf.high
  <chr>        <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
1 treat_rhc     1.36    0.0313      9.89 4.68e-23     1.28      1.45

19.3 Graphing the Results Across Multiple Matching Strategies

Code

temp0 <- tidy(survdays_unadj, conf.int = TRUE, exponentiate = TRUE)

temp1 <- tidy(survdays_adj_m1, exponentiate = TRUE, conf.int = TRUE, conf.level = 0.95)

temp2 <- tidy(survdays_adj_m2, exponentiate = TRUE, conf.int = TRUE, conf.level = 0.95)

temp3 <- tidy(survdays_adj_m3, exponentiate = TRUE, conf.int = TRUE, conf.level = 0.95)

temp4 <- tidy(survdays_adj_m4, exponentiate = TRUE, conf.int = TRUE, conf.level = 0.95)

temp5 <- tidy(survdays_adj_m5, exponentiate = TRUE, conf.int = TRUE, conf.level = 0.95)

temp6 <- tidy(survdays_adj_m6, exponentiate = TRUE, conf.int = TRUE, conf.level = 0.95)


survdays_match_results <- rbind(temp0, temp1, temp2, temp3, temp4, temp5, temp6)

survdays_match_results <- survdays_match_results %>%
    mutate(approach = c("Unmatched", "Match 1", "Match 2", "Match 3", "Match 4", "Match 5", "Match 6")) %>%
    mutate(description = c("No Matching", "1:1 greedy matching without replacement", 
                           "1:2 greedy matching without replacement",
                           "1:1 genetic search matching without replacement",
                           "1:1 greedy matching with replacement",
                           "1:1 caliper matching on the linear PS without replacement",
                           "1:2 greedy matching with replacement"))

survdays_match_results

# A tibble: 7 × 9
  term      estimate std.error statis…¹  p.value conf.…² conf.…³ appro…⁴ descr…⁵
  <chr>        <dbl>     <dbl>    <dbl>    <dbl>   <dbl>   <dbl> <chr>   <chr>  
1 treat_rhc     1.08    0.0335     2.34 1.92e- 2   1.01     1.15 Unmatc… No Mat…
2 treat_rhc     1.05    0.0455     1.05 2.95e- 1   0.959    1.15 Match 1 1:1 gr…
3 treat_rhc     1.27    0.0343     7.02 2.17e-12   1.19     1.36 Match 2 1:2 gr…
4 treat_rhc     1.11    0.0441     2.44 1.45e- 2   1.02     1.21 Match 3 1:1 ge…
5 treat_rhc     1.06    0.0452     1.20 2.31e- 1   0.966    1.15 Match 4 1:1 gr…
6 treat_rhc     1.12    0.0535     2.05 3.99e- 2   1.01     1.24 Match 5 1:1 ca…
7 treat_rhc     1.36    0.0313     9.89 4.68e-23   1.28     1.45 Match 6 1:2 gr…
# … with abbreviated variable names ¹statistic, ²conf.low, ³conf.high,
#   ⁴approach, ⁵description

Code

ggplot(survdays_match_results, aes(x = approach, y = estimate, ymin = conf.low, ymax = conf.high)) +
    geom_text(aes(label = round_half_up(estimate,2)), vjust = -1.25) +
    geom_pointrange() +
    geom_hline(yintercept = 1, col = "red", linetype = "dashed") +
    theme_bw() +
    coord_flip() +
    labs(title = "Comparing ATT Estimates for Survival using RHC Propensity Matching",
         x = "Propensity Adjustment Approach", 
         y = "Estimated Hazard Ratio associated with RHC")

Let’s look at just the four strategies that achieved stronger balance.

Code

survdays_match_results %>%
    filter(approach %in% c("Match 3", "Match 4", "Match 5", "Match 6")) %>%
ggplot(., aes(x = description, y = estimate, ymin = conf.low, ymax = conf.high)) +
    geom_text(aes(label = round_half_up(estimate,2)), vjust = -1.25) +
    geom_pointrange() +
    geom_hline(yintercept = 1, col = "red", linetype = "dashed") +
    theme_bw() +
    coord_flip() +
    labs(title = "ATT Matching Estimates in RHC (Survival)",
         x = "Propensity Adjustment Approach", 
         y = "Estimated Hazard Ratio associated with RHC")

20 Propensity Score Weighting (ATT approach)

20.1 Estimate the Propensity Score using Generalized Boosted Regression, and then perfom ATT Weighting

To begin, we’ll re-estimate the propensity score using the twang function ps. This uses a generalized boosted regression approach to estimate the propensity score and produce material for checking balance.

Code

# Recall that twang does not play well with tibbles,
# so we have to use the data frame version of rhc object
# and I'm using the 1/0 version of the treatment

rhc_df <- base::as.data.frame(rhc)

ps_rhc <- ps(treat_rhc ~ 
                 age + sex + edu + income + ninsclas + 
                 race + cat1 + dnr1 + wtkilo1 + hrt1 + 
                 meanbp1 + resp1 + temp1 + card + gastr +
                 hema + meta + neuro + ortho + renal + 
                 resp + seps + trauma + amihx + ca + 
                 cardiohx + chfhx + chrpulhx + dementhx + 
                 gibledhx + immunhx + liverhx + malighx + 
                 psychhx + renalhx + transhx + aps1 + 
                 das2d3pc + scoma1 + surv2md1 + alb1 + 
                 bili1 + crea1 + hema1 + paco21 + pafi1 +
                 ph1 + pot1 + sod1 + wblc1,
             data = rhc_df,
             n.trees = 4000,
             interaction.depth = 2,
             stop.method = c("es.mean"),
             estimand = "ATT",
             verbose = FALSE)

20.1.1 Did we let the simulations run long enough to stabilize estimates?

Code

plot(ps_rhc)

20.1.2 What is the effective sample size of our weighted results?

Code

summary(ps_rhc)

            n.treat n.ctrl ess.treat ess.ctrl     max.es    mean.es     max.ks
unw            2184   3551      2184 3551.000 0.53367192 0.14968341 0.21274111
es.mean.ATT    2184   3551      2184 1250.501 0.09123789 0.03520429 0.04966016
            max.ks.p    mean.ks iter
unw               NA 0.05831639   NA
es.mean.ATT       NA 0.01540652 3999

20.1.3 How is the balance?

Code

plot(ps_rhc, plots = 2)

Code

plot(ps_rhc, plots = 3)

20.1.4 Assessing Balance with `cobalt`

Code

b2 <- bal.tab(ps_rhc, full.stop.method = "es.mean.att", 
        stats = c("m", "v"), un = TRUE)

b2

Balance Measures
                                 Type Diff.Un V.Ratio.Un Diff.Adj V.Ratio.Adj
prop.score                   Distance  1.5554     1.2083   0.5108      0.9058
age                           Contin. -0.0647     0.8175   0.0027      0.9662
sex_Female                     Binary -0.0462          .   0.0015           .
edu                           Contin.  0.0910     1.0147   0.0149      1.0020
income_Under $11k              Binary -0.0618          .  -0.0144           .
income_$11-$25k                Binary  0.0062          .  -0.0128           .
income_$25-$50k                Binary  0.0391          .   0.0245           .
income_> $50k                  Binary  0.0165          .   0.0027           .
ninsclas_Private               Binary  0.0624          .   0.0195           .
ninsclas_Private & Medicare    Binary  0.0143          .  -0.0057           .
ninsclas_Medicare              Binary -0.0327          .  -0.0011           .
ninsclas_Medicare & Medicaid   Binary -0.0144          .   0.0023           .
ninsclas_Medicaid              Binary -0.0395          .  -0.0231           .
ninsclas_No insurance          Binary  0.0099          .   0.0082           .
race_white                     Binary  0.0063          .  -0.0088           .
race_black                     Binary -0.0114          .   0.0062           .
race_other                     Binary  0.0050          .   0.0026           .
cat1_ARF                       Binary -0.0290          .  -0.0098           .
cat1_CHF                       Binary  0.0261          .  -0.0019           .
cat1_Cirrhosis                 Binary -0.0268          .  -0.0063           .
cat1_Colon Cancer              Binary -0.0012          .   0.0000           .
cat1_Coma                      Binary -0.0525          .  -0.0053           .
cat1_COPD                      Binary -0.0858          .  -0.0057           .
cat1_Lung Cancer               Binary -0.0073          .  -0.0002           .
cat1_MOSF w/Malignancy         Binary  0.0045          .   0.0002           .
cat1_MOSF w/Sepsis             Binary  0.1721          .   0.0289           .
dnr1_Yes                       Binary -0.0696          .  -0.0129           .
wtkilo1                       Contin.  0.2640     0.8835   0.0075      1.0408
hrt1                          Contin.  0.1460     1.0260   0.0506      0.9705
meanbp1                       Contin. -0.4869     0.7759  -0.0476      0.9655
resp1                         Contin. -0.1641     1.0330  -0.0912      1.0078
temp1                         Contin. -0.0209     1.1024  -0.0378      1.0288
card_Yes                       Binary  0.1395          .   0.0136           .
gastr_Yes                      Binary  0.0453          .   0.0250           .
hema_Yes                       Binary -0.0146          .  -0.0099           .
meta_Yes                       Binary -0.0059          .   0.0030           .
neuro_Yes                      Binary -0.1079          .  -0.0134           .
ortho_Yes                      Binary  0.0010          .   0.0016           .
renal_Yes                      Binary  0.0264          .   0.0083           .
resp_Yes                       Binary -0.1277          .  -0.0374           .
seps_Yes                       Binary  0.0912          .   0.0178           .
trauma_Yes                     Binary  0.0105          .   0.0092           .
amihx                          Binary  0.0139          .   0.0168           .
ca_No                          Binary  0.0439          .   0.0216           .
ca_Yes                         Binary -0.0267          .  -0.0093           .
ca_Metastatic                  Binary -0.0172          .  -0.0122           .
cardiohx                       Binary  0.0445          .  -0.0094           .
chfhx                          Binary  0.0268          .  -0.0100           .
chrpulhx                       Binary -0.0737          .  -0.0021           .
dementhx                       Binary -0.0472          .  -0.0055           .
gibledhx                       Binary -0.0122          .  -0.0065           .
immunhx                        Binary  0.0358          .  -0.0126           .
liverhx                        Binary -0.0124          .  -0.0075           .
malighx                        Binary -0.0423          .  -0.0211           .
psychhx                        Binary -0.0348          .  -0.0052           .
renalhx                        Binary  0.0066          .  -0.0102           .
transhx                        Binary  0.0554          .   0.0178           .
aps1                          Contin.  0.4837     1.1609   0.0664      1.0236
das2d3pc                      Contin.  0.0654     0.8429   0.0195      0.9675
scoma1                        Contin. -0.1160     0.8116  -0.0114      0.9701
surv2md1                      Contin. -0.1954     1.0663  -0.0226      1.0063
alb1                          Contin. -0.2010     1.8947  -0.0217      1.6852
bili1                         Contin.  0.1329     1.4504  -0.0102      0.9447
crea1                         Contin.  0.2678     1.0276   0.0335      0.9294
hema1                         Contin. -0.2954     0.7109  -0.0098      0.9668
paco21                        Contin. -0.2880     0.5935  -0.0559      1.0124
pafi1                         Contin. -0.4566     0.8184  -0.0635      0.9492
ph1                           Contin. -0.1163     1.1322  -0.0754      1.1463
pot1                          Contin. -0.0274     0.9501  -0.0149      1.0131
sod1                          Contin. -0.0927     0.9799  -0.0272      1.0189
wblc1                         Contin.  0.0799     1.2088  -0.0110      1.0154

Effective sample sizes
           Control Treated
Unadjusted  3551.     2184
Adjusted    1250.5    2184

20.2 Rubin’s Rule 1 After Weighting

We could simply compare the propensity score balance as reported in the table above, looking at the standardized biases (the standardized difference on a proportion scale, rather than a percentage.) I’ll multiply each by 100 to place things on the usual scale.

Code

100*b2[1]$Balance$Diff.Un[1] # imbalance prior to weighting

[1] 155.5377

Code

100*b2[1]$Balance$Diff.Adj[1] # imbalance after weighting

[1] 51.07571

So this is still indicative of some difficulty reaching a completely reasonable level of balance. We’d like to see this much closer to 0.

20.3 Rubin’s Rule 2 After Weighting

Code

b2[1]$Balance$V.Ratio.Un[1] # imbalance prior to weighting

[1] 1.208331

Code

b2[1]$Balance$V.Ratio.Adj[1] # imbalance after weighting

[1] 0.9057622

So there’s no particular problem with Rule 2 here.

20.4 Semi-Automated Love plot of Standardized Differences

Code

p <- love.plot(b2, 
               threshold = .1, size = 3, stars = "raw",
               title = "Standardized Diffs and ATT weights")
p + theme_bw()

20.5 Semi-Automated Love plot of Variance Ratios

Code

p <- love.plot(b2, stat = "v",
               threshold = 1.25, size = 3, 
               title = "Variance Ratios: TWANG ATT weighting")
p + theme_bw()

21 ATT Weighting Estimates

We’re not too excited about weighted estimates without (at least) an additional adjustment for differences in the propensity score in this case, but I’ll run them anyway just to show how that would be done.

21.1 Weighted ATT for Outcome A: Death

The relevant regression approach uses the svydesign and svyglm functions from the survey package. For a binary outcome, we build the outcome model using the quasibinomial, rather than the usual binomial family. Note the use of the 1/0 version of both the outcome and the treatment variables here.

Code

rhcwts_design <- svydesign(
    ids=~1, weights=~get.weights(ps_rhc, 
                                 stop.method = "es.mean"),
    data=rhc_df) # using twang ATT weights


adj_death_wtd <- svyglm(died ~ treat_rhc, 
                        design=rhcwts_design,
                      family=quasibinomial())

wt_twangatt_res_death <- 
    tidy(adj_death_wtd, exponentiate = TRUE,
         conf.int = TRUE,
         conf.level = 0.95) %>% 
    select(term, estimate, std.error,
           lo95 = conf.low, hi95 = conf.high)

wt_twangatt_res_death

# A tibble: 2 × 5
  term        estimate std.error  lo95  hi95
  <chr>          <dbl>     <dbl> <dbl> <dbl>
1 (Intercept)     1.73    0.0602  1.53  1.94
2 treat_rhc       1.23    0.0757  1.06  1.43

The estimates we want come from the treat_rhc row.

21.2 Double Robust Weighted Estimate for Death

Now let’s add in a regression adjustment for the linear propensity score, which should alleviate some of our concerns about residual imbalance after weighting.

Code

dr_adj_death <- svyglm(died ~ treat_rhc + linps, 
                        design=rhcwts_design,
                      family=quasibinomial())

dr_death <- 
    tidy(dr_adj_death, exponentiate = TRUE,
         conf.int = TRUE,
         conf.level = 0.95) %>% 
    select(term, estimate, std.error,
           lo95 = conf.low, hi95 = conf.high)

dr_death

# A tibble: 3 × 5
  term        estimate std.error  lo95  hi95
  <chr>          <dbl>     <dbl> <dbl> <dbl>
1 (Intercept)    1.72     0.0612 1.52   1.94
2 treat_rhc      1.26     0.0812 1.07   1.47
3 linps          0.938    0.0415 0.864  1.02

Again, the estimates we want come from the treat_rhc row.

21.3 Graphing the Weighted and Matched Results for Outcome A

Code

temp7 <- 
    tidy(adj_death_wtd, exponentiate = TRUE, 
              conf.int = T, conf.level = 0.95) %>%
    filter(term == "treat_rhc")

temp8 <- 
    tidy(dr_adj_death, exponentiate = TRUE, 
              conf.int = T, conf.level = 0.95) %>%
    filter(term == "treat_rhc")

death_new_results <- rbind(temp7, temp8) %>%
    mutate(approach = c("ATT weighted", "DR")) %>%
    mutate(description = c("ATT weights", "DR: ATT wts + adj"))
                               
death_results <- rbind(death_match_results, 
                       death_new_results)

death_results

# A tibble: 9 × 9
  term      estimate std.error statis…¹  p.value conf.…² conf.…³ appro…⁴ descr…⁵
  <chr>        <dbl>     <dbl>    <dbl>    <dbl>   <dbl>   <dbl> <chr>   <chr>  
1 treat_rhc     1.25    0.0576     3.90 9.43e- 5    1.12    1.40 Unmatc… No Mat…
2 treat_rhc     1.26    0.0649     3.57 3.58e- 4    1.11    1.43 Match 1 1:1 gr…
3 treat_rhc     1.33    0.0548     5.27 1.37e- 7    1.20    1.49 Match 2 1:2 gr…
4 treat_rhc     1.34    0.0630     4.60 4.18e- 6    1.18    1.51 Match 3 1:1 ge…
5 treat_rhc     1.23    0.0636     3.22 1.27e- 3    1.08    1.39 Match 4 1:1 gr…
6 treat_rhc     1.33    0.0754     3.76 1.70e- 4    1.15    1.54 Match 5 1:1 ca…
7 treat_rhc     1.36    0.0491     6.25 4.18e-10    1.23    1.50 Match 6 1:2 gr…
8 treat_rhc     1.23    0.0757     2.78 5.50e- 3    1.06    1.43 ATT we… ATT we…
9 treat_rhc     1.26    0.0812     2.81 4.94e- 3    1.07    1.47 DR      DR: AT…
# … with abbreviated variable names ¹statistic, ²conf.low, ³conf.high,
#   ⁴approach, ⁵description

So there’s our table. It’s worth remembering that the only analyses that we’ve

Code

death_results %>%
    ggplot(., aes(x = description, y = estimate, 
                  ymin = conf.low, ymax = conf.high)) +
    geom_text(aes(label = round_half_up(estimate,2)), vjust = -1.25) +
    geom_pointrange() +
    geom_hline(yintercept = 1, col = "red", linetype = "dashed") +
    theme_bw() +
    coord_flip() +
    labs(title = "ATT Estimates in RHC (Death)",
         x = "Propensity Adjustment Approach", 
         y = "Estimated Odds Ratio for Death associated with RHC")

And here are the results for only the five methods (adding our double robust approach to the analyses we liked in the matching) which showed reasonable balance after propensity score adjustment…

Code

death_results %>%
    filter(approach %in% c("Match 3", "Match 4", "Match 5",
                            "Match 6", "DR")) %>%
    ggplot(., aes(x = description, y = estimate, 
                  ymin = conf.low, ymax = conf.high)) +
    geom_text(aes(label = round_half_up(estimate,2)), vjust = -1.25) +
    geom_pointrange() +
    geom_hline(yintercept = 1, col = "red", linetype = "dashed") +
    theme_bw() +
    coord_flip() +
    labs(title = "ATT Estimates in RHC (Death)",
         x = "Propensity Adjustment Approach", 
         y = "Estimated Odds Ratio for Death associated with RHC")

21.4 Weighted ATT for Outcome B: Hospital Length of Stay

The relevant regression approach uses the svydesign and svyglm functions from the survey package.

Code

rhcwts_design <- svydesign(
    ids=~1, weights=~get.weights(ps_rhc, 
                                 stop.method = "es.mean"),
    data=rhc_df) # using twang ATT weights

adj_hospdays_wtd <- 
    svyglm(hospdays ~ treat_rhc, 
           design= rhcwts_design)

wt_twangatt_res_hospdays <- 
    tidy(adj_hospdays_wtd, conf.int = TRUE,
         conf.level = 0.95) %>% 
    select(term, estimate, std.error,
           lo95 = conf.low, hi95 = conf.high)

wt_twangatt_res_hospdays

# A tibble: 2 × 5
  term        estimate std.error    lo95  hi95
  <chr>          <dbl>     <dbl>   <dbl> <dbl>
1 (Intercept)    22.6      0.844 20.9    24.2 
2 treat_rhc       2.10     1.03   0.0750  4.12

The estimates we want come from the treat_rhc row.

21.5 Double Robust Weighted Estimate for LOS

Now let’s add in a regression adjustment for the linear propensity score. Again, we should have some additional faith in these results over those with just weighting alone, because of the failure of the weighting approach to pass Rubin’s Rule 1.

Code

dr_adj_days <- svyglm(hospdays ~ treat_rhc + linps, 
                        design=rhcwts_design)

dr_los <- 
    tidy(dr_adj_days, conf.int = TRUE,
         conf.level = 0.95) %>% 
    select(term, estimate, std.error,
           lo95 = conf.low, hi95 = conf.high)

dr_los

# A tibble: 3 × 5
  term        estimate std.error   lo95  hi95
  <chr>          <dbl>     <dbl>  <dbl> <dbl>
1 (Intercept)    22.8      0.873 21.1   24.5 
2 treat_rhc       1.56     1.10  -0.605  3.72
3 linps           1.93     0.507  0.934  2.92

Once again, the estimates we want come from the treat_rhc row.

21.6 Graphing the Weighted and Matched Results for Outcome B

Code

temp9 <- 
    tidy(adj_hospdays_wtd, conf.int = T, 
         conf.level = 0.95) %>%
    filter(term == "treat_rhc") %>%
    select(-statistic, -p.value)

temp10 <- 
    tidy(dr_adj_days, conf.int = T, 
         conf.level = 0.95) %>%
    filter(term == "treat_rhc") %>%
    select(-statistic, -p.value)

hospdays_new_results <- rbind(temp9, temp10) %>%
    mutate(approach = c("ATT weighted", "DR")) %>%
    mutate(description = c("ATT weights", "DR: ATT wts + adj"))
                               
hospdays_results <- rbind(hospdays_match_results, 
                       hospdays_new_results)

hospdays_results

# A tibble: 9 × 7
  term      estimate std.error conf.low conf.high approach     description      
  <chr>        <dbl>     <dbl>    <dbl>     <dbl> <chr>        <chr>            
1 treat_rhc     5.16     0.687   3.81        6.51 Unmatched    No Matching      
2 treat_rhc     3.87     0.786   2.33        5.41 Match 1      1:1 greedy match…
3 treat_rhc     4.98     0.545   3.91        6.05 Match 2      1:2 greedy match…
4 treat_rhc     1.40     0.824  -0.216       3.01 Match 3      1:1 genetic sear…
5 treat_rhc     1.27     0.859  -0.411       2.96 Match 4      1:1 greedy match…
6 treat_rhc     2.30     0.919   0.496       4.10 Match 5      1:1 caliper matc…
7 treat_rhc     1.22     0.556   0.125       2.31 Match 6      1:2 greedy match…
8 treat_rhc     2.10     1.03    0.0750      4.12 ATT weighted ATT weights      
9 treat_rhc     1.56     1.10   -0.605       3.72 DR           DR: ATT wts + adj

So there’s our table.

Code

hospdays_results %>%
    ggplot(., aes(x = description, y = estimate, 
                  ymin = conf.low, ymax = conf.high)) +
    geom_text(aes(label = round_half_up(estimate,2)), vjust = -1.25) +
    geom_pointrange() +
    geom_hline(yintercept = 0, col = "red", linetype = "dashed") +
    theme_bw() +
    coord_flip() +
    labs(title = "Comparing ATT Estimates for LOS",
         x = "Propensity Adjustment Approach", 
         y = "Estimated Change in Hospital LOS associated with RHC")

And here are the results for only the five methods (adding our double robust approach to the analyses we liked in the matching) which showed reasonable balance after propensity score adjustment…

Code

hospdays_results %>%
    filter(approach %in% c("Match 3", "Match 4", "Match 5",
                            "Match 6", "DR")) %>%
    ggplot(., aes(x = description, y = estimate, 
                  ymin = conf.low, ymax = conf.high)) +
    geom_text(aes(label = round_half_up(estimate,2)), vjust = -1.25) +
    geom_pointrange() +
    geom_hline(yintercept = 0, col = "red", linetype = "dashed") +
    theme_bw() +
    coord_flip() +
    labs(title = "Comparing ATT Estimates for LOS",
         x = "Propensity Adjustment Approach", 
         y = "Estimated Change in Hospital LOS associated with RHC")

21.7 Weighted ATT for Outcome C: In-Study Time to Death

Code

rhc_wts <- get.weights(ps_rhc, stop.method = "es.mean")

adj_survdays_wtd <- coxph(Surv(survdays, died) ~ treat_rhc,
                          data = rhc_df, weights = rhc_wts)

wt_twangatt_res_surv <- 
    tidy(adj_survdays_wtd, exponentiate = TRUE,
         conf.int = TRUE,
         conf.level = 0.95) %>% 
    select(term, estimate, std.error,
           lo95 = conf.low, hi95 = conf.high)

wt_twangatt_res_surv

# A tibble: 1 × 5
  term      estimate std.error  lo95  hi95
  <chr>        <dbl>     <dbl> <dbl> <dbl>
1 treat_rhc     1.11    0.0396  1.02  1.22

Again, the estimates we want come from the treat_rhc row.

21.8 Double Robust Weighted Estimate for Days to Death

Now let’s add in a regression adjustment for the linear propensity score. Again, in this case, we prefer this double robust approach to weighting alone because of Rubin’s Rule 1 and the Love plot.

Code

dr_survdays <- 
    coxph(Surv(survdays, died) ~ treat_rhc + linps,
          data = rhc_df, weights = rhc_wts)

dr_surv <- 
    tidy(dr_survdays, exponentiate = TRUE, 
         conf.int = TRUE,
         conf.level = 0.95) %>% 
    select(term, estimate, std.error,
           lo95 = conf.low, hi95 = conf.high)

dr_surv

# A tibble: 2 × 5
  term      estimate std.error  lo95  hi95
  <chr>        <dbl>     <dbl> <dbl> <dbl>
1 treat_rhc    1.13     0.0399 1.03  1.23 
2 linps        0.955    0.0173 0.913 0.999

Once again, the estimates we want come from the treat_rhc row.

21.9 Graphing the Weighted and Matched Results for Outcome C

Code

temp11 <- 
    tidy(adj_survdays_wtd, exponentiate = T, conf.int = T, 
         conf.level = 0.95) %>%
    filter(term == "treat_rhc") %>%
    select(-robust.se)

temp12 <- 
    tidy(dr_survdays, exponentiate = T, conf.int = T, 
         conf.level = 0.95) %>%
    filter(term == "treat_rhc") %>%
    select(-robust.se)

survdays_new_results <- rbind(temp11, temp12) %>%
    mutate(approach = c("ATT weighted", "DR")) %>%
    mutate(description = c("ATT weights", "DR: ATT wts + adj"))
                               
survdays_results <- rbind(survdays_new_results, 
                       survdays_match_results)

survdays_results

# A tibble: 9 × 9
  term      estimate std.error statis…¹  p.value conf.…² conf.…³ appro…⁴ descr…⁵
  <chr>        <dbl>     <dbl>    <dbl>    <dbl>   <dbl>   <dbl> <chr>   <chr>  
1 treat_rhc     1.11    0.0396     2.39 1.67e- 2   1.02     1.22 ATT we… ATT we…
2 treat_rhc     1.13    0.0399     2.55 1.09e- 2   1.03     1.23 DR      DR: AT…
3 treat_rhc     1.08    0.0335     2.34 1.92e- 2   1.01     1.15 Unmatc… No Mat…
4 treat_rhc     1.05    0.0455     1.05 2.95e- 1   0.959    1.15 Match 1 1:1 gr…
5 treat_rhc     1.27    0.0343     7.02 2.17e-12   1.19     1.36 Match 2 1:2 gr…
6 treat_rhc     1.11    0.0441     2.44 1.45e- 2   1.02     1.21 Match 3 1:1 ge…
7 treat_rhc     1.06    0.0452     1.20 2.31e- 1   0.966    1.15 Match 4 1:1 gr…
8 treat_rhc     1.12    0.0535     2.05 3.99e- 2   1.01     1.24 Match 5 1:1 ca…
9 treat_rhc     1.36    0.0313     9.89 4.68e-23   1.28     1.45 Match 6 1:2 gr…
# … with abbreviated variable names ¹statistic, ²conf.low, ³conf.high,
#   ⁴approach, ⁵description

So there’s our table.

Code

survdays_results %>%
    ggplot(., aes(x = description, y = estimate, 
                  ymin = conf.low, ymax = conf.high)) +
    geom_text(aes(label = round_half_up(estimate,2)), vjust = -1.25) +
    geom_pointrange() +
    geom_hline(yintercept = 1, col = "red", linetype = "dashed") +
    theme_bw() +
    coord_flip() +
    labs(title = "Comparing ATT Estimates for Survival Time",
         x = "Propensity Adjustment Approach", 
         y = "Estimated Hazard Ratio associated with RHC")

And here are the results for only the five methods (adding our double robust approach to the analyses we liked in the matching) which showed reasonable balance after propensity score adjustment…

Code

survdays_results %>%
    filter(approach %in% c("Match 3", "Match 4", "Match 5",
                            "Match 6", "DR")) %>%
    ggplot(., aes(x = description, y = estimate, 
                  ymin = conf.low, ymax = conf.high)) +
    geom_text(aes(label = round_half_up(estimate,2)), vjust = -1.25) +
    geom_pointrange() +
    geom_hline(yintercept = 1, col = "red", linetype = "dashed") +
    theme_bw() +
    coord_flip() +
    labs(title = "Comparing ATT Estimates for Survival Time",
         x = "Propensity Adjustment Approach", 
         y = "Estimated Hazard Ratio associated with RHC")

22 Sensitivity Analyses After Matching

22.1 For the Binary Outcome, Death

The rbounds package can evaluate binary outcomes using the binarysens and Fishersens functions. The binarysens function works directly on a matched sample as developed using the Matching package. For example, in our third match, based on the genetic search algorithm with 1:1 matching without replacement, this is match3. Note that in order to do this, we needed to run match3 including the appropriate (binary) outcome (Y).

Code

Y <- rhc$died
X <- cbind(rhc$linps, rhc$ps)
Tr <- as.logical(rhc$swang1 == "RHC")
match3 <- Match(Y = Y, Tr = Tr, X = X, estimand = "ATT", 
                 Weight.matrix = genout3)

Rc <- match3$mdata$Y[match3$mdata$Tr==0]
Rt <- match3$mdata$Y[match3$mdata$Tr==1] 

out.tab <- matrix(c(table(Rc, Rt)), nrow = 2)
out.tab

     [,1] [,2]
[1,]  275  594
[2,]  446  982

Code

binarysens(out.tab[2,1], out.tab[1,2],
           Gamma = 1.5, GammaInc = 0.05)


 Rosenbaum Sensitivity Test 
 
Unconfounded estimate ....  0 

 Gamma Lower bound Upper bound
  1.00           0     0.00000
  1.05           0     0.00006
  1.10           0     0.00099
  1.15           0     0.00865
  1.20           0     0.04479
  1.25           0     0.14833
  1.30           0     0.33859
  1.35           0     0.57390
  1.40           0     0.77815
  1.45           0     0.90728
  1.50           0     0.96873

 Note: Gamma is Odds of Differential Assignment To
 Treatment Due to Unobserved Factors

At a 5% significance level, we need to compare the listed upper bounds to a value of \(\alpha = 0.05/2 = 0.025\). It appears that we retain significance up to a hidden bias of \(\Gamma\) = 1.15, but not at \(\Gamma\) = 1.2. See Chapter 9 of Rosenbaum’s Observation and Experiment for more details on interpretation of this result.

We can also use this approach with other Matching results, including Match 6, which is a 1:2 match with replacement. Again, we have to run the match with the appropriate outcome included.

Code

Y <- rhc$died
X <- rhc$linps
Tr <- as.logical(rhc$swang1 == "RHC")
match6 <- Match(Y = Y, Tr = Tr, X = X, M = 2, 
                estimand = "ATT", replace = TRUE, 
                ties = FALSE)

Rc <- match6$mdata$Y[match6$mdata$Tr==0]
Rt <- match6$mdata$Y[match6$mdata$Tr==1] 

out.tab <- matrix(c(table(Rc, Rt)), nrow = 2)
out.tab

     [,1] [,2]
[1,]  526 1090
[2,]  870 1882

Code

binarysens(out.tab[2,1], out.tab[1,2],
           Gamma = 1.5, GammaInc = 0.05)


 Rosenbaum Sensitivity Test 
 
Unconfounded estimate ....  0 

 Gamma Lower bound Upper bound
  1.00           0     0.00000
  1.05           0     0.00004
  1.10           0     0.00192
  1.15           0     0.02811
  1.20           0     0.16574
  1.25           0     0.47114
  1.30           0     0.78518
  1.35           0     0.94718
  1.40           0     0.99213
  1.45           0     0.99927
  1.50           0     0.99996

 Note: Gamma is Odds of Differential Assignment To
 Treatment Due to Unobserved Factors

Here, at a 5% significance level, we again compare our upper bounds to \(\alpha/2 = 0.025\) and we retain significance up to a hidden bias of \(\Gamma\) = 1.10, but not at \(\Gamma\) = 1.15.

22.2 For the Quantitative Outcome, In-Hospital Length of Stay

We have already used the Match function from the Matching package to develop our matched samples. For our 1:1 matches, we need only run the psens function from the rbounds package to obtain sensitivity results. For example, in Match 3,

Code

Y <- rhc$hospdays
X <- cbind(rhc$linps, rhc$ps)
Tr <- as.logical(rhc$swang1 == "RHC")
match3 <- Match(Y = Y, Tr = Tr, X = X, estimand = "ATT", 
                 Weight.matrix = genout3)

Rc <- match3$mdata$Y[match3$mdata$Tr==0]
Rt <- match3$mdata$Y[match3$mdata$Tr==1] 

psens(Rt, Rc, Gamma = 1.5, GammaInc = 0.05)


 Rosenbaum Sensitivity Test for Wilcoxon Signed Rank P-Value 
 
Unconfounded estimate ....  1e-04 

 Gamma Lower bound Upper bound
  1.00       1e-04      0.0001
  1.05       0e+00      0.0041
  1.10       0e+00      0.0453
  1.15       0e+00      0.2168
  1.20       0e+00      0.5344
  1.25       0e+00      0.8213
  1.30       0e+00      0.9575
  1.35       0e+00      0.9937
  1.40       0e+00      0.9994
  1.45       0e+00      1.0000
  1.50       0e+00      1.0000

 Note: Gamma is Odds of Differential Assignment To
 Treatment Due to Unobserved Factors

At a 5% significance level, we retain significance up to a hidden bias of \(\Gamma\) = 1.05, but not at \(\Gamma\) = 1.1. See Chapter 9 of Rosenbaum’s Observation and Experiment for more details on interpretation of this result.

22.3 For the Survival Outcome, In-Study Time to Death

In this setting, we would need to use the spreadsheet software provided by Dr. Love. That software is designed only for 1:1 greedy matching without replacement or weighting, which is the approach we took in Match 1 (we could also have looked at matching using a caliper, but without replacement.) We need to identify, for each matched pair of subjects, whether that pair’s set of results was:

that the RHC patient definitely survived longer than the non-RHC patient
that the non-RHC patient definitely survived longer than the RHC patient
that it is unclear which patient survived longer, due to censoring

Here is some very fussy code that accomplishes this counting. This could probably be accomplished just as well with some sort of case_when statement, but it’s a bit tricky.

Code

a1 <- rhc.matches_1 %>% 
    select(matches_1, swang1, death, survdays) %>%
    arrange(matches_1)

Code

a2 <- unite(a1, situation, swang1, death, sep = "_", remove = TRUE)

a3 <- spread(a2, situation, survdays)

a4 <- a3 %>%
    filter(complete.cases(`No RHC_Yes`, `RHC_Yes`)) %>%
    mutate(result = ifelse(`RHC_Yes` > `No RHC_Yes`, "RHC lived longer", "No RHC lived longer"))

a5 <- a3 %>%
    filter(complete.cases(`No RHC_No`, `RHC_No`)) %>%
    mutate(result = "No clear winner")

a6 <- a3 %>%
    filter(complete.cases(`No RHC_No`, `RHC_Yes`)) %>%
    mutate(result = ifelse(`RHC_Yes` > `No RHC_No`, "No clear winner", "No RHC lived longer"))

a7 <- a3 %>%
    filter(complete.cases(`No RHC_Yes`, `RHC_No`)) %>%
    mutate(result = ifelse(`RHC_No` > `No RHC_Yes`, "RHC lived longer", "No clear winner"))

a8 <- bind_rows(a4, a5, a6, a7)

a8 %>% tabyl(result) %>% adorn_totals()

              result    n   percent
 No RHC lived longer  988 0.4523810
     No clear winner  278 0.1272894
    RHC lived longer  918 0.4203297
               Total 2184 1.0000000

So we have counts for the pairs where a clear winner is identified, and we can enter the spreadsheet. We need the number of pairs with a clear winner, and the number of pairs where the No RHC patient lived longer than the RHC patient. Doing so, we see that the result doesn’t meet the sign-score test standard necessary to have a p value below 0.10, so we stop there, and conclude that the effect would be highly sensitive to hidden bias.

23 Session Information

Code

xfun::session_info()

R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)

Locale:
  LC_COLLATE=English_United States.utf8 
  LC_CTYPE=English_United States.utf8   
  LC_MONETARY=English_United States.utf8
  LC_NUMERIC=C                          
  LC_TIME=English_United States.utf8    

Package version:
  askpass_1.1         backports_1.4.1     base64enc_0.1.3    
  bit_4.0.5           bit64_4.0.5         blob_1.2.3         
  boot_1.3-28.1       brio_1.1.3          broom_1.0.3        
  broom.mixed_0.2.9.4 bslib_0.4.2         cachem_1.0.7       
  callr_3.7.3         cellranger_1.1.0    class_7.3-20       
  cli_3.6.0           clipr_0.8.0         cobalt_4.4.1       
  coda_0.19.4         codetools_0.2-18    colorspace_2.1-0   
  compiler_4.2.2      conflicted_1.2.0    cpp11_0.4.3        
  crayon_1.5.2        curl_5.0.0          data.table_1.14.8  
  DBI_1.1.3           dbplyr_2.3.1        deldir_1.0-6       
  desc_1.4.2          diffobj_0.3.5       digest_0.6.31      
  dplyr_1.1.0         dtplyr_1.3.0        e1071_1.7-13       
  ellipsis_0.3.2      evaluate_0.20       fansi_1.0.4        
  farver_2.1.1        fastmap_1.1.1       forcats_1.0.0      
  fs_1.6.1            furrr_0.3.1         future_1.31.0      
  gargle_1.3.0        gbm_2.1.8.1         gdata_2.18.0.1     
  generics_0.1.3      ggforce_0.4.1       ggformula_0.10.2   
  ggplot2_3.4.1       ggridges_0.5.4      ggstance_0.3.6     
  globals_0.16.2      glue_1.6.2          gmodels_2.18.1.1   
  googledrive_2.0.0   googlesheets4_1.0.1 graphics_4.2.2     
  grDevices_4.2.2     grid_4.2.2          gridExtra_2.3      
  gtable_0.3.1        gtools_3.9.4        haven_2.5.1        
  here_1.0.1          highr_0.10          hms_1.1.2          
  htmltools_0.5.4     htmlwidgets_1.6.1   httr_1.4.5         
  ids_1.0.1           interp_1.1-3        isoband_0.2.7      
  janitor_2.2.0       jpeg_0.1-10         jquerylib_0.1.4    
  jsonlite_1.8.4      knitr_1.42          labeling_0.4.2     
  labelled_2.10.0     lattice_0.20-45     latticeExtra_0.6-30
  lifecycle_1.0.3     listenv_0.9.0       lme4_1.1-31        
  lubridate_1.9.2     magrittr_2.0.3      MASS_7.3-58.2      
  Matching_4.10-8     Matrix_1.5-3        MatrixModels_0.5-1 
  memoise_2.0.1       methods_4.2.2       mgcv_1.8.41        
  mime_0.12           minqa_1.2.5         mitools_2.4        
  modelr_0.1.10       mosaic_1.8.4.2      mosaicCore_0.9.2.1 
  mosaicData_0.20.3   munsell_0.5.0       naniar_1.0.0       
  nlme_3.1-160        nloptr_2.0.3        norm_1.0.10.0      
  numDeriv_2016.8.1.1 openssl_2.0.5       parallel_4.2.2     
  parallelly_1.34.0   pillar_1.8.1        pkgconfig_2.0.3    
  pkgload_1.3.2       plyr_1.8.8          png_0.1-8          
  polyclip_1.10-4     praise_1.0.0        prettyunits_1.1.1  
  processx_3.8.0      progress_1.2.2      proxy_0.4-27       
  ps_1.7.2            purrr_1.0.1         R6_2.5.1           
  ragg_1.2.5          rappdirs_0.3.3      rbounds_2.2        
  RColorBrewer_1.1-3  Rcpp_1.0.10         RcppEigen_0.3.3.9.3
  readr_2.1.4         readxl_1.4.2        rematch_1.0.1      
  rematch2_2.1.2      reprex_2.0.2        rgenoud_5.9-0.3    
  rlang_1.0.6         rmarkdown_2.20      rprojroot_2.0.3    
  rstudioapi_0.14     rvest_1.0.3         sass_0.4.5         
  scales_1.2.1        selectr_0.4.2       snakecase_0.11.0   
  splines_4.2.2       stats_4.2.2         stringi_1.7.12     
  stringr_1.5.0       survey_4.1-1        survival_3.5-3     
  sys_3.4.1           systemfonts_1.0.4   tableone_0.13.2    
  testthat_3.1.6      textshaping_0.3.6   tibble_3.1.8       
  tidyr_1.3.0         tidyselect_1.2.0    tidyverse_2.0.0    
  timechange_0.2.0    tinytex_0.44        tools_4.2.2        
  twang_2.5           tweenr_2.0.2        tzdb_0.3.0         
  UpSetR_1.4.0        utf8_1.2.3          utils_4.2.2        
  uuid_1.1.0          vctrs_0.5.2         viridis_0.6.2      
  viridisLite_0.4.1   visdat_0.6.0        vroom_1.6.1        
  waldo_0.4.0         withr_2.5.0         xfun_0.37          
  xgboost_1.7.3.1     xml2_1.3.3          xtable_1.8-4       
  yaml_2.3.7          zoo_1.8-11

1 Preliminaries

1.1 Study Background

1.2 Loading Packages

1.3 Importing the Data

1.4 The Data Set after Initial Import

1.5 Any missingness?

2 The Treatment / Exposure We’ll Study

3 The 3 Outcomes We’ll Study

3.1 Mortality, a binary outcome: death

3.2 Working with Dates to Define the Time-Related Outcomes

3.2.1 The Raw Data on Dates

3.2.2 Impute hospital discharge date for the one missing value

3.3 Length of initial hospital stay, a quantitative outcome: hospdays

3.4 Survival Time in Days, a time-to-event outcome: survdays

4 The 50 Covariates We’ll Study

4.1 Group 1: Socio-demographics (6 covariates)

4.2 Group 2: Presentation at Admission (7 covariates)

4.3 Group 3: Admission Diagnosis Categories (10 covariates)

4.4 Group 4: Comorbid Illness and Transfer Status (13 covariates)

4.5 Group 5: Day 1 Summary Measures of Presentation / Severity of Illness (4 covariates)

4.6 Group 6: Day 1 Lab Results (10 covariates)

5 A Little Refactoring

5.1 Saving the Data File

6 Setting a Seed

7 Unadjusted Outcome Assessments

7.1 Outcome A: In-Study Mortality (binary)

7.1.1 Fitting the Model with the 1/0 treatment and outcome

7.1.2 Fitting the Model with the treatment and outcome as factors requires care

7.2 Outcome B: Length of Stay (quantitative)

7.3 Outcome C: Time to death (time-to-event)

8 Estimate the Propensity Score with Logistic Regression

8.1 Store the propensity scores and linear propensity scores

8.2 Plot the Propensity Scores to Verify that they Match Your Expectations

9 Checking Rubin’s Rules Prior to Propensity Adjustment

10 Match 1: 1:1 Greedy Matching on the linear PS, without Replacement

10.1 Love Plot for Match 1

10.2 Rubin’s Rules 1 and 2 for Match 1

11 Match 2: 1:2 Greedy Matching on the linear PS, without Replacement

11.1 Love Plot for Match 2

11.2 Rubin’s Rules 1 and 2 for Match 2

12 Match 3: Genetic Search Matching to do 1:1 Matching without replacement

12.1 Love Plot for Match 3

12.2 Rubin’s Rules 1 and 2 for Match 3

13 Match 4: 1:1 Greedy Matching on the linear PS, WITH Replacement

13.1 Love Plot for Match 4

13.2 Rubin’s Rules 1 and 2 for Match 4

14 Match 5: 1:1 Caliper Matching on the linear PS, WITHOUT Replacement

14.1 Love Plot for Match 5

14.2 Rubin’s Rules 1 and 2 for Match 5

15 Match 6: 1:2 Greedy Matching on the linear PS, WITH Replacement

15.1 Love Plot for Match 6

15.2 Rubin’s Rules 1 and 2 for Match 6

16 Ranking the Matches

17 ATT Matching Estimates for the Binary Outcome, Death

17.1 With Match 1, in detail

17.2 With Matches 2-6, in much less detail

17.3 Graphing the Results Across Multiple Matching Strategies

18 ATT Matching Estimates for the Quantitative Outcome, Hospital Length of Stay

18.1 With Match 1, in detail

18.2 With Matches 2-6, in much less detail

18.3 Graphing the Results Across Multiple Matching Strategies

19 ATT Matching Estimates for the Time-to-Event Outcome, In-Study Survival

19.1 With Match 1, in detail

19.2 With Matches 2-6, in much less detail

19.3 Graphing the Results Across Multiple Matching Strategies

20 Propensity Score Weighting (ATT approach)

20.1 Estimate the Propensity Score using Generalized Boosted Regression, and then perfom ATT Weighting

20.1.1 Did we let the simulations run long enough to stabilize estimates?

20.1.2 What is the effective sample size of our weighted results?

20.1.3 How is the balance?

20.1.4 Assessing Balance with cobalt

20.2 Rubin’s Rule 1 After Weighting

20.3 Rubin’s Rule 2 After Weighting

20.4 Semi-Automated Love plot of Standardized Differences

20.5 Semi-Automated Love plot of Variance Ratios

21 ATT Weighting Estimates

21.1 Weighted ATT for Outcome A: Death

21.2 Double Robust Weighted Estimate for Death

21.3 Graphing the Weighted and Matched Results for Outcome A

21.4 Weighted ATT for Outcome B: Hospital Length of Stay

3.1 Mortality, a binary outcome: `death`

3.3 Length of initial hospital stay, a quantitative outcome: `hospdays`

3.4 Survival Time in Days, a time-to-event outcome: `survdays`

20.1.4 Assessing Balance with `cobalt`