This example is based on the Right Heart Catheterization data set available at Vanderbilt University and described here. The key reference is Connors AF et al. 1996 The effectiveness of RHC in the initial care of critically ill patients. JAMA 276: 889-897. Connors et al. used a logistic regression model to develop a propensity score then: [a] matched RHC to non-RHC patients and [b] adjusted for propensity score in models for outcomes, followed by a sensitivity analysis. The key conclusions were that RHC patients had decreased survival time, and any unmeasured confounder would have to be somewhat strong to explain away the results.
library(here); library(janitor); library(naniar)
library(tableone); library(broom); library(survival)
library(Matching); library(cobalt); library(rgenoud)
library(lme4); library(rbounds); library(conflicted)
library(twang); library(survey)
library(tidyverse)
conflict_prefer("select", "dplyr")
conflict_prefer("filter", "dplyr")
theme_set(theme_bw())
The observations in the rhc data set describe 5,735 subjects who are critically ill, and most variables were obtained during day 1 of a hospitalization. We'll begin by importing the rhc data into a tibble (a lazy and surly data frame - see R's help file on tibbles for more details) in R that contains 5,735 observations (rows) on 63 variables (columns).
In the import process, we're going to do a few things to simplify the process of getting a functional data set.
By default, when read_csv imports the rhc data, it creates parsing failures for several measures which are assumed to be integers but actually should be parsed as doubles (continuous variables). So we specify the column types for those variables here, then allow R to identify the types of the other columns. The read_csv command also treats all string variables as characters, and not as factors. For several multi-categorical variables, we want a factor representation in a specific order, so we'll specify that, too, using the cols function.
column_types_rhc <-
cols(urin1 = "d", meanbp1 = "d", resp1 = "d",
swang1 = col_factor(c("RHC", "No RHC")),
death = col_factor(c("No", "Yes")),
sex = col_factor(c("Male", "Female")),
cat1 = col_factor(c("ARF", "CHF", "Cirrhosis", "Colon Cancer", "Coma", "COPD",
"Lung Cancer", "MOSF w/Malignancy", "MOSF w/Sepsis")),
dnr1 = col_factor(c("No", "Yes")),
card = col_factor(c("No", "Yes")),
gastr = col_factor(c("No", "Yes")),
hema = col_factor(c("No", "Yes")),
meta = col_factor(c("No", "Yes")),
neuro = col_factor(c("No", "Yes")),
ortho = col_factor(c("No", "Yes")),
renal = col_factor(c("No", "Yes")),
resp = col_factor(c("No", "Yes")),
seps = col_factor(c("No", "Yes")),
trauma = col_factor(c("No", "Yes")),
income = col_factor(c("Under $11k", "$11-$25k", "$25-$50k", "> $50k")),
ninsclas = col_factor(c("Private", "Private & Medicare", "Medicare",
"Medicare & Medicaid", "Medicaid", "No insurance")),
race = col_factor(c("white", "black", "other")),
ca = col_factor(c("No", "Yes", "Metastatic"))
)
The data are available at the link below.
rhc_raw <- read_csv("https://biostat.app.vumc.org/wiki/pub/Main/DataSets/rhc.csv",
col_types = column_types_rhc)
New names:
* `` -> ...1
Next, we'll select the variables we need: the subject identifier (ptid) and the treatment indicator (swang1), then information on the outcomes, then information on the covariates of interest for our work, dropping the other variables.
rhc_cleaning <- rhc_raw %>%
select(ptid, swang1,
death, sadmdte, dschdte, dthdte, lstctdte,
age, sex, edu, income, ninsclas, race,
cat1, dnr1, wtkilo1, hrt1, meanbp1, resp1, temp1,
card, gastr, hema, meta, neuro, ortho, renal,
resp, seps, trauma,
amihx, ca, cardiohx, chfhx, chrpulhx, dementhx,
gibledhx, immunhx, liverhx, malighx, psychhx,
renalhx, transhx, aps1, das2d3pc, scoma1,
surv2md1, alb1, bili1, crea1, hema1, paco21,
pafi1, ph1, pot1, sod1, wblc1)
The rhc_cleaning data at this stage is a tibble (a lazy and surly data frame) containing 5,735 observations (rows) on 57 variables (columns). The observations describe subjects who are critically ill, and most variables were obtained during day 1 of a hospitalization.
rhc_cleaning
# A tibble: 5,735 x 57
ptid swang1 death sadmdte dschdte dthdte lstctdte age sex edu income
<chr> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <fct> <dbl> <fct>
1 00005 No RHC No 11142 11151 NA 11382 70.3 Male 12 Under ~
2 00007 RHC Yes 11799 11844 11844 11844 78.2 Female 12 Under ~
3 00009 RHC No 12083 12143 NA 12400 46.1 Female 14.1 $25-$5~
4 00010 No RHC Yes 11146 11183 11183 11182 75.3 Female 9 $11-$2~
5 00011 RHC Yes 12035 12037 12037 12036 67.9 Male 9.95 Under ~
6 00012 No RHC No 12389 12396 NA 12590 86.1 Female 8 Under ~
7 00013 No RHC No 12381 12423 NA 12616 55.0 Male 14 $25-$5~
8 00014 No RHC Yes 11453 11487 11491 11490 43.6 Male 12 $25-$5~
9 00016 No RHC No 12426 12437 NA 12560 18.0 Female 12.8 Under ~
10 00017 RHC No 11381 11400 NA 11590 48.4 Female 11.0 Under ~
# ... with 5,725 more rows, and 46 more variables: ninsclas <fct>, race <fct>,
# cat1 <fct>, dnr1 <fct>, wtkilo1 <dbl>, hrt1 <dbl>, meanbp1 <dbl>,
# resp1 <dbl>, temp1 <dbl>, card <fct>, gastr <fct>, hema <fct>, meta <fct>,
# neuro <fct>, ortho <fct>, renal <fct>, resp <fct>, seps <fct>,
# trauma <fct>, amihx <dbl>, ca <fct>, cardiohx <dbl>, chfhx <dbl>,
# chrpulhx <dbl>, dementhx <dbl>, gibledhx <dbl>, immunhx <dbl>,
# liverhx <dbl>, malighx <dbl>, psychhx <dbl>, renalhx <dbl>, ...
rhc_cleaning %>%
miss_var_summary()
# A tibble: 57 x 3
variable n_miss pct_miss
<chr> <int> <dbl>
1 dthdte 2013 35.1
2 dschdte 1 0.0174
3 ptid 0 0
4 swang1 0 0
5 death 0 0
6 sadmdte 0 0
7 lstctdte 0 0
8 age 0 0
9 sex 0 0
10 edu 0 0
# ... with 47 more rows
We could also graph this with gg_miss_var(rhc_cleaning). As it turns out, there were three variables in the raw rhc data that have missing values, but we've omitted one (about day 1 urine output) in creating this rhc_cleaning file. The remaining two are dates, and we'll address those in the material on creating our outcomes.
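For example, the missingness plot mentioned above is a one-liner (naniar is already loaded); a minimal sketch:
# plot the count of missing values for each variable in rhc_cleaning
gg_miss_var(rhc_cleaning)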
The treatment/exposure we are studying is the presence of a Swan-Ganz (right heart) catheter on day 1, captured in the variable swang1. The Medline Plus description of right heart catheterization is here. Basically, the procedure passes a thin tube into the right side of the heart and the arteries leading to the lungs, to measure the heart's function and blood flow. It's done in very ill patients for several reasons, and is also referred to as right heart catheterization and pulmonary artery catheterization.
rhc_cleaning %>% tabyl(swang1)
swang1 n percent
RHC 2184 0.3808195
No RHC 3551 0.6191805
Of course, the most important issue to us is that each subject was not randomly allocated to either the "RHC" or "No RHC" group. Instead, this is an observational study, where each subject received their treatment (RHC or not) on the basis of what their providers thought would be best for them.
We will define three outcomes here: one binary, one quantitative, and one time-to-event.
death
One outcome is death during the study, as captured in the death variable. This is a binary outcome.
rhc_cleaning %>% tabyl(death)
death n percent
No 2013 0.3510026
Yes 3722 0.6489974
Those people without a value for dthdte turn out to be the people with death values of "No", and this makes sense, because dthdte records the date of death.
hospdays
A second outcome is the length of the initial hospital stay, which is captured by subtracting the study admission date (sadmdte) from the hospital discharge date (dschdte). This is a quantitative outcome, and we will create it soon, and name it hospdays in our data set.
rhc_cleaning <- rhc_cleaning %>%
mutate(hospdays = dschdte - sadmdte)
mosaic::favstats(hospdays ~ swang1, data = rhc_cleaning)
swang1 min Q1 median Q3 max mean sd n missing
1 RHC 2 8 16 31 342 24.69139 27.79865 2184 0
2 No RHC 2 7 12 22 338 19.52915 23.58672 3551 0
It looks like all of these values are reasonable (in that they are all positive) and we have no missing values.
survdays
A third outcome is survival time in days (which is censored for all patients who did not die during the study), which is captured by subtracting the study admission date (sadmdte) from the death date (dthdte) for the patients who died in the study, and the study admission date (sadmdte) from the date of last contact (lstctdte) for the patients who did not die in the study. This is a time-to-event outcome, and we will create it soon, and name it survdays in our data set.
rhc_cleaning <- rhc_cleaning %>%
mutate(survdays = ifelse(death == "Yes",
                         dthdte - sadmdte,     # days from admission to death for those who died
                         lstctdte - sadmdte))  # censored at date of last contact otherwise
mosaic::favstats(survdays ~ death, data = rhc_cleaning)
death min Q1 median Q3 max mean sd n missing
1 No 2 187 210 236 1351 231.18082 112.24989 2013 0
2 Yes 2 6 12 23 338 19.99893 24.59443 3722 0
Again, this looks reasonable. All of these values are strictly positive, for instance, and none seem wildly out of line, and we have no missing values. We also see that those who died had much shorter survival times (typically) than those who survived, which makes sense.
We might also look at the survdays variable broken down by RHC status.
mosaic::favstats(survdays ~ swang1, data = rhc_cleaning)
swang1 min Q1 median Q3 max mean sd n missing
1 RHC 2 9 25.5 187 1351 91.75962 129.5763 2184 0
2 No RHC 2 9 25.0 191 1243 95.57871 117.7174 3551 0
These seem quite comparable, so far, and again, they make sense (all are positive, and there are no missing values.)
Our propensity model will eventually include 50 covariates, described below.
Variable | Definition |
---|---|
age | Age in years |
sex | Sex |
edu | Years of Education |
income | Income Category (4 levels) |
ninsclas | Insurance Category (6 levels) |
race | Self-Reported Race (3 levels) |
Here’s a Table 1 for these covariates.
vars01 <- c("age", "sex", "edu", "income", "ninsclas", "race")
table1_01 <-
CreateTableOne(vars = vars01, strata = "swang1",
data = rhc_cleaning, test = FALSE)
print(table1_01, smd = TRUE)
Stratified by swang1
RHC No RHC SMD
n 2184 3551
age (mean (SD)) 60.75 (15.63) 61.76 (17.29) 0.061
sex = Female (%) 906 (41.5) 1637 (46.1) 0.093
edu (mean (SD)) 11.86 (3.16) 11.57 (3.13) 0.091
income (%) 0.142
Under $11k 1145 (52.4) 2081 (58.6)
$11-$25k 452 (20.7) 713 (20.1)
$25-$50k 393 (18.0) 500 (14.1)
> $50k 194 ( 8.9) 257 ( 7.2)
ninsclas (%) 0.194
Private 731 (33.5) 967 (27.2)
Private & Medicare 490 (22.4) 746 (21.0)
Medicare 511 (23.4) 947 (26.7)
Medicare & Medicaid 123 ( 5.6) 251 ( 7.1)
Medicaid 193 ( 8.8) 454 (12.8)
No insurance 136 ( 6.2) 186 ( 5.2)
race (%) 0.036
white 1707 (78.2) 2753 (77.5)
black 335 (15.3) 585 (16.5)
other 142 ( 6.5) 213 ( 6.0)
Variable | Definition |
---|---|
cat1 | Primary disease category (9 levels) |
dnr1 | Do Not Resuscitate status, day 1 |
wtkilo1 | Weight in kg, day 1 |
hrt1 | Heart rate, day 1 |
meanbp1 | Mean Blood Pressure, day 1 |
resp1 | Respiratory rate, day 1 |
temp1 | Temperature, C, day 1 |
vars02 <- c("cat1", "dnr1", "wtkilo1", "hrt1", "meanbp1",
"resp1", "temp1")
table1_02 <-
CreateTableOne(vars = vars02, strata = "swang1",
data = rhc_cleaning, test = FALSE)
print(table1_02, smd = TRUE)
Stratified by swang1
RHC No RHC SMD
n 2184 3551
cat1 (%) 0.583
ARF 909 (41.6) 1581 (44.5)
CHF 209 ( 9.6) 247 ( 7.0)
Cirrhosis 49 ( 2.2) 175 ( 4.9)
Colon Cancer 1 ( 0.0) 6 ( 0.2)
Coma 95 ( 4.3) 341 ( 9.6)
COPD 58 ( 2.7) 399 (11.2)
Lung Cancer 5 ( 0.2) 34 ( 1.0)
MOSF w/Malignancy 158 ( 7.2) 241 ( 6.8)
MOSF w/Sepsis 700 (32.1) 527 (14.8)
dnr1 = Yes (%) 155 ( 7.1) 499 (14.1) 0.228
wtkilo1 (mean (SD)) 72.36 (27.73) 65.04 (29.50) 0.256
hrt1 (mean (SD)) 118.93 (41.47) 112.87 (40.94) 0.147
meanbp1 (mean (SD)) 68.20 (34.24) 84.87 (38.87) 0.455
resp1 (mean (SD)) 26.65 (14.17) 28.98 (13.95) 0.165
temp1 (mean (SD)) 37.59 (1.83) 37.63 (1.74) 0.021
Variable | Definition |
---|---|
card | Cardiovascular diagnosis |
gastr | Gastrointestinal diagnosis |
hema | Hematologic diagnosis |
meta | Metabolic diagnosis |
neuro | Neurological diagnosis |
ortho | Orthopedic diagnosis |
renal | Renal diagnosis |
resp | Respiratory diagnosis |
seps | Sepsis diagnosis |
trauma | Trauma diagnosis |
vars03 <- c("card", "gastr", "hema", "meta", "neuro", "ortho",
"renal", "resp", "seps", "trauma")
table1_03 <-
CreateTableOne(vars = vars03, strata = "swang1",
data = rhc_cleaning, test = FALSE)
print(table1_03, smd = TRUE)
Stratified by swang1
RHC No RHC SMD
n 2184 3551
card = Yes (%) 924 (42.3) 1007 (28.4) 0.295
gastr = Yes (%) 420 (19.2) 522 (14.7) 0.121
hema = Yes (%) 115 ( 5.3) 239 ( 6.7) 0.062
meta = Yes (%) 93 ( 4.3) 172 ( 4.8) 0.028
neuro = Yes (%) 118 ( 5.4) 575 (16.2) 0.353
ortho = Yes (%) 4 ( 0.2) 3 ( 0.1) 0.027
renal = Yes (%) 148 ( 6.8) 147 ( 4.1) 0.116
resp = Yes (%) 632 (28.9) 1481 (41.7) 0.270
seps = Yes (%) 516 (23.6) 515 (14.5) 0.234
trauma = Yes (%) 34 ( 1.6) 18 ( 0.5) 0.104
Variable | Definition |
---|---|
amihx | Definite Myocardial Infarction |
ca | Cancer (3 levels) |
cardiohx | Acute MI, Peripheral Vascular Disease, Severe Cardiovascular Symptoms (NYHA-Class III), Very Severe Cardiovascular Symptoms (NYHA-Class IV) |
chfhx | Congestive Heart Failure |
chrpulhx | Chronic Pulmonary Disease, Severe or Very Severe Pulmonary Disease |
dementhx | Dementia, Stroke or Cerebral Infarct, Parkinson's Disease |
gibledhx | Upper GI Bleeding |
immunhx | Immunosuppression, Organ Transplant, HIV Positivity, Diabetes Mellitus Without End Organ Damage, Diabetes Mellitus With End Organ Damage, Connective Tissue Disease |
liverhx | Cirrhosis, Hepatic Failure |
malighx | Solid Tumor, Metastatic Disease, Chronic Leukemia/Myeloma, Acute Leukemia, Lymphoma |
psychhx | Psychiatric History, Active Psychosis or Severe Depression |
renalhx | Chronic Renal Disease, Chronic Hemodialysis or Peritoneal Dialysis |
transhx | Transfer (> 24 Hours) from Another Hospital |
vars04 <- c("amihx", "ca", "cardiohx", "chfhx", "chrpulhx", "dementhx",
"gibledhx", "immunhx", "liverhx", "malighx", "psychhx", "renalhx", "transhx")
table1_04 <-
CreateTableOne(vars = vars04, strata = "swang1",
data = rhc_cleaning, test = FALSE)
print(table1_04, smd = TRUE)
Stratified by swang1
RHC No RHC SMD
n 2184 3551
amihx (mean (SD)) 0.04 (0.20) 0.03 (0.17) 0.074
ca (%) 0.107
No 1727 (79.1) 2652 (74.7)
Yes 334 (15.3) 638 (18.0)
Metastatic 123 ( 5.6) 261 ( 7.4)
cardiohx (mean (SD)) 0.20 (0.40) 0.16 (0.37) 0.116
chfhx (mean (SD)) 0.19 (0.40) 0.17 (0.37) 0.069
chrpulhx (mean (SD)) 0.14 (0.35) 0.22 (0.41) 0.192
dementhx (mean (SD)) 0.07 (0.25) 0.12 (0.32) 0.163
gibledhx (mean (SD)) 0.02 (0.16) 0.04 (0.19) 0.070
immunhx (mean (SD)) 0.29 (0.45) 0.26 (0.44) 0.080
liverhx (mean (SD)) 0.06 (0.24) 0.07 (0.26) 0.049
malighx (mean (SD)) 0.20 (0.40) 0.25 (0.43) 0.101
psychhx (mean (SD)) 0.05 (0.21) 0.08 (0.27) 0.143
renalhx (mean (SD)) 0.05 (0.21) 0.04 (0.20) 0.032
transhx (mean (SD)) 0.15 (0.36) 0.09 (0.29) 0.170
Variable | Definition |
---|---|
aps1 | APACHE III score ignoring Coma, day 1 |
das2d3pc | DASI (Duke Activity Status Index prior to admission) |
scoma1 | Support Coma score based on Glasgow, day 1 |
surv2md1 | SUPPORT model estimate of Prob(surviving 2 months) |
vars05 <- c("aps1", "das2d3pc", "scoma1", "surv2md1")
table1_05 <-
CreateTableOne(vars = vars05, strata = "swang1",
data = rhc_cleaning, test = FALSE)
print(table1_05, smd = TRUE)
Stratified by swang1
RHC No RHC SMD
n 2184 3551
aps1 (mean (SD)) 60.74 (20.27) 50.93 (18.81) 0.501
das2d3pc (mean (SD)) 20.70 (5.03) 20.37 (5.48) 0.063
scoma1 (mean (SD)) 18.97 (28.26) 22.25 (31.37) 0.110
surv2md1 (mean (SD)) 0.57 (0.20) 0.61 (0.19) 0.198
Variable | Definition |
---|---|
alb1 | Albumin |
bili1 | Bilirubin |
crea1 | Serum Creatinine |
hema1 | Hematocrit |
paco21 | PaCO2 |
pafi1 | PaO2/(.01 FIO2) |
ph1 | Serum pH (arterial) |
pot1 | Serum Potassium |
sod1 | Serum Sodium |
wblc1 | White blood cell count (4 subjects with value 0) |
vars06 <- c("alb1", "bili1", "crea1", "hema1", "paco21",
"pafi1", "pot1", "sod1", "wblc1")
table1_06 <-
CreateTableOne(vars = vars06, strata = "swang1",
data = rhc_cleaning, test = FALSE)
print(table1_06, smd = TRUE)
Stratified by swang1
RHC No RHC SMD
n 2184 3551
alb1 (mean (SD)) 2.98 (0.93) 3.16 (0.67) 0.230
bili1 (mean (SD)) 2.71 (5.33) 2.00 (4.43) 0.145
crea1 (mean (SD)) 2.47 (2.05) 1.92 (2.03) 0.270
hema1 (mean (SD)) 30.51 (7.42) 32.70 (8.79) 0.269
paco21 (mean (SD)) 36.79 (10.97) 39.95 (14.24) 0.249
pafi1 (mean (SD)) 192.43 (105.54) 240.63 (116.66) 0.433
pot1 (mean (SD)) 4.05 (1.01) 4.08 (1.04) 0.027
sod1 (mean (SD)) 136.33 (7.60) 137.04 (7.68) 0.092
wblc1 (mean (SD)) 16.27 (12.55) 15.26 (11.41) 0.084
So, in all, 31 of the 50 covariates show standardized mean differences that exceed 0.10 before any propensity score adjustment.
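One way to tally that count is to pull the standardized mean differences from the six tables above with tableone's ExtractSmd() and count how many exceed 0.10. This is a minimal sketch (note that ph1 does not appear in vars06, so it covers 49 of the 50 model covariates):
# gather SMDs from the six Table 1 pieces and count those above 0.10
all_smd <- c(ExtractSmd(table1_01), ExtractSmd(table1_02),
             ExtractSmd(table1_03), ExtractSmd(table1_04),
             ExtractSmd(table1_05), ExtractSmd(table1_06))
sum(all_smd > 0.10)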
For our outcome, and our treatment, it will be useful to have 1/0 versions of the variables, and it will also be helpful to have the factor versions releveled to put the outcome of interest (death) first and the treatment of interest (RHC) first.
rhc <- rhc_cleaning %>%
mutate(treat_rhc = as.numeric(swang1 == "RHC"),
swang1 = fct_relevel(swang1, "RHC"),
death = fct_relevel(death, "Yes"),
died = as.numeric(death == "Yes"))
saveRDS(rhc, here("data", "rhc.Rds"))
I’m going to set a seed for random numbers before we start, which I can change later if I like.
set.seed(50012345)
Before we estimate a propensity score, we’ll perform unadjusted assessments to describe the impact of RHC (or not) on our three outcomes, without adjustment for any covariates at all.
rhc %>% tabyl(swang1, death) %>%
adorn_totals() %>%
adorn_percentages() %>%
adorn_pct_formatting() %>%
adorn_ns(position = "front") %>%
adorn_title
death
swang1 Yes No
RHC 1486 (68.0%) 698 (32.0%)
No RHC 2236 (63.0%) 1315 (37.0%)
Total 3722 (64.9%) 2013 (35.1%)
The odds ratio of death associated with RHC as opposed to No RHC should be larger than 1; specifically, it should be
\[ \frac{1486 \times 1315}{2236 \times 698} = 1.25 \]
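As a quick arithmetic check, we can compute that cross-product ratio directly in R (a minimal sketch using the counts from the table above):
# sample odds ratio of death for RHC vs. No RHC, from the 2x2 table
(1486 * 1315) / (2236 * 698)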
Let’s see if that’s what we get.
death_unadj <- glm(died ~ treat_rhc, data = rhc, family = binomial())
death_unadj
Call: glm(formula = died ~ treat_rhc, family = binomial(), data = rhc)
Coefficients:
(Intercept) treat_rhc
0.5309 0.2248
Degrees of Freedom: 5734 Total (i.e. Null); 5733 Residual
Null Deviance: 7433
Residual Deviance: 7418 AIC: 7422
tidy(death_unadj, conf.int = TRUE, exponentiate = TRUE)
# A tibble: 2 x 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 1.70 0.0348 15.3 1.11e-52 1.59 1.82
2 treat_rhc 1.25 0.0576 3.90 9.43e- 5 1.12 1.40
Alternatively, we could just work with the factors, but then it's important to be sure what is actually being modeled. Do this by expressing the outcome and the treatment as logical statements, as follows.
modelA_unadj1 <- glm((death == "Yes") ~ (swang1 == "RHC"), data = rhc, family = binomial())
tidy(modelA_unadj1, conf.int = TRUE, exponentiate = TRUE)
# A tibble: 2 x 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 "(Intercept)" 1.70 0.0348 15.3 1.11e-52 1.59 1.82
2 "swang1 == \"RHC\"TR~ 1.25 0.0576 3.90 9.43e- 5 1.12 1.40
Failing to do this can cause trouble:
modelA_unadj_bad <- glm(death ~ swang1, data = rhc, family = binomial())
tidy(modelA_unadj_bad, conf.int = TRUE, exponentiate = TRUE)
# A tibble: 2 x 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.470 0.0459 -16.5 6.32e-61 0.429 0.514
2 swang1No RHC 1.25 0.0576 3.90 9.43e- 5 1.12 1.40
Note that this last version is actually predicting “Death = No” on the basis of “swang1 = No RHC”, which can be extremely confusing.
And if you did this:
modelA_unadj_bad2 <- glm(death ~ swang1 == "RHC", data = rhc, family = binomial())
tidy(modelA_unadj_bad2, conf.int = TRUE, exponentiate = TRUE)
# A tibble: 2 x 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 "(Intercept)" 0.588 0.0348 -15.3 1.11e-52 0.549 0.629
2 "swang1 == \"RHC\"TR~ 0.799 0.0576 -3.90 9.43e- 5 0.713 0.894
Now, you’ve inverted the odds ratio! Not good. So, either use 0 and 1 (with 1 being the result you want to study), or use the factors but specify the direction for both treatment and outcome as a logical statement.
hospdays_unadj <- lm(hospdays ~ treat_rhc, data = rhc)
hospdays_unadj
Call:
lm(formula = hospdays ~ treat_rhc, data = rhc)
Coefficients:
(Intercept) treat_rhc
19.529 5.162
tidy(hospdays_unadj, conf.int = TRUE)
# A tibble: 2 x 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 19.5 0.424 46.0 0 18.7 20.4
2 treat_rhc 5.16 0.687 7.51 6.76e-14 3.81 6.51
Here's the data on survdays and death for our first six patients:
rhc %>% select(survdays, death) %>% head()
# A tibble: 6 x 2
survdays death
<dbl> <fct>
1 240 No
2 45 Yes
3 317 No
4 37 Yes
5 2 Yes
6 201 No
Here, we have a (right-censored) time-to-event outcome, called survdays, and an indicator of censoring (death, which is "Yes" if the patient's time is real and not censored). So the first subject should have a time to death of 240 or more days, because they were still alive when the study ended, while the second should have a time that is exactly 45 days, because that person died during the study. The survival object we need, then, is:
Surv(rhc$survdays, rhc$death == "Yes") %>% head()
[1] 240+ 45 317+ 37 2 201+
survdays_unadj <- coxph(Surv(survdays, death == "Yes") ~ treat_rhc, data = rhc)
survdays_unadj
Call:
coxph(formula = Surv(survdays, death == "Yes") ~ treat_rhc, data = rhc)
coef exp(coef) se(coef) z p
treat_rhc 0.07841 1.08156 0.03347 2.342 0.0192
Likelihood ratio test=5.46 on 1 df, p=0.0195
n= 5735, number of events= 3722
tidy(survdays_unadj, conf.int = TRUE, exponentiate = TRUE)
# A tibble: 1 x 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 treat_rhc 1.08 0.0335 2.34 0.0192 1.01 1.15
propensity_model <-
glm(swang1 == "RHC" ~
age + sex + edu + income + ninsclas + race +
cat1 + dnr1 + wtkilo1 + hrt1 + meanbp1 +
resp1 + temp1 +
card + gastr + hema + meta + neuro + ortho +
renal + resp + seps + trauma +
amihx + ca + cardiohx + chfhx + chrpulhx +
dementhx + gibledhx + immunhx + liverhx +
malighx + psychhx + renalhx + transhx +
aps1 + das2d3pc + scoma1 + surv2md1 +
alb1 + bili1 + crea1 + hema1 + paco21 + pafi1 +
ph1 + pot1 + sod1 + wblc1,
family = binomial(link = "logit"), data = rhc)
rhc <- rhc %>%
mutate(ps = fitted(propensity_model),
linps = propensity_model$linear.predictors)
rhc %>% select(ptid, ps, linps) %>% head()
# A tibble: 6 x 3
ptid ps linps
<chr> <dbl> <dbl>
1 00005 0.351 -0.615
2 00007 0.669 0.704
3 00009 0.634 0.547
4 00010 0.369 -0.537
5 00011 0.446 -0.218
6 00012 0.0553 -2.84
ggplot(rhc, aes(x = ps, fill = swang1)) +
geom_density(alpha = 0.5) +
scale_fill_viridis_d(option = "plasma") +
theme_bw()
ggplot(rhc, aes(x = swang1, y = ps)) +
geom_violin(aes(fill = swang1)) +
geom_boxplot(width = 0.2) +
scale_fill_viridis_d(option = "plasma") +
guides(fill = "none") +
coord_flip() +
theme_bw()
OK. We're trying to use ps to estimate the probability of having an RHC, and it looks like the people who actually had RHC have higher propensity scores, on average, so that's promising.
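As a quick numerical check (a minimal sketch using dplyr, which is already loaded), we can summarize the fitted propensity scores by exposure group:
# compare propensity score summaries between the RHC and No RHC groups
rhc %>%
  group_by(swang1) %>%
  summarise(mean_ps = mean(ps), median_ps = median(ps))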
Rubin Rule 1 result should ideally be near 0, and certainly between -50 and +50.
rubin1.unadj <- with(rhc,
abs(100*(mean(linps[swang1=="RHC"]) -
mean(linps[swang1=="No RHC"]))/
sd(linps)))
rubin1.unadj
[1] 101.583
Rubin Rule 2 result should ideally be near 1, and certainly between 1/2 and 2.
rubin2.unadj <- with(rhc,
var(linps[swang1 == "RHC"]) /
var(linps[swang1 == "No RHC"]))
rubin2.unadj
[1] 0.7359463
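Since we will repeat these two checks after each match, it may help to wrap them in a small helper function. This is a minimal sketch (the name rubin_rules is my own, not from any package), assuming a data frame that contains swang1 and linps as defined above:
# hypothetical helper: compute Rubin's Rules 1 and 2 for a data frame
# containing swang1 (factor: "RHC"/"No RHC") and linps
rubin_rules <- function(data) {
  with(data, {
    r1 <- abs(100 * (mean(linps[swang1 == "RHC"]) -
                       mean(linps[swang1 == "No RHC"])) / sd(linps))
    r2 <- var(linps[swang1 == "RHC"]) / var(linps[swang1 == "No RHC"])
    c(rubin1 = r1, rubin2 = r2)
  })
}
rubin_rules(rhc)   # should reproduce the unadjusted values above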
So, we’ve got some work to do. Let’s try matching. In several different ways.
X <- rhc$linps
Tr <- as.logical(rhc$swang1 == "RHC")
match1 <- Match(Tr = Tr, X = X, M = 1,
estimand = "ATT", replace = FALSE,
ties = FALSE)
summary(match1)
Estimate... 0
SE......... 0
T-stat..... NaN
p.val...... NA
Original number of observations.............. 5735
Original number of treated obs............... 2184
Matched number of observations............... 2184
Matched number of observations (unweighted). 2184
b1 <- bal.tab(match1,
swang1 == "RHC" ~
age + sex + edu + income + ninsclas + race +
cat1 + dnr1 + wtkilo1 + hrt1 + meanbp1 +
resp1 + temp1 +
card + gastr + hema + meta + neuro + ortho +
renal + resp + seps + trauma +
amihx + ca + cardiohx + chfhx + chrpulhx +
dementhx + gibledhx + immunhx + liverhx +
malighx + psychhx + renalhx + transhx +
aps1 + das2d3pc + scoma1 + surv2md1 +
alb1 + bili1 + crea1 + hema1 + paco21 + pafi1 +
ph1 + pot1 + sod1 + wblc1 + ps + linps,
data=rhc, un = TRUE)
love.plot(b1, threshold = .1, size = 1.5, stars = "raw",
var.order = "unadjusted",
title = "Standardized Differences and Match 1") +
theme_bw()
Create a new data frame, containing only the matched sample.
matches_1 <- factor(rep(match1$index.treated, 2))
rhc.matches_1 <- cbind(matches_1,
rhc[c(match1$index.control, match1$index.treated),])
Rubin Rule 1 result after matching should ideally be near 0, and certainly between -50 and +50.
rubin1.m1 <- with(rhc.matches_1,
abs(100*(mean(linps[swang1=="RHC"]) -
mean(linps[swang1=="No RHC"]))/
sd(linps)))
rubin1.m1
[1] 61.88315
Rubin Rule 2 result after matching should ideally be near 1, and certainly between 1/2 and 2.
rubin2.m1 <- with(rhc.matches_1,
var(linps[swang1 == "RHC"]) /
var(linps[swang1 == "No RHC"]))
rubin2.m1
[1] 1.687288
X <- rhc$linps
Tr <- as.logical(rhc$swang1 == "RHC")
match2 <- Match(Tr = Tr, X = X, M = 2,
estimand = "ATT", replace = FALSE,
ties = FALSE)
summary(match2)
Estimate... 0
SE......... 0
T-stat..... NaN
p.val...... NA
Original number of observations.............. 5735
Original number of treated obs............... 2184
Matched number of observations............... 1775
Matched number of observations (unweighted). 3550
b2 <- bal.tab(match2,
swang1 == "RHC" ~
age + sex + edu + income + ninsclas + race +
cat1 + dnr1 + wtkilo1 + hrt1 + meanbp1 +
resp1 + temp1 +
card + gastr + hema + meta + neuro + ortho +
renal + resp + seps + trauma +
amihx + ca + cardiohx + chfhx + chrpulhx +
dementhx + gibledhx + immunhx + liverhx +
malighx + psychhx + renalhx + transhx +
aps1 + das2d3pc + scoma1 + surv2md1 +
alb1 + bili1 + crea1 + hema1 + paco21 + pafi1 +
ph1 + pot1 + sod1 + wblc1 + ps + linps,
data=rhc, un = TRUE)
love.plot(b2, threshold = .1, size = 1.5, stars = "raw",
var.order = "unadjusted",
title = "Standardized Differences and Match 2") +
theme_bw()
Create a new data frame, containing only the matched sample.
matches_2 <- factor(rep(match2$index.treated, 2))
rhc.matches_2 <- cbind(matches_2,
rhc[c(match2$index.control, match2$index.treated),])
Rubin Rule 1 result after matching should ideally be near 0, and certainly between -50 and +50.
rubin1.m2 <- with(rhc.matches_2,
abs(100*(mean(linps[swang1=="RHC"]) -
mean(linps[swang1=="No RHC"]))/
sd(linps)))
rubin1.m2
[1] 101.2148
Rubin Rule 2 result after matching should ideally be near 1, and certainly between 1/2 and 2.
rubin2.m2 <- with(rhc.matches_2,
var(linps[swang1 == "RHC"]) /
var(linps[swang1 == "No RHC"]))
rubin2.m2
[1] 0.7569187
The GenMatch function finds optimal balance using multivariate matching, where a genetic search algorithm determines the weight that each covariate is given.
Please note that I am using much too small values of pop.size (especially) and probably max.generations, too, so that things run quickly. I am also only matching on the propensity score and linear propensity score, rather than on the individual covariates. For a course project, I recommend you try matching on the individual covariates if you can, but this reduction in the parameter values is fine.
For actual publications, follow the recommendations of the GenMatch algorithm, as described in the help files for the Matching and rgenoud packages. The default values (which may be too small themselves) are 100 for both pop.size and max.generations.
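For reference, a fuller call along those lines might look like the sketch below. It is not run here (it can take a very long time); the covariates are passed as a numeric matrix built with model.matrix(), and the pop.size and max.generations values are just the package defaults mentioned above.
# sketch only: genetic matching on the individual covariates plus the
# linear propensity score, with default search settings (slow!)
X_full <- cbind(model.matrix(propensity_model)[, -1], linps = rhc$linps)
Tr <- as.logical(rhc$swang1 == "RHC")
genout_full <- GenMatch(Tr = Tr, X = X_full,
                        estimand = "ATT", M = 1,
                        pop.size = 100, max.generations = 100,
                        wait.generations = 4, verbose = FALSE)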
X <- cbind(rhc$linps, rhc$ps)
Tr <- as.logical(rhc$swang1 == "RHC")
genout3 <- GenMatch(Tr = Tr, X = X,
estimand = "ATT", M = 1,
pop.size = 10, max.generations = 10,
wait.generations = 4, verbose = FALSE)
Thu Feb 24 16:28:58 2022
Domains:
0.000000e+00 <= X1 <= 1.000000e+03
0.000000e+00 <= X2 <= 1.000000e+03
Data Type: Floating Point
Operators (code number, name, population)
(1) Cloning........................... 0
(2) Uniform Mutation.................. 1
(3) Boundary Mutation................. 1
(4) Non-Uniform Mutation.............. 1
(5) Polytope Crossover................ 1
(6) Simple Crossover.................. 2
(7) Whole Non-Uniform Mutation........ 1
(8) Heuristic Crossover............... 2
(9) Local-Minimum Crossover........... 0
SOFT Maximum Number of Generations: 10
Maximum Nonchanging Generations: 4
Population size : 10
Convergence Tolerance: 1.000000e-03
Not Using the BFGS Derivative Based Optimizer on the Best Individual Each Generation.
Not Checking Gradients before Stopping.
Using Out of Bounds Individuals.
Maximization Problem.
GENERATION: 0 (initializing the population)
Lexical Fit..... 9.956314e-02 1.374329e-01 1.000000e+00 1.000000e+00
#unique......... 10, #Total UniqueCount: 10
var 1:
best............ 2.557446e+02
mean............ 4.477418e+02
variance........ 9.086363e+04
var 2:
best............ 1.546012e+02
mean............ 2.039613e+02
variance........ 1.939734e+04
GENERATION: 1
Lexical Fit..... 1.311405e-01 2.250274e-01 1.000000e+00 1.000000e+00
#unique......... 5, #Total UniqueCount: 15
var 1:
best............ 1.182693e+02
mean............ 2.121052e+02
variance........ 6.281992e+03
var 2:
best............ 8.855767e+02
mean............ 2.551477e+02
variance........ 4.781941e+04
GENERATION: 2
Lexical Fit..... 1.311405e-01 2.250274e-01 1.000000e+00 1.000000e+00
#unique......... 8, #Total UniqueCount: 23
var 1:
best............ 1.182693e+02
mean............ 1.605890e+02
variance........ 6.899419e+03
var 2:
best............ 8.855767e+02
mean............ 7.395392e+02
variance........ 7.794576e+04
GENERATION: 3
Lexical Fit..... 1.311405e-01 2.250274e-01 1.000000e+00 1.000000e+00
#unique......... 8, #Total UniqueCount: 31
var 1:
best............ 1.182693e+02
mean............ 1.654700e+02
variance........ 2.088027e+04
var 2:
best............ 8.855767e+02
mean............ 8.976217e+02
variance........ 5.297551e+02
GENERATION: 4
Lexical Fit..... 1.311405e-01 2.250274e-01 1.000000e+00 1.000000e+00
#unique......... 7, #Total UniqueCount: 38
var 1:
best............ 1.182693e+02
mean............ 1.682741e+02
variance........ 2.316481e+04
var 2:
best............ 8.855767e+02
mean............ 8.898565e+02
variance........ 2.827973e+01
GENERATION: 5
Lexical Fit..... 1.311405e-01 2.250274e-01 1.000000e+00 1.000000e+00
#unique......... 7, #Total UniqueCount: 45
var 1:
best............ 1.182693e+02
mean............ 1.108716e+02
variance........ 3.545226e+02
var 2:
best............ 8.855767e+02
mean............ 8.903722e+02
variance........ 4.330907e+01
GENERATION: 6
Lexical Fit..... 1.311405e-01 2.250274e-01 1.000000e+00 1.000000e+00
#unique......... 9, #Total UniqueCount: 54
var 1:
best............ 1.182693e+02
mean............ 2.017702e+02
variance........ 6.331150e+04
var 2:
best............ 8.855767e+02
mean............ 8.819291e+02
variance........ 2.067346e+02
'wait.generations' limit reached.
No significant improvement in 4 generations.
Solution Lexical Fitness Value:
1.311405e-01 2.250274e-01 1.000000e+00 1.000000e+00
Parameters at the Solution:
X[ 1] : 1.182693e+02
X[ 2] : 8.855767e+02
Solution Found Generation 1
Number of Generations Run 6
Thu Feb 24 16:29:21 2022
Total run time : 0 hours 0 minutes and 23 seconds
match3 <- Match(Tr = Tr, X = X, estimand = "ATT",
Weight.matrix = genout3)
summary(match3)
Estimate... 0
SE......... 0
T-stat..... NaN
p.val...... NA
Original number of observations.............. 5735
Original number of treated obs............... 2184
Matched number of observations............... 2184
Matched number of observations (unweighted). 2259
b3 <- bal.tab(match3,
swang1 == "RHC" ~
age + sex + edu + income + ninsclas + race +
cat1 + dnr1 + wtkilo1 + hrt1 + meanbp1 +
resp1 + temp1 +
card + gastr + hema + meta + neuro + ortho +
renal + resp + seps + trauma +
amihx + ca + cardiohx + chfhx + chrpulhx +
dementhx + gibledhx + immunhx + liverhx +
malighx + psychhx + renalhx + transhx +
aps1 + das2d3pc + scoma1 + surv2md1 +
alb1 + bili1 + crea1 + hema1 + paco21 + pafi1 +
ph1 + pot1 + sod1 + wblc1 + ps + linps,
data=rhc, un = TRUE)
love.plot(b3, threshold = .1, size = 1.5, stars = "raw",
var.order = "unadjusted",
title = "Standardized Differences and Match 3") +
theme_bw()
Create a new data frame, containing only the matched sample.
matches_3 <- factor(rep(match3$index.treated, 2))
rhc.matches_3 <- cbind(matches_3,
rhc[c(match3$index.control, match3$index.treated),])
Rubin Rule 1 result after matching should ideally be near 0, and certainly between -50 and +50.
rubin1.m3 <- with(rhc.matches_3,
abs(100*(mean(linps[swang1=="RHC"]) -
mean(linps[swang1=="No RHC"]))/
sd(linps)))
rubin1.m3
[1] 0.04940705
Rubin Rule 2 result after matching should ideally be near 1, and certainly between 1/2 and 2.
rubin2.m3 <- with(rhc.matches_3,
var(linps[swang1 == "RHC"]) /
var(linps[swang1 == "No RHC"]))
rubin2.m3
[1] 1.002561
X <- rhc$linps
Tr <- as.logical(rhc$swang1 == "RHC")
match4 <- Match(Tr = Tr, X = X, M = 1,
estimand = "ATT", replace = TRUE,
ties = FALSE)
summary(match4)
Estimate... 0
SE......... 0
T-stat..... NaN
p.val...... NA
Original number of observations.............. 5735
Original number of treated obs............... 2184
Matched number of observations............... 2184
Matched number of observations (unweighted). 2184
b4 <- bal.tab(match4,
swang1 == "RHC" ~
age + sex + edu + income + ninsclas + race +
cat1 + dnr1 + wtkilo1 + hrt1 + meanbp1 +
resp1 + temp1 +
card + gastr + hema + meta + neuro + ortho +
renal + resp + seps + trauma +
amihx + ca + cardiohx + chfhx + chrpulhx +
dementhx + gibledhx + immunhx + liverhx +
malighx + psychhx + renalhx + transhx +
aps1 + das2d3pc + scoma1 + surv2md1 +
alb1 + bili1 + crea1 + hema1 + paco21 + pafi1 +
ph1 + pot1 + sod1 + wblc1 + ps + linps,
data=rhc, un = TRUE)
love.plot(b4, threshold = .1, size = 1.5, stars = "raw",
var.order = "unadjusted",
title = "Standardized Differences and Match 4") +
theme_bw()
Create a new data frame, containing only the matched sample.
matches_4 <- factor(rep(match4$index.treated, 2))
rhc.matches_4 <- cbind(matches_4,
rhc[c(match4$index.control, match4$index.treated),])
Rubin Rule 1 result after matching should ideally be near 0, and certainly between -50 and +50.
rubin1.m4 <- with(rhc.matches_4,
abs(100*(mean(linps[swang1=="RHC"]) -
mean(linps[swang1=="No RHC"]))/
sd(linps)))
rubin1.m4
[1] 0.05521448
Rubin Rule 2 result after matching should ideally be near 1, and certainly between 1/2 and 2.
rubin2.m4 <- with(rhc.matches_4,
var(linps[swang1 == "RHC"]) /
var(linps[swang1 == "No RHC"]))
rubin2.m4
[1] 1.002868
X <- rhc$linps
Tr <- as.logical(rhc$swang1 == "RHC")
match5 <- Match(Tr = Tr, X = X, M = 1,
caliper = 0.2,
estimand = "ATT", replace = FALSE,
ties = FALSE)
summary(match5)
Estimate... 0
SE......... 0
T-stat..... NaN
p.val...... NA
Original number of observations.............. 5735
Original number of treated obs............... 2184
Matched number of observations............... 1563
Matched number of observations (unweighted). 1563
Caliper (SDs)........................................ 0.2
Number of obs dropped by 'exact' or 'caliper' 621
b5 <- bal.tab(match5,
swang1 == "RHC" ~
age + sex + edu + income + ninsclas + race +
cat1 + dnr1 + wtkilo1 + hrt1 + meanbp1 +
resp1 + temp1 +
card + gastr + hema + meta + neuro + ortho +
renal + resp + seps + trauma +
amihx + ca + cardiohx + chfhx + chrpulhx +
dementhx + gibledhx + immunhx + liverhx +
malighx + psychhx + renalhx + transhx +
aps1 + das2d3pc + scoma1 + surv2md1 +
alb1 + bili1 + crea1 + hema1 + paco21 + pafi1 +
ph1 + pot1 + sod1 + wblc1 + ps + linps,
data=rhc, un = TRUE)
love.plot(b5, threshold = .1, size = 1.5, stars = "raw",
var.order = "unadjusted",
title = "Standardized Differences and Match 5") +
theme_bw()
Create a new data frame, containing only the matched sample.
matches_5 <- factor(rep(match5$index.treated, 2))
rhc.matches_5 <- cbind(matches_5,
rhc[c(match5$index.control, match5$index.treated),])
Rubin Rule 1 result after matching should ideally be near 0, and certainly between -50 and +50.
rubin1.m5 <- with(rhc.matches_5,
abs(100*(mean(linps[swang1=="RHC"]) -
mean(linps[swang1=="No RHC"]))/
sd(linps)))
rubin1.m5
[1] 1.697439
Rubin Rule 2 result after matching should ideally be near 1, and certainly between 1/2 and 2.
rubin2.m5 <- with(rhc.matches_5,
var(linps[swang1 == "RHC"]) /
var(linps[swang1 == "No RHC"]))
rubin2.m5
[1] 1.027527
X <- rhc$linps
Tr <- as.logical(rhc$swang1 == "RHC")
match6 <- Match(Tr = Tr, X = X, M = 2,
estimand = "ATT", replace = TRUE,
ties = FALSE)
summary(match6)
Estimate... 0
SE......... 0
T-stat..... NaN
p.val...... NA
Original number of observations.............. 5735
Original number of treated obs............... 2184
Matched number of observations............... 2184
Matched number of observations (unweighted). 4368
b6 <- bal.tab(match6,
swang1 == "RHC" ~
age + sex + edu + income + ninsclas + race +
cat1 + dnr1 + wtkilo1 + hrt1 + meanbp1 +
resp1 + temp1 +
card + gastr + hema + meta + neuro + ortho +
renal + resp + seps + trauma +
amihx + ca + cardiohx + chfhx + chrpulhx +
dementhx + gibledhx + immunhx + liverhx +
malighx + psychhx + renalhx + transhx +
aps1 + das2d3pc + scoma1 + surv2md1 +
alb1 + bili1 + crea1 + hema1 + paco21 + pafi1 +
ph1 + pot1 + sod1 + wblc1 + ps + linps,
data=rhc, un = TRUE)
love.plot(b6, threshold = .1, size = 1.5, stars = "raw",
var.order = "unadjusted",
title = "Standardized Differences and Match 6") +
theme_bw()
Create a new data frame, containing only the matched sample.
matches_6 <- factor(rep(match6$index.treated, 2))
rhc.matches_6 <- cbind(matches_6,
rhc[c(match6$index.control, match6$index.treated),])
Rubin Rule 1 result after matching should ideally be near 0, and certainly between -50 and +50.
rubin1.m6 <- with(rhc.matches_6,
abs(100*(mean(linps[swang1=="RHC"]) -
mean(linps[swang1=="No RHC"]))/
sd(linps)))
rubin1.m6
[1] 0.1186399
Rubin Rule 2 result after matching should ideally be near 1, and certainly between 1/2 and 2.
rubin2.m6 <- with(rhc.matches_6,
var(linps[swang1 == "RHC"]) /
var(linps[swang1 == "No RHC"]))
rubin2.m6
[1] 1.005432
Match | Rubin 1 | Rubin 2 | Love Plot | Matched Sets | Description |
---|---|---|---|---|---|
None | 101.6 | 0.74 | – | – | No Matching |
1 | 61.9 | 1.69 | Pretty Bad | 2,184 | 1:1 greedy w/o repl |
2 | 101.2 | 0.76 | Dreadful | 1,775 | 1:2 greedy w/o repl |
3 | 0 | 1 | Very Good | 2,184 | Genetic Search 1:1 w/o repl |
4 | 0.1 | 1 | Very Good | 2,184 | 1:1 greedy with repl |
5 | 1.7 | 1.03 | Very Good | 1,563 | 1:1 caliper without repl |
6 | 0.1 | 1.01 | Very Good | 2,184 | 1:2 greedy with repl |
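The Rubin Rule columns above come from the values computed earlier; a small sketch like the following gathers them into one tibble for easy comparison:
# collect the Rubin Rule 1 and 2 checks computed above
tibble(match  = c("None", "Match 1", "Match 2", "Match 3",
                  "Match 4", "Match 5", "Match 6"),
       rubin1 = c(rubin1.unadj, rubin1.m1, rubin1.m2, rubin1.m3,
                  rubin1.m4, rubin1.m5, rubin1.m6),
       rubin2 = c(rubin2.unadj, rubin2.m1, rubin2.m2, rubin2.m3,
                  rubin2.m4, rubin2.m5, rubin2.m6))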
Note that the variable which identifies the matches (called matches_1) is set up as a factor, as it needs to be, and that I'm using the 1/0 versions of both the outcome (so died rather than death) and the treatment (so treat_rhc rather than swang1).
death_adj_m1 <- clogit(died ~ treat_rhc + strata(matches_1), data = rhc.matches_1)
summary(death_adj_m1)
Call:
coxph(formula = Surv(rep(1, 4368L), died) ~ treat_rhc + strata(matches_1),
data = rhc.matches_1, method = "exact")
n= 4368, number of events= 2861
coef exp(coef) se(coef) z Pr(>|z|)
treat_rhc 0.23156 1.26056 0.06488 3.569 0.000358 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
treat_rhc 1.261 0.7933 1.11 1.432
Concordance= 0.558 (se = 0.023 )
Likelihood ratio test= 12.82 on 1 df, p=3e-04
Wald test = 12.74 on 1 df, p=4e-04
Score (logrank) test = 12.79 on 1 df, p=3e-04
The odds ratio in the exp(coef) section above is the average causal effect estimate. It describes the odds of having the event (died) occur for an RHC subject (as identified by treat_rhc), as compared to the odds of that event for a non-RHC subject. We can tidy this with:
tidy(death_adj_m1, exponentiate = TRUE, conf.level = 0.95)
# A tibble: 1 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 treat_rhc 1.26 0.0649 3.57 0.000358
death_adj_m2 <- clogit(died ~ treat_rhc + strata(matches_2), data = rhc.matches_2)
tidy(death_adj_m2, exponentiate = TRUE, conf.level = 0.95)
# A tibble: 1 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 treat_rhc 1.33 0.0548 5.27 0.000000137
death_adj_m3 <- clogit(died ~ treat_rhc + strata(matches_3), data = rhc.matches_3)
tidy(death_adj_m3, exponentiate = TRUE, conf.level = 0.95)
# A tibble: 1 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 treat_rhc 1.32 0.0634 4.35 0.0000137
death_adj_m4 <- clogit(died ~ treat_rhc + strata(matches_4), data = rhc.matches_4)
tidy(death_adj_m4, exponentiate = TRUE, conf.level = 0.95)
# A tibble: 1 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 treat_rhc 1.23 0.0636 3.22 0.00127
death_adj_m5 <- clogit(died ~ treat_rhc + strata(matches_5), data = rhc.matches_5)
tidy(death_adj_m5, exponentiate = TRUE, conf.level = 0.95)
# A tibble: 1 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 treat_rhc 1.33 0.0754 3.76 0.000170
death_adj_m6 <- clogit(died ~ treat_rhc + strata(matches_6), data = rhc.matches_6)
tidy(death_adj_m6, exponentiate = TRUE, conf.level = 0.95)
# A tibble: 1 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 treat_rhc 1.36 0.0491 6.25 4.18e-10
temp0 <- tidy(death_unadj, conf.int = TRUE, exponentiate = TRUE) %>%
filter(term == "treat_rhc")
temp1 <- tidy(death_adj_m1, exponentiate = TRUE, conf.int = T, conf.level = 0.95)
temp2 <- tidy(death_adj_m2, exponentiate = TRUE, conf.int = T, conf.level = 0.95)
temp3 <- tidy(death_adj_m3, exponentiate = TRUE, conf.int = T, conf.level = 0.95)
temp4 <- tidy(death_adj_m4, exponentiate = TRUE, conf.int = T, conf.level = 0.95)
temp5 <- tidy(death_adj_m5, exponentiate = TRUE, conf.int = T, conf.level = 0.95)
temp6 <- tidy(death_adj_m6, exponentiate = TRUE, conf.int = T, conf.level = 0.95)
death_match_results <- rbind(temp0, temp1, temp2, temp3, temp4, temp5, temp6)
death_match_results <- death_match_results %>%
mutate(approach = c("Unmatched", "Match 1", "Match 2", "Match 3", "Match 4", "Match 5", "Match 6")) %>%
mutate(description = c("No Matching", "1:1 greedy matching without replacement",
"1:2 greedy matching without replacement",
"1:1 genetic search matching without replacement",
"1:1 greedy matching with replacement",
"1:1 caliper matching on the linear PS without replacement",
"1:2 greedy matching with replacement"))
death_match_results
# A tibble: 7 x 9
term estimate std.error statistic p.value conf.low conf.high approach
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 treat_rhc 1.25 0.0576 3.90 9.43e- 5 1.12 1.40 Unmatched
2 treat_rhc 1.26 0.0649 3.57 3.58e- 4 1.11 1.43 Match 1
3 treat_rhc 1.33 0.0548 5.27 1.37e- 7 1.20 1.49 Match 2
4 treat_rhc 1.32 0.0634 4.35 1.37e- 5 1.16 1.49 Match 3
5 treat_rhc 1.23 0.0636 3.22 1.27e- 3 1.08 1.39 Match 4
6 treat_rhc 1.33 0.0754 3.76 1.70e- 4 1.15 1.54 Match 5
7 treat_rhc 1.36 0.0491 6.25 4.18e-10 1.23 1.50 Match 6
# ... with 1 more variable: description <chr>
ggplot(death_match_results, aes(x = approach, y = estimate, ymin = conf.low, ymax = conf.high)) +
geom_text(aes(label = round_half_up(estimate,2)), vjust = -1.25) +
geom_pointrange() +
geom_hline(yintercept = 1, col = "red", linetype = "dashed") +
theme_bw() +
coord_flip() +
labs(title = "Comparing ATT Estimates for Death using RHC Propensity Matching",
x = "Propensity Adjustment Approach",
y = "Estimated Odds Ratio for Death associated with RHC")
Let’s look at just the four strategies that achieved stronger balance.
death_match_results %>%
filter(approach %in% c("Match 3", "Match 4", "Match 5", "Match 6")) %>%
ggplot(., aes(x = description, y = estimate, ymin = conf.low, ymax = conf.high)) +
geom_text(aes(label = round_half_up(estimate,2)), vjust = -1.25) +
geom_pointrange() +
geom_hline(yintercept = 1, col = "red", linetype = "dashed") +
theme_bw() +
coord_flip() +
labs(title = "ATT Matching Estimates in RHC (Death)",
x = "Propensity Adjustment Approach",
y = "Estimated Odds Ratio for Death associated with RHC")
Again, the variable which identifies the matches (called matches_1) is set up as a factor, as it needs to be, and I'm using the 1/0 version of the treatment (so treat_rhc rather than swang1).
Our result for this quantitative outcome (hospdays) comes from a mixed model where the matches are treated as a random factor, but the treatment group is treated as a fixed factor, using the lme4 package.
hospdays_adj_m1 <- lmer(hospdays ~ treat_rhc + (1 | matches_1),
data = rhc.matches_1)
summary(hospdays_adj_m1); confint(hospdays_adj_m1)
Linear mixed model fit by REML ['lmerMod']
Formula: hospdays ~ treat_rhc + (1 | matches_1)
Data: rhc.matches_1
REML criterion at convergence: 40934.5
Scaled residuals:
Min 1Q Median 3Q Max
-0.9292 -0.5540 -0.3137 0.1747 11.9768
Random effects:
Groups Name Variance Std.Dev.
matches_1 (Intercept) 14.5 3.809
Residual 674.0 25.962
Number of obs: 4368, groups: matches_1, 2184
Fixed effects:
Estimate Std. Error t value
(Intercept) 20.8255 0.5615 37.090
treat_rhc 3.8658 0.7857 4.921
Correlation of Fixed Effects:
(Intr)
treat_rhc -0.700
Computing profile confidence intervals ...
2.5 % 97.5 %
.sig01 0.000000 6.594183
.sigma 25.205111 26.717184
(Intercept) 19.725069 21.926030
treat_rhc 2.325669 5.406016
The estimated effect of treat_rhc in the fixed effects section, combined with the 95% confidence interval material, provides the average causal effect estimate. It describes the change in the outcome hospdays associated with being an RHC subject (as identified by treat_rhc), as compared to being a non-RHC subject. We can tidy this with:
broom.mixed::tidy(hospdays_adj_m1, conf.int = TRUE, conf.level = 0.95)
# A tibble: 4 x 8
effect group term estimate std.error statistic conf.low conf.high
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 fixed <NA> (Intercept) 20.8 0.561 37.1 19.7 21.9
2 fixed <NA> treat_rhc 3.87 0.786 4.92 2.33 5.41
3 ran_pars matches_1 sd__(Inter~ 3.81 NA NA NA NA
4 ran_pars Residual sd__Observ~ 26.0 NA NA NA NA
or more specifically, we can look at just the treatment effect with:
broom.mixed::tidy(hospdays_adj_m1, conf.int = TRUE, conf.level = 0.95) %>%
filter(term == "treat_rhc")
# A tibble: 1 x 8
effect group term estimate std.error statistic conf.low conf.high
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 fixed <NA> treat_rhc 3.87 0.786 4.92 2.33 5.41
hospdays_adj_m2 <- lmer(hospdays ~ treat_rhc + (1 | matches_2),
data = rhc.matches_2)
broom.mixed::tidy(hospdays_adj_m2, conf.int = TRUE, conf.level = 0.95) %>%
filter(term == "treat_rhc")
# A tibble: 1 x 8
effect group term estimate std.error statistic conf.low conf.high
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 fixed <NA> treat_rhc 4.98 0.545 9.14 3.91 6.05
hospdays_adj_m3 <- lmer(hospdays ~ treat_rhc + (1 | matches_3),
data = rhc.matches_3)
broom.mixed::tidy(hospdays_adj_m3, conf.int = TRUE, conf.level = 0.95) %>%
filter(term == "treat_rhc")
# A tibble: 1 x 8
effect group term estimate std.error statistic conf.low conf.high
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 fixed <NA> treat_rhc 1.41 0.842 1.68 -0.240 3.06
hospdays_adj_m4 <- lmer(hospdays ~ treat_rhc + (1 | matches_4),
data = rhc.matches_4)
boundary (singular) fit: see help('isSingular')
broom.mixed::tidy(hospdays_adj_m4, conf.int = TRUE, conf.level = 0.95) %>%
filter(term == "treat_rhc")
# A tibble: 1 x 8
effect group term estimate std.error statistic conf.low conf.high
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 fixed <NA> treat_rhc 1.27 0.859 1.48 -0.411 2.96
hospdays_adj_m5 <- lmer(hospdays ~ treat_rhc + (1 | matches_5),
data = rhc.matches_5)
boundary (singular) fit: see help('isSingular')
broom.mixed::tidy(hospdays_adj_m5, conf.int = TRUE, conf.level = 0.95) %>%
filter(term == "treat_rhc")
# A tibble: 1 x 8
effect group term estimate std.error statistic conf.low conf.high
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 fixed <NA> treat_rhc 2.30 0.919 2.50 0.496 4.10
hospdays_adj_m6 <- lmer(hospdays ~ treat_rhc + (1 | matches_6),
data = rhc.matches_6)
broom.mixed::tidy(hospdays_adj_m6, conf.int = TRUE, conf.level = 0.95) %>%
filter(term == "treat_rhc")
# A tibble: 1 x 8
effect group term estimate std.error statistic conf.low conf.high
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 fixed <NA> treat_rhc 1.22 0.556 2.18 0.125 2.31
temp0 <- tidy(hospdays_unadj, conf.int = TRUE) %>%
filter(term == "treat_rhc") %>%
select(term, estimate, std.error, conf.low, conf.high)
temp1 <- tidy(hospdays_adj_m1, conf.int = TRUE, conf.level = 0.95) %>%
filter(term == "treat_rhc") %>%
select(term, estimate, std.error, conf.low, conf.high)
temp2 <- tidy(hospdays_adj_m2, conf.int = TRUE, conf.level = 0.95) %>%
filter(term == "treat_rhc") %>%
select(term, estimate, std.error, conf.low, conf.high)
temp3 <- tidy(hospdays_adj_m3, conf.int = TRUE, conf.level = 0.95) %>%
filter(term == "treat_rhc") %>%
select(term, estimate, std.error, conf.low, conf.high)
temp4 <- tidy(hospdays_adj_m4, conf.int = TRUE, conf.level = 0.95) %>%
filter(term == "treat_rhc") %>%
select(term, estimate, std.error, conf.low, conf.high)
temp5 <- tidy(hospdays_adj_m5, conf.int = TRUE, conf.level = 0.95) %>%
filter(term == "treat_rhc") %>%
select(term, estimate, std.error, conf.low, conf.high)
temp6 <- tidy(hospdays_adj_m6, conf.int = TRUE, conf.level = 0.95) %>%
filter(term == "treat_rhc") %>%
select(term, estimate, std.error, conf.low, conf.high)
hospdays_match_results <- rbind(temp0, temp1, temp2, temp3, temp4, temp5, temp6)
hospdays_match_results <- hospdays_match_results %>%
mutate(approach = c("Unmatched", "Match 1", "Match 2", "Match 3", "Match 4", "Match 5", "Match 6")) %>%
mutate(description = c("No Matching", "1:1 greedy matching without replacement",
"1:2 greedy matching without replacement",
"1:1 genetic search matching without replacement",
"1:1 greedy matching with replacement",
"1:1 caliper matching on the linear PS without replacement",
"1:2 greedy matching with replacement"))
hospdays_match_results
# A tibble: 7 x 7
term estimate std.error conf.low conf.high approach description
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 treat_rhc 5.16 0.687 3.81 6.51 Unmatched No Matching
2 treat_rhc 3.87 0.786 2.33 5.41 Match 1 1:1 greedy matching~
3 treat_rhc 4.98 0.545 3.91 6.05 Match 2 1:2 greedy matching~
4 treat_rhc 1.41 0.842 -0.240 3.06 Match 3 1:1 genetic search ~
5 treat_rhc 1.27 0.859 -0.411 2.96 Match 4 1:1 greedy matching~
6 treat_rhc 2.30 0.919 0.496 4.10 Match 5 1:1 caliper matchin~
7 treat_rhc 1.22 0.556 0.125 2.31 Match 6 1:2 greedy matching~
ggplot(hospdays_match_results, aes(x = approach, y = estimate, ymin = conf.low, ymax = conf.high)) +
geom_text(aes(label = round_half_up(estimate,2)), vjust = -1.25) +
geom_pointrange() +
geom_hline(yintercept = 0, col = "red", linetype = "dashed") +
theme_bw() +
coord_flip() +
labs(title = "Comparing ATT Estimates for LOS using RHC Propensity Matching",
x = "Propensity Adjustment Approach",
y = "Estimated Change in Hospital Length of Stay associated with RHC")
Let’s look at just the four strategies that achieved stronger balance.
hospdays_match_results %>%
filter(approach %in% c("Match 3", "Match 4", "Match 5", "Match 6")) %>%
ggplot(., aes(x = description, y = estimate, ymin = conf.low, ymax = conf.high)) +
geom_text(aes(label = round_half_up(estimate,2)), vjust = -1.25) +
geom_pointrange() +
geom_hline(yintercept = 0, col = "red", linetype = "dashed") +
theme_bw() +
coord_flip() +
labs(title = "ATT Matching Estimates in RHC (LOS)",
x = "Propensity Adjustment Approach",
y = "Est. Change in Hospital LOS associated with RHC")
We'll use a stratified Cox proportional hazards model to compare the RHC/No RHC groups on our time-to-event outcome survdays, while accounting for the matched pairs. The main result will be a relative hazard rate estimate, with 95% CI. I will use the 0/1 numeric versions of the event indicator (died) and of the treatment indicator (treat_rhc).
survdays_adj_m1 <- coxph(Surv(survdays, died) ~ treat_rhc + strata(matches_1), data=rhc.matches_1)
summary(survdays_adj_m1)
Call:
coxph(formula = Surv(survdays, died) ~ treat_rhc + strata(matches_1),
data = rhc.matches_1)
n= 4368, number of events= 2861
coef exp(coef) se(coef) z Pr(>|z|)
treat_rhc 0.04768 1.04883 0.04554 1.047 0.295
exp(coef) exp(-coef) lower .95 upper .95
treat_rhc 1.049 0.9534 0.9593 1.147
Concordance= 0.512 (se = 0.016 )
Likelihood ratio test= 1.1 on 1 df, p=0.3
Wald test = 1.1 on 1 df, p=0.3
Score (logrank) test = 1.1 on 1 df, p=0.3
The hazard ratio in the exp(coef) section is the average causal effect estimate. It describes the hazard of the event (died) for an RHC subject (as identified by treat_rhc), as compared to the hazard of that event for a non-RHC subject. We can tidy this with:
tidy(survdays_adj_m1, exponentiate = TRUE, conf.int = TRUE, conf.level = 0.95)
# A tibble: 1 x 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 treat_rhc 1.05 0.0455 1.05 0.295 0.959 1.15
survdays_adj_m2 <- coxph(Surv(survdays, died) ~ treat_rhc + strata(matches_2), data=rhc.matches_2)
tidy(survdays_adj_m2, exponentiate = TRUE, conf.int = TRUE, conf.level = 0.95)
# A tibble: 1 x 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 treat_rhc 1.27 0.0343 7.02 2.17e-12 1.19 1.36
survdays_adj_m3 <- coxph(Surv(survdays, died) ~ treat_rhc + strata(matches_3), data=rhc.matches_3)
tidy(survdays_adj_m3, exponentiate = TRUE, conf.int = TRUE, conf.level = 0.95)
# A tibble: 1 x 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 treat_rhc 1.09 0.0445 2.03 0.0426 1.00 1.19
survdays_adj_m4 <- coxph(Surv(survdays, died) ~ treat_rhc + strata(matches_4), data=rhc.matches_4)
tidy(survdays_adj_m4, exponentiate = TRUE, conf.int = TRUE, conf.level = 0.95)
# A tibble: 1 x 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 treat_rhc 1.06 0.0452 1.20 0.231 0.966 1.15
survdays_adj_m5 <- coxph(Surv(survdays, died) ~ treat_rhc + strata(matches_5), data=rhc.matches_5)
tidy(survdays_adj_m5, exponentiate = TRUE, conf.int = TRUE, conf.level = 0.95)
# A tibble: 1 x 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 treat_rhc 1.12 0.0535 2.05 0.0399 1.01 1.24
survdays_adj_m6 <- coxph(Surv(survdays, died) ~ treat_rhc + strata(matches_6), data=rhc.matches_6)
tidy(survdays_adj_m6, exponentiate = TRUE, conf.int = TRUE, conf.level = 0.95)
# A tibble: 1 x 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 treat_rhc 1.36 0.0313 9.89 4.68e-23 1.28 1.45
temp0 <- tidy(survdays_unadj, conf.int = TRUE, exponentiate = TRUE)
temp1 <- tidy(survdays_adj_m1, exponentiate = TRUE, conf.int = TRUE, conf.level = 0.95)
temp2 <- tidy(survdays_adj_m2, exponentiate = TRUE, conf.int = TRUE, conf.level = 0.95)
temp3 <- tidy(survdays_adj_m3, exponentiate = TRUE, conf.int = TRUE, conf.level = 0.95)
temp4 <- tidy(survdays_adj_m4, exponentiate = TRUE, conf.int = TRUE, conf.level = 0.95)
temp5 <- tidy(survdays_adj_m5, exponentiate = TRUE, conf.int = TRUE, conf.level = 0.95)
temp6 <- tidy(survdays_adj_m6, exponentiate = TRUE, conf.int = TRUE, conf.level = 0.95)
survdays_match_results <- rbind(temp0, temp1, temp2, temp3, temp4, temp5, temp6)
survdays_match_results <- survdays_match_results %>%
mutate(approach = c("Unmatched", "Match 1", "Match 2", "Match 3", "Match 4", "Match 5", "Match 6")) %>%
mutate(description = c("No Matching", "1:1 greedy matching without replacement",
"1:2 greedy matching without replacement",
"1:1 genetic search matching without replacement",
"1:1 greedy matching with replacement",
"1:1 caliper matching on the linear PS without replacement",
"1:2 greedy matching with replacement"))
survdays_match_results
# A tibble: 7 x 9
term estimate std.error statistic p.value conf.low conf.high approach
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 treat_rhc 1.08 0.0335 2.34 1.92e- 2 1.01 1.15 Unmatched
2 treat_rhc 1.05 0.0455 1.05 2.95e- 1 0.959 1.15 Match 1
3 treat_rhc 1.27 0.0343 7.02 2.17e-12 1.19 1.36 Match 2
4 treat_rhc 1.09 0.0445 2.03 4.26e- 2 1.00 1.19 Match 3
5 treat_rhc 1.06 0.0452 1.20 2.31e- 1 0.966 1.15 Match 4
6 treat_rhc 1.12 0.0535 2.05 3.99e- 2 1.01 1.24 Match 5
7 treat_rhc 1.36 0.0313 9.89 4.68e-23 1.28 1.45 Match 6
# ... with 1 more variable: description <chr>
ggplot(survdays_match_results, aes(x = approach, y = estimate, ymin = conf.low, ymax = conf.high)) +
geom_text(aes(label = round_half_up(estimate,2)), vjust = -1.25) +
geom_pointrange() +
geom_hline(yintercept = 1, col = "red", linetype = "dashed") +
theme_bw() +
coord_flip() +
labs(title = "Comparing ATT Estimates for Survival using RHC Propensity Matching",
x = "Propensity Adjustment Approach",
y = "Estimated Hazard Ratio associated with RHC")
Let’s look at just the four strategies that achieved stronger balance.
survdays_match_results %>%
filter(approach %in% c("Match 3", "Match 4", "Match 5", "Match 6")) %>%
ggplot(., aes(x = description, y = estimate, ymin = conf.low, ymax = conf.high)) +
geom_text(aes(label = round_half_up(estimate,2)), vjust = -1.25) +
geom_pointrange() +
geom_hline(yintercept = 1, col = "red", linetype = "dashed") +
theme_bw() +
coord_flip() +
labs(title = "ATT Matching Estimates in RHC (Survival)",
x = "Propensity Adjustment Approach",
y = "Estimated Hazard Ratio associated with RHC")
To begin, we’ll re-estimate the propensity score using the ps function from the twang package. This uses a generalized boosted regression approach to estimate the propensity score, and produces material for checking balance.
# Recall that twang does not play well with tibbles,
# so we have to use the data frame version of the rhc object,
# and I'm using the 1/0 version of the treatment.
rhc_df <- base::as.data.frame(rhc)
ps_rhc <- ps(treat_rhc ~
age + sex + edu + income + ninsclas +
race + cat1 + dnr1 + wtkilo1 + hrt1 +
meanbp1 + resp1 + temp1 + card + gastr +
hema + meta + neuro + ortho + renal +
resp + seps + trauma + amihx + ca +
cardiohx + chfhx + chrpulhx + dementhx +
gibledhx + immunhx + liverhx + malighx +
psychhx + renalhx + transhx + aps1 +
das2d3pc + scoma1 + surv2md1 + alb1 +
bili1 + crea1 + hema1 + paco21 + pafi1 +
ph1 + pot1 + sod1 + wblc1,
data = rhc_df,
n.trees = 4000,
interaction.depth = 2,
stop.method = c("es.mean"),
estimand = "ATT",
verbose = FALSE)
plot(ps_rhc)
summary(ps_rhc)
n.treat n.ctrl ess.treat ess.ctrl max.es mean.es max.ks
unw 2184 3551 2184 3551.000 0.53367192 0.14968341 0.21274111
es.mean.ATT 2184 3551 2184 1250.501 0.09123789 0.03520429 0.04966016
max.ks.p mean.ks iter
unw NA 0.05831639 NA
es.mean.ATT NA 0.01540652 3999
plot(ps_rhc, plots = 2)
plot(ps_rhc, plots = 3)
We can also assess the balance achieved by this weighting using the cobalt package.
b2 <- bal.tab(ps_rhc, full.stop.method = "es.mean.att",
stats = c("m", "v"), un = TRUE)
b2
Call
ps.fast(formula = formula, data = data, params = params, n.trees = nrounds,
interaction.depth = max_depth, shrinkage = eta, bag.fraction = subsample,
n.minobsinnode = min_child_weight, perm.test.iters = perm.test.iters,
print.level = print.level, verbose = verbose, estimand = estimand,
stop.method = stop.method, sampw = sampw, multinom = multinom,
ks.exact = ks.exact, version = version, tree_method = tree_method,
n.keep = n.keep, n.grid = n.grid, keep.data = keep.data)
Balance Measures
Type Diff.Un V.Ratio.Un Diff.Adj V.Ratio.Adj
prop.score Distance 1.5554 1.2083 0.5108 0.9058
age Contin. -0.0647 0.8175 0.0027 0.9662
sex_Female Binary -0.0462 . 0.0015 .
edu Contin. 0.0910 1.0147 0.0149 1.0020
income_Under $11k Binary -0.0618 . -0.0144 .
income_$11-$25k Binary 0.0062 . -0.0128 .
income_$25-$50k Binary 0.0391 . 0.0245 .
income_> $50k Binary 0.0165 . 0.0027 .
ninsclas_Private Binary 0.0624 . 0.0195 .
ninsclas_Private & Medicare Binary 0.0143 . -0.0057 .
ninsclas_Medicare Binary -0.0327 . -0.0011 .
ninsclas_Medicare & Medicaid Binary -0.0144 . 0.0023 .
ninsclas_Medicaid Binary -0.0395 . -0.0231 .
ninsclas_No insurance Binary 0.0099 . 0.0082 .
race_white Binary 0.0063 . -0.0088 .
race_black Binary -0.0114 . 0.0062 .
race_other Binary 0.0050 . 0.0026 .
cat1_ARF Binary -0.0290 . -0.0098 .
cat1_CHF Binary 0.0261 . -0.0019 .
cat1_Cirrhosis Binary -0.0268 . -0.0063 .
cat1_Colon Cancer Binary -0.0012 . 0.0000 .
cat1_Coma Binary -0.0525 . -0.0053 .
cat1_COPD Binary -0.0858 . -0.0057 .
cat1_Lung Cancer Binary -0.0073 . -0.0002 .
cat1_MOSF w/Malignancy Binary 0.0045 . 0.0002 .
cat1_MOSF w/Sepsis Binary 0.1721 . 0.0289 .
dnr1_Yes Binary -0.0696 . -0.0129 .
wtkilo1 Contin. 0.2640 0.8835 0.0075 1.0408
hrt1 Contin. 0.1460 1.0260 0.0506 0.9705
meanbp1 Contin. -0.4869 0.7759 -0.0476 0.9655
resp1 Contin. -0.1641 1.0330 -0.0912 1.0078
temp1 Contin. -0.0209 1.1024 -0.0378 1.0288
card_Yes Binary 0.1395 . 0.0136 .
gastr_Yes Binary 0.0453 . 0.0250 .
hema_Yes Binary -0.0146 . -0.0099 .
meta_Yes Binary -0.0059 . 0.0030 .
neuro_Yes Binary -0.1079 . -0.0134 .
ortho_Yes Binary 0.0010 . 0.0016 .
renal_Yes Binary 0.0264 . 0.0083 .
resp_Yes Binary -0.1277 . -0.0374 .
seps_Yes Binary 0.0912 . 0.0178 .
trauma_Yes Binary 0.0105 . 0.0092 .
amihx Binary 0.0139 . 0.0168 .
ca_No Binary 0.0439 . 0.0216 .
ca_Yes Binary -0.0267 . -0.0093 .
ca_Metastatic Binary -0.0172 . -0.0122 .
cardiohx Binary 0.0445 . -0.0094 .
chfhx Binary 0.0268 . -0.0100 .
chrpulhx Binary -0.0737 . -0.0021 .
dementhx Binary -0.0472 . -0.0055 .
gibledhx Binary -0.0122 . -0.0065 .
immunhx Binary 0.0358 . -0.0126 .
liverhx Binary -0.0124 . -0.0075 .
malighx Binary -0.0423 . -0.0211 .
psychhx Binary -0.0348 . -0.0052 .
renalhx Binary 0.0066 . -0.0102 .
transhx Binary 0.0554 . 0.0178 .
aps1 Contin. 0.4837 1.1609 0.0664 1.0236
das2d3pc Contin. 0.0654 0.8429 0.0195 0.9675
scoma1 Contin. -0.1160 0.8116 -0.0114 0.9701
surv2md1 Contin. -0.1954 1.0663 -0.0226 1.0063
alb1 Contin. -0.2010 1.8947 -0.0217 1.6852
bili1 Contin. 0.1329 1.4504 -0.0102 0.9447
crea1 Contin. 0.2678 1.0276 0.0335 0.9294
hema1 Contin. -0.2954 0.7109 -0.0098 0.9668
paco21 Contin. -0.2880 0.5935 -0.0559 1.0124
pafi1 Contin. -0.4566 0.8184 -0.0635 0.9492
ph1 Contin. -0.1163 1.1322 -0.0754 1.1463
pot1 Contin. -0.0274 0.9501 -0.0149 1.0131
sod1 Contin. -0.0927 0.9799 -0.0272 1.0189
wblc1 Contin. 0.0799 1.2088 -0.0110 1.0154
Effective sample sizes
Control Treated
Unadjusted 3551. 2184
Adjusted 1250.5 2184
We could simply compare the propensity score balance as reported in the table above, looking at the standardized biases (standardized differences expressed as proportions rather than percentages). I’ll multiply each by 100 to place things on the usual percentage scale.
100*b2[1]$Balance$Diff.Un[1] # imbalance prior to weighting
[1] 155.5377
100*b2[1]$Balance$Diff.Adj[1] # imbalance after weighting
[1] 51.07571
So the weighting still has some difficulty reaching a completely reasonable level of balance: a standardized difference of about 51% remains on the propensity score, which falls just short of passing Rubin’s Rule 1, and we’d like to see this much closer to 0.
b2[1]$Balance$V.Ratio.Un[1] # imbalance prior to weighting
[1] 1.208331
b2[1]$Balance$V.Ratio.Adj[1] # imbalance after weighting
[1] 0.9057622
So there’s no particular problem with Rubin’s Rule 2 here: the variance ratio for the propensity score is about 0.91 after weighting, quite close to 1.
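If it’s helpful, here’s a small sketch that pulls both checks together in one place. The pass/fail cutoffs used below (an absolute standardized difference under 0.5 for Rule 1, and a variance ratio between 0.5 and 2 for Rule 2) are conventional cutoffs for Rubin’s Rules and are my assumption here; substitute stricter limits if you prefer.
# Sketch: check Rubin's Rules 1 and 2 for the ATT-weighted sample in one step.
# The thresholds (|std. diff.| < 0.5; variance ratio in (0.5, 2)) are assumed.
tibble(
  rule1_std_diff  = b2[1]$Balance$Diff.Adj[1],    # std. difference of the PS after weighting
  rule2_var_ratio = b2[1]$Balance$V.Ratio.Adj[1]  # variance ratio of the PS after weighting
) %>%
  mutate(pass_rule1 = abs(rule1_std_diff) < 0.5,
         pass_rule2 = rule2_var_ratio > 0.5 & rule2_var_ratio < 2)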
p <- love.plot(b2,
threshold = .1, size = 3, stars = "raw",
title = "Standardized Diffs and ATT weights")
p + theme_bw()
p <- love.plot(b2, stat = "v",
threshold = 1.25, size = 3,
title = "Variance Ratios: TWANG ATT weighting")
p + theme_bw()
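Before fitting weighted models, it may help to recall what the ATT weights retrieved by get.weights() are. This is the standard “weighting by the odds” construction for the ATT, stated here as a reminder rather than taken from the twang output above: with estimated propensity score \(\hat{e}(x_i)\), the weights are

\[
w_i =
\begin{cases}
1, & \text{if subject } i \text{ received RHC,} \\[4pt]
\dfrac{\hat{e}(x_i)}{1 - \hat{e}(x_i)}, & \text{if subject } i \text{ did not receive RHC,}
\end{cases}
\]

so treated subjects keep weight 1 and each control is weighted by their odds of receiving RHC.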
We’re not too excited about weighted estimates without (at least) an additional adjustment for differences in the propensity score in this case, but I’ll run them anyway just to show how that would be done.
The relevant regression approach uses the svydesign and svyglm functions from the survey package. For a binary outcome, we build the outcome model using the quasibinomial family, rather than the usual binomial family. Note the use of the 1/0 versions of both the outcome and the treatment variables here.
rhcwts_design <- svydesign(
ids=~1, weights=~get.weights(ps_rhc,
stop.method = "es.mean"),
data=rhc_df) # using twang ATT weights
adj_death_wtd <- svyglm(died ~ treat_rhc,
design=rhcwts_design,
family=quasibinomial())
wt_twangatt_res_death <-
tidy(adj_death_wtd, exponentiate = TRUE,
conf.int = TRUE,
conf.level = 0.95) %>%
select(term, estimate, std.error,
lo95 = conf.low, hi95 = conf.high)
wt_twangatt_res_death
# A tibble: 2 x 5
term estimate std.error lo95 hi95
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 1.73 0.0602 1.53 1.94
2 treat_rhc 1.23 0.0757 1.06 1.43
The estimates we want come from the treat_rhc row.
Now let’s add in a regression adjustment for the linear propensity score, which should alleviate some of our concerns about residual imbalance after weighting.
dr_adj_death <- svyglm(died ~ treat_rhc + linps,
design=rhcwts_design,
family=quasibinomial())
dr_death <-
tidy(dr_adj_death, exponentiate = TRUE,
conf.int = TRUE,
conf.level = 0.95) %>%
select(term, estimate, std.error,
lo95 = conf.low, hi95 = conf.high)
dr_death
# A tibble: 3 x 5
term estimate std.error lo95 hi95
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 1.72 0.0612 1.52 1.94
2 treat_rhc 1.26 0.0812 1.07 1.47
3 linps 0.938 0.0415 0.864 1.02
Again, the estimates we want come from the treat_rhc row.
temp7 <-
tidy(adj_death_wtd, exponentiate = TRUE,
conf.int = T, conf.level = 0.95) %>%
filter(term == "treat_rhc")
temp8 <-
tidy(dr_adj_death, exponentiate = TRUE,
conf.int = T, conf.level = 0.95) %>%
filter(term == "treat_rhc")
death_new_results <- rbind(temp7, temp8) %>%
mutate(approach = c("ATT weighted", "DR")) %>%
mutate(description = c("ATT weights", "DR: ATT wts + adj"))
death_results <- rbind(death_match_results,
death_new_results)
death_results
# A tibble: 9 x 9
term estimate std.error statistic p.value conf.low conf.high approach
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 treat_rhc 1.25 0.0576 3.90 9.43e- 5 1.12 1.40 Unmatched
2 treat_rhc 1.26 0.0649 3.57 3.58e- 4 1.11 1.43 Match 1
3 treat_rhc 1.33 0.0548 5.27 1.37e- 7 1.20 1.49 Match 2
4 treat_rhc 1.32 0.0634 4.35 1.37e- 5 1.16 1.49 Match 3
5 treat_rhc 1.23 0.0636 3.22 1.27e- 3 1.08 1.39 Match 4
6 treat_rhc 1.33 0.0754 3.76 1.70e- 4 1.15 1.54 Match 5
7 treat_rhc 1.36 0.0491 6.25 4.18e-10 1.23 1.50 Match 6
8 treat_rhc 1.23 0.0757 2.78 5.50e- 3 1.06 1.43 ATT weight~
9 treat_rhc 1.26 0.0812 2.81 4.94e- 3 1.07 1.47 DR
# ... with 1 more variable: description <chr>
So there’s our table. It’s worth remembering that the only analyses whose covariate balance we’ve really been satisfied with are Matches 3 through 6, along with the double robust approach, which adds a propensity adjustment on top of the ATT weights.
death_results %>%
ggplot(., aes(x = description, y = estimate,
ymin = conf.low, ymax = conf.high)) +
geom_text(aes(label = round_half_up(estimate,2)), vjust = -1.25) +
geom_pointrange() +
geom_hline(yintercept = 1, col = "red", linetype = "dashed") +
theme_bw() +
coord_flip() +
labs(title = "ATT Estimates in RHC (Death)",
x = "Propensity Adjustment Approach",
y = "Estimated Odds Ratio for Death associated with RHC")
And here are the results for only the five methods (adding our double robust approach to the analyses we liked in the matching) which showed reasonable balance after propensity score adjustment…
death_results %>%
filter(approach %in% c("Match 3", "Match 4", "Match 5",
"Match 6", "DR")) %>%
ggplot(., aes(x = description, y = estimate,
ymin = conf.low, ymax = conf.high)) +
geom_text(aes(label = round_half_up(estimate,2)), vjust = -1.25) +
geom_pointrange() +
geom_hline(yintercept = 1, col = "red", linetype = "dashed") +
theme_bw() +
coord_flip() +
labs(title = "ATT Estimates in RHC (Death)",
x = "Propensity Adjustment Approach",
y = "Estimated Odds Ratio for Death associated with RHC")
The relevant regression approach uses the svydesign and svyglm functions from the survey package.
rhcwts_design <- svydesign(
ids=~1, weights=~get.weights(ps_rhc,
stop.method = "es.mean"),
data=rhc_df) # using twang ATT weights
adj_hospdays_wtd <-
svyglm(hospdays ~ treat_rhc,
design= rhcwts_design)
wt_twangatt_res_hospdays <-
tidy(adj_hospdays_wtd, conf.int = TRUE,
conf.level = 0.95) %>%
select(term, estimate, std.error,
lo95 = conf.low, hi95 = conf.high)
wt_twangatt_res_hospdays
# A tibble: 2 x 5
term estimate std.error lo95 hi95
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 22.6 0.844 20.9 24.2
2 treat_rhc 2.10 1.03 0.0750 4.12
The estimates we want come from the treat_rhc row.
Now let’s add in a regression adjustment for the linear propensity score. We should have somewhat more faith in these results than in those from weighting alone, because the weighting approach failed to pass Rubin’s Rule 1.
dr_adj_days <- svyglm(hospdays ~ treat_rhc + linps,
design=rhcwts_design)
dr_los <-
tidy(dr_adj_days, conf.int = TRUE,
conf.level = 0.95) %>%
select(term, estimate, std.error,
lo95 = conf.low, hi95 = conf.high)
dr_los
# A tibble: 3 x 5
term estimate std.error lo95 hi95
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 22.8 0.873 21.1 24.5
2 treat_rhc 1.56 1.10 -0.605 3.72
3 linps 1.93 0.507 0.934 2.92
Once again, the estimates we want come from the treat_rhc row.
temp9 <-
tidy(adj_hospdays_wtd, conf.int = T,
conf.level = 0.95) %>%
filter(term == "treat_rhc") %>%
select(-statistic, -p.value)
temp10 <-
tidy(dr_adj_days, conf.int = T,
conf.level = 0.95) %>%
filter(term == "treat_rhc") %>%
select(-statistic, -p.value)
hospdays_new_results <- rbind(temp9, temp10) %>%
mutate(approach = c("ATT weighted", "DR")) %>%
mutate(description = c("ATT weights", "DR: ATT wts + adj"))
hospdays_results <- rbind(hospdays_match_results,
hospdays_new_results)
hospdays_results
# A tibble: 9 x 7
term estimate std.error conf.low conf.high approach description
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 treat_rhc 5.16 0.687 3.81 6.51 Unmatched No Matching
2 treat_rhc 3.87 0.786 2.33 5.41 Match 1 1:1 greedy match~
3 treat_rhc 4.98 0.545 3.91 6.05 Match 2 1:2 greedy match~
4 treat_rhc 1.41 0.842 -0.240 3.06 Match 3 1:1 genetic sear~
5 treat_rhc 1.27 0.859 -0.411 2.96 Match 4 1:1 greedy match~
6 treat_rhc 2.30 0.919 0.496 4.10 Match 5 1:1 caliper matc~
7 treat_rhc 1.22 0.556 0.125 2.31 Match 6 1:2 greedy match~
8 treat_rhc 2.10 1.03 0.0750 4.12 ATT weighted ATT weights
9 treat_rhc 1.56 1.10 -0.605 3.72 DR DR: ATT wts + adj
So there’s our table.
hospdays_results %>%
ggplot(., aes(x = description, y = estimate,
ymin = conf.low, ymax = conf.high)) +
geom_text(aes(label = round_half_up(estimate,2)), vjust = -1.25) +
geom_pointrange() +
geom_hline(yintercept = 0, col = "red", linetype = "dashed") +
theme_bw() +
coord_flip() +
labs(title = "Comparing ATT Estimates for LOS",
x = "Propensity Adjustment Approach",
y = "Estimated Change in Hospital LOS associated with RHC")
And here are the results for only the five methods (adding our double robust approach to the analyses we liked in the matching) which showed reasonable balance after propensity score adjustment…
hospdays_results %>%
filter(approach %in% c("Match 3", "Match 4", "Match 5",
"Match 6", "DR")) %>%
ggplot(., aes(x = description, y = estimate,
ymin = conf.low, ymax = conf.high)) +
geom_text(aes(label = round_half_up(estimate,2)), vjust = -1.25) +
geom_pointrange() +
geom_hline(yintercept = 0, col = "red", linetype = "dashed") +
theme_bw() +
coord_flip() +
labs(title = "Comparing ATT Estimates for LOS",
x = "Propensity Adjustment Approach",
y = "Estimated Change in Hospital LOS associated with RHC")
# Weighted Cox model for survival time, using the twang ATT ("es.mean") weights
rhc_wts <- get.weights(ps_rhc, stop.method = "es.mean")
adj_survdays_wtd <- coxph(Surv(survdays, died) ~ treat_rhc,
                 data = rhc_df, weights = rhc_wts)
wt_twangatt_res_surv <-
tidy(adj_survdays_wtd, exponentiate = TRUE,
conf.int = TRUE,
conf.level = 0.95) %>%
select(term, estimate, std.error,
lo95 = conf.low, hi95 = conf.high)
wt_twangatt_res_surv
# A tibble: 1 x 5
term estimate std.error lo95 hi95
<chr> <dbl> <dbl> <dbl> <dbl>
1 treat_rhc 1.11 0.0396 1.02 1.22
Again, the estimates we want come from the treat_rhc row.
Now let’s add in a regression adjustment for the linear propensity score. Again, in this case, we prefer this double robust approach to weighting alone because of Rubin’s Rule 1 and the Love plot.
dr_survdays <-
coxph(Surv(survdays, died) ~ treat_rhc + linps,
data = rhc_df, weights = rhc_wts)
dr_surv <-
tidy(dr_survdays, exponentiate = TRUE,
conf.int = TRUE,
conf.level = 0.95) %>%
select(term, estimate, std.error,
lo95 = conf.low, hi95 = conf.high)
dr_surv
# A tibble: 2 x 5
term estimate std.error lo95 hi95
<chr> <dbl> <dbl> <dbl> <dbl>
1 treat_rhc 1.13 0.0399 1.03 1.23
2 linps 0.955 0.0173 0.913 0.999
Once again, the estimates we want come from the treat_rhc row.
temp11 <-
tidy(adj_survdays_wtd, exponentiate = T, conf.int = T,
conf.level = 0.95) %>%
filter(term == "treat_rhc") %>%
select(-robust.se)
temp12 <-
tidy(dr_survdays, exponentiate = T, conf.int = T,
conf.level = 0.95) %>%
filter(term == "treat_rhc") %>%
select(-robust.se)
survdays_new_results <- rbind(temp11, temp12) %>%
mutate(approach = c("ATT weighted", "DR")) %>%
mutate(description = c("ATT weights", "DR: ATT wts + adj"))
survdays_results <- rbind(survdays_new_results,
survdays_match_results)
survdays_results
# A tibble: 9 x 9
term estimate std.error statistic p.value conf.low conf.high approach
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 treat_rhc 1.11 0.0396 2.39 1.67e- 2 1.02 1.22 ATT weight~
2 treat_rhc 1.13 0.0399 2.55 1.09e- 2 1.03 1.23 DR
3 treat_rhc 1.08 0.0335 2.34 1.92e- 2 1.01 1.15 Unmatched
4 treat_rhc 1.05 0.0455 1.05 2.95e- 1 0.959 1.15 Match 1
5 treat_rhc 1.27 0.0343 7.02 2.17e-12 1.19 1.36 Match 2
6 treat_rhc 1.09 0.0445 2.03 4.26e- 2 1.00 1.19 Match 3
7 treat_rhc 1.06 0.0452 1.20 2.31e- 1 0.966 1.15 Match 4
8 treat_rhc 1.12 0.0535 2.05 3.99e- 2 1.01 1.24 Match 5
9 treat_rhc 1.36 0.0313 9.89 4.68e-23 1.28 1.45 Match 6
# ... with 1 more variable: description <chr>
So there’s our table.
survdays_results %>%
ggplot(., aes(x = description, y = estimate,
ymin = conf.low, ymax = conf.high)) +
geom_text(aes(label = round_half_up(estimate,2)), vjust = -1.25) +
geom_pointrange() +
geom_hline(yintercept = 1, col = "red", linetype = "dashed") +
theme_bw() +
coord_flip() +
labs(title = "Comparing ATT Estimates for Survival Time",
x = "Propensity Adjustment Approach",
y = "Estimated Hazard Ratio associated with RHC")
And here are the results for only the five methods (adding our double robust approach to the analyses we liked in the matching) which showed reasonable balance after propensity score adjustment…
survdays_results %>%
filter(approach %in% c("Match 3", "Match 4", "Match 5",
"Match 6", "DR")) %>%
ggplot(., aes(x = description, y = estimate,
ymin = conf.low, ymax = conf.high)) +
geom_text(aes(label = round_half_up(estimate,2)), vjust = -1.25) +
geom_pointrange() +
geom_hline(yintercept = 1, col = "red", linetype = "dashed") +
theme_bw() +
coord_flip() +
labs(title = "Comparing ATT Estimates for Survival Time",
x = "Propensity Adjustment Approach",
y = "Estimated Hazard Ratio associated with RHC")
The rbounds package can evaluate binary outcomes using the binarysens and FisherSens functions. The binarysens function works directly on a matched sample developed with the Matching package; in our case, the third match, based on the genetic search algorithm with 1:1 matching without replacement, is match3. Note that in order to do this, we needed to re-run match3 with the appropriate (binary) outcome Y included.
Y <- rhc$died
X <- cbind(rhc$linps, rhc$ps)
Tr <- as.logical(rhc$swang1 == "RHC")
match3 <- Match(Y = Y, Tr = Tr, X = X, estimand = "ATT",
Weight.matrix = genout3)
binarysens(match3, Gamma = 1.5, GammaInc = 0.05)
Rosenbaum Sensitivity Test
Unconfounded estimate .... 0
Gamma Lower bound Upper bound
1.00 1e-05 0.00001
1.05 0e+00 0.00020
1.10 0e+00 0.00260
1.15 0e+00 0.01854
1.20 0e+00 0.07975
1.25 0e+00 0.22444
1.30 0e+00 0.44660
1.35 0e+00 0.67877
1.40 0e+00 0.85082
1.45 0e+00 0.94458
1.50 0e+00 0.98337
Note: Gamma is Odds of Differential Assignment To
Treatment Due to Unobserved Factors
At a 5% significance level, we retain significance up to a hidden bias of \(\Gamma\) = 1.15, but not at \(\Gamma\) = 1.20. See the toy_2019 example discussed in Class 4, as well as Chapter 9 of Rosenbaum’s Observation and Experiment, for more details on the interpretation of this result.
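As a reminder of what \(\Gamma\) represents (this is the standard Rosenbaum sensitivity model, stated here for reference): for two matched subjects \(j\) and \(k\) with identical observed covariates, the model assumes their odds of receiving RHC can differ by at most a factor of \(\Gamma\),

\[
\frac{1}{\Gamma} \le \frac{\pi_j (1 - \pi_k)}{\pi_k (1 - \pi_j)} \le \Gamma ,
\]

where \(\pi_j\) is subject \(j\)’s probability of treatment. So \(\Gamma = 1.15\) corresponds to an unmeasured confounder that could multiply the odds of receiving RHC by up to 1.15 within a matched pair.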
We can also use this approach with other Matching results, including Match 6, which is a 1:2 match with replacement. Again, we have to run the match with the appropriate outcome included.
Y <- rhc$died
X <- rhc$linps
Tr <- as.logical(rhc$swang1 == "RHC")
match6 <- Match(Y = Y, Tr = Tr, X = X, M = 2,
estimand = "ATT", replace = TRUE,
ties = FALSE)
binarysens(match6, Gamma = 1.5, GammaInc = 0.05)
Rosenbaum Sensitivity Test
Unconfounded estimate .... 0
Gamma Lower bound Upper bound
1.00 0 0.00000
1.05 0 0.00005
1.10 0 0.00222
1.15 0 0.03116
1.20 0 0.17730
1.25 0 0.48926
1.30 0 0.79823
1.35 0 0.95191
1.40 0 0.99306
1.45 0 0.99938
1.50 0 0.99996
Note: Gamma is Odds of Differential Assignment To
Treatment Due to Unobserved Factors
Here, at a 5% significance level, we retain significance up to a hidden bias of \(\Gamma\) = 1.15, but not at \(\Gamma\) = 1.20.
We have already used the Match function from the Matching package to develop our matched samples. For our 1:1 matches, we need only run the psens function from the rbounds package to obtain sensitivity results. For example, in Match 3,
Y <- rhc$hospdays
X <- cbind(rhc$linps, rhc$ps)
Tr <- as.logical(rhc$swang1 == "RHC")
match3 <- Match(Y = Y, Tr = Tr, X = X, estimand = "ATT",
Weight.matrix = genout3)
psens(match3, Gamma = 1.5, GammaInc = 0.1)
Rosenbaum Sensitivity Test for Wilcoxon Signed Rank P-Value
Unconfounded estimate .... 1e-04
Gamma Lower bound Upper bound
1.0 1e-04 0.0001
1.1 0e+00 0.0410
1.2 0e+00 0.5095
1.3 0e+00 0.9501
1.4 0e+00 0.9992
1.5 0e+00 1.0000
Note: Gamma is Odds of Differential Assignment To
Treatment Due to Unobserved Factors
At a 5% significance level, we retain significance up to a hidden bias of \(\Gamma\) = 1.1, but not at \(\Gamma\) = 1.2. See the toy_2019 example discussed in Class 4, as well as Chapter 9 of Rosenbaum’s Observation and Experiment, for more details on the interpretation of this result.
For a matching with more than 1 control, we need to apply the mcontrol function, which requires the use of the data.prep function from rbounds. So, for example, in Match 6, which is a 1:2 greedy match (so that each matched set contains 3 patients, making group.size = 3), we would have:
Y <- rhc$hospdays
X <- rhc$linps
Tr <- as.logical(rhc$swang1 == "RHC")
match6 <- Match(Y = Y, Tr = Tr, X = X, M = 2,
estimand = "ATT", replace = TRUE,
ties = FALSE)
tmp <- data.prep(match6, group.size = 3)
mcontrol(tmp$Y, tmp$id, tmp$treat, group.size=3, Gamma = 1.5, GammaInc = 0.1)
Rosenbaum Sensitivity Test for Wilcoxon Strat. Rank P-Value
Unconfounded estimate .... 0
Gamma Lower bound Upper bound
1.0 0 0.0000
1.1 0 0.0177
1.2 0 0.3501
1.3 0 0.8834
1.4 0 0.9960
1.5 0 1.0000
Note: Gamma is Odds of Differential Assignment To
Treatment Due to Unobserved Factors
Again, at a 5% significance level, we retain significance up to a hidden bias of \(\Gamma\) = 1.1, but not at \(\Gamma\) = 1.2.
In this setting, we would need to use the spreadsheet software provided by Dr. Love. That software is designed only for 1:1 greedy matching without replacement (it does not handle matching with replacement or weighting), which is the approach we took in Match 1. (We could also have used caliper matching, so long as it was 1:1 and without replacement.) We need to identify, for each matched pair of subjects, whether that pair’s results showed the RHC patient living longer, the No RHC patient living longer, or no clear winner.
Here is some very fussy code that accomplishes this counting. It could probably be done just as well with a case_when statement (a sketch appears after the output below), but it’s a bit tricky.
a1 <- rhc.matches_1 %>%
select(matches_1, swang1, death, survdays) %>%
arrange(matches_1)
a2 <- unite(a1, situation, swang1, death, sep = "_", remove = TRUE)
a3 <- spread(a2, situation, survdays)
a4 <- a3 %>%
filter(complete.cases(`No RHC_Yes`, `RHC_Yes`)) %>%
mutate(result = ifelse(`RHC_Yes` > `No RHC_Yes`, "RHC lived longer", "No RHC lived longer"))
a5 <- a3 %>%
filter(complete.cases(`No RHC_No`, `RHC_No`)) %>%
mutate(result = "No clear winner")
a6 <- a3 %>%
filter(complete.cases(`No RHC_No`, `RHC_Yes`)) %>%
mutate(result = ifelse(`RHC_Yes` > `No RHC_No`, "No clear winner", "No RHC lived longer"))
a7 <- a3 %>%
filter(complete.cases(`No RHC_Yes`, `RHC_No`)) %>%
mutate(result = ifelse(`RHC_No` > `No RHC_Yes`, "RHC lived longer", "No clear winner"))
a8 <- bind_rows(a4, a5, a6, a7)
a8 %>% tabyl(result) %>% adorn_totals()
result n percent
No clear winner 278 0.1272894
No RHC lived longer 988 0.4523810
RHC lived longer 918 0.4203297
Total 2184 1.0000000
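As promised, here is a sketch of the same counting using case_when(). It assumes the column names created by the unite() and spread() steps above (RHC_Yes, No RHC_No, and so on) and handles ties the same way the ifelse() calls do, so it should reproduce the table above.
# Sketch: the same pair-by-pair classification via case_when()
a_alt <- a3 %>%
  mutate(result = case_when(
    !is.na(`RHC_Yes`) & !is.na(`No RHC_Yes`) & `RHC_Yes` > `No RHC_Yes` ~ "RHC lived longer",
    !is.na(`RHC_Yes`) & !is.na(`No RHC_Yes`)                            ~ "No RHC lived longer",
    !is.na(`RHC_No`)  & !is.na(`No RHC_No`)                             ~ "No clear winner",
    !is.na(`RHC_Yes`) & !is.na(`No RHC_No`)  & `RHC_Yes` > `No RHC_No`  ~ "No clear winner",
    !is.na(`RHC_Yes`) & !is.na(`No RHC_No`)                             ~ "No RHC lived longer",
    !is.na(`RHC_No`)  & !is.na(`No RHC_Yes`) & `RHC_No` > `No RHC_Yes`  ~ "RHC lived longer",
    !is.na(`RHC_No`)  & !is.na(`No RHC_Yes`)                            ~ "No clear winner"
  ))
a_alt %>% tabyl(result) %>% adorn_totals()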
So we have counts for the pairs where a clear winner is identified, and we can enter the spreadsheet. We need the number of pairs with a clear winner, and the number of pairs where the No RHC patient lived longer than the RHC patient.
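From the table above, that is 1,906 pairs with a clear winner, of which 988 favored the No RHC patient. A small summary like this (a sketch, using the a8 data frame built above) pulls out exactly those two numbers:
# Sketch: the two counts needed for the sensitivity spreadsheet
a8 %>%
  summarize(pairs_with_clear_winner = sum(result != "No clear winner"),
            no_rhc_lived_longer     = sum(result == "No RHC lived longer"))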
xfun::session_info()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)
Locale:
LC_COLLATE=English_United States.1252
LC_CTYPE=English_United States.1252
LC_MONETARY=English_United States.1252
LC_NUMERIC=C
LC_TIME=English_United States.1252
Package version:
askpass_1.1 assertthat_0.2.1 backports_1.4.1
base64enc_0.1.3 bit_4.0.4 bit64_4.0.5
blob_1.2.2 boot_1.3-28 brio_1.1.3
broom_0.7.12 broom.mixed_0.2.7 bslib_0.3.1
cachem_1.0.6 callr_3.7.0 cellranger_1.1.0
class_7.3-19 cli_3.1.1 clipr_0.7.1
cobalt_4.3.2 coda_0.19.4 colorspace_2.0-2
compiler_4.1.2 conflicted_1.1.0 cpp11_0.4.2
crayon_1.4.2 crosstalk_1.2.0 curl_4.3.2
data.table_1.14.2 DBI_1.1.2 dbplyr_2.1.1
desc_1.4.0 diffobj_0.3.5 digest_0.6.29
dplyr_1.0.8 dtplyr_1.2.1 e1071_1.7-9
ellipsis_0.3.2 evaluate_0.14 fansi_1.0.2
farver_2.1.0 fastmap_1.1.0 forcats_0.5.1
fs_1.5.2 gargle_1.2.0 gbm_2.1.8
gdata_2.18.0 generics_0.1.2 ggdendro_0.1.22
ggforce_0.3.3 ggformula_0.10.1 ggplot2_3.3.5
ggrepel_0.9.1 ggridges_0.5.3 ggstance_0.3.5
glue_1.6.1 gmodels_2.18.1 googledrive_2.0.0
googlesheets4_1.0.0 graphics_4.1.2 grDevices_4.1.2
grid_4.1.2 gridExtra_2.3 gtable_0.3.0
gtools_3.9.2 haven_2.4.3 here_1.0.1
highr_0.9 hms_1.1.1 htmltools_0.5.2
htmlwidgets_1.5.4 httr_1.4.2 ids_1.0.1
isoband_0.2.5 janitor_2.1.0 jpeg_0.1-9
jquerylib_0.1.4 jsonlite_1.7.3 knitr_1.37
labeling_0.4.2 labelled_2.9.0 lattice_0.20-45
latticeExtra_0.6-29 lazyeval_0.2.2 leaflet_2.0.4.1
leaflet.providers_1.9.0 lifecycle_1.0.1 lme4_1.1-28
lubridate_1.8.0 magrittr_2.0.2 markdown_1.1
MASS_7.3-55 Matching_4.9-11 Matrix_1.3-4
MatrixModels_0.5-0 memoise_2.0.1 methods_4.1.2
mgcv_1.8.38 mime_0.12 minqa_1.2.4
mitools_2.4 modelr_0.1.8 mosaic_1.8.3
mosaicCore_0.9.0 mosaicData_0.20.2 munsell_0.5.0
naniar_0.6.1 nlme_3.1-153 nloptr_2.0.0
norm_1.0.9.5 numDeriv_2016.8.1.1 openssl_1.4.6
parallel_4.1.2 pillar_1.7.0 pkgconfig_2.0.3
pkgload_1.2.4 plyr_1.8.6 png_0.1-7
polyclip_1.10-0 praise_1.0.0 prettyunits_1.1.1
processx_3.5.2 progress_1.2.2 proxy_0.4-26
ps_1.6.0 purrr_0.3.4 R6_2.5.1
rappdirs_0.3.3 raster_3.5.15 rbounds_2.1
RColorBrewer_1.1-2 Rcpp_1.0.8 RcppEigen_0.3.3.9.1
readr_2.1.2 readxl_1.3.1 rematch_1.0.1
rematch2_2.1.2 reprex_2.0.1 rgenoud_5.8-3.0
rlang_1.0.1 rmarkdown_2.11 rprojroot_2.0.2
rstudioapi_0.13 rvest_1.0.2 sass_0.4.0
scales_1.1.1 selectr_0.4.2 snakecase_0.11.0
sp_1.4.6 splines_4.1.2 stats_4.1.2
stringi_1.7.6 stringr_1.4.0 survey_4.1-1
survival_3.2-13 sys_3.4 tableone_0.13.0
terra_1.5.17 testthat_3.1.2 tibble_3.1.6
tidyr_1.2.0 tidyselect_1.1.1 tidyverse_1.3.1
tinytex_0.36 tools_4.1.2 twang_2.5
tweenr_1.0.2 tzdb_0.2.0 UpSetR_1.4.0
utf8_1.2.2 utils_4.1.2 uuid_1.0.3
vctrs_0.3.8 viridis_0.6.2 viridisLite_0.4.0
visdat_0.5.3 vroom_1.5.7 waldo_0.3.1
withr_2.4.3 xfun_0.29 xgboost_1.5.0.2
xml2_1.3.3 xtable_1.8-4 yaml_2.2.2
zoo_1.8-9