Authorship Note
Wyatt P. Bensken and Harry Persaud built this example in 2020-2022. Thomas Love converted it to Quarto in 2023 and made some edits. If you notice errors or encounter problems, contact Thomas dot Love at case dot edu.
1 Setup
Code
library(knitr)
opts_chunk$set(comment = NA)
options(max.print = "250")
opts_knit$set(width = 75)

library(janitor)
library(broom)
library(patchwork)
library(cobalt)
library(Matching)
library(tableone)
library(survey)
library(survival)
library(twang)
library(naniar)
library(magrittr)
library(lme4)
library(tidyverse) # load tidyverse last

## Note that we will also use the broom.mixed package
## but we won't load it here

decim <- function(x, k) format(round(x, k), nsmall = k)

theme_set(theme_light()) # set theme for ggplots
Note: we will also use the broom.mixed package, but we are not loading it here, to prevent its functions from conflicting with those of broom.
2 Load data
Information on the lindner dataset can be found at this site.1
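We read in the data from a CSV copy of the lindner data and check for missingness with naniar's n_miss(); the chunk below (from this document's source) assumes the file sits in a data/ subfolder:

Code
lindner_raw <- read_csv("data/lindner.csv", show_col_types = FALSE)
lindner_raw

n_miss(lindner_raw)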
After reading in the data, we can print the first 10 rows to get a sense of what our data looks like. We see the tibble contains information on 996 participants, and there is no missing data.
3 Data management
3.1 Managing binary variables
In the course of this example, we’ll want both a numeric and factored version of each binary variable.
In all numeric versions of binary variables: 1 indicates ‘yes’ to having trait/characteristic, 0 indicates ‘no’ to having trait/characteristic.
Variable names with a trailing "_f" denote the factored version of each binary variable.
Code
# Six month survival (turning logical variable to a factor)
lindner_raw$sixMonthSurvive_f <- factor(lindner_raw$sixMonthSurvive,
                                        levels = c(TRUE, FALSE),
                                        labels = c("yes", "no"))

# Creating numeric (1/0) version of six month survival variable
lindner_raw$sixMonthSurvive <- factor(lindner_raw$sixMonthSurvive_f,
                                      levels = c("yes", "no"),
                                      labels = c(1, 0))
lindner_raw$sixMonthSurvive <- ifelse(lindner_raw$sixMonthSurvive == "1", 1, 0)

# Add variable named treated (same values as abcix variable)
lindner_raw$treated <- lindner_raw$abcix

# Factoring the exposure of interest variable. Change the name to 'treated' too.
lindner_raw$treated_f <- factor(lindner_raw$abcix, levels = c(1, 0),
                                labels = c("treated", "control"))

# Factor version of stent variable
lindner_raw$stent_f <- factor(lindner_raw$stent, levels = c(1, 0),
                              labels = c("yes", "no"))

# Factoring the female variable
lindner_raw$female_f <- factor(lindner_raw$female, levels = c(1, 0),
                               labels = c("female", "male"))

# Factoring the diabetic variable
lindner_raw$diabetic_f <- factor(lindner_raw$diabetic, levels = c(1, 0),
                                 labels = c("yes", "no"))

# Factoring the acutemi variable
lindner_raw$acutemi_f <- factor(lindner_raw$acutemi, levels = c(1, 0),
                                labels = c("yes", "no"))

# Add patientid
lindner_raw <- tibble::rowid_to_column(lindner_raw, "patientid")

# Make lindner dataset with "clean" name.
lindner_clean <- lindner_raw
3.2 Inspecting the clean data
Code
mosaic::inspect(lindner_clean)
4 Codebook
Information was copied from the MatchLinReg package documentation (with some changes to reflect this analysis).
cardbill (Quantitative Outcome): “Cardiac related costs incurred within 6 months of patient’s initial PCI; numeric value in 1998 dollars; costs were truncated by death for the 26 patients with lifepres == 0.”
sixMonthSurvive/sixMonthSurvive_f (Binary Outcome): “Survival at six months a recoded version of lifepres.”
treated/treated_f (Exposure): “Numeric treatment selection indicator; 0 implies usual PCI care alone; 1 implies usual PCI care deliberately augmented by either planned or rescue treatment with abciximab.”
stent/stent_f: “Coronary stent deployment; numeric, with 1 meaning YES and 0 meaning NO.”
height: “Height in centimeters; numeric integer from 108 to 196.”
female/female_f: “Female gender; numeric, with 1 meaning YES and 0 meaning NO.”
diabetic/diabetic_f: “Diabetes mellitus diagnosis; numeric, with 1 meaning YES and 0 meaning NO.”
acutemi/acutemi_f: “Acute myocardial infarction within the previous 7 days; numeric, with 1 meaning YES and 0 meaning NO.”
ejecfrac: “Left ejection fraction; numeric value from 0 percent to 90 percent.”
ves1proc: “Number of vessels involved in the patient’s initial PCI procedure; numeric integer from 0 to 5.”
patientid: a made-up order of numbers to assign patient IDs for later use
Note: PCI stands for Percutaneous Coronary Intervention.
6 Task 1: Ignoring covariates, estimate the effect of treatment vs. control on the two outcomes
6.1 Quantitative outcome: cardbill
Code
lindner_clean %$% mosaic::favstats(cardbill ~ treated_f)

  treated_f  min       Q1 median       Q3    max     mean        sd   n missing
1   treated 3563 10902.25  12944 17080.75  96741 16126.68  9383.825 698       0
2   control 2216  8300.00  10423 15895.75 178534 14614.22 14513.996 298       0
Across the entire sample, the mean ($16,127 vs. $14,614) and median ($12,944 vs. $10,423) cardiac care costs were higher in treated individuals than non-treated controls.
Code
ggplot(lindner_clean, aes(x = cardbill, fill = "cardbill")) +
  geom_histogram(bins = 20) +
  labs(y = "Count",
       x = "Cardiac care costs ($)",
       title = "Cardbill appears to be right skewed") +
  guides(fill = "none")
As we can see in this figure, cardbill appears to be right/positively skewed.
The odds that treated individuals were alive after 6 months were roughly 3.31 times the odds that non-treated individuals were alive after 6 months.
Code
unadjust_binary_outcome <- glm(sixMonthSurvive ~ treated,
                               data = lindner_clean, family = binomial())

unadjust_binary_outcome_tidy <- tidy(unadjust_binary_outcome,
                                     conf.int = TRUE, conf.level = 0.95,
                                     exponentiate = TRUE) %>%
  filter(term == "treated")

unadjust_binary_outcome_tidy |> kable(digits = 3)
term     estimate  std.error  statistic  p.value  conf.low  conf.high
treated      3.31      0.403      2.969    0.003     1.511      7.479
The odds of being alive after six months for treated individuals were 3.31 (95% CI: 1.51, 7.48) times the odds for non-treated controls.
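As a quick sanity check (a small illustrative sketch, not part of the original analysis), we can recover the same unadjusted odds ratio directly from the 2x2 table:

Code
## hand-compute the unadjusted odds ratio from the 2x2 table
tab <- table(lindner_clean$treated_f, lindner_clean$sixMonthSurvive_f)
odds_treated <- tab["treated", "yes"] / tab["treated", "no"]
odds_control <- tab["control", "yes"] / tab["control", "no"]
odds_treated / odds_control ## should match the exponentiated glm coefficient, ~3.31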
7 Task 2: Fitting the propensity score model
We will now fit the propensity score model, which predicts treatment status from the available covariates. Remember, we're not worried about overfitting (including too many covariates) when estimating the propensity scores.
Code
psmodel <- glm(treated ~ stent + height + female + diabetic +
                 acutemi + ejecfrac + ves1proc,
               family = binomial(), data = lindner_clean)

summary(psmodel)
Call:
glm(formula = treated ~ stent + height + female + diabetic +
acutemi + ejecfrac + ves1proc, family = binomial(), data = lindner_clean)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.965651 1.731085 1.713 0.08668 .
stent 0.573018 0.150454 3.809 0.00014 ***
height -0.015366 0.009534 -1.612 0.10700
female -0.359060 0.206904 -1.735 0.08267 .
diabetic -0.406810 0.170623 -2.384 0.01711 *
acutemi 1.199548 0.270468 4.435 9.20e-06 ***
ejecfrac -0.014789 0.007403 -1.998 0.04574 *
ves1proc 0.760502 0.138437 5.493 3.94e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1215.5 on 995 degrees of freedom
Residual deviance: 1124.3 on 988 degrees of freedom
AIC: 1140.3
Number of Fisher Scoring iterations: 4
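We then store the raw and linear propensity scores for later use (this step comes from this document's source):

Code
lindner_clean$ps <- psmodel$fitted
lindner_clean$linps <- psmodel$linear.predictors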
We also "fail" Rubin's Rule 2, where we are looking for a value between 0.8 and 1.2 (ideally, 1); our pre-adjustment value is 1.67.
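For reference, Rubin's Rule 2 before adjustment is the ratio of the variances of the linear propensity score across the two groups (computed as in this document's source):

Code
rubin2.unadj <- with(lindner_clean, var(linps[treated == 1]) / var(linps[treated == 0]))
rubin2.unadj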
9 Task 4: Greedy 1:1 matching on the linear PS
The first type of match we will conduct is greedy 1:1 matching, without replacement. As we had only 298 controls, we will not match all of the 698 treated patients.
Code
X <- lindner_clean$linps ## matching on the linear propensity score
Tr <- as.logical(lindner_clean$treated)
match1 <- Match(Tr = Tr, X = X, M = 1, replace = FALSE, ties = FALSE)
Warning in Match(Tr = Tr, X = X, M = 1, replace = FALSE, ties = FALSE):
replace==FALSE, but there are more (weighted) treated obs than control obs.
Some treated obs will not be matched. You may want to estimate ATC instead.
Code
summary(match1)
Estimate... 0
SE......... 0
T-stat..... NaN
p.val...... NA
Original number of observations.............. 996
Original number of treated obs............... 698
Matched number of observations............... 298
Matched number of observations (unweighted). 298
Below we’ll assess the match balance from the 1:1 matching.
***** (V1) stent *****
Before Matching After Matching
mean treatment........ 0.70487 0.67114
mean control.......... 0.58389 0.58389
std mean diff......... 26.505 18.54
mean raw eQQ diff..... 0.12081 0.087248
med raw eQQ diff..... 0 0
max raw eQQ diff..... 1 1
mean eCDF diff........ 0.060489 0.043624
med eCDF diff........ 0.060489 0.043624
max eCDF diff........ 0.12098 0.087248
var ratio (Tr/Co)..... 0.85457 0.90842
T-test p-value........ 0.00032255 0.011186
***** (V2) height *****
Before Matching After Matching
mean treatment........ 171.44 171.14
mean control.......... 171.45 171.45
std mean diff......... -0.033804 -2.759
mean raw eQQ diff..... 0.56376 0.7953
med raw eQQ diff..... 0 0
max raw eQQ diff..... 20 22
mean eCDF diff........ 0.0078996 0.0105
med eCDF diff........ 0.0060095 0.0067114
max eCDF diff........ 0.024971 0.033557
var ratio (Tr/Co)..... 1.0201 1.0925
T-test p-value........ 0.99608 0.72937
KS Bootstrap p-value.. 0.958 0.912
KS Naive p-value...... 0.99947 0.99608
KS Statistic.......... 0.024971 0.033557
***** (V3) female *****
Before Matching After Matching
mean treatment........ 0.33095 0.3557
mean control.......... 0.38591 0.38591
std mean diff......... -11.672 -6.2981
mean raw eQQ diff..... 0.053691 0.030201
med raw eQQ diff..... 0 0
max raw eQQ diff..... 1 1
mean eCDF diff........ 0.02748 0.015101
med eCDF diff........ 0.02748 0.015101
max eCDF diff........ 0.05496 0.030201
var ratio (Tr/Co)..... 0.93253 0.96707
T-test p-value........ 0.10045 0.42827
***** (V4) diabetic *****
Before Matching After Matching
mean treatment........ 0.20487 0.19128
mean control.......... 0.26846 0.26846
std mean diff......... -15.743 -19.591
mean raw eQQ diff..... 0.063758 0.077181
med raw eQQ diff..... 0 0
max raw eQQ diff..... 1 1
mean eCDF diff........ 0.031793 0.038591
med eCDF diff........ 0.031793 0.038591
max eCDF diff........ 0.063585 0.077181
var ratio (Tr/Co)..... 0.82788 0.78767
T-test p-value........ 0.03402 0.015484
***** (V5) acutemi *****
Before Matching After Matching
mean treatment........ 0.17908 0.1745
mean control.......... 0.060403 0.060403
std mean diff......... 30.931 30.011
mean raw eQQ diff..... 0.11745 0.11409
med raw eQQ diff..... 0 0
max raw eQQ diff..... 1 1
mean eCDF diff........ 0.05934 0.057047
med eCDF diff........ 0.05934 0.057047
max eCDF diff........ 0.11868 0.11409
var ratio (Tr/Co)..... 2.5853 2.5381
T-test p-value........ 4.6617e-09 3.8175e-06
***** (V6) ejecfrac *****
Before Matching After Matching
mean treatment........ 50.403 49.748
mean control.......... 52.289 52.289
std mean diff......... -18.102 -23.453
mean raw eQQ diff..... 2.0503 2.5403
med raw eQQ diff..... 1 1
max raw eQQ diff..... 20 20
mean eCDF diff........ 0.035602 0.045818
med eCDF diff........ 0.011423 0.028523
max eCDF diff........ 0.11383 0.13758
var ratio (Tr/Co)..... 1.0238 1.1065
T-test p-value........ 0.0085806 0.0014413
KS Bootstrap p-value.. 0.002 < 2.22e-16
KS Naive p-value...... 0.0089219 0.0070991
KS Statistic.......... 0.11383 0.13758
***** (V7) ves1proc *****
Before Matching After Matching
mean treatment........ 1.4628 1.5101
mean control.......... 1.2047 1.2047
std mean diff......... 36.545 42.62
mean raw eQQ diff..... 0.2651 0.30537
med raw eQQ diff..... 0 0
max raw eQQ diff..... 1 1
mean eCDF diff........ 0.043323 0.050895
med eCDF diff........ 0.0090671 0.0083893
max eCDF diff........ 0.18842 0.22819
var ratio (Tr/Co)..... 2.1614 2.2254
T-test p-value........ 4.21e-11 2.2551e-11
KS Bootstrap p-value.. < 2.22e-16 < 2.22e-16
KS Naive p-value...... 7.2635e-07 3.649e-07
KS Statistic.......... 0.18842 0.22819
***** (V8) ps *****
Before Matching After Matching
mean treatment........ 0.7265 0.72904
mean control.......... 0.64061 0.64061
std mean diff......... 66.092 66.768
mean raw eQQ diff..... 0.085216 0.088433
med raw eQQ diff..... 0.081353 0.084782
max raw eQQ diff..... 0.12087 0.1245
mean eCDF diff........ 0.17141 0.17874
med eCDF diff........ 0.17768 0.17785
max eCDF diff........ 0.27599 0.28859
var ratio (Tr/Co)..... 1.1161 1.1593
T-test p-value........ < 2.22e-16 < 2.22e-16
KS Bootstrap p-value.. < 2.22e-16 < 2.22e-16
KS Naive p-value...... 3.042e-14 3.3294e-11
KS Statistic.......... 0.27599 0.28859
***** (V9) linps *****
Before Matching After Matching
mean treatment........ 1.1148 1.1369
mean control.......... 0.63332 0.63332
std mean diff......... 60.484 61.204
mean raw eQQ diff..... 0.4787 0.50359
med raw eQQ diff..... 0.35992 0.38066
max raw eQQ diff..... 1.0113 1.0858
mean eCDF diff........ 0.17141 0.17874
med eCDF diff........ 0.17768 0.17785
max eCDF diff........ 0.27599 0.28859
var ratio (Tr/Co)..... 1.672 1.7863
T-test p-value........ < 2.22e-16 < 2.22e-16
KS Bootstrap p-value.. < 2.22e-16 < 2.22e-16
KS Naive p-value...... 3.042e-14 3.3294e-11
KS Statistic.......... 0.27599 0.28859
Before Matching Minimum p.value: < 2.22e-16
Variable Name(s): ves1proc ps linps Number(s): 7 8 9
After Matching Minimum p.value: < 2.22e-16
Variable Name(s): ejecfrac ves1proc ps linps Number(s): 6 7 8 9
The new value for Rubin’s Rule 2 is 1.79. This does not pass Rubin’s Rule 2 and is not an improvement from the pre-match value of 1.67.
10 Task 5: Outcomes after 1:1 Matching Without Replacement
Since the 1:1 match without replacement didn’t produce a useful match, we’ll skip this, and move on to more fruitful uses of the propensity score in this setting.
11 Task 6: 1:1 Matching with Replacement
In our attempt at 1:1 matching without replacement, 400 treated participants were excluded from the sample. This is a waste of data and we’ll address this by again matching 1 treated participant to 1 control participant. However, this time we’ll match with replacement, meaning each time a control participant is matched to a treated participant, the control participant will be placed back into the pool of possible patients a treated individual can be matched to.
Thus, some control participants will be matched multiple times (and not every control participant has to be matched to a treated participant). In the Lindner dataset, 1:1 matching with replacement is a more reasonable choice than matching without replacement.
Code
X <- lindner_clean$linps ## matching on the linear propensity score
Tr <- as.logical(lindner_clean$treated)
match1 <- Match(Tr = Tr, X = X, M = 1, replace = TRUE, ties = FALSE) # notice replace = TRUE
summary(match1)
Estimate... 0
SE......... 0
T-stat..... NaN
p.val...... NA
Original number of observations.............. 996
Original number of treated obs............... 698
Matched number of observations............... 698
Matched number of observations (unweighted). 698
As you can see, this time we matched all 698 treated individuals, using 698 control matches. To reiterate: because we matched with replacement, and there were fewer control participants than treated participants, some control participants were matched multiple times.
Below we’ll assess the match balance from the 1:1 matching with replacement.
***** (V1) stent *****
Before Matching After Matching
mean treatment........ 0.70487 0.70487
mean control.......... 0.58389 0.73782
std mean diff......... 26.505 -7.2194
mean raw eQQ diff..... 0.12081 0.032951
med raw eQQ diff..... 0 0
max raw eQQ diff..... 1 1
mean eCDF diff........ 0.060489 0.016476
med eCDF diff........ 0.060489 0.016476
max eCDF diff........ 0.12098 0.032951
var ratio (Tr/Co)..... 0.85457 1.0754
T-test p-value........ 0.00032255 0.081877
***** (V2) height *****
Before Matching After Matching
mean treatment........ 171.44 171.44
mean control.......... 171.45 171.5
std mean diff......... -0.033804 -0.52243
mean raw eQQ diff..... 0.56376 0.80372
med raw eQQ diff..... 0 0
max raw eQQ diff..... 20 22
mean eCDF diff........ 0.0078996 0.010546
med eCDF diff........ 0.0060095 0.008596
max eCDF diff........ 0.024971 0.035817
var ratio (Tr/Co)..... 1.0201 0.80708
T-test p-value........ 0.99608 0.92306
KS Bootstrap p-value.. 0.97 0.506
KS Naive p-value...... 0.99947 0.76185
KS Statistic.......... 0.024971 0.035817
***** (V3) female *****
Before Matching After Matching
mean treatment........ 0.33095 0.33095
mean control.......... 0.38591 0.2937
std mean diff......... -11.672 7.9104
mean raw eQQ diff..... 0.053691 0.037249
med raw eQQ diff..... 0 0
max raw eQQ diff..... 1 1
mean eCDF diff........ 0.02748 0.018625
med eCDF diff........ 0.02748 0.018625
max eCDF diff........ 0.05496 0.037249
var ratio (Tr/Co)..... 0.93253 1.0674
T-test p-value........ 0.10045 0.087608
***** (V4) diabetic *****
Before Matching After Matching
mean treatment........ 0.20487 0.20487
mean control.......... 0.26846 0.21633
std mean diff......... -15.743 -2.8377
mean raw eQQ diff..... 0.063758 0.011461
med raw eQQ diff..... 0 0
max raw eQQ diff..... 1 1
mean eCDF diff........ 0.031793 0.0057307
med eCDF diff........ 0.031793 0.0057307
max eCDF diff........ 0.063585 0.011461
var ratio (Tr/Co)..... 0.82788 0.96087
T-test p-value........ 0.03402 0.54429
***** (V5) acutemi *****
Before Matching After Matching
mean treatment........ 0.17908 0.17908
mean control.......... 0.060403 0.16046
std mean diff......... 30.931 4.854
mean raw eQQ diff..... 0.11745 0.018625
med raw eQQ diff..... 0 0
max raw eQQ diff..... 1 1
mean eCDF diff........ 0.05934 0.0093123
med eCDF diff........ 0.05934 0.0093123
max eCDF diff........ 0.11868 0.018625
var ratio (Tr/Co)..... 2.5853 1.0913
T-test p-value........ 4.6617e-09 0.20446
***** (V6) ejecfrac *****
Before Matching After Matching
mean treatment........ 50.403 50.403
mean control.......... 52.289 50.871
std mean diff......... -18.102 -4.4965
mean raw eQQ diff..... 2.0503 0.82378
med raw eQQ diff..... 1 0
max raw eQQ diff..... 20 20
mean eCDF diff........ 0.035602 0.013079
med eCDF diff........ 0.011423 0.0057307
max eCDF diff........ 0.11383 0.065903
var ratio (Tr/Co)..... 1.0238 1.0948
T-test p-value........ 0.0085806 0.36071
KS Bootstrap p-value.. 0.002 0.03
KS Naive p-value...... 0.0089219 0.096474
KS Statistic.......... 0.11383 0.065903
***** (V7) ves1proc *****
Before Matching After Matching
mean treatment........ 1.4628 1.4628
mean control.......... 1.2047 1.457
std mean diff......... 36.545 0.81157
mean raw eQQ diff..... 0.2651 0.051576
med raw eQQ diff..... 0 0
max raw eQQ diff..... 1 1
mean eCDF diff........ 0.043323 0.008596
med eCDF diff........ 0.0090671 0.0078797
max eCDF diff........ 0.18842 0.018625
var ratio (Tr/Co)..... 2.1614 1.0956
T-test p-value........ 4.21e-11 0.82203
KS Bootstrap p-value.. < 2.22e-16 0.584
KS Naive p-value...... 7.2635e-07 0.99973
KS Statistic.......... 0.18842 0.018625
***** (V8) ps *****
Before Matching After Matching
mean treatment........ 0.7265 0.7265
mean control.......... 0.64061 0.72619
std mean diff......... 66.092 0.2413
mean raw eQQ diff..... 0.085216 0.001401
med raw eQQ diff..... 0.081353 0.0006372
max raw eQQ diff..... 0.12087 0.021689
mean eCDF diff........ 0.17141 0.0031904
med eCDF diff........ 0.17768 0.0014327
max eCDF diff........ 0.27599 0.024355
var ratio (Tr/Co)..... 1.1161 1.0082
T-test p-value........ < 2.22e-16 0.0022796
KS Bootstrap p-value.. < 2.22e-16 0.974
KS Naive p-value...... 3.042e-14 0.98578
KS Statistic.......... 0.27599 0.024355
***** (V9) linps *****
Before Matching After Matching
mean treatment........ 1.1148 1.1148
mean control.......... 0.63332 1.1079
std mean diff......... 60.484 0.86582
mean raw eQQ diff..... 0.4787 0.016273
med raw eQQ diff..... 0.35992 0.0028864
max raw eQQ diff..... 1.0113 0.75735
mean eCDF diff........ 0.17141 0.0031904
med eCDF diff........ 0.17768 0.0014327
max eCDF diff........ 0.27599 0.024355
var ratio (Tr/Co)..... 1.672 1.0465
T-test p-value........ < 2.22e-16 0.0014818
KS Bootstrap p-value.. < 2.22e-16 0.974
KS Naive p-value...... 3.042e-14 0.98578
KS Statistic.......... 0.27599 0.024355
Before Matching Minimum p.value: < 2.22e-16
Variable Name(s): ves1proc ps linps Number(s): 7 8 9
After Matching Minimum p.value: 0.0014818
Variable Name(s): linps Number(s): 9
The new value for Rubin’s Rule 1 is 0.88. This value passes Rubin’s Rule 1 and is an improvement from the Rubin’s Rule 1 value obtained during 1:1 matching without replacement, 65.53. The pre-match value was 61.87.
The new value for Rubin’s Rule 2 is 1.05. This passes Rule 2 and is an improvement from the Rubin’s Rule 2 value obtained during 1:1 matching without replacement, 1.79. The pre-match value was 1.67.
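These values come from re-running the Rubin's Rules calculations within the matched sample (as in this document's source):

Code
rubin1.match.rep <- with(lindner_clean.matchedsample,
                         abs(100 * (mean(linps[treated == 1]) -
                                      mean(linps[treated == 0])) / sd(linps)))
rubin1.match.rep

rubin2.match.rep <- with(lindner_clean.matchedsample,
                         var(linps[treated == 1]) / var(linps[treated == 0]))
rubin2.match.rep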
11.7 Estimating the causal effect of the treatment on both outcomes after 1:1 matching with replacement
11.7.1 The Quantitative Outcome
We’ll use a mixed model to estimate the effect of the treatment on cardbill. The matches will be treated as a random effect in the model (syntax: (1 | matches.f)), and the treatment group will be treated as a fixed effect. We will use restricted maximum likelihood (REML) to estimate the coefficients.
Code
# to appease lme4, factor the matches
lindner_clean.matchedsample$matches.f <-
  as.factor(lindner_clean.matchedsample$matches)

# fit the mixed model
matched_mixedmodel.rep.out1 <- lmer(cardbill ~ treated + (1 | matches.f),
                                    REML = TRUE,
                                    data = lindner_clean.matchedsample)

summary(matched_mixedmodel.rep.out1)
Linear mixed model fit by REML ['lmerMod']
Formula: cardbill ~ treated + (1 | matches.f)
Data: lindner_clean.matchedsample
REML criterion at convergence: 30291.8
Scaled residuals:
Min 1Q Median 3Q Max
-1.0996 -0.4634 -0.2698 0.1000 12.7641
Random effects:
Groups Name Variance Std.Dev.
matches.f (Intercept) 3053638 1747
Residual 155732228 12479
Number of obs: 1396, groups: matches.f, 698
Fixed effects:
Estimate Std. Error t value
(Intercept) 16300 477 34.174
treated -173 668 -0.259
Correlation of Fixed Effects:
(Intr)
treated -0.700
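The estimate and 95% CI quoted below come from tidying the mixed model with broom.mixed (per this document's source):

Code
tidy_mixed_matched_rep <- broom.mixed::tidy(matched_mixedmodel.rep.out1,
                                            conf.int = TRUE, conf.level = 0.95) %>%
  filter(term == "treated")

tidy_mixed_matched_rep |> kable(digits = 3)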
Treated individuals were estimated to spend $173.03 less (95% CI: -1482.29, 1136.22) than non-treated individuals: so the treated will spend somewhat less. Zero is inside that 95% CI, so a formal sensitivity analysis on the quantitative outcome using Rosenbaum bounds will still not make sense.
Code
# sanity check for model
lindner_clean.matchedsample %>%
  group_by(treated_f) %>%
  summarise(mean_card = mean(cardbill))
# A tibble: 2 × 2
treated_f mean_card
<fct> <dbl>
1 treated 16127.
2 control 16300.
The mixed model above estimated that treated individuals would spend roughly $173 less than control participants. A quick check of the mean cardbill within the matched sample confirms that the mixed model results make sense.
11.7.2 The Binary Outcome
We will use conditional logistic regression to estimate the log odds (and ORs) of being alive after 6 months based on treatment status.
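The model stratifies on the matched pairs, and we then tidy the fit to obtain the odds ratio and its 95% CI (following this document's source):

Code
binary_outcome_adjusted_rep <- survival::clogit(sixMonthSurvive ~ treated + strata(matches),
                                                data = lindner_clean.matchedsample)
summary(binary_outcome_adjusted_rep)

# tidy to obtain the odds ratio and 95% CI
tidy_binary_outcome_adjusted_rep <- tidy(binary_outcome_adjusted_rep,
                                         exponentiate = TRUE, conf.int = TRUE)
tidy_binary_outcome_adjusted_rep |> kable(digits = 3)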
The odds of being alive after six months were 5.6 times higher in treated individuals than in non-treated controls (95% CI: 2.86, 10.98).
12 Task 7: Subclassification by Propensity Score Quintile
Code
# cut into quintiles
lindner_clean$stratum <- Hmisc::cut2(lindner_clean$ps, g = 5)
lindner_clean$quintile <- factor(lindner_clean$stratum, labels = 1:5)

# Sanity check: make sure quintiles are similar in size, etc.
lindner_clean %>% count(stratum, quintile)
12.1 Check Balance and Propensity Score Overlap in Each Quintile
12.1.1 Numerically
Only 20 controls fell in the highest quintile, which seems a bit low.
Code
lindner_clean %>% count(quintile, treated_f)
# A tibble: 10 × 3
quintile treated_f n
<fct> <fct> <int>
1 1 treated 105
2 1 control 95
3 2 treated 124
4 2 control 75
5 3 treated 135
6 3 control 65
7 4 treated 156
8 4 control 43
9 5 treated 178
10 5 control 20
12.1.2 Graphically
Code
ggplot(lindner_clean, aes(x = treated_f, y = round(ps, 2),
                          group = quintile, color = treated_f)) +
  geom_jitter(width = 0.2) +
  guides(color = "none") +
  facet_wrap(~ quintile, scales = "free_y") +
  labs(x = "", y = "Propensity for Treatment",
       title = "Quintile Subclassification in the Lindner data")
12.2 Creating a Standardized Difference Calculation Function
Here we implement Dr. Love’s function to calculate the standardized differences, which is utilized below.
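The function (reproduced from this document's source) takes a set of covariates and a grouping variable, and returns the standardized difference, in percent, for each covariate:

Code
szd <- function(covlist, g) {
  covlist2 <- as.matrix(covlist)
  g <- as.factor(g)
  res <- NA
  for(i in 1:ncol(covlist2)) {
    cov <- as.numeric(covlist2[, i])
    num <- 100 * diff(tapply(cov, g, mean, na.rm = TRUE))  # difference in group means
    den <- sqrt(mean(tapply(cov, g, var, na.rm = TRUE)))   # pooled standard deviation
    res[i] <- round(num / den, 2)
  }
  names(res) <- names(covlist)
  res
}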
Now we can plot the standardized differences after ATT weighting.
Code
ggplot(balance.att.weights, aes(x = szd, y = reorder(names, szd), color = timing)) +
  geom_point() +
  geom_vline(xintercept = 0) +
  geom_vline(xintercept = c(-10, 10), linetype = "dashed", col = "blue") +
  labs(x = "Standardized Difference", y = "",
       title = "Standardized Difference before and after ATT Weighting")
The standardized differences look much better with this approach.
14.2.1 Rubin’s Rules
14.2.1.1 Rule 1
Numbers from the linps row in part [2] of the balance table above: |-0.048| × 100 = 4.8%, which is well below 50%. So we pass Rule 1.
14.2.1.2 Rule 2
Numbers from the linps row in part [2] of the balance table above: (0.796²)/(0.839²) = 0.9001237. Passes Rule 2.
14.2.2 Estimated effect on outcomes after ATT weighting
14.2.2.1 Quantitative outcome
To estimate the effect of the treatment on cardbill, we’ll use svyglm from the survey package to apply the ATT weights in a linear model.
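Recall how the weights were built (per this document's source): ATT weights (wts1) give treated subjects a weight of 1 and controls ps/(1 - ps), while ATE weights (wts2, used later) give treated subjects 1/ps and controls 1/(1 - ps).

Code
lindner_clean$wts1 <- ifelse(lindner_clean$treated == 1, 1,
                             lindner_clean$ps / (1 - lindner_clean$ps)) # ATT weights

lindner_clean$wts2 <- ifelse(lindner_clean$treated == 1, 1 / lindner_clean$ps,
                             1 / (1 - lindner_clean$ps))                # ATE weights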
Code
lindnerwt1.design <- svydesign(ids = ~1, weights = ~wts1,
                               data = lindner_clean) # using ATT weights

adjout1.wt1 <- svyglm(cardbill ~ treated, design = lindnerwt1.design)

wt_att_results1 <- tidy(adjout1.wt1, conf.int = TRUE) %>%
  filter(term == "treated")

wt_att_results1 |> kable(digits = 3)
Code
ggplot(balance.ate.weights, aes(x = szd, y = reorder(names, szd), color = timing)) +
  geom_point() +
  geom_vline(xintercept = 0) +
  geom_vline(xintercept = c(-10, 10), linetype = "dashed", col = "blue") +
  labs(x = "Standardized Difference", y = "",
       title = "Standardized Difference before and after ATE Weighting")
Again, the standardized differences look good here.
14.3.1 Rubin’s Rules
14.3.1.1 Rule 1
|-0.034| × 100 = 3.4%. Passes Rule 1 (numbers from the ATE weight balance table above).
14.3.1.2 Rule 2
(0.774²)/(0.815²) = 0.9019173. Passes Rule 2 (numbers from the ATE weight balance table above).
14.3.2 Estimated effect on outcomes after ATE weighting
14.3.2.1 Quantitative outcome
Code
lindnerwt2.design <- svydesign(ids = ~1, weights = ~wts2,
                               data = lindner_clean) # using ATE weights

adjout1.wt2 <- svyglm(cardbill ~ treated, design = lindnerwt2.design)

wt_ate_results1 <- tidy(adjout1.wt2, conf.int = TRUE) %>%
  filter(term == "treated")

wt_ate_results1 |> kable(digits = 3)
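The love.plot below summarizes balance after TWANG's generalized boosted model fit; the ps.toy object it uses was created with twang::ps() (reproduced from this document's source; lindner_clean_df is the data.frame version of lindner_clean built for twang):

Code
ps.toy <- ps(treated ~ stent + height + female + diabetic +
               acutemi + ejecfrac + ves1proc,
             data = lindner_clean_df,
             n.trees = 3000,
             interaction.depth = 2,
             stop.method = c("es.mean"),
             estimand = "ATT",
             verbose = FALSE)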
Code
p <- love.plot(bal.tab(ps.toy),
               threshold = .1, size = 1.5,
               title = "Standardized Differences and TWANG ATT Weighting")
Warning: Standardized mean differences and raw mean differences are present in the same plot.
Use the `stars` argument to distinguish between them and appropriately label the x-axis.
Code
p
Compared to the manual ATT/ATE weights, the standardized differences look a bit worse here.
15.1 Estimated effect on outcomes after TWANG ATT weighting
15.1.1 Quantitative outcome
Code
toywt3.design <- svydesign(ids = ~1,
                           weights = ~get.weights(ps.toy, stop.method = "es.mean"),
                           data = lindner_clean) # using twang ATT weights

adjout1.wt3 <- svyglm(cardbill ~ treated, design = toywt3.design)

wt_twangatt_results1 <- tidy(adjout1.wt3, conf.int = TRUE) %>%
  filter(term == "treated")

wt_twangatt_results1 |> kable(digits = 3)
Kereiakes DJ, Obenchain RL, Barber BL, et al. Abciximab provides cost effective survival advantage in high volume interventional practice. Am Heart J 2000; 140: 603-610.↩︎
Source Code
---title: "The Lindner Example"author: "Wyatt P. Bensken and Harry Persaud"date: last-modifiedformat: html: toc: true number-sections: true code-fold: show code-tools: true code-overflow: wrap embed-resources: true date-format: iso theme: pulse ---### Authorship Note {.unnumbered}Wyatt P. Bensken and Harry Persaud built this example in 2020-2022. Thomas Love converted it to Quarto in 2023, and made some edits. If you notice errors or encounter problems, contact Thomas dot Love at case dot edu.# Setup ```{r}#| message: false#| warning: falselibrary(knitr)opts_chunk$set(comment =NA) options(max.print="250")opts_knit$set(width=75)library(janitor) library(broom)library(patchwork)library(cobalt)library(Matching)library(tableone)library(survey)library(survival)library(twang)library(naniar)library(magrittr)library(lme4)library(tidyverse) # load tidyverse last## Note that we will also use the broom.mixed package## but we won't load it heredecim <-function(x, k) format(round(x, k), nsmall=k)theme_set(theme_light()) # set theme for ggplots```Note: we will also use the `broom.mixed` package, but we are not loading it as to prevent it conflicting with the functions of `broom`.# Load dataInformation on the lindner dataset can be found at [at this site](https://search.r-project.org/CRAN/refmans/twang/html/lindner.html)[^1].[^1]: Kereiakes DJ, Obenchain RL, Barber BL, et al. Abciximab provides cost effective survival advantage in high volume interventional practice. *Am Heart J* 2000; 140: 603-610.```{r}lindner_raw <-read_csv("data/lindner.csv", show_col_types =FALSE)lindner_raw n_miss(lindner_raw)```After reading in the data, we can print the first 10 rows to get a sense of what our data looks like. We see the tibble contains information on 996 participants, and there is no missing data.# Data managment## Managing binary variablesIn the course of this example, we'll want both a numeric and factored version of each binary variable. - In all numeric versions of binary variables: 1 indicates 'yes' to having trait/characteristic, 0 indicates 'no' to having trait/characteristic.- Variable names with trailing "_f" denotes the factored version of each binary variable.```{r}# Six month survival (turning logical variable to a factor)lindner_raw$sixMonthSurvive_f <-factor(lindner_raw$sixMonthSurvive, levels =c(TRUE,FALSE),labels =c("yes", "no"))# Creating numeric (1/0) version of six month survival variablelindner_raw$sixMonthSurvive <-factor(lindner_raw$sixMonthSurvive_f, levels =c("yes","no"),labels =c(1, 0))lindner_raw$sixMonthSurvive <-ifelse(lindner_raw$sixMonthSurvive =="1", 1, 0)#Add variable named treated (same values as abcix variable)lindner_raw$treated <- lindner_raw$abcix# Factoring the exposure of interest variable. Change the name to 'treated' too.lindner_raw$treated_f <-factor(lindner_raw$abcix, levels =c(1,0),labels =c("treated", "control"))# Factor version of stent variablelindner_raw$stent_f <-factor(lindner_raw$stent, levels =c(1,0),labels =c("yes", "no"))# Factoring the female variablelindner_raw$female_f <-factor(lindner_raw$female, levels =c(1,0),labels =c("female", "male"))# Factoring the diabetic variablelindner_raw$diabetic_f <-factor(lindner_raw$diabetic, levels =c(1,0),labels =c("yes", "no"))# Factoring the acutemi variablelindner_raw$acutemi_f <-factor(lindner_raw$acutemi, levels =c(1,0),labels =c("yes", "no"))# Add patientidlindner_raw <- tibble::rowid_to_column(lindner_raw, "patientid")# Make lindner dataset with "clean" name. 
lindner_clean <- lindner_raw```## Inspecting the clean data```{r}mosaic::inspect(lindner_clean)```# CodebookInformation was copy/pasted from [here](https://www.rdocumentation.org/packages/MatchLinReg/versions/0.7.0/topics/lindner) (with some changes to reflect this analysis)- `cardbill` (**Quantitative Outcome**): "Cardiac related costs incurred within 6 months of patient's initial PCI; numeric value in 1998 dollars; costs were truncated by death for the 26 patients with lifepres == 0."- `sixMonthSurvive`/`sixMonthSurvive_f` (**Binary Outcome**): "Survival at six months a recoded version of lifepres."- `treated`/`treated_f` (**Exposure**): "Numeric treatment selection indicator; 0 implies usual PCI care alone; 1 implies usual PCI care deliberately augmented by either planned or rescue treatment with abciximab."- `stent`/`stent_f`: "Coronary stent deployment; numeric, with 1 meaning YES and 0 meaning NO."- `height`: "Height in centimeters; numeric integer from 108 to 196."- `female`/`female_f`: "Female gender; numeric, with 1 meaning YES and 0 meaning NO."- `diabetic`/`diabetic_f`: "Diabetes mellitus diagnosis; numeric, with 1 meaning YES and 0 meaning NO."- `acutemi`/`acutemi_f`: "Acute myocardial infarction within the previous 7 days; numeric, with 1 meaning YES and 0 meaning NO."- `ejecfrac`: "Left ejection fraction; numeric value from 0 percent to 90 percent."- `ves1proc`: "Number of vessels involved in the patient's initial PCI procedure; numeric integer from 0 to 5."- `patientid`: a made-up order of numbers to assign patient IDs for later use* Note: Percutaneous Coronary Intervention (PCI)# Table 1```{r}var_list =c("cardbill", "sixMonthSurvive_f", "stent_f", "height", "female_f", "diabetic_f", "acutemi_f", "ejecfrac", "ves1proc")factor_list =c("sixMonthSurvive_f", "stent_f", "female_f", "diabetic_f", "acutemi_f")CreateTableOne(vars = var_list, strata ="treated_f",data = lindner_clean, factorVars = factor_list)```As we can see, the mean `cardbill` was higher in the treated population and a larger percentage of controls did not survive through 6 months.# Task 1: Ignoring covariates, estimate the effect of treatment vs. control on the two outcomes## Quantitative outcome: `cardbill````{r}lindner_clean %$% mosaic::favstats(cardbill ~ treated_f)```- Across the entire sample, the mean (\$16,127 vs. \$14,614) and median (\$12,944 vs. 
\$10,423) cardiac care costs were higher in treated individuals than non-treated controls.```{r}ggplot(lindner_clean, aes(x = cardbill, fill ="cardbill")) +geom_histogram(bins =20) +labs(y ="Count",x ="Cardiac care costs ($)",title ="Cardbill appears to be right skewed") +guides(fill ="none")```As we can see in this figure, `cardbill` appears to be right/positively skewed.```{r}unadjust_quant_outcome <-lm(cardbill ~ treated, data = lindner_clean)unadjust_quant_outcome_tidy <-tidy(unadjust_quant_outcome, conf.int =TRUE, conf.level =0.95) %>%filter(term =="treated")unadjust_quant_outcome_tidy |>kable(digits =3)```Treated individuals were estimated to spend `r round(unadjust_quant_outcome_tidy$estimate,2)` (95% CI: `r round(unadjust_quant_outcome_tidy$conf.low,2)`, `r round(unadjust_quant_outcome_tidy$conf.high,2)`) more dollars than non-treated controls ## Binary outcome: `sixMonthSurvive````{r}Epi::twoby2(table(lindner_clean$treated_f, lindner_clean$sixMonthSurvive_f))```The odds treated individuals were alive after 6 months was roughly 3.31 times the odds that non-treated individuals were alive after 6 months.```{r}unadjust_binary_outcome <-glm(sixMonthSurvive ~ treated, data = lindner_clean, family =binomial())unadjust_binary_outcome_tidy <-tidy(unadjust_binary_outcome, conf.int =TRUE, conf.level =0.95, exponentiate =TRUE) %>%filter(term =="treated")unadjust_binary_outcome_tidy |>kable(digits =3)```The odds of being alive after six months in treated individuals was `r round(unadjust_binary_outcome_tidy$estimate,2)` (95% CI: `r round(unadjust_binary_outcome_tidy$conf.low,2)`, `r round(unadjust_binary_outcome_tidy$conf.high,2)`) times higher than the odds that a non-treated control would be alive after six months.# Task 2: Fitting the propensity score modelWe will now fit the propensity score, which predicts treatment status based on available covariates. Remember, we're not worried about overfitting (including too many covariates) when calculating the propensity scores.```{r}psmodel <-glm(treated ~ stent + height + female + diabetic + acutemi + ejecfrac + ves1proc, family =binomial(), data =lindner_clean)summary(psmodel)```Store the raw and linear propensity scores below.```{r}lindner_clean$ps <- psmodel$fittedlindner_clean$linps <- psmodel$linear.predictors```## Comparing distribution of propensity scores across treatment groupsHere's a numerical description.```{r}lindner_clean %$% mosaic::favstats(ps ~ treated_f) |>kable(digits =3)```We can see there are no propensity scores equal to, or very close to, 0 or 1. Now, let's draw some plots.### Boxplot Now we'll visualize the distribution of the propensity scores stratified by treatment status. 
```{r}ggplot(lindner_clean, aes(x = treated_f, y = ps, color = treated_f)) +geom_boxplot() +geom_jitter(width =0.1) +guides(color ="none") +labs(x ="",title ="Raw propensity scores, stratified by exposure group")```### Density plot```{r}ggplot(lindner_clean, aes(x = linps, fill = treated_f)) +geom_density(alpha =0.3) ```Both of these plots demonstrate good overlap, suggesting a propensity score analysis may be appropriate.# Task 3: Rubin's Rules For Assessing Overlap Before Propensity Adjustment## Rubin's Rule 1```{r}rubin1.unadj <-with(lindner_clean,abs(100*(mean(linps[treated==1])-mean(linps[treated==0]))/sd(linps)))rubin1.unadj```As you can see, we fail Rubin's Rule 1 - in which we want below 50%.## Rubin's Rule 2```{r}rubin2.unadj <-with(lindner_clean, var(linps[treated==1])/var(linps[treated==0]))rubin2.unadj```We also "fail" Rubin's Rule 2 where we are looking for value between 0.8 - 1.2 (ideally, 1).# Task 4: Greedy 1:1 matching on the linear PSThe first type of match we will conduct is greedy 1:1 matching, without replacement. As we had only 298 controls, we will not match all of the 698 treated patients.```{r}X <- lindner_clean$linps ## matching on the linear propensity scoreTr <-as.logical(lindner_clean$treated)match1 <-Match(Tr=Tr, X=X, M =1, replace=FALSE, ties=FALSE)summary(match1)```Below we'll assess the match balance from the 1:1 matching.```{r}set.seed(123)mb1 <-MatchBalance(treated ~ stent + height + female + diabetic + acutemi + ejecfrac + ves1proc + ps + linps, data=lindner_clean, match.out = match1, nboots=500)``````{r}covnames <-c("stent", "height", "female", "diabetic", "acutemi", "ejecfrac", "ves1proc", "ps", "linps")```This is Dr. Love's code to extract the standardized differences.```{r}pre.szd <-NULL; post.szd <-NULLfor(i in1:length(covnames)) {pre.szd[i] <- mb1$BeforeMatching[[i]]$sdiff.pooledpost.szd[i] <- mb1$AfterMatching[[i]]$sdiff.pooled}```We can now print our table of standardized differences.```{r}match_szd <-data.frame(covnames, pre.szd, post.szd)print(match_szd, digits=3)```We don't really need to go on here. The post-matching standardized difference (for the linear propensity score) are barely improved. ## Love Plot of standardized differences before and after 1:1 matching without replacement## Using ggplotIn this figure, blue points are post-matching while white are pre-match```{r}lp_wo_rep <-ggplot(match_szd, aes(x = pre.szd, y =reorder(covnames, pre.szd))) +geom_point(col ="black", size =3, pch =1) +geom_point(aes(x = post.szd, y =reorder(covnames, pre.szd)), size =3, col ="blue") +geom_vline(aes(xintercept =0)) +geom_vline(aes(xintercept =10), linetype ="dashed", col ="red") +geom_vline(aes(xintercept =-10), linetype ="dashed", col ="red") +labs(x ="Standardized Difference (%)", y ="",title ="Love Plot",subtitle ="1:1 matching without replacement")lp_wo_rep```Just visually, we can see this match is dreadful.## Creating a dataframe containing the matched sampleWe will created a data frame which includes our matched sample, and do a quick count for a sanity check.```{r}matches <-factor(rep(match1$index.treated, 2))lindner_clean.matchedsample <-cbind(matches, lindner_clean[c(match1$index.control, match1$index.treated),])lindner_clean.matchedsample %>%count(treated_f)```## Reassessing Rubin's Rules after 1:1 matching without replacement### Rubin's Rule 1```{r}rubin1.match <-with(lindner_clean.matchedsample,abs(100*(mean(linps[treated==1])-mean(linps[treated==0]))/sd(linps)))rubin1.match```The new value for Rubin's Rule 1 is `r round(rubin1.match,2)`. 
This is not an improvement from the pre-match value of `r round(rubin1.unadj,2)`. We continue to fail Rubin's Rule 1.### Rubin's Rule 2```{r}rubin2.match <-with(lindner_clean.matchedsample, var(linps[treated==1])/var(linps[treated==0]))rubin2.match```The new value for Rubin's Rule 2 is `r round(rubin2.match,2)`. This does not pass Rubin's Rule 2 and is not an improvement from the pre-match value of `r round(rubin2.unadj,2)`.# Task 5: Outcomes after 1:1 Matching Without ReplacementSince the 1:1 match without replacement didn't produce a useful match, we'll skip this, and move on to more fruitful uses of the propensity score in this setting.# Task 6 1:1 Matching With replacementIn our attempt at 1:1 matching without replacement, 400 treated participants were excluded from the sample. This is a waste of data and we'll address this by again matching 1 treated participant to 1 control participant. However, this time we'll match with replacement, meaning each time a control participant is matched to a treated participant, the control participant will be placed back into the pool of possible patients a treated individual can be matched to. Thus, some control participants will be matched multiple times (not all control participants have to be matched to a treated participant). In the Lindner dataset 1:1 matching with replacement is a more reasonable choice than matching with replacement.```{r}X <- lindner_clean$linps ## matching on the linear propensity scoreTr <-as.logical(lindner_clean$treated)match1 <-Match(Tr=Tr, X=X, M =1, replace=TRUE, ties=FALSE) # notice replace = TRUEsummary(match1)```As you can see, this time we matched 698 treated individuals with 698 control participants. To reiterate, as we matched with replacement, and there were less control participants than treated participants, some control participants were matched multiple times.Below we'll assess the match balance from the 1:1 matching with replacement.```{r}set.seed(202102)mb1 <-MatchBalance(treated ~ stent + height + female + diabetic + acutemi + ejecfrac + ves1proc + ps + linps, data=lindner_clean,match.out = match1, nboots=500)``````{r}covnames <-c("stent", "height", "female", "diabetic", "acutemi", "ejecfrac", "ves1proc", "ps", "linps")```Dr. 
Love's code to extract the standardized differences.```{r}pre.szd <-NULL; post.szd <-NULLfor(i in1:length(covnames)) {pre.szd[i] <- mb1$BeforeMatching[[i]]$sdiff.pooledpost.szd[i] <- mb1$AfterMatching[[i]]$sdiff.pooled}```Table of standardized differences```{r}match_szd <-data.frame(covnames, pre.szd, post.szd, row.names=covnames)print(match_szd, digits=3)```## Love Plot of standardized differences before and after 1:1 matching## Using ggplotIn this figure, blue points are post-matching while white are pre-match.```{r}lp_w_rep <-ggplot(match_szd, aes(x = pre.szd, y =reorder(covnames, pre.szd))) +geom_point(col ="black", size =3, pch =1) +geom_point(aes(x = post.szd, y =reorder(covnames, pre.szd)), size =3, col ="blue") +geom_vline(aes(xintercept =0)) +geom_vline(aes(xintercept =10), linetype ="dashed", col ="red") +geom_vline(aes(xintercept =-10), linetype ="dashed", col ="red") +labs(x ="Standardized Difference (%)", y ="",title ="Love Plot",subtitle ="1:1 matching with replacement")lp_w_rep```- Visually, the Love Plot using 1:1 matching with replacement looks pretty good.```{r}# comparison of love plots with and without replacementlp_wo_rep + lp_w_rep```When we look at the plots without replacement and with replacement side-by-side, it definitely looks better than the 1:1 matching without replacement.## Using `cobalt` to make the Love PlotAgain, we can also use an automated way to make the Love Plot.```{r}cobalt_tab <-bal.tab(match1, treated ~ stent + height + female + diabetic + acutemi + ejecfrac + ves1proc + ps + linps, data=lindner_clean, un =TRUE,stats =c("m","v"))cobalt_tab``````{r}p <-love.plot(cobalt_tab, threshold = .1, size =3,var.order ="unadjusted",title ="Standardized Differences after 1:1 Matching With Replacement",stars ="std")p```## Extracting Variance Ratios```{r}pre.vratio <-NULL; post.vratio <-NULLfor(i in1:length(covnames)) {pre.vratio[i] <- mb1$BeforeMatching[[i]]$var.ratiopost.vratio[i] <- mb1$AfterMatching[[i]]$var.ratio}## Table of Variance Ratiosmatch_vrat <-data.frame(names = covnames, pre.vratio, post.vratio, row.names=covnames)print(match_vrat, digits=2)```### Using 'cobalt' to make a Love Plot of Variance Ratios```{r}p <-love.plot(cobalt_tab, stats ="v", threshold = .1, size =3,var.order ="unadjusted",title ="Variance Ratios after 1:1 Matching With Replacement",stars ="")p```## Creating a dataframe containing the matched sample```{r}matches <-factor(rep(match1$index.treated, 2))lindner_clean.matchedsample <-cbind(matches, lindner_clean[c(match1$index.control, match1$index.treated),])lindner_clean.matchedsample %>%count(treated_f)```### How many times were the non-treated patients matched?```{r}lindner_clean.matchedsample %>%filter(treated_f =="control") %>%count(patientid) %>% janitor::tabyl(n)```## Reassessing Rubin's Rules after 1:1 matching with replacement### Rubin's Rule 1```{r}rubin1.match.rep <-with(lindner_clean.matchedsample,abs(100*(mean(linps[treated==1])-mean(linps[treated==0]))/sd(linps)))rubin1.match.rep```The new value for Rubin's Rule 1 is `r round(rubin1.match.rep, 2)`. This value passes Rubin's Rule 1 and is an improvement from the Rubin's Rule 1 value obtained during 1:1 matching without replacement, `r round(rubin1.match,2)`. The pre-match value was `r round(rubin1.unadj,2)`.### Rubin's Rule 2```{r}rubin2.match.rep <-with(lindner_clean.matchedsample, var(linps[treated==1])/var(linps[treated==0]))rubin2.match.rep```The new value for Rubin's Rule 2 is `r round(rubin2.match.rep, 2)`. 
This passes Rule 2 and is an improvement from the Rubin's Rule 2 value obtained during 1:1 matching without replacement, `r round(rubin2.match,2)`. The pre-match value was `r round(rubin2.unadj,2)`.## Estimating the causal effect of the treatment on both outcomes after 1:1 matching with replacement### The Quantitative OutcomeWe'll use a mixed model to estimate the effect of the treatment on `cardbill`. The matches will be treated as a random effect in the model (syntax "(1| matches.f)". and the treatment group will be treated as a fixed effect. We will use restricted maximum likelihood (REML) to estimate coefficient values.```{r}#to appease lme4, factor the matches lindner_clean.matchedsample$matches.f <-as.factor(lindner_clean.matchedsample$matches)# fit the mixed modelmatched_mixedmodel.rep.out1 <-lmer(cardbill ~ treated + (1| matches.f), REML =TRUE, data=lindner_clean.matchedsample)summary(matched_mixedmodel.rep.out1)confint(matched_mixedmodel.rep.out1)``````{r, warning=FALSE, message=FALSE}tidy_mixed_matched_rep <- broom.mixed::tidy(matched_mixedmodel.rep.out1, conf.int =TRUE, conf.level =0.95) %>%filter(term =="treated")tidy_mixed_matched_rep |>kable(digits =3)```Treated individuals were estimated to spend \$`r round(tidy_mixed_matched_rep$estimate,2)` (95% CI:`r round(tidy_mixed_matched_rep$conf.low,2)`, `r round(tidy_mixed_matched_rep$conf.high,2)`) as compared to non-treated individuals: so the treated will spend somewhat less. Zero is in that 95% CI, so a formal sensitivity analysis on the Quantitative outcome using Rosenbaum bounds will still not make sense.```{r}#sanity check for modellindner_clean.matchedsample %>%group_by(treated_f) %>%summarise(mean_card =mean(cardbill))```- The mixed model above predicted treated individuals would spend roughly \$`r round(tidy_mixed_matched_rep$estimate,2)` as compared to control participants. After doing a quick check of the mean `cardbill` within the matched sample, the mixed model results make sense. ### The Binary OutcomeWe will use conditional logistic regression to estimate the log odds (and ORs) of being alive after 6 months based on treatment status.```{r}binary_outcome_adjusted_rep <- survival::clogit(sixMonthSurvive ~ treated +strata(matches), data=lindner_clean.matchedsample)summary(binary_outcome_adjusted_rep)``````{r}#Tidy modeltidy_binary_outcome_adjusted_rep <-tidy(binary_outcome_adjusted_rep, exponentiate =TRUE, conf.int =0.95)tidy_binary_outcome_adjusted_rep |>kable(digits =3)```The odds of being alive after six months were `r round(tidy_binary_outcome_adjusted_rep$estimate,2)` times higher in treated individuals than non-treated controls (95% CI: `r round(tidy_binary_outcome_adjusted_rep$conf.low,2)`, `r round(tidy_binary_outcome_adjusted_rep$conf.high,2)`)# Task 7: Subclassification by Propensity Score Quintile```{r}#cut into quintileslindner_clean$stratum <- Hmisc::cut2(lindner_clean$ps, g=5)lindner_clean$quintile <-factor(lindner_clean$stratum, labels=1:5)#Sanity check: check to make sure quintiles are similar in size, etc.lindner_clean %>%count(stratum, quintile) ```## Check Balance and Propensity Score Overlap in Each Quintile### NumericallyOnly 20 controls were were in the largest quintile, which seems a bit low. 
```{r}lindner_clean %>%count(quintile, treated_f)```### Graphically```{r}ggplot(lindner_clean, aes(x = treated_f, y =round(ps,2), group = quintile, color = treated_f)) +geom_jitter(width =0.2) +guides(color ="none") +facet_wrap(~ quintile, scales ="free_y") +labs(x ="", y ="Propensity for Treatment",title ="Quintile Subclassification in the Lindner data")```## Creating a Standardized Difference Calculation FunctionHere we implement Dr. Love's function to calculate the standardizes differences is utilized below.```{r}szd <-function(covlist, g) {covlist2 <-as.matrix(covlist)g <-as.factor(g)res <-NAfor(i in1:ncol(covlist2)) {cov <-as.numeric(covlist2[,i])num <-100*diff(tapply(cov, g, mean, na.rm=TRUE))den <-sqrt(mean(tapply(cov, g, var, na.rm=TRUE)))res[i] <-round(num/den,2)}names(res) <-names(covlist)res}```Now we'll split data into quintiles - and give them each their own dataframe.```{r}quin1 <-filter(lindner_clean, quintile==1)quin2 <-filter(lindner_clean, quintile==2)quin3 <-filter(lindner_clean, quintile==3)quin4 <-filter(lindner_clean, quintile==4)quin5 <-filter(lindner_clean, quintile==5)```Now we'll run the function above to calculate the standardized differences for each covariate in each quintile.```{r}covs <-c("stent", "height", "female", "diabetic", "acutemi", "ejecfrac", "ves1proc", "ps", "linps")d.q1 <-szd(quin1[covs], quin1$treated)d.q2 <-szd(quin2[covs], quin2$treated)d.q3 <-szd(quin3[covs], quin3$treated)d.q4 <-szd(quin4[covs], quin4$treated)d.q5 <-szd(quin5[covs], quin5$treated)d.all <-szd(lindner_clean[covs], lindner_clean$treated)lindner_clean.szd <-tibble(covs, Overall = d.all, Q1 = d.q1, Q2 = d.q2, Q3 = d.q3, Q4 = d.q4, Q5 = d.q5)lindner_clean.szd <-gather(lindner_clean.szd, "quint", "sz.diff", 2:7)```## Plotting the post-subclassification standardized differences ```{r}ggplot(lindner_clean.szd, aes(x = sz.diff, y =reorder(covs, -sz.diff), group = quint)) +geom_point() +geom_vline(xintercept =0) +geom_vline(xintercept =c(-10,10), linetype ="dashed", col ="blue") +facet_wrap(~ quint) +labs(x ="Standardized Difference, %", y ="",title ="Comparing Standardized Differences by PS Quintile")```The results of the standardized differences by quintile are fairly variable.## Rubin's Rules post subclassification### Rule 1```{r}rubin1.q1 <-with(quin1, abs(100*(mean(linps[treated==1]) -mean(linps[treated==0]))/sd(linps)))rubin1.q2 <-with(quin2, abs(100*(mean(linps[treated==1]) -mean(linps[treated==0]))/sd(linps)))rubin1.q3 <-with(quin3, abs(100*(mean(linps[treated==1]) -mean(linps[treated==0]))/sd(linps)))rubin1.q4 <-with(quin4, abs(100*(mean(linps[treated==1]) -mean(linps[treated==0]))/sd(linps)))rubin1.q5 <-with(quin5, abs(100*(mean(linps[treated==1]) -mean(linps[treated==0]))/sd(linps)))rubin1.sub <-c(rubin1.q1, rubin1.q2, rubin1.q3, rubin1.q4, rubin1.q5)names(rubin1.sub)=c("Q1", "Q2", "Q3", "Q4", "Q5")rubin1.sub```All are under 50. Not great, but OK. For comparison, the original Rubin's Rule 1 value was 61.87.### Rule 2```{r}rubin2.q1 <-with(quin1, var(linps[treated==1])/var(linps[treated==0]))rubin2.q2 <-with(quin2, var(linps[treated==1])/var(linps[treated==0]))rubin2.q3 <-with(quin3, var(linps[treated==1])/var(linps[treated==0]))rubin2.q4 <-with(quin4, var(linps[treated==1])/var(linps[treated==0]))rubin2.q5 <-with(quin5, var(linps[treated==1])/var(linps[treated==0]))rubin2.sub <-c(rubin2.q1, rubin2.q2, rubin2.q3, rubin2.q4, rubin2.q5)names(rubin2.sub)=c("Q1", "Q2", "Q3", "Q4", "Q5")rubin2.sub```All but Q1 are at least close to passing Rule 2. 
For comparison, the original Rubin's Rule 2 value was 1.67.# Task 8: Estimated effect after subclassification## Quantitative outcome```{r}quin1.out1 <-lm(cardbill ~ treated, data=quin1)quin2.out1 <-lm(cardbill ~ treated, data=quin2)quin3.out1 <-lm(cardbill ~ treated, data=quin3)quin4.out1 <-lm(cardbill ~ treated, data=quin4)quin5.out1 <-lm(cardbill ~ treated, data=quin5)coef(summary(quin1.out1)); coef(summary(quin2.out1)); coef(summary(quin3.out1)); coef(summary(quin4.out1)); coef(summary(quin5.out1)) ```The mean of the five quintile-specific estimated regression coefficients is below.```{r}est.st <- (coef(quin1.out1)[2] +coef(quin2.out1)[2] +coef(quin3.out1)[2] +coef(quin4.out1)[2] +coef(quin5.out1)[2])/5est.st```The mean SE is below.```{r}se.q1 <-summary(quin1.out1)$coefficients[2,2]se.q2 <-summary(quin2.out1)$coefficients[2,2]se.q3 <-summary(quin3.out1)$coefficients[2,2]se.q4 <-summary(quin4.out1)$coefficients[2,2]se.q5 <-summary(quin5.out1)$coefficients[2,2]se.st <-sqrt((se.q1^2+ se.q2^2+ se.q3^2+ se.q4^2+ se.q5^2)*(1/25))se.st```The mean estimate, with a 95% CI, is below.```{r}strat.result1 <-tibble(estimate = est.st,conf.low = est.st -1.96*se.st,conf.high = est.st +1.96*se.st)strat.result1```So treated individuals were estimated to spend \$`r round(strat.result1$estimate, 2)` more (95% CI:`r round(strat.result1$conf.low, 2)`, `r round(strat.result1$conf.high, 2)`) than non treated individuals.## Binary Outcome```{r}quin1.out2 <-glm(sixMonthSurvive ~ treated, data=quin1, family=binomial())quin2.out2 <-glm(sixMonthSurvive ~ treated, data=quin2, family=binomial())quin3.out2 <-glm(sixMonthSurvive ~ treated, data=quin3, family=binomial())quin4.out2 <-glm(sixMonthSurvive ~ treated, data=quin4, family=binomial())quin5.out2 <-glm(sixMonthSurvive ~ treated, data=quin5, family=binomial())coef(summary(quin1.out2)); coef(summary(quin2.out2)); coef(summary(quin3.out2)); coef(summary(quin4.out2)); coef(summary(quin5.out2))```Estimated log-odds (averaged over the quintiles).```{r}est.st.log <- (coef(quin1.out2)[2] +coef(quin2.out2)[2] +coef(quin3.out2)[2] +coef(quin4.out2)[2] +coef(quin5.out2)[2])/5est.st.log```Estimated odds ratio (averaged over the quintiles).```{r}exp(est.st.log)```The average SE (averaged over the quintiles).```{r}se.q1.log <-summary(quin1.out2)$coefficients[2,2]se.q2.log <-summary(quin2.out2)$coefficients[2,2]se.q3.log <-summary(quin3.out2)$coefficients[2,2]se.q4.log <-summary(quin4.out2)$coefficients[2,2]se.q5.log <-summary(quin5.out2)$coefficients[2,2]se.st.log <-sqrt((se.q1.log^2+ se.q2.log^2+ se.q3.log^2+ se.q4.log^2+ se.q5.log^2)*(1/25))se.st.log #log odds``````{r}strat.result2 <-tibble(estimate =exp(est.st.log),conf.low =exp(est.st.log -1.96*se.st.log),conf.high =exp(est.st.log +1.96*se.st.log))strat.result2```The odds of being alive after 6 months was `r round(strat.result2$estimate, 2)` (95% CI:`r round(strat.result2$conf.low, 2)`, `r round(strat.result2$conf.high, 2)`) times higher in treated individuals than non-treated individuals. 
# Task 9: Weighting## Calculating the ATT and ATE weights### ATT weightsFirst, we can use the average treatment effect on the treated (ATT) approach where we weight treated subjects as 1 and controls as ps/(1-ps)```{r}lindner_clean$wts1 <-ifelse(lindner_clean$treated==1, 1, lindner_clean$ps/(1-lindner_clean$ps))```### ATE weightsWe can also use the average treatment effect (ATE) weights where we weight treated subjects by 1/ps and controls by 1/(1-PS)```{r}lindner_clean$wts2 <-ifelse(lindner_clean$treated==1, 1/lindner_clean$ps, 1/(1-lindner_clean$ps))```## Working with the ATT weights```{r}ggplot(lindner_clean, aes(x = ps, y = wts1, color = treated_f)) +geom_point() +guides(color ="none") +facet_wrap(~ treated_f) +labs(x ="Estimated Propensity for Treatment",y ="ATT weight",title ="ATT weighting structure")``````{r}#turn dataset into a dataframe for twang (its a tibble now)lindner_clean_df <-data.frame(lindner_clean)#name covariatescovlist <-c("stent", "height", "female", "diabetic", "acutemi", "ejecfrac", "ves1proc", "ps", "linps")``````{r}bal.wts1 <-dx.wts(x=lindner_clean_df$wts1, data=lindner_clean_df, vars=covlist, treat.var="treated", estimand="ATT")bal.wts1``````{r}bal.table(bal.wts1)``````{r}bal.before.wts1 <-bal.table(bal.wts1)[1]bal.after.wts1 <-bal.table(bal.wts1)[2]balance.att.weights <-tibble(names =rownames(bal.before.wts1$unw),pre.weighting =100*bal.before.wts1$unw$std.eff.sz,ATT.weighted =100*bal.after.wts1[[1]]$std.eff.sz)balance.att.weights <-gather(balance.att.weights, timing, szd, 2:3)```Now we can plot the standardized differences after ATT weighting.```{r}ggplot(balance.att.weights, aes(x = szd, y =reorder(names, szd), color = timing)) +geom_point() +geom_vline(xintercept =0) +geom_vline(xintercept =c(-10,10), linetype ="dashed", col ="blue") +labs(x ="Standardized Difference", y ="",title ="Standardized Difference before and after ATT Weighting")```The standardized differences look much better here in this approach.### Rubin's Rules#### Rule 1**Numbers from linps row in part [2] of balance table above**: (-0.048 * 100) = 4.8%. So passes Rule 1.#### Rule 2**Numbers from linps row in part [2] of balance table above**: (0.796^2^)/(0.839^2^) = 0.9001237. Passes Rule 2### Estimated effect on outcomes after ATT weighting#### Quantitative outcomeTo estimate the effect of the treatment on `cardbill`, we'll use svyglm from the `survey` package to apply the ATT weights in a linear model. 
### Estimated effect on outcomes after ATT weighting

#### Quantitative outcome

To estimate the effect of the treatment on `cardbill`, we'll use `svyglm()` from the `survey` package to apply the ATT weights in a linear model.

```{r}
lindnerwt1.design <- svydesign(ids = ~1, weights = ~wts1,
                               data = lindner_clean) # using ATT weights

adjout1.wt1 <- svyglm(cardbill ~ treated, design = lindnerwt1.design)

wt_att_results1 <- tidy(adjout1.wt1, conf.int = TRUE) %>%
  filter(term == "treated")
wt_att_results1 |> kable(digits = 3)
```

- **Estimate** `r round(wt_att_results1$estimate, 2)` (95% CI: `r round(wt_att_results1$conf.low, 2)`, `r round(wt_att_results1$conf.high, 2)`)

#### Binary outcome

We'll use similar code for the binary outcome, with `family = quasibinomial()` to accommodate the non-integer weights.

```{r}
adjout2.wt1 <- svyglm(sixMonthSurvive ~ treated, design = lindnerwt1.design,
                      family = quasibinomial())

wt_att_results2 <- tidy(adjout2.wt1, conf.int = TRUE, exponentiate = TRUE) %>%
  filter(term == "treated")
wt_att_results2 |> kable(digits = 3)
```

- **Estimate** `r round(wt_att_results2$estimate, 2)` (95% CI: `r round(wt_att_results2$conf.low, 2)`, `r round(wt_att_results2$conf.high, 2)`)

## Working with the ATE weights

Now, we'll go through the same steps with the ATE weights.

```{r}
ggplot(lindner_clean, aes(x = ps, y = wts2, color = treated_f)) +
  geom_point() +
  guides(color = "none") +
  facet_wrap(~ treated_f) +
  labs(x = "Estimated Propensity for Treatment",
       y = "ATE weights",
       title = "ATE weighting structure")
```

```{r}
bal.wts2 <- dx.wts(x = lindner_clean_df$wts2, data = lindner_clean_df,
                   vars = covlist, treat.var = "treated", estimand = "ATE")
bal.wts2
```

```{r}
bal.table(bal.wts2)
```

```{r}
bal.before.wts2 <- bal.table(bal.wts2)[1]
bal.after.wts2 <- bal.table(bal.wts2)[2]

balance.ate.weights <- tibble(
  names = rownames(bal.before.wts2$unw),
  pre.weighting = 100 * bal.before.wts2$unw$std.eff.sz,
  ATE.weighted = 100 * bal.after.wts2[[1]]$std.eff.sz)

balance.ate.weights <- gather(balance.ate.weights, timing, szd, 2:3)
```

```{r}
ggplot(balance.ate.weights, aes(x = szd, y = reorder(names, szd), color = timing)) +
  geom_point() +
  geom_vline(xintercept = 0) +
  geom_vline(xintercept = c(-10, 10), linetype = "dashed", col = "blue") +
  labs(x = "Standardized Difference", y = "",
       title = "Standardized Difference before and after ATE Weighting")
```

Again, the standardized differences look good here.

### Rubin's Rules

#### Rule 1

|-0.034| * 100 = 3.4%, well below the 50% cutoff. We pass Rule 1 (numbers from the ATE weight balance table above).

#### Rule 2

(0.774^2^)/(0.815^2^) = 0.90, which is between 0.5 and 2. We pass Rule 2 (numbers from the ATE weight balance table above).
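Weighting buys balance at the cost of precision, and one way to see that cost is the effective sample size (ESS), which `dx.wts()` summarizes in its output above. As a sketch (not part of the original analysis), Kish's approximation can be computed directly from the ATE weights:

```{r}
# Kish's effective sample size: (sum of weights)^2 / (sum of squared weights).
# A large drop from the actual group size signals highly variable weights
# and less stable weighted estimates.
ess <- function(w) sum(w)^2 / sum(w^2)

with(lindner_clean, c(ess.treated = ess(wts2[treated == 1]),
                      ess.control = ess(wts2[treated == 0])))
```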
### Estimated effect on outcomes after ATE weighting

#### Quantitative outcome

```{r}
lindnerwt2.design <- svydesign(ids = ~1, weights = ~wts2,
                               data = lindner_clean) # using ATE weights

adjout1.wt2 <- svyglm(cardbill ~ treated, design = lindnerwt2.design)

wt_ate_results1 <- tidy(adjout1.wt2, conf.int = TRUE) %>%
  filter(term == "treated")
wt_ate_results1 |> kable(digits = 3)
```

- **Estimate** `r round(wt_ate_results1$estimate, 2)` (95% CI: `r round(wt_ate_results1$conf.low, 2)`, `r round(wt_ate_results1$conf.high, 2)`)

#### Binary outcome

```{r}
adjout2.wt2 <- svyglm(sixMonthSurvive ~ treated, design = lindnerwt2.design,
                      family = quasibinomial())

wt_ate_results2 <- tidy(adjout2.wt2, conf.int = TRUE, exponentiate = TRUE) %>%
  filter(term == "treated")
wt_ate_results2 |> kable(digits = 3)
```

- **Estimate** `r round(wt_ate_results2$estimate, 2)` (95% CI: `r round(wt_ate_results2$conf.low, 2)`, `r round(wt_ate_results2$conf.high, 2)`)

# Task 10: Using TWANG for propensity score estimation and ATT weighting

Here we let the `twang` package estimate the propensity scores itself, via generalized boosted regression, and construct the ATT weights.

```{r}
ps.toy <- ps(treated ~ stent + height + female + diabetic +
               acutemi + ejecfrac + ves1proc,
             data = lindner_clean_df,
             n.trees = 3000,
             interaction.depth = 2,
             stop.method = c("es.mean"),
             estimand = "ATT",
             verbose = FALSE)
```

```{r}
plot(ps.toy)
```

```{r}
summary(ps.toy)
```

```{r}
plot(ps.toy, plots = 2)
```

```{r}
plot(ps.toy, plots = 3)
```

```{r}
bal.tab(ps.toy, full.stop.method = "es.mean.att")
```

```{r}
p <- love.plot(bal.tab(ps.toy),
               threshold = .1, size = 1.5,
               title = "Standardized Differences and TWANG ATT Weighting")
p
```

Compared to the manual ATT/ATE weights, the standardized differences look a bit worse here.

## Estimated effect on outcomes after TWANG ATT weighting

### Quantitative outcome

```{r}
toywt3.design <- svydesign(ids = ~1,
                           weights = ~get.weights(ps.toy, stop.method = "es.mean"),
                           data = lindner_clean) # using twang ATT weights

adjout1.wt3 <- svyglm(cardbill ~ treated, design = toywt3.design)

wt_twangatt_results1 <- tidy(adjout1.wt3, conf.int = TRUE) %>%
  filter(term == "treated")
wt_twangatt_results1 |> kable(digits = 3)
```

- **Estimate** `r round(wt_twangatt_results1$estimate, 2)` (95% CI: `r round(wt_twangatt_results1$conf.low, 2)`, `r round(wt_twangatt_results1$conf.high, 2)`)

### Binary outcome

```{r}
adjout2.wt3 <- svyglm(sixMonthSurvive ~ treated, design = toywt3.design,
                      family = quasibinomial())

wt_twangatt_results2 <- tidy(adjout2.wt3, conf.int = TRUE, exponentiate = TRUE) %>%
  filter(term == "treated")
wt_twangatt_results2 |> kable(digits = 3)
```

- **Estimate** `r round(wt_twangatt_results2$estimate, 2)` (95% CI: `r round(wt_twangatt_results2$conf.low, 2)`, `r round(wt_twangatt_results2$conf.high, 2)`)

# Task 11: After direct adjustment with linear PS

Here we'll directly adjust for the linear propensity score by including it as a covariate in the model.

## Quantitative outcome

```{r}
direct_out1 <- lm(cardbill ~ treated + linps, data = lindner_clean)

adj_out1 <- tidy(direct_out1, conf.int = TRUE) %>%
  filter(term == "treated")
adj_out1 |> kable(digits = 3)
```

- **Estimate** `r round(adj_out1$estimate, 2)` (95% CI: `r round(adj_out1$conf.low, 2)`, `r round(adj_out1$conf.high, 2)`)

## Binary outcome

```{r}
direct_out2 <- glm(sixMonthSurvive ~ treated + linps,
                   data = lindner_clean, family = binomial())

adj_out2 <- tidy(direct_out2, exponentiate = TRUE, conf.int = TRUE) %>%
  filter(term == "treated")
adj_out2 |> kable(digits = 3)
```

- **Estimate** `r round(adj_out2$estimate, 2)` (95% CI: `r round(adj_out2$conf.low, 2)`, `r round(adj_out2$conf.high, 2)`)
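Before moving to the double robust models, it may help to see the quantitative-outcome estimates collected side by side. This is a sketch, not part of the original analysis; it simply stacks the result tibbles created in Tasks 9 through 11.

```{r}
# Gather the cardbill effect estimates from each approach into one table.
bind_rows(
  wt_att_results1 |> mutate(method = "ATT weighting"),
  wt_ate_results1 |> mutate(method = "ATE weighting"),
  wt_twangatt_results1 |> mutate(method = "TWANG ATT weighting"),
  adj_out1 |> mutate(method = "Direct adjustment for linps")
) |>
  select(method, estimate, conf.low, conf.high) |>
  kable(digits = 2)
```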
# Task 12: "Double Robust" Approach: Weighting + Direct Adjustment

Here we'll adjust for the linear propensity score and apply the ATT, ATE, or TWANG ATT weights at the same time. The motivation for the "double robust" label is that combining weighting with direct adjustment gives us two chances to remove confounding through the propensity score.

## Quantitative outcome

### ATT weights

```{r}
design_att <- svydesign(ids = ~1, weights = ~wts1,
                        data = lindner_clean) # using ATT weights

dr.out1.wt1 <- svyglm(cardbill ~ treated + linps, design = design_att)

dr_att_out1 <- tidy(dr.out1.wt1, conf.int = TRUE) %>%
  filter(term == "treated")
dr_att_out1 |> kable(digits = 3)
```

- **Estimate** `r round(dr_att_out1$estimate, 2)` (95% CI: `r round(dr_att_out1$conf.low, 2)`, `r round(dr_att_out1$conf.high, 2)`)

### ATE weights

```{r}
design_ate <- svydesign(ids = ~1, weights = ~wts2,
                        data = lindner_clean) # using ATE weights

dr.out1.wt2 <- svyglm(cardbill ~ treated + linps, design = design_ate)

dr_ate_out1 <- tidy(dr.out1.wt2, conf.int = TRUE) %>%
  filter(term == "treated")
dr_ate_out1 |> kable(digits = 3)
```

- **Estimate** `r round(dr_ate_out1$estimate, 2)` (95% CI: `r round(dr_ate_out1$conf.low, 2)`, `r round(dr_ate_out1$conf.high, 2)`)

### TWANG ATT weights

```{r}
wts3 <- get.weights(ps.toy, stop.method = "es.mean")

twang.design <- svydesign(ids = ~1, weights = ~wts3,
                          data = lindner_clean) # twang ATT weights

dr.out1.wt3 <- svyglm(cardbill ~ treated + linps, design = twang.design)

dr_twangatt_out1 <- tidy(dr.out1.wt3, conf.int = TRUE) %>%
  filter(term == "treated")
dr_twangatt_out1 |> kable(digits = 3)
```

- **Estimate** `r round(dr_twangatt_out1$estimate, 2)` (95% CI: `r round(dr_twangatt_out1$conf.low, 2)`, `r round(dr_twangatt_out1$conf.high, 2)`)

## Binary outcome

Now we'll adjust for the linear propensity score and the ATT/ATE/TWANG weights when predicting the binary outcome.

### ATT weights

```{r}
dr.out2.wt1 <- svyglm(sixMonthSurvive ~ treated + linps, design = design_att,
                      family = quasibinomial())

dr_att_out2 <- tidy(dr.out2.wt1, exponentiate = TRUE, conf.int = TRUE) %>%
  filter(term == "treated")
dr_att_out2 |> kable(digits = 3)
```

- **Estimate** `r round(dr_att_out2$estimate, 2)` (95% CI: `r round(dr_att_out2$conf.low, 2)`, `r round(dr_att_out2$conf.high, 2)`)

### ATE weights

```{r}
dr.out2.wt2 <- svyglm(sixMonthSurvive ~ treated + linps, design = design_ate,
                      family = quasibinomial())

dr_ate_out2 <- tidy(dr.out2.wt2, exponentiate = TRUE, conf.int = TRUE) %>%
  filter(term == "treated")
dr_ate_out2 |> kable(digits = 3)
```

- **Estimate** `r round(dr_ate_out2$estimate, 2)` (95% CI: `r round(dr_ate_out2$conf.low, 2)`, `r round(dr_ate_out2$conf.high, 2)`)

### TWANG ATT weights

```{r}
dr.out2.wt3 <- svyglm(sixMonthSurvive ~ treated + linps, design = twang.design,
                      family = quasibinomial())

dr_twangatt_out2 <- tidy(dr.out2.wt3, exponentiate = TRUE, conf.int = TRUE) %>%
  filter(term == "treated")
dr_twangatt_out2 |> kable(digits = 3)
```

- **Estimate** `r round(dr_twangatt_out2$estimate, 2)` (95% CI: `r round(dr_twangatt_out2$conf.low, 2)`, `r round(dr_twangatt_out2$conf.high, 2)`)

# Session Information

```{r}
xfun::session_info()
```