In this assignment we will exercise causal inference on the RHC (Right Heart Catheterization) dataset This dataset was used in Connors et al. (1996): The effectiveness of RHC in the initial care of critically ill patients. J American Medical Association 276:889-897. The dataset pertains to day 1 of hospitalization, i.e., the “treatment” variable swang1 is whether or not a patient received a RHC (also called the Swan-Ganz catheter) on the first day in which the patient qualified for a specific research study called SUPPORT.
What should you submit? You need to submit: 1. This Notebook (in .Rmd format) after filling in all your answers inside this notebook. You are required to do an action (write or fill-in code or write documentation) at the places marked by an arrow (=>). This arrow appears within the comments (and it will be preceeded by “#”) or outside the comments in the text (without “#”). 2. The Notebook in html format with all the results.
==> Student names: Florence Cornelissen, Myrte Kuipers
==> Student numbers: 12186317, 11869089
Please Open https://hbiostat.org/data/repo/rhc.html and take a good look at the variables. Make sure you inspect the variables included in the dataset (they appear in a tabel below the text on the website). You can refer to the table whenever needed.
Packages: install packages if needed
#install.packages("tableone")
#install.packages("Matching")
#install.packages("dplyr")
# Recommended: add "dplyr" if you have used it before to make efficient preprocessing.
library(tableone) # This library allows us to calculate summary statistics about the variables including the SMD (standardized mean difference).
library(Matching) # This library will do the matching
## Loading required package: MASS
## ##
## ## Matching (Version 4.10-8, Build Date: 2022-11-03)
## ## See http://sekhon.berkeley.edu/matching for additional documentation.
## ## Please cite software as:
## ## Jasjeet S. Sekhon. 2011. ``Multivariate and Propensity Score Matching
## ## Software with Automated Balance Optimization: The Matching package for R.''
## ## Journal of Statistical Software, 42(7): 1-52.
## ##
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:MASS':
##
## select
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Download rhc dataset from CANVAS and read it
rhc <- read.csv("rhc.csv")
Tip: Look at the data, use the “View” command on the dataset “rhc”
print("===================================================")
## [1] "==================================================="
# => how many rows does rhc contain? write a print command
nrow(rhc)
## [1] 5735
# => What are the variable names? write a print command
ls(rhc)
## [1] "adld3p" "age" "alb1" "amihx" "aps1" "bili1"
## [7] "ca" "card" "cardiohx" "cat1" "cat2" "chfhx"
## [13] "chrpulhx" "crea1" "das2d3pc" "death" "dementhx" "dnr1"
## [19] "dschdte" "dth30" "dthdte" "edu" "gastr" "gibledhx"
## [25] "hema" "hema1" "hrt1" "immunhx" "income" "liverhx"
## [31] "lstctdte" "malighx" "meanbp1" "meta" "neuro" "ninsclas"
## [37] "ortho" "paco21" "pafi1" "ph1" "pot1" "psychhx"
## [43] "ptid" "race" "renal" "renalhx" "resp" "resp1"
## [49] "sadmdte" "scoma1" "seps" "sex" "sod1" "surv2md1"
## [55] "swang1" "t3d30" "temp1" "transhx" "trauma" "urin1"
## [61] "wblc1" "wtkilo1" "X"
# print("===================================================")
For the assignment the TREATMENT variable is swang1 and the OUTCOME variable is death.
If you look at the values of the dichotomous variables you will see that they are factors (in other words, categorical variables). We would like to work with numeric variables in this exercise so we need to perform some conversions.
Let’s start with the treatement and outcome variables. Create an “outcome.died” and “swang.treatment” variables with the numeric values 0 and 1.
rhc$outcome.died <- as.numeric(rhc$death=='Yes') # Again you might want, instead, to do all preprocessing with dplyr if you prefer
rhc$treatment.swang <- as.numeric(rhc$swang1=='RHC')
Let’s take a quick look into the dataset using these variables
print("===================================================")
## [1] "==================================================="
# => What is the mean of outcome.died in the whole data set?
mean(rhc$outcome.died)
## [1] 0.6489974
# => How many subjects are treated?
sum(rhc$treatment.swang)
## [1] 2184
# => What is the mean mortality (outcome.died) in this treated group?
mean(rhc[rhc$treatment.swang == 1, 'outcome.died'])
## [1] 0.6804029
# => How many subjects are not treated (i.e. in the control group)?
sum(rhc$treatment.swang == 0)
## [1] 3551
# => What is the mean mortality in this control group?
mean(rhc[rhc$treatment.swang == 0, 'outcome.died'])
## [1] 0.6296818
# print("===================================================")
The variable cat1 is a factor (categorical variable) containing the primary disease category for the patient.
print("===================================================")
## [1] "==================================================="
# => How many different categories are in Cat1? Tip: use the levels() function on this variable.
levels(factor(rhc$cat1))
## [1] "ARF" "CHF" "Cirrhosis"
## [4] "Colon Cancer" "Coma" "COPD"
## [7] "Lung Cancer" "MOSF w/Malignancy" "MOSF w/Sepsis"
length(levels(factor(rhc$cat1)))
## [1] 9
# print("===================================================")
We would like to “translate” Cat1 into a set of variables each with the numeric values of 0/1. For example for the value of “ARF” we will create a variable “ARF” that will be 1 if rhc$cat1==‘ARF’.
rhc$ARF <- as.numeric(rhc$cat1=='ARF')
print("===================================================")
## [1] "==================================================="
# => Do the same (as we did for ARF) also for all other values of cat1 but ignore the category "Missing".
rhc$CHF <- as.numeric(rhc$cat1=='CHF')
rhc$Cirrhosis <- as.numeric(rhc$cat1=='Cirrhosis')
rhc$COPD <- as.numeric(rhc$cat1=='COPD')
rhc$Lungcancer <- as.numeric(rhc$cat1=='Lung Cancer')
rhc$MOSFM <- as.numeric(rhc$cat1=='MOSF w/Malignancy')
rhc$Coloncancer <- as.numeric(rhc$cat1=='Colon Cancer')
rhc$Coma <- as.numeric(rhc$cat1=='Coma')
rhc$MOSFS <- as.numeric(rhc$cat1=='MOSF w/Sepsis')
# print("===================================================")
==> Create a data frame and call it rhc.small with the following variables of interest: All the underlying categories of “cat1” (ARF etc) that you created before, “cardiohx”, “chfhx”, “dementhx”, “psychhx”, “chrpulhx”, “renalhx”, “liverhx”, “gibledhx”, “malighx”, “immunhx”, “transhx”, “amihx”, “age”, “female”, “edu”, “das2d3pc”, “aps1”, “scoma1”, “meanbp1”, wblc1”, “hrt1”, “resp1”, “temp1”, “pafi1”, “alb1”, “hema1”, “bili1”, “crea1”, “sod1”, “pot1”, “paco21”, “ph1”, “wtkilo1”, “dnr1”, “resp”, “card”, “neuro”, “gastr”, “renal”, “meta”, “hema”, “seps”, “trauma”, “ortho”, “adld3p”, “urin1”, “treatment.swang”, “outcome.died”.
rhc.small <- rhc[,c("ARF", "CHF","Cirrhosis","COPD", "Lungcancer", "MOSFM", "Coloncancer", "Coma", "MOSFS", "cardiohx", "chfhx", "dementhx", "psychhx", "chrpulhx", "renalhx", "liverhx", "gibledhx", "malighx", "immunhx", "transhx", "amihx", "age", "edu", "das2d3pc", "aps1", "scoma1", "meanbp1", "wblc1", "hrt1", "resp1", "temp1", "pafi1", "alb1", "hema1", "bili1", "crea1", "sod1", "pot1", "paco21", "ph1", "wtkilo1", "dnr1", "resp", "card", "neuro", "gastr", "renal", "meta", "hema", "seps", "trauma", "ortho", "adld3p", "urin1", "treatment.swang", "outcome.died")]
#=> Create this data frame
Make sure you take a look at the definition of variables (on the URL given above) with unclear names to you. This data frame includes now all candidate variables from which we will select later for matching. However, before we jump into variable selection, let us continue data type conversion.
Check classes of these variables by using the str command
str(rhc.small)
## 'data.frame': 5735 obs. of 56 variables:
## $ ARF : num 0 0 0 1 0 0 0 1 0 1 ...
## $ CHF : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Cirrhosis : num 0 0 0 0 0 0 0 0 0 0 ...
## $ COPD : num 1 0 0 0 0 1 0 0 0 0 ...
## $ Lungcancer : num 0 0 0 0 0 0 0 0 0 0 ...
## $ MOSFM : num 0 0 1 0 0 0 1 0 1 0 ...
## $ Coloncancer : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Coma : num 0 0 0 0 0 0 0 0 0 0 ...
## $ MOSFS : num 0 1 0 0 1 0 0 0 0 0 ...
## $ cardiohx : int 0 1 0 0 0 0 0 0 0 0 ...
## $ chfhx : int 0 1 0 0 0 1 0 0 0 0 ...
## $ dementhx : int 0 0 0 0 0 0 0 0 0 0 ...
## $ psychhx : int 0 0 0 0 0 0 0 0 0 0 ...
## $ chrpulhx : int 1 0 0 0 0 1 0 0 0 0 ...
## $ renalhx : int 0 0 0 0 0 0 0 0 0 0 ...
## $ liverhx : int 0 0 0 0 0 0 0 0 0 0 ...
## $ gibledhx : int 0 0 0 0 0 0 0 0 0 0 ...
## $ malighx : int 1 0 1 0 0 0 1 0 0 1 ...
## $ immunhx : int 0 1 1 1 0 0 0 0 0 0 ...
## $ transhx : int 0 1 0 0 0 0 0 1 0 0 ...
## $ amihx : int 0 0 0 0 0 0 0 0 0 0 ...
## $ age : num 70.3 78.2 46.1 75.3 67.9 ...
## $ edu : num 12 12 14.07 9 9.95 ...
## $ das2d3pc : num 23.5 14.8 18.1 22.9 21.1 ...
## $ aps1 : int 46 50 82 48 72 38 29 25 47 48 ...
## $ scoma1 : int 0 0 0 0 41 0 26 100 0 0 ...
## $ meanbp1 : num 41 63 57 55 65 115 67 128 53 73 ...
## $ wblc1 : num 22.1 28.9 0.05 23.3 29.7 ...
## $ hrt1 : int 124 137 130 58 125 134 135 102 118 141 ...
## $ resp1 : num 10 38 40 26 27 36 10 34 30 40 ...
## $ temp1 : num 38.7 38.9 36.4 35.8 34.8 ...
## $ pafi1 : num 68 218 276 157 478 ...
## $ alb1 : num 3.5 2.6 3.5 3.5 3.5 ...
## $ hema1 : num 58 32.5 21.1 26.3 24 ...
## $ bili1 : num 1.01 0.7 1.01 0.4 1.01 ...
## $ crea1 : num 1.2 0.6 2.6 1.7 3.6 ...
## $ sod1 : int 145 137 146 117 126 138 136 136 136 146 ...
## $ pot1 : num 4 3.3 2.9 5.8 5.8 ...
## $ paco21 : num 40 34 16 30 17 68 45 26 40 30 ...
## $ ph1 : num 7.36 7.33 7.36 7.46 7.23 ...
## $ wtkilo1 : num 64.7 45.7 0 54.6 78.4 ...
## $ dnr1 : chr "No" "No" "No" "No" ...
## $ resp : chr "Yes" "No" "No" "Yes" ...
## $ card : chr "Yes" "No" "Yes" "No" ...
## $ neuro : chr "No" "No" "No" "No" ...
## $ gastr : chr "No" "No" "No" "No" ...
## $ renal : chr "No" "No" "No" "No" ...
## $ meta : chr "No" "No" "No" "No" ...
## $ hema : chr "No" "No" "No" "No" ...
## $ seps : chr "No" "Yes" "No" "No" ...
## $ trauma : chr "No" "No" "No" "No" ...
## $ ortho : chr "No" "No" "No" "No" ...
## $ adld3p : int 0 NA NA NA NA 0 NA NA NA NA ...
## $ urin1 : num NA 1437 599 NA 64 ...
## $ treatment.swang: num 0 1 1 0 1 0 0 0 0 1 ...
## $ outcome.died : num 0 1 0 1 1 0 0 1 0 0 ...
If you have “labelled” variables like labelled int and labelled num it means that the datset was imported from SPSS. In such a case, simply change such variables into numeric variables. If you do not see such variables then skip this step.
rhc.small$age <- as.numeric(rhc.small$age) # only if in your version of the data "age" was a "labelled" variable
print("===================================================")
## [1] "==================================================="
# => Change all labelled variables that include numbers (integer or numeric) into numeric.
integer_cols <- ls(select_if(rhc.small, is.integer))
for (col in integer_cols) {
rhc.small[col] <- as.numeric(unlist(rhc.small[col]))
}
# print("===================================================")
For each categorical variable with 2 values (such as sex) change it into a numeric variable by selecting one of the values (such as “female”) to be 1 and the other zero.
rhc.small$female <- as.numeric(rhc$sex=='Female')
print("===================================================")
## [1] "==================================================="
# => Change any factor with two values into a 0/1 variable by selecting a value (such as "female" above) to be 1 and the other value will be zero.
character_cols <- ls(select_if(rhc.small, is.character))
for (col in character_cols) {
rhc.small[col] <- as.numeric(rhc.small[col] == "Yes")
}
# print("===================================================")
Before we start the analysis we need to be aware of the following issue: If you encounter a variable with missing values then you might run into a problem with the Match() function later on. For this exercise you can follow this advice:
If the proportion of missing is above 15% for a variable then just exclude the whole variable from the data frame and hence from the analysis, it is likely that the information it had still resides in the combination of the many other variables.
If you have less than 15% missing variables try to impute the values of the variable. In the extreme case you could apply multiple imputations and repeat the analysis on each imputation set and then pool the results. However for this exercise you can follow a simple strategy such as imputing the value with mean/median/mode. This would be a limitation but imputing missing values is not the focus of our exercise on causal inference.
=> Inspect the proportion of missing values for each variable and follow the advice above.
print("===================================================")
## [1] "==================================================="
prop_missing = colMeans(is.na(rhc.small))*100
#There are only two columns with more than 0% missing values. Both have a proportion missing values of more than 15%, so no imputation of missing values is needed and those columns are excluded from the dataset.
rhc.small <- select_if(rhc.small, prop_missing < 15)
print("===================================================")
## [1] "==================================================="
Now we are ready to start the analysis.
According to the disjunctive cause criterion, we want to find variables (to control for) that are predictive of treatment and/or the outcome. In this assignment we will use the criterion to select all (pre-treatment) variables that are predictive of the outcome. This will include true confounders (that also affect the treatment, and risk factors that only affect the outcome). All our variables in rhc.small, except the outcome of course, are pre-treatment variables (none is measured AFTER the intervention) so they are all good candidates for the selection.
The truth is that the best way to select these variables is to use clinical knowledge. However, in this assignment we will do this in a data-driven and practical way. Specifically, we will operationalize “predictive of the outcome” as the following criterion: a variable is predictive of the outcome if its univariate association with the outcome is significant at the 0.1 level. Because our outcome is binary we can translate this criterion to: For every candidate variable in rhc.small fit a logistic regression model to predict outcome (outcome.died) using only the candidate variable as a predictor and retain the variable if the p-value of its association with the outcome is ≤ 0.1, and otherwise discard the variable. Note: Do not include the treatment in the selection.
print("===================================================")
## [1] "==================================================="
# => Fit logistic regression models for each variable separately with the outcome and retain those with an association having a p-value of ≤ 0.1. Tip: you may want to automate the whole process!
vars <- colnames(rhc.small[,!names(rhc.small) %in% c("treatment.swang", "outcome.died")])
xvars = list()
for (var in vars) {
mod <- as.formula(sprintf("outcome.died ~ %s", var))
fit <- glm(formula = mod, data = rhc.small, family = binomial)
p_value <- summary(fit)$coefficients[,4][2]
if (p_value <= 0.1){
xvars[[var]] <- fit
}
}
# => Put the names of the selected variables (without the treatment and outcome variables) in the variable called xvars by writing xvars <- c("ARF", ...) or do it automatically!
# => Print xvar
xvars <- names(xvars)
print(xvars)
## [1] "ARF" "CHF" "Cirrhosis" "Lungcancer" "MOSFM"
## [6] "Coma" "cardiohx" "chfhx" "dementhx" "psychhx"
## [11] "chrpulhx" "renalhx" "liverhx" "malighx" "immunhx"
## [16] "transhx" "age" "edu" "das2d3pc" "aps1"
## [21] "scoma1" "meanbp1" "wblc1" "temp1" "alb1"
## [26] "hema1" "bili1" "crea1" "pot1" "paco21"
## [31] "ph1" "wtkilo1" "dnr1" "resp" "card"
## [36] "gastr" "renal" "hema" "seps" "trauma"
# => Put these selected variables in a data.frame called rhc.selected that also includes the treatment and outcome variables to this data frame.
rhc.selected <- rhc.small[, c("treatment.swang", "outcome.died", xvars)]
# print("===================================================")
print("===================================================")
## [1] "==================================================="
#=> look at "table 1", complete the command by filling in the "..."
table1<- CreateTableOne(vars=xvars,strata="treatment.swang", data=rhc.selected, test=FALSE)
## include standardized mean difference (SMD)
print(table1, smd=TRUE)
## Stratified by treatment.swang
## 0 1 SMD
## n 3551 2184
## ARF (mean (SD)) 0.45 (0.50) 0.42 (0.49) 0.059
## CHF (mean (SD)) 0.07 (0.25) 0.10 (0.29) 0.095
## Cirrhosis (mean (SD)) 0.05 (0.22) 0.02 (0.15) 0.145
## Lungcancer (mean (SD)) 0.01 (0.10) 0.00 (0.05) 0.095
## MOSFM (mean (SD)) 0.07 (0.25) 0.07 (0.26) 0.018
## Coma (mean (SD)) 0.10 (0.29) 0.04 (0.20) 0.207
## cardiohx (mean (SD)) 0.16 (0.37) 0.20 (0.40) 0.116
## chfhx (mean (SD)) 0.17 (0.37) 0.19 (0.40) 0.069
## dementhx (mean (SD)) 0.12 (0.32) 0.07 (0.25) 0.163
## psychhx (mean (SD)) 0.08 (0.27) 0.05 (0.21) 0.143
## chrpulhx (mean (SD)) 0.22 (0.41) 0.14 (0.35) 0.192
## renalhx (mean (SD)) 0.04 (0.20) 0.05 (0.21) 0.032
## liverhx (mean (SD)) 0.07 (0.26) 0.06 (0.24) 0.049
## malighx (mean (SD)) 0.25 (0.43) 0.20 (0.40) 0.101
## immunhx (mean (SD)) 0.26 (0.44) 0.29 (0.45) 0.080
## transhx (mean (SD)) 0.09 (0.29) 0.15 (0.36) 0.170
## age (mean (SD)) 61.76 (17.29) 60.75 (15.63) 0.061
## edu (mean (SD)) 11.57 (3.13) 11.86 (3.16) 0.091
## das2d3pc (mean (SD)) 20.37 (5.48) 20.70 (5.03) 0.063
## aps1 (mean (SD)) 50.93 (18.81) 60.74 (20.27) 0.501
## scoma1 (mean (SD)) 22.25 (31.37) 18.97 (28.26) 0.110
## meanbp1 (mean (SD)) 84.87 (38.87) 68.20 (34.24) 0.455
## wblc1 (mean (SD)) 15.26 (11.41) 16.27 (12.55) 0.084
## temp1 (mean (SD)) 37.63 (1.74) 37.59 (1.83) 0.021
## alb1 (mean (SD)) 3.16 (0.67) 2.98 (0.93) 0.230
## hema1 (mean (SD)) 32.70 (8.79) 30.51 (7.42) 0.269
## bili1 (mean (SD)) 2.00 (4.43) 2.71 (5.33) 0.145
## crea1 (mean (SD)) 1.92 (2.03) 2.47 (2.05) 0.270
## pot1 (mean (SD)) 4.08 (1.04) 4.05 (1.01) 0.027
## paco21 (mean (SD)) 39.95 (14.24) 36.79 (10.97) 0.249
## ph1 (mean (SD)) 7.39 (0.11) 7.38 (0.11) 0.120
## wtkilo1 (mean (SD)) 65.04 (29.50) 72.36 (27.73) 0.256
## dnr1 (mean (SD)) 0.14 (0.35) 0.07 (0.26) 0.228
## resp (mean (SD)) 0.42 (0.49) 0.29 (0.45) 0.270
## card (mean (SD)) 0.28 (0.45) 0.42 (0.49) 0.295
## gastr (mean (SD)) 0.15 (0.35) 0.19 (0.39) 0.121
## renal (mean (SD)) 0.04 (0.20) 0.07 (0.25) 0.116
## hema (mean (SD)) 0.07 (0.25) 0.05 (0.22) 0.062
## seps (mean (SD)) 0.15 (0.35) 0.24 (0.42) 0.234
## trauma (mean (SD)) 0.01 (0.07) 0.02 (0.12) 0.104
## For which variables we do not have a good balance? Look at the slides for a guideline.
smd_values1 <- ExtractSmd(table1)
# The variables printed to not have a good balance, since it is above 0.2.
smd_values1[smd_values1 > 0.2, ]
## Coma aps1 meanbp1 alb1 hema1 crea1 paco21 wtkilo1
## 0.2072776 0.5014019 0.4550959 0.2299199 0.2693013 0.2696236 0.2485938 0.2556794
## dnr1 resp card seps
## 0.2276000 0.2695134 0.2949292 0.2337967
print("===================================================")
## [1] "==================================================="
Let us now do matching in the hope that the balance will get better so we can have a more valid inference.
print("===================================================")
## [1] "==================================================="
# We want to match the treated with the untreated subjects. Find 1 on 1 match.
# # => Complete the command
greedymatch <- Match(Tr=rhc.selected$treatment.swang,M=1, X=rhc.selected[xvars], replace=FALSE)
# read the documentation for Match() by typing ?Match.
?Match
# => What does replace=FALSE mean?
# That matching should not be done with replacement and therewith the order of matches generally matters. The matching for the first observation will be fund first, for the second observation second, etc. This is greedy matching.
# => What is inside X in the command above?
# A matrix containing the variables we wish to match on. In this case all the candidate variables.
# print("===================================================")
print("===================================================")
## [1] "==================================================="
greedymatch$index.treated # => What do you think this is?
## [1] 2 3 5 10 12 13 17 18 21 22 23 26 33 36
## [15] 38 42 51 55 56 57 59 62 70 71 73 80 83 86
## [29] 89 98 100 101 102 107 109 114 115 118 120 124 134 136
## [43] 137 138 142 143 145 146 150 152 153 155 164 167 168 169
## [57] 171 172 173 175 176 178 182 183 185 187 189 192 193 195
## [71] 201 205 206 208 211 216 219 220 224 225 226 227 228 231
## [85] 233 236 238 240 244 245 247 248 260 265 266 267 274 275
## [99] 277 281 282 285 288 290 291 294 295 296 298 303 304 306
## [113] 309 312 315 317 321 323 324 325 327 330 333 334 337 339
## [127] 341 342 343 346 347 348 350 351 356 358 366 368 372 374
## [141] 375 376 378 381 385 389 391 393 396 398 402 403 404 406
## [155] 413 416 418 421 422 423 425 426 432 433 435 436 441 444
## [169] 445 448 460 463 468 470 471 472 474 480 481 482 484 485
## [183] 487 489 493 494 495 499 503 504 508 509 510 511 512 513
## [197] 515 526 527 528 532 533 537 540 543 550 552 554 556 559
## [211] 560 564 566 567 568 569 571 572 574 577 578 579 583 584
## [225] 585 589 592 594 595 598 599 600 601 603 608 610 617 618
## [239] 625 627 630 631 634 639 640 641 642 650 651 654 655 658
## [253] 659 664 665 667 672 675 678 682 689 690 691 694 695 698
## [267] 700 701 703 705 706 707 712 713 718 719 721 724 727 731
## [281] 737 738 739 742 743 746 747 748 752 754 756 757 758 759
## [295] 762 769 771 772 773 777 778 780 782 784 787 788 794 795
## [309] 796 800 805 811 814 820 822 824 828 829 830 834 837 840
## [323] 841 842 844 848 850 851 852 854 862 863 866 867 871 877
## [337] 881 882 885 886 891 893 894 895 896 897 899 900 902 908
## [351] 920 921 922 924 925 930 932 933 934 944 945 947 948 951
## [365] 952 959 960 961 962 963 964 969 971 972 973 974 982 985
## [379] 986 991 992 993 1001 1002 1004 1005 1007 1008 1012 1014 1016 1017
## [393] 1020 1021 1022 1023 1024 1027 1031 1035 1036 1038 1039 1043 1053 1054
## [407] 1055 1065 1071 1072 1074 1075 1076 1079 1088 1089 1095 1100 1108 1112
## [421] 1113 1117 1120 1127 1129 1131 1134 1135 1136 1137 1139 1144 1146 1148
## [435] 1150 1153 1157 1159 1163 1175 1176 1179 1180 1189 1193 1194 1195 1196
## [449] 1197 1203 1208 1211 1212 1214 1215 1217 1218 1219 1220 1228 1230 1232
## [463] 1233 1237 1241 1247 1250 1251 1254 1255 1260 1265 1266 1267 1269 1270
## [477] 1271 1273 1274 1276 1277 1278 1279 1280 1284 1292 1293 1297 1298 1301
## [491] 1302 1305 1307 1308 1310 1311 1312 1314 1323 1324 1326 1329 1331 1333
## [505] 1339 1343 1347 1348 1350 1351 1352 1355 1358 1359 1360 1362 1363 1364
## [519] 1365 1367 1369 1370 1371 1381 1384 1387 1391 1395 1396 1397 1399 1402
## [533] 1406 1409 1411 1413 1414 1415 1419 1421 1422 1423 1425 1426 1429 1433
## [547] 1434 1435 1440 1444 1448 1449 1454 1456 1458 1459 1463 1466 1468 1472
## [561] 1475 1480 1484 1485 1490 1492 1494 1499 1500 1503 1504 1505 1507 1508
## [575] 1511 1513 1514 1516 1517 1518 1522 1526 1527 1528 1530 1531 1533 1535
## [589] 1536 1537 1541 1542 1544 1548 1551 1552 1553 1554 1556 1557 1559 1560
## [603] 1563 1567 1568 1569 1570 1572 1576 1583 1585 1591 1593 1594 1597 1600
## [617] 1603 1606 1609 1611 1615 1616 1618 1619 1622 1623 1624 1629 1630 1634
## [631] 1636 1637 1642 1643 1644 1645 1646 1650 1651 1653 1663 1669 1674 1675
## [645] 1677 1678 1680 1682 1686 1688 1699 1701 1702 1703 1704 1705 1707 1709
## [659] 1712 1717 1718 1721 1725 1729 1739 1740 1744 1747 1756 1760 1762 1766
## [673] 1770 1771 1772 1773 1775 1782 1792 1794 1795 1796 1802 1807 1809 1811
## [687] 1812 1813 1818 1820 1822 1825 1826 1827 1828 1835 1836 1838 1843 1850
## [701] 1857 1860 1862 1863 1864 1866 1867 1870 1876 1877 1878 1888 1889 1890
## [715] 1895 1899 1905 1906 1907 1916 1917 1923 1924 1930 1932 1933 1942 1945
## [729] 1946 1955 1956 1957 1959 1963 1966 1967 1969 1970 1972 1973 1976 1979
## [743] 1981 1984 1985 1989 1990 1996 1997 1999 2002 2004 2007 2010 2016 2017
## [757] 2018 2022 2028 2030 2032 2035 2036 2039 2041 2049 2051 2053 2054 2056
## [771] 2057 2058 2060 2061 2063 2064 2065 2068 2071 2072 2075 2078 2079 2085
## [785] 2090 2095 2096 2097 2101 2102 2105 2109 2110 2113 2121 2122 2123 2127
## [799] 2129 2130 2135 2138 2139 2140 2146 2147 2154 2155 2159 2162 2165 2168
## [813] 2169 2171 2173 2176 2177 2178 2179 2181 2183 2185 2186 2192 2193 2194
## [827] 2195 2196 2202 2203 2208 2210 2216 2217 2221 2225 2227 2228 2231 2234
## [841] 2235 2237 2238 2239 2240 2241 2243 2246 2255 2258 2261 2262 2267 2271
## [855] 2273 2275 2276 2281 2282 2290 2291 2292 2293 2296 2298 2300 2302 2303
## [869] 2307 2308 2312 2314 2319 2322 2324 2330 2338 2339 2341 2343 2348 2349
## [883] 2352 2359 2361 2365 2367 2368 2369 2371 2373 2375 2377 2386 2387 2388
## [897] 2395 2396 2400 2401 2402 2407 2409 2412 2413 2414 2419 2420 2426 2427
## [911] 2435 2436 2439 2442 2443 2446 2448 2450 2451 2453 2454 2459 2462 2466
## [925] 2468 2476 2479 2500 2513 2515 2521 2523 2528 2530 2531 2535 2540 2544
## [939] 2546 2549 2550 2552 2555 2560 2561 2569 2571 2572 2573 2578 2582 2584
## [953] 2585 2587 2588 2592 2596 2600 2601 2605 2606 2607 2611 2612 2614 2615
## [967] 2616 2620 2622 2627 2628 2630 2633 2637 2644 2648 2650 2651 2653 2654
## [981] 2661 2670 2671 2672 2673 2674 2676 2677 2678 2682 2685 2686 2689 2691
## [995] 2694 2696 2697 2698 2700 2704 2707 2710 2712 2713 2714 2716 2717 2721
## [1009] 2722 2724 2727 2728 2729 2732 2735 2740 2742 2746 2749 2751 2755 2758
## [1023] 2759 2760 2761 2763 2766 2767 2772 2773 2780 2781 2782 2785 2786 2787
## [1037] 2792 2796 2798 2804 2805 2806 2815 2817 2819 2820 2821 2822 2824 2825
## [1051] 2827 2828 2831 2835 2839 2842 2845 2846 2847 2848 2851 2853 2854 2857
## [1065] 2860 2861 2862 2864 2865 2871 2872 2875 2876 2878 2880 2881 2887 2888
## [1079] 2889 2890 2892 2895 2896 2899 2900 2903 2908 2909 2911 2913 2914 2929
## [1093] 2930 2932 2933 2935 2937 2942 2952 2954 2956 2957 2959 2968 2969 2972
## [1107] 2973 2977 2979 2981 2983 2988 2992 2994 2995 3001 3002 3005 3009 3011
## [1121] 3014 3019 3020 3023 3025 3028 3029 3030 3036 3037 3038 3040 3041 3045
## [1135] 3048 3049 3050 3051 3055 3059 3062 3064 3065 3066 3067 3068 3069 3081
## [1149] 3082 3083 3085 3087 3089 3092 3093 3095 3101 3103 3104 3105 3107 3108
## [1163] 3113 3115 3119 3120 3122 3123 3125 3127 3133 3135 3136 3138 3141 3142
## [1177] 3144 3146 3148 3149 3153 3154 3155 3157 3160 3161 3168 3169 3181 3188
## [1191] 3193 3194 3199 3201 3203 3205 3206 3209 3210 3212 3213 3217 3219 3220
## [1205] 3222 3225 3227 3231 3234 3235 3240 3242 3244 3246 3253 3258 3260 3261
## [1219] 3263 3266 3271 3273 3274 3275 3276 3278 3280 3287 3290 3296 3298 3301
## [1233] 3303 3307 3309 3313 3316 3320 3321 3323 3324 3327 3332 3333 3334 3335
## [1247] 3336 3337 3341 3343 3344 3353 3356 3359 3360 3363 3364 3370 3374 3376
## [1261] 3379 3381 3384 3388 3390 3397 3398 3400 3401 3402 3403 3406 3409 3410
## [1275] 3411 3413 3414 3416 3417 3431 3432 3433 3437 3438 3439 3443 3444 3446
## [1289] 3447 3449 3453 3455 3458 3459 3460 3466 3468 3473 3474 3476 3478 3480
## [1303] 3484 3487 3489 3492 3494 3497 3498 3501 3504 3508 3513 3515 3519 3523
## [1317] 3526 3529 3534 3536 3537 3538 3540 3541 3542 3543 3546 3547 3549 3550
## [1331] 3551 3552 3557 3558 3559 3560 3561 3566 3568 3569 3576 3581 3582 3588
## [1345] 3590 3591 3592 3596 3597 3600 3602 3605 3606 3609 3612 3615 3616 3619
## [1359] 3620 3623 3625 3626 3629 3634 3646 3647 3655 3658 3661 3663 3665 3677
## [1373] 3680 3682 3686 3688 3693 3695 3697 3700 3701 3703 3705 3710 3712 3713
## [1387] 3716 3721 3723 3724 3725 3726 3728 3731 3734 3737 3739 3743 3744 3745
## [1401] 3746 3748 3749 3751 3753 3754 3756 3759 3764 3766 3767 3768 3770 3771
## [1415] 3774 3775 3777 3778 3781 3783 3791 3792 3795 3797 3798 3800 3805 3807
## [1429] 3809 3813 3815 3818 3819 3827 3828 3832 3834 3836 3838 3841 3846 3848
## [1443] 3851 3852 3854 3857 3860 3861 3863 3864 3865 3866 3870 3872 3877 3879
## [1457] 3880 3881 3882 3891 3892 3901 3902 3916 3919 3923 3924 3925 3926 3929
## [1471] 3932 3933 3939 3940 3946 3947 3951 3952 3954 3956 3957 3960 3962 3964
## [1485] 3967 3972 3975 3976 3978 3982 3983 3985 3991 3994 3995 3996 3999 4001
## [1499] 4005 4011 4014 4017 4019 4020 4024 4026 4027 4030 4032 4035 4036 4037
## [1513] 4038 4040 4043 4046 4049 4052 4058 4060 4071 4074 4081 4083 4087 4088
## [1527] 4090 4095 4098 4099 4102 4104 4105 4109 4112 4113 4114 4115 4117 4119
## [1541] 4121 4122 4126 4127 4129 4131 4135 4146 4147 4148 4150 4151 4152 4153
## [1555] 4154 4155 4157 4159 4164 4165 4166 4168 4172 4174 4176 4178 4181 4185
## [1569] 4186 4188 4190 4192 4198 4200 4203 4204 4208 4211 4212 4215 4218 4221
## [1583] 4222 4227 4229 4230 4231 4233 4236 4237 4238 4242 4250 4251 4252 4253
## [1597] 4254 4256 4258 4264 4265 4267 4274 4275 4277 4278 4279 4280 4281 4286
## [1611] 4288 4289 4292 4293 4294 4295 4296 4297 4299 4308 4309 4312 4320 4329
## [1625] 4330 4331 4332 4337 4343 4344 4345 4349 4351 4352 4354 4358 4360 4362
## [1639] 4363 4365 4366 4368 4372 4378 4381 4384 4386 4392 4395 4396 4398 4399
## [1653] 4408 4409 4412 4414 4415 4416 4417 4419 4423 4424 4425 4426 4427 4435
## [1667] 4437 4440 4442 4443 4444 4447 4449 4453 4454 4455 4456 4460 4465 4467
## [1681] 4472 4473 4474 4476 4477 4478 4480 4482 4486 4487 4491 4494 4497 4502
## [1695] 4506 4508 4510 4511 4517 4521 4522 4524 4526 4527 4528 4530 4531 4533
## [1709] 4537 4538 4540 4541 4542 4543 4552 4554 4556 4557 4565 4567 4571 4573
## [1723] 4574 4578 4582 4588 4590 4592 4595 4598 4602 4603 4607 4609 4610 4613
## [1737] 4615 4619 4621 4625 4634 4638 4640 4648 4649 4650 4651 4654 4656 4658
## [1751] 4659 4660 4662 4663 4664 4667 4672 4674 4677 4681 4683 4688 4693 4694
## [1765] 4697 4702 4703 4706 4709 4711 4713 4714 4716 4719 4722 4723 4724 4727
## [1779] 4728 4732 4733 4736 4739 4741 4748 4751 4752 4754 4755 4763 4766 4767
## [1793] 4768 4773 4778 4782 4783 4784 4789 4790 4793 4794 4795 4797 4798 4805
## [1807] 4807 4808 4809 4812 4813 4819 4822 4826 4827 4831 4834 4835 4836 4844
## [1821] 4848 4851 4853 4857 4859 4861 4864 4870 4871 4872 4874 4875 4876 4878
## [1835] 4879 4881 4884 4885 4888 4890 4891 4893 4894 4895 4896 4902 4907 4908
## [1849] 4910 4915 4916 4918 4920 4921 4922 4928 4929 4931 4935 4936 4941 4943
## [1863] 4946 4948 4949 4954 4956 4959 4961 4970 4976 4977 4978 4980 4982 4984
## [1877] 4985 4986 4987 4988 4989 4991 4992 4993 4994 4997 4998 4999 5000 5001
## [1891] 5003 5006 5008 5012 5013 5015 5016 5018 5020 5021 5022 5023 5025 5026
## [1905] 5029 5033 5034 5038 5044 5046 5048 5049 5055 5063 5069 5070 5075 5077
## [1919] 5078 5079 5081 5082 5083 5085 5087 5088 5093 5095 5102 5105 5106 5107
## [1933] 5109 5112 5113 5114 5115 5118 5123 5128 5131 5132 5138 5143 5145 5150
## [1947] 5152 5154 5156 5160 5162 5163 5164 5165 5167 5170 5173 5174 5176 5179
## [1961] 5183 5184 5185 5186 5192 5193 5194 5195 5196 5199 5202 5203 5205 5206
## [1975] 5209 5210 5218 5219 5220 5230 5233 5234 5235 5236 5237 5240 5242 5247
## [1989] 5248 5251 5254 5255 5256 5264 5266 5267 5268 5270 5274 5275 5276 5277
## [2003] 5285 5286 5293 5297 5300 5307 5309 5312 5313 5318 5319 5321 5322 5327
## [2017] 5333 5336 5338 5345 5346 5347 5349 5354 5356 5358 5360 5363 5365 5367
## [2031] 5369 5372 5381 5383 5388 5389 5390 5392 5394 5395 5396 5397 5398 5402
## [2045] 5406 5407 5408 5419 5421 5424 5430 5434 5435 5436 5440 5442 5445 5446
## [2059] 5448 5450 5454 5456 5457 5461 5464 5465 5466 5468 5470 5471 5473 5474
## [2073] 5476 5477 5479 5480 5482 5488 5489 5491 5493 5495 5497 5498 5499 5500
## [2087] 5502 5503 5504 5506 5509 5511 5512 5513 5514 5519 5521 5523 5525 5527
## [2101] 5530 5533 5536 5538 5539 5541 5545 5547 5549 5555 5558 5560 5564 5568
## [2115] 5569 5573 5574 5575 5576 5577 5580 5582 5583 5585 5589 5590 5592 5594
## [2129] 5595 5596 5597 5598 5600 5602 5604 5605 5607 5608 5609 5610 5612 5613
## [2143] 5614 5617 5620 5627 5628 5630 5631 5633 5635 5639 5643 5651 5652 5653
## [2157] 5658 5662 5663 5666 5667 5668 5672 5673 5674 5675 5678 5681 5684 5685
## [2171] 5695 5699 5704 5705 5710 5711 5712 5714 5715 5720 5722 5723 5726 5729
# The indices of all the treated patients in the original dataset that were matched to an untreated patient, in this specific order.
greedymatch$index.control # => What do you think this is?
## [1] 4620 2486 3247 392 3787 984 582 4339 905 4500 200 1751 250 4726
## [15] 4130 3249 5447 5518 2921 868 3694 4548 1066 4216 4639 3391 1126 2263
## [29] 2891 5089 2783 1059 3156 5153 5364 3736 2938 889 3128 3603 1639 909
## [43] 2692 927 2998 2066 5212 264 3907 232 4529 1887 2247 269 5226 2093
## [57] 32 3942 4839 2299 2916 1502 3256 2665 3632 1798 133 235 5216 5492
## [71] 180 1300 370 1322 887 1092 531 755 2800 4015 4232 1658 4811 2161
## [85] 5490 4652 4632 3053 4002 518 876 2945 4932 3944 2478 2491 31 1151
## [99] 2966 3129 4781 4906 1224 1210 744 1026 612 5694 1952 3304 4960 3112
## [113] 1118 3448 3351 3354 2641 500 2376 2389 3998 692 2198 1664 5646 4846
## [127] 4704 5244 1394 449 1389 1257 4802 2084 1902 3628 3134 384 2494 3202
## [141] 5191 5043 3054 2778 2424 1791 1839 2974 3117 4572 2986 1670 818 3021
## [155] 5683 5377 761 2206 5351 939 4777 4461 5517 64 4223 3137 2723 179
## [169] 2621 1995 4202 2383 646 1335 1056 3027 5002 1745 5391 300 3488 5625
## [183] 1229 1620 3894 2280 2918 4744 2547 3159 3762 979 3845 128 5654 1142
## [197] 4772 1681 4718 5664 3527 1202 4873 1460 1814 2222 3823 5587 5522 3091
## [211] 711 1515 644 2408 4975 4317 2311 4418 5314 3920 4599 2756 1587 2144
## [225] 911 5444 1082 817 4106 2619 4175 5619 5133 1041 4644 3255 497 879
## [239] 4852 1201 2055 2327 4214 4953 875 1832 5259 3362 1881 2939 4911 3173
## [253] 2976 373 2539 686 4197 2506 995 4604 3524 1859 359 5642 4631 308
## [267] 5579 3339 5227 3057 2668 365 2559 4357 977 1845 3254 2536 1915 2693
## [281] 5528 870 3395 1171 3997 4136 2752 699 452 4957 2687 4125 2213 2429
## [295] 2156 5056 2583 2503 4323 3826 3126 3462 3198 4955 1190 141 4310 5328
## [309] 785 5469 3124 4535 213 2044 1935 3893 4923 789 2545 1246 5252 2776
## [323] 815 546 2009 1417 3293 2356 2157 4867 307 5559 1357 2287 4189 5280
## [337] 1631 4382 1571 5011 2119 3965 4244 4391 3532 2703 1077 44 5405 2610
## [351] 52 1121 4373 5263 4525 4328 3839 4889 892 534 3684 4355 3472 4182
## [365] 5039 3530 1403 1049 5071 5137 24 4369 1061 123 1829 5180 1793 4108
## [379] 3121 3720 753 5526 1764 3300 2538 4483 3664 328 3464 2534 1386 4484
## [393] 2656 3441 2480 1540 3847 2410 461 1998 1656 3585 1849 2438 1286 4666
## [407] 4828 4364 1683 1819 3285 4680 3074 2364 3816 2027 3780 1817 2483 3373
## [421] 3611 1776 5144 725 836 2905 199 1483 5697 4144 3667 5342 3742 2038
## [435] 4446 2187 3988 4249 4925 2898 1549 2631 575 2421 1447 4761 3010 1376
## [449] 2415 586 2690 4318 4979 3948 714 2645 3910 379 5437 2337 3618 1149
## [463] 1750 3328 4950 680 320 271 1111 5655 5148 455 674 4899 4133 1209
## [477] 3773 1691 2309 2495 1078 966 4319 1488 4824 4641 4261 3763 521 3958
## [491] 3185 2493 570 645 5024 2305 230 283 1632 1964 1264 2517 255 4199
## [505] 3928 3471 1774 5581 3639 3277 2901 3992 3912 3673 1140 3644 475 1689
## [519] 544 5151 903 5731 978 85 2873 2948 3943 2031 4774 5696 2856 5041
## [533] 4903 1865 3974 4765 4245 745 4823 4 65 1461 3197 4359 127 1467
## [547] 5718 1949 5054 931 1116 4850 19 2609 2272 1393 1080 3088 2719 5629
## [561] 1851 1928 4815 5458 5676 3380 1319 4479 2777 4004 4544 2141 4021 3319
## [575] 3248 4269 88 2170 4559 779 1755 3669 2701 2725 4738 5028 3814 4469
## [589] 3750 3785 3564 2709 473 5201 4485 5250 4897 1927 45 765 1777 2626
## [603] 2481 2711 2602 4149 2562 558 3643 4645 749 197 5690 3171 2020 823
## [617] 5117 103 1931 1143 5368 3286 3888 5481 3070 2069 3908 2256 3914 4947
## [631] 4438 5103 4635 305 5127 5140 5288 813 4201 901 3883 4880 3516 4586
## [645] 3214 2961 581 1372 2664 2970 2008 2646 2417 956 4617 1268 3418 1803
## [659] 1734 2810 1177 3429 2045 4096 3017 2508 1034 5670 1968 5157 1099 1599
## [673] 2754 5084 3622 548 3571 4849 5387 2832 1141 2744 2467 649 2428 349
## [687] 1259 3927 4655 3450 1831 4184 1785 1315 1353 3862 1941 3251 1879 3039
## [701] 3368 2639 3544 106 456 3465 4676 1046 3621 5181 3167 3174 2115 775
## [715] 4110 774 2863 4684 2642 954 5057 2591 1925 2625 4507 3421 4863 1992
## [729] 1052 998 793 1231 1354 4140 1824 2447 5344 3520 5463 4498 5734 25
## [743] 2843 1882 4272 1693 5207 5155 2982 1607 3608 2577 37 3909 4307 1965
## [757] 2323 2463 2111 1064 3237 914 60 5316 2214 859 3056 1790 4939 2799
## [771] 838 1898 1706 708 620 4952 2971 1252 2279 5291 4647 184 5279 4686
## [785] 3365 1184 2423 2136 1368 2160 2172 5045 1178 4335 5096 3109 2385 1855
## [799] 397 4624 5189 1912 1601 4499 2919 1610 1780 2737 4670 2669 3729 4901
## [813] 2091 2978 949 4944 1152 5484 5624 2288 2205 29 5626 4729 1167 3031
## [827] 5299 4933 5269 4374 2855 2944 3535 4420 5382 5352 3986 5567 1781 188
## [841] 855 683 3900 702 1258 1382 580 3987 4376 2201 542 440 2731 3288
## [855] 5310 2098 1105 3176 3830 3539 4141 8 5124 1523 4287 1469 2088 5335
## [869] 4173 5516 5417 15 4545 735 2922 615 1977 555 1561 3885 3654 3013
## [883] 4995 4669 3502 3636 2142 1032 2488 1512 1937 5524 2996 4770 410 4786
## [897] 1555 1974 3950 2381 4124 668 3371 3554 5551 626 5462 679 1700 3218
## [911] 3445 1943 5158 505 4061 2604 5050 2532 3990 2512 1913 352 4678 4942
## [925] 3867 1883 2092 91 5282 4536 1861 5190 1451 4034 4926 2874 1808 3617
## [939] 5487 1834 1506 4336 802 1084 516 5554 4514 1294 3718 1768 4512 4623
## [953] 1698 1604 459 5014 1918 4457 332 401 3968 4576 3589 1720 522 5669
## [967] 2594 2232 1657 4067 3116 4840 2125 119 3435 5698 5027 1453 4350 254
## [981] 5292 767 1325 1248 148 3183 2223 4488 545 3789 3899 1213 2779 3456
## [995] 1564 2393 4161 3521 79 1470 4734 3906 158 496 4562 1379 3598 1107
## [1009] 2802 2811 1328 1086 1172 4523 3727 5135 1830 2209 3993 1130 458 4407
## [1023] 2218 2720 3241 1138 4575 766 2708 429 1869 3719 1486 20 2984 34
## [1037] 2180 1442 5302 3685 1181 1660 4717 2332 3650 4611 1903 4177 3422 1110
## [1051] 5067 130 2397 1263 345 2211 2268 1799 5283 5086 3840 3858 4866 2285
## [1065] 2226 454 3096 3905 4570 740 9 2224 5486 4788 1788 4504 953 1816
## [1079] 1951 340 3911 252 4413 5200 557 1344 4116 2823 1602 1723 1081 2104
## [1093] 808 5161 1654 2445 76 3385 2295 1588 2608 3681 622 2504 319 2297
## [1107] 1018 90 400 5572 4158 1162 3772 1742 1087 4448 988 5578 4643 2430
## [1121] 2001 2114 3000 477 1767 5147 4550 4912 498 3889 2866 4689 3347 3434
## [1135] 3895 3875 84 5374 810 2593 3297 2920 5177 4132 3670 30 4642 4450
## [1149] 4101 2315 1309 4492 2502 604 2501 4077 4969 256 3799 11 4913 4284
## [1163] 2372 4758 918 786 5036 4580 5260 3228 2320 4029 3291 1408 5308 5420
## [1177] 3268 1852 218 717 4397 4093 4618 1186 791 1249 3238 4048 121 5296
## [1191] 1336 5068 2507 5623 1960 4919 1815 1908 514 5370 798 316 1635 1304
## [1205] 1684 835 997 439 2643 1733 3689 3061 1058 4886 4749 825 3503 688
## [1219] 222 5393 5304 2906 210 5556 3499 1509 687 7 5548 4563 1356 3162
## [1233] 4388 1787 5599 4010 4065 2444 239 355 4973 5225 1783 937 386 3624
## [1247] 2893 2684 4501 2730 1405 3613 1062 1373 160 2270 4900 1757 47 2837
## [1261] 1732 4210 3730 427 2784 4673 5570 2556 2726 4841 4904 5409 2333 156
## [1275] 278 462 1652 1452 3052 4945 1073 5215 1168 2736 35 797 1710 5379
## [1289] 2449 3507 1040 4346 1752 1627 1581 2870 864 3829 5208 2526 1390 3518
## [1303] 535 4720 1471 5732 3699 1299 3594 3404 2499 1617 2882 4605 5323 2251
## [1317] 4546 2394 4865 344 1306 5119 643 5508 5214 1778 5222 1122 4086 4505
## [1331] 3477 4721 3849 3485 732 847 212 5376 447 4629 5657 1349 3187 4089
## [1345] 715 2014 490 5198 898 3884 2849 5659 1464 4564 2197 3164 3638 3887
## [1359] 5111 3342 916 5534 2617 5357 809 279 5700 1158 1236 4193 4764 2638
## [1373] 5121 478 3627 2518 4340 5404 1154 1626 2199 4597 4470 377 917 4334
## [1387] 2791 2542 94 4883 1900 4661 1128 4657 2884 3790 190 928 1103 2220
## [1401] 5213 4843 1192 536 1281 3294 614 4759 3938 4633 1244 2808 563 4637
## [1415] 234 3495 5634 104 4616 3574 5187 5494 5228 3463 2774 4270 2145 4390
## [1429] 3250 2048 1694 3152 869 1170 3405 5334 958 273 3853 162 4063 4591
## [1443] 5728 4325 1621 4860 3855 5438 4675 5262 3394 1805 2858 4887 857 1392
## [1457] 3191 479 3662 4842 1748 1443 980 3961 4968 3555 2153 14 1823 5553
## [1471] 3935 4589 5724 2958 237 4466 1445 647 3583 3810 4303 4111 5416 5362
## [1485] 3419 2566 2052 4972 4260 2046 5702 4940 1029 2012 3058 662 361 4234
## [1499] 3200 2257 4066 3392 3259 5713 5735 1592 5353 1227 1690 3525 2015 3493
## [1513] 3236 2254 5733 3043 975 5343 3812 3831 3367 3510 2999 1097 4259 3102
## [1527] 2188 3878 3850 1692 2163 3707 4441 3145 4051 3548 2907 3896 5092 2757
## [1541] 99 5467 1000 3648 5265 523 5378 2629 1765 5324 783 5130 4393 2342
## [1555] 2250 5298 4257 2200 3042 4495 750 1295 2355 1238 3706 1804 3505 261
## [1569] 3012 5478 3825 1982 2083 4519 696 4436 5004 3269 3299 2391 2151 5331
## [1583] 4862 1067 2398 4882 2117 2548 1667 5149 1156 616 4687 2989 5253 4698
## [1597] 4025 5097 2474 2509 2059 1037 2750 2576 3100 469 3281 1846 2099 1659
## [1611] 257 314 2358 3352 2094 1495 3284 5243 4838 1847 2574 2191 3451 2770
## [1625] 4561 1313 2636 132 1316 1988 5563 1114 2259 5080 2378 5125 884 596
## [1639] 1303 1940 965 1289 2404 2869 4679 72 1378 1226 3412 4091 3063 501
## [1653] 5166 1283 3004 3018 704 4967 5366 1786 2738 2975 1479 5415 751 4854
## [1667] 2399 357 5618 955 2511 807 405 3573 1239 2950 5104 1961 2524 1741
## [1681] 1123 3563 5311 203 5271 628 3396 4577 2126 96 2470 2529 3674 1487
## [1695] 5122 1476 4490 661 4326 2603 177 2370 1009 1713 734 722 1287 54
## [1709] 5485 3399 4047 2384 4039 3345 2471 58 1006 4924 3683 1685 2743 4583
## [1723] 670 3317 4033 2266 4290 1856 5640 450 2813 4430 1922 4990 1580 4966
## [1737] 2960 4241 3272 4974 2321 1431 2525 2681 4614 3172 2363 3709 110 5550
## [1751] 2067 2484 1166 2248 3015 1185 4143 4123 3556 3267 4103 843 1521 4671
## [1765] 2519 4825 4601 3080 2490 1566 720 1489 5679 1090 4951 648 3163 1493
## [1779] 2613 637 3216 4304 419 2034 371 5168 3740 16 4475 5433 5035 2775
## [1793] 4283 1789 434 1844 3824 764 5348 4406 4892 3386 353 4737 268 2033
## [1807] 4626 276 4428 4695 1719 806 3989 3660 5017 4301 2344 39 5037 3340
## [1821] 1589 4044 4333 4050 4062 3346 4539 4400 1810 2924 2570 3452 4742 4707
## [1835] 1430 1385 263 113 4207 733 4463 4064 5159 3761 4534 792 388 2925
## [1849] 4830 387 1262 3350 420 2252 5060 4608 5110 5172 2040 367 5341 2597
## [1863] 4760 2867 1221 412 3886 2190 5644 1731 2789 5238 5606 5647 976 2465
## [1877] 4389 3570 262 3375 4518 3512 383 1432 5273 888 2553 5507 151 5337
## [1891] 4170 865 5443 3292 1124 4515 607 1070 4375 941 2894 2990 1436 957
## [1905] 4224 1947 3579 5325 2886 1666 2660 5472 5719 827 5453 1101 1011 3192
## [1919] 1410 506 2926 4622 3204 2917 524 78 3305 5064 3577 636 5660 251
## [1933] 685 3094 4300 5257 322 407 2897 4464 3086 2212 1519 1275 4194 492
## [1947] 467 660 2457 1042 3640 5584 3531 5510 1708 2715 3491 4868 1578 1950
## [1961] 4493 4801 4405 3378 1727 929 2418 4581 3208 3357 4206 2598 4205 5281
## [1975] 2557 411 2951 2836 2269 741 198 3279 2595 4394 4630 5139 4551 1608
## [1989] 2080 186 2304 4740 1662 1164 673 1498 3008 3215 5073 4877 5615 4699
## [2003] 3496 4069 5622 3481 2425 4596 2456 3578 1665 4855 1407 3377 1938 2346
## [2017] 1455 2635 395 5306 5284 354 4410 5452 2021 3601 5636 3755 5332 5546
## [2031] 2278 1420 4451 335 5410 4710 4730 170 2489 2357 4291 2987 5032 1633
## [2045] 4981 4845 3945 624 4118 1282 1648 2912 1015 3383 3158 1525 3672 4513
## [2059] 1400 3659 2405 946 5403 4757 1147 3833 3575 2360 5330 2294 4128 4016
## [2073] 4080 4696 2003 3607 3349 4701 5411 3245 4068 2564 3869 3760 3361 3859
## [2087] 394 2124 4471 4445 4963 2980 5449 990 338 4962 4569 3649 4361 437
## [2101] 3314 1044 253 4829 2580 2803 2568 5483 3808 4383 502 92 1598 1161
## [2115] 3635 2923 1404 803 3230 4082 2498 3533 46 1418 1098 2652 5126 3306
## [2129] 1728 2688 4810 723 517 2655 2949 1085 1746 2477 3077 1801 5501 4285
## [2143] 5561 5040 4779 3047 4691 49 1975 1628 1206 2440 3326 1240 1758 5400
## [2157] 1102 3966 3440 2112 5116 4042 1334 1028 3035 488 2244 663 3190 1383
## [2171] 3868 989 3811 3758 1759 2441 4800 4377 3151 1672 2118 983 5562 3424
# The indices of all the untreated patients in the original dataset that were matched to a treated patient.
print("===================================================")
## [1] "==================================================="
=> Are all treated subjects matched to a control one?
Yes, when one computes the length of greedymatch$index.treated, it is equal to 2184 which is the number of treated patients.
matched <- rhc.selected[unlist(greedymatch[c("index.treated","index.control")]), ]
matched[1,] # This is the first treated subject
print("===================================================")
## [1] "==================================================="
matched[length(greedymatch$index.treated)+1,] # => What do you think this subject is?
#This is the first untreated subject.
print("===================================================")
## [1] "==================================================="
Get table 1 for matched data with standardized differences
print("===================================================")
## [1] "==================================================="
matchedtab1 <- CreateTableOne(vars=xvars,strata="treatment.swang", data=matched, test=FALSE) # => Complete the command to create table 1 after matching
# => Print the balance with smd.
print(matchedtab1, smd=TRUE)
## Stratified by treatment.swang
## 0 1 SMD
## n 2184 2184
## ARF (mean (SD)) 0.51 (0.50) 0.42 (0.49) 0.182
## CHF (mean (SD)) 0.10 (0.30) 0.10 (0.29) 0.003
## Cirrhosis (mean (SD)) 0.02 (0.15) 0.02 (0.15) 0.012
## Lungcancer (mean (SD)) 0.00 (0.05) 0.00 (0.05) <0.001
## MOSFM (mean (SD)) 0.07 (0.26) 0.07 (0.26) 0.002
## Coma (mean (SD)) 0.04 (0.20) 0.04 (0.20) 0.002
## cardiohx (mean (SD)) 0.19 (0.39) 0.20 (0.40) 0.039
## chfhx (mean (SD)) 0.17 (0.38) 0.19 (0.40) 0.053
## dementhx (mean (SD)) 0.07 (0.26) 0.07 (0.25) 0.021
## psychhx (mean (SD)) 0.04 (0.21) 0.05 (0.21) 0.009
## chrpulhx (mean (SD)) 0.16 (0.36) 0.14 (0.35) 0.033
## renalhx (mean (SD)) 0.05 (0.21) 0.05 (0.21) 0.002
## liverhx (mean (SD)) 0.06 (0.24) 0.06 (0.24) 0.011
## malighx (mean (SD)) 0.22 (0.42) 0.20 (0.40) 0.048
## immunhx (mean (SD)) 0.28 (0.45) 0.29 (0.45) 0.036
## transhx (mean (SD)) 0.11 (0.31) 0.15 (0.36) 0.116
## age (mean (SD)) 61.09 (17.10) 60.75 (15.63) 0.020
## edu (mean (SD)) 11.66 (2.90) 11.86 (3.16) 0.065
## das2d3pc (mean (SD)) 20.57 (5.43) 20.70 (5.03) 0.025
## aps1 (mean (SD)) 52.66 (18.31) 60.74 (20.27) 0.418
## scoma1 (mean (SD)) 17.58 (27.03) 18.97 (28.26) 0.050
## meanbp1 (mean (SD)) 79.50 (36.09) 68.20 (34.24) 0.321
## wblc1 (mean (SD)) 15.02 (10.93) 16.27 (12.55) 0.106
## temp1 (mean (SD)) 37.72 (1.75) 37.59 (1.83) 0.072
## alb1 (mean (SD)) 3.10 (0.68) 2.98 (0.93) 0.152
## hema1 (mean (SD)) 31.38 (7.68) 30.51 (7.42) 0.116
## bili1 (mean (SD)) 2.11 (4.47) 2.71 (5.33) 0.122
## crea1 (mean (SD)) 2.02 (1.93) 2.47 (2.05) 0.228
## pot1 (mean (SD)) 4.02 (0.97) 4.05 (1.01) 0.032
## paco21 (mean (SD)) 38.09 (10.76) 36.79 (10.97) 0.119
## ph1 (mean (SD)) 7.40 (0.09) 7.38 (0.11) 0.181
## wtkilo1 (mean (SD)) 68.27 (26.34) 72.36 (27.73) 0.151
## dnr1 (mean (SD)) 0.08 (0.27) 0.07 (0.26) 0.031
## resp (mean (SD)) 0.38 (0.49) 0.29 (0.45) 0.189
## card (mean (SD)) 0.33 (0.47) 0.42 (0.49) 0.193
## gastr (mean (SD)) 0.15 (0.36) 0.19 (0.39) 0.109
## renal (mean (SD)) 0.05 (0.22) 0.07 (0.25) 0.074
## hema (mean (SD)) 0.05 (0.22) 0.05 (0.22) 0.008
## seps (mean (SD)) 0.17 (0.38) 0.24 (0.42) 0.153
## trauma (mean (SD)) 0.01 (0.09) 0.02 (0.12) 0.068
# => After matching how many "bad" variables with SMD > 0.2 are there?
smd_values2 <- ExtractSmd(matchedtab1)
smd_values2[smd_values2 > 0.2, ]
## aps1 meanbp1 crea1
## 0.4181082 0.3213080 0.2283157
# print("===================================================")
print("===================================================")
## [1] "==================================================="
y_trt <- matched[matched$treatment.swang == 1, 'outcome.died'] # => Put here the vector of outcomes y (from matched$outcome.died) of those in the treatment group
y_con <- matched[matched$treatment.swang == 0, 'outcome.died'] # Put here the vector of outcomes y (from matched$outcome.died) of those in the control group
print("===================================================")
## [1] "==================================================="
diffy <- y_trt-y_con # pairwise difference
t.test(diffy)
##
## One Sample t-test
##
## data: diffy
## t = 6.6885, df = 2183, p-value = 2.855e-11
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 0.06310748 0.11546394
## sample estimates:
## mean of x
## 0.08928571
=> Look at the p-value. Is it below 0.05? => What does this mean?
It is indeed below 0.05, so the null hypothesis that the true mean difference is zero can be rejected. This means that there is a difference between the treatment and control groups.
table(y_trt,y_con)
## y_con
## y_trt 0 1
## 0 362 336
## 1 531 955
# In this table you will see the 2x2 table:
# y_con
# ytreat 0 1
# 0 a b
# 1 c d
# Enter these a, b, c, and d numbers in the following McNemar test
# print("===================================================")
mcnemar.test(matrix(c(362,336,531,955),2,2)) # => Complete the command by filling the a, b, c, and d numbers in the correct order.
##
## McNemar's Chi-squared test with continuity correction
##
## data: matrix(c(362, 336, 531, 955), 2, 2)
## McNemar's chi-squared = 43.409, df = 1, p-value = 4.44e-11
# => Is the p-value less than 0.05? What does this result mean?
# Yes it is less than 0.05 indicating there is a difference in proportion of people that died when receiving treatment versus when not receiving treatment. This means that the treatment has an impact on the outcome.
# print("===================================================")
gmodel1 <- glm(outcome.died ~ treatment.swang, family=binomial, data=matched)
print("===================================================")
## [1] "==================================================="
# => Is the coefficient of treatment significant? you can use summary(gmodel1) or confint(gmodel1)
summary(gmodel1)
##
## Call:
## glm(formula = outcome.died ~ treatment.swang, family = binomial,
## data = matched)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.36859 0.04352 8.468 < 2e-16 ***
## treatment.swang 0.38704 0.06325 6.120 9.38e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 5729.2 on 4367 degrees of freedom
## Residual deviance: 5691.6 on 4366 degrees of freedom
## AIC: 5695.6
##
## Number of Fisher Scoring iterations: 4
# => What is the odds ratio that is associated with treatment? You can easily derive it from the coefficient (Google otherwise how to get the odds ratio from the coefficient)
exp(coef(gmodel1)[2])
## treatment.swang
## 1.472613
# => If the coefficient is positive, what does that mean in terms of odds ratio?
# The odds ratio for each coefficient represents the average increase in the odds of an individual defaulting, assuming all other predictor variables are held constant. It is 1.472613, meaning that the odds of the outcome (dying) occurring in the treatment group are 1.472613 times higher than the odds of the outcome (dying) occurring in the control group.
# => If the coefficient is negative, what does that mean in terms of odds ratio?
# The odds ratio for each coefficient represents the average decrease in the odds of an individual defaulting, assuming all other predictor variables are held constant. It thus indicates the event is less likely to occur.
# The great thing about modelling is that we can now again adjust for the variables with the worst SMD. This is clled doubly-robust estimation. Adjust for the variable with the worst SMD that you encountered before. To adjust you just add the name of the variable in the logistic regression formula. => Complete the command
gmodel2 <- glm(outcome.died ~ treatment.swang + aps1, family=binomial, data=matched)
# => What is the odds ratio for treatment now?
exp(coef(gmodel2)[2])
## treatment.swang
## 1.248695
# print("===================================================")
Fit a propensity score model to predict treatment (not the outcome!). Use logistic regression with glm
print("===================================================")
## [1] "==================================================="
xvars <- c(xvars)
formula <- as.formula(paste("treatment.swang ~", paste(xvars, collapse = " + ")))
psmodel <- glm(formula, data = rhc.selected, family = binomial)
# => Use the variables we found before in xvars in the first assignment to predict treatment
print("===================================================")
## [1] "==================================================="
summary(psmodel) # show coefficients etc
##
## Call:
## glm(formula = formula, family = binomial, data = rhc.selected)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 9.1435181 2.8001589 3.265 0.001093 **
## ARF -0.1993146 0.0790906 -2.520 0.011733 *
## CHF 0.1105408 0.1506968 0.734 0.463235
## Cirrhosis -1.4265886 0.2289511 -6.231 4.64e-10 ***
## Lungcancer -0.8240366 0.5053386 -1.631 0.102962
## MOSFM -0.2802613 0.1475816 -1.899 0.057561 .
## Coma -0.7781231 0.1655087 -4.701 2.58e-06 ***
## cardiohx 0.0994467 0.0900903 1.104 0.269656
## chfhx -0.0015423 0.0982163 -0.016 0.987471
## dementhx -0.4737206 0.1143868 -4.141 3.45e-05 ***
## psychhx -0.4843293 0.1318953 -3.672 0.000241 ***
## chrpulhx -0.2169838 0.0903336 -2.402 0.016304 *
## renalhx -0.3680069 0.1724127 -2.134 0.032806 *
## liverhx -0.2261681 0.1701381 -1.329 0.183742
## malighx -0.2885066 0.0871562 -3.310 0.000932 ***
## immunhx 0.0148209 0.0700495 0.212 0.832437
## transhx 0.5068504 0.0935418 5.418 6.01e-08 ***
## age 0.0009015 0.0020687 0.436 0.663000
## edu 0.0311376 0.0098326 3.167 0.001541 **
## das2d3pc 0.0063801 0.0062066 1.028 0.303973
## aps1 0.0180990 0.0022395 8.082 6.38e-16 ***
## scoma1 -0.0008406 0.0012483 -0.673 0.500704
## meanbp1 -0.0073053 0.0009272 -7.879 3.31e-15 ***
## wblc1 0.0017317 0.0025814 0.671 0.502322
## temp1 -0.0084141 0.0182504 -0.461 0.644773
## alb1 -0.1039610 0.0459299 -2.263 0.023607 *
## hema1 -0.0180316 0.0043474 -4.148 3.36e-05 ***
## bili1 0.0035635 0.0070830 0.503 0.614897
## crea1 0.0224782 0.0203248 1.106 0.268750
## pot1 -0.1689508 0.0320553 -5.271 1.36e-07 ***
## paco21 -0.0188045 0.0032662 -5.757 8.55e-09 ***
## ph1 -1.1484028 0.3601799 -3.188 0.001431 **
## wtkilo1 0.0071622 0.0011498 6.229 4.69e-10 ***
## dnr1 -0.7060266 0.1101245 -6.411 1.44e-10 ***
## resp -0.1985399 0.0752337 -2.639 0.008316 **
## card 0.6931842 0.0780785 8.878 < 2e-16 ***
## gastr 0.4735664 0.0991902 4.774 1.80e-06 ***
## renal 0.2907049 0.1427665 2.036 0.041728 *
## hema -0.5385458 0.1394851 -3.861 0.000113 ***
## seps 0.2352497 0.0864182 2.722 0.006484 **
## trauma 1.3864699 0.3167119 4.378 1.20e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 7621.4 on 5734 degrees of freedom
## Residual deviance: 6463.1 on 5694 degrees of freedom
## AIC: 6545.1
##
## Number of Fisher Scoring iterations: 4
pscore <- psmodel$fitted.values # create propensity score
print("===================================================")
## [1] "==================================================="
# => plot the density of the propensity scores of the treated and untreated. Tip: you might want to use par(mfrow=c(2,1)) to create 2 panels (2 rows, 1 column) and then plotting the first hist (histogram), with say, ylim=c(0,650) and col='darkblue', and the second histogram with ylim=c(650, 0) (note that we flipped the order) and col='darkblue'. This will plot the blue histogram as a normal histogram and the red one will be plotted upside down underneath the blue one so you can easily see the overlap.
par(mfrow=c(2,1))
hist(pscore[rhc.selected$treatment.swang == 1], ylim = c(0, 650), col = 'darkblue', main = "Treated")
hist(pscore[rhc.selected$treatment.swang == 0], ylim = c(650, 0), col = 'darkblue', main = "Untreated")
# => What do you think about the overlap?
# For every propensity score there seems to be at least one subject in the treated group and one in the untreated group, so the positivity assumption is reasonable.
print("===================================================")
## [1] "==================================================="
We want to do greedy matching on PS using Match with a caliper. Note that instead of the probability PS itself, we usually use the log odds (logit) of PS which is log(PS/(1-PS)) which is also equal to log(PS)-log(1-PS) . So let us first write a function (of one line) to calculate the logit.
print("===================================================")
## [1] "==================================================="
logit <- function(p) {
log(p/(1-p))
} # => Complete this function to calculate the log odds
print("===================================================")
## [1] "==================================================="
Now we want to use Match() with rhc.selected$treatment.swang as treatement, with 1 matched control per treated subject, with X as the logit(pscore), with no replacement and with a caliper of 0.2
print("===================================================")
## [1] "==================================================="
psmatch <- Match(Tr=rhc.selected$treatment.swang, X = logit(pscore), M = 1, caliper = 0.2, replace = FALSE) # => complete the command
print("===================================================")
## [1] "==================================================="
matched <- rhc.selected[unlist(psmatch[c("index.treated","index.control")]), ]
print("===================================================")
## [1] "==================================================="
matchedtab1 <- CreateTableOne(vars=xvars, strata ="treatment.swang",
data=matched, test = FALSE) # => Complete the command
# => print the matching table showing the SMD
print(matchedtab1, smd=TRUE)
## Stratified by treatment.swang
## 0 1 SMD
## n 1734 1734
## ARF (mean (SD)) 0.45 (0.50) 0.44 (0.50) 0.034
## CHF (mean (SD)) 0.11 (0.31) 0.10 (0.30) 0.019
## Cirrhosis (mean (SD)) 0.03 (0.17) 0.03 (0.16) 0.010
## Lungcancer (mean (SD)) 0.00 (0.07) 0.00 (0.05) 0.028
## MOSFM (mean (SD)) 0.08 (0.27) 0.08 (0.27) 0.004
## Coma (mean (SD)) 0.04 (0.21) 0.05 (0.21) 0.016
## cardiohx (mean (SD)) 0.21 (0.41) 0.20 (0.40) 0.023
## chfhx (mean (SD)) 0.20 (0.40) 0.20 (0.40) 0.013
## dementhx (mean (SD)) 0.07 (0.26) 0.07 (0.26) 0.009
## psychhx (mean (SD)) 0.05 (0.23) 0.05 (0.22) 0.010
## chrpulhx (mean (SD)) 0.15 (0.36) 0.16 (0.36) 0.014
## renalhx (mean (SD)) 0.05 (0.22) 0.05 (0.22) 0.011
## liverhx (mean (SD)) 0.06 (0.25) 0.06 (0.25) <0.001
## malighx (mean (SD)) 0.23 (0.42) 0.22 (0.41) 0.030
## immunhx (mean (SD)) 0.29 (0.46) 0.28 (0.45) 0.022
## transhx (mean (SD)) 0.12 (0.32) 0.13 (0.34) 0.038
## age (mean (SD)) 61.09 (17.36) 60.85 (15.62) 0.014
## edu (mean (SD)) 11.83 (3.21) 11.77 (3.18) 0.018
## das2d3pc (mean (SD)) 20.70 (5.62) 20.66 (5.04) 0.007
## aps1 (mean (SD)) 57.75 (19.26) 57.38 (19.23) 0.019
## scoma1 (mean (SD)) 18.91 (28.79) 18.62 (27.91) 0.010
## meanbp1 (mean (SD)) 72.63 (34.73) 72.83 (35.40) 0.006
## wblc1 (mean (SD)) 15.77 (12.82) 15.82 (12.52) 0.004
## temp1 (mean (SD)) 37.62 (1.93) 37.63 (1.75) 0.005
## alb1 (mean (SD)) 3.03 (0.71) 3.03 (0.95) 0.002
## hema1 (mean (SD)) 30.65 (8.00) 30.89 (7.51) 0.030
## bili1 (mean (SD)) 2.49 (5.41) 2.49 (4.98) 0.001
## crea1 (mean (SD)) 2.32 (2.34) 2.31 (2.00) 0.003
## pot1 (mean (SD)) 4.06 (1.04) 4.05 (0.99) 0.007
## paco21 (mean (SD)) 37.19 (11.17) 37.40 (11.26) 0.019
## ph1 (mean (SD)) 7.39 (0.11) 7.39 (0.11) 0.007
## wtkilo1 (mean (SD)) 70.11 (26.23) 70.84 (27.25) 0.028
## dnr1 (mean (SD)) 0.09 (0.29) 0.08 (0.28) 0.031
## resp (mean (SD)) 0.32 (0.47) 0.33 (0.47) 0.007
## card (mean (SD)) 0.40 (0.49) 0.40 (0.49) <0.001
## gastr (mean (SD)) 0.18 (0.39) 0.18 (0.38) 0.010
## renal (mean (SD)) 0.06 (0.24) 0.06 (0.24) 0.002
## hema (mean (SD)) 0.06 (0.25) 0.06 (0.24) 0.017
## seps (mean (SD)) 0.21 (0.41) 0.21 (0.41) 0.007
## trauma (mean (SD)) 0.01 (0.10) 0.01 (0.11) 0.011
# => Are there any variables with bad balance now?
smd_values3 <- ExtractSmd(matchedtab1)
smd_values3[smd_values3 > 0.2, ]
## numeric(0)
# There are no variables with bad balance now
print("===================================================")
## [1] "==================================================="
print("===================================================")
## [1] "==================================================="
y_trt.ps <- matched[matched$treatment.swang == 1, 'outcome.died']
# => Put here the vector of outcomes y (from matched$outcome.died) of those in the treatment group
y_con.ps <- matched[matched$treatment.swang == 0, 'outcome.died']
# Put here the vector of outcomes y (from matched$outcome.died) of those in the control group
print("===================================================")
## [1] "==================================================="
Perform a t-test
# Perform paired t-test. Just to get a feeling of the result
diffy.ps <- y_trt.ps - y_con.ps # pairwise difference
t.test(diffy.ps)
##
## One Sample t-test
##
## data: diffy.ps
## t = 3.4305, df = 1733, p-value = 0.0006167
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 0.02370986 0.08701678
## sample estimates:
## mean of x
## 0.05536332
print("===================================================")
## [1] "==================================================="
# => Is the test significant?
# The test is significant, meaning that the null hypothesis can be rejected and the true mean is not equal to 0 ( = there is a difference between the two groups), since the p-value is less than 0.05.
print("===================================================")
## [1] "==================================================="
Use the McNemar test as before. Enter the a, b, c, and d numbers you get from the 2x2 table in the McNemar test
print("===================================================")
## [1] "==================================================="
mccnemar_tab <- table(y_trt.ps,y_con.ps)
mcnemar.test(matrix(c(mccnemar_tab[1,1], mccnemar_tab[1,2], mccnemar_tab[2,1], mccnemar_tab[2,2]),2,2)) # => Complete the command
##
## McNemar's Chi-squared test with continuity correction
##
## data: matrix(c(mccnemar_tab[1, 1], mccnemar_tab[1, 2], mccnemar_tab[2, 1], mccnemar_tab[2, 2]), 2, 2)
## McNemar's chi-squared = 11.453, df = 1, p-value = 0.0007138
# => Is the p-value less than 0.05? What does this result mean?
# Yes it is less than 0.05 indicating there is a difference in proportion of people that died when receiving treatment versus when not receiving treatment. This means that the treatment has an impact on the outcome.
# => Compare the difference in the means before and after matching.
mean(rhc[rhc$treatment.swang == 1, 'outcome.died'])
## [1] 0.6804029
mean(rhc[rhc$treatment.swang == 0, 'outcome.died'])
## [1] 0.6296818
mean(matched[matched$treatment.swang == 1, 'outcome.died'])
## [1] 0.6799308
mean(matched[matched$treatment.swang == 0, 'outcome.died'])
## [1] 0.6245675
# Means are almost the same.
print("===================================================")
## [1] "==================================================="
Use logistic regression
gmodel3 <- glm(outcome.died ~ treatment.swang, family="binomial", data = matched)
print("===================================================")
## [1] "==================================================="
# => What is the odds ratio of treatment?
exp(coef(gmodel3)[2])
## treatment.swang
## 1.276948
# => Now correct also for the least well balanced variable
gmodel3 <- glm(outcome.died ~ treatment.swang + malighx, family="binomial", data = matched)
# => What is the odds ratio of treatment now?
exp(coef(gmodel3)[2])
## treatment.swang
## 1.306322
print("===================================================")
## [1] "==================================================="
Instead of the glm to calculate the propensity score use any machine learning technique you like such as a regression tree (using rpart), or random forests (randomForest) or neural networks (neuralnet) or another formalism in R to create the propensity score model and to predict the probabilities on such a model. Note that you might need to tweak the model parameters in order to improve it. Compare your results to those you obtained by matching (via the Mahalanobis distance and the propensity score with logistic regression). Which approach do you prefer, and why?
#install.packages("randomForest")
library(randomForest)
## randomForest 4.7-1.1
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
## The following object is masked from 'package:dplyr':
##
## combine
rhc.selected$treatment.swang <- factor(rhc.selected$treatment.swang)
formula <- as.formula(paste("treatment.swang ~", paste(xvars, collapse = " + ")))
rf.out <- randomForest(formula = formula, data=rhc.selected)
eps <- rf.out$votes[,2]
rf.out
##
## Call:
## randomForest(formula = formula, data = rhc.selected)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 6
##
## OOB estimate of error rate: 28.63%
## Confusion matrix:
## 0 1 class.error
## 0 3029 522 0.1470008
## 1 1120 1064 0.5128205
par(mfrow=c(2,2))
hist(eps[rhc.selected$treatment.swang == 1], ylim = c(0, 650), col = 'darkblue', main = "Treated")
hist(pscore[rhc.selected$treatment.swang == 1], ylim = c(0, 650), col = 'darkblue', main = "Treated")
hist(eps[rhc.selected$treatment.swang == 0], ylim = c(650, 0), col = 'darkblue', main = "Untreated")
hist(pscore[rhc.selected$treatment.swang == 0], ylim = c(650, 0), col = 'darkblue', main = "Untreated")
rhc.selected$treatment.swang <- as.numeric(rhc.selected$treatment.swang) - 1
epsmatch <- Match(Tr=rhc.selected$treatment.swang, X = logit(eps), M = 1, caliper = 0.2, replace = FALSE)
matched1 <- rhc.selected[unlist(epsmatch[c("index.treated","index.control")]), ]
matchedtab2 <- CreateTableOne(vars=xvars, strata ="treatment.swang",
data=matched1, test = FALSE) # => Complete the command
print(matchedtab2, smd=TRUE)
## Stratified by treatment.swang
## 0 1 SMD
## n 1674 1674
## ARF (mean (SD)) 0.46 (0.50) 0.43 (0.50) 0.054
## CHF (mean (SD)) 0.08 (0.27) 0.10 (0.29) 0.051
## Cirrhosis (mean (SD)) 0.04 (0.19) 0.02 (0.15) 0.085
## Lungcancer (mean (SD)) 0.00 (0.07) 0.00 (0.05) 0.029
## MOSFM (mean (SD)) 0.09 (0.28) 0.08 (0.27) 0.039
## Coma (mean (SD)) 0.05 (0.23) 0.05 (0.21) 0.041
## cardiohx (mean (SD)) 0.19 (0.40) 0.20 (0.40) 0.015
## chfhx (mean (SD)) 0.19 (0.40) 0.19 (0.39) 0.005
## dementhx (mean (SD)) 0.10 (0.30) 0.07 (0.26) 0.103
## psychhx (mean (SD)) 0.07 (0.25) 0.05 (0.21) 0.092
## chrpulhx (mean (SD)) 0.15 (0.36) 0.15 (0.36) 0.003
## renalhx (mean (SD)) 0.06 (0.24) 0.05 (0.22) 0.044
## liverhx (mean (SD)) 0.07 (0.26) 0.06 (0.24) 0.046
## malighx (mean (SD)) 0.26 (0.44) 0.22 (0.41) 0.098
## immunhx (mean (SD)) 0.31 (0.46) 0.29 (0.45) 0.046
## transhx (mean (SD)) 0.11 (0.31) 0.14 (0.34) 0.080
## age (mean (SD)) 61.42 (16.47) 61.03 (16.05) 0.024
## edu (mean (SD)) 11.66 (3.13) 11.80 (3.17) 0.043
## das2d3pc (mean (SD)) 20.49 (5.37) 20.59 (5.10) 0.019
## aps1 (mean (SD)) 58.99 (18.97) 57.62 (19.29) 0.071
## scoma1 (mean (SD)) 20.22 (29.64) 19.06 (28.22) 0.040
## meanbp1 (mean (SD)) 73.12 (35.78) 73.34 (35.98) 0.006
## wblc1 (mean (SD)) 16.28 (13.18) 16.07 (12.65) 0.016
## temp1 (mean (SD)) 37.65 (1.88) 37.62 (1.78) 0.019
## alb1 (mean (SD)) 3.02 (0.71) 3.02 (0.95) 0.005
## hema1 (mean (SD)) 30.55 (8.06) 30.69 (7.53) 0.018
## bili1 (mean (SD)) 2.80 (5.99) 2.49 (5.04) 0.055
## crea1 (mean (SD)) 2.41 (2.21) 2.28 (2.01) 0.063
## pot1 (mean (SD)) 4.14 (1.12) 4.03 (1.00) 0.102
## paco21 (mean (SD)) 37.40 (11.61) 37.29 (11.33) 0.009
## ph1 (mean (SD)) 7.39 (0.11) 7.39 (0.11) 0.006
## wtkilo1 (mean (SD)) 70.15 (25.91) 69.90 (27.40) 0.009
## dnr1 (mean (SD)) 0.11 (0.31) 0.08 (0.27) 0.104
## resp (mean (SD)) 0.36 (0.48) 0.33 (0.47) 0.062
## card (mean (SD)) 0.36 (0.48) 0.39 (0.49) 0.067
## gastr (mean (SD)) 0.18 (0.38) 0.19 (0.40) 0.041
## renal (mean (SD)) 0.07 (0.26) 0.07 (0.25) 0.014
## hema (mean (SD)) 0.08 (0.27) 0.06 (0.23) 0.093
## seps (mean (SD)) 0.20 (0.40) 0.21 (0.41) 0.022
## trauma (mean (SD)) 0.01 (0.09) 0.02 (0.12) 0.066
smd_values4 <- ExtractSmd(matchedtab2)
smd_values4[smd_values4 > 0.2, ]
## numeric(0)
y_trt.eps <- matched1[matched1$treatment.swang == 1, 'outcome.died']
y_con.eps <- matched1[matched1$treatment.swang == 0, 'outcome.died']
# Perform paired t-test. Just to get a feeling of the result
diffy.eps <- y_trt.eps - y_con.eps # pairwise difference
t.test(diffy.eps)
##
## One Sample t-test
##
## data: diffy.eps
## t = 1.1228, df = 1673, p-value = 0.2617
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -0.01338447 0.04922676
## sample estimates:
## mean of x
## 0.01792115
#t-test is not significant, so H0 cannot be rejected and the true mean could still be equal to zero, indicating no difference between the two groups
mccnemar_tab1 <- table(y_trt.eps,y_con.eps)
mcnemar.test(matrix(c(mccnemar_tab1[1,1], mccnemar_tab1[1,2], mccnemar_tab1[2,1], mccnemar_tab1[2,2]),2,2))
##
## McNemar's Chi-squared test with continuity correction
##
## data: matrix(c(mccnemar_tab1[1, 1], mccnemar_tab1[1, 2], mccnemar_tab1[2, 1], mccnemar_tab1[2, 2]), 2, 2)
## McNemar's chi-squared = 1.1779, df = 1, p-value = 0.2778
#not significant/or significant, so H0 cannot be rejected so there could be no difference in proportion of people that died when receiving treatment versus when not receiving treatment. This means that the treatment doesn't necessarily have an impact on the outcome.
# => Compare the difference in the means before and after matching.
mean(rhc[rhc$treatment.swang == 1, 'outcome.died'])
## [1] 0.6804029
mean(rhc[rhc$treatment.swang == 0, 'outcome.died'])
## [1] 0.6296818
mean(matched[matched$treatment.swang == 1, 'outcome.died'])
## [1] 0.6799308
mean(matched[matched$treatment.swang == 0, 'outcome.died'])
## [1] 0.6245675
mean(matched1[matched1$treatment.swang == 1, 'outcome.died'])
## [1] 0.6780167
mean(matched1[matched1$treatment.swang == 0, 'outcome.died'])
## [1] 0.6600956
#There is less difference in the means using random forest to create the propensity score and match on these scores. This makes sense since there is no significant difference in the means.
gmodel4 <- glm(outcome.died ~ treatment.swang, family="binomial", data = matched1)
# => What is the odds ratio of treatment?
exp(coef(gmodel4)[2])
## treatment.swang
## 1.084319
# => Now correct also for the least well balanced variable
gmodel4 <- glm(outcome.died ~ treatment.swang + transhx, family="binomial", data = matched1)
# => What is the odds ratio of treatment now?
exp(coef(gmodel4)[2])
## treatment.swang
## 1.101813
#install.packages("ggplot2")
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following object is masked from 'package:randomForest':
##
## margin
library(tidyr)
##
## Attaching package: 'tidyr'
## The following object is masked _by_ '.GlobalEnv':
##
## table1
dataPlot <- data.frame(covariates = rownames(smd_values3),
Greedy = as.numeric(smd_values2),
"PS with LR" = as.numeric(smd_values3),
"PS with RF" = as.numeric(smd_values4))
dataPlotMelt <- pivot_longer(data = dataPlot, cols = -covariates, names_to = "Method", values_to = "SMD")
varNames <- as.character(dataPlot$covariates)[order(dataPlot$Greedy)]
dataPlotMelt$variable <- factor(dataPlotMelt$covariates,
levels = varNames)
ggplot(data = dataPlotMelt,
mapping = aes(x = covariates, y = SMD, group = Method, color = Method)) +
geom_line() +
geom_point() +
geom_hline(yintercept = 0.2, color = "black", linewidth = 0.1) +
coord_flip() +
theme_bw() +
theme(legend.key = element_blank())
# After matching on the propensity score created by random forest, there are no covariates left with a bad balance. However, there is no significant difference between the means of the two groups. This indicates that the covariates with a bad balance do influence the outcome (confounder variables) when using greedy matching.