In this assignment we will exercise causal inference on the RHC (Right Heart Catheterization) dataset This dataset was used in Connors et al. (1996): The effectiveness of RHC in the initial care of critically ill patients. J American Medical Association 276:889-897. The dataset pertains to day 1 of hospitalization, i.e., the “treatment” variable swang1 is whether or not a patient received a RHC (also called the Swan-Ganz catheter) on the first day in which the patient qualified for a specific research study called SUPPORT.

What should you submit? You need to submit: 1. This Notebook (in .Rmd format) after filling in all your answers inside this notebook. You are required to do an action (write or fill-in code or write documentation) at the places marked by an arrow (=>). This arrow appears within the comments (and it will be preceeded by “#”) or outside the comments in the text (without “#”). 2. The Notebook in html format with all the results.

==> Student names: Florence Cornelissen, Myrte Kuipers

==> Student numbers: 12186317, 11869089

Open part

Understand the datase:

Please Open https://hbiostat.org/data/repo/rhc.html and take a good look at the variables. Make sure you inspect the variables included in the dataset (they appear in a tabel below the text on the website). You can refer to the table whenever needed.

Packages: install packages if needed

#install.packages("tableone")
#install.packages("Matching")
#install.packages("dplyr")
# Recommended: add "dplyr" if you have used it before to make efficient preprocessing.

Load packages

library(tableone) # This library allows us to calculate summary statistics about the variables including the SMD (standardized mean difference).
library(Matching) # This library will do the matching

## Loading required package: MASS

## ## 
## ##  Matching (Version 4.10-8, Build Date: 2022-11-03)
## ##  See http://sekhon.berkeley.edu/matching for additional documentation.
## ##  Please cite software as:
## ##   Jasjeet S. Sekhon. 2011. ``Multivariate and Propensity Score Matching
## ##   Software with Automated Balance Optimization: The Matching package for R.''
## ##   Journal of Statistical Software, 42(7): 1-52. 
## ##

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following object is masked from 'package:MASS':
## 
##     select

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Read database

# Download rhc dataset from CANVAS and read it
rhc <- read.csv("rhc.csv")

Understanding the dataset

Tip: Look at the data, use the “View” command on the dataset “rhc”

print("===================================================")

## [1] "==================================================="

# => how many rows does rhc contain? write a print command
nrow(rhc)

## [1] 5735

# => What are the variable names? write a print command
ls(rhc)

##  [1] "adld3p"   "age"      "alb1"     "amihx"    "aps1"     "bili1"   
##  [7] "ca"       "card"     "cardiohx" "cat1"     "cat2"     "chfhx"   
## [13] "chrpulhx" "crea1"    "das2d3pc" "death"    "dementhx" "dnr1"    
## [19] "dschdte"  "dth30"    "dthdte"   "edu"      "gastr"    "gibledhx"
## [25] "hema"     "hema1"    "hrt1"     "immunhx"  "income"   "liverhx" 
## [31] "lstctdte" "malighx"  "meanbp1"  "meta"     "neuro"    "ninsclas"
## [37] "ortho"    "paco21"   "pafi1"    "ph1"      "pot1"     "psychhx" 
## [43] "ptid"     "race"     "renal"    "renalhx"  "resp"     "resp1"   
## [49] "sadmdte"  "scoma1"   "seps"     "sex"      "sod1"     "surv2md1"
## [55] "swang1"   "t3d30"    "temp1"    "transhx"  "trauma"   "urin1"   
## [61] "wblc1"    "wtkilo1"  "X"

# print("===================================================")

For the assignment the TREATMENT variable is swang1 and the OUTCOME variable is death.

Preprocess

If you look at the values of the dichotomous variables you will see that they are factors (in other words, categorical variables). We would like to work with numeric variables in this exercise so we need to perform some conversions.

Let’s start with the treatement and outcome variables. Create an “outcome.died” and “swang.treatment” variables with the numeric values 0 and 1.

rhc$outcome.died <- as.numeric(rhc$death=='Yes')  # Again you might want, instead, to do all preprocessing with dplyr if you prefer
rhc$treatment.swang <- as.numeric(rhc$swang1=='RHC')

Check

Let’s take a quick look into the dataset using these variables

print("===================================================")

## [1] "==================================================="

# => What is the mean of outcome.died in the whole data set?
mean(rhc$outcome.died)

## [1] 0.6489974

# => How many subjects are treated?
sum(rhc$treatment.swang)

## [1] 2184

# =>  What is the mean mortality (outcome.died) in this treated group?
mean(rhc[rhc$treatment.swang == 1, 'outcome.died'])

## [1] 0.6804029

# => How many subjects are not treated (i.e. in the control group)?
sum(rhc$treatment.swang == 0)

## [1] 3551

# => What is the mean mortality in this control group?
mean(rhc[rhc$treatment.swang == 0, 'outcome.died'])

## [1] 0.6296818

# print("===================================================")

The variable cat1 is a factor (categorical variable) containing the primary disease category for the patient.

print("===================================================")

## [1] "==================================================="

# => How many different categories are in Cat1? Tip: use the levels() function on this variable.
levels(factor(rhc$cat1))

## [1] "ARF"               "CHF"               "Cirrhosis"        
## [4] "Colon Cancer"      "Coma"              "COPD"             
## [7] "Lung Cancer"       "MOSF w/Malignancy" "MOSF w/Sepsis"

length(levels(factor(rhc$cat1)))

## [1] 9

# print("===================================================")

We would like to “translate” Cat1 into a set of variables each with the numeric values of 0/1. For example for the value of “ARF” we will create a variable “ARF” that will be 1 if rhc$cat1==‘ARF’.

rhc$ARF <- as.numeric(rhc$cat1=='ARF')
print("===================================================")

## [1] "==================================================="

# => Do the same (as we did for ARF) also for all other values of cat1 but ignore the category "Missing".

rhc$CHF <- as.numeric(rhc$cat1=='CHF')
rhc$Cirrhosis <- as.numeric(rhc$cat1=='Cirrhosis')
rhc$COPD <- as.numeric(rhc$cat1=='COPD')
rhc$Lungcancer <- as.numeric(rhc$cat1=='Lung Cancer')
rhc$MOSFM <- as.numeric(rhc$cat1=='MOSF w/Malignancy')
rhc$Coloncancer <- as.numeric(rhc$cat1=='Colon Cancer')
rhc$Coma <- as.numeric(rhc$cat1=='Coma')
rhc$MOSFS <- as.numeric(rhc$cat1=='MOSF w/Sepsis')
# print("===================================================")

Create data frame with variables of interest

==> Create a data frame and call it rhc.small with the following variables of interest: All the underlying categories of “cat1” (ARF etc) that you created before, “cardiohx”, “chfhx”, “dementhx”, “psychhx”, “chrpulhx”, “renalhx”, “liverhx”, “gibledhx”, “malighx”, “immunhx”, “transhx”, “amihx”, “age”, “female”, “edu”, “das2d3pc”, “aps1”, “scoma1”, “meanbp1”, wblc1”, “hrt1”, “resp1”, “temp1”, “pafi1”, “alb1”, “hema1”, “bili1”, “crea1”, “sod1”, “pot1”, “paco21”, “ph1”, “wtkilo1”, “dnr1”, “resp”, “card”, “neuro”, “gastr”, “renal”, “meta”, “hema”, “seps”, “trauma”, “ortho”, “adld3p”, “urin1”, “treatment.swang”, “outcome.died”.

rhc.small <- rhc[,c("ARF", "CHF","Cirrhosis","COPD", "Lungcancer", "MOSFM", "Coloncancer", "Coma", "MOSFS", "cardiohx", "chfhx", "dementhx", "psychhx", "chrpulhx", "renalhx", "liverhx", "gibledhx", "malighx", "immunhx", "transhx", "amihx", "age", "edu", "das2d3pc", "aps1", "scoma1", "meanbp1", "wblc1", "hrt1", "resp1", "temp1", "pafi1", "alb1", "hema1", "bili1", "crea1", "sod1", "pot1", "paco21", "ph1", "wtkilo1", "dnr1", "resp", "card", "neuro", "gastr", "renal", "meta", "hema", "seps", "trauma", "ortho", "adld3p", "urin1", "treatment.swang", "outcome.died")]
#=> Create this data frame

Make sure you take a look at the definition of variables (on the URL given above) with unclear names to you. This data frame includes now all candidate variables from which we will select later for matching. However, before we jump into variable selection, let us continue data type conversion.

Check classes of these variables by using the str command

str(rhc.small)

## 'data.frame':    5735 obs. of  56 variables:
##  $ ARF            : num  0 0 0 1 0 0 0 1 0 1 ...
##  $ CHF            : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Cirrhosis      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COPD           : num  1 0 0 0 0 1 0 0 0 0 ...
##  $ Lungcancer     : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ MOSFM          : num  0 0 1 0 0 0 1 0 1 0 ...
##  $ Coloncancer    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Coma           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ MOSFS          : num  0 1 0 0 1 0 0 0 0 0 ...
##  $ cardiohx       : int  0 1 0 0 0 0 0 0 0 0 ...
##  $ chfhx          : int  0 1 0 0 0 1 0 0 0 0 ...
##  $ dementhx       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ psychhx        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ chrpulhx       : int  1 0 0 0 0 1 0 0 0 0 ...
##  $ renalhx        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ liverhx        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ gibledhx       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ malighx        : int  1 0 1 0 0 0 1 0 0 1 ...
##  $ immunhx        : int  0 1 1 1 0 0 0 0 0 0 ...
##  $ transhx        : int  0 1 0 0 0 0 0 1 0 0 ...
##  $ amihx          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ age            : num  70.3 78.2 46.1 75.3 67.9 ...
##  $ edu            : num  12 12 14.07 9 9.95 ...
##  $ das2d3pc       : num  23.5 14.8 18.1 22.9 21.1 ...
##  $ aps1           : int  46 50 82 48 72 38 29 25 47 48 ...
##  $ scoma1         : int  0 0 0 0 41 0 26 100 0 0 ...
##  $ meanbp1        : num  41 63 57 55 65 115 67 128 53 73 ...
##  $ wblc1          : num  22.1 28.9 0.05 23.3 29.7 ...
##  $ hrt1           : int  124 137 130 58 125 134 135 102 118 141 ...
##  $ resp1          : num  10 38 40 26 27 36 10 34 30 40 ...
##  $ temp1          : num  38.7 38.9 36.4 35.8 34.8 ...
##  $ pafi1          : num  68 218 276 157 478 ...
##  $ alb1           : num  3.5 2.6 3.5 3.5 3.5 ...
##  $ hema1          : num  58 32.5 21.1 26.3 24 ...
##  $ bili1          : num  1.01 0.7 1.01 0.4 1.01 ...
##  $ crea1          : num  1.2 0.6 2.6 1.7 3.6 ...
##  $ sod1           : int  145 137 146 117 126 138 136 136 136 146 ...
##  $ pot1           : num  4 3.3 2.9 5.8 5.8 ...
##  $ paco21         : num  40 34 16 30 17 68 45 26 40 30 ...
##  $ ph1            : num  7.36 7.33 7.36 7.46 7.23 ...
##  $ wtkilo1        : num  64.7 45.7 0 54.6 78.4 ...
##  $ dnr1           : chr  "No" "No" "No" "No" ...
##  $ resp           : chr  "Yes" "No" "No" "Yes" ...
##  $ card           : chr  "Yes" "No" "Yes" "No" ...
##  $ neuro          : chr  "No" "No" "No" "No" ...
##  $ gastr          : chr  "No" "No" "No" "No" ...
##  $ renal          : chr  "No" "No" "No" "No" ...
##  $ meta           : chr  "No" "No" "No" "No" ...
##  $ hema           : chr  "No" "No" "No" "No" ...
##  $ seps           : chr  "No" "Yes" "No" "No" ...
##  $ trauma         : chr  "No" "No" "No" "No" ...
##  $ ortho          : chr  "No" "No" "No" "No" ...
##  $ adld3p         : int  0 NA NA NA NA 0 NA NA NA NA ...
##  $ urin1          : num  NA 1437 599 NA 64 ...
##  $ treatment.swang: num  0 1 1 0 1 0 0 0 0 1 ...
##  $ outcome.died   : num  0 1 0 1 1 0 0 1 0 0 ...

If you have “labelled” variables like labelled int and labelled num it means that the datset was imported from SPSS. In such a case, simply change such variables into numeric variables. If you do not see such variables then skip this step.

rhc.small$age <- as.numeric(rhc.small$age) # only if in your version of the data "age" was a "labelled" variable
print("===================================================")

## [1] "==================================================="

# => Change all labelled variables that include numbers (integer or numeric) into numeric.
integer_cols <- ls(select_if(rhc.small, is.integer))
for (col in integer_cols) {
  rhc.small[col] <- as.numeric(unlist(rhc.small[col]))
}
# print("===================================================")

For each categorical variable with 2 values (such as sex) change it into a numeric variable by selecting one of the values (such as “female”) to be 1 and the other zero.

rhc.small$female <- as.numeric(rhc$sex=='Female')
print("===================================================")

## [1] "==================================================="

# => Change any factor with two values into a 0/1 variable by selecting a value (such as "female" above) to be 1 and the other value will be zero.
character_cols <- ls(select_if(rhc.small, is.character))
for (col in character_cols) {
  rhc.small[col] <- as.numeric(rhc.small[col] == "Yes")
}
# print("===================================================")

Before we start the analysis we need to be aware of the following issue: If you encounter a variable with missing values then you might run into a problem with the Match() function later on. For this exercise you can follow this advice:

If the proportion of missing is above 15% for a variable then just exclude the whole variable from the data frame and hence from the analysis, it is likely that the information it had still resides in the combination of the many other variables.
If you have less than 15% missing variables try to impute the values of the variable. In the extreme case you could apply multiple imputations and repeat the analysis on each imputation set and then pool the results. However for this exercise you can follow a simple strategy such as imputing the value with mean/median/mode. This would be a limitation but imputing missing values is not the focus of our exercise on causal inference.

=> Inspect the proportion of missing values for each variable and follow the advice above.

print("===================================================")

## [1] "==================================================="

prop_missing = colMeans(is.na(rhc.small))*100
#There are only two columns with more than 0% missing values. Both have a proportion missing values of more than 15%, so no imputation of missing values is needed and those columns are excluded from the dataset. 
rhc.small <- select_if(rhc.small, prop_missing < 15)
print("===================================================")

## [1] "==================================================="

Now we are ready to start the analysis.

Selecting variables to control for

According to the disjunctive cause criterion, we want to find variables (to control for) that are predictive of treatment and/or the outcome. In this assignment we will use the criterion to select all (pre-treatment) variables that are predictive of the outcome. This will include true confounders (that also affect the treatment, and risk factors that only affect the outcome). All our variables in rhc.small, except the outcome of course, are pre-treatment variables (none is measured AFTER the intervention) so they are all good candidates for the selection.

The truth is that the best way to select these variables is to use clinical knowledge. However, in this assignment we will do this in a data-driven and practical way. Specifically, we will operationalize “predictive of the outcome” as the following criterion: a variable is predictive of the outcome if its univariate association with the outcome is significant at the 0.1 level. Because our outcome is binary we can translate this criterion to: For every candidate variable in rhc.small fit a logistic regression model to predict outcome (outcome.died) using only the candidate variable as a predictor and retain the variable if the p-value of its association with the outcome is ≤ 0.1, and otherwise discard the variable. Note: Do not include the treatment in the selection.

print("===================================================")

## [1] "==================================================="

# => Fit logistic regression models for each variable separately with the outcome and retain those with an association having a p-value of ≤ 0.1. Tip: you may want to automate the whole process!

vars <- colnames(rhc.small[,!names(rhc.small) %in% c("treatment.swang", "outcome.died")])

xvars = list()

for (var in vars) {
  mod <- as.formula(sprintf("outcome.died ~ %s", var))
  fit <- glm(formula = mod, data = rhc.small, family = binomial)
  p_value <- summary(fit)$coefficients[,4][2]
  if (p_value <= 0.1){
    xvars[[var]] <- fit
  }
}

# => Put the names of the selected variables (without the treatment and outcome variables) in the variable called xvars by writing xvars <- c("ARF", ...) or do it automatically!
# => Print xvar
xvars <- names(xvars)
print(xvars)

##  [1] "ARF"        "CHF"        "Cirrhosis"  "Lungcancer" "MOSFM"     
##  [6] "Coma"       "cardiohx"   "chfhx"      "dementhx"   "psychhx"   
## [11] "chrpulhx"   "renalhx"    "liverhx"    "malighx"    "immunhx"   
## [16] "transhx"    "age"        "edu"        "das2d3pc"   "aps1"      
## [21] "scoma1"     "meanbp1"    "wblc1"      "temp1"      "alb1"      
## [26] "hema1"      "bili1"      "crea1"      "pot1"       "paco21"    
## [31] "ph1"        "wtkilo1"    "dnr1"       "resp"       "card"      
## [36] "gastr"      "renal"      "hema"       "seps"       "trauma"

# => Put these selected variables in a data.frame called rhc.selected that also includes the treatment and outcome variables to this data frame.
rhc.selected <- rhc.small[, c("treatment.swang", "outcome.died", xvars)]

# print("===================================================")

Inspect balance before matching

look at “table 1”

print("===================================================")

## [1] "==================================================="

#=> look at "table 1", complete the command by filling in the "..."
table1<- CreateTableOne(vars=xvars,strata="treatment.swang", data=rhc.selected, test=FALSE)
## include standardized mean difference (SMD)
print(table1, smd=TRUE)

##                         Stratified by treatment.swang
##                          0             1             SMD   
##   n                       3551          2184               
##   ARF (mean (SD))         0.45 (0.50)   0.42 (0.49)   0.059
##   CHF (mean (SD))         0.07 (0.25)   0.10 (0.29)   0.095
##   Cirrhosis (mean (SD))   0.05 (0.22)   0.02 (0.15)   0.145
##   Lungcancer (mean (SD))  0.01 (0.10)   0.00 (0.05)   0.095
##   MOSFM (mean (SD))       0.07 (0.25)   0.07 (0.26)   0.018
##   Coma (mean (SD))        0.10 (0.29)   0.04 (0.20)   0.207
##   cardiohx (mean (SD))    0.16 (0.37)   0.20 (0.40)   0.116
##   chfhx (mean (SD))       0.17 (0.37)   0.19 (0.40)   0.069
##   dementhx (mean (SD))    0.12 (0.32)   0.07 (0.25)   0.163
##   psychhx (mean (SD))     0.08 (0.27)   0.05 (0.21)   0.143
##   chrpulhx (mean (SD))    0.22 (0.41)   0.14 (0.35)   0.192
##   renalhx (mean (SD))     0.04 (0.20)   0.05 (0.21)   0.032
##   liverhx (mean (SD))     0.07 (0.26)   0.06 (0.24)   0.049
##   malighx (mean (SD))     0.25 (0.43)   0.20 (0.40)   0.101
##   immunhx (mean (SD))     0.26 (0.44)   0.29 (0.45)   0.080
##   transhx (mean (SD))     0.09 (0.29)   0.15 (0.36)   0.170
##   age (mean (SD))        61.76 (17.29) 60.75 (15.63)  0.061
##   edu (mean (SD))        11.57 (3.13)  11.86 (3.16)   0.091
##   das2d3pc (mean (SD))   20.37 (5.48)  20.70 (5.03)   0.063
##   aps1 (mean (SD))       50.93 (18.81) 60.74 (20.27)  0.501
##   scoma1 (mean (SD))     22.25 (31.37) 18.97 (28.26)  0.110
##   meanbp1 (mean (SD))    84.87 (38.87) 68.20 (34.24)  0.455
##   wblc1 (mean (SD))      15.26 (11.41) 16.27 (12.55)  0.084
##   temp1 (mean (SD))      37.63 (1.74)  37.59 (1.83)   0.021
##   alb1 (mean (SD))        3.16 (0.67)   2.98 (0.93)   0.230
##   hema1 (mean (SD))      32.70 (8.79)  30.51 (7.42)   0.269
##   bili1 (mean (SD))       2.00 (4.43)   2.71 (5.33)   0.145
##   crea1 (mean (SD))       1.92 (2.03)   2.47 (2.05)   0.270
##   pot1 (mean (SD))        4.08 (1.04)   4.05 (1.01)   0.027
##   paco21 (mean (SD))     39.95 (14.24) 36.79 (10.97)  0.249
##   ph1 (mean (SD))         7.39 (0.11)   7.38 (0.11)   0.120
##   wtkilo1 (mean (SD))    65.04 (29.50) 72.36 (27.73)  0.256
##   dnr1 (mean (SD))        0.14 (0.35)   0.07 (0.26)   0.228
##   resp (mean (SD))        0.42 (0.49)   0.29 (0.45)   0.270
##   card (mean (SD))        0.28 (0.45)   0.42 (0.49)   0.295
##   gastr (mean (SD))       0.15 (0.35)   0.19 (0.39)   0.121
##   renal (mean (SD))       0.04 (0.20)   0.07 (0.25)   0.116
##   hema (mean (SD))        0.07 (0.25)   0.05 (0.22)   0.062
##   seps (mean (SD))        0.15 (0.35)   0.24 (0.42)   0.234
##   trauma (mean (SD))      0.01 (0.07)   0.02 (0.12)   0.104

## For which variables we do not have a good balance? Look at the slides for a guideline. 
smd_values1 <- ExtractSmd(table1)
# The variables printed to not have a good balance, since it is above 0.2.
smd_values1[smd_values1 > 0.2, ]

##      Coma      aps1   meanbp1      alb1     hema1     crea1    paco21   wtkilo1 
## 0.2072776 0.5014019 0.4550959 0.2299199 0.2693013 0.2696236 0.2485938 0.2556794 
##      dnr1      resp      card      seps 
## 0.2276000 0.2695134 0.2949292 0.2337967

print("===================================================")

## [1] "==================================================="

Greedy matching on Mahalanobis distance

Let us now do matching in the hope that the balance will get better so we can have a more valid inference.

print("===================================================")

## [1] "==================================================="

# We want to match the treated with the untreated subjects. Find 1 on 1 match.
# # => Complete the command
greedymatch <- Match(Tr=rhc.selected$treatment.swang,M=1, X=rhc.selected[xvars], replace=FALSE) 
# read the documentation for Match() by typing ?Match.
?Match
# => What does replace=FALSE mean?
# That matching should not be done with replacement and therewith the order of matches generally matters. The matching for the first observation will be fund first, for the second observation second, etc. This is greedy matching.

# => What is inside X in the command above?
# A matrix containing the variables we wish to match on. In this case all the candidate variables. 

# print("===================================================")

print("===================================================")

## [1] "==================================================="

greedymatch$index.treated # => What do you think this is?

##    [1]    2    3    5   10   12   13   17   18   21   22   23   26   33   36
##   [15]   38   42   51   55   56   57   59   62   70   71   73   80   83   86
##   [29]   89   98  100  101  102  107  109  114  115  118  120  124  134  136
##   [43]  137  138  142  143  145  146  150  152  153  155  164  167  168  169
##   [57]  171  172  173  175  176  178  182  183  185  187  189  192  193  195
##   [71]  201  205  206  208  211  216  219  220  224  225  226  227  228  231
##   [85]  233  236  238  240  244  245  247  248  260  265  266  267  274  275
##   [99]  277  281  282  285  288  290  291  294  295  296  298  303  304  306
##  [113]  309  312  315  317  321  323  324  325  327  330  333  334  337  339
##  [127]  341  342  343  346  347  348  350  351  356  358  366  368  372  374
##  [141]  375  376  378  381  385  389  391  393  396  398  402  403  404  406
##  [155]  413  416  418  421  422  423  425  426  432  433  435  436  441  444
##  [169]  445  448  460  463  468  470  471  472  474  480  481  482  484  485
##  [183]  487  489  493  494  495  499  503  504  508  509  510  511  512  513
##  [197]  515  526  527  528  532  533  537  540  543  550  552  554  556  559
##  [211]  560  564  566  567  568  569  571  572  574  577  578  579  583  584
##  [225]  585  589  592  594  595  598  599  600  601  603  608  610  617  618
##  [239]  625  627  630  631  634  639  640  641  642  650  651  654  655  658
##  [253]  659  664  665  667  672  675  678  682  689  690  691  694  695  698
##  [267]  700  701  703  705  706  707  712  713  718  719  721  724  727  731
##  [281]  737  738  739  742  743  746  747  748  752  754  756  757  758  759
##  [295]  762  769  771  772  773  777  778  780  782  784  787  788  794  795
##  [309]  796  800  805  811  814  820  822  824  828  829  830  834  837  840
##  [323]  841  842  844  848  850  851  852  854  862  863  866  867  871  877
##  [337]  881  882  885  886  891  893  894  895  896  897  899  900  902  908
##  [351]  920  921  922  924  925  930  932  933  934  944  945  947  948  951
##  [365]  952  959  960  961  962  963  964  969  971  972  973  974  982  985
##  [379]  986  991  992  993 1001 1002 1004 1005 1007 1008 1012 1014 1016 1017
##  [393] 1020 1021 1022 1023 1024 1027 1031 1035 1036 1038 1039 1043 1053 1054
##  [407] 1055 1065 1071 1072 1074 1075 1076 1079 1088 1089 1095 1100 1108 1112
##  [421] 1113 1117 1120 1127 1129 1131 1134 1135 1136 1137 1139 1144 1146 1148
##  [435] 1150 1153 1157 1159 1163 1175 1176 1179 1180 1189 1193 1194 1195 1196
##  [449] 1197 1203 1208 1211 1212 1214 1215 1217 1218 1219 1220 1228 1230 1232
##  [463] 1233 1237 1241 1247 1250 1251 1254 1255 1260 1265 1266 1267 1269 1270
##  [477] 1271 1273 1274 1276 1277 1278 1279 1280 1284 1292 1293 1297 1298 1301
##  [491] 1302 1305 1307 1308 1310 1311 1312 1314 1323 1324 1326 1329 1331 1333
##  [505] 1339 1343 1347 1348 1350 1351 1352 1355 1358 1359 1360 1362 1363 1364
##  [519] 1365 1367 1369 1370 1371 1381 1384 1387 1391 1395 1396 1397 1399 1402
##  [533] 1406 1409 1411 1413 1414 1415 1419 1421 1422 1423 1425 1426 1429 1433
##  [547] 1434 1435 1440 1444 1448 1449 1454 1456 1458 1459 1463 1466 1468 1472
##  [561] 1475 1480 1484 1485 1490 1492 1494 1499 1500 1503 1504 1505 1507 1508
##  [575] 1511 1513 1514 1516 1517 1518 1522 1526 1527 1528 1530 1531 1533 1535
##  [589] 1536 1537 1541 1542 1544 1548 1551 1552 1553 1554 1556 1557 1559 1560
##  [603] 1563 1567 1568 1569 1570 1572 1576 1583 1585 1591 1593 1594 1597 1600
##  [617] 1603 1606 1609 1611 1615 1616 1618 1619 1622 1623 1624 1629 1630 1634
##  [631] 1636 1637 1642 1643 1644 1645 1646 1650 1651 1653 1663 1669 1674 1675
##  [645] 1677 1678 1680 1682 1686 1688 1699 1701 1702 1703 1704 1705 1707 1709
##  [659] 1712 1717 1718 1721 1725 1729 1739 1740 1744 1747 1756 1760 1762 1766
##  [673] 1770 1771 1772 1773 1775 1782 1792 1794 1795 1796 1802 1807 1809 1811
##  [687] 1812 1813 1818 1820 1822 1825 1826 1827 1828 1835 1836 1838 1843 1850
##  [701] 1857 1860 1862 1863 1864 1866 1867 1870 1876 1877 1878 1888 1889 1890
##  [715] 1895 1899 1905 1906 1907 1916 1917 1923 1924 1930 1932 1933 1942 1945
##  [729] 1946 1955 1956 1957 1959 1963 1966 1967 1969 1970 1972 1973 1976 1979
##  [743] 1981 1984 1985 1989 1990 1996 1997 1999 2002 2004 2007 2010 2016 2017
##  [757] 2018 2022 2028 2030 2032 2035 2036 2039 2041 2049 2051 2053 2054 2056
##  [771] 2057 2058 2060 2061 2063 2064 2065 2068 2071 2072 2075 2078 2079 2085
##  [785] 2090 2095 2096 2097 2101 2102 2105 2109 2110 2113 2121 2122 2123 2127
##  [799] 2129 2130 2135 2138 2139 2140 2146 2147 2154 2155 2159 2162 2165 2168
##  [813] 2169 2171 2173 2176 2177 2178 2179 2181 2183 2185 2186 2192 2193 2194
##  [827] 2195 2196 2202 2203 2208 2210 2216 2217 2221 2225 2227 2228 2231 2234
##  [841] 2235 2237 2238 2239 2240 2241 2243 2246 2255 2258 2261 2262 2267 2271
##  [855] 2273 2275 2276 2281 2282 2290 2291 2292 2293 2296 2298 2300 2302 2303
##  [869] 2307 2308 2312 2314 2319 2322 2324 2330 2338 2339 2341 2343 2348 2349
##  [883] 2352 2359 2361 2365 2367 2368 2369 2371 2373 2375 2377 2386 2387 2388
##  [897] 2395 2396 2400 2401 2402 2407 2409 2412 2413 2414 2419 2420 2426 2427
##  [911] 2435 2436 2439 2442 2443 2446 2448 2450 2451 2453 2454 2459 2462 2466
##  [925] 2468 2476 2479 2500 2513 2515 2521 2523 2528 2530 2531 2535 2540 2544
##  [939] 2546 2549 2550 2552 2555 2560 2561 2569 2571 2572 2573 2578 2582 2584
##  [953] 2585 2587 2588 2592 2596 2600 2601 2605 2606 2607 2611 2612 2614 2615
##  [967] 2616 2620 2622 2627 2628 2630 2633 2637 2644 2648 2650 2651 2653 2654
##  [981] 2661 2670 2671 2672 2673 2674 2676 2677 2678 2682 2685 2686 2689 2691
##  [995] 2694 2696 2697 2698 2700 2704 2707 2710 2712 2713 2714 2716 2717 2721
## [1009] 2722 2724 2727 2728 2729 2732 2735 2740 2742 2746 2749 2751 2755 2758
## [1023] 2759 2760 2761 2763 2766 2767 2772 2773 2780 2781 2782 2785 2786 2787
## [1037] 2792 2796 2798 2804 2805 2806 2815 2817 2819 2820 2821 2822 2824 2825
## [1051] 2827 2828 2831 2835 2839 2842 2845 2846 2847 2848 2851 2853 2854 2857
## [1065] 2860 2861 2862 2864 2865 2871 2872 2875 2876 2878 2880 2881 2887 2888
## [1079] 2889 2890 2892 2895 2896 2899 2900 2903 2908 2909 2911 2913 2914 2929
## [1093] 2930 2932 2933 2935 2937 2942 2952 2954 2956 2957 2959 2968 2969 2972
## [1107] 2973 2977 2979 2981 2983 2988 2992 2994 2995 3001 3002 3005 3009 3011
## [1121] 3014 3019 3020 3023 3025 3028 3029 3030 3036 3037 3038 3040 3041 3045
## [1135] 3048 3049 3050 3051 3055 3059 3062 3064 3065 3066 3067 3068 3069 3081
## [1149] 3082 3083 3085 3087 3089 3092 3093 3095 3101 3103 3104 3105 3107 3108
## [1163] 3113 3115 3119 3120 3122 3123 3125 3127 3133 3135 3136 3138 3141 3142
## [1177] 3144 3146 3148 3149 3153 3154 3155 3157 3160 3161 3168 3169 3181 3188
## [1191] 3193 3194 3199 3201 3203 3205 3206 3209 3210 3212 3213 3217 3219 3220
## [1205] 3222 3225 3227 3231 3234 3235 3240 3242 3244 3246 3253 3258 3260 3261
## [1219] 3263 3266 3271 3273 3274 3275 3276 3278 3280 3287 3290 3296 3298 3301
## [1233] 3303 3307 3309 3313 3316 3320 3321 3323 3324 3327 3332 3333 3334 3335
## [1247] 3336 3337 3341 3343 3344 3353 3356 3359 3360 3363 3364 3370 3374 3376
## [1261] 3379 3381 3384 3388 3390 3397 3398 3400 3401 3402 3403 3406 3409 3410
## [1275] 3411 3413 3414 3416 3417 3431 3432 3433 3437 3438 3439 3443 3444 3446
## [1289] 3447 3449 3453 3455 3458 3459 3460 3466 3468 3473 3474 3476 3478 3480
## [1303] 3484 3487 3489 3492 3494 3497 3498 3501 3504 3508 3513 3515 3519 3523
## [1317] 3526 3529 3534 3536 3537 3538 3540 3541 3542 3543 3546 3547 3549 3550
## [1331] 3551 3552 3557 3558 3559 3560 3561 3566 3568 3569 3576 3581 3582 3588
## [1345] 3590 3591 3592 3596 3597 3600 3602 3605 3606 3609 3612 3615 3616 3619
## [1359] 3620 3623 3625 3626 3629 3634 3646 3647 3655 3658 3661 3663 3665 3677
## [1373] 3680 3682 3686 3688 3693 3695 3697 3700 3701 3703 3705 3710 3712 3713
## [1387] 3716 3721 3723 3724 3725 3726 3728 3731 3734 3737 3739 3743 3744 3745
## [1401] 3746 3748 3749 3751 3753 3754 3756 3759 3764 3766 3767 3768 3770 3771
## [1415] 3774 3775 3777 3778 3781 3783 3791 3792 3795 3797 3798 3800 3805 3807
## [1429] 3809 3813 3815 3818 3819 3827 3828 3832 3834 3836 3838 3841 3846 3848
## [1443] 3851 3852 3854 3857 3860 3861 3863 3864 3865 3866 3870 3872 3877 3879
## [1457] 3880 3881 3882 3891 3892 3901 3902 3916 3919 3923 3924 3925 3926 3929
## [1471] 3932 3933 3939 3940 3946 3947 3951 3952 3954 3956 3957 3960 3962 3964
## [1485] 3967 3972 3975 3976 3978 3982 3983 3985 3991 3994 3995 3996 3999 4001
## [1499] 4005 4011 4014 4017 4019 4020 4024 4026 4027 4030 4032 4035 4036 4037
## [1513] 4038 4040 4043 4046 4049 4052 4058 4060 4071 4074 4081 4083 4087 4088
## [1527] 4090 4095 4098 4099 4102 4104 4105 4109 4112 4113 4114 4115 4117 4119
## [1541] 4121 4122 4126 4127 4129 4131 4135 4146 4147 4148 4150 4151 4152 4153
## [1555] 4154 4155 4157 4159 4164 4165 4166 4168 4172 4174 4176 4178 4181 4185
## [1569] 4186 4188 4190 4192 4198 4200 4203 4204 4208 4211 4212 4215 4218 4221
## [1583] 4222 4227 4229 4230 4231 4233 4236 4237 4238 4242 4250 4251 4252 4253
## [1597] 4254 4256 4258 4264 4265 4267 4274 4275 4277 4278 4279 4280 4281 4286
## [1611] 4288 4289 4292 4293 4294 4295 4296 4297 4299 4308 4309 4312 4320 4329
## [1625] 4330 4331 4332 4337 4343 4344 4345 4349 4351 4352 4354 4358 4360 4362
## [1639] 4363 4365 4366 4368 4372 4378 4381 4384 4386 4392 4395 4396 4398 4399
## [1653] 4408 4409 4412 4414 4415 4416 4417 4419 4423 4424 4425 4426 4427 4435
## [1667] 4437 4440 4442 4443 4444 4447 4449 4453 4454 4455 4456 4460 4465 4467
## [1681] 4472 4473 4474 4476 4477 4478 4480 4482 4486 4487 4491 4494 4497 4502
## [1695] 4506 4508 4510 4511 4517 4521 4522 4524 4526 4527 4528 4530 4531 4533
## [1709] 4537 4538 4540 4541 4542 4543 4552 4554 4556 4557 4565 4567 4571 4573
## [1723] 4574 4578 4582 4588 4590 4592 4595 4598 4602 4603 4607 4609 4610 4613
## [1737] 4615 4619 4621 4625 4634 4638 4640 4648 4649 4650 4651 4654 4656 4658
## [1751] 4659 4660 4662 4663 4664 4667 4672 4674 4677 4681 4683 4688 4693 4694
## [1765] 4697 4702 4703 4706 4709 4711 4713 4714 4716 4719 4722 4723 4724 4727
## [1779] 4728 4732 4733 4736 4739 4741 4748 4751 4752 4754 4755 4763 4766 4767
## [1793] 4768 4773 4778 4782 4783 4784 4789 4790 4793 4794 4795 4797 4798 4805
## [1807] 4807 4808 4809 4812 4813 4819 4822 4826 4827 4831 4834 4835 4836 4844
## [1821] 4848 4851 4853 4857 4859 4861 4864 4870 4871 4872 4874 4875 4876 4878
## [1835] 4879 4881 4884 4885 4888 4890 4891 4893 4894 4895 4896 4902 4907 4908
## [1849] 4910 4915 4916 4918 4920 4921 4922 4928 4929 4931 4935 4936 4941 4943
## [1863] 4946 4948 4949 4954 4956 4959 4961 4970 4976 4977 4978 4980 4982 4984
## [1877] 4985 4986 4987 4988 4989 4991 4992 4993 4994 4997 4998 4999 5000 5001
## [1891] 5003 5006 5008 5012 5013 5015 5016 5018 5020 5021 5022 5023 5025 5026
## [1905] 5029 5033 5034 5038 5044 5046 5048 5049 5055 5063 5069 5070 5075 5077
## [1919] 5078 5079 5081 5082 5083 5085 5087 5088 5093 5095 5102 5105 5106 5107
## [1933] 5109 5112 5113 5114 5115 5118 5123 5128 5131 5132 5138 5143 5145 5150
## [1947] 5152 5154 5156 5160 5162 5163 5164 5165 5167 5170 5173 5174 5176 5179
## [1961] 5183 5184 5185 5186 5192 5193 5194 5195 5196 5199 5202 5203 5205 5206
## [1975] 5209 5210 5218 5219 5220 5230 5233 5234 5235 5236 5237 5240 5242 5247
## [1989] 5248 5251 5254 5255 5256 5264 5266 5267 5268 5270 5274 5275 5276 5277
## [2003] 5285 5286 5293 5297 5300 5307 5309 5312 5313 5318 5319 5321 5322 5327
## [2017] 5333 5336 5338 5345 5346 5347 5349 5354 5356 5358 5360 5363 5365 5367
## [2031] 5369 5372 5381 5383 5388 5389 5390 5392 5394 5395 5396 5397 5398 5402
## [2045] 5406 5407 5408 5419 5421 5424 5430 5434 5435 5436 5440 5442 5445 5446
## [2059] 5448 5450 5454 5456 5457 5461 5464 5465 5466 5468 5470 5471 5473 5474
## [2073] 5476 5477 5479 5480 5482 5488 5489 5491 5493 5495 5497 5498 5499 5500
## [2087] 5502 5503 5504 5506 5509 5511 5512 5513 5514 5519 5521 5523 5525 5527
## [2101] 5530 5533 5536 5538 5539 5541 5545 5547 5549 5555 5558 5560 5564 5568
## [2115] 5569 5573 5574 5575 5576 5577 5580 5582 5583 5585 5589 5590 5592 5594
## [2129] 5595 5596 5597 5598 5600 5602 5604 5605 5607 5608 5609 5610 5612 5613
## [2143] 5614 5617 5620 5627 5628 5630 5631 5633 5635 5639 5643 5651 5652 5653
## [2157] 5658 5662 5663 5666 5667 5668 5672 5673 5674 5675 5678 5681 5684 5685
## [2171] 5695 5699 5704 5705 5710 5711 5712 5714 5715 5720 5722 5723 5726 5729

# The indices of all the treated patients in the original dataset that were matched to an untreated patient, in this specific order.

greedymatch$index.control # => What do you think this is?

##    [1] 4620 2486 3247  392 3787  984  582 4339  905 4500  200 1751  250 4726
##   [15] 4130 3249 5447 5518 2921  868 3694 4548 1066 4216 4639 3391 1126 2263
##   [29] 2891 5089 2783 1059 3156 5153 5364 3736 2938  889 3128 3603 1639  909
##   [43] 2692  927 2998 2066 5212  264 3907  232 4529 1887 2247  269 5226 2093
##   [57]   32 3942 4839 2299 2916 1502 3256 2665 3632 1798  133  235 5216 5492
##   [71]  180 1300  370 1322  887 1092  531  755 2800 4015 4232 1658 4811 2161
##   [85] 5490 4652 4632 3053 4002  518  876 2945 4932 3944 2478 2491   31 1151
##   [99] 2966 3129 4781 4906 1224 1210  744 1026  612 5694 1952 3304 4960 3112
##  [113] 1118 3448 3351 3354 2641  500 2376 2389 3998  692 2198 1664 5646 4846
##  [127] 4704 5244 1394  449 1389 1257 4802 2084 1902 3628 3134  384 2494 3202
##  [141] 5191 5043 3054 2778 2424 1791 1839 2974 3117 4572 2986 1670  818 3021
##  [155] 5683 5377  761 2206 5351  939 4777 4461 5517   64 4223 3137 2723  179
##  [169] 2621 1995 4202 2383  646 1335 1056 3027 5002 1745 5391  300 3488 5625
##  [183] 1229 1620 3894 2280 2918 4744 2547 3159 3762  979 3845  128 5654 1142
##  [197] 4772 1681 4718 5664 3527 1202 4873 1460 1814 2222 3823 5587 5522 3091
##  [211]  711 1515  644 2408 4975 4317 2311 4418 5314 3920 4599 2756 1587 2144
##  [225]  911 5444 1082  817 4106 2619 4175 5619 5133 1041 4644 3255  497  879
##  [239] 4852 1201 2055 2327 4214 4953  875 1832 5259 3362 1881 2939 4911 3173
##  [253] 2976  373 2539  686 4197 2506  995 4604 3524 1859  359 5642 4631  308
##  [267] 5579 3339 5227 3057 2668  365 2559 4357  977 1845 3254 2536 1915 2693
##  [281] 5528  870 3395 1171 3997 4136 2752  699  452 4957 2687 4125 2213 2429
##  [295] 2156 5056 2583 2503 4323 3826 3126 3462 3198 4955 1190  141 4310 5328
##  [309]  785 5469 3124 4535  213 2044 1935 3893 4923  789 2545 1246 5252 2776
##  [323]  815  546 2009 1417 3293 2356 2157 4867  307 5559 1357 2287 4189 5280
##  [337] 1631 4382 1571 5011 2119 3965 4244 4391 3532 2703 1077   44 5405 2610
##  [351]   52 1121 4373 5263 4525 4328 3839 4889  892  534 3684 4355 3472 4182
##  [365] 5039 3530 1403 1049 5071 5137   24 4369 1061  123 1829 5180 1793 4108
##  [379] 3121 3720  753 5526 1764 3300 2538 4483 3664  328 3464 2534 1386 4484
##  [393] 2656 3441 2480 1540 3847 2410  461 1998 1656 3585 1849 2438 1286 4666
##  [407] 4828 4364 1683 1819 3285 4680 3074 2364 3816 2027 3780 1817 2483 3373
##  [421] 3611 1776 5144  725  836 2905  199 1483 5697 4144 3667 5342 3742 2038
##  [435] 4446 2187 3988 4249 4925 2898 1549 2631  575 2421 1447 4761 3010 1376
##  [449] 2415  586 2690 4318 4979 3948  714 2645 3910  379 5437 2337 3618 1149
##  [463] 1750 3328 4950  680  320  271 1111 5655 5148  455  674 4899 4133 1209
##  [477] 3773 1691 2309 2495 1078  966 4319 1488 4824 4641 4261 3763  521 3958
##  [491] 3185 2493  570  645 5024 2305  230  283 1632 1964 1264 2517  255 4199
##  [505] 3928 3471 1774 5581 3639 3277 2901 3992 3912 3673 1140 3644  475 1689
##  [519]  544 5151  903 5731  978   85 2873 2948 3943 2031 4774 5696 2856 5041
##  [533] 4903 1865 3974 4765 4245  745 4823    4   65 1461 3197 4359  127 1467
##  [547] 5718 1949 5054  931 1116 4850   19 2609 2272 1393 1080 3088 2719 5629
##  [561] 1851 1928 4815 5458 5676 3380 1319 4479 2777 4004 4544 2141 4021 3319
##  [575] 3248 4269   88 2170 4559  779 1755 3669 2701 2725 4738 5028 3814 4469
##  [589] 3750 3785 3564 2709  473 5201 4485 5250 4897 1927   45  765 1777 2626
##  [603] 2481 2711 2602 4149 2562  558 3643 4645  749  197 5690 3171 2020  823
##  [617] 5117  103 1931 1143 5368 3286 3888 5481 3070 2069 3908 2256 3914 4947
##  [631] 4438 5103 4635  305 5127 5140 5288  813 4201  901 3883 4880 3516 4586
##  [645] 3214 2961  581 1372 2664 2970 2008 2646 2417  956 4617 1268 3418 1803
##  [659] 1734 2810 1177 3429 2045 4096 3017 2508 1034 5670 1968 5157 1099 1599
##  [673] 2754 5084 3622  548 3571 4849 5387 2832 1141 2744 2467  649 2428  349
##  [687] 1259 3927 4655 3450 1831 4184 1785 1315 1353 3862 1941 3251 1879 3039
##  [701] 3368 2639 3544  106  456 3465 4676 1046 3621 5181 3167 3174 2115  775
##  [715] 4110  774 2863 4684 2642  954 5057 2591 1925 2625 4507 3421 4863 1992
##  [729] 1052  998  793 1231 1354 4140 1824 2447 5344 3520 5463 4498 5734   25
##  [743] 2843 1882 4272 1693 5207 5155 2982 1607 3608 2577   37 3909 4307 1965
##  [757] 2323 2463 2111 1064 3237  914   60 5316 2214  859 3056 1790 4939 2799
##  [771]  838 1898 1706  708  620 4952 2971 1252 2279 5291 4647  184 5279 4686
##  [785] 3365 1184 2423 2136 1368 2160 2172 5045 1178 4335 5096 3109 2385 1855
##  [799]  397 4624 5189 1912 1601 4499 2919 1610 1780 2737 4670 2669 3729 4901
##  [813] 2091 2978  949 4944 1152 5484 5624 2288 2205   29 5626 4729 1167 3031
##  [827] 5299 4933 5269 4374 2855 2944 3535 4420 5382 5352 3986 5567 1781  188
##  [841]  855  683 3900  702 1258 1382  580 3987 4376 2201  542  440 2731 3288
##  [855] 5310 2098 1105 3176 3830 3539 4141    8 5124 1523 4287 1469 2088 5335
##  [869] 4173 5516 5417   15 4545  735 2922  615 1977  555 1561 3885 3654 3013
##  [883] 4995 4669 3502 3636 2142 1032 2488 1512 1937 5524 2996 4770  410 4786
##  [897] 1555 1974 3950 2381 4124  668 3371 3554 5551  626 5462  679 1700 3218
##  [911] 3445 1943 5158  505 4061 2604 5050 2532 3990 2512 1913  352 4678 4942
##  [925] 3867 1883 2092   91 5282 4536 1861 5190 1451 4034 4926 2874 1808 3617
##  [939] 5487 1834 1506 4336  802 1084  516 5554 4514 1294 3718 1768 4512 4623
##  [953] 1698 1604  459 5014 1918 4457  332  401 3968 4576 3589 1720  522 5669
##  [967] 2594 2232 1657 4067 3116 4840 2125  119 3435 5698 5027 1453 4350  254
##  [981] 5292  767 1325 1248  148 3183 2223 4488  545 3789 3899 1213 2779 3456
##  [995] 1564 2393 4161 3521   79 1470 4734 3906  158  496 4562 1379 3598 1107
## [1009] 2802 2811 1328 1086 1172 4523 3727 5135 1830 2209 3993 1130  458 4407
## [1023] 2218 2720 3241 1138 4575  766 2708  429 1869 3719 1486   20 2984   34
## [1037] 2180 1442 5302 3685 1181 1660 4717 2332 3650 4611 1903 4177 3422 1110
## [1051] 5067  130 2397 1263  345 2211 2268 1799 5283 5086 3840 3858 4866 2285
## [1065] 2226  454 3096 3905 4570  740    9 2224 5486 4788 1788 4504  953 1816
## [1079] 1951  340 3911  252 4413 5200  557 1344 4116 2823 1602 1723 1081 2104
## [1093]  808 5161 1654 2445   76 3385 2295 1588 2608 3681  622 2504  319 2297
## [1107] 1018   90  400 5572 4158 1162 3772 1742 1087 4448  988 5578 4643 2430
## [1121] 2001 2114 3000  477 1767 5147 4550 4912  498 3889 2866 4689 3347 3434
## [1135] 3895 3875   84 5374  810 2593 3297 2920 5177 4132 3670   30 4642 4450
## [1149] 4101 2315 1309 4492 2502  604 2501 4077 4969  256 3799   11 4913 4284
## [1163] 2372 4758  918  786 5036 4580 5260 3228 2320 4029 3291 1408 5308 5420
## [1177] 3268 1852  218  717 4397 4093 4618 1186  791 1249 3238 4048  121 5296
## [1191] 1336 5068 2507 5623 1960 4919 1815 1908  514 5370  798  316 1635 1304
## [1205] 1684  835  997  439 2643 1733 3689 3061 1058 4886 4749  825 3503  688
## [1219]  222 5393 5304 2906  210 5556 3499 1509  687    7 5548 4563 1356 3162
## [1233] 4388 1787 5599 4010 4065 2444  239  355 4973 5225 1783  937  386 3624
## [1247] 2893 2684 4501 2730 1405 3613 1062 1373  160 2270 4900 1757   47 2837
## [1261] 1732 4210 3730  427 2784 4673 5570 2556 2726 4841 4904 5409 2333  156
## [1275]  278  462 1652 1452 3052 4945 1073 5215 1168 2736   35  797 1710 5379
## [1289] 2449 3507 1040 4346 1752 1627 1581 2870  864 3829 5208 2526 1390 3518
## [1303]  535 4720 1471 5732 3699 1299 3594 3404 2499 1617 2882 4605 5323 2251
## [1317] 4546 2394 4865  344 1306 5119  643 5508 5214 1778 5222 1122 4086 4505
## [1331] 3477 4721 3849 3485  732  847  212 5376  447 4629 5657 1349 3187 4089
## [1345]  715 2014  490 5198  898 3884 2849 5659 1464 4564 2197 3164 3638 3887
## [1359] 5111 3342  916 5534 2617 5357  809  279 5700 1158 1236 4193 4764 2638
## [1373] 5121  478 3627 2518 4340 5404 1154 1626 2199 4597 4470  377  917 4334
## [1387] 2791 2542   94 4883 1900 4661 1128 4657 2884 3790  190  928 1103 2220
## [1401] 5213 4843 1192  536 1281 3294  614 4759 3938 4633 1244 2808  563 4637
## [1415]  234 3495 5634  104 4616 3574 5187 5494 5228 3463 2774 4270 2145 4390
## [1429] 3250 2048 1694 3152  869 1170 3405 5334  958  273 3853  162 4063 4591
## [1443] 5728 4325 1621 4860 3855 5438 4675 5262 3394 1805 2858 4887  857 1392
## [1457] 3191  479 3662 4842 1748 1443  980 3961 4968 3555 2153   14 1823 5553
## [1471] 3935 4589 5724 2958  237 4466 1445  647 3583 3810 4303 4111 5416 5362
## [1485] 3419 2566 2052 4972 4260 2046 5702 4940 1029 2012 3058  662  361 4234
## [1499] 3200 2257 4066 3392 3259 5713 5735 1592 5353 1227 1690 3525 2015 3493
## [1513] 3236 2254 5733 3043  975 5343 3812 3831 3367 3510 2999 1097 4259 3102
## [1527] 2188 3878 3850 1692 2163 3707 4441 3145 4051 3548 2907 3896 5092 2757
## [1541]   99 5467 1000 3648 5265  523 5378 2629 1765 5324  783 5130 4393 2342
## [1555] 2250 5298 4257 2200 3042 4495  750 1295 2355 1238 3706 1804 3505  261
## [1569] 3012 5478 3825 1982 2083 4519  696 4436 5004 3269 3299 2391 2151 5331
## [1583] 4862 1067 2398 4882 2117 2548 1667 5149 1156  616 4687 2989 5253 4698
## [1597] 4025 5097 2474 2509 2059 1037 2750 2576 3100  469 3281 1846 2099 1659
## [1611]  257  314 2358 3352 2094 1495 3284 5243 4838 1847 2574 2191 3451 2770
## [1625] 4561 1313 2636  132 1316 1988 5563 1114 2259 5080 2378 5125  884  596
## [1639] 1303 1940  965 1289 2404 2869 4679   72 1378 1226 3412 4091 3063  501
## [1653] 5166 1283 3004 3018  704 4967 5366 1786 2738 2975 1479 5415  751 4854
## [1667] 2399  357 5618  955 2511  807  405 3573 1239 2950 5104 1961 2524 1741
## [1681] 1123 3563 5311  203 5271  628 3396 4577 2126   96 2470 2529 3674 1487
## [1695] 5122 1476 4490  661 4326 2603  177 2370 1009 1713  734  722 1287   54
## [1709] 5485 3399 4047 2384 4039 3345 2471   58 1006 4924 3683 1685 2743 4583
## [1723]  670 3317 4033 2266 4290 1856 5640  450 2813 4430 1922 4990 1580 4966
## [1737] 2960 4241 3272 4974 2321 1431 2525 2681 4614 3172 2363 3709  110 5550
## [1751] 2067 2484 1166 2248 3015 1185 4143 4123 3556 3267 4103  843 1521 4671
## [1765] 2519 4825 4601 3080 2490 1566  720 1489 5679 1090 4951  648 3163 1493
## [1779] 2613  637 3216 4304  419 2034  371 5168 3740   16 4475 5433 5035 2775
## [1793] 4283 1789  434 1844 3824  764 5348 4406 4892 3386  353 4737  268 2033
## [1807] 4626  276 4428 4695 1719  806 3989 3660 5017 4301 2344   39 5037 3340
## [1821] 1589 4044 4333 4050 4062 3346 4539 4400 1810 2924 2570 3452 4742 4707
## [1835] 1430 1385  263  113 4207  733 4463 4064 5159 3761 4534  792  388 2925
## [1849] 4830  387 1262 3350  420 2252 5060 4608 5110 5172 2040  367 5341 2597
## [1863] 4760 2867 1221  412 3886 2190 5644 1731 2789 5238 5606 5647  976 2465
## [1877] 4389 3570  262 3375 4518 3512  383 1432 5273  888 2553 5507  151 5337
## [1891] 4170  865 5443 3292 1124 4515  607 1070 4375  941 2894 2990 1436  957
## [1905] 4224 1947 3579 5325 2886 1666 2660 5472 5719  827 5453 1101 1011 3192
## [1919] 1410  506 2926 4622 3204 2917  524   78 3305 5064 3577  636 5660  251
## [1933]  685 3094 4300 5257  322  407 2897 4464 3086 2212 1519 1275 4194  492
## [1947]  467  660 2457 1042 3640 5584 3531 5510 1708 2715 3491 4868 1578 1950
## [1961] 4493 4801 4405 3378 1727  929 2418 4581 3208 3357 4206 2598 4205 5281
## [1975] 2557  411 2951 2836 2269  741  198 3279 2595 4394 4630 5139 4551 1608
## [1989] 2080  186 2304 4740 1662 1164  673 1498 3008 3215 5073 4877 5615 4699
## [2003] 3496 4069 5622 3481 2425 4596 2456 3578 1665 4855 1407 3377 1938 2346
## [2017] 1455 2635  395 5306 5284  354 4410 5452 2021 3601 5636 3755 5332 5546
## [2031] 2278 1420 4451  335 5410 4710 4730  170 2489 2357 4291 2987 5032 1633
## [2045] 4981 4845 3945  624 4118 1282 1648 2912 1015 3383 3158 1525 3672 4513
## [2059] 1400 3659 2405  946 5403 4757 1147 3833 3575 2360 5330 2294 4128 4016
## [2073] 4080 4696 2003 3607 3349 4701 5411 3245 4068 2564 3869 3760 3361 3859
## [2087]  394 2124 4471 4445 4963 2980 5449  990  338 4962 4569 3649 4361  437
## [2101] 3314 1044  253 4829 2580 2803 2568 5483 3808 4383  502   92 1598 1161
## [2115] 3635 2923 1404  803 3230 4082 2498 3533   46 1418 1098 2652 5126 3306
## [2129] 1728 2688 4810  723  517 2655 2949 1085 1746 2477 3077 1801 5501 4285
## [2143] 5561 5040 4779 3047 4691   49 1975 1628 1206 2440 3326 1240 1758 5400
## [2157] 1102 3966 3440 2112 5116 4042 1334 1028 3035  488 2244  663 3190 1383
## [2171] 3868  989 3811 3758 1759 2441 4800 4377 3151 1672 2118  983 5562 3424

# The indices of all the untreated patients in the original dataset that were matched to a treated patient.
print("===================================================")

## [1] "==================================================="

=> Are all treated subjects matched to a control one?

Yes, when one computes the length of greedymatch$index.treated, it is equal to 2184 which is the number of treated patients.

matched <- rhc.selected[unlist(greedymatch[c("index.treated","index.control")]), ]
matched[1,]  # This is the first treated subject

print("===================================================")

## [1] "==================================================="

matched[length(greedymatch$index.treated)+1,] # => What do you think this subject is?

#This is the first untreated subject. 
print("===================================================")

## [1] "==================================================="

Get table 1 for matched data with standardized differences

print("===================================================")

## [1] "==================================================="

matchedtab1 <- CreateTableOne(vars=xvars,strata="treatment.swang", data=matched, test=FALSE) # => Complete the command to create table 1 after matching
# => Print the balance with smd. 
print(matchedtab1, smd=TRUE)

##                         Stratified by treatment.swang
##                          0             1             SMD   
##   n                       2184          2184               
##   ARF (mean (SD))         0.51 (0.50)   0.42 (0.49)   0.182
##   CHF (mean (SD))         0.10 (0.30)   0.10 (0.29)   0.003
##   Cirrhosis (mean (SD))   0.02 (0.15)   0.02 (0.15)   0.012
##   Lungcancer (mean (SD))  0.00 (0.05)   0.00 (0.05)  <0.001
##   MOSFM (mean (SD))       0.07 (0.26)   0.07 (0.26)   0.002
##   Coma (mean (SD))        0.04 (0.20)   0.04 (0.20)   0.002
##   cardiohx (mean (SD))    0.19 (0.39)   0.20 (0.40)   0.039
##   chfhx (mean (SD))       0.17 (0.38)   0.19 (0.40)   0.053
##   dementhx (mean (SD))    0.07 (0.26)   0.07 (0.25)   0.021
##   psychhx (mean (SD))     0.04 (0.21)   0.05 (0.21)   0.009
##   chrpulhx (mean (SD))    0.16 (0.36)   0.14 (0.35)   0.033
##   renalhx (mean (SD))     0.05 (0.21)   0.05 (0.21)   0.002
##   liverhx (mean (SD))     0.06 (0.24)   0.06 (0.24)   0.011
##   malighx (mean (SD))     0.22 (0.42)   0.20 (0.40)   0.048
##   immunhx (mean (SD))     0.28 (0.45)   0.29 (0.45)   0.036
##   transhx (mean (SD))     0.11 (0.31)   0.15 (0.36)   0.116
##   age (mean (SD))        61.09 (17.10) 60.75 (15.63)  0.020
##   edu (mean (SD))        11.66 (2.90)  11.86 (3.16)   0.065
##   das2d3pc (mean (SD))   20.57 (5.43)  20.70 (5.03)   0.025
##   aps1 (mean (SD))       52.66 (18.31) 60.74 (20.27)  0.418
##   scoma1 (mean (SD))     17.58 (27.03) 18.97 (28.26)  0.050
##   meanbp1 (mean (SD))    79.50 (36.09) 68.20 (34.24)  0.321
##   wblc1 (mean (SD))      15.02 (10.93) 16.27 (12.55)  0.106
##   temp1 (mean (SD))      37.72 (1.75)  37.59 (1.83)   0.072
##   alb1 (mean (SD))        3.10 (0.68)   2.98 (0.93)   0.152
##   hema1 (mean (SD))      31.38 (7.68)  30.51 (7.42)   0.116
##   bili1 (mean (SD))       2.11 (4.47)   2.71 (5.33)   0.122
##   crea1 (mean (SD))       2.02 (1.93)   2.47 (2.05)   0.228
##   pot1 (mean (SD))        4.02 (0.97)   4.05 (1.01)   0.032
##   paco21 (mean (SD))     38.09 (10.76) 36.79 (10.97)  0.119
##   ph1 (mean (SD))         7.40 (0.09)   7.38 (0.11)   0.181
##   wtkilo1 (mean (SD))    68.27 (26.34) 72.36 (27.73)  0.151
##   dnr1 (mean (SD))        0.08 (0.27)   0.07 (0.26)   0.031
##   resp (mean (SD))        0.38 (0.49)   0.29 (0.45)   0.189
##   card (mean (SD))        0.33 (0.47)   0.42 (0.49)   0.193
##   gastr (mean (SD))       0.15 (0.36)   0.19 (0.39)   0.109
##   renal (mean (SD))       0.05 (0.22)   0.07 (0.25)   0.074
##   hema (mean (SD))        0.05 (0.22)   0.05 (0.22)   0.008
##   seps (mean (SD))        0.17 (0.38)   0.24 (0.42)   0.153
##   trauma (mean (SD))      0.01 (0.09)   0.02 (0.12)   0.068

# => After matching how many "bad" variables with SMD > 0.2 are there?
smd_values2 <- ExtractSmd(matchedtab1)
smd_values2[smd_values2 > 0.2, ]

##      aps1   meanbp1     crea1 
## 0.4181082 0.3213080 0.2283157

# print("===================================================")

Outcome analysis

print("===================================================")

## [1] "==================================================="

y_trt <- matched[matched$treatment.swang == 1, 'outcome.died'] # => Put here the vector of outcomes y (from matched$outcome.died) of those in the treatment group
y_con <- matched[matched$treatment.swang == 0, 'outcome.died'] # Put here the vector of outcomes y (from matched$outcome.died) of those in the control group
print("===================================================")

## [1] "==================================================="

Let’s do a paired t-test to see if there are differences in death between the matched groups. This is not the best test because the data is binary but we want to just get a feeling of the differences.

diffy <- y_trt-y_con # pairwise difference
t.test(diffy)

## 
##  One Sample t-test
## 
## data:  diffy
## t = 6.6885, df = 2183, p-value = 2.855e-11
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.06310748 0.11546394
## sample estimates:
##  mean of x 
## 0.08928571

=> Look at the p-value. Is it below 0.05? => What does this mean?

It is indeed below 0.05, so the null hypothesis that the true mean difference is zero can be rejected. This means that there is a difference between the treatment and control groups.

We want to use the McNemar test, which is the best choice for paired (matched) binary data

table(y_trt,y_con)

##      y_con
## y_trt   0   1
##     0 362 336
##     1 531 955

# In this table you will see the 2x2 table:  
#         y_con
# ytreat     0     1
#       0    a     b
#       1    c     d

# Enter these a, b, c, and d numbers in the following McNemar test
# print("===================================================")
mcnemar.test(matrix(c(362,336,531,955),2,2)) # => Complete the command by filling the a, b, c, and d numbers in the correct order.

## 
##  McNemar's Chi-squared test with continuity correction
## 
## data:  matrix(c(362, 336, 531, 955), 2, 2)
## McNemar's chi-squared = 43.409, df = 1, p-value = 4.44e-11

# => Is the p-value less than 0.05? What does this result mean?
# Yes it is less than 0.05 indicating there is a difference in proportion of people that died when receiving treatment versus when not receiving treatment. This means that the treatment has an impact on the outcome. 
# print("===================================================")

The last way we want to analyse the outcome is with logistic regression

gmodel1 <- glm(outcome.died ~ treatment.swang, family=binomial, data=matched)
print("===================================================")

## [1] "==================================================="

# => Is the coefficient of treatment significant? you can use summary(gmodel1) or confint(gmodel1)
summary(gmodel1)

## 
## Call:
## glm(formula = outcome.died ~ treatment.swang, family = binomial, 
##     data = matched)
## 
## Coefficients:
##                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      0.36859    0.04352   8.468  < 2e-16 ***
## treatment.swang  0.38704    0.06325   6.120 9.38e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 5729.2  on 4367  degrees of freedom
## Residual deviance: 5691.6  on 4366  degrees of freedom
## AIC: 5695.6
## 
## Number of Fisher Scoring iterations: 4

# =>  What is the odds ratio that is associated with treatment? You can easily derive it from the coefficient (Google otherwise how to get the odds ratio from the coefficient)
exp(coef(gmodel1)[2])

## treatment.swang 
##        1.472613

# =>  If the coefficient is positive, what does that mean in terms of odds ratio?
# The odds ratio for each coefficient represents the average increase in the odds of an individual defaulting, assuming all other predictor variables are held constant. It is 1.472613, meaning that the odds of the outcome (dying) occurring in the treatment group are 1.472613 times higher than the odds of the outcome (dying) occurring in the control group.
# => If the coefficient is negative, what does that mean in terms of odds ratio?
# The odds ratio for each coefficient represents the average decrease in the odds of an individual defaulting, assuming all other predictor variables are held constant. It thus indicates the event is less likely to occur.

# The great thing about modelling is that we can now again adjust for the variables with the worst SMD. This is clled doubly-robust estimation. Adjust for the variable with the worst SMD that you encountered before. To adjust you just add the name of the variable in the logistic regression formula. => Complete the command
gmodel2 <- glm(outcome.died ~ treatment.swang + aps1, family=binomial, data=matched)

# => What is the odds ratio for treatment now?
exp(coef(gmodel2)[2])

## treatment.swang 
##        1.248695

# print("===================================================")

Propensity score matching

Fit a propensity score model to predict treatment (not the outcome!). Use logistic regression with glm

print("===================================================")

## [1] "==================================================="

xvars <- c(xvars)
formula <- as.formula(paste("treatment.swang ~", paste(xvars, collapse = " + ")))
psmodel <-  glm(formula, data = rhc.selected, family = binomial)
# => Use the variables we found before in xvars in the first assignment to predict treatment
print("===================================================")

## [1] "==================================================="

summary(psmodel)   # show coefficients etc

## 
## Call:
## glm(formula = formula, family = binomial, data = rhc.selected)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  9.1435181  2.8001589   3.265 0.001093 ** 
## ARF         -0.1993146  0.0790906  -2.520 0.011733 *  
## CHF          0.1105408  0.1506968   0.734 0.463235    
## Cirrhosis   -1.4265886  0.2289511  -6.231 4.64e-10 ***
## Lungcancer  -0.8240366  0.5053386  -1.631 0.102962    
## MOSFM       -0.2802613  0.1475816  -1.899 0.057561 .  
## Coma        -0.7781231  0.1655087  -4.701 2.58e-06 ***
## cardiohx     0.0994467  0.0900903   1.104 0.269656    
## chfhx       -0.0015423  0.0982163  -0.016 0.987471    
## dementhx    -0.4737206  0.1143868  -4.141 3.45e-05 ***
## psychhx     -0.4843293  0.1318953  -3.672 0.000241 ***
## chrpulhx    -0.2169838  0.0903336  -2.402 0.016304 *  
## renalhx     -0.3680069  0.1724127  -2.134 0.032806 *  
## liverhx     -0.2261681  0.1701381  -1.329 0.183742    
## malighx     -0.2885066  0.0871562  -3.310 0.000932 ***
## immunhx      0.0148209  0.0700495   0.212 0.832437    
## transhx      0.5068504  0.0935418   5.418 6.01e-08 ***
## age          0.0009015  0.0020687   0.436 0.663000    
## edu          0.0311376  0.0098326   3.167 0.001541 ** 
## das2d3pc     0.0063801  0.0062066   1.028 0.303973    
## aps1         0.0180990  0.0022395   8.082 6.38e-16 ***
## scoma1      -0.0008406  0.0012483  -0.673 0.500704    
## meanbp1     -0.0073053  0.0009272  -7.879 3.31e-15 ***
## wblc1        0.0017317  0.0025814   0.671 0.502322    
## temp1       -0.0084141  0.0182504  -0.461 0.644773    
## alb1        -0.1039610  0.0459299  -2.263 0.023607 *  
## hema1       -0.0180316  0.0043474  -4.148 3.36e-05 ***
## bili1        0.0035635  0.0070830   0.503 0.614897    
## crea1        0.0224782  0.0203248   1.106 0.268750    
## pot1        -0.1689508  0.0320553  -5.271 1.36e-07 ***
## paco21      -0.0188045  0.0032662  -5.757 8.55e-09 ***
## ph1         -1.1484028  0.3601799  -3.188 0.001431 ** 
## wtkilo1      0.0071622  0.0011498   6.229 4.69e-10 ***
## dnr1        -0.7060266  0.1101245  -6.411 1.44e-10 ***
## resp        -0.1985399  0.0752337  -2.639 0.008316 ** 
## card         0.6931842  0.0780785   8.878  < 2e-16 ***
## gastr        0.4735664  0.0991902   4.774 1.80e-06 ***
## renal        0.2907049  0.1427665   2.036 0.041728 *  
## hema        -0.5385458  0.1394851  -3.861 0.000113 ***
## seps         0.2352497  0.0864182   2.722 0.006484 ** 
## trauma       1.3864699  0.3167119   4.378 1.20e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 7621.4  on 5734  degrees of freedom
## Residual deviance: 6463.1  on 5694  degrees of freedom
## AIC: 6545.1
## 
## Number of Fisher Scoring iterations: 4

pscore <- psmodel$fitted.values # create propensity score

Now it is time to check the positivity assumption

let’s check the overlap between the propensity scores for the two groups. We do that before the matching.

print("===================================================")

## [1] "==================================================="

#  => plot the density of the propensity scores of the treated and untreated. Tip: you might want to use par(mfrow=c(2,1)) to create 2 panels (2 rows, 1 column) and then plotting the first hist (histogram), with say, ylim=c(0,650) and col='darkblue', and the second histogram with ylim=c(650, 0) (note that we flipped the order) and col='darkblue'. This will plot the blue histogram as a normal histogram and the red one will be plotted upside down underneath the blue one so you can easily see the overlap.

par(mfrow=c(2,1))
hist(pscore[rhc.selected$treatment.swang == 1], ylim = c(0, 650), col = 'darkblue', main = "Treated")
hist(pscore[rhc.selected$treatment.swang == 0], ylim = c(650, 0), col = 'darkblue', main = "Untreated")

# => What do you think about the overlap? 
# For every propensity score there seems to be at least one subject in the treated group and one in the untreated group, so the positivity assumption is reasonable.
print("===================================================")

## [1] "==================================================="

We want to do greedy matching on PS using Match with a caliper. Note that instead of the probability PS itself, we usually use the log odds (logit) of PS which is log(PS/(1-PS)) which is also equal to log(PS)-log(1-PS) . So let us first write a function (of one line) to calculate the logit.

print("===================================================")

## [1] "==================================================="

logit <- function(p) {
  log(p/(1-p))
} # => Complete this function to calculate the log odds
print("===================================================")

## [1] "==================================================="

Now we want to use Match() with rhc.selected$treatment.swang as treatement, with 1 matched control per treated subject, with X as the logit(pscore), with no replacement and with a caliper of 0.2

print("===================================================")

## [1] "==================================================="

psmatch <- Match(Tr=rhc.selected$treatment.swang, X = logit(pscore), M = 1, caliper = 0.2, replace = FALSE) # => complete the command
print("===================================================")

## [1] "==================================================="

matched <- rhc.selected[unlist(psmatch[c("index.treated","index.control")]), ]

Get standardized differences. Note: xvars is the same as in the first assignment.

print("===================================================")

## [1] "==================================================="

matchedtab1 <- CreateTableOne(vars=xvars, strata ="treatment.swang",
                            data=matched, test = FALSE) # => Complete the command
# => print the matching table showing the SMD
print(matchedtab1, smd=TRUE)

##                         Stratified by treatment.swang
##                          0             1             SMD   
##   n                       1734          1734               
##   ARF (mean (SD))         0.45 (0.50)   0.44 (0.50)   0.034
##   CHF (mean (SD))         0.11 (0.31)   0.10 (0.30)   0.019
##   Cirrhosis (mean (SD))   0.03 (0.17)   0.03 (0.16)   0.010
##   Lungcancer (mean (SD))  0.00 (0.07)   0.00 (0.05)   0.028
##   MOSFM (mean (SD))       0.08 (0.27)   0.08 (0.27)   0.004
##   Coma (mean (SD))        0.04 (0.21)   0.05 (0.21)   0.016
##   cardiohx (mean (SD))    0.21 (0.41)   0.20 (0.40)   0.023
##   chfhx (mean (SD))       0.20 (0.40)   0.20 (0.40)   0.013
##   dementhx (mean (SD))    0.07 (0.26)   0.07 (0.26)   0.009
##   psychhx (mean (SD))     0.05 (0.23)   0.05 (0.22)   0.010
##   chrpulhx (mean (SD))    0.15 (0.36)   0.16 (0.36)   0.014
##   renalhx (mean (SD))     0.05 (0.22)   0.05 (0.22)   0.011
##   liverhx (mean (SD))     0.06 (0.25)   0.06 (0.25)  <0.001
##   malighx (mean (SD))     0.23 (0.42)   0.22 (0.41)   0.030
##   immunhx (mean (SD))     0.29 (0.46)   0.28 (0.45)   0.022
##   transhx (mean (SD))     0.12 (0.32)   0.13 (0.34)   0.038
##   age (mean (SD))        61.09 (17.36) 60.85 (15.62)  0.014
##   edu (mean (SD))        11.83 (3.21)  11.77 (3.18)   0.018
##   das2d3pc (mean (SD))   20.70 (5.62)  20.66 (5.04)   0.007
##   aps1 (mean (SD))       57.75 (19.26) 57.38 (19.23)  0.019
##   scoma1 (mean (SD))     18.91 (28.79) 18.62 (27.91)  0.010
##   meanbp1 (mean (SD))    72.63 (34.73) 72.83 (35.40)  0.006
##   wblc1 (mean (SD))      15.77 (12.82) 15.82 (12.52)  0.004
##   temp1 (mean (SD))      37.62 (1.93)  37.63 (1.75)   0.005
##   alb1 (mean (SD))        3.03 (0.71)   3.03 (0.95)   0.002
##   hema1 (mean (SD))      30.65 (8.00)  30.89 (7.51)   0.030
##   bili1 (mean (SD))       2.49 (5.41)   2.49 (4.98)   0.001
##   crea1 (mean (SD))       2.32 (2.34)   2.31 (2.00)   0.003
##   pot1 (mean (SD))        4.06 (1.04)   4.05 (0.99)   0.007
##   paco21 (mean (SD))     37.19 (11.17) 37.40 (11.26)  0.019
##   ph1 (mean (SD))         7.39 (0.11)   7.39 (0.11)   0.007
##   wtkilo1 (mean (SD))    70.11 (26.23) 70.84 (27.25)  0.028
##   dnr1 (mean (SD))        0.09 (0.29)   0.08 (0.28)   0.031
##   resp (mean (SD))        0.32 (0.47)   0.33 (0.47)   0.007
##   card (mean (SD))        0.40 (0.49)   0.40 (0.49)  <0.001
##   gastr (mean (SD))       0.18 (0.39)   0.18 (0.38)   0.010
##   renal (mean (SD))       0.06 (0.24)   0.06 (0.24)   0.002
##   hema (mean (SD))        0.06 (0.25)   0.06 (0.24)   0.017
##   seps (mean (SD))        0.21 (0.41)   0.21 (0.41)   0.007
##   trauma (mean (SD))      0.01 (0.10)   0.01 (0.11)   0.011

# => Are there any variables with bad balance now?
smd_values3 <- ExtractSmd(matchedtab1)
smd_values3[smd_values3 > 0.2, ]

## numeric(0)

# There are no variables with bad balance now 
print("===================================================")

## [1] "==================================================="

Outcome analysis after propensity score matching

print("===================================================")

## [1] "==================================================="

y_trt.ps <- matched[matched$treatment.swang == 1, 'outcome.died']
# => Put here the vector of outcomes y (from matched$outcome.died) of those in the treatment group
y_con.ps <- matched[matched$treatment.swang == 0, 'outcome.died']
# Put here the vector of outcomes y (from matched$outcome.died) of those in the control group
print("===================================================")

## [1] "==================================================="

Perform a t-test

# Perform paired t-test. Just to get a feeling of the result
diffy.ps <- y_trt.ps - y_con.ps # pairwise difference
t.test(diffy.ps)

## 
##  One Sample t-test
## 
## data:  diffy.ps
## t = 3.4305, df = 1733, p-value = 0.0006167
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.02370986 0.08701678
## sample estimates:
##  mean of x 
## 0.05536332

print("===================================================")

## [1] "==================================================="

# => Is the test significant?
# The test is significant, meaning that the null hypothesis can be rejected and the true mean is not equal to 0 ( = there is a difference between the two groups), since the p-value is less than 0.05.
print("===================================================")

## [1] "==================================================="

Use the McNemar test as before. Enter the a, b, c, and d numbers you get from the 2x2 table in the McNemar test

print("===================================================")

## [1] "==================================================="

mccnemar_tab <- table(y_trt.ps,y_con.ps)
mcnemar.test(matrix(c(mccnemar_tab[1,1], mccnemar_tab[1,2], mccnemar_tab[2,1], mccnemar_tab[2,2]),2,2)) # => Complete the command

## 
##  McNemar's Chi-squared test with continuity correction
## 
## data:  matrix(c(mccnemar_tab[1, 1], mccnemar_tab[1, 2], mccnemar_tab[2, 1], mccnemar_tab[2, 2]), 2, 2)
## McNemar's chi-squared = 11.453, df = 1, p-value = 0.0007138

# => Is the p-value less than 0.05? What does this result mean?
# Yes it is less than 0.05 indicating there is a difference in proportion of people that died when receiving treatment versus when not receiving treatment. This means that the treatment has an impact on the outcome. 
# => Compare the difference in the means before and after matching.
mean(rhc[rhc$treatment.swang == 1, 'outcome.died'])

## [1] 0.6804029

mean(rhc[rhc$treatment.swang == 0, 'outcome.died'])

## [1] 0.6296818

mean(matched[matched$treatment.swang == 1, 'outcome.died'])

## [1] 0.6799308

mean(matched[matched$treatment.swang == 0, 'outcome.died'])

## [1] 0.6245675

# Means are almost the same. 
print("===================================================")

## [1] "==================================================="

Use logistic regression

gmodel3 <- glm(outcome.died ~ treatment.swang, family="binomial", data = matched)
print("===================================================")

## [1] "==================================================="

# => What is the odds ratio of treatment?
exp(coef(gmodel3)[2])

## treatment.swang 
##        1.276948

# => Now correct also for the least well balanced variable
gmodel3 <- glm(outcome.died ~ treatment.swang + malighx, family="binomial", data = matched)
# => What is the odds ratio of treatment now?
exp(coef(gmodel3)[2])

## treatment.swang 
##        1.306322

print("===================================================")

## [1] "==================================================="

Open part

Instead of the glm to calculate the propensity score use any machine learning technique you like such as a regression tree (using rpart), or random forests (randomForest) or neural networks (neuralnet) or another formalism in R to create the propensity score model and to predict the probabilities on such a model. Note that you might need to tweak the model parameters in order to improve it. Compare your results to those you obtained by matching (via the Mahalanobis distance and the propensity score with logistic regression). Which approach do you prefer, and why?

#install.packages("randomForest")
library(randomForest)

## randomForest 4.7-1.1

## Type rfNews() to see new features/changes/bug fixes.

## 
## Attaching package: 'randomForest'

## The following object is masked from 'package:dplyr':
## 
##     combine

rhc.selected$treatment.swang <- factor(rhc.selected$treatment.swang)
formula <- as.formula(paste("treatment.swang ~", paste(xvars, collapse = " + ")))
rf.out <- randomForest(formula = formula, data=rhc.selected)
eps <- rf.out$votes[,2]
rf.out

## 
## Call:
##  randomForest(formula = formula, data = rhc.selected) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 6
## 
##         OOB estimate of  error rate: 28.63%
## Confusion matrix:
##      0    1 class.error
## 0 3029  522   0.1470008
## 1 1120 1064   0.5128205

par(mfrow=c(2,2))
hist(eps[rhc.selected$treatment.swang == 1], ylim = c(0, 650), col = 'darkblue', main = "Treated")
hist(pscore[rhc.selected$treatment.swang == 1], ylim = c(0, 650), col = 'darkblue', main = "Treated")
hist(eps[rhc.selected$treatment.swang == 0], ylim = c(650, 0), col = 'darkblue', main = "Untreated")
hist(pscore[rhc.selected$treatment.swang == 0], ylim = c(650, 0), col = 'darkblue', main = "Untreated")

rhc.selected$treatment.swang <- as.numeric(rhc.selected$treatment.swang) - 1

epsmatch <- Match(Tr=rhc.selected$treatment.swang, X = logit(eps), M = 1, caliper = 0.2, replace = FALSE)
matched1 <- rhc.selected[unlist(epsmatch[c("index.treated","index.control")]), ]

matchedtab2 <- CreateTableOne(vars=xvars, strata ="treatment.swang",
                            data=matched1, test = FALSE) # => Complete the command
print(matchedtab2, smd=TRUE)

##                         Stratified by treatment.swang
##                          0             1             SMD   
##   n                       1674          1674               
##   ARF (mean (SD))         0.46 (0.50)   0.43 (0.50)   0.054
##   CHF (mean (SD))         0.08 (0.27)   0.10 (0.29)   0.051
##   Cirrhosis (mean (SD))   0.04 (0.19)   0.02 (0.15)   0.085
##   Lungcancer (mean (SD))  0.00 (0.07)   0.00 (0.05)   0.029
##   MOSFM (mean (SD))       0.09 (0.28)   0.08 (0.27)   0.039
##   Coma (mean (SD))        0.05 (0.23)   0.05 (0.21)   0.041
##   cardiohx (mean (SD))    0.19 (0.40)   0.20 (0.40)   0.015
##   chfhx (mean (SD))       0.19 (0.40)   0.19 (0.39)   0.005
##   dementhx (mean (SD))    0.10 (0.30)   0.07 (0.26)   0.103
##   psychhx (mean (SD))     0.07 (0.25)   0.05 (0.21)   0.092
##   chrpulhx (mean (SD))    0.15 (0.36)   0.15 (0.36)   0.003
##   renalhx (mean (SD))     0.06 (0.24)   0.05 (0.22)   0.044
##   liverhx (mean (SD))     0.07 (0.26)   0.06 (0.24)   0.046
##   malighx (mean (SD))     0.26 (0.44)   0.22 (0.41)   0.098
##   immunhx (mean (SD))     0.31 (0.46)   0.29 (0.45)   0.046
##   transhx (mean (SD))     0.11 (0.31)   0.14 (0.34)   0.080
##   age (mean (SD))        61.42 (16.47) 61.03 (16.05)  0.024
##   edu (mean (SD))        11.66 (3.13)  11.80 (3.17)   0.043
##   das2d3pc (mean (SD))   20.49 (5.37)  20.59 (5.10)   0.019
##   aps1 (mean (SD))       58.99 (18.97) 57.62 (19.29)  0.071
##   scoma1 (mean (SD))     20.22 (29.64) 19.06 (28.22)  0.040
##   meanbp1 (mean (SD))    73.12 (35.78) 73.34 (35.98)  0.006
##   wblc1 (mean (SD))      16.28 (13.18) 16.07 (12.65)  0.016
##   temp1 (mean (SD))      37.65 (1.88)  37.62 (1.78)   0.019
##   alb1 (mean (SD))        3.02 (0.71)   3.02 (0.95)   0.005
##   hema1 (mean (SD))      30.55 (8.06)  30.69 (7.53)   0.018
##   bili1 (mean (SD))       2.80 (5.99)   2.49 (5.04)   0.055
##   crea1 (mean (SD))       2.41 (2.21)   2.28 (2.01)   0.063
##   pot1 (mean (SD))        4.14 (1.12)   4.03 (1.00)   0.102
##   paco21 (mean (SD))     37.40 (11.61) 37.29 (11.33)  0.009
##   ph1 (mean (SD))         7.39 (0.11)   7.39 (0.11)   0.006
##   wtkilo1 (mean (SD))    70.15 (25.91) 69.90 (27.40)  0.009
##   dnr1 (mean (SD))        0.11 (0.31)   0.08 (0.27)   0.104
##   resp (mean (SD))        0.36 (0.48)   0.33 (0.47)   0.062
##   card (mean (SD))        0.36 (0.48)   0.39 (0.49)   0.067
##   gastr (mean (SD))       0.18 (0.38)   0.19 (0.40)   0.041
##   renal (mean (SD))       0.07 (0.26)   0.07 (0.25)   0.014
##   hema (mean (SD))        0.08 (0.27)   0.06 (0.23)   0.093
##   seps (mean (SD))        0.20 (0.40)   0.21 (0.41)   0.022
##   trauma (mean (SD))      0.01 (0.09)   0.02 (0.12)   0.066

smd_values4 <- ExtractSmd(matchedtab2)
smd_values4[smd_values4 > 0.2, ]

## numeric(0)

y_trt.eps <- matched1[matched1$treatment.swang == 1, 'outcome.died']
y_con.eps <- matched1[matched1$treatment.swang == 0, 'outcome.died']

# Perform paired t-test. Just to get a feeling of the result
diffy.eps <- y_trt.eps - y_con.eps # pairwise difference
t.test(diffy.eps)

## 
##  One Sample t-test
## 
## data:  diffy.eps
## t = 1.1228, df = 1673, p-value = 0.2617
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.01338447  0.04922676
## sample estimates:
##  mean of x 
## 0.01792115

#t-test is not significant, so H0 cannot be rejected and the true mean could still be equal to zero, indicating no difference between the two groups

mccnemar_tab1 <- table(y_trt.eps,y_con.eps)
mcnemar.test(matrix(c(mccnemar_tab1[1,1], mccnemar_tab1[1,2], mccnemar_tab1[2,1], mccnemar_tab1[2,2]),2,2))

## 
##  McNemar's Chi-squared test with continuity correction
## 
## data:  matrix(c(mccnemar_tab1[1, 1], mccnemar_tab1[1, 2], mccnemar_tab1[2, 1], mccnemar_tab1[2, 2]), 2, 2)
## McNemar's chi-squared = 1.1779, df = 1, p-value = 0.2778

#not significant/or significant, so H0 cannot be rejected so there could be no difference in proportion of people that died when receiving treatment versus when not receiving treatment. This means that the treatment doesn't necessarily have an impact on the outcome.

# => Compare the difference in the means before and after matching.
mean(rhc[rhc$treatment.swang == 1, 'outcome.died'])

## [1] 0.6804029

mean(rhc[rhc$treatment.swang == 0, 'outcome.died'])

## [1] 0.6296818

mean(matched[matched$treatment.swang == 1, 'outcome.died'])

## [1] 0.6799308

mean(matched[matched$treatment.swang == 0, 'outcome.died'])

## [1] 0.6245675

mean(matched1[matched1$treatment.swang == 1, 'outcome.died'])

## [1] 0.6780167

mean(matched1[matched1$treatment.swang == 0, 'outcome.died'])

## [1] 0.6600956

#There is less difference in the means using random forest to create the propensity score and match on these scores. This makes sense since there is no significant difference in the means.

gmodel4 <- glm(outcome.died ~ treatment.swang, family="binomial", data = matched1)
# => What is the odds ratio of treatment?
exp(coef(gmodel4)[2])

## treatment.swang 
##        1.084319

# => Now correct also for the least well balanced variable
gmodel4 <- glm(outcome.died ~ treatment.swang + transhx, family="binomial", data = matched1)
# => What is the odds ratio of treatment now?
exp(coef(gmodel4)[2])

## treatment.swang 
##        1.101813

#install.packages("ggplot2")
library(ggplot2)

## 
## Attaching package: 'ggplot2'

## The following object is masked from 'package:randomForest':
## 
##     margin

library(tidyr)

## 
## Attaching package: 'tidyr'

## The following object is masked _by_ '.GlobalEnv':
## 
##     table1

dataPlot <- data.frame(covariates = rownames(smd_values3),
                       Greedy = as.numeric(smd_values2),
                       "PS with LR" = as.numeric(smd_values3),
                       "PS with RF" = as.numeric(smd_values4))

dataPlotMelt <- pivot_longer(data = dataPlot, cols = -covariates, names_to = "Method", values_to = "SMD")

varNames <- as.character(dataPlot$covariates)[order(dataPlot$Greedy)]

dataPlotMelt$variable <- factor(dataPlotMelt$covariates,
                                levels = varNames)

ggplot(data = dataPlotMelt,
       mapping = aes(x = covariates, y = SMD, group = Method, color = Method)) +
    geom_line() +
    geom_point() +
    geom_hline(yintercept = 0.2, color = "black", linewidth = 0.1) +
    coord_flip() +
    theme_bw() +
    theme(legend.key = element_blank())

# After matching on the propensity score created by random forest, there are no covariates left with a bad balance. However, there is no significant difference between the means of the two groups. This indicates that the covariates with a bad balance do influence the outcome (confounder variables) when using greedy matching.

Casual Inference Notebook

Open part

Understand the datase:

Load packages

Read database

Understanding the dataset

Preprocess

Check

Create data frame with variables of interest

Selecting variables to control for

Inspect balance before matching

look at “table 1”

Greedy matching on Mahalanobis distance

Outcome analysis

Let’s do a paired t-test to see if there are differences in death between the matched groups. This is not the best test because the data is binary but we want to just get a feeling of the differences.

We want to use the McNemar test, which is the best choice for paired (matched) binary data

The last way we want to analyse the outcome is with logistic regression

Propensity score matching

Now it is time to check the positivity assumption

let’s check the overlap between the propensity scores for the two groups. We do that before the matching.

Get standardized differences. Note: xvars is the same as in the first assignment.

Outcome analysis after propensity score matching

Open part