Coursera_Causality course

R Markdown

library(tableone)
library(Matching)

## Loading required package: MASS

## ## 
## ##  Matching (Version 4.9-9, Build Date: 2021-03-15)
## ##  See http://sekhon.berkeley.edu/matching for additional documentation.
## ##  Please cite software as:
## ##   Jasjeet S. Sekhon. 2011. ``Multivariate and Propensity Score Matching
## ##   Software with Automated Balance Optimization: The Matching package for R.''
## ##   Journal of Statistical Software, 42(7): 1-52. 
## ##

library(MatchIt)

# read in data
load(url("https://biostat.app.vumc.org/wiki/pub/Main/DataSets/rhc.sav"))
# view data
View(rhc)

Optional step; convert some character variables into numeric

ARF<-as.numeric(rhc$cat1=='ARF')
CHF<-as.numeric(rhc$cat1=='CHF')
Cirr<-as.numeric(rhc$cat1=='Cirrhosis')
colcan<-as.numeric(rhc$cat1=='Colon Cancer')
Coma<-as.numeric(rhc$cat1=='Coma')
COPD<-as.numeric(rhc$cat1=='COPD')
lungcan<-as.numeric(rhc$cat1=='Lung Cancer')
MOSF<-as.numeric(rhc$cat1=='MOSF w/Malignancy')
sepsis<-as.numeric(rhc$cat1=='MOSF w/Sepsis')
female<-as.numeric(rhc$sex=='Female')
died<-as.numeric(rhc$death=='Yes')
age<-rhc$age
treatment<-as.numeric(rhc$swang1=='RHC')
meanbp1<-rhc$meanbp1

Make new dataset

mydata<-cbind(ARF,CHF,Cirr,colcan,Coma,lungcan,MOSF,sepsis,
              age,female,meanbp1,treatment,died)
mydata<-data.frame(mydata)

# covariates we will use (shorter list than you would use in practice)
xvars<-c("ARF","CHF","Cirr","colcan","Coma","lungcan","MOSF","sepsis","age","female","meanbp1")

Now we can create a Table 1, prematching. Have summary statistics for the variables (xvars); and then stratify on treatment because we want to look at the balance between the TG and CG. Just see the means and standard deviations and so on.

# look at table 1
table1<- CreateTableOne(vars=xvars,strata="treatment", data=mydata, test=FALSE)
## include standardized mean difference (SMD)
print(table1,smd=TRUE)

##                      Stratified by treatment
##                       0             1             SMD   
##   n                    3551          2184               
##   ARF (mean (SD))      0.45 (0.50)   0.42 (0.49)   0.059
##   CHF (mean (SD))      0.07 (0.25)   0.10 (0.29)   0.095
##   Cirr (mean (SD))     0.05 (0.22)   0.02 (0.15)   0.145
##   colcan (mean (SD))   0.00 (0.04)   0.00 (0.02)   0.038
##   Coma (mean (SD))     0.10 (0.29)   0.04 (0.20)   0.207
##   lungcan (mean (SD))  0.01 (0.10)   0.00 (0.05)   0.095
##   MOSF (mean (SD))     0.07 (0.25)   0.07 (0.26)   0.018
##   sepsis (mean (SD))   0.15 (0.36)   0.32 (0.47)   0.415
##   age (mean (SD))     61.76 (17.29) 60.75 (15.63)  0.061
##   female (mean (SD))   0.46 (0.50)   0.41 (0.49)   0.093
##   meanbp1 (mean (SD)) 84.87 (38.87) 68.20 (34.24)  0.455

We see in the control group, 3551 subjects and in the treated group, 2184 subjects; the means, standard deviations, and the SMDs of the different variables. SMDs greater than 0.1 are problematic. A few of them (cirrhosis, MOSF with sepsis, mean blood pressure)

Now for the matching, we’re using the Match function; Tr is the treatment variable; M equal one has to do with whether you want paiedr match or maybe more than one control subject on a treated; X is a set of variables to match on.

It’ll end up in an index.treated and index.control which will tell you who the treated and controls are. and what their actual ID variable is. If you unlist this, you can end up looking at it.

# do greedy matching on Mahalanobis distance

greedymatch<-Match(Tr=treatment,M=1,X=mydata[xvars],replace=FALSE)
matched<-mydata[unlist(greedymatch[c("index.treated","index.control")]), ]
#View(matched)

matchedtab1<-CreateTableOne(vars=xvars, strata ="treatment", 
                            data=matched, test = FALSE)
print(matchedtab1, smd = TRUE)

##                      Stratified by treatment
##                       0             1             SMD   
##   n                    2184          2184               
##   ARF (mean (SD))      0.42 (0.49)   0.42 (0.49)   0.006
##   CHF (mean (SD))      0.10 (0.29)   0.10 (0.29)  <0.001
##   Cirr (mean (SD))     0.02 (0.15)   0.02 (0.15)  <0.001
##   colcan (mean (SD))   0.00 (0.02)   0.00 (0.02)  <0.001
##   Coma (mean (SD))     0.04 (0.20)   0.04 (0.20)  <0.001
##   lungcan (mean (SD))  0.00 (0.05)   0.00 (0.05)  <0.001
##   MOSF (mean (SD))     0.07 (0.26)   0.07 (0.26)  <0.001
##   sepsis (mean (SD))   0.24 (0.43)   0.32 (0.47)   0.177
##   age (mean (SD))     61.53 (16.15) 60.75 (15.63)  0.049
##   female (mean (SD))   0.44 (0.50)   0.41 (0.49)   0.042
##   meanbp1 (mean (SD)) 73.12 (34.28) 68.20 (34.24)  0.144

We matched 2186 controls with 2186 treated; now the SMDs are very small.

Now that we’re happy with the matching, we can carry out an outcome analysis.

There is the simple paired T-test.

Define two new variables (y for treated and y for control); outcome varaibles are death;

# outcome analysis
y_trt<-matched$died[matched$treatment==1]
y_con<-matched$died[matched$treatment==0]

# pairwise difference
diffy <- y_trt-y_con

# paired t-test
t.test(diffy)

## 
##  One Sample t-test
## 
## data:  diffy
## t = 3.9289, df = 2183, p-value = 8.799e-05
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.02706131 0.08099730
## sample estimates:
## mean of x 
## 0.0540293

Point estimate is 0.054; very small p-value The reason we know iit’s a risk difference is because we’re just taking the difference in outcomes and taking the mean;

Point estimate: 0.054 95% CI: (0.027, 0.080) P-value: < 0.001

Can also carry out a McNemar test; just ask for a table. These are paired outcome observations; so just ask for a table and it will show the counts of each type of pair. For example, 994 pairs had an outcome of one for both groups. 305 pairs were concordant with an outcome of zero. And the more interesting cases are the discordant paris where we see 493 pairs where the treated had an outcome whereas the control did not. 394 pairs had the opposite situation.

# McNemar test
table(y_trt, y_con)

##      y_con
## y_trt   0   1
##     0 303 395
##     1 513 973

395+513 discordant pairs

# McNemar test
mcnemar.test(matrix(c(973, 513, 395, 303), 2, 2))

## 
##  McNemar's Chi-squared test with continuity correction
## 
## data:  matrix(c(973, 513, 395, 303), 2, 2)
## McNemar's chi-squared = 15.076, df = 1, p-value = 0.0001033

Coursera_Causality course_matching

krhee

7/19/2021

R Markdown