BIOS621 Homework 3

David Lee

Introduction

“Reproduce Figure 2 and Table 2 (you can skip the hepatotoxicity row of Table 2) of this paper, as well as the results of the log-rank test reported in the caption of Figure 2. You don’t have to calculate the risk set shown at the bottom of Figure 2, although it is recommended that you make sure your risk set matches.”"

Methods

I read in the data with library readxl. I also use survminer and survival libraries; the former is used for call ggplot2 functions to plot survival curves.

library(readxl)
## Warning: package 'readxl' was built under R version 3.3.2
library(survminer)
## Warning: package 'survminer' was built under R version 3.3.2
## Loading required package: ggplot2
library(survival)
setwd("C:/Users/David/Desktop")
tb <- readxl::read_excel("journal.pone.0122587.s002.XLSX")
colnames(tb) <- make.names(colnames(tb))

Clean the data using Prof. Waldron’s code

tb$time <- tb$Exp.wk.48
tb$cens <- tb$Exp.48 == "Yes"
tb$rand <- factor(tb$Random, levels=1:3, labels=c("week1", "week4", "week8"))
tb$rand <- relevel(tb$rand, ref="week8")

Methods: Reproducing Fig 2

We fit the survival curve.

We plot it next with the ggsurvplot function, which builds off ggplot2.

ggsurvplot(fit, risk.table = TRUE, linetype=1:3, 
           xlab="Weeks after TB therapy",
           ylab="Probability of Survival",
           risk.table.title="Number at risk (death)"
           )

We then use the survidiff function to examine the log-rang test from the caption in the article. We reproduce the log-rank test in the caption of Fig2 are also consistent (Chi-sq of 2.9, 2 DF, p = .2)

survdiff(Surv(time, cens) ~ rand, data=tb)

Methods: Reproducing Table 2, univariate

I reproduced the univariate analysis results in Table 2, starting with “Randomization weeks”

randcox <- coxph(Surv(time,cens) ~ rand, data=tb)
exp(cbind(OR=coef(randcox), confint(randcox)))

Then BMI:

BMIcox <- coxph(Surv(time,cens) ~ BMI, data=tb)
exp(cbind(OR=coef(BMIcox), confint(BMIcox)))

CD4:

tb$cd4 <- ifelse(tb$CD4.0 < 50, 1, 0)
cd4cox <- coxph(Surv(time, cens) ~ cd4, data=tb)
exp(cbind(OR=coef(cd4cox), confint(cd4cox)))

Albumin:

tb$albumin <- ifelse(tb$Albumin < 3, 1, 0)
albumincox <- coxph(Surv(time, cens) ~ albumin, data=tb)
exp(cbind(OR=coef(albumincox), confint(albumincox)))

Methods: Reproducing Table 2, univariate

“The factors which had P-value <= 0.1 in univariate analysis and of interest were included in the multivariate model.” - Significant included: CD4, Serum, BMI, Hepatoxicity (which we don’t have) - But Randomization weeks were non-significant so they must have been “of interest”??

multicox <- coxph(Surv(time,cens) ~ cd4 + albumin + rand + BMI, data=tb)
exp(cbind(OR=coef(multicox), confint(multicox)))

Results: Reproducing Fig 2

The plot resembles the authors’.

I also successfully reproduced the log-rank test in the caption of Fig2.

survdiff(Surv(time, cens) ~ rand, data=tb)
## Call:
## survdiff(formula = Surv(time, cens) ~ rand, data = tb)
## 
## n=478, 5 observations deleted due to missingness.
## 
##              N Observed Expected (O-E)^2/E (O-E)^2/V
## rand=week8 155       17     21.5     0.959     1.465
## rand=week1 163       27     20.9     1.783     2.684
## rand=week4 160       20     21.6     0.113     0.172
## 
##  Chisq= 2.9  on 2 degrees of freedom, p= 0.235

Results: Reproducing Table 2, univariate

I was able to reproduce results for randomization week.

randcox <- coxph(Surv(time,cens) ~ rand, data=tb)
exp(cbind(OR=coef(randcox), confint(randcox)))
##                 OR     2.5 %   97.5 %
## randweek1 1.640427 0.8940465 3.009912
## randweek4 1.177740 0.6169355 2.248325

Likewise for BMI:

BMIcox <- coxph(Surv(time,cens) ~ BMI, data=tb)
exp(cbind(OR=coef(BMIcox), confint(BMIcox)))
##            OR     2.5 %    97.5 %
## BMI 0.9002232 0.8209125 0.9871964

Unfortunately not for CD4:

tb$cd4 <- ifelse(tb$CD4.0 < 50, 1, 0)
cd4cox <- coxph(Surv(time, cens) ~ cd4, data=tb)
exp(cbind(OR=coef(cd4cox), confint(cd4cox)))
##          OR    2.5 %   97.5 %
## cd4 1.96621 1.199807 3.222169

Nor albumin:

tb$albumin <- ifelse(tb$Albumin < 3, 1, 0)
albumincox <- coxph(Surv(time, cens) ~ albumin, data=tb)
exp(cbind(OR=coef(albumincox), confint(albumincox)))
##               OR   2.5 %   97.5 %
## albumin 2.039028 1.23801 3.358321

Results: Reproducing Table 2, univariate

In the multivariate model, only BMI approximated the authors’ results.

multicox <- coxph(Surv(time,cens) ~ cd4 + albumin + rand + BMI, data=tb)
exp(cbind(OR=coef(multicox), confint(multicox)))
##                  OR     2.5 %   97.5 %
## cd4       1.6896816 1.0227073 2.791633
## albumin   1.7489728 1.0453274 2.926266
## randweek1 1.6670189 0.9009511 3.084465
## randweek4 1.1960810 0.6253030 2.287867
## BMI       0.9182254 0.8349547 1.009801

Discussion

As consistent with authors’ findings, logrank test was not significant suggesting no difference between survival curves. Additionally, Week1 patients had slightly higher as apparent from the figure, but not statistically significant survival comapared to other groups (non-significant difference by randomization weeks for hazard ratios).

Unfortunately, we can’t really draw conclusions from hazard ratios because I couldn’t reproduce most of them.

Reproducibility

I, and probably many in the class, had difficulties reproducing univariate analyses and the multivariate analysis for many of the variables.

I also wasn’t sure how they chose variables for the multivariate model, although maybe I didn’t read carefully enough??

The plot was easy to reproduce though!