David Lee
“Reproduce Figure 2 and Table 2 (you can skip the hepatotoxicity row of Table 2) of this paper, as well as the results of the log-rank test reported in the caption of Figure 2. You don’t have to calculate the risk set shown at the bottom of Figure 2, although it is recommended that you make sure your risk set matches.”"
I read in the data with library readxl. I also use survminer and survival libraries; the former is used for call ggplot2 functions to plot survival curves.
library(readxl)
## Warning: package 'readxl' was built under R version 3.3.2
library(survminer)
## Warning: package 'survminer' was built under R version 3.3.2
## Loading required package: ggplot2
library(survival)
setwd("C:/Users/David/Desktop")
tb <- readxl::read_excel("journal.pone.0122587.s002.XLSX")
colnames(tb) <- make.names(colnames(tb))
Clean the data using Prof. Waldron’s code
tb$time <- tb$Exp.wk.48
tb$cens <- tb$Exp.48 == "Yes"
tb$rand <- factor(tb$Random, levels=1:3, labels=c("week1", "week4", "week8"))
tb$rand <- relevel(tb$rand, ref="week8")
We fit the survival curve.
We plot it next with the ggsurvplot function, which builds off ggplot2.
ggsurvplot(fit, risk.table = TRUE, linetype=1:3,
xlab="Weeks after TB therapy",
ylab="Probability of Survival",
risk.table.title="Number at risk (death)"
)
We then use the survidiff function to examine the log-rang test from the caption in the article. We reproduce the log-rank test in the caption of Fig2 are also consistent (Chi-sq of 2.9, 2 DF, p = .2)
survdiff(Surv(time, cens) ~ rand, data=tb)
I reproduced the univariate analysis results in Table 2, starting with “Randomization weeks”
randcox <- coxph(Surv(time,cens) ~ rand, data=tb)
exp(cbind(OR=coef(randcox), confint(randcox)))
Then BMI:
BMIcox <- coxph(Surv(time,cens) ~ BMI, data=tb)
exp(cbind(OR=coef(BMIcox), confint(BMIcox)))
CD4:
tb$cd4 <- ifelse(tb$CD4.0 < 50, 1, 0)
cd4cox <- coxph(Surv(time, cens) ~ cd4, data=tb)
exp(cbind(OR=coef(cd4cox), confint(cd4cox)))
Albumin:
tb$albumin <- ifelse(tb$Albumin < 3, 1, 0)
albumincox <- coxph(Surv(time, cens) ~ albumin, data=tb)
exp(cbind(OR=coef(albumincox), confint(albumincox)))
“The factors which had P-value <= 0.1 in univariate analysis and of interest were included in the multivariate model.” - Significant included: CD4, Serum, BMI, Hepatoxicity (which we don’t have) - But Randomization weeks were non-significant so they must have been “of interest”??
multicox <- coxph(Surv(time,cens) ~ cd4 + albumin + rand + BMI, data=tb)
exp(cbind(OR=coef(multicox), confint(multicox)))
The plot resembles the authors’.
I also successfully reproduced the log-rank test in the caption of Fig2.
survdiff(Surv(time, cens) ~ rand, data=tb)
## Call:
## survdiff(formula = Surv(time, cens) ~ rand, data = tb)
##
## n=478, 5 observations deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## rand=week8 155 17 21.5 0.959 1.465
## rand=week1 163 27 20.9 1.783 2.684
## rand=week4 160 20 21.6 0.113 0.172
##
## Chisq= 2.9 on 2 degrees of freedom, p= 0.235
I was able to reproduce results for randomization week.
randcox <- coxph(Surv(time,cens) ~ rand, data=tb)
exp(cbind(OR=coef(randcox), confint(randcox)))
## OR 2.5 % 97.5 %
## randweek1 1.640427 0.8940465 3.009912
## randweek4 1.177740 0.6169355 2.248325
Likewise for BMI:
BMIcox <- coxph(Surv(time,cens) ~ BMI, data=tb)
exp(cbind(OR=coef(BMIcox), confint(BMIcox)))
## OR 2.5 % 97.5 %
## BMI 0.9002232 0.8209125 0.9871964
Unfortunately not for CD4:
tb$cd4 <- ifelse(tb$CD4.0 < 50, 1, 0)
cd4cox <- coxph(Surv(time, cens) ~ cd4, data=tb)
exp(cbind(OR=coef(cd4cox), confint(cd4cox)))
## OR 2.5 % 97.5 %
## cd4 1.96621 1.199807 3.222169
Nor albumin:
tb$albumin <- ifelse(tb$Albumin < 3, 1, 0)
albumincox <- coxph(Surv(time, cens) ~ albumin, data=tb)
exp(cbind(OR=coef(albumincox), confint(albumincox)))
## OR 2.5 % 97.5 %
## albumin 2.039028 1.23801 3.358321
In the multivariate model, only BMI approximated the authors’ results.
multicox <- coxph(Surv(time,cens) ~ cd4 + albumin + rand + BMI, data=tb)
exp(cbind(OR=coef(multicox), confint(multicox)))
## OR 2.5 % 97.5 %
## cd4 1.6896816 1.0227073 2.791633
## albumin 1.7489728 1.0453274 2.926266
## randweek1 1.6670189 0.9009511 3.084465
## randweek4 1.1960810 0.6253030 2.287867
## BMI 0.9182254 0.8349547 1.009801
As consistent with authors’ findings, logrank test was not significant suggesting no difference between survival curves. Additionally, Week1 patients had slightly higher as apparent from the figure, but not statistically significant survival comapared to other groups (non-significant difference by randomization weeks for hazard ratios).
Unfortunately, we can’t really draw conclusions from hazard ratios because I couldn’t reproduce most of them.
I, and probably many in the class, had difficulties reproducing univariate analyses and the multivariate analysis for many of the variables.
I also wasn’t sure how they chose variables for the multivariate model, although maybe I didn’t read carefully enough??
The plot was easy to reproduce though!