This section contains the setup and the various utility functions used throughout.
Libraries used:
library(data.table)
library(ggplot2)
library(ggrepel)
library(scico)
library(cpsurvsim)
library(survival)
library(parallel)
sl <- scico(palette = "lajolla", n = 9)
Compiled code (any models used are shown later):
# mod1 <- rstan::stan_model(".//stan//logit_notrand.stan", verbose = FALSE)
# rstan::expose_stan_functions(mod1)
A common approach for binary events e.g. mortality at 14 days is logistic regression among patients admitted to hospital for COVID19. Alternatively, we could use survival analysis, which can have higher efficiency and there increased power.
There has been some commentary on which of these is more appropriate in acute settings. However, the objective of the study should drive which approach is used. Survival methods test the rate at which patients die, but in an acute setting does it actually matter if you die at 5 versus 9 days? What is probably of more interest is whether you die and this would tend to lead use of the logistic regression approach, although you can get a (KM?) estimate from a coxph model (if there are events at the time of interest).
If there are a lot of patients that are censored before the day 28 cutoff then the estimates from the logistic regression will be biased. However, if there is censoring before day 28 then this is possibly going to be informative, which can also create problems if not accounted for.
Let’s say the primary endpoint is a 7 point ordinal scale, assessed on day 14, ranging from (1) not hospitalised and no limitations to (7) death. The complete set of categories are 1. Not hospitalised, no limitations on activities 2. Not hospitalised, limitation on activities 3. Hospitalised, not requiring supplemental oxygen 4. Hospitalised, requiring supplemental oxygen 5. Hospitalised, on non-invasive ventilation or high flow oxygen devices 6. Hospitalised, on invasive mechanical ventilation or extracorporeal membrane oxygenation (ECMO) 7. Death
Assume that empirically the probability of being in each group maps to some latent continuous score as follows.
p_tru <- c(0.01,0.30,0.34,0.155,0.048,0.04,0.107)
d1 <- data.table(cat = 1:6, cuts = qlogis(cumsum(p_tru))[1:6])
ggplot(d1, aes(x = cuts)) +
geom_vline(aes(xintercept = cuts), lwd = 0.3, lty = 2) +
geom_text_repel(aes(x = cuts, y = 0.1, label = paste0(cat)),
box.padding = 2.5,
max.overlaps = Inf,
nudge_x = 0.2,
ylim = c(0.1, Inf), col = 2,
size = 3, inherit.aes = F) +
stat_function(fun = dlogis, n = 101) +
scale_x_continuous("log-odds", lim = c(-5, 5)) +
scale_y_continuous("Density") +
theme_bw()
Figure 1: Figure: Location of cutpoints on logistic density based on probability of score at day 14
The verticals show the cut-points, which reflect the upper bound of being in each group as indicated by the labels.
The data suggested that probability of being in group 1 is zero for day 14, but at day 28, the probability of being in group 1 is about 2%. Zeros are problematic, so for the purposes of this exercise, I assume that the probability of being in group 1 at day 14 is 1% and dropped the probability of being in group 2 at day 14 to 30%. About 70% of the patients are still hospitalised at day 28.
If we concentrate purely on death, then if the cumulative probability of being dead by day 10.6%, the survival probability is \(S(14) = 1 - 0.107 = 0.893\). Moreover, if you are still in hospital after day 14, then by day 28, the cumulative probability of being dead will have increased to 12.6%, implying \(S(28) = 1 - 0.126 = 0.874\). Note that here I am assuming that the 12.6% relates to all those that are hospitalised in the first place (rather than 12.6% of those still hospitalised at day 14), but the information that I have is not definitive on whether this is the correct interpretation.
Anyway, if \(S(14) = 0.893\) then \(H(14) = -\log(0.893) = 0.113\), which implies that \(h(t) = 0.008\), or thereabouts, if we assume a constant hazard. Similarly, \(H(28) = -\log(0.874) = 0.135\), so for \(14 < t\) assume \(h(t) = 0.002\) giving you a piecewise cumulative hazard that looks like
l0_14 <- -log(0.893)/14
l0_28 <- (-log(0.874) + log(0.893))/14
x1 <- seq(0, 14, len = 100)
y1 <- x1 * l0_14
y2 <- y1[100] + (x1) * l0_28
y2 <- y2[-1]
y <- c(y1, y2)
x <- seq(0, 28, len = 199)
d1 <- data.table(x, y)
ggplot(d1, aes(x = x, y = y)) +
geom_line() +
geom_hline(yintercept = -log(0.874), lwd = 0.2, lty = 2) +
scale_x_continuous("Days") +
scale_y_continuous("Cumulative hazard") +
theme_bw()
Figure 2: Figure: Assumed piecewise cumulative hazard
And we can simulate a large number of event times from this distribution to verify that the observed number of events reflects what it should be.
N <- 100000
d2 <- data.table(exp_cdfsim(N, endtime = 28, theta = c(l0_14, l0_28), tau = 14))
setnames(d2, "censor", "status28")
d2[, status14 := ifelse(time < 14, 1, 0)]
# censoring indicator (0 = censored, 1 = event).
sum(d2$status14)/N
sum(d2$status28)/N
## [1] 0.10622
## [1] 0.12516
However, while the hazard after day 14 drops, you would anticipate it to continue to fall after day 28, otherwise the survival times would drop off well beyond what is sensible. As you can see below, if we assumed the same hazard after day 28, then by the end of the year survival is down to about 50%.
N <- 10000
d2 <- data.table(exp_cdfsim(N, endtime = 10000, theta = c(l0_14, l0_28), tau = 14))
sum(d2$censor)
## [1] 10000
lm_coxph <- coxph(Surv(time, censor)~1, data = d2)
dfig <- data.table(
x = seq(0, 365, len = 366),
p = summary(survfit(lm_coxph), time=seq(0, 365, len = 366))$surv
)
ggplot(dfig, aes(x = x, y = p)) +
geom_line() +
scale_x_continuous("Days") +
scale_y_continuous("Survival") +
theme_bw()
Figure 3: Figure: Survival time density
Anyway, putting the above worries aside, let’s say we is a treatment that might reduce the risk of death by 5, 15 or 25%. This means that the 12% risk of death could reduce to 11.97, 10.71 or 9.45% respectively at day 28, with survival 88.03, 89.29 and 90.55% and cumulative hazards 0.127, 0.113 and 0.099. From these I can compute the piecewise hazards in the treatment group for each effect size:
p0 <- 0.126
gg <- c(0.95, 0.85, 0.75)
p1 <- p0 * gg
H0_14 <- -log(0.893)
H0_28 <- -log(0.874)
H1_28 <- -log(1 - p1)
H1_14 <- (H0_14 * H1_28 / H0_28)
l1_14 <- H1_14/14
l1_28 <- (H1_28 - H1_14)/14
l1_28
## [1] 0.001454232 0.001292126 0.001132292
which can be sanity checked via
N <- 1000000
d2 <- rbind(
data.table(exp_cdfsim(N, endtime = 28, theta = c(l1_14[1], l1_28[1]), tau = 14)),
data.table(exp_cdfsim(N, endtime = 28, theta = c(l1_14[2], l1_28[2]), tau = 14)),
data.table(exp_cdfsim(N, endtime = 28, theta = c(l1_14[3], l1_28[3]), tau = 14))
)
d2[, p1 := rep(p1, each = N)]
setnames(d2, "censor", "status28")
d2[, status14 := ifelse(time < 14, 1, 0)]
# censoring indicator (0 = censored, 1 = event).
d2[, .(p28 = mean(status28)), by = p1]
## p1 p28
## 1: 0.1197 0.119144
## 2: 0.1071 0.107530
## 3: 0.0945 0.094571
and now I have the information that I need to evaluate the power using time to event versus logistic regression.
Start by assuming that there is little censoring. Patients are universally followed to day 28 and we know their status at that time (alive/dead).
run_sim <- function(
N = 1100,
theta0 = c(l0_14, l0_28),
theta1 = c(l1_14[1], l1_28[1]),
nsim = 1000,
etime = 28,
indep_cen = TRUE,
pr_indep_cen = 0.05) {
res <- mclapply(
1:nsim, mc.cores = 10,
FUN = function(i){
d0 <- data.table(exp_cdfsim(N, endtime = Inf, theta = theta0, tau = 14))
d1 <- data.table(exp_cdfsim(N, endtime = Inf, theta = theta1, tau = 14))
d <- rbind(d0, d1)
d[, x := rep(0:1, each = N)]
# introduce an additional indep censoring mech
# say 10% => h = -log(0.9)/28
if(indep_cen){
d[, time_cen := rexp(nrow(d), -log(1-pr_indep_cen)/etime) ]
d[, censor := ifelse(time > time_cen, 0, 1)]
d[, time_obs := time]
d[censor == 0, time_obs := time_cen]
# in those that are observed up to the indep censoring
# censor those with time > 28
d[censor == 1, censor := ifelse(time_obs > etime, 0, 1)]
d[censor == 0 & time_obs > etime, time_obs := etime]
} else {
d[, time_cen := etime]
d[, time_obs := time]
d[censor == 1, censor := ifelse(time_obs > etime, 0, 1)]
d[censor == 0, time_obs := etime]
}
dn <- d[, .(y = sum(censor), n = .N), by = x]
lm_logistic <- glm(cbind(y, n-y)~x, data = dn, family = binomial)
lm_coxph <- coxph(Surv(time_obs, censor)~x, data = d)
s1 <- summary(lm_logistic)$coef
s2 <- summary(lm_coxph)$coef
p_logit <- as.numeric(
predict(lm_logistic, type = "response", newdata = data.table(x = 0:1))
)
xx <- tryCatch({
summary(survfit(lm_coxph, newdata = data.table(x = 0)), time=etime)
}, error=function(e) {
return(NA)
})
if(any(is.na(xx))){
m <- summary(survfit(lm_coxph, newdata = data.table(x = 0)))
p1_cox <- 1 - m$surv[length(m$surv)]
} else{
p1_cox <- 1 - summary(
survfit(lm_coxph, newdata = data.table(x = 0)), time=etime)$surv
}
xx <- tryCatch({
summary(survfit(lm_coxph, newdata = data.table(x = 1)), time=etime)
}, error=function(e) {
return(NA)
})
if(any(is.na(xx))){
m <- summary(survfit(lm_coxph, newdata = data.table(x = 1)))
p2_cox <- 1 - m$surv[length(m$surv)]
} else{
p2_cox <- 1 - summary(
survfit(lm_coxph, newdata = data.table(x = 1)), time=etime)$surv
}
win <- c(logistic = s1[2, 4] < 0.05,
coxph = s2[1, 5] < 0.05,
p_logit0 = p_logit[1],
p_logit1 = p_logit[2],
p_coxph0 = p1_cox,
p_coxph1 = p2_cox)
win
})
dres <- data.table(do.call(rbind, res))
dres
}
Under this setting, the power for detecting a treatment effect is about the same.
p0 <- 0.126
l0 <- qlogis(p0)
p1 <- c(0.114, 0.102, 0.09)
l1 <- qlogis(p1)
lor <- l1 - l0
effect_idx <- 3
dres1 <- run_sim(
N = 1100,
theta0 = c(l0_14, l0_28),
theta1 = c(l1_14[effect_idx], l1_28[effect_idx]),
nsim = 5000, etime = 28, indep_cen = FALSE, pr_indep_cen = NA
)
colMeans(dres1)
## logistic coxph p_logit0 p_logit1 p_coxph0 p_coxph1
## 0.65640000 0.65600000 0.12617436 0.09453400 0.12614285 0.09451454
Additionally, the estimates for the probability of death (at day 28) are the same.
dfig <- melt(dres1[, 3:6], measure.vars = names(dres1)[3:6])
dfig[, group := substr(variable, 3, 7)]
dfig[, trt := substr(variable, 8, 8)]
ggplot(dfig, aes(x = value, col = group)) +
geom_density() +
scale_x_continuous("Probability of death") +
scale_y_continuous("Density") +
scale_colour_discrete("Group") +
theme_bw() +
theme(legend.position = "bottom") +
facet_wrap(~paste0("Treatment ", trt))
Figure 4: Figure: Distribution for probability of death estimated under logistic and coxph
Now add some censoring, that is assume there are patients that are censored before day 28. A survival analysis assumes that these patients are similar in their risk of the outcome to those that continue to be observed. However, the reality is that these are probably going to be more or less sick than those that remain in the study and so the censoring is possibly informative.
dres1 <- run_sim(
N = 1100,
theta0 = c(l0_14, l0_28),
theta1 = c(l1_14[effect_idx], l1_28[effect_idx]),
nsim = 5000, etime = 28, indep_cen = TRUE, pr_indep_cen = 0.05
)
colMeans(dres1)
## logistic coxph p_logit0 p_logit1 p_coxph0 p_coxph1
## 0.63580000 0.63740000 0.12391473 0.09293636 0.12593574 0.09447457
dfig <- melt(dres1[, 3:6], measure.vars = names(dres1)[3:6])
dfig[, group := substr(variable, 3, 7)]
dfig[, trt := substr(variable, 8, 8)]
ggplot(dfig, aes(x = value, col = group)) +
geom_density() +
scale_x_continuous("Probability of death") +
scale_y_continuous("Density") +
scale_colour_discrete("Group") +
theme_bw() +
theme(legend.position = "bottom") +
facet_wrap(~paste0("Treatment ", trt))
Figure 5: Figure: Distribution for probability of death estimated under logistic and coxph
Predictably, the power is still about the same, but we are starting to see some bias in the probability estimates derived from the logistic regression.
So, for a short follow up in a captive audience, where there is little censoring prior to the endpoint and interest is in the proportion of deaths rather than rate of death, logistic seems reasonable and has a more sensible interpretation. However, you would need to accept the fact that you might have a little bias in the parameter estimates, but survival analyses are not immune to this either, especially where you have informative censoring.