This section contains the setup and the various utility functions used throughout.
Libraries used:
library(data.table)
library(ggplot2)
library(mvtnorm)
library(extraDistr)
library(survival)
library(parallel)
rstan::rstan_options(auto_write = TRUE)
Sys.setenv(LOCAL_CPPFLAGS = '-march=native')
options(mc.cores = 1)
Compiled code (any models used are shown later):
# mod1 <- rstan::stan_model(".//stan//logit_notrand.stan", verbose = FALSE)
# rstan::expose_stan_functions(mod1)
Assume the following:
If not properly accounted for the secular trend can lead to time-confounding. We can illustrate this, and an attempt to verify a remedy to it, via simulation.
First construct a data generation function and visualise the distribution of event times by group when there is a non-zero effect of treatment.
# function to simulate exponential survival data in two groups
get_data <- function(
N0 = 100, N1 = 100,
h0 = -log(0.5)/4,
hr = 1,
admin_censor = 15){
y0 <- rexp(N0, h0)
y1 <- rexp(N1, h0*hr)
d <- data.table(
id = c(1:N0, (N0+1):(N0+N1)),
trt = c(rep(0, N0), rep(1, N1)),
y = c(y0, y1),
evt = 1
)
d[trt == 0, treatment := "Control"]
d[trt == 1, treatment := "Active"]
d[y > admin_censor, `:=`(evt = 0, y = admin_censor)]
d
}
d <- get_data(1000, hr = 0.75, admin_censor = 50)
ggplot(d, aes(x = y, group = trt, col = factor(treatment))) +
geom_density() +
scale_x_continuous("Time to event") +
scale_color_discrete("Group") +
theme_bw() +
theme(legend.position="bottom")
To establish that the true parameter estimate can actually be recovered using the Cox PH model, first assume that all participants are recruited all at the same time so that there are no complications associated with the secular trend. In the following, I assume that there is no treatment effect (the hazard ratio is 1) and it seems that we can obtain an unbiased estimate of the hazard ratio with a type-I error rate within the nominal bounds. We assume that there are 100 participants in each arm. The model used here only includes a term for the treatment effect. Note that the test we use to assess whether a treatment effect was observed is a one-sided test, evaluating whether the p-value for the treatment effect is less than 0.025.
res1 <- data.table(do.call(rbind, mclapply(1:1000, FUN = function(j){
# all recruited at the same time
d <- get_data(
N0 = 100, N1 = 100,
h0 = -log(0.5)/4,
hr = 1,
admin_censor = 15
)
f1 <- coxph(Surv(y, evt)~trt, data = d)
pars <- summary(f1)$coef
win <- pars[5] < 0.025
hr <- pars[2]
c(win=win, hr=hr)
}, mc.cores = 8)))
ggplot(res1, aes(x = hr)) +
geom_density() +
scale_x_continuous("Estimated HR") +
geom_vline(xintercept = 1, linewidth = 0.7, col =2) +
theme_bw()
The figure shows that the distribution of the estimated hazard ratio is centred around one; the mean value is 1.01. Additionally, the type-I error rate is 0.02, which is close to the nominal 0.025 that we desire.
Now, consider the same approach to the analysis (with a single covariate for treatment) but now we also assume a temporal trend in the background hazard of all-cause mortality over the duration of the study. Also, while we assume that there are 100 participants in each arm, this time they are assumed to be distributed over the study recruitment window of four years. For the example, assume that the study runs over four years and the enrolment pattern for the control arm is \((48,32,17,3)\) participants over the four years and the enrolment pattern for the new intervention arm is \((3,17,32,48)\). For the trend, we will assume that the median time to event increases linearly from 4 at the start of the study to 5.5 in the last year of the study.
res2 <- data.table(do.call(rbind, mclapply(1:1000, FUN = function(j){
# recruited at year 1
d1 <- get_data(
N0 = 48, N1 = 3,
h0 = -log(0.5)/4,
hr = 1,
admin_censor = 15
)
d1[, cohort := 1]
d2 <- get_data(
N0 = 32, N1 = 17,
h0 = -log(0.5)/4.5,
hr = 1,
admin_censor = 15
)
d2[, cohort := 2]
d3 <- get_data(
N0 = 17, N1 = 32,
h0 = -log(0.5)/5,
hr = 1,
admin_censor = 15
)
d3[, cohort := 3]
d4 <- get_data(
N0 = 3, N1 = 48,
h0 = -log(0.5)/5.5,
hr = 1,
admin_censor = 15
)
d4[, cohort := 4]
d <- rbind(d1, d2, d3, d4)
f1 <- coxph(Surv(y, evt)~trt, data = d)
pars <- summary(f1)$coef
win <- pars[5] < 0.025
hr <- pars[2]
c(win=win, hr=hr)
}, mc.cores = 8)))
ggplot(res2, aes(x = hr)) +
geom_density() +
scale_x_continuous("Estimated HR") +
geom_vline(xintercept = 1, linewidth = 0.7, col =2) +
theme_bw()
The figure shows that the distribution of the estimated hazard ratio is no longer centred around one; the mean value is 0.86. That is, we are flagging a treatment effect, when by construction we know that one does not exist. Accordingly, the type-I error rate is now 0.11, which is inflated relative to the base case.
Finally, we attempt to ameliorate this issue by including a term in the model to reflect the year of enrolment.
res3 <- data.table(do.call(rbind, mclapply(1:1000, FUN = function(j){
# recruited at year 1
d1 <- get_data(
N0 = 48, N1 = 3,
h0 = -log(0.5)/4,
hr = 1,
admin_censor = 15
)
d1[, cohort := 1]
d2 <- get_data(
N0 = 32, N1 = 17,
h0 = -log(0.5)/4.5,
hr = 1,
admin_censor = 15
)
d2[, cohort := 2]
d3 <- get_data(
N0 = 17, N1 = 32,
h0 = -log(0.5)/5,
hr = 1,
admin_censor = 15
)
d3[, cohort := 3]
d4 <- get_data(
N0 = 3, N1 = 48,
h0 = -log(0.5)/5.5,
hr = 1,
admin_censor = 15
)
d4[, cohort := 4]
d <- rbind(d1, d2, d3, d4)
f1 <- coxph(Surv(y, evt)~trt + factor(cohort), data = d)
pars <- summary(f1)$coef
win <- pars[1,5] < 0.025
hr <- pars[1,2]
c(win=win, hr=hr)
}, mc.cores = 8)))
ggplot(res3, aes(x = hr)) +
geom_density() +
scale_x_continuous("Estimated HR") +
geom_vline(xintercept = 1, linewidth = 0.7, col =2) +
theme_bw()
With adjustment for the year of enrolment, the distribution of the estimated hazard ratio for the treatment term is again centred around one; the mean value is 1.03. Additionally, the type-I error rate is controlled at 0.04 which is close to the nominal value of 0.025 that we desire.