Time-to-event Analysis in Clinical Trials

2024-10-01

Overview

Statistical models
Compare two survival curves
Treatment switching

Statistical models

Overview: Time-to-event analyses

Survival data can be described by four related functions:

survival function \(S(t)\)
hazard function \(h(t)\)
probability density function \(f(t)\)
cumulative distribution function \(F(t)\)

Math entities and transformation between them
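
As a reference for these transformations, the standard identities for a continuous event time \(T\) are collected here. The symbols \(S\), \(h\), \(f\), \(F\) stand for the four functions listed above, and \(H\) for the cumulative hazard; this notation is assumed for the summary.

\[F(t) = P(T \le t) = 1 - S(t), \qquad f(t) = \frac{dF(t)}{dt} = -\frac{dS(t)}{dt}\]

\[h(t) = \frac{f(t)}{S(t)} = -\frac{d}{dt}\log S(t), \qquad H(t) = \int_0^t h(u)\,du, \qquad S(t) = \exp(-H(t))\]

Knowing any one of the four functions is therefore enough to recover the other three.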

Overview: Time-to-event analyses

Main assumption: non-informative censoring
Additional simplifying assumptions:
- No cohort effect on survival
- Right censoring only
- Events are independent of each other

Kaplan-Meier curve

Non-parametric method to estimate the survival probability directly from the observed survival times.
Does not require any distributional assumption.
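
The product-limit calculation behind the Kaplan-Meier curve can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function name `kaplan_meier` and the six-subject data set are hypothetical and are not taken from any trial discussed here.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up time per subject
    events -- 1 if the event was observed, 0 if right-censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    surv, curve = 1.0, []
    for t in sorted(deaths):
        # Subjects still under observation just before time t
        at_risk = sum(1 for u in times if u >= t)
        # Product-limit step: multiply by the conditional survival at t
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve


# Hypothetical data: six subjects, two of them censored (event = 0)
times = [2, 3, 3, 5, 7, 8]
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 4))
```

Censored subjects contribute to the risk set up to their censoring time but never trigger a drop in the curve, which is how the estimator uses partial follow-up.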

Proportional hazards models

Proportional hazards assumption: Hazard can vary, but hazard ratio of two individuals (at the same time) is constant.
Assessing PH assumption
- Kaplan-Meier plot
- Plot \(log(-log(S(t)))\) against (function of) \(t\)
- Schoenfeld residuals:
  - Plot residuals against (function of) time
  - Grambsch-Therneau test
- Time-by-covariate interactions

Handling violation of PH assumption

Time-by-covariate interactions

Stratified Cox regression

Accelerated failure-time model

Cox PH model

Hazard function: \[h(t|X) = h_0(t)e^{\beta X}\]

\(h_0\): baseline hazard
\(e^{\beta X}\): function of covariates
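
One way to see why this model implies proportional hazards: for two individuals with covariate vectors \(X_1\) and \(X_2\) (labels introduced here for illustration), the baseline hazard cancels in the ratio.

\[\frac{h(t|X_1)}{h(t|X_2)} = \frac{h_0(t)e^{\beta X_1}}{h_0(t)e^{\beta X_2}} = e^{\beta (X_1 - X_2)}\]

The ratio does not depend on \(t\), so each coefficient in \(\beta\) can be read as a log hazard ratio per unit change in the corresponding covariate.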

Extended Cox model for time-varying covariates

\[h(t|X) = h_0(t)e^{\beta X(t)}\]

When explanatory variables are collected at more than one time point and change over time, it may be more appropriate to use time-varying covariates. Doing so uses all available measurements rather than only the baseline values.

Data for time-varying covariate Cox model need to be in long-table format. Follow-up period of a subject is partitioned into sub-intervals (tstart, tstop] based on measurement times of the covariate(s), and each sub-interval is presented as one row in data frame.
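
The partitioning into (tstart, tstop] rows can be sketched as follows. This is an illustration only: the helper `to_counting_process`, the column names and the measurement values are hypothetical, and the sketch assumes the first measurement is taken at time 0.

```python
def to_counting_process(subject_id, follow_up, event, measurements):
    """Split one subject's follow-up into (tstart, tstop] rows.

    follow_up    -- time of event or censoring
    event        -- 1 if the event was observed at follow_up, else 0
    measurements -- (time, value) pairs sorted by time, first one at time 0
    """
    rows = []
    for k, (tstart, value) in enumerate(measurements):
        if tstart >= follow_up:
            break  # measurement taken after the end of follow-up
        has_next = k + 1 < len(measurements)
        next_time = measurements[k + 1][0] if has_next else follow_up
        tstop = min(next_time, follow_up)
        rows.append({
            "id": subject_id,
            "tstart": tstart,
            "tstop": tstop,
            # The event can only occur in the interval that ends at follow_up
            "event": int(event and tstop == follow_up),
            "x": value,
        })
    return rows


# Hypothetical subject: covariate measured at months 0, 4 and 7, event at month 10
for row in to_counting_process(1, 10, 1, [(0, 1.2), (4, 1.8), (7, 2.5)]):
    print(row)
```

Each row carries the covariate value that was current during its interval, and only the last row of a subject with an observed event has the event flag set.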

Although a given subject has multiple observations, we generally do not need to worry about correlated data, as this data representation is simply a programming trick. The likelihood equations at any time point use only one copy of any subject; the program picks out the correct row of data at each time. However, when subjects have multiple events, the rows for the events are correlated within subject and a cluster variance is needed.

Parametric models

Parametric proportional hazards model: the baseline hazard function is specified
Accelerated failure-time (AFT) models: \[\log T = Y = \beta X + W \] where \(T\) is the event time, \(X\) is the covariate vector, \(W\) is the random error, and \(\beta\) is the vector of regression parameters (log time ratios, i.e. log acceleration factors)
- Hazard-based form: \[\lambda(t|X) = \exp(-\beta X) \lambda_0 (\exp(-\beta X)t)\]
where \(\lambda_0(t)\) is baseline hazard function corresponding to \(X = 0\)
- Assumptions:
  - Constant-over-time log time ratio (i.e. log acceleration factor)
  - Linear relationship between each continuous covariate and the log event time.
- Model fit assessment:
  - Information criteria: AIC, BIC
  - Plot the model-based cumulative hazard against the KM estimated cumulative hazard.

In contrast to the Cox PH model, the parametric PH models specify baseline hazard function.
The AFT model provides an alternative to the proportional hazards (PH) model to analyze time-to-event data. In the conventional AFT model, the natural logarithm of the event time, \(\log T\), is modeled as a linear function of the covariate vector.

The main difference between AFT models and PH models is that AFT models assume that the effects of covariates are multiplicative on the time scale, while PH models assume they are multiplicative on the hazard scale.

The choice of the appropriate parametric form is the most difficult part of parametric survival analysis. The specification of the parametric form should be driven by the study hypothesis, along with prior knowledge and biologic plausibility of the shape of the baseline hazard. Data-driven approach:

The most important component of assessing parametric model fit is to check whether the data supports the specified parametric form. This can be assessed visually by graphing the model-based cumulative hazard against the Kaplan-Meier estimated cumulative hazard function. If the specified form is correct, the graph should go through the origin with a slope of 1.

Compare two survival curves

Null hypothesis: the risk of mortality after treatment A is the same as the risk of mortality after treatment B at all time points.

Quantify the difference in treatment benefits:
- hazard ratio (HR)
- median survival time (MT)
- the (cumulative) survival rate
- restricted mean survival time (RMST).

Log-rank test

Test statistic: \[Z= \frac{\sum_{j=1}^{k}(O_j-E_j)}{\sqrt{\sum_{j=1}^{k}V_j}} =\frac{\sum_{j=1}^{k}(d_{1,j} - d_j\frac{n_{1,j}}{n_j})} {\sqrt{\sum_{j=1}^{k}\frac{n_{0,j}n_{1,j}d_j(n_j-d_j)}{n_j^2(n_j-1)}}}\]
Common and classical choice under proportionality assumption
Non-proportionality: power loss
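
The statistic above can be computed directly from the risk-set counts at each event time. The following is a minimal sketch in plain Python; the function name `logrank_z` and the eight-subject data set are hypothetical, and no p-value is computed.

```python
import math


def logrank_z(times, events, groups):
    """Standardized log-rank statistic Z comparing group 1 with group 0."""
    num, var = 0.0, 0.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for tj in event_times:
        # Risk-set sizes n_{0,j}, n_{1,j} just before t_j
        n0 = sum(1 for t, g in zip(times, groups) if t >= tj and g == 0)
        n1 = sum(1 for t, g in zip(times, groups) if t >= tj and g == 1)
        n = n0 + n1
        # Events at t_j: d_j overall and d_{1,j} in group 1
        d = sum(1 for t, e in zip(times, events) if t == tj and e == 1)
        d1 = sum(1 for t, e, g in zip(times, events, groups)
                 if t == tj and e == 1 and g == 1)
        num += d1 - d * n1 / n          # O_j - E_j
        if n > 1:                       # V_j is undefined when n_j = 1
            var += n0 * n1 * d * (n - d) / (n ** 2 * (n - 1))
    return num / math.sqrt(var)


# Hypothetical data: group 0 fails early, group 1 fails late, no censoring
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1] * 8
groups = [0, 0, 0, 0, 1, 1, 1, 1]
print(round(logrank_z(times, events, groups), 2))
```

A negative value means group 1 had fewer events than expected under the null hypothesis; swapping the group labels flips the sign.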

Weighted Log-rank test

Test statistic: \[Z=\frac{\sum_{j=1}^{k}w_j(O_j-E_j)}{\sqrt{\sum_{j=1}^{k}w_j^2V_j}}= \frac{\sum_{j=1}^{k}w_j(d_{1,j} - d_j\frac{n_{1,j}}{n_j})} {\sqrt{\sum_{j=1}^{k}w_j^2\frac{n_{0,j}n_{1,j}d_j(n_j-d_j)}{n_j^2(n_j-1)}}}\]

Weighted log-rank test statistics take the form of a weighted sum of the differences between the estimated hazard functions at each observed failure time.

Test whether the hazard difference is zero between the treatment group and the control group.
In standard log-rank test: \(w_j = 1\)
In the non-PH setting: the relative differences of the two hazard functions are not constant over time \(\rightarrow\) differential weighting at different time points (compared to equal weighting in the log-rank statistic) can improve the efficiency of the test statistic.

Weighted Log-rank test (cont.)

The Fleming-Harrington \((\rho, \gamma)\) test uses weights: \(w_j = FH(\rho, \gamma) = \hat{S}(t_j-)^\rho (1-\hat{S}(t_j-))^\gamma\)

\(\hat{S}(t)\): Kaplan-Meier estimate of the survival curve in pooled data (both treatment arms)

time \(t_j-\) is the time just before \(t_j\)
\(FH(0,0)\): the log-rank statistic, most powerful under the proportional hazards assumption
\(FH(\rho, 0)\) with \(\rho > 0\): early separation (diminishing effect)
\(FH(0, \gamma)\) with \(\gamma > 0\): late separation (delayed effect)
\(FH(\rho, \gamma)\) with \(\rho = \gamma > 0\): the biggest separation of two hazard functions occurs in the middle
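
To make the weighting patterns concrete, the sketch below evaluates the Fleming-Harrington weights at each event time from the pooled Kaplan-Meier estimate. It is an illustration in plain Python; the function name `fh_weights` and the data are hypothetical.

```python
def fh_weights(times, events, rho, gamma):
    """Return {t_j: w_j} with w_j = S(t_j-)^rho * (1 - S(t_j-))^gamma.

    S is the Kaplan-Meier estimate from the pooled sample (both arms).
    """
    weights = {}
    surv = 1.0  # holds S(t_j-), the survival just before the current event time
    for tj in sorted({t for t, e in zip(times, events) if e == 1}):
        weights[tj] = surv ** rho * (1.0 - surv) ** gamma
        at_risk = sum(1 for t in times if t >= tj)
        d = sum(1 for t, e in zip(times, events) if t == tj and e == 1)
        surv *= 1.0 - d / at_risk  # update to S(t_j) for the next event time
    return weights


# Hypothetical pooled data (both arms combined)
times = [2, 3, 3, 5, 7, 8]
events = [1, 1, 0, 1, 0, 1]
for rho, gamma in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    w = fh_weights(times, events, rho, gamma)
    print((rho, gamma), {t: round(v, 3) for t, v in w.items()})
```

With \((1,0)\) the weights shrink as the pooled survival falls, so early differences dominate; with \((0,1)\) they grow over time, so late differences dominate.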

Max-Combo test

Test statistic:
\[Z_{max} = \max_{\rho, \gamma} \{Z_{FH_{(\rho,\gamma)}}\} \] where \(Z_{FH_{(\rho,\gamma)}}\) is the standardized Fleming-Harrington weighted log-rank statistic.
Original MaxCombo test is interested in the combination of \(FH(0,0)\), \(FH(0,1)\), \(FH(1,1)\) and \(FH(1,0)\)

Modified MaxCombo test:

Option 1: \(FH(0,0)\), \(FH(0,0.5)\), \(FH(0.5,0)\), \(FH(0.5,0.5)\): conservative and less sensitive to tail events.
Option 2: \(FH(0,0)\), \(FH(0,0.5)\), \(FH(0.5,0.5)\): if a delayed effect is the only possibility

Require appropriate multiplicity control due to the correlation of test statistics

The treatment effect estimate is HR obtained from the weighted Cox model corresponding to the weighted log-rank test with the smallest p-value.

Methods based on difference in median survival times

Depends on information at the median survival point (when the survival rate is equal to 0.5)
Not applicable when the censoring rate is high or follow-up is too short for the survival rate to reach 0.5.
A test based on a difference in MT is not applicable when the crossing point of the survival curves is located near 0.5.

Methods based on difference in RMST

Does not require the assumption of proportional hazards.
Only calculated up to a specified timepoint.
Crossing survival curves: invalid estimates of treatment differences, reduced power

RMST graphically corresponds to the area under KM curve over a specified time.
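
The area interpretation can be sketched numerically by integrating the Kaplan-Meier step function up to the truncation time. This is an illustration in plain Python: the function name `rmst` and the data are hypothetical, and the sketch simply holds the curve flat after the last event, which is an assumption when the longest follow-up is censored before the truncation time.

```python
def rmst(times, events, tau):
    """Area under the Kaplan-Meier step function on [0, tau]."""
    area, surv, prev = 0.0, 1.0, 0.0
    for tj in sorted({t for t, e in zip(times, events) if e == 1}):
        if tj >= tau:
            break
        area += surv * (tj - prev)  # rectangle up to the next drop
        at_risk = sum(1 for t in times if t >= tj)
        d = sum(1 for t, e in zip(times, events) if t == tj and e == 1)
        surv *= 1.0 - d / at_risk   # Kaplan-Meier drop at t_j
        prev = tj
    return area + surv * (tau - prev)  # final rectangle up to tau


# Hypothetical single-arm data, truncation time tau = 6
times = [2, 3, 3, 5, 7, 8]
events = [1, 1, 0, 1, 0, 1]
print(round(rmst(times, events, 6), 4))
```

The treatment comparison is then the difference between the two arms' values at the same truncation time.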

Although there is no proportional hazards assumption required for this analysis, it does have the major limitation that the area between the curves is only calculated up to a specified time τ, and the statistical significance of the results depends on the chosen timepoint.

Selecting τ based on the observed Kaplan-Meier curves amounts to cherry-picking if the timepoint is chosen to maximize statistical significance. On the other hand, prospectively selecting τ before the study starts can be challenging because of the uncertainties about what the survival curves will look like, and a poor choice could result in dramatically reduced power.

When the two survival curves cross, the differences in RMST before and after the crossing point may offset each other, which invalidates the assessment of the difference in treatment benefit and reduces power.

Methods based on area between two survival curves (ABS)

Reflects the absolute benefit of treatment effects between groups.
Robust regardless of non-proportionality or crossing survival curves.

Example

PFS curves from the KEYNOTE-042 trial: compare pembrolizumab with chemotherapy in first-line, metastatic non–small-cell lung cancer.

Mok TSK (2019). Pembrolizumab versus chemotherapy for previously untreated, PD-L1-expressing, locally advanced or metastatic non-small-cell lung cancer (KEYNOTE-042): A randomised, open-label, controlled, phase 3 trial.

Example (cont.)

Standard log-rank test: not significant, HR = 1.07 (95% CI: 0.94, 1.21)
Late-emphasis weighted log-rank test: reject the null hypothesis in favor of pembrolizumab with a one-sided \(P < .0001\)
Max-Combo test:
- reject the null hypothesis in favor of pembrolizumab (one-sided \(P < .0001\))
- same data, reject the null hypothesis in favor of chemotherapy (one-sided \(P < .0001\))
RMST up to 8 months: rejects the null hypothesis in favor of chemotherapy, with a one-sided \(P < .0001\)

Reference: Freidlin B (2019). Methods for accommodating nonproportional hazards in clinical trials: Ready for primary analysis

Recommendations

Kaplan-Meier curves comprehensively display treatment effects of study arms
HR and log-rank test as primary analysis tools. Under non-proportional hazards, the HR from the primary analysis can still be meaningfully interpreted as an average HR over time unless there is extensive crossing of the survival curves.
Methods for accommodating nonproportional hazards can be useful secondary analyses

Treatment switching

Estimand framework

Reference: Manitz J (2022). Estimands for Overall Survival in Clinical Trials with Treatment Switching in Oncology

Intercurrent events

Reference: Jin M (2020). Estimand framework: Delineating what to be estimated with clinical questions of interest in clinical trials

Mix of treatment switching scenarios

Reference: Manitz J (2022). Estimands for Overall Survival in Clinical Trials with Treatment Switching in Oncology

Estimands in trials with treatment switching

Reference: Manitz J (2022). Estimands for Overall Survival in Clinical Trials with Treatment Switching in Oncology

Treatment crossover

Reference: Latimer NR (2016). Treatment switching: Statistical and decision-making challenges and approaches

Impact of treatment crossover

Simple methods

Intent to treat:
- Pros: maintain randomisation balance \(\rightarrow\) reducing the possibility of bias affecting results
- Cons: underestimate the effect of experimental treatment
Per-protocol (excluding switchers or censoring at switch):
- Cons: prone to selection bias. Randomisation balance is broken if patients with a good or poor prognosis are more likely to switch.
Treatment as a time-varying covariate: \[\lambda_i(t)= \lambda_0(t) \exp[\beta X_i(t)] \] where \(\lambda_0(t)\) is the baseline hazard function, and \(X_i(t) = 0\) while the patient receives the control treatment and \(X_i(t) = 1\) while the patient receives the experimental treatment.
- Cons: prone to selection bias if switching is related to prognosis.

Rank preserving structural failure time model (RPSFT)

Produce counter-factual event times to estimate a causal treatment effect.

Split observed event time for patient \(i\): \(T_i=T_i^{off}+T_i^{on}\), where \(T_i^{off}\) and \(T_i^{on}\) represent the time spent off and on treatment, respectively.

Rank preserving structural failure time model (RPSFT)

Counterfactual event times: \(U_i = T_i^{off}+T_i^{on}*exp(\psi)\), where \(exp(-\psi)\) is acceleration factor.

RPSFT

Estimation:

g-estimation: grid search over candidate values of \(\psi\) to find the ‘true’ treatment effect \(\psi_0\) such that \(U_i\) is independent of the randomized arm \(R_i\).

After identifying \(\psi_0\), calculate survival times adjusted for treatment switching for the control group.
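
The grid search can be sketched as follows: for each candidate \(\psi\), compute the counterfactual times \(U_i\), compare the arms with a log-rank statistic, and keep the value whose statistic is closest to zero. This is a simplified illustration in plain Python, not the procedure of any particular package. The helper names, the grid and the twelve-patient data set are hypothetical, and the toy data contain no censoring and no switching, so re-censoring is not needed.

```python
import math


def logrank_z(times, arms):
    """Log-rank Z for arm 1 vs arm 0, assuming every event is observed."""
    num = var = 0.0
    for tj in sorted(set(times)):
        n = sum(1 for t in times if t >= tj)
        n1 = sum(1 for t, a in zip(times, arms) if t >= tj and a == 1)
        d = sum(1 for t in times if t == tj)
        d1 = sum(1 for t, a in zip(times, arms) if t == tj and a == 1)
        num += d1 - d * n1 / n
        if n > 1:
            var += (n - n1) * n1 * d * (n - d) / (n ** 2 * (n - 1))
    return num / math.sqrt(var)


def z_at(psi, t_off, t_on, arms):
    """Test statistic on the counterfactual scale U = T_off + T_on * exp(psi)."""
    u = [off + on * math.exp(psi) for off, on in zip(t_off, t_on)]
    return logrank_z(u, arms)


def g_estimate(t_off, t_on, arms, grid):
    """Grid value of psi for which the arms look most alike on the U scale."""
    return min(grid, key=lambda psi: abs(z_at(psi, t_off, t_on, arms)))


# Hypothetical trial: arm 0 never treated, arm 1 treated throughout
arms = [0] * 6 + [1] * 6
t_off = [1, 2, 3, 4, 5, 6] + [0] * 6
t_on = [0] * 6 + [3, 5, 7, 9, 11, 13]
grid = [-1.5 + 0.05 * k for k in range(41)]  # psi from -1.5 to 0.5
psi_hat = g_estimate(t_off, t_on, arms, grid)
print(round(psi_hat, 2), round(math.exp(-psi_hat), 2))
```

Because the statistic is a step function of \(\psi\), several grid values can tie; this sketch returns the first one, whereas real implementations typically use a root-finding or interval rule.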

Estimate treatment effect (g-estimation) and untreated (counterfactual) survival times

RPSFT

Re-censoring:

Consider a patient whose observed event time \(T_i\) is extended by switching to a superior treatment, so that the patient is censored, whereas the event would have been observed without the switch. Censoring on the \(U_i\) scale then depends on the treatment received, so moving from the \(T_i\) scale to the \(U_i\) scale requires re-censoring for some patients.

Let \(C_i\) be the administrative censoring time for participant \(i\) on \(T_i\) scale. A participant is recensored (on \(U_i\) scale) at the minimum possible censoring time:

\[D^*_i(\psi)=\min(C_i, C_i \exp(\psi))\]

If \(D^*_i(\psi) < U_i(\psi)\), then set \(U_i = D^*_i\) and the event indicator to 0 (censored).

For a treatment arm in which switching does not occur, there can be no informative censoring, so re-censoring is not applied.

Reference: Allison A (2017). rpsftm: An R Package for Rank Preserving Structural Failure Time Models.

RPSFT

Illustration of calculation of underlying quantities in estimation procedure:

Patients A and B with latent survival time \(U_i\) = 3 months and administrative censoring time \(C_i\) = 4 months. Beneficial active treatment with \(\psi = \ln(0.5)\)
Patient A is randomized to control and crosses over at time \(t_i\) = 2, so is exposed to active treatment for 2 months and has an observed survival time of \(T_i\) = 4 months (3 months + 1 month extra)
Patient B is randomized to active treatment, so is exposed to active treatment from \(t_i\) = 0 to 4 months and would have a survival time of \(T_i\) = 6 months (3 months + 3 months extra, since \(U_i = T_i \exp(\psi)\)), which is administratively censored, so we observe \(T_i\) = 4.
\(D^*_i(\psi)=\min(C_i, C_i \exp(\psi)) = 2\) months, so both patients are recensored at 2 months
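
The illustration can be reproduced numerically from the observed quantities. The sketch below is a minimal version in plain Python; the function name `recensor` and its argument names are hypothetical, and it uses only the observed on- and off-treatment times of patients A and B.

```python
import math


def recensor(t_off, t_on, event, admin_censor, psi):
    """Return (time, event indicator) on the counterfactual U scale."""
    u = t_off + t_on * math.exp(psi)  # counterfactual time U_i
    d_star = min(admin_censor, admin_censor * math.exp(psi))
    if d_star < u:
        return d_star, 0              # re-censored at D*_i
    return u, event


psi = math.log(0.5)
# Patient A: 2 months off treatment, 2 months on, event observed at 4 months
patient_a = recensor(t_off=2, t_on=2, event=1, admin_censor=4, psi=psi)
# Patient B: on treatment throughout, administratively censored at 4 months
patient_b = recensor(t_off=0, t_on=4, event=0, admin_censor=4, psi=psi)
print(patient_a, patient_b)
```

Patient A's observed event is replaced by a censored time because \(D^*_i\) falls below the counterfactual event time of 3 months, while patient B's censored time already equals \(D^*_i\).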

Reference: Korhonen P (2012) Correcting Overall Survival for the Impact of Crossover Via a Rank-Preserving Structural Failure Time (RPSFT) Model in the RECORD-1 Trial of Everolimus in Metastatic Renal-Cell Carcinoma, Journal of Biopharmaceutical Statistics

RPSFT

Estimate adjusted hazard ratio

RPSFT

Assumptions and Considerations:

Randomization assumption
“Common treatment effect” assumption:
- The treatment effect is the same for all participants no matter when treatment is received.

RPSFT

Assumptions and Considerations:

“Common treatment effect” assumption (cont.)
- clinically implausible: treatment switching is often only permitted after disease progression \(\rightarrow\) the capacity for a patient to benefit may be different compared to before progression
- approximately true? - whether the treatment effect received by switchers can at least be expected to be similar to the effect received by patients initially randomized to the experimental group
- Extension: RPSFT with a treatment-effect modifier variable, allowing the treatment effect to vary across participants

References:

Latimer NR (2014). Adjusting survival time estimates to account for treatment switching in randomized controlled trials - an economic evaluation context: methods, limitations, and recommendations. Med Decis Making
Allison A (2017). rpsftm: An R Package for Rank Preserving Structural Failure Time Models.

RPSFT

Assumptions and Considerations (cont.):

The counterfactual survival model requires that patients are either ‘on treatment’ or ‘off treatment’ at any one time
- problematic if the control treatment is active
- additional assumption: the treatment effect is only received while a patient is ‘on treatment’; it disappears as soon as treatment is discontinued \(\rightarrow\) clinical plausibility?
- if a continuing treatment effect is expected: assume a lagged treatment effect or define exposure on a ‘treatment group’ basis
  - patients randomized in experimental group: always ‘on-treatment’
  - switchers: remain ‘on-treatment’ from time of treatment switching to death

Reference: Latimer NR (2014). Adjusting survival time estimates to account for treatment switching in randomized controlled trials - an economic evaluation context: methods, limitations, and recommendations. Med Decis Making

RPSFT

Example

Trial compares two policies (immediate or deferred treatment) of zidovudine treatment in symptom free participants infected with HIV
Immediate treatment arm: received treatment at randomisation
Deferred arm: received treatment either at onset of AIDS related complex or AIDS (CDC group IV disease) or development of persistently low CD4 count.
Analysis endpoint: time from study entry to progression to AIDS, or CDC group IV disease, or death (i.e. progression-free survival)

Reference: Allison A (2017). rpsftm: An R Package for Rank Preserving Structural Failure Time Models.

RPSFT

Example - Compare intent-to-treat vs. RPSFT results | ITT result

Fitting a Weibull AFT model to the full analysis set shows that immediate treatment extends survival time by a factor of 1.158, but the effect is not statistically significant (event time ratio, ETR = 1.158; 95% CI: 0.996, 1.347).

## $HR
##            HR        LB       UB
## imm 0.8043545 0.6437549 1.005019
## 
## $ETR
##         ETR        LB       UB
## imm 1.15844 0.9960953 1.347244

RPSFT

Example - Compare intent-to-treat vs. RPSFT results (cont.)| RPSFT result

Using the log-rank test for g-estimation, the RPSFT model estimates \(\hat{\psi} = -0.181\), so the acceleration factor is \(\exp(-\hat{\psi})= 1.199\). This means that immediate treatment extends survival time by a factor of 1.199 (95% CI: 0.998, 1.419).

RPSFT

Example - Compare intent-to-treat vs. RPSFT results (cont.)

Inverse probability of censoring weighting (IPCW)

An extension of the per-protocol censoring approach
Treatment switchers: artificially censored at the time of switch.

Censor switchers at the time of switch

IPCW

Estimate weights for non-switchers [1]

Weights are computed separately for each arm. For a non-switching patient \(i\) in time interval \(t\), the unstabilized weight \(w_{i,t}\) and the stabilized weight \(sw_{i,t}\) are given by:

\[w_{i,t} = \frac{1}{\prod_{k=0}^t P(C(k)_i = 0|C(k-1)_i=0,X_i,Z(k)_i)} \]

\[sw_{i,t} = \frac{\prod_{k=0}^t P(C(k)_i = 0|C(k-1)_i=0,X_i)}{\prod_{k=0}^t P(C(k)_i = 0|C(k-1)_i=0,X_i,Z(k)_i)} \] where \(X_i\) are baseline covariates, \(Z(k)_i\) are time-dependent prognostic factors.
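
Once the interval-specific probabilities of remaining uncensored are available, the weights are cumulative products. The sketch below shows that step in plain Python. The function name `stabilized_weights` and the probabilities are hypothetical; in practice the probabilities would come from fitted models (for example logistic or Cox models for switching), which are not shown here.

```python
def stabilized_weights(p_num, p_den):
    """Cumulative stabilized weights sw_{i,t} for one patient.

    p_num[k] -- P(uncensored at k | uncensored at k-1, X)
    p_den[k] -- P(uncensored at k | uncensored at k-1, X, Z(k))
    """
    weights, num, den = [], 1.0, 1.0
    for pn, pd in zip(p_num, p_den):
        num *= pn  # running product in the numerator
        den *= pd  # running product in the denominator
        weights.append(num / den)
    return weights


# Hypothetical patient whose time-dependent factors increasingly predict switching
p_den = [0.9, 0.8, 0.6]
p_num = [0.9, 0.9, 0.9]
print([round(w, 4) for w in stabilized_weights(p_num, p_den)])

# Setting every numerator probability to 1 gives the unstabilized weights w_{i,t}
print([round(w, 4) for w in stabilized_weights([1.0] * 3, p_den)])
```

Stabilization keeps the weights closer to 1, which usually reduces the variance of the weighted analysis.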

IPCW

Estimate weights for non-switchers [2]

Estimate weights for non-censored patients, based on predictors of the probability of switching

IPCW extends the per-protocol censoring approach: the bias caused by censoring participants who depart from randomised treatment is removed by weighting the remaining non-switchers.
Baseline and time-dependent covariates that are prognostic for mortality and that influence the probability of not switching are identified.
Models are fitted to predict the probability of not switching in control-group patients, conditional on the identified prognostic factors. These probabilities determine the size of the weight applied to each patient.
At a given time \(t\), the weights are defined as the inverse of the probability of having remained on the randomized treatment (that is, of being uncensored by switching) until time \(t\), given that the patient was still on randomized treatment before time \(t\) and given the observed values of the measured baseline and time-dependent confounders at \(t\).
The numerator of the stabilized weight is the probability of patient \(i\) remaining uncensored at the end of time \(k\), given that the patient was also uncensored at the end of time \(k-1\) and given the baseline covariates \(X\).
The denominator is the probability of patient \(i\) remaining uncensored at the end of time \(k\), given that the patient was also uncensored at the end of time \(k-1\) and given the baseline covariates \(X\) and the time-dependent prognostic factors \(Z\).
Non-switchers whose baseline characteristics and time-dependent prognostic covariates are similar to those of switchers are given higher weights.

IPCW

Estimate adjusted treatment effect

Estimate adjusted treatment effect by incorporating weights within standard survival analysis

IPCW

Assumptions & Limitations

“No unmeasured confounders” (exchangeability) assumption: all factors that influence both switching and survival are included in the weight calculation
Problematic in relatively small sample: convergence issue, wide confidence intervals.
Substantial error when very few non-switchers

Reference: Latimer NR (2016). Treatment switching: Statistical and decision-making challenges and approaches

IPCW

Example

SHIVA clinical trial, comparing molecularly targeted therapy based on tumour molecular profiling (MTA) versus conventional therapy (CT) for advanced cancer.
Switch to the other arm was scheduled to be proposed at disease progression for patients in both arms
Endpoint of analysis: overall survival
Baseline time-fixed covariates: age at randomization, gender, number of previous lines of treatment, the dichotomized Royal Marsden Hospital score (0 or 1 vs. 2 or 3) and the altered molecular pathway (distinguishing 3 pathways, namely hormone receptors pathway, PI3K/ AKT/mTOR pathway, and RAF/MEK pathway).
Time-varying confounders: the Eastern Cooperative Oncology Group (ECOG) performance status, the presence of concomitant treatments and the need of platelet transfusions

Reference: Nathalie G (2019). ipcwswitch: An R package for inverse probability of censoring weighting with an application to switches in clinical trials. Computers in Biology and Medicine, 2019

IPCW

Example - Compare intent-to-treat vs. IPCW results

IPCW

Example - Compare intent-to-treat vs. IPCW results (cont.)| ITT result

ITT analysis provides an estimated hazard ratio of 1.19 (95% CI: [0.84, 1.68]).

## Call:
## coxph(formula = Surv(os_time, status) ~ bras.f + agerand + sex.f + 
##     tt_Lnum + rmh_alea.c + pathway.f, data = SHIdat)
## 
##                              coef  exp(coef)   se(coef)      z        p
## bras.fMTA               0.1729732  1.1888343  0.1768705  0.978   0.3281
## agerand                 0.0004777  1.0004778  0.0074874  0.064   0.9491
## sex.fFemale            -0.3758205  0.6867256  0.1832455 -2.051   0.0403
## tt_Lnum                 0.0140618  1.0141612  0.0357184  0.394   0.6938
## rmh_alea.c              0.9274363  2.5280198  0.1846264  5.023 5.08e-07
## pathway.fHR            -0.0593481  0.9423786  0.2794362 -0.212   0.8318
## pathway.fPI3K/AKT/mTOR -0.0284340  0.9719665  0.2820677 -0.101   0.9197
## 
## Likelihood ratio test=34.66  on 7 df, p=1.295e-05
## n= 197, number of events= 134

##     2.5 %    97.5 % 
## 0.8405603 1.6814104

IPCW

Example - Compare intent-to-treat vs. IPCW results (cont.)| IPCW result

IPCW provides an estimated causal hazard ratio of 1.30 (95%CI = [0.81, 2.08])

## Call:
## coxph(formula = Surv(tstart, tstop, event) ~ bras.f + agerand + 
##     sex.f + tt_Lnum + rmh_alea.c + pathway.f, data = SHIres, 
##     weights = SHIres$weights.trunc, cluster = id)
## 
##                             coef exp(coef)  se(coef) robust se      z        p
## bras.fMTA               0.262762  1.300518  0.240393  0.239143  1.099 0.271869
## agerand                -0.001184  0.998816  0.009506  0.009876 -0.120 0.904541
## sex.fFemale            -0.392972  0.675048  0.231436  0.234035 -1.679 0.093130
## tt_Lnum                 0.006429  1.006449  0.044150  0.040456  0.159 0.873742
## rmh_alea.c              0.809997  2.247902  0.237453  0.237956  3.404 0.000664
## pathway.fHR            -0.046975  0.954111  0.335226  0.336144 -0.140 0.888860
## pathway.fPI3K/AKT/mTOR -0.080538  0.922620  0.334524  0.327150 -0.246 0.805544
## 
## Likelihood ratio test=18.09  on 7 df, p=0.01156
## n= 9745, number of events= 83

##     2.5 %    97.5 % 
## 0.8138748 2.0781404

Comparison of RPSFT and IPCW

	RPSFT	IPCW
Approach	Randomization-based approach; counterfactual survival times	Extended per-protocol censoring approach; adjusts the treatment effect for informative censoring
Key assumption	Common treatment effect	Exchangeability
Pros	Less sensitive to small patient numbers	Flexible switching: active control, 2-way switching
Cons	Problematic if the control is active	Small RCT: convergence issues, wide confidence intervals; substantial error in weight estimation; fails when a covariate perfectly predicts switching

Recommendations

Treatment switching possible analyses

Reference: Roche’s Treatment Switching Guidance document.