Time to Event

CDISC

Introduction to Time-to-Event Analysis

This document provides an overview of how Time-to-Event (TTE) analysis is conducted using the R programming language, specifically leveraging the dplyr and admiral packages.

TTE analysis, also known as survival analysis, is a critical statistical methodology, particularly prevalent in fields such as clinical research, engineering, and economics.

What is Time-to-Event (TTE) Analysis?

Time-to-Event analysis focuses on the duration until a specific event of interest occurs.

Unlike traditional statistical methods that might only consider whether an event happened, TTE analysis accounts for the time aspect and the possibility of “censoring.”

Censoring occurs when the event has not been observed for all subjects by the end of the study, or if a subject withdraws before the event occurs.

This distinction makes TTE analysis uniquely suited for data where outcomes are not immediate or uniformly observed.

The Censor variable (often denoted as CNSR in ADaM datasets) is a critical component in Time-to-Event (TTE) or survival analysis. It’s a binary indicator that tells us whether the event of interest has occurred for a subject within the observation period or if the subject was “censored.”

What is Censoring?

Censoring occurs when the event of interest is not observed for a subject during their follow-up period.

Instead, we only know that the event did not happen up to a certain point in time, at which point the observation stops.

This is different from a missing value; a censored observation still provides valuable information (i.e., the event didn’t happen by that time).

Right Censoring:

This is the most common type. It means that the event of interest has not yet occurred for the subject by the end of the study, or the subject was lost to follow-up, or withdrew from the study, before the event happened. We know the subject was event-free up to their last observation time.

Example: A patient is being monitored for disease progression. At the end of the study, their disease has not progressed. Their time is “right-censored” at the study end date.

Example: A patient moves away and is lost to follow-up. Their time is “right-censored” at their last known contact date.

Left Censoring:

This occurs when the event of interest has already happened before the start of observation. We know the event occurred, but not exactly when. This is less common in clinical trials as studies usually enroll patients who are event-free at baseline.

Example: Studying the time to diagnosis of a chronic disease, but some individuals might have been diagnosed before the study began.

Interval Censoring:

This happens when the exact time of an event is unknown, but we know it occurred within a specific time interval.

Example: A patient is screened every 6 months for tumor recurrence. A recurrence is found at the 12-month visit, but it was not present at the 6-month visit. The event occurred sometime between 6 and 12 months.
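The three censoring types can be told apart by the interval in which the event time is known to lie. The sketch below is a minimal base R illustration; the function name classify_censoring() and the convention of using NA for an unknown bound are assumptions made for this example, not part of any CDISC standard.

```r
# Classify observations by censoring type, given the bounds between which
# the event time is known to lie (NA = bound unknown).
#   lower known, upper NA      -> right-censored (event-free up to 'lower')
#   lower NA,    upper known   -> left-censored  (event happened before 'upper')
#   lower <  upper             -> interval-censored
#   lower == upper             -> exact event time
classify_censoring <- function(lower, upper) {
  ifelse(
    is.na(upper), "right",
    ifelse(
      is.na(lower), "left",
      ifelse(lower == upper, "exact", "interval")
    )
  )
}

# Times in months, loosely following the examples above (values are hypothetical)
lower <- c(12, NA, 6, 9)
upper <- c(NA, 3, 12, 9)
censoring_type <- classify_censoring(lower, upper)
print(data.frame(lower, upper, censoring_type))
```

The third row mirrors the screening example: no recurrence at month 6, recurrence found at month 12, so the event lies somewhere in between.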

How the Censor Variable Works (CNSR)

The CNSR variable is typically coded as:

0: Indicates that the event of interest occurred for the subject.

1: Indicates that the subject was censored, meaning the event did not occur by their last follow-up time.

Sometimes, other numeric codes or character values are used, but 0/1 is standard. In some contexts, you might see it defined as 1 for event and 0 for censored, so it’s always crucial to check the specific dataset’s definition. The CDISC ADaM Basic Data Structure for Time-to-Event Analysis follows the 0=event convention, with positive integers (most often 1) indicating censoring.

Example in a Clinical Trial ADaM Dataset (ADTTE)

Consider an ADTTE (Analysis Data for Time-to-Event) dataset for Progression-Free Survival (PFS):

USUBJID   ADT          AVAL (Days)   CNSR   PARAMCD
ABC-001   2023-01-15   120           0      PFS
ABC-002   2023-03-20   180           1      PFS
ABC-003   2023-02-10   150           0      PFS
ABC-004   2023-05-01   210           1      PFS

  • ABC-001: Had disease progression (the event) on 2023-01-15, 120 days after treatment start. CNSR is 0.
  • ABC-002: Was censored on 2023-03-20, 180 days after treatment start. This means as of that date, their disease had not progressed. CNSR is 1. They might have completed the study without progression, or been lost to follow-up.
  • ABC-003: Had disease progression on 2023-02-10, 150 days after treatment start. CNSR is 0.
  • ABC-004: Was censored on 2023-05-01, 210 days after treatment start. CNSR is 1.

The censor variable is fundamental for properly analyzing time-to-event data, ensuring that all available information is used and that statistical conclusions are valid.
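As a small illustration, the example records can be held in an R data frame. Many survival routines expect 1 to mean "event", so the ADaM coding usually has to be flipped before modeling. The variable name EVENT below is an assumption for this sketch, not an ADaM variable.

```r
# The four example PFS records as a data frame
adtte <- data.frame(
  USUBJID = c("ABC-001", "ABC-002", "ABC-003", "ABC-004"),
  ADT     = as.Date(c("2023-01-15", "2023-03-20", "2023-02-10", "2023-05-01")),
  AVAL    = c(120, 180, 150, 210),
  CNSR    = c(0, 1, 0, 1),
  PARAMCD = "PFS",
  stringsAsFactors = FALSE
)

# ADaM: CNSR == 0 means the event occurred. EVENT (hypothetical name) uses
# the opposite coding, 1 = event, which most modeling functions expect.
adtte$EVENT <- as.integer(adtte$CNSR == 0)

n_events   <- sum(adtte$EVENT)
n_censored <- sum(adtte$CNSR > 0)

# Censored subjects still contribute follow-up time
total_followup_days <- sum(adtte$AVAL)

print(adtte)
```

With the survival package, for instance, such data would typically be passed as Surv(AVAL, EVENT) or, equivalently, Surv(AVAL, CNSR == 0).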

Importance of the Censor Variable

The censor variable is vital because:

Correctly Accounts for Incomplete Information: Survival analysis methods (like Kaplan-Meier curves or Cox Proportional Hazards models) are designed to handle censored data appropriately. They use the information from censored observations up to their censoring time, rather than simply discarding them, which would bias the results.

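Prevents Bias: If censored observations were treated as if the event never happened, the estimated event rates would be artificially lowered, and survival times would appear longer than they truly are. If they were excluded altogether, the event-free follow-up they contributed would be lost, so event rates would typically be overstated and survival times would appear shorter than they truly are.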

Accurate Survival Estimates: Including the censor variable allows the models to provide unbiased estimates of survival probabilities and hazard ratios, reflecting the true risk over time for the population being studied.
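To make the bias argument concrete, here is a minimal hand-written Kaplan-Meier estimator in base R, compared with a naive estimate that simply drops censored subjects. It is a teaching sketch using the four example PFS records; km_estimate() is a hypothetical helper, and real analyses would normally rely on an established implementation such as survfit() from the survival package.

```r
# Kaplan-Meier estimate: at each distinct event time, multiply the running
# survival probability by (1 - events / number still at risk).
km_estimate <- function(aval, event) {
  event_times <- sort(unique(aval[event == 1]))
  n_risk  <- vapply(event_times, function(tm) sum(aval >= tm), integer(1))
  n_event <- vapply(event_times, function(tm) sum(aval == tm & event == 1), integer(1))
  data.frame(
    time    = event_times,
    n_risk  = n_risk,
    n_event = n_event,
    surv    = cumprod(1 - n_event / n_risk)
  )
}

aval  <- c(120, 180, 150, 210)   # days
event <- c(1, 0, 1, 0)           # 1 = event, i.e. CNSR == 0

km <- km_estimate(aval, event)
print(km)

# Naive alternative: discard censored subjects, then ask what share of the
# remaining subjects is event-free beyond day 150.
naive_excluding_censored <- mean(aval[event == 1] > 150)
print(naive_excluding_censored)
```

Kaplan-Meier keeps the two censored subjects in the risk set until they leave, giving an estimated 50% event-free probability after day 150. The naive approach reports 0% because it ignores everyone who was still event-free.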

ADTTE datasets

ADTTE datasets extend the ADaM Basic Data Structure (BDS) with additional variables specific to time-to-event analysis, as defined in the document (Section 4).

  • The naming convention for these datasets should follow the ADaM standard described in ADaM v2.1.

  • The ADaM BDS is flexible and can handle various outcomes for TTE analysis. It’s generally recommended to store time-to-event data separately from non-time-to-event data, even if both fit within the ADaM BDS.

  • Sponsors decide the number of ADTTE datasets needed for a study, prioritizing clarity over combining all TTE variables into one dataset.


Common Statistical Analysis Methods Supported by ADaM TTE

The ADaM Time-to-Event (TTE) analysis dataset structure is designed to support various common statistical methods for analyzing longitudinal event data. Events can be qualitative (transitions between states) or quantitative (significant sudden changes).

Supported Statistical Analysis Methods:

  • Kaplan-Meier product-limit curve

  • Actuarial or cohort life table analyses

  • Log-rank tests (stratified or trend)

  • Wilcoxon tests

  • Cox proportional hazards models
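As an illustration of one of these methods, the two-group log-rank test can be written out in a few lines of base R. This is a sketch for understanding the mechanics only; the function name logrank_test() and the toy data are assumptions, and in practice survdiff() from the survival package is the usual choice.

```r
# Two-group log-rank test: at each event time, compare the observed number
# of events in group 1 with the number expected if both groups shared the
# same hazard, then combine into a chi-square statistic (1 df).
logrank_test <- function(aval, event, group) {
  grp <- unique(group)
  stopifnot(length(grp) == 2)
  observed <- 0
  expected <- 0
  variance <- 0
  for (tm in sort(unique(aval[event == 1]))) {
    at_risk <- aval >= tm
    n  <- sum(at_risk)                               # at risk, both groups
    n1 <- sum(at_risk & group == grp[1])             # at risk, group 1
    d  <- sum(aval == tm & event == 1)               # events, both groups
    d1 <- sum(aval == tm & event == 1 & group == grp[1])
    observed <- observed + d1
    expected <- expected + d * n1 / n
    if (n > 1) {
      variance <- variance + d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    }
  }
  chisq <- (observed - expected)^2 / variance
  list(
    observed = observed,
    expected = expected,
    chisq    = chisq,
    p_value  = pchisq(chisq, df = 1, lower.tail = FALSE)
  )
}

# Toy data (hypothetical): group A has the two earliest events
res <- logrank_test(
  aval  = c(1, 2, 3, 4),
  event = c(1, 1, 1, 1),
  group = c("A", "A", "B", "B")
)
print(res)
```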

Capabilities of the ADaM TTE Dataset Structure:

  • Descriptive and Inferential Presentations: It can be used for a wide range of tabular and graphical data presentations.

  • Diagnostic Checks: Supports diagnostic checks to ensure the appropriateness of the methods used.

  • Hypothesis Tests and Formal Modeling: The structure also supports hypothesis testing and formal statistical modeling, along with their associated assumption checks and diagnostics, though these operations are not illustrated in the document.

ADaM Metadata

Table 4.1 Example of ADTTE Dataset Metadata

Field                      Value
Dataset Name               ADTTE
Dataset Description        Data for the Time to Event Analyses
Dataset Location           adtte.xpt
Dataset Structure          one record per subject per parameter
Key Variables of Dataset   USUBJID, PARAMCD
Class of Dataset           BDS
Documentation              ADTTE.SAS, SAP Section 10.1

Table 4.2, “Time-to-Event Variables”:

  • Core Column: This column identifies whether a variable is typically expected in a dataset, with the following classifications:
    • Expected: The variable is always anticipated.
    • Required: The variable is necessary for the dataset row to be meaningful.
    • Conditional: The variable is required only under specific conditions.
    • Permissible: The variable is allowed but not necessarily expected.

Required Variables for ADaM Time-to-Event Datasets

This section lists the variables identified as “Required” in Table 4.2 of the ADaM Basic Data Structure for Time-to-Event Analysis document, along with their labels and relevant CDISC notes.

Definition of “Required”

As per the “Core” column definition in Table 4.2, Required means that the variable is required in the given dataset for the row to be meaningful.

List of Required Variables

  • USUBJID (Variable Label: Unique Subject Identifier) CDISC Notes: Taken from the source data, e.g., DM.USUBJID in SDTM.

  • AVAL (Variable Label: Analysis Value) CDISC Notes: Duration of follow-up (time to event or censoring) in specified units (e.g., days, weeks, months).

  • CNSR (Variable Label: Censor) CDISC Notes: Indicator for event occurrence (0=Event, 1=Censored).

  • ADT (Variable Label: Analysis Date) CDISC Notes: Date of event or censoring. This date variable corresponds to the time variable AVAL.

  • PARAMCD (Variable Label: Parameter Code) CDISC Notes: Short code for the parameter, e.g., ‘OS’ for Overall Survival, ‘PFS’ for Progression-Free Survival.

  • PARAM (Variable Label: Parameter) CDISC Notes: Full descriptive name of the parameter.
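A quick structural check of these variables can be sketched in base R. The function check_adtte() and the specific rules it applies are assumptions chosen for illustration; they are not an official CDISC conformance check.

```r
# Variables listed as required above
required_vars <- c("USUBJID", "PARAMCD", "PARAM", "AVAL", "CNSR", "ADT")

# Returns a character vector of problems; an empty vector means none found.
check_adtte <- function(adtte) {
  missing_vars <- setdiff(required_vars, names(adtte))
  if (length(missing_vars) > 0) {
    return(paste("missing variables:", paste(missing_vars, collapse = ", ")))
  }
  problems <- character(0)
  if (anyNA(adtte$AVAL) || any(adtte$AVAL < 0)) {
    problems <- c(problems, "AVAL must be non-missing and non-negative")
  }
  if (anyNA(adtte$CNSR) || any(adtte$CNSR < 0)) {
    problems <- c(problems, "CNSR must be non-missing and 0 or a positive integer")
  }
  # One record per subject per parameter
  if (anyDuplicated(adtte[, c("USUBJID", "PARAMCD")]) > 0) {
    problems <- c(problems, "duplicate USUBJID/PARAMCD records")
  }
  problems
}

adtte_ok <- data.frame(
  USUBJID = c("ABC-001", "ABC-002"),
  PARAMCD = "PFS",
  PARAM   = "Progression-Free Survival",
  AVAL    = c(120, 180),
  CNSR    = c(0, 1),
  ADT     = as.Date(c("2023-01-15", "2023-03-20")),
  stringsAsFactors = FALSE
)
adtte_no_cnsr <- adtte_ok[, setdiff(names(adtte_ok), "CNSR")]

print(check_adtte(adtte_ok))
print(check_adtte(adtte_no_cnsr))
```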

The admiral package is designed to streamline the creation of ADaM (Analysis Data Model) datasets, including those for Time-to-Event (TTE) analysis (ADTTE datasets). It works by providing a set of functions that are specialized for common ADaM derivations, often used in conjunction with dplyr verbs for data manipulation.

Here’s how you would conceptually use admiral to derive the “Required” variables for an ADTTE dataset. The code snippets are illustrative: they assume input datasets (such as adsl) that are not shown here.

Let’s assume you start with your Subject-Level Analysis Dataset (adsl) and potentially other SDTM domains (like dm, ex, ds, tu, rs for dates of death, exposure, tumor assessments, response, etc.).

Deriving Required Variables with Admiral

https://pharmaverse.github.io/admiralonco/articles/adtte.html

1. USUBJID (Subject Identifier)
  • Derivation: USUBJID is typically a foundational variable that comes directly from the raw SDTM DM (Demographics) domain and is usually carried forward into the ADSL (Analysis Data Subject Level) dataset. From ADSL, it’s then merged or selected into the ADTTE dataset. Admiral helps manage this flow.

  • Admiral Concept: When you build your adsl dataset or when you initiate your ADTTE dataset, USUBJID is usually a starting point from your core subject data.

    # Illustrative Admiral/Dplyr code snippet (not executable)
    library(dplyr)
    library(admiral)

    # Assume 'adsl' is your pre-existing ADSL dataset
    # You would typically filter ADSL to the relevant analysis population first
    adtte_base <- adsl %>%
      filter(SAFFL == "Y") %>% # Example: filter to safety analysis population
      select(USUBJID, TRTSDT, DTHDT, LSTALVDT) # Select key dates for TTE derivation

2. ADT (Analysis Date)

  • Derivation: ADT represents the date of the event or the date of censoring. It’s crucial because it’s the anchor for calculating the AVAL (analysis value/time). ADT is set from the event date when the event occurred and from the censoring date otherwise.

  • Admiral: You would use admiral’s date derivation functions, or more commonly, derive_param_tte() which internally handles ADT derivation based on the event and censoring dates you provide.

    # Continuing from 'adtte_base'
    # Assume 'tumor_prog_date' is a dataset with one progression date (PROGDT) per subject
    adtte_prep <- adtte_base %>%
      # Join with event dates (e.g., from SDTM TU or RS domains)
      left_join(select(tumor_prog_date, USUBJID, PROGDT), by = "USUBJID") %>%
      # Derive ADT based on event or censoring logic.
      # coalesce() picks the first non-missing date:
      # progression, then death, then last known alive
      mutate(ADT = coalesce(PROGDT, DTHDT, LSTALVDT))

    # PROGDT, DTHDT, and LSTALVDT are already date variables here, so no conversion
    # step is needed. admiral's derive_vars_dt() is meant for converting character
    # --DTC variables to dates, not for reformatting existing dates.
  

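General Concept of derive_param_tte():

derive_param_tte() derives one time-to-event parameter. Events are described by event_source() objects passed in event_conditions, and censoring by censor_source() objects passed in censor_conditions. Each object names a source dataset, an optional filter, and the date variable to use. In outline, for each subject the function takes the first event date if there is one and otherwise the last censoring date, and sets ADT and CNSR accordingly.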

    # death_event, pd_event, lasta_censor, and rand_censor are event and censoring
    # source objects predefined in the admiralonco package
    adtte <- derive_param_tte(
      dataset_adsl = adsl,
      start_date = RANDDT,
      event_conditions = list(death_event, pd_event),
      censor_conditions = list(lasta_censor, rand_censor),
      source_datasets = list(adsl = adsl, adrs = adrs),
      set_values_to = exprs(PARAMCD = "PFS", PARAM = "Progression Free Survival")
    )

3. AVAL (Analysis Value - Time)

  • Derivation: AVAL is the calculated time duration from a start date (e.g., treatment start, randomization) to the ADT. This is often in days, weeks, or months. admiral provides functions that calculate durations and handle details such as the output unit and whether the start day itself is counted.

  • Admiral Concept: The derive_param_tte() function takes a start date together with the event and censoring sources, and returns STARTDT, ADT, and CNSR. AVAL is then derived from STARTDT and ADT, typically with derive_vars_duration().
    # Illustrative admiral code snippet. Assumes 'adsl' has TRTSDT, DTHDT, and LSTALVDT,
    # and 'tumor_prog_date' has STUDYID, USUBJID, and one progression date (PROGDT) per subject.
    pd <- event_source(
      dataset_name = "pd",
      filter = !is.na(PROGDT),
      date = PROGDT
    )
    death <- event_source(
      dataset_name = "adsl",
      filter = !is.na(DTHDT),
      date = DTHDT
    )
    last_alive <- censor_source(
      dataset_name = "adsl",
      date = LSTALVDT
    )

    adtte_final <- derive_param_tte(
      dataset_adsl = adsl,
      source_datasets = list(adsl = adsl, pd = tumor_prog_date),
      start_date = TRTSDT, # Starting point for time calculation
      event_conditions = list(pd, death), # For PFS, progression and death are both events
      censor_conditions = list(last_alive), # Otherwise censor at the last known alive date
      set_values_to = exprs(PARAMCD = "PFS", PARAM = "Progression-Free Survival")
    ) %>%
      # derive_param_tte() returns STARTDT, ADT, and CNSR; AVAL is derived from the dates
      derive_vars_duration(new_var = AVAL, start_date = STARTDT, end_date = ADT)
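The duration arithmetic behind AVAL can also be checked by hand in base R. The sketch below assumes the common convention of counting the start day itself (end date minus start date, plus one) and uses hypothetical start dates chosen so that the results match the first two example records shown earlier.

```r
# Hypothetical start dates; ADT values are taken from the ADTTE example
tte <- data.frame(
  USUBJID = c("ABC-001", "ABC-002"),
  STARTDT = as.Date(c("2022-09-18", "2022-09-22")),
  ADT     = as.Date(c("2023-01-15", "2023-03-20")),
  stringsAsFactors = FALSE
)

# Duration in days, counting the start day as day 1 (an assumed convention)
tte$AVAL <- as.numeric(difftime(tte$ADT, tte$STARTDT, units = "days")) + 1

print(tte)
```

Whether to add one day is a study-level decision, so the rule should be taken from the analysis plan. Note that the Time to Death example in the results section uses date of death minus date of randomization without the extra day.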

4. CNSR (Censoring Variable)

  • Derivation: CNSR indicates whether the event occurred (0) or was censored (1). Its derivation is inherently linked to the ADT derivation, because the record that supplies ADT also determines CNSR.

  • Admiral Concept: derive_param_tte() sets CNSR automatically. A record taken from an event_source() object gets CNSR = 0; a record taken from a censor_source() object gets the value of that object’s censor argument, which defaults to 1.

    # CNSR is controlled by the source objects passed to derive_param_tte():
    # records from event_source() objects get CNSR = 0, and records from
    # censor_source() objects get the value of 'censor' (1 by default).
    last_alive <- censor_source(
      dataset_name = "adsl",
      date = LSTALVDT, # Censor at the last known alive date
      censor = 1
    )
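The event-versus-censoring decision can be sketched without admiral. The base R example below uses hypothetical subjects and the simplified rule that the earliest of progression or death is the event, and that subjects with neither are censored at the last known alive date.

```r
# Hypothetical subject-level dates
src <- data.frame(
  USUBJID  = c("S1", "S2", "S3"),
  PROGDT   = as.Date(c("2023-01-15", NA, NA)),
  DTHDT    = as.Date(c(NA, "2023-02-01", NA)),
  LSTALVDT = as.Date(c("2023-04-01", "2023-02-01", "2023-03-20")),
  stringsAsFactors = FALSE
)

# Earliest event date per subject (progression or death); NA if neither occurred
event_dt  <- src$PROGDT
use_death <- is.na(event_dt) | (!is.na(src$DTHDT) & src$DTHDT < event_dt)
event_dt[use_death] <- src$DTHDT[use_death]

has_event <- !is.na(event_dt)

# CNSR = 0 when an event was observed, 1 when the subject is censored
src$CNSR <- ifelse(has_event, 0, 1)

# ADT is the event date for events and the last known alive date otherwise
src$ADT <- src$LSTALVDT
src$ADT[has_event] <- event_dt[has_event]

print(src[, c("USUBJID", "ADT", "CNSR")])
```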

5. PARAMCD (Parameter Code) & PARAM (Parameter Description)

  • Derivation: These variables identify the specific TTE endpoint being analyzed (e.g., “PFS” for Progression-Free Survival, “OS” for Overall Survival). They are not “derived” in the sense of a calculation from raw data, but rather assigned based on the analysis objective.

  • Admiral Concept: When using derive_param_tte(), you assign PARAMCD and PARAM through the set_values_to argument, e.g., set_values_to = exprs(PARAMCD = "OS", PARAM = "Overall Survival"). This is how the output records are labeled with the endpoint being analyzed.

    # PARAMCD and PARAM are set explicitly through set_values_to
    adtte_final_os <- derive_param_tte(
      dataset_adsl = adsl,
      source_datasets = list(adsl = adsl),
      start_date = TRTSDT,
      # Event is death for OS
      event_conditions = list(
        event_source(dataset_name = "adsl", filter = !is.na(DTHDT), date = DTHDT)
      ),
      # Subjects without a death date are censored at the last known alive date
      censor_conditions = list(
        censor_source(dataset_name = "adsl", date = LSTALVDT)
      ),
      set_values_to = exprs(PARAMCD = "OS", PARAM = "Overall Survival")
    )

Summary of Admiral’s Role:

Admiral provides specialized, standardized functions that abstract away much of the complex logic for deriving ADaM TTE variables. While dplyr is used for general data manipulation (filtering, selecting, joining), admiral builds on this by offering “derivations” that support CDISC compliance and best practices for creating analysis-ready TTE datasets. The derive_param_tte() function is particularly central: for a given TTE endpoint (PARAMCD/PARAM) it generates ADT and CNSR together, after which AVAL is computed from the start date and ADT.

Kaplan-Meier Plot of Progression-Free Survival (PFS) by Treatment Group.

This type of graph is a standard way to visualize time-to-event data in clinical trials, showing how the probability of surviving without an event (in this case, disease progression or death) changes over time for different treatment groups.

[Figure 5.1.1: Kaplan-Meier curve of Progression-Free Survival by treatment group]

Explanation of Figure 5.1.1:

  • What it shows: The plot visually represents the estimated probability of patients remaining progression-free (meaning their disease has not progressed and they are still alive) over a period of time since they started treatment. It compares this probability between two groups: “Drug A” (the active treatment) and “Placebo.”
  • X-axis (Time): This axis represents “Time (Months).” It shows how much time has passed since the start of the observation period (e.g., from the start of treatment or randomization).
  • Y-axis (Probability of Progression-Free Survival): This axis represents the estimated probability that a patient is still progression-free. It ranges from 1.0 (100% probability of being event-free) down towards 0.0 (0% probability).

  • The Curves (“Drug A” vs. “Placebo”):
    • Each line (one for Drug A, one for Placebo) is a Kaplan-Meier curve.
    • Steps Down: The “steps” or drops in the curve indicate that an event (disease progression or death) occurred for one or more patients at that specific time point. The size of the drop depends on the number of events relative to the number of patients still at risk. A steeper drop means more events happened at that time.

  • Tick Marks: The small vertical tick marks on the curves represent “censored” patients. These are patients who were still progression-free at that time but either completed the study, were lost to follow-up, or withdrew, meaning we no longer have data on them, but we know they were event-free up to that point. They essentially “exit” the risk pool without having the event.

  • Number at Risk Table: Below the main plot, there is typically a table labeled “Number at Risk.” This table shows, for each treatment group, how many patients are still being followed and are “at risk” of experiencing the event at various key time points (e.g., at 0, 3, 6, 9 months, etc.). This helps in understanding the sample size contributing to the curve at different durations.
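The "Number at Risk" row is simple to compute: at each displayed time point, count the subjects whose event or censoring time has not yet passed. A minimal base R sketch with hypothetical follow-up times in months (number_at_risk() is an illustrative helper, not a package function):

```r
# Subjects still at risk at time t are those whose event or censoring
# time is at or after t.
number_at_risk <- function(aval, at) {
  vapply(at, function(tm) sum(aval >= tm), integer(1))
}

# Hypothetical follow-up times (months) for one treatment group
months <- c(2, 5, 5, 7, 9, 12)

time_points <- c(0, 3, 6, 9, 12)
at_risk <- number_at_risk(months, at = time_points)

print(data.frame(time = time_points, n_at_risk = at_risk))
```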

How to Read Kaplan-Meier (KM) Curves

When reading Kaplan-Meier curves like Figure 5.1.1, consider the following:

  1. Higher Curve = Better Outcome: A curve that stays higher on the plot for longer indicates a better outcome. For Progression-Free Survival, a higher curve means a higher probability of remaining progression-free over time. In Figure 5.1.1, if the “Drug A” curve stays above the “Placebo” curve, it suggests Drug A is more effective at preventing progression or death.
  2. Steepness of Drops = Event Rate: Steeper drops in the curve indicate that events are happening more rapidly in that group at that time.
  3. Median Survival/PFS: You can often visually estimate the median survival or PFS by finding the point on the x-axis where the curve drops to 0.5 (50% probability).

  4. Number at Risk: Always refer to the “Number at Risk” table. It’s crucial to know how many subjects are contributing to the curve at later time points, as curves based on very few subjects become less reliable.
  5. Censoring: Be aware of the tick marks, as they represent patients who did not have the event but whose follow-up ended. These observations are still valuable as they contribute to the probability estimate up to their censoring time.
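Reading the median off a curve can be expressed as a small rule: it is the first time at which the estimated survival probability is at or below 0.5, and it is undefined when the curve never gets that low. The sketch below applies that rule to hypothetical Kaplan-Meier step values; km_median() is an illustrative helper.

```r
# Median from a Kaplan-Meier step function given as (time, surv) pairs,
# with times in increasing order. Returns NA when the median is not reached.
km_median <- function(time, surv) {
  idx <- which(surv <= 0.5)
  if (length(idx) == 0) NA_real_ else time[min(idx)]
}

# Hypothetical KM estimates: survival probability just after each event time
km_time <- c(2, 5, 7, 9)               # months
km_surv <- c(0.90, 0.70, 0.45, 0.30)

median_pfs  <- km_median(km_time, km_surv)
not_reached <- km_median(c(2, 5), c(0.90, 0.70))

print(median_pfs)
print(not_reached)
```

A result of NA corresponds to the "median not reached" entry often seen in result tables when fewer than half of the subjects have had the event.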

In summary, Figure 5.1.1 provides a clear visual comparison of how well each treatment maintains progression-free survival over time, allowing for an intuitive understanding of the treatment’s impact.

Table 5.1.2 Example of Time-to-Event Analysis Results Display:

Time to Death Calculation: Calculated as date of death – date of randomization.

Censoring: For subjects who did not die on or prior to Week 24 (Day 168), they are censored at Day 168.

Footnote a: Median Time (95% CI) and Event Rate (%) at Day 168 (95% CI) are “Based on the Kaplan-Meier estimates.”

Footnote b: The Cox regression model used for the Event Rate P-value “includes treatment group, age, and sex as covariates.”
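The time-to-death and Day 168 censoring rules in these notes can be sketched in base R. The dates are hypothetical, and the sketch assumes every subject without a death on or before Day 168 is censored at Day 168, exactly as the note states.

```r
# Hypothetical randomization and death dates (NA = no death recorded)
rand_dt  <- as.Date(c("2023-01-01", "2023-01-01", "2023-01-01"))
death_dt <- as.Date(c("2023-03-02", "2023-08-01", NA))

cutoff_day <- 168  # Week 24

# Time to death = date of death - date of randomization
days_to_death <- as.numeric(difftime(death_dt, rand_dt, units = "days"))

# Deaths on or before Day 168 are events; everyone else is censored at Day 168
died_by_cutoff <- !is.na(days_to_death) & days_to_death <= cutoff_day
AVAL <- ifelse(died_by_cutoff, days_to_death, cutoff_day)
CNSR <- ifelse(died_by_cutoff, 0, 1)

print(data.frame(days_to_death, AVAL, CNSR))
```

The second subject died on Day 212, after the cutoff, so that death does not count as an event in this analysis and the subject is censored at Day 168.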

[Table 5.1.2: Time to Death Through Day 168 by Treatment Group]

Conclusions from the Table:

  • Comparison of Median Time to Death:

Compare the Median Time (95% CI) for Treatment A and Treatment B. A longer median time indicates that subjects in that group survived longer without the event (death). The confidence intervals (CI) provide a range for these estimates; if the CIs for the two treatments largely overlap, the difference in median time might not be statistically significant.

The P value (from the Log-rank Test) indicates the statistical significance of the difference in survival distributions between Treatment A and Treatment B. If this P-value is small (e.g., < 0.05), it suggests a statistically significant difference in the time-to-death distributions between the groups. Note that the log-rank test compares the whole curves rather than the medians alone.

Comparison of Event Rate at Day 168:

Compare the Event Rate (%) at Day 168 (95% CI) for Treatment A and Treatment B. A lower event rate indicates better survival (fewer deaths) by Day 168.

The P value (from the Cox Regression Model) assesses the statistical significance of the treatment effect on the hazard of death through Day 168, adjusted for age and sex. A small P-value would suggest a significant difference in the risk of death between the treatments, considering the specified covariates.

Censoring Rate:

The N (%) Censored provides context on how many subjects in each group were censored. A higher percentage of censored subjects, especially at earlier time points, can impact the precision of later estimates.

Overall Conclusion for Table 14.2.1.1, “Time to Death Through Day 168 by Treatment Group.”

If the Median Time for one treatment (e.g., Treatment A) is substantially longer than the other (Treatment B) and the corresponding P-value is less than 0.05, it would suggest that Treatment A significantly prolongs the time to death compared to Treatment B.

Similarly, if the Event Rate (%) at Day 168 for one treatment (e.g., Treatment A) is significantly lower than the other, with a P-value less than 0.05 (from the Cox model), it would indicate that Treatment A significantly reduces the risk of death by Day 168.

The inclusion of the P-values provides statistical evidence to support whether any observed differences between Treatment A and Treatment B are likely due to the treatments themselves or merely due to random chance.

Table 14.2.1.2, “Time to Death Through Day 168 – Cox Regression Model.”

Purpose: This table displays the results of a Cox Proportional Hazards Regression Model, which evaluates how various factors (covariates) influence the hazard (risk) of death over time.

Analysis Population: The analysis was conducted on the Intent-to-Treat (ITT) population, meaning all subjects randomized were included.

[Table 5.1.3: Cox regression model results for time to death through Day 168]

Hazard Ratio (HR):

Measures the ratio of the hazard (the instantaneous risk) of the event (death) for one level of a covariate to the hazard for a reference level.

  • HR > 1: Increased risk in the first group/level.

  • HR < 1: Decreased risk in the first group/level.

  • HR = 1: No difference in risk.

95% Confidence Interval (CI): A range of plausible values for the true HR. If the CI includes 1, the effect is not statistically significant at the 5% level.

P-value: Indicates the statistical significance of the covariate’s effect. A P-value < 0.05 typically suggests a statistically significant association.
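These three quantities are tied together through the model coefficient: the HR is the exponentiated coefficient, and the usual Wald confidence interval and P-value come from the coefficient and its standard error. The sketch below uses hypothetical numbers (a coefficient of log(0.75) with standard error 0.12) and an illustrative helper named summarise_hr().

```r
# Wald-type summary of a Cox model coefficient:
#   HR      = exp(beta)
#   CI      = exp(beta -/+ z_crit * se)
#   P-value = two-sided normal tail probability of beta / se
summarise_hr <- function(beta, se, level = 0.95) {
  z_crit <- qnorm(1 - (1 - level) / 2)
  z <- beta / se
  c(
    hr      = exp(beta),
    lower   = exp(beta - z_crit * se),
    upper   = exp(beta + z_crit * se),
    p_value = 2 * pnorm(-abs(z))
  )
}

# Hypothetical treatment effect
res <- summarise_hr(beta = log(0.75), se = 0.12)
print(round(res, 4))
```

Because the interval and the P-value are built from the same coefficient and standard error, a 95% CI that excludes 1 goes together with a Wald P-value below 0.05, as it does here.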

Conclusions:

Treatment Group (Treatment B to Treatment A):

Interpretation: The Hazard Ratio indicates the relative risk of death for patients in Treatment B compared to those in Treatment A.

Conclusion based on hypothetical results:

If x.xx (HR) is less than 1 (e.g., 0.75) and 0.xxx (P-value) is less than 0.05, it would suggest that Treatment B significantly reduces the hazard of death compared to Treatment A.

If x.xx (HR) is greater than 1 (e.g., 1.25) and 0.xxx (P-value) is less than 0.05, it would suggest that Treatment B significantly increases the hazard of death compared to Treatment A.

If the 95% CI includes 1 or the P-value is greater than or equal to 0.05, there is no statistically significant difference in the hazard of death between Treatment B and Treatment A.

Age (< 65 to ≥ 65 years old):

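Interpretation: Following the same “X to Y” convention as the other rows, the Hazard Ratio compares the hazard of death for subjects aged < 65 years with that for subjects aged ≥ 65 years (the reference level).

Conclusion based on hypothetical results:

If x.xx (HR) is less than 1 (e.g., 0.55) and 0.xxx (P-value) is less than 0.05, it would suggest that subjects younger than 65 years have a significantly lower hazard of death than subjects aged 65 years or older. Equivalently, being 65 years or older is associated with a higher hazard of death.

If x.xx (HR) is greater than 1 and 0.xxx (P-value) is less than 0.05, it would suggest that subjects younger than 65 years have a significantly higher hazard of death.

If the 95% CI includes 1 or the P-value is greater than or equal to 0.05, there is no statistically significant effect of age on the hazard of death (within these categories), after accounting for other factors in the model.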

Sex (Female to Male):

Interpretation: The Hazard Ratio indicates the relative risk of death for females compared to males.

Conclusion based on hypothetical results:

If x.xx (HR) is less than 1 (e.g., 0.90) and 0.xxx (P-value) is less than 0.05, it would suggest that females have a significantly lower hazard of death compared to males.

If x.xx (HR) is greater than 1 (e.g., 1.15) and 0.xxx (P-value) is less than 0.05, it would suggest that females have a significantly higher hazard of death compared to males.

If the 95% CI includes 1 or the P-value is greater than or equal to 0.05, there is no statistically significant difference in the hazard of death between females and males, after accounting for other factors.

Overall Conclusion for Table 14.2.1.2, “Time to Death Through Day 168 – Cox Regression Model.”:

This Cox Regression model provides insights into the independent prognostic factors for time to death, accounting for the effects of treatment group, age, and sex simultaneously. The P-values guide whether the observed associations are statistically significant, allowing researchers to identify which covariates are important predictors of the time to death through Day 168.