This document provides an overview of how Time-to-Event (TTE) analysis is conducted using the R programming language, specifically leveraging the dplyr and admiral packages.
TTE analysis, also known as survival analysis, is a critical statistical methodology, particularly prevalent in fields such as clinical research, engineering, and economics.
Time-to-Event analysis focuses on the duration until a specific event of interest occurs.
Unlike traditional statistical methods that might only consider whether an event happened,
TTE analysis accounts for the time aspect and the possibility of “censoring.”
Censoring occurs when the event has not been observed for a subject by the end of the study, or when a subject withdraws before the event occurs.
This distinction makes TTE analysis uniquely suited for data where outcomes are not immediate or uniformly observed.
The Censor variable (often denoted as CNSR in ADaM datasets) is a critical component in Time-to-Event (TTE) or survival analysis. It’s a binary indicator that tells us whether the event of interest has occurred for a subject within the observation period or if the subject was “censored.”
Censoring occurs when the event of interest is not observed for a subject at any point during their follow-up.
Instead, we only know that the event did not happen up to a certain point in time, at which point the observation stops.
This is different from a missing value; a censored observation still provides valuable information (i.e., the event didn’t happen by that time).
Right censoring is the most common type. It means that the event of interest has not occurred for the subject by the end of the study, or that the subject was lost to follow-up or withdrew from the study before the event happened. We know the subject was event-free up to their last observation time.
Example: A patient is being monitored for disease progression. At the end of the study, their disease has not progressed. Their time is “right-censored” at the study end date.
Example: A patient moves away and is lost to follow-up. Their time is “right-censored” at their last known contact date.
Left censoring occurs when the event of interest has already happened before the start of observation. We know the event occurred, but not exactly when. This is less common in clinical trials, as studies usually enroll patients who are event-free at baseline.
Example: Studying the time to diagnosis of a chronic disease, but some individuals might have been diagnosed before the study began.
Interval censoring happens when the exact time of an event is unknown, but we know it occurred within a specific time interval.
Example: A patient is screened every 6 months for tumor recurrence. A recurrence is found at the 12-month visit, but it was not present at the 6-month visit. The event occurred sometime between 6 and 12 months.
The CNSR variable is typically coded as:
0: Indicates that the event of interest occurred for the subject.
1: Indicates that the subject was censored, meaning the event did not occur by their last follow-up time.
Other codings do occur. ADaM allows additional positive integers (e.g., 2, 3) to distinguish censoring reasons, and some non-ADaM datasets and software use the opposite convention (1 = event, 0 = censored), so always check the specific dataset’s definition. The CDISC ADaM Basic Data Structure for Time-to-Event Analysis uses 0 = event and positive values for censored records.
Consider an ADTTE (Analysis Data for Time-to-Event) dataset for Progression-Free Survival (PFS):
USUBJID | ADT | AVAL (Days) | CNSR | PARAMCD |
---|---|---|---|---|
ABC-001 | 2023-01-15 | 120 | 0 | PFS |
ABC-002 | 2023-03-20 | 180 | 1 | PFS |
ABC-003 | 2023-02-10 | 150 | 0 | PFS |
ABC-004 | 2023-05-01 | 210 | 1 | PFS |
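To make the coding concrete, here is the table above as a base R data frame. R’s survival package expects the opposite convention in Surv() (1 or TRUE marks an event), so an event indicator is commonly derived from CNSR before analysis; the column name EVENT is a hypothetical choice for this sketch. Testing CNSR == 0, rather than computing 1 - CNSR, keeps the indicator correct if positive censoring codes other than 1 are used.

```r
# The example PFS records from the table above
adtte <- data.frame(
  USUBJID = c("ABC-001", "ABC-002", "ABC-003", "ABC-004"),
  ADT     = as.Date(c("2023-01-15", "2023-03-20", "2023-02-10", "2023-05-01")),
  AVAL    = c(120, 180, 150, 210),   # days
  CNSR    = c(0L, 1L, 0L, 1L),       # ADaM: 0 = event, > 0 = censored
  PARAMCD = "PFS"
)

# Event indicator in the 1 = event coding used by survival::Surv()
adtte$EVENT <- as.integer(adtte$CNSR == 0)

# Count events and censored records
table(ifelse(adtte$EVENT == 1, "Event", "Censored"))
```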
The censor variable is vital because:
Correctly Accounts for Incomplete Information: Survival analysis methods (like Kaplan-Meier curves or Cox Proportional Hazards models) are designed to handle censored data appropriately. They use the information from censored observations up to their censoring time, rather than simply discarding them, which would bias the results.
Prevents Bias: If censored subjects were treated as if they remained event-free through the end of the study, estimated event rates would be artificially low and survival times would appear longer than they truly are. If censored subjects were simply excluded, the opposite bias would arise, because the remaining data would over-represent subjects who had the event.
Accurate Survival Estimates: Including the censor variable allows the models to provide valid estimates of survival probabilities and hazard ratios (assuming censoring is non-informative), reflecting the true risk over time for the population being studied.
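To make this concrete, here is a minimal product-limit (Kaplan-Meier) estimator in base R, written for illustration only; in practice you would use an established implementation such as survival::survfit(). The helper name km_estimate() is hypothetical. Applied to the four PFS records in the example table, keeping censored subjects in the risk sets up to their censoring time gives estimated survival of 0.75 at day 120 and 0.50 at day 150, whereas discarding the two censored subjects drives the estimate to 0 by day 150 and understates survival.

```r
# Minimal Kaplan-Meier (product-limit) estimator, for illustration only.
# time: follow-up time (AVAL); cnsr: ADaM censoring flag (0 = event, > 0 = censored)
km_estimate <- function(time, cnsr) {
  event <- cnsr == 0
  event_times <- sort(unique(time[event]))
  surv <- numeric(length(event_times))
  s <- 1
  for (i in seq_along(event_times)) {
    tj <- event_times[i]
    n_risk <- sum(time >= tj)          # subjects still under observation just before tj
    d <- sum(time == tj & event)       # events occurring at tj
    s <- s * (1 - d / n_risk)          # product-limit update
    surv[i] <- s
  }
  data.frame(time = event_times, surv = surv)
}

aval <- c(120, 180, 150, 210)
cnsr <- c(0, 1, 0, 1)

# Censored subjects contribute to the risk sets up to their censoring time
km_all <- km_estimate(aval, cnsr)

# Biased alternative: discard the censored subjects entirely
keep <- cnsr == 0
km_dropped <- km_estimate(aval[keep], cnsr[keep])
```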
ADTTE datasets extend the ADaM Basic Data Structure (BDS) with additional variables specific to time-to-event analysis, as defined in Section 4 of the ADaM Basic Data Structure for Time-to-Event Analysis document. The naming convention for these datasets should follow the ADaM standard described in ADaM v2.1.
The ADaM BDS is flexible and can handle various outcomes for TTE analysis. It’s generally recommended to store time-to-event data separately from non-time-to-event data, even if both fit within the ADaM BDS.
Sponsors decide the number of ADTTE datasets needed for a study, prioritizing clarity over combining all TTE variables into one dataset.
The ADaM Time-to-Event (TTE) analysis dataset structure is designed to support various common statistical methods for analyzing longitudinal event data. Events can be qualitative (transitions between states) or quantitative (significant sudden changes).
Kaplan-Meier product-limit curve
Actuarial or cohort life table analyses
Log-rank tests (stratified or trend)
Wilcoxon tests
Cox proportional hazards models
Descriptive and Inferential Presentations: It can be used for a wide range of tabular and graphical data presentations.
Diagnostic Checks: Supports diagnostic checks to ensure the appropriateness of the methods used.
Hypothesis Tests and Formal Modeling: The structure also supports hypothesis testing and formal statistical modeling, along with their associated assumption checks and diagnostics, though these operations are not illustrated in the document.
Dataset Name | Dataset Description | Dataset Location | Dataset Structure | Key Variables of Dataset | Class of Dataset | Documentation |
---|---|---|---|---|---|---|
ADTTE | Data for the Time to Event Analyses | adtte.xpt | one record per subject per parameter | USUBJID, PARAMCD | BDS | ADTTE.SAS, SAP Section 10.1 |
Core Column¹: This column identifies whether a variable is typically expected in a dataset, with the following classifications:
Expected: The variable is always anticipated.
Required: The variable is necessary for the dataset row to be meaningful.
Conditional: The variable is required only under specific conditions.
Permissible: The variable is allowed but not necessarily expected.
This section lists the variables identified as “Required” in Table 4.2 of the ADaM Basic Data Structure for Time-to-Event Analysis document, along with their labels and relevant CDISC notes.
As per the “Core” column definition in Table 4.2, Required means the variable is required in the given dataset for the row to be meaningful.
USUBJID (Variable Label: Unique Subject Identifier) CDISC Notes: Identifies each subject uniquely across all studies; typically carried into ADTTE from ADSL, which takes it from the SDTM DM domain.
AVAL (Variable Label: Analysis Value) CDISC Notes: Duration of follow-up (time to event or censoring) in specified units (e.g., days, weeks, months).
CNSR (Variable Label: Censor) CDISC Notes: Censoring indicator: 0 = event, positive integer = censored (distinct positive values may be used for different censoring reasons).
ADT (Variable Label: Analysis Date) CDISC Notes: Date of event or censoring. This date variable corresponds to the time variable AVAL.
PARAMCD (Variable Label: Parameter Code) CDISC Notes: Short code for the parameter, e.g., ‘OS’ for Overall Survival, ‘PFS’ for Progression-Free Survival.
PARAM (Variable Label: Parameter) CDISC Notes: Full descriptive name of the parameter, e.g., ‘Progression-Free Survival’.
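A quick structural check can catch common problems before analysis. The sketch below uses a hypothetical helper, check_adtte(), which is not part of admiral or any CDISC tool. It verifies that the required variables listed above are present, that CNSR holds non-negative integers, that AVAL is non-missing and non-negative, and that there is one record per subject per parameter (the ADTTE structure described in the dataset table).

```r
# Hypothetical helper: basic structural checks for an ADTTE-like data frame.
# Returns a character vector of problems (empty if none were found).
check_adtte <- function(df) {
  required <- c("USUBJID", "PARAMCD", "PARAM", "AVAL", "ADT", "CNSR")
  missing_vars <- setdiff(required, names(df))
  if (length(missing_vars) > 0) {
    return(paste("missing required variable:", missing_vars))
  }
  problems <- character(0)
  # CNSR: 0 = event, positive integers = censored
  if (any(is.na(df$CNSR)) || any(df$CNSR < 0) || any(df$CNSR != round(df$CNSR))) {
    problems <- c(problems, "CNSR must be a non-negative integer")
  }
  if (any(is.na(df$AVAL)) || any(df$AVAL < 0)) {
    problems <- c(problems, "AVAL must be non-missing and non-negative")
  }
  # One record per subject per parameter
  if (anyDuplicated(df[, c("USUBJID", "PARAMCD")]) > 0) {
    problems <- c(problems, "duplicate USUBJID/PARAMCD records")
  }
  problems
}

# Made-up records: ABC-002 appears twice for PFS
adtte <- data.frame(
  USUBJID = c("ABC-001", "ABC-002", "ABC-002"),
  PARAMCD = c("PFS", "PFS", "PFS"),
  PARAM   = "Progression-Free Survival",
  AVAL    = c(120, 180, 30),
  ADT     = as.Date(c("2023-01-15", "2023-03-20", "2023-01-01")),
  CNSR    = c(0, 1, 1)
)
check_adtte(adtte)
```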
The admiral package is designed to streamline the creation of ADaM (Analysis Data Model) datasets, including those for Time-to-Event (TTE) analysis (ADTTE datasets). It provides a set of functions specialized for common ADaM derivations, typically used together with dplyr verbs for data manipulation.
Here’s how you would use admiral to derive the “Required” variables for an ADTTE dataset. The code snippets are illustrative sketches: dataset and variable names must be adapted to your study.
Let’s assume you start with your Subject-Level Analysis Dataset (adsl) and potentially other SDTM domains (like dm, ex, ds, tu, and rs for dates of death, exposure, tumor assessments, response, etc.).
A complete worked example is available in the admiralonco ADTTE article: https://pharmaverse.github.io/admiralonco/articles/adtte.html
USUBJID (Subject Identifier)
Derivation: USUBJID is typically a foundational variable that comes directly from the SDTM DM (Demographics) domain and is carried forward into the ADSL (Analysis Data Subject Level) dataset. From ADSL, it is then merged or selected into the ADTTE dataset. admiral helps manage this flow.
Admiral Concept: When you build your adsl dataset or initiate your ADTTE dataset, USUBJID is usually the starting point from your core subject data.
```r
# Illustrative admiral/dplyr sketch; assumes an existing ADSL dataset named 'adsl'
library(dplyr)
library(admiral)

# Filter ADSL to the relevant analysis population first
adtte_base <- adsl %>%
  filter(SAFFL == "Y") %>%                           # example: safety analysis population
  select(STUDYID, USUBJID, TRTSDT, DTHDT, LSTALVDT)  # subject keys and key dates for TTE
```
ADT (Analysis Date)
Derivation: ADT represents the date of the event or the date of censoring. It is crucial because it is the end date from which AVAL (analysis value/time) is calculated, so ADT must be chosen per subject according to the event and censoring logic.
Admiral Concept: admiral’s date functions (for example, derive_vars_dt(), which converts character --DTC dates into Date variables) help prepare the input dates, but more commonly derive_param_tte() handles the ADT derivation internally, based on the event and censoring sources you provide. The dplyr sketch below shows the same idea explicitly.
```r
# Continuing from 'adtte_base'
# Assume 'tumor_prog_date' holds one progression date (PROGDT, a Date) per subject,
# e.g., derived from the SDTM TU or RS domains
adtte_prep <- adtte_base %>%
  left_join(select(tumor_prog_date, USUBJID, PROGDT), by = "USUBJID") %>%
  mutate(
    # Event date: earliest of progression or death (NA if neither occurred)
    EVNTDT = pmin(PROGDT, DTHDT, na.rm = TRUE),
    # ADT: the event date if there is one, otherwise the last known alive date
    ADT = coalesce(EVNTDT, LSTALVDT)
  )
# derive_vars_dt() converts character --DTC dates (e.g., "2023-01-15") into Date
# variables; it is not needed here because the inputs are already Dates.
```
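derive_param_tte() derives a time-to-event parameter. Its event_conditions argument takes a list of event_source() objects (each defining an event and linking to a source dataset), and its censor_conditions argument takes a list of censor_source() objects (defining censoring). For each subject it selects the earliest event date or, if there is no event, the latest censoring date.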
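The selection rule just described can be sketched in base R for a single subject. This is a simplified illustration of the logic, not admiral’s implementation; select_tte() is a hypothetical helper, and it assigns CNSR = 1 to every censored record (admiral instead uses the censor value of the chosen censor_source()).

```r
# Hypothetical sketch of the per-subject event/censoring selection rule.
# event_dates / censor_dates: Date vectors (may contain NA) for one subject.
select_tte <- function(event_dates, censor_dates) {
  event_dates <- event_dates[!is.na(event_dates)]
  censor_dates <- censor_dates[!is.na(censor_dates)]
  if (length(event_dates) > 0) {
    # An event occurred: take the earliest one, CNSR = 0
    list(ADT = min(event_dates), CNSR = 0L)
  } else if (length(censor_dates) > 0) {
    # No event: censor at the latest censoring date, CNSR = 1
    list(ADT = max(censor_dates), CNSR = 1L)
  } else {
    list(ADT = as.Date(NA), CNSR = NA_integer_)
  }
}

# Progression on 2023-04-02, death on 2023-06-10: the event is progression
select_tte(as.Date(c("2023-04-02", "2023-06-10")), as.Date("2023-07-01"))

# No progression or death recorded: censored at the latest censoring date
select_tte(as.Date(c(NA, NA)), as.Date(c("2023-05-01", "2023-07-01")))
```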
AVAL (Analysis Value - Time)
Derivation: AVAL is the time from a start date (e.g., treatment start or randomization) to ADT, often expressed in days, weeks, or months. admiral provides derive_vars_duration() to calculate such durations, including unit conversion and the convention for counting the start day.
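Admiral Concept: derive_param_tte() is the primary admiral tool for TTE parameters. Given a start date plus lists of event and censoring sources, it derives STARTDT, ADT, and CNSR for each subject; AVAL is then calculated from STARTDT and ADT with derive_vars_duration().

```r
# Illustrative admiral sketch: derive the PFS parameter from 'adtte_prep'
# Event sources (each event record gets CNSR = 0): progression and death
pd <- event_source(
  dataset_name = "prep", filter = !is.na(PROGDT), date = PROGDT,
  set_values_to = exprs(EVNTDESC = "DISEASE PROGRESSION")
)
death <- event_source(
  dataset_name = "prep", filter = !is.na(DTHDT), date = DTHDT,
  set_values_to = exprs(EVNTDESC = "DEATH")
)
# Censoring source: last known alive date (censor = 1 gives CNSR = 1)
last_alive <- censor_source(
  dataset_name = "prep", date = LSTALVDT, censor = 1,
  set_values_to = exprs(EVNTDESC = "LAST KNOWN ALIVE")
)

adtte_final <- derive_param_tte(
  dataset_adsl = adsl,                        # one record per subject; supplies TRTSDT
  source_datasets = list(prep = adtte_prep),  # must contain STUDYID and USUBJID
  start_date = TRTSDT,                        # origin of the time-to-event interval
  event_conditions = list(pd, death),         # the earliest event is used
  censor_conditions = list(last_alive),       # used only when no event occurred
  set_values_to = exprs(PARAMCD = "PFS", PARAM = "Progression-Free Survival")
) %>%
  # derive_param_tte() creates STARTDT, ADT and CNSR; AVAL is derived from them
  derive_vars_duration(new_var = AVAL, start_date = STARTDT, end_date = ADT)
```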
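The AVAL arithmetic itself is simple date subtraction. The base R sketch below (hypothetical helper tte_duration() and made-up dates) shows it with and without the extra day: admiral’s derive_vars_duration() adds one day by default (add_one = TRUE), so an event on the start date gives AVAL = 1, whereas a rule such as “date of death minus date of randomization” counts it as 0. The 7-day week and 365.25/12-day month are common conversions assumed here; the SAP defines the units actually used.

```r
# Duration from a start date to ADT in days, weeks and months (base R sketch).
# add_one = TRUE counts the start day; 365.25 / 12 days per month is an assumption.
tte_duration <- function(start_date, adt, add_one = TRUE) {
  days <- as.numeric(adt - start_date)
  if (add_one) days <- days + 1
  data.frame(days = days, weeks = days / 7, months = days / (365.25 / 12))
}

trtsdt <- as.Date(c("2022-09-18", "2022-10-01"))
adt    <- as.Date(c("2023-01-15", "2022-10-01"))
tte_duration(trtsdt, adt)
tte_duration(trtsdt, adt, add_one = FALSE)
```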
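CNSR (Censoring Variable)
Derivation: CNSR indicates whether the event occurred (0) or the record was censored (1 or another positive value). Its derivation is inherently linked to the ADT and AVAL derivation.
Admiral Concept: derive_param_tte() derives CNSR automatically. If a subject has a record in any event source, the earliest event is kept and CNSR is set to 0; otherwise the latest censoring record is kept and CNSR takes the censor value of its censor_source() (1 by default).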
```r
# (CNSR is created by the derive_param_tte() call above.)
# Records selected from an event_source() get CNSR = 0; a record selected from a
# censor_source() gets that source's `censor` value (default 1).
# Quick check of how records were classified:
adtte_final %>%
  count(PARAMCD, CNSR, EVNTDESC)
```
PARAMCD (Parameter Code) & PARAM (Parameter Description)
Derivation: These variables identify the specific TTE endpoint being analyzed (e.g., “PFS” for Progression-Free Survival, “OS” for Overall Survival). They are not “derived” in the sense of a calculation from raw data, but rather assigned based on the analysis objective.
Admiral Concept: When using derive_param_tte(), you assign PARAMCD and PARAM yourself through the set_values_to argument, for example set_values_to = exprs(PARAMCD = "OS", PARAM = "Overall Survival"). This is how admiral knows which TTE endpoint the new records belong to and labels the output accordingly.
```r
# PARAMCD and PARAM are assigned through set_values_to.
# For OS the event is death; subjects without a death date are censored at LSTALVDT.
death_os <- event_source(
  dataset_name = "adsl",
  filter = !is.na(DTHDT),
  date = DTHDT,
  set_values_to = exprs(EVNTDESC = "DEATH")
)
last_alive_os <- censor_source(
  dataset_name = "adsl",
  date = LSTALVDT,
  set_values_to = exprs(EVNTDESC = "LAST KNOWN ALIVE")
)
adtte_final_os <- derive_param_tte(
  dataset_adsl = adsl,
  source_datasets = list(adsl = adsl),
  start_date = TRTSDT,
  event_conditions = list(death_os),
  censor_conditions = list(last_alive_os),
  set_values_to = exprs(PARAMCD = "OS", PARAM = "Overall Survival")
) %>%
  derive_vars_duration(new_var = AVAL, start_date = STARTDT, end_date = ADT)
```
Summary of admiral’s Role:
admiral provides specialized, standardized functions that abstract away much of the complex logic for deriving ADaM TTE variables. While dplyr is used for general data manipulation (filtering, selecting, joining), admiral builds on it with “derivations” designed to support CDISC-compliant, analysis-ready TTE datasets. derive_param_tte() is particularly central: for a given TTE endpoint (PARAMCD/PARAM) it derives ADT and CNSR (along with STARTDT), and derive_vars_duration() then turns STARTDT and ADT into AVAL.
A Kaplan-Meier plot such as Figure 5.1.1 is a standard way to visualize time-to-event data in clinical trials, showing how the probability of surviving without an event (in this case, disease progression or death) changes over time for different treatment groups.
Censored patients (usually shown as tick marks on the curves) were still progression-free at their last assessment but then completed the study, were lost to follow-up, or withdrew. We have no further data on them, but we know they were event-free up to that point, so they leave the risk pool without having had the event.
When reading Kaplan-Meier curves like Figure 5.1.1, consider the following:
In summary, Figure 5.1.1 provides a clear visual comparison of how well each treatment maintains progression-free survival over time, allowing for an intuitive understanding of the treatment’s impact.
Time to Death Calculation: Calculated as date of death – date of randomization.
Censoring: Subjects who did not die on or before Week 24 (Day 168) are censored at Day 168.
Footnote a: Median Time (95% CI) and Event Rate (%) at Day 168 (95% CI) are “Based on the Kaplan-Meier estimates.”
Footnote b: The Cox regression model used for the Event Rate P-value “includes treatment group, age, and sex as covariates.”
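The Day 168 censoring rule and footnote a can be illustrated together. The base R sketch below uses hypothetical data and helper names (censor_at(), km_surv_at(), km_median()). It censors every record beyond Day 168 at Day 168, then reads two quantities off the Kaplan-Meier curve: the median time, taken here as the first event time at which estimated survival falls to 0.5 or below (one common definition; software can treat a curve sitting exactly at 0.5 differently), and the event rate at Day 168, 100 x (1 - S(168)). Confidence intervals are omitted; in practice survival::survfit() supplies them.

```r
# Administrative censoring: records beyond the cutoff (events or censorings)
# are censored at the cutoff day
censor_at <- function(aval, cnsr, cutoff = 168) {
  after <- aval > cutoff
  list(aval = ifelse(after, cutoff, aval),
       cnsr = ifelse(after, 1L, as.integer(cnsr)))
}

# Kaplan-Meier survival estimate at time t (cnsr: 0 = event, > 0 = censored)
km_surv_at <- function(time, cnsr, t) {
  s <- 1
  for (u in sort(unique(time[cnsr == 0 & time <= t]))) {
    s <- s * (1 - sum(time == u & cnsr == 0) / sum(time >= u))
  }
  s
}

# Median time: first event time at which S(t) <= 0.5 (NA if never reached)
km_median <- function(time, cnsr) {
  for (u in sort(unique(time[cnsr == 0]))) {
    if (km_surv_at(time, cnsr, u) <= 0.5) return(u)
  }
  NA
}

# Hypothetical days from randomization to death (CNSR = 0) or last contact (CNSR = 1)
days <- c(30, 60, 90, 120, 150, 200, 250, 300)
cnsr <- c(0,  0,  1,  0,   0,   0,   1,   1)

cens <- censor_at(days, cnsr)
median_time <- km_median(cens$aval, cens$cnsr)
event_rate_168 <- 100 * (1 - km_surv_at(cens$aval, cens$cnsr, 168))
```

For these made-up data the estimated survival falls to 0.45 at Day 150, so the median is 150 days and the Day 168 event rate is 55%.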
Compare the Median Time (95% CI) for Treatment A and Treatment B. A longer median time indicates that subjects in that group survived longer without the event (death). The confidence intervals (CI) provide a range for these estimates; if the CIs for the two treatments largely overlap, the difference in median time might not be statistically significant.
The P value (from the Log-rank Test) indicates whether the survival distributions of Treatment A and Treatment B differ significantly. The log-rank test compares the entire curves over follow-up, not the medians specifically, so a small P-value (e.g., < 0.05) suggests a statistically significant difference in time to death between the groups.
Compare the Event Rate (%) at Day 168 (95% CI) for Treatment A and Treatment B. A lower event rate indicates better survival (fewer deaths) by Day 168.
The P value (from the Cox Regression Model) assesses whether the hazard of death differs significantly between treatments over follow-up through Day 168, adjusted for age and sex. A small P-value would suggest a significant difference in the risk of death between the treatments by Day 168, after accounting for these covariates.
The N (%) Censored shows how many subjects in each group were censored. A higher percentage of censored subjects, especially early on, leaves fewer subjects at risk and therefore reduces the precision of later estimates.
If the Median Time for one treatment (e.g., Treatment A) is substantially longer than the other (Treatment B) and the corresponding P-value is less than 0.05, it would suggest that Treatment A significantly prolongs the time to death compared to Treatment B.
Similarly, if the Event Rate (%) at Day 168 for one treatment (e.g., Treatment A) is significantly lower than the other, with a P-value less than 0.05 (from the Cox model), it would indicate that Treatment A significantly reduces the risk of death by Day 168.
The P-values help judge whether the observed differences between Treatment A and Treatment B are larger than would be expected from random chance alone.
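For intuition about what the log-rank P value measures, here is a base R sketch of the two-group log-rank test, written for illustration only (logrank_test() is a hypothetical helper; survival::survdiff() is the usual tool). At each distinct event time it compares the deaths observed in one group with the number expected if both groups shared the same survival curve, sums these differences across the whole follow-up, and scales by the hypergeometric variance to obtain a chi-square statistic with 1 degree of freedom.

```r
# Two-group log-rank test (base R sketch, for illustration only).
# time: follow-up time; cnsr: 0 = event, > 0 = censored; group: two-level vector
logrank_test <- function(time, cnsr, group) {
  group <- factor(group)
  stopifnot(nlevels(group) == 2)
  g1 <- group == levels(group)[1]
  event <- cnsr == 0
  o_minus_e <- 0
  v <- 0
  for (u in sort(unique(time[event]))) {
    at_risk <- time >= u
    n  <- sum(at_risk)
    n1 <- sum(at_risk & g1)
    d  <- sum(time == u & event)
    d1 <- sum(time == u & event & g1)
    o_minus_e <- o_minus_e + (d1 - d * n1 / n)   # observed minus expected, group 1
    if (n > 1) {
      # hypergeometric variance of d1 at this event time
      v <- v + d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    }
  }
  chisq <- o_minus_e^2 / v
  list(chisq = chisq, p_value = pchisq(chisq, df = 1, lower.tail = FALSE))
}

# Hypothetical data: days to death (CNSR = 0) or censoring at Day 168 (CNSR = 1)
time <- c(30, 60, 90, 120, 168, 168, 45, 50, 75, 100, 140, 168)
cnsr <- c(0,  0,  1,  0,   1,   1,   0,  0,  0,  0,   1,   1)
trt  <- rep(c("Treatment A", "Treatment B"), each = 6)
logrank_test(time, cnsr, trt)
```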
Purpose: This table displays the results of a Cox Proportional Hazards Regression Model, which evaluates how various factors (covariates) influence the hazard (risk) of death over time.
Analysis Population: The analysis was conducted on the Intent-to-Treat (ITT) population, meaning all subjects randomized were included.
Hazard Ratio (HR): Compares the hazard (instantaneous risk) of the event (death) for one level of a covariate with that for a reference level.
HR > 1: Higher hazard in the comparison level than in the reference level.
HR < 1: Lower hazard in the comparison level than in the reference level.
HR = 1: No difference in hazard.
95% Confidence Interval (CI): A range of plausible values for the true HR. If the CI includes 1, the effect is not statistically significant at the 5% level.
P-value: Indicates the statistical significance of the covariate’s effect. A P-value < 0.05 typically suggests a statistically significant association.
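A Cox model estimates each effect on the log-hazard scale, so the reported HR is exp(beta) and the Wald 95% CI is exp(beta ± 1.96 x SE). The base R sketch below (hypothetical helper hr_summary(); made-up coefficient and standard errors, not results from any study) shows that arithmetic; summary() of a survival::coxph() fit reports the same quantities. Because the Wald CI and P value come from the same z statistic, the 95% CI excludes 1 exactly when P < 0.05.

```r
# Hazard ratio, Wald CI and two-sided P value from a Cox log-hazard coefficient.
# beta and se below are hypothetical values chosen only to show the arithmetic.
hr_summary <- function(beta, se, level = 0.95) {
  z_crit <- qnorm(1 - (1 - level) / 2)   # about 1.96 for a 95% interval
  data.frame(
    HR    = exp(beta),
    lower = exp(beta - z_crit * se),
    upper = exp(beta + z_crit * se),
    p     = 2 * pnorm(-abs(beta / se))   # two-sided Wald test of HR = 1
  )
}

# Treatment B vs. Treatment A with beta = log(0.75), i.e., HR = 0.75
hr_summary(beta = log(0.75), se = 0.12)
hr_summary(beta = log(0.75), se = 0.20)
```

With SE = 0.12 the interval for HR = 0.75 lies entirely below 1 and P < 0.05; with SE = 0.20 the same HR has an interval that spans 1 and P > 0.05.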
Treatment Group (Treatment B to Treatment A):
Interpretation: The Hazard Ratio compares the hazard of death for patients in Treatment B with that for patients in Treatment A.
Conclusion based on hypothetical results:
If x.xx (HR) is less than 1 (e.g., 0.75) and 0.xxx (P-value) is less than 0.05, it would suggest that Treatment B significantly reduces the hazard of death compared to Treatment A.
If x.xx (HR) is greater than 1 (e.g., 1.25) and 0.xxx (P-value) is less than 0.05, it would suggest that Treatment B significantly increases the hazard of death compared to Treatment A.
If the 95% CI includes 1 or the P-value is greater than or equal to 0.05, there is no statistically significant difference in the hazard of death between Treatment B and Treatment A.
Age Group (≥ 65 years to < 65 years):
Interpretation: The Hazard Ratio compares the hazard of death for subjects aged ≥ 65 years with that for subjects aged < 65 years.
Conclusion based on hypothetical results:
If x.xx (HR) is greater than 1 (e.g., 1.80) and 0.xxx (P-value) is less than 0.05, it would suggest that being 65 years or older significantly increases the hazard of death compared to being younger than 65 years.
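If x.xx (HR) is less than 1 and 0.xxx (P-value) is less than 0.05, it would instead suggest that being 65 years or older is associated with a significantly lower hazard of death. If the 95% CI includes 1 or the P-value is greater than or equal to 0.05, there is no statistically significant effect of age on the hazard of death (within these categories), after accounting for other factors in the model.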
Sex (Female to Male):
Interpretation: The Hazard Ratio compares the hazard of death for females with that for males.
Conclusion based on hypothetical results:
If x.xx (HR) is less than 1 (e.g., 0.90) and 0.xxx (P-value) is less than 0.05, it would suggest that females have a significantly lower hazard of death compared to males.
If x.xx (HR) is greater than 1 (e.g., 1.15) and 0.xxx (P-value) is less than 0.05, it would suggest that females have a significantly higher hazard of death compared to males.
If the 95% CI includes 1 or the P-value is greater than or equal to 0.05, there is no statistically significant difference in the hazard of death between females and males, after accounting for other factors.
This Cox Regression model provides insights into the independent prognostic factors for time to death, accounting for the effects of treatment group, age, and sex simultaneously. The P-values guide whether the observed associations are statistically significant, allowing researchers to identify which covariates are important predictors of the time to death through Day 168.