Breast Cancer: RWE Survival Analysis

1. Environment Setup & Data Ingestion

This analysis evaluates the real-world overall survival (OS) of patients with metastatic breast cancer. This initial section establishes the R environment, loads necessary statistical packages (including WeightIt, survival, and margins), and imports the primary dataset.

library(data.table)
library(readxl)
library(survival)
library(ggsurvfit)
library(survminer)
library(WeightIt)
library(cobalt)
library(ggplot2)
library(car)
library(ResourceSelection)
library(pROC)
library(margins)
library(EValue)
library(ebal)
library(DT)
library(ggpubr)

# Ensure the file path is correct for your local machine
tryCatch({
  dt <- as.data.table(read_excel("C:/Users/samuel.baird/OneDrive - Cardinal Health/Documents/Code_AACR/Data/AKT1-mutatedBreastCancer_Final dataset.xlsx"))
}, error = function(e) {
  stop("File not found. Please check your path.")
})

setnames(dt, old = c("Patient ID", "Primary Race", "Center", "Overall Tumor Grade",
                     "Has this Patient experienced a Loco-Regional Recurrence (LRR)?",
                     "Was a biopsy performed of metastatic site?", "Sequencing Method",
                     "Discordant receptor status with primary",
                     "Interval between sequencing and metastatic diagnosis (months)",
                     "Additional Sequencing Performed", "Fraction Genome Altered",
                     "HER2 Status", "AJCC Stage", "AKT Mutation Status",
                     "Tumor Mutation Count", "Hormone Receptor Status",
                     "ER/PR Receptor Change", "mTOR Inhibitor Overall", 
                     "CDK4/6 Inhibitor Overall", "AKT Inhibitor Overall", 
                     "Fulvestrant Overall", "Aromatase Inhibitor Overall",
                     "Age at Initial Diagnosis", "Age at Metastatic Diagnosis",
                     "Therapeutic clinical trial overall", "Met Site: Bone",
                     "Met Site: Lymph Node", "Met Site: Multiple", 
                     "Met Site: Other", "Met Site: Soft Tissue",
                     "Met Site: Visceral", "Met Site: Bone Only",
                     "Met Site: Lung", "Met Site: Brain", "Met Site: Liver",
                     "Cancer Type Detailed", "Overall Survival (Months)",
                     "Overall Survival Status"),
             new = c("patient_id", "race", "center", "tumor_grade", "lrr",
                     "met_biopsy", "seq_meth", "discordant", "seq_int",
                     "add_seq", "frac_alt", "HER2", "stage", "akt", "TMB",
                     "HR_status", "HR_change", "mTOR", "CDK4/6", 
                     "AKT_inhibitor", "fulvestrant", "aromatase_inhibitor",
                     "prim_age", "met_age", "clinical_trial", "bone_mets",
                     "lymph_mets", "multiple_mets", "other_mets", "soft_mets",
                     "visceral_mets", "boneonly_mets", "lung_mets", "brain_mets",
                     "liver_mets", "cancer_type", "survival_time", "survival_status"))

2. Cohort Definition

To emulate a clinical trial population, the cohort is restricted to patients who are HER2-negative, Hormone Receptor-positive (HR+), and Stage 4.

Patients are stratified into two cohorts: * Group A (Intensive Regimen): Received a sequence of Fulvestrant, an Aromatase Inhibitor, and a targeted therapy (mTOR, CDK4/6, or AKT inhibitor). * Group B (Control): Received alternative standard-of-care regimens.

eligible <- dt[HER2 == "Negative" & (HR_status %in% c("ER+ & PR-", "ER+ & PR+")) & (stage == "4")]

eligible[, fat_flag := (fulvestrant == "Yes" & aromatase_inhibitor == "Yes") & 
           (mTOR == "Yes" | get("CDK4/6") == "Yes" | AKT_inhibitor == "Yes")]

eligible[, therapy_flag := factor(ifelse(fat_flag == TRUE, "Group A", "Group B"), 
                                  levels = c("Group B", "Group A"))]

eligible[, survival_status := ifelse(survival_status == "1:DECEASED", 1, 0)]

3. Exploratory Covariate Testing

Real-world data often contains missingness and sparse sub-categories that can destabilize causal models. This section recategorizes key clinical variables (e.g., grouping age at metastasis, stabilizing tumor grade, and consolidating metastatic sites) to ensure robust statistical balancing later in the pipeline.

eligible[is.na(met_biopsy) | met_biopsy == "NA", met_biopsy := "Unknown"]
eligible[, met_biopsy := as.factor(met_biopsy)]
eligible[is.na(bone_mets) | bone_mets == "NA", bone_mets := "Unknown"]
eligible[is.na(lymph_mets) | lymph_mets == "NA", lymph_mets := "Unknown"]
eligible[, lymph_mets := as.factor(lymph_mets)]
eligible[, multiple_mets := as.factor(multiple_mets)]
eligible[is.na(other_mets) | other_mets == "NA", other_mets := "Unknown"]
eligible[is.na(soft_mets) | soft_mets == "NA", soft_mets := "Unknown"]
eligible[, soft_mets := as.factor(soft_mets)]
eligible[, visceral_mets := as.factor(visceral_mets)]
eligible[, seq_meth := as.factor(seq_meth)]
eligible[is.na(discordant) | discordant == "NA", discordant := "Unknown"]
eligible[, discordant := as.factor(discordant)]

eligible[, TMB := as.numeric(TMB)]
eligible[, TMB := cut(TMB, c(-Inf, 10, Inf), c("low", "high"), right = T)]
eligible[TMB == "NA" | is.na(TMB), TMB := "Unknown"]

eligible[, frac_alt := as.numeric(frac_alt)]
eligible[, frac_alt := cut(frac_alt, c(-Inf, 0.35, Inf), c("low fra", "high fra"), right = T)]
eligible[frac_alt == "NA" | is.na(frac_alt), frac_alt := "Unknown"]

eligible[race %in% c("Black", "Other", "NA"), race := "Non-White"] 
eligible[, cancer_type := ifelse(cancer_type %in% c("Breast Invasive Lobular Carcinoma", "Invasive Breast Carcinoma", "Breast Mixed Ductal and Lobular Carcinoma", "Breast Invasive Carcinoma, NOS"), "Non-Ductal", "Ductal")]
eligible[, liver_mets := ifelse(liver_mets == "NA" | is.na(liver_mets), "Unknown", liver_mets)]
eligible[, brain_mets := ifelse(brain_mets == "NA" | is.na(brain_mets), "Unknown", brain_mets)]
eligible[, lung_mets := ifelse(lung_mets == "NA" | is.na(lung_mets), "Unknown", lung_mets)]
eligible[, met_age := cut(met_age, c(-Inf,55,Inf), labels = c("<=55",">55"), right = TRUE)]

eligible[tumor_grade %in% c("1","2"), tumor_grade := "Grade 1-2"]
eligible[tumor_grade == "3", tumor_grade := "Grade 3"]
eligible[tumor_grade %in% c("NA", "Unk"), tumor_grade := "Unknown"]

eligible[, seq_int := cut(seq_int, c(-Inf, 3, Inf), c("seq int 1-3 months", "seq int 4+ months"), right = T)]
eligible[, add_seq := as.factor(add_seq)]
eligible[, boneonly_mets := as.factor(boneonly_mets)]
eligible[, akt := as.factor(akt)]
eligible[HR_change == "NA" | is.na(HR_change) | HR_change == "Unk", HR_change := "Unknown"]
eligible[, HR_change := as.factor(HR_change)]
eligible[, clinical_trial := as.factor(clinical_trial)]

# 3-Tier Global Hierarchy for Stability across AME and WeightIt
eligible[, met_pattern_stable := fcase(
  liver_mets == "Yes", "Liver Involved",
  boneonly_mets == "Yes", "Bone-Only",
  default = "Other Sites (Non-Liver)" 
)]

eligible[, met_pattern_stable := factor(met_pattern_stable, 
                                        levels = c("Bone-Only", "Other Sites (Non-Liver)", "Liver Involved"))]

eligible[, tumor_grade_stable := factor(ifelse(tumor_grade == "Grade 1-2", "Grade 1-2", "Grade 3/Unknown"),
                                        levels = c("Grade 1-2", "Grade 3/Unknown"))]

# --- 1L Chemotherapy Data Prep ---
# Create the factor and handle the long column name
eligible[, first_chemo := as.factor(`Chemotherapy in first-line treatment`)]

# Define the binary outcome for the GLM (1 = Yes, 0 = No)
# Note: Check if your data uses "Yes" or "1:Yes"
eligible[, first_chemo_binary := ifelse(first_chemo == "Yes", 1, 0)]

# Clean up the old column name to keep the DT tidy
eligible[, `Chemotherapy in first-line treatment` := NULL]

4. Unweighted Survival Analysis

A naive Kaplan-Meier analysis is generated to observe the crude survival outcomes before adjusting for clinical confounding factors.

# 1. Standard KM analysis
surv_object <- Surv(time = eligible[, survival_time], event = eligible[, survival_status])
km_fit <- survfit2(surv_object ~ therapy_flag, data = eligible)

ggsurvplot(km_fit, data = eligible, pval = TRUE, risk.table = TRUE, risk.table.col = "strata",
           surv.median.line = "hv", linetype = "solid", break.time.by = 12,
           title = "Unweighted Overall Survival", 
           legend.title = "Cohort",
           legend.labs = c("Group B (Control)", "Group A (Intensive Regimen)"),
           palette = c("#549ed1", "#f79c59"),
           ggtheme = theme_classic())

# 2. Reverse KM for Median Follow-Up
eligible[, follow_up_status := ifelse(survival_status == 1, 0, 1)]
follow_up_fit <- survfit(Surv(survival_time, follow_up_status) ~ 1, data = eligible)
print(follow_up_fit)

## Call: survfit(formula = Surv(survival_time, follow_up_status) ~ 1, 
##     data = eligible)
## 
##       n events median 0.95LCL 0.95UCL
## [1,] 83     52   58.9    45.6    74.5

5. Entropy Balancing (WeightIt)

Due to the lack of randomization in real-world data, treatment assignment is often confounded by patient prognosis (e.g., sicker patients receiving different regimens). We utilize Entropy Balancing to generate patient-level weights, ensuring exact covariate balance across age, tumor grade, AKT status, and metastatic pattern to simulate a randomized average treatment effect (ATE).

weightit_model <- weightit(therapy_flag ~ met_age + tumor_grade_stable + akt + met_pattern_stable,
                           data = eligible, method = "ebal", estimand = "ATE")

summary(weightit_model)

##                   Summary of weights
## 
## - Weight ranges:
## 
##           Min                                 Max
## treated 0.143 |---------------------------| 3.038
## control 0.41     |------------|             1.752
## 
## - Units with the 5 most extreme weights by group:
##                                       
##             22    25    18    16     3
##  treated  1.48  1.48 2.383 2.692 3.038
##             29    19     8    14    17
##  control 1.532 1.677 1.752 1.752 1.752
## 
## - Weight statistics:
## 
##         Coef of Var   MAD Entropy # Zeros
## treated       0.713 0.537   0.221       0
## control       0.349 0.270   0.058       0
## 
## - Effective Sample Sizes:
## 
##            Control Treated
## Unweighted   53.     30.  
## Weighted     47.33   20.12

eligible[, ws := weightit_model$weights]

final_labels_weight <- c(
  "met_age_>55" = "Age > 55 at Diagnosis",
  "akt_Wildtype" = "AKT Wildtype",
  "tumor_grade_stable_Grade 3/Unknown" = "Tumor Grade: 3 or Unknown",
  "met_pattern_stable_Bone-Only" = "Bone-Only",
  "met_pattern_stable_Liver Involved" = "Liver Involved",
  "met_pattern_stable_Other Sites (Non-Liver)" = "Other Sites (Non-Liver)"
)

love.plot(weightit_model, thresholds = 0.1, abs = TRUE, var.order = "unadjusted", 
          line = TRUE, shapes = c(16, 17), limits = c(0, 0.6), colors = c("#f79c59", "#549ed1"), 
          var.names = final_labels_weight,
          theme = theme_classic() + theme(plot.title = element_text(face = "bold")))

6. Adjusted (Weighted) Survival Analysis

Using the entropy weights, an adjusted Cox Proportional Hazards model and corresponding Kaplan-Meier curves are generated. This represents the primary outcome of the study, estimating the isolated effect of Group A therapy on overall survival. We also compute an E-value to quantify the minimum strength of an unmeasured confounder required to nullify these results.

km_fit_weighted <- survfit2(Surv(survival_time, survival_status) ~ therapy_flag,
                            data = eligible, weights = ws)

cox_model_weighted <- coxph(Surv(survival_time, survival_status) ~ therapy_flag, 
                            data = eligible, weights = ws, robust = TRUE)
summary(cox_model_weighted)

## Call:
## coxph(formula = Surv(survival_time, survival_status) ~ therapy_flag, 
##     data = eligible, weights = ws, robust = TRUE)
## 
##   n= 83, number of events= 31 
## 
##                        coef exp(coef) se(coef) robust se      z Pr(>|z|)  
## therapy_flagGroup A -0.9458    0.3884   0.4264    0.4588 -2.062   0.0392 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                     exp(coef) exp(-coef) lower .95 upper .95
## therapy_flagGroup A    0.3884      2.575     0.158    0.9544
## 
## Concordance= 0.628  (se = 0.046 )
## Likelihood ratio test= 5.51  on 1 df,   p=0.02
## Wald test            = 4.25  on 1 df,   p=0.04
## Score (logrank) test = 5.27  on 1 df,   p=0.02,   Robust = 4.99  p=0.03
## 
##   (Note: the likelihood ratio and score tests assume independence of
##      observations within a cluster, the Wald and robust score tests do not).

# Test Proportional Hazards Assumption
ph_test <- cox.zph(cox_model_weighted, transform = "km", terms = TRUE)
print(ph_test)

##              chisq df    p
## therapy_flag  1.01  1 0.32
## GLOBAL        1.01  1 0.32

ggsurvplot(km_fit_weighted, data = eligible, pval = TRUE, risk.table = TRUE, risk.table.col = "strata",
           surv.median.line = "hv", linetype = "solid", break.time.by = 12,
           title = "Adjusted Overall Survival (Entropy Balanced)", 
           legend.title = "Cohort",
           legend.labs = c("Group B (Control)", "Group A (Intensive Regimen)"),
           palette = c("#549ed1", "#f79c59"),
           ggtheme = theme_classic())

evalues.HR(est = 0.39, lo = 0.16, hi = 0.95, true = 1, rare = FALSE)

##              point     lower     upper
## RR       0.5240857 0.2941386 0.9650718
## E-values 3.2244070        NA 1.2298470

To explore the independent prognostic value of clinical covariates, we run univariable Cox models on both the raw and the entropy-balanced cohorts.

covariates_filtered <- c("met_age", "tumor_grade_stable", "akt", "met_pattern_stable")

full_label_map <- c(
  "met_age>55" = "Age > 55 at Diagnosis",
  "aktWildtype" = "AKT Wildtype",
  "tumor_grade_stableGrade 3/Unknown" = "Tumor Grade: 3 or Unknown",
  "met_pattern_stableLiver Involved" = "Liver Involved",
  "met_pattern_stableOther Sites (Non-Liver)" = "Other Sites (Non-Liver)",
  "met_pattern_stableBone-Only" = "Bone-Only"
)

# --- UNWEIGHTED UNIVARIATE ---
uni_cox_unweighted <- lapply(covariates_filtered, function(cov) {
  formula <- as.formula(paste("Surv(survival_time, survival_status) ~", cov))
  fit <- coxph(formula, data = eligible) 
  return(summary(fit))
})
names(uni_cox_unweighted) <- covariates_filtered

uni_table_unweighted <- rbindlist(lapply(names(uni_cox_unweighted), function(cov_name) {
  s <- uni_cox_unweighted[[cov_name]]
  res <- as.data.table(s$conf.int, keep.rownames = "Raw_Level")
  res[, p_val_numeric := s$coefficients[, 5]] # index 5 for unweighted standard
  return(res)
}))

uni_table_unweighted[, Variable := ifelse(Raw_Level %in% names(full_label_map), full_label_map[Raw_Level], Raw_Level)]

datatable(uni_table_unweighted[, .(Variable, HR = round(`exp(coef)`, 2), Lower = round(`lower .95`, 2), Upper = round(`upper .95`, 2), P = round(p_val_numeric, 3))], 
          rownames = FALSE, options = list(dom = 't'), caption = htmltools::tags$caption(style = 'caption-side: top; text-align: left; font-weight: bold; color: black;', 'Table 1: Unweighted Univariate Analysis'))

# --- WEIGHTED UNIVARIATE ---
uni_cox_weighted <- lapply(covariates_filtered, function(cov) {
  formula <- as.formula(paste("Surv(survival_time, survival_status) ~", cov))
  fit <- coxph(formula, data = eligible, weights = ws, robust = TRUE) 
  return(summary(fit))
})
names(uni_cox_weighted) <- covariates_filtered

uni_table_weighted <- rbindlist(lapply(names(uni_cox_weighted), function(cov_name) {
  s <- uni_cox_weighted[[cov_name]]
  res <- as.data.table(s$conf.int, keep.rownames = "Raw_Level")
  res[, p_val_numeric := s$coefficients[, 6]] # index 6 for robust models
  return(res)
}))

uni_table_weighted[, Variable := ifelse(Raw_Level %in% names(full_label_map), full_label_map[Raw_Level], Raw_Level)]

datatable(uni_table_weighted[, .(Variable, HR = round(`exp(coef)`, 2), Lower = round(`lower .95`, 2), Upper = round(`upper .95`, 2), P = round(p_val_numeric, 3))], 
          rownames = FALSE, options = list(dom = 't'), caption = htmltools::tags$caption(style = 'caption-side: top; text-align: left; font-weight: bold; color: black;', 'Table 2: Weighted Univariate Analysis'))

8. Landmark OS Analysis

To assess the durability of the treatment effect over time and account for potential immortal time bias in early stages, we calculate Hazard Ratios at fixed annual landmarks (1 to 6 years) post-metastatic diagnosis.

landmarks <- seq(12, 72, by = 12)

# --- UNWEIGHTED LANDMARK ---
landmark_unweighted <- rbindlist(lapply(landmarks, function(t) {
  d_land <- eligible[survival_time >= t][, st_adj := survival_time - t]
  if(nrow(d_land) < 5 || length(unique(d_land$therapy_flag)) < 2 || sum(d_land$survival_status) < 1) return(NULL)
  
  fit <- coxph(Surv(st_adj, survival_status) ~ therapy_flag, data = d_land)
  s <- summary(fit)
  
  data.table(Time = paste(t, "Months"), N = nrow(d_land), Events = sum(d_land$survival_status),
             HR = round(s$conf.int[1, 1], 2), Lower = round(s$conf.int[1, 3], 2), Upper = round(s$conf.int[1, 4], 2), P = round(s$coefficients[1, 5], 3))
}))

datatable(landmark_unweighted, rownames = FALSE, options = list(dom = 't'), 
          caption = htmltools::tags$caption(style = 'caption-side: top; text-align: left; font-weight: bold; color: black;', 'Table 3: Unweighted Landmark Analysis'))

# --- WEIGHTED LANDMARK ---
landmark_weighted <- rbindlist(lapply(landmarks, function(t) {
  d_land <- eligible[survival_time >= t][, st_adj := survival_time - t]
  if(nrow(d_land) < 5 || length(unique(d_land$therapy_flag)) < 2 || sum(d_land$survival_status) < 1) return(NULL)
  
  fit <- coxph(Surv(st_adj, survival_status) ~ therapy_flag, data = d_land, weights = d_land$ws, robust = TRUE)
  s <- summary(fit)
  
  data.table(Time = paste(t, "Months"), N = nrow(d_land), Events = sum(d_land$survival_status),
             HR = round(s$conf.int[1, 1], 2), Lower = round(s$conf.int[1, 3], 2), Upper = round(s$conf.int[1, 4], 2), P = round(s$coefficients[1, 6], 3))
}))

datatable(landmark_weighted, rownames = FALSE, options = list(dom = 't'), 
          caption = htmltools::tags$caption(style = 'caption-side: top; text-align: left; font-weight: bold; color: black;', 'Table 4: Weighted Landmark Analysis'))

9. Clinical Drivers of Selection (Diagnostics)

Understanding why a physician selected a specific therapy in the real world is critical. This section uses logistic regression models and calculates Average Marginal Effects (AME) to quantify how baseline clinical characteristics influenced the probability of being assigned to the Control arm versus the Intensive Regimen arm, as well as the probability of initiating first-line chemotherapy (therapeutic diversion). Goodness-of-fit is assessed via Hosmer-Lemeshow and AUC.

# ==============================================================================
# GROUP B (CONTROL) DIAGNOSTICS & AME
# ==============================================================================
eligible[, therapy_flag_B := ifelse(therapy_flag == "Group B", 1, 0)]

# Set factors to Low-Risk baseline for model
eligible[, met_age_B := factor(met_age, levels = c("<=55", ">55"))]
eligible[, tumor_grade_B := factor(tumor_grade_stable, levels = c("Grade 1-2", "Grade 3/Unknown"))]
eligible[, akt_B := factor(akt, levels = c("Mutant", "Wildtype"))]
eligible[, met_pattern_B := factor(met_pattern_stable, levels = c("Bone-Only", "Liver Involved", "Other Sites (Non-Liver)"))]

glm_B <- glm(therapy_flag_B ~ met_age_B + tumor_grade_B + akt_B + met_pattern_B, data = eligible, family = binomial())

# --- Model Validation ---
y_B <- glm_B$y 
yhat_B <- fitted(glm_B)

# Calibration
hl_B <- hoslem.test(y_B, yhat_B, g = 6)
# Discrimination
roc_B <- roc(y_B, yhat_B, quiet = TRUE)
auc_val_b <- round(auc(roc_B), 3)

cat("--- Group B Assignment Prediction ---\n")

## --- Group B Assignment Prediction ---

cat("AUC: ", auc_val_b, "\n")

## AUC:  0.753

cat("Hosmer-Lemeshow p-value: ", round(hl_B$p.value, 3), "\n\n")

## Hosmer-Lemeshow p-value:  0.534

# --- AME Generation ---
ame_B <- as.data.table(summary(margins(glm_B)))
labels_B <- c("met_pattern_BLiver Involved" = "Liver Involved", "met_pattern_BOther Sites (Non-Liver)" = "Other Sites (Non-Liver)", 
              "tumor_grade_BGrade 3/Unknown" = "Tumor Grade: 3 or Unknown", "met_age_B>55" = "Age > 55 at Diagnosis", "akt_BWildtype" = "AKT Wildtype")
ame_B[, factor := ifelse(factor %in% names(labels_B), labels_B[factor], factor)]

ame_table_b_display <- ame_B[, .(Variable = factor, "Marginal Effect" = round(AME, 3), Lower = round(lower, 3), Upper = round(upper, 3), "P-Value" = round(p, 3))]

datatable(ame_table_b_display, rownames = FALSE, options = list(dom = 't'), 
          caption = htmltools::tags$caption(style = 'caption-side: top; text-align: left; font-weight: bold; color: black;', 'Table 5A: Average Marginal Effects for Group B Assignment'))

ggplot(ame_B, aes(x = AME, y = reorder(factor, AME))) +
  geom_point(size = 3, color = "black") + geom_errorbarh(aes(xmin = lower, xmax = upper), height = 0.2) +
  geom_vline(xintercept = 0, linetype = "dashed", color = "gray") + scale_x_continuous(labels = scales::percent) +
  labs(
    title = "Clinical Drivers of Group B (Control) Selection",
    subtitle = paste0("AUC: ", auc_val_b, " | HL p-val: ", round(hl_B$p.value, 3)),
    x = "Change in Probability of Receiving Group B",
    y = NULL,
    caption = paste0(
      "Note: Values represent Average Marginal Effects (AME). A positive % indicates a higher absolute probability\n",
      "of assignment to the Control arm relative to the baseline, holding other factors constant.\n",
      "Baseline references reflect 'low-risk' criteria: Bone-Only, AKT Mutant, Age <= 55, and Grade 1-2."
    )
  ) +
  theme_pubr() + theme(plot.caption = element_text(hjust = 0, size = 9, face = "italic", color = "grey30"))

# ==============================================================================
# GROUP A (Intensive Regimen) DIAGNOSTICS & AME
# ==============================================================================
eligible[, therapy_flag_A := ifelse(therapy_flag == "Group A", 1, 0)]

# Set factors to High-Risk baseline for model
eligible[, met_age_A := factor(met_age, levels = c(">55", "<=55"))]
eligible[, tumor_grade_A := factor(tumor_grade_stable, levels = c("Grade 3/Unknown", "Grade 1-2"))]
eligible[, akt_A := factor(akt, levels = c("Wildtype", "Mutant"))]
eligible[, met_pattern_A := factor(met_pattern_stable, levels = c("Liver Involved", "Bone-Only", "Other Sites (Non-Liver)"))]

glm_A <- glm(therapy_flag_A ~ met_age_A + tumor_grade_A + akt_A + met_pattern_A, data = eligible, family = binomial())

# --- Model Validation ---
y_A <- glm_A$y 
yhat_A <- fitted(glm_A)

# Calibration
hl_A <- hoslem.test(y_A, yhat_A, g = 6)
# Discrimination
roc_A <- roc(y_A, yhat_A, quiet = TRUE)
auc_val_a <- round(auc(roc_A), 3)

cat("--- Group A Assignment Prediction ---\n")

## --- Group A Assignment Prediction ---

cat("AUC: ", auc_val_a, "\n")

## AUC:  0.753

cat("Hosmer-Lemeshow p-value: ", round(hl_A$p.value, 3), "\n\n")

## Hosmer-Lemeshow p-value:  0.743

# --- AME Generation ---
ame_A <- as.data.table(summary(margins(glm_A)))
labels_A <- c("met_age_A<=55" = "Age <= 55 at Diagnosis", "tumor_grade_AGrade 1-2" = "Tumor Grade: 1-2", "akt_AMutant" = "AKT Mutant",
              "met_pattern_ABone-Only" = "Bone-Only", "met_pattern_AOther Sites (Non-Liver)" = "Other Sites (Non-Liver)")
ame_A[, factor := ifelse(factor %in% names(labels_A), labels_A[factor], factor)]

ame_table_a_display <- ame_A[, .(Variable = factor, "Marginal Effect" = round(AME, 3), Lower = round(lower, 3), Upper = round(upper, 3), "P-Value" = round(p, 3))]

datatable(ame_table_a_display, rownames = FALSE, options = list(dom = 't'), 
          caption = htmltools::tags$caption(style = 'caption-side: top; text-align: left; font-weight: bold; color: black;', 'Table 5B: Average Marginal Effects for Group A Assignment'))

ggplot(ame_A, aes(x = AME, y = reorder(factor, AME))) +
  geom_point(size = 3, color = "black") + geom_errorbarh(aes(xmin = lower, xmax = upper), height = 0.2, color = "black") +
  geom_vline(xintercept = 0, linetype = "dashed", color = "gray") + scale_x_continuous(labels = scales::percent, limits = c(-0.6, 0.6)) +
  labs(
    title = "Clinical Drivers of Intensive Regimen Therapy (Group A) Selection",
    subtitle = paste0("AUC: ", auc_val_a, " | HL p-val: ", round(hl_A$p.value, 3)),
    x = "Change in Probability of Receiving Group A",
    y = NULL,
    caption = paste0(
      "Note: Values represent Average Marginal Effects (AME). A positive % indicates a higher absolute probability\n",
      "of assignment to the Intensive Regimen arm relative to the baseline, holding other factors constant.\n",
      "Baseline references reflect 'high-risk' criteria: Liver Involved, AKT Wildtype, Age > 55, and Grade 3/Unknown."
    )
  ) +
  theme_pubr() + theme(plot.caption = element_text(hjust = 0, size = 9, face = "italic", color = "grey30"))

Therapeutic Diversion (1L Chemotherapy)

To further contextualize the clinical decision-making process, we evaluate the baseline clinical features driving the initiation of first-line systemic chemotherapy. This analysis aims to confirm whether specific high-risk presentations systematically diverted patients away from targeted standard-of-care options.

# ==============================================================================
# 1L CHEMOTHERAPY SELECTION (THERAPEUTIC DIVERSION)
# ==============================================================================

# 1. Define the Outcome within the chunk to keep Section 3 unchanged
eligible[, first_chemo_binary := ifelse(first_chemo == "Yes", 1, 0)]

# 2. Fit the 1L Chemo Selection Model 
# We use the same 'Stable' variables and references as Group B/A for consistency
glm_chemo_1L <- glm(
  first_chemo_binary ~ met_age + tumor_grade_stable + akt + met_pattern_stable,
  data = eligible,
  family = binomial()
)

# 3. Model Diagnostics (Discrimination & Calibration)
y_chemo <- glm_chemo_1L$y
yhat_chemo <- fitted(glm_chemo_1L)
auc_chemo <- round(auc(roc(y_chemo, yhat_chemo, quiet = TRUE)), 3)
hl_chemo <- hoslem.test(y_chemo, yhat_chemo, g = 6)

cat("--- 1L Chemotherapy Prediction Diagnostics ---\n")

## --- 1L Chemotherapy Prediction Diagnostics ---

cat("AUC: ", auc_chemo, "\n")

## AUC:  0.764

cat("Hosmer-Lemeshow p-value: ", round(hl_chemo$p.value, 3), "\n\n")

## Hosmer-Lemeshow p-value:  0.876

# 4. Generate AME Table & Labels
ame_chemo <- as.data.table(summary(margins(glm_chemo_1L)))

labels_chemo <- c(
  "met_pattern_stableLiver Involved" = "Liver Involved",
  "met_pattern_stableOther Sites (Non-Liver)" = "Other Sites (Non-Liver)", 
  "tumor_grade_stableGrade 3/Unknown" = "Tumor Grade: 3 or Unknown",
  "met_age>55" = "Age > 55 at Diagnosis",
  "aktWildtype" = "AKT Wildtype"
)

ame_chemo[, factor := ifelse(factor %in% names(labels_chemo), labels_chemo[factor], factor)]

# 5. Display the Data Table
ame_table_chemo_display <- ame_chemo[, .(Variable = factor, "Marginal Effect" = round(AME, 3), Lower = round(lower, 3), Upper = round(upper, 3), "P-Value" = round(p, 3))]

datatable(ame_table_chemo_display, rownames = FALSE, options = list(dom = 't'), 
          caption = htmltools::tags$caption(style = 'caption-side: top; text-align: left; font-weight: bold; color: black;', 'Table 5C: Average Marginal Effects for 1L Chemotherapy Selection'))

# 6. Final AME Plot for 1L Chemotherapy (No Reference Row)
ggplot(ame_chemo, aes(x = AME, y = reorder(factor, AME))) +
  geom_vline(xintercept = 0, linetype = "dashed", color = "gray60") +
  geom_errorbarh(aes(xmin = lower, xmax = upper), height = 0.2, color = "black") +
  geom_point(size = 3.5, color = "black") +
  scale_x_continuous(labels = scales::percent_format(accuracy = 1), limits = c(-0.6, 0.8)) +
  labs(
    title = "Clinical Drivers of 1L Chemotherapy Selection",
    subtitle = paste0("AUC: ", auc_chemo, " | HL p-val: ", round(hl_chemo$p.value, 3)),
    x = "Change in Probability of Receiving 1L Chemotherapy",
    y = NULL,
    caption = paste0(
      "Note: Values represent Average Marginal Effects (AME). A positive % indicates a higher absolute probability\n",
      "of receiving 1L chemotherapy relative to the baseline (Bone-Only, AKT Mutant, Age <= 55, Grade 1-2)."
    )
  ) +
  theme_pubr() + 
  theme(plot.caption = element_text(hjust = 0, size = 9, face = "italic", color = "grey30"),
        plot.title = element_text(face = "bold"))

10. Sensitivity Analyses

To validate the robustness of the primary entropy balancing model, we estimate the ATE using Covariate Balancing Propensity Scores (CBPS) and also conduct a sensitivity check to assess if the sequencing interval (‘seq_int’) acts as an unmeasured confounder influencing the treatment effect.

# ==============================================================================
# Sensitivity 1: CBPS Weights
# ==============================================================================
cbps_model <- weightit(therapy_flag ~ met_age + tumor_grade_stable + akt + met_pattern_stable,
                       data = eligible, 
                       method = "cbps", 
                       estimand = "ATE")

eligible[, ws_cbps := cbps_model$weights]

cox_cbps <- coxph(Surv(survival_time, survival_status) ~ therapy_flag, 
                  data = eligible, 
                  weights = ws_cbps, 
                  robust = TRUE)

summary(cox_cbps)

## Call:
## coxph(formula = Surv(survival_time, survival_status) ~ therapy_flag, 
##     data = eligible, weights = ws_cbps, robust = TRUE)
## 
##   n= 83, number of events= 31 
## 
##                        coef exp(coef) se(coef) robust se      z Pr(>|z|)  
## therapy_flagGroup A -1.0226    0.3597   0.2906    0.4415 -2.316   0.0206 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                     exp(coef) exp(-coef) lower .95 upper .95
## therapy_flagGroup A    0.3597       2.78    0.1514    0.8545
## 
## Concordance= 0.654  (se = 0.051 )
## Likelihood ratio test= 12.83  on 1 df,   p=3e-04
## Wald test            = 5.36  on 1 df,   p=0.02
## Score (logrank) test = 13.42  on 1 df,   p=2e-04,   Robust = 6.12  p=0.01
## 
##   (Note: the likelihood ratio and score tests assume independence of
##      observations within a cluster, the Wald and robust score tests do not).

km_fit_cbps <- survfit2(Surv(survival_time, survival_status) ~ therapy_flag,
                        data = eligible, weights = ws_cbps)

ggsurvplot(km_fit_cbps, data = eligible, pval = TRUE, risk.table = TRUE, risk.table.col = "strata",
           surv.median.line = "hv", linetype = "solid", break.time.by = 12,
           title = "Sensitivity OS Analysis (CBPS Weighted)", 
           legend.title = "Cohort",
           legend.labs = c("Group B (Control)", "Group A (Intensive Regimen)"),
           palette = c("#549ed1", "#f79c59"),
           ggtheme = theme_classic())

# ==============================================================================
# Sensitivity 2: Adjusting for Sequencing Interval (seq_int)
# ==============================================================================
cox_sensitivity <- coxph(
  Surv(survival_time, survival_status) ~ therapy_flag + seq_int, 
  data = eligible, 
  weights = ws, 
  robust = TRUE
)

summary(cox_sensitivity)

## Call:
## coxph(formula = Surv(survival_time, survival_status) ~ therapy_flag + 
##     seq_int, data = eligible, weights = ws, robust = TRUE)
## 
##   n= 83, number of events= 31 
## 
##                             coef exp(coef) se(coef) robust se      z Pr(>|z|)
## therapy_flagGroup A      -0.9775    0.3763   0.4279    0.4680 -2.089 0.036734
## seq_intseq int 4+ months -1.4559    0.2332   0.4486    0.3795 -3.836 0.000125
##                             
## therapy_flagGroup A      *  
## seq_intseq int 4+ months ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                          exp(coef) exp(-coef) lower .95 upper .95
## therapy_flagGroup A         0.3763      2.658    0.1504    0.9415
## seq_intseq int 4+ months    0.2332      4.288    0.1108    0.4906
## 
## Concordance= 0.713  (se = 0.041 )
## Likelihood ratio test= 15.5  on 2 df,   p=4e-04
## Wald test            = 20.75  on 2 df,   p=3e-05
## Score (logrank) test = 16.68  on 2 df,   p=2e-04,   Robust = 10.8  p=0.005
## 
##   (Note: the likelihood ratio and score tests assume independence of
##      observations within a cluster, the Wald and robust score tests do not).

11. Manuscript Drafts

Pitch

This study identifies a significant “effectiveness gap” in metastatic breast cancer by demonstrating that the 34-month survival advantage associated with an intensive endocrine/targeted regimen (characterized by the use of an aromatase inhibitor, fulvestrant, and at least one targeted therapy [mTOR, CDK4/6, or AKT1 inhibitor], and potentially tamoxifen or ovarian suppression) (89.9 vs. 55.9 months compared to conventional endocrine/targeted therapy, which included tamoxifen, ovarian suppression, fulvestrant, or aromatase inhibitor, and potentially targeted therapies, but did not comprise all three agent types of the intensive regimen) is not merely an artifact of patient selection. Utilizing a mature real-world cohort (N=83) from various institutions, with nearly five years of follow-up, we identified a distinct clinical trend where high-risk visceral patients were differentially allocated to alternative care—including first-line chemotherapy. Specifically, the presence of liver metastases drove a nearly 44% absolute increase in the probability of initiating chemotherapy compared to patients with bone-only disease. However, through rigorous Entropy Balancing of baseline characteristics, we confirmed that even after neutralizing these initial clinical differences, the intensive endocrine/targeted regimen provided a robust 61% reduction in the risk of death (aHR: 0.39; p=0.039). While the relatively small sample size (N=83, with an effective sample size of 20.1 for the intensive regimen group after weighting) necessitates cautious interpretation regarding the precise magnitude of effect, and despite the inherent potential for unmeasured confounding in observational studies, these findings strongly suggest that the current clinical practice of prioritizing chemotherapy or other less intensive approaches over this intensive endocrine/targeted regimen for high-risk visceral patients is associated with a substantial loss of survival years. This highlights the urgent need to re-evaluate current treatment paradigms, suggesting that this intensive endocrine/targeted regimen should be considered and potentially maintained for high-risk visceral patients, irrespective of initial metastatic burden.

Statistical Methods

Overall survival (OS) was evaluated using Kaplan-Meier methodology and Cox proportional hazards regression. To account for the lack of randomization in this real-world cohort, Entropy Balancing was employed to calculate patient-level weights, ensuring exact covariate balance across age, tumor grade, AKT mutation status, and metastatic presentation—covariates selected based on established clinical and prognostic relevance, guided by expert clinical judgment—to simulate a randomized Average Treatment Effect (ATE)¹. This method has been increasingly applied in medical research, including oncology, to address confounding in observational studies². It is important to note that all treatment assignments and subsequent therapies, including the intensive regimen, the conventional endocrine/targeted therapy, and any chemotherapy, were initiated post-baseline. The intensive regimen group received an aromatase inhibitor, fulvestrant, and at least one targeted therapy (mTOR, CDK4/6, or AKT1 inhibitor). Patients in this group could also have received tamoxifen or ovarian suppression. It is important to note that the precise sequence or concurrent combination of these agents within this regimen was not uniformly recorded or analyzed, reflecting the real-world variability in treatment administration. This group is characterized by the breadth of endocrine/targeted agents utilized rather than a fixed, concurrent triple-agent combination. The control group received conventional endocrine/targeted therapy, which could include tamoxifen, ovarian suppression, fulvestrant, or aromatase inhibitor, and potentially targeted therapies, but did not comprise all three agent types of the intensive regimen. As 1L chemotherapy was a post-baseline treatment decision, it was not included as a baseline covariate in the primary Entropy Balancing model, nor was it modeled as a time-varying covariate in the Cox regression, reflecting its potential role as a post-treatment decision or mediator. Baseline characteristics presented in Table 1 were compared between groups using Fisher’s exact test, Chi-square test, or Wilcoxon rank-sum (Mann-Whitney U) test, as appropriate for the data type. Statistical analyses were performed using R (v. 4.5.2) utilizing the WeightIt, survival, and margins packages. All statistical tests were two-sided, and a p-value <0.05 was considered statistically significant.

Missing data for clinical covariates were handled by treating missingness as a distinct ‘Unknown’ category to preserve sample size and reflect real-world data quality. While this approach can introduce bias if missingness is non-random, it was chosen to maximize the available data within this limited cohort, and alternative methods like multiple imputation were not pursued due to the specific characteristics of this dataset and the desire to reflect the prognostic information potentially embedded in the ‘Unknown’ category itself. The median follow-up time was determined using the Reverse Kaplan-Meier method. Covariate balance was quantitatively confirmed and visualized using a Love plot, with a target absolute standardized mean difference (ASMD) of <0.1 across all variables. To ensure mathematical stability given the sample size, metastatic sites were universally consolidated into a three-tier clinical hierarchy (Liver Involved, Bone-Only, and Other Sites), based on their established prognostic differences and clinical staging relevance. Following weighting, an adjusted Cox model with Huber-White robust (sandwich) standard errors³ was utilized to estimate the primary hazard ratio (HR), accounting for the uncertainty in weight estimation. The proportional hazards assumption was verified for all models using Schoenfeld residuals.

Additionally, univariable Cox models were applied to both the unweighted and entropy-balanced cohorts to assess the independent prognostic value of baseline clinical covariates. Landmark survival analyses were also performed at annual intervals to evaluate the durability of the treatment effect over time and mitigate potential immortal time bias. To quantify the potential impact of unmeasured confounding, an E-value⁴ was computed for the primary ATE hazard ratio, indicating how strongly an unmeasured confounder would need to be associated with both treatment assignment and survival (in the direction of negating the observed effect) to explain away the observed association. Furthermore, two sensitivity analyses were conducted to confirm the robustness of the primary findings: first, by utilizing Covariate Balancing Propensity Scores (CBPS)⁵ to verify the results were not dependent on the chosen weighting algorithm, and second, by directly adjusting the primary model for the interval between sequencing and metastatic diagnosis to rule out temporal confounding. Finally, to evaluate clinical drivers of treatment selection and identify potential therapeutic diversion, logistic regression models with Average Marginal Effects (AME) were generated to isolate the probability of assignment to either cohort, as well as the initiation of first-line chemotherapy, based on baseline clinical characteristics. The discrimination and calibration of these selection models were rigorously evaluated using the Area Under the Receiver Characteristic Curve (AUC) and the Hosmer-Lemeshow goodness-of-fit test⁶, respectively. This study was conducted using de-identified data in accordance with institutional review board guidelines.

Results

From an initial dataset of 185 patients from various institutions (details of which are provided in Table 1), 83 met the final eligibility criteria (HER2-negative, HR-positive, Stage IV) and were included in the analysis (Control Group [conventional endocrine/targeted therapy, which could include tamoxifen, ovarian suppression, fulvestrant, or aromatase inhibitor, and potentially targeted therapies, but did not comprise all three agent types of the intensive regimen], N=53; Intensive Regimen Group [an endocrine/targeted regimen characterized by the presence of an aromatase inhibitor, fulvestrant, and at least one targeted therapy (mTOR, CDK4/6, or AKT1 inhibitor), and potentially tamoxifen or ovarian suppression], N=30). While providing a mature dataset, the multi-institutional nature of this cohort enhances the generalizability of the findings compared to single-center studies, though potential differences in patient demographics and treatment approaches across institutions should still be considered. The median follow-up for the study was 58.9 months (95% CI: 45.6–74.5); a total of 31 events (deaths) were observed, with 52 patients (62.7%) censored at last follow-up. Baseline clinical characteristics and imbalances are summarized in Table 1. Following entropy balancing, pristine covariate balance was achieved (ASMD <0.1 across all clinical factors), as visualized in the Love plot. The resulting weighted pseudo-population maintained an effective sample size (ESS) of 20.1 in the intensive regimen group and 47.3 in the control group. While this ESS is relatively small, it reflects the necessary reduction in sample size to achieve pristine covariate balance, ensuring the validity of the comparison, albeit with a potential reduction in statistical power.

In the unweighted cohort, patients receiving the intensive triple-agent regimen demonstrated improved median overall survival compared to the conventional endocrine/targeted therapy (121.6 [95% CI: 77.7, NE] vs. 55.9 [95% CI: 46.4, NE] months; HR: 0.35; 95% CI, 0.15–0.81; p=0.013). In the primary entropy-balanced model, the survival benefit remained robust and statistically significant, with a median OS of 89.9 months (95% CI: 52.3, NE) for the intensive regimen cohort versus 55.9 months (95% CI: 46.4, NE) for the control cohort (Adjusted HR: 0.39; 95% CI, 0.16–0.95; p=0.039). The “NE” (Not Estimable) in the confidence intervals for median OS indicates that more than 50% of patients in these groups were censored before reaching the median, suggesting a prolonged survival benefit but also that the median is an extrapolated estimate and should be interpreted with caution. The proportional hazards assumption was satisfied for the primary model (Global p=0.32).

The robustness of this finding was strongly validated in sensitivity analyses. Utilizing the alternative CBPS weighting algorithm yielded a highly consistent survival benefit (Adjusted HR: 0.36; 95% CI, 0.15–0.85; p=0.021). Similarly, multivariable adjustment for the interval between sequencing and metastatic diagnosis confirmed that the survival benefit remained independent of sequencing timing (Adjusted HR: 0.38; 95% CI, 0.15–0.94; p = 0.037). The E-value for the primary ATE was calculated at 3.22, indicating that an unmeasured confounder would need to be strongly associated with both treatment assignment and survival (in the direction of negating the observed effect) to negate the observed effect. However, the possibility of residual unmeasured confounding, inherent to observational studies, cannot be entirely excluded. Univariate analyses demonstrated that liver involvement was the sole significant independent predictor of mortality in the balanced cohort (Adjusted HR: 2.59; 95% CI, 1.04–6.48; p=0.041). Furthermore, landmark OS analyses confirmed that the survival benefit of the intensive triple-agent regimen was durable, maintaining a consistent effect size at the 12-month post-diagnosis milestone (Adjusted HR: 0.39; 95% CI, 0.16–0.95; p=0.039).

Analysis of treatment selection drivers revealed a significant “therapeutic differential allocation.” Patients presenting with higher-risk visceral features, specifically liver involvement, had a significantly higher absolute probability of receiving the conventional endocrine/targeted therapy, which often represented a less intensive endocrine approach or a pathway towards early chemotherapy. Conversely, the intensive endocrine/targeted regimen was disproportionately selected for patients presenting with bone-only metastases or other non-liver metastatic sites relative to those with liver involvement. The logistic regression models utilized to derive these Average Marginal Effects demonstrated strong discrimination (Control Group AUC = 0.753; Intensive Regimen Group AUC = 0.753) and excellent calibration (Hosmer-Lemeshow Control Group p=0.534; Intensive Regimen Group p=0.743).

Furthermore, analysis of therapeutic differential allocation confirmed that a visceral risk gradient exists; while liver involvement was the dominant driver of first-line chemotherapy initiation (a post-baseline treatment decision), this 1L chemotherapy prediction model also exhibited strong discrimination (AUC = 0.764) and excellent calibration (Hosmer-Lemeshow p=0.876). Patients with liver metastases experienced a 43.6% higher absolute probability of receiving 1L chemotherapy compared to the standard-risk baseline (p=0.001), and even non-liver visceral involvement (Other Sites) triggered a significant 30.2% absolute increase in chemotherapy initiation compared to bone-only disease (p=0.003). Conversely, age >55, higher tumor grade, and AKT wildtype status did not significantly influence the decision to utilize chemotherapy (p>0.3 for all). This confirms that the observed clinical practice, likely driven by perceived visceral crisis, led to a higher likelihood of high-risk patients being directed toward chemotherapy and away from the intensive endocrine/targeted regimen, suggesting a potential diversion from what might be considered a standard endocrine-based sequence for mBC.

References

¹Hainmueller J. Entropy Balancing for Causal Effects: A Multivariate Reweighting Method to Produce Balanced Samples in Observational Studies. Polit Anal. 2012;20(1):25-46.

²Chen R, Chen G, Yu M. Entropy Balancing for Causal Generalization with Target Sample Summary Information. Biometrics. 2023;79(1):16-28.

³White H. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica. 1980;48(4):817-838.

⁴VanderWeele TJ, Ding P. Sensitivity analysis in observational research: introducing the E-value. Ann Intern Med. 2017;167(4):268-274.

⁵Imai K, Ratkovic M. Covariate balancing propensity score. J R Stat Soc Series B Stat Methodol. 2014;76(1):243-263.

⁶Hosmer DW Jr, Lemeshow S. A goodness-of-fit test for the multiple logistic regression model. Commun Stat Theory Methods. 1980;9(10):1035-1049.