Survival Analysis in R

Author

Richmond Silvanus Baye

Published

April 12, 2025

Introduction

I demonstrate a step-by-step survival analysis workflow in R, covering the following components:
1. Loading Data and Packages (Data import, required libraries such as survival, survminer, dplyr, gtsummary, etc.)
2. Creating a Summary Table (Descriptive statistics of study population.)
3. Kaplan-Meier (KM) Survival Analysis (Generating KM survival curves and assessing survival probabilities.)
4. Cox Proportional Hazards (PH) Model (Modeling hazard ratios, testing proportional hazards assumptions, and interpretation.)
5. Subgroup Analyses (Separate survival analyses for metastatic and in-situ patient groups.
6. Comparative Effectiveness of Treatments
  (Assessing and comparing survival outcomes among different drug treatments.)

# Load required packages
pacman::p_load( lubridate, ggsurvfit, gtsummary, tidycmprsk, tidyverse,
  dplyr, gt, survival, broom, survminer, survRM2
)

# Loading the cleaned synthetic data from Synthea.
df <- read_csv("parquet.csv")

# Quick look at the data structure
glimpse(df)

Rows: 927
Columns: 13
$ ...1                       <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1…
$ patient                    <chr> "00b99a22-6749-4c13-b0f8-5d3957263377", "00…
$ progression                <chr> "in situ -> metastatic", "in situ", "in sit…
$ race                       <chr> "white", "white", "white", "other", "white"…
$ ethnicity                  <chr> "nonhispanic", "nonhispanic", "nonhispanic"…
$ gender                     <chr> "M", "M", "M", "M", "M", "M", "M", "M", "M"…
$ income                     <dbl> 50.249, 126.045, 167.457, 94.640, 323.357, …
$ age_at_dx                  <dbl> 67, 63, 60, 65, 69, 61, 73, 70, 66, 66, 60,…
$ followup_neoplasm_months   <dbl> 12.221766, 10.809035, 85.322382, 69.223819,…
$ followup_metastasis_months <dbl> 9.264887, NA, NA, NA, NA, NA, NA, NA, NA, 3…
$ is_dead                    <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1…
$ lines_of_treatment         <chr> "L1", "L2", "L2", "L2", "L2", "L2", "L1", "…
$ death_reason               <chr> "Alive", "Alive", "Alive", "Alive", "Other …

Creating a summary table

The package gtsummary allows us to create a publication ready summary statistics.

# Create a named list of column labels for the summary table
column_labels <- list(
  'gender'             = 'Gender, n (%)',
  'race'               = 'Race, n (%)',
  'ethnicity'          = 'Ethnicity, n (%)',
  'age_at_dx'          = 'Age at Dx (years)',
  'income'             = 'Income (K USD)',
  'death_reason'       = 'Death Reason, n (%)',
  'lines_of_treatment' = 'Lines of Treatment, n (%)'
)

# Identify columns that have more than one distinct value
columns_include <- names(column_labels)
columns_use <- df %>%
  select(all_of(columns_include)) %>%
  summarise(across(everything(), ~ n_distinct(.))) %>%
  pivot_longer(everything(), names_to = "var", values_to = "n_levels") %>%
  filter(n_levels > 1) %>%
  pull(var)

# Build summary table, stratifying by 'progression' 
summary_stats <- df %>%
  select(all_of(c(columns_use, "progression"))) %>%
  tbl_summary(
    by = "progression",
    label = column_labels,
    statistic = list(
      all_categorical() ~ "{n} ({p}%)",
      all_continuous() ~ "{mean} ({sd})"
    )
  ) %>%
  add_n() %>%
  add_p(
    test = list(
      all_categorical() ~ "chisq.test",
      all_continuous() ~ "aov"
    ),
    test.args = all_tests("chisq.test") ~ list(simulate.p.value = TRUE, B = 10000)
  )

# Print the table
summary_stats

Characteristic	N	in situ N = 752¹	in situ -> metastatic N = 94¹	metastatic N = 81¹	p-value²
Race, n (%)	927				0.5
black		62 (8.2%)	7 (7.4%)	10 (12%)
other		56 (7.4%)	4 (4.3%)	6 (7.4%)
white		634 (84%)	83 (88%)	65 (80%)
Ethnicity, n (%)	927				0.4
hispanic		77 (10%)	8 (8.5%)	12 (15%)
nonhispanic		675 (90%)	86 (91%)	69 (85%)
Age at Dx (years)	927	68 (9)	68 (8)	67 (7)	0.3
Income (K USD)	927	114 (157)	131 (175)	111 (150)	0.6
Death Reason, n (%)	927				<0.001
Alive		607 (81%)	23 (24%)	34 (42%)
Missing		1 (0.1%)	0 (0%)	0 (0%)
Other causes		99 (13%)	8 (8.5%)	3 (3.7%)
prostate cancer		45 (6.0%)	63 (67%)	44 (54%)
Lines of Treatment, n (%)	846				<0.001
L1		62 (8.2%)	94 (100%)	0 (NA%)
L2		690 (92%)	0 (0%)	0 (NA%)
Unknown		0	0	81
¹ n (%); Mean (SD)
² Pearson’s Chi-squared test with simulated p-value (based on 10000 replicates); One-way analysis of means

The cohort of 927 participants consisted predominantly of white (84%-88%) and non-Hispanic individuals (85%-91%), with no significant racial or ethnic differences across disease stages (p=0.5 and p=0.3, respectively). Age at diagnosis is 68 years in the in-situ and the progression from in-situ to the metastatic stage. While the age at diagnosis at the metastatic stage is 67. Income were similar across groups ($111K-$131K). Death due to prostate cancer was significantly higher among those who progressed from in-situ to metastatic (67%) and metastatic-only groups (54%), compared to the in-situ group (6%) (p<0.001). Treatment intensity, reflected by lines of treatment, differed significantly by disease progression status (p<0.001)

Kaplan Meier Survival Curves

# Fit the Kaplan-Meier model for followup_neoplasm_months (time) and is_dead (event)
fit <- survfit2(Surv(followup_neoplasm_months, is_dead) ~ progression, data = df)

# Plot the KM curves with ggsurvfit
ggsurvfit(fit) +
  add_confidence_interval() +
  labs(
    x = "Follow-up (Months)",
    y = "Survival Probability"
  ) +
  theme_minimal()

The Kaplan Meier curve above shows the survival probability and the follow-up time for the groups. As observed, there is a clear difference in survival probability across the group. In-situ (red): have the highest survival probability with over 50% surviving beyond 10 years. The in-situ -> metastatic (green) experienced a sharp decline in survival, with most deaths occurring within 5 years (60 months). The metastatic (blue) group aslo experienced a sharp decline in survival within 5 years. In what follows, we will estimate the time point at which 50% of the study population is expected to die from disease progression.

Median survival

This will return the median survival time for each group (state). If a group does not reach the median survival probability, the function will return NA. From the table, we observe that patients in the in-situ state stayed above 0.5 over the study period (120 months), suggesting that over half of the patients with in-situ prostate cancer lived longer than 10 years (long term survival benefit). For those progressing from the in-situ to the metastatic state, the median survival time is approximately 31 months, whereas for those already in the metastatic state, it is about 40 months. Suggesting that disease progression worsens compared to in-situ group.

# Calculate median survival times for each group
med_surv <- surv_median(fit)

# Display in a gt table
gt(med_surv)

strata	median	lower	upper
progression=in situ	NA	NA	NA
progression=in situ -> metastatic	30.45585	26.15195	36.43532
progression=metastatic	39.75359	36.86242	42.97331

Survival Estimate in 30 and 60 months

While median survival times provide a useful overall summary of survival patterns, time-specific survival probabilities at clinically relevant follow-up points offer additional insight into the short- and mid-term prognosis across groups.

The estimate below shows survival probabilities at 30 (2.5 years) and 60 (5 years) months

# Display survival at 2.5 years (30 months) and 5 years (60 months)
summary(fit, times = c(30, 60))

Call: survfit(formula = Surv(followup_neoplasm_months, is_dead) ~ progression, 
    data = df)

                progression=in situ 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
   30    465      59    0.902  0.0122        0.878        0.926
   60    259      56    0.766  0.0199        0.728        0.806

                progression=in situ -> metastatic 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
   30     38      40   0.5034  0.0561       0.4046        0.626
   60      4      28   0.0691  0.0327       0.0273        0.175

                progression=metastatic 
        time       n.risk      n.event     survival      std.err lower 95% CI 
      30.000       39.000       17.000        0.725        0.057        0.621 
upper 95% CI 
       0.846

We observe that patients in the in-situ state have a 90% survival probability at 30 months, which declines to 77% at 60 months. Patients who progress from in-situ to metastatic show a significantly lower survival probability—50% at 30 months and just 7% at 60 months. For those already in the metastatic state, survival at 30 months is approximately 72% A natural question is are there differences across the groups?

Cox Proportional Hazards Model.

Estimating the relative hazard of death associated with each progression compared to the in-situ group.

# Fit the Cox model: Surv(time, event) ~ progression
cox_model <- coxph(Surv(followup_neoplasm_months, is_dead) ~ progression, data = df)

cox_model

Call:
coxph(formula = Surv(followup_neoplasm_months, is_dead) ~ progression, 
    data = df)

                                   coef exp(coef) se(coef)     z      p
progressionin situ -> metastatic 2.2210    9.2161   0.1564 14.20 <2e-16
progressionmetastatic            1.9177    6.8053   0.1779 10.78 <2e-16

Likelihood ratio test=210.7  on 2 df, p=< 2.2e-16
n= 927, number of events= 263

# Create a tidy summary of the model
cox_result <- tidy(cox_model, exponentiate = TRUE, conf.int = TRUE) %>%
  select(term, estimate, conf.low, conf.high, p.value) %>%
  rename(
    Variable     = term,
    Hazard_Ratio = estimate,
    CI_Lower     = conf.low,
    CI_Upper     = conf.high,
    P_Value      = p.value
  )

# Display results with gt
cox_result %>%
  gt() %>%
  fmt_number(columns = c(Hazard_Ratio, CI_Lower, CI_Upper), decimals = 2) %>%
  fmt_number(columns = P_Value, decimals = 3)

Variable	Hazard_Ratio	CI_Lower	CI_Upper	P_Value
progressionin situ -> metastatic	9.22	6.78	12.52	0.000
progressionmetastatic	6.81	4.80	9.64	0.000

The Cox model results show that patients who progress from the in-situ to the metastatic state have a hazard ratio of 9.22, meaning their risk of death is over nine times higher than patients who remain in the in-situ state. Patients who are already in the metastatic state have a hazard ratio of 6.81. Both groups have significantly elevated mortality risks compared to the in-situ group. Interestingly, the hazard is higher in those who progressed from in-situ to metastatic than in those who were metastatic at baseline. This leads us to investigate if the effect changes over time. To do this, we test the:

Proportional hazard assumption

We need to check the proportional hazard assumption to assess if the effect changes over time

cox.zph(cox_model)

            chisq df       p
progression  28.4  2 6.7e-07
GLOBAL       28.4  2 6.7e-07

As shown above, the global test for the proportional hazards assumption shows a significant p-value (p = 6.7e-07), indicating that the effect of the ‘progression’ variable on the hazard is not constant over time. This violation suggests that the proportional hazards assumption is not met. To address this, additional covariates should be included in the model or time-varying effects should be considered to account for heterogeneity and time-fixed effects.

Adjusted cox models

Accounting for patient characteristics such as age at diagnosis, income, race and ethnicity.

# Fit a Cox model adjusting for other variables
cox_model_adj <- coxph(
  Surv(followup_neoplasm_months, is_dead) ~ progression + age_at_dx + income + race + ethnicity,
  data = df
)

# Summarize the adjusted model
summary(cox_model_adj)

Call:
coxph(formula = Surv(followup_neoplasm_months, is_dead) ~ progression + 
    age_at_dx + income + race + ethnicity, data = df)

  n= 927, number of events= 263 

                                       coef  exp(coef)   se(coef)      z
progressionin situ -> metastatic  2.2175068  9.1844038  0.1568369 14.139
progressionmetastatic             1.9240586  6.8486980  0.1785788 10.774
age_at_dx                         0.0143020  1.0144047  0.0069693  2.052
income                            0.0003711  1.0003712  0.0003825  0.970
raceother                        -0.3295910  0.7192178  0.3351694 -0.983
racewhite                        -0.0236943  0.9765842  0.2132497 -0.111
ethnicitynonhispanic             -0.1980421  0.8203353  0.2012288 -0.984
                                 Pr(>|z|)    
progressionin situ -> metastatic   <2e-16 ***
progressionmetastatic              <2e-16 ***
age_at_dx                          0.0402 *  
income                             0.3319    
raceother                          0.3254    
racewhite                          0.9115    
ethnicitynonhispanic               0.3250    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                                 exp(coef) exp(-coef) lower .95 upper .95
progressionin situ -> metastatic    9.1844     0.1089    6.7538    12.490
progressionmetastatic               6.8487     0.1460    4.8262     9.719
age_at_dx                           1.0144     0.9858    1.0006     1.028
income                              1.0004     0.9996    0.9996     1.001
raceother                           0.7192     1.3904    0.3729     1.387
racewhite                           0.9766     1.0240    0.6430     1.483
ethnicitynonhispanic                0.8203     1.2190    0.5530     1.217

Concordance= 0.705  (se = 0.018 )
Likelihood ratio test= 217.3  on 7 df,   p=<2e-16
Wald test            = 244.1  on 7 df,   p=<2e-16
Score (logrank) test = 334.7  on 7 df,   p=<2e-16

# Check PH assumption again
cox.zph(cox_model_adj)

             chisq df       p
progression 27.489  2 1.1e-06
age_at_dx    0.425  1    0.51
income       0.117  1    0.73
race         2.105  2    0.35
ethnicity    0.267  1    0.61
GLOBAL      30.379  7 8.1e-05

# Display with gtsummary
cox_model_adj %>%
  tbl_regression(exp = TRUE) %>%
  bold_p(t = 0.05) %>%
  modify_caption("**Cox Proportional Hazards Model (Adjusted)**")

**Cox Proportional Hazards Model (Adjusted)**
Characteristic	HR	95% CI	p-value
progression
in situ	—	—
in situ -> metastatic	9.18	6.75, 12.5	<0.001
metastatic	6.85	4.83, 9.72	<0.001
age_at_dx	1.01	1.00, 1.03	0.040
income	1.00	1.00, 1.00	0.3
race
black	—	—
other	0.72	0.37, 1.39	0.3
white	0.98	0.64, 1.48	>0.9
ethnicity
hispanic	—	—
nonhispanic	0.82	0.55, 1.22	0.3
Abbreviations: CI = Confidence Interval, HR = Hazard Ratio

Even after controlling for patient characteristics, we observe that patients progressing from in-situ to metastatic disease had a hazard ratio similar to the univariate case (HR) of 9.18 (95% CI: 6.75–12.49), indicating a 9-fold increased risk of death compared to those who remained in the in-situ state. Patients already in the metastatic state had an HR of 6.85 (95% CI: 4.83–9.72). Age at diagnosis was also significantly associated with survival (HR = 1.014, p = 0.0402), suggesting that each additional year of age increases the hazard by approximately 1.4%. Income, race, and ethnicity were not statistically significant predictors in the adjusted model. The non significance of income, race and ethnicity means they do not violate the proportional hazard assumption. The concordance statistic is 0.705, indicating good discrimination.

Further diagnosis. Stratified Cox Model

The idea is to stratify by disease progression levels while assuming that the effect on other covariates are constant across strata.

coxph(Surv(followup_neoplasm_months, is_dead) ~ age_at_dx + income + race + ethnicity + strata(progression), data = df)

Call:
coxph(formula = Surv(followup_neoplasm_months, is_dead) ~ age_at_dx + 
    income + race + ethnicity + strata(progression), data = df)

                           coef  exp(coef)   se(coef)      z     p
age_at_dx             0.0108583  1.0109174  0.0070961  1.530 0.126
income                0.0003780  1.0003781  0.0003827  0.988 0.323
raceother            -0.4239805  0.6544367  0.3378339 -1.255 0.209
racewhite            -0.0992986  0.9054723  0.2142302 -0.464 0.643
ethnicitynonhispanic -0.1605612  0.8516657  0.2062997 -0.778 0.436

Likelihood ratio test=5.2  on 5 df, p=0.3925
n= 927, number of events= 263

Results from the stratified cox model shows that non of the covariates included in the model independently affected the survival. This shows that the results from the cox model may be driven mainly by disease progression.

Further diagnosis

# Quick check of death counts by progression group
df %>%
  group_by(progression) %>%
  summarise(
    deaths = sum(is_dead),
    total = n(),
    death_rate = deaths / total
  )

# A tibble: 3 × 4
  progression           deaths total death_rate
  <chr>                  <dbl> <int>      <dbl>
1 in situ                  145   752      0.193
2 in situ -> metastatic     71    94      0.755
3 metastatic                47    81      0.580

Our further descriptive assessment shows that the death rate among patients in the in-situ state is approximately 19%. In contrast, patients who progressed from in-situ to metastatic had a significantly higher death rate of 76%. Among those initially diagnosed in the metastatic state, the death rate was 58%. These findings highlight the substantial increase in mortality risk associated with disease progression, particularly among patients who transition from in-situ to metastatic stages.

Plotting the fitted cox model

# Plot overall survival curve from the fitted Cox model
ggsurvplot(survfit(cox_model), data = df, conf.int = TRUE)

In-situ ->Metastatic and Metastatic Group Analysis

What could explain the low survival rates in the in-situ -> metastatic and metastatic state. Here, the idea is to filter the data to focus on patients who progress from the in-situ to the metastatic state, as well as those already in the metastatic state.

# Filter to cases: "in situ -> metastatic" and "metastatic"
met_df <- df %>%
  filter(progression %in% c("in situ -> metastatic", "metastatic"))

Plotting the Kaplan Meier curve for these specific groups

# Fit KM model using followup_metastasis_months as time variable
km_fit <- survfit2(Surv(followup_metastasis_months, is_dead) ~ progression, data = met_df)

# Plot the curves
km_fit %>%
  ggsurvfit() +
  add_confidence_interval() +
  labs(
    x = "Follow-up (Months)",
    y = "Survival Probability",
    title = "Kaplan-Meier Survival Curves for Metastatic Groups"
  ) +
  theme_minimal()

# Extract median survival times
medsurv_overall <- surv_median(km_fit)

# Display in a table
medsurv_overall %>%
  select(strata, median) %>%
  rename(
    Group = strata,
    Median_Survival = median
  ) %>%
  gt() %>%
  fmt_number(columns = c(Median_Survival), decimals = 2) %>%
  tab_header(title = "Median Survival Times for Metastatic Groups")

Median Survival Times for Metastatic Groups
Group	Median_Survival
progression=in situ -> metastatic	27.50
progression=metastatic	39.75

# Check survival at the 50% time quantile
summary(km_fit, times = quantile(met_df$followup_metastasis_months, 0.5))

Call: survfit(formula = Surv(followup_metastasis_months, is_dead) ~ 
    progression, data = met_df)

                progression=in situ -> metastatic 
        time       n.risk      n.event     survival      std.err lower 95% CI 
     24.2793      43.0000      37.0000       0.5430       0.0557       0.4441 
upper 95% CI 
      0.6639 

                progression=metastatic 
        time       n.risk      n.event     survival      std.err lower 95% CI 
     24.2793      45.0000      15.0000       0.7606       0.0539       0.6619 
upper 95% CI 
      0.8740

# Tidy survival estimates
tidy(km_fit) %>%
  select(strata, estimate, conf.low, conf.high)

# A tibble: 168 × 4
   strata                            estimate conf.low conf.high
   <chr>                                <dbl>    <dbl>     <dbl>
 1 progression=in situ -> metastatic    1        1             1
 2 progression=in situ -> metastatic    1        1             1
 3 progression=in situ -> metastatic    1        1             1
 4 progression=in situ -> metastatic    1        1             1
 5 progression=in situ -> metastatic    0.989    0.967         1
 6 progression=in situ -> metastatic    0.989    0.967         1
 7 progression=in situ -> metastatic    0.977    0.946         1
 8 progression=in situ -> metastatic    0.977    0.946         1
 9 progression=in situ -> metastatic    0.977    0.946         1
10 progression=in situ -> metastatic    0.965    0.928         1
# ℹ 158 more rows

At around 24 months, the in situ → metastatic group shows an estimated survival probability of 54.3% (95% CI: 44.4–66.4), whereas the metastatic group’s estimated survival is 76.1% (95% CI: 66.2–87.4). This suggests that patients whose disease progressed from in situ to metastatic experienced lower survival rates compared to those who were initially diagnosed with metastatic prostate cancer.

Conduct a log rank test between the two groups

# Perform a log-rank test between the two metastatic progression groups
logrank_test <- survdiff(Surv(followup_metastasis_months, is_dead) ~ progression, data = met_df)

# Convert to a tidy format
logrank_df <- tibble(
  Statistic = logrank_test$chisq,
  P_Value   = 1 - pchisq(logrank_test$chisq, df = length(logrank_test$n) - 1)
)

# Display the log-rank test result
logrank_df %>%
  gt() %>%
  fmt_number(columns = c(Statistic, P_Value), decimals = 3) %>%
  tab_header(title = "Log-Rank Test for Survival Differences")

Log-Rank Test for Survival Differences
Statistic	P_Value
6.609	0.010

With a p-value 0.010. We can confirm there is a significant difference in survival between the two metastatic subgroups

Now, lets conduct a cox model for the metastatic group

# Fit Cox model for metastatic-only data
cox_model_met <- coxph(Surv(followup_metastasis_months, is_dead) ~ progression, data = met_df)
summary(cox_model_met)

Call:
coxph(formula = Surv(followup_metastasis_months, is_dead) ~ progression, 
    data = met_df)

  n= 175, number of events= 118 

                         coef exp(coef) se(coef)      z Pr(>|z|)  
progressionmetastatic -0.4855    0.6154   0.1908 -2.544    0.011 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                      exp(coef) exp(-coef) lower .95 upper .95
progressionmetastatic    0.6154      1.625    0.4234    0.8945

Concordance= 0.587  (se = 0.026 )
Likelihood ratio test= 6.61  on 1 df,   p=0.01
Wald test            = 6.47  on 1 df,   p=0.01
Score (logrank) test = 6.59  on 1 df,   p=0.01

# Extract HR, CI, p-values
cox_result_met <- tidy(cox_model_met, exponentiate = TRUE, conf.int = TRUE) %>%
  select(term, estimate, conf.low, conf.high, p.value) %>%
  rename(
    Variable      = term,
    Hazard_Ratio  = estimate,
    CI_Lower      = conf.low,
    CI_Upper      = conf.high,
    P_Value       = p.value
  )

# Display nicely with gt
cox_result_met %>%
  gt() %>%
  fmt_number(columns = c(Hazard_Ratio, CI_Lower, CI_Upper), decimals = 2) %>%
  fmt_number(columns = P_Value, decimals = 3) %>%
  tab_header(title = "Cox Proportional Hazards Model for Metastatic Groups") %>%
  cols_label(
    Variable      = "Predictor",
    Hazard_Ratio  = "HR",
    CI_Lower      = "95% CI Lower",
    CI_Upper      = "95% CI Upper",
    P_Value       = "P-Value"
  ) %>%
  tab_options(table.font.size = px(14))

Cox Proportional Hazards Model for Metastatic Groups
Predictor	HR	95% CI Lower	95% CI Upper	P-Value
progressionmetastatic	0.62	0.42	0.89	0.011

# Check PH assumption
cox.zph(cox_model_met)

            chisq df     p
progression  5.07  1 0.024
GLOBAL       5.07  1 0.024

#A significant value shows the the model is time dependent

Comparative effectiveness of treatment on in-situ

# Filter dataset for patients who remain 'in situ'
insitu_df <- df %>%
  filter(progression == "in situ")

# Kaplan-Meier by lines_of_treatment
km_fit_insitu <- survfit2(Surv(followup_neoplasm_months, is_dead) ~ lines_of_treatment, data = insitu_df)

# Plot survival curves
km_fit_insitu %>%
  ggsurvfit() +
  labs(
    x = "Follow-up (Months)",
    y = "Survival Probability",
    title = "Kaplan-Meier Survival for In Situ Disease (by Lines of Treatment)"
  ) +
  theme_minimal()

# Extract median survival times
medsurv_overall <- surv_median(km_fit_insitu)

# Display as a table
medsurv_overall %>%
  select(strata, median) %>%
  rename(
    Group = strata,
    Median_Survival = median
  ) %>%
  gt() %>%
  fmt_number(columns = c(Median_Survival), decimals = 2) %>%
  tab_header(title = "Median Survival for In Situ Disease by Lines of Treatment")

Median Survival for In Situ Disease by Lines of Treatment
Group	Median_Survival
lines_of_treatment=L1	29.40
lines_of_treatment=L2	NA

We observe a notable trend in survival outcomes based on the number of treatment lines received. Patients who received only one line of treatment experienced a sharp decline in survival probability during the early follow-up period. In contrast, those who received two lines of treatment demonstrated a consistently higher survival probability throughout the follow-up. This suggests that receiving additional lines of therapy may be associated with improved survival outcomes

Cox model by lines of treatment.

# Define formula
formula <- as.formula("Surv(followup_neoplasm_months, is_dead) ~ lines_of_treatment")

# Fit Cox model
coxph_model <- coxph(formula, data = insitu_df)
# Extract hazard ratios, confidence intervals, p-values
cox_result <- tidy(coxph_model, exponentiate = TRUE, conf.int = TRUE) %>%
  select(term, estimate, conf.low, conf.high, p.value) %>%
  rename(
    Variable      = term,
    Hazard_Ratio  = estimate,
    CI_Lower      = conf.low,
    CI_Upper      = conf.high,
    P_Value       = p.value
  )

# Display with gt
cox_result %>%
  gt() %>%
  fmt_number(columns = c(Hazard_Ratio, CI_Lower, CI_Upper), decimals = 2) %>%
  fmt_number(columns = P_Value, decimals = 3) %>%
  tab_header(title = "Cox Proportional Hazards Model by Lines of Treatment (In Situ Patients)") %>%
  cols_label(
    Variable      = "Treatment Line",
    Hazard_Ratio  = "HR",
    CI_Lower      = "95% CI Lower",
    CI_Upper      = "95% CI Upper",
    P_Value       = "P-Value"
  )

Cox Proportional Hazards Model by Lines of Treatment (In Situ Patients)
Treatment Line	HR	95% CI Lower	95% CI Upper	P-Value
lines_of_treatmentL2	0.08	0.05	0.12	0.000

The Hazard Ratio value is lower than one for the group with two lines of therapy, and the confidence intervals do not cross 1, confirming that the risk of death is lower for the group with two lines of treatment than for the group with one line of treatment.