The primary objective of this project is to compare the survival experience (remission times in weeks) of two groups of leukemia patients:

* Group 1: Treatment group (\(n = 21\))
* Group 2: Placebo group (\(n = 21\))

The analysis tests whether the treatment significantly prolongs remission compared with the placebo.
Below, we structure the raw data into an R data frame. Censored observations (marked with a + in the prompt) are assigned a status of 0, while observed events (patients whose remission ended during follow-up) are assigned a status of 1.
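# Load the packages used throughout this report
library(knitr)     # kable()
library(survival)  # Surv(), survfit(), survdiff()
library(survminer) # ggsurvplot()
# Treatment Group Data (n = 21)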
t1 <- c(6, 6, 6, 7, 10, 13, 16, 22, 23, 6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35)
s1 <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
g1 <- rep("Treatment", 21)
# Placebo Group Data (n = 21)
t2 <- c(1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23)
s2 <- rep(1, 21) # All placebo patients experienced the event (no + signs)
g2 <- rep("Placebo", 21)
# Combine into a unified data frame
leukemia_df <- data.frame(
  time = c(t1, t2),
  status = c(s1, s2),
  group = factor(c(g1, g2))
)
# Display the structured data table
kable(head(leukemia_df, 10), caption = "First 10 Rows of Patient Data")
| time | status | group |
|---|---|---|
| 6 | 1 | Treatment |
| 6 | 1 | Treatment |
| 6 | 1 | Treatment |
| 7 | 1 | Treatment |
| 10 | 1 | Treatment |
| 13 | 1 | Treatment |
| 16 | 1 | Treatment |
| 22 | 1 | Treatment |
| 23 | 1 | Treatment |
| 6 | 0 | Treatment |
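Because the first ten rows show only part of the treatment group, a quick cross-tabulation helps confirm the status coding. The following is a minimal sketch in base R that rebuilds the same vectors so it can run on its own; the object name `counts` is introduced here purely for illustration.

```r
# Rebuild the data exactly as coded above (1 = event, 0 = censored)
t1 <- c(6, 6, 6, 7, 10, 13, 16, 22, 23, 6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35)
s1 <- c(rep(1, 9), rep(0, 12))
t2 <- c(1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23)
s2 <- rep(1, 21)

leukemia_df <- data.frame(
  time = c(t1, t2),
  status = c(s1, s2),
  group = factor(rep(c("Treatment", "Placebo"), each = 21))
)

# Cross-tabulate group against status to count events and censored cases
counts <- table(group = leukemia_df$group, status = leukemia_df$status)
print(counts)
```

With this coding, the treatment group should contain 9 events and 12 censored observations, and the placebo group 21 events and no censored observations.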
We fit the non-parametric Kaplan-Meier estimator to estimate the probability of remaining in remission over time in each group.
# Fit the survival curves
km_fit <- survfit(Surv(time, status) ~ group, data = leukemia_df)
# Print the survival summary tables
print(km_fit)
## Call: survfit(formula = Surv(time, status) ~ group, data = leukemia_df)
##
##                  n events median 0.95LCL 0.95UCL
## group=Placebo   21     21      8       4      12
## group=Treatment 21      9     23      16      NA
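To make the medians in this summary less of a black box, the Kaplan-Meier estimate \(\hat{S}(t) = \prod_{t_i \le t} (1 - d_i / n_i)\) can be reproduced by hand, where \(d_i\) is the number of events at time \(t_i\) and \(n_i\) the number still at risk just before it. The following is an illustrative sketch in base R, not the internals of `survfit`; the helper `km_by_hand` and its column names are hypothetical.

```r
# Same remission times and status codes as in the data section
t1 <- c(6, 6, 6, 7, 10, 13, 16, 22, 23, 6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35)
s1 <- c(rep(1, 9), rep(0, 12))
t2 <- c(1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23)
s2 <- rep(1, 21)

# Hypothetical helper: product-limit estimate at each distinct event time
km_by_hand <- function(time, status) {
  event_times <- sort(unique(time[status == 1]))
  # Patients still under observation at each event time (censored ones included)
  n_risk <- sapply(event_times, function(t) sum(time >= t))
  # Events occurring exactly at each event time
  n_event <- sapply(event_times, function(t) sum(time == t & status == 1))
  data.frame(
    time = event_times,
    n_risk = n_risk,
    n_event = n_event,
    surv = cumprod(1 - n_event / n_risk)
  )
}

km_trt <- km_by_hand(t1, s1)
km_plc <- km_by_hand(t2, s2)
print(km_trt)

# Median: first event time at which the estimated survival drops to 0.5 or below
median_trt <- min(km_trt$time[km_trt$surv <= 0.5])
median_plc <- min(km_plc$time[km_plc$surv <= 0.5])
```

For the treatment group the first step is \(1 - 3/21 \approx 0.857\) at week 6, and the curve first falls below 0.5 at week 23; for the placebo group this happens at week 8. Both agree with the medians printed above.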
The plot below visualizes the survival curves. It includes \(95\%\) confidence intervals, median survival reference lines, the log-rank p-value, and a risk table showing the number of patients still at risk over time.
ggsurvplot(
  km_fit,
  data = leukemia_df,
  conf.int = TRUE, # Add 95% confidence intervals
  pval = TRUE, # Automatically run and display Log-Rank test p-value
  risk.table = TRUE, # Display the number at risk table beneath
  surv.median.line = "hv", # Draw horizontal/vertical lines at median survival
  palette = c("#E7B800", "#2E9FDF"), # One color per group, in factor-level order (Placebo, Treatment)
  ggtheme = theme_minimal(), # ggsurvplot() takes the plot theme via ggtheme
  xlab = "Time (Weeks)",
  ylab = "Probability of Remaining in Remission",
  title = "Kaplan-Meier Remission Curves by Patient Group"
)
To test formally whether the survival experiences of the two groups differ, we perform a Log-Rank test.
# Perform Log-Rank Test
log_rank_test <- survdiff(Surv(time, status) ~ group, data = leukemia_df)
print(log_rank_test)
## Call:
## survdiff(formula = Surv(time, status) ~ group, data = leukemia_df)
##
##                  N Observed Expected (O-E)^2/E (O-E)^2/V
## group=Placebo   21       21     10.7      9.77      16.8
## group=Treatment 21        9     19.3      5.46      16.8
##
## Chisq= 16.8 on 1 degrees of freedom, p= 4e-05
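The Observed, Expected, and Chisq columns can be traced back to a simple calculation: at each distinct event time, the events are allocated to the groups in proportion to the number at risk, and the variance comes from the hypergeometric distribution. The following is a rough sketch of that standard (unweighted) log-rank calculation in base R, written for illustration rather than as a description of how `survdiff` is implemented; all variable names are assumptions.

```r
# Same remission times and status codes as in the data section
t1 <- c(6, 6, 6, 7, 10, 13, 16, 22, 23, 6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35)
s1 <- c(rep(1, 9), rep(0, 12))
t2 <- c(1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23)
s2 <- rep(1, 21)

time <- c(t1, t2)
status <- c(s1, s2)
is_trt <- rep(c(TRUE, FALSE), each = 21)

obs_trt <- 0 # observed events in the treatment group
exp_trt <- 0 # expected events in the treatment group under no group difference
v <- 0       # hypergeometric variance of (observed - expected)

for (t in sort(unique(time[status == 1]))) {
  n <- sum(time >= t)                        # at risk, both groups
  n1 <- sum(time >= t & is_trt)              # at risk, treatment group
  d <- sum(time == t & status == 1)          # events at t, both groups
  d1 <- sum(time == t & status == 1 & is_trt) # events at t, treatment group
  obs_trt <- obs_trt + d1
  exp_trt <- exp_trt + d * n1 / n
  if (n > 1) {
    v <- v + d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
  }
}

chisq <- (obs_trt - exp_trt)^2 / v
p_value <- pchisq(chisq, df = 1, lower.tail = FALSE)
print(c(observed = obs_trt, expected = exp_trt, chisq = chisq, p = p_value))
```

If the sketch is faithful to the test, `obs_trt` and `exp_trt` should match the treatment row of the table above (9 observed against roughly 19.3 expected), and `chisq` should round to 16.8.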
Based on the survival analysis of the leukemia patient dataset, we reach the following conclusions: