Leukemia Remission Survival Analysis Project

1. Project Background and Objective

The primary objective of this project is to compare the survival experience (remission times in weeks) between two groups of leukemia patients:

* Group 1: Treatment group (\(n = 21\))
* Group 2: Placebo group (\(n = 21\))

The analysis assesses whether the treatment significantly prolongs remission compared with placebo.

2. Data Preparation

Below, we structure the raw data into an R data frame. Censored observations (patients still in remission at last follow-up, marked with a + in the source data) are assigned a status of 0, while observed relapses (events) are assigned a status of 1.

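# Load required packages: survival (Surv, survfit, survdiff),
# survminer (ggsurvplot) and knitr (kable)
library(survival)
library(survminer)
library(knitr)

# Treatment Group Data (n = 21)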
t1 <- c(6, 6, 6, 7, 10, 13, 16, 22, 23, 6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35)
s1 <- c(1, 1, 1, 1,  1,  1,  1,  1,  1, 0, 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0)
g1 <- rep("Treatment", 21)

# Placebo Group Data (n = 21)
t2 <- c(1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23)
s2 <- rep(1, 21) # All placebo patients experienced the event (no + signs)
g2 <- rep("Placebo", 21)

# Combine into a unified data frame
leukemia_df <- data.frame(
  time = c(t1, t2),
  status = c(s1, s2),
  group = factor(c(g1, g2))
)

# Display the structured data table
kable(head(leukemia_df, 10), caption = "First 10 Rows of Patient Data")

First 10 Rows of Patient Data
time	status	group
6	1	Treatment
6	1	Treatment
6	1	Treatment
7	1	Treatment
10	1	Treatment
13	1	Treatment
16	1	Treatment
22	1	Treatment
23	1	Treatment
6	0	Treatment
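If the raw data arrive as text in the + convention described above, the time and status vectors can be derived rather than typed by hand. The sketch below assumes character input; `parse_plus()` is a hypothetical helper, not part of the original analysis, and the treatment-group values are the same as `t1`/`s1`, rewritten in sorted + notation.

```r
# Hypothetical helper (not part of the original analysis): convert remission
# times written as "6" (relapse) or "6+" (censored) into time and status.
parse_plus <- function(x) {
  censored <- grepl("\\+$", x)              # TRUE where a trailing "+" marks censoring
  time <- as.numeric(sub("\\+$", "", x))    # strip the "+" and convert to numbers
  data.frame(time = time, status = as.integer(!censored))  # 1 = relapse, 0 = censored
}

# Treatment-group times (same values as t1/s1), sorted, in "+" notation
treatment_raw <- c("6", "6", "6", "6+", "7", "9+", "10", "10+", "11+", "13",
                   "16", "17+", "19+", "20+", "22", "23", "25+", "32+", "32+",
                   "34+", "35+")
treat <- parse_plus(treatment_raw)

table(treat$status)   # 12 censored (status 0) and 9 relapses (status 1)
```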

3. Kaplan-Meier Survival Curve Estimation

We compute the non-parametric Kaplan-Meier (product-limit) estimate of the probability of remaining in remission over time for each group.
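For reference, the product-limit estimate can be written as below, using notation introduced here for illustration: \(t_{(j)}\) are the distinct relapse times, \(d_j\) the number of relapses at \(t_{(j)}\), and \(n_j\) the number of patients still at risk (in remission and uncensored) just before \(t_{(j)}\). Patients censored at a relapse time are counted as still at risk at that time. The worked steps apply it to the Treatment group; the approximate values are computed from the unrounded products.

```latex
\hat S(t) = \prod_{j:\, t_{(j)} \le t} \left(1 - \frac{d_j}{n_j}\right)

\begin{aligned}
\hat S(6)  &= 1 - \tfrac{3}{21} = \tfrac{18}{21} \approx 0.857\\
\hat S(7)  &= \hat S(6) \times \left(1 - \tfrac{1}{17}\right) \approx 0.807\\
\hat S(10) &= \hat S(7) \times \left(1 - \tfrac{1}{15}\right) \approx 0.753\\
\hat S(13) &= \hat S(10) \times \left(1 - \tfrac{1}{12}\right) \approx 0.690\\
\hat S(16) &= \hat S(13) \times \left(1 - \tfrac{1}{11}\right) \approx 0.627\\
\hat S(22) &= \hat S(16) \times \left(1 - \tfrac{1}{7}\right) \approx 0.538\\
\hat S(23) &= \hat S(22) \times \left(1 - \tfrac{1}{6}\right) \approx 0.448
\end{aligned}
```

At week 7 the risk set has shrunk to 17 because the three relapses and the one censoring at week 6 have left it. Week 23 is the first time the estimate falls to 0.5 or below, which is why it is reported as the Treatment group's median.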

# Fit the survival curves
km_fit <- survfit(Surv(time, status) ~ group, data = leukemia_df)

# Print the survival summary tables
print(km_fit)

## Call: survfit(formula = Surv(time, status) ~ group, data = leukemia_df)
## 
##                  n events median 0.95LCL 0.95UCL
## group=Placebo   21     21      8       4      12
## group=Treatment 21      9     23      16      NA
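To make the product-limit calculation concrete, the sketch below recomputes the Kaplan-Meier estimates and medians in base R without the survival package. `km_by_hand()` and `km_median()` are hypothetical helpers written for illustration. The median rule used here (first relapse time with \(\hat S(t) \le 0.5\)) reproduces the medians printed above for this dataset, but survfit() may handle a curve that sits exactly at 0.5 differently.

```r
# Illustrative product-limit (Kaplan-Meier) estimator in base R;
# not the survival package's implementation.
km_by_hand <- function(time, status) {
  event_times <- sort(unique(time[status == 1]))                    # distinct relapse times
  n_risk  <- sapply(event_times, function(u) sum(time >= u))        # at risk just before u
  n_event <- sapply(event_times, function(u) sum(time == u & status == 1))
  surv <- cumprod(1 - n_event / n_risk)                             # product-limit estimate
  data.frame(time = event_times, n.risk = n_risk, n.event = n_event, surv = surv)
}

# Median: first relapse time at which the estimate is <= 0.5 (NA if never reached)
km_median <- function(km) {
  hit <- which(km$surv <= 0.5)
  if (length(hit) == 0) NA else km$time[min(hit)]
}

t1 <- c(6, 6, 6, 7, 10, 13, 16, 22, 23, 6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35)
s1 <- c(rep(1, 9), rep(0, 12))
t2 <- c(1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23)
s2 <- rep(1, 21)

km_treat   <- km_by_hand(t1, s1)
km_placebo <- km_by_hand(t2, s2)

round(km_treat$surv, 3)   # 0.857 0.807 0.753 0.690 0.627 0.538 0.448
c(placebo = km_median(km_placebo), treatment = km_median(km_treat))   # 8 and 23
```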

4. Modern Visualization

The plot below shows the Kaplan-Meier curves with \(95\%\) confidence bands, reference lines at the median remission times, the log-rank p-value, and a risk table giving the number of patients still at risk (in remission and not yet censored) over time.

ggsurvplot(
  km_fit,
  data = leukemia_df,
  conf.int = TRUE,           # Add 95% confidence intervals
  pval = TRUE,               # Automatically run and display Log-Rank test p-value
  risk.table = TRUE,         # Display the number at risk table beneath
  surv.median.line = "hv",   # Draw horizontal/vertical lines at median survival
  palette = c("#E7B800", "#2E9FDF"), # Colors for Placebo and Treatment (factor level order)
  ggtheme = theme_minimal(),         # ggsurvplot takes a ggplot2 theme via ggtheme
  xlab = "Time (Weeks)",
  ylab = "Probability of Remission",
  title = "Kaplan-Meier Remission Curves by Patient Group"
)
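The counts in the risk table can be cross-checked by hand: a patient is at risk at time \(t\) if their recorded time (relapse or censoring) is at least \(t\). The check times of 0, 10, 20 and 30 weeks below are an assumption for illustration; ggsurvplot chooses its own break points.

```r
# Number at risk at time u = patients whose recorded time is >= u
at_risk <- function(time, u) sapply(u, function(v) sum(time >= v))

t1 <- c(6, 6, 6, 7, 10, 13, 16, 22, 23, 6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35)
t2 <- c(1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23)

check_times <- c(0, 10, 20, 30)   # assumed time points, not taken from the plot
risk_table <- rbind(Placebo   = at_risk(t2, check_times),
                    Treatment = at_risk(t1, check_times))
colnames(risk_table) <- check_times
risk_table   # Placebo: 21 8 2 0; Treatment: 21 15 8 4
```

Relapsed and censored patients both leave the risk set after their recorded time, which is why the status column is not needed for these counts.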

5. Statistical Hypothesis Testing

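To test formally whether the remission-time distributions of the two groups differ, we perform a log-rank test. Under the null hypothesis of identical survival curves, it compares the observed number of relapses in each group with the number expected at each distinct relapse time.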
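In notation introduced here for illustration, let \(t_j\) be the distinct relapse times in the pooled sample, \(n_j\) and \(d_j\) the numbers at risk and relapsing at \(t_j\), and \(n_{1j}\), \(d_{1j}\) the corresponding counts in the Placebo group. The standard two-group log-rank statistic is then:

```latex
\begin{aligned}
O_1 &= \sum_j d_{1j}, \qquad E_1 = \sum_j \frac{d_j\, n_{1j}}{n_j},\\
V &= \sum_j \frac{n_{1j}\,(n_j - n_{1j})\, d_j\,(n_j - d_j)}{n_j^2\,(n_j - 1)},\\
\chi^2 &= \frac{(O_1 - E_1)^2}{V} \;\sim\; \chi^2_1 \text{ under } H_0 .
\end{aligned}
```

The (O-E)^2/V column of the survdiff() output is this statistic; the (O-E)^2/E column is a simpler approximation.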

# Perform Log-Rank Test
log_rank_test <- survdiff(Surv(time, status) ~ group, data = leukemia_df)
print(log_rank_test)

## Call:
## survdiff(formula = Surv(time, status) ~ group, data = leukemia_df)
## 
##                  N Observed Expected (O-E)^2/E (O-E)^2/V
## group=Placebo   21       21     10.7      9.77      16.8
## group=Treatment 21        9     19.3      5.46      16.8
## 
##  Chisq= 16.8  on 1 degrees of freedom, p= 4e-05
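As a cross-check on the printed output, the log-rank statistic can be recomputed in base R by accumulating, at each distinct relapse time, the observed and expected Placebo relapses and the hypergeometric variance. This is a teaching sketch, not survdiff()'s source; treating Placebo as the reference group is an arbitrary choice, since the statistic is the same either way.

```r
t1 <- c(6, 6, 6, 7, 10, 13, 16, 22, 23, 6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35)
s1 <- c(rep(1, 9), rep(0, 12))
t2 <- c(1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23)
s2 <- rep(1, 21)

time   <- c(t1, t2)
status <- c(s1, s2)
grp    <- rep(c("Treatment", "Placebo"), each = 21)

O <- 0; E <- 0; V <- 0
for (u in sort(unique(time[status == 1]))) {        # each distinct relapse time
  risk <- time >= u                                  # pooled risk set just before u
  n  <- sum(risk)
  n1 <- sum(risk & grp == "Placebo")
  d  <- sum(time == u & status == 1)                 # pooled relapses at u
  O  <- O + sum(time == u & status == 1 & grp == "Placebo")
  E  <- E + d * n1 / n                               # expected Placebo relapses under H0
  if (n > 1) V <- V + n1 * (n - n1) * d * (n - d) / (n^2 * (n - 1))
}

chisq <- (O - E)^2 / V
p <- pchisq(chisq, df = 1, lower.tail = FALSE)
round(c(observed = O, expected = E, chisq = chisq), 2)   # compare with survdiff() above
signif(p, 2)
```

The expected count for the Treatment group is the total number of relapses minus the Placebo expectation, \(30 - 10.75 = 19.25\), which survdiff() prints as 19.3.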

6. Project Conclusion

Based on the survival analysis of the leukemia patient dataset, we reach the following conclusions:

Statistical Significance: The log-rank test gives \(\chi^2 = 16.8\) on 1 degree of freedom (\(p \approx 4 \times 10^{-5}\), well below 0.001). We reject the null hypothesis that the two groups share the same remission-time distribution.
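Treatment Efficacy: Patients in the Treatment group had significantly longer remission times than those in the Placebo group: the Treatment group had 9 relapses against 19.3 expected under the null hypothesis, while the Placebo group had 21 against 10.7 expected.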
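Median Remission Time: The estimated median remission time is 8 weeks for the Placebo group (95% CI: 4 to 12 weeks) and 23 weeks for the Treatment group (95% CI lower limit: 16 weeks). The upper confidence limit for the Treatment group is not estimable (NA) because the upper confidence band does not fall to 0.5 within follow-up, reflecting heavy censoring: 12 of 21 treated patients were still in remission at last follow-up. This is consistent with more durable remission under treatment.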