1. Project Background and Objective

The primary objective of this project is to compare the survival experience (remission times in weeks) between two groups of leukemia patients:

* Group 1: Treatment group (\(n = 21\))
* Group 2: Placebo group (\(n = 21\))

The analysis tests whether the treatment significantly prolongs remission compared with the placebo.

2. Data Preparation

Below, we structure the raw data into an R data frame. Censored observations (marked with a + in the raw data) are assigned a status of 0, while observed events (end of remission) are assigned a status of 1.

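# Load the packages used in this report
library(survival)   # Surv(), survfit(), survdiff()
library(survminer)  # ggsurvplot()
library(knitr)      # kable()

# Treatment Group Data (n = 21)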
t1 <- c(6, 6, 6, 7, 10, 13, 16, 22, 23, 6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35)
s1 <- c(1, 1, 1, 1,  1,  1,  1,  1,  1, 0, 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0)
g1 <- rep("Treatment", 21)

# Placebo Group Data (n = 21)
t2 <- c(1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23)
s2 <- rep(1, 21) # All placebo patients experienced the event (no + signs)
g2 <- rep("Placebo", 21)

# Combine into a unified data frame
leukemia_df <- data.frame(
  time = c(t1, t2),
  status = c(s1, s2),
  group = factor(c(g1, g2))
)

# Display the structured data table
kable(head(leukemia_df, 10), caption = "First 10 Rows of Patient Data")
First 10 Rows of Patient Data

time  status  group
   6       1  Treatment
   6       1  Treatment
   6       1  Treatment
   7       1  Treatment
  10       1  Treatment
  13       1  Treatment
  16       1  Treatment
  22       1  Treatment
  23       1  Treatment
   6       0  Treatment
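
The status coding described above can also be derived from the raw notation rather than typed by hand. The following is a minimal sketch in base R, assuming the raw treatment values are available as character strings with a trailing + for censored observations; the vector name `raw_treatment` is hypothetical and simply restates the treatment data from this section in that notation.

```r
# Hypothetical raw input: treatment remission times, "+" marks a censored value
raw_treatment <- c("6", "6", "6", "7", "10", "13", "16", "22", "23",
                   "6+", "9+", "10+", "11+", "17+", "19+", "20+",
                   "25+", "32+", "32+", "34+", "35+")

# A trailing "+" means censored (status 0); otherwise the event was observed (status 1)
parsed_status <- ifelse(grepl("\\+$", raw_treatment), 0, 1)

# Strip the "+" and convert the remainder to a numeric time in weeks
parsed_time <- as.numeric(sub("\\+$", "", raw_treatment))

# Event and censoring counts for a quick sanity check
table(parsed_status)
```

Deriving the status this way avoids misaligning the time and status vectors when the data are entered manually.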

3. Kaplan-Meier Survival Curve Estimation

We fit the non-parametric Kaplan-Meier survival model to evaluate the probability of staying in remission over time for both groups.

# Fit the survival curves
km_fit <- survfit(Surv(time, status) ~ group, data = leukemia_df)

# Print the survival summary tables
print(km_fit)
## Call: survfit(formula = Surv(time, status) ~ group, data = leukemia_df)
## 
##                  n events median 0.95LCL 0.95UCL
## group=Placebo   21     21      8       4      12
## group=Treatment 21      9     23      16      NA
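
To make the estimator concrete, the Treatment group's curve can be reproduced by hand with the product-limit formula \(\hat{S}(t) = \prod_{t_j \le t} (1 - d_j / n_j)\), where \(d_j\) is the number of events at time \(t_j\) and \(n_j\) is the number still at risk just before it. The following is a minimal base R sketch; the variable names are illustrative and the data are copied from the Data Preparation section.

```r
# Treatment group data (copied from the Data Preparation section)
t1 <- c(6, 6, 6, 7, 10, 13, 16, 22, 23, 6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35)
s1 <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)

# Distinct times at which at least one event was observed
event_times <- sort(unique(t1[s1 == 1]))

# Number at risk just before each event time, and number of events at it
n_risk  <- sapply(event_times, function(t) sum(t1 >= t))
n_event <- sapply(event_times, function(t) sum(t1 == t & s1 == 1))

# Product-limit estimate: cumulative product of conditional survival probabilities
surv <- cumprod(1 - n_event / n_risk)

km_table <- data.frame(time = event_times, n_risk, n_event, surv = round(surv, 3))
print(km_table)

# Median: first event time at which the estimated curve is at or below 0.5
median_treatment <- min(event_times[surv <= 0.5])
median_treatment
```

Censored patients contribute to `n_risk` up to their censoring time but never to `n_event`, which is how the estimator uses partial follow-up without treating it as an event.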

4. Modern Visualization

The plot below visualizes the estimated remission curves. It includes \(95\%\) confidence intervals, reference lines at the median remission time, and an integrated risk table showing the number of patients still at risk over time.

ggsurvplot(
  km_fit,
  data = leukemia_df,
  conf.int = TRUE,           # Add 95% confidence intervals
  pval = TRUE,               # Automatically run and display Log-Rank test p-value
  risk.table = TRUE,         # Display the number at risk table beneath
  surv.median.line = "hv",   # Draw horizontal/vertical lines at median survival
  palette = c("#E7B800", "#2E9FDF"), # Modern, presentation-ready colors
  ggtheme = theme_minimal(), # ggsurvplot() takes the plot theme via `ggtheme`
  xlab = "Time (Weeks)",
  ylab = "Probability of Remaining in Remission",
  title = "Kaplan-Meier Remission Curves by Patient Group"
)

5. Statistical Hypothesis Testing

To statistically test whether the survival experiences of the two groups are genuinely different, we perform a Log-Rank test.

# Perform Log-Rank Test
log_rank_test <- survdiff(Surv(time, status) ~ group, data = leukemia_df)
print(log_rank_test)
## Call:
## survdiff(formula = Surv(time, status) ~ group, data = leukemia_df)
## 
##                  N Observed Expected (O-E)^2/E (O-E)^2/V
## group=Placebo   21       21     10.7      9.77      16.8
## group=Treatment 21        9     19.3      5.46      16.8
## 
##  Chisq= 16.8  on 1 degrees of freedom, p= 4e-05
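
The statistic reported above can be checked by hand. At each distinct event time, the Log-Rank test compares the observed number of events in one group with the number expected if both groups shared the same hazard, then sums over event times. The following is a minimal base R sketch of that calculation using the standard hypergeometric variance; the variable names are illustrative, and the result should agree with the `survdiff` output up to rounding.

```r
# Data copied from the Data Preparation section
t1 <- c(6, 6, 6, 7, 10, 13, 16, 22, 23, 6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35)
s1 <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
t2 <- c(1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23)
s2 <- rep(1, 21)

time   <- c(t1, t2)
status <- c(s1, s2)
treat  <- rep(c(TRUE, FALSE), each = 21)  # TRUE = Treatment group

# Distinct times at which at least one event occurred in either group
event_times <- sort(unique(time[status == 1]))

# Numbers at risk and numbers of events at each event time
n_all   <- sapply(event_times, function(t) sum(time >= t))
n_treat <- sapply(event_times, function(t) sum(time >= t & treat))
d_all   <- sapply(event_times, function(t) sum(time == t & status == 1))
d_treat <- sapply(event_times, function(t) sum(time == t & status == 1 & treat))

# Observed and expected events in the Treatment group under the null hypothesis
observed_treat <- sum(d_treat)
expected_treat <- sum(d_all * n_treat / n_all)

# Hypergeometric variance of the observed count, summed over event times
variance <- sum(d_all * (n_treat / n_all) * (1 - n_treat / n_all) *
                (n_all - d_all) / (n_all - 1))

chisq   <- (observed_treat - expected_treat)^2 / variance
p_value <- pchisq(chisq, df = 1, lower.tail = FALSE)

c(observed = observed_treat, expected = expected_treat,
  chisq = chisq, p_value = p_value)
```

The `(O-E)^2/V` column in the `survdiff` output corresponds to `chisq` here, which is why it is identical for both groups in a two-group comparison.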

6. Project Conclusion

Based on the survival analysis of the leukemia patient dataset, we reach the following conclusions:

  1. Statistical Significance: The Log-Rank test yields a highly significant result (\(\chi^2 = 16.8\) on 1 degree of freedom, \(p < 0.001\)). We reject the null hypothesis that there is no difference between the groups.
  2. Treatment Efficacy: Patients in the Treatment group experienced significantly longer remission times compared to those in the Placebo group.
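  3. Median Survival: The median remission time is 8 weeks for the Placebo group (\(95\%\) CI: 4 to 12 weeks) and 23 weeks for the Treatment group (lower \(95\%\) limit: 16 weeks). The upper confidence limit for the Treatment group cannot be estimated (reported as NA) because the upper confidence band of its curve never falls to 0.5 within the tracked timeframe, reflecting the high number of censored individuals (\(12/21\)). This is consistent with superior long-term remission rates under treatment.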