2024-10-20

What is Survival Analysis?

  • Survival analysis studies the time until one or more events (e.g., death, failure) occur.
  • Widely used in medicine to analyze survival time after treatment and to compare survival across patient groups.
  • Goal: Estimate the survival function \(S(t)\), compare survival between groups, and understand risk factors.

Key Concepts

  1. Survival Function \(S(t)\):
    • The probability of surviving beyond time \(t\). \[ S(t) = P(T > t) \]
  2. Hazard Function \(h(t)\):
    • The instantaneous risk of an event occurring at time \(t\), given survival until time \(t\). \[ h(t) = \frac{f(t)}{S(t)} \] Here \(f(t)\) is the probability density of the event time.
  3. Censoring:
    • Not all patients experience the event before the study ends. For these censored observations we know only that the event time exceeds the last observed time (right censoring).
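
As a quick numerical check of the relation \(h(t) = f(t)/S(t)\), the sketch below evaluates it for an exponentially distributed event time, where the hazard should be constant and equal to the rate. The rate of 0.1 and the chosen time points are illustrative assumptions, not values from a real study.

```r
# Hazard of an exponential event time, computed as h(t) = f(t) / S(t).
rate <- 0.1                                    # assumed event rate (illustrative)
t    <- c(1, 5, 10, 20)                        # time points to evaluate

f <- dexp(t, rate = rate)                      # density f(t)
S <- pexp(t, rate = rate, lower.tail = FALSE)  # survival S(t) = P(T > t)
h <- f / S                                     # hazard h(t)

print(data.frame(t, f, S, h))
```

The `h` column is the same at every time point, which is the "memoryless" property of the exponential distribution: the risk does not depend on how long a subject has already survived.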

Kaplan-Meier Survival Curve

Kaplan-Meier survival curves estimate survival probabilities over time for different groups.

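library(survival)  # Surv(), survfit()
library(ggplot2)   # plotting

# Toy data: status 1 = event observed, 0 = censored
data <- data.frame(
  time = c(5, 8, 12, 15, 18, 22, 25, 30, 32, 35),
  status = c(1, 1, 0, 1, 1, 0, 0, 1, 0, 1),
  group = factor(c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2))
)

# Fit one Kaplan-Meier curve per group
km_fit <- survfit(Surv(time, status) ~ group, data = data)

# Collect the estimates in a data frame for ggplot2.
# km_fit$strata holds the number of time points in each group,
# so rep() labels every row with its group.
km_data <- data.frame(time = km_fit$time, 
                      surv = km_fit$surv, 
                      strata = rep(levels(data$group),
                                   times = km_fit$strata))

# Survival curves are step functions, hence geom_step()
g <- ggplot(km_data, aes(x = time, y = surv, color = strata)) +
  geom_step(linewidth = 1.2) +
  labs(title = "Kaplan-Meier Survival Curve", 
       x = "Time",
       y = "Survival Probability") +
  theme_minimal()

g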

3D Plot of Survival Probability Over Time

Visualize survival probability over time in 3D.

Cumulative Hazard Plot

The Nelson-Aalen estimator estimates the cumulative hazard function \(H(t)\), which shows the accumulated risk of experiencing an event up to time \(t\).
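
A minimal base-R sketch of the Nelson-Aalen estimate, \(\hat{H}(t) = \sum_{t_i \leq t} d_i / n_i\), is given below. It reuses the toy times and statuses from the Kaplan-Meier example but, as a simplifying assumption, pools both groups into one sample. The variable names are illustrative.

```r
# Toy data pooled over both groups: status 1 = event, 0 = censored
time   <- c(5, 8, 12, 15, 18, 22, 25, 30, 32, 35)
status <- c(1, 1, 0, 1, 1, 0, 0, 1, 0, 1)

# Distinct times at which at least one event was observed
event_times <- sort(unique(time[status == 1]))

# d_i: events at each event time; n_i: subjects still at risk just before it
d <- vapply(event_times, function(t) sum(time == t & status == 1), integer(1))
n <- vapply(event_times, function(t) sum(time >= t), integer(1))

# Nelson-Aalen estimate: running sum of d_i / n_i
na_cumhaz <- cumsum(d / n)

print(data.frame(time = event_times, n_risk = n, n_event = d, cumhaz = na_cumhaz))

# Step plot of the cumulative hazard
plot(event_times, na_cumhaz, type = "s",
     xlab = "Time", ylab = "Cumulative Hazard",
     main = "Nelson-Aalen Cumulative Hazard")
```

Censored subjects contribute no jump of their own, but they stay in the risk set \(n_i\) until their censoring time, which is how the estimator uses their partial information.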

Kaplan-Meier Estimator

The Kaplan-Meier Estimator is defined as:

\[ \hat{S}(t) = \prod_{t_i \leq t} \left( 1 - \frac{d_i}{n_i} \right) \]

Where:

  • \(t_i\) is the time of the \(i\)-th event.
  • \(d_i\) is the number of events at time \(t_i\).
  • \(n_i\) is the number of individuals at risk just before \(t_i\).
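
To make the product formula concrete, the sketch below computes it by hand in base R on the toy data from the Kaplan-Meier example. As a simplifying assumption it pools both groups into one sample, so the numbers differ from the per-group curves plotted earlier. The variable names are illustrative.

```r
# Toy data pooled over both groups: status 1 = event, 0 = censored
time   <- c(5, 8, 12, 15, 18, 22, 25, 30, 32, 35)
status <- c(1, 1, 0, 1, 1, 0, 0, 1, 0, 1)

# Distinct event times t_i
event_times <- sort(unique(time[status == 1]))

# d_i: events at t_i; n_i: individuals at risk just before t_i
d <- vapply(event_times, function(t) sum(time == t & status == 1), integer(1))
n <- vapply(event_times, function(t) sum(time >= t), integer(1))

# Kaplan-Meier estimate: running product of (1 - d_i / n_i)
km_surv <- cumprod(1 - d / n)

print(data.frame(time = event_times, n_risk = n, n_event = d, surv = km_surv))
```

At the first event time 10 subjects are at risk and one has the event, so the estimate drops to 0.9. The subject censored at time 12 causes no drop but shrinks the risk set from 8 to 7 before the event at time 15.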

Hazard Function

The hazard function describes the risk of failure at any time \(t\), given survival until that time:

\[ h(t) = \lim_{\Delta t \to 0} \frac{P(t \leq T < t + \Delta t \mid T \geq t)}{\Delta t} \]

Where \(T\) is the time-to-event random variable.
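
This limit agrees with the ratio \(f(t)/S(t)\) given under Key Concepts. A short derivation, assuming \(T\) is continuous with distribution function \(F(t) = P(T \leq t) = 1 - S(t)\) and density \(f(t) = F'(t)\), so that \(P(T \geq t) = S(t)\):

\[ P(t \leq T < t + \Delta t \mid T \geq t) = \frac{P(t \leq T < t + \Delta t)}{P(T \geq t)} = \frac{F(t + \Delta t) - F(t)}{S(t)} \]

Dividing by \(\Delta t\) and letting \(\Delta t \to 0\):

\[ h(t) = \frac{1}{S(t)} \lim_{\Delta t \to 0} \frac{F(t + \Delta t) - F(t)}{\Delta t} = \frac{f(t)}{S(t)} = -\frac{d}{dt} \log S(t) \]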

Conclusion

  • Survival Analysis is essential in medicine to estimate survival times and compare treatments.
  • The Kaplan-Meier Estimator provides a non-parametric estimate of survival.
  • The hazard function helps describe the risk of failure over time.