2025-06-07

What is Survival Analysis?

Survival analysis is a set of statistical methods for analyzing time-to-event data. It is widely used in medicine, engineering, and the social sciences.

Why is it Unique?

  • Focuses on the time until an event, such as death or failure
  • Handles censored data, where the event isn’t observed during follow-up

Exploring the Data

First 6 Rows of the Lung Dataset
Institution  Survival Time  Status  Age  Sex  ECOG Score  Physician Karnofsky  Patient Karnofsky  Meal Calories  Weight Loss
3            306            2       74   1    1           90                   100                1175           NA
3            455            2       68   1    0           90                   90                 1225           15
3            1010           1       56   1    0           90                   90                 NA             15
5            210            2       57   1    1           90                   60                 1150           11
1            883            2       60   1    0           100                  90                 NA             0
12           1022           1       74   1    1           50                   80                 513            0

Survival Curve by Sex
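The code behind this slide's figure is not shown, so the following is only a sketch of one common way to draw such a curve. It assumes the `survival` package and base R graphics; the colors and axis labels are illustrative choices, not the original ones.

```r
library(survival)  # provides Surv(), survfit(), survdiff() and the lung data

# In lung, status is coded 1 = censored, 2 = dead; sex is 1 = male, 2 = female.
fit <- survfit(Surv(time, status) ~ sex, data = lung)

# Step-function plot with one Kaplan-Meier curve per sex.
plot(fit, col = c("blue", "red"),
     xlab = "Days", ylab = "Survival probability")
legend("topright", legend = c("Male", "Female"),
       col = c("blue", "red"), lty = 1)

# Log-rank test for a difference between the two curves.
survdiff(Surv(time, status) ~ sex, data = lung)
```

A log-rank test like `survdiff()` is one usual source for a p-value comparing two survival curves. Whether the p-value quoted in the conclusion came from this exact call is an assumption.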

Count of Events by Status R Code

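library(survival)  # provides the lung dataset
library(ggplot2)

# status is coded 1 = censored, 2 = dead; relabel it for plotting
lung$Status <- factor(lung$status, levels = c(1, 2),
                      labels = c("Censored", "Event"))

# One bar per status level, filled by status
ggplot(lung, aes(x = Status, fill = Status)) +
  geom_bar() +
  labs(title = "Count of Events vs Censored Cases",
       x = "Status",
       y = "Count") +
  theme_minimal() +
  theme(legend.position = "none")  # legend would duplicate the x axis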

Count of Events by Status Graph

Interactive Plotly Chart

Kaplan-Meier Estimator

The Kaplan-Meier estimator estimates the probability of surviving past time \(t\) as a product over the distinct event times \(t_i\) up to \(t\):

\[ \hat{S}(t) = \prod_{t_i \le t} \left( 1 - \frac{d_i}{n_i} \right) \]

Where:

  • \(d_i\) = number of events (deaths) at time \(t_i\)
  • \(n_i\) = number of individuals at risk just before time \(t_i\)
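To make the product formula concrete, here is a minimal base R sketch that computes it by hand. The six observations are hypothetical toy data, not taken from the lung dataset, and the names `time`, `event`, and `km` are chosen only for illustration.

```r
# Hypothetical toy data: follow-up time in days and an event indicator
# (1 = event observed, 0 = censored).
time  <- c(5, 8, 8, 12, 15, 20)
event <- c(1, 1, 0, 1, 0, 1)

# Distinct times at which at least one event occurred.
event_times <- sort(unique(time[event == 1]))

# d_i: events at t_i; n_i: subjects still at risk just before t_i.
d <- sapply(event_times, function(t) sum(time == t & event == 1))
n <- sapply(event_times, function(t) sum(time >= t))

# Product-limit estimate: cumulative product of (1 - d_i / n_i).
km <- data.frame(time = event_times, n_risk = n, n_event = d,
                 surv = cumprod(1 - d / n))
print(km)
```

The subject censored at day 8 still counts in the risk set at day 8 but never contributes an event, which is how censoring enters the estimate. With the `survival` package installed, `summary(survfit(Surv(time, event) ~ 1))` should report the same survival column.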

Hazard Function

The hazard function represents the instantaneous risk of an event at time \(t\):

\[ h(t) = \lim_{\Delta t \to 0} \frac{P(t \leq T < t + \Delta t \mid T \geq t)}{\Delta t} \]

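It is a rate rather than a probability: for a short interval of length \(\Delta t\), the chance of the event occurring in \([t, t + \Delta t)\), given survival until \(t\), is approximately \(h(t)\,\Delta t\).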
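As a supplementary note using standard results that are not on the original slides: assume \(T\) is continuous with density \(f(t)\) and survival function \(S(t) = P(T > t)\). The hazard then ties back to the survival function as follows:

\[ h(t) = \frac{f(t)}{S(t)} = -\frac{d}{dt} \log S(t), \qquad S(t) = \exp\left( -\int_0^t h(u)\, du \right) \]

For example, a constant hazard \(h(t) = \lambda\) gives \(S(t) = e^{-\lambda t}\), the exponential survival model.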

Conclusion

  • Survival analysis models time-to-event outcomes and accounts for censored data.
  • Kaplan-Meier curves revealed that females had significantly higher survival probabilities than males (p = 0.0013).
  • A bar plot showed more observed events than censored cases, while a scatterplot illustrated wide variability in survival time across ages.
  • These methods are widely used in fields like medicine, engineering, and business to assess risk and understand event timing.