Survival analysis is a method for analyzing time-to-event data, often used in medicine, engineering, and social sciences.
2025-06-07
| Institution | Survival Time (days) | Status (1 = censored, 2 = dead) | Age (years) | Sex (1 = male, 2 = female) | ECOG Score | Physician Karnofsky | Patient Karnofsky | Meal Calories | Weight Loss (lb) |
|---|---|---|---|---|---|---|---|---|---|
| 3 | 306 | 2 | 74 | 1 | 1 | 90 | 100 | 1175 | NA |
| 3 | 455 | 2 | 68 | 1 | 0 | 90 | 90 | 1225 | 15 |
| 3 | 1010 | 1 | 56 | 1 | 0 | 90 | 90 | NA | 15 |
| 5 | 210 | 2 | 57 | 1 | 1 | 90 | 60 | 1150 | 11 |
| 1 | 883 | 2 | 60 | 1 | 0 | 100 | 90 | NA | 0 |
| 12 | 1022 | 1 | 74 | 1 | 1 | 50 | 80 | 513 | 0 |
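```r
library(survival)  # provides the lung dataset
library(ggplot2)

# Recode status (1 = censored, 2 = dead) as a labeled factor
lung$Status <- factor(lung$status, levels = c(1, 2), labels = c("Censored", "Event"))

# Bar chart comparing the number of events and censored cases
ggplot(lung, aes(x = Status, fill = Status)) +
  geom_bar() +
  labs(title = "Count of Events vs Censored Cases",
       x = "Status",
       y = "Count") +
  theme_minimal() +
  theme(legend.position = "none")
```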
The Kaplan-Meier estimator estimates the probability of surviving past time \(t\) as a product over the distinct event times \(t_i\) up to \(t\):
\[ \hat{S}(t) = \prod_{t_i \le t} \left( 1 - \frac{d_i}{n_i} \right) \]
where:

- \(d_i\) = number of events (deaths) at time \(t_i\)
- \(n_i\) = number of individuals at risk just before time \(t_i\)
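
To make the product formula concrete, here is a minimal base R sketch that applies it by hand to the six rows shown in the table above. It assumes that status 2 marks an event and status 1 marks censoring; the variable names (`event_times`, `n_i`, `d_i`, `km`) are illustrative and not part of any package.

```r
# Survival times (days) and status codes from the six rows shown above
time   <- c(306, 455, 1010, 210, 883, 1022)
status <- c(2, 2, 1, 2, 2, 1)
event  <- status == 2              # assumption: 2 = event, 1 = censored

# Distinct times at which at least one event occurred, in increasing order
event_times <- sort(unique(time[event]))

# n_i: individuals still at risk just before each event time
n_i <- sapply(event_times, function(t) sum(time >= t))

# d_i: number of events at each event time
d_i <- sapply(event_times, function(t) sum(time == t & event))

# Kaplan-Meier estimate: running product of (1 - d_i / n_i)
km <- data.frame(
  time     = event_times,
  n_risk   = n_i,
  n_event  = d_i,
  survival = cumprod(1 - d_i / n_i)
)
print(km)
```

With only six observations the estimate is a toy example. On the full dataset, the same quantity would normally be obtained with `survfit(Surv(time, status) ~ 1, data = lung)` from the survival package rather than computed by hand.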
The hazard function represents the instantaneous risk of an event at time \(t\):
\[ h(t) = \lim_{\Delta t \to 0} \frac{P(t \leq T < t + \Delta t \mid T \geq t)}{\Delta t} \]
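The hazard is a rate rather than a probability: for a small interval \(\Delta t\), the product \(h(t)\,\Delta t\) approximates the probability that the event occurs in \([t, t + \Delta t)\), given survival until \(t\).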
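
The limit in the hazard definition can be checked numerically. The sketch below assumes, purely for illustration, that \(T\) follows an exponential distribution with a hypothetical rate of 0.5, for which the hazard is known to be constant and equal to the rate. Neither the distribution nor the rate comes from the lung data.

```r
# Assumption for illustration: T ~ Exponential(rate), so h(t) = rate for all t
rate  <- 0.5
t     <- 2
delta <- c(1, 0.1, 0.01, 0.001)    # shrinking interval widths

# P(t <= T < t + delta | T >= t), computed from the exponential CDF
cond_prob <- (pexp(t + delta, rate) - pexp(t, rate)) / (1 - pexp(t, rate))

# Dividing by delta gives the finite-difference approximation of h(t)
approx_hazard <- cond_prob / delta

print(data.frame(delta = delta, approx_hazard = approx_hazard))
```

As `delta` shrinks, `approx_hazard` approaches the assumed rate of 0.5, which is the behavior the limit describes.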