A survival curve shows how the probability of remaining event-free changes over time.
The survival function is defined as:
\[ S(t) = P(T > t) \]
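Where \( T \) is the time until the event occurs and \( t \) is a specific time point.
Interpretation: \( S(t) \) is the probability that an individual has not yet experienced the event by time \( t \).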
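
To make the definition concrete, here is a minimal sketch that estimates \( S(t) \) as a simple proportion. It assumes a small, made-up vector of fully observed event times (no censoring); `event_times` and `S_hat` are illustrative names, not part of the lecture data.

```r
# Hypothetical, fully observed event times in days (no censoring)
event_times <- c(5, 8, 12, 12, 20, 31, 45, 60)

# Empirical survival function: share of individuals with T > t
S_hat <- function(t) mean(event_times > t)

S_hat(0)   # everyone is still event-free at time 0
S_hat(10)  # share of individuals whose event time exceeds 10 days
S_hat(60)  # nobody remains event-free beyond the last event time
```

With censored data this simple proportion is no longer valid, which is why the Kaplan–Meier estimator is used later in the lecture.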
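Examples of survival analysis applications include time to death or relapse in clinical studies, time to failure of a machine component, and time until a customer cancels a subscription.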
In this lecture we will use a dataset from the R survival package.
The lung dataset contains survival data from patients with advanced lung cancer.
Key variables include:
| Variable | Description |
|---|---|
| time | survival time (days) |
| status | event indicator (1 = censored, 2 = event) |
| age | patient age (years) |
| sex | sex (1 = male, 2 = female) |
| ph.ecog | ECOG performance score (0 = best, higher = worse) |
Before building a survival model we explore the dataset.
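
A first look at the data could be sketched as follows. This assumes the survival package is installed; the particular inspection calls are a suggestion, since the original code chunk is not shown.

```r
library(survival)  # provides the lung dataset

# First six rows of the dataset
head(lung)

# Variable types and dimensions
str(lung)

# Number of missing values per column (several covariates contain NAs)
colSums(is.na(lung))
```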
| inst | time | status | age | sex | ph.ecog | ph.karno | pat.karno | meal.cal | wt.loss |
|---|---|---|---|---|---|---|---|---|---|
| 3 | 306 | 2 | 74 | 1 | 1 | 90 | 100 | 1175 | NA |
| 3 | 455 | 2 | 68 | 1 | 0 | 90 | 90 | 1225 | 15 |
| 3 | 1010 | 1 | 56 | 1 | 0 | 90 | 90 | NA | 15 |
| 5 | 210 | 2 | 57 | 1 | 1 | 90 | 60 | 1150 | 11 |
| 1 | 883 | 2 | 60 | 1 | 0 | 100 | 90 | NA | 0 |
| 12 | 1022 | 1 | 74 | 1 | 1 | 50 | 80 | 513 | 0 |
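The most important variables are time (follow-up time in days) and status (the event indicator).
However, the event variable must be recoded.
In the dataset:

- status = 1 → censored
- status = 2 → event occurred

For survival analysis we convert to the conventional 0/1 coding:

- 0 → censored
- 1 → event occurred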
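
The recoding can be sketched as below. The column name `event` is an assumption chosen for illustration; the original variable name is not shown in these notes.

```r
library(survival)

# Recode: status 2 (event) becomes 1, status 1 (censored) becomes 0.
# 'event' is an assumed column name.
lung$event <- ifelse(lung$status == 2, 1, 0)

# Cross-tabulate old and new coding to confirm the mapping
table(status = lung$status, event = lung$event)
```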
The core structure in survival analysis is the Survival Object.
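
A survival object pairs each follow-up time with its event indicator. The sketch below uses the name `S_lung`, which appears in the output of these notes; the recoded column `event` is an assumed name.

```r
library(survival)

# Assumed 0/1 event column (0 = censored, 1 = event)
lung$event <- ifelse(lung$status == 2, 1, 0)

# Surv() combines follow-up time and event indicator into one object
S_lung <- Surv(time = lung$time, event = lung$event)

# Printing shows one entry per patient; censored times carry a "+"
S_lung
```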
[1] 306 455 1010+ 210 883 1022+ 310 361 218 166 170 654
[13] 728 71 567 144 613 707 61 88 301 81 624 371
[25] 394 520 574 118 390 12 473 26 533 107 53 122
[37] 814 965+ 93 731 460 153 433 145 583 95 303 519
[49] 643 765 735 189 53 246 689 65 5 132 687 345
[61] 444 223 175 60 163 65 208 821+ 428 230 840+ 305
[73] 11 132 226 426 705 363 11 176 791 95 196+ 167
[85] 806+ 284 641 147 740+ 163 655 239 88 245 588+ 30
[97] 179 310 477 166 559+ 450 364 107 177 156 529+ 11
[109] 429 351 15 181 283 201 524 13 212 524 288 363
[121] 442 199 550 54 558 207 92 60 551+ 543+ 293 202
[133] 353 511+ 267 511+ 371 387 457 337 201 404+ 222 62
[145] 458+ 356+ 353 163 31 340 229 444+ 315+ 182 156 329
[157] 364+ 291 179 376+ 384+ 268 292+ 142 413+ 266+ 194 320
[169] 181 285 301+ 348 197 382+ 303+ 296+ 180 186 145 269+
[181] 300+ 284+ 350 272+ 292+ 332+ 285 259+ 110 286 270 81
[193] 131 225+ 269 225+ 243+ 279+ 276+ 135 79 59 240+ 202+
[205] 235+ 105 224+ 239 237+ 173+ 252+ 221+ 185+ 92+ 13 222+
[217] 192+ 183 211+ 175+ 197+ 203+ 116 188+ 191+ 105+ 174+ 177+
Interpretation of output:

- 306 → event occurred at time 306
- 1010+ → censored observation

The + symbol indicates right censoring.
The Kaplan–Meier estimator calculates survival probability over time.
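
A fit matching the call shown in the output can be sketched as follows. The object name `km_fit` and the recoded column `event` are assumptions; `S_lung` and the formula `S_lung ~ 1` are taken from the printed call.

```r
library(survival)

# Assumed 0/1 event column and the survival object from the previous step
lung$event <- ifelse(lung$status == 2, 1, 0)
S_lung <- Surv(lung$time, lung$event)

# Kaplan–Meier estimate of a single overall survival curve
km_fit <- survfit(S_lung ~ 1, data = lung)

# Printing gives n, number of events, median and its 95% confidence limits
km_fit
```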
Call: survfit(formula = S_lung ~ 1, data = lung)
n events median 0.95LCL 0.95UCL
[1,] 228 165 310 285 363
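Explanation:

- ~1 means we estimate one overall survival curve.

Interpretation:

- n = 228 patients, of whom 165 experienced the event.
- The median survival time is 310 days (95% CI 285 to 363 days).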
Kaplan–Meier curves are step functions.
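Important rules:

- The curve starts at 1 at time 0.
- It drops only at times when an event occurs.
- Censored observations do not cause a drop; they reduce the number at risk at later time points.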
Thus the survival curve describes how quickly events occur over time.
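
The step behaviour follows from the product-limit form of the estimator, where \( d_i \) is the number of events at event time \( t_i \) and \( n_i \) is the number still at risk just before \( t_i \):

\[ \hat{S}(t) = \prod_{t_i \le t} \left( 1 - \frac{d_i}{n_i} \right) \]

As a rough check, the sketch below computes this product by hand and compares it with survfit(). All object names are illustrative assumptions.

```r
library(survival)

# 0/1 event indicator (0 = censored, 1 = event)
event <- ifelse(lung$status == 2, 1, 0)

# Distinct times at which at least one event occurred
event_times <- sort(unique(lung$time[event == 1]))

# n_i: patients still at risk just before each event time
n_i <- sapply(event_times, function(tt) sum(lung$time >= tt))

# d_i: number of events at each event time
d_i <- sapply(event_times, function(tt) sum(lung$time == tt & event == 1))

# Product-limit estimate at each event time
km_manual <- cumprod(1 - d_i / n_i)

# Same quantity from survfit(), evaluated at the same times
fit <- survfit(Surv(lung$time, event) ~ 1)
km_pkg <- summary(fit, times = event_times)$surv

all.equal(km_manual, km_pkg)
```

Because each factor is at most 1 and only event times contribute a factor, the estimate can only stay flat or step down.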
We can estimate survival probability at particular time points.
Example:
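
One way to obtain such a table is the times argument of summary(). The sketch assumes the fit is stored as `km_fit` (an illustrative name) and requests days 100, 200 and 300, matching the rows of the output.

```r
library(survival)

# Assumed 0/1 event column, survival object and Kaplan–Meier fit
lung$event <- ifelse(lung$status == 2, 1, 0)
S_lung <- Surv(lung$time, lung$event)
km_fit <- survfit(S_lung ~ 1, data = lung)

# Survival estimates at selected time points (days)
km_at_times <- summary(km_fit, times = c(100, 200, 300))
km_at_times
```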
Call: survfit(formula = S_lung ~ 1, data = lung)
time n.risk n.event survival std.err lower 95% CI upper 95% CI
100 196 31 0.864 0.0227 0.821 0.910
200 144 41 0.680 0.0311 0.622 0.744
300 92 29 0.531 0.0346 0.467 0.603
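This output provides, for each requested time point:

- n.risk: number of patients still at risk
- n.event: number of events since the previous listed time
- survival: estimated survival probability
- std.err and the 95% confidence interval of that estimate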
Example interpretation:
In the output above, \( S(200) = 0.680 \)
→ about 68% of patients survive beyond 200 days.
Median survival is the time when:
\[ S(t) = 0.5 \]
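Meaning: half of the patients have experienced the event by this time. Because the Kaplan–Meier curve is a step function, the median is taken as the first time the curve falls to 0.5 or below.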
Extract median survival:
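
The summary table of a survfit object contains the median and its confidence limits. A minimal sketch, again assuming the fit is stored under the illustrative name `km_fit`:

```r
library(survival)

# Assumed 0/1 event column, survival object and Kaplan–Meier fit
lung$event <- ifelse(lung$status == 2, 1, 0)
S_lung <- Surv(lung$time, lung$event)
km_fit <- survfit(S_lung ~ 1, data = lung)

# Full summary table (records, events, restricted mean, median, limits)
km_table <- summary(km_fit)$table
km_table

# Only the median and its 95% confidence limits
km_table[c("median", "0.95LCL", "0.95UCL")]
```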
records n.max n.start events rmean se(rmean) median 0.95LCL
228.00000 228.00000 228.00000 165.00000 376.27475 19.70779 310.00000 285.00000
0.95UCL
363.00000
Median survival is widely reported in clinical studies.
For better visualization we use survminer.
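
A plot of this kind could be produced as sketched below. This assumes the survminer package is installed. The chosen options (confidence band, risk table, axis labels) are assumptions, since the original call is not shown, and the fit uses an inline Surv() formula so that ggsurvplot() can find every variable in `data`.

```r
library(survival)
library(survminer)

# Assumed 0/1 event column (0 = censored, 1 = event)
lung$event <- ifelse(lung$status == 2, 1, 0)

# Overall Kaplan–Meier fit with the variables taken from 'data'
km_fit <- survfit(Surv(time, event) ~ 1, data = lung)

# Survival curve with confidence band and number-at-risk table
km_plot <- ggsurvplot(
  km_fit,
  data = lung,
  conf.int = TRUE,
  risk.table = TRUE,
  xlab = "Time (days)",
  ylab = "Survival probability"
)
print(km_plot)
```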
This graph shows:
The risk table shows the number of individuals still being observed.
Example interpretation (illustrative counts, not the exact values for the lung data):
| Time | At Risk |
|---|---|
| 0 | 228 |
| 200 | 120 |
| 400 | 45 |
| 600 | 10 |
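As time increases, the number at risk falls because patients either experience the event or are censored, so estimates in the right tail of the curve are less precise.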
We can estimate survival curves for groups.
Example: survival by sex
This produces two survival curves.
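
A grouped fit can be sketched by putting sex on the right-hand side of the formula. The object name `km_sex`, the recoded column `event`, and the plot styling are assumptions; base graphics are used here to keep the example free of extra packages.

```r
library(survival)

# Assumed 0/1 event column (0 = censored, 1 = event)
lung$event <- ifelse(lung$status == 2, 1, 0)

# One Kaplan–Meier curve per level of sex (1 = male, 2 = female)
km_sex <- survfit(Surv(time, event) ~ sex, data = lung)
km_sex

# Base-graphics plot of both curves; strata are ordered sex=1, sex=2
plot(km_sex, col = c("blue", "red"),
     xlab = "Time (days)", ylab = "Survival probability")
legend("topright", legend = c("sex = 1 (male)", "sex = 2 (female)"),
       col = c("blue", "red"), lty = 1)
```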
When comparing curves visually we examine whether the curves separate, whether one lies consistently above the other, and whether they cross.
However visual comparison alone is not sufficient.
A statistical test is needed.
To formally compare survival curves we use:
Log-Rank Test
This test evaluates whether survival functions differ between groups.
This will be discussed in the next lecture.
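
As a brief preview only, the survival package provides survdiff() for this test. The sketch assumes the recoded column `event` and stores the result under the illustrative name `lr_test`; interpretation is left to the next lecture.

```r
library(survival)

# Assumed 0/1 event column (0 = censored, 1 = event)
lung$event <- ifelse(lung$status == 2, 1, 0)

# Log-rank test comparing the survival curves of the two sexes
lr_test <- survdiff(Surv(time, event) ~ sex, data = lung)
lr_test
```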
- Surv() creates the survival object
- survfit() estimates Kaplan–Meier curves
- ggsurvplot() produces clear survival visualizations

Understanding survival curves is the foundation for the log-rank test covered in the next lecture.