Book:
Survival Analysis
A Self Learning Text
Third Edition
library(survival)addicts <- foreign::read.dta("data/addicts.dta")
head(addicts) id clinic status survt prison dose
1 1 1 1 428 0 50
2 2 1 1 275 1 55
3 3 1 1 262 0 55
4 4 1 1 183 0 30
5 5 1 1 259 1 65
6 6 1 1 714 0 55
Description of the data -
ID – Patient ID
CLINIC – Indicates which methadone treatment clinic the patient attended (coded 1 or 2)
STATUS – Indicates whether the patient dropped out of the clinic (coded 1) or was censored (coded 0)
SURVT – The time (in days) until the patient dropped out of the clinic or was censored
PRISON – Indicates whether the patient had a prison record (coded 1) or not (coded 0)
DOSE – A continuous variable for the patient’s maximum methadone dose (mg/day)
In Surv() function, time = follow up time for the right
censored data & event = the status indicator, normally 0=alive,
1=dead.
Y <- Surv(time = addicts$survt, event = addicts$status)
Y [1] 428 275 262 183 259 714 438 796+ 892 393 161+ 836
[13] 523 612 212 399 771 514 512 624 209 341 299 826+
[25] 262 566+ 368 302 602+ 652 293 564+ 394 755 591 787+
[37] 739 550 837 612 581+ 523 504 785 774 560 160 482
[49] 518 683 147 563 646 899 857 180 452 760 496 258
[61] 181 386 439+ 563+ 337 613+ 192 405+ 667 905+ 247 821
[73] 821 517+ 346+ 294 244 95 376 212 96 532 522 679
[85] 408+ 840+ 148+ 168 489 541+ 205 475+ 237 517 749 150
[97] 465 708 713+ 146+ 450 555+ 460 53+ 122 35 532+ 684+
[109] 769+ 591+ 769+ 609+ 932+ 932+ 587+ 26 72+ 641+ 367+ 633+
[121] 661 232 13 563+ 969+ 1052+ 944+ 881+ 190 79 884+ 170
[133] 286 358+ 326+ 769+ 161 564+ 268 611+ 322 1076+ 2+ 788+
[145] 575+ 109 730+ 790+ 456+ 231 143 86+ 1021+ 684+ 878 216
[157] 808+ 268 222+ 683+ 496+ 389 126 17 350 531+ 317+ 461+
[169] 37 167 358 49 457 127 7 29 62 150+ 223 129+
[181] 204+ 129 581 176 30 41 543+ 210+ 193 434 367 348
[193] 28+ 337+ 175+ 149 546 84 283+ 533 207 216 28+ 67
[205] 62+ 111+ 257 136 342+ 41 531+ 98+ 145 50 53+ 103+
[217] 2+ 157 75 19 35 394+ 117 175 180 314 480+ 325+
[229] 280 204 366 531+ 59 33 540 551+ 90 47
This survival object created by the Surv() function is
used as response variable for survival analyses. The plus (+) sign after
the survival times indicates censorship rather than event.
In the following model, Y~1 requests an intercept only model.
kmfit1 = survfit(Y~1)
kmfit1Call: survfit(formula = Y ~ 1)
n events median 0.95LCL 0.95UCL
238 150 504 399 560
The output contains descriptive information on the number of records, the number at risk at time 0, the number of events, and the median estimated survival time with a 95% confidence interval.
The summary function shows the survival estimates -
summary(kmfit1)Call: survfit(formula = Y ~ 1)
time n.risk n.event survival std.err lower 95% CI upper 95% CI
7 236 1 0.996 0.00423 0.9875 1.000
13 235 1 0.992 0.00597 0.9799 1.000
17 234 1 0.987 0.00729 0.9731 1.000
19 233 1 0.983 0.00840 0.9667 1.000
26 232 1 0.979 0.00937 0.9606 0.997
29 229 1 0.975 0.01026 0.9546 0.995
30 228 1 0.970 0.01107 0.9488 0.992
33 227 1 0.966 0.01182 0.9431 0.989
35 226 2 0.957 0.01317 0.9320 0.984
37 224 1 0.953 0.01379 0.9265 0.981
41 223 2 0.945 0.01493 0.9158 0.974
47 221 1 0.940 0.01546 0.9105 0.971
49 220 1 0.936 0.01597 0.9053 0.968
50 219 1 0.932 0.01646 0.9001 0.965
59 216 1 0.927 0.01694 0.8949 0.961
62 215 1 0.923 0.01740 0.8897 0.958
67 213 1 0.919 0.01785 0.8845 0.954
75 211 1 0.914 0.01829 0.8793 0.951
79 210 1 0.910 0.01871 0.8742 0.948
84 209 1 0.906 0.01913 0.8691 0.944
90 207 1 0.901 0.01953 0.8639 0.940
95 206 1 0.897 0.01992 0.8588 0.937
96 205 1 0.893 0.02029 0.8537 0.933
109 202 1 0.888 0.02067 0.8486 0.930
117 200 1 0.884 0.02104 0.8435 0.926
122 199 1 0.879 0.02140 0.8384 0.922
126 198 1 0.875 0.02174 0.8333 0.919
127 197 1 0.870 0.02208 0.8282 0.915
129 196 1 0.866 0.02241 0.8232 0.911
136 194 1 0.862 0.02274 0.8181 0.907
143 193 1 0.857 0.02305 0.8131 0.903
145 192 1 0.853 0.02336 0.8080 0.900
147 190 1 0.848 0.02366 0.8030 0.896
149 188 1 0.844 0.02396 0.7979 0.892
150 187 1 0.839 0.02426 0.7929 0.888
157 185 1 0.835 0.02455 0.7878 0.884
160 184 1 0.830 0.02483 0.7828 0.880
161 183 1 0.826 0.02510 0.7777 0.876
167 181 1 0.821 0.02538 0.7727 0.872
168 180 1 0.816 0.02564 0.7676 0.868
170 179 1 0.812 0.02590 0.7626 0.864
175 178 1 0.807 0.02615 0.7576 0.860
176 176 1 0.803 0.02640 0.7526 0.856
180 175 2 0.794 0.02689 0.7425 0.848
181 173 1 0.789 0.02712 0.7375 0.844
183 172 1 0.784 0.02735 0.7325 0.840
190 171 1 0.780 0.02757 0.7275 0.836
192 170 1 0.775 0.02779 0.7226 0.832
193 169 1 0.771 0.02800 0.7176 0.827
204 168 1 0.766 0.02821 0.7127 0.823
205 166 1 0.761 0.02841 0.7077 0.819
207 165 1 0.757 0.02861 0.7027 0.815
209 164 1 0.752 0.02881 0.6978 0.811
212 162 2 0.743 0.02919 0.6878 0.802
216 160 2 0.734 0.02955 0.6779 0.794
223 157 1 0.729 0.02973 0.6729 0.790
231 156 1 0.724 0.02991 0.6679 0.785
232 155 1 0.720 0.03008 0.6630 0.781
237 154 1 0.715 0.03024 0.6580 0.777
244 153 1 0.710 0.03040 0.6531 0.772
247 152 1 0.706 0.03056 0.6481 0.768
257 151 1 0.701 0.03071 0.6432 0.764
258 150 1 0.696 0.03086 0.6383 0.759
259 149 1 0.692 0.03101 0.6333 0.755
262 148 2 0.682 0.03128 0.6235 0.746
268 146 2 0.673 0.03154 0.6138 0.738
275 144 1 0.668 0.03167 0.6089 0.733
280 143 1 0.663 0.03179 0.6040 0.729
286 141 1 0.659 0.03191 0.5991 0.724
293 140 1 0.654 0.03203 0.5942 0.720
294 139 1 0.649 0.03214 0.5893 0.716
299 138 1 0.645 0.03225 0.5844 0.711
302 137 1 0.640 0.03236 0.5796 0.707
314 136 1 0.635 0.03246 0.5747 0.702
322 134 1 0.631 0.03256 0.5698 0.698
337 131 1 0.626 0.03267 0.5648 0.693
341 129 1 0.621 0.03277 0.5598 0.689
348 126 1 0.616 0.03288 0.5547 0.684
350 125 1 0.611 0.03298 0.5496 0.679
358 124 1 0.606 0.03308 0.5446 0.675
366 122 1 0.601 0.03318 0.5395 0.670
367 121 1 0.596 0.03328 0.5343 0.665
368 119 1 0.591 0.03338 0.5292 0.660
376 118 1 0.586 0.03347 0.5241 0.656
386 117 1 0.581 0.03355 0.5189 0.651
389 116 1 0.576 0.03364 0.5138 0.646
393 115 1 0.571 0.03371 0.5087 0.641
394 114 1 0.566 0.03379 0.5036 0.636
399 112 1 0.561 0.03386 0.4984 0.631
428 109 1 0.556 0.03394 0.4932 0.627
434 108 1 0.551 0.03401 0.4879 0.622
438 107 1 0.546 0.03408 0.4827 0.617
450 105 1 0.540 0.03415 0.4774 0.612
452 104 1 0.535 0.03422 0.4722 0.607
457 102 1 0.530 0.03428 0.4668 0.602
460 101 1 0.525 0.03434 0.4615 0.597
465 99 1 0.519 0.03440 0.4562 0.591
482 96 1 0.514 0.03447 0.4507 0.586
489 95 1 0.509 0.03453 0.4452 0.581
496 94 1 0.503 0.03458 0.4398 0.576
504 92 1 0.498 0.03463 0.4342 0.570
512 91 1 0.492 0.03468 0.4287 0.565
514 90 1 0.487 0.03473 0.4232 0.560
517 89 1 0.481 0.03476 0.4178 0.554
518 87 1 0.476 0.03480 0.4122 0.549
522 86 1 0.470 0.03483 0.4067 0.544
523 85 2 0.459 0.03488 0.3956 0.533
532 80 1 0.453 0.03491 0.3899 0.527
533 78 1 0.448 0.03495 0.3841 0.522
540 77 1 0.442 0.03497 0.3783 0.516
546 74 1 0.436 0.03501 0.3723 0.510
550 73 1 0.430 0.03503 0.3664 0.504
560 70 1 0.424 0.03507 0.3603 0.498
563 69 1 0.418 0.03509 0.3542 0.492
581 62 1 0.411 0.03517 0.3474 0.486
591 59 1 0.404 0.03525 0.3404 0.479
612 54 2 0.389 0.03550 0.3252 0.465
624 51 1 0.381 0.03561 0.3175 0.458
646 48 1 0.373 0.03574 0.3095 0.450
652 47 1 0.365 0.03586 0.3015 0.443
661 46 1 0.357 0.03595 0.2935 0.435
667 45 1 0.350 0.03601 0.2856 0.428
679 44 1 0.342 0.03606 0.2777 0.420
683 43 1 0.334 0.03609 0.2699 0.412
708 39 1 0.325 0.03616 0.2614 0.404
714 37 1 0.316 0.03623 0.2527 0.396
739 35 1 0.307 0.03631 0.2437 0.387
749 34 1 0.298 0.03635 0.2348 0.379
755 33 1 0.289 0.03635 0.2260 0.370
760 32 1 0.280 0.03632 0.2173 0.361
771 28 1 0.270 0.03638 0.2075 0.352
774 27 1 0.260 0.03638 0.1978 0.342
785 26 1 0.250 0.03633 0.1882 0.332
821 20 2 0.225 0.03675 0.1635 0.310
836 17 1 0.212 0.03690 0.1506 0.298
837 16 1 0.199 0.03689 0.1380 0.286
857 14 1 0.184 0.03688 0.1246 0.273
878 13 1 0.170 0.03667 0.1116 0.260
892 10 1 0.153 0.03675 0.0958 0.245
899 9 1 0.136 0.03639 0.0807 0.230
plot(kmfit1)For specified survival time, the times argument can be
used. For example, to produce, survival estimate for for survival time
30 -
summary(kmfit1, times = 30)Call: survfit(formula = Y ~ 1)
time n.risk n.event survival std.err lower 95% CI upper 95% CI
30 228 7 0.97 0.0111 0.949 0.992
To get survival estimates at different survival times, say in interval of 100 -
summary(kmfit1,times=seq(0,max(addicts$survt),by=100))Call: survfit(formula = Y ~ 1)
time n.risk n.event survival std.err lower 95% CI upper 95% CI
0 238 0 1.000 0.0000 1.0000 1.000
100 203 25 0.893 0.0203 0.8537 0.933
200 168 27 0.771 0.0280 0.7176 0.827
300 137 27 0.645 0.0323 0.5844 0.711
400 111 17 0.561 0.0339 0.4984 0.631
500 92 11 0.503 0.0346 0.4398 0.576
600 57 17 0.404 0.0353 0.3404 0.479
700 39 9 0.334 0.0361 0.2699 0.412
800 21 9 0.250 0.0363 0.1882 0.332
900 8 8 0.136 0.0364 0.0807 0.230
1000 3 0 0.136 0.0364 0.0807 0.230
To stratify by the variable CLINIC -
kmfit2 = survfit(Y~addicts$clinic)
kmfit2Call: survfit(formula = Y ~ addicts$clinic)
n events median 0.95LCL 0.95UCL
addicts$clinic=1 163 122 428 348 514
addicts$clinic=2 75 28 NA 661 NA
To get survival estimates at specified times -
summary(kmfit2,times=seq(0,max(addicts$survt),by=100))Call: survfit(formula = Y ~ addicts$clinic)
addicts$clinic=1
time n.risk n.event survival std.err lower 95% CI upper 95% CI
0 163 0 1.0000 0.0000 1.00000 1.000
100 137 20 0.8746 0.0262 0.82467 0.928
200 110 20 0.7420 0.0353 0.67601 0.814
300 87 20 0.6046 0.0399 0.53120 0.688
400 68 14 0.5025 0.0415 0.42741 0.591
500 53 9 0.4319 0.0418 0.35719 0.522
600 30 16 0.2951 0.0403 0.22570 0.386
700 20 8 0.2113 0.0383 0.14818 0.301
800 10 8 0.1268 0.0326 0.07660 0.210
900 1 7 0.0181 0.0172 0.00283 0.116
addicts$clinic=2
time n.risk n.event survival std.err lower 95% CI upper 95% CI
0 75 0 1.000 0.0000 1.000 1.000
100 66 5 0.932 0.0294 0.876 0.991
200 58 7 0.832 0.0442 0.750 0.924
300 50 7 0.730 0.0530 0.633 0.842
400 43 3 0.685 0.0558 0.584 0.804
500 39 2 0.653 0.0577 0.549 0.776
600 27 1 0.634 0.0590 0.528 0.761
700 19 1 0.606 0.0625 0.495 0.742
800 11 1 0.575 0.0669 0.457 0.722
900 7 1 0.517 0.0812 0.380 0.703
1000 3 0 0.517 0.0812 0.380 0.703
Notice for CLINIC=1, survival times stopped at 900 rather than 1000 because no subject was at risk on day 1000.
Plotting the fitted survival function -
plot(kmfit2, frame.plot=FALSE, lwd = 2,
lty = c("solid","dashed"),
col=c("#DC7633","#2471A3"),
xlab = "Survival Time in Days",
ylab = "Survival Probabilities")
legend("topright", c("Clinic 1", "Clinic 2"),
lty = c("solid","dashed"), col=c("#DC7633","#2471A3"))The plot indicates that subjects from CLINIC=2 have a higher rate of survival than subjects from CLINIC=1.
To perform log rank test on the variable clinic -
survdiff(Y~clinic, data = addicts)Call:
survdiff(formula = Y ~ clinic, data = addicts)
N Observed Expected (O-E)^2/E (O-E)^2/V
clinic=1 163 122 90.9 10.6 27.9
clinic=2 75 28 59.1 16.4 27.9
Chisq= 27.9 on 1 degrees of freedom, p= 1e-07
The test shows there is a significant effect of clinic on survival.