library(tidyverse)
library(survival)
library(survminer)Survival Analysis
Introduction
In this exercise, we will be conducting a survival analysis of three drugs that may slow the disease of a hypothetical disease.
The packages we will be using are, as always, tidyverse for improved code legibility, survival for survival analysis, and survminer for easy survival plots.
file <- "C:/Users/Justin Pons/Spring 2024/DA 6213/3/data3.txt"
df <- read_fwf(file, col_positions = fwf_empty(file, col_names = c("Group", "Time", "Event")))Analysis
We take a quick look at our data using the head and str functions. We can see that there are three columns. Group indicates the drug group 1, 2, or 3. Event indicates whether the subject experienced an event, 1, or was censored from the study, 0. The Time column indicates when the censor or event occured.
head(df)# A tibble: 6 × 3
Group Time Event
<dbl> <dbl> <dbl>
1 1 681 0
2 1 602 0
3 1 996 0
4 1 1162 0
5 1 833 0
6 1 477 0
Group and Event are initially numeric. Ordinarily, these would be converted into factor variables, however the survival package requires these to remain numeric.
str(df)spc_tbl_ [137 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ Group: num [1:137] 1 1 1 1 1 1 1 1 1 1 ...
$ Time : num [1:137] 681 602 996 1162 833 ...
$ Event: num [1:137] 0 0 0 0 0 0 0 0 0 0 ...
- attr(*, "spec")=
.. cols(
.. Group = col_double(),
.. Time = col_double(),
.. Event = col_double()
.. )
- attr(*, "problems")=<externalptr>
The Surv function creates a survival object and survfit creates survival curves for the three test groups.
surv1 <- Surv(time = df$Time, event = df$Event)
surv1 [1] 681+ 602+ 996+ 1162+ 833+ 477+ 630+ 596+ 226+ 699+ 811+ 530+
[13] 482+ 367+ 118 83 76 104 109 72 87 162 94 30
[25] 26 22 49 74 122 86 66 92 109 255 1 107
[37] 110 232 2569+ 2506+ 2409+ 2218+ 1857+ 1829+ 1562+ 1470+ 1363+ 1030+
[49] 1860+ 1258+ 2246+ 1870+ 1799+ 1709+ 1674+ 1568+ 1527+ 1324+ 1957+ 1932+
[61] 1847+ 1848+ 1850+ 1843+ 1535+ 1447+ 1384+ 914 2204 1063 481 605
[73] 641 390 288 421 1379 1748 486 448 272 1074 1381 1410
[85] 1353 1480 435 248 1704 1411 219 606 2640+ 2430+ 2252+ 2140+
[97] 2133+ 1738+ 2631+ 2524+ 1845+ 1936+ 1845+ 422 162 84 100 212
[109] 47 242 456 268 318 732 467 947 390 183 105 115
[121] 164 693 120 80 677 64 168 874 616 157 625 48
[133] 273 163 376 113 363
The survfit function from survminer allows a quick survival curve plot to be generated with ggplot2. Visually, we can observe that Group 2 looks markedly different from the other two groups. Their survival rate is greater. We will confirm this with inferential tests.
fit<- survfit(Surv(Time, Event) ~ Group, data = df)
ggsurvplot(fit, data = df)summary(fit)Call: survfit(formula = Surv(Time, Event) ~ Group, data = df)
Group=1
time n.risk n.event survival std.err lower 95% CI upper 95% CI
1 38 1 0.974 0.0260 0.924 1.000
22 37 1 0.947 0.0362 0.879 1.000
26 36 1 0.921 0.0437 0.839 1.000
30 35 1 0.895 0.0498 0.802 0.998
49 34 1 0.868 0.0548 0.767 0.983
66 33 1 0.842 0.0592 0.734 0.966
72 32 1 0.816 0.0629 0.701 0.949
74 31 1 0.789 0.0661 0.670 0.930
76 30 1 0.763 0.0690 0.639 0.911
83 29 1 0.737 0.0714 0.609 0.891
86 28 1 0.711 0.0736 0.580 0.870
87 27 1 0.684 0.0754 0.551 0.849
92 26 1 0.658 0.0770 0.523 0.827
94 25 1 0.632 0.0783 0.495 0.805
104 24 1 0.605 0.0793 0.468 0.782
107 23 1 0.579 0.0801 0.441 0.759
109 22 2 0.526 0.0810 0.389 0.712
110 20 1 0.500 0.0811 0.364 0.687
118 19 1 0.474 0.0810 0.339 0.662
122 18 1 0.447 0.0807 0.314 0.637
162 17 1 0.421 0.0801 0.290 0.611
232 15 1 0.393 0.0795 0.264 0.584
255 14 1 0.365 0.0786 0.239 0.557
Group=2
time n.risk n.event survival std.err lower 95% CI upper 95% CI
219 54 1 0.981 0.0183 0.946 1.000
248 53 1 0.963 0.0257 0.914 1.000
272 52 1 0.944 0.0312 0.885 1.000
288 51 1 0.926 0.0356 0.859 0.998
390 50 1 0.907 0.0394 0.833 0.988
421 49 1 0.889 0.0428 0.809 0.977
435 48 1 0.870 0.0457 0.785 0.965
448 47 1 0.852 0.0483 0.762 0.952
481 46 1 0.833 0.0507 0.740 0.939
486 45 1 0.815 0.0529 0.718 0.925
605 44 1 0.796 0.0548 0.696 0.911
606 43 1 0.778 0.0566 0.674 0.897
641 42 1 0.759 0.0582 0.653 0.882
914 41 1 0.741 0.0596 0.633 0.867
1063 39 1 0.722 0.0611 0.611 0.852
1074 38 1 0.703 0.0623 0.591 0.836
1353 35 1 0.683 0.0637 0.569 0.820
1379 33 1 0.662 0.0650 0.546 0.803
1381 32 1 0.641 0.0662 0.524 0.785
1410 30 1 0.620 0.0674 0.501 0.767
1411 29 1 0.599 0.0684 0.479 0.749
1480 26 1 0.576 0.0695 0.454 0.729
1704 20 1 0.547 0.0717 0.423 0.707
1748 18 1 0.516 0.0739 0.390 0.684
2204 6 1 0.430 0.0998 0.273 0.678
Group=3
time n.risk n.event survival std.err lower 95% CI upper 95% CI
47 45 1 0.978 0.0220 0.936 1.000
48 44 1 0.956 0.0307 0.897 1.000
64 43 1 0.933 0.0372 0.863 1.000
80 42 1 0.911 0.0424 0.832 0.998
84 41 1 0.889 0.0468 0.802 0.986
100 40 1 0.867 0.0507 0.773 0.972
105 39 1 0.844 0.0540 0.745 0.957
113 38 1 0.822 0.0570 0.718 0.942
115 37 1 0.800 0.0596 0.691 0.926
120 36 1 0.778 0.0620 0.665 0.909
157 35 1 0.756 0.0641 0.640 0.892
162 34 1 0.733 0.0659 0.615 0.875
163 33 1 0.711 0.0676 0.590 0.857
164 32 1 0.689 0.0690 0.566 0.838
168 31 1 0.667 0.0703 0.542 0.820
183 30 1 0.644 0.0714 0.519 0.801
212 29 1 0.622 0.0723 0.496 0.781
242 28 1 0.600 0.0730 0.473 0.762
268 27 1 0.578 0.0736 0.450 0.742
273 26 1 0.556 0.0741 0.428 0.721
318 25 1 0.533 0.0744 0.406 0.701
363 24 1 0.511 0.0745 0.384 0.680
376 23 1 0.489 0.0745 0.363 0.659
390 22 1 0.467 0.0744 0.341 0.638
422 21 1 0.444 0.0741 0.321 0.616
456 20 1 0.422 0.0736 0.300 0.594
467 19 1 0.400 0.0730 0.280 0.572
616 18 1 0.378 0.0723 0.260 0.550
625 17 1 0.356 0.0714 0.240 0.527
677 16 1 0.333 0.0703 0.221 0.504
693 15 1 0.311 0.0690 0.201 0.481
732 14 1 0.289 0.0676 0.183 0.457
874 13 1 0.267 0.0659 0.164 0.433
947 12 1 0.244 0.0641 0.146 0.409
With a significant p-value less than .05, we confirm there is at least one group that is significantly different than at least one other group.
survdiff(Surv(Time, Event) ~ Group, data = df)Call:
survdiff(formula = Surv(Time, Event) ~ Group, data = df)
N Observed Expected (O-E)^2/E (O-E)^2/V
Group=1 38 24 12.3 11.07 13.66
Group=2 54 25 46.0 9.58 22.50
Group=3 45 34 24.7 3.51 5.04
Chisq= 25.7 on 2 degrees of freedom, p= 3e-06
We separate the observations by group and test them in this pairwise test. We observe that group 2 is significantly different than groups 1 and 3.
oneTwo <- df |>
filter(Group == 1 | Group == 2)
oneThree <- df |>
filter(Group == 1 | Group == 3)
twoThree <- df |>
filter(Group == 2 | Group == 3)pairwise_survdiff(Surv(Time, Event) ~ Group, data = df)
Pairwise comparisons using Log-Rank test
data: df and Group
1 2
2 9.8e-07 -
3 0.38 6.1e-05
P value adjustment method: BH