Survival Analysis

Author

Justin Pons

Introduction

In this exercise, we will be conducting a survival analysis of three drugs that may slow the disease of a hypothetical disease.

The packages we will be using are, as always, tidyverse for improved code legibility, survival for survival analysis, and survminer for easy survival plots.

library(tidyverse)
library(survival)
library(survminer)
file <- "C:/Users/Justin Pons/Spring 2024/DA 6213/3/data3.txt"
df <- read_fwf(file, col_positions = fwf_empty(file, col_names = c("Group", "Time", "Event")))

Analysis

We take a quick look at our data using the head and str functions. We can see that there are three columns. Group indicates the drug group 1, 2, or 3. Event indicates whether the subject experienced an event, 1, or was censored from the study, 0. The Time column indicates when the censor or event occured.

head(df)
# A tibble: 6 × 3
  Group  Time Event
  <dbl> <dbl> <dbl>
1     1   681     0
2     1   602     0
3     1   996     0
4     1  1162     0
5     1   833     0
6     1   477     0

Group and Event are initially numeric. Ordinarily, these would be converted into factor variables, however the survival package requires these to remain numeric.

str(df)
spc_tbl_ [137 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ Group: num [1:137] 1 1 1 1 1 1 1 1 1 1 ...
 $ Time : num [1:137] 681 602 996 1162 833 ...
 $ Event: num [1:137] 0 0 0 0 0 0 0 0 0 0 ...
 - attr(*, "spec")=
  .. cols(
  ..   Group = col_double(),
  ..   Time = col_double(),
  ..   Event = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 

The Surv function creates a survival object and survfit creates survival curves for the three test groups.

surv1 <- Surv(time = df$Time, event = df$Event)
surv1
  [1]  681+  602+  996+ 1162+  833+  477+  630+  596+  226+  699+  811+  530+
 [13]  482+  367+  118    83    76   104   109    72    87   162    94    30 
 [25]   26    22    49    74   122    86    66    92   109   255     1   107 
 [37]  110   232  2569+ 2506+ 2409+ 2218+ 1857+ 1829+ 1562+ 1470+ 1363+ 1030+
 [49] 1860+ 1258+ 2246+ 1870+ 1799+ 1709+ 1674+ 1568+ 1527+ 1324+ 1957+ 1932+
 [61] 1847+ 1848+ 1850+ 1843+ 1535+ 1447+ 1384+  914  2204  1063   481   605 
 [73]  641   390   288   421  1379  1748   486   448   272  1074  1381  1410 
 [85] 1353  1480   435   248  1704  1411   219   606  2640+ 2430+ 2252+ 2140+
 [97] 2133+ 1738+ 2631+ 2524+ 1845+ 1936+ 1845+  422   162    84   100   212 
[109]   47   242   456   268   318   732   467   947   390   183   105   115 
[121]  164   693   120    80   677    64   168   874   616   157   625    48 
[133]  273   163   376   113   363 

The survfit function from survminer allows a quick survival curve plot to be generated with ggplot2. Visually, we can observe that Group 2 looks markedly different from the other two groups. Their survival rate is greater. We will confirm this with inferential tests.

fit<- survfit(Surv(Time, Event) ~ Group, data = df)
ggsurvplot(fit, data = df)

summary(fit)
Call: survfit(formula = Surv(Time, Event) ~ Group, data = df)

                Group=1 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    1     38       1    0.974  0.0260        0.924        1.000
   22     37       1    0.947  0.0362        0.879        1.000
   26     36       1    0.921  0.0437        0.839        1.000
   30     35       1    0.895  0.0498        0.802        0.998
   49     34       1    0.868  0.0548        0.767        0.983
   66     33       1    0.842  0.0592        0.734        0.966
   72     32       1    0.816  0.0629        0.701        0.949
   74     31       1    0.789  0.0661        0.670        0.930
   76     30       1    0.763  0.0690        0.639        0.911
   83     29       1    0.737  0.0714        0.609        0.891
   86     28       1    0.711  0.0736        0.580        0.870
   87     27       1    0.684  0.0754        0.551        0.849
   92     26       1    0.658  0.0770        0.523        0.827
   94     25       1    0.632  0.0783        0.495        0.805
  104     24       1    0.605  0.0793        0.468        0.782
  107     23       1    0.579  0.0801        0.441        0.759
  109     22       2    0.526  0.0810        0.389        0.712
  110     20       1    0.500  0.0811        0.364        0.687
  118     19       1    0.474  0.0810        0.339        0.662
  122     18       1    0.447  0.0807        0.314        0.637
  162     17       1    0.421  0.0801        0.290        0.611
  232     15       1    0.393  0.0795        0.264        0.584
  255     14       1    0.365  0.0786        0.239        0.557

                Group=2 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
  219     54       1    0.981  0.0183        0.946        1.000
  248     53       1    0.963  0.0257        0.914        1.000
  272     52       1    0.944  0.0312        0.885        1.000
  288     51       1    0.926  0.0356        0.859        0.998
  390     50       1    0.907  0.0394        0.833        0.988
  421     49       1    0.889  0.0428        0.809        0.977
  435     48       1    0.870  0.0457        0.785        0.965
  448     47       1    0.852  0.0483        0.762        0.952
  481     46       1    0.833  0.0507        0.740        0.939
  486     45       1    0.815  0.0529        0.718        0.925
  605     44       1    0.796  0.0548        0.696        0.911
  606     43       1    0.778  0.0566        0.674        0.897
  641     42       1    0.759  0.0582        0.653        0.882
  914     41       1    0.741  0.0596        0.633        0.867
 1063     39       1    0.722  0.0611        0.611        0.852
 1074     38       1    0.703  0.0623        0.591        0.836
 1353     35       1    0.683  0.0637        0.569        0.820
 1379     33       1    0.662  0.0650        0.546        0.803
 1381     32       1    0.641  0.0662        0.524        0.785
 1410     30       1    0.620  0.0674        0.501        0.767
 1411     29       1    0.599  0.0684        0.479        0.749
 1480     26       1    0.576  0.0695        0.454        0.729
 1704     20       1    0.547  0.0717        0.423        0.707
 1748     18       1    0.516  0.0739        0.390        0.684
 2204      6       1    0.430  0.0998        0.273        0.678

                Group=3 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
   47     45       1    0.978  0.0220        0.936        1.000
   48     44       1    0.956  0.0307        0.897        1.000
   64     43       1    0.933  0.0372        0.863        1.000
   80     42       1    0.911  0.0424        0.832        0.998
   84     41       1    0.889  0.0468        0.802        0.986
  100     40       1    0.867  0.0507        0.773        0.972
  105     39       1    0.844  0.0540        0.745        0.957
  113     38       1    0.822  0.0570        0.718        0.942
  115     37       1    0.800  0.0596        0.691        0.926
  120     36       1    0.778  0.0620        0.665        0.909
  157     35       1    0.756  0.0641        0.640        0.892
  162     34       1    0.733  0.0659        0.615        0.875
  163     33       1    0.711  0.0676        0.590        0.857
  164     32       1    0.689  0.0690        0.566        0.838
  168     31       1    0.667  0.0703        0.542        0.820
  183     30       1    0.644  0.0714        0.519        0.801
  212     29       1    0.622  0.0723        0.496        0.781
  242     28       1    0.600  0.0730        0.473        0.762
  268     27       1    0.578  0.0736        0.450        0.742
  273     26       1    0.556  0.0741        0.428        0.721
  318     25       1    0.533  0.0744        0.406        0.701
  363     24       1    0.511  0.0745        0.384        0.680
  376     23       1    0.489  0.0745        0.363        0.659
  390     22       1    0.467  0.0744        0.341        0.638
  422     21       1    0.444  0.0741        0.321        0.616
  456     20       1    0.422  0.0736        0.300        0.594
  467     19       1    0.400  0.0730        0.280        0.572
  616     18       1    0.378  0.0723        0.260        0.550
  625     17       1    0.356  0.0714        0.240        0.527
  677     16       1    0.333  0.0703        0.221        0.504
  693     15       1    0.311  0.0690        0.201        0.481
  732     14       1    0.289  0.0676        0.183        0.457
  874     13       1    0.267  0.0659        0.164        0.433
  947     12       1    0.244  0.0641        0.146        0.409

With a significant p-value less than .05, we confirm there is at least one group that is significantly different than at least one other group.

survdiff(Surv(Time, Event) ~ Group, data = df)
Call:
survdiff(formula = Surv(Time, Event) ~ Group, data = df)

         N Observed Expected (O-E)^2/E (O-E)^2/V
Group=1 38       24     12.3     11.07     13.66
Group=2 54       25     46.0      9.58     22.50
Group=3 45       34     24.7      3.51      5.04

 Chisq= 25.7  on 2 degrees of freedom, p= 3e-06 

We separate the observations by group and test them in this pairwise test. We observe that group 2 is significantly different than groups 1 and 3.

oneTwo <- df |> 
  filter(Group == 1 | Group == 2)
oneThree <- df |> 
  filter(Group == 1 | Group == 3)
twoThree <- df |> 
  filter(Group == 2 | Group == 3)
pairwise_survdiff(Surv(Time, Event) ~ Group, data = df)

    Pairwise comparisons using Log-Rank test 

data:  df and Group 

  1       2      
2 9.8e-07 -      
3 0.38    6.1e-05

P value adjustment method: BH