Survival Analysis using Kaplan–Meier estimate

Survival analysis is a set of statistical methods for analyzing the time until an event occurs. Its two key functions are the survival function and the hazard function. The survival function, conventionally denoted by S, gives the probability that the event has not yet occurred by time t, that is, S(t) = P(T > t), where T is the event time.

A popular estimate of the survival function S(t) is the Kaplan–Meier estimate.
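For reference, the usual product-limit form of the estimate can be written as below. The symbols are notation introduced here, not taken from the text: t_i are the distinct observed event times, d_i is the number of events at t_i (the n.event column in the summary output later on), and n_i is the number of subjects still at risk just before t_i (the n.risk column).

```latex
\hat{S}(t) \;=\; \prod_{i:\, t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)
```

Censored observations contribute no factor of their own; they only reduce n_i at later event times.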

Loading the packages required

library(OIsurv)
## Loading required package: survival
## Loading required package: KMsurv
library(ggplot2)
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(ggfortify)

Setting up the credentials to publish the graphs on plotly

Sys.setenv("plotly_username"="gupta.ruch")
Sys.setenv("plotly_api_key"="kuaxzjdqnl")
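As a small sketch, and assuming you would rather not keep the key in the script itself, the same two variables could be defined outside the script (for example in a ~/.Renviron file) and only checked here. The variable names are the ones used above; the check itself is an illustration, not part of the plotly package.

```r
# Names of the environment variables used above
required <- c("plotly_username", "plotly_api_key")

# Sys.getenv() returns "" for variables that are not set
values <- Sys.getenv(required)
missing_vars <- required[values == ""]

if (length(missing_vars) > 0) {
  message("Not set: ", paste(missing_vars, collapse = ", "))
} else {
  message("plotly credentials found in the environment")
}
```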

Loading the dataset tongue

data(tongue)

Printing the summary of the dataset

summary(tongue)
##       type           time            delta       
##  Min.   :1.00   Min.   :  1.00   Min.   :0.0000  
##  1st Qu.:1.00   1st Qu.: 23.75   1st Qu.:0.0000  
##  Median :1.00   Median : 69.50   Median :1.0000  
##  Mean   :1.35   Mean   : 73.83   Mean   :0.6625  
##  3rd Qu.:2.00   3rd Qu.:101.75   3rd Qu.:1.0000  
##  Max.   :2.00   Max.   :400.00   Max.   :1.0000

Attaching the dataset to the R search path. R will then search the dataset when evaluating a variable, so the columns of the dataset can be accessed simply by giving their names.

Creating a survival object with the Surv() function. A survival object is usually used as the response variable in a model formula. For right-censored data, only two arguments are needed in Surv(): a vector of times and a vector indicating which times are observed events (1) and which are censored (0). Only the rows with type == 1 are used here. In the printed object, censored times are marked with a trailing +.

attach(tongue)
tongue.surv <- Surv(time[type==1], delta[type==1])
tongue.surv
##  [1]   1    3    3    4   10   13   13   16   16   24   26   27   28   30 
## [15]  30   32   41   51   65   67   70   72   73   77   91   93   96  100 
## [29] 104  157  167   61+  74+  79+  80+  81+  87+  87+  88+  89+  93+  97+
## [43] 101+ 104+ 108+ 109+ 120+ 131+ 150+ 231+ 240+ 400+
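To make the two arguments of Surv() concrete, here is a minimal sketch on a made-up vector of five times. The values and the names toy_time, toy_event and toy_surv are hypothetical and have nothing to do with the tongue data.

```r
library(survival)

# Hypothetical follow-up times and event indicators (1 = event, 0 = censored)
toy_time  <- c(5, 8, 8, 12, 20)
toy_event <- c(1, 0, 1, 1, 0)

# Right-censored survival object: times plus status
toy_surv <- Surv(toy_time, toy_event)

# Censored times are printed with a trailing "+", as in the output above
print(toy_surv)

# The object is a two-column matrix with columns "time" and "status"
print(colnames(toy_surv))
```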

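Kaplan-Meier estimate and pointwise bounds: The Kaplan-Meier estimate is fit in R using the function survfit(). The simplest fit takes as input a formula with a survival object on the left-hand side and only an intercept (~ 1) on the right-hand side. Printing the fit shows the number of subjects, the number of events, and the median survival time with its 95% confidence limits.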

surv.fit <- survfit(tongue.surv~1)
surv.fit
## Call: survfit(formula = tongue.surv ~ 1)
## 
##       n  events  median 0.95LCL 0.95UCL 
##      52      31      93      67      NA
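The median and its limits in the printed fit can be read off the estimated curve. The sketch below assumes the usual rule that the median is the first time at which the curve drops to 0.5 or below, applied to the estimate and to each confidence band. The rows are copied from the summary(surv.fit) output further down, and first_crossing() is a helper written for this illustration, not a function of the survival package.

```r
# A few rows copied from the summary(surv.fit) table
km <- data.frame(
  time  = c(65, 67, 91, 93, 167),
  surv  = c(0.634, 0.614, 0.506, 0.478, 0.229),
  lower = c(0.516, 0.495, 0.384, 0.355, 0.101),
  upper = c(0.780, 0.762, 0.667, 0.644, 0.518)
)

# First time at which a curve is at or below the given level (NA if never)
first_crossing <- function(time, curve, level = 0.5) {
  hit <- which(curve <= level)
  if (length(hit) == 0) NA_real_ else time[hit[1]]
}

median_est <- first_crossing(km$time, km$surv)   # 93
median_lcl <- first_crossing(km$time, km$lower)  # 67
median_ucl <- first_crossing(km$time, km$upper)  # NA: upper band never reaches 0.5

print(c(median = median_est, LCL = median_lcl, UCL = median_ucl))
```

The upper limit is NA because the upper confidence band is still 0.518 at the last event time, so it never falls to 0.5.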

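Printing the summary of the fitted survival function. summary() returns a list, and printing it shows, for each event time, the number at risk, the number of events, the survival estimate, its standard error, and the 95% confidence bounds.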

summary(surv.fit)
## Call: survfit(formula = tongue.surv ~ 1)
## 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     1     52       1    0.981  0.0190        0.944        1.000
##     3     51       2    0.942  0.0323        0.881        1.000
##     4     49       1    0.923  0.0370        0.853        0.998
##    10     48       1    0.904  0.0409        0.827        0.988
##    13     47       2    0.865  0.0473        0.777        0.963
##    16     45       2    0.827  0.0525        0.730        0.936
##    24     43       1    0.808  0.0547        0.707        0.922
##    26     42       1    0.788  0.0566        0.685        0.908
##    27     41       1    0.769  0.0584        0.663        0.893
##    28     40       1    0.750  0.0600        0.641        0.877
##    30     39       2    0.712  0.0628        0.598        0.846
##    32     37       1    0.692  0.0640        0.578        0.830
##    41     36       1    0.673  0.0651        0.557        0.813
##    51     35       1    0.654  0.0660        0.537        0.797
##    65     33       1    0.634  0.0669        0.516        0.780
##    67     32       1    0.614  0.0677        0.495        0.762
##    70     31       1    0.594  0.0683        0.475        0.745
##    72     30       1    0.575  0.0689        0.454        0.727
##    73     29       1    0.555  0.0693        0.434        0.709
##    77     27       1    0.534  0.0697        0.414        0.690
##    91     19       1    0.506  0.0715        0.384        0.667
##    93     18       1    0.478  0.0728        0.355        0.644
##    96     16       1    0.448  0.0741        0.324        0.620
##   100     14       1    0.416  0.0754        0.292        0.594
##   104     12       1    0.381  0.0767        0.257        0.566
##   157      5       1    0.305  0.0918        0.169        0.550
##   167      4       1    0.229  0.0954        0.101        0.518
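The first rows of this table can be reproduced by hand from the n.risk and n.event columns. The sketch below assumes the standard error is Greenwood's formula and that the confidence bounds are computed on the log scale (the default in survfit(), as far as I know); the variable names are chosen here for illustration.

```r
# First four rows of the table above
n_risk  <- c(52, 51, 49, 48)
n_event <- c(1, 2, 1, 1)

# Kaplan-Meier estimate: running product of (1 - d_i / n_i)
km_surv <- cumprod(1 - n_event / n_risk)

# Greenwood's formula: se on the log scale, then scaled by the estimate
se_log <- sqrt(cumsum(n_event / (n_risk * (n_risk - n_event))))
km_se  <- km_surv * se_log

# Log-scale 95% bounds, with the upper bound capped at 1
z     <- qnorm(0.975)
lower <- km_surv * exp(-z * se_log)
upper <- pmin(1, km_surv * exp(z * se_log))

print(round(km_surv, 3))  # survival column: 0.981 0.942 0.923 0.904
print(round(km_se, 4))    # agrees with the std.err column
print(round(lower, 3))    # agrees with the lower 95% CI column
print(round(upper, 3))    # agrees with the upper 95% CI column
```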

The Kaplan-Meier estimate may be plotted using plot(surv.fit).

plot(surv.fit, main='Kaplan-Meier estimate with 95% confidence bounds', xlab='time', ylab='survival function')

Plotting the same graph using ggplot2, via autoplot() from ggfortify

ggplot <- autoplot(surv.fit,data = tongue, main='Kaplan-Meier estimate with 95% confidence bounds', xlab='time', ylab='survival function',surv.colour = 'orange', censor.colour = 'red')
ggplot

Plotting the same graph using plotly

plotly <- ggplotly(ggplot)
## Warning in geom2trace.default(dots[[1L]][[1L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomConfint() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues
plotly

Publishing graphs to your online plotly account. Note that newer versions of the plotly package deprecate plotly_POST() in favor of api_create().

plotly_POST(plotly, "Survival Analysis using Kaplan–Meier estimate")
## No encoding supplied: defaulting to UTF-8.
## Success! Modified your plotly here -> https://plot.ly/~gupta.ruch/2