Last update: 2020-06-01 01:34:37

Intro

Our objective is to estimate the survival distribution of patients in the presence of censoring.

Time-to-event

In survival analysis the time until the occurrence of a well-defined event is recorded (time-to-event data). “Survival time”.

If everyone had an event, some of the methods we have already learned could be applied.

Often, not everyone has event (censoring) - Loss to follow-up - End of study

Censoring

Subjects are said to be censored if they are lost to follow up or drop out of the study, or if they die of unrelated causes or the study ends before they have the event of interest. They are counted as alive or disease-free for the time they were enrolled in the study. 10

Survival Analysis

Two-variable outcome:

Time variable \((T)\): is a random variable denoting the time of the event. \(t_i\) is the time at last event-free observation or just before the event, is a random variable having a probability distribution. Being this, the Survival function is the probability that the time of the event \((T)\) is later than some specified time \(t\). Formula 1.1

Censoring variable \((C)\): \(c_i\) = 1 if had the event; \(c_i\) = 0 no event by time \(t_i\)

Different models for survival data are distinguished by different choice of distribution for \(t_i\).

The object of primary interest is the survival function, conventionally denoted S, which is defined as

\[ S(t)=\Pr(T>t) \tag{1.1} \]

Let \(T\) be survival time, which is any positive number. A particular time is designated by the lower case letter \(t\). The cumulative distribution function (or lifetime distribution function) of \(T\) is the function, conventionally denoted \(F\), defined as the complement of the survival function.

\[ F(t)=\Pr(T \leq t)= 1 - S(t) \tag{2.1} \] If \(F\) is differentiable then the first derivative, which is the density function of the lifetime distribution, is conventionally denoted \(f\). The function \(f\) is sometimes called the event density; it is the rate of death or failure events per unit time.

\[f(t)=F'(t)={\frac{d}{dt}}F(t) \tag{3.1}\]

Back to the survival function (1.1), it can be expressed in terms of probability distribution and probability density functions

\[S(t)=\Pr(T>t)=\int _{t}^{{\infty }}f(u)\,du=1-F(t) \tag{4.1}\]

Similarly, a survival event density function can be defined as

\[s(t)=S'(t)={\frac {d}{dt}}S(t)={\frac {d}{dt}}\int _{t}^{{\infty }}f(u)\,du={\frac {d}{dt}}[1-F(t)]=-f(t) \tag{4.2}\]

Hazard function and cumulative hazard function

The hazard function, conventionally denoted , is defined as the event rate at time \(t\) conditional on survival until time \(t\) or later (that is, \(T ≥ t\)). Suppose that an item has survived for a time \(t\) and we desire the probability that it will not survive for an additional time \(dt\):

\[ {\lambda (t)=\lim _{dt\rightarrow 0}{\frac {\Pr(t\leq T<t+dt)}{dt\cdot S(t)}}={\frac {f(t)}{S(t)}}=-{\frac {S'(t)}{S(t)}}} \tag{4.1} \]

Example

library(readr)
library(tidyr)
library(splines2)
library(survival)
library(survminer)

Kaplan-Meier

data_test <- read_delim("data2.csv", ";",
  escape_double = FALSE,
  locale = locale(decimal_mark = ",", grouping_mark = "."),
  trim_ws = TRUE
)
Parsed with column specification:
cols(
  id = col_double(),
  fu = col_double(),
  event = col_double(),
  event_HD = col_double(),
  event_RT = col_double(),
  event_CR = col_double(),
  peritonitis = col_double(),
  sex = col_double(),
  age = col_double(),
  diab = col_double(),
  first = col_double()
)
km <- survfit(Surv(fu, event) ~ 1, data = data_test)
wb <- survreg(Surv(fu, event) ~ 1, data = data_test)

ggsurvplot(km, conf.int = TRUE, risk.table = "nrisk_cumevents", legend = "none", tables.height = 0.2)

km2 <- survfit(Surv(time, cens) ~ 1, data = GBSG2)
ggsurvplot(km2, palette = "blue", conf.int = TRUE, risk.table = "nrisk_cumevents", surv.median.line = "hv", tables.height = 0.3)

Weibull model

# weibull approximation
wb2 <- survreg(Surv(time, cens) ~ 1, data = GBSG2)
surv <- seq(.99, .01, by = -0.1)
t <- predict(wb2, type = "quantile", p = 1 - surv, newdata = data.frame(1))
surv_wb <- data.frame(time = t, surv = surv, upper = NA, lower = NA, std.err = NA)
ggsurvplot_df(fit = surv_wb, surv.geom = geom_line)



# weibull approximation with covariants
# Compute the Weibull model
wbmod <- survreg(Surv(time, cens) ~ horTh + tsize, data = GBSG2)

# Decide on "imaginary patients"
newdat <- expand.grid(
  horTh = levels(GBSG2$horTh),
  tsize = quantile(GBSG2$tsize, probs = c(0.25, 0.5, 0.75))
)

# Compute survival curves
surv <- seq(.99, .01, by = -0.1)
t <- predict(wbmod, type = "quantile", p = 1 - surv, newdata = newdat)
surv_wbmod_wide <- cbind(newdat, t)

# Create data.frame with survival curve information
surv_wbmod <- pivot_longer(surv_wbmod_wide, -(horTh:tsize), names_to = "surv_id", values_to = "time")
surv_wbmod$surv <- surv[as.numeric(surv_wbmod$surv_id)]
surv_wbmod[, c("upper", "lower", "std.err", "strata")] <- NA
surv_wbmod <- as.data.frame(surv_wbmod) # survplot bug

# plot
ggsurvplot_df(surv_wbmod, surv.geom = geom_line, linetype = "horTh", color = "tsize", legend.title = NULL)

Cox Model


# Compute the cox model
cxmod <- coxph(Surv(time, cens) ~ horTh + tsize, data = GBSG2)

# Decide on covariate combinations ("imaginary patients")
newdat <- expand.grid(
  horTh = levels(GBSG2$horTh),
  tsize = quantile(GBSG2$tsize, probs = c(0.25, 0.5, 0.75))
)
rownames(newdat) <- letters[1:6]

# Compute survival curves
cxsf <- survfit(cxmod, data = GBSG2, newdata = newdat, conf.type = "none")

# Create data.frame with survival curve information

surv_cxmod0 <- surv_summary(cxsf)
surv_cxmod <- cbind(surv_cxmod0, newdat[as.character(surv_cxmod0$strata), ])

# Plot
ggsurvplot_df(surv_cxmod, linetype = "horTh", color = "tsize",
              legend.title = NULL, censor = FALSE)

---
title: "HEADS - HIDA - STATS - Survival Analysis"
output:
  html_notebook:
    highlight: pygments
    theme: united
    toc: yes
author: Francisco Bischoff
---

Last update: `r lubridate::now()`

## Intro

Our objective is to estimate the survival distribution of patients in the presence of censoring.

 - Why not compare mean time-to-event between groups using a t-test or linear regression?
   - Ignores censoring

 - Why not compare proportion of events in groups using risk/odds ratios or logistic regression?
   - Ignores time

## Time-to-event

In survival analysis the time until the occurrence of a well-defined event is recorded (time-to-event data). "Survival time".

If everyone had an event, some of the methods we have already learned could be applied.

Often, not everyone has event (censoring)
 - Loss to follow-up
 - End of study

## Censoring

Subjects are said to be censored if they are lost to follow up or drop out of the study, or if they die of unrelated causes or the study ends before they have the event of interest. They are counted as alive or disease-free for the time they were enrolled in the study.
10

## Survival Analysis

Two-variable outcome: 

Time variable $(T)$: is a *random variable* denoting the time of the event. $t_i$ is the time at last event-free observation or just before the event, is a random variable having a probability distribution. Being this, the Survival function is the probability that the time of the event $(T)$ is later than some specified time $t$. Formula [1.1]()

Censoring variable $(C)$: $c_i$ = 1 if had the event; $c_i$ = 0 no event by time $t_i$


Different models for survival data are distinguished by different choice of distribution for $t_i$.




The object of primary interest is the survival function, conventionally denoted S, which is defined as

$$
S(t)=\Pr(T>t) \tag{1.1}
$$

Let $T$ be survival time, which is any positive number. A *particular time* is designated by the *lower case letter* $t$. The **cumulative distribution function** (or *lifetime distribution function*) of $T$ is the function, conventionally denoted $F$, defined as the complement of the survival function.

- The **cumulative distribution function** of $T$

$$
F(t)=\Pr(T \leq t)= 1 - S(t) \tag{2.1}
$$
If $F$ is differentiable then the first derivative, which is the **density function of the lifetime distribution**, is conventionally denoted $f$.
The function $f$ is sometimes called the **event density**; it is the **rate of death or failure events per unit time**.

- Density function of the **cumulative distribution function** of $T$
- Rate of death or failure events per unit time
- The probability of the event occurring at exactly time $t$

$$f(t)=F'(t)={\frac{d}{dt}}F(t) \tag{3.1}$$

Back to the survival function (1.1), it can be expressed in terms of *probability distribution* and *probability density functions*


$$S(t)=\Pr(T>t)=\int _{t}^{{\infty }}f(u)\,du=1-F(t) \tag{4.1}$$

Similarly, a survival **event density** function can be defined as

$$s(t)=S'(t)={\frac  {d}{dt}}S(t)={\frac  {d}{dt}}\int _{t}^{{\infty }}f(u)\,du={\frac  {d}{dt}}[1-F(t)]=-f(t) \tag{4.2}$$

## Hazard function and cumulative hazard function

- $f(t)$ **event density**, is rate of death or failure events per unit time. The probability of the event occurring at exactly time $t$
  - Example: a woman born today has, say, a 1% chance of dying at 80 years
- Hazard rate is an instantaneous **incidence rate**. It is conditional on survival until time $t$ or later.
  - Example: a woman who is 79 today has, say, a 5% chance of dying at 80 years


The hazard function, conventionally denoted \lambda, is defined as the event rate at time $t$ conditional on survival until time $t$ or later (that is, $T ≥ t$). Suppose that an item has survived for a time $t$ and we desire the probability that it will not survive for an additional time $dt$:

$$
{\lambda (t)=\lim _{dt\rightarrow 0}{\frac {\Pr(t\leq T<t+dt)}{dt\cdot S(t)}}={\frac {f(t)}{S(t)}}=-{\frac {S'(t)}{S(t)}}}
\tag{4.1}
$$

## Example

- We need two variables: one continuous for the time, and one categorical for flagging the event


```{r setup, message = FALSE,	warning = FALSE}
library(readr)
library(tidyr)
library(splines2)
library(survival)
library(survminer)
```


### Kaplan-Meier

```{r}
data_test <- read_delim("data2.csv", ";",
  escape_double = FALSE,
  locale = locale(decimal_mark = ",", grouping_mark = "."),
  trim_ws = TRUE
)

km <- survfit(Surv(fu, event) ~ 1, data = data_test)
wb <- survreg(Surv(fu, event) ~ 1, data = data_test)

ggsurvplot(km, conf.int = TRUE, risk.table = "nrisk_cumevents", legend = "none", tables.height = 0.2)
```


```{r}
km2 <- survfit(Surv(time, cens) ~ 1, data = GBSG2)
ggsurvplot(km2, palette = "blue", conf.int = TRUE, risk.table = "nrisk_cumevents", surv.median.line = "hv", tables.height = 0.3)
```


### Weibull model

```{r}
# weibull approximation
wb2 <- survreg(Surv(time, cens) ~ 1, data = GBSG2)
surv <- seq(.99, .01, by = -0.1)
t <- predict(wb2, type = "quantile", p = 1 - surv, newdata = data.frame(1))
surv_wb <- data.frame(time = t, surv = surv, upper = NA, lower = NA, std.err = NA)
ggsurvplot_df(fit = surv_wb, surv.geom = geom_line)


# weibull approximation with covariants
# Compute the Weibull model
wbmod <- survreg(Surv(time, cens) ~ horTh + tsize, data = GBSG2)

# Decide on "imaginary patients"
newdat <- expand.grid(
  horTh = levels(GBSG2$horTh),
  tsize = quantile(GBSG2$tsize, probs = c(0.25, 0.5, 0.75))
)

# Compute survival curves
surv <- seq(.99, .01, by = -0.1)
t <- predict(wbmod, type = "quantile", p = 1 - surv, newdata = newdat)
surv_wbmod_wide <- cbind(newdat, t)

# Create data.frame with survival curve information
surv_wbmod <- pivot_longer(surv_wbmod_wide, -(horTh:tsize), names_to = "surv_id", values_to = "time")
surv_wbmod$surv <- surv[as.numeric(surv_wbmod$surv_id)]
surv_wbmod[, c("upper", "lower", "std.err", "strata")] <- NA
surv_wbmod <- as.data.frame(surv_wbmod) # survplot bug

# plot
ggsurvplot_df(surv_wbmod, surv.geom = geom_line, linetype = "horTh", color = "tsize", legend.title = NULL)
```


### Cox Model


```{r}

# Compute the cox model
cxmod <- coxph(Surv(time, cens) ~ horTh + tsize, data = GBSG2)

# Decide on covariate combinations ("imaginary patients")
newdat <- expand.grid(
  horTh = levels(GBSG2$horTh),
  tsize = quantile(GBSG2$tsize, probs = c(0.25, 0.5, 0.75))
)
rownames(newdat) <- letters[1:6]

# Compute survival curves
cxsf <- survfit(cxmod, data = GBSG2, newdata = newdat, conf.type = "none")

# Create data.frame with survival curve information

surv_cxmod0 <- surv_summary(cxsf)
surv_cxmod <- cbind(surv_cxmod0, newdat[as.character(surv_cxmod0$strata), ])

# Plot
ggsurvplot_df(surv_cxmod, linetype = "horTh", color = "tsize",
              legend.title = NULL, censor = FALSE)
```
