Demonstration of survival curves with different hazard functions

Special Thanks

Dr. Hisateru Tachimori, National Center of Neurology and Psychiatry, Japan
Dr. Alex Richard Cook, Saw Swee Hock School of Public Health, National University of Singapore

References

Kleinbaum et al. Survival Analysis: A Self-Learning Text. 3rd Ed. Springer. http://www.sph.emory.edu/~dkleinb/surv2.htm
Collett. Modelling Survival Data in Medical Research. 2nd Ed. Chapman. http://www.crcpress.com/product/isbn/9781584883258
Pintilie. Competing Risks: A Practical Perspective. Wiley. http://www.uhnres.utoronto.ca/labs/hill/People_Pintilie.htm
Tableman et al. Survival Analysis Using S/R. http://stat.ethz.ch/wbl/Skript_SurvivalAnalysis.pdf
Cook. ST3242: Introduction to Survival Analysis. http://courses.nus.edu.sg/course/stacar/internet/st3242/handouts/ch1l2.pdf
Cook. ST3242: Introduction to Survival Analysis. http://files.campus.edublogs.org/blog.nus.edu.sg/dist/a/853/files/2010/12/ch1.pdf
Fox. Cox Proportional-Hazards Regression for Survival Data. http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf
Univ. of Pittsburgh. Supercourse. http://www.pitt.edu/~super4/5.htm
Princeton Uni. WWS509 Generalized Linear Models. Chapter 6. Survival Models. http://data.princeton.edu/wws509/notes/

Definitions of the notation used here

\( T \) is a random variable for a person’s survival time, and \( t \) is a specific value of \( T \).
As for any continuous random variable, a probability density function and a corresponding cumulative distribution function are defined as follows. However, these are NOT specified in non-parametric methods such as the Kaplan-Meier estimator, which are more commonly used in survival analysis.
\( f(t) \) is the probability density function of the random variable \( T \). It is the probability density at a survival time equal to \( t \).
\( F(t) \) is the cumulative distribution function of the random variable \( T \). It is the probability of having a survival time less than or equal to \( t \), that is, the probability of death by time \( t \). By definition it is the area under the probability density function between time 0 and time \( t \).
In addition, functions that are specific to survival analysis are defined as follows. In survival analysis, often one of these is estimated from the observed survival times, and the other functions are derived from it via their interrelationships as needed (non-parametric method, i.e., no need for a prespecified probability density function).
\( h(t) \) is the hazard function. It is the probability per unit time of having an event at time \( t \) given survival up to time \( t \) (an instantaneous incidence rate, not a raw probability, so it can exceed 1). It can be likened to the speed indicated on a speedometer.
\( H(t) \) is the cumulative hazard function of \( T \). It is the total hazard that a person has faced up to time \( t \).
\( S(t) \) is the survivor function of \( T \). It is the probability of having a survival time longer than \( t \), that is, the probability of surviving beyond time \( t \). It is also the area under the probability density function between time \( t \) and \( \infty \).

Mathematical definitions

The survivor function and the cumulative distribution function have straightforward definitions as probabilities, and they are complementary to each other.

\( S(t) = P(T > t) \) It is the probability of having a survival time longer than \( t \), that is, the probability of surviving beyond time \( t \).
\( F(t) = P(T \le t) \) It is the probability of having a survival time less than or equal to \( t \), that is, the probability of death by time \( t \).
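
As a quick numerical illustration of the complement, the sketch below uses an exponential distribution. The rate of 0.5 and the time points are arbitrary choices for illustration, not values from any data set.

```r
## Exponential survival times with rate 0.5 (illustrative choice)
rate <- 0.5
t    <- c(0, 1, 2, 5)

## F(t) = P(T <= t): cumulative distribution function
F.t <- pexp(t, rate = rate)

## S(t) = P(T > t): upper tail probability
S.t <- pexp(t, rate = rate, lower.tail = FALSE)

## The two columns sum to 1 at every time point
data.frame(t = t, F = F.t, S = S.t, total = F.t + S.t)
```

The `lower.tail = FALSE` argument asks for \( P(T > t) \) directly, which avoids computing \( 1 - F(t) \) by hand.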

The probability density function and the hazard function can be defined as limits.

\( f(t) = \lim_{\Delta t \to 0} \frac {P(t \le T < t + \Delta t)} {\Delta t} \) This is a definition of a probability density function.
\( h(t) = \lim_{\Delta t \to 0} \frac {P(t \le T < t + \Delta t |T \ge t)} {\Delta t} \) Notice the similarity and difference to \( f(t) \) (absence or presence of a condition).
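
The two limits can be approximated numerically by shrinking \( \Delta t \). The sketch below again assumes an exponential distribution with rate 0.5 and evaluates both ratios at \( t = 2 \); the distribution, the rate, and the evaluation time are all illustrative assumptions.

```r
## Exponential distribution with rate 0.5, evaluated at t0 = 2 (illustrative choices)
rate <- 0.5
t0   <- 2

## P(t0 <= T < t0 + dt) for a shrinking sequence of dt
dt    <- 10^(-(1:6))
p.int <- pexp(t0 + dt, rate = rate) - pexp(t0, rate = rate)

## Without the condition, the ratio approaches f(t0)
f.approx <- p.int / dt

## With the condition T >= t0, the probability is divided by P(T >= t0),
## and the ratio approaches h(t0)
h.approx <- p.int / (dt * pexp(t0, rate = rate, lower.tail = FALSE))

data.frame(dt = dt, f.approx = f.approx, h.approx = h.approx)

## Exact values for comparison: the exponential hazard equals the rate
c(f = dexp(t0, rate = rate), h = rate)
```

The only difference between the two columns is the division by \( P(T \ge t) \), which is exactly the conditioning in the definition of \( h(t) \).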

Relationships

Any one of these functions can be specified with a functional form (parametric method) or estimated from the data (non-parametric method), and all the other functions can then be expressed in terms of it.

\( F(t) = 1 - S(t) \)
\( S(t) = 1 - F(t) = 1 - \int_0^t f(u)du = \int_t^\infty f(u)du \) Notice the change in the integration range.
\( f(t) = \frac {d} {dt} \{F(t)\} = \frac {d} {dt} \{1 - S(t)\} = - \frac {d} {dt} \{S(t)\} \) It is the slope of the cumulative event curve and the negative of the slope of the survival curve.
\( f(t) = \frac {d} {dt} \{F(t)\} = \lim_{\Delta t \to 0} \frac {F(t + \Delta t) - F(t)} {\Delta t} = \lim_{\Delta t \to 0} \frac {P(t \le T < t + \Delta t)} {\Delta t} \)
\( F(t) = \int_0^t f(u)du \)
\( h(t) = \lim_{\Delta t \to 0} \frac {P(t \le T < t + \Delta t |T \ge t)} {\Delta t} = \lim_{\Delta t \to 0} \frac {P(t \le T < t + \Delta t \cap T \ge t)} {\Delta t P(T \ge t)} = \lim_{\Delta t \to 0} \frac {P(t \le T < t + \Delta t)} {\Delta t S(t)} = \frac {1} {S(t)} \lim_{\Delta t \to 0} \frac {P(t \le T < t + \Delta t)} {\Delta t} = \frac {f(t)} {S(t)} \) The conditional probability is expanded using the definition of conditional probability, which puts \( P(T \ge t) \) in the denominator. For a continuous \( T \), \( P(T \ge t) = P(T > t) = S(t) \). The remaining limit is the definition of the probability density function.
Probability part of the above: \( P(t \le T < t + \Delta t | T \ge t) = \frac {P([t \le T < t + \Delta t] \cap [T \ge t])} {P(T \ge t)} = \frac {P([t \le T < t + \Delta t])} {P(T \ge t)} \) The \( \cap [T \ge t] \) part is removed, as it is redundant. The denominator is the survivor function.
\( h(t) = \frac {f(t)} {S(t)} = \frac {1} {S(t)} \frac {d} {dt} \{F(t)\} = \frac {1} {S(t)} \frac {d} {dt} \{1 - S(t)\} = - \frac {1} {S(t)} \frac {d} {dt} \{S(t)\} = - \frac {d} {dt} \log(S(t)) \) The last step is the chain rule applied in reverse, since \( \frac {d} {dt} \log(S(t)) = \frac {1} {S(t)} \frac {d} {dt} \{S(t)\} \). The hazard function is the negative of the slope of the log survivor function.
\( H(t) = \int_0^t h(u)du = \int_0^t - \frac {d} {du} \{\log(S(u))\} du = - \log(S(t)) \) The term at the lower limit vanishes because \( S(0) = 1 \), so \( \log(S(0)) = 0 \).
\( S(t) = e^{- H(t)} = e^{- \int_0^t h(u)du} \) (From the relationship above)
\( h(t) = \frac {d} {dt} \{H(t)\} \)
\( h(t) = \frac {f(t)} {S(t)} = \frac {f(t)} {1 - F(t)} = \frac {f(t)} {1 - \int_0^t f(u)du} \)
\( f(t) = h(t)S(t) = h(t)e^{- H(t)} = h(t)e^{- \int_0^t h(u)du} \)
\( S(t) = \frac {f(t)} {h(t)} \)
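
These identities can be checked numerically for any distribution whose density and survivor function are available. The sketch below assumes a Weibull distribution with shape 1.5 and scale 2 and checks the relationships at \( t = 3 \); the distribution, its parameters, and the step size `eps` are illustrative assumptions.

```r
## Weibull survival times (shape, scale, and t0 are illustrative choices)
shape <- 1.5
scale <- 2
t0    <- 3

f <- function(t) dweibull(t, shape = shape, scale = scale)
S <- function(t) pweibull(t, shape = shape, scale = scale, lower.tail = FALSE)

## h(t) = f(t) / S(t) and H(t) = -log(S(t))
h <- function(t) f(t) / S(t)
H <- function(t) -log(S(t))

## H(t) should match the integral of h(u) from 0 to t
H.numeric <- integrate(h, lower = 0, upper = t0)$value
c(H.closed = H(t0), H.numeric = H.numeric)

## S(t) should match exp(-H(t))
c(S = S(t0), exp.minus.H = exp(-H.numeric))

## h(t) should match the negative slope of log(S(t)),
## approximated here by a central difference
eps     <- 1e-4
h.slope <- -(log(S(t0 + eps)) - log(S(t0 - eps))) / (2 * eps)
c(h = h(t0), h.slope = h.slope)
```

Each pair of printed values should agree up to numerical error.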

Define graphing function for graphical demonstration:

This function draws curves of functions using a given hazard function.

## Load ggplot2
library(ggplot2)

## Define graphing function
SurvGraph <- function(h, xlim = c(0,1), ylim = c(0,1)) {
    ## Define functions
    ## h(t) hazard function: Defined outside, and given as an argument

    ## H(t) cumulative hazard function: h(t) integrated from time = 0 to time = t
    ## Vectorize to enable use with a vector
    H <- Vectorize(function(t) {
        res <- integrate(h, lower = 0, upper = t)
        res$value
    })

    ## S(t) survivor function: Derived from H(t) = -logS(t)
    S <- function(t) {
        exp(-1 * H(t))
    }

    ## f(t) probability density function (pdf): Derived from h(t) = f(t) / S(t)
    f <- function(t) {
        S(t) * h(t)
    }

    ## F(t) cumulative distribution function (cdf): Complement of S(t), F(t) = 1 - S(t)
    F <- function(t) {
        1 - S(t)
    }

    ## Graphing with ggplot2
    ggplot(data = data.frame(x = xlim), aes(x)) +
        stat_function(fun = h, aes(color = "h")) +
        stat_function(fun = H, aes(color = "H")) +
        stat_function(fun = S, aes(color = "S")) +
        stat_function(fun = f, aes(color = "f")) +
        stat_function(fun = F, aes(color = "F")) +
        ## Values outside ylim are dropped, so a curve is cut off where it leaves the range
        scale_x_continuous(name = "time",  limits = xlim) +
        scale_y_continuous(name = "value", limits = ylim) +
        scale_color_manual(name = "functions",
                           values = c("h" = "black", "H" = "red", "S" = "green", "f" = "blue", "F" = "purple"),
                           breaks = c("h","H","S","f","F"),
                           labels = c("h(t)","H(t)","S(t)","f(t)","F(t)"))
}

Constant hazard:

e.g., death in a healthy young population. \( h(t) = 0.5 + 0 * t \) (\( 0 * t \) is added so that the R function returns a vector as long as its input). The survival curve is exponential.

h.constant <- function(t) 0.5 + 0 * t

SurvGraph(h = h.constant, xlim = c(0,5), ylim = c(0,2)) +
    labs(title = "constant hazard: h(t) = 0.5")
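
To see that the survival curve is exponential, the following sketch rebuilds \( S(t) \) from the constant hazard by numerical integration, in the same way as the graphing function, and compares it with the exponential survivor function from `pexp`. The time points are arbitrary illustrative choices.

```r
## Constant hazard, as defined above
h.constant <- function(t) 0.5 + 0 * t

## H(t) by numerical integration, then S(t) = exp(-H(t))
H <- Vectorize(function(t) integrate(h.constant, lower = 0, upper = t)$value)
S <- function(t) exp(-H(t))

t <- c(0.5, 1, 2, 4)

## Numerical S(t) next to the exponential survivor function with rate 0.5
data.frame(t             = t,
           S.numeric     = S(t),
           S.exponential = pexp(t, rate = 0.5, lower.tail = FALSE))

## Median survival time, i.e., the t at which S(t) = 0.5
log(2) / 0.5
```

The two columns should agree up to integration error, since \( H(t) = 0.5t \) gives \( S(t) = e^{-0.5t} \).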

Increasing hazard:

e.g., cancer patients not responding to treatment. \( h(t) = 0.3 * t \)

h.increasing <- function(t) 0.3 * t

SurvGraph(h = h.increasing, xlim = c(0,5), ylim = c(0,2)) +
        labs(title = "increasing hazard: h(t) = 0.3t")
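
For this linear hazard the cumulative hazard has the closed form \( H(t) = 0.15 t^2 \), so \( S(t) = e^{-0.15 t^2} \), which is a Weibull survivor function with shape 2 and scale \( 1 / \sqrt{0.15} \). The sketch below checks this numerically; the time points are arbitrary illustrative choices.

```r
## Increasing hazard, as defined above
h.increasing <- function(t) 0.3 * t

## H(t) by numerical integration
H <- Vectorize(function(t) integrate(h.increasing, lower = 0, upper = t)$value)

t <- c(1, 2, 3, 4)

## Numerical results next to the closed forms
data.frame(t         = t,
           H.numeric = H(t),
           H.closed  = 0.15 * t^2,
           S.numeric = exp(-H(t)),
           S.weibull = pweibull(t, shape = 2, scale = 1 / sqrt(0.15),
                                lower.tail = FALSE))
```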

Decreasing hazard:

e.g., survival following surgery. \( h(t) = 0.5 - 0.1 * t \). This linear form is only usable for \( t \le 5 \), because a hazard cannot be negative.

h.decreasing <- function(t) 0.5 - 0.1 * t

SurvGraph(h = h.decreasing, xlim = c(0,5), ylim = c(0,2)) +
        labs(title = "decreasing hazard: h(t) = 0.5 - 0.1t")
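
A linearly decreasing hazard eventually crosses zero, which limits the time range over which it makes sense. The sketch below locates that point and the survival probability reached there; the search interval passed to `uniroot` is an arbitrary assumption.

```r
## Decreasing hazard, as defined above
h.decreasing <- function(t) 0.5 - 0.1 * t

## Time at which the hazard reaches zero
t.zero <- uniroot(h.decreasing, interval = c(0, 10))$root

## Cumulative hazard and survival probability at that time
H.zero <- integrate(h.decreasing, lower = 0, upper = t.zero)$value
c(t.zero = t.zero, H = H.zero, S = exp(-H.zero))

## Beyond t.zero the formula returns a negative value,
## which is not a valid hazard
h.decreasing(6)
```

Here \( H(t) = 0.5t - 0.05t^2 \), so the cumulative hazard at the zero crossing is 1.25 and the survivor function levels off at \( e^{-1.25} \) rather than falling to 0.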

Rising and falling hazard:

e.g., survival following tuberculosis infection (the hazard of death increases early and decreases later). The hazard used here has the functional form of a Weibull density with scale 2 and shape 1.5.

h.rise.fall <- function(t, scale = 2, shape = 1.5) {
    (shape / scale) * (t / scale)^(shape - 1) * exp(-1 * (t / scale)^shape)
}

SurvGraph(h = h.rise.fall, xlim = c(0,5), ylim = c(0,2)) +
        labs(title = "rising and falling hazard")
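
Because this hazard has the same formula as a Weibull density, its integral is a Weibull cumulative distribution function, which never exceeds 1. The sketch below checks both facts and locates the peak of the hazard; the time points and the search interval are arbitrary illustrative choices.

```r
## Rising and falling hazard, as defined above
h.rise.fall <- function(t, scale = 2, shape = 1.5) {
    (shape / scale) * (t / scale)^(shape - 1) * exp(-1 * (t / scale)^shape)
}

t <- c(0.5, 1, 2, 5, 20)

## The hazard takes the same values as the Weibull density
all.equal(h.rise.fall(t), dweibull(t, shape = 1.5, scale = 2))

## H(t) is therefore the Weibull CDF, which is bounded by 1,
## so S(t) = exp(-H(t)) cannot fall below exp(-1)
H <- pweibull(t, shape = 1.5, scale = 2)
data.frame(t = t, H = H, S = exp(-H))
exp(-1)

## Time at which the hazard peaks
peak <- optimize(h.rise.fall, interval = c(0, 5), maximum = TRUE)$maximum
peak
```

One consequence of this choice of hazard is that the survivor function levels off above zero, so a share of subjects never experiences the event.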

Falling and rising hazard:

e.g., lifespan of animals (more deaths at the extremes of age). \( h(t) = 0.2 * (t - 1.5)^2 + 0.1 \)

h.fall.rise <- function(t) 0.2 * (t - 1.5)^2 + 0.1

SurvGraph(h = h.fall.rise, xlim = c(0,5), ylim = c(0,2)) +
        labs(title = "falling and rising hazard: h(t) = 0.2 * (t - 1.5)^2 + 0.1")
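
The shape of this hazard can be summarised by its lowest point and its values at the ends of the plotted range. The sketch below uses the hazard as coded in `h.fall.rise`; the search interval is the plotted range and is otherwise an arbitrary choice.

```r
## Falling and rising hazard, as coded above
h.fall.rise <- function(t) 0.2 * (t - 1.5)^2 + 0.1

## Time and value of the lowest hazard within the plotted range
low <- optimize(h.fall.rise, interval = c(0, 5))
c(time = low$minimum, hazard = low$objective)

## Hazard at the two ends of the plotted range
c(h.start = h.fall.rise(0), h.end = h.fall.rise(5))

## Cumulative hazard and survival probability at the end of the range
H.end <- integrate(h.fall.rise, lower = 0, upper = 5)$value
c(H = H.end, S = exp(-H.end))
```

The hazard at the right end exceeds the upper `ylim` of 2 used in the plot, so the \( h(t) \) curve is cut off before \( t = 5 \).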

For other information: http://rpubs.com/kaz_yos/ If you find errors: kazky AT mac.com