Definitions of notation used here
\( T \) is a random variable for a person’s survival time. \( t \) is a specific value of \( T \).
As with any other continuous random variable, a probability density function and a corresponding cumulative distribution function are defined as follows. However, these are NOT specified in non-parametric methods, e.g., the Kaplan-Meier estimator, which are more commonly used in survival analysis.
\( f(t) \) is the probability density function for the random variable \( T \). It is the probability density of having a survival time equal to \( t \).
\( F(t) \) is the cumulative distribution function for the random variable \( T \). It is the probability of having a survival time less than or equal to \( t \), i.e., the probability of death by time \( t \). By definition it is the area under the curve (AUC) of the probability density function from time 0 to time \( t \).
In addition, functions specific to survival analysis are defined as follows. In survival analysis, one of these is often estimated from the observed survival times, and the other functions are derived from it via their interrelationships as needed (non-parametric method, i.e., no need for a prespecified probability density function).
\( h(t) \) is the hazard function. It is a probability per unit time of having an event given survival to time \( t \) (instantaneous incidence rate, not raw probability). It can be likened to the speed indicated on a speedometer.
\( H(t) \) is the cumulative hazard function for \( T \). It is the total hazard that a person who has survived to time \( t \) has faced up to that time.
\( S(t) \) is the survivor function for \( T \). It is the probability of having a survival time longer than \( t \), i.e., the probability of survival to time \( t \). It is also the AUC of the probability density function from time \( t \) to time \( \infty \).
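As a rough illustration of the non-parametric route described above, the sketch below estimates \( S(t) \) from a small, entirely hypothetical data set using the product-limit (Kaplan-Meier) formula written out by hand in base R, and then derives \( F(t) \) and \( H(t) \) from it. The variable names and data values are assumptions made for this example only.

```r
## Hypothetical follow-up data (not from the text): 1 = death, 0 = censored
time   <- c(2, 3, 3, 5, 6, 8, 9, 12)
status <- c(1, 1, 0, 1, 0, 1, 1, 0)

## Distinct times at which at least one death occurred
event_times <- sort(unique(time[status == 1]))

## Number still at risk just before each event time, and deaths at that time
n_at_risk <- sapply(event_times, function(u) sum(time >= u))
n_events  <- sapply(event_times, function(u) sum(time == u & status == 1))

## Product-limit estimate of S(t); no density function is specified anywhere
S_hat <- cumprod(1 - n_events / n_at_risk)

## The other functions follow from their relationships to S(t)
F_hat <- 1 - S_hat       # F(t) = 1 - S(t)
H_hat <- -log(S_hat)     # H(t) = -log(S(t))

data.frame(event_times, n_at_risk, n_events, S_hat, F_hat, H_hat)
```

In practice the survival package's survfit() would normally be used for this; the hand calculation is shown only to make the steps visible.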
Mathematical definitions
The survivor function and the cumulative distribution function have straightforward definitions as probabilities, and they are complementary to each other.
\( S(t) = P(T > t) \) It is the probability of having a survival time longer than \( t \), i.e., the probability of survival to time \( t \).
\( F(t) = P(T \le t) \) It is the probability of having a survival time less than or equal to \( t \), i.e., the probability of death by time \( t \).
The probability density function and the hazard function can be defined as limits.
\( f(t) = \lim_{\Delta t \to 0} \frac {P(t \le T < t + \Delta t)} {\Delta t} \) This is a definition of a probability density function.
\( h(t) = \lim_{\Delta t \to 0} \frac {P(t \le T < t + \Delta t |T \ge t)} {\Delta t} \) Notice the similarity and difference to \( f(t) \) (absence or presence of a condition).
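The two limit definitions can be checked numerically with a small \( \Delta t \). The sketch below assumes an exponential survival time with rate 0.5 (a choice made only for this illustration) and compares the unconditional and conditional ratios with the known values of \( f(t) \) and \( h(t) \).

```r
## Assumed example distribution: exponential with rate 0.5
rate <- 0.5
t0   <- 2       # time point at which the limits are evaluated
dt   <- 1e-6    # small interval width standing in for the limit

## P(t <= T < t + dt), obtained from the cumulative distribution function
p_interval <- pexp(t0 + dt, rate) - pexp(t0, rate)

## Without the condition: approximates the density f(t)
f_approx <- p_interval / dt

## With the condition T >= t: divide by P(T >= t) = S(t), approximates h(t)
S_t0     <- pexp(t0, rate, lower.tail = FALSE)
h_approx <- (p_interval / S_t0) / dt

c(f_approx = f_approx, f_exact = dexp(t0, rate))
c(h_approx = h_approx, h_exact = rate)
```

The only difference between the two approximations is the division by \( S(t) \), which is the condition \( T \ge t \) in the definition of \( h(t) \).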
Relationships
Any one of these functions can be given a functional form (parametric method) or estimated from the data (non-parametric method), and all the other functions can then be expressed in terms of that first function.
\( F(t) = 1 - S(t) \)
\( S(t) = 1 - F(t) = 1 - \int_0^t f(u)du = \int_t^\infty f(u)du \) Notice the change in the integration range.
\( f(t) = \frac {d} {dt} \{F(t)\} = \frac {d} {dt} \{1 - S(t)\} = - \frac {d} {dt} \{S(t)\} \) It is the slope of the cumulative event curve and the opposite of the slope of the survival curve.
\( f(t) = \frac {d} {dt} \{F(t)\} = \lim_{\Delta t \to 0} \frac {F(t + \Delta t) - F(t)} {\Delta t} = \lim_{\Delta t \to 0} \frac {P(t \le T < t + \Delta t)} {\Delta t} \)
\( F(t) = \int_0^t f(u)du \)
\( h(t) = \lim_{\Delta t \to 0} \frac {P(t \le T < t + \Delta t |T \ge t)} {\Delta t} = \lim_{\Delta t \to 0} \frac {P(t \le T < t + \Delta t \cap T \ge t)} {\Delta t P(T \ge t)} = \lim_{\Delta t \to 0} \frac {P(t \le T < t + \Delta t)} {\Delta t S(t)} = \frac {1} {S(t)} \lim_{\Delta t \to 0} \frac {P(t \le T < t + \Delta t)} {\Delta t} = \frac {f(t)} {S(t)} \) The conditional probability is expanded using the definition of conditional probability. The denominator \( P(T \ge t) \) equals the survivor function \( S(t) \) because \( T \) is continuous, and the remaining limit is the definition of the probability density function.
Probability part of the above: \( P(t \le T < t + \Delta t | T \ge t) = \frac {P([t \le T < t + \Delta t] \cap [T \ge t])} {P(T \ge t)} = \frac {P([t \le T < t + \Delta t])} {P(T \ge t)} \) The \( \cap [T \ge t] \) part is removed, as it is redundant. The denominator is the survivor function.
\( h(t) = \frac {f(t)} {S(t)} = \frac {1} {S(t)} \frac {d} {dt} \{F(t)\} = \frac {1} {S(t)} \frac {d} {dt} \{1 - S(t)\} = - \frac {1} {S(t)} \frac {d} {dt} \{S(t)\} = - \frac {d} {dt} \log(S(t)) \) The last step applies the chain rule in reverse, since \( \frac {d} {dt} \log(S(t)) = \frac {1} {S(t)} \frac {d} {dt} \{S(t)\} \). The hazard function is the negative of the slope of the log survivor function.
\( H(t) = \int_0^t h(u)du = \int_0^t - \frac {d} {du} \{\log(S(u))\} du = - \log(S(t)) \) (using \( S(0) = 1 \), so \( \log(S(0)) = 0 \))
\( S(t) = e^{- H(t)} = e^{- \int_0^t h(u)du} \) (From the relationship above)
\( h(t) = \frac {d} {dt} \{H(t)\} \)
\( h(t) = \frac {f(t)} {S(t)} = \frac {f(t)} {1 - F(t)} = \frac {f(t)} {1 - \int_0^t f(u)du} \)
\( f(t) = h(t)S(t) = h(t)e^{- H(t)} = h(t)e^{- \int_0^t h(u)du} \)
\( S(t) = \frac {f(t)} {h(t)} \)
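These identities can be verified numerically for any distribution whose density and distribution function are available. The sketch below assumes a Weibull distribution with shape 1.5 and scale 2 (values chosen only for illustration), for which the hazard and cumulative hazard have the well-known closed forms \( h(t) = \frac {k} {\lambda} (t / \lambda)^{k - 1} \) and \( H(t) = (t / \lambda)^k \).

```r
## Assumed example distribution: Weibull with shape k = 1.5 and scale lambda = 2
shape <- 1.5
scale <- 2
t     <- c(0.5, 1, 2, 4)

## f, F and S taken directly from base R's Weibull functions
f_t <- dweibull(t, shape = shape, scale = scale)
F_t <- pweibull(t, shape = shape, scale = scale)
S_t <- pweibull(t, shape = shape, scale = scale, lower.tail = FALSE)

## h and H derived only through the relationships above
h_t <- f_t / S_t       # h(t) = f(t) / S(t)
H_t <- -log(S_t)       # H(t) = -log(S(t))

## Closed forms of the Weibull hazard and cumulative hazard for comparison
h_closed <- (shape / scale) * (t / scale)^(shape - 1)
H_closed <- (t / scale)^shape

all.equal(h_t, h_closed)       # h(t) = f(t) / S(t)
all.equal(H_t, H_closed)       # H(t) = -log(S(t))
all.equal(S_t, exp(-H_t))      # S(t) = exp(-H(t))
all.equal(f_t, h_t * S_t)      # f(t) = h(t) S(t)
all.equal(F_t, 1 - S_t)        # F(t) = 1 - S(t)
```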
The function below plots all five functions for a given hazard function \( h(t) \); the other four are derived from it numerically.
## Load ggplot2
library(ggplot2)
## Define graphing function
SurvGraph <- function(h, xlim = c(0,1), ylim = c(0,1)) {
    ## Define functions
    ## h(t) hazard function: Defined outside, and given as an argument
    ## H(t) cumulative hazard function: h(t) integrated from time = 0 to time = t
    ## integrate() accepts only scalar limits, so Vectorize to enable use with a vector
    H <- Vectorize(function(t) {
        res <- integrate(h, lower = 0, upper = t)
        res$value
    })
    ## S(t) survivor function: Derived from H(t) = -logS(t)
    S <- function(t) {
        exp(-1 * H(t))
    }
    ## f(t) probability density function (pdf): Derived from h(t) = f(t) / S(t)
    f <- function(t) {
        S(t) * h(t)
    }
    ## F(t) cumulative distribution function (cdf): Complement of S(t), F(t) = 1 - S(t)
    ## (this local F masks R's shorthand F for FALSE only inside SurvGraph)
    F <- function(t) {
        1 - S(t)
    }
    ## Graphing with ggplot2
    ggplot(data = data.frame(x = xlim), aes(x)) +
        stat_function(fun = h, aes(color = "h")) +
        stat_function(fun = H, aes(color = "H")) +
        stat_function(fun = S, aes(color = "S")) +
        stat_function(fun = f, aes(color = "f")) +
        stat_function(fun = F, aes(color = "F")) +
        scale_x_continuous(name = "time", limits = xlim) +
        scale_y_continuous(name = "value", limits = ylim) +
        scale_color_manual(name = "functions",
                           values = c("h" = "black", "H" = "red", "S" = "green", "f" = "blue", "F" = "purple"),
                           breaks = c("h","H","S","f","F"),
                           labels = c("h(t)","H(t)","S(t)","f(t)","F(t)"))
}
e.g., death in a healthy young population. \( h(t) = 0.5 + 0 * t \) (the \( 0 * t \) term makes the function return a vector of the same length as its input, which integrate() requires). The survival curve is exponential.
h.constant <- function(t) 0.5 + 0 * t
SurvGraph(h = h.constant, xlim = c(0,5), ylim = c(0,2)) +
labs(title = "constant hazard: h(t) = 0.5")
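The claim that a constant hazard gives an exponential survival curve can be checked without plotting. The sketch below repeats the numerical integration used for \( H(t) \) and compares the resulting \( S(t) \) with the closed form \( e^{-0.5 t} \) and with base R's exponential distribution; the names h.const and H.num are introduced here only for the illustration.

```r
## Constant hazard, written so that it returns a vector (as integrate() requires)
h.const <- function(t) 0.5 + 0 * t

## Cumulative hazard by numerical integration from 0 to t
H.num <- Vectorize(function(t) integrate(h.const, lower = 0, upper = t)$value)

t <- c(0.5, 1, 2, 4)

S.num    <- exp(-H.num(t))                          # S(t) = exp(-H(t))
S.closed <- exp(-0.5 * t)                           # closed form for h(t) = 0.5
S.pexp   <- pexp(t, rate = 0.5, lower.tail = FALSE) # exponential survivor function

cbind(t, S.num, S.closed, S.pexp)
```

All three columns agree, so the hazard value 0.5 is simply the rate parameter of the exponential distribution.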
e.g., cancer patients not responding to treatment. \( h(t) = 0.3 * t \)
h.increasing <- function(t) 0.3 * t
SurvGraph(h = h.increasing, xlim = c(0,5), ylim = c(0,2)) +
labs(title = "increasing hazard: h(t) = 0.3t")
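For this linearly increasing hazard the integrals can be done by hand: \( H(t) = 0.15 t^2 \) and \( S(t) = e^{-0.15 t^2} \), which is a Weibull survivor function with shape 2 and scale \( 1 / \sqrt{0.15} \). The sketch below checks this against numerical integration; the names h.inc, H.num and wb.scale are introduced here only for the illustration.

```r
## Linearly increasing hazard
h.inc <- function(t) 0.3 * t

## Cumulative hazard by numerical integration from 0 to t
H.num <- Vectorize(function(t) integrate(h.inc, lower = 0, upper = t)$value)

t <- c(0.5, 1, 2, 4)

## Closed forms obtained by integrating 0.3 * u from 0 to t
H.closed <- 0.15 * t^2
S.closed <- exp(-H.closed)

## Equivalent Weibull: (t / scale)^2 = 0.15 * t^2, so scale = 1 / sqrt(0.15)
wb.scale  <- 1 / sqrt(0.15)
S.weibull <- pweibull(t, shape = 2, scale = wb.scale, lower.tail = FALSE)

cbind(t, H.num = H.num(t), H.closed, S.closed, S.weibull)
```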
e.g., survival following surgery. \( h(t) = 0.5 - 0.1 * t \). Because a hazard cannot be negative, this linear form is valid only for \( 0 \le t \le 5 \).
h.decreasing <- function(t) 0.5 - 0.1 * t
SurvGraph(h = h.decreasing, xlim = c(0,5), ylim = c(0,2)) +
labs(title = "decreasing hazard: h(t) = 0.5 - 0.1t")
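One consequence of this linear form is that the cumulative hazard stops growing at \( t = 5 \), so the survivor function never reaches zero within the plotted range. The sketch below works this out, using \( H(t) = 0.5 t - 0.05 t^2 \) obtained by integrating the hazard by hand; the names h.dec and H.num are introduced here only for the illustration.

```r
## Linearly decreasing hazard; non-negative only on 0 <= t <= 5
h.dec <- function(t) 0.5 - 0.1 * t

## Cumulative hazard by numerical integration from 0 to t
H.num <- Vectorize(function(t) integrate(h.dec, lower = 0, upper = t)$value)

t <- c(1, 3, 5)

## Closed form obtained by integrating 0.5 - 0.1 * u from 0 to t
H.closed <- 0.5 * t - 0.05 * t^2

## Survival probability at the end of the valid range: exp(-1.25)
S.at.5 <- exp(-H.num(5))

cbind(t, H.num = H.num(t), H.closed)
S.at.5
h.dec(6)   # negative, so the formula is not a valid hazard beyond t = 5
```

Under this toy hazard a little over a quarter of the population is still alive at \( t = 5 \), which is why the \( F(t) \) curve levels off below 1 in the plot.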
e.g., survival following tuberculosis infection (the risk of death rises early and falls later).
h.rise.fall <- function(t, scale = 2, shape = 1.5) {
(shape / scale) * (t / scale)^(shape - 1) * exp(-1 * (t / scale)^shape)
}
SurvGraph(h = h.rise.fall, xlim = c(0,5), ylim = c(0,2)) +
labs(title = "rising and falling hazard")
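The expression used for h.rise.fall has the same form as a Weibull probability density with shape 1.5 and scale 2, here used as a hazard rather than as a density. Under that reading, its integral \( H(t) \) is the corresponding Weibull distribution function, which never exceeds 1, so \( S(t) \) levels off near \( e^{-1} \) instead of falling to zero. The sketch below checks this numerically; H.num is a name introduced only for the illustration.

```r
## Same hazard as above: the Weibull density formula used as a hazard
h.rise.fall <- function(t, scale = 2, shape = 1.5) {
    (shape / scale) * (t / scale)^(shape - 1) * exp(-1 * (t / scale)^shape)
}

## Cumulative hazard by numerical integration from 0 to t
H.num <- Vectorize(function(t) integrate(h.rise.fall, lower = 0, upper = t)$value)

t <- c(1, 2, 5)

## The hazard coincides with dweibull(), so H(t) should match pweibull()
h.check <- dweibull(t, shape = 1.5, scale = 2)
H.check <- pweibull(t, shape = 1.5, scale = 2)

cbind(t, h = h.rise.fall(t), h.check, H.num = H.num(t), H.check)

## S(t) is bounded below by exp(-1) because H(t) is bounded above by 1
exp(-H.num(t))
exp(-1)
```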
e.g., lifespan of animals (more deaths at the extremes of age).
h.fall.rise <- function(t) 0.2 * (t - 1.5)^2 + 0.1
SurvGraph(h = h.fall.rise, xlim = c(0,5), ylim = c(0,2)) +
labs(title = "falling and rising hazard: h(t) = 0.2 * (t - 1.5)^2 + 0.1")
For other information: http://rpubs.com/kaz_yos/ If you find errors: kazky AT mac.com