Non-Parametric Survival Estimation

Jake Warby

31/03/2022

Parametric vs Non-Parametric

  • Non-parametric models make no prior assumption about the shape or form of the underlying distribution.
    • This leads to a relatively less smooth estimate but has no model risk
  • Parametric models make an assumption about the shape or form of the underlying distribution
    • This leads to a relatively smooth estimation but has significant model risk (risk of choosing the wrong underlying distribution)

Censoring and Truncation

  • There is often not full knowledge/information of a life under investigation which can impact estimates for the survival function. This is accounted for through the consideration of censored and truncated information

  • Censoring/Truncation point are often subject to randomness.

    • \(C_i\), the time at which the ith observation is censored is a random variable
  • If \(C_i\) is random, independence of lifetimes \(T\) and censoring point \(C\) is sufficient for it to be non informative

    • Informative censoring is when \(T\) and \(C\) are dependent, for example withdrawing from a life insurance policy can indicate a healthy individual (greater \(T\))

Censoring

  • Censoring gives partial information of the event of interest.

  • Type 1 Right Censoring

    • Event of interest is observed only if it occures to some prior time \(C_R\)
    • Lifetime \(T\) is only known if \(T\leq C_R\); the observation will be \(C_R\) if \(T>C_R\)
    • Examples include:
      • Investigation ending before all lives being observed dying
      • Life insurance policyholder surrendering their policies

  • Type 2 Right Censoring
    • Observation continues until a predetermined number of events (failures) occurs
    • Data then consists of only the r smallest lifetimes in a sample of n (order statistics)
  • Left Censoring
    • The event of interest has already occured before the observation starts
    • It is only known that lifetime \(L\) is less than left censoring time \(C_L\)

  • Interval Censoring
    • The lifetime \(T\) is only known to occur within an interval
    • For example we may only know the year of the event

Truncation

  • Truncation gives no information about the event

  • Only event times that lie within a certain observation period \((Y_L,Y_R)\) are observed, otherwise no information is available at all.

    • Examples include:
      • Insurance claim not communicated because deductible is not reached
      • Right censoring occurs when estimating distribution of starts from the earth in that stars too far away are not visible

Likelihood for Censored/Truncated Data

  • We make the assumption that censoring times \(C\) and lifetimes \(T\) are independent (and therefore uninformative)

  • The likelihood for an observation t:

    • Exact Lifetime: \[ f(t) \]

    • Right Censored: \[ S(C_R) = P(T>C_R)\]

    • Left Censored: \[1 - S(C_L) = 1-P(T>C_L)\]

    • Interval Censored: \[ [S(C_L) - S(C_R)] \]

    • Left Truncated: \[ \frac{\text{any of above}}{S(Y_L)}\]

    • Right Truncated: \[ \frac{\text{any of above}}{1-S(Y_R)}\]

    • Interval Truncated: \[ \frac{\text{any of above}}{S(Y_L)-S(Y_R)}\]

Non-Parametric Estimation of Survival Function

Notation

  • Population of \(N\) lives
  • Observe \(m\) deaths; \(N-m\) lives left over are right censored
  • \(d_j\) is the number of deaths to occur at time \(t_j\) (noting more than 1 death can occur at a time)
  • \(\sum d_j = m\)
  • \(c_j\) is the number of lives right censored at a time belonging to \([t_j, t_{j+1})\)
  • The times at which observations are censored within the interval \([t_j, t_{j+1})\) are \(t_{j1}, ..., t_{jc_j}\)
  • \(n_j\) is the number of lives alive and at risk at time \(t_j^-\) (just before time \(t_j\))
  • \(n_j = d_j + c_j + n_{j+1}\)
  • The largest observed study time is \(t_max = \max(t_{k},t_{kc_k})\)

Discrete Hazard Function

  • \(F(t)\) corresponds to the positive probability masses at the points \(t_1 < t_2 < ... < t_k\)

  • The hazard function is \(\lambda_j = P[T=t_j|T\geq t_j]\) (probability a life dies at \(t_j\) given they live to \(t_j\))

  • The survival function is then:

\[ S(t) = 1-F(t) = \prod_{j:t_j\leq t}(1-\lambda_j)\]

Assumptions

  • Non-Informative censoring

  • Lives are independent

  • Each interval \((t_{j-1},t_j]\) has the likelihood:

\[ \underbrace{F(t_j)-F(t_j^-)}_{d_j\text{ deaths}}\underbrace{\prod^{c_j}_{l=1}[1-F(t_{jl})]}_{c_j\text{ censored}} \]

  • \(1-F(t_{jl})\) is minimized when \(F(t_{jl} = F(t_{j-1})\), therefore the maximum likelihood estimate of \(F(t)\) is a step function

Kaplan-Meier (Product Limit) Estimator

  • The likelihood form of the data is in binomial form:

\[ L = \prod^k_{j=1}\underbrace{\lambda_j^{d_j}}_{\text{Dead}}\underbrace{[1-\lambda_j]^{n_j-d_j}}_{\text{Live}}\]

  • Therefore the maximum likelihood estimator is:

\[ \hat{\lambda}_j = \frac{d_j}{n_j}\]

  • Therefore the Kaplan-Meier Estimate for the cumulative density and therefore survival function is:

\[ \hat{F}(t) = 1 - \prod_{j;t_j\leq t}(1-\hat{\lambda})\] \[ \hat{S}(t) = \begin{cases} 1,\text{ if }t<t_1 \\ \prod_{j;t_j\leq t}(1-\frac{d_j}{n_j}),\text{ if } t_1\leq t\leq t_{\text{max}}\end{cases}\]

  • If the last life is censored, aka \(t_{\text{max}} = t_{kc_k}\), the value of \(S(t)\) for \(t>t_{\max}\) is undetermined
    • Can assume that all survivors after \(t_\max\) die immediately, therefore \(\hat{S}(t) = 0, t>t_\max\)
    • Can assume all survivors after \(t_\max\) survive forever, therefore \(\hat{S}(t) = \hat{S(t_\max)} = \hat{S}(t_k), t>t_\max\)
    • Could also do something in the middle of these two extremes, like a linear extrapolation from \(t_max\) to \(\omega\)
  • Variance of Kaplan-Meier is calculated through Greenwood’s formula:

\[ Var(\hat{F}(t)) = Var(\hat{S}(t)) \approx \left[1-\hat{F}(t)\right]^2\sum_{j:t_j\leq t}\frac{d_j}{n_j(n_j-d_j)}\]

  • Noting that maximum likelihood estimators are asymptotically normally distributed, the confidence interval for the KM estimate is:
    • Note that CI is bounded by \([0,1]\)

\[ \hat{S}(t) \pm Z_{1-\frac{\alpha}{2}}\sqrt{Var(\hat{S}(t))} \]

Nelson-Aalen Estimator

  • Another non-parametric approach which is based on the cumulative hazard function:

\[ \Lambda_t = \int^t_0\mu_s ds+ \underbrace{\sum_{j:t_j\leq t}m_j}_{\text{deaths}}\]

  • Note that \(\hat{F}(t)\) is discrete, therefore we consider only \(\sum_{j:t_j\leq t}m_j\):

\[ \hat{\Lambda}_t = \sum_{j:t_j\leq t}\hat{\lambda}_j = \sum_{j:t_j\leq t}\frac{d_j}{n_j} \]

  • As this is an estimate for the cumulative hazard, to get the estimate of the surival function:

\[ \hat{S}(t) = e^{-\hat\Lambda_t}\]

  • The variance of the NA estimator \(\hat\Lambda_t\) is approximated as:

\[ Var(\tilde{\Lambda}(t))\approx \sum_{j:t_j\leq t}\frac{d_j(n_j-d_j)}{n^3_j}\]

  • Noting the NA estimator is a maximum likelihood estimator, we can derive a confidence interval:

\[ \hat{\Lambda}(t) \pm Z_{1-\frac{\alpha}{2}}\sqrt{Var(\hat{\Lambda}(t))} \]

Comparison of Estimators

  • By Taylor expansion and noting \(ln(1-x)<-x\) for small x, we can show:

\[ \hat{S}_{KM}(t) < \hat{S}_{NA}(t),\quad\text{for }t_1\leq t\leq t_\max\]

Comparing Survival Functions

  • In many populations we want to test if two different survival functions are ‘different’ (in the statistical sense)

    • Such as when investigating smokers vs non-smokers
  • As there is a 1-1 relationship between survival function and hazard rate we can test between the difference in hazard rates.

  • The following hypothesis is tested:

    • Note this can be generalsied to more hazard rates

\[ H_0: h_1(t) = h_2(t);\quad \forall t\leq\tau\]

\[ H_1:\text{ At least one of the hazard differ from } h_2(t)\]

  • The general form of the test statistic is:

\[ Z_1 = \sum^k_{j=1}\tilde{w}(t_j)\left[\frac{d_{1j}}{n_{1j}}-\frac{d_j}{n_j}\right]\]

  • Where:

    • \(\tilde{w}\) is a positive weight function
    • \(t\) are the distinct death times
    • \(d_{1j}\) are the number of deaths in Group 1 at time \(t_j\)
    • \(n_{1j}\) is the number at risk prior to time \(t_j\) in group 1
    • \(n_j\) are the total number at risk prior to time \(t_j\)
  • Weight function is typically in the form \(\tilde{w}(t_j)=w(t_j)n_{1j}\)

  • Therefore the test statistic becomes:

\[ Z_1 = \sum^k_{j=1}w(t_j)[d_{1j}-e_{1j}]\]

  • Where \(e_{1j}\) is the expected number of deaths if the groups behaved the same:

\[ e_{1j} = n_{1j}\left(\frac{d_j}{n_j}\right)\]

  • With variance:

\[ Var(Z_1) = \sum^k_{j=1}(w(t_j))^2\left(\frac{n_{1j}}{n_j}\right)\left(1-\frac{n_{1j}}{n_j}\right)\left(\frac{n_j-d_j}{n_j-1}\right)d_j \]

  • Under the null hypothesis the following test statistic is a chi squared rv with 1 degree of freedom for large samples

\[ \chi^2=\frac{Z_1^2}{Var(Z_1)} \]

  • Common special cases ae based on the choice of the weight for the test statistic \(w(t_j)\)

    • Log-Rank - \(w(t_j) = 1\)
      • Powerful for detecting differences in hazard rates when they are proportional \((h_1(t) = rh_2(t))\)
    • Peto-Peto - \(w(t_j) = \hat{S}_{KM}(t_j)\)
    • Wilcoxon - \(w(t_j) = n_j\)
      • Gives more weight to early observations (higher \(n_j\))