Parametric vs Non-Parametric
- Non-parametric models make no prior assumption about the shape or form of the underlying distribution.
- This leads to a relatively less smooth estimate but has no model risk
- Parametric models make an assumption about the shape or form of the underlying distribution
- This leads to a relatively smooth estimation but has significant model risk (risk of choosing the wrong underlying distribution)
Censoring and Truncation
There is often not full knowledge/information of a life under investigation which can impact estimates for the survival function. This is accounted for through the consideration of censored and truncated information
Censoring/Truncation point are often subject to randomness.
- \(C_i\), the time at which the ith observation is censored is a random variable
If \(C_i\) is random, independence of lifetimes \(T\) and censoring point \(C\) is sufficient for it to be non informative
- Informative censoring is when \(T\) and \(C\) are dependent, for example withdrawing from a life insurance policy can indicate a healthy individual (greater \(T\))
Censoring
Censoring gives partial information of the event of interest.
Type 1 Right Censoring
- Event of interest is observed only if it occures to some prior time \(C_R\)
- Lifetime \(T\) is only known if \(T\leq C_R\); the observation will be \(C_R\) if \(T>C_R\)
- Examples include:
- Investigation ending before all lives being observed dying
- Life insurance policyholder surrendering their policies
- Type 2 Right Censoring
- Observation continues until a predetermined number of events (failures) occurs
- Data then consists of only the r smallest lifetimes in a sample of n (order statistics)
- Left Censoring
- The event of interest has already occured before the observation starts
- It is only known that lifetime \(L\) is less than left censoring time \(C_L\)
- Interval Censoring
- The lifetime \(T\) is only known to occur within an interval
- For example we may only know the year of the event
Truncation
Truncation gives no information about the event
Only event times that lie within a certain observation period \((Y_L,Y_R)\) are observed, otherwise no information is available at all.
- Examples include:
- Insurance claim not communicated because deductible is not reached
- Right censoring occurs when estimating distribution of starts from the earth in that stars too far away are not visible
- Examples include:
Likelihood for Censored/Truncated Data
We make the assumption that censoring times \(C\) and lifetimes \(T\) are independent (and therefore uninformative)
The likelihood for an observation t:
Exact Lifetime: \[ f(t) \]
Right Censored: \[ S(C_R) = P(T>C_R)\]
Left Censored: \[1 - S(C_L) = 1-P(T>C_L)\]
Interval Censored: \[ [S(C_L) - S(C_R)] \]
Left Truncated: \[ \frac{\text{any of above}}{S(Y_L)}\]
Right Truncated: \[ \frac{\text{any of above}}{1-S(Y_R)}\]
Interval Truncated: \[ \frac{\text{any of above}}{S(Y_L)-S(Y_R)}\]
Non-Parametric Estimation of Survival Function
Notation
- Population of \(N\) lives
- Observe \(m\) deaths; \(N-m\) lives left over are right censored
- \(d_j\) is the number of deaths to occur at time \(t_j\) (noting more than 1 death can occur at a time)
- \(\sum d_j = m\)
- \(c_j\) is the number of lives right censored at a time belonging to \([t_j, t_{j+1})\)
- The times at which observations are censored within the interval \([t_j, t_{j+1})\) are \(t_{j1}, ..., t_{jc_j}\)
- \(n_j\) is the number of lives alive and at risk at time \(t_j^-\) (just before time \(t_j\))
- \(n_j = d_j + c_j + n_{j+1}\)
- The largest observed study time is \(t_max = \max(t_{k},t_{kc_k})\)
Discrete Hazard Function
\(F(t)\) corresponds to the positive probability masses at the points \(t_1 < t_2 < ... < t_k\)
The hazard function is \(\lambda_j = P[T=t_j|T\geq t_j]\) (probability a life dies at \(t_j\) given they live to \(t_j\))
The survival function is then:
\[ S(t) = 1-F(t) = \prod_{j:t_j\leq t}(1-\lambda_j)\]
Assumptions
Non-Informative censoring
Lives are independent
Each interval \((t_{j-1},t_j]\) has the likelihood:
\[ \underbrace{F(t_j)-F(t_j^-)}_{d_j\text{ deaths}}\underbrace{\prod^{c_j}_{l=1}[1-F(t_{jl})]}_{c_j\text{ censored}} \]
- \(1-F(t_{jl})\) is minimized when \(F(t_{jl} = F(t_{j-1})\), therefore the maximum likelihood estimate of \(F(t)\) is a step function
Kaplan-Meier (Product Limit) Estimator
- The likelihood form of the data is in binomial form:
\[ L = \prod^k_{j=1}\underbrace{\lambda_j^{d_j}}_{\text{Dead}}\underbrace{[1-\lambda_j]^{n_j-d_j}}_{\text{Live}}\]
- Therefore the maximum likelihood estimator is:
\[ \hat{\lambda}_j = \frac{d_j}{n_j}\]
- Therefore the Kaplan-Meier Estimate for the cumulative density and therefore survival function is:
\[ \hat{F}(t) = 1 - \prod_{j;t_j\leq t}(1-\hat{\lambda})\] \[ \hat{S}(t) = \begin{cases} 1,\text{ if }t<t_1 \\ \prod_{j;t_j\leq t}(1-\frac{d_j}{n_j}),\text{ if } t_1\leq t\leq t_{\text{max}}\end{cases}\]
- If the last life is censored, aka \(t_{\text{max}} = t_{kc_k}\), the value of \(S(t)\) for \(t>t_{\max}\) is undetermined
- Can assume that all survivors after \(t_\max\) die immediately, therefore \(\hat{S}(t) = 0, t>t_\max\)
- Can assume all survivors after \(t_\max\) survive forever, therefore \(\hat{S}(t) = \hat{S(t_\max)} = \hat{S}(t_k), t>t_\max\)
- Could also do something in the middle of these two extremes, like a linear extrapolation from \(t_max\) to \(\omega\)
- Variance of Kaplan-Meier is calculated through Greenwood’s formula:
\[ Var(\hat{F}(t)) = Var(\hat{S}(t)) \approx \left[1-\hat{F}(t)\right]^2\sum_{j:t_j\leq t}\frac{d_j}{n_j(n_j-d_j)}\]
- Noting that maximum likelihood estimators are asymptotically normally distributed, the confidence interval for the KM estimate is:
- Note that CI is bounded by \([0,1]\)
\[ \hat{S}(t) \pm Z_{1-\frac{\alpha}{2}}\sqrt{Var(\hat{S}(t))} \]
Nelson-Aalen Estimator
- Another non-parametric approach which is based on the cumulative hazard function:
\[ \Lambda_t = \int^t_0\mu_s ds+ \underbrace{\sum_{j:t_j\leq t}m_j}_{\text{deaths}}\]
- Note that \(\hat{F}(t)\) is discrete, therefore we consider only \(\sum_{j:t_j\leq t}m_j\):
\[ \hat{\Lambda}_t = \sum_{j:t_j\leq t}\hat{\lambda}_j = \sum_{j:t_j\leq t}\frac{d_j}{n_j} \]
- As this is an estimate for the cumulative hazard, to get the estimate of the surival function:
\[ \hat{S}(t) = e^{-\hat\Lambda_t}\]
- The variance of the NA estimator \(\hat\Lambda_t\) is approximated as:
\[ Var(\tilde{\Lambda}(t))\approx \sum_{j:t_j\leq t}\frac{d_j(n_j-d_j)}{n^3_j}\]
- Noting the NA estimator is a maximum likelihood estimator, we can derive a confidence interval:
\[ \hat{\Lambda}(t) \pm Z_{1-\frac{\alpha}{2}}\sqrt{Var(\hat{\Lambda}(t))} \]
Comparison of Estimators
- By Taylor expansion and noting \(ln(1-x)<-x\) for small x, we can show:
\[ \hat{S}_{KM}(t) < \hat{S}_{NA}(t),\quad\text{for }t_1\leq t\leq t_\max\]
Comparing Survival Functions
In many populations we want to test if two different survival functions are ‘different’ (in the statistical sense)
- Such as when investigating smokers vs non-smokers
As there is a 1-1 relationship between survival function and hazard rate we can test between the difference in hazard rates.
The following hypothesis is tested:
- Note this can be generalsied to more hazard rates
\[ H_0: h_1(t) = h_2(t);\quad \forall t\leq\tau\]
\[ H_1:\text{ At least one of the hazard differ from } h_2(t)\]
- The general form of the test statistic is:
\[ Z_1 = \sum^k_{j=1}\tilde{w}(t_j)\left[\frac{d_{1j}}{n_{1j}}-\frac{d_j}{n_j}\right]\]
Where:
- \(\tilde{w}\) is a positive weight function
- \(t\) are the distinct death times
- \(d_{1j}\) are the number of deaths in Group 1 at time \(t_j\)
- \(n_{1j}\) is the number at risk prior to time \(t_j\) in group 1
- \(n_j\) are the total number at risk prior to time \(t_j\)
Weight function is typically in the form \(\tilde{w}(t_j)=w(t_j)n_{1j}\)
Therefore the test statistic becomes:
\[ Z_1 = \sum^k_{j=1}w(t_j)[d_{1j}-e_{1j}]\]
- Where \(e_{1j}\) is the expected number of deaths if the groups behaved the same:
\[ e_{1j} = n_{1j}\left(\frac{d_j}{n_j}\right)\]
- With variance:
\[ Var(Z_1) = \sum^k_{j=1}(w(t_j))^2\left(\frac{n_{1j}}{n_j}\right)\left(1-\frac{n_{1j}}{n_j}\right)\left(\frac{n_j-d_j}{n_j-1}\right)d_j \]
- Under the null hypothesis the following test statistic is a chi squared rv with 1 degree of freedom for large samples
\[ \chi^2=\frac{Z_1^2}{Var(Z_1)} \]
Common special cases ae based on the choice of the weight for the test statistic \(w(t_j)\)
- Log-Rank - \(w(t_j) = 1\)
- Powerful for detecting differences in hazard rates when they are proportional \((h_1(t) = rh_2(t))\)
- Peto-Peto - \(w(t_j) = \hat{S}_{KM}(t_j)\)
- Wilcoxon - \(w(t_j) = n_j\)
- Gives more weight to early observations (higher \(n_j\))
- Log-Rank - \(w(t_j) = 1\)