The Nelson-Aalen estimator

The Nelson-Aalen estimator (Nelson 1969) (Nelson 1972) (Aalen 1978) is a non-parametric estimator of the cumulative hazard function based on a sample of times to event subject to right-censoring. The estimator is defined by (see also):

\[ {\tilde {H}}(t)=\sum _{{t_{i}\leq t}}{\frac {n_{i}}{r_{i}}}, \]

where \(n_{i}\) is the number of events at time \(t_{i}\) and \(r_{i}\) is the number of individuals at risk just prior to time \(t_{i}\).

An estimator of the survival function can be obtained by using \(\tilde{S}(t) = \exp\left\{ -\tilde{H}(t) \right\}\).

R code

The following R code shows how to calculate the Nelson-Aalen estimator for a sample containing right-censored observations. Three estimators are presented:

  1. A direct implementation of the Nelson-Aalen estimator (contained in the “routines.R” source). This allows the user to explore the steps in the implementation of the Nelson-Aalen estimator. (Disclaimer: this is just one possible strategy for the implementation, I am sure it can be improved!)

  2. Nelson-Aalen estimator implemented in the survival R package.

  3. Nelson-Aalen estimator implemented in the mice R package.

Data preparation

rm(list=ls())

# Required packages and sources
library(spBayesSurv)
library(survival)
library(mice)
## 
## Attaching package: 'mice'
## The following object is masked from 'package:stats':
## 
##     filter
## The following objects are masked from 'package:base':
## 
##     cbind, rbind
# https://github.com/FJRubio67/SurvivalTools
source("routinesNA.R")


data("LeukSurv")

# Times to event
time <- LeukSurv$time/365.24
# Vital status indicators
status <- LeukSurv$cens

Estimators

########################################################################################
# Nelson-Aalen estimator: direct implementation
########################################################################################

naest0 <- NA_Est(time, status)

head(naest0)
##            times       CHaz
## [1,] 0.000000000 0.00000000
## [2,] 0.002737926 0.02492809
## [3,] 0.005475851 0.04656034
## [4,] 0.008213777 0.06364577
## [5,] 0.010951703 0.07591571
## [6,] 0.013689629 0.08730287
# Function
NAEst <- Vectorize(function(t){
  out <- stepfun(x = naest0[-1,1], y = naest0[,2], f = 0  )(t)
  return( out )
})

curve(NAEst,0, max(time), xlab = "Time (Years)", ylab = "Cumulative Hazard", n = 1000,
     cex.lab = 1.5, cex.axis = 1.5, lwd = 2, col = "black")

########################################################################################
# Nelson-Aalen estimator: using the `survival` R package
########################################################################################

fit <- survfit(Surv(time, status) ~ 1)

naest1 <- cumsum(fit$n.event/fit$n.risk)


########################################################################################
# Nelson-Aalen estimator: using the `mice` R package
########################################################################################

naest2 <- nelsonaalen(LeukSurv, time, cens)

Comparisons

#---------------------------------------------------------------------------------
# Comparison: cumulative hazard functions
#---------------------------------------------------------------------------------

plot(naest0, xlab = "Time (Years)", ylab = "Cumulative Hazard",
     cex.lab = 1.5, cex.axis = 1.5, lwd = 2, col = "black", type = "s")
points(fit$time, naest1, type = "s", col = "red")
points(x = LeukSurv$time/365.24, y = naest2, type = "s", col = "blue")
legend("bottomright", legend = c("Direct", "survival", "mice"), lwd = c(2,2,2),
       lty = c(1,1,1), col = c("black", "red", "blue"))

#---------------------------------------------------------------------------------
# Comparison: Survival functions
#---------------------------------------------------------------------------------

plot(naest0[,1], exp(-naest0[,2]), xlab = "Time (Years)", ylab = "Survival",
     cex.lab = 1.5, cex.axis = 1.5, lwd = 2, col = "black", type = "s")
points(fit$time, exp(-naest1), type = "s", col = "red")
points(x = LeukSurv$time/365.24, y = exp(-naest2), type = "s", col = "blue")
legend("topright", legend = c("Direct", "survival", "mice"), lwd = c(2,2,2),
       lty = c(1,1,1), col = c("black", "red", "blue"))

Aalen, O. 1978. “Nonparametric Inference for a Family of Counting Processes.” The Annals of Statistics, 701–26.
Nelson, W. 1969. “Hazard Plotting for Incomplete Failure Data.” Journal of Quality Technology 1 (1): 27–52.
———. 1972. “Theory and Applications of Hazard Plotting for Censored Failure Data.” Technometrics 14 (4): 945–66.