Introduction

One of the features of survival (time to event) data is the possibility of censoring and/or truncation.

Censoring is where we do not observe an event time exactly, but we do know that it lies above or below some bound. Truncation is where observations outside some range never enter the data at all, by virtue of the data-generating process. For example, a signal detector might only activate above a certain threshold. Signals below this level would be truncated because we would never know they even existed. Similarly, a car insurance company would never know about accidents where the damage amounts to less than the minimum claimable amount.

Right censoring

Of the various forms of censoring and truncation, right censoring is the most commonly described (and perhaps the most commonly encountered). In right censoring, we know that the event (e.g. death) will happen, but it might not do so before we stop observing. So, observations are censored on the right. There are two classical types of right censoring. Type I is where we observe a process for a fixed period of time. Formally, our interest is in the random variable \(T_i\), the event time for individual \(i\). However, as we only make observations for a fixed period of time, \(c\), we see \(T_i\) only if \(T_i < c\). In other words, we observe \((U_i, \delta_i)\) where \(U_i = \min(T_i, c)\) and \(\delta_i = I(T_i < c)\). That is, we observe the minimum of the event time and the censoring time, together with an indicator of whether the event happened (\(\delta_i = 1\)) or the observation was censored (\(\delta_i = 0\)).
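To make the \((U_i, \delta_i)\) notation concrete, here is a minimal Python sketch of type I censoring. The exponential event times, the seed, and the function name `type_one_censor` are assumptions chosen purely for illustration.

```python
import random

def type_one_censor(event_times, c):
    """Return a list of (U_i, delta_i) pairs under a fixed censoring time c."""
    observed = []
    for t in event_times:
        u = min(t, c)              # U_i = min(T_i, c)
        delta = 1 if t < c else 0  # delta_i = I(T_i < c)
        observed.append((u, delta))
    return observed

rng = random.Random(1)
# Hypothetical true event times: exponential with mean 10 (an assumption).
event_times = [rng.expovariate(0.1) for _ in range(8)]

# Observe everyone for a fixed period c = 12.
observed = type_one_censor(event_times, c=12.0)
for u, delta in observed:
    print(f"U = {u:6.2f}  delta = {delta}")
```

Every censored record has \(U_i = c\) and \(\delta_i = 0\). The true \(T_i\) is hidden, but we still know it is at least \(c\).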

Type II censoring is where we observe the process until there have been a designated number of failures. In this case, the number of censored values is fixed in advance.
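As a rough sketch of type II censoring, the Python below stops observation at the \(r\)-th failure and censors every unit still running at that time. The function name `type_two_censor` and the example times are hypothetical, and the sketch assumes no tied event times.

```python
def type_two_censor(event_times, r):
    """Observe until the r-th failure; units still running are censored then."""
    if not 1 <= r <= len(event_times):
        raise ValueError("r must be between 1 and the number of units")
    stop_time = sorted(event_times)[r - 1]  # time of the r-th failure
    # Units failing at or before stop_time are events; the rest are censored.
    return [(min(t, stop_time), 1 if t <= stop_time else 0) for t in event_times]

# Hypothetical failure times for five units; stop after the third failure.
times = [5.0, 1.0, 9.0, 3.0, 7.0]
observed = type_two_censor(times, r=3)
n_censored = sum(1 for _, delta in observed if delta == 0)
print(observed)
print("censored:", n_censored)
```

With \(n\) units and \(r\) designated failures, the number of censored values is always \(n - r\). What is random here is the stopping time, whereas under type I censoring the stopping time is fixed and the number of censored values is random.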

Another form of censoring, random censoring, is what usually happens in a clinical trial. Instead of a fixed \(c\) as in type I censoring, each participant has their own random censoring time \(c_i\). For example, a participant may move away from the area two weeks in without informing the trial group, may drop out for no particular reason, or may stay in the trial to the end without having an event.
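The random censoring scheme can be sketched in Python as follows. The 52-week trial length, the exponential event and drop-out times, and all names are assumptions for illustration only. Drop-out times are drawn independently of event times, so the censoring in this sketch carries no information about the event times.

```python
import random

def random_censor(event_times, censor_times):
    """Pair each T_i with its own c_i and return (U_i, delta_i) tuples."""
    return [(min(t, c), 1 if t < c else 0)
            for t, c in zip(event_times, censor_times)]

rng = random.Random(2)
n = 6
study_end = 52.0  # hypothetical trial length in weeks

# Hypothetical event and drop-out times, drawn independently of each other.
event_times = [rng.expovariate(1 / 40) for _ in range(n)]
dropout_times = [rng.expovariate(1 / 80) for _ in range(n)]

# Each participant is censored at drop-out or at the end of the trial,
# whichever comes first.
censor_times = [min(d, study_end) for d in dropout_times]

observed = random_censor(event_times, censor_times)
for u, delta in observed:
    print(f"U = {u:6.2f}  delta = {delta}")
```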

It is important to note that censoring is assumed to be non-informative. That is, knowing the censoring time tells us nothing about what the failure time will be. All we know is that \(T > c\). This assumption can fail. If someone is censored because they are too sick to tolerate the intervention medicine in a trial looking at all-cause mortality, then that participant’s censoring time may well be informative of imminent death.

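The other forms of censoring are left censoring, where we know only that the event happened before some time, and interval censoring, where we know only that it happened between two observation times.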

So, censored observations still provide some information, and we therefore need to figure out how to include them in any analyses.