Reliability analysis focuses on quantifying how long a system or component continues to perform its intended function before failure. The fundamental metric in this analysis is the reliability (or survival) function, defined as:
\[ R(t) = P(T > t), \]
where \(T\) is a random variable representing the lifetime of the unit.
In engineering and biomedical applications, lifetime data is rarely complete. It is frequently subject to right-censoring, meaning that for some units, the event of interest (failure) has not occurred by the end of the observation period.
To address this, the Kaplan-Meier (KM) estimator provides a robust, nonparametric estimate of the reliability function, denoted as \(\hat{R}(t)\).
While traditional reliability reports focus on the estimated curve \(\hat{R}(t)\) or construct confidence intervals using Greenwood’s variance formula, these methods typically produce pointwise intervals. A pointwise interval guarantees that the true reliability falls within the bounds at a specific time \(t\) with probability \(1-\alpha\).
However, engineering applications often require stronger guarantees. For example, when establishing warranty periods or safety certifications, analysts need to ensure that the entire reliability curve lies within a specific region. This necessitates the use of Simultaneous Confidence Bands (SCBs). Unlike pointwise intervals, SCBs provide a global envelope \([L(t), U(t)]\) such that:
\[ P\left( L(t) \le R(t) \le U(t) \ \text{for all } t \in [0, t_{\max}] \right) \approx 1 - \alpha. \]
The goal of this project is to construct and compare two distinct approaches for generating these global confidence bands:
This comparison will illustrate the trade-offs between empirical adaptability (Bootstrap) and theoretical rigor (Borell-TIS) in the presence of censored data.
To achieve the comparison outlined above, this project aims to:
cancer dataset
from the R survival package, serving as a realistic proxy
for component reliability data.The ultimate objective is to provide a clear recommendation on when to prefer computational resampling versus analytical Gaussian bounds for reliability estimation.
To estimate the reliability function \(R(t)\) in the presence of right-censoring, we utilize the Product-Limit (Kaplan-Meier) estimator.
Suppose \(n\) units are placed on test. For each unit \(i = 1, \dots, n\), we observe a pair \((X_i, \delta_i)\), where:
\[ X_i = \min(T_i, C_i), \quad \delta_i = \mathbf{1}\{T_i \le C_i\}. \]
Here, \(T_i\) represents the true lifetime of the unit, and \(C_i\) represents the censoring time (e.g., the end of the study or loss to follow-up). The indicator \(\delta_i\) equals 1 if the failure is observed and 0 if the observation is censored.
Let \(t_{(1)} < t_{(2)} < \dots < t_{(k)}\) denote the distinct ordered failure times observed in the sample. For each distinct failure time \(t_{(j)}\), we define:
The Kaplan-Meier estimator is defined as the product over failure times:
\[ \hat{R}(t) = \prod_{t_{(j)} \le t} \left( 1 - \frac{d_j}{n_j} \right). \]
For \(t < t_{(1)}\), \(\hat{R}(t) = 1\). The estimator is a step function that drops only at observed failure times.
To quantify the uncertainty of \(\hat{R}(t)\), we require an estimate of its variance. A standard large-sample approximation is provided by Greenwood’s formula:
\[ \widehat{\mathrm{Var}}[\hat{R}(t)] = \hat{R}(t)^2 \sum_{t_{(j)} \le t} \frac{d_j}{n_j (n_j - d_j)}. \]
Using this variance, we can construct a \((1-\alpha)\) pointwise confidence interval for \(R(t)\) at any fixed time \(t\):
\[ \hat{R}(t) \pm z_{\alpha/2} \sqrt{\widehat{\mathrm{Var}}[\hat{R}(t)]}, \]
where \(z_{\alpha/2}\) is the upper \(\alpha/2\) quantile of the standard normal distribution.
To evaluate the performance of the estimators, we first generate synthetic data where the true reliability function is known. This allows us to visually verify the accuracy of the Kaplan-Meier estimator and the coverage of the Greenwood pointwise intervals.
We simulate lifetimes from a Weibull distribution, a common model in reliability engineering for components exhibiting wear-out behavior.
The observed time is \(X_i = \min(T_i, C_i)\) and the status is \(\delta_i = \mathbf{1}\{T_i \le C_i\}\).
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
# library
library(survival)
# 1. set Seed
set.seed(2025)
# 2. parameters
n <- 500
shape_param <- 2
scale_param <- 100
# 3. generate Weibull distribution
T_true <- rweibull(n, shape = shape_param, scale = scale_param)
# 4. generate censoring times (Uniform). set max=150 to ensure a moderate percentage of censoring
C_times <- runif(n, min = 0, max = 150)
# 5. construct observed data X_i
obs_time <- pmin(T_true, C_times)
status <- as.numeric(T_true <= C_times) # indicator delta_i (1=failure, 0=censored)
# survival object
surv_obj <- Surv(obs_time, status)
# see the first 5 observations
head(surv_obj)
## [1] 55.77884 62.56528+ 81.55441 83.44385 49.80930 82.74532
Next, we fit the Kaplan-Meier estimator using the
survfit function. R also calculates the standard error
using Greenwood’s formula and produces
pointwise 95% confidence intervals.
# fit the Kaplan-Meier estimator
km_fit <- survfit(surv_obj ~ 1)
# define dynamic time points for the summary
# we pick 5 evenly spaced points based on the data range to inspect the fit
time_points <- pretty(c(0, max(obs_time)), n = 5)
time_points <- time_points[time_points > 0]
# summary
summary(km_fit, times = time_points)
## Call: survfit(formula = surv_obj ~ 1)
##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 50 261 99 0.751 0.0219 0.710 0.795
## 100 55 98 0.368 0.0310 0.312 0.434
We now visualize the estimated reliability curve \(\hat{R}(t)\). The solid line represents the KM estimate, and the dashed lines represent the pointwise intervals. Because this is a simulation, we can also overlay the true Weibull reliability curve (in red) to verify accuracy.
# plotting the KM Curve with pointwise CIs
plot(km_fit,
main = "Kaplan-Meier Estimate with Pointwise 95% CI",
xlab = "Time (t)",
ylab = "Reliability R(t)",
conf.int = TRUE,
col = "black",
lwd = 2,
las = 1)
# add in the TRUE theoretical Weibull curve for comparison
curve(pweibull(x, shape=shape_param, scale=scale_param, lower.tail=FALSE),
add=TRUE, col="red", lwd=2, lty=2)
legend("bottomleft",
legend = c("KM Estimate", "95% Pointwise CI", "True Weibull R(t)"),
col = c("black", "black", "red"),
lty = c(1, 2, 2),
lwd = c(2, 1, 2))
The KM estimate (step function) tracks the True Weibull curve (red dashed line) very closely. Although, we see as time jumps from \(t=50\) to \(t=100\) the standard error increase, hence the confidence band gets wider.
While pointwise intervals are useful for local estimation, they do not quantify the uncertainty of the survival curve as a whole. We seek Simultaneous Confidence Bands (SCBs) \([L(t), U(t)]\) such that:
\[ P\left( L(t) \le R(t) \le U(t) \ \text{for all } t \in [0, t_{\max}] \right) \approx 1 - \alpha. \]
The probability that the entire curve remains within the bounds is significantly lower than the probability that a single point remains within the bounds. Therefore, to maintain a 95% global confidence level, simultaneous bands must be wider than pointwise intervals.
We employ two methods to construct these bands.
The bootstrap method approximates the distribution of the global deviation statistic by resampling the observed data.
The Algorithm: 1. Resample: Draw \(B\) random samples of size \(n\) from the original data \(\{(X_i, \delta_i)\}\) with replacement. 2. Estimate: For each bootstrap sample \(b\), compute the Kaplan-Meier estimator \(\hat{R}^{*(b)}(t)\). 3. Supremum Statistic: Calculate the maximum absolute difference between the bootstrap estimate and the original estimate over the entire time range: \(T_b = \sup_{t} |\hat{R}^{*(b)}(t) - \hat{R}(t)\) 4. Threshold: Determine the critical value \(c_\alpha\) as the \((1-\alpha)\) empirical quantile of the statistics \(\{T_1, \dots, T_B\}\).
The resulting band is uniform in width: \[ L(t) = \hat{R}(t) - c_\alpha, \quad U(t) = \hat{R}(t) + c_\alpha. \]
We now derive the analytical bounds. This method relies on the asymptotic convergence of the normalized Kaplan-Meier error process.
Step 1: The Normalized Error Process
We define the normalized Kaplan-Meier error process as:
\[ G_n(t) = \sqrt{n}(\hat{R}(t) - R(t)) \]
By the Central Limit Theorem in function space, as \(n \to \infty\):
\[ G_n(t) \implies \mathcal{G}(t) \]
where \(\mathcal{G}(t)\) is a zero-mean Gaussian process. Under the hypothesis that \(\hat{R}(t)\) is the correct estimator, this process should fluctuate around zero. We seek a confidence band that covers the supremum (maximum absolute value) of this process with probability \(1-\alpha\).
Step 2: The Borell-TIS Inequality
The Borell-TIS inequality states that if \(\mathcal{G}(t)\) is a centered Gaussian process with maximum variance \(\sigma^2_{\max} = \sup_t \mathrm{Var}(\mathcal{G}(t))\), then for any large threshold \(u > 0\):
\[ P\left( \sup_{t} \mathcal{G}(t) - \mathbb{E}[\sup_t \mathcal{G}(t)] > u \right) \le \exp\left(-\frac{u^2}{2\sigma^2_{\max}}\right) \]
This theorem tells us that the maximum of the process concentrates sharply around its mean.
Step 3: Solving for the Threshold
We want to find a threshold \(u_\alpha\) such that the probability of exceeding it is at most \(\alpha\). Setting the exponential bound equal to \(\alpha\):
\[ \exp\left(-\frac{u_\alpha^2}{2\sigma^2_{\max}}\right) \le \alpha \]
Taking the natural log of both sides (and noting \(\ln(\alpha) < 0\)):
\[ -\frac{u_\alpha^2}{2\sigma^2_{\max}} \le \ln(\alpha) \]
Multiplying by \(-1\) reverses the inequality:
\[ \frac{u_\alpha^2}{2\sigma^2_{\max}} \ge -\ln(\alpha) = \ln(1/\alpha) \]
Solving for \(u_\alpha\), we get the smallest value satisfying the condition:
\[ u_\alpha = \sigma_{\max} \sqrt{2 \ln(1/\alpha)} \]
Step 4: Translating back to the Empirical Estimator
The derivation above applies to the normalized process \(G_n(t) = \sqrt{n}(\hat{R}(t) - R(t))\). To find the band width for the actual reliability function \(R(t)\), we must rescale by dividing by \(\sqrt{n}\).
\[ |\hat{R}(t) - R(t)| \le \frac{u_\alpha}{\sqrt{n}} \]
Substituting our solution for \(u_\alpha\):
\[ |\hat{R}(t) - R(t)| \le \frac{\sigma_{\max} \sqrt{2\ln(1/\alpha)}}{\sqrt{n}} \]
This yields the conservative simultaneous confidence bands:
\[ L(t) = \hat{R}(t) - \frac{\sigma_{\max} k_\alpha}{\sqrt{n}}, \quad U(t) = \hat{R}(t) + \frac{\sigma_{\max} k_\alpha}{\sqrt{n}} \]
where \(k_\alpha = \sqrt{2\ln(1/\alpha)}\). In practice, we estimate \(\sigma_{\max}\) using the maximum of the standard errors calculated via Greenwood’s formula.
We now calculate the global confidence bands using both methods described in the methodology.
# 1. Bootstrap CIs
B <- 1000 # numbers of samples
alpha <- 0.05
# we evaluate on the specific failure times observed in the original data
time_grid <- summary(km_fit)$time
R_hat <- summary(km_fit)$surv # KM estimate
# placeholder for maximum deviations
sup_deviations <- numeric(B)
# bootstrap Loop
for(b in 1:B) {
#resample data with replacement
indices <- sample(1:n, n, replace = TRUE)
boot_time <- obs_time[indices]
boot_status <- status[indices]
# then fit KM to bootstrap sample
km_boot <- survfit(Surv(boot_time, boot_status) ~ 1)
# now evaluate bootstrap KM on the exact same grid
R_boot <- summary(km_boot, times = time_grid, extend = TRUE)$surv
# finally calculate max absolute deviation for this loop
diffs <- abs(R_boot - R_hat)
sup_deviations[b] <- max(diffs, na.rm = TRUE)}
# Bootstrap critical value (c_alpha)
c_alpha_boot <- quantile(sup_deviations, 1 - alpha)
# 2. Borell-TIS CIs
# extract standard errors (Greenwood)
se_greenwood <- summary(km_fit)$std.err
# estimate maximum variance of the process: sigma^2_max = max(n * SE^2)
sigma_sq_max <- max(n * se_greenwood^2, na.rm = TRUE)
# calculate the threshold u_alpha:
u_alpha <- sqrt(sigma_sq_max) * sqrt(2 * log(1/alpha))
# calculate the Borell-TIS uniform half-width:
width_bt <- u_alpha / sqrt(n)
# 3. plot
# plot KM Estimate (Black)
plot(km_fit, conf.int = FALSE,
main = paste("Simultaneous Confidence Bands (n=", n, ")", sep=""),
xlab = "Time", ylab = "Reliability R(t)",
lwd = 2, las = 1)
# add Bootstrap bands (blue, dashed)
lines(km_fit$time, pmax(0, km_fit$surv - c_alpha_boot), col = "blue", lty = 2, lwd = 2)
lines(km_fit$time, pmin(1, km_fit$surv + c_alpha_boot), col = "blue", lty = 2, lwd = 2)
# add Borell-TIS bands (Red, Dotted)
lines(km_fit$time, pmax(0, km_fit$surv - width_bt), col = "red", lty = 3, lwd = 2)
lines(km_fit$time, pmin(1, km_fit$surv + width_bt), col = "red", lty = 3, lwd = 2)
legend("bottomleft",
legend = c("KM Estimate", "Bootstrap SCB", "Borell-TIS SCB"),
col = c("black", "blue", "red"),
lty = c(1, 2, 3), lwd = 2)
# print results for comparison
cat("Bootstrap Half-Width (c_alpha):", round(c_alpha_boot, 4), "\n")
## Bootstrap Half-Width (c_alpha): 0.11
cat("Borell-TIS Half-Width (u_alpha/sqrt(n)):", round(width_bt, 4), "\n")
## Borell-TIS Half-Width (u_alpha/sqrt(n)): 0.0838
The simulation for \(n=500\) yields the following insights regarding the reliability estimation and uncertainty quantification:
For the empirical application, we utilize the NCCTG Lung
Cancer dataset included in the R survival package.
This dataset contains survival times (in days) for patients with
advanced lung cancer from the North Central Cancer Treatment Group.
Although this is biomedical data, it is structurally identical to reliability data:
The dataset consists of 228 observations. While there are covariates like age and sex, our reliability analysis focuses on the time-to-event variables.
| Variable | Description | Role in Analysis |
|---|---|---|
| time | Survival time in days. | The lifetime variable (\(X_i\)). |
| status | Censoring status. | The event indicator (\(\delta_i\)). |
| age | Patient age in years. | Covariate (not used for univariate bands). |
| sex | Patient sex (1=Male, 2=Female). | Covariate. |
| ph.ecog | ECOG performance score (0-5). | Measure of functional capacity. |
We load the data and recode the status variable to align
with standard reliability notation (\(1=\) Failure, \(0=\) Censored).
Note: In the raw survival package data, Status is
coded as 1 (censored) and 2 (dead). We subtract 1 to convert this to the
standard 0/1 format.
# Load library and dataset
library(survival)
data(lung, package = "survival")
# rename dataset for modification
cancer_data <- lung
#View(cancer_data)
# Recode Status:
# original: 1 = Censored, 2 = Dead
# new: 0 = Censored, 1 = Dead (Failure)
cancer_data$status <- cancer_data$status - 1
# re-check dataset
#View(cancer_data)
# see the first 5 rows
head(cancer_data)
# only care about time and status
# check for missing values
na_time <- sum(is.na(cancer_data$time))
na_status <- sum(is.na(cancer_data$status))
cat("\nMissing Values in Time:", na_time, "\n")
##
## Missing Values in Time: 0
cat("Missing Values in Status:", na_status, "\n")
## Missing Values in Status: 0
Just like we did with the simulation, we start by fitting the standard Kaplan-Meier estimator and looking at the pointwise intervals. This gives us our baseline “survival curve” for the lung cancer patients.
We fit the Kaplan-Meier estimator to the empirical data. Note that we
define the formula as Surv(time, status) ~ 1, which treats
the entire dataset as a single population.
# 1. Fit the Kaplan-Meier Estimator to the cancer data
km_cancer <- survfit(Surv(time, status) ~ 1, data = cancer_data)
# 2. inspect the fit at certain time points
time_points_cancer <- pretty(c(0, max(cancer_data$time, na.rm=TRUE)), n = 5)
time_points_cancer <- time_points_cancer[time_points_cancer > 0]
summary(km_cancer, times = time_points_cancer)
## Call: survfit(formula = Surv(time, status) ~ 1, data = cancer_data)
##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 200 144 72 0.6803 0.0311 0.6219 0.744
## 400 57 54 0.3768 0.0358 0.3128 0.454
## 600 24 22 0.2136 0.0335 0.1571 0.290
## 800 8 15 0.0783 0.0246 0.0423 0.145
## 1000 2 2 0.0503 0.0228 0.0207 0.123
We visualize the survival curve with the standard pointwise 95% confidence intervals.
plot(km_cancer,
main = "Kaplan-Meier Estimate: Lung Cancer Survival",
xlab = "Time (Days)",
ylab = "Survival Probability",
conf.int = TRUE,
col = "black",
lwd = 2,
las = 1)
legend("topright",
legend = c("KM Estimate", "95% Pointwise CI"),
col = c("black", "black"),
lty = c(1, 2), lwd = 2)
High Mortality: by day 400, survival drops to ~37%. By day 1000, only ~5% of patients are expected to survive.
Declining Sample Size: we can see n.risk dropping rapidly (from 144 down to 2). This means our variance (uncertainty) will be high at the tail end, just like in the simulation.
We now construct the simultaneous confidence bands to quantify the global uncertainty of the lung cancer survival curve.
# parameters
n <- nrow(cancer_data) # n =228
B <- 1000 # Bootstrap iterations
alpha <- 0.05
# define the grid based on observed failure times in the real data
time_grid <- summary(km_cancer)$time
R_hat <- summary(km_cancer)$surv
# 2. Bootstrap construction
sup_deviations <- numeric(B)
for(b in 1:B) {
# Resample indices
indices <- sample(1:n, n, replace = TRUE)
# create bootstrap sample
boot_dat <- cancer_data[indices, ]
# fit KM to bootstrap sample
km_boot <- survfit(Surv(time, status) ~ 1, data = boot_dat)
R_boot <- summary(km_boot, times = time_grid, extend = TRUE)$surv
# calculate max deviation
diffs <- abs(R_boot - R_hat)
sup_deviations[b] <- max(diffs, na.rm = TRUE)
}
# Bootstrap multiplier
c_alpha_boot <- quantile(sup_deviations, 1 - alpha)
# 3. Borell-TIS construction
# extract Standard Errors from KM fit
se_greenwood <- summary(km_cancer)$std.err
# estimate maximum variance of the process
sigma_sq_max <- max(n * se_greenwood^2, na.rm = TRUE)
# calculate threshold u_alpha
u_alpha <- sqrt(sigma_sq_max) * sqrt(2 * log(1/alpha))
# calculate Uniform margin of error
width_bt <- u_alpha / sqrt(n)
# 4. Plot
plot(km_cancer, conf.int = FALSE,
main = "Simultaneous Bands: Lung Cancer Data",
xlab = "Time (Days)", ylab = "Survival Probability",
lwd = 2, las = 1)
# add Bootstrap bands (blue)
lines(km_cancer$time, pmax(0, km_cancer$surv - c_alpha_boot), col = "blue", lty = 2, lwd = 2)
lines(km_cancer$time, pmin(1, km_cancer$surv + c_alpha_boot), col = "blue", lty = 2, lwd = 2)
# add Borell-TIS bands (Red)
lines(km_cancer$time, pmax(0, km_cancer$surv - width_bt), col = "red", lty = 3, lwd = 2)
lines(km_cancer$time, pmin(1, km_cancer$surv + width_bt), col = "red", lty = 3, lwd = 2)
legend("topright",
legend = c("KM Estimate", "Bootstrap SCB", "Borell-TIS SCB"),
col = c("black", "blue", "red"),
lty = c(1, 2, 3), lwd = 2)
# compare
cat("Real Data - Bootstrap Width:", round(c_alpha_boot, 4), "\n")
## Real Data - Bootstrap Width: 0.0915
cat("Real Data - Borell-TIS Width:", round(width_bt, 4), "\n")
## Real Data - Borell-TIS Width: 0.0877
The application of both methods to the NCCTG lung cancer dataset (\(n=228\)) yielded results consistent with the simulation study:
This project set out to compare two methodologies for constructing simultaneous confidence bands for the reliability function under right-censoring: the data-driven Bootstrap method and the analytical Borell-TIS method (based on Gaussian process theory).
Accuracy of Pointwise Intervals: As demonstrated in the simulation, standard Greenwood pointwise intervals become increasingly unreliable at the tail of the distribution where the risk set is small. This confirms the necessity of simultaneous bands for making global guarantees about system performance.
Performance Comparison: Across both the Simulation Study (\(n=500\)) and the Real Data Analysis (\(n=228\)), the Borell-TIS method consistently produced narrower confidence bands compared to the Bootstrap approach.
Trade-offs:
For datasets with moderate to large sample sizes (\(n > 200\)), the Borell-TIS approach is highly recommended. It provides rigorous probabilistic guarantees with superior precision and computational speed. The Bootstrap remains a valuable alternative for smaller sample sizes or complex censoring patterns where asymptotic normality may be in doubt.