W. Q. Meeker and L. A. Escobar
20 May 2020
Statistical methods, based on the binomial distribution, to estimate \(F(t)\) from interval & singly right-censored data, without assuming an underlying parametric distribution.
Standard errors of the nonparametric estimator and approximate confidence intervals for \(F(t)\).
Methods to extend nonparametric estimation to allow for combinations of interval-censored & multiply right-censored data.
The Kaplan-Meier nonparametric estimator for data with observations reported as exact failure times
A generalized nonparametric estimator of \(F(t)\) for arbitrary censoring
At the beginning of any statistical study, it’s best to allow the data to speak for itself - before attempting to fit the data with a parametric distribution
Chapter 3 outlines methods to compute nonparametric estimates for
\[ \widehat{F}(t), \widehat{S}(t), se(\widehat{F}(t)) \]
The goal will be to compute \((1-\alpha)\times 100\%\) confidence intervals on point estimates for the true, but unknown, values
First, we’ll work with singly censored data before extending to multiply censored data and then to data with arbitrary censoring
Can be used when we don’t know the underlying distribution governing the failure process that produced the data
Are commonly used in survivability analyses (biostatistics and actuarial sciences)
Are gaining traction in reliability studies
Can simplify analyses by quickly revealing poor-fitting parametric models
Can be used as a benchmark to compare accuracy of parametric analyses
Cannot be used for prediction outside the range of observed data
Tests with only one censoring event occurring at time \(t_c\)
For a
A right censored observation results when the observational period ends before the event of interest occurs for a test unit
Product-to-market schedule limits the allocated test time for a new mobile device
Non-safety critical aircraft components replaced at each depot cycle whether or not they have failed
par(mar = c(0,0,0,0)) ; set.seed(42)
plot(NA,
axes = F,
xlab = "",
ylab = "",
xlim = range(0,105),
ylim = range(16,18))
segments(x0 = c(0,70),
y0 = c(18,18),
x1 = c(0,70),
y1 = c(17,17),
lwd = 2)
arrows(x0 = 0,
y0 = 17.5,
x1 = 100,
y1 = 17.5,
lwd = 2)
text(x = c(0,70,104),
y = c(16.8,16.8,17.5),
labels = c(0,
expression(t[c]),
expression(infinity)),
cex=c(1.5,1.5,2))
text(x = c(runif(20,5,65),runif(5,75,95)),
y = c(rep(17.5,25)),
labels = c(rep("x",20),rep("?",5)),
col = c(rep(1,20),rep(2,5)),
cex = 2)For a
Left censored observations can result when it is difficult or expensive to inspect samples, if the test was set up improperly, or if an unknown flaw causes an infant mortality failure
Tests conducted in remote locations (polar regions, space, desert)
Inspection requires extensive maintenance to gain access
par(mar = c(0,0,0,0)) ; set.seed(42)
plot(NA,
axes = FALSE,
xlab = "",
ylab = "",
xlim = c(0,105),
ylim = c(16,18))
segments(x0 = c(0,70),
y0 = c(18,18),
x1 = c(0,70),
y1 = c(17,17),
lwd = 2)
arrows(x0 = 0,
y0 = 17.5,
x1 = 100,
y1 = 17.5,
lwd = 2)
text(x = c(0,70,104,runif(25,5,65)),
y = c(16.8,16.8,17.5,rep(17.5,25)),
labels = c(0,expression(t[c]),
expression(infinity),
rep("?",25)),
cex = c(1.5,1.5,rep(2,26)),
col = c(1,1,rep('blue',26)))Motivation
This example introduce the concept of estimating failure probability from
Recall the heatexchanger data set presented in Example 1.5
Suppose we inspect \(100\) heat exchanger tubes at Plant 1 only, and assign each tube a value \(d_i\), where \[ d_i=\begin{cases}1&\mbox{if the tube has failed}\\\\ 0&\mbox{if the tube has not failed}\end{cases} \]
Figure 3.1 shows the number of failures observed during each of the three years of operation
par(family = 'serif', font = 2)
plot(NA,
axes = FALSE,
xlab = '',
ylab = '',
xlim = range(-50,350),
ylim = range(-10,150))
segments( x0 = c(0,0,0,100,200,300),
y0 = c(50,100,50,20,20,20),
x1 = c(350,350,0,100,200,300),
y1 = c(50,100,100,100,100,100),
lwd = c(rep(2,6)))
text(x = rep(-35,6),
y = c(seq(27,127,50),seq(17,117,100),6),
labels = c('Unconditional',
'Plant 1',
'100 tubes',
'Failure',
'at start',
'Probability'),
cex = c(1,1.15,1,1,1,1))
text(x = c(seq(50,250,100),325),
y = rep(20,4),
labels = c(expression(pi[1]),
expression(pi[2]),
expression(pi[3]),
expression(pi[4])),
cex = 1.5)
text(x = seq(50,250,100),
y = rep(108,3),
labels = c('Year 1','Year 2','Year 3'),
cex = rep(1.15,3))
text(x = c(150,320,320),
y = c(rep(140,2),130),
labels = c('Cracked tubes','Uncracked','tubes'),
cex = rep(1,3))
text(x = c(seq(50,250,100),325),
y = rep(77,4),
labels = c('1','2','2','95'),
cex = rep(1.5,4))Figure 3.1 - Diagram of the Plant 1 data from the heatexchanger data set
Data set subset(SMRD::heatexchanger, plant=='Plant1')
The inspection data are \(100\) binary observations from a Bernoulli RV
The Bernoulli parameter \(p \in [0,1]\) represents the probability of observing a “success”
The trials are mutually independent (i.e. The probability of “success” is the same for each observation )
The outcome of this inspection procedue (the total number of observed “successes”) is a single observation from a binomial RV with parameters \(n, p\)
Recall, the binomial distribution models the probability of observing \(x\) successes in \(n\) trials where the probability of success for each trial is \(p\)
The probability mass function for a \(BIN(n,p)\) random variable is
\[ f(x;p,n) = \left( \begin{array}{c} n \\ x \end{array} \right)(p)^{x}(1 - p)^{(n-x)} \;\;\;\;\;\; \mbox{for x = 0, 1, 2,..., n} \]
where
\(x\) is the number of observed “successes” (failures in this case)
\(n\) is the total sample size
\(p\) is the true (but unknown) long-run probability that a unit will fail
A nonparametric estimator for \(F(t)\) is the binomial parameter \(p\)
\[\hat{p}_{_{MLE}}=\frac{x}{n}\]
par(family = 'serif', font = 2, cex = 1.15)
library(package = SMRD)
HE.ld <-
frame.to.ld(subset(SMRD::heatexchanger, plant == 'Plant1'),
response.column = c(1,2),
censor.column = 3,
case.weight.column = 4,
time.units = 'Years')
plot(HE.ld,
band.type = 'Pointwise',
ylim = c(0,.2),
xlim = c(0,3))Figure 3.2 - Nonparametric CDF plot of the heatexchanger data set (Plant 1 only)
If no failures occur in \((t_{i-1},t_{i})\), \(F(t)\) is constant
If one or more failures occur in \((t_{i-1},t_{i})\), we can estimate \(F(t_{i})-F(t_{i-1})\) But we won’t know when the failures occured in the interval
Convention for this text
There’s insufficient information to assign a value of \(\widehat{F}(t)\in (t_{i-1},t_{i})\) for censored observations occuring between inspections
Changes in \(\widehat{F}(t)\in (t_{i-1},t_{i})\) result at the end of the interval
We usually can’t inspect every unit in a population, but we can inspect a random sample taken from the population
Regardless of how the sample was drawn, sampling errors will be present
The amount of sampling error depends on the size of the sample (relative to the size of the population)
As the sample size increases the properties of the sample approach the properties of the overall population and the sampling errors become smaller
There are many ways that can introduce uncertainty into our analyses
Limited understanding of how a system interacts with its operating environment
Imperfect knowledge of manufacturing flaws & deviations between units in a sample
Errors introduced by fitting data to a parametric model
Small sample sizes can amplify each of these uncertainties
Confidence intervals (CI) are useful tools for quantifying the uncertainty associated with an estimate or a prediction due to sampling errors
What does it mean to say “…the \(100\times(1-\alpha)\%\) CI for \(\theta =(\underline{\widehat{\theta}},\overline{\widehat{\theta}})\)…”
For \(\alpha=0.05,\) a \(95\%\) CI on \(\theta\) this means
If an experiment were run \(i\) times (each run producing a distinct data set)
The true (but unknown) value of \(\theta\) would be captured in the interval \((\underline {\widehat {\theta_{i}}},\overline{\widehat{\theta_{i}}})\) in \(95\%\) of those runs.
A \(95\%\) CI on \(\theta\)
For each experiment the probability \(P(\underline{\widehat{\theta_{i}}} <\theta<\overline{\widehat{\theta_{i}}})\) is either \(1\) or \(0\)
This is because \(\theta\) is either in the interval, or it isn’t
The confidence_intervals app from the teachingApps package illustrates the idea of generating confidence intervals
Run the app by pasting the code below into the R console
There are different ‘classes’ of quantities that we may want to estimate
Unconstrined quantities that can be positive or negative
Quantities that are strictly positive
Quantities defined between 0 and 1
Specific procedures exist to construct \(100\times (1-\alpha)\%\) CI’s for quantities in each of these classes
Most of the CI estimation procedures produce
For confidence level \(\alpha\), \(1-\alpha\) is the
The
Approximate CI’s result if the procedure used to construct the CI is based on a distribution that differs from the distribution of the quantity being estimated
Only approximate CI methods exists for data sets containing time-censored observations
For some data sets, CI estimation procedures exist to produce exact confidence intervals
For confidence level \(\alpha\), \(1-\alpha\) is the
The
Exact CI’s result if the procedure used to estimate the CI is based on a same distribution as that of the value being estimated
Exact CI’s result for complete data sets and some data sets with Type-II right-censored observations
Where
\(\mathcal{F}_{c;d_{1},d_{2}}\) is the value of the F-distribution quantile function
\(c \in [0,1]\) represents the specific quantile value
\(d_{1}\) and \(d_{2}\) are the respective degrees of freedom
Note
The presence of the F-distribution in (3.2) does not preclude the C-P interval from being exact
Rather, the C-P interval is based on the relationship between the binomial and F distributions
For the Plant 1 heat exchanger data at \(t_{i}=2\) we have
Sample size \(n=100\)
Number of failures in \((0,2]\) is \(d=3\)
\(\mathcal{F}_{_{(.975;200-6+2,6)}}=4.8830932\)
\(\mathcal{F}_{_{(.975;6+2,200-6)}}=2.2578325\)
Therefore the \(95\%\) confidence interval for \(F(2)\)
\[ \begin{aligned} \underline{F}(2)&=\left(1+\frac{(100-100\times\frac{3}{100}+1)\mathcal{F}_{_{(.975;200-6+2,6)}}}{100\times \frac{3}{100}}\right)^{-1}=0.0062\\\\\\ \overline{F}(2)&=\left(1+\frac{100-100\times\frac{3}{100}}{(100\times\frac{3}{100}+1)\mathcal{F}_{_{(.975;6+2,200-6)}}}\right)^{-1}=0.0852 \end{aligned} \]
cp.ci<-function(n,d,a) {
Fhat<-d/n
f1<-2*n-2*n*Fhat
f2<-2*n*Fhat
cp.lower<-(1+(n-n*Fhat+1)*qf(1-a/2,f1+2,f2)/(n*Fhat))^-1
cp.upper<-(1+(n-n*Fhat)/((n*Fhat+1)*qf(1-a/2,f2+2,f1)))^-1
return(c(cp.lower,cp.upper)) }
cp.ci(100,3,.05)[1] 0.006229972 0.085176053
There are many methods for constructing confidence intervals - the approach based on the Normal approximation is the most commonly used and is used most often in this course for computing CI for \(F(t_{i})\)
Assumes the sampling errors are normally distributed WRT the binomial point estimate, i.e.
\[ Z_{\widehat{F}}=\frac{\widehat{F}(t_i)-F(t_i)}{\widehat{se}_{\widehat{F}}}\xrightarrow d N(0,1) \text{ as } n\rightarrow\infty \]
Where
\(F(t_{i})\) is the true (but unknown) value of the CDF at time \(t_i\)
\(\widehat{F}(t_{i})=\sum_{j=1}^{i}d_{i}/n\) is the binomial point estimate of \(F(t_{i})\)
\(\widehat{se}_{\widehat{F}}=\sqrt{\widehat{F}(t_{i})[1-\widehat{F}(t_{i})]/n}\) the standard error of \(\widehat{F}\) based on the sample
Consider again the heat exchanger data for Plant 1
At the end of Year 3, five failures were observed, thus
\[ \widehat{F}(3)=\frac{5}{100}=0.05 \;\;\;\text{and}\;\;\; \widehat{se}_{\widehat{F}(3)}=\sqrt{\frac{.05(1-.05)}{100}}=.02179 \]
\[ [\underline{F(3)}, \overline{F(3)}]=.05\pm F^{-1}_{_{NOR}}(P = 0.975)\times .02179 = [.0073,.0927] \]
qnorm(.975)
Compares the CI’s produced by the Binomial and the Normal approximation for the Plant 1 data over years \(1, 2 \;\&\; 3\).
The CI’s produced by the binomial method are more conservative for each time interval, particularly in the upper limit.
The CI’s produced using the Normal approximation method have negative values for the lower limit in the first and second time intervals.
Motivation for this example
As the number of inspections increase, the interval widths \((t_{i-1},t_{i}]\) approach zero
In this experiment \(n=4156\) integrated circuits were tested to failure up to 1370 hours
We dont know how the circuits were tested, just that \(d=28\) ‘exact’ failures were observed
Data set lfp1370 & Nonparametric Plots
The data set and related figures may be seen by running the code below
The presence of ties is a clear indication that this is grouped read-out data as defined in Chapter 1 and the failures were discovered at periodic inspections.
The inspection interval may have been set to every .05 hours at the beginning of the test and then extended as the test progressed.
Units are removed from the test, before observing the event of interest
Units fail due to an unexpected failure modes that are not of interest
Units are removed from the test to fill a real-world need
Units enter the test at different times
An error affects some of the units after testing begins
Example: Pooling the observations from all three plants in the heat exchanger data
par(family = "serif", font = 2, lwd = 2, cex = 1.15)
plot(NA,
axes = FALSE,
xlab = "",
ylab = "",
xlim = c(-50,350),
ylim = c(-10,150))
segments(x0 = c(0,0,0,100,200,300),
y0 = c(0,100,0,0,0,0),
x1 = c(350,350,0,100,200,300),
y1 = c(0,100,100,100,100,100))
text(x = -35, y = 50, labels = "All Plants")
text(x = seq(75, 275, 100),
y = rep(-8, 3),
labels = c("4/300","5/197","2/97"))
text(x = seq(50, 250, 100),
y = rep(108, 3),
labels = c("Year 1","Year 2","Year 3"))
text(x = seq(140, 340, 100),
y = rep(140,3),
labels = c("99","95","95"))
text(x = seq(15, 215, 100),
y = rep(90, 3),
label = c("300","197","97"))
text(x = 75, y = 140, labels = "Uncracked tubes:")
arrows(x0 = seq(100,300,100),
y0 = rep(100,3),
x1 = seq(130,330,100),
y1 = rep(125,3),
length = rep(0.25,3))For singly censored data, \(F(t_{i})\) was estimated assuming that \(n\) was constant
For multiply censored data, the number of units at risk in \((t_{i-1},t_{i}]\) is expressed as
\[ n_{i}=n-\sum_{j=0}^{i-1}d_{j}-\sum_{j=0}^{i-1}r_{j}, i=1,...,m \]
where
\(d_{j} \equiv\) the number of failed units in interval \(j, j=0,...,i-1\)
\(r_{j} \equiv\) the number of censored units in interval \(j, j=0,...,i-1\)
\(m \equiv\) the number of intervals (need not be of equal length)
It follows that \(\widehat{p}_{i}\) (the conditional probability that a unit will fail in interval \(i\), given that it has survived each of the previous intervals) is expressed as
\[ \widehat{p}_{i}=\frac{d_{i}}{n_{i}}, i=1,...,m \]
\[ \begin{aligned} \widehat{S}(t_{i})&=\prod_{j=1}^{i}\left[1-\widehat{p_{j}}\right], i=1,...,m\\\\ \widehat{F}(t_{i})&=1-\prod_{j=1}^{i}\left[1-\widehat{p_{j}}\right], i=1,...,m \end{aligned} \]
par(family = 'serif',font = 2,mar = c(0,0,0,0))
plot(NA,
axes = FALSE,
xlab = '',
ylab = '',
xlim = range(-50,350),
ylim = range(-10,300))
segments( x0 = c(0,0,0,0,0,rep(0,15),100,200,300),
y0 = c(0,100,200,300,0,seq(232,280,12),seq(132,180,12),seq(32,80,12),0,0,0),
x1 = c(350,350,350,350,0,rep(300,5),rep(200,5),rep(100,5),100,200,300),
y1 = c(0,100,200,300,300,seq(232,280,12),seq(132,180,12),seq(32,80,12),300,300,300),
lwd = c(rep(2,5),rep(1,18)))
text(x = rep(-25,3),
y = seq(56,256,100),
c('Plant 3','Plant 2','Plant 1'),
cex=rep(0.9,3))
text(x = c(rep(50,3),rep(150,2),250),
y = c(90,190,290,190,rep(290,2)),
labels = c('1 failure',
'2 failures',
'1 failures',
'3 failures',
'2 failures',
'2 failure'),
cex = rep(0.8,6))
text(x = seq(50,250,100),
y = rep(-8,3),
labels = c('Year 1',
'Year 2',
'Year 3'),
cex = rep(0.9,3))
text(x = c(110,210,310),
y = seq(56,256,100),
labels = c('99',
'95',
'95'),
cex = rep(0.9,3))
segments(x0 = c(120,220,320),
y0 = seq(56,256,100),
x1 = c(345,345,345),
y1 = seq(56,256,100),
lty = rep(2,3))
arrows(x0 = c(rep(295,5),rep(195,5),rep(95,5)),
y0 = c(seq(232,280,12),seq(132,180,12),seq(32,80,12)),
x1 = c(rep(300,5),rep(200,5),rep(100,5)),
length = rep(0.1,15))
arrows(x0 = rep(345,3),
y0 = seq(56,256,100),
x1 = rep(350,3),
length = rep(0.1,3))Figure 1.7 - Diagram of the transformed heatexchanger data
\[ \begin{array}{lrrrrllrr} \hline Year & t_i & d_i & r_i & n_i & p_i & 1-p_i & S(t_i) & F(t_i) \\ \hline (0-1] & 1 & 4 & 99 & 300 & 4/300 & 296/300 & 0.9867 & 0.0133 \\ (1-2] & 2 & 5 & 95 & 197 & 5/197 & 192/197 & 0.9616 & 0.0384 \\ (2-3] & 3 & 2 & 95 & 97 & 2/97 & 95/97 & 0.9418 & 0.0582 \\ \hline \end{array} \]
Pooling the data assumes the usage pattern is consistent across all three plants
If this is a valid assumption, then the “test” begins with an sample set of 300 tubes
For multiply censored data, no method exists to compute exact CI’s
We must rely on normal approximations
In general, the normal approximation will be of the form
\[ Z_{\widehat{F}}=\frac{\widehat{F}(t_{i})-F(t_{i})}{\widehat{se}_{\widehat{F}}} \]
Computing the standard error term in the above equation will require us to use the Delta Method
Suppose we can find the variance of \(\mathbf{\widehat{\theta}}=\widehat{\theta}_{1},...,\widehat{\theta}_{r}\)
But, we want to find the variance of some function of \(\mathbf{\theta}\), say \(g(\mathbf{\widehat{\theta}})\)
\(g(\mathbf{\widehat{\theta}})=\log[\theta]\)
\(g(\mathbf{\widehat{\theta}})=\theta^2\)
If we can find \(\frac{dg(\theta)}{d\theta}\), the Delta Method can help us estimate \(\widehat{Var}[g(\mathbf{\widehat{\theta}})]\) from \(\widehat{Var}[\mathbf{\widehat{\theta}}]\)
\[ Var\left[g(\widehat{\theta})\right]\approx \sum_{i=1}^{r} \left[\frac{\partial g(\mathbf{\theta})}{\partial \theta_{i}}\right]^{2} Var(\widehat{\theta_{i}})+\sum_{i=1}^r \mathop{\sum^{r}_{j=1}}_{i\ne j}\left[\frac{\partial g(\mathbf{\theta})}{\partial \theta_{i}}\right]\left[\frac{\partial g(\mathbf{\theta})}{\partial \theta_{j}}\right] Cov(\widehat{\theta}_{i}, \widehat{\theta}_{j}) \]
\[ \displaystyle \widehat{S}(t_{i})\approx S(t_i)+ \sum_{j=1}^{i}\frac{\partial S}{\partial q_{j}}\vert_{q_{j}}(\widehat{q}_{j}-q_{j}) \]
However, there’s not much detail showing how this result was achieved
The following example walks through the derivation of Equation 3.8
Our goal is to find a expression for \(\widehat{Var}[\widehat{F}(t_i)]\)
From Equation 3.6, we see that \(\widehat{S}(t_i)\) is a function of \(\widehat{p_j}\)
Since \(\widehat{F}(t_{i})=1-\widehat{S}(t_{i})\), we know that \(Var[\widehat{F}(t_{i})]=Var[\widehat{S}(t_{i})]\)
Similarly, \(\widehat{q_j}=1-\widehat{p_j}\) and \(Var[\widehat{q_j}]=Var[\widehat{p_j}]\)
Therefore,
Since \(q_{j}=1-p_{j}\), we know that \(Var[q_{j}]=Var[p_{j}]\)
Recall that \(\widehat{p}=\frac{x}{n}\) where \(x\) is the number of observed “successes”
Also, note that \(\widehat{p}\) is an unbiased estimator for \(p\)
\[E[\widehat{p}]=E\left[\frac{x}{n}\right]=\frac{E[x]}{n}=\frac{np}{n}=p\]
\[ \begin{aligned} Var[\widehat{p}]&=Var[p]\\\\ &=Var\left[\frac{x}{n}\right]=\frac{1}{n^{2}}Var[x]=\frac{Var[x]}{n^{2}}\\\\\\ &=\frac{np(1-p)}{n^{2}}=\frac{p(1-p)}{n}\\ \end{aligned} \]
\[ \begin{aligned} S(t_{i})&=\prod_{j=1}^{i}[1-p_{j}], i=1,...,m \\ &=(1-p_{1})(1-p_{2})...(1-p_{i})\\ &=(q_{1})(q_{2})...(q_{i}) \end{aligned} \]
In this case, it’s easier to compute the derivatives \(\frac{\partial S(t_{i})}{\partial q_{i}}, \;\; i=1,...m\)
We know that \(\forall i \in 1,2,...m\)
\[ \frac{\partial S(t_{i})}{\partial q_{i}} \;=\; \frac{\partial \left( q_{1}q_{2}...q_{i-1}q_{i}\right)}{\partial q_{i}} \;=\; q_{1}q_{2}...q_{i-1} \;=\; \frac{S(t_{i})}{q_{i}} \]
\[ \begin{aligned} Var[S(t_{i})]&=\sum_{j=1}^{i}\left[\frac{\partial S(t_{i})}{\partial q_{j}}\right]^{2}Var(q_{j})\\ &=\sum_{j=1}^{i}\left[\frac{S(t_{i})}{q_{j}}\right]^{2}\frac{p_{j}(1-p_{j})}{n_{j}}\\ &=S(t_{i})^{2}\sum_{j=1}^{i}\frac{p_{j}(1-p_{j})}{n_{j}(1-p_{j})^{2}}\\ &=S(t_{i})^{2}\sum_{j=1}^{i}\frac{p_{j}}{n_{j}(1-p_{j})}\\ \end{aligned} \]
\[ \displaystyle \widehat{Var}\left[\widehat{F}(t_{i})\right]=\widehat{Var}\left[\widehat{S}(t_i)\right]=\left[\widehat{S}(t_{i})\right]^{2}\sum_{j=1}^{i}\frac{\widehat{p}_{j}}{n_{j}(1-\widehat{p_{j}})} \]
\[ \widehat{se}_{\widehat{F}}=\sqrt{\widehat{Var}[\widehat{F}(t_{i})]} \]
\[ \left[\underline{F}(t_i),\overline{F}(t_i)\right]=\widehat{F}(t_i)\pm z_{(1-\alpha/2)}\widehat{se}_{\widehat{F}} \]
\[ Z_{\widehat{F}}=\frac{\widehat{F}(t_i)-F(t_i)}{\widehat{se}_{\widehat{F}}}\xrightarrow d N(0,1) \text{ as } n\rightarrow\infty \]
However, for small sample sizes this assumption may fail if the distribution of \(Z_{\widehat{F}}\) is highly skewed
A better approximation for \(\widehat{F}(t_{i})\) results from a logit transformation such that
\[ Z_{\text{logit}(\widehat{F})}=\frac{\text{logit}(\widehat{F}(t_i))-\text{logit}(F(t_i))}{\widehat{se}_{\text{logit}(\widehat{F})}} \]
\[ \left[\underline{\underline{F}}(t_i),\overline{\overline{F}}(t_i)\right]=\left[\frac{\widehat{F}(t_i)}{\widehat{F}(t_i)+(1-\widehat{F}(t_i)\times w}, \frac{\widehat{F}(t_i)}{\widehat{F}(t_i)+(1-\widehat{F}(t_i)/ w}\right] \]
\[ w=\exp\left[\frac{z_{(1-\alpha/2)}\widehat{se}_{\widehat{F}}}{\widehat{F}(1-\widehat{F})}\right] \]
\[ \begin{array}{lrrrrllrr} \hline Year & t\_i & d\_i & r\_i & n\_i & p\_i & 1-p\_i & S(t\_i) & F(t\_i) \\ \hline (0-1] & 1 & 4 & 99 & 300 & 4/300 & 296/300 & 0.9867 & 0.0133 \\ (1-2] & 2 & 5 & 95 & 197 & 5/197 & 192/197 & 0.9616 & 0.0384 \\ (2-3] & 3 & 2 & 95 & 97 & 2/97 & 95/97 & 0.9418 & 0.0582 \\ \hline \end{array} \]
\[ \begin{aligned} \widehat{Var}(\widehat{F}(1))&=[\widehat{S}(t_{i})]^{2}\sum_{j=1}^{i}\left[\frac{\widehat{p}_{j}}{n_{j}(1-p_{j})}\right]\\ &=(.9867)^{2}\left[\frac{.0133}{300(.9867)}\right]\\ &=.0000438 \end{aligned} \]
The estimated standard error is then found by \(\widehat{se}_{\widehat{F}}=\sqrt{.0000438}=.00662\)
And the \(95\%\) pointwise CI is
\[ \left[\underline{F}(1),\overline{F}(1)\right]=.0133\pm 1.960\times .00662=[.0003, .0263] \]
\[ \begin{aligned} \left[\underline{\underline{F}}(1),\overline{\overline{F}}(1)\right]&=\left[\frac{.0133}{.0133+(1-.0133)\times 2.6878}, \frac{.0133}{.0133+(1-.0133)/ 2.6878}\right]\\\\\\ &=[.0050, .0350]\\\\ \end{aligned} \]
\[ w=\exp\left[\frac{1.960(.00662)}{.0133(1-.0133)}\right]=2.86687816 \]
This output may then be converted to a \(LaTeX\) table using
\[ \begin{array}{rrrrrr} \hline Years-lower & Years-upper & Fhat & SE\_Fhat & 95\% Lower & 95\% Upper \\ \hline 1 & 1 & 0.01333 & 0.00662 & 0.00501 & 0.03498 \\ 2 & 2 & 0.03838 & 0.01280 & 0.01982 & 0.07302 \\ 3 & 3 & 0.05820 & 0.01870 & 0.03069 & 0.10763 \\ \hline \end{array} \]
“Exact” failures are an abstraction, but can be considered to have occurred in reality if
Test units are under continuous inspection (some electronic systems)
A large number of inspections are performed at closely spaced intervals
In a test with exact failures, most inspections will not discover a failure, therefore
The value of \(F(t)\) will not change
The plot of \(\widehat{F}(t)\) will appear as a step-function
ShockAbsorber.ld <- frame.to.ld(shockabsorber,
response.column = 1,
censor.column = 3,
time.units = 'Kilometers')
plot(ShockAbsorber.ld, band.type = 'Pointwise')Quantify the sampling uncertainty about \(F(t_{i}\) at a set of individual points
Have a collective coverage probability that’s less than any individual interval
Quantify the sampling uncertainty along the entire range of \(F(t_{i})\)
Are generally wider than pointwise confidence intervals
More accurately reflect the uncertainty over the displayed time range
Should be used with the logit transformation
An approximate \(100(1-\alpha)\%\) simultaneous CI for \(F(t)\) is obtained from
\[ \left[\underline{\underline{F}}(t),\overline{\overline{F}}(t)\right]=\widehat{F}(t)\pm (e_{a,b,1-\alpha/2)}\widehat{se}_{\widehat{F}(t)}, \;\;\; \forall t\in [t_{L}(a), t_{U}(b)] \]
where the range \([t_{L}(a), t_{U}(b)]\) is a function of the censoring pattern in the data
Note that \(e_{a,b,1-\alpha/2)}\) replaces
\[ \left[\underline{\underline{F}}(t),\overline{\overline{F}}(t)\right]=\left[\frac{\widehat{F}(t)}{\widehat{F}(t)+[1-\widehat{F}(t)]\times w},\frac{\widehat{F}(t)}{\widehat{F}(t)+[1-\widehat{F}(t)]/w}\right] \]