Statistical Methods for Reliability Data

Chapter 3 - Nonparametric Estimation

W. Q. Meeker and L. A. Escobar

20 May 2020

OVERVIEW

This chapter explains…

3.1 - INTRODUCTION

Nonparametric estimation

\[ \widehat{F}(t), \widehat{S}(t), se(\widehat{F}(t)) \]

Nonparametric techniques

3.2 - ESTIMATION FROM SINGLY CENSORED DATA

Singly censored data

par(mar = c(0,0,0,0)) ; set.seed(42)
plot(NA,
     axes = F,
     xlab = "",
     ylab = "",
     xlim = range(0,105),
     ylim = range(16,18))
segments(x0 = c(0,70),
         y0 = c(18,18),
         x1 = c(0,70),
         y1 = c(17,17),
         lwd = 2)
arrows(x0 = 0,
       y0 = 17.5,
       x1 = 100,
       y1 = 17.5,
       lwd = 2)
text(x = c(0,70,104),
     y = c(16.8,16.8,17.5),
     labels = c(0,
                expression(t[c]),
                expression(infinity)),
     cex=c(1.5,1.5,2))
text(x = c(runif(20,5,65),runif(5,75,95)),
     y = c(rep(17.5,25)),
     labels = c(rep("x",20),rep("?",5)), 
     col = c(rep(1,20),rep(2,5)), 
     cex = 2)

par(mar = c(0,0,0,0)) ; set.seed(42)
plot(NA,
     axes = FALSE,
     xlab = "",
     ylab = "",
     xlim = c(0,105),
     ylim = c(16,18))
segments(x0 = c(0,70),
         y0 = c(18,18),
         x1 = c(0,70),
         y1 = c(17,17),
         lwd = 2)
arrows(x0 = 0,
       y0 = 17.5,
       x1 = 100,
       y1 = 17.5,
       lwd = 2)
text(x = c(0,70,104,runif(25,5,65)),
     y = c(16.8,16.8,17.5,rep(17.5,25)),
     labels = c(0,expression(t[c]),
                expression(infinity),
                rep("?",25)),
     cex = c(1.5,1.5,rep(2,26)),
     col = c(1,1,rep('blue',26)))

3.2 - ESTIMATION FROM SINGLY CENSORED DATA - Example

Example 3.1 & 3.2 - Plant 1 Heat Exchanger Data

par(family = 'serif', font = 2)

plot(NA,
     axes = FALSE,
     xlab = '',
     ylab = '',
     xlim = range(-50,350),
     ylim = range(-10,150))

segments( x0 = c(0,0,0,100,200,300),
          y0 = c(50,100,50,20,20,20),
          x1 = c(350,350,0,100,200,300),
          y1 = c(50,100,100,100,100,100),
          lwd = c(rep(2,6)))

text(x = rep(-35,6),
     y = c(seq(27,127,50),seq(17,117,100),6),
     labels = c('Unconditional',
                'Plant 1',
                '100 tubes',
                'Failure',
                'at start',
                'Probability'),
     cex = c(1,1.15,1,1,1,1))

text(x = c(seq(50,250,100),325),
     y = rep(20,4),
     labels = c(expression(pi[1]),
                expression(pi[2]),
                expression(pi[3]),
                expression(pi[4])),
     cex = 1.5)

text(x = seq(50,250,100),
     y = rep(108,3),
     labels = c('Year 1','Year 2','Year 3'),
     cex = rep(1.15,3))

text(x = c(150,320,320),
     y = c(rep(140,2),130),
     labels = c('Cracked tubes','Uncracked','tubes'),
     cex = rep(1,3))

text(x = c(seq(50,250,100),325),
     y = rep(77,4),
     labels = c('1','2','2','95'),
     cex = rep(1.5,4))
Figure 3.1 - Diagram of the Plant 1 data from the heatexchanger data set

Figure 3.1 - Diagram of the Plant 1 data from the heatexchanger data set

library(SMRD)
library(DT)

DT::datatable(subset(SMRD::heatexchanger, plant == "Plant1"))

\[ f(x;p,n) = \left( \begin{array}{c} n \\ x \end{array} \right)(p)^{x}(1 - p)^{(n-x)} \;\;\;\;\;\; \mbox{for x = 0, 1, 2,..., n} \]

\[\hat{p}_{_{MLE}}=\frac{x}{n}\]

par(family = 'serif', font = 2, cex = 1.15)

library(package = SMRD)

HE.ld <-
frame.to.ld(subset(SMRD::heatexchanger, plant == 'Plant1'),
            response.column = c(1,2),
            censor.column = 3,
            case.weight.column = 4,
            time.units = 'Years')

plot(HE.ld,
     band.type = 'Pointwise',
     ylim = c(0,.2),
     xlim = c(0,3))
Figure 3.2 - Nonparametric CDF plot of the heatexchanger data set (Plant 1 only)

Figure 3.2 - Nonparametric CDF plot of the heatexchanger data set (Plant 1 only)

3.2 - ESTIMATION FROM SINGLY CENSORED DATA

Recall, the cdf \(F(t)\) is defined everywhere in \(\mathbb{R}\), i.e. \((-\infty, \infty)\)

3.3.2 - Confidence Intervals

Sampling errors

Sources of uncertainty

3.3.2 - Confidence Intervals - Example

Understanding Confidence Intervals

teachingApps::teachingApp('confidence_intervals')

Exact CI vs Approximate CI

3.4.1 - Pointwise binomial-based CI for \(F(t_i)\)

Clopper-Pearson CI For Proportions

\[ (\underline{F}(t_i),\overline{F}(t_i))=\left(1+\frac{(n-n\widehat{F}+1)\mathcal{F}_{(1-\alpha/2,2n-2n\widehat{F}+2,2n\widehat{F})}}{n\widehat{F}}\right)^{-1}, \left(1+\frac{n-n\widehat{F}}{(n\widehat{F}+1)\mathcal{F}_{(1-\alpha/2,2n\widehat{F}+2,2n-2n\widehat{F})}}\right)^{-1} \]

3.4.1 - Pointwise binomial-based CI for \(F(t_i)\) - Example

Example 3.3 - Binomial CI for \(F(t_i)\)

\[ \begin{aligned} \underline{F}(2)&=\left(1+\frac{(100-100\times\frac{3}{100}+1)\mathcal{F}_{_{(.975;200-6+2,6)}}}{100\times \frac{3}{100}}\right)^{-1}=0.0062\\\\\\ \overline{F}(2)&=\left(1+\frac{100-100\times\frac{3}{100}}{(100\times\frac{3}{100}+1)\mathcal{F}_{_{(.975;6+2,200-6)}}}\right)^{-1}=0.0852 \end{aligned} \]

cp.ci<-function(n,d,a)  {
    Fhat<-d/n
    f1<-2*n-2*n*Fhat
    f2<-2*n*Fhat
    cp.lower<-(1+(n-n*Fhat+1)*qf(1-a/2,f1+2,f2)/(n*Fhat))^-1
    cp.upper<-(1+(n-n*Fhat)/((n*Fhat+1)*qf(1-a/2,f2+2,f1)))^-1
return(c(cp.lower,cp.upper))  }

cp.ci(100,3,.05)
[1] 0.006229972 0.085176053

3.4.2 - Pointwise Normal-approximate CI for \(F(t_i)\)

Confidence Intervals Based on the Normal Distribution

\[ Z_{\widehat{F}}=\frac{\widehat{F}(t_i)-F(t_i)}{\widehat{se}_{\widehat{F}}}\xrightarrow d N(0,1) \text{ as } n\rightarrow\infty \]

3.4.2 - Pointwise Normal-approximate CI for \(F(t_i)\) - Example

Example 3.4 - Normal-approx CI for Plant 1 \(F(t_i)\)

\[ \widehat{F}(3)=\frac{5}{100}=0.05 \;\;\;\text{and}\;\;\; \widehat{se}_{\widehat{F}(3)}=\sqrt{\frac{.05(1-.05)}{100}}=.02179 \]

\[ [\underline{F(3)}, \overline{F(3)}]=.05\pm F^{-1}_{_{NOR}}(P = 0.975)\times .02179 = [.0073,.0927] \]

\(F^{-1}_{_{NOR}}(P = .975)\equiv z_{.975}\equiv\) qnorm(.975)

Table 3.1

3.4.2 - Pointwise Normal-approximate CI for \(F(t_i)\) - Example 3.5

Integrated Circuit Life Test Data

SMRD::smrd_app('lfp1370_data')

3.5 - ESTIMATION FROM MULTIPLY CENSORED DATA

Multiply censored data results when

Figure 3.4 - Pooled heat exchanger data

par(family = "serif", font = 2, lwd = 2, cex = 1.15)
plot(NA,
     axes = FALSE,
     xlab = "",
     ylab = "",
     xlim = c(-50,350),
     ylim = c(-10,150))
segments(x0 = c(0,0,0,100,200,300),
         y0 = c(0,100,0,0,0,0),
         x1 = c(350,350,0,100,200,300),
         y1 = c(0,100,100,100,100,100))
text(x = -35, y = 50, labels = "All Plants")
text(x = seq(75, 275, 100),
     y = rep(-8, 3),
     labels = c("4/300","5/197","2/97"))
text(x = seq(50, 250, 100),
     y = rep(108, 3),
     labels = c("Year 1","Year 2","Year 3"))
text(x = seq(140, 340, 100),
     y = rep(140,3),
     labels = c("99","95","95"))
text(x = seq(15, 215, 100),
     y = rep(90, 3),
     label = c("300","197","97"))
text(x = 75, y = 140, labels = "Uncracked tubes:")
arrows(x0 = seq(100,300,100),
       y0 = rep(100,3),
       x1 = seq(130,330,100),
       y1 = rep(125,3),
       length = rep(0.25,3))

3.5 - Estimation from multiply censored data

\[ n_{i}=n-\sum_{j=0}^{i-1}d_{j}-\sum_{j=0}^{i-1}r_{j}, i=1,...,m \]

\[ \widehat{p}_{i}=\frac{d_{i}}{n_{i}}, i=1,...,m \]

\[ \begin{aligned} \widehat{S}(t_{i})&=\prod_{j=1}^{i}\left[1-\widehat{p_{j}}\right], i=1,...,m\\\\ \widehat{F}(t_{i})&=1-\prod_{j=1}^{i}\left[1-\widehat{p_{j}}\right], i=1,...,m \end{aligned} \]

Fig 1.7 - Heat exchanger crack inspection data

par(family = 'serif',font = 2,mar = c(0,0,0,0))

plot(NA,
     axes = FALSE,
     xlab = '',
     ylab = '',
     xlim = range(-50,350),
     ylim = range(-10,300))

segments( x0 = c(0,0,0,0,0,rep(0,15),100,200,300),
          y0 = c(0,100,200,300,0,seq(232,280,12),seq(132,180,12),seq(32,80,12),0,0,0),
          x1 = c(350,350,350,350,0,rep(300,5),rep(200,5),rep(100,5),100,200,300),
          y1 = c(0,100,200,300,300,seq(232,280,12),seq(132,180,12),seq(32,80,12),300,300,300),
          lwd = c(rep(2,5),rep(1,18)))

text(x = rep(-25,3),
     y = seq(56,256,100),
     c('Plant 3','Plant 2','Plant 1'),
     cex=rep(0.9,3))

text(x = c(rep(50,3),rep(150,2),250),
     y = c(90,190,290,190,rep(290,2)),
     labels = c('1 failure',
                '2 failures',
                '1 failures',
                '3 failures',
                '2 failures',
                '2 failure'),
     cex = rep(0.8,6))

text(x = seq(50,250,100),
     y = rep(-8,3),
     labels = c('Year 1',
                'Year 2',
                'Year 3'),
     cex = rep(0.9,3))

text(x = c(110,210,310),
     y = seq(56,256,100),
     labels = c('99',
                '95',
                '95'),
     cex = rep(0.9,3))

segments(x0 = c(120,220,320),
         y0 = seq(56,256,100),
         x1 = c(345,345,345),
         y1 = seq(56,256,100),
         lty = rep(2,3))

arrows(x0 = c(rep(295,5),rep(195,5),rep(95,5)),
       y0 = c(seq(232,280,12),seq(132,180,12),seq(32,80,12)),
       x1 = c(rep(300,5),rep(200,5),rep(100,5)),
       length = rep(0.1,15))

arrows(x0 = rep(345,3),
       y0 = seq(56,256,100),
       x1 = rep(350,3),
       length = rep(0.1,3))
Figure 1.7 - Diagram of the transformed heatexchanger data

Figure 1.7 - Diagram of the transformed heatexchanger data

Table 3.2 - Pooled Heat Exchanger Data

\[ \begin{array}{lrrrrllrr} \hline Year & t_i & d_i & r_i & n_i & p_i & 1-p_i & S(t_i) & F(t_i) \\ \hline (0-1] & 1 & 4 & 99 & 300 & 4/300 & 296/300 & 0.9867 & 0.0133 \\ (1-2] & 2 & 5 & 95 & 197 & 5/197 & 192/197 & 0.9616 & 0.0384 \\ (2-3] & 3 & 2 & 95 & 97 & 2/97 & 95/97 & 0.9418 & 0.0582 \\ \hline \end{array} \]

3.6 - POINTWISE CI FROM MULTIPLY CENSORED DATA

\[ Z_{\widehat{F}}=\frac{\widehat{F}(t_{i})-F(t_{i})}{\widehat{se}_{\widehat{F}}} \]

The Delta Method (B.2)

Using The Delta Method Requires 4 Things

  1. A parameter for which we know the variance - \(\theta\)

  2. The variance of the paramter - \(Var[\theta]\)

  3. A function of the parameter - \(g(\theta)\)

  4. The partial derivatives - \(\frac{\partial g(\theta)}{\partial \theta_{i}}, \;\; i=1,...m\)

The General Delta Method Equation

\[ Var\left[g(\widehat{\theta})\right]\approx \sum_{i=1}^{r} \left[\frac{\partial g(\mathbf{\theta})}{\partial \theta_{i}}\right]^{2} Var(\widehat{\theta_{i}})+\sum_{i=1}^r \mathop{\sum^{r}_{j=1}}_{i\ne j}\left[\frac{\partial g(\mathbf{\theta})}{\partial \theta_{i}}\right]\left[\frac{\partial g(\mathbf{\theta})}{\partial \theta_{j}}\right] Cov(\widehat{\theta}_{i}, \widehat{\theta}_{j}) \]

3.6.1 - Approximate Variance of \(\widehat{F}(t_{i})\) - Example

Example - Derivation of Equation 3.8

\[ \displaystyle \widehat{S}(t_{i})\approx S(t_i)+ \sum_{j=1}^{i}\frac{\partial S}{\partial q_{j}}\vert_{q_{j}}(\widehat{q}_{j}-q_{j}) \]

First, We Must Identify The Parameter \(\theta\)

Next, Find The Variance of The Parameter \(Var[\theta]\)

\[E[\widehat{p}]=E\left[\frac{x}{n}\right]=\frac{E[x]}{n}=\frac{np}{n}=p\]

\[ \begin{aligned} Var[\widehat{p}]&=Var[p]\\\\ &=Var\left[\frac{x}{n}\right]=\frac{1}{n^{2}}Var[x]=\frac{Var[x]}{n^{2}}\\\\\\ &=\frac{np(1-p)}{n^{2}}=\frac{p(1-p)}{n}\\ \end{aligned} \]

Now, What is The Function of the Parameter\(g(\theta)?\)

\[ \begin{aligned} S(t_{i})&=\prod_{j=1}^{i}[1-p_{j}], i=1,...,m \\ &=(1-p_{1})(1-p_{2})...(1-p_{i})\\ &=(q_{1})(q_{2})...(q_{i}) \end{aligned} \]

Finally, What is \(\frac{\partial g(\theta_{i})}{\partial \theta_{i}}, \;\; i=1,...m\)?

\[ \frac{\partial S(t_{i})}{\partial q_{i}} \;=\; \frac{\partial \left( q_{1}q_{2}...q_{i-1}q_{i}\right)}{\partial q_{i}} \;=\; q_{1}q_{2}...q_{i-1} \;=\; \frac{S(t_{i})}{q_{i}} \]

Putting Everything Together

\[ \begin{aligned} Var[S(t_{i})]&=\sum_{j=1}^{i}\left[\frac{\partial S(t_{i})}{\partial q_{j}}\right]^{2}Var(q_{j})\\ &=\sum_{j=1}^{i}\left[\frac{S(t_{i})}{q_{j}}\right]^{2}\frac{p_{j}(1-p_{j})}{n_{j}}\\ &=S(t_{i})^{2}\sum_{j=1}^{i}\frac{p_{j}(1-p_{j})}{n_{j}(1-p_{j})^{2}}\\ &=S(t_{i})^{2}\sum_{j=1}^{i}\frac{p_{j}}{n_{j}(1-p_{j})}\\ \end{aligned} \]

Greenwood’s formula

\[ \displaystyle \widehat{Var}\left[\widehat{F}(t_{i})\right]=\widehat{Var}\left[\widehat{S}(t_i)\right]=\left[\widehat{S}(t_{i})\right]^{2}\sum_{j=1}^{i}\frac{\widehat{p}_{j}}{n_{j}(1-\widehat{p_{j}})} \]

\[ \widehat{se}_{\widehat{F}}=\sqrt{\widehat{Var}[\widehat{F}(t_{i})]} \]

3.6.3 - Pointwise Normal-Approximate CI for \(F(t_i)\)

\[ \left[\underline{F}(t_i),\overline{F}(t_i)\right]=\widehat{F}(t_i)\pm z_{(1-\alpha/2)}\widehat{se}_{\widehat{F}} \]

\[ Z_{\widehat{F}}=\frac{\widehat{F}(t_i)-F(t_i)}{\widehat{se}_{\widehat{F}}}\xrightarrow d N(0,1) \text{ as } n\rightarrow\infty \]

\[ Z_{\text{logit}(\widehat{F})}=\frac{\text{logit}(\widehat{F}(t_i))-\text{logit}(F(t_i))}{\widehat{se}_{\text{logit}(\widehat{F})}} \]

\[ \left[\underline{\underline{F}}(t_i),\overline{\overline{F}}(t_i)\right]=\left[\frac{\widehat{F}(t_i)}{\widehat{F}(t_i)+(1-\widehat{F}(t_i)\times w}, \frac{\widehat{F}(t_i)}{\widehat{F}(t_i)+(1-\widehat{F}(t_i)/ w}\right] \]

\[ w=\exp\left[\frac{z_{(1-\alpha/2)}\widehat{se}_{\widehat{F}}}{\widehat{F}(1-\widehat{F})}\right] \]

Example 3.7 - CI for Pooled Heat Exchanger Data

Recall Table 3.2

\[ \begin{array}{lrrrrllrr} \hline Year & t\_i & d\_i & r\_i & n\_i & p\_i & 1-p\_i & S(t\_i) & F(t\_i) \\ \hline (0-1] & 1 & 4 & 99 & 300 & 4/300 & 296/300 & 0.9867 & 0.0133 \\ (1-2] & 2 & 5 & 95 & 197 & 5/197 & 192/197 & 0.9616 & 0.0384 \\ (2-3] & 3 & 2 & 95 & 97 & 2/97 & 95/97 & 0.9418 & 0.0582 \\ \hline \end{array} \]

\[ \begin{aligned} \widehat{Var}(\widehat{F}(1))&=[\widehat{S}(t_{i})]^{2}\sum_{j=1}^{i}\left[\frac{\widehat{p}_{j}}{n_{j}(1-p_{j})}\right]\\ &=(.9867)^{2}\left[\frac{.0133}{300(.9867)}\right]\\ &=.0000438 \end{aligned} \]

\[ \left[\underline{F}(1),\overline{F}(1)\right]=.0133\pm 1.960\times .00662=[.0003, .0263] \]

\[ \begin{aligned} \left[\underline{\underline{F}}(1),\overline{\overline{F}}(1)\right]&=\left[\frac{.0133}{.0133+(1-.0133)\times 2.6878}, \frac{.0133}{.0133+(1-.0133)/ 2.6878}\right]\\\\\\ &=[.0050, .0350]\\\\ \end{aligned} \]

\[ w=\exp\left[\frac{1.960(.00662)}{.0133(1-.0133)}\right]=2.86687816 \]

This output may then be converted to a \(LaTeX\) table using

xtable::xtable(cdf_table, 
               include.rownames = FALSE, 
               digits = c(0,0,0,5,5,5,5))

\[ \begin{array}{rrrrrr} \hline Years-lower & Years-upper & Fhat & SE\_Fhat & 95\% Lower & 95\% Upper \\ \hline 1 & 1 & 0.01333 & 0.00662 & 0.00501 & 0.03498 \\ 2 & 2 & 0.03838 & 0.01280 & 0.01982 & 0.07302 \\ 3 & 3 & 0.05820 & 0.01870 & 0.03069 & 0.10763 \\ \hline \end{array} \]

3.7 - Estimation From Multiply Censored Data With Exact Failures

Multiply Censored Data With Exact Failures

Example 3.9 - Shock Absorber Data

Table 3.4

ShockAbsorber.ld <- frame.to.ld(shockabsorber, 
                                response.column = 1, 
                                censor.column = 3,
                                time.units = 'Kilometers')
plot(ShockAbsorber.ld, band.type = 'Pointwise')

\[ \begin{array}{rrrrrr} \hline Kilometers-lower & Kilometers-upper & Fhat & SE\_Fhat & 95\% Lower & 95\% Upper \\ \hline 0 & 6700 & 0.00000 & 0.00000 & 0.00000 & 0.00000 \\ 6700 & 9120 & 0.02632 & 0.02597 & 0.00369 & 0.16457 \\ 9120 & 12200 & 0.05495 & 0.03783 & 0.01376 & 0.19513 \\ 12200 & 13150 & 0.09130 & 0.05093 & 0.02929 & 0.25073 \\ 13150 & 14300 & 0.12916 & 0.06128 & 0.04851 & 0.30143 \\ 14300 & 17520 & 0.17271 & 0.07205 & 0.07210 & 0.35933 \\ 17520 & 20100 & 0.21625 & 0.08034 & 0.09826 & 0.41130 \\ 20100 & 20900 & 0.28156 & 0.09661 & 0.13321 & 0.49984 \\ 20900 & 22700 & 0.37137 & 0.11918 & 0.17844 & 0.61638 \\ 22700 & 26510 & 0.46117 & 0.13171 & 0.23246 & 0.70749 \\ 26510 & 27490 & 0.56894 & 0.14281 & 0.29656 & 0.80515 \\ 27490 & 28100 & 0.71262 & 0.15109 & 0.36869 & 0.91327 \\ \hline \end{array} \]

3.8 - SIMULTANEOUS CONFIDENCE BANDS

Pointwise confidence intervals \((\S \;\;\text{3.6.3})\)

Simultaneous confidence intervals

Simultaneous & Pointwise CI’s

3.8.2 - Large-Sample Simultaneous CI for \(F(t)\)

An approximate \(100(1-\alpha)\%\) simultaneous CI for \(F(t)\) is obtained from

\[ \left[\underline{\underline{F}}(t),\overline{\overline{F}}(t)\right]=\widehat{F}(t)\pm (e_{a,b,1-\alpha/2)}\widehat{se}_{\widehat{F}(t)}, \;\;\; \forall t\in [t_{L}(a), t_{U}(b)] \]

where the range \([t_{L}(a), t_{U}(b)]\) is a function of the censoring pattern in the data

Note that \(e_{a,b,1-\alpha/2)}\) replaces

\[ \left[\underline{\underline{F}}(t),\overline{\overline{F}}(t)\right]=\left[\frac{\widehat{F}(t)}{\widehat{F}(t)+[1-\widehat{F}(t)]\times w},\frac{\widehat{F}(t)}{\widehat{F}(t)+[1-\widehat{F}(t)]/w}\right] \]

Table 3.5 - Factors for Simultaneous CI

3.8.3 - Time Range for Simultaneous CI for \(F(t)\)

more to come