OVERVIEW

This chapter explains...

Applications of probability plots
Basic probability plotting concepts for both complete and censored data
How to analytically linearize a cdf on special plotting scales
How to plot a nonparametric estimate of \(F(t)\) on probability paper and how to use such a plot to judge the adequacy of a particular parametric distribution
Analytical and simulation methods of separating useful information from "noise" when using a probability plot to assess the reasonableness of a particular model
Graphical estimates of important reliability characteristics like failure probabilities and distribution quantiles

6.1 - INTRODUCTION

Applications Of Probability Plots

Reveal information about a population, a process, or data that might otherwise escape detection
Assess the adequacy of a particular distributional model
Provide nonparametric graphical estimates of probabilities and distribution quantiles
Obtain graphical estimates of parametric model parameters (e.g. by fitting a straight line through the points on a probability plot)
Display the results of a parametric maximum likelihood fit along with the data

6.1.2 - Probability Plotting Scales: Linearizing A CDF

CDF's from different distributions can look similar on plots

Probability plots are nonparametric tools that should be used early on to eliminate poorly fitting models
Probability plots use transformed axes to plot a linearized cdf vs \(t\)

teachingApps::teachingApp('probability_plotting')

6.2 - LINEARIZING LOCATION-SCALE BASED DISTRIBUTIONS

Basic Idea Behind The Process

Identify the event times and sort them
- If we're trying to model data - then we should already have data
- Data should be sorted smallest to largest
- In R, we can sort data using sort( )
Invert and rearrange \(F(t_{p})\) to obtain a linear function of the form

\[ \begin{aligned} y&=m(x)\;+b\\\\ F^{-1}(p)&=m(t_{p})+b \end{aligned} \]

Figure out some a way to come up with values for \(p\) to use in the plot
- The values are called plotting positions
- We'll discuss plotting positions at length in a minute
- For now, we'll use what is known as the Hazen plotting position which is expressed as

\[\hat{p}_i=\frac{i-0.5}{n}\]

Fake Data Vs. Real Data

Fake Data	Real Data

The Figures 6.1-6.5 show probability plots using ideal data (aka cheater data) where every observation falls on the line
- This will never happen with real data
- If you ever present a probability plot that looks like these figures - I'll know your're lying

6.2.1 - Linearizing The Exponential CDF

Steps to Create An exponential Probability Plot

Sort the observations from smallest to largest \(t_1, t_2,...,t_n\)
Linearize the exponetial CDF
- For \(T\sim EXP(\theta,\gamma)\) the CDF is \[F(t_{p})=p=1-\exp\left(-\frac{t_{p}-\gamma}{\theta}\right),\quad t_{p}\in [0,\infty)\]
- Inverting and rearranging the exponential CDF gives the following linear equation \[ \begin{aligned} p&=1-\exp\left(-\frac{t_{p}-\gamma}{\theta}\right)\\ 1-p&=\exp\left(-\frac{t_{p}-\gamma}{\theta}\right)\\ \ln\left[\frac{1}{1-p}\right]&=\frac{t_{p}}{\theta}- \frac{\gamma}{\theta} \end{aligned} \]
Plot the points \(\left(x = t_i,y = \log\left[\frac{1}{1-p_i}\right]\right),\;i=1,...,n\)
If the points roughly fall on a straight line the data may follow an exponential distribution
- \(\theta \equiv\;\) is the slope of this line, i.e. \(\displaystyle m = \frac{1}{\theta}\)
- \(\gamma \equiv\;\) the value of the \(y\) intercept, i.e. \(\displaystyle b=\frac{\gamma}{\theta}\)

Figure 6.1 - Exponential Probability Plot

par(family='serif',font = 2)

p <- seq(.01,.99,.01)
gamma1 <- 0
theta1 <- 50

gamma2 <- 0
theta2 <-200

plot(qexp(p,rate=1/theta1) - gamma1,-log(1-p),
     type='l', las = 1,
     lwd=2, col=1,
     xlim=range(0,800),
     xlab='Time',
     ylab='Standard Exponential Quantile')

lines(qexp(p,rate=1/theta2) - gamma2, -log(1-p),
      lwd=2, col=2)

text(c(100,650),c(4,2.5), 
     c(expression(theta*', '*gamma*' = 50, 0'),
       expression(theta*', '*gamma*' = 200, 0')))
box(lwd=1.25)

6.2.2 - Linearizing The Normal CDF

Steps To Create a Normal Probability Plot

Sort the observations from smallest to largest \(t_1, t_2,...,t_n\)
Linearize the normal CDF
- For \(T\sim NOR(\mu,\sigma)\) the CDF is \[F(t_{p})=p=\Phi_{NOR}\left[\frac{t_{p}-\mu}{\sigma}\right] \;\;t_{p}\in [-\infty,\infty)\]
- Inverting this CDF gives the normal quantile function \[t_p=F^{-1}(p)=\Phi^{-1}_{NOR}(p)\sigma +\mu\]
- Rearranging this expression as a linear equation gives \[\Phi_{NOR}^{-1}(p)=\frac{t_{p}}{\sigma}-\frac{\mu}{\sigma}\]
Plot the points \(\left(x = t_i,y = \Phi_{NOR}^{-1}(p_i)\right),\;i=1,...,n\)
If the points roughly fall on a straight line the data may follow a normal distribution where
- \(\sigma \equiv\;\) the slope of this line, i.e. \(\displaystyle m=\left(\frac{1}{\sigma}\right)\)
- \(\mu \equiv\;\) the \(x\)-value where the line crosses \(\displaystyle \Phi_{NOR}^{-1}(p) = 0\)

Figure 6.2 - Normal Probability Plot

par(family='serif', font=2)

p <- seq(.01,.99,.01)
mean1 <- 40
sd1 <- 10
mean2 <- 80
sd2 <- 5

plot(qnorm(p, mean = mean1, sd = sd1), qnorm(p),
     type='l',
     lwd=2, col=1,
     xlim=range(0,100),
     las = 1,
     xlab='Time',
     ylab='Standard Normal Quantile')

lines(qnorm(p,mean = mean2, sd = sd2), qnorm(p), lwd=2, col=2)
abline(h=0, lty=2)

text(c(15, 85), c(-1,-2), 
     c(expression(mu*', '*sigma*' = 40, 10'),
       expression(mu*', '*sigma*' = 80, 5')))
box(lwd=1.25)

6.2.3 - Linearizing The Lognormal CDF

Steps To Create A Lognormal Probability Plot

Sort the observations from smallest to largest \(t_1, t_2,...,t_n\)
Linearize the lognormal CDF
- For \(T\sim LOGNOR(\mu,\sigma)\) the CDF is \[F(t_{p})=p=\Phi_{NOR}\left[\frac{\log(t_{p})-\mu}{\sigma}\right] \;\;t_{p}\in [0,\infty)\]
- Inverting this CDF gives the lognormal quantile function \[\log[t_p]=F^{-1}(p)=\Phi^{-1}_{NOR}\sigma+\mu\]
- Rearranging this expression as a linear equation gives \[\Phi_{NOR}^{-1}(p)=\frac{\log[t_{p}]}{\sigma}-\frac{\mu}{\sigma}\]
Plot the points \(\left(x = \log[t_i], y =\Phi_{NOR}^{-1}(p_i)\right),\;i=1,...,n\)
If the points roughly fall on a straight line, the data may follow an lognormal distribution
- \(\log[\sigma] \equiv\;\) the slope of this line i.e. \(\displaystyle m = \left(\frac{1}{\sigma}\right)\)
- \(\log[\mu] \equiv\;\) the \(x\)-value where the line crosses \(\displaystyle \Phi_{NOR}^{-1}(p) = 0\)

Figure 6.3 - Lognormal Probability Plot

par(family='serif',font=2)
p <- seq(.01,.99,.01)
meanlog1 <- log(50)
sdlog1 <- 2
meanlog2 <- log(500)
sdlog2 <- 2
meanlog3 <- log(500)
sdlog3 <- 1

plot(qlnorm(p,meanlog = meanlog1, 
              sdlog = sdlog1),
     qnorm(p),
     type='l',
     lwd=2,
     xlim=c(1,1000),
     las = 1,
     xlab='Time',
     ylab='Standard Normal Quantile',
     log='x')

lines(qlnorm(p,meanlog = meanlog2, sdlog = sdlog2), 
      qnorm(p), lwd=2, col=2)

lines(qlnorm(p,meanlog = meanlog3, sdlog = sdlog3), 
      qnorm(p), lwd=2, col=3)
abline(h=0, lty=2)

box(lwd=1.25)

legend('topleft',
       c(expression('exp('*mu*'),'*sigma*' = 50,2'),
         expression('exp('*mu*'),'*sigma*' = 500,2'),
         expression('exp('*mu*'),'*sigma*' = 500,1')),
       lty = 1,col = c(1:3),lwd = 2, bty = 'n')

6.2.2 - Linearizing The Smallest Extreme Value CDF

Steps To Create An SEV Probability Plot

Sort the observations from smallest to largest \(t_1, t_2,...,t_n\)
Linearize the smallest extreme value CDF
- For \(T\sim SEV(\mu,\sigma)\) the CDF is \[F(t_{p})=p=\Phi_{SEV}\left[\frac{t_{p}-\mu}{\sigma}\right] \;\;t_{p}\in [-\infty,\infty)\]
- Inverting this CDF gives the SEV quantile function \[t_p=F^{-1}(p)=\Phi^{-1}_{SEV}(p)\sigma +\mu\]
- Rearranging this expression as a linear equation gives \[\Phi_{SEV}^{-1}(p)=\frac{t_{p}}{\sigma}-\frac{\mu}{\sigma}\]
Plot the points \(\left(x = t_i,y = \Phi_{SEV}^{-1}(p_i)\right),\;i=1,...,n\)
If the points roughly fall on a straight line the data may follow an SEV distribution
- \(\sigma \equiv\;\) the slope of this line i.e. \(\displaystyle m =\left(\frac{1}{\sigma}\right)\)
- \(\mu \equiv\;\) the \(x\)-value where the line crosses \(\displaystyle \Phi_{SEV}^{-1}(p) = 0\)

Figure 6.4 - Smallest Extreme Value Probability Plot

par(family='serif', font=2)

library(SMRD)

p<- seq(.01,.99,.01)
mean1 <- 40
sdev1 <- 10
mean2 <- 80
sdev2 <- 5

plot(qsev(p, location=mean1, scale=sdev1), qsev(p),
     type='l',
     lwd=2, col=1,
     xlim=range(0,100),
     las = 1,
     xlab='Time',
     ylab='Standard SEV Quantile')

lines(qsev(p,location=mean2, scale=sdev2), qsev(p), lwd=2, col=2)
abline(h=0, lty=2)

text(c(15, 85), c(-1,-2), 
     c(expression(mu*', '*sigma*' = 40, 10'),
       expression(mu*', '*sigma*' = 80, 5')))
box(lwd=1.25)

6.2.4 - Linearizing The Weibull CDF

Steps To Create A Weibull\((\beta,\eta)\) Probability Plot

Sort the observations from smallest to largest \(t_1, t_2,...,t_n\)
Linearize the Weibull\((\beta,\eta)\) CDF
- For \(T\sim WEIB(\beta,\eta)\) the CDF is \[F(t_{p})=p=1-\exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right]\]
- Inverting this CDF gives the Weibull quantile function \[\log[t_p]=F^{-1}(p)=\Phi^{-1}_{SEV}(p)\sigma +\mu\]
- Rearranging this expression as a linear equation gives \[\log\left[\log\left[\frac{1}{1-p}\right]\right]=\frac{\log\left[t_{p}\right]}{\beta}-\frac{\log[\eta]}{\beta}\]
Plot the points \(\left(x = \log[t_i],y = \log\left[\log\left[\frac{1}{1-p_i}\right]\right]\right),\;i=1,...,n\)
If the points roughly fall on a straight line the data may follow an Weibull distribution
- \(\beta \equiv\;\) the slope of the line \(\displaystyle m = \left(\frac{1}{\beta}\right)\)
- \(\theta \equiv\;\) the \(x\)-value where the line crosses \(\displaystyle \Phi_{SEV}^{-1}(p) = 0\)

Steps To Create A Weibull\((\mu,\sigma)\) Probability Plot

Sort the observations from smallest to largest \(t_1, t_2,...,t_n\)
Linearize the Weibull\((\mu,\sigma)\) CDF
- For \(T\sim WEIB(\mu,\sigma)\) the CDF is \[F(t_{p})=p=\Phi_{SEV}\left(\frac{\log[t_p]-\mu}{\sigma}\right)\]
- Inverting this CDF gives the Weibull quantile function \[\log[t_p]=F^{-1}(p)=\Phi^{-1}_{SEV}(p)\sigma +\mu\]
- Rearranging this expression as a linear equation gives \[\Phi_{SEV}^{-1}(p)=\frac{\log[t_{p}]}{\sigma}-\frac{\mu}{\sigma}\]
Plot the points \(\left(x = \log[t_i],y = \Phi_{SEV}^{-1}(p_i)\right),\; i = 1,...,n\)
If the points roughly fall on a straight line the data may follow an Weibull distribution
- \(\sigma \equiv\;\) the slope of this line, i.e. \(\displaystyle m = \left(\frac{1}{\sigma}\right)\)
- \(\mu \equiv\;\) the \(x\)-value where the line crosses \(\displaystyle \Phi_{SEV}^{-1}(p) = 0\)

Figure 6.5 - Weibull Probability Plot

par(family='serif',font=2)

library(SMRD)

p<- seq(.01,.99,.01)

plot(qweibull(p,.5,50), qsev(p),
     type='l',
     lwd = 2,
     col = 1,
     xlim = c(1,1000),
     las = 1,
     xlab='Time',
     ylab='Standard SEV Quantile',
     log='x')

lines(qweibull(p,0.5,500), 
      qsev(p), lwd = 2, col = 2)
lines(qweibull(p,1.0,500), 
      qsev(p), lwd = 2, col = 3)
abline(h=0, lty=2)

box(lwd=1.25)
legend(x = 400,
       y = -4.6,
       c(expression(eta*','*beta == ~~50*','*0.5),
         expression(eta*','*beta == 500*','*0.5),
         expression(eta*','*beta == 500*','*1.0)),
       lty = 1,
       col = c(1:3), 
       lwd = 2, 
       bty = 'n',
       y.intersp = 0.75,
       yjust = 0.2,
       xjust = 0.5,
       cex = 0.9,
       seg.len = 1)

6.3 - GRAPHICAL GOODNESS OF FIT

Let's Review

Assessing the adequacy of a distribution's fit for a data set is an important application of probability plots
Probability plots are created by
1. Identifying the points \(\left(t_{i},\hat{F}(t_{i})=\hat{p}_{i}\right), \;\;\;\;i=1,2,...,n\)
2. Plotting these points on axes that have been transformed for a distribution of interest
3. Observe if the data fall approximately on a straight line
We discussed most of this in 6.2 But, what values should have been used for \(\hat{p}\)?
These values are called probability plotting positions

6.4.1 - Criteria For Choosing Plotting Positions

Probability plots require a way to compute plotting positions \(F(t_{i}), i=1,...,n\)

Choosing a plotting position formula depends on the purpose of the plot
- Checking distributional assumptions
- Estimating distribution parameters
The best plotting position formula is a function of
- The assumed underlying model
- The function to be estimated \((t_{p}, h(t), F(t),...)\)
- The type and amount of censoring
Most texts oversimplify the choice of a plotting position formula

Probability Plotting Formula

The generic plotting position formula is

\[ p_{i}=\frac{i-a}{n+1-2a} \]

where
- \(i\) is an index of the ordered observations (smallest \(\rightarrow\)largest)
- \(n\) is the sample size
- \(a\) is the plotting position parameter
- \(p_{i}\) is the value of the plotting position corresponding to time \(t_{i}\)
The value of \(a\) is chosen to produce approximately unbiased estimates for \(F(t_{i})\) for an assumed distribution

Probability Plot Formula Table

Proponent	Year	\(a\)	Formula
Hazen	1914	0.50	\(\displaystyle\frac{i-0.5}{n}\)
Weibull	1939	0	\(\displaystyle\frac{i}{n+1}\)
Blom	1958	0.375	\(\displaystyle\frac{i-0.375}{n+0.25}\)
Gringorten	1963	0.44	\(\displaystyle\frac{i-0.44}{n+0.12}\)
Chegodayev	2000	0.30	\(\displaystyle\frac{i-0.30}{n+0.4}\)
Cunnane	1977	0.40	\(\displaystyle\frac{i-0.40}{n+0.2}\)
Median	1943	0.3175	\(\displaystyle\frac{i-0.3175}{n+0.365}\)

Except for specialized situations, the plotting position formulas used most in practice are

\[ \begin{aligned} \text{Hazen}\; &p_{i}=\frac{i-0.5}{n}\\\\ \text{Chegodayev}\; &p_{i}=\frac{i-0.3}{n+0.4}\;\text{also known as "Median-Ranks"}\\\\ \text{Weibull}\; &p_{i}=\frac{i}{n+1}\\\\ \end{aligned} \]

This text uses the following plotting position formulas

\[ p_{i}=\begin{cases}\frac{i}{n}, &\mbox{Distribution free - (Eq 3.1)}\\\\ \frac{i-0.5}{n}, &\mbox{Assumed distribution}\end{cases} \]

6.4.2 - Choice of Plotting Positions

Censoring considerations for selecting a plotting position formula

Case 1 - Continuous inspection or small inspection intervals
Case 2 - Data With Continuous Inspection And Multiple Right Censoring Events
Case 3 - Interval censoring with large intervals
Case 4 - Arbitrary censoring
- Left censoring
- Right censoring
- Interval censoring
- Overlapping intervals

6.4.2 - Choice of Plotting Positions

Case 1 - Data with continuous inspection and a single censoring event

Since the size of the risk-set \(n\) doesn't change for this case, a logical estimator is

\[ \hat{p}_{i}=\frac{i}{n} \]

Estimator properties
- Increases by \(\frac{1}{n}\) at each failure time \(t_{i}\)
- Underestimates \(F(t)\) prior to the first failure time at \(t_{1}\)
- Overestimates \(F(t)\) after the last failure time at \(t_{n}\)
The SMRD package uses this estimator for nonparametric CDF plots
- plot(data.ld)
- plot(data.ld, distribution = NULL)
A compromise plotting position for addressing this bias uses midpoint of the \(1/n\) jumps

\[ \hat{p}_{i}=\frac{i-0.5}{n}=\frac{1}{2}\left[\hat{F}(t_{(i)})+\hat{F}(t_{(i-1)})\right] \]

The SMRD package uses this comprimise estimator for parametric CDF plots
- plot(data.ld, distribution = "distribution")

SMRD::smrd_app('at7987_data')

6.4.2 - Choice of Plotting Positions

Case 2 - Data with continuous inspection and multiple right censoring events

For this case, each failure can result in a jump larger that \(1/n\)
- Censored observations between failures will reduce \(n\)
- Furthermore, \(\frac{i-0.5}{n}\ne\frac{1}{2}\left[\hat{F}(t_{(i)})+\hat{F}(t_{(i-1)})\right]\)
For this case, we use the estimated values \(\hat{F}(t_{(i)})\) directly as

\[ \hat{p_{i}}=\frac{1}{2}\left[\hat{F}(t_{(i)})+\hat{F}(t_{(i-1)})\right] \]

Where Equations 3.6 & 3.7 are used to compute
- \(\hat{F}(t_{(i)})\)
- \(\hat{F}(t_{(i-1)})\)

Example 6.3 - ShockAbsorber

Recall from Eq. 3.7

\[ \begin{aligned} \hat{F}(t_{i})&=1-\prod_{j=1}^i \left[1-\hat{p}_{j}\right]\\\\ &=1-\prod_{j=1}^i \left[1-\frac{d_j}{n_j}\right] \end{aligned} \]

Therefore,

\[ \begin{aligned} p_i=\frac{1}{2}\left[1-\prod_{j=1}^i \left[1-\frac{d_j}{n_j}\right]+1-\prod_{j=1}^{i-1} \left[1-\frac{d_j}{n_j}\right]\right] \end{aligned} \]

SMRD::smrd_app('shockabsorber_data6')

6.4.2 - Choice of Plotting Positions

Case 3 - Data with Interval censoring

In this case, failures "occur" at the end of an interval when discovered by inspection
\(\hat{F}(t)\) defined as the upper endpoint of an interval in which failures occurred
For this case, the plotting position is

\[ p_{i}=\hat{F}(t_{(i)}) \]

Where
- \(E\left[\hat{F}(t_{(i)})\right]=F(t_{(i)})\) when there are no losses due censoring
- \(E\left[\hat{F}(t_{(i)})\right]\approx F(t_{(i)})\) when multiple censoring does occur

Example 6.4 - Heat Exchanger

SMRD::smrd_app('heatexchanger_data6')

6.4.2 - Choice of Plotting Positions

Case 4 - Arbitrary censoring

In this case the data can consist of a mixture of
- Left censored observations
- Right censored observations
- Interval censored observations
- Exact failures
For arbitrarily censored data the plotting positions \(\hat{F}(t)\) are computed using the formulas for the other cases described above

Example 6.5 - Turbine Wheel

SMRD::smrd_app('turbine_data6')

6.6.2 - Reasons For Bends In Probability Plots

What if your probability plot looks like this?

par(family = 'serif', font = 2)

Bleed.ld <- 
frame.to.ld(bleed,
            response.column = 1, 
            censor.column = 2, 
            case.weight.column = 3,
            x.columns = c(4),
            time.units = 'Hours')

plot(Bleed.ld,
     distribution = NULL)

Sharp bends or departures from linearity could indicate the existence of
- Multiple failure modes
- Different usage patterns in the sample
- An unexpected or unaccounted stressor
For this reason reliability data should include
- The failure mode(s) observed
- Variations in the usage environment
- Variations in the amount of usage

Example 6.9 - Bleed System Failure

SMRD::smrd_app('bleed_data6')

Statistical Methods for Reliability Data

Chapter 6 - Probability Plotting

OVERVIEW

This chapter explains...

6.1 - INTRODUCTION

Applications Of Probability Plots

6.1.2 - Probability Plotting Scales: Linearizing A CDF

CDF's from different distributions can look similar on plots

6.2 - LINEARIZING LOCATION-SCALE BASED DISTRIBUTIONS

Basic Idea Behind The Process

Fake Data Vs. Real Data

6.2.1 - Linearizing The Exponential CDF

Steps to Create An exponential Probability Plot

Figure 6.1 - Exponential Probability Plot

6.2.2 - Linearizing The Normal CDF

Steps To Create a Normal Probability Plot

Figure 6.2 - Normal Probability Plot

6.2.3 - Linearizing The Lognormal CDF

Steps To Create A Lognormal Probability Plot

Figure 6.3 - Lognormal Probability Plot

6.2.2 - Linearizing The Smallest Extreme Value CDF

Steps To Create An SEV Probability Plot

Figure 6.4 - Smallest Extreme Value Probability Plot

6.2.4 - Linearizing The Weibull CDF

Steps To Create A Weibull\((\beta,\eta)\) Probability Plot

Steps To Create A Weibull\((\mu,\sigma)\) Probability Plot

Figure 6.5 - Weibull Probability Plot

6.3 - GRAPHICAL GOODNESS OF FIT

Let's Review

6.4.1 - Criteria For Choosing Plotting Positions

Probability plots require a way to compute plotting positions \(F(t_{i}), i=1,...,n\)

Probability Plotting Formula

Probability Plot Formula Table

6.4.2 - Choice of Plotting Positions

Censoring considerations for selecting a plotting position formula

6.4.2 - Choice of Plotting Positions

Case 1 - Data with continuous inspection and a single censoring event

6.4.2 - Choice of Plotting Positions

Case 2 - Data with continuous inspection and multiple right censoring events

Example 6.3 - ShockAbsorber

6.4.2 - Choice of Plotting Positions

Case 3 - Data with Interval censoring

Example 6.4 - Heat Exchanger

6.4.2 - Choice of Plotting Positions

Case 4 - Arbitrary censoring

Example 6.5 - Turbine Wheel

6.6.2 - Reasons For Bends In Probability Plots

What if your probability plot looks like this?

Example 6.9 - Bleed System Failure