W. Q. Meeker, L. A. Escobar, and J. K. Freels
16 July 2018
Important ideas behind parametric models in the analysis of reliability data
Motivation for important functions of model parameters that are of interest in reliability studies
The location-scale family of probability distributions
Properties and importance of the exponential distribution
Properties and importance of the log-location-scale distributons such as the Weibull, lognormal, and loglogistic distributions
How to generate pseudorandom data from a specified distribition (such random data are used in simulation evaluations in subsequent chapters)
Can simplify the process of computing quantities of interest compared to nonparametric methods
Can be described concisely, by listing a few parameters rather that the entire curve
Can enable extrapolation in the upper or lower tails of the distribution
Can provide smooth estimates of failure-time distributions
Can and should both be used together with nonparametric techniques to analyze a data set
The CDF \(\rightarrow F(t|\theta)\)
Failure probability for a single unit at or before time \(t\)
Fraction of units from a population that have failed at or before time \(t\)
The quantile value \(\rightarrow t_{p}=F^{-1}(p|\theta)\)
Time at which the failure probability for a single unit is \(p\)
Time at which \(p\) units are likely to have failed
The hazard function \(\rightarrow h(t|\theta)=f(t|\theta)/S(t|\theta)\)
Conditional failure probability in the interval \([t, t+\Delta t)\)
Time at which \(p\) units are likely to have failed
The mean lifetime \(\;\;\;\;\;\;\hspace{1pt}\rightarrow \; E[T]=\int_{0}^{\infty}tf(t|\theta)dt=MTTF\)
The lifetime variance \(\;\;\;\rightarrow \; Var[T]=\int_{0}^{\infty}\left(t-E[T]\right)^{2}f(t|\theta)dt\)
The location_scale app from the teachingApps package illustrates properties of several location-scale distributions
Run the app by pasting the code below in the R console
teachingApps::teachingApp('location_scale')The log_location_scale app from the teachingApps package illustrates properties of several log-location-scale distributions
Run the app by pasting the code below in the R console
teachingApps::teachingApp('log_location_scale')Several algorithms exist to generate PRN's
These algorithms are classified by
Randomness - no algorithm generates truly random numbers, but some generate numbers that are more random than others.
Periodicity - if enough numbers are generated, every algorithm will eventually show a cyclical pattern. The larger the period, the more numbers that can be generated without developing this pattern
Speed - how quickly can an algorithm generate 1 million random numbers
Specific 'brands' of PRNG
Linear Congruential
Combined Linear Congruential
Inversive Linear Congruential
Multiply-With-Carry
Lehman
Mersenne Twister
Recall, R can compute values for any available probability distribution using the standard set of four functions
These functions are differentiated by the first letter of the function name
Thus, using the normal distribution as an example:
pnorm(q,mean = 0, sd = 1) returns the value of CDF of a \(NOR(0,1)\) distribution at \(q\)
dnorm(x,mean = 0, sd = 1) returns the value of PDF of a \(NOR(0,1)\) distribution at \(x\)
qnorm(p,mean = 0, sd = 1) returns the value of quantile function (inverse CDF) of a \(NOR(0,1)\) distribution at \(p\)
rnorm(n,mean = 0, sd = 1) returns \(n\) random observations from a \(NOR(0,1)\) distribution
PRNG's require a seed value to establish a starting point for how the random numbers will be generated
The function set.seed( ) can be used to set the seed value in R
Setting the seed value allows you to create 'repeatable' random numbers
I often use randomly generated data to discuss an example
The matrix \(\bar{A}\) below is one example of this
\[\bar{A}=\left[\begin{array}{rrrr} 17 & 40 & 37 & 38 \\ 49 & 42 & 31 & 29 \\ 33 & 3 & 10 & 12 \\ 50 & 18 & 6 & 23 \\ \end{array} \right]\]
If I say 'the diagonal elements in \(\bar{A}\) are \(30,12,31,16\)' it would be confusing if the actual elements were different
By setting the seed value I know what values will be generated
\[ \bar{A} = \left[\begin{array}{rrrr} 30 & 38 & 40 & 4 \\ 1 & 12 & 3 & 36 \\ 15 & 32 & 31 & 48 \\ 14 & 39 & 45 & 16 \\ \end{array} \right] \]
Eight observations from a \(NOR(0.75, 1.5)\) distribution set.seed(NULL)
Eight observations from a \(NOR(0.75, 1.5)\) distribution set.seed(NULL)
Eight observations from a \(NOR(0.75, 1.5)\) distribution set.seed(42)
Eight observations from a \(NOR(0.75, 1.5)\) distribution set.seed(42)
The table below shows the four sets of PRN's that were generated above
The observations in Set 1 and Set 2 are different because no seed value was set
The observations in Set 3 and Set 4 are the same because they were generated using the same seed value
| set1 | set2 | set3 | set4 |
|---|---|---|---|
| 0.2251 | 0.1583 | 0.9075 | 0.9075 |
| 1.0119 | 1.6559 | 1.8055 | 1.8055 |
| 1.3853 | 1.6727 | 1.8368 | 1.8368 |
| 2.7658 | 2.7740 | 3.6499 | 3.6499 |
| 4.0767 | 4.4050 | 3.8822 | 3.8822 |
| 7.4931 | 9.1429 | 5.4701 | 5.4701 |
| 25.2562 | 25.5608 | 16.5509 | 16.5509 |
| 36.6162 | 63.8440 | 20.4357 | 20.4357 |
R simulates observations from contiuous distributions in the following way
Create a vector of observations from a \(UNIF(0,1)\) distribution (These observations serve as values for \(F(t_p)= p\))
Select a distribution and set of parameter values to simulate from
Invert the CDF to find the quantile function \(t_{p}=F^{-1}(p)\)
Substitute the \(UNIF(0,1)\) observations for \(p\) and the selected parameter values and solve for \(t_p\)
The resulting values for \(t_p\) will be PRN's from the selected distribution
Step 1 - Create a vector of observations from a \(UNIF(0,1)\)
The code below will return \(n = 10\) observations from a \(UNIF(0,1)\) distribution
These values represent results from the CDF of an arbitrary distribution
Note: I did not set a seed here - if you paste this code into your R console you will likely generate different values
probs <- runif(10,0,1)
probs [1] 0.97822643 0.11748736 0.47499708 0.56033275 0.90403139 0.13871017
[7] 0.98889173 0.94666823 0.08243756 0.51421178
Step 2 - Select a distribution and parameter values to simulate from
We've already selected the exponential distribution with scale parameter \(\theta = 2\)
The CDF for this specific distribution were simulating from is expressed as
\[ F(t|\theta=2)=1-\exp\left[-\frac{t}{2}\right] \]
Step 3 - Invert the CDF to determine the quantile function \(t_{p}=F^{-1}(p)\)
The quantile function is found by inverting the CDF (if possible), solving for \(t_p\)
For these distributions we have to solve for \(t_p\) numerically
Inverting the CDF of the \(EXP(2)\) distribution results in a quantile function expressed as
\[ t_{p}=-\ln[1-p]\times2 \]
Step 4 - Substitute the values generated from the \(UNIF(0,1)\) distribution into the quantile function
The only unknown in the quantile function above is the value \(p\)
Substituting the values we generated from the \(UNIF(0,1)\) distribution into this quantile function will return the values of \(t_p\) that correspond to each \(p\) value
The first line in the code chunk below creates an R function that computes \(t_p\) values from and exponential distribution for any specified value of \(p \text{and} \theta\)
Substituting the values \(p=\)probs and \(\theta=2\) in this function returns \(10\) simulated observations from the \(EXP(2)\) distribution
# Create function
t_p.exp <- function(p,theta) -log(1-p)*theta
# Evaluate function
t_p.exp(p = probs, theta = 2) [1] 7.6541167 0.2499643 1.2887029 1.6434742 4.6874682 0.2986484 9.0001306
[8] 5.8624462 0.1720693 1.4439650
Step 5 - Check the resulting values for \(t_p\)
The values returned from this process are PRN's from the selected distribution
We can compare our simulated values to a plot of the CDF for the \(EXP(2)\) distribution
In this example, only \(10\) observations were simulated, so the fit may be poor
Simulating more observations (say \(1000\)) should reduce the sampling error and result in a better fit
The left plot in the Figure below shows the result for \(n = 10\) observations, the right plot shows the result for \(n = 1000\) observations
The smooth black line in each plot is the true shape of the \(EXP(2)\) distribution
The step plot is a nonparametric plot of the empirical CDF of the data
par(family = "serif", bg = NA, mfrow = c(1,2))
probs10 <- runif(10,0,1)
probs1000 <- runif(1000,0,1)
t_p.exp2_10 <- t_p.exp(p = probs10, theta = 2)
t_p.exp2_1000 <- t_p.exp(p = probs1000, theta = 2)
plot(ecdf(t_p.exp2_10),
main = "10 observations",
col = 'red')
curve(pexp(x,1/2), add = TRUE, lwd = 2)
plot(ecdf(t_p.exp2_1000),
main = "1000 observations",
col = 'blue')
curve(pexp(x,1/2), add = TRUE, lwd = 2)The probability integral transform provides an efficient way to simulate random observations from many distributions
However, these observations assume that we have complete data
Suppose we wanted to generate \(n\) censored observations from a distribution \(F(t)\)
When planning a test we choose when the test will end beforehand
After a specified time has elapsed
After a specified number of failures have been observed
Say we know that observation \(i-1\) occurs before censoring -- what is the probability that observation \(i\) also occurs before censoring?
This can be accomplished using pseudorandom uniform order statistics as follows:
Suppose \(U_i \sim UNIF(0,1)\) and \(U_{(i)}\) is the \(i^{th}\) order statistic
For an arbitrary value \(u\) we want to know
\[ \Pr \left [ U_{(i)}\leq u|U_{(i-1)}=u_{(i-1)}\right ]=1 - \left [\frac{1-u } { 1-u_{(i-1)} } \right ]^{(n-i+1)}, \quad u \ge u_{(i-1)} \]
\[ \begin{eqnarray*} U_{(1)}&=& 1-[1-U_{(0)}]\times(1-U_{1})^{1/n} \\ U_{(2)}&=& 1-[1-U_{(1)}]\times (1-U_{2})^{1/(n-1)} \\ & \vdots & \\ U_{(r)}&=& 1-[1-U_{(r-1)}] \times (1-U_{r})^{1/(n-r+1)} \end{eqnarray*} \]
n = 10
obs <- runif(n,0,1) ; obs [1] 0.93739662 0.63554887 0.48474653 0.08589623 0.94491319 0.34671377
[7] 0.36934622 0.62414874 0.72740583 0.19792811
obs <- sort(obs, decreasing = F) ; obs [1] 0.08589623 0.19792811 0.34671377 0.36934622 0.48474653 0.62414874
[7] 0.63554887 0.72740583 0.93739662 0.94491319
ranks <- rank(obs) ; ranks [1] 1 2 3 4 5 6 7 8 9 10
U.order <- double(length(obs)) ; U.order [1] 0 0 0 0 0 0 0 0 0 0
For Type II (failure censored data) compute the order statistics for the \(r\) failures
For \(r+1,\ldots,n\), \(U_{r+1}\ldots,U_{n} = U_{r}\)
The function defined below can be used to produce the psedorandom order statistics
PRUOS <- function(n, r = NULL){
`if`(is.null(r),
r <- n,
if(r > n) stop("r value greater than in_vec length - dummy"))
in_vec <- sort(runif(n,0,1))
out_vec <- double(r)
for(i in 1:r) {
`if`(i == 1,
out_vec[i] <- 1 - (1 - in_vec[i])**(1 / (n - i + 1)),
out_vec[i] <- 1 - (1 - out_vec[i - 1]) * (1 - in_vec[i])**(1 / (n - i + 1)))
}
if(r < n) out_vec[(r + 1):n] <- out_vec[r]
df <- data.frame(probs = out_vec,
censor = rep(c('failed','right'), c(r,n - r)),
stringsAsFactors = F)
return(df)
}
# Generate order statistics for n=25 systems with r=4 failures
n = 25 ; r = 4
samp1 <- PRUOS(n = n, r = r)
# using these order statistics to generate values from
# a Weibull(2.25, 10) distribution
samp1$times <- qweibull(samp1$probs, shape = 2.25, scale = 10)
# Compare values to not using order statistics
samp2 <- data.frame(probs = sort(runif(n)),
censor = 'failed',
stringsAsFactors = F)
samp2$times <- qweibull(samp2$probs, shape = 2.25, scale = 10)
samp2$times[(r + 1):n] <- samp2$times[r]
samp2$censor[(r + 1):n] <- 'right'
dt1 <- DT::datatable(cbind(samp1,samp2), rownames = F)
DT::formatRound(dt1,
digits = 5,
columns = which(sapply(dt1$x$data, is.numeric)))# Generate order statistics for n=25 systems with r=25 failures
n = 25 ; r = 25
samp3 <- PRUOS(n = n, r = r)
# using these order statistics to generate values from
# a Weibull(2.25, 10) distribution
samp3$times <- qweibull(samp3$probs, shape = 2.25, scale = 10)
# Specify a censoring time and cap all values at this time
t_c <- 12.5
samp3$censor[samp3$times >= t_c] <- 'right'
samp3$times[samp3$censor == 'right'] <- max(samp3$times[samp3$times <= t_c])
# Compare values to not using order statistics
samp4 <- data.frame(probs = sort(runif(n)),
censor = 'failed',
stringsAsFactors = F)
samp4$times <- qweibull(samp4$probs, shape = 2.25, scale = 10)
samp4$censor[samp4$times >= t_c] <- 'right'
samp4$times[samp4$censor == 'right'] <- max(samp4$times[samp4$times <= t_c])
dt2 <- DT::datatable(cbind(samp3,samp4), rownames = F)
DT::formatRound(dt2,
digits = 5,
columns = which(sapply(dt2$x$data, is.numeric)))