W. Q. Meeker and L. A. Escobar
16 June 2020
Likelihood for a parametric model using discrete data
Likelihood for samples containing right and left censored observations
Use of parametric likelihood as a tool for data analysis and inference about a single population or process
The use of likelihood and normal-approximation confidence intervals for model parameters and other quantities of interest
The density approximation to the likelihood for observations reported as exact failures
If we specify a probability distribution and values for parameter vector \(\underline{\theta}\), the pdf \(f(\underline{x}|\underline{\theta})\)…
Represents the probability (probability density, actually) of observing data \(\underline{x}=x_1,...,x_n, \;\;n\in[1,\infty)\) from a specified probability distribution
Is a function of \(\underline{x}\) assuming \(\underline{\theta}=\theta_{1},\theta_{2},...\) are known (fixed)
Describes what observations \(\underline{x}\) are most likely to be produced by a distribution with parameters \(\underline{\theta}\)
If we specify a probability distribution and observe data \(\underline{x}\), the likelihood function \(\mathscr{L}(\underline{\theta}|\underline{x})\)…
Represents a relative measure to determine which values of \(\underline{\theta}\) are more likely to have produced the data \(\underline{x}=x_1,...,x_n, \;\;n\in[1,\infty)\)
Is a function of \(\underline{\theta}\) assuming \(\underline{x}=x_{1},...,x_{n}\) has already been observed
\[ \sqrt{n}\left(\hat{\theta}_{_{MLE}}-\theta\right)\xrightarrow{L}N\left(0,\mathscr{I}^{-1}_{\theta}\right) \]
\[ \mathscr{I}_{\theta}=-E\left[\frac{\partial^2\log\mathscr{L}(\underline{\theta}|\underline{x})}{\partial\underline{\theta}\,(\partial\underline{\theta})^T}\right]=-E\left[\frac{\partial^2\log\mathscr{L}(\underline{\theta}|\underline{x})}{\partial\theta_i\partial\theta_j}\right] \]
Functions of MLEs are also MLEs and are asymptotically normally distributed
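As an illustration of this asymptotic normality, here is a minimal simulation sketch of ours (not from the text): for \(x\sim EXP(\theta)\) the MLE is the sample mean, with approximate sampling distribution \(NOR(\theta,\theta^2/n)\).

```r
# Simulation sketch: sampling distribution of the MLE for EXP(theta = 600)
set.seed(1)
theta <- 600
theta.hat <- replicate(5000, mean(rexp(200, rate = 1 / theta)))
hist(theta.hat, breaks = 50, freq = FALSE,
     main = "Sampling distribution of the MLE (n = 200)",
     xlab = expression(hat(theta)))
curve(dnorm(x, mean = theta, sd = theta / sqrt(200)), add = TRUE, lwd = 2)
```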
Background
Berkson investigated the time between \(\alpha\)-particle emissions from Am-241
Physics suggests that the interarrival times of the \(\alpha\) particles are \(iid\; EXP(\theta)\)
Data set
The data are \(10,220\) observed interarrival times (measured in units of \(1/5000\) seconds)
The times have been “binned” into intervals as shown in Table 7.1
To illustrate the effect of sample size, random samples of size \(n=20, 200,\;\&\;2000\) were drawn from the original data set; these samples are shown in Table 7.1
Figure 7.1 & Table 7.1
\[ \mathscr{L}(\underline{\theta}|\underline{x})=\prod_{i=1}^{n}\mathscr{L}_{i}(\underline{\theta}|x_i)=f(\underline{x}|\underline{\theta})=\prod_{i=1}^{n}f(x_{i}|\underline{\theta}),\;\;\text{if the}\;x_{i}\;\text{are}\;iid \]
The value of the likelihood function \(\mathscr{L}(\underline{\theta}|\underline{x})\) depends on
The assumed parametric model
The observed data
The total likelihood is the product of the contributions from every observation
For a given observation \(x_i\), the model assigning \(x_i\) the highest probability (density) makes the greatest contribution to the total likelihood
For a single observation, the model making the greatest contribution to the total likelihood may not be the correct model
As the number of observations increases, more information is obtained and it becomes easier to identify which model best fits the data
The shiny app below shows how increasing the number of observations makes it easier to tell which model best fits a data set
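As a static complement to the app, here is a minimal simulation sketch of ours (an illustration of the idea, not from the text): the total log-likelihood gap between the true model and a competitor grows with \(n\), so the models become easier to distinguish.

```r
# Data simulated from EXP(600); compare total log-likelihoods under the
# true model EXP(600) and a competitor EXP(300). The gap grows roughly
# linearly in n, so discrimination improves with sample size.
set.seed(1)
x <- rexp(2000, rate = 1 / 600)
for (n in c(20, 200, 2000)) {
  gap <- sum(dexp(x[1:n], rate = 1 / 600, log = TRUE)) -
         sum(dexp(x[1:n], rate = 1 / 300, log = TRUE))
  cat("n =", n, "  log-likelihood gap =", round(gap, 1), "\n")
}
```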
\[ \mathscr{L}_{i}(\underline{\theta}|t_{i})=\begin{cases} S(t_{i}) &\mbox{for a right censored observation}\\\\F(t_{i}) &\mbox{for a left censored observation}\\\\F(t_{i})-F(t_{i-1}) &\mbox{for an interval censored observation}\\\\\lim\limits_{\Delta_i\rightarrow 0^{+}} \frac{F(t_{i})-F(t_{i}-\Delta_{i})}{\Delta_{i}} &\mbox{for an "exact" observation}\end{cases} \]
\[ \begin{aligned} \mathscr{L}(\underline{\theta}|\underline{x})&=C\prod_{i=1}^{n} \mathscr{L}_{i}(\underline{\theta}|x_i)\\ &=C\prod_{i=1}^{m+1}[F(t_{i})]^{l_{i}}[F(t_{i})-F(t_{i-1})]^{d_{i}}[1-F(t_{i})]^{r_{i}} \end{aligned} \]
where
\(l_i=1\) if \(t_i\) is a left censored observation (0 otherwise)
\(d_i=1\) if \(t_i\) is an interval censored observation (0 otherwise)
\(r_i=1\) if \(t_i\) is a right censored observation (0 otherwise)
\(n = \sum_{j=1}^{m+1}(l_{j}+d_{j}+r_{j})\)
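These contributions are easy to code directly; a minimal sketch for the exponential distribution (the function loglik_i() is ours, not part of SMRD):

```r
# Log-likelihood contribution of one observation under EXP(theta).
# type: "exact", "right", "left", or "interval"; an interval-censored
# observation lies in (t1, t2].
loglik_i <- function(theta, t1, t2 = NA, type = "exact") {
  F <- function(t) pexp(t, rate = 1 / theta)
  switch(type,
         exact    = dexp(t1, rate = 1 / theta, log = TRUE),  # density approx.
         right    = log(1 - F(t1)),                          # S(t1)
         left     = log(F(t1)),                              # F(t1)
         interval = log(F(t2) - F(t1)))                      # F(t2) - F(t1)
}
loglik_i(600, 100, 300, type = "interval")  # e.g., one binned observation
```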
Berkson modeled the times between \(\alpha\)-particle arrivals using an \(EXP(\theta)\) distribution.
This example demonstrates the effect of sample size on the value of \(\hat{\theta}_{_{MLE}}\)
The original data set contained \(10,220\) observations binned into \(m+1=8\) time intervals
The random samples of size \(n = 20, 200, 2000\) that were generated from the original data are shown below
Note that all four of these data sets are available in the SMRD package and may be called using
berkson20
berkson200
berkson2000
berkson10220
data <- matrix(c(3,7,4,1,3,2,0,0,
41,44,24,32,29,21,9,0,
292,494,332,236,261,308,73,4,
1609,2424,1770,1306,1213,1528,354,16),
nrow = 4,
byrow = TRUE)
rownames(data) <- c("n = 20","n = 200","n = 2000","n = 10220")
           [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
n = 20 3 7 4 1 3 2 0 0
n = 200 41 44 24 32 29 21 9 0
n = 2000 292 494 332 236 261 308 73 4
n = 10220 1609 2424 1770 1306 1213 1528 354 16
Regardless of which sample size we use, the form of the likelihood function will be the same. The form would differ only if
We chose to model the data with a distribution other than the exponential
We chose to inspect the samples more (less) often, resulting in a higher (lower) number of intervals
The form of the likelihood function for these data is
\[ \begin{aligned} \mathscr{L}(\underline{\theta}|\underline{x})&=C\prod_{j=1}^{8}[F(t_{j}|\theta)-F(t_{j-1}|\theta)]^{d_{j}}\\ &=C\prod_{j=1}^{8}\left[\left(1-\exp\left(-\frac{t_{j}}{\theta}\right)\right)-\left(1-\exp\left(-\frac{t_{j-1}}{\theta}\right)\right)\right]^{d_{j}}\\ &=C\prod_{j=1}^{8}\left[\exp\left(-\frac{t_{j-1}}{\theta}\right)-\exp\left(-\frac{t_{j}}{\theta}\right)\right]^{d_{j}} \end{aligned} \]
In R, this likelihood function may be defined as shown below
Where
theta represents the exponential parameter \(\theta\) whose value we wish to estimate
d is a vector of length \(8\) holding the number of observations in each of the eight time intervals
Like7.3 <- function(theta,d) {
F <- log(exp( -0/theta)-exp( -100/theta))*d[1]+
log(exp( -100/theta)-exp( -300/theta))*d[2]+
log(exp( -300/theta)-exp( -500/theta))*d[3]+
log(exp( -500/theta)-exp( -700/theta))*d[4]+
log(exp( -700/theta)-exp(-1000/theta))*d[5]+
log(exp(-1000/theta)-exp(-2000/theta))*d[6]+
log(exp(-2000/theta)-exp(-4000/theta))*d[7]+
log(exp(-4000/theta)-exp( -Inf/theta))*d[8]
return(-F) ### Remember: we're minimizing the negative of our function
}
We'll use nlminb to find the value of theta that minimizes the negative log-likelihood Like7.3 (and thereby maximizes the likelihood) for each of the four data sets
For each sample, a start value of 400 was used; this value was chosen by trial and error
Choosing start values that are too far from the optimum can sometimes cause convergence failures
Note that the upper and lower arguments were used to limit the range of values in which nlminb will search
Verify that the values returned by nlminb below for \(\hat{\theta}_{_{MLE}}\) match those shown in the top of Table 7.2
P.1 <- nlminb(start = 400, obj = Like7.3, d = data[1,], lower = 100, upper = 800)$par
P.2 <- nlminb(start = 400, obj = Like7.3, d = data[2,], lower = 100, upper = 800)$par
P.3 <- nlminb(start = 400, obj = Like7.3, d = data[3,], lower = 100, upper = 800)$par
P.4 <- nlminb(start = 400, obj = Like7.3, d = data[4,], lower = 100, upper = 800)$par
MLE <- data.frame(P.1,P.2,P.3,P.4)
colnames(MLE) <- c("n = 20","n = 200","n = 2000","n = 10220")
rownames(MLE) <- c("MLE")
       n = 20  n = 200 n = 2000 n = 10220
MLE 440.1711 572.2742 612.7727 596.3443
N.samples <- 200
berkson <- switch(as.character(N.samples),
"20" = {SMRD::berkson20},
"200" = {SMRD::berkson200},
"2000" = {SMRD::berkson2000},
"10220" = {SMRD::berkson10220},
stop('This number is no good'))
berkson.ld <- frame.to.ld(berkson,
response.column = c(1,2),
censor.column = 3,
case.weight.column = 4)
mleprobplot(berkson.ld, distribution = "Exponential")
The likelihood function \(\mathscr{L}(\underline{\theta}|\underline{x})\) provides a useful method for computing approximate CIs for
Parameters
Functions of parameters
We know that \(\mathscr{L}(\underline{\theta}|\underline{x})\) attains its maximum value at \(\hat{\theta}_{_{MLE}}\)
We also know that for \(\theta \ne \hat{\theta}_{_{MLE}}\), \(\mathscr{L}(\theta|\underline{x})<\mathscr{L}(\hat{\theta}_{_{MLE}}|\underline{x})\)
By rearranging this relationship, we can define the relative likelihood function
\[ R(\theta)=\frac{\mathscr{L}(\theta|\underline{x})}{\mathscr{L}(\hat{\theta}_{_{MLE}}|\underline{x})} \in (0,1] \]
\[-2\log[R(\theta)]\le \chi^2_{(1-\alpha;df)}\]
\[R(\theta)\ge\exp\left[-\chi^2_{(1-\alpha;df)}/2\right]\]
Where \(df\) is equal to the number of unknown parameters
\[ \frac{\mathscr{L}(\theta|\underline{x})}{\mathscr{L}(\hat{\theta}_{_{MLE}}|\underline{x})}=\exp\left[\log\mathscr{L}(\theta|\underline{x})-\log\mathscr{L}(\hat{\theta}_{_{MLE}}|\underline{x})\right]\ne\frac{\log\mathscr{L}(\theta|\underline{x})}{\log\mathscr{L}(\hat{\theta}_{_{MLE}}|\underline{x})} \]
For the \(\alpha\)-particle data set \((n=20)\), a \(95\%\) CI for \(\theta\) may be found using the relative likelihood in two ways
Analytically, by solving for the values of \(\theta\) at which \(R(\theta)\) equals the critical value
Numerically, by profiling \(R(\theta)\) over a range of \(\theta\) values
This example walks through both processes
Step 1: Write down the likelihood and log-likelihood for the data
\[ \begin{aligned} \mathscr{L}(\theta|\underline{x})&=C\prod_{j=1}^{8}\left[\exp\left(-\frac{t_{j-1}}{\theta}\right)-\exp\left(-\frac{t_{j}}{\theta}\right)\right]^{d_{j}}\\ \log\mathscr{L}(\theta|\underline{x})&=\log C+\sum_{j=1}^{8}d_{j}\, \log\left[\exp\left(-\frac{t_{j-1}}{\theta}\right)-\exp\left(-\frac{t_{j}}{\theta}\right)\right] \end{aligned} \]
Step 2: Identify the critical values for \(R(\theta)\)
For this example the only unknown parameter is \(\theta,\;(df=1)\)
Therefore, a two-sided \(95\%\) CI for \(\theta\) is the set of all values of \(\theta\) for which
\[ R(\theta)=\frac{\mathscr{L}(\theta|\underline{x})}{\mathscr{L}(\hat{\theta}_{_{MLE}}|\underline{x})}\ge \exp\left[-\chi^{2}_{(1-.05;1)}/2\right]= 0.147 \]
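This critical value can be reproduced directly in R:

```r
exp(-qchisq(0.95, df = 1) / 2)  # critical value of R(theta) for a 95% CI
```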
[1] 0.1465001
Step 3: Using the critical values of \(R(\theta)\), find the associated critical values for \(\theta\)
If \(\hat{\theta}_{_{MLE}}\) is known, the value of \(\mathscr{L}(\hat{\theta}_{_{MLE}}|\underline{x})\) can easily be found.
The critical value for \(R(\theta)\) has already been determined
Thus, the critical values for \(\theta\) may be found by iteratively substituting trial values of \(\theta\) into \[\mathscr{L}(\theta|\underline{x})\ge R(\theta_c) \times\mathscr{L}(\hat{\theta}_{_{MLE}}|\underline{x})\]
For all but the most basic likelihood functions, Step 3 is hard and time consuming
Thankfully, computers can do this for us
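For example, the two CI endpoints can be found with uniroot(); a minimal sketch assuming the Like7.3() function and data matrix defined earlier are in scope:

```r
# Likelihood-based 95% CI for theta (n = 200 sample). The endpoints are
# the two roots of log R(theta) = -qchisq(0.95, 1) / 2.
d2  <- data[2, ]
mle <- nlminb(start = 400, obj = Like7.3, d = d2,
              lower = 100, upper = 800)$par
g <- function(theta) {
  (-Like7.3(theta, d2) + Like7.3(mle, d2)) + qchisq(0.95, df = 1) / 2
}
lower <- uniroot(g, interval = c(100, mle))$root
upper <- uniroot(g, interval = c(mle, 3000))$root
round(c(lower, upper))  # compare with [498, 662] in Table 7.2
```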
Plotting \(R(\theta) \sim \theta\) gives a normalized profile of \(\mathscr{L}(\theta)\) that can be used to compute CI’s for \(\theta\)
The SMRD package makes this easy
par(family = 'serif')
library(SMRD)
berkson200.ld <-
frame.to.ld(berkson200,
response.column = c(1,2),
censor.column = 3,
case.weight.column = 4,
time.units = '1/5000 Seconds')
simple.contour(berkson200.ld,
distribution = 'exponential',
xlim = c(450,800))
Figure 7.2 shows the relative likelihood functions for each of the four samples from the Berkson data
Note how the shape of the relative likelihood function changes with differing amounts of information
What does this mean mathematically?
What does this mean conceptually?
The text talks about the ‘curvature of the likelihood function’ - which is shown graphically in Figure 7.2 and in the code below
We’ll discuss this figure more in a later example
library(ggplot2)
Like<-function(theta,d) {
F<-log(exp( -0/theta)-exp( -100/theta))*d[1]+
log(exp( -100/theta)-exp( -300/theta))*d[2]+
log(exp( -300/theta)-exp( -500/theta))*d[3]+
log(exp( -500/theta)-exp( -700/theta))*d[4]+
log(exp( -700/theta)-exp(-1000/theta))*d[5]+
log(exp(-1000/theta)-exp(-2000/theta))*d[6]+
log(exp(-2000/theta)-exp(-4000/theta))*d[7]+
log(exp(-4000/theta)-exp( -Inf/theta))*d[8]
return(-F)
}
data<-matrix(c(3,7,4,1,3,2,0,0,
41,44,24,32,29,21,9,0,
292,494,332,236,261,308,73,4,
1609,2424,1770,1306,1213,1528,354,16),
nrow = 4,
byrow = TRUE)
mles <- matrix(NA, ncol = 1, nrow = nrow(data))
mles[,1] <- sapply(X = 1:4,
FUN = function(x) {
nlminb(start = 400,
Like,
d = data[x,],
lower = 100,
upper = 800,
control = list(trace=FALSE))$par
})
x<-seq(200,800,.1)
n.1 <- exp(-Like(x,data[1,]) + Like(mles[1,],data[1,]))
n.2 <- exp(-Like(x,data[2,]) + Like(mles[2,],data[2,]))
n.3 <- exp(-Like(x,data[3,]) + Like(mles[3,],data[3,]))
n.4 <- exp(-Like(x,data[4,]) + Like(mles[4,],data[4,]))
Data <- data.frame(X = x,
Y = c(n.1, n.2, n.3, n.4),
N = rep(c(20,200,2000,10220), each = length(x)),
row.names = NULL)
ggplot(Data) +
geom_line(aes(x = X, y = Y, colour = as.factor(N)), size = 1.5) +
theme_bw(base_size = 16) +
ylab(expression('R('*theta*')')) +
xlab(expression(theta)) +
guides(col = guide_legend(title = "# Samples"))
The significance test associated with likelihood-based CIs is known as the likelihood ratio test (LRT)
In the \(\alpha\)-particle example, our uncertainty about the value of \(\theta\) was expressed as all values of \(\theta\) for which
\[ R(\theta|\underline{x})\ge\exp\left[-\chi^2_{(1-\alpha;1)}/2\right] \]
In the likelihood ratio test (LRT)
A single value of \(\theta\) is hypothesized, denoted by \(\theta_{0}\)
The LRT tests whether the observed data provides sufficient evidence to reject the hypothesized value \(\theta_0\) or not
\[ \begin{aligned} H_{0}: \theta &= 650\\ H_{1}: \theta &\ne 650 \end{aligned} \]
\[ \begin{aligned} -2\log\left[R(\theta_{0}|\underline{x})\right]&>\chi^2_{(.95,1)}\\ -2\log\left[\frac{\mathscr{L}(\theta_0|\underline{x})}{\mathscr{L}(\hat{\theta}_{_{MLE}}|\underline{x})}\right]&>3.84 \end{aligned} \]
\[-2\log\left[\frac{\mathscr{L}(650|\underline{x})}{\mathscr{L}(572.3|\underline{x})}\right]=2.94\]
The value of the test statistic \(2.94\) is less than the critical value of \(3.84\)
As this value is not in the rejection region, insufficient evidence exists to reject the null hypothesis that \(\theta=650\) at the \(.05\) significance level.
Meeker notes that the sample with \(n=2000\) observations does provide sufficient evidence to reject the null hypothesis that \(\theta=650\) at the \(.05\) significance level
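Both tests can be sketched in R using the Like7.3() function and data matrix defined earlier (the helper lrt() is ours):

```r
# -2 log R(650) for the n = 200 and n = 2000 samples; reject H0 when the
# statistic exceeds qchisq(0.95, 1) = 3.84
lrt <- function(theta0, d) {
  mle <- nlminb(start = 400, obj = Like7.3, d = d,
                lower = 100, upper = 800)$par
  2 * (Like7.3(theta0, d) - Like7.3(mle, d))
}
lrt(650, data[2, ])  # ~2.94 < 3.84: fail to reject H0
lrt(650, data[3, ])  # > 3.84: reject H0
```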
\[ \left[\underline{\theta},\overline{\theta}\right]=\hat{\theta}\pm z_{(1-\alpha/2)} \hat{se}_{\hat{\theta}} \]
Where
\(\hat{\theta}\) is our point estimate for \(\theta\) using the available data
\(z_{(1-\alpha/2)}\) is the \(p=1-\alpha/2\) quantile of the standard normal distribution \(NOR(0,1)\)
\(\hat{se}_{\hat{\theta}}=\sqrt{\widehat{Var}\left[\hat{\theta}\right]}\)
In general, the variance & covariance values for the parameters are computed from the Fisher Information Matrix
\[ \mathscr{I}_{\theta}=-E\left[\frac{\partial^2\log\mathscr{L}(\underline{\theta}|\underline{x})}{\partial\theta_i\partial\theta_j}\right]=E\left[\mathcal{I}_{\theta}\right] \]
The Fisher Information Matrix
Gives the expected amount of information the data will provide about \(\underline{\theta}\)
Is primarily a theoretical measure
Once data have been observed, the estimated variances & covariances may be computed from the Observed Information Matrix
\[ \mathcal{I}_{\theta}=-\frac{\partial^2\log\mathscr{L}(\underline{\theta}|\underline{x})}{\partial\theta_i\partial\theta_j} \]
\[ \log\mathscr{L}(\theta|\underline{t})=\sum_{j=1}^8 d_j \log\left[\exp\left(-\frac{t_{j-1}}{\theta}\right)-\exp\left(-\frac{t_j}{\theta}\right)\right] \]
The information observed from a data set may be found by substituting any value of \(\theta\) into \(-\frac{\partial^2\log\mathscr{L}(\theta)}{\partial\theta^2}\)
Substituting \(\hat{\theta}_{_{MLE}}\) evaluates the observed information at the ML estimate and is the value of \(\theta\) typically used
Substituting \(\hat{\theta}_{_{MLE}}\) for \(\theta\) in \(-\frac{\partial^2\log\mathscr{L}(\theta)}{\partial\theta^2}\) gives
\[ \mathcal{I}_{\hat{\theta}_{_{MLE}}} = 0.000574 \]
\[ \Sigma_{\hat{\theta}_{_{MLE}}}=\left[\mathcal{I}_{\hat{\theta}_{_{MLE}}}\right]^{-1} \]
For the \(\alpha\)-particle data set, where \(T\sim EXP(\theta)\), \(\mathcal{I}_{\hat{\theta}}\) is a single-element matrix
The inverse of a single-element matrix is found by simply taking the reciprocal, thus
\[ \Sigma_{\hat{\theta}_{_{MLE}}}=\frac{1}{0.000574}=1742.2 \]
Like7.3<-function(theta,d) {
F<-log(exp( -0/theta)-exp( -100/theta))*d[1]+
log(exp( -100/theta)-exp( -300/theta))*d[2]+
log(exp( -300/theta)-exp( -500/theta))*d[3]+
log(exp( -500/theta)-exp( -700/theta))*d[4]+
log(exp( -700/theta)-exp(-1000/theta))*d[5]+
log(exp(-1000/theta)-exp(-2000/theta))*d[6]+
log(exp(-2000/theta)-exp(-4000/theta))*d[7]+
log(exp(-4000/theta)-exp( -Inf/theta))*d[8]
return(-F) }
We use the nlminb function to compute a value for \(\hat{\theta}_{_{MLE}}\) assuming \(n=200\)
[1] 41 44 24 32 29 21  9  0
[1] 572.27
We could also use the optim function and arrive at the same solution
[1] 572.27
The optim output includes a numerical estimate of the Observed Information Matrix (aka the Hessian matrix in some contexts)
optim(par = 400,
fn = Like7.3,
d = data[2,],
lower = 100,
upper = 800,
method = "L-BFGS-B",
hessian = TRUE)$hessian
        [,1]
[1,] 0.00057452
To compute the variance-covariance matrix \(\Sigma_{\hat{\theta}_{MLE}}\), either
Use solve() to invert \(\mathcal{I}_{\hat{\theta}_{MLE}}\), or
Take the reciprocal (only for \(1\times 1\) matrices)
Note: the accuracy of this numerical estimate can decrease as the number of unknown parameters increases
If the gradients are known, it's generally better to supply them in the function call
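A small sketch of this inversion, using the Hessian value computed above (the small difference from the text's 1742.2 comes from rounding the Hessian to 0.000574):

```r
# Invert the (1 x 1) observed information matrix to estimate Var(theta.hat);
# solve() and the reciprocal agree for single-element matrices
I.hat <- matrix(0.00057452, nrow = 1, ncol = 1)
solve(I.hat)        # ~1740.6, the estimated variance
sqrt(solve(I.hat))  # ~41.72, the estimated standard error
```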
berkson200.ld <- frame.to.ld(berkson200,
response.column = c(1,2),
censor.column = 3,
case.weight.column = 4)
berkson200.mlest <- mlest(berkson200.ld, distribution = "Exponential")
berkson200.mlest$mle.table
                         MLE   Std.Err. 95% Lower 95% Upper
mu 6.3496 0.072901 6.2067 6.4925
sigma 1.0000 0.000000 1.0000 1.0000
Exponential (mean) 572.2742 41.719536 496.0786 660.1732
mu sigma
mu 0.0053146 0
sigma 0.0000000 0
In general, the diagonal elements of \(\Sigma_{\hat{\theta}_{_{MLE}}}\) represent the variances of the parameters \(\widehat{Var}\left[\hat{\theta}_1\right],\widehat{Var}\left[\hat{\theta}_2\right],...\)
While the off-diagonal elements represent the covariances \(\widehat{Cov}\left[\hat{\theta}_i,\hat{\theta}_j\right],\; i\ne j\)
For the \(\alpha\)-particle example \(\widehat{Var}\left[\hat{\theta}_{_{MLE}}\right]=1742.2\)
and
\[ \widehat{se}\left[\hat{\theta}_{_{MLE}}\right]=\sqrt{\widehat{Var}\left[\hat{\theta}_{_{MLE}}\right]}=\sqrt{1742.2}=41.72 \]
\[ \frac{\hat{\theta}-\theta}{\hat{se}_{\hat{\theta}}}\sim NOR(0,1) \]
\[ \left[\underline{\theta}, \overline{\theta}\right]=572.3\pm 1.960\times 41.72=\left[ 490,653 \right] \]
where
\(572.3=\hat{\theta}_{_{MLE}}, n=200\)
\(1.960=\Phi^{-1}_{_{NOR}}(p=1-0.05/2)\)
Should work well for large samples \(n>30\)
May give poor results for small samples \(n<30\)
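A minimal sketch reproducing the interval above from the reported estimates:

```r
# Normal-approximation (Wald) 95% CI for theta, n = 200 sample
theta.hat <- 572.2742
se.hat    <- 41.7195
theta.hat + c(-1, 1) * qnorm(0.975) * se.hat  # ~[490.5, 654.0]
```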
An alternative method to compute \(100(1-\alpha)\%\) CIs is based on the normal approximation for \(\log(\hat{\theta})\)
\[ \frac{\log(\hat{\theta})-\log(\theta)}{\hat{se}_{\log(\hat{\theta})}}\sim NOR(0,1) \]
This method has been shown to produce better confidence intervals
Using this alternative method the upper and lower confidence limits are expressed as
\[ \left[\underline{\log(\theta)},\overline{\log(\theta)}\right]=\log(\hat{\theta}) \pm z_{(1-\alpha/2)}\hat{se}_{\log(\hat{\theta})} \]
\[ \left[\underline{\theta},\overline{\theta}\right]=\left[\hat{\theta}/w,\hat{\theta}\times w\right] \]
where
\(w = \exp\left(z_{(1-\alpha/2)}\hat{se}_{\hat{\theta}}/\hat{\theta}\right)\)
\(\hat{se}_{\log(\hat{\theta})}=\hat{se}_{\hat{\theta}}/\hat{\theta}\)
\(\hat{se}_{\log(\hat{\theta})}\) is found from the delta method where
\(\hat{\theta}\equiv\) the estimator for which we already know the variance
\(\log(\hat{\theta})\equiv\) the estimator for which we want to know the variance
Using the delta method gives
\[ \begin{aligned} \widehat{Var}\left[\log(\hat{\theta})\right]&=\left[\frac{\partial\log(\hat{\theta})}{\partial\hat{\theta}}\right]^2 \widehat{Var}\left[\hat{\theta}\right] =\frac{\widehat{Var}\left[\hat{\theta}\right]}{\hat{\theta}^2}\\\\ \text{therefore}\;\;\hat{se}_{\log(\hat{\theta})}&=\sqrt{\frac{\widehat{Var}\left[\hat{\theta}\right]}{\hat{\theta}^2}}=\frac{\hat{se}_{\hat{\theta}}}{\hat{\theta}} \end{aligned} \]
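A minimal sketch applying this result to the \(n = 200\) estimates:

```r
# 95% CI for theta based on the normal approximation for log(theta.hat)
theta.hat <- 572.2742
se.hat    <- 41.7195
w <- exp(qnorm(0.975) * se.hat / theta.hat)
c(theta.hat / w, theta.hat * w)  # ~[496.1, 660.2], matching the table below
```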
MLE Std.Err. 95% Lower 95% Upper
mu 6.3496 0.072901 6.2067 6.4925
sigma 1.0000 0.000000 1.0000 1.0000
Exponential (mean) 572.2742 41.719536 496.0786 660.1732
mu sigma
mu 0.0053146 0
sigma 0.0000000 0
a<-c("596.3","6.084","",rep("[584,608]",3),"168","1.7","",rep("[164,171]",3))
b<-c("612.8","14.13","",rep("[586,641]",2),"[585,640]","163","3.8","",rep("[156,171]",3))
c<-c("572.3","41.72","","[498,662]","[496,660]","[490,653]","175","13","","[151,201]","[152,202]","[149,200]")
d<-c("440.2","101.0","","[289,713]","[281,690]","[242,638]","227","52","","[140,346]","[145,356]","[125,329]")
Tab.7.2<-data.frame(a,b,c,d)
rownames(Tab.7.2)<-c("ML estimate $\\hat{\\theta}$",
"Standard error $\\hat{se}_{\\hat{\\theta}}$",
"Approximate $95\\%$ CI for $\\theta$",
"Based on the likelihood",
"Based on $Z_{\\log(\\hat{\\theta})}\\sim NOR(0,1)$",
"Based on $Z_{\\hat{\\theta}}\\sim NOR(0,1)$",
"ML estimate $\\hat{\\lambda}\\times10^{5}$",
"Standard error $\\hat{se}_{\\hat{\\lambda}\\times10^{5}}$",
"Approximate $95\\%$ CI for $\\lambda \\times 10^{5}$",
"Based on the likelihood ",
"Based on $Z_{\\log(\\hat{\\lambda})}\\sim NOR(0,1)$",
"Based on $Z_{\\hat{\\lambda}} \\sim NOR(0,1)$")
knitr::kable(Tab.7.2,
format = 'markdown',
col.names = c("n = 10220","n = 2000","n = 200","n = 20"),
row.names = T)

|  | n = 10220 | n = 2000 | n = 200 | n = 20 |
|---|---|---|---|---|
| ML estimate \(\hat{\theta}\) | 596.3 | 612.8 | 572.3 | 440.2 |
| Standard error \(\hat{se}_{\hat{\theta}}\) | 6.084 | 14.13 | 41.72 | 101.0 |
| Approximate \(95\%\) CI for \(\theta\) | ||||
| Based on the likelihood | [584,608] | [586,641] | [498,662] | [289,713] |
| Based on \(Z_{\log(\hat{\theta})}\sim NOR(0,1)\) | [584,608] | [586,641] | [496,660] | [281,690] |
| Based on \(Z_{\hat{\theta}}\sim NOR(0,1)\) | [584,608] | [585,640] | [490,653] | [242,638] |
| ML estimate \(\hat{\lambda}\times10^{5}\) | 168 | 163 | 175 | 227 |
| Standard error \(\hat{se}_{\hat{\lambda}\times10^{5}}\) | 1.7 | 3.8 | 13 | 52 |
| Approximate \(95\%\) CI for \(\lambda \times 10^{5}\) | ||||
| Based on the likelihood | [164,171] | [156,171] | [151,201] | [140,346] |
| Based on \(Z_{\log(\hat{\lambda})}\sim NOR(0,1)\) | [164,171] | [156,171] | [152,202] | [145,356] |
| Based on \(Z_{\hat{\lambda}} \sim NOR(0,1)\) | [164,171] | [156,171] | [149,200] | [125,329] |
The rate of occurrence for events distributed \(EXP(\theta)\) is \(\lambda =\frac{1}{\theta}\)
Since \(\lambda = 1/\theta\) is a decreasing function of \(\theta\), we can say that
\[ \underline{\lambda}=1/\overline{\theta} \quad \& \quad \overline{\lambda}=1/\underline{\theta} \]
\[ \lambda \in \left(\underline{\lambda},\overline{\lambda}\right)\;\;\iff \theta \in \left(\underline{\theta},\overline{\theta}\right) \]
\[ \left[\underline{F(t|\theta)},\overline{F(t|\theta)}\right]=\left[F(t|\overline{\theta}),F(t|\underline{\theta})\right] \]
Unlike what was shown in \(\S\) 3.8, these confidence bands are simultaneous since \(\theta\) is the only unknown parameter.
And again
\[ F(t) \in \left(\underline{F(t)},\overline{F(t)}\right)\;\;\iff \theta \in \left(\underline{\theta},\overline{\theta}\right) \]
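A sketch of these transformations using the likelihood-based \(n=200\) interval from Table 7.2:

```r
theta.ci <- c(498, 662)        # likelihood-based 95% CI for theta (n = 200)
1e5 / rev(theta.ci)            # CI for lambda x 10^5: ~[151, 201]
t <- 1000
1 - exp(-t / rev(theta.ci))    # [F(t|theta.upper), F(t|theta.lower)] at t = 1000
```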
\[ \begin{array}{lcccc} \hline & n=10220 & n=2000 & n=200 & n=20 \\ \hline \text{ML estimate } \hat{\lambda}\times10^{5} & 168 & 163 & 175 & 227 \\ \text{Standard error } \hat{se}_{\hat{\lambda}\times10^{5}} & 1.7 & 3.8 & 13 & 52 \\ \hline \text{Approximate } 95\% \text{ CI for } \lambda\times10^{5}\ & & & & \\ \hline \text{Based on the likelihood } & [164,171] & [156,171] & [151,201] & [140,346] \\ \text{Based on } Z_{\log(\hat{\lambda})}\sim NOR(0,1) & [164,171] & [156,171] & [152,202] & [145,356] \\ \text{Based on } Z_{\hat{\lambda}}\sim NOR(0,1) & [164,171] & [156,171] & [149,200] & [125,329] \\ \hline \end{array} \]
Simulation studies have shown that better confidence intervals are produced by
Likelihood methods
Bootstrap methods (Chapter 9)
Normal approximation method based on \(\log(\hat{\theta})\)
Normal approximation method based on \(\hat{\theta}\)
These CI techniques are listed in order of how close their coverage probabilities are to the nominal confidence level
“Better” CI \(\equiv\) coverage probability closer to the nominal confidence level
\[ F(t_i|\theta)-F(t_i-\Delta_i|\theta)\approx f(t_i|\theta)\Delta_i\quad\text{for small}\;\Delta_i>0 \]
This approximation works well for data that
Are right-censored only (no left-censoring or interval-censoring)
Can be adequately modeled by an exponential distribution
Under this situation, \(\theta\) is estimated by
\[ \hat{\theta}=\frac{TTT}{r} \]
where
\(r \equiv\) the number of failures
\(\displaystyle TTT =\sum_{i=1}^n t_{i}\) is the total time on test
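As a sketch (the function name is ours), this estimator is one line of R:

```r
# MLE of theta for exponential data with exact failures and right censoring;
# cens = 1 marks a failure, cens = 0 a censored unit
exp_mle <- function(time, cens) sum(time) / sum(cens)
exp_mle(time = c(10, 25, 40, 40), cens = c(1, 1, 0, 0))  # TTT = 115, r = 2
```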
Consider the lzbearing data set
The first failure occurs at lzbearing$megacycles[1] = 17.88 megacycles.
However, every one of the 23 units in the test has also operated for this long, so the first failure contributes \(23\times 17.88\) to the total time on test; therefore
[1] 411.24
[1] 654.12
TTT <- double(nrow(lzbearing))
for(i in 1:nrow(lzbearing)) {
`if`(i == 1,
TTT[i] <- lzbearing[i,1] * length(lzbearing[23:i,1]),
TTT[i] <- diff(lzbearing[(i-1):i,1]) * length(lzbearing[23:i,1]))
}
plot(x = 1:23,
y = cumsum(TTT),
pch = 16,
col = 2,
cex = 1.2,
las = 1,
ylab = 'TTT',
xlab = '# Failures')
A more generalized implementation is shown below
Computes the TTT plot and Barlow-Proschan's test for censored and uncensored data sets
The code was originally found at http://www.math.ntnu.no/~akerkar/tma4275/TTT-plot.R; however, this link no longer appears to work
TTTplot <- function(time,cens=NULL,show_results = F) {
n <- length(time)
oo <- order(time)
if(is.null(cens)) cens = rep(1,n)
sorttime <- time[oo]
sortcens <- cens[oo]
ttt <- numeric(n)
ttt[1] <- n*sorttime[1]
for(i in 2:n)
{
ttt[i] <- ttt[i-1]+(n-i+1)*(sorttime[i]-sorttime[i-1])
}
ttt <- data.frame(time = sorttime,
cens = sortcens,
rank = 1:n,
"cum total time" = ttt)
colnames(ttt) <- c("time","cens","rank","Cum.Total.Time")#,"i/n","TTT")
TTT <- ttt[ttt$cens==1,]
nfail <- dim(TTT)[1]
TTT <- cbind(TTT[,-3],
TTT$Cum.Total.Time / TTT$Cum.Total.Time[nfail],
(1:nfail) / nfail)
colnames(TTT) <- c("time","cens","Cum.Total.Time","TTT","i/n")
W <- sum(TTT$"Cum.Total.Time"[1:(nfail-1)]) / TTT$"Cum.Total.Time"[nfail]
Z <- (W - (nfail - 1) / 2) / sqrt((nfail-1) / 12)
if(show_results) {
print(TTT)
cat("Barlow-Proschan's test\n")
cat(paste("W=",round(W,4)," k=",nfail,"\n",sep=""))
cat(paste("Z=",
round(Z,4)," p.value=",round(pnorm(-abs(Z)),4),"\n",sep=""))
}
plot(TTT$"i/n",
TTT$TTT,
xlab = "i/n",
ylab = "TTT",
xlim = c(0,1),
ylim = c(0,1),
pch = 19,
main = "TTT plot",
sub = paste("N. failure = ",
nfail," N. censored = ",
n-nfail,sep=""))
lines(c(0,1),
c(0,1),
lty = 2)
invisible(list(TTT = TTT, W = W, Z = Z))
}
TTTplot(time = lzbearing$megacycles)
If the following assumptions are met
The exponential distribution cannot be rejected as an adequate model for a dataset using either graphical or numerical procedures
We have either complete data (exact observation times) or Type-II (failure) censoring
An exact \(100(1-\alpha)\%\) CI can be constructed for \(\theta\) as
\[ \big[\underset{\sim}{\theta},\quad\overset{\sim}{\theta}\big] = \Bigg[\frac{2(TTT)}{\chi^2_{(1-\alpha/2),2r}},\quad\frac{2(TTT)}{\chi^2_{(\alpha/2),2r}}\Bigg] \]
This follows because \[ \frac{2(TTT)}{\theta}\sim\chi^2_{2r} \]
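A minimal helper (the function name is ours) implementing this interval:

```r
# Exact 100(1 - alpha)% CI for theta under Type-II censoring, based on
# the pivot 2 * TTT / theta ~ chi-squared with 2r degrees of freedom
exp_exact_ci <- function(TTT, r, alpha = 0.05) {
  c(lower = 2 * TTT / qchisq(1 - alpha / 2, df = 2 * r),
    upper = 2 * TTT / qchisq(alpha / 2,     df = 2 * r))
}
exp_exact_ci(TTT = 950.88, r = 15)  # ~[40.5, 113.3]; see the example below
```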
Background
A life-test was performed on 25 specimens of a new insulating material
Testing concluded when 15 failures were observed (Type-II test)
The data are shown below (not included in the SMRD package)
t7.13 <- c(1.08,12.2,17.8,19.1,26,27.9,28.2,32.2,35.9,43.5,44,45.2,45.7,46.3,47.8,rep(47.8,10))
c7.13 <- c(rep(1,15),rep(0,10))
df7.13 <- data.frame(times = t7.13,
cens = c7.13)
knitr::kable(df7.13)

| times | cens |
|---|---|
| 1.08 | 1 |
| 12.20 | 1 |
| 17.80 | 1 |
| 19.10 | 1 |
| 26.00 | 1 |
| 27.90 | 1 |
| 28.20 | 1 |
| 32.20 | 1 |
| 35.90 | 1 |
| 43.50 | 1 |
| 44.00 | 1 |
| 45.20 | 1 |
| 45.70 | 1 |
| 46.30 | 1 |
| 47.80 | 1 |
| 47.80 | 0 |
| 47.80 | 0 |
| 47.80 | 0 |
| 47.80 | 0 |
| 47.80 | 0 |
| 47.80 | 0 |
| 47.80 | 0 |
| 47.80 | 0 |
| 47.80 | 0 |
| 47.80 | 0 |
Analysis
Implementing the TTTplot function defined previously (code shown below) results in \(TTT = 950.88\)
Using this value gives an ML estimate for \(\widehat{\theta}_{_{MLE}}= 950.88/15 = 63.392\) and a \(95\%\) CI as
\[ \begin{aligned} \big[\underset{\sim}{\theta},\;\overset{\sim}{\theta}\big] &= \Bigg[\frac{2(TTT)}{\chi^2_{(1-\alpha/2),2r}},\quad\frac{2(TTT)}{\chi^2_{(\alpha/2),2r}}\Bigg]\\\\ &=\Bigg[\frac{2(950.88)}{\chi^2_{(0.975),30}},\quad\frac{2(950.88)}{\chi^2_{(0.025),30}}\Bigg]\\\\ &=\Bigg[\frac{1901.76}{46.98},\quad\frac{1901.76}{16.79}\Bigg]\\\\ &=\big[40.48,\quad 113.26\big] \end{aligned} \]
| time | cens | Cum.Total.Time | TTT | i/n |
|---|---|---|---|---|
| 1.08 | 1 | 27.00 | 0.02839 | 0.06667 |
| 12.20 | 1 | 293.88 | 0.30906 | 0.13333 |
| 17.80 | 1 | 422.68 | 0.44451 | 0.20000 |
| 19.10 | 1 | 451.28 | 0.47459 | 0.26667 |
| 26.00 | 1 | 596.18 | 0.62698 | 0.33333 |
| 27.90 | 1 | 634.18 | 0.66694 | 0.40000 |
| 28.20 | 1 | 639.88 | 0.67293 | 0.46667 |
| 32.20 | 1 | 711.88 | 0.74865 | 0.53333 |
| 35.90 | 1 | 774.78 | 0.81480 | 0.60000 |
| 43.50 | 1 | 896.38 | 0.94268 | 0.66667 |
| 44.00 | 1 | 903.88 | 0.95057 | 0.73333 |
| 45.20 | 1 | 920.68 | 0.96824 | 0.80000 |
| 45.70 | 1 | 927.18 | 0.97508 | 0.86667 |
| 46.30 | 1 | 934.38 | 0.98265 | 0.93333 |
| 47.80 | 1 | 950.88 | 1.00000 | 1.00000 |
[1] 950.88
Data from berkson200
lower upper Status Case.weights
1 100 100 L-Censored 41
2 100 300 Interval 44
3 300 500 Interval 24
4 500 700 Interval 32
5 700 1000 Interval 29
6 1000 2000 Interval 21
7 2000 4000 Interval 9
8 4000 4000 R-Censored 0
Snubber.ld <- frame.to.ld(snubber,
response.column = "cycles",
censor.column = "event",
time.units = "Toaster Cycles",
case.weight.column = "count",
x.columns = "design")
groupm.mleprobplot(Snubber.ld,
distribution ="Lognormal",
group.var = 1,
relationships = "linear")