By: Alton Barbehenn
Adviser: Brad Hartlaub
These are my notes on Hollander, M., Wolfe, D. A., and Chicken, E. (2014), Nonparametric Statistical Methods, Third Edition, New York, NY: John Wiley & Sons, Inc. Chapter 11.
library(NSM3)
library(nortest)
library(survival)
library(km.ci)
We are given \(n\) observations, \(X_1, X_2,\dots, X_n\), such that \(X_i\overset{iid}{\sim}F\), where \(F\) is a continuous CDF. Note that there are methods for discrete CDF’s. Suppose \(F\) is a life distribution (nonnegative), i.e. \(F(a) = 0\) for \(a<0\).
Suppose \(F\) has a corresponding pdf, \(f\). Then we define the failure rate to be given by \[r(x) = \frac{f(x)}{\overline{F}(x)} ,\] where \(\overline{F}(x) > 0\) and \(\overline{F}(x) = 1 - F(x)\). Intuitively, for small \(\delta_x\), \(r(x)\delta_x\) is the probability that a product is alive at time \(x\) but will fail in the interval \((x,x+\delta_x)\). Failure rate can increase, decrease, remain constant, or fluctuate.
We are primarily interested in testing the null hypothesis that the distribution has constant failure rate (is the exponential distribution in the continuous case, geometric in the discrete case).
We say \(F\) is in the increasing failure rate class if it’s failure rate is non-decreasing. Formally, \(r(x)\leq r(y)\ \forall\ x<y\). Here we are testing \(H_0:F(x) = 1-e^{-\lambda x}\), for \(x\geq 0\) and \(F(x) = 0\) otherwise.
The Epstein test for IFR versus the exponential distribution is:
NSM3::epstien
functionLet’s look at the times between failures of a specific Boeing 720’s air conditioning system (Problem 1).
Table11.2 <- c(487, 18, 100, 7, 98, 5, 85, 91, 43, 230, 3, 130)
test <- epstein(x = Table11.2, alternative = "ifr", exact = TRUE)
## E= 4.222051
## p= 0.9077575
Thus we can conclude that the air conditioning system does not have an increasing failure rate.
test <- epstein(x = Table11.2, alternative = "two.sided", exact = TRUE) #WHAAATTTT???? WHY IS THE P-VALUE > 1.0? I BROKE IT!
## E= 4.222051
## p= 1.815515
How about the survival of Guinea Pigs? Here we will test to see if guinea pigs have an increasing failure rate when injected with human TB (Problem 2).
Table11.3 <- c(10,33,44,56,59,72,74,77,92,93,96,100,100,102,105,107,107,108,
108,108,109,112,113,115,116,120,121,122,122,124,130,134,136,139,144,146,
153,159,160,163,163,186,171,172,176,183,195,196,197,202,213,215,216,222,
230,231,240,245,251,253,254,254,278,293,327,342,347,361,402,432,458,555)
test <- epstein(x = Table11.3, alternative = "ifr")
## E*= 5.566958
## p= 1.296124e-08
So guinea pigs experience increasing failure rate when infected with human TB.
We can also test to see if the average failure rate is increasing. This class of life distributions allows for the failure rate to go down, locally, but maintain an average increasing trend. Note that \(IFR\subset IFRA\). Klefsjo provides us with tests for both IFR and IFRA (and DFR and DFRA) where we can choose to simulate the test statistic or not.
Let’s re-run the tests above, but with Klefsjo’s test for IFR.
test <- klefsjo.ifr(x = Table11.2, alternative = "ifr", exact = TRUE)
## A*= -0.7789036
## p= 0.7076182
test <- klefsjo.ifr.mc(x = Table11.2, alternative = "ifr", exact = TRUE)
## A*= -0.7789036
## p= 0.6730769
test <- klefsjo.ifr(x = Table11.3, alternative = "ifr")
## A*= 2.229348
## p= 0.01289539
So we see that both tests for IFR yield the same results as when we used NSM3::epstein
. Also, we notice that the Monte Carlo Simulated test statistics seem pretty close to the exact value. The exact
parameter refers to the use of the distribution of the test statistics (TRUE) or the LSA distribution (FALSE).
Now, because the guinea pigs fall are in IFR, they should also be in IFRA. Let’s also see if the air conditioners are in IFRA.
test <- klefsjo.ifra(x = Table11.2, alternative = "ifra", exact = TRUE)
## B*= -1.088068
## p= 0.8368789
test <- klefsjo.ifra(x = Table11.3, alternative = "ifra")
## B*= 7.380603
## p= 7.878726e-14
So the the guinea pigs are definitely are definitely in IFRA (as theory dictates), but the Boeing air conditioners are not in IFRA either.
The test for New Better than Used (NBU) or New Worse than Used (NWU) versus \(H_0: F\ \text{is exponential}\) is pretty simple. If an item is NBU, then new items are better than used items of any age. Likewise, if an item is NWU, then new items are worse than used items of any age.
Compute \(T = \sum_{i>j>k}\psi(X_{(i)}, X_{(j)} + X_{(k)})\), where \[ \psi(a,b) = \begin{cases} 1, &\text{if}\ a>b \\ \frac{1}{2}, &\text{if}\ a=b \\ 0, &\text{if}\ a<b \\ \end{cases} . \] Noting that the summation is over all \(\frac{n(n-1)(n-2)}{6}\) ordered triplets.
Use this test statistic to make conclusion (the cut-off values are chosen to get the probability of Type 1 Error equal \(\alpha\), see book for details).
Note that \(IFR\subset IFRA\subset NBU\).
In R, the function NSM3::newbet()
can be used to compute the p-value of the LSA test statistic (outputs \(T\), \(T^*\), and the p-value). Also, the function NSM3::nb.mc
can be used to run a Monte Carlo simulation of the Hollander-Proschan test statistic and p-value.
Let’s come back to the guinea pigs in Table 11.3 (Problem 16).
test <- newbet(x = Table11.3)
test
## [[1]]
## [1] 25088
##
## [[2]]
## [1] -8.398708
##
## [[3]]
## [1] 0
test <- nb.mc(x = Table11.3, alternative = "nbu", exact = FALSE)
## T*= -8.398708
## p= 2.257086e-17
So both tests result in the same conclusion (that theory predicted): the guinea pigs are in NBU. We can also see that both tests yield similar values for \(T^*\) and the p-value.
Let’s look at another example. Consider this data set of survival days from diagnosis with chronic granulocytic leukemia (Problem 9). Note that we might well be reluctant to say this data is in IFR because treatment after diagnosis may decrease the failure rate temporarily.
Table11.6 <- c(7,47,58,74,177,232,273,285,317,429,440,445,455,
468,495,497,532,571,579,581,650,702,715,779,
881,900,930,968,1077,1109,1314,1334,1367,1534,
1712,1784,1877,1886,2045,2056,2260,2429,2509)
test <- nb.mc(x = Table11.6, alternative = "nbu")
## T*= -1.453056
## p= 0.07310412
So the patients seem to not be in NBU.
There are also tests for new better than used in expectation (NBUE). Define the mean residual life function by \[
m(x) = \begin{cases}
E_F(X - x | X > x), & \text{for those $x$ such that } \overline{F}(x) > 0,\\
0, & \text{for those $x$ such that } \overline{F}(x) = 0. \\
\end{cases}
\] \(F\) is then in NBUE if, for all \(x\), \(m(0)\geq m(x)\). These tests can be implemented with NSM3::newbet()
and NSM3::nb.mc()
. In theory, \(IFR\subset IFRA\subset NBU\subset NBUE\). There are identical definitions, procedures, and theory for new worse than used in expectations (NWUE).
Here we are interested in testing \(H_0: F\ \text{is exponential}\) versus the alternatives that \(F\) has decreasing mean residual life (DMRL) or increasing mean residual life (IMRL) where \(F\) is not exponential in any alternative hypothesis. The former refers to situations when things deteriorate with age, while the latter refers to when aging is beneficial.
A distribution \(F\) is said to be member of DMRL if \(F(0)=0\) and \(m(x)\geq m(y)\) for all \(x<y\) where \(\overline{F}(x),\overline{F}(y)>0\).
The procedure for a data set of size \(n\) is as follows:
Compute \[V^* = \frac{V}{\overline{X}}\], where \[V = \frac{\sum_{i=1}^n c_{(i)}X_{(i)}}{n^4}\] and \[c_{(i)} = \left(\frac{4}{3}\right) i^3 - 4 n i^2 + 3 n^2 i - \left(\frac{1}{2}\right) n^3 + \left(\frac{1}{2}\right) n^2 - \left(\frac{1}{2}\right) i^2 + \left(\frac{1}{6}\right) i\]
Let \(V' = \sqrt{210 n}\ V^*\)
Use this test statistic to make conclusion (the cut-off values are chosen to get the probability of Type 1 Error equal \(\alpha\), see book for details).
Note that we now have two relational chains: \(IFR \subset IFRA \subset NBU \subset NBUE\) and \(IFR \subset DMRL \subset NBUE\). Similarly, \(DFR \subset DFRA \subset NWU \subset NWUE\) and \(DFR \subset IMRL \subset NWUE\).
Comments 20 and 21 show us how to create an emperical mean residual life function and it’s confidence band.
Revisiting our data sets (Problems 18, 20, and 17 respectively):
test <- dmrl.mc(x = Table11.2, alternative = "dmrl", exact = TRUE)
## V*= -0.02451437
## p= 0.8598131
test <- dmrl.mc(x = Table11.3, alternative = "dmrl", exact = FALSE)
## V'= 1.275118
## p= 0.1011337
test <- dmrl.mc(x = Table11.6, alternative = "dmrl", exact = FALSE)
## V'= 1.38813
## p= 0.08254871
Notice that all three of our data sets have fail to be in DMRL, so it must be the case that none of them are in NBUE.
In these tests, we are concerned with testing \(H_0: F\ \text{is exponential}\) versus the alternative that the mean residual life is initially increasing then decreasing (IDMRL) or the opposite (DIMRL). A life distribution with finite mean is in IDMRL if there exists \(\tau\geq 0\) such that \(m(x)\leq m(y)\) for \(0\leq x\leq y < \tau\) and \(m(x)\geq m(y)\) for \(\tau \leq x \leq y\).
The test for a DIMRL or IDMRL with known mean is more complicated than the test for IMRL or DMRL. The test statistic \(T_1\) can be made asymptotically normal by calculating \(T_1^* = \frac{n^{1/2}T_1}{\hat{\sigma}_{T_1}}\). The test can be performed by NSM3::tc
.
If the mean is unknown we can use the Hawkins, Kochar, and Loader test (\(T^{(2)}\)) of exponential versus IDMRL. This test is not designed to test exponential versus DIMRL. This test is in comment 25. The cut-off values have been found for large \(n\) and are presented in the same comment. No estimators for \(\tau\) are provided.
Let’s look at another guinea pig trial. This is part of the same research as the other trial, but with a different TB dosing regiment. We are given that \(\tau = 89\) (Problem 24).
Table11.11 <- c(12,15,22,24,24,32,32,33,34,38,38,43,44,48,52,53,54,54,
55,56,57,58,58,59,60,60,60,60,61,62,63,65,65,67,68,70,
70,72,73,75,76,76,81,83,84,85,87,91,95,96,98,99,109,110,
121,127,129,131,143,146,146,175,175,211,233,258,258,263,297,341,341,376)
test <- tc(x = Table11.11, tau = 89, alternative = "two.sided")
## T1= -0.5125822
## p= 0.278297
## sigma hat= 4.011799
## T1*= -1.084153
So there is no evidence that this second set of guinea pigs have any trend change (IDMRL or DIMRL). Here’s a plot of the mean residual life:
min.data <- floor(min(Table11.11))
max.data <- floor(max(Table11.11))
X <- seq(from = min.data, to = max.data, by = 2)
mrl <- c()
for(i in X) {
mrl <- c(mrl, mean(Table11.11[Table11.11 > i]))
}
plot(x = X, y = mrl)
abline(v = 89, col = "red")
# This function returns the value of the empirical CDF for a data set, data, and value, vec.x.
# data should be a vector
# x can be a vector or a specific value
ecdf <- function(vec.x, data) {
n = length(data)
emp.F <- c()
for(i in 1:length(vec.x)) {
x = vec.x[i]
emp.F <- c(emp.F, length(data[data <= x])/n)
}
return(emp.F)
}
X <- seq(from = 0, to = ceiling(max(Table11.11)), length = 100)
plot(x = X, y = ecdf(vec.x = X, data = Table11.11), pch = 20, cex = 0.25, ylab = "Empirical CDF")
Now we want to find a confidence interval for this function. Let’s generalize this to any set of \(n\) \(X\)’s that are iid according to a common distribution \(F\).
For a given \(\alpha\)-level, the confidence band is given by two functions, \(l(x)\) and \(u(x)\) such that \[P\left[l(x)\leq F(x)\leq u(x)\ |\ \forall x\right] \geq 1-\alpha .\] Hence we have at least \(100(1-\alpha)\%\) confidence that \(F(x)\in [l(x),u(x)]\).
Define these functions by \[ l(x) = \begin{cases} F_n(x)-d_\alpha, & F_n(x)-d_\alpha \geq 0 \\ 0, & F_n(x)-d_\alpha < 0 \\ \end{cases} \]
and
\[ u(x) = \begin{cases} F_n(x)+d_\alpha, & F_n(x)+d_\alpha \leq 1 \\ 1, & F_n(x)+d_\alpha > 1 ,\\ \end{cases} \]
where \(F_n\) is the empirical CDF and \(d_\alpha\) is the upper \(\alpha\) percentile point of the distribution of the Kolmogorov statistic. They are derived in the normal manner of deriving a confidence interval.
The function NSM3::ecdf.ks.CI()
will plot the Kolmogorov’s confidence band around the empirical CDF. The confidence intervals are fixed at \(95\%\) because they are using an approximation that is not easily computed.
test <- ecdf.ks.CI(x = Table11.11, pch = 20, cex = 0.7)
We can make this computation by ourselves if we compute the value of \(D\) using the R function kd
(not found in package), we may be able to use ks.test()$statistic
. Then use NSM3::kolmogorov()
to compute the probability under the null that \(D\geq d\).
Because \(F_n(x)\) converges to \(F(x)\), we can use it to measure goodness of fit. Let’s assume we have a good guess of the underlying distribution, \(F(x)\), call it \(F_0(x)\). Then the goodness of fit test is simply to calculate the maximum distance between \(F_n(x)\) and \(F_0(x)\) (this is \(D\)). The test is then reject \(H_0: F(x) = F_0(x)\) for every \(x\) in favor of \(H_1: F_0(x) \neq F(x)\) for at least one \(x\) iff \(D\geq d_\alpha\). This is done by the function NSM3::kolmogorov()
.
For an example, let’s look at the discrepancy scores for 68 female AP calculus students data from chapter 12.
apscores <- c(0.129,0.242,0.262,0.284,0.300,0.317,0.324,0.330,0.339,0.353,
0.359,0.369,0.377,0.382,0.424,0.425,0.429,0.451,0.453,0.471,
0.477,0.479,0.480,0.480,0.483,0.487,0.489,0.501,0.501,0.502,
0.503,0.507,0.511,0.520,0.522,0.530,0.532,0.535,0.536,0.540,
0.547,0.548,0.551,0.551,0.554,0.556,0.557,0.558,0.581,0.590,
0.596,0.604,0.616,0.623,0.627,0.628,0.654,0.663,0.680,0.691,
0.744,0.751,0.790,0.806,0.813,0.818,0.830,0.860)
hist(apscores)
Let’s test to see if this data is normal with mean 0.5223824 and standard deviation 0.1509664.
mu = mean(apscores)
sigma = sd(apscores)
#kolmogorov(x = apscores, fnc = pnorm, mean = mu, sd = sigma)
The above commented out code returns the following error: Error in find.kol(D, n) : object 'n' not found
. I also cannot find or use this function. It seems that this function works for smaller data sets but not for larger data sets (see below), this leads me to conclude that there is a cut off for size that dictates whether or not we can use the find.kol()
function.
velocity<-c(12.8, 12.9, 13.3, 13.4, 13.7, 13.8, 14.5)
kolmogorov(velocity,pnorm, mean=14,sd=2)
## D= 0.4012937
## p= 0.157072
## $D
## [1] 0.4012937
##
## $p
## [1] 0.157072
kolmogorov(velocity,pexp,1/2)
## D= 0.9983384
## p= 6.992576e-20
## $D
## [1] 0.9983384
##
## $p
## [1] 6.992576e-20
There is a specialized version of the Kolmogorov goodness of fit test called nortest::lillie.test
. This is a test for normality.
lillie.test(velocity)
##
## Lilliefors (Kolmogorov-Smirnov) normality test
##
## data: velocity
## D = 0.15156, p-value = 0.8968
lillie.test(apscores)
##
## Lilliefors (Kolmogorov-Smirnov) normality test
##
## data: apscores
## D = 0.11263, p-value = 0.03218
So the AP-Scores are distributed normally, while the velocities are still not normally distributed (consistency between tests is always nice).
Let \(T_i\) be samples from a life distribution and \(C_i\) be samples from a censoring distribution. We observe \(X_i = min\{T_i,C_i\}\) and \[ \delta_i = \begin{cases} 1, & T_i\leq C_i \\ 0, & T_i > C_i \\ \end{cases}, \] hence \(\delta_i = 1\) if \(T_i\) is censored. Assume that the \(T\)’s and the \(C\)’s are mutually independent.
In a clinical trial, \(T_i\) may represent an endpoint event, such as relapse or death. If a patient leaves before reaching this endpoint, the data is considered to be censored and we use their final value as the censored data point.
Let \(t_{(1)}, \dots, t_{(k)}\) be the ordered known deaths (no censored values). Denote the number of patients who have not died or been censored by time \(t_{(i)}\) as \(n_i\). Finally, denote the number of deaths (failures) at time \(t_{(i)}\) as \(d_{i}\).
Then the Kaplan-Meier estimator of the survival function is: \[\overline{F}_{KM}(x) = \prod_{t_{(i)} \leq x} \left(1 - \frac{d_i}{n_i}\right) .\] Hence \(F_{KM} = 1 - \overline{F}_{KM}(x)\).
There are two other estimators for a censored survival function proposed by Efron. Both methods are asymptotically equivalently to the Kaplan-Meier estimator. These two methods are shown in comments 32 and 33.
There are many ways to build a confidence interval about the Kaplan-Meier estimator. But it is recommended that we use the simultaneous confidence band given by the Hollander-Peña family when \(c = 1\). See comment 37 for the computational form. Quartiles can also be computed.
When looking at the data on submissions to the Journal of the American Statistical Association in 1994. Here we are looking at the time in days between submission and when the paper is first reviewed (Problem 34). Some papers were not read by the end of 1994, so they are considered censored.
data("JASA1994")
fit <- survfit(formula = Surv(time = JASA1994$x, event = JASA1994$delta) ~ 1)
plot(fit)
Here is the Kaplan-Meier estimator for the survival distribution with 95% confidence intervals. We can change the confidence intervals by using the kn.ci
package.
plot(km.ci(fit, conf.level = 0.90))
We can get the all the predictive data we want from fit.
str(fit)
## List of 13
## $ n : int 432
## $ time : int [1:187] 0 1 3 4 5 6 7 8 11 12 ...
## $ n.risk : num [1:187] 432 428 417 415 414 412 409 405 401 399 ...
## $ n.event : num [1:187] 4 9 2 0 0 2 0 0 2 1 ...
## $ n.censor : num [1:187] 0 2 0 1 2 1 4 4 0 3 ...
## $ surv : num [1:187] 0.991 0.97 0.965 0.965 0.965 ...
## $ type : chr "right"
## $ std.err : num [1:187] 0.00465 0.00847 0.00913 0.00913 0.00913 ...
## $ upper : num [1:187] 1 0.986 0.983 0.983 0.983 ...
## $ lower : num [1:187] 0.982 0.954 0.948 0.948 0.948 ...
## $ conf.type: chr "log"
## $ conf.int : num 0.95
## $ call : language survfit(formula = Surv(time = JASA1994$x, event = JASA1994$delta) ~ 1)
## - attr(*, "class")= chr "survfit"
For instance, we can find the 95% confidence interval for the probability that the first review will exceed 150 days (Problem 36).
fit$lower[150] #lower bound
## [1] 0.2064465
fit$upper[150] #upper bound
## [1] 0.3136707
So the 95% C.I. is \((0.21,0.31)\).
Similarly to the Kaplan-Meier estimator, we will assume that we have two sets of censored data from a life distribution, \(X_1,\dots,X_m\) and \(Y_1,\dots,Y_n\) where we know if each value is censored or not.
The Mantel test is then to first order the known failures across both sets, \(w_{(1)},\dots, w_{(k)}\). Then, for \(j=1\ \text{or}\ 2\) indicating the set the value is in, find the number of patients who have not yet died by time \(w_{(i)}\): \(n_{ij}\) and \(n_i = n_{i1}+n_{i2}\). Next, find the number of patients who have died at each time \(w_{(i)}\): \(d_{ij}\) and \(d_{i} = d_{i1}+d_{i2}\). Finally, we can compute the Mantel test statistic: \[M_c = \frac{\sum_{i=1}^k (d_{i1} - E_{i1})}{\sqrt{\sum_{i=1}^k V_{i1}}},\] where \[E_{i1} = \frac{d_i n_{i1}}{n_i}\] and \[V_{i1} = \frac{d_i (n_i - d_i) n_{i1} n_{i2}}{n_i^2 (n_i - 1)}\] are the conditional mean and variance respectively.
Noting that \(M_c\) is asymptotically normal, we now want to test \(H_0: F_1 = F_2\). Hence we reject \(H_0\) in favor of the hypothesis that survival times in treatment 2 are better iff \(M_c \geq z_{\alpha}\). Or we reject \(H_0\) in favor of the hypothesis that survival times in treatment 1 are better iff \(M_c \leq -z_{\alpha}\). And we reject \(H_0\) in favor of the hypothesis that the survival times in the treatments are different iff \(|M_c| \geq z_{\alpha/2}\).
To perform this test in R, we use the survival
package. Let’s look back at the Journal of the American Statistical Association. We are looking at the same data as last time, but know we are curious as to whether the there was a difference between 1994 and 1995 (Problem 48).
#get + format data
data("JASA1994")
data("JASA1995")
JASA1994$year <- 1994
JASA1995$year <- 1995
JASA <- rbind(JASA1994, JASA1995)
#run test
Formula <- Surv(time = JASA$x, event = JASA$delta) ~ JASA$year
fit <- survfit(formula = Formula)
plot(fit, col = c("red", "green"), main = "Days Until a JASA Submission is Read", xlab = "Days", ylab = "Prob. Still Unread")
legend("topright", inset = 0.05, title = "Years", c("1994", "1995"), fill = c("red", "green"))
survdiff(formula = Formula, rho = 0) #rho = 0 is the Mandel Test
## Call:
## survdiff(formula = Formula, rho = 0)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## JASA$year=1994 432 274 279 0.0879 0.182
## JASA$year=1995 444 271 266 0.0921 0.182
##
## Chisq= 0.2 on 1 degrees of freedom, p= 0.67
So there is no real difference in how long it takes JASA to read a paper between the two years.
Let’s do one more example. Looking at data for a clinical trial of a drug to help treat serious liver disease (Problem 44). We are given information on each patient’s treatment(drug or placebo), the observation length, and whether they are alive or dead at the end.
trt <- c('D','P','P','D','D','P','D','D','P','D','P','P','D','P','P','D','D','P','P','D','P','D','P','D')
obs.length <- c(6,16,2,1,1,4,16,3,16,8,16,2,4,16,5,1,10,2,2,10,16,1,16,15)
status <- c(1,0,1,1,1,0,0,0,0,1,0,1,1,0,0,1,0,0,0,1,0,0,0,0) #1 is dead, 0 is alive
Formula <- Surv(time = obs.length, event = status) ~ trt
fit <- survfit(formula = Formula)
plot(fit, col = c("red", "green"), main = "Liver Disease Study", xlab = "Days")
legend("topright", inset = 0.05, title = "Treatment", c("Drug", "Placebo"), fill = c("red", "green"))
survdiff(formula = Formula, rho = 0) #rho = 0 is the Mandel Test
## Call:
## survdiff(formula = Formula, rho = 0)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## trt=D 12 7 4.12 2.01 3.9
## trt=P 12 2 4.88 1.70 3.9
##
## Chisq= 3.9 on 1 degrees of freedom, p= 0.0484
So the drug works! If you use the drug you are more likely to live longer.