This section contains the setup and the various utility functions used throughout.
Libraries used:
library(data.table)
library(ggplot2)
rstan::rstan_options(auto_write = TRUE)
Sys.setenv(LOCAL_CPPFLAGS = '-march=native')
options(mc.cores = 1)
Compiled code (any models used are shown later):
# mod1 <- rstan::stan_model(".//stan//logit_notrand.stan", verbose = FALSE)
# rstan::expose_stan_functions(mod1)
Often we have an interest in the relative magnitude of some set of observed variables. This is the domain of order statistics. Examples of applications include the failure times of machine components and the German tank problem, both discussed below.
Some notation is necessary. If \(Y_1, Y_2, \dots, Y_n\) are independent continuous random variables with cdf \(F(y)\) and density \(f(y)\) then we denote the ordered random variables as \(Y_{(1)}, Y_{(2)}\) etc. where \(Y_{(1)} \le Y_{(2)} \le \dots \le Y_{(n)}\) such that \(Y_{(1)} = \mathsf{min}(Y_1, Y_2, \dots, Y_n)\) and \(Y_{(n)} = \mathsf{max}(Y_1, Y_2, \dots, Y_n)\).
We may have interest in finding the density functions for \(Y_{(1)}\) and/or \(Y_{(n)}\). These can be found using the method of distribution functions.
For \(Y_{(n)}\), the event \((Y_{(n)} \le y)\) will occur only if the events \((Y_i \le y)\) occur for all \(i = 1 \dots n\):
\[ \mathsf{Pr}(Y_{(n)} \le y) = \mathsf{Pr}(Y_1 \le y, Y_2 \le y, \dots, Y_n \le y) \]
Because the \(Y_i\) are independent, this leads to
\[ \begin{aligned} \mathsf{Pr}(Y_{(n)} \le y) &= \mathsf{Pr}(Y_1 \le y)\mathsf{Pr}(Y_2 \le y) \dots \mathsf{Pr}(Y_n \le y) \\ &= [F(y)]^n \\ \end{aligned} \]
If we take the derivative of both sides we obtain the density function for \(Y_{(n)}\):
\[ g_{(n)}(y) = n[F(y)]^{n-1} f(y) \]
The cdf for \(Y_{(1)}\) can be found in an analogous way. The event \((Y_{(1)} > y)\) occurs only if all of the events \((Y_i > y)\) occur, so \(\mathsf{Pr}(Y_{(1)} > y) = [1 - F(y)]^n\), and therefore
\[ \mathsf{Pr}(Y_{(1)} \le y) = 1 - [1 - F(y)]^n \]
Differentiating again gives the corresponding density function
\[ g_{(1)}(y) = n[1-F(y)]^{n-1} f(y) \]
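Both results are easy to sanity check by simulation. The following sketch (assuming, purely for illustration, \(n = 5\) iid \(\mathsf{Ex}(0.2)\) variables) compares the simulated means of the minimum and maximum against the means implied by \(g_{(1)}\) and \(g_{(n)}\):
# sanity check: n iid exponentials (n and rate chosen purely for illustration)
n <- 5; rate <- 0.2
m <- matrix(rexp(1e5 * n, rate), ncol = n)
# g_(1) here simplifies to an Ex(n * rate) density, so the minimum has mean 1 / (n * rate)
c(mean(apply(m, 1, min)), 1 / (n * rate))
# for the maximum, compute the mean implied by g_(n) numerically
g_max <- function(y) n * pexp(y, rate)^(n - 1) * dexp(y, rate)
c(mean(apply(m, 1, max)), integrate(function(y) y * g_max(y), 0, Inf)$value)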
Assume you have three light bulbs connected in parallel. The bulbs are from three manufacturers with exponential failure times distributed as \(Y_1 \sim \mathsf{Ex}(0.1)\), \(Y_2 \sim \mathsf{Ex}(0.25)\) and \(Y_3 \sim \mathsf{Ex}(0.15)\). These density functions are shown below.
Figure: Density functions of failure times.
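Here are the rates, along with a sketch of how the figure above can be produced (the lambda vector defined here is referenced by the later code blocks):
# failure rates for the three bulbs (used throughout what follows)
lambda <- c(0.1, 0.25, 0.15)
x <- seq(0, 100, len = 1000)
dfig0 <- data.table(x = rep(x, length(lambda)),
                    rate = factor(rep(lambda, each = length(x))),
                    f = c(sapply(lambda, function(l) dexp(x, l))))
ggplot(dfig0, aes(x = x, y = f, colour = rate)) +
  geom_line() +
  labs(y = "density")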
Because the lights are wired in parallel, at least one light remains on until all of them have failed. In contrast, if the circuit were wired in series, all lights would go out as soon as one light failed. So the question we want to answer is: what is the density function for the failure time of the system as a whole, given that it is wired in parallel?
We are interested in \(X = \mathsf{max}(Y_1, Y_2, Y_3)\), i.e. the \(g_{(n)}\) from earlier; in other words, the distribution of the time at which the last light goes out. First we need the density and cdf functions. These are of the form \(f(y) = \lambda \exp(-\lambda y)\) and \(F(y) = 1 - \exp(-\lambda y)\) with the applicable values substituted in for \(\lambda\).
For sanity checking, work out \(F_{Y_{(n)}}(y)\):
\[ \begin{aligned} F_{Y_{(n)}}(y) &= F_{Y_1}(y)F_{Y_2}(y)F_{Y_3}(y) \\ &= (1 - \exp(-0.1 y))(1 - \exp(-0.25 y))(1 - \exp(-0.15 y)) \\ &= 1 - \exp(-0.1 y) - \exp(-0.15 y) + \exp(-0.35 y) + \exp(-0.4 y) - \exp(-0.5 y) \end{aligned} \]
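The algebra can itself be spot checked numerically at a handful of values:
# spot check the expansion of the product of the three cdfs
y <- c(1, 5, 20)
lhs <- (1 - exp(-0.1*y)) * (1 - exp(-0.25*y)) * (1 - exp(-0.15*y))
rhs <- 1 - exp(-0.1*y) - exp(-0.15*y) + exp(-0.35*y) + exp(-0.4*y) - exp(-0.5*y)
all.equal(lhs, rhs)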
Now, we want to differentiate with respect to \(y\). In general, we would use the product rule for this:
\[ \frac{d}{dx}\big[f(x)g(x)h(x)\big] = f'(x)g(x)h(x) + f(x)g'(x)h(x) + f(x)g(x)h'(x) \]
Which in this setting amounts to:
\[ \begin{aligned} g_{Y_{(n)}}(y) &= f_{Y_1}(y)F_{Y_2}(y)F_{Y_3}(y) + \\ &\quad f_{Y_2}(y)F_{Y_1}(y)F_{Y_3}(y) + \\ &\quad f_{Y_3}(y)F_{Y_1}(y)F_{Y_2}(y) \end{aligned} \]
and for the specific example, we get:
\[ \begin{aligned} g_{Y_{(n)}}(y) &= (0.1e^{-0.1y})(1 - e^{-0.25y})(1 - e^{-0.15y}) + \\ &\quad (0.25e^{-0.25y})(1 - e^{-0.1y})(1 - e^{-0.15y}) + \\ &\quad (0.15e^{-0.15y})(1 - e^{-0.1y})(1 - e^{-0.25y}) \\ &= 0.1(e^{-0.1y}-e^{-0.35y}-e^{-0.25y}+e^{-0.5y}) + \\ &\quad 0.25(e^{-0.25y}-e^{-0.35y}-e^{-0.4y}+e^{-0.5y}) + \\ &\quad 0.15(e^{-0.15y}-e^{-0.25y}-e^{-0.4y}+e^{-0.5y}) \\ \end{aligned} \]
In R, this can be obtained via a number of approaches. One of these is to use a grid of \(y\) values over which the density is computed. While this is an approximation of the analytic form, it is generally close enough for applied purposes as shown below. Note that the resulting form is not an exponential distribution.
# lambda already defined above
# define some reasonable range over which to evaluate the density
nn <- 1000
x <- seq(0, 100, len = nn)
# for each value in the grid
g_y_n <- sapply(x, function(y){
# implements the product rule
sum(
sapply(seq_along(lambda), function(l){
# compute the density at this value for each bulb
# and multiply it by the product of the cdfs for the
# remaining two bulbs
dexp(y, lambda[l]) * prod( pexp(y, lambda[-l]) )
})
)
})
dfig <- data.table(x = x, g_y_n = g_y_n)
# closed-form version of the density function
ff <- function(x){
l <- lambda
l[1]*( exp(-l[1]*x) - exp(-0.35*x)- exp(-0.25*x) + exp(-0.5*x) ) +
l[2]*( exp(-l[2]*x) - exp(-0.35*x)- exp(-0.4*x) + exp(-0.5*x) ) +
l[3]*( exp(-l[3]*x) - exp(-0.25*x)- exp(-0.4*x) + exp(-0.5*x) )
}
dfig[, f := ff(x)]
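The comparison figure can then be drawn along these lines:
ggplot(dfig, aes(x = x)) +
  geom_line(aes(y = g_y_n), colour = "black") +
  geom_line(aes(y = f), colour = "red", linetype = 2) +
  labs(x = "y", y = "density")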
Figure: Density function for the time at which the last light fails (black shows the grid approximation, red the analytic result).
An alternative way to approach this is via Monte Carlo (or MCMC) simulation, which again results in a reasonable approximation for the kind of applied work that I do.
tt <- lapply(seq_along(lambda), function(i) rexp(1e4, lambda[i]))
# empirical density
dens <- lapply(seq_along(lambda), function(l) {
density(tt[[l]], from = 0, to = 100)
})
x <- dens[[1]]$x
nn <- length(x)
# empirical cdf - there is an easier way using ecdf(), I just chose not to do it that way
d_dist <- data.table(do.call(rbind, lapply(1:nn, function(i){
pr_ltx <- unlist(lapply(seq_along(lambda), function(l){
mean(tt[[l]] < x[i])
}))
pr_ltx
})))
d_dist[, x := x]
d_dist[, g_V1 := dens[[1]]$y]
d_dist[, g_V2 := dens[[2]]$y]
d_dist[, g_V3 := dens[[3]]$y]
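# product rule: empirical density of one bulb (g_V*) multiplied by
# the empirical cdfs (V1, V2, V3) of the other two bulbs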
d_dist[, g_y_n := g_V1 * V2 * V3 + g_V2 * V1 * V3 + g_V3 * V1 * V2 ]
d_dist[, f := ff(x)]
Figure: Density function for the time at which the last light fails (black shows the simulation-based approximation, red the analytic result).
A simpler approach is also available by direct comparison of the random samples.
tt <- do.call(cbind, lapply(seq_along(lambda), function(i) rexp(1e4, lambda[i])))
# (row, column) index of the maximum in each row
ttidx <- cbind(1:nrow(tt), max.col(tt))
# Form a new density from the max
dfig <- data.table(cbind(tt, tt[ttidx]))
Figure: Density function for the time at which the last light fails (black shows the simulation-based approximation, red the analytic result).
We could consider the analogous question when the lights are wired in series by taking \(X = \mathsf{min}(Y_1, Y_2, Y_3)\) and working through the results again using \(\mathsf{Pr}(Y_{(1)} \le y) = 1 - [1 - F(y)]^n\) and the corresponding result for \(g_{(1)}(y)\) discussed previously (with the product \(\prod_i [1 - F_{Y_i}(y)]\) in place of \([1 - F(y)]^n\), since the rates differ across bulbs).
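In fact, for independent exponentials the minimum works out neatly: \(\mathsf{Pr}(Y_{(1)} > y) = \prod_i \exp(-\lambda_i y) = \exp(-(\sum_i \lambda_i) y)\), so the time to first failure is \(\mathsf{Ex}(0.5)\) here. A quick simulation check:
# time to first failure = minimum across the three bulbs
tt <- do.call(cbind, lapply(seq_along(lambda), function(i) rexp(1e4, lambda[i])))
t_min <- apply(tt, 1, min)
# the analytic mean is 1 / sum(lambda) = 2
c(mean(t_min), 1 / sum(lambda))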
The k-th order statistic can also be derived, and while there isn’t time to go into this in detail, for \(n\) iid variables the density function can be obtained via
\[ g_{(k)}(y_k) = \frac{n!}{(k-1)!(n-k)!}[F(y_k)]^{k-1}[1-F(y_k)]^{n-k} f(y_k) \]
Similar results are available for joint densities of order statistics.
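To make the formula concrete: for \(n\) iid \(\mathsf{U}(0,1)\) variables it reduces to a \(\mathsf{Beta}(k, n-k+1)\) density, which has mean \(k/(n+1)\). This is easy to check by simulation (the values of n and k below are arbitrary):
# k-th smallest of n iid uniforms, repeated many times
n <- 7; k <- 3
y_k <- replicate(1e4, sort(runif(n))[k])
# compare the simulated mean with the Beta(k, n - k + 1) mean k / (n + 1)
c(mean(y_k), k / (n + 1))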
The German tank problem is a famous example of the use of order statistics. During World War II, the Allied forces found themselves in possession of a collection of captured German tanks, each of which had a serial number. These numbers were assumed to start from 1 and increment in units of 1. So, under this assumption, the numbers indicate the order in which these weapons were produced, and figuring out the highest serial number in use would have amounted to figuring out the size of the German arsenal.
Since these are serial numbers, each of them occurs exactly once in the population, so we have a discrete random variable with a uniform pmf. When you roll a fair die numbered 1 through 6, each outcome has probability 1/6. When you have serial numbers running from 1 through \(t\), each outcome has probability \(1/t\).
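The sample maximum alone underestimates \(t\); the standard frequentist correction is \(\hat{t} = m + m/k - 1\), where \(m\) is the largest observed serial number among \(k\) captured tanks. A small simulation sketch (the true \(t\) below is made up purely for illustration):
# pretend the true arsenal size is 300 (purely illustrative)
t_true <- 300
k <- 5
est <- replicate(1e4, {
  m <- max(sample(1:t_true, k))
  m + m / k - 1
})
# the corrected estimator is close to unbiased for t
c(mean(est), t_true)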
For further information on this example, see https://en.wikipedia.org/wiki/German_tank_problem.