Introduction

First a simple example of inverting a function. If \(f(x) = 3 x - 4\) then we can find the inverse as follows

\[ \begin{aligned} \text{replace } f(x) \text{ with y:}& \quad y = 3 x - 4 \\ \text{solve for x:}& \quad x = \frac{y + 4}{3} \\ \text{swap sides:}& \quad y = \frac{x + 4}{3} \\ \text{replace }y \text{ with } f^{-1}(x):& \quad f^{-1}(x) = \frac{x + 4}{3} \end{aligned} \]

Say we want to generate random samples from an arbitrary probability distribution. The inverse transform states that if \(X\) is a continuous random variable with cdf \(F_X\), then the random variable \(U = F_X(X)\) has a uniform distribution on \([0, 1]\). Therefore, if we can invert \(F_X\) and feed the inverse with random samples from a standard uniform distribution, then the result will be random samples from our desired distribution.

For example, the cdf for the exponential distribution is \(F(x) = 1 - exp(-\lambda x)\) for \(x \ge 0\) and zero otherwise. The inverse is of \(F(x)\) is \(F^{-1}(x) = -\frac{\log(1-x)}{\lambda}\) and we can use this to generate samples from the exponential distribution.

Example

You will need the following libraries.

# you need to load rsconnect in your current environment before trying to publish to rpubs
# see https://github.com/rstudio/rstudio/issues/3389
library(rsconnect)
library(splines2)
u <- runif(10000)
lambda <- 2
x <- -log(1-u)/lambda
hist(x, probability = TRUE)
xx <- seq(0, 5, len = 100)
yy <- dexp(xx, lambda)
lines(xx,yy, col = 2, lwd = 2)

However, there will be occasions where the inverse cannot be found. When that happens, we can use a root finding approach to obtain the value of the inverse we require. In other words, we numerically solve \(S(t) - U = 0\) for \(t\) where \(U\) is a draw from a standard uniform and \(S(t)\) is the survival function. This will give us a random sample from the survival distribution. You can also use the cumulative hazard if that is all that is available by taking advantage of the fact that \(S(t) = \exp(-H(t))\). For an exponential distribution, the cumulative hazard is \(H(x) = \lambda x\), which is what we will use below.

lambda <- 2
# This controls the range to be searched. Needs to be > 0.
interval = c(1E-8, 25)

# To get sensible values.
return_finite <- function(x){
  x <- min(x, .Machine$double.xmax)
  x <- max(x, -.Machine$double.xmax)
  x
}

# The function to be solved. Note that the log transform is used
# to give numerical stability.
rootfn <- function(t, u){
  surv <- exp(-lambda * t)
  return_finite(log(surv) - log(u))
}

# Creates random draws
sample_me <- function(){
  u_i <- runif(1)
  # I have built in censoring such that if we are outside our search range
  # then a censored value is returned.
  at_limit <- rootfn(interval[2], u = u_i)
  
  if(at_limit > 0){
    return(c(interval[2], 0))
  } else {
    # trying to find the value for t at which S(t) - u = 0
    t_i <- stats::uniroot(rootfn, 
                          u = u_i, 
                          interval = interval, 
                          check.conv = TRUE)$root
    # our random sample from the surv dist
    return(c(t_i, 1))
  }
}
tnew <- t(replicate(10000, sample_me()))
hist(tnew[, 1], probability = TRUE, xlab = "xnew", main = "Histogram of xnew")
xx <- seq(0, 5, len = 100)
yy <- dexp(xx, lambda)
lines(xx,yy, col = 2, lwd = 2)

Conclusion

We started by finding the inverse of a simple function, then introduced inverse transform sampling as a way to generate samples from an arbitrary distribution. When the inverse for the target cdf cannot be found, we can use a root finding approach to numerically solve for the desired sample times.