Inverse Transform Sampling
Overview
Inverse transform sampling is a method for generating random numbers from any probability distribution by using its inverse cumulative distribution function \(F^{-1}\). Recall that the cumulative distribution function for a random variable \(X\) is \(F_X(x) = P(X \leq x)\). In what follows, we assume that our computer can, on demand, generate independent realizations of a random variable \(U\) uniformly distributed on \([0,1]\).
Algorithm
Continuous Distributions
Assume we want to generate a random variable \(X\) with cumulative distribution function (CDF) \(F_X\). The inverse transform sampling algorithm is simple:
1. Generate \(U \sim \text{Unif}(0,1)\)
2. Let \(X = F_X^{-1}(U)\).
Then \(X\) will follow the distribution governed by the CDF \(F_X\), as desired.
Note that this algorithm works in general but is not always practical. For example, inverting \(F_X\) is easy if \(X\) is an exponential random variable, but it is harder if \(X\) is a normal random variable.
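As a minimal sketch of this contrast in base R (the sample size of 1000 is arbitrary): the exponential CDF inverts in closed form, whereas the normal CDF has no closed-form inverse, so qnorm() computes the quantile function numerically.
# easy case: Exp(lambda) has the closed-form inverse CDF -log(1 - u) / lambda
lambda <- 2
X_exp <- -log(1 - runif(1000)) / lambda
# harder case: the normal CDF has no closed-form inverse;
# qnorm() inverts it numerically under the hood
X_norm <- qnorm(runif(1000), mean = 0, sd = 1)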
Discrete Distributions
Now we will consider the discrete version of the inverse transform method. Assume that \(X\) is a discrete random variable such that \(P(X = x_i) = p_i\). The algorithm proceeds as follows:
1. Generate \(U \sim \text{Unif}(0,1)\)
2. Determine the index \(k\) such that \(\sum_{j=1}^{k-1} p_j \leq U < \sum_{j=1}^k p_j\), and return \(X = x_k\).
Notice that the second step requires a search over the cumulative probabilities.
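For instance, the search can be vectorized in base R by precomputing the cumulative sums and letting findInterval() do a binary search; the probability vector below is just a placeholder.
# vectorized search: returns the index k with F(x_{k-1}) <= U < F(x_k)
p <- c(0.4, 0.2, 0.1, 0.1, 0.2)   # example probabilities
cdf <- cumsum(p)
U <- runif(3)
k <- findInterval(U, cdf) + 1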
Why It Works
To see why this method works, we state the following theorem.
Theorem: Let \(U\) be a continuous random variable having a standard uniform distribution, that is, \(U \sim \mathcal{U}(0,1)\). Then the random variable
\[ X = F_X^{-1}(U) \]
has a probability distribution characterized by the invertible cumulative distribution function \(F_X(x)\).
Proof: The cumulative distribution function of the transformation \(X = F_X^{-1}(U)\) can be derived as
\[ \begin{split} \mathrm{Pr}(X \leq x) &= \mathrm{Pr}(F_X^{-1}(U) \leq x) \\ &= \mathrm{Pr}(U \leq F_X(x)) \\ &= F_X(x), \end{split} \]
where the second equality uses the fact that \(F_X\) is monotonically increasing, and the last equality holds because the cumulative distribution function of the standard uniform distribution \(\mathcal{U}(0,1)\) is
\[ U \sim \mathcal{U}(0,1) \quad \Rightarrow \quad F_U(u) = \mathrm{Pr}(U \leq u) = u. \]
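To make the theorem concrete, here is a quick numerical check (a sketch in base R, using the exponential CDF \(F(x) = 1 - e^{-x}\) purely as an example): the empirical CDF of \(X = F_X^{-1}(U)\) should agree with \(F_X\).
# empirical check that Pr(X <= x) = F_X(x) for X = F_X^{-1}(U)
U <- runif(1e5)
X <- -log(1 - U)   # F^{-1}(u) = -log(1 - u) for F(x) = 1 - exp(-x)
x <- c(0.5, 1, 2)
rbind(empirical   = sapply(x, function(t) mean(X <= t)),
      theoretical = 1 - exp(-x))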
Computationally, this method involves computing the quantile function of the distribution; in other words, computing the cumulative distribution function (CDF) of the distribution (which maps a number in the domain to a probability between 0 and 1) and then inverting that function, often many times. This is the source of the term “inverse” or “inversion” in most of the names for this method.
Note that for a discrete distribution, computing the CDF is in general not difficult: we simply add up the individual probabilities for the various points of the distribution. For a continuous distribution, however, we need to integrate the probability density function (PDF) of the distribution, which is impossible to do analytically for most distributions (including the normal distribution). As a result, this method may be computationally inefficient for distributions whose CDF or quantile function is intractable, and other methods are preferred; nevertheless, it is a useful building block for more generally applicable samplers, such as those based on rejection sampling.
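When no closed-form quantile function is available, the inversion can be done numerically. Here is a hedged sketch that solves \(F(x) = u\) by root-finding with base R's uniroot(); the helper name numerical_inverse and the bracketing interval \([-10, 10]\) are our own assumptions, and the interval must cover the distribution's effective support.
# numerically invert a CDF by root-finding: solve cdf(x) - u = 0
numerical_inverse <- function(cdf, u, lower = -10, upper = 10) {
  uniroot(function(x) cdf(x) - u, lower = lower, upper = upper)$root
}
U <- runif(5)
X <- sapply(U, function(u) numerical_inverse(pnorm, u))   # agrees with qnorm(U)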
For the normal distribution, the lack of an analytical expression for the corresponding quantile function means that other methods (e.g. the Box–Muller transform) may be computationally favorable. It is often the case that, even for simple distributions, the inverse transform sampling method can be improved on.
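For comparison, here is a minimal sketch of the Box–Muller transform, which turns pairs of independent uniforms directly into pairs of independent standard normals without inverting any CDF:
# Box–Muller: two independent Unif(0,1) draws yield two independent N(0,1) draws
U1 <- runif(1000)
U2 <- runif(1000)
Z1 <- sqrt(-2 * log(U1)) * cos(2 * pi * U2)
Z2 <- sqrt(-2 * log(U1)) * sin(2 * pi * U2)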
Simple Implementation (Discrete Random Variable)
Assume our random variable \(X\) can take on any one of \(K\) values with probabilities \(\{p_1, \ldots, p_K\}\). We implement the algorithm below, assuming these probabilities are stored in a vector called prob; the function returns the index of the sampled state.
discrete_inverse_sampling <- function(prob) {
  U <- runif(1)                  # step 1: draw U ~ Unif(0,1)
  cdf <- cumsum(prob)            # cumulative sums p_1, p_1 + p_2, ..., 1
  for (state in seq_along(cdf)) {
    if (U <= cdf[state]) {       # step 2: return the first index k with U <= F(x_k)
      return(state)
    }
  }
  length(prob)                   # guard against floating-point round-off in the sums
}
Continuous Example: Exponential Distribution
Assume \(Y\) is an exponential random variable with rate parameter \(\lambda = 2\). Recall that the probability density function is \(p(y) = 2e^{-2y}\), for \(y > 0\). First, we compute the CDF: \[F_Y(x) = P(Y\leq x) = \int_0^x 2e^{-2y} dy = 1 - e^{-2x}\]
Solving for the inverse CDF, we get that \[F_Y^{-1}(y) = -\frac{\ln(1-y)}{2}\]
Using our algorithm above, we first generate \(U \sim \text{Unif}(0,1)\), then set \(X = F_Y^{-1}(U) = -\frac{\ln(1-U)}{2}\). We do this in the R code below and compare the histogram of our samples with the true density of \(Y\).
# inverse transform sampling
num_samples <- 1000
U <- runif(num_samples)
X <- -log(1-U)/2
# plot
hist(X, freq = FALSE, xlab = 'X', main = 'Exponential Random Variable')
curve(dexp(x, rate = 2), 0, 3, lwd = 2, xlab = "", ylab = "", add = TRUE)
Indeed, the plot indicates that our samples follow the intended distribution.
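As an additional sanity check (an optional sketch reusing the X generated above), the sample mean should be close to \(1/\lambda = 0.5\), and a Kolmogorov-Smirnov test should not reject the Exp(2) hypothesis.
# the empirical mean should be near 1/lambda = 0.5
mean(X)
# Kolmogorov-Smirnov test of X against the Exp(2) CDF
ks.test(X, "pexp", rate = 2)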
Discrete Example
Let’s assume we want to simulate a discrete random variable \(X\) with the following distribution:
| \(x_i\) | \(P(X=x_i)\) |
|---|---|
| 0 | 0.4 |
| 2 | 0.2 |
| 3 | 0.1 |
| 7 | 0.1 |
| 10 | 0.2 |
Below we simulate from this distribution using the discrete_inverse_sampling() function defined above, and visualize both the true probability vector and the empirical proportions from our simulation.
num_samples <- 10000
prob <- c(0.4, 0.2, 0.1, 0.1, 0.2)
names(prob) <- c("0", "2", "3", "7", "10")
samples <- numeric(num_samples)
for (i in seq_len(num_samples)) {
  samples[i] <- discrete_inverse_sampling(prob)
}
# tabulate over all five states so the labels stay aligned even if a state is never drawn
sim_prob <- table(factor(samples, levels = seq_along(prob))) / num_samples
names(sim_prob) <- names(prob)
barplot(prob, main = 'True Probability Mass Function')
barplot(sim_prob, main = 'Empirical Probability Mass Function')
Again, the plots support our claim that we are drawing from the true probability distribution.
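As a cross-check (an optional sketch reusing num_samples and prob from above), the same draws can be generated in vectorized form with base R's sample(), and the resulting proportions should be close to ours.
# built-in alternative: sample the values directly with the given probabilities
builtin <- sample(c(0, 2, 3, 7, 10), size = num_samples, replace = TRUE, prob = prob)
table(builtin) / num_samples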