(Instructor : Nishant Panda)

Additional References

  1. (IMC) : Introducing Monte Carlo Methods with R, Christian P. Robert & George Casella, Springer

  2. (MMT) : Monte Carlo Methods, Robert & Casella

Introduction

We ended last lecture with an example that showed how a careful adjustment of the integral and the sampling density can lead to a reduction in variance. We explore this topic by looking at Importance sampling. The idea is to somehow get samples from rare events.

Importance sampling

Let \(X \sim f\) be a random variable (could be multidimensional!) and consider a real valued function \(g\) and the corresponding random variable \(g(X)\). Then, the expectation of \(g(X)\) is given by, \[ \theta = \mathbb{E}_{f}\left[g(X)\right] = \int\limits_{\mathcal{X}}g(x)f(x)dx, \] where \(\mathcal{X}\) is the range of \(X\) (support of \(f(x)\)). The subscript \(f\) emphasizes the density with respect to which we are computing the expectation. Let \(h\) be another density such that \(h(x) > 0\) on \(\mathcal{X}\), then we can write the above integral as \[ \int\limits_{\mathcal{X}}g(x)f(x)dx = \int\limits_{\mathcal{X}}g(x)\frac{f(x)}{h(x)}\cdot h(x)dx = \mathbb{E}_{h}\left[g(X)\frac{f(X)}{h(X)}\right]. \]

Hence, \[ \theta = \mathbb{E}_{f}\left[g(X)\right] = \mathbb{E}_{h}\left[g(X)\frac{f(X)}{h(X)}\right]. \] We can now approximate the expectation with respect to \(h\) using the Monte Carlo method by sampling from \(h\). Thus, importance sampling alogrithm for approximating expectation is given by

  1. Generate \(n\) samples from \(h(x)\): \(X_1, X_2, \dots, X_n\). Then,
  2. the monte carlo approximation w.r.t \(h\) is given by, \[\widehat{\theta} = \frac{1}{n}\sum\limits_{i = 1}^{n}g(X_i)\frac{f(X_i)}{h(X_i)}.\]

For this strategy to work, it must be easy to sample from \(h\) and to evaluate \(f\)! Some notation,

  1. The ratio \(w = \frac{f}{h}\) is called the likelihood ratio.
  2. The function \(h\) is called the importance function.

The likelihood ratio act as weights to the original function \(f\).

Example 1: Let \(Z \sim N(0,1)\) be the standard normal. Compute the probaility \(P(Z > 4.5)\) using standard monte carlo and importance sampling.

Note that \(Z > 4.5\) is a rare event

1 - pnorm(4.5)
## [1] 3.397673e-06

Let \(f\) be the standard normal density. To use standard monte carlo, we note that \[ \theta = P(Z > 4.5) = \mathbb{E}_{f}\left[\mathbb{I}_{(Z > 4.5)}\right]. \] Thus, for \(n\) samples \(Z_1, Z_2, \dots, Z_n\) from \(N(0,1)\), the monte carlo estimate is given by \[ \widehat{\theta_{MC}} = \frac{1}{n}\sum\limits_{i}^{n}\mathbb{I}_{(Z_i > 4.5)}. \]

Let us compute \(\widehat{\theta_{MC}}\) using \(10000\) samples.

z <- rnorm(1e4)
theta.mc <- mean(z > 4.5)
print(theta.mc)
## [1] 0

Note that the rare event \((4.5, \infty)\) gets fewer samples (0 samples in the example above!) and doesn’t contribute much to monte carlo approximation. What if we consider a density that was strictly above \(0\) in this region. A natural(!) choice for \(h\) is shifted \(Exp(1)\) i.e \(X \sim Exp(1)\) shifted to \(X > 4.5\). That is \(h\) is the conditional density, conditioned when \(X > 4.5\) \[ h(x) = \frac{\mathbb{I}_{X > 4.5}\text{ }e^{-x}}{\int_{4.5}^{\infty}e^{-y}dy} = e^{-(x - 4.5)}, \]

for \(x \geq 4.5\). Let us plot this function (the importance function)

x.seq <- seq(from = 4.5, to = 10, length.out = 101)
y.seq <- sapply(x.seq, function(x) exp(-1*(x - 4.5)))
plot(x.seq, y.seq, type = "l", xlab = "x", ylab = "h(x)", lwd = 3, col = "blue")

Now, let us plot the likelihood ratio

w.seq <- sapply(x.seq, function(x) dnorm(x)/dexp(x-4.5))
plot(x.seq, w.seq, type = "l", col = "red", xlab = "x", ylab = "f(x)/h(x)", lwd = 3)

Now let us carry out the monte carlo integration using importance sampling

# number of simulations
n.sim <- 1e3
# get samples from h
x <- rexp(n.sim) + 4.5
# get the likelihood ratio
w <- dnorm(x)/dexp(x-4.5)

# mc importance sampling
# Note we don't have to multiply by g(x) because
# g(x) is always 1!
theta.hat <- mean(w)

print(theta.hat)
## [1] 3.328885e-06

Choosing importance functions is a delicate matter. The single most important criteria for choosing an importance function \(h\) is,

\(h\) should have a heavier tail than \(f\) and \(\frac{f}{h}\) should be bounded.