Metropolis-Hasting Algorithm

This post is an attempt to provide a friendly introduction to markov chain monte carlo method, particularly the most basic method Metropolis-Hasting algorithm. We will use these clever methods to construct and sample from any arbitrary posterior distributions. Obviously, we may not obtain exact values for those posterior quantities, which is extremely difficult. However, we possibly generate random sample values of parameters from approximating distributions and rectifying those draws to attain a better approximation of the target posterior distribution. The underlying engine that make the method work is not solely the Markov property of the draw process (this sequential draws form a Markov Chain) but rather that the improvement of the approximate distributions at each simulation step underpins the convergence to the target distribution. Before diving into more details, we will start with a interesting example collecting from Kruschke’s brilliant book.

An interesting example

A full description of the problem could be found at Doing Bayesian Data Analysis’s page 146-147. Shortly, there is a chain of islands that produces canvas and a politician has to visit all the island according to their populations. He comes up with a simple, heuristic strategy based on the population of each island to determine which island to visit next. If the neighboring island has a smaller population, then the politician visits with probability \(p = p_{\text{neighbor}}/p_\text{current}\); otherwise the politician stays on the same island. Repeating the process many times, the relative population of each island will be proportional to the number of times the politician spends on each island; thereby, correctly estimating the distribution of island populations.

Random Walk model

Based on the book, we sketch the problem more specifically. Assumed that there are \(7\) islands indexed by \(\theta\), in which the leftmost island and the rightmost is \(1\) and \(7\) respectively. Moreover, we may further assume that the relative population is linearly increasing such that \(P(\theta) = \theta\), where \(P()\) could be considered as a relative population function (It returns the relative population over total populations, this is not a probability function). We will demonstrate a simple trajectory with R as below.

# Data manipulation
library(dplyr)
# Data visualization
library(ggplot2)
library(patchwork)
library(cowplot)

set.seed(28)

n_days <- 5e4
positions <- rep(0, n_days)
current <- 3
for (i in 1:n_days){
  # record the current position
  positions[i] <- current
  # Toss a coin to generate the proposal quantity
  # Note!!! the Sample function if the argument prob is NULL, then equal weights for the input vector
  
  # Generate the next position (island)
  proposal <- current + sample(c(-1,1), size = 1)
  # Restrict the draw from 1 to 7
  if (proposal < 1) proposal <- 7
  if (proposal > 7) proposal <- 1
  # Switch 
  accept_prob <- proposal / current
  current <- ifelse(runif(1) < accept_prob, proposal, current)
}

p1 <- tibble(theta = positions) %>%
  ggplot(aes(x = theta)) +
  geom_bar(fill = "darkorange") +
  scale_x_continuous(expression(theta), breaks = 1:7) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.05))) +
  theme_cowplot()

p2 <- tibble(t = 1:n_days, theta = positions) %>%
  slice(1:500) %>%
  ggplot(aes(x = theta, y = t)) +
  geom_path(size = 1/4, color = "darkorange") +
  geom_point(size = 1/2, alpha = 1/2, color = "darkorange") + 
  scale_x_continuous(expression(theta), breaks = 1:7) +
  scale_y_log10("Time Step", breaks = c(1,2,5,20,100,500)) +
  theme_cowplot()

p3 <- tibble(x = 1:7,
       y = 1:7) %>% 
  ggplot(aes(x = x, y = y)) +
  geom_col(width = .2, fill = "darkorange") +
  scale_x_continuous(expression(theta), breaks = 1:7) +
  scale_y_continuous(expression(p(theta)), expand = expansion(mult = c(0, 0.05))) +
  theme_cowplot()
                       
layout <- "AAA
           BBB
           CCC
           "
p1 + p2 + p3 + plot_layout(design = layout)

The middle plot illustrates one possible trajectory made by the politician. The y-axis represents the evolution of time and the x-axis is the island or the position of the politician. At time \(t= 1\), the politician starts at island \(3\) (\(\theta_{\text{current}} = 3\)). On the second day, he tosses the coin to propose the next position (\(\theta_{\text{proposal}}\)). In our case, he lands on island \(2\) with probability \(\frac{p(\theta_{\text{proposal}})}{p(\theta_{\text{current}})}\) if this probability is greater than a uniformly random draw (\(\mathcal{U}(0,1)\)). The time is scaled logarithmically so that the early steps and later steps could be examined.

The first plot and the last plot debscribed the frequency of visiting each island and the relative population of each of them. Particularly, these two plots are approximately identical, which guarantees the convergence of our statement above. In specific, if we simulates mroe steps, then the approximation will be close to the actual quantities.

General Metropolis-Hasting scheme and its properties

To be continued …