\[ \mu = \begin{pmatrix} 1 & 4 \end{pmatrix} \]
\[ \Sigma = \begin{pmatrix} 1 & 0.9 \\ 0.9 & 1 \end{pmatrix} \]
The idea was to follow section 7.1 in the paper and simulate data from the given normal distribution. One of the challenges with the Metropolis-Hastings (MH) algorithm is figuring out which candidate (proposal) distribution gives good convergence. In this case the paper proposes four alternatives. The first one uses a bivariate uniform distribution centred on the current point, and that is the one I try to replicate here.
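Because the uniform candidate is generated symmetrically around the current point, the MH acceptance probability reduces to a ratio of (unnormalised) target densities, which is exactly what the function below computes:
\[ \alpha = \min\left(1, \frac{\exp\left(-\frac{1}{2}(y-\mu)^{\top}\Sigma^{-1}(y-\mu)\right)}{\exp\left(-\frac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right)}\right) \]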
First I created a function that does the job. I know it is not the most efficient implementation.
require("reshape")
## Loading required package: reshape
## Loading required package: plyr
## Attaching package: 'reshape'
## The following object is masked from 'package:plyr':
##
## rename, round_any
require("ggplot2")
## Loading required package: ggplot2
mh <- function(draws, u1, u2, corr, unif1, unif2) {
    # Metropolis-Hastings sampler for a bivariate normal target.
    # draws: number of iterations; (u1, u2): starting point;
    # corr: correlation of the target; (unif1, unif2): half-widths of the
    # uniform proposal around the current point.
    x <- matrix(c(u1, u2), nrow = 2, ncol = 1)                # current state
    mu <- matrix(c(1, 4), nrow = 2, ncol = 1)                 # target mean
    sigma <- matrix(c(1, corr, corr, 1), nrow = 2, ncol = 2)  # target covariance
    seq <- matrix(ncol = 2, nrow = 0)                         # chain, one row per draw
    acceptance <- vector()
    for (i in 1:draws) {
        # propose a candidate uniformly around the current point
        x1 <- runif(1, min = -unif1, max = unif1)
        x2 <- runif(1, min = -unif2, max = unif2)
        y <- x + c(x1, x2)
        # unnormalised target density at the candidate and at the current point
        num <- exp(-(t(y - mu) %*% solve(sigma) %*% (y - mu))/2)
        den <- exp(-(t(x - mu) %*% solve(sigma) %*% (x - mu))/2)
        alpha <- min(num/den, 1)
        # accept the candidate with probability alpha, otherwise keep the current point
        rnd <- runif(1, min = 0, max = 1)
        if (rnd < alpha) {
            seq <- rbind(seq, t(y))
            acceptance[i] <- 1
        } else {
            seq <- rbind(seq, t(x))
            acceptance[i] <- 0
        }
        x <- as.matrix(seq[i, ])
    }
    seq
}
Let's see how this works. I run a simulation and build some data frames for plotting.
mh1 <- mh(draws = 5000, u1 = 2, u2 = 1, corr = 0.9, unif1 = 0.75, unif2 = 1)
seq1 <- as.data.frame(mh1)
seq1$order <- seq_len(nrow(seq1))
names(seq1) <- c("Y1", "Y2", "Draw")
long1 <- melt(seq1, id = "Draw")  # convert to long format
names(long1) <- c("Draw", "Variable", "Value")
Then we look at the generated series:
ggplot(data = long1, aes(x = Draw, y = Value, colour = Variable)) + geom_line()
Well, in general it seems that the chain converged to the correct mean, although there is a high degree of dependence between successive draws. Let's look at a scatterplot:
ggplot(seq1, aes(x = Y1, y = Y2)) + geom_point()
Looks as expected! I decided to generate fewer observations to see the convergence in two dimensions.
mh1 <- mh(draws = 100, u1 = 2, u2 = 1, corr = 0.9, unif1 = 0.75, unif2 = 1)
seq2 <- as.data.frame(mh1)
seq2$order <- seq_len(nrow(seq2))
names(seq2) <- c("Y1", "Y2", "Draw")
ggplot(seq2, aes(x = Y1, y = Y2)) + geom_point() + geom_path()
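For reference, we can also look at draws taken directly from the target distribution (a sketch assuming the MASS package is available for mvrnorm); the scatter should resemble the MH scatterplot above:
require("MASS")
direct <- as.data.frame(mvrnorm(n = 5000, mu = c(1, 4), Sigma = matrix(c(1, 0.9, 0.9, 1), 2, 2)))
names(direct) <- c("Y1", "Y2")
ggplot(direct, aes(x = Y1, y = Y2)) + geom_point()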
This is fun. By changing the parameters of the function we can see how they affect the simulated data. For example, we can add more variability to the candidate distribution:
mh1 <- mh(draws = 5000, u1 = 2, u2 = 1, corr = 0.9, unif1 = 5, unif2 = 5)
seq3 <- as.data.frame(mh1)
seq3$order <- seq_len(nrow(seq3))
names(seq3) <- c("Y1", "Y2", "Draw")
long3 <- melt(seq3, id = "Draw")  # convert to long format
names(long3) <- c("Draw", "Variable", "Value")
ggplot(data = long3, aes(x = Draw, y = Value, colour = Variable)) + geom_line()
Or we can use less variability:
mh1 <- mh(draws = 5000, u1 = 2, u2 = 1, corr = 0.9, unif1 = 0.2, unif2 = 0.2)
seq4 <- as.data.frame(mh1)
seq4$order <- seq_len(nrow(seq4))
names(seq4) <- c("Y1", "Y2", "Draw")
long4 <- melt(seq4, id = "Draw")  # convert to long format
names(long4) <- c("Draw", "Variable", "Value")
ggplot(data = long4, aes(x = Draw, y = Value, colour = Variable)) + geom_line()
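Before looking at the autocorrelation, we can compare the sample means of the three long runs (a quick check on the data frames built above; exact values depend on the run):
colMeans(seq1[, c("Y1", "Y2")])  # original proposal (unif1 = 0.75, unif2 = 1)
colMeans(seq3[, c("Y1", "Y2")])  # wide proposal (unif1 = unif2 = 5)
colMeans(seq4[, c("Y1", "Y2")])  # narrow proposal (unif1 = unif2 = 0.2)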
Let's look at the autocorrelation:
par(mfrow = c(3, 1))
acf(ts(seq1)[, 1])  # original proposal (unif1 = 0.75, unif2 = 1)
acf(ts(seq3)[, 1])  # wide proposal (unif1 = unif2 = 5)
acf(ts(seq4)[, 1])  # narrow proposal (unif1 = unif2 = 0.2)
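Finally, the acceptance rate is a handy way to compare the three proposal widths. The function keeps an acceptance vector internally but only returns the chain, so here is a rough estimate (a sketch that assumes a rejected proposal leaves the chain exactly unchanged, which is essentially always true for these continuous draws):
accept.rate <- function(chain) {
    moves <- abs(diff(as.matrix(chain[, c("Y1", "Y2")])))
    mean(rowSums(moves) > 0)  # proportion of iterations where the chain moved
}
accept.rate(seq1)  # original proposal
accept.rate(seq3)  # wide proposal: lower acceptance
accept.rate(seq4)  # narrow proposal: higher acceptance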