\[ \mu = \begin{pmatrix} 1 & 4 \end{pmatrix} \]
\[ \Sigma = \begin{pmatrix} 1 & 0.9 \\ 0.9 & 1 \end{pmatrix} \]
The idea was to follow section 7.1 in the paper and simulate data from the given normal distribution. One of the challenges with the Metropolis-Hastings (MH) algorithm is figuring out which candidate (proposal) distribution gives good convergence. In this case the paper proposes four alternatives. The first one uses a bivariate uniform distribution centred on the current point, and that is the one I try to replicate here.
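Because the uniform candidate is generated symmetrically around the current point, the MH acceptance probability reduces to a ratio of (unnormalised) target densities, which is exactly what the function below computes:
\[ \alpha = \min\left(1, \frac{\exp\left(-\frac{1}{2}(y-\mu)^{\top}\Sigma^{-1}(y-\mu)\right)}{\exp\left(-\frac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right)}\right) \]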
First I created a function that does the job. I know it is not the most efficient implementation.
require("reshape")
## Loading required package: reshape
## Loading required package: plyr
## Attaching package: 'reshape'
## The following object is masked from 'package:plyr':
##
## rename, round_any
require("ggplot2")
## Loading required package: ggplot2
mh <- function(draws, u1, u2, corr, unif1, unif2) {
    # Metropolis-Hastings sampler for a bivariate normal target.
    # draws: number of iterations; (u1, u2): starting point;
    # corr: correlation of the target; (unif1, unif2): half-widths of the
    # uniform proposal around the current point.
    x <- matrix(c(u1, u2), nrow = 2, ncol = 1)                # current state
    mu <- matrix(c(1, 4), nrow = 2, ncol = 1)                 # target mean
    sigma <- matrix(c(1, corr, corr, 1), nrow = 2, ncol = 2)  # target covariance
    seq <- matrix(ncol = 2, nrow = 0)                         # chain, one row per draw
    acceptance <- vector()
    for (i in 1:draws) {
        # propose a candidate uniformly around the current point
        x1 <- runif(1, min = -unif1, max = unif1)
        x2 <- runif(1, min = -unif2, max = unif2)
        y <- x + c(x1, x2)
        # unnormalised target density at the candidate and at the current point
        num <- exp(-(t(y - mu) %*% solve(sigma) %*% (y - mu))/2)
        den <- exp(-(t(x - mu) %*% solve(sigma) %*% (x - mu))/2)
        alpha <- min(num/den, 1)
        # accept the candidate with probability alpha, otherwise keep the current point
        rnd <- runif(1, min = 0, max = 1)
        if (rnd < alpha) {
            seq <- rbind(seq, t(y))
            acceptance[i] <- 1
        } else {
            seq <- rbind(seq, t(x))
            acceptance[i] <- 0
        }
        x <- as.matrix(seq[i, ])
    }
    seq
}
Let's see how this works. I run a simulation and build some data frames for plotting.
mh1 <- mh(draws = 5000, u1 = 2, u2 = 1, corr = 0.9, unif1 = 0.75, unif2 = 1)
seq1 <- as.data.frame(mh1)
seq1$order <- seq_len(nrow(seq1))
names(seq1) <- c("Y1", "Y2", "Draw")
long1 <- melt(seq1, id = "Draw")  # convert to long format
names(long1) <- c("Draw", "Variable", "Value")
Then we look at the generated series:
ggplot(data = long1, aes(x = Draw, y = Value, colour = Variable)) + geom_line()
Well, in general it seems that the chain converged to the correct mean, although there is a high degree of dependence between successive draws. Let's look at a scatterplot:
ggplot(seq1, aes(x = Y1, y = Y2)) + geom_point()
Looks as expected! I decided to generate fewer observations to see the convergence in two dimensions.
mh1 <- mh(draws = 100, u1 = 2, u2 = 1, corr = 0.9, unif1 = 0.75, unif2 = 1)
seq2 <- as.data.frame(mh1)
seq2$order <- seq_len(nrow(seq2))
names(seq2) <- c("Y1", "Y2", "Draw")
ggplot(seq2, aes(x = Y1, y = Y2)) + geom_point() + geom_path()
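For reference, we can also look at draws taken directly from the target distribution (a sketch assuming the MASS package is available for mvrnorm); the scatter should resemble the MH scatterplot above:
require("MASS")
direct <- as.data.frame(mvrnorm(n = 5000, mu = c(1, 4), Sigma = matrix(c(1, 0.9, 0.9, 1), 2, 2)))
names(direct) <- c("Y1", "Y2")
ggplot(direct, aes(x = Y1, y = Y2)) + geom_point()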
This is fun. By changing the parameters of the function we can see how they affect the simulated data. For example, we can add more variability to the candidate distribution:
mh1 <- mh(draws = 5000, u1 = 2, u2 = 1, corr = 0.9, unif1 = 5, unif2 = 5)
seq3 <- as.data.frame(mh1)
seq3$order <- seq_len(nrow(seq3))
names(seq3) <- c("Y1", "Y2", "Draw")
long3 <- melt(seq3, id = "Draw")  # convert to long format
names(long3) <- c("Draw", "Variable", "Value")
ggplot(data = long3, aes(x = Draw, y = Value, colour = Variable)) + geom_line()
Or we can use less variability:
mh1 <- mh(draws = 5000, u1 = 2, u2 = 1, corr = 0.9, unif1 = 0.2, unif2 = 0.2)
seq4 <- as.data.frame(mh1)
seq4$order <- seq_len(nrow(seq4))
names(seq4) <- c("Y1", "Y2", "Draw")
long4 <- melt(seq4, id = "Draw")  # convert to long format
names(long4) <- c("Draw", "Variable", "Value")
ggplot(data = long4, aes(x = Draw, y = Value, colour = Variable)) + geom_line()
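Before looking at the autocorrelation, we can compare the sample means of the three long runs (a quick check on the data frames built above; exact values depend on the run):
colMeans(seq1[, c("Y1", "Y2")])  # original proposal (unif1 = 0.75, unif2 = 1)
colMeans(seq3[, c("Y1", "Y2")])  # wide proposal (unif1 = unif2 = 5)
colMeans(seq4[, c("Y1", "Y2")])  # narrow proposal (unif1 = unif2 = 0.2)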
Let's look at the autocorrelation:
par(mfrow = c(3, 1))
acf(ts(seq1)[, 1])  # original proposal (unif1 = 0.75, unif2 = 1)
acf(ts(seq3)[, 1])  # wide proposal (unif1 = unif2 = 5)
acf(ts(seq4)[, 1])  # narrow proposal (unif1 = unif2 = 0.2)
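Finally, the acceptance rate is a handy way to compare the three proposal widths. The function keeps an acceptance vector internally but only returns the chain, so here is a rough estimate (a sketch that assumes a rejected proposal leaves the chain exactly unchanged, which is essentially always true for these continuous draws):
accept.rate <- function(chain) {
    moves <- abs(diff(as.matrix(chain[, c("Y1", "Y2")])))
    mean(rowSums(moves) > 0)  # proportion of iterations where the chain moved
}
accept.rate(seq1)  # original proposal
accept.rate(seq3)  # wide proposal: lower acceptance
accept.rate(seq4)  # narrow proposal: higher acceptance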