Illustration of some properties of MCMC samplers

References

  1. See the slides “An Introduction to MCMC” from my teaching material: link.

Method 1: Inverse Cumulative Distribution Function (not MCMC)

  • Aim: we want to simulate from \(F\), a (cumulative) distribution function.
  • Ingredients :
  1. The inverse of the cumulative distribution function \(F\), \(F^{-1}\), a.k.a. the quantile function.
  2. Generate random numbers from a Uniform(0,1) distribution, this is, random numbers on \((0,1)\).
  • The idea: if we generate numbers \(u_i\) on \((0,1)\), then, the transformed numbers \(F^{-1}(u_i)\) are distributed according to \(F\). This is a theoretical result known as the “Probability integral transform”, and the sampling method is known as “Inverse transform sampling”.

The following R code contains an R about the generation of \(10,000\) numbers from a N(0,1) using this method.

###############################################################################################################################3
# Example 1: Inverse CDF method
###############################################################################################################################3

rm(list=ls()) # clear memory

# Generate 10,000 uniform numbers
u = runif(10000)
# Apply the quantile normal function
z = qnorm(u)
# histogram
hist(z,breaks=100,probability=T)
curve(dnorm,-5,5,add=T,col="blue",lwd=2)
box()

# Summary statistics
median(z) # median
## [1] -0.02004848
quantile(z,c(0.025,0.975)) # 95% probability interval
##      2.5%     97.5% 
## -1.933969  1.936913
mean(z) # mean
## [1] -0.01259031
sd(z) # standard deviation
## [1] 0.9970053
rm(list=ls()) # clear memory

# Generate one uniform and map it to a normal random number 
set.seed(123)
u0 = round(runif(1),digits=2) # Generate a single uniform on (0,1)
print(u0)
## [1] 0.29
z0 = round(qnorm(u0),digits=2) # Transform it using the normal quantile function
print(z0)
## [1] -0.55
curve(pnorm,-5,5,ylab="CDF",xlab="z", xaxt='n',  yaxt='n',lwd=2,cex.lab=1.5) # plot
axis(side = 2, at = c(u0,0,1),cex.axis=1.5)
axis(side = 1, at = c(z0,-5,5),cex.axis=1.5)
abline(v=z0,lty=2,col="blue"); abline(h=u0,lty=2,col="blue")
points(z0,u0,col="red",pch=20,cex=2)

# Generate one uniform and map it to a normal random number (again with a different seed)
set.seed(12345)
u0 = round(runif(1),digits=2) # Generate a single uniform on (0,1)
print(u0)
## [1] 0.72
z0 = round(qnorm(u0),digits=2) # Transform it using the normal quantile function
print(z0)
## [1] 0.58
curve(pnorm,-5,5,ylab="CDF",xlab="z", xaxt='n',  yaxt='n',lwd=2,cex.lab=1.5) # plot
axis(side = 2, at = c(u0,0,1),cex.axis=1.5)
axis(side = 1, at = c(z0,-5,5),cex.axis=1.5)
abline(v=z0,lty=2,col="blue"); abline(h=u0,lty=2,col="blue")
points(z0,u0,col="red",pch=20,cex=2)

Method 2: The Metropolis algorithm

Ingredients

  • An initial point.
  • A density function proportional to the posterior distribution: \[p(\theta)= f({\bf x}\vert \theta)\pi(\theta) \propto \pi(\theta\vert {\bf x}).\]
  • A proposal distribution \(g(\theta \vert \eta)\) with \(g(\theta \vert \eta) = g(\eta \vert \theta)\). You need to be able to simulate from this distribution (e.g. \(g(\theta \vert \eta) = Normal(\eta,\sigma^2)\)), where \(\sigma\) is fixed and controls the length of the steps between iterations.

Note that we do not need the normalising constant.

Directions

  • Initialise the chain at \(\theta_1\). (Generate it from the prior or invent one).
  • Generate a proposal draw \(\tilde{\theta} \sim g(\theta\vert \theta_1)\).
  • Calculate \(\alpha = \min\left\{1,\dfrac{p(\tilde{\theta})}{p(\theta_1)}\right\}\).
  • Generate \(u\sim Unif(0,1)\). If \(u<\alpha\) then \(\theta_2 = \tilde{\theta}\) (move), otherwise \(\theta_2=\theta_1\) (no move).
  • The new initial point is \(\theta_2\) for the next iteration.

The following R code illustrates a single step of the Metropolis algorithm.

###############################################################################################################################3
# Metropolis illustration
###############################################################################################################################3

rm(list=ls())

target = Vectorize(function(x) dnorm(x)  ) # target distribution: normal
prop = function(x,y) dlogis(x,y,2)   # proposal distribution: logistic

x0 = 1 # initial point
# Generate candidate
set.seed(123)
xt = rlogis(1,x0,2)

prop0 = Vectorize( function(x) prop(x,x0))

curve(target, -10,10,lwd=2,cex=1.5,cex.lab=1.5,n=1000)  # target distribution
curve(prop0, -10,10,lwd=2,col="black",add=T,lty=2,n=1000)  # proposal distribution
abline(v=x0,col="blue",lwd=2)  # initial point
abline(h=target(x0),col="blue") # target at initial point
abline(v=xt,col="red",lwd=2)  # candidate point
abline(h=target(xt),col="red")  # target at candidate point

alpha = min(1,target(xt)/target(x0))
print(alpha)
## [1] 1
# alpha is 1, then we move to x2 = xt since any u~unif(0,1) will be smaller than alpha.

Other properties of the Metropolis Algorithm: Initial points, choice of the proposal (narrow, wide), and randomness of the sampler.

###############################################################################################################################
# Example 2: Convergence of Metropolis Algorithm
###############################################################################################################################

# Effect of initial points

rm(list=ls()) # clear memory

library(mcmc) # Using a package
## Warning: package 'mcmc' was built under R version 3.3.3
# simulate data
set.seed(123)
x <- rnorm(30,0,1)

# log posterior distribution of (mu,sigma)
# x ~ N(mu,sigma)
# mu ~ N(0,1000000)
# sigma ~ Gamma(0.001,0.001)
log.post <- function(par){
  mu = par[1]; sigma = par[2];
if(sigma>0){
  log.lik <- sum(dnorm(x,mu,sigma,log=T))
  log.prior <- dnorm(mu,0,1000,log=T)  + dgamma(sigma,0.001,0.001,log=T)
  return(log.lik + log.prior)
  }
  else return(-Inf)
}

# Three initial points
init1 <- c(0,1)
init2 <- c(5,1)
init3 <- c(-5,1)

# Chains for the three initial points
set.seed(1234)
chain1 <- metrop(log.post,init1,scale=0.25,nbatch=1000)
# Chains for the three initial points
set.seed(1234)
chain2 <- metrop(log.post,init2,scale=0.25,nbatch=1000)
# Chains for the three initial points
set.seed(1234)
chain3 <- metrop(log.post,init3,scale=0.25,nbatch=1000)

# Plots that illustrate the convergence of the Metropolis algorithm after a number of iterations for different initial points
plot(chain1$batch,type="l",xlim=c(-6,6),ylim=c(0.5,4), xlab=~mu,ylab=~sigma,cex.axis=1.5,cex.lab=1.5); abline(v=0) ; abline(h=1);
points(chain2$batch,type="l",xlim=c(-6,6),ylim=c(0.5,4),col="blue")
points(chain3$batch,type="l",xlim=c(-6,6),ylim=c(0.5,4),col="red")

# Traceplots for mu
plot(chain1$batch[,1],type="l")

plot(chain2$batch[,1],type="l",col="blue")

plot(chain3$batch[,1],type="l",col="red")

##############################################################################################################################################################################


# Narrow proposal

# Three initial points
init1 <- c(0,1)
init2 <- c(5,1)
init3 <- c(-5,1)

# Chains for the three initial points
set.seed(1234)
chain1 <- metrop(log.post,init1,scale=0.045,nbatch=1000)
# Chains for the three initial points
set.seed(1234)
chain2 <- metrop(log.post,init2,scale=0.045,nbatch=1000)
# Chains for the three initial points
set.seed(1234)
chain3 <- metrop(log.post,init3,scale=0.045,nbatch=1000)

# Plots that illustrate the convergence of the Metropolis algorithm after a number of iterations for different initial points
plot(chain1$batch,type="l",xlim=c(-6,6),ylim=c(0.5,4), xlab=~mu,ylab=~sigma,cex.axis=1.5,cex.lab=1.5); abline(v=0) ; abline(h=1);
points(chain2$batch,type="l",xlim=c(-6,6),ylim=c(0.5,4),col="blue")
points(chain3$batch,type="l",xlim=c(-6,6),ylim=c(0.5,4),col="red")

# Traceplots for mu
plot(chain1$batch[,1],type="l")

plot(chain2$batch[,1],type="l",col="blue")

plot(chain3$batch[,1],type="l",col="red")

##############################################################################################################################################################################


# Wide proposal

# Three initial points
init1 <- c(0,1)
init2 <- c(5,1)
init3 <- c(-5,1)

# Chains for the three initial points
set.seed(1234)
chain1 <- metrop(log.post,init1,scale=0.5,nbatch=1000)
# Chains for the three initial points
set.seed(1234)
chain2 <- metrop(log.post,init2,scale=0.5,nbatch=1000)
# Chains for the three initial points
set.seed(1234)
chain3 <- metrop(log.post,init3,scale=0.5,nbatch=1000)

# Plots that illustrate the convergence of the Metropolis algorithm after a number of iterations for different initial points
plot(chain1$batch,type="l",xlim=c(-6,6),ylim=c(0.5,6), xlab=~mu,ylab=~sigma,cex.axis=1.5,cex.lab=1.5); abline(v=0) ; abline(h=1);
points(chain2$batch,type="l",xlim=c(-6,6),ylim=c(0.5,4),col="blue")
points(chain3$batch,type="l",xlim=c(-6,6),ylim=c(0.5,4),col="red")

# Traceplots for mu
plot(chain1$batch[,1],type="l")

plot(chain2$batch[,1],type="l",col="blue")

plot(chain3$batch[,1],type="l",col="red")

##############################################################################################################################################################################


# Randomness of the MCMC

#  Same Initial point
init1 <- c(0,1)
init2 <- c(0,1)
init3 <- c(0,1)

# Chain 1
set.seed(123)
chain1 <- metrop(log.post,init1,scale=0.5,nbatch=1000)
# Chain 2
set.seed(1234)
chain2 <- metrop(log.post,init2,scale=0.5,nbatch=1000)
# Chain 3
set.seed(12345)
chain3 <- metrop(log.post,init3,scale=0.5,nbatch=1000)

# Plots that illustrate the randomness of the Metropolis algorithm for the same initial point but different seeds
plot(chain1$batch,type="l",xlim=c(-6,6),ylim=c(0.5,4), xlab=~mu,ylab=~sigma,cex.axis=1.5,cex.lab=1.5); abline(v=0) ; abline(h=1);
points(chain2$batch,type="l",xlim=c(-6,6),ylim=c(0.5,4),col="blue")
points(chain3$batch,type="l",xlim=c(-6,6),ylim=c(0.5,4),col="red")

# Traceplots for mu
plot(chain1$batch[,1],type="l",ylim=c(-0.6,0.7))
points(chain2$batch[,1],type="l",col="blue")
points(chain3$batch[,1],type="l",col="red")

To center or not to center: The effect of centering the covariates on the autocorrelation in MCMC samplers

#################################################################################################################################################################
# The effect of centering the covariates on the autocorrelation in MCMC samplers
#################################################################################################################################################################

rm(list=ls())
library(mcmc)

# Simulated data y = 1 + 2*x + e 
set.seed(123)
x = rnorm(100,50,0.5)
set.seed(1234)
e = rnorm(100)
y = 1 + 2*x + e
###################################
# Non-centered analysis
###################################
# beta[1] ~ N(0,1000000)
# beta[2] ~ N(0,1000000)
# sigma ~ Gamma(0.001,0.001)
log.post = function(par){
  if(par[3]>0){
    log.lik = sum(dnorm(y-par[1]-par[2]*x,0,par[3],log=T))
    log.prior = dnorm(par[1],0,1000,log=T) + dnorm(par[2],0,1000,log=T) + dgamma(par[3],0.001,0.001,log=T)
    return(log.lik+log.prior)
  }
  else return(-Inf)
}


chain = metrop(obj = log.post, initial = c(1,2,1),nbatch = 100000, scale = c(0.01,.01,0.02))

# Checking the acceptance rate as a diagnostic tool.
# The optimal acceptance rate is 0.234, but this is not a definitive diagnostic tool, just a reference.
chain$accept
## [1] 0.2346
plot(chain$batch[,1],type="l")

plot(chain$batch[,2],type="l",col="blue")

plot(chain$batch[,3],type="l",col="red")

###########################################
# Centered analysis
# A better mixing of the chains is 
# observed after centering the covariate
###########################################
m = mean(x)

log.post1 = function(par){
  if(par[3]>0){
    log.lik = sum(dnorm(y-par[1]-par[2]*(x-m),0,par[3],log=T))
    log.prior = dnorm(par[1],0,1000,log=T) + dnorm(par[2],0,1000,log=T) + dgamma(par[3],0.001,0.001,log=T)
    return(log.lik+log.prior)
  }
  else return(-Inf)
}


# Checking the acceptance rate as a diagnostic tool.
chain1 = metrop(obj = log.post1, initial = c(101,2,1),nbatch = 100000, scale = 0.175)
chain1$accept
## [1] 0.24827
plot(chain1$batch[,1],type="l")

plot(chain1$batch[,2],type="l",col="blue")

plot(chain1$batch[,3],type="l",col="red")