Michael Lanier
8/4/2016
The Metropolis algorithm is a popular way of generating random samples from distributions that are hard to sample directly. This project uses the Metropolis algorithm to generate data from the Cauchy distribution. We then seek to find the relationship between the variance of the proposal distribution and the convergence properties of the Metropolis algorithm, notably the burn-in time and the autocorrelation of the generated series. Finally, further questions related to the size of the proposal variance are raised.
The purpose of this project is to use the Metropolis algorithm to sample from a Cauchy distribution using various proposal distributions. We also examine the relationship between the convergence of the Markov chain and the variance of the proposal distribution.
The Metropolis algorithm for a Cauchy target with a Normal proposal can be defined as follows.
Let f be the pdf of X ~ Cauchy(0, 1), and let g be the pdf of the proposal Y ~ N(0, sigma^2).
Since the proposal is an independence proposal (that is, g(Y | Xt) = g(Y)), the acceptance probability is alpha(Xt, Y) = min(1, f(Y) g(Xt) / (f(Xt) g(Y))). Note that the normalizing constant of f cancels in this ratio, so the unnormalized Cauchy density 1/(1 + x^2) can be used in place of f.
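A minimal sketch of this acceptance rule is given below; mh_alpha is a hypothetical helper (not part of the project code), with f_unnorm standing in for the unnormalized target density and g_dens for the proposal density.
# Hypothetical helper: acceptance probability for an independence proposal.
mh_alpha=function(x_t, y, f_unnorm, g_dens)
{
min(1, f_unnorm(y)*g_dens(x_t)/(f_unnorm(x_t)*g_dens(y)))
}
# Example with the unnormalized Cauchy target and a N(0, sd = 2) proposal:
mh_alpha(0.5, 1.2, function(x) 1/(1+x^2), function(x) dnorm(x,0,2))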
First we will define variables to be used in the simulation.
set.seed(1234)
nsim=10000
cauchy=function(x) # unnormalized Cauchy(0,1) density; the 1/pi constant cancels in the acceptance ratio
{
1/(1+x^2)
}
x=vector(mode="numeric",length=nsim)
x[1]=rnorm(1,0,2) # initialize the chain with a draw from the proposal
for (i in 2:nsim){
y=rnorm(1,0,2) # candidate normal
u=runif(1)
alpha=min(1,cauchy(y)*dnorm(x[i-1],0,2)/(cauchy(x[i-1])*dnorm(y,0,2)))
if(u<=alpha)
{
x[i]=y
}else x[i]=x[i-1]
}
library(ggplot2)
qplot(1:nsim,x,xlab="Index",ylab="Value") # trace plot of the generated chain
This data appears to be Cauchy distributed.
The data have heavy tails but appear symmetric, just like the Cauchy distribution. The autocorrelation plot shows that the draws are correlated. Autocorrelation is problematic because correlated samples produce local stretches of the chain that do not resemble the target distribution.
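As an illustration of these checks, the sketch below (assuming x still holds the chain generated above) compares the sample against Cauchy quantiles and plots the autocorrelation directly.
probs=ppoints(length(x))
qqplot(qcauchy(probs), x, xlab="Theoretical Cauchy quantiles", ylab="Sample quantiles") # Q-Q plot against Cauchy(0,1)
abline(0, 1)
acf(x, lag.max=50, main="Autocorrelation of the chain")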
This data does not appear to be Cauchy distributed; it is bimodal.
par(mfrow = c(2, 1))
x[1]=rt(1,20) # initialize the chain with a draw from the t proposal
for (i in 2:nsim){
y=rt(1,20) # candidate from a t distribution with 20 degrees of freedom
u=runif(1)
alpha=min(1,cauchy(y)*dt(x[i-1],20)/(cauchy(x[i-1])*dt(y,20)))
if(u<=alpha)
{
x[i]=y
}else x[i]=x[i-1]
}
The sampled data has two prominent tails, consistent with the bimodal shape noted above.
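To visualize this shape, a sketch (assuming x now holds the chain generated with the t proposal) is to plot a truncated histogram with the Cauchy density overlaid:
xs=x[abs(x)<10] # restrict to a central window so the shape is visible
hist(xs, breaks=60, freq=FALSE, main="Chain with t(20) proposal", xlab="Value")
curve(dcauchy(x), from=-10, to=10, add=TRUE) # Cauchy(0,1) density for comparison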
ks.test(x,"cauchy")
One-sample Kolmogorov-Smirnov test
data: x
D = 0.96125, p-value < 2.2e-16
alternative hypothesis: two-sided
The Kolmogorov-Smirnov test strongly rejects the hypothesis that these samples follow a Cauchy distribution (p-value < 2.2e-16), confirming the poor fit seen in the plots.
par(mfrow = c(2, 1))
TS=ts(data = x, start = 1, end = nsim, frequency = 1) # treat the chain as a time series
acf(TS, lag.max = NULL, type = "correlation", plot = TRUE) # autocorrelation plot of the chain
The issue appears to be that the draws are extremely correlated, so the burn-in time will be large.
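The rejection percentages shown below correspond to runs of the sampler with different proposal variances; the generating code is not shown, but a sketch of how such an experiment could be reproduced follows. The helper run_chain and the standard-deviation grid c(1, 2, 4, 8) are illustrative assumptions, not the values used in the project.
cauchy=function(x) 1/(1+x^2) # unnormalized Cauchy(0,1) density, as defined above
run_chain=function(prop_sd, nsim=10000)
{
x=numeric(nsim)
x[1]=rnorm(1,0,prop_sd)
rejected=0
for (i in 2:nsim){
y=rnorm(1,0,prop_sd)
alpha=min(1,cauchy(y)*dnorm(x[i-1],0,prop_sd)/(cauchy(x[i-1])*dnorm(y,0,prop_sd)))
if(runif(1)<=alpha)
{
x[i]=y
}else
{
x[i]=x[i-1]
rejected=rejected+1
}
}
list(chain=x, reject_rate=rejected/(nsim-1))
}
for (s in c(1,2,4,8)){ # illustrative grid of proposal standard deviations
res=run_chain(s)
print(paste("Percentage rejected is", round(res$reject_rate,4),
"with lag-1 autocorrelation", round(acf(res$chain,plot=FALSE)$acf[2],4)))
}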
[1] "Precentage rejected is 0.2604"
[1] "Precentage rejected is 0.4644"
[1] "Precentage rejected is 0.5818"
[1] "Precentage rejected is 0.6713"
The variance of the proposal is inversely related to the autocorrelation and directly related to the number of rejected points. With a small proposal variance the series has a high acceptance rate but high autocorrelation: the chain moves in small, highly dependent steps, so it needs many runs to visit the entire sample space. With a large proposal variance the accepted moves are less correlated, but the acceptance rate is low, so many runs are wasted on rejections and it again takes a long time to sample the whole space.
The project here deals only with the Cauchy distribution and a Normal proposal. It is not a rigorous treatment of the topic, so any conclusions are specific and not general. The choice of variance in the proposal distribution is highly important. If the variance is quite large, then the acceptance rate is low and the algorithm converges slowly. On the other hand, if it is too small, the algorithm moves slowly through the sample space and again converges slowly. One solution to this problem, which is outside the scope of this project, is to specify a loss function (perhaps deviance from the target distribution) and find the variance that minimizes it with optimization techniques such as stochastic gradient descent or adaptive learning rates. A rigorous treatment of the latter can be found in Haario, H., Saksman, E., and Tamminen, J., "An adaptive Metropolis algorithm," Bernoulli 7 (2001), no. 2, 223-242.
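As a toy illustration of such tuning, the sketch below adjusts the proposal standard deviation during a pilot run until the observed acceptance rate is close to a target rate; it uses the distance between the observed and target acceptance rates as a simple stand-in for the loss function mentioned above. The function tune_sd, the target rate 0.4, and the step size 0.5 are illustrative assumptions, and this is not the adaptive Metropolis algorithm of Haario, Saksman, and Tamminen (2001).
cauchy=function(x) 1/(1+x^2) # unnormalized Cauchy(0,1) density, as above
tune_sd=function(target_accept=0.4, n_batches=50, batch_size=200)
{
prop_sd=1
x=rnorm(1,0,prop_sd)
for (b in 1:n_batches){
accepted=0
for (i in 1:batch_size){
y=rnorm(1,0,prop_sd)
alpha=min(1,cauchy(y)*dnorm(x,0,prop_sd)/(cauchy(x)*dnorm(y,0,prop_sd)))
if(runif(1)<=alpha)
{
x=y
accepted=accepted+1
}
}
rate=accepted/batch_size
prop_sd=prop_sd*exp(0.5*(rate-target_accept)) # raise the scale when accepting too often, lower it when rejecting too often
}
prop_sd
}
tune_sd()
In practice one would replace the acceptance-rate criterion with whatever loss is of interest and check the resulting chain against the target distribution, as was done above.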