Abstract

The Metropolis algorithm is a popular way of generating random samples from distributions that are hard to sample directly. This project uses the Metropolis algorithm to generate data from the Cauchy distribution. We then seek the relationship between the variance of the proposal function and the convergence properties of the Metropolis algorithm, notably the burn-in time and the autocorrelation of the generated series. Finally, we pose further questions related to the size of the proposal variance.

Introduction

The purpose of this project is to use the Metropolis algorithm to sample from a Cauchy distribution using various proposal distributions. We also seek to examine the relationship between the convergence of the Markov chain and the variance of the proposal distribution.

Analysis

The Metropolis algorithm for a Cauchy target with a Normal proposal can be defined as follows:

  1. Let f be the pdf of X ~ Cauchy(0, 1). Let g be the pdf of Y ~ N(0, Var).

  2. Generate X0 from the distribution g.
  3. Repeat (until the chain has converged to a stationary distribution according to some criterion):
     (a) Generate Y from the distribution g.
     (b) Generate U from Uniform(0, 1).
     (c) If U < (f(Y)g(Xt|Y))/(f(Xt)g(Y|Xt)), accept Y and set Xt+1 = Y; otherwise set Xt+1 = Xt.
     (d) Increment t.

Note that in step (3c) the candidate point Y is accepted with probability alpha(Xt, Y) = min(1, (f(Y)g(Xt|Y))/(f(Xt)g(Y|Xt))).

Since g here is an independence proposal that does not depend on the current state (i.e., g(Y|Xt) = g(Y)), the acceptance probability simplifies to alpha(Xt, Y) = min(1, (f(Y)g(Xt))/(f(Xt)g(Y))).
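As a minimal sketch of this acceptance probability (the helper name accept_prob and its arguments are illustrative only, not part of the project code; the 1/pi normalizing constant of the Cauchy density cancels in the ratio):

accept_prob <- function(xt, y, s) {
  f <- function(x) 1/(1 + x^2)  # unnormalized Cauchy(0,1) density
  min(1, f(y)*dnorm(xt, 0, s)/(f(xt)*dnorm(y, 0, s)))
}
accept_prob(0.5, 2, 2)  # chance of accepting candidate y = 2 when the chain is at 0.5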

Results

First we will define variables to be used in the simulation.

set.seed(1234)
nsim=10000
# unnormalized Cauchy(0,1) density; the 1/pi constant cancels in the acceptance ratio
cauchy=function(x)
{
  1/(1+x^2)
}

Run the Metropolis simulation using a Normal proposal

x=vector(mode="numeric",length=nsim)
x[1]=rnorm(1,0,2) # initialize the chain with a draw from the proposal
for (i in 2:nsim){
  y=rnorm(1,0,2) # candidate from the Normal proposal (rnorm takes the standard deviation, here 2)
  u=runif(1)
  alpha=min(1,cauchy(y)*dnorm(x[i-1],0,2)/(cauchy(x[i-1])*dnorm(y,0,2)))
  if(u<=alpha)
  {
    x[i]=y
  }else x[i]=x[i-1]
}

Examine data

library(ggplot2)
qplot(1:10000,x[1:10000],xlab="Index",ylab="Value")

This data does not appear to have any strange patterns in it that would imply heavy autocorrelation.
hist(x, col="grey")

# draw a density line plot
plot(density(x), bty="l",main="95% density with Normal Proposal")



# add vertical lines for the median and 95th percentile
abline(v=quantile(x, c(0.5, 0.95)), lty=2:3)

This data appears to be Cauchy distributed.
#qqplot
par(mfrow = c(1, 2)) # two qq-plots side by side
a <- ppoints(100)
QR <- qcauchy(a)
Qx <- quantile(x, a)
qqplot(QR, Qx, xlab ="Cauchy Quantiles",
       ylab ="Sample Quantiles",
       main = "Original accepted values")
abline(0,1, col=2)
b <- 100 # discard the burn-in sample
y <- x[(b+1):nsim]
Qy <- quantile(y, a)
qqplot(QR, Qy, xlab="Cauchy Quantiles",
       ylab ="Sample Quantiles",
       main = "Minus the burn-in sample")
abline(0, 1, col = 2)

This data has heavy tails but appears symmetric, just like the Cauchy distribution.
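A quick numeric check of this claim (an illustration, not part of the original analysis): the theoretical Cauchy(0,1) quartiles are -1, 0, and 1, which the sample quartiles should roughly match.

quantile(x, c(0.25, 0.5, 0.75))  # sample quartiles
qcauchy(c(0.25, 0.5, 0.75))      # theoretical Cauchy(0,1) quartiles: -1, 0, 1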
TS=ts(data = x, start = 1, end = nsim, frequency = 1)

acf(TS, lag.max = NULL,type = c("correlation"),plot = TRUE)

This autocorrelation plot shows that successive draws are correlated.
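One way to put a number on this correlation (a sketch, assuming the coda package is installed; it is not used elsewhere in this project) is the effective sample size, which discounts the nominal 10000 draws by their autocorrelation:

library(coda)
effectiveSize(as.mcmc(x))  # effective number of roughly independent draws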

Run the Metropolis algorithm using a t-distribution proposal

x[1]=rt(1,20) # initialize the chain with a draw from the proposal
for (i in 2:nsim){
  y=rt(1,20) # candidate from the t(20) independence proposal
  u=runif(1)
  alpha=min(1,cauchy(y)*dt(x[i-1],20)/(cauchy(x[i-1])*dt(y,20)))
  if(u<=alpha)
  {
    x[i]=y
  }else x[i]=x[i-1]
}
qplot(9500:10000,x[9500:10000],xlab="Index",ylab="Value")

hist(x, col="grey")

# draw a density line plot
plot(density(x), bty="l",main="95% density with t(20) Proposal")



# add vertical lines for the median and 95th percentile
abline(v=quantile(x, c(0.5, 0.95)), lty=2:3)

This data does not appear to be Cauchy distributed; it is bimodal. The t(20) proposal has far lighter tails than the Cauchy target, so once the chain reaches an extreme value it rarely proposes a comparable one and sticks there for long stretches.
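A quick check of this explanation (an illustration, not part of the original analysis) is to compare the target and proposal densities in the tails; the target-to-proposal ratio grows without bound, which is exactly the situation in which an independence sampler gets stuck:

xs <- c(2, 5, 10)
dcauchy(xs) / dt(xs, 20)  # ratio explodes as x moves into the tails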

QQ plot

par(mfrow = c(1, 2))
a <- ppoints(100)
QR <- qcauchy(a)
Qx <- quantile(x, a)
qqplot(QR, Qx, xlab ="Cauchy Quantiles",
       ylab ="Sample Quantiles",
       main = "Original accepted values")
abline(0,1, col=2)
b <- 100 # discard the burn-in sample
y <- x[(b+1):nsim]
Qy <- quantile(y, a)
qqplot(QR, Qy, xlab="Cauchy Quantiles",
       ylab ="Sample Quantiles",
       main = "Minus the burn-in sample")
abline(0, 1, col = 2)

The qq-plot confirms the poor fit: the sample quantiles deviate sharply from the Cauchy reference line in both tails.

Test of goodness of fit

ks.test(x,"cauchy")
## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  x
## D = 0.96125, p-value < 2.2e-16
## alternative hypothesis: two-sided

Note that this call is not a valid goodness-of-fit test as written: ks.test expects the name of a cumulative distribution function, but "cauchy" here resolves to the unnormalized density defined earlier, which inflates the reported statistic. Even taken at face value, a p-value below 2.2e-16 would mean the Cauchy fit is rejected, not that the data is a good fit.
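A corrected call passes the CDF name instead (a sketch; note also that the KS test assumes independent samples, which the autocorrelated MCMC output violates, so even this p-value is at best a rough guide):

ks.test(x, "pcauchy")  # compare the chain against the Cauchy cumulative distribution function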

TS=ts(data = x, start = 1, end = nsim, frequency = 1)
plot(TS)

acf(TS, lag.max = NULL,type = c("correlation"),plot = TRUE)

The issue appears to be that the data is extremely autocorrelated, so locally a stretch of samples is highly correlated with itself and does not correctly reflect the target distribution.
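One common mitigation (a sketch, not part of the original analysis; the thinning interval of 20 is arbitrary) is to thin the chain, keeping every 20th draw so that retained samples sit further apart in the chain:

x_thin <- x[seq(1, nsim, by = 20)]  # keep every 20th draw
acf(x_thin)  # lag-1 autocorrelation of x_thin equals lag-20 autocorrelation of x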
#Define function
metro_cauchy_norm=function(s,nsim,burnin)
{
  x=vector(mode="numeric",length=nsim)
  x[1]=rnorm(1,0,s) # initialize the chain with a draw from the proposal
  k=0 # number of rejected candidates
  for (i in 2:nsim){
    y=rnorm(1,0,s) # candidate from the Normal proposal (s is the standard deviation)
    u=runif(1)
    alpha=min(1,cauchy(y)*dnorm(x[i-1],0,s)/(cauchy(x[i-1])*dnorm(y,0,s)))
    if(u<=alpha)
    {
      x[i]=y
    }else{
      x[i]=x[i-1]
      k=k+1
    }
  }

  TS=ts(data = x, start = 1, end = nsim, frequency = 1) # end = nsim; a hard-coded end recycles the data when nsim is smaller
  plot(density(x), bty="l",main=paste("95% density with Normal(0,",s,")"))
  curve(dcauchy(x,location=0, scale=1),
        col="darkblue", lwd=2, add=TRUE, yaxt="n")
  V=acf(TS, lag.max = NULL,type = c("correlation"),plot = TRUE)

  a <- ppoints(1000)
  QR <- qcauchy(a)

  y <- x[(burnin+1):nsim] # discard the burn-in sample
  Qy <- quantile(y, a)
  QMB=qqplot(QR, Qy, xlab="Cauchy Quantiles",
             ylab ="Sample Quantiles",
             main = "Minus the burn-in sample")
  abline(0, 1, col = 2)

  print(paste("Proportion rejected is ",k/nsim))
  return(list(chain = x, acf = V, qq = QMB))
}

How does the variance of the proposal function affect the Metropolis algorithm? (The argument s passed below is the standard deviation used by rnorm, so the proposal variance is s^2.)

for(i in 1:4)
{
  par(mfrow = c(1, 3))
  metro_cauchy_norm(2*i,10000,1000)
  
}

## [1] "Precentage rejected is  0.2604"

## [1] "Precentage rejected is  0.4644"

## [1] "Precentage rejected is  0.5818"

## [1] "Precentage rejected is  0.6713"

A large variance in the proposal function reduces autocorrelation but causes more candidate points to be rejected.
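This trade-off can also be summarized numerically (a sketch reusing the cauchy density defined above; the chain length of 5000 and the grid of standard deviations are illustrative, and exact values depend on the seed):

for (s in c(0.5, 2, 8)) { # proposal standard deviations to compare
  x2 <- numeric(5000); x2[1] <- rnorm(1, 0, s); acc <- 0
  for (i in 2:5000) {
    y <- rnorm(1, 0, s) # independence proposal
    a <- min(1, cauchy(y)*dnorm(x2[i-1], 0, s)/(cauchy(x2[i-1])*dnorm(y, 0, s)))
    if (runif(1) <= a) { x2[i] <- y; acc <- acc + 1 } else x2[i] <- x2[i-1]
  }
  print(c(sd = s, accept = acc/5000, lag1 = acf(x2, plot = FALSE)$acf[2]))
}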

How does the number of simulated runs affect the Metropolis algorithm with a small variance?

for(i in 1:4)
{
  par(mfrow = c(1, 3))
  metro_cauchy_norm(.5,i*200,.01*i*200)
  
}

## [1] "Precentage rejected is  0.305"

## [1] "Precentage rejected is  0.4175"

## [1] "Precentage rejected is  0.24"

## [1] "Precentage rejected is  0.38875"

A small variance in the proposal function means that the density of the data converges slowly to the Cauchy distribution. The autocorrelation is also high.

How does the number of simulated runs affect the Metropolis algorithm with a larger variance?

for(i in 1:4)
{
  par(mfrow = c(1, 3))
  metro_cauchy_norm(10,i*200,.1*i*200)
  
}

## [1] "Precentage rejected is  0.705"

## [1] "Precentage rejected is  0.6575"

## [1] "Precentage rejected is  0.706666666666667"

## [1] "Precentage rejected is  0.72375"

With a large variance, the autocorrelation becomes less variable as the number of simulations increases.

Conclusions

The variance of the proposal is inversely related to the autocorrelation and directly related to the number of rejected points. With a small variance in the proposal function the series has high autocorrelation but a high acceptance rate; the chain takes small steps, so it needs many runs to visit the entire sample space. With a large variance in the proposal function the series has low autocorrelation but a low acceptance rate; each accepted move covers more of the sample space, but many candidate points are wasted.

Limitations and further work

This project only deals with the Cauchy distribution with a Normal (and near-Normal) proposal. It is not a rigorous treatment of the topic, so any conclusions are specific rather than general. The choice of variance in the proposal function is highly important. If the variance is quite large, then the acceptance rate is low, so the algorithm converges slowly. On the other hand, if it is too small, the algorithm moves slowly through the sample space and again converges slowly. One solution, outside the scope of this project, is to specify a loss function (perhaps deviance from the target distribution) and find the variance that minimizes it using optimization techniques such as stochastic gradient descent or adaptive learning rates. A rigorous treatment of the latter can be found in Haario, H., Saksman, E., and Tamminen, J. (2001), "An adaptive Metropolis algorithm," Bernoulli 7(2), 223-242.
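As a minimal illustration of that idea (a sketch only: this crude scale tuning is not the Haario et al. algorithm, which adapts the full proposal covariance from the chain's history), one can nudge the proposal scale toward a target acceptance rate with a diminishing step size:

s <- 1; target <- 0.44 # illustrative starting scale and target acceptance rate
xa <- numeric(5000); xa[1] <- 0
for (i in 2:5000) {
  y <- rnorm(1, xa[i-1], s)              # random-walk proposal centered at the current state
  a <- min(1, cauchy(y)/cauchy(xa[i-1])) # symmetric proposal, so the g terms cancel
  accepted <- runif(1) <= a
  xa[i] <- if (accepted) y else xa[i-1]
  s <- s*exp((accepted - target)/i)      # diminishing adaptation of the scale
}
print(s) # adapted proposal standard deviation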