Michael Lanier
8/4/2016
The Metropolis algorithm is a popular way of generating random samples from distributions that are hard to sample directly. This project uses the Metropolis algorithm to generate data from the Cauchy distribution. We then seek to find the relationship between the variance of the proposal distribution and the convergence properties of the Metropolis algorithm, notably the burn-in time and the autocorrelation of the generated series. Finally, further questions related to the size of the proposal variance are raised.
The purpose of this project is to use the Metropolis algorithm to sample from a Cauchy distribution using various proposal distributions. We also examine the relationship between the convergence of the Markov chain and the variance of the proposal distribution.
The Metropolis algorithm for a Cauchy target with a Normal proposal can be defined as follows.
Let f be the pdf of X ~ Cauchy(0, 1), and let g be the pdf of the proposal Y ~ N(0, sigma^2).
Since the proposal is an independence proposal (that is, g(Y | Xt) = g(Y)), the acceptance probability is alpha(Xt, Y) = min(1, f(Y) g(Xt) / (f(Xt) g(Y))). Note that the normalizing constant of f cancels in this ratio, so the unnormalized Cauchy density 1/(1 + x^2) can be used in place of f.
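A minimal sketch of this acceptance rule is given below; mh_alpha is a hypothetical helper (not part of the project code), with f_unnorm standing in for the unnormalized target density and g_dens for the proposal density.
# Hypothetical helper: acceptance probability for an independence proposal.
mh_alpha=function(x_t, y, f_unnorm, g_dens)
{
min(1, f_unnorm(y)*g_dens(x_t)/(f_unnorm(x_t)*g_dens(y)))
}
# Example with the unnormalized Cauchy target and a N(0, sd = 2) proposal:
mh_alpha(0.5, 1.2, function(x) 1/(1+x^2), function(x) dnorm(x,0,2))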
First we will define variables to be used in the simulation.
set.seed(1234)
nsim=10000
cauchy=function(x) # unnormalized Cauchy(0,1) density; the 1/pi constant cancels in the acceptance ratio
{
1/(1+x^2)
}
x=vector(mode="numeric",length=nsim)
x[1]=rnorm(1,0,2) # initialize the chain with a draw from the proposal
for (i in 2:nsim){
y=rnorm(1,0,2) # candidate normal
u=runif(1)
alpha=min(1,cauchy(y)*dnorm(x[i-1],0,2)/(cauchy(x[i-1])*dnorm(y,0,2)))
if(u<=alpha)
{
x[i]=y
}else x[i]=x[i-1]
}
library(ggplot2)
qplot(1:nsim,x,xlab="Index",ylab="Value") # trace plot of the generated chain
This data appears to be Cauchy distributed.
The data have heavy tails but appear symmetric, just like the Cauchy distribution. The autocorrelation plot shows that the draws are correlated. Autocorrelation is problematic because correlated samples produce local stretches of the chain that do not resemble the target distribution.
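As an illustration of these checks, the sketch below (assuming x still holds the chain generated above) compares the sample against Cauchy quantiles and plots the autocorrelation directly.
probs=ppoints(length(x))
qqplot(qcauchy(probs), x, xlab="Theoretical Cauchy quantiles", ylab="Sample quantiles") # Q-Q plot against Cauchy(0,1)
abline(0, 1)
acf(x, lag.max=50, main="Autocorrelation of the chain")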
This data does not appear to be Cauchy distributed; it is bimodal.
par(mfrow = c(2, 1))
x[1]=rt(1,20) # initialize the chain with a draw from the t proposal
for (i in 2:nsim){
y=rt(1,20) # candidate from a t distribution with 20 degrees of freedom
u=runif(1)
alpha=min(1,cauchy(y)*dt(x[i-1],20)/(cauchy(x[i-1])*dt(y,20)))
if(u<=alpha)
{
x[i]=y
}else x[i]=x[i-1]
}
The sampled data has two prominent tails, consistent with the bimodal shape noted above.
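To visualize this shape, a sketch (assuming x now holds the chain generated with the t proposal) is to plot a truncated histogram with the Cauchy density overlaid:
xs=x[abs(x)<10] # restrict to a central window so the shape is visible
hist(xs, breaks=60, freq=FALSE, main="Chain with t(20) proposal", xlab="Value")
curve(dcauchy(x), from=-10, to=10, add=TRUE) # Cauchy(0,1) density for comparison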
ks.test(x,"cauchy")
One-sample Kolmogorov-Smirnov test
data: x
D = 0.96125, p-value < 2.2e-16
alternative hypothesis: two-sided
The Kolmogorov-Smirnov test strongly rejects the hypothesis that these samples follow a Cauchy distribution (p-value < 2.2e-16), confirming the poor fit seen in the plots.
par(mfrow = c(2, 1))
TS=ts(data = x, start = 1, end = nsim, frequency = 1) # treat the chain as a time series
acf(TS, lag.max = NULL, type = "correlation", plot = TRUE) # autocorrelation plot of the chain
The issue appears to be that the draws are extremely correlated, so the burn-in time will be large.
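The rejection percentages shown below correspond to runs of the sampler with different proposal variances; the generating code is not shown, but a sketch of how such an experiment could be reproduced follows. The helper run_chain and the standard-deviation grid c(1, 2, 4, 8) are illustrative assumptions, not the values used in the project.
cauchy=function(x) 1/(1+x^2) # unnormalized Cauchy(0,1) density, as defined above
run_chain=function(prop_sd, nsim=10000)
{
x=numeric(nsim)
x[1]=rnorm(1,0,prop_sd)
rejected=0
for (i in 2:nsim){
y=rnorm(1,0,prop_sd)
alpha=min(1,cauchy(y)*dnorm(x[i-1],0,prop_sd)/(cauchy(x[i-1])*dnorm(y,0,prop_sd)))
if(runif(1)<=alpha)
{
x[i]=y
}else
{
x[i]=x[i-1]
rejected=rejected+1
}
}
list(chain=x, reject_rate=rejected/(nsim-1))
}
for (s in c(1,2,4,8)){ # illustrative grid of proposal standard deviations
res=run_chain(s)
print(paste("Percentage rejected is", round(res$reject_rate,4),
"with lag-1 autocorrelation", round(acf(res$chain,plot=FALSE)$acf[2],4)))
}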
[1] "Precentage rejected is 0.2604"
[1] "Precentage rejected is 0.4644"
[1] "Precentage rejected is 0.5818"
[1] "Precentage rejected is 0.6713"
The variance of the proposal is inversely related to the autocorrelation and directly related to the number of rejected points. With a small proposal variance the series has a high acceptance rate but high autocorrelation: the chain moves in small, highly dependent steps, so it needs many runs to visit the entire sample space. With a large proposal variance the accepted moves are less correlated, but the acceptance rate is low, so many runs are wasted on rejections and it again takes a long time to sample the whole space.
The project here deals only with the Cauchy distribution and a Normal proposal. It is not a rigorous treatment of the topic, so any conclusions are specific and not general. The choice of variance in the proposal distribution is highly important. If the variance is quite large, then the acceptance rate is low and the algorithm converges slowly. On the other hand, if it is too small, the algorithm moves slowly through the sample space and again converges slowly. One solution to this problem, which is outside the scope of this project, is to specify a loss function (perhaps deviance from the target distribution) and find the variance that minimizes it with optimization techniques such as stochastic gradient descent or adaptive learning rates. A rigorous treatment of the latter can be found in Haario, H., Saksman, E., and Tamminen, J., "An adaptive Metropolis algorithm," Bernoulli 7 (2001), no. 2, 223-242.
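As a toy illustration of such tuning, the sketch below adjusts the proposal standard deviation during a pilot run until the observed acceptance rate is close to a target rate; it uses the distance between the observed and target acceptance rates as a simple stand-in for the loss function mentioned above. The function tune_sd, the target rate 0.4, and the step size 0.5 are illustrative assumptions, and this is not the adaptive Metropolis algorithm of Haario, Saksman, and Tamminen (2001).
cauchy=function(x) 1/(1+x^2) # unnormalized Cauchy(0,1) density, as above
tune_sd=function(target_accept=0.4, n_batches=50, batch_size=200)
{
prop_sd=1
x=rnorm(1,0,prop_sd)
for (b in 1:n_batches){
accepted=0
for (i in 1:batch_size){
y=rnorm(1,0,prop_sd)
alpha=min(1,cauchy(y)*dnorm(x,0,prop_sd)/(cauchy(x)*dnorm(y,0,prop_sd)))
if(runif(1)<=alpha)
{
x=y
accepted=accepted+1
}
}
rate=accepted/batch_size
prop_sd=prop_sd*exp(0.5*(rate-target_accept)) # raise the scale when accepting too often, lower it when rejecting too often
}
prop_sd
}
tune_sd()
In practice one would replace the acceptance-rate criterion with whatever loss is of interest and check the resulting chain against the target distribution, as was done above.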