The Metropolis algorithm is a popular way of generating random samples of hard to sample distributions. This project uses the Metropolis algorithm to generate data from the Cauchy distribution. We then seek to find the relationship between the variance of the proposal function and the converge properties of the Metropolis algorithm; notably the burn-in time and the auto correlation of the generated series. Finally, further questions related to the variance size will be asked.
The purpose of this project is to use the Metropolis algorithm to sample from a Cauchy Distribution using various proposal distributions. We also seek to examine the relationship between the converge of the Markov chain and the variance of the proposal distribution.
The Metropolis Algorithm for a Cauchy- Normal can be defined as such:
Let f be the pdf for X~Cauchy(0,1). Let g be the pdf for Y~N(0,Var)
Note since g (ie (g(Y|Xt)=g(Y))) is independent: alpha(Xt, Y ) = min(1,(f(Y )*g(Xt))/(f(Xt)g(Y))
First we will define variables to be used in the simulation.
set.seed(1234)
nsim=10000
cauchy=function(x)
{
1/(1+x^2)
}
x=vector(mode="numeric",length=nsim)
x[1]=rnorm(1,0,2) # initialize the chain from the stationary
for (i in 2:nsim){
y=rnorm(1,0,2) # candidate normal
u=runif(1)
alpha=min(1,cauchy(y)*dnorm(x[i-1],0,2)/(cauchy(x[i-1])*dnorm(y,0,2)))
if(u<=alpha)
{
x[i]=y
}else x[i]=x[i-1]
}
library(ggplot2)
qplot(1:10000,x[1:10000],xlab="Index",ylab="Value")
hist(x, col="grey")
# draw a density line plot
plot(density(x), bty="l",main="95% density with Normal Proposal")
# add vertical lines for the median and 95th percentile
abline(v=quantile(x, c(0.5, 0.95)), lty=2:3)
#qqplot
par(mfrow = c(2, 2))
a <- ppoints(100)
QR <- qcauchy(a)
Qx <- quantile(x, a)
qqplot(QR, Qx, xlab ="Cauchy Quantiles",
ylab ="Sample Quantiles",
main = "Original accepted values")
abline(0,1, col=2)
b <- 100 #discard the burnin sample
y <- x[b: nsim]
Qy <- quantile(y, a)
qqplot(QR, Qy, xlab="Cauchy Quantiles",
ylab ="Sample Quantiles",
main = "Minus the burnin sample")
abline(0, 1, col = 2)
TS=ts(data = x, start = 1, end = nsim, frequency = 1)
acf(TS, lag.max = NULL,type = c("correlation"),plot = TRUE)
x[1]=rt(1,20)# initialize the chain from the stationary
for (i in 2:nsim){
y=rt(1,20) # candidate t
u=runif(1)
alpha=min(1,cauchy(y)*dt(x[i-1],20)/(cauchy(x[i-1])*dt(y,20)))
if(u<=alpha)
{
x[i]=y
}else x[i]=x[i-1]
}
qplot(9500:10000,x[9500:10000],xlab="Index",ylab="Value")
hist(x, col="grey")
# draw a density line plot
plot(density(x), bty="l",main="95% density with t(20) Proposal")
# add vertical lines for the median and 95th percentile
abline(v=quantile(x, c(0.5, 0.95)), lty=2:3)
par(mfrow = c(1, 2))
a <- ppoints(100)
QR <- qcauchy(a)
Qx <- quantile(x, a)
qqplot(QR, Qx, xlab ="Cauchy Quantiles",
ylab ="Sample Quantiles",
main = "Original accepted values")
abline(0,1, col=2)
b <- 100 #discard the burnin sample
y <- x[b: nsim]
Qy <- quantile(y, a)
qqplot(QR, Qy, xlab="Cauchy Quantiles",
ylab ="Sample Quantiles",
main = "Minus the burnin sample")
abline(0, 1, col = 2)
ks.test(x,"cauchy")
##
## One-sample Kolmogorov-Smirnov test
##
## data: x
## D = 0.96125, p-value < 2.2e-16
## alternative hypothesis: two-sided
The chi square goodness of fit test indicates that the data is a good fit.
TS=ts(data = x, start = 1, end = nsim, frequency = 1)
plot(TS)
acf(TS, lag.max = NULL,type = c("correlation"),plot = TRUE)
for(i in 1:4)
{
par(mfrow = c(1, 3))
metro_cauchy_norm(2*i,10000,1000)
}
## [1] "Precentage rejected is 0.2604"
## [1] "Precentage rejected is 0.4644"
## [1] "Precentage rejected is 0.5818"
## [1] "Precentage rejected is 0.6713"
for(i in 1:4)
{
par(mfrow = c(1, 3))
metro_cauchy_norm(.5,i*200,.01*i*200)
}
## [1] "Precentage rejected is 0.305"
## [1] "Precentage rejected is 0.4175"
## [1] "Precentage rejected is 0.24"
## [1] "Precentage rejected is 0.38875"
for(i in 1:4)
{
par(mfrow = c(1, 3))
metro_cauchy_norm(10,i*200,.1*i*200)
}
## [1] "Precentage rejected is 0.705"
## [1] "Precentage rejected is 0.6575"
## [1] "Precentage rejected is 0.706666666666667"
## [1] "Precentage rejected is 0.72375"
The variance of the sampler is inversely related to the auto correlation and directly related to the count of rejected points. With a small variance in the sampler function the series is has high auto correlation but large acceptance rates. This means it can visit the entire sample space in a short amount of runs. With a large variance in the sampler function the series is has low auto correlation but low acceptance rates. This means it takes many runs to sample from the whole sample space.
The project here only deals with the Cauchy distribution with a Normal (and near normal) sampler. It is not a rigorous treatment of the topic so any conclusions are specific and not general.The choice of variance in the sampler function is highly important. If the variance is quite large then the acceptance rate is low so the algorithm converges slowly. On the other hand if it is too small, the the algorithm moves slowly throughout the sample space and converges slowly. The solution to this problem, which is outside of the scope of this project, is to specify a loss function (perhaps deviance from target distribution), and find the optimal variance the minimizes this loss function with optimization techniques like stochastic gradient descent or adaptive learning rates. A rigorous treatment of the latter can be found by Haario, Heikki; Saksman, Eero; Tamminen, Johanna. An adaptive Metropolis algorithm. Bernoulli 7 (2001), no. 2, 223–242.