The Metropolis-Hastings algorithm is a Markov Chain Monte Carlo (MCMC) method used to generate samples from a target distribution when direct sampling is difficult. It is particularly useful for Bayesian inference and other probabilistic models.
Choose an initial state \(x^{(0)}\).
For each iteration \(t\):
Propose a new state \(x'\) from a proposal distribution \(q(x'|x^{(t)})\).
Calculate the acceptance probability \(\alpha(x^{(t)}, x') = \min \left(1, \frac{p(x') q(x^{(t)}|x')}{p(x^{(t)}) q(x'|x^{(t)})}\right)\), where \(p(x)\) is the target distribution.
Generate a uniform random number \(u \sim U(0, 1)\).
If \(u \leq \alpha(x^{(t)}, x')\), set \(x^{(t+1)} = x'\), otherwise set \(x^{(t+1)} = x^{(t)}\).
Repeat step 2 for a sufficient number of iterations to obtain the desired number of samples.
Properties
The Metropolis-Hastings algorithm generates a Markov chain whose stationary distribution is the target distribution \(p(x)\).
The algorithm requires a proposal distribution \(q(x'|x)\), which should satisfy some conditions such as being symmetric if possible.
The choice of proposal distribution can impact the efficiency of the algorithm, with a good choice leading to faster convergence.
p=function(x,sam=FALSE){if(sam) {return(rnorm(1,x,1)) }else {return(dnorm(x,mean = x,sd =1)) }}pi=function(x) {return(dbeta(x,2.5,5.9))}#--------need to draw a sample from p(theta|x)da=mcmc(0.5,p,10000,pi)x=seq(-20,20,length=1000)par(mfrow=c(2,1))hist((da[-(1:500)]),prob=TRUE,col ="cyan",main="Histogram of sample",xlab="value")lines(density(da[-(1:500)]))a=density(da)polygon(a$x,a$y,density =50,col="blue")curve(pi(x),xlim=c(0,1))polygon(x,pi(x),density=50,col="green")mtext("Beta example",outer =TRUE,line =-1)
Limitations
We first simulate two sets of observations \(X_1, . . . , X_n ∼ Normal(\mu_X,1) and Y1, . . . , Yn ∼ Normal(\mu_Y, 1).\) However, suppose these data are not observable and we only know $ Z_i = X_i + Y_i $
$ i = 1, . . . , n $ Thus, $ Z_i ∼ Normal(_X + _Y, 2).$We want to estimate \(\mu_X\) and \(\mu_X\). We choose priors \(\mu_X,\mu_Y \sim N(0,100^2)\)
Code
mu.X <-2mu.Y <-4n=100set.seed(100)X <-rnorm(n, mean = mu.X, sd =1)Y <-rnorm(n, mean = mu.Y, sd =1)Z <- X + Ydata <-list(Z = Z, n = n)
Let us consider that we have a dataset with covariates as say temperature(in Farenhite)and response variable is a factor variable denoting “success” by 1 and failure by 0.In classical approach we can fit a regression model.Let us start with \[
y_i\sim N(\beta_0+\beta_1 x_i,\sigma^2) ,\ i=1,2,...,n
\]
where \((\beta_0,\beta_1,\sigma)\) are the unknown parametres.
Estimation(Bayesian Analysis)
In classical approach we fit a Linear model and found the LS estimates to be \(\hat\beta_0,\hat\beta_1\)
Now we want to use prior information about \(\beta's\) for estimation.Here \(\beta's\) takes value in \(\mathbb R\) so we are taking Prior distribution as independent \(N(0,100^2)\) and \(\sigma^2\sim InverseGamma(0.001,0.001)\).Our posterior distribution is the following, \[
\pi(\alpha|x,y) \propto L(\alpha|x,y)\pi(\alpha|\hat\alpha,\hat\sigma_\alpha)
\]
Trace plot
Code
require(rjags)X <-runif(500)Y <-rnorm(500, mean =5+2* X, sd =1)data <-list(Y =Y, X = X, n =length(X))model_string <-textConnection("model{# Likelihood (dnorm uses a precision, not variance)for(i in 1:n){Y[i] ~ dnorm(beta0 + beta1 * X[i], tau) #tau = 1/sigma^2}# Priorstau ~ dgamma(0.01, 0.01)sigma <- 1/sqrt(tau)beta0 ~ dnorm(0, 0.0001)beta1 ~ dnorm(0, 0.0001)}")inits <-list(beta0 =mean(X), beta1 =0, tau =1/var(X))model <-jags.model(model_string, data = data, inits = inits, quiet =TRUE)update(model, 10000, progress.bar ="none")params <-c("beta0", "beta1", "sigma")samples <-coda.samples(model,variable.names = params,n.iter =20000, progress.bar ="none")plot(samples)
\[\sigma^{2} | \mu, Y \sim InvGamma ( a + \frac{n}{2}, \dfrac{1}{2} \sum_{i=1}^{n} (Y_{i} - \mu)^{2} + b )\] - we want to generate samples from posterior distribution
How Does it works?
step 1: take a initial value: \(\mu_o\) and \(\sigma_0 ^2\)
step 3: We repeat step 2 n times and get the posterior draws
R Code
R Code
# given data from N(5,1) of size 100mu <-5sigmaSq <-1set.seed(100)Y <-rnorm(100, mean = mu,sd =sqrt(sigmaSq))mu.update <-function(Y, sigmaSq, mu0, sigmaSq0){mu.posvar.inv <-length(Y) / sigmaSq +1/ sigmaSq0mu.posvar <-1/ mu.posvar.invmu.posmean <- (sum(Y) / sigmaSq + mu0 / sigmaSq0) / mu.posvar.invout <- mu.posmean +sqrt(mu.posvar) *rnorm(1)out}sigmaSq.update <-function(Y, mu, a, b){sigmaSq.inv <-rgamma(1, shape = a +length(Y) /2,rate = b +sum((Y - mu)^2) /2)sigmaSq <-1/ sigmaSq.invsigmaSq}MCMC <-function(Y, mu.init, sigmaSq.init, mu0, sigmaSq0, a, b, iters){# chain initiationmu <- mu.initsigmaSq <- sigmaSq.init# define chainsmu.chain <-rep(NA, iters)sigmaSq.chain <-rep(NA, iters)# start MCMCfor(i in1:iters){mu <-mu.update(Y, sigmaSq, mu0, sigmaSq0)sigmaSq <-sigmaSq.update(Y, mu, a, b)mu.chain[i] <- musigmaSq.chain[i] <- sigmaSq}# return chainsout <-list(mu.chain = mu.chain,sigmaSq.chain = sigmaSq.chain)return(out)}
Gibbs sampling for the gaussian model
initial choice of \(\mu\) = \(\bar X\) inital choice of \(\sigma^2\) = \(var(X)\)
Code
par(mfrow=c(1,2))m =mean(Y)v =var(Y)MCMC.out <-MCMC(Y = Y,mu.init = m,sigmaSq.init = v,mu0 =0,sigmaSq0 =100,a =0.1,b =0.1,iters =3000)plot(MCMC.out$mu.chain, xlab ="Iteration", ylab ="mu", type ="l", sub =paste('initial: ', m ))abline(h =5, col ='red')plot(MCMC.out$sigmaSq.chain, xlab ="Iteration", ylab ="sigmaSq", type ="l", sub =paste('initial: ', v ))abline(h =1, col ='red')
Gibbs sampling for the gaussian model
initial choice of \(mu\) = -1000 inital choice of \(\sigma^2\) = 500
Code
par(mfrow=c(1,2))m =-1000v =500MCMC.out <-MCMC(Y = Y,mu.init = m,sigmaSq.init = v,mu0 =0,sigmaSq0 =100,a =0.1,b =0.1,iters =3000)plot(MCMC.out$mu.chain, xlab ="Iteration", ylab ="mu", type ="l", sub =paste('initial: ', m ))abline(h =5, col ='red')plot(MCMC.out$sigmaSq.chain, xlab ="Iteration", ylab ="sigmaSq", type ="l", sub =paste('initial: ', v ))abline(h =1, col ='red')
Using rjags
Code
X<-rnorm(100,5,1)library(rjags)data<-list(X=X,n=length(X))#(1) Define the model as a stringmodel_string<-textConnection("model{ #Likelihood(dnorm in jags uses precision, not variance) for(i in 1:n){ X[i]~ dnorm(mu,tau)#tau=1/sigma^2 } #Priors tau~ dgamma(0.01,0.01) sigma<-1/sqrt(tau) mu~ dnorm(0,0.0001) }")#(2) Load the data and compile the MCMC codeinits<-list(mu=mean(X),tau=1/var(X))model<-rjags::jags.model(model_string, data=data, inits=inits, quiet=TRUE)#(3) Burn in for 10000 samplesupdate(model,10000,progress.bar="none")#(4) Generate 20000 post-burn-in samples and retain the parameters named in paramsparams<-c("mu","sigma")samples<-rjags::coda.samples(model,variable.names=params,n.iter=20000,progress.bar="none")#(6)Plot Samplesplot(samples)
Example
\[y_i\overset{iid}{\sim} poisson(\lambda)\hspace{0.8cm} i =1,2,3,\ldots,n \]
deriving these conditional distributions can be analytically intractable
Dependency on Initial Values
Convergence Rate
Softwer Options
R functions for Bayesian Linear Regression, BLR() in BGLR package
OpenBUGS, JAGS, Proc MCMC, STAN.
Acknowledgement
We want to say a big thank you to everyone who helped make this project possible. Special thanks to Professor Deepayan Sarkar and to Subhrangsu, Sourav, Subhendu for helping us.