A simple Metropolis sampler

Example 1: A Metropolis-Hastings independence sampler for the gamma distribution

Let’s look at simulating from a gamma distribution with arbitrary shape and rate parameters, using a Metropolis-Hastings independence sampler whose normal proposal distribution has the same mean and variance as the target gamma.

A function implementing the Metropolis-Hastings sampler for this problem is given below. The chain is initialised at three times the target mean (x = 3a/b in the code), and at each stage an independent candidate is drawn from

\[N\left(\frac{a}{b}, \frac{a}{b^2}\right)\]

Metropolis-Hastings independence sampler for a gamma based on normal candidates with the same mean and variance

  1. Start in some state \(x_t\) (x in the code).
  2. Propose a new state \(x^\prime\) (can in the code), drawn from the proposal density \(q\).
  3. Compute the “acceptance probability” \[\alpha = \min\left[1, \frac{f(x^\prime)/q(x^\prime)}{f(x)/q(x)}\right]\] For a symmetric random-walk proposal the \(q\) terms cancel, giving the familiar \(\min[1, f(x^\prime)/f(x)]\); for the independence sampler used here they do not, which is why the code divides by the ratio of the normal proposal densities.
  4. Draw some uniformly distributed random number \(u\) from \([0,1]\); if \(u < \alpha\) accept the point, setting \(x_{t+1} = x^\prime\). Otherwise reject it and set \(x_{t+1} = x_t\).
gamm <- function(n, a, b){
        mu <- a/b                    # mean of the Gamma(a, b) target
        sig <- sqrt(a/(b * b))       # sd of the target
        vec <- vector("numeric", n)
        x <- 3*a/b                   # initialise the chain away from the mean
        vec[1] <- x
        for (i in 2:n) {
                can <- rnorm(1, mu, sig)   # independent normal candidate
                # acceptance probability with the independence-sampler
                # correction: target ratio divided by proposal ratio
                aprob <- min(1, (dgamma(can, a, b)/dgamma(x, a, b))/
                                (dnorm(can, mu, sig)/dnorm(x, mu, sig)))
                u <- runif(1)
                if (u < aprob)
                        x <- can
                vec[i] <- x
        }
        return(vec)
}

Plots

Set parameters.

nrep <- 55000
burnin <- 5000
shape <- 2.3
rate <- 2.7

vec <- gamm(nrep, shape, rate)

Modify the plots below so that they apply only to the chain after the burn-in period:

vec <- vec[-(1:burnin)]       # discard the burn-in draws
par(mfrow=c(2,1))             # two plots stacked in one frame
plot(ts(vec), xlab="Iteration", ylab="Draws")
abline(h = mean(vec), lwd=2, col="red")
hist(vec, 30, prob=TRUE, xlab="Red line = mean", col="grey", main="Simulated Density")
abline(v = mean(vec), lwd=2, col="red")

par(mfrow=c(1,1)) # go back to default
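
A further diagnostic worth looking at (an extra step, not in the original code) is the autocorrelation of the post-burn-in draws; an independence sampler that mixes well should show the autocorrelation dying off quickly.

acf(vec, main = "ACF of post-burn-in draws")   # rapid decay indicates good mixing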

Summary

summary(vec)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.007863 0.442100 0.731100 0.845800 1.128000 3.186000
var(vec)
## [1] 0.2948457
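
For comparison (computed from the shape and rate values set above), the theoretical moments of the Gamma(2.3, 2.7) target can be checked directly:

shape/rate     # theoretical mean: 2.3/2.7 ≈ 0.852
shape/rate^2   # theoretical variance ≈ 0.316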

Example 2: Bayesian Estimation for Regression

MH visualization

Figure: A visualization of the Metropolis-Hastings MCMC, from Hartig et al., 2011. (copyright see publisher)

  1. Check your understanding of the Metropolis-Hastings sampler that is being used here for Bayesian estimation of a regression model, using artificial data.

This code is a modified version of that presented at https://theoreticalecology.wordpress.com/2010/09/17/metropolis-hastings-mcmc-in-r/ (Modified by David Giles dgiles@uvic.ca, 21 March, 2016) This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

\[y = Ax + B + u, \quad u \sim N(0, \mathrm{sd}^2)\]

Set parameters.

trueA <- 5
trueB <- 0
trueSd <- 10
sampleSize <- 131

DGP and Plot

# create independent x-values, around zero
x <- (-(sampleSize-1)/2):((sampleSize-1)/2)
# create dependent values according to ax + b + N(0,sd)
y <-  trueA * x + trueB + rnorm(n=sampleSize,mean=0,sd=trueSd)

par(mfrow = c(1,1))
plot(x,y, main="Test Data")

Likelihood from the normal distribution

likelihood <- function(param){
    a = param[1]
    b = param[2]
    sd = param[3]

    pred = a*x + b                                        # model prediction
    singlelikelihoods = dnorm(y, mean = pred, sd = sd, log = T)
    sumll = sum(singlelikelihoods)                        # log-likelihood of the whole sample
    return(sumll)
}
Why work with logarithms

The likelihood function returns the logarithm of the probabilities, which is also why we sum the log-densities of all our data points (the logarithm of a product equals the sum of the logarithms).

Why do we do this? It is strongly advisable because likelihoods, in which many small probabilities are multiplied together, get ridiculously small very quickly (something like 10^-34). At some point a computer program runs into numerical rounding or underflow problems.

So, bottom line: when you program something with likelihoods, always use logarithms!
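
A quick numerical demonstration of the point, using the x, y, and true parameter values defined above: the raw product of the 131 normal densities is already astronomically small, while the sum of log-densities is a perfectly manageable number.

prod(dnorm(y, mean = trueA*x + trueB, sd = trueSd))          # of order 1e-200; a few hundred more observations would underflow to 0
sum(dnorm(y, mean = trueA*x + trueB, sd = trueSd, log = T))  # a moderate negative number, no underflow risk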

Example: plot the likelihood profile of the slope a

# Example: plot the likelihood profile of the slope a
slopevalues <- function(a){return(likelihood(c(a, trueB, trueSd)))}
slopelikelihoods <- sapply(seq(3, 7, by=.05), slopevalues)
plot(seq(3, 7, by=.05), slopelikelihoods, type="l", xlab = "values of slope parameter a", ylab = "Log likelihood")

Prior distribution

A uniform prior for the slope a, a normal prior for the intercept b, and a uniform prior for the standard deviation sd.

# Prior distribution
prior <- function(param){
    a = param[1]
    b = param[2]
    sd = param[3]
# CHANGE THE NEXT 3 LINES TO CHANGE THE PRIOR
    aprior = dunif(a, min=0, max=10, log = T)
    bprior = dnorm(b, sd = 5, log = T)
    sdprior = dunif(sd, min=0, max=30, log = T)
    return(aprior+bprior+sdprior)
}

The posterior

The product of the prior and the likelihood is the actual quantity the MCMC works on. This function is called the posterior (or, to be exact, it is the posterior once it has been normalised, which the MCMC does for us, but let’s not be picky for the moment). Again, we take a sum here because we work with logarithms.

posterior <- function(param){
   return (likelihood(param) + prior(param))
}
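
As a quick usage check (added here, not part of the original code), the unnormalised log-posterior at the true parameter values is a finite number, since all three values lie inside the support of their priors:

posterior(c(trueA, trueB, trueSd))   # finite: the log-likelihood plus the log-priors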

Metropolis algorithm

One of the most frequent applications of this algorithm (as in this example) is sampling from the posterior density in Bayesian statistics.

In principle, however, the algorithm may be used to sample from any non-negative integrable function, i.e. anything proportional to a probability density. So the aim of the algorithm is to jump around in parameter space, but in such a way that the probability of being at a point is proportional to the function we sample from (usually called the target function). In our case this is the posterior defined above.

  1. Starting at a random parameter value
  2. Choosing a new parameter value close to the old value based on some probability density that is called the proposal function
  3. Jumping to this new point with probability min(1, p(new)/p(old)), where p is the target function; a ratio greater than 1 means the jump is always accepted.
######## Metropolis algorithm ################
 
proposalfunction <- function(param){
    return(rnorm(3, mean = param, sd = c(0.1,0.5,0.3)))
}
 
run_metropolis_MCMC <- function(startvalue, iterations){
    chain = array(dim = c(iterations+1,3))
    chain[1,] = startvalue
    for (i in 1:iterations){
        proposal = proposalfunction(chain[i,])
        # acceptance ratio on the original scale: exp(log p(new) - log p(old))
        probab = exp(posterior(proposal) - posterior(chain[i,]))
        if (runif(1) < probab){
            chain[i+1,] = proposal
        }else{
            chain[i+1,] = chain[i,]
        }
    }
    return(chain)
}

Again, working with the logarithms of the posterior might be a bit confusing at first, in particular when you look at the line where the acceptance probability is calculated (probab = exp(posterior(proposal) - posterior(chain[i,]))). To understand why we do this, note that p1/p2 = exp[log(p1)-log(p2)].
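
A two-line numerical check of that identity:

p1 <- 0.4; p2 <- 0.1
p1/p2                    # 4
exp(log(p1) - log(p2))   # also 4, up to floating-point error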

Implementation

startvalue = c(4,0,10)
chain = run_metropolis_MCMC(startvalue, 55000)
#str(chain)
burnIn = 5000
# acceptance rate: the proportion of iterations in which the proposal was accepted
acceptance = 1-mean(duplicated(chain[-(1:burnIn),]))
#?duplicated

The first steps of the algorithm may be biased by the initial value, and are therefore usually discarded for the further analysis (the burn-in period). An interesting output to look at is the acceptance rate: how often was a proposal accepted by the Metropolis-Hastings acceptance criterion? The acceptance rate is influenced by the proposal function: generally, the smaller the proposal steps, the higher the acceptance rate. Very high acceptance rates, however, are usually not beneficial: they mean the algorithm is moving only very slowly around the same region, which results in suboptimal probing of the parameter space (poor mixing).
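To see this trade-off in action, here is a small experiment (added here, not part of the original code; the scale factors and run lengths are arbitrary choices) that rescales the proposal standard deviations and reports the resulting acceptance rates:

# rescale the proposal sds: wider steps => more rejections => lower acceptance
for (s in c(0.2, 1, 5)) {
    proposalfunction <- function(param){
        rnorm(3, mean = param, sd = s*c(0.1, 0.5, 0.3))
    }
    ch <- run_metropolis_MCMC(startvalue, 10000)
    cat("scale =", s, " acceptance =",
        round(1 - mean(duplicated(ch[-(1:1000),])), 3), "\n")
}
# restore the original proposal before re-using the results below
proposalfunction <- function(param){
    return(rnorm(3, mean = param, sd = c(0.1,0.5,0.3)))
}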

(e) Print the value of the quantity called acceptance, and interpret what it is telling you.

Summary

summary(chain[-(1:burnIn),])
##        V1              V2                V3        
##  Min.   :4.884   Min.   :-4.3133   Min.   : 8.418  
##  1st Qu.:4.971   1st Qu.:-0.9681   1st Qu.:10.078  
##  Median :4.988   Median :-0.3291   Median :10.494  
##  Mean   :4.988   Mean   :-0.3356   Mean   :10.528  
##  3rd Qu.:5.004   3rd Qu.: 0.3079   3rd Qu.:10.938  
##  Max.   :5.080   Max.   : 2.8278   Max.   :13.391
# for comparison:
summary(lm(y~x))
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -31.970  -5.683   1.478   5.938  23.795 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.34093    0.91168  -0.374    0.709    
## x            4.98724    0.02411 206.863   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.43 on 129 degrees of freedom
## Multiple R-squared:  0.997,  Adjusted R-squared:  0.997 
## F-statistic: 4.279e+04 on 1 and 129 DF,  p-value: < 2.2e-16
summary(lm(y~x))$sigma
## [1] 10.43471
coefficients(lm(y~x))[1]
## (Intercept) 
##   -0.340933
coefficients(lm(y~x))[2]
##       x 
## 4.98724
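
A compact way to put the two sets of estimates side by side (a convenience line added here, not in the original code):

colMeans(chain[-(1:burnIn),])   # posterior means of (a, b, sd), comparable to the lm() fit above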

Posterior histograms and traces of the chains:

### Summary: #######################
 
par(mfrow = c(2,3))
hist(chain[-(1:burnIn),1], prob=TRUE, nclass=30, col="109", main="Posterior of a", xlab="Black=mean; Red=true; Magenta = MLE")
abline(v = mean(chain[-(1:burnIn),1]), lwd=2)
abline(v = trueA, col="red", lwd=2)
abline(v = coefficients(lm(y~x))[2], col="magenta", lwd=2)

hist(chain[-(1:burnIn),2], prob=TRUE, nclass=30, col="green", main="Posterior of b", xlab="Black=mean; Red=true; Magenta = MLE")
abline(v = mean(chain[-(1:burnIn),2]), lwd=2)
abline(v = trueB, col="red", lwd=2)
abline(v = coefficients(lm(y~x))[1], col="magenta", lwd=2)

hist(chain[-(1:burnIn),3], prob=TRUE, nclass=30, col="yellow", main="Posterior of sd", xlab="Black=mean; Red=true; Magenta = MLE")
abline(v = mean(chain[-(1:burnIn),3]), lwd=2)
abline(v = trueSd, col="red", lwd=2)
abline(v = summary(lm(y~x))$sigma, col="magenta", lwd=2)

plot(chain[-(1:burnIn),1], col="648", type="l", xlab="True value = red line", main="Chain values of a")
abline(h = trueA, col="red")
plot(chain[-(1:burnIn),2], col="648", type="l", xlab="True value = red line", main="Chain values of b")
abline(h = trueB, col="red")
plot(chain[-(1:burnIn),3], col="648", type="l", xlab="True value = red line", main="Chain values of sd")
abline(h = trueSd, col="red")
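
Finally, as an optional extension (assuming the coda package is installed; not part of the original code), the effective sample size of each parameter's chain gives a sense of how much autocorrelation reduces the information in the 50,000 retained draws:

library(coda)
effectiveSize(mcmc(chain[-(1:burnIn),]))   # effective number of independent draws per parameter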