Let’s look at simulating from a gamma distribution with arbitrary shape and rate parameters, using a Metropolis-Hastings independence sampler whose normal proposal distribution has the same mean and variance as the target gamma.
A function for the Metropolis-Hastings sampler for this problem is given below. The chain is initialised at 3a/b (three times the target mean), and at each stage an independent candidate is drawn from
\[N\!\left(\frac{a}{b}, \frac{a}{b^2}\right).\]
Metropolis-Hastings independence sampler for a gamma based on normal candidates with the same mean and variance
gamm <- function(n, a, b) {
  mu  <- a / b                 # mean of the target gamma
  sig <- sqrt(a / (b * b))     # standard deviation of the target gamma
  vec <- vector("numeric", n)
  x   <- 3 * a / b             # initial value of the chain
  vec[1] <- x
  for (i in 2:n) {
    can <- rnorm(1, mu, sig)   # independent normal candidate
    # Metropolis-Hastings acceptance probability for an independence sampler
    aprob <- min(1, (dgamma(can, a, b) / dgamma(x, a, b)) /
                    (dnorm(can, mu, sig) / dnorm(x, mu, sig)))
    u <- runif(1)
    if (u < aprob)
      x <- can
    vec[i] <- x
  }
  return(vec)
}
Set parameters.
nrep   <- 55000
burnin <- 5000
shape  <- 2.3
rate   <- 2.7
vec    <- gamm(nrep, shape, rate)
Modify the plots below so they apply only to the chain AFTER the burn-in period.
vec=vec[-(1:burnin)]
#vec=vec[burnin:length(vec)]
par(mfrow=c(2,1)) # change main frame, how many graphs in one frame
plot(ts(vec), xlab="Chain", ylab="Draws")
abline(h = mean(vec), lwd = 2, col = "red")
hist(vec,30, prob=TRUE, xlab="Red Line = mean", col="grey", main="Simulated Density")
abline(v = mean(vec), lwd = 2, col = "red")
par(mfrow=c(1,1)) # go back to default
summary(vec)   # the burn-in has already been removed from vec above
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.007863 0.442100 0.731100 0.845800 1.128000 3.186000
var(vec)
## [1] 0.2948457
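As a quick check (this comparison is not part of the original code), the post-burn-in sample moments can be set against the theoretical mean a/b and variance a/b^2 of the target gamma distribution.
# Added check: sample moments of the post-burn-in draws vs. theoretical gamma moments
c(sample_mean = mean(vec), true_mean = shape/rate)     # theoretical mean a/b
c(sample_var  = var(vec),  true_var  = shape/rate^2)   # theoretical variance a/b^2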
Figure: A visualization of Metropolis-Hastings MCMC, from Hartig et al. (2011); copyright: see publisher.
This code is a modified version of that presented at https://theoreticalecology.wordpress.com/2010/09/17/metropolis-hastings-mcmc-in-r/ (modified by David Giles, dgiles@uvic.ca, 21 March 2016). This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
We now turn to Hartig's example: estimating a Bayesian linear regression with the Metropolis algorithm. The test data are generated from the model
\[y = Ax + B + u, \qquad u \sim N(0, \mathrm{sd}^2),\]
with true slope A, intercept B, and error standard deviation sd.
trueA <- 5
trueB <- 0
trueSd <- 10
sampleSize <- 131
# create independent x-values, around zero
x <- (-(sampleSize-1)/2):((sampleSize-1)/2)
# create dependent values according to ax + b + N(0,sd)
y <- trueA * x + trueB + rnorm(n=sampleSize,mean=0,sd=trueSd)
par(mfrow = c(1,1))
plot(x,y, main="Test Data")
likelihood <- function(param){
  a  = param[1]
  b  = param[2]
  sd = param[3]
  pred = a*x + b
  # log-density of each observation given the proposed parameters
  singlelikelihoods = dnorm(y, mean = pred, sd = sd, log = T)
  sumll = sum(singlelikelihoods)
  return(sumll)
}
The likelihood function returns the logarithm of the probabilities, which is also why we sum the log-densities of all our data points (the logarithm of a product equals the sum of the logarithms).
Why do we do this? Because likelihoods, in which a lot of small probabilities are multiplied together, can get ridiculously small very quickly (something like 10^-34), and at some stage computer programs run into numerical rounding or underflow problems.
So, bottom line: when you program something with likelihoods, always work with logarithms!
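A tiny illustration of the problem (the numbers are made up purely for demonstration): the product of many small probabilities underflows to zero, while the sum of their logarithms is perfectly well behaved.
# Illustration (added): products of small probabilities underflow, sums of logs do not
p <- rep(1e-5, 100)
prod(p)       # 0 -- the true value, 1e-500, underflows in double precision
sum(log(p))   # about -1151.3 -- the log of 1e-500, no problem at all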
# Example: plot the likelihood profile of the slope a
slopevalues <- function(x){return(likelihood(c(x, trueB, trueSd)))}
slopelikelihoods <- lapply(seq(3, 7, by=.05), slopevalues )
plot (seq(3, 7, by=.05), slopelikelihoods , type="l", xlab = "values of slope parameter a", ylab = "Log likelihood")
For the prior we use uniform distributions for a and sd, and a normal distribution for b.
# Prior distribution
prior <- function(param){
  a  = param[1]
  b  = param[2]
  sd = param[3]
  # CHANGE THE NEXT 3 LINES TO CHANGE THE PRIOR
  aprior  = dunif(a, min=0, max=10, log = T)
  bprior  = dnorm(b, sd = 5, log = T)
  sdprior = dunif(sd, min=0, max=30, log = T)
  return(aprior + bprior + sdprior)
}
The product of prior and likelihood is the actual quantity the MCMC will be working on. This function is called the posterior (or to be exact, it’s called the posterior after it’s normalized, which the MCMC will do for us, but let’s not be picky for the moment). Again, here we work with the sum because we work with logarithms.
posterior <- function(param){
  return(likelihood(param) + prior(param))
}
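As a quick optional check (not in the original script), the log-posterior should be noticeably higher near the true parameter values than at a clearly poor guess; the exact numbers depend on the randomly generated y.
# Optional check (added): log-posterior near the truth vs. at a poor guess
posterior(c(trueA, trueB, trueSd))
posterior(c(1, 0, 20))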
One of the most frequent applications of this algorithm (as in this example) is sampling from the posterior density in Bayesian statistics.
In principle, however, the algorithm may be used to sample from any integrable function. The aim is to jump around in parameter space in such a way that the probability of being at a point is proportional to the function we sample from (usually called the target function). In our case, this is the posterior defined above.
######## Metropolis algorithm ################
proposalfunction <- function(param){
  return(rnorm(3, mean = param, sd = c(0.1, 0.5, 0.3)))
}
run_metropolis_MCMC <- function(startvalue, iterations){
  chain = array(dim = c(iterations + 1, 3))
  chain[1,] = startvalue
  for (i in 1:iterations){
    proposal = proposalfunction(chain[i,])
    # acceptance probability, computed on the log scale and then exponentiated
    probab = exp(posterior(proposal) - posterior(chain[i,]))
    if (runif(1) < probab){
      chain[i+1,] = proposal
    }else{
      chain[i+1,] = chain[i,]
    }
  }
  return(chain)
}
Again, working with the logarithms of the posterior may be a bit confusing at first, in particular in the line where the acceptance probability is calculated (probab = exp(posterior(proposal) - posterior(chain[i,]))). To see why this is correct, note that p1/p2 = exp[log(p1) - log(p2)].
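A quick numerical illustration (the log values below are made up): the ratio is well defined on the log scale even when the raw probabilities would underflow to zero.
# Illustration (added): a ratio of tiny probabilities, computed safely via logs
logp1 <- -5000; logp2 <- -5001   # made-up log posterior values
exp(logp1 - logp2)               # exp(1) = 2.718..., well defined
exp(logp1)/exp(logp2)            # NaN: both exp() terms underflow to 0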
startvalue = c(4,0,10)
chain = run_metropolis_MCMC(startvalue, 55000)
#str(chain)
burnIn = 5000
acceptance = 1-mean(duplicated(chain[-(1:burnIn),]))
#?duplicated
The first steps of the algorithm may be biased by the initial value, and are therefore usually discarded for the further analysis (the burn-in period). An interesting output to look at is the acceptance rate: how often was a proposal accepted by the Metropolis-Hastings acceptance criterion? The acceptance rate is influenced by the proposal function: generally, the closer together the proposals are, the higher the acceptance rate. Very high acceptance rates, however, are usually not beneficial: they mean the chain is taking only tiny steps and effectively "staying" in the same region, which results in suboptimal probing of the parameter space (poor mixing).
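To see this effect, one can re-run the sampler with narrower and wider proposals and compare the acceptance rates. The sketch below is an added illustration: the proposal widths, run length, and burn-in are arbitrary choices, and the helper acceptance_for is hypothetical; it simply repeats the sampler loop with a given proposal standard deviation.
# Sketch (added): acceptance rate as a function of the proposal scale.
# The widths, run length, and burn-in below are arbitrary illustrative choices.
acceptance_for <- function(widths, iterations = 10000, burn = 1000) {
  ch <- array(dim = c(iterations + 1, 3))
  ch[1, ] <- startvalue
  for (i in 1:iterations) {
    proposal <- rnorm(3, mean = ch[i, ], sd = widths)
    pp <- posterior(proposal)
    # treat -Inf/NaN log-posteriors (e.g. a proposed negative sd) as rejections
    if (is.finite(pp) && runif(1) < exp(pp - posterior(ch[i, ]))) {
      ch[i + 1, ] <- proposal
    } else {
      ch[i + 1, ] <- ch[i, ]
    }
  }
  1 - mean(duplicated(ch[-(1:burn), ]))
}
c(narrow = acceptance_for(c(0.01, 0.05, 0.03)),  # tiny steps: acceptance close to 1
  medium = acceptance_for(c(0.1, 0.5, 0.3)),     # the widths used in proposalfunction above
  wide   = acceptance_for(c(1, 5, 2)))           # big steps: far more rejections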
(e) Print the value of the quantity called acceptance, and interpret what it is telling you.
summary(cbind(chain[-(1:burnIn),1],chain[-(1:burnIn),2],chain[-(1:burnIn),3]))
## V1 V2 V3
## Min. :4.884 Min. :-4.3133 Min. : 8.418
## 1st Qu.:4.971 1st Qu.:-0.9681 1st Qu.:10.078
## Median :4.988 Median :-0.3291 Median :10.494
## Mean :4.988 Mean :-0.3356 Mean :10.528
## 3rd Qu.:5.004 3rd Qu.: 0.3079 3rd Qu.:10.938
## Max. :5.080 Max. : 2.8278 Max. :13.391
# for comparison:
summary(lm(y~x))
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -31.970 -5.683 1.478 5.938 23.795
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.34093 0.91168 -0.374 0.709
## x 4.98724 0.02411 206.863 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.43 on 129 degrees of freedom
## Multiple R-squared: 0.997, Adjusted R-squared: 0.997
## F-statistic: 4.279e+04 on 1 and 129 DF, p-value: < 2.2e-16
summary(lm(y~x))$sigma
## [1] 10.43471
coefficients(lm(y~x))[1]
## (Intercept)
## -0.340933
coefficients(lm(y~x))[2]
## x
## 4.98724
### Summary: #######################
par(mfrow = c(2,3))
hist(chain[-(1:burnIn),1],prob=TRUE,nclass=30,col="109" , main="Posterior of a", xlab="Black=mean; Red=true; Magenta = MLE" )
abline(v = mean(chain[-(1:burnIn),1]), lwd = 2)
abline(v = trueA, col = "red", lwd = 2)
abline(v = coefficients(lm(y~x))[2], col = "magenta", lwd = 2)
hist(chain[-(1:burnIn),2],prob=TRUE, nclass=30, col="green",main="Posterior of b", xlab="Black=mean; Red=true; Magenta = MLE")
abline(v = mean(chain[-(1:burnIn),2]), lwd = 2)
abline(v = trueB, col = "red", lwd = 2)
abline(v = coefficients(lm(y~x))[1], col = "magenta", lwd = 2)
hist(chain[-(1:burnIn),3],prob=TRUE, nclass=30, col="yellow",main="Posterior of sd", xlab="Black=mean; Red=true; Magenta = MLE")
abline(v = mean(chain[-(1:burnIn),3]), lwd = 2)
abline(v = trueSd, col = "red", lwd = 2)
abline(v = summary(lm(y~x))$sigma, col = "magenta", lwd = 2)
plot(chain[-(1:burnIn),1], col="648",type = "l", xlab="True value = red line" , main = "Chain values of a" )
abline(h = trueA, col="red" )
plot(chain[-(1:burnIn),2], col="648",type = "l", xlab="True value = red line" , main = "Chain values of b" )
abline(h = trueB, col="red" )
plot(chain[-(1:burnIn),3], col="648",type = "l", xlab="True value = red line" , main = "Chain values of sd" )
abline(h = trueSd, col="red" )
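As a final added summary (not part of the original script), posterior means and 95% credible intervals can be read off the post-burn-in chain and compared with the lm() results above.
# Added summary: posterior means and 95% credible intervals from the chain
post <- chain[-(1:burnIn), ]
colnames(post) <- c("a", "b", "sd")
round(rbind(mean = colMeans(post),
            apply(post, 2, quantile, probs = c(0.025, 0.975))), 3)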