Just Another Gibbs Sampler

JAGS is a program that builds and runs Markov chain Monte Carlo (MCMC) samplers for Bayesian models (Plummer, M. (2003). JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003), Vienna, Austria. ISSN 1609-395X; Plummer, M. (2012). JAGS version 3.3.0 user manual.).

JAGS is a successor of BUGS, or Bayesian inference Using Gibbs Sampling (Lunn, Jackson, Best, Thomas, & Spiegelhalter, 2013; Lunn, Thomas, Best, & Spiegelhalter, 2000). JAGS is very similar to BUGS but has a few extra functions and is sometimes more efficient. Moreover, JAGS runs on Windows, Mac, and Linux.

To use JAGS (or BUGS), we first have to write our model in its model definition language and save it as a text file. A convenient way to do this is the cat function in R, which allows us to write and save a text file from a script.

For example, to write a simple Normal regression such as \[ \begin{aligned} & y_i \sim \textrm{Normal}(\mu_i, \sigma^2) \\ & \mu_i = \beta_0 + \beta_1 \times x_i \\ & \beta_0 \sim \textrm{Normal}(0, 1000) \\ & \beta_1 \sim \textrm{Normal}(0, 1000) \\ & \sigma \sim \textrm{Uniform}(0, 100) \\ \end{aligned} \]

Note that the BUGS language parameterizes the Normal distribution with a precision (the inverse of the variance) rather than a variance, so the vague Normal(0, 1000) priors appear as dnorm(0, 0.001). In BUGS we have:

cat(file = "lm.bug", 
"
model{
    b0 ~ dnorm(0, 0.001)
    b1 ~ dnorm(0, 0.001)
    s ~ dunif(0, 100)
    tau <- pow(s, -2)
    for (i in 1:n) {    
      y[i] ~ dnorm(m[i], tau)
      m[i] <- b0 + b1 * x[i]
    }    
  }"
)

Let's try this model with simulated data:

set.seed(2345)
n <- 30
b0 <- 0.25
b1 <- 0.3
s <- 1

x <- runif(n, 0, 10)
y <- rnorm(n, mean = b0 + b1 * x, sd = s)

plot(x, y)

Now we have to make a list of the data that we need to pass to JAGS, a function to generate appropriate random starting values for the Markov chains, and a vector with the names of the parameters we want to monitor.

m.data <- list(n = n, y = y, x = x)  # named list of the data objects
inits <- function() list(b0 = rnorm(1, 0, 1), b1 = rnorm(1, 0, 1), s = runif(1, 0, 10))
params <- c("b0", "b1", "s")  # parameters to monitor

Finally, we define how many iterations to run for each chain, the thinning rate, the length of the "burn-in", and how many chains to run.

ni <- 1000  # number of iterations per chain
nt <- 1  # thinning rate (keep every nt-th sample)
nb <- 500  # how many iterations to discard as 'burn in'
nc <- 3  # how many chains to run
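
With these settings, each chain keeps (ni - nb) / nt samples after the burn-in, so we can check how many posterior draws to expect in total:

(ni - nb) / nt * nc  # expected total posterior samples across chains (here 1500)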

We can run JAGS from R using the jags function from the jagsUI package:

library(jagsUI)

m.sim <- jags(data = m.data, inits = inits, parameters.to.save = params, model.file = "lm.bug", 
    n.chains = nc, n.thin = nt, n.iter = ni, n.burnin = nb)
## 
## Processing function input....... 
## 
## Done. 
##  
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
##    Graph Size: 130
## 
## Initializing model
## 
## Adaptive phase, 100 iterations x 3 chains 
## If no progress bar appears JAGS has decided not to adapt 
##  
## 
##  Burn-in phase, 500 iterations x 3 chains 
##  
## 
## Sampling from joint posterior, 500 iterations x 3 chains 
##  
## 
## Calculating statistics....... 
## 
## Done.

We have to check that all the chains have converged and that we have reasonable effective sample sizes for all parameters. To see a summary of the results we write:

print(m.sim)
## JAGS output for model 'lm.bug', generated by jagsUI.
## Estimates based on 3 chains of 1000 iterations,
## burn-in = 500 iterations and thin rate = 1,
## yielding 1500 total samples from the joint posterior. 
## MCMC ran for 0.003 minutes at time 2016-02-19 06:51:28.
## 
##            mean    sd   2.5%    50%  97.5% overlap0     f  Rhat n.eff
## b0        0.444 0.330 -0.211  0.446  1.100     TRUE 0.909 1.002  1322
## b1        0.258 0.062  0.135  0.258  0.383    FALSE 1.000 1.001  1193
## s         0.867 0.126  0.671  0.851  1.152    FALSE 1.000 1.022   179
## deviance 74.955 2.732 71.888 74.258 82.291    FALSE 1.000 1.015   349
## 
## Successful convergence based on Rhat values (all < 1.1). 
## Rhat is the potential scale reduction factor (at convergence, Rhat=1). 
## For each parameter, n.eff is a crude measure of effective sample size. 
## 
## overlap0 checks if 0 falls in the parameter's 95% credible interval.
## f is the proportion of the posterior with the same sign as the mean;
## i.e., our confidence that the parameter is positive or negative.
## 
## DIC info: (pD = var(deviance)/2) 
## pD = 3.7 and DIC = 78.671 
## DIC is an estimate of expected predictive error (lower is better).
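
Besides reading the summary table, we can check these diagnostics programmatically. A minimal sketch, assuming (as in this version of jagsUI) that the output list stores the per-parameter Rhat values in m.sim$Rhat and the effective sample sizes in m.sim$n.eff:

# assumes jagsUI stores these diagnostics as named lists in the output
all(unlist(m.sim$Rhat) < 1.1, na.rm = TRUE)  # TRUE if every Rhat is below 1.1
unlist(m.sim$n.eff)  # effective sample size for each monitored quantity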

We can also plot the chains to inspect visually whether they are stationary and well mixed:

op <- par(mfrow = c(3, 1))
jagsUI::traceplot(m.sim, parameters = c("b0"))
jagsUI::traceplot(m.sim, parameters = c("b1"))
jagsUI::traceplot(m.sim, parameters = c("s"))

par(op)

Let's plot the posterior densities together with the true values:

op <- par(mfrow = c(3, 1))
plot(density(m.sim$sims.list$b0), lwd = 2, main = "")
abline(v = b0, lty = 2, lwd = 2)
plot(density(m.sim$sims.list$b1), lwd = 2, main = "")
abline(v = b1, lty = 2, lwd = 2)
plot(density(m.sim$sims.list$s), lwd = 2, main = "")
abline(v = s, lty = 2, lwd = 2)

par(op)

How do these results compare to a classical analysis?

summary(lm(y ~ x))
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.41699 -0.46592  0.03046  0.39072  1.70201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.45508    0.30737   1.481 0.149885    
## x            0.25616    0.05758   4.449 0.000125 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8266 on 28 degrees of freedom
## Multiple R-squared:  0.4141, Adjusted R-squared:  0.3932 
## F-statistic: 19.79 on 1 and 28 DF,  p-value: 0.000125
m.sim$summary
##                mean         sd       2.5%        25%        50%        75%
## b0        0.4436719 0.32992889 -0.2108262  0.2289355  0.4455410  0.6519370
## b1        0.2581780 0.06222401  0.1354727  0.2182500  0.2584740  0.2961984
## s         0.8667881 0.12640564  0.6712617  0.7746096  0.8511378  0.9405757
## deviance 74.9553903 2.73204788 71.8883457 72.9221041 74.2581872 76.2692913
##              97.5%     Rhat n.eff overlap0         f
## b0        1.099999 1.002118  1322        1 0.9086667
## b1        0.382616 1.001374  1193        0 1.0000000
## s         1.151686 1.022412   179        0 1.0000000
## deviance 82.291314 1.015185   349        0 1.0000000
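
To line up the point estimates directly, we can bind the lm coefficients to the posterior means (a quick base-R sketch using objects already defined above):

fit <- lm(y ~ x)
cbind(lm = coef(fit), bayes = c(m.sim$mean$b0, m.sim$mean$b1))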

Graphically:

plot(x, y)
abline(lm(y ~ x), lwd = 3, lty = 2)
xv <- seq(min(x), max(x), length.out = 100)
lines(xv, m.sim$mean$b0 + m.sim$mean$b1 * xv, col = 2, lwd = 2)
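
Beyond the line at the posterior means, we can visualize the uncertainty in the fit by overlaying regression lines drawn from a random subset of the posterior samples (a minimal sketch reusing xv and the sims.list draws):

idx <- sample(seq_along(m.sim$sims.list$b0), 100)  # 100 random posterior draws
for (i in idx) {
    lines(xv, m.sim$sims.list$b0[i] + m.sim$sims.list$b1[i] * xv,
        col = adjustcolor("red", alpha.f = 0.1))  # faint line per draw
}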

So why bother? As we will see in other practicals, it is easy to expand BUGS models to build hierarchical models, include hidden variables, assimilate different sources of information, use informative priors, and so on.

Also, since this is a Bayesian analysis, we can ask questions about the posterior, such as: what is the probability that the slope is larger than 0.4?

length(which(m.sim$sims.list$b1 > 0.4))/length(m.sim$sims.list$b1)
## [1] 0.014
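
Equivalently, since the comparison returns a logical vector, we can simply take its mean:

mean(m.sim$sims.list$b1 > 0.4)  # same posterior probability as above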

Juan Manuel Morales. 2016-02-19