Suppose we study \(10\) sub-poblations in a region and that we follow the survival from one year to the next of a variable number of individuals in each sub-population. Lets simulate data with this hiearchical structure. The idea is that all sub-populations are part of the same meta-population, belong to the same species, are subject to the same general climate and so on. So, we expect that survival probability will be similar among populations, but at the same time they will not be identical. To model variability between sub-populations we could use a Beta distribution, for example with parameters \(a_s = 2\) y \(b_s = 10\). According to this distribution, the expected mortality is \(\frac{a_s}{(a_s+b_s)}\) = 0.1666667 with variance \(\frac{(a_s \times b_s)}{(a_s + b_s)^2 \times (a_s + b_s + 1)}\) = 0.0106838. Assuming that we fix the number of sampled individuals from each sub-population, we can simulate mortality rates by:
set.seed(1234)
n <- 10 # subgrups
m <- c(30, 28, 20, 30, 30, 26, 29, 5, 3, 27) # sampled individuals per group
a_s <- 2
b_s <- 10
theta <- rbeta(n, a_s, b_s) # draw n mortality rates
y <- rbinom(n, size = m, prob = theta) # simulate deaths per group
# now we make a graph
op <- par(mar = c(5, 6, 4, 5) + 0.1, mgp = c(3.5, 1, 0), cex.lab = 1.5, font.lab = 2,
cex.axis = 1.3, las = 1, bty = "n")
plot(table(y), xlab = "# deaths", ylab = "frequency")
par(op)
We can model these data assuming: 1. all sub-populations have the same mortality rate (complete pooling) 2. each sub-population has its own mortality rate, independent from other sites (no pooling) 3. each sub-poblation is different but similar to the others (partial pooling)
Lets try each option
Here the data model is a Binomial distribution and we can use its conjugate prior, the Beta. If we want a non-informative prior we do:
alpha <- 1
beta <- 1
x <- seq(from = 0, to = 1, by = 0.001)
pos_theta1 <- dbeta(x, alpha + sum(y), beta + sum((m - y)))
op <- par(mar = c(5, 6, 4, 5) + 0.1, mgp = c(3.5, 1, 0), cex.lab = 1.5, font.lab = 2,
cex.axis = 1.3, las = 1, bty = "n")
plot(x, pos_theta1, type = "l", lty = 2, lwd = 3, ylab = "densiy", xlab = "mortality rate")
par(op)
Questions:
1- What’s the expected value for the yearly mortality rate assuming “complete pooling”? 2- How different is this exected value to the observed mean mortality
Again, using non-informative priors:
pos_theta2 <- matrix(0, n, length(x))
op <- par(mar = c(5, 6, 4, 5) + 0.1, mgp = c(3.5, 1, 0), cex.lab = 1.5, font.lab = 2,
cex.axis = 1.3, las = 1, bty = "n")
plot(x, pos_theta1, type = "l", lty = 2, lwd = 3, ylab = "density", xlab = "mortality rate")
for (i in 1:10) {
pos_theta2[i, ] <- dbeta(x, alpha + y[i], beta + m[i] - y[i])
lines(x, pos_theta2[i, ], lwd = 3, col = "darkgrey")
}
par(op)
Questions:
3- What can you say about the different posteriors?
4- How would you estimate an average mortality rate for the whole meta-population?
Finally, we can fit a hierarchical model to the data. If we agree that each sub-population is potentiall different from the other, but at the same time, they are all part of the same meta-population, we cannot say that they are fully independent.
We will use JAGS for model fitting. JAGS is a sofware that programs for us Markov chain Monte Carto (MCMC) algorithms given user-defined models, data and priors.
JAGS is a successor to BUGS, “Bayesian inference Using Gibbs Sampling”" (Lunn, Jackson, Best, Thomas, & Spiegelhalter, 2013; Lunn, Thomas, Best, & Spiegelhalter, 2000). JAGS is very similar to BUGS but it has some extra functions and some times is faster. Also, JAGS can be used in Windows, Mac y Linux.
To use JAGS (or BUGS) first we have to define the model using a sinthetic language and save it as a text file. We could write the model on any word processor or we could write it in R using the functioncat. Lets call the model fine “hier.bug”:
cat(file="hier.bug",
"
model
{
for( i in 1 : n ) {
y[i] ~ dbin(theta[i], m[i])
theta[i] ~ dbeta(a,b)
}
a ~ dnorm(0,0.001)T(0,)
b ~ dnorm(0,0.001)T(0,)
mean_pobl <- a/(a+b)
}
")
Now we have to define the list of data that we want to pass to JAGS, a function to generate the initial values of the Markov Chains and the list of parameters that we want to save.
data <- list("y", "m", "n")
inits <- function() list(a = runif(1, 1, 5), b = runif(1, 5, 20))
params <- c("a", "b", "theta", "mean_pobl")
Now, we have to load the R packages needed so that R can talk to JAGS, set the number of iterations, number of chains to run and how many values will be discarded as burn-in.
library(jagsUI)
ni <- 5000
nc <- 3
nt <- 4
nb <- 2500
hier.sim <- jags(data, inits, params, model.file = "hier.bug", n.chains = nc,
n.iter = ni, n.burnin = nb, n.thin = nt)
##
## Processing function input.......
##
## Done.
##
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph Size: 37
##
## Initializing model
##
## Adaptive phase, 100 iterations x 3 chains
## If no progress bar appears JAGS has decided not to adapt
##
##
## Burn-in phase, 2500 iterations x 3 chains
##
##
## Sampling from joint posterior, 2500 iterations x 3 chains
##
##
## Calculating statistics.......
##
## Done.
To see details on the model fit we write:
print(hier.sim)
## JAGS output for model 'hier.bug', generated by jagsUI.
## Estimates based on 3 chains of 5000 iterations,
## burn-in = 2500 iterations and thin rate = 4,
## yielding 1875 total samples from the joint posterior.
## MCMC ran for 0.022 minutes at time 2016-02-17 16:23:25.
##
## mean sd 2.5% 50% 97.5% overlap0 f Rhat n.eff
## a 3.962 2.831 0.718 3.367 11.285 FALSE 1 1.076 49
## b 27.184 17.012 5.712 24.015 69.386 FALSE 1 1.072 45
## theta[1] 0.059 0.038 0.003 0.054 0.148 FALSE 1 1.017 120
## theta[2] 0.097 0.043 0.030 0.093 0.196 FALSE 1 1.003 701
## theta[3] 0.070 0.044 0.005 0.066 0.173 FALSE 1 1.007 308
## theta[4] 0.221 0.063 0.117 0.215 0.363 FALSE 1 1.006 352
## theta[5] 0.185 0.057 0.090 0.179 0.305 FALSE 1 1.006 338
## theta[6] 0.162 0.053 0.074 0.157 0.278 FALSE 1 1.000 1875
## theta[7] 0.152 0.051 0.071 0.147 0.268 FALSE 1 1.000 1875
## theta[8] 0.105 0.063 0.010 0.097 0.252 FALSE 1 1.005 365
## theta[9] 0.112 0.066 0.010 0.106 0.268 FALSE 1 1.001 1585
## theta[10] 0.100 0.044 0.030 0.094 0.198 FALSE 1 1.004 496
## mean_pobl 0.127 0.033 0.069 0.125 0.199 FALSE 1 1.001 1387
## deviance 33.528 5.169 24.354 33.271 44.425 FALSE 1 1.018 115
##
## Successful convergence based on Rhat values (all < 1.1).
## Rhat is the potential scale reduction factor (at convergence, Rhat=1).
## For each parameter, n.eff is a crude measure of effective sample size.
##
## overlap0 checks if 0 falls in the parameter's 95% credible interval.
## f is the proportion of the posterior with the same sign as the mean;
## i.e., our confidence that the parameter is positive or negative.
##
## DIC info: (pD = var(deviance)/2)
## pD = 13.1 and DIC = 46.668
## DIC is an estimate of expected predictive error (lower is better).
Make sure all the chains have converged and that you got a decent n.eff. Now we plot the posteriors comparing all models
a.sim <- hier.sim$sims.list$a
b.sim <- hier.sim$sims.list$b
mean(hier.sim$sims.list$mean_pobl) # compare this with the 'true value' used in the simulation
## [1] 0.1272574
alpha = 1
beta = 1
x = seq(from = 0, to = 1, by = 0.001)
op = par(mfrow = c(1, 2), mar = c(4, 4, 1, 1) + 0.1, las = 1, cex.lab = 1.5,
cex.axis = 1.3)
pos_theta1 = dbeta(x, alpha + sum(y), beta + sum((m - y)))
plot(x, pos_theta1, type = "l", lty = 2, lwd = 3, ylab = "density", xlab = "mortality rate",
ylim = c(0, 20))
pos_theta2 = matrix(0, 10, length(x))
for (i in 1:10) {
pos_theta2[i, ] = dbeta(x, alpha + y[i], beta + m[i] - y[i])
lines(x, pos_theta2[i, ], lwd = 3, col = "darkgrey")
}
plot(density(hier.sim$sims.list$mean_pobl), lwd = 3, col = 2, xlab = "mortality rate",
ylab = "", type = "l", ylim = c(0, 20), xlim = c(0, 1), main = "")
lines(x, pos_theta1, type = "l", lty = 2, lwd = 3)
for (i in 1:10) {
lines(density(hier.sim$sims.list$theta[, i]), col = "blue", lwd = 2)
}
par(op)
Questions:
5- What differences you can find between the no-pooling posteriors and the partial-pooling ones?
6- What is the mean posterior mortality rate for population number 8 according to each of the modelling options? Do they make sense given the data?
Juan Manuel Morales 2016-02-17