Hierarchica Analysis

Example with simulated data

goals:

simulate data with hierarchical structure
perform complete pooling and no-pooling data analysis
compare the above analysis with a hierarchical model fitted with JAGS

Suppose we study \(10\) sub-poblations in a region and that we follow the survival from one year to the next of a variable number of individuals in each sub-population. Lets simulate data with this hiearchical structure. The idea is that all sub-populations are part of the same meta-population, belong to the same species, are subject to the same general climate and so on. So, we expect that survival probability will be similar among populations, but at the same time they will not be identical. To model variability between sub-populations we could use a Beta distribution, for example with parameters \(a_s = 2\) y \(b_s = 10\). According to this distribution, the expected mortality is \(\frac{a_s}{(a_s+b_s)}\) = 0.1666667 with variance \(\frac{(a_s \times b_s)}{(a_s + b_s)^2 \times (a_s + b_s + 1)}\) = 0.0106838. Assuming that we fix the number of sampled individuals from each sub-population, we can simulate mortality rates by:

set.seed(1234)
n <- 10  # subgrups
m <- c(30, 28, 20, 30, 30, 26, 29, 5, 3, 27)  # sampled individuals per group

a_s <- 2
b_s <- 10

theta <- rbeta(n, a_s, b_s)  # draw n mortality rates    
y <- rbinom(n, size = m, prob = theta)  # simulate deaths per group

# now we make a graph
op <- par(mar = c(5, 6, 4, 5) + 0.1, mgp = c(3.5, 1, 0), cex.lab = 1.5, font.lab = 2, 
    cex.axis = 1.3, las = 1, bty = "n")

plot(table(y), xlab = "# deaths", ylab = "frequency")

par(op)

We can model these data assuming: 1. all sub-populations have the same mortality rate (complete pooling) 2. each sub-population has its own mortality rate, independent from other sites (no pooling) 3. each sub-poblation is different but similar to the others (partial pooling)

Lets try each option

(1) complete pooling

Here the data model is a Binomial distribution and we can use its conjugate prior, the Beta. If we want a non-informative prior we do:

alpha <- 1
beta <- 1

x <- seq(from = 0, to = 1, by = 0.001)
pos_theta1 <- dbeta(x, alpha + sum(y), beta + sum((m - y)))

op <- par(mar = c(5, 6, 4, 5) + 0.1, mgp = c(3.5, 1, 0), cex.lab = 1.5, font.lab = 2, 
    cex.axis = 1.3, las = 1, bty = "n")

plot(x, pos_theta1, type = "l", lty = 2, lwd = 3, ylab = "densiy", xlab = "mortality rate")

par(op)

Questions:

1- What’s the expected value for the yearly mortality rate assuming “complete pooling”? 2- How different is this exected value to the observed mean mortality

(2) no-pooling

Again, using non-informative priors:

pos_theta2 <- matrix(0, n, length(x))

op <- par(mar = c(5, 6, 4, 5) + 0.1, mgp = c(3.5, 1, 0), cex.lab = 1.5, font.lab = 2, 
    cex.axis = 1.3, las = 1, bty = "n")

plot(x, pos_theta1, type = "l", lty = 2, lwd = 3, ylab = "density", xlab = "mortality rate")
for (i in 1:10) {
    pos_theta2[i, ] <- dbeta(x, alpha + y[i], beta + m[i] - y[i])
    lines(x, pos_theta2[i, ], lwd = 3, col = "darkgrey")
}

par(op)

Questions:

3- What can you say about the different posteriors?

4- How would you estimate an average mortality rate for the whole meta-population?

(3) partial pooling

Finally, we can fit a hierarchical model to the data. If we agree that each sub-population is potentiall different from the other, but at the same time, they are all part of the same meta-population, we cannot say that they are fully independent.

Just Another Gibs Sampler

We will use JAGS for model fitting. JAGS is a sofware that programs for us Markov chain Monte Carto (MCMC) algorithms given user-defined models, data and priors.

JAGS is a successor to BUGS, “Bayesian inference Using Gibbs Sampling”" (Lunn, Jackson, Best, Thomas, & Spiegelhalter, 2013; Lunn, Thomas, Best, & Spiegelhalter, 2000). JAGS is very similar to BUGS but it has some extra functions and some times is faster. Also, JAGS can be used in Windows, Mac y Linux.

To use JAGS (or BUGS) first we have to define the model using a sinthetic language and save it as a text file. We could write the model on any word processor or we could write it in R using the functioncat. Lets call the model fine “hier.bug”:

cat(file="hier.bug",
    "
    model
{
  for( i in 1 : n ) {
        y[i] ~ dbin(theta[i], m[i]) 
        theta[i] ~ dbeta(a,b)
        }
        a ~ dnorm(0,0.001)T(0,)
        b ~ dnorm(0,0.001)T(0,)
    mean_pobl <- a/(a+b)
}
    ")

Now we have to define the list of data that we want to pass to JAGS, a function to generate the initial values of the Markov Chains and the list of parameters that we want to save.

data <- list("y", "m", "n")
inits <- function() list(a = runif(1, 1, 5), b = runif(1, 5, 20))
params <- c("a", "b", "theta", "mean_pobl")

Now, we have to load the R packages needed so that R can talk to JAGS, set the number of iterations, number of chains to run and how many values will be discarded as burn-in.

library(jagsUI)
ni <- 5000
nc <- 3
nt <- 4
nb <- 2500

hier.sim <- jags(data, inits, params, model.file = "hier.bug", n.chains = nc, 
    n.iter = ni, n.burnin = nb, n.thin = nt)

## 
## Processing function input....... 
## 
## Done. 
##  
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
##    Graph Size: 37
## 
## Initializing model
## 
## Adaptive phase, 100 iterations x 3 chains 
## If no progress bar appears JAGS has decided not to adapt 
##  
## 
##  Burn-in phase, 2500 iterations x 3 chains 
##  
## 
## Sampling from joint posterior, 2500 iterations x 3 chains 
##  
## 
## Calculating statistics....... 
## 
## Done.

To see details on the model fit we write:

print(hier.sim)

## JAGS output for model 'hier.bug', generated by jagsUI.
## Estimates based on 3 chains of 5000 iterations,
## burn-in = 2500 iterations and thin rate = 4,
## yielding 1875 total samples from the joint posterior. 
## MCMC ran for 0.022 minutes at time 2016-02-17 16:23:25.
## 
##             mean     sd   2.5%    50%  97.5% overlap0 f  Rhat n.eff
## a          3.962  2.831  0.718  3.367 11.285    FALSE 1 1.076    49
## b         27.184 17.012  5.712 24.015 69.386    FALSE 1 1.072    45
## theta[1]   0.059  0.038  0.003  0.054  0.148    FALSE 1 1.017   120
## theta[2]   0.097  0.043  0.030  0.093  0.196    FALSE 1 1.003   701
## theta[3]   0.070  0.044  0.005  0.066  0.173    FALSE 1 1.007   308
## theta[4]   0.221  0.063  0.117  0.215  0.363    FALSE 1 1.006   352
## theta[5]   0.185  0.057  0.090  0.179  0.305    FALSE 1 1.006   338
## theta[6]   0.162  0.053  0.074  0.157  0.278    FALSE 1 1.000  1875
## theta[7]   0.152  0.051  0.071  0.147  0.268    FALSE 1 1.000  1875
## theta[8]   0.105  0.063  0.010  0.097  0.252    FALSE 1 1.005   365
## theta[9]   0.112  0.066  0.010  0.106  0.268    FALSE 1 1.001  1585
## theta[10]  0.100  0.044  0.030  0.094  0.198    FALSE 1 1.004   496
## mean_pobl  0.127  0.033  0.069  0.125  0.199    FALSE 1 1.001  1387
## deviance  33.528  5.169 24.354 33.271 44.425    FALSE 1 1.018   115
## 
## Successful convergence based on Rhat values (all < 1.1). 
## Rhat is the potential scale reduction factor (at convergence, Rhat=1). 
## For each parameter, n.eff is a crude measure of effective sample size. 
## 
## overlap0 checks if 0 falls in the parameter's 95% credible interval.
## f is the proportion of the posterior with the same sign as the mean;
## i.e., our confidence that the parameter is positive or negative.
## 
## DIC info: (pD = var(deviance)/2) 
## pD = 13.1 and DIC = 46.668 
## DIC is an estimate of expected predictive error (lower is better).

Make sure all the chains have converged and that you got a decent n.eff. Now we plot the posteriors comparing all models

a.sim <- hier.sim$sims.list$a
b.sim <- hier.sim$sims.list$b
mean(hier.sim$sims.list$mean_pobl)  # compare this with the 'true value' used in the simulation

## [1] 0.1272574

alpha = 1
beta = 1
x = seq(from = 0, to = 1, by = 0.001)

op = par(mfrow = c(1, 2), mar = c(4, 4, 1, 1) + 0.1, las = 1, cex.lab = 1.5, 
    cex.axis = 1.3)
pos_theta1 = dbeta(x, alpha + sum(y), beta + sum((m - y)))
plot(x, pos_theta1, type = "l", lty = 2, lwd = 3, ylab = "density", xlab = "mortality rate", 
    ylim = c(0, 20))
pos_theta2 = matrix(0, 10, length(x))
for (i in 1:10) {
    pos_theta2[i, ] = dbeta(x, alpha + y[i], beta + m[i] - y[i])
    lines(x, pos_theta2[i, ], lwd = 3, col = "darkgrey")
}
plot(density(hier.sim$sims.list$mean_pobl), lwd = 3, col = 2, xlab = "mortality rate", 
    ylab = "", type = "l", ylim = c(0, 20), xlim = c(0, 1), main = "")
lines(x, pos_theta1, type = "l", lty = 2, lwd = 3)
for (i in 1:10) {
    lines(density(hier.sim$sims.list$theta[, i]), col = "blue", lwd = 2)
}

par(op)

Questions:

5- What differences you can find between the no-pooling posteriors and the partial-pooling ones?

6- What is the mean posterior mortality rate for population number 8 according to each of the modelling options? Do they make sense given the data?

Juan Manuel Morales 2016-02-17