Análisis Jerárquico

En este trabajo práctico vamos a aprender a escribir y ajustar modelos jerárquicos en JAGS. Para entender de qué se trata esto de las estructuras jerárquicas vamos a empezar con datos simulados.

Objetivos:

Simular datos con estructura jerárquica
Realizar análisis de complete pooling y no-pooling
Comparar los análisis anteriores con un modelo jerárquico (partial pooling) usando JAGS

Estamos estudiando \(10\) sub-poblaciones de Chinchillones en la estepa Patagónica, en cada sub-población y queremos saber cuál es la probabilidad de que sobrevivan al invierno. Para eso vamos a simular datos provenientes de individuos de cada una de las sub-poblaciones. Asumimos que todas las sub-poblaciones son parte de una misma “meta-población”. Esto hace que esperemos que los parámetros de supervivencia sean parecidos entre las sub-poblaciones, pero que tengan cierto grado de diferencia. Para modelar la variabilidad entre sub-poblaciones podemos usar una distribución Beta, por ejemplo con parámetros \(a_s = 2\) y \(b_s = 10\). Si repasamos el Bestiario vemos que esta combinación de parámetros resulta en un valor esperado de \(\frac{a_s}{(a_s+b_s)}\) = 0.167 y una varianza de \(\frac{(a_s \times b_s)}{(a_s + b_s)^2 \times (a_s + b_s + 1)}\) = 0.011. Entonces, cada sub-población tendrá su propia tasa de supervivencia. Para simular los datos hacemos:

set.seed(1234)
n <- 10  # sub-poblaciones
m <- c(30, 28, 20, 30, 30, 26, 29, 5, 3, 27)  # número de individuos muestraeados por sub-poblacion

a_s <- 2
b_s <- 10

theta <- rbeta(n, a_s, b_s)  # generamos n tasas de mortalidad, una por cada sub-poblacion    
y <- rbinom(n, size = m, prob = theta)  # simulamos número de muertes por grupo

op <- par(cex.lab = 1.5, font.lab = 2, cex.axis = 1.3, las = 1, bty = "n")

plot(table(y), xlab = "Muertes por sub-población", ylab = "Frecuencia")

par(op)

Estos datos nos muestran el número de muertes por sub-población. Hay cuatro sub-poblaciones que no tuvieron muertes, dos que tuvieron dos muertes, etc. Además, nos tenemos que acordar que cuando simulamos estos datos cada sub-población tenia su propia tasa de mortalidad!

Vamos a modelar estos datos asumiendo que: 1. todas las sub-poblaciones tienen la misma mortalidad (complete pooling), es decir que ignoramos que cada sub-población tiene su propia tasa de mortalidad 2. cada sub-población tiene una tasa de mortalidad independiente de las otras (no pooling), es decir que ignoramos que todas las sub-poblaciones son parte de una misma “meta-población”. 3. cada sub-población es diferente pero parecida a las otras (partial pooling)

Vamos por partes:

(1) complete pooling

En este caso el modelo de datos (cuántos individuos mueren) es una Binomial:

Como de costumbre, definimos los datos que le vamos a pasar a JAGS, una función para generar valores iniciales para las cadenas Markovianas, los parámetros que queremos guardar, cuántas iteraciones vamos a correr, etc.

data <- list("y", "m", "n")
inits <- function() list(theta = runif(1, 0, 1))
params <- c("theta")

ni <- 1000
nc <- 3
nt <- 1
nb <- 500

Ahora llamamos a JAGS

library(jagsUI)
cp.sim <- jags(data, inits, params, model.file = "completepooling.bug", n.chains = nc, 
    n.iter = ni, n.burnin = nb, n.thin = nt)

## 
## Processing function input....... 
## 
## Done. 
##  
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 10
##    Unobserved stochastic nodes: 1
##    Total graph size: 24
## 
## Initializing model
## 
## Adaptive phase, 100 iterations x 3 chains 
## If no progress bar appears JAGS has decided not to adapt 
##  
## 
##  Burn-in phase, 500 iterations x 3 chains 
##  
## 
## Sampling from joint posterior, 500 iterations x 3 chains 
##  
## 
## Calculating statistics....... 
## 
## Done.

print(cp.sim)

## JAGS output for model 'completepooling.bug', generated by jagsUI.
## Estimates based on 3 chains of 1000 iterations,
## burn-in = 500 iterations and thin rate = 1,
## yielding 1500 total samples from the joint posterior. 
## MCMC ran for 0.002 minutes at time 2017-06-20 20:32:03.
## 
##            mean    sd   2.5%    50%  97.5% overlap0 f  Rhat n.eff
## theta     0.135 0.022  0.095  0.134  0.182    FALSE 1 1.000  1500
## deviance 47.398 1.429 46.402 46.846 51.513    FALSE 1 1.002  1481
## 
## Successful convergence based on Rhat values (all < 1.1). 
## Rhat is the potential scale reduction factor (at convergence, Rhat=1). 
## For each parameter, n.eff is a crude measure of effective sample size. 
## 
## overlap0 checks if 0 falls in the parameter's 95% credible interval.
## f is the proportion of the posterior with the same sign as the mean;
## i.e., our confidence that the parameter is positive or negative.
## 
## DIC info: (pD = var(deviance)/2) 
## pD = 1 and DIC = 48.418 
## DIC is an estimate of expected predictive error (lower is better).

Ejercicios:

1- ¿Cuál es el valor esperado de la posterior de la tasa de mortalidad asumiendo “complete pooling”? 2- Comparar el valor esperado de la tasa de mortalidad con el valor promedio de las tasas de mortalidad usadas para simular los datos.

(2) No-pooling

De nuevo usando previas no informativas:

cat(file = "nopooling.bug", "
  model   {
    for( i in 1 : n ) {
        y[i] ~ dbin(theta[i], m[i]) # Likelihood
        theta[i] ~ dbeta(1, 1) # previas no-informativas para la tasa de mortalidad
\t\t}
}
")

library(jagsUI)
data <- list("y", "m", "n")
inits <- function() list(theta = runif(n, 0, 1))
params <- c("theta")

ni <- 1000
nc <- 3
nt <- 1
nb <- 500
np.sim <- jags(data, inits, params, model.file = "nopooling.bug", n.chains = nc, 
    n.iter = ni, n.burnin = nb, n.thin = nt)

## 
## Processing function input....... 
## 
## Done. 
##  
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 10
##    Unobserved stochastic nodes: 10
##    Total graph size: 51
## 
## Initializing model
## 
## Adaptive phase, 100 iterations x 3 chains 
## If no progress bar appears JAGS has decided not to adapt 
##  
## 
##  Burn-in phase, 500 iterations x 3 chains 
##  
## 
## Sampling from joint posterior, 500 iterations x 3 chains 
##  
## 
## Calculating statistics....... 
## 
## Done.

print(np.sim)

## JAGS output for model 'nopooling.bug', generated by jagsUI.
## Estimates based on 3 chains of 1000 iterations,
## burn-in = 500 iterations and thin rate = 1,
## yielding 1500 total samples from the joint posterior. 
## MCMC ran for 0.005 minutes at time 2017-06-20 20:32:04.
## 
##             mean    sd   2.5%    50%  97.5% overlap0 f  Rhat n.eff
## theta[1]   0.031 0.030  0.001  0.022  0.111    FALSE 1 1.001  1500
## theta[2]   0.100 0.053  0.022  0.091  0.227    FALSE 1 1.002   843
## theta[3]   0.044 0.041  0.001  0.032  0.157    FALSE 1 1.004  1500
## theta[4]   0.309 0.080  0.166  0.306  0.475    FALSE 1 1.002   753
## theta[5]   0.247 0.078  0.114  0.241  0.419    FALSE 1 1.001  1500
## theta[6]   0.212 0.076  0.087  0.206  0.380    FALSE 1 1.000  1500
## theta[7]   0.195 0.067  0.081  0.188  0.343    FALSE 1 1.001  1222
## theta[8]   0.145 0.126  0.004  0.109  0.475    FALSE 1 1.004   745
## theta[9]   0.198 0.160  0.006  0.161  0.591    FALSE 1 0.999  1500
## theta[10]  0.101 0.052  0.022  0.094  0.224    FALSE 1 1.000  1500
## deviance  31.580 4.753 24.035 30.908 42.289    FALSE 1 1.002   845
## 
## Successful convergence based on Rhat values (all < 1.1). 
## Rhat is the potential scale reduction factor (at convergence, Rhat=1). 
## For each parameter, n.eff is a crude measure of effective sample size. 
## 
## overlap0 checks if 0 falls in the parameter's 95% credible interval.
## f is the proportion of the posterior with the same sign as the mean;
## i.e., our confidence that the parameter is positive or negative.
## 
## DIC info: (pD = var(deviance)/2) 
## pD = 11.3 and DIC = 42.863 
## DIC is an estimate of expected predictive error (lower is better).

Veamos un gráfico para comprar los dos análisis (complete-pooling y no-pooling)

op <- par(cex.lab = 1.5, font.lab = 2, cex.axis = 1.3, las = 1, bty = "n")

plot(density(cp.sim$sims.list$theta), type = "l", lwd = 3, ylab = "Density", 
    xlab = "Tasa de mortalidad", xlim = c(0, 0.5), ylim = c(0, 30), main = "")  #Posterior de theta con complete pooling

for (i in 1:10) {
    lines(density(np.sim$sims.list$theta[, i]), type = "l", col = "gray", lwd = 3)  #Posteriores de theta con no pooling
}

par(op)

Ejercicios

1- ¿Qué se puede decir de las distintas posteriores?

2- ¿Alguno de estos dos enfoques les parece más adecuado? ¿Cuáles les parecen los pro y contras de ambos?

(3) Partial pooling

Finalmente, podemos hacer un análisis jerárquico o “multi-nivel” (partial pooling), reconociendo explícitamente que cada sub-población es potencialmente distinta de las otras, pero todas son parte de la misma meta-población. Esto último implica que las observaciones en las sub-poblaciones no son del todo independientes.

Noten que en la última línea del modelo BUGS, registramos el valor esperado de la tasa de mortalidad para la metapoblación. Este es un ejemplo de cómo podemos calcular valores de interés (en la jerga estadística serían “derived quantities”) dentro del modelo BUGS. En este caso, usamos esta fórmula que corresponde a la media de la distribución Beta (ver Bestiario de distribuciones). Entonces, vamos a obtener una posterior para esta “derived quantity”.

library(jagsUI)
data <- list("y", "m", "n")
inits <- function() list(a = runif(1, 1, 5), b = runif(1, 5, 20))
params <- c("a", "b", "theta", "mean_pobl")
ni <- 5000
nc <- 3
nt <- 4
nb <- 2500

hier.sim <- jags(data, inits, params, model.file = "hier.bug", n.chains = nc, 
    n.iter = ni, n.burnin = nb, n.thin = nt)

## 
## Processing function input....... 
## 
## Done. 
##  
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 10
##    Unobserved stochastic nodes: 12
##    Total graph size: 41
## 
## Initializing model
## 
## Adaptive phase, 100 iterations x 3 chains 
## If no progress bar appears JAGS has decided not to adapt 
##  
## 
##  Burn-in phase, 2500 iterations x 3 chains 
##  
## 
## Sampling from joint posterior, 2500 iterations x 3 chains 
##  
## 
## Calculating statistics....... 
## 
## Done.

print(hier.sim)

## JAGS output for model 'hier.bug', generated by jagsUI.
## Estimates based on 3 chains of 5000 iterations,
## burn-in = 2500 iterations and thin rate = 4,
## yielding 1875 total samples from the joint posterior. 
## MCMC ran for 0.013 minutes at time 2017-06-20 20:32:04.
## 
##             mean     sd   2.5%    50%  97.5% overlap0 f  Rhat n.eff
## a          3.595  2.780  0.612  2.770 10.657    FALSE 1 1.100    40
## b         24.834 16.626  4.701 20.652 66.179    FALSE 1 1.064    49
## theta[1]   0.054  0.039  0.002  0.047  0.146    FALSE 1 1.022   109
## theta[2]   0.094  0.044  0.025  0.089  0.188    FALSE 1 1.008   235
## theta[3]   0.065  0.045  0.003  0.058  0.172    FALSE 1 1.015   138
## theta[4]   0.222  0.062  0.121  0.216  0.365    FALSE 1 1.001  1439
## theta[5]   0.185  0.057  0.089  0.179  0.313    FALSE 1 1.003   726
## theta[6]   0.163  0.056  0.072  0.156  0.290    FALSE 1 1.000  1875
## theta[7]   0.152  0.051  0.066  0.147  0.265    FALSE 1 1.004  1078
## theta[8]   0.100  0.062  0.008  0.093  0.236    FALSE 1 1.005   994
## theta[9]   0.107  0.067  0.008  0.099  0.260    FALSE 1 1.002  1332
## theta[10]  0.097  0.043  0.027  0.093  0.190    FALSE 1 1.004   431
## mean_pobl  0.126  0.033  0.070  0.123  0.198    FALSE 1 1.006   331
## deviance  32.921  5.278 24.228 32.313 44.336    FALSE 1 1.015   144
## 
## **WARNING** Rhat values indicate convergence failure. 
## Rhat is the potential scale reduction factor (at convergence, Rhat=1). 
## For each parameter, n.eff is a crude measure of effective sample size. 
## 
## overlap0 checks if 0 falls in the parameter's 95% credible interval.
## f is the proportion of the posterior with the same sign as the mean;
## i.e., our confidence that the parameter is positive or negative.
## 
## DIC info: (pD = var(deviance)/2) 
## pD = 13.7 and DIC = 46.67 
## DIC is an estimate of expected predictive error (lower is better).

Ahora podemos ver la posterior de la tasa de mortalidad a nivel de meta-población y compararla con el valor verdadero que usamos en las simulaciones (en rojo).

op <- par(cex.lab = 1.5, font.lab = 2, cex.axis = 1.3, las = 1, bty = "n")
plot(density(hier.sim$sims.list$mean_pobl), main = "", xlab = "Tasa de  supervivencia", 
    lwd = 3)
abline(v = a_s/(a_s + b_s), col = 2, lwd = 3, lty = 2)

par(op)

Finalmente podemos comparar gráficamente los tres enfoques:

op <- par(mfrow = c(1, 2), cex.lab = 1.5, font.lab = 2, cex.axis = 1.3, las = 1, 
    bty = "n")

plot(density(cp.sim$sims.list$theta), type = "l", lwd = 3, ylab = "Density", 
    xlab = "Tasa de mortalidad", xlim = c(0, 1), ylim = c(0, 25), main = "")  #Complete pooling

for (i in 1:10) {
    lines(density(np.sim$sims.list$theta[, i]), type = "l", col = "gray", lwd = 2)
}  #No pooling

curve(dbeta(x, hier.sim$mean$a, hier.sim$mean$b), lwd = 3, col = 2, xlab = "Tasa de mortalidad", 
    ylab = "", ylim = c(0, 25))  #Tasa de mortalidad meta-poblacional estimada con partial pooling

for (i in 1:10) {
    lines(density(hier.sim$sims.list$theta[, i]), col = "blue", lwd = 2, main = "")  #Tasa de mortalidad de cada subpoblación estimadas con partial pooling
}

par(op)

En el gráfico la linea roja representa la posterior para la tasa de mortalidad de la meta-población y cada linea azul representa la posterior para la tasa de mortalidad de cada sub-población. ¿Cómo comparan las posteriores para las distintas sub-poblaciones con no-pooling y partial pooling?