Establecimiento de Alisos en el NOA

Veremos como modelar el establecimiento de alisos en el NOA en función de lluvia. Los datos son de Ezequiel Araoz. Carguemos los datos y grafiquemos cómo cambia el número de establecimientos de Aliso en función de la precipitación en distintas cuencas.

library(coda)
library(jagsUI)

datos <- read.table("https://sites.google.com/site/modelosydatos/alisos.txt", 
    header = TRUE)
attach(datos)

op <- par(mar = c(4, 4, 1, 1) + 0.1, las = 1, cex.lab = 1.5, cex.axis = 1.3)
plot(81 - yr[1:81], lluvia[1:81], type = "l", lwd = 2, xlab = "Año", ylab = "Lluvia (desvíos)")

par(op)

ncuencas <- length(unique(cuenca))
op <- par(mar = c(4, 4, 1, 1) + 0.1, las = 1, cex.lab = 1.5, cex.axis = 1.3)
plot(lluvia, ne, xlab = "lluvia (desvíos)", ylab = "Número de Establecimientos")
for (i in 1:ncuencas) {
    tmp <- lm(ne[cuenca == i] ~ lluvia[cuenca == i])
    points(lluvia[cuenca == i], ne[cuenca == i], pch = 16, col = i + 1)
    abline(tmp, col = i + 1, lwd = 3)
}
tmp <- lm(ne ~ lluvia)
abline(tmp, lwd = 4)

par(op)

Los colores de puntos y líneas corresponden a las distintas cuencas (datos y regresiones lineales simples de no-pooling) y la línea negra gruesa una regresión simple para todos los datos (complete pooling). Vamos a ajustar un modelo jerárquico teniendo en cuenta la variabilidad entre cuencas.

Análisis jerárquico

Escribimos el modelo:

cat(file = "aliso.bug",
    "
model{
# modelo de datos
for(i in 1:nobs){
    ne[i]~dpois(lambda[i]) # establecimientos por año
    lambda[i] <- exp( a[cuenca[i]] + b[cuenca[i]]*lluvia[i] ) 
    }

  # definir distribuciones previas para todos los parametros
    # previas a nivel poblacional
    mua~dnorm(0,0.001)  # media para ordenada al origen
    sa~dunif(0,100)     # desvío estándar
    taua<-1/(sa*sa)     # precisión
    mub~dnorm(0,0.001)  
    sb~dunif(0,100)
    taub<-1/(sb*sb)
    
    # previas a nivel de cuencas
    for(j in 1:ncuencas){
        a[j]~dnorm(mua,taua)
        b[j]~dnorm(mub,taub)
        }
}
    ")

Luego definimos los datos, la función para los valores iniciales y los parámetros que queremos guardar.

nobs = length(yr)

aliso.data <- list("ne", "lluvia", "cuenca", "nobs", "ncuencas")

inits <- function() {
    list(a = rnorm(ncuencas, 0, 0.5), b = rnorm(ncuencas, 0, 0.5), mua = rnorm(1, 
        0, 0.5), mub = rnorm(1, 0, 0.5), sa = runif(1), sb = runif(1))
}

parameters <- c("a", "b", "mua", "mub", "sa", "sb")

Ahora llamamos a jags

aliso.sim <- jags(data = aliso.data, inits, parameters, model.file = "aliso.bug", 
    n.thin = 1, n.chains = 3, n.iter = 5000)

## 
## Processing function input....... 
## 
## Done. 
##  
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 567
##    Unobserved stochastic nodes: 18
##    Total graph size: 3438
## 
## Initializing model
## 
## Adaptive phase, 100 iterations x 3 chains 
## If no progress bar appears JAGS has decided not to adapt 
##  
## No burn-in specified 
##  
## Sampling from joint posterior, 5000 iterations x 3 chains 
##  
## 
## Calculating statistics....... 
## 
## Done.

Vemos que hayan convergido y que los tamaños efectivos de las muestras de las posteriores (neff) sean razonables.

print(aliso.sim)

## JAGS output for model 'aliso.bug', generated by jagsUI.
## Estimates based on 3 chains of 5000 iterations,
## burn-in = 0 iterations and thin rate = 1,
## yielding 15000 total samples from the joint posterior. 
## MCMC ran for 0.279 minutes at time 2016-10-06 23:26:17.
## 
##              mean    sd     2.5%      50%    97.5% overlap0     f  Rhat
## a[1]        1.497 0.046    1.405    1.498    1.588    FALSE 1.000 1.001
## a[2]        1.122 0.062    0.998    1.122    1.242    FALSE 1.000 1.003
## a[3]        0.911 0.074    0.766    0.912    1.057    FALSE 1.000 1.004
## a[4]        1.654 0.048    1.558    1.655    1.747    FALSE 1.000 1.001
## a[5]        1.005 0.065    0.876    1.005    1.129    FALSE 1.000 1.003
## a[6]        1.159 0.060    1.042    1.159    1.274    FALSE 1.000 1.002
## a[7]        2.161 0.036    2.090    2.161    2.232    FALSE 1.000 1.000
## b[1]       -0.037 0.046   -0.127   -0.037    0.053     TRUE 0.792 1.000
## b[2]       -0.145 0.066   -0.275   -0.144   -0.018    FALSE 0.988 1.002
## b[3]        0.365 0.067    0.234    0.364    0.499    FALSE 1.000 1.003
## b[4]        0.124 0.049    0.028    0.124    0.220    FALSE 0.994 1.001
## b[5]        0.121 0.065   -0.004    0.120    0.250     TRUE 0.971 1.003
## b[6]        0.158 0.059    0.043    0.158    0.274    FALSE 0.997 1.002
## b[7]        0.220 0.036    0.149    0.219    0.290    FALSE 1.000 1.002
## mua         1.359 0.233    0.897    1.360    1.826    FALSE 1.000 1.001
## mub         0.116 0.096   -0.073    0.116    0.306     TRUE 0.910 1.000
## sa          0.572 0.228    0.302    0.519    1.182    FALSE 1.000 1.002
## sb          0.226 0.099    0.108    0.205    0.478    FALSE 1.000 1.000
## deviance 3625.725 9.256 3609.348 3625.159 3645.541    FALSE 1.000 1.001
##          n.eff
## a[1]      1591
## a[2]      1014
## a[3]       503
## a[4]      3981
## a[5]       832
## a[6]      2600
## a[7]      4493
## b[1]      4217
## b[2]       951
## b[3]       608
## b[4]      2482
## b[5]      1077
## b[6]       960
## b[7]      1670
## mua       3117
## mub       5956
## sa        2738
## sb       15000
## deviance  1656
## 
## Successful convergence based on Rhat values (all < 1.1). 
## Rhat is the potential scale reduction factor (at convergence, Rhat=1). 
## For each parameter, n.eff is a crude measure of effective sample size. 
## 
## overlap0 checks if 0 falls in the parameter's 95% credible interval.
## f is the proportion of the posterior with the same sign as the mean;
## i.e., our confidence that the parameter is positive or negative.
## 
## DIC info: (pD = var(deviance)/2) 
## pD = 42.8 and DIC = 3668.518 
## DIC is an estimate of expected predictive error (lower is better).

Podemos ver que para algunas cuencas como la 3, 4, 6 y 7 los establecimientos aumentan con la precipitación (b > 0), mientras que en otras cuencas como la 2 el efecto es negativo. En general, la tendencia es a un aumento de los establecimientos con las precipitaciones (alrededor del 90% de la posterior de mub es mayor que cero). Podemos ver esto (masomenos) gráficamente:

r <- seq(min(lluvia), max(lluvia), 0.001)

op <- par(mar = c(4, 4, 1, 1) + 0.1, las = 1, cex.lab = 1.5, cex.axis = 1.3)
plot(lluvia, ne, xlab = "lluvia (desvíos)", ylab = "Número de Establecimientos")
for (i in 1:ncuencas) {
    points(lluvia[cuenca == i], ne[cuenca == i], pch = 16, col = i + 1)
    lines(r, exp(aliso.sim$mean$a[i] + aliso.sim$mean$b[i] * r), col = i + 1, 
        lwd = 3)
}

lines(r, exp(aliso.sim$mean$mua + aliso.sim$mean$mub * r), lwd = 4)

par(op)

Pero es posible que no observemos todos los establecimientos. Un grupo de árboles puede desaparecer antes de que los encontremos. Entonces, modelamos como variable oculta al número de establecimientos por año y modelamos la probabilidad de observar un establecimeinto en función del tiempo p[i] <- exp(-d * yr[i]).

cat(file = "alisop.bug",
    "
model{
 for(i in 1:(nobs)){
    N[i]~dpois(lambda[i])  # (variable oculta)
    lambda[i] <- exp( a[cuenca[i]] + b[cuenca[i]]*lluvia[i] ) 
    p[i] <- exp(-d * yr[i])  # probabilidad de observar un establecimiento en función del tiempo
    nep[i] ~ dbin(p[i],N[i]) # establecimientos observados
    }

    # previas para todos los par·metros 
    mua~dnorm(0,0.001) 
    sa~dunif(0,100)
    taua<-1/(sa*sa) 
    mub~dnorm(0,0.001)
    sb~dunif(0,100)
    taub<-1/(sb*sb)
    d~dunif(0,100)
    
    for(j in 1:ncuencas){
        a[j]~dnorm(mua,taua)
        b[j]~dnorm(mub,taub)
        }
}       
    ")

Luego definimos datos, initis y parámetros para poder ajustar el modelo

alisop.data <- list("yr", "ne", "lluvia", "cuenca", "nobs", "ncuencas")
initsp <- function() {
    list(a = rnorm(ncuencas, 6, 0.5), b = rnorm(ncuencas, 1, 0.1), mua = rnorm(1, 
        0, 0.5), mub = rnorm(1, 0, 0.5), sa = runif(1), sb = runif(1), d = 1e-05)
}

parametersp <- c("a", "b", "mua", "mub", "sa", "sb", "d")

alisop.sim <- jags(data = alisop.data, initsp, parametersp, model.file = "alisop.bug", 
    n.thin = 5, n.chains = 3, n.iter = 5000)

## 
## Processing function input....... 
## 
## Done. 
##  
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 567
##    Unobserved stochastic nodes: 586
##    Total graph size: 4738
## 
## Initializing model
## 
## Adaptive phase, 100 iterations x 3 chains 
## If no progress bar appears JAGS has decided not to adapt 
##  
## No burn-in specified 
##  
## Sampling from joint posterior, 5000 iterations x 3 chains 
##  
## 
## Calculating statistics....... 
## 
## Done.

print(alisop.sim)

## JAGS output for model 'alisop.bug', generated by jagsUI.
## Estimates based on 3 chains of 5000 iterations,
## burn-in = 0 iterations and thin rate = 5,
## yielding 3000 total samples from the joint posterior. 
## MCMC ran for 0.553 minutes at time 2016-10-06 23:26:35.
## 
##              mean     sd     2.5%      50%    97.5% overlap0     f  Rhat
## a[1]        2.220  0.058    2.102    2.220    2.331    FALSE 1.000 1.001
## a[2]        1.984  0.069    1.849    1.984    2.123    FALSE 1.000 1.009
## a[3]        1.774  0.078    1.618    1.776    1.923    FALSE 1.000 1.002
## a[4]        2.513  0.059    2.399    2.514    2.626    FALSE 1.000 1.004
## a[5]        1.859  0.074    1.715    1.860    2.006    FALSE 1.000 1.007
## a[6]        2.011  0.071    1.871    2.012    2.147    FALSE 1.000 1.003
## a[7]        2.938  0.049    2.840    2.939    3.036    FALSE 1.000 1.004
## b[1]       -0.189  0.050   -0.289   -0.188   -0.093    FALSE 1.000 1.000
## b[2]       -0.360  0.067   -0.495   -0.359   -0.235    FALSE 1.000 1.001
## b[3]        0.118  0.067   -0.010    0.118    0.251     TRUE 0.963 1.008
## b[4]       -0.100  0.047   -0.192   -0.101   -0.010    FALSE 0.984 1.002
## b[5]       -0.111  0.063   -0.239   -0.110    0.008     TRUE 0.967 1.001
## b[6]       -0.075  0.059   -0.192   -0.076    0.040     TRUE 0.893 1.001
## b[7]        0.023  0.038   -0.047    0.022    0.100     TRUE 0.717 1.003
## mua         2.186  0.222    1.733    2.186    2.617    FALSE 1.000 1.002
## mub        -0.097  0.089   -0.265   -0.099    0.080     TRUE 0.893 1.003
## sa          0.534  0.207    0.280    0.484    1.062    FALSE 1.000 1.017
## sb          0.207  0.089    0.097    0.188    0.425    FALSE 1.000 1.002
## d           0.024  0.001    0.022    0.024    0.026    FALSE 1.000 1.005
## deviance 2137.263 45.801 2048.260 2137.775 2227.003    FALSE 1.000 1.002
##          n.eff
## a[1]      3000
## a[2]       229
## a[3]      1177
## a[4]       488
## a[5]       277
## a[6]       623
## a[7]      3000
## b[1]      3000
## b[2]      3000
## b[3]       292
## b[4]       706
## b[5]      3000
## b[6]      3000
## b[7]       568
## mua       3000
## mub       3000
## sa         547
## sb        3000
## d          653
## deviance  1539
## 
## Successful convergence based on Rhat values (all < 1.1). 
## Rhat is the potential scale reduction factor (at convergence, Rhat=1). 
## For each parameter, n.eff is a crude measure of effective sample size. 
## 
## overlap0 checks if 0 falls in the parameter's 95% credible interval.
## f is the proportion of the posterior with the same sign as the mean;
## i.e., our confidence that the parameter is positive or negative.
## 
## DIC info: (pD = var(deviance)/2) 
## pD = 1048.2 and DIC = 3185.469 
## DIC is an estimate of expected predictive error (lower is better).

Luego de chequear converjencia y "neff" vemos que ahora la relación de los establecimientos con la lluvia es mayormente negativa!

Podemos ver como cambia la probabilidad de encontrar un establecimiento con el tiempo

t = 0:80
plot(t, exp(-alisop.sim$mean$d * t), lwd = 2, type = "l", xlab = "tiempo (años)", 
    ylab = "Pr Supervivencia al Presente")

Y el establecimiento en función de la lluvia

r = seq(min(lluvia), max(lluvia), 0.001)
plot(lluvia, ne, xlab = "lluvia (desvíos)", ylab = "Número de Establecimientos")
for (i in 1:ncuencas) {
    points(lluvia[cuenca == i], ne[cuenca == i], col = i)
    lines(r, exp(alisop.sim$mean$a[i] + alisop.sim$mean$b[i] * r), col = i, 
        lwd = 2)
}
lines(r, exp(alisop.sim$mean$mua + alisop.sim$mean$mub * r), lwd = 4)

par(op)

sessionInfo()

## R version 3.3.0 (2016-05-03)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.11.6 (El Capitan)
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] jagsUI_1.4.2    lattice_0.20-33 coda_0.18-1     knitr_1.13     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.5          digest_0.6.9         mime_0.4            
##  [4] grid_3.3.0           knitrBootstrap_1.0.0 formatR_1.4         
##  [7] magrittr_1.5         evaluate_0.9         stringi_1.1.1       
## [10] rjags_4-4            rmarkdown_0.9.6      tools_3.3.0         
## [13] stringr_1.0.0        markdown_0.7.7       parallel_3.3.0      
## [16] yaml_2.1.13          htmltools_0.3.5

Juan Manuel Morales. 30 de Septiembre del 2015. Última actualización: 2016-10-06