This chapter has been an introduction to the motivation, implementation, and interpretation of basic multilevel models. It focused on varying intercepts, which achieve better estimates of baseline differences among clusters in the data. They achieve better estimates because they simultaneously model the population of clusters and use inferences about the population to pool information among parameters. From another perspective, varying intercepts are adaptively regularized parameters, relying upon a prior that is itself learned from the data.
Place each answer inside the code chunk (grey box). Each code chunk should contain a text response or code that completes or answers the question or activity requested. Problems are labeled Easy (E), Medium (M), and Hard (H).
Finally, upon completion, name your final output .html file as: YourName_ANLY505-Year-Semester.html
Then publish the assignment to your R Pubs account and submit the link to Canvas. Each question is worth 5 points.
13E1. Which of the following priors will produce more shrinkage in the estimates? \[\begin{align} \ α_{TANK} ∼ Normal(0, 1) \tag{a} \\ \ α_{TANK} ∼ Normal(0, 2) \tag{b} \\ \end{align}\]
# A Normal prior is centered on its mean, with spread controlled by its standard deviation. Option (a) has the smaller standard deviation, so it is more concentrated around zero; a tighter prior pulls the tank intercepts more strongly toward the common mean, so (a) will produce more shrinkage in the estimates.
curve(dnorm(x,0,1),from=-10,to=10,col="red",ylab="density")
curve(dnorm(x,0,2),add=TRUE)
legend("topright",
col = c("red","black"),
pch = 19,
legend = c("Normal(0,1)","Normal(0,2)"))
13E2. Rewrite the following model as a multilevel model. \[\begin{align} y_i ∼ Binomial(1, p_i) \\ logit(p_i) = α_{group[i]} + βx_i \\ α_{group} ∼ Normal(0, 1.5) \\ β ∼ Normal(0, 0.5) \\ \end{align}\]
# To convert the model to a multilevel model, we make the prior for the vector of intercepts, α_group, adaptive: we define parameters for its mean and standard deviation, and then give those two new parameters their own priors, the 'hyperpriors'. This is what it looks like:
\[\begin{align} y_i ∼ Binomial(1, p_i) \\ logit(p_i) = α_{group[i]} + βx_i \\ α_{group} ∼ Normal(\bar{α}, σ_α) \\ β ∼ Normal(0, 0.5) \\ \bar{α} ∼ Normal(0, 10) \\ σ_α ∼ HalfCauchy(0, 1) \\ \end{align}\]
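# A minimal sketch (not part of the assignment data) of how this adaptive-prior model could be coded with map2stan; the data list below is simulated purely for illustration, with hypothetical names y, x, and group.
set.seed(13)
sim_group <- rep( 1:6 , each=10 )                   # 6 groups, 10 cases each
sim_x <- rnorm( 60 )
sim_a <- rnorm( 6 , 0 , 1 )                         # true group intercepts
sim_dat <- list(
  y = rbinom( 60 , 1 , prob=logistic( sim_a[sim_group] + 0.5*sim_x ) ),
  x = sim_x ,
  group = sim_group )
m13e2_sketch <- map2stan(
  alist(
    y ~ dbinom( 1 , p ),
    logit(p) <- a_group[group] + b*x,
    a_group[group] ~ dnorm( a_bar , sigma_group ),  # adaptive prior on the intercepts
    b ~ dnorm( 0 , 0.5 ),
    a_bar ~ dnorm( 0 , 10 ),                        # hyperprior on the group mean
    sigma_group ~ dcauchy( 0 , 1 )                  # hyperprior on the group spread
  ) , data=sim_dat , chains=1 )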
13E3. Rewrite the following model as a multilevel model. \[\begin{align} y_i ∼ Normal(μ_i, σ) \\ μ_i = α_{group[i]} + βx_i \\ α_{group} ∼ Normal(0, 5) \\ β ∼ Normal(0, 1) \\ σ ∼ Exponential(1) \\ \end{align}\]
# There is already a standard deviation parameter, σ, but it describes the residual variation at the observation level. We need a second standard deviation parameter for the varying intercepts:
\[\begin{align} y_i ∼ Normal(μ_i, σ) \\ μ_i = α_{group[i]} + βx_i \\ α_{group} ∼ Normal(\bar{α}, σ_α) \\ β ∼ Normal(0, 1) \\ σ ∼ Exponential(1) \\ \bar{α} ∼ Normal(0, 10) \\ σ_α ∼ HalfCauchy(0, 1) \\ \end{align}\]
13E4. Write a mathematical model formula for a Poisson regression with varying intercepts.
# The mathematical model formula for a Poisson regression with varying intercepts:
\[\begin{align} y_i ∼ Poisson(λ_i) \\ log(λ_i) = α_{group[i]} + βx_i \\ α_{group} ∼ Normal(\bar{α}, σ_α) \\ β ∼ Normal(0, 1) \\ \bar{α} ∼ Normal(0, 10) \\ σ_α ∼ HalfCauchy(0, 1) \\ \end{align}\]
13E5. Write a mathematical model formula for a Poisson regression with two different kinds of varying intercepts, a cross-classified model.
# The cross-classified model adds another varying intercept type. We have to take care not to over-parameterize the model by having a hyperprior mean for both intercept types. I just pull the mean intercept out of both priors and put it in the linear model:
\[\begin{align} y_i ∼ Poisson(λ_i) \\ log(λ_i) = \bar{α} + α_{group[i]} + α_{day[i]} + βx_i \\ α_{group} ∼ Normal(0, σ_{group}) \\ α_{day} ∼ Normal(0, σ_{day}) \\ β ∼ Normal(0, 1) \\ \bar{α} ∼ Normal(0, 10) \\ σ_{group} ∼ HalfCauchy(0, 1) \\ σ_{day} ∼ HalfCauchy(0, 1) \\ \end{align}\]
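# A minimal sketch (again with simulated, purely illustrative data) of how this cross-classified Poisson model could be coded with map2stan; y, x, group, and day are hypothetical names.
set.seed(13)
sim_group <- rep( 1:5 , times=8 )
sim_day <- rep( 1:4 , each=10 )
sim_x <- rnorm( 40 )
sim_lambda <- exp( 1 + rnorm(5,0,0.5)[sim_group] + rnorm(4,0,0.5)[sim_day] + 0.3*sim_x )
sim_dat2 <- list(
  y = rpois( 40 , lambda=sim_lambda ),
  x = sim_x , group = sim_group , day = sim_day )
m13e5_sketch <- map2stan(
  alist(
    y ~ dpois( lambda ),
    log(lambda) <- a_bar + a_group[group] + a_day[day] + b*x,
    a_group[group] ~ dnorm( 0 , sigma_group ),   # first cluster type
    a_day[day] ~ dnorm( 0 , sigma_day ),         # second cluster type
    b ~ dnorm( 0 , 1 ),
    a_bar ~ dnorm( 0 , 10 ),                     # the single grand mean lives in the linear model
    sigma_group ~ dcauchy( 0 , 1 ),
    sigma_day ~ dcauchy( 0 , 1 )
  ) , data=sim_dat2 , chains=1 )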
13M1. Revisit the Reed frog survival data, data(reedfrogs), and add the predation and size treatment variables to the varying intercepts model. Consider models with either main effect alone, both main effects, as well as a model including both and their interaction. Instead of focusing on inferences about these two predictor variables, focus on the inferred variation across tanks. Explain why it changes as it does across models.
data(reedfrogs)
d <- reedfrogs
str(d)
## 'data.frame': 48 obs. of 5 variables:
## $ density : int 10 10 10 10 10 10 10 10 10 10 ...
## $ pred : Factor w/ 2 levels "no","pred": 1 1 1 1 1 1 1 1 2 2 ...
## $ size : Factor w/ 2 levels "big","small": 1 1 1 1 2 2 2 2 1 1 ...
## $ surv : int 9 10 7 10 9 9 10 9 4 9 ...
## $ propsurv: num 0.9 1 0.7 1 0.9 0.9 1 0.9 0.4 0.9 ...
# Recode pred and size as dummy variables
d$pred <- ifelse( d$pred=="no" , 0 , 1 )
d$big <- ifelse( d$size=="big" , 1 , 0 )
# Define and fit the models
d$tank <- 1:nrow(d)
m0 <- map2stan(
alist(
surv ~ dbinom(density,p),
logit(p) <- a_tank[tank],
a_tank[tank] ~ dnorm(a,sigma_tank),
a ~ dnorm(0,10),
sigma_tank ~ dcauchy(0,1)
),
data=d , chains=4 )
## [Stan sampling output trimmed: 4 chains, 2000 iterations each (1000 warmup), roughly 0.7 seconds per chain]
## Computing WAIC
# To this model skeleton we add predictors. Nothing will change otherwise. The varying intercepts and their priors remain the same throughout. First a model containing only the dummy variable indicating presence of predators:
m_p <- map2stan(
alist(
surv ~ dbinom(density,p),
logit(p) <- a_tank[tank] + bp*pred,
bp ~ dnorm(0,1),
a_tank[tank] ~ dnorm(a,sigma_tank),
a ~ dnorm(0,10),
sigma_tank ~ dcauchy(0,1)
),
data=d , chains=4 )
## [Stan sampling output trimmed: 4 chains, 2000 iterations each (1000 warmup), roughly 0.9-1.1 seconds per chain]
## Computing WAIC
# The model containing only the dummy variable indicating big tadpoles:
m_b <- map2stan(
alist(
surv ~ dbinom(density,p),
logit(p) <- a_tank[tank] + bb*big,
bb ~ dnorm(0,1),
a_tank[tank] ~ dnorm(a,sigma_tank),
a ~ dnorm(0,10),
sigma_tank ~ dcauchy(0,1)
),
data=d , chains=4 )
## [Stan sampling output trimmed: 4 chains, 2000 iterations each (1000 warmup), roughly 1.1-1.5 seconds per chain]
## Computing WAIC
# Finally, a model containing both predictors and a model that contains their interaction:
m_p_b <- map2stan(
alist(
surv ~ dbinom(density,p),
logit(p) <- a_tank[tank] + bp*pred + bb*big,
c(bp,bb) ~ dnorm(0,1),
a_tank[tank] ~ dnorm(a,sigma_tank),
a ~ dnorm(0,10),
sigma_tank ~ dcauchy(0,1)
),
data=d , chains=4 )
## [Stan sampling output trimmed: 4 chains, 2000 iterations each (1000 warmup), roughly 1.1-1.6 seconds per chain]
## Computing WAIC
m_p_b_pb <- map2stan(
alist(
surv ~ dbinom(density,p),
logit(p) <- a_tank[tank] + bp*pred + bb*big + bpb*pred*big,
c(bp,bb,bpb) ~ dnorm(0,1),
a_tank[tank] ~ dnorm(a,sigma_tank),
a ~ dnorm(0,10),
sigma_tank ~ dcauchy(0,1)
),
data=d , chains=4 )
## [Stan sampling output trimmed: 4 chains, 2000 iterations each (1000 warmup), roughly 1.8-2.0 seconds per chain]
## Computing WAIC
# Then inspect how the estimated variation across tanks changes from model to model.
# Compare the estimates of sigma_tank across the models (coeftab reports the posterior means):
coeftab(m0,m_p,m_b,m_p_b,m_p_b_pb)
## m0 m_p m_b m_p_b m_p_b_pb
## a_tank[1] 2.15 2.49 2.43 2.80 2.46
## a_tank[2] 3.10 2.99 3.34 3.23 2.88
## a_tank[3] 1.01 1.69 1.32 2.06 1.73
## a_tank[4] 3.12 2.98 3.37 3.23 2.87
## a_tank[5] 2.18 2.49 2.23 2.64 2.49
## a_tank[6] 2.16 2.48 2.20 2.65 2.49
## a_tank[7] 3.11 2.96 3.19 3.11 2.90
## a_tank[8] 2.18 2.49 2.21 2.65 2.47
## a_tank[9] -0.17 2.24 0.17 2.60 2.51
## a_tank[10] 2.15 3.58 2.45 3.89 3.73
## a_tank[11] 1.01 3.01 1.30 3.35 3.23
## a_tank[12] 0.59 2.75 0.91 3.11 2.98
## a_tank[13] 1.01 3.01 1.05 3.08 2.71
## a_tank[14] 0.19 2.49 0.23 2.58 2.23
## a_tank[15] 2.14 3.59 2.18 3.65 3.28
## a_tank[16] 2.16 3.58 2.18 3.66 3.26
## a_tank[17] 2.95 2.90 3.24 3.22 2.82
## a_tank[18] 2.41 2.56 2.71 2.90 2.49
## a_tank[19] 2.01 2.28 2.34 2.65 2.22
## a_tank[20] 3.72 3.31 3.96 3.56 3.15
## a_tank[21] 2.40 2.57 2.44 2.67 2.54
## a_tank[22] 2.40 2.56 2.42 2.68 2.55
## a_tank[23] 2.40 2.56 2.43 2.67 2.55
## a_tank[24] 1.71 2.01 1.73 2.12 2.05
## a_tank[25] -1.00 1.58 -0.67 1.99 1.96
## a_tank[26] 0.17 2.53 0.49 2.94 2.89
## a_tank[27] -1.44 1.26 -1.11 1.68 1.64
## a_tank[28] -0.47 2.01 -0.14 2.42 2.38
## a_tank[29] 0.16 2.53 0.17 2.57 2.18
## a_tank[30] 1.44 3.53 1.47 3.56 3.15
## a_tank[31] -0.64 1.86 -0.62 1.92 1.55
## a_tank[32] -0.30 2.13 -0.30 2.19 1.81
## a_tank[33] 3.22 3.08 3.51 3.39 2.95
## a_tank[34] 2.72 2.78 3.04 3.11 2.68
## a_tank[35] 2.72 2.77 3.02 3.12 2.68
## a_tank[36] 2.08 2.26 2.38 2.67 2.20
## a_tank[37] 2.07 2.26 2.09 2.36 2.28
## a_tank[38] 3.93 3.43 3.96 3.54 3.33
## a_tank[39] 2.72 2.76 2.75 2.86 2.73
## a_tank[40] 2.35 2.50 2.37 2.60 2.49
## a_tank[41] -1.80 0.92 -1.49 1.37 1.35
## a_tank[42] -0.58 1.91 -0.24 2.34 2.31
## a_tank[43] -0.46 2.01 -0.13 2.43 2.41
## a_tank[44] -0.34 2.11 -0.01 2.54 2.51
## a_tank[45] 0.58 2.91 0.59 2.95 2.52
## a_tank[46] -0.57 1.91 -0.56 1.96 1.56
## a_tank[47] 2.07 4.02 2.07 4.04 3.61
## a_tank[48] 0.01 2.42 0.02 2.45 2.04
## a 1.39 2.56 1.56 2.79 2.53
## sigma_tank 1.64 0.83 1.62 0.79 0.75
## bp NA -2.44 NA -2.44 -1.99
## bb NA NA -0.34 -0.46 0.11
## bpb NA NA NA NA -1.04
## nobs 48 48 48 48 48
# The full table above includes every parameter, all 48 varying intercepts among them. Adding a predictor always decreases the posterior mean of sigma_tank, the variation across tanks, because the predictors explain some of that variation and leave less for the varying intercepts to mop up. In the limit, if the predictor variables carried all of the information that determines the survival outcomes, there would be zero residual variation across tanks.
# Notice also that big reduces the variation much less than pred does. In these models big adds little predictive value, so accounting for it has minimal impact on the estimated variation across tanks.
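# To see this shrinking of sigma_tank directly, a small sketch comparing its posterior densities across three of the models fit above (rather than only the posterior means in coeftab):
post_sigma0 <- extract.samples(m0)$sigma_tank
post_sigmaP <- extract.samples(m_p)$sigma_tank
post_sigmaB <- extract.samples(m_b)$sigma_tank
dens( post_sigma0 , xlim=c(0,2.5) , xlab="sigma_tank" , col="black" )
dens( post_sigmaP , add=TRUE , col="red" )
dens( post_sigmaB , add=TRUE , col="blue" )
legend( "topright" , lty=1 , col=c("black","red","blue") ,
        legend=c("no predictors (m0)","pred only (m_p)","big only (m_b)") )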
13M2. Compare the models you fit just above, using WAIC. Can you reconcile the differences in WAIC with the posterior distributions of the models?
# The WAIC comparison table:
compare(m0,m_p,m_b,m_p_b,m_p_b_pb)
## WAIC SE dWAIC dSE pWAIC weight
## m_p_b_pb 1000.707 37.70761 0.0000000 NA 27.88807 0.360791998
## m_p_b 1000.877 37.45735 0.1694258 2.320931 28.17790 0.331487036
## m_p 1001.069 37.28174 0.3618515 2.692446 28.85786 0.301079960
## m0 1010.067 38.23091 9.3593022 7.138895 38.20901 0.003348962
## m_b 1010.101 38.24812 9.3935858 7.141564 38.09781 0.003292044
# The top three models, all of which include pred, are essentially tied, while the two models without pred (m0 and m_b) do clearly worse. This suggests that pred carries most of the predictive information and that big accounts for very little.
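# The same comparison can be visualized, assuming (as in the book) that the table returned by compare() can be passed straight to plot() to show WAIC with its standard errors:
plot( compare( m0 , m_p , m_b , m_p_b , m_p_b_pb ) )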
# The coefficients:
coeftab(m0,m_p,m_b,m_p_b,m_p_b_pb)
## m0 m_p m_b m_p_b m_p_b_pb
## a_tank[1] 2.15 2.49 2.43 2.80 2.46
## a_tank[2] 3.10 2.99 3.34 3.23 2.88
## a_tank[3] 1.01 1.69 1.32 2.06 1.73
## a_tank[4] 3.12 2.98 3.37 3.23 2.87
## a_tank[5] 2.18 2.49 2.23 2.64 2.49
## a_tank[6] 2.16 2.48 2.20 2.65 2.49
## a_tank[7] 3.11 2.96 3.19 3.11 2.90
## a_tank[8] 2.18 2.49 2.21 2.65 2.47
## a_tank[9] -0.17 2.24 0.17 2.60 2.51
## a_tank[10] 2.15 3.58 2.45 3.89 3.73
## a_tank[11] 1.01 3.01 1.30 3.35 3.23
## a_tank[12] 0.59 2.75 0.91 3.11 2.98
## a_tank[13] 1.01 3.01 1.05 3.08 2.71
## a_tank[14] 0.19 2.49 0.23 2.58 2.23
## a_tank[15] 2.14 3.59 2.18 3.65 3.28
## a_tank[16] 2.16 3.58 2.18 3.66 3.26
## a_tank[17] 2.95 2.90 3.24 3.22 2.82
## a_tank[18] 2.41 2.56 2.71 2.90 2.49
## a_tank[19] 2.01 2.28 2.34 2.65 2.22
## a_tank[20] 3.72 3.31 3.96 3.56 3.15
## a_tank[21] 2.40 2.57 2.44 2.67 2.54
## a_tank[22] 2.40 2.56 2.42 2.68 2.55
## a_tank[23] 2.40 2.56 2.43 2.67 2.55
## a_tank[24] 1.71 2.01 1.73 2.12 2.05
## a_tank[25] -1.00 1.58 -0.67 1.99 1.96
## a_tank[26] 0.17 2.53 0.49 2.94 2.89
## a_tank[27] -1.44 1.26 -1.11 1.68 1.64
## a_tank[28] -0.47 2.01 -0.14 2.42 2.38
## a_tank[29] 0.16 2.53 0.17 2.57 2.18
## a_tank[30] 1.44 3.53 1.47 3.56 3.15
## a_tank[31] -0.64 1.86 -0.62 1.92 1.55
## a_tank[32] -0.30 2.13 -0.30 2.19 1.81
## a_tank[33] 3.22 3.08 3.51 3.39 2.95
## a_tank[34] 2.72 2.78 3.04 3.11 2.68
## a_tank[35] 2.72 2.77 3.02 3.12 2.68
## a_tank[36] 2.08 2.26 2.38 2.67 2.20
## a_tank[37] 2.07 2.26 2.09 2.36 2.28
## a_tank[38] 3.93 3.43 3.96 3.54 3.33
## a_tank[39] 2.72 2.76 2.75 2.86 2.73
## a_tank[40] 2.35 2.50 2.37 2.60 2.49
## a_tank[41] -1.80 0.92 -1.49 1.37 1.35
## a_tank[42] -0.58 1.91 -0.24 2.34 2.31
## a_tank[43] -0.46 2.01 -0.13 2.43 2.41
## a_tank[44] -0.34 2.11 -0.01 2.54 2.51
## a_tank[45] 0.58 2.91 0.59 2.95 2.52
## a_tank[46] -0.57 1.91 -0.56 1.96 1.56
## a_tank[47] 2.07 4.02 2.07 4.04 3.61
## a_tank[48] 0.01 2.42 0.02 2.45 2.04
## a 1.39 2.56 1.56 2.79 2.53
## sigma_tank 1.64 0.83 1.62 0.79 0.75
## bp NA -2.44 NA -2.44 -1.99
## bb NA NA -0.34 -0.46 0.11
## bpb NA NA NA NA -1.04
## nobs 48 48 48 48 48
# Let’s focus on the beta coefficients. Note that the posterior means for bb are smaller in absolute value than those for bp. This is consistent with the WAIC comparison. In fact, the standard deviations on these coefficients are big enough that the bb posterior distributions overlap zero quite a bit. Consider for example the model m_b:
precis(m_b)
## 48 vector or matrix parameters hidden. Use depth=2 to show them.
## mean sd 5.5% 94.5% n_eff Rhat4
## bb -0.341903 0.4571136 -1.067289 0.3713163 548.9623 1.0069296
## a 1.556208 0.3476315 1.019272 2.1232440 867.6277 1.0029876
## sigma_tank 1.624580 0.2140942 1.314334 1.9854100 2243.3866 0.9993001
# The interval for bb comfortably overlaps zero, so this model provides little evidence that tadpole size matters for survival.
13M3. Re-estimate the basic Reed frog varying intercept model, but now using a Cauchy distribution in place of the Gaussian distribution for the varying intercepts. That is, fit this model: \[\begin{align} s_i ∼ Binomial(n_i, p_i) \\ logit(p_i) = α_{tank[i]} \\ α_{tank} ∼ Cauchy(α, σ) \\ α ∼ Normal(0, 1) \\ σ ∼ Exponential(1) \\ \end{align}\]
(You are likely to see many divergent transitions for this model. Can you figure out why? Can you fix them?) Compare the posterior means of the intercepts, αtank, to the posterior means produced in the chapter, using the customary Gaussian prior. Can you explain the pattern of differences? Take note of any change in the mean α as well.
m0_Cauchy <- map2stan(
alist(
surv ~ dbinom(density,p),
logit(p) <- a_tank[tank],
a_tank[tank] ~ dcauchy(a,scale_tank),
a ~ dnorm(0,1),
scale_tank ~ dexp(1)
),
data=d , chains=4 , warmup=1000 , iter=3000 ,
control=list(adapt_delta=0.99) , cores=4 )
## Warning: There were 1 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
## http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
## Warning: Examine the pairs() plot to diagnose sampling problems
## Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#tail-ess
## Computing WAIC
# Compare the posterior means of the a_tank parameters.
post_m0 <- extract.samples(m0)
a_tank_m0 <- apply(post_m0$a_tank,2,mean)
post_m0C <- extract.samples(m0_Cauchy)
a_tank_m0C <- apply(post_m0C$a_tank,2,mean)
# Plotting the posterior means.
plot( a_tank_m0 , a_tank_m0C , pch=16 , col=rangi2 ,
xlab="Gaussian prior" , ylab="Cauchy prior" )
abline(a=0,b=1,lty=2)
# The dashed line shows where the intercepts would be equal in the two models. For the majority of tanks, the Cauchy model produces posterior means that are essentially the same as those from the Gaussian model. But the intercepts that are already extreme under the Gaussian prior are much more extreme under the Cauchy prior. In those tanks, on the righthand side of the plot, all of the tadpoles survived, so using only the data from each tank alone the log-odds of survival are infinite. The adaptive prior applies pooling that shrinks those log-odds in from infinity, thankfully, but the Gaussian prior causes more shrinkage of the extreme values than the thick-tailed Cauchy prior does. That is what accounts for those 5 extreme points on the right of the plot above.
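# A quick sketch to check the claim above: tanks where every tadpole survived have raw log-odds of +Inf, and those are the points far to the right. If run in the same chunk as the plot above, this highlights them in red.
all_survived <- which( d$surv == d$density )
all_survived
points( a_tank_m0[all_survived] , a_tank_m0C[all_survived] ,
        pch=1 , cex=1.8 , col="red" )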
13M4. Now use a Student-t distribution with ν = 2 for the intercepts: \[\begin{align} α_{tank} ∼ Student(2, α, σ) \end{align}\]
Refer back to the Student-t example in Chapter 7 (page 234), if necessary. Compare the resulting posterior to both the original model and the Cauchy model in 13M3. Can you explain the differences and similarities in shrinkage in terms of the properties of these distributions?
m0_Studentt <- map2stan(
alist(
surv ~ dbinom(density,p),
logit(p) <- a_tank[tank],
a_tank[tank] ~ dstudent(2, a, sigma),
a ~ dnorm(0,1),
sigma ~ dcauchy(0,1)
),
data=d , chains=4 , warmup=1000 , iter=3000 ,
control=list(adapt_delta=0.99) , cores=4 )
## Computing WAIC
# Compare the posterior means of the a_tank parameters
post_m0S <- extract.samples(m0_Studentt)
a_tank_m0S <- apply(post_m0S$a_tank,2,mean)
# Plotting the posterior means
# original model vs. Student-t
plot( a_tank_m0 , a_tank_m0S , pch=16 , col=rangi2 ,
xlab="Gaussian prior" , ylab="Student-t prior" )
abline(a=0,b=1,lty=2)
# Cauchy vs. Student-t
plot( a_tank_m0S , a_tank_m0C , pch=16 , col=rangi2 ,
xlab="Student-t prior" , ylab="Cauchy prior" )
abline(a=0,b=1,lty=2)
# The dashed line shows where the intercepts would be equal in the two models. For the majority of tanks, the Student-t model produces posterior means essentially the same as those from the Gaussian model, but the intercepts that are extreme under the Gaussian prior are even more extreme under the Student-t prior. In those tanks, on the righthand side of the plot, all of the tadpoles survived, so using only the data from each tank alone the log-odds of survival are infinite. The adaptive prior pools information and shrinks those log-odds in from infinity, but the Gaussian prior causes more shrinkage of the extreme values than the heavier-tailed Student-t prior does. That is what accounts for the extreme points on the right of the first plot.
# The same logic applies to the comparison between the Student-t and Cauchy models: for most tanks the posterior means are essentially the same, but the extreme, all-survived tanks are shrunk somewhat more under the Student-t prior than under the Cauchy prior, because the Cauchy distribution (Student-t with ν = 1) has even heavier tails. Those tanks account for the points lying above the dashed line on the right of the second plot.
13M5. Modify the cross-classified chimpanzees model m13.4 so that the adaptive prior for blocks contains a parameter \(\bar{γ}\) for its mean: \[\begin{align} γ_j ∼ Normal(\bar{γ}, σ_γ) \\ \bar{γ} ∼ Normal(0, 1.5) \\ \end{align}\]
Compare this model to m13.4. What has including \(\bar{γ}\) done?
data(chimpanzees)
d1 <- chimpanzees
d1$treatment <- 1 + d1$prosoc_left + 2*d1$condition
dat_list <- list(
  pulled_left = d1$pulled_left,
  actor = d1$actor,
  block_id = d1$block,
  treatment = as.integer(d1$treatment) )
set.seed(13)
m13.4 <- ulam(
  alist(
    pulled_left ~ dbinom( 1 , p ),
    logit(p) <- a[actor] + g[block_id] + b[treatment],
    b[treatment] ~ dnorm( 0 , 0.5 ),
    ## adaptive priors
    a[actor] ~ dnorm( a_bar , sigma_a ),
    g[block_id] ~ dnorm( 0 , sigma_g ),
    ## hyper-priors
    a_bar ~ dnorm( 0 , 1.5 ),
    sigma_a ~ dexp( 1 ),
    sigma_g ~ dexp( 1 )
  ) , data=dat_list , chains=4 , cores=4 , log_lik=TRUE )
## Warning: The largest R-hat is 1.09, indicating chains have not mixed.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#r-hat
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#bulk-ess
## Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#tail-ess
m13.4m <- ulam(
  alist(
    pulled_left ~ dbinom( 1 , p ),
    logit(p) <- a[actor] + g[block_id] + b[treatment],
    b[treatment] ~ dnorm( 0 , 0.5 ),
    ## adaptive priors
    a[actor] ~ dnorm( a_bar , sigma_a ),
    g[block_id] ~ dnorm( g_bar , sigma_g ),
    ## hyper-priors
    a_bar ~ dnorm( 0 , 1.5 ),
    g_bar ~ dnorm( 0 , 1.5 ),
    sigma_a ~ dexp( 1 ),
    sigma_g ~ dexp( 1 )
  ) , data=dat_list , chains=4 , cores=4 , log_lik=TRUE )
## Warning: There were 42 divergent transitions after warmup. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## to find out why this is a problem and how to eliminate them.
## Warning: There were 355 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
## http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
## Warning: Examine the pairs() plot to diagnose sampling problems
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#bulk-ess
## Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#tail-ess
precis(m13.4)
## 17 vector or matrix parameters hidden. Use depth=2 to show them.
## mean sd 5.5% 94.5% n_eff Rhat4
## a_bar 0.6157559 0.7410185 -0.55148586 1.7829846 838.0549 1.003500
## sigma_a 2.0466279 0.7172961 1.20506491 3.3877901 791.7664 1.003690
## sigma_g 0.2003300 0.1694725 0.01559975 0.5183373 181.5827 1.041487
precis(m13.4m)
## 17 vector or matrix parameters hidden. Use depth=2 to show them.
## mean sd 5.5% 94.5% n_eff Rhat4
## a_bar 0.3425819 1.1624020 -1.44450437 2.2273162 145.1867 1.012201
## g_bar 0.3638466 1.1610283 -1.55370271 2.2319355 103.1375 1.011617
## sigma_a 2.0024581 0.6405096 1.19600665 3.1867852 656.9140 1.009125
## sigma_g 0.2379528 0.1847310 0.04469258 0.5719697 292.2262 1.019034
# The new model, m13.4m, samples noticeably worse: the n_eff values for the grand-mean parameters are much lower, and the chains produced divergent-transition and treedepth warnings. The reason is that adding g_bar over-parameterizes the model: the data can only inform the sum a_bar + g_bar, not the two grand means separately, so the two parameters trade off against one another and their individual posteriors are very wide. Even so, the inferences that matter (sigma_a, sigma_g, and the actor, block, and treatment effects) are practically identical to m13.4, so including g_bar adds nothing except sampling inefficiency.
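# A sketch to check the identification point made above: the data only inform the sum a_bar + g_bar in m13.4m, so its posterior should resemble the posterior of a_bar in m13.4 even though a_bar and g_bar are individually poorly identified.
post_134 <- extract.samples(m13.4)
post_134m <- extract.samples(m13.4m)
dens( post_134$a_bar , xlab="grand mean (log-odds)" , col="black" )
dens( post_134m$a_bar + post_134m$g_bar , add=TRUE , col="red" )
legend( "topright" , lty=1 , col=c("black","red") ,
        legend=c("m13.4: a_bar","m13.4m: a_bar + g_bar") )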
13M6. Sometimes the prior and the data (through the likelihood) are in conflict, because they concentrate around different regions of parameter space. What happens in these cases depends a lot upon the shape of the tails of the distributions. Likewise, the tails of the distributions strongly influence whether outliers are shrunk towards the mean or not. I want you to consider four different models to fit to one observation at y = 0. The models differ only in the distributions assigned to the likelihood and prior. Here are the four models:
\[\begin{align} Model \;NN: y &∼ Normal(μ, 1) & Model \;TN: y &∼ Student(2, μ, 1) \\ μ &∼ Normal(10, 1) & μ &∼ Normal(10, 1) \\ Model \;NT: y &∼ Normal(μ, 1) & Model \;TT: y &∼ Student(2, μ, 1) \\ μ &∼ Student(2, 10, 1) & μ &∼ Student(2, 10, 1) \\ \end{align}\]
Estimate the posterior distributions for these models and compare them. Can you explain the results, using the properties of the distributions?
data(chimpanzees)
d1=chimpanzees
# Model 1
m13m6.1 <- map2stan(
alist(
pulled_left ~ dnorm(mu , 1),
mu~dnorm(10,1)
),
data=d1 , iter=2000 , chains=1, cores=1
)
## [Stan sampling output trimmed: 1 chain, 2000 iterations (1000 warmup), about 0.04 seconds total]
## Computing WAIC
## Error in .local(fit, data, n, ...) :
## There appear to be no linear models here
# Model 2
m13m6.2 <- map2stan(
alist(
pulled_left ~ dstudent(2, mu , 1),
mu~dnorm(10,1)
),
data=d1 , iter=2000 , chains=1, cores=1
)
## [Stan sampling output trimmed: 1 chain, 2000 iterations (1000 warmup), about 0.2 seconds total]
## Computing WAIC
## Error in .local(fit, data, n, ...) :
## There appear to be no linear models here
# Model 3
m13m6.3 <- map2stan(
alist(
pulled_left ~ dnorm(mu , 1),
mu~dstudent(2,10,1)
),
data=d1 , iter=2000 , chains=1, cores=1
)
## [Stan sampling output trimmed: 1 chain, 2000 iterations (1000 warmup), about 0.04 seconds total]
## Computing WAIC
## Error in .local(fit, data, n, ...) :
## There appear to be no linear models here
# Model 4
m13m6.4 <- map2stan(
alist(
pulled_left ~ dstudent(2,mu , 1),
mu~dstudent(2,10,1)
),
data=d1 , iter=2000 , chains=1, cores=1
)
## [Stan sampling output trimmed: 1 chain, 2000 iterations (1000 warmup), about 0.2 seconds total]
## Computing WAIC
## Error in .local(fit, data, n, ...) :
## There appear to be no linear models here
precis(m13m6.1)
## mean sd 5.5% 94.5% n_eff Rhat4
## mu 0.597736 0.04669989 0.5232753 0.6714226 328.1431 1.002062
precis(m13m6.2)
## mean sd 5.5% 94.5% n_eff Rhat4
## mu 0.6186781 0.04505325 0.5463894 0.6856497 372.6287 0.9990035
precis(m13m6.3)
## mean sd 5.5% 94.5% n_eff Rhat4
## mu 0.5804294 0.04145138 0.5116512 0.64401 388.2238 1.003903
precis(m13m6.4)
## mean sd 5.5% 94.5% n_eff Rhat4
## mu 0.608028 0.04724995 0.5352785 0.6844646 241.1224 1.000429
post_m13m6.1 <- extract.samples(m13m6.1)
post_m13m6.2 <- extract.samples(m13m6.2)
post_m13m6.3 <- extract.samples(m13m6.3)
post_m13m6.4 <- extract.samples(m13m6.4)
dens(post_m13m6.2$mu, col="blue")
dens(post_m13m6.1$mu, col="red", add = TRUE)
dens(post_m13m6.3$mu, col="green", add=TRUE)
dens(post_m13m6.4$mu, col="black", add=TRUE)
abline(v=0.6, col = "grey")
legend("topright",
col = c("red","blue", "green", "black"),
pch = 19,
legend = c("Model 1","Model 2", "Model 3", "Model 4"))
# The tails of the Student-t distribution are heavier than those of the Normal distribution. As the posterior density plot shows, when the prior and the likelihood come from the same family the posterior tends to be a symmetric compromise, whereas mixing a thin-tailed distribution with a thick-tailed one lets the thin-tailed component dominate. This is the sense in which the tails of the distributions strongly influence whether outliers are shrunk towards the mean or not.
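# To see the tail behavior directly, a small sketch comparing the two prior densities used in these models (Normal(10,1) versus Student-t with nu = 2, location 10, scale 1), using base R densities:
curve( dnorm( x , 10 , 1 ) , from=0 , to=14 , col="red" , ylab="density" )
curve( dt( x - 10 , df=2 ) , add=TRUE , col="blue" )
legend( "topleft" , lty=1 , col=c("red","blue") ,
        legend=c("Normal(10,1)","Student-t(2,10,1)") )
# The Student-t prior keeps substantial density far from its center, so when prior and data conflict it gives up more easily and lets the likelihood dominate.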
EXTRA CREDIT (10 POINTS)
13H1. In 1980, a typical Bengali woman could have 5 or more children in her lifetime. By the year 2000, a typical Bengali woman had only 2 or 3. You’re going to look at a historical set of data, when contraception was widely available but many families chose not to use it. These data reside in data(bangladesh) and come from the 1988 Bangladesh Fertility Survey. Each row is one of 1934 women. There are six variables, but you can focus on two of them for this practice problem: (1) district: ID number of administrative district each woman resided in (2) use.contraception: An indicator (0/1) of whether the woman was using contraception The first thing to do is ensure that the cluster variable, district, is a contiguous set of integers. Recall that these values will be index values inside the model. If there are gaps, you’ll have parameters for which there is no data to inform them. Worse, the model probably won’t run. Look at the unique values of the district variable:
data(bangladesh)
d <- bangladesh
sort(unique(d$district)) #District 54 is absent. So district isn’t yet a good index variable, because it’s not contiguous. This is easy to fix. Just make a new variable that is contiguous. This is enough to do it:
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
## [51] 51 52 53 55 56 57 58 59 60 61
d$district_id <- as.integer(as.factor(d$district))
sort(unique(d$district_id)) # Now there are 60 values, contiguous integers 1 to 60. Now, focus on predicting use.contraception, clustered by district_id.
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
## [51] 51 52 53 54 55 56 57 58 59 60
Fit both (1) a traditional fixed-effects model that uses an index variable for district and (2) a multilevel model with varying intercepts for district. Plot the predicted proportions of women in each district using contraception, for both the fixed-effects model and the varying-effects model. That is, make a plot in which district ID is on the horizontal axis and expected proportion using contraception is on the vertical. Make one plot for each model, or layer them on the same plot, as you prefer. How do the models disagree? Can you explain the pattern of disagreement? In particular, can you explain the most extreme cases of disagreement, both why they happen where they do and why the models reach different inferences?
# prep trimmed data list
dlist <- list(
use_contraception = d$use.contraception,
district = d$district_id )
# fixed effects model
m12H1f <- map2stan(
alist(
use_contraception ~ dbinom( 1 , p ),
logit(p) <- a_district[district],
a_district[district] ~ dnorm(0,10)
),
data=dlist )
## [Stan sampling output trimmed: 1 chain, 2000 iterations (1000 warmup), about 35 seconds total]
## Computing WAIC
# varying effects model
m12H1v <- map2stan(
alist(
use_contraception ~ dbinom( 1 , p ),
logit(p) <- a + a_district[district],
a ~ dnorm(0,10),
a_district[district] ~ dnorm(0,sigma),
sigma ~ dcauchy(0,1)
),
data=dlist )
## [Stan sampling output trimmed: 1 chain, 2000 iterations (1000 warmup), about 24 seconds total]
## Computing WAIC
# get predicted probabilities for each district for each model
# first, set up a new data list to compute predictions for. In this list, each case is just a unique district.
pred.dat <- list(district=1:60)
# call link for each model, passing it the new cases to compute predicted probabilities for:
pred1 <- link(m12H1f,data=pred.dat)
## [ 1000 / 1000 ]
pred2 <- link(m12H1v,data=pred.dat)
## [ 1000 / 1000 ]
# Glimpse the predictions: a probability for each district was computed for each of the 1000 samples from the posterior distribution
str(pred1)
## num [1:1000, 1:60] 0.285 0.323 0.235 0.291 0.306 ...
# get average probabilities in each district:
p1.mean <- apply( pred1 , 2 , mean )
p2.mean <- apply( pred2 , 2 , mean )
round(p1.mean,2)
## [1] 0.26 0.35 0.96 0.50 0.36 0.29 0.28 0.38 0.30 0.08 0.00 0.35 0.42 0.63 0.36
## [16] 0.55 0.29 0.34 0.39 0.40 0.39 0.20 0.26 0.07 0.45 0.39 0.18 0.24 0.28 0.49
## [31] 0.45 0.21 0.43 0.66 0.50 0.35 0.54 0.29 0.50 0.47 0.50 0.54 0.53 0.22 0.33
## [46] 0.52 0.47 0.52 0.02 0.47 0.46 0.44 0.42 0.17 0.58 0.18 0.45 0.10 0.22 0.21
# plot the estimated proportions using contraception:
# take the estimated log-odds lists and use the logistic transform on them, to convert to probability scale
plot( 1:60 , p1.mean , col=rangi2 , pch=16 , xlab="District" ,
ylab="probability use contraception" )
points( 1:60 , p2.mean )
abline( h=logistic(coef(m12H1v)[1]) , lty=2 ) # plot line for 'a'
# The blue points are the fixed-effects estimates, and the open black ones are the varying effects. The dashed line is the average proportion of women using contraception in the entire sample (as estimated by the varying-effects model). Notice first that the black points are always closer to the dashed line, as was the case with the tadpole example in lecture. This results from shrinkage, which results from pooling information.
# There are cases with rather extreme disagreements, though. The most obvious is district 3, which has a fixed (blue) estimate near 1 but a varying (black) estimate of only about 0.44. There are also two districts (11 and 49) for which the fixed estimates are essentially zero, but the varying estimates are about 0.18 and 0.30. In these districts, either all sampled women used contraception or none did, so the fixed-effects estimates are silly: the parameter estimates sit far from zero with very large standard errors. The varying-effects model produces more reasonable estimates because it pools information from the other districts. Depending upon how many women were sampled in each district, there is more or less shrinkage (pooling) towards the grand mean. For example, district 3 had only 2 women in the sample, so there is a lot of distance between the blue and black points. In contrast, district 11 had 21 women in the sample, so while pooling pulls the estimate off of zero to about 0.18, it does not pull it nearly as far as in district 3.
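# A quick sketch to check the districts discussed above: their sample sizes and raw proportions explain why the fixed-effects estimates are so extreme there.
for ( did in c(3,11,49) ) {
  n_did <- sum( d$district_id == did )
  p_raw <- mean( d$use.contraception[ d$district_id == did ] )
  cat( "district" , did , ": n =" , n_did , ", raw proportion =" , round(p_raw,2) , "\n" )
}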
# Another way to think of this phenomenon is to view the same estimates arranged by the number of women sampled in each district on the horizontal axis, with the distance (absolute value of the difference) between the fixed and varying estimates on the vertical axis. Here's what that looks like:
# compute number of women sampled in each district
n_by_district <- sapply( 1:60 , function(did) length(d$district_id[d$district_id==did]) )
# compute shrinkage
shrinkage <- abs( p1.mean - p2.mean )
plot( n_by_district , shrinkage , col ="slateblue" ,
xlab ="number of women sampled" , ylab ="shrinkage by district" )
# The districts with fewer women sampled show a lot more shrinkage, because there is less information in them. As a result, they are expected to overfit more, and so they are shrunk more towards the overall mean.