Chapter 13 - Models with Memory

This chapter has been an introduction to the motivation, implementation, and interpretation of basic multilevel models. It focused on varying intercepts, which achieve better estimates of baseline differences among clusters in the data. They achieve better estimates because they simultaneously model the population of clusters and use inferences about the population to pool information among parameters. From another perspective, varying intercepts are adaptively regularized parameters, relying upon a prior that is itself learned from the data.

Place each answer inside the code chunk (grey box). Each code chunk should contain a text response or code that completes or answers the question or activity requested. Make sure to include plots if the question requests them.

Finally, upon completion, name your final output .html file YourName_ANLY505-Year-Semester.html, publish the assignment to your RPubs account, and submit the link to Canvas. Each question is worth 5 points.

Questions

13-1. Revisit the Reed frog survival data, data(reedfrogs), and add the predation and size treatment variables to the varying intercepts model. Consider models with either main effect alone, both main effects, as well as a model including both and their interaction. Instead of focusing on inferences about these two predictor variables, focus on the inferred variation across tanks. Explain why it changes as it does across models. Plot the sigma estimates.

library(rethinking) 
data(reedfrogs) 
d <- reedfrogs 
str(d)
## 'data.frame':    48 obs. of  5 variables:
##  $ density : int  10 10 10 10 10 10 10 10 10 10 ...
##  $ pred    : Factor w/ 2 levels "no","pred": 1 1 1 1 1 1 1 1 2 2 ...
##  $ size    : Factor w/ 2 levels "big","small": 1 1 1 1 2 2 2 2 1 1 ...
##  $ surv    : int  9 10 7 10 9 9 10 9 4 9 ...
##  $ propsurv: num  0.9 1 0.7 1 0.9 0.9 1 0.9 0.4 0.9 ...
d$pred <- ifelse( d$pred=="no" , 0 , 1 ) 
d$big <- ifelse( d$size=="big" , 1 , 0 )
d$tank <- 1:nrow(d) 
# m0 <- map2stan( 
#   alist( 
#     surv ~ dbinom(density,p), 
#     logit(p) <- a_tank[tank], 
#     a_tank[tank] ~ dnorm(a,sigma_tank), 
#     a ~ dnorm(0,10), 
#     sigma_tank ~ dcauchy(0,1)
#     ), 
#   data=d , chains=4 )
# 
# 
# 
# m_p <- map2stan( 
#   alist( 
#     surv ~ dbinom(density,p), 
#     logit(p) <- a_tank[tank] + bp*pred, 
#     bp ~ dnorm(0,1), 
#     a_tank[tank] ~ dnorm(a,sigma_tank), 
#     a ~ dnorm(0,10), 
#     sigma_tank ~ dcauchy(0,1)
#     ), 
#   data=d , chains=4 )
# 
# m_b<-map2stan(
#   alist(
#     surv~dbinom(density,p), 
#     logit(p)<-a_tank[tank]+bb*big, 
#     bb~dnorm(0,1), 
#     a_tank[tank]~dnorm(a,sigma_tank), 
#     a~dnorm(0,10), 
#     sigma_tank~dcauchy(0,1)
#     ), 
#   data=d,chains=4)
# 
# 
# m_p_b<-map2stan(
#   alist(
#     surv~dbinom(density,p), 
#     logit(p)<-a_tank[tank]+bp*pred+bb*big, 
#     c(bp,bb)~dnorm(0,1), 
#     a_tank[tank]~dnorm(a,sigma_tank), 
#     a~dnorm(0,10), 
#     sigma_tank~dcauchy(0,1)
#     ), 
#   data=d,chains=4)
# 
# m_p_b_pb<-map2stan(
#   alist(
#     surv~dbinom(density,p), 
#     logit(p)<-a_tank[tank]+bp*pred+bb*big+bpb*pred*big, 
#     c(bp,bb,bpb)~dnorm(0,1), 
#     a_tank[tank]~dnorm(a,sigma_tank), 
#     a~dnorm(0,10), 
#     sigma_tank~dcauchy(0,1)
#     ), 
#   data=d,chains=4)
# 
# 
# coeftab(m0,m_p,m_b,m_p_b,m_p_b_pb)

The varying intercepts and their priors remain the same throughout. Adding a predictor always decreased the posterior mean variation across tanks, sigma_tank. Why? Because the predictors explain some of the variation in survival among tanks, leaving less variation for the varying intercepts to mop up. In theory, if the predictor variables contained all of the relevant information that determines the survival outcomes, there would be zero variation left across tanks.
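The question also asks for a plot of the sigma estimates. A minimal sketch, assuming the five models above have been fit (commented out here for the same reason as the model code): it plots the posterior mean and 89% interval of sigma_tank for each model.

# post_sigma <- sapply(
#   list(m0=m0, m_p=m_p, m_b=m_b, m_p_b=m_p_b, m_p_b_pb=m_p_b_pb),
#   function(m) extract.samples(m)$sigma_tank )
# sigma_mu <- apply(post_sigma, 2, mean)
# sigma_PI <- apply(post_sigma, 2, PI)
# plot( 1:5 , sigma_mu , ylim=range(sigma_PI) , xaxt="n" ,
#       xlab="model" , ylab="sigma_tank" , pch=16 , col=rangi2 )
# axis( 1 , at=1:5 , labels=colnames(post_sigma) )
# for ( i in 1:5 ) lines( c(i,i) , sigma_PI[,i] )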

13-2. Compare the models you fit just above, using WAIC. Can you reconcile the differences in WAIC with the posterior distributions of the models? Show the WAIC table.

# compare(m0,m_p,m_b,m_p_b,m_p_b_pb)
# 
# coeftab(m0,m_p,m_b,m_p_b,m_p_b_pb)
# 
# precis(m_b)

The top three models are almost perfectly tied, suggesting that the size treatment (big) accounts for very little. Note that the posterior means for bb are smaller in absolute value than those for bp, which is consistent with the WAIC comparison. In fact, the standard deviations on these coefficients are large enough that the bb posterior distributions overlap zero quite a bit.

13-3. Re-estimate the basic Reed frog varying intercept model, but now using a Cauchy distribution in place of the Gaussian distribution for the varying intercepts. That is, fit this model: \[\begin{align} s_i &\sim \text{Binomial}(n_i, p_i) \\ \text{logit}(p_i) &= \alpha_{\text{tank}[i]} \\ \alpha_{\text{tank}} &\sim \text{Cauchy}(\alpha, \sigma) \\ \alpha &\sim \text{Normal}(0, 1) \\ \sigma &\sim \text{Exponential}(1) \end{align}\]

(You are likely to see many divergent transitions for this model. Can you figure out why? Can you fix them?) Plot and compare the posterior means of the intercepts, α_tank, to the posterior means produced in the chapter, using the customary Gaussian prior. Can you explain the pattern of differences? Take note of any change in the mean α as well.

# m0_Cauchy <- map2stan( 
#   alist( 
#     surv ~ dbinom(density,p), 
#     logit(p) <- a_tank[tank], 
#     a_tank[tank] ~ dcauchy(a,scale_tank), 
#     a ~ dnorm(0,1), 
#     scale_tank ~ dexp(1)
#     ), 
#   data=d , chains=4 , warmup=1000 , iter=3000 , 
#   control=list(adapt_delta=0.99) , cores=4 )
# 
# 
# post_m0 <- extract.samples(m0) 
# a_tank_m0 <- apply(post_m0$a_tank,2,mean) 
# post_m0C <- extract.samples(m0_Cauchy) 
# a_tank_m0C <- apply(post_m0C$a_tank,2,mean) 
# plot( a_tank_m0 , a_tank_m0C , pch=16 , col=rangi2 , xlab="Gaussian prior" , ylab="Cauchy prior" ) 
# abline(a=0,b=1,lty=2)

The dashed line shows the values for which the intercepts are equal in the two models. For the majority of tank intercepts, the Cauchy model produces posterior means that are essentially the same as those from the Gaussian model. But the intercepts that are large under the Gaussian prior are much more extreme under the Cauchy prior. For those tanks, on the right-hand side of the plot, all of the tadpoles survived, so using only the data from each tank alone, the log-odds of survival are infinite. The adaptive prior applies pooling that shrinks those log-odds inwards from infinity, thankfully. But the Gaussian prior causes more shrinkage of the extreme values than the thick-tailed Cauchy prior does, which is what accounts for the extreme points on the right of the plot above. As for the divergent transitions: the heavy tails of the Cauchy prior create regions of extreme posterior curvature that the sampler cannot follow accurately, which is why the model above raises adapt_delta and uses a longer warmup.
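Another option for taming the divergent transitions is a non-centered parameterization, which samples standardized Cauchy deviates and shifts and scales them inside the linear model. A sketch in ulam syntax (hypothetical and not run here; the heavy Cauchy tails may still produce a few divergences):

# m0_Cauchy_nc <- ulam( 
#   alist( 
#     surv ~ dbinom(density,p), 
#     logit(p) <- a + scale_tank*z[tank], 
#     z[tank] ~ dcauchy(0,1), 
#     a ~ dnorm(0,1), 
#     scale_tank ~ dexp(1)
#     ), 
#   data=list(surv=d$surv, density=d$density, tank=d$tank) , 
#   chains=4 , cores=4 , control=list(adapt_delta=0.99) )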

13-4. Now use a Student-t distribution with ν = 2 for the intercepts: \[\begin{align} \alpha_{\text{tank}} \sim \text{Student}(2, \alpha, \sigma) \end{align}\]

Refer back to the Student-t example in Chapter 7 (page 234), if necessary. Plot and compare the resulting posterior to both the original model and the Cauchy model in 13-3. Can you explain the differences and similarities in shrinkage in terms of the properties of these distributions?

dat <- list(
  surv_ = d$surv,
  dens_ = d$density,
  tank_ = 1:nrow(d),
  pred_ = d$pred,  # already recoded to 0/1 above
  size_ = ifelse(d$size == "small", 1L, 2L)
)
# m_student <- map2stan(
#   alist(
#     surv_ ~ dbinom(dens_, p),
#     logit(p) <- a[tank_],
#     a[tank_] ~ dstudent(2, a_1, sigma),
#     a_1 ~ dnorm(0,1),
#     sigma ~ dcauchy(0,1)
#     ),
#   data=dat, chains=4, cores=4, warmup=1000, 
#   iter=3000, control=list(adapt_delta=0.99)
# )
# 
# s_student <- apply(extract.samples(m_student)$a, 2, mean)
# m_tank <- ulam(
#   alist(
#     surv_ ~ dbinom(dens_, p),
#     logit(p) <- a[tank_],
#     a[tank_] ~ dnorm(a_1, sigma),
#     a_1 ~ dnorm(0, 1.5),
#     sigma ~ dexp(1)
#   ),
#   data = dat, chains = 4, cores = 4, log_lik = TRUE, iter = 2e3
# )
# 
# s_tank <- apply(extract.samples(m_tank)$a, 2, mean)
#  
# plot(s_tank, s_student,
#   pch = 16, col = "red",
#   xlab = "Gaussian intercept", ylab = "student-t intercept"
# )
# abline(a = 0, b = 1, lty = 2)
# 

For the extreme tanks, those in which every tadpole survived, the Student-t model produces more extreme posterior means than the Gaussian model but less extreme ones than the Cauchy model. In other words, the Gaussian prior causes the most shrinkage of extreme values, the Student-t prior less, and the Cauchy prior the least. The ordering follows the thickness of the tails: a thicker-tailed prior treats extreme intercepts as less surprising, and so pools those tanks less strongly toward the mean α.
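The ordering is easy to see by plotting the three densities directly. This illustrative sketch uses dstudent from the rethinking package, which takes the degrees of freedom as its second argument:

curve( dnorm(x,0,1) , from=-8 , to=8 , lty=1 , xlab="alpha" , ylab="density" )
curve( dstudent(x,2,0,1) , add=TRUE , lty=2 )
curve( dcauchy(x,0,1) , add=TRUE , lty=3 )
legend( "topright" , lty=1:3 ,
        legend=c("Normal(0,1)","Student-t(2,0,1)","Cauchy(0,1)") )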

13-5. Sometimes the prior and the data (through the likelihood) are in conflict, because they concentrate around different regions of parameter space. What happens in these cases depends a lot upon the shape of the tails of the distributions. Likewise, the tails of distributions strongly influence whether outliers are shrunk towards the mean or not. I want you to consider four different models fit to a single observation at y = 0; this is your entire data set, so do not use any other data. The models differ only in the distributions assigned to the likelihood and prior. Here are the four models:

\[\begin{align} \text{Model NN:} \quad y &\sim \text{Normal}(\mu, 1) & \text{Model TN:} \quad y &\sim \text{Student}(2, \mu, 1) \\ \mu &\sim \text{Normal}(10, 1) & \mu &\sim \text{Normal}(10, 1) \\ \text{Model NT:} \quad y &\sim \text{Normal}(\mu, 1) & \text{Model TT:} \quad y &\sim \text{Student}(2, \mu, 1) \\ \mu &\sim \text{Student}(2, 10, 1) & \mu &\sim \text{Student}(2, 10, 1) \end{align}\]

Estimate and plot the posterior distributions against the likelihoods for these models and compare them. Can you explain the results, using the properties of the distributions?

# The entire data set is a single observation at y = 0.
dat_y <- list( y = 0 )

# # Model NN
# m_NN <- map2stan( 
#   alist(
#     y ~ dnorm(mu, 1),
#     mu ~ dnorm(10, 1)
#   ), 
#   data = dat_y, iter = 2000, chains = 1, cores = 1 
# )
# 
# precis(m_NN)
# 
# # Model TN
# m_TN <- map2stan( 
#   alist(
#     y ~ dstudent(2, mu, 1),
#     mu ~ dnorm(10, 1)
#   ), 
#   data = dat_y, iter = 2000, chains = 1, cores = 1 
# )
# 
# # Model NT
# m_NT <- map2stan( 
#   alist(
#     y ~ dnorm(mu, 1),
#     mu ~ dstudent(2, 10, 1)
#   ), 
#   data = dat_y, iter = 2000, chains = 1, cores = 1 
# )
# 
# precis(m_NT)
# 
# # Model TT
# m_TT <- map2stan( 
#   alist(
#     y ~ dstudent(2, mu, 1),
#     mu ~ dstudent(2, 10, 1)
#   ), 
#   data = dat_y, iter = 2000, chains = 1, cores = 1 
# )
# 
# mu_NN <- extract.samples(m_NN)
# mu_TN <- extract.samples(m_TN)
# mu_NT <- extract.samples(m_NT)
# mu_TT <- extract.samples(m_TT)
# 
# dens(mu_NN$mu, col="blue")
# dens(mu_TN$mu, col="red", add = TRUE)
# dens(mu_NT$mu, col="green", add=TRUE)
# dens(mu_TT$mu, col="purple", add=TRUE)
# abline(v = 0, col = "grey")            # the single observation y = 0
# abline(v = 10, col = "grey", lty = 2)  # the prior mean
# legend("topright",
#        col = c("blue","red", "green", "purple"),
#        pch = 19,
#        legend = c("Model NN","Model TN", "Model NT", "Model TT"))
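As a worked check on the results: Model NN is conjugate, and with a single observation y = 0 the Normal(μ, 1) likelihood and the Normal(10, 1) prior carry equal precision, so the posterior splits the difference exactly:

\[\begin{align} \mu \mid y \sim \text{Normal}\!\left(\frac{0 + 10}{2},\ \frac{1}{\sqrt{2}}\right) \approx \text{Normal}(5, 0.71) \end{align}\]

For the other three models we would expect the thinner-tailed distribution to dominate: in Model TN the Normal prior wins and the posterior concentrates near 10, with the heavy-tailed likelihood treating y = 0 as an outlier; in Model NT the Normal likelihood wins and the posterior concentrates near 0; and in Model TT neither distribution dominates, so the posterior is bimodal, with modes near both 0 and 10.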