Chapter 13 - Models With Memory

This chapter has been an introduction to the motivation, implementation, and interpretation of basic multilevel models. It focused on varying intercepts, which achieve better estimates of baseline differences among clusters in the data. They achieve better estimates, because they simultaneously model the population of clusters and use inferences about the population to pool information among parameters. From another perspective, varying intercepts are adaptively regularized parameters, relying upon a prior that is itself learned from the data.

Place each answer inside the code chunk (grey box). The code chunks should contain a text response or code that completes/answers the question or activity requested. Make sure to include plots if the question requests them. Problems are labeled Easy (E), Medium (M), and Hard (H).

Finally, upon completion, name your final output .html file YourName_ANLY505-Year-Semester.html, publish the assignment to your R Pubs account, and submit the link to Canvas. Each question is worth 5 points.

Questions

13E1. Which of the following priors will produce more shrinkage in the estimates? Show the prior plot. \[\begin{align} \ α_{TANK} ∼ Normal(0, 1) \tag{a} \\ \ α_{TANK} ∼ Normal(0, 2) \tag{b} \\ \end{align}\]

# (a) would produce more shrinkage: its distribution is narrower, so it is a more informative prior that pulls the tank intercepts more strongly toward the mean.
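# Sketch of the prior plot the question asks for (simply overlaying the two
# Normal densities; any density plot would do):
curve(dnorm(x, 0, 1), from = -6, to = 6, col = "blue",
  xlab = "alpha_tank", ylab = "prior density")
curve(dnorm(x, 0, 2), add = TRUE, col = "red")
legend("topright", legend = c("(a) Normal(0, 1)", "(b) Normal(0, 2)"),
  col = c("blue", "red"), lty = 1)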

13E2. Rewrite the following model as a multilevel model. \[\begin{align} y_i ∼ Binomial(1, p_i) \\ logit(p_i) = α_{group[i]} + βx_i \\ α_{group} ∼ Normal(0, 1.5) \\ β ∼ Normal(0, 0.5) \\ \end{align}\]

# We can turn the fixed prior on α_group into an adaptive prior by adding hyperparameters (α1, σ_α1):
# y_i ∼ Binomial(1, p_i)
# logit(p_i) = α_{group[i]} + βx_i
# α_{group} ∼ Normal(α1, σ_α1)
# α1 ~ Normal(0,1.5)
# σ_α1 ~ Exponential(1)
# β ∼ Normal(0, 0.5)
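# For reference, a hedged sketch (not run; hypothetical data list with y, x,
# and a group index) of how this multilevel model would look in ulam() syntax:
# m_13E2 <- ulam(
#   alist(
#     y ~ dbinom(1, p),
#     logit(p) <- a[group] + b * x,
#     a[group] ~ dnorm(a1, sigma_a1),
#     a1 ~ dnorm(0, 1.5),
#     sigma_a1 ~ dexp(1),
#     b ~ dnorm(0, 0.5)
#   ),
#   data = dat, chains = 4, cores = 4
# )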

13E3. Rewrite the following model as a multilevel model. \[\begin{align} y_i ∼ Normal(μ_i, σ) \\ μ_i = α_{group[i]} + βx_i \\ α_{group} ∼ Normal(0, 5) \\ β ∼ Normal(0, 1) \\ σ ∼ Exponential(1) \\ \end{align}\]

# Same idea as in 13E2: add hyperparameters (α1, σ_α1) for the group intercepts, keeping the Gaussian likelihood and identity link.
# y_i ∼ Normal(μ_i, σ)
# μ_i = α_{group[i]} + βx_i
# α_{group} ∼ Normal(α1, σ_α1)
# α1 ~ Normal(0,5)
# σ_α1 ~ Exponential(1)
# β ∼ Normal(0, 1)
# σ ∼ Exponential(1)

13E4. Write a mathematical model formula for a Poisson regression with varying intercepts.

# We can adapt the 13E2 model to a Poisson likelihood, which uses a log link rather than a logit link.
# y_i ∼ Poisson(λ_i)
# log(λ_i) = α_{group[i]} + βx_i
# α_{group} ∼ Normal(α1, σ_α1)
# α1 ~ Normal(0,1.5)
# σ_α1 ~ Exponential(1)
# β ∼ Normal(0, 0.5)

13E5. Write a mathematical model formula for a Poisson regression with two different kinds of varying intercepts, a cross-classified model.

# We can add a second kind of varying intercept, for example a day cluster, to the 13E4 model.
# y_i ∼ Poisson(λ_i)
# log(λ_i) = α_{group[i]} + γ_{day[i]} + βx_i
# α_{group} ∼ Normal(α1, σ_α1)
# α1 ~ Normal(0, 1.5)
# σ_α1 ~ Exponential(1)
# γ_day ~ Normal(γ1, σ_γ)
# γ1 ~ Normal(0, 1.5)
# σ_γ ~ Exponential(1)
# β ∼ Normal(0, 0.5)
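# A hedged sketch (not run; hypothetical data list with y, x, group, and day
# indices) of the cross-classified model in ulam() syntax. In practice only one
# cluster usually gets a free mean (as in m13.4 later), because the two means
# are not separately identified, so the day intercepts are centered at zero here:
# m_13E5 <- ulam(
#   alist(
#     y ~ dpois(lambda),
#     log(lambda) <- a[group] + g[day] + b * x,
#     a[group] ~ dnorm(a1, sigma_a1),
#     g[day] ~ dnorm(0, sigma_g),
#     a1 ~ dnorm(0, 1.5),
#     sigma_a1 ~ dexp(1),
#     sigma_g ~ dexp(1),
#     b ~ dnorm(0, 0.5)
#   ),
#   data = dat, chains = 4, cores = 4
# )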

13M1. Revisit the Reed frog survival data, data(reedfrogs), and add the predation and size treatment variables to the varying intercepts model. Consider models with either main effect alone, both main effects, as well as a model including both and their interaction. Instead of focusing on inferences about these two predictor variables, focus on the inferred variation across tanks. Explain why it changes as it does across models. Plot the sigma estimates.

# First, let's load the data
data(reedfrogs)
d <- reedfrogs
dat <- list(
  surv_ = d$surv,
  dens_ = d$density,
  tank_ = 1:nrow(d),
  pred_ = ifelse(d$pred == "no", 0L, 1L),
  size_ = ifelse(d$size == "small", 1L, 2L)
)

# Now run ulam with each one
# tank model
m_tank <- ulam(
  alist(
    surv_ ~ dbinom(dens_, p),
    logit(p) <- a[tank_],
    a[tank_] ~ dnorm(a_1, sigma),
    a_1 ~ dnorm(0, 1.5),
    sigma ~ dexp(1)
  ),
  data = dat, chains = 4, cores = 4, log_lik = TRUE, iter = 2e3
)
# predation model
m_pred <- ulam(
  alist(
    surv_ ~ dbinom(dens_, p),
    logit(p) <- a[tank_] + bp * pred_,
    a[tank_] ~ dnorm(a_1, sigma),
    bp ~ dnorm(-0.5, 1),
    a_1 ~ dnorm(0, 1.5),
    sigma ~ dexp(1)
  ),
  data = dat, chains = 4, cores = 4, log_lik = TRUE, iter = 2e3
)
# size model
m_size <- ulam(
  alist(
    surv_ ~ dbinom(dens_, p),
    logit(p) <- a[tank_] + s[size_],
    a[tank_] ~ dnorm(a_1, sigma),
    s[size_] ~ dnorm(0, 0.5),
    a_1 ~ dnorm(0, 1.5),
    sigma ~ dexp(1)
  ),
  data = dat, chains = 4, cores = 4, log_lik = TRUE, iter = 2e3
)
# predation + size model
m_pred_size <- ulam(
  alist(
    surv_ ~ dbinom(dens_, p),
    logit(p) <- a[tank_] + bp * pred_ + s[size_],
    a[tank_] ~ dnorm(a_1, sigma),
    bp ~ dnorm(-0.5, 1),
    s[size_] ~ dnorm(0, 0.5),
    a_1 ~ dnorm(0, 1.5),
    sigma ~ dexp(1)
  ),
  data = dat, chains = 4, cores = 4, log_lik = TRUE, iter = 2e3
)
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#bulk-ess
## Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#tail-ess
# predation + size + interaction model (note: the tank intercepts here use a non-centered parameterization, a_1 + a[tank_] * sigma)
m_Interaction <- ulam(
  alist(
    surv_ ~ dbinom(dens_, p),
    logit(p) <- a_1 + a[tank_] * sigma + bp[size_] * pred_ + s[size_],
    a[tank_] ~ dnorm(0, 1),
    bp[size_] ~ dnorm(-0.5, 1),
    s[size_] ~ dnorm(0, 0.5),
    a_1 ~ dnorm(0, 1.5),
    sigma ~ dexp(1)
  ),
  data = dat, chains = 4, cores = 4, log_lik = TRUE, iter = 2e3
)
# Now we can plot the sigma estimates
plot(coeftab(m_tank, m_pred, m_size, m_pred_size, m_Interaction),
  pars = "sigma",
  labels = c("Tank", "Predation", "Size", "Pred+Size", "Interaction")
)

# From the plot, the estimated sigma (the variation across tank intercepts) drops substantially whenever predation is included, while adding size barely changes it. Predation soaks up much of the between-tank variation, so once it is in the model there is less variation left for the varying intercepts to absorb; in other words, predation explains a lot of the variation across tanks.
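# A quick numeric check of the posterior mean of sigma in each model (using the
# models fit above):
round(sapply(
  list(tank = m_tank, pred = m_pred, size = m_size,
    pred_size = m_pred_size, interaction = m_Interaction),
  function(m) mean(extract.samples(m)$sigma)
), 2)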

13M2. Compare the models you fit just above, using WAIC. Can you reconcile the differences in WAIC with the posterior distributions of the models? Show the WAIC table.

# Let's compare those models
compare(m_tank, m_pred, m_size, m_pred_size, m_Interaction)
##                   WAIC       SE      dWAIC      dSE    pWAIC    weight
## m_pred_size   199.1285 8.649841 0.00000000       NA 18.70520 0.2538498
## m_pred        199.1500 8.929488 0.02142576 1.798831 19.22157 0.2511449
## m_Interaction 199.4522 8.971015 0.32367364 2.110151 19.00102 0.2159196
## m_tank        200.2225 7.306816 1.09400922 5.513615 20.98056 0.1468980
## m_size        200.4336 7.217108 1.30503883 5.673562 21.02723 0.1321878
# From the table, all of the models have nearly identical WAIC values (differences of about one point, well within the standard errors), so they make essentially the same out-of-sample predictions.
precis(m_pred, 2)
##             mean        sd       5.5%     94.5%     n_eff     Rhat4
## a[1]   2.4592915 0.6732850  1.4185497  3.582882 3235.1018 1.0000705
## a[2]   2.9574659 0.7486916  1.8183877  4.215616 2687.8438 1.0002782
## a[3]   1.7069418 0.6156903  0.7333352  2.695904 2602.0584 0.9999378
## a[4]   2.9646059 0.7369168  1.8505048  4.195904 3304.5614 1.0007902
## a[5]   2.4749595 0.7001932  1.3718910  3.641939 3609.8439 1.0007038
## a[6]   2.4681658 0.6907582  1.4010020  3.617751 3622.3496 1.0005511
## a[7]   2.9564306 0.7400787  1.8373015  4.205702 2720.1372 0.9999053
## a[8]   2.4521296 0.6781439  1.4084171  3.550352 3294.0457 1.0008883
## a[9]   2.2201546 0.5651055  1.2947847  3.108444 1291.0815 1.0038553
## a[10]  3.5734242 0.6501906  2.5837051  4.643148 1477.4699 1.0014425
## a[11]  2.9964508 0.5905283  2.0789048  3.961233 1570.2813 1.0008466
## a[12]  2.7361576 0.5683771  1.8458475  3.656208 1514.4395 1.0035316
## a[13]  2.9953193 0.5829300  2.0708033  3.896563 1897.1079 1.0021904
## a[14]  2.4740846 0.5599938  1.5786962  3.374632 1630.4936 1.0015852
## a[15]  3.5798818 0.6262039  2.6296836  4.599460 1427.2623 1.0016980
## a[16]  3.5779012 0.6394273  2.6026894  4.619262 1645.8312 1.0024703
## a[17]  2.8912067 0.6249148  1.9308571  3.917659 3224.0132 0.9997023
## a[18]  2.5652669 0.5779452  1.6973165  3.523928 3819.2049 1.0001794
## a[19]  2.2629276 0.5312721  1.4298299  3.135927 3845.9225 1.0002847
## a[20]  3.2595402 0.6982691  2.2286374  4.409218 2934.9816 1.0001890
## a[21]  2.5535628 0.5680692  1.6852574  3.501948 3159.2003 1.0003370
## a[22]  2.5466075 0.5695232  1.6858378  3.491601 2936.5310 0.9998303
## a[23]  2.5595440 0.5667495  1.6805244  3.484338 4000.0840 1.0007128
## a[24]  2.0030242 0.4956023  1.2409594  2.811617 3683.1554 0.9994428
## a[25]  1.5552211 0.4808941  0.7758043  2.316153  893.2704 1.0036748
## a[26]  2.5123725 0.4513869  1.7871944  3.236510  867.8380 1.0056675
## a[27]  1.2332963 0.5141419  0.3971678  2.031053 1100.8337 1.0050426
## a[28]  1.9876048 0.4608152  1.2253275  2.697330  933.8394 1.0037826
## a[29]  2.5163815 0.4474102  1.8078872  3.231440  903.9137 1.0039371
## a[30]  3.5084840 0.4878258  2.7452355  4.307039  899.1604 1.0044722
## a[31]  1.8479773 0.4671143  1.0999094  2.593261 1000.1125 1.0040477
## a[32]  2.1219449 0.4640074  1.3792046  2.875426  936.6817 1.0051614
## a[33]  3.0523489 0.5950608  2.1782626  4.016496 3050.4260 1.0000140
## a[34]  2.7555718 0.5583877  1.9128803  3.688271 3440.4364 1.0003991
## a[35]  2.7539314 0.5393730  1.9280803  3.681025 3473.4729 0.9998731
## a[36]  2.2668479 0.4760857  1.5278761  3.052407 3145.1277 1.0005683
## a[37]  2.2576907 0.4793543  1.5254593  3.054757 4173.9743 1.0007777
## a[38]  3.4098828 0.6562228  2.4333784  4.513628 2986.2838 0.9996588
## a[39]  2.7679605 0.5564047  1.9243578  3.685746 3022.1475 1.0004789
## a[40]  2.4881041 0.5012489  1.7440147  3.336282 3673.7956 1.0012898
## a[41]  0.9115801 0.4921350  0.1119326  1.686469 1020.5575 1.0039200
## a[42]  1.8989977 0.4289231  1.2095635  2.581892  783.6833 1.0072834
## a[43]  1.9968587 0.4252021  1.2935423  2.648998  829.8934 1.0043460
## a[44]  2.0976409 0.4370565  1.4006312  2.793748  814.8003 1.0060832
## a[45]  2.8976279 0.4203787  2.2319816  3.581518  754.7904 1.0045460
## a[46]  1.8979599 0.4210693  1.2182089  2.571786  832.7123 1.0050272
## a[47]  4.0051206 0.4886134  3.2367944  4.813965 1076.0942 1.0049360
## a[48]  2.4016498 0.4185416  1.7265750  3.060851  759.5839 1.0072282
## bp    -2.4347157 0.2949725 -2.9025787 -1.963986  408.2906 1.0116987
## a_1    2.5313897 0.2338313  2.1682387  2.902536  500.5992 1.0080492
## sigma  0.8234621 0.1437136  0.6106713  1.068480 1437.6472 1.0021745
precis(m_size, 2)
##              mean        sd       5.5%       94.5%     n_eff    Rhat4
## a[1]   2.14337467 0.9471509  0.7179059  3.74117365 1374.0709 1.001748
## a[2]   3.06889751 1.1671599  1.3614429  5.08995085 1612.8587 1.001333
## a[3]   1.03626326 0.7721214 -0.1753019  2.29702735 1233.8400 1.002523
## a[4]   3.03717415 1.1428564  1.3971435  4.99037983 1757.2499 1.001402
## a[5]   1.92521036 0.9473867  0.4284829  3.44522221 1375.7547 1.000328
## a[6]   1.90310931 0.9558905  0.4431434  3.52244198 1311.6847 1.002124
## a[7]   2.85177281 1.1984972  1.1451019  4.87085243 1691.0688 1.001803
## a[8]   1.88954137 0.9281802  0.4702104  3.45883851 1434.7198 1.003772
## a[9]  -0.15218982 0.7345329 -1.3206584  1.01518566 1083.9771 1.003434
## a[10]  2.14348604 0.9391648  0.7326477  3.72035753 1522.0307 1.001900
## a[11]  1.00901431 0.7718184 -0.1923982  2.25097236 1180.3805 1.001811
## a[12]  0.60226755 0.7386561 -0.5991592  1.76124430 1056.3940 1.003313
## a[13]  0.76053801 0.7682596 -0.4349608  2.01059870 1072.7415 1.000990
## a[14] -0.04967219 0.7284497 -1.2280913  1.12169222 1138.5102 1.001661
## a[15]  1.90899379 0.9525762  0.4649207  3.50557497 1639.1646 1.002101
## a[16]  1.92393100 0.9670354  0.5005042  3.53444894 1353.4198 1.002508
## a[17]  2.92323288 0.8834267  1.5867922  4.41717992 1168.3442 1.001834
## a[18]  2.40309697 0.7644430  1.2555076  3.68588110 1120.6769 1.003052
## a[19]  2.04371614 0.7043093  0.9501855  3.17981456 1053.4663 1.001387
## a[20]  3.64178315 1.0460977  2.0982218  5.41121703 1388.3924 1.003454
## a[21]  2.14845749 0.7585292  0.9663214  3.44002989 1199.7248 1.002836
## a[22]  2.16774167 0.7624614  1.0063057  3.42939349 1132.9432 1.000746
## a[23]  2.14796946 0.7566408  1.0172343  3.40424711 1016.9342 1.001235
## a[24]  1.44471274 0.6582668  0.4304178  2.54059133  968.5239 1.001815
## a[25] -0.96957405 0.5932685 -1.9463063 -0.07198787  831.5324 1.004919
## a[26]  0.20114409 0.5615803 -0.6967787  1.08561761  745.3089 1.004382
## a[27] -1.40165823 0.6376862 -2.4410017 -0.40403240  889.8096 1.004262
## a[28] -0.43378801 0.5676779 -1.3328132  0.48839232  758.0117 1.005704
## a[29] -0.10442585 0.5501632 -0.9880878  0.73997875  693.5367 1.004076
## a[30]  1.18438150 0.6184900  0.1959109  2.18911392  903.5803 1.002442
## a[31] -0.90153095 0.5677582 -1.8065788 -0.01959938  680.0942 1.004690
## a[32] -0.57122288 0.5487714 -1.4405913  0.29494916  741.4702 1.004088
## a[33]  3.18282422 0.8337118  1.9524745  4.59082854 1277.6529 1.004163
## a[34]  2.70776254 0.7376761  1.5825098  3.92151024 1070.1943 1.004346
## a[35]  2.72236376 0.7514966  1.5846766  3.98296619 1112.2197 1.000895
## a[36]  2.09337641 0.6381892  1.1062355  3.12105621  869.1808 1.004309
## a[37]  1.81023235 0.6376772  0.8356836  2.84812720  997.7334 1.001849
## a[38]  3.70117806 1.0464454  2.1978880  5.48710570 1481.2842 1.001487
## a[39]  2.47429348 0.7399688  1.3417284  3.69268529  886.6377 1.003848
## a[40]  2.08835700 0.6783886  1.0629540  3.21555426 1104.4833 1.004199
## a[41] -1.77439028 0.6193580 -2.7996319 -0.81893896  855.0712 1.004948
## a[42] -0.53102176 0.5320608 -1.4075876  0.31023985  665.7281 1.006054
## a[43] -0.41704559 0.5342813 -1.2668194  0.42043498  664.8054 1.004517
## a[44] -0.29301795 0.5183648 -1.1207276  0.53125715  632.7418 1.005548
## a[45]  0.31626564 0.5147834 -0.5092122  1.13755697  619.2962 1.003833
## a[46] -0.83208612 0.5149292 -1.6370731  0.01309766  644.0774 1.004166
## a[47]  1.80173013 0.6330274  0.8562021  2.85962844  891.5712 1.003494
## a[48] -0.26429144 0.5087381 -1.0687652  0.58060035  610.5369 1.005993
## s[1]   0.26884559 0.3858671 -0.3563900  0.89259617  392.0124 1.008959
## s[2]  -0.04534200 0.4040259 -0.6878066  0.62323321  422.7459 1.008793
## a_1    1.23623344 0.4169902  0.5823061  1.91172979  444.2361 1.007921
## sigma  1.61617071 0.2175477  1.3063404  1.98150238 1823.2411 1.001196
# Looking at the posterior summaries, the predation coefficient bp is far from zero (roughly -2.4) while the size effects overlap zero, and sigma is much smaller when predation is in the model (about 0.8 versus about 1.6 without it). The WAIC values stay similar because predation and the tank intercepts trade off: when predation is left out, the intercepts simply absorb that variation themselves, so the predictions hardly change.
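# To see the trade-off directly, we can overlay the posterior for sigma from the
# tank-only and the predation models (a quick sketch using the fits above):
post_tank_s <- extract.samples(m_tank)$sigma
post_pred_s <- extract.samples(m_pred)$sigma
dens(post_tank_s, xlab = "sigma (across-tank standard deviation)")
dens(post_pred_s, col = "blue", add = TRUE)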

13M3. Re-estimate the basic Reed frog varying intercept model, but now using a Cauchy distribution in place of the Gaussian distribution for the varying intercepts. That is, fit this model: \[\begin{align} s_i ∼ Binomial(n_i, p_i) \\ logit(p_i) = α_{tank[i]} \\ α_{tank} ∼ Cauchy(α, σ) \\ α ∼ Normal(0, 1) \\ σ ∼ Exponential(1) \\ \end{align}\]

(You are likely to see many divergent transitions for this model. Can you figure out why? Can you fix them?) Plot and compare the posterior means of the intercepts, αtank, to the posterior means produced in the chapter, using the customary Gaussian prior. Can you explain the pattern of differences? Take note of any change in the mean α as well.

# First, the Cauchy prior has very heavy tails, so some tank intercepts (especially for tanks where every tadpole survived) can drift to extreme values, producing a posterior geometry that the sampler struggles with at the default settings; that is why this model tends to produce many divergent transitions.
# As a partial fix, we can pass control = list(adapt_delta = 0.99) to ulam, which forces smaller step sizes and a more careful exploration of the posterior.
m_tank_cauchy <- ulam(
  alist(
    surv_ ~ dbinom(dens_, p),
    logit(p) <- a[tank_],
    a[tank_] ~ dcauchy(a_1, sigma),
    a_1 ~ dnorm(0, 1.5),
    sigma ~ dexp(1)
  ),
  data = dat, chains = 4, cores = 4, log_lik = TRUE,
  iter = 2e3, control = list(adapt_delta = 0.99)
)
## Warning: There were 234 divergent transitions after warmup. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## to find out why this is a problem and how to eliminate them.
## Warning: There were 859 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
## http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
## Warning: Examine the pairs() plot to diagnose sampling problems
# We can now compare the posterior means
s_tank <- apply(extract.samples(m_tank)$a, 2, mean)
s_tank_cauchy <- apply(extract.samples(m_tank_cauchy)$a, 2, mean)
plot(s_tank, s_tank_cauchy,
  pch = 16, col = "purple",
  xlab = "Gaussian intercept", ylab = "Cauchy intercept"
)
abline(a = 0, b = 1, lty = 2)

# From the plot, the two models give almost identical posterior means for most tanks. For the tanks with the largest Gaussian intercepts (tanks where essentially every tadpole survived), the Cauchy estimates become far more extreme. The Gaussian prior is more concentrated, so it shrinks those extreme intercepts strongly toward the mean, while the heavy-tailed Cauchy prior leaves them nearly unshrunk.
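# A more complete fix (sketch only, not run here) is a non-centered
# parameterization: sample standardized intercepts and rescale them inside the
# linear model, which decouples the intercepts from sigma and typically reduces
# the divergences:
# m_tank_cauchy_nc <- ulam(
#   alist(
#     surv_ ~ dbinom(dens_, p),
#     logit(p) <- a_1 + z[tank_] * sigma,
#     z[tank_] ~ dcauchy(0, 1),
#     a_1 ~ dnorm(0, 1.5),
#     sigma ~ dexp(1)
#   ),
#   data = dat, chains = 4, cores = 4, iter = 2e3
# )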

13M4. Now use a Student-t distribution with ν = 2 for the intercepts: \[\begin{align} α_{tank} ∼ Student(2, α, σ) \end{align}\]

Refer back to the Student-t example in Chapter 7 (page 234), if necessary. Plot and compare the resulting posterior to both the original model and the Cauchy model in 13M3. Can you explain the differences and similarities in shrinkage in terms of the properties of these distributions?

# first, let's build the model
m_student <- map2stan(
  alist(
    surv_ ~ dbinom(dens_, p),
    logit(p) <- a[tank_],
    a[tank_] ~ dstudent(2, a_1, sigma),
    a_1 ~ dnorm(0,1),
    sigma ~ dcauchy(0,1)
    ),
  data=dat, chains=4, cores=4, warmup=1000, 
  iter=3000, control=list(adapt_delta=0.99)
)
## Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#tail-ess
# We can now compare the posterior means
s_student <- apply(extract.samples(m_student)$a, 2, mean)
# Gaussian vs student-t
plot(s_tank, s_student,
  pch = 16, col = "purple",
  xlab = "Gaussian intercept", ylab = "student-t intercept"
)
abline(a = 0, b = 1, lty = 2)

# Cauchy vs student-t
plot(s_student, s_tank_cauchy,
  pch = 16, col = "purple",
  xlab = "student-t intercept", ylab = "Cauchy intercept"
)
abline(a = 0, b = 1, lty = 2)

# From the two plots, the Student-t model produces nearly the same posterior means as the other two models for most tanks.
# For the extreme tanks, the Student-t intercepts are more extreme than the Gaussian ones but less extreme than the Cauchy ones.
# The ordering follows the tails of the priors: the Gaussian has the thinnest tails and shrinks extreme values the most, the Student-t with ν = 2 has heavier tails and shrinks less, and the Cauchy has the heaviest tails and shrinks the least.
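# The tail behaviour behind this ordering can be seen by plotting the three
# densities on the same scale (dstudent() is from the rethinking package):
curve(dnorm(x, 0, 1), from = -6, to = 6, xlab = "alpha", ylab = "density")
curve(dstudent(x, 2, 0, 1), add = TRUE, col = "blue")
curve(dcauchy(x, 0, 1), add = TRUE, col = "red")
legend("topright",
  legend = c("Normal(0, 1)", "Student-t(2, 0, 1)", "Cauchy(0, 1)"),
  col = c("black", "blue", "red"), lty = 1)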

13M5. Modify the cross-classified chimpanzees model m13.4 so that the adaptive prior for blocks contains a parameter \(\bar{γ}\) for its mean: \[\begin{align} γ_j ∼ Normal(\bar{γ}, σ_γ) \\ \bar{γ} ∼ Normal(0, 1.5) \\ \end{align}\]

Compare the precis output of this model to m13.4. What has including \(\bar{γ}\) done?

# First, let's load the data
data(chimpanzees)
d <- chimpanzees
d$treatment <- 1 + d$prosoc_left + 2 * d$condition
dat <- list(
  pulled_left = d$pulled_left,
  actor = d$actor,
  block_id = d$block,
  treatment = as.integer(d$treatment)
)

# Now build the model in m13.4 in the book
m13.4 <- ulam(
  alist(
    pulled_left ~ dbinom(1, p),
    logit(p) <- a[actor] + g[block_id] + b[treatment],
    b[treatment] ~ dnorm(0, 0.5),
    a[actor] ~ dnorm(a_1, sigma_a),
    g[block_id] ~ dnorm(0, sigma_g),
    a_1 ~ dnorm(0, 1.5),
    sigma_a ~ dexp(1),
    sigma_g ~ dexp(1)
  ),
  data = dat, chains = 4, cores = 4, log_lik = TRUE
)
## Warning: There were 5 divergent transitions after warmup. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## to find out why this is a problem and how to eliminate them.
## Warning: Examine the pairs() plot to diagnose sampling problems
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#bulk-ess
## Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#tail-ess
# We can now introduce the adaptive prior on blocks
m_13M5 <- ulam(
  alist(
    pulled_left ~ dbinom(1, p),
    logit(p) <- a[actor] + g[block_id] + b[treatment],
    b[treatment] ~ dnorm(0, 0.5),
    a[actor] ~ dnorm(a_1, sigma_a),
    g[block_id] ~ dnorm(g_1, sigma_g),
    a_1 ~ dnorm(0, 1.5),
    g_1 ~ dnorm(0, 1.5),
    sigma_a ~ dexp(1),
    sigma_g ~ dexp(1)
  ),
  data = dat, chains = 4, cores = 4, log_lik = TRUE
)
## Warning: There were 65 divergent transitions after warmup. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## to find out why this is a problem and how to eliminate them.
## Warning: Examine the pairs() plot to diagnose sampling problems
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#bulk-ess
## Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#tail-ess
# Then, we can compare the models
precis(m13.4, 2, pars = c("a_1", "b"))
##            mean        sd        5.5%      94.5%     n_eff    Rhat4
## a_1   0.5802570 0.7113074 -0.51789376 1.73212810 1091.3730 1.003090
## b[1] -0.1281346 0.3016403 -0.60553926 0.35581092  517.3216 1.003205
## b[2]  0.4012688 0.3011628 -0.07252781 0.89337865  529.2148 1.004402
## b[3] -0.4738906 0.3007320 -0.96218848 0.01044997  561.1035 1.001749
## b[4]  0.2868182 0.2952481 -0.20115053 0.76231255  519.2604 1.003926
precis(m_13M5, 2, pars = c("a_1", "b", "g_1"))
##            mean        sd        5.5%       94.5%    n_eff    Rhat4
## a_1   0.3035181 1.1266799 -1.46105913  2.11710724 349.5915 1.006931
## b[1] -0.1433947 0.2979416 -0.63269876  0.33152186 752.1105 1.001445
## b[2]  0.3894681 0.3001689 -0.09912795  0.85480560 745.2047 1.000698
## b[3] -0.4868014 0.3078016 -0.98504973 -0.00989841 662.5519 1.004525
## b[4]  0.2739321 0.2983625 -0.21097252  0.72339494 655.4042 1.000249
## g_1   0.4031091 1.1549985 -1.42614633  2.19369656 238.9320 1.011084
# Comparing the summaries, the modified model samples noticeably worse: n_eff is lower and Rhat is higher than in m13.4, there are more divergent transitions, and the posteriors for a_1 and g_1 are much wider.
# The reason is that only the sum of the two means is identified by the data: adding a constant to a_1 and subtracting the same constant from g_1 leaves every prediction unchanged, so infinitely many (a_1, g_1) combinations fit equally well. Including g_1 therefore creates a ridge in the posterior that is hard to sample, without changing what the model can learn.
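# We can check the identification problem directly: a_1 and g_1 should be
# strongly negatively correlated in the posterior, while their sum stays well
# identified (a quick sketch using the fit above):
post_m5 <- extract.samples(m_13M5)
round(cor(post_m5$a_1, post_m5$g_1), 2)
dens(post_m5$a_1 + post_m5$g_1, xlab = "a_1 + g_1")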

13M6. Sometimes the prior and the data (through the likelihood) are in conflict, because they concentrate around different regions of parameter space. What happens in these cases depends a lot upon the shape of the tails of the distributions. Likewise, the tails of distributions strongly influence whether or not outliers are shrunk towards the mean. I want you to consider four different models to fit to one observation at y = 0. The models differ only in the distributions assigned to the likelihood and prior. Here are the four models:

\[\begin{align} Model \;NN: y &∼ Normal(μ, 1) & Model \;TN: y &∼ Student(2, μ, 1) \\ μ &∼ Normal(10, 1) & μ &∼ Normal(10, 1) \\ Model \;NT: y &∼ Normal(μ, 1) & Model \;TT: y &∼ Student(2, μ, 1) \\ μ &∼ Student(2, 10, 1) & μ &∼ Student(2, 10, 1) \\ \end{align}\]

Estimate and plot the posterior distributions against the likelihoods for these models and compare them. Can you explain the results, using the properties of the distributions?

# The models below are fit to the chimpanzees pulled_left data; note that the question asks for a fit to a single observation at y = 0 (see the grid-approximation sketch at the end of this answer).
data(chimpanzees)
d <- chimpanzees

# Model NN
m_NN <- map2stan( 
  alist(
    pulled_left ~ dnorm(mu, 1),
    mu ~ dnorm(10, 1)
  ), 
  data = d, iter = 2000, chains = 1, cores = 1 
)
## 
## SAMPLING FOR MODEL 'd292ea68f30b4f810016e54c8b03a861' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.021 seconds (Warm-up)
## Chain 1:                0.015 seconds (Sampling)
## Chain 1:                0.036 seconds (Total)
## Chain 1: 
## Error in .local(fit, data, n, ...) : 
##   There appear to be no linear models here
# Model TN
m_TN <- map2stan( 
  alist(
    pulled_left ~ dstudent(2, mu, 1),
    mu ~ dnorm(10, 1)
  ), 
  data = d, iter = 2000, chains = 1, cores = 1 
)
## 
## SAMPLING FOR MODEL 'c33ae6570561b75c3dfd9115ee1d6a00' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.109 seconds (Warm-up)
## Chain 1:                0.112 seconds (Sampling)
## Chain 1:                0.221 seconds (Total)
## Chain 1: 
## Error in .local(fit, data, n, ...) : 
##   There appear to be no linear models here
# Model NT
m_NT <- map2stan( 
  alist(
    pulled_left ~ dnorm(mu, 1),
    mu ~ dstudent(2, 10, 1)
  ), 
  data = d, iter = 2000, chains = 1, cores = 1 
)
## 
## SAMPLING FOR MODEL '74aab02fba87f69da81a38db4ffecf91' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.019 seconds (Warm-up)
## Chain 1:                0.019 seconds (Sampling)
## Chain 1:                0.038 seconds (Total)
## Chain 1: 
## Error in .local(fit, data, n, ...) : 
##   There appear to be no linear models here
# Model TT
m_TT <- map2stan( 
  alist(
    pulled_left ~ dstudent(2, mu, 1),
    mu ~ dstudent(2, 10, 1)
  ), 
  data = d, iter = 2000, chains = 1, cores = 1 
)
## 
## SAMPLING FOR MODEL 'c000de56c09425e5f2828b35e35b85af' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.102 seconds (Warm-up)
## Chain 1:                0.101 seconds (Sampling)
## Chain 1:                0.203 seconds (Total)
## Chain 1: 
## Error in .local(fit, data, n, ...) : 
##   There appear to be no linear models here
# Now let's see the parameters
precis(m_NN)
##         mean         sd      5.5%     94.5%    n_eff    Rhat4
## mu 0.5981772 0.04179096 0.5311812 0.6620014 388.0282 1.003692
precis(m_TN)
##         mean         sd      5.5%     94.5%    n_eff    Rhat4
## mu 0.6183745 0.04546752 0.5457564 0.6918558 301.8135 0.999719
precis(m_NT)
##         mean         sd      5.5%     94.5%    n_eff    Rhat4
## mu 0.5784861 0.04543688 0.5082136 0.6511803 338.2154 1.001399
precis(m_TT)
##         mean         sd      5.5%     94.5%    n_eff     Rhat4
## mu 0.6051942 0.04495596 0.5322269 0.6794475 365.1497 0.9990562
# We can now compare the posterior means
s_m_NN <- extract.samples(m_NN)
s_m_TN <- extract.samples(m_TN)
s_m_NT <- extract.samples(m_NT)
s_m_TT <- extract.samples(m_TT)

dens(s_m_NN$mu, col="purple")
dens(s_m_TN$mu, col="blue", add = TRUE)
dens(s_m_NT$mu, col="red", add=TRUE)
dens(s_m_TT$mu, col="green", add=TRUE)

# As we can see from the plot, the four posteriors for mu are nearly identical here. That is because these models were fit to the full chimpanzees pulled_left data rather than the single observation at y = 0 described in the question, so the likelihood from hundreds of observations overwhelms the prior whatever its tail shape.
# The conflict between prior and data that the question is about only shows up with a single observation: heavy Student-t tails then allow the posterior to effectively ignore whichever component (prior or likelihood) sits far away, while thin Normal tails force a compromise between them.
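# For the case the question actually describes (a single observation at y = 0),
# a grid approximation over mu is enough; this is a sketch under that
# assumption, using dstudent() from the rethinking package:
y_obs <- 0
mu_grid <- seq(-5, 15, length.out = 2000)
post_grid <- function(lik, prior) {
  p <- lik(mu_grid) * prior(mu_grid)
  p / sum(p)
}
lik_N <- function(mu) dnorm(y_obs, mu, 1)
lik_T <- function(mu) dstudent(y_obs, 2, mu, 1)
pri_N <- function(mu) dnorm(mu, 10, 1)
pri_T <- function(mu) dstudent(mu, 2, 10, 1)
p_NN <- post_grid(lik_N, pri_N)
p_TN <- post_grid(lik_T, pri_N)
p_NT <- post_grid(lik_N, pri_T)
p_TT <- post_grid(lik_T, pri_T)
plot(mu_grid, p_NN, type = "l", col = "purple", xlab = "mu",
  ylab = "normalized posterior", ylim = range(p_NN, p_TN, p_NT, p_TT))
lines(mu_grid, p_TN, col = "blue")
lines(mu_grid, p_NT, col = "red")
lines(mu_grid, p_TT, col = "green")
legend("top", legend = c("NN", "TN", "NT", "TT"),
  col = c("purple", "blue", "red", "green"), lty = 1)
# Expected pattern: NN compromises around mu = 5; TN sits near the prior (the
# heavy-tailed likelihood lets the observation be discounted); NT sits near the
# data (the heavy-tailed prior is discounted); TT is bimodal, with modes near
# both 0 and 10.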

EXTRA CREDIT (10 POINTS)

13H1. In 1980, a typical Bengali woman could have 5 or more children in her lifetime. By the year 2000, a typical Bengali woman had only 2 or 3. You’re going to look at a historical set of data, when contraception was widely available but many families chose not to use it. These data reside in data(bangladesh) and come from the 1988 Bangladesh Fertility Survey. Each row is one of 1934 women. There are six variables, but you can focus on two of them for this practice problem: (1) district: ID number of administrative district each woman resided in (2) use.contraception: An indicator (0/1) of whether the woman was using contraception The first thing to do is ensure that the cluster variable, district, is a contiguous set of integers. Recall that these values will be index values inside the model. If there are gaps, you’ll have parameters for which there is no data to inform them. Worse, the model probably won’t run. Look at the unique values of the district variable:

data(bangladesh)
d <- bangladesh
sort(unique(d$district)) #District 54 is absent. So district isn’t yet a good index variable, because it’s not contiguous. This is easy to fix. Just make a new variable that is contiguous. This is enough to do it:
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
## [51] 51 52 53 55 56 57 58 59 60 61
d$district_id <- as.integer(as.factor(d$district))
sort(unique(d$district_id))  # Now there are 60 values, contiguous integers 1 to 60. Now, focus on predicting use.contraception, clustered by district_id.
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
## [51] 51 52 53 54 55 56 57 58 59 60

Fit both (1) a traditional fixed-effects model that uses an index variable for district and (2) a multilevel model with varying intercepts for district. Plot the predicted proportions of women in each district using contraception, for both the fixed-effects model and the varying-effects model. That is, make a plot in which district ID is on the horizontal axis and expected proportion using contraception is on the vertical. Make one plot for each model, or layer them on the same plot, as you prefer. How do the models disagree? Can you explain the pattern of disagreement? In particular, can you explain the most extreme cases of disagreement, both why they happen where they do and why the models reach different inferences?
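# A hedged sketch of the two models the question asks for (not run here; it
# reuses d$use.contraception and d$district_id from the data prep above):
# dat_b <- list(C = d$use.contraception, district = d$district_id)
# # (1) fixed-effects model: one independent intercept per district
# m_fixed <- ulam(
#   alist(
#     C ~ dbinom(1, p),
#     logit(p) <- a[district],
#     a[district] ~ dnorm(0, 1.5)
#   ),
#   data = dat_b, chains = 4, cores = 4
# )
# # (2) multilevel model: an adaptive prior pools the district intercepts
# m_varying <- ulam(
#   alist(
#     C ~ dbinom(1, p),
#     logit(p) <- a[district],
#     a[district] ~ dnorm(a_bar, sigma),
#     a_bar ~ dnorm(0, 1.5),
#     sigma ~ dexp(1)
#   ),
#   data = dat_b, chains = 4, cores = 4
# )
# # Predicted proportion using contraception in each district, per model:
# p_fixed <- inv_logit(apply(extract.samples(m_fixed)$a, 2, mean))
# p_vary <- inv_logit(apply(extract.samples(m_varying)$a, 2, mean))
# plot(1:60, p_fixed, pch = 16, xlab = "district id",
#   ylab = "proportion using contraception")
# points(1:60, p_vary, pch = 1, col = "blue")
# abline(h = mean(d$use.contraception), lty = 2)
# # Expected pattern: the varying-effects estimates are shrunk toward the grand
# # mean, most strongly for districts with few women or extreme raw proportions,
# # which is where the two models disagree the most.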