This chapter has been an introduction to the motivation, implementation, and interpretation of basic multilevel models. It focused on varying intercepts, which achieve better estimates of baseline differences among clusters in the data. They achieve better estimates, because they simultaneously model the population of clusters and use inferences about the population to pool information among parameters. From another perspective, varying intercepts are adaptively regularized parameters, relying upon a prior that is itself learned from the data.
Place each answer inside the code chunk (grey box). The code chunks should contain a text response or a code that completes/answers the question or activity requested. Make sure to include plots if the question requests them. Problems are labeled Easy (E), Medium (M), and Hard(H).
Finally, upon completion, name your final output .html file as: YourName_ANLY505-Year-Semester.html and publish the assignment to your R Pubs account and submit the link to Canvas. Each question is worth 5 points.
13E1. Which of the following priors will produce more shrinkage in the estimates? Show the prior plot. \[\begin{align} \ α_{TANK} ∼ Normal(0, 1) \tag{a} \\ \ α_{TANK} ∼ Normal(0, 2) \tag{b} \\ \end{align}\]
# The first prior will produce more shrinkage as it has lower variance (standard deviation) when compared to the second one.
13E2. Rewrite the following model as a multilevel model. \[\begin{align} y_i ∼ Binomial(1, p_i) \\ logit(p_i) = α_{group[i]} + βx_i \\ α_{group} ∼ Normal(0, 1.5) \\ β ∼ Normal(0, 0.5) \\ \end{align}\]
#yi ~ Binomial(1, pi)
#logit(pi) = αgroup[i] + βxi
#αgroup ∼ Normal(α_bar, σα)
#α_bar ~ Normal(0, 1.5)
#σα ~ Exponential(1)
#β ∼ Normal(0, 0.5)
13E3. Rewrite the following model as a multilevel model. \[\begin{align} y_i ∼ Normal(μ_i, σ) \\ μ_i = α_{group[i]} + βx_i \\ α_{group} ∼ Normal(0, 5) \\ β ∼ Normal(0, 1) \\ σ ∼ Exponential(1) \\ \end{align}\]
#yi ~ Normal(μi, σ)
#μi = α_group[i] + βx_i
# α_group[i] ∼ Normal(α_bar, σα)
#β ∼ Normal(0, 1)
#σ ∼ Exponential(1)
#α_bar ~ Normal(0, 1)
#σα ~ Exponential(1)
13E4. Write a mathematical model formula for a Poisson regression with varying intercepts.
#yi ~ Poisson(pi)
#log(pi) = αgroup[i] + βxi
#αgroup ∼ Normal(α_bar, σα)
#α_bar ~ Normal(0, 1)
#σα ~ Exponential(1)
#β ∼ Normal(0, 1)
13E5. Write a mathematical model formula for a Poisson regression with two different kinds of varying intercepts, a cross-classified model.
# y_i ~ Poisson(lambda_i)
# log(lambda_i) = a_l1 + a1[i] + a2[i] + beta*x_i
# a_l1 ~ Normal(0, 10)
# beta ~ Normal(0,1)
# a1[i] ~ Normal(0, sigma1_l2)
# a2[i] ~ Normal(0, sigma2_l2)
# sigma1_l2 ~ HalfCauchy(0,1)
# sigma2_l2 ~ HalfCauchy(0,1)
13M1. Revisit the Reed frog survival data, data(reedfrogs), and add the predation and size treatment variables to the varying intercepts model. Consider models with either main effect alone, both main effects, as well as a model including both and their interaction. Instead of focusing on inferences about these two predictor variables, focus on the inferred variation across tanks. Explain why it changes as it does across models. Plot the sigma estimates.
library(rethinking)
data(reedfrogs)
d <- reedfrogs
str(d)
## 'data.frame': 48 obs. of 5 variables:
## $ density : int 10 10 10 10 10 10 10 10 10 10 ...
## $ pred : Factor w/ 2 levels "no","pred": 1 1 1 1 1 1 1 1 2 2 ...
## $ size : Factor w/ 2 levels "big","small": 1 1 1 1 2 2 2 2 1 1 ...
## $ surv : int 9 10 7 10 9 9 10 9 4 9 ...
## $ propsurv: num 0.9 1 0.7 1 0.9 0.9 1 0.9 0.4 0.9 ...
d$tank <- 1:nrow(d)
d$predation <- ifelse(test = d$pred == "pred", yes = 1, no = 0)
d$frogsize <- ifelse(test = d$size == "big", yes = 1, no = 0)
m12M1.predation <- map2stan(
alist(
surv ~ dbinom(density, p) ,
logit(p) <- alpha[tank] + beta_predation*predation,
alpha[tank] ~ dnorm(a, sigma),
a ~ dnorm(0, 10),
sigma ~ dcauchy(0, 1),
beta_predation ~ dnorm(0, 1)
), data=d )
##
## SAMPLING FOR MODEL 'e314a229f9bebc69e2ee0a1b56edd0e7' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 0.569 seconds (Warm-up)
## Chain 1: 0.405 seconds (Sampling)
## Chain 1: 0.974 seconds (Total)
## Chain 1:
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#bulk-ess
m12M1.size <- map2stan(
alist(
surv ~ dbinom(density, p) ,
logit(p) <- alpha[tank] + beta_frogsize*frogsize,
alpha[tank] ~ dnorm(a, sigma),
a ~ dnorm(0, 10),
sigma ~ dcauchy(0, 1),
beta_frogsize ~ dnorm(0, 1)
), data=d )
##
## SAMPLING FOR MODEL 'a8172b3d1a59a000c9a3d470b21665c5' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 0.638 seconds (Warm-up)
## Chain 1: 0.413 seconds (Sampling)
## Chain 1: 1.051 seconds (Total)
## Chain 1:
m12M1.both <- map2stan(
alist(
surv ~ dbinom(density, p) ,
logit(p) <- alpha[tank] + beta_predation*predation + beta_frogsize*frogsize,
alpha[tank] ~ dnorm(a, sigma),
a ~ dnorm(0, 10),
sigma ~ dcauchy(0, 1),
c(beta_predation, beta_frogsize) ~ dnorm(0, 1)
), data=d )
##
## SAMPLING FOR MODEL '10165b3e1d4e51baea29f9a6c65db390' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 0.759 seconds (Warm-up)
## Chain 1: 0.549 seconds (Sampling)
## Chain 1: 1.308 seconds (Total)
## Chain 1:
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#bulk-ess
m12M1.interaction <- map2stan(
alist(
surv ~ dbinom(density, p) ,
logit(p) <- alpha[tank] + beta_predation*predation + beta_frogsize*frogsize + beta_predation_frogsize*predation*frogsize,
alpha[tank] ~ dnorm(a, sigma),
a ~ dnorm(0, 10),
sigma ~ dcauchy(0, 1),
c(beta_predation, beta_frogsize, beta_predation_frogsize) ~ dnorm(0, 1)
), data=d )
##
## SAMPLING FOR MODEL '254c294b3c5682b67ce0b7580d276fa4' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 0.868 seconds (Warm-up)
## Chain 1: 0.773 seconds (Sampling)
## Chain 1: 1.641 seconds (Total)
## Chain 1:
ppc <- function(model, df) {
post <- extract.samples(model)
df$propsurv.est <- logistic( apply( X = post$alpha, MARGIN = 2, FUN = median ) )
plot( df$propsurv , ylim=c(0,1) , pch=16 , xaxt="n" ,
xlab="tank" , ylab="proportion survival" , col=rangi2 )
axis( 1 , at=c(1,16,32,48) , labels=c(1,16,32,48) )
points( df$propsurv.est )
abline( h=logistic(median(post$alpha)) , lty=2 )
abline( v=16.5 , lwd=0.5 )
abline( v=32.5 , lwd=0.5 )
text( 8 , 0 , "small tanks" )
text( 16+8 , 0 , "medium tanks" )
text( 32+8 , 0 , "large tanks" )
}
ppc(model = m12M1.predation, df = d)
ppc(model = m12M1.size, df = d)
ppc(model = m12M1.both, df = d)
ppc(model = m12M1.interaction, df = d)
coeftab(m12M1.predation, m12M1.size, m12M1.both, m12M1.interaction)
## m12M1.predation m12M1.size m12M1.both m12M1.interaction
## alpha[1] 2.53 2.51 2.79 2.45
## alpha[2] 3.00 3.39 3.20 2.91
## alpha[3] 1.72 1.36 2.09 1.73
## alpha[4] 2.97 3.36 3.23 2.90
## alpha[5] 2.54 2.24 2.64 2.46
## alpha[6] 2.53 2.19 2.66 2.50
## alpha[7] 3.01 3.13 3.11 2.88
## alpha[8] 2.55 2.29 2.66 2.47
## alpha[9] 2.27 0.19 2.62 2.49
## alpha[10] 3.62 2.45 3.85 3.72
## alpha[11] 3.09 1.34 3.35 3.22
## alpha[12] 2.79 0.91 3.10 2.97
## alpha[13] 3.09 1.03 3.05 2.72
## alpha[14] 2.58 0.24 2.58 2.22
## alpha[15] 3.66 2.22 3.60 3.26
## alpha[16] 3.64 2.20 3.62 3.26
## alpha[17] 2.93 3.23 3.21 2.82
## alpha[18] 2.58 2.73 2.90 2.53
## alpha[19] 2.30 2.37 2.64 2.23
## alpha[20] 3.34 4.02 3.54 3.15
## alpha[21] 2.59 2.45 2.67 2.54
## alpha[22] 2.61 2.47 2.68 2.56
## alpha[23] 2.60 2.41 2.65 2.55
## alpha[24] 2.03 1.71 2.13 2.04
## alpha[25] 1.62 -0.64 2.01 1.94
## alpha[26] 2.59 0.53 2.93 2.88
## alpha[27] 1.30 -1.06 1.68 1.62
## alpha[28] 2.06 -0.11 2.41 2.35
## alpha[29] 2.59 0.20 2.56 2.16
## alpha[30] 3.59 1.47 3.52 3.12
## alpha[31] 1.91 -0.61 1.92 1.54
## alpha[32] 2.20 -0.30 2.18 1.80
## alpha[33] 3.09 3.50 3.37 2.99
## alpha[34] 2.78 3.06 3.08 2.71
## alpha[35] 2.78 3.05 3.12 2.69
## alpha[36] 2.27 2.42 2.68 2.22
## alpha[37] 2.27 2.09 2.37 2.28
## alpha[38] 3.46 3.97 3.49 3.34
## alpha[39] 2.77 2.75 2.85 2.74
## alpha[40] 2.53 2.38 2.60 2.50
## alpha[41] 0.95 -1.44 1.38 1.32
## alpha[42] 1.96 -0.21 2.32 2.29
## alpha[43] 2.06 -0.09 2.44 2.39
## alpha[44] 2.17 0.02 2.53 2.49
## alpha[45] 2.97 0.59 2.93 2.53
## alpha[46] 1.97 -0.56 1.94 1.55
## alpha[47] 4.08 2.10 4.00 3.59
## alpha[48] 2.46 0.03 2.44 2.03
## a 2.61 1.58 2.78 2.54
## sigma 0.83 1.62 0.77 0.76
## beta_predation -2.50 NA -2.43 -1.99
## beta_frogsize NA -0.37 -0.47 0.10
## beta_predation_frogsize NA NA NA -1.01
## nobs 48 48 48 48
# As we add more predictor variables, the variation of tanks decreases.
13M2. Compare the models you fit just above, using WAIC. Can you reconcile the differences in WAIC with the posterior distributions of the models? Show the WAIC table.
compare(m12M1.predation, m12M1.size, m12M1.both, m12M1.interaction)
## WAIC SE dWAIC dSE pWAIC weight
## m12M1.interaction 1001.367 37.75398 0.0000000 NA 28.39525 0.401297479
## m12M1.both 1001.940 37.34726 0.5732018 2.417532 28.16370 0.301298688
## m12M1.predation 1001.995 37.60836 0.6278174 2.484827 29.44670 0.293182205
## m12M1.size 1010.476 38.28042 9.1089648 7.071512 38.21730 0.004221627
precis(m12M1.size)
## mean sd 5.5% 94.5% n_eff Rhat4
## a 1.5777411 0.3285713 1.076454 2.0992334 205.2244 0.9989999
## sigma 1.6188341 0.2300265 1.286670 1.9992187 538.9830 1.0000125
## beta_frogsize -0.3684859 0.4330103 -1.085825 0.2945228 112.2618 1.0016894
precis(m12M1.predation)
## mean sd 5.5% 94.5% n_eff Rhat4
## a 2.6075053 0.2502034 2.2200508 3.009473 96.06412 0.9994797
## sigma 0.8310612 0.1458236 0.6239689 1.089684 357.73092 1.0053067
## beta_predation -2.4992063 0.3087161 -3.0254240 -2.041764 69.61185 1.0006708
13M3. Re-estimate the basic Reed frog varying intercept model, but now using a Cauchy distribution in place of the Gaussian distribution for the varying intercepts. That is, fit this model: \[\begin{align} s_i ∼ Binomial(n_i, p_i) \\ logit(p_i) = α_{tank[i]} \\ α_{tank} ∼ Cauchy(α, σ) \\ α ∼ Normal(0, 1) \\ σ ∼ Exponential(1) \\ \end{align}\]
(You are likely to see many divergent transitions for this model. Can you figure out why? Can you fix them?) Plot and compare the posterior means of the intercepts, αtank, to the posterior means produced in the chapter, using the customary Gaussian prior. Can you explain the pattern of differences? Take note of any change in the mean α as well.
m12m1.base <- map2stan(
alist(
surv ~ dbinom(density , p),
logit(p) <- a_tank[tank] ,
a_tank[tank] ~ dnorm( a , sigma ),
a ~ dnorm(0, 1) ,
sigma ~ dcauchy(0,1)
),
data=d , iter=2000 , chains=2, cores=2
)
precis(m12m1.base, depth=2)
## mean sd 5.5% 94.5% n_eff Rhat4
## a_tank[1] 2.1341293929 0.8860062 0.81754950 3.640646668 3760.851 0.9999076
## a_tank[2] 3.0287491359 1.0745467 1.44885855 4.900575168 2883.877 0.9993937
## a_tank[3] 0.9998485487 0.7014328 -0.10480219 2.159475081 3925.607 0.9995578
## a_tank[4] 3.0442949990 1.0615278 1.51521836 4.911409789 2090.253 0.9996882
## a_tank[5] 2.0924493712 0.8167059 0.86400902 3.467702826 3199.546 0.9991802
## a_tank[6] 2.0968566602 0.8268646 0.86353530 3.492425893 3299.406 1.0000652
## a_tank[7] 3.0200599643 1.0742574 1.45808032 4.932931003 2625.564 0.9993018
## a_tank[8] 2.1189365346 0.8454300 0.87692816 3.543247752 2835.512 0.9997861
## a_tank[9] -0.1867427848 0.5971313 -1.14746223 0.778497802 3797.813 0.9992762
## a_tank[10] 2.1197992546 0.8489608 0.86429939 3.555811179 3123.998 0.9992432
## a_tank[11] 1.0030471492 0.6715713 -0.03208110 2.112457881 4088.368 0.9995058
## a_tank[12] 0.5780590124 0.6169006 -0.38773399 1.566161340 4106.326 1.0000797
## a_tank[13] 0.9996232556 0.6448793 0.02696180 2.067535909 3582.778 0.9992926
## a_tank[14] 0.1902511838 0.6055957 -0.80047787 1.150179607 4738.512 0.9990725
## a_tank[15] 2.0877383504 0.8330177 0.88216099 3.490159844 3541.863 1.0006268
## a_tank[16] 2.1136785031 0.8601852 0.84292233 3.574336472 3414.569 0.9999207
## a_tank[17] 2.9008887241 0.8261147 1.71604408 4.318378009 2616.232 0.9991205
## a_tank[18] 2.3911122672 0.6861933 1.42118341 3.553577144 3339.730 0.9990157
## a_tank[19] 2.0207005258 0.5812802 1.15927281 2.967468383 2839.764 1.0003708
## a_tank[20] 3.6717322709 1.0298917 2.22289701 5.454291727 1937.605 1.0006672
## a_tank[21] 2.3926653458 0.6897550 1.39439005 3.580864491 3236.394 1.0007424
## a_tank[22] 2.3738206544 0.6366901 1.42942792 3.470754176 2975.295 1.0000934
## a_tank[23] 2.3889613321 0.6818206 1.39531435 3.524203329 3249.232 0.9994428
## a_tank[24] 1.6941290627 0.5252877 0.89253021 2.590401860 3214.208 0.9998396
## a_tank[25] -1.0001557792 0.4324529 -1.69166815 -0.339171474 3188.506 1.0004786
## a_tank[26] 0.1599281304 0.3864829 -0.45984512 0.755590899 4061.262 0.9992404
## a_tank[27] -1.4288052427 0.4831299 -2.21883813 -0.708030993 3342.544 1.0002603
## a_tank[28] -0.4731792902 0.3996576 -1.11114842 0.151789548 3666.320 0.9991067
## a_tank[29] 0.1618105564 0.3997430 -0.47569116 0.812085032 3506.954 0.9992945
## a_tank[30] 1.4325512313 0.5008719 0.68557811 2.263370090 3491.979 0.9998743
## a_tank[31] -0.6280869055 0.4048039 -1.26778872 -0.006209008 3701.772 1.0003034
## a_tank[32] -0.3099895983 0.4204850 -1.00291093 0.366414018 5884.270 0.9995438
## a_tank[33] 3.1696906680 0.7500089 2.09167567 4.478031737 2844.274 1.0016148
## a_tank[34] 2.7071557143 0.6574589 1.73238566 3.836657678 2794.138 0.9995638
## a_tank[35] 2.6959901394 0.6639150 1.73042750 3.827983860 2821.118 0.9996619
## a_tank[36] 2.0671449236 0.5327695 1.27121334 2.928554275 3523.453 0.9996022
## a_tank[37] 2.0450233051 0.5280229 1.25415929 2.943164615 3026.598 0.9990944
## a_tank[38] 3.8689652575 0.9433530 2.56423591 5.495114849 2341.296 0.9990193
## a_tank[39] 2.7192830796 0.6878535 1.72406720 3.875907141 3391.802 0.9999968
## a_tank[40] 2.3347054762 0.5706892 1.50568969 3.272851035 3990.645 0.9993297
## a_tank[41] -1.8113529930 0.4893143 -2.63790987 -1.064594409 3849.448 0.9991967
## a_tank[42] -0.5774499336 0.3489297 -1.16106802 -0.010596719 4064.243 0.9992748
## a_tank[43] -0.4503910696 0.3315916 -0.97890815 0.052060764 5047.336 0.9992675
## a_tank[44] -0.3480583951 0.3557746 -0.91237535 0.216485385 4519.295 0.9992022
## a_tank[45] 0.5878967435 0.3442246 0.05065315 1.141591951 3885.765 0.9993842
## a_tank[46] -0.5776975692 0.3504942 -1.14744446 -0.033537199 4168.090 0.9997365
## a_tank[47] 2.0635206999 0.4961206 1.31989716 2.898632057 2372.668 1.0005575
## a_tank[48] -0.0004034919 0.3318090 -0.53025794 0.520824282 3370.050 0.9996353
## a 1.2972402960 0.2491515 0.89919056 1.693495727 2589.598 0.9992834
## sigma 1.6148434542 0.2115367 1.31268801 1.986650137 1467.197 1.0000723
m12m1.base.cauchy <- map2stan(
alist(
surv ~ dbinom(density , p),
logit(p) <- a_tank[tank] ,
a_tank[tank] ~ dcauchy( a , sigma ),
a ~ dnorm(0, 1) ,
sigma ~ dcauchy(0,1)
),
data=d , iter=2000 , chains=2, cores=2
)
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#bulk-ess
## Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#tail-ess
precis(m12m1.base.cauchy, depth=2)
## mean sd 5.5% 94.5% n_eff Rhat4
## a_tank[1] 2.03504437 0.9074302 0.89346439 3.688621175 1182.25873 1.0001002
## a_tank[2] 5.00328618 5.8366727 1.50650680 13.377225160 263.35968 1.0125868
## a_tank[3] 1.10027927 0.6123227 0.14989463 2.088321657 1679.97032 1.0021274
## a_tank[4] 11.80757836 20.5182301 1.53334134 51.124786926 35.39916 1.0343439
## a_tank[5] 1.97379563 0.8077899 0.85491442 3.471440907 1235.38991 0.9995167
## a_tank[6] 2.01998594 0.8560082 0.86693424 3.582639444 977.42875 1.0012790
## a_tank[7] 6.49214942 11.9564973 1.50277108 16.630040777 185.89649 1.0101093
## a_tank[8] 1.99096301 0.8752673 0.88090278 3.342238708 1168.55686 1.0013917
## a_tank[9] -0.06395090 0.6399831 -1.04866854 0.955842125 1969.31083 0.9996475
## a_tank[10] 2.02361902 0.8967597 0.84607164 3.597584485 794.82676 1.0034813
## a_tank[11] 1.12975814 0.6117510 0.18850820 2.139693841 1971.95250 1.0012499
## a_tank[12] 0.72721115 0.6081235 -0.22762769 1.711086491 2318.13984 1.0000417
## a_tank[13] 1.08872992 0.6330887 0.06045117 2.085326008 1702.65495 1.0000196
## a_tank[14] 0.35152957 0.6305716 -0.69448160 1.352545739 1988.27743 0.9991058
## a_tank[15] 1.97681164 0.7943358 0.94920346 3.324936937 1372.93002 0.9991354
## a_tank[16] 2.01301267 0.8728276 0.89539374 3.480781402 692.43131 1.0023657
## a_tank[17] 2.92008667 1.0109070 1.64293160 4.740231231 947.25662 1.0005202
## a_tank[18] 2.25896208 0.6636531 1.31600864 3.349307400 1452.77757 1.0007050
## a_tank[19] 1.91673535 0.5153243 1.18208491 2.777705530 1681.53529 0.9994822
## a_tank[20] 8.31915868 10.5448147 2.37813159 20.172752586 141.77887 1.0204685
## a_tank[21] 2.26730047 0.6562256 1.32742378 3.382450990 1263.70777 0.9991431
## a_tank[22] 2.28054681 0.6558753 1.37663717 3.438571147 1503.49235 0.9993667
## a_tank[23] 2.26273040 0.6242314 1.37369842 3.346840909 1198.93928 0.9994150
## a_tank[24] 1.64381373 0.4969010 0.90218827 2.473266229 1983.51727 0.9992652
## a_tank[25] -1.04525418 0.4739859 -1.81810341 -0.300057577 2104.35237 0.9996001
## a_tank[26] 0.25115943 0.4077992 -0.40510582 0.907977019 2114.50220 1.0000009
## a_tank[27] -1.57304056 0.5342255 -2.44718755 -0.771340692 2177.74160 0.9993326
## a_tank[28] -0.45214040 0.4193163 -1.14646089 0.195255117 1942.94760 0.9996463
## a_tank[29] 0.22360065 0.4120006 -0.43232057 0.867465243 2002.19872 1.0005786
## a_tank[30] 1.43747116 0.4626641 0.74023953 2.134409209 1950.62272 0.9997518
## a_tank[31] -0.65370564 0.4448391 -1.39654598 0.040419492 1876.67329 1.0001009
## a_tank[32] -0.26885248 0.4050955 -0.90348711 0.361049163 2348.01081 0.9990947
## a_tank[33] 3.22519448 0.9091673 1.99166620 4.887290484 1756.89541 1.0000892
## a_tank[34] 2.63594034 0.7374978 1.65523092 3.888889997 875.31163 1.0005840
## a_tank[35] 2.59381059 0.6668548 1.64220145 3.793751075 1817.16194 0.9993027
## a_tank[36] 1.96871598 0.4807678 1.27037449 2.778877157 1675.77537 0.9996599
## a_tank[37] 1.95792878 0.4584961 1.26525934 2.704822503 2005.26177 0.9996358
## a_tank[38] 10.33707418 12.8071260 2.77018554 29.869735357 181.31472 1.0022495
## a_tank[39] 2.56508746 0.6542670 1.66592274 3.668022315 1351.48250 1.0015910
## a_tank[40] 2.22444866 0.5288007 1.46536240 3.095983276 1964.25774 0.9993844
## a_tank[41] -2.00693558 0.5385706 -2.89668491 -1.206850585 1298.93709 1.0005354
## a_tank[42] -0.56446000 0.3516312 -1.13879070 -0.016580786 1632.34793 1.0000139
## a_tank[43] -0.44632564 0.3440231 -0.99804389 0.088056135 2430.32889 1.0002872
## a_tank[44] -0.31055714 0.3587402 -0.87803544 0.249004714 2058.40045 0.9997180
## a_tank[45] 0.63799360 0.3367679 0.10259180 1.178600120 1801.42451 0.9993557
## a_tank[46] -0.55252512 0.3652535 -1.14131103 0.004757726 1967.89385 1.0004859
## a_tank[47] 1.96075295 0.4709723 1.25935649 2.730532640 2042.81199 1.0013719
## a_tank[48] 0.04074172 0.3506348 -0.51415805 0.593064049 2494.58373 0.9990917
## a 1.41211381 0.3004779 0.90921119 1.874350566 1228.82331 1.0036669
## sigma 1.02442830 0.2306264 0.69175607 1.405777475 1239.46731 1.0000639
(cmp <- compare(m12m1.base, m12m1.base.cauchy))
## WAIC SE dWAIC dSE pWAIC weight
## m12m1.base 1010.119 37.95516 0.0000000 NA 37.91191 0.5700134
## m12m1.base.cauchy 1010.683 37.64326 0.5638117 2.326914 38.42410 0.4299866
#WAIC is almost the same
# Model with Cauchy sigma has several intercepts per tank with greater variance than model with Gaussian sigma. In addition, model with Cauchy sigma has bigger estimate of mean and smaller estimate of sigma.
13M4. Now use a Student-t distribution with ν = 2 for the intercepts: \[\begin{align} α_{tank} ∼ Student(2, α, σ) \end{align}\]
Refer back to the Student-t example in Chapter 7 (page 234), if necessary. Plot and compare the resulting posterior to both the original model and the Cauchy model in 13M3. Can you explain the differences and similarities in shrinkage in terms of the properties of these distributions?
m12m4.student <- map2stan(
alist(
surv ~ dbinom(density , p),
logit(p) <- a_tank[tank] ,
a_tank[tank] ~ dstudent(2, a , sigma ),
a ~ dnorm(0, 1) ,
sigma ~ dcauchy(0,1)
),
data=d , iter=2000 , chains=2, cores=2
)
precis(m12m4.student, depth=2)
## mean sd 5.5% 94.5% n_eff Rhat4
## a_tank[1] 2.06799016 0.8766975 0.87128497 3.65191213 2525.0214 0.9997388
## a_tank[2] 3.93431154 3.3845338 1.47390897 8.76243101 334.4904 1.0062968
## a_tank[3] 1.04617559 0.6294724 0.05649876 2.06193222 3146.2559 0.9993649
## a_tank[4] 4.18932745 3.4429834 1.47226752 9.61552350 349.8030 1.0012792
## a_tank[5] 2.07605022 0.8803428 0.83273862 3.55668801 1631.1297 0.9997732
## a_tank[6] 2.03673890 0.8539875 0.81490086 3.52460490 2008.2192 0.9990985
## a_tank[7] 4.01524812 3.1123438 1.52209879 8.78379331 465.2536 1.0070345
## a_tank[8] 2.05811340 0.8891362 0.82021802 3.60790092 1729.1582 1.0014216
## a_tank[9] -0.10486039 0.6409054 -1.14672066 0.86056420 2859.8262 0.9996417
## a_tank[10] 2.06689379 0.9105492 0.84001698 3.56942139 1712.5893 1.0007968
## a_tank[11] 1.04111932 0.6332779 0.03261613 2.07684695 3510.1234 0.9991643
## a_tank[12] 0.64903282 0.6380532 -0.36015854 1.63811547 3706.3574 0.9999549
## a_tank[13] 1.01932648 0.6361485 0.01336004 2.05039994 3225.1801 0.9992259
## a_tank[14] 0.26956696 0.6157215 -0.69969480 1.25437106 3298.6887 0.9994352
## a_tank[15] 2.05760411 0.8715831 0.90035096 3.58299622 1544.5914 1.0004929
## a_tank[16] 2.05334744 0.8888378 0.85536243 3.60699274 2067.3825 0.9998448
## a_tank[17] 2.89577037 0.8705640 1.70309417 4.46874783 1502.2237 0.9999751
## a_tank[18] 2.35127849 0.6777332 1.35548861 3.50975516 1639.6173 0.9990409
## a_tank[19] 1.95302305 0.5594944 1.13578801 2.88849669 2316.7112 1.0008922
## a_tank[20] 5.62244812 4.7341190 2.37040281 12.15103651 283.5865 1.0039766
## a_tank[21] 2.34373015 0.6675415 1.37457105 3.50619016 1779.9949 1.0014483
## a_tank[22] 2.30230895 0.6618921 1.36467748 3.44387567 2062.5416 0.9996037
## a_tank[23] 2.30010873 0.6199462 1.39984727 3.34614273 2419.8534 0.9990668
## a_tank[24] 1.66077523 0.4971691 0.90301723 2.49953512 3753.7813 0.9996164
## a_tank[25] -1.01840333 0.4633269 -1.79841081 -0.30692578 2750.2463 1.0006870
## a_tank[26] 0.20695531 0.4111780 -0.44605153 0.86566499 3170.8031 0.9996350
## a_tank[27] -1.52832993 0.5193995 -2.39385775 -0.77025410 2587.6878 0.9995558
## a_tank[28] -0.45019770 0.4239863 -1.14720936 0.21813167 4240.3379 0.9991581
## a_tank[29] 0.19615037 0.3964033 -0.40450578 0.80142921 2914.8961 0.9992619
## a_tank[30] 1.44948379 0.4861423 0.70471616 2.25382582 2767.2661 1.0004371
## a_tank[31] -0.62999704 0.4633719 -1.39949599 0.06701097 3196.0257 0.9998521
## a_tank[32] -0.27823323 0.3949590 -0.93357012 0.34910052 3209.0854 0.9993821
## a_tank[33] 3.21972831 0.8981670 2.02949433 4.78769266 1690.9399 1.0017242
## a_tank[34] 2.63101801 0.6620015 1.63237450 3.77799529 2138.3459 0.9996686
## a_tank[35] 2.66269382 0.6742650 1.77500600 3.86389952 1472.1122 1.0000493
## a_tank[36] 1.99972655 0.4861075 1.28764882 2.80073126 2851.0723 0.9993867
## a_tank[37] 2.02209074 0.5077416 1.28482146 2.88029601 2360.6863 0.9994940
## a_tank[38] 5.54897720 3.3171451 2.72082587 10.97534199 403.8605 1.0081536
## a_tank[39] 2.65016395 0.6756959 1.69371734 3.85377658 2317.1736 0.9998620
## a_tank[40] 2.29170133 0.5660099 1.48238363 3.26306514 2602.2510 0.9998304
## a_tank[41] -1.97239043 0.5394443 -2.85146694 -1.16407809 2329.1687 0.9995332
## a_tank[42] -0.55711796 0.3682430 -1.13769118 0.02123318 2894.4078 1.0000874
## a_tank[43] -0.45436412 0.3560515 -1.04265060 0.07851017 2832.7489 0.9996749
## a_tank[44] -0.31934535 0.3387014 -0.86836099 0.23405218 3817.7351 0.9993314
## a_tank[45] 0.61928821 0.3434405 0.06392546 1.18546766 2886.4814 0.9999009
## a_tank[46] -0.57206607 0.3636749 -1.14990762 -0.01684145 3097.2557 0.9996705
## a_tank[47] 2.02317812 0.4944193 1.32532642 2.89024805 2076.3992 1.0012951
## a_tank[48] 0.03882241 0.3239442 -0.47189691 0.55772491 3080.1072 0.9997161
## a 1.34653453 0.2784974 0.91033229 1.79967640 1959.5536 1.0005967
## sigma 1.27182493 0.2210644 0.95808918 1.65243986 1578.0788 1.0008039
compare(m12m1.base, m12m4.student)
## WAIC SE dWAIC dSE pWAIC weight
## m12m1.base 1010.119 37.95516 0.00000 NA 37.91191 0.6358877
## m12m4.student 1011.235 37.90172 1.11512 1.442119 38.89202 0.3641123
13M5. Modify the cross-classified chimpanzees model m13.4 so that the adaptive prior for blocks contains a parameter \(\bar{γ}\) for its mean: \[\begin{align} γ_j ∼ Normal(\bar{γ}, σ_γ) \\ \bar{γ} ∼ Normal(0, 1.5) \\ \end{align}\]
Compare the precis output of this model to m13.4. What has including \(\bar{γ}\) done?
data(chimpanzees)
d=chimpanzees
m13m6.1 <- map2stan(
alist(
pulled_left ~ dnorm(mu , 1),
mu~dnorm(10,1)
),
data=d , iter=2000 , chains=1, cores=1
)
##
## SAMPLING FOR MODEL 'fe141bd866a0799de7ca5f8bf5fa12ac' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 0.016 seconds (Warm-up)
## Chain 1: 0.018 seconds (Sampling)
## Chain 1: 0.034 seconds (Total)
## Chain 1:
## Error in .local(fit, data, n, ...) :
## There appear to be no linear models here
m13m6.2 <- map2stan(
alist(
pulled_left ~ dstudent(2, mu , 1),
mu~dnorm(10,1)
),
data=d , iter=2000 , chains=1, cores=1
)
##
## SAMPLING FOR MODEL '33f83fde0b1af44423a13d49cd6d902b' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 0.085 seconds (Warm-up)
## Chain 1: 0.068 seconds (Sampling)
## Chain 1: 0.153 seconds (Total)
## Chain 1:
## Error in .local(fit, data, n, ...) :
## There appear to be no linear models here
m13m6.3 <- map2stan(
alist(
pulled_left ~ dnorm(mu , 1),
mu~dstudent(2,10,1)
),
data=d , iter=2000 , chains=1, cores=1
)
##
## SAMPLING FOR MODEL 'b8ca916b40f639d4978251e289c6174d' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 0.019 seconds (Warm-up)
## Chain 1: 0.016 seconds (Sampling)
## Chain 1: 0.035 seconds (Total)
## Chain 1:
## Error in .local(fit, data, n, ...) :
## There appear to be no linear models here
m13m6.4 <- map2stan(
alist(
pulled_left ~ dstudent(2,mu , 1),
mu~dstudent(2,10,1)
),
data=d , iter=2000 , chains=1, cores=1
)
##
## SAMPLING FOR MODEL '621032ae053e6f01672856d0ccb8bfb9' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 0.09 seconds (Warm-up)
## Chain 1: 0.088 seconds (Sampling)
## Chain 1: 0.178 seconds (Total)
## Chain 1:
## Error in .local(fit, data, n, ...) :
## There appear to be no linear models here
precis(m13m6.1)
## mean sd 5.5% 94.5% n_eff Rhat4
## mu 0.5982653 0.04303706 0.5320732 0.6648921 181.3885 1.015602
precis(m13m6.2)
## mean sd 5.5% 94.5% n_eff Rhat4
## mu 0.6180311 0.04301479 0.5513719 0.6892273 339.8694 1.004882
precis(m13m6.3)
## mean sd 5.5% 94.5% n_eff Rhat4
## mu 0.580337 0.04292913 0.5130386 0.6481294 431.7987 1.000117
precis(m13m6.4)
## mean sd 5.5% 94.5% n_eff Rhat4
## mu 0.6023575 0.04643864 0.5265981 0.6770711 436.3235 0.9999405
13M6. Sometimes the prior and the data (through the likelihood) are in conflict, because they concentrate around different regions of parameter space. What happens in these cases depends a lot upon the shape of the tails of the distributions. Likewise, the tails of distributions strongly influence can outliers are shrunk or not towards the mean. I want you to consider four different models to fit to one observation at y = 0. The models differ only in the distributions assigned to the likelihood and prior. Here are the four models:
\[\begin{align} Model \;NN: y &∼ Normal(μ, 1) & Model \;TN: y &∼ Student(2, μ, 1) \\ μ &∼ Normal(10, 1) & μ &∼ Normal(10, 1) \\ Model \;NT: y &∼ Normal(μ, 1) & Model \;TT: y &∼ Student(2, μ, 1) \\ μ &∼ Student(2, 10, 1) & μ &∼ Student(2, 10, 1) \\ \end{align}\]
Estimate and plot the posterior distributions against the likelihoods for these models and compare them. Can you explain the results, using the properties of the distributions?
data(bangladesh)
d <- bangladesh
library(dplyr)
d$district_id <- as.integer(as.factor(d$district))
d$use_contraception <- d$use.contraception
d$age_centered <- d$age.centered
d <- select(d,-use.contraception, -age.centered, -district)
m12h1.d.fixed <- map2stan(
alist(
use_contraception ~ dbinom(1, p),
logit(p) <- a_district[district_id],
a_district[district_id] ~ dnorm(0, 5)
),
data=d
)
##
## SAMPLING FOR MODEL '606cfaafb7f8697ba6bbed70d3d8767e' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 9.812 seconds (Warm-up)
## Chain 1: 7.967 seconds (Sampling)
## Chain 1: 17.779 seconds (Total)
## Chain 1:
precis(m12h1.d.fixed, depth=2)
## mean sd 5.5% 94.5% n_eff
## a_district[1] -1.072102617 0.2137909 -1.42805575 -0.73004922 1759.2174
## a_district[2] -0.657081441 0.4530150 -1.38954571 0.09524768 1770.7009
## a_district[3] 4.389770727 3.0030453 0.31813275 9.90067610 1099.2023
## a_district[4] 0.006329282 0.3696306 -0.56760708 0.60723734 2206.0154
## a_district[5] -0.593753152 0.3305980 -1.11745843 -0.07855726 1688.3489
## a_district[6] -0.902013348 0.2857676 -1.36342677 -0.44783113 1725.3339
## a_district[7] -1.018063708 0.5716796 -2.01868416 -0.16110297 1468.9568
## a_district[8] -0.531158677 0.3535431 -1.10485623 0.03296472 1871.1483
## a_district[9] -0.864096485 0.4550723 -1.62540374 -0.15694338 2321.5027
## a_district[10] -2.824782106 1.2034191 -4.89101232 -1.17173252 812.9747
## a_district[11] -6.334049818 2.6428256 -11.28055363 -2.89486392 721.2602
## a_district[12] -0.663637931 0.3938240 -1.28290546 -0.05476155 1770.9313
## a_district[13] -0.340727270 0.4150956 -1.03860313 0.31065933 1857.5298
## a_district[14] 0.526208506 0.1837122 0.23569554 0.81866993 1613.6967
## a_district[15] -0.577699854 0.4457568 -1.29283493 0.10227389 1627.8422
## a_district[16] 0.198783371 0.4594249 -0.54870757 0.93587552 2840.9806
## a_district[17] -0.908615208 0.4411197 -1.62099750 -0.22341036 1635.8471
## a_district[18] -0.674686142 0.3072829 -1.16930961 -0.20767970 1976.8763
## a_district[19] -0.478962703 0.4093944 -1.14357474 0.18467421 1247.1950
## a_district[20] -0.406904678 0.5264318 -1.26797979 0.40972255 1569.2671
## a_district[21] -0.480725372 0.4701304 -1.23771171 0.24738151 1772.6369
## a_district[22] -1.451786283 0.5838221 -2.41568977 -0.59455904 2199.4232
## a_district[23] -1.082908099 0.5940057 -2.04419287 -0.16818497 2284.0512
## a_district[24] -2.905956094 1.2296629 -5.06197344 -1.23899934 913.7585
## a_district[25] -0.210225444 0.2372513 -0.57645056 0.16897614 1544.6182
## a_district[26] -0.476227776 0.6156459 -1.46037413 0.44601225 1846.4447
## a_district[27] -1.569011880 0.4196847 -2.27476853 -0.91611475 1719.9670
## a_district[28] -1.140325798 0.3354959 -1.66123581 -0.62458431 1855.9477
## a_district[29] -0.952070313 0.4012863 -1.61588574 -0.32184634 2032.7807
## a_district[30] -0.023058897 0.2665411 -0.43002708 0.41225118 2361.0035
## a_district[31] -0.184032640 0.3553425 -0.75417488 0.38057049 1703.4610
## a_district[32] -1.400413091 0.5309058 -2.25996780 -0.58178531 1716.5325
## a_district[33] -0.324295525 0.5440802 -1.20485169 0.50150125 2250.1928
## a_district[34] 0.675362333 0.3648040 0.08461127 1.26620670 1668.1995
## a_district[35] -0.009757031 0.2844876 -0.45595365 0.45561787 2016.0144
## a_district[36] -0.640673127 0.5062309 -1.48575884 0.15135343 2278.4637
## a_district[37] 0.154453607 0.6098288 -0.81246558 1.10559596 1636.1711
## a_district[38] -0.986555922 0.6114873 -2.04257380 -0.03182713 1175.5729
## a_district[39] 0.011187054 0.4387506 -0.68524669 0.72588218 2130.4812
## a_district[40] -0.151622565 0.3082044 -0.63467565 0.36841796 1810.5770
## a_district[41] 0.001600730 0.3938917 -0.61411922 0.62385336 1775.5519
## a_district[42] 0.188357503 0.6143518 -0.80228336 1.15666228 1609.3773
## a_district[43] 0.144579540 0.3027415 -0.33059243 0.62825865 1482.1193
## a_district[44] -1.319773282 0.4734572 -2.13032506 -0.62727901 1829.3016
## a_district[45] -0.705555455 0.3322112 -1.23921847 -0.21148301 1838.6626
## a_district[46] 0.092408472 0.2221479 -0.24458954 0.44639219 1781.3449
## a_district[47] -0.138539778 0.5424842 -0.99306841 0.69813418 2308.9120
## a_district[48] 0.108875495 0.3165803 -0.39889533 0.60805363 1690.1944
## a_district[49] -5.145272445 2.9510932 -10.38563367 -1.14713356 888.5611
## a_district[50] -0.108357703 0.4905844 -0.88078818 0.69551944 1926.3100
## a_district[51] -0.172409744 0.3425276 -0.71680058 0.37208526 1523.2131
## a_district[52] -0.230052084 0.2453828 -0.62921388 0.14810414 3000.0000
## a_district[53] -0.347083658 0.4619727 -1.06929767 0.38570957 1784.2491
## a_district[54] -1.906162480 1.2299254 -4.01995609 -0.16359976 1536.4020
## a_district[55] 0.310951249 0.3068530 -0.18878792 0.81249093 1524.4008
## a_district[56] -1.543761059 0.5209212 -2.41458183 -0.71986641 1609.7171
## a_district[57] -0.187713436 0.3551131 -0.73573982 0.37782481 1651.1866
## a_district[58] -2.529890979 1.1746692 -4.61135196 -0.89461965 1560.6282
## a_district[59] -1.329505046 0.4572356 -2.05345142 -0.64295823 1897.8802
## a_district[60] -1.329586798 0.4137526 -2.03130994 -0.70048658 2092.0218
## Rhat4
## a_district[1] 0.9993160
## a_district[2] 0.9993355
## a_district[3] 1.0023322
## a_district[4] 0.9992635
## a_district[5] 1.0010909
## a_district[6] 0.9991173
## a_district[7] 0.9991772
## a_district[8] 0.9990370
## a_district[9] 0.9990070
## a_district[10] 1.0024630
## a_district[11] 0.9990163
## a_district[12] 0.9992487
## a_district[13] 1.0005747
## a_district[14] 0.9990063
## a_district[15] 0.9995213
## a_district[16] 0.9991971
## a_district[17] 0.9991879
## a_district[18] 0.9993857
## a_district[19] 1.0017839
## a_district[20] 0.9990639
## a_district[21] 1.0003544
## a_district[22] 0.9990479
## a_district[23] 0.9993848
## a_district[24] 0.9991842
## a_district[25] 1.0002157
## a_district[26] 1.0031916
## a_district[27] 0.9990516
## a_district[28] 0.9991063
## a_district[29] 0.9990739
## a_district[30] 0.9990593
## a_district[31] 0.9990303
## a_district[32] 0.9998286
## a_district[33] 0.9990292
## a_district[34] 0.9998279
## a_district[35] 0.9994280
## a_district[36] 0.9990748
## a_district[37] 1.0002783
## a_district[38] 0.9990032
## a_district[39] 1.0009628
## a_district[40] 1.0042141
## a_district[41] 0.9990007
## a_district[42] 0.9990042
## a_district[43] 1.0013023
## a_district[44] 0.9990004
## a_district[45] 0.9994647
## a_district[46] 0.9991833
## a_district[47] 0.9996840
## a_district[48] 1.0019008
## a_district[49] 1.0010683
## a_district[50] 0.9996462
## a_district[51] 0.9999841
## a_district[52] 0.9993815
## a_district[53] 0.9990519
## a_district[54] 0.9990925
## a_district[55] 1.0004549
## a_district[56] 0.9992931
## a_district[57] 0.9993651
## a_district[58] 1.0003298
## a_district[59] 1.0004625
## a_district[60] 0.9990036
m12h1.d.pooled <- map2stan(
alist(
use_contraception ~ dbinom(1, p),
logit(p) <- a_district[district_id] ,
a_district[district_id] ~ dnorm(a, sigma),
a ~ dnorm(0,5) ,
sigma ~ dcauchy(0,1)
),
data=d
)
##
## SAMPLING FOR MODEL '80c4b9e5da8e1271444b19d2f580cb02' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0.001 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 10 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 9.365 seconds (Warm-up)
## Chain 1: 7.97 seconds (Sampling)
## Chain 1: 17.335 seconds (Total)
## Chain 1:
precis(m12h1.d.pooled, depth=2)
## mean sd 5.5% 94.5% n_eff
## a_district[1] -0.9952418264 0.19762770 -1.3095396 -0.68268294 1517.3864
## a_district[2] -0.6088449189 0.35010765 -1.1765116 -0.05073988 983.8339
## a_district[3] -0.2079475887 0.53076349 -1.0372206 0.62719814 1523.5991
## a_district[4] -0.1808232146 0.33595894 -0.7018847 0.33743445 2034.7408
## a_district[5] -0.5674166118 0.27864181 -1.0161732 -0.12293170 2033.1900
## a_district[6] -0.8146527461 0.24808313 -1.2034506 -0.43187273 2005.0725
## a_district[7] -0.7733050700 0.35408086 -1.3577264 -0.23978375 1540.8441
## a_district[8] -0.5189292871 0.29131414 -0.9991970 -0.06297920 2156.2197
## a_district[9] -0.7188318792 0.34974793 -1.2909686 -0.17886188 1095.5220
## a_district[10] -1.1448907882 0.43676256 -1.8867004 -0.48004626 1298.8828
## a_district[11] -1.5480450188 0.43371594 -2.2296343 -0.91867238 1089.4320
## a_district[12] -0.6016506122 0.31246080 -1.1012035 -0.10100251 1611.6650
## a_district[13] -0.4352698765 0.35373768 -1.0085951 0.10766684 1355.9872
## a_district[14] 0.3965968010 0.17683249 0.1192271 0.66723627 1516.5416
## a_district[15] -0.5603837483 0.32041174 -1.0691630 -0.04253889 1920.4867
## a_district[16] -0.1236058532 0.35631427 -0.6872928 0.44038903 1660.2834
## a_district[17] -0.7398203011 0.34668206 -1.3026188 -0.19112234 1612.4297
## a_district[18] -0.6346022749 0.27722339 -1.0852616 -0.16625327 1579.8465
## a_district[19] -0.5068165317 0.30959093 -0.9807180 -0.03909616 1348.4235
## a_district[20] -0.4821856183 0.34022096 -1.0680237 0.06486509 2633.8645
## a_district[21] -0.5024888280 0.36844813 -1.1025959 0.05230378 997.6989
## a_district[22] -0.9770600195 0.37006361 -1.6116773 -0.40957980 1673.4897
## a_district[23] -0.7899391530 0.37983778 -1.4203932 -0.20168543 1359.5354
## a_district[24] -1.1876971047 0.41716399 -1.8693258 -0.55392903 1001.1269
## a_district[25] -0.2717109312 0.23508926 -0.6513938 0.11086205 1590.6867
## a_district[26] -0.5189743919 0.40737534 -1.1277721 0.14977235 1612.9430
## a_district[27] -1.1786882413 0.30350371 -1.6755885 -0.68880812 1293.4877
## a_district[28] -0.9680049311 0.28223687 -1.4193030 -0.52707448 1320.3373
## a_district[29] -0.8028558194 0.33260522 -1.3323373 -0.27659764 1971.7836
## a_district[30] -0.1420782731 0.23852953 -0.5325509 0.23487983 2022.0105
## a_district[31] -0.3009740317 0.32654627 -0.8180134 0.23143681 1911.7343
## a_district[32] -0.9817387736 0.34476977 -1.5563792 -0.41924207 1883.0043
## a_district[33] -0.4208518214 0.39239997 -1.0255764 0.18190997 1544.0025
## a_district[34] 0.2790565214 0.29691680 -0.2107673 0.74435314 1469.1055
## a_district[35] -0.1302003417 0.26099952 -0.5324672 0.29391122 1521.4283
## a_district[36] -0.5823145708 0.36815981 -1.1728741 0.01719886 1507.7356
## a_district[37] -0.2078219283 0.40535025 -0.8423875 0.41047440 1273.2905
## a_district[38] -0.7207010510 0.39553086 -1.3607691 -0.08029366 2168.9123
## a_district[39] -0.1910847243 0.30877358 -0.6737267 0.31674694 1790.1513
## a_district[40] -0.2531899187 0.26274972 -0.6670416 0.17709795 1570.8130
## a_district[41] -0.2111422921 0.32309667 -0.7238555 0.29838646 1865.2964
## a_district[42] -0.2373062329 0.43061125 -0.9178187 0.45475243 1797.9639
## a_district[43] -0.0346163795 0.25599995 -0.4557247 0.38206578 1398.8845
## a_district[44] -0.9622834427 0.33592314 -1.5069031 -0.43521604 1305.0785
## a_district[45] -0.6532243503 0.28629485 -1.0989202 -0.21073515 1875.3798
## a_district[46] 0.0003202191 0.20083469 -0.3360758 0.32251881 1532.6288
## a_district[47] -0.3548489509 0.39657189 -0.9796051 0.28572589 2024.8291
## a_district[48] -0.0790750997 0.26801972 -0.5096833 0.38925270 1224.8906
## a_district[49] -0.8744772578 0.50105093 -1.6631061 -0.08833936 1549.6903
## a_district[50] -0.3078326260 0.32571184 -0.8140487 0.20831126 1954.1471
## a_district[51] -0.2811811221 0.28012708 -0.7226188 0.17732661 1759.3777
## a_district[52] -0.2834035758 0.23727300 -0.6679077 0.09076430 1747.1309
## a_district[53] -0.4251872497 0.37227244 -1.0123378 0.18268299 1888.6718
## a_district[54] -0.8032873083 0.48593301 -1.5719346 -0.07596136 1915.5619
## a_district[55] 0.1030934804 0.25032395 -0.2842082 0.50182724 2022.8758
## a_district[56] -1.0647557618 0.34398347 -1.6296879 -0.54911096 1548.5723
## a_district[57] -0.3096624598 0.28424219 -0.7579548 0.14232983 1936.5587
## a_district[58] -1.0220611572 0.39411013 -1.6761888 -0.40410800 1354.0654
## a_district[59] -1.0092804139 0.30923831 -1.5138692 -0.52649585 1329.4785
## a_district[60] -1.0541732731 0.30965316 -1.5555244 -0.58096410 1244.9016
## a -0.5404941972 0.08490396 -0.6773998 -0.40758979 537.6329
## sigma 0.5221273980 0.08625650 0.4024285 0.66228372 452.5457
## Rhat4
## a_district[1] 1.0001980
## a_district[2] 0.9997308
## a_district[3] 0.9990057
## a_district[4] 0.9991405
## a_district[5] 0.9989999
## a_district[6] 0.9997507
## a_district[7] 0.9991202
## a_district[8] 0.9993154
## a_district[9] 0.9990041
## a_district[10] 0.9990089
## a_district[11] 0.9992277
## a_district[12] 1.0024089
## a_district[13] 0.9994242
## a_district[14] 0.9990693
## a_district[15] 0.9999119
## a_district[16] 1.0009395
## a_district[17] 0.9990052
## a_district[18] 0.9999453
## a_district[19] 0.9990008
## a_district[20] 0.9994104
## a_district[21] 0.9990246
## a_district[22] 0.9993737
## a_district[23] 0.9990678
## a_district[24] 1.0041473
## a_district[25] 1.0003518
## a_district[26] 0.9990350
## a_district[27] 0.9990052
## a_district[28] 0.9993405
## a_district[29] 0.9990156
## a_district[30] 0.9990043
## a_district[31] 1.0006305
## a_district[32] 0.9991037
## a_district[33] 1.0002939
## a_district[34] 0.9990166
## a_district[35] 0.9990357
## a_district[36] 1.0010725
## a_district[37] 0.9999908
## a_district[38] 0.9990183
## a_district[39] 0.9999384
## a_district[40] 0.9993915
## a_district[41] 0.9989999
## a_district[42] 0.9993253
## a_district[43] 0.9991574
## a_district[44] 0.9992013
## a_district[45] 0.9998078
## a_district[46] 0.9995670
## a_district[47] 1.0001970
## a_district[48] 0.9991782
## a_district[49] 0.9995850
## a_district[50] 1.0010728
## a_district[51] 0.9990831
## a_district[52] 0.9998629
## a_district[53] 1.0001709
## a_district[54] 1.0016909
## a_district[55] 0.9994055
## a_district[56] 0.9995129
## a_district[57] 0.9993421
## a_district[58] 0.9991230
## a_district[59] 0.9990150
## a_district[60] 0.9993969
## a 1.0020052
## sigma 1.0014185
compare(m12h1.d.fixed, m12h1.d.pooled)
## WAIC SE dWAIC dSE pWAIC weight
## m12h1.d.pooled 2515.337 25.10320 0.00000 NA 36.38591 0.9998394818
## m12h1.d.fixed 2532.811 32.47771 17.47389 12.12175 61.94509 0.0001605182
# Pooled model looks superior according to WAIC comparison
cacl_district_rate_estimate <- function(model, model_name, district_data){
sf <- extract.samples(model)
sf$a_district <- logistic(sf$a_district)
d1 <- data.frame(district_data)
d1$rate <- apply(sf$a_district, 2, mean)
pi <- apply(sf$a_district, 2, PI)
d1$rate_low <- pi[1,]
d1$rate_high <- pi[2,]
d1$model <- model_name
d1
}
d.res <- d %>%
group_by(district_id) %>%
summarise(
cnt=n(),
ttl_use_c=sum(use_contraception),
rate=ttl_use_c/cnt
) %>%
as.data.frame() %>%
mutate(
d_label = reorder(as.factor(paste0(district_id,'/n=',cnt)), cnt)
) %>%
arrange(district_id)
d1 <- cacl_district_rate_estimate(m12h1.d.fixed, 'fixed', d.res)
d2 <- cacl_district_rate_estimate(m12h1.d.pooled, 'pooled', d.res)
d.res.models <- bind_rows(d1,d2)
d.res <- arrange(d.res, cnt) %>% mutate(cnt_sorted_id=1:nrow(d.res))
pd <- position_dodge(0.3)
d.res.models %>%
ggplot(aes(d_label, rate, color=model, fill=model)) +
geom_point(data=d.res, size=3, aes(fill='raw', color='raw')) +
geom_point(size=2, position = pd) +
geom_errorbar(aes(ymin=rate_low, ymax=rate_high), position = pd) +
geom_hline(yintercept = mean(d$use_contraception), linetype='longdash') +
geom_hline(yintercept = mean(d.res$rate), linetype='dotdash' ) +
theme(axis.text.x = element_text(angle = 90))
ggtitle('Predicted vs. empirical rate of women who use contraception per district')
## $title
## [1] "Predicted vs. empirical rate of women who use contraception per district"
##
## attr(,"class")
## [1] "labels"
# This plot shows per district rate of contraception use. Districts are sorted by the number of observations from the smallest to the largest.
# We can see that differences between pooled, fixed and empirical estimates are tiny for the right part of the plot, where number of observations per district is ~100.
# But for teh first 10 districts, that have less than 15 observations, pooled estimates are shrinked towards the sample mean.
EXTRA CREDIT (10 POINTS)
13H1. In 1980, a typical Bengali woman could have 5 or more children in her lifetime. By the year 2000, a typical Bengali woman had only 2 or 3. You’re going to look at a historical set of data, when contraception was widely available but many families chose not to use it. These data reside in data(bangladesh) and come from the 1988 Bangladesh Fertility Survey. Each row is one of 1934 women. There are six variables, but you can focus on two of them for this practice problem: (1) district: ID number of administrative district each woman resided in (2) use.contraception: An indicator (0/1) of whether the woman was using contraception The first thing to do is ensure that the cluster variable, district, is a contiguous set of integers. Recall that these values will be index values inside the model. If there are gaps, you’ll have parameters for which there is no data to inform them. Worse, the model probably won’t run. Look at the unique values of the district variable:
data(bangladesh)
d <- bangladesh
sort(unique(d$district)) #District 54 is absent. So district isn’t yet a good index variable, because it’s not contiguous. This is easy to fix. Just make a new variable that is contiguous. This is enough to do it:
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
## [51] 51 52 53 55 56 57 58 59 60 61
d$district_id <- as.integer(as.factor(d$district))
sort(unique(d$district_id)) # Now there are 60 values, contiguous integers 1 to 60. Now, focus on predicting use.contraception, clustered by district_id.
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
## [51] 51 52 53 54 55 56 57 58 59 60
Fit both (1) a traditional fixed-effects model that uses an index variable for district and (2) a multilevel model with varying intercepts for district. Plot the predicted proportions of women in each district using contraception, for both the fixed-effects model and the varying-effects model. That is, make a plot in which district ID is on the horizontal axis and expected proportion using contraception is on the vertical. Make one plot for each model, or layer them on the same plot, as you prefer. How do the models disagree? Can you explain the pattern of disagreement? In particular, can you explain the most extreme cases of disagreement, both why they happen where they do and why the models reach different inferences?
data(bangladesh)
d <- bangladesh
sort(unique(d$district))
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
## [51] 51 52 53 55 56 57 58 59 60 61
# District 54 is absent. So district isn’t yet a good index variable, because it’s not contiguous. This is easy to fix. Just make a new variable that is contiguous. This is enough to do it.
d$district_id <- as.integer(as.factor(d$district))
sort(unique(d$district_id))
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
## [51] 51 52 53 54 55 56 57 58 59 60
# Now there are 60 values, contiguous integers 1 to 60. Now, focus on predicting use.contraception, clustered by district_id.