Chapter 14 - Adventures in Covariance

This chapter extended the basic multilevel strategy of partial pooling to slopes as well as intercepts. Accomplishing this meant modeling covariation in the statistical population of parameters. The LKJcorr prior was introduced as a convenient family of priors for correlation matrices. You saw how covariance models can be applied to causal inference, using instrumental variables and the front-door criterion. Gaussian processes represent a practical method of extending the varying effects strategy to continuous dimensions of similarity, such as spatial, network, phylogenetic, or any other abstract distance between entities in the data.

Place each answer inside a code chunk (grey box). Each code chunk should contain a text response or code that completes or answers the question or activity. Be sure to include plots when a question requests them. Problems are labeled Easy (E), Medium (M), and Hard (H).

Finally, upon completion, name your final output .html file as YourName_ANLY505-Year-Semester.html, publish the assignment to your RPubs account, and submit the link to Canvas. Each question is worth 5 points.

Questions

14E1. Add to the following model varying slopes on the predictor x. \[\begin{align} \ y_i ∼ Normal(μ_i, σ) \\ \ μ_i = α_{group[i]} + βx_i \\ \ α_{group} ∼ Normal(α, σ_α) \\ \ α ∼ Normal(0, 10) \\ \ β ∼ Normal(0, 1) \\ \ σ ∼ Exponential(1) \\ \ σ_α ∼ Exponential(1) \\ \end{align}\]

#y_i \sim Normal(\mu_i, \sigma)\\
#\mu_i = \alpha_{GROUP[i]} + \beta_{GROUP[i]} x_i\\
#\begin{bmatrix}
#\alpha_{GROUP} \\
#\beta_{GROUP}
#\end{bmatrix}
#\sim MVNormal\left(\begin{bmatrix}
#\alpha \\
#\beta
#\end{bmatrix}, S \right)\\
#S =
#\begin{pmatrix}
#      \sigma_{\alpha} & 0 \\
#      0 & \sigma_{\beta}
#\end{pmatrix}
#R
#\begin{pmatrix}
#      \sigma_{\alpha} & 0 \\
#      0 & \sigma_{\beta}
#\end{pmatrix}\\
#\alpha \sim Normal(0, 10)\\
#\beta \sim Normal(0, 1)\\
#\sigma \sim Exponential(1)\\
#\sigma_{\alpha} \sim Exponential(1)\\
#\sigma_{\beta} \sim Exponential(1)\\
#R \sim LKJcorr(2)
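# A minimal ulam() sketch of this model (hypothetical data list with columns
# y, x, and a group index; shown for illustration only, not fit here):
# m_E1 <- ulam(
#   alist(
#     y ~ normal(mu, sigma),
#     mu <- a_group[group] + b_group[group] * x,
#     c(a_group, b_group)[group] ~ multi_normal(c(a, b), Rho, sigma_group),
#     a ~ normal(0, 10),
#     b ~ normal(0, 1),
#     sigma_group ~ exponential(1),
#     sigma ~ exponential(1),
#     Rho ~ lkj_corr(2)
#   ),
#   data = dat, chains = 4, cores = 4
# )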

14E2. Think up a context in which varying intercepts will be positively correlated with varying slopes. Provide a mechanistic explanation for the correlation.

#Consider the cafés example: the intercepts are average morning waits and the
#slopes are the afternoon differences. If popularity drives demand all day,
#then popular cafés have long morning waits (large intercepts) and stay busy
#in the afternoon, so their afternoon difference is closer to zero (a larger,
#less negative slope), while unpopular cafés empty out. A single mechanism,
#popularity, pushing both parameters in the same direction produces a
#positive correlation between the varying intercepts and slopes.
#Recall that the correlation measures the strength and direction of the
#linear relationship between X and Y:
#\rho_{X, Y} = corr(X, Y) = \frac{cov(X,Y)}{\sigma_X\sigma_Y}
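# A quick simulation sketch of this mechanism (the means, standard
# deviations, and rho = 0.7 are made up for illustration): draw
# intercept/slope pairs and check that the sample correlation is positive.
library(MASS) # mvrnorm
rho_pos <- 0.7
Sigma_pos <- matrix(c(1^2, rho_pos * 1 * 0.5, rho_pos * 1 * 0.5, 0.5^2), ncol = 2)
ab <- mvrnorm(1e4, mu = c(3.5, 1), Sigma = Sigma_pos)
cor(ab[, 1], ab[, 2]) # close to 0.7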

14E3. When is it possible for a varying slopes model to have fewer effective parameters (as estimated by WAIC or PSIS) than the corresponding model with fixed (unpooled) slopes? Explain.

#A varying slopes model can have fewer effective parameters than the
#corresponding fixed (unpooled) model when there is little variation among
#clusters. The adaptive prior then shrinks every cluster's intercept and
#slope strongly toward the population means, producing a less flexible
#posterior, so WAIC and PSIS count fewer effective parameters than the
#nominal number. As the chapter puts it: "The [1] index in each vector is the
#varying intercept standard deviation, while the [2] and [3] are the slopes.
#While these are just posterior means, and the amount of shrinkage averages
#over the entire posterior, you can get a sense from the small values that
#shrinkage is pretty aggressive here. This is what takes the model from 56
#actual parameters to 18 effective parameters, as measured by WAIC."
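#The WAIC penalty term makes this concrete: it counts effective parameters
#as the summed posterior variances of the pointwise log-likelihoods,
#p_{WAIC} = \sum_{i=1}^{N} var_{\theta} \log p(y_i \mid \theta),
#so stronger shrinkage toward the population means reduces those variances
#and therefore the effective parameter count.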

14M1. Repeat the café robot simulation from the beginning of the chapter. This time, set rho to zero, so that there is no correlation between intercepts and slopes. Plot the posterior distribution. How does the posterior distribution of the correlation reflect this change in the underlying simulation?

library(rethinking) # ulam, extract.samples, precis, rangi2
library(MASS) # mvrnorm
library(ggplot2)
library(ggdist) # stat_halfeye
a <- 3.5 # average morning wait time
b <- (-1) # average difference afternoon wait time
sigma_a <- 1 # std dev in intercepts
sigma_b <- 0.5 # std dev in slopes
rho <- 0 # correlation between intercepts and slopes
Mu <- c(a, b)
cov_ab <- sigma_a * sigma_b * rho
Sigma <- matrix(c(sigma_a^2, cov_ab, cov_ab, sigma_b^2), ncol = 2)
# simulate observations
N_cafes <- 20
set.seed(6)
vary_effects <- mvrnorm(N_cafes, Mu, Sigma)
a_cafe <- vary_effects[, 1]
b_cafe <- vary_effects[, 2]
N_visits <- 10
afternoon <- rep(0:1, N_visits * N_cafes / 2)
cafe_id <- rep(1:N_cafes, each = N_visits)
mu <- a_cafe[cafe_id] + b_cafe[cafe_id] * afternoon
sigma <- 0.5 # std dev within cafes
wait <- rnorm(N_visits * N_cafes, mu, sigma)
# package into a data frame
d <- data.frame(cafe = cafe_id, afternoon = afternoon, wait = wait)
m_M1 <- ulam(
  alist(
    wait ~ normal(mu, sigma),
    mu <- a_cafe[cafe] + b_cafe[cafe] * afternoon,
    c(a_cafe, b_cafe)[cafe] ~ multi_normal(c(a, b), Rho, sigma_cafe),
    a ~ normal(5, 2),
    b ~ normal(-1, 0.5),
    sigma_cafe ~ exponential(1),
    sigma ~ exponential(1),
    Rho ~ lkj_corr(2)
  ),
  data = d, chains = 6, cores = 6
)
post <- extract.samples(m_M1)
ggplot() +
  stat_halfeye(aes(x = post$Rho[, 1, 2])) +
  theme_bw() +
  labs(x = "Rho")
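
# For reference, overlay the LKJcorr(2) prior on the posterior correlation to
# see how much the data updated it (rlkjcorr() is from rethinking):
R_prior <- rlkjcorr(1e4, K = 2, eta = 2)
dens(post$Rho[, 1, 2], xlab = "Rho[1,2]", lwd = 2, col = rangi2)
dens(R_prior[, 1, 2], add = TRUE, lty = 2)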

precis(m_M1, pars = "Rho[1,2]")
##              result
## mean     0.01308577
## sd       0.23733085
## 5.5%    -0.36905311
## 94.5%    0.40167809
## n_eff 2612.70739645
## Rhat     0.99993719
# The posterior for Rho[1,2] is centered almost exactly on zero (mean 0.01),
# with mass spread roughly symmetrically from about -0.37 to 0.40. The model
# therefore recovers the absence of correlation built into the simulation;
# the posterior is not a spike at zero because only 20 cafés inform it.

14M2. Fit this multilevel model to the simulated café data: \[\begin{align} \ W_i ∼ Normal(μ_i, σ) \\ \ μ_i = α_{CAFE[i]} + β_{CAFE[i]}A_i \\ \ α_{CAFE} ∼ Normal(α, σ_α) \\ \ β_{CAFE} ∼ Normal(β, σ_β) \\ \ α ∼ Normal(0, 10) \\ \ β ∼ Normal(0, 10) \\ \ σ, σ_α, σ_β ∼ Exponential(1) \\ \end{align}\] Use WAIC to compare this model to the model from the chapter, the one that uses a multi-variate Gaussian prior. Visualize the differences between the posterior means. Explain the result.

# set up parameters of population
a <- 3.5 # average morning wait time
b <- -1 # average difference afternoon wait time
sigma_a <- 1 # std dev in intercepts
sigma_b <- 0.5 # std dev in slopes
rho <- -0.7 # correlation between intercepts and slopes
Mu <- c(a, b)
cov_ab <- sigma_a * sigma_b * rho
Sigma <- matrix(c(sigma_a^2, cov_ab, cov_ab, sigma_b^2), ncol = 2)
# simulate observations
N_cafes <- 20
set.seed(42)
vary_effects <- mvrnorm(N_cafes, Mu, Sigma)
a_cafe <- vary_effects[, 1]
b_cafe <- vary_effects[, 2]
N_visits <- 10
afternoon <- rep(0:1, N_visits * N_cafes / 2)
cafe_id <- rep(1:N_cafes, each = N_visits)
mu <- a_cafe[cafe_id] + b_cafe[cafe_id] * afternoon
sigma <- 0.5 # std dev within cafes
wait <- rnorm(N_visits * N_cafes, mu, sigma)
# package into a data frame
d <- data.frame(cafe = cafe_id, afternoon = afternoon, wait = wait)
m_M2Baseline <- ulam(
  alist(
    wait ~ normal(mu, sigma),
    mu <- a_cafe[cafe] + b_cafe[cafe] * afternoon,
    c(a_cafe, b_cafe)[cafe] ~ multi_normal(c(a, b), Rho, sigma_cafe),
    a ~ normal(5, 2),
    b ~ normal(-1, 0.5),
    sigma_cafe ~ exponential(1),
    sigma ~ exponential(1),
    Rho ~ lkj_corr(2)
  ),
  data = d, chains = 4, cores = 4, log_lik = TRUE
)
m_M2 <- ulam(
  alist(
    wait ~ dnorm(mu, sigma),
    mu <- a_cafe[cafe] + b_cafe[cafe] * afternoon,
    a_cafe[cafe] ~ dnorm(a, sigma_alpha),
    b_cafe[cafe] ~ dnorm(b, sigma_beta),
    a ~ dnorm(0, 10),
    b ~ dnorm(0, 10),
    sigma ~ dexp(1),
    sigma_alpha ~ dexp(1),
    sigma_beta ~ dexp(1)
  ),
  data = d, chains = 4, cores = 4, log_lik = TRUE
)
## Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#tail-ess
post_Base <- extract.samples(m_M2Baseline)
a_Base <- apply(post_Base$a_cafe, 2, mean)
b_Base <- apply(post_Base$b_cafe, 2, mean)
post_M2 <- extract.samples(m_M2)
a_M2 <- apply(post_M2$a_cafe, 2, mean)
b_M2 <- apply(post_M2$b_cafe, 2, mean)
plot(a_M2, b_M2,
  xlab = "intercept", ylab = "slope",
  pch = 16, col = rangi2, ylim = c(min(b_M2) - 0.05, max(b_M2) + 0.05),
  xlim = c(min(a_M2) - 0.1, max(a_M2) + 0.1), cex = 2
)
points(a_Base, b_Base, pch = 1, cex = 2)
# The posterior means from the two models (filled blue = independent priors,
# open = multivariate Gaussian prior) are nearly identical. The difference is
# that the MVN prior can learn the negative correlation (rho = -0.7) between
# intercepts and slopes and use it to pool information across the two
# parameters, nudging each café's slope estimate according to its intercept;
# the independent-prior model shrinks each dimension only toward its own mean.
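# WAIC comparison, as the question requests; this relies on the
# log_lik = TRUE argument included in both ulam() calls above (a sketch,
# since exact values depend on the simulated data):
compare(m_M2Baseline, m_M2)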

14M3. Re-estimate the varying slopes model for the UCBadmit data, now using a non-centered parameterization. Compare the efficiency of the forms of the model, using n_eff. Which is better? Which chain sampled faster?

data(UCBadmit)
d <- UCBadmit
dat_list <- list(
  admit = d$admit,
  applications = d$applications,
  male = ifelse(d$applicant.gender == "male", 1, 0),
  dept_id = rep(1:6, each = 2)
)
str(dat_list)
## List of 4
##  $ admit       : int [1:12] 512 89 353 17 120 202 138 131 53 94 ...
##  $ applications: int [1:12] 825 108 560 25 325 593 417 375 191 393 ...
##  $ male        : num [1:12] 1 0 1 0 1 0 1 0 1 0 ...
##  $ dept_id     : int [1:12] 1 1 2 2 3 3 4 4 5 5 ...
Begin_C <- Sys.time()
m_M3 <- ulam(
  alist(
    admit ~ dbinom(applications, p),
    logit(p) <- a[dept_id] + bm[dept_id] * male,
    c(a, bm)[dept_id] ~ multi_normal(c(a_bar, bm_bar), Rho, sigma_dept),
    a_bar ~ dnorm(0, 1),
    bm_bar ~ dnorm(0, 1),
    sigma_dept ~ dexp(1),
    Rho ~ dlkjcorr(2)
  ),
  data = dat_list, chains = 4, cores = 4
)
## Warning: There were 1 divergent transitions after warmup. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## to find out why this is a problem and how to eliminate them.
## Warning: Examine the pairs() plot to diagnose sampling problems
End_C <- Sys.time()
precis(m_M3, 3)
##                      mean           sd       5.5%      94.5%     n_eff
## bm[1]         -0.76728926 2.673536e-01 -1.1965420 -0.3477217 1056.1396
## bm[2]         -0.21514939 3.019798e-01 -0.7070368  0.2541455 1089.9844
## bm[3]          0.08072888 1.454014e-01 -0.1548227  0.3052824 2093.6313
## bm[4]         -0.09659289 1.390897e-01 -0.3117404  0.1291158 1300.7489
## bm[5]          0.11771462 1.850484e-01 -0.1749345  0.4103478 1395.8194
## bm[6]         -0.12162066 2.537355e-01 -0.5405836  0.2774844 1584.8821
## a[1]           1.28182951 2.537503e-01  0.8708790  1.6866249 1126.4066
## a[2]           0.74551009 3.031217e-01  0.2777246  1.2410216 1085.4284
## a[3]          -0.64512741 8.655395e-02 -0.7822318 -0.5043864 2258.8750
## a[4]          -0.61699318 1.037801e-01 -0.7829737 -0.4585176 1284.1910
## a[5]          -1.12989019 1.117887e-01 -1.3102466 -0.9526836 1516.6141
## a[6]          -2.59249602 1.897452e-01 -2.8979317 -2.2773330 1706.2246
## a_bar         -0.36228767 5.168624e-01 -1.1780189  0.4939375 2009.5696
## bm_bar        -0.17051278 2.097375e-01 -0.5019040  0.1591672 1436.8215
## sigma_dept[1]  1.47087671 4.534269e-01  0.9264733  2.2956434 1634.5424
## sigma_dept[2]  0.44403707 2.146551e-01  0.1848376  0.8030274  811.6385
## Rho[1,1]       1.00000000 0.000000e+00  1.0000000  1.0000000       NaN
## Rho[1,2]      -0.33281867 3.276035e-01 -0.7940553  0.2525757 1583.9537
## Rho[2,1]      -0.33281867 3.276035e-01 -0.7940553  0.2525757 1583.9537
## Rho[2,2]       1.00000000 8.145354e-17  1.0000000  1.0000000 2000.5663
##                   Rhat4
## bm[1]         1.0011513
## bm[2]         1.0037393
## bm[3]         0.9995542
## bm[4]         1.0006212
## bm[5]         0.9995703
## bm[6]         1.0020097
## a[1]          1.0009658
## a[2]          1.0043315
## a[3]          1.0003773
## a[4]          1.0007484
## a[5]          0.9983855
## a[6]          1.0021877
## a_bar         1.0035289
## bm_bar        1.0022218
## sigma_dept[1] 1.0023342
## sigma_dept[2] 1.0030660
## Rho[1,1]            NaN
## Rho[1,2]      0.9998541
## Rho[2,1]      0.9998541
## Rho[2,2]      0.9979980
Begin_NC <- Sys.time()
m_M3NonCent <- ulam(
  alist(
    admit ~ dbinom(applications, p),
    logit(p) <- a_bar + v[dept_id, 1] + (bm_bar + v[dept_id, 2]) * male,
    transpars> matrix[dept_id, 2]:v <- compose_noncentered(sigma_dept, L_Rho, z),
    matrix[2, dept_id]:z ~ dnorm(0, 1),
    a_bar ~ dnorm(0, 1.5),
    bm_bar ~ dnorm(0, 1),
    vector[2]:sigma_dept ~ dexp(1),
    cholesky_factor_corr[2]:L_Rho ~ lkj_corr_cholesky(2)
  ),
  data = dat_list, chains = 4, cores = 4
)
End_NC <- Sys.time()
precis(m_M3NonCent, 3)
##                      mean        sd        5.5%       94.5%     n_eff     Rhat4
## z[1,1]         1.23477592 0.5535698  0.41561508  2.16812164  571.8684 1.0006700
## z[1,2]         0.85622509 0.5099890  0.07290837  1.72021855  558.7171 1.0011555
## z[1,3]        -0.14601009 0.3955301 -0.76082181  0.49990436  497.6378 1.0020836
## z[1,4]        -0.12489755 0.3970778 -0.76387643  0.54051648  513.8818 1.0009937
## z[1,5]        -0.49987532 0.4075483 -1.15619280  0.16385946  534.6285 1.0020257
## z[1,6]        -1.55990345 0.5741875 -2.49959622 -0.65410407  688.7347 1.0033130
## z[2,1]        -1.18918126 0.8094670 -2.47812398  0.04760238 1211.7757 0.9994392
## z[2,2]         0.17697210 0.8042798 -1.13615757  1.42376307 1862.8265 1.0004954
## z[2,3]         0.63997551 0.6225204 -0.30374341  1.63869613 1148.7481 1.0015636
## z[2,4]         0.13803191 0.5884489 -0.84403657  1.09140834 1415.4605 0.9999269
## z[2,5]         0.59888118 0.6648959 -0.45598216  1.68079773 1288.1685 0.9998901
## z[2,6]        -0.42993856 0.8083375 -1.73926892  0.83763542 1278.7622 0.9995814
## a_bar         -0.42924099 0.5957797 -1.41079633  0.54191276  484.2472 1.0027684
## bm_bar        -0.16531002 0.2144009 -0.50509417  0.15660782  982.2696 1.0033215
## sigma_dept[1]  1.49933212 0.4627462  0.92956692  2.37712056  689.9083 1.0021833
## sigma_dept[2]  0.43980918 0.2261581  0.16390312  0.82048380  671.2762 1.0090505
## L_Rho[1,1]     1.00000000 0.0000000  1.00000000  1.00000000       NaN       NaN
## L_Rho[1,2]     0.00000000 0.0000000  0.00000000  0.00000000       NaN       NaN
## L_Rho[2,1]    -0.31215447 0.3486588 -0.80449754  0.29718413 1593.6956 1.0013815
## L_Rho[2,2]     0.87250654 0.1407112  0.59114977  0.99917694 1329.4438 1.0014389
## v[1,1]         1.69751965 0.6330367  0.72021892  2.72731703  541.4973 1.0024496
## v[1,2]        -0.58585292 0.3093785 -1.10712630 -0.12271885 1126.5245 1.0016512
## v[2,1]         1.17160617 0.6542532  0.13254366  2.19781667  592.4121 1.0017427
## v[2,2]        -0.04783454 0.3252172 -0.56985585  0.48054848 1766.1048 1.0003284
## v[3,1]        -0.21543051 0.6019046 -1.18561590  0.72799206  485.6293 1.0031505
## v[3,2]         0.23920026 0.2402506 -0.12075227  0.62948847  918.4453 1.0047373
## v[4,1]        -0.18640503 0.6031192 -1.15052684  0.77804809  497.7053 1.0021949
## v[4,2]         0.07004324 0.2375508 -0.28630451  0.44793376 1138.9613 1.0023857
## v[5,1]        -0.70349189 0.6031298 -1.68767250  0.27159266  491.4576 1.0033947
## v[5,2]         0.28077994 0.2637041 -0.08975926  0.73920632  984.8848 1.0027847
## v[6,1]        -2.16418301 0.6266241 -3.17210267 -1.14763339  523.9875 1.0034506
## v[6,2]         0.04670092 0.3047181 -0.44809180  0.51224962 1467.0518 0.9996538
# Centred
End_C - Begin_C
## Time difference of 50.13902 secs

# Non-Centred
End_NC - Begin_NC
## Time difference of 37.60459 secs

# The non-centered chain sampled faster (about 38 s versus 50 s) and, unlike
# the centered model, produced no divergent transitions. On raw n_eff the
# centered form actually did better for the shared parameters (e.g., a_bar:
# 2010 versus 484), which is expected with this much data per department, so
# neither parameterization dominates on efficiency alone. The shorter run
# time and cleaner diagnostics favor the non-centered form here.
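
# A direct look at sampling efficiency (a sketch; precis objects behave like
# data frames, so n_eff can be extracted as a column):
neff_c <- precis(m_M3, 3)$n_eff
neff_nc <- precis(m_M3NonCent, 3)$n_eff
boxplot(list(centered = neff_c, noncentered = neff_nc), ylab = "n_eff")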

14M4. Use WAIC to compare the Gaussian process model of Oceanic tools to the models fit to the same data in Chapter 11. Pay special attention to the effective numbers of parameters, as estimated by WAIC.

data(Kline)
d <- Kline
# Chapter 11 stuff
d$P <- scale(log(d$population))
d$contact_id <- ifelse(d$contact == "high", 2, 1)
# Chapter 14 stuff
d$society <- 1:10
data(islandsDistMatrix)
## Chapter 11 Model
dat2 <- list(T = d$total_tools, P = d$population, cid = d$contact_id)
m11.11 <- ulam(
  alist(
    T ~ dpois(lambda),
    lambda <- exp(a[cid]) * P^b[cid] / g,
    a[cid] ~ dnorm(1, 1),
    b[cid] ~ dexp(1),
    g ~ dexp(1)
  ),
  data = dat2, chains = 4, cores = 4, log_lik = TRUE
)
## Chapter 14 Model
dat_list <- list(T = d$total_tools, P = d$population, society = d$society, Dmat = islandsDistMatrix)
m14.8 <- ulam(
  alist(
    T ~ dpois(lambda),
    lambda <- (a * P^b / g) * exp(k[society]),
    vector[10]:k ~ multi_normal(0, SIGMA),
    matrix[10, 10]:SIGMA <- cov_GPL2(Dmat, etasq, rhosq, 0.01),
    c(a, b, g) ~ dexp(1),
    etasq ~ dexp(2),
    rhosq ~ dexp(0.5)
  ),
  data = dat_list, chains = 4, cores = 4, iter = 2000, log_lik = TRUE
)
compare(m11.11, m14.8)
##            WAIC        SE    dWAIC      dSE    pWAIC      weight
## m14.8  67.11598  2.299248  0.00000       NA 3.876694 0.998722564
## m11.11 80.43922 11.350980 13.32324 11.41478 5.065518 0.001277436
# The Gaussian process model m14.8 wins the comparison and, despite carrying
# ten varying society effects plus the GP hyperparameters, has fewer
# effective parameters (pWAIC 3.9 versus 5.1 for m11.11, which has only five
# actual parameters). The GP prior pools aggressively across societies via
# the distance covariance, so the varying effects are heavily regularized
# and the model is effectively simpler.
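# The pointwise WAIC penalties show where the effective parameters come from
# (a sketch using rethinking's pointwise output):
WAIC(m14.8, pointwise = TRUE)$penalty
WAIC(m11.11, pointwise = TRUE)$penalty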

14M5. Modify the phylogenetic distance example to use group size as the outcome and brain size as a predictor. Plot the estimates. Assuming brain size influences group size, what is your estimate of the effect? How does phylogeny influence the estimate?

library(ape) # keep.tip, corBrownian, vcv, cophenetic for the phylogeny
data(Primates301)
d <- Primates301
d$name <- as.character(d$name)
dstan <- d[complete.cases(d$group_size, d$body, d$brain), ]
spp_obs <- dstan$name
dat_list <- list(
  N_spp = nrow(dstan),
  M = standardize(log(dstan$body)),
  B = standardize(log(dstan$brain)),
  G = standardize(log(dstan$group_size)),
  Imat = diag(nrow(dstan))
)
data(Primates301_nex)
tree_trimmed <- keep.tip(Primates301_nex, spp_obs) # only keep tree that's relevant to our species
Rbm <- corBrownian(phy = tree_trimmed) # calculate expected covariance given a Brownian model
V <- vcv(Rbm) # compute expected variances and covariances
## Warning in Initialize.corPhyl(phy, dummy.df): No covariate specified, species
## will be taken as ordered in the data frame. To avoid this message, specify a
## covariate containing the species names with the 'form' argument.
Dmat <- cophenetic(tree_trimmed) # cophenetic distance matrix
dat_list$V <- V[spp_obs, spp_obs] # covariances in speciesXspecies matrix
dat_list$R <- dat_list$V / max(V) # relative covariances of speciesXspecies matrix
m_M5Ordi <- ulam(
  alist(
    G ~ multi_normal(mu, SIGMA),
    mu <- a + bM * M + bB * B,
    matrix[N_spp, N_spp]:SIGMA <- Imat * sigma_sq,
    a ~ normal(0, 1),
    c(bM, bB) ~ normal(0, 0.5),
    sigma_sq ~ exponential(1)
  ),
  data = dat_list, chains = 4, cores = 4
)
m_M5Brown <- ulam(
  alist(
    G ~ multi_normal(mu, SIGMA),
    mu <- a + bM * M + bB * B,
    matrix[N_spp, N_spp]:SIGMA <- R * sigma_sq,
    a ~ normal(0, 1),
    c(bM, bB) ~ normal(0, 0.5),
    sigma_sq ~ exponential(1)
  ),
  data = dat_list, chains = 4, cores = 4
)
dat_list$Dmat <- Dmat[spp_obs, spp_obs] / max(Dmat)
m_M5GP <- ulam(
  alist(
    G ~ multi_normal(mu, SIGMA),
    mu <- a + bM * M + bB * B,
    matrix[N_spp, N_spp]:SIGMA <- cov_GPL1(Dmat, etasq, rhosq, 0.01),
    a ~ normal(0, 1),
    c(bM, bB) ~ normal(0, 0.5),
    etasq ~ half_normal(1, 0.25),
    rhosq ~ half_normal(3, 0.25)
  ),
  data = dat_list, chains = 4, cores = 4
)
plot(coeftab(m_M5Ordi, m_M5Brown, m_M5GP), pars = c("bM", "bB"))
# In the ordinary regression (identity covariance), bB is strongly positive:
# species with bigger brains tend to live in bigger groups. Adding
# phylogenetic covariance, via either the Brownian motion model or the L1
# Gaussian process, shrinks bB toward zero, because closely related species
# resemble one another in both brain size and group size. Assuming brain
# size does influence group size, the estimated effect is therefore much
# smaller and less certain once shared ancestry is accounted for.
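# Overlaying the bB posteriors makes the change across covariance structures
# easier to see (sketch; solid = ordinary, dashed = Brownian, dotted = GP):
dens(extract.samples(m_M5Ordi)$bB, lwd = 2, xlab = "bB (effect of brain size)")
dens(extract.samples(m_M5Brown)$bB, add = TRUE, lty = 2)
dens(extract.samples(m_M5GP)$bB, add = TRUE, lty = 3)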

14H1. Let’s revisit the Bangladesh fertility data, data(bangladesh), from the practice problems for Chapter 13. Fit a model with both varying intercepts by district_id and varying slopes of urban by district_id. You are still predicting use.contraception. Inspect the correlation between the intercepts and slopes. Can you interpret this correlation, in terms of what it tells you about the pattern of contraceptive use in the sample? It might help to plot the mean (or median) varying effect estimates for both the intercepts and slopes, by district. Then you can visualize the correlation and maybe more easily think through what it means to have a particular correlation. Plotting predicted proportion of women using contraception, with urban women on one axis and rural on the other, might also help.

library(ellipse) # ellipse() for drawing posterior contours below
data(bangladesh)
d <- bangladesh
dat_list <- list(
  C = d$use.contraception,
  did = as.integer(as.factor(d$district)),
  urban = d$urban
)
m_H1 <- ulam(
  alist(
    C ~ bernoulli(p),
    logit(p) <- a[did] + b[did] * urban,
    c(a, b)[did] ~ multi_normal(c(abar, bbar), Rho, Sigma),
    abar ~ normal(0, 1),
    bbar ~ normal(0, 0.5),
    Rho ~ lkj_corr(2),
    Sigma ~ exponential(1)
  ),
  data = dat_list, chains = 4, cores = 4, iter = 4000
)
precis(m_H1)
##            mean         sd       5.5%      94.5%    n_eff     Rhat4
## abar -0.6868581 0.09928555 -0.8473043 -0.5313484 6015.311 0.9999369
## bbar  0.6409046 0.15988868  0.3859150  0.8960423 4644.005 0.9999363
precis(m_H1, depth = 3, pars = c("Rho", "Sigma"))
##                mean           sd       5.5%      94.5%     n_eff     Rhat4
## Rho[1,1]  1.0000000 0.000000e+00  1.0000000  1.0000000       NaN       NaN
## Rho[1,2] -0.6488515 1.640059e-01 -0.8611723 -0.3451257 1565.7005 1.0011707
## Rho[2,1] -0.6488515 1.640059e-01 -0.8611723 -0.3451257 1565.7005 1.0011707
## Rho[2,2]  1.0000000 5.868555e-17  1.0000000  1.0000000 6913.8832 0.9994999
## Sigma[1]  0.5733948 9.666828e-02  0.4289012  0.7355289 2096.9227 1.0013255
## Sigma[2]  0.7701551 2.068129e-01  0.4595600  1.1100284  883.0168 1.0026664
post <- extract.samples(m_H1)
a <- apply(post$a, 2, mean)
b <- apply(post$b, 2, mean)
plot(a, b, xlab = "a (intercept)", ylab = "b (urban slope)")
abline(h = 0, lty = 2)
abline(v = 0, lty = 2)
R <- apply(post$Rho, 2:3, mean)
s <- apply(post$Sigma, 2, mean)
S <- diag(s) %*% R %*% diag(s)
ll <- c(0.5, 0.67, 0.89, 0.97)
for (l in ll) {
  el <- ellipse(S, centre = c(mean(post$abar), mean(post$bbar)), level = l)
  lines(el, col = "black", lwd = 0.5)
}

u0 <- inv_logit(a)
u1 <- inv_logit(a + b)
plot(u0, u1, xlim = c(0, 1), ylim = c(0, 1), xlab = "urban = 0", ylab = "urban = 1")
abline(h = 0.5, lty = 2)
abline(v = 0.5, lty = 2)
# The intercepts and slopes are strongly negatively correlated (Rho[1,2] has
# posterior mean -0.65): districts where rural contraceptive use is already
# high receive a smaller urban boost, and districts with low rural use
# receive a larger one. This negative correlation keeps urban use much more
# uniform across districts (clustered near 0.5 in the plot above) than it
# would be if intercepts and slopes varied independently, so most of the
# district-to-district variation is among rural women.
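# On the outcome scale, the negative correlation implies that districts with
# the lowest rural use get the largest urban-rural boost (a quick sketch):
plot(u0, u1 - u0,
  xlab = "P(use | rural)", ylab = "P(use | urban) - P(use | rural)"
)
abline(h = 0, lty = 2)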