Chapter 14 - Adventures in Covariance

This chapter extended the basic multilevel strategy of partial pooling to slopes as well as intercepts. Accomplishing this meant modeling covariation in the statistical population of parameters. The LKJcorr prior was introduced as a convenient family of priors for correlation matrices. You saw how covariance models can be applied to causal inference, using instrumental variables and the front-door criterion. Gaussian processes represent a practical method of extending the varying effects strategy to continuous dimensions of similarity, such as spatial, network, phylogenetic, or any other abstract distance between entities in the data.

Place each answer inside the code chunk (grey box). Each code chunk should contain a text response or code that completes/answers the question or activity requested. Make sure to include plots if the question requests them. Problems are labeled Easy (E), Medium (M), and Hard (H).

Finally, upon completion, name your final output .html file as YourName_ANLY505-Year-Semester.html, publish the assignment to your R Pubs account, and submit the link to Canvas. Each question is worth 5 points.

Questions

14E1. Add to the following model varying slopes on the predictor x. \[\begin{align} \ y_i ∼ Normal(μ_i, σ) \\ \ μ_i = α_{group[i]} + βx_i \\ \ α_{group} ∼ Normal(α, σ_α) \\ \ α ∼ Normal(0, 10) \\ \ β ∼ Normal(0, 1) \\ \ σ ∼ Exponential(1) \\ \ σ_α ∼ Exponential(1) \\ \end{align}\]

# Adding varying slopes means the intercepts and slopes are drawn jointly
# from a multivariate normal, with a covariance matrix S built from their
# standard deviations and a correlation matrix R. Keeping the Exponential(1)
# priors from the original model and adding only the slope structure:
#
# y_i ~ Normal(mu_i, sigma)
# mu_i = alpha_{GROUP[i]} + beta_{GROUP[i]} * x_i
# [alpha_GROUP, beta_GROUP] ~ MVNormal([alpha, beta], S)
# S = diag(sigma_alpha, sigma_beta) R diag(sigma_alpha, sigma_beta)
# alpha ~ Normal(0, 10)
# beta ~ Normal(0, 1)
# sigma ~ Exponential(1)
# sigma_alpha ~ Exponential(1)
# sigma_beta ~ Exponential(1)
# R ~ LKJcorr(2)
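
# A minimal ulam() sketch of this model, assuming the rethinking package and
# a hypothetical data frame `dat` with columns y, x, and integer index group:
# m_E1 <- ulam(
#   alist(
#     y ~ normal(mu, sigma),
#     mu <- a_group[group] + b_group[group] * x,
#     c(a_group, b_group)[group] ~ multi_normal(c(a, b), Rho, sigma_group),
#     a ~ normal(0, 10),
#     b ~ normal(0, 1),
#     sigma_group ~ exponential(1),
#     sigma ~ exponential(1),
#     Rho ~ lkj_corr(2)
#   ),
#   data = dat, chains = 4, cores = 4
# )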

14E2. Think up a context in which varying intercepts will be positively correlated with varying slopes. Provide a mechanistic explanation for the correlation.

# Example: restaurant waiting times, where the intercept is the average
# morning wait and the slope is the additional wait in the evening. The
# mechanism is a shared cause: popularity. Popular restaurants are busy at
# both times of day, so popularity raises the baseline morning wait and also
# amplifies the evening rush. Restaurants with long morning waits therefore
# gain even more wait time in the evening, producing a positive correlation
# between intercepts and slopes.
# Recall that correlation measures the strength and direction of the linear
# connection between X and Y:
# rho_{X,Y} = corr(X, Y) = cov(X, Y) / (sigma_X * sigma_Y)
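# A quick simulation sketch of this mechanism (all numbers are made up):
# latent popularity drives both the baseline morning wait (intercept) and
# the extra evening rush (slope), inducing a positive correlation.
set.seed(7)
popularity <- rnorm(50)                            # latent popularity per restaurant
a_rest <- 10 + 3 * popularity + rnorm(50, 0, 1)    # intercept: morning wait
b_rest <- 2 + 1 * popularity + rnorm(50, 0, 0.5)   # slope: added evening wait
cor(a_rest, b_rest)                                # strongly positive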

14E3. When is it possible for a varying slopes model to have fewer effective parameters (as estimated by WAIC or PSIS) than the corresponding model with fixed (unpooled) slopes? Explain.

# When the varying slopes are strongly pooled. If the adaptive prior shrinks
# the group-specific slopes hard towards the population mean -- for example,
# because there is little true variation in slopes across groups, or because
# an aggressive regularizing prior is used -- the posterior is much less
# flexible than with independent, unpooled slopes. The effective number of
# parameters (pWAIC or pPSIS) measures this flexibility, so it can be smaller
# for the varying slopes model even though it has more actual parameters.
# A small simulation sketch of the shrinkage idea follows.
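# A minimal sketch of the principle (simulated data; quap() and compare()
# from the rethinking package). For brevity this shrinks varying intercepts
# rather than slopes; the mechanism is identical. With little real variation
# across groups, the tight prior should yield the smaller pWAIC.
set.seed(2)
g <- rep(1:10, each = 5)
y <- rnorm(50, mean = 0.1 * (g / 10), sd = 1)  # little true variation across groups
d_sim <- list(y = y, g = g)
m_flat <- quap(alist(
  y ~ dnorm(mu, sigma),
  mu <- a[g],
  a[g] ~ dnorm(0, 10),   # effectively unpooled
  sigma ~ dexp(1)
), data = d_sim)
m_tight <- quap(alist(
  y ~ dnorm(mu, sigma),
  mu <- a[g],
  a[g] ~ dnorm(0, 0.1),  # aggressive shrinkage
  sigma ~ dexp(1)
), data = d_sim)
compare(m_flat, m_tight)  # expect smaller pWAIC for m_tight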

14M1. Repeat the café robot simulation from the beginning of the chapter. This time, set rho to zero, so that there is no correlation between intercepts and slopes. Plot the posterior distribution. How does the posterior distribution of the correlation reflect this change in the underlying simulation?

# set up parameters of population
library(MASS)        # mvrnorm() for simulating from the multivariate normal
library(rethinking)  # ulam(), precis(), extract.samples()
library(tidybayes)   # stat_halfeye()
library(ggplot2)
a <- 3.5 # average morning wait time
b <- (-1) # average difference afternoon wait time
sigma_a <- 1 # std dev in intercepts
sigma_b <- 0.5 # std dev in slopes
rho <- 0 # correlation between intercepts and slopes
Mu <- c(a, b)
cov_ab <- sigma_a * sigma_b * rho
Sigma <- matrix(c(sigma_a^2, cov_ab, cov_ab, sigma_b^2), ncol = 2)
# simulate observations
N_cafes <- 20
set.seed(6)
vary_effects <- mvrnorm(N_cafes, Mu, Sigma)
a_cafe <- vary_effects[, 1]
b_cafe <- vary_effects[, 2]
N_visits <- 10
afternoon <- rep(0:1, N_visits * N_cafes / 2)
cafe_id <- rep(1:N_cafes, each = N_visits)
mu <- a_cafe[cafe_id] + b_cafe[cafe_id] * afternoon
sigma <- 0.5 # std dev within cafes
wait <- rnorm(N_visits * N_cafes, mu, sigma)
# package into  data frame
d <- data.frame(cafe = cafe_id, afternoon = afternoon, wait = wait)

m_M1 <- ulam(
  alist(
    wait ~ normal(mu, sigma),
    mu <- a_cafe[cafe] + b_cafe[cafe] * afternoon,
    c(a_cafe, b_cafe)[cafe] ~ multi_normal(c(a, b), Rho, sigma_cafe),
    a ~ normal(5, 2),
    b ~ normal(-1, 0.5),
    sigma_cafe ~ exponential(1),
    sigma ~ exponential(1),
    Rho ~ lkj_corr(2)
  ),
  data = d, chains = 6, cores = 6
)


post <- extract.samples(m_M1)
ggplot() +
  stat_halfeye(aes(x = post$Rho[, 1, 2])) +
  theme_bw() +
  labs(x = "Rho")

precis(m_M1, pars = "Rho[1,2]")
##              result
## mean     0.01110476
## sd       0.23468860
## 5.5%    -0.36400650
## 94.5%    0.38982346
## n_eff 2833.41771477
## Rhat     1.00023118
# The posterior for the correlation is now centred near zero (mean ~ 0.01)
# with wide uncertainty, matching the rho = 0 used in the simulation. With
# only 20 cafes the data are not very informative about the correlation, so
# the LKJcorr(2) prior is only mildly updated, as the overlay below shows.
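# Overlaying the LKJcorr(2) prior on the posterior makes the (small) update
# visible; rlkjcorr() and dens() are from the rethinking package:
R_prior <- rlkjcorr(1e4, K = 2, eta = 2)
dens(post$Rho[, 1, 2], xlab = "Rho[1,2]")
dens(R_prior[, 1, 2], add = TRUE, lty = 2)  # dashed = prior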

14M2. Fit this multilevel model to the simulated café data: \[\begin{align} \ W_i ∼ Normal(μ_i, σ) \\ \ μ_i = α_{CAFE[i]} + β_{CAFE[i]}A_i \\ \ α_{CAFE} ∼ Normal(α, σ_α) \\ \ β_{CAFE} ∼ Normal(β, σ_β) \\ \ α ∼ Normal(0, 10) \\ \ β ∼ Normal(0, 10) \\ \ σ, σ_α, σ_β ∼ Exponential(1) \\ \end{align}\] Use WAIC to compare this model to the model from the chapter, the one that uses a multi-variate Gaussian prior. Visualize the differences between the posterior means. Explain the result.

# set up parameters of population
a <- 3.5 # average morning wait time
b <- -1 # average difference afternoon wait time
sigma_a <- 1 # std dev in intercepts
sigma_b <- 0.5 # std dev in slopes
rho <- -0.7 # correlation between intercepts and slopes
Mu <- c(a, b)
cov_ab <- sigma_a * sigma_b * rho
Sigma <- matrix(c(sigma_a^2, cov_ab, cov_ab, sigma_b^2), ncol = 2)
# simulate observations
N_cafes <- 20
set.seed(42)
vary_effects <- mvrnorm(N_cafes, Mu, Sigma)
a_cafe <- vary_effects[, 1]
b_cafe <- vary_effects[, 2]
N_visits <- 10
afternoon <- rep(0:1, N_visits * N_cafes / 2)
cafe_id <- rep(1:N_cafes, each = N_visits)
mu <- a_cafe[cafe_id] + b_cafe[cafe_id] * afternoon
sigma <- 0.5 # std dev within cafes
wait <- rnorm(N_visits * N_cafes, mu, sigma)
# package into  data frame
d <- data.frame(cafe = cafe_id, afternoon = afternoon, wait = wait)


m_M2Baseline <- ulam(
  alist(
    wait ~ normal(mu, sigma),
    mu <- a_cafe[cafe] + b_cafe[cafe] * afternoon,
    c(a_cafe, b_cafe)[cafe] ~ multi_normal(c(a, b), Rho, sigma_cafe),
    a ~ normal(5, 2),
    b ~ normal(-1, 0.5),
    sigma_cafe ~ exponential(1),
    sigma ~ exponential(1),
    Rho ~ lkj_corr(2)
  ),
  data = d, chains = 4, cores = 4, log_lik = TRUE
)


m_M2 <- ulam(
  alist(
    wait ~ dnorm(mu, sigma),
    mu <- a_cafe[cafe] + b_cafe[cafe] * afternoon,
    a_cafe[cafe] ~ dnorm(a, sigma_alpha),
    b_cafe[cafe] ~ dnorm(b, sigma_beta),
    a ~ dnorm(0, 10),
    b ~ dnorm(0, 10),
    sigma ~ dexp(1),
    sigma_alpha ~ dexp(1),
    sigma_beta ~ dexp(1)
  ),
  data = d, chains = 4, cores = 4, log_lik = TRUE
)
## Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#tail-ess
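# WAIC comparison, as the question asks (both fits above now include
# log_lik = TRUE so WAIC can be computed). With only 20 cafes the two
# models usually land within a few WAIC units of one another:
compare(m_M2Baseline, m_M2, func = WAIC)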
post_Base <- extract.samples(m_M2Baseline)
a_Base <- apply(post_Base$a_cafe, 2, mean)
b_Base <- apply(post_Base$b_cafe, 2, mean)
post_M2 <- extract.samples(m_M2)
a_M2 <- apply(post_M2$a_cafe, 2, mean)
b_M2 <- apply(post_M2$b_cafe, 2, mean)
plot(a_M2, b_M2,
  xlab = "intercept", ylab = "slope",
  pch = 16, col = rangi2, ylim = c(min(b_M2) - 0.05, max(b_M2) + 0.05),
  xlim = c(min(a_M2) - 0.1, max(a_M2) + 0.1), cex = 2
)
points(a_Base, b_Base, pch = 1, cex = 2)

# Solid points are posterior means from the new model (independent priors on
# intercepts and slopes); open circles are posterior means from the chapter's
# model with the multivariate Gaussian prior. The two largely agree near the
# centre of the plot, with increasing divergence at the fringes of the
# intercept and slope ranges. This comes down to what the underlying models
# assume. The baseline model treats intercepts and slopes as related and
# estimates a negative correlation between them, so in the lower right-hand
# and upper left-hand corners it pulls estimates towards that pattern: large
# intercepts pair with strongly negative slopes and vice versa. The
# independent-priors model cannot exploit that extra pooling.

14M3. Re-estimate the varying slopes model for the UCBadmit data, now using a non-centered parameterization. Compare the efficiency of the forms of the model, using n_eff. Which is better? Which chain sampled faster?

data(UCBadmit)
d <- UCBadmit
dat_list <- list(
  admit = d$admit,
  applications = d$applications,
  male = ifelse(d$applicant.gender == "male", 1, 0),
  dept_id = rep(1:6, each = 2)
)
str(dat_list)
## List of 4
##  $ admit       : int [1:12] 512 89 353 17 120 202 138 131 53 94 ...
##  $ applications: int [1:12] 825 108 560 25 325 593 417 375 191 393 ...
##  $ male        : num [1:12] 1 0 1 0 1 0 1 0 1 0 ...
##  $ dept_id     : int [1:12] 1 1 2 2 3 3 4 4 5 5 ...
Begin_C <- Sys.time()
m_M3 <- ulam(
  alist(
    admit ~ dbinom(applications, p),
    logit(p) <- a[dept_id] + bm[dept_id] * male,
    c(a, bm)[dept_id] ~ multi_normal(c(a_bar, bm_bar), Rho, sigma_dept),
    a_bar ~ dnorm(0, 1),
    bm_bar ~ dnorm(0, 1),
    sigma_dept ~ dexp(1),
    Rho ~ dlkjcorr(2)
  ),
  data = dat_list, chains = 4, cores = 4
)
End_C <- Sys.time()
precis(m_M3, 3)
##                      mean           sd       5.5%      94.5%     n_eff
## bm[1]         -0.75467135 2.642890e-01 -1.1974575 -0.3419394  762.8361
## bm[2]         -0.21546963 3.141762e-01 -0.7172490  0.2702785 1444.8377
## bm[3]          0.07788729 1.395628e-01 -0.1457046  0.3005848 1519.8738
## bm[4]         -0.09706014 1.389302e-01 -0.3171928  0.1276948 1760.1554
## bm[5]          0.11606582 1.793774e-01 -0.1610577  0.4022199 1726.0739
## bm[6]         -0.10986997 2.643276e-01 -0.5302442  0.3009293 1397.2613
## a[1]           1.27128456 2.477604e-01  0.8819218  1.6722364  778.7511
## a[2]           0.74963140 3.154061e-01  0.2626239  1.2484753 1436.8953
## a[3]          -0.64605733 8.582742e-02 -0.7808221 -0.5086328 1617.3253
## a[4]          -0.61501279 1.019817e-01 -0.7774726 -0.4515837 1708.0376
## a[5]          -1.13005620 1.096386e-01 -1.3054035 -0.9559542 1993.4173
## a[6]          -2.60481876 2.045530e-01 -2.9411431 -2.2759003 1799.7746
## a_bar         -0.39106882 5.307969e-01 -1.2265922  0.4709463 1698.0086
## bm_bar        -0.16176143 2.137141e-01 -0.4947019  0.1694293 1403.2870
## sigma_dept[1]  1.48653705 4.716297e-01  0.9088349  2.3531967 1292.0985
## sigma_dept[2]  0.44566910 2.144169e-01  0.1819589  0.8410725  988.9796
## Rho[1,1]       1.00000000 0.000000e+00  1.0000000  1.0000000       NaN
## Rho[1,2]      -0.32154165 3.408218e-01 -0.8124723  0.2811236 1522.8640
## Rho[2,1]      -0.32154165 3.408218e-01 -0.8124723  0.2811236 1522.8640
## Rho[2,2]       1.00000000 8.265587e-17  1.0000000  1.0000000 1811.6478
##                   Rhat4
## bm[1]         1.0035396
## bm[2]         1.0007292
## bm[3]         1.0005234
## bm[4]         0.9999004
## bm[5]         1.0005376
## bm[6]         1.0030231
## a[1]          1.0032979
## a[2]          1.0004966
## a[3]          0.9995321
## a[4]          0.9992410
## a[5]          1.0004230
## a[6]          1.0015531
## a_bar         0.9994382
## bm_bar        1.0018595
## sigma_dept[1] 1.0008335
## sigma_dept[2] 1.0015728
## Rho[1,1]            NaN
## Rho[1,2]      1.0015113
## Rho[2,1]      1.0015113
## Rho[2,2]      0.9979980
Begin_NC <- Sys.time()
m_M3NonCent <- ulam(
  alist(
    admit ~ dbinom(applications, p),
    logit(p) <- a_bar + v[dept_id, 1] + (bm_bar + v[dept_id, 2]) * male,
    transpars > matrix[dept_id, 2]:v <- compose_noncentered(sigma_dept, L_Rho, z),
    matrix[2, dept_id]:z ~ dnorm(0, 1),
    a_bar ~ dnorm(0, 1.5),
    bm_bar ~ dnorm(0, 1),
    vector[2]:sigma_dept ~ dexp(1),
    cholesky_factor_corr[2]:L_Rho ~ lkj_corr_cholesky(2)
  ),
  data = dat_list, chains = 4, cores = 4
)
End_NC <- Sys.time()
precis(m_M3NonCent, 3)
##                      mean        sd       5.5%       94.5%     n_eff     Rhat4
## z[1,1]         1.27847990 0.5262323  0.4705305  2.13326205  727.0227 1.0011087
## z[1,2]         0.88269092 0.4857079  0.1388867  1.68994558  719.7999 1.0031813
## z[1,3]        -0.12528283 0.3849925 -0.7106325  0.50669799  612.7119 1.0140975
## z[1,4]        -0.10433526 0.3866020 -0.6958663  0.51682268  601.9049 1.0154629
## z[1,5]        -0.48074322 0.4055438 -1.1153447  0.16675545  643.6182 1.0175388
## z[1,6]        -1.56200990 0.5688037 -2.4838992 -0.69872746  737.3167 1.0145782
## z[2,1]        -1.23644880 0.8164974 -2.6022559  0.05071129 1271.1801 1.0026798
## z[2,2]         0.18312727 0.8096899 -1.0820859  1.45719856 1573.6340 1.0003249
## z[2,3]         0.60192227 0.6276861 -0.3439741  1.64761142 1203.4125 1.0020469
## z[2,4]         0.10146041 0.5799030 -0.8100848  1.01529134 1291.1294 1.0015204
## z[2,5]         0.53484163 0.6684946 -0.4889643  1.64962250 1165.4873 1.0009753
## z[2,6]        -0.49127010 0.8038063 -1.8068248  0.74735922 1885.4880 1.0003286
## a_bar         -0.46792318 0.5611715 -1.3795380  0.39745615  588.0941 1.0121940
## bm_bar        -0.13969516 0.2136100 -0.4695941  0.18952741  799.7150 1.0034511
## sigma_dept[1]  1.46852953 0.4505571  0.9255641  2.31989495  833.1351 1.0030067
## sigma_dept[2]  0.44493698 0.2388977  0.1818530  0.86089979  757.1626 0.9991599
## L_Rho[1,1]     1.00000000 0.0000000  1.0000000  1.00000000       NaN       NaN
## L_Rho[1,2]     0.00000000 0.0000000  0.0000000  0.00000000       NaN       NaN
## L_Rho[2,1]    -0.31981947 0.3440124 -0.8022403  0.29104897 1687.4781 1.0001054
## L_Rho[2,2]     0.87223636 0.1365401  0.5970006  0.99912792 1190.3266 1.0002817
## v[1,1]         1.73536453 0.6011856  0.8128695  2.70174932  648.8099 1.0094272
## v[1,2]        -0.61370995 0.3128848 -1.1315494 -0.14674125 1028.2946 1.0005503
## v[2,1]         1.19934441 0.6279810  0.2169330  2.22040335  675.6765 1.0101770
## v[2,2]        -0.06176349 0.3328516 -0.5897528  0.46209909 1539.0956 1.0001901
## v[3,1]        -0.17733549 0.5673456 -1.0542767  0.75415175  593.0556 1.0109594
## v[3,2]         0.21846891 0.2395483 -0.1438607  0.60551012 1035.9533 1.0016130
## v[4,1]        -0.14832190 0.5675488 -1.0127159  0.76606136  575.1370 1.0125444
## v[4,2]         0.04602378 0.2363775 -0.3298587  0.41695086  890.5646 1.0022504
## v[5,1]        -0.65983984 0.5702765 -1.5558841  0.26053379  610.1289 1.0114797
## v[5,2]         0.25080258 0.2583262 -0.1081353  0.67319529 1159.0924 1.0024436
## v[6,1]        -2.12809664 0.5847887 -3.0186504 -1.19350046  633.0943 1.0111973
## v[6,2]         0.02683558 0.3048067 -0.4676814  0.50643307 1645.9028 1.0004084
# Centred
End_C - Begin_C
## Time difference of 2.124572 mins
# Non-Centred
End_NC - Begin_NC
## Time difference of 1.524285 mins
# Both parameterizations recover essentially the same posterior. Comparing
# n_eff parameter by parameter (see the sketch below), the two forms are
# broadly similar in this run, but the non-centered chain finished faster
# (~1.5 vs ~2.1 minutes), making it the more efficient sampler per unit time.
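# Comparing efficiency directly: precis() returns a data frame, so n_eff can
# be pulled out and the two distributions compared side by side.
neff_c <- precis(m_M3, 3)$n_eff
neff_nc <- precis(m_M3NonCent, 3)$n_eff
boxplot(list(centered = neff_c, noncentered = neff_nc), ylab = "n_eff")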

14M4. Use WAIC to compare the Gaussian process model of Oceanic tools to the models fit to the same data in Chapter 11. Pay special attention to the effective numbers of parameters, as estimated by WAIC.

data(Kline)
d <- Kline

d$P <- scale(log(d$population))
d$contact_id <- ifelse(d$contact == "high", 2, 1)

d$society <- 1:10
data(islandsDistMatrix)

dat2 <- list(T = d$total_tools, P = d$population, cid = d$contact_id)
m11.11 <- ulam(
  alist(
    T ~ dpois(lambda),
    lambda <- exp(a[cid]) * P^b[cid] / g,
    a[cid] ~ dnorm(1, 1),
    b[cid] ~ dexp(1),
    g ~ dexp(1)
  ),
  data = dat2, chains = 4, cores = 4, log_lik = TRUE
)
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#bulk-ess
## Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#tail-ess
dat_list <- list(T = d$total_tools, P = d$population, society = d$society, Dmat = islandsDistMatrix)
m14.8 <- ulam(
  alist(
    T ~ dpois(lambda),
    lambda <- (a * P^b / g) * exp(k[society]),
    vector[10]:k ~ multi_normal(0, SIGMA),
    matrix[10, 10]:SIGMA <- cov_GPL2(Dmat, etasq, rhosq, 0.01),
    c(a, b, g) ~ dexp(1),
    etasq ~ dexp(2),
    rhosq ~ dexp(0.5)
  ),
  data = dat_list, chains = 4, cores = 4, iter = 2000, log_lik = TRUE
)

compare(m11.11, m14.8)
##            WAIC        SE    dWAIC      dSE    pWAIC      weight
## m14.8  67.87559  2.370998  0.00000       NA 4.196489 0.998336383
## m11.11 80.66978 11.103427 12.79419 10.95919 5.148678 0.001663617
# m14.8 outperforms the previously best model (m11.11). The Gaussian process
# model also has fewer effective parameters (pWAIC) than the simpler model,
# a sign of the stronger adaptive regularization (shrinkage) that the
# Gaussian process prior applies to the per-society effects. The covariance
# sketch below shows where that shrinkage comes from.
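# A sketch of the posterior spatial covariance function implied by m14.8
# (cov_GPL2 corresponds to etasq * exp(-rhosq * d^2)): nearby societies
# covary strongly and so pool information, which is what regularizes the
# per-society effects.
post <- extract.samples(m14.8)
d_seq <- seq(0, 10, length.out = 100)
K_med <- sapply(d_seq, function(x) median(post$etasq * exp(-post$rhosq * x^2)))
plot(d_seq, K_med, type = "l", xlab = "distance (thousand km)",
  ylab = "covariance")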

14M5. Modify the phylogenetic distance example to use group size as the outcome and brain size as a predictor. Plot the estimates. Assuming brain size influences group size, what is your estimate of the effect? How does phylogeny influence the estimate?

library(ape)  # keep.tip(), corBrownian(), vcv(), cophenetic() for phylogenies
data(Primates301)
d <- Primates301
d$name <- as.character(d$name)
dstan <- d[complete.cases(d$group_size, d$body, d$brain), ]
spp_obs <- dstan$name
dat_list <- list(
  N_spp = nrow(dstan),
  M = standardize(log(dstan$body)),
  B = standardize(log(dstan$brain)),
  G = standardize(log(dstan$group_size)),
  Imat = diag(nrow(dstan))
)

data(Primates301_nex)
tree_trimmed <- keep.tip(Primates301_nex, spp_obs) # only keep tree that's relevant to our species
Rbm <- corBrownian(phy = tree_trimmed) # calculate expected covariance given a Brownian model
V <- vcv(Rbm) # compute expected variances and covariances
## Warning in Initialize.corPhyl(phy, dummy.df): No covariate specified, species
## will be taken as ordered in the data frame. To avoid this message, specify a
## covariate containing the species names with the 'form' argument.
Dmat <- cophenetic(tree_trimmed) # cophenetic distance matrix
dat_list$V <- V[spp_obs, spp_obs] # covariances in speciesXspecies matrix
dat_list$R <- dat_list$V / max(V) # relative covariances of speciesXspecies matrix

# Ordinary regression
m_M5Ordi <- ulam(
  alist(
    G ~ multi_normal(mu, SIGMA),
    mu <- a + bM * M + bB * B,
    matrix[N_spp, N_spp]:SIGMA <- Imat * sigma_sq,
    a ~ normal(0, 1),
    c(bM, bB) ~ normal(0, 0.5),
    sigma_sq ~ exponential(1)
  ),
  data = dat_list, chains = 4, cores = 4
)

# Brownian motion model:

m_M5Brown <- ulam(
  alist(
    G ~ multi_normal(mu, SIGMA),
    mu <- a + bM * M + bB * B,
    matrix[N_spp, N_spp]:SIGMA <- R * sigma_sq,
    a ~ normal(0, 1),
    c(bM, bB) ~ normal(0, 0.5),
    sigma_sq ~ exponential(1)
  ),
  data = dat_list, chains = 4, cores = 4
)

# Gaussian Process Model
dat_list$Dmat <- Dmat[spp_obs, spp_obs] / max(Dmat)
m_M5GP <- ulam(
  alist(
    G ~ multi_normal(mu, SIGMA),
    mu <- a + bM * M + bB * B,
    matrix[N_spp, N_spp]:SIGMA <- cov_GPL1(Dmat, etasq, rhosq, 0.01),
    a ~ normal(0, 1),
    c(bM, bB) ~ normal(0, 0.5),
    etasq ~ half_normal(1, 0.25),
    rhosq ~ half_normal(3, 0.25)
  ),
  data = dat_list, chains = 4, cores = 4
)

plot(coeftab(m_M5Ordi, m_M5Brown, m_M5GP), pars = c("bM", "bB"))

# The model which ignores phylogeny (m_M5Ordi) finds a clearly non-zero
# effect of brain size (bB) on group size. Both models which include
# phylogenetic information (m_M5Brown and m_M5GP) do not show this
# relationship: once shared ancestry is accounted for, the evidence for a
# causal effect of brain size on group size largely disappears.

14H1. Let’s revisit the Bangladesh fertility data, data(bangladesh), from the practice problems for Chapter 13. Fit a model with both varying intercepts by district_id and varying slopes of urban by district_id. You are still predicting use.contraception. Inspect the correlation between the intercepts and slopes. Can you interpret this correlation, in terms of what it tells you about the pattern of contraceptive use in the sample? It might help to plot the mean (or median) varying effect estimates for both the intercepts and slopes, by district. Then you can visualize the correlation and maybe more easily think through what it means to have a particular correlation. Plotting predicted proportion of women using contraception, with urban women on one axis and rural on the other, might also help.

library(ellipse)  # ellipse() for drawing posterior covariance contours

data(bangladesh)
d <- bangladesh
dat_list <- list(
  C = d$use.contraception,
  did = as.integer(as.factor(d$district)),
  urban = d$urban
)

m_H1 <- ulam(
  alist(
    C ~ bernoulli(p),
    logit(p) <- a[did] + b[did] * urban,
    c(a, b)[did] ~ multi_normal(c(abar, bbar), Rho, Sigma),
    abar ~ normal(0, 1),
    bbar ~ normal(0, 0.5),
    Rho ~ lkj_corr(2),
    Sigma ~ exponential(1)
  ),
  data = dat_list, chains = 4, cores = 4, iter = 4000
)

precis(m_H1)
##            mean        sd       5.5%      94.5%    n_eff    Rhat4
## abar -0.6842219 0.1003302 -0.8475050 -0.5301731 6158.562 1.000395
## bbar  0.6369826 0.1590421  0.3866908  0.8900233 4474.285 1.000145
# bbar is reliably positive, indicating that contraceptive use is more
# common in urban areas on average.

# Investigating the posterior estimates
precis(m_H1, depth = 3, pars = c("Rho", "Sigma"))
##                mean           sd       5.5%      94.5%     n_eff     Rhat4
## Rho[1,1]  1.0000000 0.000000e+00  1.0000000  1.0000000       NaN       NaN
## Rho[1,2] -0.6512997 1.696069e-01 -0.8680436 -0.3443487 1431.4457 1.0003693
## Rho[2,1] -0.6512997 1.696069e-01 -0.8680436 -0.3443487 1431.4457 1.0003693
## Rho[2,2]  1.0000000 5.972661e-17  1.0000000  1.0000000 7485.2487 0.9994999
## Sigma[1]  0.5762512 9.710360e-02  0.4308600  0.7404430 1990.0617 1.0021542
## Sigma[2]  0.7731623 1.991282e-01  0.4617857  1.0991674  967.7497 1.0034399
# The precis shows a negative correlation between the intercepts and slopes:
# Rho[1,2] (equivalently Rho[2,1]) is about -0.65, reliably below zero.


# Plotting the relationship between the varying effects to get a better
# understanding of what is happening:

post <- extract.samples(m_H1)
a <- apply(post$a, 2, mean)
b <- apply(post$b, 2, mean)
plot(a, b, xlab = "a (intercept)", ylab = "b (urban slope)")
abline(h = 0, lty = 2)
abline(v = 0, lty = 2)

R <- apply(post$Rho, 2:3, mean)
s <- apply(post$Sigma, 2, mean)
S <- diag(s) %*% R %*% diag(s)
ll <- c(0.5, 0.67, 0.89, 0.97)
# Overlay the population-level covariance ellipses implied by the posterior
# (ellipse() from the ellipse package, loaded above)
for (l in ll) {
  el <- ellipse(S, centre = c(mean(post$abar), mean(post$bbar)), level = l)
  lines(el, col = "black", lwd = 0.5)
}

# Districts with higher baseline (rural) contraceptive use have smaller
# urban slopes: where use is already common outside urban areas, moving to
# an urban area adds little, while districts with low rural use show the
# largest urban effect.

# On the outcome scale: predicted proportion of women using contraception in
# rural (urban = 0) versus urban (urban = 1) areas of each district
u0 <- inv_logit(a)
u1 <- inv_logit(a + b)
plot(u0, u1, xlim = c(0, 1), ylim = c(0, 1), xlab = "urban = 0", ylab = "urban = 1")
abline(h = 0.5, lty = 2)
abline(v = 0.5, lty = 2)