Chapter 14 - Adventures in Covariance

This chapter extended the basic multilevel strategy of partial pooling to slopes as well as intercepts. Accomplishing this meant modeling covariation in the statistical population of parameters. The LKJcorr prior was introduced as a convenient family of priors for correlation matrices. You saw how covariance models can be applied to causal inference, using instrumental variables and the front-door criterion. Gaussian processes represent a practical method of extending the varying effects strategy to continuous dimensions of similarity, such as spatial, network, phylogenetic, or any other abstract distance between entities in the data.

Place each answer inside the code chunk (grey box). The code chunks should contain a text response or code that completes/answers the question or activity requested. Make sure to include plots if the question requests them. Problems are labeled Easy (E), Medium (M), and Hard (H).

Finally, upon completion, name your final output .html file as YourName_ANLY505-Year-Semester.html, publish the assignment to your RPubs account, and submit the link to Canvas. Each question is worth 5 points.

Questions

14E1. Add to the following model varying slopes on the predictor x. \[\begin{align} y_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha_{group[i]} + \beta x_i \\ \alpha_{group} &\sim \text{Normal}(\alpha, \sigma_\alpha) \\ \alpha &\sim \text{Normal}(0, 10) \\ \beta &\sim \text{Normal}(0, 1) \\ \sigma &\sim \text{Exponential}(1) \\ \sigma_\alpha &\sim \text{Exponential}(1) \end{align}\]

#The outcome distribution doesn't change:
#  y_i ~ Normal(mu_i, sigma)
#In the linear model, each group now gets its own slope as well:
#  mu_i = alpha_{group[i]} + beta_{group[i]} * x_i

#alpha is the average intercept and beta is the average slope; they are the
#means of the population of varying effects. The pairs (alpha_group,
#beta_group) are drawn jointly from a multivariate normal, MVNormal(), with
#a covariance matrix S built from the standard deviations and a correlation
#matrix R:
#  [alpha_group, beta_group] ~ MVNormal([alpha, beta], S)
#  S = diag(sigma_alpha, sigma_beta) R diag(sigma_alpha, sigma_beta)

#Priors, keeping those stated in the question and adding the two new ones:
#  alpha       ~ Normal(0, 10)
#  beta        ~ Normal(0, 1)
#  sigma       ~ Exponential(1)
#  sigma_alpha ~ Exponential(1)
#  sigma_beta  ~ Exponential(1)
#  R           ~ LKJcorr(2)
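#A minimal ulam() sketch of this model (an illustrative sketch only: it
#assumes a data list `dat` with outcome y, predictor x, and an integer
#group index, which the question does not provide):
library(rethinking)
m_E1 <- ulam(
  alist(
    y ~ normal(mu, sigma),
    mu <- a_group[group] + b_group[group] * x,
    c(a_group, b_group)[group] ~ multi_normal(c(a, b), Rho, sigma_group),
    a ~ normal(0, 10),
    b ~ normal(0, 1),
    sigma ~ exponential(1),
    sigma_group ~ exponential(1),
    Rho ~ lkj_corr(2)
  ),
  data = dat, chains = 4, cores = 4
)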

14E2. Think up a context in which varying intercepts will be positively correlated with varying slopes. Provide a mechanistic explanation for the correlation.

#Many settings meet this criterion (an ant colony is one). Consider apartment buildings and their water supply: larger, taller buildings use more water overall (higher intercepts), and because it is more efficient to supply one tall building, their use also grows faster with demand (steeper slopes). Buildings with high baseline use therefore tend to have steep slopes as well, producing a positive correlation between varying intercepts and varying slopes. This of course simplifies away many real-world factors.
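#A quick simulation sketch of this story (the names and numbers below are
#illustrative assumptions, not estimates from any data):
library(MASS) # mvrnorm()
set.seed(14)
Mu_bldg <- c(10, 2)  # average baseline use, average slope
sd_bldg <- c(2, 0.5) # std devs of intercepts and slopes
rho_pos <- 0.7       # assumed positive correlation
S_bldg <- diag(sd_bldg) %*% matrix(c(1, rho_pos, rho_pos, 1), 2) %*% diag(sd_bldg)
effects <- mvrnorm(50, Mu_bldg, S_bldg)   # 50 simulated buildings
round(cor(effects[, 1], effects[, 2]), 2) # recovers roughly +0.7
plot(effects, xlab = "intercept (baseline use)", ylab = "slope")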

14E3. When is it possible for a varying slopes model to have fewer effective parameters (as estimated by WAIC or PSIS) than the corresponding model with fixed (unpooled) slopes? Explain.

#If there is little variation among clusters, the adaptive prior induces very strong shrinkage. So although the varying slopes model contains more actual parameters in the posterior, it can end up less flexible in fitting the data, because adaptive regularization pulls the cluster-level estimates strongly toward the population mean. The effective number of parameters therefore decreases, and can drop below that of the unpooled model.

#For instance, see the comparison of models m13.1 and m13.2 from Chapter 13, refit in the sketch below.
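#A sketch of that comparison (the reedfrog models as written in Chapter 13
#of the book; m13.1 has fixed, unpooled tank intercepts, m13.2 pools them):
library(rethinking)
data(reedfrogs)
rf <- reedfrogs
dat_rf <- list(S = rf$surv, N = rf$density, tank = 1:nrow(rf))
m13.1 <- ulam(
  alist(
    S ~ dbinom(N, p),
    logit(p) <- a[tank],
    a[tank] ~ dnorm(0, 1.5)
  ),
  data = dat_rf, chains = 4, cores = 4, log_lik = TRUE
)
m13.2 <- ulam(
  alist(
    S ~ dbinom(N, p),
    logit(p) <- a[tank],
    a[tank] ~ dnorm(a_bar, sigma),
    a_bar ~ dnorm(0, 1.5),
    sigma ~ dexp(1)
  ),
  data = dat_rf, chains = 4, cores = 4, log_lik = TRUE
)
compare(m13.1, m13.2)
#Expect the pooled model m13.2 to show the smaller pWAIC despite having two
#more actual parameters.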

14M1. Repeat the café robot simulation from the beginning of the chapter. This time, set rho to zero, so that there is no correlation between intercepts and slopes. Plot the posterior distribution. How does the posterior distribution of the correlation reflect this change in the underlying simulation?

library(rethinking) # ulam(), extract.samples(), precis()
library(MASS)       # mvrnorm()
library(ggplot2)
library(tidybayes)  # stat_halfeye()
# set up parameters of population
a <- 3.5 # average morning wait time
b <- (-1) # average difference afternoon wait time
sigma_a <- 1 # std dev in intercepts
sigma_b <- 0.5 # std dev in slopes
rho <- 0 # correlation between intercepts and slopes
Mu <- c(a, b)
cov_ab <- sigma_a * sigma_b * rho
Sigma <- matrix(c(sigma_a^2, cov_ab, cov_ab, sigma_b^2), ncol = 2)
# simulate observations
N_cafes <- 20
set.seed(6)
vary_effects <- mvrnorm(N_cafes, Mu, Sigma)
a_cafe <- vary_effects[, 1]
b_cafe <- vary_effects[, 2]
N_visits <- 10
afternoon <- rep(0:1, N_visits * N_cafes / 2)
cafe_id <- rep(1:N_cafes, each = N_visits)
mu <- a_cafe[cafe_id] + b_cafe[cafe_id] * afternoon
sigma <- 0.5 # std dev within cafes
wait <- rnorm(N_visits * N_cafes, mu, sigma)
# package into  data frame
d <- data.frame(cafe = cafe_id, afternoon = afternoon, wait = wait)

m_M1 <- ulam(
  alist(
    wait ~ normal(mu, sigma),
    mu <- a_cafe[cafe] + b_cafe[cafe] * afternoon,
    c(a_cafe, b_cafe)[cafe] ~ multi_normal(c(a, b), Rho, sigma_cafe),
    a ~ normal(5, 2),
    b ~ normal(-1, 0.5),
    sigma_cafe ~ exponential(1),
    sigma ~ exponential(1),
    Rho ~ lkj_corr(2)
  ),
  data = d, chains = 6, cores = 6
)


post <- extract.samples(m_M1)
ggplot() +
  stat_halfeye(aes(x = post$Rho[, 1, 2])) +
  theme_bw() +
  labs(x = "Rho")

precis(m_M1, pars = "Rho[1,2]")
##              result
## mean     0.01454989
## sd       0.23618503
## 5.5%    -0.37040029
## 94.5%    0.40113780
## n_eff 2522.90934355
## Rhat     1.00043502
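#The posterior for Rho[1,2] is centered on zero (mean ≈ 0.01) and spread widely across both signs, so it faithfully reflects the rho = 0 used in the simulation: the model finds no evidence of correlation between the varying intercepts and slopes, and the LKJcorr(2) prior keeps mass away from the extremes near ±1.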

14M2. Fit this multilevel model to the simulated café data: \[\begin{align} W_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha_{CAFE[i]} + \beta_{CAFE[i]} A_i \\ \alpha_{CAFE} &\sim \text{Normal}(\alpha, \sigma_\alpha) \\ \beta_{CAFE} &\sim \text{Normal}(\beta, \sigma_\beta) \\ \alpha &\sim \text{Normal}(0, 10) \\ \beta &\sim \text{Normal}(0, 10) \\ \sigma, \sigma_\alpha, \sigma_\beta &\sim \text{Exponential}(1) \end{align}\] Use WAIC to compare this model to the model from the chapter, the one that uses a multivariate Gaussian prior. Visualize the differences between the posterior means. Explain the result.

# set up parameters of population
a <- 3.5 # average morning wait time
b <- -1 # average difference afternoon wait time
sigma_a <- 1 # std dev in intercepts
sigma_b <- 0.5 # std dev in slopes
rho <- -0.7 # correlation between intercepts and slopes
Mu <- c(a, b)
cov_ab <- sigma_a * sigma_b * rho
Sigma <- matrix(c(sigma_a^2, cov_ab, cov_ab, sigma_b^2), ncol = 2)
# simulate observations
N_cafes <- 20
set.seed(42)
vary_effects <- mvrnorm(N_cafes, Mu, Sigma)
a_cafe <- vary_effects[, 1]
b_cafe <- vary_effects[, 2]
N_visits <- 10
afternoon <- rep(0:1, N_visits * N_cafes / 2)
cafe_id <- rep(1:N_cafes, each = N_visits)
mu <- a_cafe[cafe_id] + b_cafe[cafe_id] * afternoon
sigma <- 0.5 # std dev within cafes
wait <- rnorm(N_visits * N_cafes, mu, sigma)
# package into  data frame
d <- data.frame(cafe = cafe_id, afternoon = afternoon, wait = wait)

m_M2Baseline <- ulam(
  alist(
    wait ~ normal(mu, sigma),
    mu <- a_cafe[cafe] + b_cafe[cafe] * afternoon,
    c(a_cafe, b_cafe)[cafe] ~ multi_normal(c(a, b), Rho, sigma_cafe),
    a ~ normal(5, 2),
    b ~ normal(-1, 0.5),
    sigma_cafe ~ exponential(1),
    sigma ~ exponential(1),
    Rho ~ lkj_corr(2)
  ),
  data = d, chains = 4, cores = 4, log_lik = TRUE
)

m_M2 <- ulam(
  alist(
    wait ~ dnorm(mu, sigma),
    mu <- a_cafe[cafe] + b_cafe[cafe] * afternoon,
    a_cafe[cafe] ~ dnorm(a, sigma_alpha),
    b_cafe[cafe] ~ dnorm(b, sigma_beta),
    a ~ dnorm(0, 10),
    b ~ dnorm(0, 10),
    sigma ~ dexp(1),
    sigma_alpha ~ dexp(1),
    sigma_beta ~ dexp(1)
  ),
  data = d, chains = 4, cores = 4, log_lik = TRUE
)
## Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#tail-ess
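#The WAIC comparison the question asks for (this requires the
#log_lik = TRUE argument added to both ulam() calls above):
compare(m_M2Baseline, m_M2)
#Expect the two models to land close together: the multivariate model can
#do slightly better because the simulated data contain a real correlation
#(rho = -0.7), but with only 20 cafés the difference is typically small
#relative to its standard error.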
post_Base <- extract.samples(m_M2Baseline)
a_Base <- apply(post_Base$a_cafe, 2, mean)
b_Base <- apply(post_Base$b_cafe, 2, mean)
post_M2 <- extract.samples(m_M2)
a_M2 <- apply(post_M2$a_cafe, 2, mean)
b_M2 <- apply(post_M2$b_cafe, 2, mean)
plot(a_M2, b_M2,
  xlab = "intercept", ylab = "slope",
  pch = 16, col = rangi2, ylim = c(min(b_M2) - 0.05, max(b_M2) + 0.05),
  xlim = c(min(a_M2) - 0.1, max(a_M2) + 0.1), cex = 2
)
points(a_Base, b_Base, pch = 1, cex = 2)

#In this plot, the filled circles are posterior means from the new model with independent priors on intercepts and slopes, while the open circles come from the baseline model, which explicitly models their correlation. The two sets of estimates are very similar, diverging mainly at the fringes of the intercept and slope ranges. The baseline model assumes slopes and intercepts covary and estimates a negative correlation between them; its effect is visible in the lower-right and upper-left corners of the plot, where that assumption pulls large intercepts toward strongly negative slopes and vice versa.

14M3. Re-estimate the varying slopes model for the UCBadmit data, now using a non-centered parameterization. Compare the efficiency of the forms of the model, using n_eff. Which is better? Which chain sampled faster?

data(UCBadmit)
d <- UCBadmit
dat_list <- list(
  admit = d$admit,
  applications = d$applications,
  male = ifelse(d$applicant.gender == "male", 1, 0),
  dept_id = rep(1:6, each = 2)
)
str(dat_list)
## List of 4
##  $ admit       : int [1:12] 512 89 353 17 120 202 138 131 53 94 ...
##  $ applications: int [1:12] 825 108 560 25 325 593 417 375 191 393 ...
##  $ male        : num [1:12] 1 0 1 0 1 0 1 0 1 0 ...
##  $ dept_id     : int [1:12] 1 1 2 2 3 3 4 4 5 5 ...
Begin_C <- Sys.time()
m_M3 <- ulam(
  alist(
    admit ~ dbinom(applications, p),
    logit(p) <- a[dept_id] + bm[dept_id] * male,
    c(a, bm)[dept_id] ~ multi_normal(c(a_bar, bm_bar), Rho, sigma_dept),
    a_bar ~ dnorm(0, 1),
    bm_bar ~ dnorm(0, 1),
    sigma_dept ~ dexp(1),
    Rho ~ dlkjcorr(2)
  ),
  data = dat_list, chains = 4, cores = 4
)
End_C <- Sys.time()
precis(m_M3, 3)
##                      mean           sd       5.5%      94.5%     n_eff
## bm[1]         -0.76077107 2.689968e-01 -1.2119099 -0.3424693  664.2030
## bm[2]         -0.20713502 3.102449e-01 -0.7037161  0.2817647 1044.4764
## bm[3]          0.07557000 1.357233e-01 -0.1447811  0.2902668 1305.1417
## bm[4]         -0.09718324 1.427211e-01 -0.3242062  0.1381112 1441.2028
## bm[5]          0.11047109 1.754072e-01 -0.1586052  0.4009950 1337.7577
## bm[6]         -0.12090857 2.650445e-01 -0.5448649  0.3006777 1278.9732
## a[1]           1.27476609 2.549336e-01  0.8874646  1.6901712  701.8718
## a[2]           0.73834762 3.067492e-01  0.2697131  1.2296623 1042.7487
## a[3]          -0.64544190 8.427618e-02 -0.7827854 -0.5125194 1561.3394
## a[4]          -0.61308256 1.048815e-01 -0.7779361 -0.4424686 1400.3571
## a[5]          -1.12526215 1.106650e-01 -1.3126155 -0.9579522 1459.4540
## a[6]          -2.59957763 1.985283e-01 -2.9275371 -2.2921102 1426.2309
## a_bar         -0.36817167 5.326814e-01 -1.2233163  0.5171263 1494.8974
## bm_bar        -0.16200247 2.055933e-01 -0.4748406  0.1523336  855.1706
## sigma_dept[1]  1.47455786 4.737220e-01  0.9121865  2.3558989 1319.8501
## sigma_dept[2]  0.44433549 2.185057e-01  0.1830373  0.8308194  498.3918
## Rho[1,1]       1.00000000 0.000000e+00  1.0000000  1.0000000       NaN
## Rho[1,2]      -0.32729515 3.299758e-01 -0.7955956  0.2743432 1008.1760
## Rho[2,1]      -0.32729515 3.299758e-01 -0.7955956  0.2743432 1008.1760
## Rho[2,2]       1.00000000 8.220706e-17  1.0000000  1.0000000 1774.7292
##                   Rhat4
## bm[1]         1.0040723
## bm[2]         1.0023058
## bm[3]         1.0011827
## bm[4]         0.9992158
## bm[5]         1.0014895
## bm[6]         1.0009593
## a[1]          1.0038580
## a[2]          1.0030785
## a[3]          1.0015171
## a[4]          1.0009253
## a[5]          1.0033164
## a[6]          1.0001368
## a_bar         0.9990572
## bm_bar        1.0013324
## sigma_dept[1] 1.0051760
## sigma_dept[2] 1.0070462
## Rho[1,1]            NaN
## Rho[1,2]      1.0001509
## Rho[2,1]      1.0001509
## Rho[2,2]      0.9979980
#The precis() output above is for the centred parameterization. Now the
#same model in non-centred form, using ulam()'s helper functions
#compose_noncentered() and lkj_corr_cholesky():

Begin_NC <- Sys.time()
m_M3NonCent <- ulam(
  alist(
    admit ~ dbinom(applications, p),
    logit(p) <- a_bar + v[dept_id, 1] + (bm_bar + v[dept_id, 2]) * male,
    transpars > matrix[dept_id, 2]:v <- compose_noncentered(sigma_dept, L_Rho, z),
    matrix[2, dept_id]:z ~ dnorm(0, 1),
    a_bar ~ dnorm(0, 1.5),
    bm_bar ~ dnorm(0, 1),
    vector[2]:sigma_dept ~ dexp(1),
    cholesky_factor_corr[2]:L_Rho ~ lkj_corr_cholesky(2)
  ),
  data = dat_list, chains = 4, cores = 4
)
End_NC <- Sys.time()
precis(m_M3NonCent, 3)
##                      mean        sd        5.5%       94.5%     n_eff     Rhat4
## z[1,1]         1.22542388 0.5391820  0.40555775  2.05870796  707.9165 1.0092190
## z[1,2]         0.83931665 0.5056397  0.05090741  1.66379788  734.8407 1.0080022
## z[1,3]        -0.16805614 0.3916197 -0.81224766  0.45412317  620.1539 1.0055018
## z[1,4]        -0.14665291 0.3947996 -0.79637720  0.47908569  628.1046 1.0050752
## z[1,5]        -0.52399546 0.4130560 -1.19892216  0.11690376  636.9829 1.0040752
## z[1,6]        -1.59190547 0.5710791 -2.51839363 -0.72418238  650.9390 1.0047235
## z[2,1]        -1.17876327 0.7987806 -2.48872583  0.08218495 1419.1524 0.9989190
## z[2,2]         0.17629119 0.8082085 -1.15743665  1.42056284 1880.4488 1.0005226
## z[2,3]         0.65220091 0.6343446 -0.29639132  1.70023794 1272.9438 0.9997852
## z[2,4]         0.15415849 0.5834741 -0.79018006  1.04957847 1312.0445 0.9989197
## z[2,5]         0.60615502 0.6511364 -0.40219395  1.67098396 1408.5758 1.0005314
## z[2,6]        -0.42215846 0.7785862 -1.67436543  0.81388210 1772.6828 0.9990618
## a_bar         -0.40423551 0.5853072 -1.32477104  0.54864882  606.3930 1.0050635
## bm_bar        -0.17017941 0.2231337 -0.52511790  0.15135265  712.5496 1.0001783
## sigma_dept[1]  1.49240073 0.4670208  0.93337157  2.36630314  702.4977 1.0101129
## sigma_dept[2]  0.45564753 0.2334685  0.19047522  0.86857434  693.1674 1.0060142
## L_Rho[1,1]     1.00000000 0.0000000  1.00000000  1.00000000       NaN       NaN
## L_Rho[1,2]     0.00000000 0.0000000  0.00000000  0.00000000       NaN       NaN
## L_Rho[2,1]    -0.30132235 0.3436926 -0.78163374  0.32192560 1535.8899 1.0032612
## L_Rho[2,2]     0.87987048 0.1302909  0.62225208  0.99889454 1228.1889 1.0007631
## v[1,1]         1.67980597 0.6216362  0.64666329  2.64664896  643.2641 1.0035401
## v[1,2]        -0.59066096 0.3068635 -1.11016745 -0.15318895  865.1197 1.0017192
## v[2,1]         1.14630459 0.6554917  0.08310716  2.18132242  742.3470 1.0034243
## v[2,2]        -0.04125271 0.3381682 -0.57390649  0.48564169 1355.9458 1.0005045
## v[3,1]        -0.24178548 0.5889649 -1.19233355  0.69187660  612.7770 1.0046587
## v[3,2]         0.25207021 0.2560060 -0.10702516  0.68040337  874.9696 0.9990493
## v[4,1]        -0.21190104 0.5923362 -1.17897128  0.71620704  625.5698 1.0043336
## v[4,2]         0.07682890 0.2438733 -0.27770506  0.46730510  951.2256 1.0003892
## v[5,1]        -0.73000928 0.5949261 -1.67658229  0.16908360  628.5248 1.0049695
## v[5,2]         0.29074280 0.2707199 -0.08478665  0.73143517  824.3965 1.0004135
## v[6,1]        -2.19772504 0.6019871 -3.14077716 -1.25570459  639.1251 1.0067920
## v[6,2]         0.05695641 0.3090386 -0.42240658  0.52876914 1195.0737 1.0011860
#For most parameters (compare a_bar: ~1495 vs ~606), the number of effective samples (n_eff) is higher for the centred model (m_M3), so it sampled more efficiently per iteration on these data. Non-centred forms tend to help more when the data are weak and the posterior develops funnels; that is not the case here.
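#A quick programmatic check (a sketch; precis() returns a data frame with
#an n_eff column, though note the two models expose different parameter sets):
neff_c <- precis(m_M3, 3)$n_eff
neff_nc <- precis(m_M3NonCent, 3)$n_eff
summary(neff_c)
summary(neff_nc)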

# Centred
End_C - Begin_C
## Time difference of 35.64413 secs
# Non-Centred
End_NC - Begin_NC
## Time difference of 26.25804 secs
#Therefore, the non-centred model sampled faster in wall-clock time (about 26 s vs 36 s), even though the centred model delivered more effective samples here.

14M4. Use WAIC to compare the Gaussian process model of Oceanic tools to the models fit to the same data in Chapter 11. Pay special attention to the effective numbers of parameters, as estimated by WAIC.

data(Kline)
d <- Kline
# Chapter 11 stuff
d$P <- scale(log(d$population))
d$contact_id <- ifelse(d$contact == "high", 2, 1)
# Chapter 14 stuff
d$society <- 1:10
data(islandsDistMatrix)

#Refit both models with the book's code, adding the ulam() argument log_lik = TRUE so they can be compared with WAIC below.

# Chapter 11 Model
dat2 <- list(T = d$total_tools, P = d$population, cid = d$contact_id)
m11.11 <- ulam(
  alist(
    T ~ dpois(lambda),
    lambda <- exp(a[cid]) * P^b[cid] / g,
    a[cid] ~ dnorm(1, 1),
    b[cid] ~ dexp(1),
    g ~ dexp(1)
  ),
  data = dat2, chains = 4, cores = 4, log_lik = TRUE
)
## Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#tail-ess
# Chapter 14 Model
dat_list <- list(T = d$total_tools, P = d$population, society = d$society, Dmat = islandsDistMatrix)
m14.8 <- ulam(
  alist(
    T ~ dpois(lambda),
    lambda <- (a * P^b / g) * exp(k[society]),
    vector[10]:k ~ multi_normal(0, SIGMA),
    matrix[10, 10]:SIGMA <- cov_GPL2(Dmat, etasq, rhosq, 0.01),
    c(a, b, g) ~ dexp(1),
    etasq ~ dexp(2),
    rhosq ~ dexp(0.5)
  ),
  data = dat_list, chains = 4, cores = 4, iter = 2000, log_lik = TRUE
)

compare(m11.11, m14.8)
##            WAIC        SE    dWAIC      dSE    pWAIC      weight
## m14.8  67.75806  2.279063  0.00000       NA 4.160725 0.998330077
## m11.11 80.54468 11.246869 12.78661 11.31052 5.113115 0.001669923
#The Gaussian process model (m14.8) clearly outperforms m11.11, taking essentially all of the WAIC weight. Note also that, despite being structurally more complex, it has fewer effective parameters (pWAIC ≈ 4.2 vs ≈ 5.1): the Gaussian process prior adaptively regularizes the society intercepts, shrinking them toward one another.

14M5. Modify the phylogenetic distance example to use group size as the outcome and brain size as a predictor. Plot the estimates. Assuming brain size influences group size, what is your estimate of the effect? How does phylogeny influence the estimate?

library(ape) # keep.tip(), corBrownian(), vcv(), cophenetic()
data(Primates301)
d <- Primates301
d$name <- as.character(d$name)
dstan <- d[complete.cases(d$group_size, d$body, d$brain), ]
spp_obs <- dstan$name
dat_list <- list(
  N_spp = nrow(dstan),
  M = standardize(log(dstan$body)),
  B = standardize(log(dstan$brain)),
  G = standardize(log(dstan$group_size)),
  Imat = diag(nrow(dstan))
)

data(Primates301_nex)
tree_trimmed <- keep.tip(Primates301_nex, spp_obs) # only keep tree that's relevant 
Rbm <- corBrownian(phy = tree_trimmed) # expected covariance given a Brownian model
V <- vcv(Rbm) #expected variances and covariances
## Warning in Initialize.corPhyl(phy, dummy.df): No covariate specified, species
## will be taken as ordered in the data frame. To avoid this message, specify a
## covariate containing the species names with the 'form' argument.
Dmat <- cophenetic(tree_trimmed) # cophenetic distance matrix
dat_list$V <- V[spp_obs, spp_obs] # covariances in speciesXspecies matrix
dat_list$R <- dat_list$V / max(V) # relative covariances of speciesXspecies matrix


#Ordinary regression:
m_M5Ordi <- ulam(
  alist(
    G ~ multi_normal(mu, SIGMA),
    mu <- a + bM * M + bB * B,
    matrix[N_spp, N_spp]:SIGMA <- Imat * sigma_sq,
    a ~ normal(0, 1),
    c(bM, bB) ~ normal(0, 0.5),
    sigma_sq ~ exponential(1)
  ),
  data = dat_list, chains = 4, cores = 4
)

#Brownian motion model
m_M5Brown <- ulam(
  alist(
    G ~ multi_normal(mu, SIGMA),
    mu <- a + bM * M + bB * B,
    matrix[N_spp, N_spp]:SIGMA <- R * sigma_sq,
    a ~ normal(0, 1),
    c(bM, bB) ~ normal(0, 0.5),
    sigma_sq ~ exponential(1)
  ),
  data = dat_list, chains = 4, cores = 4
)

#Gaussian Process model
dat_list$Dmat <- Dmat[spp_obs, spp_obs] / max(Dmat)
m_M5GP <- ulam(
  alist(
    G ~ multi_normal(mu, SIGMA),
    mu <- a + bM * M + bB * B,
    matrix[N_spp, N_spp]:SIGMA <- cov_GPL1(Dmat, etasq, rhosq, 0.01),
    a ~ normal(0, 1),
    c(bM, bB) ~ normal(0, 0.5),
    etasq ~ half_normal(1, 0.25),
    rhosq ~ half_normal(3, 0.25)
  ),
  data = dat_list, chains = 4, cores = 4
)

plot(coeftab(m_M5Ordi, m_M5Brown, m_M5GP), pars = c("bM", "bB"))

#In the plot, the ordinary model (m_M5Ordi) shows a clearly non-zero effect of brain size (bB) on group size. In contrast, neither m_M5Brown nor m_M5GP shows the same relationship: adding phylogenetic information reduces the evidence for a causal link between brain size and group size.
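#To read the same estimates numerically (a quick sketch using precis()):
precis(m_M5Ordi, pars = c("bM", "bB"))
precis(m_M5Brown, pars = c("bM", "bB"))
precis(m_M5GP, pars = c("bM", "bB"))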

14H1. Let’s revisit the Bangladesh fertility data, data(bangladesh), from the practice problems for Chapter 13. Fit a model with both varying intercepts by district_id and varying slopes of urban by district_id. You are still predicting use.contraception. Inspect the correlation between the intercepts and slopes. Can you interpret this correlation, in terms of what it tells you about the pattern of contraceptive use in the sample? It might help to plot the mean (or median) varying effect estimates for both the intercepts and slopes, by district. Then you can visualize the correlation and maybe more easily think through what it means to have a particular correlation. Plotting predicted proportion of women using contraception, with urban women on one axis and rural on the other, might also help.

library(ellipse)
data(bangladesh)
d <- bangladesh
dat_list <- list(
  C = d$use.contraception,
  did = as.integer(as.factor(d$district)),
  urban = d$urban
)

m_H1 <- ulam(
  alist(
    C ~ bernoulli(p),
    logit(p) <- a[did] + b[did] * urban,
    c(a, b)[did] ~ multi_normal(c(abar, bbar), Rho, Sigma),
    abar ~ normal(0, 1),
    bbar ~ normal(0, 0.5),
    Rho ~ lkj_corr(2),
    Sigma ~ exponential(1)
  ),
  data = dat_list, chains = 4, cores = 4, iter = 4000
)

precis(m_H1)
##            mean        sd       5.5%      94.5%    n_eff    Rhat4
## abar -0.6848044 0.0992502 -0.8449288 -0.5266659 5734.013 1.000392
## bbar  0.6390180 0.1604257  0.3832515  0.8950034 3994.354 1.000449
#bbar is reliably positive, so on average contraception is used more frequently by urban women than by rural women in the same district.

precis(m_H1, depth = 3, pars = c("Rho", "Sigma"))
##                mean           sd       5.5%      94.5%    n_eff     Rhat4
## Rho[1,1]  1.0000000 0.000000e+00  1.0000000  1.0000000      NaN       NaN
## Rho[1,2] -0.6522810 1.703058e-01 -0.8670043 -0.3425695 1518.395 1.0023133
## Rho[2,1] -0.6522810 1.703058e-01 -0.8670043 -0.3425695 1518.395 1.0023133
## Rho[2,2]  1.0000000 5.859357e-17  1.0000000  1.0000000 7504.974 0.9994999
## Sigma[1]  0.5719033 9.647745e-02  0.4254988  0.7318981 1796.682 1.0035817
## Sigma[2]  0.7732802 2.025374e-01  0.4703167  1.1131390  795.226 1.0056162
#Rho[1,2] is about -0.65, a substantial negative correlation between the varying intercepts and the varying urban slopes.

post <- extract.samples(m_H1)
a <- apply(post$a, 2, mean)
b <- apply(post$b, 2, mean)
plot(a, b, xlab = "a (intercept)", ylab = "b (urban slope)")
abline(h = 0, lty = 2)
abline(v = 0, lty = 2)
R <- apply(post$Rho, 2:3, mean)
s <- apply(post$Sigma, 2, mean)
S <- diag(s) %*% R %*% diag(s)
ll <- c(0.5, 0.67, 0.89, 0.97)
for (l in ll) {
  el <- ellipse(S, centre = c(mean(post$abar), mean(post$bbar)), level = l)
  lines(el, col = "black", lwd = 0.5)
}

#As the ellipses show, districts with higher contraceptive use outside urban areas (larger intercepts) tend to have smaller urban slopes: where rural use is already high, being urban adds less.

#In probability scale by applying inverse logit transformation:

u0 <- inv_logit(a)
u1 <- inv_logit(a + b)
plot(u0, u1, xlim = c(0, 1), ylim = c(0, 1), xlab = "urban = 0", ylab = "urban = 1")
abline(h = 0.5, lty = 2)
abline(v = 0.5, lty = 2)
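#On the probability scale the same pattern appears: average predicted rural use is about inv_logit(-0.68) ≈ 0.34, while average urban use is about inv_logit(-0.68 + 0.64) ≈ 0.49, so urban use sits near 0.5 on average while most districts fall short of majority use in rural areas.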