Chapter 14 - Adventures in Covariance

This chapter extended the basic multilevel strategy of partial pooling to slopes as well as intercepts. Accomplishing this meant modeling covariation in the statistical population of parameters. The LKJcorr prior was introduced as a convenient family of priors for correlation matrices. You saw how covariance models can be applied to causal inference, using instrumental variables and the front-door criterion. Gaussian processes represent a practical method of extending the varying effects strategy to continuous dimensions of similarity, such as spatial, network, phylogenetic, or any other abstract distance between entities in the data.

Place each answer inside the code chunk (grey box). The code chunks should contain a text response or a code that completes/answers the question or activity requested. Make sure to include plots if the question requests them.

Finally, upon completion, name your final output .html file as: YourName_ANLY505-Year-Semester.html and publish the assignment to your R Pubs account and submit the link to Canvas. Each question is worth 5 points.

Questions

14-1. Repeat the café robot simulation from the beginning of the chapter. This time, set rho to zero, so that there is no correlation between intercepts and slopes. Plot the posterior distribution. How does the posterior distribution of the correlation reflect this change in the underlying simulation?

a = 3.5 
b = (-1) 
sigma_a = 1 
sigma_b = 0.5 
rho= 0 
Mu = c(a, b)
cov_ab = sigma_a * sigma_b * rho
Sigma = matrix(c(sigma_a^2, cov_ab, cov_ab, sigma_b^2), ncol = 2)

N_cafes = 20
set.seed(6)
vary_effects = mvrnorm(N_cafes, Mu, Sigma)
a_cafe = vary_effects[, 1]
b_cafe = vary_effects[, 2]
N_visits = 10
afternoon = rep(0:1, N_visits * N_cafes / 2)
cafe_id = rep(1:N_cafes, each = N_visits)
mu = a_cafe[cafe_id] + b_cafe[cafe_id] * afternoon
sigma = 0.5 # std dev within cafes
wait = rnorm(N_visits * N_cafes, mu, sigma)

d = data.frame(cafe = cafe_id, afternoon = afternoon, wait = wait)

m_M1 = ulam(
  alist(
    wait ~ normal(mu, sigma),
    mu <- a_cafe[cafe] + b_cafe[cafe] * afternoon,
    c(a_cafe, b_cafe)[cafe] ~ multi_normal(c(a, b), Rho, sigma_cafe),
    a ~ normal(5, 2),
    b ~ normal(-1, 0.5),
    sigma_cafe ~ exponential(1),
    sigma ~ exponential(1),
    Rho ~ lkj_corr(2)
  ),
  data = d, chains = 6, cores = 6
)

post = extract.samples(m_M1)
ggplot() +
  stat_halfeye(aes(x = post$Rho[, 1, 2])) +
  theme_bw() +
  labs(x = "Rho")

precis(m_M1, pars = "Rho[1,2]")
##              result
## mean     0.01308609
## sd       0.23733082
## 5.5%    -0.36905294
## 94.5%    0.40167788
## n_eff 2612.73299569
## Rhat     0.99993714
# We can see that accurately represents the underlying correlation of 0.

14-2. Fit this multilevel model to the simulated café data: \[\begin{align} \ W_i ∼ Normal(μ_i, σ) \\ \ μ_i = α_{CAFE[i]} + β_{CAFE[i]}A_i \\ \ α_{CAFE} ∼ Normal(α, σ_α) \\ \ β_{CAFE} ∼ Normal(β, σ_β) \\ \ α ∼ Normal(0, 10) \\ \ β ∼ Normal(0, 10) \\ \ σ, σ_α, σ_β ∼ Exponential(1) \\ \end{align}\] Use WAIC to compare this model to the model from the chapter, the one that uses a multi-variate Gaussian prior. Visualize the differences between the posterior means. Explain the result.

a = 3.5  
b = -1  
sigma_a = 1  
sigma_b = 0.5 
rho = -0.7  
Mu = c(a, b)
cov_ab = sigma_a * sigma_b * rho
Sigma = matrix(c(sigma_a^2, cov_ab, cov_ab, sigma_b^2), ncol = 2)
 
N_cafes = 20
set.seed(42)
vary_effects = mvrnorm(N_cafes, Mu, Sigma)
a_cafe = vary_effects[, 1]
b_cafe = vary_effects[, 2]
N_visits = 10
afternoon = rep(0:1, N_visits * N_cafes / 2)
cafe_id = rep(1:N_cafes, each = N_visits)
mu = a_cafe[cafe_id] + b_cafe[cafe_id] * afternoon
sigma = 0.5  
wait = rnorm(N_visits * N_cafes, mu, sigma)
 
d = data.frame(cafe = cafe_id, afternoon = afternoon, wait = wait)

m_M2Baseline = ulam(
  alist(
    wait ~ normal(mu, sigma),
    mu <- a_cafe[cafe] + b_cafe[cafe] * afternoon,
    c(a_cafe, b_cafe)[cafe] ~ multi_normal(c(a, b), Rho, sigma_cafe),
    a ~ normal(5, 2),
    b ~ normal(-1, 0.5),
    sigma_cafe ~ exponential(1),
    sigma ~ exponential(1),
    Rho ~ lkj_corr(2)
  ),
  data = d, chains = 4, cores = 4
)

m_M2 = ulam(
  alist(
    wait ~ dnorm(mu, sigma),
    mu <- a_cafe[cafe] + b_cafe[cafe] * afternoon,
    a_cafe[cafe] ~ dnorm(a, sigma_alpha),
    b_cafe[cafe] ~ dnorm(b, sigma_beta),
    a ~ dnorm(0, 10),
    b ~ dnorm(0, 10),
    sigma ~ dexp(1),
    sigma_alpha ~ dexp(1),
    sigma_beta ~ dexp(1)
  ),
  data = d, chains = 4, cores = 4
)

post_Base = extract.samples(m_M2Baseline)
a_Base = apply(post_Base$a_cafe, 2, mean)
b_Base = apply(post_Base$b_cafe, 2, mean)
post_M2 = extract.samples(m_M2)
a_M2 = apply(post_M2$a_cafe, 2, mean)
b_M2 = apply(post_M2$b_cafe, 2, mean)
plot(a_M2, b_M2,
  xlab = "intercept", ylab = "slope",
  pch = 16, col = rangi2, ylim = c(min(b_M2) - 0.05, max(b_M2) + 0.05),
  xlim = c(min(a_M2) - 0.1, max(a_M2) + 0.1), cex = 2
)
points(a_Base, b_Base, pch = 1, cex = 2)

# This model accounts for correlation of slopes and intercepts. And we can see that slopes and intercepts are inherently related to one another and finds a negative correlation between them. The large intercepts are associated with strongly negative slopes.

14-3. Use WAIC to compare the Gaussian process model of Oceanic tools to the models fit to the same data in Chapter 11. Pay special attention to the effective numbers of parameters, as estimated by WAIC. Explain the result.

data(Kline)
d <- Kline
# Chapter 11 stuff
d$P <- scale(log(d$population))
d$contact_id <- ifelse(d$contact == "high", 2, 1)
# Chapter 14 stuff
d$society <- 1:10
data(islandsDistMatrix)

dat2 <- list(T = d$total_tools, P = d$population, cid = d$contact_id)
m11.11 <- ulam(
  alist(
    T ~ dpois(lambda),
    lambda <- exp(a[cid]) * P^b[cid] / g,
    a[cid] ~ dnorm(1, 1),
    b[cid] ~ dexp(1),
    g ~ dexp(1)
  ),
  data = dat2, chains = 4, cores = 4, log_lik = TRUE
)

dat_list <- list(T = d$total_tools, P = d$population, society = d$society, Dmat = islandsDistMatrix)
m14.8 <- ulam(
  alist(
    T ~ dpois(lambda),
    lambda <- (a * P^b / g) * exp(k[society]),
    vector[10]:k ~ multi_normal(0, SIGMA),
    matrix[10, 10]:SIGMA <- cov_GPL2(Dmat, etasq, rhosq, 0.01), c(a, b, g) ~ dexp(1), etasq ~ dexp(2), rhosq ~ dexp(0.5)
  ),
  data = dat_list, chains = 4, cores = 4, iter = 2000, log_lik = TRUE
)

compare(m11.11, m14.8)
##            WAIC        SE    dWAIC      dSE    pWAIC      weight
## m14.8  67.69180  2.203929  0.00000       NA 4.125443 0.998469163
## m11.11 80.65262 11.335497 12.96082 11.36499 5.179427 0.001530837
# We can see that m14.8 model performs better than the m11.11.And the Gaussian process model has less effective parameters (pWAIC) than the simpler model. 

14-4. Let’s revisit the Bangladesh fertility data, data(bangladesh), from the practice problems for Chapter 13. Fit a model with both varying intercepts by district_id and varying slopes of urban by district_id. You are still predicting use.contraception. Inspect the correlation between the intercepts and slopes. Can you interpret this correlation, in terms of what it tells you about the pattern of contraceptive use in the sample? Plot the mean (or median) varying effect estimates for both the intercepts and slopes, by district. Then you can visualize the correlation and maybe more easily think through what it means to have a particular correlation. Plot the predicted proportion of women using contraception, with urban women on one axis and rural on the other. Explain the results.

data(bangladesh)
d <- bangladesh
dat_list <- list(
  C = d$use.contraception,
  did = as.integer(as.factor(d$district)),
  urban = d$urban
)

m_H1 <- ulam(
  alist(
    C ~ bernoulli(p),
    logit(p) <- a[did] + b[did] * urban,
    c(a, b)[did] ~ multi_normal(c(abar, bbar), Rho, Sigma),
    abar ~ normal(0, 1),
    bbar ~ normal(0, 0.5),
    Rho ~ lkj_corr(2),
    Sigma ~ exponential(1)
  ),
  data = dat_list, chains = 4, cores = 4, iter = 4000
)

precis(m_H1)
##            mean         sd       5.5%      94.5%    n_eff     Rhat4
## abar -0.6833179 0.09880033 -0.8400065 -0.5246286 6147.165 1.0000479
## bbar  0.6335012 0.16283924  0.3719204  0.8925002 4435.839 0.9999479
precis(m_H1, depth = 3, pars = c("Rho", "Sigma"))
##                mean           sd       5.5%      94.5%    n_eff     Rhat4
## Rho[1,1]  1.0000000 0.000000e+00  1.0000000  1.0000000      NaN       NaN
## Rho[1,2] -0.6456963 1.676427e-01 -0.8644315 -0.3463125 1422.240 1.0017087
## Rho[2,1] -0.6456963 1.676427e-01 -0.8644315 -0.3463125 1422.240 1.0017087
## Rho[2,2]  1.0000000 6.145596e-17  1.0000000  1.0000000 7105.184 0.9994999
## Sigma[1]  0.5740731 9.738070e-02  0.4296594  0.7367008 2615.967 1.0009836
## Sigma[2]  0.7807138 1.942076e-01  0.4828870  1.1066861 1140.530 1.0045143
post <- extract.samples(m_H1)
a <- apply(post$a, 2, mean)
b <- apply(post$b, 2, mean)
plot(a, b, xlab = "a (intercept)", ylab = "b (urban slope)")
abline(h = 0, lty = 2)
abline(v = 0, lty = 2)
R <- apply(post$Rho, 2:3, mean)
s <- apply(post$Sigma, 2, mean)
S <- diag(s) %*% R %*% diag(s)
ll <- c(0.5, 0.67, 0.89, 0.97)
for (l in ll) {
  el <- ellipse(S, centre = c(mean(post$abar), mean(post$bbar)), level = l)
  lines(el, col = "black", lwd = 0.5)
}

u0 <- inv_logit(a)
u1 <- inv_logit(a + b)
plot(u0, u1, xlim = c(0, 1), ylim = c(0, 1), xlab = "urban = 0", ylab = "urban = 1")
abline(h = 0.5, lty = 2)
abline(v = 0.5, lty = 2)

# We can see that the districts with higher use of contraception outside of urban areas come with smaller slopes. Its do not have a significant shift in use of contraceptives when moving to urban areas.

14-5 Now consider the predictor variables age.centered and living.children, also contained in data(bangladesh). Suppose that age influences contraceptive use (changing attitudes) and number of children (older people have had more time to have kids). Number of children may also directly influence contraceptive use. Draw a DAG that reflects these hypothetical relationships. Then build models needed to evaluate the DAG. You will need at least two models. Retain district and urban, as in 14-4. What do you conclude about the causal influence of age and children?

dag <- dagitty::dagitty(" dag {
                        Age -> N_children
                        Age -> contraception
                        N_children -> contraception
                        }")
drawdag(dag)

dat_list$children <- standardize(d$living.children) 
dat_list$age <- standardize(d$age.centered)

m1 <- ulam( alist(
C ~ bernoulli( p ),
logit(p) <- a[did] + b[did]*urban + bA*age,
c(a,b)[did] ~ multi_normal( c(abar,bbar), Rho, Sigma), 
abar ~ normal(0,1),
c(bbar,bA) ~ normal(0,0.5),
Rho ~ lkj_corr(2),
Sigma ~ exponential(1)
) , data=dat_list, chains=4, cores=4)

precis(m1)
##             mean         sd         5.5%      94.5%    n_eff     Rhat4
## abar -0.68866932 0.09690611 -0.837962758 -0.5390095 1785.598 0.9991651
## bA    0.08574221 0.05019701  0.007194717  0.1675978 2920.317 0.9986665
## bbar  0.64835535 0.16177264  0.391786536  0.9048286 1228.941 0.9997753
precis(m1, depth=3, pars=c("Rho","Sigma"))
##                mean           sd       5.5%     94.5%     n_eff    Rhat4
## Rho[1,1]  1.0000000 0.000000e+00  1.0000000  1.000000       NaN      NaN
## Rho[1,2] -0.6449438 1.776327e-01 -0.8574411 -0.330814  271.3734 1.012156
## Rho[2,1] -0.6449438 1.776327e-01 -0.8574411 -0.330814  271.3734 1.012156
## Rho[2,2]  1.0000000 5.754280e-17  1.0000000  1.000000 1775.8936 0.997998
## Sigma[1]  0.5781366 9.731639e-02  0.4341311  0.741339  355.6451 1.014014
## Sigma[2]  0.7688081 2.140269e-01  0.4309752  1.099655  135.3573 1.040528
m2 <- ulam( alist(
C ~ bernoulli( p ),
logit(p) <- a[did] + b[did]*urban + bK*children + bA*age, 
c(a,b)[did] ~ multi_normal(c(abar,bbar), Rho, Sigma), 
abar ~ normal(0,1),
c(bbar,bK,bA) ~ normal(0,0.5),
Rho ~ lkj_corr(2),
Sigma ~ exponential(1)
) , data=dat_list , chains=4 , cores=4 )

precis(m2)
##            mean         sd       5.5%      94.5%     n_eff     Rhat4
## abar -0.7199801 0.10357934 -0.8885619 -0.5574366 1376.7241 1.0033880
## bA   -0.2630084 0.06849489 -0.3716681 -0.1543842 2475.8394 0.9990618
## bK    0.5098454 0.06939794  0.4022651  0.6243414 2314.2406 0.9995708
## bbar  0.6901812 0.16266730  0.4345989  0.9487198  953.6034 1.0041531
precis(m2, depth=3, pars=c("Rho","Sigma") )
##                mean           sd       5.5%      94.5%     n_eff    Rhat4
## Rho[1,1]  1.0000000 0.000000e+00  1.0000000  1.0000000       NaN      NaN
## Rho[1,2] -0.6337205 1.715108e-01 -0.8529450 -0.3076873  411.4620 1.007440
## Rho[2,1] -0.6337205 1.715108e-01 -0.8529450 -0.3076873  411.4620 1.007440
## Rho[2,2]  1.0000000 6.277031e-17  1.0000000  1.0000000 1944.1693 0.997998
## Sigma[1]  0.6042683 9.911688e-02  0.4597289  0.7739020  683.8189 1.000557
## Sigma[2]  0.7653553 2.051784e-01  0.4357862  1.0999545  206.2355 1.016086
# We can see that once we change the number of children, the older people who use contraception will be changed. when the number of children increases, then using contraception will be dropped.