This chapter extended the basic multilevel strategy of partial pooling to slopes as well as intercepts. Accomplishing this meant modeling covariation in the statistical population of parameters. The LKJcorr prior was introduced as a convenient family of priors for correlation matrices. You saw how covariance models can be applied to causal inference, using instrumental variables and the front-door criterion. Gaussian processes represent a practical method of extending the varying effects strategy to continuous dimensions of similarity, such as spatial, network, phylogenetic, or any other abstract distance between entities in the data.
Place each answer inside the code chunk (grey box). The code chunks should contain a text response or code that completes or answers the question or activity requested. Make sure to include plots if the question requests them.
Finally, upon completion, name your final output .html file as: YourName_ANLY505-Year-Semester.html, publish the assignment to your RPubs account, and submit the link to Canvas.
Each question is worth 5 points.
14-1. Repeat the café robot simulation from the beginning of the chapter. This time, set rho to zero, so that there is no correlation between intercepts and slopes. Plot the posterior distribution. How does the posterior distribution of the correlation reflect this change in the underlying simulation?
library(rethinking)
library(MASS)
a <- 3.5        # average morning wait time
b <- (-1)       # average difference in afternoon wait time
sigma_a <- 1    # std dev in intercepts
sigma_b <- 0.5  # std dev in slopes
rho <- 0        # correlation between intercepts and slopes (zero for this problem)
Mu <- c(a, b)
cov_ab <- sigma_a * sigma_b * rho
Sigma <- matrix(c(sigma_a^2, cov_ab, cov_ab, sigma_b^2), ncol=2)
# Simulation
N_cafes <- 20
set.seed(6)
vary_effects <- mvrnorm(N_cafes, Mu, Sigma)
a_cafe <- vary_effects[ ,1]
b_cafe <- vary_effects[ ,2]
N_visits <- 10
afternoon <- rep(0:1, N_visits * N_cafes/2)
cafe_id <- rep(1:N_cafes, each=N_visits)
mu <- a_cafe[cafe_id] + b_cafe[cafe_id] * afternoon
sigma <- 0.5 # std dev within cafes
wait <- rnorm(N_visits * N_cafes, mu, sigma)
d <- data.frame(cafe=cafe_id, afternoon=afternoon, wait=wait)
set.seed(42)
m14.1 <- ulam(
alist(
wait ~ normal(mu, sigma),
mu <- a_cafe[cafe] + b_cafe[cafe] * afternoon,
c(a_cafe,b_cafe)[cafe] ~ multi_normal(c(a,b), Rho, sigma_cafe),
a ~ normal(5, 2),
b ~ normal(-1, 0.5),
sigma_cafe ~ exponential(1),
sigma ~ exponential(1),
Rho ~ lkj_corr(2)
), data=d, chains=4, cores=4, refresh=0
)
precis(m14.1, depth=3, pars="Rho")
posterior <- extract.samples(m14.1)
dens(posterior$Rho[, 1, 2], xlab="correlation between intercepts and slopes")
legend("topright", lty=1, legend=c("posterior of Rho[1,2]"))
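# Optional check (a sketch): overlay draws from the LKJcorr(2) prior on the
# posterior, to see how strongly 20 cafes simulated with rho = 0 pull the
# correlation toward zero. Dashed line: prior; solid line: posterior.
R_prior <- rlkjcorr(1e4, K=2, eta=2)
dens(R_prior[, 1, 2], lty=2, add=TRUE)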
# Because the cafes were simulated with rho = 0, the posterior distribution of the correlation is now centered near zero rather than near -0.7 as in the chapter. It remains fairly wide, since only 20 cafes inform the correlation, but it clearly reflects the change in the underlying simulation.
14-2. Fit this multilevel model to the simulated café data: \[\begin{aligned} W_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha_{\text{CAFE}[i]} + \beta_{\text{CAFE}[i]} A_i \\ \alpha_{\text{CAFE}} &\sim \text{Normal}(\alpha, \sigma_\alpha) \\ \beta_{\text{CAFE}} &\sim \text{Normal}(\beta, \sigma_\beta) \\ \alpha &\sim \text{Normal}(0, 10) \\ \beta &\sim \text{Normal}(0, 10) \\ \sigma, \sigma_\alpha, \sigma_\beta &\sim \text{Exponential}(1) \end{aligned}\] Use WAIC to compare this model to the model from the chapter, the one that uses a multivariate Gaussian prior. Visualize the differences between the posterior means. Explain the result.
#set up parameters of population
a<-3.5 #average morning wait time
b<-(-1) #average difference afternoon wait time
sigma_a<-1 #stddev in intercepts
sigma_b<-0.5 #stddev in slopes
rho<-(-0.7) #correlation between intercepts and slopes
Mu<-c(a,b)
cov_ab<-sigma_a*sigma_b*rho
Sigma<-matrix(c(sigma_a^2,cov_ab,cov_ab,sigma_b^2),ncol=2)
#simulate observations
N_cafes<-20
library(MASS)
set.seed(5) #used to replicate example
vary_effects<-mvrnorm(N_cafes,Mu,Sigma)
a_cafe<-vary_effects[,1]
b_cafe<-vary_effects[,2]
N_visits<-10
afternoon<-rep(0:1,N_visits*N_cafes/2)
cafe_id<-rep(1:N_cafes,each=N_visits)
mu<-a_cafe[cafe_id]+b_cafe[cafe_id]*afternoon
sigma<-0.5 #std dev within cafes
wait<-rnorm(N_visits*N_cafes,mu,sigma)
#package into data frame
d<-data.frame(cafe=cafe_id,afternoon=afternoon,wait=wait)
# Model specified in 14-2: independent Normal priors on the varying intercepts
# and slopes, with Exponential(1) priors on the standard deviations
m13M2 <- ulam(
  alist(
    wait ~ dnorm(mu, sigma),
    mu <- a_cafe[cafe] + b_cafe[cafe] * afternoon,
    a_cafe[cafe] ~ dnorm(a, sigma_alpha),
    b_cafe[cafe] ~ dnorm(b, sigma_beta),
    a ~ dnorm(0, 10),
    b ~ dnorm(0, 10),
    sigma ~ dexp(1),
    sigma_alpha ~ dexp(1),
    sigma_beta ~ dexp(1)
  ),
  data=d, iter=5000, warmup=2000, chains=3, cores=3, log_lik=TRUE)
# Chapter model: multivariate Gaussian prior on the intercepts and slopes,
# refit to this simulated data so the two models can be compared with WAIC
m13.1 <- ulam(
  alist(
    wait ~ dnorm(mu, sigma),
    mu <- a_cafe[cafe] + b_cafe[cafe] * afternoon,
    c(a_cafe, b_cafe)[cafe] ~ multi_normal(c(a, b), Rho, sigma_cafe),
    a ~ dnorm(5, 2),
    b ~ dnorm(-1, 0.5),
    sigma_cafe ~ dexp(1),
    sigma ~ dexp(1),
    Rho ~ lkj_corr(2)
  ),
  data = d, chains = 4, cores = 4, log_lik = TRUE
)
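# WAIC comparison requested by the question (a sketch; the resulting table
# depends on the fitted chains, so no output is hard-coded here).
compare(m13M2, m13.1)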
post1<-extract.samples(m13.1)
a1<-apply(post1$a_cafe,2,mean)
b1<-apply(post1$b_cafe,2,mean)
post2<-extract.samples(m13M2)
a2<-apply(post2$a_cafe,2,mean)
b2<-apply(post2$b_cafe,2,mean)
plot(a1,b1,xlab="intercept",ylab="slope",
pch=16,col=rangi2,ylim=c(min(b1)-0.05,max(b1)+0.05),
xlim=c(min(a1)-0.1,max(a1)+0.1))
points(a2,b2,pch=1)
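# A sketch to make the differences easier to see: blue filled points are the
# multivariate-prior model (m13.1), open points are the independent-prior model
# (m13M2); segments connect each cafe's pair of posterior means.
for (i in 1:N_cafes) lines(c(a1[i], a2[i]), c(b1[i], b2[i]), col=col.alpha("black", 0.3))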
# The model m13M2 gives the intercepts and slopes independent priors, so it has no correlation parameter. Its posterior means for the average effects and the standard deviation parameters are almost identical to those from the multivariate model m13.1. The differences show up in the varying effects themselves, mostly for cafes with extreme intercepts or slopes. Because m13.1 also estimates the correlation between intercepts and slopes, it adjusts the two together: the correlation is negative, so larger intercepts go with smaller slopes, and as a result the blue points sit below the open points to the right of center and above them to the left. By pooling in both dimensions at once, the model that uses the correlation exploits more information from the population and is, on average, slightly more accurate.
14-3. Use WAIC to compare the Gaussian process model of Oceanic tools to the models fit to the same data in Chapter 11. Pay special attention to the effective numbers of parameters, as estimated by WAIC. Explain the result.
library(rethinking)
data(Kline)
d <- Kline
# Chapter 11
d$P <- scale(log(d$population))
d$contact_id <- ifelse(d$contact == "high", 2, 1)
d$society <- 1:10
data(islandsDistMatrix)
dat2 <- list(T = d$total_tools, P = d$population, cid = d$contact_id)
m11.11 <- ulam(
alist(
T ~ dpois(lambda),
lambda <- exp(a[cid]) * P^b[cid] / g,
a[cid] ~ dnorm(1, 1),
b[cid] ~ dexp(1),
g ~ dexp(1)
),
data = dat2, chains = 4, cores = 4, log_lik = TRUE
)
dat_list <- list(T = d$total_tools, P = d$population, society = d$society, Dmat = islandsDistMatrix)
m14.8 <- ulam(
alist(
T ~ dpois(lambda),
lambda <- (a * P^b / g) * exp(k[society]),
vector[10]:k ~ multi_normal(0, SIGMA),
matrix[10, 10]:SIGMA <- cov_GPL2(Dmat, etasq, rhosq, 0.01),
c(a, b, g) ~ dexp(1),
etasq ~ dexp(2),
rhosq ~ dexp(0.5)
),
data = dat_list, chains = 4, cores = 4, iter = 2000, log_lik = TRUE
)
compare(m11.11, m14.8)
## WAIC SE dWAIC dSE pWAIC weight
## m14.8 67.73178 2.244134 0.00000 NA 4.157516 0.998503056
## m11.11 80.73744 11.441280 13.00566 11.39767 5.202666 0.001496944
# Only the Gaussian process model uses adaptive regularization. Although it has many more actual parameters than m11.11 (ten per-society varying effects plus a, b, g, etasq, and rhosq), its effective number of parameters as estimated by WAIC (pWAIC of about 4.2) is smaller than that of m11.11 (about 5.2), because the varying intercepts are shrunk toward one another according to spatial distance, making the model less flexible than its parameter count suggests. m14.8 also has the better WAIC and receives nearly all of the model weight in this comparison.
14-4. Let’s revisit the Bangladesh fertility data, data(bangladesh), from the practice problems for Chapter 13. Fit a model with both varying intercepts by district_id and varying slopes of urban by district_id. You are still predicting use.contraception. Inspect the correlation between the intercepts and slopes. Can you interpret this correlation, in terms of what it tells you about the pattern of contraceptive use in the sample? Plot the mean (or median) varying effect estimates for both the intercepts and slopes, by district. Then you can visualize the correlation and maybe more easily think through what it means to have a particular correlation. Plot the predicted proportion of women using contraception, with urban women on one axis and rural on the other. Explain the results.
library(rethinking)
library(MASS)
data(bangladesh)
d <- bangladesh
str(d)
## 'data.frame': 1934 obs. of 6 variables:
## $ woman : int 1 2 3 4 5 6 7 8 9 10 ...
## $ district : int 1 1 1 1 1 1 1 1 1 1 ...
## $ use.contraception: int 0 0 0 0 0 0 0 0 0 0 ...
## $ living.children : int 4 1 3 4 1 1 4 4 2 4 ...
## $ age.centered : num 18.44 -5.56 1.44 8.44 -13.56 ...
## $ urban : int 1 1 1 1 1 1 1 1 1 1 ...
# fix index - could also use coerce_index() here
d$district_id <- as.integer(as.factor(d$district))
#rename outcome so it doesn't have a dot in it
dlist<-list(
use_contraception=d$use.contraception,
urban=d$urban,
district=d$district_id)
m13H1<-ulam(
alist(
use_contraception~dbinom(1,p),
logit(p) <- a[district] + b[district] * urban,
c(a, b)[district] ~ multi_normal(c(abar, bbar), Rho, Sigma),
abar ~ normal(0, 1),
bbar ~ normal(0, 0.5),
Rho ~ lkj_corr(2),
Sigma ~ exponential(1)
),
data = dlist, chains = 4, cores = 4, iter = 4000
)
precis(m13H1,pars=c("Sigma","Rho"),depth=3)
## mean sd 5.5% 94.5% n_eff Rhat4
## Sigma[1] 0.5741830 9.690234e-02 0.4283957 0.7368998 1969.3209 1.0011513
## Sigma[2] 0.7728980 2.067958e-01 0.4589811 1.1089871 731.6714 1.0085376
## Rho[1,1] 1.0000000 0.000000e+00 1.0000000 1.0000000 NaN NaN
## Rho[1,2] -0.6514383 1.702595e-01 -0.8688449 -0.3450856 1492.9469 1.0045957
## Rho[2,1] -0.6514383 1.702595e-01 -0.8688449 -0.3450856 1492.9469 1.0045957
## Rho[2,2] 1.0000000 6.026597e-17 1.0000000 1.0000000 7464.5785 0.9994999
post<-extract.samples(m13H1)
dens(post$Rho[,1,2],xlab="rho",xlim=c(-1,1))
abline(v=0,lty=2) #add dashed marker at zero correlation
pred.dat.rural<-list(
urban=rep(0,60),
district=1:60)
pred.dat.urban<-list(
urban=rep(1,60),
district=1:60)
pred.rural<-link(m13H1,data=pred.dat.rural)
pred.urban<-link(m13H1,data=pred.dat.urban)
means.rural<-apply(pred.rural,2,mean)
means.urban<-apply(pred.urban,2,mean)
plot(means.rural,means.urban,col="slateblue",
xlim=c(0,1),ylim=c(0,1),
xlab="rural use",ylab="urban use")
abline(a=0,b=1,lty=2)
plot(means.rural,means.urban-means.rural,col="slateblue", xlab="rural use",ylab="difference btw urban and rural")
abline(h=0,lty=2)
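# The question also asks for the varying effect estimates themselves. A sketch:
# plot the posterior mean intercept (rural log-odds) against the posterior mean
# urban slope for each district, which shows the negative correlation directly.
a_mean <- apply(post$a, 2, mean)   # varying intercepts by district
b_mean <- apply(post$b, 2, mean)   # varying urban slopes by district
plot(a_mean, b_mean, col="slateblue", pch=16,
xlab="intercept (rural log-odds)", ylab="slope (urban effect)")
abline(h=0, lty=2)
abline(v=0, lty=2)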
# Urban residence has a positive average effect on contraceptive use, which is not surprising. The variance components show a larger standard deviation for the urban slopes (Sigma[2] about 0.77) than for the varying intercepts (Sigma[1] about 0.57); the two are not on exactly the same scale, so this comparison should be made cautiously, but it is clear that the effect of urban residence varies across districts. The correlation between intercepts and slopes is negative, with a posterior mean of about -0.65 and an 89% interval entirely below zero. Larger intercepts go with smaller slopes: districts where rural contraceptive use is already high show smaller urban-rural differences. The prediction plots show the same pattern. In districts with high rural use, urban and rural rates are similar (points close to the dashed lines), while the districts with the highest urban use tend to have low rural use, and some districts with the lowest urban use have among the highest rural use.
14-5. Now consider the predictor variables age.centered and living.children, also contained in data(bangladesh). Suppose that age influences contraceptive use (changing attitudes) and number of children (older people have had more time to have kids). Number of children may also directly influence contraceptive use. Draw a DAG that reflects these hypothetical relationships. Then build models needed to evaluate the DAG. You will need at least two models. Retain district and urban, as in 14-4. What do you conclude about the causal influence of age and children?
# DAG reflecting the hypothesized relations among age, number of living children, and contraceptive use
library(dagitty)
Dag <- dagitty::dagitty(" dag {
Age_Centered -> Contraception
Living_Children -> Contraception
Age_Centered -> Living_Children
}")
drawdag(Dag)
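# Optional check with dagitty (a sketch): what must be conditioned on to estimate
# the total vs. direct effect of age? Under this DAG the total effect of age needs
# no adjustment, while the direct effect requires conditioning on Living_Children,
# which is why two models are fit below.
adjustmentSets(Dag, exposure="Age_Centered", outcome="Contraception", effect="total")
adjustmentSets(Dag, exposure="Age_Centered", outcome="Contraception", effect="direct")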
# Fit the models that add the new predictors age and living.children, retaining district and urban
set.seed(1234)
data(bangladesh)
dat <- bangladesh
# Inclusion of additional predictor parameters in the data.
dat_list <- list(
Cntrcptn = dat$use.contraception,
did = as.integer(as.factor(dat$district)),
urban = dat$urban,
age = standardize(dat$age.centered),
children = standardize(dat$living.children)
)
# Model 1: age as a predictor, keeping the varying intercepts and urban slopes by district
Model_1_Age <- ulam( alist(
Cntrcptn ~ bernoulli( p ),
logit(p) <- a[did] + b[did]*urban + bA*age,
c(a,b)[did] ~ multi_normal( c(abar,bbar), Rho, Sigma),
abar ~ normal(0,1),
c(bbar,bA) ~ normal(0,0.5),
Rho ~ lkj_corr(2),
Sigma ~ exponential(1)
) , data=dat_list, chains=2, cores=4, log_lik = TRUE, iter = 2000)
# Posterior summaries of Rho and Sigma
precis(Model_1_Age, depth = 3, pars = c("Rho", "Sigma"))
## mean sd 5.5% 94.5% n_eff Rhat4
## Rho[1,1] 1.0000000 0.000000e+00 1.0000000 1.0000000 NaN NaN
## Rho[1,2] -0.6481751 1.687386e-01 -0.8645762 -0.3423949 421.5262 1.0007101
## Rho[2,1] -0.6481751 1.687386e-01 -0.8645762 -0.3423949 421.5262 1.0007101
## Rho[2,2] 1.0000000 5.870968e-17 1.0000000 1.0000000 1793.6714 0.9989995
## Sigma[1] 0.5783229 9.898190e-02 0.4301549 0.7483148 408.8852 1.0000318
## Sigma[2] 0.7895637 2.003872e-01 0.4731738 1.1067617 246.0145 1.0039245
# Model 2: age and number of living children as predictors, again keeping the district and urban terms
Model_2_Age_Children <- ulam( alist(
Cntrcptn ~ bernoulli( p ),
logit(p) <- a[did] + b[did]*urban + bA*age +bC*children,
c(a,b)[did] ~ multi_normal( c(abar,bbar), Rho, Sigma),
abar ~ normal(0,1),
c(bbar,bA,bC) ~ normal(0,0.5),
Rho ~ lkj_corr(2),
Sigma ~ exponential(1)
) , data=dat_list, chains=2, cores=4, log_lik = TRUE, iter = 2000)
precis(Model_2_Age_Children, depth = 3, pars = c("Rho", "Sigma"))
## mean sd 5.5% 94.5% n_eff Rhat4
## Rho[1,1] 1.0000000 0.000000e+00 1.0000000 1.0000000 NaN NaN
## Rho[1,2] -0.6316599 1.758695e-01 -0.8645606 -0.3194775 392.72093 1.0102919
## Rho[2,1] -0.6316599 1.758695e-01 -0.8645606 -0.3194775 392.72093 1.0102919
## Rho[2,2] 1.0000000 6.272117e-17 1.0000000 1.0000000 1994.78528 0.9989995
## Sigma[1] 0.6077203 1.049990e-01 0.4510592 0.7819080 455.69981 1.0113967
## Sigma[2] 0.7703891 2.276556e-01 0.4067350 1.1309406 95.36887 1.0311645
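# To evaluate the DAG we also need the regression coefficients (a sketch): bA in
# Model_1_Age estimates the total effect of age, while bA and bC in
# Model_2_Age_Children separate the direct effect of age from the effect of the
# number of living children.
precis(Model_1_Age, pars=c("abar", "bbar", "bA"))
precis(Model_2_Age_Children, pars=c("abar", "bbar", "bA", "bC"))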
# The posterior estimates from Model_1_Age and Model_2_Age_Children both show a clear negative correlation between the district intercepts and urban slopes (Rho[1,2] and Rho[2,1]), and these varying-effect summaries change very little when the number of children is added. To evaluate the DAG itself, compare the age coefficient bA across the two models: with age alone it estimates the total effect of age, while conditioning on children isolates the direct effect. Taken together with the DAG, the fits suggest that, after adjusting for the number of living children, older women appear somewhat less likely to use contraception, while the number of living children has a clear influence of its own on contraceptive use. Because older women have also had more time to have children, age affects contraceptive use through two paths: directly, and indirectly through the number of living children.