This chapter extended the basic multilevel strategy of partial pooling to slopes as well as intercepts. Accomplishing this meant modeling covariation in the statistical population of parameters. The LKJcorr prior was introduced as a convenient family of priors for correlation matrices. You saw how covariance models can be applied to causal inference, using instrumental variables and the front-door criterion. Gaussian processes represent a practical method of extending the varying effects strategy to continuous dimensions of similarity, such as spatial, network, phylogenetic, or any other abstract distance between entities in the data.
Place each answer inside the code chunk (grey box). The code chunks should contain a text response or code that completes or answers the question or activity requested. Make sure to include plots if the question requests them.
Finally, upon completion, name your final output .html file as: YourName_ANLY505-Year-Semester.html, publish the assignment to your RPubs account, and submit the link to Canvas.
Each question is worth 5 points.
14-1. Repeat the café robot simulation from the beginning of the chapter. This time, set rho to zero, so that there is no correlation between intercepts and slopes. Plot the posterior distribution. How does the posterior distribution of the correlation reflect this change in the underlying simulation?
library(rethinking)
library(MASS)
a <- 3.5        # average morning wait time
b <- (-1)       # average difference in afternoon wait time
sigma_a <- 1    # std dev in intercepts
sigma_b <- 0.5  # std dev in slopes
rho <- 0        # correlation between intercepts and slopes (zero for this problem)
Mu <- c(a, b)
cov_ab <- sigma_a * sigma_b * rho
Sigma <- matrix(c(sigma_a^2, cov_ab, cov_ab, sigma_b^2), ncol=2)
# Simulation
N_cafes <- 20
set.seed(6)
vary_effects <- mvrnorm(N_cafes, Mu, Sigma)
a_cafe <- vary_effects[ ,1]
b_cafe <- vary_effects[ ,2]
N_visits <- 10
afternoon <- rep(0:1, N_visits * N_cafes/2)
cafe_id <- rep(1:N_cafes, each=N_visits)
mu <- a_cafe[cafe_id] + b_cafe[cafe_id] * afternoon
sigma <- 0.5 # std dev within cafes
wait <- rnorm(N_visits * N_cafes, mu, sigma)
d <- data.frame(cafe=cafe_id, afternoon=afternoon, wait=wait)
set.seed(42)
m14.1 <- ulam(
alist(
wait ~ normal(mu, sigma),
mu <- a_cafe[cafe] + b_cafe[cafe] * afternoon,
c(a_cafe,b_cafe)[cafe] ~ multi_normal(c(a,b), Rho, sigma_cafe),
a ~ normal(5, 2),
b ~ normal(-1, 0.5),
sigma_cafe ~ exponential(1),
sigma ~ exponential(1),
Rho ~ lkj_corr(2)
), data=d, chains=4, cores=4, refresh=0
)
precis(m14.1, depth=3, pars="Rho")
posterior <- extract.samples(m14.1)
dens(posterior$Rho[, 1, 2], xlab="correlation between intercepts and slopes")
legend("topright", lty=1, legend=c("posterior of Rho[1,2]"))
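# Optional check (a sketch): overlay draws from the LKJcorr(2) prior on the
# posterior, to see how strongly 20 cafes simulated with rho = 0 pull the
# correlation toward zero. Dashed line: prior; solid line: posterior.
R_prior <- rlkjcorr(1e4, K=2, eta=2)
dens(R_prior[, 1, 2], lty=2, add=TRUE)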
# Because the cafes were simulated with rho = 0, the posterior distribution of the correlation is now centered near zero rather than near -0.7 as in the chapter. It remains fairly wide, since only 20 cafes inform the correlation, but it clearly reflects the change in the underlying simulation.
14-2. Fit this multilevel model to the simulated café data: \[\begin{aligned} W_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha_{\text{CAFE}[i]} + \beta_{\text{CAFE}[i]} A_i \\ \alpha_{\text{CAFE}} &\sim \text{Normal}(\alpha, \sigma_\alpha) \\ \beta_{\text{CAFE}} &\sim \text{Normal}(\beta, \sigma_\beta) \\ \alpha &\sim \text{Normal}(0, 10) \\ \beta &\sim \text{Normal}(0, 10) \\ \sigma, \sigma_\alpha, \sigma_\beta &\sim \text{Exponential}(1) \end{aligned}\] Use WAIC to compare this model to the model from the chapter, the one that uses a multivariate Gaussian prior. Visualize the differences between the posterior means. Explain the result.
#set up parameters of population
a<-3.5 #average morning wait time
b<-(-1) #average difference afternoon wait time
sigma_a<-1 #stddev in intercepts
sigma_b<-0.5 #stddev in slopes
rho<-(-0.7) #correlation between intercepts and slopes
Mu<-c(a,b)
cov_ab<-sigma_a*sigma_b*rho
Sigma<-matrix(c(sigma_a^2,cov_ab,cov_ab,sigma_b^2),ncol=2)
#simulate observations
N_cafes<-20
library(MASS)
set.seed(5) #used to replicate example
vary_effects<-mvrnorm(N_cafes,Mu,Sigma)
a_cafe<-vary_effects[,1]
b_cafe<-vary_effects[,2]
N_visits<-10
afternoon<-rep(0:1,N_visits*N_cafes/2)
cafe_id<-rep(1:N_cafes,each=N_visits)
mu<-a_cafe[cafe_id]+b_cafe[cafe_id]*afternoon
sigma<-0.5 #std dev within cafes
wait<-rnorm(N_visits*N_cafes,mu,sigma)
#package into data frame
d<-data.frame(cafe=cafe_id,afternoon=afternoon,wait=wait)
# Model specified in 14-2: independent Normal priors on the varying intercepts
# and slopes, with Exponential(1) priors on the standard deviations
m13M2 <- ulam(
  alist(
    wait ~ dnorm(mu, sigma),
    mu <- a_cafe[cafe] + b_cafe[cafe] * afternoon,
    a_cafe[cafe] ~ dnorm(a, sigma_alpha),
    b_cafe[cafe] ~ dnorm(b, sigma_beta),
    a ~ dnorm(0, 10),
    b ~ dnorm(0, 10),
    sigma ~ dexp(1),
    sigma_alpha ~ dexp(1),
    sigma_beta ~ dexp(1)
  ),
  data=d, iter=5000, warmup=2000, chains=3, cores=3, log_lik=TRUE)
# Chapter model: multivariate Gaussian prior on the intercepts and slopes,
# refit to this simulated data so the two models can be compared with WAIC
m13.1 <- ulam(
  alist(
    wait ~ dnorm(mu, sigma),
    mu <- a_cafe[cafe] + b_cafe[cafe] * afternoon,
    c(a_cafe, b_cafe)[cafe] ~ multi_normal(c(a, b), Rho, sigma_cafe),
    a ~ dnorm(5, 2),
    b ~ dnorm(-1, 0.5),
    sigma_cafe ~ dexp(1),
    sigma ~ dexp(1),
    Rho ~ lkj_corr(2)
  ),
  data = d, chains = 4, cores = 4, log_lik = TRUE
)
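# WAIC comparison requested by the question (a sketch; the resulting table
# depends on the fitted chains, so no output is hard-coded here).
compare(m13M2, m13.1)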
post1<-extract.samples(m13.1)
a1<-apply(post1$a_cafe,2,mean)
b1<-apply(post1$b_cafe,2,mean)
post2<-extract.samples(m13M2)
a2<-apply(post2$a_cafe,2,mean)
b2<-apply(post2$b_cafe,2,mean)
plot(a1,b1,xlab="intercept",ylab="slope",
pch=16,col=rangi2,ylim=c(min(b1)-0.05,max(b1)+0.05),
xlim=c(min(a1)-0.1,max(a1)+0.1))
points(a2,b2,pch=1)
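# A sketch to make the differences easier to see: blue filled points are the
# multivariate-prior model (m13.1), open points are the independent-prior model
# (m13M2); segments connect each cafe's pair of posterior means.
for (i in 1:N_cafes) lines(c(a1[i], a2[i]), c(b1[i], b2[i]), col=col.alpha("black", 0.3))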
# The model m13M2 gives the intercepts and slopes independent priors, so it has no correlation parameter. Its posterior means for the average effects and the standard deviation parameters are almost identical to those from the multivariate model m13.1. The differences show up in the varying effects themselves, mostly for cafes with extreme intercepts or slopes. Because m13.1 also estimates the correlation between intercepts and slopes, it adjusts the two together: the correlation is negative, so larger intercepts go with smaller slopes, and as a result the blue points sit below the open points to the right of center and above them to the left. By pooling in both dimensions at once, the model that uses the correlation exploits more information from the population and is, on average, slightly more accurate.
14-3. Use WAIC to compare the Gaussian process model of Oceanic tools to the models fit to the same data in Chapter 11. Pay special attention to the effective numbers of parameters, as estimated by WAIC. Explain the result.
library(rethinking)
data(Kline)
d <- Kline
# Chapter 11
d$P <- scale(log(d$population))
d$contact_id <- ifelse(d$contact == "high", 2, 1)
d$society <- 1:10
data(islandsDistMatrix)
dat2 <- list(T = d$total_tools, P = d$population, cid = d$contact_id)
m11.11 <- ulam(
alist(
T ~ dpois(lambda),
lambda <- exp(a[cid]) * P^b[cid] / g,
a[cid] ~ dnorm(1, 1),
b[cid] ~ dexp(1),
g ~ dexp(1)
),
data = dat2, chains = 4, cores = 4, log_lik = TRUE
)
dat_list <- list(T = d$total_tools, P = d$population, society = d$society, Dmat = islandsDistMatrix)
m14.8 <- ulam(
alist(
T ~ dpois(lambda),
lambda <- (a * P^b / g) * exp(k[society]),
vector[10]:k ~ multi_normal(0, SIGMA),
matrix[10, 10]:SIGMA <- cov_GPL2(Dmat, etasq, rhosq, 0.01),
c(a, b, g) ~ dexp(1),
etasq ~ dexp(2),
rhosq ~ dexp(0.5)
),
data = dat_list, chains = 4, cores = 4, iter = 2000, log_lik = TRUE
)
compare(m11.11, m14.8)
## WAIC SE dWAIC dSE pWAIC weight
## m14.8 67.73178 2.244134 0.00000 NA 4.157516 0.998503056
## m11.11 80.73744 11.441280 13.00566 11.39767 5.202666 0.001496944
# Only the Gaussian process model uses adaptive regularization. Although it has many more actual parameters than m11.11 (ten per-society varying effects plus a, b, g, etasq, and rhosq), its effective number of parameters as estimated by WAIC (pWAIC of about 4.2) is smaller than that of m11.11 (about 5.2), because the varying intercepts are shrunk toward one another according to spatial distance, making the model less flexible than its parameter count suggests. m14.8 also has the better WAIC and receives nearly all of the model weight in this comparison.
14-4. Let’s revisit the Bangladesh fertility data, data(bangladesh), from the practice problems for Chapter 13. Fit a model with both varying intercepts by district_id and varying slopes of urban by district_id. You are still predicting use.contraception. Inspect the correlation between the intercepts and slopes. Can you interpret this correlation, in terms of what it tells you about the pattern of contraceptive use in the sample? Plot the mean (or median) varying effect estimates for both the intercepts and slopes, by district. Then you can visualize the correlation and maybe more easily think through what it means to have a particular correlation. Plot the predicted proportion of women using contraception, with urban women on one axis and rural on the other. Explain the results.
library(rethinking)
library(MASS)
data(bangladesh)
d <- bangladesh
str(d)
## 'data.frame': 1934 obs. of 6 variables:
## $ woman : int 1 2 3 4 5 6 7 8 9 10 ...
## $ district : int 1 1 1 1 1 1 1 1 1 1 ...
## $ use.contraception: int 0 0 0 0 0 0 0 0 0 0 ...
## $ living.children : int 4 1 3 4 1 1 4 4 2 4 ...
## $ age.centered : num 18.44 -5.56 1.44 8.44 -13.56 ...
## $ urban : int 1 1 1 1 1 1 1 1 1 1 ...
# fix index - could also use coerce_index() here
d$district_id <- as.integer(as.factor(d$district))
#rename outcome so it doesn't have a dot in it
dlist<-list(
use_contraception=d$use.contraception,
urban=d$urban,
district=d$district_id)
m13H1<-ulam(
alist(
use_contraception~dbinom(1,p),
logit(p) <- a[district] + b[district] * urban,
c(a, b)[district] ~ multi_normal(c(abar, bbar), Rho, Sigma),
abar ~ normal(0, 1),
bbar ~ normal(0, 0.5),
Rho ~ lkj_corr(2),
Sigma ~ exponential(1)
),
data = dlist, chains = 4, cores = 4, iter = 4000
)
precis(m13H1,pars=c("Sigma","Rho"),depth=3)
## mean sd 5.5% 94.5% n_eff Rhat4
## Sigma[1] 0.5741830 9.690234e-02 0.4283957 0.7368998 1969.3209 1.0011513
## Sigma[2] 0.7728980 2.067958e-01 0.4589811 1.1089871 731.6714 1.0085376
## Rho[1,1] 1.0000000 0.000000e+00 1.0000000 1.0000000 NaN NaN
## Rho[1,2] -0.6514383 1.702595e-01 -0.8688449 -0.3450856 1492.9469 1.0045957
## Rho[2,1] -0.6514383 1.702595e-01 -0.8688449 -0.3450856 1492.9469 1.0045957
## Rho[2,2] 1.0000000 6.026597e-17 1.0000000 1.0000000 7464.5785 0.9994999
post<-extract.samples(m13H1)
dens(post$Rho[,1,2],xlab="rho",xlim=c(-1,1))
abline(v=0,lty=2) #add dashed marker at zero correlation
pred.dat.rural<-list(
urban=rep(0,60),
district=1:60)
pred.dat.urban<-list(
urban=rep(1,60),
district=1:60)
pred.rural<-link(m13H1,data=pred.dat.rural)
pred.urban<-link(m13H1,data=pred.dat.urban)
means.rural<-apply(pred.rural,2,mean)
means.urban<-apply(pred.urban,2,mean)
plot(means.rural,means.urban,col="slateblue",
xlim=c(0,1),ylim=c(0,1),
xlab="rural use",ylab="urban use")
abline(a=0,b=1,lty=2)
plot(means.rural,means.urban-means.rural,col="slateblue", xlab="rural use",ylab="difference btw urban and rural")
abline(h=0,lty=2)
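# The question also asks for the varying effect estimates themselves. A sketch:
# plot the posterior mean intercept (rural log-odds) against the posterior mean
# urban slope for each district, which shows the negative correlation directly.
a_mean <- apply(post$a, 2, mean)   # varying intercepts by district
b_mean <- apply(post$b, 2, mean)   # varying urban slopes by district
plot(a_mean, b_mean, col="slateblue", pch=16,
xlab="intercept (rural log-odds)", ylab="slope (urban effect)")
abline(h=0, lty=2)
abline(v=0, lty=2)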
# Urban residence has a positive average effect on contraceptive use, which is not surprising. The variance components show a larger standard deviation for the urban slopes (Sigma[2] about 0.77) than for the varying intercepts (Sigma[1] about 0.57); the two are not on exactly the same scale, so this comparison should be made cautiously, but it is clear that the effect of urban residence varies across districts. The correlation between intercepts and slopes is negative, with a posterior mean of about -0.65 and an 89% interval entirely below zero. Larger intercepts go with smaller slopes: districts where rural contraceptive use is already high show smaller urban-rural differences. The prediction plots show the same pattern. In districts with high rural use, urban and rural rates are similar (points close to the dashed lines), while the districts with the highest urban use tend to have low rural use, and some districts with the lowest urban use have among the highest rural use.
14-5. Now consider the predictor variables age.centered and living.children, also contained in data(bangladesh). Suppose that age influences contraceptive use (changing attitudes) and number of children (older people have had more time to have kids). Number of children may also directly influence contraceptive use. Draw a DAG that reflects these hypothetical relationships. Then build models needed to evaluate the DAG. You will need at least two models. Retain district and urban, as in 14-4. What do you conclude about the causal influence of age and children?
# DAG reflecting the hypothesized relations among age, number of living children, and contraceptive use
library(dagitty)
Dag <- dagitty::dagitty(" dag {
Age_Centered -> Contraception
Living_Children -> Contraception
Age_Centered -> Living_Children
}")
drawdag(Dag)
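# Optional check with dagitty (a sketch): what must be conditioned on to estimate
# the total vs. direct effect of age? Under this DAG the total effect of age needs
# no adjustment, while the direct effect requires conditioning on Living_Children,
# which is why two models are fit below.
adjustmentSets(Dag, exposure="Age_Centered", outcome="Contraception", effect="total")
adjustmentSets(Dag, exposure="Age_Centered", outcome="Contraception", effect="direct")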
# Fit the models that add the new predictors age and living.children, retaining district and urban
set.seed(1234)
data(bangladesh)
dat <- bangladesh
# Inclusion of additional predictor parameters in the data.
dat_list <- list(
Cntrcptn = dat$use.contraception,
did = as.integer(as.factor(dat$district)),
urban = dat$urban,
age = standardize(dat$age.centered),
children = standardize(dat$living.children)
)
# Model 1: age as a predictor, keeping the varying intercepts and urban slopes by district
Model_1_Age <- ulam( alist(
Cntrcptn ~ bernoulli( p ),
logit(p) <- a[did] + b[did]*urban + bA*age,
c(a,b)[did] ~ multi_normal( c(abar,bbar), Rho, Sigma),
abar ~ normal(0,1),
c(bbar,bA) ~ normal(0,0.5),
Rho ~ lkj_corr(2),
Sigma ~ exponential(1)
) , data=dat_list, chains=2, cores=4, log_lik = TRUE, iter = 2000)
# Posterior summaries of Rho and Sigma
precis(Model_1_Age, depth = 3, pars = c("Rho", "Sigma"))
## mean sd 5.5% 94.5% n_eff Rhat4
## Rho[1,1] 1.0000000 0.000000e+00 1.0000000 1.0000000 NaN NaN
## Rho[1,2] -0.6481751 1.687386e-01 -0.8645762 -0.3423949 421.5262 1.0007101
## Rho[2,1] -0.6481751 1.687386e-01 -0.8645762 -0.3423949 421.5262 1.0007101
## Rho[2,2] 1.0000000 5.870968e-17 1.0000000 1.0000000 1793.6714 0.9989995
## Sigma[1] 0.5783229 9.898190e-02 0.4301549 0.7483148 408.8852 1.0000318
## Sigma[2] 0.7895637 2.003872e-01 0.4731738 1.1067617 246.0145 1.0039245
# Model 2: age and number of living children as predictors, again keeping the district and urban terms
Model_2_Age_Children <- ulam( alist(
Cntrcptn ~ bernoulli( p ),
logit(p) <- a[did] + b[did]*urban + bA*age +bC*children,
c(a,b)[did] ~ multi_normal( c(abar,bbar), Rho, Sigma),
abar ~ normal(0,1),
c(bbar,bA,bC) ~ normal(0,0.5),
Rho ~ lkj_corr(2),
Sigma ~ exponential(1)
) , data=dat_list, chains=2, cores=4, log_lik = TRUE, iter = 2000)
precis(Model_2_Age_Children, depth = 3, pars = c("Rho", "Sigma"))
## mean sd 5.5% 94.5% n_eff Rhat4
## Rho[1,1] 1.0000000 0.000000e+00 1.0000000 1.0000000 NaN NaN
## Rho[1,2] -0.6316599 1.758695e-01 -0.8645606 -0.3194775 392.72093 1.0102919
## Rho[2,1] -0.6316599 1.758695e-01 -0.8645606 -0.3194775 392.72093 1.0102919
## Rho[2,2] 1.0000000 6.272117e-17 1.0000000 1.0000000 1994.78528 0.9989995
## Sigma[1] 0.6077203 1.049990e-01 0.4510592 0.7819080 455.69981 1.0113967
## Sigma[2] 0.7703891 2.276556e-01 0.4067350 1.1309406 95.36887 1.0311645
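# To evaluate the DAG we also need the regression coefficients (a sketch): bA in
# Model_1_Age estimates the total effect of age, while bA and bC in
# Model_2_Age_Children separate the direct effect of age from the effect of the
# number of living children.
precis(Model_1_Age, pars=c("abar", "bbar", "bA"))
precis(Model_2_Age_Children, pars=c("abar", "bbar", "bA", "bC"))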
# The posterior estimates from Model_1_Age and Model_2_Age_Children both show a clear negative correlation between the district intercepts and urban slopes (Rho[1,2] and Rho[2,1]), and these varying-effect summaries change very little when the number of children is added. To evaluate the DAG itself, compare the age coefficient bA across the two models: with age alone it estimates the total effect of age, while conditioning on children isolates the direct effect. Taken together with the DAG, the fits suggest that, after adjusting for the number of living children, older women appear somewhat less likely to use contraception, while the number of living children has a clear influence of its own on contraceptive use. Because older women have also had more time to have children, age affects contraceptive use through two paths: directly, and indirectly through the number of living children.