Data Input

Let’s fit our hierarchical model for counts of chocolate chips. The data can be found in cookies.dat.

dat = read.table(file="data_files/cookies.dat", header=TRUE)
head(dat)
##   chips location
## 1    12        1
## 2    12        1
## 3     6        1
## 4    13        1
## 5    12        1
## 6    12        1
table(dat$location)
## 
##  1  2  3  4  5 
## 30 30 30 30 30
hist(dat$chips)

boxplot(chips~location, data=dat)

Prior predictive checks

Before implementing the model, we need to select prior distributions for \(\alpha\) and \(\beta\), the hyperparameters governing the gamma distribution for the \(\lambda\) parameters. First, think about what the \(\lambda\)’s represent. For location \(j\), \(\lambda_j\) is the expected number of chocolate chips per cookie. Hence, \(\alpha\) and \(\beta\) control the distribution of these means between locations. The mean of this gamma distribution represents the overall mean number of chips across all cookies, while its variance controls the variability between locations. If the variance is high, the mean number of chips will vary widely from location to location; if it is small, the mean number of chips will be nearly the same from location to location.
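
As a reminder of the facts the simulation below relies on: if \(\lambda\) follows a gamma distribution with shape \(\alpha\) and rate \(\beta\), then \(E(\lambda) = \alpha/\beta\) and \(Var(\lambda) = \alpha/\beta^2\), so its standard deviation is \(\sqrt{\alpha}/\beta\).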

To see the effects of different priors on the distribution of \(\lambda\)’s, we can simulate. Suppose we try independent exponential priors for \(\alpha\) and \(\beta\).

set.seed(112)
n_sim = 500
alpha_pri = rexp(n_sim, rate=1.0/2.0)
beta_pri = rexp(n_sim, rate=5.0)
mu_pri = alpha_pri/beta_pri
sig_pri = sqrt(alpha_pri/beta_pri^2)

summary(mu_pri)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##    0.021    2.983    9.852   61.127   29.980 4858.786
summary(sig_pri)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##    0.1834    3.3663    8.5488   41.8137   22.2219 2865.6461

After simulating from the priors for \(\alpha\) and \(\beta\), we can use those samples to simulate further down the hierarchy:

lambda_pri = rgamma(n=n_sim, shape=alpha_pri, rate=beta_pri)
summary(lambda_pri)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##     0.000     1.171     7.667    83.062    28.621 11005.331

We can also generate a prior predictive reconstruction of the original data set:

(lambda_pri = rgamma(n=5, shape=alpha_pri[1:5], rate=beta_pri[1:5]))
## [1] 66.444084  9.946688  6.028319 15.922568 47.978587
(y_pri = rpois(n=150, lambda=rep(lambda_pri, each=30)))
##   [1] 63 58 64 63 70 62 61 48 71 73 70 77 66 60 72 77 69 62 66 71 49 80 66 75 74
##  [26] 55 62 90 65 57 12  9  7 10 12 10 11  7 14 13  9  6  6 13  7 10 12  9  9 10
##  [51]  7  8  6  9  7 10 13 13  8 12  6 10  3  6  7  4  6  7  5  5  4  3  6  2  8
##  [76]  4  8  4  5  7  1  4  5  3  8  8  3  1  7  3 16 14 13 17 17 12 13 13 16 16
## [101] 15 14 11 10 13 17 16 19 16 17 15 16  7 17 21 16 12 15 14 13 52 44 51 46 39
## [126] 40 40 44 46 59 45 49 58 42 31 52 43 47 53 41 48 57 35 60 51 58 36 34 41 59

Because these priors have high variance and are somewhat noninformative, they produce unrealistic predictive distributions. Still, enough data would overwhelm the prior, resulting in useful posterior distributions. Alternatively, we could tweak and simulate from these prior distributions until they adequately represent our prior beliefs. Yet another approach would be to re-parameterize the gamma prior, which we’ll demonstrate as we fit the model.
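
As a preview of that re-parameterization, here is a small simulation sketch (mine, not part of the original analysis) using the priors that will appear in the JAGS model below, where the gamma hyperparameters are written in terms of a mean \(\mu\) and standard deviation \(\sigma\):

mu_pri2 = rgamma(n=n_sim, shape=2.0, rate=1.0/5.0) # prior on the overall mean number of chips (prior mean 10)
sig_pri2 = rexp(n=n_sim, rate=1.0)                 # prior on the between-location standard deviation
alpha_pri2 = mu_pri2^2 / sig_pri2^2
beta_pri2 = mu_pri2 / sig_pri2^2
lambda_pri2 = rgamma(n=n_sim, shape=alpha_pri2, rate=beta_pri2)
summary(lambda_pri2)

Because each \(\lambda\) draw has standard deviation \(\sigma\) (typically near 1) around its mean \(\mu\) (centered near 10), these draws should be far less extreme than the ones produced by the vague exponential priors above.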

JAGS Model

library("rjags")
## Loading required package: coda
## Linked to JAGS 4.3.2
## Loaded modules: basemod,bugs
mod_string = " model {
for (i in 1:length(chips)) {
  chips[i] ~ dpois(lambda[location[i]])
}

for (j in 1:max(location)) {
  lambda[j] ~ dgamma(alpha, beta)
}

alpha = mu^2 / sig^2
beta = mu / sig^2

mu ~ dgamma(2.0, 1.0/5.0)
sig ~ dexp(1.0)

} "

set.seed(113)

data_jags = as.list(dat)

params = c("lambda", "mu", "sig")

mod = jags.model(textConnection(mod_string), data=data_jags, n.chains=3)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 150
##    Unobserved stochastic nodes: 7
##    Total graph size: 315
## 
## Initializing model
update(mod, 1e3)

mod_sim = coda.samples(model=mod,
                       variable.names=params,
                       n.iter=5e3)
mod_csim = as.mcmc(do.call(rbind, mod_sim))

## convergence diagnostics
plot(mod_sim)

gelman.diag(mod_sim)
## Potential scale reduction factors:
## 
##           Point est. Upper C.I.
## lambda[1]          1       1.00
## lambda[2]          1       1.00
## lambda[3]          1       1.00
## lambda[4]          1       1.00
## lambda[5]          1       1.00
## mu                 1       1.00
## sig                1       1.01
## 
## Multivariate psrf
## 
## 1
autocorr.diag(mod_sim)
##           lambda[1]     lambda[2]   lambda[3]     lambda[4]    lambda[5]
## Lag 0   1.000000000  1.0000000000  1.00000000  1.0000000000  1.000000000
## Lag 1   0.019896123  0.1179486545  0.01189915  0.0339354698  0.081685454
## Lag 5   0.001802248  0.0069474989  0.01648663  0.0033383446  0.008828777
## Lag 10  0.007575615  0.0104390098 -0.01165884  0.0025121857 -0.007741562
## Lag 50 -0.004694051 -0.0003860656 -0.00792229 -0.0008624712  0.012342023
##                 mu          sig
## Lag 0  1.000000000  1.000000000
## Lag 1  0.364632757  0.596693734
## Lag 5  0.031656214  0.118428191
## Lag 10 0.008698618  0.003281204
## Lag 50 0.002030983 -0.017211067
autocorr.plot(mod_sim)

effectiveSize(mod_sim)
## lambda[1] lambda[2] lambda[3] lambda[4] lambda[5]        mu       sig 
## 14323.574 10907.144 14303.290 13466.308 12145.880  6039.979  3400.618
## compute DIC
dic = dic.samples(mod, n.iter=1e3)

Model checking

After assessing convergence, we can check the fit via residuals. With a hierarchical model, there are now two levels of residuals: the observation level and the location-mean level. To simplify, we’ll look at the residuals associated with the posterior means of the parameters.

First, we have observation residuals, based on the estimates of location means.

## observation level residuals
(pm_params = colMeans(mod_csim))
## lambda[1] lambda[2] lambda[3] lambda[4] lambda[5]        mu       sig 
##  9.275456  6.225599  9.522458  8.941679 11.754803  9.087465  2.080304
yhat = rep(pm_params[1:5], each=30)
resid = dat$chips - yhat
plot(resid)

plot(jitter(yhat), resid)

var(resid[yhat<7])
## [1] 6.447126
var(resid[yhat>11])
## [1] 13.72414

Also, we can look at how the location means differ from the overall mean \(\mu\).

## location level residuals
lambda_resid = pm_params[1:5] - pm_params["mu"]
plot(lambda_resid)
abline(h=0, lty=2)

We don’t see any obvious violations of our model assumptions.

Results

summary(mod_sim)
## 
## Iterations = 2001:7000
## Thinning interval = 1 
## Number of chains = 3 
## Sample size per chain = 5000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##             Mean     SD Naive SE Time-series SE
## lambda[1]  9.275 0.5359 0.004376       0.004481
## lambda[2]  6.226 0.4630 0.003780       0.004435
## lambda[3]  9.522 0.5413 0.004420       0.004537
## lambda[4]  8.942 0.5256 0.004291       0.004533
## lambda[5] 11.755 0.6228 0.005085       0.005663
## mu         9.087 0.9914 0.008094       0.012848
## sig        2.080 0.7175 0.005859       0.012306
## 
## 2. Quantiles for each variable:
## 
##             2.5%    25%    50%    75%  97.5%
## lambda[1]  8.261  8.916  9.262  9.623 10.362
## lambda[2]  5.361  5.904  6.211  6.534  7.153
## lambda[3]  8.500  9.149  9.516  9.881 10.611
## lambda[4]  7.933  8.588  8.929  9.288  9.983
## lambda[5] 10.549 11.328 11.751 12.165 12.994
## mu         7.171  8.471  9.067  9.669 11.169
## sig        1.085  1.580  1.948  2.427  3.864

Posterior predictive simulation

Just as we did with the prior distribution, we can use these posterior samples to obtain Monte Carlo estimates of quantities of interest from the posterior predictive distribution.

For example, we can use draws from the posterior distribution of \(\mu\) and \(\sigma\) to simulate the posterior predictive distribution of the mean for a new location.

(n_sim = nrow(mod_csim))
## [1] 15000
lambda_pred = rgamma(n=n_sim, shape=mod_csim[,"mu"]^2/mod_csim[,"sig"]^2, 
                  rate=mod_csim[,"mu"]/mod_csim[,"sig"]^2)
hist(lambda_pred)

mean(lambda_pred > 15)
## [1] 0.01806667

Using these \(\lambda\) draws, we can go to the observation level and simulate the number of chips per cookie, which takes into account the uncertainty in \(\lambda\):

y_pred = rpois(n=n_sim, lambda=lambda_pred)
hist(y_pred)

mean(y_pred > 15)
## [1] 0.0606
hist(dat$chips)

Finally, we could answer questions like: what is the posterior probability that the next cookie produced in Location 1 will have fewer than seven chips?

y_pred1 = rpois(n=n_sim, lambda=mod_csim[,"lambda[1]"])
hist(y_pred1)

mean(y_pred1<7)
## [1] 0.1924

Random intercept linear model

We can extend the linear model for the Leinhardt data on infant mortality by incorporating the region variable. We’ll do this with a hierarchical model, where each region has its own intercept.

library("car")
## Loading required package: carData
data("Leinhardt")
?Leinhardt
str(Leinhardt)
## 'data.frame':    105 obs. of  4 variables:
##  $ income: int  3426 3350 3346 4751 5029 3312 3403 5040 2009 2298 ...
##  $ infant: num  26.7 23.7 17 16.8 13.5 10.1 12.9 20.4 17.8 25.7 ...
##  $ region: Factor w/ 4 levels "Africa","Americas",..: 3 4 4 2 4 4 4 4 4 4 ...
##  $ oil   : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
pairs(Leinhardt)

head(Leinhardt)
##           income infant   region oil
## Australia   3426   26.7     Asia  no
## Austria     3350   23.7   Europe  no
## Belgium     3346   17.0   Europe  no
## Canada      4751   16.8 Americas  no
## Denmark     5029   13.5   Europe  no
## Finland     3312   10.1   Europe  no

Previously, we worked with infant mortality and income on the logarithmic scale. Recall also that we had to remove some missing data.

dat = na.omit(Leinhardt)
dat$logincome = log(dat$income)
dat$loginfant = log(dat$infant)
str(dat)
## 'data.frame':    101 obs. of  6 variables:
##  $ income   : int  3426 3350 3346 4751 5029 3312 3403 5040 2009 2298 ...
##  $ infant   : num  26.7 23.7 17 16.8 13.5 10.1 12.9 20.4 17.8 25.7 ...
##  $ region   : Factor w/ 4 levels "Africa","Americas",..: 3 4 4 2 4 4 4 4 4 4 ...
##  $ oil      : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ logincome: num  8.14 8.12 8.12 8.47 8.52 ...
##  $ loginfant: num  3.28 3.17 2.83 2.82 2.6 ...
##  - attr(*, "na.action")= 'omit' Named int [1:4] 24 83 86 91
##   ..- attr(*, "names")= chr [1:4] "Iran" "Haiti" "Laos" "Nepal"

Now we can fit the proposed model:

library("rjags")

mod_string = " model {
  for (i in 1:length(y)) {
    y[i] ~ dnorm(mu[i], prec)
    mu[i] = a[region[i]] + b[1]*log_income[i] + b[2]*is_oil[i]
  }
  
  for (j in 1:max(region)) {
    a[j] ~ dnorm(a0, prec_a)
  }
  
  a0 ~ dnorm(0.0, 1.0/1.0e6)
  prec_a ~ dgamma(1/2.0, 1*10.0/2.0)
  tau = sqrt( 1.0 / prec_a )
  
  for (j in 1:2) {
    b[j] ~ dnorm(0.0, 1.0/1.0e6)
  }
  
  prec ~ dgamma(5/2.0, 5*10.0/2.0)
  sig = sqrt( 1.0 / prec )
} "

set.seed(116)
data_jags = list(y=dat$loginfant, log_income=dat$logincome,
                  is_oil=as.numeric(dat$oil=="yes"), region=as.numeric(dat$region))
data_jags$is_oil
##   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
##  [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
table(data_jags$is_oil, data_jags$region)
##    
##      1  2  3  4
##   0 31 20 24 18
##   1  3  2  3  0
params = c("a0", "a", "b", "sig", "tau")

mod = jags.model(textConnection(mod_string), data=data_jags, n.chains=3)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 101
##    Unobserved stochastic nodes: 9
##    Total graph size: 622
## 
## Initializing model
update(mod, 1e3) # burn-in

mod_sim = coda.samples(model=mod,
                       variable.names=params,
                       n.iter=5e3)

mod_csim = as.mcmc(do.call(rbind, mod_sim)) # combine multiple chains

## convergence diagnostics
plot(mod_sim)

gelman.diag(mod_sim)
## Potential scale reduction factors:
## 
##      Point est. Upper C.I.
## a[1]       1.02       1.06
## a[2]       1.02       1.06
## a[3]       1.02       1.06
## a[4]       1.02       1.07
## a0         1.00       1.01
## b[1]       1.02       1.07
## b[2]       1.00       1.01
## sig        1.00       1.00
## tau        1.00       1.00
## 
## Multivariate psrf
## 
## 1.02
autocorr.diag(mod_sim)
##             a[1]      a[2]      a[3]      a[4]        a0      b[1]       b[2]
## Lag 0  1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.00000000
## Lag 1  0.9246845 0.9249464 0.9229657 0.9390910 0.2469307 0.9809473 0.13619648
## Lag 5  0.8541026 0.8530348 0.8513099 0.8691425 0.2337974 0.9057266 0.03466881
## Lag 10 0.7733505 0.7703498 0.7695464 0.7855497 0.2208945 0.8188252 0.05502477
## Lag 50 0.3738111 0.3798072 0.3747685 0.3892785 0.1054640 0.4024636 0.03415218
##                 sig          tau
## Lag 0   1.000000000  1.000000000
## Lag 1   0.076573143  0.281251508
## Lag 5   0.018898807 -0.005351989
## Lag 10  0.008891877  0.002945107
## Lag 50 -0.005001137 -0.007247051
autocorr.plot(mod_sim)

effectiveSize(mod_sim)
##       a[1]       a[2]       a[3]       a[4]         a0       b[1]       b[2] 
##   163.0862   174.2339   172.1697   155.9613   738.6444   145.6586  5133.1579 
##        sig        tau 
## 12112.3156  8420.2944

Results

Convergence looks okay, so let’s compare this with the old model from Lesson 7 using DIC:

dic.samples(mod, n.iter=1e3)
## Mean deviance:  214.5 
## penalty 7.233 
## Penalized deviance: 221.8

It appears that this model is an improvement over the non-hierarchical one we fit earlier. Notice that the penalty term, which can be interpreted as the “effective” number of parameters, is less than the actual number of parameters (nine). There are fewer “effective” parameters because they “share” information, or “borrow strength,” from each other in the hierarchical structure. If we had skipped the hierarchy and fit a single intercept, there would have been four parameters. If we had fit separate, independent intercepts for each region, there would have been seven parameters (which is close to what we ended up with).
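
To make the DIC comparison explicit, here is a sketch (my reconstruction, not code from these notes) of the earlier non-hierarchical reference model with a single intercept; it could be fit to the same data and its DIC compared with the value above:

mod0_string = " model {
  for (i in 1:length(y)) {
    y[i] ~ dnorm(mu[i], prec)
    mu[i] = b[1] + b[2]*log_income[i] + b[3]*is_oil[i]
  }

  for (j in 1:3) {
    b[j] ~ dnorm(0.0, 1.0/1.0e6)
  }

  prec ~ dgamma(5/2.0, 5*10.0/2.0)
  sig = sqrt( 1.0 / prec )
} "

data0_jags = list(y=dat$loginfant, log_income=dat$logincome,
                  is_oil=as.numeric(dat$oil=="yes"))
mod0 = jags.model(textConnection(mod0_string), data=data0_jags, n.chains=3)
update(mod0, 1e3)
dic.samples(mod0, n.iter=1e3)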

Finally, let’s look at the posterior summary.

summary(mod_sim)
## 
## Iterations = 1001:6000
## Thinning interval = 1 
## Number of chains = 3 
## Sample size per chain = 5000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##         Mean      SD  Naive SE Time-series SE
## a[1]  6.5917 0.57773 0.0047171      0.0463922
## a[2]  6.0610 0.72694 0.0059355      0.0566796
## a[3]  5.8922 0.64292 0.0052494      0.0501706
## a[4]  5.5977 0.88600 0.0072341      0.0728915
## a0    6.0334 1.31723 0.0107551      0.0502761
## b[1] -0.3489 0.10969 0.0008956      0.0092578
## b[2]  0.6457 0.35291 0.0028815      0.0052327
## sig   0.9200 0.06605 0.0005393      0.0006028
## tau   2.0233 1.00736 0.0082251      0.0109946
## 
## 2. Quantiles for each variable:
## 
##          2.5%     25%     50%     75%   97.5%
## a[1]  5.43215  6.2074  6.6029  6.9717  7.7516
## a[2]  4.60867  5.5773  6.0735  6.5319  7.5056
## a[3]  4.60577  5.4704  5.9022  6.3028  7.1790
## a[4]  3.82932  5.0120  5.6127  6.1687  7.3933
## a0    3.44507  5.2426  6.0340  6.8165  8.5807
## b[1] -0.56875 -0.4194 -0.3512 -0.2770 -0.1296
## b[2] -0.04826  0.4070  0.6446  0.8845  1.3428
## sig   0.80117  0.8736  0.9162  0.9630  1.0585
## tau   0.98288  1.4006  1.7703  2.3349  4.5507

In this particular model, the intercepts do not have a real interpretation because they correspond to the mean response for a country that does not produce oil and whose log-income per capita is 0 (i.e., an income of $1 per capita). We can interpret \(a_0\) as the overall mean intercept and \(\tau\) as the standard deviation of intercepts across regions.
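
For example (a quick sketch using the combined chains in mod_csim), we can simulate the intercept for a hypothetical new region by drawing from its hierarchical distribution with the posterior samples of \(a_0\) and \(\tau\):

a_new = rnorm(n=nrow(mod_csim), mean=mod_csim[,"a0"], sd=mod_csim[,"tau"])
hist(a_new)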

Other models

We have not investigated adding interaction terms, which might be appropriate. We considered hierarchy only on the intercepts, but nothing prevents us from doing the same for other terms in the model, such as the coefficients for income and oil. We could try any or all of these alternatives and see how the DIC changes for those models. This, together with the other model-checking techniques we have discussed, could be used to identify the best model for making inferences and predictions. A sketch of one such extension appears below.
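
For illustration only (a sketch with made-up variable names, not a model fit in these notes), hierarchy on the income coefficient could be added by giving each region its own slope drawn from a common normal distribution:

mod_rs_string = " model {
  for (i in 1:length(y)) {
    y[i] ~ dnorm(mu[i], prec)
    mu[i] = a[region[i]] + b1[region[i]]*log_income[i] + b2*is_oil[i]
  }

  for (j in 1:max(region)) {
    a[j] ~ dnorm(a0, prec_a)
    b1[j] ~ dnorm(b1_0, prec_b1) # region-specific income coefficients
  }

  a0 ~ dnorm(0.0, 1.0/1.0e6)
  b1_0 ~ dnorm(0.0, 1.0/1.0e6)
  b2 ~ dnorm(0.0, 1.0/1.0e6)

  prec_a ~ dgamma(1/2.0, 1*10.0/2.0)
  prec_b1 ~ dgamma(1/2.0, 1*10.0/2.0)
  prec ~ dgamma(5/2.0, 5*10.0/2.0)
  sig = sqrt( 1.0 / prec )
} "

This model could be fit with the same data_jags list as above and compared with the random-intercept model via DIC.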

Quiz Questions

In previous lessons, we fit models to data representing percent growth in personnel for companies in two industries. Below are additional data from the two original industries (with 10 and 6 companies, respectively), as well as 3 additional industries. Percent growth is reported for a total of 53 companies.

dat = read.table(file="data_files/pctgrowth.csv", sep=",", header=TRUE)
head(dat)
##      y grp
## 1  1.2   1
## 2  1.4   1
## 3 -0.5   1
## 4  0.3   1
## 5  0.9   1
## 6  2.3   1
table(dat$grp)
## 
##  1  2  3  4  5 
## 10  6  8 15 14
hist(dat$y)

boxplot(y~grp, data=dat)

Rather than fit 5 separate models, one for each industry, we can fit a hierarchical model. As before, we assume a normal likelihood and common variance across all observations. Each industry will have its own mean growth, and each of these means will come from a common distribution, from which we will estimate the overall mean and the variability across industries.

Let \(i\) index the individual companies, and \(g_i\) indicate the industry (grp variable in pctgrowth) for company \(i\). How do we describe the hierarchical model?

\(y_i \space |\space \theta_{g_i}, \sigma^2 \space \stackrel{ind}{\sim} \space N(\theta_{g_i},\sigma^2), \space \space i=1,...,53,\space \space g_i \in \{1,...,5\}\)

\(\theta_g \space |\space \mu, \tau^2 \space \stackrel{iid}{\sim} \space N(\mu,\tau^2), \space \space g=1,...,5\)

\(\mu \space {\sim} \space N(0,1e6)\)

\(\tau^2 \space {\sim} \space IG(\frac{1}{2},(1)\frac{3}{2})\)

\(\sigma^2 \space {\sim} \space IG(\frac{2}{2},(2)\frac{1}{2})\)

Fit the hierarchical model in JAGS and obtain posterior mean estimates for each industry’s mean growth (posterior mean for each \(\theta_g\))

mod_quiz_string = " model {
  for (i in 1:length(y)) {
    y[i] ~ dnorm(theta[grp[i]], prec)
  }
  
  for (j in 1:max(grp)) {
    theta[j] ~ dnorm(mu, tau_sq) # JAGS dnorm takes a precision, so tau_sq here is the precision 1/tau^2
  }
  
  mu ~ dnorm(0, 1/1e6)
  tau_sq ~ dgamma(1.0/2.0, 1.0*3.0/2.0)
  prec ~ dgamma(2.0/2.0, 2*1/2)
  sig = sqrt(1/prec)

} "

set.seed(113)

data_jags = as.list(dat)

params = c("theta", "mu", "sig")

mod_quiz = jags.model(textConnection(mod_quiz_string), data=data_jags, n.chains=3)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 53
##    Unobserved stochastic nodes: 8
##    Total graph size: 127
## 
## Initializing model
update(mod_quiz, 1e3)

mod_quiz_sim = coda.samples(model=mod_quiz,
                       variable.names=params,
                       n.iter=5e3)
mod_quiz_csim = as.mcmc(do.call(rbind, mod_quiz_sim))

## convergence diagnostics
plot(mod_quiz_sim)

gelman.diag(mod_quiz_sim)
## Potential scale reduction factors:
## 
##          Point est. Upper C.I.
## mu                1          1
## sig               1          1
## theta[1]          1          1
## theta[2]          1          1
## theta[3]          1          1
## theta[4]          1          1
## theta[5]          1          1
## 
## Multivariate psrf
## 
## 1
autocorr.diag(mod_quiz_sim)
##                  mu           sig     theta[1]     theta[2]     theta[3]
## Lag 0   1.000000000  1.0000000000  1.000000000  1.000000000  1.000000000
## Lag 1   0.048099635  0.0842571704  0.045774105  0.059142931  0.041709931
## Lag 5  -0.006067291  0.0082003116 -0.016796849 -0.009077723  0.009376290
## Lag 10  0.010853499  0.0001289421 -0.007960229  0.012251592 -0.006946827
## Lag 50 -0.003163093 -0.0083421932 -0.013204207 -0.013031574 -0.004917743
##            theta[4]     theta[5]
## Lag 0   1.000000000 1.0000000000
## Lag 1   0.011760574 0.0070332512
## Lag 5  -0.003721279 0.0046122216
## Lag 10 -0.002468041 0.0085976124
## Lag 50  0.001491710 0.0007347058
autocorr.plot(mod_quiz_sim)

effectiveSize(mod_quiz_sim)
##       mu      sig theta[1] theta[2] theta[3] theta[4] theta[5] 
## 13757.12 13440.76 13685.50 13327.47 13707.73 15000.00 14787.02
## compute DIC
dic = dic.samples(mod_quiz, n.iter=1e3)

pm_params = apply(mod_quiz_csim, 2, mean)
means_theta = pm_params[-c(1,2)]

We are interested in comparing these estimates to those obtained from the model that assumes no hierarchy (the ANOVA cell means model). We can approximate the posterior estimates for the 5 industry means under a noninformative prior by simply calculating the sample mean growth for the five industries. We can do this in R with:

means_anova = tapply(dat$y, INDEX=dat$grp, FUN=mean)
## dat is the data read from pctgrowth.csv

plot(means_anova)
points(means_theta, col="red") ## where means_theta are the posterior point estimates for the industry means.

The estimates from the hierarchical model have less variability across industries than the ANOVA (sample mean) estimates: they are shrunk from the sample means toward the overall mean, so the most extreme industry means are moderated.
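
As a quick numerical check of this shrinkage (a sketch, using the posterior mean of \(\mu\) as the overall-mean reference), we can ask whether each hierarchical estimate sits closer to the overall mean than the corresponding sample mean does:

mu_hat = pm_params["mu"]
abs(means_theta - mu_hat) < abs(means_anova - mu_hat) # TRUE indicates shrinkage toward the overall mean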

In our hierarchical model for personnel growth, we assumed that the variability between companies within an industry was constant across industries (\(\sigma^2\) was the same for all industries). How could we decide whether we instead need the more flexible, but less informative, model with separate variance parameters for each industry?

Answer: Calculate the posterior probability that \(\sigma^2/\tau^2>1\) in the original model. If this probability exceeds a pre-determined amount, use a model with separate variance parameters.
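
This probability can be estimated directly from the fitted model. The precision of the \(\theta\)’s (tau_sq in the JAGS code) was not among the monitored parameters above, so one option (a sketch, not from the original notes) is to draw additional samples that include it:

var_sim = coda.samples(model=mod_quiz, variable.names=c("sig", "tau_sq"), n.iter=5e3)
var_csim = as.mcmc(do.call(rbind, var_sim))
sig2 = var_csim[,"sig"]^2
tau2 = 1.0 / var_csim[,"tau_sq"] # tau_sq is a precision in the JAGS model, so its reciprocal is tau^2
mean(sig2 / tau2 > 1.0)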

Consider once again the OME data in the MASS package in R, which we explored earlier. The data consist of experimental results from tests of auditory perception in children. Under varying conditions and for multiple trials under each condition, children either correctly or incorrectly identified the source of the changing signals.

Recall that the model looked like this:

\(y_i \space | \space \phi_i \stackrel{ind}{\sim} Binomial(n_i,\phi_i), \space \space i=1,...,712\)

\(logit(\phi_i)=\beta_0+\beta_1 Age_i+\beta_2 I_{(OME_i=low)}+\beta_3 Loud_i+\beta_4 I_{(Noise_i=incoherent)}\)

\(\beta_0 \sim N(0,5^2)\)

\(\beta_k \stackrel{iid}{\sim} N(0,4^2),\space \space k=1,...,4.\)

As with other models, we will extend the intercept (and rename it) so that the linear part of the model looks like this:

\(logit(\phi_i)=\alpha_{(ID_i)}+\beta_1 Age_i+\beta_2 I_{(OME_i=low)}+\beta_3 Loud_i+\beta_4 I_{(Noise_i=incoherent)}\)

where \(ID_i\) is an index identifying the child for observation \(i\).

The hierarchical prior for the intercepts would look like this:

\(\alpha_j \stackrel{iid}{\sim} N(\mu,\tau^2),\space \space j=1,...,63\) (There are 63 children).

followed by the priors for \(\mu\) and \(\tau^2\),

\(\mu \sim N(0,10^2)\)

\(\tau^2 \sim IG(\frac{1}{2},\frac{1}{2})\)

\(\tau^2\) represents the variability of the child-specific intercepts \(\alpha_j\), that is, how much the baseline (logit-scale) probability of a correct response varies from child to child.

We fit the hierarchical model outlined above using JAGS:

library("MASS")
data("OME")

dat = subset(OME, OME != "N/A")
dat$OME = factor(dat$OME) # relabel OME
dat$ID = as.numeric(factor(dat$ID)) # relabel ID so there are no gaps in numbers (they now go from 1 to 63)

## Original reference model and covariate matrix
mod_glm = glm(Correct/Trials ~ Age + OME + Loud + Noise, data=dat, weights=Trials, family="binomial")
X = model.matrix(mod_glm)[,-1]

## Original model (that needs to be extended)
mod2_quiz_string = " model {
    for (i in 1:length(y)) {
        y[i] ~ dbin(phi[i], n[i])
        logit(phi[i]) = alpha[ID[i]] + b[1]*Age[i] + b[2]*OMElow[i] + b[3]*Loud[i] + b[4]*Noiseincoherent[i]
    }
     
    for (k in 1:max(ID)) {
      alpha[k] ~ dnorm(mu, prec_alpha)
    }
    
    for (j in 1:4) {
        b[j] ~ dnorm(0.0, 1.0/4.0^2)
    }
    
    mu ~ dnorm(0.0, 1.0/10.0^2)
    prec_alpha ~ dgamma(1.0/2.0, 1.0/2.0)
    tau = 1/prec_alpha # this is the variance tau^2, since prec_alpha is a precision
    
    
} "

data_jags = as.list(as.data.frame(X))
data_jags$y = dat$Correct
data_jags$n = dat$Trials
data_jags$ID = dat$ID

How do the convergence diagnostics look?

params = c("alpha", "b", "mu", "tau")

mod2_quiz = jags.model(textConnection(mod2_quiz_string), data=data_jags, n.chains=3)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 712
##    Unobserved stochastic nodes: 69
##    Total graph size: 6499
## 
## Initializing model
update(mod2_quiz, 1e3)

mod2_quiz_sim = coda.samples(model=mod2_quiz,
                       variable.names=params,
                       n.iter=5e3)
mod2_quiz_csim = as.mcmc(do.call(rbind, mod2_quiz_sim))

## convergence diagnostics
plot(mod2_quiz_sim)

gelman.diag(mod2_quiz_sim)
## Potential scale reduction factors:
## 
##           Point est. Upper C.I.
## alpha[1]        1.12       1.34
## alpha[2]        1.12       1.35
## alpha[3]        1.11       1.33
## alpha[4]        1.12       1.35
## alpha[5]        1.12       1.35
## alpha[6]        1.10       1.30
## alpha[7]        1.12       1.35
## alpha[8]        1.12       1.33
## alpha[9]        1.11       1.31
## alpha[10]       1.12       1.36
## alpha[11]       1.12       1.33
## alpha[12]       1.10       1.29
## alpha[13]       1.11       1.30
## alpha[14]       1.11       1.32
## alpha[15]       1.12       1.33
## alpha[16]       1.11       1.31
## alpha[17]       1.12       1.34
## alpha[18]       1.12       1.34
## alpha[19]       1.11       1.32
## alpha[20]       1.13       1.36
## alpha[21]       1.11       1.33
## alpha[22]       1.13       1.36
## alpha[23]       1.11       1.33
## alpha[24]       1.13       1.37
## alpha[25]       1.13       1.36
## alpha[26]       1.12       1.35
## alpha[27]       1.11       1.32
## alpha[28]       1.12       1.33
## alpha[29]       1.11       1.31
## alpha[30]       1.12       1.34
## alpha[31]       1.11       1.31
## alpha[32]       1.11       1.31
## alpha[33]       1.11       1.33
## alpha[34]       1.11       1.31
## alpha[35]       1.11       1.31
## alpha[36]       1.11       1.31
## alpha[37]       1.11       1.31
## alpha[38]       1.11       1.33
## alpha[39]       1.11       1.32
## alpha[40]       1.11       1.31
## alpha[41]       1.11       1.32
## alpha[42]       1.12       1.35
## alpha[43]       1.11       1.33
## alpha[44]       1.11       1.31
## alpha[45]       1.12       1.34
## alpha[46]       1.11       1.31
## alpha[47]       1.11       1.31
## alpha[48]       1.11       1.32
## alpha[49]       1.11       1.33
## alpha[50]       1.12       1.34
## alpha[51]       1.12       1.33
## alpha[52]       1.11       1.32
## alpha[53]       1.11       1.33
## alpha[54]       1.11       1.31
## alpha[55]       1.11       1.32
## alpha[56]       1.12       1.34
## alpha[57]       1.11       1.32
## alpha[58]       1.12       1.34
## alpha[59]       1.11       1.32
## alpha[60]       1.12       1.34
## alpha[61]       1.12       1.35
## alpha[62]       1.11       1.33
## alpha[63]       1.12       1.35
## b[1]            1.02       1.04
## b[2]            1.00       1.00
## b[3]            1.15       1.43
## b[4]            1.01       1.03
## mu              1.15       1.44
## tau             1.00       1.00
## 
## Multivariate psrf
## 
## 1.08
autocorr.diag(mod2_quiz_sim)
##         alpha[1]  alpha[2]  alpha[3]  alpha[4]  alpha[5]  alpha[6]  alpha[7]
## Lag 0  1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## Lag 1  0.8805935 0.8863123 0.8758478 0.9174165 0.9009758 0.8250163 0.8954051
## Lag 5  0.8321182 0.8378684 0.8132164 0.8790662 0.8529819 0.7686983 0.8508082
## Lag 10 0.8133812 0.8213814 0.7992093 0.8634349 0.8400800 0.7584615 0.8326739
## Lag 50 0.7215953 0.7150055 0.7036058 0.7542144 0.7287293 0.6640281 0.7320426
##         alpha[8]  alpha[9] alpha[10] alpha[11] alpha[12] alpha[13] alpha[14]
## Lag 0  1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## Lag 1  0.8773307 0.8441179 0.8736779 0.8947965 0.8276362 0.8319647 0.8521179
## Lag 5  0.8255534 0.7798186 0.8230767 0.8491825 0.7622669 0.7665667 0.7946590
## Lag 10 0.8096994 0.7605186 0.8037414 0.8329099 0.7436259 0.7537604 0.7805251
## Lag 50 0.7100236 0.6726599 0.7074177 0.7244428 0.6584196 0.6641613 0.6889869
##        alpha[15] alpha[16] alpha[17] alpha[18] alpha[19] alpha[20] alpha[21]
## Lag 0  1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## Lag 1  0.8777698 0.8373863 0.8422627 0.8853559 0.8345200 0.8988296 0.8539885
## Lag 5  0.8291231 0.7787474 0.7857128 0.8271508 0.7702236 0.8542913 0.7980614
## Lag 10 0.8147044 0.7658951 0.7730859 0.8126260 0.7565144 0.8440269 0.7843892
## Lag 50 0.7101232 0.6715441 0.6853994 0.7160263 0.6653927 0.7458603 0.6903959
##        alpha[22] alpha[23] alpha[24] alpha[25] alpha[26] alpha[27] alpha[28]
## Lag 0  1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## Lag 1  0.9088185 0.8298057 0.9074378 0.8652318 0.8392386 0.8415115 0.8610941
## Lag 5  0.8570620 0.7647724 0.8655714 0.8137651 0.7836328 0.7781047 0.8062735
## Lag 10 0.8354329 0.7484220 0.8506405 0.7956588 0.7668578 0.7625849 0.7938747
## Lag 50 0.7334242 0.6633423 0.7413164 0.7021577 0.6824752 0.6725712 0.6991639
##        alpha[29] alpha[30] alpha[31] alpha[32] alpha[33] alpha[34] alpha[35]
## Lag 0  1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## Lag 1  0.8352577 0.8475182 0.8378022 0.8416694 0.8501929 0.8418685 0.8377269
## Lag 5  0.7652757 0.7911818 0.7723838 0.7667344 0.7888382 0.7841291 0.7775364
## Lag 10 0.7525940 0.7770906 0.7555281 0.7494930 0.7738743 0.7712052 0.7645335
## Lag 50 0.6674600 0.6895933 0.6646291 0.6615691 0.6798215 0.6814761 0.6696707
##        alpha[36] alpha[37] alpha[38] alpha[39] alpha[40] alpha[41] alpha[42]
## Lag 0  1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## Lag 1  0.8265941 0.8467042 0.8390624 0.8371079 0.8140012 0.8159711 0.8445468
## Lag 5  0.7603765 0.7866168 0.7802228 0.7771267 0.7422737 0.7446838 0.7916918
## Lag 10 0.7527312 0.7758702 0.7688240 0.7612942 0.7331010 0.7264762 0.7813124
## Lag 50 0.6637745 0.6830923 0.6802446 0.6751497 0.6466207 0.6445759 0.6920931
##        alpha[43] alpha[44] alpha[45] alpha[46] alpha[47] alpha[48] alpha[49]
## Lag 0  1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## Lag 1  0.8384532 0.8395410 0.8389451 0.8384219 0.8336936 0.8338291 0.8369821
## Lag 5  0.7805831 0.7769642 0.7735889 0.7688315 0.7743716 0.7701505 0.7776108
## Lag 10 0.7691670 0.7590066 0.7582882 0.7529992 0.7627380 0.7550386 0.7623728
## Lag 50 0.6777696 0.6665319 0.6670678 0.6619549 0.6674187 0.6663895 0.6688652
##        alpha[50] alpha[51] alpha[52] alpha[53] alpha[54] alpha[55] alpha[56]
## Lag 0  1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## Lag 1  0.8331925 0.8474647 0.8242888 0.8419744 0.8334118 0.8109588 0.8415047
## Lag 5  0.7756962 0.7851986 0.7610924 0.7742288 0.7661141 0.7443987 0.7830543
## Lag 10 0.7609723 0.7735420 0.7484117 0.7630020 0.7486881 0.7312896 0.7682234
## Lag 50 0.6720896 0.6829579 0.6639048 0.6684293 0.6591061 0.6505607 0.6816542
##        alpha[57] alpha[58] alpha[59] alpha[60] alpha[61] alpha[62] alpha[63]
## Lag 0  1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## Lag 1  0.8671884 0.8773033 0.8296906 0.8579489 0.8438144 0.8271783 0.8610227
## Lag 5  0.8148115 0.8312434 0.7687921 0.7988164 0.7862958 0.7678746 0.8050146
## Lag 10 0.7978084 0.8175617 0.7526325 0.7834565 0.7661347 0.7575568 0.7903791
## Lag 50 0.7056465 0.7178228 0.6653041 0.6982369 0.6769475 0.6688112 0.7012956
##             b[1]       b[2]      b[3]       b[4]        mu        tau
## Lag 0  1.0000000 1.00000000 1.0000000 1.00000000 1.0000000 1.00000000
## Lag 1  0.9521623 0.90300601 0.9889125 0.52182385 0.9933711 0.68500171
## Lag 5  0.7944743 0.61653335 0.9491846 0.12835601 0.9810928 0.28988657
## Lag 10 0.6455344 0.37919008 0.9051031 0.09806712 0.9662158 0.08792042
## Lag 50 0.2251879 0.03836241 0.7095529 0.11254399 0.8524014 0.04299585
autocorr.plot(mod2_quiz_sim)

effectiveSize(mod2_quiz_sim)
##   alpha[1]   alpha[2]   alpha[3]   alpha[4]   alpha[5]   alpha[6]   alpha[7] 
##   37.44775   36.91292   32.31596   28.62276   33.31762   41.67873   36.23842 
##   alpha[8]   alpha[9]  alpha[10]  alpha[11]  alpha[12]  alpha[13]  alpha[14] 
##   35.28917   40.76902   33.14288   36.70273   46.14203   39.15319   36.98484 
##  alpha[15]  alpha[16]  alpha[17]  alpha[18]  alpha[19]  alpha[20]  alpha[21] 
##   36.71379   37.74433   35.24041   38.23883   38.36116   32.37557   33.96510 
##  alpha[22]  alpha[23]  alpha[24]  alpha[25]  alpha[26]  alpha[27]  alpha[28] 
##   40.80413   38.36735   37.45380   37.11940   35.65579   39.73454   33.91251 
##  alpha[29]  alpha[30]  alpha[31]  alpha[32]  alpha[33]  alpha[34]  alpha[35] 
##   36.33040   38.19714   42.32620   39.45678   39.28585   38.52281   40.86740 
##  alpha[36]  alpha[37]  alpha[38]  alpha[39]  alpha[40]  alpha[41]  alpha[42] 
##   36.03921   38.29534   39.54653   31.77420   39.25245   43.43920   37.23404 
##  alpha[43]  alpha[44]  alpha[45]  alpha[46]  alpha[47]  alpha[48]  alpha[49] 
##   37.27081   35.96194   44.91382   42.77693   38.54170   35.53320   42.60201 
##  alpha[50]  alpha[51]  alpha[52]  alpha[53]  alpha[54]  alpha[55]  alpha[56] 
##   34.63866   39.22699   36.00868   35.40188   41.43014   39.01054   34.41874 
##  alpha[57]  alpha[58]  alpha[59]  alpha[60]  alpha[61]  alpha[62]  alpha[63] 
##   31.87392   35.74257   39.98569   36.58211   41.32005   31.01417   32.24241 
##       b[1]       b[2]       b[3]       b[4]         mu        tau 
##  332.15409  719.06244   76.46295 1098.16803   23.93485 1926.56211
## compute DIC
dic = dic.samples(mod2_quiz, n.iter=1e3)
dic
## Mean deviance:  1240 
## penalty 28.76 
## Penalized deviance: 1269
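
These diagnostics are not encouraging: the potential scale reduction factors for the intercepts hover around 1.1 to 1.15, and the effective sample sizes for \(\mu\) and the \(\alpha_j\)’s are only in the tens, so the chains would need to run much longer before we trust the posterior summaries. One option (a sketch, not part of the original analysis) is simply to extend the burn-in and draw many more samples before re-checking the diagnostics:

update(mod2_quiz, 1e4) # additional burn-in
mod2_quiz_sim2 = coda.samples(model=mod2_quiz,
                              variable.names=params,
                              n.iter=5e4)
gelman.diag(mod2_quiz_sim2)
effectiveSize(mod2_quiz_sim2)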