Forest inventory plots can include experimental treatments. Here we use long-term observations from a CO2 fumigation experiment to show how a hierarchical model can change the interpretation of size and treatment effects.
Resources
Data files: dataTreeFACE.txt

Forest inventory data collection with Renata, a PhD student in the lab.

Install rjags; email if you encounter issues.
Hierarchical analysis of experimental inventory plots
How does rising CO2 affect tree growth, when each individual experiences CO2 concentration in its own way? The FACE experimental inventory data provide an opportunity to examine the heterogeneity of responses in the context of individual differences.
To determine how rising CO2 can affect a forested ecosystem, the Free-Air CO2 Enrichment (FACE) experiment subjected trees to elevated CO2 levels in 30-m diameter rings at the Duke Forest. Four plots received elevated atmospheric CO2 concentration, and four plots were instrumented but treated as ambient controls. The experiment began in June 1994 as a prototype, with the full experiment implemented with additional plots in August 1996. Plots were partitioned in early 2005 to include a CO2 × nitrogen treatment (main effects and factorial). In addition to measurements on all aspects of ecophysiology and atmosphere-biosphere exchange, the plots were sampled as forest inventory plots, including diameter growth. The study was funded by the Office of Science (BER) of the U.S. Department of Energy.
In this analysis we use tree data from three ambient and three elevated CO2 rings. Rings are 30 m wide and instrumented for CO2 fumigation and monitoring.
Here I use a hierarchical model to examine the effects of tree size (diameter), CO2 fertilization, and their interaction on growth rates. There are large differences between individuals, so I include a random effect on trees. To appreciate the role of heterogeneity, I start with a single-stage regression that ignores individual differences. Here is the data set:
data <- read.table('dataTreeFACE.txt',header=T)
data <- data[ is.finite( rowSums( data[,c('diam', 'trt')] ) ), ]
data$y <- log(data[,'dnow']) # log growth rate
Some EDA
Exploratory data analysis (EDA) is the essential first step in an analysis to understand relationships between variables.
The CO2 treatment in the data set is coded as [0, 1]. To see the balance of observations I make a table:
table(data$trt)
##
## 0 1
## 3669 3145
Note that treated and control observations are reasonably well balanced.
Here is a plot of growth rates against year:
plot( jitter(data$t), data$dnow, xlab = 'Year', ylab = 'Growth rate (cm)',
cex=.1, log = 'y')
q <- tapply( data$dnow, data$t, quantile, c(.5, .05, .95), na.rm = T)
q <- matrix( unlist(q), ncol = 3, byrow = T )
lines( q[,1], lwd = 3 )
lines( q[,2], lty = 2 )
lines( q[,3], lty = 2 )
Attrition of slow growers in later years.
I find it hard to interpret this plot, which suggests bimodality, i.e., clusters of low and high growth rates. Frequency plots of growth rates by year confirm this bimodality:
# histogram by year
lr <- range( data$y, na.rm=T )
bins <- seq( lr[1], lr[2], length=50) # bins for log growth rate
mids <- ( bins[-1] + bins[-50])/2
ybin <- findInterval( data$y, bins, all.inside = T) # assign growth rates to bins
dByt <- table( data$t, ybin ) # growth by year
xlim <- c(1, 12)
ylim <- range(data$y, na.rm=T)
plot( NA, xlim = xlim, ylim = ylim, cex=.1,
xlab = 'Year', ylab = 'log growth rate (cm)')
nt <- max(data$t)
x <- dByt[1,]*0
scale <- 1/max(dByt) # horizontal scale fits time increment
for(t in 1:nt){
lines( t + scale*dByt[t,], mids, type = 's', lwd=2, col = 'grey')
lines( c(t,t), ylim, lwd=2, col = 'grey' )
}
Distributions of individual rates by year.
This is certainly not Gaussian, but perhaps it would be approximately Gaussian if I allow for individual differences. As a quick check, here are frequency distributions by individual tree:
par(mfrow=c(2,1), mar=c(4,5,2,4))
growthStandard <- (data$y - mean(data$y))/sd(data$y) #center, standardize
hist( growthStandard, nclass = 100, xlim=c(-3, 3),
main = "Centered on population mean", xlab = '') # centered
imu <- tapply( data$y, data$i, mean, na.rm=T) # standardized rates
isd <- tapply( data$y, data$i, sd, na.rm=T)
hist( (data$y - imu[data$i])/isd[data$i], nclass=100, xlim=c(-3, 3),
main = "On individual means", xlab = 'Standardized log growth')
Distributions of individual rates.
Still not Gaussian, but predictors in the model are expected to explain variation in the data. I can include predictors in a regression.
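Before moving to JAGS, a quick non-Bayesian check of the same mean structure with lm can be useful. This is only a sketch; the Bayesian posterior means obtained below should be broadly similar.
fitLM <- lm(y ~ diam*trt, data = data) # same mean structure, least-squares fit
summary(fitLM)$coefficients # compare with the Bayesian posterior means below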
Bayesian regression in JAGS
The rjags package can be used to simulate the posterior distribution from within R.
A Bayesian regression model starts with main effects of diameter diam and CO2 treatment trt. The graph for this model shows coefficients \(\boldsymbol{\beta}\) and a residual variance \(\sigma^2\),
\[y_{it} = \boldsymbol{\beta}'\mathbf{x}_{it} + \epsilon_{it}\]
A formula in R has the syntax y ~ predictor1 + predictor2. The syntax predictor1*predictor2 specifies both main effects plus their interaction (predictor1:predictor2).
Recall that \(\mathbf{x}_{it}\) is one row of the full design matrix \(\mathbf{X}\). I want to look at the design matrix \(\mathbf{X}\), so I generate it using model.frame and model.matrix:
tmp <- model.frame(y ~ diam*trt, data)
x <- model.matrix(y ~ diam*trt, tmp) # design matrix
y <- tmp$y # response (log growth rate)
n <- length(y)
head(x)
##   (Intercept)     diam trt diam:trt
## 1 1 19.17660 0 0
## 2 1 19.72210 0 0
## 3 1 20.34015 0 0
## 4 1 20.91494 0 0
## 5 1 21.24999 0 0
## 6 1 21.93743 0 0
There is a product term for the interaction diam:trt.
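A quick sanity check confirms that the interaction column is the elementwise product of the two predictors (zero for ambient rows):
all.equal( x[,'diam:trt'], x[,'diam'] * x[,'trt'] ) # TRUE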
The Bayesian regression will have fitted coefficients and a residual variance. Here is a model for rjags, which is written to a file growthFixed.txt:
cat( "model{
for(i in 1:n){
y[i] ~ dnorm(mu[i],tau)
mu[i] <- b1 + b2*x[i,2] + b3*x[i,3] + b4*x[i,4]
}
tau ~ dgamma(0.001,0.001)
sigma <- sqrt(1/tau)
b1 ~ dnorm(0.0,1.0E-06)
b2 ~ dnorm(0.0,1.0E-06)
b3 ~ dnorm(0.0,1.0E-06)
b4 ~ dnorm(0.0,1.0E-06)
}",
fill=T, file="growthFixed.txt"
)
The function cat writes this character string to a text file. There are three blocks within the model string. The first block looks like a loop, but it does not execute that way; it is interpreted to set up the graphical model for the regression mean mu and for the data y. JAGS differs from R in that the normal distribution is parameterized by the precision tau, the reciprocal of the variance sigma^2, rather than by the standard deviation sigma. The precision is given a gamma prior, and sigma is defined as sqrt(1/tau). This means that sigma^2 has an inverse gamma prior. The coefficients have independent, non-informative normal priors.
The inverse gamma distribution for \(\sigma^2\) is a gamma distribution of \(\sigma^{-2}\). It is often used as a prior distribution for a variance, because it is conjugate with the normal likelihood, meaning that the posterior distribution is also inverse gamma.
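As a reminder of what conjugacy means here (a standard result, stated conditionally on \(\boldsymbol{\beta}\)), if \(\sigma^2 \sim IG(a, b)\) a priori, then
\[ \sigma^2 | \mathbf{y}, \boldsymbol{\beta} \sim IG\left(a + \frac{n}{2},\; b + \frac{1}{2}\sum_{i,t} \left(y_{it} - \boldsymbol{\beta}'\mathbf{x}_{it}\right)^2 \right) \]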
The full model is
\[ \begin{align} [\boldsymbol{\beta}, \sigma^2| \mathbf{y}] &\propto \prod_{i} \prod_{t} N( y_{it} | \boldsymbol{\beta}'\mathbf{x}_{it}, \sigma^2) \\ &\times \prod_q N( \beta_q | 0, 10^6) \\ &\times IG( \sigma^2 | 0.001, 0.001 ) \end{align} \] The graph for this model has a data stage (fixed \(y_{it}\)), a random parameter stage (\(\boldsymbol{\beta}, \sigma^2\)), and a fixed prior distribution.
Graph of the regression model showing the prior distribution for coefficients. Note that parameters have distributions, but that randomness is viewed as “uncertainty”, not as true variation. Uncertainty comes from the fact that parameters are estimated.
Here I generate the model graph:
library(rjags)
treeData <- list(y = y, x = x, n = n ) # define data
growModel <- jags.model(data = treeData, file = "growthFixed.txt") # generate graph
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 6814
## Unobserved stochastic nodes: 5
## Total graph size: 50858
##
## Initializing model
Here I fit the model:
# fit model
growFit <- coda.samples(growModel, variable.names = c("b1","b2","b3","b4"),
n.iter = 5000)
tmp <- summary(growFit)
print(tmp$statistics)
##           Mean          SD     Naive SE Time-series SE
## b1 -4.04743172 0.104881117 1.483243e-03 0.0082885616
## b2 0.15016768 0.005558774 7.861293e-05 0.0004510340
## b3 0.62837170 0.105196887 1.487709e-03 0.0123558396
## b4 -0.02544631 0.005616764 7.943304e-05 0.0006998969
par(mar = c(3,3,1,1))
plot( as.mcmc(growFit) )
MCMC chains and posterior densities.
These plots show MCMC chains for each of the four \(\beta\) values on the left and density estimates at right. We will talk more in coming weeks about the need for convergence and good mixing in these chains. For now, I just note that they quickly converge to values that are then stationary. The density plots at right are generated from these chains.
Markov chain Monte Carlo (MCMC) draws repeated samples from the posterior distribution that can be viewed as chains (iterations by parameter value).
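The coda package (loaded with rjags) provides quick numerical checks of mixing; a minimal sketch (gelman.diag would additionally require more than one chain):
effectiveSize(growFit) # effective sample size per parameter
autocorr.diag(growFit) # autocorrelation at increasing lags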
Objects can be extracted from the summary,
betaFixed <- cbind( tmp$statistics[,1:2], tmp$quantiles[,c('2.5%','97.5%')] )
signif( betaFixed, 3 )
##       Mean      SD    2.5%   97.5%
## b1 -4.0500 0.10500 -4.1200 -3.9700
## b2 0.1500 0.00556 0.1460 0.1540
## b3 0.6280 0.10500 0.5020 0.7320
## b4 -0.0254 0.00562 -0.0312 -0.0186
The matrix betaFixed summarizes the posterior distribution for \(\boldsymbol{\beta}\).
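One rough screen, not a formal test, is whether each 95% credible interval excludes zero:
sign( betaFixed[,'2.5%'] ) == sign( betaFixed[,'97.5%'] ) # interval excludes zero?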
Here are the data and the mean prediction for ambient (black) and elevated (red) treatments:
xlim <- range( x[,'diam'] )
xseq <- seq( xlim[1], xlim[2], length=100 ) # sequence of diameter values
par(mfrow=c(1,2), bty='n', mar=c(4,4,3,1)) # format for 2 plots
plot(x[,'diam'], y, col=x[,'trt'] + 1, cex=.1, # color code by trt
xlab='Diameter (cm)',
ylab = 'log growth rate (cm/yr)')
mu <- betaFixed[1,1] + betaFixed[2,1]*xseq
lines(xseq, mu, lwd = 2, lty = 2)
lines(xseq, mu + betaFixed[3,1] + betaFixed[4,1]*xseq, lwd = 2, col = 2, lty = 2)
legend('bottomright', c('ambient','elevated'), text.col = c(1,2), bty='n')
Fitted model showing two treatment levels with observed values.
Exercise 1: Interpret the main effects and interactions in this model.
Exercise 2: Run the model again, including nfert (nitrogen fertilization) in addition to other variables. Interpret its effect.
A hierarchical model for random effects
A model becomes hierarchical when a stage is added to the prior. When this is done, the “prior” is no longer fixed, thus allowing for variability at an additional stage. Random effects can be thought of as parameters that have a hyperprior distribution.
The model can be extended hierarchically to allow for differences between individuals. I assign random effects to individuals under the assumption that individuals are drawn at random from a much larger population of individuals that I might have selected for the study. In the simple regression context this is termed a mixed model, referring to the fact that there will be both fixed and random effects. Note that the added stage does not affect the structure of the previous model that contained only fixed effects.
Graph with random coefficients. There are “fixed” differences between individuals coming from beta. There are also random differences coming from alpha_i.
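For readers who know frequentist mixed models, roughly the same structure can be fitted with lme4 as a point of comparison. This is an aside and assumes the package is installed; note that lmer estimates correlated random intercepts and slopes, whereas the JAGS model below gives them independent priors.
library(lme4) # frequentist counterpart, for comparison only
data$tree <- paste(data$j, data$i, sep = '-') # tree label, as constructed below
fitMixed <- lmer(y ~ diam*trt + (1 + diam | tree), data = data) # random intercept and diam slope by tree
summary(fitMixed)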
For random effects, I start with labels for the observations by tree in the vector ij, which uniquely identifies an individual. I use match against the unique values to assign each row in x to an individual:
ij <- apply( cbind(data$j, data$i), 1, paste0, collapse='-') # plot, tree
tn <- sort(unique(ij)) # tree labels
ix <- match(ij,tn) # index rows by tree
ntree <- max(ix) # no. trees
The model will have random effects for the intercept and the diameter:
\[ y_{it} = \beta_0 + \beta_d d_{it} + \beta_c c_{it} + \beta_{d \times c} d_{it}c_{it} + \alpha_{i0} + \alpha_{id} d_{it} + \epsilon_{it} \] where \(d, c\) refer to diameter and CO2 treatment, respectively. Note that the \(\alpha\) coefficients have a subscript \(i\), whereas the \(\beta\)’s do not.
Exercise 3: Why didn’t I include random effects on CO2 treatment?
There are ntree = 696 individual trees. Each observation is now labeled by the tree to which it belongs in the vector ix. To follow the JAGS model, I need to recognize the vectorization ix[i], which gives the individual identity for each row in x. Indexing a second time gives the random intercept assigned to that individual, a1[ix[i]]. Here is a model:
cat( "model{
for(i in 1:n){
y[i] ~ dnorm(mu[i],tau)
mu[i] <- b1 + b2*x[i,2] + b3*x[i,3] + b4*x[i,4] +
a1[ix[i]] + a2[ix[i]]*x[i,2]
}
for(j in 1:ntree){
a1[j] ~ dnorm(0.0,tau1)
a2[j] ~ dnorm(0.0,tau2)
}
tau1 ~ dgamma(0.001,0.001) #RE precision, intercept
sigma1 <- 1/sqrt(tau1) #RE standard deviation, intercept
tau2 ~ dgamma(0.001,0.001) #RE precision, diam
sigma2 <- 1/sqrt(tau2) #RE standard deviation, diam
tau ~ dgamma(0.001,0.001)
sigma <- sqrt(1/tau)
b1 ~ dnorm(0.0,1.0E-06)
b2 ~ dnorm(0.0,1.0E-06)
b3 ~ dnorm(0.0,1.0E-06)
b4 ~ dnorm(0.0,1.0E-06)
}",
fill=T, file="growthRE.txt" )
Note that there are fixed effects, b1, ..., b4. Each random effect, a1, a2, has 696 values. Again, the index ix identifies the proper individual for each observation row in x, having index i.
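A tiny illustration of this indexing, with toy values that are not part of the model:
a1toy <- rnorm(ntree) # one toy intercept per tree
head( cbind(row = 1:6, tree = ix[1:6], a1 = a1toy[ix[1:6]]) ) # intercept attached to each observation row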
The hierarchical nature of the model comes from the fact that the variance of the random effects a1 (parameterized here by the precision tau1, with standard deviation sigma1) also has a prior distribution.
Here is a model fit using three parallel chains:
treeData <- list(y = y, x = x, ix = ix, ntree = ntree, n = n)
reModel <- jags.model(data = treeData, file = "growthRE.txt", n.chains = 3)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 6814
## Unobserved stochastic nodes: 1399
## Total graph size: 65885
##
## Initializing model
growthRE <- coda.samples(reModel,
variable.names=c("b1","b2","b3","b4",
"a1","a2","sigma1","sigma2"),
n.iter=5000)
tmp <- summary(growthRE)
betaRE <- cbind( tmp$statistics[,1:2], tmp$quantiles[,c('2.5%','97.5%')] )
head(betaRE) # many random coefficients first
##            Mean        SD        2.5%     97.5%
## a1[1] 0.9105781 0.4261660 -0.18333618 1.545762
## a1[2] -1.9639670 0.4099115 -3.26659301 -1.588128
## a1[3] 1.0348123 0.4268739 -0.23862561 1.607613
## a1[4] 1.2743781 0.4659632 0.03191483 1.975486
## a1[5] 0.6864183 0.5705059 -1.00571439 1.388553
## a1[6] 0.6234952 0.4004524 -0.52885717 1.154438
tail(betaRE,10) # four fixed coefficients and sigma at end
##                Mean          SD          2.5%        97.5%
## a2[693] -2.554608e-03 0.015737305 -0.0353620941 0.027333416
## a2[694] -1.316847e-03 0.013134425 -0.0296021350 0.023925522
## a2[695] -4.212973e-03 0.015895716 -0.0386303799 0.025094552
## a2[696] -4.170400e-03 0.016077405 -0.0407451859 0.024937002
## b1 -1.666192e+00 0.408784733 -1.9408781694 -0.303495773
## b2 7.251760e-03 0.003266785 0.0008066819 0.012569114
## b3 3.349531e-01 0.134641892 0.0003670621 0.527403530
## b4 4.083833e-05 0.003708022 -0.0061910465 0.009232301
## sigma1 1.159595e+00 0.163901368 1.0416430411 1.753483049
## sigma2 1.480624e-02 0.004867729 0.0098432378 0.027144703
As before we have four beta estimates.
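A rough side-by-side look at the posterior means for \(\boldsymbol{\beta}\) from the two models (means only; the uncertainties differ as well):
cbind( fixedOnly = betaFixed[,'Mean'],
       hierarchical = betaRE[c('b1','b2','b3','b4'), 'Mean'] )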
The estimates in betaRE are obtained from MCMC chains that can be accessed this way:
cnames <- colnames(growthRE[[1]]) # parameters are column names
a1 <- which(startsWith(cnames, 'a1')) # a parameters for each tree
a2 <- which(startsWith(cnames, 'a2'))
bg <- which(startsWith(cnames, 'b')) # four beta parameters
sg <- which(startsWith(cnames, 's')) # sigma
bgibbs <- growthRE[[1]][,bg] # extract chains for beta
sgibbs <- growthRE[[1]][,sg] # extract chains for sigma
beta <- apply(bgibbs,2, mean) # means agree with betaRE
# tree random effect parameter means
alpha <- cbind( apply(growthRE[[1]][,a1],2, mean),
apply(growthRE[[1]][,a2],2, mean))
colnames(alpha) <- c( 'a1', 'a2' )
rownames(alpha) <- NULL
head( signif(alpha, 3) )
##          a1       a2
## [1,] 0.911 0.00218
## [2,] -1.960 -0.00253
## [3,] 1.060 -0.00310
## [4,] 1.270 -0.00320
## [5,] 0.743 0.01500
## [6,] 0.614 -0.00320
Here are mean predictions against diameter for the two CO2 treatments, with fitted lines for each individual tree:
# predicted mean (fixed effects)
ylim <- range(y)
mu <- betaFixed[1,1] + betaFixed[2,1]*xseq # ambient
plot(xseq, mu, type='l',lwd=2, ylim=ylim, xlab='Diameter (cm)', ylab='')
mu2 <- mu + betaFixed[3,1] + betaFixed[4,1]*xseq # elevated CO2
lines(xseq, mu2, type='l',lwd=2, col=2)
# size range by individual
irange <- matrix( unlist(tapply(x[,'diam'], ix, range, na.rm=T)),ncol=2,byrow=T)
# regression fit by individual
for(i in 1:ntree){
xi <- xseq[xseq > irange[i,1] & xseq < irange[i,2]]
mu <- beta[1] + alpha[i,1] + (beta[2] + alpha[i,2])*xi # fixed b, individual a
wi <- which(ix == i)[1]
if(x[wi,'trt'] == 0){ # ambient
lines(xi, mu, col='grey')
next
}
mu <- mu + beta[3] + beta[4]*xi # elevated
lines(xi, mu, type='l',col=2)
}
Note that I first plotted the lines for the mean fixed effects. The loop adds the individual effects for each tree, showing elevated CO2 in red.
Exercise 4: Interpret the main effects and interactions in this hierarchical model. How does it compare with the model that included only fixed effects? Think about the role of variation between individuals versus within individuals. Speculate on why the differences arise.
Exercise 5: Is this analysis an example of the ecological fallacy? Why or why not?
Deviance information criterion
Models are selected to fit, but not over-fit, the data. A penalty for model size typically counts independent parameters. Hierarchical models do not have a clear number of parameters, because many are drawn from distributions that also have fitted parameters.
The deviance information criterion (DIC) is an index used for selection among hierarchical models, based on the deviance, \(D(\theta) = -2 \log( [\mathbf{y} | \theta] )\), where \(\theta\) is a vector of fitted parameters and \([\mathbf{y} | \theta]\) is the likelihood. We need a penalty for model complexity, i.e., the number of parameters. The problem with model selection for hierarchical models is that there is no clear way to ‘count’ them, because each parameter is part of a distribution.
The DIC is given by
\[DIC = D(\bar{\theta}) + 2p_D\]
where \(D(\bar{\theta})\) is the deviance calculated at the posterior mean parameter values, and \(p_D = \bar{D} - D(\bar{\theta})\) is the ‘effective number of parameters’. This is the penalty for model complexity. The model with the lowest DIC is viewed as the best model. Here is the DIC using \(2p_D\) as the penalty:
dic.samples(reModel, n.iter = 100, thin = 1, type = 'pD')
## Mean deviance:  -1169
## penalty 733.5
## Penalized deviance: -435.6
This penalized deviance (i.e., the DIC) can be compared with other models fitted to the same data.
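For comparison, the same call can be applied to the fixed-effects model fitted earlier (a sketch; in practice more iterations would be used):
dic.samples(growModel, n.iter = 100, thin = 1, type = 'pD') # DIC for the fixed-effects model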
Exercise 6: Add nitrogen fertilization, nfert, to the hierarchical analysis. Use DIC to determine if it improves the model.
Summary
Random effects represent a common type of hierarchical model. Random effects can be used when group differences are viewed as having been drawn at random from a much larger population of such groups, i.e., they are exchangeable. A random effects model may be limited to random intercepts, or it can additionally have random slopes. The effects become random when I add a prior on the prior, a hyperprior.
Hierarchical models submit to Gibbs sampling, a type of MCMC, just like single-stage models. They can be fitted in R, including with rjags. In upcoming units we’ll write our own Gibbs samplers.
DIC provides a simple index to select variables in hierarchical models. It does so by estimating the effective number of parameters in the model and using it as the penalty for model complexity.
Hierarchical models can apply to all kinds of problems, where additional stages allow us to specify and compute based on conditional relationships.
Appendix
Here is a bit more background on the hierarchical model that we take up in later units.
Consider a linear model with Gaussian error. A random effects model for a regression parameter can be written with two terms,
\[y_{ij} = \beta + \alpha_j + \epsilon_{ij}\]
The first term is the fixed effect. It will have a prior distribution. If \(\alpha_j\) is a random effect assigned to the \(j^{th}\) group, then it will have a prior distribution with parameters that also have prior distributions.
In a regression the random effects can be organized as a \(J \times q\) matrix, with one row per group. I give it a distribution, e.g., \(vec(\boldsymbol{\alpha}) \sim MVN(\mathbf{0}, \mathbf{V})\), where \(\mathbf{V}\) is a \(Jq \times Jq\) covariance matrix. If there are predictors in the model I can write it this way:
\[y_{ij} = \boldsymbol{\beta'}\mathbf{x}_{ij} + \boldsymbol{\alpha}'_j \mathbf{w}_{ij} + \epsilon_{ij}\]
where \(\mathbf{x}_{ij}\) is a length-\(p\) vector of covariates and/or factors that enter as fixed effects, \(\mathbf{w}_{ij}\) is a length-\(q\) subset of \(\mathbf{x}_{ij}\), with \(q \leq p\), \(\boldsymbol{\beta}\) is the length-\(p\) vector of fixed effects, and \(\boldsymbol{\alpha}_j\) is the length-\(q\) vector of random effects for group \(j\). It is the \(j^{th}\) row in the random effects matrix.
For a simple model with random effects on both slope and intercept, \(\mathbf{x}_{ij} = \mathbf{w}_{ij} = (1, x_{ij})'\), \(\boldsymbol{\beta} = (\beta_1, \beta_2)'\), and \(\boldsymbol{\alpha}_j = (\alpha_{j1}, \alpha_{j2})'\). Now an observation in group \(j\) has intercept \(\beta_1 + \alpha_{j1}\) and slope \(\beta_2 + \alpha_{j2}\). If the model has a random intercept, but not a random slope, then all groups have slope \(\beta_2\).
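A minimal simulation sketch of this random intercept and slope model, with arbitrary values chosen only to make the structure concrete:
set.seed(1)
J <- 10; nj <- 20 # groups and observations per group
betaS <- c(1, 0.5) # fixed intercept and slope
alphaS <- cbind( rnorm(J, 0, 1), rnorm(J, 0, 0.1) ) # independent random effects by group
g <- rep(1:J, each = nj) # group label for each observation
xS <- runif(J*nj, 0, 10)
yS <- (betaS[1] + alphaS[g,1]) + (betaS[2] + alphaS[g,2])*xS + rnorm(J*nj, 0, 0.2)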
The model for \(n_j\) observations in group \(j\) has this expected value:
\[E \left[\mathbf{y}_{\mathbf{j}} | \boldsymbol{\alpha}_{\mathbf{j}} \right] = \mathbf{X}_{\mathbf{j}} \boldsymbol{\beta} + \mathbf{W}_{\mathbf{j}} \boldsymbol{\alpha}_{\mathbf{j}}\]
The design matrices \(\mathbf{X}_{\mathbf{j}}\) and \(\mathbf{W}_{\mathbf{j}}\) have dimensions \(n_j \times p\) and \(n_j \times q\), respectively. The vector of random effects has the distribution \(\boldsymbol{\alpha} \sim MVN(\mathbf{0}, \mathbf{V})\), where \(\mathbf{0}\) is the length-\(qJ\) vector of zeros, and \(\mathbf{V}\) is the \(qJ \times qJ\) matrix of covariances. Because this is a random effects model, there will also be a prior on \(\mathbf{V} \sim IW(v, \mathbf{V}_0)\).
The likelihood is
\[\mathbf{y}_{\mathbf{j}} \sim MVN \left( \mathbf{X}_{\mathbf{j}} \boldsymbol{\beta} + \mathbf{W}_{\mathbf{j}} \boldsymbol{\alpha}_{\mathbf{j}}, \sigma^2 \mathbf{I}_{n_j} \right)\]
where \(\boldsymbol{\alpha}_{\mathbf{j}}\) is the length-\(q\) vector of random effects for group \(j\) and \(\mathbf{I}_{n_j}\) is the \(n_j \times n_j\) identity matrix. The full model is
\[ \begin{aligned} [\boldsymbol{\beta}, \boldsymbol{\alpha}, \mathbf{V}, \sigma^2 | \mathbf{X}, \mathbf{y}] & \propto \prod_i^n N \left( y_{ij} | \mathbf{x}'_{ij} \boldsymbol{\beta} + \mathbf{w}'_{ij} \boldsymbol{\alpha}_{j}, \sigma^2 \right) \\ & \times MVN \left( \boldsymbol{\beta} | \mathbf{0}, \mathbf{V}_b \right) \\ & \times MVN \left( vec(\boldsymbol{\alpha}) | \mathbf{0}, \mathbf{V} \right) \\ & \times IW(\mathbf{V} | v, \mathbf{V}_0) \\ & \times IG(\sigma^2 | s_1, s_2 ) \end{aligned} \] Note that both \(\boldsymbol{\beta}\) and \(\boldsymbol{\alpha}\) have prior distributions. However, the variance on \(\boldsymbol{\alpha}\) also has a prior distribution, the inverse Wishart for the covariance between random effects. This distribution estimates the covariance among the random effects in their population distribution. In the example above, I made the variances a priori independent, \(\mathbf{V}^{-1}_{ii} = \tau_i \sim gamma(0.001, 0.001)\).