Let’s fit our hierarchical model for counts of chocolate chips. The
data can be found in cookies.dat.
dat = read.table(file="data_files/cookies.dat", header=TRUE)
head(dat)
## chips location
## 1 12 1
## 2 12 1
## 3 6 1
## 4 13 1
## 5 12 1
## 6 12 1
table(dat$location)
##
## 1 2 3 4 5
## 30 30 30 30 30
hist(dat$chips)
boxplot(chips~location, data=dat)
Before implementing the model, we need to select prior distributions for \(\alpha\) and \(\beta\), the hyperparameters governing the gamma distribution for the \(\lambda\) parameters. First, think about what the \(\lambda\)’s represent. For location \(j\), \(\lambda_j\) is the expected number of chocolate chips per cookie. Hence, \(\alpha\) and \(\beta\) control the distribution of these means between locations. The mean of this gamma distribution represents the overall mean number of chips across all cookies. The variance of this gamma distribution controls the variability between locations: if it is high, the mean number of chips will vary widely from location to location; if it is small, the mean number of chips will be nearly the same from location to location.
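For example (purely illustrative values, not part of the analysis), two hypothetical choices of \(\alpha\) and \(\beta\) can imply the same overall mean but very different between-location spreads:
## illustrative only: two hypothetical (alpha, beta) pairs with the same implied mean
alpha_ex = c(18.0, 1.8); beta_ex = c(2.0, 0.2)
alpha_ex / beta_ex          # implied mean number of chips: 9 for both pairs
sqrt(alpha_ex / beta_ex^2)  # implied sd between locations: about 2.1 versus about 6.7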
To see the effects of different priors on the distribution of \(\lambda\)’s, we can simulate. Suppose we try independent exponential priors for \(\alpha\) and \(\beta\).
set.seed(112)
n_sim = 500
alpha_pri = rexp(n_sim, rate=1.0/2.0)
beta_pri = rexp(n_sim, rate=5.0)
mu_pri = alpha_pri/beta_pri
sig_pri = sqrt(alpha_pri/beta_pri^2)
summary(mu_pri)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.021 2.983 9.852 61.127 29.980 4858.786
summary(sig_pri)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1834 3.3663 8.5488 41.8137 22.2219 2865.6461
After simulating from the priors for \(\alpha\) and \(\beta\), we can use those samples to simulate further down the hierarchy:
lambda_pri = rgamma(n=n_sim, shape=alpha_pri, rate=beta_pri)
summary(lambda_pri)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.171 7.667 83.062 28.621 11005.331
Or for a prior predictive reconstruction of the original data set:
(lambda_pri = rgamma(n=5, shape=alpha_pri[1:5], rate=beta_pri[1:5]))
## [1] 66.444084 9.946688 6.028319 15.922568 47.978587
(y_pri = rpois(n=150, lambda=rep(lambda_pri, each=30)))
## [1] 63 58 64 63 70 62 61 48 71 73 70 77 66 60 72 77 69 62 66 71 49 80 66 75 74
## [26] 55 62 90 65 57 12 9 7 10 12 10 11 7 14 13 9 6 6 13 7 10 12 9 9 10
## [51] 7 8 6 9 7 10 13 13 8 12 6 10 3 6 7 4 6 7 5 5 4 3 6 2 8
## [76] 4 8 4 5 7 1 4 5 3 8 8 3 1 7 3 16 14 13 17 17 12 13 13 16 16
## [101] 15 14 11 10 13 17 16 19 16 17 15 16 7 17 21 16 12 15 14 13 52 44 51 46 39
## [126] 40 40 44 46 59 45 49 58 42 31 52 43 47 53 41 48 57 35 60 51 58 36 34 41 59
Because these priors have high variance and are somewhat noninformative, they produce unrealistic predictive distributions. Still, enough data would overwhelm the prior, resulting in useful posterior distributions. Alternatively, we could tweak and simulate from these prior distributions until they adequately represent our prior beliefs. Yet another approach would be to re-parameterize the gamma prior, which we’ll demonstrate as we fit the model.
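As a preview of that re-parameterization (a sketch only; the priors on \(\mu\) and \(\sigma\) here are the same ones used in the JAGS model below, and the `_pri2` names are just illustrative), we could place priors directly on the mean \(\mu\) and standard deviation \(\sigma\) of the gamma distribution and simulate \(\lambda\)’s:
## sketch: prior simulation under the mean/sd re-parameterization
mu_pri2  = rgamma(n=n_sim, shape=2.0, rate=1.0/5.0)  # prior on the overall mean number of chips
sig_pri2 = rexp(n=n_sim, rate=1.0)                   # prior on the sd of location means
alpha_pri2 = mu_pri2^2 / sig_pri2^2
beta_pri2  = mu_pri2 / sig_pri2^2
lambda_pri2 = rgamma(n=n_sim, shape=alpha_pri2, rate=beta_pri2)
summary(lambda_pri2)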
JAGS Model
library("rjags")
## Loading required package: coda
## Linked to JAGS 4.3.2
## Loaded modules: basemod,bugs
mod_string = " model {
for (i in 1:length(chips)) {
chips[i] ~ dpois(lambda[location[i]])
}
for (j in 1:max(location)) {
lambda[j] ~ dgamma(alpha, beta)
}
alpha = mu^2 / sig^2
beta = mu / sig^2
mu ~ dgamma(2.0, 1.0/5.0)
sig ~ dexp(1.0)
} "
set.seed(113)
data_jags = as.list(dat)
params = c("lambda", "mu", "sig")
mod = jags.model(textConnection(mod_string), data=data_jags, n.chains=3)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 150
## Unobserved stochastic nodes: 7
## Total graph size: 315
##
## Initializing model
update(mod, 1e3)
mod_sim = coda.samples(model=mod,
variable.names=params,
n.iter=5e3)
mod_csim = as.mcmc(do.call(rbind, mod_sim))
## convergence diagnostics
plot(mod_sim)
gelman.diag(mod_sim)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## lambda[1] 1 1.00
## lambda[2] 1 1.00
## lambda[3] 1 1.00
## lambda[4] 1 1.00
## lambda[5] 1 1.00
## mu 1 1.00
## sig 1 1.01
##
## Multivariate psrf
##
## 1
autocorr.diag(mod_sim)
## lambda[1] lambda[2] lambda[3] lambda[4] lambda[5]
## Lag 0 1.000000000 1.0000000000 1.00000000 1.0000000000 1.000000000
## Lag 1 0.019896123 0.1179486545 0.01189915 0.0339354698 0.081685454
## Lag 5 0.001802248 0.0069474989 0.01648663 0.0033383446 0.008828777
## Lag 10 0.007575615 0.0104390098 -0.01165884 0.0025121857 -0.007741562
## Lag 50 -0.004694051 -0.0003860656 -0.00792229 -0.0008624712 0.012342023
## mu sig
## Lag 0 1.000000000 1.000000000
## Lag 1 0.364632757 0.596693734
## Lag 5 0.031656214 0.118428191
## Lag 10 0.008698618 0.003281204
## Lag 50 0.002030983 -0.017211067
autocorr.plot(mod_sim)
effectiveSize(mod_sim)
## lambda[1] lambda[2] lambda[3] lambda[4] lambda[5] mu sig
## 14323.574 10907.144 14303.290 13466.308 12145.880 6039.979 3400.618
## compute DIC
dic = dic.samples(mod, n.iter=1e3)
After assessing convergence, we can check the fit via residuals. With a hierarchical model, there are now two levels of residuals: the observation level and the location-mean level. To simplify, we’ll look at the residuals associated with the posterior means of the parameters.
First, we have observation residuals, based on the estimates of location means.
## observation level residuals
(pm_params = colMeans(mod_csim))
## lambda[1] lambda[2] lambda[3] lambda[4] lambda[5] mu sig
## 9.275456 6.225599 9.522458 8.941679 11.754803 9.087465 2.080304
yhat = rep(pm_params[1:5], each=30)
resid = dat$chips - yhat
plot(resid)
plot(jitter(yhat), resid)
var(resid[yhat<7])
## [1] 6.447126
var(resid[yhat>11])
## [1] 13.72414
Also, we can look at how the location means differ from the overall mean \(\mu\).
## location level residuals
lambda_resid = pm_params[1:5] - pm_params["mu"]
plot(lambda_resid)
abline(h=0, lty=2)
We don’t see any obvious violations of our model assumptions.
summary(mod_sim)
##
## Iterations = 2001:7000
## Thinning interval = 1
## Number of chains = 3
## Sample size per chain = 5000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## lambda[1] 9.275 0.5359 0.004376 0.004481
## lambda[2] 6.226 0.4630 0.003780 0.004435
## lambda[3] 9.522 0.5413 0.004420 0.004537
## lambda[4] 8.942 0.5256 0.004291 0.004533
## lambda[5] 11.755 0.6228 0.005085 0.005663
## mu 9.087 0.9914 0.008094 0.012848
## sig 2.080 0.7175 0.005859 0.012306
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## lambda[1] 8.261 8.916 9.262 9.623 10.362
## lambda[2] 5.361 5.904 6.211 6.534 7.153
## lambda[3] 8.500 9.149 9.516 9.881 10.611
## lambda[4] 7.933 8.588 8.929 9.288 9.983
## lambda[5] 10.549 11.328 11.751 12.165 12.994
## mu 7.171 8.471 9.067 9.669 11.169
## sig 1.085 1.580 1.948 2.427 3.864
Just as we did with the prior distribution, we can use these posterior samples to obtain Monte Carlo estimates of quantities of interest from the posterior predictive distribution.
For example, we can use draws from the posterior distribution of \(\mu\) and \(\sigma\) to simulate the posterior predictive distribution of the mean for a new location.
(n_sim = nrow(mod_csim))
## [1] 15000
lambda_pred = rgamma(n=n_sim, shape=mod_csim[,"mu"]^2/mod_csim[,"sig"]^2,
rate=mod_csim[,"mu"]/mod_csim[,"sig"]^2)
hist(lambda_pred)
mean(lambda_pred > 15)
## [1] 0.01806667
Using these \(\lambda\) draws, we can go to the observation level and simulate the number of chips per cookie, which takes into account the uncertainty in \(\lambda\):
y_pred = rpois(n=n_sim, lambda=lambda_pred)
hist(y_pred)
mean(y_pred > 15)
## [1] 0.0606
hist(dat$chips)
Finally, we could answer questions like: what is the posterior probability that the next cookie produced in Location 1 will have fewer than seven chips?
y_pred1 = rpois(n=n_sim, lambda=mod_csim[,"lambda[1]"])
hist(y_pred1)
mean(y_pred1<7)
## [1] 0.1924
We can extend the linear model for the Leinhardt data on infant mortality by incorporating the region variable. We’ll do this with a hierarchical model, where each region has its own intercept.
library("car")
## Loading required package: carData
data("Leinhardt")
?Leinhardt
str(Leinhardt)
## 'data.frame': 105 obs. of 4 variables:
## $ income: int 3426 3350 3346 4751 5029 3312 3403 5040 2009 2298 ...
## $ infant: num 26.7 23.7 17 16.8 13.5 10.1 12.9 20.4 17.8 25.7 ...
## $ region: Factor w/ 4 levels "Africa","Americas",..: 3 4 4 2 4 4 4 4 4 4 ...
## $ oil : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
pairs(Leinhardt)
head(Leinhardt)
## income infant region oil
## Australia 3426 26.7 Asia no
## Austria 3350 23.7 Europe no
## Belgium 3346 17.0 Europe no
## Canada 4751 16.8 Americas no
## Denmark 5029 13.5 Europe no
## Finland 3312 10.1 Europe no
Previously, we worked with infant mortality and income on the logarithmic scale. Recall also that we had to remove some missing data.
dat = na.omit(Leinhardt)
dat$logincome = log(dat$income)
dat$loginfant = log(dat$infant)
str(dat)
## 'data.frame': 101 obs. of 6 variables:
## $ income : int 3426 3350 3346 4751 5029 3312 3403 5040 2009 2298 ...
## $ infant : num 26.7 23.7 17 16.8 13.5 10.1 12.9 20.4 17.8 25.7 ...
## $ region : Factor w/ 4 levels "Africa","Americas",..: 3 4 4 2 4 4 4 4 4 4 ...
## $ oil : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ logincome: num 8.14 8.12 8.12 8.47 8.52 ...
## $ loginfant: num 3.28 3.17 2.83 2.82 2.6 ...
## - attr(*, "na.action")= 'omit' Named int [1:4] 24 83 86 91
## ..- attr(*, "names")= chr [1:4] "Iran" "Haiti" "Laos" "Nepal"
Now we can fit the proposed model:
library("rjags")
mod_string = " model {
for (i in 1:length(y)) {
y[i] ~ dnorm(mu[i], prec)
mu[i] = a[region[i]] + b[1]*log_income[i] + b[2]*is_oil[i]
}
for (j in 1:max(region)) {
a[j] ~ dnorm(a0, prec_a)
}
a0 ~ dnorm(0.0, 1.0/1.0e6)
prec_a ~ dgamma(1/2.0, 1*10.0/2.0)
tau = sqrt( 1.0 / prec_a )
for (j in 1:2) {
b[j] ~ dnorm(0.0, 1.0/1.0e6)
}
prec ~ dgamma(5/2.0, 5*10.0/2.0)
sig = sqrt( 1.0 / prec )
} "
set.seed(116)
data_jags = list(y=dat$loginfant, log_income=dat$logincome,
is_oil=as.numeric(dat$oil=="yes"), region=as.numeric(dat$region))
data_jags$is_oil
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
## [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
table(data_jags$is_oil, data_jags$region)
##
## 1 2 3 4
## 0 31 20 24 18
## 1 3 2 3 0
params = c("a0", "a", "b", "sig", "tau")
mod = jags.model(textConnection(mod_string), data=data_jags, n.chains=3)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 101
## Unobserved stochastic nodes: 9
## Total graph size: 622
##
## Initializing model
update(mod, 1e3) # burn-in
mod_sim = coda.samples(model=mod,
variable.names=params,
n.iter=5e3)
mod_csim = as.mcmc(do.call(rbind, mod_sim)) # combine multiple chains
## convergence diagnostics
plot(mod_sim)
gelman.diag(mod_sim)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## a[1] 1.02 1.06
## a[2] 1.02 1.06
## a[3] 1.02 1.06
## a[4] 1.02 1.07
## a0 1.00 1.01
## b[1] 1.02 1.07
## b[2] 1.00 1.01
## sig 1.00 1.00
## tau 1.00 1.00
##
## Multivariate psrf
##
## 1.02
autocorr.diag(mod_sim)
## a[1] a[2] a[3] a[4] a0 b[1] b[2]
## Lag 0 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.00000000
## Lag 1 0.9246845 0.9249464 0.9229657 0.9390910 0.2469307 0.9809473 0.13619648
## Lag 5 0.8541026 0.8530348 0.8513099 0.8691425 0.2337974 0.9057266 0.03466881
## Lag 10 0.7733505 0.7703498 0.7695464 0.7855497 0.2208945 0.8188252 0.05502477
## Lag 50 0.3738111 0.3798072 0.3747685 0.3892785 0.1054640 0.4024636 0.03415218
## sig tau
## Lag 0 1.000000000 1.000000000
## Lag 1 0.076573143 0.281251508
## Lag 5 0.018898807 -0.005351989
## Lag 10 0.008891877 0.002945107
## Lag 50 -0.005001137 -0.007247051
autocorr.plot(mod_sim)
effectiveSize(mod_sim)
## a[1] a[2] a[3] a[4] a0 b[1] b[2]
## 163.0862 174.2339 172.1697 155.9613 738.6444 145.6586 5133.1579
## sig tau
## 12112.3156 8420.2944
Convergence looks okay, so let’s compare this with the old model from Lesson 7 using DIC:
dic.samples(mod, n.iter=1e3)
## Mean deviance: 214.5
## penalty 7.233
## Penalized deviance: 221.8
It appears that this model is an improvement over the non-hierarchical one we fit earlier. Notice that the penalty term, which can be interpreted as the “effective” number of parameters, is less than the actual number of parameters (nine). There are fewer “effective” parameters because they “share” information, or “borrow strength,” from each other in the hierarchical structure. If we had skipped the hierarchy and fit a single intercept, there would have been four parameters. If we had fit separate, independent intercepts for each region, there would have been seven parameters (which is close to what we ended up with).
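For reference, a minimal sketch of how the non-hierarchical, single-intercept counterpart could be re-fit on the same data for a side-by-side DIC comparison (the model string name and prior choices here are illustrative, mirroring the ones above):
## sketch: single-intercept (non-hierarchical) counterpart for DIC comparison
mod_flat_string = " model {
for (i in 1:length(y)) {
y[i] ~ dnorm(mu[i], prec)
mu[i] = b[1] + b[2]*log_income[i] + b[3]*is_oil[i]
}
for (j in 1:3) {
b[j] ~ dnorm(0.0, 1.0/1.0e6)
}
prec ~ dgamma(5/2.0, 5*10.0/2.0)
sig = sqrt(1.0/prec)
} "
mod_flat = jags.model(textConnection(mod_flat_string),
data=data_jags[c("y","log_income","is_oil")], n.chains=3)
update(mod_flat, 1e3)
dic.samples(mod_flat, n.iter=1e3)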
Finally, let’s look at the posterior summary.
summary(mod_sim)
##
## Iterations = 1001:6000
## Thinning interval = 1
## Number of chains = 3
## Sample size per chain = 5000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## a[1] 6.5917 0.57773 0.0047171 0.0463922
## a[2] 6.0610 0.72694 0.0059355 0.0566796
## a[3] 5.8922 0.64292 0.0052494 0.0501706
## a[4] 5.5977 0.88600 0.0072341 0.0728915
## a0 6.0334 1.31723 0.0107551 0.0502761
## b[1] -0.3489 0.10969 0.0008956 0.0092578
## b[2] 0.6457 0.35291 0.0028815 0.0052327
## sig 0.9200 0.06605 0.0005393 0.0006028
## tau 2.0233 1.00736 0.0082251 0.0109946
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## a[1] 5.43215 6.2074 6.6029 6.9717 7.7516
## a[2] 4.60867 5.5773 6.0735 6.5319 7.5056
## a[3] 4.60577 5.4704 5.9022 6.3028 7.1790
## a[4] 3.82932 5.0120 5.6127 6.1687 7.3933
## a0 3.44507 5.2426 6.0340 6.8165 8.5807
## b[1] -0.56875 -0.4194 -0.3512 -0.2770 -0.1296
## b[2] -0.04826 0.4070 0.6446 0.8845 1.3428
## sig 0.80117 0.8736 0.9162 0.9630 1.0585
## tau 0.98288 1.4006 1.7703 2.3349 4.5507
In this particular model, the intercepts do not have a meaningful interpretation on their own, because they correspond to the expected response for a country that does not produce oil and has a log-income of 0 (that is, an income of $1 per capita). We can interpret \(a_0\) as the overall mean intercept and \(\tau\) as the standard deviation of intercepts across regions.
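Even so, because the income and oil coefficients are shared across regions, differences among the \(a_j\) compare regions at the same income and oil status. For example, a sketch of a Monte Carlo estimate of the posterior probability that the Africa intercept exceeds the Europe intercept (regions are indexed in factor-level order: Africa, Americas, Asia, Europe):
## P(a_Africa > a_Europe | data), estimated from the combined chains
mean(mod_csim[,"a[1]"] > mod_csim[,"a[4]"])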
We have not investigated adding interaction terms, which might be appropriate. We only considered adding hierarchy on the intercepts, but in reality nothing prevents us from doing the same for other terms in the model, such as the coefficients for income and oil. We could try any or all of these alternatives and see how the DIC changes for those models. This, together with the other model-checking techniques we have discussed, could be used to identify the best model to use for inferences and predictions.
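For instance, a sketch of one such extension, giving each region its own income coefficient with its own hierarchical prior (the model string name and prior choices are illustrative, mirroring the model above):
mod2_string = " model {
for (i in 1:length(y)) {
y[i] ~ dnorm(mu[i], prec)
mu[i] = a[region[i]] + b1[region[i]]*log_income[i] + b2*is_oil[i]
}
for (j in 1:max(region)) {
a[j] ~ dnorm(a0, prec_a)
b1[j] ~ dnorm(b0, prec_b) ## hierarchy on the income coefficient as well
}
a0 ~ dnorm(0.0, 1.0/1.0e6)
b0 ~ dnorm(0.0, 1.0/1.0e6)
prec_a ~ dgamma(1/2.0, 1*10.0/2.0)
prec_b ~ dgamma(1/2.0, 1*10.0/2.0)
b2 ~ dnorm(0.0, 1.0/1.0e6)
prec ~ dgamma(5/2.0, 5*10.0/2.0)
sig = sqrt(1.0/prec)
} "
## fit with jags.model()/coda.samples() as before and compare models with dic.samples()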
In previous lessons, we fit models to data representing percent growth in personnel for companies in two industries. Below are additional data from the two original industries (with 10 and 6 companies, respectively), as well as 3 additional industries. Percent growth is reported for a total of 53 companies.
dat = read.table(file="data_files/pctgrowth.csv", sep=",", header=TRUE)
head(dat)
## y grp
## 1 1.2 1
## 2 1.4 1
## 3 -0.5 1
## 4 0.3 1
## 5 0.9 1
## 6 2.3 1
table(dat$grp)
##
## 1 2 3 4 5
## 10 6 8 15 14
hist(dat$y)
boxplot(y~grp, data=dat)
Rather than fit 5 separate models, one for each industry, we can fit a hierarchical model. As before, we assume a normal likelihood and common variance across all observations. Each industry will have its own mean growth, and each of these means will come from a common distribution, from which we will estimate the overall mean and variability across industries.
Let \(i\) index the individual
companies, and \(g_i\) indicate the
industry (grp variable in pctgrowth) for
company \(i\). How do we describe the
hierarchical model?
\(y_i \mid \theta_{g_i}, \sigma^2 \stackrel{ind}{\sim} N(\theta_{g_i},\sigma^2), \quad i=1,...,53, \quad g_i \in \{1,...,5\}\)
\(\theta_g \mid \mu, \tau^2 \stackrel{iid}{\sim} N(\mu,\tau^2), \quad g=1,...,5\)
\(\mu \sim N(0, 10^6)\)
\(\tau^2 \sim IG\left(\tfrac{1}{2}, \tfrac{1 \cdot 3}{2}\right)\)
\(\sigma^2 \sim IG\left(\tfrac{2}{2}, \tfrac{2 \cdot 1}{2}\right)\)
Fit the hierarchical model in JAGS and obtain posterior
mean estimates for each industry’s mean growth (posterior mean for each
\(\theta_g\)).
mod_quiz_string = " model {
for (i in 1:length(y)) {
y[i] ~ dnorm(theta[grp[i]], prec)
}
for (j in 1:max(grp)) {
theta[j] ~ dnorm(mu, tau_sq) ## second parameter of dnorm is a precision in JAGS
}
mu ~ dnorm(0, 1/1e6)
tau_sq ~ dgamma(1.0/2.0, 1.0*3.0/2.0) ## precision of the thetas; tau^2 = 1/tau_sq ~ IG(1/2, 3/2)
prec ~ dgamma(2.0/2.0, 2*1/2) ## observation precision; sigma^2 = 1/prec ~ IG(2/2, 2*1/2)
sig = sqrt(1/prec)
} "
set.seed(113)
data_jags = as.list(dat)
params = c("theta", "mu", "sig")
mod_quiz = jags.model(textConnection(mod_quiz_string), data=data_jags, n.chains=3)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 53
## Unobserved stochastic nodes: 8
## Total graph size: 127
##
## Initializing model
update(mod_quiz, 1e3)
mod_quiz_sim = coda.samples(model=mod_quiz,
variable.names=params,
n.iter=5e3)
mod_quiz_csim = as.mcmc(do.call(rbind, mod_quiz_sim))
## convergence diagnostics
plot(mod_quiz_sim)
gelman.diag(mod_quiz_sim)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## mu 1 1
## sig 1 1
## theta[1] 1 1
## theta[2] 1 1
## theta[3] 1 1
## theta[4] 1 1
## theta[5] 1 1
##
## Multivariate psrf
##
## 1
autocorr.diag(mod_quiz_sim)
## mu sig theta[1] theta[2] theta[3]
## Lag 0 1.000000000 1.0000000000 1.000000000 1.000000000 1.000000000
## Lag 1 0.048099635 0.0842571704 0.045774105 0.059142931 0.041709931
## Lag 5 -0.006067291 0.0082003116 -0.016796849 -0.009077723 0.009376290
## Lag 10 0.010853499 0.0001289421 -0.007960229 0.012251592 -0.006946827
## Lag 50 -0.003163093 -0.0083421932 -0.013204207 -0.013031574 -0.004917743
## theta[4] theta[5]
## Lag 0 1.000000000 1.0000000000
## Lag 1 0.011760574 0.0070332512
## Lag 5 -0.003721279 0.0046122216
## Lag 10 -0.002468041 0.0085976124
## Lag 50 0.001491710 0.0007347058
autocorr.plot(mod_quiz_sim)
effectiveSize(mod_quiz_sim)
## mu sig theta[1] theta[2] theta[3] theta[4] theta[5]
## 13757.12 13440.76 13685.50 13327.47 13707.73 15000.00 14787.02
## compute DIC
dic = dic.samples(mod_quiz, n.iter=1e3)
pm_params = apply(mod_quiz_csim, 2, mean)
means_theta = pm_params[-c(1,2)]
We are interested in comparing these estimates to those obtained from
the model that assumes no hierarchy (the ANOVA cell means model). We can
approximate the posterior estimates for the 5 industry means under a
noninformative prior by simply calculating the sample mean growth for
the five industries. We can do this in R with:
means_anova = tapply(dat$y, INDEX=dat$grp, FUN=mean)
## dat is the data read from pctgrowth.csv
plot(means_anova)
points(means_theta, col="red") ## where means_theta are the posterior point estimates for the industry means.
The estimates from the hierarchical model have less variability than those of the ANOVA model; they are shrunk toward the overall mean.
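One way (a sketch) to see the shrinkage numerically is to compare the two sets of estimates and their spreads side by side:
## sketch: hierarchical estimates are pulled toward the overall mean
cbind(means_anova, means_theta)
c(sd(means_anova), sd(means_theta))  # the hierarchical estimates have the smaller spread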
In our hierarchical model for personnel growth, we assumed that the variability between companies within an industry was constant across industries (\(\sigma^2\) was the same for all industries). How might we decide whether it would be better to give each industry its own variance parameter?
Answer: Calculate the posterior probability that \(\sigma^2/\tau^2>1\) in the original model. If this probability exceeds a pre-determined amount, use a model with separate variance parameters.
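A sketch of how that posterior probability could be estimated from the MCMC output, assuming we also monitor tau_sq (which in the JAGS code above is the precision \(1/\tau^2\) of the \(\theta\)’s), so that \(\sigma^2/\tau^2\) equals sig^2 * tau_sq; the object names here are illustrative:
## sketch: monitor the between-industry precision as well, then compute the probability
params2 = c("theta", "mu", "sig", "tau_sq")
mod_quiz_sim2 = coda.samples(model=mod_quiz, variable.names=params2, n.iter=5e3)
mod_quiz_csim2 = as.mcmc(do.call(rbind, mod_quiz_sim2))
mean(mod_quiz_csim2[,"sig"]^2 * mod_quiz_csim2[,"tau_sq"] > 1.0)  # Monte Carlo estimate of P(sigma^2/tau^2 > 1 | data)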
Consider once again the OME data in the MASS package in
R, which we explored earlier. The data consist of
experimental results from tests of auditory perception in children.
Under varying conditions and for multiple trials under each condition,
children either correctly or incorrectly identified the source of the
changing signals.
Recall that the model looked like this:
\(y_i \space | \space \phi_i \stackrel{ind}{\sim} Binomial(n_i,\phi_i), \space \space i=1,...,712\)
\(logit(\phi_i)=\beta_0+\beta_1 Age_i+\beta_2 I_{(OME_i=low)}+\beta_3 Loud_i+\beta_4 I_{(Noise_i=incoherent)}\)
\(\beta_0 \sim N(0,5^2)\)
\(\beta_k \stackrel{iid}{\sim} N(0,4^2), \quad k=1,...,4.\)
As with other models, we will extend the intercept (and rename it) so that the linear part of the model looks like this:
\(logit(\phi_i)=\alpha_{(ID_i)}+\beta_1 Age_i+\beta_2 I_{(OME_i=low)}+\beta_3 Loud_i+\beta_4 I_{(Noise_i=incoherent)}\)
where \(ID_i\) is an index identifying the child for observation \(i\).
The hierarchical prior for the intercepts would look like this:
\(\alpha_j \stackrel{iid}{\sim} N(\mu,\tau^2),\space \space j=1,...,63\) (There are 63 children).
followed by the priors for \(\mu\) and \(\tau^2\),
\(\mu \sim N(0,10^2)\)
\(\tau^2 \sim IG(\frac{1}{2},\frac{1}{2})\)
\(\tau^2\) represents the variability of the child-specific intercepts \(\alpha_j\) across children, that is, how much the baseline probability of a correct response (on the logit scale) varies from child to child.
We fit the hierarchical model outlined above using
JAGS:
library("MASS")
data("OME")
dat = subset(OME, OME != "N/A")
dat$OME = factor(dat$OME) # relabel OME
dat$ID = as.numeric(factor(dat$ID)) # relabel ID so there are no gaps in numbers (they now go from 1 to 63)
## Original reference model and covariate matrix
mod_glm = glm(Correct/Trials ~ Age + OME + Loud + Noise, data=dat, weights=Trials, family="binomial")
X = model.matrix(mod_glm)[,-1]
## Extended model with hierarchical child-specific intercepts
mod2_quiz_string = " model {
for (i in 1:length(y)) {
y[i] ~ dbin(phi[i], n[i])
logit(phi[i]) = alpha[ID[i]] + b[1]*Age[i] + b[2]*OMElow[i] + b[3]*Loud[i] + b[4]*Noiseincoherent[i]
}
for (k in 1:max(ID)) {
alpha[k] ~ dnorm(mu, prec_alpha)
}
for (j in 1:4) {
b[j] ~ dnorm(0.0, 1.0/4.0^2)
}
mu ~ dnorm(0.0, 1.0/10.0^2)
prec_alpha ~ dgamma(1.0/2.0, 1.0/2.0) ## precision of the alphas; tau^2 = 1/prec_alpha ~ IG(1/2, 1/2)
tau = 1/prec_alpha ## note this is the variance tau^2, not the standard deviation
} "
data_jags = as.list(as.data.frame(X))
data_jags$y = dat$Correct
data_jags$n = dat$Trials
data_jags$ID = dat$ID
How do the convergence diagnostics look?
params = c("alpha", "b", "mu", "tau")
mod2_quiz = jags.model(textConnection(mod2_quiz_string), data=data_jags, n.chains=3)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 712
## Unobserved stochastic nodes: 69
## Total graph size: 6499
##
## Initializing model
update(mod2_quiz, 1e3)
mod2_quiz_sim = coda.samples(model=mod2_quiz,
variable.names=params,
n.iter=5e3)
mod2_quiz_csim = as.mcmc(do.call(rbind, mod2_quiz_sim))
## convergence diagnostics
plot(mod2_quiz_sim)
gelman.diag(mod2_quiz_sim)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## alpha[1] 1.12 1.34
## alpha[2] 1.12 1.35
## alpha[3] 1.11 1.33
## alpha[4] 1.12 1.35
## alpha[5] 1.12 1.35
## alpha[6] 1.10 1.30
## alpha[7] 1.12 1.35
## alpha[8] 1.12 1.33
## alpha[9] 1.11 1.31
## alpha[10] 1.12 1.36
## alpha[11] 1.12 1.33
## alpha[12] 1.10 1.29
## alpha[13] 1.11 1.30
## alpha[14] 1.11 1.32
## alpha[15] 1.12 1.33
## alpha[16] 1.11 1.31
## alpha[17] 1.12 1.34
## alpha[18] 1.12 1.34
## alpha[19] 1.11 1.32
## alpha[20] 1.13 1.36
## alpha[21] 1.11 1.33
## alpha[22] 1.13 1.36
## alpha[23] 1.11 1.33
## alpha[24] 1.13 1.37
## alpha[25] 1.13 1.36
## alpha[26] 1.12 1.35
## alpha[27] 1.11 1.32
## alpha[28] 1.12 1.33
## alpha[29] 1.11 1.31
## alpha[30] 1.12 1.34
## alpha[31] 1.11 1.31
## alpha[32] 1.11 1.31
## alpha[33] 1.11 1.33
## alpha[34] 1.11 1.31
## alpha[35] 1.11 1.31
## alpha[36] 1.11 1.31
## alpha[37] 1.11 1.31
## alpha[38] 1.11 1.33
## alpha[39] 1.11 1.32
## alpha[40] 1.11 1.31
## alpha[41] 1.11 1.32
## alpha[42] 1.12 1.35
## alpha[43] 1.11 1.33
## alpha[44] 1.11 1.31
## alpha[45] 1.12 1.34
## alpha[46] 1.11 1.31
## alpha[47] 1.11 1.31
## alpha[48] 1.11 1.32
## alpha[49] 1.11 1.33
## alpha[50] 1.12 1.34
## alpha[51] 1.12 1.33
## alpha[52] 1.11 1.32
## alpha[53] 1.11 1.33
## alpha[54] 1.11 1.31
## alpha[55] 1.11 1.32
## alpha[56] 1.12 1.34
## alpha[57] 1.11 1.32
## alpha[58] 1.12 1.34
## alpha[59] 1.11 1.32
## alpha[60] 1.12 1.34
## alpha[61] 1.12 1.35
## alpha[62] 1.11 1.33
## alpha[63] 1.12 1.35
## b[1] 1.02 1.04
## b[2] 1.00 1.00
## b[3] 1.15 1.43
## b[4] 1.01 1.03
## mu 1.15 1.44
## tau 1.00 1.00
##
## Multivariate psrf
##
## 1.08
autocorr.diag(mod2_quiz_sim)
## alpha[1] alpha[2] alpha[3] alpha[4] alpha[5] alpha[6] alpha[7]
## Lag 0 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## Lag 1 0.8805935 0.8863123 0.8758478 0.9174165 0.9009758 0.8250163 0.8954051
## Lag 5 0.8321182 0.8378684 0.8132164 0.8790662 0.8529819 0.7686983 0.8508082
## Lag 10 0.8133812 0.8213814 0.7992093 0.8634349 0.8400800 0.7584615 0.8326739
## Lag 50 0.7215953 0.7150055 0.7036058 0.7542144 0.7287293 0.6640281 0.7320426
## alpha[8] alpha[9] alpha[10] alpha[11] alpha[12] alpha[13] alpha[14]
## Lag 0 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## Lag 1 0.8773307 0.8441179 0.8736779 0.8947965 0.8276362 0.8319647 0.8521179
## Lag 5 0.8255534 0.7798186 0.8230767 0.8491825 0.7622669 0.7665667 0.7946590
## Lag 10 0.8096994 0.7605186 0.8037414 0.8329099 0.7436259 0.7537604 0.7805251
## Lag 50 0.7100236 0.6726599 0.7074177 0.7244428 0.6584196 0.6641613 0.6889869
## alpha[15] alpha[16] alpha[17] alpha[18] alpha[19] alpha[20] alpha[21]
## Lag 0 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## Lag 1 0.8777698 0.8373863 0.8422627 0.8853559 0.8345200 0.8988296 0.8539885
## Lag 5 0.8291231 0.7787474 0.7857128 0.8271508 0.7702236 0.8542913 0.7980614
## Lag 10 0.8147044 0.7658951 0.7730859 0.8126260 0.7565144 0.8440269 0.7843892
## Lag 50 0.7101232 0.6715441 0.6853994 0.7160263 0.6653927 0.7458603 0.6903959
## alpha[22] alpha[23] alpha[24] alpha[25] alpha[26] alpha[27] alpha[28]
## Lag 0 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## Lag 1 0.9088185 0.8298057 0.9074378 0.8652318 0.8392386 0.8415115 0.8610941
## Lag 5 0.8570620 0.7647724 0.8655714 0.8137651 0.7836328 0.7781047 0.8062735
## Lag 10 0.8354329 0.7484220 0.8506405 0.7956588 0.7668578 0.7625849 0.7938747
## Lag 50 0.7334242 0.6633423 0.7413164 0.7021577 0.6824752 0.6725712 0.6991639
## alpha[29] alpha[30] alpha[31] alpha[32] alpha[33] alpha[34] alpha[35]
## Lag 0 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## Lag 1 0.8352577 0.8475182 0.8378022 0.8416694 0.8501929 0.8418685 0.8377269
## Lag 5 0.7652757 0.7911818 0.7723838 0.7667344 0.7888382 0.7841291 0.7775364
## Lag 10 0.7525940 0.7770906 0.7555281 0.7494930 0.7738743 0.7712052 0.7645335
## Lag 50 0.6674600 0.6895933 0.6646291 0.6615691 0.6798215 0.6814761 0.6696707
## alpha[36] alpha[37] alpha[38] alpha[39] alpha[40] alpha[41] alpha[42]
## Lag 0 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## Lag 1 0.8265941 0.8467042 0.8390624 0.8371079 0.8140012 0.8159711 0.8445468
## Lag 5 0.7603765 0.7866168 0.7802228 0.7771267 0.7422737 0.7446838 0.7916918
## Lag 10 0.7527312 0.7758702 0.7688240 0.7612942 0.7331010 0.7264762 0.7813124
## Lag 50 0.6637745 0.6830923 0.6802446 0.6751497 0.6466207 0.6445759 0.6920931
## alpha[43] alpha[44] alpha[45] alpha[46] alpha[47] alpha[48] alpha[49]
## Lag 0 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## Lag 1 0.8384532 0.8395410 0.8389451 0.8384219 0.8336936 0.8338291 0.8369821
## Lag 5 0.7805831 0.7769642 0.7735889 0.7688315 0.7743716 0.7701505 0.7776108
## Lag 10 0.7691670 0.7590066 0.7582882 0.7529992 0.7627380 0.7550386 0.7623728
## Lag 50 0.6777696 0.6665319 0.6670678 0.6619549 0.6674187 0.6663895 0.6688652
## alpha[50] alpha[51] alpha[52] alpha[53] alpha[54] alpha[55] alpha[56]
## Lag 0 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## Lag 1 0.8331925 0.8474647 0.8242888 0.8419744 0.8334118 0.8109588 0.8415047
## Lag 5 0.7756962 0.7851986 0.7610924 0.7742288 0.7661141 0.7443987 0.7830543
## Lag 10 0.7609723 0.7735420 0.7484117 0.7630020 0.7486881 0.7312896 0.7682234
## Lag 50 0.6720896 0.6829579 0.6639048 0.6684293 0.6591061 0.6505607 0.6816542
## alpha[57] alpha[58] alpha[59] alpha[60] alpha[61] alpha[62] alpha[63]
## Lag 0 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## Lag 1 0.8671884 0.8773033 0.8296906 0.8579489 0.8438144 0.8271783 0.8610227
## Lag 5 0.8148115 0.8312434 0.7687921 0.7988164 0.7862958 0.7678746 0.8050146
## Lag 10 0.7978084 0.8175617 0.7526325 0.7834565 0.7661347 0.7575568 0.7903791
## Lag 50 0.7056465 0.7178228 0.6653041 0.6982369 0.6769475 0.6688112 0.7012956
## b[1] b[2] b[3] b[4] mu tau
## Lag 0 1.0000000 1.00000000 1.0000000 1.00000000 1.0000000 1.00000000
## Lag 1 0.9521623 0.90300601 0.9889125 0.52182385 0.9933711 0.68500171
## Lag 5 0.7944743 0.61653335 0.9491846 0.12835601 0.9810928 0.28988657
## Lag 10 0.6455344 0.37919008 0.9051031 0.09806712 0.9662158 0.08792042
## Lag 50 0.2251879 0.03836241 0.7095529 0.11254399 0.8524014 0.04299585
autocorr.plot(mod2_quiz_sim)
effectiveSize(mod2_quiz_sim)
## alpha[1] alpha[2] alpha[3] alpha[4] alpha[5] alpha[6] alpha[7]
## 37.44775 36.91292 32.31596 28.62276 33.31762 41.67873 36.23842
## alpha[8] alpha[9] alpha[10] alpha[11] alpha[12] alpha[13] alpha[14]
## 35.28917 40.76902 33.14288 36.70273 46.14203 39.15319 36.98484
## alpha[15] alpha[16] alpha[17] alpha[18] alpha[19] alpha[20] alpha[21]
## 36.71379 37.74433 35.24041 38.23883 38.36116 32.37557 33.96510
## alpha[22] alpha[23] alpha[24] alpha[25] alpha[26] alpha[27] alpha[28]
## 40.80413 38.36735 37.45380 37.11940 35.65579 39.73454 33.91251
## alpha[29] alpha[30] alpha[31] alpha[32] alpha[33] alpha[34] alpha[35]
## 36.33040 38.19714 42.32620 39.45678 39.28585 38.52281 40.86740
## alpha[36] alpha[37] alpha[38] alpha[39] alpha[40] alpha[41] alpha[42]
## 36.03921 38.29534 39.54653 31.77420 39.25245 43.43920 37.23404
## alpha[43] alpha[44] alpha[45] alpha[46] alpha[47] alpha[48] alpha[49]
## 37.27081 35.96194 44.91382 42.77693 38.54170 35.53320 42.60201
## alpha[50] alpha[51] alpha[52] alpha[53] alpha[54] alpha[55] alpha[56]
## 34.63866 39.22699 36.00868 35.40188 41.43014 39.01054 34.41874
## alpha[57] alpha[58] alpha[59] alpha[60] alpha[61] alpha[62] alpha[63]
## 31.87392 35.74257 39.98569 36.58211 41.32005 31.01417 32.24241
## b[1] b[2] b[3] b[4] mu tau
## 332.15409 719.06244 76.46295 1098.16803 23.93485 1926.56211
## compute DIC
dic = dic.samples(mod2_quiz, n.iter=1e3)
dic
## Mean deviance: 1240
## penalty 28.76
## Penalized deviance: 1269
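Given the strong autocorrelation and the effective sample sizes in the dozens for the intercepts and \(\mu\), one reasonable next step (a sketch only; the object name is illustrative) would be to run the chains considerably longer before relying on posterior summaries or the DIC estimate:
## sketch: longer run to improve the effective sample sizes
mod2_quiz_sim_long = coda.samples(model=mod2_quiz, variable.names=params, n.iter=5e4)
effectiveSize(mod2_quiz_sim_long)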