You should start by reading Section 10 of the textbook.
All of Bayesian inference is based on the posterior distribution of parameters given sample data. Therefore, the main computational task in a Bayesian analysis is computing, or approximating, the posterior distribution. We have seen three ways to compute or approximate the posterior distribution: exact computation via math, grid approximation, and simulation.
In principle, the posterior distribution \(\pi(\theta|y)\) of parameters \(\theta\) given observed data \(y\) can be found by simulation: simulate a value of \(\theta\) from the prior distribution, simulate data \(Y\) from the likelihood given that value of \(\theta\), and keep the simulated \(\theta\) only if the simulated \(Y\) matches the observed data \(y\). The retained \(\theta\) values form a sample from the posterior distribution.
However, this is a very computationally inefficient way of approximating the posterior distribution. Unless the sample size is really small, the simulated sample statistic \(Y\) will match the observed value \(y\) in relatively few samples, simply because in large samples there are just many more possibilities. For example, in 1000 flips of a fair coin, the most likely value of the number of heads is 500, but the probability of exactly 500 heads in 1000 flips is only about 0.025. When there are many possibilities, the probability gets stretched fairly thin. Therefore, if we want, say, 10000 simulated values of \(\theta\) given \(y\), we would first have to simulate many, many more values.
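As a quick sanity check of that coin-flip figure (not something the lab asks for), the exact binomial probability can be computed in R:

# probability of exactly 500 heads in 1000 flips of a fair coin
dbinom(500, 1000, 0.5) # about 0.025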
The situation is even more extreme when the data are continuous, where the probability of exactly replicating the observed sample is essentially 0.
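To make the inefficiency concrete, here is a minimal sketch of this naive accept/reject simulation (not part of the original lab), using the Beta(2, 1) prior and the n = 36, y = 29 data from the eye-dominance example later in this lab. Only a small fraction of the simulated values is retained.

# naive simulation: keep prior draws only when the simulated data match the observed data
n_sim = 100000
theta_sim = rbeta(n_sim, 2, 1) # simulate theta from the prior
y_sim = rbinom(n_sim, size = 36, prob = theta_sim) # simulate data given each theta
theta_keep = theta_sim[y_sim == 29] # keep theta only if the simulated y matches the observed y
length(theta_keep) / n_sim # acceptance rate is small
hist(theta_keep) # approximate posterior based on the kept values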
Therefore, we need more efficient simulation algorithms for approximating posterior distributions. Markov chain Monte Carlo (MCMC) methods provide powerful and widely applicable algorithms for simulating from probability distributions, including complex and high-dimensional distributions. These algorithms include Metropolis-Hastings, Gibbs sampling, and Hamiltonian Monte Carlo, among others. We will see some of the ideas behind MCMC algorithms later. However, we will rely on the software package JAGS to carry out MCMC simulations.
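As a small preview of the idea (this sketch is not required for the lab), here is a basic Metropolis sampler, written directly in R, for the Beta(2, 1) prior and Binomial likelihood used later in this lab; JAGS automates and generalizes this kind of algorithm.

# Metropolis sampler sketch: theta has a Beta(2, 1) prior, data are y = 29 successes in n = 36 trials
set.seed(111)
n_steps = 10000
theta_chain = rep(NA, n_steps)
theta_chain[1] = 0.5 # arbitrary starting value
log_post = function(th) {
  if (th <= 0 || th >= 1) return(-Inf) # outside the parameter space
  dbeta(th, 2, 1, log = TRUE) + dbinom(29, 36, th, log = TRUE) # log prior + log likelihood
}
for (i in 2:n_steps) {
  proposal = rnorm(1, theta_chain[i - 1], 0.1) # propose a value near the current one
  log_alpha = log_post(proposal) - log_post(theta_chain[i - 1]) # log acceptance ratio
  if (log(runif(1)) < log_alpha) {
    theta_chain[i] = proposal # accept the proposal
  } else {
    theta_chain[i] = theta_chain[i - 1] # reject: stay at the current value
  }
}
hist(theta_chain[-(1:1000)]) # discard burn-in and inspect the approximate posterior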
JAGS (“Just Another Gibbs Sampler”) is a standalone program for performing MCMC simulations. JAGS takes as input a Bayesian model description (prior plus likelihood) and data, and returns an MCMC sample from the posterior distribution. JAGS uses a combination of Metropolis sampling, Gibbs sampling, and other MCMC algorithms.
This lab gets you started with the JAGS software. You will code in JAGS examples that we have already seen, so that you can compare JAGS output with results from the other methods (math, grid approximation).
The purpose of this lab is to just get you up and running with JAGS. We will talk a little bit more about how MCMC works later.
This RMarkdown file provides a template for you to fill in. Read the file from start to finish, completing the parts as indicated. Some code is provided for you. Be sure to run this code, and make sure you understand what it’s doing. Some blank “code chunks” are provided; you are welcome to add more (Code > Insert chunk or CTRL-ALT-I) as needed. There are also a few places where you should type text responses. Feel free to add additional text responses as necessary.
You can run individual code chunks using the play button. You can use objects defined in one chunk in others. Just keep in mind that chunks are evaluated in order. So if you call something x in one chunk, but redefine x as something else in another chunk, it’s essential that you evaluate the chunks in the proper order.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. When you are finished, Knit the document one final time and submit the resulting file.
You’ll need a few R packages, which you can install using install.packages().
library(ggplot2)
library(dplyr)
library(knitr)
library(janitor)
library(rjags)
library(bayesplot)
library(runjags)
knitr::opts_chunk$set(
echo = TRUE,
warning = FALSE,
message = FALSE
)
JAGS is a standalone software program that you need to download and install outside of R. Install JAGS using Step 3 of these instructions.
JAGS is a separate program that must be installed outside of R. However, we will only interface with JAGS through R/RStudio. To do this, you will need to install the rjags package in R: install.packages("rjags").
You should also install the bayesplot package.
# you'll need to install the packages first
# install.packages("rjags")
# install.packages("bayesplot")
library(rjags)
library(bayesplot)
Read Section 10.1 of the textbook for a brief introduction to JAGS.
In Lab 1, you used simulation to approximate a posterior distribution. Now you will use JAGS to conduct a similar but much more computationally efficient simulation.
Suppose you’re interested in \(\theta\), the population proportion of Cal Poly students whose dominant eye (right or left) is the same as their dominant hand (right or left).
Assume a Beta(2, 1) prior for \(\theta\). (This is similar to the prior used in parts 3-6 of Lab 1.)
In a sample of 36 students, 29 students have the same dominant eye and dominant hand (both right). (This is based on my STAT 217 class in Fall 2021, but you can assume it’s a random sample.)
To do:
library(rjags)
library(bayesplot)
library(runjags)
n = 36 # sample size
y = 29 # number of successes
model_string <- "model{
# Likelihood
y ~ dbinom(theta, n)
# Prior
theta ~ dbeta(alpha, beta)
alpha <- 2 # prior successes
beta <- 1 # prior failures
}"
dataList = list(y = y, n = n)
model <- jags.model(file = textConnection(model_string),
data = dataList)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 1
## Unobserved stochastic nodes: 1
## Total graph size: 5
##
## Initializing model
update(model, n.iter = 1000) # burn-in: discard these initial iterations while the chain settles
Nrep = 10000 # number of values to simulate
posterior_sample <- coda.samples(model,
variable.names = c("theta"),
n.iter = Nrep)
summary(posterior_sample)
##
## Iterations = 2001:12000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 10000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## 0.7934053 0.0642869 0.0006429 0.0008769
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## 0.6524 0.7530 0.7983 0.8403 0.9051
plot(posterior_sample)
library(bayesplot)
mcmc_hist(posterior_sample)
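For comparison with the math approach (an optional check, not part of the original lab), the conjugate Beta-Binomial analysis gives an exact Beta(2 + 29, 1 + 36 - 29) = Beta(31, 8) posterior, which can be summarized directly:

# exact conjugate posterior: Beta(31, 8)
31 / (31 + 8) # exact posterior mean, about 0.795
sqrt(31 * 8 / ((31 + 8)^2 * (31 + 8 + 1))) # exact posterior SD, about 0.064
qbeta(c(0.025, 0.25, 0.5, 0.75, 0.975), 31, 8) # exact quantiles to compare with the JAGS summary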
The simulated posterior distribution of theta from JAGS is extremely similar to the posterior distribution simulated in Lab 1. Both are centered around 0.8 and appear to have similar spreads. Thus, the JAGS output is reasonably consistent with the previous results from Lab 1.
Hints:
You’ll use JAGS to perform a Bayesian analysis of the body temperature problem from Lab 3. Recall the assumptions:
In a recent study [1], the sample mean body temperature in a sample of 208 healthy adults was 97.7 degrees F. Here is the sample data on the 208 temperature measurements. Notice that the sample mean is 97.7 and the sample SD is 1 (consistent with the assumed population SD of \(\sigma = 1\)).
data = read.csv("temperature.csv")
y = data$temperature
hist(y, freq = FALSE)
summary(y)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 95.05 97.06 97.60 97.70 98.32 100.88
sd(y)
## [1] 0.9999203
n = length(y)
ybar = mean(y)
y
## [1] 98.44 99.40 99.07 97.32 99.90 97.33 97.21 96.30 97.00 99.05
## [11] 96.57 97.61 97.56 98.58 97.83 98.94 97.27 97.10 97.42 97.36
## [21] 95.86 97.49 96.87 98.82 98.25 98.68 96.35 97.11 99.71 97.98
## [31] 99.36 96.45 99.11 98.53 97.70 95.80 98.12 97.33 96.50 97.49
## [41] 97.04 96.09 97.56 99.35 96.86 98.14 97.47 97.94 97.13 95.82
## [51] 98.24 99.35 97.64 97.98 96.01 98.20 98.50 97.04 95.41 99.81
## [61] 98.13 97.13 98.51 96.62 98.22 98.83 97.19 98.82 96.90 99.70
## [71] 96.32 99.05 99.22 97.28 98.10 96.96 98.02 98.00 98.72 98.68
## [81] 98.03 96.99 98.22 99.66 97.27 99.99 98.74 98.12 96.45 98.49
## [91] 97.92 96.30 98.67 95.75 97.39 97.06 97.17 98.55 97.24 99.06
## [101] 98.59 96.63 97.58 98.30 96.73 97.43 97.38 98.72 97.26 97.85
## [111] 98.27 96.49 96.99 99.15 98.61 97.68 97.10 97.49 95.95 96.39
## [121] 97.82 95.05 97.59 98.29 97.31 98.31 97.54 97.32 97.86 97.52
## [131] 98.84 96.71 98.17 97.60 97.20 96.57 98.72 97.98 96.99 97.71
## [141] 98.78 96.96 98.54 98.25 96.68 98.03 97.19 96.21 98.15 98.16
## [151] 98.52 98.23 97.29 95.24 97.39 97.56 97.55 97.54 95.90 98.15
## [161] 97.58 97.10 96.48 100.19 97.65 96.79 99.61 97.58 97.30 97.72
## [171] 97.74 98.20 98.97 98.22 97.89 98.66 98.35 98.20 96.85 97.11
## [181] 95.77 96.64 97.79 96.62 97.71 97.10 96.57 96.77 96.93 96.75
## [191] 96.50 96.92 97.25 99.06 97.37 98.44 100.88 98.38 97.73 97.60
## [201] 97.82 97.06 97.43 98.48 98.20 98.60 97.82 97.39
To do:
library(rjags)
library(bayesplot)
library(runjags)
model_string <- "model{
# Likelihood
for (i in 1:n){
y[i] ~ dnorm(theta,1)
}
# Prior
theta ~ dnorm(mu0, sd0)
mu0 <- 98.6
sd0 <- 1
}"
dataList = list(y = y, n = n)
# Compile
model <- jags.model(file = textConnection(model_string),
data = dataList)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 208
## Unobserved stochastic nodes: 1
## Total graph size: 212
##
## Initializing model
update(model, n.iter = 1000) # burn-in: discard these initial iterations while the chain settles
Nrep = 10000 # number of values to simulate
posterior_sample <- coda.samples(model,
variable.names = c("theta"),
n.iter = Nrep)
summary(posterior_sample)
##
## Iterations = 1001:11000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 10000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## 9.770e+01 6.925e-02 6.925e-04 6.925e-04
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## 97.57 97.66 97.71 97.75 97.84
plot(posterior_sample)
library(bayesplot)
mcmc_hist(posterior_sample)
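For comparison with the math approach (an optional check, not part of the original lab), the Normal-Normal model with known \(\sigma = 1\) and a Normal(98.6, 1) prior has an exact posterior that can be computed directly from the conjugate updating formulas, using the n and ybar values computed above:

# exact Normal-Normal posterior (known sigma = 1; prior mean 98.6, prior SD 1)
mu0 = 98.6
sd0 = 1
sigma = 1
post_prec = 1 / sd0^2 + n / sigma^2 # posterior precision
post_mean = (mu0 / sd0^2 + n * ybar / sigma^2) / post_prec # about 97.70
post_sd = sqrt(1 / post_prec) # about 0.069
c(post_mean, post_sd) # compare with the JAGS mean and SD above
qnorm(c(0.025, 0.25, 0.5, 0.75, 0.975), post_mean, post_sd) # exact quantiles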
After simulating the approximate posterior distribution of theta (the population mean body temperature) using JAGS, I can see that the results are extremely similar to those from Lab 3. Both have posterior means of around 97.7, and the 25th and 75th percentiles are very similar as well: 97.66 and 97.75 for the JAGS simulation versus 97.70 and 97.79 for the Normal-Normal results in Lab 3. Thus, we can conclude that the JAGS output is reasonably consistent with the previous results from Lab 3.
Hints:
1) One important concept I’ve learned in this lab is that there are many different methods for simulating a Bayesian posterior distribution; there isn’t just one unique way to do it. I also learned some helpful syntax and feel fairly confident about how to apply and use JAGS. 2) What are the advantages of using JAGS over the other techniques we have used in class so far (e.g., grid approximation)?
[1] Source and a related article.