Compare the results of the two analyses: inference about the mean when the variance is known (Section 4.1) and joint inference about the mean and variance when only the data are known (Section 4.3).
The 95% HPD credible intervals:
Posterior of \(\mu\) with the variance known (Section 4.1): (78.80, 87.48), p. 91
Marginal posterior of \(\mu\) with both mean and variance unknown (Section 4.3): (86.64, 86.69), p. 82
How do they differ? What explains why they differ in these ways?
Knowing the variance amounts to having more information, which yields greater certainty. We can be more certain when computing the posterior of the mean conditional on a known variance than when the variance is unknown and all we have is normally distributed observed data.
Under what circumstances would such analyses yield increasingly similar results? Dissimilar results?
As the \(n\) of the conditional distribution of the data increases (4.1.6), the posterior of the mean given \(x\) and the variance becomes a normal distribution centered at the same mean with less variance, i.e., more certainty.
As the variance becomes better determined, inference for the mean given the data and the variance approaches the joint inference for the mean and variance given \(x\) (asymptotics).
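To make the first point concrete, here is the standard conjugate result, assuming a \(N(\mu_0, \tau_0^2)\) prior on \(\mu\) as in Section 4.1:
\[
\mu \mid \sigma^2, x \sim N(\mu_n, \tau_n^2), \qquad
\frac{1}{\tau_n^2} = \frac{1}{\tau_0^2} + \frac{n}{\sigma^2}, \qquad
\mu_n = \tau_n^2 \left( \frac{\mu_0}{\tau_0^2} + \frac{n \bar{x}}{\sigma^2} \right).
\]
The posterior precision grows linearly in \(n\), so \(\tau_n^2 \to 0\) and \(\mu_n \to \bar{x}\) as \(n \to \infty\).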
Compare the results of the two analyses in terms of the marginal posterior distribution for the variance: inference about the variance when the mean is known (Section 4.2) and joint inference about the mean and variance (Section 4.3).
\(\sigma^2 \mid x, \mu\) (mean known, Section 4.2) 95% HPD interval: (23.22, 91.34), p. 91
\(\sigma^2 \mid x\) (mean unknown, Section 4.3) 95% HPD interval: (27.02, 100.74), p. 82
How do they differ? What explains why they differ in these ways?
The known mean does not increase certainty as much as one might expect from the 95% HPD intervals. As \(n \to \infty\), \(\mu_{\mu \mid x} \to \bar{x}\), so the two posterior distributions should become more similar; my guess is that \(n = 10\) is too small for the estimated mean to stand in for the known one.
Under what circumstances would such analyses yield increasingly similar results? Dissimilar results?
If we had a larger \(n\) for our data, the marginal posterior for the variance from the joint mean-and-variance inference would sharpen toward the known-mean result; a quick sketch follows.
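A minimal simulation sketch of this convergence, assuming the standard noninformative-prior results (\(\sigma^2 \mid x, \mu \sim \text{Inv-}\chi^2(n, v)\) with the mean known and \(\sigma^2 \mid x \sim \text{Inv-}\chi^2(n-1, s^2)\) with the mean unknown) and hypothetical data parameters:
set.seed(1)
compare_var_posteriors <- function(n, mu = 85, sigma = 7, draws = 1e5) {
  x  <- rnorm(n, mu, sigma)
  v  <- sum((x - mu)^2) / n   # scale when the mean is known
  s2 <- var(x)                # scale when the mean is unknown
  known   <- n * v / rchisq(draws, df = n)            # Inv-chi^2(n, v) draws
  unknown <- (n - 1) * s2 / rchisq(draws, df = n - 1) # Inv-chi^2(n-1, s^2) draws
  rbind(known   = quantile(known,   c(0.025, 0.975)),
        unknown = quantile(unknown, c(0.025, 0.975)))
}
compare_var_posteriors(10)   # intervals differ noticeably
compare_var_posteriors(200)  # intervals nearly coincide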
First, create the model string, to be saved to a file and loaded with the jags.model() function.
model_string <- "
model {
# Likelihood
y ~ dbin(theta, J)
# Prior
theta ~ dbeta(alpha, beta)
}
"
This is a conjugate Beta-Binomial model, so JAGS will use Gibbs sampling for theta: the full conditional is a known Beta distribution.
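Concretely, the conjugacy follows from multiplying the binomial likelihood by the Beta prior:
\[
p(\theta \mid y) \propto \theta^{y} (1-\theta)^{J-y} \cdot \theta^{\alpha-1} (1-\theta)^{\beta-1}
= \theta^{y+\alpha-1} (1-\theta)^{J-y+\beta-1},
\]
which is the kernel of a \(\text{Beta}(\alpha + y, \beta + J - y)\) distribution — \(\text{Beta}(13, 9)\) for the data below.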
Beta-Binomial data: y = 7 successes out of J = 10 trials.
data_list <- list(y = 7, J = 10, alpha = 6, beta = 6)
Write the model string to a temporary file
writeLines(model_string, con = "beta_binomial_model.jags")
Define dispersed initial values for each chain. We start the three chains at 0.1, 0.5, and 0.9; my reasoning is simply that these are sufficiently dispersed starting points, chosen without reference to the prior or the data (a prior-based alternative is sketched after the code).
inits_list <- list(
  list(theta = 0.1),
  list(theta = 0.5),
  list(theta = 0.9)
)
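A hedged alternative (not used here): derive the dispersed starting points from quantiles of the Beta(6, 6) prior, so the spread scales with the prior rather than being hard-coded.
# Start the chains at the prior's 2.5%, 50%, and 97.5% quantiles
# (roughly 0.23, 0.50, and 0.77 for Beta(6, 6)).
inits_alt <- lapply(c(0.025, 0.5, 0.975), function(q) list(theta = qbeta(q, 6, 6)))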
Initialize the model.
model <- jags.model("beta_binomial_model.jags",
                    data = data_list,
                    inits = inits_list,
                    n.chains = 3,
                    n.adapt = 0  # skip adaptation; otherwise coda.samples would number its draws after the adaptation iterations
)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 1
## Unobserved stochastic nodes: 1
## Total graph size: 5
##
## Initializing model
Compile the model with initial values. In rjags, you do not specify the number of iterations when defining the model. Instead, you control the number of MCMC iterations only when you call:
update(model, n.iter) for burn-in (samples are discarded)
coda.samples(…) or jags.samples(…) for saving posterior samples
Coda is an output format originally produced by WinBUGS. Sample from the posterior using the coda package: coda.samples(model, variable.names, n.iter, thin = 1, na.rm = TRUE, …)
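For reference, the usual two-step pattern looks like the sketch below; it is deliberately not used here, since we want to watch convergence from the very first iteration.
# Typical pattern (not run in this document):
# update(model, n.iter = 1000)   # burn-in; these draws are discarded
# samples <- coda.samples(model, variable.names = "theta", n.iter = 10000)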
samples <- coda.samples(model,
                        variable.names = c("theta"),
                        n.iter = 20)
## NOTE: Stopping adaptation
Each time you initialize the model, sampling starts from iteration 0. Each further call to coda.samples without re-initializing the model continues the chains, returning the next \(n\) iterations.
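The cumulative numbering can be checked on the mcmc object itself; coda carries time-series attributes for this (the values shown are what we would expect after the first batch):
start(samples)  # 1 after the first 20-iteration batch
end(samples)    # 20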
traceplot(samples)
gelman.diag(samples)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## theta 1.27 2.24
For the first 20 iterations, the traceplot shows the chains beginning to mix, but \(\hat{R}\) is still too far above 1.0.
Repeat from coda.samples(), drawing additional samples from this model and monitoring convergence of the three chains over iterations 21-40.
samples <- coda.samples(model,
                        variable.names = c("theta"),
                        n.iter = 20)
traceplot(samples)
gelman.diag(samples)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## theta 1.01 1.11
\(\hat{R}\) is good for iterations 21-40.
samples <- coda.samples(model,
                        variable.names = c("theta"),
                        n.iter = 20)
traceplot(samples)
gelman.diag(samples)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## theta 1.08 1.12
The \(\hat{R}\) is not as good for iterations 41-60.
samples <- coda.samples(model,
                        variable.names = c("theta"),
                        n.iter = 20)
traceplot(samples)
gelman.diag(samples)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## theta 1.01 1.07
Looking good for iterations 61-80.
samples <- coda.samples(model,
                        variable.names = c("theta"),
                        n.iter = 20)
traceplot(samples)
gelman.diag(samples)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## theta 1.05 1.21
\(\hat{R}\) remains acceptable for iterations 81-100, though the upper confidence limit has crept up to 1.21.
samples <- coda.samples(model,
                        variable.names = c("theta"),
                        n.iter = 20)
traceplot(samples)
gelman.diag(samples)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## theta 1.05 1.2
By iteration 120 we have good convergence and a sufficient \(\hat{R}\).
samples <- coda.samples(model,
                        variable.names = c("theta"),
                        n.iter = 20)
traceplot(samples)
gelman.diag(samples)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## theta 0.992 1.03
Since \(\hat{R}\) varies so much from one 20-iteration batch to the next, maybe it makes more sense to draw a bigger batch of samples. What if I were to redefine the model and start with 140 iterations sampled?
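One way to quantify why the 20-draw batches are so noisy is coda's effectiveSize(), which estimates how many independent draws each short batch is really worth:
effectiveSize(samples)  # effective sample size of the last 20-iteration batch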
Initialize the model.
model <- jags.model("beta_binomial_model.jags",
                    data = data_list,
                    inits = inits_list,
                    n.chains = 3,
                    n.adapt = 0  # skip adaptation; otherwise coda.samples would number its draws after the adaptation iterations
)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 1
## Unobserved stochastic nodes: 1
## Total graph size: 5
##
## Initializing model
Sample 140 iterations from the start of the chains.
samples <- coda.samples(model,
                        variable.names = c("theta"),
                        n.iter = 140)
## NOTE: Stopping adaptation
traceplot(samples)
gelman.diag(samples)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## theta 1.03 1.07
This looks good, with \(\hat{R}\) close to 1.
When complete, draw 10,000 iterations past the convergence point and extract summary statistics of the posterior distribution.
samples <- coda.samples(model,
                        variable.names = c("theta"),
                        n.iter = 10000)
traceplot(samples)
gelman.diag(samples)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## theta 1 1
denplot(samples)
summary(samples)
##
## Iterations = 141:10140
## Thinning interval = 1
## Number of chains = 3
## Sample size per chain = 10000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## 0.5899573 0.1024795 0.0005917 0.0006055
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## 0.3836 0.5205 0.5923 0.6627 0.7795
I chose to run 140 iterations to reach the point of convergence, with an \(\hat{R}\) of 1.03 and good visual convergence in the traceplot, after which I drew 10,000 more iterations and found the mean and median of the posterior distribution to be about 0.59 (mean 0.590, median 0.592), with \(\sigma = 0.102\) and a central 95% interval of (0.384, 0.780).
These are very similar to the mean of 0.59, SD of 0.10, and 95% HPD interval (0.39, 0.79) in section 2.4.
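As a quick closed-form check, the conjugate posterior here is Beta(6 + 7, 6 + 3) = Beta(13, 9), whose moments and central interval can be computed directly:
a <- 6 + 7; b <- 6 + 3                    # posterior Beta(13, 9)
a / (a + b)                               # posterior mean, ~0.591
sqrt(a * b / ((a + b)^2 * (a + b + 1)))   # posterior SD, ~0.102
qbeta(c(0.025, 0.975), a, b)              # central 95% interval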
model_string <- "
model {
# Prior distribution
theta ~ dbeta(alpha, beta)
# Likelihood (Bernoulli trials)
for (j in 1:J) {
x[j] ~ dbern(theta)
}
}
"
Data: a 1 indicates a success (there are 7 successes out of 10 trials).
data_list <- list(J=10, x=c(1,0,1,0,1,1,1,1,0,1), alpha=6, beta=6)
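Note that this formulation carries exactly the same information about \(\theta\) as the binomial model above, since the Bernoulli likelihood factors into the binomial kernel:
\[
\prod_{j=1}^{J} \theta^{x_j} (1-\theta)^{1-x_j}
= \theta^{\sum_j x_j} (1-\theta)^{J - \sum_j x_j}
= \theta^{7} (1-\theta)^{3},
\]
so the posterior is again \(\text{Beta}(13, 9)\), and we should expect essentially the same results.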
Write the model string to a temporary file
writeLines(model_string, con = "beta_bernoulli_model.jags")
What initial values make sense for the three chains? Would the same values make sense again? I think so.
Compile the model with initial values.
model <- jags.model("beta_bernoulli_model.jags",
                    data = data_list,
                    inits = inits_list,
                    n.chains = 3,
                    n.adapt = 0
)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 10
## Unobserved stochastic nodes: 1
## Total graph size: 14
##
## Initializing model
Iterations 1-20.
samples <- coda.samples(model,
                        variable.names = c("theta"),
                        n.iter = 20)
traceplot(samples)
gelman.diag(samples)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## theta 0.953 0.956
This looks good already, even at iteration 20. Let's look further to see how stable the convergence is.
Repeat from coda.samples(), drawing additional samples from this model and monitoring convergence of the three chains for iterations 21-40.
samples <- coda.samples(model,
                        variable.names = c("theta"),
                        n.iter = 20)
traceplot(samples)
gelman.diag(samples)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## theta 1.06 1.24
Looks pretty good: the point estimate is close to 1. Iterations 41-60.
samples <- coda.samples(model,
                        variable.names = c("theta"),
                        n.iter = 20)
traceplot(samples)
gelman.diag(samples)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## theta 0.998 1.05
By iteration 60 we have sufficient convergence, so from there forward we can draw 10,000 samples for the posterior distribution from each chain.
samples <- coda.samples(model,
                        variable.names = c("theta"),
                        n.iter = 10000)
traceplot(samples)
gelman.diag(samples)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## theta 1 1
Perfect stats for convergence at this point.
When complete, extract summary statistics of the posterior distribution from the 10,000 iterations drawn past the convergence point.
summary(samples)
##
## Iterations = 61:10060
## Thinning interval = 1
## Number of chains = 3
## Sample size per chain = 10000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## 0.5906830 0.1014321 0.0005856 0.0005790
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## 0.3850 0.5220 0.5941 0.6629 0.7795
denplot(samples)
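The summary() quantiles above give a central interval; coda can also compute a genuine HPD interval directly (per chain, since samples is an mcmc.list). For this nearly symmetric posterior the two should almost coincide.
HPDinterval(samples, prob = 0.95)  # 95% HPD bounds for theta, per chain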
The mean of the posterior is 0.591, \(\sigma = 0.101\), and the central 95% interval is (0.385, 0.780).
These are very close to the analytical method results from section 2.4: mean of 0.59, SD of 0.10, and 95% HPD interval (0.39, 0.79).