Since \(\theta_i\) and \(\theta_j\) are independent samples from \(p(\theta \mid \phi)\), the joint distribution is the product of the marginals: \(p(\theta_i, \theta_j \mid \phi) = p(\theta_i \mid \phi)\, p(\theta_j \mid \phi)\), so their conditional covariance \(\operatorname{cov}(\theta_i, \theta_j \mid \phi)\) is zero.
\[ \begin{aligned} \operatorname{cov}(\theta_i, \theta_j) &= \operatorname{E}[\operatorname{cov}(\theta_i, \theta_j \mid \phi)] + \operatorname{cov}[\operatorname{E}(\theta_i \mid \phi), \operatorname{E}(\theta_j \mid \phi)] \\ &= 0 + \operatorname{var}[\operatorname{E}(\theta_j \mid \phi)] \geq 0, \end{aligned} \] where the last step uses \(\operatorname{E}(\theta_i \mid \phi) = \operatorname{E}(\theta_j \mid \phi)\), since the \(\theta\)'s are identically distributed given \(\phi\).
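A quick simulation sketch, using an arbitrary hierarchy chosen purely for illustration (\(\phi \sim N(0, 1)\) and \(\theta_i \mid \phi \sim N(\phi, 1)\)), shows the marginal covariance is positive even though the conditional covariance is zero:
# Illustrative (assumed) hierarchy: phi ~ N(0, 1), theta_i | phi ~ N(phi, 1)
set.seed(1)
phi0 <- rnorm(1e5)
th1 <- rnorm(1e5, mean = phi0)
th2 <- rnorm(1e5, mean = phi0)
cov(th1, th2)   # approximately var(E(theta | phi)) = var(phi) = 1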
Given that \(p_m(\theta)\) is a conjugate prior for the model \(y \mid \theta\), the posterior is: \[ p_m(\theta \mid y) = \frac{p_m(\theta) p(y \mid \theta)}{\int p_m(\theta) p(y \mid \theta)\, d\theta } = \frac{p_m(\theta) p(y \mid \theta)}{p_m(y)} \]
The posterior for the finite mixture prior density is given by: \[ p(\theta \mid y) = \frac{p(\theta)p(y\mid \theta)}{p(y)} = \frac{\sum_{m=1}^M \lambda_m p_m(\theta) p(y\mid \theta)}{p(y)} = \frac{\sum_{m=1}^M \lambda_m p_m(y) p_m(\theta \mid y) }{p(y)} = \sum_{m=1}^M \frac{\lambda_m p_m(y)}{p(y)} p_m(\theta \mid y) \]
in which \(\lambda_m p_m(y) / p(y)\) is the posterior mixture weight of component \(m\).
If we use \(\lambda_1 = .3\), the prior is \(p(\theta) = .3 p_1(\theta) + .7 p_2(\theta)\), in which \(p_1(\theta)\) and \(p_2(\theta)\) are normal densities with means \(-1\) and \(1\), respectively, and common precision \(4\) (standard deviation \(0.5\)).
Using the standard normal conjugate result with \(\bar{y} = -0.25\) (sampling precision \(10\)), each component of the posterior is again normal, with parameters:
\[ \mu_1 = \frac{10\bar{y} - 4}{10 + 4} = -0.4642857, \qquad \mu_2 = \frac{10\bar{y} + 4}{10 + 4} = 0.1071429, \qquad \sigma_1^2 = \sigma_2^2 = \frac{1}{10 + 4} = 0.07142857 \]
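As a quick arithmetic check (a sketch assuming \(\bar{y} = -0.25\) with sampling precision \(10\), as used in the weight calculation below):
# posterior component parameters under each conjugate normal prior
ybar <- -0.25
mu1 <- (10 * ybar - 4) / (10 + 4)   # component 1: prior mean -1, precision 4
mu2 <- (10 * ybar + 4) / (10 + 4)   # component 2: prior mean  1, precision 4
sd.post <- sqrt(1 / (10 + 4))
c(mu1, mu2, sd.post)                # -0.464  0.107  0.267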
The posterior weights are \(\lambda_m p_m(y) / p(y)\); the normalizing constant \(p(y)\) can be ignored since the weights sum to one. Here \(p_m(y)\) is the prior predictive distribution. Since we only observe \(\bar{y}\), we treat it as a single data point from the model \(\bar{y} \mid \theta \sim N(\theta, 1/10)\) and apply the conjugate prior \(p_m(\theta)\).
By viewing \(\bar{y}\) as the sum of two independent normal random variables, \(\bar{y} = (\bar{y} - \theta) + \theta\), we can derive the marginal distribution of \(\bar{y}\).
Note that \(\bar{y}-\theta \mid \theta \sim N(0, 1/10)\) and, under \(p_m\), \(\theta \sim N(\pm 1, .5^2)\). Therefore \(p_m(\bar{y})\) is the \(N(\pm 1, .5^2 + 1/10)\) density.
The weights are computed as follows:
# prior mixture weights
l.prior <- c(.3, .7)
# prior predictive density of ybar = -0.25 under each component
marginal <- dnorm(-.25, c(-1, 1), sqrt(.5^2 + 1/10))
# normalized posterior mixture weights
l.post <- l.prior * marginal / sum(l.prior * marginal)
l.post
## [1] 0.6413604 0.3586396
Then the posterior distribution is
\[ p(\theta \mid y) = .64 p_1(\theta \mid y) + .36 p_2(\theta \mid y) \]
where \(p_1(\theta \mid y)\) is Normal\((-.464, 0.267^2)\) and \(p_2(\theta \mid y)\) is Normal\((0.107, 0.267^2)\).
library(ggplot2)
# mixture prior and mixture posterior densities
prior <- function(x) l.prior[1]*dnorm(x, -1, .5) + l.prior[2]*dnorm(x, 1, .5)
post <- function(x) {
  l.post[1]*dnorm(x, -.464, 0.267) + l.post[2]*dnorm(x, 0.107, 0.267)
}
# overlay the prior and posterior densities on (-2.5, 2.5)
qplot(c(-2.5, 2.5), geom = "blank") +
  stat_function(fun = prior) + stat_function(fun = post)
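As a sanity check (a minimal sketch, again assuming \(\bar{y} = -0.25\) with sampling variance \(1/10\)), the mixture-form posterior can be compared with a brute-force grid normalization of \(p(\theta)\, p(\bar{y} \mid \theta)\):
# brute-force check: normalize prior(theta) * likelihood on a fine grid
theta.grid <- seq(-3, 3, length.out = 2001)
lik <- dnorm(-.25, mean = theta.grid, sd = sqrt(1/10))
unnorm <- prior(theta.grid) * lik
exact <- unnorm / (sum(unnorm) * (theta.grid[2] - theta.grid[1]))
max(abs(exact - post(theta.grid)))  # small; post() uses rounded parameters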
In 5.21, when \(\tau \to 0\), every factor has a finite, nonzero limit except for \(p(\tau)\). So if we use \(p(\mu, \tau) \propto \tau^{-1}\), the marginal posterior behaves like \(\tau^{-1}\) near zero, and since \(\int_0^\epsilon \tau^{-1}\, d\tau\) diverges, the posterior is improper.
5.10 shows that if \(p(\tau) \propto 1\) then the posterior density is integrable near zero. It remains to check the behavior as \(\tau \to \infty\): the posterior is proper if \(p(\tau \mid y)\) is bounded above by a function that is integrable at infinity.
\[
p(\tau\mid y) \propto \Bigl[\sum_{j=1}^J \prod_{k\neq j}(\sigma_k^2 + \tau^2)\Bigr]^{-1/2} \exp(\cdot) \leq \bigl(\tau^{2(J-1)}\bigr)^{-1/2} = \tau^{-(J-1)},
\] since each exponential factor is at most one and the bracketed sum is no smaller than any single product, each of which exceeds \(\tau^{2(J-1)}\). The tail therefore decays like \(\tau^{-(J-1)}\), which is integrable at infinity only when \(J > 2\).
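A small numerical sketch (with hypothetical \(\sigma_j\) values chosen only for illustration) shows that the bracketed term indeed decays like \(\tau^{-(J-1)}\) for large \(\tau\):
# hypothetical standard errors, used only to illustrate the large-tau behaviour
sigma <- c(15, 10, 16, 11, 9, 11, 10, 18)
J <- length(sigma)
tau <- c(1e1, 1e2, 1e3, 1e4)
# the bracketed term [sum_j prod_{k != j} (sigma_k^2 + tau^2)]^(-1/2)
bracket <- sapply(tau, function(t)
  sum(sapply(1:J, function(j) prod(sigma[-j]^2 + t^2)))^(-1/2))
cbind(tau, bracket, bound = tau^(-(J - 1)), ratio = bracket / tau^(-(J - 1)))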
Using the same setting as the rat tumor example in Section 5.3 of the book, the noninformative hyperprior is chosen to be \(p(\alpha, \beta) \propto (\alpha + \beta)^{-5/2}\). The joint posterior distribution is: \[ \begin{aligned} p(\theta, \alpha, \beta \mid y) &\propto p(\alpha, \beta) p(\theta \mid \alpha, \beta) p(y \mid \theta) \\ &= (\alpha + \beta)^{-5/2} \prod_{j=1}^{10} \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \theta_j^{\alpha-1}(1-\theta_j)^{\beta-1} \prod_{j=1}^{10} \theta_j^{y_j}(1-\theta_j)^{n_j-y_j} \end{aligned} \]
The marginal posterior density is the same as in (5.8), except for the prior term. Plugging in the prior for this problem gives: \[ p(\alpha, \beta \mid y) \propto (\alpha + \beta)^{-5/2} \prod_{j=1}^J \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \frac{\Gamma(\alpha+y_j)\Gamma(\beta+n_j-y_j)}{\Gamma(\alpha + \beta + n_j)} \] If we use the \((\log(\alpha/\beta), \log(\alpha + \beta))\) scale as in the book, the marginal posterior becomes: \[ p(\gamma, \delta \mid y) = p(\log(\alpha/\beta), \log(\alpha + \beta) \mid y) \propto \alpha\beta(\alpha + \beta)^{-5/2} \prod_{j=1}^J \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \frac{\Gamma(\alpha+y_j)\Gamma(\beta+n_j-y_j)}{\Gamma(\alpha + \beta + n_j)} \] where the extra factor \(\alpha\beta\) is the Jacobian of the transformation. I set the grid on \([-2, 1] \times [-3, 1]\); the marginal posterior looks fairly symmetric.
# observed counts y; the second vector holds the remaining counts,
# so the totals are obtained as n + y
y <- c(16, 9, 10, 13, 19, 20, 18, 17, 35, 55)
n <- c(58, 90, 48, 57, 103, 57, 86, 112, 273, 64)
n <- n + y
log.marg <- function(gam, del) {
  # transform (gamma, delta) = (log(a/b), log(a+b)) back to (a, b)
  a <- exp(gam + del) / (exp(gam) + 1)
  b <- exp(del) / (exp(gam) + 1)
  # log prior plus log Jacobian: log(a) + log(b) - 2.5 * log(a + b)
  prior <- log(a) + log(b) - 2.5 * log(a + b)
  temp1 <- temp2 <- vector(length = 10)
  for (i in 1:10) {
    temp1[i] <- lgamma(a + b) + lgamma(a + y[i]) + lgamma(b + n[i] - y[i])
    temp2[i] <- lgamma(a) + lgamma(b) + lgamma(a + b + n[i])
  }
  prior + sum(temp1) - sum(temp2)
}
# grid over (gamma, delta) = (log(a/b), log(a+b))
grid <- expand.grid(a = seq(from = -2, to = 1, length.out = 101),
                    b = seq(from = -3, to = 1, length.out = 101))
lmarg <- mapply(log.marg, grid[, 1], grid[, 2])
# exponentiate after subtracting the maximum for numerical stability
unnorm.marg <- exp(lmarg - max(lmarg))
grid <- data.frame(grid, marginal = unnorm.marg)
limits <- expand_limits(x = range(grid$a), y = range(grid$b))
v <- ggplot(grid, aes(a, b)) + limits
v + stat_contour(aes(z = marginal))
Samples are drawn from the joint posterior distribution as follows:
# compute the marginal for a (i.e. for gamma) by summing over b
marginal.a <- aggregate(x = grid$marginal, by = list(grid$a), FUN = sum)
post.sampler <- function() {
  # draw a from its marginal, then b from its conditional given a
  post.a <- sample(unique(grid$a), size = 1, prob = marginal.a[, 2])
  post.b <- sample(unique(grid$b), size = 1,
                   prob = grid$marginal[grid$a == post.a])
  return(c(post.a, post.b))
}
# sample 1000 draws of (log(a/b), log(a+b))
post_sample <- replicate(n = 1000, post.sampler())
# transform back and draw theta
trans <- function(phi) {
  gam <- phi[1]
  del <- phi[2]
  a <- exp(gam + del) / (exp(gam) + 1)
  b <- exp(del) / (exp(gam) + 1)
  return(c(a, b))
}
phi <- apply(post_sample, 2, trans)
theta <- matrix(0, nrow = 1000, ncol = 10)
for (i in 1:1000) {
  for (j in 1:10) {
    theta[i, j] <- rbeta(1, shape1 = phi[1, i] + y[j],
                         shape2 = phi[2, i] + n[j] - y[j])
  }
}
The first plot shows boxplots of the posterior samples of each \(\theta_j\). The second plot shows the posterior medians, with 95% intervals, plotted against the raw proportions \(y_j/n_j\).
library(reshape2)
theta.each <- melt(theta)
q <- apply(theta, 2, quantile, prob=c(.025, .975))
q
## [,1] [,2] [,3] [,4] [,5] [,6]
## 2.5% 0.1347984 0.04295068 0.09289614 0.1092272 0.09628073 0.1677967
## 97.5% 0.3187638 0.15575224 0.27613015 0.2843253 0.22568700 0.3671853
## [,7] [,8] [,9] [,10]
## 2.5% 0.1039693 0.08176685 0.08225622 0.3709451
## 97.5% 0.2442254 0.20226977 0.15472273 0.5481149
qplot(factor(Var2), value, data = theta.each, geom = "boxplot")
qplot(y/n, apply(theta, 2, median)) +
  geom_errorbar(aes(ymin = q[1, ], ymax = q[2, ]), width = 0.01) +
  geom_abline(slope = 1)
For the new experiment, we first need a value of \((\alpha, \beta)\) from its posterior distribution (here the posterior median of the sampled \((\log(\alpha/\beta), \log(\alpha+\beta))\) values, transformed back, is used), and then draw \(\tilde{\theta}\) from the corresponding Beta distribution.
# median of posterior sample
trans(c(-0.4400, -0.960))
## [1] 0.1499948 0.2328981
set.seed(2)
# draw theta for the new location from the population distribution Beta(alpha, beta)
theta.new <- rbeta(1, 0.1499948, 0.2328981)
# posterior predictive draw of the count out of 100
rbinom(n = 1, size = 100, prob = theta.new)
## [1] 13
# central 95% interval of Binomial(100, theta.new), given this single draw of theta.new
qbinom(c(.025, .975), size = 100, prob = theta.new)
## [1] 3 14
Since this is a new location, the prediction interval is not very reliable: with \((\alpha, \beta) \approx (0.15, 0.23)\) the population distribution of \(\theta\) is highly dispersed, and the interval above is conditional on a single draw of \(\tilde{\theta}\).
Yes, it is reasonable, but other distributions on \((0, 1)\) could serve equally well.