Libraries Used

library(ggplot2)
library(LaplacesDemon) #scaled inverse chi square random generator
library(metRology) #scaled and shifted t distribution

Chapter 3 Textbook Problem 1a.)

We first need to compute the joint posterior.

\[ \begin{align} p(\theta|y) &\propto p(\theta)p(y|\theta)\\ &\propto \prod_{k=1}^J \theta_k^{\alpha_k - 1} \prod_{k=1}^J \theta_k^{y_k}\\ &\propto \prod_{k=1}^J \theta_k^{y_k + \alpha_k - 1} \end{align} \]

Thus, the joint posterior distribution of \(\theta = (\theta_1,...,\theta_J) \sim Dirichlet(y_1 + \alpha_1,..., y_J + \alpha_J)\).

Now we seek the posterior distribution of \(\alpha = \frac{\theta_1}{\theta_1 + \theta_2}\)

We define a change of variables.

\[ u = \frac{\theta_1}{\theta_1 + \theta_2}\\ v = \theta_1 + \theta_2\\ \implies \\ \theta_1 = uv\\ \theta_2 = v(1-u) \]
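
For clarity, we can write out the Jacobian matrix of the inverse map \((u,v) \mapsto (\theta_1, \theta_2)\) explicitly:

\[ J^{-1} = \begin{pmatrix} \frac{\partial \theta_1}{\partial u} & \frac{\partial \theta_1}{\partial v}\\ \frac{\partial \theta_2}{\partial u} & \frac{\partial \theta_2}{\partial v} \end{pmatrix} = \begin{pmatrix} v & u\\ -v & 1-u \end{pmatrix} \]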

Then, \(|detJ^{-1}|\) is equal to \(v(1-u) - (-uv) = v\).

We will use the following change of variable formula:

\[ \begin{align} p_{U,V}(u,v|y) &= |detJ^{-1}|p_{\theta_1, \theta_2}(\theta_1, \theta_2|y)\\ &= |detJ^{-1}|p_{\theta_1, \theta_2}(uv, v(1-u)|y) \end{align} \]

Thus, what we are missing is knowing the joint density \(p_{\theta_1, \theta_2}(\theta_1, \theta_2|y)\).

From Appendix A we are given the result that the marginal joint distribution of a subvector of \(\theta\) is also Dirichlet.

First off, we can recognize that the pair \((\theta_i, \theta_j)\) corresponds to the three-component vector \((\theta_i, \theta_j, 1-\theta_i - \theta_j)\).

What does this say? Well, if we define two classes \(\theta_i\) and \(\theta_j\) then by default we must have a third class, i.e., those not belonging to class \(i\) or class \(j\).

With that in mind we can write the marginal joint distribution of the subvector \((\theta_1, \theta_2)\) as follows:

\[ \begin{align} p_{\theta_1, \theta_2}(\theta_1, \theta_2|y) &\propto p_{\theta_1, \theta_2,1- \theta_1 - \theta_2}(\theta_1, \theta_2, 1 - \theta_1 - \theta_2|y)\\ &\propto \theta_1^{y_1 + \alpha_1 - 1} \theta_2^{y_2 + \alpha_2 - 1} (1-\theta_1 - \theta_2)^{y_3 + ... + y_J + \alpha_3 + ... + \alpha_J - 1} \end{align} \]

We plug this joint distribution into our change of variable formula.

\[ \begin{align} p_{U,V}(u,v|y) &= |detJ^{-1}|p_{\theta_1, \theta_2}(\theta_1, \theta_2|y)\\ &= |detJ^{-1}|p_{\theta_1, \theta_2, 1-\theta_1 - \theta_2}(\theta_1, \theta_2, 1-\theta_1 - \theta_2|y)\\ &= |detJ^{-1}|p_{\theta_1, \theta_2, 1-\theta_1 - \theta_2}(uv, v(1-u), 1 - uv - v(1-u)|y)\\ &= |detJ^{-1}|p_{\theta_1, \theta_2, 1-\theta_1 - \theta_2}(uv, v(1-u), 1-v|y)\\ &= v (uv)^{y_1 + \alpha_1 - 1} (v(1-u))^{y_2 + \alpha_2 - 1} (1-v)^{y_3 + ... + y_J + \alpha_3 + ... + \alpha_J - 1}\\ &= u^{y_1 + \alpha_1 - 1} (1-u)^{y_2 + \alpha_2 - 1} v^{y_1 + \alpha_1 + y_2 + \alpha_2 - 1} (1-v)^{y_3 + ... + y_J + \alpha_3 + ... + \alpha_J - 1}\\ &\propto p(u) p(v) \end{align} \]

Thus, we have shown that the joint density factors as \(p(u,v|y) = p(u)p(v)\). It follows that

\[ u = \frac{\theta_1}{\theta_1 + \theta_2}\\ v = \theta_1 + \theta_2 \]

are independent.

Let’s examine the structure of \(p(u) = p(\frac{\theta_1}{\theta_1 + \theta_2})\).

\[ p(u) \propto u^{y_1 + \alpha_1 - 1} (1-u)^{y_2 + \alpha_2 - 1}\\ \implies\\ p(\frac{\theta_1}{\theta_1 + \theta_2}) \propto (\frac{\theta_1}{\theta_1 + \theta_2})^{y_1 + \alpha_1 - 1} (1- \frac{\theta_1}{\theta_1 + \theta_2})^{y_2 + \alpha_2 - 1} \]

We recognize the final density above as \(Beta(y_1 + \alpha_1, y_2 + \alpha_2)\).

Therefore, in conclusion, \(\frac{\theta_1}{\theta_1 + \theta_2} \sim Beta(y_1 + \alpha_1, y_2 + \alpha_2)\)
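
As a quick Monte Carlo sanity check of this result (an added sketch using hypothetical values of \(y_k + \alpha_k\), not values from the textbook), we can build Dirichlet draws from independent Gamma variables and compare the ratio \(\theta_1/(\theta_1+\theta_2)\) to the claimed Beta distribution:

#sanity check: ratio of Dirichlet components vs Beta(y1+alpha1, y2+alpha2)
set.seed(1)
a.post <- c(5, 3, 7, 2)   #hypothetical posterior counts y_k + alpha_k, J = 4
g <- matrix(rgamma(10000 * 4, shape = rep(a.post, each = 10000)), ncol = 4)
theta <- g / rowSums(g)   #each row is a Dirichlet(a.post) draw
ratio <- theta[, 1] / (theta[, 1] + theta[, 2])
qqplot(qbeta(ppoints(10000), a.post[1], a.post[2]), ratio,
       xlab = 'Beta(y1+alpha1, y2+alpha2) quantiles', ylab = 'MC ratio quantiles')
abline(0, 1, col = 'red')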

Chapter 3 Textbook Problem 1b.)

Let \(\alpha = \frac{\theta_1}{\theta_1 + \theta_2}\) have a prior distribution \(Beta(\alpha_1, \alpha_2)\).

Let the observed data be \(y_1\), where \(y_1 | \alpha \sim Binomial(y_1 + y_2, \alpha)\), i.e., we condition on the total count \(y_1 + y_2\).

Then we have:

\[ \begin{align} p(\alpha | y_1) &\propto p(\alpha)p(y_1|\alpha)\\ &\propto \alpha^{\alpha_1 - 1} (1-\alpha)^{\alpha_2 - 1} \binom{y_1+y_2}{y_1} \alpha^{y_1} (1-\alpha)^{y_2}\\ &\propto \alpha^{y_1 + \alpha_1 - 1} (1-\alpha)^{y_2 + \alpha_2 - 1} \end{align} \]

Thus, \(\alpha | y_1 \sim Beta(y_1 + \alpha_1, y_2 + \alpha_2)\), which agrees with the result from part (a).

Chapter 3 Textbook Problem 15a.)

Conditional on all the other observations, the distribution of \(y_t\) depends only on \(y_{t-1}\) and \(y_{t+1}\), because the time series is an AR(1) process.

Chapter 3 Textbook Problem 15b.)

The distribution of \((y_t|y_{t-1}, y_{t+1}) \sim Normal(0.8y_{t-1} + 0.2y_{t+1}, 1)\).

Additional Problem 1

Part A.) Inverse CDF Method

We first need to derive the CDF of the Weibull.

\[ \begin{align} F(\theta) &= \int_{0}^{\theta} f(t)dt\\ &= \int_{0}^{\theta} \frac{\alpha}{\beta^{\alpha}} t^{\alpha-1} \exp(-(\frac{t}{\beta})^{\alpha})dt\\ &= \int_{0}^{(\frac{\theta}{\beta})^{\alpha}} \frac{\alpha}{\beta^{\alpha}} t^{\alpha-1} \exp(-u) \frac{\beta^{\alpha}}{\alpha t^{\alpha-1}}du\\ &= \int_{0}^{(\frac{\theta}{\beta})^{\alpha}} \exp(-u)du\\ &= 1 - \exp(-(\frac{\theta}{\beta})^\alpha) \end{align} \]

Where in the above derivation we used the \(u\)-substitution \(u = (\frac{t}{\beta})^{\alpha}\).

Therefore, for \(\theta \geq 0\) we have \(F(\theta) = 1 - \exp(-(\frac{\theta}{\beta})^\alpha)\), and for \(\theta < 0\) we have \(F(\theta) = 0\).
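
As an added aside, the derived CDF can be checked against R's built-in pweibull, which uses the same shape/scale parameterization:

#sanity check: derived CDF vs R's pweibull (should agree, returning TRUE)
theta.grid <- seq(0, 2, by = 0.25)
all.equal(1 - exp(-(theta.grid/0.5)^2), pweibull(theta.grid, shape = 2, scale = 0.5))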

Now we must analytically solve for the inverse CDF. We switch \(x\) and \(y\) and then solve for \(y\).

\[ y = 1-e^{-(\frac{x}{\beta})^{\alpha}} \implies x = 1 - e^{-(\frac{y}{\beta})^{\alpha}}\\ 1 - x = e^{-(\frac{y}{\beta})^{\alpha}}\\ \ln(1-x) = -(\frac{y}{\beta})^{\alpha}\\ \beta(-\ln(1-x))^{\frac{1}{\alpha}} = y \]

Therefore, \(F^{-1}(\theta) = \beta(-\ln(1-\theta))^{\frac{1}{\alpha}}\)

We will now use the Inverse CDF method.

#Inverse CDF of Weibull
F.inv <- function(theta, alpha, beta) {
  val <- beta * (-1*log(1-theta))^(1/alpha)
  return(val)
}

#Inverse CDF Method
n = 10000
U <- runif(n, min=0, max=1)
weibull.theta <- F.inv(U, alpha=2, beta=0.5)

#plot in ggplot
data.weibull = data.frame(theta = weibull.theta)
p <- ggplot(data.weibull, aes(theta)) + 
  geom_histogram(aes(y=stat(density))) + 
  stat_function(
    fun = dweibull, 
    args=list(shape=2, scale=0.5), 
    lwd = 2, 
    col = 'red'
  )
p
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Part B: Rejection Sampling using Exponential as Envelope

Let’s first define a reasonable support for our densities. We can estimate it by looking at the spread of the Weibull samples drawn in Part A. From the plot above, it seems that \(\theta \in (0,1.5)\) will be more than sufficient.
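
As a quick numerical check (an added aside using the inverse-CDF draws from Part A), we can inspect the upper quantiles of those samples:

#the 0.999 quantile should fall well below 1.5
quantile(weibull.theta, c(0.99, 0.999))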

We first visually check that the \(Weibull(\alpha, \beta)\) density lies below \(k\) times the \(Exponential(\text{rate} = 1/\beta)\) density for \(k = \alpha\).

alpha = 2
beta = 0.5
k = alpha

vals <- seq(0,1.5, length.out = 500)
plot(1, type='n', xlab='theta', xlim=c(0,1.5), ylim = c(0,5), ylab='Density')
lines(vals, dweibull(vals, shape=alpha, scale=beta), col='blue' )
lines(vals, k*dexp(vals, rate=(1/beta)), col='red')
legend('topleft', legend=c('Weibull', 'k*Exponential'), text.col=c('blue', 'red'))

As we can see, the red Exponential curve is greater than or equal to the blue Weibull curve for all values of \(\theta\).

Now we will use Rejection Sampling to try to sample from the Weibull.

Our algorithm will be the following:

1.) Generate \(Y \sim Exponential(\text{rate} = 1/\beta)\)
2.) Generate \(U \sim Unif(0,1)\)
3a.) Reject \(Y\) if \(U > \frac{f_X(Y)}{e(Y)}\), where \(f_X(\cdot)\) is the Weibull density and \(e(\cdot)\) is the envelope, i.e., \(k\) times the Exponential density
3b.) Otherwise accept \(Y\) and set \(\theta = Y\)

alpha = 2
beta = 0.5
k = alpha

n = 10000
U = runif(n)
Y = rexp(n, rate=1/beta)
fy = dweibull(Y, shape=alpha, scale=beta)
ey = k * dexp(Y, rate=1/beta)
theta.accepted = Y[U <= (fy/ey)]
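
Since the envelope is \(k\) times a density, the expected acceptance probability of this rejection sampler is \(1/k = 0.5\); a quick empirical check (an added aside):

#fraction of accepted proposals; expected to be close to 1/k = 0.5
length(theta.accepted)/n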

#plot in ggplot
data = data.frame(theta = theta.accepted)
p <- ggplot(data, aes(theta)) + 
  geom_histogram(aes(y=stat(density))) + 
  stat_function(
    fun = dweibull, 
    args=list(shape=2, scale=0.5), 
    lwd = 2, 
    col = 'red'
  )
p
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Additional Problem 2.)

Consider the Normal model with reference prior:

\[ y_i \sim N(\mu,\sigma^2), \quad i=1,...,n\\ p(\mu,\sigma^2) \propto (\sigma^2)^{-1} \]

A.) Write out the algorithm for obtaining the Monte Carlo samples from the joint posterior

We use results from Lecture Slides 3 for the distributions of \(\sigma^2|y\) and \((\mu|\sigma^2,y)\).

Step 1.) Simulate ‘data’, i.e. simulate n data points \(y_i\) \(i = 1,..n\) from some Normal Distribution.

Step 2.) Calculate the sample variance \(s^2\) and sample mean \(\bar{y}\) from our \(n\) data points.

Step 3.) Since we know that \(\sigma^2|y \sim Sc.Inv.\chi^2(n-1,s^2)\), we will draw \(B\) random values from this distribution.

Step 4.) We also know that \((\mu|\sigma^2,y) \sim Normal(\bar{y}, \frac{\sigma^2}{n})\). Thus, for each of the \(B\) \(\sigma^2\) values drawn in Step 3, we will draw a random value \(\mu\) from the \(Normal(\bar{y}, \frac{\sigma^2}{n})\).

After Step 4.) we will have \(B\) points drawn from the joint posterior \((\mu, \sigma^2) \sim p(\mu,\sigma^2|y)\)

B.) Simulate 50 values from a normal population of your choice.

n <- 50
y <- rnorm(n, mean=2, sd=1)

C.) Implement MC Algorithm

#Step 2
s <- sd(y)
s.2 <- s^2
y.bar <- mean(y)

#Step 3
B <- 2000
sig2 <- rinvchisq(B, df=n-1, scale=s.2)

#Step 4
mu <- rnorm(B, mean=y.bar, sd=sqrt(sig2/n))

Let’s make ggplot of our joint posterior distribution

post <- data.frame(cbind(mu, sig2))
p <- ggplot(post, aes(x=mu, y=sig2)) + geom_point() + geom_density_2d() + ggtitle('Joint Posterior Distribution')
p

Marginal Posterior for sigma^2 given data

Let’s examine the marginal posterior sample distribution for \(\sigma^2|y\). Recall that the theoretical marginal posterior distribution is \(\sigma^2|y \sim Sc.Inv.\chi^2(n-1,s^2)\).

data = data.frame(sigma.squared.MC = sig2)
p <- ggplot(data, aes(x=sigma.squared.MC)) + ggtitle('2000 MC Samples from Sigma^2') +
  geom_histogram(aes(y=stat(density))) + 
  stat_function(
    fun = dinvchisq, 
    args=list(df=(n-1), scale=s.2), 
    lwd = 2, 
    col = 'red'
  )
p
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

We can see that our posterior marginal sample distribution is similar to that of the theoretical posterior marginal distribution.

Let’s try to find the MAP estimate for \(\sigma^2|y\). By definition, the mode of the posterior distribution is the MAP estimate.

Mode <- function(x) {
  #empirical mode: the most frequently occurring value in x
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
map.sig2 <- Mode(sig2)
map.sig2
## [1] 1.060255
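
Because the Monte Carlo draws are continuous, essentially every sampled value is unique, so the frequency-based Mode() above effectively returns the first draw rather than a meaningful mode. An alternative (an added sketch, assuming the default bandwidth of density() is adequate) is to take the mode of a kernel density estimate; the same estimator could also be applied to the \(\mu\) draws below.

#alternative: mode of a kernel density estimate of the draws
mode.density <- function(x) {
  d <- density(x)
  d$x[which.max(d$y)]
}
mode.density(sig2)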

Recall that our ‘population’ values were drawn from a \(Normal(\mu=2, \sigma^2 = 1)\). Thus, our MAP estimate \((\hat{\sigma}^2|y)_{\text{MAP}}\) is not too far off from the true population parameter.

Marginal Posterior for mu given data

We know (from Lecture 3 Slides) that the marginal posterior of \(\mu|y \sim t_{n-1}(\bar{y}, \frac{s^2}{n})\). In other words, the marginal posterior of \(\mu|y\) is a scaled and shifted t distribution.

data = data.frame(mu.MC = mu)
p <- ggplot(data, aes(x=mu.MC)) + ggtitle('2000 MC Samples for Mu') + 
  geom_histogram(aes(y=stat(density))) + 
  stat_function(
    fun = dt.scaled, 
    args=list(df=(n-1), mean=y.bar, sd = sqrt(s.2/n)), 
    lwd = 2, 
    col = 'red'
  )
p
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

We can see that our posterior marginal sample distribution is similar to that of the theoretical posterior marginal distribution.

Let’s try to find the MAP estimate for \(\mu|y\). By definition, the mode of the posterior distribution is the MAP estimate.

map.mu<- Mode(mu)
map.mu
## [1] 1.627319

Recall that our ‘population’ values were drawn from a \(Normal(\mu=2, \sigma^2 = 1)\). Thus, our MAP estimate \((\hat{\mu}|y)_{\text{MAP}}\) is not too far off from the true population parameter.

D.) Obtain a Monte Carlo approximation of the posterior predictive distribution.

Recall our prior and posterior distributions.

We originally sampled \(y_i\) from a \(Normal(\mu=2, \sigma^2 = 1)\) distribution.

\[ y_i \sim N(\mu=2,\sigma^2=1), \quad i=1,...,n\\ \sigma^2|y \sim Sc.Inv.\chi^2(n-1,s^2)\\ \mu|\sigma^2,y \sim Normal(\bar{y}, \frac{\sigma^2}{n})\\ \tilde{y}|\mu, \sigma^2, y \sim N(\mu,\sigma^2) \]

Where \(n\) is equal to the sample size and \(s^2\) is the sample variance, and \(\bar{y}\) is the sample mean.

Let us now use a Monte Carlo approximation of the posterior predictive distribution.

#Monte Carlo Posterior Predictive
sig2.data = sig2
mu.data = mu

y.new <- rep(-10, length(sig2.data))
for(i in 1:length(sig2.data)) {
  y.new[i] <- rnorm(1, mean = mu.data[i], sd = sqrt(sig2.data[i]))
}

hist(y.new, freq=FALSE, xlab='y.new', main='MC Posterior Predictive Distribution')
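
As an added note, rnorm() is vectorized over its mean and sd arguments, so the loop above could equivalently be written in one line:

#equivalent vectorized draws from the posterior predictive
y.new.vec <- rnorm(length(sig2.data), mean = mu.data, sd = sqrt(sig2.data))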

Let’s now calculate the theoretical posterior predictive distribution.

Recall that, conditional on the parameters, \(\tilde{y}|\mu_{\text{post}}, \sigma^2_{\text{post}} \sim N(\mu_{\text{post}}, \sigma^2_{\text{post}})\). It remains to find the mean and variance of the posterior predictive distribution.

Let us first compute Expected Value.

\[ \begin{align} E[\tilde{Y}|Y] &= E[E[\tilde{Y}|\mu_{\text{post}}, Y]|Y]\\ &= E[\mu_{\text{post}} | Y]\\ & = \bar{y} \end{align} \]

The above is true because \(\mu_{\text{post}}|\sigma^2_{\text{post}}, Y \sim N(\bar{y}, \frac{\sigma^2_{\text{post}}}{n})\), so \(E[\mu_{\text{post}}|Y] = \bar{y}\) regardless of \(\sigma^2_{\text{post}}\).

Now let us focus on \(Var(\tilde{y}|Y)\). Again, we condition on our observed data \(Y\). We will also use the fact that the mean of a \(Sc.Inv.\chi^2(n-1,s^2)\) distribution is \(\frac{(n-1)s^2}{(n-1)-2} = \frac{(n-1)s^2}{n-3}\).

\[ \begin{align} Var[\tilde{Y}|Y] &= E[Var(\tilde{Y}|\mu_{\text{post}}, \sigma^2_{\text{post}}, Y) | Y] + Var[E(\tilde{Y}|\mu_{\text{post}}, \sigma^2_{\text{post}}, Y)|Y]\\ &= E[\sigma^2_{\text{post}}|Y] + Var[\mu_{\text{post}}|Y]\\ &= E[\sigma^2_{\text{post}}|Y] + E[Var(\mu_{\text{post}}|\sigma^2_{\text{post}},Y)|Y] + Var[E(\mu_{\text{post}}|\sigma^2_{\text{post}},Y)|Y]\\ &= E[\sigma^2_{\text{post}}|Y] + E[\frac{\sigma^2_{\text{post}}}{n}|Y] + 0\\ &= (1 + \frac{1}{n}) E[Sc.Inv.\chi^2(n-1,s^2)]\\ &= (1 + \frac{1}{n})\frac{(n-1)s^2}{n-3} \end{align} \]

In summary, the posterior predictive distribution has mean \(\bar{y}\) and variance \((1 + \frac{1}{n})\frac{(n-1)s^2}{n-3}\). (The exact posterior predictive is the scaled and shifted t distribution \(\tilde{y}|y \sim t_{n-1}(\bar{y}, (1+\frac{1}{n})s^2)\), which has exactly these two moments; below we overlay a Normal with matching moments.)

\[ p(\tilde{y}|y) \approx Normal(\mu = \bar{y},\ \sigma^2 = (1 + \frac{1}{n})\frac{(n-1)s^2}{n-3}) \]

Let’s calculate these in R and then overlay the theoretical posterior predictive distribution on our MC posterior predictive distribution.

mu.new = y.bar
sig2.new = (1 + 1/n) * (n-1) * s.2 / (n-3)

data = data.frame(y.new = y.new)
p <- ggplot(data, aes(y.new)) + ggtitle('2000 MC Samples Y.new with theoretical overlay') + 
  geom_histogram(aes(y=stat(density))) + 
  stat_function(
    fun = dnorm, 
    args=list(mean = mu.new, sd = sqrt(sig2.new)), 
    lwd = 2, 
    col = 'red'
  )
p
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
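
As a quick added check, we can compare the sample moments of the MC predictive draws with the theoretical values just computed:

#MC vs theoretical mean and variance of the posterior predictive
c(mc.mean = mean(y.new), theory.mean = mu.new)
c(mc.var = var(y.new), theory.var = sig2.new)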

E.) Apply Gibbs Sampling to this Model

Yes, Gibbs sampling can be applied to this model.

Here are the posterior conditionals:

\[ \mu|\sigma^2,y \sim N(\bar{y}, \frac{\sigma^2}{n})\\ \sigma^2|\mu,y \sim Sc.Inv.\chi^2(n,s_{\mu}^2), \quad \text{where } s_{\mu}^2 = \frac{1}{n}\sum_{i=1}^n (y_i - \mu)^2 \]

The posterior conditional for \(\mu\) was already given above. The posterior conditional for \(\sigma^2|\mu\) can be understood intuitively: once \(\mu\) is known, we no longer spend a degree of freedom on estimating it with \(\bar{y}\), so the degrees of freedom become \(n\), and the scale becomes the average squared deviation about \(\mu\) rather than about \(\bar{y}\).
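
For completeness, this conditional can be verified directly from the joint posterior:

\[ \begin{align} p(\sigma^2|\mu,y) &\propto p(\mu,\sigma^2) p(y|\mu,\sigma^2)\\ &\propto (\sigma^2)^{-1} \prod_{i=1}^n (\sigma^2)^{-1/2} \exp(-\frac{(y_i-\mu)^2}{2\sigma^2})\\ &\propto (\sigma^2)^{-(\frac{n}{2}+1)} \exp(-\frac{n s_{\mu}^2}{2\sigma^2}) \end{align} \]

which is the kernel of a \(Sc.Inv.\chi^2(n, s_{\mu}^2)\) density.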

Here is our algorithm:

1.) Start with initial value for \(\mu^{(1)} = \bar{y}\). Then,

2.) For \(b=2,...,B\), we will sample from the two posterior conditional distributions.

\[ \sigma^{2(b)} \sim Sc.Inv.\chi^2(n, s_{\mu^{(b-1)}}^2)\\ \mu^{(b)} \sim Normal(\bar{y}, \frac{\sigma^{2(b)}}{n}) \]

Let us now implement the Gibbs sampler in R.

# y; y.bar; n; s.2; 
B = 2000
mu.post.gibbs <- rep(-10, B+1)
sig2.post.gibbs <- rep(-10,B)

#start mu with y.bar
mu.post.gibbs[1] <- y.bar

for(b in 2:(B+1)) {
  scale.val <- sum((y - mu.post.gibbs[b-1])^2)/n
  sig2.post.gibbs[b-1] <- rinvchisq(1, df = n, scale = scale.val) #sig2|mu
  
  
  mu.post.gibbs[b] <- rnorm(1, y.bar, sd = sqrt(sig2.post.gibbs[b-1]/n)) #mu|sig2
}

par(mfrow=c(1,2))
plot(sig2.post.gibbs, type="l", main = "Trace Plot of Sigma2")
plot(mu.post.gibbs, type = "l", main = "Trace Plot of Mu")
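
As an added diagnostic (base R only), we can also look at the autocorrelation of each chain; for this two-block Gibbs sampler the draws typically show very little autocorrelation:

#autocorrelation functions of the two Gibbs chains
par(mfrow=c(1,2))
acf(sig2.post.gibbs, main = 'ACF of Sigma2 Chain')
acf(mu.post.gibbs, main = 'ACF of Mu Chain')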

Let’s discard the first 200 samples of each chain (\(\mu\) and \(\sigma^2\)) as ‘burn-in’.

We will now plot those histograms.

data = data.frame(mu.Gibbs = mu.post.gibbs[200:length(mu.post.gibbs)])
p <- ggplot(data, aes(mu.Gibbs)) + ggtitle('200 burned - 1,800 Gibbs Sampled Mu.Post') + 
  geom_histogram(aes(y=stat(density))) + 
  stat_function(
    fun = dt.scaled, 
    args=list(df=(n-1), mean=y.bar, sd = sqrt(s.2/n)), 
    lwd = 2, 
    col = 'red'
  )
p
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

#posterior sigma^2 histogram
data = data.frame(sigma.squared.Gibbs = sig2.post.gibbs[200:length(sig2.post.gibbs)])
p <- ggplot(data, aes(sigma.squared.Gibbs)) + ggtitle('200 burned - 1,800 Gibbs Sampled Sig2.Post') +
  geom_histogram(aes(y=stat(density))) + 
  stat_function(
    fun = dinvchisq, 
    args=list(df=(n-1), scale=s.2), 
    lwd = 2, 
    col = 'red'
  )
p
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

It appears that our Gibbs samples match the theoretical marginal posterior distributions pretty well.