Libraries Used

library(invgamma)
library(reshape2)
library(ggplot2)
library(MASS)
library(mvtnorm)

Problem 1a.) Obtain the normalizing constant for the likelihood in terms of \(\theta\).

The likelihood of \(y|\theta\) is the following:

\[ p(y|\theta) = \frac{\alpha^n}{\theta^n} (\prod_{i=1}^n y_i)^{\alpha-1} \exp(-\frac{1}{\theta} \sum_{i=1}^n y_i^{\alpha}) \]

The normalizing constant (i.e. all the terms that are not dependent on \(\theta\)) is

\[ \alpha^n (\prod_{i=1}^n y_i)^{\alpha-1} \]

Problem 1b.)

#data
y <- c(6.5, 13.0, 8.5, 8.0, 5.5, 15.5, 9.5, 14.0, 13.5)
n <- length(y)

We have the following model set up:

\[ \text{Prior: } \theta \sim InvGamma(a_0 = 2, b_0 = 1)\\ \text{Likelihood: } y_i|\theta \sim Weibull(\alpha=2, \beta=\theta^{\frac{1}{\alpha}})\\ \implies p(y|\theta) = \frac{\alpha^n}{\theta^n} (\prod_{i=1}^n y_i)^{\alpha-1} \exp(-\frac{1}{\theta} \sum_{i=1}^n y_i^{\alpha})\\ \text{Posterior: } \theta|y \sim InvGamma(a_0 + n = 2+9, b_0 + \sum_{i=1}^n y_i^{\alpha} = 1 + 1086.5) \]
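For completeness, here is the conjugacy step behind this posterior: multiplying the inverse-gamma prior kernel by the likelihood from Problem 1a (dropping factors that do not involve \(\theta\)) gives

\[ p(\theta|y) \propto \theta^{-(a_0+1)} e^{-b_0/\theta} \cdot \theta^{-n} \exp\Big(-\frac{1}{\theta}\sum_{i=1}^n y_i^{\alpha}\Big) = \theta^{-(a_0+n+1)} \exp\Big(-\frac{1}{\theta}\Big(b_0 + \sum_{i=1}^n y_i^{\alpha}\Big)\Big), \]

which is the kernel of an \(InvGamma(a_0+n,\ b_0+\sum_{i=1}^n y_i^{\alpha})\) distribution.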

a.0 = 2
b.0 = 1
alpha = 2
sum.y.alpha <- sum(y^alpha)

theta.vals <- seq(0.01,150, length.out=1000)

prior.dist <- dinvgamma(theta.vals, shape=a.0, rate=b.0)

post.dist <- dinvgamma(theta.vals, shape=(a.0 + n), rate=(b.0 + sum.y.alpha))

#likelihood of the Weibull data as a function of theta (alpha treated as known)
like.theta.func <- function(data.y, theta, alpha) {
  n <- length(data.y)
  alpha.n <- alpha^n
  prod.y <- prod(data.y)^(alpha-1)
  sum.y.alpha <- sum(data.y^alpha)
  val <- alpha.n * prod.y * theta^(-n) * exp((-1/theta) * sum.y.alpha)
  return(val)
}

likelihood.theta <- like.theta.func(data.y = y, theta = theta.vals, alpha=2)

gg.df <- data.frame(theta = theta.vals, prior = prior.dist, likelihood = likelihood.theta, posterior = post.dist)

melted.gg.df <- melt(gg.df, id.vars='theta', variable.name='z')

p <- ggplot(data=melted.gg.df, aes(theta, value)) + geom_line(aes(colour=z)) + ggtitle('Prior, Likelihood, and Posterior')
p
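As a quick sanity check (not part of the problem), the likelihood function above should agree with the product of built-in Weibull densities under the parameterization \(\beta = \theta^{1/\alpha}\); a minimal sketch, evaluated at an arbitrary \(\theta\):

#sanity check (illustrative): like.theta.func should match the product of
#dweibull densities with scale = theta^(1/alpha), since theta = beta^alpha
theta.check <- 100
like.theta.func(data.y = y, theta = theta.check, alpha = 2)
prod(dweibull(y, shape = 2, scale = theta.check^(1/2)))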

Both the likelihood and the posterior density are essentially zero over \(\theta \in (0,5)\), which is exactly where most of the prior's mass is located.

The likelihood and posterior have most of their mass roughly between 50 and 200, as shown in the graphs below. The vertical line in the posterior plot is the MAP estimate and the vertical line in the likelihood plot is the MLE.

With this in mind, the posterior is predominantly influenced by the likelihood, since the prior places almost no mass in the region where the likelihood (and hence the posterior) is concentrated.

theta.vals <- seq(50, 200, length.out=1000)

post.dist <- dinvgamma(theta.vals, shape=(a.0 + n), rate=(b.0 + sum.y.alpha))
gg.df.posterior <- data.frame(theta = theta.vals, posterior = post.dist)
gg.post <- ggplot(gg.df.posterior, aes(theta, posterior)) + geom_line() + geom_vline(xintercept = (b.0 + sum.y.alpha)/(a.0 + n + 1)) + ggtitle('Posterior')

likelihood.theta <- like.theta.func(data.y = y, theta = theta.vals, alpha=2)
gg.df.likelihood <- data.frame(theta = theta.vals, likelihood = likelihood.theta)
gg.like <- ggplot(gg.df.likelihood, aes(theta, likelihood)) + geom_line() + geom_vline(xintercept = (1/n)*sum.y.alpha) + ggtitle('Likelihood')

#note: par(mfrow) does not affect ggplot objects, so the two plots are printed separately
gg.post

gg.like

Problem 1c.) Comparing point estimates of \(\theta\)
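For reference, the point-estimate formulas used below follow directly from the likelihood and the inverse-gamma form: setting the derivative of the log-likelihood to zero gives the MLE, and the mode of an \(InvGamma(a, b)\) density is \(b/(a+1)\), which yields both the prior mode and the MAP.

\[ \frac{\partial}{\partial\theta} \log p(y|\theta) = -\frac{n}{\theta} + \frac{1}{\theta^2}\sum_{i=1}^n y_i^{\alpha} = 0 \implies \hat{\theta}_{\text{MLE}} = \frac{1}{n}\sum_{i=1}^n y_i^{\alpha}, \qquad \hat{\theta}_{\text{MAP}} = \frac{b_0 + \sum_{i=1}^n y_i^{\alpha}}{a_0 + n + 1}, \qquad \hat{\theta}_{\text{prior mode}} = \frac{b_0}{a_0 + 1} \]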

theta.MLE = (1/n)*(sum.y.alpha)
theta.MAP = (b.0 + sum.y.alpha)/(a.0 + n + 1)

theta.prior.mode = (b.0)/(a.0 + 1)

print(paste("theta.MLE: ", theta.MLE))
## [1] "theta.MLE:  120.722222222222"
print(paste("theta.MAP: ", theta.MAP))
## [1] "theta.MAP:  90.625"
print(paste("theta.prior.mode: ", theta.prior.max))
## [1] "theta.prior.mode:  0.333333333333333"

As we can see, \(\hat{\theta}_{\text{MAP}}\) is much closer to the MLE than it is to the mode of the prior. This validates our hunch that the posterior depends more on the likelihood than it does the prior.

Problem 1d.) 95% Posterior Credible Intervals

We draw 2000 samples from the \(InvGamma(a_0 + n,\ b_0 + \sum_{i=1}^n y_i^{\alpha})\) posterior.

In our case we have

\[ a_0 = 2\\ b_0 = 1\\ \alpha = 2\\ n = 9\\ \sum_{i=1}^n y_i^{\alpha} = 1086.5 \]

set.seed(10)
num.sample <- 2000
theta.post.samples <- rinvgamma(n=num.sample, shape=a.0 + n, rate=b.0 + sum.y.alpha)
interval.lower = quantile(theta.post.samples, 0.025)
interval.upper = quantile(theta.post.samples, 0.975)

df = data.frame(theta.post.samples = theta.post.samples)
p <- ggplot(df, aes(x = theta.post.samples)) + geom_histogram() + geom_vline(xintercept = interval.lower, color='blue') + geom_vline(xintercept = interval.upper, color = 'blue')
p
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The 95% posterior credible interval for \(\theta\) is shown by the vertical blue lines above; numerically it is

c(interval.lower, interval.upper)
##      2.5%     97.5% 
##  59.46666 197.37142

To find the 95% credible interval for \(\beta\) we apply the transformation between \(\theta\) and \(\beta\). In other words,

\[ \theta = \beta^{\alpha} \implies \beta = \theta^{\frac{1}{\alpha}} \]

Since \(\alpha = 2\), we have

\[ \beta = \theta^{\frac{1}{2}} \]

Thus, we apply the transformation to each of our \(\theta|y\) samples to obtain \(\beta|y\) samples.

beta.post.samples <- theta.post.samples^(1/alpha)
interval.beta.lower <- quantile(beta.post.samples, 0.025)
interval.beta.upper <- quantile(beta.post.samples, 0.975)

df = data.frame(beta.post.samples = beta.post.samples)
p <- ggplot(df, aes(x = beta.post.samples)) + geom_histogram() + geom_vline(xintercept = interval.beta.lower, color='blue') + geom_vline(xintercept = interval.beta.upper, color = 'blue')
p
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The 95% posterior credible interval for \(\beta\) is shown by the vertical blue lines above; numerically it is

c(interval.beta.lower, interval.beta.upper)
##      2.5%     97.5% 
##  7.711463 14.048893
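Since the square root is a monotone increasing transformation, the \(\beta\) interval can also be obtained by transforming the endpoints of the \(\theta\) interval directly; a quick check (up to the interpolation used by quantile()):

#monotone-transformation check (illustrative): sqrt of the theta interval
#endpoints should essentially reproduce the beta interval above
sqrt(c(interval.lower, interval.upper))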

Problem 1e.) Prior Predictive and Posterior Predictive

Prior Predictive Distribution

We will repeat the following procedure many times:

1.) Take one sample from each of the prior(s)

2.) Given that draw of the parameter, simulate a new observation from the Weibull data model; repeating this yields draws from the prior predictive distribution.

For this model we have

\[ \theta_{\text{prior}} \sim InvGamma(a_0 =2, b_0 = 1)\\ y_i|\theta \sim Weibull(\alpha=2, \beta)\\ \theta = \beta^{\alpha} \implies \beta=\theta^{1/2} \]

B <- 2000
theta.prior <- rinvgamma(B, shape = a.0, rate = b.0)
beta.prior = theta.prior^(1/alpha)
y.prior <- rweibull(B, shape = alpha, scale = beta.prior)

df <- data.frame(y.prior.predictive = y.prior)

y.prior.mean = mean(y.prior)
y.prior.lower = quantile(y.prior, 0.025)
y.prior.upper = quantile(y.prior, 0.975)

p <- ggplot(df, aes(x=y.prior.predictive)) +
  geom_histogram(aes(y=..density..)) +  # scale histogram y
  geom_density(col = "red") + ggtitle('Empirical Density Estimate Plot - Prior Predictive Distribution') + geom_vline(xintercept = y.prior.lower, color='blue') + geom_vline(xintercept = y.prior.upper, color = 'blue') + geom_vline(xintercept=y.prior.mean, color='green')
p
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Here are the numerical results for the prior predictive distribution mean and 95% credible interval.

result.estimate.names <- c("Mean", "95.Lower", "95.Upper")
result.estimates <- c(y.prior.mean, as.numeric(y.prior.lower),as.numeric(y.prior.upper))
                      
results.df = data.frame(result.estimate.names, result.estimates)
colnames(results.df) = c("Point.Estimates", "Values")
results.df
##   Point.Estimates    Values
## 1            Mean 0.7784833
## 2        95.Lower 0.1155363
## 3        95.Upper 2.2406217

The point estimates are shown in the data frame printout above.

Posterior Predictive Distribution

We will repeat the following procedure many times:

1.) Take one sample from each of the posterior(s). In our case, \(\theta|y\)

2.) Given that posterior draw, simulate a new observation from the Weibull data model; repeating this yields draws from the posterior predictive distribution.

For this model we have

\[ \theta_{\text{post}}|y \sim InvGamma(a_0 + n, b_0 + \sum_{i=1}^n y_i^{\alpha})\\ y_i|\theta \sim Weibull(\alpha=2, \beta)\\ \theta = \beta^{\alpha} \implies \beta=\theta^{1/2} \]

B <- 2000
theta.post <- rinvgamma(B, shape = a.0 + n, rate = b.0+ sum.y.alpha)
beta.post = theta.post^(1/alpha)
y.posterior <- rweibull(B, shape = alpha, scale = beta.post)

df <- data.frame(y.posterior.predictive = y.posterior)

y.posterior.mean = mean(y.posterior)
y.posterior.lower = quantile(y.posterior, 0.025)
y.posterior.upper = quantile(y.posterior, 0.975)

p <- ggplot(df, aes(x=y.posterior.predictive)) +
  geom_histogram(aes(y=..density..)) +  # scale histogram y
  geom_density(col = "red") + ggtitle('Empirical Density Estimate Plot - Posterior Predictive Distribution') + geom_vline(xintercept = y.posterior.lower, color='blue') + geom_vline(xintercept = y.posterior.upper, color = 'blue') + geom_vline(xintercept=y.posterior.mean, color='green')
p
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

result.estimate.names <- c("Mean", "95.Lower", "95.Upper")
result.estimates <- c(y.posterior.mean, as.numeric(y.posterior.lower),as.numeric(y.posterior.upper))
                      
results.df = data.frame(result.estimate.names, result.estimates)
colnames(results.df) = c("Point.Estimates", "Values")
results.df
##   Point.Estimates    Values
## 1            Mean  9.129641
## 2        95.Lower  1.402510
## 3        95.Upper 20.580356

Comparison of Prior and Posterior Predictive Distributions

The prior predictive distribution is not consistent with the observed data: it relies solely on the prior for \(\theta\) and ignores the observed sample, so nearly all of its mass sits well below the observed \(y\) values. The posterior predictive distribution, by contrast, conditions on the observed data and therefore gives a much more realistic picture of what a new observation \(y_{new}\) might look like.
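To make this concrete, here is a quick illustrative comparison of the observed data with the two 95% predictive intervals computed above:

#observed data versus the two 95% predictive intervals computed above
range(y)                                 #observed y values: 5.5 to 15.5
c(y.prior.lower, y.prior.upper)          #prior predictive interval, roughly (0.12, 2.24)
c(y.posterior.lower, y.posterior.upper)  #posterior predictive interval, roughly (1.4, 20.6)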

Problem 2a.) Derive the Jeffreys prior for the model & derive the posterior of \(\theta\) under this prior.

We first derive the Jeffreys prior.

By definition, the Jeffreys prior is proportional to the square root of the Fisher information. In other words,

\(g(\theta) \propto \sqrt{-E_y\big[\frac{\partial^2}{\partial\theta^2} \log f(y|\theta)\big]}\)

Thus, we calculate the Fisher information. \[ \begin{align} \sqrt{-E_y\Big[\frac{\partial^2}{\partial\theta^2} \log f(y|\theta)\Big]} &= \sqrt{-E_y\Big[\frac{\partial^2}{\partial\theta^2}\log\Big(\frac{e^{-\theta}\theta^y}{y!}\Big)\Big]} \\ &= \sqrt{-E_y\Big[\frac{-y}{\theta^2}\Big]}\\ &= \sqrt{\frac{\theta}{\theta^2}}\\ &= \sqrt{\frac{1}{\theta}} \end{align} \]

Thus, the Jeffreys prior for this model is

\[ g(\theta) \propto \sqrt{\frac{1}{\theta}} \]
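A small Monte Carlo sanity check of the Fisher information calculation (illustrative, using an arbitrary value of \(\theta\)):

#Monte Carlo check (illustrative): for the Poisson model,
#-E[d^2/dtheta^2 log f(y|theta)] = E[y]/theta^2, which should equal 1/theta
set.seed(1)
theta.check <- 4
y.sim <- rpois(1e5, lambda = theta.check)
c(monte.carlo = mean(y.sim / theta.check^2), analytic = 1 / theta.check)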

Now, given this prior and our data model \(y_i \sim Poisson(\theta)\), we will derive the posterior.

\[ \begin{align} p(\theta|y) &\propto p(\theta)p(y|\theta) \\ &= (\theta^{-1/2}) \prod_{i=1}^n \frac{(e^{-\theta}) \theta^{y_i}}{y_i!}\\ &= \theta^{\sum_{i=1}^n y_i - \frac{1}{2}} e^{-n\theta}\\ &= \theta^{\sum_{i=1}^n y_i + \frac{1}{2} - 1} e^{-n\theta}\\ & \sim Gamma(\alpha=\sum_{i=1}^n y_i+ \frac{1}{2}, \beta=n) \end{align} \]
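As a numerical sanity check of this conjugacy (illustrative only; it uses a small hypothetical count vector, since the accident data are not introduced until the next part), the unnormalized prior \(\times\) likelihood should be proportional to the \(Gamma(\sum_{i=1}^n y_i + \frac{1}{2},\ n)\) density across \(\theta\):

#conjugacy check (illustrative) with a small hypothetical count vector
y.toy <- c(3, 5, 4)
theta.grid.check <- seq(1, 10, length.out = 5)
unnorm <- sapply(theta.grid.check, function(th) th^(-0.5) * prod(dpois(y.toy, lambda = th)))
closed <- dgamma(theta.grid.check, shape = sum(y.toy) + 0.5, rate = length(y.toy))
unnorm / closed   #constant up to numerical error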

Problem 2b.) Sample from the posterior. Provide density plots of your samples as well as the 95% posterior credible interval and the MAP estimate of \(\theta|y\)

Our full model:

\[ \text{Prior: } p(\theta) \propto \theta^{-\frac{1}{2}}\\ \text{Data: } y_i|\theta \sim Poisson(\theta)\\ \text{Likelihood: } p(y|\theta) = \prod_{i=1}^n \frac{(e^{-\theta}) \theta^{y_i}}{y_i!}\\ \text{Posterior: } \theta|y \sim Gamma(\alpha=\sum_{i=1}^n y_i+ \frac{1}{2}, \beta=n) \]

#input the data
y.accidents <- c(24, 25, 31, 31, 22, 21, 26, 20, 16, 22)
n <- length(y.accidents)
time.seq <- c(1:n)
data <- data.frame(t = time.seq, y = y.accidents)

#sample from the posterior
B = 2000
theta.post <- rgamma(B, shape = sum(data$y) + 0.5, rate = n)

#histogram and empirical density plot
gg.df <- data.frame(theta.post = theta.post)

theta.post.MAP = (sum(data$y) + 0.5 - 1)/n
theta.post.lower = quantile(theta.post, 0.025)
theta.post.upper = quantile(theta.post, 0.975)

p <- ggplot(gg.df, aes(x=theta.post)) +
  geom_histogram(aes(y=..density..)) +  # scale histogram y
  geom_density(col = "red") + ggtitle('Empirical Density Estimate Plot - Posterior Distribution') + geom_vline(xintercept = theta.post.lower, color='blue') + geom_vline(xintercept = theta.post.upper, color = 'blue') + geom_vline(xintercept=theta.post.MAP, color='green')
p
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

We can see the 95% credible interval (blue lines) and \(\hat{\theta}_{\text{MAP}}\) (green line) in the plot above. The numerical estimates are provided in the data frame output below.

result.estimate.names <- c("MAP", "95.Lower", "95.Upper")
result.estimates <- c(theta.post.MAP, as.numeric(theta.post.lower),as.numeric(theta.post.upper))
                      
results.df = data.frame(result.estimate.names, result.estimates)
colnames(results.df) = c("Point.Estimates", "Values")
results.df
##   Point.Estimates   Values
## 1             MAP 23.75000
## 2        95.Lower 20.97104
## 3        95.Upper 26.85076

Problem 2c.) Sample from the posterior predictive. Provide density plots of your samples as well as the 95% posterior credible interval and the MAP estimate of \(\theta|y\)

We will repeat the following procedure many times:

1.) Take one sample from each of the posterior(s). In our case, \(\theta|y\)

2.) Given that posterior draw, simulate a new observation from the Poisson data model; repeating this yields draws from the posterior predictive distribution.

For this model we have

\[ \theta_{\text{post}}|y \sim Gamma(\alpha=\sum_{i=1}^n y_i+ \frac{1}{2}, \beta=n)\\ y_i|\theta_{\text{post}} \sim Poisson(\theta_{\text{post}}) \]

B <- 2000
theta.post <- rgamma(B, shape = sum(data$y) + 0.5, rate = n)
y.posterior <- rpois(B, lambda=theta.post)

df <- data.frame(y.posterior.predictive = y.posterior)


z <- density(y.posterior)
y.map.estimate <-z$x[which.max(z$y)] 
y.posterior.lower.jeffrey = quantile(y.posterior, 0.025)
y.posterior.upper.jeffrey = quantile(y.posterior, 0.975)

p <- ggplot(df, aes(x=y.posterior.predictive)) +
  geom_histogram(aes(y=..density..)) +  # scale histogram y
  geom_density(col = "red") + ggtitle('Empirical Density Estimate Plot - Posterior Predictive Distribution') + geom_vline(xintercept = y.posterior.lower.jeffrey, color='blue') + geom_vline(xintercept = y.posterior.upper.jeffrey, color = 'blue') + geom_vline(xintercept=y.map.estimate, color='green')
p
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
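As a side note (not required by the problem), this Gamma–Poisson posterior predictive has a closed form: marginally, \(y_{\text{new}} \sim NegBinomial(\text{size} = \sum_{i=1}^n y_i + \frac{1}{2},\ \text{prob} = \frac{n}{n+1})\). A minimal sketch comparing it with the sampled draws above:

#closed-form check (illustrative): a Gamma(a, rate = b) mixture of Poissons is
#negative binomial with size = a and prob = b/(b + 1)
a.post <- sum(data$y) + 0.5
b.post <- n
y.nb <- rnbinom(B, size = a.post, prob = b.post / (b.post + 1))
summary(y.nb)          #should resemble summary(y.posterior)
summary(y.posterior)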

In the histogram above we can see the 95% credible interval and the MAP estimate for \(y_{\text{new}}\). The numerical point estimates are given in the console output below.

result.estimate.names <- c("y.post.MAP", "95.Lower", "95.Upper")
result.estimates <- c(y.map.estimate, as.numeric(y.posterior.lower.jeffrey), as.numeric(y.posterior.upper.jeffrey))

results.df = data.frame(result.estimate.names, result.estimates)
colnames(results.df) = c("Point.Estimates", "Values")
results.df
##   Point.Estimates   Values
## 1      y.post.MAP 23.70327
## 2        95.Lower 14.00000
## 3        95.Upper 35.00000

Thus, we infer that the number of fatal accidents in 1986 would likely (i.e. with 95% posterior probability) fall between 14 and 35, with the single most probable value around 24 (the MAP estimate).

Problem 2d.) Assume now that the number of fatal accidents in each year \(t\) independently follows a \(Poisson(\theta_t)\)

We choose the diffuse priors \(\alpha \sim Normal(0,10^2)\) and \(\beta \sim Normal(0,10^2)\) with \(Cov(\alpha, \beta) = Corr(\alpha,\beta)\sqrt{Var(\alpha)Var(\beta)}\), and we assume a priori that \(Corr(\alpha, \beta) = -0.5\). This prior choice was informed by an exploratory data analysis: in particular, a plot of \(t\) versus \(y\) and a non-Bayesian Poisson regression on the data.

y <- y.accidents
t <- c(1:length(y))
n <- length(y)
plot(t,y, main='# of Accidents over Time')

#non Bayesian estimates
mod.freq <- glm(y ~ t, family= poisson(link='log'), data = data)
summary(mod.freq)
## 
## Call:
## glm(formula = y ~ t, family = poisson(link = "log"), data = data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.0656  -0.4573  -0.3639   0.6877   1.1416  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  3.37691    0.13407  25.188   <2e-16 ***
## t           -0.03880    0.02265  -1.713   0.0867 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 8.3893  on 9  degrees of freedom
## Residual deviance: 5.4439  on 8  degrees of freedom
## AIC: 59.413
## 
## Number of Fisher Scoring iterations: 4

Here is the form of the joint posterior. As always, the posterior is proportional to \(\text{prior} \times \text{likelihood}\):

\[ \begin{align} p(\alpha, \beta|y,t) &\propto p(\alpha,\beta)\, p(y|\alpha, \beta,t)\\ &\propto N_2\big((\alpha,\beta)^T;\ \mu,\Sigma\big) \prod_{t=1}^{10} \frac{e^{-\theta_t} \theta_t^{y_t}}{y_t!}, \qquad \theta_t = \exp(\alpha + \beta t) \end{align} \]

In the above, \(\mu = (0,0)^{T}\) and \(\Sigma\) is the prior variance-covariance matrix specified earlier (variances of \(10^2\) and correlation \(-0.5\)).

A grid sampling algorithm for \(\alpha\) and \(\beta\) follows the bioassay example in Bayesian Data Analysis, Chapter 3, p. 76: (1) evaluate the unnormalized posterior on a grid of \((\alpha, \beta)\) values; (2a) draw \(\alpha\) from its marginal grid probabilities; (2b) draw \(\beta\) from the conditional grid probabilities given the sampled \(\alpha\); (2c) add uniform jitter within each grid cell so the draws are not restricted to the grid points.

We must also show that our posterior is proper, i.e., that it has a finite integral over \(\alpha \in (-\infty, \infty)\) and \(\beta \in (-\infty, \infty)\). Since we chose a proper prior distribution for \(p(\alpha, \beta)\) and the Poisson likelihood is bounded, the product integrates, so the posterior is proper.

Problem 2e.) Obtain 2000 independent posterior samples from \(p(\alpha, \beta \mid \text{data})\) and plot the joint and marginal posterior densities.

y <- y.accidents
t <- c(1:length(y))
n <- length(y)

#Diffuse Prior Parameters
mean <- c(0,0)
cov.ab <- -0.5*sqrt(100*100)
sigma.mat <- matrix(c(100, cov.ab, cov.ab, 100), nrow=2)

#unnormalized joint posterior (and its prior and likelihood factors) for one (alpha, beta) pair
post.link.diffuse.prior <- function(alpha, beta, y, t, mean, cov.sigma) {
  prior <- dmvnorm(x = c(alpha, beta), mean = mean, sigma = cov.sigma)
  theta <- exp(alpha + beta*t)
  likelihood <- prod(dpois(y, lambda = theta))
  posterior <- prior*likelihood
  
  list(posterior = posterior, prior = prior, likelihood = likelihood)
}
#alpha.grid.interval <- 0.05
alpha.grid <- seq(2.9, 3.9, length.out=100)
alpha.grid.interval = alpha.grid[2] - alpha.grid[1]
a.n <- length(alpha.grid)

#beta.grid.interval <- 0.05
beta.grid <- seq(-0.1, 0.1, length.out=100)
beta.grid.interval = beta.grid[2] - beta.grid[1]
b.n <- length(beta.grid)

posterior.grid <- prior.grid <- likelihood.grid <- matrix(0, nrow=a.n, ncol=b.n)
alpha.posterior <- NULL


for(i in 1:a.n) {
  for(j in 1:b.n) {
    post.prior.like.list <- post.link.diffuse.prior(alpha.grid[i], beta.grid[j], y, t, mean = mean, cov.sigma = sigma.mat)
    posterior.grid[i,j] <- post.prior.like.list$posterior
    prior.grid[i,j] <- post.prior.like.list$prior
    likelihood.grid[i,j] <- post.prior.like.list$likelihood
  }
}

#Extract alpha values from Grids
alpha.posterior <- apply(posterior.grid,1,sum)
alpha.prior <- apply(prior.grid,1,sum)
alpha.likelihood <- apply(likelihood.grid,1,sum)

#Normalizing Constants
c.joint.posterior <- sum(posterior.grid)
c.joint.prior <- sum(prior.grid)
c.joint.likelihood <- sum(likelihood.grid)

After setting up the grids, we run the sampling algorithm to obtain 2,000 draws of \(\alpha|y\) and \(\beta|y\).

boot.n <- 2000

#for posterior
place.a.post <- sample.int(a.n,boot.n,prob=alpha.posterior,replace=T)
alpha.posterior.sample <- alpha.grid[place.a.post]

#for prior
place.a.prior <- sample.int(a.n,boot.n,prob=alpha.prior,replace=T)
alpha.prior.sample <- alpha.grid[place.a.prior]

# For likelihood
place.a.lik <- sample.int(a.n,boot.n,prob=alpha.likelihood,replace=T)
alpha.likelihood.sample <- alpha.grid[place.a.lik]

### Initialize beta sample variables
beta.posterior.sample <- place.b.post <- numeric()
beta.prior.sample <- place.b.prior <- numeric()
beta.likelihood.sample <- place.b.lik <- numeric()


for(b in 1:boot.n) {
  #for posterior
  place.b.post[b] <- sample.int(b.n,1,prob=posterior.grid[place.a.post[b],],replace=T)
  beta.posterior.sample[b] <- beta.grid[place.b.post[b]]
  
  #for prior
  place.b.prior[b] <- sample.int(b.n,1,prob=prior.grid[place.a.prior[b],],replace=T)
  beta.prior.sample[b] <- beta.grid[place.b.prior[b]]
  
  #for likelihood
  place.b.lik[b] <- sample.int(b.n,1,prob=likelihood.grid[place.a.lik[b],],replace=T)
  beta.likelihood.sample[b] <- beta.grid[place.b.lik[b]]
}


#add uniform random jitter (within half a grid cell) to each sampled alpha and beta

#for posterior
alpha.posterior.sample <- alpha.posterior.sample + runif(boot.n,-alpha.grid.interval/2,alpha.grid.interval/2)

beta.posterior.sample <- beta.posterior.sample + runif(boot.n,-beta.grid.interval/2,beta.grid.interval/2)

#for prior
alpha.prior.sample <- alpha.prior.sample + runif(boot.n,-alpha.grid.interval/2,alpha.grid.interval/2)

beta.prior.sample <- beta.prior.sample + runif(boot.n,-beta.grid.interval/2,beta.grid.interval/2)

#for likelihood
alpha.likelihood.sample <- alpha.likelihood.sample + runif(boot.n,-alpha.grid.interval/2,alpha.grid.interval/2)

beta.likelihood.sample <- beta.likelihood.sample + runif(boot.n,-beta.grid.interval/2,beta.grid.interval/2)

Let’s plot our joint density and joint sample scatter plot.

#Sample Posterior Contour Plot
image(alpha.grid,beta.grid,posterior.grid/c.joint.posterior,col=topo.colors(20),main="Contour Plot of Joint Posterior",xlab="alpha",ylab="beta")
contour(alpha.grid,beta.grid,posterior.grid/c.joint.posterior,add=T)

#Sample Posterior Scatterplot
plot(alpha.posterior.sample,beta.posterior.sample,col=rgb(0.4,0.4,1,1),pch=20,ylim=c(min(beta.grid),max(beta.grid)),xlim=c(min(alpha.grid),max(alpha.grid)),main="Samples from Joint Posterior",xlab="alpha",ylab="beta")
contour(alpha.grid,beta.grid,posterior.grid/c.joint.posterior,add=T)

Let’s plot the marginal \(\alpha|data\) and marginal \(\beta|data\) densities.

par(mfrow = c(1,2))
#Density Plot of Alpha Posterior
z = density(alpha.posterior.sample)
alpha.posterior.MAP <- z$x[which.max(z$y)]
plot(density(alpha.posterior.sample),xlab="Alpha",main="Samples from Alpha Posterior",col="blue",lwd=4)
abline(v = alpha.posterior.MAP, col= 'red')

#Density Plot of Beta Posterior
z = density(beta.posterior.sample)
beta.posterior.MAP <- z$x[which.max(z$y)]
plot(density(beta.posterior.sample),xlab="Beta",main="Samples from Beta Posterior",col="blue",lwd=4)
abline(v = beta.posterior.MAP, col = 'red')

Now we need to obtain the MAP for the posterior rate of fatal accidents per year (i.e. \(\theta_t|data\)) at each year.

In order to do that we take our alpha posterior samples, our beta posterior samples, and then we linearly combine them through the link function for each year.

Recall that 1976 is equivalent to \(t = 1\), 1977 is equivalent to \(t = 2\), and so on until 1985 is equivalent to \(t=10\).

\[ \log(\theta_t) = \alpha + \beta t \\ \implies \theta_t = \exp(\alpha + \beta t) \]

\(\theta_1\) for year 1976

theta.1.samples <- exp(alpha.posterior.sample + beta.posterior.sample * 1)
z <- density(theta.1.samples)
theta.1.MAP <- z$x[which.max(z$y)]
theta.1.lower <- quantile(theta.1.samples, 0.025)
theta.1.upper <- quantile(theta.1.samples, 0.975)

hist(theta.1.samples, main = 'Histogram of theta.1 for year 1976')
abline(v = theta.1.MAP, col = 'green')
abline(v = theta.1.lower, col = 'blue')
abline(v = theta.1.upper, col = 'blue')

Now that we understand how to do this for year 1976 (\(t = 1\)), we iterate over all the years, store the estimates, and present the results as a data frame.

MAP and 95% Credible Intervals for \(\theta_1, \theta_2,..., \theta_{10}\)

#initialize empty vectors that we will replace with MAP estimates and lower/upper 95% interval vals
thetas.map <- rep(-10, n)
thetas.lower <- rep(-10, n)
thetas.upper <- rep(-10, n)

for(i in 1:n) {
  theta.i.samples <- exp(alpha.posterior.sample + beta.posterior.sample * i)
  z <- density(theta.i.samples)
  thetas.map[i] <- z$x[which.max(z$y)]
  thetas.lower[i] <- quantile(theta.i.samples, 0.025)
  thetas.upper[i] <- quantile(theta.i.samples, 0.975)
}

years <- seq(1976, 1985, by=1)
result.names <- c("Year", "theta.MAP", "theta.lower", "theta.upper")

results.df <- data.frame(Year = years, theta.MAP = thetas.map, lower = thetas.lower, upper = thetas.upper)
results.df
##    Year theta.MAP    lower    upper
## 1  1976  28.86767 22.60914 34.32612
## 2  1977  27.62521 22.50426 32.18724
## 3  1978  26.22379 22.27693 30.15652
## 4  1979  25.16400 21.81817 28.60752
## 5  1980  23.72595 21.18639 27.23491
## 6  1981  23.14223 20.27239 26.35955
## 7  1982  22.08827 19.09267 25.72170
## 8  1983  21.31231 17.79746 25.52426
## 9  1984  20.39069 16.57866 25.45441
## 10 1985  19.74419 15.35416 25.42362
melted.gg.df <- melt(results.df, id.vars='Year', variable.name='z')

p <- ggplot(data=melted.gg.df, aes(Year, value)) + geom_point(aes(colour=z)) + ggtitle('theta.MAP, theta.lower, and theta.upper') + scale_x_continuous("Year", labels = as.character(years), breaks = years)
p

As we can see in the console output of the data frame and the graph, \(\theta_t|data\) decreases over time. In the context of this problem, that implies that the rate of fatal accidents is decreasing over this ten year period.
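A quick supporting check of this claim (illustrative): the estimated posterior probability that the slope \(\beta\) is negative, computed directly from the grid samples.

#posterior probability that the slope beta is negative, from the grid samples above
mean(beta.posterior.sample < 0)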

Problem 2f.) Use your posterior samples of \(\alpha\) and \(\beta\) to predict the number of fatal accidents in the year 1986…

We will repeat the following procedure many times:

1.) Take one sample from each of the posterior(s). In our case, \(\alpha|data\) and \(\beta|data\).

2.) Using the link function, obtain \(\theta_t\). In our case, \(\theta_t|data\) for \(t = 11\) since 1986 is year 11.

3.) Given that draw of \(\theta_{11}\), simulate a new observation from the Poisson data model; repeating this yields draws from the posterior predictive distribution for 1986.

B <- 2000
theta.post.pred.11 <- exp(alpha.posterior.sample + beta.posterior.sample * 11)
y.post.pred.11 <- rpois(B, lambda = theta.post.pred.11)

df <- data.frame(y.posterior.predictive = y.post.pred.11)

z <- density(y.post.pred.11)
y.post.pred.11.MAP <- z$x[which.max(z$y)]
y.posterior.lower = quantile(y.post.pred.11, 0.025)
y.posterior.upper = quantile(y.post.pred.11, 0.975)

p <- ggplot(df, aes(x=y.posterior.predictive)) +
  geom_histogram(aes(y=..density..)) +  # scale histogram y
  geom_density(col = "red") + ggtitle('Empirical Density Estimate Plot - Posterior Predictive Distribution') + geom_vline(xintercept = y.posterior.lower, color='blue') + geom_vline(xintercept = y.posterior.upper, color = 'blue') + geom_vline(xintercept=y.post.pred.11.MAP , color='green')
p
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

We can visually see our posterior predictive distribution for year 1986. The green line is our MAP estimate and the blue lines are the lower and upper 95% credible interval values.

See below for the numerical estimates. For comparison, we also include the estimates from Part 2c, where we used the Jeffreys prior with a single constant rate instead of the log-link Poisson regression.

result.estimate.names <- c("y.post.MAP", "95.Lower", "95.Upper")
result.estimates <- c(y.post.pred.11.MAP, as.numeric(y.posterior.lower),as.numeric(y.posterior.upper))

result.estimates.jeffrey <- c(y.map.estimate, as.numeric(y.posterior.lower.jeffrey), as.numeric(y.posterior.upper.jeffrey))

results.df = data.frame(result.estimate.names, result.estimates, result.estimates.jeffrey)
colnames(results.df) = c("Point.Estimates", "Values.Poisson.Reg", "Values.Jeffreys.Prior")
results.df
##   Point.Estimates Values.Poisson.Reg Values.Jeffreys.Prior
## 1      y.post.MAP           17.54413              23.70327
## 2        95.Lower           10.00000              14.00000
## 3        95.Upper           31.00000              35.00000

Comparing with the results from Part c.), where we used the Jeffreys prior with a single constant rate, we see that the Jeffreys-prior model predicts a higher rate than the Poisson regression model with the log link \(\log(\theta_t) = \alpha + \beta t\) and the diffuse bivariate normal prior on \((\alpha, \beta)\).

I argue that the model using the Jeffreys prior is more appropriate, since the Poisson regression model oversimplifies matters by assuming the rate of fatal accidents \(\theta_t\) simply decreases over time, and thus possibly underestimates the number of fatal accidents for 1986.

A simple plot of the original data suggests that it is not simply decreasing.

plot(t, y, main='Number of Fatalities over Time')