qgamma(p = 0.5, shape = 7, rate = 1.5)
#> [1] 4.446425
Question 1: Poisson Distribution with Chi-Square Prior
a) What is the posterior distribution of λ, after observing a single value of x?
Step 1: Likelihood Function
If \(X \sim \text{Poisson}(\lambda)\), then the likelihood of observing \(x\) is:
\[ P(X = x \mid \lambda) = \frac{\lambda^x e^{-\lambda}}{x!} \]
Step 2: Prior Distribution
Given \(\lambda \sim \chi^2_4\), we can express this as a Gamma distribution:
\[ \lambda \sim \text{Gamma}(\alpha = 2, \beta = 1/2) \]
Recall that a Chi-square distribution with \(k\) degrees of freedom is equivalent to:
\[ \chi^2_k \equiv \text{Gamma}\left(\frac{k}{2}, \frac{1}{2}\right) \]
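As a quick numerical check of this equivalence (a minimal sketch; the grid points below are arbitrary):
# Check: a chi-square with 4 df has the same density as Gamma(shape = 2, rate = 1/2)
x_grid <- c(0.5, 1, 2, 5, 10)
dchisq(x_grid, df = 4)
dgamma(x_grid, shape = 2, rate = 1/2)  # matches dchisq(x_grid, df = 4)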
So the prior density is:
\[ \begin{align} \pi(\lambda) &= \frac{\beta^{\alpha}}{\Gamma(\alpha) } \lambda^{\alpha-1} e^{-\beta\lambda}\\ &=\frac{(1/2)^2}{\Gamma(2) } \lambda^{2-1} e^{-\lambda/2}\\ &= \frac{1}{4} \lambda e^{-\lambda/2} \end{align} \] since \(\Gamma(2)=1\)
Step 3: Posterior Distribution
Using Bayes’ theorem, the unnormalized posterior is:
\[ \pi(\lambda \mid x) \propto P(x \mid \lambda) \cdot \pi(\lambda) = \left( \frac{\lambda^x e^{-\lambda}}{x!} \right) \cdot \left( \frac{1}{4} \lambda e^{-\lambda/2} \right) \]
Simplifying:
\[\begin{aligned} \pi(\lambda \mid x) &\propto\lambda^{x + 1} e^{- \frac{3}{2} \lambda}\\ &\propto\lambda^{x + 2-1} e^{- \frac{3}{2} \lambda}\\ \end{aligned} \]
This is the kernel of a Gamma distribution:
\[ \lambda \mid x \sim \text{Gamma}(x + 2, \text{rate} = \frac{3}{2}) \]
Final Answer
The posterior distribution of \(\lambda\), after observing \(x\), is:
\[ \boxed{\lambda \mid x \sim \text{Gamma}(x + 2, \text{rate} = \frac{3}{2})} \]
where \[\alpha^* = x+2 , \quad \beta^*=3/2\]
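As a numerical sanity check (a minimal sketch, using the value x = 5 that appears in part b), the conjugate result can be compared with a grid approximation of likelihood × prior:
# Grid approximation of the posterior vs. the analytic Gamma(x + 2, rate = 3/2)
x_obs  <- 5
lam    <- seq(0.01, 20, by = 0.01)
unnorm <- dpois(x_obs, lam) * dgamma(lam, shape = 2, rate = 1/2)  # likelihood x prior
grid_post <- unnorm / sum(unnorm * 0.01)                          # normalise on the grid
max(abs(grid_post - dgamma(lam, shape = x_obs + 2, rate = 3/2)))  # close to 0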
b) Given the observed x = 5, compute the mean, median, and mode and the 95% credible interval.
\[ \begin{aligned} \lambda \mid x &\sim \text{Gamma}(5 + 2, \text{rate} = \frac{3}{2})\\ \lambda \mid x&\sim \text{Gamma}(7,\frac{3}{2}) \end{aligned} \]
\[\begin{aligned} Mean&=E(X)\\ &=\frac{\alpha^{*}}{\beta^{*}}\\ &=\frac{7}{\frac{3}{2}}\\ &\approx 4.67 \end{aligned}\]
\[\begin{aligned} Mode&=\frac{\alpha^{*}-1}{\beta^{*}}\\ &=\frac{7-1}{\frac{3}{2}}\\ &=4 \end{aligned}\]
median using Wilson-Hilferty Approximation:
\[\begin{aligned} median&=\frac{\alpha^{*}-\frac{1}{3}}{\beta^{*}}\\ &=\frac{7-\frac{1}{3}}{\frac{3}{2}}\\ &=4.444444 \end{aligned}\]
median using software:
library(bayesrules)
plot_gamma_poisson(shape = 2, rate = 1/2, sum_y = 5, n = 1)
# Summarizing the posterior (and the prior):
summarize_gamma_poisson(shape = 2, rate = 1/2, sum_y = 5, n = 1)
| model     | shape | rate | mean     | mode | var      | sd       |
|-----------|-------|------|----------|------|----------|----------|
| prior     | 2     | 0.5  | 4.000000 | 2    | 8.000000 | 2.828427 |
| posterior | 7     | 1.5  | 4.666667 | 4    | 3.111111 | 1.763834 |
######################
## Credible Intervals
######################
# A 95% quantile-based credible interval :
round(qgamma(c(0.025,0.975), shape=7, rate=3/2),3)
#> [1] 1.876 8.706
Markov Chain Monte Carlo in rjags
library(rjags)
library(coda)
library(ggplot2)
library(ggmcmc) # For converting MCMC samples to tidy format
# Run your JAGS model (from previous code)
model <- textConnection("
model {
  lambda ~ dgamma(2, 0.5)   # Prior: Gamma(2, 0.5) = Chi-square(4)
  X ~ dpois(lambda)         # Likelihood
}
")

data <- list(X = 5)
jm <- jags.model(model, data = data, n.chains = 3)
#> Compiling model graph
#> Resolving undeclared variables
#> Allocating nodes
#> Graph information:
#> Observed stochastic nodes: 1
#> Unobserved stochastic nodes: 1
#> Total graph size: 4
#>
#> Initializing model
update(jm, 1000)
samples <- coda.samples(jm, variable.names = "lambda", n.iter = 10000)
summary(samples)
#>
#> Iterations = 1001:11000
#> Thinning interval = 1
#> Number of chains = 3
#> Sample size per chain = 10000
#>
#> 1. Empirical mean and standard deviation for each variable,
#> plus standard error of the mean:
#>
#> Mean SD Naive SE Time-series SE
#> 4.67630 1.77569 0.01025 0.01007
#>
#> 2. Quantiles for each variable:
#>
#> 2.5% 25% 50% 75% 97.5%
#> 1.910 3.403 4.434 5.703 8.778
quantile(samples[[1]], c(0.025, 0.5, 0.975))
#> 2.5% 50% 97.5%
#> 1.901777 4.434241 8.740291
# Convert to ggmcmc object
# Trace plot and diagnostics
plot(samples) # Trace plot and density
gelman.diag(samples) # R-hat
#> Potential scale reduction factors:
#>
#> Point est. Upper C.I.
#> lambda 1 1
Autocorrelation plots
samples_ggs <- ggs(samples)

autocorr_plot <- ggs_autocorrelation(samples_ggs, family = "lambda") +
  geom_hline(yintercept = 0, color = "darkred", linetype = "dashed") +
  labs(title = "Autocorrelation for λ",
       subtitle = "Ideal: Fast decay to zero (lag < 10)",
       x = "Lag",
       y = "Autocorrelation") +
  theme_minimal(base_size = 12) +
  theme(panel.grid.minor = element_blank())
print(autocorr_plot)
Trace Plot
library(ggplot2)
samples_df <- as.data.frame(samples[[1]])
ggplot(samples_df, aes(x = 1:10000, y = lambda)) +
geom_line(color = "blue") +
labs(title = "Trace Plot for λ", x = "Iteration", y = "λ") +
theme_minimal()
Question 2
(a) Explain what is meant by a conjugate prior distribution
Given a likelihood function \(p(y \mid \theta)\), if the posterior distribution \(p(\theta \mid y)\) belongs to the same family of probability distributions as the prior distribution \(p(\theta)\), then the prior and posterior are referred to as conjugate distributions with respect to that likelihood function. In this case, the prior is called a conjugate prior for the likelihood function \(p(y \mid \theta)\).
(b) Derive the posterior distribution for beliefs about p.
Let \(X_1, X_2,\cdots, X_n\) be iid Bernoulli(\(p\)). Then \(Y = \sum_{i=1}^n X_i\) is binomial(\(n,p\)). We assume the prior distribution of \(p\) is Beta(\(\alpha, \beta\)). From here we note that:
\[Prior: ~p\sim Beta(\alpha,\beta)\] so that \(f(p) = \frac{\Gamma(\alpha+\beta)}{\Gamma{(\alpha})\Gamma{(\beta)}}p^{\alpha-1} (1 - p)^{\beta-1}\)
also :
\[Likelihood~:Y|p \sim Binomial(n,p)\]
\[\begin{eqnarray*} % \nonumber to remove numbering (before each equation) f(y,p) &=& \left[\binom{n}{y}p^y(1-p)^{n-y}\right]\left[\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha) \Gamma(\beta)}p^{\alpha-1}(1-p)^{\beta-1}\right] \\ &=& \binom{n}{y}\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha) \Gamma(\beta)}p^{y+\alpha-1}(1-p)^{n-y+\beta -1}. \end{eqnarray*}\]The marginal pdf of \(Y\) is
\[\begin{equation*} f(y) = \int_0^1f(y,p)\mathrm{d}p = \binom{n}{y}\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha) \Gamma(\beta)}\frac{\Gamma(y + \alpha) \Gamma(n-y+\beta)}{\Gamma(n + \alpha + \beta)}, \end{equation*}\]which is the beta-binomial distribution. The posterior distribution of \(p\) is given as follows;
\[\begin{equation*} \pi(p|y) = \frac{f(y,p)}{f(y)} = \frac{\Gamma(n + \alpha + \beta)}{\Gamma(y + \alpha) \Gamma(n-y+\beta)}p^{y+\alpha-1}(1-p)^{n-y+\beta -1}, \end{equation*}\]which is \(Beta(y+\alpha, n-y + \beta)\).
By inspection one could easily note that the posterior can easily be derived by inspecting the function derived from the joint distribution:
\[ \begin{aligned} {\color{red}{Pr(p \mid k)}} & \propto {\color{blue}{\binom{n}{k}p^k(1-p)^{n-k}}} \; {\color{green}{p^{a-1} (1 - p)^{\beta-1}}}\\ & \propto {p^{(\alpha+k)-1}} {(1-p)^{(\beta+n-k)-1}} \end{aligned} \] - That is:
\[ p \mid k \sim Beta(\alpha+k,\beta+n-k)\]
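The same conjugate update can be checked numerically (a minimal sketch; the values of \(\alpha\), \(\beta\), \(k\) and \(n\) are the ones used in part d):
# Verify p | k ~ Beta(alpha + k, beta + n - k) on a grid
alpha <- 3; beta <- 3; n <- 10; k <- 2
p      <- seq(0.001, 0.999, by = 0.001)
unnorm <- dbinom(k, n, p) * dbeta(p, alpha, beta)   # likelihood x prior
post   <- unnorm / sum(unnorm * 0.001)              # normalise on the grid
max(abs(post - dbeta(p, alpha + k, beta + n - k)))  # close to 0 (grid error only)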
c) Show that if \(X \sim Beta(\alpha,\beta)\) with \(\alpha > 1\) then
\[E(\frac{1}{X})=\frac{\alpha+\beta-1}{\alpha-1}\]
Proof
\[ \begin{align} \text{Given that} \quad f(x) &= \frac{\Gamma{(\alpha+\beta)}}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1} (1 - x)^{\beta-1}, \quad 0<x<1\\ E\left(\frac{1}{X}\right)&= \int^{1}_{0}\frac{1}{x}\frac{\Gamma{(\alpha+\beta)}}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1} (1 - x)^{\beta-1}\,dx\\ &=\frac{\Gamma{(\alpha+\beta)}}{\Gamma(\alpha)\Gamma(\beta)}\int^{1}_{0}\frac{x^{\alpha-1}}{x} (1 - x)^{\beta-1}\,dx\\ &=\frac{\Gamma{(\alpha+\beta)}}{\Gamma(\alpha)\Gamma(\beta)}\int^{1}_{0}x^{\alpha-1-1} (1 - x)^{\beta-1}\,dx\\ &=\frac{\Gamma{(\alpha+\beta-1+1)}}{\Gamma(\alpha-1+1)\Gamma(\beta)}\int^{1}_{0}x^{\alpha-1-1} (1 - x)^{\beta-1}\,dx\\ &=\frac{(\alpha+\beta-1)\Gamma{(\alpha+\beta-1)}}{(\alpha-1)\Gamma(\alpha-1)\Gamma(\beta)}\int^{1}_{0}x^{\alpha-1-1} (1 - x)^{\beta-1}\,dx \quad \text{since} \quad \Gamma{(n+1)}=n\Gamma{(n)}\\ &=\frac{(\alpha+\beta-1)}{(\alpha-1)}\int^{1}_{0}\frac{\Gamma{(\alpha-1+\beta)}}{\Gamma(\alpha-1)\Gamma(\beta)}x^{(\alpha-1)-1} (1 - x)^{\beta-1}\,dx\\ &=\frac{(\alpha+\beta-1)}{(\alpha-1)}\int^{1}_{0}\frac{\Gamma{(\alpha^{*}+\beta)}}{\Gamma(\alpha^{*})\Gamma(\beta)}x^{\alpha^{*}-1} (1 - x)^{\beta-1}\,dx \quad \text{where}~ \alpha^{*} = \alpha-1\\ &=\frac{(\alpha+\beta-1)}{(\alpha-1)} \quad \text{since} ~\int^{1}_{0}\frac{\Gamma{(\alpha^{*}+\beta)}}{\Gamma(\alpha^{*})\Gamma(\beta)}x^{\alpha^{*}-1} (1 - x)^{\beta-1}\,dx =1 \end{align}\]
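A quick Monte Carlo check of this identity (a minimal sketch; \(\alpha = \beta = 3\) is an arbitrary choice with \(\alpha > 1\)):
set.seed(1)
a <- 3; b <- 3
mean(1 / rbeta(1e6, a, b))   # simulated E(1/X)
(a + b - 1) / (a - 1)        # analytic value: 2.5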
d) Compute Bayes’ posterior mean estimate in the specific case where \(α = β = 3, k = 2\),and \(n =10\) from your derivations from b):
i) manually and find the percentage contribution of the sample (credibility factor Z as a percentage)
The posterior mean for \(Beta(\alpha^{*},\beta^{*})\) is given by
\[ \begin{align} \hat{p}_B &=\frac{\alpha^{*}}{\alpha^{*}+\beta^{*}}\\ &= \frac{k+\alpha}{\alpha + k+\beta + n-k}\\ &= \frac{k+\alpha}{\alpha +\beta + n}. \end{align} \] We can write \(\hat{p}_B\) in the following way;
\[\hat{p}_B = \left(\frac{n}{\alpha +\beta +n}\right)\left(\frac{k}{n}\right) + \left(\frac{\alpha + \beta}{\alpha + \beta + n}\right)\left(\frac{\alpha}{\alpha + \beta}\right).\]
Thus \(\hat{p}_B\) is a linear combination of the prior mean and the sample mean. The weights are determined by the values of \(\alpha\), \(\beta\) and \(n\).
\[Z=\frac{n}{\alpha + \beta + n} ,\quad 1-Z= \frac{\alpha + \beta}{\alpha +\beta +n}\] With \(\alpha=\beta=3\), \(k=2\) and \(n=10\): \[\hat{p}_B=\frac{k+\alpha}{\alpha+\beta+n}=\frac{2+3}{3+3+10}=\frac{5}{16}=0.3125, \qquad Z=\frac{n}{\alpha + \beta + n}=\frac{10}{3+3+10}\times 100\%=62.5\%\]
alpha <- 3; beta <- 3; k <- 2; n <- 10

# Posterior mean
post_mean <- (alpha + k) / (alpha + beta + n)  # 0.3125

# Credibility factor
(Z <- n / (n + alpha + beta) * 100)  # 62.5%
## [1] 62.5
Model specification
library(rjags)
library(coda)
library(bayesplot)
# Data
data_list <- list(n = 10, k = 2)

# Model 1: Original Prior (Beta(3,3))
model1 <- textConnection("
model {
  p ~ dbeta(3, 3)        # Original prior
  k ~ dbin(p, n)         # Likelihood
}
")

# Model 2: Expert Prior (Beta(2.25,6.75) for p=0.25)
model2 <- textConnection("
model {
  p ~ dbeta(2.25, 6.75)  # Expert prior (mean=0.25)
  k ~ dbin(p, n)
}
")
Model fitting
# Fit Model 1
jm1 <- jags.model(model1, data = data_list, n.chains = 4)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 1
## Unobserved stochastic nodes: 1
## Total graph size: 4
##
## Initializing model
update(jm1, 1000)  # Burn-in
samples1 <- coda.samples(jm1, variable.names = "p", n.iter = 10000)

# Fit Model 2
jm2 <- jags.model(model2, data = data_list, n.chains = 4)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 1
## Unobserved stochastic nodes: 1
## Total graph size: 5
##
## Initializing model
update(jm2, 1000)
samples2 <- coda.samples(jm2, variable.names = "p", n.iter = 10000)
# Posterior summaries
summary(samples1) # Original prior
##
## Iterations = 2001:12000
## Thinning interval = 1
## Number of chains = 4
## Sample size per chain = 10000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## 0.3129095 0.1120814 0.0005604 0.0007275
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## 0.1203 0.2303 0.3049 0.3873 0.5513
summary(samples2) # Expert prior
##
## Iterations = 2001:12000
## Thinning interval = 1
## Number of chains = 4
## Sample size per chain = 10000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## 0.2233034 0.0931844 0.0004659 0.0006328
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## 0.07146 0.15410 0.21385 0.28219 0.42968
# WAIC comparison (requires loo package)
# Note: For rjags, extract log-likelihood manually
loglik1 <- sapply(1:10, function(i) dbinom(data_list$k, data_list$n, samples1[[1]][,1]))
loglik2 <- sapply(1:10, function(i) dbinom(data_list$k, data_list$n, samples2[[1]][,1]))
# (strictly, dbinom(..., log = TRUE) gives the log-likelihood)

library(loo)
waic1 <- waic(matrix(loglik1, ncol = 10))
waic2 <- waic(matrix(loglik2, ncol = 10))
print(compare(waic1, waic2))
## elpd_diff se
## 0.4 0.0
Compare convergence
# Trace plots
color_scheme_set("blue")
mcmc_trace(samples1) + ggtitle("Original Prior")
color_scheme_set("red")
mcmc_trace(samples2) + ggtitle("Expert Prior")
# Autocorrelation
mcmc_acf(samples1) + ggtitle("ACF: Original Prior")
mcmc_acf(samples2) + ggtitle("ACF: Expert Prior")
gelman.diag(samples1)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## p 1 1
effectiveSize(samples1)
## p
## 23777.5
Question 3
a) Use examples to demonstrate the inverse transformation method and also the acceptance rejection random number generation approach.
Inverse Transformation Method
- Suppose we want to simulate variable \(X\) with some distribution. We denote it’s cumulative function as: \[F_{X}(x) = \mathbb{P}( X \leq x) \]
- The random variable \(Y\) defined as \(Y=F_{X}(X)\) has a \(U(0,1)\) distribution.
- Conversely, if \(U \sim U(0,1)\), then \(F_{X}^{-1}(U)\) has the same distribution as \(X\).
- If we know the inverse CDF function for a continuous random variable X , then we can generate random variates from the distribution of X by generating \(U(0,1)\) variates and transforming them using the inverse CDF.
Let’s see how we can use this method with a simple example. If we have \(X \sim Exp(\lambda)\) (an exponential distribution with parameter \(\lambda\)) we have that the cumulative function is: \[F(x) = 1 - e^{-\lambda x}\]
By inverting this function with some basic algebra we get: \[F^{-1}(u) = \frac{-1}{\lambda} ln(1-u) \]
if \(\lambda=3\) we have \[F^{-1}(u) = \frac{-1}{3} ln(1-u) \]
We can note that \(1-U\) has the same \(U(0,1)\) distribution as \(U\), and so we have: \[ \frac{-1}{3} \ln(U) \sim F \]
Example
# Set parameters
N <- 10000   # Number of simulations
lam <- 3     # Rate parameter

# Set seed for reproducibility
set.seed(123)

# Sample uniform random variates
U <- runif(N)

# Apply probability integral transform
X <- -log(U)/lam

# Create analytic result
x_a <- seq(0, 2.5, length.out = 50)
fx_a <- lam * exp(-lam * x_a)
# Plot histogram of samples
hist(X, freq = FALSE, breaks = 50, col = "grey",
xlab = "x", ylab = "f(x)",
main = "Empirical Simulated Exponential Distribution Compared to Analytic Result",
xlim = c(0, 2.5), ylim = c(0, 3))
lines(x_a, fx_a, col = "red", lwd = 2.5)
legend("topright", legend = c("Empirical", "Analytic"),
col = c("grey", "red"), lwd = c(NA, 2.5),
fill = c("grey", NA), border = c("black", NA))
Acceptance-Rejection Methods
- There are many distributions for which the inverse transform method, and even more general transformations, will fail to generate the required random variables.
- Hence we turn to another class of methods: acceptance-rejection methods. Again, these rely on taking (pseudo-random) uniform variates as a starting point.
- The general concept is to “look” at each uniform pseudo-random number and decide whether we believe it could have been generated by the distribution we wish to sample from (and accept it) or not (and reject it).
The Fundamental Theorem of Simulation:
- Suppose we wish to sample from a distribution \(f(x)\) that is difficult to sample from directly; however, we can sample easily from a distribution with pdf \(g(x)\) which has the property that it envelopes \(f(x)\), i.e. \[f(x)\leq Mg(x)\] for all x, where M is a constant.
- Here we are saying that the height of \(g(x)\), or some scaled version of it \(Mg(x)\), must be greater than \(f(x)\).
Choose \(g(x)\) which is close to \(f(x)\), easy to sample from, and has the same domain as the target \(f(x)\). Then:
- Sample a random variable \(Y\) with density \(g(x)\)
- Sample a Uniform random variable \(U \sim U(0,1)\), giving a sample value \(u\)
- Calculate the acceptance probability \(\alpha=\frac{f(Y)}{Mg(Y)}\)
- Accept \(X=Y\) if \(u \leq \frac{f(Y)}{Mg(Y)}\); otherwise reject
- Return to step 1
In general , Accept \(X_{i}\) if \(U_i \leq \frac{f(x_i)}{Mg(x_i)} ,\forall x_i\) otherwise reject hence \(\underline{X}\sim f(x)\)
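The steps above can be wrapped in a small generic function (a minimal sketch; the names ar_sample, dtarget, rprop and dprop are illustrative and not from the original text):
# Generic accept-reject sampler: target density dtarget, proposal (rprop, dprop) with dtarget <= M * dprop
ar_sample <- function(n, dtarget, rprop, dprop, M) {
  out <- numeric(0)
  while (length(out) < n) {
    y    <- rprop(n)                          # candidate draws from g
    u    <- runif(n)                          # uniforms for the accept test
    keep <- u <= dtarget(y) / (M * dprop(y))  # accept with probability f(y) / (M g(y))
    out  <- c(out, y[keep])
  }
  out[1:n]
}

# Example: Beta(2, 2) target with a U(0, 1) proposal; the maximum of the Beta(2, 2) density is M = 1.5
draws <- ar_sample(1000, function(x) dbeta(x, 2, 2), runif, dunif, M = 1.5)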
- Bongani wants to draw samples from a \(Beta(2,2)\) distribution but his package does not have a beta random number generator but it does have uniform number generator.
- He knows that for the beta distribution:
\[f(\pi)\propto \pi^{\alpha-1}(1-\pi)^{\beta-1}\]
- We can analyse the structure of this function (find the maximum turning point) so that we know where to put our envelope.
\[ \begin{align} f(\pi)&= \pi^{\alpha-1}(1-\pi)^{\beta-1}\\ &= \pi^{2-1}(1-\pi)^{2-1}\\ &= \pi(1-\pi)\\ &= \pi-\pi^2\\ &differentiate~with~respect~to~\pi\\ f'(\pi)&= 1-2\pi \\ &equate~to~zero\\ 1-2\pi&=0\\ \pi&=0.5\\ \end{align}\]
- The value of the function at the maximum \(\pi=0.5\) is \(f(0.5)=0.25\). The image below shows how an envelope would look:
We shall generate from an actual Beta distribution. This has density function: \[ f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}} {B(\alpha,\beta)} \]
Where \(B\) is the Beta function defined: \[ B(\alpha ,\beta )={\frac {\Gamma (\alpha )\Gamma (\beta )}{\Gamma (\alpha +\beta )}} \]
The Gamma function \(\Gamma\) being: \[ \Gamma (z)=\int _{0}^{\infty }x^{z-1}e^{-x}\,dx \]
Through differentiation we find the maximum value attained by this function is: \[ max_x (f(x)) = \frac{\left(\frac{\alpha-1}{\alpha+\beta-2}\right)^{\alpha-1}\left(1-\frac{\alpha-1}{\alpha+\beta-2}\right)^{\beta-1}} {B(\alpha,\beta)} = M \]
Assuming \(\alpha + \beta > 2\)
We can use this to sample from the Beta distribution using acceptance/rejection:
Summary
Practical idea
# Set parameters
ap <- 2
bt <- 2

# Define beta PDF function
beta_pdf <- function(x, a = ap, b = bt) {
  (x^(a-1) * (1-x)^(b-1)) / beta(a, b)
}

# Calculate maximum value
x_max <- (ap - 1) / (ap + bt - 2)
M <- beta_pdf(x_max)

# Number of samples
N <- 1000

# Generate samples
set.seed(123)  # For reproducibility
Y <- runif(N)
U <- runif(N, 0, M)

# Acceptance/rejection masks
mask_acc <- U < beta_pdf(Y)
mask_rej <- U >= beta_pdf(Y)  # Changed to >= for consistency

# Accepted and rejected samples
Y_acc <- Y[mask_acc]
U_acc <- U[mask_acc]

Y_rej <- Y[mask_rej]
U_rej <- U[mask_rej]

# Create analytic PDF
X <- seq(0, 1, length.out = 100)
PDF <- beta_pdf(X)
# Create plots
par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))
# Plot 1: Acceptance-Rejection Sampling
plot(Y_acc, U_acc, col = "pink", pch = 16,
xlab = "x", ylab = "f(x)",
main = "Acceptance-Rejection Sampling",
xlim = c(0, 1), ylim = c(0, M + 0.05))
points(Y_rej, U_rej, col = "grey", pch = 16)
lines(X, PDF, col = "red", lwd = 2)
legend("topright", legend = c("Accepted", "Rejected", "PDF"),
col = c("pink", "grey", "red"), pch = c(16, 16, NA), lty = c(NA, NA, 1))
# Plot 2: Empirical Distribution
hist(Y_acc, freq = FALSE, col = "grey",
xlab = "x", ylab = "f(x)",
main = "Empirical Distribution",
xlim = c(0, 1), ylim = c(0, M + 0.05))
lines(X, PDF, col = "red", lwd = 2)
# Reset plotting parameters
par(mfrow = c(1, 1))
(b) Gibbs sampling
Gibbs
- The essence of the Gibbs algorithm is the sampling of parameters conditioned in other parameters: \[P(\theta_1 \mid \theta_2, \dots \theta_p)\]
- we have a two parameter situation \(\{\theta_1, \theta_2\}\) with independent prior distributions \(p(\theta_1, \theta_2) = p(\theta_1)p(\theta_2)\).
- Start at a random point: \(\theta^{(0)}\)
- For \(t= 1, \dots, T\) repeat the following steps:
- For \(j=1,2,\dots,d\), update \(\theta_j\) by drawing \(\theta_j^{(t)} \sim f(\theta_j \mid \theta_{-j}, y)\), where \(\theta_{-j}\) denotes the remaining components at their most recently updated values
Define an initial set \(\boldsymbol{\theta}^{(0)} \in \mathbb{R}^p\) such that \(P\left(\boldsymbol{\theta}^{(0)} \mid \mathbf{y} \right) > 0\). For \(t = 1, 2, \dots\) assign:
\[ \boldsymbol{\theta}^{(t)} = \begin{cases} \theta^{(t)}_1 & \sim P \left(\theta_1 \mid \theta^{(t-1)}_2, \dots, \theta^{(t-1)}_p \right) \\ \theta^{(t)}_2 & \sim P \left(\theta_2 \mid \theta^{(t)}_1, \theta^{(t-1)}_3, \dots, \theta^{(t-1)}_p \right) \\ & \vdots \\ \theta^{(t)}_p & \sim P \left(\theta_p \mid \theta^{(t)}_1, \dots, \theta^{(t)}_{p-1} \right) \end{cases} \]
Example
Suppose we have:
- Data: Observations \(y_1, \dots, y_n\) from \(\mathcal{N}(\mu, \sigma^2)\) (with \(\sigma^2\) initially known)
- Prior: \(\mu \sim \mathcal{N}(\mu_0, \tau_0^2)\)
- Goal: Estimate the posterior \(p(\mu \mid \mathbf{y})\)
For Gibbs sampling, we’ll estimate both \(\mu\) and \(\sigma^2\) (now unknown).
Gibbs Sampling Steps
1. Define Conditional Distributions
- Conditional for \(\mu\) (given \(\sigma^2\)):
\[ \mu \mid \sigma^2, \mathbf{y} \sim \mathcal{N}\left( \frac{\frac{n\bar{y}}{\sigma^2} + \frac{\mu_0}{\tau_0^2}}{\frac{n}{\sigma^2} + \frac{1}{\tau_0^2}}, \left( \frac{n}{\sigma^2} + \frac{1}{\tau_0^2} \right)^{-1} \right) \]
- Conditional for \(\sigma^2\) (given \(\mu\)): \[ \sigma^2 \mid \mu, \mathbf{y} \sim \text{Inverse-Gamma}\left( \frac{n}{2} + \alpha, \frac{\sum (y_i - \mu)^2}{2} + \beta \right) \] where \(\alpha, \beta\) are hyperparameters.
2. Gibbs Sampling Algorithm
- Initialize \(\mu^{(0)}\) and \(\sigma^{2(0)}\)
- For \(t = 1\) to \(T\):
- Sample \(\mu^{(t)}\) from \(p(\mu \mid \sigma^{2(t-1)}, \mathbf{y})\)
- Sample \(\sigma^{2(t)}\) from \(p(\sigma^2 \mid \mu^{(t)}, \mathbf{y})\)
- Discard burn-in samples
# Simulate data
set.seed(123)
n <- 50
true_mu <- 2
true_sigma2 <- 4
y <- rnorm(n, true_mu, sqrt(true_sigma2))

# Hyperparameters
mu0 <- 0       # Prior mean for mu
tau02 <- 100   # Prior variance for mu
alpha <- 1     # Prior shape for sigma2
beta <- 1      # Prior scale for sigma2

# Gibbs sampling
T <- 10000
mu_samples <- numeric(T)
sigma2_samples <- numeric(T)

# Initialize
mu_samples[1] <- mean(y)
sigma2_samples[1] <- var(y)

for (t in 2:T) {
  # Sample mu given sigma2
  post_var_mu <- 1 / (n/sigma2_samples[t-1] + 1/tau02)
  post_mean_mu <- post_var_mu * (n*mean(y)/sigma2_samples[t-1] + mu0/tau02)
  mu_samples[t] <- rnorm(1, post_mean_mu, sqrt(post_var_mu))

  # Sample sigma2 given mu
  post_alpha <- alpha + n/2
  post_beta <- beta + sum((y - mu_samples[t])^2)/2
  sigma2_samples[t] <- 1/rgamma(1, post_alpha, post_beta)
}

# Discard burn-in (first 20%)
burnin <- T/5
mu_final <- mu_samples[-(1:burnin)]
sigma2_final <- sigma2_samples[-(1:burnin)]
library(ggplot2)
ggplot(data.frame(iteration = 1:T, mu = mu_samples), aes(iteration, mu)) +
geom_line(color = "blue") +
labs(title = "Trace Plot for μ")
acf(mu_final, main = "Autocorrelation for μ")
mean(mu_final) # Posterior mean for μ
## [1] 2.063447
quantile(mu_final, c(0.025, 0.975)) # 95% credible interval
## 2.5% 97.5%
## 1.556372 2.578244
c) Prove any two of the following. If the random variables \(X_i\) are iid \(Exp(1)\), then three standard distributions can be derived:
- if \(X_1,X_2, \dots ,X_n\) are \(iid\) \(Exp(1)\) random variables (i.e. with parameter \(\lambda=1\)) then: \[Y=\sum_{i=1}^n X_i\sim \Gamma{(n,1)}\]
Proof
MGF for a gamma is given by:
\[ \begin{align} M_X(t) &= E(e^{tX})\\ &=\int^{\infty}_{0}e^{tx}\cdot\frac{\beta^{\alpha}}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x}\,dx\\ &=\int^{\infty}_{0}\frac{\beta^{\alpha}}{\Gamma(\alpha)}x^{\alpha-1}e^{-(\beta -t)x}\,dx\\ &=\frac{\beta^{\alpha}}{(\beta-t)^{\alpha}}\int^{\infty}_{0}\frac{(\beta-t)^{\alpha}}{\Gamma(\alpha)}x^{\alpha-1}e^{-(\beta -t)x}\,dx\\ &=\frac{\beta^{\alpha}}{(\beta-t)^{\alpha}}\\ &=\left(\frac{\beta}{\beta-t}\right)^{\alpha}, \quad t<\beta \end{align} \] if \(X\sim \exp(\lambda)\) then the MGF is given as follows:
\[ \begin{align} M_X(t)&=E(e^{tX})\\ &=\int^{\infty}_{0}e^{tx}\cdot\lambda e^{-\lambda x}\,dx\\ &=\int^{\infty}_{0}\lambda e^{-(\lambda-t)x}\,dx\\ &=\frac{\lambda}{\lambda-t}\int^{\infty}_{0}(\lambda-t) e^{-(\lambda-t)x}\,dx\\ &=\frac{\lambda}{\lambda-t}, \quad t<\lambda \end{align} \] Given that \(S_n=X_1+X_2+\cdots +X_n\) we show that the MGF for \(S_n\) resembles a gamma:
\[ \begin{align} M_{S_n}(t)&=M_{X_1+X_2+\cdots +X_n}(t)\\ &=E(e^{t(X_1+X_2+\cdots +X_n)})\\ &=E(e^{tX_1})\cdot E(e^{tX_2})\cdots E(e^{tX_n})\\ &= M_{X_1}(t)\,M_{X_2}(t)\cdots M_{X_n}(t)\\ &=[M_{X}(t)]^n \quad \text{due to the iid property}\\ &=\left(\frac{\lambda}{\lambda-t} \right)^n \end{align} \] for \(\lambda=1\)
\[ \begin{align} M_{S_n}(t)&=\left(\frac{1}{1-t} \right)^n \end{align} \]
let \(\beta=1\) and \(\alpha=n\) thus \(S_n \sim \Gamma(n,1)\)
- Exponential Distribution: If \(X_1,X_2,...,X_n\) are iid exponential random variables with parameter \(\lambda = 1\), show that the minimum \(W = min(X_1,X_2,...,X_n)\) follows an exponential distribution with parameter \(n\)
We derive the pdf of the minimum order statistic, given that \(f(x)=e^{-x}\).
- First we derive the cdf for a minimum order statistics.
\[ \begin{aligned} F_{X_{(1)}}(x)&=P(X_{(1)}\le x)\\ &=1-P(X_{(1)}> x)\\ &=1-P(X_1>x,X_2>x,X_3>x,\cdots,X_n>x)\\ &=1-[P(X> x)]^n \quad \text{(iid assumption)}\\ &=1-[1-P(X\le x)]^n\\ &=1-[1-F(x)]^n \end{aligned} \]
- for an exponential distribution with parameter \(\lambda\) we can find the cumulative distribution function as follows:
\[ \begin{aligned} F(x) &= \int^{x}_{-\infty}f(t)dt\\ &= \int^{x}_{0}e^{-t}dt\\ &=[-e^{-t}]^{x}_{0}\\ &=[-e^{-x}]-[-e^{-0}]\\ &=1-e^{-x} \end{aligned} \]
hence we plug that in the cdf for the minimum order statistics:
\[ \begin{aligned} F_{X_{(1)}}(x)&=1-[1-(1-e^{-x})]^n\\ F_{X_{(1)}}(x)&=1-e^{-nx}\\ f_{X_{(1)}}(x)&=\frac{d}{dx}F_{X_{(1)}}(x)\\ &=\frac{d}{dx}(1-e^{-nx})\\ &=0-(-ne^{-nx})\\ &=ne^{-nx} \quad \text{hence} \quad X_{(1)}\sim \exp(n) \end{aligned} \]
Recall that if \(X\sim \exp(\lambda)\) then \(f(x)=\lambda e^{-\lambda x}\)
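Both results can be checked quickly by simulation (a minimal sketch; n = 5 and the number of replicates are arbitrary choices):
set.seed(42)
n <- 5; R <- 1e5
x <- matrix(rexp(n * R, rate = 1), nrow = R)  # R samples of (X_1, ..., X_n), each Exp(1)

# Sum of n iid Exp(1) variables should be Gamma(n, 1)
ks.test(rowSums(x), "pgamma", shape = n, rate = 1)

# Minimum of n iid Exp(1) variables should be Exp(n)
ks.test(apply(x, 1, min), "pexp", rate = n)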
Question 4
a) Write down the posterior resulting from a Normal-Gamma prior and the resulting conjugate prior. Show the likelihood and prior and all the parameters that lead to the final posterior distribution.
\[Posterior:\mu,\sigma^2|X \sim \Gamma \left(\frac{n}{2}+\alpha,\beta+\frac{\sum(x-\mu)^2}{2}\right)\]
\[Prior:~f(\sigma^2)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}(\sigma^2)^{-(\alpha+1)}e^{-\beta/\sigma^2}\]
\[Likelihood:~L(X|\mu,\sigma^2)= \prod^{n}_{i=1}\frac{1}{\sigma\sqrt{2\pi}}e^{\frac{-(x_i-\mu)^2}{2\sigma^2}}\]
deriving the posterior distribution:
\[ \begin{aligned} p(\mu,\sigma^2|X) &\propto L(X|\mu,\sigma^2)\times f(\sigma^2)\\ &\propto \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{\frac{-\sum(x_i-\mu)^2}{2\sigma^2}}\times\frac{\beta^{\alpha}}{\Gamma(\alpha)}(\sigma^2)^{-(\alpha+1)}e^{-\beta/\sigma^2}\\ &\propto \left(\frac{1}{\sqrt{2\pi}}\right)^n\frac{\beta^{\alpha}}{\Gamma(\alpha)}(\sigma^2)^{\frac{-n}{2}}\, e^{\frac{-\sum(x_i-\mu)^2}{2\sigma^2}}\,(\sigma^2)^{-(\alpha+1)}e^{-\beta/\sigma^2}\\ &\propto (\sigma^2)^{\frac{-n}{2}}\,(\sigma^2)^{-(\alpha+1)}\, e^{-\beta/\sigma^2}\, e^{\frac{-\sum(x_i-\mu)^2}{2\sigma^2}}\\ &\propto (\sigma^2)^{-(\frac{n}{2}+\alpha+1)}\, e^{-\frac{1}{\sigma^2}\left(\beta+\frac{\sum(x_i-\mu)^2}{2}\right)} \end{aligned} \]
b(i) Derive the marginal posterior distribution of the variance for the claims in general. Define all your terms before finally substituting the given values. For Normal-Gamma conjugacy, we have that:
1. the likelihood of observing x is given by:
\[ \begin{aligned} L(X|\mu,\sigma^2)&= \prod^{n}_{i=1}\frac{1}{\sigma\sqrt{2\pi}}e^{\frac{-(x_i-\mu)^2}{2\sigma^2}}\\ &= \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{\frac{-\sum(x_i-\mu)^2}{2\sigma^2}}\\ &= \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{\frac{-\sum(x_i-\bar{x}+\bar{x}-\mu)^2}{2\sigma^2}}\\ &= \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{\frac{-\left[\sum(x_i-\bar{x})^2+2\sum(x_i-\bar{x})(\bar{x}-\mu)+\sum(\bar{x}-\mu)^2\right]}{2\sigma^2}} \\ &= \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{\frac{-\left[\sum(x_i-\bar{x})^2+n(\bar{x}-\mu)^2\right]}{2\sigma^2}} \quad \text{since } \sum(x_i-\bar{x})(\bar{x}-\mu)=0 \text{ and } \sum(\bar{x}-\mu)^2=n(\bar{x}-\mu)^2 \end{aligned} \]
2. prior knowledge about \(\sigma^2\) is available, i.e. \(\sigma^2 \sim \Gamma(\alpha,\beta)\) with density
\[f(\sigma^2)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}(\sigma^2)^{-(\alpha+1)}e^{-\beta/\sigma^2}\] Deriving the marginal posterior distribution for the variance:
\[ \begin{aligned} p(\mu,\sigma^2|X) &\propto L(X|\mu,\sigma^2)\times f(\sigma^2)\\ &\propto \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{\frac{-\left[\sum(x_i-\bar{x})^2+n(\bar{x}-\mu)^2\right]}{2\sigma^2}}\times\frac{\beta^{\alpha}}{\Gamma(\alpha)}(\sigma^2)^{-(\alpha+1)}e^{-\beta/\sigma^2}\\ &\propto \left(\frac{1}{\sqrt{2\pi}}\right)^n\frac{\beta^{\alpha}}{\Gamma(\alpha)}(\sigma^2)^{\frac{-n}{2}}\, e^{\frac{-\left[\sum(x_i-\bar{x})^2+n(\bar{x}-\mu)^2\right]}{2\sigma^2}}\,(\sigma^2)^{-(\alpha+1)}e^{-\beta/\sigma^2}\\ &\propto (\sigma^2)^{\frac{-n}{2}}\,(\sigma^2)^{-(\alpha+1)}e^{-\beta/\sigma^2}\, e^{\frac{-\sum(x_i-\bar{x})^2}{2\sigma^2}}\, e^{\frac{-n(\bar{x}-\mu)^2}{2\sigma^2}} \end{aligned} \]
Next, to find the marginal posterior for the variance, we integrate the joint posterior with respect to \(\mu\):
\[ \begin{aligned} p(\sigma^2|X)&= \int p(\mu,\sigma^2|X)\, d\mu\\ &\propto \int(\sigma^2)^{\frac{-n}{2}}\,(\sigma^2)^{-(\alpha+1)}e^{-\beta/\sigma^2}\, e^{\frac{-\sum(x_i-\bar{x})^2}{2\sigma^2}}\, e^{\frac{-(\bar{x}-\mu)^2}{2\sigma^2/n}}\,d\mu\\ &\propto (\sigma^2)^{\frac{-n}{2}}\,(\sigma^2)^{-(\alpha+1)}e^{-\beta/\sigma^2}\, e^{\frac{-\sum(x_i-\bar{x})^2}{2\sigma^2}}\int e^{\frac{-(\bar{x}-\mu)^2}{2\sigma^2/n}}\,d\mu\\ &\propto (\sigma^2)^{\frac{-n}{2}}\,(\sigma^2)^{-(\alpha+1)}e^{-\beta/\sigma^2}\, e^{\frac{-\sum(x_i-\bar{x})^2}{2\sigma^2}}\,\sigma\sqrt{\tfrac{2\pi}{n}}\quad \text{since } \int e^{\frac{-(\bar{x}-\mu)^2}{2\sigma^2/n}}\,d\mu =\sqrt{\tfrac{2\pi\sigma^2}{n}}\\ &\propto (\sigma^2)^{\frac{-n}{2}}\,(\sigma^2)^{\frac{1}{2}}\,(\sigma^2)^{-(\alpha+1)}e^{-\beta/\sigma^2}\, e^{\frac{-(n-1)s^2}{2\sigma^2}} \quad \text{since } \sum(x_i-\bar{x})^2=(n-1)s^2\\ &\propto(\sigma^2)^{-(\frac{n-1}{2}+\alpha+1)}\, e^{-\frac{\frac{(n-1)s^2}{2}+\beta}{\sigma^2}} \end{aligned} \]
hence the marginal posterior for variance is given by:
\[\sigma^2|X \sim \Gamma \left(\frac{n-1}{2}+\alpha,\beta+\frac{(n-1)s^2}{2}\right)\]
Definition of Terms
- \(\sigma^2\): Population variance (unknown parameter)
- \(X\): Observed data \(\{x_1, \ldots, x_n\}\)
- \(n\): Sample size
- \(s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})^2\): Sample variance
- \(\alpha, \beta\): Parameters of the inverse-gamma prior (equivalent to \(\Gamma\) prior on \(1/\sigma^2\))
Plugging in terms
\[\sigma^2|X \sim \Gamma \left(\frac{n-1}{2}+200,300+\frac{(n-1)s^2}{2}\right)\]
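Since the posterior is of inverse-gamma form (a Gamma distribution on \(1/\sigma^2\)), posterior draws and a credible interval can be obtained by sampling the precision (a minimal sketch; the sample size n and sample variance s2 below are hypothetical values, not given in the question):
# Posterior draws for sigma^2 given hypothetical data summaries
alpha <- 200; beta <- 300
n  <- 50     # hypothetical sample size
s2 <- 1.2    # hypothetical sample variance
shape <- (n - 1) / 2 + alpha
rate  <- beta + (n - 1) * s2 / 2
sigma2_draws <- 1 / rgamma(1e5, shape = shape, rate = rate)  # draws from the marginal posterior
quantile(sigma2_draws, c(0.025, 0.5, 0.975))                 # median and 95% credible interval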
Question 5
Bayesian model in Stata
use poissondata

* Log of person-years for offset
gen log_py = log(personyears)

* Run Bayesian Poisson regression
bayes: poisson deaths smoke i.age, offset(log_py)

bayes, ///
  prior({deaths:_cons}, normal(0, 100)) ///
  prior({deaths:smoke}, normal(0, 100)) ///
  prior({deaths:age}, normal(0, 100)) ///
  : poisson deaths smoke i.age, offset(log_py)

bayes, ///
  rseed(123) ///
  burnin(1000) ///
  mcmcsize(4000) ///
  nchains(4) ///
  : poisson deaths smoke i.age, offset(log_py)

bayesstats summary

* Trace and density plots
bayesgraph diagnostics {deaths:smoke}
graph export post1.png, replace

bayesgraph diagnostics {deaths:2.age 3.age 4.age 5.age}
graph export post3.png, replace

levelsof age, local(levels)
foreach l of local levels {
    if `l' != 1 {   // Skip base category (1.age)
        bayesgraph diagnostics {deaths:`l'.age}, name(diag_`l', replace)
        graph export diag_age`l'.png, replace
    }
}
## Burn-in ...
## Simulation ...
##
## Model summary
## ------------------------------------------------------------------------------
## Likelihood:
##   deaths ~ poisson(xb_deaths)
##
## Prior:
##   {deaths:smoke i.age _cons} ~ normal(0,10000)                             (1)
## ------------------------------------------------------------------------------
## (1) Parameters are elements of the linear form xb_deaths.
##
## Bayesian Poisson regression                   MCMC iterations  =      12,500
## Random-walk Metropolis–Hastings sampling      Burn-in          =       2,500
##                                               MCMC sample size =      10,000
##                                               Number of obs    =          10
##                                               Acceptance rate  =       .2464
##                                               Efficiency:  min =      .01924
##                                                            avg =      .02552
## Log marginal-likelihood = -75.624571                       max =      .03181
##
## ------------------------------------------------------------------------------
##              |                                                Equal-tailed
##       deaths |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
## -------------+----------------------------------------------------------------
##        smoke |  .3521283   .1084005   .006078   .3546537   .1489718   .5658104
##              |
##          age |
##       45-54  |   1.49149   .1946732   .013833   1.484139   1.131913   1.887716
##       55-64  |  2.647907   .1827394   .013174    2.64017   2.307437   3.051508
##       65-74  |  3.368698   .1864861   .012197   3.362579   3.025836   3.769044
##       75-84  |  3.712312   .1973451   .011946   3.711851   3.338273   4.134095
##              |
##        _cons | -8.287494   .2669905   .015025  -8.291325  -8.821961  -7.780908
## ------------------------------------------------------------------------------
## Note: Variable log_py is included in the model as the offset.
## Note: Default priors are used for model parameters.
##
## Burn-in ...
## Simulation ...
##
## Model summary
## ------------------------------------------------------------------------------
## Likelihood:
##   deaths ~ poisson(xb_deaths)
##
## Priors:
##   {deaths:_cons smoke} ~ normal(0,100)                                     (1)
##         {deaths:i.age} ~ normal(0,10000)                                   (1)
##           {deaths:age} ~ normal(0,100)
## ------------------------------------------------------------------------------
## (1) Parameters are elements of the linear form xb_deaths.
##
## Bayesian Poisson regression                   MCMC iterations  =      12,500
## Random-walk Metropolis–Hastings sampling      Burn-in          =       2,500
##                                               MCMC sample size =      10,000
##                                               Number of obs    =          10
##                                               Acceptance rate  =       .2818
##                                               Efficiency:  min =     .005653
##                                                            avg =      .01223
## Log marginal-likelihood = -71.441567                       max =      .03652
##
## ------------------------------------------------------------------------------
##              |                                                Equal-tailed
##       deaths |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
## -------------+----------------------------------------------------------------
##        smoke |  .3462018   .1091205   .009213   .3437231   .1406685   .5704558
##              |
##          age |
##       45-54  |  1.376838   .1818841   .022073   1.371683   1.007062   1.733001
##       55-64  |  2.595201   .1714622   .022806   2.584071   2.281989   2.949362
##       65-74  |  3.259247   .1717166   .021148   3.247875   2.960456   3.617496
##       75-84  |  3.671238   .1836642   .021791   3.672013   3.292856   4.017376
##              |
##        _cons | -8.202599   .2510759   .026623    -8.1901  -8.706632  -7.756934
##              |
##          age |   .690597   9.899977   .518051   .5576908  -18.79259   20.23102
## ------------------------------------------------------------------------------
## Note: Variable log_py is included in the model as the offset.
## Note: Default priors are used for some model parameters.
##
##
## Chain 1
## Burn-in ...
## Simulation ...
##
## Chain 2
## Burn-in ...
## Simulation ...
##
## Chain 3
## Burn-in ...
## Simulation ...
##
## Chain 4
## Burn-in ...
## Simulation ...
##
## Model summary
## ------------------------------------------------------------------------------
## Likelihood:
##   deaths ~ poisson(xb_deaths)
##
## Prior:
##   {deaths:smoke i.age _cons} ~ normal(0,10000)                             (1)
## ------------------------------------------------------------------------------
## (1) Parameters are elements of the linear form xb_deaths.
##
## Bayesian Poisson regression                   Number of chains =           4
## Random-walk Metropolis–Hastings sampling      Per MCMC chain:
##                                                 Iterations     =       5,000
##                                                 Burn-in        =       1,000
##                                                 Sample size    =       4,000
##                                               Number of obs    =          10
##                                               Avg acceptance rate  =   .2262
##                                               Avg efficiency:  min = .009659
##                                                                avg =  .01542
##                                                                max =  .02122
## Avg log marginal-likelihood = -82.938672      Max Gelman–Rubin Rc  =   7.182
##
## ------------------------------------------------------------------------------
##              |                                                Equal-tailed
##       deaths |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
## -------------+----------------------------------------------------------------
##        smoke |  .4303658   .1391096   .007549   .4289995   .1707217   .6865947
##              |
##          age |
##       45-54  |  1.998429    .999939   .059009   1.577177   1.131391   3.676881
##       55-64  |  3.094335   .9058428   .051194   2.725766   2.301714   4.662266
##       65-74  |  3.864905   .9757828   .069524   3.459085   3.051721   5.506548
##       75-84  |  4.183364   .9209916   .066947   3.823151   3.328629   5.750522
##              |
##        _cons | -8.904458   1.051759   .084603  -8.509256   -10.7048  -7.873415
## ------------------------------------------------------------------------------
## Note: Variable log_py is included in the model as the offset.
## Note: Default priors are used for model parameters.
## Note: Default initial values are used for multiple chains.
## Note: Adaptation continues during simulation.
## Note: There is a high autocorrelation after 500 lags in at least one of the
##       chains.
##
## Posterior summary statistics                  Number of chains =           4
##                                               MCMC sample size =      16,000
##
## ------------------------------------------------------------------------------
##              |                                                Equal-tailed
##       deaths |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
## -------------+----------------------------------------------------------------
##        smoke |  .4303658   .1391096   .007549   .4289995   .1707217   .6865947
##              |
##          age |
##       45-54  |  1.998429    .999939   .059009   1.577177   1.131391   3.676881
##       55-64  |  3.094335   .9058428   .051194   2.725766   2.301714   4.662266
##       65-74  |  3.864905   .9757828   .069524   3.459085   3.051721   5.506548
##       75-84  |  4.183364   .9209916   .066947   3.823151   3.328629   5.750522
##              |
##        _cons | -8.904458   1.051759   .084603  -8.509256   -10.7048  -7.873415
## ------------------------------------------------------------------------------
##
## file post1.png saved as PNG format
##
## file post3.png saved as PNG format
##
## 1 2 3 4 5
## file diag_age2.png saved as PNG format
## file diag_age3.png saved as PNG format
## file diag_age4.png saved as PNG format
## file diag_age5.png saved as PNG format
Fitting the models in Stan and JAGS
Poisson Model dataset
library(dplyr)   # for mutate()
(data <- MLGdata::Britishdoc |>
  mutate(person.years = ifelse(person.years == 18793, 18790, person.years)))
| age   | smoke | person.years | deaths |
|-------|-------|--------------|--------|
| 35-44 | n     | 18790        | 2      |
| 35-44 | y     | 52407        | 32     |
| 45-54 | n     | 10673        | 12     |
| 45-54 | y     | 43248        | 104    |
| 55-64 | n     | 5710         | 28     |
| 55-64 | y     | 28612        | 206    |
| 65-74 | n     | 2585         | 28     |
| 65-74 | y     | 12663        | 186    |
| 75-84 | n     | 1462         | 31     |
| 75-84 | y     | 5317         | 102    |
Poisson Model STAN
- The brms package uses the STAN Bayesian software in the backend
library(brms)
# Convert age to factor if not already
data$age <- factor(data$age)

# Fit model
brm_fit <- brm(
  deaths ~ smoke + age + offset(log(person.years)),
  data = data,
  family = poisson(),
  prior = c(
    set_prior("normal(0, 0.04)", class = "Intercept"),
    set_prior("normal(0, 0.04)", class = "b")
  ),
  chains = 4,
  iter = 4000,
  warmup = 1000,
  thin = 1,
  seed = 678,
  control = list(adapt_delta = 0.95)
)
## Running "C:/PROGRA~1/R/R-44~1.3/bin/x64/Rcmd.exe" SHLIB foo.c
## using C compiler: 'gcc.exe (GCC) 13.3.0'
## gcc -I"C:/PROGRA~1/R/R-44~1.3/include" -DNDEBUG -I"C:/Users/r1953/AppData/Local/R/win-library/4.4/Rcpp/include/" -I"C:/Users/r1953/AppData/Local/R/win-library/4.4/RcppEigen/include/" -I"C:/Users/r1953/AppData/Local/R/win-library/4.4/RcppEigen/include/unsupported" -I"C:/Users/r1953/AppData/Local/R/win-library/4.4/BH/include" -I"C:/Users/r1953/AppData/Local/R/win-library/4.4/StanHeaders/include/src/" -I"C:/Users/r1953/AppData/Local/R/win-library/4.4/StanHeaders/include/" -I"C:/Users/r1953/AppData/Local/R/win-library/4.4/RcppParallel/include/" -DRCPP_PARALLEL_USE_TBB=1 -I"C:/Users/r1953/AppData/Local/R/win-library/4.4/rstan/include" -DEIGEN_NO_DEBUG -DBOOST_DISABLE_ASSERTS -DBOOST_PENDING_INTEGER_LOG2_HPP -DSTAN_THREADS -DUSE_STANC3 -DSTRICT_R_HEADERS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -D_HAS_AUTO_PTR_ETC=0 -include "C:/Users/r1953/AppData/Local/R/win-library/4.4/StanHeaders/include/stan/math/prim/fun/Eigen.hpp" -std=c++1y -I"C:/RBuildTools/4.4/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -mfpmath=sse -msse2 -mstackrealign -c foo.c -o foo.o
## cc1.exe: warning: command-line option '-std=c++14' is valid for C++/ObjC++ but not for C
## In file included from C:/Users/r1953/AppData/Local/R/win-library/4.4/RcppEigen/include/Eigen/Core:19,
## from C:/Users/r1953/AppData/Local/R/win-library/4.4/RcppEigen/include/Eigen/Dense:1,
## from C:/Users/r1953/AppData/Local/R/win-library/4.4/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22,
## from <command-line>:
## C:/Users/r1953/AppData/Local/R/win-library/4.4/RcppEigen/include/Eigen/src/Core/util/Macros.h:679:10: fatal error: cmath: No such file or directory
## 679 | #include <cmath>
## | ^~~~~~~
## compilation terminated.
## make: *** [C:/PROGRA~1/R/R-44~1.3/etc/x64/Makeconf:289: foo.o] Error 1
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 4e-05 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.4 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 4000 [ 0%] (Warmup)
## Chain 1: Iteration: 400 / 4000 [ 10%] (Warmup)
## Chain 1: Iteration: 800 / 4000 [ 20%] (Warmup)
## Chain 1: Iteration: 1001 / 4000 [ 25%] (Sampling)
## Chain 1: Iteration: 1400 / 4000 [ 35%] (Sampling)
## Chain 1: Iteration: 1800 / 4000 [ 45%] (Sampling)
## Chain 1: Iteration: 2200 / 4000 [ 55%] (Sampling)
## Chain 1: Iteration: 2600 / 4000 [ 65%] (Sampling)
## Chain 1: Iteration: 3000 / 4000 [ 75%] (Sampling)
## Chain 1: Iteration: 3400 / 4000 [ 85%] (Sampling)
## Chain 1: Iteration: 3800 / 4000 [ 95%] (Sampling)
## Chain 1: Iteration: 4000 / 4000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 0.033 seconds (Warm-up)
## Chain 1: 0.091 seconds (Sampling)
## Chain 1: 0.124 seconds (Total)
## Chain 1:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
## Chain 2:
## Chain 2: Gradient evaluation took 6e-06 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.06 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2:
## Chain 2:
## Chain 2: Iteration: 1 / 4000 [ 0%] (Warmup)
## Chain 2: Iteration: 400 / 4000 [ 10%] (Warmup)
## Chain 2: Iteration: 800 / 4000 [ 20%] (Warmup)
## Chain 2: Iteration: 1001 / 4000 [ 25%] (Sampling)
## Chain 2: Iteration: 1400 / 4000 [ 35%] (Sampling)
## Chain 2: Iteration: 1800 / 4000 [ 45%] (Sampling)
## Chain 2: Iteration: 2200 / 4000 [ 55%] (Sampling)
## Chain 2: Iteration: 2600 / 4000 [ 65%] (Sampling)
## Chain 2: Iteration: 3000 / 4000 [ 75%] (Sampling)
## Chain 2: Iteration: 3400 / 4000 [ 85%] (Sampling)
## Chain 2: Iteration: 3800 / 4000 [ 95%] (Sampling)
## Chain 2: Iteration: 4000 / 4000 [100%] (Sampling)
## Chain 2:
## Chain 2: Elapsed Time: 0.036 seconds (Warm-up)
## Chain 2: 0.093 seconds (Sampling)
## Chain 2: 0.129 seconds (Total)
## Chain 2:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
## Chain 3:
## Chain 3: Gradient evaluation took 6e-06 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.06 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3:
## Chain 3:
## Chain 3: Iteration: 1 / 4000 [ 0%] (Warmup)
## Chain 3: Iteration: 400 / 4000 [ 10%] (Warmup)
## Chain 3: Iteration: 800 / 4000 [ 20%] (Warmup)
## Chain 3: Iteration: 1001 / 4000 [ 25%] (Sampling)
## Chain 3: Iteration: 1400 / 4000 [ 35%] (Sampling)
## Chain 3: Iteration: 1800 / 4000 [ 45%] (Sampling)
## Chain 3: Iteration: 2200 / 4000 [ 55%] (Sampling)
## Chain 3: Iteration: 2600 / 4000 [ 65%] (Sampling)
## Chain 3: Iteration: 3000 / 4000 [ 75%] (Sampling)
## Chain 3: Iteration: 3400 / 4000 [ 85%] (Sampling)
## Chain 3: Iteration: 3800 / 4000 [ 95%] (Sampling)
## Chain 3: Iteration: 4000 / 4000 [100%] (Sampling)
## Chain 3:
## Chain 3: Elapsed Time: 0.035 seconds (Warm-up)
## Chain 3: 0.094 seconds (Sampling)
## Chain 3: 0.129 seconds (Total)
## Chain 3:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
## Chain 4:
## Chain 4: Gradient evaluation took 6e-06 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.06 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4:
## Chain 4:
## Chain 4: Iteration: 1 / 4000 [ 0%] (Warmup)
## Chain 4: Iteration: 400 / 4000 [ 10%] (Warmup)
## Chain 4: Iteration: 800 / 4000 [ 20%] (Warmup)
## Chain 4: Iteration: 1001 / 4000 [ 25%] (Sampling)
## Chain 4: Iteration: 1400 / 4000 [ 35%] (Sampling)
## Chain 4: Iteration: 1800 / 4000 [ 45%] (Sampling)
## Chain 4: Iteration: 2200 / 4000 [ 55%] (Sampling)
## Chain 4: Iteration: 2600 / 4000 [ 65%] (Sampling)
## Chain 4: Iteration: 3000 / 4000 [ 75%] (Sampling)
## Chain 4: Iteration: 3400 / 4000 [ 85%] (Sampling)
## Chain 4: Iteration: 3800 / 4000 [ 95%] (Sampling)
## Chain 4: Iteration: 4000 / 4000 [100%] (Sampling)
## Chain 4:
## Chain 4: Elapsed Time: 0.033 seconds (Warm-up)
## Chain 4: 0.097 seconds (Sampling)
## Chain 4: 0.13 seconds (Total)
## Chain 4:
# Summary and diagnostics
summary(brm_fit)
## Family: poisson
## Links: mu = log
## Formula: deaths ~ smoke + age + offset(log(person.years))
## Data: data (Number of observations: 10)
## Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
## total post-warmup draws = 12000
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -3.75 0.03 -3.81 -3.70 1.00 14341 8536
## smokey -0.51 0.03 -0.57 -0.46 1.00 14190 10102
## age45M54 -0.19 0.03 -0.25 -0.13 1.00 17261 8728
## age55M64 0.14 0.03 0.08 0.20 1.00 17112 8427
## age65M74 0.48 0.03 0.42 0.55 1.00 16093 9092
## age75M84 0.64 0.03 0.57 0.71 1.00 15625 9183
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
plot(brm_fit)
pp_check(brm_fit)
Poisson Model JAGS
- The rjags package uses the JAGS Bayesian software in the backend
library(rjags)
jags_code <- "
model {
  for (i in 1:N) {
    deaths[i] ~ dpois(mu[i])
    log(mu[i]) <- log(person_years[i]) + beta0 + beta_smoke*smoke[i] + beta_age[age[i]]
  }
  # Reference group (age category 1)
  beta_age[1] <- 0
  # Priors
  beta0 ~ dnorm(0, 0.01)
  beta_smoke ~ dnorm(0, 0.04)
  for (k in 2:5) {
    beta_age[k] ~ dnorm(0, 0.04)
  }
  # Derived quantities
  IRR_smoke <- exp(beta_smoke)
  for (k in 2:5) {
    IRR_age[k] <- exp(beta_age[k])
  }
}
"

jags_data <- list(
  N = nrow(data),
  deaths = data$deaths,
  person_years = data$person.years,
  smoke = data$smoke,
  age = data$age
)

jags_fit <- jags.model(textConnection(jags_code),
                       data = jags_data,
                       n.chains = 4,
                       n.adapt = 1000)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 10
## Unobserved stochastic nodes: 6
## Total graph size: 87
##
## Initializing model
update(jags_fit, 3000)
samples <- coda.samples(jags_fit,
                        c("beta_smoke", "beta_age", "IRR_smoke", "IRR_age"),
                        n.iter = 10000,
                        thin = 2)
library(coda)
plot(samples, density = FALSE)
par(mfrow = c(1,2))
autocorr.plot(samples)
summary(samples)
##
## Iterations = 4002:14000
## Thinning interval = 2
## Number of chains = 4
## Sample size per chain = 5000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## IRR_age[2] 4.4520 0.8676 0.006135 0.029480
## IRR_age[3] 13.9745 2.5551 0.018067 0.094903
## IRR_age[4] 28.7685 5.2726 0.037283 0.192853
## IRR_age[5] 40.7859 7.7873 0.055065 0.271373
## IRR_smoke 1.4377 0.1519 0.001074 0.006562
## beta_age[1] 0.0000 0.0000 0.000000 0.000000
## beta_age[2] 1.4750 0.1906 0.001348 0.006409
## beta_age[3] 2.6212 0.1783 0.001261 0.006399
## beta_age[4] 3.3431 0.1786 0.001263 0.006420
## beta_age[5] 3.6908 0.1865 0.001319 0.006395
## beta_smoke 0.3575 0.1052 0.000744 0.004567
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## IRR_age[2] 3.0607 3.8348 4.354 4.9646 6.4050
## IRR_age[3] 9.8505 12.1452 13.671 15.4678 19.7707
## IRR_age[4] 20.3394 24.9984 28.155 31.8616 40.5896
## IRR_age[5] 28.2888 35.2270 39.850 45.3397 58.2964
## IRR_smoke 1.1679 1.3325 1.426 1.5346 1.7570
## beta_age[1] 0.0000 0.0000 0.000 0.0000 0.0000
## beta_age[2] 1.1186 1.3441 1.471 1.6023 1.8571
## beta_age[3] 2.2875 2.4969 2.615 2.7388 2.9842
## beta_age[4] 3.0126 3.2188 3.338 3.4614 3.7035
## beta_age[5] 3.3425 3.5618 3.685 3.8142 4.0655
## beta_smoke 0.1552 0.2871 0.355 0.4283 0.5636
(c) Bayesian Formulation
Model Specification
The Poisson regression model has the following structure:
\[\begin{align*} \text{Deaths}_i &\sim \text{Poisson}(\lambda_i) \\ \log(\lambda_i) &= \beta_0 + \beta_{\text{smoke}} \text{smoke}_i + \sum_{k=1}^K \beta_{\text{age}_k} \text{age}_{ik} + \log(\text{person.years}_i) \end{align*}\]
where:
- \(\text{age}_{ik}\) is an indicator for the \(k\)-th age category
- \(K\) is the number of age categories
- \(\text{person.years}_i\) is the offset
Likelihood and Priors
The complete Bayesian formulation is:
\[\begin{align*} \text{Likelihood:} & \\ &\prod_{i=1}^n \frac{e^{-\lambda_i}\lambda_i^{y_i}}{y_i!}, \quad y_i = \text{deaths}_i \\ \\ \text{Priors:} & \\ \beta_0 &\sim \mathcal{N}(0, 0.04) \\ \beta_{\text{smoke}} &\sim \mathcal{N}(0, 0.04) \\ \beta_{\text{age}_k} &\sim \mathcal{N}(0, 0.04) \quad \text{for } k = 1,\ldots,K \\ \end{align*}\]
Joint Distribution Factorization
The joint posterior distribution factors as:
\[\begin{align*} p(\beta_0, \beta_{\text{smoke}}, \boldsymbol{\beta}_{\text{age}} \mid \mathbf{y}) &\propto \left[\prod_{i=1}^n \text{Poisson}(y_i \mid \lambda_i)\right] \\ &\times \mathcal{N}(\beta_0 \mid 0, 0.04) \\ &\times \mathcal{N}(\beta_{\text{smoke}} \mid 0, 0.04) \\ &\times \prod_{k=1}^K \mathcal{N}(\beta_{\text{age}_k} \mid 0, 0.04) \end{align*}\]
Conjugacy Analysis
- Non-conjugate components:
- The Poisson likelihood with normal priors on the log-linear parameters is not conjugate
- Potentially conjugate components:
- If using Gamma priors for the Poisson rate parameters, we would have conjugacy
- In this model with normal priors on the coefficients, all full conditionals require Metropolis-Hastings or Hamiltonian Monte Carlo updates
Full Conditionals
The model requires MCMC sampling as no closed-form conditionals exist:
\[ \begin{align} p(\beta_0 \mid \cdot) &\propto \exp\left\{ \sum_{i=1}^n \left( y_i\eta_i - e^{\eta_i} \right) - \frac{\beta_0^2}{0.08} \right\} \\ p(\beta_{\text{smoke}} \mid \cdot) &\propto \exp\left\{ \sum_{i=1}^n \left( y_i\eta_i - e^{\eta_i} \right) - \frac{\beta_{\text{smoke}}^2}{0.08} \right\} \\ p(\beta_{\text{age}_k} \mid \cdot) &\propto \exp\left\{ \sum_{i:x_{\text{age}_k,i}=1} \left( y_i\eta_i - e^{\eta_i} \right) - \frac{\beta_{\text{age}_k}^2}{0.08} \right\} \end{align} \]
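As an illustration of one such update, a single random-walk Metropolis step for \(\beta_{\text{smoke}}\) could look as follows (a minimal sketch, not the sampler used by Stata, brms or JAGS above; eta_rest denotes the offset plus the remaining linear-predictor terms, and the proposal standard deviation is arbitrary):
# One random-walk Metropolis step for beta_smoke, holding the other coefficients fixed
mh_step_smoke <- function(beta_smoke, eta_rest, smoke, y, prior_var = 0.04, prop_sd = 0.1) {
  log_post <- function(b) {
    eta <- eta_rest + b * smoke                       # full linear predictor (offset included in eta_rest)
    sum(y * eta - exp(eta)) - b^2 / (2 * prior_var)   # Poisson log-likelihood kernel + normal prior
  }
  prop <- beta_smoke + rnorm(1, 0, prop_sd)           # random-walk proposal
  if (log(runif(1)) < log_post(prop) - log_post(beta_smoke)) prop else beta_smoke
}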