Metropolis-Hastings for Bernoulli with Logistic Normal Prior

1. Problem Restatement

We have:

Observations \(y_1, \ldots, y_n \stackrel{\text{i.i.d.}}{\sim} \text{Bernoulli}(\theta)\)
Prior: \(\log\left( \frac{\theta}{1-\theta} \right) \sim \mathcal{N}(0, 1)\)

Prior density for \(\theta\) is:

\[ \pi(\theta) = \frac{1}{\sqrt{2\pi} \, \theta(1-\theta)} \exp\left[ -\frac12 \left( \log\frac{\theta}{1-\theta} \right)^2 \right], \quad 0<\theta<1 \]

The posterior is:

\[ \pi(\theta \mid y) \propto \pi(\theta) \times \theta^S (1-\theta)^{n-S} \]

where \(S = \sum_{i=1}^n y_i\).

The posterior is not conjugate, so the Bayesian estimator \(E(\theta \mid y)\) is not directly available.

We want the acceptance probability for an independence Metropolis-Hastings algorithm that uses the prior as the proposal distribution, independent of current state.

2. Metropolis-Hastings (MH) Setup

Let the target distribution be the posterior:

\[ \pi(\theta \mid y) \propto \pi(\theta) \times L(\theta) \]

where: \[ L(\theta) = \prod_{i=1}^n \theta^{y_i} (1-\theta)^{1-y_i} = \theta^{S} (1-\theta)^{n-S}, \quad S = \sum_{i=1}^n y_i \]

Hence:

\[ \pi(\theta \mid y) \propto \frac{1}{\sqrt{2\pi} \, \theta(1-\theta)} \exp\left[ -\frac12 \left( \log\frac{\theta}{1-\theta} \right)^2 \right] \times \theta^{S} (1-\theta)^{n-S} \]

Proposal: \(q(\theta' \mid \theta) = \pi(\theta')\) (the prior), independent of current \(\theta\).

3. MH Acceptance Probability Formula

For MH with proposal \(q(\theta' \mid \theta)\), at state \(\theta\), propose \(\theta' \sim q(\cdot|\theta)\) and accept with probability:

\[ \alpha(\theta, \theta') = \min\left\{ 1, \frac{\pi(\theta' \mid y)}{\pi(\theta \mid y)} \times \frac{q(\theta \mid \theta')}{q(\theta' \mid \theta)} \right\} \]

Here \(q(\theta' \mid \theta) = \pi(\theta')\) by design, so \(q(\theta \mid \theta') = \pi(\theta)\).

Thus:

\[ \frac{q(\theta \mid \theta')}{q(\theta' \mid \theta)} = \frac{\pi(\theta)}{\pi(\theta')} \]

4. Simplify Ratio of Target Distributions

\[ \frac{\pi(\theta' \mid y)}{\pi(\theta \mid y)} = \frac{\pi(\theta') L(\theta')}{\pi(\theta) L(\theta)} \]

Multiply by \(\frac{\pi(\theta)}{\pi(\theta')}\):

\[ \alpha = \min\left\{ 1, \frac{\pi(\theta') L(\theta')}{\pi(\theta) L(\theta)} \cdot \frac{\pi(\theta)}{\pi(\theta')} \right\} \]

The \(\pi(\theta)\) and \(\pi(\theta')\) cancel, leaving:

\[ \alpha = \min\left\{ 1, \frac{L(\theta')}{L(\theta)} \right\} \]

5. Result

The acceptance probability is:

\[ \boxed{\alpha(\theta, \theta') = \min\left\{ 1, \frac{(\theta')^{S} (1-\theta')^{n-S}}{\theta^{S} (1-\theta)^{n-S}} \right\}} \]

So the proposal prior cancels out: MH step reduces to accepting with probability \(\min(1, \text{likelihood ratio})\).

6. Algorithm Steps

Initialization:

Draw \(z^{(0)} \sim \mathcal{N}(0, 1)\)
Set \(\theta^{(0)} = \frac{e^{z^{(0)}}}{1 + e^{z^{(0)}}}\) (draw from prior)

For \(t = 1, 2, \ldots, T\):

Step 1: Propose - Draw \(z' \sim \mathcal{N}(0, 1)\) - Set \(\theta' = \frac{e^{z'}}{1 + e^{z'}}\) (draw from prior, independent of current)

Step 2: Compute acceptance probability - Compute \(r = \frac{(\theta')^{S} (1-\theta')^{n-S}}{(\theta^{(t-1)})^{S} (1-\theta^{(t-1)})^{n-S}}\) - Set \(\alpha = \min(1, r)\)

Step 3: Accept or reject - Draw \(u \sim \text{Uniform}(0, 1)\) - If \(u < \alpha\): set \(\theta^{(t)} = \theta'\) (accept) - Else: set \(\theta^{(t)} = \theta^{(t-1)}\) (reject)

Step 4: Repeat steps 1-3 for \(t = 2, \ldots, T\)

7. R Implementation

# Generate synthetic data
set.seed(123)
n <- 100
true_theta <- 0.3
y <- rbinom(n, 1, true_theta)
S <- sum(y)
cat("Observed data: n =", n, ", S =", S, "\n")

## Observed data: n = 100 , S = 29

cat("True theta =", true_theta, "\n")

## True theta = 0.3

# Function to draw from prior (proposal distribution)
draw_from_prior <- function() {
  z <- rnorm(1, 0, 1)
  theta <- exp(z) / (1 + exp(z))
  return(theta)
}

# Function to compute likelihood
likelihood <- function(theta, S, n) {
  return(theta^S * (1-theta)^(n-S))
}

# Metropolis-Hastings independence sampler
mh_independence <- function(S, n, n_iter, initial_theta = NULL) {
  # Initialize: draw from prior if not specified
  if (is.null(initial_theta)) {
    theta_current <- draw_from_prior()
  } else {
    theta_current <- initial_theta
  }
  
  # Store samples
  samples <- numeric(n_iter)
  samples[1] <- theta_current
  acceptance_count <- 0
  
  # MCMC loop
  for (t in 2:n_iter) {
    # Propose from prior (independent of current)
    theta_proposed <- draw_from_prior()
    
    # Compute likelihood ratio
    L_current <- likelihood(theta_current, S, n)
    L_proposed <- likelihood(theta_proposed, S, n)
    
    # Handle numerical stability
    if (L_proposed == 0 && L_current == 0) {
      # Use log scale for very small values
      log_r <- S * (log(theta_proposed) - log(theta_current)) + 
               (n-S) * (log(1-theta_proposed) - log(1-theta_current))
      r <- exp(min(log_r, 700))  # Cap to avoid Inf
    } else {
      r <- L_proposed / L_current
    }
    
    alpha <- min(1, r)
    
    # Accept/reject
    if (runif(1) < alpha) {
      theta_current <- theta_proposed
      acceptance_count <- acceptance_count + 1
    }
    
    samples[t] <- theta_current
  }
  
  acceptance_rate <- acceptance_count / (n_iter - 1)
  
  return(list(samples = samples, acceptance_rate = acceptance_rate))
}

# Run the sampler
n_iter <- 10000
results <- mh_independence(S, n, n_iter)

cat("\nAcceptance rate:", round(results$acceptance_rate * 100, 2), "%\n")

## 
## Acceptance rate: 18.69 %

# Posterior mean (Bayes estimator under squared error loss)
posterior_mean <- mean(results$samples)
cat("Posterior mean (Bayes estimator):", round(posterior_mean, 4), "\n")

## Posterior mean (Bayes estimator): 0.2979

# Compare with MLE
mle <- S / n
cat("MLE:", round(mle, 4), "\n")

## MLE: 0.29

8. Diagnostics

# Trace plot
plot(results$samples, type = "l", 
     main = "Trace Plot of θ",
     xlab = "Iteration", ylab = "θ",
     col = "steelblue")
abline(h = true_theta, col = "red", lwd = 2, lty = 2)
abline(h = posterior_mean, col = "darkgreen", lwd = 2)
legend("topright", 
       legend = c("True θ", "Posterior Mean"),
       col = c("red", "darkgreen"), 
       lty = c(2, 1), lwd = 2)

# Histogram of posterior samples (after burn-in)
burn_in <- 1000
post_burnin <- results$samples[-(1:burn_in)]

hist(post_burnin, breaks = 50, 
     main = "Posterior Distribution of θ",
     xlab = "θ", ylab = "Frequency",
     col = "lightblue", border = "white")
abline(v = true_theta, col = "red", lwd = 2, lty = 2)
abline(v = posterior_mean, col = "darkgreen", lwd = 2)
abline(v = mle, col = "purple", lwd = 2, lty = 3)
legend("topright",
       legend = c("True θ", "Posterior Mean", "MLE"),
       col = c("red", "darkgreen", "purple"),
       lty = c(2, 1, 3), lwd = 2)

# Autocorrelation
acf(post_burnin, main = "Autocorrelation of θ (after burn-in)")

# Effective sample size
library(coda)
effective_size <- effectiveSize(post_burnin)
cat("\nEffective sample size:", round(effective_size, 0), "\n")

## 
## Effective sample size: 1249

cat("Efficiency (ESS/n_iter):", round(effective_size / length(post_burnin) * 100, 2), "%\n")

## Efficiency (ESS/n_iter): 13.88 %

9. Summary Statistics

summary_stats <- data.frame(
  Statistic = c("Mean", "Median", "SD", "2.5% Quantile", "97.5% Quantile"),
  Value = c(mean(post_burnin), 
            median(post_burnin), 
            sd(post_burnin),
            quantile(post_burnin, 0.025),
            quantile(post_burnin, 0.975))
)
print(summary_stats)

##        Statistic      Value
## 1           Mean 0.29827353
## 2         Median 0.29607928
## 3             SD 0.04419668
## 4  2.5% Quantile 0.21820202
## 5 97.5% Quantile 0.38711688

10. Key Points

Initialization: We draw \(\theta^{(0)}\) from the prior (via \(\mathcal{N}(0,1)\) on logit scale)
Each iteration: Propose \(\theta'\) from prior (independent of current state)
Acceptance probability: \(\min(1, \text{likelihood ratio})\) — prior cancels out!
This is an independence sampler — proposals don’t depend on current state
The acceptance rate is determined solely by how well the prior matches the likelihood. If the prior is informative and close to the true \(\theta\), acceptance will be high. Otherwise, it may be low. ```