Introduction

This document provides a comprehensive explanation of how INLA (Integrated Nested Laplace Approximation) works, based on a textbook example of normally distributed data with independent prior distributions.

What is INLA?

INLA is a computational method for Bayesian inference that avoids Markov Chain Monte Carlo (MCMC) by using:

  • Analytical approximations where possible
  • Laplace approximations for latent Gaussian models
  • Numerical integration over a grid of hyperparameters

The key advantage is speed: INLA can be hundreds or thousands of times faster than MCMC for certain classes of models.

Laplace Approximation: The Foundation

What is Laplace Approximation?

The Laplace approximation is fundamentally a second-order Taylor expansion around the mode of a log-posterior (or log-likelihood), which yields a Gaussian approximation.

The Core Idea

For a probability density function \(p(\theta)\), we approximate \(\log p(\theta)\) using a Taylor series around the mode \(\hat{\theta}\):

\[\log p(\theta) \approx \log p(\hat{\theta}) + \underbrace{\frac{d}{d\theta}\log p(\theta)\bigg|_{\hat{\theta}}}_{=0 \text{ at mode}} + \frac{1}{2}\frac{d^2}{d\theta^2}\log p(\theta)\bigg|_{\hat{\theta}}(\theta - \hat{\theta})^2\]

The first derivative term vanishes because we’re at the maximum (mode). This leaves:

\[\log p(\theta) \approx \log p(\hat{\theta}) - \frac{1}{2}|H|(\theta - \hat{\theta})^2\]

Where \(|H| = -\frac{d^2}{d\theta^2}\log p(\theta)\big|_{\hat{\theta}}\) is the negative Hessian (observed Fisher information).

Result: A Normal Distribution

Exponentiating gives:

\[p(\theta) \approx p(\hat{\theta}) \exp\left(-\frac{1}{2}|H|(\theta - \hat{\theta})^2\right)\]

Which is proportional to a Normal distribution with: - Mean = \(\hat{\theta}\) (the mode) - Variance = \(1/|H|\) (inverse of the negative second derivative)

INLA’s Use of Laplace

INLA takes this further with nested approximations:

  1. Inner level: Laplace approximation for latent effects \(\theta\) given hyperparameters \(\psi\)
  2. Outer level: Numerical integration over hyperparameters \(\psi\)

The key insight from this document is that INLA chooses \(\theta = \theta_n\) (the mode of \(p(\theta|\psi,y)\)) to evaluate \(p(\psi|y)\) efficiently, using the fact that the denominator \(p(\theta_n|\psi,y)\) has a simple Normal form.

Important Caveats

  • Accuracy improves as sample size increases (posterior becomes more Gaussian by Bernstein-von Mises theorem)
  • May fail for multimodal distributions or when mode is at boundary
  • INLA extends the basic Laplace approximation with simplified Laplace and adaptive quadrature for better accuracy

This is why in the R code later, den[h] <- dnorm(theta.n[h], theta.n[h], sd = sqrt(sigma2.n[h])) works - it’s evaluating the normalizing constant of the Laplace-approximated conditional posterior!

Mathematical Background

The Model

We have normally distributed data:

\[y_i \mid \mu, \sigma^2 \sim \text{Normal}(\mu, \sigma^2), \quad i = 1, \dots, n\]

Where:

  • \(y_i\) are independent observations
  • \(\mu\) is the unknown population mean
  • \(\sigma^2\) is the unknown variance

Prior Distributions

We choose independent priors:

  • For \(\mu\): \(\mu \sim \text{Normal}(\mu_0, \sigma_0^2)\)
  • For precision \(\psi = 1/\sigma^2\): \(\psi \sim \text{Gamma}(a, b)\)

Note: INLA typically works with precision (\(\psi = 1/\sigma^2\)) rather than variance because precisions have nicer mathematical properties in Gaussian models.

The Goal

We want the marginal posterior distributions:

\[p(\psi \mid \mathbf{y}) = \int p(\theta, \psi \mid \mathbf{y}) d\theta\]

\[p(\theta \mid \mathbf{y}) = \int p(\theta, \psi \mid \mathbf{y}) d\psi\]

Direct integration is computationally challenging. INLA solves this with a clever trick.

The INLA Trick

Step 1: Bayes’ Theorem

The joint posterior is:

\[p(\theta, \psi \mid \mathbf{y}) \propto p(\mathbf{y} \mid \theta, \psi) \times p(\theta) \times p(\psi)\]

Step 2: The Key Relationship

INLA uses the relationship:

\[p(\psi \mid \mathbf{y}) = \frac{p(\theta, \psi \mid \mathbf{y})}{p(\theta \mid \psi, \mathbf{y})} \propto \frac{p(\mathbf{y} \mid \theta, \psi) \times p(\theta) \times p(\psi)}{p(\theta \mid \psi, \mathbf{y})}\]

This holds for any value of \(\theta\)!

Step 3: Choose \(\theta = \theta_n\)

For normal data with normal prior, the conditional posterior is:

\[\theta \mid \psi, \mathbf{y} \sim \text{Normal}(\theta_n, \sigma_n^2)\]

Where:

\[\theta_n = \frac{\psi \sum_{i=1}^n y_i + \mu_0 / \sigma_0^2}{n\psi + 1/\sigma_0^2}\]

\[\sigma_n^2 = \frac{1}{n\psi + 1/\sigma_0^2}\]

By choosing \(\theta = \theta_n\) (the conditional posterior mode/mean), the denominator becomes:

\[p(\theta_n \mid \psi, \mathbf{y}) = \frac{1}{\sqrt{2\pi \sigma_n^2}}\]

Step 4: The Simplified Formula

\[p(\psi \mid \mathbf{y}) \propto \left. \frac{p(\mathbf{y} \mid \theta, \psi) p(\theta) p(\psi)}{p(\theta \mid \psi, \mathbf{y})} \right|_{\theta = \theta_n} = \frac{p(\mathbf{y} \mid \theta_n, \psi) \times p(\theta_n) \times p(\psi)}{1 / \sqrt{2\pi \sigma_n^2}}\]

Bayesian Derivation of \(\theta_n\)

The Conditional Posterior Distribution

The derivation of \(\theta_n\) comes directly from Bayes’ theorem:

\[p(\theta \mid \psi, \mathbf{y}) \propto p(\mathbf{y} \mid \theta, \psi) \times p(\theta)\]

The Likelihood

\[p(\mathbf{y} \mid \theta, \psi) = \prod_{i=1}^n \sqrt{\frac{\psi}{2\pi}} \exp\left(-\frac{\psi}{2}(y_i - \theta)^2\right)\]

The Prior for \(\theta\)

\[p(\theta) = \frac{1}{\sqrt{2\pi \sigma_0^2}} \exp\left(-\frac{1}{2\sigma_0^2}(\theta - \mu_0)^2\right)\]

Combining and Completing the Square

Multiplying and completing the square gives:

\[p(\theta \mid \psi, \mathbf{y}) \propto \exp\left(-\frac{1}{2}\left[(n\psi + 1/\sigma_0^2)\theta^2 - 2\theta(\psi \sum y_i + \mu_0 / \sigma_0^2)\right]\right)\]

The Resulting Normal Distribution

This is a Normal distribution with:

Mean (\(\theta_n\)): \[\theta_n = \frac{\psi \sum_{i=1}^n y_i + \mu_0 / \sigma_0^2}{n\psi + 1/\sigma_0^2}\]

Variance (\(\sigma_n^2\)): \[\sigma_n^2 = \frac{1}{n\psi + 1/\sigma_0^2}\]

Intuitive Interpretation

\(\theta_n\) is a weighted average of:

  • The sample mean \(\bar{y}\) with weight \(n\psi\) (data precision × sample size)
  • The prior mean \(\mu_0\) with weight \(1/\sigma_0^2\) (prior precision)

\[\theta_n = \frac{(n\psi) \cdot \bar{y} + (1/\sigma_0^2) \cdot \mu_0}{n\psi + 1/\sigma_0^2}\]


The INLA Algorithm

Overview

  1. Choose a grid of \(\psi\) values
  2. For each \(\psi^{(j)}\):
    • Compute \(\theta_n\) (conditional posterior mode)
    • Compute likelihood \(p(\mathbf{y} \mid \theta_n, \psi)\)
    • Compute prior \(p(\theta_n)\)
    • Compute prior \(p(\psi)\)
    • Compute denominator \(p(\theta_n \mid \psi, \mathbf{y})\)
    • Calculate unnormalized posterior
  3. Normalize \(p(\psi \mid \mathbf{y})\) numerically
  4. For each \(\theta\):
    • Compute conditional \(p(\theta \mid \psi, \mathbf{y})\) for all \(\psi\)
    • Integrate out \(\psi\): \(p(\theta \mid \mathbf{y}) = \int p(\theta \mid \psi, \mathbf{y}) p(\psi \mid \mathbf{y}) d\psi\)

Frequently Asked Questions

Q1: What is \(\theta_n\)?

Answer: \(\theta_n\) is the conditional posterior mode/mean of \(\theta\) given \(\psi\) and the data.

\[\theta_n = \frac{\psi \sum_{i=1}^n y_i + \mu_0 / \sigma_0^2}{n\psi + 1/\sigma_0^2}\]

Intuition: \(\theta_n\) combines two sources of information: - The data (through \(\bar{y}\)) - The prior (through \(\mu_0\))

Examples: - When \(\psi\) is large (precise data), \(\theta_n \approx \bar{y}\) (data dominates) - When \(\sigma_0^2\) is large (vague prior), \(\theta_n \approx \bar{y}\) (prior is weak) - When \(\sigma_0^2\) is small (informative prior), \(\theta_n\) is pulled toward \(\mu_0\)

Q2: How is \(p(\theta \mid \psi, \mathbf{y}) \propto p(\mathbf{y} \mid \theta, \psi) \times p(\theta)\) derived?

Answer: This is Bayes’ theorem applied conditionally on \(\psi\).

Step-by-step derivation:

  1. Start with Bayes’ theorem: \(p(\theta, \psi \mid \mathbf{y}) = \frac{p(\mathbf{y} \mid \theta, \psi) p(\theta, \psi)}{p(\mathbf{y})}\)
  2. Factor the joint prior: \(p(\theta, \psi) = p(\theta \mid \psi) \times p(\psi)\)
  3. Focus on \(\theta\) given \(\psi\): \(p(\theta \mid \psi, \mathbf{y}) = \frac{p(\theta, \psi \mid \mathbf{y})}{p(\psi \mid \mathbf{y})}\)
  4. Substitute 1 to 3 and simplify
  5. With independent priors (\(\theta \perp \psi\)), \(p(\theta \mid \psi) = p(\theta)\)

Q3: What is the purpose of creating a grid for \(\psi\)?

Answer: We discretize \(\psi\) to perform numerical integration. This does NOT mean we know \(\psi\)’s values!

Analogy: Think of it like a treasure hunt: - You don’t know where the treasure is (the true \(\psi\)) - But you decide to search at specific locations (grid points) - At each location, you measure “how likely” the treasure is there - Then you use these measurements to infer where the treasure probably is

Why we need a grid: - The integral \(\int p(\theta, \psi \mid \mathbf{y}) d\theta\) has no closed-form solution - Numerical integration requires evaluating the function at specific points

Grid considerations: - Too few points: May miss important features - Too many points: Computationally expensive - Wrong range: May miss the posterior entirely

Q4: After integration, why can the constant be used for normalizing?

Answer: The unnormalized posterior post.psi is proportional to the true density:

\[\text{post.psi} = c \times p(\psi \mid \mathbf{y})\]

Where \(c\) is an unknown constant, and the true density satisfies \(\int p(\psi \mid \mathbf{y}) d\psi = 1\).

Therefore:

\[\int \text{post.psi} \, d\psi = \int c \times p(\psi \mid \mathbf{y}) d\psi = c \times 1 = c\]

So the definite integral of the unnormalized density equals the normalizing constant! Then we can recover the true density by division: \(\frac{\text{post.psi}}{c} = p(\psi \mid \mathbf{y})\).

Q5: What is the purpose of Delta <- 1/sum(post.psi) in the code?

Answer: This converts the density to a probability mass function for numerical integration.

Explanation: We’re approximating the continuous integral:

\[p(\theta \mid \mathbf{y}) = \int p(\theta \mid \psi, \mathbf{y}) p(\psi \mid \mathbf{y}) d\psi\]

As a Riemann sum:

\[p(\theta \mid \mathbf{y}) \approx \sum_j p(\theta \mid \psi^{(j)}, \mathbf{y}) \times [p(\psi^{(j)} \mid \mathbf{y}) \times \Delta \psi_j]\]

The code combines steps: - post.psi is the density \(p(\psi^{(j)} \mid \mathbf{y})\) - Multiplying by Delta = 1/sum(post.psi) converts it to probability mass

The three-way comparison (INLA vs Manual vs MCMC) serves as validation

Method Type Role
INLA Package Professional software Reference implementation
Manual (Step-by-step) Our implementation Validates our understanding
MCMC/OpenBUGS Gold standard Validates both above

If all three match (which they should for this model), we can be confident that: - Our understanding of INLA is correct - Our manual implementation is correct - The INLA approximation is accurate


R Implementation

Step 1: Load Data and Set Parameters

# Load the data
y <- c(1.2697, 7.7637, 2.2532, 3.4557, 4.1776, 6.4320, -3.6623, 7.7567,
       5.9032, 7.2671, -2.3447, 8.0160, 3.5013, 2.8495, 0.6467, 3.2371,
       5.8573, -3.3749, 4.1507, 4.3092, 11.7327, 2.6174, 9.4942, -2.7639,
       -1.5859, 3.6986, 2.4544, -0.3294, 0.2329, 5.2846)

n <- length(y)
ybar <- mean(y)

# Prior parameters
mu0 <- -3          # prior mean for μ
sigma2_0 <- 4      # prior variance for μ
a <- 1.6           # gamma shape
b <- 0.4           # gamma rate

Step 2: Create Grid for \(\psi\)

# Grid parameters
H <- 25                    # Number of grid points
psi.min <- 0.001          # Minimum precision
psi.max <- 0.3            # Maximum precision

# Create grid
psi.grid <- seq(psi.min, psi.max, length.out = H)

# Prior for psi at grid points
hprior <- dgamma(psi.grid, shape = a, rate = b)

Step 3: Main Loop - Compute Unnormalized Posterior for \(\psi\)

# Initialize vectors
theta.n <- numeric(H)
sigma2.n <- numeric(H)
prior <- numeric(H)
lik <- numeric(H)
num <- numeric(H)
den <- numeric(H)

for (h in 1:H) {
  # Step 1: Compute θₙ (conditional posterior mode)
  theta.n[h] <- (psi.grid[h] * n * ybar + mu0 / sigma2_0) / 
                (psi.grid[h] * n + 1 / sigma2_0)
  
  # Step 2: Compute σₙ² (conditional posterior variance)
  sigma2.n[h] <- 1 / (n * psi.grid[h] + 1 / sigma2_0)
  
  # Step 3: Prior for θ at θₙ
  prior[h] <- dnorm(theta.n[h], mu0, sd = sqrt(sigma2_0))
  
  # Step 4: Likelihood of data given θₙ and ψ
  lik[h] <- prod(dnorm(y, theta.n[h], sd = 1 / sqrt(psi.grid[h])))
  
  # Step 5: Numerator = prior(ψ) × prior(θ) × likelihood
  num[h] <- hprior[h] * prior[h] * lik[h]
  
  # Step 6: Denominator = p(θₙ|ψ,y) at mode = 1/√(2πσₙ²)
  den[h] <- dnorm(theta.n[h], theta.n[h], sd = sqrt(sigma2.n[h]))
}

# Unnormalized posterior for ψ
post.psi <- num / den

Step 4: Normalize \(p(\psi \mid \mathbf{y})\)

# Create interpolation function
f.psi <- approxfun(psi.grid, post.psi, 
                   yleft = min(psi.grid), 
                   yright = max(psi.grid))

# Integrate to find normalizing constant
const <- integrate(f.psi, min(psi.grid), max(psi.grid))

# Normalize
post.psi <- post.psi / const$value

cat("Normalizing constant:", const$value, "\n")
## Normalizing constant: 9.169797e-42
cat("Posterior integrates to:", sum(post.psi) * diff(psi.grid)[1], "\n")
## Posterior integrates to: 0.9902876

Step 5: Compute Full Conditional for \(\theta\)

# Create grid for θ
J <- 50
theta.grid <- seq(-8, 5, length.out = J)

# Full conditional distributions p(θ|ψ,y)
full.cond.theta <- matrix(NA, J, H)

for (j in 1:J) {
  for (h in 1:H) {
    full.cond.theta[j, h] <- dnorm(theta.grid[j], 
                                   theta.n[h], 
                                   sd = sqrt(sigma2.n[h]))
  }
}

Step 6: Integrate Out \(\psi\) to Get \(p(\theta \mid \mathbf{y})\)

# Convert density to probability mass for numerical integration
Delta <- 1 / sum(post.psi)

# Weighted joint posterior
joint.post.theta.psi <- matrix(NA, J, H)
for (h in 1:H) {
  joint.post.theta.psi[, h] <- full.cond.theta[, h] * post.psi[h] * Delta
}

# Marginal posterior for θ (sum over ψ)
marg.post.theta <- rowSums(joint.post.theta.psi)

# Normalize
f.theta <- approxfun(theta.grid, marg.post.theta, 
                     yleft = min(theta.grid), 
                     yright = max(theta.grid))
const_theta <- integrate(f.theta, min(theta.grid), max(theta.grid))
marg.post.theta <- marg.post.theta / const_theta$value

Visualization of Results

Figure 1: Marginal Posterior of \(\psi\)

plot(psi.grid, post.psi, type = "l", lwd = 2,
     xlab = expression(psi), ylab = "Density",
     main = "Marginal Posterior of ψ",
     col = "blue")
grid()
Marginal Posterior of ψ

Marginal Posterior of ψ

Figure 2: Full Conditional Distributions \(p(\theta \mid \psi, \mathbf{y})\)

plot(theta.grid, full.cond.theta[, 1], type = "l", lwd = 0.7,
     xlab = expression(theta), ylab = "Density",
     main = "Full Conditional Distributions p(θ | ψ, y)",
     ylim = range(full.cond.theta))

for (h in 1:H) {
  lines(theta.grid, full.cond.theta[, h], lwd = 0.7)
}
grid()
Full Conditional Distributions p(θ | ψ, y)

Full Conditional Distributions p(θ | ψ, y)

Figure 3: Marginal Posterior of \(\theta\)

plot(theta.grid, marg.post.theta, type = "l", lwd = 2,
     xlab = expression(theta), ylab = "Density",
     main = "Marginal Posterior of θ",
     col = "darkred")
grid()
Marginal Posterior of θ

Marginal Posterior of θ


Verification with INLA Package

# Load INLA (uncomment to install)
# install.packages("INLA", repos = "https://inla.r-inla-download.org/R/stable")
library(INLA)

# Run INLA
formula <- y ~ 1
inla.output <- inla(formula, 
                    data = data.frame(y = y),
                    control.family = list(
                      hyper = list(
                        prec = list(prior = "loggamma", 
                                   param = c(a, b))
                      )
                    ),
                    control.fixed = list(
                      mean.intercept = mu0, 
                      prec.intercept = 1 / sigma2_0
                    ),
                    control.compute = list(config = TRUE),
                    verbose = FALSE)

# Extract results
inla_psi <- inla.output$marginals.hyperpar[[1]]
inla_theta <- inla.output$marginals.fixed$"(Intercept)"

# Interpolate INLA results to manual grid
inla_psi_interp <- approx(inla_psi[, 1], inla_psi[, 2], 
                          xout = psi.grid)$y
inla_theta_interp <- approx(inla_theta[, 1], inla_theta[, 2], 
                            xout = theta.grid)$y

Comparison Plots

par(mfrow = c(1, 2))

# ψ posterior comparison
plot(psi.grid, post.psi, type = "l", lwd = 2, col = "blue",
     xlab = expression(psi), ylab = "Density",
     main = expression("Posterior of " * psi),
     ylim = range(c(post.psi, inla_psi_interp), na.rm = TRUE))
lines(psi.grid, inla_psi_interp, col = "red", lwd = 2, lty = 2)
legend("topright", 
       legend = c("Manual", "INLA Package"),
       col = c("blue", "red"), lty = c(1, 2), lwd = 2)

# θ posterior comparison
plot(theta.grid, marg.post.theta, type = "l", lwd = 2, col = "blue",
     xlab = expression(theta), ylab = "Density",
     main = expression("Posterior of " * theta),
     ylim = range(c(marg.post.theta, inla_theta_interp), na.rm = TRUE))
lines(theta.grid, inla_theta_interp, col = "red", lwd = 2, lty = 2)
legend("topright",
       legend = c("Manual", "INLA Package"),
       col = c("blue", "red"), lty = c(1, 2), lwd = 2)

Numerical Comparison

# Error metrics for ψ
mae_psi <- mean(abs(post.psi - inla_psi_interp), na.rm = TRUE)
rmse_psi <- sqrt(mean((post.psi - inla_psi_interp)^2, na.rm = TRUE))
cor_psi <- cor(post.psi, inla_psi_interp, use = "complete.obs")

# Error metrics for θ
mae_theta <- mean(abs(marg.post.theta - inla_theta_interp), na.rm = TRUE)
rmse_theta <- sqrt(mean((marg.post.theta - inla_theta_interp)^2, na.rm = TRUE))
cor_theta <- cor(marg.post.theta, inla_theta_interp, use = "complete.obs")

# Summary table
comparison <- data.frame(
  Parameter = c("ψ (precision)", "θ (mean)"),
  MAE = c(mae_psi, mae_theta),
  RMSE = c(rmse_psi, rmse_theta),
  Correlation = c(cor_psi, cor_theta)
)

knitr::kable(comparison, digits = 6,
             caption = "Comparison between Manual Implementation and INLA Package")
Comparison between Manual Implementation and INLA Package
Parameter MAE RMSE Correlation
ψ (precision) 0.136572 0.162064 0.999973
θ (mean) 0.002716 0.003712 0.999899

Verification with OpenBUGS/(MCMC)

OpenBUGS Model File

Save this as “modelWorkingINLA_notcancel.txt”:

model {
    # Likelihood: Normal distribution for the data
    for (i in 1:n) {
        y[i] ~ dnorm(theta, psi)
    }
    
    # Prior for theta (mean)
    theta ~ dnorm(mu0, tau0)
    
    # Prior for psi (precision)
    psi ~ dgamma(a, b)
    
    # Optional: Transform psi to variance for easier interpretation
    sigma2 <- 1/psi
}
library(R2OpenBUGS)
working.dir <- getwd()
filein <- "modelWorkingINLA_notcancel.txt"
params <- c("theta","psi")
dataJags <- list(y=y,mu0=mu0,tau0=1/sigma2_0,a=a,b=b,n=n)
inits <- function(){
    list(theta=rnorm(1),psi=runif(1))
}
n.iter <- 10000
n.burnin <- 2000
n.thin <- floor((n.iter-n.burnin)/500)

mcmc <- bugs(dataJags, inits, params, model.file=filein,
    n.chains=2, n.iter, n.burnin, n.thin,
    DIC=TRUE, working.directory=working.dir, debug=FALSE)
psi.mcmc <- mcmc$sims.list$psi
theta.mcmc <- mcmc$sims.list$theta

# *** 
plot(inla.output$marginals.hyperpar[[1]], t="l",lwd=2, xlab=expression(sigma^2), ylab="",
     ylim=range(inla.output$marginals.hyperpar[[1]]))
lines(spline(psi.grid,post.psi),lty=2) #step by step
lines(density(psi.mcmc),lty=3,lwd=3,col='red') #MCMC
legend("topright",c("INLA","Step-by-step","MCMC"),lty=c(1,2,3),lwd=c(2,1,3),cex=2,bty="n")

rg <- range(inla.output$marginals.fixed$"(Intercept)"[,2])
plot(inla.output$marginals.fixed$"(Intercept)",t="l",xlab=expression(mu), ylab="",ylim=rg,lwd=2)
lines(theta.grid,apply(joint.post.theta.psi,1,sum),lty=2)#Step by step
lines(density(theta.mcmc),lty=3,lwd=3,col='red')#MCMC
legend("topright",c("INLA","Step-by-step","MCMC"),lty=c(1,2,3), lwd=c(2,1,3), cex=2,bty="n")

`

Three-Way Comparison Plot, the same as above plots.

par(mfrow = c(1, 2))

# ψ comparison
plot(inla_psi, type = "l", lwd = 2, 
     xlab = expression(psi), ylab = "Density",
     main = expression("Posterior of " * psi),
     ylim = range(c(post.psi, inla_psi[, 2], density(psi.mcmc)$y)))
lines(spline(psi.grid, post.psi), lty = 2, lwd = 2)
lines(density(psi.mcmc), lty = 3, lwd = 2)
legend("topright", 
       legend = c("INLA", "Manual", "MCMC"),
       lty = c(1, 2, 3), lwd = 2)

# θ comparison
plot(inla_theta, type = "l", lwd = 2,
     xlab = expression(theta), ylab = "Density",
     main = expression("Posterior of " * theta),
     ylim = range(c(marg.post.theta, inla_theta[, 2], density(theta.mcmc)$y)))
lines(theta.grid, marg.post.theta, lty = 2, lwd = 2)
lines(density(theta.mcmc), lty = 3, lwd = 2)
legend("topright",
       legend = c("INLA", "Manual", "MCMC"),
       lty = c(1, 2, 3), lwd = 2)


Summary of Key Concepts

What is \(\theta_n\)?

Property Value
Full name Conditional posterior mode/mean of \(\theta\)
What it represents Best estimate of \(\mu\) given \(\psi\) and data
Formula Weighted average of \(\bar{y}\) and \(\mu_0\)
Weights Precisions (\(n\psi\) for data, \(1/\sigma_0^2\) for prior)
Role in INLA Convenient point to evaluate the posterior

Why Create a Grid for \(\psi\)?

Aspect Explanation
Purpose Approximate continuous integration with discrete points
Why needed No closed-form solution for \(p(\psi \mid \mathbf{y})\)
How it works Evaluate posterior at H strategic points
Key insight Grid points are hypotheses, not known values!

Normalization Explained

Step What it does Why it works
Compute post.psi Gets unnormalized density \(c \times p(\psi \mid \mathbf{y})\) INLA gives relative probabilities
Integrate Finds \(\int \text{post.psi} d\psi = c\) Integral of unnormalized density = normalizing constant
Divide Computes \(\text{post.psi} / c = p(\psi \mid \mathbf{y})\) Division yields true probability density