Introduction

This document provides a comprehensive explanation of how INLA (Integrated Nested Laplace Approximation) works, based on a textbook example of normally distributed data with independent prior distributions.

What is INLA?

INLA is a computational method for Bayesian inference that avoids Markov Chain Monte Carlo (MCMC) by using:

Analytical approximations where possible
Laplace approximations for latent Gaussian models
Numerical integration over a grid of hyperparameters

The key advantage is speed: INLA can be hundreds or thousands of times faster than MCMC for certain classes of models.

Laplace Approximation: The Foundation

What is Laplace Approximation?

The Laplace approximation is fundamentally a second-order Taylor expansion around the mode of a log-posterior (or log-likelihood), which yields a Gaussian approximation.

The Core Idea

For a probability density function \(p(\theta)\), we approximate \(\log p(\theta)\) using a Taylor series around the mode \(\hat{\theta}\):

\[\log p(\theta) \approx \log p(\hat{\theta}) + \underbrace{\frac{d}{d\theta}\log p(\theta)\bigg|_{\hat{\theta}}}_{=0 \text{ at mode}} + \frac{1}{2}\frac{d^2}{d\theta^2}\log p(\theta)\bigg|_{\hat{\theta}}(\theta - \hat{\theta})^2\]

The first derivative term vanishes because we’re at the maximum (mode). This leaves:

\[\log p(\theta) \approx \log p(\hat{\theta}) - \frac{1}{2}|H|(\theta - \hat{\theta})^2\]

Where \(|H| = -\frac{d^2}{d\theta^2}\log p(\theta)\big|_{\hat{\theta}}\) is the negative Hessian (observed Fisher information).

Result: A Normal Distribution

Exponentiating gives:

\[p(\theta) \approx p(\hat{\theta}) \exp\left(-\frac{1}{2}|H|(\theta - \hat{\theta})^2\right)\]

Which is proportional to a Normal distribution with: - Mean = \(\hat{\theta}\) (the mode) - Variance = \(1/|H|\) (inverse of the negative second derivative)

INLA’s Use of Laplace

INLA takes this further with nested approximations:

Inner level: Laplace approximation for latent effects \(\theta\) given hyperparameters \(\psi\)
Outer level: Numerical integration over hyperparameters \(\psi\)

The key insight from this document is that INLA chooses \(\theta = \theta_n\) (the mode of \(p(\theta|\psi,y)\)) to evaluate \(p(\psi|y)\) efficiently, using the fact that the denominator \(p(\theta_n|\psi,y)\) has a simple Normal form.

Important Caveats

Accuracy improves as sample size increases (posterior becomes more Gaussian by Bernstein-von Mises theorem)
May fail for multimodal distributions or when mode is at boundary
INLA extends the basic Laplace approximation with simplified Laplace and adaptive quadrature for better accuracy

This is why in the R code later, den[h] <- dnorm(theta.n[h], theta.n[h], sd = sqrt(sigma2.n[h])) works - it’s evaluating the normalizing constant of the Laplace-approximated conditional posterior!

Mathematical Background

The Model

We have normally distributed data:

\[y_i \mid \mu, \sigma^2 \sim \text{Normal}(\mu, \sigma^2), \quad i = 1, \dots, n\]

Where:

\(y_i\) are independent observations
\(\mu\) is the unknown population mean
\(\sigma^2\) is the unknown variance

Prior Distributions

We choose independent priors:

For \(\mu\): \(\mu \sim \text{Normal}(\mu_0, \sigma_0^2)\)
For precision \(\psi = 1/\sigma^2\): \(\psi \sim \text{Gamma}(a, b)\)

Note: INLA typically works with precision (\(\psi = 1/\sigma^2\)) rather than variance because precisions have nicer mathematical properties in Gaussian models.

The Goal

We want the marginal posterior distributions:

\[p(\psi \mid \mathbf{y}) = \int p(\theta, \psi \mid \mathbf{y}) d\theta\]

\[p(\theta \mid \mathbf{y}) = \int p(\theta, \psi \mid \mathbf{y}) d\psi\]

Direct integration is computationally challenging. INLA solves this with a clever trick.

The INLA Trick

Step 1: Bayes’ Theorem

The joint posterior is:

\[p(\theta, \psi \mid \mathbf{y}) \propto p(\mathbf{y} \mid \theta, \psi) \times p(\theta) \times p(\psi)\]

Step 2: The Key Relationship

INLA uses the relationship:

\[p(\psi \mid \mathbf{y}) = \frac{p(\theta, \psi \mid \mathbf{y})}{p(\theta \mid \psi, \mathbf{y})} \propto \frac{p(\mathbf{y} \mid \theta, \psi) \times p(\theta) \times p(\psi)}{p(\theta \mid \psi, \mathbf{y})}\]

This holds for any value of \(\theta\)!

Step 3: Choose \(\theta = \theta_n\)

For normal data with normal prior, the conditional posterior is:

\[\theta \mid \psi, \mathbf{y} \sim \text{Normal}(\theta_n, \sigma_n^2)\]

Where:

\[\theta_n = \frac{\psi \sum_{i=1}^n y_i + \mu_0 / \sigma_0^2}{n\psi + 1/\sigma_0^2}\]

\[\sigma_n^2 = \frac{1}{n\psi + 1/\sigma_0^2}\]

By choosing \(\theta = \theta_n\) (the conditional posterior mode/mean), the denominator becomes:

\[p(\theta_n \mid \psi, \mathbf{y}) = \frac{1}{\sqrt{2\pi \sigma_n^2}}\]

Step 4: The Simplified Formula

\[p(\psi \mid \mathbf{y}) \propto \left. \frac{p(\mathbf{y} \mid \theta, \psi) p(\theta) p(\psi)}{p(\theta \mid \psi, \mathbf{y})} \right|_{\theta = \theta_n} = \frac{p(\mathbf{y} \mid \theta_n, \psi) \times p(\theta_n) \times p(\psi)}{1 / \sqrt{2\pi \sigma_n^2}}\]

Bayesian Derivation of \(\theta_n\)

The Conditional Posterior Distribution

The derivation of \(\theta_n\) comes directly from Bayes’ theorem:

\[p(\theta \mid \psi, \mathbf{y}) \propto p(\mathbf{y} \mid \theta, \psi) \times p(\theta)\]

The Likelihood

\[p(\mathbf{y} \mid \theta, \psi) = \prod_{i=1}^n \sqrt{\frac{\psi}{2\pi}} \exp\left(-\frac{\psi}{2}(y_i - \theta)^2\right)\]

The Prior for \(\theta\)

\[p(\theta) = \frac{1}{\sqrt{2\pi \sigma_0^2}} \exp\left(-\frac{1}{2\sigma_0^2}(\theta - \mu_0)^2\right)\]

Combining and Completing the Square

Multiplying and completing the square gives:

\[p(\theta \mid \psi, \mathbf{y}) \propto \exp\left(-\frac{1}{2}\left[(n\psi + 1/\sigma_0^2)\theta^2 - 2\theta(\psi \sum y_i + \mu_0 / \sigma_0^2)\right]\right)\]

The Resulting Normal Distribution

This is a Normal distribution with:

Mean (\(\theta_n\)): \[\theta_n = \frac{\psi \sum_{i=1}^n y_i + \mu_0 / \sigma_0^2}{n\psi + 1/\sigma_0^2}\]

Variance (\(\sigma_n^2\)): \[\sigma_n^2 = \frac{1}{n\psi + 1/\sigma_0^2}\]

Intuitive Interpretation

\(\theta_n\) is a weighted average of:

The sample mean \(\bar{y}\) with weight \(n\psi\) (data precision × sample size)
The prior mean \(\mu_0\) with weight \(1/\sigma_0^2\) (prior precision)

\[\theta_n = \frac{(n\psi) \cdot \bar{y} + (1/\sigma_0^2) \cdot \mu_0}{n\psi + 1/\sigma_0^2}\]

The INLA Algorithm

Overview

Choose a grid of \(\psi\) values
For each \(\psi^{(j)}\):
- Compute \(\theta_n\) (conditional posterior mode)
- Compute likelihood \(p(\mathbf{y} \mid \theta_n, \psi)\)
- Compute prior \(p(\theta_n)\)
- Compute prior \(p(\psi)\)
- Compute denominator \(p(\theta_n \mid \psi, \mathbf{y})\)
- Calculate unnormalized posterior
Normalize \(p(\psi \mid \mathbf{y})\) numerically
For each \(\theta\):
- Compute conditional \(p(\theta \mid \psi, \mathbf{y})\) for all \(\psi\)
- Integrate out \(\psi\): \(p(\theta \mid \mathbf{y}) = \int p(\theta \mid \psi, \mathbf{y}) p(\psi \mid \mathbf{y}) d\psi\)

Frequently Asked Questions

Q1: What is \(\theta_n\)?

Answer: \(\theta_n\) is the conditional posterior mode/mean of \(\theta\) given \(\psi\) and the data.

\[\theta_n = \frac{\psi \sum_{i=1}^n y_i + \mu_0 / \sigma_0^2}{n\psi + 1/\sigma_0^2}\]

Intuition: \(\theta_n\) combines two sources of information: - The data (through \(\bar{y}\)) - The prior (through \(\mu_0\))

Examples: - When \(\psi\) is large (precise data), \(\theta_n \approx \bar{y}\) (data dominates) - When \(\sigma_0^2\) is large (vague prior), \(\theta_n \approx \bar{y}\) (prior is weak) - When \(\sigma_0^2\) is small (informative prior), \(\theta_n\) is pulled toward \(\mu_0\)

Q2: How is \(p(\theta \mid \psi, \mathbf{y}) \propto p(\mathbf{y} \mid \theta, \psi) \times p(\theta)\) derived?

Answer: This is Bayes’ theorem applied conditionally on \(\psi\).

Step-by-step derivation:

Start with Bayes’ theorem: \(p(\theta, \psi \mid \mathbf{y}) = \frac{p(\mathbf{y} \mid \theta, \psi) p(\theta, \psi)}{p(\mathbf{y})}\)
Factor the joint prior: \(p(\theta, \psi) = p(\theta \mid \psi) \times p(\psi)\)
Focus on \(\theta\) given \(\psi\): \(p(\theta \mid \psi, \mathbf{y}) = \frac{p(\theta, \psi \mid \mathbf{y})}{p(\psi \mid \mathbf{y})}\)
Substitute 1 to 3 and simplify
With independent priors (\(\theta \perp \psi\)), \(p(\theta \mid \psi) = p(\theta)\)

Q3: What is the purpose of creating a grid for \(\psi\)?

Answer: We discretize \(\psi\) to perform numerical integration. This does NOT mean we know \(\psi\)’s values!

Analogy: Think of it like a treasure hunt: - You don’t know where the treasure is (the true \(\psi\)) - But you decide to search at specific locations (grid points) - At each location, you measure “how likely” the treasure is there - Then you use these measurements to infer where the treasure probably is

Why we need a grid: - The integral \(\int p(\theta, \psi \mid \mathbf{y}) d\theta\) has no closed-form solution - Numerical integration requires evaluating the function at specific points

Grid considerations: - Too few points: May miss important features - Too many points: Computationally expensive - Wrong range: May miss the posterior entirely

Q4: After integration, why can the constant be used for normalizing?

Answer: The unnormalized posterior post.psi is proportional to the true density:

\[\text{post.psi} = c \times p(\psi \mid \mathbf{y})\]

Where \(c\) is an unknown constant, and the true density satisfies \(\int p(\psi \mid \mathbf{y}) d\psi = 1\).

Therefore:

\[\int \text{post.psi} \, d\psi = \int c \times p(\psi \mid \mathbf{y}) d\psi = c \times 1 = c\]

So the definite integral of the unnormalized density equals the normalizing constant! Then we can recover the true density by division: \(\frac{\text{post.psi}}{c} = p(\psi \mid \mathbf{y})\).

Q5: What is the purpose of Delta <- 1/sum(post.psi) in the code?

Answer: This converts the density to a probability mass function for numerical integration.

Explanation: We’re approximating the continuous integral:

\[p(\theta \mid \mathbf{y}) = \int p(\theta \mid \psi, \mathbf{y}) p(\psi \mid \mathbf{y}) d\psi\]

As a Riemann sum:

\[p(\theta \mid \mathbf{y}) \approx \sum_j p(\theta \mid \psi^{(j)}, \mathbf{y}) \times [p(\psi^{(j)} \mid \mathbf{y}) \times \Delta \psi_j]\]

The code combines steps: - post.psi is the density \(p(\psi^{(j)} \mid \mathbf{y})\) - Multiplying by Delta = 1/sum(post.psi) converts it to probability mass

The three-way comparison (INLA vs Manual vs MCMC) serves as validation

Method	Type	Role
INLA Package	Professional software	Reference implementation
Manual (Step-by-step)	Our implementation	Validates our understanding
MCMC/OpenBUGS	Gold standard	Validates both above

If all three match (which they should for this model), we can be confident that: - Our understanding of INLA is correct - Our manual implementation is correct - The INLA approximation is accurate

R Implementation

Step 1: Load Data and Set Parameters

# Load the data
y <- c(1.2697, 7.7637, 2.2532, 3.4557, 4.1776, 6.4320, -3.6623, 7.7567,
       5.9032, 7.2671, -2.3447, 8.0160, 3.5013, 2.8495, 0.6467, 3.2371,
       5.8573, -3.3749, 4.1507, 4.3092, 11.7327, 2.6174, 9.4942, -2.7639,
       -1.5859, 3.6986, 2.4544, -0.3294, 0.2329, 5.2846)

n <- length(y)
ybar <- mean(y)

# Prior parameters
mu0 <- -3          # prior mean for μ
sigma2_0 <- 4      # prior variance for μ
a <- 1.6           # gamma shape
b <- 0.4           # gamma rate

Step 2: Create Grid for \(\psi\)

# Grid parameters
H <- 25                    # Number of grid points
psi.min <- 0.001          # Minimum precision
psi.max <- 0.3            # Maximum precision

# Create grid
psi.grid <- seq(psi.min, psi.max, length.out = H)

# Prior for psi at grid points
hprior <- dgamma(psi.grid, shape = a, rate = b)

Step 3: Main Loop - Compute Unnormalized Posterior for \(\psi\)

# Initialize vectors
theta.n <- numeric(H)
sigma2.n <- numeric(H)
prior <- numeric(H)
lik <- numeric(H)
num <- numeric(H)
den <- numeric(H)

for (h in 1:H) {
  # Step 1: Compute θₙ (conditional posterior mode)
  theta.n[h] <- (psi.grid[h] * n * ybar + mu0 / sigma2_0) / 
                (psi.grid[h] * n + 1 / sigma2_0)
  
  # Step 2: Compute σₙ² (conditional posterior variance)
  sigma2.n[h] <- 1 / (n * psi.grid[h] + 1 / sigma2_0)
  
  # Step 3: Prior for θ at θₙ
  prior[h] <- dnorm(theta.n[h], mu0, sd = sqrt(sigma2_0))
  
  # Step 4: Likelihood of data given θₙ and ψ
  lik[h] <- prod(dnorm(y, theta.n[h], sd = 1 / sqrt(psi.grid[h])))
  
  # Step 5: Numerator = prior(ψ) × prior(θ) × likelihood
  num[h] <- hprior[h] * prior[h] * lik[h]
  
  # Step 6: Denominator = p(θₙ|ψ,y) at mode = 1/√(2πσₙ²)
  den[h] <- dnorm(theta.n[h], theta.n[h], sd = sqrt(sigma2.n[h]))
}

# Unnormalized posterior for ψ
post.psi <- num / den

Step 4: Normalize \(p(\psi \mid \mathbf{y})\)

# Create interpolation function
f.psi <- approxfun(psi.grid, post.psi, 
                   yleft = min(psi.grid), 
                   yright = max(psi.grid))

# Integrate to find normalizing constant
const <- integrate(f.psi, min(psi.grid), max(psi.grid))

# Normalize
post.psi <- post.psi / const$value

cat("Normalizing constant:", const$value, "\n")

## Normalizing constant: 9.169797e-42

cat("Posterior integrates to:", sum(post.psi) * diff(psi.grid)[1], "\n")

## Posterior integrates to: 0.9902876

Step 5: Compute Full Conditional for \(\theta\)

# Create grid for θ
J <- 50
theta.grid <- seq(-8, 5, length.out = J)

# Full conditional distributions p(θ|ψ,y)
full.cond.theta <- matrix(NA, J, H)

for (j in 1:J) {
  for (h in 1:H) {
    full.cond.theta[j, h] <- dnorm(theta.grid[j], 
                                   theta.n[h], 
                                   sd = sqrt(sigma2.n[h]))
  }
}

Step 6: Integrate Out \(\psi\) to Get \(p(\theta \mid \mathbf{y})\)

# Convert density to probability mass for numerical integration
Delta <- 1 / sum(post.psi)

# Weighted joint posterior
joint.post.theta.psi <- matrix(NA, J, H)
for (h in 1:H) {
  joint.post.theta.psi[, h] <- full.cond.theta[, h] * post.psi[h] * Delta
}

# Marginal posterior for θ (sum over ψ)
marg.post.theta <- rowSums(joint.post.theta.psi)

# Normalize
f.theta <- approxfun(theta.grid, marg.post.theta, 
                     yleft = min(theta.grid), 
                     yright = max(theta.grid))
const_theta <- integrate(f.theta, min(theta.grid), max(theta.grid))
marg.post.theta <- marg.post.theta / const_theta$value

Visualization of Results

Figure 1: Marginal Posterior of \(\psi\)

plot(psi.grid, post.psi, type = "l", lwd = 2,
     xlab = expression(psi), ylab = "Density",
     main = "Marginal Posterior of ψ",
     col = "blue")
grid()

Marginal Posterior of ψ

Figure 2: Full Conditional Distributions \(p(\theta \mid \psi, \mathbf{y})\)

plot(theta.grid, full.cond.theta[, 1], type = "l", lwd = 0.7,
     xlab = expression(theta), ylab = "Density",
     main = "Full Conditional Distributions p(θ | ψ, y)",
     ylim = range(full.cond.theta))

for (h in 1:H) {
  lines(theta.grid, full.cond.theta[, h], lwd = 0.7)
}
grid()

Full Conditional Distributions p(θ | ψ, y)

Figure 3: Marginal Posterior of \(\theta\)

plot(theta.grid, marg.post.theta, type = "l", lwd = 2,
     xlab = expression(theta), ylab = "Density",
     main = "Marginal Posterior of θ",
     col = "darkred")
grid()

Marginal Posterior of θ

Verification with INLA Package

# Load INLA (uncomment to install)
# install.packages("INLA", repos = "https://inla.r-inla-download.org/R/stable")
library(INLA)

# Run INLA
formula <- y ~ 1
inla.output <- inla(formula, 
                    data = data.frame(y = y),
                    control.family = list(
                      hyper = list(
                        prec = list(prior = "loggamma", 
                                   param = c(a, b))
                      )
                    ),
                    control.fixed = list(
                      mean.intercept = mu0, 
                      prec.intercept = 1 / sigma2_0
                    ),
                    control.compute = list(config = TRUE),
                    verbose = FALSE)

# Extract results
inla_psi <- inla.output$marginals.hyperpar[[1]]
inla_theta <- inla.output$marginals.fixed$"(Intercept)"

# Interpolate INLA results to manual grid
inla_psi_interp <- approx(inla_psi[, 1], inla_psi[, 2], 
                          xout = psi.grid)$y
inla_theta_interp <- approx(inla_theta[, 1], inla_theta[, 2], 
                            xout = theta.grid)$y

Comparison Plots

par(mfrow = c(1, 2))

# ψ posterior comparison
plot(psi.grid, post.psi, type = "l", lwd = 2, col = "blue",
     xlab = expression(psi), ylab = "Density",
     main = expression("Posterior of " * psi),
     ylim = range(c(post.psi, inla_psi_interp), na.rm = TRUE))
lines(psi.grid, inla_psi_interp, col = "red", lwd = 2, lty = 2)
legend("topright", 
       legend = c("Manual", "INLA Package"),
       col = c("blue", "red"), lty = c(1, 2), lwd = 2)

# θ posterior comparison
plot(theta.grid, marg.post.theta, type = "l", lwd = 2, col = "blue",
     xlab = expression(theta), ylab = "Density",
     main = expression("Posterior of " * theta),
     ylim = range(c(marg.post.theta, inla_theta_interp), na.rm = TRUE))
lines(theta.grid, inla_theta_interp, col = "red", lwd = 2, lty = 2)
legend("topright",
       legend = c("Manual", "INLA Package"),
       col = c("blue", "red"), lty = c(1, 2), lwd = 2)

Numerical Comparison

# Error metrics for ψ
mae_psi <- mean(abs(post.psi - inla_psi_interp), na.rm = TRUE)
rmse_psi <- sqrt(mean((post.psi - inla_psi_interp)^2, na.rm = TRUE))
cor_psi <- cor(post.psi, inla_psi_interp, use = "complete.obs")

# Error metrics for θ
mae_theta <- mean(abs(marg.post.theta - inla_theta_interp), na.rm = TRUE)
rmse_theta <- sqrt(mean((marg.post.theta - inla_theta_interp)^2, na.rm = TRUE))
cor_theta <- cor(marg.post.theta, inla_theta_interp, use = "complete.obs")

# Summary table
comparison <- data.frame(
  Parameter = c("ψ (precision)", "θ (mean)"),
  MAE = c(mae_psi, mae_theta),
  RMSE = c(rmse_psi, rmse_theta),
  Correlation = c(cor_psi, cor_theta)
)

knitr::kable(comparison, digits = 6,
             caption = "Comparison between Manual Implementation and INLA Package")

Comparison between Manual Implementation and INLA Package
Parameter	MAE	RMSE	Correlation
ψ (precision)	0.136572	0.162064	0.999973
θ (mean)	0.002716	0.003712	0.999899

Verification with OpenBUGS/(MCMC)

OpenBUGS Model File

Save this as “modelWorkingINLA_notcancel.txt”:

model {
    # Likelihood: Normal distribution for the data
    for (i in 1:n) {
        y[i] ~ dnorm(theta, psi)
    }
    
    # Prior for theta (mean)
    theta ~ dnorm(mu0, tau0)
    
    # Prior for psi (precision)
    psi ~ dgamma(a, b)
    
    # Optional: Transform psi to variance for easier interpretation
    sigma2 <- 1/psi
}

library(R2OpenBUGS)
working.dir <- getwd()
filein <- "modelWorkingINLA_notcancel.txt"
params <- c("theta","psi")
dataJags <- list(y=y,mu0=mu0,tau0=1/sigma2_0,a=a,b=b,n=n)
inits <- function(){
    list(theta=rnorm(1),psi=runif(1))
}
n.iter <- 10000
n.burnin <- 2000
n.thin <- floor((n.iter-n.burnin)/500)

mcmc <- bugs(dataJags, inits, params, model.file=filein,
    n.chains=2, n.iter, n.burnin, n.thin,
    DIC=TRUE, working.directory=working.dir, debug=FALSE)
psi.mcmc <- mcmc$sims.list$psi
theta.mcmc <- mcmc$sims.list$theta

# *** 
plot(inla.output$marginals.hyperpar[[1]], t="l",lwd=2, xlab=expression(sigma^2), ylab="",
     ylim=range(inla.output$marginals.hyperpar[[1]]))
lines(spline(psi.grid,post.psi),lty=2) #step by step
lines(density(psi.mcmc),lty=3,lwd=3,col='red') #MCMC
legend("topright",c("INLA","Step-by-step","MCMC"),lty=c(1,2,3),lwd=c(2,1,3),cex=2,bty="n")

rg <- range(inla.output$marginals.fixed$"(Intercept)"[,2])
plot(inla.output$marginals.fixed$"(Intercept)",t="l",xlab=expression(mu), ylab="",ylim=rg,lwd=2)
lines(theta.grid,apply(joint.post.theta.psi,1,sum),lty=2)#Step by step
lines(density(theta.mcmc),lty=3,lwd=3,col='red')#MCMC
legend("topright",c("INLA","Step-by-step","MCMC"),lty=c(1,2,3), lwd=c(2,1,3), cex=2,bty="n")

Three-Way Comparison Plot, the same as above plots.

par(mfrow = c(1, 2))

# ψ comparison
plot(inla_psi, type = "l", lwd = 2, 
     xlab = expression(psi), ylab = "Density",
     main = expression("Posterior of " * psi),
     ylim = range(c(post.psi, inla_psi[, 2], density(psi.mcmc)$y)))
lines(spline(psi.grid, post.psi), lty = 2, lwd = 2)
lines(density(psi.mcmc), lty = 3, lwd = 2)
legend("topright", 
       legend = c("INLA", "Manual", "MCMC"),
       lty = c(1, 2, 3), lwd = 2)

# θ comparison
plot(inla_theta, type = "l", lwd = 2,
     xlab = expression(theta), ylab = "Density",
     main = expression("Posterior of " * theta),
     ylim = range(c(marg.post.theta, inla_theta[, 2], density(theta.mcmc)$y)))
lines(theta.grid, marg.post.theta, lty = 2, lwd = 2)
lines(density(theta.mcmc), lty = 3, lwd = 2)
legend("topright",
       legend = c("INLA", "Manual", "MCMC"),
       lty = c(1, 2, 3), lwd = 2)

Summary of Key Concepts

What is \(\theta_n\)?

Property	Value
Full name	Conditional posterior mode/mean of \(\theta\)
What it represents	Best estimate of \(\mu\) given \(\psi\) and data
Formula	Weighted average of \(\bar{y}\) and \(\mu_0\)
Weights	Precisions (\(n\psi\) for data, \(1/\sigma_0^2\) for prior)
Role in INLA	Convenient point to evaluate the posterior

Why Create a Grid for \(\psi\)?

Aspect	Explanation
Purpose	Approximate continuous integration with discrete points
Why needed	No closed-form solution for \(p(\psi \mid \mathbf{y})\)
How it works	Evaluate posterior at H strategic points
Key insight	Grid points are hypotheses, not known values!

Normalization Explained

Step	What it does	Why it works
Compute post.psi	Gets unnormalized density \(c \times p(\psi \mid \mathbf{y})\)	INLA gives relative probabilities
Integrate	Finds \(\int \text{post.psi} d\psi = c\)	Integral of unnormalized density = normalizing constant
Divide	Computes \(\text{post.psi} / c = p(\psi \mid \mathbf{y})\)	Division yields true probability density

Understanding INLA: A Step-by-Step Implementation Note

Enwu Liu

2026-06-14