Intro to Spatial Modeling Using INLA

=================================================================================================== PREVIOUS: https://rpubs.com/sherloconan/1198601

THIS: https://rpubs.com/sherloconan/1339374

MORE: https://github.com/zhengxiaoUVic/Spatial

0. Square Lattice on $[0,1]^2$

Remark 1: None of the existing approaches take advantage of the additional spatial structure of the ST data; that is, the gene expression levels are measured on a lattice grid ($\texttt{BOOST-MI}$; Jiang et al., 2022).

nx <- 80; ny <- 80 # N = 6,400 points
coords <- expand.grid(x=seq(0, 1, length.out=nx),
                      y=seq(0, 1, length.out=ny))
coords <- as.matrix(coords) # convert to numeric matrix N × 2 for INLA functions

For some models, INLA considers data sorted by column, i.e., a vector with the first column of the grid from top to bottom, followed by the second column and so on (Gómez-Rubio, 2021, ch. 7).

lower left corner by rows: internally stored in expand.grid
upper left corner by columns: mapping required by INLA

1. Triangulation

Build a slightly “tight” polygonal boundary around the points (negative convex shrinks).

Create a Delaunay triangulation constrained to bnd.

The max.edge parameter controls the mesh’s granularity (i.e., the largest allowed triangle edge length).
A lower max.edge=c(inside the boundary triangle, outside the boundary triangle) indicates a higher resolution.

bnd  <- inla.nonconvex.hull(coords, convex=-0.05) # polygonal boundary around the points
mesh <- inla.mesh.2d(boundary=bnd,
                     max.edge=c(0.03, 0.07),
                     cutoff=0.01) # minimum distance to avoid duplicating very close vertices
cat("The number of vertices is", mesh$n)

## The number of vertices is 4165

Result: mesh$loc is $n_{\text{spde}}\times3\ (x,y,\color{gray}{z=0})$, and mesh$n is the vertex count.

What is a Mesh?

A mesh is a triangulated representation of a spatial domain $\mathcal{D}$. It is created by dividing the domain into triangles (or tetrahedrons in 3D). This structure allows the model to approximate the spatial processes over these smaller regions. The mesh helps in defining the precision matrix ${\mathbf{Q}}$ of the GMRF that represents the spatial random effect. The mesh is passed to the inla.spde2.matern or inla.spde2.pcmatern function to define a spatial model.

Reasons for Triangles Outside the Domain

By including these extra triangles offset, the mesh ensures more accurate modeling of spatial processes near the boundaries, improves numerical stability, and enhances the overall quality of the spatial analysis.

Remark 2: For $n\times n$ lattice data, a more straightforward approach may involve using an intrinsic GMRF directly. Neighbors are the points directly adjacent on the lattice.

Besag: Used for spatial data that are defined on a lattice. This model is part of the family of intrinsic conditional autoregressive (ICAR) models. Read more.
SPDE: Used for modeling spatial fields that are continuous over an area, particularly useful for unstructured meshes and non-lattice data. Increasing alpha implies that the resulting GRF will have more differentiable sample paths.

2. Baseline Matérn parameters

SPDE representation of a GRF, $d=2$

Remark 3: Non-stationary models are supported for $\alpha=2$ only in the current implementation of INLA (v25.06.07).

Smoothness: $\alpha=\nu+d/2\equiv\color{red}{2}\implies \nu=1$ (non-mean-square differentiable)
Spatial scale: $\kappa=2\sqrt{2\nu}\ /\ \rho=2\sqrt{2}\ /\ \rho$ (autocorrelation is around 0.14 at $\lVert\mathbf{s}_i-\mathbf{s}_j\rVert=\rho$)
Marginal precision: $\tau^2=\frac{\Gamma(\nu)}{\Gamma(\alpha)(4\pi)^{d/2}\kappa^{2\nu}}\sigma^{-2}=\frac{1}{4\pi\kappa^2}\sigma^{-2}$

nu      <- 1                        # smoothness

range0  <- 0.30                     # baseline spatial range
kappa0  <- sqrt(8*nu) / range0      # baseline spatial scale

sigma0  <- 1                        # baseline marginal standard deviation
tau0    <- 1 / (2 * sqrt(pi) * kappa0 * sigma0)

3. Non-stationarity

Non-stationarity is achieved when $\tau(\mathbf{s})$ and/or $\kappa(\mathbf{s})$ are non-constant for $\mathbf{s}$.

Let $\ln\kappa$ jump on the right half and $\ln\tau$ vary smoothly with $x>0.5$.

xv <- mesh$loc[,1] # extract the x-coordinate at each mesh vertex
x_centered <- as.numeric(scale(xv, scale=F)) # mean-centered x (for identifiability)
right_half <- as.numeric(xv > 0.5) # indicator basis (0 left, 1 right) to allow a jump across x=0.5

The lengths of x_centered and right_half are each equal to the vertex count.

4. $\mathbf{B}$-matrices

Build $\mathbf{B}$ matrices with offsets from Section 2 and $x$ coordinates from Section 3

\[\begin{cases} \ \ln\tau(\mathbf{s})=\sum_i\theta_i^{(\tau)}\!\cdot B_i^{(\tau)}(\mathbf{s})=B_0^{(\tau)}(\mathbf{s})+\theta_1\!\cdot B^{(\tau)}(\mathbf{s}) \\ \\ \ \ln\kappa(\mathbf{s})=\sum_i\theta_i^{(\kappa)}\!\cdot B_i^{(\kappa)}(\mathbf{s})=B_0^{(\kappa)}(\mathbf{s})+\theta_2\!\cdot B^{(\kappa)}(\mathbf{s}) \end{cases}\]

log_tau0_mesh   <- rep(log(tau0),   mesh$n)  # offsets at each mesh vertex (NOT multiplied by theta)
log_kappa0_mesh <- rep(log(kappa0), mesh$n)

B.tau   <- cbind(log_tau0_mesh,   x_centered, 0)             # offset + theta1 * x_centered
B.kappa <- cbind(log_kappa0_mesh, 0,          right_half)    # offset + theta2 * 1{xv > 0.5}

Each $\mathbf{B}$ is $n_{\text{spde}}\times3$: one offset and two $\theta$-columns (n.theta=2).

5. SPDE model object

Build the non-stationary Matérn‐SPDE on the mesh, using the $\mathbf{B}$-matrices

Assume weak Gaussian priors, $\theta_i\sim\mathcal{N}(0,10^2)$ for $i\in\{1,2\}$.

spde <- inla.spde2.matern(
  mesh,
  B.tau   = B.tau,
  B.kappa = B.kappa,
  theta.prior.mean = c(0, 0),        # priors on theta1, theta2 (log-scale coefficients)
  theta.prior.prec = c(0.01, 0.01)   # weakly informative (sd = 10)
)

6. Simulate a latent non-stationary field

$\mathbf{w}\sim\boldsymbol{\mathcal{N}}(\mathbf{0},\mathbf{Q}^{-1})$

Choose ground-truth non-stationary effects ($\theta_1=a_1$ and $\theta_2=b_2$).

theta_true <- c(
  a1 =  0.30,   # increases log tau with x  -->  lowers variance to the right
  b1 = -0.50    # decreases log kappa on right_half  -->  increases range on right
)

Q <- inla.spde2.precision(spde, theta=theta_true) # sparse precision matrix of the GMRF for the SPDE

# one (n = 1) sample path of the latent GMRF on mesh vertices
w <- as.numeric(inla.qsample(n=1, Q=Q, seed=277))

7. Map to the lattice and add noise

Projection matrix $\mathbf{A}$ of size $N\times n_{\text{spde}}$ that interpolates from mesh vertices to observation points.

Observations: $y_i=w(\mathbf{s}_i)+\epsilon_i$.

Baseline spatial variance component (partial sill): $\sigma_s^2=1$.

Non-spatial variance component (nugget): $\sigma_\epsilon^2=0.2^2$.

A_grid <- inla.spde.make.A(mesh, loc=coords) # projection matrix
noiseless <- as.numeric(A_grid %*% w)

noise_sd <- 0.2 # observation noise (nugget effect) sd
set.seed(277)
y <- noiseless + rnorm(nrow(coords), 0, noise_sd)

image(matrix(noiseless, ny, nx)[ny:1, ], # flip the row to make yv increase upwards
      col=gray.colors(64), main="True latent field (noiseless)") # quick looks

image(matrix(y, ny, nx)[ny:1, ], # flip the row to make yv increase upwards
      col=gray.colors(64), main="Observed latent field") # quick looks

8. Stack for INLA

In the model $y_i=w(\mathbf{s}_i)+\epsilon_i$, no intercept is included, implying a zero mean.

# build the indexing list for the random effect `f()`
w_idx <- inla.spde.make.index("w", n.spde=spde$n.spde)

stk <- inla.stack(
  data    = list(y=y),
  A       = list(A_grid),     # projection matrix
  effects = list(w=w_idx),    # effects list containing the SPDE index for `w`
  tag     = "est"
)

9A. Fit the non-stationary model

The penalized complexity (PC) prior: $\mathbb{P}(\sigma>\sigma_0)=p_{\sigma}$,
e.g., the probability of $\sigma$ being greater than 1 is equal to 1% if $\mathbb{P}(\sigma>1)=0.01$.

fit <- inla(
  y ~ -1 + f(w, model=spde),                   # remove the intercept
  data = inla.stack.data(stk),                 # unwraps the stack to data lists
  family = "gaussian",
  control.family = list(hyper = list(
    prec = list(prior="pc.prec", param=c(1, 0.01))   # PC prior on the noise precision
  )),
  control.predictor = list(A=inla.stack.A(stk), compute=T),
  control.compute=list("config"=T, "dic"=T, "waic"=T)
) # roughly 3 seconds of run time

# display posterior summary for all hyperparameters
cat("\n--- Hyperparameters ---\n"); print(fit$summary.hyperpar)

## 
## --- Hyperparameters ---

##                                               mean         sd  0.025quant
## Precision for the Gaussian observations 25.4776175 0.53238957 24.44367369
## Theta1 for w                             0.2382861 0.08051712  0.08025481
## Theta2 for w                            -0.7298762 0.23936825 -1.24268395
##                                           0.5quant 0.975quant       mode
## Precision for the Gaussian observations 25.4727039 26.5397836 25.4642917
## Theta1 for w                             0.2381205  0.3972812  0.2374330
## Theta2 for w                            -0.7158623 -0.3064614 -0.6471118

cat("True precision for the Gaussian observations is", 1/noise_sd^2,
    "\nTrue Theta1 for w is", unname(theta_true[1]),
    "\nTrue Theta2 for w is", unname(theta_true[2]))

## True precision for the Gaussian observations is 25 
## True Theta1 for w is 0.3 
## True Theta2 for w is -0.5

## summary(fit)
fit$mlik # log marginal-likelihood

##                                            [,1]
## log marginal-likelihood (integration) -281.5851
## log marginal-likelihood (Gaussian)    -280.3409

Warning: The $\mathbf{A}$-matrix in the predictor (see ?control.predictor) is specified but an intercept is in the formula. This will likely result in the intercept being applied multiple times in the model, and is likely not what you want. See ?inla.stack for more information. You can remove the intercept adding -1 to the formula.

Two methods of computing the log marginal likelihood are

Integration: This approach computes the mlik by directly integrating the likelihood function over the parameter space. This method is typically more accurate but can be computationally intensive. It’s often used when precision is essential for model comparison.
Gaussian: This method approximates mlik using a normal approximation. It is generally faster and less computationally demanding than direct integration, but it might be less accurate, particularly in cases where the true likelihood is not well-approximated by a normal distribution (Rue, Martino, & Chopin, 2009, p. 345).

9B. Fit the stationary model

Keep the same mesh, A_grid, and y.

Replace the SPDE block with a stationary one (Option A with spde2.pcmatern, or Option B with constant $\mathbf{B}$ matrices).

Rebuild the stack with a new index name, fit, and compare (WAIC/DIC, posterior range/sigma).

spde_st <- inla.spde2.pcmatern(
  mesh = mesh, alpha = 2,
  prior.range = c(0.2, 0.05),   # P(range < 0.2) = 0.05
  prior.sigma = c(1.0, 0.05)    # P(sigma > 1.0) = 0.05
)

wst_idx <- inla.spde.make.index("w_st", n.spde=spde_st$n.spde) # index & stack

stk_st <- inla.stack(
  data    = list(y = y),
  A       = list(A_grid),
  effects = list(w_st = wst_idx)
)

# fit stationary model
fit_st <- inla(
  y ~ -1 + f(w_st, model=spde_st),
  data = inla.stack.data(stk_st),
  family = "gaussian",
  control.family = list(hyper = list(
    prec = list(prior="pc.prec", param=c(1, 0.01))   # noise prior (same as before)
  )),
  control.predictor = list(A=inla.stack.A(stk_st), compute=T),
  control.compute=list("config"=T, "dic"=T, "waic"=T)
) # roughly 3 seconds of run time

## summary(fit_st)
fit_st$mlik # log marginal-likelihood

##                                            [,1]
## log marginal-likelihood (integration) -284.2216
## log marginal-likelihood (Gaussian)    -282.9829

bf10 <- exp(fit$mlik - fit_st$mlik) # Bayes factor against the null (stationarity)
rownames(bf10) <- c("integration", "Gaussian"); colnames(bf10) <- "BF10.est"
bf10

##             BF10.est
## integration 13.96376
## Gaussian    14.04169

10. Recover spatially varying parameters from posteriors

INLA reports Theta1 and Theta2 for the SPDE (our two non-stationary coefficients).

th_rows <- grep("^Theta", rownames(fit$summary.hyperpar))
theta_hat <- fit$summary.hyperpar[th_rows, "mean"]  # posterior means (theta1_hat, theta2_hat)

# ln tau(s) = B.tau[,1] + B.tau[,2:3] %*% theta
log_tau_mesh_hat   <- as.numeric(B.tau[,1]   + B.tau[, -1, drop=F]   %*% theta_hat)

# ln kappa(s) = B.kappa[,1]+ B.kappa[,2:3] %*% theta
log_kappa_mesh_hat <- as.numeric(B.kappa[,1] + B.kappa[, -1, drop=F] %*% theta_hat)

tau_mesh_hat   <- exp(log_tau_mesh_hat)
kappa_mesh_hat <- exp(log_kappa_mesh_hat)

range_mesh_hat  <- sqrt(8) / kappa_mesh_hat
sigma2_mesh_hat <- 1 / (4*pi * kappa_mesh_hat^2 * tau_mesh_hat^2)

11. Put range/variance onto the regular plotting grid

proj <- inla.mesh.projector(
  mesh, xlim=c(0,1), ylim=c(0,1), dims=c(nx, ny)
) # create a projector for a regular nx × ny image domain

range_grid  <- inla.mesh.project(proj, range_mesh_hat) # interpolate mesh values onto the grid
sigma2_grid <- inla.mesh.project(proj, sigma2_mesh_hat)

par(mfrow=c(1,2), mar=c(3,3,3,3))
image(proj$x, proj$y, range_grid,  useRaster=T,
      xlab ="x", ylab="y", main="Posterior mean range(s)")
abline(v=0.5, lty=2) # larger on the right (x > 0.5)

image(proj$x, proj$y, sigma2_grid, useRaster=T,
      xlab="x", ylab="y", main="Posterior mean sigma^2(s)")
abline(v=0.5, lty=2) # a left–right gradient from the tau trend plus a step at x=0.5 from kappa

12. Compute true values for reference

log_tau_mesh_true   <- as.numeric(B.tau[,1]   + B.tau[, -1, drop=F]   %*% theta_true)
log_kappa_mesh_true <- as.numeric(B.kappa[,1] + B.kappa[, -1, drop=F] %*% theta_true)
tau_mesh_true       <- exp(log_tau_mesh_true)
kappa_mesh_true     <- exp(log_kappa_mesh_true)
range_mesh_true     <- sqrt(8) / kappa_mesh_true
sigma2_mesh_true    <- 1 / (4*pi * kappa_mesh_true^2 * tau_mesh_true^2)

# check correlations between truth and estimates on mesh
cat("\n--- Correlation checks on mesh ---\n", "corr(range_true, range_hat): ",
    cor(range_mesh_true,  range_mesh_hat),  "\ncorr(sigma2_true, sigma2_hat): ",
    cor(sigma2_mesh_true, sigma2_mesh_hat), "\n")

## 
## --- Correlation checks on mesh ---
##  corr(range_true, range_hat):  1 
## corr(sigma2_true, sigma2_hat):  0.9819166