Spatial models (in the Bayesian statistics literature) include spatial random effects. Point-pattern data treat the point locations themselves as random. Point-referenced and block-referenced data use spatial random effects to capture smoothing as a function of distance. Dimension reduction methods can exploit this structure to avoid large matrix operations.


Resources

Software

source('clarkFunctions2021.r')

We need these libraries:

options(width = 85) 
library(spBayes)
library(MBA)
library(fields)
library(sp)
library(maptools)
library(rgdal)
library(classInt)
library(lattice)
library(xtable)
library(MASS)
library(geomapdata)
library(mapdata)
library(RANN)
library(gjam)
library(coda)
library(CARBayes)
library(CARBayesdata)

Readings

Background from Andy Finley’s site.

Objectives

Spatial data and models

Spatial data are point-referenced, block-referenced, or point patterns. The first two have fixed locations. The third has random locations.

Spatial data are referenced by location, and location is considered important for understanding responses. Point-referenced data are identified by a vector of coordinates in continuous space. For a (2-D) map the reference could be longitude, latitude: \(\mathbf{s}_i = (s_{i1}, s_{i2})\). In the ocean or atmosphere, the reference could additionally have depth or altitude, \(\mathbf{s}_i = (s_{i1}, s_{i2}, s_{i3})\).

Block-referenced data are associated with an area. Examples could be zip codes, counties, watersheds, or voxels. Both point- and block-referenced data have fixed locations. For example, a sampling design could specify locations to sample and then treat the responses at those locations as random. Even if sample locations are assigned ‘at random’, once they are selected they might be viewed as a fixed part of the design.

*The locations of trees in the Guanica plot, Puerto Rico (from M. Uriarte).*


Point-pattern data have random locations: the pattern itself is the response. The locations of trees could be taken as a response to competition. The locations of crimes could be taken as a response to varying socio-economic conditions within a city.

All three types of spatial data share the attribute that location is viewed as important and thus will be included as part of the model. I turn immediately to an example.

*A point pattern (locations of ebird checklists in the Research Triangle), block-referenced as a regular lattice (left), and as an irregular lattice (right), all from Becky Tang's study of species interactions and movement.*


FIA data

Forest Inventory and Analysis (FIA) data are indexed by location and can be viewed as point-referenced data. Here is an aggregated version of FIA data, stored as a matrix of stem counts y, basal area yBA, and a predictor matrix x:

load('fiaSpBayes.Rdata')
n <- nrow(y)                   # no. plots
S <- ncol(y)                   # no. species
xdata <- data.frame(x)

I have aggregated by predictors to make this run faster for class.

Here is a map that uses clarkFunctions2021.r:

maplon <- c(-95,-67)
maplat <- c(25,50)
topo   <- subETOPO5(maplon,maplat)
opt    <- list(z = topo$z, mapscale = 8, IMAGE = T)
regMap(topo$x, topo$y, opt)
## NULL
points(coords[,1],coords[,2],cex=.2)

Sample locations in this data set are aggregated FIA plots.

The points on this map show locations that I analyze below. Here is a map for loblolly pine:

spec   <- 'PITA'
maplon <- c(-95,-68)
maplat <- c(26, 38)
opt    <- list( yy = y[,spec], maplon = maplon, maplat = maplat )

speciesNAmerMap(coords[,1],coords[,2], opt = opt)
title('loblolly data')

Loblolly pine abundance.

Because I have aggregated plots on the basis of similar predictors, there are only 1223 plots and noise is reduced. In the following example I set up a spatial model for fitting in spBayes.

Spatial random effects

Spatial random effects models achieve prediction by imposing spatial smoothing. The Gaussian process parameterizes a covariance matrix as a function of distance between sample locations.

Spatial random effects models for point-referenced data parameterize a covariance function of distance. Recall that the \(n \times n\) covariance matrix holds \(n(n+1)/2\) unique elements. I cannot estimate all of these values: not only are there too many locations, but, unless there is replication, each location is typically observed only once. However, if I make the elements depend on one another, then I may need only a few parameters to describe the entire matrix. Spatial smoothing is often not desirable, but, where appropriate, it can greatly improve spatial modeling. I return to this issue later.

Recall the regression model, now referenced by location,

\[ \begin{aligned} y(\mathbf{s}_i) &= \mu(\mathbf{s}_i) + \epsilon(\mathbf{s}_i) \\ \epsilon(\mathbf{s}_i) &\sim N(0, \tau^2) \end{aligned} \] Location is defined by a vector \(\mathbf{s}_i = (s_{i1}, s_{i2})'\), in this case, (longitude, latitude) of plot \(i\).

The mean structure of the model could be linear in parameters, \(\mu(\mathbf{s}_i) = \mathbf{x}'(\mathbf{s}_i) \boldsymbol{\beta}\). The errors in this model are independent and identically distributed (i.i.d.). The Gaussian process is the most general approach to allow for spatial smoothing. Consider a new term in the model for the random effect, \(w(\mathbf{s}_i)\), having a joint distribution

\[ \mathbf{w} \sim GP(0, \sigma^2 R(\cdot)) \] with covariance between locations \(\mathbf{s}_i\) and \(\mathbf{s}_j\),

\[Cov(w(\mathbf{s}_i), w(\mathbf{s}_j) ) = \sigma^2 \rho(\phi; ||\mathbf{s}_i - \mathbf{s}_j||)\] The notation \(||\mathbf{s}_i - \mathbf{s}_j||\) indicates distance. I can think of an \(n \times n\) matrix of distances, with each element replaced by this covariance function. The full covariance matrix can be written as

\[\mathbf{R}_{\sigma^2, \phi} = \sigma^2 R(\phi)\] In other words, the covariance matrix is a function of distance: the locations only matter in terms of their pairwise distances from one another.

There are a number of different functions that can be used to describe how covariance between points declines with distance. One common model decays exponentially,

\[\rho(\phi, s) = e^{-\phi s}\] for a distance \(s\).
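
As a small illustration (hypothetical locations, with \(\sigma^2 = 1\) and \(\phi = 2\)), the covariance matrix can be built directly from the matrix of pairwise distances:

nSim <- 50
sSim <- cbind( runif(nSim), runif(nSim) )     # hypothetical locations s_i
dSim <- as.matrix( dist(sSim) )               # distances ||s_i - s_j||
RSim <- 1*exp( -2*dSim )                      # sigma^2 * exp(-phi*d)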

With spatial random effects the model is

\[ \begin{aligned} y(\mathbf{s}_i) &= \mu(\mathbf{s}_i) + w(\mathbf{s}_i) + \epsilon(\mathbf{s}_i) \\ \mathbf{w} &\sim MVN(0, \mathbf{R}_{\sigma^2, \phi}) \\ \epsilon(\mathbf{s}_i) &\sim N(0, \tau^2) \end{aligned} \] If I marginalize out the random effects, I can equivalently write

\[ \mathbf{y} \sim MVN(\mathbf{X} \boldsymbol{\beta}, \mathbf{R}_{\sigma^2, \phi} + \tau^2 \mathbf{I}) \] Note that I have moved the random effects from the mean structure into the covariance.
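
To make the marginalized form concrete, here is a sketch that continues the block above, drawing one realization with MASS::mvrnorm (design and parameter values are hypothetical):

tauSim  <- .5                                  # residual sd tau
XSim    <- cbind( 1, rnorm(nSim) )             # hypothetical design matrix
betaSim <- c(1, -1)
ySim    <- mvrnorm( 1, XSim%*%betaSim, RSim + tauSim^2*diag(nSim) )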

So what does the term ‘spatial’ mean?

In the recent statistics literature, a spatial model typically means that there is spatial structure in the errors.

An ecologist might argue that all three terms in the model \(\mu(\mathbf{s}_i) + w(\mathbf{s}_i) + \epsilon(\mathbf{s}_i)\) are spatial, because all depend on location \(\mathbf{s}_i = (s_{i1}, s_{i2})'\). Of course, this is true. Still, recognize that in the statistics literature 'spatial' typically refers to the spatial random effects.

Computation issues

Posterior simulation requires sampling distributions for \((\phi, \sigma^2, \boldsymbol{\beta})\). I cannot sample directly for \(\phi\); the likelihood and prior are not conjugate for this parameter, so Metropolis is an option.

Note that I do not have to sample the random effects, because they can be marginalized away. I can always sample them if I want to, e.g., if I am interested in this smooth surface. If so, I just draw from a multivariate normal distribution having mean zero and the covariance \(Cov(w(\mathbf{s}_i), w(\mathbf{s}_j))\) given above.

Given covariance \(\mathbf{R}\) I can directly sample \(\boldsymbol{\beta}\), provided I can invert \(\mathbf{R}\). This will not be possible for a large number of spatial locations. I will mention some options below, but first point out that non-Gaussian data also fit within this framework.
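
To make the Metropolis option concrete, here is a toy random-walk update for \(\phi\), continuing the simulation sketch above with \(\boldsymbol{\beta}\) fixed at its true value. This is only an illustration, not the spBayes algorithm:

# marginal log likelihood as a function of phi (sigma^2 = 1 here)
logLikePhi <- function(phi){
  cV <- chol( exp(-phi*dSim) + tauSim^2*diag(nSim) )
  r  <- ySim - XSim%*%betaSim
  -sum( log(diag(cV)) ) - .5*sum( backsolve(cV, r, transpose=TRUE)^2 )
}
phiG <- .5                                     # initial value
for(g in 1:2000){                              # symmetric proposals, flat prior
  phiP <- rnorm(1, phiG, .1)
  if( phiP > 0 &&
      log(runif(1)) < logLikePhi(phiP) - logLikePhi(phiG) ) phiG <- phiP
}
phiG                                           # should wander near the true phi = 2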

Non-Gaussian data

The spatial model can be extended to non-Gaussian data by including an additional stage. Let \(g(\cdot)\) be a link function. For this GLM I omit the non-spatial error term,

\[ g(E[y(\mathbf{s}_i)]) = \mu(\mathbf{s}_i) + w(\mathbf{s}_i) \] Here is a spatial GLM for count data:

\[ \begin{aligned} y(\mathbf{s}_i) &\sim Poi(g^{-1}( \theta(\mathbf{s}_i)))\\ \theta(\mathbf{s}_i) &=\mu(\mathbf{s}_i) + w(\mathbf{s}_i) \end{aligned} \] I use the typical log link, \(g(\theta(\mathbf{s}_i)) = \log(\theta(\mathbf{s}_i))\). In the spBayes example that follows, a predictive process reduces the \(n \times n\) covariance matrix to a small number of 'knots'.
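
As a sketch (again continuing the simulated objects above), counts from this spatial Poisson model could be generated as:

wSim <- mvrnorm( 1, rep(0, nSim), RSim )            # spatial random effects
yPoi <- rpois( nSim, exp(XSim%*%betaSim + wSim) )   # counts via inverse log link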

spBayes

The package spBayes makes use of a predictive process with dimension reduction. Install spBayes along with the other packages listed at the beginning of this vignette. I use an example with FIA data to compare the non-spatial GLM with the spatial version of a Poisson regression. I start by fitting the non-spatial model, followed by the predictive process.

First, non-spatial

To obtain starting values I first fit a (non-spatial) glm and compare fitted and observed:

form <- as.formula( y[,spec] ~ stdage + temp + deficit )

poisNS <- glm( form, data = xdata, family="poisson")
beta.starting <- coefficients( poisNS )    # to initialize spBayes
beta.tuning   <- t(chol(vcov( poisNS )))   # to start proposals
betaNS        <- matrix(beta.starting)     # non-spatial estimates
AICNS         <- summary(poisNS)$aic       # non-spatial AIC

# observed vs predicted responses
xi  <- model.matrix(form, data = xdata)
tmp <- predict.glm(poisNS, type='response', se.fit=T)
yNS <- tmp$fit

par(bty='n')
plotObsPred( y[,spec], yNS )
abline(0,1,lty=2)
abline(h=mean(y[,spec]),lty=2)
title('In-sample (at sample locations)')

Predicted vs observed counts for loblolly pine, non-spatial GLM.

These are the predictions at the sample locations. Box-and-whisker plots show 68% and 95% of predictive sample means for binned observations.

Here are fitted coefficients:

summary(poisNS)$coefficients
##               Estimate  Std. Error    z value     Pr(>|z|)
## (Intercept)  1.1121215 0.018770872   59.24720 0.000000e+00
## stdage      -2.7736851 0.024155937 -114.82416 0.000000e+00
## temp         0.5906544 0.006714573   87.96604 0.000000e+00
## deficit     -0.4310557 0.027425712  -15.71721 1.152863e-55

Exercise 1. Select a species to fit. Use AIC as a model selection criterion to determine which combination of variables in xdata best fits the data.

Obviously, the model is unable to capture some of the variation in the data. Here are maps comparing the observed and predicted data.

y1 <- cbind(y[,spec], yNS)
colnames(y1) <- c('obs','pred')
opt <- list(yy = y1, mfrow=c(1,2), titles = c('obs','predicted'),
            maplon = maplon, maplat = maplat)
speciesNAmerMap(coords[,1], coords[,2], opt = opt)

Observed and predicted maps for loblolly pine, non-spatial model.

Clearly, predictions miss some key features of the map.

Predictive process for dimension reduction

The predictive process reduces the dimensionality of a spatial covariance matrix by introducing a small set of reference points ('knots') to condition on. By referencing this small matrix, we avoid inverting the full spatial covariance matrix.

To estimate parameters we need to invert the error covariance matrix. For an \(n \times n\) covariance matrix, inversion requires order-\(n^3\) operations. We need some mechanism for reducing the dimensionality of this process.

To reduce the dimensionality of the covariance matrix, Banerjee et al. introduce the predictive process, which projects the surface to a smaller number of 'knots' \(\mathcal{S}^* = \{\mathbf{s}^*_1, \dots, \mathbf{s}^*_m\}\), where \(m \ll n\), having distribution

\[ \mathbf{w}^* \sim MVN(0, \mathbf{R}^*_{\sigma^2, \phi}) \] where \(\mathbf{R}^*_{\sigma^2, \phi}\) is an \(m \times m\) covariance matrix having the same parameterization as for the full data set. There is a conditional covariance that allows me to evaluate the expected random effect at some location \(\mathbf{s}_0\),

\[ E[w(\mathbf{s}_0) | \mathbf{w}^*] = \mathbf{r}'(\mathbf{s}_0, \phi) \mathbf{R}^{*-1}(\phi) \mathbf{w}^* \] where \(\mathbf{r}(\mathbf{s}_0, \phi)\) is the length-\(m\) vector of covariances between location \(\mathbf{s}_0\) and the knot locations \(\mathbf{s}^*\). For an entire surface of points, I have the covariance function

\[ R(\mathbf{s}, \mathbf{s}'; \phi) = \mathbf{r}'(\mathbf{s}, \phi) \mathbf{R}^{*-1}(\phi) \mathbf{r}(\mathbf{s}', \phi) \] where \(\mathbf{r}(\mathbf{s}, \phi)\) is now the \(n \times m\) matrix of covariances between the observation locations and the knots. The predictive process model is now

\[ \begin{aligned} y(\mathbf{s}_i) &= \mu(\mathbf{s}_i) + \tilde{w}(\mathbf{s}_i) + \epsilon(\mathbf{s}_i) \\ \tilde{w}(\mathbf{s}) &= \mathbf{r}'(\mathbf{s}, \phi) \mathbf{R}^{*-1}(\phi) \mathbf{w}^* \end{aligned} \] Now the random effects in the original model have been replaced by a linear transformation of the random effects at the \(m\) knot locations. In other words, we are using the distance-dependent covariance function to interpolate between knots.
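
Here is a sketch of that projection, continuing the simulated example above (knot locations are hypothetical; rdist from the fields package computes the \(n \times m\) cross-distance matrix):

mKnot  <- 10
sStar  <- cbind( runif(mKnot), runif(mKnot) )  # knot locations s*
RStar  <- exp( -2*as.matrix( dist(sStar) ) )   # m x m knot covariance R*
rMat   <- exp( -2*rdist(sSim, sStar) )         # n x m covariances r(s, phi)
wStar  <- mvrnorm( 1, rep(0, mKnot), RStar )   # random effects at knots w*
wTilde <- rMat%*%solve(RStar, wStar)           # interpolated surface w-tilde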

Knot locations for dimension reduction

Now I implement the spatial version. First, read the help page for the function spGLM. In addition to syntax similar to the function glm, I want to include coordinates, knots, starting values, tuning (variances for sampling), covariance model, and priors. I can call this function in a loop to obtain multiple chains.

I specify locations for knots to be used in dimension reduction. I want a small number of knots, not too close to one another, but concentrated where observations are dense. I start with a candidate grid of \(15 \times 15\) points, dropping the boundary rows and columns to leave \(13 \times 13 = 169\) candidates. Here is a map showing their locations:

library(RANN)
nk   <- 15
klon <- seq(maplon[1],maplon[2],length=nk)[-c(1,nk)]
klat <- seq(maplat[1],maplat[2],length=nk)[-c(1,nk)]
knots <- as.matrix( expand.grid(klon,klat) )

opt <- list(z = topo$z, maplon = maplon, maplat = maplat, mapscale = 8, IMAGE = T)
regMap(topo$x, topo$y, opt)
## NULL
points(coords[,1],coords[,2],cex=.2)
points(knots[,1],knots[,2],col='grey', pch=15)

Candidate knot locations on a regular grid.

Many of these locations are not helpful, e.g., those in the Atlantic Ocean. From these candidates I select those having the largest number of observations within a search radius of \(0.7\) degrees lon/lat. I limit the total number of knots to 40. I use RANN::nn2 to find samples close to knots.

ntot <- 40                                  # maximum number of knots
tmp  <- nn2(coords, knots, searchtype='radius', k = 50, radius = .7) 
np   <- tmp[[1]]                            # neighbor indices, 0 if none found
np[np > 0] <- 1
wt    <- order(rowSums(np), decreasing=T)[1:ntot]   # best-populated knots
knots <- knots[wt,]

regMap(topo$x, topo$y, opt)
## NULL
points(coords[,1],coords[,2],cex=.2)
points(knots[,1],knots[,2],col='white', pch=15)

Knot locations reduced to those closest to observations.

I will use these knots.

Here are some arguments needed for spGLM:

n.batch      <- 10
batch.length <- 1500
n.samples <- n.batch*batch.length
Q <- length(poisNS$coefficients)

The model is fitted here:

priors <-  list("beta.Flat", "phi.Unif" = c(.05, 4), "sigma.sq.IG" = c(50, 20))
betaS  <- rnorm(length(betaNS),betaNS,abs(betaNS)*.1)   # jitter GLM estimates
phiS   <- .tnorm(1,.1,2,.05,1)                          # truncated normal draw
sigS   <- .tnorm(1,0,3,3,1)                             # truncated normal draw

out <- spGLM( form, data = xdata, family="poisson",
             coords = as.matrix(coords), knots=knots,
             starting=list("beta"=betaS, "phi"= phiS,"sigma.sq"= sigS, "w"=0),
             tuning=list("beta"=beta.tuning, "phi"=0.4,"sigma.sq"=1, "w"=0.1),
             priors=priors,
             amcmc=list("n.batch"=n.batch,"batch.length"=batch.length,
                        "accept.rate"=0.43),
             cov.model="exponential", verbose=T, n.report=500)
## ----------------------------------------
##  General model description
## ----------------------------------------
## Model fit with 1223 observations.
## 
## Number of covariates 4 (including intercept if specified).
## 
## Using the exponential spatial correlation model.
## 
## Using non-modified predictive process with 40 knots.
## 
## Number of MCMC samples 15000.
## 
## Priors and hyperpriors:
##  beta flat.
## 
##  sigma.sq IG hyperpriors shape=50.00000 and scale=20.00000
## 
##  phi Unif hyperpriors a=0.05000 and b=4.00000
## 
## Adaptive Metropolis with target acceptance rate: 43.0
## -------------------------------------------------
##      Sampling
## -------------------------------------------------
## Sampled: 15000 of 15000, 100.00%
## -------------------------------------------------
out$DIC <- unlist( spDiag(out, verbose=F) )['DIC4']

burn.in   <- 0.8*n.samples
sub.samps <- burn.in:n.samples              # retain the last 20% of samples
# note: spGLM returns samples as p.beta.theta.samples; phi is the decay in
# exp(-phi*d), so the effective range is approximately 3/phi

plot(out$p.beta.theta.samples)


Estimates

Note that I have saved the DIC. Here is a table of estimates:

coeff <- t(apply( out$p.beta.theta.samples, 2, quantile, c(.5, .025, .975) ))
se    <- apply( out$p.beta.theta.samples, 2, sd )
coeff <- cbind( median = coeff[,1], se, coeff[,2:3] )   # median, se, 95% CI
print( signif(coeff, 3) ) 
##               median      se    2.5%   97.5%
## (Intercept) -2.18000 0.9380 -3.0200  0.4440
## stdage      -2.25000 0.0332 -2.3100 -2.2000
## temp         0.53700 0.0985  0.4500  0.8030
## deficit     -0.00792 0.0656 -0.1620  0.0798
## sigma.sq     1.33000 0.3940  0.9510  2.7400
## phi          0.16600 0.0425  0.0605  0.2550

Here again is the non-spatial fit:

print( signif( summary(poisNS)$coefficients, 3) )
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)    1.110    0.01880    59.2 0.00e+00
## stdage        -2.770    0.02420  -115.0 0.00e+00
## temp           0.591    0.00671    88.0 0.00e+00
## deficit       -0.431    0.02740   -15.7 1.15e-55

Exercise 2a. Do you believe that the MCMC converged to a posterior distribution? If not, what might you do to convince yourself?

Exercise 2b. What are the differences between estimates for the spatial and non-spatial models? Are these estimates comparable? Why or why not?

Here is the covariance function:

betaS <- out$p.beta.theta.samples[sub.samps,]   # post-burn-in samples
samps <- betaS[,c('sigma.sq','phi')]
dseq  <- seq(0,1,length=100)                    # distances to evaluate
cmat  <- matrix(0,nrow(samps),100)

for(k in 1:nrow(samps)){                        # sigma^2*exp(-phi*d) by sample
  cmat[k,] <- samps[k,'sigma.sq']*exp(-samps[k,'phi']*dseq)
}

ci <- apply(cmat,2,quantile,c(.5,.025,.975))
plot(dseq,ci[1,],ylim=c(0,1.2*max(ci)),type='l')
lines(dseq,ci[2,],lty=2)
lines(dseq,ci[3,],lty=2)

Fitted covariance function for the exponential model in spBayes.

Predictions

The predicted mean values have changed from the non-spatial GLM:

what  <- out$p.w.samples[,sub.samps]        # random effect samples
ymu   <- xi%*%t(betaS[,colnames(xi)])       # mean structure, by sample
phat  <- exp( ymu + what)                   # intensity with random effects
yhat  <- rowMeans(phat)                     # posterior mean prediction
ymu   <- rowMeans( exp(ymu) )               # mean-structure contribution
wmu   <- rowMeans( exp(what) )              # random-effect contribution
yhatNS <- exp(xi%*%beta.starting)           # non-spatial prediction

breaks <- seq(0, 500, by=50)
plotObsPred(y[,spec], yhatNS, breaks = breaks, fill='grey')
plotObsPred(y[,spec],yhat, add=T, box.col = 'brown', fill = 'tan',
            breaks = breaks + 30)
abline(0,1,lty=2)
abline(h=mean(y[,spec]),lty=2)

Predicted vs observed in the spatial model includes random effects (brown) compared with the non-spatial model (grey).

The root mean square prediction error for the non-spatial model is 31.76. For the spatial model it is 18.94.
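
These values can be computed directly from the fitted means (a sketch, using yhat and yhatNS from the code above):

sqrt( mean( (y[,spec] - yhatNS)^2 ) )       # non-spatial RMSPE
sqrt( mean( (y[,spec] - yhat)^2 ) )         # spatial RMSPE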

The predicted values can be decomposed into contributions from the mean structure (the covariates) and the spatial random effects, \(\mathbf{w}\). Because the inverse of the log link is the exponential function, the two contributions are multiplicative. Note how they vary across the map:

y2 <- cbind(y[,spec], yhat, ymu, wmu)
colnames(y2) <- c('obs','pred','mean','w')
opt <- list(yy = y2, mfrow=c(2,2), titles = colnames(y2), maplon = maplon, 
            maplat = maplat)
speciesNAmerMap(coords[,1],coords[,2], opt = opt)

Observed, predicted (above), and contributions of mean and random effects (below).

First note the differences in scale for these maps. The random effects smooth and fill in for the limited mean structure. This approach borrows strength from all observations, through the spatial covariance, to predict the data with random effects. In the next section I compare it with a different multivariate approach, non-spatial GJAM.


Exercise 3a. Divide up your group so that each determines how one of the following influences model selection. Use DIC as your criterion.

  1. Different numbers of knots

  2. Prior distributions on \(\sigma^2\).

  3. Prior distributions on \(\phi\).

Exercise 3b. Interpret the contributions of the mean and the spatial random effects to predicted surfaces. In other words, what do they mean?

GJAM comparison

I want to compare the spatial GLM with a multivariate species model that uses a different likelihood, one without a non-linear link function. I might not want spatial random effects, because (i) they have no biological interpretation, and (ii) they impose smoothing. On the first point, I would prefer explanation to come from predictors in the model, which I know how to interpret.

On the second point, there can be good reasons for spatial smoothing. Any time I believe that the response between two locations could be approximated by interpolated values, I might want to consider spatial smoothing. For FIA data, I do not want spatial smoothing. When I see forest composition varying with habitat gradients over tens of meters, I do not want to approximate it with a model that interpolates between samples tens of km apart.

In GJAM, a joint distribution of species can help me understand the effects of predictors on all species. If I want to analyze more than one species, the residual correlations between species are needed for proper credible and predictive intervals. Here is the same mean specification in GJAM, but on the data scale (not the link scale):

#S <- ncol(y)
modelList <- list(ng = 3000, burnin = 1000, typeNames = 'DA', 
                  reductList = list(r = 3, N = 20))
output   <- gjam(~ stdage + temp + deficit, xdata, y, modelList)
gjamDIC  <- output$modelSummary$DIC

plotPars <- list(SAVEPLOTS = T, PLOTALLY=T)
gjamPlot(output, plotPars)

plotObsPred( y[,spec], output$prediction$ypredMu[,spec])

Here are predictions:

*Observed and predicted by gjam.*


Here are coefficients for this species from output$parameters$betaStandXTable:

                  Estimate    SE    CI_025   CI_975 sig95
PITA_intercept    -84.4      6.14   -97.3    -71.9     *
PITA_stdage       -19.2      3.16   -24.2    -12.8     *
PITA_temp         127.0      8.99   112.0    143.0     *
PITA_deficit      -24.8      4.45   -33.6    -17.4     *

Here are maps:

ypred <- output$prediction$ypredMu
opt   <- list(yy = cbind(y[,'PITA'],ypred[,'PITA']), 
              mfrow=c(2,2), titles = c('obs','pred'))
speciesNAmerMap(coords[,1],coords[,2], opt = opt)
knitr::include_graphics("gjamMap.png")
*Maps of observed and predicted by gjam.*


Note that GJAM does not know the sample locations, so there is no spatial smoothing. GJAM also differs from the previous models in that it is not a GLM.

Exercise 4. Repeat exercise 3 with GJAM and compare results for the two models. If there are differences, can you explain them?

Conditional autoregression (CAR) model

Areal unit data, or block-referenced data, come from non-overlapping areas. The CAR model accounts for correlations that can arise from adjacency. Suppose there are \(n\) spatial units. In its simplest form, the model uses an \(n \times n\) indicator matrix with '1' for units sharing a boundary and '0' elsewhere. To demonstrate, I use a few examples from the CARBayes package.

\[ \begin{aligned} y_i &= \mu_i + \phi_i + \epsilon_i \\ \phi_i|\phi_{-i} &\sim N \left(\frac{\rho \sum_{j=1}^n w_{ij}\phi_{j}}{ \rho \sum_{j=1}^n w_{ij} + 1 - \rho }, \frac{\sigma^2}{\rho \sum_{j=1}^n w_{ij} + 1 - \rho} \right) \\ \epsilon_i &\sim N(0, \tau^2) \end{aligned} \] Note that the CAR random effect \(\phi_i\) for block \(i\) is a weighted average of its neighbors \(\{-i\}\). For the dependence parameter \(\rho\), there is independence when \(\rho = 0\) and the standard CAR dependence when \(\rho = 1\). The variance declines with the number of neighbors (the sum of the weights). The parameter \(\rho\) is implemented in the function S.CARleroux.
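
As a quick check on this notation, the full conditional for one block can be computed directly from the weights. Here is a toy three-block example with hypothetical values:

Wtoy   <- rbind( c(0,1,0), c(1,0,1), c(0,1,0) )   # three blocks in a row
phiToy <- c(.3, 0, -.2)                           # current random effects
rho    <- .9                                      # dependence parameter
sig2   <- 1
i      <- 2                                       # the middle block
den     <- rho*sum(Wtoy[i,]) + 1 - rho
condMu  <- rho*sum(Wtoy[i,]*phiToy)/den           # conditional mean
condVar <- sig2/den                               # conditional variance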

For a transparent demonstration, here is a square neighborhood matrix:

#### Set up a square lattice region
m <- 12
xEast  <- 1:m
xNorth <- 1:m
grid   <- expand.grid(xEast, xNorth)
n      <- nrow(grid)
plot( NULL, xlim = c(0, m), ylim = c(0, m), xlab='East', ylab='North' )
abline(v=grid[,1], h=grid[,2])
text(grid[,1] - .5, grid[,2] - .5, 1:n, cex=.8)

Lattice with location labels for simulated data.

D <- W <- as.matrix(dist(grid))
W[W != 1] <- 0  

The map simply assigns a label to each block.

Take note of the structure of W. It has \(m^2 \times m^2\) elements, all zero except those for pairs of adjacent boxes. Box 1 has two neighbors (2, 13), so W[1,2] and W[1,13] are set to 1. This adjacency is apparent from the map. Box 2 has three neighbors (1, 3, 14). Box 14 has four neighbors (2, 13, 15, 26). These (row, column) combinations have a '1' in W.

The zeros and ones in the W matrix could be replaced with weights specifying how each affects the other.
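
For example (a sketch using the W built above), the binary adjacencies could be row-standardized so that each block's neighbor weights sum to one:

Wrow <- W/rowSums(W)                          # weights sum to 1 by row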

Here is a simulated data set:

Q <- 3
x <- matrix( rnorm(Q*n), n, Q )
x[,1] <- 1
x2    <- x[,2]
x3    <- x[,3]
beta  <- matrix( rnorm(Q), Q, 1)
sigma <- .1

# simulated based on distance D
phi <- mvrnorm(1, rep(0,n), 1*exp(-0.1*D) )
y   <- x%*%beta + phi + rnorm(n, 0, sigma)

form <- as.formula(y ~ x2 + x3)

## Gaussian model
gaussianModel <- S.CARleroux(formula=form, family  = 'gaussian', W=W, 
            burnin=20000, n.sample=100000, thin=10, verbose=F)
gaussianModel
## 
## #################
## #### Model fitted
## #################
## Likelihood model - Gaussian (identity link function) 
## Random effects model - Leroux CAR
## Regression equation - y ~ x2 + x3
## Number of missing observations - 0
## 
## ############
## #### Results
## ############
## Posterior quantities and DIC
## 
##              Median    2.5%   97.5% n.effective Geweke.diag
## (Intercept) -0.3695 -0.3841 -0.3542      6361.8        -0.9
## x2          -1.0677 -1.1223 -1.0151      2162.8        -0.5
## x3          -0.5637 -0.6227 -0.5058      2140.2         1.1
## nu2          0.0061  0.0020  0.0195      1746.7         0.3
## tau2         0.3755  0.2948  0.4878      6991.2         0.1
## rho          0.9811  0.9288  0.9983      7097.2         1.4
## 
## DIC =  -216.8581       p.d =  110.1561       LMPL =  49.12

Here is the autocorrelation parameter \(\rho\):

plot(gaussianModel$samples$rho)

Here is a map of the fitted values, which include the spatial random effects:

library('RColorBrewer')

fv <- gaussianModel$fitted.values
mf <- min(fv)
cc <- fv - mf
ss <- seq(0, max(cc), length.out=10)
cc <- findInterval(cc, ss)

colM <- colorRampPalette(brewer.pal(5,'YlOrRd'))
colm <- colM(10)

symbols(x=grid[,1], y=grid[,2], squares = cc*0+1, bg=colm[cc],
        fg=colm[cc],inches=F, xlab='East', ylab='North')

The CAR model can apply to non-Gaussian likelihoods. Here is a GLM:

lambda <- exp(x%*%beta + phi + rnorm(n, 0, sigma))
y <- rpois(n, lambda)

poissonModel <- S.CARbym(formula=form, family="poisson",
                         W=W, burnin=20000, n.sample=100000, thin=10, verbose=F)
poissonModel
## 
## #################
## #### Model fitted
## #################
## Likelihood model - Poisson (log link function) 
## Random effects model - BYM CAR
## Regression equation - y ~ x2 + x3
## Number of missing observations - 0
## 
## ############
## #### Results
## ############
## Posterior quantities and DIC
## 
##              Median    2.5%   97.5% n.effective Geweke.diag
## (Intercept) -0.4933 -0.7792 -0.2429       825.2        -1.7
## x2          -1.1963 -1.4191 -1.0033       976.3        -2.1
## x3          -0.4175 -0.6250 -0.2150      1765.1        -0.7
## tau2         0.6112  0.2985  1.2216      1275.4         0.8
## sigma2       0.0063  0.0018  0.0442       183.0         0.1
## 
## DIC =  348.3795       p.d =  32.40445       LMPL =  -176.72


Exercise 5. Repeat the spatial analysis with a binomial model. Use the CARBayes help page. Interpret the analysis.

Lip cancer analysis

Here is code from this site on a lip cancer analysis. I reproduce it here because I had to change some things to get the code to run.

library(knitr)
library(spdep)
## Loading required package: spData
## To access larger datasets in this package, install the spDataLarge package
## with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source')`
## 
## Attaching package: 'spData'
## The following object is masked _by_ '.GlobalEnv':
## 
##     coords
## Loading required package: sf
## Linking to GEOS 3.8.1, GDAL 3.1.4, PROJ 6.3.1
library(CARBayes)
library(CARBayesdata)
library(RColorBrewer)
library(reshape2)
library(plyr)
## 
## Attaching package: 'plyr'
## The following object is masked from 'package:maps':
## 
##     ozone
library(maps)
library(maptools)

data(lipdata)
data(lipdbf)
data(lipshp)
kable(summary(lipdata), caption="Summary Statistics")
Summary Statistics

          observed   expected    pcaff   latitude   longitude
Min.         0.000      1.100    0.000      54.94       1.430
1st Qu.      4.750      4.050    1.000      55.78       3.288
Median       8.000      6.300    7.000      56.04       4.090
Mean         9.571      9.575    8.661      56.40       4.012
3rd Qu.     11.000     10.125   11.500      57.02       4.730
Max.        39.000     88.700   24.000      60.24       6.800
lipdbf$dbf    <- lipdbf$dbf[ ,c(2,1)]
data.combined <- combine.data.shapefile(data=lipdata, shp=lipshp, dbf=lipdbf)

# map of neighbors
nb.bound <- poly2nb(data.combined) 
coords   <- coordinates(data.combined)
plot(data.combined, border = "gray", main="Scotland")
plot(nb.bound, coords, pch = 19, cex = 0.6, add = TRUE)

The map shows the neighbors. Here are maps of data:

breakpoints <- seq(min(lipdata$observed)-1, max(lipdata$observed)+1, length.out=8)
my.palette <- brewer.pal(n = 7, name = "OrRd")

spplot(data.combined, c("observed", "expected"), 
       main="Scottish Lip Cancer",at=breakpoints,
       col.regions=my.palette, col="grey")

Here is the pcaff variable, one of the predictors in the model:

spplot(data.combined, c("observed", "pcaff"), 
       main="Scottish Lip Cancer", at=breakpoints,
       col.regions=my.palette, col="black")

Here are model fits:

# a glm
glmmodel <- glm(observed~., family="poisson", data=data.combined@data)
resid.glmmodel <- residuals(glmmodel)

W.nb    <- poly2nb(data.combined, row.names = rownames(data.combined@data))
W.list  <- nb2listw(W.nb, style="B")
testglm <- moran.mc(x=resid.glmmodel, listw=W.list, nsim=1000)
W       <- nb2mat(W.nb, style="B")
formula <- observed ~ expected+pcaff+latitude+longitude
model.spatial1 <- S.CARleroux(formula=formula,
                              data=data.combined@data,family="gaussian", 
                              W=W, burnin=5000, n.sample=10000, thin=10, verbose=F)
betas1 <- summarise.samples(model.spatial1$samples$beta, 
                            quantiles=c(0.5, 0.025, 0.975))
resultsMS1 <- betas1$quantiles
rownames(resultsMS1) <- c("Intercept", "Expected", "Pcaff", "Latitude", "Longitude")
kable(resultsMS1, caption="95% Credible Intervals Model 1")
95% Credible Intervals Model 1

                     0.5          0.025         0.975
Intercept   -113.6697819   -204.4317250   -13.3881098
Expected       0.3540665      0.2143274     0.4783663
Pcaff          0.2118790     -0.0647568     0.4670129
Latitude       2.2271950      0.4844062     3.8093866
Longitude     -1.6772267     -3.3518017    -0.0157693

Here is a non-linear model, a Poisson with log link:

model.spatial2 <- S.CARleroux(formula=formula,
                              data=data.combined@data,family="poisson", 
                              W=W, burnin=5000, n.sample=10000, thin=10, verbose=FALSE)
betas2 <- summarise.samples(model.spatial2$samples$beta, quantiles=c(0.5, 0.025, 0.975))
resultsMS2 <- betas2$quantiles
rownames(resultsMS2) <- c("Intercept", "Expected", "Pcaff", "Latitude", "Longitude")
kable(resultsMS2, caption="95% Credible Intervals Model 2")
95% Credible Intervals Model 2

                    0.5          0.025        0.975
Intercept   -12.3595199   -27.2847786    2.4560713
Expected      0.0269074     0.0181836    0.0381038
Pcaff         0.0092210    -0.0237381    0.0361738
Latitude      0.2560567    -0.0056237    0.5152138
Longitude    -0.1097376    -0.3859340    0.0979909

Here is another option with piecewise intercepts, with the number of intercept classes set by the constant G:

model.spatial3 <- S.CARlocalised(formula=formula, G=5,
                                 data=data.combined@data,family="poisson", 
                                 W=W, burnin=5000, n.sample=10000, thin=10,
                                 verbose=FALSE)
betas3 <- summarise.samples(model.spatial3$samples$beta, 
                            quantiles=c(0.5, 0.025, 0.975))
resultsMS3 <- betas3$quantiles
rownames(resultsMS3) <- c("Expected", "Pcaff", "Latitude", "Longitude")
kable(resultsMS3, caption="95% Credible Model 3")
95% Credible Model 3

                    0.5         0.025        0.975
Expected      0.0296238     0.0140120    0.0450000
Pcaff         0.0194786    -0.0051659    0.0438536
Latitude      0.3256813     0.0586318    0.4884036
Longitude    -0.1495276    -0.2956594   -0.0117189


Exercise 6. Work through the lip cancer example in groups as if you were attempting to apply it to a data set of your own.

Reprise

Spatial models confront processes that explicitly require location, distance, or both. Models are often large, requiring dimension reduction.

spBayes is a powerful package for analysis of spatial random effects.

Spatial random effects are not always desirable, depending on the process and the spatial intensity of sampling.