Spatial models include spatial random effects. Point-pattern models treat the locations themselves as random. Point-referenced and block-referenced data apply spatial random effects to capture smoothing as a function of distance. Dimension reduction methods can exploit this structure to avoid large matrix operations.
source('clarkFunctions2024.r')
library(spBayes)
library(maps)
library(RANN)
library(gjam)
library(CARBayes)
library(CARBayesdata)
Background from Andy Finley’s site.
Recognize where spatial models are needed and where they are not.
Identify block-referenced, point-referenced, and point-pattern data.
Determine whether location, distance, or both are important for analysis.
Understand some basic dimension reduction methods and why they are important.
Execute spatial random effects models with spBayes.
Spatial data are point-referenced, block-referenced, or point patterns. The first two have fixed locations. The third has random locations.
Spatial data are referenced by location, and location is considered important for understanding responses. Point-referenced data are identified by a vector of coordinates in continuous space. For a (2-D) map the reference could be longitude, latitude: \(\mathbf{s}_i = (s_{i1}, s_{i2})\). In the ocean or atmosphere, the reference could additionally have depth or elevation, \(\mathbf{s}_i = (s_{i1}, s_{i2}, s_{i3})\).
The locations of trees in the Guanica plot, Puerto Rico (from M. Uriarte).
Block-referenced data are associated with an area. Examples could be zip codes, counties, watersheds, or voxels. Both point- and block-referenced data have fixed locations. For example, a sampling design could specify locations to sample and then treat the responses at those locations as random. Even if sample locations are assigned ‘at random’, once they are selected they might be viewed as a fixed part of the design.
Respiratory disease risk in Glasgow, used in a spatial Bayesian model (see the CARBayes package vignette).
Point-pattern data have random locations–the pattern itself is the response. The locations of trees could be taken as a response to competition. Locations of crimes could be taken as a response to varying socio-economic conditions within a city.
All three types of spatial data share the attribute that location is viewed as important and thus will be included as part of the model. I turn immediately to an example.
A point pattern (locations of eBird checklists in the Research Triangle), block-referenced data as a regular lattice (left) and as an irregular lattice (right), all from Becky Tang’s study of species interactions and movement.
Forest inventory (FIA) data are indexed by location. They can be viewed as point-referenced data. Here is an aggregated version of FIA data, stored as a matrix of stem counts ydata and predictors in a data.frame xdata:
load('FIAclusterDensity.rdata', verbose=T )
## Loading objects:
## xdata
## ydata
n <- nrow(ydata) # no. plots
S <- ncol(ydata) # no. species
coords <- xdata[,c('lon','lat')]
I have aggregated by predictors to make this run faster for class.
Here is a map that uses clarkFunctions2024.r:
xlim <- c(-95,-65)
ylim <- c(25,50)
map( 'usa', xlim = xlim, ylim = ylim )
points( coords[,1], coords[,2], col = 'grey', cex=.1)
Sample locations in this data set are aggregated FIA plots.
The points on this map show locations that I analyze below. Here is a map for loblolly pine:
spec <- 'pinusTaeda'
xlim <- c(-95,-75)
ylim <- c(24, 42)
grid2map( coords, ydata[,spec], nx = 100, ny = 80,
xlim = xlim, ylim = ylim, zlim = c(0,100) )
title('loblolly data')
Loblolly pine abundance.
Because I have aggregated plots on the basis of similar predictors,
there are only 13880 plots and noise is reduced. In the following
example I set up a spatial model for fitting in
spBayes.
Spatial random effects models achieve prediction by imposing spatial smoothing. The Gaussian process parameterizes a covariance matrix as a function of distance between sample locations.
Spatial random effects models for point-referenced data parameterize a covariance function of distance. Recall that the \(n \times n\) covariance matrix holds \(n(n+1)/2\) unique elements. I cannot estimate all of these values: not only are there too many locations, but, unless there is replication, each location is observed only once. However, if I make the elements depend on one another, then I may only need a few parameters to describe the entire matrix. Spatial smoothing is often not desirable, but, where appropriate, it can greatly improve spatial modeling. I return to this issue later.
Recall the regression model, now referenced by location,
\[ \begin{aligned} y(\mathbf{s}_i) &= \mu(\mathbf{s}_i) + \epsilon(\mathbf{s}_i) \\ \epsilon(\mathbf{s}_i) &\sim N(0, \tau^2) \end{aligned} \] Location is defined by a vector \(\mathbf{s}_i = (s_{i1}, s_{i2})'\), in this case, (longitude, latitude) of plot \(i\).
The mean structure of the model could be linear in parameters, \(\mu(\mathbf{s}_i) = \mathbf{x}'(\mathbf{s}_i) \boldsymbol{\beta}\). The errors in this model are independent and identically distributed (i.i.d.). The Gaussian process is the most general approach to allow for spatial smoothing. Consider a new term in the model for the random effect, \(w(\mathbf{s}_i)\), having a joint distribution
\[ \mathbf{w} \sim GP(0, \sigma^2 R(\cdot)) \] with covariance between locations \(\mathbf{s}_i\) and \(\mathbf{s}_j\),
\[Cov(w(\mathbf{s}_i), w(\mathbf{s}_j) ) = \sigma^2 \rho(\phi; ||\mathbf{s}_i - \mathbf{s}_j||)\] The notation \(||\mathbf{s}_i - \mathbf{s}_j||\) indicates distance. I can think of a \(n \times n\) matrix of distances, with each element replaced by the function. The full covariance matrix can be written as
\[\mathbf{R}_{\sigma^2, \phi} = \sigma^2 R(\phi)\] In other words, the covariance matrix is a function of distance–the locations only matter in terms of their pairwise distances from one another.
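As a quick sketch of this idea, the full covariance matrix can be built directly from a distance matrix. The values of sigma2 and phi and the locations below are assumed toy values, and the exponential form is the choice introduced next:
# sketch: build the covariance matrix sigma^2 R(phi) from pairwise distances
# sigma2, phi, and the locations are assumed toy values for illustration
sigma2 <- 1
phi    <- 3
smat   <- cbind( runif(10), runif(10) )   # 10 arbitrary locations
dmat   <- as.matrix( dist(smat) )         # 10 x 10 distance matrix
Rmat   <- sigma2*exp( -phi*dmat )         # covariance as a function of distance only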
There are a number of different functions that can be used to describe how covariance between points declines with distance. One common model decays exponentially,
\[\rho(\phi, s) = e^{-\phi s}\] for a distance \(s\). A plot could look like this:
The covariance function intersects the y axis at a value of \(\sigma^2\).
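A minimal sketch that reproduces such a curve, with assumed values \(\sigma^2 = 1\) and \(\phi = 3\):
# sketch of the exponential covariance sigma^2 e^{-phi s}
sigma2 <- 1
phi    <- 3
s <- seq( 0, 2, length = 100 )
plot( s, sigma2*exp( -phi*s ), type = 'l', bty = 'n',
      xlab = 'Distance', ylab = 'Covariance' )
abline( h = sigma2, lty = 2 )   # the curve meets the y axis at sigma^2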
With spatial random effects the model is
\[ \begin{aligned} y(\mathbf{s}_i) &= \mu(\mathbf{s}_i) + w(\mathbf{s}_i) + \epsilon(\mathbf{s}_i) \\ \mathbf{w} &\sim MVN(0, \mathbf{R}_{\sigma^2, \phi}) \\ \epsilon(\mathbf{s}_i) &\sim N(0, \tau^2) \end{aligned} \] If I marginalize out the random effects, I can equivalently write
\[ \mathbf{y} \sim MVN(\mathbf{X} \boldsymbol{\beta}, \mathbf{R}_{\sigma^2, \phi} + \tau^2 \mathbf{I}) \] Note that I have moved the random effects from the mean structure into the covariance.
So what does the term ‘spatial’ mean?
In recent statistics literature, a spatial model typically means that there is some spatial structure in the error structure.
An ecologist might argue that all three terms in the model \(\mu(\mathbf{s}_i) + w(\mathbf{s}_i) + \epsilon(\mathbf{s}_i)\) are spatial, because all depend on location \(\mathbf{s}_i = (s_{i1}, s_{i2})'\). Of course, this is true. Still, recognize that in the statistics literature ‘spatial’ typically refers to the spatial random effects.
Posterior simulation requires sampling distributions for \((\phi, \sigma^2, \boldsymbol{\beta})\). I cannot sample directly for \(\phi\)–likelihood and prior are not conjugate for this parameter. So Metropolis is an option.
Note that I do not have to sample the random effects, because they can be marginalized away. I can always sample them if I want to, e.g., if I am interested in this smooth surface. If so, I just draw from a multivariate normal distribution having mean zero and the covariance \(Cov(w(\mathbf{s}_i), w(\mathbf{s}_j))\).
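A sketch of such a draw, using assumed values for sigma2 and phi and the .rMVN sampler from clarkFunctions2024.r (used again later in this vignette):
# sketch: one draw of the smooth surface w ~ MVN(0, sigma^2 R(phi))
# sigma2 and phi are assumed values for illustration
ns     <- 500                                   # subset of plots keeps the matrix small
sigma2 <- 1
phi    <- 0.05
dmat   <- as.matrix( dist( coords[1:ns,] ) )    # pairwise distances
w      <- t( .rMVN( 1, rep(0, ns), sigma2*exp( -phi*dmat ) ) )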
Given covariance \(\mathbf{R}\) I can directly sample \(\boldsymbol{\beta}\), provided I can invert \(\mathbf{R}\). This will not be possible for a large number of spatial locations. I will mention some options below, but first point out that non-Gaussian data also fit within this framework.
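For the Gaussian model with a flat prior, the direct draw for \(\boldsymbol{\beta}\) has the familiar generalized least squares form. Here is a sketch, assuming the design matrix X, response y, and covariance Sigma \(= \mathbf{R} + \tau^2 \mathbf{I}\) are already in hand:
# sketch: conjugate draw for beta given Sigma = R + tau^2 I (flat prior)
# assumes X (n x p design), y (length-n response), and Sigma are in hand
SigInv <- solve( Sigma )                 # the O(n^3) step that limits large n
V      <- solve( t(X)%*%SigInv%*%X )     # conditional covariance for beta
m      <- V%*%t(X)%*%SigInv%*%y          # conditional mean for beta
beta   <- t( .rMVN( 1, m, V ) )          # one posterior draw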
The spatial model can be extended to non-Gaussian data by including an additional stage. Let \(g(\cdot)\) be a link function. For this GLM I omit the non-spatial error term,
\[ g(E[y(\mathbf{s}_i)]) = \mu(\mathbf{s}_i) + w(\mathbf{s}_i) \] Here is a spatial GLM for count data:
\[
\begin{aligned}
y(\mathbf{s}_i) &\sim Poi(g^{-1}( \theta(\mathbf{s}_i)))\\
\theta(\mathbf{s}_i) &= \mu(\mathbf{s}_i) + w(\mathbf{s}_i)
\end{aligned}
\] I use the typical log link, \(g(\theta(\mathbf{s}_i)) = \log(\theta(\mathbf{s}_i))\). In the spBayes example that follows, a predictive process reduces the \(n \times n\) covariance matrix to a small number of ‘knots’.
The package spBayes makes use of a predictive process with dimension reduction. Install the package spBayes with the other packages listed at the beginning of this vignette. I will use an example with FIA data to compare the non-spatial GLM and the spatial version of a Poisson regression. I start by fitting a non-spatial model, followed by the predictive process.
To obtain starting values I first fit a (non-spatial) glm and compare fitted and observed:
form <- as.formula( ydata[,spec] ~ standAge + meanTemp + annualDef )
poisNS <- glm( form, data = xdata, family="poisson")
beta.starting <- coefficients( poisNS ) # to initialize spBayes
beta.tuning <- t(chol(vcov( poisNS ))) # to start proposals
betaNS <- matrix(beta.starting) # non-spatial estimates
AICNS <- summary(poisNS)$aic # non-spatial AIC
# observed vs predicted responses
tmp <- predict.glm(poisNS, type='response', se.fit=T)
yNS <- tmp$fit
par(bty='n')
predVsObsPlot( sqrt(ydata[,spec]), sqrt(yNS) )
abline(0,1,lty=2)
abline(h=mean( sqrt(ydata[,spec]) ),lty=2)
title('In-sample (at sample locations)')
Predicted vs observed counts for loblolly pine, non-spatial GLM.
These are the predictions at the sample locations. Box-and-whisker plots show 68% and 95% of predictive sample means for binned observations.
Here are fitted coefficients:
summary(poisNS)$coefficients
|  | Estimate | Std. Error | z value | Pr(>|z|) |
|---|---|---|---|---|
| (Intercept) | -0.4053605 | 0.0134830 | -30.06456 | 0 |
| standAge | -0.0359860 | 0.0000990 | -363.56884 | 0 |
| meanTemp | 0.2925764 | 0.0006730 | 434.70712 | 0 |
| annualDef | -0.0012355 | 0.0000058 | -212.43460 | 0 |
Exercise 1. Select a species to fit. Use AIC as a model
selection criterion to determine which combination of variables in
xdata best fits the data.
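A starting point could be to compare candidate formulas with AIC; the two formulas in this sketch are arbitrary examples, not the answer:
# sketch for the exercise: compare candidate models with AIC
m1 <- glm( ydata[,spec] ~ standAge + meanTemp, data = xdata, family = 'poisson' )
m2 <- glm( ydata[,spec] ~ standAge + meanTemp + annualDef, data = xdata, family = 'poisson' )
AIC( m1, m2 )   # smaller is better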
Here are maps comparing the observed and predicted data.
zlim <- c( 0, 300 )
par( mfrow=c(1,2) )
grid2map( coords, ydata[,spec], nx = 80, ny = 50, xaxt = 'n', yaxt = 'n',
xlim = xlim, ylim = ylim, zlim = zlim )
title( 'Observed' )
grid2map( coords, yNS, nx = 80, ny = 50, xaxt = 'n', yaxt = 'n',
xlim = xlim, ylim = ylim, zlim = zlim )
title( 'Predicted' )
Observed and predicted maps for loblolly pine, non-spatial model.
Clearly, predictions miss some key features of the map.
The predictive process reduces the dimensionality of a spatial covariance matrix by introducing a small set of reference points to condition on. By referencing this small matrix, we avoid inverting the full spatial matrix.
To estimate parameters we need to invert the error covariance matrix. For an \(n \times n\) covariance matrix, there are order-\(n^3\) operations. We need some mechanism for reducing the dimensionality of this process.
To reduce the dimensionality of the covariance matrix, Banerjee et al. introduce the predictive process, which projects the surface onto a smaller number of ‘knots’ \(\mathcal{S}^* = \mathbf{s}^*_1, \dots, \mathbf{s}^*_m\), where \(m \ll n\), having distribution
\[ \mathbf{w}^* \sim MVN(0, \mathbf{R}^*_{\sigma^2, \phi}) \] where \(\mathbf{R}^*_{\sigma^2, \phi}\) is an \(m \times m\) covariance matrix having the same parameterization as for the full data set. There is a conditional covariance that allows me to evaluate the expected random effect at some location \(\mathbf{s}_0\),
\[ E[w(\mathbf{s}_0) | \mathbf{w}^*] = \mathbf{r}'(\mathbf{s}_0, \phi) \mathbf{R}^{*-1}(\phi) \mathbf{w}^* \] where \(\mathbf{r}(\mathbf{s}_0, \phi)\) is the length\(-m\) vector of covariances between location \(\mathbf{s}_0\) and knot locations \(\mathbf{s}^*\). For an entire surface of points, I have the covariance function
\[ R(\mathbf{s}, \mathbf{s}'; \phi) = \mathbf{r}'(\mathbf{s}, \phi) \mathbf{R}^{*-1}(\phi) \mathbf{r}(\mathbf{s}', \phi) \] where \(\mathbf{r}(\mathbf{s}, \phi)\) is now the \(n \times m\) matrix of covariances between the \(n\) locations and the \(m\) knots. The predictive process model is now
\[ \begin{aligned} y(\mathbf{s}_i) &= \mu(\mathbf{s}_i) + \tilde{w}(\mathbf{s}_i) + \epsilon(\mathbf{s}_i) \\ \tilde{w}(\mathbf{s}) &= \mathbf{r}'(\mathbf{s}, \phi) \mathbf{R}^{*-1}(\phi) \mathbf{w}^* \end{aligned} \] Now the random effects in the original model have been replaced by a linear transformation of the random effects at the \(m\) knot locations. In other words, we are using the distance-dependent covariance function to interpolate between knots.
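A sketch of this interpolation step follows, with assumed values for sigma2 and phi, a toy knot grid kgrid, and a hypothetical helper crossDist for the \(n \times m\) distances:
# sketch: predictive-process interpolation from m knots to the data locations
# sigma2, phi, and the knot grid are assumed values for illustration
sigma2 <- 1
phi    <- 0.05
kgrid  <- as.matrix( expand.grid( lon = seq(-95,-65,5), lat = seq(25,50,5) ) )
crossDist <- function(a, b){   # hypothetical helper: Euclidean cross-distances
  sqrt( pmax( 0, outer( rowSums(a^2), rowSums(b^2), '+' ) - 2*a%*%t(b) ) )
}
Rstar  <- sigma2*exp( -phi*as.matrix( dist(kgrid) ) )               # m x m knot covariance
rmat   <- sigma2*exp( -phi*crossDist( as.matrix(coords), kgrid ) )  # n x m cross-covariance
wstar  <- t( .rMVN( 1, rep( 0, nrow(kgrid) ), Rstar ) )             # w* at the knots
wtilde <- rmat%*%solve( Rstar )%*%wstar                             # interpolated to data locations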
Now I implement the spatial version. First, read the help page for
the function spGLM. In addition to syntax similar to the
function glm, I want to include coordinates, knots,
starting values, tuning (variances for sampling), covariance model, and
priors. I can call this function in a loop to obtain multiple
chains.
I specify locations for knots to be used in dimension reduction. I want to assign a small number of knots, not too close to one another, but concentrated where observations are dense. I start with a \(15 \times 15 = 225\) grid of candidate points. Here is a map showing their locations:
nk <- 15
klon <- seq(xlim[1],xlim[2],length=nk)[-c(1,nk)]
klat <- seq(ylim[1],ylim[2],length=nk)[-c(1,nk)]
knots <- as.matrix( expand.grid(klon,klat) )
colnames(knots) <- c('lon','lat')
map( 'usa', xlim = xlim, ylim = ylim )
points(coords[,1],coords[,2], col = 'grey', cex=.1)
points(knots[,1],knots[,2], col='#4575b4', pch=15)
Candidate knots initially with 225 locations.
Many of these locations are not helpful, e.g., those in the Atlantic
Ocean. From these candidates I select those having the largest number of
observations within a search radius of 1 degree lon/lat. I limit the
total number of knots to 40. I use RANN::nn2 to find
samples close to knots.
k <- 50                                                   # max neighbors returned per knot
tmp <- nn2( coords, knots, searchtype = 'radius', k = k, radius = 1 )
np  <- tmp[[1]]                                           # neighbor indices (0 = none found)
np[ np > 0 ] <- 1                                         # indicator: sample within radius
wt  <- order( rowSums(np), decreasing = T )[1:40]         # 40 knots with the most samples
knots <- knots[wt,]
map( 'usa', xlim = xlim, ylim = ylim )
wp <- which( ydata[,spec] > 0 ) # sites with loblolly pine
points(coords[wp,1],coords[wp,2], col = 'grey',cex=.2)
points(knots[,1],knots[,2],col='#4575b4', pch=15)
Knot locations reduced to those closest to observations.
I will use these knots.
The model is fitted here:
n.batch <- 2
batch.length <- 1500
n.samples <- n.batch*batch.length # MCMC iterations
priors <- list( "beta.Flat", "phi.Unif" = c(.05, 4), "sigma.sq.IG" = c(50, 20) )
betaS <- rnorm( length(betaNS), betaNS, abs(betaNS)*.1 )
phiS <- .tnorm( 1,.1,2,.05,1 )
sigS <- .tnorm( 1,0,3,3,1 )
out <- spGLM( form, data = xdata, family = "poisson",
coords = as.matrix(coords), knots=knots,
starting = list("beta"=betaS, "phi"= phiS,"sigma.sq"= sigS, "w"= 0 ),
tuning=list("beta" = beta.tuning, "phi" = 0.4,"sigma.sq" = 1, "w" = 0.1 ),
priors=priors,
amcmc = list("n.batch" = n.batch,"batch.length" = batch.length,
"accept.rate" = 0.43),
cov.model = "exponential", verbose = T, n.report = 500 )
## ----------------------------------------
## General model description
## ----------------------------------------
## Model fit with 13880 observations.
##
## Number of covariates 4 (including intercept if specified).
##
## Using the exponential spatial correlation model.
##
## Using non-modified predictive process with 40 knots.
##
## Number of MCMC samples 3000.
##
## Priors and hyperpriors:
## beta flat.
##
## sigma.sq IG hyperpriors shape=50.00000 and scale=20.00000
##
## phi Unif hyperpriors a=0.05000 and b=4.00000
##
## Adaptive Metropolis with target acceptance rate: 43.0
## -------------------------------------------------
## Sampling
## -------------------------------------------------
## Sampled: 3000 of 3000, 100.00%
## -------------------------------------------------
out$DIC <- unlist( spDiag(out, verbose=F) )['DIC4']
burn.in <- 0.8*n.samples
sub.samps <- burn.in:n.samples
# spGLM returns samples in p.beta.theta.samples; phi is the decay, so effective range is about 3/phi
effRange <- 3/out$p.beta.theta.samples[,'phi']
plot(out$p.beta.theta.samples, bty = 'n')
MCMC chains for the regression coefficients and covariance parameters.
Note that I have saved the DIC. Here is a table of estimates:
coeff <- t(apply( out$p.beta.theta.samples, 2, quantile, c(.5, .025, .975) ))
se <- apply( out$p.beta.theta.samples, 2, sd )
coeff <- cbind( coeff[,1], se, coeff[,2:3])
print( coeff )
|  | 50% | se | 2.5% | 97.5% |
|---|---|---|---|---|
| (Intercept) | -2.91000 | 8.43e-01 | -3.76000 | -0.772000 |
| standAge | -0.02950 | 4.86e-04 | -0.03140 | -0.029200 |
| meanTemp | 0.27000 | 1.16e-02 | 0.25100 | 0.289000 |
| annualDef | -0.00103 | 8.24e-05 | -0.00115 | -0.000813 |
| sigma.sq | 5.24000 | 9.63e-01 | 3.73000 | 7.120000 |
| phi | 0.05000 | 2.44e-03 | 0.05000 | 0.050200 |
Here again is the non-spatial fit:
print( summary(poisNS)$coefficients )
|  | Estimate | Std. Error | z value | Pr(>|z|) |
|---|---|---|---|---|
| (Intercept) | -0.40500 | 1.35e-02 | -30.1 | 0 |
| standAge | -0.03600 | 9.90e-05 | -364.0 | 0 |
| meanTemp | 0.29300 | 6.73e-04 | 435.0 | 0 |
| annualDef | -0.00124 | 5.80e-06 | -212.0 | 0 |
Exercise 2a. Do you believe that the MCMC converged to a posterior distribution? If not, what might you do to convince yourself?
Exercise 2b. What are the differences between estimates for the spatial and non-spatial models? Are these estimates comparable? Why or why not?
Here is the covariance function:
betaS <- out$p.beta.theta.samples[sub.samps,]
samps <- betaS[,c('sigma.sq','phi')]
dseq <- seq(0,1,length=100)
cmat <- matrix(0,nrow(samps),100)
for(k in 1:nrow(samps)){
  cmat[k,] <- samps[k,'sigma.sq']*exp( -samps[k,'phi']*dseq )  # sigma^2 e^{-phi s}
}
ci <- apply(cmat,2,quantile,c(.5,.025,.975))
plot(dseq, ci[1,], ylim = c( 0, 1.2*max(ci)), type = 'l', bty = 'n',
xlab = 'Distance', ylab = 'Covariance' )
shadeInterval( dseq, t(ci[2:3,]) )
Fitted covariance function for the exponential model in spBayes.
The predicted mean values have changed from the non-spatial GLM:
xi <- model.matrix( form, data = xdata)
what <- out$p.w.samples[,sub.samps]
ymu <- xi%*%t(betaS[,colnames(xi)])
phat <- exp( ymu + what )
yhat <- rowMeans(phat)
ymu <- rowMeans( exp(ymu) )
wmu <- rowMeans( exp(what) )
yhatNS <- exp( xi%*%beta.starting )
predVsObsPlot( ydata[,spec], yhatNS,
ybins = unique( quantile( ydata[,spec], seq(0, 1, by = .03 ) )) )
predVsObsPlot( ydata[,spec], yhat, col = 'brown',
ybins = unique( quantile( ydata[,spec], seq(0, 1, by = .04 ) )), ADD = T )
abline( h = mean( ydata[,spec] ), lty = 2)
Predicted vs observed for the spatial model, which includes random effects (brown), compared with the non-spatial model (grey).
The root mean square prediction error for the non-spatial model is 48.87; for the spatial model it is 34.53.
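The prediction errors quoted above can be computed directly from objects already in hand; a sketch:
# sketch: root mean square prediction error for each model
rmspe <- function(y, yh) sqrt( mean( (y - yh)^2 ) )
rmspe( ydata[,spec], yhatNS )   # non-spatial GLM
rmspe( ydata[,spec], yhat )     # spatial model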
The predicted values can be decomposed into contributions from the mean structure (the covariates) and the spatial random effects, \(\mathbf{w}\). They are multiplicative in this exponential link function. Note how the two contributions vary across the map:
y2 <- cbind( ydata[,spec], yhat, ymu, wmu )
colnames(y2) <- c( 'obs', 'pred', 'mean', 'w')
par( mfrow = c(2,2), mar = c(.1, .1, .5, .1), omi = c(.2, .2, .2, .2) )
for(k in 1:4){
grid2map( coords, y2[,k], nx = 100, ny = 60, xaxt = 'n', yaxt = 'n',
xlim = xlim, ylim = ylim )
title( colnames( y2 )[k] )
}
Observed, predicted (above), and contributions of mean and random effects (below).
First note the differences in scale for these maps. The random effects account for much of the geographic variation in the prediction, while the mean explains only a small part of this pattern. The random effects smooth and fill in for the limited mean structure. This multivariate approach borrows from all observations, through spatial covariance, to predict the data with random effects. In the next section I compare with a different multivariate approach, the non-spatial GJAM.
Exercise 3a. Divide up your group so that each member determines how one of the following influences model selection. Use DIC as your criterion.
Different numbers of knots
Prior distributions on \(\sigma^2\).
Prior distributions on \(\phi\).
Exercise 3b. Interpret the contributions of the mean and the spatial random effects to predicted surfaces. In other words, what do they mean?
I want to compare the spatial GLM with a multivariate species model that uses a different likelihood, one without a non-linear link function. I might not want spatial random effects, because i) they have no biological interpretation, and ii) they impose smoothing. On the first point, I would prefer to have explanation come from predictors in the model. I know how to interpret them.
On the second point, there can be good reasons for spatial smoothing. Any time I believe that the response between two locations could be approximated by interpolated values, I might want to consider spatial smoothing. For FIA data, I do not want spatial smoothing. When I see forest composition varying with habitat gradients over tens of meters, I do not want to approximate it with a model that interpolates between samples tens of km apart.
In GJAM, a joint distribution of species can help me understand effects of predictors on all species. If I want to analyze more than one species, these residual correlations are needed for proper credible and predictive intervals. Here is the same mean specification in GJAM, but on the data scale (not the link scale):
ydata <- gjamTrimY( ydata, 1000 )$y # species on at least 1000 plots
modelList <- list(ng = 1000, burnin = 500, typeNames = 'DA',
reductList = list(r = 3, N = 20) )
output <- gjam( ~ standAge + meanTemp + annualDef, xdata, ydata, modelList )
gjamDIC <- output$modelSummary$DIC
plotPars <- list( SAVEPLOTS = T, PLOTALLY = T )
gjamPlot(output, plotPars)
predVsObsPlot( ydata[,spec], output$prediction$ypredMu[,spec] )
Here are predictions:
Observed and predicted by gjam.
Here are coefficients for this species from
output$parameters$betaStandXTable:
beta <- output$parameters$betaStandXTable
beta[ grep('pinusTaeda', rownames(beta) ), ]
|  | Estimate | SE | CI_025 | CI_975 | sig95 |
|---|---|---|---|---|---|
| pinusTaeda_intercept | -73.1 | 1.090 | -74.9 | -71.0 | * |
| pinusTaeda_standAge | -27.8 | 0.922 | -29.3 | -25.8 | * |
| pinusTaeda_meanTemp | 76.1 | 0.619 | 74.9 | 77.2 | * |
| pinusTaeda_annualDef | -15.6 | 0.787 | -17.0 | -14.3 | * |
Here are maps:
ypred <- output$prediction$ypredMu
y2 <- cbind(ydata[,spec], ypred[,spec] )
colnames(y2) <- c('obs','pred')
par( mfrow = c(1,2), mar = c(.1, .1, .5, .1), omi = c(.2, .2, .2, .2) )
for(k in 1:2){
grid2map( coords, y2[,k], nx = 100, ny = 60, xaxt = 'n', yaxt = 'n',
xlim = xlim, ylim = ylim )
title( colnames( y2 )[k] )
}
Maps of observed and predicted by gjam.
Note that GJAM does not know sample locations so there is no spatial smoothing. GJAM differs from previous models in that it is not a GLM.
Exercise 4. Repeat exercise 3 with GJAM and compare results for the two models. If there are differences, can you explain them?
Areal unit data, or block-referenced data, come from non-overlapping areas. The CAR model accounts for correlations that can come from adjacency. Suppose there are \(n\) spatial units. In its simplest form there is an \(n \times n\) indicator matrix that marks units sharing a boundary with ‘1’ and ‘0’ elsewhere. To demonstrate, I use a few examples from the CARBayes package.
\[
\begin{aligned}
y_i &= \mu_i + \phi_i + \epsilon_i \\
\phi_i|\boldsymbol{\phi}_{-i} &\sim N \left( \frac{\rho \sum_{j=1}^n w_{ij}\phi_{j}}{\rho \sum_{j=1}^n w_{ij} + 1 - \rho}, \frac{\sigma^2}{\rho \sum_{j=1}^n w_{ij} + 1 - \rho} \right) \\
\epsilon_i &\sim N(0, \tau^2)
\end{aligned}
\] Note that the CAR random effect \(\phi_i\) for block \(i\) is a weighted average of its neighbors \(\{-i\}\). For dependence parameter \(\rho\) there is independence for \(\rho = 0\) and the standard CAR dependence for \(\rho = 1\). The variance declines with the number of neighbors (the sum of the weights). The parameter \(\rho\) is implemented in the function S.CARleroux.
For a transparent demonstration, here is a square neighborhood matrix:
#### Set up a square lattice region
m <- 12
xEast <- 1:m
xNorth <- 1:m
grid <- expand.grid(xEast, xNorth)
n <- nrow(grid)
plot( NULL, xlim = c(0, m), ylim = c(0, m), xlab='East', ylab='North' )
abline(v=grid[,1], h=grid[,2])
text(grid[,1] - .5, grid[,2] - .5, 1:n, cex=.8)
Lattice with location labels for simulated data.
D <- W <- as.matrix( dist(grid) )   # pairwise distances between block centers
W[ W != 1 ] <- 0                    # adjacency: blocks exactly 1 apart share a boundary
The map simply assigns a label to each block.
Take note of the structure of W. It has \(m^2 \times m^2\) elements, all zero except those for pairs of blocks that share a boundary. Box 1 has two neighbors (2, 13), so W[1,2] and W[1,13] are set to 1. This adjacency is apparent from the map. Box 2 has three neighbors (1, 3, 14). Box 14 has four neighbors (2, 13, 15, 26). These (row, column) combinations have a ‘1’ in W.
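To connect W to the CAR prior above, here is a sketch of the conditional mean and variance for one block; rho, sigma2, and the random-effect values phiVec are assumed for illustration:
# sketch: Leroux CAR conditional moments for one block
rho    <- 0.9
sigma2 <- 0.3
phiVec <- rnorm( n )                       # placeholder random effects
i   <- 14                                  # box 14 has four neighbors
den <- rho*sum( W[i,] ) + 1 - rho
condMean <- rho*sum( W[i,]*phiVec )/den    # weighted average of neighbors
condVar  <- sigma2/den                     # declines with the number of neighbors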
The zeros and ones in the W matrix could be replaced
with weights specifying how each affects the other.
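For example, here is a sketch using inverse-distance weights over a slightly larger neighborhood; this is an assumed alternative for illustration and is not used below:
# sketch: inverse-distance weights instead of 0/1 adjacency
Wd <- 1/D                 # weight declines with distance between block centers
Wd[ D > 1.5 ] <- 0        # keep rook and diagonal neighbors (distance <= sqrt(2))
diag( Wd ) <- 0           # no self-weight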
Here is a simulated data set:
Q <- 3
x <- matrix( rnorm(Q*n), n, Q )
x[,1] <- 1
x2 <- x[,2]
x3 <- x[,3]
beta <- matrix( rnorm(Q), Q, 1)
sigma <- .1
# simulated based on distance D
phi <- t( .rMVN(1, rep(0,n), 1*exp(-0.1*D) ) )
y <- x%*%beta + phi + rnorm(n, 0, sigma)
form <- as.formula(y ~ x2 + x3)
## Gaussian model
gaussianModel <- S.CARleroux(formula = form, family = 'gaussian', W = W,
burnin = 20000, n.sample = 100000, thin = 10, verbose = F)
gaussianModel
##
## #################
## #### Model fitted
## #################
## Likelihood model - Gaussian (identity link function)
## Random effects model - Leroux CAR
## Regression equation - y ~ x2 + x3
##
##
## #################
## #### MCMC details
## #################
## Total number of post burnin and thinned MCMC samples generated - 8000
## Number of MCMC chains used - 1
## Length of the burnin period used for each chain - 20000
## Amount of thinning used - 10
##
## ############
## #### Results
## ############
## Posterior quantities and DIC
##
## Mean 2.5% 97.5% n.effective Geweke.diag
## (Intercept) 1.5483 1.5275 1.5688 6235.6 0.2
## x2 -0.1587 -0.2096 -0.1085 3208.2 0.8
## x3 0.3482 0.2957 0.4010 3593.9 0.0
## nu2 0.0121 0.0025 0.0353 1138.1 0.6
## tau2 0.3165 0.2251 0.4162 3420.7 0.1
## rho 0.9414 0.8327 0.9951 6640.4 0.5
##
## DIC = -167.301 p.d = 94.17077 LMPL = 18.92
Here is the MCMC chain for the dependence parameter \(\rho\):
plot( gaussianModel$samples$rho, bty = 'n' )
Here is a map of the fitted values, which include the spatial random effects:
fv <- gaussianModel$fitted.values
mf <- min(fv)
cc <- fv - mf
ss <- seq(0, max(cc), length.out=10)
cc <- findInterval(cc, ss)
colM <- colorRampPalette( rampRed2Blue )
colm <- colM(10)
symbols(x=grid[,1], y=grid[,2], squares = cc*0+1, bg=colm[cc],
fg=colm[cc],inches=F, xlab='East', ylab='North')
The CAR model can apply to non-Gaussian likelihoods. Here is a GLM:
lambda <- exp(x%*%beta + phi + rnorm(n, 0, sigma))
y <- rpois(n, lambda)
poissonModel <- S.CARbym(formula=form, family="poisson",
W=W, burnin=20000, n.sample=100000, thin=10, verbose=F)
poissonModel
##
## #################
## #### Model fitted
## #################
## Likelihood model - Poisson (log link function)
## Random effects model - BYM CAR
## Regression equation - y ~ x2 + x3
##
##
## #################
## #### MCMC details
## #################
## Total number of post burnin and thinned MCMC samples generated - 8000
## Number of MCMC chains used - 1
## Length of the burnin period used for each chain - 20000
## Amount of thinning used - 10
##
## ############
## #### Results
## ############
## Posterior quantities and DIC
##
## Mean 2.5% 97.5% n.effective Geweke.diag
## (Intercept) 1.5356 1.4449 1.6259 5611.2 -1.8
## x2 -0.1517 -0.2465 -0.0598 4321.3 -1.1
## x3 0.3895 0.2907 0.4858 4293.1 0.6
## tau2 0.4202 0.2411 0.6689 2378.0 0.4
## sigma2 0.0113 0.0020 0.0516 162.3 0.1
##
## DIC = 695.8211 p.d = 64.88341 LMPL = -360.65
Exercise 5. Read the CARBayes help page and interpret the analysis above.
Here is code from this site on using block-referenced data. This example illustrates two things:
R can be used with shape files, allowing direct integration of spatial data into models.
Block-referenced data can be irregular yet still amenable to CAR modeling through adjacency.
Exercise 6. Work through the respiratory risk example in groups as if you were attempting to apply it to a data set of your own.
Spatial models confront processes that explicitly require location, distance, or both. Models are often large, requiring dimension reduction.
spBayes is a powerful package for analysis of spatial
random effects.
Spatial random effects are not always desirable, depending on the process and the spatial intensity of sampling.