Spatial models include spatial random effects. Point-referenced and block-referenced data use spatial random effects to capture smoothing as a function of distance, whereas point patterns treat the locations themselves as random. Dimension reduction methods can exploit this structure to avoid large matrix operations.


Resources

Software

source('clarkFunctions2024.r')
library(spBayes)
library(maps)
library(RANN)
library(gjam)
library(CARBayes)
library(CARBayesdata)

Readings

Background from Andy Finley’s site.

Objectives

Spatial data and models

Spatial data are point-referenced, block-referenced, or point patterns. The first two have fixed locations. The third has random locations.

Spatial data are referenced by location, and location is considered important for understanding responses. Point-referenced data are identified by a vector of coordinates in continuous space. For a (2-D) map the reference could be longitude, latitude: \(\mathbf{s}_i = (s_{i1}, s_{i2})\). In the ocean or atmosphere, the reference could additionally have depth or elevation, \(\mathbf{s}_i = (s_{i1}, s_{i2}, s_{i3})\).

*The locations of trees in the Guanica plot, Puerto Rico (from M. Uriarte).*


Block-referenced data are associated with an area. Examples could be zip codes, counties, watersheds, or voxels. Both point- and block-referenced data have fixed locations. For example, a sampling design could specify locations to sample and then treat the responses at those locations as random. Even if sample locations are assigned ‘at random’, once they are selected they might be viewed as a fixed part of the design.

*Respiratory disease risk in Glasgow, used in a spatial Bayesian model from the CARBayes package vignette.*


Point-pattern data have random locations; the pattern itself is the response. The locations of trees could be taken as a response to competition. Locations of crimes could be taken as a response to varying socio-economic conditions within a city.

All three types of spatial data share the attribute that location is viewed as important and thus will be included as part of the model. I turn immediately to an example.

*A point pattern (locations of eBird checklists in the Research Triangle), block-referenced data as a regular lattice (left), and as an irregular lattice (right), all from Becky Tang's study of species interactions and movement.*


FIA data

Forest inventory (FIA) data are indexed by location. They can be viewed as point-referenced data. Here is an aggregated version of FIA data, stored as a matrix of stem counts ydata, with predictors in a data.frame xdata:

load('FIAclusterDensity.rdata', verbose=T )
## Loading objects:
##   xdata
##   ydata
n <- nrow(ydata)                   # no. plots
S <- ncol(ydata)                   # no. species
coords <- xdata[,c('lon','lat')]

I have aggregated by predictors to make this run faster for class. Here is a map that uses clarkFunctions2024.r:

xlim <- c(-95,-65)
ylim <- c(25,50)
map( 'usa', xlim = xlim, ylim = ylim )
points( coords[,1], coords[,2], col = 'grey', cex=.1)
*Sample locations in this data set are aggregated FIA plots.*


The points on this map show locations that I analyze below. Here is a map for loblolly pine:

spec   <- 'pinusTaeda'
xlim <- c(-95,-75)
ylim <- c(24, 42)

grid2map( coords, ydata[,spec], nx = 100, ny = 80, 
                      xlim = xlim, ylim = ylim, zlim = c(0,100) )
title('loblolly data')
*Loblolly pine abundance.*


Because I have aggregated plots on the basis of similar predictors, there are only 13880 plots and noise is reduced. In the following example I set up a spatial model for fitting in spBayes.

Spatial random effects

Spatial random effects models improve prediction by imposing spatial smoothing. The Gaussian process parameterizes a covariance matrix as a function of distance between sample locations.

Spatial random effects models for point-referenced data parameterize a covariance function of distance. Recall that the \(n \times n\) covariance matrix holds \(n(n+1)/2\) unique elements. I cannot estimate all of these values: not only are there too many locations, but, unless there is replication, each is typically observed only once. However, if I make the elements depend on one another, then I may need only a few parameters to describe the entire matrix. Spatial smoothing is not always desirable, but, where appropriate, it can greatly improve prediction. I return to this issue later.

Recall the regression model, now referenced by location,

\[ \begin{aligned} y(\mathbf{s}_i) &= \mu(\mathbf{s}_i) + \epsilon(\mathbf{s}_i) \\ \epsilon(\mathbf{s}_i) &\sim N(0, \tau^2) \end{aligned} \] Location is defined by a vector \(\mathbf{s}_i = (s_{i1}, s_{i2})'\), in this case, (longitude, latitude) of plot \(i\).

The mean structure of the model could be linear in parameters, \(\mu(\mathbf{s}_i) = \mathbf{x}'(\mathbf{s}_i) \boldsymbol{\beta}\). The errors in this model are independent and identically distributed (i.i.d.). The Gaussian process is the most general approach to allow for spatial smoothing. Consider a new term in the model for the random effect, \(w(\mathbf{s}_i)\), having a joint distribution

\[ \mathbf{w} \sim GP(0, \sigma^2 R(\cdot)) \] with covariance between locations \(\mathbf{s}_i\) and \(\mathbf{s}_j\),

\[Cov(w(\mathbf{s}_i), w(\mathbf{s}_j) ) = \sigma^2 \rho(\phi; ||\mathbf{s}_i - \mathbf{s}_j||)\] The notation \(||\mathbf{s}_i - \mathbf{s}_j||\) indicates distance. I can think of an \(n \times n\) matrix of distances, with each element replaced by the function. The full covariance matrix can be written as

\[\mathbf{R}_{\sigma^2, \phi} = \sigma^2 R(\phi)\] In other words, the covariance matrix is a function of distance: the locations only matter in terms of their pairwise distances from one another.

There are a number of different functions that can be used to describe how covariance between points declines with distance. One common model decays exponentially,

\[\rho(\phi, s) = e^{-\phi s}\] for a distance \(s\). A plot could look like this:
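As a minimal sketch, with assumed values \(\sigma^2 = 2\) and \(\phi = 5\) (both invented for illustration), I can draw the curve directly:

sigma2 <- 2                              # assumed variance (sill)
phi    <- 5                              # assumed decay parameter
s      <- seq( 0, 1, length.out = 100 )  # distances
plot( s, sigma2*exp( -phi*s ), type = 'l', bty = 'n',
      xlab = 'Distance', ylab = 'Covariance' )
abline( h = sigma2, lty = 2 )            # curve intersects the y axis at sigma^2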

*The covariance function intersects the y axis at a value of \(\sigma^2\).*

With spatial random effects the model is

\[ \begin{aligned} y(\mathbf{s}_i) &= \mu(\mathbf{s}_i) + w(\mathbf{s}_i) + \epsilon(\mathbf{s}_i) \\ \mathbf{w} &\sim MVN(0, \mathbf{R}_{\sigma^2, \phi}) \\ \epsilon(\mathbf{s}_i) &\sim N(0, \tau^2) \end{aligned} \] If I marginalize out the random effects, I can equivalently write

\[ \mathbf{y} \sim MVN(\mathbf{X} \boldsymbol{\beta}, \mathbf{R}_{\sigma^2, \phi} + \tau^2 \mathbf{I}) \] Note that I have moved the random effects from the mean structure into the covariance.
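As a hedged sketch of this marginalized form, here is a small simulation with invented locations and parameter values; the MVN draw and density come from the mvtnorm package (an assumption, not loaded elsewhere in this vignette):

library(mvtnorm)                                  # assumed available
nSim    <- 50
sSim    <- cbind( runif(nSim), runif(nSim) )      # hypothetical locations
DSim    <- as.matrix( dist(sSim) )                # pairwise distances
XSim    <- cbind( 1, rnorm(nSim) )                # design matrix
betaSim <- c( 1, .5 )                             # assumed coefficients
sigma2  <- 1; phiSim <- 3; tau2 <- .2             # assumed covariance parameters
VSim    <- sigma2*exp( -phiSim*DSim ) + tau2*diag(nSim)
ySim    <- c( rmvnorm( 1, XSim%*%betaSim, VSim ) )
dmvnorm( ySim, c( XSim%*%betaSim ), VSim, log = TRUE )  # marginal log likelihood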

So what does the term ‘spatial’ mean?

In recent statistics literature, a spatial model typically means that there is some spatial structure in the error structure.

An ecologist might argue that all three terms in the model \(\mu(\mathbf{s}_i) + w(\mathbf{s}_i) + \epsilon(\mathbf{s}_i)\) are spatial, because all depend on location \(\mathbf{s}_i = (s_{i1}, s_{i2})'\). Of course, this is true. Still, recognize that in the statistics literature 'spatial' typically refers to the spatial random effects.

Computation issues

Posterior simulation requires sampling distributions for \((\phi, \sigma^2, \boldsymbol{\beta})\). I cannot sample \(\phi\) directly, because the likelihood and prior are not conjugate for this parameter, so Metropolis sampling is an option.

Note that I do not have to sample the random effects, because they can be marginalized away. I can always sample them if I want to, e.g., if I am interested in this smooth surface. If so, I just draw from a multivariate normal distribution having mean zero and the covariance \(Cov(w(\mathbf{s}_i), w(\mathbf{s}_j) )\).
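Continuing the sketch above, one such draw could use a Cholesky factor (base R, with the assumed values from the previous block):

RwSim <- sigma2*exp( -phiSim*DSim )             # covariance of w
wSim  <- c( t( chol(RwSim) ) %*% rnorm(nSim) )  # one draw of w ~ MVN(0, RwSim)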

Given covariance \(\mathbf{R}\) I can directly sample \(\boldsymbol{\beta}\), provided I can invert \(\mathbf{R}\). This will not be possible for a large number of spatial locations. I will mention some options below, but first point out that non-Gaussian data also fit within this framework.
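For a modest \(n\), the conditional draw for \(\boldsymbol{\beta}\) given the covariance has the familiar generalized least squares form; here is a sketch, continuing with the simulated objects above (flat prior assumed):

ViSim <- solve( VSim )                          # the inversion that limits large n
VbSim <- solve( t(XSim)%*%ViSim%*%XSim )        # conditional covariance for beta
mbSim <- VbSim%*%t(XSim)%*%ViSim%*%ySim         # conditional mean for beta
betaDraw <- c( rmvnorm( 1, c(mbSim), VbSim ) )  # one Gibbs draw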

Non-Gaussian data

The spatial model can be extended to non-Gaussian data by including an additional stage. Let \(g(\cdot)\) be a link function. For this GLM I omit the non-spatial error term,

\[ g(E[y(\mathbf{s}_i)]) = \mu(\mathbf{s}_i) + w(\mathbf{s}_i) \] Here is a spatial GLM for count data:

\[ \begin{aligned} y(\mathbf{s}_i) &\sim Poi(g^{-1}( \theta(\mathbf{s}_i)))\\ \theta(\mathbf{s}_i) &=\mu(\mathbf{s}_i) + w(\mathbf{s}_i) \end{aligned} \] I use the typical log link, \(g(\cdot) = \log(\cdot)\), so \(E[y(\mathbf{s}_i)] = e^{\theta(\mathbf{s}_i)}\). In the spBayes example that follows, there is a predictive process that reduces the \(n \times n\) covariance matrix to a small number of 'knots'.
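A minimal simulation sketch of this spatial Poisson model, reusing the hypothetical objects defined above:

etaSim <- c( XSim%*%betaSim ) + wSim   # linear predictor with spatial random effect
yPois  <- rpois( nSim, exp(etaSim) )   # counts on the log link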

spBayes

The package spBayes makes use of a predictive process with dimension reduction. Install the package spBayes with the other packages listed at the beginning of this vignette. I will use an example with FIA data to compare the non-spatial GLM and the spatial version of a Poisson regression. I start by fitting a non-spatial model, followed by the predictive process.

First, non-spatial

To obtain starting values I first fit a (non-spatial) glm and compare fitted and observed:

form <- as.formula( ydata[,spec] ~ standAge + meanTemp + annualDef )

poisNS <- glm( form, data = xdata, family="poisson")
beta.starting <- coefficients( poisNS )    # to initialize spBayes
beta.tuning   <- t(chol(vcov( poisNS )))   # to start proposals
betaNS        <- matrix(beta.starting)     # non-spatial estimates
AICNS         <- summary(poisNS)$aic       # non-spatial AIC

# observed vs predicted responses
tmp <- predict.glm(poisNS, type='response', se.fit=T)
yNS <- tmp$fit

par(bty='n')
predVsObsPlot( sqrt(ydata[,spec]), sqrt(yNS) )
abline(0,1,lty=2)
abline(h=mean( sqrt(ydata[,spec]) ),lty=2)
title('In-sample (at sample locations)')
*Predicted vs observed counts for loblolly pine, non-spatial GLM.*


These are the predictions at the sample locations. Box-and-whisker plots show 68% and 95% of predictive sample means for binned observations.

Here are fitted coefficients:

summary(poisNS)$coefficients
|             |   Estimate | Std. Error |    z value | Pr(>\|z\|) |
|-------------|-----------:|-----------:|-----------:|-----------:|
| (Intercept) | -0.4053605 |  0.0134830 |  -30.06456 |          0 |
| standAge    | -0.0359860 |  0.0000990 | -363.56884 |          0 |
| meanTemp    |  0.2925764 |  0.0006730 |  434.70712 |          0 |
| annualDef   | -0.0012355 |  0.0000058 | -212.43460 |          0 |

Exercise 1. Select a species to fit. Use AIC as a model selection criterion to determine which combination of variables in xdata best fits the data.

Here are maps comparing the observed and predicted data.

zlim <- c( 0, 300 )
par( mfrow=c(1,2) )
grid2map( coords, ydata[,spec], nx = 80, ny = 50, xaxt = 'n', yaxt = 'n',
                      xlim = xlim, ylim = ylim, zlim = zlim )
title( 'Observed' )
grid2map( coords, yNS, nx = 80, ny = 50, xaxt = 'n', yaxt = 'n',
                      xlim = xlim, ylim = ylim, zlim = zlim )
title( 'Predicted' )
*Observed and predicted maps for loblolly pine, non-spatial model.*


Clearly, predictions miss some key features of the map.

Predictive process for dimension reduction

The predictive process reduces the dimensionality of a spatial covariance matrix by introducing a small set of reference points to condition on. By conditioning on this small matrix, we avoid inverting the full spatial covariance matrix.

To estimate parameters we need to invert the error covariance matrix, and inverting an \(n \times n\) covariance matrix requires order-\(n^3\) operations. We need some mechanism for reducing the dimensionality of this process.

To reduce the dimensionality of the covariance matrix, Banerjee et al. introduce the predictive process, which projects the surface onto a smaller number of 'knots' \(\mathcal{S}^* = \{\mathbf{s}^*_1, \dots, \mathbf{s}^*_m\}\), where \(m \ll n\), having distribution

\[ \mathbf{w}^* \sim MVN(0, \mathbf{R}^*_{\sigma^2, \phi}) \] where \(\mathbf{R}^*_{\sigma^2, \phi}\) is an \(m \times m\) covariance matrix having the same parameterization as for the full data set. There is a conditional covariance that allows me to evaluate the expected random effect at some location \(\mathbf{s}_0\),

\[ E[w(\mathbf{s}_0) | \mathbf{w}^*] = \mathbf{r}'(\mathbf{s}_0, \phi) \mathbf{R}^{*-1}(\phi) \mathbf{w}^* \] where \(\mathbf{r}(\mathbf{s}_0, \phi)\) is the length-\(m\) vector of covariances between location \(\mathbf{s}_0\) and the knot locations \(\mathbf{s}^*\). For an entire surface of points, I have the covariance function

\[ R(\mathbf{s}, \mathbf{s}'; \phi) = \mathbf{r}'(\mathbf{s}, \phi) \mathbf{R}^{*-1}(\phi) \mathbf{r}(\mathbf{s}', \phi) \] where \(\mathbf{r}(\mathbf{s}, \phi)\) is now the \(n \times m\) matrix of covariances between the \(n\) observation locations and the \(m\) knots. The predictive process model is now

\[ \begin{aligned} y(\mathbf{s}_i) &= \mu(\mathbf{s}_i) + \tilde{w}(\mathbf{s}_i) + \epsilon(\mathbf{s}_i) \\ \tilde{w}(\mathbf{s}) &= \mathbf{r}'(\mathbf{s}, \phi) \mathbf{R}^{*-1}(\phi) \mathbf{w}^* \end{aligned} \] Now the random effects in the original model have been replaced by a linear transformation of the random effects at the \(m\) knot locations. In other words, we are using the distance-dependent covariance function to interpolate between knots.
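Here is a hedged base-R sketch of this projection, with hypothetical locations, knots, and parameter values:

nPP <- 200; mPP <- 10
sPP   <- cbind( runif(nPP), runif(nPP) )   # hypothetical data locations
kPP   <- cbind( runif(mPP), runif(mPP) )   # hypothetical knot locations
phiPP <- 3; sig2PP <- 1                    # assumed covariance parameters

RkPP  <- sig2PP*exp( -phiPP*as.matrix( dist(kPP) ) )  # m x m knot covariance R*
wStar <- c( t( chol(RkPP) ) %*% rnorm(mPP) )          # w* ~ MVN(0, R*)

# n x m cross-covariances r(s, phi) between data locations and knots
crossD <- sqrt( outer( sPP[,1], kPP[,1], '-' )^2 + outer( sPP[,2], kPP[,2], '-' )^2 )
rPP    <- sig2PP*exp( -phiPP*crossD )

wTilde <- rPP %*% solve( RkPP, wStar )     # interpolated random effects at data locations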

Knot locations for dimension reduction

Now I implement the spatial version. First, read the help page for the function spGLM. In addition to syntax similar to the function glm, I want to include coordinates, knots, starting values, tuning (variances for sampling), covariance model, and priors. I can call this function in a loop to obtain multiple chains.

I specify locations for knots to be used in dimension reduction. I want to assign a small number of knots, not too close to one another, but concentrated where observations are dense. I start with a \(15 \times 15\) grid, dropping the boundary rows and columns to leave \(13 \times 13 = 169\) candidate points. Here is a map showing their locations:

nk    <- 15
klon  <- seq(xlim[1],xlim[2],length=nk)[-c(1,nk)]
klat  <- seq(ylim[1],ylim[2],length=nk)[-c(1,nk)]
knots <- as.matrix( expand.grid(klon,klat) )
colnames(knots) <- c('lon','lat')
map( 'usa', xlim = xlim, ylim = ylim )
points(coords[,1],coords[,2], col = 'grey', cex=.1)
points(knots[,1],knots[,2], col='#4575b4', pch=15)
*Candidate knots, initially 169 locations.*


Many of these locations are not helpful, e.g., those in the Atlantic Ocean. From these candidates I select those having the largest number of observations within a search radius of 1 degree lon/lat. I limit the total number of knots to 40. I use RANN::nn2 to find samples close to knots.

k  <- 50                                               # max neighbors returned per knot
tmp   <- nn2(coords, knots, searchtype='radius',k = k, radius = 1) 
np    <- tmp[[1]]                                      # neighbor indices (0 = none found)
np[np > 0] <- 1
wt    <- order(rowSums(np), decreasing=T)[1:40]        # 40 knots with most nearby plots
knots <- knots[wt,]

map( 'usa', xlim = xlim, ylim = ylim )
wp <- which( ydata[,spec] > 0 )                        # sites with loblolly pine
points(coords[wp,1],coords[wp,2], col = 'grey',cex=.2)
points(knots[,1],knots[,2],col='#4575b4', pch=15)
*Knot locations reduced to those closest to observations.*


I will use these knots.

The model is fitted here:

n.batch      <- 2
batch.length <- 1500
n.samples    <- n.batch*batch.length          # MCMC iterations
priors <-  list( "beta.Flat", "phi.Unif" = c(.05, 4), "sigma.sq.IG" = c(50, 20) )
betaS  <- rnorm( length(betaNS), betaNS, abs(betaNS)*.1 )
phiS   <- .tnorm( 1,.1,2,.05,1 )
sigS   <- .tnorm( 1,0,3,3,1 )

out <- spGLM( form, data = xdata, family = "poisson",
             coords = as.matrix(coords), knots=knots,
             starting = list("beta"=betaS, "phi"= phiS,"sigma.sq"= sigS, "w"= 0 ),
             tuning=list("beta" = beta.tuning, "phi" = 0.4,"sigma.sq" = 1, "w" = 0.1 ),
             priors=priors,
             amcmc = list("n.batch" = n.batch,"batch.length" = batch.length,
                        "accept.rate" = 0.43),
             cov.model = "exponential", verbose = T, n.report = 500 )
## ----------------------------------------
##  General model description
## ----------------------------------------
## Model fit with 13880 observations.
## 
## Number of covariates 4 (including intercept if specified).
## 
## Using the exponential spatial correlation model.
## 
## Using non-modified predictive process with 40 knots.
## 
## Number of MCMC samples 3000.
## 
## Priors and hyperpriors:
##  beta flat.
## 
##  sigma.sq IG hyperpriors shape=50.00000 and scale=20.00000
## 
##  phi Unif hyperpriors a=0.05000 and b=4.00000
## 
## Adaptive Metropolis with target acceptance rate: 43.0
## -------------------------------------------------
##      Sampling
## -------------------------------------------------
## Sampled: 3000 of 3000, 100.00%
## -------------------------------------------------
out$DIC <- unlist( spDiag(out, verbose=F) )['DIC4']

burn.in   <- 0.8*n.samples
sub.samps <- burn.in:n.samples
# spGLM stores chains in p.beta.theta.samples; phi is the decay in exp(-phi d),
# so the effective range is approximately 3/phi
plot(out$p.beta.theta.samples, bty = 'n')
*MCMC trace and density plots for the model parameters.*


Estimates

Note that I have saved the DIC. Here is a table of estimates:

coeff <- t(apply( out$p.beta.theta.samples, 2, quantile, c(.5, .025, .975) ))
se    <- apply( out$p.beta.theta.samples, 2, sd )
coeff <- cbind( coeff[,1], se, coeff[,2:3])
print( coeff ) 
|             |   Median |       se |     2.5% |     97.5% |
|-------------|---------:|---------:|---------:|----------:|
| (Intercept) | -2.91000 | 8.43e-01 | -3.76000 | -0.772000 |
| standAge    | -0.02950 | 4.86e-04 | -0.03140 | -0.029200 |
| meanTemp    |  0.27000 | 1.16e-02 |  0.25100 |  0.289000 |
| annualDef   | -0.00103 | 8.24e-05 | -0.00115 | -0.000813 |
| sigma.sq    |  5.24000 | 9.63e-01 |  3.73000 |  7.120000 |
| phi         |  0.05000 | 2.44e-03 |  0.05000 |  0.050200 |

Here again is the non-spatial fit:

print( summary(poisNS)$coefficients )
|             | Estimate | Std. Error | z value | Pr(>\|z\|) |
|-------------|---------:|-----------:|--------:|-----------:|
| (Intercept) | -0.40500 |   1.35e-02 |   -30.1 |          0 |
| standAge    | -0.03600 |   9.90e-05 |  -364.0 |          0 |
| meanTemp    |  0.29300 |   6.73e-04 |   435.0 |          0 |
| annualDef   | -0.00124 |   5.80e-06 |  -212.0 |          0 |

Exercise 2a. Do you believe that the MCMC converged to a posterior distribution? If not, what might you do to convince yourself?

Exercise 2b. What are the differences between estimates for the spatial and non-spatial models? Are these estimates comparable? Why or why not?

Here is the covariance function:

betaS <- out$p.beta.theta.samples[sub.samps,]
samps <- betaS[,c('sigma.sq','phi')]
dseq  <- seq(0,1,length=100)
cmat  <- matrix(0,nrow(samps),100)

for(k in 1:nrow(samps)){                 # exponential model: sigma^2 exp(-phi d)
  cmat[k,] <- samps[k,'sigma.sq']*exp(-dseq*samps[k,'phi'])
}

ci <- apply(cmat,2,quantile,c(.5,.025,.975))
plot(dseq, ci[1,], ylim = c( 0, 1.2*max(ci)), type = 'l', bty = 'n',
     xlab = 'Distance', ylab = 'Covariance' )
shadeInterval( dseq, t(ci[2:3,]) )
*Fitted covariance function for the exponential model in spBayes.*


Predictions

The predicted mean values have changed from the non-spatial GLM:

xi     <- model.matrix( form, data = xdata)
what   <- out$p.w.samples[,sub.samps]        # random-effect samples
ymu    <- xi%*%t(betaS[,colnames(xi)])       # linear-predictor samples
phat   <- exp( ymu + what )                  # predictions including random effects
yhat   <- rowMeans(phat)
ymu    <- rowMeans( exp(ymu) )               # mean-structure contribution
wmu    <- rowMeans( exp(what) )              # random-effect contribution
yhatNS <- exp( xi%*%beta.starting )          # non-spatial prediction
predVsObsPlot( ydata[,spec], yhatNS, 
               ybins = unique( quantile( ydata[,spec], seq(0, 1, by = .03 ) )) )
predVsObsPlot( ydata[,spec], yhat, col = 'brown', 
               ybins = unique( quantile( ydata[,spec], seq(0, 1, by = .04 ) )), ADD = T )
abline( h = mean( ydata[,spec] ), lty = 2)
*Predicted vs observed for the spatial model, which includes random effects (brown), compared with the non-spatial model (grey).*


The root mean square prediction error is 48.87 for the non-spatial model and 34.53 for the spatial model.
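These values could be computed from the objects above (a sketch):

sqrt( mean( (ydata[,spec] - yhatNS)^2 ) )   # non-spatial RMSPE
sqrt( mean( (ydata[,spec] - yhat)^2 ) )     # spatial RMSPE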

The predicted values can be decomposed into contributions from the mean structure (the covariates) and the spatial random effects, \(\mathbf{w}\). Because the link is the log, the two contributions are multiplicative on the data scale. Note how they vary across the map:

y2     <- cbind( ydata[,spec], yhat, ymu, wmu )
colnames(y2) <- c( 'obs', 'pred', 'mean', 'w')

par( mfrow = c(2,2), mar = c(.1, .1, .5, .1), omi = c(.2, .2, .2, .2) )
for(k in 1:4){
grid2map( coords, y2[,k], nx = 100, ny = 60, xaxt = 'n', yaxt = 'n',
                      xlim = xlim, ylim = ylim )
  title( colnames( y2 )[k] )
}
*Observed, predicted (above), and contributions of mean and random effects (below).*


First note the differences in scale for these maps. The random effects account for much of the geographic variation in the prediction, while the mean explains only a small part of this pattern. The random effects smooth and fill in for the limited mean structure. This approach borrows strength from all observations, through the spatial covariance, to predict the data with random effects. In the next section I compare with a different multivariate approach, the non-spatial GJAM.


Exercise 3a. Divide up your group so that each member determines how one of the following influences model selection. Use DIC as your criterion.

  1. Different numbers of knots

  2. Prior distributions on \(\sigma^2\).

  3. Prior distributions on \(\phi\).

Exercise 3b. Interpret the contributions of the mean and the spatial random effects to predicted surfaces. In other words, what do they mean?

GJAM comparison

I want to compare the spatial GLM with a multivariate species model that uses a different likelihood, one without a non-linear link function. I might not want spatial random effects, because i) they have no biological interpretation, and ii) they impose smoothing. On the first point, I would prefer to have explanation come from predictors in the model. I know how to interpret them.

On the second point, there can be good reasons for spatial smoothing. Any time I believe that the response between two locations could be approximated by interpolated values, I might want to consider spatial smoothing. For FIA data, I do not want spatial smoothing. When I see forest composition varying with habitat gradients over tens of meters, I do not want to approximate it with models that interpolate between samples tens of km apart.

In GJAM, a joint distribution of species can help me understand effects of predictors on all species. If I want to analyze more than one species, the residual correlations between species are needed for proper credible and predictive intervals. Here is the same mean specification in GJAM, but on the data scale (not the link scale):

ydata     <- gjamTrimY( ydata, 1000 )$y          # species on at least 1000 plots
modelList <- list(ng = 1000, burnin = 500, typeNames = 'DA', 
                  reductList = list(r = 3, N = 20) )
output    <- gjam( ~ standAge + meanTemp + annualDef, xdata, ydata, modelList )
gjamDIC   <- output$modelSummary$DIC

plotPars <- list( SAVEPLOTS = T, PLOTALLY = T )
gjamPlot(output, plotPars)

predVsObsPlot( ydata[,spec], output$prediction$ypredMu[,spec] )

Here are predictions:

*Observed and predicted by gjam.*


Here are coefficients for this species from output$parameters$betaStandXTable:

beta <- output$parameters$betaStandXTable
beta[ grep('pinusTaeda', rownames(beta) ), ] 
|                      | Estimate |    SE | CI_025 | CI_975 | sig95 |
|----------------------|---------:|------:|-------:|-------:|:-----:|
| pinusTaeda_intercept |    -73.1 | 1.090 |  -74.9 |  -71.0 |   *   |
| pinusTaeda_standAge  |    -27.8 | 0.922 |  -29.3 |  -25.8 |   *   |
| pinusTaeda_meanTemp  |     76.1 | 0.619 |   74.9 |   77.2 |   *   |
| pinusTaeda_annualDef |    -15.6 | 0.787 |  -17.0 |  -14.3 |   *   |

Here are maps:

ypred <- output$prediction$ypredMu

y2 <- cbind(ydata[,spec], ypred[,spec] )
colnames(y2) <- c('obs','pred')

par( mfrow = c(1,2), mar = c(.1, .1, .5, .1), omi = c(.2, .2, .2, .2) )
for(k in 1:2){
grid2map( coords, y2[,k], nx = 100, ny = 60, xaxt = 'n', yaxt = 'n',
                      xlim = xlim, ylim = ylim )
  title( colnames( y2 )[k] )
}
*Maps of observed and predicted by gjam.*


Note that GJAM does not know sample locations, so there is no spatial smoothing. GJAM also differs from the previous models in that it is not a GLM.

Exercise 4. Repeat exercise 3 with GJAM and compare results for the two models. If there are differences, can you explain them?

Conditional autoregression (CAR) model

Areal unit data, or block-referenced data, come from non-overlapping areas. The CAR model takes into account correlations that can come from adjacency. Suppose there are \(n\) spatial units. In its simplest form there is an \(n \times n\) indicator matrix holding '1' for units sharing a boundary and '0' elsewhere. To demonstrate I use a few examples from the CARBayes package.

\[ \begin{aligned} y_i &= \mu_i + \phi_i + \epsilon_i \\ \phi_i|\phi_{-i} &\sim N \left(\frac{\rho \sum_{j=1}^n w_{ij}\phi_{j}}{ \rho \sum_{j=1}^n w_{ij} + 1 - \rho }, \frac{\sigma^2}{\rho \sum_{j=1}^n w_{ij} + 1 - \rho} \right) \\ \epsilon_i &\sim N(0, \tau^2) \end{aligned} \] Note that the conditional mean of the CAR random effect \(\phi_i\) for block \(i\) is a weighted average of its neighbors \(\{-i\}\). For dependence parameter \(\rho\) there is independence for \(\rho = 0\) and the standard CAR dependence for \(\rho = 1\). The variance declines with the number of neighbors (sum of the weights). The parameter \(\rho\) is implemented in the function S.CARleroux.
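As a small sketch (a hypothetical helper, not part of CARBayes), the conditional moments for one block are:

# conditional mean and variance of phi_i given its neighbors (Leroux CAR)
condCAR <- function(i, phi, W, rho, sigma2){
  wi    <- W[i, ]
  denom <- rho*sum(wi) + 1 - rho
  list( mean = rho*sum( wi*phi )/denom,
        var  = sigma2/denom )
}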

For a transparent demonstration, here is a square neighborhood matrix:

#### Set up a square lattice region
m <- 12
xEast  <- 1:m
xNorth <- 1:m
grid   <- expand.grid(xEast, xNorth)
n      <- nrow(grid)
plot( NULL, xlim = c(0, m), ylim = c(0, m), xlab='East', ylab='North' )
abline(v=grid[,1], h=grid[,2])
text(grid[,1] - .5, grid[,2] - .5, 1:n, cex=.8)
*Lattice with location labels for simulated data.*


D <- W <- as.matrix(dist(grid))   # pairwise distances between box centers
W[W != 1] <- 0                    # adjacency: '1' only where distance is exactly 1

The map simply assigns a label to each block.

Take note of the structure of W. It has \(m^2 \times m^2\) elements, all zero except for ones marking adjacent pairs. Box 1 has two neighbors (2, 13), and W[1,2] and W[1,13] are set to 1. This adjacency is apparent from the map. Box 2 has three neighbors (1, 3, 14). Box 14 has four neighbors (2, 13, 15, 26). These (row, column) combinations have a '1' in W.
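A quick check of this structure, using the W built above:

which( W[1,]  == 1 )   # neighbors of box 1: 2, 13
which( W[14,] == 1 )   # neighbors of box 14: 2, 13, 15, 26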

The zeros and ones in the W matrix could be replaced with weights specifying how each affects the other.

Here is a simulated data set:

Q <- 3
x <- matrix( rnorm(Q*n), n, Q )
x[,1] <- 1
x2    <- x[,2]
x3    <- x[,3]
beta  <- matrix( rnorm(Q), Q, 1)
sigma <- .1

# simulated based on distance D
phi <- t( .rMVN(1, rep(0,n), 1*exp(-0.1*D) ) )
y   <- x%*%beta + phi + rnorm(n, 0, sigma)

form <- as.formula(y ~ x2 + x3)

## Gaussian model
gaussianModel <- S.CARleroux(formula = form, family  = 'gaussian', W = W, 
                             burnin = 20000, n.sample = 100000, thin = 10, verbose = F)
gaussianModel
## 
## #################
## #### Model fitted
## #################
## Likelihood model - Gaussian (identity link function) 
## Random effects model - Leroux CAR
## Regression equation - y ~ x2 + x3
## 
## 
## #################
## #### MCMC details
## #################
## Total number of post burnin and thinned MCMC samples generated - 8000
## Number of MCMC chains used - 1
## Length of the burnin period used for each chain - 20000
## Amount of thinning used - 10
## 
## ############
## #### Results
## ############
## Posterior quantities and DIC
## 
##                Mean    2.5%   97.5% n.effective Geweke.diag
## (Intercept)  1.5483  1.5275  1.5688      6235.6         0.2
## x2          -0.1587 -0.2096 -0.1085      3208.2         0.8
## x3           0.3482  0.2957  0.4010      3593.9         0.0
## nu2          0.0121  0.0025  0.0353      1138.1         0.6
## tau2         0.3165  0.2251  0.4162      3420.7         0.1
## rho          0.9414  0.8327  0.9951      6640.4         0.5
## 
## DIC =  -167.301       p.d =  94.17077       LMPL =  18.92

Here are the MCMC samples for the autocorrelation parameter \(\rho\):

plot( gaussianModel$samples$rho, bty = 'n' )

Here is a map of the fitted values, which include the spatial random effects:

fv <- gaussianModel$fitted.values
mf <- min(fv)
cc <- fv - mf                            # shift values to start at zero
ss <- seq(0, max(cc), length.out=10)     # 10 color bins
cc <- findInterval(cc, ss)               # bin index for each block

colM <- colorRampPalette( rampRed2Blue ) # rampRed2Blue comes from clarkFunctions2024.r
colm <- colM(10)

symbols(x=grid[,1], y=grid[,2], squares = cc*0+1, bg=colm[cc],
        fg=colm[cc],inches=F, xlab='East', ylab='North')

The CAR model can apply to non-Gaussian likelihoods. Here is a GLM:

lambda <- exp(x%*%beta + phi + rnorm(n, 0, sigma))
y <- rpois(n, lambda)

poissonModel <- S.CARbym(formula=form, family="poisson",
                         W=W, burnin=20000, n.sample=100000, thin=10, verbose=F)
poissonModel
## 
## #################
## #### Model fitted
## #################
## Likelihood model - Poisson (log link function) 
## Random effects model - BYM CAR
## Regression equation - y ~ x2 + x3
## 
## 
## #################
## #### MCMC details
## #################
## Total number of post burnin and thinned MCMC samples generated - 8000
## Number of MCMC chains used - 1
## Length of the burnin period used for each chain - 20000
## Amount of thinning used - 10
## 
## ############
## #### Results
## ############
## Posterior quantities and DIC
## 
##                Mean    2.5%   97.5% n.effective Geweke.diag
## (Intercept)  1.5356  1.4449  1.6259      5611.2        -1.8
## x2          -0.1517 -0.2465 -0.0598      4321.3        -1.1
## x3           0.3895  0.2907  0.4858      4293.1         0.6
## tau2         0.4202  0.2411  0.6689      2378.0         0.4
## sigma2       0.0113  0.0020  0.0516       162.3         0.1
## 
## DIC =  695.8211       p.d =  64.88341       LMPL =  -360.65


Exercise 5. Repeat the spatial analysis with a binomial model. Use the CARBayes help page. Interpret the analysis.

GIS data

Here is code from this site on using block-referenced data. This example illustrates two things:

  1. R can be used with shape files, allowing direct integration of spatial data into models.

  2. Block-referenced data can be irregular but still amenable to CAR modeling through adjacency, as sketched below.
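As a hedged sketch of that workflow, assuming the sf and spdep packages and a hypothetical shapefile units.shp (none of which appear elsewhere in this vignette):

library(sf)                       # assumed available
library(spdep)                    # assumed available
shp <- st_read( 'units.shp' )     # hypothetical polygons for the areal units
nb  <- poly2nb( shp )             # neighbor list from shared boundaries
W   <- nb2mat( nb, style = 'B' )  # binary adjacency matrix, usable by CARBayes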


Exercise 6. Work through the respiratory risk example in groups as if you were attempting to apply it to a data set of your own.

Reprise

Spatial models confront processes that explicitly require location, distance, or both. Models are often large, requiring dimension reduction.

spBayes is a powerful package for analysis of spatial random effects.

Spatial random effects are not always desirable, depending on the process and the spatial intensity of sampling.