An introduction to spatial regression techniques including:

  1. Mapping model residuals to look for spatial autocorrelation (SAC)
  2. Testing for SAC in model residuals using Moran’s I
  3. Performing an autocovariate regression to account for SAC
# dependencies
library(spdep)
library(sp)
library(raster)
library(tidyverse)

Lets have a look at the data:

Column 1 represents data extracted from a raster grid of the number of acoustic bat passes that occured in separate 100x100m grid squares sampled across a 5 year period (A).

Columns 2-5 represents data extracted from rasters showing percentage cover of several habitat variables measured in each square (agricultural, buildings, woodland, water) (B). Coordinates of the centre point of each cell are also shown.

dat <- read_csv('tutorial2_data.csv') %>% select(-X1)
x y passes agr_cover bldngs_cover wood_cover water_cover
-2.15 49.25 0 0.04 0.06 0.00 0
-2.15 49.25 0 0.23 0.00 0.05 0
-2.15 49.25 0 0.14 0.00 0.00 0
-2.15 49.25 1 0.00 0.04 0.00 0
-2.15 49.25 0 0.31 0.00 0.00 0
-2.23 49.25 0 0.14 0.09 0.00 0
-2.23 49.25 0 0.88 0.00 0.00 0
-2.17 49.25 0 0.14 0.00 0.00 0
-2.16 49.25 0 0.04 0.00 0.01 0
-2.15 49.25 0 0.41 0.02 0.00 0

A - Raster of bat passes recorded per cell:

B - Example habitat raster showing percentage cover of agricultural land in sampled squares:

Initial inspection of plot A shows that there are distinct clumps of bat passes which may indicate the presence of spatial autocorrelation (SAC). Spatial autocorrelation occurs when values of a variable sampled at locations close to each other are more similar than values taken from more distant locations.

Crucially, its presence in the residuals of a fitted model mean that the common statistical assumption of independent and identically residuals is violated (Dormann et al. 2007). To account for this, residuals should be mapped and tested for signs of SAC. If it is present, SAC should then be accounted for by constructing a spatially explicit model.

First lets construct a non-spatial generalised linear model using the number of bat passes per cell as the response variable and the percentage cover of habitat variables as predictors…

f1 <- passes ~ agr_cover + bldngs_cover + wood_cover + water_cover
m1 <- glm(f1, data = dat, family = 'poisson')
summary(m1)
## 
## Call:
## glm(formula = f1, family = "poisson", data = dat)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.4878  -0.9825  -0.8585   0.5895   5.0356  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.72843    0.04773 -15.262  < 2e-16 ***
## agr_cover     0.15001    0.10048   1.493    0.135    
## bldngs_cover -3.87379    0.55436  -6.988 2.79e-12 ***
## wood_cover    1.33528    0.14606   9.142  < 2e-16 ***
## water_cover   2.72741    0.45784   5.957 2.57e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 3786.2  on 3123  degrees of freedom
## Residual deviance: 3571.1  on 3119  degrees of freedom
## AIC: 5858.5
## 
## Number of Fisher Scoring iterations: 6

The output suggests that building cover has a significant negative effect on the number of passes resocrded (bat activity) whereas woodland and water cover has a significant positive effect.




Mapping model residuals to look for spatial autocorrelation (SAC)


Now lets look for signs of spatial autocorrelation in the residuals of each square…

# extract residuals
res <- m1$residuals

# extract coordinates corresponding to each residual and combine
res <- data.frame(Residuals = res, x = dat$x, y = dat$y)
Residuals x y
-1.00 -2.15 49.25
-1.00 -2.15 49.25
-1.00 -2.15 49.25
1.38 -2.15 49.25
-1.00 -2.15 49.25
-1.00 -2.23 49.25
-1.00 -2.23 49.25
-1.00 -2.17 49.25
-1.00 -2.16 49.25
-1.00 -2.15 49.25
#plot
plot <- ggplot(res, aes(x = x, y = y)) + geom_point(aes(colour = Residuals, size = Residuals)) + 
  geom_point(shape = 1, aes(size = Residuals, colour = sign) ,colour = "black") + 
  scale_colour_gradient2(low = "#E8F3EB", mid = "#FF1C55",
                         high = "#560000", midpoint = 8, space = "Lab",
                         na.value = "grey50", guide = "colourbar")

The size of the point in the residual plot represents the size of the residual. Points are also coloured by size which gives an additional insight into where residuals of differing sizes are located in space. Inspection of the plot indicates noticeable clumping of residuals with the same sign - in particular, negative residuals tend to be clumped together.




Testing for SAC in model residuals using Moran’s I


To confirm the presence or absence of SAC in the residuals, we will now conduct a Moran’s I test.

# Convert data to spatial points dataframe
dat <- SpatialPointsDataFrame(cbind(dat$x, dat$y), dat)

# Construct weights matrix in weights list form using the 10 nearest neighbors 
lstw  <- nb2listw(knn2nb(knearneigh(dat, k = 10)))

# Compute Moran's I using residuals of model
moran.test(residuals.glm(m1), lstw) 
## 
##  Moran I test under randomisation
## 
## data:  residuals.glm(m1)  
## weights: lstw  
## 
## Moran I statistic standard deviate = 10.962, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##      8.404562e-02     -3.202049e-04      5.923356e-05

Results of the test confirm that SAC is present - p-value < 0.05 therefore we can reject null hypothesis that there is no spatial autocorrelation in the residuals of the model.




Performing an autocovariate regression to account for SAC


To account for SAC we will now construct a spatial autocovariate which will be included as an additional predictor variable in the original model.

For a particular square, the autocovariate is calculated as the distance weighted average of neighbouring response values (bat passes) (Dormann et al. 2007). This means that the bat pass counts surrounding each square are averaged and cells that are further away receive a lower weighting in the calculation of the average. A

t a certain distance two cells are no longer considered neighbours. This distance is determined via a pre-specified neighbourhood distance (Augustin, Mugglestone & Buckland 1996). In our case, this will be set by determining the minimum distance at which no cells had zero neighbours (Brierley et al. 2016).

library(spdep)

# define cell coordinates 
coords <- as.matrix(cbind(dat$x, dat$y) )

# construct autcovariate - increase neigbourhood dist (nbs) by increments of 0.1 till no cells have zero neighbours
ac <- autocov_dist(as.numeric(dat$passes), coords, nbs = 0.1, longlat = TRUE)
## Warning in autocov_dist(as.numeric(dat$passes), coords, nbs = 0.1, longlat
## = TRUE): With value 0.1 some points have no neighbours
## Error in nb2listw(nb, glist = gl, style = style, zero.policy = zero.policy): Empty neighbour sets found
# try nbs = 0.2
ac <- autocov_dist(as.numeric(dat$passes), coords, nbs = 0.2, longlat = TRUE)

Trial and error eventually yields 0.2 (200m) as the distance at which no cells have zero neighbours.

To visualize what the autocovariate looks in space we can convert to a raster

# combine with cell coordinates
ac <- data.frame(ac = ac, x = dat$x, y = dat$y)

# convert to spatial points df
coordinates(ac) <- ~ x + y

# rasterize and specify ac as classifying field
ac_ras <- rasterize(ac, agr, field = 'ac')

Autocovariate raster:

Comparing alongside the original raster of bat passes shows how areas where more bats were recorded have greater autocovariate values. Including the autocovariate as an additional predictor therefore attempts to account for this spatial bias.

Original bat pass raster:

Now lets rerun the original model with the autocovariate included.

# specify new formula
f2 <- passes ~ agr_cover + bldngs_cover + wood_cover + water_cover + ac 

# run new model
m2 <- glm(f2, data = dat, family = 'poisson')
summary(m2)
## 
## Call:
## glm(formula = f2, family = "poisson", data = dat)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.4838  -0.9433  -0.8163   0.5632   4.6366  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.968574   0.052233 -18.543  < 2e-16 ***
## agr_cover     0.047515   0.100681   0.472    0.637    
## bldngs_cover -3.384745   0.552074  -6.131 8.74e-10 ***
## wood_cover    0.937246   0.153715   6.097 1.08e-09 ***
## water_cover   0.578807   0.531170   1.090    0.276    
## ac            0.013860   0.001108  12.503  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 3786.2  on 3123  degrees of freedom
## Residual deviance: 3432.4  on 3118  degrees of freedom
## AIC: 5721.8
## 
## Number of Fisher Scoring iterations: 6

Interestingly the updated model yields slightly different results to the first model - water no longer has a significant effect on bat activity. However, comparing the AIC of the two models indicates that including the autocovariate leads to a noticeable improvement in model fit in terms of AIC.

AIC(m1,m2)
##    df      AIC
## m1  5 5858.508
## m2  6 5721.760

Now lets examine the residuals of the spatial model

#extract residuals of spatial model
res2 <- m2$residuals

# add coords
res2 <- data.frame(Residuals = res2, x = dat$x, y = dat$y)

#plot2
plot2 <- ggplot(res2, aes(x = x, y = y)) + geom_point(aes(colour = Residuals, size = Residuals)) + 
  geom_point(shape = 1, aes(size = Residuals, colour = sign) ,colour = "black") + 
  scale_colour_gradient2(low = "#E8F3EB", mid = "#FF1C55",
                         high = "#560000", midpoint = 8, space = "Lab",
                         na.value = "grey50", guide = "colourbar")
plot(plot2)

Compare with residual plot of model without autocovariate:

plot(plot)

Comparison of the two plots reveals only subtle differences. It almost suggests that inclusion of the autocovariate hasn’t done much good. However, testing the residuals of the updated model using Moran’s I reveals that SAC is no longer present after inclusion of an autocovariate.

# Compute Moran's I using residuals of updated model and weight list
moran.test(residuals.glm(m2), lstw) 
## 
##  Moran I test under randomisation
## 
## data:  residuals.glm(m2)  
## weights: lstw  
## 
## Moran I statistic standard deviate = 1.4256, p-value = 0.07699
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##      1.065180e-02     -3.202049e-04      5.923623e-05

P-value is > 0.05 so we accept the null hypothesis that there is no SAC present in the residuals of the model. This result alongside the improved fit in terms of AIC of the autocovariate model indicates that a spatial modelling approch is the best technique to use.

There you have it - a basic intro to spatial regression involving an autocovariate.



References

Augustin, A.N.H., Mugglestone, M.A. & Buckland, S.T. (1996) An Autologistic Model for the Spatial Distribution of Wildlife. Journal of Applied Ecology, 33, 339–347.

Brierley, L., Vonhof, M.J., Olival, K.J., Daszak, P. & Jones, K.E. (2016) Quantifying Global Drivers of Zoonotic Bat Viruses: A Process-Based Perspective. The American Naturalist, 187, E53–E64.

Dormann, C., McPherson, J., B. Araújo, M., Bivand, R., Bolliger, J., Carl, G., G. Davies, R., Hirzel, A., Jetz, W., Daniel Kissling, W., Kühn, I., Ohlemüller, R., R. Peres-Neto, P., Reineking, B., Schröder, B., M. Schurr, F. & Wilson, R. (2007) Methods to account for spatial autocorrelation in the analysis of species distributional data: a review. Ecography, 30, 609–628.