Spatial Statistics - Script & Exercises

In this assignment, we will work with the dataset “gorillas”, which contains nesting locations of gorillas in a National Park in Cameroon. You can find a description of this dataset here: https://rdrr.io/cran/spatstat.data/man/gorillas.html

We will explore the spatial distribution of the nesting sites and which environmental factors might determine it!

1. PREPARATIONS

First, please install the following packages, and manually install the package “spatstat.core” that you can find in the folder with today’s material:

# install.packages("spatstat", "raster", "rgdal", "sp", "leaflet", "viridis", "ggplot2", "spdep", "spatialreg")
library(spatstat)    ## for spatial statistics
library(raster)      ## to work with raster files
library(rgdal)       ## to work with shapefiles
library(sp)          ## to work with shapefiles
library(leaflet)     ## to plot interactive maps
library(viridis)     ## for some nice color scales
library(ggplot2)     ## for nice diagrams
library(spdep)       ## to calculate spatial weights
library(spatialreg)  ## for spatial regression

Define your working directory:

dir <- "C:/Users/Max/Desktop/SpatialStats_Class"

Let’s load the gorillas dataset into our environment:

data(gorillas)

2. DATA INSPECTION

“gorillas” is a “ppp” (planar point pattern) object that contains 647 observations of nesting sites. For each site, we have information about the group, season and date when the site was recorded.

gorillas

## Marked planar point pattern: 647 points
## Mark variables: group, season, date 
## window: polygonal boundary
## enclosing rectangle: [580457.9, 585934] x [674172.8, 678739.2] metres

summary(gorillas)

## Marked planar point pattern:  647 points
## Average intensity 3.255566e-05 points per square metre
## 
## *Pattern contains duplicated points*
## 
## Coordinates are given to 2 decimal places
## i.e. rounded to the nearest multiple of 0.01 metres
## 
## Mark variables: group, season, date
## Summary:
##     group              season               date           
##  Length:647         Length:647         Min.   :2006-01-06  
##  Class :character   Class :character   1st Qu.:2007-03-15  
##  Mode  :character   Mode  :character   Median :2008-02-05  
##                                        Mean   :2007-12-14  
##                                        3rd Qu.:2008-09-23  
##                                        Max.   :2009-05-31  
## 
## Window: polygonal boundary
## single connected closed polygon with 21 vertices
## enclosing rectangle: [580457.9, 585934] x [674172.8, 678739.2] metres
##                      (5476 x 4566 metres)
## Window area = 19873700 square metres
## Unit of length: 1 metre
## Fraction of frame area: 0.795

class(gorillas)

## [1] "ppp"

Plot the dataset - You can see one map for each of the three nest variables:

plot(gorillas)

Let’s inspect the object “gorillas” with more detail. It contains six elements:

length(gorillas)

## [1] 6

names(gorillas)

## [1] "window"     "n"          "x"          "y"          "markformat"
## [6] "marks"

The first element is an object of type “owin”, which is essentially a polygon that depicts the area in which the nests were sampled:

gorillas[[1]]

## window: polygonal boundary
## enclosing rectangle: [580457.9, 585934] x [674172.8, 678739.2] metres

class(gorillas[[1]])

## [1] "owin"

plot(gorillas[[1]], main = "Study Area")

The third and fourth elements are vectors of the x and y coordinates:

head(gorillas$x)

## [1] 582518.4 581823.0 582131.0 582111.9 582585.1 582302.3

head(gorillas$y)

## [1] 676886.2 677422.7 676937.9 677420.0 677509.7 677521.6

The sixth element is a data frame that contains the group, season and date for each observation:

head(gorillas$marks)

group	season	date
major	dry	2006-01-06
major	dry	2006-01-10
major	dry	2006-01-15
major	dry	2006-01-24
minor	dry	2006-01-27
major	dry	2006-01-28

Let’s also inspect the object “gorillas.extra”. This is a collection of seven environmental raster datasets in the form of “im” (pixel image) objects:

gorillas.extra

## List of pixel images
## 
## aspect:
## factor-valued pixel image
## factor levels:
## [1] "N"  "NE" "E"  "SE" "S"  "SW" "W"  "NW"
## 149 x 181 pixel array (ny, nx)
## enclosing rectangle: [580440, 586000] x [674160, 678730] metres
## 
## elevation:
## integer-valued pixel image
## 149 x 181 pixel array (ny, nx)
## enclosing rectangle: [580440, 586000] x [674160, 678730] metres
## 
## heat:
## factor-valued pixel image
## factor levels:
## [1] "Warmest"  "Moderate" "Coolest" 
## 149 x 181 pixel array (ny, nx)
## enclosing rectangle: [580440, 586000] x [674160, 678730] metres
## 
## slopeangle:
## real-valued pixel image
## 149 x 181 pixel array (ny, nx)
## enclosing rectangle: [580440, 586000] x [674160, 678730] metres
## 
## slopetype:
## factor-valued pixel image
## factor levels:
## [1] "Valley"   "Toe"      "Flat"     "Midslope" "Upper"    "Ridge"   
## 149 x 181 pixel array (ny, nx)
## enclosing rectangle: [580440, 586000] x [674160, 678730] metres
## 
## vegetation:
## factor-valued pixel image
## factor levels:
## [1] "Disturbed"  "Colonising" "Grassland"  "Primary"    "Secondary" 
## [6] "Transition"
## 149 x 181 pixel array (ny, nx)
## enclosing rectangle: [580440, 586000] x [674160, 678730] metres
## 
## waterdist:
## real-valued pixel image
## 149 x 181 pixel array (ny, nx)
## enclosing rectangle: [580440, 586000] x [674160, 678730] metres

names(gorillas.extra)

## [1] "aspect"     "elevation"  "heat"       "slopeangle" "slopetype" 
## [6] "vegetation" "waterdist"

plot(gorillas.extra)

class(gorillas.extra$aspect)

## [1] "im"

3. CONVERT & MAP THE DATA

We have already produced some maps, but the options are rather limited as long as our data is in the format of “ppp”, “owin” and “im” as these are quite specific formats in the “spatstat” package. For more advanced mapping, and for our further analyses, we have to transform some of this data into raster and shapefiles.

Let’s convert the outline of the study region to a polygon (“SpatialPolygons”) by accessing the coordinates of the vertices in the “owin” object:

coords_study_area <- data.frame(x_coord = gorillas[[1]]$bdry[[1]]$x, 
                                y_coord = gorillas[[1]]$bdry[[1]]$y)

p = Polygon(coords_study_area)
ps = Polygons(list(p),1)
study_area_UTM = SpatialPolygons(list(ps), proj4string = CRS("+proj=utm +zone=32 +datum=WGS84"))

class(study_area_UTM)

## [1] "SpatialPolygons"
## attr(,"package")
## [1] "sp"

plot(study_area_UTM, main="Study Area")

Let’s convert the point observations to a polygon (“SpatialPointsDataFrame”) by accessing their coordinates in the “ppp” object:

coords_nests <- data.frame(x_coord = gorillas$x, y_coord = gorillas$y)
coords_nests <- cbind(coords_nests, gorillas$marks)

nests_UTM <- SpatialPointsDataFrame(coords = coords_nests[,c(1:2)], data = coords_nests,
                                proj4string = CRS("+proj=utm +zone=32 +datum=WGS84"))
nests_UTM

## class       : SpatialPointsDataFrame 
## features    : 647 
## extent      : 580797.3, 584945.3, 675238.7, 678313.5  (xmin, xmax, ymin, ymax)
## crs         : +proj=utm +zone=32 +datum=WGS84 +units=m +no_defs 
## variables   : 5
## names       :   x_coord,   y_coord, group, season,  date 
## min values  :  580797.3, 675238.68, major,    dry, 13154 
## max values  : 584945.34, 678313.46, minor,  rainy, 14395

class(nests_UTM)

## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"

head(nests_UTM@data)

x_coord	y_coord	group	season	date
582518.4	676886.2	major	dry	2006-01-06
581823.0	677422.7	major	dry	2006-01-10
582131.0	676937.9	major	dry	2006-01-15
582111.9	677420.0	major	dry	2006-01-24
582585.1	677509.7	minor	dry	2006-01-27
582302.3	677521.6	major	dry	2006-01-28

We can convert any element of “gorillas.extra” to a raster object in the following way. The raster object “aspect_ras” does not yet contain a coordinate reference system (crs):

aspect_ras <- raster(gorillas.extra$aspect)
class(aspect_ras)

## [1] "RasterLayer"
## attr(,"package")
## [1] "raster"

aspect_ras

## class      : RasterLayer 
## dimensions : 149, 181, 26969  (nrow, ncol, ncell)
## resolution : 30.70955, 30.70955  (x, y)
## extent     : 580440.4, 585998.8, 674156.5, 678732.2  (xmin, xmax, ymin, ymax)
## crs        : NA 
## source     : memory
## names      : layer 
## values     : 1, 8  (min, max)
## attributes :
##        ID VALUE
##  from:  1     N
##   to :  8    NW

my_palette <- c("red", "orange", "yellow", "green", "cyan", "#00bbff", "#0000ff", "#ff00ff")
plot(aspect_ras, "VALUE", main="Aspect", col=my_palette)

Let’s import the shapefile with the outline of Cameroon. This shapefile does not contain projected but geographic lon/lat coordinates. We use this CRS to project the aspect raster, and plot the raster together with the nest coordinates and the study area border in an interactive leaflet map (UTM coordinates would not work with leaflet):

cameroon <- readOGR(paste0(dir, "/shapefile/gadm41_CMR_0.shp"), verbose=F)
plot(cameroon, main="Cameroon")

crs(aspect_ras) <- CRS("+proj=utm +zone=32 +datum=WGS84") 

## re-project the shapefiles to geographic coordinates, otherwise they are not displayed in leaflet
study_area_LONLAT <- spTransform(study_area_UTM, crs(cameroon))
nests_LONLAT <- spTransform(nests_UTM, crs(cameroon))

leaflet() %>%
  #setView(9.75,6.12, zoom = 12) %>%
  addTiles() %>%
  addRasterImage(aspect_ras, colors = my_palette, opacity = 0.8) %>%
  addCircleMarkers(data = nests_LONLAT, radius=1, color="black", opacity=1) %>%
  addPolygons(data = study_area_LONLAT, fillColor = "red", weight = 3, color = "black", fillOpacity = 0) %>%
  addLegend(colors = my_palette, values = values(aspect_ras), title = "Aspect", 
            labels=c("North", "Northeast", "East", "Southeast", "South", "Southwest", "West", "Northwest")) %>%
  addScaleBar()

Plot the elevation dataset, which is integer-valued and for which a continuous color palette can be used:

elevation_ras <- raster(gorillas.extra$elevation)
crs(elevation_ras) <- CRS("+proj=utm +zone=32 +datum=WGS84")
plot(elevation_ras, col=terrain.colors(100))

leaflet() %>%
  addProviderTiles('Esri.WorldImagery') %>%  
  addRasterImage(elevation_ras, colors = viridis(100), opacity = 0.8) %>%
  addCircleMarkers(data = nests_LONLAT, radius=1, color="black", opacity=1, label = ~paste(group, season, date, sep=", ")) %>%
  addScaleBar()

4. POINT PATTERN ANALYSIS

Let’s go back to the object “gorillas”; we will need the raster and shapefile datasets again later. We now focus on empirically assessing whether the point pattern in the nest locations is random or clustered.

Do you remember the quadrat counting exercise from the course? We can do the same in R:

quadrats = quadratcount(gorillas, nx = 10, ny = 10)

gorillas_unmarked <- unmark(gorillas)   ## delete the marks information, otherwise the next plot does not work

plot(intensity(quadrats, image=TRUE), main="Gorilla nests - quadrats" , las=1)  # Plot density raster
plot(gorillas_unmarked, pch=20, cols="grey70", add=T, size=0.5)
plot(quadrats, add=T)

We can perform a chi-squared test of complete spatial randomness for the points and the quadrats defined above. The resulting p-value indicates that our point pattern is not random.

quadrat.test(gorillas, nx = 10, ny = 10)

## Warning: Some expected counts are small; chi^2 approximation may be inaccurate

## 
##  Chi-squared test of CSR using quadrat counts
## 
## data:  gorillas
## X2 = 2071.6, df = 93, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 
## Quadrats: 94 tiles (irregular windows)

Compute an isotropic kernel density estimate of the point pattern:

plot(density(gorillas), main = "Nest density")
plot(gorillas, pch = 1, cex = 0.5, add = TRUE)

## Plotting the first column of marks

contour(density(gorillas), add = TRUE)

The K-Ripley function also indicates that the nests are spatially more clustered than would be expected:

K_function <- envelope(gorillas,fun=Kest,
                   funargs=list(correction="border"),global=TRUE)

## Generating 99 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
## 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
## 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98,  99.
## 
## Done.

plot(K_function)

5. RANDOM SAMPLING POINTS

In chapter 6, we want to model nest density with the seven environmental covariates contained in “gorillas.extra”. To have an unbiased spatial sample, we have to sample nest density and the covariates at 100 random locations, and summarize all information at these locations in a table.

Let us use the shapefile “study_area_UTM” to define these 100 random points first:

set.seed(1234)
random_points_UTM <- spsample(study_area_UTM, n=100, "random")

## Warning in proj4string(obj): CRS object has comment, which is lost in output

plot(random_points_UTM, main="Random points (black) and nests (red)")
plot(nests_UTM, add=T, col="red")
plot(study_area_UTM, add=T)

Rasterize the remaining layers of “gorillas.extra”, and the kernel density map:

heat_ras       <- raster(gorillas.extra$heat)
slopeangle_ras <- raster(gorillas.extra$slopeangle)
slopetype_ras  <- raster(gorillas.extra$slopetype)
vegetation_ras <- raster(gorillas.extra$vegetation)
waterdist_ras  <- raster(gorillas.extra$waterdist)

density_ras    <- raster(density(gorillas))

Join all raster layers in one multi-layered raster-stack and assign CRS. You can very easily plot all layers contained in the stack:

stack1 <- stack(aspect_ras, elevation_ras, heat_ras, slopeangle_ras, slopetype_ras, vegetation_ras, waterdist_ras)
names(stack1) <- c("aspect", "elevation", "heat", "slopeangle", "slopetype", "vegetation", "waterdist")
crs(stack1) <- CRS("+proj=utm +zone=32 +datum=WGS84")
plot(stack1)

Extract raster values for the point locations. Also extract the respective values from the density raster, join the results together, and add coordinates:

nestdata         <- as.data.frame(extract(stack1, random_points_UTM))
nestdata$density <- extract(density_ras, random_points_UTM)
nestdata         <- cbind(nestdata, random_points_UTM@coords)

head(nestdata)

aspect	elevation	heat	slopeangle	slopetype	vegetation	waterdist	density	x	y
8	1849	2	10.560980	1	1	122.83730	3.00e-06	584608.5	675415.8
6	1247	1	8.247169	1	1	68.66813	8.00e-07	582765.3	674575.9
7	1602	2	16.906500	1	1	43.42954	1.91e-05	584074.9	678035.5
1	1789	1	20.547510	6	4	61.41864	5.85e-05	584059.9	677174.1
8	1634	1	26.627790	1	1	122.83730	3.98e-05	583230.8	678131.9
6	1808	2	42.498470	1	4	61.41864	4.69e-05	582479.6	676053.7

6. SPATIAL REGRESSION MODEL

We start off with an ordinary OLS linear regression model. Let’s use both elevation and vegetation as predictors and plot the residuals in a map. You can see that there is spatial pattern in the residuals:

model1_lm <- lm(density ~ elevation + vegetation, data = nestdata)

summary(model1_lm)

## 
## Call:
## lm(formula = density ~ elevation + vegetation, data = nestdata)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -8.010e-05 -1.370e-05 -6.780e-06  2.219e-05  5.761e-05 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.105e-06  2.646e-05  -0.042    0.967    
## elevation   -1.457e-09  1.661e-08  -0.088    0.930    
## vegetation   1.732e-05  2.070e-06   8.364  4.5e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.839e-05 on 97 degrees of freedom
## Multiple R-squared:  0.4538, Adjusted R-squared:  0.4425 
## F-statistic: 40.29 on 2 and 97 DF,  p-value: 1.828e-13

nestdata$residuals <- residuals(model1_lm)
nestdata$fitted    <- fitted(model1_lm)

ggplot(nestdata, aes(x = x, y = y, col = residuals, size = residuals)) +
  geom_point() +
  scale_color_gradient2()

To assess the degree of spatial autocorrelation in density, let’s calculate Moran’s I.

To do so, we first have to identify which points are neighboring. We can do that with the function “dnearneigh”. The resulting object “nests_nb” contains one list element for each random point, and the IDs of all respective neighboring points. You can change the argument “d2” in “dnearneigh” to in- or decrease the search radius:

nests_nb <- spdep::dnearneigh(as.matrix(nestdata[c("x", "y")]), d1 = 0, d2 = 500)
nests_nb

## Neighbour list object:
## Number of regions: 100 
## Number of nonzero links: 330 
## Percentage nonzero weights: 3.3 
## Average number of links: 3.3 
## 1 region with no links:
## 26

Let’s use “nests_nb” to assign weights

nests_weights <- nb2listw(nests_nb, style = "W", zero.policy = TRUE)
class(nests_weights)

## [1] "listw" "nb"

# head(nests_weights)
str(nests_weights$weights, list.len=5, give.attr = F)

## List of 100
##  $ : num [1:5] 0.2 0.2 0.2 0.2 0.2
##  $ : num [1:4] 0.25 0.25 0.25 0.25
##  $ : num [1:5] 0.2 0.2 0.2 0.2 0.2
##  $ : num [1:2] 0.5 0.5
##  $ : num [1:4] 0.25 0.25 0.25 0.25
##   [list output truncated]

Perform the Moran’s I test. The p-value is below 0.05, which indicates spatial autocorrelation:

lm.morantest(model1_lm, nests_weights, zero.policy = T)

## 
##  Global Moran I for regression residuals
## 
## data:  
## model: lm(formula = density ~ elevation + vegetation, data = nestdata)
## weights: nests_weights
## 
## Moran I statistic standard deviate = 6.5613, p-value = 2.667e-11
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I      Expectation         Variance 
##      0.532063900     -0.024404619      0.007192828

Now, the question is whether we should use a spatial lag or a spatial error model. We need to run a statistical test, the Lagrange Multiplier test, to find that out. The p-value for the spatial lag term is smaller than for the spatial error term, which indicates that the spatial lag model is more appropriate:

lm.LMtests(model1_lm, nests_weights, test="LMerr", zero.policy = T)

## 
##  Lagrange multiplier diagnostics for spatial dependence
## 
## data:  
## model: lm(formula = density ~ elevation + vegetation, data = nestdata)
## weights: nests_weights
## 
## LMErr = 38.075, df = 1, p-value = 6.807e-10

lm.LMtests(model1_lm, nests_weights, test="LMlag", zero.policy = T)

## 
##  Lagrange multiplier diagnostics for spatial dependence
## 
## data:  
## model: lm(formula = density ~ elevation + vegetation, data = nestdata)
## weights: nests_weights
## 
## LMlag = 78.895, df = 1, p-value < 2.2e-16

Anyway, let’s fit both models and compare them against each other and against the first model:

model1_slm <- lagsarlm(density ~ elevation + vegetation, data = nestdata, listw = nests_weights, zero.policy = TRUE)

## Warning in lagsarlm(density ~ elevation + vegetation, data = nestdata, listw = nests_weights, : inversion of asymptotic covariance matrix failed for tol.solve = 2.22044604925031e-16 
##   reziproke Konditionszahl = 1.16716e-19 - using numerical Hessian.

model1_slm

## 
## Call:
## lagsarlm(formula = density ~ elevation + vegetation, data = nestdata, 
##     listw = nests_weights, zero.policy = TRUE)
## Type: lag 
## 
## Coefficients:
##           rho   (Intercept)     elevation    vegetation 
##  9.120527e-01 -1.063415e-05  5.854761e-09  1.596096e-06 
## 
## Log likelihood: 1023.556

model1_sem <- errorsarlm(density ~ elevation + vegetation, data = nestdata, listw = nests_weights, zero.policy = TRUE)

## Warning in errorsarlm(density ~ elevation + vegetation, data = nestdata, : inversion of asymptotic covariance matrix failed for tol.solve = 2.22044604925031e-16 
##   reziproke Konditionszahl = 2.63198e-19 - using numerical Hessian.

model1_sem

## 
## Call:
## errorsarlm(formula = density ~ elevation + vegetation, data = nestdata, 
##     listw = nests_weights, zero.policy = TRUE)
## Type: error 
## 
## Coefficients:
##        lambda   (Intercept)     elevation    vegetation 
##  9.584548e-01 -7.001184e-05  4.342284e-08  2.744002e-07 
## 
## Log likelihood: 1023.057

AIC(model1_lm, model1_slm, model1_sem)

	df	AIC
model1_lm	4	-1805.135
model1_slm	5	-2037.111
model1_sem	5	-2036.113

Let’s systematically explore how a change in the distance parameter “d2” affects Moran’s I in the three different models:

# create empty vectors for holding the Moran's I statistics
moran_I_lm <- c()
moran_I_slm <- c()
moran_I_sem <- c()

# loop through a distance vector d ranging from 50 to 2000 m
for (d in seq(from = 50, to = 2000, by = 50)) 
{ nests_nb_temp <- dnearneigh(as.matrix(nestdata[,c("x", "y")]), d1 = 0, d2 = d)
  nests_weights_temp <- nb2listw(nests_nb_temp, style = "W", zero.policy = TRUE)
  
  moran_lm <- moran.mc(model1_lm$residuals, nests_weights_temp, nsim = 999, zero.policy = TRUE)
  moran_I_lm <- c(moran_I_lm, moran_lm$statistic)
  
  moran_slm <- moran.mc(model1_slm$residuals, nests_weights_temp, nsim = 999, zero.policy = TRUE)
  moran_I_slm <- c(moran_I_slm, moran_slm$statistic)
  
  moran_sem <- moran.mc(model1_sem$residuals, nests_weights_temp, nsim = 999, zero.policy = TRUE)
  moran_I_sem <- c(moran_I_sem, moran_sem$statistic)
}

moran_I_lm  <- data.frame(moran_I = moran_I_lm, 
                          distance = seq(50, 2000, 50), 
                          model="Simple linear model")

moran_I_slm <- data.frame(moran_I = moran_I_slm, 
                          distance = seq(50, 2000, 50), 
                          model="Spatial lag model")

moran_I_sem <- data.frame(moran_I = moran_I_sem, 
                          distance = seq(50, 2000, 50), 
                          model="Spatial error model")

moran.df <- rbind(moran_I_lm, moran_I_slm, moran_I_sem)

ggplot(moran.df, aes(x = distance, y = moran_I, col=model)) + 
  geom_point() +
  geom_line()

EXERCISES

A. Explore whether the 100 randomly sampled points are really randomly distributed in space.

#### Hint: We first have to convert the random points to a "ppp" object:
random_points_ppp <- ppp(x=random_points_UTM@coords[,1], y=random_points_UTM@coords[,2], window=gorillas$window)

B. Integrate additional, or different environmental covariates in the model (e.g. heat, distance to water, etc.), and compare the models against each other. Which models perform better?

C. Create separate models to predict the density of different groups of nests (major vs. minor) or of different seasons (rainy vs. dry). Which models perform better?

#### Hint: You can make subsets of "ppp" objects in the following way:
gorillas_major <- subset.ppp(gorillas, group=="major")