In this assignment, we will work with the dataset “gorillas”, which contains nesting locations of gorillas in a National Park in Cameroon. You can find a description of this dataset here: https://rdrr.io/cran/spatstat.data/man/gorillas.html
We will explore the spatial distribution of the nesting sites and which environmental factors might determine it!
First, please install the following packages, and manually install the package “spatstat.core” that you can find in the folder with today’s material:
# install.packages("spatstat", "raster", "rgdal", "sp", "leaflet", "viridis", "ggplot2", "spdep", "spatialreg")
library(spatstat) ## for spatial statistics
library(raster) ## to work with raster files
library(rgdal) ## to work with shapefiles
library(sp) ## to work with shapefiles
library(leaflet) ## to plot interactive maps
library(viridis) ## for some nice color scales
library(ggplot2) ## for nice diagrams
library(spdep) ## to calculate spatial weights
library(spatialreg) ## for spatial regression
Define your working directory:
dir <- "C:/Users/Max/Desktop/SpatialStats_Class"
Let’s load the gorillas dataset into our environment:
data(gorillas)
“gorillas” is a “ppp” (planar point pattern) object that contains 647 observations of nesting sites. For each site, we have information about the group, season and date when the site was recorded.
gorillas
## Marked planar point pattern: 647 points
## Mark variables: group, season, date
## window: polygonal boundary
## enclosing rectangle: [580457.9, 585934] x [674172.8, 678739.2] metres
summary(gorillas)
## Marked planar point pattern: 647 points
## Average intensity 3.255566e-05 points per square metre
##
## *Pattern contains duplicated points*
##
## Coordinates are given to 2 decimal places
## i.e. rounded to the nearest multiple of 0.01 metres
##
## Mark variables: group, season, date
## Summary:
## group season date
## Length:647 Length:647 Min. :2006-01-06
## Class :character Class :character 1st Qu.:2007-03-15
## Mode :character Mode :character Median :2008-02-05
## Mean :2007-12-14
## 3rd Qu.:2008-09-23
## Max. :2009-05-31
##
## Window: polygonal boundary
## single connected closed polygon with 21 vertices
## enclosing rectangle: [580457.9, 585934] x [674172.8, 678739.2] metres
## (5476 x 4566 metres)
## Window area = 19873700 square metres
## Unit of length: 1 metre
## Fraction of frame area: 0.795
class(gorillas)
## [1] "ppp"
Plot the dataset - You can see one map for each of the three nest variables:
plot(gorillas)
Let’s inspect the object “gorillas” with more detail. It contains six elements:
length(gorillas)
## [1] 6
names(gorillas)
## [1] "window" "n" "x" "y" "markformat"
## [6] "marks"
The first element is an object of type “owin”, which is essentially a polygon that depicts the area in which the nests were sampled:
gorillas[[1]]
## window: polygonal boundary
## enclosing rectangle: [580457.9, 585934] x [674172.8, 678739.2] metres
class(gorillas[[1]])
## [1] "owin"
plot(gorillas[[1]], main = "Study Area")
The third and fourth elements are vectors of the x and y coordinates:
head(gorillas$x)
## [1] 582518.4 581823.0 582131.0 582111.9 582585.1 582302.3
head(gorillas$y)
## [1] 676886.2 677422.7 676937.9 677420.0 677509.7 677521.6
The sixth element is a data frame that contains the group, season and date for each observation:
head(gorillas$marks)
| group | season | date |
|---|---|---|
| major | dry | 2006-01-06 |
| major | dry | 2006-01-10 |
| major | dry | 2006-01-15 |
| major | dry | 2006-01-24 |
| minor | dry | 2006-01-27 |
| major | dry | 2006-01-28 |
Let’s also inspect the object “gorillas.extra”. This is a collection of seven environmental raster datasets in the form of “im” (pixel image) objects:
gorillas.extra
## List of pixel images
##
## aspect:
## factor-valued pixel image
## factor levels:
## [1] "N" "NE" "E" "SE" "S" "SW" "W" "NW"
## 149 x 181 pixel array (ny, nx)
## enclosing rectangle: [580440, 586000] x [674160, 678730] metres
##
## elevation:
## integer-valued pixel image
## 149 x 181 pixel array (ny, nx)
## enclosing rectangle: [580440, 586000] x [674160, 678730] metres
##
## heat:
## factor-valued pixel image
## factor levels:
## [1] "Warmest" "Moderate" "Coolest"
## 149 x 181 pixel array (ny, nx)
## enclosing rectangle: [580440, 586000] x [674160, 678730] metres
##
## slopeangle:
## real-valued pixel image
## 149 x 181 pixel array (ny, nx)
## enclosing rectangle: [580440, 586000] x [674160, 678730] metres
##
## slopetype:
## factor-valued pixel image
## factor levels:
## [1] "Valley" "Toe" "Flat" "Midslope" "Upper" "Ridge"
## 149 x 181 pixel array (ny, nx)
## enclosing rectangle: [580440, 586000] x [674160, 678730] metres
##
## vegetation:
## factor-valued pixel image
## factor levels:
## [1] "Disturbed" "Colonising" "Grassland" "Primary" "Secondary"
## [6] "Transition"
## 149 x 181 pixel array (ny, nx)
## enclosing rectangle: [580440, 586000] x [674160, 678730] metres
##
## waterdist:
## real-valued pixel image
## 149 x 181 pixel array (ny, nx)
## enclosing rectangle: [580440, 586000] x [674160, 678730] metres
names(gorillas.extra)
## [1] "aspect" "elevation" "heat" "slopeangle" "slopetype"
## [6] "vegetation" "waterdist"
plot(gorillas.extra)
class(gorillas.extra$aspect)
## [1] "im"
We have already produced some maps, but the options are rather limited as long as our data is in the format of “ppp”, “owin” and “im” as these are quite specific formats in the “spatstat” package. For more advanced mapping, and for our further analyses, we have to transform some of this data into raster and shapefiles.
Let’s convert the outline of the study region to a polygon (“SpatialPolygons”) by accessing the coordinates of the vertices in the “owin” object:
coords_study_area <- data.frame(x_coord = gorillas[[1]]$bdry[[1]]$x,
y_coord = gorillas[[1]]$bdry[[1]]$y)
p = Polygon(coords_study_area)
ps = Polygons(list(p),1)
study_area_UTM = SpatialPolygons(list(ps), proj4string = CRS("+proj=utm +zone=32 +datum=WGS84"))
class(study_area_UTM)
## [1] "SpatialPolygons"
## attr(,"package")
## [1] "sp"
plot(study_area_UTM, main="Study Area")
Let’s convert the point observations to a polygon (“SpatialPointsDataFrame”) by accessing their coordinates in the “ppp” object:
coords_nests <- data.frame(x_coord = gorillas$x, y_coord = gorillas$y)
coords_nests <- cbind(coords_nests, gorillas$marks)
nests_UTM <- SpatialPointsDataFrame(coords = coords_nests[,c(1:2)], data = coords_nests,
proj4string = CRS("+proj=utm +zone=32 +datum=WGS84"))
nests_UTM
## class : SpatialPointsDataFrame
## features : 647
## extent : 580797.3, 584945.3, 675238.7, 678313.5 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=32 +datum=WGS84 +units=m +no_defs
## variables : 5
## names : x_coord, y_coord, group, season, date
## min values : 580797.3, 675238.68, major, dry, 13154
## max values : 584945.34, 678313.46, minor, rainy, 14395
class(nests_UTM)
## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"
head(nests_UTM@data)
| x_coord | y_coord | group | season | date |
|---|---|---|---|---|
| 582518.4 | 676886.2 | major | dry | 2006-01-06 |
| 581823.0 | 677422.7 | major | dry | 2006-01-10 |
| 582131.0 | 676937.9 | major | dry | 2006-01-15 |
| 582111.9 | 677420.0 | major | dry | 2006-01-24 |
| 582585.1 | 677509.7 | minor | dry | 2006-01-27 |
| 582302.3 | 677521.6 | major | dry | 2006-01-28 |
We can convert any element of “gorillas.extra” to a raster object in the following way. The raster object “aspect_ras” does not yet contain a coordinate reference system (crs):
aspect_ras <- raster(gorillas.extra$aspect)
class(aspect_ras)
## [1] "RasterLayer"
## attr(,"package")
## [1] "raster"
aspect_ras
## class : RasterLayer
## dimensions : 149, 181, 26969 (nrow, ncol, ncell)
## resolution : 30.70955, 30.70955 (x, y)
## extent : 580440.4, 585998.8, 674156.5, 678732.2 (xmin, xmax, ymin, ymax)
## crs : NA
## source : memory
## names : layer
## values : 1, 8 (min, max)
## attributes :
## ID VALUE
## from: 1 N
## to : 8 NW
my_palette <- c("red", "orange", "yellow", "green", "cyan", "#00bbff", "#0000ff", "#ff00ff")
plot(aspect_ras, "VALUE", main="Aspect", col=my_palette)
Let’s import the shapefile with the outline of Cameroon. This shapefile does not contain projected but geographic lon/lat coordinates. We use this CRS to project the aspect raster, and plot the raster together with the nest coordinates and the study area border in an interactive leaflet map (UTM coordinates would not work with leaflet):
cameroon <- readOGR(paste0(dir, "/shapefile/gadm41_CMR_0.shp"), verbose=F)
plot(cameroon, main="Cameroon")
crs(aspect_ras) <- CRS("+proj=utm +zone=32 +datum=WGS84")
## re-project the shapefiles to geographic coordinates, otherwise they are not displayed in leaflet
study_area_LONLAT <- spTransform(study_area_UTM, crs(cameroon))
nests_LONLAT <- spTransform(nests_UTM, crs(cameroon))
leaflet() %>%
#setView(9.75,6.12, zoom = 12) %>%
addTiles() %>%
addRasterImage(aspect_ras, colors = my_palette, opacity = 0.8) %>%
addCircleMarkers(data = nests_LONLAT, radius=1, color="black", opacity=1) %>%
addPolygons(data = study_area_LONLAT, fillColor = "red", weight = 3, color = "black", fillOpacity = 0) %>%
addLegend(colors = my_palette, values = values(aspect_ras), title = "Aspect",
labels=c("North", "Northeast", "East", "Southeast", "South", "Southwest", "West", "Northwest")) %>%
addScaleBar()
Plot the elevation dataset, which is integer-valued and for which a continuous color palette can be used:
elevation_ras <- raster(gorillas.extra$elevation)
crs(elevation_ras) <- CRS("+proj=utm +zone=32 +datum=WGS84")
plot(elevation_ras, col=terrain.colors(100))
leaflet() %>%
addProviderTiles('Esri.WorldImagery') %>%
addRasterImage(elevation_ras, colors = viridis(100), opacity = 0.8) %>%
addCircleMarkers(data = nests_LONLAT, radius=1, color="black", opacity=1, label = ~paste(group, season, date, sep=", ")) %>%
addScaleBar()
Let’s go back to the object “gorillas”; we will need the raster and shapefile datasets again later. We now focus on empirically assessing whether the point pattern in the nest locations is random or clustered.
Do you remember the quadrat counting exercise from the course? We can do the same in R:
quadrats = quadratcount(gorillas, nx = 10, ny = 10)
gorillas_unmarked <- unmark(gorillas) ## delete the marks information, otherwise the next plot does not work
plot(intensity(quadrats, image=TRUE), main="Gorilla nests - quadrats" , las=1) # Plot density raster
plot(gorillas_unmarked, pch=20, cols="grey70", add=T, size=0.5)
plot(quadrats, add=T)
We can perform a chi-squared test of complete spatial randomness for the points and the quadrats defined above. The resulting p-value indicates that our point pattern is not random.
quadrat.test(gorillas, nx = 10, ny = 10)
## Warning: Some expected counts are small; chi^2 approximation may be inaccurate
##
## Chi-squared test of CSR using quadrat counts
##
## data: gorillas
## X2 = 2071.6, df = 93, p-value < 2.2e-16
## alternative hypothesis: two.sided
##
## Quadrats: 94 tiles (irregular windows)
Compute an isotropic kernel density estimate of the point pattern:
plot(density(gorillas), main = "Nest density")
plot(gorillas, pch = 1, cex = 0.5, add = TRUE)
## Plotting the first column of marks
contour(density(gorillas), add = TRUE)
The K-Ripley function also indicates that the nests are spatially more clustered than would be expected:
K_function <- envelope(gorillas,fun=Kest,
funargs=list(correction="border"),global=TRUE)
## Generating 99 simulations of CSR ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
## 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
## 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99.
##
## Done.
plot(K_function)
In chapter 6, we want to model nest density with the seven environmental covariates contained in “gorillas.extra”. To have an unbiased spatial sample, we have to sample nest density and the covariates at 100 random locations, and summarize all information at these locations in a table.
Let us use the shapefile “study_area_UTM” to define these 100 random points first:
set.seed(1234)
random_points_UTM <- spsample(study_area_UTM, n=100, "random")
## Warning in proj4string(obj): CRS object has comment, which is lost in output
plot(random_points_UTM, main="Random points (black) and nests (red)")
plot(nests_UTM, add=T, col="red")
plot(study_area_UTM, add=T)
Rasterize the remaining layers of “gorillas.extra”, and the kernel density map:
heat_ras <- raster(gorillas.extra$heat)
slopeangle_ras <- raster(gorillas.extra$slopeangle)
slopetype_ras <- raster(gorillas.extra$slopetype)
vegetation_ras <- raster(gorillas.extra$vegetation)
waterdist_ras <- raster(gorillas.extra$waterdist)
density_ras <- raster(density(gorillas))
Join all raster layers in one multi-layered raster-stack and assign CRS. You can very easily plot all layers contained in the stack:
stack1 <- stack(aspect_ras, elevation_ras, heat_ras, slopeangle_ras, slopetype_ras, vegetation_ras, waterdist_ras)
names(stack1) <- c("aspect", "elevation", "heat", "slopeangle", "slopetype", "vegetation", "waterdist")
crs(stack1) <- CRS("+proj=utm +zone=32 +datum=WGS84")
plot(stack1)
Extract raster values for the point locations. Also extract the respective values from the density raster, join the results together, and add coordinates:
nestdata <- as.data.frame(extract(stack1, random_points_UTM))
nestdata$density <- extract(density_ras, random_points_UTM)
nestdata <- cbind(nestdata, random_points_UTM@coords)
head(nestdata)
| aspect | elevation | heat | slopeangle | slopetype | vegetation | waterdist | density | x | y |
|---|---|---|---|---|---|---|---|---|---|
| 8 | 1849 | 2 | 10.560980 | 1 | 1 | 122.83730 | 3.00e-06 | 584608.5 | 675415.8 |
| 6 | 1247 | 1 | 8.247169 | 1 | 1 | 68.66813 | 8.00e-07 | 582765.3 | 674575.9 |
| 7 | 1602 | 2 | 16.906500 | 1 | 1 | 43.42954 | 1.91e-05 | 584074.9 | 678035.5 |
| 1 | 1789 | 1 | 20.547510 | 6 | 4 | 61.41864 | 5.85e-05 | 584059.9 | 677174.1 |
| 8 | 1634 | 1 | 26.627790 | 1 | 1 | 122.83730 | 3.98e-05 | 583230.8 | 678131.9 |
| 6 | 1808 | 2 | 42.498470 | 1 | 4 | 61.41864 | 4.69e-05 | 582479.6 | 676053.7 |
We start off with an ordinary OLS linear regression model. Let’s use both elevation and vegetation as predictors and plot the residuals in a map. You can see that there is spatial pattern in the residuals:
model1_lm <- lm(density ~ elevation + vegetation, data = nestdata)
summary(model1_lm)
##
## Call:
## lm(formula = density ~ elevation + vegetation, data = nestdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.010e-05 -1.370e-05 -6.780e-06 2.219e-05 5.761e-05
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.105e-06 2.646e-05 -0.042 0.967
## elevation -1.457e-09 1.661e-08 -0.088 0.930
## vegetation 1.732e-05 2.070e-06 8.364 4.5e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.839e-05 on 97 degrees of freedom
## Multiple R-squared: 0.4538, Adjusted R-squared: 0.4425
## F-statistic: 40.29 on 2 and 97 DF, p-value: 1.828e-13
nestdata$residuals <- residuals(model1_lm)
nestdata$fitted <- fitted(model1_lm)
ggplot(nestdata, aes(x = x, y = y, col = residuals, size = residuals)) +
geom_point() +
scale_color_gradient2()
To assess the degree of spatial autocorrelation in density, let’s calculate Moran’s I.
To do so, we first have to identify which points are neighboring. We can do that with the function “dnearneigh”. The resulting object “nests_nb” contains one list element for each random point, and the IDs of all respective neighboring points. You can change the argument “d2” in “dnearneigh” to in- or decrease the search radius:
nests_nb <- spdep::dnearneigh(as.matrix(nestdata[c("x", "y")]), d1 = 0, d2 = 500)
nests_nb
## Neighbour list object:
## Number of regions: 100
## Number of nonzero links: 330
## Percentage nonzero weights: 3.3
## Average number of links: 3.3
## 1 region with no links:
## 26
Let’s use “nests_nb” to assign weights
nests_weights <- nb2listw(nests_nb, style = "W", zero.policy = TRUE)
class(nests_weights)
## [1] "listw" "nb"
# head(nests_weights)
str(nests_weights$weights, list.len=5, give.attr = F)
## List of 100
## $ : num [1:5] 0.2 0.2 0.2 0.2 0.2
## $ : num [1:4] 0.25 0.25 0.25 0.25
## $ : num [1:5] 0.2 0.2 0.2 0.2 0.2
## $ : num [1:2] 0.5 0.5
## $ : num [1:4] 0.25 0.25 0.25 0.25
## [list output truncated]
Perform the Moran’s I test. The p-value is below 0.05, which indicates spatial autocorrelation:
lm.morantest(model1_lm, nests_weights, zero.policy = T)
##
## Global Moran I for regression residuals
##
## data:
## model: lm(formula = density ~ elevation + vegetation, data = nestdata)
## weights: nests_weights
##
## Moran I statistic standard deviate = 6.5613, p-value = 2.667e-11
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I Expectation Variance
## 0.532063900 -0.024404619 0.007192828
Now, the question is whether we should use a spatial lag or a spatial error model. We need to run a statistical test, the Lagrange Multiplier test, to find that out. The p-value for the spatial lag term is smaller than for the spatial error term, which indicates that the spatial lag model is more appropriate:
lm.LMtests(model1_lm, nests_weights, test="LMerr", zero.policy = T)
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = density ~ elevation + vegetation, data = nestdata)
## weights: nests_weights
##
## LMErr = 38.075, df = 1, p-value = 6.807e-10
lm.LMtests(model1_lm, nests_weights, test="LMlag", zero.policy = T)
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = density ~ elevation + vegetation, data = nestdata)
## weights: nests_weights
##
## LMlag = 78.895, df = 1, p-value < 2.2e-16
Anyway, let’s fit both models and compare them against each other and against the first model:
model1_slm <- lagsarlm(density ~ elevation + vegetation, data = nestdata, listw = nests_weights, zero.policy = TRUE)
## Warning in lagsarlm(density ~ elevation + vegetation, data = nestdata, listw = nests_weights, : inversion of asymptotic covariance matrix failed for tol.solve = 2.22044604925031e-16
## reziproke Konditionszahl = 1.16716e-19 - using numerical Hessian.
model1_slm
##
## Call:
## lagsarlm(formula = density ~ elevation + vegetation, data = nestdata,
## listw = nests_weights, zero.policy = TRUE)
## Type: lag
##
## Coefficients:
## rho (Intercept) elevation vegetation
## 9.120527e-01 -1.063415e-05 5.854761e-09 1.596096e-06
##
## Log likelihood: 1023.556
model1_sem <- errorsarlm(density ~ elevation + vegetation, data = nestdata, listw = nests_weights, zero.policy = TRUE)
## Warning in errorsarlm(density ~ elevation + vegetation, data = nestdata, : inversion of asymptotic covariance matrix failed for tol.solve = 2.22044604925031e-16
## reziproke Konditionszahl = 2.63198e-19 - using numerical Hessian.
model1_sem
##
## Call:
## errorsarlm(formula = density ~ elevation + vegetation, data = nestdata,
## listw = nests_weights, zero.policy = TRUE)
## Type: error
##
## Coefficients:
## lambda (Intercept) elevation vegetation
## 9.584548e-01 -7.001184e-05 4.342284e-08 2.744002e-07
##
## Log likelihood: 1023.057
AIC(model1_lm, model1_slm, model1_sem)
| df | AIC | |
|---|---|---|
| model1_lm | 4 | -1805.135 |
| model1_slm | 5 | -2037.111 |
| model1_sem | 5 | -2036.113 |
Let’s systematically explore how a change in the distance parameter “d2” affects Moran’s I in the three different models:
# create empty vectors for holding the Moran's I statistics
moran_I_lm <- c()
moran_I_slm <- c()
moran_I_sem <- c()
# loop through a distance vector d ranging from 50 to 2000 m
for (d in seq(from = 50, to = 2000, by = 50))
{ nests_nb_temp <- dnearneigh(as.matrix(nestdata[,c("x", "y")]), d1 = 0, d2 = d)
nests_weights_temp <- nb2listw(nests_nb_temp, style = "W", zero.policy = TRUE)
moran_lm <- moran.mc(model1_lm$residuals, nests_weights_temp, nsim = 999, zero.policy = TRUE)
moran_I_lm <- c(moran_I_lm, moran_lm$statistic)
moran_slm <- moran.mc(model1_slm$residuals, nests_weights_temp, nsim = 999, zero.policy = TRUE)
moran_I_slm <- c(moran_I_slm, moran_slm$statistic)
moran_sem <- moran.mc(model1_sem$residuals, nests_weights_temp, nsim = 999, zero.policy = TRUE)
moran_I_sem <- c(moran_I_sem, moran_sem$statistic)
}
moran_I_lm <- data.frame(moran_I = moran_I_lm,
distance = seq(50, 2000, 50),
model="Simple linear model")
moran_I_slm <- data.frame(moran_I = moran_I_slm,
distance = seq(50, 2000, 50),
model="Spatial lag model")
moran_I_sem <- data.frame(moran_I = moran_I_sem,
distance = seq(50, 2000, 50),
model="Spatial error model")
moran.df <- rbind(moran_I_lm, moran_I_slm, moran_I_sem)
ggplot(moran.df, aes(x = distance, y = moran_I, col=model)) +
geom_point() +
geom_line()
#### Hint: We first have to convert the random points to a "ppp" object:
random_points_ppp <- ppp(x=random_points_UTM@coords[,1], y=random_points_UTM@coords[,2], window=gorillas$window)
#### Hint: You can make subsets of "ppp" objects in the following way:
gorillas_major <- subset.ppp(gorillas, group=="major")