In this week’s practical we will consider two principal concerns in spatial ecology that we have looked at largely in isolation on this course: prediction and inference. Both of these analytical approaches are important in ecology and conservation science. From the perspective of biodiversity, the ability to interpret ecological data and predict losses (mitigation) and gains (intervention or restoration) is of particular interest to scientists. In the first half of this practical we will look at options for making predictions of species diversity based on a simple assessment of species richness (number of species in a given area) and similarity (how alike are sites with each other based on community composition). In the second half we will carry out analysis on a data set of geo-located species richness and abundance data to make inferences about site contribution to beta diversity.

Part One: Prediction

The basic approach taken to predicting species richness in the landscape is, at its simplest, an extention of the species distribution modelling work carried out in weeks 4 and 5. Two general approaches are possible: predict first, then assemble and assemble first then predict. In the predict first-then assemble approach the procedure is to carry out species distribution models for all species in a dataset (typically from transect observations) and then summarize the models through map algebra. In the assemble first-then predict approach, species richness itself is modelled as the response variable (in the case of this week’s practical, of a general linear model) with a single prediction based on the resulting model.

The data for this part of the practical are taken from a bird distribution survey in Montana and Idaho, USA (Hutto and Young 2002). Sampling locations consist of point counts (100-m radius), along a transect (10 points/transect; transects are approximately 3 km long), with transects randomly selected within USFS Forest Regions across Montana and Idaho.

First, copy the data from “Diversity” folder in the course dropbox (get the link from the content page on Blackboard) to a folder for Week9 in the GEOG70922 area of your P drive.

Now let’s load the necessary libraries to carry out the analysis for Part One:

#install packages
install.packages(c("raster","rgdal","gdm","reshape2","vegan","devtools"))
#load packages
library(raster)           #for raster covariate data; version 2.6-7 used
## Loading required package: sp
library(rgdal)            #for reading different types of GIS files; version 1.3-4 used
## rgdal: version: 1.5-23, (SVN revision 1121)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 3.2.1, released 2020/12/29
## Path to GDAL shared files: C:/Users/mlibxmda/Documents/R/win-library/4.0/rgdal/gdal
## GDAL binary built with GEOS: TRUE 
## Loaded PROJ runtime: Rel. 7.2.1, January 1st, 2021, [PJ_VERSION: 721]
## Path to PROJ shared files: C:/Users/mlibxmda/Documents/R/win-library/4.0/rgdal/proj
## PROJ CDN enabled: FALSE
## Linking to sp version:1.4-5
## To mute warnings of possible GDAL/OSR exportToProj4() degradation,
## use options("rgdal_show_exportToProj4_warnings"="none") before loading rgdal.
## Overwritten PROJ_LIB was C:/Users/mlibxmda/Documents/R/win-library/4.0/rgdal/proj
library(gdm)              #generalized dissimilarity modeling; version 1.3.11 used
## Warning: package 'gdm' was built under R version 4.0.5
## Loading required package: foreach
## Loading required package: doParallel
## Loading required package: iterators
## Loading required package: parallel
library(reshape2)         #for re-formatting data; version 1.4.3 used
library(vegan)             #for ecological indices
## Warning: package 'vegan' was built under R version 4.0.5
## Loading required package: permute
## Warning: package 'permute' was built under R version 4.0.5
## Loading required package: lattice
## This is vegan 2.5-7

Modeling communities

For both approaches mentioned above, we need raster (environmental) covariates in order to make our species distribution predictions. These consist of layers representing elevation, canopy and precipitation. First we load in these layers and resample the precipitation layer, which has a different resolution to the other layers, using the resample() function. Note the canopy raster is derived from a Principal Component Analysis and is measured on a scale from -1 (no cover) to 1 (full cover).

#layers
Elev <- raster("elev.gri")                           #elevation layer
Canopy <- raster("cc2.gri")                          #linear gradient in canopy 



Precip <- raster("precip.gri")                       #precip
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj
## = prefer_proj): Discarded datum Unknown based on GRS80 ellipsoid in Proj4
## definition
#reformat Precip to same res/extent as elevation
Precip <- resample(x = Precip, y = Elev, "bilinear") #for continuous data
Precip <- mask(Precip, Elev)

plot(Precip)

#convert Precip to meters
Precip <- Precip / 100
plot(Precip)

layers <- stack(Canopy, Elev, Precip)          
names(layers) <- c("canopy", "elev", "precip")

#plot
plot(layers)

The next job is to create a crs (coordinate reference system) object from our raster objects so we can use this in the subsequent step to set the bird data crs.

#inspect
projection(layers)
## [1] NA
#label the CRS
crs.layers <- CRS("+proj=aea +lat_1=46 +lat_2=48 +lat_0=44 +lon_0=-109.5 +x_0=600000 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs")
projection(layers) <- crs.layers

#-------------#
#bird data
#-------------#

birds <- read.csv("birdcommunity.csv")

#inspect the data
head(birds, 3)
##    TRANSECT STOP LAT_WGS84 LONG_WGS84 SPECIES PRES
## 1 452511619    5  45.34436  -116.4082    AMDI    0
## 2 452511619    6  45.34442  -116.4122    AMDI    0
## 3 452511619    9  45.34158  -116.4129    AMDI    0
#create a spatial points dataframe for converting coordinates
birds.coords <- data.frame(x = birds$LONG_WGS84, y = birds$LAT_WGS84)
birds.attributes <- data.frame(transect = birds$TRANSECT, point = birds$STOP, species = birds$SPECIES, pres = birds$PRES)

#define CRS
crs.latlong <-  CRS("+proj=longlat +datum=WGS84")
birds.spdf <- SpatialPointsDataFrame(birds.coords, data = birds.attributes, proj4string = crs.latlong)

#transform CRS to layers CRS
birds.spdf <- spTransform(birds.spdf, crs.layers)

We create a new data frame from the point data attributes and their coordinates and plot them against elevation:

#new bird data frame
birds.df <- data.frame(birds.spdf@data, x = coordinates(birds.spdf)[,1], y = coordinates(birds.spdf)[,2])

#inspect
names(birds.df)
## [1] "transect" "point"    "species"  "pres"     "x"        "y"
#plot
plot(Elev)
points(birds.spdf)

We need to render our data in the appropriate format with sampling sites taking up the rows and species the columns. We can carry this out using the dcast function that we used in Week 3 to format our Canis lupus telemetry data:

#make a site-species data.frame
species.site <- dcast(birds.df, transect + point + x + y ~ species, value.var = "pres")

head(species.site)
##    transect point        x        y AMDI AMGO AMKE AMRE AMRO ATTW BADO BBMA
## 1 452511619     5 59142.22 173151.8    0    1    0    0    0    0    0    0
## 2 452511619     6 58834.36 173185.7    0    0    0    0    1    0    0    1
## 3 452511619     9 58754.24 172876.0    0    0    0    0    1    0    0    1
## 4 452511625     1 59037.42 181450.2    0    0    0    0    1    0    0    0
## 5 452511625     4 59336.71 181389.1    0    0    0    0    1    0    0    0
## 6 452511625     6 59220.69 181108.2    0    0    0    0    1    0    0    0
##   BBWO BCCH BEKI BHCO BHGR BOCH BRCR CAFI CAHU CANG CANW CAQU CAVI CBCH CCSP
## 1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
## 2    0    1    0    0    1    0    0    0    0    0    0    0    0    0    0
## 3    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
## 4    0    1    0    0    0    0    0    0    0    0    0    0    1    1    0
## 5    0    1    0    0    1    0    0    0    0    0    0    0    0    1    0
## 6    0    0    0    0    1    0    0    0    0    0    0    0    1    0    0
##   CEDW CHSP CHUK CLNU COFL COHA COME CORA COYE DEJU DOWO DUFL DUGR EAKI EVGR
## 1    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0
## 2    0    1    1    0    0    0    0    0    0    1    0    0    0    0    0
## 3    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
## 4    0    1    0    0    0    0    0    0    0    0    0    1    0    0    0
## 5    0    1    0    0    0    0    0    0    0    1    0    1    0    0    0
## 6    0    0    0    0    0    0    0    0    0    1    0    1    0    0    0
##   FOSP GCKI GRAJ GRCA HAFL HAWO HETH HOFI HOWR KILL LAZB LBCU LEFL LISP MALL
## 1    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0
## 2    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0
## 3    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0
## 4    0    1    0    0    1    0    0    0    0    0    0    0    0    0    0
## 5    0    1    0    0    1    0    0    0    0    0    0    0    0    0    0
## 6    1    0    1    0    0    0    0    0    1    0    0    0    0    0    0
##   MAWR MGWA MOBL MOCH MODO NAWA NOBO NOFL NOGO NOPO NOWA NRWS OCWA OSFL PAWR
## 1    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0
## 2    0    1    0    0    0    1    0    0    0    0    0    0    0    0    0
## 3    0    0    0    0    0    0    0    1    0    0    0    0    0    0    0
## 4    0    1    0    0    0    0    0    1    0    0    0    0    0    0    1
## 5    0    1    0    0    0    1    0    0    0    0    0    0    1    0    0
## 6    0    1    0    1    0    0    0    1    0    0    0    0    1    0    0
##   PIGR PISI PIWO PYNU RBNU RCKI RECR REVI RNEP RNSA ROWR RTHA RUGR RUHU RWBL
## 1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
## 2    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
## 3    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
## 4    0    0    0    0    1    0    0    0    0    0    0    0    0    0    0
## 5    0    0    0    0    1    0    0    0    0    0    0    0    1    0    0
## 6    0    0    0    0    1    0    0    0    0    0    0    0    1    0    0
##   SACR SAVS SOSP SPGR SPSA SPTO SSHA STJA SWHA SWTH TEWA TOSO TOWA TRES TUVU
## 1    0    0    0    0    0    1    0    0    0    0    0    0    0    0    0
## 2    0    0    1    0    0    1    0    0    0    0    0    0    0    0    0
## 3    0    0    0    0    0    1    0    0    0    0    0    0    0    0    0
## 4    0    0    0    0    0    0    0    0    0    1    0    0    1    0    0
## 5    0    0    0    0    0    1    0    0    0    1    0    0    1    0    0
## 6    0    0    0    0    0    1    0    1    0    1    0    0    0    0    0
##   VASW VATH VESP WAVI WBNU WCSP WEME WETA WEWP WIFL WISA WISN WITU WIWA YEWA
## 1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
## 2    0    0    0    1    0    0    1    1    0    0    0    0    0    0    1
## 3    0    0    0    0    0    0    1    0    0    0    0    0    0    0    0
## 4    0    1    0    0    0    0    0    1    0    0    0    0    0    0    0
## 5    0    0    0    0    0    0    0    1    0    0    0    0    0    0    0
## 6    0    0    0    1    0    0    0    1    0    0    0    0    0    0    0
##   YRWA
## 1    0
## 2    0
## 3    0
## 4    1
## 5    0
## 6    1
View(species.site)

From the data frame we have just created, subset the site and species data only to create a matrix (i.e. drop the transect, point and x,y columns). We then subset the data further to include only species that occur at 20 or more sampling points. Note the use of the %in% command for matching elements within different objects (in this case column names in these two data frames).

#site X species matrix

spp.matrix <- subset(species.site, select = -c(transect, point, x, y))

head(spp.matrix)
##   AMDI AMGO AMKE AMRE AMRO ATTW BADO BBMA BBWO BCCH BEKI BHCO BHGR BOCH BRCR
## 1    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0
## 2    0    0    0    0    1    0    0    1    0    1    0    0    1    0    0
## 3    0    0    0    0    1    0    0    1    0    0    0    0    0    0    0
## 4    0    0    0    0    1    0    0    0    0    1    0    0    0    0    0
## 5    0    0    0    0    1    0    0    0    0    1    0    0    1    0    0
## 6    0    0    0    0    1    0    0    0    0    0    0    0    1    0    0
##   CAFI CAHU CANG CANW CAQU CAVI CBCH CCSP CEDW CHSP CHUK CLNU COFL COHA COME
## 1    0    0    0    0    0    0    0    0    0    1    0    0    0    0    0
## 2    0    0    0    0    0    0    0    0    0    1    1    0    0    0    0
## 3    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
## 4    0    0    0    0    0    1    1    0    0    1    0    0    0    0    0
## 5    0    0    0    0    0    0    1    0    0    1    0    0    0    0    0
## 6    0    0    0    0    0    1    0    0    0    0    0    0    0    0    0
##   CORA COYE DEJU DOWO DUFL DUGR EAKI EVGR FOSP GCKI GRAJ GRCA HAFL HAWO HETH
## 1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
## 2    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0
## 3    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
## 4    0    0    0    0    1    0    0    0    0    1    0    0    1    0    0
## 5    0    0    1    0    1    0    0    0    0    1    0    0    1    0    0
## 6    0    0    1    0    1    0    0    0    1    0    1    0    0    0    0
##   HOFI HOWR KILL LAZB LBCU LEFL LISP MALL MAWR MGWA MOBL MOCH MODO NAWA NOBO
## 1    0    0    0    1    0    0    0    0    0    1    0    0    0    0    0
## 2    0    0    0    1    0    0    0    0    0    1    0    0    0    1    0
## 3    0    0    0    1    0    0    0    0    0    0    0    0    0    0    0
## 4    0    0    0    0    0    0    0    0    0    1    0    0    0    0    0
## 5    0    0    0    0    0    0    0    0    0    1    0    0    0    1    0
## 6    0    1    0    0    0    0    0    0    0    1    0    1    0    0    0
##   NOFL NOGO NOPO NOWA NRWS OCWA OSFL PAWR PIGR PISI PIWO PYNU RBNU RCKI RECR
## 1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
## 2    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
## 3    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0
## 4    1    0    0    0    0    0    0    1    0    0    0    0    1    0    0
## 5    0    0    0    0    0    1    0    0    0    0    0    0    1    0    0
## 6    1    0    0    0    0    1    0    0    0    0    0    0    1    0    0
##   REVI RNEP RNSA ROWR RTHA RUGR RUHU RWBL SACR SAVS SOSP SPGR SPSA SPTO SSHA
## 1    0    0    0    0    0    0    0    0    0    0    0    0    0    1    0
## 2    0    0    0    0    0    0    0    0    0    0    1    0    0    1    0
## 3    0    0    0    0    0    0    0    0    0    0    0    0    0    1    0
## 4    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
## 5    0    0    0    0    0    1    0    0    0    0    0    0    0    1    0
## 6    0    0    0    0    0    1    0    0    0    0    0    0    0    1    0
##   STJA SWHA SWTH TEWA TOSO TOWA TRES TUVU VASW VATH VESP WAVI WBNU WCSP WEME
## 1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
## 2    0    0    0    0    0    0    0    0    0    0    0    1    0    0    1
## 3    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1
## 4    0    0    1    0    0    1    0    0    0    1    0    0    0    0    0
## 5    0    0    1    0    0    1    0    0    0    0    0    0    0    0    0
## 6    1    0    1    0    0    0    0    0    0    0    0    1    0    0    0
##   WETA WEWP WIFL WISA WISN WITU WIWA YEWA YRWA
## 1    0    0    0    0    0    0    0    0    0
## 2    1    0    0    0    0    0    0    1    0
## 3    0    0    0    0    0    0    0    0    0
## 4    1    0    0    0    0    0    0    0    1
## 5    1    0    0    0    0    0    0    0    0
## 6    1    0    0    0    0    0    0    0    1
#subset based on frequency of occurrence
prevalence <- colSums(spp.matrix)
prevalence.20 <- prevalence[prevalence > 20]
species.20 <- names(prevalence.20)
species.matrix <- spp.matrix[,colnames(spp.matrix) %in% species.20]

Now we have our columns of species occurence for each site we need to extract values for each site from our environmental covariates using the extract function which by now should be very familiar.We use the [] command to select coordinates from the species.site object for our extraction points.

#extract covariates for sites
site.cov <- extract(layers, species.site[,c("x", "y")])

#inspect
head(site.cov)
##             canopy  elev precip
## [1,]  0.0000000000 0.867  0.470
## [2,]  0.0000000000 0.949  0.470
## [3,] -0.0005330133 1.140  0.470
## [4,] -0.1638363004 1.174  0.600
## [5,] -0.1675450951 1.209  0.600
## [6,] -0.0315038487 1.263  0.607

Before moving on to the modelling we can calculate the Sorenson Dissimilarity Index for each pair of sites as a summary statistic:

sorenson <- vegdist(species.matrix, method = "bray")
sorenson.mat <- as.matrix(sorenson)

sorenson.mat

Now let’s get to the predictions. We have multiple species to predict (the number of columns in species.matrix) so naturally we need to create a list to hold our raster objects (maps) for each prediction.

######################################################################
# Predict first, assemble later
######################################################################

#Create lists to save the species models
pred.map <- list()                         #stores map predictions


#create vector for the number of species presence/absence records
Nspecies <- ncol(species.matrix)


#create vectors of predictors for simple processing
canopy <- site.cov[,"canopy"]
elev <- site.cov[,"elev"]
precip <- site.cov[,"precip"]

Now we create a for loop to iterate over each species, perform a glm model using our environmental covariates (this is exactly the same as for the glm models we looked at in Week 4) and populate the list with the resulting raster predictions.

#Run GLM model for each species

for (i in 1:Nspecies){
  
  species.i <- glm(species.matrix[,i] ~ canopy + poly(elev,2) + precip, family = binomial)

  #predictions for mapping
  prob.pred <- predict(layers,species.i,  type = "response")  
  
  
  pred.map[[i]] <- prob.pred
  print(i) # print each iteration to be sure the loop is working

}  
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
## [1] 11
## [1] 12
## [1] 13
## [1] 14
## [1] 15
## [1] 16
## [1] 17
## [1] 18
## [1] 19
## [1] 20
## [1] 21
## [1] 22
## [1] 23
## [1] 24
## [1] 25
## [1] 26
## [1] 27
## [1] 28
## [1] 29
## [1] 30
## [1] 31
## [1] 32
## [1] 33
## [1] 34
## [1] 35
## [1] 36
## [1] 37
## [1] 38
## [1] 39
## [1] 40
## [1] 41
## [1] 42
## [1] 43
## [1] 44
## [1] 45
## [1] 46
## [1] 47
## [1] 48
## [1] 49
## [1] 50
## [1] 51
## [1] 52
## [1] 53
prob.map.stack <- stack(pred.map)
names(prob.map.stack) <- colnames(species.matrix)

#Plot of predicted distribution for first species
plot(prob.map.stack$AMRO, xlab = "Longitude", ylab = "Latitude",
     main="Predicted map for AMRO - Predict first, assemble later")

So we have our familiar probability maps but this week we are in the business of predicting species richness. We therefore need to do two things: 1. decide on a sensible threshold to delineate predicted presence/absence and 2. once all maps are converted to a binary 1/0 (presence/absence), sum all the rasters to return a species richness prediction (remember richness is just the total number of species).

In previous weeks we used the Receiver Operator’s Curve to select a threshold that maximized true positive and minimized false positive classifications. Here, for the purpose of demonstration, we will use a simpler method based on the overall ratio of presence records to total sampling sites. For example, if a species is detected 60 times across 100 sampling sites then, by this rationale, the threshold for predicted occurrence in other locations is set to 60/100 = 0.6.

#use threshold to predict 0/1

Nsites <- nrow(species.matrix)

spp.thres.map <- pred.map
for (i in 1:Nspecies){
  thresh.i <- sum(species.matrix[,i]) / Nsites

  spp.thres.map[[i]][which(spp.thres.map[[i]][] > thresh.i)] <- 1
  spp.thres.map[[i]][which(spp.thres.map[[i]][] <= thresh.i)] <- 0
}

# make a multilayered file (basically a multi-band raster) for species distributions
spp.thres.map <- stack(spp.thres.map)

## Computing species richness
spp.thres.map <- sum(spp.thres.map)

#plot
plot(spp.thres.map)

The second approach to this same problem of predicting species richness is to calculate species richness from the data as the first step (by summing the 0s and 1s for each site) and enter this as the response variable of a single model. Note that instead of binomial here we use the Poisson family which should be used when the response variable consists of count data.

###########################################################################
# Assemble first-then predict 
###########################################################################



#calculate species richness
richness <- rowSums(species.matrix)

#Poisson glm for use with count data
pois.rich <- glm(richness ~ canopy + poly(elev,2) + precip, family = poisson)

#inspect
summary(pois.rich)
## 
## Call:
## glm(formula = richness ~ canopy + poly(elev, 2) + precip, family = poisson)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.6603  -0.6006  -0.0088   0.6149   2.9355  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     2.72432    0.03549  76.770  < 2e-16 ***
## canopy         -0.19808    0.02889  -6.856 7.06e-12 ***
## poly(elev, 2)1 -1.52544    0.29129  -5.237 1.63e-07 ***
## poly(elev, 2)2 -1.14501    0.29472  -3.885 0.000102 ***
## precip         -0.18532    0.04015  -4.615 3.92e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 775.20  on 781  degrees of freedom
## Residual deviance: 658.86  on 777  degrees of freedom
## AIC: 4077.3
## 
## Number of Fisher Scoring iterations: 4
#map the Poisson model
pois.raster <- predict(layers,pois.rich, type = "response") 




plot(pois.raster, xlab = "Longitude", ylab = "Latitude",
     main="Predicted species richness - Assemble first, predict later")

#make a map of differences between models of richness
spp.diff <- spp.thres.map - pois.raster
plot(spp.diff, xlab = "Longitude", ylab = "Latitude", main="Difference in richness models")

#Raster stack of different species richness predictions
richness.stack <- stack(spp.thres.map, pois.raster)
names(richness.stack) <- c("thres-rich", "pois-rich")

#correlation among richness predictions
richness.map.corr <- layerStats(richness.stack, 'pearson', na.rm = T)

#inspect
richness.map.corr
## $`pearson correlation coefficient`
##            thres.rich pois.rich
## thres.rich  1.0000000 0.8997051
## pois.rich   0.8997051 1.0000000
## 
## $mean
## thres.rich  pois.rich 
##   22.44090   12.44444
#remove big files no longer needed
rm(pred.map)
rm(spp.thres.map)

We can see considerable variation (difference) between the predicted species richness using our two approaches. In general stacked species distribution models (S-SDMs) are thought to over-predict species richness and do not take into account biotic interactions (e.g. inter-specific competition). This tendency to over-predict is borne out in our results here.

For the final step of Part One we consider the relationship between site similarity and our environmental variables. This works in as similar way as for distribution modelling only here our response variable is continuous (similarity) rather than binary (presence/absence). The gdm() function calculates the “generalized” dissimilarity of locations (a measure of beta diversity) and tries to predict them using the a set of predictor variables (in our case the precipitation, elevation and canopy raster layers).

#format data for gdm
siteID <- 1:nrow(species.matrix)
site.xy <- data.frame(x = species.site$x, y = species.site$y)

gdm.species.matrix <- data.frame(cbind(siteID, site.xy, species.matrix))
gdm.site.matrix <- data.frame(cbind(siteID, site.cov))

#inspect
head(gdm.site.matrix)
##   siteID        canopy  elev precip
## 1      1  0.0000000000 0.867  0.470
## 2      2  0.0000000000 0.949  0.470
## 3      3 -0.0005330133 1.140  0.470
## 4      4 -0.1638363004 1.174  0.600
## 5      5 -0.1675450951 1.209  0.600
## 6      6 -0.0315038487 1.263  0.607
head(gdm.species.matrix)
##   siteID        x        y AMRO BCCH BHCO BHGR BRCR CAFI CAHU CAVI CBCH CHSP
## 1      1 59142.22 173151.8    0    0    0    0    0    0    0    0    0    1
## 2      2 58834.36 173185.7    1    1    0    1    0    0    0    0    0    1
## 3      3 58754.24 172876.0    1    0    0    0    0    0    0    0    0    0
## 4      4 59037.42 181450.2    1    1    0    0    0    0    0    1    1    1
## 5      5 59336.71 181389.1    1    1    0    1    0    0    0    0    1    1
## 6      6 59220.69 181108.2    1    0    0    1    0    0    0    1    0    0
##   CLNU CORA DEJU DUFL EVGR FOSP GCKI GRAJ HAFL HAWO HETH HOWR LAZB MGWA MOBL
## 1    0    0    0    0    0    0    0    0    0    0    0    0    1    1    0
## 2    0    0    1    0    0    0    0    0    0    0    0    0    1    1    0
## 3    0    0    0    0    0    0    0    0    0    0    0    0    1    0    0
## 4    0    0    0    1    0    0    1    0    1    0    0    0    0    1    0
## 5    0    0    1    1    0    0    1    0    1    0    0    0    0    1    0
## 6    0    0    1    1    0    1    0    1    0    0    0    1    0    1    0
##   MOCH NAWA NOFL OCWA OSFL PAWR PISI PIWO RBNU RCKI RECR RNSA RUGR RUHU SOSP
## 1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
## 2    0    1    0    0    0    0    0    0    0    0    0    0    0    0    1
## 3    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0
## 4    0    0    1    0    0    1    0    0    1    0    0    0    0    0    0
## 5    0    1    0    1    0    0    0    0    1    0    0    0    1    0    0
## 6    1    0    1    1    0    0    0    0    1    0    0    0    1    0    0
##   SPTO STJA SWTH TOSO TOWA VATH WAVI WBNU WETA WISA WIWA YEWA YRWA
## 1    1    0    0    0    0    0    0    0    0    0    0    0    0
## 2    1    0    0    0    0    0    1    0    1    0    0    1    0
## 3    1    0    0    0    0    0    0    0    0    0    0    0    0
## 4    0    0    1    0    1    1    0    0    1    0    0    0    1
## 5    1    0    1    0    1    0    0    0    1    0    0    0    0
## 6    1    1    1    0    0    0    1    0    1    0    0    0    1
#get gdm formatted object - this function re-formats the data into a data frame with all site pairs as rows and with columns that represent the distance between sites based on the Sorenson index (method="bray" for binary data) and predictor variables (canopy, precipitation and elevation) for each site.

gdm.data <- formatsitepair(gdm.species.matrix, bioFormat = 1, dist = "bray", abundance = F,
                         XColumn = "x", YColumn = "y", siteColumn = "siteID", predData = gdm.site.matrix)

#take a look at the reformatted data

head(gdm.data)
##      distance weights s1.xCoord s1.yCoord s2.xCoord s2.yCoord s1.canopy s1.elev
## 1   0.5294118       1  59142.22  173151.8  58834.36  173185.7         0   0.867
## 1.1 0.5000000       1  59142.22  173151.8  58754.24  172876.0         0   0.867
## 1.2 0.8095238       1  59142.22  173151.8  59037.42  181450.2         0   0.867
## 1.3 0.7272727       1  59142.22  173151.8  59336.71  181389.1         0   0.867
## 1.4 0.8333333       1  59142.22  173151.8  59220.69  181108.2         0   0.867
## 1.5 0.8400000       1  59142.22  173151.8  59020.85  180919.5         0   0.867
##     s1.precip     s2.canopy s2.elev s2.precip
## 1        0.47  0.0000000000   0.949     0.470
## 1.1      0.47 -0.0005330133   1.140     0.470
## 1.2      0.47 -0.1638363004   1.174     0.600
## 1.3      0.47 -0.1675450951   1.209     0.600
## 1.4      0.47 -0.0315038487   1.263     0.607
## 1.5      0.47  0.2088066936   1.260     0.640
#run gdm model
gdm.dist <- gdm(gdm.data, geo = T)

The maximum height of each line in the following plots indicates the magnitude of total biological change along each environmental gradient and thereby corresponds to the relative importance of that predictor in contributing to biological turnover while holding all other variables constant (i.e. it is the partial ecological distance). Ecological distance (dissimilarity) is plotted on the y axis and the environmental gradient on the x.

#partial plots
plot(gdm.dist)

#predictions for mapping; layers must be in same order as gdm model the gdm.transform function gives us the model predictions from the gdm() function and projects them onto our raster layers.

gdm.trans.data <- gdm.transform(gdm.dist, layers)

Now we have, for each predictor variable, a prediction for dissimilarity based on our environmental covariates. We can plot them like so:

#plot
plot(gdm.trans.data)

In order to be able to map these different predictions of similarity we need a way to combine them into a single prediction output. One way is to transform the different layers to the same scale then then use the new variables produced to plot an RGB (Red-Green-Blue) image.

gdm.trans.data<- (gdm.trans.data/(maxValue(gdm.trans.data))) #normalize data to 0/1 scale



#plot
plot(gdm.trans.data)

As a last step here we can plot a map of the study area that combines each of the three layers produced on the previous step into one image using the plotRGB function. The resulting map gives a relative measure of spatial variation in bird species composition.

#plot with RGB, adding each layer
plotRGB(gdm.trans.data,r=5, g=4, b=3, scale=1)

Part Two: Inference (analysis of contributions to beta diversity)

In part two we will consider measurements of different levels of diversity (alpha, beta, gamma) and look at beta diversity in detail as a special case that takes on particular significance due it’s inherently spatial characteristics.

First install the packages for Part Two:

install.packages(c("adegraphics","adespatial","ggplot2"))

#package "vegetarian" is not available for recent versions of R so we have to use the devtools package to get it:

library("devtools")

devtools::install_version("vegetarian", version = "1.2")

Now call the libraries before starting the analysis:

# Load packages
library(adegraphics)#for plotting multivariate data
## Warning: package 'adegraphics' was built under R version 4.0.5
## Registered S3 methods overwritten by 'adegraphics':
##   method         from
##   biplot.dudi    ade4
##   kplot.foucart  ade4
##   kplot.mcoa     ade4
##   kplot.mfa      ade4
##   kplot.pta      ade4
##   kplot.sepan    ade4
##   kplot.statis   ade4
##   scatter.coa    ade4
##   scatter.dudi   ade4
##   scatter.nipals ade4
##   scatter.pco    ade4
##   score.acm      ade4
##   score.mix      ade4
##   score.pca      ade4
##   screeplot.dudi ade4
## 
## Attaching package: 'adegraphics'
## The following object is masked from 'package:raster':
## 
##     zoom
library(adespatial)#for beta diversity measures
## Registered S3 method overwritten by 'spdep':
##   method   from
##   plot.mst ape
## Registered S3 methods overwritten by 'adespatial':
##   method             from       
##   plot.multispati    adegraphics
##   print.multispati   ade4       
##   summary.multispati ade4
library(vegetarian)#for calculating Hill numbers
library(ggplot2)#for data visualization

The bird data used in Part One was appropriate for making prediction across space for a simple measure of diversity - species richness. However, if we want to carry out more detailed analyses of community structure across space we require data on abundance. For Part Two of this practical we use a well known dataset on a fish community in a European river system used in the doctoral thesis of Verneaux (1973;2003). These data were collected at 30 sites along the Doubs River, which runs near the France-Switzerland border in the Jura Mountains. The data is a matrix that contains coded abundances of 27 fish species.

spe<-read.csv("Doubs.csv")

First we calculate some basic single-number measures of diversity for this dataset:

#Calculation of species richness (Hill number =0), Shannon's Index (Hill number = 1) and Simpson's Index (Hill number =2)

# First consult the guidance on the d() function of the package "vegetarian" that facilitates this process. 
?d
## starting httpd help server ... done
# Mean alpha species richness
d(spe, lev = "alpha", q = 0)
## [1] 12.93103
# Mean alpha Shannon diversity
d(spe, lev = "alpha", q = 1)
## [1] 8.807188
# Mean alpha Simpson diversity
d(spe, lev = "alpha", q = 2, boot = TRUE)
## $D.Value
## [1] 5.701004
## 
## $StdErr
## [1] 0.1712793
# Multiplicative beta species richness
d(spe, lev = "beta", q = 0)
## [1] 2.088
# Multiplicative beta Shannon diversity
d(spe, lev = "beta", q = 1)
## [1] 2.24448
# Multiplicative beta Simpson diversity
d(spe, lev = "beta", q = 2, boot = TRUE)
## $D.Value
## [1] 2.660885
## 
## $StdErr
## [1] 0.1412974
# Gamma species richness
d(spe, lev = "gamma", q = 0)
## [1] 27
# Gamma Shannon diversity
d(spe, lev = "gamma", q = 1)
## [1] 19.76756
# Gamma Simpson diversity
d(spe, lev = "gamma", q = 2, boot = TRUE)
## $D.Value
## [1] 15.16971
## 
## $StdErr
## [1] 0.5297977

Now we can plot the change in beta diversity as the value for Hill number increases. What we are modelling here is the overall species turnover (difference between sites - y axis) when emphasis on evenness is increased (x axis).

First we create a dataframe to hold the beta diversity values for a nominal value range of 0 to 20.

# Plot multiplicative beta diversity vs order
mbeta <- data.frame(order = 0:20, beta = NA, se = NA)
for (i in 1:nrow(mbeta)) {
  out <- d(spe, lev = "beta", q = mbeta$order[i], boot = TRUE)
  mbeta$beta[i] <- out$D.Value # the result of the d() function is placed in the "beta" column of our data frame
  mbeta$se[i] <- out$StdErr # the standard error is placed in the "se" column of our data frame
}

mbeta
##    order     beta         se
## 1      0 2.088000 0.04045129
## 2      1 2.244480 0.05898979
## 3      2 2.660885 0.14060237
## 4      3 3.333756 0.23126054
## 5      4 4.083607 0.28244868
## 6      5 4.746571 0.26915585
## 7      6 5.268541 0.31069040
## 8      7 5.665132 0.27989195
## 9      8 5.967177 0.34797601
## 10     9 6.200845 0.36260963
## 11    10 6.384976 0.36134239
## 12    11 6.532633 0.38734760
## 13    12 6.652902 0.39525034
## 14    13 6.752215 0.40413964
## 15    14 6.835219 0.44503248
## 16    15 6.905340 0.37478391
## 17    16 6.965154 0.47712530
## 18    17 7.016627 0.42712673
## 19    18 7.061282 0.45794116
## 20    19 7.100315 0.37479211
## 21    20 7.134671 0.48180484

To visualize the result we call the ggplot() function to create the different components of the chart (this gives us more versatility than the base R plot() function).

ggplot(mbeta, aes(order, beta)) +  # the '+' symbol allows us to build up (add) multiple components to our chart
  geom_point() +
  geom_line() +
  geom_errorbar(
    aes(order, beta, ymin = beta - se, ymax = beta + se), # here we construct the error bars manually
    width = 0.2) + labs(y = "beta diversity", 
                        x = "Order of the diversity measure")

We can see that as evenness is increasingly emphasized in our diversity measures, the measure of beta diversity increases and does so most noticeable for Hill numbers between 2 and 10. This demonstrates the importance of considering the concepts of abundance and evenness in our estimates of diversity and underlines the limitations of numerical assessments such as species richness.

For the final part of this practical we will consider the contribution of individual sites to overall beta diversity, the Local Contribution to Beta Diversity (LCBD). We will use the the spe.beta() function that was written specifically for this purpose by Legendre and De Cáceres (2013). This function is provided by the “adespatial” package.

In order to pass our abundance data to the spe.beta() function the count data must be transformed to ratio data. We can do this using a popular transformation methods known as “Hellinger” transformation.

# Computation using beta.div {adespatial} on 
# Hellinger-transformed species data
spe.beta <- beta.div(spe, method = "hellinger", nperm = 9999)

# get the LCBD values
spe.beta$LCBD
##  [1] 0.07252040 0.04446478 0.04104101 0.02797626 0.02645407 0.01997878
##  [7] 0.03448857 0.04090284 0.03092578 0.03935674 0.03897281 0.04773390
## [13] 0.03469917 0.02892282 0.02303320 0.01719880 0.01606821 0.01941901
## [19] 0.02358674 0.02622679 0.03032694 0.06155305 0.05241202 0.04756547
## [25] 0.03046571 0.03078689 0.03168996 0.02372550 0.03750379
# p-values
spe.beta$p.LCBD
##  [1] 0.0001 0.0974 0.1850 0.8583 0.9057 0.9963 0.4960 0.1758 0.6899 0.2166
## [11] 0.2230 0.0293 0.4097 0.7560 0.9611 0.9993 0.9998 0.9947 0.9630 0.9050
## [21] 0.7357 0.0005 0.0095 0.0336 0.6937 0.6829 0.6270 0.9635 0.3017
# Holm correction for conservative estimate of significance where multiple tests are run
p.adjust(spe.beta$p.LCBD, "holm")
##  [1] 0.0029 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
## [11] 1.0000 0.7618 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
## [21] 1.0000 0.0140 0.2565 0.8400 1.0000 1.0000 1.0000 1.0000 1.0000
# Sites with significant Holm-corrected LCBD value
row.names(spe[which(p.adjust(spe.beta$p.LCBD, "holm") <= 0.05),])
## [1] "1"  "22"
dev.off()# clear the plotting pane before next step
## null device 
##           1

We read in the file containing the sampling locations (XY coordinates) and plot each site as a function of its contribution to beta diversity (larger size signifies greater contribution).

DoubsCoords<-read.csv("DoubsSpa.csv")

head(DoubsCoords)
##        X      Y
## 1 85.678 20.000
## 2 84.955 20.100
## 3 92.301 23.796
## 4 91.280 26.431
## 5 92.005 29.163
## 6 95.954 36.315
plot(DoubsCoords, 
     cex.axis = 0.8, 
     pch = 21, 
     col = "red", 
     bg = "white", 
     cex = spe.beta$LCBD * 70, 
     main = "LCBD values", 
     xlab = "x coordinate (km)", 
     ylab = "y coordinate (km)"
)

#Finally plot lines approximating the course of the river and add symbols denoting significant cases:

lines(DoubsCoords, col = "light blue")
text(76, 20, "***", cex = 1.2, col = "red")
text(80, 90, "***", cex = 1.2, col = "red")