Week 8 Diversity

In this week’s practical we will consider two principal concerns in spatial ecology that we have looked at largely in isolation on this course: prediction and inference. Both of these analytical approaches are important in ecology and conservation science. From the perspective of biodiversity, the ability to interpret ecological data and predict losses (mitigation) and gains (intervention or restoration) is of particular interest to scientists. In the first half of this practical we will look at options for making predictions of species diversity based on a simple assessment of species richness (number of species in a given area) and similarity (how alike are sites with each other based on community composition). In the second half we will carry out analysis on a data set of geo-located species richness and abundance data to make inferences about site contribution to beta diversity.

Part One: Prediction

The basic approach taken to predicting species richness in the landscape is, at its simplest, an extention of the species distribution modelling work carried out in weeks 2 and 3. Two general approaches are possible: predict first, then assemble and assemble first then predict. In the predict first-then assemble approach the procedure is to carry out species distribution models for all species in a dataset (typically from transect observations) and then summarize the models through map algebra. In the assemble first-then predict approach, species richness itself is modelled as the response variable (in the case of this week’s practical, of a general linear model) with a single prediction based on the resulting model.

The data for this part of the practical are taken from a bird distribution survey in Montana and Idaho, USA (Hutto and Young 2002). Sampling locations consist of point counts (100-m radius), along a transect (10 points/transect; transects are approximately 3 km long), with transects randomly selected within USFS Forest Regions across Montana and Idaho.

First, copy the data from “Diversity” folder in the course dropbox (get the link from the content page on Canvas).

Set your working directory and load the necessary libraries to carry out the analysis for Part One:

setwd("./week8")

#install packages
install.packages(c("terra","gdm","vegan","devtools","ggplot2","tidyr"))

#load packages
library(terra)           #for raster covariate data;

## terra 1.9.1

library(gdm)              #generalized dissimilarity modeling;
library(vegan)             #for ecological indices

## Loading required package: permute

library(sf)               #spatial objects

## Linking to GEOS 3.13.0, GDAL 3.8.5, PROJ 9.5.1; sf_use_s2() is TRUE

library(tidyr)            #data handling

## 
## Attaching package: 'tidyr'

## The following object is masked from 'package:terra':
## 
##     extract

Modeling communities

For both approaches mentioned above, we need raster (environmental) covariates in order to make our species distribution predictions. These consist of layers representing elevation, canopy and precipitation. First we load in these layers and resample the precipitation layer, which has a different resolution to the other layers, using the resample() function. Note the canopy raster is derived from a Principal Component Analysis and is measured on a scale from -1 (no cover) to 1 (full cover).

#layers
Elev <- rast("elev.tif")                           #elevation layer
Canopy <- rast("canopy.tif")                          #linear gradient in canopy 



Precip <- rast("precip.tif")                       #precip



#reformat Precip to same res/extent as elevation
Precip <- resample(x = Precip, y = Elev, "bilinear") #for continuous data
Precip <- mask(Precip, Elev)

plot(Precip)

#convert Precip to meters
Precip <- Precip / 100
plot(Precip)

layers <- c(Canopy, Elev, Precip)          
names(layers) <- c("canopy", "elev", "precip")

#plot
plot(layers)

Now read in the bird data.

#-------------#
#bird data
#-------------#

birds <- read.csv("birdcommunity.csv")

#inspect the data
head(birds)

##    TRANSECT STOP LAT_WGS84 LONG_WGS84 SPECIES PRES
## 1 452511619    5  45.34436  -116.4082    AMDI    0
## 2 452511619    6  45.34442  -116.4122    AMDI    0
## 3 452511619    9  45.34158  -116.4129    AMDI    0
## 4 452511625    1  45.41867  -116.4189    AMDI    0
## 5 452511625    4  45.41836  -116.4150    AMDI    0
## 6 452511625    6  45.41575  -116.4162    AMDI    0

#create sf object
birds_sf <- st_as_sf(birds, coords = c("LONG_WGS84", "LAT_WGS84"), crs = "epsg:4326")
#transform CRS for sites to layers CRS
birds_sf <- st_transform(birds_sf, crs = crs(layers))
#new df with x-y coordinates
birds_df <- data.frame(birds, st_coordinates(birds_sf))
head(birds_df)

##    TRANSECT STOP LAT_WGS84 LONG_WGS84 SPECIES PRES        X        Y
## 1 452511619    5  45.34436  -116.4082    AMDI    0 59142.22 173151.8
## 2 452511619    6  45.34442  -116.4122    AMDI    0 58834.36 173185.7
## 3 452511619    9  45.34158  -116.4129    AMDI    0 58754.24 172876.0
## 4 452511625    1  45.41867  -116.4189    AMDI    0 59037.42 181450.2
## 5 452511625    4  45.41836  -116.4150    AMDI    0 59336.71 181389.1
## 6 452511625    6  45.41575  -116.4162    AMDI    0 59220.69 181108.2

We need to render our data in the appropriate format with sampling sites taking up the rows and species the columns. We can carry this out using the pivot_wider() function from the tidyr package:

#make a site-species data.frame

# re-order data to wide format (note the output is a "tibble" - an data object format used by the tidyverse family of R packages - but it's basically the same as a table)
species_site =pivot_wider(birds_df, names_from=SPECIES, values_from = PRES)

head(species_site)

## # A tibble: 6 × 120
##    TRANSECT  STOP LAT_WGS84 LONG_WGS84      X       Y  AMDI  AMGO  AMKE  AMRE
##       <int> <int>     <dbl>      <dbl>  <dbl>   <dbl> <int> <int> <int> <int>
## 1 452511619     5      45.3      -116. 59142. 173152.     0     1     0     0
## 2 452511619     6      45.3      -116. 58834. 173186.     0     0     0     0
## 3 452511619     9      45.3      -116. 58754. 172876.     0     0     0     0
## 4 452511625     1      45.4      -116. 59037. 181450.     0     0     0     0
## 5 452511625     4      45.4      -116. 59337. 181389.     0     0     0     0
## 6 452511625     6      45.4      -116. 59221. 181108.     0     0     0     0
## # ℹ 110 more variables: AMRO <int>, ATTW <int>, BADO <int>, BBMA <int>,
## #   BBWO <int>, BCCH <int>, BEKI <int>, BHCO <int>, BHGR <int>, BOCH <int>,
## #   BRCR <int>, CAFI <int>, CAHU <int>, CANG <int>, CANW <int>, CAQU <int>,
## #   CAVI <int>, CBCH <int>, CCSP <int>, CEDW <int>, CHSP <int>, CHUK <int>,
## #   CLNU <int>, COFL <int>, COHA <int>, COME <int>, CORA <int>, COYE <int>,
## #   DEJU <int>, DOWO <int>, DUFL <int>, DUGR <int>, EAKI <int>, EVGR <int>,
## #   FOSP <int>, GCKI <int>, GRAJ <int>, GRCA <int>, HAFL <int>, HAWO <int>, …

From the data frame we have just created, subset the site and species data only to create a matrix (i.e. drop the transect, point and x,y columns). We then subset the data further to include only species that occur at 20 or more sampling points. Note the use of the %in% command for matching elements within different objects (in this case column names in these two data frames).

#site X species matrix (choose all columns from columns 7 as currently columns 1:6 do not contain species data)

spp_matrix <- species_site[,7:ncol(species_site)]

#inspect
head(spp_matrix)

## # A tibble: 6 × 114
##    AMDI  AMGO  AMKE  AMRE  AMRO  ATTW  BADO  BBMA  BBWO  BCCH  BEKI  BHCO  BHGR
##   <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
## 1     0     1     0     0     0     0     0     0     0     0     0     0     0
## 2     0     0     0     0     1     0     0     1     0     1     0     0     1
## 3     0     0     0     0     1     0     0     1     0     0     0     0     0
## 4     0     0     0     0     1     0     0     0     0     1     0     0     0
## 5     0     0     0     0     1     0     0     0     0     1     0     0     1
## 6     0     0     0     0     1     0     0     0     0     0     0     0     1
## # ℹ 101 more variables: BOCH <int>, BRCR <int>, CAFI <int>, CAHU <int>,
## #   CANG <int>, CANW <int>, CAQU <int>, CAVI <int>, CBCH <int>, CCSP <int>,
## #   CEDW <int>, CHSP <int>, CHUK <int>, CLNU <int>, COFL <int>, COHA <int>,
## #   COME <int>, CORA <int>, COYE <int>, DEJU <int>, DOWO <int>, DUFL <int>,
## #   DUGR <int>, EAKI <int>, EVGR <int>, FOSP <int>, GCKI <int>, GRAJ <int>,
## #   GRCA <int>, HAFL <int>, HAWO <int>, HETH <int>, HOFI <int>, HOWR <int>,
## #   KILL <int>, LAZB <int>, LBCU <int>, LEFL <int>, LISP <int>, MALL <int>, …

#subset based on frequency of occurrence
prevalence <- colSums(spp_matrix)
prevalence_20 <- prevalence[prevalence > 20]
species_20 <- names(prevalence_20)
species_matrix <- data.frame(spp_matrix[,colnames(spp_matrix) %in% species_20])

Now we have our columns of species occurence for each site we need to extract values for each site from our environmental covariates using the extract function which by now should be very familiar.We use square brackets [] to select coordinates from the species_site object for our extraction points.

#extract covariates for sites
site_cov <- terra::extract(layers, species_site[,c("X", "Y")])

#inspect
head(site_cov)

##   ID        canopy     elev    precip
## 1  1 -2.887155e-06 0.862175 0.4700000
## 2  2 -2.957113e-03 0.957150 0.4700000
## 3  3 -8.889201e-02 1.084013 0.4782500
## 4  4 -9.533008e-02 1.184963 0.6000000
## 5  5 -1.105027e-01 1.223212 0.5977500
## 6  6  2.360085e-02 1.273920 0.6160375

Before moving on to the modelling we can calculate the Sorenson Dissimilarity Index for each pair of sites as a summary statistic:

#sorenson index with methd = "bray" for binary data

sorenson <- vegdist(species_matrix, method = "bray")
sorenson_mat <- as.matrix(sorenson)

View(sorenson_mat)

Now let’s get to the predictions. We have multiple species to predict (the number of columns in species_matrix) so naturally we need to create a list to hold our raster objects (maps) for each prediction.

######################################################################
# Predict first, assemble later
######################################################################

#Create lists to save the species models
pred_map <- list()                         #stores map predictions


#create vector for the number of species presence/absence records
Nspecies <- 1:ncol(species_matrix)


#create vectors of predictors for simple processing
canopy <- site_cov[,"canopy"]
elev <- site_cov[,"elev"]
precip <- site_cov[,"precip"]

Now we create a for loop to iterate over each species, perform a glm model using our environmental covariates (this is exactly the same as for the glm models we looked at in Weeks 1 and 2) and populate the list with the resulting raster predictions.

#Run GLM model for each species

for (i in Nspecies){
 

  species.i <- glm(species_matrix[,i] ~ canopy + poly(elev,2) + precip, family = binomial)

  #predictions for mapping
  prob_pred <- predict(layers,species.i,  type = "response")  
  
  
  pred_map[[i]] <- prob_pred
  print(i) # print each iteration to be sure the loop is working

}

## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
## [1] 11
## [1] 12
## [1] 13
## [1] 14
## [1] 15
## [1] 16
## [1] 17
## [1] 18
## [1] 19
## [1] 20
## [1] 21
## [1] 22
## [1] 23
## [1] 24
## [1] 25
## [1] 26
## [1] 27
## [1] 28
## [1] 29
## [1] 30
## [1] 31
## [1] 32
## [1] 33
## [1] 34
## [1] 35
## [1] 36
## [1] 37
## [1] 38
## [1] 39
## [1] 40
## [1] 41
## [1] 42
## [1] 43
## [1] 44
## [1] 45
## [1] 46
## [1] 47
## [1] 48
## [1] 49
## [1] 50
## [1] 51
## [1] 52
## [1] 53

#stack all rasters together
prob_map_stack <- c(pred_map)

#assign names
names(prob_map_stack) <- colnames(species_matrix)

#Plot of predicted distribution for first species
plot(prob_map_stack$AMRO, xlab = "Longitude", ylab = "Latitude",
     main="Predicted map for AMRO - Predict first, assemble later")

So we have our familiar probability maps but this week we are in the business of predicting species richness. We therefore need to do two things: 1. decide on a sensible threshold to delineate predicted presence/absence and 2. once all maps are converted to a binary 1/0 (presence/absence), sum all the rasters to return a species richness prediction (remember richness is just the total number of species).

In previous weeks we used the Receiver Operator’s Curve to select a threshold that maximized true positive and minimized false positive classifications. Here, for the purpose of demonstration, we will use a simpler method based on the overall ratio of presence records to total sampling sites. For example, if a species is detected 60 times across 100 sampling sites then, by this rationale, the threshold for predicted occurrence in other locations is set to 60/100 = 0.6.

#use threshold to predict 0/1

Nsites <- nrow(species_matrix)

spp_thres_map <- pred_map
for (i in Nspecies){
  thresh_i <- sum(species_matrix[,i]) / Nsites

  spp_thres_map[[i]][which(spp_thres_map[[i]][] > thresh_i)] <- 1
  spp_thres_map[[i]][which(spp_thres_map[[i]][] <= thresh_i)] <- 0
}

# make a multilayered file (basically a multi-band raster) for species distributions
spp_thres_map <- c(spp_thres_map)

Now we have a list of our truncated raster values (to values of either 0 or 1), we can use the app() function from the terra package to sum these rasters together. app() works a little like the apply() function but iterates specifically over bands in a raster stack.

## Computing species richness

spp_thres_map <- app(rast(spp_thres_map), fun = "sum", na.rm = TRUE)

## |---------|---------|---------|---------|=========================================

#plot
plot(spp_thres_map)

The second approach to this same problem of predicting species richness is to calculate species richness from the data as the first step (by summing the 0s and 1s for each site) and enter this as the response variable of a single model. Note that instead of binomial here we use the Poisson family which should be used when the response variable consists of count data.

###########################################################################
# Assemble first-then predict 
###########################################################################


 
#calculate species richness
richness <- rowSums(species_matrix)

#Poisson glm for use with count data
pois_rich <- glm(richness ~ canopy + poly(elev,2) + precip, family = poisson)

#inspect
summary(pois_rich)

## 
## Call:
## glm(formula = richness ~ canopy + poly(elev, 2) + precip, family = poisson)
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     2.72843    0.03541  77.056  < 2e-16 ***
## canopy         -0.23588    0.03180  -7.417 1.20e-13 ***
## poly(elev, 2)1 -1.48210    0.29084  -5.096 3.47e-07 ***
## poly(elev, 2)2 -1.18025    0.29456  -4.007 6.15e-05 ***
## precip         -0.18285    0.04001  -4.570 4.89e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 775.20  on 781  degrees of freedom
## Residual deviance: 652.92  on 777  degrees of freedom
## AIC: 4071.4
## 
## Number of Fisher Scoring iterations: 4

#map the Poisson model
pois_raster <- predict(layers,pois_rich, type = "response") 




plot(pois_raster, xlab = "Longitude", ylab = "Latitude",
     main="Predicted species richness - Assemble first, predict later")

#make a map of differences between models of richness
spp_diff <- spp_thres_map - pois_raster
plot(spp_diff, xlab = "Longitude", ylab = "Latitude", main="Difference in richness models")

#Raster stack of different species richness predictions
richness_stack <- c(spp_thres_map, pois_raster)
names(richness_stack) <- c("thres-rich", "pois-rich")

#correlation among richness predictions
richness_map_corr <- layerCor(richness_stack, "pearson")

#inspect
richness_map_corr

## $correlation
##            thres-rich pois-rich
## thres-rich  1.0000000 0.9029699
## pois-rich   0.9029699 1.0000000
## 
## $mean
##            thres-rich pois-rich
## thres-rich        NaN  22.64767
## pois-rich    12.49485       NaN
## 
## $n
##            thres-rich pois-rich
## thres-rich        NaN   2479106
## pois-rich     2479106       NaN

#remove big files no longer needed
rm(pred_map)
rm(spp_thres_map)

We can see considerable variation (difference) between the predicted species richness using our two approaches. In general stacked species distribution models (S-SDMs) are thought to over-predict species richness and do not take into account biotic interactions (e.g. inter-specific competition). This tendency to over-predict is borne out in our results here.

For the final step of Part One we consider the relationship between site similarity and our environmental variables. This works in as similar way as for distribution modelling only here our response variable is continuous (similarity) rather than binary (presence/absence). The gdm() function calculates the “generalized” dissimilarity of locations (a measure of beta diversity) and tries to predict them using a set of predictor variables (in our case the precipitation, elevation and canopy raster layers).

#format data for gdm
siteID <- 1:nrow(species_matrix)
site_xy <- data.frame(x = species_site$X, y = species_site$Y)

#combine data on species
gdm_species_matrix <- data.frame(cbind(siteID, site_xy, species_matrix))

#create data frame for site characteristcs (predictor variables)
gdm_site_matrix <- data.frame(cbind(siteID, site_cov[,2:ncol(site_cov)]))

#inspect
head(gdm_site_matrix)

##   siteID        canopy     elev    precip
## 1      1 -2.887155e-06 0.862175 0.4700000
## 2      2 -2.957113e-03 0.957150 0.4700000
## 3      3 -8.889201e-02 1.084013 0.4782500
## 4      4 -9.533008e-02 1.184963 0.6000000
## 5      5 -1.105027e-01 1.223212 0.5977500
## 6      6  2.360085e-02 1.273920 0.6160375

head(gdm_species_matrix)

##   siteID        x        y AMRO BCCH BHCO BHGR BRCR CAFI CAHU CAVI CBCH CHSP
## 1      1 59142.22 173151.8    0    0    0    0    0    0    0    0    0    1
## 2      2 58834.36 173185.7    1    1    0    1    0    0    0    0    0    1
## 3      3 58754.24 172876.0    1    0    0    0    0    0    0    0    0    0
## 4      4 59037.42 181450.2    1    1    0    0    0    0    0    1    1    1
## 5      5 59336.71 181389.1    1    1    0    1    0    0    0    0    1    1
## 6      6 59220.69 181108.2    1    0    0    1    0    0    0    1    0    0
##   CLNU CORA DEJU DUFL EVGR FOSP GCKI GRAJ HAFL HAWO HETH HOWR LAZB MGWA MOBL
## 1    0    0    0    0    0    0    0    0    0    0    0    0    1    1    0
## 2    0    0    1    0    0    0    0    0    0    0    0    0    1    1    0
## 3    0    0    0    0    0    0    0    0    0    0    0    0    1    0    0
## 4    0    0    0    1    0    0    1    0    1    0    0    0    0    1    0
## 5    0    0    1    1    0    0    1    0    1    0    0    0    0    1    0
## 6    0    0    1    1    0    1    0    1    0    0    0    1    0    1    0
##   MOCH NAWA NOFL OCWA OSFL PAWR PISI PIWO RBNU RCKI RECR RNSA RUGR RUHU SOSP
## 1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
## 2    0    1    0    0    0    0    0    0    0    0    0    0    0    0    1
## 3    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0
## 4    0    0    1    0    0    1    0    0    1    0    0    0    0    0    0
## 5    0    1    0    1    0    0    0    0    1    0    0    0    1    0    0
## 6    1    0    1    1    0    0    0    0    1    0    0    0    1    0    0
##   SPTO STJA SWTH TOSO TOWA VATH WAVI WBNU WETA WISA WIWA YEWA YRWA
## 1    1    0    0    0    0    0    0    0    0    0    0    0    0
## 2    1    0    0    0    0    0    1    0    1    0    0    1    0
## 3    1    0    0    0    0    0    0    0    0    0    0    0    0
## 4    0    0    1    0    1    1    0    0    1    0    0    0    1
## 5    1    0    1    0    1    0    0    0    1    0    0    0    0
## 6    1    1    1    0    0    0    1    0    1    0    0    0    1

Get a gdm formatted object using the formatsitepair() function - this function re-formats the data into a data frame with all site pairs as rows and with columns that represent the distance between sites based on the Sorenson index (method=“bray” for binary data) and predictor variables (canopy, precipitation and elevation) for each site.

gdm_data <- formatsitepair(gdm_species_matrix, bioFormat = 1, dist = "bray", abundance = F,
                         XColumn = "x", YColumn = "y", siteColumn = "siteID", predData = gdm_site_matrix)

#take a look at the reformatted data

head(gdm_data)

##      distance weights s1.xCoord s1.yCoord s2.xCoord s2.yCoord     s1.canopy
## 1   0.5294118       1  59142.22  173151.8  58834.36  173185.7 -2.887155e-06
## 1.1 0.5000000       1  59142.22  173151.8  58754.24  172876.0 -2.887155e-06
## 1.2 0.8095238       1  59142.22  173151.8  59037.42  181450.2 -2.887155e-06
## 1.3 0.7272727       1  59142.22  173151.8  59336.71  181389.1 -2.887155e-06
## 1.4 0.8333333       1  59142.22  173151.8  59220.69  181108.2 -2.887155e-06
## 1.5 0.8400000       1  59142.22  173151.8  59020.85  180919.5 -2.887155e-06
##      s1.elev s1.precip    s2.canopy  s2.elev s2.precip
## 1   0.862175      0.47 -0.002957113 0.957150 0.4700000
## 1.1 0.862175      0.47 -0.088892013 1.084013 0.4782500
## 1.2 0.862175      0.47 -0.095330082 1.184963 0.6000000
## 1.3 0.862175      0.47 -0.110502735 1.223212 0.5977500
## 1.4 0.862175      0.47  0.023600850 1.273920 0.6160375
## 1.5 0.862175      0.47  0.250101894 1.279144 0.6400000

Now run the generalised dissimilarity model, including environmental covariates and xy coordinates (geo=TRUE)

#run gdm model
gdm_dist <- gdm(gdm_data, geo = TRUE)

The maximum height of each line in the following plots indicates the magnitude of total biological change along each environmental gradient and thereby corresponds to the relative importance of that predictor in contributing to dissimilarity between sites while holding all other variables constant (i.e. it is the partial ecological distance). Ecological distance (dissimilarity) is plotted on the y axis and the environmental gradient on the x.

#partial plots
plot(gdm_dist)

Predictions for mapping - layers for prediction must be in same order as in the gdm model. The gdm.transform() function gives us the model predictions from the gdm() function and projects them onto our raster layers.

#transform data for mapping
gdm_transform_data <- gdm.transform(gdm_dist, layers)

Now we have, for each predictor variable, a prediction for dissimilarity based on our environmental covariates. We can plot them like so:

#plot
plot(gdm_transform_data)

In order to be able to map these different predictions of similarity we need a way to combine them into a single prediction output. One way is to transform the different layers to the same scale then then use the new variables produced to plot an RGB (Red-Green-Blue) image. The first step is to render all values on the same scale for plotting which we can do inside a for loop.

The “global” part in the loop below is just taking each raster layer and returning the max value.

#normalize data to 0/1 scale 

#initialise list
layerNorm=list()

#for loop to iterate through raster bands and divide each by the max value for that band.
for(i in 1:nlyr(gdm_transform_data)){
  layer.i=gdm_transform_data[[i]]
  layer.i=layer.i/as.numeric(global(layer.i,"max",na.rm=T))
  layerNorm[[i]]=layer.i
}

#combine as a single raster stack
gdm_transform_data=c(rast(layerNorm))

As a last step here we can plot a map of the study area that combines each of the three layers produced on the previous step into one image using the plotRGB function. The resulting map gives a relative measure of spatial variation in bird species composition.

#plot with RGB, adding each layer
plotRGB(gdm_transform_data,r=5, g=4, b=3, scale=1)

Part Two: Inference (analysis of contributions to beta diversity)

In part two we will consider measurements of different levels of diversity (alpha, beta, gamma) and look at beta diversity in detail as a special case that takes on particular significance due it’s inherently spatial characteristics.

#package "vegetarian" is not available for recent versions of R so we have to use the devtools package to get it:



library(devtools)

devtools::install_version("vegetarian", version = "1.2")

Now call the libraries before starting the analysis:

# Load packages

library(vegetarian)#for calculating Hill numbers
library(ggplot2)#for data visualization

The bird data used in Part One was appropriate for making prediction across space for a simple measure of diversity - species richness. However, if we want to carry out more detailed analyses of community structure across space we require data on abundance. For Part Two of this practical we use a well known dataset on a fish community in a European river system used in the doctoral thesis of Verneaux (1973;2003). These data were collected at 30 sites along the Doubs River, which runs near the France-Switzerland border in the Jura Mountains. The data is a matrix that contains coded abundances of 27 fish species.

spe<-read.csv("Doubs.csv")

First we calculate some basic single-number measures of diversity for this dataset:

#Calculation of species richness (q =0), Shannon's Index (q = 1) and Simpson's Index (q =2)

# First consult the guidance on the d() function of the package "vegetarian" that facilitates this process. 
?d
# Mean alpha species richess (q=0)
d(spe, lev = "alpha", q = 0)

## [1] 12.93103

# Mean alpha Shannon diversity (q=1)
d(spe, lev = "alpha", q = 1)

## [1] 8.807188

# Mean alpha Simpson diversity (q=2)
d(spe, lev = "alpha", q = 2, boot = TRUE)

## $D.Value
## [1] 5.701004
## 
## $StdErr
## [1] 0.1943825

# Multiplicative beta q=0
d(spe, lev = "beta", q = 0)

## [1] 2.088

# Multiplicative beta q=1
d(spe, lev = "beta", q = 1)

## [1] 2.24448

# Multiplicative beta q=2
d(spe, lev = "beta", q = 2, boot = TRUE)

## $D.Value
## [1] 2.660885
## 
## $StdErr
## [1] 0.1385421

# Gamma q=0
d(spe, lev = "gamma", q = 0)

## [1] 27

# Gamma q=1
d(spe, lev = "gamma", q = 1)

## [1] 19.76756

# Gamma q=2
d(spe, lev = "gamma", q = 2, boot = TRUE)

## $D.Value
## [1] 15.16971
## 
## $StdErr
## [1] 0.4450072

Now we can plot the change in beta diversity as the value for q increases. What we are modelling here is the overall species turnover (difference between sites - y axis) when emphasis on evenness is increased (x axis).

First we create a dataframe to hold the beta diversity values for a nominal value range of 0 to 20.

# Plot multiplicative beta diversity vs order
mbeta <- data.frame(order = 0:20, beta = NA, se = NA)
for (i in 1:nrow(mbeta)) {
  out <- d(spe, lev = "beta", q = mbeta$order[i], boot = TRUE)
  mbeta$beta[i] <- out$D.Value # the result of the d() function is placed in the "beta" column of our data frame
  mbeta$se[i] <- out$StdErr # the standard error is placed in the "se" column of our data frame
}

#inspect
mbeta

##    order     beta         se
## 1      0 2.088000 0.04418305
## 2      1 2.244480 0.06681063
## 3      2 2.660885 0.14265403
## 4      3 3.333756 0.21780584
## 5      4 4.083607 0.24922862
## 6      5 4.746571 0.29509736
## 7      6 5.268541 0.33171273
## 8      7 5.665132 0.30277753
## 9      8 5.967177 0.37454362
## 10     9 6.200845 0.41624856
## 11    10 6.384976 0.37240988
## 12    11 6.532633 0.39816737
## 13    12 6.652902 0.46538009
## 14    13 6.752215 0.37557534
## 15    14 6.835219 0.41645745
## 16    15 6.905340 0.41768199
## 17    16 6.965154 0.44647407
## 18    17 7.016627 0.47889145
## 19    18 7.061282 0.42326517
## 20    19 7.100315 0.38292418
## 21    20 7.134671 0.47516015

To visualize the result we call the ggplot() function to create the different components of the chart (this gives us more versatility than the base R plot() function).

ggplot(mbeta, aes(order, beta)) +  # the '+' symbol allows us to build up (add) multiple components to our chart
  geom_point() +
  geom_line() +
  geom_errorbar(
    aes(order, beta, ymin = beta - se, ymax = beta + se), # here we construct the error bars manually
    width = 0.2) + labs(y = "beta diversity", 
                        x = "Order of the diversity measure")

We can see that as evenness is increasingly emphasized in our diversity measures, the measure of beta diversity increases and does so most noticeable for Hill numbers between 2 and 10. This demonstrates the importance of considering the concepts of abundance and evenness in our estimates of diversity and underlines the limitations of numerical assessments such as species richness.

For the final part of this practical we will consider the contribution of individual sites to overall beta diversity, the Local Contribution to Beta Diversity (LCBD).

In order to compute LCBD for each site we need to create a function that will 1. transform the count data to ratio (continuous) format, 2. get the square distance from the study mean for each site for each species and 3. sum the values for each site from step two and divide by the study total to compute the relative contribution of each site to beta diversity . To get the data in a ratio format we will use a method known as “Hellinger” transformation () which is calculated as the square root of the abundance of each species in a site divided by the total abundance in that site.

# Computation of LCBD 
#build function to compute LCBD
# Hellinger-transformed species data


function_lcbd=function(x){

#carry out Hellinger transformation
spe1=decostand(x, method = 'hellinger')

#creat an empty matrix to hold results
ss_mat=spe1

ss_mat[]=0


#for loop to iterate over each column 
for(i in 1:ncol(spe1)){
  #select column i
  sp.i=spe1[,i]
  #get mean of column i (this is the mean abundance for species i over all sites)
  col.i_mean=mean(sp.i)
  #get the difference between site abundance and mean abundance and square it
  beta.i=sapply(sp.i,FUN=function(x) (x-col.i_mean)^2)
  #add to results matrix
  ss_mat[,i]=beta.i
}

#get sum of all values
ss_total=sum(ss_mat)
#site local contribution to beta diversity is the result for each site (summed for all species) divided by the sum of all values in the matrix 
site_LCBD=rowSums(ss_mat)/ss_total

#return contribution for each site
return(site_LCBD)

}

site_LCBD_vals=function_lcbd(spe)

We read in the file containing the sampling locations (XY coordinates) and plot each site as a function of its contribution to beta diversity (larger size signifies greater contribution).

DoubsCoords<-read.csv("DoubsSpa.csv")

head(DoubsCoords)

##        X      Y
## 1 85.678 20.000
## 2 84.955 20.100
## 3 92.301 23.796
## 4 91.280 26.431
## 5 92.005 29.163
## 6 95.954 36.315

plot(DoubsCoords, 
     cex.axis = 0.8, 
     pch = 21, 
     col = "black", 
     bg = "red", 
     cex =  site_LCBD_vals* 70, 
     main = "LCBD values", 
     xlab = "x coordinate (km)", 
     ylab = "y coordinate (km)"
)

#Finally plot lines approximating the course of the river

lines(DoubsCoords, col = "light blue")

Week 8 Diversity

MD

23/03/2026