Worked example 2: Simulated spatially and temporally random data for the UK

if (!requireNamespace("occAssess", quietly = TRUE)) devtools::install_github("https://github.com/robboyd/occAssess")
library(occAssess)

Introduction

This vignette provides a worked example of the functionality of occAssess. For a full tutorial, see the main vignette supplied in Boyd et al. (2021).

Occurrence data

In this worked example, we simulated data that approximates random distributions in space and time for five species. The data can be accessed from within occAssess as follows:


data(random)

spDat <- random

str(spDat)
#> 'data.frame':    5000 obs. of  25 variables:
#>  $ species           : Factor w/ 5 levels "Gungan","Hutt",..: 4 4 3 3 4 2 5 1 5 3 ...
#>  $ x                 : num  269500 479500 317500 337500 319500 ...
#>  $ y                 : num  730500 186500 572500 308500 475500 ...
#>  $ year              : int  2003 2000 2006 2004 2002 2002 2000 2006 2010 2010 ...
#>  $ spatialUncertainty: num  2117 7164 11458 7316 19586 ...
#>  $ identifier        : Factor w/ 2 levels "survey1","survey2": 1 1 1 1 2 2 1 2 2 2 ...
#>  $ clim1             : num  6.23 9.87 8.66 8.85 8.74 ...
#>  $ clim2             : num  11.4 16.1 14.1 14.5 14.1 ...
#>  $ clim3             : num  1.63 4.46 3.87 3.87 3.83 ...
#>  $ clim4             : num  1778 709 1126 871 1239 ...
#>  $ clim5             : num  229.3 72.1 124.2 92.3 141.3 ...
#>  $ clim6             : num  89.6 42.8 61.3 55.2 68.2 ...
#>  $ clim7             : num  32.2 15.9 24.4 17.4 26.7 ...
#>  $ clim8             : num  617 211 363 269 417 ...
#>  $ clim9             : num  281 154 193 177 215 ...
#>  $ clim10            : num  310 156 234 190 248 ...
#>  $ clim11            : num  574 189 337 245 361 ...
#>  $ clim12            : num  5.98 7.74 7.02 7.78 6.34 ...
#>  $ clim13            : num  36.1 37.2 39 39.3 36.7 ...
#>  $ clim14            : num  410 479 420 440 426 ...
#>  $ clim15            : num  15.5 21.9 18.9 20.2 18.3 ...
#>  $ clim16            : num  -1.024 1.131 0.918 0.384 1.021 ...
#>  $ clim17            : num  16.6 20.8 18 19.8 17.3 ...
#>  $ clim18            : num  2.36 6.2 5.4 4.76 6.6 ...
#>  $ clim19            : num  7.61 6.72 10.07 13.01 10.17 ...

Periods

In this example, we will specify five two-year periods spanning 2001 to 2010:


periods <- list(2001:2002, 2003:2004, 2005:2006, 2007:2008, 2009:2010)
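To make the structure of periods concrete, here is a minimal base R sketch of how a year can be matched to the period that contains it. The helper yearToPeriod is hypothetical and is not part of occAssess; it only illustrates that years falling outside every period (2000, for example) are not assigned to one.

```r
periods <- list(2001:2002, 2003:2004, 2005:2006, 2007:2008, 2009:2010)

# Hypothetical helper (not an occAssess function): return the index of the
# period containing each year, or NA if the year falls in no period.
yearToPeriod <- function(years, periods) {
  # One period index for every year listed in `periods`
  lookup <- rep(seq_along(periods), lengths(periods))
  lookup[match(years, unlist(periods))]
}

yearToPeriod(c(2000, 2001, 2004, 2010), periods)
#> [1] NA  1  2  5
```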

Functions

All of the functions in occAssess require two common arguments: dat and periods (outlined above). I will run through each function below, indicating where additional arguments are required. Generally, the functions in occAssess return a list with two elements: a ggplot2 object, with a separate panel for each level of identifier, and the data underpinning the plot.

assessRecordNumber

The first function I will introduce is the simplest: assessRecordNumber. This function simply plots out the number of records per year in your dataset.

This function enables researchers to quickly establish how the number of records has changed over time.

assessSpeciesNumber

In addition to the number of records, you may wish to know how the number of species (taxonomic coverage) in your dataset changes over time. For this you can use the function assessSpeciesNumber.
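The quantities summarised by assessRecordNumber and assessSpeciesNumber can be approximated by hand. The sketch below uses base R and a small made-up dataset (toy is an assumption for illustration, not the random dataset) to count records and distinct species per period and identifier. It is not the package's implementation.

```r
set.seed(1)

# Toy data (made up for illustration; not the `random` dataset)
toy <- data.frame(
  species    = sample(c("sp1", "sp2", "sp3"), 200, replace = TRUE),
  year       = sample(2001:2010, 200, replace = TRUE),
  identifier = sample(c("survey1", "survey2"), 200, replace = TRUE)
)

periods <- list(2001:2002, 2003:2004, 2005:2006, 2007:2008, 2009:2010)

# Label each record with the index of the period containing its year
toy$period <- NA_integer_
for (i in seq_along(periods)) {
  toy$period[toy$year %in% periods[[i]]] <- i
}

# Number of records per period and identifier
nRec <- table(period = toy$period, identifier = toy$identifier)

# Number of distinct species per period and identifier
nSpec <- tapply(toy$species, list(toy$period, toy$identifier),
                function(s) length(unique(s)))

nRec
nSpec
```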

assessSpeciesID

It has been speculated that apparent changes in taxonomic coverage could, in fact, reflect a change in taxonomic expertise over time. For example, if fewer individuals have the skill to identify certain species, then those species may not appear in your dataset in the later periods. The function assessSpeciesID treats the proportion of records identified to species level as a proxy for taxonomic expertise.

The argument “type” can take the values proportion (proportion of records identified to species level) or count (number of records identified to species level).

assessRarityBias

A number of studies have defined taxonomic bias in a dataset as the degree of proportionality between species' range sizes (usually proxied by the number of grid cells on which each species has been recorded) and the total number of records. One can regress the number of records on range size, and the residuals give an index of how over- or undersampled a species is given its prevalence. The function assessRarityBias conducts these analyses for each time period, and uses the R-squared value from the linear regressions as an index of proportionality between range sizes and number of records. Higher values indicate that species are sampled in proportion to their range sizes, whereas lower values indicate that some species are over- or undersampled.

Note the warning message which tells us that there are low numbers of species in some periods (not surprising as the data only contain five species). This represents a small sample size for the regression of number of records on range size, so the results should be viewed with caution.
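The regression described above can be sketched in base R for a single period. The per-species values below are made up, and the object names (sppSummary, rarityIndex) are my own; this illustrates the idea rather than reproducing the code inside assessRarityBias.

```r
# Made-up per-species summaries for one period
sppSummary <- data.frame(
  species   = c("sp1", "sp2", "sp3", "sp4", "sp5", "sp6"),
  rangeSize = c(12, 30, 45, 60, 80, 95),    # number of occupied grid cells
  nRecords  = c(20, 41, 70, 75, 130, 128)   # total number of records
)

# Regress number of records on range size
fit <- lm(nRecords ~ rangeSize, data = sppSummary)

# R-squared: index of proportionality between records and range size
rarityIndex <- summary(fit)$r.squared

# Residuals: positive values suggest a species is oversampled relative to
# its range size, negative values that it is undersampled
sampling <- setNames(residuals(fit), sppSummary$species)

rarityIndex
sampling
```

With only a handful of species, as in this toy table and in the simulated dataset, the R-squared value rests on very few points, which is why the warning above matters.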

assessSpatialCov

The function assessSpatialCov grids your data at a specified spatial resolution then maps it in geographic space. In this example, I provide a shapefile with the boundaries of the UK. If I were working on the WGS84 coordinate reference system (here I am using OSGB 36) this would not be necessary; I could instead use the countries argument and simply specify “UK”.

As you can see there are three new arguments to be specified. res is the spatial resolution at which you would like to map the data (units depend on your coordinate reference system, e.g. m if easting and northing, and decimal degrees if lon/lat); logCount indicates whether or not you would like to log10 transform the counts for visual purposes; and countries defines the countries covered by your data. Countries must be specified in order to plot their boundaries.
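To show what gridding at a resolution res and log10 transforming the counts amounts to, here is a minimal base R sketch. It uses the first five coordinates shown in str(spDat) plus one made-up point, and a 10 km cell size chosen for illustration. The variable names are my own and this is not how assessSpatialCov is implemented internally.

```r
res <- 10000  # cell size in metres (10 km), assuming easting/northing

# First five coordinates from str(spDat), plus one made-up point that
# falls in the same cell as the first
x <- c(269500, 479500, 317500, 337500, 319500, 261000)
y <- c(730500, 186500, 572500, 308500, 475500, 738000)

# Snap each coordinate to the lower-left corner of its grid cell
cellX <- as.integer(floor(x / res) * res)
cellY <- as.integer(floor(y / res) * res)
cellId <- paste(cellX, cellY, sep = "_")

# Number of records per occupied cell, and the log10 transformed counts
counts <- table(cellId)
logCounts <- log10(counts)

counts
```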

assessSpatialUncertainty

Point occurrence data often comes with associated spatial uncertainty (i.e. how uncertain the coordinates are in x and y dimensions). For example, GBIF data comes with a field called coordinateUncertaintyInMeters. The function assessSpatialUncertainty can be used to visualize spatial uncertainty in your dataset as a kernel density plot.
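A kernel density estimate of this kind can be sketched with stats::density from base R. The uncertainties below are simulated (an assumption for illustration), with one estimate per level of identifier.

```r
set.seed(42)

# Simulated spatial uncertainties in metres (made up for illustration)
unc <- data.frame(
  spatialUncertainty = runif(500, min = 0, max = 20000),
  identifier         = rep(c("survey1", "survey2"), each = 250)
)

# One kernel density estimate per level of identifier
dens <- lapply(split(unc$spatialUncertainty, unc$identifier),
               density, from = 0)

# Each element holds the evaluation grid (x) and the estimated density (y)
str(dens$survey1[c("x", "y")])
```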

assessSpatialBias

Even if your data has good spatial coverage, it may be biased; that is to say, it may deviate from a random distribution in space. The function assessSpatialBias provides an index of how far your data deviates from a random distribution. To do this it simulates an equal number of points to your data randomly across your study region. Then, for each time period, it calculates the average nearest neighbour distance across your data points and divides it by the average nearest neighbour distance from the random sample. If the index is lower than one then your data is more clustered than the random sample, and if it is above one it is more dispersed. To delineate your study area, you must provide a mask layer. The mask is a raster object which has numeric values within your study area, and is NA outside of your study area. Here, I'll use some species distribution model outputs for the UK as a mask layer.

The argument nSamps indicates how many random distributions should be drawn, and the argument degrade = TRUE indicates that any duplicated coordinates within a time period and for a given level of identifier are removed. The shaded regions on the plot indicate the 5th and 95th percentiles of the nearest neighbour index calculated over nSamps random samples.
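The nearest neighbour index can be illustrated with a small base R simulation. In this sketch the study region is a 100 km square rather than a raster mask, the observed points are deliberately clustered, and the helper meanNnDist is hypothetical; all of these are assumptions for illustration rather than the internals of assessSpatialBias.

```r
set.seed(123)

# Hypothetical helper: mean distance from each point to its nearest neighbour
meanNnDist <- function(xy) {
  d <- as.matrix(dist(xy))
  diag(d) <- Inf               # ignore each point's distance to itself
  mean(apply(d, 1, min))
}

n <- 100

# Made-up observed points, clustered near the centre of a 100 km square
observed <- cbind(x = rnorm(n, 50000, 5000), y = rnorm(n, 50000, 5000))

# Compare against nSamps random samples of equal size from the same square
nSamps <- 50
nnIndex <- replicate(nSamps, {
  randomPts <- cbind(x = runif(n, 0, 1e5), y = runif(n, 0, 1e5))
  meanNnDist(observed) / meanNnDist(randomPts)
})

# Central estimate and the 5th and 95th percentiles across random samples
median(nnIndex)
quantile(nnIndex, c(0.05, 0.95))
```

Because the observed points here are tightly clustered, the index falls well below one for every random sample.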

assessEnvBias

Spatial bias in your dataset does not necessarily tell you anything about environmental bias. The function assessEnvBias assesses the degree to which your data are biased across time periods in environmental space. To do this we first need to get some climate data. I will use the standard suite of 19 bioclimatic variables from WorldClim. It is possible to get this data through R using the raster package, but here I will use my local version for speed.

assessEnvBias conducts a principal component analysis on your environmental data, then maps your occurrence data in environmental space.
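The underlying idea, a principal component analysis followed by a comparison of periods in the space of the first two components, can be sketched with stats::prcomp. The three climate variables and the period labels below are simulated (assumptions for illustration), and the code is not taken from assessEnvBias.

```r
set.seed(7)

# Simulated environmental values at 300 record locations (made up)
env <- data.frame(
  clim1 = rnorm(300, mean = 9, sd = 1),
  clim2 = rnorm(300, mean = 14, sd = 2),
  clim3 = rnorm(300, mean = 1000, sd = 300)
)
period <- sample(1:5, 300, replace = TRUE)

# PCA on centred and scaled variables
pca <- prcomp(env, center = TRUE, scale. = TRUE)

# Position of each record on the first two principal components
scores <- data.frame(pca$x[, 1:2], period = factor(period))

# Proportion of variance explained by PC1 and PC2
summary(pca)$importance["Proportion of Variance", 1:2]

# Mean position of each period in environmental space; periods that sit
# far apart would suggest a shift in the environments sampled
aggregate(cbind(PC1, PC2) ~ period, data = scores, FUN = mean)
```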

It is also possible to modify the appearance of envBias$plot using additional arguments that can be passed to ggfortify::autoplot. For example, you can include ellipses, use different principal components, include variable vectors, etc.