The question

It is often interesting to ask where species occur in relation to climate variables. We often like to compare ‘species from a dry climate’ to a ‘species from a wet climate’, and put some numbers on it. For example, statements like “the mean rainfall for the species across its distribution is 1000mm”. It may also be interesting to estimate quantiles of the rainfall or temperature, for example “mean annual rainfall for the species across its range was is between 200 - 400mm (95% quantile)”. How do we get these numbers?

A naive and biased approach

We may take the following simple approach:

Retrieve species occurrence records from wonderful sources like ALA (Australia) or GBIF (the world), as longitude-latitude points.
At these exact point locations, extract long-term climate data from the amazing WorldClim resource.
For the (often many thousands) of observations, calculate your statistics of interest (mean MAP, mean MAT, quantiles, whatever).

The problem with this approach lies in the nature of the species occurrence records. These observations are not where the species actually occurs, rather it is a heavily biased record of where the species has been observed to occur. In practice this means that some areas will be sampled heavily (areas close to major cities, national parks or research stations, next to roads - particularly in remote area, etc.). These problems are well documented.

With biased occurrence records, climate metrics will be equally biased. For example it can be expected that remote (often dry) areas will be more often undersampled than populated (often wetter) areas, as well as coastal vs. inland areas.

An alternative approach

To remove some of this bias, in particular oversampling/undersampling, we suggest converting the occurrence data into presence/absence grid cells, at the same spatial resolution as the finest resolution of Worldclim (10min = 18.6 x 18.6km at the equator). Then, climate metrics can be calculated for the grid cells where the species is found to occur, regardless of how often it was sampled. This approach ensures that poorly sampled areas are equally weighted in the climate envelope estimates.

Note that neither this approach or the naive approach above is insured against wrong observations, outliers, or missing observations. More work can be done to clean raw occurrence data from the ALA, and in fact a lot of information is given by ALA in terms of data quality that can be used. In the following I use the records as-is.

Implementation in R

First download this file with functions I wrote to do all of the necessary steps:

# Downloads the script into your working directory
url <- "https://bitbucket.org/!api/2.0/snippets/remkoduursma/pp6R4/263ee377b87bf487b3687c90755f7ef658f71151/files/ala_gbif_worldclim_species_envelope.r"
download.file(url, basename(url))
 
# and source
source("ala_gbif_worldclim_species_envelope.r")

Next load the following packages, install from CRAN first if you don’t have them:

library(raster)

## Loading required package: sp

library(rgbif)  # when using GBIF
library(ALA4R)  # when using ALA

The function worldclim_presence does all steps at once (see last section of this document), but I will show the individual steps first, for illustration of the process.

Step 1. Get occurrences

Using the ALA4R package, we can now quickly get all species occurrences in Australia. Within the functions defined in the script above, use the get_occurrences_ala function, like this.

euca <- get_occurrences_ala("Eucalyptus camaldulensis")

Use get_occurrences_gbif if you want global records (for Australian species, use ALA since that will have far more records within Australia). Also note that GBIF is very slow, but ALA is very fast.

A simple map is quickly produced:

library(oz)
par(mfrow=c(1,2), mar=c(4,4,1,1))

oz()
with(euca, points(longitude, latitude, pch=".", col="red"))

oz(xlim=c(138,139), ylim=c(-36, -32))
with(euca, points(longitude, latitude, pch=".", col="red"))

Raw occurrences for River red gum across Australia or a small section of SA (right panel).

For river red gum we can spot where major rivers are (especially note the Murray and Darling). Perhaps MAT and MAP are not the best metrics for the climate envelope for this species - but that’s up to you to decide.

Step 2. Rasterize occurrences

Next we make a raster of presences of the species occurrences. The following code does this for you, and note that the resolution can not be changed as it is fixed to the Worldclim resolution.

euca_ras <- rasterize_occurrences(euca)

Remaking the map from above we see what was done:

library(oz)
par(mfrow=c(1,2), mar=c(4,4,1,1))

oz()
with(euca_ras, points(longitude, latitude, pch=".", col="red"))

oz(xlim=c(138,139), ylim=c(-36, -32))
with(euca_ras, points(longitude, latitude, pch=16, cex=0.3, col="red"))

Rasterized occurrences for River red gum across Australia or a small section of SA (right panel).

It is also easy to see that the spatial weighting is quite different, as these simple frequency graphs show. We see heavy oversampling in the low latitudes, and to some extent in the East.

par(cex.main=0.9, mfrow=c(2,2), mar=c(4,4,1,1), mgp=c(2,0.5,0), tcl=0.2)
hist(euca$longitude, xlim=c(90, 160), breaks=100, main="Raw occurrences", xlab="Longitude")
hist(euca$latitude, xlim=c(-40,-10), breaks=100, main="Raw occurrences", xlab="Latitude")
hist(euca_ras$longitude, xlim=c(90, 160), breaks=100, main="Rasterized", xlab="Longitude")
hist(euca_ras$latitude, xlim=c(-40,-10), breaks=100, main="Rasterized", xlab="Latitude")

Frequency histograms of raw or rasterized occurrences by latitude or longitude for River red gum.

Step 3. Extract Worldclim observations

Next we can extract the climate observations from Worldclim. Some packages have functionality for this, but I rewrote it as the implementations are slow and not flexible. The first step is to download the layers from the online source, the function get_worldclim_rasters will download it to either a temporary file or a folder of your choosing (in which case the layers will not be downloaded each time).

Worldclim stores monthly long-term average precipitation, and temperature (mean, max, min). Here I extract only precipitation and mean temperature, some more work is needed to extract any of these variables as well as one of many ‘bioclimatic’ variables. You have the option to download annual averages and monthly values for every occurrence in the data, or return simple summaries (mean and quantiles of precip and temperature). This is controlled by the return= argument in the example below.

You can use the function get_worldclim_rasters to download the layers, but you don’t have to do that since other functions will run this for you.

The following code extracts climate data for all raw occurrences and the rasterized version. I suggest adding the argument topath = "c:/mydata" for example - so the layers are not redownloaded each time. I have omitted that here.

##                    species     n  lat_mean long_mean MAT_mean  MAT_q05
## 1 Eucalyptus camaldulensis 28874 -33.18432  141.4881 16.50053 13.19167
##   MAT_q95 MAP_mean MAP_q05 MAP_q95
## 1   25.35 563.6131     256     847

##                    species    n  lat_mean long_mean MAT_mean MAT_q05
## 1 Eucalyptus camaldulensis 3054 -28.10604   139.279 19.73766   13.35
##    MAT_q95 MAP_mean MAP_q05 MAP_q95
## 1 27.09708 526.1125     199     976

In this particular example we see quite large differences in mean MAT across occurrence records between raw data and the rasterized version. We can also look at the distribution across occurrences, like this:

clim_occ <- get_worldclim_prectemp(euca, return="all")

clim_ras <- get_worldclim_prectemp(euca_ras, return="all")

It should be mentioned that these objects contain a row for every latitude/longitude pair in the dataset, and the output contains monthly variables (prec_1, prec_2, and tmean_1 and so on), as well as annual averages temperature (MAT) and total precipitation (MAP).

Finally you can see that the distribution of climate across the species’ range is quite different when using the raw data or rasterized occurrences.

par(cex.main=0.9, mfrow=c(2,2), mar=c(4,4,1,1), mgp=c(2,0.5,0), tcl=0.2)
with(clim_occ, hist(MAT, main="Raw occurrences", xlim=c(10, 30)))
with(clim_occ, hist(MAP, main="Raw occurrences", xlim=c(0,1800)))
with(clim_ras, hist(MAT, main="Rasterized", xlim=c(10, 30)))
with(clim_ras, hist(MAP, main="Rasterized", xlim=c(0,1800)))

Frequency histograms of mean annual temperature (MAT) or precipitation (MAP) across the raw occurrence locations or rasterized locations.

Simple range maps and climate envelopes

We can all steps at once via,

eute <- worldclim_presence("Eucalyptus tereticornis", return="summary")

We can get summaries for multiple species via:

eucs <- worldclim_presence(c("Eucalyptus tereticornis","Corymbia calophylla","Eucalyptus globulus"),
                             return="summary")

knitr::kable(eucs, digits=2)

species	n	lat_mean	long_mean	MAT_mean	MAT_q05	MAT_q95	MAP_mean	MAP_q05	MAP_q95
Eucalyptus tereticornis	991	-27.66	150.03	18.44	13.04	23.35	1049.66	645.25	1741.0
Corymbia calophylla	274	-33.45	121.41	16.06	14.29	18.56	790.81	413.20	1172.5
Eucalyptus globulus	620	-37.93	147.46	12.49	9.18	15.13	970.12	623.60	1523.8

Or even all (rasterized) occurrences with climate data with return="all". Note that the result is a list.

eucs <- worldclim_presence(c("Eucalyptus tereticornis","Corymbia calophylla","Eucalyptus globulus"),
                             return="all")

We can use this list for a range map:

library(oz)
oz()

palette(c("red2","cornflowerblue","forestgreen"))

for(i in seq_along(eucs)){
  with(eucs[[i]], points(longitude, latitude, pch=16, cex=0.2, col=palette()[i]))
}
legend("topleft", names(eucs), fill=palette(), cex=0.7)

Or to plot climate envelopes:

plot(1, type='n', xlim=c(0,30), ylim=c(0,2500),
     xlab="MAT", ylab="MAP")
for(i in seq_along(eucs)){
  with(eucs[[i]], points(MAT, MAP, pch=16, cex=0.8, col=palette()[i]))
}
legend("topleft", names(eucs), fill=palette(), cex=0.7)

Calculating species climate envelopes in R

Remko Duursma

10 November 2016