It is often interesting to ask where species occur in relation to climate variables. We often like to compare ‘species from a dry climate’ to a ‘species from a wet climate’, and put some numbers on it. For example, statements like “the mean rainfall for the species across its distribution is 1000mm”. It may also be interesting to estimate quantiles of the rainfall or temperature, for example “mean annual rainfall for the species across its range was is between 200 - 400mm (95% quantile)”. How do we get these numbers?
We may take the following simple approach:
The problem with this approach lies in the nature of the species occurrence records. These observations are not where the species actually occurs, rather it is a heavily biased record of where the species has been observed to occur. In practice this means that some areas will be sampled heavily (areas close to major cities, national parks or research stations, next to roads - particularly in remote area, etc.). These problems are well documented.
With biased occurrence records, climate metrics will be equally biased. For example it can be expected that remote (often dry) areas will be more often undersampled than populated (often wetter) areas, as well as coastal vs. inland areas.
To remove some of this bias, in particular oversampling/undersampling, we suggest converting the occurrence data into presence/absence grid cells, at the same spatial resolution as the finest resolution of Worldclim (10min = 18.6 x 18.6km at the equator). Then, climate metrics can be calculated for the grid cells where the species is found to occur, regardless of how often it was sampled. This approach ensures that poorly sampled areas are equally weighted in the climate envelope estimates.
Note that neither this approach or the naive approach above is insured against wrong observations, outliers, or missing observations. More work can be done to clean raw occurrence data from the ALA, and in fact a lot of information is given by ALA in terms of data quality that can be used. In the following I use the records as-is.
First download this file with functions I wrote to do all of the necessary steps:
# Downloads the script into your working directory
url <- "https://bitbucket.org/!api/2.0/snippets/remkoduursma/pp6R4/263ee377b87bf487b3687c90755f7ef658f71151/files/ala_gbif_worldclim_species_envelope.r"
download.file(url, basename(url))
# and source
source("ala_gbif_worldclim_species_envelope.r")
Next load the following packages, install from CRAN first if you don’t have them:
library(raster)
## Loading required package: sp
library(rgbif) # when using GBIF
library(ALA4R) # when using ALA
The function worldclim_presence
does all steps at once (see last section of this document), but I will show the individual steps first, for illustration of the process.
Using the ALA4R package, we can now quickly get all species occurrences in Australia. Within the functions defined in the script above, use the get_occurrences_ala
function, like this.
euca <- get_occurrences_ala("Eucalyptus camaldulensis")
Use get_occurrences_gbif
if you want global records (for Australian species, use ALA since that will have far more records within Australia). Also note that GBIF is very slow, but ALA is very fast.
A simple map is quickly produced:
library(oz)
par(mfrow=c(1,2), mar=c(4,4,1,1))
oz()
with(euca, points(longitude, latitude, pch=".", col="red"))
oz(xlim=c(138,139), ylim=c(-36, -32))
with(euca, points(longitude, latitude, pch=".", col="red"))
Raw occurrences for River red gum across Australia or a small section of SA (right panel).
For river red gum we can spot where major rivers are (especially note the Murray and Darling). Perhaps MAT and MAP are not the best metrics for the climate envelope for this species - but that’s up to you to decide.
Next we make a raster of presences of the species occurrences. The following code does this for you, and note that the resolution can not be changed as it is fixed to the Worldclim resolution.
euca_ras <- rasterize_occurrences(euca)
Remaking the map from above we see what was done:
library(oz)
par(mfrow=c(1,2), mar=c(4,4,1,1))
oz()
with(euca_ras, points(longitude, latitude, pch=".", col="red"))
oz(xlim=c(138,139), ylim=c(-36, -32))
with(euca_ras, points(longitude, latitude, pch=16, cex=0.3, col="red"))
Rasterized occurrences for River red gum across Australia or a small section of SA (right panel).
It is also easy to see that the spatial weighting is quite different, as these simple frequency graphs show. We see heavy oversampling in the low latitudes, and to some extent in the East.
par(cex.main=0.9, mfrow=c(2,2), mar=c(4,4,1,1), mgp=c(2,0.5,0), tcl=0.2)
hist(euca$longitude, xlim=c(90, 160), breaks=100, main="Raw occurrences", xlab="Longitude")
hist(euca$latitude, xlim=c(-40,-10), breaks=100, main="Raw occurrences", xlab="Latitude")
hist(euca_ras$longitude, xlim=c(90, 160), breaks=100, main="Rasterized", xlab="Longitude")
hist(euca_ras$latitude, xlim=c(-40,-10), breaks=100, main="Rasterized", xlab="Latitude")
Frequency histograms of raw or rasterized occurrences by latitude or longitude for River red gum.
Next we can extract the climate observations from Worldclim. Some packages have functionality for this, but I rewrote it as the implementations are slow and not flexible. The first step is to download the layers from the online source, the function get_worldclim_rasters
will download it to either a temporary file or a folder of your choosing (in which case the layers will not be downloaded each time).
Worldclim stores monthly long-term average precipitation, and temperature (mean, max, min). Here I extract only precipitation and mean temperature, some more work is needed to extract any of these variables as well as one of many ‘bioclimatic’ variables. You have the option to download annual averages and monthly values for every occurrence in the data, or return simple summaries (mean and quantiles of precip and temperature). This is controlled by the return=
argument in the example below.
You can use the function get_worldclim_rasters
to download the layers, but you don’t have to do that since other functions will run this for you.
The following code extracts climate data for all raw occurrences and the rasterized version. I suggest adding the argument topath = "c:/mydata"
for example - so the layers are not redownloaded each time. I have omitted that here.
## species n lat_mean long_mean MAT_mean MAT_q05
## 1 Eucalyptus camaldulensis 28874 -33.18432 141.4881 16.50053 13.19167
## MAT_q95 MAP_mean MAP_q05 MAP_q95
## 1 25.35 563.6131 256 847
## species n lat_mean long_mean MAT_mean MAT_q05
## 1 Eucalyptus camaldulensis 3054 -28.10604 139.279 19.73766 13.35
## MAT_q95 MAP_mean MAP_q05 MAP_q95
## 1 27.09708 526.1125 199 976
In this particular example we see quite large differences in mean MAT across occurrence records between raw data and the rasterized version. We can also look at the distribution across occurrences, like this:
clim_occ <- get_worldclim_prectemp(euca, return="all")
clim_ras <- get_worldclim_prectemp(euca_ras, return="all")
It should be mentioned that these objects contain a row for every latitude/longitude pair in the dataset, and the output contains monthly variables (prec_1
, prec_2
, and tmean_1
and so on), as well as annual averages temperature (MAT) and total precipitation (MAP).
Finally you can see that the distribution of climate across the species’ range is quite different when using the raw data or rasterized occurrences.
par(cex.main=0.9, mfrow=c(2,2), mar=c(4,4,1,1), mgp=c(2,0.5,0), tcl=0.2)
with(clim_occ, hist(MAT, main="Raw occurrences", xlim=c(10, 30)))
with(clim_occ, hist(MAP, main="Raw occurrences", xlim=c(0,1800)))
with(clim_ras, hist(MAT, main="Rasterized", xlim=c(10, 30)))
with(clim_ras, hist(MAP, main="Rasterized", xlim=c(0,1800)))
Frequency histograms of mean annual temperature (MAT) or precipitation (MAP) across the raw occurrence locations or rasterized locations.
We can all steps at once via,
eute <- worldclim_presence("Eucalyptus tereticornis", return="summary")
We can get summaries for multiple species via:
eucs <- worldclim_presence(c("Eucalyptus tereticornis","Corymbia calophylla","Eucalyptus globulus"),
return="summary")
knitr::kable(eucs, digits=2)
species | n | lat_mean | long_mean | MAT_mean | MAT_q05 | MAT_q95 | MAP_mean | MAP_q05 | MAP_q95 |
---|---|---|---|---|---|---|---|---|---|
Eucalyptus tereticornis | 991 | -27.66 | 150.03 | 18.44 | 13.04 | 23.35 | 1049.66 | 645.25 | 1741.0 |
Corymbia calophylla | 274 | -33.45 | 121.41 | 16.06 | 14.29 | 18.56 | 790.81 | 413.20 | 1172.5 |
Eucalyptus globulus | 620 | -37.93 | 147.46 | 12.49 | 9.18 | 15.13 | 970.12 | 623.60 | 1523.8 |
Or even all (rasterized) occurrences with climate data with return="all"
. Note that the result is a list.
eucs <- worldclim_presence(c("Eucalyptus tereticornis","Corymbia calophylla","Eucalyptus globulus"),
return="all")
We can use this list for a range map:
library(oz)
oz()
palette(c("red2","cornflowerblue","forestgreen"))
for(i in seq_along(eucs)){
with(eucs[[i]], points(longitude, latitude, pch=16, cex=0.2, col=palette()[i]))
}
legend("topleft", names(eucs), fill=palette(), cex=0.7)
Or to plot climate envelopes:
plot(1, type='n', xlim=c(0,30), ylim=c(0,2500),
xlab="MAT", ylab="MAP")
for(i in seq_along(eucs)){
with(eucs[[i]], points(MAT, MAP, pch=16, cex=0.8, col=palette()[i]))
}
legend("topleft", names(eucs), fill=palette(), cex=0.7)