Species Distribution Models

One of the fundamental challenges for current ecologists is to anticipate the responses of complex ecological systems to anthropogenic climate change. The distribution and abundance of species, the diversity and structure of communities, and the functioning of whole ecosystems is strongly influenced by climatic patterns of temperature, precipitation, and their seasonal changes. And due fossil fuel combustion, deforestation, and other anthropogenic impacts, Earth’s climate is warming at an unprecedented rate, changing not just local mean temperatures, but also atmospheric and oceanic circulation patterns, which determine the seasonality and extremes of precipitation and temperature.

One way to estimate how ecological systems will change in response to climate is to model how the favorable habitats available to a particular species will shift under different climate change scenarios. In a sense, this approach is based on considering changing climate in terms of the fundamental physiological tolerances of a species, in an attempt to describe the climatic or environmental niche of the species: the full set of climatic conditions under which the species is likely to occur. Once the “climatic space” occupied by a species is characterized, it can be mapped into geographic space, based either on current climatological data or projections based on models of future climate change. These approaches, known as Environmental Niche Models or Species Distribution Models have matured to the point that they are one of the foundational methods that ecologists use to make climate change predictions.

Our goal in this exercise is to develop species distribution models for a set of species native to our local environment here in Ohio, to analyze the relative importance of different climatic factors in determining the species’ geographic range, and to apply our model to predict how the species will respond to climate change over the next half century based on the latest generation of climate change scenarios.

This exercise adapts code from Fiona Spooner at University College London, from the Zarnetske Spatial Ecology lab at Michigan State, and from the Species Distribution Modeling vignette that accompanies the ‘dismo’ package from Hijmans and Elith (2011). We have also installed the java executable file for the maxent model.

Case Study: The Timber Rattlesnake, Crotalus horridus

As an example, we will model the geographic distribution of the timber rattlesnake, Crotalus horridus, which is one of two native rattlesnakes in Ohio. Based on field observations of C. horridus at specific locations, we will use climate data to model the distribution of the species. After analyzing the model to see which climate variables are the strongest predictors of the species’ range, we will then use climate change predictions for 2070 to assess how the suitability of C. horridus habitat will change over the region.

Timber Rattlesnake

Timber Rattlesnake

To get started, we need several libraries, which all need to be installed up front.

## Loading required package: sp
## rgdal: version: 1.1-10, (SVN revision 622)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 1.11.4, released 2016/01/25
##  Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.3/Resources/library/rgdal/gdal
##  Loaded PROJ.4 runtime: Rel. 4.9.1, 04 March 2015, [PJ_VERSION: 491]
##  Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.3/Resources/library/rgdal/proj
##  Linking to sp version: 1.2-3
## Checking rgeos availability: TRUE

Data Access and Preliminary Visualization

Our modeling effort is going to be based on data from a couple of large “ecoinformatic” efforts. The current and future environmental data are drawn from the Worldclim project (http://worldclim.org, Hijmans et al. 2005), which has collected globally standardized climate coverages at very high resolution. Specifically, we will be using their data at a resolution of 2.5 arc seconds of latitude and longitude. Specifically, we will be examining “bioclimatic” variables that are likely to influence species distributions. We can download the data directly using the ‘getData()’ function from the ‘raster’ package

currentEnv=getData("worldclim", var="bio", res=2.5)

Worldclim also compiles matched future climate projections for a large collection of global climate models (GCM’s) used in the phase 5 Coupled Model Intercomparison Project (CMIP5). Specifically, we are going to use estimates from the Hadley Centre Global Environmental Model version 2, with Earth System components (HADGEM2-ES). Earth System models include a bit more ecology, allowing landcover (vegetation) and hydrology to change more dynamically as the climate changes. To estimate change, we are going to use the most extreme climate change scenario (RCP8.5) which assumes that greenhouse gas emissions will continue to increase through 2100. We are downloading the predictions for 2070.

futureEnv=getData('CMIP5', var='bio', res=2.5, rcp=85, model='HE', year=70)

The climate data have lots of different variables, many of which our correlated with one another. To make things a bit simpler, we will limit ourselves to a bit smaller set of bioclimatic predictors.

currentEnv=dropLayer(currentEnv, c("bio2", "bio3", "bio4", "bio10", "bio11", "bio13", "bio14", "bio15"))
futureEnv=dropLayer(futureEnv, c("bio2", "bio3", "bio4", "bio10", "bio11", "bio13", "bio14", "bio15"))

Our remaining predictors are:

  • BIO1 = Annual Mean Temperature
  • BIO5 = Max Temperature of Warmest Month
  • BIO6 = Min Temperature of Coldest Month
  • BIO7 = Temperature Annual Range (BIO5-BIO6)
  • BIO8 = Mean Temperature of Wettest Quarter
  • BIO9 = Mean Temperature of Driest Quarter
  • BIO12 = Annual Precipitation
  • BIO16 = Precipitation of Wettest Quarter
  • BIO17 = Precipitation of Driest Quarter
  • BIO18 = Precipitation of Warmest Quarter
  • BIO19 = Precipitation of Coldest Quarter

Now that we have our environmental data we need to get records of our species occurrences, so we can know what the climate is like where they are known to occur. Such occurrence data have been compiled for millions of species worldwide by the Global Biodiversity Information Facility (GBIF, http://gbif.org). The ‘dismo’ package has a ‘gbif()’ function for accessing data directly. All you need is the scientific name for the species.

## 5851 records found
## 0-
## 300-
## 600-
## 900-
## 1200-
## 1500-
## 1800-
## 2100-
## 2400-
## 2700-
## 3000-
## 3300-
## 3600-
## 3900-
## 4200-
## 4500-
## 4800-
## 5100-
## 5400-
## 5700-
## 5851 records downloaded

Before we run our model, we need to validate the observational data to make sure that they have geographic locations and that all the observations are reasonable. First, we will remove all the observations without latitudes and longitudes.

rattler=subset(rattler, !is.na(lon) & !is.na(lat)) # the ! symbol can be used to mean 'not'

Next, we want to find and eliminate any duplicate locations.

rattlerdups=duplicated(rattler[, c("lon", "lat")])
rattler <-rattler[!rattlerdups, ]

Now let’s look at a preliminary map to see how our C. horridus observations are distributed. We’ll make the map extend 1 degree around the observations

plot(wrld_simpl, xlim=c(min(rattler$lon)-1,max(rattler$lon)+1), ylim=c(min(rattler$lat)-1,max(rattler$lat)+1), axes=TRUE, col="light yellow")
points(rattler$lon, rattler$lat, col="orange", pch=20, cex=0.75)