Lab 3: Running a species distribution model in Maxent

A lot of the material in these practicals are adapted from Hijmans and Elith (2011) - https://cran.r-project.org/web/packages/dismo/vignettes/sdm.pdf

Aims

To run an ecological niche model (species distribution model) using Maxent
To set a threshold of occurence
To map the model results

The basis for this session’s lab is a software program developed by Steven Phillips, Miro Dudik and Rob Schapire, with support from AT&T Labs-Research, Princeton University, and the American Museum of Natural History. The software is free and can be downloaded from:

https://biodiversityinformatics.amnh.org/open_source/maxent/

You should download the latest version (requires entering your name, institution, and email, and agreeing to the terms of use). It is simplest to download the Maxent.zip file, which includes maxent.jar and maxent.bat. The files should be unzipped and saved in the same directory. Note that the program can be used on any computer running Java version 1.4 or later (the Java runtime environment can be obtained from java.sun.com/javase/downloads; this is pre-installed in the UCL classrooms).

The model is described in detail in the following paper.

Phillips, S. J., R. P. Anderson, M. Dudík, R. E. Schapire, M. Blair. 2017. Opening the black box: an open-source release of Maxent. Ecography 40:887–893.

Phillips, S. J., R. P. Anderson, and R. E. Schapire. 2006. Maximum entropy modeling of species geographic distributions. Ecological Modelling 190:231-259.

You might want to also download the tutorial provided on the website. Dr Phillips’ tutorial provides an introduction to the many features of Maxent, which you may be interested to explore further. However, in this document I pick out and the things you should complete for this course. I also outline some additional processes, including how to explore alternative thresholds and how to import your results in to ArcMap.

There’s no need for our purposes to download the example datasets available on the web page. The data you have prepared in Labs 1 and 2 is formatted (almost!) for use with Maxent, so you should use your own case study species.

Part 1: Make a Maxent model

To perform a model run, you need to supply a file containing species occurrence localities (‘samples’), a directory containing environmental variables, and an output directory. You can use the species occurrence localities that you obtained in Lab 1.

Next, in Maxent, use the browse button under ‘Samples’ to locate your .csv file.

For your environmental layers, use the bioclim layers that you clipped to the appropriate region and converted to ASCII grids in Lab 2. You could also include other environmental variables, for example from remote sensing. All layers should be in ASCII grid format and all must have exactly the same header information. Use the Browse button under ‘Environmental layers’ to locate your ASCII grids, but note that while browsing for the environmental variables you are looking for the directory that contains them (you don’t need to browse down to the files in the directory).

Select your own Output directory, e.g. “~/SDM_Course/Output”

For our purposes, we will use the logistic output format in Maxent (check the dropdown menu under ‘output format’ is set to logistic) and we will ‘make pictures of predictions’ (ensure the box is checked). Other parts of the interface can be left as default. Now, your interface should look similar to this:

Now you are ready to run the model: hit Run!

You will see progress bars as the model is created (the run may take a few minutes). On finishing, multiple output files are produced. Look first at the html file (e.g., Capra_ibex.html). This includes a picture of your model. The image uses colors to indicate predicted probability that conditions are suitable, with red indicating high probability of suitable conditions for the species, green indicating conditions typical of those where the species is found, and lighter shades of blue indicating low predicted probability of suitable conditions.

The species occurrence localities that you used to build the model are shown as white squares.

Here is the Maxent model for the Alpine ibex, which I have used as an example:

Does your model make sense? Are the patterns as you expected?

Part 2: Setting a decision threshold in R

The picture in the html file is for quick viewing, but it is useful to import the data into R for drawing a neater map and for subsequent analysis. Notice that the software has created an ASCII grid file in your output directory (e.g., Capra_ibex.asc). This file contains the model results, exactly as plotted in the html file, and can be imported into R, ArcMap (or another GIS).

First of all let’s install the packages needed for this section, you don’t need to install them again if you have already done so for a previous lab.

install.packages("raster")
install.packages("maps")
install.packages("maptools")
install.packages("ggplot2")
install.packages("scales")
install.packages("rgdal")

And load them in using the library function:

library(raster)
library(maps)
library(mapdata)
library(ggplot2)
library(scales)
library(rgdal)

Let’s load back in the species location data so we can plot it on top of the model results.

Next let’s input this file back into R as a raster and have a look at it. We can also get the extent of the raster and store these in some objects that we will use later.

capout <- raster("Capra_ibex.asc")
plot(capout)
points(capg$lon, capg$lat, pch="+")

xmin <- extent(capout)[1]
xmax <- extent(capout)[2]
ymin <- extent(capout)[3]
ymax <- extent(capout)[4]

Now, it is often desirable to set a decision threshold (threshold of occurrence) above which the environment is considered to be suitable. Our goal is thus to convert the probability of occurrence that is predicted by Maxent to a binary ‘present/absent’ prediction. There a several possible approaches to setting a threshold, but the two common ones that we will look at here are the ‘minimum training presence’ (the largest possible threshold, and hence smallest area, that does not result in omission of any of the occurrence records) and the ‘10 percentile training presence’ (the largest threshold that leaves out 10% of occurrence records). The Maxent software calculates these thresholds for you see the table in the html file, for example:

We are working with Logistic model output, so we can ignore the first column. The second column gives the number we will use. The fourth column informs us the fraction of the study are that is considered ‘present’ or ‘suitable’ if we use this threshold, and the fifth column tells us the proportion of occurrence records that are omitted from the prediction (the omission rate). Of course, the omission rate must be 0.0 for the minimum training presence threshold (since the threshold is set to not allow any omission) and the omission rate must be 0.1 (or very close) for the 10 percentile threshold (since the threshold is set to give 10% omission). Note that we will look at this table in more detail in Lab 4.

For now, we can use the thresholds from this table (e.g., 0.011, 0.262) to redraw our map as a binary prediction. To do this we need to create a matrix to set the thresholds for the areas which are considered ‘suitable’ or ‘unsuitable’. The first column in the matrix created below represents the ‘from’ values and the second column represents the ‘to’ values and the third column sets what those values should be set to. So here we are setting anything 0-0.011 to 0 and anything 0.011-1 as 1. Note that these values will probably be different for you and will depend on the threshold values that your model suggests.

We then use the reclassify function to reclassify the values in the model output as either 0 (species absent) or 1 (species present). We can plot out the resulting map of predicted presence and absence.

m <- c(0, 0.011, 0, 0.011, 1, 1)
rclmat <- matrix(m, ncol = 3, byrow = TRUE)

rclmat

##       [,1]  [,2] [,3]
## [1,] 0.000 0.011    0
## [2,] 0.011 1.000    1

rc <- reclassify(capout, rclmat)

plot(rc)
map(
  'worldHires',
  xlim = c(xmin, xmax),
  ylim = c(ymin, ymax),
  fill = F,
  add = T
)
points(capg$lon, capg$lat, pch = "+")

Does this map look sensible to you? If the distribution is larger than you expected you might need to increase the threshold values.

Let’s try the 10 percentile training presence threshold to see what difference that makes.

n <- c(0, 0.262, 0, 0.262, 1, 1)
rclmat2 <- matrix(n, ncol = 3, byrow = TRUE)

rclmat2

##       [,1]  [,2] [,3]
## [1,] 0.000 0.262    0
## [2,] 0.262 1.000    1

rc2 <- reclassify(capout, rclmat2)

plot(rc2)
map(
  'worldHires',
  xlim = c(xmin, xmax),
  ylim = c(ymin, ymax),
  fill = F,
  add = T
)
points(capg$lon, capg$lat, pch = "+")

Does this one look more realistic? Or does it look smaller than you might expect? If both the values look unrealistic you might want to pick a number in between.

There are lots of different ways to plot things in R - the package ggplot2 has vast capabilities but first of all we need to download some data.

Go to http://www.naturalearthdata.com/downloads/

The scale of your study will influence what data you download. If your species is very broadly distributed - across 2+ continents then you will need to download large scale data. If your study covers multiple countries then download medium scale data, if it is restricted to a smaller area then download the small scale data. I have chosen the small scale data as my study covers a fairly restricted area.

You will need to download the Cultural -> Admin 0 – Countries data and also Physical -> Land data

You can use the readOGR() function to load these shapefiles into R. After “dsn=” you need to put the file path to the folder containing the .shp file and then after “layer=” you need to put the name of the shape file (.shp) but not including the .shp suffix.

The fortify function changes shapefiles to dataframes, this is the format that ggplot requires.

world <-
  readOGR(dsn = "C:/Users/Fiona/Documents/GIS/ne_10m_land", layer = "ne_10m_land")

## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\Fiona\Documents\GIS\ne_10m_land", layer: "ne_10m_land"
## with 10 features
## It has 3 fields

world_df <- fortify(world)

## Regions defined for each Polygons

countries <-
  readOGR(dsn = "C:/Users/Fiona/Documents/GIS/ne_10m_admin_0_countries", layer =
            "ne_10m_admin_0_countries")

## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\Fiona\Documents\GIS\ne_10m_admin_0_countries", layer: "ne_10m_admin_0_countries"
## with 255 features
## It has 94 fields
## Integer64 fields read as strings:  POP_EST NE_ID

countries_df <- fortify(countries)

## Regions defined for each Polygons

We need to format the data for plotting with ggplot.

proj4string(rc2) <- CRS("+proj=longlat")

proj4string(capout) <- CRS("+proj=longlat")

rcp <- rasterToPoints(rc2)
rcpdf <- data.frame(rcp)
#Make appropriate column headings
colnames(rcpdf) <- c("Longitude", "Latitude", "Capra")

capout_pts <- rasterToPoints(capout)
capout_df <- data.frame(capout_pts)
colnames(capout_df) <- c("Longitude", "Latitude", "Capra")

Setting the theme options for ggplot.

theme_opts <- list(
  theme(
    panel.grid.minor = element_blank(),
    panel.grid.major = element_blank(),
    panel.background = element_rect(fill = 'white', colour = NA),
    plot.background = element_rect(),
    axis.line = element_blank(),
    axis.text.x = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks = element_blank(),
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    plot.title = element_text()
  )
)

ggplot(data = capout_df, aes(y = Latitude, x = Longitude)) +
  geom_polygon(data = world_df, aes(long, lat, group = group), fill =
                 "light grey") +
  geom_raster(aes(fill = Capra)) +
  scale_fill_gradientn(colours = c("light grey", "dark orange")) +
  geom_path(
    data = countries_df,
    aes(long, lat, group = group),
    color = "black",
    size = 0.1
  ) +
  geom_point(
    data = capg,
    aes(lon, lat, group = NULL, fill = NULL),
    size = 1,
    shape = 16
  ) +
  coord_cartesian(xlim = c(xmin, xmax), ylim = c(ymin, ymax)) +
  theme_opts

ggplot(data = rcpdf, aes(y = Latitude, x = Longitude)) +
  geom_polygon(data = world_df, aes(long, lat, group = group), fill = "light grey") +
  geom_raster(aes(fill = Capra)) +
  scale_fill_gradientn(colours = c("light grey", "dark orange")) +
  geom_path(
    data = countries_df,
    aes(long, lat, group = group),
    color = "black",
    size = 0.1
  ) +
  geom_point(
    data = capg,
    aes(lon, lat, group = NULL, fill = NULL),
    size = 1,
    shape = 16
  ) +
  coord_cartesian(xlim = c(xmin, xmax ), ylim = c(ymin, ymax)) +
  theme(legend.position = "none") +
  theme_opts