BOTV211: Practical 5 — Species Distribution Modelling

Installing the software

Download the zip file from (click) here.
Unzip the folder on your desktop.
Navigate to “Maxent 4.1” and open “maxent.bat”. If the software does not run, it means that you do not have Java installed! Please click here to install Java.
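If you would like to check for Java before opening maxent.bat, the short Python sketch below is one way to do it. It assumes only that Java, if installed, is reachable as `java` on your system PATH; the helper name `java_available` is made up for this example.

```python
import shutil
import subprocess


def java_available():
    """Return True if a `java` executable can be found on the PATH."""
    return shutil.which("java") is not None


if __name__ == "__main__":
    if java_available():
        # `java -version` reports the installed version (usually on stderr).
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
        print(result.stderr or result.stdout)
    else:
        print("Java was not found on the PATH; install it before opening maxent.bat.")
```

A result of "not found" can also mean that Java is installed but not on the PATH, so treat this as a quick first check only.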

Introduction

This practical focuses on the field of species distribution modelling (SDM). We will combine locality data with environmental information (in this case, only climate) to create a model that characterises the relationship between environment and distribution.

SDM has many uses, but in this practical we are going to:

explore some of the terminology used in SDM (e.g. AUC statistics, the receiver operating characteristic (ROC) curve, training and testing data, and random test percentage versus cross-validation).
generate a niche model for four species, and project these models onto spatial maps of current climate (climate layers averaged from 1970-2000), future climate (2070 under the RCP6 scenario) and climate under last glacial maximum conditions (~21 ka).

In this practical, we are going to use an algorithm called Maximum Entropy (cool name, huh?) that has been implemented in the Java software Maxent. We are going to skim the surface of SDM with MaxEnt, but for those of you who’re up for a challenge, Cory Merow’s paper (2013, “A practical guide to modeling species’ distributions”) is well worth a read.

The aims of this practical are to:

familiarise yourself with SDM,
compare the current, past, and future predictions from upland/inland and lowland/coastal taxa.

Species information

The four taxa that we will be modelling are:

Cyclopia intermedia E.Mey. (Fabaceae): This is the most widespread species in the genus Cyclopia. Its range extends from the Witteberg Mountains near Touwsrivier to the Van Stadens Mountains near Port Elizabeth, and it is found on rocky, loamy, or sandy soil at elevations of 500–1700 m. Thus, it can be considered an inland mountain species. This species is a resprouter with myrmecochorous seeds.
Metalasia muricata (L.) D.Don (Asteraceae): This is a widespread and common species found in coastal to mountainous regions of southern Africa. Here we are focusing on the coastal form, which is an ecotype (or subspecies) within this lineage. This species is a reseeder and has tiny wind-dispersed seeds. It is often a pioneer shrub that is the first of the reseeders to appear after fire.
Olea exasperata Jacq. (Oleaceae) is a large, often multi-stemmed, shrub or tree that is usually found along the west and south coastlines, growing in recent and ancient dune sands. It is also a post-fire resprouter that relies heavily on resprouting for persistence. It has white flowers that appear in spring and mature into purple fruits that are bird-dispersed. In germination experiments, Cowling et al. (1997; J. Veg. Sci.) found that no seeds of O. exasperata germinated and that seedlings were rare in the Cape St Francis Dunes and Cape Peninsula.
Pappea capensis Eckl. & Zeyh. (Sapindaceae) has a distribution that extends from the Little Karoo, into the Eastern Cape and up into KwaZulu-Natal, to the northern provinces, as well as Mozambique, Zimbabwe and northwards into eastern and southern tropical Africa. For the purposes of this practical, we can consider it an inland subtropical lineage (i.e. it has affinities with the eastern part of SA). In the Eastern Cape, it forms a core member of the Albany Subtropical Thicket and is usually found inland. It has a red fruit that is dispersed by birds and browsing herbivores.

An explanation of the folder structure

There are 10 folders in the downloaded zip file:

Climate_Current contains geographic layers of six bioclimatic variables obtained from the WorldClim database. WorldClim is built on weather station data that has been interpolated¹ between stations, taking – for example – topography into account.

Climate_LGM contains the same six bioclimatic layers as the Climate_Current folder, but in this case these represent statistically downscaled PMIP simulations; these are global circulation models run under LGM conditions at a coarse resolution (usually ~1 degree cells, hence the need for downscaling).
Climate_2070 contains the same six bioclimatic layers as the Climate_Current folder, but with statistically downscaled CMIP5 simulations; these are global circulation models run under future scenarios of CO₂ conditions at a coarse resolution (again, hence the need for downscaling). There are A LOT of future climate simulations (look here for a brief summary). I have provided the MIROC-ESM model simulations for 2070 under the RCP6 scenario. This scenario assumes that we do get our emission rate under control (i.e. it is not the worst-case scenario simulation).
Locality_Dataset contains a .csv file with the localities (in the form of longitude and latitude coordinates) for the species under investigation in this practical.
Maxent 4.1 contains the software to run the SDM algorithm, Maxent.
Out.S1 to Out.S6 are used to store the output of different MaxEnt analyses. Note that every time you run a MaxEnt analysis, it will overwrite the old analysis that is in the output folder. As you will need to compare between these analyses (e.g. present vs past and present vs future), it is much better to redirect each analysis to a new folder instead of re-running and overwriting the results the whole time.
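To get a feel for the locality file before loading it into MaxEnt, you could summarise it with a few lines of Python. The sketch below is illustrative only: it assumes the .csv has a header row with columns named `species`, `longitude` and `latitude` (check the real file, as its column names may differ), and the records shown are made up.

```python
import csv
import io
from collections import Counter


def count_records(csv_file):
    """Count locality records per species in an open CSV file.

    Assumes a header row with a 'species' column (an assumed column name).
    """
    reader = csv.DictReader(csv_file)
    return Counter(row["species"] for row in reader)


# Made-up records for illustration only; these are not the practical's data.
example = io.StringIO(
    "species,longitude,latitude\n"
    "Cyclopia_intermedia,22.10,-33.40\n"
    "Cyclopia_intermedia,23.50,-33.60\n"
    "Olea_exasperata,18.40,-34.10\n"
)
counts = count_records(example)
for species, n in counts.items():
    print(species, n)
```

To use it on the real data, open the locality file with `open(path, newline="")` and pass the file object to `count_records`.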

The MaxEnt Interface

You start the software by clicking on maxent.bat. If the software does not run, it means that you do not have Java installed! Please click here to install Java.

When you open the software, you will get the interface below…

Some descriptions of the interface:

Samples is where you specify the file that contains the species locality data. Browse to “Locality_Dataset/Locs.csv”
Environmental layers is where you specify the folder that contains the present-day bioclimatic layers. The software will intersect the locality points with these layers to build the model, thus it is crucial that modern locality points are used along with modern climate. It is entirely possible for you to specify the 2070 or LGM folder here, and you will get a result, but that would be very, very wrong! You would be training your model using modern locality points that intersect with a very distant past or a very different future environment. Please make sure you browse to the Climate_Current folder!
Make pictures of predictions: If you find that your analyses are not generating any pictures, it is usually because this option has been unticked (it is ticked by default).
Output directory: Please keep track of the output directory, and each time you go to a new section, change the output directory. As explained above, this will help you quickly get your results from each analysis for comparison purposes. Please use the Out.S1 to Out.S6 folders (the S1-S6 stands for sections 1 to 6).
Projection layers directory/file: Specify the Climate_2070 or Climate_LGM folder here. It is unfortunate that you cannot specify multiple projection layers (i.e. future and past) in the same analysis. You need to do this separately (redirecting to a different output directory for each projection!).
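The idea of "intersecting" locality points with the environmental layers can be sketched in a few lines of Python. This is not how Maxent is implemented internally; it is a minimal illustration that assumes each layer is a regular grid described by a lower-left corner, a cell size, and row and column counts (as in an ESRI ASCII grid header). The function names, layer names and all values are hypothetical.

```python
def cell_index(lon, lat, xllcorner, yllcorner, cellsize, nrows, ncols):
    """Return (row, col) of the grid cell containing a point, or None if outside.

    Row 0 is the northernmost row, as in an ESRI ASCII grid.
    """
    col = int((lon - xllcorner) // cellsize)
    row_from_bottom = int((lat - yllcorner) // cellsize)
    row = nrows - 1 - row_from_bottom
    if 0 <= row < nrows and 0 <= col < ncols:
        return row, col
    return None


def sample_layers(lon, lat, layers, header):
    """Look up the value of every layer at one locality point."""
    idx = cell_index(lon, lat, **header)
    if idx is None:
        return None
    row, col = idx
    return {name: grid[row][col] for name, grid in layers.items()}


# Hypothetical 2 x 3 grids covering lon 18..21 and lat -35..-33 with 1-degree cells.
header = {"xllcorner": 18.0, "yllcorner": -35.0, "cellsize": 1.0, "nrows": 2, "ncols": 3}
layers = {
    "bio1": [[16.0, 17.0, 18.0], [15.0, 15.5, 16.5]],
    "bio12": [[300, 450, 600], [500, 700, 900]],
}
values = sample_layers(19.5, -34.2, layers, header)
print(values)
```

Swapping the grids in `layers` for past or future climate would return different values for the same point, which is why the model must be trained on the current layers and only projected onto the others.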

Per section instructions

Section 1: SDM USING RANDOM TEST PERCENTAGE

Set the output folder to “Out.S1”.
Set the random test percentage to 25% (Click on Settings).
Run the four species (Cyclopia intermedia, Metalasia muricata, Pappea capensis and Olea exasperata) on the default settings for all the current environmental layers.
If you get a warning, write it down, and then click on “Suppress similar visual warnings”.
OK, you’ve clicked Run, a bunch of windows popped up, your computer went haywire, and then… Nothing. Yes?
The output of each analysis is stored as an .html file in the Output Directory. So, in the folder “Out.S1”, open “Cyclopia_intermedia.html”, “Metalasia_muricata.html”, “Pappea_capensis.html” and “Olea_exasperata.html”.
Now, you’ve got four .html files open. How do you keep track of which is which?

Match up the figures below with the SDM predictions that you generated.

Q1.1 (A)

Q1.1 (B)

Q1.1 (C)

Q1.1 (D)

Hints for each question

Q1.2: Please report the Training and Testing AUC statistics.

Q1.3: Look in the .html results file.

Q1.4: visit this site: http://worldclim.org/bioclim. Please only report those variables that you are using in your models.

Q1.5: go to the Analysis of variable contributions section in each *.html file.

Q1.8: Open “Locality.data.csv” in Excel, then highlight the cells (holding SHIFT) and look for the row count. Hint: Look in the “Raw data outputs and control parameters” section in the Maxent output .html file.

Section 2: AREA UNDER THE CURVE

Some light reading on the AUC and ROC. ROC curves are used in many different fields beyond SDM. Here is a Youtube video explaining the general principles.
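One way to build intuition for the AUC is to compute it by hand: it is the probability that a randomly chosen presence record receives a higher model score than a randomly chosen comparison point (a background or absence point), with ties counted as half. The sketch below uses made-up scores and a hypothetical function name; it illustrates the statistic and is not Maxent's own code.

```python
def auc(presence_scores, background_scores):
    """AUC as the probability that a presence outscores a background point.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Made-up suitability scores, for illustration only.
presences = [0.9, 0.8, 0.6, 0.4]
background = [0.7, 0.5, 0.3, 0.2, 0.1]
print(auc(presences, background))
```

In this example, presences win 17 of the 20 pairwise comparisons, so the AUC is 0.85. A model that cannot tell presences from background points would score about 0.5.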

Section 3: TESTING MODELS

Light reading on training, test, and validation sets. Please explain these terms in your own words!
Random test percentage versus cross-validate. Please develop a different diagram to the one shown, and make sure you include the random test percentage.
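The difference between the two testing strategies can be sketched in Python. This is a minimal illustration with hypothetical function names, assuming records are shuffled at random before splitting; Maxent's own partitioning may differ in detail.

```python
import random


def random_test_split(records, test_percentage, seed=0):
    """Hold out a random percentage of records once; return (train, test)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_percentage / 100)
    return shuffled[n_test:], shuffled[:n_test]


def cross_validation_folds(records, k, seed=0):
    """Split records into k folds; each fold is the test set exactly once."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        splits.append((train, folds[i]))
    return splits


records = list(range(20))  # stand-ins for 20 locality records
train, test = random_test_split(records, 25)
print(len(train), len(test))
for train_i, test_i in cross_validation_folds(records, 4):
    print(len(train_i), len(test_i))
```

With 20 records, a 25% random test percentage and 4-fold cross-validation both train on 15 records and test on 5 in any single run. The difference is that cross-validation repeats this four times so that every record is used for testing exactly once.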

Section 4: PROJECTING MODELS ONTO FUTURE CLIMATE

Set the output folder to “Out.S4”.
Select all species to model.
Set up 4-fold cross-validation. Click on Settings, ensure Cross-validate is selected and set the number of replicates to 4.
Under “Projection layers directory/file”, browse to the Climate_2070 folder.
Run the analyses and open the .html files for each of the species (i.e. NOT the numbered _0.html etc. files).
Consider using the following terminology: contracting, expanding, shifting, loss of geographic area, gain of geographic area, fragmentation, merging/coalescing.
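Terms such as loss and gain of geographic area can be made concrete by comparing two maps cell by cell. The sketch below assumes that each prediction has already been converted to a binary map (1 = suitable, 0 = unsuitable) using some threshold, which is a choice you would have to make and justify yourself; the maps and function name are hypothetical.

```python
def range_change(current, future):
    """Count cells lost, gained, and kept between two binary suitability maps."""
    lost = gained = stable = 0
    for row_now, row_later in zip(current, future):
        for now, later in zip(row_now, row_later):
            if now and not later:
                lost += 1
            elif later and not now:
                gained += 1
            elif now and later:
                stable += 1
    return {"lost": lost, "gained": gained, "stable": stable}


# Hypothetical 3 x 4 maps: 1 = suitable, 0 = unsuitable.
current = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
future = [
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]
change = range_change(current, future)
print(change)
```

Here two cells are lost, three are gained and two stay suitable, so this made-up range both shifts (towards the bottom right of the grid) and expands slightly.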

Section 5: PROJECTING MODELS ONTO PAST CLIMATE

Set the output folder to “Out.S5”.
Select all species to model. Keep 4-fold cross-validation.
Under “Projection layers directory/file”, select the Climate_LGM folder.
Run the analyses and open the .html files for each of the species (i.e. NOT the numbered _0.html etc. files).

Section 6: GENERAL QUESTIONS

All questions are provided on the answer sheet. Consider aspects covered in all previous lecture material.

¹ Interpolation is a method of constructing new data points within the range of a discrete set of known data points; in this case, the interpolation estimates the climates between the climate stations (the “known data points”).