Lab 1: Biological Data

The following video by Richard Pearson is relevant for both this Lab and Lab 2:

https://www.youtube.com/watch?v=8inEr1c2UmE&list=PLKYTvTbXFuChaoF-L-1e9RzCagdLPQcCU&index=3

Additionally, much of the material in these practicals is adapted from Hijmans and Elith (2011): https://cran.r-project.org/web/packages/dismo/vignettes/sdm.pdf

Aims

To obtain some species occurrence data from GBIF
To produce a map of the species’ occurrences

First we need to install the packages needed for this practical:

install.packages("dismo")
install.packages("maptools")
install.packages("maps")
install.packages("mapdata")
install.packages("dplyr")
install.packages("CoordinateCleaner")
install.packages("raster")
install.packages("ggplot2")
install.packages("scales")

Then we need to load the packages into R using the library() command. You only need to install the packages onto your computer once, but each time you open R you will need to use library() to call up the packages you wish to use:

library(dismo)
library(maptools)
library(maps)
library(mapdata)
library(dplyr)
library(CoordinateCleaner)
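Because packages only need installing once, it can be handy to install just the ones that are missing. The following is a minimal sketch of that idea in base R; missing_packages() is a hypothetical helper written for this example, not a function from any package, and the install line is left commented out so nothing is installed by accident.

```r
# Packages used in this practical (copied from the install.packages() calls)
pkgs <- c("dismo", "maptools", "maps", "mapdata", "dplyr",
          "CoordinateCleaner", "raster", "ggplot2", "scales")

# Hypothetical helper: returns the names in `wanted` that are not in `installed`
missing_packages <- function(wanted,
                             installed = rownames(installed.packages())) {
  setdiff(wanted, installed)
}

to_install <- missing_packages(pkgs)
print(to_install)

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

If to_install is empty, everything is already installed and you can go straight to the library() calls.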

Next, create a folder called ‘SDM_Course’, or similar, to keep all of the files from this practical in. Then create separate folders inside it for your species locations and for the environmental data. The code below checks whether a folder with the path “~/SDM_Course/Species_Locs” exists and creates it if it does not. It then sets this folder as the working directory.

if (!dir.exists("~/SDM_Course/Species_Locs")) {
  dir.create("~/SDM_Course/Species_Locs", recursive = TRUE)
}

setwd("~/SDM_Course/Species_Locs")
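Since the practical needs one folder for species locations and another for environmental data, the check-then-create pattern can be wrapped in a small function. This is a sketch only: ensure_dir() is a hypothetical helper, "Env_Data" is an assumed folder name, and the example builds the folders under tempdir() so that it does not touch your home directory.

```r
# Hypothetical helper: create `path` if needed and return it invisibly
ensure_dir <- function(path) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  invisible(path)
}

# Demonstration root inside the session's temporary directory
demo_root <- file.path(tempdir(), "SDM_Course")

locs_dir <- ensure_dir(file.path(demo_root, "Species_Locs"))
env_dir  <- ensure_dir(file.path(demo_root, "Env_Data"))  # assumed name

print(dir.exists(c(locs_dir, env_dir)))
```

For the real course folder you would replace demo_root with "~/SDM_Course". file.path() inserts the separators for you, which avoids a path with a missing slash.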

Now we can download species location data from GBIF. This code will download data for the Alpine ibex (Capra ibex), but you can change it to the binomial of whichever species you are interested in.

Downloading Data

First we count the number of records available for your chosen species. This ensures that there are enough records (roughly more than 100) but not too many: the download function will not work if there are more than 200,000 records. Downloading the data can also take quite a while, so it is worth checking that the number of records is reasonable before starting.

count_sp <- dismo::gbif(
    genus = "Capra",
    species = "ibex",
    geo = TRUE,  #we only want records with geographic information
    removeZeros = TRUE, #removes records where the latitude or longitude are 0
    download = FALSE
  )

print(count_sp)

## [1] 50122

There are about 50,000 records for the Alpine ibex so we can go ahead and download the data for this species.
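The check on the record count can be written down explicitly, which is useful if you repeat the practical for several species. The sketch below uses the two thresholds mentioned in the text (roughly 100 and 200,000); check_record_count() is a hypothetical helper, and the count is typed in by hand rather than fetched from GBIF.

```r
# Hypothetical helper: classify a record count against the two thresholds
check_record_count <- function(n, min_n = 100, max_n = 200000) {
  if (n < min_n) {
    "too few"
  } else if (n > max_n) {
    "too many"
  } else {
    "ok"
  }
}

# Count reported for Capra ibex in this practical
print(check_record_count(50122))
```

In practice you would pass count_sp to the function instead of a typed number.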

capra <- dismo::gbif(
      genus = "Capra",
      species = "ibex",
      geo = TRUE, #only downloads ones with location data
      removeZeros = TRUE, # ignores locations where either lat or lon are 0
      download = TRUE
    )

The gbif() command is from the “dismo” package. If you had problems installing it, you can download the data directly from the GBIF website, http://www.gbif.org/ (you’ll need to create an account). If dismo worked, skip ahead to “Looking at the downloaded data”.

To download the data manually:

Data -> Explore Species -> Type in your species -> Click on the name of your species -> Click “All X,XXX” to the right of the map -> Download -> Select Darwin Core Archive -> Download and extract file -> Copy and paste “occurrence.txt” to the folder you previously set as the working directory

Here we also create two new columns, “lon” and “lat” so that the column names are the same as the data downloaded using the gbif function.

capra <- read.delim("occurrence.txt", header = TRUE)

capra$lon <- capra$decimalLongitude
capra$lat <- capra$decimalLatitude
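The manual route can be tried out on a tiny stand-in file before working with the real download. The sketch below writes a two-row tab-delimited file to a temporary location and reads it back. The coordinates are copied from the example output in this practical, the locality text is invented for illustration, and quote = "" is an optional setting (an assumption about what your file may need) that makes read.delim() treat quotation marks inside free-text fields as ordinary characters.

```r
# Write a small tab-delimited file that mimics the layout of occurrence.txt
demo_file <- tempfile(fileext = ".txt")
writeLines(c(
  "species\tdecimalLatitude\tdecimalLongitude\tlocality",
  "Capra ibex\t46.17947\t10.495021\tridge \"north side",  # stray quote mark
  "Capra ibex\t44.50710\t6.922272\tvalley floor"
), demo_file)

# quote = "" turns off quote handling, so the stray quote is read as text
demo <- read.delim(demo_file, header = TRUE, quote = "",
                   stringsAsFactors = FALSE)

# Same renaming step as for the real data
demo$lon <- demo$decimalLongitude
demo$lat <- demo$decimalLatitude

print(demo[, c("species", "lat", "lon")])
```

Comparing nrow() of the imported data with the record count shown on the GBIF website is a quick way to spot rows lost during import.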

Looking at the downloaded data

We can count the number of records downloaded using the nrow() command

nrow(capra)

## [1] 56678

The head() command will show you the first six rows of the data, including the longitude and latitude. A similar command is tail() for looking at the last six rows in the data.

head(capra)

The key information we need from these data is the latitude and longitude. Some of the downloaded points will be missing this information, so we can remove those.

Here we use the dplyr package, which is well suited to handling data. dplyr uses pipes (%>%) to link functions together. In the code below we start with the ‘capra’ data frame and use select() to keep only the columns we are interested in. We pipe these columns into filter() wrapped around complete.cases(), which drops any row with an NA in one of the selected columns. A second filter() keeps only points inside a rough bounding box of the European Alps, and distinct() then removes any duplicate rows.

In the case of the Alpine ibex this removes a large amount of the data: in the run shown below, 459 records remain.

Cleaning the occurrence data

#A rough extent of the European Alps
xmin <- 6
xmax <- 14.5
ymin <- 43
ymax <- 49

capgeo <- capra %>%
  dplyr::select(species, lat, lon) %>%  # keep only the columns we need
  filter(complete.cases(.)) %>%         # drop rows with an NA in any column
  filter(lon > xmin & lon < xmax &
           lat > ymin & lat < ymax) %>% # exclude points outside the Alps
  distinct()                            # keep unique points


nrow(capgeo)

## [1] 459
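To see what each step of the pipeline does, it helps to run it on a data frame small enough to check by eye. The toy data below are invented for illustration: four rows containing one duplicate, one missing latitude and one point north of the bounding box.

```r
library(dplyr)

# Invented example data: row 2 duplicates row 1, row 3 has no latitude,
# row 4 lies outside the bounding box
toy <- data.frame(
  species = "Capra ibex",
  lat = c(46.2, 46.2, NA, 60.0),
  lon = c(10.5, 10.5, 7.0, 10.0),
  basisOfRecord = c("a", "b", "c", "d")
)

toy_clean <- toy %>%
  dplyr::select(species, lat, lon) %>%  # drops basisOfRecord
  filter(complete.cases(.)) %>%         # drops the row with NA latitude
  filter(lon > 6 & lon < 14.5 &
           lat > 43 & lat < 49) %>%     # drops the point at latitude 60
  distinct()                            # collapses the duplicate pair

print(toy_clean)
```

Only one row survives. Note that rows 1 and 2 differ in basisOfRecord, so they only count as duplicates because select() removed that column before distinct() was applied.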

We can also use the CoordinateCleaner package to automatically clean the coordinates. This runs through a series of rules and flags whether each coordinate breaks any of the rules. We can filter out rows in which the coordinates break any of the rules using the code below.

clean_cap <-
  clean_coordinates(capgeo,
                    lat = "lat",
                    lon = "lon",
                    species = "species")

## Testing coordinate validity

## Flagged 0 records.

## Testing equal lat/lon

## Flagged 0 records.

## Testing zero coordinates

## Flagged 0 records.

## Testing country capitals

## Flagged 0 records.

## Testing country centroids

## Flagged 0 records.

## Testing sea coordinates

## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\Fiona\AppData\Local\Temp\RtmpWyL7TC", layer: "ne_50m_land"
## with 1420 features
## It has 3 fields
## Integer64 fields read as strings:  scalerank

## Flagged 0 records.

## Testing geographic outliers

## Flagged 0 records.

## Testing GBIF headquarters, flagging records around Copenhagen

## Flagged 0 records.

## Testing biodiversity institutions

## Flagged 0 records.

## Flagged 0 of 459 records, EQ = 0.

head(clean_cap)

##      species      lat       lon .val .equ .zer .cap .cen .sea .otl .gbf .inst
## 1 Capra ibex 46.17947 10.495021 TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
## 2 Capra ibex 46.97640  8.248958 TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
## 3 Capra ibex 44.50710  6.922272 TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
## 4 Capra ibex 44.51242  6.909464 TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
## 5 Capra ibex 44.50958  6.908445 TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
## 6 Capra ibex 46.33774 10.486901 TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
##   .summary
## 1     TRUE
## 2     TRUE
## 3     TRUE
## 4     TRUE
## 5     TRUE
## 6     TRUE

clean_cap <- clean_cap %>%
  filter(.summary == TRUE)
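In the output above, each test adds a logical column whose name starts with a dot, and TRUE marks a record that passed. When some records are flagged, counting the FALSE values per column shows which test was responsible. The sketch below does this in base R on a small invented data frame that mimics the layout of the cleaned output; it does not call CoordinateCleaner.

```r
# Invented stand-in for a few flag columns (TRUE = passed the test)
flags <- data.frame(
  .val = c(TRUE, TRUE, TRUE),
  .zer = c(TRUE, FALSE, TRUE),
  .summary = c(TRUE, FALSE, TRUE)
)

# Pick out the flag columns by their leading dot
flag_cols <- grep("^\\.", names(flags), value = TRUE)

# Number of records that failed each test
n_flagged <- colSums(!as.matrix(flags[flag_cols]))
print(n_flagged)
```

The same two lines should work on clean_cap, since its flag columns follow the same naming pattern.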

Now we can start plotting the data to make sure that they are where you expect them to be.

The “mapdata” package provides high-resolution base maps for the map() function from the “maps” package. Here we use the “worldHires” map as a quick way of visualising the data.

We set the extent of the frame with the xlim and ylim arguments so that the map extends 0.2 degrees beyond the outermost location points in each direction.

The box() command draws a box around the frame of the map, as the axes can get cut off where they overlap with land.

The points() command plots the location points on top of the map. The pch argument selects the symbol used to represent the location points and the cex argument changes the size of the points.

Plotting the occurrence points

map(
  'worldHires',
  xlim = c(min(capgeo$lon) - 0.2, max(capgeo$lon) + 0.2),
  ylim = c(min(capgeo$lat) - 0.2, max(capgeo$lat) + 0.2),
  fill = TRUE,
  col = "light grey"
)

box()

points(capgeo$lon,
       capgeo$lat,
       col = "orange",
       pch = 20,
       cex = 0.7)
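The padded map limits can also be computed with range(), which saves writing min() and max() for each axis. This is a small sketch: pad_range() is a hypothetical helper and the longitudes are example values rather than the full data set.

```r
# Hypothetical helper: range of x, widened by `pad` on both sides
pad_range <- function(x, pad = 0.2) {
  range(x, na.rm = TRUE) + c(-pad, pad)
}

# Example longitudes (illustrative values)
lon_demo <- c(6.9, 10.5, 8.2)
print(pad_range(lon_demo))
```

With the real data you could pass xlim = pad_range(capgeo$lon) and ylim = pad_range(capgeo$lat) to map().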

Some of the location points may look unusually evenly spaced. Why might this be?

This plot shows pch symbols 1 to 20, with cex cycling through sizes 1 to 4.

plot(1:20, 1:20, pch = 1:20, cex = rep(1:4, length.out = 20))

In order to enter the species locations into Maxent we need to create a csv (comma separated value) file with one column each for the species name, latitude and longitude. This file will automatically be saved in the working directory.

Saving the occurrences ready for Maxent

capc <- capgeo %>% 
  dplyr::select(species, lat, lon)

write.csv(capc, "capra_locs.csv", row.names = FALSE)
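It is worth reading the file back in to confirm that it has the expected columns and no extra column of row names. The sketch below does this round trip on two example rows (coordinates copied from the output earlier in this practical) and writes to a temporary file with a hypothetical name, so it does not overwrite your real output.

```r
# Two example rows in the same layout as capc
locs <- data.frame(
  species = "Capra ibex",
  lat = c(46.17947, 46.97640),
  lon = c(10.495021, 8.248958)
)

# Write to a temporary file (hypothetical name) and read it back
out_file <- file.path(tempdir(), "capra_locs_demo.csv")
write.csv(locs, out_file, row.names = FALSE)

check <- read.csv(out_file)
print(names(check))
print(nrow(check))
```

Running the same check on "capra_locs.csv" should give the three column names and the same number of rows as capc.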

The next step is to gather some environmental data to input into Maxent, which we will do in Lab 2.