Introduction

This is a quick guide to obtaining research data from GBIF (the Global Biodiversity Information Facility) using the R package rgbif. It walks through how to get occurrence data for a particular species.

Load the required packages

You need to have the rgbif package loaded. If you haven’t already installed it, you need to do that first like this…

install.packages("rgbif")

Then load it like this…

library(rgbif)

Getting data

The rgbif package provides two methods for retrieving occurrence data from GBIF:

  1. occ_download(): Offers access to an unlimited number of records, making it ideal for rigorous research and citation purposes.
  2. occ_search(): Capped at 100,000 records, this method is primarily suited for preliminary tests.

The occ_search() function, along with its counterpart occ_data(), is not recommended for comprehensive research efforts. Despite the convenience of not requiring a username or password, and avoiding the wait time associated with downloads, occ_search() falls short for in-depth research projects. For any substantial research undertaking, the use of occ_download() is strongly advised.

We will therefore focus on occ_download.

Setting up your login credentials

The function occ_download() requires your login credentials. You will first need to create an account at GBIF.

Take a note of your username, email and password.

The simplest way to provide these details to the occ_download() function is to include them in the function arguments.

To do that, I first create these credentials as objects in R.

user <- "owenjones"
pwd <- "my secure password 1234"
email <- "jones@biology.sdu.dk"

Then I can run occ_download() to download data, but first you should know how to query the database.

Once a download request has been submitted, the results will appear on your downloads user page https://www.gbif.org/user/download.
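
As an alternative to passing credentials as arguments, the rgbif documentation describes storing them in environment variables named GBIF_USER, GBIF_PWD and GBIF_EMAIL, which occ_download() should look up when the user, pwd and email arguments are left out. The sketch below sets them for the current session only, using the same placeholder values as above. For regular use they would more typically go in your .Renviron file, so that the password stays out of your scripts.

```r
# Placeholder values; replace these with your own GBIF account details.
Sys.setenv(
  GBIF_USER = "owenjones",
  GBIF_PWD = "my secure password 1234",
  GBIF_EMAIL = "jones@biology.sdu.dk"
)

# Confirm that the variable is visible in the current R session.
Sys.getenv("GBIF_USER")
```

With these set, the user, pwd and email arguments can be omitted from the call to occ_download().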

Building a query for GBIF

You can filter your download request based on taxon, geographic location and other factors. This is handled with a set of functions that all have the prefix pred, which stands for predicate.

It is likely that you will want to download data for a particular species, so let’s look at that first. To cope with synonyms, GBIF uses a fixed backbone taxonomy where species are given a numerical ID, and synonyms will have the same ID. To find out the ID number for a particular species you can use the function name_backbone(). The function produces a data frame with a lot of taxonomic information, but what you need in this context is the usageKey.

For example, the European Cuckoo (Cuculus canorus)…

x <- name_backbone("Cuculus canorus")
x$usageKey
## [1] 5231918

We can now use that key to set the pred for the taxon like this.

pred("taxonKey", x$usageKey)

We might also want to add other specifications for our search:

pred("country","DK") # has country Denmark (DK)
pred("hasGeospatialIssue", FALSE) # Has no problems with geospatial coordinates
pred("hasCoordinate", TRUE) # Has a coordinate included
pred("occurrenceStatus","PRESENT") # Is recorded as a presence
pred_gte("year", 1900) # Is recorded from 1900 onwards

There are other pred functions for building more sophisticated queries: https://docs.ropensci.org/rgbif/articles/getting_occurrence_data.html
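
As a rough illustration of those other functions, the sketch below builds a few compound predicates with pred_in(), pred_and(), pred_lte() and pred_not(). The country list, the year range and the basisOfRecord filter are illustrative choices that are not part of the example in this guide. Building predicates does not contact GBIF, so this can be run without credentials.

```r
library(rgbif)

# Match any of several countries (Denmark, Sweden, Norway).
nordic <- pred_in("country", c("DK", "SE", "NO"))

# Restrict records to the years 1900 to 2000, inclusive.
year_range <- pred_and(pred_gte("year", 1900), pred_lte("year", 2000))

# Exclude fossil specimens.
not_fossil <- pred_not(pred("basisOfRecord", "FOSSIL_SPECIMEN"))

# Printing a predicate shows how it will be expressed in the request.
nordic
year_range
not_fossil
```

Compound predicates like these can be passed to occ_download() in the same way as the simpler ones.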

We can put all these together into a single command like this.

(gbif_download <- occ_download(
  pred("taxonKey", x$usageKey),
  pred("country", "DK"),
  pred("hasGeospatialIssue", FALSE), 
  pred("hasCoordinate", TRUE), 
  pred("occurrenceStatus", "PRESENT"), 
  pred_gte("year", 1900), 
  user = user, pwd = pwd, email = email,
  format = "SIMPLE_CSV"
))

When you run this command it will produce some output to the screen, and save the information to an object called gbif_download.

It will take a few minutes to prepare the data, so first you should wait until the download is ready like this:

occ_download_wait(gbif_download)

You will get a message when it is done. Then you can get the data using a combination of occ_download_get and occ_download_import:

myData <- occ_download_get(gbif_download) |>
  occ_download_import()

Now your data are stored as myData, which you can examine using summary(myData).
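
Since occ_download() is the route recommended for citation, it may help to record the download's DOI and citation text alongside the data. The sketch below assumes that gbif_download is the object returned by occ_download() above and that the download has finished. It uses occ_download_meta() and gbif_citation() from rgbif, and the guard simply skips the block if no download object exists in the session.

```r
library(rgbif)

# Only runs if the download object created earlier is available.
if (exists("gbif_download")) {
  # Metadata for the download, including its DOI and record count.
  download_meta <- occ_download_meta(gbif_download)
  print(download_meta$doi)

  # Citation information for the download and its source datasets.
  print(gbif_citation(download_meta))
}
```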