Introduction

This tutorial covers how to access digital biodiversity datasets from previously published papers and community biodiversity projects. The first part covers how to search for and access data from previously published papers (Google Scholar, Biodiversity Heritage Library, Dryad, etc.). The second part covers how to access information from community projects like iNaturalist through GBIF and iDigBio.

The Literature and You or “Why did they organize their crap like this?”

Google Scholar is generally considered the premier academic search engine. It can be found at the following web address:

https://scholar.google.com/

Now, let’s try an exercise that I was required to do throughout my undergraduate research program: searching for intestine length data. For this exercise, we want to find an article that reports meaningful intestine length measurements for one or more species within Carnivora (a clade comprising most mammalian carnivores). Try the following phrase in the search bar:

Carnivora Intestine Length


We can see a couple of things from this search.

In the center is a list of articles. Each entry has a blue link to the article and a snippet of the article text that matches your search.

On the left side is a series of filters that allow you to:

  1. Limit your search to articles by the year they were published.
  2. Filter out patents and citations (where someone has cited an article but Scholar does not have an online link for the article in question).
  3. Sort by date (but this does not appear to work well beyond one year from the present day).

The links on the right side indicate whether the article is behind a paywall or not. Sometimes, our university has access to articles that you may not have personal access to. Usually there will be a tag saying whether it is available from uidaho.edu.

Clicking on the center blue link will bring you to the journal page in question. Let’s click on the first article that pops up.


Usually, the first thing displayed when you view the article is the abstract (a condensed summary with important findings of the study). While this article appears to have some information on intestine length for a carnivore, it is behind a paywall. To get around it, try searching for the article title using the uidaho library search engine. The first one to report the information on intestine length in this article gets a cookie.

Let’s go back to the scholar page.

Now, let’s make our search more specific. Place quotes around intestine length like so:

Carnivora "Intestine Length"

What do you see? Any carnivore articles?

Since there were not many on the first page, let’s narrow our search by year. Let’s restrict it to articles published after 2014.
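The quoted-phrase search and the year restriction can also be captured in a URL that you can bookmark or share with a lab mate. The sketch below is only an illustration: the function name is made up for this tutorial, and the as_ylo parameter (the earliest publication year) is assumed from the address Scholar shows in the browser when a custom date range is set. Scholar has no official API, so the URL is meant to be opened in a browser, not scraped.

```python
from urllib.parse import urlencode

SCHOLAR_SEARCH = "https://scholar.google.com/scholar"


def scholar_url(query, year_from=None):
    """Build a Google Scholar search URL (hypothetical helper)."""
    params = {"q": query}
    if year_from is not None:
        # Assumption: as_ylo is the lower bound of the publication year range.
        params["as_ylo"] = year_from
    return SCHOLAR_SEARCH + "?" + urlencode(params)


# "Published after 2014" is expressed here as a lower bound of 2015.
url = scholar_url('Carnivora "Intestine Length"', year_from=2015)
print(url)
```

Pasting the printed address into a browser should reproduce the filtered search without clicking through the sidebar.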

—-

The first one looks particularly promising. Let’s click on the link to the right of the article, [PDF] researchgate.net.

The article will be downloaded to your computer but may require a captcha verification before it can proceed.

Examine the article and tell me when you find the small intestine length of the brown bear. Tables like the one below are a good place to start.


In this case, the intestine length dataset was found in a table in the main body of the article. This is not often the case. Usually the journal or the article authors put the data used in the study in supplementary material that is not included in the main part of the article, to save space in the journal publication.

Let’s go back to our first search to see an example of this. Look for the article with Lavin as the primary author. Click on the article and attempt to find the supplementary material. There are many paths you can take to get to this article but not all contain the supplementary material. This can be frustrating. Try to find the way that gets you the supplementary material with intestine length data.

There are many other ways to search through Google Scholar. For example, the advanced search settings let you filter by author, journal, or additional date ranges.


Some advice if you know an article exists but is not on Scholar:

Try a normal Google search. If you can find the journal and date it is much easier to locate the article.

You can look for it on Biodiversity Heritage Library if it is a very old article.

Worst case, request the article from our Interlibrary Loan service but expect them to take a couple of weeks to find it.

—-

GBIF

Now, let’s take a look at the first of our community biodiversity datasets, which some consider the premier global biodiversity dataset: GBIF.

GBIF stands for the Global Biodiversity Information Facility. It houses information on museum specimens, iNaturalist observations, and other collections that have detailed locality information for their specimens.

Let’s poke around a bit. You can search by occurrence, for a particular species, or for a specific dataset you want to access. The datasets are all synthesized into the larger GBIF dataset so it is not necessary to poke around into one particular dataset unless you are interested in how the data was originally sourced.

Once you all have a feel for it, let’s go to the occurrence page and search for Anguispira, the snail we found out at MOSS.

Zoom into North America. You should see something like this:

We can clearly see an east/west division. It is common knowledge in the snail world that the Midwest is generally lackluster in terms of snail diversity. The large grassy plains likely pose a significant dispersal barrier to many species.

Let’s add some restrictions to the dataset to filter out low quality occurrences. Check the boxes that require the occurrence to have coordinates (lat, long) and an image.

We went from ~2500 occurrences to 68! Try out some of the other filters to see how you can chop up the occurrence dataset to fit a question.
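The same two checkboxes can be expressed as a query against GBIF’s public occurrence search API. The sketch below is a minimal illustration, not the way the website itself runs the search: the function names are made up for this tutorial, and using scientificName to stand in for the website’s search box is an assumption. The hasCoordinate and mediaType parameters correspond to the coordinate and image filters. Nothing is downloaded unless you call occurrence_count yourself, and the totals you get will differ from the numbers above as GBIF grows.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def occurrence_search_url(scientific_name, has_coordinate=False, has_image=False):
    """Build an occurrence search URL mirroring the website filters."""
    # limit=0 asks for the total count without returning any records.
    params = {"scientificName": scientific_name, "limit": 0}
    if has_coordinate:
        params["hasCoordinate"] = "true"
    if has_image:
        params["mediaType"] = "StillImage"
    return GBIF_OCCURRENCE_SEARCH + "?" + urlencode(params)


def occurrence_count(url):
    """Fetch the URL and return the total number of matching occurrences."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)["count"]


unfiltered = occurrence_search_url("Anguispira")
filtered = occurrence_search_url("Anguispira", has_coordinate=True, has_image=True)
print(unfiltered)
print(filtered)
# With network access, compare the two totals:
# print(occurrence_count(unfiltered), occurrence_count(filtered))
```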

If you want, you can export this dataset (or any other occurrence search) by clicking on the download button, but you will need a GBIF account or have to sign in through another service. It will usually take 5 minutes or so for a link to your data to be emailed to you.

—-

Alright, let’s actually get some regional biodiversity data and visualize it a little bit using some of GBIF’s software. In the Occurrence search bar, enter Galapagos, select Ecuador as a country filter, and require coordinates for the observation. Zoom in on the Galapagos and you should see something like the image below.

—-

Now, switch over to the Taxonomy page and you can see the distribution of occurrences across known taxonomic units (family level mostly) and groups. This gives you an idea of which fauna are most commonly observed in the Galapagos.

—-

We can also check up on some metrics in the GBIF dataset to see whether we may be biased towards certain taxonomic groups. Click on the Metrics tab to view a slew of graphs depicting the observations over the course of a year, observations per dataset, and the basis of record for these observations.
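Similar breakdowns can be requested from the occurrence search API through its facet parameter. The sketch below builds such a request for the Galapagos search and summarizes one facet as shares of the total. The helper names are made up for this tutorial, the response layout (a "facets" list with "field" and "counts" entries) is assumed from what the API returns, and the sample response is invented purely to show the arithmetic; it is not real GBIF data.

```python
from urllib.parse import urlencode

GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def facet_search_url(query, country_code, facets):
    """Build a count-only search URL that asks for the given facets."""
    params = [("q", query), ("country", country_code),
              ("hasCoordinate", "true"), ("limit", 0)]
    params += [("facet", name) for name in facets]
    return GBIF_OCCURRENCE_SEARCH + "?" + urlencode(params)


def _normalize(name):
    # Lets "basisOfRecord" match a field reported as "BASIS_OF_RECORD".
    return name.replace("_", "").lower()


def facet_counts(response, facet_name):
    """Return {value: count} for one facet of a parsed search response."""
    for facet in response.get("facets", []):
        if _normalize(facet["field"]) == _normalize(facet_name):
            return {entry["name"]: entry["count"] for entry in facet["counts"]}
    return {}


def shares(counts):
    """Convert counts to fractions of their sum."""
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()} if total else {}


url = facet_search_url("Galapagos", "EC", ["month", "datasetKey", "basisOfRecord"])
print(url)

# Invented response fragment, for illustration only.
sample = {"count": 1000, "facets": [{"field": "BASIS_OF_RECORD", "counts": [
    {"name": "HUMAN_OBSERVATION", "count": 750},
    {"name": "PRESERVED_SPECIMEN", "count": 250}]}]}
print(shares(facet_counts(sample, "basisOfRecord")))
```

A large share of human observations relative to preserved specimens is one hint that a dataset leans toward conspicuous, easily photographed groups.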

—-

GBIF is built for utility and we have only scratched the surface of the datasets that can be derived from it. The occurrence information can be used to build predictive models, mine images for locality information, and much more.

—-

iDigBio

Now, let’s move on to the sister of GBIF that has a greater emphasis on museum collections: iDigBio.

iDigBio is an NSF initiative to digitize the collections of many major museums. iNaturalist and other applications that monitor present day species through human observation do not feed into this dataset. As such, the only occurrences of species in the dataset are label records for specimens that are deposited in a museum. These are usually voucher specimens that are submitted during the publication process.

One of the advantages of iDigBio over GBIF is the quality of its media for species.

Let’s get searching. Click on the Search the Portal button and you will be redirected to the search page. Search for Strombus, a sea snail.

On the right you should see something like the image below: a map of Strombus localities.

Like in GBIF, we can export the results of this search by clicking on the Download tab. Also like GBIF, it will take anywhere from 5 minutes to 1 hour for the link to be emailed to you.
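For small result sets, the Strombus search can also be scripted against iDigBio’s search API instead of waiting for the download email. The sketch below is a hedged illustration: the helper names are made up for this tutorial, the lowercase genus value and the "indexTerms" / "geopoint" layout of the response are assumptions about how iDigBio indexes records, and the sample response is invented to show how map points would be pulled out.

```python
import json
from urllib.parse import urlencode

IDIGBIO_RECORD_SEARCH = "https://search.idigbio.org/v2/search/records"


def record_search_url(record_query, limit=100):
    """Build a record search URL; the query travels as JSON in 'rq'."""
    params = {"rq": json.dumps(record_query), "limit": limit}
    return IDIGBIO_RECORD_SEARCH + "?" + urlencode(params)


def localities(response):
    """Return (lat, lon) pairs for records that carry a georeference."""
    points = []
    for item in response.get("items", []):
        geopoint = item.get("indexTerms", {}).get("geopoint")
        if geopoint:
            points.append((geopoint["lat"], geopoint["lon"]))
    return points


strombus_url = record_search_url({"genus": "strombus"})
print(strombus_url)

# Invented response fragment, for illustration only.
sample = {"itemCount": 2, "items": [
    {"indexTerms": {"genus": "strombus", "geopoint": {"lat": 24.5, "lon": -81.8}}},
    {"indexTerms": {"genus": "strombus"}},  # no coordinates on the label
]}
print(localities(sample))
```

Records without coordinates are skipped, which is why a map usually shows fewer points than the search has results.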

Let’s try finding all Galapagos species occurrences in iDigBio like we did in GBIF. We could search for Galapagos, and I urge you to try, but iDigBio has a handy map tool that lets you draw a circle or rectangle around an area to retrieve all records from that area. On the map, click on the rectangle icon and click-drag a rectangle over the entire Galapagos Islands. You should see something like what I have pictured below:


Many of the filters implemented in GBIF are implemented in iDigBio. Try playing around with them to filter the dataset down to any Galapagos fauna of interest to you (snails, finches, tortoises, blue-footed boobies, etc.).
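The rectangle search and an extra filter can be combined in a single iDigBio API query. The sketch below is an illustration under stated assumptions: the helper names are made up for this tutorial, the corner coordinates are a rough box around the Galapagos that I picked by eye and not an official boundary, and Geospiza (a genus of Darwin’s finches) is just one example of a filter you might add.

```python
import json
from urllib.parse import urlencode

IDIGBIO_RECORD_SEARCH = "https://search.idigbio.org/v2/search/records"


def bounding_box_query(north, west, south, east, **filters):
    """Build an iDigBio record query for a rectangle plus optional field filters."""
    if north <= south:
        raise ValueError("north must be greater than south")
    query = {"geopoint": {
        "type": "geo_bounding_box",
        "top_left": {"lat": north, "lon": west},
        "bottom_right": {"lat": south, "lon": east},
    }}
    query.update(filters)  # e.g. genus="geospiza"
    return query


def record_search_url(record_query, limit=100):
    """Encode the query as JSON in the 'rq' parameter."""
    params = {"rq": json.dumps(record_query), "limit": limit}
    return IDIGBIO_RECORD_SEARCH + "?" + urlencode(params)


# Approximate box around the archipelago (assumed values).
galapagos_finches = bounding_box_query(2.0, -92.2, -1.6, -89.0, genus="geospiza")
print(record_search_url(galapagos_finches))
```

Dropping the genus argument returns every georeferenced record inside the rectangle, which matches what the map tool does on the website.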

Conclusions

These are some of the fundamental tools you can use to gather biodiversity information. We are available to discuss these tools and aid your projects, so please do reach out to us. For the rest of the class, please try to use these tools to explore some of the questions in your mind. Maybe you will find a perspective on your question that these tools will enable you to explore.

  • T. Mason Linscott