Introduction

This is a tutorial on how to access digital biodiversity datasets from previously published papers and community biodiversity projects. The first part covers how to search for and access data from previously published papers (Google Scholar, Biodiversity Heritage Library, Dryad, etc.). The second part covers how to access information from community projects like iNaturalist through GBIF and iDigBio.

The Literature and You or “Why did they organize their crap like this?”

Google Scholar is generally considered the premier academic search engine. It can be found at the following web address:

https://scholar.google.com/

Now, let’s try an exercise that I was required to do throughout my undergraduate research program: searching for intestine length data. For this exercise, we want to find an article that reports meaningful intestine length measurements for one or more species within Carnivora (a clade comprising most mammalian carnivores). Try the following phrase in the search bar:

Carnivora Intestine Length


We can see a couple of things from this search.

In the center is a list of articles, each with a blue link to the article and snippets of its text that match your search.

On the left side is a series of filters that allow you to:

  1. Limit your search to articles by the year they were published.
  2. Filter out patents and citations (entries where someone has cited an article but Scholar does not have an online link to it).
  3. Sort by date (though this does not appear to work well beyond one year from the present day).

The links on the right side indicate whether the article is behind a paywall. Sometimes our university has access to articles that you do not have personal access to. Usually there will be a tag saying whether the article is available from uidaho.edu.

Clicking on the center blue link will bring you to the journal page in question. Let’s click on the first article that pops up.


Usually, the first thing displayed when you view the article is the abstract (a condensed summary with important findings of the study). While this article appears to have some information on intestine length for a carnivore, it is behind a paywall. To get around it, try searching for the article title using the uidaho library search engine. The first one to report the information on intestine length in this article gets a cookie.

Let’s go back to the scholar page.

Now, let’s try making our search more specific. Place quotes around intestine length like so:

Carnivora "Intestine Length"

What do you see? Any carnivore articles?

Since there were not many on the first page, let’s try narrowing our search by year. Let’s restrict it to only articles published after 2014.
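The quoted phrase and the year restriction can also be written straight into the search address. Below is a minimal Python sketch, not part of the original exercise: `scholar_url` is a hypothetical helper, and `as_ylo` is assumed to be the parameter that Scholar's own "Since" filter places in the address bar (it is not a documented API, and it includes the year given). The resulting link is meant to be opened in a browser, since Scholar tends to block automated requests.

```python
from urllib.parse import urlencode

SCHOLAR_SEARCH = "https://scholar.google.com/scholar"


def scholar_url(terms, phrase=None, since_year=None):
    """Build a Google Scholar search URL (hypothetical helper)."""
    query = terms
    if phrase:
        # Double quotes ask Scholar to match the phrase exactly.
        query = f'{terms} "{phrase}"'
    params = {"q": query}
    if since_year is not None:
        # Assumption: as_ylo is the lower bound on publication year.
        params["as_ylo"] = since_year
    # urlencode escapes the spaces and quotes so the URL is valid.
    return f"{SCHOLAR_SEARCH}?{urlencode(params)}"


print(scholar_url("Carnivora", phrase="Intestine Length", since_year=2014))
```

Leaving out `phrase` or `since_year` reproduces the broader searches from earlier in the exercise.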

---

The first one looks particularly promising. Let’s click on the link to the right of the article, labeled [PDF] researchgate.net.

The article will download to your computer, but you may need to complete a CAPTCHA verification first.

Examine the article and tell me when you find the small intestine length of the brown bear. Tables like the one below are a good place to start.


In this case, the intestine length dataset was in a table in the main body of the article. This is often not the case. Usually, the journal or the authors put the data used in a study in supplementary material, separate from the main article, to save space in the journal publication.

Let’s go back to our first search to see an example of this. Look for the article with Lavin as the primary author. Click on the article and attempt to find the supplementary material. There are many paths you can take to get to this article but not all contain the supplementary material. This can be frustrating. Try to find the way that gets you the supplementary material with intestine length data.

There are many other ways to search through Google Scholar. For example, the advanced search settings let you filter by author, journal, or additional date ranges.
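The author filter can also be typed directly into the search bar. Here is a small Python sketch, assuming Scholar's `author:` search operator behaves as it usually does; `scholar_query` is a hypothetical helper that only assembles the text to paste into the search box.

```python
def scholar_query(terms, phrase=None, author=None):
    """Assemble a Google Scholar search string (hypothetical helper)."""
    parts = [terms]
    if phrase:
        # Quoted text is matched as an exact phrase.
        parts.append(f'"{phrase}"')
    if author:
        # Assumption: author: limits results to papers by this author.
        parts.append(f"author:{author}")
    return " ".join(parts)


print(scholar_query("Carnivora", phrase="intestine length", author="Lavin"))
```

Pasting the printed text into the search bar is one more path to try when looking for the Lavin article.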


Some advice if you know an article exists but is not on Scholar:

Try a normal Google search. If you can find the journal and date, it is much easier to locate the article.

If it is a very old article, look for it on the Biodiversity Heritage Library.

Worst case, request the article from our Interlibrary Loan service but expect them to take a couple of weeks to find it.

---

GBIF

Now, let’s take a look at the first of our community biodiversity datasets, which some consider the premier global biodiversity dataset: GBIF.

GBIF stands for Global Biodiversity Information Facility. It houses information on museum specimens, iNaturalist observations, and other collections that have detailed locality information for their specimens.

Let’s poke around a bit. You can search by occurrence, for a particular species, or for a specific dataset you want to access. The datasets are all synthesized into the larger GBIF dataset so it is not necessary to poke around into one particular dataset unless you are interested in how the data was originally sourced.

Once you all have a feel for it, let’s go to the occurrence page and search for Anguispira, the snail we found out at MOSS.

Zoom into North America. You should see something like this:
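The same occurrence search can also be run outside the website through GBIF's public web API. Below is a minimal Python sketch under a few assumptions: the helper names are hypothetical, the `continent=NORTH_AMERICA` filter stands in for zooming the map, and `hasCoordinate=true` keeps only records that carry coordinates. The live request is left as a comment so the block runs without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def occurrence_url(name, continent=None, limit=20):
    """Build a GBIF occurrence search URL (hypothetical helper)."""
    params = {"scientificName": name, "hasCoordinate": "true", "limit": limit}
    if continent:
        # Assumption: the continent filter approximates the map zoom.
        params["continent"] = continent
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(params)}"


def coordinates(response):
    """Return (name, latitude, longitude) for each record with coordinates."""
    return [
        (rec.get("scientificName"), rec["decimalLatitude"], rec["decimalLongitude"])
        for rec in response.get("results", [])
        if "decimalLatitude" in rec and "decimalLongitude" in rec
    ]


def fetch_occurrences(url):
    """Download one page of results and parse the JSON reply."""
    with urlopen(url) as reply:
        return json.load(reply)


url = occurrence_url("Anguispira", continent="NORTH_AMERICA")
print(url)
# With network access, the live query would be:
# print(coordinates(fetch_occurrences(url)))
```

Only one page of records comes back per request, so a full download would need to step through the pages or use GBIF's download service instead.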

To be continued
