Introduction

The Geospatial Quality API (GQ API) is a REST API that provides access to a set of basic geospatial assessment functions over sets of primary biodiversity records. This package, rgeospatialquality, is a wrapper for the GQ API. It provides native access to the methods of the API, so that its functions can be used from within an R environment.

In this document, I will show how this package can be used together with rOpenSci’s rgbif to easily apply quality assessment functions to data downloaded through rgbif's methods.

Getting occurrence data with the rgbif package

Since version 0.9.2, the rgbif package offers a new function called occ_data. According to the changelog:

(…) its primary purpose to perform faster data requests. Whereas occ_search() gives you lots of data, including taxonomic hierarchies and media records, occ_data() only gives occurrence data. (via)

This is a perfect function to show how to build synergies between both packages. We will use the occ_data method to download a set of records using any of the available filters and will pass the data to the add_flags function to directly assess the quality of the records.

First, we need to download some records from GBIF with occ_data:

library(rgbif)

d <- occ_data(
    scientificName="Apis mellifera",
    limit=50,
    minimal=FALSE
)

We will extract just 50 records for the bee species Apis mellifera. The default value for limit is 500, but for the purpose of this example, we will stick to a smaller number of records. Setting minimal=FALSE gives us the full set of fields for each record and not only the three “basic” ones (see the occ_data documentation for more info).

This function returns a list with two elements, meta and data. We will work with the records themselves, which can be found in the data element:

d <- d$data
str(d)
## 'data.frame':    50 obs. of  61 variables:
##  $ name                                : chr  "Apis mellifera" "Apis mellifera" "Apis mellifera" "Apis mellifera" ...
##  $ key                                 : int  1227768619 1227771417 1229612532 1229613441 1249281806 1233602060 1233603193 1249279266 1227770975 1233598326 ...
##  $ decimalLatitude                     : num  20.4 32.7 34.1 32.6 33.1 ...
##  $ decimalLongitude                    : num  -100.1 -117.2 -118 -97 -96.6 ...
##  $ issues                              : chr  "cdround,cudc,gass84" "cdround,cudc,gass84" "cdround,cudc,gass84" "cdround,cudc,gass84" ...
##  $ datasetKey                          : chr  "50c9509d-22c7-4a22-a47d-8c48425ef4a7" "50c9509d-22c7-4a22-a47d-8c48425ef4a7" "50c9509d-22c7-4a22-a47d-8c48425ef4a7" "50c9509d-22c7-4a22-a47d-8c48425ef4a7" ...
##  $ publishingOrgKey                    : chr  "28eb1a3f-1c15-4a95-931a-4af90ecb574d" "28eb1a3f-1c15-4a95-931a-4af90ecb574d" "28eb1a3f-1c15-4a95-931a-4af90ecb574d" "28eb1a3f-1c15-4a95-931a-4af90ecb574d" ...
##  $ publishingCountry                   : chr  "US" "US" "US" "US" ...
##  $ protocol                            : chr  "DWC_ARCHIVE" "DWC_ARCHIVE" "DWC_ARCHIVE" "DWC_ARCHIVE" ...
##  $ lastCrawled                         : chr  "2016-02-25T23:57:25.091+0000" "2016-02-25T23:57:29.193+0000" "2016-02-25T23:57:38.240+0000" "2016-02-25T23:57:39.643+0000" ...
##  $ lastParsed                          : chr  "2016-02-25T23:58:11.842+0000" "2016-02-25T23:58:17.571+0000" "2016-02-25T23:58:29.078+0000" "2016-02-25T23:58:31.027+0000" ...
##  $ basisOfRecord                       : chr  "HUMAN_OBSERVATION" "HUMAN_OBSERVATION" "HUMAN_OBSERVATION" "HUMAN_OBSERVATION" ...
##  $ taxonKey                            : int  1341976 1341976 1341976 1341976 1341976 1341976 1341976 1341976 1341976 1341976 ...
##  $ kingdomKey                          : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ phylumKey                           : int  54 54 54 54 54 54 54 54 54 54 ...
##  $ classKey                            : int  216 216 216 216 216 216 216 216 216 216 ...
##  $ orderKey                            : int  1457 1457 1457 1457 1457 1457 1457 1457 1457 1457 ...
##  $ familyKey                           : int  4334 4334 4334 4334 4334 4334 4334 4334 4334 4334 ...
##  $ genusKey                            : int  1334757 1334757 1334757 1334757 1334757 1334757 1334757 1334757 1334757 1334757 ...
##  $ scientificName                      : chr  "Apis mellifera Linnaeus, 1758" "Apis mellifera Linnaeus, 1758" "Apis mellifera Linnaeus, 1758" "Apis mellifera Linnaeus, 1758" ...
##  $ kingdom                             : chr  "Animalia" "Animalia" "Animalia" "Animalia" ...
##  $ phylum                              : chr  "Arthropoda" "Arthropoda" "Arthropoda" "Arthropoda" ...
##  $ order                               : chr  "Hymenoptera" "Hymenoptera" "Hymenoptera" "Hymenoptera" ...
##  $ family                              : chr  "Apidae" "Apidae" "Apidae" "Apidae" ...
##  $ genus                               : chr  "Apis" "Apis" "Apis" "Apis" ...
##  $ genericName                         : chr  "Apis" "Apis" "Apis" "Apis" ...
##  $ specificEpithet                     : chr  "mellifera" "mellifera" "mellifera" "mellifera" ...
##  $ taxonRank                           : chr  "SPECIES" "SPECIES" "SPECIES" "SPECIES" ...
##  $ dateIdentified                      : chr  "2016-01-02T22:43:48.000+0000" "2016-01-05T03:56:10.000+0000" "2016-01-11T19:22:56.000+0000" "2016-01-12T18:07:05.000+0000" ...
##  $ year                                : int  2016 2016 2016 2016 2016 2016 2016 2016 2016 2016 ...
##  $ month                               : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ day                                 : int  1 5 11 12 31 22 22 17 2 24 ...
##  $ eventDate                           : chr  "2015-12-31T23:00:00.000+0000" "2016-01-05T00:16:52.000+0000" "2016-01-11T20:12:11.000+0000" "2016-01-11T23:00:00.000+0000" ...
##  $ modified                            : chr  "2016-01-03T00:25:12.000+0000" "2016-01-05T11:42:06.000+0000" "2016-01-13T21:05:37.000+0000" "2016-01-12T18:46:23.000+0000" ...
##  $ lastInterpreted                     : chr  "2016-02-26T00:12:11.464+0000" "2016-02-26T00:12:21.865+0000" "2016-02-26T00:12:42.761+0000" "2016-02-26T00:12:44.401+0000" ...
##  $ references                          : chr  "http://www.inaturalist.org/observations/2542710" "http://www.inaturalist.org/observations/2557600" "http://www.inaturalist.org/observations/2573140" "http://www.inaturalist.org/observations/2575739" ...
##  $ geodeticDatum                       : chr  "WGS84" "WGS84" "WGS84" "WGS84" ...
##  $ class                               : chr  "Insecta" "Insecta" "Insecta" "Insecta" ...
##  $ countryCode                         : chr  "MX" "US" "US" "US" ...
##  $ country                             : chr  "Mexico" "United States" "United States" "United States" ...
##  $ rightsHolder                        : chr  "Laura" "Damon Tighe" "Mary" "Sam Kieschnick" ...
##  $ identifier                          : chr  "2542710" "2557600" "2573140" "2575739" ...
##  $ verbatimEventDate                   : chr  "2016-01-01" "Mon Jan 04 2016 16:16:52 GMT-0800 (PST)" "2016-01-11 12:12:11 PM PST" "2016-01-12" ...
##  $ datasetName                         : chr  "iNaturalist research-grade observations" "iNaturalist research-grade observations" "iNaturalist research-grade observations" "iNaturalist research-grade observations" ...
##  $ gbifID                              : chr  "1227768619" "1227771417" "1229612532" "1229613441" ...
##  $ verbatimLocality                    : chr  "San Juan del Río, Querétaro" "San Diego Training Center, San Diego, CA, US" NA NA ...
##  $ collectionCode                      : chr  "Observations" "Observations" "Observations" "Observations" ...
##  $ occurrenceID                        : chr  "http://conabio.inaturalist.org/observations/2542710" "http://www.inaturalist.org/observations/2557600" "http://www.inaturalist.org/observations/2573140" "http://www.inaturalist.org/observations/2575739" ...
##  $ taxonID                             : chr  "47219" "47219" "47219" "47219" ...
##  $ license                             : chr  "http://creativecommons.org/licenses/by-nc/4.0/" "http://creativecommons.org/licenses/by-nc/4.0/" "http://creativecommons.org/licenses/by-nc/4.0/" "http://creativecommons.org/licenses/by-nc/4.0/" ...
##  $ recordedBy                          : chr  "Laura" "Damon Tighe" "Mary" "Sam Kieschnick" ...
##  $ catalogNumber                       : chr  "2542710" "2557600" "2573140" "2575739" ...
##  $ http://unknown.org/occurrenceDetails: chr  "http://conabio.inaturalist.org/observations/2542710" "http://www.inaturalist.org/observations/2557600" "http://www.inaturalist.org/observations/2573140" "http://www.inaturalist.org/observations/2575739" ...
##  $ institutionCode                     : chr  "iNaturalist" "iNaturalist" "iNaturalist" "iNaturalist" ...
##  $ rights                              : chr  "© Laura some rights reserved" "© Damon Tighe some rights reserved" "© Mary some rights reserved" "© Sam Kieschnick some rights reserved" ...
##  $ occurrenceRemarks                   : chr  "Periferia Presa Constitución 1917. San Juan del Río, Querétaro. Laura Uribe, Grupo Vasconcelos." NA "The bee had fallen into a container of water and I offered it a napkin to climb on. It took its time to dry off and then it fle"| __truncated__ "At lunch, I went over by this pond off of Seeton Road -- spotted a few ducks and some other organisms too." ...
##  $ identificationID                    : chr  "4725405" "4749666" "4788794" "4795726" ...
##  $ eventTime                           : chr  NA "00:16:52Z" "20:12:11Z" NA ...
##  $ coordinateAccuracy                  : num  NA NA 0.024887 0.000036 NA ...
##  $ coordinateAccuracyInMeters          : num  NA NA 2752 3.98 NA ...
##  $ informationWithheld                 : chr  NA NA NA NA ...

Data structure

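Both GBIF and the GQ API use Darwin Core (DwC) as the standard for biodiversity data exchange. This standard suggests certain specific names and formats for data values. In particular, the fields relevant to the checks performed here are named decimalLatitude, decimalLongitude, countryCode and scientificName in DwC.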

The data frame we obtained in the previous step is already formatted according to the DwC standard:

"decimalLatitude" %in% names(d)
## [1] TRUE
"decimalLongitude" %in% names(d)
## [1] TRUE
"countryCode" %in% names(d)
## [1] TRUE
"scientificName" %in% names(d)
## [1] TRUE

Therefore, we don’t need any further transformation of the data frame, and we can proceed to assess the geospatial quality of the records.
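The four individual checks above can be folded into one reusable helper. What follows is only a minimal sketch: the function name missing_dwc_fields and the mock data frame are hypothetical and are not part of rgbif or rgeospatialquality.

```r
# Hypothetical helper (not part of rgbif or rgeospatialquality):
# returns the DwC field names that are absent from a data frame.
missing_dwc_fields <- function(df,
                               required = c("decimalLatitude",
                                            "decimalLongitude",
                                            "countryCode",
                                            "scientificName")) {
    setdiff(required, names(df))
}

# Mock records standing in for the output of occ_data();
# scientificName is left out on purpose.
mock <- data.frame(
    decimalLatitude = c(20.4, 32.7),
    decimalLongitude = c(-100.1, -117.2),
    countryCode = c("MX", "US"),
    stringsAsFactors = FALSE
)

missing_dwc_fields(mock)
## [1] "scientificName"
```

An empty result means that every expected field is present and no renaming is needed before assessing the records.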

Sending the records to the GQ API

We will use the add_flags function to assess the quality of a set of several records at once. This function is a wrapper for the POST method of the GQ API.

Internally, the function transforms the content of the supplied data.frame to JSON and performs the POST request. It then translates the results back from JSON to a new data.frame. The resulting object has the same structure as the provided one, with the addition of a new element called flags. That element is itself a data.frame with one column per check, each holding the result of that check for every record. Please see the GQ API documentation for more information on the functioning of the API.
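The round trip described above can be illustrated without contacting the API. The sketch below assumes the jsonlite package for the conversions, which may differ from what rgeospatialquality uses internally, and the response string is made up by hand to mimic the shape of the documented output; it is not a real GQ API reply.

```r
library(jsonlite)

# Two mock records using the DwC field names
records <- data.frame(
    decimalLatitude = c(20.4, 32.7),
    decimalLongitude = c(-100.1, -117.2),
    countryCode = c("MX", "US"),
    scientificName = c("Apis mellifera", "Apis mellifera"),
    stringsAsFactors = FALSE
)

# Step 1: data frame -> JSON array of objects (what a POST body could look like)
body <- toJSON(records, digits = NA)

# Step 2 (simulated, no network call): a hand-written response in which
# each record carries an extra "flags" object. The values are invented.
response <- '[
  {"decimalLatitude": 20.4, "decimalLongitude": -100.1,
   "countryCode": "MX", "scientificName": "Apis mellifera",
   "flags": {"hasCoordinates": true, "highPrecisionCoordinates": true}},
  {"decimalLatitude": 32.7, "decimalLongitude": -117.2,
   "countryCode": "US", "scientificName": "Apis mellifera",
   "flags": {"hasCoordinates": true, "highPrecisionCoordinates": false}}
]'

# Step 3: JSON -> data frame; the nested "flags" objects become
# a nested data frame column
result <- fromJSON(response)
str(result)
```

The point of the sketch is the last step: because each record in the JSON carries its own flags object, the parsed result gains a single flags column that is itself a data frame.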

library(rgeospatialquality)

dd <- add_flags(d)
str(dd)
## 'data.frame':    50 obs. of  62 variables:
##  $ name                                : chr  "Apis mellifera" "Apis mellifera" "Apis mellifera" "Apis mellifera" ...
##  $ key                                 : int  1227768619 1227771417 1229612532 1229613441 1249281806 1233602060 1233603193 1249279266 1227770975 1233598326 ...
##  $ decimalLatitude                     : num  20.4 32.7 34.1 32.6 33.1 ...
##  $ decimalLongitude                    : num  -100.1 -117.2 -118 -97 -96.6 ...
##  $ issues                              : chr  "cdround,cudc,gass84" "cdround,cudc,gass84" "cdround,cudc,gass84" "cdround,cudc,gass84" ...
##  $ datasetKey                          : chr  "50c9509d-22c7-4a22-a47d-8c48425ef4a7" "50c9509d-22c7-4a22-a47d-8c48425ef4a7" "50c9509d-22c7-4a22-a47d-8c48425ef4a7" "50c9509d-22c7-4a22-a47d-8c48425ef4a7" ...
##  $ publishingOrgKey                    : chr  "28eb1a3f-1c15-4a95-931a-4af90ecb574d" "28eb1a3f-1c15-4a95-931a-4af90ecb574d" "28eb1a3f-1c15-4a95-931a-4af90ecb574d" "28eb1a3f-1c15-4a95-931a-4af90ecb574d" ...
##  $ publishingCountry                   : chr  "US" "US" "US" "US" ...
##  $ protocol                            : chr  "DWC_ARCHIVE" "DWC_ARCHIVE" "DWC_ARCHIVE" "DWC_ARCHIVE" ...
##  $ lastCrawled                         : chr  "2016-02-25T23:57:25.091+0000" "2016-02-25T23:57:29.193+0000" "2016-02-25T23:57:38.240+0000" "2016-02-25T23:57:39.643+0000" ...
##  $ lastParsed                          : chr  "2016-02-25T23:58:11.842+0000" "2016-02-25T23:58:17.571+0000" "2016-02-25T23:58:29.078+0000" "2016-02-25T23:58:31.027+0000" ...
##  $ basisOfRecord                       : chr  "HUMAN_OBSERVATION" "HUMAN_OBSERVATION" "HUMAN_OBSERVATION" "HUMAN_OBSERVATION" ...
##  $ taxonKey                            : int  1341976 1341976 1341976 1341976 1341976 1341976 1341976 1341976 1341976 1341976 ...
##  $ kingdomKey                          : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ phylumKey                           : int  54 54 54 54 54 54 54 54 54 54 ...
##  $ classKey                            : int  216 216 216 216 216 216 216 216 216 216 ...
##  $ orderKey                            : int  1457 1457 1457 1457 1457 1457 1457 1457 1457 1457 ...
##  $ familyKey                           : int  4334 4334 4334 4334 4334 4334 4334 4334 4334 4334 ...
##  $ genusKey                            : int  1334757 1334757 1334757 1334757 1334757 1334757 1334757 1334757 1334757 1334757 ...
##  $ scientificName                      : chr  "Apis mellifera Linnaeus, 1758" "Apis mellifera Linnaeus, 1758" "Apis mellifera Linnaeus, 1758" "Apis mellifera Linnaeus, 1758" ...
##  $ kingdom                             : chr  "Animalia" "Animalia" "Animalia" "Animalia" ...
##  $ phylum                              : chr  "Arthropoda" "Arthropoda" "Arthropoda" "Arthropoda" ...
##  $ order                               : chr  "Hymenoptera" "Hymenoptera" "Hymenoptera" "Hymenoptera" ...
##  $ family                              : chr  "Apidae" "Apidae" "Apidae" "Apidae" ...
##  $ genus                               : chr  "Apis" "Apis" "Apis" "Apis" ...
##  $ genericName                         : chr  "Apis" "Apis" "Apis" "Apis" ...
##  $ specificEpithet                     : chr  "mellifera" "mellifera" "mellifera" "mellifera" ...
##  $ taxonRank                           : chr  "SPECIES" "SPECIES" "SPECIES" "SPECIES" ...
##  $ dateIdentified                      : chr  "2016-01-02T22:43:48.000+0000" "2016-01-05T03:56:10.000+0000" "2016-01-11T19:22:56.000+0000" "2016-01-12T18:07:05.000+0000" ...
##  $ year                                : int  2016 2016 2016 2016 2016 2016 2016 2016 2016 2016 ...
##  $ month                               : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ day                                 : int  1 5 11 12 31 22 22 17 2 24 ...
##  $ eventDate                           : chr  "2015-12-31T23:00:00.000+0000" "2016-01-05T00:16:52.000+0000" "2016-01-11T20:12:11.000+0000" "2016-01-11T23:00:00.000+0000" ...
##  $ modified                            : chr  "2016-01-03T00:25:12.000+0000" "2016-01-05T11:42:06.000+0000" "2016-01-13T21:05:37.000+0000" "2016-01-12T18:46:23.000+0000" ...
##  $ lastInterpreted                     : chr  "2016-02-26T00:12:11.464+0000" "2016-02-26T00:12:21.865+0000" "2016-02-26T00:12:42.761+0000" "2016-02-26T00:12:44.401+0000" ...
##  $ references                          : chr  "http://www.inaturalist.org/observations/2542710" "http://www.inaturalist.org/observations/2557600" "http://www.inaturalist.org/observations/2573140" "http://www.inaturalist.org/observations/2575739" ...
##  $ geodeticDatum                       : chr  "WGS84" "WGS84" "WGS84" "WGS84" ...
##  $ class                               : chr  "Insecta" "Insecta" "Insecta" "Insecta" ...
##  $ countryCode                         : chr  "MX" "US" "US" "US" ...
##  $ country                             : chr  "Mexico" "United States" "United States" "United States" ...
##  $ rightsHolder                        : chr  "Laura" "Damon Tighe" "Mary" "Sam Kieschnick" ...
##  $ identifier                          : chr  "2542710" "2557600" "2573140" "2575739" ...
##  $ verbatimEventDate                   : chr  "2016-01-01" "Mon Jan 04 2016 16:16:52 GMT-0800 (PST)" "2016-01-11 12:12:11 PM PST" "2016-01-12" ...
##  $ datasetName                         : chr  "iNaturalist research-grade observations" "iNaturalist research-grade observations" "iNaturalist research-grade observations" "iNaturalist research-grade observations" ...
##  $ gbifID                              : chr  "1227768619" "1227771417" "1229612532" "1229613441" ...
##  $ verbatimLocality                    : chr  "San Juan del Río, Querétaro" "San Diego Training Center, San Diego, CA, US" NA NA ...
##  $ collectionCode                      : chr  "Observations" "Observations" "Observations" "Observations" ...
##  $ occurrenceID                        : chr  "http://conabio.inaturalist.org/observations/2542710" "http://www.inaturalist.org/observations/2557600" "http://www.inaturalist.org/observations/2573140" "http://www.inaturalist.org/observations/2575739" ...
##  $ taxonID                             : chr  "47219" "47219" "47219" "47219" ...
##  $ license                             : chr  "http://creativecommons.org/licenses/by-nc/4.0/" "http://creativecommons.org/licenses/by-nc/4.0/" "http://creativecommons.org/licenses/by-nc/4.0/" "http://creativecommons.org/licenses/by-nc/4.0/" ...
##  $ recordedBy                          : chr  "Laura" "Damon Tighe" "Mary" "Sam Kieschnick" ...
##  $ catalogNumber                       : chr  "2542710" "2557600" "2573140" "2575739" ...
##  $ http://unknown.org/occurrenceDetails: chr  "http://conabio.inaturalist.org/observations/2542710" "http://www.inaturalist.org/observations/2557600" "http://www.inaturalist.org/observations/2573140" "http://www.inaturalist.org/observations/2575739" ...
##  $ institutionCode                     : chr  "iNaturalist" "iNaturalist" "iNaturalist" "iNaturalist" ...
##  $ rights                              : chr  "© Laura some rights reserved" "© Damon Tighe some rights reserved" "© Mary some rights reserved" "© Sam Kieschnick some rights reserved" ...
##  $ occurrenceRemarks                   : chr  "Periferia Presa Constitución 1917. San Juan del Río, Querétaro. Laura Uribe, Grupo Vasconcelos." NA "The bee had fallen into a container of water and I offered it a napkin to climb on. It took its time to dry off and then it fle"| __truncated__ "At lunch, I went over by this pond off of Seeton Road -- spotted a few ducks and some other organisms too." ...
##  $ identificationID                    : chr  "4725405" "4749666" "4788794" "4795726" ...
##  $ eventTime                           : chr  NA "00:16:52Z" "20:12:11Z" NA ...
##  $ coordinateAccuracy                  : num  NA NA 0.024887 0.000036 NA ...
##  $ coordinateAccuracyInMeters          : num  NA NA 2752 3.98 NA ...
##  $ informationWithheld                 : chr  NA NA NA NA ...
##  $ flags                               :'data.frame':    50 obs. of  12 variables:
##   ..$ hasCoordinates          : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##   ..$ validCountry            : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##   ..$ validCoordinates        : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##   ..$ hasCountry              : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##   ..$ coordinatesInsideCountry: logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##   ..$ hasScientificName       : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##   ..$ highPrecisionCoordinates: logi  TRUE FALSE TRUE TRUE TRUE TRUE ...
##   ..$ nonZeroCoordinates      : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##   ..$ negatedLatitude         : logi  NA NA NA NA NA NA ...
##   ..$ negatedLongitude        : logi  NA NA NA NA NA NA ...
##   ..$ distanceToCountryInKm   : num  NA NA NA NA NA NA NA 0 NA NA ...
##   ..$ transposedCoordinates   : logi  NA NA NA NA NA NA ...
dd[1,]$flags  # Flags for the first record
##   hasCoordinates validCountry validCoordinates hasCountry
## 1           TRUE         TRUE             TRUE       TRUE
##   coordinatesInsideCountry hasScientificName highPrecisionCoordinates
## 1                     TRUE              TRUE                     TRUE
##   nonZeroCoordinates negatedLatitude negatedLongitude
## 1               TRUE              NA               NA
##   distanceToCountryInKm transposedCoordinates
## 1                    NA                    NA
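Since flags is a nested data frame, its columns can be used directly to summarise or subset the records. The following sketch uses a small mock object (mock_dd, with invented values) that mimics the structure shown above, so that it runs without calling the API; the same expressions should apply to dd, assuming it has the structure printed by str.

```r
# Mock flags mimicking part of the structure returned by add_flags()
flags <- data.frame(
    hasCoordinates = c(TRUE, TRUE, TRUE),
    coordinatesInsideCountry = c(TRUE, TRUE, FALSE),
    highPrecisionCoordinates = c(TRUE, FALSE, TRUE)
)
mock_dd <- data.frame(
    key = c(1L, 2L, 3L),
    countryCode = c("MX", "US", "US"),
    stringsAsFactors = FALSE
)
mock_dd$flags <- flags  # attach as a nested data frame column

# Number of records failing each check (NA values are ignored)
failures <- sapply(mock_dd$flags, function(x) sum(!x, na.rm = TRUE))
failures

# Keep only the records that pass both coordinate checks;
# which() drops records where either flag is NA
ok <- mock_dd$flags$coordinatesInsideCountry &
    mock_dd$flags$highPrecisionCoordinates
clean <- mock_dd[which(ok), ]
clean$key
```

Using which() rather than the logical vector itself is a deliberate choice here: several flags in the real output are NA, and indexing a data frame with NA would introduce empty rows.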

References