library(frictionless)
library(GeoLocatoR)
library(tidyverse)
library(writexl)
library(readxl)
Online version: https://rpubs.com/rafnuss/geolocator_create_from_soi
In this vignette, we cover the steps to create the Core GeoLocator Data Package (i.e., before analysis) for the Swiss Ornithological Institute (SOI) geolocator dataset.
Illustration of the entire SOI data pipeline. This script covers steps 1, 2, 4, 5 and 6.
1 Create Project
A project will result in a single data package, but usually consists of multiple orders. As such, the creation of project metadata is only performed once.
First, define the path where you are working (typically on the Z: drive) and your project name.
dp_path <- "Z:/DOM_Forschung/UNIT_Vogelzug/40 Data/20 Geolocator/"
# dp_path <- '/Volumes/Daten/DOM_Forschung/UNIT_Vogelzug/40 Data/20 Geolocator/' # Mac User
project_name <- "test"
1.1 Create the project folder
You can create a new RStudio project with:
usethis::create_project(path = glue::glue("{dp_path}/Datapackage/{project_name}"))
Add the Scientific Collaboration Agreement for the data to this folder.
1.2 Initiate DataPackage with datapackage.json
Start by setting the most basic metadata. The required metadata (marked with * below) should be defined in the data agreement, while optional metadata can be added later. These metadata make up the datapackage.json file:
- title*
- contributors*: including email, to define who has access to the private Zenodo record during the embargo
- embargo*: default is no embargo, expressed as a date in the past (e.g. 1970-01-01)
- licenses*: CC-BY-4.0 by default
- description
- relatedIdentifiers
- grants
- keywords
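To illustrate the required/optional split and the embargo convention above, the sketch below checks a plain R list of metadata. `check_basic_metadata()` is a hypothetical helper, not part of GeoLocatoR; the real checks against the profile are done by `validate_gldp()`, and `create_gldp()` fills in default licenses for you.

```r
# Hypothetical helper (not part of GeoLocatoR): checks a plain list of metadata
check_basic_metadata <- function(meta, today = Sys.Date()) {
  # Fields marked with * in the list above
  required <- c("title", "contributors", "embargo", "licenses")
  missing <- setdiff(required, names(meta))
  # An embargo date in the past (e.g. 1970-01-01) means there is no embargo
  embargo_active <- !is.null(meta$embargo) && as.Date(meta$embargo) > today
  list(missing = missing, embargo_active = embargo_active)
}

meta <- list(
  title = "Geolocator study of {species_name} in {location}",
  contributors = list(list(title = "Yann Rime")),
  embargo = "1970-01-01"
)
check_basic_metadata(meta)
```

Here `licenses` is reported as missing and the embargo is inactive, because 1970-01-01 lies in the past.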
pkg <- create_gldp(
title = "Geolocator study of {species_name} in {location}", # required
contributors = list( # required
list(
title = "Raphaël Nussbaumer",
roles = c("ContactPerson", "DataCurator", "ProjectLeader"),
email = "raphael.nussbaumer@vogelwarte.ch",
path = "https://orcid.org/0000-0002-8185-1020",
organization = "Swiss Ornithological Institute"
),
list(
title = "Yann Rime",
roles = c("Researcher"),
email = "yann.rimme@vogelwarte.ch",
path = "https://orcid.org/0009-0005-7264-6753",
organization = "Swiss Ornithological Institute"
)
),
# licenses = The default licenses should be ok in most cases
embargo = "2028-01-01"
)
Read the Data Package specification to learn about all recommended metadata. These can be set in create_gldp() or added directly to pkg afterwards, as below:
# The description is important: it provides textual background information on the project.
pkg$description <- "Geolocator study of Mangrove Kingfisher and Red-capped Robin-chat on the coast of Kenya"
# You can also add keywords:
pkg$keywords <- c("Mangrove Kingfisher", "Red-capped Robin Chat", "multi-sensor geolocator")
# Funding sources
pkg$grants <- c("Swiss Ornithological Institute")
# Identifiers of resources related to the package (e.g. papers, project pages, derived datasets, APIs, etc.).
pkg$relatedIdentifiers <- list(
list(
relationType = "IsDescribedBy",
relatedIdentifier = "10.13140/RG.2.2.34477.10721",
relatedIdentifierType = "DOI"
)
)
3 Processing returned data
3.1 Process geolocator data
The geolocator data can be extracted and stored as usual on the Z: drive.
3.3 Create DataPackage with data
We can now use read_gldp() to read the data package from the previous version, and add_gldp_soi() to add the three core resources (tags, observations and measurements) to pkg:
pkg <- read_gldp(file = file.path(version, "datapackage.json")) # Read the datapackage from the folder just created
# Add data by selection
pkg <- pkg %>% add_gldp_soi(
gdl = gdl0 %>% filter((GDL_ID %in% tags(pkg)$tag_id) & (OrderName == "OtuScoES23")),
directory_data = "../../10 Raw data"
)
# Bump the version
pkg$version <- "v0.1.2"
# v(0 = not analysed).(1 = first year of the project).(revision, incremented for each new version of the package with the same list of tag_id)
# Display the package
print(pkg)
── A GeoLocator Data Package (v0.2)
• title: "Geolocator study of {species_name} in {location}"
• contributors:
Raphaël Nussbaumer ('raphael.nussbaumer@vogelwarte.ch') (ContactPerson,
DataCurator, ProjectLeader) - <https://orcid.org/0000-0002-8185-1020>
Yann Rime ('yann.rimme@vogelwarte.ch') (Researcher) -
<https://orcid.org/0009-0005-7264-6753>
• embargo: 2028-01-01
• licenses: Creative Commons Attribution 4.0 (CC-BY-4.0) -
<https://creativecommons.org/licenses/by/4.0/>
• description: "Geolocator study of Mangrove Kingfisher and Red-capped
Robin-chat on the coast of Kenya"
• version: "v0.1.2"
• relatedIdentifiers:
IsDescribedBy <10.13140/RG.2.2.34477.10721>
• grants: "Swiss Ornithological Institute"
• keywords: "Mangrove Kingfisher", "Red-capped Robin Chat", and "multi-sensor
geolocator"
• created: 2025-01-13 10:57:54
• spatial: Polygon and c(-3.840505, -3.794914, -3.794914, -3.840505, -3.840505,
43.455195, 43.455195, 43.485319, 43.485319, 43.455195)
• temporal: "2023-07-01" to "2024-08-20"
• taxonomic: "Otus scops"
• numberTags:
tags: 22
measurements: 5
light: 5
pressure: 5
activity: 5
magnetic: 5
── 3 resources:
• tags
• observations
• measurements
Use `unclass()` to print the Geolocator Data Package as a list.
Because we have just added the three core resources, some additional package metadata (e.g., spatial, temporal, taxonomic and numberTags) were computed automatically. You can now see, for instance, the total number of tags.
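One plausible way to derive such summaries is sketched below on a toy table with the columns of the measurements resource (tag_id, sensor, datetime, value). The tag IDs and values are placeholders, and this is an illustration only, not GeoLocatoR's actual implementation.

```r
# Toy measurements table with placeholder values (same columns as the measurements resource)
measurements <- data.frame(
  tag_id   = c("tag_A", "tag_A", "tag_A", "tag_B", "tag_B"),
  sensor   = c("light", "pressure", "activity", "light", "pressure"),
  datetime = as.POSIXct(
    c("2023-07-01 12:00", "2023-07-01 12:00", "2023-07-01 12:05",
      "2024-08-19 12:00", "2024-08-20 12:00"),
    tz = "UTC"
  ),
  value    = c(0, 1010, 3, 12, 995)
)

# Number of distinct tags with measurements
n_tags_with_measurements <- length(unique(measurements$tag_id))

# Number of distinct tags per sensor
n_tags_per_sensor <- tapply(measurements$tag_id, measurements$sensor,
                            function(x) length(unique(x)))

# Temporal coverage, expressed as UTC dates
temporal <- as.character(as.Date(range(measurements$datetime), tz = "UTC"))

n_tags_with_measurements
n_tags_per_sensor
temporal
```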
3.4 Validate datapackage
Before publishing/sharing your data, it is essential to validate your GeoLocator Data Package.
validate_gldp(pkg)
── Check GeoLocator DataPackage profile ──
✔ title is valid.
✔ contributors is valid.
✔ contributors[[1]] is valid.
✔ contributors[[2]] is valid.
✔ embargo is valid.
✔ licenses is valid.
✔ licenses[[1]] is valid.
✔ created is valid.
✔ $schema is valid.
✔ resources is valid.
✔ resources[[1]] is valid.
✔ resources[[2]] is valid.
✔ resources[[3]] is valid.
✔ description is valid.
✔ keywords is valid.
✔ keywords[[1]] is valid.
✔ keywords[[2]] is valid.
✔ keywords[[3]] is valid.
✔ grants is valid.
✔ grants[[1]] is valid.
✔ relatedIdentifiers is valid.
✔ relatedIdentifiers[[1]] is valid.
✔ taxonomic is valid.
✔ taxonomic[[1]] is valid.
✔ numberTags is valid.
✔ numberTags$tags is valid.
✔ numberTags$measurements is valid.
✔ numberTags$light is valid.
✔ numberTags$pressure is valid.
✔ numberTags$activity is valid.
✔ numberTags$temperature_external is valid.
✔ numberTags$temperature_internal is valid.
✔ numberTags$magnetic is valid.
✔ numberTags$wet_count is valid.
✔ numberTags$conductivity is valid.
✔ spatial is valid.
! spatial cannot be validated (external schema).
✔ version is valid.
✔ temporal is valid.
✔ temporal$start is valid.
✔ temporal$end is valid.
✔ Package is consistent with the profile.
── Check GeoLocator DataPackage Resources
── Check GeoLocator DataPackage Resources tags ──
✔ tags$tag_id is valid.
✔ tags$ring_number is valid.
✔ tags$scientific_name is valid.
✔ tags$manufacturer is valid.
✔ tags$model is valid.
✔ tags$firmware is valid.
✔ tags$weight is valid.
✔ tags$attachment_type is valid.
✔ tags$readout_method is valid.
✔ tags$tag_comments is valid.
✔ Table tags is consistent with the schema.
── Check GeoLocator DataPackage Resources observations ──
✔ observations$ring_number is valid.
✔ observations$tag_id is valid.
✔ observations$observation_type is valid.
✔ observations$datetime is valid.
✔ observations$latitude is valid.
✔ observations$longitude is valid.
✔ observations$location_name is valid.
✔ observations$device_status is valid.
✔ observations$observer is valid.
✔ observations$catching_method is valid.
✔ observations$age_class is valid.
✔ observations$sex is valid.
✔ observations$condition is valid.
✔ observations$mass is valid.
✔ observations$wing_length is valid.
✔ observations$additional_metric is valid.
✔ observations$observation_comments is valid.
✔ Table observations is consistent with the schema.
── Check GeoLocator DataPackage Resources measurements ──
✔ measurements$tag_id is valid.
✔ measurements$sensor is valid.
✔ measurements$datetime is valid.
✔ measurements$value is valid.
✔ measurements$label is valid.
✔ Table measurements is consistent with the schema.
✔ Package's ressources are valid.
── Check GeoLocator DataPackage Coherence
✔ Package is internally coherent.
── Check Observations Coherence
✔ observations table is coherent.
✔ Package is valid.
4 Create the Zenodo repository
4.1 Option 1: Manually
First, create a new deposit on Zenodo and reserve the DOI so that you can define the package id.
The package id should be the concept DOI, that is, the DOI that does not change with new versions. The DOI displayed on Zenodo is actually the DOI of the first version, but you can retrieve the concept DOI by subtracting 1 from the record ID:
pkg$id <- "https://doi.org/10.5281/zenodo.{ZENODO_ID - 1}"
# e.g. "10.5281/zenodo.14620590" for a DOI reserved as 10.5281/zenodo.14620591
# Update the bibliographic citation with this new DOI
pkg <- pkg %>% update_gldp_bibliographic_citation()
Now, we can write the data package to file with:
write_package(pkg, directory = pkg$version)
The content of the created folder can now be uploaded to your Zenodo deposit.
You can populate all other fields on Zenodo with the information provided in datapackage.json. Note that the data package contributors correspond to the creators on Zenodo, not to the Zenodo contributors.
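The "subtract 1" rule above is a rule of thumb, and Option 2 below avoids it by asking zen4R for the concept DOI directly. If you do apply it by hand, the hypothetical helpers below (not part of GeoLocatoR or zen4R) make the arithmetic explicit and also show how to strip the resolver prefix from pkg$id; `fixed = TRUE` keeps the dots in the URL from being read as regex wildcards.

```r
# Hypothetical helpers (not part of GeoLocatoR or zen4R)

# Concept DOI from the DOI reserved for the first version, assuming the
# "record ID minus 1" rule of thumb described above holds for your deposit
concept_doi_from_reserved <- function(reserved_doi) {
  # Integer arithmetic avoids scientific notation (e.g. "1.5e+07") when pasting
  id <- as.integer(sub("^.*zenodo\\.", "", reserved_doi))
  paste0("https://doi.org/10.5281/zenodo.", id - 1L)
}

# Remove the "https://doi.org/" resolver prefix from a package id
strip_doi_prefix <- function(id) {
  sub("https://doi.org/", "", id, fixed = TRUE)
}

concept_doi_from_reserved("10.5281/zenodo.14620591")
strip_doi_prefix("https://doi.org/10.5281/zenodo.14620590")
```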
4.2 Option 2: Programmatically
A more efficient solution is to create a deposit on Zenodo using the API. For this, you first need to create a token and save it to your keyring with:
keyring::key_set_with_value("ZENODO_PAT", password = "{your_zenodo_token}")
This allows us to create a ZenodoManager object, which will be useful later.
zenodo <- zen4R::ZenodoManager$new(token = keyring::key_get(service = "ZENODO_PAT"))
✔ Successfully connected to Zenodo with user token
You can create a zen4R::ZenodoRecord object from pkg:
z <- gldp2zenodoRecord(pkg)
✔ Successfully connected to Zenodo with user token
✔ Successfully fetched resourcetype 'dataset'
✔ Successfully fetched list of affiliations!
Warning: ! Zenodo's creator can only have a single role.
→ Only the first role will be kept
✔ Successfully fetched list of affiliations!
✔ Successfully fetched license 'cc-by-4.0'
✔ Successfully fetched list of funders!
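The warning above indicates that each contributor keeps a single role on Zenodo. To see in advance which role that will be under the stated "first role only" rule, a hypothetical helper such as `first_roles()` (not part of GeoLocatoR or zen4R, and not the code used by `gldp2zenodoRecord()`) can be run on a contributors list with the same structure as in `create_gldp()`. The contributor names below are placeholders.

```r
# Hypothetical helper: keep only the first role of each contributor,
# mirroring the "Only the first role will be kept" warning
first_roles <- function(contributors) {
  roles <- vapply(contributors, function(x) x$roles[[1]], character(1))
  names(roles) <- vapply(contributors, function(x) x$title, character(1))
  roles
}

# Placeholder contributors with the same structure as in create_gldp()
contributors <- list(
  list(title = "Contributor A", roles = c("Researcher")),
  list(title = "Contributor B", roles = c("ContactPerson", "DataCurator", "ProjectLeader"))
)
first_roles(contributors)
```

Under this rule, the role you want shown on Zenodo should come first in `roles`.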
print(z)
<ZenodoRecord>
....|-- created: <NULL>
....|-- updated: <NULL>
....|-- revision_id: <NULL>
....|-- is_draft: <NULL>
....|-- is_published: <NULL>
....|-- status: <NULL>
....|-- versions: <NULL>
....|-- access:
........|-- record: public
........|-- files: restricted
........|-- embargo:
............|-- active: TRUE
............|-- until: 2028-01-01
............|-- reason:
....|-- files: <NULL>
....|-- id: <NULL>
....|-- links: <NULL>
....|-- metadata:
........|-- resource_type:
............|-- id: dataset
........|-- publisher: Zenodo
........|-- title: Geolocator study of {species_name} in {location}
........|-- creators:
........|-- rights:
........|-- description: Geolocator study of Mangrove Kingfisher and Red-capped Robin-chat on the coast of Kenya
........|-- version: v0.1.2
........|-- related_identifiers:
........|-- subjects:
........|-- publication_date: 2025-01-13
....|-- parent: <NULL>
....|-- pids: <NULL>
....|-- stats: <NULL>
Learn more about the zen4R package!
You can now create the deposit on Zenodo. For this, we reserve the DOI without publishing the record yet, since it does not contain any data.
z <- zenodo$depositRecord(z, reserveDOI = TRUE, publish = FALSE)
✔ Successful record deposition
✔ Successful reserved DOI for record 15004303
You can now open this record in your browser using its self_html link:
print(z$links$self_html)
[1] "https://zenodo.org/uploads/15004303"
We can retrieve the concept DOI to build the pkg id:
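pkg$id <- paste0("https://doi.org/", z$getConceptDOI())
# Update the bibliographic citation with this new DOI (as in Option 1)
pkg <- pkg %>% update_gldp_bibliographic_citation()
We can now write the data package and upload its files to the deposit (or do it manually from the website):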
write_package(pkg, directory = pkg$version)
for (f in list.files(pkg$version)) {
zenodo$uploadFile(file.path(pkg$version, f), z)
}
At this stage, the Zenodo record is still not published. This is intentionally not done automatically, so that you can check the record before publishing.
A nice feature of Zenodo is that you can share the record with others (e.g., co-authors) BEFORE publication, allowing them to check everything first.
If any metadata are modified on Zenodo, you can overwrite pkg’s metadata with:
z_updated <- zenodo$getDepositionByConceptDOI(z$getConceptDOI())
pkg <- zenodoRecord2gldp(z_updated, pkg)
5 Following years
In the following years, the same two operations need to be performed for a subset of the geolocators.
5.1 Placing order
Creating tags.csv and observations.csv follows the same procedure: read the data package from datapackage.json, then filter the corresponding gdl (using OtuScoES24 for the new year). add_gldp_soi() gives priority to the existing tags and observations and only generates empty rows for the new geolocators.
pkg <- read_gldp("v0.1.2/datapackage.json") %>%
add_gldp_soi(
gdl = gdl0 %>% filter(str_detect(OrderName, "OtuScoES24")),
directory_data = "../../data"
)
pkg$version <- "v0.2.0" # v(0 = not analysed).(2 = second year of the project).(0 = empty tables)
write_package(pkg, directory = pkg$version)
The resulting tags.csv and observations.csv therefore combine last year’s and the current year’s data.
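Conceptually, keeping the existing rows and adding empty rows only for the new geolocators resembles the base-R sketch below. `merge_new_tags()` and the toy table are hypothetical and only illustrate the idea; they are not how `add_gldp_soi()` is implemented.

```r
# Hypothetical helper: keep existing tag rows, append empty rows for new tag_ids only
merge_new_tags <- function(existing, new_ids) {
  to_add <- setdiff(new_ids, existing$tag_id)
  # Rows of NA with the same columns as the existing table
  empty <- existing[rep(NA_integer_, length(to_add)), , drop = FALSE]
  empty$tag_id <- to_add
  out <- rbind(existing, empty)
  rownames(out) <- NULL
  out
}

# Placeholder table standing in for last year's tags.csv
existing_tags <- data.frame(
  tag_id = c("tag_A", "tag_B"),
  ring_number = c("A1", "A2"),
  stringsAsFactors = FALSE
)
merge_new_tags(existing_tags, c("tag_B", "tag_C"))
```

Existing rows (here tag_B) are left untouched, and only tag_C receives a new, empty row for the ringers to fill in.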
5.2 Processing returned data
The newly filled tags.csv and observations.csv files returned by the ringers should now be placed in a new version folder, v0.2.1.
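As the code comments suggest, version labels follow the pattern v<analysed>.<project year>.<revision>: the first number stays 0 until the data are analysed, the second is the project year, and the third counts successive versions within that year. The hypothetical helpers below (not part of GeoLocatoR) sketch how such labels could be parsed and bumped.

```r
# Hypothetical helpers for version labels of the form v<analysed>.<year>.<revision>
parse_version <- function(v) {
  parts <- as.integer(strsplit(sub("^v", "", v), ".", fixed = TRUE)[[1]])
  stopifnot(length(parts) == 3, !anyNA(parts))
  setNames(parts, c("analysed", "year", "revision"))
}

# Next revision within the same project year (e.g. returned tables, then data added)
bump_revision <- function(v) {
  p <- parse_version(v)
  sprintf("v%d.%d.%d", p[["analysed"]], p[["year"]], p[["revision"]] + 1L)
}

# First version (empty tables) of the next project year
new_year <- function(v) {
  p <- parse_version(v)
  sprintf("v%d.%d.0", p[["analysed"]], p[["year"]] + 1L)
}

bump_revision("v0.2.1")
new_year("v0.1.2")
```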
version <- "v0.2.1" # v(0 = not analysed).(2 = second year of the project).(1 = first version returned)
pkg <- read_gldp(file = file.path(version, "datapackage.json")) # Read the data package from the folder just created
# Add data by selection
pkg <- pkg %>% add_gldp_soi(
gdl = gdl0 %>% filter((GDL_ID %in% tags(pkg)$tag_id) & (OrderName == "OtuScoES24")),
directory_data = "../../10 Raw data"
)
Create the new version with:
# Bump the version
pkg$version <- "v0.2.2"
# v(0 = not analysed).(2 = second year of the project).(revision, incremented for each new version of the package with the same list of tag_id)
write_package(pkg, directory = pkg$version)
5.3 Update Zenodo
This step has not been tested yet. If the record has already been published, a new deposit may need to be created.
We can retrieve the latest record from the concept DOI:
z <- zenodo$getDepositionByConceptDOI(sub("https://doi.org/", "", pkg$id, fixed = TRUE))
If any metadata in datapackage.json have changed, we can update the deposit with:
z <- gldp2zenodoRecord(pkg, z)
✔ Successfully connected to Zenodo with user token
✔ Successfully fetched resourcetype 'dataset'
✔ Successfully fetched list of affiliations!
Warning: ! Zenodo's creator can only have a single role.
→ Only the first role will be kept
✔ Successfully fetched list of affiliations!
✔ Successfully fetched license 'cc-by-4.0'
✔ Successfully fetched list of funders!
We can create a new version with
z <- zenodo$depositRecordVersion(
z,
delete_latest_files = TRUE,
files = file.path(pkg$version, list.files(pkg$version)),
publish = FALSE
)
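`list.files()` returns bare file names, which is why they are combined with the folder via `file.path()` in the upload steps above; `list.files(full.names = TRUE)` gives the same paths. The self-contained sketch below checks this on a temporary folder standing in for `pkg$version` (the file names are placeholders).

```r
# A temporary folder standing in for pkg$version, with placeholder files
version_dir <- file.path(tempdir(), "v0.2.2")
dir.create(version_dir, showWarnings = FALSE)
invisible(file.create(file.path(version_dir, c("datapackage.json", "tags.csv", "observations.csv"))))

# Bare names joined to the folder, as in the upload steps above
files_a <- file.path(version_dir, list.files(version_dir))
# Equivalent: let list.files() prepend the folder itself
files_b <- list.files(version_dir, full.names = TRUE)

identical(files_a, files_b)
```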