In our observatories, we store the authentic copies of new datasets on Zenodo, the European scientific data repository. If you are new to Zenodo, you should upload at least one or two datasets manually before trying to automate the process. And to avoid live-testing on Zenodo, where everything is supposed to be permanent, set up a practice account on its practice clone, Zenodo Sandbox.
In this example, you will authenticate yourself twice. First, you authenticate yourself as the creator (author) of a scientific object with ORCID. ORCID provides a persistent digital identifier (an ORCID iD) that you own and control, and that distinguishes you from every other researcher. Second, you authenticate your session to Zenodo or Zenodo Sandbox with a personal access token for that site. Beware: your Zenodo credentials do not work on Zenodo Sandbox, which ensures that you cannot accidentally upload practice material to the real Zenodo, where there is no undo button.
For practicing, please set up a Zenodo Sandbox account. Zenodo Sandbox is a clone of Zenodo, created for testing applications. Do not practice on Zenodo itself, because whatever you publish on Zenodo is permanent. All examples below work exactly the same on Zenodo, without the sandbox subdomain in the calls.
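Because the two services differ only in the subdomain, a script can switch between them with a single flag. This is a minimal sketch: the helper name zenodo_api_url is hypothetical, and the production URL is assumed to be the sandbox URL with the sandbox subdomain removed.

```r
# Hypothetical helper: pick the API base URL for the sandbox or the real service.
zenodo_api_url <- function(sandbox = TRUE) {
  if (sandbox) {
    "https://sandbox.zenodo.org/api"  # practice clone, safe for testing
  } else {
    "https://zenodo.org/api"          # real service: published records are permanent
  }
}

zenodo_api_url()                 # sandbox by default
zenodo_api_url(sandbox = FALSE)  # only when you really mean to publish
```

Defaulting to the sandbox means that forgetting the argument can never publish anything permanent.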
Important: you will receive verification links for both your account and your email address. If you do not follow these links (check Spam, Social, etc.), the API will seemingly work but not record anything, which leads to misleading error messages.
Once you are sure that your Sandbox account is up and running and verified, you should create your Personal Access Token (PAT). In your user profile, go to Applications and create a new token, ticking the deposit:actions and deposit:write scopes.
In R, the best practice is to store this PAT in a keyring with the keyring package. The following code sets your PAT interactively, i.e. if you run it in R, a pop-up window will ask you to copy the Zenodo Sandbox PAT from your browser into a text box. Of course, you can use your favorite method of managing secrets, but do not expose the PAT to the risk of accidentally uploading it to GitHub or sending it to somebody in an email. If you store it in a file in your repo, make sure that you exclude that file from syncing in .gitignore, and in a package exclude it in .Rbuildignore, too.
require(keyring)
keyring::key_set(service = "Zenodo_Sandbox", username = NULL) # separately for the sandbox
keyring::key_set(service = "Zenodo", username = NULL) # and the real service
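If you prefer another method of managing secrets, one common option is an environment variable, for example set in a user-level .Renviron file that is never committed. The sketch below is only an illustration: the helper name get_zenodo_token and the variable name ZENODO_SANDBOX_PAT are assumptions, not part of keyring or zen4R.

```r
# Hypothetical helper: read the PAT from an environment variable if it is set,
# otherwise fall back to the keyring entry stored with keyring::key_set().
get_zenodo_token <- function(service = "Zenodo_Sandbox",
                             envvar = "ZENODO_SANDBOX_PAT") {
  token <- Sys.getenv(envvar, unset = "")
  if (nzchar(token)) {
    return(token)
  }
  if (requireNamespace("keyring", quietly = TRUE)) {
    return(keyring::key_get(service))
  }
  stop("No token found: set ", envvar, " or store the PAT with keyring::key_set().")
}
```

Either way, the token itself never appears in the script that you share or commit.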
Your deposition has three important parts:
In this tutorial we use zen4R, the R Interface to the Zenodo REST API. zen4R uses R6 objects to prepare your deposition, which, unless you are familiar with truly object-oriented languages, is a bit bizarre at first sight. R6 objects are not ordinary R values but environments, so when you create a metadata record, it will show up in RStudio’s Environment window not as a regular object but as a new environment. If all goes well, this should not bother us, but if you experience unexpected behavior, be mindful that you are debugging an environment that is modified in place, not an ordinary R object in your global or function environment.
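The practical consequence is that such objects are modified in place rather than copied. Since R6 objects are built on environments, the behavior can be illustrated with a plain base R environment; the variable names below are made up for the illustration.

```r
# A plain environment shows the same reference behaviour as an R6 object.
rec <- new.env()
rec$title <- "draft"

same_rec <- rec              # no copy is made: both names point to one environment
same_rec$title <- "final"

rec$title                    # "final", the change is visible through both names
is.environment(rec)          # TRUE

# An ordinary list, by contrast, is copied on modification.
lst <- list(title = "draft")
lst_copy <- lst
lst_copy$title <- "final"
lst$title                    # still "draft"
```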
Using the keyring you set up earlier, you initiate a new session with the API:
require(zen4R) # for Zenodo API interaction
## Loading required package: zen4R
## Warning: package 'zen4R' was built under R version 4.0.5
require(keyring) # you can use any other secure form to store your PAT
## Loading required package: keyring
## Warning: package 'keyring' was built under R version 4.0.5
ZENODO <- ZenodoManager$new(
  url = "https://sandbox.zenodo.org/api",
  token = keyring::key_get("Zenodo_Sandbox"),
  logger = "INFO"
)
## [zen4R][INFO] ZenodoManager - Successfully connected to Zenodo with user token
If you did not store your PAT anywhere, you can also simply write token = 'abc_mytoken_def', where 'abc_mytoken_def' is of course your secret token generated on Zenodo Sandbox. If you failed to save it, you can always generate a new one in the web interface.
In this step we are creating an R6 object called myrec. If you run this code in RStudio, you will see myrec in the Environment window not as an object, but as an Environment.
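A minimal sketch of such a record, with placeholder values, could look like the following. It assumes the ZenodoRecord setters setTitle, setDescription, addCreator and setDOI; these names and their arguments may differ between zen4R versions, and the title and description are invented for the illustration.

```r
library(zen4R)

# Sketch only: placeholder values, to be replaced with your own details.
myrec <- ZenodoRecord$new()
myrec$setTitle("My practice dataset")
myrec$setDescription("A test deposition created from R with zen4R.")
myrec$addCreator(
  firstname = "Jane",
  lastname  = "Doe",
  orcid     = "0000-0000-0000-0000"   # dummy ORCID iD, will not validate
)
myrec$setDOI("http://doi.org/00.0000/zenodo.00000")  # dummy DOI, will not validate

is.environment(myrec)  # TRUE: R6 objects are environments
```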
If you run myrec in your console, you will be able to print out your record, but only if you replaced Jane Doe with your own name, and orcid with your true ORCID iD. Also, if you specify a pre-set DOI (and don’t expect to get a new DOI from Zenodo), beware that the API will check the validity of the DOI, the ORCID iD, firstname, and lastname. So you cannot test the following code with the dummy http://doi.org/00.0000/zenodo.00000 and Jane Doe.
Because we work with R6 objects, you call the depositRecord method on your session, which is called ZENODO, and assign the result back to the myrec ZenodoRecord object. With Jane Doe, you will get a validation error, but with your real details it should work fine, provided that you have spelled your name identically to your ORCID record.
myrec <- ZENODO$depositRecord(myrec)
## [zen4R][ERROR] ZenodoManager - Error while depositing record: Validation error.
## [zen4R][ERROR] ZenodoManager - Error: metadata.creators.0.orcid - Not a valid ORCID identifier.
## [zen4R][ERROR] ZenodoManager - Error: metadata.doi - The provided DOI is invalid - it should look similar to '10.1234/foo.bar'.
If you have provided some real information and went through the verification, and you take a look at your Zenodo Sandbox account’s uploads in your browser, you should see a record, but not the data.
Crucially, at this point your record should have a Zenodo ID, which connects you as the author, and your ORCID iD, with the metadata record. In this blogpost example, because we used Jane Doe, you get a NULL id back.
myrec$id
## NULL
If you work programmatically, your script can check before the upload whether you have got here safely with:
is.null(myrec$id)
## [1] TRUE
… which of course in a real-life example must return FALSE. The final step is to connect your ZENODO session (the R6 object) to the file that should be uploaded. You can, if you want to, upload rds files to Zenodo, but I would suggest something more system- and language-independent.
In this example, I create a temporary file in your R session, write the famous iris dataset to it in csv format, and try to upload it through the ZENODO session. In this case, you get an error code, because Jane Doe, with her fictitious ORCID iD and non-existent DOI, was prevented from spamming the server.
my_file_path <- tempfile(fileext = ".csv") # temporary file with a csv extension
write.csv(iris, my_file_path, row.names = FALSE)
ZENODO$uploadFile(my_file_path, myrec$id)
## $message
## [1] "The method is not allowed for the requested URL."
##
## $status
## [1] 405
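A script can avoid this kind of failed request by combining the id check with the upload, so that no file is sent when the deposition did not succeed. The wrapper below is a sketch under stated assumptions: upload_if_deposited is a hypothetical name, and upload_fun stands for any function that takes a file path and a record id, such as ZENODO$uploadFile.

```r
# Hypothetical wrapper: only call the upload function when the deposition
# returned an id. `upload_fun` stands for a function such as ZENODO$uploadFile.
upload_if_deposited <- function(rec, path, upload_fun) {
  if (is.null(rec$id)) {
    warning("Record has no Zenodo id; skipping upload of ", basename(path))
    return(invisible(FALSE))
  }
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  upload_fun(path, rec$id)
  invisible(TRUE)
}

# Usage with the objects from this post (not run here):
# upload_if_deposited(myrec, my_file_path, ZENODO$uploadFile)
```

With the Jane Doe record, the wrapper would only warn and return FALSE instead of sending a request to the server.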