Share your data and code!

CES Skills Seminars

José R. Ferrer-Paris
(a.k.a. Jose Ferrer; JR)

2023-04-18

My observation from the weekend

inaturalist.org/observations/155084844

Community ID

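library(rinat)
library(dplyr)
# Download the observations of iNaturalist user "NeoMapas"
# and keep those identified as Girdled Scalyfin
user_obs <- get_inat_obs_user("NeoMapas") %>%
  filter(common_name == "Girdled Scalyfin")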

My collection of observations

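library(rinat)
library(dplyr)
library(leaflet)
library(sf)
# Download all observations of user "NeoMapas" and convert them to a
# spatial (sf) object in WGS84 longitude/latitude (EPSG:4326)
obs_df <- get_inat_obs_user("NeoMapas") %>%
  # st_as_sf() does not accept missing coordinates
  filter(!is.na(longitude), !is.na(latitude)) %>%
  select(longitude, latitude, datetime, common_name,
         scientific_name, image_url) %>%
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326)
# One HTML popup per observation: names, date and photo
popup_html <- with(obs_df,
                   sprintf("<p><b>%s</b><br/><i>%s</i></p>
                           <p>Observed: %s</p>
                           <p><img src='%s'/></p>",
                           common_name, scientific_name,
                           datetime, image_url))
# Interactive map with a marker for each observation
leaflet(obs_df) %>%
  addProviderTiles("Esri.WorldStreetMap") %>%
  addMarkers(popup = popup_html)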


What is Open Science

Open science is the movement to make scientific research (including publications, data, physical samples, and software) and its dissemination accessible to all levels of society, amateur or professional.

https://en.wikipedia.org/wiki/Open_science

Why Open Science?

Open Science has the potential of making the scientific process more transparent, inclusive and democratic. It is increasingly recognized as a critical accelerator for the achievement of the United Nations Sustainable Development Goals and a true game changer in bridging the science, technology and innovation gaps and fulfilling the human right to science.

https://www.unesco.org/en/open-science

UNESCO: pillars of open science

But seriously, why?

To deposit or not to deposit, that is the question (Roche et al. 2014)

We all benefit from Open Science

  • Everybody wins
  • Be more efficient
  • Avoid common problems / find solutions faster
  • Increase collaboration
  • Contribute what you can, use what you need
  • Get credit for your work

Kramer, Bianca, & Bosman, Jeroen. (2018, January 14). Rainbow of open science practices. Zenodo. https://doi.org/10.5281/zenodo.1147025

Prepare yourself

To maximise discovery and reuse, you should follow best practices when preparing and publishing your code and data, so that they are Findable, Accessible, Interoperable and Reusable (FAIR).

Think about

Intellectual property

Data availability policies of journals/funders

File formats

Do you work with sensitive data?

Persistent identifiers
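About file formats: open, plain-text formats such as CSV can be read by almost any software, now and in the future. Here is a minimal sketch that uses a made-up data frame and a temporary file. It writes a table as UTF-8 CSV and reads it back to check that nothing was lost.

```r
# Sketch: store a table in an open, plain-text format (UTF-8 CSV)
# instead of a proprietary spreadsheet format. The data are made up.
obs <- data.frame(
  species = c("species_a", "species_b"),
  count   = c(3L, 1L),
  date    = as.Date(c("2023-04-15", "2023-04-16"))
)
csv_file <- file.path(tempdir(), "observations.csv")
write.csv(obs, csv_file, row.names = FALSE, fileEncoding = "UTF-8")

# Read it back; dates come back as text, so convert them explicitly
obs_back <- read.csv(csv_file, fileEncoding = "UTF-8")
obs_back$date <- as.Date(obs_back$date)
all.equal(obs, obs_back)
```

A CSV does not record column types. For that reason, a short README describing each column is worth sharing alongside it.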

Identify yourself

Use a unique identifier like ORCID
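An ORCID iD ends in a check character calculated with the ISO 7064 MOD 11-2 algorithm. This lets you catch a mistyped iD before you paste it into a form. Here is a minimal R sketch. `orcid_is_valid()` is a hypothetical helper written for this example, and it is checked against the presenter's own iD.

```r
# Sketch: check the final character of an ORCID iD (ISO 7064 MOD 11-2).
# `orcid_is_valid()` is a hypothetical helper; it takes a single iD.
orcid_is_valid <- function(orcid) {
  # Four blocks of four characters; only the very last one may be "X"
  if (!grepl("^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$", orcid)) {
    return(FALSE)
  }
  chars <- strsplit(gsub("-", "", orcid), "")[[1]]
  total <- 0
  for (d in as.integer(chars[1:15])) {
    total <- (total + d) * 2
  }
  result <- (12 - total %% 11) %% 11
  check <- if (result == 10) "X" else as.character(result)
  chars[16] == check
}

orcid_is_valid("0000-0002-9554-3395")  # TRUE
```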

Identify your institution

UNSW Sydney:

Identify your data

A Digital Object Identifier (DOI) is a unique, persistent identifier for research outputs.
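Every DOI resolves through https://doi.org/, so any DOI can be turned into a link. Here is a minimal sketch using a hypothetical `doi_to_url()` helper. Its pattern is only a rough check for the usual `10.<registrant>/<suffix>` shape and does not confirm that the DOI exists. The example uses the Zenodo DOI cited earlier.

```r
# Sketch: normalise a DOI and build its resolver link.
# `doi_to_url()` is hypothetical; the regex is a rough syntax check only.
doi_to_url <- function(doi) {
  # Drop a leading "doi:" or an existing doi.org prefix, if present
  doi <- sub("^(https?://(dx\\.)?doi\\.org/|doi:)", "", trimws(doi),
             ignore.case = TRUE)
  if (!grepl("^10\\.[0-9]{4,9}/[^[:space:]]+$", doi)) {
    stop("Does not look like a DOI: ", doi)
  }
  paste0("https://doi.org/", doi)
}

doi_to_url("10.5281/zenodo.1147025")
# "https://doi.org/10.5281/zenodo.1147025"
```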

Ask for help!

Multiple resources are available at UNSW

Research Technology Services (ResTech)

UNSWorks is UNSW’s open access repository.

UNSW codeRs

Share your code

Shared code helps you and your collaborators analyse and/or visualise your data. It supports validation of your findings and helps others build upon your work.

Why?

  1. Encourage reproducibility
  2. Meet journal/funder requirements
  3. You’ll learn a lot
  4. Extend the half-life of your research
  5. Be more employable
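A small habit that helps reproducibility is to save the versions of R and of your packages next to your code, like the session details listed at the end of these slides. Here is a minimal sketch. The file name `session_info.txt` is an arbitrary choice. The sketch writes to `tempdir()`, but in practice you would write to your project folder.

```r
# Sketch: record R and package versions next to the analysis code.
# "session_info.txt" is an arbitrary name; use your project folder in practice.
info_file <- file.path(tempdir(), "session_info.txt")
writeLines(capture.output(sessionInfo()), info_file)
head(readLines(info_file), 3)  # first lines: R version and platform details
```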

Version control, please!

You can use any flavour of version control you like!

There are plenty of tutorials and workshops to learn from.

If you learn all the command-line tricks, great!

But you can do a lot from the browser or in RStudio.
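For reference, here is a minimal sketch of the basic git steps (init, add, commit) run from an R script. It assumes the git command-line client is installed and on your PATH. `init_and_commit()` is a hypothetical helper, not a function from any package, and the name and email are placeholders you should replace with your own.

```r
# Minimal sketch: put a project folder under git version control from R.
# Assumes the git command-line client is installed and on the PATH.
# `init_and_commit()` is a hypothetical helper, not part of any package.
init_and_commit <- function(path, message = "First commit",
                            name = "Your Name", email = "you@example.org") {
  git <- Sys.which("git")
  if (!nzchar(git)) stop("git is not installed or not on the PATH")
  # Run one git command inside `path` and stop if it fails
  run <- function(...) {
    args <- c("-C", shQuote(path), ...)
    status <- system2(git, args)
    if (status != 0) stop("git command failed: ", paste(args, collapse = " "))
  }
  run("init", "--quiet")
  run("add", "--all")
  # Identity is passed inline (-c) so the commit works without a global config
  run("-c", paste0("user.name=", shQuote(name)),
      "-c", paste0("user.email=", shQuote(email)),
      "commit", "--quiet", "-m", shQuote(message))
  invisible(path)
}

# Usage: init_and_commit("my-project")  # "my-project" is a placeholder folder
```

RStudio's Git pane performs the same steps through buttons. The script version is useful mainly when you want to automate it.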

Where?

If you use git you can share your code with:

Share your data

Research data should be accessible to support validation and facilitate reuse. For sensitive data, provide at least the descriptive metadata.
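When the data themselves cannot be shared, a description of what they contain still helps others find and request them. Here is a minimal sketch using a hypothetical `describe_columns()` helper and a made-up data frame. It reports each variable's name, type and number of missing values, without exposing any of the values.

```r
# Sketch: describe a dataset's columns without releasing the values.
# `describe_columns()` is hypothetical; the example data are made up.
describe_columns <- function(df) {
  data.frame(
    variable  = names(df),
    type      = vapply(df, function(x) class(x)[1], character(1)),
    n_missing = vapply(df, function(x) sum(is.na(x)), integer(1)),
    row.names = NULL
  )
}

sensitive <- data.frame(
  site_id   = c("A", "B", "C"),
  latitude  = c(-33.9, NA, -34.1),
  abundance = c(12L, 5L, NA)
)
describe_columns(sensitive)
```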

Ask your librarians!

UNSWorks allows researchers to publish datasets on the platform of their choice and then create a record linking the data to their publications, which increases the discoverability of their data. Small datasets (up to 5GB) can be uploaded directly to UNSWorks.
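To check whether a dataset fits under a size limit, you can add up the sizes of the files in its folder. Here is a minimal sketch. `fits_size_limit()` is a hypothetical helper. Counting 1 GB as 1024^3 bytes is an assumption, so check how the repository defines its limit.

```r
# Sketch: compare the total size of a folder with a repository size limit.
# `fits_size_limit()` is hypothetical; 1 GB is taken as 1024^3 bytes (assumption).
fits_size_limit <- function(path, limit_gb = 5) {
  files <- list.files(path, recursive = TRUE, full.names = TRUE)
  total_gb <- sum(file.size(files)) / 1024^3
  message(sprintf("Total size: %.3f GB (limit: %g GB)", total_gb, limit_gb))
  total_gb <= limit_gb
}

# Usage with a temporary folder holding one small file
data_dir <- file.path(tempdir(), "dataset")
dir.create(data_dir, showWarnings = FALSE)
writeLines("site,count\nA,3", file.path(data_dir, "counts.csv"))
fits_size_limit(data_dir)
```

The same helper works for other limits, for example `limit_gb = 300` for a Dryad dataset record.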

UNSWorks benefits

  • Support for Creative Commons Licenses
  • Automated DOI assignment
  • Automated publication to Research Data Australia (RDA)
  • Embargo functionality
  • Integration with InfoEd grants

Subject-specific repositories

Increase the visibility of your data among researchers in your field.

For example, iNaturalist for wildlife observations!

Check the Registry of Research Data Repositories (re3data) to find a repository in your discipline.

Generalist publishing platforms

Figshare

Anyone can sign up for an account. Uploading is simple, and the data are very easy to download and reuse.

I uploaded my RData file here, so now I can load it directly in R:

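# Open a connection to the figshare download link, load the objects
# stored in the .RData file into the workspace, then close the connection
figshare.url <- "https://figshare.com/ndownloader/files/13874333"
con <- url(figshare.url)
loaded <- load(con)  # names of the objects that were loaded
close(con)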
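To share objects the same way, you first need to create such a file. Here is a minimal sketch: `save()` the objects, then load them back into a fresh environment to check them before uploading. The object and file names are placeholders. `tempdir()` is used only so the example does not write into your project.

```r
# Sketch: bundle R objects into one .RData file and check it before uploading.
# Object names and the file name are placeholders.
species_counts <- c(species_a = 3L, species_b = 1L)
notes <- "Example object saved alongside the counts"
rdata_file <- file.path(tempdir(), "my_data.RData")
save(species_counts, notes, file = rdata_file)

# Load into a fresh environment so nothing in the workspace is overwritten
check_env <- new.env()
loaded <- load(rdata_file, envir = check_env)
print(loaded)  # names of the restored objects
stopifnot(identical(check_env$species_counts, species_counts))
```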

Zenodo

Developed by CERN to support the open access and open science movement in Europe, but available for use by researchers worldwide.

Great for GitHub integration: now your repository has a DOI!

Dryad

  • UNSW researchers have institutional access
  • File upload of up to 300GB per dataset record
  • Curation support
  • Automatic DOI assignment
  • ORCID log in
  • CC0 license

Combine data and code?

Did you pre-register your analysis? Do you have a preprint? Input and output data? Code? Photos?

Sooo many DOIs…

OSF

The Open Science Framework (OSF) is an online platform that enables researchers to transparently plan, collect, analyze, and share their work throughout the entire research life cycle.

I use OSF to organise the different components of a project.

Thank you!

Remember, we are here to help:

Research Technology Services (ResTech)

UNSWorks

UNSW codeRs

This presentation was prepared by:

José R. Ferrer-Paris (ORCID 0000-0002-9554-3395 / @jrfep)

and is shared under the Creative Commons Attribution 4.0 International license (CC BY 4.0)

This presentation is available at:

rpubs.com/jrfep/…

This presentation was created using RStudio, Quarto v.1.3.330 with fontawesome extension, and reveal.js. Original content, code and instructions available at: UNSW-codeRs/how-to-share-data-and-code

UNESCO.org CC BY 4.0, via Wikimedia Commons

Roche DG, Lanfear R, Binning SA, Haff TM, Schwanz LE, et al. (2014), CC BY 4.0, via Wikimedia Commons

Background images from my iNaturalist observations: https://www.inaturalist.org/photos/268028588 and https://www.inaturalist.org/photos/101251571

Other images attributed in the slide text or source code.

R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur ... 10.16

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] sf_1.0-9      leaflet_2.1.1 dplyr_1.0.10  rinat_0.1.9  

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0        xfun_0.35               bslib_0.4.2            
 [4] leaflet.providers_1.9.0 colorspace_2.0-3        vctrs_0.5.1            
 [7] generics_0.1.3          htmltools_0.5.4         yaml_2.3.7             
[10] utf8_1.2.2              rlang_1.0.6             e1071_1.7-12           
[13] pillar_1.8.1            jquerylib_0.1.4         withr_2.5.0            
[16] glue_1.6.2              DBI_1.1.3               lifecycle_1.0.3        
[19] plyr_1.8.8              stringr_1.5.0           munsell_0.5.0          
[22] gtable_0.3.1            htmlwidgets_1.6.0       evaluate_0.19          
[25] knitr_1.41              fastmap_1.1.0           crosstalk_1.2.0        
[28] curl_4.3.3              class_7.3-20            fansi_1.0.3            
[31] Rcpp_1.0.9              KernSmooth_2.23-20      scales_1.2.1           
[34] DT_0.27                 classInt_0.4-8          cachem_1.0.6           
[37] jsonlite_1.8.4          ggplot2_3.4.0           digest_0.6.31          
[40] stringi_1.7.8           grid_4.2.1              cli_3.4.1              
[43] tools_4.2.1             magrittr_2.0.3          maps_3.4.1             
[46] sass_0.4.4              proxy_0.4-27            tibble_3.1.8           
[49] pkgconfig_2.0.3         ellipsis_0.3.2          assertthat_0.2.1       
[52] rmarkdown_2.19          httr_1.4.4              rstudioapi_0.14        
[55] R6_2.5.1                units_0.8-1             compiler_4.2.1