Adding taxonomic info to the DASCO data

RAVeM

Author

Published

October 15, 2025

Purpose

Add taxonomic information to the DASCO v2.5 dataset

The RAVeM workflow uses data from the DASCO dataset, which provides coordinates of alien species occurrences worldwide. The DASCO dataset was produced by applying the DASCO workflow to the SInAS database (version 2.5). The workflow imports checklists of alien species, such as those stored in SInAS, and extracts coordinates within each species' alien regions (as defined by SInAS) from GBIF and OBIS.

The original DASCO data (version 2.5) can be found at https://zenodo.org/records/10054162. In this report we show how to further format the DASCO data in order to incorporate additional metadata (including taxonomic data) that can be used to extract data subsets for specific groups. The code below should be used only if you want to update the DASCO data to a newer version. If you are using DASCO v2.5, you can skip this section.

1 Load DASCO data

1.1 Load packages

Code

# install packages if not installed
if (!requireNamespace("sketchy", quietly = TRUE)) {
  install.packages("sketchy")
}

packages <- c("data.table", "rgbif", "pbapply", "dplyr")

# install/ load packages
sketchy::load_packages(packages = packages)

1.2 Download/read data

We will download the .csv file containing version 2.5 of the DASCO data, which was previously extracted from the DASCO repository. If you plan to use a newer version of the data, please skip this step. The following code downloads the file to a local folder defined by the user:

Code

local_folder <- "PATH TO YOUR LOCAL FOLDER" # e.g. "C:/Users/RAVe-M_2025/"

download.file(
    "https://zenodo.org/record/10054162/files/DASCO_AlienCoordinates_SInAS_2.5.csv?download=1",
    destfile = file.path(local_folder, "DASCO_AlienCoordinates_SInAS_2.5.csv"),
    mode = "wb"
)
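
Because the file is large, it can be convenient to avoid downloading it again on every run. The following is a minimal sketch and not part of the original workflow: `download_if_missing()` is a hypothetical helper name, and the sketch assumes that the target folder may not exist yet.

```r
# Hypothetical helper (not part of the original workflow): download a file
# only when it is not already present, creating the target folder if needed.
download_if_missing <- function(url, destfile) {
  dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
  if (file.exists(destfile)) {
    message("File already present, skipping download: ", destfile)
    return(invisible(FALSE))
  }
  download.file(url, destfile = destfile, mode = "wb")
  invisible(TRUE)
}

# Possible usage (commented out because it needs a network connection):
# options(timeout = max(3600, getOption("timeout")))  # allow a long download
# download_if_missing(
#   "https://zenodo.org/record/10054162/files/DASCO_AlienCoordinates_SInAS_2.5.csv?download=1",
#   file.path(local_folder, "DASCO_AlienCoordinates_SInAS_2.5.csv")
# )
```

Raising the `timeout` option is suggested here because `download.file()` uses it, and the default value may be too short for a file of this size.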

To work on a more recent version of the DASCO data, replace the file name and path in the code below. We can now read the downloaded file with the fread function from the data.table package, which is optimized for reading large files quickly:

Code

# Load DASCO data
DASCO_v2.5 <- fread(file.path(local_folder, "DASCO_AlienCoordinates_SInAS_2.5.csv"))
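
Before loading the full file, it may help to preview its structure or to read only the columns needed. The sketch below illustrates this on a small toy file, using the `nrows` and `select` arguments of `fread()`. Only the `taxon` and `Realm` column names come from this report; the coordinate columns and all values are hypothetical.

```r
library(data.table)

# Toy file standing in for the DASCO .csv; only "taxon" and "Realm" are
# column names taken from this report, the coordinate columns are hypothetical.
toy_file <- tempfile(fileext = ".csv")
fwrite(
  data.table(
    taxon = c("Species a", "Species b", "Species c"),
    Realm = c("terrestrial", "marine", "freshwater"),
    lon   = c(10.5, -3.2, 120.1),
    lat   = c(45.0, 12.7, -8.4)
  ),
  toy_file
)

# Read only the first rows to inspect column names and types
preview <- fread(toy_file, nrows = 2)
str(preview)

# Read only the columns needed for a given step
taxon_realm <- fread(toy_file, select = c("taxon", "Realm"))
```

On the real data, the same two arguments could be passed to the `fread()` call above to reduce memory use while exploring the file.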

2 Format data

2.1 Select non-marine observations

We first keep only the non-marine observations in the DASCO dataset, that is, the terrestrial and freshwater records:

Code

DASCO_v2.5 <- DASCO_v2.5[DASCO_v2.5$Realm != "marine",]
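
It can be useful to tabulate the realm values before filtering, so that unexpected labels or missing values are noticed. The sketch below uses toy data. The realm labels other than "marine" are assumptions made for illustration, and treating missing realms as records to exclude is one possible choice, not necessarily the one intended by the original workflow.

```r
library(data.table)

# Toy records; the "terrestrial" and "freshwater" labels are assumed here
toy <- data.table(
  taxon = c("Species a", "Species b", "Species c", "Species d"),
  Realm = c("terrestrial", "marine", "freshwater", NA)
)

# Count records per realm, including missing values
realm_counts <- table(toy$Realm, useNA = "ifany")
print(realm_counts)

# Keep non-marine records, stating explicitly that missing realms are dropped
non_marine <- toy[!is.na(Realm) & Realm != "marine"]
```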

2.2 Add GBIF taxonomy

Now we extract taxonomic details for all species in DASCO from GBIF’s backbone taxonomy (GBIF Secretariat, 2023) using the function rgbif::name_backbone() (this can take several hours to run):

Code

# Extract all species names in DASCO (18292 species names)
sppDASCO <- unique(DASCO_v2.5$taxon)

# Extract all GBIF info for those spp
GBIFtax_list <- pblapply(sppDASCO, name_backbone) 

names(GBIFtax_list) <- sppDASCO

# Create a single DF
GBIFtax <- bind_rows(GBIFtax_list)
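
Since the lookup can run for hours, a single failed request or an interrupted session can be costly. The sketch below shows one way to make such a run more robust: each lookup is wrapped in `tryCatch()` and the results are cached on disk. To run offline it uses a stand-in function, `fake_lookup()`, instead of `rgbif::name_backbone()`; the helper `safe_lookup()` and the cache path are hypothetical names, not part of the original workflow.

```r
# Stand-in for rgbif::name_backbone() so the sketch runs offline; it returns
# a one-row data frame and fails for one name to mimic a dropped request.
fake_lookup <- function(name) {
  if (name == "Bad name") stop("simulated request failure")
  data.frame(canonicalName = name, kingdom = "Animalia")
}

# Run the lookup for one name, returning NULL instead of stopping on error
safe_lookup <- function(name, lookup) {
  tryCatch(lookup(name), error = function(e) NULL)
}

spp <- c("Species a", "Bad name", "Species b")
cache_file <- file.path(tempdir(), "GBIFtax_list.rds")  # hypothetical cache path

if (file.exists(cache_file)) {
  tax_list <- readRDS(cache_file)   # reuse results from an earlier session
} else {
  tax_list <- lapply(spp, safe_lookup, lookup = fake_lookup)
  names(tax_list) <- spp
  saveRDS(tax_list, cache_file)     # store results for later sessions
}

# Names whose lookup failed and should be retried
failed <- names(tax_list)[vapply(tax_list, is.null, logical(1))]

# Combine the successful lookups into one data frame
tax_df <- do.call(rbind, Filter(Negate(is.null), tax_list))
```

In a real run, `fake_lookup` would be replaced by `rgbif::name_backbone` and `spp` by `sppDASCO`.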

We then add higher taxonomy to all records in DASCO v2.5 by merging the taxonomic information extracted from GBIF with the DASCO dataset:

Code

# Add a "taxon" column to the taxonomic dataset so it can be matched against the names in the DASCO dataset
GBIFtax$taxon <- GBIFtax$canonicalName

# Merge taxonomic information with DASCO dataset
DASCO_tax <- merge(DASCO_v2.5, GBIFtax[, c("taxon",
                                           "speciesKey",
                                           "kingdom",
                                           "phylum",
                                           "class",
                                           "order",
                                           "family")], by = "taxon")

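By default, `merge()` keeps only the rows whose `taxon` value appears in both tables. The canonical name returned by GBIF may differ from the name that was submitted (for instance, when the match is fuzzy), and several submitted names may resolve to the same canonical name. It is therefore worth checking how many records are lost or duplicated by the merge. The sketch below illustrates such checks with base R on toy data; all names and values are illustrative.

```r
# Toy occurrence records and a toy taxonomy table (all values illustrative)
occ <- data.frame(
  taxon = c("Species a", "Species a", "Species b", "Species c"),
  id    = 1:4
)
tax <- data.frame(
  taxon   = c("Species a", "Species b", "Species b"),
  kingdom = c("Animalia", "Plantae", "Plantae")
)

# Names present in the occurrences but absent from the taxonomy table;
# an inner merge would silently drop their records
unmatched <- setdiff(unique(occ$taxon), tax$taxon)

# Keep one taxonomy row per name so the merge cannot multiply records
tax_unique <- tax[!duplicated(tax$taxon), ]

merged <- merge(occ, tax_unique, by = "taxon")

# Number of records lost in the merge
n_lost <- nrow(occ) - nrow(merged)
```

The same checks could be applied to `DASCO_v2.5` and `GBIFtax` before relying on `DASCO_tax`.
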
3 Save results

3.1 Save DASCO + taxonomy in a single file

We can save the DASCO data with taxonomic information as a .csv file:

Code

# Save the updated DASCO data with taxonomy
fwrite(DASCO_tax, file.path(local_folder, "DASCO_v2.5_withTaxonomy.csv"))

3.2 Save DASCO + taxonomy as data subsets by groups of interest

We can also split the dataset by kingdom (e.g. Animalia, Plantae, Fungi, etc.) and save each subset as a separate .csv file:

Code

# save a csv file for each subset by kingdom
for(i in unique(DASCO_tax$kingdom)) {
    subset_data <- DASCO_tax[DASCO_tax$kingdom == i, ]
    fwrite(subset_data, file.path(local_folder, paste0("DASCO_v2.5_", i, ".csv")))
}

The output files are written to the folder defined by the user in the local_folder variable (shown here as ./data/processed):

./data/processed
├── DASCO_v2.5_Animalia.csv
├── DASCO_v2.5_Bacteria.csv
├── DASCO_v2.5_Chromista.csv
├── DASCO_v2.5_Fungi.csv
├── DASCO_v2.5_Plantae.csv
└── DASCO_v2.5_Viruses.csv
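
If some records have no kingdom assigned, a loop over the unique kingdom values would produce a file for the missing value that contains no records. The sketch below shows an alternative on toy data: missing kingdoms are given an explicit label and the table is split with the `by` argument of data.table's `split()` method. The output folder, the file prefix and the "Unassigned" label are hypothetical.

```r
library(data.table)

# Toy table with one record lacking a kingdom
toy_tax <- data.table(
  taxon   = c("Species a", "Species b", "Species c", "Species d"),
  kingdom = c("Animalia", "Plantae", "Animalia", NA)
)
out_dir <- file.path(tempdir(), "dasco_subsets")  # hypothetical output folder
dir.create(out_dir, showWarnings = FALSE)

# Label records without a kingdom so they end up in their own file
toy_tax[is.na(kingdom), kingdom := "Unassigned"]

# split() returns a named list with one data.table per kingdom
by_kingdom <- split(toy_tax, by = "kingdom")

# Write one .csv file per kingdom
for (k in names(by_kingdom)) {
  fwrite(by_kingdom[[k]], file.path(out_dir, paste0("toy_", k, ".csv")))
}
```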

The kingdom-level files saved above can be quite large (> 3 GB) and may be difficult to handle in some software. We therefore recommend extracting a subset that includes only the groups of interest. For instance, the following code chunks show how to extract vertebrates from the DASCO dataset by first extracting tetrapods and then fish.

To extract tetrapod species (amphibians, reptiles, birds and mammals) we can use the following code:

Code

# Extract DASCO records for tetrapod species
DASCO_Tetrap <- subset(DASCO_tax, class %in% 
                       c("Amphibia", "Aves", "Squamata", 
                         "Crocodylia", "Testudines","Mammalia"))

The following code can be used to extract fish species (Actinopterygii):

Code

# Orders of Actinopterygii:
Actinop <- c("Acipenseriformes","Albuliformes", "Amiiformes",
            "Anguilliformes", "Atheriniformes", "Aulopiformes", 
            "Beloniformes", "Beryciformes", "Characiformes",
            "Clupeiformes", "Cypriniformes", "Cyprinodontiformes",
            "Elopiformes", "Esociformes", "Gadiformes",
            "Gasterosteiformes", "Gonorynchiformes", "Lepisosteiformes",
            "Mugiliformes","Osmeriformes", "Osteoglossiformes",
            "Perciformes", "Percopsiformes", "Pleuronectiformes",
            "Salmoniformes", "Scorpaeniformes", "Siluriformes",
            "Synbranchiformes", "Syngnathiformes", "Tetraodontiformes")

# extract subset
DASCO_Fish <- subset(DASCO_tax, 
                      order %in% Actinop)

# Add the class name to the fish records
DASCO_Fish$class <- "Actinopterygii"

We can now combine all vertebrates and save them as a single .csv file:

Code

# Combine all vertebrates
DASCO_Verts <- rbind(DASCO_Tetrap, DASCO_Fish)

# save csv
fwrite(DASCO_Verts, file.path(local_folder, "DASCO_v2.5_Vertebrates.csv"))

Session information

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.5.0 (2025-04-11)
 os       Ubuntu 22.04.5 LTS
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/Costa_Rica
 date     2025-10-15
 pandoc   3.2 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/x86_64/ (via rmarkdown)
 quarto   1.7.31 @ /usr/local/bin/quarto

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version date (UTC) lib source
 cachem        1.1.0   2024-05-16 [1] CRAN (R 4.5.0)
 cli           3.6.5   2025-04-23 [1] CRAN (R 4.5.0)
 crayon        1.5.3   2024-06-20 [1] CRAN (R 4.5.0)
 devtools      2.4.5   2022-10-11 [1] CRAN (R 4.5.0)
 digest        0.6.37  2024-08-19 [1] CRAN (R 4.5.0)
 ellipsis      0.3.2   2021-04-29 [3] CRAN (R 4.1.1)
 evaluate      1.0.5   2025-08-27 [1] CRAN (R 4.5.0)
 fastmap       1.2.0   2024-05-15 [1] CRAN (R 4.5.0)
 fs            1.6.6   2025-04-12 [1] CRAN (R 4.5.0)
 glue          1.8.0   2024-09-30 [1] CRAN (R 4.5.0)
 htmltools     0.5.8.1 2024-04-04 [1] CRAN (R 4.5.0)
 htmlwidgets   1.6.4   2023-12-06 [1] RSPM (R 4.5.0)
 httpuv        1.6.16  2025-04-16 [1] RSPM (R 4.5.0)
 jsonlite      2.0.0   2025-03-27 [1] CRAN (R 4.5.0)
 knitr         1.50    2025-03-16 [1] CRAN (R 4.5.0)
 later         1.4.2   2025-04-08 [1] RSPM (R 4.5.0)
 lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.5.0)
 magrittr      2.0.4   2025-09-12 [1] CRAN (R 4.5.0)
 memoise       2.0.1   2021-11-26 [3] CRAN (R 4.1.2)
 mime          0.13    2025-03-17 [1] CRAN (R 4.5.0)
 miniUI        0.1.2   2025-04-17 [3] CRAN (R 4.5.0)
 pkgbuild      1.4.8   2025-05-26 [1] CRAN (R 4.5.0)
 pkgload       1.4.0   2024-06-28 [1] CRAN (R 4.5.0)
 profvis       0.4.0   2024-09-20 [1] CRAN (R 4.5.0)
 promises      1.3.3   2025-05-29 [1] RSPM (R 4.5.0)
 purrr         1.1.0   2025-07-10 [1] CRAN (R 4.5.0)
 R6            2.6.1   2025-02-15 [1] CRAN (R 4.5.0)
 Rcpp          1.1.0   2025-07-02 [1] CRAN (R 4.5.0)
 remotes       2.5.0   2024-03-17 [1] CRAN (R 4.5.0)
 rlang         1.1.6   2025-04-11 [1] CRAN (R 4.5.0)
 rmarkdown     2.30    2025-09-28 [1] CRAN (R 4.5.0)
 rstudioapi    0.17.1  2024-10-22 [1] CRAN (R 4.5.0)
 sessioninfo   1.2.3   2025-02-05 [3] CRAN (R 4.5.0)
 shiny         1.10.0  2024-12-14 [1] CRAN (R 4.5.0)
 urlchecker    1.0.1   2021-11-30 [1] CRAN (R 4.5.0)
 usethis       3.1.0   2024-11-26 [3] CRAN (R 4.5.0)
 vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.5.0)
 xfun          0.53    2025-08-19 [1] CRAN (R 4.5.0)
 xtable        1.8-4   2019-04-21 [3] CRAN (R 4.0.1)
 yaml          2.3.10  2024-07-26 [1] CRAN (R 4.5.0)

 [1] /home/m/R/x86_64-pc-linux-gnu-library/4.5
 [2] /usr/local/lib/R/site-library
 [3] /usr/lib/R/site-library
 [4] /usr/lib/R/library

──────────────────────────────────────────────────────────────────────────────