library(WorldFlora)
library(data.table)
library(readxl)
The WorldFlora package (Kindt 2020) was originally designed to use the taxonomic backbone data of World Flora Online (Borsch et al. 2020; www.worldfloraonline.org). A new function, new.backbone, introduced in version 1.8 of the package, now allows alternative taxonomic backbone data sets to be used. The only requirement is that the new backbone data include a key subset of variables that correspond to variables of World Flora Online.
The examples here will standardize names of mammal species via the Mammal Diversity Database (version 1.5 of 2021; Burgin et al. 2018). Version 1.5 of the database was downloaded from the website, copied locally and then loaded via the following script.
data.dir <- "E:\\Roeland\\R\\World Flora Online\\2021"
MDD <- read.csv(paste0(data.dir, "//MDD_v1.5_6554species.csv"),
encoding="UTF-8")
nrow(MDD)
## [1] 6554
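The hard-coded Windows path above can be made portable with file.path. The following is a minimal sketch, not the script used for this document: the temporary directory and the two-row stand-in file are assumptions that let the code run without the real download.

```r
# Assumption: a temporary folder stands in for the local download directory
data.dir <- file.path(tempdir(), "MDD")
dir.create(data.dir, showWarnings = FALSE)

# file.path() joins the components with a separator that works on every platform
csv.file <- file.path(data.dir, "MDD_v1.5_6554species.csv")

# Two-row stand-in, written only so that the sketch runs without the real file
toy <- data.frame(id = c(1000001, 1000002),
                  sciName = c("Ornithorhynchus_anatinus",
                              "Tachyglossus_aculeatus"))
write.csv(toy, csv.file, row.names = FALSE, fileEncoding = "UTF-8")

# Stop early if the file is missing, then load it as before
stopifnot(file.exists(csv.file))
MDD.toy <- read.csv(csv.file, encoding = "UTF-8")
nrow(MDD.toy)
```

With the real download, only data.dir would need to point to the folder that holds the file.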
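First a data.table is created with current names.
The key variables required to create a new backbone are taxonID (here: id), scientificName (here: sciName), acceptedNameUsageID (here: currentID) and taxonomicStatus (here: status).
A further, optional variable, scientificNameAuthorship (here: authoritySpeciesAuthor), allows naming authorities to be matched as well.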
MDD.data1 <- data.table(id = paste0("C", MDD$id),
MDD[, c("sciName",
"mainCommonName",
"authoritySpeciesAuthor",
"authoritySpeciesYear",
"family")])
# Current names carry no link to another record and all have status "Accepted"
MDD.data1$currentID <- rep("", nrow(MDD.data1))
MDD.data1$status <- rep("Accepted", nrow(MDD.data1))
head(MDD.data1)
## id sciName mainCommonName
## 1: C1000001 Ornithorhynchus_anatinus Platypus
## 2: C1000002 Tachyglossus_aculeatus Short-beaked Echidna
## 3: C1000003 Zaglossus_attenboroughi Attenborough's Long-beaked Echidna
## 4: C1000004 Zaglossus_bartoni Eastern Long-beaked Echidna
## 5: C1000005 Zaglossus_bruijnii Western Long-beaked Echidna
## 6: C1000006 Caenolestes_caniventer Gray-bellied Shrew-opossum
## authoritySpeciesAuthor authoritySpeciesYear family currentID
## 1: G. Shaw 1799 ORNITHORHYNCHIDAE
## 2: G. Shaw 1792 TACHYGLOSSIDAE
## 3: Flannery & Groves 1998 TACHYGLOSSIDAE
## 4: O. Thomas 1907 TACHYGLOSSIDAE
## 5: W. Peters & Doria 1876 TACHYGLOSSIDAE
## 6: Anthony 1921 CAENOLESTIDAE
## status
## 1: Accepted
## 2: Accepted
## 3: Accepted
## 4: Accepted
## 5: Accepted
## 6: Accepted
Via the currentID field, synonyms are linked with the current name.
First, a subset is created where the MSW3_sciName field differs from the current name.
Next, the data is prepared so that currentID provides the ID of the current name.
MDD.data2 <- MDD[MDD$sciName != MDD$MSW3_sciName, ]
MDD.data2 <- MDD.data2[!is.na(MDD.data2$MSW3_sciName), ]
nrow(MDD.data2)
## [1] 576
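Both filtering steps are needed because comparing a name with NA yields NA rather than FALSE, and indexing a data frame with NA produces an all-NA row. A small sketch with hypothetical names (not taken from the database) shows the effect and an equivalent single-step filter:

```r
# Hypothetical toy data: one unchanged name, one changed name,
# and one species without an MSW3 name
toy <- data.frame(sciName = c("Aus_bus", "Cus_dus", "Eus_fus"),
                  MSW3_sciName = c("Aus_bus", "Cus_xus", NA),
                  stringsAsFactors = FALSE)

# The comparison is NA (not FALSE) where the MSW3 name is missing
changed <- toy$sciName != toy$MSW3_sciName
changed

# Indexing with the raw comparison keeps an all-NA row for the missing name
step1 <- toy[changed, ]
nrow(step1)

# Dropping rows with a missing MSW3 name leaves only the true synonym
step2 <- step1[!is.na(step1$MSW3_sciName), ]

# Equivalent single step: which() drops the NA comparisons
one.step <- toy[which(changed), ]
```

Either route gives the same rows; the two-step version mirrors the code above.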
MDD.data3 <- data.table(id = paste0("S", MDD.data2$id),
sciName = MDD.data2$MSW3_sciName,
mainCommonName = MDD.data2$mainCommonName,
authoritySpeciesAuthor = rep("", nrow(MDD.data2)),
authoritySpeciesYear = rep("", nrow(MDD.data2)),
family = rep("", nrow(MDD.data2)),
currentID = paste0("C", MDD.data2$id),
status = rep("Synonym", nrow(MDD.data2)))
head(MDD.data3)
## id sciName mainCommonName
## 1: S1000005 Zaglossus_bruijni Western Long-beaked Echidna
## 2: S1000034 Philander_frenatus Southern Four-eyed Opossum
## 3: S1000036 Micoureus_alstoni Alston's Woolly Mouse Opossum
## 4: S1000038 Micoureus_constantiae White-bellied Woolly Mouse Opossum
## 5: S1000039 Micoureus_demerarae North-eastern Woolly Mouse Opossum
## 6: S1000044 Marmosa_quichua Western Amazonian Mouse Opossum
## authoritySpeciesAuthor authoritySpeciesYear family currentID status
## 1: C1000005 Synonym
## 2: C1000034 Synonym
## 3: C1000036 Synonym
## 4: C1000038 Synonym
## 5: C1000039 Synonym
## 6: C1000044 Synonym
The synonym data is now in the right format:
MDD.data3[MDD.data3$currentID == "C1000005", c("id", "sciName", "status")]
## id sciName status
## 1: S1000005 Zaglossus_bruijni Synonym
MDD.data1[MDD.data1$id == "C1000005", c("id", "sciName", "status")]
## id sciName status
## 1: C1000005 Zaglossus_bruijnii Accepted
MDD.data3[MDD.data3$currentID == "C1000452", c("id", "sciName", "status")]
## id sciName status
## 1: S1000452 Elephantulus_rufescens Synonym
MDD.data1[MDD.data1$id == "C1000452", c("id", "sciName", "status")]
## id sciName status
## 1: C1000452 Galegeeska_rufescens Accepted
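A quick consistency check is that every synonym points to an ID that exists among the current names. The sketch below rebuilds the two pairs shown above as plain data frames (the object names acc, syn and resolved are illustrative only) and resolves each synonym to its current name with merge:

```r
# The two current names and two synonyms printed above
acc <- data.frame(id = c("C1000005", "C1000452"),
                  sciName = c("Zaglossus_bruijnii", "Galegeeska_rufescens"),
                  stringsAsFactors = FALSE)
syn <- data.frame(id = c("S1000005", "S1000452"),
                  sciName = c("Zaglossus_bruijni", "Elephantulus_rufescens"),
                  currentID = c("C1000005", "C1000452"),
                  stringsAsFactors = FALSE)

# Every currentID of a synonym should exist as the id of a current name
all(syn$currentID %in% acc$id)

# Resolve each synonym to its current name
resolved <- merge(syn, acc, by.x = "currentID", by.y = "id",
                  suffixes = c(".synonym", ".current"))
resolved[, c("sciName.synonym", "sciName.current")]
```

The same check could be run on the full tables by replacing acc and syn with MDD.data1 and MDD.data3.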
Now the data of current names can be combined with data of synonyms:
MDD.dat <- rbind(MDD.data1, MDD.data3)
Now the new.backbone function can be used.
MDD.data <- new.backbone(MDD.dat,
taxonID="id",
scientificName="sciName",
scientificNameAuthorship="authoritySpeciesAuthor",
acceptedNameUsageID = "currentID",
taxonomicStatus = "status")
head(MDD.data)
## taxonID scientificName scientificNameAuthorship
## 1: C1000001 Ornithorhynchus_anatinus G. Shaw
## 2: C1000002 Tachyglossus_aculeatus G. Shaw
## 3: C1000003 Zaglossus_attenboroughi Flannery & Groves
## 4: C1000004 Zaglossus_bartoni O. Thomas
## 5: C1000005 Zaglossus_bruijnii W. Peters & Doria
## 6: C1000006 Caenolestes_caniventer Anthony
## acceptedNameUsageID taxonomicStatus id sciName
## 1: Accepted C1000001 Ornithorhynchus_anatinus
## 2: Accepted C1000002 Tachyglossus_aculeatus
## 3: Accepted C1000003 Zaglossus_attenboroughi
## 4: Accepted C1000004 Zaglossus_bartoni
## 5: Accepted C1000005 Zaglossus_bruijnii
## 6: Accepted C1000006 Caenolestes_caniventer
## mainCommonName authoritySpeciesAuthor
## 1: Platypus G. Shaw
## 2: Short-beaked Echidna G. Shaw
## 3: Attenborough's Long-beaked Echidna Flannery & Groves
## 4: Eastern Long-beaked Echidna O. Thomas
## 5: Western Long-beaked Echidna W. Peters & Doria
## 6: Gray-bellied Shrew-opossum Anthony
## authoritySpeciesYear family currentID status
## 1: 1799 ORNITHORHYNCHIDAE Accepted
## 2: 1792 TACHYGLOSSIDAE Accepted
## 3: 1998 TACHYGLOSSIDAE Accepted
## 4: 1907 TACHYGLOSSIDAE Accepted
## 5: 1876 TACHYGLOSSIDAE Accepted
## 6: 1921 CAENOLESTIDAE Accepted
tail(MDD.data)
## taxonID scientificName scientificNameAuthorship
## 1: S1006418 Lagenorhynchus_acutus
## 2: S1006426 Lagenorhynchus_australis
## 3: S1006427 Lagenorhynchus_cruciger
## 4: S1006428 Lagenorhynchus_obliquidens
## 5: S1006429 Lagenorhynchus_obscurus
## 6: S1006461 Physeter_catodon
## acceptedNameUsageID taxonomicStatus id sciName
## 1: C1006418 Synonym S1006418 Lagenorhynchus_acutus
## 2: C1006426 Synonym S1006426 Lagenorhynchus_australis
## 3: C1006427 Synonym S1006427 Lagenorhynchus_cruciger
## 4: C1006428 Synonym S1006428 Lagenorhynchus_obliquidens
## 5: C1006429 Synonym S1006429 Lagenorhynchus_obscurus
## 6: C1006461 Synonym S1006461 Physeter_catodon
## mainCommonName authoritySpeciesAuthor authoritySpeciesYear
## 1: Atlantic White-sided Dolphin
## 2: Peale's Dolphin
## 3: Hourglass Dolphin
## 4: Pacific White-sided Dolphin
## 5: Dusky Dolphin
## 6: Sperm Whale
## family currentID status
## 1: C1006418 Synonym
## 2: C1006426 Synonym
## 3: C1006427 Synonym
## 4: C1006428 Synonym
## 5: C1006429 Synonym
## 6: C1006461 Synonym
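Judging from the output above, the result keeps all original columns and places five standardized columns in front of them. The following base R sketch mimics that layout for a two-row table; sketch.backbone is a hypothetical helper written for illustration only and is not how the package implements new.backbone.

```r
# Hypothetical helper, not part of WorldFlora: copies the user-named columns
# to the standardized names and keeps the original columns behind them
sketch.backbone <- function(x, taxonID, scientificName,
                            scientificNameAuthorship,
                            acceptedNameUsageID, taxonomicStatus) {
  std <- data.frame(taxonID = x[[taxonID]],
                    scientificName = x[[scientificName]],
                    scientificNameAuthorship = x[[scientificNameAuthorship]],
                    acceptedNameUsageID = x[[acceptedNameUsageID]],
                    taxonomicStatus = x[[taxonomicStatus]],
                    stringsAsFactors = FALSE)
  cbind(std, x)
}

# One current name and its synonym, as shown earlier in this document
toy <- data.frame(id = c("C1000005", "S1000005"),
                  sciName = c("Zaglossus_bruijnii", "Zaglossus_bruijni"),
                  authoritySpeciesAuthor = c("W. Peters & Doria", ""),
                  currentID = c("", "C1000005"),
                  status = c("Accepted", "Synonym"),
                  stringsAsFactors = FALSE)

toy.backbone <- sketch.backbone(toy,
                                taxonID = "id",
                                scientificName = "sciName",
                                scientificNameAuthorship = "authoritySpeciesAuthor",
                                acceptedNameUsageID = "currentID",
                                taxonomicStatus = "status")
names(toy.backbone)
```

Keeping the original columns is what later allows fields such as mainCommonName to appear next to the matched names.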
As a last step, underscores (_) in the species names are replaced by spaces.
MDD.data$scientificName <- gsub(pattern="_", replacement=" ", x=MDD.data$scientificName)
In the first example, a list of species names from a study on factors governing patterns of biodiversity in the global tropics and subtropics (Rowan et al. 2020) is checked. Data from the supporting information Dataset_S01 was first copied to a local drive.
PNAS <- data.frame(read_excel(paste0(data.dir, "//pnas.1910489116.sd01.xlsx"),
sheet="PA Data"))
PNAS <- PNAS[, c(1:4)]
nrow(PNAS)
## [1] 277
head(PNAS)
## Order Family Genus Species
## 1 Afrosoricida Tenrecidae Potamogale Potamogale_velox
## 2 Artiodactyla Bovidae Addax Addax_nasomaculatus
## 3 Artiodactyla Bovidae Aepyceros Aepyceros_melampus
## 4 Artiodactyla Bovidae Alcelaphus Alcelaphus_buselaphus
## 5 Artiodactyla Bovidae Ammotragus Ammotragus_lervia
## 6 Artiodactyla Bovidae Antidorcas Antidorcas_marsupialis
PNAS$Species <- gsub(pattern="_", replacement=" ", x=PNAS$Species)
The species names can now be checked.
PNAS.match <- WFO.one(WFO.match(PNAS,
WFO.data = MDD.data,
spec.name = "Species"))
## Fuzzy matches for Cephalophus harveyi were only found for first term
## Fuzzy matches for Cephalophus harveyi were: Cephalopachus bancanus, Cephalophus adersi, Cephalophus brookei, Cephalophus callipygus, Cephalophus dorsalis, Cephalophus jentinki, Cephalophus leucogaster, Cephalophus natalensis, Cephalophus niger, Cephalophus nigrifrons, Cephalophus ogilbyi, Cephalophus rufilatus, Cephalophus silvicultor, Cephalophus spadix, Cephalophus weynsi, Cephalophus zebra, Elaphodus cephalophus
## Best fuzzy matches for Cephalophus harveyi were: Cephalophus adersi, Cephalophus ogilbyi
## Fuzzy matches for Piliocolobus temminckii were only found for first term
## Fuzzy matches for Piliocolobus temminckii were: Piliocolobus badius, Piliocolobus bouvieri, Piliocolobus epieni, Piliocolobus foai, Piliocolobus gordonorum, Piliocolobus kirkii, Piliocolobus langi, Piliocolobus lulindicus, Piliocolobus oustaleti, Piliocolobus parmentieri, Piliocolobus pennantii, Piliocolobus preussi, Piliocolobus rufomitratus, Piliocolobus semlikiensis, Piliocolobus tephrosceles, Piliocolobus tholloni, Piliocolobus waldronae
## Best fuzzy matches for Piliocolobus temminckii were: Piliocolobus kirkii, Piliocolobus pennantii
## Fuzzy matches for Galagoides cocos were only found for first term
## Fuzzy matches for Galagoides cocos were: Galagoides demidoff, Galagoides kumbirensis, Galagoides thomasi
## Best fuzzy matches for Galagoides cocos were: Galagoides thomasi
## Fuzzy matches for Galagoides granti were only found for first term
## Fuzzy matches for Galagoides granti were: Galagoides demidoff, Galagoides kumbirensis, Galagoides thomasi
## Best fuzzy matches for Galagoides granti were: Galagoides thomasi
## Fuzzy matches for Galagoides zanzibaricus were only found for first term
## Fuzzy matches for Galagoides zanzibaricus were: Galagoides demidoff, Galagoides kumbirensis, Galagoides thomasi
## Best fuzzy matches for Galagoides zanzibaricus were: Galagoides kumbirensis, Galagoides thomasi
##
## Checking new accepted IDs
## Different candidates for original record # 12, including Cephalophus adersi
## Smallest ID candidates for 12 were: Cephalophus adersi, Cephalophus ogilbyi
## Selected record with smallest ID for record # 12
## Different candidates for original record # 232, including Piliocolobus kirkii
## Smallest ID candidates for 232 were: Piliocolobus kirkii, Piliocolobus pennantii
## Selected record with smallest ID for record # 232
## Different candidates for original record # 247, including Galagoides kumbirensis
## Smallest ID candidates for 247 were: Galagoides kumbirensis, Galagoides thomasi
## Selected record with smallest ID for record # 247
The results show that there was an exact match for the majority of species.
For five species only fuzzy matches were found, which suggests that these names are not listed in the Mammal Diversity Database.
nrow(PNAS.match[PNAS.match$Fuzzy == TRUE, ] )
## [1] 5
PNAS.match[PNAS.match$Fuzzy == TRUE, ]
## Order Family Genus Species
## 12 Artiodactyla Bovidae Cephalophus Cephalophus harveyi
## 232 Primates Cercopithecidae Piliocolobus Piliocolobus temminckii
## 243 Primates Galagidae Galagoides Galagoides cocos
## 245 Primates Galagidae Galagoides Galagoides granti
## 247 Primates Galagidae Galagoides Galagoides zanzibaricus
## Species.ORIG Squished Brackets.detected Number.detected Unique
## 12 Cephalophus harveyi FALSE FALSE FALSE FALSE
## 232 Piliocolobus temminckii FALSE FALSE FALSE FALSE
## 243 Galagoides cocos FALSE FALSE FALSE TRUE
## 245 Galagoides granti FALSE FALSE FALSE TRUE
## 247 Galagoides zanzibaricus FALSE FALSE FALSE FALSE
## Matched Fuzzy Fuzzy.toomany Fuzzy.two Fuzzy.one Fuzzy.dist OriSeq Subseq
## 12 TRUE TRUE FALSE FALSE TRUE 5 12 1
## 232 TRUE TRUE FALSE FALSE TRUE 6 232 1
## 243 TRUE TRUE FALSE FALSE TRUE 5 243 1
## 245 TRUE TRUE FALSE FALSE TRUE 6 245 1
## 247 TRUE TRUE FALSE FALSE TRUE 10 247 1
## taxonID scientificName
## 12 C1006214 Cephalophus adersi
## 232 C1000646 Piliocolobus kirkii
## 243 C1001048 Galagoides thomasi
## 245 C1001048 Galagoides thomasi
## 247 C1001047 Galagoides kumbirensis
## scientificNameAuthorship
## 12 O. Thomas
## 232 J. E. Gray
## 243 D. G. Elliot
## 245 D. G. Elliot
## 247 Svensson, Bersacola, Mills, Munds, Nijman, Perkin, Masters, Couette, Nekaris, & Bearder
## acceptedNameUsageID taxonomicStatus id sciName
## 12 Accepted C1006214 Cephalophus_adersi
## 232 Accepted C1000646 Piliocolobus_kirkii
## 243 Accepted C1001048 Galagoides_thomasi
## 245 Accepted C1001048 Galagoides_thomasi
## 247 Accepted C1001047 Galagoides_kumbirensis
## mainCommonName
## 12 Aders's Duiker
## 232 Zanzibar Red Colobus
## 243 Thomas's Dwarf Galago
## 245 Thomas's Dwarf Galago
## 247 Angolan Dwarf Galago
## authoritySpeciesAuthor
## 12 O. Thomas
## 232 J. E. Gray
## 243 D. G. Elliot
## 245 D. G. Elliot
## 247 Svensson, Bersacola, Mills, Munds, Nijman, Perkin, Masters, Couette, Nekaris, & Bearder
## authoritySpeciesYear family currentID status Hybrid New.accepted
## 12 1918 BOVIDAE Accepted FALSE
## 232 1868 CERCOPITHECIDAE Accepted FALSE
## 243 1907 GALAGIDAE Accepted FALSE
## 245 1907 GALAGIDAE Accepted FALSE
## 247 2017 GALAGIDAE Accepted FALSE
## Old.status Old.ID Old.name One.Reason
## 12 smallest ID
## 232 smallest ID
## 243
## 245
## 247 smallest ID
PNAS.match[PNAS.match$Fuzzy == TRUE, c("Species",
"Fuzzy.dist",
"scientificName",
"mainCommonName")]
## Species Fuzzy.dist scientificName
## 12 Cephalophus harveyi 5 Cephalophus adersi
## 232 Piliocolobus temminckii 6 Piliocolobus kirkii
## 243 Galagoides cocos 5 Galagoides thomasi
## 245 Galagoides granti 6 Galagoides thomasi
## 247 Galagoides zanzibaricus 10 Galagoides kumbirensis
## mainCommonName
## 12 Aders's Duiker
## 232 Zanzibar Red Colobus
## 243 Thomas's Dwarf Galago
## 245 Thomas's Dwarf Galago
## 247 Angolan Dwarf Galago
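Since each of these fuzzy matches shares only the genus with the submitted name, it may be safer to flag them for manual review than to accept them as replacements. A minimal sketch, re-entering the values printed above by hand and assuming an arbitrary cut-off of 2 for Fuzzy.dist (the cut-off and the helper genus are assumptions, not package defaults):

```r
# Values re-entered from the table printed above
fuzzy <- data.frame(
  Species = c("Cephalophus harveyi", "Piliocolobus temminckii",
              "Galagoides cocos", "Galagoides granti",
              "Galagoides zanzibaricus"),
  Fuzzy.dist = c(5, 6, 5, 6, 10),
  scientificName = c("Cephalophus adersi", "Piliocolobus kirkii",
                     "Galagoides thomasi", "Galagoides thomasi",
                     "Galagoides kumbirensis"),
  stringsAsFactors = FALSE)

# Hypothetical helper: the first word of a binomial is the genus
genus <- function(x) sub(" .*$", "", x)

# TRUE where submitted and matched name agree on the genus
fuzzy$same.genus <- genus(fuzzy$Species) == genus(fuzzy$scientificName)

# Assumed cut-off: larger distances are flagged for manual review
max.dist <- 2
fuzzy$needs.review <- fuzzy$Fuzzy.dist > max.dist

fuzzy[fuzzy$needs.review, c("Species", "scientificName", "Fuzzy.dist")]
```

On the real result, the same two columns could be added directly to PNAS.match.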
In the second example, a list of species from a global study on the range of species that is covered by terrestrial protected areas (Pacifici, Di Marco & Watson 2020) is checked.
From the supporting information, Table S2 was copied to a local Excel file and then loaded into R.
CONL <- data.frame(read_excel(paste0(data.dir, "//conl1212748 Table S2.xlsx")))
CONL <- CONL[, c(1:2)]
nrow(CONL)
## [1] 237
head(CONL)
## Order Binomial
## 1 Afrosoricida Chrysospalax trevelyani
## 2 Carnivora Acinonyx jubatus
## 3 Carnivora Ailuropoda melanoleuca
## 4 Carnivora Aonyx cinereus
## 5 Carnivora Canis adustus
## 6 Carnivora Canis latrans
Now the names can be checked against the Mammal Diversity Database.
CONL.match <- WFO.one(WFO.match(CONL,
WFO.data = MDD.data,
spec.name = "Binomial"))
## Fuzzy matches for Alces americanus were only found for first term
## Fuzzy matches for Alces americanus were: Alcelaphus buselaphus, Alces alces
## Best fuzzy matches for Alces americanus were: Alces alces
## Fuzzy matches for Isodoon auratus were: Isoodon auratus
## Fuzzy matches for Microtus breweri were only found for first term
## Fuzzy matches for Microtus breweri were: Microtus abbreviatus, Microtus afghanus, Microtus agrestis, Microtus anatolicus, Microtus arvalis, Microtus brachycercus, Microtus bucharensis, Microtus cabrerae, Microtus californicus, Microtus canicaudus, Microtus chrotorrhinus, Microtus daghestanicus, Microtus dogramacii, Microtus drummondii, Microtus dukecampbelli, Microtus duodecimcostatus, Microtus elbeyli, Microtus felteni, Microtus gerbii, Microtus guatemalensis, Microtus guentheri, Microtus hartingi, Microtus ilaeus, Microtus irani, Microtus juldaschi, Microtus kermanensis, Microtus lavernedii, Microtus liechtensteini, Microtus longicaudus, Microtus lusitanicus, Microtus lydius, Microtus majori, Microtus mexicanus, Microtus miurus, Microtus mogollonensis, Microtus montanus, Microtus multiplex, Microtus mustersi, Microtus mystacinus, Microtus nebrodensis, Microtus oaxacensis, Microtus obscurus, Microtus ochrogaster, Microtus oregoni, Microtus paradoxus, Microtus pennsylvanicus, Microtus pinetorum, Microtus qazvinensis, Microtus quasiater, Microtus richardsoni, Microtus rozianus, Microtus savii, Microtus schelkovnikovi, Microtus socialis, Microtus subterraneus, Microtus tatricus, Microtus thomasi, Microtus townsendii, Microtus transcaspicus, Microtus umbrosus, Microtus xanthognathus, Macrotus californicus, Macrotus waterhousii, Microtus evoronensis, Microtus fortis, Microtus kikuchii, Microtus limnophilus, Microtus maximowiczii, Microtus middendorffii, Microtus mongolicus, Microtus montebelli, Microtus mujanensis, Microtus oeconomus, Microtus sachalinensis, Microtus gregalis, Microtus gerbei, Microtus levis, Microtus clarkei
## Best fuzzy matches for Microtus breweri were: Microtus oregoni
##
## Checking new accepted IDs
Again, there was an exact match for the majority of species.
nrow(CONL.match[CONL.match$Fuzzy == TRUE, ] )
## [1] 3
CONL.match[CONL.match$Fuzzy == TRUE, ]
## Order Binomial Binomial.ORIG Squished
## 57 Cetartiodactyla Alces americanus Alces americanus FALSE
## 139 Peramelemorphia Isodoon auratus Isodoon auratus FALSE
## 197 Rodentia Microtus breweri Microtus breweri FALSE
## Brackets.detected Number.detected Unique Matched Fuzzy Fuzzy.toomany
## 57 FALSE FALSE TRUE TRUE TRUE FALSE
## 139 FALSE FALSE TRUE TRUE TRUE FALSE
## 197 FALSE FALSE TRUE TRUE TRUE FALSE
## Fuzzy.two Fuzzy.one Fuzzy.dist OriSeq Subseq taxonID scientificName
## 57 FALSE TRUE 7 57 1 C1006285 Alces alces
## 139 FALSE FALSE 2 139 1 C1000233 Isoodon auratus
## 197 FALSE TRUE 4 197 1 C1002094 Microtus oregoni
## scientificNameAuthorship acceptedNameUsageID taxonomicStatus id
## 57 Linnaeus Accepted C1006285
## 139 Ramsay Accepted C1000233
## 197 Bachman Accepted C1002094
## sciName mainCommonName authoritySpeciesAuthor
## 57 Alces_alces Moose Linnaeus
## 139 Isoodon_auratus Golden Bandicoot Ramsay
## 197 Microtus_oregoni Creeping Vole Bachman
## authoritySpeciesYear family currentID status Hybrid New.accepted
## 57 1758 CERVIDAE Accepted FALSE
## 139 1887 PERAMELIDAE Accepted FALSE
## 197 1839 CRICETIDAE Accepted FALSE
## Old.status Old.ID Old.name One.Reason
## 57
## 139
## 197
CONL.match[CONL.match$Fuzzy == TRUE, c("Binomial",
"Fuzzy.dist",
"scientificName",
"mainCommonName")]
## Binomial Fuzzy.dist scientificName mainCommonName
## 57 Alces americanus 7 Alces alces Moose
## 139 Isodoon auratus 2 Isoodon auratus Golden Bandicoot
## 197 Microtus breweri 4 Microtus oregoni Creeping Vole
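The Fuzzy.dist values can be reproduced with base R, assuming that the column reports a Levenshtein edit distance of the kind computed by adist (an assumption based on the values above, not a statement about the package internals):

```r
# Submitted and matched names from the table above
submitted <- c("Alces americanus", "Isodoon auratus", "Microtus breweri")
matched   <- c("Alces alces", "Isoodon auratus", "Microtus oregoni")

# adist() returns a matrix of all pairwise distances;
# the diagonal holds the distance for each submitted/matched pair
dists <- diag(adist(submitted, matched))
data.frame(submitted, matched, dists)

# A small distance on a long name points to a spelling variant
# ("Isodoon" versus "Isoodon"); the larger distances here come from
# a different species epithet
adist("Isodoon auratus", "Isoodon auratus")[1, 1]
# → 2
```

Read this way, only the match for Isodoon auratus looks like a simple misspelling; the other two names would need to be checked by hand.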
sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19042)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United Kingdom.1252
## [2] LC_CTYPE=English_United Kingdom.1252
## [3] LC_MONETARY=English_United Kingdom.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United Kingdom.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] readxl_1.3.1 data.table_1.12.8 WorldFlora_1.9
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.4.6 crayon_1.3.4 digest_0.6.25 cellranger_1.1.0
## [5] lifecycle_0.2.0 magrittr_1.5 evaluate_0.14 pillar_1.4.4
## [9] rlang_0.4.8 stringi_1.4.6 vctrs_0.3.4 ellipsis_0.3.1
## [13] rmarkdown_2.3 tools_4.0.2 stringr_1.4.0 xfun_0.15
## [17] yaml_2.2.1 compiler_4.0.2 pkgconfig_2.0.3 htmltools_0.5.1.1
## [21] knitr_1.28 tibble_3.0.1