1 Packages needed

library(WorldFlora)
library(data.table)
library(stringr)

2 Introduction

In previous posts (see here and here), I showed how the WorldFlora package can be used to standardize names from GlobalTreeSearch with the taxonomic backbone data from World Flora Online or the World Checklist of Vascular Plants. (My original inspiration to develop WorldFlora came while writing R scripts to update the Agroforestry Species Switchboard; a new release of the ‘Switchboard’ was made available in 2022.)

Here I show how standardization of the GlobalTreeSearch names can be done more quickly with the new function WFO.match.fuzzyjoin. The speed comes at a cost: larger data sets cannot be processed in one go because of memory limits (on my machine, the error was Error: cannot allocate vector of size 6.9 Gb). This problem can, however, be circumvented relatively easily by splitting the data into smaller subsets, as shown here.

I also show how some of the manual checks after fuzzy matching can be facilitated by a customized function, such as accepting fuzzy matches for species names that only differ in their ending (-a, -us or -um; these endings indicate different grammatical genders, see the International Code of Botanical Nomenclature for various examples).
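The sketch below is not the customized function used in this post; it is a minimal, hypothetical helper (gender.variant() is not part of WorldFlora) that illustrates the idea. It flags pairs of names that become identical once a final -a, -us or -um is removed from both, and it deliberately ignores other gender endings such as -is/-e (Xanthophyllum laeve versus Xanthophyllum laevis).

```r
# Hypothetical helper (not a WorldFlora function): TRUE when two names differ
# only in a final -a, -us or -um ending of the last word
gender.variant <- function(name1, name2) {
  ending <- "(a|us|um)$"
  # both names must end in one of the gender endings
  has.ending <- grepl(ending, name1) & grepl(ending, name2)
  # remove the ending and compare what is left
  stem1 <- sub(ending, "", name1)
  stem2 <- sub(ending, "", name2)
  has.ending & stem1 == stem2 & name1 != name2
}

gender.variant(
  c("Aidia congesta", "Ailanthus excelsa", "Aniba canelilla", "Xanthophyllum laeve"),
  c("Aidia congestum", "Ailanthus excelsus", "Aniba canellila", "Xanthophyllum laevis"))
# [1]  TRUE  TRUE FALSE FALSE
```

Applied to the fuzzy matches, a call such as gender.variant(GTS.fuzzy$TaxonName, GTS.fuzzy$scientificName) would list candidates for acceptance. Because scientificName holds the accepted name, comparing with Old.name may be more relevant for matches that went via a synonym.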

3 Load the taxonomic backbone data file of World Flora Online

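I used the latest available version of World Flora Online (v.2022.12). The backbone had been downloaded earlier, and the location of the file had been provided to WFO.remember via its argument ‘WFO.file’, so that WFO.remember() can now be called without arguments (the first line of the output shows where the data were sourced from).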

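In the taxonomic backbone, World Flora Online lists about half a million current species names (records at species rank with an empty acceptedNameUsageID, i.e. names that are not listed as synonyms), out of roughly 1.1 million names at species rank.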

WFO.remember()
## Data sourced from: E:\Roeland\WorldFloraOnline\WFO 2022\classification.txt (Sun Jan  8 08:49:17 2023)
## Reading WFO data
## Warning in data.table::fread(WFO.file1, encoding = "UTF-8"): Found and resolved
## improper quoting out-of-sample. First healed line 117: <<wfo-0000000117
## GCC-1AF25765-5E36-4AED-8F64-12BB4F58DEC0 Hieracium onosmoides subsp.
## sphaerianthum SUBSPECIES wfo-0000034880 (Arv.-Touv.) Zahn Asteraceae Hieracium
## onosmoides sphaerianthum subsp. "Zahn, in Engler, Pflanzenr. 82. 1923." 1676
## 1923 ACCEPTED wfo-0000118008 More details could be found in <a href=http://
## www.theplantlist.org/tpl1.1/record/gcc-10011 >The Plant List v.1.1.</a>
## Originally in <a href=http://www.theplantlist.org/tpl/record/gcc-10011 >The
## Plant List v.1.0</a> 2012-02->>. If the fields are not quoted (e.g. field
## separator does not appear within any field), try quote="" to avoid this warning.
## The WFO data is now available from WFO.data
nrow(WFO.data)
## [1] 1425061
nrow(WFO.data[WFO.data$taxonRank == "SPECIES", ])
## [1] 1096665
nrow(WFO.data[WFO.data$taxonRank == "SPECIES" & WFO.data$acceptedNameUsageID == "", ])
## [1] 501257
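Assuming WFO.data is a data.table (it is read with data.table::fread, as the warning above shows), the same counts can also be written with data.table's .N syntax. The sketch below uses a small, made-up table containing only the two columns involved, so that it runs without the backbone file.

```r
library(data.table)

# Made-up stand-in for WFO.data with only the columns used in the counts above
toy.WFO <- data.table(
  taxonRank           = c("SPECIES", "SPECIES", "SPECIES", "GENUS", "VARIETY"),
  acceptedNameUsageID = c("", "wfo-0000000001", "", "", "")
)

# Number of records at species rank
toy.WFO[taxonRank == "SPECIES", .N]                              # 3
# Species-rank records without an accepted-name ID (not listed as synonyms)
toy.WFO[taxonRank == "SPECIES" & acceptedNameUsageID == "", .N]  # 2
# Both groups in one call, split by whether an accepted-name ID is present
toy.WFO[taxonRank == "SPECIES", .N, by = .(synonym = acceptedNameUsageID != "")]
```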

4 Load the complete list of species of GlobalTreeSearch

The downloadable complete list of tree species (version 1.6) was obtained from GlobalTreeSearch. The list includes close to 60,000 species names (57,958, roughly 12 percent of the approximately 500,000 current species names in World Flora Online).

# GTS.file <- choose.files()
GTS.file <- "E:\\Roeland\\WorldFloraOnline\\WFO 2022\\global_tree_search_trees_1_6.csv"
GTS <- fread(GTS.file, header=TRUE, encoding="UTF-8")
head(GTS)
##               TaxonName                                Author V3 V4
## 1:     Abarema abbottii (Rose & Leonard) Barneby & J.W.Grimes NA NA
## 2:      Abarema acreana                   (J.F.Macbr.) L.Rico NA NA
## 3:   Abarema adenophora          (Ducke) Barneby & J.W.Grimes NA NA
## 4:    Abarema alexandri           (Urb.) Barneby & J.W.Grimes NA NA
## 5: Abarema asplenifolia        (Griseb.) Barneby & J.W.Grimes NA NA
## 6:   Abarema auriculata         (Benth.) Barneby & J.W.Grimes NA NA
##    Citation:  GlobalTreeSearch online database. Botanic Gardens Conservation International. Richmond, U.K. Available at www.bgci.org. Accessed on DD/MM/YYYY. DOI: 10.13140/RG.2.2.34206.61761
## 1:                                                                                                                                                            DOI: 10.13140/RG.2.2.34206.61761
## 2:                                                                                                                                                                                            
## 3:                                                                                                                                                                                            
## 4:                                                                                                                                                                                            
## 5:                                                                                                                                                                                            
## 6:
GTS <- GTS[, c("TaxonName", "Author")]
nrow(GTS)
## [1] 57958
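The GlobalTreeSearch file carries a citation in its header row, which is why fread returns the extra columns V3, V4 and the citation column above. Instead of dropping them afterwards, fread's select argument can keep only the two needed columns while reading. The sketch below writes a small made-up CSV with the same kind of header to a temporary file; that the real file behaves the same way is an assumption.

```r
library(data.table)

# Made-up CSV mimicking the GTS layout: two named columns, two unnamed ones
# and a citation in the header of the last column
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  'TaxonName,Author,,,"Citation: GlobalTreeSearch online database"',
  'Abarema abbottii,(Rose & Leonard) Barneby & J.W.Grimes,,,DOI: 10.13140/RG.2.2.34206.61761',
  'Abarema acreana,(J.F.Macbr.) L.Rico,,,'
), tmp)

# Keep only the columns needed for matching while reading the file
GTS.toy <- fread(tmp, header = TRUE, sep = ",", encoding = "UTF-8",
                 select = c("TaxonName", "Author"))
GTS.toy
```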

5 Use WFO.match.fuzzyjoin

Everything is in place now to start the matching process. To avoid a crash of the new WFO.match.fuzzyjoin function, however, the data needs to be split. This can be done relatively easily via the cut function.

It still takes over an hour for the matching to be completed, but this is considerably faster than with the approach of the previous posts.
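To get a feel for how many names each call to WFO.match.fuzzyjoin has to handle, the chunk sizes produced by cut() can be tabulated. With breaks=10 and 57,958 names, every chunk holds close to 5,800 names, which fits with the 'Reached case # 5000' lines being the last ones printed per chunk. The second part shows an alternative that is not used in this post: fixing a maximum chunk size (the value of 5000 is an arbitrary assumption) rather than the number of chunks.

```r
n <- 57958   # number of GTS names, from nrow(GTS) above

# Chunk labels as in the matching loop: 10 equal-width intervals of row numbers
cuts.10 <- cut(seq_len(n), breaks = 10, labels = FALSE)
table(cuts.10)

# Alternative (assumption): at most chunk.size names per chunk
chunk.size <- 5000
cuts.fixed <- ceiling(seq_len(n) / chunk.size)
length(unique(cuts.fixed))   # 12
max(table(cuts.fixed))       # 5000
```

Fixing the chunk size keeps the memory needed per call roughly constant, whatever the length of the species list.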

cuts <- cut(seq_len(nrow(GTS)), breaks=10, labels=FALSE)
cut.i <- sort(unique(cuts))

start.time <- Sys.time()

for (i in seq_along(cut.i)) {

  cat(paste("Cut: ", i, "\n"))

  # match one subset of the GTS names against the WFO backbone
  GTS.i <- WFO.one(WFO.match.fuzzyjoin(spec.data=GTS[cuts==cut.i[i], ],
                                       WFO.data=WFO.data,
                                       spec.name="TaxonName",
                                       Authorship="Author",
                                       fuzzydist.max=3),
                   verbose=FALSE)

  # append the results of this subset to the combined results
  if (i==1) {
    GTS.WFO <- GTS.i
  } else {
    GTS.WFO <- rbind(GTS.WFO, GTS.i)
  }

}
## Cut:  1
## Checking for fuzzy matches for 113 records
## 
## Checking new accepted IDs
## Reached case # 1000
## Reached case # 2000
## Reached case # 3000
## Reached case # 4000
## Reached case # 5000
## Cut:  2
## Checking for fuzzy matches for 94 records
## 
## Checking new accepted IDs
## Reached case # 1000
## Reached case # 2000
## Reached case # 3000
## Reached case # 4000
## Reached case # 5000
## Cut:  3
## Checking for fuzzy matches for 93 records
## 
## Checking new accepted IDs
## Reached case # 1000
## Reached case # 2000
## Reached case # 3000
## Reached case # 4000
## Reached case # 5000
## Cut:  4
## Checking for fuzzy matches for 99 records
## 
## Checking new accepted IDs
## Reached case # 1000
## Reached case # 2000
## Reached case # 3000
## Reached case # 4000
## Reached case # 5000
## Cut:  5
## Checking for fuzzy matches for 104 records
## 
## Checking new accepted IDs
## Reached case # 1000
## Reached case # 2000
## Reached case # 3000
## Reached case # 4000
## Reached case # 5000
## Cut:  6
## Checking for fuzzy matches for 137 records
## 
## Checking new accepted IDs
## Reached case # 1000
## Reached case # 2000
## Reached case # 3000
## Reached case # 4000
## Reached case # 5000
## Cut:  7
## Checking for fuzzy matches for 245 records
## 
## Checking new accepted IDs
## Reached case # 1000
## Reached case # 2000
## Reached case # 3000
## Reached case # 4000
## Reached case # 5000
## Cut:  8
## Checking for fuzzy matches for 154 records
## 
## Checking new accepted IDs
## Reached case # 1000
## Reached case # 2000
## Reached case # 3000
## Reached case # 4000
## Reached case # 5000
## Cut:  9
## Checking for fuzzy matches for 179 records
## 
## Checking new accepted IDs
## Reached case # 1000
## Reached case # 2000
## Reached case # 3000
## Reached case # 4000
## Reached case # 5000
## Cut:  10
## Checking for fuzzy matches for 146 records
## 
## Checking new accepted IDs
## Reached case # 1000
## Reached case # 2000
## Reached case # 3000
## Reached case # 4000
## Reached case # 5000
end.time <- Sys.time()
end.time - start.time
## Time difference of 1.22014 hours
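The loop grows GTS.WFO with rbind after each chunk. A common alternative is to collect the per-chunk results in a list and bind them once with data.table::rbindlist. The sketch below shows that pattern on five made-up rows; match.chunk() is a hypothetical placeholder standing in for the WFO.one(WFO.match.fuzzyjoin(...)) call, so that the example runs without the WFO backbone.

```r
library(data.table)

# Hypothetical placeholder for WFO.one(WFO.match.fuzzyjoin(...)):
# it only adds an upper-cased copy of the name to each row
match.chunk <- function(chunk) {
  data.table(chunk, matched.name = toupper(chunk$TaxonName))
}

GTS.toy <- data.table(TaxonName = c("Abarema abbottii", "Abarema acreana",
                                    "Abarema adenophora", "Abarema alexandri",
                                    "Abarema asplenifolia"))

# Chunk labels created the same way as in the loop (here 2 chunks)
cuts.toy <- cut(seq_len(nrow(GTS.toy)), breaks = 2, labels = FALSE)

# Match every chunk separately, then bind all results in a single step
GTS.toy.matched <- rbindlist(lapply(split(seq_len(nrow(GTS.toy)), cuts.toy),
                                    function(rows) match.chunk(GTS.toy[rows, ])))
GTS.toy.matched
```

With only ten chunks, the time spent in rbind is negligible next to the matching itself, so the choice between both versions is mostly a matter of style.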

5.1 Breakdown of matches

The results can be subdivided into species that could not be matched, species that could be directly matched and species with fuzzy matches.

# not matched
nrow(GTS.WFO[GTS.WFO$Matched == FALSE, ])
## [1] 410
# directly matched
nrow(GTS.WFO[GTS.WFO$Matched == TRUE & GTS.WFO$Fuzzy == FALSE, ])
## [1] 56594
GTS.fuzzy <- GTS.WFO[GTS.WFO$Fuzzy == TRUE, ]
nrow(GTS.fuzzy)
## [1] 954
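A quick consistency check is that the three groups together cover every GlobalTreeSearch name: 410 + 56,594 + 954 = 57,958, which equals nrow(GTS). The sketch below shows that arithmetic and, on a made-up five-row stand-in for GTS.WFO, how a single table() call cross-tabulates the Matched and Fuzzy columns.

```r
# Counts from the output above add up to nrow(GTS)
410 + 56594 + 954 == 57958   # TRUE

# Made-up stand-in for GTS.WFO with the two logical columns used above
res <- data.frame(Matched = c(TRUE, TRUE, TRUE, FALSE, TRUE),
                  Fuzzy   = c(FALSE, FALSE, TRUE, FALSE, TRUE))

# Cross-tabulation of both columns in one call
table(Matched = res$Matched, Fuzzy = res$Fuzzy)

# The same three groups as in the text
c(not.matched = sum(!res$Matched),
  direct      = sum(res$Matched & !res$Fuzzy),
  fuzzy       = sum(res$Fuzzy))
```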

5.2 Matches with distance = 1

Slightly under half of the fuzzy matches (439 of 954, about 46 percent) had a matching distance of 1, indicating a difference of only one character.
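The matching distance counts single-character edits (insertions, deletions and substitutions). Base R's adist() computes this kind of generalized Levenshtein distance, and for the sample pairs below, copied from the listings in this section, it gives the same values as the Fuzzy.dist column. That WFO.match.fuzzyjoin uses exactly this metric internally is an assumption here, not something shown in this post.

```r
gts.names <- c("Abarema microcalyx", "Xylopia subdehiscens", "Aniba canelilla",
               "Aidia congesta", "Acer iranicum")
wfo.names <- c("Abarema microcaly", "Xylopia sub-dehiscens", "Aniba canellila",
               "Aidia congestum", "Acer creticum")

# adist() returns a matrix of all pairwise distances; the diagonal holds
# the distances between names at the same position
diag(adist(gts.names, wfo.names))
# [1] 1 1 2 2 3
```

Note that the two swapped letters in canelilla versus canellila count as two edits under this distance.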

nrow(GTS.fuzzy[GTS.fuzzy$Fuzzy.dist == 1, ])
## [1] 439
head(GTS.fuzzy[GTS.fuzzy$Fuzzy.dist == 1, 
               c("TaxonName", "scientificName", "Old.name", "taxonID")])
##                            TaxonName                  scientificName Old.name
## 40                Abarema microcalyx               Abarema microcaly         
## 252                  Acacia cretacea                Acacia creatacea         
## 1319       Adinandra macquilingensis        Adinandra maquilingensis         
## 1396           Aegiphila luschnathii            Aegiphila luschnatii         
## 1456            Aeschynomene burttii           Aeschynomene burttiie         
## 1460 Aeschynomene pararubrofarinacea Aeschynomene pararuhrofarinacea         
##             taxonID
## 40   wfo-0000194017
## 252  wfo-0000201128
## 1319 wfo-0000520935
## 1396 wfo-0000811926
## 1456 wfo-0000173135
## 1460 wfo-0000173772
tail(GTS.fuzzy[GTS.fuzzy$Fuzzy.dist == 1, 
               c("TaxonName", "scientificName", "Old.name", "taxonID")])
##                    TaxonName        scientificName Old.name        taxonID
## 54807       Xylopia maccreae      Xylopia maccreai          wfo-0000428955
## 554113  Xylopia subdehiscens Xylopia sub-dehiscens          wfo-0000428719
## 55879     Xylosma kaalaensis     Xylosma kaalensis          wfo-0001063072
## 56697     Zabelia tyaihyonii     Zabelia tyaihyoni          wfo-0000430178
## 56798  Zanthoxylum amapaense  Zanthoxylum amapense          wfo-0000430332
## 596211  Zygocarpum caeruleum  Zygocarpum coeruleum          wfo-0000430418

5.3 Matches with distance = 2

A bit over a quarter of the fuzzy matches (276 of 954, about 29 percent) had a distance of 2.

nrow(GTS.fuzzy[GTS.fuzzy$Fuzzy.dist == 2, ])
## [1] 276
head(GTS.fuzzy[GTS.fuzzy$Fuzzy.dist == 2, 
               c("TaxonName", "scientificName", "Old.name", "taxonID")])
##                   TaxonName        scientificName Old.name        taxonID
## 1028   Acropogon calcicolus   Acropogon calcicola          wfo-0000506268
## 1210 Adelobotrys macranthus Adelobotrys macrantha          wfo-0001080704
## 1721         Aidia congesta       Aidia congestum          wfo-0000931235
## 1743      Ailanthus excelsa    Ailanthus excelsus          wfo-0000524612
## 2770   Amphitecna kennedyae   Amphitecna kennedyi          wfo-0000780939
## 2954        Aniba canelilla       Aniba canellila          wfo-0000536813
tail(GTS.fuzzy[GTS.fuzzy$Fuzzy.dist == 2, 
               c("TaxonName", "scientificName", "Old.name", "taxonID")])
##                   TaxonName                scientificName             Old.name
## 404113  Vepris occidentalis               Rotala ramosior  Peplis occidentalis
## 42758     Villaria purpurea             Viscaria purpurea                     
## 43098      Virola marleneae               Virola marlenei                     
## 45569         Vitex urbanii Teijsmanniodendron ahernianum       Vitex curranii
## 461112 Vochysia condorensis        Vochysia guatemalensis Vochysia hondurensis
## 52259   Xanthophyllum laeve          Xanthophyllum laevis                     
##               taxonID
## 404113 wfo-0000404304
## 42758  wfo-0000422656
## 43098  wfo-0001085217
## 45569  wfo-0000321239
## 461112 wfo-0001146187
## 52259  wfo-0000428598

5.4 Matches with distance = 3

About a quarter of the fuzzy matches (239 of 954) had a distance of 3.
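Many of the distance-3 matches listed below pair a GlobalTreeSearch name with a name from another genus. One way to flag those for manual checking is to compare the genus (first word) of TaxonName and scientificName with stringr::word(), stringr being loaded at the start. A changed genus is only a flag, not proof of an error: scientificName holds the accepted name, so a match to a synonym (shown in Old.name) can legitimately end up in another genus. The three rows in this sketch are copied from the listing below.

```r
library(stringr)

# Three rows copied from the distance-3 listing
pairs <- data.frame(
  TaxonName      = c("Alangium denudatum", "Annona oleifolia", "Eugenia crispula"),
  scientificName = c("Allium denudatum", "Annona cordifolia", "Begonia crispula"),
  stringsAsFactors = FALSE)

# TRUE when the genus (first word) differs between the GTS and WFO names
pairs$genus.changed <- word(pairs$TaxonName, 1) != word(pairs$scientificName, 1)
pairs$genus.changed
# [1]  TRUE FALSE  TRUE
```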

nrow(GTS.fuzzy[GTS.fuzzy$Fuzzy.dist == 3, ])
## [1] 239
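Because fuzzydist.max=3 was used, every fuzzy match has a distance of 1, 2 or 3, and the three counts in this section (439, 276 and 239) add up to the 954 fuzzy matches. The shares can be computed directly from these counts; with the actual results, round(100 * prop.table(table(GTS.fuzzy$Fuzzy.dist)), 1) should give the same values.

```r
# Number of fuzzy matches per distance, copied from the outputs above
fuzzy.counts <- c("1" = 439, "2" = 276, "3" = 239)

sum(fuzzy.counts)                         # 954
round(100 * prop.table(fuzzy.counts), 1)
#    1    2    3 
# 46.0 28.9 25.1 
```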
GTS.fuzzy[GTS.fuzzy$Fuzzy.dist == 3, 
          c("TaxonName", "scientificName", "Old.name", "taxonID")]
##                           TaxonName                        scientificName
## 773                   Acer iranicum                         Acer creticum
## 1783             Aiouea leptophylla                    Ocotea leptophylla
## 1849             Alangium denudatum                      Allium denudatum
## 1854               Alangium gracile                      Eryngium gracile
## 2418               Alnus lusitanica                     Prunus lusitanica
## 2430                 Alnus rohlenae                        Rubus rohlenae
## 3186               Annona oleifolia                     Annona cordifolia
## 3772            Arawakia lanceolata  Minuartia rupestris subsp. clementei
## 3776            Arawakia macrocarpa                  Minuartia macrocarpa
## 3783            Arawakia parvifolia                   Arenaria parvifolia
## 4314            Artocarpus montanus                   Gonocarpus montanus
## 4418        Aspidosperma huberianum               Aspidosperma tomentosum
## 5318        Barringtonia magnifolia               Barringtonia pinnifolia
## 5528             Beaucarnea olsonii                       Nolina watsonii
## 6931              Bribria apiculata                     Beyeria apiculata
## 6961                Bribria crenata                     Zyrphelis crenata
## 12311              Bursera zapoteca                        Bursera aptera
## 21331      Campomanesia sepalifolia                Campomanesia pubescens
## 23741  Capparidastrum cuatrecasanum             Morisonia cuatrecasasiana
## 24631               Caragana gobica                       Caragana sinica
## 27461            Casearia americana                    Discaria americana
## 28211               Casearia kigeri                      Casearia engleri
## 29271         Casearia yucatanensis            Senna pallida var. gaumeri
## 43921      Cinnamomum austrosinense            Cinnamomum austro-sinensis
## 57391               Coccoloba tunii                      Coccoloba buchii
## 8103                  Cordia megiae                        Cordia mexiana
## 13772              Coussarea mexiae                    Coussarea mexicana
## 16292      Craterispermum capitatum              Craterispermum aristatum
## 20362             Croton nirguensis                    Croton dinghuensis
## 20622          Croton perstipulatus                     Croton stipulatus
## 32192              Cyrtandra kinhoi                     Cyrtandra keithii
## 32262        Cyrtandra longistamina              Cryptandra longistaminea
## 41662             Desmopsis wendtii                     Ipomopsis wendtii
## 45222            Diospyros agnitser                      Diospyros anitae
## 50902             Diospyros robolot                    Diospyros discolor
## 51292            Diospyros sennenii                    Diospyros senensis
## 51632         Diospyros subargentea                    Diospyros argentea
## 6080               Drypetes louisii                       Drypetes dussii
## 24313                Durio connatus                       Durio carinatus
## 7015              Elaeocarpus avium                    Elaeocarpus badius
## 13973           Endiandra teschneri                Endiandra teschneriana
## 16273        Eremanthus reticulatus                Eremanthus auriculatus
## 16332        Eremanthus syncephalus                  Chresta pycnocephala
## 22992         Eucalyptus alatissima                 Eucalyptus plenissima
## 23803             Eucalyptus bunyip                     Eucalyptus dunnii
## 28683           Eucalyptus revelata                   Eucalyptus rugulata
## 32603              Eugenia crispula                      Begonia crispula
## 35753             Eugenia marleneae                        Luma apiculata
## 35983            Eugenia miragoanae                       Eugenia magoana
## 36503              Eugenia ochracea                     Eugenia chartacea
## 36662           Eugenia pachyadenia                    Eugenia macradenia
## 40233              Eumachia montana                       Fumaria montana
## 59462              Freziera trollii                      Freziera neillii
## 15115          Garcinia leptophylla                 Garcinia terpnophylla
## 15604                Grewia milleri                       Grimmia milleri
## 16834                  Guapira laxa                         Guapira noxia
## 19484         Guatteria turrialbana              Stenostomum turrialbanum
## 19504           Guatteria vallensis                     Guatteria allenii
## 28233              Helicia kingiana                     Triunia youngiana
## 29117       Heliotropium filiflorum              Heliotropium flaviflorum
## 30624      Herrania cuatrecasasiana                 Herrania cuatrecasana
## 32254         Hibiscus ankeranensis                Hibiscus ankaramyensis
## 32434              Hibiscus cooperi                     Hibiscus coulteri
## 36204          Homalium ovatifolium                  Homalium myrtifolium
## 36483             Homalium serratum                     Homalium dentatum
## 43224           Hyptidendron roseum                 Hyptidendron arboreum
## 53594              Ixora kalehensis                      Ixora balinensis
## 58194                Keetia davidii                       Swertia davidii
## 13618           Kurrimia paniculata                 Turpinia occidentalis
## 40517      Lasianthus linearifolius                  Lysiana linearifolia
## 12305          Linochilus fosbergii                 Microchilus fosbergii
## 12365          Linochilus rupestris                 Appendicula rupestris
## 22445       Lychnophorella santosii                  Lychnophora santosii
## 27385             Machilus coriacea                      Machilus sericea
## 32134          Magnolia betuliensis                  Magnolia betongensis
## 32235         Magnolia brasiliensis                  Magnolia braianensis
## 33319            Magnolia jaenensis                   Magnolia narinensis
## 33419           Magnolia juninensis                   Magnolia narinensis
## 33475          Magnolia kachinensis                   Magnolia narinensis
## 33855            Magnolia manuensis                   Magnolia panamensis
## 34005           Magnolia mindoensis    Magnolia sieboldii subsp. sinensis
## 34275                Magnolia ottoi               Magnolia kwangtungensis
## 34595       Magnolia quangninhensis                Magnolia guangnanensis
## 34945           Magnolia sonlaensis                   Magnolia shiluensis
## 42185       Matisia cuatrecasasiana                  Matisia cuatrecasana
## 48285           Memecylon biokoense                    Memecylon boinense
## 49655           Memecylon rovumense                   Memecylon korupense
## 53135             Miconia antillana                      Miconia angelana
## 53545              Miconia birimosa                       Miconia formosa
## 55125             Miconia galeottii                      Mimosa galeottii
## 55365             Miconia haemantha                     Miconia desmantha
## 55495           Miconia hirticaulis                    Miconia seticaulis
## 55795             Miconia kappellei                      Miconia kappleri
## 56665         Miconia neoamygdalina                   Miconia fasciculata
## 56775            Miconia ocampensis                      Miconia onaensis
## 57235             Miconia polyflora                     Miconia polyandra
## 58824            Miconia tricostata                      Miconia aristata
## 6520         Microtropis densiflora      Microtis media subsp. densiflora
## 6296            Monoon pachypetalum                   Monoon pachyphyllum
## 6966         Monteverdia chiapensis                  Maytenus chapadensis
## 6996          Monteverdia crassipes                  Pontederia crassipes
## 7086           Monteverdia elongata                  Pontederia crassipes
## 7486         Monteverdia planifolia                   Maytenus ebenifolia
## 12006              Myoporum semotum                     Myoporum insulare
## 12456             Myrcia acutissima                    Pinalia acutissima
## 12485                 Myrcia adunca                         Acacia adunca
## 12726             Myrcia amplifolia                     Myrcia ampliflora
## 12866              Myrcia arenicola                       Myrcia rupicola
## 13046                Myrcia barkeri                      Myrcia splendens
## 13256           Myrcia brevispicata                  Varronia curassavica
## 13325             Myrcia calyptrata                        Cordia dentata
## 13436            Myrcia celaenensis                     Myrcia petenensis
## 13466             Myrcia chionantha                       Myrcia monantha
## 13485            Myrcia chytraculia             Calyptranthes chytraculia
## 13526         Myrcia clarendonensis               Varronia clarendonensis
## 13726              Myrcia corticosa                       Myrcia tortuosa
## 13926           Myrcia cymatophylla                  Myrcia dermatophylla
## 13955               Myrcia decandra                       Cordia decandra
## 14365            Myrcia fasciculata                   Mycetia fasciculata
## 14506              Myrcia fawcettii                      Cordia elliptica
## 14686              Myrcia galanoana                       Myrcia gamaeana
## 14806              Myrcia glomerata                       Madia glomerata
## 15165             Myrcia hydrophila                     Myrcia petrophila
## 15286            Myrcia irregularis                    Kurzia irregularis
## 15365                 Myrcia krugii                   Myrcia fascicularis
## 15526              Myrcia legrandii                        Myrcia grandis
## 15926             Myrcia mayarensis                      Myrcia amapensis
## 16246              Myrcia mornicola                     Myrcia citrifolia
## 16396            Myrcia neocapitata                       Myrcia capitata
## 16436             Myrcia neocollina                     Myrcia guianensis
## 16486             Myrcia neoelegans                     Myrcia guianensis
## 16535             Myrcia neograndis                        Myrcia grandis
## 16556            Myrcia neohotteana                       Myrcia hotteana
## 16586         Myrcia neoinvolucrata                     Myrcia felisberti
## 16655          Myrcia neomyrcioides                     Myrcia myrcioides
## 16696           Myrcia neopalustris                      Myrcia palustris
## 16736             Myrcia neorubella                  Myrcia myrtillifolia
## 16746         Myrcia neosalicifolia                    Myrcia salicifolia
## 16766          Myrcia neosintenisii                      Myrcia fenzliana
## 16776             Myrcia neosmithii                         Myrcia selloi
## 16946                 Myrcia nodosa                         Myrcia cymosa
## 17076             Myrcia nummularia                      Vicia nummularia
## 17285                Myrcia ovoidea                          Myrcia ovina
## 17386           Myrcia peduncularis               Passiflora peduncularis
## 17466              Myrcia petricola                     Myrcia pineticola
## 17556              Myrcia pitoniana                        Myrcia doniana
## 17756              Myrcia pozasiana                     Myrcia thomasiana
## 17796              Myrcia protracta                      Cordia protracta
## 18205          Myrcia rufotomentosa                  Myrcia albotomentosa
## 18446            Myrcia siberiensis                    Myrcia sabaraensis
## 18756            Myrcia subcapitata                       Myrcia capitata
## 18926             Myrcia tenuiclada                     Myrcia tenuiflora
## 19406               Myrcia wilsonii                       Acacia wilsonii
## 22166               Myrsine brassii                       Myrsine brownii
## 28156        Neonauclea kranjiensis                 Neonauclea kraboensis
## 29826             Nolina brandegeei                 Polemonium brandegeei
## 29886            Nolina orbicularis                     Hoita orbicularis
## 29936            Nolina rodriguezii                       Sagina maritima
## 30746          Noronhia richardsiae                    Noronhia richardii
## 38946            Olinia chimanimani                      Olea chimanimani
## 45176              Ostrya chinensis                       Eurya chinensis
## 46596               Ouratea robusta                       Jurinea robusta
## 48206              Pachira moreirae                         Pachira morae
## 52086           Palicourea osaensis                  Palicourea paraensis
## 52746            Palicourea sucllii                     Palicourea pullei
## 52806              Palicourea tatei                     Palicourea patens
## 55606          Pandanus martinianus                   Pandanus marginatus
## 14456            Pinus vallartensis                      Pinus dalatensis
## 14526             Piparea spruceana                       Pilea spruceana
## 16427      Piptostigma macrophyllum               Piptostigma calophyllum
## 16937               Pisonia roqueae                         Pisonia rosea
## 21857             Pleroma canescens                    Peronema canescens
## 23147              Plinia rufiflora                     Plinia cauliflora
## 26347      Polyosma subintegrifolia                 Polyosma integrifolia
## 29828      Portulacaria carrissoana                 Portulaca carrissoana
## 334111            Praravinia nitida                     Praravinia mimica
## 357111         Protium balsamiferum                  Aeonium balsamiferum
## 44477           Psychotria hamifera                   Psychotria pilifera
## 46066          Psychotria ortiziana                   Psychotria orosiana
## 471111         Psychotria sublyrata                 Psychotria subcordata
## 49277           Pterospermum aureum                   Pterospermum fuscum
## 49446       Pterospermum havilandii                Pterospermum harmandii
## 52867              Quadrella indica                           Cordia myxa
## 54457           Quercus baolamensis                     Quercus blaoensis
## 545111           Quercus barrancana                     Quercus arkansana
## 54597           Quercus bidoupensis                    Quercus ×idzuensis
## 56167            Quercus honbaensis                   Quercus donnaiensis
## 57035              Quercus melissae                    Quercus lancifolia
## 7828    Rehderodendron macrophyllum                  Ixora schomburgkiana
## 38329  Rhododendron leigongshanense             Rhododendron gongshanense
## 48528         Rhododendron stanleyi                  Rhododendron baileyi
## 9048         Rondeletia roynaefolia                Rondeletia royenifolia
## 101210                Ruagea beckii                         Jungia beckii
## 10187                Ruagea obovata                         Dalea obovata
## 17798              Saurauia chaiana                       Saurauia tafana
## 179112             Saurauia corneri                      Saurauia roemeri
## 18078         Saurauia graciliflora Scurrula parasitica var. graciliflora
## 181112        Saurauia hispidicalyx                  Saurauia lepidicalyx
## 18139              Saurauia iliasii                      Saurauia klinkii
## 18157             Saurauia jeisinii                     Saurauia scabrida
## 18178               Saurauia joelii                       Saurauia poolei
## 18229               Saurauia juliae                      Saurauia molinae
## 18348            Saurauia latifolia                   Saurauia ilicifolia
## 18408               Saurauia leeana                       Saurauia tafana
## 18508              Saurauia linusii                      Saurauia klinkii
## 18658          Saurauia minutiflora                    Saurauia nudiflora
## 19098               Saurauia runiae                         Saurauia rufa
## 191210         Saurauia sammanniana                 Saurauia schumanniana
## 19258             Saurauia speciosa                     Sobralia speciosa
## 20428           Schefflera beamanii                     Schefflera glauca
## 20457          Schefflera bifurcata               Schefflera heterophylla
## 20788             Schefflera chanii                     Schefflera bangii
## 21078            Schefflera crenata                    Schefflera caudata
## 248210      Schinus weinmanniifolia               Schinus weinmannifolius
## 36465              Sloanea cruciata                       Sloanea cruenta
## 36977            Sloanea jaramilloi                    Brownea jaramilloi
## 37257                 Sloanea morii                         Sloanea lamii
## 410211          Sorbus acutiserrata                    Pyrus acutiserrata
## 43486                 Sorbus sellii                         Sorbus beckii
## 43768              Sorbus thayensis                      Sorbus zayuensis
## 57538         Symplocos juiyenensis                  Symplocos guianensis
## 577210         Symplocos limonensis                    Symplocos moaensis
## 60883           Syzygium barotsense                    Syzygium baramense
## 41039          Syzygium bengkulense                 Syzygium benguellense
## 39230           Syzygium komatiense                   Syzygium kalahiense
## 55139            Syzygium niassense                     Syzygium inasense
## 13878                 Tamarix minoa                         Tamarix ninae
## 26309               Tovomita nidiae                   Tovomita gracilipes
## 28549            Trichilia reynelii                     Trichilia pallida
## 30999           Tritaxis pauciflora                Hesperantha pauciflora
## 37477           Vadensea tenuifolia                   Tillandsia flexuosa
## 381310         Vantanea maculicarpa                   Vantanea macrocarpa
## 42689             Villaria coriacea                      Olearia coriacea
## 42869                Virola allenii                       Virola marlenei
## 43009                Virola fosteri                        Hiraea fosteri
## 468211           Vochysia peruviana                     Vochysia leguiana
## 47809           Warneckea albiflora                  Warneckea cauliflora
## 549113             Xylopia muricata                      Xylopia africana
## 56569                Yucca pinicola                        Yucca rupicola
##                               Old.name        taxonID
## 773                                    wfo-0000514162
## 1783                                   wfo-0000390566
## 1849                                   wfo-0000756094
## 1854                                   wfo-0000677939
## 2418                                   wfo-0000998700
## 2430                                   wfo-0000994316
## 3186                                   wfo-0000537718
## 3772               Arenaria lanceolata wfo-0000374850
## 3776               Arenaria macrocarpa wfo-0000374757
## 3783                                   wfo-0000546388
## 4314                                   wfo-0000706669
## 4418           Aspidosperma hilarianum wfo-0000291837
## 5318                                   wfo-0000922995
## 5528               Beaucarnea watsonii wfo-0000700468
## 6931                                   wfo-0000911620
## 6961                    Mairia crenata wfo-0000118134
## 12311                                  wfo-0000576119
## 21331          Campomanesia ovalifolia wfo-0000793699
## 23741  Capparidastrum cuatrecasasianum wfo-0001423797
## 24631                                  wfo-0000186069
## 27461                                  wfo-0000651539
## 28211                                  wfo-0000923944
## 29271              Cassia yucatanensis wfo-0000175298
## 43921                                  wfo-0000604901
## 57391                                  wfo-0000613029
## 8103                                   wfo-0000620732
## 13772                                  wfo-0000926140
## 16292                                  wfo-0000926715
## 20362                                  wfo-0000927832
## 20622                                  wfo-0000932450
## 32192                                  wfo-0000635454
## 32262                                  wfo-0000627232
## 41662                                  wfo-0001286738
## 45222                                  wfo-0000648500
## 50902                 Diospyros mabolo wfo-0000648780
## 51292                                  wfo-0000649737
## 51632                                  wfo-0000648512
## 6080                                   wfo-0000946459
## 24313                                  wfo-0000658061
## 7015                                   wfo-0000664050
## 13973                                  wfo-0000667714
## 16273                                  wfo-0000056834
## 16332         Eremanthus pycnocephalus wfo-0000087102
## 22992                                  wfo-0000955652
## 23803                                  wfo-0000954854
## 28683                                  wfo-0000336518
## 32603                                  wfo-0000823746
## 35753                  Eugenia palenae wfo-0000231064
## 35983                                  wfo-0000957988
## 36503                                  wfo-0000956783
## 36662                                  wfo-0000957957
## 40233                                  wfo-0000693316
## 59462                                  wfo-0001345281
## 15115                                  wfo-0000694684
## 15604                                  wfo-0001214311
## 16834                                  wfo-0000710750
## 19484            Guettarda turrialbana wfo-0000921751
## 19504                                  wfo-0000711141
## 28233                Helicia youngiana wfo-0000454537
## 29117                                  wfo-0000718574
## 30624                                  wfo-0001140732
## 32254                                  wfo-0000722278
## 32434                                  wfo-0001077057
## 36204                                  wfo-0001063037
## 36483                                  wfo-0001062878
## 43224                                  wfo-0000216415
## 53594                                  wfo-0000218193
## 58194                                  wfo-0001063887
## 13618              Turpinia paniculata wfo-0000459223
## 40517          Loranthus linearifolius wfo-0000366774
## 12305                                  wfo-0000796905
## 12365             Podochilus rupestris wfo-0000252079
## 22445                                  wfo-0000138908
## 27385                                  wfo-0000373769
## 32134                                  wfo-0000233012
## 32235                                  wfo-0000464751
## 33319                                  wfo-0000233291
## 33419                                  wfo-0000233291
## 33475                                  wfo-0000233291
## 33855                                  wfo-0000233323
## 34005                Magnolia sinensis wfo-0000233398
## 34275                    Magnolia moto wfo-0000233234
## 34595                                  wfo-0001283317
## 34945                                  wfo-0000465081
## 42185                                  wfo-0000369064
## 48285                                  wfo-0001081375
## 49655                                  wfo-0001343185
## 53135                                  wfo-0001247703
## 53545                                  wfo-0001079631
## 55125                                  wfo-0000169715
## 55365                                  wfo-0001079603
## 55495                                  wfo-0001082742
## 55795                                  wfo-0001079684
## 56665               Miconia amygdalina wfo-0001079622
## 56775                                  wfo-0001082540
## 57235                                  wfo-0001079799
## 58824                                  wfo-0001082152
## 6520               Microtis densiflora wfo-0000244311
## 6296                                   wfo-0001334309
## 6966           Monteverdia chapadensis wfo-0001421659
## 6996                                   wfo-0000501039
## 7086               Pontederia elongata wfo-0000501039
## 7486            Monteverdia ebenifolia wfo-0001292498
## 12006                Myoporum serratum wfo-0000448228
## 12456                  Eria acutissima wfo-0000922247
## 12485                                  wfo-0000187961
## 12726                                  wfo-0001424924
## 12866                                  wfo-0000247851
## 13046                  Myrcia berberis wfo-0000247907
## 13256              Cordia brevispicata wfo-0001350461
## 13325                Cordia calyptrata wfo-0000620413
## 13436                                  wfo-0001425001
## 13466                                  wfo-0001086476
## 13485               Myrtus chytraculia wfo-0000784639
## 13526            Cordia clarendonensis wfo-0000420852
## 13726                                  wfo-0000247959
## 13926                                  wfo-0000247388
## 13955                                  wfo-0000620411
## 14365                                  wfo-0000246885
## 14506                 Cordia fawcettii wfo-0000620449
## 14686                                  wfo-0001086131
## 14806                                  wfo-0000039422
## 15165                                  wfo-0001318292
## 15286                                  wfo-0001213408
## 15365                    Myrcia bangii wfo-0000247448
## 15526                                  wfo-0000247499
## 15926                                  wfo-0000247228
## 16246                 Myrcia vernicosa wfo-0000247328
## 16396                                  wfo-0000247310
## 16436                   Myrcia collina wfo-0000247506
## 16486                   Myrcia elegans wfo-0000247506
## 16535                                  wfo-0000247499
## 16556                                  wfo-0000247531
## 16586               Myrcia involucrata wfo-0000247450
## 16655                                  wfo-0000913451
## 16696                                  wfo-0000247729
## 16736                   Myrcia rubella wfo-0000247677
## 16746                                  wfo-0000247855
## 16766                Myrcia sintenisii wfo-0000247452
## 16776                   Myrcia smithii wfo-0000247874
## 16946                                  wfo-0000247378
## 17076                                  wfo-0000191387
## 17285                                  wfo-0001318050
## 17386            Murucuia peduncularis wfo-0001090803
## 17466                                  wfo-0000247763
## 17556                                  wfo-0000247414
## 17756                                  wfo-0000247948
## 17796                                  wfo-0000620876
## 18205                                  wfo-0000247219
## 18446                                  wfo-0001086147
## 18756                                  wfo-0000247310
## 18926                                  wfo-0001425017
## 19406                                  wfo-0000202277
## 22166                                  wfo-0000449072
## 28156                                  wfo-0000250281
## 29826                 Gilia brandegeei wfo-0001099791
## 29886                                  wfo-0000184634
## 29936               Sagina rodriguezii wfo-0000438539
## 30746                                  wfo-0001315433
## 38946                                  wfo-0000817257
## 45176                                  wfo-0000682973
## 46596                                  wfo-0000017627
## 48206                                  wfo-0000397401
## 52086                                  wfo-0000263105
## 52746                                  wfo-0001429074
## 52806                                  wfo-0000263108
## 55606                                  wfo-0000730074
## 14456                                  wfo-0000481267
## 14526                                  wfo-0000472416
## 16427                                  wfo-0001065979
## 16937                                  wfo-0000476609
## 21857                                  wfo-0000267573
## 23147                                  wfo-0000278887
## 26347                                  wfo-0001239572
## 29828                                  wfo-0000489461
## 334111                                 wfo-0000282261
## 357111                                 wfo-0000521612
## 44477                                  wfo-0000287126
## 46066                                  wfo-0000286935
## 471111                                 wfo-0000287705
## 49277                                  wfo-0000476018
## 49446                                  wfo-0000476031
## 52867                   Quarena indica wfo-0000620765
## 54457                                  wfo-0000289777
## 545111                                 wfo-0000289626
## 54597                                  wfo-0001220454
## 56167                                  wfo-0000290543
## 57035                  Quercus molinae wfo-0000291543
## 7828        Siderodendron macrophyllum wfo-0000218978
## 38329                                  wfo-0001229447
## 48528                                  wfo-0001048729
## 9048                                   wfo-0000297920
## 101210                                 wfo-0000011401
## 10187                                  wfo-0000169103
## 17798                                  wfo-0000433161
## 179112                                 wfo-0000500170
## 18078            Scurrula graciliflora wfo-0001075783
## 181112                                 wfo-0000501399
## 18139                                  wfo-0000501390
## 18157                Saurauia nelsonii wfo-0000500915
## 18178                                  wfo-0000433208
## 18229                                  wfo-0000493595
## 18348                                  wfo-0000493598
## 18408                                  wfo-0000433161
## 18508                                  wfo-0000501390
## 18658                                  wfo-0000500946
## 19098                                  wfo-0000500935
## 191210                                 wfo-0001328779
## 19258                                  wfo-0000311582
## 20428             Schefflera seemannii wfo-0000305901
## 20457             Schefflera biternata wfo-0000305936
## 20788                                  wfo-0000305668
## 21078                                  wfo-0000305737
## 248210                                 wfo-0001049834
## 36465                                  wfo-0001046584
## 36977                                  wfo-0001334440
## 37257                                  wfo-0000499163
## 410211                                 wfo-0000987853
## 43486                                  wfo-0000998681
## 43768                                  wfo-0001015965
## 57538                                  wfo-0000491350
## 577210                                 wfo-0000490926
## 60883                                  wfo-0000318296
## 41039                                  wfo-0000318306
## 39230                                  wfo-0000318821
## 55139                                  wfo-0000318785
## 13878                                  wfo-0000458647
## 26309                  Tovomita duidae wfo-0000407114
## 28549               Trichilia weddelii wfo-0000455741
## 30999              Tritonia pauciflora wfo-0000782819
## 37477               Vriesea tenuifolia wfo-0000578510
## 381310                                 wfo-0001065600
## 42689                                  wfo-0000046138
## 42869                                  wfo-0001085217
## 43009                                  wfo-0001263739
## 468211                                 wfo-0001146184
## 47809                                  wfo-0001081608
## 549113                                 wfo-0000428870
## 56569                                  wfo-0000752219
GTS.fuzzy[GTS.fuzzy$Fuzzy.dist == 3 & GTS.fuzzy$Auth.dist < 6, 
          c("TaxonName", "Author", "scientificName", "scientificNameAuthorship")]
##                      TaxonName      Author             scientificName
## 43921 Cinnamomum austrosinense   H.T.Chang Cinnamomum austro-sinensis
## 50902        Diospyros robolot    B.Walln.         Diospyros discolor
## 7015         Elaeocarpus avium       Coode         Elaeocarpus badius
## 23803        Eucalyptus bunyip        Rule          Eucalyptus dunnii
## 40517 Lasianthus linearifolius       H.Zhu       Lysiana linearifolia
## 42185  Matisia cuatrecasasiana Fern.Alonso       Matisia cuatrecasana
## 9048    Rondeletia roynaefolia         DC.     Rondeletia royenifolia
##       scientificNameAuthorship
## 43921                H.T.Chang
## 50902                   Willd.
## 7015                     Coode
## 23803                   Maiden
## 40517                   Tiegh.
## 42185              Fern.Alonso
## 9048                       DC.

5.5 A function to check for acceptable fuzzy matches

One reason why species could not be matched directly is that the ending of their epithet differs between lists. Botanical names distinguish between feminine, masculine and neuter forms (see the International Code of Botanical Nomenclature for various examples), and the gender used for the same species can differ between lists.

The following function checks whether names match after the endings ‘-um’ and ‘-us’ have been replaced by ‘-a’. Additional checks remove hyphens and replace ‘ii’ by ‘i’ before the names are compared.

An optional argument (no.vowels) additionally checks whether names match when vowels and ‘y’ are ignored.

acceptable.match <- function(x, no.vowels=FALSE) {
  # Compare the submitted name with the matched name; where the matched name
  # was a synonym replaced by its accepted name (New.accepted), compare with
  # the originally matched name (Old.name)
  x$submitted <- x$TaxonName
  x$matched <- x$scientificName
  x[x$New.accepted == TRUE, "matched"] <- x[x$New.accepted == TRUE, "Old.name"]

  # Treat neuter (-um) and masculine (-us) endings as feminine (-a)
  x$submitted <- str_replace(x$submitted, pattern="um$", replacement="a")
  x$matched <- str_replace(x$matched, pattern="um$", replacement="a")

  x$submitted <- str_replace(x$submitted, pattern="us$", replacement="a")
  x$matched <- str_replace(x$matched, pattern="us$", replacement="a")

  # Remove all hyphens (str_replace would only remove the first one)
  x$submitted <- str_replace_all(x$submitted, pattern="-", replacement="")
  x$matched <- str_replace_all(x$matched, pattern="-", replacement="")

  # Replace 'ii' by 'i'
  x$submitted <- str_replace_all(x$submitted, pattern="ii", replacement="i")
  x$matched <- str_replace_all(x$matched, pattern="ii", replacement="i")

  # Optionally ignore vowels and 'y'
  if (no.vowels) {
    x$submitted <- str_replace_all(x$submitted, pattern="[aeiouy]", replacement="")
    x$matched <- str_replace_all(x$matched, pattern="[aeiouy]", replacement="")
  }

  return(x$submitted == x$matched)

}

GTS.acceptable <- acceptable.match(GTS.fuzzy)
GTS.acceptable2 <- acceptable.match(GTS.fuzzy, no.vowels=TRUE)
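To see what these rules do for individual names, the sketch below applies the same normalisation steps to pairs of character strings. It uses base R (sub and gsub) instead of stringr, and the helper names normalise.name and is.acceptable are hypothetical, not part of WorldFlora. The example pairs are taken from the results listed below.

```r
# Hypothetical helpers (not part of WorldFlora): the normalisation rules of
# acceptable.match(), written in base R for plain character vectors
normalise.name <- function(name, no.vowels = FALSE) {
  name <- sub("um$", "a", name)                # neuter ending -um to -a
  name <- sub("us$", "a", name)                # masculine ending -us to -a
  name <- gsub("-", "", name, fixed = TRUE)    # remove all hyphens
  name <- gsub("ii", "i", name, fixed = TRUE)  # 'ii' to 'i'
  if (no.vowels) name <- gsub("[aeiouy]", "", name)  # drop vowels and 'y'
  name
}

is.acceptable <- function(submitted, matched, no.vowels = FALSE) {
  normalise.name(submitted, no.vowels) == normalise.name(matched, no.vowels)
}

submitted <- c("Acropogon calcicolus", "Aidia congesta", "Ageratina urbani",
               "Vitex rubroaurantiaca", "Acacia cretacea", "Yucca pinicola")
matched   <- c("Acropogon calcicola", "Aidia congestum", "Ageratina urbanii",
               "Vitex rubro-aurantiaca", "Acacia creatacea", "Yucca rupicola")

is.acceptable(submitted, matched)                    # TRUE TRUE TRUE TRUE FALSE FALSE
is.acceptable(submitted, matched, no.vowels = TRUE)  # TRUE TRUE TRUE TRUE TRUE FALSE
```

Because both helpers are vectorised, they could be applied directly to columns, for example is.acceptable(GTS.fuzzy$TaxonName, GTS.fuzzy$scientificName); unlike acceptable.match, this call does not substitute Old.name for names that were synonyms.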

With the custom function, we can now check the species with fuzzy matches.

nrow(GTS.fuzzy[GTS.acceptable == TRUE, ])
## [1] 277
head(GTS.fuzzy[GTS.acceptable == TRUE, c("TaxonName", "scientificName", "Old.name", "taxonID")])
##                   TaxonName        scientificName Old.name        taxonID
## 1028   Acropogon calcicolus   Acropogon calcicola          wfo-0000506268
## 1210 Adelobotrys macranthus Adelobotrys macrantha          wfo-0001080704
## 1569       Ageratina urbani     Ageratina urbanii          wfo-0000072197
## 1721         Aidia congesta       Aidia congestum          wfo-0000931235
## 1743      Ailanthus excelsa    Ailanthus excelsus          wfo-0000524612
## 2087     Alectryon connatus    Alectryon connatum          wfo-0000525455
tail(GTS.fuzzy[GTS.acceptable == TRUE, c("TaxonName", "scientificName", "Old.name", "taxonID")])
##                    TaxonName         scientificName             Old.name
## 45289  Vitex rubroaurantiaca Vitex rubro-aurantiaca                     
## 45579     Vitex vansteenisii      Vitex vansteenisi                     
## 511210 Withania begoniifolia  Mellissia begonifolia Withania begonifolia
## 51339    Wrightia flavorosea   Wrightia flavo-rosea                     
## 554113  Xylopia subdehiscens  Xylopia sub-dehiscens                     
## 56697     Zabelia tyaihyonii      Zabelia tyaihyoni                     
##               taxonID
## 45289  wfo-0000333420
## 45579  wfo-0000333528
## 511210 wfo-0001023587
## 51339  wfo-0000334528
## 554113 wfo-0000428719
## 56697  wfo-0000430178
nrow(GTS.fuzzy[GTS.acceptable == FALSE & GTS.acceptable2 == TRUE, ])
## [1] 206
head(GTS.fuzzy[GTS.acceptable == FALSE & GTS.acceptable2 == TRUE, 
               c("TaxonName", "scientificName", "Old.name", "taxonID")])
##                   TaxonName          scientificName Old.name        taxonID
## 252         Acacia cretacea        Acacia creatacea          wfo-0000201128
## 1456   Aeschynomene burttii   Aeschynomene burttiie          wfo-0000173135
## 2770   Amphitecna kennedyae     Amphitecna kennedyi          wfo-0000780939
## 2954        Aniba canelilla         Aniba canellila          wfo-0000536813
## 2987         Aniba rosodora        Aniba rosaeodora          wfo-0000536890
## 3173 Annona neoecuadorensis Annona neoecuadoarensis          wfo-0000506349
tail(GTS.fuzzy[GTS.acceptable == FALSE & GTS.acceptable2 == TRUE, 
               c("TaxonName", "scientificName", "Old.name", "taxonID")])
##                     TaxonName          scientificName Old.name        taxonID
## 51548  Wunderlichia crulsiana Wunderlichia cruelsiana          wfo-0000118141
## 531113    Xanthostemon grisii     Xanthostemon grisei          wfo-0000334842
## 54807        Xylopia maccreae        Xylopia maccreai          wfo-0000428955
## 55879      Xylosma kaalaensis       Xylosma kaalensis          wfo-0001063072
## 56798   Zanthoxylum amapaense    Zanthoxylum amapense          wfo-0000430332
## 596211   Zygocarpum caeruleum    Zygocarpum coeruleum          wfo-0000430418
nrow(GTS.fuzzy[GTS.acceptable2 == FALSE, ])
## [1] 471

About half of the species with fuzzy matches could thus be accepted with the rules of the custom function. For the remaining species, I advise verifying the names manually. This still leaves close to 500 species (471) to be checked manually. The script below saves the results for these species locally.

However, this is less than 1 percent of the original list of species. More importantly, as will be shown below, for GlobalTreeSearch it is better to first check whether more acceptable matches can be found with Kew’s World Checklist of Vascular Plants.
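The proportions mentioned above follow from the counts reported in this section. In the short calculation below, the total number of GlobalTreeSearch records is assumed to equal the sum of the not matched, directly matched and fuzzy counts reported further down for the WCVP run (52 + 57534 + 372), which presumes that WFO.one returns one row per submitted record.

```r
# Counts of fuzzy matches with the World Flora Online backbone (see output above)
n.rules     <- 277  # accepted with the ending, hyphen and 'ii' rules
n.no.vowels <- 206  # accepted only when vowels and 'y' are also ignored
n.remain    <- 471  # left for manual checking

n.fuzzy <- n.rules + n.no.vowels + n.remain    # 954 fuzzy matches
round((n.rules + n.no.vowels) / n.fuzzy, 2)    # 0.51, about half accepted

# Assumption: the total number of GlobalTreeSearch records equals the sum of
# the not matched, directly matched and fuzzy counts from the WCVP run
n.total <- 52 + 57534 + 372                    # 57958
round(100 * n.remain / n.total, 1)             # 0.8 percent
```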

GTS.fuzzy.remain1 <- GTS.fuzzy[GTS.acceptable2 == FALSE, ]

nrow(GTS.fuzzy.remain1)
## [1] 471
GTS.fuzzy.remain1[, c("TaxonName", "scientificName", "Old.name", "taxonID")][1:30, ]
##                             TaxonName                       scientificName
## 40                 Abarema microcalyx                    Abarema microcaly
## 773                     Acer iranicum                        Acer creticum
## 1319        Adinandra macquilingensis             Adinandra maquilingensis
## 1396            Aegiphila luschnathii                 Aegiphila luschnatii
## 1460  Aeschynomene pararubrofarinacea      Aeschynomene pararuhrofarinacea
## 1783               Aiouea leptophylla                   Ocotea leptophylla
## 1849               Alangium denudatum                     Allium denudatum
## 1854                 Alangium gracile                     Eryngium gracile
## 2418                 Alnus lusitanica                    Prunus lusitanica
## 2430                   Alnus rohlenae                       Rubus rohlenae
## 3186                 Annona oleifolia                    Annona cordifolia
## 3633            Arachnothryx chaconii                Arachnothryx chaconis
## 3772              Arawakia lanceolata Minuartia rupestris subsp. clementei
## 3776              Arawakia macrocarpa                 Minuartia macrocarpa
## 3783              Arawakia parvifolia                  Arenaria parvifolia
## 3793                  Arbutus bicolor             Comarostaphylis discolor
## 3811             Archidendron bauchei                Archidendron baucheri
## 4058             Ardisia labisiifolia                Ardisia labrisiifolia
## 4163               Ardisia silamensis                    Ardisia siamensis
## 4314              Artocarpus montanus                  Gonocarpus montanus
## 4364                Arytera litoralis                   Arytera littoralis
## 4418          Aspidosperma huberianum              Aspidosperma tomentosum
## 5275               Barleria mirabilis                   Barleria mutabilis
## 5290             Barringtonia augusta                 Barringtonia angusta
## 5318          Barringtonia magnifolia              Barringtonia pinnifolia
## 5528               Beaucarnea olsonii                      Nolina watsonii
## 5558           Beguea tsaratananensis                 Beguea tsaratanensis
## 5578               Beilschmiedia atra                 Beilschmiedia atrata
## 24100                Betula murrayana                     Betula ×purpusii
## 6931                Bribria apiculata                    Beyeria apiculata
##                      Old.name        taxonID
## 40                            wfo-0000194017
## 773                           wfo-0000514162
## 1319                          wfo-0000520935
## 1396                          wfo-0000811926
## 1460                          wfo-0000173772
## 1783                          wfo-0000390566
## 1849                          wfo-0000756094
## 1854                          wfo-0000677939
## 2418                          wfo-0000998700
## 2430                          wfo-0000994316
## 3186                          wfo-0000537718
## 3633                          wfo-0000254992
## 3772      Arenaria lanceolata wfo-0000374850
## 3776      Arenaria macrocarpa wfo-0000374757
## 3783                          wfo-0000546388
## 3793         Arbutus discolor wfo-0000615941
## 3811                          wfo-0000199765
## 4058                          wfo-0000544511
## 4163                          wfo-0000545167
## 4314                          wfo-0000706669
## 4364                          wfo-0000550703
## 4418  Aspidosperma hilarianum wfo-0000291837
## 5275                          wfo-0001342460
## 5290                          wfo-0000774826
## 5318                          wfo-0000922995
## 5528      Beaucarnea watsonii wfo-0000700468
## 5558                          wfo-0001269042
## 5578                          wfo-0000561803
## 24100       Betula ×murrayana wfo-0000336488
## 6931                          wfo-0000911620
tail(GTS.fuzzy.remain1[, c("TaxonName", "scientificName", "Old.name", "taxonID")])
##                         TaxonName             scientificName Old.name
## 47809         Warneckea albiflora       Warneckea cauliflora         
## 49698       Wendlandia buddleacea     Wendlandia buddlejacea         
## 52259         Xanthophyllum laeve       Xanthophyllum laevis         
## 52659  Xanthophyllum schizocarpon Xanthophyllum schixocarpon         
## 549113           Xylopia muricata           Xylopia africana         
## 56569              Yucca pinicola             Yucca rupicola         
##               taxonID
## 47809  wfo-0001081608
## 49698  wfo-0000334069
## 52259  wfo-0000428598
## 52659  wfo-0000428323
## 549113 wfo-0000428870
## 56569  wfo-0000752219
file.save1 <- file.path(getwd(), "GTS_Fuzzy_WFO_remain.txt")
fwrite(GTS.fuzzy.remain1, file=file.save1, sep="|", row.names=FALSE)

6 Standardize species names with the World Checklist of Vascular Plants

Instead of the taxonomic backbone of World Flora Online, we now use the taxonomic backbone of the World Checklist of Vascular Plants (WCVP). Here I used the June 2022 version, downloaded via this link.

As shown previously, I recommend first using a text editor to replace instances of ‘ × ’ by ‘ ×’ in the WCVP file.
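If you prefer to do this step in R rather than in a text editor, a minimal sketch follows. It assumes that the downloaded WCVP file is UTF-8 encoded and small enough to be read into memory with readLines(); the function names and file paths are placeholders, not part of WorldFlora.

```r
# Hypothetical helpers: remove the space after the hybrid sign, so that
# a name such as 'Betula × purpusii' becomes 'Betula ×purpusii'
fix.hybrid.sign <- function(lines) {
  gsub(" \u00d7 ", " \u00d7", lines, fixed = TRUE)
}

fix.wcvp.file <- function(in.file, out.file) {
  lines <- readLines(in.file, encoding = "UTF-8")
  # useBytes = TRUE writes the UTF-8 bytes unchanged (no re-encoding)
  writeLines(fix.hybrid.sign(lines), out.file, useBytes = TRUE)
  invisible(out.file)
}

# Example with placeholder paths:
# fix.wcvp.file("wcvp_v9_jun_2022.txt", "wcvp_v9_jun_2022 x changed.txt")
```

Like the text-editor approach, this replaces every occurrence of ‘ × ’ in the file, not only those in the taxon_name column.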

Whereas World Flora Online lists over 500,000 current species names, the WCVP has slightly fewer than 400,000 current species names.

# WCVP.file <- choose.files()
WCVP.file <- "E:\\Roeland\\WorldFloraOnline\\WFO 2022\\wcvp_v9_jun_2022 x changed.txt"
WCVP.data <- fread(WCVP.file, header=TRUE, encoding="UTF-8", sep="|")
head(WCVP.data)
##     kew_id      family       genus       species infraspecies
## 1:   338-1 Acanthaceae Acanthodium                           
## 2: 44787-1 Acanthaceae Acanthodium      angustum             
## 3: 44788-1 Acanthaceae Acanthodium       capense             
## 4: 44789-1 Acanthaceae Acanthodium  carduifolium             
## 5: 44790-1 Acanthaceae Acanthodium       delilii             
## 6: 44792-1 Acanthaceae Acanthodium diversispinum             
##                   taxon_name     authors    rank  taxonomic_status
## 1:               Acanthodium      Delile   GENUS           Synonym
## 2:      Acanthodium angustum        Nees SPECIES Homotypic_Synonym
## 3:       Acanthodium capense (L.f.) Nees SPECIES Homotypic_Synonym
## 4:  Acanthodium carduifolium (L.f.) Nees SPECIES Homotypic_Synonym
## 5:       Acanthodium delilii      H.Buek SPECIES           Synonym
## 6: Acanthodium diversispinum        Nees SPECIES Homotypic_Synonym
##    accepted_kew_id           accepted_name  accepted_authors parent_kew_id
## 1:           427-1               Blepharis             Juss.              
## 2:         46469-1       Blepharis angusta (Nees) T.Anderson              
## 3:         46487-1      Blepharis capensis      (L.f.) Pers.              
## 4:         44830-1 Acanthopsis carduifolia     (L.f.) Schinz              
## 5:         46503-1        Blepharis edulis   (Forssk.) Pers.              
## 6:         46501-1  Blepharis diversispina (Nees) C.B.Clarke              
##    parent_name parent_authors  reviewed
## 1:                            In review
## 2:                            In review
## 3:                            In review
## 4:                            In review
## 5:                            In review
## 6:                            In review
##                                      publication original_name_id
## 1: Descr. Egypte, Hist. Nat. 2(Mém.): 241 (1813)                 
## 2:        A.P.de Candolle, Prodr. 11: 273 (1847)                 
## 3:                        Linnaea 15: 361 (1841)                 
## 4:        A.P.de Candolle, Prodr. 11: 278 (1847)          44848-1
## 5:                 Gen. Sp. Candoll. 3: 1 (1858)                 
## 6:        A.P.de Candolle, Prodr. 11: 275 (1847)
WCVP.data <- new.backbone(WCVP.data, 
                          taxonID="kew_id",
                          scientificName="taxon_name",
                          scientificNameAuthorship="authors",
                          acceptedNameUsageID = "accepted_kew_id",
                          taxonomicStatus = "taxonomic_status")
head(WCVP.data)
##    taxonID            scientificName scientificNameAuthorship
## 1:   338-1               Acanthodium                   Delile
## 2: 44787-1      Acanthodium angustum                     Nees
## 3: 44788-1       Acanthodium capense              (L.f.) Nees
## 4: 44789-1  Acanthodium carduifolium              (L.f.) Nees
## 5: 44790-1       Acanthodium delilii                   H.Buek
## 6: 44792-1 Acanthodium diversispinum                     Nees
##    acceptedNameUsageID   taxonomicStatus  kew_id      family       genus
## 1:               427-1           Synonym   338-1 Acanthaceae Acanthodium
## 2:             46469-1 Homotypic_Synonym 44787-1 Acanthaceae Acanthodium
## 3:             46487-1 Homotypic_Synonym 44788-1 Acanthaceae Acanthodium
## 4:             44830-1 Homotypic_Synonym 44789-1 Acanthaceae Acanthodium
## 5:             46503-1           Synonym 44790-1 Acanthaceae Acanthodium
## 6:             46501-1 Homotypic_Synonym 44792-1 Acanthaceae Acanthodium
##          species infraspecies                taxon_name     authors    rank
## 1:                                          Acanthodium      Delile   GENUS
## 2:      angustum                   Acanthodium angustum        Nees SPECIES
## 3:       capense                    Acanthodium capense (L.f.) Nees SPECIES
## 4:  carduifolium               Acanthodium carduifolium (L.f.) Nees SPECIES
## 5:       delilii                    Acanthodium delilii      H.Buek SPECIES
## 6: diversispinum              Acanthodium diversispinum        Nees SPECIES
##     taxonomic_status accepted_kew_id           accepted_name  accepted_authors
## 1:           Synonym           427-1               Blepharis             Juss.
## 2: Homotypic_Synonym         46469-1       Blepharis angusta (Nees) T.Anderson
## 3: Homotypic_Synonym         46487-1      Blepharis capensis      (L.f.) Pers.
## 4: Homotypic_Synonym         44830-1 Acanthopsis carduifolia     (L.f.) Schinz
## 5:           Synonym         46503-1        Blepharis edulis   (Forssk.) Pers.
## 6: Homotypic_Synonym         46501-1  Blepharis diversispina (Nees) C.B.Clarke
##    parent_kew_id parent_name parent_authors  reviewed
## 1:                                          In review
## 2:                                          In review
## 3:                                          In review
## 4:                                          In review
## 5:                                          In review
## 6:                                          In review
##                                      publication original_name_id
## 1: Descr. Egypte, Hist. Nat. 2(Mém.): 241 (1813)                 
## 2:        A.P.de Candolle, Prodr. 11: 273 (1847)                 
## 3:                        Linnaea 15: 361 (1841)                 
## 4:        A.P.de Candolle, Prodr. 11: 278 (1847)          44848-1
## 5:                 Gen. Sp. Candoll. 3: 1 (1858)                 
## 6:        A.P.de Candolle, Prodr. 11: 275 (1847)
nrow(WCVP.data)
## [1] 1232931
nrow(WCVP.data[WCVP.data$rank == "SPECIES", ])
## [1] 999556
nrow(WCVP.data[WCVP.data$rank == "SPECIES" & WCVP.data$acceptedNameUsageID == "", ])
## [1] 394407

We can now use scripts similar to those above.

# Split the GlobalTreeSearch records into 10 chunks to avoid memory problems
cuts <- cut(seq_len(nrow(GTS)), breaks=10, labels=FALSE)
cut.i <- sort(unique(cuts))

start.time <- Sys.time()

for (i in seq_along(cut.i)) {

  cat(paste("Cut: ", i, "\n"))

  # Fuzzy matching against the WCVP backbone, keeping one match per record
  GTS.i <- WFO.one(WFO.match.fuzzyjoin(spec.data=GTS[cuts==cut.i[i], ],
                                       WFO.data=WCVP.data,
                                       spec.name="TaxonName",
                                       Authorship="Author",
                                       fuzzydist.max=3),
                   verbose=FALSE)

  # Append the results of this chunk
  if (i == 1) {
    GTS.WFO <- GTS.i
  } else {
    GTS.WFO <- rbind(GTS.WFO, GTS.i)
  }

}
## Cut:  1
## Checking for fuzzy matches for 35 records
## 
## Checking new accepted IDs
## Reached case # 1000
## Reached case # 2000
## Reached case # 3000
## Reached case # 4000
## Reached case # 5000
## Cut:  2
## Checking for fuzzy matches for 37 records
## 
## Checking new accepted IDs
## Reached case # 1000
## Reached case # 2000
## Reached case # 3000
## Reached case # 4000
## Reached case # 5000
## Cut:  3
## Checking for fuzzy matches for 42 records
## 
## Checking new accepted IDs
## Reached case # 1000
## Reached case # 2000
## Reached case # 3000
## Reached case # 4000
## Reached case # 5000
## Cut:  4
## Checking for fuzzy matches for 21 records
## 
## Checking new accepted IDs
## Reached case # 1000
## Reached case # 2000
## Reached case # 3000
## Reached case # 4000
## Reached case # 5000
## Cut:  5
## Checking for fuzzy matches for 48 records
## 
## Checking new accepted IDs
## Reached case # 1000
## Reached case # 2000
## Reached case # 3000
## Reached case # 4000
## Reached case # 5000
## Cut:  6
## Checking for fuzzy matches for 47 records
## 
## Checking new accepted IDs
## Reached case # 1000
## Reached case # 2000
## Reached case # 3000
## Reached case # 4000
## Reached case # 5000
## Cut:  7
## Checking for fuzzy matches for 36 records
## 
## Checking new accepted IDs
## Reached case # 1000
## Reached case # 2000
## Reached case # 3000
## Reached case # 4000
## Reached case # 5000
## Cut:  8
## Checking for fuzzy matches for 59 records
## 
## Checking new accepted IDs
## Reached case # 1000
## Reached case # 2000
## Reached case # 3000
## Reached case # 4000
## Reached case # 5000
## Cut:  9
## Checking for fuzzy matches for 62 records
## 
## Checking new accepted IDs
## Reached case # 1000
## Reached case # 2000
## Reached case # 3000
## Reached case # 4000
## Reached case # 5000
## Cut:  10
## Checking for fuzzy matches for 37 records
## 
## Checking new accepted IDs
## Reached case # 1000
## Reached case # 2000
## Reached case # 3000
## Reached case # 4000
## Reached case # 5000
end.time <- Sys.time()
end.time - start.time
## Time difference of 58.9713 mins
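The loop above grows GTS.WFO with rbind() after every chunk. An equivalent pattern collects the chunk results in a list and binds them once at the end. The sketch below shows only the chunking logic: the data are made up, and process.chunk is a hypothetical placeholder for the WFO.one(WFO.match.fuzzyjoin(...)) call, so WorldFlora is not needed to run it.

```r
# Made-up stand-in for GTS: 25 records instead of the full species list
dummy <- data.frame(TaxonName = paste("Species", 1:25),
                    stringsAsFactors = FALSE)

# Hypothetical placeholder for WFO.one(WFO.match.fuzzyjoin(...)) applied to
# one chunk; here it only adds a column with the length of each name
process.chunk <- function(chunk) {
  chunk$name.length <- nchar(chunk$TaxonName)
  chunk
}

# Assign each record to one of 10 consecutive chunks of similar size
cuts <- cut(seq_len(nrow(dummy)), breaks = 10, labels = FALSE)

# Process the chunks one at a time and keep each result in a list
results <- lapply(sort(unique(cuts)), function(i) {
  process.chunk(dummy[cuts == i, , drop = FALSE])
})

# Combine all chunk results with a single rbind at the end
combined <- do.call(rbind, results)
```

In both versions the memory saving comes from matching one chunk against the backbone at a time. Collecting the results in a list mainly avoids copying the growing result table on every iteration.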

6.1 Breakdown of matches

Matching with the WCVP was considerably more successful than with World Flora Online. Whereas 410 species could not be matched with World Flora Online, only 52 species remained unmatched with the WCVP. The number of species with fuzzy matches also dropped, to about 40% of the earlier number (372, so fewer than 400).

# not matched
nrow(GTS.WFO[GTS.WFO$Matched == FALSE, ])
## [1] 52
# directly matched
nrow(GTS.WFO[GTS.WFO$Matched == TRUE & GTS.WFO$Fuzzy == FALSE, ])
## [1] 57534
GTS.fuzzy <- GTS.WFO[GTS.WFO$Fuzzy == TRUE, ]
nrow(GTS.fuzzy)
## [1] 372
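The three counts above can also be collected in one call. The helper below, match.breakdown, is hypothetical (it is not part of WorldFlora). It applies the same conditions as the code above to the Matched and Fuzzy columns and also reports percentages. The demo data are invented.

```r
# Hypothetical helper: counts and percentages of unmatched, directly matched
# and fuzzy-matched records, using the same conditions as in section 6.1.
# Assumes Matched and Fuzzy are logical columns without missing values.
match.breakdown <- function(x) {
  counts <- c(not.matched = sum(x$Matched == FALSE),
              direct = sum(x$Matched == TRUE & x$Fuzzy == FALSE),
              fuzzy = sum(x$Fuzzy == TRUE))
  rbind(n = counts, percent = round(100 * counts / nrow(x), 1))
}

# Invented example with five records
demo <- data.frame(Matched = c(TRUE, TRUE, TRUE, TRUE, FALSE),
                   Fuzzy = c(FALSE, FALSE, FALSE, TRUE, FALSE))
res <- match.breakdown(demo)
res
```

Applied to GTS.WFO, the first row should reproduce the 52, 57534 and 372 records reported above.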

6.2 Species that could not be matched

These are the species for which no matches could be found.

GTS.not2 <- GTS.WFO[GTS.WFO$Matched == FALSE, ]
GTS.not2[, c("TaxonName", "Author")]
##                            TaxonName
## 2483                Alseis sertaneja
## 2696              Amphitecna fonceti
## 4192               Artocarpus bergii
## 5619           Beilschmiedia obscura
## 5627           Beilschmiedia osacola
## 5694        Beilschmiedia tisseranti
## 39610      Bougainvillea fasciculata
## 20011         Camellia hengchunensis
## 42901          Cinnadenia liyuyingii
## 46051       Citharexylum ligustrinum
## 52311               Clusia aemygdioi
## 11222         Cotoneaster ellipticus
## 21352               Crudia bibundina
## 24972    Cryptocarya sheikelmudiyana
## 25792            Ctenodon molliculus
## 25802             Ctenodon monteiroi
## 31162             Cyrtandra balgooyi
## 37892              Deinbollia onanae
## 44422          Diospyros antakaranae
## 54182           Disepalum rawagambut
## 13893     Endiandra wongawallanensis
## 14183          Endlicheria goeldiana
## 14882        Englerodendron libassum
## 13034          Gordonia singaporeana
## 14824            Grewia delphinensis
## 15294             Grewia mansouriana
## 39564        Humiriastrum purusensis
## 55024         Jarandersonia pereirae
## 29795            Madhuca chia-ananii
## 33334        Magnolia llanganatensis
## 38118         Mangifera salomonensis
## 42895            Mediusella arenaria
## 53620             Petrea asperifolia
## 21577             Plerandra gordonii
## 21626            Plerandra moratiana
## 380111               Prunus klokovii
## 53907             Quercus centenaria
## 10258             Ruagea parvifructa
## 24888           Schizolaena noronhae
## 26608        Scyphostegia borneensis
## 47958           Sterculia multiovula
## 51229             Styrax cambodianus
## 58208                Synima cordieri
## 27739             Trichilia deminuta
## 35039            Uvariopsis dicaprio
## 402210           Vepris robertsoniae
## 40409               Vepris zapfackii
## 41269              Viburnum axillare
## 45689      Vochysia caroliae-scottii
## 46809          Volkameria emirnensis
## 46839              Volkameria grevei
## 57839  Zanthoxylum tenuipedicellatum
##                                                           Author
## 2483                                      L.Marinho & J.G.Jardim
## 2696                               Ortiz-Rodr. & Gómez-Domínguez
## 4192                              E.M.Gardner, Arifiani & Zerega
## 5619                                    (Stapf) Engl. ex A.Chev.
## 5627                          Aguilar, D.Santam. & van der Werff
## 5694                                                     A.Chev.
## 39610                                                    Heimerl
## 20011                                                      Chang
## 42901                                    (H.Liu) de Kok & Sengun
## 46051                               (Thur. ex Decne.) Van Houtte
## 52311                                Gomes da Silva & B.Weinberg
## 11222                                            (Lindl.) Loudon
## 21352                                                      Harms
## 24972                                  A.K.H.Bachan & P.K.Fasila
## 25792                (Kunth) D.B.O.S.Cardoso, Filardi & H.C.Lima
## 25802  (A.Fern. & P.Bezerra) D.B.O.S.Cardoso, Filardi & H.C.Lima
## 31162                                       H.J.Atkins & Karton.
## 37892                                                      Cheek
## 44422                              Capuron ex G.E.Schatz & Lowry
## 54182                               Randi, D.C.Thomas & Wijedasa
## 13893                                                    L.Weber
## 14183                                                    Vattimo
## 14882                                        Jongkind & Breteler
## 13034                                      (Dyer) Wall. ex Ridl.
## 14824                                                    Capuron
## 15294                                                     Abedin
## 39564                                                     Prance
## 55024                                  S.K.Ganesan & R.C.K.Chung
## 29795                                                   Chantar.
## 33334                                      A.Vázquez & D.A.Neill
## 38118                                                  C.T.White
## 42895                                         (F.Gérard) Hong-Wa
## 53620                                           (Miranda) Hammel
## 21577                               Lowry, G.M.Plunkett & Frodin
## 21626                                       Lowry & G.M.Plunkett
## 380111                                                   (Sobko)
## 53907                                               L.M.González
## 10258                                                  T.D.Penn.
## 24888                                 (Tul.) G.E. Schatz & Lowry
## 26608                                                      Stapf
## 47958                                   E.L. Taylor ex Mondragón
## 51229                                                P.W.Fritsch
## 58208                                          (F.Muell.) Radlk.
## 27739                                                  T.D.Penn.
## 35039                                            Cheek & Gosline
## 402210                                                   Q. Luke
## 40409                                              Cheek & Onana
## 41269                                                     Triana
## 45689                                       Marc.-Berti & Aymard
## 46809                       (Bojer ex Hook.) Phillipson & Callm.
## 46839                             (Moldenke) Phillipson & Callm.
## 57839                                         (Kokwaro) Vollesen

6.3 Matches of distance = 1

As before, we can check for species with matching distance equal to 1, …

nrow(GTS.fuzzy[GTS.fuzzy$Fuzzy.dist == 1, ]) # 207
## [1] 207
head(GTS.fuzzy[GTS.fuzzy$Fuzzy.dist == 1, c("TaxonName", "scientificName", "Old.name", "taxonID")])
##                   TaxonName          scientificName            Old.name
## 14    Abarema cochliocarpos   Abarema cochliacarpos                    
## 402  Acacia macdonnellensis Acacia macdonnelliensis                    
## 1298     Adinandra milletii     Adinandra millettii                    
## 1402     Aegiphila valerioi       Aegiphila valerii                    
## 2011    Aldina aquae-nigrae      Aldina macrophylla Aldina aquae-negrae
## 2029      Aldina rio-negrae      Aldina macrophylla    Aldina rionegrae
##        taxonID
## 14   1020872-2
## 402   470815-1
## 1298  828439-1
## 1402    5839-2
## 2011  473441-1
## 2029  473441-1
tail(GTS.fuzzy[GTS.fuzzy$Fuzzy.dist == 1, c("TaxonName", "scientificName", "Old.name", "taxonID")])
##                     TaxonName          scientificName Old.name    taxonID
## 45549     Vochysia antioquiae      Vochysia antioquia          60457217-2
## 49008    Weinmannia silvicola    Weinmannia sylvicola            795085-1
## 491113    Weinmannia trianaea      Weinmannia trianae            268328-2
## 51249  Wunderlichia crulsiana Wunderlichia cruelsiana            260740-1
## 56398      Zabelia tyaihyonii       Zabelia tyaihyoni            150127-1
## 591112        Zygia macbridii         Zygia macbridei            962832-1
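The Fuzzy.dist values behave like edit distances between the submitted name and the matched name (Old.name when a synonym was matched, otherwise scientificName). For illustration only, base R's adist() computes a generalized Levenshtein distance. It is not necessarily the exact method used inside WFO.match.fuzzyjoin, but it reproduces the distances reported for a few pairs listed in this and the next two sections, all of which have an empty Old.name.

```r
# Pairs of submitted and matched names copied from the output in sections 6.3-6.5
submitted <- c("Abarema cochliocarpos", "Acropogon calcicolus",
               "Trema cannabinum", "Arytera collina")
matched <- c("Abarema cochliacarpos", "Acropogon calcicola",
             "Trema cannabina", "Drosera collina")

# adist() counts insertions, deletions and substitutions (each with cost 1);
# drop() turns the 1 x 1 matrix returned for each pair into a single number
lv.dist <- mapply(function(a, b) drop(adist(a, b)), submitted, matched,
                  USE.NAMES = FALSE)
lv.dist
# [1] 1 2 2 3
```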

6.4 Matches of distance = 2

… and distances equal to 2 …

nrow(GTS.fuzzy[GTS.fuzzy$Fuzzy.dist == 2, ]) # 137
## [1] 137
head(GTS.fuzzy[GTS.fuzzy$Fuzzy.dist == 2, c("TaxonName", "scientificName", "Old.name", "taxonID")])
##                    TaxonName          scientificName              Old.name
## 1001    Acropogon calcicolus     Acropogon calcicola                      
## 1018 Acropogon sageniifolius  Acropogon sageniifolia                      
## 1021 Acropogon schumannianus  Acropogon schumanniana                      
## 3522     Apterosperma oblata    Apterosperma oblatum                      
## 3527     Aquilaria banaensis      Aquilaria banaense                      
## 3790    Archidendron oblonga Archidendropsis oblonga Archidendron oblongum
##         taxonID
## 1001 77080941-1
## 1018   822024-1
## 1021   822025-1
## 3522   829855-1
## 3527   931120-1
## 3790   911899-1
tail(GTS.fuzzy[GTS.fuzzy$Fuzzy.dist == 2, c("TaxonName", "scientificName", "Old.name", "taxonID")])
##                         TaxonName           scientificName Old.name  taxonID
## 26749            Trema cannabinum          Trema cannabina          856736-1
## 29439  Trigonostemon detritiferus Trigonostemon detritifer          979028-1
## 320211      Turraeanthus africana   Turraeanthus africanus          579875-1
## 510211        Wrightia flavorosea    Wrightia flavidorosea           82849-1
## 51299       Wurdastom ecuadorense   Wurdastom ecuadorensis          994547-1
## 58308        Ziziphus cambodianus      Ziziphus cambodiana          719271-1

6.5 Matches of distance = 3

… and those equal to 3.

GTS.fuzzy[GTS.fuzzy$Fuzzy.dist == 3, c("TaxonName", "scientificName", "Old.name", "taxonID")]
##                           TaxonName                      scientificName
## 1092          Actinodaphne leiantha              Actinodaphne myriantha
## 4261                Arytera collina                     Drosera collina
## 4817             Ayenia cuatrecasae                Ayenia cuatrecasasii
## 5758              Bembicia uniflora                    Remijia uniflora
## 12111              Bursera zapoteca                      Bursera aptera
## 23381  Capparidastrum cuatrecasanum           Morisonia cuatrecasasiana
## 28913           Cybianthus pittieri               Lycianthes multiflora
## 31432        Cyrtandra longistamina            Cryptandra longistaminea
## 50272            Diospyros sennenii                  Diospyros senensis
## 59661              Drypetes louisii                     Drypetes dussii
## 22642         Eucalyptus alatissima               Eucalyptus plenissima
## 41773       Euphorbia neospinescens Euphorbia cuneata subsp. spinescens
## 14554            Grewia androyensis                   Grewia angolensis
## 14634                Grewia barorum                      Grewia baronii
## 15354                Grewia milleri                      Grewia bicolor
## 15718             Labordia triflora                      Gagea triflora
## 53355         Miconia castaneiflora               Miconia castaneifolia
## 47096             Padus napaulensis                  Prunus napaulensis
## 28897               Populus hyrcana                      Populus haoana
## 35567               Protium aidanum                    Protium bahianum
## 37587              Prunus dielsiana              Cotoneaster dielsianus
## 101210                Ruagea beckii                       Jungia beckii
## 10198                Ruagea obovata                       Dalea obovata
## 18139         Saurauia cuatrecasana            Saurauia cuatrecasasiana
## 421112              Sorbus neglecta                Lotus lancerottensis
## 42148            Sorbus obtusifolia            Hesperomeles obtusifolia
## 47697             Sterculia holtzei             Sterculia megistophylla
## 19659           Ternstroemia huberi                  Ternstroemia hosei
##                               Old.name    taxonID
## 1092                                     462296-1
## 4261                                   77142060-1
## 4817                                      27519-2
## 5758                                     302939-2
## 12111                                    127080-1
## 23381  Capparidastrum cuatrecasasianum 77184037-1
## 28913              Lycianthes pittieri   146571-2
## 31432                                    717246-1
## 50272                                    323003-1
## 59661                                     85490-2
## 22642                                    593258-1
## 41773             Euphorbia spinescens   880546-1
## 14554                                    834371-1
## 14634                                    834077-1
## 15354                   Grewia dinteri   834087-1
## 15718                 Lloydia triflora   535691-1
## 53355                                    572249-1
## 47096                                    730017-1
## 28897                                    776705-1
## 35567                                    300177-2
## 37587                  Pyrus dielsiana   722471-1
## 101210                                   315219-2
## 10198                                     76270-2
## 18139                                    228033-2
## 421112                  Lotus neglecta   503717-1
## 42148                Pyrus obtusifolia   725484-1
## 47697                  Sterculia hosei   825342-1
## 19659                                    830529-1
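The separate counts in sections 6.3 to 6.5 can also be obtained with a single call to table(). A minimal sketch with a made-up stand-in for GTS.fuzzy that only has a Fuzzy.dist column:

```r
# Made-up stand-in for GTS.fuzzy with only the Fuzzy.dist column
fuzzy.demo <- data.frame(Fuzzy.dist = c(1, 2, 1, 3, 2, 1))

# Number of fuzzy matches per matching distance; useNA = "ifany" adds a
# separate count only if some distances are missing
dist.counts <- table(fuzzy.demo$Fuzzy.dist, useNA = "ifany")
dist.counts
```

With GTS.fuzzy itself, this should return 207, 137 and 28 for distances 1, 2 and 3, which together account for all 372 fuzzy matches.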

6.6 A function to check for acceptable fuzzy matches

The custom function acceptable.match can now also be applied to the fuzzy matches with the WCVP.

GTS.acceptable <- acceptable.match(GTS.fuzzy)
GTS.acceptable2 <- acceptable.match(GTS.fuzzy, no.vowels=TRUE)

Again, the results of the function suggest that many of the fuzzy matches can be accepted.

nrow(GTS.fuzzy[GTS.acceptable == TRUE, ]) # 130
## [1] 130
head(GTS.fuzzy[GTS.acceptable == TRUE, c("TaxonName", "scientificName", "Old.name", "taxonID")])
##                    TaxonName         scientificName         Old.name    taxonID
## 1001    Acropogon calcicolus    Acropogon calcicola                  77080941-1
## 1018 Acropogon sageniifolius Acropogon sageniifolia                    822024-1
## 1021 Acropogon schumannianus Acropogon schumanniana                    822025-1
## 2029       Aldina rio-negrae     Aldina macrophylla Aldina rionegrae   473441-1
## 2046   Alectryon macrococcum  Alectryon macrococcus                    781658-1
## 2630       Ambavia gerrardii       Ambavia gerrardi                     72022-1
tail(GTS.fuzzy[GTS.acceptable == TRUE, c("TaxonName", "scientificName", "Old.name", "taxonID")])
##                           TaxonName              scientificName Old.name
## 11499   Tabernaemontana mocquerysii  Tabernaemontana mocquerysi         
## 13799  Tambourissa castri-delphinii Tambourissa castri-delphini         
## 26749              Trema cannabinum             Trema cannabina         
## 320211        Turraeanthus africana      Turraeanthus africanus         
## 56398            Zabelia tyaihyonii           Zabelia tyaihyoni         
## 58308          Ziziphus cambodianus         Ziziphus cambodiana         
##         taxonID
## 11499   82224-1
## 13799  582447-1
## 26749  856736-1
## 320211 579875-1
## 56398  150127-1
## 58308  719271-1
nrow(GTS.fuzzy[GTS.acceptable == FALSE & GTS.acceptable2 == TRUE, ]) # 113
## [1] 113
head(GTS.fuzzy[GTS.acceptable == FALSE & GTS.acceptable2 == TRUE, 
               c("TaxonName", "scientificName", "Old.name", "taxonID")])
##                   TaxonName             scientificName            Old.name
## 14    Abarema cochliocarpos      Abarema cochliacarpos                    
## 402  Acacia macdonnellensis    Acacia macdonnelliensis                    
## 1402     Aegiphila valerioi          Aegiphila valerii                    
## 2011    Aldina aquae-nigrae         Aldina macrophylla Aldina aquae-negrae
## 3003     Annickia kummeriae          Annickia kummerae                    
## 3977     Ardisia lancifolia Tapeinosperma lanceifolium Ardisia lanceifolia
##        taxonID
## 14   1020872-2
## 402   470815-1
## 1402    5839-2
## 2011  473441-1
## 3003  948443-1
## 3977  590094-1
tail(GTS.fuzzy[GTS.acceptable == FALSE & GTS.acceptable2 == TRUE, 
               c("TaxonName", "scientificName", "Old.name", "taxonID")])
##                     TaxonName          scientificName        Old.name
## 43879          Vitex carvalhi    Vitex mossambicensis Vitex carvalhoi
## 45549     Vochysia antioquiae      Vochysia antioquia                
## 49008    Weinmannia silvicola    Weinmannia sylvicola                
## 491113    Weinmannia trianaea      Weinmannia trianae                
## 51249  Wunderlichia crulsiana Wunderlichia cruelsiana                
## 591112        Zygia macbridii         Zygia macbridei                
##           taxonID
## 43879    865886-1
## 45549  60457217-2
## 49008    795085-1
## 491113   268328-2
## 51249    260740-1
## 591112   962832-1
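The definition of acceptable.match is not repeated in this section. As a rough illustration of the gender-ending rule only, the hypothetical helpers below (strip.gender, same.except.gender and matched.name are not the code of acceptable.match) accept a fuzzy match when the submitted name and the matched name are identical after removing a final -us, -a or -um. Following the output above, the comparison uses Old.name when it is filled in (the fuzzy match was then to a synonym) and scientificName otherwise. The example rows are copied from the output above.

```r
# Hypothetical helper: lower-case a name and remove a final -us, -um or -a
strip.gender <- function(name) {
  sub("(us|um|a)$", "", tolower(trimws(name)))
}

# Hypothetical helper: TRUE when two names differ at most in that ending
same.except.gender <- function(submitted, matched) {
  strip.gender(submitted) == strip.gender(matched)
}

# Hypothetical helper: the name that was actually matched (synonym or accepted)
matched.name <- function(scientificName, Old.name) {
  ifelse(is.na(Old.name) | Old.name == "", scientificName, Old.name)
}

demo <- data.frame(
  TaxonName = c("Acropogon calcicolus", "Turraeanthus africana",
                "Archidendron oblonga", "Aquilaria banaensis", "Arytera collina"),
  scientificName = c("Acropogon calcicola", "Turraeanthus africanus",
                     "Archidendropsis oblonga", "Aquilaria banaense", "Drosera collina"),
  Old.name = c("", "", "Archidendron oblongum", "", ""),
  stringsAsFactors = FALSE
)

ok <- same.except.gender(demo$TaxonName,
                         matched.name(demo$scientificName, demo$Old.name))
ok
# [1]  TRUE  TRUE  TRUE FALSE FALSE
```

This simple rule misses other patterns that the custom function evidently accepts, such as a removed hyphen (Aldina rio-negrae) or -ii versus -i (Ambavia gerrardii).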

This leaves 129 species whose fuzzy matches are not accepted by the rules of the custom function. Checking these species manually has now become a relatively easy task, so the script below saves them together with their matching details. But perhaps we should first check whether any of these remaining species could have been matched with World Flora Online. This is done in the next section.

GTS.fuzzy.remain2 <- GTS.fuzzy[GTS.acceptable2 == FALSE, ]

nrow(GTS.fuzzy.remain2) # 129
## [1] 129
head(GTS.fuzzy.remain2[, c("TaxonName", "scientificName", "Old.name", "taxonID")])
##                       TaxonName              scientificName Old.name    taxonID
## 1092      Actinodaphne leiantha      Actinodaphne myriantha            462296-1
## 1298         Adinandra milletii         Adinandra millettii            828439-1
## 2364          Alnus mandshurica          Alnus mandschurica            107681-1
## 3336 Antirhea novobritanniensis Antirhea novobrittanniensis            969414-1
## 3527        Aquilaria banaensis          Aquilaria banaense            931120-1
## 4261            Arytera collina             Drosera collina          77142060-1
tail(GTS.fuzzy.remain2[, c("TaxonName", "scientificName", "Old.name", "taxonID")])
##                               TaxonName                  scientificName
## 19437          Ternstroemia conicocarpa         Ternstroemia coniocarpa
## 19659               Ternstroemia huberi              Ternstroemia hosei
## 20769                Tetralix moanensis               Tetralix moaensis
## 209211 Tetrapterocarpon septentrionalis Tetrapterocarpon septentrionale
## 510211              Wrightia flavorosea           Wrightia flavidorosea
## 51299             Wurdastom ecuadorense          Wurdastom ecuadorensis
##        Old.name    taxonID
## 19437            1017244-1
## 19659             830529-1
## 20769             251704-2
## 209211          20004546-1
## 510211             82849-1
## 51299             994547-1
file.save2 <- file.path(getwd(), "GTS_Fuzzy_WCVP_remain.txt")
fwrite(GTS.fuzzy.remain2, file=file.save2, sep="|", row.names=FALSE)
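A pipe separator is a practical choice here because several author strings contain commas, for example "(Kunth) D.B.O.S.Cardoso, Filardi & H.C.Lima". After the manual check, the file can be read back with fread(). A minimal round-trip sketch, using a temporary file instead of the working directory and two example rows taken from the output above:

```r
library(data.table)

# Two example rows; the first author string contains a comma
remain <- data.table(
  TaxonName = c("Ctenodon molliculus", "Plerandra moratiana"),
  Author = c("(Kunth) D.B.O.S.Cardoso, Filardi & H.C.Lima",
             "Lowry & G.M.Plunkett")
)

# Write with "|" as separator, then read back with the same separator
f <- tempfile(fileext = ".txt")
fwrite(remain, file = f, sep = "|")
back <- fread(f, sep = "|")
back
```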

7 Recheck the species not reasonably matched with the WCVP

As GlobalTreeSearch was compiled from many different information sources, some of the species that could not be matched with the WCVP may still be included in World Flora Online. This is what we check here.

GTS.recheck <- rbind(GTS.not2[, c("TaxonName", "Author")],
                     GTS.fuzzy.remain2[, c("TaxonName", "Author")])
nrow(GTS.recheck)
## [1] 181
start.time <- Sys.time()

GTS.rechecked <- WFO.one(WFO.match.fuzzyjoin(spec.data=GTS.recheck,
                                             WFO.data=WFO.data,
                                             spec.name="TaxonName",
                                             Authorship="Author",
                                             fuzzydist.max=3),
                         verbose=FALSE)
## Checking for fuzzy matches for 72 records
## 
## Checking new accepted IDs
end.time <- Sys.time()
end.time - start.time
## Time difference of 4.634521 mins

As the messages already indicated (fuzzy matching was needed for only 72 of the 181 records), 109 of these species could be matched directly with World Flora Online. Among the species with fuzzy matches, many can also be accepted after a manual check.

nrow(GTS.rechecked[GTS.rechecked$Matched == TRUE, ])
## [1] 151
GTS.rechecked[GTS.rechecked$Matched == FALSE, 
              c("TaxonName", "scientificName")]
##                         TaxonName scientificName
## 2              Amphitecna fonceti           <NA>
## 3               Artocarpus bergii           <NA>
## 5           Beilschmiedia osacola           <NA>
## 9           Cinnadenia liyuyingii           <NA>
## 14    Cryptocarya sheikelmudiyana           <NA>
## 15            Ctenodon molliculus           <NA>
## 16             Ctenodon monteiroi           <NA>
## 17             Cyrtandra balgooyi           <NA>
## 18              Deinbollia onanae           <NA>
## 19          Diospyros antakaranae           <NA>
## 20           Disepalum rawagambut           <NA>
## 21     Endiandra wongawallanensis           <NA>
## 23        Englerodendron libassum           <NA>
## 26             Grewia mansouriana           <NA>
## 27        Humiriastrum purusensis           <NA>
## 28         Jarandersonia pereirae           <NA>
## 30        Magnolia llanganatensis           <NA>
## 36                Prunus klokovii           <NA>
## 37             Quercus centenaria           <NA>
## 38             Ruagea parvifructa           <NA>
## 42             Styrax cambodianus           <NA>
## 44             Trichilia deminuta           <NA>
## 45            Uvariopsis dicaprio           <NA>
## 47               Vepris zapfackii           <NA>
## 49      Vochysia caroliae-scottii           <NA>
## 50          Volkameria emirnensis           <NA>
## 51              Volkameria grevei           <NA>
## 52  Zanthoxylum tenuipedicellatum           <NA>
## 144         Monteverdia gonoclada           <NA>
## 171    Rhododendron suoilenhensis           <NA>
GTS.rechecked[GTS.rechecked$Fuzzy == TRUE, 
              c("TaxonName", "scientificName", "Old.name")]
##                        TaxonName              scientificName
## 6       Beilschmiedia tisseranti   Beilschmiedia tisserantii
## 29           Madhuca chia-ananii         Madhuca chai-ananii
## 62              Betula murrayana            Betula ×purpusii
## 63              Bursera zapoteca              Bursera aptera
## 66          Canarium multinervis         Canarium multinerve
## 67             Canarium subtilis            Canarium subtile
## 68  Capparidastrum cuatrecasanum   Morisonia cuatrecasasiana
## 71         Citharexylum mocinnoi        Citharexylum mocinoi
## 78        Cyrtandra longistamina    Cryptandra longistaminea
## 79          Dalbergia annamensis        Dalbergia andapensis
## 82         Dicoryphe buddleoides      Dicoryphe buddlejoides
## 84            Diospyros sennenii          Diospyros senensis
## 85       Drypetes assymetricarpa     Drypetes asymmetricarpa
## 86              Drypetes louisii             Drypetes dussii
## 89          Escallonia myrtoides        Escallonia myrtoidea
## 90           Euadenia trifoliata           Crateva monticola
## 91         Eucalyptus alatissima       Eucalyptus plenissima
## 94             Eugenia poroensis           Eugenia pardensis
## 104          Gmelina leichardtii        Gmelina leichhardtii
## 105  Graffenrieda conostegioides Graffenrieda comostegioides
## 112               Grewia milleri             Grimmia milleri
## 114       Guettarda prenleloupii       Guettarda preneloupii
## 115          Guettarda wayaensis        Guettarda wagapensis
## 136        Memecylon arnhemensis        Memecylon arnhemense
## 137           Memecylon plebeium          Memecylon plebejum
## 139         Mezoneuron kavaiense        Mezoneuron kauaiense
## 141              Miconia doniana             Miconia doriana
## 143   Moldenhawera luschnathiana   Moldenhawera lushnathiana
## 148      Neraudia melastomifolia   Neraudia melastomatifolia
## 153         Pavieasia annamensis         Pavieasia anamensis
## 155      Phellocalyx vollescenii      Phellocalyx vollesenii
## 160         Plumeria trouinensis       Plumeria ×stenopetala
## 170               Quercus mexiae            Quercus gambelii
## 172                Ruagea beckii               Jungia beckii
## 173               Ruagea obovata               Dalea obovata
## 181            Sorbus arvonensis           Sorbus arranensis
## 187      Stenostomum albobruneum    Stenostomum albobrunneum
## 189               Styrax obassis              Styrax obassia
## 190       Syzygium kanneliyensis       Syzygium kanneliyense
## 192           Syzygium trukensis           Syzygium trukense
## 194     Ternstroemia conicocarpa     Ternstroemia coniocarpa
## 198          Wrightia flavorosea        Wrightia flavo-rosea
##                            Old.name
## 6                                  
## 29                                 
## 62                Betula ×murrayana
## 63                                 
## 66                                 
## 67                                 
## 68  Capparidastrum cuatrecasasianum
## 71                                 
## 78                                 
## 79                                 
## 82                                 
## 84                                 
## 85                                 
## 86                                 
## 89                                 
## 90            Euadenia trifoliolata
## 91                                 
## 94                                 
## 104                                
## 105                                
## 112                                
## 114                                
## 115                                
## 136                                
## 137                                
## 139                                
## 141                                
## 143                                
## 148                                
## 153                                
## 155                                
## 160           Plumeria ×trouinenais
## 170                   Quercus media
## 172                                
## 173                                
## 181                                
## 187                                
## 189                                
## 190                                
## 192                                
## 194                                
## 198

8 Global Biodiversity Standard

This post was partly prompted by ongoing work in a Darwin Initiative project (DAREX001) that is developing a Global Biodiversity Standard for tree planting. The GlobalUsefulNativeTrees database was recently released by this project. Once the Global Biodiversity Standard scheme becomes operational, tree planting projects will be able to use scripts such as the ones shown here to cross-check their species lists before applying.

9 Session Information

sessionInfo()
## R version 4.2.1 (2022-06-23 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19045)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United Kingdom.utf8 
## [2] LC_CTYPE=English_United Kingdom.utf8   
## [3] LC_MONETARY=English_United Kingdom.utf8
## [4] LC_NUMERIC=C                           
## [5] LC_TIME=English_United Kingdom.utf8    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] stringr_1.4.1     data.table_1.14.2 WorldFlora_1.12  
## 
## loaded via a namespace (and not attached):
##  [1] bslib_0.4.0       compiler_4.2.1    pillar_1.8.1      jquerylib_0.1.4  
##  [5] tools_4.2.1       digest_0.6.29     jsonlite_1.8.0    evaluate_0.16    
##  [9] lifecycle_1.0.3   tibble_3.1.8      pkgconfig_2.0.3   rlang_1.0.6      
## [13] cli_3.4.1         rstudioapi_0.14   yaml_2.3.5        parallel_4.2.1   
## [17] fuzzyjoin_0.1.6   xfun_0.33         fastmap_1.1.0     withr_2.5.0      
## [21] dplyr_1.0.10      knitr_1.40        generics_0.1.3    vctrs_0.5.1      
## [25] sass_0.4.2        tidyselect_1.2.0  glue_1.6.2        R6_2.5.1         
## [29] fansi_1.0.3       rmarkdown_2.16    purrr_0.3.4       tidyr_1.2.1      
## [33] magrittr_2.0.3    htmltools_0.5.3   stringdist_0.9.10 utf8_1.2.2       
## [37] stringi_1.7.8     cachem_1.0.6