library(BiodiversityR) # also loads vegan
library(FD)
library(readxl)
The FD package includes methods for calculating the weighted Gower dissimilarity index, through methods described in A distance‐based framework for measuring functional diversity from multiple traits.
The vegan package includes methods for calculating taxonomic distances, through methods described in A taxonomic distinctness index and its statistical properties.
Here I will show:
How taxonomic distance can be calculated via the FD package
How functional and taxonomic distance can be combined in a weighted distance measure
These methods resulted from statistical analysis that I conduced for the following article led by Ian Dawson: The role of genetics in mainstreaming the production of new and orphan crops to diversify food systems and support human nutrition.
Here we used an example data set from the vegan package.
data(dune.taxon)
str(dune.taxon)
## 'data.frame': 30 obs. of 5 variables:
## $ Genus : chr "Achillea" "Agrostis" "Aira" "Alopecurus" ...
## $ Family : chr "Asteraceae" "Poaceae" "Poaceae" "Poaceae" ...
## $ Order : chr "Asterales" "Poales" "Poales" "Poales" ...
## $ Superorder: chr "Asteranae" "Lilianae" "Lilianae" "Lilianae" ...
## $ Subclass : chr "Magnoliidae" "Magnoliidae" "Magnoliidae" "Magnoliidae" ...
head(dune.taxon)
## Genus Family Order Superorder Subclass
## Achimill Achillea Asteraceae Asterales Asteranae Magnoliidae
## Agrostol Agrostis Poaceae Poales Lilianae Magnoliidae
## Airaprae Aira Poaceae Poales Lilianae Magnoliidae
## Alopgeni Alopecurus Poaceae Poales Lilianae Magnoliidae
## Anthodor Anthoxanthum Poaceae Poales Lilianae Magnoliidae
## Bellpere Bellis Asteraceae Asterales Asteranae Magnoliidae
The taxonomic distance is calculated via the vegan::taxa2dist function.
taxdist.vegan1 <- vegan::taxa2dist(dune.taxon, varstep=TRUE, check=TRUE)
as.matrix(taxdist.vegan1)[1:10, 1:5]
## Achimill Agrostol Airaprae Alopgeni Anthodor
## Achimill 0.00000 79.54545 79.54545 79.54545 79.54545
## Agrostol 79.54545 0.00000 22.27273 22.27273 22.27273
## Airaprae 79.54545 22.27273 0.00000 22.27273 22.27273
## Alopgeni 79.54545 22.27273 22.27273 0.00000 22.27273
## Anthodor 79.54545 22.27273 22.27273 22.27273 0.00000
## Bellpere 22.27273 79.54545 79.54545 79.54545 79.54545
## Bromhord 79.54545 22.27273 22.27273 22.27273 22.27273
## Chenalbu 79.54545 79.54545 79.54545 79.54545 79.54545
## Cirsarve 22.27273 79.54545 79.54545 79.54545 79.54545
## Comapalu 79.54545 79.54545 79.54545 79.54545 79.54545
The dune.taxon data set did not explicitly include a species level. This level needs to be added as an extra column for the calculations of the Gower distance. But note first that adding the species level does not change the calculations via taxa2dist.
dune.taxon2 <- data.frame(Species=rownames(dune.taxon), dune.taxon)
str(dune.taxon2)
## 'data.frame': 30 obs. of 6 variables:
## $ Species : chr "Achimill" "Agrostol" "Airaprae" "Alopgeni" ...
## $ Genus : chr "Achillea" "Agrostis" "Aira" "Alopecurus" ...
## $ Family : chr "Asteraceae" "Poaceae" "Poaceae" "Poaceae" ...
## $ Order : chr "Asterales" "Poales" "Poales" "Poales" ...
## $ Superorder: chr "Asteranae" "Lilianae" "Lilianae" "Lilianae" ...
## $ Subclass : chr "Magnoliidae" "Magnoliidae" "Magnoliidae" "Magnoliidae" ...
taxdist.vegan2 <- vegan::taxa2dist(dune.taxon2, varstep=TRUE, check=TRUE)
as.matrix(taxdist.vegan2)[1:10, 1:5]
## Achimill Agrostol Airaprae Alopgeni Anthodor
## Achimill 0.00000 79.54545 79.54545 79.54545 79.54545
## Agrostol 79.54545 0.00000 22.27273 22.27273 22.27273
## Airaprae 79.54545 22.27273 0.00000 22.27273 22.27273
## Alopgeni 79.54545 22.27273 22.27273 0.00000 22.27273
## Anthodor 79.54545 22.27273 22.27273 22.27273 0.00000
## Bellpere 22.27273 79.54545 79.54545 79.54545 79.54545
## Bromhord 79.54545 22.27273 22.27273 22.27273 22.27273
## Chenalbu 79.54545 79.54545 79.54545 79.54545 79.54545
## Cirsarve 22.27273 79.54545 79.54545 79.54545 79.54545
## Comapalu 79.54545 79.54545 79.54545 79.54545 79.54545
#### same distance obtained
all.equal(target=as.matrix(taxdist.vegan1),
current=as.matrix(taxdist.vegan2))
## [1] TRUE
To calculate the taxonomic distance via the FD::gowdis function, information is needed on the different weights for the taxonomic levels. This information is available from the steps output of the taxa2dist function:
steps1 <- attributes(taxdist.vegan1)$steps
steps1
## Base Genus Family Order Superorder Subclass
## 4.090909 18.181818 13.636364 16.363636 27.272727 20.454545
steps2 <- attributes(taxdist.vegan2)$steps
steps2
## Base Genus Family Order Superorder Subclass
## 4.090909 18.181818 13.636364 16.363636 27.272727 20.454545
You can check that these add up to 100%
sum(steps1)
## [1] 100
It is possible to manually calculate these steps from differences in the number of categories at each hierarchical level. Here I show these in a tabular format for data that I calculated in a spreadsheet. EXTRA corresponds to the difference in the number of categories in the level compared to the number of categories in the level immediately above. RESCALED is the result of rescaling the weights to add to 100%.
# LEVEL CATEGORIES EXTRA EXTRA/CATEGORIES RESCALED
# Top 1
# Subclass 2 1 0.50 20.454545
# Superorder 6 4 0.66 27.272727
# Order 10 4 0.40 16.363636
# Family 15 5 0.33 13.636364
# Genus 27 12 0.44 18.181818
# Species 30 3 0.10 4.090909
# TOTAL 2.44 99.999999
With the weights and the data set where species was added, the taxonomic distance can now be calculated via the FD::gowdis function. As the method of vegan follows the original method of calculating distance as a percentage, these results need to be multipied by 100 to obtain exactly the same result.
taxdist.FD <- FD::gowdis(dune.taxon2, w=steps1)
as.matrix(100*taxdist.FD)[1:10, 1:5]
## Achimill Agrostol Airaprae Alopgeni Anthodor
## Achimill 0.00000 79.54545 79.54545 79.54545 79.54545
## Agrostol 79.54545 0.00000 22.27273 22.27273 22.27273
## Airaprae 79.54545 22.27273 0.00000 22.27273 22.27273
## Alopgeni 79.54545 22.27273 22.27273 0.00000 22.27273
## Anthodor 79.54545 22.27273 22.27273 22.27273 0.00000
## Bellpere 22.27273 79.54545 79.54545 79.54545 79.54545
## Bromhord 79.54545 22.27273 22.27273 22.27273 22.27273
## Chenalbu 79.54545 79.54545 79.54545 79.54545 79.54545
## Cirsarve 22.27273 79.54545 79.54545 79.54545 79.54545
## Comapalu 79.54545 79.54545 79.54545 79.54545 79.54545
#### same distance obtained
all.equal(target=as.matrix(taxdist.vegan2), current=100*as.matrix(taxdist.FD))
## [1] TRUE
I will show now how the analysis was done for The role of genetics in mainstreaming the production of new and orphan crops to diversify food systems and support human nutrition.
The data can be downloaded from the Supporting Information.
I saved the Excel file to a local folder. Next I created a new sheet where I copied cells A5 to T65 from sheet ‘(a)’.
crop.down <- "C:\\BiodiversityR_test\\nph15895_copied.xlsx"
crop.data <- as.data.frame(readxl::read_excel(crop.down, sheet="cells A5 to T65"))
## New names:
## * Seed -> Seed...11
## * Seed -> Seed...15
names(crop.data)[3] <- "Species"
names(crop.data)[names(crop.data) == "Seed...11"] <- "Seeds"
names(crop.data)[names(crop.data) == "Seed...15"] <- "Seed"
str(crop.data)
## 'data.frame': 60 obs. of 20 variables:
## $ Crop group £ : chr "New or orphan crop requiring model" "New or orphan crop requiring model" "New or orphan crop requiring model" "New or orphan crop requiring model" ...
## $ Crop common name *: chr "African eggplant" "African gnetum" "African plum" "Bambara groundnut" ...
## $ Species : chr "Solanum aethiopicum" "Gnetum africanum" "Dacryodes edulis" "Vigna subterranea" ...
## $ Diploid : num 1 1 1 1 0 1 1 1 1 0 ...
## $ Polyploid : num 0 0 0 0 1 0 1 0 0 1 ...
## $ Self-fertilising : num 1 0 0 1 0 1 0 0 0 1 ...
## $ Outcrossing : num 0 1 1 0 1 0 1 1 1 1 ...
## $ Annual : num 1 0 0 1 0 0 0 0 0 1 ...
## $ Perennial : num 0 1 1 0 1 1 1 1 1 0 ...
## $ Vegetative : num 0 1 1 0 1 0 1 1 1 0 ...
## $ Seeds : num 1 0 0 1 0 1 0 0 0 1 ...
## $ Root : num 0 0 0 0 0 0 0 1 1 0 ...
## $ Leaf : num 1 1 0 0 1 1 0 1 0 1 ...
## $ Flower : num 0 0 0 0 0 1 0 0 0 0 ...
## $ Seed : num 0 0 0 1 0 0 0 0 0 1 ...
## $ Fruit : num 1 0 1 0 1 1 1 0 0 0 ...
## $ Genus : chr "Solanum" "Gnetum" "Dacryodes" "Vigna" ...
## $ Family : chr "Solanaceae" "Gnetaceae" "Burseraceae" "Fabaceae" ...
## $ Order : chr "Solanales" "Gnetales" "Sapindales" "Fabales" ...
## $ Supra-order : chr "Dicot" "Gnetophyta" "Dicot" "Dicot" ...
# make short rownames
rownames(crop.data) <- make.cepnames(crop.data$Species)
New variables are created for some functional traits:
crop.data$Longevity <- factor(crop.data$Perennial)
levels(crop.data$Longevity) <- list(annual="0", perennial="1")
crop.data$Propagation <- factor(crop.data$Seeds)
levels(crop.data$Propagation) <- list(vegetative="0", seed="1")
For the analysis, we select the following subset of functional traits:
crop.trait <- crop.data[, c("Diploid", "Polyploid", "Self-fertilising", "Outcrossing",
"Longevity", "Propagation",
"Root", "Leaf", "Flower", "Seed", "Fruit")]
str(crop.trait)
## 'data.frame': 60 obs. of 11 variables:
## $ Diploid : num 1 1 1 1 0 1 1 1 1 0 ...
## $ Polyploid : num 0 0 0 0 1 0 1 0 0 1 ...
## $ Self-fertilising: num 1 0 0 1 0 1 0 0 0 1 ...
## $ Outcrossing : num 0 1 1 0 1 0 1 1 1 1 ...
## $ Longevity : Factor w/ 2 levels "annual","perennial": 1 2 2 1 2 2 2 2 2 1 ...
## $ Propagation : Factor w/ 2 levels "vegetative","seed": 2 1 1 2 1 2 1 1 1 2 ...
## $ Root : num 0 0 0 0 0 0 0 1 1 0 ...
## $ Leaf : num 1 1 0 0 1 1 0 1 0 1 ...
## $ Flower : num 0 0 0 0 0 1 0 0 0 0 ...
## $ Seed : num 0 0 0 1 0 0 0 0 0 1 ...
## $ Fruit : num 1 0 1 0 1 1 1 0 0 0 ...
… and the following data on taxonomy:
crop.taxonomy <- crop.data[, c("Species", "Genus", "Family", "Order", "Supra-order")]
str(crop.taxonomy)
## 'data.frame': 60 obs. of 5 variables:
## $ Species : chr "Solanum aethiopicum" "Gnetum africanum" "Dacryodes edulis" "Vigna subterranea" ...
## $ Genus : chr "Solanum" "Gnetum" "Dacryodes" "Vigna" ...
## $ Family : chr "Solanaceae" "Gnetaceae" "Burseraceae" "Fabaceae" ...
## $ Order : chr "Solanales" "Gnetales" "Sapindales" "Fabales" ...
## $ Supra-order: chr "Dicot" "Gnetophyta" "Dicot" "Dicot" ...
Next the taxonomic weights need to be calculated. This can be done via the taxa2dist function.
taxdist.vegan <- vegan::taxa2dist(crop.taxonomy, varstep=TRUE, check=TRUE)
as.matrix(taxdist.vegan)[1:10, 1:5]
## Solaaeth Gnetafri Dacredul Vignsubt Adandigi
## Solaaeth 0.00000 100 72.40122 72.40122 72.40122
## Gnetafri 100.00000 0 100.00000 100.00000 100.00000
## Dacredul 72.40122 100 0.00000 72.40122 72.40122
## Vignsubt 72.40122 100 72.40122 0.00000 72.40122
## Adandigi 72.40122 100 72.40122 72.40122 0.00000
## Momochar 72.40122 100 72.40122 72.40122 72.40122
## Artoalti 72.40122 100 72.40122 72.40122 72.40122
## Xantsagi 100.00000 100 100.00000 100.00000 100.00000
## Ensevent 100.00000 100 100.00000 100.00000 100.00000
## Brascari 72.40122 100 72.40122 72.40122 72.40122
taxonomy.weights <- attributes(taxdist.vegan)$steps
taxonomy.weights
## Base Genus Family Order Supra-order
## 3.449848 18.064656 16.025098 34.861617 27.598780
It is good practice to cross-check that the taxonomic distance can also be calculated via the weighted Gower distance.
taxdist.FD <- FD::gowdis(crop.taxonomy, w=taxonomy.weights)
as.matrix(100*taxdist.FD)[1:10, 1:5]
## Solaaeth Gnetafri Dacredul Vignsubt Adandigi
## Solaaeth 0.00000 100 72.40122 72.40122 72.40122
## Gnetafri 100.00000 0 100.00000 100.00000 100.00000
## Dacredul 72.40122 100 0.00000 72.40122 72.40122
## Vignsubt 72.40122 100 72.40122 0.00000 72.40122
## Adandigi 72.40122 100 72.40122 72.40122 0.00000
## Momochar 72.40122 100 72.40122 72.40122 72.40122
## Artoalti 72.40122 100 72.40122 72.40122 72.40122
## Xantsagi 100.00000 100 100.00000 100.00000 100.00000
## Ensevent 100.00000 100 100.00000 100.00000 100.00000
## Brascari 72.40122 100 72.40122 72.40122 72.40122
#### same distance obtained
all.equal(target=as.matrix(taxdist.vegan), current=100*as.matrix(taxdist.FD))
## [1] TRUE
Now that the weights are known for the taxonomic categories, these can now be combined with the traits in the calculation of the weighted Gower distance.
The five categories of functional traits (Ploidy level with 2 variables of Diploid and Polyploid, Mating strategy with 2 variables Self-fertilizing and Outcrossing, Longevity as categorical variable, Propagation as categorical variable and Part of plant used as food) were each given a total weight of 100, dividing equally among the different variables:
trait.weights <- c(50, 50, # Ploidy
50, 50, # Mating stragegy
100, # Longevity
100, # Propagation
20, 20, 20, 20, 20) # Part of plant used as food
All traits can now be combined, giving the same total weight (500) to each of functional traits and taxonomy.
weights.combined <- c(trait.weights, 5*taxonomy.weights)
sum(weights.combined)
## [1] 1000
Everything is now in place to calculate the combined distance based on ecological traits and taxonomy:
crop.trait_taxo <- cbind(crop.trait, crop.taxonomy)
TraitTaxon.dist <- FD::gowdis(crop.trait_taxo, w=weights.combined)
as.matrix(TraitTaxon.dist)[1:10, 1:5]
## Solaaeth Gnetafri Dacredul Vignsubt Adandigi
## Solaaeth 0.0000000 0.82 0.6820061 0.4220061 0.7620061
## Gnetafri 0.8200000 0.00 0.5400000 0.8400000 0.6200000
## Dacredul 0.6820061 0.54 0.0000000 0.7020061 0.4820061
## Vignsubt 0.4220061 0.84 0.7020061 0.0000000 0.8220061
## Adandigi 0.7620061 0.62 0.4820061 0.8220061 0.0000000
## Momochar 0.4820061 0.74 0.6020061 0.5420061 0.6820061
## Artoalti 0.7320061 0.59 0.4120061 0.7520061 0.4320061
## Xantsagi 0.8400000 0.52 0.5600000 0.8600000 0.6400000
## Ensevent 0.8600000 0.54 0.5400000 0.8400000 0.6600000
## Brascari 0.5520061 0.87 0.7720061 0.5320061 0.6520061
In a sensitivity analysis, double weight can be given to biological traits as follows:
weights.combined2to1 <- c(2*trait.weights, 5*taxonomy.weights)
sum(weights.combined2to1)
## [1] 1500
This method for 2:1 weighting of biological:taxonomy gives the following result
crop.trait_taxo <- cbind(crop.trait, crop.taxonomy)
TraitTaxon.dist <- FD::gowdis(crop.trait_taxo, w=weights.combined2to1)
as.matrix(TraitTaxon.dist)[1:10, 1:5]
## Solaaeth Gnetafri Dacredul Vignsubt Adandigi
## Solaaeth 0.0000000 0.7600000 0.6680041 0.3213374 0.7746707
## Gnetafri 0.7600000 0.0000000 0.3866667 0.7866667 0.4933333
## Dacredul 0.6680041 0.3866667 0.0000000 0.6946707 0.4013374
## Vignsubt 0.3213374 0.7866667 0.6946707 0.0000000 0.8546707
## Adandigi 0.7746707 0.4933333 0.4013374 0.8546707 0.0000000
## Momochar 0.4013374 0.6533333 0.5613374 0.4813374 0.6680041
## Artoalti 0.7346707 0.4533333 0.3080041 0.7613374 0.3346707
## Xantsagi 0.7866667 0.3600000 0.4133333 0.8133333 0.5200000
## Ensevent 0.8133333 0.3866667 0.3866667 0.7866667 0.5466667
## Brascari 0.4946707 0.8266667 0.7880041 0.4680041 0.6280041
sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18363)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United Kingdom.1252
## [2] LC_CTYPE=English_United Kingdom.1252
## [3] LC_MONETARY=English_United Kingdom.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United Kingdom.1252
##
## attached base packages:
## [1] tcltk stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] readxl_1.3.1 FD_1.0-12 geometry_0.4.5
## [4] ape_5.4 ade4_1.7-16 BiodiversityR_2.12-2
## [7] rgl_0.100.54 vegan3d_1.1-2 vegan_2.5-6
## [10] lattice_0.20-41 permute_0.9-5
##
## loaded via a namespace (and not attached):
## [1] minqa_1.2.4 colorspace_1.4-1 ellipsis_0.3.1
## [4] class_7.3-17 rio_0.5.16 htmlTable_2.0.0
## [7] base64enc_0.1-3 rstudioapi_0.11 splines_4.0.2
## [10] knitr_1.28 effects_4.1-4 Formula_1.2-3
## [13] jsonlite_1.6.1 nloptr_1.2.2.1 cluster_2.1.0
## [16] Rcmdr_2.6-2 png_0.1-7 shiny_1.4.0.2
## [19] compiler_4.0.2 backports_1.1.7 Matrix_1.2-18
## [22] fastmap_1.0.1 survey_4.0 later_1.1.0.1
## [25] tcltk2_1.2-11 acepack_1.4.1 htmltools_0.5.0
## [28] prettyunits_1.1.1 tools_4.0.2 gtable_0.3.0
## [31] glue_1.4.1 dplyr_1.0.2 Rcpp_1.0.4.6
## [34] carData_3.0-4 cellranger_1.1.0 vctrs_0.3.4
## [37] nlme_3.1-148 crosstalk_1.1.0.1 xfun_0.15
## [40] stringr_1.4.0 openxlsx_4.1.5 lme4_1.1-23
## [43] mime_0.9 miniUI_0.1.1.1 lifecycle_0.2.0
## [46] RcmdrMisc_2.7-0 statmod_1.4.34 MASS_7.3-51.6
## [49] zoo_1.8-8 scales_1.1.1 hms_0.5.3
## [52] promises_1.1.1 parallel_4.0.2 sandwich_2.5-1
## [55] RColorBrewer_1.1-2 yaml_2.2.1 curl_4.3
## [58] gridExtra_2.3 ggplot2_3.3.2 rpart_4.1-15
## [61] latticeExtra_0.6-29 stringi_1.4.6 nortest_1.0-4
## [64] e1071_1.7-3 checkmate_2.0.0 boot_1.3-25
## [67] zip_2.0.4 manipulateWidget_0.10.1 rlang_0.4.8
## [70] pkgconfig_2.0.3 evaluate_0.14 purrr_0.3.4
## [73] htmlwidgets_1.5.1 tidyselect_1.1.0 magrittr_1.5
## [76] R6_2.4.1 generics_0.1.0 Hmisc_4.4-0
## [79] DBI_1.1.0 pillar_1.4.4 haven_2.3.1
## [82] foreign_0.8-80 mgcv_1.8-31 survival_3.1-12
## [85] scatterplot3d_0.3-41 abind_1.4-5 nnet_7.3-14
## [88] tibble_3.0.1 crayon_1.3.4 car_3.0-8
## [91] relimp_1.0-5 rmarkdown_2.3 jpeg_0.1-8.1
## [94] progress_1.2.2 grid_4.0.2 data.table_1.12.8
## [97] forcats_0.5.0 digest_0.6.25 webshot_0.5.2
## [100] xtable_1.8-4 httpuv_1.5.4 munsell_0.5.0
## [103] magic_1.5-9 mitools_2.4