1 Packages needed

library(BiodiversityR) # also loads vegan
library(FD)
library(readxl)

2 Introduction

The FD package includes methods for calculating the weighted Gower dissimilarity index, through methods described in A distance‐based framework for measuring functional diversity from multiple traits.

The vegan package includes methods for calculating taxonomic distances, through methods described in A taxonomic distinctness index and its statistical properties.

Here I will show:

  • How taxonomic distance can be calculated via the FD package

  • How functional and taxonomic distance can be combined in a weighted distance measure

These methods resulted from statistical analysis that I conduced for the following article led by Ian Dawson: The role of genetics in mainstreaming the production of new and orphan crops to diversify food systems and support human nutrition.

3 Calculating taxonomic distance via the weighted Gower distance

Here we used an example data set from the vegan package.

data(dune.taxon)
str(dune.taxon)
## 'data.frame':    30 obs. of  5 variables:
##  $ Genus     : chr  "Achillea" "Agrostis" "Aira" "Alopecurus" ...
##  $ Family    : chr  "Asteraceae" "Poaceae" "Poaceae" "Poaceae" ...
##  $ Order     : chr  "Asterales" "Poales" "Poales" "Poales" ...
##  $ Superorder: chr  "Asteranae" "Lilianae" "Lilianae" "Lilianae" ...
##  $ Subclass  : chr  "Magnoliidae" "Magnoliidae" "Magnoliidae" "Magnoliidae" ...
head(dune.taxon)
##                 Genus     Family     Order Superorder    Subclass
## Achimill     Achillea Asteraceae Asterales  Asteranae Magnoliidae
## Agrostol     Agrostis    Poaceae    Poales   Lilianae Magnoliidae
## Airaprae         Aira    Poaceae    Poales   Lilianae Magnoliidae
## Alopgeni   Alopecurus    Poaceae    Poales   Lilianae Magnoliidae
## Anthodor Anthoxanthum    Poaceae    Poales   Lilianae Magnoliidae
## Bellpere       Bellis Asteraceae Asterales  Asteranae Magnoliidae

The taxonomic distance is calculated via the vegan::taxa2dist function.

taxdist.vegan1 <- vegan::taxa2dist(dune.taxon, varstep=TRUE, check=TRUE)
as.matrix(taxdist.vegan1)[1:10, 1:5]
##          Achimill Agrostol Airaprae Alopgeni Anthodor
## Achimill  0.00000 79.54545 79.54545 79.54545 79.54545
## Agrostol 79.54545  0.00000 22.27273 22.27273 22.27273
## Airaprae 79.54545 22.27273  0.00000 22.27273 22.27273
## Alopgeni 79.54545 22.27273 22.27273  0.00000 22.27273
## Anthodor 79.54545 22.27273 22.27273 22.27273  0.00000
## Bellpere 22.27273 79.54545 79.54545 79.54545 79.54545
## Bromhord 79.54545 22.27273 22.27273 22.27273 22.27273
## Chenalbu 79.54545 79.54545 79.54545 79.54545 79.54545
## Cirsarve 22.27273 79.54545 79.54545 79.54545 79.54545
## Comapalu 79.54545 79.54545 79.54545 79.54545 79.54545

The dune.taxon data set did not explicitly include a species level. This level needs to be added as an extra column for the calculations of the Gower distance. But note first that adding the species level does not change the calculations via taxa2dist.

dune.taxon2 <- data.frame(Species=rownames(dune.taxon), dune.taxon)
str(dune.taxon2)
## 'data.frame':    30 obs. of  6 variables:
##  $ Species   : chr  "Achimill" "Agrostol" "Airaprae" "Alopgeni" ...
##  $ Genus     : chr  "Achillea" "Agrostis" "Aira" "Alopecurus" ...
##  $ Family    : chr  "Asteraceae" "Poaceae" "Poaceae" "Poaceae" ...
##  $ Order     : chr  "Asterales" "Poales" "Poales" "Poales" ...
##  $ Superorder: chr  "Asteranae" "Lilianae" "Lilianae" "Lilianae" ...
##  $ Subclass  : chr  "Magnoliidae" "Magnoliidae" "Magnoliidae" "Magnoliidae" ...
taxdist.vegan2 <- vegan::taxa2dist(dune.taxon2, varstep=TRUE, check=TRUE)
as.matrix(taxdist.vegan2)[1:10, 1:5]
##          Achimill Agrostol Airaprae Alopgeni Anthodor
## Achimill  0.00000 79.54545 79.54545 79.54545 79.54545
## Agrostol 79.54545  0.00000 22.27273 22.27273 22.27273
## Airaprae 79.54545 22.27273  0.00000 22.27273 22.27273
## Alopgeni 79.54545 22.27273 22.27273  0.00000 22.27273
## Anthodor 79.54545 22.27273 22.27273 22.27273  0.00000
## Bellpere 22.27273 79.54545 79.54545 79.54545 79.54545
## Bromhord 79.54545 22.27273 22.27273 22.27273 22.27273
## Chenalbu 79.54545 79.54545 79.54545 79.54545 79.54545
## Cirsarve 22.27273 79.54545 79.54545 79.54545 79.54545
## Comapalu 79.54545 79.54545 79.54545 79.54545 79.54545
#### same distance obtained
all.equal(target=as.matrix(taxdist.vegan1), 
          current=as.matrix(taxdist.vegan2))
## [1] TRUE

To calculate the taxonomic distance via the FD::gowdis function, information is needed on the different weights for the taxonomic levels. This information is available from the steps output of the taxa2dist function:

steps1 <- attributes(taxdist.vegan1)$steps
steps1
##       Base      Genus     Family      Order Superorder   Subclass 
##   4.090909  18.181818  13.636364  16.363636  27.272727  20.454545
steps2 <- attributes(taxdist.vegan2)$steps
steps2
##       Base      Genus     Family      Order Superorder   Subclass 
##   4.090909  18.181818  13.636364  16.363636  27.272727  20.454545

You can check that these add up to 100%

sum(steps1)
## [1] 100

It is possible to manually calculate these steps from differences in the number of categories at each hierarchical level. Here I show these in a tabular format for data that I calculated in a spreadsheet. EXTRA corresponds to the difference in the number of categories in the level compared to the number of categories in the level immediately above. RESCALED is the result of rescaling the weights to add to 100%.

# LEVEL         CATEGORIES     EXTRA    EXTRA/CATEGORIES      RESCALED

# Top                    1
# Subclass               2         1               0.50       20.454545
# Superorder             6         4               0.66       27.272727
# Order                 10         4               0.40       16.363636
# Family                15         5               0.33       13.636364
# Genus                 27        12               0.44       18.181818
# Species               30         3               0.10        4.090909

# TOTAL                                            2.44       99.999999

With the weights and the data set where species was added, the taxonomic distance can now be calculated via the FD::gowdis function. As the method of vegan follows the original method of calculating distance as a percentage, these results need to be multipied by 100 to obtain exactly the same result.

taxdist.FD <- FD::gowdis(dune.taxon2, w=steps1)
as.matrix(100*taxdist.FD)[1:10, 1:5]
##          Achimill Agrostol Airaprae Alopgeni Anthodor
## Achimill  0.00000 79.54545 79.54545 79.54545 79.54545
## Agrostol 79.54545  0.00000 22.27273 22.27273 22.27273
## Airaprae 79.54545 22.27273  0.00000 22.27273 22.27273
## Alopgeni 79.54545 22.27273 22.27273  0.00000 22.27273
## Anthodor 79.54545 22.27273 22.27273 22.27273  0.00000
## Bellpere 22.27273 79.54545 79.54545 79.54545 79.54545
## Bromhord 79.54545 22.27273 22.27273 22.27273 22.27273
## Chenalbu 79.54545 79.54545 79.54545 79.54545 79.54545
## Cirsarve 22.27273 79.54545 79.54545 79.54545 79.54545
## Comapalu 79.54545 79.54545 79.54545 79.54545 79.54545
#### same distance obtained
all.equal(target=as.matrix(taxdist.vegan2), current=100*as.matrix(taxdist.FD))
## [1] TRUE

4 Combining functional traits and taxonomic differences in the calculation of distance

4.1 Preparing the data

I will show now how the analysis was done for The role of genetics in mainstreaming the production of new and orphan crops to diversify food systems and support human nutrition.

The data can be downloaded from the Supporting Information.

I saved the Excel file to a local folder. Next I created a new sheet where I copied cells A5 to T65 from sheet ‘(a)’.

crop.down <- "C:\\BiodiversityR_test\\nph15895_copied.xlsx"
crop.data <- as.data.frame(readxl::read_excel(crop.down, sheet="cells A5 to T65"))
## New names:
## * Seed -> Seed...11
## * Seed -> Seed...15
names(crop.data)[3] <- "Species"
names(crop.data)[names(crop.data) == "Seed...11"] <- "Seeds"
names(crop.data)[names(crop.data) == "Seed...15"] <- "Seed"
str(crop.data)
## 'data.frame':    60 obs. of  20 variables:
##  $ Crop group £      : chr  "New or orphan crop requiring model" "New or orphan crop requiring model" "New or orphan crop requiring model" "New or orphan crop requiring model" ...
##  $ Crop common name *: chr  "African eggplant" "African gnetum" "African plum" "Bambara groundnut" ...
##  $ Species           : chr  "Solanum aethiopicum" "Gnetum africanum" "Dacryodes edulis" "Vigna subterranea" ...
##  $ Diploid           : num  1 1 1 1 0 1 1 1 1 0 ...
##  $ Polyploid         : num  0 0 0 0 1 0 1 0 0 1 ...
##  $ Self-fertilising  : num  1 0 0 1 0 1 0 0 0 1 ...
##  $ Outcrossing       : num  0 1 1 0 1 0 1 1 1 1 ...
##  $ Annual            : num  1 0 0 1 0 0 0 0 0 1 ...
##  $ Perennial         : num  0 1 1 0 1 1 1 1 1 0 ...
##  $ Vegetative        : num  0 1 1 0 1 0 1 1 1 0 ...
##  $ Seeds             : num  1 0 0 1 0 1 0 0 0 1 ...
##  $ Root              : num  0 0 0 0 0 0 0 1 1 0 ...
##  $ Leaf              : num  1 1 0 0 1 1 0 1 0 1 ...
##  $ Flower            : num  0 0 0 0 0 1 0 0 0 0 ...
##  $ Seed              : num  0 0 0 1 0 0 0 0 0 1 ...
##  $ Fruit             : num  1 0 1 0 1 1 1 0 0 0 ...
##  $ Genus             : chr  "Solanum" "Gnetum" "Dacryodes" "Vigna" ...
##  $ Family            : chr  "Solanaceae" "Gnetaceae" "Burseraceae" "Fabaceae" ...
##  $ Order             : chr  "Solanales" "Gnetales" "Sapindales" "Fabales" ...
##  $ Supra-order       : chr  "Dicot" "Gnetophyta" "Dicot" "Dicot" ...
# make short rownames
rownames(crop.data) <- make.cepnames(crop.data$Species)

New variables are created for some functional traits:

crop.data$Longevity <- factor(crop.data$Perennial)
levels(crop.data$Longevity) <- list(annual="0", perennial="1")

crop.data$Propagation <- factor(crop.data$Seeds)
levels(crop.data$Propagation) <- list(vegetative="0", seed="1")

For the analysis, we select the following subset of functional traits:

crop.trait <- crop.data[, c("Diploid", "Polyploid", "Self-fertilising", "Outcrossing",
                            "Longevity", "Propagation",
                            "Root", "Leaf", "Flower", "Seed", "Fruit")]
str(crop.trait)
## 'data.frame':    60 obs. of  11 variables:
##  $ Diploid         : num  1 1 1 1 0 1 1 1 1 0 ...
##  $ Polyploid       : num  0 0 0 0 1 0 1 0 0 1 ...
##  $ Self-fertilising: num  1 0 0 1 0 1 0 0 0 1 ...
##  $ Outcrossing     : num  0 1 1 0 1 0 1 1 1 1 ...
##  $ Longevity       : Factor w/ 2 levels "annual","perennial": 1 2 2 1 2 2 2 2 2 1 ...
##  $ Propagation     : Factor w/ 2 levels "vegetative","seed": 2 1 1 2 1 2 1 1 1 2 ...
##  $ Root            : num  0 0 0 0 0 0 0 1 1 0 ...
##  $ Leaf            : num  1 1 0 0 1 1 0 1 0 1 ...
##  $ Flower          : num  0 0 0 0 0 1 0 0 0 0 ...
##  $ Seed            : num  0 0 0 1 0 0 0 0 0 1 ...
##  $ Fruit           : num  1 0 1 0 1 1 1 0 0 0 ...

… and the following data on taxonomy:

crop.taxonomy <- crop.data[, c("Species", "Genus", "Family", "Order", "Supra-order")]
str(crop.taxonomy)
## 'data.frame':    60 obs. of  5 variables:
##  $ Species    : chr  "Solanum aethiopicum" "Gnetum africanum" "Dacryodes edulis" "Vigna subterranea" ...
##  $ Genus      : chr  "Solanum" "Gnetum" "Dacryodes" "Vigna" ...
##  $ Family     : chr  "Solanaceae" "Gnetaceae" "Burseraceae" "Fabaceae" ...
##  $ Order      : chr  "Solanales" "Gnetales" "Sapindales" "Fabales" ...
##  $ Supra-order: chr  "Dicot" "Gnetophyta" "Dicot" "Dicot" ...

4.2 Checking for taxonomic weights

Next the taxonomic weights need to be calculated. This can be done via the taxa2dist function.

taxdist.vegan <- vegan::taxa2dist(crop.taxonomy, varstep=TRUE, check=TRUE)
as.matrix(taxdist.vegan)[1:10, 1:5]
##           Solaaeth Gnetafri  Dacredul  Vignsubt  Adandigi
## Solaaeth   0.00000      100  72.40122  72.40122  72.40122
## Gnetafri 100.00000        0 100.00000 100.00000 100.00000
## Dacredul  72.40122      100   0.00000  72.40122  72.40122
## Vignsubt  72.40122      100  72.40122   0.00000  72.40122
## Adandigi  72.40122      100  72.40122  72.40122   0.00000
## Momochar  72.40122      100  72.40122  72.40122  72.40122
## Artoalti  72.40122      100  72.40122  72.40122  72.40122
## Xantsagi 100.00000      100 100.00000 100.00000 100.00000
## Ensevent 100.00000      100 100.00000 100.00000 100.00000
## Brascari  72.40122      100  72.40122  72.40122  72.40122
taxonomy.weights <- attributes(taxdist.vegan)$steps
taxonomy.weights
##        Base       Genus      Family       Order Supra-order 
##    3.449848   18.064656   16.025098   34.861617   27.598780

It is good practice to cross-check that the taxonomic distance can also be calculated via the weighted Gower distance.

taxdist.FD <- FD::gowdis(crop.taxonomy, w=taxonomy.weights)
as.matrix(100*taxdist.FD)[1:10, 1:5]
##           Solaaeth Gnetafri  Dacredul  Vignsubt  Adandigi
## Solaaeth   0.00000      100  72.40122  72.40122  72.40122
## Gnetafri 100.00000        0 100.00000 100.00000 100.00000
## Dacredul  72.40122      100   0.00000  72.40122  72.40122
## Vignsubt  72.40122      100  72.40122   0.00000  72.40122
## Adandigi  72.40122      100  72.40122  72.40122   0.00000
## Momochar  72.40122      100  72.40122  72.40122  72.40122
## Artoalti  72.40122      100  72.40122  72.40122  72.40122
## Xantsagi 100.00000      100 100.00000 100.00000 100.00000
## Ensevent 100.00000      100 100.00000 100.00000 100.00000
## Brascari  72.40122      100  72.40122  72.40122  72.40122
#### same distance obtained
all.equal(target=as.matrix(taxdist.vegan), current=100*as.matrix(taxdist.FD))
## [1] TRUE

4.3 Calculating weighted distances

Now that the weights are known for the taxonomic categories, these can now be combined with the traits in the calculation of the weighted Gower distance.

The five categories of functional traits (Ploidy level with 2 variables of Diploid and Polyploid, Mating strategy with 2 variables Self-fertilizing and Outcrossing, Longevity as categorical variable, Propagation as categorical variable and Part of plant used as food) were each given a total weight of 100, dividing equally among the different variables:

trait.weights <- c(50, 50, # Ploidy
                   50, 50, # Mating stragegy
                   100, # Longevity
                   100, # Propagation
                   20, 20, 20, 20, 20) # Part of plant used as food

All traits can now be combined, giving the same total weight (500) to each of functional traits and taxonomy.

weights.combined <- c(trait.weights, 5*taxonomy.weights)
sum(weights.combined)
## [1] 1000

Everything is now in place to calculate the combined distance based on ecological traits and taxonomy:

crop.trait_taxo <- cbind(crop.trait, crop.taxonomy)
TraitTaxon.dist <- FD::gowdis(crop.trait_taxo, w=weights.combined)
as.matrix(TraitTaxon.dist)[1:10, 1:5]
##           Solaaeth Gnetafri  Dacredul  Vignsubt  Adandigi
## Solaaeth 0.0000000     0.82 0.6820061 0.4220061 0.7620061
## Gnetafri 0.8200000     0.00 0.5400000 0.8400000 0.6200000
## Dacredul 0.6820061     0.54 0.0000000 0.7020061 0.4820061
## Vignsubt 0.4220061     0.84 0.7020061 0.0000000 0.8220061
## Adandigi 0.7620061     0.62 0.4820061 0.8220061 0.0000000
## Momochar 0.4820061     0.74 0.6020061 0.5420061 0.6820061
## Artoalti 0.7320061     0.59 0.4120061 0.7520061 0.4320061
## Xantsagi 0.8400000     0.52 0.5600000 0.8600000 0.6400000
## Ensevent 0.8600000     0.54 0.5400000 0.8400000 0.6600000
## Brascari 0.5520061     0.87 0.7720061 0.5320061 0.6520061

In a sensitivity analysis, double weight can be given to biological traits as follows:

weights.combined2to1 <- c(2*trait.weights, 5*taxonomy.weights)
sum(weights.combined2to1)
## [1] 1500

This method for 2:1 weighting of biological:taxonomy gives the following result

crop.trait_taxo <- cbind(crop.trait, crop.taxonomy)
TraitTaxon.dist <- FD::gowdis(crop.trait_taxo, w=weights.combined2to1)
as.matrix(TraitTaxon.dist)[1:10, 1:5]
##           Solaaeth  Gnetafri  Dacredul  Vignsubt  Adandigi
## Solaaeth 0.0000000 0.7600000 0.6680041 0.3213374 0.7746707
## Gnetafri 0.7600000 0.0000000 0.3866667 0.7866667 0.4933333
## Dacredul 0.6680041 0.3866667 0.0000000 0.6946707 0.4013374
## Vignsubt 0.3213374 0.7866667 0.6946707 0.0000000 0.8546707
## Adandigi 0.7746707 0.4933333 0.4013374 0.8546707 0.0000000
## Momochar 0.4013374 0.6533333 0.5613374 0.4813374 0.6680041
## Artoalti 0.7346707 0.4533333 0.3080041 0.7613374 0.3346707
## Xantsagi 0.7866667 0.3600000 0.4133333 0.8133333 0.5200000
## Ensevent 0.8133333 0.3866667 0.3866667 0.7866667 0.5466667
## Brascari 0.4946707 0.8266667 0.7880041 0.4680041 0.6280041

5 Session Information

sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18363)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United Kingdom.1252 
## [2] LC_CTYPE=English_United Kingdom.1252   
## [3] LC_MONETARY=English_United Kingdom.1252
## [4] LC_NUMERIC=C                           
## [5] LC_TIME=English_United Kingdom.1252    
## 
## attached base packages:
## [1] tcltk     stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] readxl_1.3.1         FD_1.0-12            geometry_0.4.5      
##  [4] ape_5.4              ade4_1.7-16          BiodiversityR_2.12-2
##  [7] rgl_0.100.54         vegan3d_1.1-2        vegan_2.5-6         
## [10] lattice_0.20-41      permute_0.9-5       
## 
## loaded via a namespace (and not attached):
##   [1] minqa_1.2.4             colorspace_1.4-1        ellipsis_0.3.1         
##   [4] class_7.3-17            rio_0.5.16              htmlTable_2.0.0        
##   [7] base64enc_0.1-3         rstudioapi_0.11         splines_4.0.2          
##  [10] knitr_1.28              effects_4.1-4           Formula_1.2-3          
##  [13] jsonlite_1.6.1          nloptr_1.2.2.1          cluster_2.1.0          
##  [16] Rcmdr_2.6-2             png_0.1-7               shiny_1.4.0.2          
##  [19] compiler_4.0.2          backports_1.1.7         Matrix_1.2-18          
##  [22] fastmap_1.0.1           survey_4.0              later_1.1.0.1          
##  [25] tcltk2_1.2-11           acepack_1.4.1           htmltools_0.5.0        
##  [28] prettyunits_1.1.1       tools_4.0.2             gtable_0.3.0           
##  [31] glue_1.4.1              dplyr_1.0.2             Rcpp_1.0.4.6           
##  [34] carData_3.0-4           cellranger_1.1.0        vctrs_0.3.4            
##  [37] nlme_3.1-148            crosstalk_1.1.0.1       xfun_0.15              
##  [40] stringr_1.4.0           openxlsx_4.1.5          lme4_1.1-23            
##  [43] mime_0.9                miniUI_0.1.1.1          lifecycle_0.2.0        
##  [46] RcmdrMisc_2.7-0         statmod_1.4.34          MASS_7.3-51.6          
##  [49] zoo_1.8-8               scales_1.1.1            hms_0.5.3              
##  [52] promises_1.1.1          parallel_4.0.2          sandwich_2.5-1         
##  [55] RColorBrewer_1.1-2      yaml_2.2.1              curl_4.3               
##  [58] gridExtra_2.3           ggplot2_3.3.2           rpart_4.1-15           
##  [61] latticeExtra_0.6-29     stringi_1.4.6           nortest_1.0-4          
##  [64] e1071_1.7-3             checkmate_2.0.0         boot_1.3-25            
##  [67] zip_2.0.4               manipulateWidget_0.10.1 rlang_0.4.8            
##  [70] pkgconfig_2.0.3         evaluate_0.14           purrr_0.3.4            
##  [73] htmlwidgets_1.5.1       tidyselect_1.1.0        magrittr_1.5           
##  [76] R6_2.4.1                generics_0.1.0          Hmisc_4.4-0            
##  [79] DBI_1.1.0               pillar_1.4.4            haven_2.3.1            
##  [82] foreign_0.8-80          mgcv_1.8-31             survival_3.1-12        
##  [85] scatterplot3d_0.3-41    abind_1.4-5             nnet_7.3-14            
##  [88] tibble_3.0.1            crayon_1.3.4            car_3.0-8              
##  [91] relimp_1.0-5            rmarkdown_2.3           jpeg_0.1-8.1           
##  [94] progress_1.2.2          grid_4.0.2              data.table_1.12.8      
##  [97] forcats_0.5.0           digest_0.6.25           webshot_0.5.2          
## [100] xtable_1.8-4            httpuv_1.5.4            munsell_0.5.0          
## [103] magic_1.5-9             mitools_2.4