Key Biodiversity Areas (KBAs) are sites that contribute significantly to the global persistence of biodiversity. Distinct genetic diversity has been introduced as one of the metrics for identifying KBAs for threatened and range-restricted species (IUCN 2016). We recommend calculating distinct genetic diversity from SNP or microsatellite data with the average taxonomic distinctness index Δ⁺ (Clarke & Warwick 1998). The following steps show how to apply Δ⁺ to calculate distinct genetic diversity and identify KBAs.
First, install and load the packages adegenet, poppr and vegan into your current R session (Jombart 2008, Jombart & Ahmed 2011, Kamvar et al. 2014, Oksanen et al. 2024). adegenet handles the genetic data sets, poppr calculates the genetic distances, and vegan calculates Δ⁺.
# Install packages only if they are not installed yet.
# We suppressed messages on the installation and warnings about R versions.
if (!requireNamespace("vegan", quietly = TRUE)) {
  install.packages("vegan")
}
if (!requireNamespace("adegenet", quietly = TRUE)) {
  install.packages("adegenet")
}
if (!requireNamespace("poppr", quietly = TRUE)) {
  install.packages("poppr")
}
suppressWarnings(library("vegan"))
suppressWarnings(library("adegenet"))
suppressWarnings(library("poppr"))
Genetic data sets are diverse. Besides the difference between SNP and microsatellite data, there are many different file formats (e.g. GENETIX files (.gtx), Genepop files (.gen), Fstat files (.dat) and STRUCTURE files (.str or .stru)). Run ?import2genind for more information on converting these file formats into a genind object.
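If the genotypes are stored in a table rather than in one of these file formats, adegenet's df2genind() can build a genind directly. The sketch below uses hypothetical genotypes of four diploid individuals at two microsatellite loci and hypothetical site names standing in for potential KBAs; only df2genind(), nInd(), nLoc(), tab() and pop() are adegenet functions.

```r
library(adegenet)

# Hypothetical genotypes: one column per locus, the two alleles separated by "/"
genotypes <- data.frame(
  locA = c("101/103", "101/101", "103/105", "105/105"),
  locB = c("200/202", "202/202", "200/200", "200/204")
)

# Convert to a genind; 'pop' holds the (hypothetical) potential KBA of each individual
toy_genind <- df2genind(
  genotypes,
  sep = "/",
  ind.names = paste0("ind", 1:4),
  pop = factor(c("siteA", "siteA", "siteB", "siteB")),
  ploidy = 2
)

toy_genind
pop(toy_genind)
```

Storing the site names in pop at this stage means the later splitting step can use them directly.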
The genind used here, nancycats, is included in the adegenet package (Devillard et al. 2009). The data set comes from stray cats (Felis catus L.), a species that is neither threatened nor geographically restricted. It therefore cannot be used to identify real KBAs, and it contains names of populations rather than names of potential KBAs. Nevertheless, it can illustrate the calculation steps for identifying KBAs with distinct genetic diversity if we treat each population as a potential KBA.
# This is our example data set
# It is a microsatellite data set.
data(nancycats)
# A genind includes basic and optional content.
nancycats
## /// GENIND OBJECT /////////
##
## // 237 individuals; 9 loci; 108 alleles; size: 150.5 Kb
##
## // Basic content
## @tab: 237 x 108 matrix of allele counts
## @loc.n.all: number of alleles per locus (range: 8-18)
## @loc.fac: locus factor for the 108 columns of @tab
## @all.names: list of allele names for each locus
## @ploidy: ploidy of each individual (range: 2-2)
## @type: codom
## @call: genind(tab = truenames(nancycats)$tab, pop = truenames(nancycats)$pop)
##
## // Optional content
## @pop: population of each individual (group size range: 9-23)
## @other: a list containing: xy
# The names of the potential KBAs must be stored in the optional @pop slot.
nancycats@pop
## [1] P01 P01 P01 P01 P01 P01 P01 P01 P01 P01 P02 P02 P02 P02 P02 P02 P02 P02
## [19] P02 P02 P02 P02 P02 P02 P02 P02 P02 P02 P02 P02 P02 P02 P03 P03 P03 P03
## [37] P03 P03 P03 P03 P03 P03 P03 P03 P04 P04 P04 P04 P04 P04 P04 P04 P04 P04
## [55] P04 P04 P04 P04 P04 P04 P04 P04 P04 P04 P04 P04 P04 P05 P05 P05 P05 P05
## [73] P05 P05 P05 P05 P05 P05 P05 P05 P05 P05 P06 P06 P06 P06 P06 P06 P06 P06
## [91] P06 P06 P06 P07 P07 P07 P07 P07 P07 P07 P07 P07 P07 P07 P07 P07 P07 P08
## [109] P08 P08 P08 P08 P08 P08 P08 P08 P08 P09 P09 P09 P09 P09 P09 P09 P09 P09
## [127] P10 P10 P10 P10 P10 P10 P10 P10 P10 P10 P10 P11 P11 P11 P11 P11 P11 P11
## [145] P11 P11 P11 P11 P11 P11 P11 P11 P11 P11 P11 P11 P11 P12 P12 P12 P12 P12
## [163] P12 P12 P13 P13 P13 P13 P13 P13 P13 P13 P13 P13 P13 P13 P13 P14 P14 P14
## [181] P14 P14 P14 P14 P14 P14 P14 P14 P14 P14 P14 P14 P14 P14 P15 P15 P15 P15
## [199] P15 P15 P15 P15 P15 P15 P15 P16 P16 P16 P16 P16 P16 P16 P16 P16 P16 P16
## [217] P16 P12 P12 P12 P12 P12 P12 P12 P17 P17 P17 P17 P17 P17 P17 P17 P17 P17
## [235] P17 P17 P17
## 17 Levels: P01 P02 P03 P04 P05 P06 P07 P08 P09 P10 P11 P12 P13 P14 P15 ... P17
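Because Δ⁺ is computed separately from the individuals sampled in each area, it may help to check how many individuals each potential KBA contains before continuing. A minimal sketch using adegenet's pop() accessor and base R's table():

```r
library(adegenet)
data(nancycats)

# Number of sampled individuals per potential KBA (here: per population)
sample_sizes <- table(pop(nancycats))
sample_sizes
```

For nancycats this reproduces the group size range of 9 to 23 reported in the summary of the genind.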
Δ⁺ is calculated for each area separately. For this reason we need a genind for every potential KBA.
# Save the potential KBA (population) of every individual.
potential_KBAs <- nancycats@pop
# Split the genind into smaller geninds, saved in a list.
# Each small genind only contains the individuals sampled in one potential KBA.
genind_list <- lapply(unique(potential_KBAs), function(subset_by_potential_KBAs) {
  individuals <- which(potential_KBAs == subset_by_potential_KBAs)
  nancycats[individuals, ]
})
# Name each small genind after its potential KBA instead of a number.
names(genind_list) <- unique(potential_KBAs)
# This way it is easier to find the genind belonging to the potential KBA of interest.
# Here we are interested in the potential KBA "P11".
genind_list$P11
## /// GENIND OBJECT /////////
##
## // 20 individuals; 9 loci; 108 alleles; size: 33.9 Kb
##
## // Basic content
## @tab: 20 x 108 matrix of allele counts
## @loc.n.all: number of alleles per locus (range: 8-18)
## @loc.fac: locus factor for the 108 columns of @tab
## @all.names: list of allele names for each locus
## @ploidy: ploidy of each individual (range: 2-2)
## @type: codom
## @call: .local(x = x, i = i, j = j, drop = drop)
##
## // Optional content
## @pop: population of each individual (group size range: 20-20)
## @other: a list containing: xy
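adegenet also provides seppop(), which splits a genind by its @pop slot into a named list in a single call. A minimal sketch, assuming (as above) that the population names in @pop are the potential KBAs; the list elements are ordered by the levels of the pop factor.

```r
library(adegenet)
data(nancycats)

# seppop() returns one genind per level of @pop, named after that level
kba_list <- seppop(nancycats)
names(kba_list)
kba_list$P11
```

Both approaches should give one genind per potential KBA; seppop() simply saves writing the subsetting function by hand.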
Distinct genetic diversity needs to be calculated for each genind separately. We demonstrate the two calculation steps (genetic distances and Δ⁺) in detail for the potential KBA “P11” and then show how distinct genetic diversity can be calculated for all geninds.
Δ⁺ is based on the genetic distances between the individuals in each potential KBA. We recommend calculating genetic distances with the bitwise.dist() function of poppr, which is fast even for large SNP data sets.
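For complete diploid data, the distances below all appear to be multiples of 1/18 = 1/(2 alleles × 9 loci), which suggests they are the proportion of allele copies that differ between two individuals. The base-R sketch below illustrates that idea on a hypothetical allele-count matrix; mismatch_dist() is our own illustrative helper, and we have not checked that it reproduces bitwise.dist() in every case (e.g. with missing data).

```r
# Hypothetical allele counts for 3 diploid individuals at 2 loci
# (columns = alleles, grouped by locus; counts per locus sum to 2).
counts <- matrix(c(
  2, 0, 1, 1,   # ind1: locus 1 A/A, locus 2 C/D
  1, 1, 1, 1,   # ind2: locus 1 A/B, locus 2 C/D
  0, 2, 0, 2    # ind3: locus 1 B/B, locus 2 D/D
), nrow = 3, byrow = TRUE,
dimnames = list(c("ind1", "ind2", "ind3"), c("L1.A", "L1.B", "L2.C", "L2.D")))

mismatch_dist <- function(counts, n_loci, ploidy = 2) {
  # Half the summed absolute count difference = number of differing allele copies
  differing <- dist(counts, method = "manhattan") / 2
  # Divide by the number of allele copies compared (ploidy x loci)
  differing / (ploidy * n_loci)
}

mismatch_dist(counts, n_loci = 2)
```

Here ind1 and ind2 differ in one of four allele copies (distance 0.25), and ind1 and ind3 in three of four (distance 0.75).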
# calculate genetic distances
dist_matrix <- bitwise.dist(genind_list$P11)
dist_matrix
## N125 N126 N127 N128 N129 N130 N131
## N126 0.6666667
## N127 0.5000000 0.3888889
## N128 0.4444444 0.5000000 0.5000000
## N129 0.8333333 0.6666667 0.6666667 0.7777778
## N130 0.6111111 0.7222222 0.5555556 0.5000000 0.6111111
## N131 0.7777778 0.6111111 0.7222222 0.6666667 0.2222222 0.6111111
## N132 0.6666667 0.5000000 0.3888889 0.3888889 0.5555556 0.5000000 0.6111111
## N133 0.6111111 0.6666667 0.7222222 0.6666667 0.6111111 0.5555556 0.4444444
## N246 0.8333333 0.6666667 0.6666667 0.6111111 0.8333333 0.7222222 0.7777778
## N247 0.7777778 0.8333333 0.7777778 0.6666667 0.8333333 0.6666667 0.7222222
## N271 0.7222222 0.6666667 0.6666667 0.6666667 0.8333333 0.7222222 0.7222222
## N298 0.7777778 0.8333333 0.8333333 0.7777778 0.6111111 0.6666667 0.6666667
## N299 0.7222222 0.7777778 0.8333333 0.7777778 0.7777778 0.6666667 0.7777778
## N300 0.7222222 0.8333333 0.8888889 0.8333333 0.7777778 0.7777778 0.7777778
## N301 0.7777778 0.7777778 0.7777778 0.7777778 0.6666667 0.6111111 0.6666667
## N302 0.7222222 0.7222222 0.7777778 0.7777778 0.6111111 0.7222222 0.6111111
## N303 0.6111111 0.6666667 0.6666667 0.6111111 0.6111111 0.6111111 0.5000000
## N304 0.6111111 0.5000000 0.4444444 0.5000000 0.3888889 0.5555556 0.5000000
## N310 0.4444444 0.5000000 0.5000000 0.4444444 0.5000000 0.5000000 0.4444444
## N132 N133 N246 N247 N271 N298 N299
## N126
## N127
## N128
## N129
## N130
## N131
## N132
## N133 0.6666667
## N246 0.6111111 0.7222222
## N247 0.7222222 0.7222222 0.6666667
## N271 0.6666667 0.7777778 0.6666667 0.7222222
## N298 0.8333333 0.6666667 0.6111111 0.5555556 0.7777778
## N299 0.8333333 0.6666667 0.6111111 0.6111111 0.6111111 0.3333333
## N300 0.8888889 0.7222222 0.6666667 0.6666667 0.5000000 0.5000000 0.3333333
## N301 0.8333333 0.6666667 0.6111111 0.6111111 0.6666667 0.3333333 0.3333333
## N302 0.7777778 0.6666667 0.6111111 0.6666667 0.5000000 0.3888889 0.3333333
## N303 0.7222222 0.6111111 0.6666667 0.6666667 0.7222222 0.5555556 0.7222222
## N304 0.5000000 0.6666667 0.5555556 0.6666667 0.5000000 0.5555556 0.6666667
## N310 0.5000000 0.5000000 0.3888889 0.5555556 0.3888889 0.3888889 0.5000000
## N300 N301 N302 N303 N304
## N126
## N127
## N128
## N129
## N130
## N131
## N132
## N133
## N246
## N247
## N271
## N298
## N299
## N300
## N301 0.4444444
## N302 0.4444444 0.4444444
## N303 0.7777778 0.6666667 0.6666667
## N304 0.6111111 0.5555556 0.5555556 0.5000000
## N310 0.5000000 0.5000000 0.4444444 0.5000000 0.4444444
Δ⁺ is calculated with the taxondive() function of vegan. Here, taxondive() treats the individuals as “species” and the alleles as “sites”: it returns Δ⁺ for each allele (Dplus, the average distance between the individuals carrying that allele) and the expected Δ⁺ over all individuals in the potential KBA. This average over all individuals is the value we use and can be found as EDplus (expected Delta plus) in the results.
# To calculate Δ⁺ we need the allele frequencies in addition to the distance matrix.
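For reference, Clarke & Warwick (1998) define Δ⁺ for s units (here: individuals) with pairwise distances ω_ij (here: the entries of dist_matrix; the symbols are our notation) as the average distance over all pairs. Its expected value for a random subset drawn from a pool equals the mean of all pairwise distances in that pool.

```latex
\Delta^{+} = \frac{\sum\sum_{i<j} \omega_{ij}}{s(s-1)/2}
```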
allele_frequencies <- tab(genind_list$P11, freq = TRUE)
# Display only the first 3 individuals and first 3 alleles
allele_frequencies[1:3, 1:3]
## fca8.117 fca8.119 fca8.121
## N125 0 0 0
## N126 0 0 0
## N127 0 0 0
# calculate Δ⁺
mod <- taxondive(t(allele_frequencies), dist_matrix)
## Warning in sqrt(vardplus): NaNs produced
# The warning arises when computing the standard deviation of Δ⁺ (sd.Dplus), not EDplus.
mod$EDplus
## [1] 0.628655
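Since Δ⁺ is the average pairwise distance among the individuals considered, it can also be sketched in base R. The toy distance matrix below is hypothetical, and average_distinctness() is our own illustrative helper, not a vegan function.

```r
# Hypothetical distance matrix for four individuals
toy_dist <- as.dist(matrix(c(
  0.0, 0.2, 0.6, 0.4,
  0.2, 0.0, 0.5, 0.3,
  0.6, 0.5, 0.0, 0.7,
  0.4, 0.3, 0.7, 0.0
), nrow = 4, dimnames = list(paste0("ind", 1:4), paste0("ind", 1:4))))

average_distinctness <- function(d) {
  d <- as.matrix(d)
  s <- nrow(d)
  # Sum over all pairs i < j, divided by the number of pairs s(s - 1)/2
  sum(d[upper.tri(d)]) / (s * (s - 1) / 2)
}

average_distinctness(toy_dist)
mean(toy_dist)  # the same value: the mean over all pairwise distances
```

Based on Clarke & Warwick (1998), we would expect mean(dist_matrix) for P11 to match mod$EDplus, but we suggest verifying this on your own data.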
To calculate distinct genetic diversity for each potential KBA, we created a function which includes both calculation steps and returns Δ⁺.
# create a function to calculate distinct genetic diversity
calculate_EDplus <- function(genind) {
  allele_frequencies <- tab(genind, freq = TRUE)
  dist_matrix <- bitwise.dist(genind)
  mod <- taxondive(t(allele_frequencies), dist_matrix)
  return(mod$EDplus)
}
# apply the function to every genind in our list
EDplus_values <- lapply(genind_list, calculate_EDplus)
## Warning in sqrt(vardplus): NaNs produced
## Warning in sqrt(vardplus): NaNs produced
## Warning in sqrt(vardplus): NaNs produced
## Warning in sqrt(vardplus): NaNs produced
## Warning in sqrt(vardplus): NaNs produced
## Warning in sqrt(vardplus): NaNs produced
## Warning in sqrt(vardplus): NaNs produced
## Warning in sqrt(vardplus): NaNs produced
## Warning in sqrt(vardplus): NaNs produced
## Warning in sqrt(vardplus): NaNs produced
## Warning in sqrt(vardplus): NaNs produced
## Warning in sqrt(vardplus): NaNs produced
## Warning in sqrt(vardplus): NaNs produced
# Δ⁺ for each potential KBA can be displayed, e.g.
EDplus_values$P11
## [1] 0.628655
EDplus_values$P13
## [1] 0.5519943
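A variant of calculate_EDplus() can return a named numeric vector instead of a list by using vapply(), and can silence the sd-related warning. The sketch below is self-contained (it splits nancycats with seppop()); calculate_EDplus_quiet() is a hypothetical name. Note that suppressWarnings() also hides any other warning, so we would only use it after inspecting the warnings once.

```r
library(adegenet)
library(poppr)
library(vegan)
data(nancycats)

# Same two steps as calculate_EDplus(), with the warning from sqrt(vardplus) silenced
calculate_EDplus_quiet <- function(genind) {
  allele_frequencies <- tab(genind, freq = TRUE)
  dist_matrix <- bitwise.dist(genind)
  mod <- suppressWarnings(taxondive(t(allele_frequencies), dist_matrix))
  mod$EDplus
}

# vapply() returns one Δ⁺ per potential KBA, named after the list elements
EDplus_vector <- vapply(seppop(nancycats), calculate_EDplus_quiet, numeric(1))
round(EDplus_vector, 4)
```

With a named vector, the later summing and percentage steps no longer need unlist().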
To identify a KBA, the species needs to be range-restricted (criterion B1) or threatened (criterion A1). Depending on the criterion, different thresholds apply to the percentage that an area contributes to the total distinct genetic diversity.
# total distinct genetic diversity:
sumDplus <- sum(unlist(EDplus_values))
# Calculate the percentage that each area contributes to the total distinct genetic diversity.
ratios <- unlist(EDplus_values)/sumDplus * 100
ratios # in %
## P01 P02 P03 P04 P05 P06 P07 P08
## 5.253463 6.212314 6.408845 6.737822 5.610256 6.764519 5.396505 6.877485
## P09 P10 P11 P12 P13 P14 P15 P16
## 5.382647 6.019617 6.264915 5.645949 5.500947 6.256988 5.717629 5.804870
## P17
## 4.145230
# For a range-restricted species (criterion B1), the threshold would be ≥10%.
# None of our percentages reaches 10%, so none of the areas would qualify as a KBA.
# For a vulnerable species (threshold ≥1%), and even more so for an endangered or
# critically endangered species (threshold ≥0.5%), all areas would be identified as KBAs.
VU <- ratios[ratios >= 1]
names(VU)
## [1] "P01" "P02" "P03" "P04" "P05" "P06" "P07" "P08" "P09" "P10" "P11" "P12"
## [13] "P13" "P14" "P15" "P16" "P17"
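The threshold comparison can be wrapped into small helpers that list the qualifying areas for several thresholds at once. A minimal sketch on hypothetical Δ⁺ values for three areas; kba_shares(), kba_thresholds and qualifying_areas() are illustrative names, and the ≥0.5% threshold for endangered or critically endangered species follows the KBA Standard (IUCN 2016).

```r
# Percentage that each area contributes to the summed Δ⁺ (as in the steps above)
kba_shares <- function(EDplus) EDplus / sum(EDplus) * 100

# Thresholds in %: range-restricted (B1), vulnerable and endangered/critically endangered (A1)
kba_thresholds <- c(B1_range_restricted = 10, A1_VU = 1, A1_CR_EN = 0.5)

# Names of the areas meeting each threshold
qualifying_areas <- function(shares, thresholds = kba_thresholds) {
  lapply(thresholds, function(threshold) names(shares)[shares >= threshold])
}

# Hypothetical Δ⁺ values for three areas
toy_EDplus <- c(A = 0.6, B = 0.39, C = 0.006)
toy_shares <- kba_shares(toy_EDplus)
round(toy_shares, 2)
qualifying_areas(toy_shares)
```

In this toy example, area C contributes about 0.6%, so it would only qualify for an endangered or critically endangered species.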
Clarke KR, Warwick RM (1998) A taxonomic distinctness index and its statistical properties. Journal of Applied Ecology, 35, 523–531.
Devillard S, Jombart T, Pontier D (2009) Revealing cryptic genetic structuring in an urban population of stray cats (Felis silvestris catus). Mammalian Biology, 74, 59–71. DOI: 10.1016/j.mambio.2008.01.001. Data from: Jombart T (2008) adegenet: a R package for the multivariate analysis of genetic markers. Bioinformatics. nancycats [Dataset].
IUCN (2016) A Global Standard for the Identification of Key Biodiversity Areas. IUCN, Gland, Switzerland.
Jombart T (2008) adegenet: a R package for the multivariate analysis of genetic markers. Bioinformatics, 1403–1405.
Jombart T, Ahmed I (2011) adegenet: new tools for the analysis of genome-wide SNP data. Bioinformatics.
Kamvar ZN, Tabima JF, Grünwald NJ (2014) Poppr: an R package for genetic analysis of populations with clonal, partially clonal, and/or sexual reproduction. PeerJ, 2, e281.
Oksanen J, Simpson G, Blanchet F et al. (2024) vegan: Community Ecology Package. R package.