How to calculate Distinct Genetic Diversity?

Distinct Genetic Diversity is one of several options to identify a Key Biodiversity Area (KBA). If you want to include genetic information in the process of determining which area has a higher priority for conservation effort, we recommend using Average Taxonomic Distinctiveness/AvTD/Delta+ on SNP or microsatellite data.

This guide is intended to help you understand how to combine Average Taxonomic Distinctiveness/AvTD/Delta+ from the community ecology package vegan with your genetic data.

install packages

In addition to the vegan package, you also need the adegenet package to simplify the handling of genetic data. If you have not already installed the two packages, install them and load them into your current R session.

if (!requireNamespace("vegan", quietly = TRUE)) {      
  install.packages("vegan")
}
if (!requireNamespace("adegenet", quietly = TRUE)) {   
  install.packages("adegenet")
}
library("vegan")
library("adegenet")

prepare your data: create a genind!

This manual just shows one of several ways how to create a genind object. Use ?import2genind() to receive more information. You can import data to genind not only from a data frame, but an aligmnent of sequences or several data files like GENETIX files (.gtx), Genepop files (.gen), Fstat files (.dat), STRUCTURE files (.str or .stru) and not only microsatellite data sets but also SNP and AFLP data sets.

In this example we import microsatellite data from a data frame. The first column was not only called “pop” like population, but the R example dataset nancycats also contains information about populations. Your dataset should contain area names instead. The row names are named like the individuals tested and all other columns after the first column are names of loci.

To import from a data frame into genind, a vector with area names is needed and a table with genetic information of each individual without area names is required. Use ncode to specify how many letters belong to an allele.

# load microsatellite data set as a dataframe
data(nancycats)
df <- genind2df(nancycats)
# view dataframe
head(df)

##      pop   fca8  fca23  fca43  fca45  fca77  fca78  fca90  fca96  fca37
## N215 P01   <NA> 136146 139139 116120 156156 142148 199199 113113 208208
## N216 P01   <NA> 146146 139145 120126 156156 142148 185199 113113 208208
## N217 P01 135143 136146 141141 116116 152156 142142 197197 113113 210210
## N218 P01 133135 138138 139141 116126 150150 142148 199199 091105 208208
## N219 P01 133135 140146 141145 126126 152152 142148 193199 113113 208208
## N220 P01 135143 136146 145149 120126 150156 148148 193195 091113 208208

# area names -let's pretend they would be area names
pop <- as.vector(df$pop)
# view area names
pop

##   [1] "P01" "P01" "P01" "P01" "P01" "P01" "P01" "P01" "P01" "P01" "P02" "P02"
##  [13] "P02" "P02" "P02" "P02" "P02" "P02" "P02" "P02" "P02" "P02" "P02" "P02"
##  [25] "P02" "P02" "P02" "P02" "P02" "P02" "P02" "P02" "P03" "P03" "P03" "P03"
##  [37] "P03" "P03" "P03" "P03" "P03" "P03" "P03" "P03" "P04" "P04" "P04" "P04"
##  [49] "P04" "P04" "P04" "P04" "P04" "P04" "P04" "P04" "P04" "P04" "P04" "P04"
##  [61] "P04" "P04" "P04" "P04" "P04" "P04" "P04" "P05" "P05" "P05" "P05" "P05"
##  [73] "P05" "P05" "P05" "P05" "P05" "P05" "P05" "P05" "P05" "P05" "P06" "P06"
##  [85] "P06" "P06" "P06" "P06" "P06" "P06" "P06" "P06" "P06" "P07" "P07" "P07"
##  [97] "P07" "P07" "P07" "P07" "P07" "P07" "P07" "P07" "P07" "P07" "P07" "P08"
## [109] "P08" "P08" "P08" "P08" "P08" "P08" "P08" "P08" "P08" "P09" "P09" "P09"
## [121] "P09" "P09" "P09" "P09" "P09" "P09" "P10" "P10" "P10" "P10" "P10" "P10"
## [133] "P10" "P10" "P10" "P10" "P10" "P11" "P11" "P11" "P11" "P11" "P11" "P11"
## [145] "P11" "P11" "P11" "P11" "P11" "P11" "P11" "P11" "P11" "P11" "P11" "P11"
## [157] "P11" "P12" "P12" "P12" "P12" "P12" "P12" "P12" "P13" "P13" "P13" "P13"
## [169] "P13" "P13" "P13" "P13" "P13" "P13" "P13" "P13" "P13" "P14" "P14" "P14"
## [181] "P14" "P14" "P14" "P14" "P14" "P14" "P14" "P14" "P14" "P14" "P14" "P14"
## [193] "P14" "P14" "P15" "P15" "P15" "P15" "P15" "P15" "P15" "P15" "P15" "P15"
## [205] "P15" "P16" "P16" "P16" "P16" "P16" "P16" "P16" "P16" "P16" "P16" "P16"
## [217] "P16" "P12" "P12" "P12" "P12" "P12" "P12" "P12" "P17" "P17" "P17" "P17"
## [229] "P17" "P17" "P17" "P17" "P17" "P17" "P17" "P17" "P17"

# loci and individuals
head(df[2:ncol(df)])

##        fca8  fca23  fca43  fca45  fca77  fca78  fca90  fca96  fca37
## N215   <NA> 136146 139139 116120 156156 142148 199199 113113 208208
## N216   <NA> 146146 139145 120126 156156 142148 185199 113113 208208
## N217 135143 136146 141141 116116 152156 142142 197197 113113 210210
## N218 133135 138138 139141 116126 150150 142148 199199 091105 208208
## N219 133135 140146 141145 126126 152152 142148 193199 113113 208208
## N220 135143 136146 145149 120126 150156 148148 193195 091113 208208

# create genind
nancycats <- df2genind(df[2:ncol(df)], pop = pop, ncode = 3)  
# this is how a genind is structured:
nancycats

## /// GENIND OBJECT /////////
## 
##  // 237 individuals; 9 loci; 108 alleles; size: 140.2 Kb
## 
##  // Basic content
##    @tab:  237 x 108 matrix of allele counts
##    @loc.n.all: number of alleles per locus (range: 8-18)
##    @loc.fac: locus factor for the 108 columns of @tab
##    @all.names: list of allele names for each locus
##    @ploidy: ploidy of each individual  (range: 2-2)
##    @type:  codom
##    @call: df2genind(X = df[2:ncol(df)], ncode = 3, pop = pop)
## 
##  // Optional content
##    @pop: population of each individual (group size range: 9-23)

calculate allele abundances

The next step is to create a genpop object, as this is a quick and easy way to get the allele abundances for each area.

# create genpop
catpop <- genind2genpop(nancycats)

## 
##  Converting data from a genind to a genpop object... 
## 
## ...done.

# this is how a genpop is structured:
catpop

## /// GENPOP OBJECT /////////
## 
##  // 17 populations; 9 loci; 108 alleles; size: 28.9 Kb
## 
##  // Basic content
##    @tab:  17 x 108 matrix of allele counts
##    @loc.n.all: number of alleles per locus (range: 8-18)
##    @loc.fac: locus factor for the 108 columns of @tab
##    @all.names: list of allele names for each locus
##    @ploidy: ploidy of each individual  (range: 2-2)
##    @type:  codom
##    @call: genind2genpop(x = nancycats)
## 
##  // Optional content
##    - empty -

# view your allele abundandes
head(catpop@tab[, 1:3])

##     fca8.135 fca8.143 fca8.133
## P01        9        4        2
## P02       14        0        8
## P03        1        1        0
## P04        7        1        1
## P05        7        6        0
## P06        2        0        0

# rename! These are allele abundances!
allele_abundances <- catpop@tab

calculate genetic distance between areas

The best option to calculate genetic distances is to use the function taxa2dist() from the package vegan.

# switch columns and rows
t_allele_abundances <- t(allele_abundances)

# calculate genetic distances from a classification table with variable step lengths
taxdis <- taxa2dist(t_allele_abundances, varstep=TRUE)

calculate Distinct Genetic Diversity

We calculate Distinct Genetic Diversity by calculating Average Taxonomic Distinctiveness/AvTD/Delta+.

# Calculate Average Taxonomic Distinctiveness/AvTD/Delta+
mod <- taxondive(allele_abundances, taxdis)
# view results
mod$Dplus

##      P01      P02      P03      P04      P05      P06      P07      P08 
## 93.22286 89.29504 89.94338 86.55443 89.66798 89.03269 92.15152 88.07585 
##      P09      P10      P11      P12      P13      P14      P15      P16 
## 90.92341 90.14573 86.63087 89.23613 89.10459 88.08156 90.88370 92.70398 
##      P17 
## 91.74986

Apply KBA criteria

To apply KBA criteria we want to know how much each area contributes to the total Distinct Genetic Diversity. Depending on how endangered the species you have examined is, a different threshold can be applied. Keep in mind that the example dataset contained additional information about populations not areas. Also, the example data set is from the stray cat (Felis catus L.), which is neither endangered nor geographical restricted. Therefore, the evaluation below only serves to demonstrate the method.

# calculate ratios
sumDplus <- sum(mod$Dplus)
ratios <- mod$Dplus/sumDplus * 100
ratios # in %

##      P01      P02      P03      P04      P05      P06      P07      P08 
## 6.103355 5.846198 5.888645 5.666769 5.870615 5.829022 6.033213 5.766377 
##      P09      P10      P11      P12      P13      P14      P15      P16 
## 5.952809 5.901893 5.671773 5.842341 5.833729 5.766751 5.950208 6.069383 
##      P17 
## 6.006916

# KBA Criteria
###############################
###############################

# A1:   Threatened species
###############################
# A1a:  Critically endangered/ Endangered species 
#       Threshold: >= 0.5%
A1a <- ratios[ratios >= 0.5]
# Show the areas that are proposed as KBA(s) for criterion A1a.
names(A1a) # In this example all areas would qualify as KBAs.

##  [1] "P01" "P02" "P03" "P04" "P05" "P06" "P07" "P08" "P09" "P10" "P11" "P12"
## [13] "P13" "P14" "P15" "P16" "P17"

# A1b:  Vulnerable species
#       Threshold: >= 1%
A1b <- ratios[ratios >= 1]
# Show the areas that are proposed as KBA(s) for criterion A1b.
names(A1b) # In this example all areas would qualify as KBAs.

##  [1] "P01" "P02" "P03" "P04" "P05" "P06" "P07" "P08" "P09" "P10" "P11" "P12"
## [13] "P13" "P14" "P15" "P16" "P17"

# A1c:  Critically endangered/ Endangered species due only to past/current decline [Red List A only, but not A3 only]
#       Threshold: >= 0.1%
A1c <- ratios[ratios >= 0.1]
# Show the areas that are proposed as KBA(s) for criterion A1c.
names(A1c) # In this example all areas would qualify as KBAs.

##  [1] "P01" "P02" "P03" "P04" "P05" "P06" "P07" "P08" "P09" "P10" "P11" "P12"
## [13] "P13" "P14" "P15" "P16" "P17"

# A1d:  Vulnerable species due only to past/current decline [Red List A only, but not A3 only]
#       Threshold: >= 0.2%
A1d <- ratios[ratios >= 0.2]
# Show the areas that are proposed as KBA(s) for criterion A1d.
names(A1d) # In this example all areas would qualify as KBAs.

##  [1] "P01" "P02" "P03" "P04" "P05" "P06" "P07" "P08" "P09" "P10" "P11" "P12"
## [13] "P13" "P14" "P15" "P16" "P17"

# B1:   Individual geographically restricted species
###############################
#       Threshold >= 10%
B1 <- ratios[ratios >= 10]
# Show the areas that are proposed as KBA(s) for criterion B1.
names(B1) # In this example none of the areas would qualify as KBAs.

## character(0)