Introduction

As populations of species interact with one another, they form biological communites. Community ecology studies the organisation and functioning of these communities. During this session we will focus on one aspect of community ecology, diversity. Understanding the distribution of diversity across ecosystems, and the environmental factors influencing diversity and community composition, are critical components of ecology. Conceptually, the diversity of ecosystems can be broken down into three spatial scales (Whittaker, 1972):


Alpha diversity
The diversity of a local community (a single site or sample).

Gamma diversity
The total diversity of the ecosystem (considering all sites or samples).

Beta diversity
The diversity between local communities. Classically defined as the relationship between gamma diversity and alpha diversity, and in its simplest iteration can be calculated as gamma diversity divided by alpha diversity (though beta diversity is commonly explored with more complex metrics).


beta1

beta2

Fig. 1. The relationship between alpha, beta, and gamma diversity under scenarios of minimum and maximum differentiation. In the minimum differentiation scenario, each local community harbours five species (alpha diversity = 5), a total of five species are present across the whole ecosystem (gamma diversity = 5), and there is no differentiation between local communities (beta diversity = 1). In the maximum differentiation scenario each local community harbours five species, but all local communities harbour a distinct cohort of species therefore a total of fifteen species are present in the ecosystem (gamma diversity = 15) and there is full differentiation between each local community (beta diversity = 3).


Here we will explore diversity across multiple scales using a case study from Plymouth Sound to familiarise you with key concepts in community ecology. We will explore classic measures of alpha and beta diversity, and use statistical analyses to test how environmental conditions such as heavy-metal contamination and nitrate enrichment affect diversity. This will equip you to perform your own diversity analysis on bacterial community data from the Plymouth Sound.



map
Fig. 2. Sampling locations in the Plymouth Sound. WM: West Mud, MS: Mallard Shoal, SL: Sutton Lock, JC: Jennycliff Bay, IB: Inner Breakwater, CB: Cawsand Bay.


Load required packages

library(vegan)
library(ggplot2)
library(viridis)
library(cowplot)
library(plyr)


Upload raw data

meio.data <- read.csv("meiofauna_data.csv")
meio.data <- meio.data[, 2:107]
env.data <- read.csv("env_data_alt.csv")


Alpha diversity

Alpha diversity refers to the diversity of a local community (i.e. a single site or sample). The simplest form of alpha diversity is species richness (as represented in Fig. 1) which is the number of species observed in the local community. However, there are many different metrics which can be used to quantify alpha diversity in community ecology, that can broadly be broken down into three categories:
- measures of species richness
- measures of species evenness
- overall diversity metrics considering both richness and evenness.
Here we will explore three of the most common.

Richness
Richness is simply the number of species at a specific site. Despite the simplicity of richness, multiple measures exist, ranging from the number of species observed at the site through to estimates of true richness based on models (e.g. Chao1 richness).

Shannon’s Index H’
Shannon’s H’ is the most frequently used metric to quantify overall diversity of a site. Shannon’s H’ takes into account both the total number of species observed at a site, and how evenly distributed species abundance is at each site. Shannon’s H’ is a measure of how difficult it is to predict the identity of a randomly chosen indvidual from the community, and is calculated as follows: \[\begin{align} H'=-\sum^S_{i = 1}p_i~ln~p_i \end{align}\] Where \(S\) is species richness, \(P_i\) is the proportion of the community made up on species \(i\), and \(ln\) is the natural log.

Pielou’s Evenness J’
Evenness is an important component of diversity, and reflects how evenly distributed abundances are between all species present at a site. A common measure of eveness is Pielou’s Evenness. Pielou’s Evenness can be easily derived from Shannon’s H’ as follows:
\[J' = \frac{H'}{H'_{max}} \] Where \(H'_{max}\) is the maximum possible \(H'\) value for a community:
\[\begin{align} H'_{max} = - \sum^S_{i = 1} \frac{1}{S} ~ ln ~ \frac{1}{S} = ln ~ S \end{align}\] Where \(S\) is the number of species observed, and \(ln\) is the natural log.


Calculate alpha diversity

# Observed Richness
richness <- specnumber(meio.data)  

# Shannon's H'
H <- diversity(meio.data)

# Pielou's Evenness
evenness <- H/log(richness)
  
# Create alpha diversity dataframe including environmental data
alpha <- cbind(shannon = H, richness = richness, pielou = evenness, env.data)
head(alpha)
##    shannon richness    pielou             site depth sediment  NO3  Cu   Hg
## 1 2.453751       34 0.6958309      Cawsand Bay    18     Silt 0.87  32 0.09
## 2 1.613972       16 0.5821174 Inner Breakwater     6     Sand 0.90 119 1.15
## 3 2.852518       35 0.8023176  Jenny Cliff Bay     9     Silt 0.64  18 0.01
## 4 2.421241       19 0.8223099    Mallard Shoal     4     Silt 0.70  38 0.02
## 5 1.777476       31 0.5176128         West Mud     3     Sand 1.10  81 0.82
## 6 2.121223       26 0.6510620      Sutton Lock     8     Sand 0.50  65 0.60


Plot alpha diversity

# define colour pattern to use for the different sites
cpattern=c("cyan2","darkgoldenrod2","plum","darkorchid4","brown3","steelblue4")

plot.rich <-ggplot(alpha, aes(x = site, y = richness, colour = site)) +
  geom_point(size = 3) +
  scale_colour_manual(values=cpattern)+
  ylab("Species Richness") +
  xlab("") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.4))

plot.even <- ggplot(alpha, aes(x = site, y = evenness, colour = site)) +
  geom_point(size = 3) +
  scale_colour_manual(values=cpattern)+
  ylab("Pielou's Evenness") +
  xlab("") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.4))

plot.shan <- ggplot(alpha, aes(x = site, y = shannon, colour = site)) +
  geom_point(size = 3) +
  scale_colour_manual(values=cpattern)+
  ylab("Shannon's H'") + 
  xlab("") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.4))

legend <- get_legend(plot.even)

plot_grid(plot.shan + theme(legend.position = "none"), plot.rich + theme(legend.position = "none"), plot.even + theme(legend.position = "none"),ncol = 3)

These graphs illustrate the differences between alpha diversity metrics and the relationship between species richness, Pielou’s evenness, and Shannon’s H’. Lets focus on two examples: Cawsand Bay and Mallard Shoal. These two sites have similar Shannon’s H’ scores, however, this is resultant of distinct mechanisms. Cawsand Bay has a large number of species (high species richness) but is relatively uneven (moderate Pielou’s Evenness), resulting in a moderate Shannon’s H’. Mallard Shoal, in contrast, has a small number of species but is very even, also resulting in a moderate Shannon’s H’. Alpha diversity is structured very differently at Cawsand Bay compared with Mallard Shoal, but if we had used Shannon’s H’ alone we may not have identified these important differences.

Relationships between alpha diversity and environment

Simple linear models

summary(lm(shannon ~ depth, alpha))
summary(lm(shannon ~ NO3, alpha))
summary(lm(shannon ~ Cu, alpha))
summary(lm(shannon ~ Hg, alpha))
## 
## Call:
## lm(formula = shannon ~ Hg, data = alpha)
## 
## Residuals:
##        1        2        3        4        5        6 
## -0.07859  0.04493  0.24747 -0.17472 -0.09146  0.05236 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.61413    0.09721  26.891 1.14e-05 ***
## Hg          -0.90878    0.15489  -5.867  0.00421 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1666 on 4 degrees of freedom
## Multiple R-squared:  0.8959, Adjusted R-squared:  0.8699 
## F-statistic: 34.43 on 1 and 4 DF,  p-value: 0.004213

The linear model outputs demonstrate that there is a significant negative relationship between both Copper and Mercury concentration, and Shannon’s H’ across the Plymouth Sound. We can then fit the significant regression lines to our plots using the geom_smooth function in the r package ggplot2.


depth.reg <- ggplot(alpha, aes(x = depth, y = shannon, colour = site)) +
  geom_point(size = 3) +
  scale_colour_manual(values=cpattern)+
  xlab("Depth (m)") + 
  ylab("Shannon's H'") +
  ylim(1.1,3) +
  theme_bw()

NO3.reg <- ggplot(alpha, aes(x = NO3, y = shannon, colour = site)) +
  geom_point(size = 3) +
  scale_colour_manual(values=cpattern)+
  xlab(bquote(Nitrate ~ (mg~kg^-1))) + 
  ylab("") +
  ylim(1.1,3) +
  theme_bw()

cu.reg <- ggplot(alpha, aes(x = Cu, y = shannon, colour = site)) +
  geom_smooth(method = "lm", colour = "black", fill = "grey90") +
  geom_point(size = 3) +
  scale_colour_manual(values=cpattern)+
  xlab(bquote(Copper ~ (mg~kg^-1))) + 
  ylab("") +
  ylim(1.1,3) +
  theme_bw()

hg.reg <- ggplot(alpha, aes(x = Hg, y = shannon, colour = site)) +
  geom_smooth(method = "lm", colour = "black", fill = "grey85") +
  geom_point(size = 3) +
  scale_colour_manual(values=cpattern)+
  xlab(bquote(Mercury ~ (mg~kg^-1))) + 
  ylab("") +
  ylim(1.1,3) +
  theme_bw()

legend <- get_legend(hg.reg)

plot_grid(depth.reg + theme(legend.position = 'none'), NO3.reg + theme(legend.position = 'none'), cu.reg + theme(legend.position = 'none'), hg.reg+ theme(legend.position = 'none'), legend, rel_heights = c(1,1,1,1,0.5),  ncol = 5)



Analysis of Variance (ANOVA)

anova <- aov(shannon ~ sediment, alpha)
summary(anova)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## sediment     1 0.8176  0.8176   13.11 0.0223 *
## Residuals    4 0.2494  0.0624                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
sed.plot <- ggplot(alpha, aes(x = sediment, y = shannon, fill = sediment)) +
  geom_boxplot(aes(fill = sediment)) +
  geom_point(size = 3, aes(colour = site)) + 
  scale_colour_manual(values=cpattern)+
  scale_fill_manual(values = c("grey60", "grey90"), guide = FALSE) +
  ylab("Shannon's H'") + 
  xlab('')+
  theme_bw() 
 sed.plot

The one-way ANOVA reveals a significant relationship between sediment type (a categorical variable) and Shannon’s H’, as greater meiofaunal diversity was observed in silt sediments compared with sand sediments throughout Plymouth Sound.



Beta Diversity

In this section, we will explore the differentiation between local communities across the Plymouth Sound. That is, to compare the dissimilarity in community composition between each pair of sites. Such analyses are challenging with large species by site matrices like our meio.data, and consequently we us dissimilarity metrics to summarise differences between pairs of communities. There are a large number of dissimilarity metrics which are used by ecologists, which each have their own idiosyncrasies and applications. In the session we will calculate and compare two of the most common metrics, Bray-Curtis dissimilarity and Jaccard dissimilarity.


Bray-Curtis Dissimilarity
Bray-Curtis dissimilarity is an abundance-based dissimilarity metric. It is calculated as follows:

\[BC_{jk} = \frac{\sum |x_{ij} - x_{ik}|}{\sum (x_{ij}+ x_{ik})} \] Where \(BC_{jk}\) is the Bray-Curtis dissimilarity between sites \(j\) and \(k\), \(x_{ij}\) is the abundance of species \(i\) in community \(j\) and \(x_{ik}\) represents the abundance of species \(i\) in community \(k\).

Jaccard Dissimilarity
Jaccard dissimilarity is a binary dissimilarity metric, which explores the intersection over union of sites. Jaccard dissimilarity is calculated as follows: \[JAC_{jk} = 1- \frac{|x_{j} \cap x_{k}|}{|x_{j} \cup x_{k}|} = 1- \frac{|x_{j} \cap x_{k}|}{|x_j| + |x_k| - |x_{j} \cap x_{k}|}\] Where \(JAC_{jk}\) is the Jaccard dissimilarity between sites \(j\) and \(k\), \(|x_{j} \cap x_{k}|\) is the number of species shared between sites \(j\) and \(k\), and \(|x_{j} \cup x_{k}|\) is the number of species in site \(j\), plus the number of species in site \(k\), minus the number of species shared between the sites (or in simpler terms, the total number of species considering both sites).



Calculate pairwise dissimilarity

meio.mdf <- as.matrix.data.frame(meio.data)
rownames(meio.mdf) <- env.data$site

meio.bray <- vegdist(meio.mdf, method = "bray")
meio.bray
##                  Cawsand Bay Inner Breakwater Jenny Cliff Bay Mallard Shoal  West Mud
## Inner Breakwater   0.9267176                                                         
## Jenny Cliff Bay    0.7335542        0.9532976                                        
## Mallard Shoal      0.9596577        0.8998273       0.9394366                        
## West Mud           0.6964623        0.9674981       0.8112900     0.9516616          
## Sutton Lock        0.8507216        0.9824561       0.9057592     0.9576516 0.9169350
meio.jac <- vegdist(meio.mdf, method = "jaccard", binary = T)
meio.jac
##                  Cawsand Bay Inner Breakwater Jenny Cliff Bay Mallard Shoal  West Mud
## Inner Breakwater   0.9130435                                                         
## Jenny Cliff Bay    0.7678571        0.9375000                                        
## Mallard Shoal      0.9183673        0.9393939       0.6829268                        
## West Mud           0.7962963        0.9782609       0.6530612     0.7500000          
## Sutton Lock        0.8461538        0.9230769       0.8039216     0.8750000 0.7872340

It is important to consider the differences in the underlying formula for calculating Bray-Curtis dissimilarity and Jaccard dissimilarity, and the consequent differences in output pairwise dissimilarity matrices. Jaccard dissimilarity simply compares the overlap in species presence/absence between pairs of communities, consequently two communities with identical species present will have a Jaccard score of 0 regardless of the relative abundace of these organisms in the community. For example, through the lens of Jaccard dissimilarity, a community with 10 frogs, 1 cat, and 1 fox would be considered identical to a community with 1 frog, 3 cats, and 8 foxes. In contrast, Bray-Curtis dissimilarity considers the abundance of each species in each community and consequently would differentiate these communities. In some cases, it is useful to compare the outputs of both dissimilarity metrics to gain deeper insights into the structure of communities in an ecosystem.

While summary insights into community dissimilarity are valuable for understanding the factors which determine the structure of communities through time and space, it is important to consider which species are driving these trends. We can identify the species which contribute most to Bray-Curtis dissimilarity between each pair of sites using SIMPER analysis.


SIMPER
SIMPER is a method for identifying which species contribute most to beta diversity (Bray-Curtis dissimilarity). SIMPER identifies the most abundant and/or most variables species in the dataset.

\[BC_{ijk} = \frac{|x_{ij}-x_{ik}|}{\sum(x_{ij}+x_{ik})}\]

Where \(x_{ij}\) is the abundance of species \(i\) in community \(j\), and \(x_{ik}\) is the abundance of species \(i\) in community \(k\). \(BC_{ijk}\) is the contribution of species \(i\) to overall Bray-Curtis dissimilarity between communities \(j\) and \(k\). The overall Bray-Curtis dissimilarity between communities \(j\) and \(k\) is the sum of individual species contributions. Note that SIMPER is severely biased towards highly abundant species.


We can use the simper function in the R package vegan to calculate the contribution of each species to Bray-Curtis dissimilarity between sites in the Plymouth Sound. The simper function will automatically perform pairwise comparisons between sites, and return cumulative contribution scores up to a maximum of 0.7 (70% of total dissimilarity). You can cross reference the species identified with your taxonomy table and discuss the relevant biology and ecology of these organisms.


simper(meio.data, env.data$site, permutations = 999)
##     Cawsand_Inner                                                                        
## otu "OTU_21"      "OTU_73"    "OTU_63"    "OTU_71"    "OTU_3162"  "OTU_104"   "OTU_47"   
## cs  "0.2731253"   "0.4094143" "0.4942529" "0.5604817" "0.6119321" "0.6584565" "0.7033388"



Ordination

PERMANOVA

PERMANOVA (permutational multivariate analysis of variance; Anderson 2001) is non-parametric multivariate statistical test used to quantify the impact of both continuous and categorical variables on dissimilarity between communities. While it is valid to construct a PERMANOVA model which includes continuous variables, in this instance we will use a simple PERMANOVA model to test the effect of sediment type on meiofaunal community composition. The input for PERMANOVA is a dissimilarity matrix (Bray-Curtis dissimilarity in this case), and corresponding environmental data. A resultant p-value < 0.05 indicates that centroid position and/or dispersion differs between the groups in the model.
As PERMANOVA is affected by both centroid position and disperison, we perform a homogeneity of dispersions analysis using betadisper to establish whether dispersion is homogeneous between groups (in this case, silt and sand sediment associated communities). We will then perform PERMANOVA analysis using the adonis function in the R package vegan.

# Homogeneity of dispersion test
permutest(betadisper(meio.bray, env.data$sediment))
## 
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 719
## 
## Response: Distances
##           Df   Sum Sq   Mean Sq      F N.Perm Pr(>F)
## Groups     1 0.003431 0.0034315 0.3266    719 0.6014
## Residuals  4 0.042029 0.0105073
# PERMANOVA analysis
adonis(meio.bray ~ sediment, data = env.data, permutations = 999)
## 
## Call:
## adonis(formula = meio.bray ~ sediment, data = env.data, permutations = 999) 
## 
## Permutation: free
## Number of permutations: 719
## 
## Terms added sequentially (first to last)
## 
##           Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)
## sediment   1   0.33396 0.33396 0.78833 0.16463      1
## Residuals  4   1.69454 0.42364         0.83537       
## Total      5   2.02851                 1.00000

The betadisper analysis shows that there is no signficiant difference in dispersion between sand sediments and silt sediments. The PERMANOVA analysis show that there is no significant difference in centroid position or dispersion between sediment types.

Principal Coordinates Analysis (PCoA)

Now that we have calculated our dissimilarity matrices, we can use ordination to visualise the structure of these communities. Ordination by definition is the representation of large numbers of sites or species in low dimensional space, such that any underlying relationships in these data become visually apparent. Below we will use functions in the R package vegan to perform and plot principal coordinates analysis for both Bray-Curtis and Jaccard dissimilarity matrices to visualise meiofaunal community structure across the Plymouth Sound.

# calculate principal coordinates analysis (Bray-Curtis)
pcoa.meio.bray <- cmdscale(meio.bray, k = 2, eig = T)

# extract axis positions for each site from cmdscale object and create a dataframe for plotting
pcoa.meio.bray.plotting <- as.data.frame(pcoa.meio.bray$points)
colnames(pcoa.meio.bray.plotting) <- c("axis_1", "axis_2")
pcoa.meio.bray.plotting$site <- rownames(pcoa.meio.bray.plotting)
pcoa.meio.bray.plotting$sediment <- env.data$sediment

# calculate the proportion of variance in the data which is explained by the first two PCoA axes
pcoa.meio.bray$eig[1]/(sum(pcoa.meio.bray$eig))
## [1] 0.3047715
pcoa.meio.bray$eig[2]/(sum(pcoa.meio.bray$eig))
## [1] 0.2235254
# create a PCoA plot
pcoa.meio.bray.plot <- ggplot(pcoa.meio.bray.plotting, aes(x = axis_1, y = axis_2, colour = site)) +
  geom_point(size = 3) +
  scale_colour_manual(values=cpattern)+
  theme_bw() + 
  xlab("PCoA 1 (30.8%)") +
  ylab("PCoA 2 (22.3%)") +
  annotate(geom = 'text', label = 'Bray-Curtis', x = Inf, y = -Inf, hjust = 1.15, vjust = -1)

# repeat process with Jaccard dissimilarity matrix
pcoa.meio.jac <- cmdscale(meio.jac, k = 2, eig = T)

pcoa.meio.jac.plotting <- as.data.frame(pcoa.meio.jac$points)
colnames(pcoa.meio.jac.plotting) <- c("axis_1", "axis_2")
pcoa.meio.jac.plotting$site <- rownames(pcoa.meio.jac.plotting)

pcoa.meio.jac$eig[1]/(sum(pcoa.meio.jac$eig))
## [1] 0.3166126
pcoa.meio.jac$eig[2]/(sum(pcoa.meio.jac$eig))
## [1] 0.2390876
pcoa.meio.jac.plot <- ggplot(pcoa.meio.jac.plotting, aes(x = axis_1, y = axis_2, colour = site)) +
  geom_point(size = 3) +
  scale_colour_manual(values=cpattern)+
  theme_bw() + 
  xlab("PCoA 1 (31.6%)") +
  ylab("PCoA 2 (24.0%)") +
  annotate(geom = 'text', label = 'Jaccard', x = Inf, y = -Inf, hjust = 1.215, vjust = -1)

# extract plot legend
legend <- get_legend(pcoa.meio.jac.plot)

# plot Bray-Curtis PCoA and Jaccard PCoA side by side
plot_grid(pcoa.meio.bray.plot + theme(legend.position = 'none'), pcoa.meio.jac.plot + theme(legend.position = 'none'), legend, ncol = 3, rel_widths = c(1,1,0.5))