Microbial Community Structure

Introduction

In this class, we will learn how to analyse microbial community structure using ASV (amplicon sequence variant) tables.

Ecological community structure refers to the composition and arrangement of species within a community, including their abundance, diversity, and the relationships between them. It encompasses the distribution of organisms across different trophic levels, their interactions, and how they vary spatially and temporally.

In the context of microbial ecology, community structure involves the relative abundances of microbial taxa (e.g., ASVs or OTUs) and how these taxa are distributed across samples or environments. It provides insights into the functional roles and ecological dynamics within ecosystems.

Our focus will be on:

Understanding ecological distances and beta diversity among samples.
Visualising samples in multivariate space using NMDS (Non-metric Multidimensional Scaling).
Interpreting differences in microbial communities based on metadata.

Case Study

This analysis is based on data from the paper Microbial Community Changes in 26,500-Year-Old Thawing Permafrost by Scheel et al. (2022), published in Frontiers in Microbiology (https://doi.org/10.3389/fmicb.2022.787146).

The study investigates how microbial communities in ancient permafrost respond to abrupt thawing events in Northeast Greenland.

Methods:

Site Description: The study was conducted in Northeast Greenland, where a thermal erosion gully collapsed in 2018, leading to the thawing of ancient permafrost material.

Sample Collection: Soil samples were collected from the eroding gully over two consecutive years. The sampling strategy included finely scaled depth increments to capture vertical variations in microbial communities.

DNA Sequencing: The researchers sequenced the prokaryotic 16S rRNA gene and the fungal ITS2 region to analyse the microbial communities.

Soil Parameter Analysis: Various soil parameters, including pH, soil carbon content, age, moisture, organic and mineral horizons, and permafrost layers, were measured to determine their influence on microbial diversity and abundance.

Key Findings:

Diversity Patterns: Alpha diversity (species variety within a sample) decreased with increasing soil depth, age, and pH. Beta diversity analyses revealed that soil age, horizon (organic or mineral layers), and permafrost layer were significant drivers of microbial community composition.

Dominant Taxa: Permafrost microbial communities were dominated by Proteobacteria and Firmicutes, with the genus Polaromonas particularly abundant. Upon thawing, there was a shift towards copiotrophic taxa (i.e. taxa that thrive in nutrient-rich environments) such as Bacteroidia.

Environmental Drivers: Soil age, horizons, and permafrost layers strongly influenced microbial community structure.

This study is significant because understanding microbial responses to permafrost thaw is crucial for predicting carbon release and its implications for climate change.

We will focus on analysing the fungal community (ITS sequences) to explore how it varies across soil layers and depths.

Key Concepts

Diversity

In this class we’ll be thinking about beta diversity and ecological distance among samples or communities. Let’s review the differences between alpha and beta diversity.

Alpha Diversity

Definition: Alpha diversity measures the diversity within a single sample. It considers the number of species (richness) and their evenness (relative abundances).
Common Indices: Examples include Shannon, Simpson, and observed richness (the number of observed ASVs).
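As a minimal sketch, the three indices can be computed by hand for a single sample in base R. The counts below are made up for illustration. Simpson is given in the Gini-Simpson form 1 - sum(p^2), which is the form returned by vegan's `diversity(x, index = "simpson")`.

```r
# Made-up read counts for four ASVs in a single sample (illustration only)
counts <- c(ASV1 = 10, ASV2 = 5, ASV3 = 5, ASV4 = 0)

# Observed richness: number of ASVs with at least one read
richness <- sum(counts > 0)

# Relative abundances of the ASVs that are present (zeros dropped to avoid log(0))
p <- counts[counts > 0] / sum(counts)

# Shannon index: H = -sum(p * ln p), using the natural logarithm
shannon <- -sum(p * log(p))

# Simpson index (Gini-Simpson form): 1 - sum(p^2)
simpson <- 1 - sum(p^2)

c(richness = richness, shannon = shannon, simpson = simpson)
```

Evenness matters as well as richness: three ASVs with equal counts would give the maximum Shannon value for three taxa, ln 3 (about 1.099), instead of the 1.04 obtained here.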

Beta Diversity

Definition: Beta diversity quantifies differences in community composition between samples. It reflects ecological distances or dissimilarities.
Connection to Alpha Diversity: Beta diversity complements alpha diversity by showing how much diversity is shared (or not) across samples.
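One classic way to connect the two is Whittaker's multiplicative beta diversity, beta = gamma / mean(alpha), where gamma is the richness of all samples pooled together. It is not used later in this class, but the base R sketch below (made-up presence/absence data) makes the idea of "shared diversity" concrete.

```r
# Made-up presence (1) / absence (0) of five ASVs in two samples
sample_a <- c(ASV1 = 1, ASV2 = 1, ASV3 = 1, ASV4 = 0, ASV5 = 0)
sample_b <- c(ASV1 = 1, ASV2 = 0, ASV3 = 0, ASV4 = 1, ASV5 = 1)

# Alpha: richness within each sample
alpha <- c(sum(sample_a), sum(sample_b))

# Gamma: richness of the two samples pooled together
gamma_div <- sum((sample_a + sample_b) > 0)

# Whittaker's beta: 1 if the samples share all taxa,
# up to the number of samples (here 2) if they share none
beta_w <- gamma_div / mean(alpha)
beta_w
```

Both samples have the same alpha diversity (3 ASVs), yet only one ASV is shared, so beta is 5/3: the samples differ in composition even though their richness is identical.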

Ecological Distance

Definition: A measure of dissimilarity between communities based on species composition and abundance.
Bray-Curtis Dissimilarity: This metric is widely used in microbial ecology and is based on the proportional differences in abundances between samples. It ranges from 0 (identical communities) to 1 (completely dissimilar communities).
- Bray-Curtis focuses on quantitative differences, meaning that both the presence/absence and the abundance of taxa contribute to the calculation.
Other Metrics: Many other measures of ecological distance exist, each with specific strengths and assumptions. Examples include:
- Jaccard Index: Based on presence/absence data.
- UniFrac: Accounts for phylogenetic relationships between taxa.
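To see how the choice of measure matters, the sketch below compares Jaccard distance (1 minus the proportion of taxa shared, using presence/absence only) with Bray-Curtis on the same pair of made-up samples. In vegan, the presence/absence version corresponds to `vegdist(x, method = "jaccard", binary = TRUE)`; without `binary = TRUE`, vegdist computes a quantitative form of Jaccard instead.

```r
x <- c(10, 0, 5, 1)  # made-up counts of four taxa in sample x
y <- c(2, 3, 5, 0)   # made-up counts of the same taxa in sample y

# Jaccard distance: only presence/absence matters
shared <- sum(x > 0 & y > 0)   # taxa present in both samples
either <- sum(x > 0 | y > 0)   # taxa present in at least one sample
jaccard <- 1 - shared / either

# Bray-Curtis: abundances matter too
bray <- sum(abs(x - y)) / sum(x + y)

c(jaccard = jaccard, bray = bray)
```

Doubling every count in `x` would change Bray-Curtis but leave Jaccard unchanged, because the same taxa would still be present.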

Bray-Curtis Dissimilarity

Formula: Bray-Curtis dissimilarity between two samples \(i\) and \(j\) is calculated as:

\[ BC_{ij} = \frac{\sum_k |x_{ik} - x_{jk}|}{\sum_k (x_{ik} + x_{jk})} \]

where:

\(x_{ik}\) and \(x_{jk}\) are the abundances of taxon \(k\) in samples \(i\) and \(j\), respectively.
The numerator \(\sum_k |x_{ik} - x_{jk}|\) is the sum of the absolute differences in abundances for all taxa between the two samples.
The denominator \(\sum_k (x_{ik} + x_{jk})\) is the sum of the total abundances of all taxa in both samples.

Interpretation:

\(BC_{ij} = 0\): The samples are identical: every taxon has the same abundance in both.
\(BC_{ij} = 1\): The samples have no taxa in common.
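The formula translates directly into a short base R function. This is only a sketch for checking your understanding; for real data, use `vegdist(..., method = "bray")`, which applies the same formula to each pair of rows. The example values are made up.

```r
# Bray-Curtis dissimilarity between two abundance vectors (same taxa, same order)
bray_curtis <- function(xi, xj) {
  sum(abs(xi - xj)) / sum(xi + xj)
}

s1 <- c(4, 2, 0, 4)  # made-up abundances of taxa k = 1..4 in sample i
s2 <- c(2, 2, 6, 0)  # made-up abundances of the same taxa in sample j

bray_curtis(s1, s2)            # (2 + 0 + 6 + 4) / (10 + 10) = 0.6
bray_curtis(s1, s1)            # identical samples: 0
bray_curtis(c(5, 0), c(0, 3))  # no taxa in common: 1
```

Because the formula uses the abundances exactly as given, raw counts and relative abundances can give different values when samples differ in sequencing depth, which is one reason to convert counts to relative abundances before calculating distances.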

Ordination

Statistical ordination is a method used to simplify and interpret complex multivariate data, especially in ecology. By summarising patterns in the data, ordination reduces high-dimensional datasets (e.g., species abundances across multiple samples) into a few key axes or dimensions. This allows researchers to explore relationships between samples, identify gradients (e.g., environmental factors), and detect clustering or separation among groups.

Common ordination methods include NMDS, PCA (Principal Component Analysis), PCoA (Principal Coordinates Analysis) and CCA (Canonical Correspondence Analysis). They all help visualise patterns and relationships within multivariate datasets.
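As a small illustration of reducing dimensions, the base R sketch below ordinates four made-up samples (five taxa each) with PCoA on a Bray-Curtis matrix, using `cmdscale()` (classical multidimensional scaling, the same technique as PCoA). PCoA is used here only because it needs no extra packages; the analysis in this class uses NMDS.

```r
# Made-up community matrix: 4 samples (rows) x 5 taxa (columns)
comm <- matrix(c(
  10, 8, 0, 1, 0,
   9, 7, 1, 0, 0,
   0, 1, 9, 8, 5,
   1, 0, 8, 9, 6
), nrow = 4, byrow = TRUE,
dimnames = list(paste0("S", 1:4), paste0("taxon", 1:5)))

# Pairwise Bray-Curtis dissimilarities between rows (samples)
n <- nrow(comm)
bc <- matrix(0, n, n, dimnames = list(rownames(comm), rownames(comm)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    bc[i, j] <- sum(abs(comm[i, ] - comm[j, ])) / sum(comm[i, ] + comm[j, ])
  }
}

# PCoA: summarise the 5-taxon space with 2 axes
pcoa_coords <- cmdscale(as.dist(bc), k = 2)
pcoa_coords
```

S1 and S2 share their dominant taxa, as do S3 and S4, so each pair should plot close together, with the two pairs far apart.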

Note

There is an ordination plot in Scheel et al. (2022). Can you find it? What do you think it tells you about the microbial communities in the analysis?

Non-metric Multidimensional Scaling (NMDS)

NMDS is a technique for visualising complex ecological data in a reduced-dimensional space (usually a two-dimensional plot) while preserving the rank order of pairwise distances. In an NMDS plot, sites that are more similar to one another in species composition are plotted closer together.

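NMDS is well suited to ecological data because it makes few assumptions: it uses only the rank order of the dissimilarities, so it can be combined with any distance measure (such as Bray-Curtis), and it does not assume linear relationships between taxon abundances, unlike PCA, which is based on linear combinations of the variables.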

An important consideration in NMDS is the stress value, which indicates how well the reduced-dimensional representation fits the data. NMDS tries to place the points so that the rank order of the plotted distances matches the rank order of the ecological distances (e.g. Bray-Curtis) calculated from the data. The more this ordering is violated, the higher the stress. As a rule of thumb, stress values below 0.2 are considered acceptable, and lower values are better.
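To make "stress" concrete, here is a base R sketch of Kruskal's stress formula 1, the stress measure that vegan's `monoMDS()` (the default engine behind `metaMDS()`) reports by default. It fits a monotone (isotonic) regression of the plotted distances on the ecological dissimilarities with `isoreg()` and measures how far the plotted distances fall from that fit. It ignores ties and other details of vegan's implementation, so its numbers need not match `metaMDS()` output exactly; `kruskal_stress1` is a hypothetical helper, not a vegan function.

```r
# Kruskal's stress formula 1 for a configuration of points (illustrative sketch)
# delta:  observed dissimilarities (a "dist" object)
# config: matrix of ordination coordinates, one row per sample
kruskal_stress1 <- function(delta, config) {
  d <- as.vector(dist(config))   # distances between points in the plot
  delta <- as.vector(delta)
  o <- order(delta)              # sort sample pairs by observed dissimilarity
  fit <- isoreg(delta[o], d[o])  # monotone regression of plotted on observed
  d_hat <- numeric(length(d))
  d_hat[o] <- fit$yf             # fitted values back in the original pair order
  sqrt(sum((d - d_hat)^2) / sum(d^2))
}

# Four points whose pairwise distances are all different
cfg <- matrix(c(0, 0,  1, 0,  0, 2,  3, 3), ncol = 2, byrow = TRUE)

# Dissimilarities in the same rank order as the plotted distances: stress is 0 (up to rounding)
kruskal_stress1(dist(cfg)^2, cfg)

# Dissimilarities in the opposite rank order: stress is well above 0.2
kruskal_stress1(max(dist(cfg)) + 1 - dist(cfg), cfg)
```

Note that the first case gives zero stress even though the dissimilarities are the squares of the plotted distances: only the rank order matters in NMDS.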

Analysis

Load libraries

# Load required packages
library(tidyverse) # For handling data (also loads ggplot2)
library(vegan) # For ecological community analysis
library(ggplot2) # For plotting

# Set your working directory (the folder containing the data files) using setwd()
setwd("~/Downloads")

Load data

We will use the following files:

sample_info_ITS.csv: Metadata for each sample (e.g., soil chemistry, sampling depth).
Scheel2022_ASVtable_ITS.csv: Fungal ASV abundances.

# Load sample information
sample_info <- read.csv("sample_info_ITS.csv")

# Load ASV abundance table
asv_data <- read.csv("Scheel2022_ASVtable_ITS.csv")

# Preview the data
head(sample_info)
head(asv_data[, 1:6])  # Preview the first six ASVs (rows) and the first six columns

# How would you find out how many ASVs are present?

Data Preprocessing

Filtering and Normalisation

Extract the ASV read counts only, then convert them to relative abundances within each sample. Because samples are in columns at this stage, each column is divided by its own total:

# Extract the sequence counts
# Either do this by dropping the first two (non-count) columns...
asv_table <- asv_data[,-(1:2)]

# ...or by selecting the sample columns by name
asv_table <- asv_data %>%
  select(starts_with("TK"))

# Convert to relative abundances: divide each sample (column) by its total read count
asv_rel_abundance <- sweep(as.matrix(asv_table), 2, colSums(asv_table), FUN = "/")

# How would you check that the relative abundances in each sample sum to 1?
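Because ASVs are in rows and samples in columns at this point, each sample's relative abundances are its column divided by the column total. A small made-up table with the same layout shows the difference between the two margins (the column names `ASV_ID` and `Taxonomy` are hypothetical; the sample names imitate the `TK` prefix):

```r
# Made-up table: 3 ASVs (rows); two ID columns, then read counts for 2 samples
toy <- data.frame(
  ASV_ID   = c("ASV1", "ASV2", "ASV3"),          # hypothetical column
  Taxonomy = c("taxon_a", "taxon_b", "taxon_c"), # hypothetical column
  TK01     = c(80, 15, 5),
  TK02     = c(10, 10, 0)
)

counts <- as.matrix(toy[, -(1:2)])                    # counts only
rel <- sweep(counts, 2, colSums(counts), FUN = "/")   # divide each column by its total

colSums(rel)   # each sample sums to 1
rowSums(rel)   # the ASV rows do not, as expected
```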

Community matrix

Ecological community matrices normally have samples in rows and species in columns. Our data is currently the wrong way round (species in rows, samples in columns). We can easily transpose using t():

# Transpose to community matrix format using t()
community_matrix <- t(asv_rel_abundance)
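Orientation matters because `vegdist()`, like base R's `dist()`, computes dissimilarities between rows. A small made-up matrix shows the difference: before transposing you would get distances between ASVs, and after transposing, distances between samples.

```r
# Made-up relative abundances: 4 ASVs (rows) x 3 samples (columns); each column sums to 1
rel <- matrix(c(0.5, 0.2, 0.2, 0.1,
                0.1, 0.6, 0.1, 0.2,
                0.3, 0.3, 0.3, 0.1),
              nrow = 4,
              dimnames = list(paste0("ASV", 1:4), paste0("TK", 1:3)))

community <- t(rel)   # now 3 samples (rows) x 4 ASVs (columns)
dim(community)

attr(dist(rel), "Size")        # 4: distances between ASVs (not what we want)
attr(dist(community), "Size")  # 3: distances between samples
```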

Ecological Distances

Beta diversity is quantified as pairwise dissimilarities between samples. We will use Bray-Curtis dissimilarity:

# Calculate Bray-Curtis distance
bray_dist <- vegdist(community_matrix, method = "bray")

The resulting distance matrix contains pairwise distances between all samples. To view the distances:

# View the first few distances
as.matrix(bray_dist)[1:5, 1:5]  # Display a subset of the distance matrix

# How would you plot the distribution of Bray-Curtis values?

Remember, the distances among samples indicate beta diversity: larger distances mean greater differences in community composition (higher community turnover).
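One simple way to relate these distances to metadata is to compare the average dissimilarity between samples from the same group with that between samples from different groups. A base R sketch on a made-up distance matrix (the group labels are hypothetical):

```r
# Made-up Bray-Curtis dissimilarities among four samples
bc <- matrix(c(0.00, 0.15, 0.80, 0.85,
               0.15, 0.00, 0.75, 0.90,
               0.80, 0.75, 0.00, 0.20,
               0.85, 0.90, 0.20, 0.00),
             nrow = 4,
             dimnames = list(paste0("S", 1:4), paste0("S", 1:4)))
group <- c(S1 = "layer_A", S2 = "layer_A", S3 = "layer_B", S4 = "layer_B")  # hypothetical labels

# Logical matrix: TRUE where two samples belong to the same group
same_group <- outer(group, group, "==")
pairs <- upper.tri(bc)   # use each pair once and ignore the diagonal

within_mean  <- mean(bc[pairs & same_group])
between_mean <- mean(bc[pairs & !same_group])
c(within = within_mean, between = between_mean)
```

Here samples are much more similar within a group (0.175) than between groups (0.825), which is the pattern that would show up as separate clusters in an ordination.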

NMDS

Perform NMDS using metaMDS:

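# Perform NMDS in k = 2 dimensions, trying up to 100 random starts
set.seed(123)  # metaMDS uses random starting configurations; fixing the seed makes results reproducible
ITS_nmds <- metaMDS(bray_dist, k = 2, trymax = 100, autotransform = FALSE)

# Check stress value
print(ITS_nmds)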

Does the stress value indicate that the NMDS is giving a fair representation of ecological distances among samples?
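If you want to see how `metaMDS()` behaves before running it on the real data, the sketch below applies the same kind of call to a small simulated community (random Poisson counts from two made-up groups) and extracts the stress value and coordinates from the result. The simulated data are an assumption for illustration only. vegan's `stressplot()` can also draw a Shepard diagram of the fit.

```r
library(vegan)

set.seed(42)  # NMDS uses random starts; fix the seed for reproducible results

# Simulated counts: 10 samples x 15 taxa, two groups with different mean abundances
means_a <- rep(c(20, 2), times = c(8, 7))
means_b <- rep(c(2, 20), times = c(8, 7))
sim_counts <- rbind(
  t(replicate(5, rpois(15, means_a))),
  t(replicate(5, rpois(15, means_b)))
)
rownames(sim_counts) <- paste0("sim", 1:10)

# Samples are already in rows here, so divide each row by its total
sim_rel  <- sim_counts / rowSums(sim_counts)
sim_dist <- vegdist(sim_rel, method = "bray")
sim_nmds <- metaMDS(sim_dist, k = 2, trymax = 100, trace = FALSE)

sim_nmds$stress        # the stress value also shown by print()
head(sim_nmds$points)  # NMDS coordinates, one row per sample
```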

Visualising NMDS Results

Plot the NMDS ordination, colouring samples by Layer:

# Extract NMDS coordinates
nmds_coords <- as.data.frame(ITS_nmds$points)
nmds_coords$Sample <- rownames(nmds_coords)

# Merge with sample metadata
nmds_coords <- nmds_coords %>% 
  left_join(sample_info, by = c("Sample" = "sample"))

# Plot NMDS
nmds_plot <- ggplot(nmds_coords, aes(x = MDS1, y = MDS2, colour = layer)) +
  geom_point(size = 3) +
  theme_bw() +
  labs(title = "NMDS of Fungal Communities", 
       x = "NMDS Dimension 1", 
       y = "NMDS Dimension 2", 
       colour = "Layer")

print(nmds_plot)

How would you interpret your plot? Do the community compositions of the different soil layers seem to be similar? Are the samples in each layer clustered together?
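If some points come out grey, or appear as `NA` in the legend, a common cause is sample names that differ between the ASV table and the metadata, so `left_join()` finds no match. For example, `read.csv()` rewrites column names that are not syntactically valid (such as replacing `-` with `.`). A base R check, shown here on made-up names (the `sample` column mirrors the join used above):

```r
# Made-up sample names from the ordination and from the metadata table
ordination_samples <- c("TK01", "TK02", "TK03")
metadata <- data.frame(sample = c("TK01", "TK02", "TK_03"),
                       layer  = c("A", "A", "B"))   # hypothetical values

# Samples in the ordination with no matching metadata row (these get NA after the join)
missing_meta <- setdiff(ordination_samples, metadata$sample)
missing_meta
```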

Optional Extension: 16S Data Analysis

If time permits, repeat the above steps for the bacterial communities: load the Scheel2022_ASVtable_16S.csv file in place of the ITS data and follow the same workflow:

# Load bacterial ASV table
asv_table_16S <- read.csv("Scheel2022_ASVtable_16S.csv", row.names = 1)

# Load sample information
sample_info_16S <- read.csv("sample_info_16S.csv")

# Normalise and analyse as above...
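Since the ITS and 16S analyses follow the same steps, one option is to wrap the preprocessing in a function. The sketch below is hypothetical (`asv_to_bray` is not part of vegan), and it assumes that the sample columns can be recognised by a shared name prefix such as `TK`; check the 16S file's columns with `head()` before relying on this.

```r
library(vegan)

# Hypothetical helper: ASV table (ASVs in rows) -> Bray-Curtis distances between samples
asv_to_bray <- function(asv_data, sample_prefix = "TK") {
  is_sample <- startsWith(names(asv_data), sample_prefix)
  counts <- as.matrix(asv_data[, is_sample, drop = FALSE])
  rel <- sweep(counts, 2, colSums(counts), FUN = "/")  # per-sample relative abundances
  vegdist(t(rel), method = "bray")                     # samples in rows for vegdist
}

# Usage with a made-up table
toy <- data.frame(ASV = c("a", "b", "c"),
                  TK1 = c(5, 5, 0), TK2 = c(0, 5, 5), TK3 = c(5, 5, 0))
asv_to_bray(toy)
```

The result can be passed straight to `metaMDS()`, so the same few lines serve both datasets.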

Summary

Beta Diversity: Reflects ecological differences between samples.
NMDS: Projects samples into a low-dimensional space for visualisation.
Metadata Integration: Helps interpret community differences.

This workflow can be extended to include other diversity metrics or distance measures based on your research goals.
