The goal of this is to demonstrate beta diversity analyses that could be performed on metagenomic data. This script was used to generate some results that are present in one of my published articles.
For these analyses, I mainly use the amazing phyloseq package.
library("phyloseq")
library("ggplot2")
library("dplyr")
library("ggpubr")
library("Matrix")
library("reshape2")
library("vegan")
In this example I’m using a phyloseq object called “physeq1”. For information on how it was created, check this tutorial.
physeq1
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 850 taxa and 162 samples ]
## sample_data() Sample Data: [ 162 samples by 2 sample variables ]
## tax_table() Taxonomy Table: [ 850 taxa by 2 taxonomic ranks ]
First of all, we need to transform the data to relative abundance and create a new phyloseq object:
relab_genera = transform_sample_counts(physeq1, function(x) x / sum(x) * 100)
head(otu_table(relab_genera)[,1:6])
## OTU Table: [6 taxa and 6 samples]
## taxa are rows
## yy1 yy2 yy3 yy4
## Acidobacterium 0.004112241 0.00000000 0.001144479 0.0008894617
## Candidatus_Koribacter 0.030442858 0.00000000 0.002924780 0.0017250167
## Granulicella 0.016080703 0.00141635 0.006103888 0.0051750501
## Terriglobus 0.034248215 0.00000000 0.002416122 0.0023179912
## Chloracidobacterium 0.001411665 0.00000000 0.000381493 0.0004312542
## Candidatus_Solibacter 0.032284160 0.00000000 0.002034629 0.0020484573
## yy5 yy6
## Acidobacterium 0.000000000 0.0009638424
## Candidatus_Koribacter 0.000000000 0.0001927685
## Granulicella 0.000000000 0.0004819212
## Terriglobus 0.000000000 0.0007710739
## Chloracidobacterium 0.002803555 0.0008674582
## Candidatus_Solibacter 0.000000000 0.0018313006
Calculate Bray-Curtis distance among samples and convert the result to a matrix:
abrel_bray <- phyloseq::distance(relab_genera, method = "bray")
abrel_bray <- as.matrix(abrel_bray)
head(abrel_bray)[,1:6]
## yy1 yy2 yy3 yy4 yy5 yy6
## yy1 0.0000000 0.9113060 0.7871159 0.8421657 0.9163139 0.9208827
## yy2 0.9113060 0.0000000 0.5488375 0.8984067 0.6060223 0.5351851
## yy3 0.7871159 0.5488375 0.0000000 0.7986645 0.5075713 0.4345278
## yy4 0.8421657 0.8984067 0.7986645 0.0000000 0.9259870 0.9028090
## yy5 0.9163139 0.6060223 0.5075713 0.9259870 0.0000000 0.3863192
## yy6 0.9208827 0.5351851 0.4345278 0.9028090 0.3863192 0.0000000
The “abrel_bray” matrix presents the distance between all samples, but since we wanna generate a boxplot with distances considering the metagenomes in each group separately, we need to filter this matrix. With the code below we end with a dataframe “df.bray” storing Bray Curtis Distances between metagenomes from the same groups.
sub_dist <- list()
groups_all <- sample_data(relab_genera)$Group
for (group in levels(groups_all)) {
row_group <- which(groups_all == group)
sample_group <- sample_names(relab_genera)[row_group]
sub_dist[[group]] <- abrel_bray[ sample_group, sample_group]
sub_dist[[group]][!lower.tri(sub_dist[[group]])] <- NA
}
braygroups<- melt(sub_dist)
df.bray <- braygroups[complete.cases(braygroups), ]
df.bray$L1 <- factor(df.bray$L1, levels=names(sub_dist))
head(df.bray)
## Var1 Var2 value L1
## 2 yy2 yy1 0.9113060 YY
## 3 yy3 yy1 0.7871159 YY
## 4 yy4 yy1 0.8421657 YY
## 5 yy5 yy1 0.9163139 YY
## 6 yy6 yy1 0.9208827 YY
## 7 yy7 yy1 0.9039108 YY
Now we generate the boxplot:
ggplot(df.bray, aes(x=L1, y=value, colour=L1)) +
geom_jitter() +
geom_boxplot(alpha=0.6) +
theme(legend.position="none") +
ylab("Bray-Curtis diversity") +
theme(axis.title.x=element_blank(), axis.text.x=element_text(angle=45,hjust=1,vjust=1,size=12), axis.text.y=element_text(size=12))
To generate the PcoA plot, first get the ordination result with the “ordinate” command from phyloseq. It requires a phyloseq object, and it accepts diverse methods and distances. Next, generate the plot using the “plot_ordination()” function.
ord = ordinate(relab_genera, method="PCoA", distance = "bray")
plot_ordination(relab_genera, ord, color = "Group", shape="Test") +
geom_point(size=4) +
stat_ellipse(aes(group=Group))
To test if the Control and the Treatment Groups are significantly different from each other we can run a Permanova test using the adonis function from the vegan package.
samples <- data.frame(sample_data(relab_genera))
adonis(abrel_bray ~ Test, data = samples)
##
## Call:
## adonis(formula = abrel_bray ~ Test, data = samples)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## Test 1 8.228 8.2285 42.521 0.20996 0.001 ***
## Residuals 160 30.962 0.1935 0.79004
## Total 161 39.191 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This result means that the adonis test is significant, therefore the null hypothesis in which the Control/Treatment have the same centroid was reject.