The goal of this is to demonstrate beta diversity analyses that could be performed on metagenomic data. This script was used to generate some results that are present in one of my published articles.

For these analyses, I mainly use the amazing phyloseq package.

library("phyloseq")
library("ggplot2")
library("dplyr")
library("ggpubr")
library("Matrix")
library("reshape2")
library("vegan")

In this example I’m using a phyloseq object called “physeq1”. For information on how it was created, check this tutorial.

physeq1
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 850 taxa and 162 samples ]
## sample_data() Sample Data:       [ 162 samples by 2 sample variables ]
## tax_table()   Taxonomy Table:    [ 850 taxa by 2 taxonomic ranks ]

First of all, we need to transform the data to relative abundance and create a new phyloseq object:

relab_genera = transform_sample_counts(physeq1, function(x) x / sum(x) * 100) 

head(otu_table(relab_genera)[,1:6])
## OTU Table:          [6 taxa and 6 samples]
##                      taxa are rows
##                               yy1        yy2         yy3          yy4
## Acidobacterium        0.004112241 0.00000000 0.001144479 0.0008894617
## Candidatus_Koribacter 0.030442858 0.00000000 0.002924780 0.0017250167
## Granulicella          0.016080703 0.00141635 0.006103888 0.0051750501
## Terriglobus           0.034248215 0.00000000 0.002416122 0.0023179912
## Chloracidobacterium   0.001411665 0.00000000 0.000381493 0.0004312542
## Candidatus_Solibacter 0.032284160 0.00000000 0.002034629 0.0020484573
##                               yy5          yy6
## Acidobacterium        0.000000000 0.0009638424
## Candidatus_Koribacter 0.000000000 0.0001927685
## Granulicella          0.000000000 0.0004819212
## Terriglobus           0.000000000 0.0007710739
## Chloracidobacterium   0.002803555 0.0008674582
## Candidatus_Solibacter 0.000000000 0.0018313006

Calculate Bray-Curtis distance among samples and convert the result to a matrix:

abrel_bray <- phyloseq::distance(relab_genera, method = "bray")
abrel_bray <- as.matrix(abrel_bray)
head(abrel_bray)[,1:6]
##           yy1       yy2       yy3       yy4       yy5       yy6
## yy1 0.0000000 0.9113060 0.7871159 0.8421657 0.9163139 0.9208827
## yy2 0.9113060 0.0000000 0.5488375 0.8984067 0.6060223 0.5351851
## yy3 0.7871159 0.5488375 0.0000000 0.7986645 0.5075713 0.4345278
## yy4 0.8421657 0.8984067 0.7986645 0.0000000 0.9259870 0.9028090
## yy5 0.9163139 0.6060223 0.5075713 0.9259870 0.0000000 0.3863192
## yy6 0.9208827 0.5351851 0.4345278 0.9028090 0.3863192 0.0000000

The “abrel_bray” matrix presents the distance between all samples, but since we wanna generate a boxplot with distances considering the metagenomes in each group separately, we need to filter this matrix. With the code below we end with a dataframe “df.bray” storing Bray Curtis Distances between metagenomes from the same groups.

sub_dist <- list()
groups_all <- sample_data(relab_genera)$Group

for (group in levels(groups_all)) { 
    row_group <- which(groups_all == group)
    sample_group <- sample_names(relab_genera)[row_group]
    sub_dist[[group]] <- abrel_bray[ sample_group, sample_group]
    sub_dist[[group]][!lower.tri(sub_dist[[group]])] <- NA
}

braygroups<- melt(sub_dist)
df.bray <- braygroups[complete.cases(braygroups), ]
df.bray$L1 <- factor(df.bray$L1, levels=names(sub_dist))

head(df.bray)
##   Var1 Var2     value L1
## 2  yy2  yy1 0.9113060 YY
## 3  yy3  yy1 0.7871159 YY
## 4  yy4  yy1 0.8421657 YY
## 5  yy5  yy1 0.9163139 YY
## 6  yy6  yy1 0.9208827 YY
## 7  yy7  yy1 0.9039108 YY

Now we generate the boxplot:

ggplot(df.bray, aes(x=L1, y=value, colour=L1)) +
    geom_jitter() + 
  geom_boxplot(alpha=0.6) +  
  theme(legend.position="none") +
  ylab("Bray-Curtis diversity") +
  theme(axis.title.x=element_blank(), axis.text.x=element_text(angle=45,hjust=1,vjust=1,size=12), axis.text.y=element_text(size=12))

To generate the PcoA plot, first get the ordination result with the “ordinate” command from phyloseq. It requires a phyloseq object, and it accepts diverse methods and distances. Next, generate the plot using the “plot_ordination()” function.

ord = ordinate(relab_genera, method="PCoA", distance = "bray")


plot_ordination(relab_genera, ord, color = "Group", shape="Test") + 
  geom_point(size=4) + 
  stat_ellipse(aes(group=Group))

To test if the Control and the Treatment Groups are significantly different from each other we can run a Permanova test using the adonis function from the vegan package.

samples <- data.frame(sample_data(relab_genera))
adonis(abrel_bray ~ Test, data = samples)
## 
## Call:
## adonis(formula = abrel_bray ~ Test, data = samples) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##            Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)    
## Test        1     8.228  8.2285  42.521 0.20996  0.001 ***
## Residuals 160    30.962  0.1935         0.79004           
## Total     161    39.191                 1.00000           
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This result means that the adonis test is significant, therefore the null hypothesis in which the Control/Treatment have the same centroid was reject.