Alpha Diversity Tutorial

This tutorial shows you how to compare alpha diversity of different sample groups once you have a phyloseq object.

Start by loading the required R packages

(Packages must be installed prior to running this tutorial)

library(tidyverse)
## Warning in system("timedatectl", intern = TRUE): running command 'timedatectl'
## had status 1
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(phyloseq)
library(FSA)
## ## FSA v0.9.4. See citation('FSA') if used in publication.
## ## Run fishR() for related website and fishR('IFAR') for related book.
#set seed for reproducible data
set.seed(81)

Load your phyloseq object

This tutorial assumes that you have a phyloseq object of the data that you want to plot. The example phyloseq object shown here has 9 samples, 9 sample variables, and 12,003 unique taxa.

toy_ps <- read_rds("/home/lgschaer/old/Tutorials/toy_phyloseq.rds")  
toy_ps
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 12003 taxa and 9 samples ]
## sample_data() Sample Data:       [ 9 samples by 9 sample variables ]
## tax_table()   Taxonomy Table:    [ 12003 taxa by 7 taxonomic ranks ]

Explore the sample variables in the test phyloseq object

This phyloseq object has 9 sample variables: SampleID, Environoment, Sample_Type, Replicate, Substrate, Transfer, Substrate_Label, Environment_Label_Location, and Environment_Label.

head(sample_data(toy_ps))
##          SampleID Environment Sample_Type Replicate Substrate Transfer
## CCONR1T1 CCONR1T1     Compost  Enrichment        R1   Control       T1
## CCONR2T1 CCONR2T1     Compost  Enrichment        R2   Control       T1
## CCONR3T1 CCONR3T1     Compost  Enrichment        R3   Control       T1
## CTAR1T1   CTAR1T1     Compost  Enrichment        R1        TA       T1
## CTAR2T1   CTAR2T1     Compost  Enrichment        R2        TA       T1
## CTAR3T1   CTAR3T1     Compost  Enrichment        R3        TA       T1
##          Substrate_Label Environment_Label_Location Environment_Label
## CCONR1T1         Control       Compost\nCalumet, MI           Compost
## CCONR2T1         Control       Compost\nCalumet, MI           Compost
## CCONR3T1         Control       Compost\nCalumet, MI           Compost
## CTAR1T1  Terephthalamide       Compost\nCalumet, MI           Compost
## CTAR2T1  Terephthalamide       Compost\nCalumet, MI           Compost
## CTAR3T1  Terephthalamide       Compost\nCalumet, MI           Compost

Make a simple plot

Start with a simple plot with no fancy formatting.

toy_ps %>%                                           #phyloseq object
  plot_richness(
    x = "Substrate",                                 #change to the variable you want to compare
    measures = c("Observed", "Shannon")) +           #choose diversity measure, also available: Chao1, ACE, Simpson, InvSimpson, Fisher
  geom_boxplot(aes(fill = Substrate), show.legend = FALSE)
## Warning in estimate_richness(physeq, split = TRUE, measures = measures): The data you have provided does not have
## any singletons. This is highly suspicious. Results of richness
## estimates (for example) are probably unreliable, or wrong, if you have already
## trimmed low-abundance taxa from the data.
## 
## We recommended that you find the un-trimmed data and retry.

Add more layers to the plot to clean up the formatting and make the plot prettier.

#sample colors
sample_colors <- c("lightblue", "maroon", "olivedrab")

#violin plot
toy_ps %>%                                                              #phyloseq object
  plot_richness(
    x = "Substrate",                                                    #compare diversity of datatype
    measures = c("Observed", "Shannon")) +                              #choose diversity measures
  geom_boxplot(aes(fill = Substrate), show.legend = FALSE)+             #make violin plot, set fill aes to sampletype
  theme_linedraw()+                                                     #change theme to classic
  xlab(NULL)+                                                           #no label on x-axis
  theme(axis.text.y.left = element_text(size = 20),                     #adjust y-axis text
        axis.text.x = element_text(size = 20, hjust = 0.5, vjust = 1, angle = 0),           #adjust x-axis label position
        axis.title.y = element_text(size = 20))+                        #adjust y-axis title
  theme(strip.text = element_text(face = "bold", size = 25))+           #adjust headings
  scale_fill_manual(values = sample_colors)+                            #set fill colors
  theme(plot.title=element_text(size = 25, face = "bold", hjust = 0.5)) #change title size, face and position
## Warning in estimate_richness(physeq, split = TRUE, measures = measures): The data you have provided does not have
## any singletons. This is highly suspicious. Results of richness
## estimates (for example) are probably unreliable, or wrong, if you have already
## trimmed low-abundance taxa from the data.
## 
## We recommended that you find the un-trimmed data and retry.

Look for statistically significant differences between sample types

We have visualized differences between sample groups, but are these differences statistically significant? We will use a Kruskall-Wallis test paired with a dunn post-hoc test to find out. These tests were chosen because they are robust with non-normally distributed data (most 16S rRNA data is not normally distributed).

Start by making a data frame with the calculated alpha diversity metrics for each sample. The resulting data frame should have a column for each alpha diversity metric you specify.

alphadiv <- estimate_richness(toy_ps, measures = c("Observed", "Shannon")) %>%
  rownames_to_column(var = "SampleID") %>%
  left_join(as.data.frame(sample_data(toy_ps)), by = "SampleID") 
## Warning in estimate_richness(toy_ps, measures = c("Observed", "Shannon")): The data you have provided does not have
## any singletons. This is highly suspicious. Results of richness
## estimates (for example) are probably unreliable, or wrong, if you have already
## trimmed low-abundance taxa from the data.
## 
## We recommended that you find the un-trimmed data and retry.
head(alphadiv)
##   SampleID Observed  Shannon Environment Sample_Type Replicate Substrate
## 1 CCONR1T1      253 4.634398     Compost  Enrichment        R1   Control
## 2 CCONR2T1        3 0.687092     Compost  Enrichment        R2   Control
## 3 CCONR3T1      152 4.388443     Compost  Enrichment        R3   Control
## 4  CTAR1T1      422 4.774065     Compost  Enrichment        R1        TA
## 5  CTAR2T1      542 4.839662     Compost  Enrichment        R2        TA
## 6  CTAR3T1      639 4.010840     Compost  Enrichment        R3        TA
##   Transfer Substrate_Label Environment_Label_Location Environment_Label
## 1       T1         Control       Compost\nCalumet, MI           Compost
## 2       T1         Control       Compost\nCalumet, MI           Compost
## 3       T1         Control       Compost\nCalumet, MI           Compost
## 4       T1 Terephthalamide       Compost\nCalumet, MI           Compost
## 5       T1 Terephthalamide       Compost\nCalumet, MI           Compost
## 6       T1 Terephthalamide       Compost\nCalumet, MI           Compost

#Run the Kruskall-Wallis test

You will need to run a test for each alpha diversity measure that you are interested in.

#Observed
kruskal.test(Observed ~ Substrate, data = alphadiv) 
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Observed by Substrate
## Kruskal-Wallis chi-squared = 5.6, df = 2, p-value = 0.06081
#Shannon
kruskal.test(Shannon ~ Substrate, data = alphadiv)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Shannon by Substrate
## Kruskal-Wallis chi-squared = 1.1556, df = 2, p-value = 0.5611

We see that the test is not statistically significant for either diversity measure (p > 0.05). For the sake of practice, let’s run the post-hoc test anyway. Normally you would not run a post-hoc test after a non-significant test.

the post-hoc test shows you between which pairwise comparisons there is a statistically significant difference.

##Observed
dunnO <- dunnTest(Observed ~ Substrate,
                  data=alphadiv,
                  method="bh")
## Warning: Substrate was coerced to a factor.
dunnO
## Dunn (1964) Kruskal-Wallis multiple comparison
##   p-values adjusted with the Benjamini-Hochberg method.
##      Comparison          Z    P.unadj      P.adj
## 1  Control - TA -2.2360680 0.02534732 0.07604196
## 2 Control - TPA -0.4472136 0.65472085 0.65472085
## 3      TA - TPA  1.7888544 0.07363827 0.11045741
##Shannon
dunnS <- dunnTest(Shannon ~ Substrate,
                  data=alphadiv,
                  method="bh")
## Warning: Substrate was coerced to a factor.
dunnS
## Dunn (1964) Kruskal-Wallis multiple comparison
##   p-values adjusted with the Benjamini-Hochberg method.
##      Comparison          Z   P.unadj     P.adj
## 1  Control - TA -0.7453560 0.4560565 0.6840848
## 2 Control - TPA  0.2981424 0.7655945 0.7655945
## 3      TA - TPA  1.0434984 0.2967175 0.8901526