This tutorial shows you how to compare alpha diversity of different sample groups once you have a phyloseq object.
(Packages must be installed prior to running this tutorial)
library(tidyverse)
## Warning in system("timedatectl", intern = TRUE): running command 'timedatectl'
## had status 1
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(phyloseq)
library(FSA)
## ## FSA v0.9.4. See citation('FSA') if used in publication.
## ## Run fishR() for related website and fishR('IFAR') for related book.
#set seed for reproducible data
set.seed(81)
This tutorial assumes that you have a phyloseq object of the data that you want to plot. The example phyloseq object shown here has 9 samples, 9 sample variables, and 12,003 unique taxa.
toy_ps <- read_rds("/home/lgschaer/old/Tutorials/toy_phyloseq.rds")
toy_ps
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 12003 taxa and 9 samples ]
## sample_data() Sample Data: [ 9 samples by 9 sample variables ]
## tax_table() Taxonomy Table: [ 12003 taxa by 7 taxonomic ranks ]
This phyloseq object has 9 sample variables: SampleID, Environoment, Sample_Type, Replicate, Substrate, Transfer, Substrate_Label, Environment_Label_Location, and Environment_Label.
head(sample_data(toy_ps))
## SampleID Environment Sample_Type Replicate Substrate Transfer
## CCONR1T1 CCONR1T1 Compost Enrichment R1 Control T1
## CCONR2T1 CCONR2T1 Compost Enrichment R2 Control T1
## CCONR3T1 CCONR3T1 Compost Enrichment R3 Control T1
## CTAR1T1 CTAR1T1 Compost Enrichment R1 TA T1
## CTAR2T1 CTAR2T1 Compost Enrichment R2 TA T1
## CTAR3T1 CTAR3T1 Compost Enrichment R3 TA T1
## Substrate_Label Environment_Label_Location Environment_Label
## CCONR1T1 Control Compost\nCalumet, MI Compost
## CCONR2T1 Control Compost\nCalumet, MI Compost
## CCONR3T1 Control Compost\nCalumet, MI Compost
## CTAR1T1 Terephthalamide Compost\nCalumet, MI Compost
## CTAR2T1 Terephthalamide Compost\nCalumet, MI Compost
## CTAR3T1 Terephthalamide Compost\nCalumet, MI Compost
Start with a simple plot with no fancy formatting.
toy_ps %>% #phyloseq object
plot_richness(
x = "Substrate", #change to the variable you want to compare
measures = c("Observed", "Shannon")) + #choose diversity measure, also available: Chao1, ACE, Simpson, InvSimpson, Fisher
geom_boxplot(aes(fill = Substrate), show.legend = FALSE)
## Warning in estimate_richness(physeq, split = TRUE, measures = measures): The data you have provided does not have
## any singletons. This is highly suspicious. Results of richness
## estimates (for example) are probably unreliable, or wrong, if you have already
## trimmed low-abundance taxa from the data.
##
## We recommended that you find the un-trimmed data and retry.
Add more layers to the plot to clean up the formatting and make the plot prettier.
#sample colors
sample_colors <- c("lightblue", "maroon", "olivedrab")
#violin plot
toy_ps %>% #phyloseq object
plot_richness(
x = "Substrate", #compare diversity of datatype
measures = c("Observed", "Shannon")) + #choose diversity measures
geom_boxplot(aes(fill = Substrate), show.legend = FALSE)+ #make violin plot, set fill aes to sampletype
theme_linedraw()+ #change theme to classic
xlab(NULL)+ #no label on x-axis
theme(axis.text.y.left = element_text(size = 20), #adjust y-axis text
axis.text.x = element_text(size = 20, hjust = 0.5, vjust = 1, angle = 0), #adjust x-axis label position
axis.title.y = element_text(size = 20))+ #adjust y-axis title
theme(strip.text = element_text(face = "bold", size = 25))+ #adjust headings
scale_fill_manual(values = sample_colors)+ #set fill colors
theme(plot.title=element_text(size = 25, face = "bold", hjust = 0.5)) #change title size, face and position
## Warning in estimate_richness(physeq, split = TRUE, measures = measures): The data you have provided does not have
## any singletons. This is highly suspicious. Results of richness
## estimates (for example) are probably unreliable, or wrong, if you have already
## trimmed low-abundance taxa from the data.
##
## We recommended that you find the un-trimmed data and retry.
We have visualized differences between sample groups, but are these differences statistically significant? We will use a Kruskall-Wallis test paired with a dunn post-hoc test to find out. These tests were chosen because they are robust with non-normally distributed data (most 16S rRNA data is not normally distributed).
Start by making a data frame with the calculated alpha diversity metrics for each sample. The resulting data frame should have a column for each alpha diversity metric you specify.
alphadiv <- estimate_richness(toy_ps, measures = c("Observed", "Shannon")) %>%
rownames_to_column(var = "SampleID") %>%
left_join(as.data.frame(sample_data(toy_ps)), by = "SampleID")
## Warning in estimate_richness(toy_ps, measures = c("Observed", "Shannon")): The data you have provided does not have
## any singletons. This is highly suspicious. Results of richness
## estimates (for example) are probably unreliable, or wrong, if you have already
## trimmed low-abundance taxa from the data.
##
## We recommended that you find the un-trimmed data and retry.
head(alphadiv)
## SampleID Observed Shannon Environment Sample_Type Replicate Substrate
## 1 CCONR1T1 253 4.634398 Compost Enrichment R1 Control
## 2 CCONR2T1 3 0.687092 Compost Enrichment R2 Control
## 3 CCONR3T1 152 4.388443 Compost Enrichment R3 Control
## 4 CTAR1T1 422 4.774065 Compost Enrichment R1 TA
## 5 CTAR2T1 542 4.839662 Compost Enrichment R2 TA
## 6 CTAR3T1 639 4.010840 Compost Enrichment R3 TA
## Transfer Substrate_Label Environment_Label_Location Environment_Label
## 1 T1 Control Compost\nCalumet, MI Compost
## 2 T1 Control Compost\nCalumet, MI Compost
## 3 T1 Control Compost\nCalumet, MI Compost
## 4 T1 Terephthalamide Compost\nCalumet, MI Compost
## 5 T1 Terephthalamide Compost\nCalumet, MI Compost
## 6 T1 Terephthalamide Compost\nCalumet, MI Compost
#Run the Kruskall-Wallis test
You will need to run a test for each alpha diversity measure that you are interested in.
#Observed
kruskal.test(Observed ~ Substrate, data = alphadiv)
##
## Kruskal-Wallis rank sum test
##
## data: Observed by Substrate
## Kruskal-Wallis chi-squared = 5.6, df = 2, p-value = 0.06081
#Shannon
kruskal.test(Shannon ~ Substrate, data = alphadiv)
##
## Kruskal-Wallis rank sum test
##
## data: Shannon by Substrate
## Kruskal-Wallis chi-squared = 1.1556, df = 2, p-value = 0.5611
We see that the test is not statistically significant for either diversity measure (p > 0.05). For the sake of practice, let’s run the post-hoc test anyway. Normally you would not run a post-hoc test after a non-significant test.
the post-hoc test shows you between which pairwise comparisons there is a statistically significant difference.
##Observed
dunnO <- dunnTest(Observed ~ Substrate,
data=alphadiv,
method="bh")
## Warning: Substrate was coerced to a factor.
dunnO
## Dunn (1964) Kruskal-Wallis multiple comparison
## p-values adjusted with the Benjamini-Hochberg method.
## Comparison Z P.unadj P.adj
## 1 Control - TA -2.2360680 0.02534732 0.07604196
## 2 Control - TPA -0.4472136 0.65472085 0.65472085
## 3 TA - TPA 1.7888544 0.07363827 0.11045741
##Shannon
dunnS <- dunnTest(Shannon ~ Substrate,
data=alphadiv,
method="bh")
## Warning: Substrate was coerced to a factor.
dunnS
## Dunn (1964) Kruskal-Wallis multiple comparison
## p-values adjusted with the Benjamini-Hochberg method.
## Comparison Z P.unadj P.adj
## 1 Control - TA -0.7453560 0.4560565 0.6840848
## 2 Control - TPA 0.2981424 0.7655945 0.7655945
## 3 TA - TPA 1.0434984 0.2967175 0.8901526