Here I am using the data from the bacterial communities in response to fungicide treatments. This is a phyloseq object and saved in RDS format. To load it in to R, use the readRDS() function. I moved this data into the current directory for our convenience.
epi16S <- readRDS("C:/Users/Xiaoping/OneDrive - Virginia Tech/Xiaoping Li/Location_impacts/Gloria/bsthit_epi_clean_1000_updateDec14.rds")
Using class() to check that it is a phyloseq object.
A phyloseq object usually contains a count table (otu_table), a taxonomy table(tax_table), a metadata table (sample_data), and can also include tree, sequences, etc. Check the official page for more: here
class(epi16S)
## [1] "phyloseq"
## attr(,"package")
## [1] "phyloseq"
This data has an otu table, a sample data, a taxonomy table and a tree, you can use functions, like otu_table(), tax_table, sample_data() to extract the corresponding data
epi16S
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 4645 taxa and 110 samples ]
## sample_data() Sample Data: [ 110 samples by 18 sample variables ]
## tax_table() Taxonomy Table: [ 4645 taxa by 8 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 4645 tips and 4644 internal nodes ]
otu16S = otu_table(epi16S)
class(otu16S) = "matrix"
## Warning in class(otu16S) = "matrix": Setting class(x) to "matrix" sets attribute
## to NULL; result will no longer be an S4 object
tax16S = as.matrix(tax_table(epi16S))
meta16S = as.data.frame(sample_data(epi16S))
otu table looks like this (Showing first 10 rows and columns)—row: taxa, column: sample in the metadata, each cell has the sequencing abundance (how many reads assigned to that taxon)
print(otu16S[1:2, 1:10])
## A0NC1_barcode01 A0NC2_barcode33 A0NC3_barcode01
## OTU905:g__Mycoplasma 7 13 11
## OTU981:c__Planctomycetia 1 3 1
## A1NC1_barcode02 A1NC2_barcode34 A1NC3_barcode02
## OTU905:g__Mycoplasma 6 2 9
## OTU981:c__Planctomycetia 0 0 0
## A1NC4_barcode34 A2NC1_barcode03 A2NC2_barcode35
## OTU905:g__Mycoplasma 4 3 1
## OTU981:c__Planctomycetia 0 2 0
## A2NC3_barcode03
## OTU905:g__Mycoplasma 3
## OTU981:c__Planctomycetia 5
tax table looks like this, showing the lineage (7 levels):
print(tax16S[1:5, 1:7])
## Taxonomy Table: [5 taxa by 7 taxonomic ranks]:
## Domain Phylum
## OTU905:g__Mycoplasma "Bacteria" "Tenericutes"
## OTU981:c__Planctomycetia "Bacteria" "Planctomycetes"
## OTU770:Corynebacterium matruchotii "Bacteria" "Actinobacteria"
## OTU457:Desulfobacula toluolica "Bacteria" "Proteobacteria"
## OTU2343:Luteimicrobium xylanilyticum "Bacteria" "Actinobacteria"
## Class Order
## OTU905:g__Mycoplasma "Mollicutes" "Mycoplasmatales"
## OTU981:c__Planctomycetia "Planctomycetia" "c__Planctomycetia"
## OTU770:Corynebacterium matruchotii "Actinomycetia" "Corynebacteriales"
## OTU457:Desulfobacula toluolica "Deltaproteobacteria" "Desulfobacterales"
## OTU2343:Luteimicrobium xylanilyticum "Actinomycetia" "Micrococcales"
## Family Genus
## OTU905:g__Mycoplasma "Mycoplasmataceae" "Mycoplasma"
## OTU981:c__Planctomycetia "c__Planctomycetia" "c__Planctomycetia"
## OTU770:Corynebacterium matruchotii "Corynebacteriaceae" "Corynebacterium"
## OTU457:Desulfobacula toluolica "Desulfobacteraceae" "Desulfobacula"
## OTU2343:Luteimicrobium xylanilyticum "o__Micrococcales" "Luteimicrobium"
## Species
## OTU905:g__Mycoplasma "g__Mycoplasma"
## OTU981:c__Planctomycetia "c__Planctomycetia"
## OTU770:Corynebacterium matruchotii "Corynebacterium matruchotii"
## OTU457:Desulfobacula toluolica "Desulfobacula toluolica"
## OTU2343:Luteimicrobium xylanilyticum "Luteimicrobium xylanilyticum"
metadata looks like this, showing what samples have been taken, when, and other related information:
print(meta16S[1:5, 1:10])
## Barcodes Original_label Amplicon Seq_run Flow_cell Fungicide
## A0NC1_barcode01 barcode01 A0NC1 16S 1 1 NT
## A0NC2_barcode33 barcode33 A0NC2 16S 2 1 NT
## A0NC3_barcode01 barcode01 A0NC3 16S 3 2 NT
## A1NC1_barcode02 barcode02 A1NC1 16S 1 1 NT
## A1NC2_barcode34 barcode34 A1NC2 16S 2 1 NT
## Fungicide_code time Month Season
## A0NC1_barcode01 A 0 May ES
## A0NC2_barcode33 A 0 May ES
## A0NC3_barcode01 A 0 May ES
## A1NC1_barcode02 A 1 May ES
## A1NC2_barcode34 A 1 May ES
library(vegan)
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.6-4
# need to transpose first
rarecurve(t(otu16S), label=F)
or you can use ampvis2 package, so the plot would be ggplot2 object for easier manipulation
# ampvis2 is restrict on the tax_table, need to keep to 7 rankings
tax16S = as.matrix(tax_table(epi16S))
tax16S = tax16S[, 1:7]
tax_table(epi16S) = tax16S
amp.epi = phyloseq_to_ampvis2(epi16S)
amp_rarecurve(amp.epi, color_by = "funlab", stepsize = 100) +
geom_vline(aes(xintercept = median(sample_sums(epi16S))), linetype="dashed", color="darkgreen") +
scale_color_brewer(palette = "Set2",labels=c("Nontreated", "Daconil", "Banner Maxx", "Concert II")) +
labs(color = "Fungicides", y="Number of observed OTUs") +
theme(text = element_text(size=12, color="black"),
axis.title.x = element_text(size=12, color="black", margin = margin(t=10)),
axis.title.y = element_text(size=12, color="black", margin = margin(r=10)),
axis.text.x = element_text(size=12, color="black"),
axis.text.y = element_text(size=12, color="black"))
meta16S.epi = data.frame(sample_data(epi16S))
goods.epi = phyloseq_coverage(epi16S, correct_singletons = T)
rownames(goods.epi) = goods.epi$SampleID
epi.coverage = merge(goods.epi, meta16S.epi, by = "row.names")
head(epi.coverage)
## Row.names SampleID SampleCoverage Barcodes Original_label
## 1 A0NC1_barcode01 A0NC1_barcode01 0.9889398 barcode01 A0NC1
## 2 A0NC2_barcode33 A0NC2_barcode33 0.9856085 barcode33 A0NC2
## 3 A0NC3_barcode01 A0NC3_barcode01 0.9968621 barcode01 A0NC3
## 4 A1NC1_barcode02 A1NC1_barcode02 0.9803316 barcode02 A1NC1
## 5 A1NC2_barcode34 A1NC2_barcode34 0.9728516 barcode34 A1NC2
## 6 A1NC3_barcode02 A1NC3_barcode02 0.9937704 barcode02 A1NC3
## Amplicon Seq_run Flow_cell Fungicide Fungicide_code time Month Season
## 1 16S 1 1 NT A 0 May ES
## 2 16S 2 1 NT A 0 May ES
## 3 16S 3 2 NT A 0 May ES
## 4 16S 1 1 NT A 1 May ES
## 5 16S 2 1 NT A 1 May ES
## 6 16S 3 2 NT A 1 May ES
## Date time_label Rep Disease funlab funlab2 Spl_time Season_lab
## 1 5/26/2021 pre 1 10 NT Nontreated 0 Early summer
## 2 5/26/2021 pre 2 10 NT Nontreated 0 Early summer
## 3 5/26/2021 pre 3 0 NT Nontreated 0 Early summer
## 4 5/27/2021 1dp 1 10 NT Nontreated 1 Early summer
## 5 5/27/2021 1dp 2 10 NT Nontreated 1 Early summer
## 6 5/27/2021 1dp 3 0 NT Nontreated 1 Early summer
# here
epi.coverage %>%
group_by(Season, funlab) %>%
summarise(mean_coverage = mean(SampleCoverage), stderr = sd(SampleCoverage)/sqrt(n()))
## `summarise()` has grouped output by 'Season'. You can override using the
## `.groups` argument.
## # A tibble: 8 × 4
## # Groups: Season [2]
## Season funlab mean_coverage stderr
## <fct> <fct> <dbl> <dbl>
## 1 ES NT 0.985 0.00267
## 2 ES DL 0.990 0.00184
## 3 ES BM 0.990 0.00182
## 4 ES Crt2 0.991 0.00166
## 5 EF NT 0.995 0.00146
## 6 EF DL 0.975 0.0105
## 7 EF BM 0.978 0.00934
## 8 EF Crt2 0.994 0.000869
please review what is observed richness, shannon index, simpsons, Chao1, ACE
which of those only measure richness, and which of those measure evenness, and which measure diversity (richness and evenness)
if the sequencing depth allows, we can rarefy the data first (IMPORTANT: need to find an appropriate depth where it retains a reasonable amount of sequencing reads while keeping most of the samples)
In this case, the minimal depth of data is 1,669, I will just use this one as an example, the parameter ‘rngseed’ can be set as any number, next time you run it, you will get the exact result (because it involves random sampling so if you don’t set it, everytime it will produce a slightly different output)
min(sample_sums(epi16S)) #1669
## [1] 1669
epi_rare = phyloseq::rarefy_even_depth(epi16S, rngseed = 123, replace = F)
## `set.seed(123)` was used to initialize repeatable random subsampling.
## Please record this for your records so others can reproduce.
## Try `set.seed(123); .Random.seed` for the full vector
## ...
## 1360OTUs were removed because they are no longer
## present in any sample after random subsampling
## ...
After rarefaction, we can estimate richness and diversity
div_epi16S = phyloseq::estimate_richness(epi_rare, measures = c("Observed", "Chao1", "ACE", "Shannon", "Simpson", "InvSimpson"))
head(div_epi16S)
## Observed Chao1 se.chao1 ACE se.ACE Shannon Simpson
## A0NC1_barcode01 287 609.6364 71.50498 648.9985 16.54194 4.126091 0.9561377
## A0NC2_barcode33 329 665.0185 69.02996 724.7856 17.25638 4.113913 0.9232927
## A0NC3_barcode01 300 686.2174 82.04864 785.7976 19.12988 3.685522 0.9021501
## A1NC1_barcode02 235 483.1081 60.75085 525.0204 14.11728 3.788541 0.9424328
## A1NC2_barcode34 301 640.3750 72.44460 687.7792 17.08671 3.793132 0.9158557
## A1NC3_barcode02 226 578.8333 88.30718 620.3770 17.27779 3.140454 0.8685044
## InvSimpson
## A0NC1_barcode01 22.798643
## A0NC2_barcode33 13.036561
## A0NC3_barcode01 10.219729
## A1NC1_barcode02 17.370997
## A1NC2_barcode34 11.884350
## A1NC3_barcode02 7.604818
mrg.epi = merge(meta16S.epi, div_epi16S, by.x="row.names", by.y="row.names")
head(mrg.epi)
## Row.names Barcodes Original_label Amplicon Seq_run Flow_cell Fungicide
## 1 A0NC1_barcode01 barcode01 A0NC1 16S 1 1 NT
## 2 A0NC2_barcode33 barcode33 A0NC2 16S 2 1 NT
## 3 A0NC3_barcode01 barcode01 A0NC3 16S 3 2 NT
## 4 A1NC1_barcode02 barcode02 A1NC1 16S 1 1 NT
## 5 A1NC2_barcode34 barcode34 A1NC2 16S 2 1 NT
## 6 A1NC3_barcode02 barcode02 A1NC3 16S 3 2 NT
## Fungicide_code time Month Season Date time_label Rep Disease funlab
## 1 A 0 May ES 5/26/2021 pre 1 10 NT
## 2 A 0 May ES 5/26/2021 pre 2 10 NT
## 3 A 0 May ES 5/26/2021 pre 3 0 NT
## 4 A 1 May ES 5/27/2021 1dp 1 10 NT
## 5 A 1 May ES 5/27/2021 1dp 2 10 NT
## 6 A 1 May ES 5/27/2021 1dp 3 0 NT
## funlab2 Spl_time Season_lab Observed Chao1 se.chao1 ACE se.ACE
## 1 Nontreated 0 Early summer 287 609.6364 71.50498 648.9985 16.54194
## 2 Nontreated 0 Early summer 329 665.0185 69.02996 724.7856 17.25638
## 3 Nontreated 0 Early summer 300 686.2174 82.04864 785.7976 19.12988
## 4 Nontreated 1 Early summer 235 483.1081 60.75085 525.0204 14.11728
## 5 Nontreated 1 Early summer 301 640.3750 72.44460 687.7792 17.08671
## 6 Nontreated 1 Early summer 226 578.8333 88.30718 620.3770 17.27779
## Shannon Simpson InvSimpson
## 1 4.126091 0.9561377 22.798643
## 2 4.113913 0.9232927 13.036561
## 3 3.685522 0.9021501 10.219729
## 4 3.788541 0.9424328 17.370997
## 5 3.793132 0.9158557 11.884350
## 6 3.140454 0.8685044 7.604818
epi_adiv2 = mrg.epi %>%
tidyr::gather(key = "measure", value = "value", Observed:InvSimpson)
# select "Observed" for observed richness, or "Shannon" for shannon index, etc.
epi.df = epi_adiv2 %>%
filter(measure == "Observed")
aov.epi = epi.df %>%
aov(value ~ funlab*time_label*Season, data=.)
# see anova result
summary(aov.epi)
## Df Sum Sq Mean Sq F value Pr(>F)
## funlab 3 313670 104557 33.272 5.94e-14 ***
## time_label 3 450034 150011 47.736 < 2e-16 ***
## Season 1 103742 103742 33.013 1.69e-07 ***
## funlab:time_label 9 27732 3081 0.981 0.46276
## funlab:Season 3 51295 17098 5.441 0.00189 **
## time_label:Season 3 114678 38226 12.164 1.30e-06 ***
## funlab:time_label:Season 9 24578 2731 0.869 0.55631
## Residuals 78 245114 3142
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1