This dataset comes from the doctoral dissertation of Dr. LaShanda Williams (1). You will find the sequences in this dataset here.
Note: The samples were shotgun sequenced, and only their 16S rRNA gene profiles were used in the analysis below.

This coding demonstration shows how phyloseq and ggplot2 can produce clear, engaging, and easily modifiable visualizations as you conduct your microbiome data analysis. It covers three essential analyses that belong in any exploratory data analysis pipeline: taxonomy plots, alpha diversity plots, and beta diversity plots.

Ideally, you would first run pre-processing steps that remove samples with low read counts and taxa that are poorly represented across samples, following best practices for the microbiome you are studying. You would also run any decontamination analysis and check for batch effects. See our blog post on the topic to learn more. We also published a coding demonstration illustrating the utility of the decontam package here.
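As a rough illustration of that filtering step, the sketch below applies two filters to a small count table in base R: it drops samples with too few total reads, then drops taxa detected in too few of the remaining samples. The table, the thresholds (`min_reads`, `min_prevalence`) and all variable names are hypothetical; sensible cutoffs depend on your study system. On a phyloseq object, the same idea is usually expressed with `prune_samples()` and `filter_taxa()`.

```r
# Hypothetical toy count table: rows are taxa, columns are samples.
counts <- matrix(
  c(120, 300,   2, 450,
     80,   0,   1,  10,
      0,   5,   0,   0,
     60, 150,   3, 200),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("taxon", 1:4), paste0("sample", 1:4))
)

# Assumed thresholds for illustration only.
min_reads      <- 100  # drop samples with fewer total reads than this
min_prevalence <- 0.5  # keep taxa present in at least half of the retained samples

# 1. Remove samples with low total read counts.
sample_totals <- colSums(counts)
counts_kept   <- counts[, sample_totals >= min_reads, drop = FALSE]

# 2. Remove taxa detected (count > 0) in too few of the remaining samples.
prevalence  <- rowMeans(counts_kept > 0)
counts_kept <- counts_kept[prevalence >= min_prevalence, , drop = FALSE]

counts_kept
```

Here `sample3` (6 reads) is dropped first, and `taxon3` is then dropped because it appears in only one of the three remaining samples. Filtering samples before computing prevalence keeps near-empty samples from dragging prevalence down.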

Taxonomy Plots

In our exploratory data analysis, Methanobrevibacter appeared at a higher abundance than expected. Our previous coding demonstration showed that it is not a contaminant sequence. Let's use phyloseq to compare the abundance of Methanobrevibacter between the dental calculus samples and the blanks.

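library(phyloseq)
library(ggplot2)
# `a` is the phyloseq object from our previous demonstration; keep only Methanobrevibacter
a_archaea <- subset_taxa(a, Rank6 == "g__Methanobrevibacter")
# Stacked bar plot of abundance per sample, colored by sample type (calculus vs. blank)
bar <- plot_bar(a_archaea, fill = "Env")
# One panel per sample type; free x scales keep each group's samples in its own panel
bar +
    facet_grid(~ Env, scales = "free_x", space = "free_x") +
    theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())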
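`plot_bar()` plots whatever values `a` holds. If those are raw read counts, differences in sequencing depth between blanks and calculus samples can dominate the comparison, so relative abundance is often easier to interpret. The base R sketch below shows that calculation on hypothetical read counts; the data frame and its column names are made up for illustration. With phyloseq, applying `transform_sample_counts(a, function(x) x / sum(x))` before `subset_taxa()` should give comparable per-sample proportions.

```r
# Hypothetical data: total reads per sample and reads assigned to one genus.
samples <- data.frame(
  sample = paste0("S", 1:6),
  Env    = c("calculus", "calculus", "calculus", "blank", "blank", "blank"),
  total  = c(1000, 800, 1200, 500, 400, 600),
  genus  = c(150, 96, 240, 5, 0, 12)
)

# Relative abundance: the genus's share of each sample's reads,
# which makes samples with different sequencing depths comparable.
samples$rel_abund <- samples$genus / samples$total

# Mean relative abundance of the genus within each sample type.
group_means <- tapply(samples$rel_abund, samples$Env, mean)
round(group_means, 3)
```

In this toy example the genus makes up about 15.7% of reads in calculus samples on average versus 1% in blanks.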

Alpha Diversity Plots

In addition to taxonomic differences between blanks and samples, it is important to assess diversity. We expect blanks to show lower within-sample (alpha) diversity than calculus samples on all four measures.

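# Four alpha diversity measures per sample, colored by sample type
alpha <- plot_richness(a, measures = c("Observed", "Chao1", "Shannon", "Simpson"), color = "Env")

# Hide the crowded sample labels on the x-axis
alpha + theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())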
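To make these measures more concrete, here is a base R sketch that computes three of them for a single hypothetical sample using their textbook definitions: observed richness, the Shannon index (natural log), and the Simpson index in its "1 minus the sum of squared proportions" form. We believe these match what `plot_richness()` reports, but check the phyloseq documentation for your version. Chao1 is left out because its bias-corrected form differs between implementations.

```r
# Hypothetical counts for one sample (one value per taxon).
x <- c(50, 30, 10, 10, 0)

# Keep only taxa that were observed, then convert counts to proportions.
observed_counts <- x[x > 0]
p <- observed_counts / sum(observed_counts)

observed <- length(observed_counts)  # richness: number of taxa present
shannon  <- -sum(p * log(p))         # Shannon index with the natural log
simpson  <- 1 - sum(p^2)             # Gini-Simpson: 1 minus the sum of squared proportions

c(Observed = observed, Shannon = round(shannon, 3), Simpson = round(simpson, 3))
```

Dropping zero counts before taking logs avoids `0 * log(0)`, which R evaluates to `NaN`. A blank dominated by a few taxa would score lower on both Shannon and Simpson than an even, species-rich calculus sample.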

Beta Diversity Plots

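Beta diversity measures differences in composition between samples. Ordinating a beta diversity distance matrix, here Bray-Curtis dissimilarity with principal coordinates analysis (PCoA), reduces the dimensionality of microbiome data by generating new axes that capture as much of the between-sample variation as possible.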

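# Bray-Curtis dissimilarities ordinated with principal coordinates analysis
a.ord <- ordinate(a, "PCoA", "bray")
# Plot samples on the first two axes, colored by sample type
plot_ordination(a, a.ord, type = "samples", color = "Env")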
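The ordination above can be approximated in base R to show what happens under the hood: compute pairwise Bray-Curtis dissimilarities, then run PCoA on them. The count matrix is hypothetical, and `cmdscale()` (classical multidimensional scaling, another name for PCoA) stands in for whatever PCoA routine `ordinate()` uses internally, so coordinates may differ in sign or scaling from the phyloseq output.

```r
# Hypothetical counts: rows are samples, columns are taxa.
m <- matrix(
  c(10, 0, 5,
     8, 2, 5,
     0, 9, 1,
     1, 8, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("S", 1:4), paste0("taxon", 1:3))
)

# Bray-Curtis dissimilarity: summed absolute differences divided by summed counts.
bray_curtis <- function(a, b) sum(abs(a - b)) / sum(a + b)

# Fill a symmetric sample-by-sample dissimilarity matrix.
n <- nrow(m)
d <- matrix(0, n, n, dimnames = list(rownames(m), rownames(m)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- bray_curtis(m[i, ], m[j, ])
  }
}

# Principal coordinates analysis via classical multidimensional scaling.
pcoa <- cmdscale(as.dist(d), k = 2, eig = TRUE)

# Sample coordinates on the principal coordinates.
pcoa$points
```

Samples S1 and S2 share most of their taxa, as do S3 and S4, so the first axis separates those two pairs. Because Bray-Curtis is not a true metric, PCoA on it can produce negative eigenvalues; inspecting `pcoa$eig` shows how much variation each axis carries.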

sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Catalina 10.15.7
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] decontam_1.13.0 ggplot2_3.3.5   dplyr_1.0.7     phyloseq_1.30.0
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.7          ape_5.5             lattice_0.20-45    
##  [4] Biostrings_2.54.0   assertthat_0.2.1    digest_0.6.28      
##  [7] foreach_1.5.1       utf8_1.2.2          R6_2.5.1           
## [10] plyr_1.8.6          stats4_3.6.3        evaluate_0.14      
## [13] highr_0.9           pillar_1.6.4        zlibbioc_1.32.0    
## [16] rlang_0.4.12        data.table_1.14.2   vegan_2.5-7        
## [19] jquerylib_0.1.4     S4Vectors_0.24.4    Matrix_1.3-4       
## [22] rmarkdown_2.11      labeling_0.4.2      splines_3.6.3      
## [25] stringr_1.4.0       igraph_1.2.8        munsell_0.5.0      
## [28] compiler_3.6.3      xfun_0.28           pkgconfig_2.0.3    
## [31] BiocGenerics_0.32.0 multtest_2.42.0     mgcv_1.8-38        
## [34] htmltools_0.5.2     biomformat_1.14.0   tidyselect_1.1.1   
## [37] tibble_3.1.6        IRanges_2.20.2      codetools_0.2-18   
## [40] permute_0.9-5       fansi_0.5.0         withr_2.4.2        
## [43] crayon_1.4.2        MASS_7.3-54         grid_3.6.3         
## [46] nlme_3.1-153        jsonlite_1.7.2      gtable_0.3.0       
## [49] lifecycle_1.0.1     DBI_1.1.1           magrittr_2.0.1     
## [52] scales_1.1.1        stringi_1.7.5       farver_2.1.0       
## [55] XVector_0.26.0      reshape2_1.4.4      bslib_0.3.1        
## [58] ellipsis_0.3.2      vctrs_0.3.8         generics_0.1.1     
## [61] Rhdf5lib_1.8.0      iterators_1.0.13    tools_3.6.3        
## [64] ade4_1.7-18         Biobase_2.46.0      glue_1.5.0         
## [67] purrr_0.3.4         parallel_3.6.3      fastmap_1.1.0      
## [70] survival_3.2-13     yaml_2.2.1          colorspace_2.0-2   
## [73] rhdf5_2.30.1        cluster_2.1.2       knitr_1.36         
## [76] sass_0.4.0

Need help with your microbiome data analysis? Learn more about our Microbiome Data Analysis Suite offered by the BioData Lab.

Or schedule a consultation with us today!

References

  1. Williams, LaShanda Rena. Paleopathological and microbiological investigations of dental health in America since 1890. Diss. Rutgers, The State University of New Jersey, School of Graduate Studies, 2019.

  2. Xia, Yinglin, Jun Sun, and Ding-Geng Chen. Statistical analysis of microbiome data with R. Vol. 847. Singapore: Springer, 2018.