1 What curatedMetagenomicData provides
2 Setup
3 Choose a dataset for analysis
4 Choose metabolic datasets for analysis
5 Taxonomy-aware analysis with the phyloseq Bioconductor library

1 What `curatedMetagenomicData` provides

curatedMetagenomicData provides 6 types of data for each dataset:

1. species-level taxonomic profiles;
2-3) marker presence and abundance data; and
4-6) gene families, pathway coverage and pathway abundance.

Types 1-3 are generated by MetaPhlAn2; 4-6 are generated by HUMAnN2.

Currently, curatedMetagenomicData provides:

2,875 samples from 13 datasets, primarily of the human gut but including body sites profiled in the Human Microbiome Project
processed data from whole-metagenome shotgun metagenomics, with manually-curated metadata, as integrated and documented Bioconductor ExpressionSet objects
~200 fields of specimen metadata from original papers, supplementary files, and websites, with manual curation to standardize annotations
processing of data through the new MetaPhlAn2 pipeline (http://huttenhower.sph.harvard.edu/metaphlan2) for taxonomic abundance, and HUMAnN2 pipeline (Version 2 manuscript in preparation, http://huttenhower.sph.harvard.edu/humann2) for metabolic analysis.
This effort required analyzing ~23T of raw sequencing data, generating ~52T of intermediate files

These datasets are documented in the reference manual.

2 Setup

Load the ExperimentHub library,

library(ExperimentHub)

and create the hub:

eh = ExperimentHub()

## snapshotDate(): 2016-06-08

The following queries ExperimentHub for all records matching “curatedMetagenomicData”:

myquery = query(eh, "curatedMetagenomicData")

This is an abbreviated list of what was found:

myquery

## ExperimentHub with 162 records
## # snapshotDate(): 2016-06-08 
## # $dataprovider: Human Microbiome Project Consortium, BGI-Shenzhen, She...
## # $species: Homo sapiens
## # $rdataclass: ExpressionSet
## # additional mcols(): taxonomyid, genome, description,
## #   preparerclass, tags, sourceurl, sourcetype 
## # retrieve records with, e.g., 'object[["EH2"]]' 
## 
##           title                                      
##   EH2   | Candela_Africa.stool.abundance.eset.rda    
##   EH3   | Candela_Africa.stool.genefamilies.eset.rda 
##   EH4   | Candela_Africa.stool.marker_ab.eset.rda    
##   EH5   | Candela_Africa.stool.marker_pres.eset.rda  
##   EH6   | Candela_Africa.stool.pathabundance.eset.rda
##   ...     ...                                        
##   EH159 | t2dmeta_short.stool.genefamilies.eset.rda  
##   EH160 | t2dmeta_short.stool.marker_ab.eset.rda     
##   EH161 | t2dmeta_short.stool.marker_pres.eset.rda   
##   EH162 | t2dmeta_short.stool.pathabundance.eset.rda 
##   EH163 | t2dmeta_short.stool.pathcoverage.eset.rda

Note that the first column, “EH2”, “EH3”, for example, are unique IDs for each dataset. The low numbers here are because curatedMetagenomicData is the first major dataset to be distributed through ExperimentHub. For a full list, mcols(myquery) produces a data.frame. The following is unevaluated in this script, but can be used to display an interactive spreadsheet of the results:

View(mcols(myquery))

Or to write the search results to a csv file:

write.csv(mcols(myquery), file="curatedMetagenomicData_allrecords.csv")

3 Choose a dataset for analysis

Here we choose “EH2” from the View(mcols(myquery)) above. This loads the “Candela_Africa” stool dataset. Note that this operation will occur much more quickly subsequent times that it is performed, due to local caching.

Please ignore the warning that this generates, it is harmless and will not appear in the release version.

taxabund = eh[["EH2"]]

## BioC_mirror: https://bioconductor.org

## Using Bioconductor 3.4 (BiocInstaller 1.23.5), R 3.3.1 (2016-06-21).

## Installing package(s) 'curatedMetagenomicData'

## see ?curatedMetagenomicData and browseVignettes('curatedMetagenomicData') for documentation

## loading from cache '/home/nar/.ExperimentHub/2'

taxabund

## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 3302 features, 38 samples 
##   element names: exprs 
## protocolData: none
## phenoData
##   sampleNames: H10 H11 ... IT8 (38 total)
##   varLabels: dataset_name sampleID ... group (211 total)
##   varMetadata: labelDescription
## featureData: none
## experimentData: use 'experimentData(object)'
##   pubMedIds: 25981789 
## Annotation: NA

This ExpressionSet object contains information about the experiment:

experimentData(taxabund)@name

## [1] "Rampelli S, Schnorr SL, Consolandi C, Turroni S, Severgnini M, Peano C, Brigidi P, Crittenden AN, Henry AG, Candela M"

experimentData(taxabund)@title

## [1] "Metagenome Sequencing of the Hadza Hunter-Gatherer Gut Microbiota"

experimentData(taxabund)@abstract

## [1] "Through human microbiome sequencing, we can better understand how host evolutionary and ontogenetic history is reflected in the microbial function. However, there has been no information on the gut metagenome configuration in hunter-gatherer populations, posing a gap in our knowledge of gut microbiota (GM)-host mutualism arising from a lifestyle that describes over 90% of human evolutionary history. Here, we present the first metagenomic analysis of GM from Hadza hunter-gatherers of Tanzania, showing a unique enrichment in metabolic pathways that aligns with the dietary and environmental factors characteristic of their foraging lifestyle. We found that the Hadza GM is adapted for broad-spectrum carbohydrate metabolism, reflecting the complex polysaccharides in their diet. Furthermore, the Hadza GM is equipped for branched-chain amino acid degradation and aromatic amino acid biosynthesis. Resistome functionality demonstrates the existence of antibiotic resistance genes in a population with little antibiotic exposure, indicating the ubiquitous presence of environmentally derived resistances. Our results demonstrate how the functional specificity of the GM correlates with certain environment and lifestyle factors and how complexity from the exogenous environment can be balanced by endogenous homeostasis. The Hadza gut metagenome structure allows us to appreciate the co-adaptive functional role of the GM in complementing the human physiology, providing a better understanding of the versatility of human life and subsistence."

experimentData(taxabund)@other

## $platform_summary
## [1] NA
## 
## $platform_accession
## [1] NA
## 
## $platform_title
## [1] "GAIIx"
## 
## $platform_technology
## [1] "Illumina"
## 
## $platform_distribution
## [1] NA
## 
## $platform_manufacturer
## [1] "Illumina"
## 
## $processor
## [1] "MetaPhlAn2 (marker & abundance) / HUMAnN2 (geneFam., pathAbun., & pathCov.)"

It also contains information about the study subjects. For example, here are the first 11 columns of data (all that are available for this dataset), for the first 5 subjects:

pData(taxabund)[1:5, 1:11]

##       dataset_name sampleID subjectID bodysite disease age gender  country
## H10 Candela_Africa      H10       h10    stool       n  40 female tanzania
## H11 Candela_Africa      H11       h11    stool       n  29 female tanzania
## H12 Candela_Africa      H12       h12    stool       n   8 female tanzania
## H13 Candela_Africa      H13       h13    stool       n  34   male tanzania
## H14 Candela_Africa      H14       h14    stool       n  30   male tanzania
##     sequencing_technology pubmedid    camp
## H10              Illumina 25981789 dedauko
## H11              Illumina 25981789 dedauko
## H12              Illumina 25981789 dedauko
## H13              Illumina 25981789 dedauko
## H14              Illumina 25981789 dedauko

Again, this could be explored more fully by (unevaluated):

View(pData(taxabund))

And of course, it contains the table of taxonomic abundances. Here are the relative abundances for the first two Operational Taxonomic Units (OTUs), for the first five subjects:

exprs(taxabund)[1:2, 1:5]

##                                 H10     H11     H12     H13     H14
## k__Archaea                  0.24169 0.50621 0.30522 0.40133 0.17479
## k__Archaea|p__Euryarchaeota 0.24169 0.50621 0.30522 0.40133 0.17479

We refer users to the MetaPhlAn2 documentation for description of what these values represent.

4 Choose metabolic datasets for analysis

Recall that we looked up dataset codes such as “EH6” by doing View(mcols(myquery)).

Let’s look, for example, at pathway coverage for the Candela Africa dataset:

pathcoverage = myquery[["EH7"]]

## Warning: package 'curatedMetagenomicData' is not available (for R version
## 3.3.1)

Coverage of the first two pathways, for the first two subjects, is:

exprs(pathcoverage)[1:2, 1:2]

##                                                                      H1
## 12DICHLORETHDEG-PWY                                                0.25
## 12DICHLORETHDEG-PWY|g__Bacteroides.s__Bacteroides_cellulosilyticus   NA
##                                                                     H10
## 12DICHLORETHDEG-PWY                                                0.25
## 12DICHLORETHDEG-PWY|g__Bacteroides.s__Bacteroides_cellulosilyticus   NA

We refer users to the HUMAnN2 documentation for description of what these values represent.

5 Taxonomy-aware analysis with the `phyloseq` Bioconductor library

curatedMetagenomicData provides 6 types of data; of these, only the taxonomic profiles make sense to analyse as phylogenetic data. Here we use a convenience script from the Waldron lab to create the phyloseq object

library(phyloseq)
source("https://raw.githubusercontent.com/waldronlab/presentations/master/Waldron_2016-06-07_EPIC/metaphlanToPhyloseq.R")

And create the phyloseq object from the above-created object “taxabund”.

Candela = metaphlanToPhyloseq(tax = exprs(taxabund), 
                              metadat = pData(taxabund),
                              simplenames = TRUE, 
                              roundtointeger = TRUE)

Now we can do all sorts of taxonomy-aware analyses of this dataset…

summary(otu_table(Candela))

##       H10              H11              H12              H13        
##  Min.   :     0   Min.   :     0   Min.   :     0   Min.   :     0  
##  1st Qu.:     0   1st Qu.:     0   1st Qu.:     0   1st Qu.:     0  
##  Median :     0   Median :     0   Median :     0   Median :     0  
##  Mean   :  2415   Mean   :  2413   Mean   :  2412   Mean   :  2409  
##  3rd Qu.:     0   3rd Qu.:     0   3rd Qu.:     0   3rd Qu.:     0  
##  Max.   :997583   Max.   :994938   Max.   :996948   Max.   :995987  
##       H14              H15              H16              H17        
##  Min.   :     0   Min.   :     0   Min.   :     0   Min.   :     0  
##  1st Qu.:     0   1st Qu.:     0   1st Qu.:     0   1st Qu.:     0  
##  Median :     0   Median :     0   Median :     0   Median :     0  
##  Mean   :  2409   Mean   :  2413   Mean   :  2411   Mean   :  2412  
##  3rd Qu.:     0   3rd Qu.:     0   3rd Qu.:     0   3rd Qu.:     0  
##  Max.   :998252   Max.   :980141   Max.   :999055   Max.   :992864  
##       H18               H19               H1              H20        
##  Min.   :      0   Min.   :     0   Min.   :     0   Min.   :     0  
##  1st Qu.:      0   1st Qu.:     0   1st Qu.:     0   1st Qu.:     0  
##  Median :      0   Median :     0   Median :     0   Median :     0  
##  Mean   :   2418   Mean   :  2418   Mean   :  2370   Mean   :  2382  
##  3rd Qu.:      0   3rd Qu.:     0   3rd Qu.:     0   3rd Qu.:     0  
##  Max.   :1000000   Max.   :999642   Max.   :988258   Max.   :969768  
##       H21              H22              H23               H24        
##  Min.   :     0   Min.   :     0   Min.   :      0   Min.   :     0  
##  1st Qu.:     0   1st Qu.:     0   1st Qu.:      0   1st Qu.:     0  
##  Median :     0   Median :     0   Median :      0   Median :     0  
##  Mean   :  2384   Mean   :  2402   Mean   :   2412   Mean   :  2365  
##  3rd Qu.:     0   3rd Qu.:     0   3rd Qu.:      0   3rd Qu.:     0  
##  Max.   :989730   Max.   :989674   Max.   :1000000   Max.   :999042  
##       H25              H26              H27               H2         
##  Min.   :     0   Min.   :     0   Min.   :     0   Min.   :      0  
##  1st Qu.:     0   1st Qu.:     0   1st Qu.:     0   1st Qu.:      0  
##  Median :     0   Median :     0   Median :     0   Median :      0  
##  Mean   :  2405   Mean   :  2401   Mean   :  2396   Mean   :   2390  
##  3rd Qu.:     0   3rd Qu.:     0   3rd Qu.:     0   3rd Qu.:      0  
##  Max.   :994856   Max.   :982371   Max.   :985848   Max.   :1000000  
##        H3               H4               H5               H6         
##  Min.   :     0   Min.   :     0   Min.   :     0   Min.   :      0  
##  1st Qu.:     0   1st Qu.:     0   1st Qu.:     0   1st Qu.:      0  
##  Median :     0   Median :     0   Median :     0   Median :      0  
##  Mean   :  2418   Mean   :  2392   Mean   :  2381   Mean   :   2420  
##  3rd Qu.:     0   3rd Qu.:     0   3rd Qu.:     0   3rd Qu.:      0  
##  Max.   :999013   Max.   :994328   Max.   :985114   Max.   :1000000  
##        H7               H8               H9              IT11        
##  Min.   :     0   Min.   :     0   Min.   :     0   Min.   :      0  
##  1st Qu.:     0   1st Qu.:     0   1st Qu.:     0   1st Qu.:      0  
##  Median :     0   Median :     0   Median :     0   Median :      0  
##  Mean   :  2416   Mean   :  2402   Mean   :  2405   Mean   :   2417  
##  3rd Qu.:     0   3rd Qu.:     0   3rd Qu.:     0   3rd Qu.:      0  
##  Max.   :999459   Max.   :991263   Max.   :996313   Max.   :1000000  
##       IT13             IT14              IT1               IT2        
##  Min.   :     0   Min.   :      0   Min.   :      0   Min.   :     0  
##  1st Qu.:     0   1st Qu.:      0   1st Qu.:      0   1st Qu.:     0  
##  Median :     0   Median :      0   Median :      0   Median :     0  
##  Mean   :  2411   Mean   :   2410   Mean   :   2402   Mean   :  2392  
##  3rd Qu.:     0   3rd Qu.:      0   3rd Qu.:      0   3rd Qu.:     0  
##  Max.   :999760   Max.   :1000000   Max.   :1000000   Max.   :972812  
##       IT3               IT4              IT5               IT6        
##  Min.   :      0   Min.   :     0   Min.   :      0   Min.   :     0  
##  1st Qu.:      0   1st Qu.:     0   1st Qu.:      0   1st Qu.:     0  
##  Median :      0   Median :     0   Median :      0   Median :     0  
##  Mean   :   2351   Mean   :  2390   Mean   :   2402   Mean   :  2381  
##  3rd Qu.:      0   3rd Qu.:     0   3rd Qu.:      0   3rd Qu.:     0  
##  Max.   :1000000   Max.   :997734   Max.   :1000000   Max.   :999404  
##       IT7               IT8         
##  Min.   :      0   Min.   :      0  
##  1st Qu.:      0   1st Qu.:      0  
##  Median :      0   Median :      0  
##  Mean   :   2328   Mean   :   2367  
##  3rd Qu.:      0   3rd Qu.:      0  
##  Max.   :1000000   Max.   :1000000

summary(sample_data(Candela))

##  dataset_name         sampleID          subjectID        
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##    bodysite           disease              age           
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##     gender            country          sequencing_technology
##  Length:38          Length:38          Length:38            
##  Class :character   Class :character   Class :character     
##  Mode  :character   Mode  :character   Mode  :character     
##    pubmedid             camp           paired_end_insert_size
##  Length:38          Length:38          Length:38             
##  Class :character   Class :character   Class :character      
##  Mode  :character   Mode  :character   Mode  :character      
##  read_length        total_reads        matched_reads     
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##  uniquely_matching_reads uniquely_matched_reads gene_number       
##  Length:38               Length:38              Length:38         
##  Class :character        Class :character       Class :character  
##  Mode  :character        Mode  :character       Mode  :character  
##  gene_number_for_11m_uniquely_matched_reads hitchip_probe_number
##  Length:38                                  Length:38           
##  Class :character                           Class :character    
##  Mode  :character                           Mode  :character    
##      bmi            gene_count_class   hitchip_probe_class
##  Length:38          Length:38          Length:38          
##  Class :character   Class :character   Class :character   
##  Mode  :character   Mode  :character   Mode  :character   
##   X.SampleID        rna_sampleid       infant_gender     
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##  infant_gestation_weeks    cohort          less_than_29weeks 
##  Length:38              Length:38          Length:38         
##  Class :character       Class :character   Class :character  
##  Mode  :character       Mode  :character   Mode  :character  
##  sample_collection_days collectionweek     samplecollectionwindow
##  Length:38              Length:38          Length:38             
##  Class :character       Class :character   Class :character      
##  Mode  :character       Mode  :character   Mode  :character      
##  gut_sample_id_ncbipublic gut_sample_id_corrected  projectid        
##  Length:38                Length:38               Length:38         
##  Class :character         Class :character        Class :character  
##  Mode  :character         Mode  :character        Mode  :character  
##    flowcell           comment          mlst_project      
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##    mlst_ec             st_ec             mlst_kp         
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##     st_kp           extractionprotocolid site_id_cincinnati
##  Length:38          Length:38            Length:38         
##  Class :character   Class :character     Class :character  
##  Mode  :character   Mode  :character     Mode  :character  
##  delivery_mode      infant_ethnicity   infant_race       
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##   birth_year        necrotizing_enterocolitis dol_firstnecors   
##  Length:38          Length:38                 Length:38         
##  Class :character   Class :character          Class :character  
##  Mode  :character   Mode  :character          Mode  :character  
##     sepsis          infant_birthweight_kg infant_birth_length_cm
##  Length:38          Length:38             Length:38             
##  Class :character   Class :character      Class :character      
##  Mode  :character   Mode  :character      Mode  :character      
##  maternal_abx_given postnatal_antimicrobial_use postnatal_abx_2window
##  Length:38          Length:38                   Length:38            
##  Class :character   Class :character            Class :character     
##  Mode  :character   Mode  :character            Mode  :character     
##  perinatal_antimicrobial_use    death           nursing_status    
##  Length:38                   Length:38          Length:38         
##  Class :character            Class :character   Class :character  
##  Mode  :character            Mode  :character   Mode  :character  
##  maternal_age_at_delivery_years mat_age_bin           parity         
##  Length:38                      Length:38          Length:38         
##  Class :character               Class :character   Class :character  
##  Mode  :character               Mode  :character   Mode  :character  
##  infant_fut2          mat_fut2             lowh          
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##     lowlea            highslea           married         
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##    gravida          preeclampsia        mult_birth       
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##  hypertension       hypertension_prepreg chorioamnionitis  
##  Length:38          Length:38            Length:38         
##  Class :character   Class :character     Class :character  
##  Mode  :character   Mode  :character     Mode  :character  
##    primipar         daysonform3daysprior days_on_abx_14    
##  Length:38          Length:38            Length:38         
##  Class :character   Class :character     Class :character  
##  Mode  :character   Mode  :character     Mode  :character  
##  visit_number          snprnt            wmsphase        
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##     first             repeat.          stooltexture      
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##  daysafteronset         hus             stec_count       
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##  shigatoxin2elisa   readsmillions        nonhuman        
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##  stec_coverage      stxab_detected      stx_ratio        
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##   typingdata        c_difficile_frequency     ibd           
##  Length:38          Length:38             Length:38         
##  Class :character   Class :character      Class :character  
##  Mode  :character   Mode  :character      Mode  :character  
##  sampling_day      
##  Length:38         
##  Class :character  
##  Mode  :character  
##  known_consumers_of_a_defined_fermented_milk_product_.dfmp.
##  Length:38                                                 
##  Class :character                                          
##  Mode  :character                                          
##  mgs_richness       mgs_profile_matched_sample_pairs  ethnicity        
##  Length:38          Length:38                        Length:38         
##  Class :character   Class :character                 Class :character  
##  Mode  :character   Mode  :character                 Mode  :character  
##  sample_name        index_2_primer     index_1_primer    
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##  sample_name_repeat1 extraction_conc    sampleid_with_barcode
##  Length:38           Length:38          Length:38            
##  Class :character    Class :character   Class :character     
##  Mode  :character    Mode  :character   Mode  :character     
##  initial_nreads     quality_control    screensensloc_hg19
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##  screensensloc_phix174  screenhiv         percent_initial_reads
##  Length:38             Length:38          Length:38            
##  Class :character      Class :character   Class :character     
##  Mode  :character      Mode  :character   Mode  :character     
##  percent_qc_reads   tipo_prelievo      prima_dopo_cura   
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##  codice_provetta        data          
##  Length:38          Length:38         
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##  prelievo_efemalefemaleettuato_presso patient_number    
##  Length:38                            Length:38         
##  Class :character                     Class :character  
##  Mode  :character                     Mode  :character  
##      pasi               bsa            age_ofemale_onset 
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##   arthritis             type           antibiotic_usage  
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##     stage            cirrhotic         hbv_related       
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##  alcohol_related    other_causes_related     inr           
##  Length:38          Length:38            Length:38         
##  Class :character   Class :character     Class :character  
##  Mode  :character   Mode  :character     Mode  :character  
##      crea               alb                 tb           
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##       pt              ascites               he           
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##      ctp                meld            antivirus        
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##   X..blocker        designation         age_range        
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##  site_symmetry      affected.unaffected    method         
##  Length:38          Length:38           Length:38         
##  Class :character   Class :character    Class :character  
##  Mode  :character   Mode  :character    Mode  :character  
##  site_characteristic total_intital_reads estimated_median_insert_size
##  Length:38           Length:38           Length:38                   
##  Class :character    Class :character    Class :character            
##  Mode  :character    Mode  :character    Mode  :character            
##  reported_as_failed_qc uniquely_align_to_human
##  Length:38             Length:38              
##  Class :character      Class :character       
##  Mode  :character      Mode  :character       
##  non_uniquely_align_to_human_with_0_2_mismatches
##  Length:38                                      
##  Class :character                               
##  Mode  :character                               
##  reads_removed_because_of_read_pair_trimming_discrepancy
##  Length:38                                              
##  Class :character                                       
##  Mode  :character                                       
##  too_short_after_quality_trimming..50bp. final_._paired_reads
##  Length:38                               Length:38           
##  Class :character                        Class :character    
##  Mode  :character                        Mode  :character    
##  final_._singletons total_final_reads 
##  Length:38          Length:38         
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##  X._reads_kept_by_mean_slope_of_last_5_points_.coverage.
##  Length:38                                              
##  Class :character                                       
##  Mode  :character                                       
##   population         bmi_class          X16s_rrna        
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##  shotgun_metagenome read_depth_16s        paired         
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##     single             height             weight         
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##    diabetic             fbg                sbp           
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##      dbp                fins               fcp           
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##     hbalc                tg                tcho          
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##      hdl                ldl                kit           
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##  sample_group       illumina_run       classification    
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##  gad.antibodies         whr                 wc           
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##  cholesterol        triglycerides       creatinine       
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##      y.gt           fasting_glucose    fasting_insulin   
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##     hba1c           adiponectin           leptin         
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##     glp.1              fgf.19             hscrp          
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##   c.peptide             tnfa               il.1          
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##     cd163             statins            insulin         
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##  oral_anti.diabetic_medication years_in_sweden     tnm_stage        
##  Length:38                     Length:38          Length:38         
##  Class :character              Class :character   Class :character  
##  Mode  :character              Mode  :character   Mode  :character  
##   ajcc_stage        localization           fobt          
##  Length:38          Length:38          Length:38         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##  wif.1_gene_methylation_test    group          
##  Length:38                   Length:38         
##  Class :character            Class :character  
##  Mode  :character            Mode  :character

5.1 Initial exploration

phyloseq help vignettes here.

Candela

## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 3302 taxa and 38 samples ]
## sample_data() Sample Data:       [ 38 samples by 211 sample variables ]
## tax_table()   Taxonomy Table:    [ 3302 taxa by 8 taxonomic ranks ]

Candela = prune_taxa(taxa_sums(Candela) > 0, Candela)
Candela

## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 648 taxa and 38 samples ]
## sample_data() Sample Data:       [ 38 samples by 211 sample variables ]
## tax_table()   Taxonomy Table:    [ 648 taxa by 8 taxonomic ranks ]

5.2 Subsetting and pruning

rank_names(Candela)

## [1] "Kingdom" "Phylum"  "Class"   "Order"   "Family"  "Genus"   "Species"
## [8] "Strain"

subset_taxa(Candela, !is.na(Strain))

## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 207 taxa and 38 samples ]
## sample_data() Sample Data:       [ 38 samples by 211 sample variables ]
## tax_table()   Taxonomy Table:    [ 207 taxa by 8 taxonomic ranks ]

(Candela.sp_strain = subset_taxa(Candela, !is.na(Species)))

## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 444 taxa and 38 samples ]
## sample_data() Sample Data:       [ 38 samples by 211 sample variables ]
## tax_table()   Taxonomy Table:    [ 444 taxa by 8 taxonomic ranks ]

taxonomy.level = apply(tax_table(Candela), 1, function(x) sum(!is.na(x)))
Candela.phy = prune_taxa(taxonomy.level==2, Candela)
taxa_names(Candela.phy)

##  [1] "p__Euryarchaeota"    "p__Acidobacteria"    "p__Actinobacteria"  
##  [4] "p__Bacteroidetes"    "p__Cyanobacteria"    "p__Firmicutes"      
##  [7] "p__Fusobacteria"     "p__Proteobacteria"   "p__Spirochaetes"    
## [10] "p__Tenericutes"      "p__Verrucomicrobia"  "p__Eukaryota_noname"
## [13] "p__Microsporidia"

5.3 Advanced pruning

Keep taxa only if they are in the most abundant 10% of taxa in at least two samples:

f1<- filterfun_sample(topp(0.1))
pru <- genefilter_sample(Candela, f1, A=2)
summary(pru)

##    Mode   FALSE    TRUE    NA's 
## logical     455     193       0

subset_taxa(Candela, pru)

## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 193 taxa and 38 samples ]
## sample_data() Sample Data:       [ 38 samples by 211 sample variables ]
## tax_table()   Taxonomy Table:    [ 193 taxa by 8 taxonomic ranks ]

More help here.

5.4 Heatmap

plot_heatmap(Candela.sp_strain, method="PCoA", distance="bray")

5.5 Barplot

par(mar = c(20, 4, 0, 0) + 0.1) # make more room on bottom margin
barplot(sort(taxa_sums(Candela.sp_strain), TRUE)[1:30]/nsamples(Candela.sp_strain), las=2)

5.6 Alpha diversity estimates

Look at ?phyloseq::estimate_richness
Supported measures of alpha diversity are:
- “Observed”, “Chao1”, “ACE”, “Shannon”, “Simpson”, “InvSimpson”, “Fisher”
- more information from vegan package

Note, you can ignore warning about singletons:

alpha_meas = c("Shannon", "Simpson", "InvSimpson")
(p <- plot_richness(Candela, "gender", "camp", measures=alpha_meas))

5.7 Comparison of alpha diversity estimates

alphas = estimate_richness(Candela, measures=alpha_meas)
pairs(alphas)

5.8 Beta diversity / dissimilarity

E.g. Bray-Curtis dissimilarity between all pairs of samples:

plot(hclust(phyloseq::distance(Candela, method="bray")), 
     main="Bray-Curtis Dissimilarity", xlab="", sub = "")

Dozens of distance measures are available
- see ?phyloseq::distance and ?phyloseq::distanceMethodList

5.9 Ordination

ord = ordinate(Candela, method="PCoA", distance="bray")
plot_ordination(Candela, ord, color="camp", shape="camp") + 
  ggplot2::ggtitle("Bray-Curtis Principal Coordinates Analysis")

Available methods are “DCA”, “CCA”, “RDA”, “CAP”, “DPCoA”, “NMDS”, “MDS”, “PCoA”
PCoA approximately preserves distances in a two-dimensional projection

Screeplot:

plot_scree(ord) + ggplot2::ggtitle("Screeplot")

Acquiring curatedMetagenomicData through ExperimentHub

Levi Waldron

6/29/2016

1 What `curatedMetagenomicData` provides

2 Setup

3 Choose a dataset for analysis

4 Choose metabolic datasets for analysis

5 Taxonomy-aware analysis with the `phyloseq` Bioconductor library

5.1 Initial exploration

5.2 Subsetting and pruning

5.3 Advanced pruning

5.4 Heatmap

5.5 Barplot

5.6 Alpha diversity estimates

5.7 Comparison of alpha diversity estimates

5.8 Beta diversity / dissimilarity

5.9 Ordination

Acquiring curatedMetagenomicData through ExperimentHub

Levi Waldron

6/29/2016

1 What curatedMetagenomicData provides

2 Setup

3 Choose a dataset for analysis

4 Choose metabolic datasets for analysis

5 Taxonomy-aware analysis with the phyloseq Bioconductor library

5.1 Initial exploration

5.2 Subsetting and pruning

5.3 Advanced pruning

5.4 Heatmap

5.5 Barplot

5.6 Alpha diversity estimates

5.7 Comparison of alpha diversity estimates

5.8 Beta diversity / dissimilarity

5.9 Ordination

1 What `curatedMetagenomicData` provides

5 Taxonomy-aware analysis with the `phyloseq` Bioconductor library