16S and Shotgun Metagenomics Integration
Abstract
Three pipelines were compared in the present analysis (see Table 1).
Data changes
Pipeline 1
- 16S rRNA gene: All taxa originally labeled as “–” were renamed to reflect the lowest available taxonomic level (d: domain, p: phylum, c: class, o: order, f: family). Each of these taxa was subsequently tagged with “NID” to indicate “Not Identified.”
Pipelines 2 and 3
- 16S rRNA gene: Each of the following pairs of taxa was merged into a single taxon: Bacteroides and [Bacteroides], Clostridium and [Clostridium], Eubacterium x2 and [Eubacterium] x2. All taxa originally labeled as “–” were renamed to reflect the lowest available taxonomic level (d: domain, p: phylum, c: class, o: order, f: family). Each of these taxa was subsequently tagged with “NID” to indicate “Not Identified.”
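The renaming and merging steps described for the three pipelines can be sketched roughly as follows. This is only an illustration: the function names, the `f:Name_NID` label format, and the six-rank lineage layout are assumptions, since the report does not show the exact labels used.

```python
# Rank prefixes named in the text, ordered from highest to lowest rank.
RANK_PREFIXES = ["d", "p", "c", "o", "f"]
# Both a hyphen and an en dash appear in the report as the empty label.
PLACEHOLDERS = {"-", "–"}
# Genera whose bracketed and unbracketed forms are merged (Pipelines 2 and 3).
MERGED_GENERA = {"Bacteroides", "Clostridium", "Eubacterium"}


def label_unidentified(lineage):
    """Return a genus label for [domain, phylum, class, order, family, genus].

    If the genus is a placeholder, fall back to the lowest identified rank
    and tag it with NID. The "f:Name_NID" format is hypothetical.
    """
    genus = lineage[-1]
    if genus not in PLACEHOLDERS:
        return genus
    # Walk from family up to domain and stop at the first identified rank.
    for prefix, name in reversed(list(zip(RANK_PREFIXES, lineage[:-1]))):
        if name not in PLACEHOLDERS:
            return f"{prefix}:{name}_NID"
    return "NID"  # nothing identified at any rank


def merge_bracketed(counts):
    """Sum the counts of 'Genus' and '[Genus]' for the genera listed above."""
    merged = {}
    for taxon, n in counts.items():
        bare = taxon.strip("[]")
        key = bare if bare in MERGED_GENERA else taxon
        merged[key] = merged.get(key, 0) + n
    return merged
```

Bracketed genera outside the listed set are left untouched, so the merge stays limited to the taxa named in the text.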
Data check
Prevalence was calculated for each genus by sequencing type. Results are detailed in Table 2 and Table 3. Results from Pipeline 1 contained 39 taxa identified as “-”, which were excluded from further analyses.
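As a minimal sketch of the prevalence calculation, assuming prevalence means the fraction of samples in which a genus has a non-zero count (the report does not state the definition), one table per sequencing type could be processed like this. The nested-dictionary layout and the function name are illustrative.

```python
def prevalence(table):
    """Fraction of samples in which each genus has a count above zero.

    table: {sample_id: {genus: read_count}} for one sequencing type.
    """
    n_samples = len(table)
    genera = {g for counts in table.values() for g in counts}
    return {
        g: sum(1 for counts in table.values() if counts.get(g, 0) > 0) / n_samples
        for g in genera
    }


# Toy example: one call per sequencing type.
amplicon = {
    "s1": {"Bacteroides": 120, "Blautia": 0},
    "s2": {"Bacteroides": 80, "Blautia": 15},
}
amplicon_prevalence = prevalence(amplicon)
```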
Beta diversity
First, PCoA plots were generated for each pipeline without rarefaction.
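A PCoA starts from a pairwise dissimilarity matrix between samples. The report does not name the index, so the sketch below assumes Bray-Curtis dissimilarity on relative abundances purely for illustration. The ordination itself (an eigendecomposition of the centered matrix) is left out.

```python
def relative_abundance(counts):
    """Convert raw counts for one sample into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles."""
    taxa = set(a) | set(b)
    num = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    den = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return num / den if den else 0.0


def distance_matrix(table):
    """Return sorted sample IDs and the square matrix of pairwise dissimilarities."""
    ids = sorted(table)
    profiles = [relative_abundance(table[s]) for s in ids]
    return ids, [[bray_curtis(p, q) for q in profiles] for p in profiles]
```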
Pipeline 1
Pipeline 2
Pipeline 3
Rarefaction and Beta diversity
Data were rarefied to 10,000 reads per sample, and beta diversity indices were calculated without any filtering.
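Rarefaction to a fixed depth can be sketched as random subsampling without replacement. The version below assumes that samples with fewer reads than the target depth are dropped, which the report does not state. It relies on the `counts` argument of `random.sample`, available from Python 3.9.

```python
import random


def rarefy(counts, depth=10_000, seed=0):
    """Subsample `depth` reads without replacement from one sample.

    counts: {taxon: read_count}. Returns None when the sample is too shallow
    (an assumed policy; such samples could also be kept unrarefied).
    """
    taxa = [t for t, n in counts.items() if n > 0]
    total = sum(counts[t] for t in taxa)
    if total < depth:
        return None
    rng = random.Random(seed)
    # Each taxon is treated as appearing counts[t] times in the read pool.
    drawn = rng.sample(taxa, k=depth, counts=[counts[t] for t in taxa])
    rarefied = {}
    for taxon in drawn:
        rarefied[taxon] = rarefied.get(taxon, 0) + 1
    return rarefied
```

A fixed seed keeps the subsampling reproducible. Rare taxa may be lost by chance, which is one reason results with and without rarefaction can differ.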
Pipeline 1
Pipeline 2
Pipeline 3
Filter by relative abundance
Each dataset was filtered by removing all taxa with a median relative abundance below 0.01%. The filter was applied separately to the 16S rRNA gene and shotgun metagenomics data.
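The abundance filter could look roughly like the sketch below, applied once per sequencing type. It assumes the median is taken across all samples, with absent taxa counted as zero, and that the retained counts are not renormalized afterwards. Neither detail is given in the report.

```python
from statistics import median


def filter_by_median_abundance(table, threshold=0.0001):
    """Drop taxa whose median relative abundance is below `threshold`.

    table: {sample_id: {taxon: read_count}}; 0.0001 corresponds to 0.01%.
    """
    taxa = {t for counts in table.values() for t in counts}
    totals = {s: sum(counts.values()) for s, counts in table.items()}
    keep = set()
    for taxon in taxa:
        # Relative abundance of this taxon in every sample (0 if absent).
        rel = [table[s].get(taxon, 0) / totals[s] for s in table]
        if median(rel) >= threshold:
            keep.add(taxon)
    return {
        s: {t: n for t, n in counts.items() if t in keep}
        for s, counts in table.items()
    }
```

Under these assumptions, a taxon that is abundant in only a minority of samples has a median of zero and is removed, so the filter favors taxa that are consistently present.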
Beta diversity after filtering
Pipeline 1
Pipeline 2
Pipeline 3
Variance explained
The percentage of variance explained by subject and by dataset was calculated for each distance matrix across all pipelines. Table 7 shows the R² values for subjects after the various data processing steps.
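The report does not name the method behind these R² values. One common choice for distance matrices is a PERMANOVA-style partition of sums of squares, and the sketch below assumes that approach for a single grouping factor, without the permutation test.

```python
def permanova_r2(dist, groups):
    """R² of one grouping factor from a square distance matrix.

    dist: list of lists of pairwise distances; groups: one label per sample.
    R² = 1 - SS_within / SS_total, with sums of squared distances over pairs.
    """
    n = len(groups)
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for label in set(groups):
        idx = [i for i, g in enumerate(groups) if g == label]
        pair_sum = sum(dist[i][j] ** 2 for a, i in enumerate(idx) for j in idx[a + 1:])
        ss_within += pair_sum / len(idx)
    return 1 - ss_within / ss_total


# Toy case: two subjects, each sequenced twice, identical within subject.
toy = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
r2_subject = permanova_r2(toy, ["subj1", "subj1", "subj2", "subj2"])
```

In the toy matrix, samples from the same subject are identical, so subject explains all of the variation.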
Data integration
Data from the three pipelines were merged into a single dataset, and beta diversity was calculated.
The merged dataset was then rarefied to 10,000 reads per sample, and beta diversity was recalculated.
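How the pipelines were combined is not described. One plausible layout, shown here only as an assumption, keeps every pipeline-by-sample combination as its own row and aligns all rows on the union of taxa. The `pipeline|sample` key format is hypothetical.

```python
def merge_pipelines(tables):
    """Combine per-pipeline count tables into one table.

    tables: {pipeline_name: {sample_id: {taxon: read_count}}}.
    Taxa missing from a pipeline's output are filled with zero.
    """
    all_taxa = sorted(
        {t for table in tables.values() for counts in table.values() for t in counts}
    )
    merged = {}
    for pipeline, table in tables.items():
        for sample, counts in table.items():
            merged[f"{pipeline}|{sample}"] = {t: counts.get(t, 0) for t in all_taxa}
    return merged
```

Keeping the pipeline name in the row key lets pipeline be used later as a grouping factor, alongside subject and sequencing type.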