16S and Shotgun Metagenomics integration
Abstract
Three pipelines were compared in the present analysis (see Table 1).
Data changes
Pipeline 1
- 16S rRNA gene: All taxa originally labeled as “–” were renamed to reflect the lowest available taxonomic level (d: domain, p: phylum, c: class, o: order, f: family). Each of these taxa was subsequently tagged with “NID” to indicate “Not Identified.”
Pipeline 2 and 3
- 16S rRNA gene: the following taxa were merged into a single taxon: Bacteroides and [Bacteroides], Clostridium and [Clostridium], Eubacterium x2 and [Eubacterium] x2. All taxa originally labeled as “–” were renamed to reflect the lowest available taxonomic level (d: domain, p: phylum, c: class, o: order, f: family). Each of these taxa was subsequently tagged with “NID” to indicate “Not Identified.”
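The relabeling and merging steps described above could be sketched as follows. This is a minimal illustration, not the code used in the analysis. It assumes each lineage is a list of six rank names (domain to genus), that unassigned ranks carry the "–" placeholder, and that the output label looks like `f__Lachnospiraceae_NID`; that label format and the function names are hypothetical. How the two Eubacterium entries ("x2") map onto each other is not specified in the text, so the sketch simply collapses the bracketed and unbracketed forms of the three listed genera.

```python
from collections import defaultdict

RANK_PREFIXES = ["d", "p", "c", "o", "f"]  # domain to family, as listed in the text
UNASSIGNED = "–"  # placeholder for an unlabeled rank (assumed to be an en dash)
MERGE_GENERA = {"Bacteroides", "Clostridium", "Eubacterium"}  # genera named in the text


def label_genus(lineage):
    """Return a genus label for [domain, phylum, class, order, family, genus].

    If the genus is unassigned, fall back to the lowest assigned rank
    and tag it with "NID" (Not Identified).
    """
    genus = lineage[5]
    if genus != UNASSIGNED:
        return genus
    # Walk upward from family to domain until an assigned rank is found.
    for prefix, name in reversed(list(zip(RANK_PREFIXES, lineage[:5]))):
        if name != UNASSIGNED:
            return f"{prefix}__{name}_NID"  # hypothetical label format
    return "NID"


def canonical_genus(genus):
    """Map "[Bacteroides]" to "Bacteroides" for the listed genera only."""
    if genus.startswith("[") and genus.endswith("]") and genus[1:-1] in MERGE_GENERA:
        return genus[1:-1]
    return genus


def merge_taxa(counts):
    """Sum the counts of taxa that share a canonical genus name."""
    merged = defaultdict(int)
    for genus, n in counts.items():
        merged[canonical_genus(genus)] += n
    return dict(merged)
```

Bracketed genera outside the listed set are left untouched, which mirrors the text: only the named taxa were merged.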
Data check
Prevalence was calculated for each genus by sequencing type. Results are detailed in Table 2 and Table 3.
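As an illustration of the prevalence check, the sketch below assumes prevalence means the fraction of samples in which a genus has a nonzero count; the detection threshold actually used is not stated in the text. The function name and the input layout (`{sample_id: {genus: count}}`) are hypothetical.

```python
def prevalence(counts_by_sample):
    """Fraction of samples in which each genus has a nonzero count.

    counts_by_sample: {sample_id: {genus: count}}
    Genera never detected in any sample are omitted from the result.
    """
    n_samples = len(counts_by_sample)
    n_present = {}
    for counts in counts_by_sample.values():
        for genus, count in counts.items():
            if count > 0:
                n_present[genus] = n_present.get(genus, 0) + 1
    return {genus: k / n_samples for genus, k in n_present.items()}
```

Calling the function once on the 16S table and once on the shotgun table would give the per-sequencing-type comparison.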
Note:
GreenGenes2 (GG2) was redesigned (2023–2024) to provide an updated, genome-based taxonomy aligned with GTDB. Identifiers such as g__01-FULL-36-10b follow a systematic internal nomenclature used in GG2 to denote unnamed taxa that are distinct but not yet formally described.
Breakdown of the name:
g__: Genus-level classification
01-FULL: Cluster prefix; identifies one of GG2’s internal genome clusters (“FULL” means full-length 16S reference sequence).
36-10b: Cluster index; unique within that cluster, often indicating subclades or genomic bins.
So g__01-FULL-36-10b is a placeholder genus in the GG2 tree: an unnamed, genome-defined genus-level group, used for genomes that form a coherent genus-level clade but lack a valid Latin name in GTDB/NCBI.
It is a real microbial clade, not a formatting artifact; it simply has not been formally described yet.
Beta diversity
First, PCoA plots were generated for each pipeline without rarefaction.
Pipeline 1
No rarefaction
Pipeline 2
No rarefaction
Pipeline 3
No rarefaction
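The distance matrix behind an unrarefied PCoA can be illustrated with a small sketch. The text does not name the dissimilarity metric, so Bray-Curtis on relative abundances is used here purely as an assumption; the function names are hypothetical, and the ordination step itself (eigendecomposition of the centered matrix) is left out.

```python
def relative_abundance(counts):
    """Convert raw counts {taxon: count} to proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: c / total for taxon, c in counts.items()} if total else {}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles (0 = identical, 1 = disjoint)."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    denominator = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return numerator / denominator if denominator else 0.0


def distance_matrix(samples):
    """Pairwise Bray-Curtis matrix for {sample_id: {taxon: count}}, in insertion order."""
    profiles = [relative_abundance(c) for c in samples.values()]
    return [[bray_curtis(p, q) for q in profiles] for p in profiles]
```

Normalizing to proportions is one common way to compare samples of different depth when no rarefaction is applied.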
Rarefaction and Beta diversity
Data were rarefied to 10,000 reads per sample, and beta diversity indices were calculated without any abundance filter.
Pipeline 1
Pipeline 2
Pipeline 3
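Rarefying to 10,000 reads can be sketched as random subsampling without replacement. The sketch assumes Python 3.9 or later (for the `counts=` argument of `random.sample`) and assumes that samples with fewer reads than the target depth are dropped; the text does not say how shallow samples were handled. The function name and the fixed seed are hypothetical.

```python
import random


def rarefy(counts, depth=10_000, seed=0):
    """Subsample `depth` reads without replacement from {taxon: count}.

    Returns None when the sample has fewer than `depth` reads
    (assumed here to mean the sample is excluded).
    """
    if sum(counts.values()) < depth:
        return None
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    taxa = list(counts)
    # counts= treats each taxon as repeated `count` times, i.e. one entry per read.
    drawn = rng.sample(taxa, k=depth, counts=[counts[t] for t in taxa])
    rarefied = {}
    for taxon in drawn:
        rarefied[taxon] = rarefied.get(taxon, 0) + 1
    return rarefied
```

After this step every retained sample has exactly the same total, so beta diversity can be computed directly on the rarefied counts.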
Filter by relative abundance
Each dataset was then filtered: taxa with a median relative abundance below 0.01% were removed, separately for the 16S rRNA gene and shotgun metagenomics data.
Relative abundance 0.001%
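Both thresholds (0.01% and 0.001%) can be expressed with a single filter, sketched below. It assumes the median is taken across all samples of one sequencing type, with absent taxa counted as zero, and that the remaining counts are not renormalized afterwards; neither detail is stated in the text. The function name is hypothetical.

```python
from statistics import median


def filter_by_median_abundance(samples, threshold_pct=0.01):
    """Drop taxa whose median relative abundance (in percent) is below threshold_pct.

    samples: {sample_id: {taxon: count}} for ONE sequencing type.
    """
    totals = {s: sum(c.values()) for s, c in samples.items()}
    taxa = set().union(*(c.keys() for c in samples.values()))
    keep = set()
    for taxon in taxa:
        # Relative abundance in percent per sample; absent taxa count as 0.
        rel = [100 * c.get(taxon, 0) / totals[s] for s, c in samples.items() if totals[s]]
        if rel and median(rel) >= threshold_pct:
            keep.add(taxon)
    return {s: {t: n for t, n in c.items() if t in keep} for s, c in samples.items()}
```

Running it once on the 16S table and once on the shotgun table, rather than on a combined table, matches the "separately" in the description.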
Beta diversity after filtering
Pipeline 1 0.01%
Pipeline 1 0.001%
Pipeline 2 0.01%
Pipeline 2 0.001%
Pipeline 3 0.01%
Pipeline 3 0.001%
Variance explained
The percentage of variance explained by subject and by dataset was calculated for each matrix across all pipelines. Table 10 shows the R² values for subjects after the various data processing steps.
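The text does not state how R² was obtained. One common choice is a PERMANOVA-style partition of squared distances, and the sketch below assumes that approach for a single grouping factor (for example, subject). The function name and the square-matrix input are hypothetical, and no permutation test is included.

```python
from itertools import combinations


def permanova_r2(dist, labels):
    """One-factor PERMANOVA-style R^2 from a square distance matrix.

    dist:   list of lists, dist[i][j] = distance between samples i and j
    labels: group label (e.g. subject ID) for each sample, same order as dist
    R^2 = (SS_total - SS_within) / SS_total
    """
    n = len(labels)
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for group in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == group]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    return (ss_total - ss_within) / ss_total
```

With more than one term in the model (subject plus dataset), the R² attributed to each term can depend on how the terms are entered, so this single-factor version is only a starting point.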
Data integration
Data from the three pipelines were merged into a single dataset, and beta diversity was calculated.
The merged dataset was then rarefied to 10,000 reads per sample, and beta diversity was calculated again.
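How the three pipelines were combined is not detailed in the text. A minimal sketch, assuming the merge stacks the samples of every pipeline into one table and tags each sample with its pipeline of origin, might look like this; the function name and key layout are hypothetical.

```python
def merge_pipelines(tables):
    """Stack per-pipeline tables into one dataset.

    tables: {pipeline_name: {sample_id: {taxon: count}}}
    Returns {(pipeline_name, sample_id): {taxon: count}} so that the same
    sample processed by different pipelines stays distinguishable.
    """
    merged = {}
    for pipeline, samples in tables.items():
        for sample_id, counts in samples.items():
            merged[(pipeline, sample_id)] = dict(counts)  # copy to avoid aliasing
    return merged
```

Keeping the pipeline name in the key makes it possible to use the pipeline as a grouping factor in the downstream beta diversity comparison.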