16S and Shotgun Metagenomics integration

Published

October 15, 2025

Abstract

Three pipelines were compared in the present analysis (see Table 1).

Table 1- Pipelines used for the analysis of 18 samples. The table shows the databases and tools used in each pipeline.

Data changes

Pipeline 1

  • 16S rRNA gene: All taxa originally labeled as “–” were renamed to reflect the lowest available taxonomic level (d: domain, p: phylum, c: class, o: order, f: family). Each of these taxa was subsequently tagged with “NID” to indicate “Not Identified.”
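
The renaming rule above can be sketched as a small function. This is only an illustration: the lineage layout (a list ordered from domain to genus), the function name `label_unidentified`, and the exact label format (`f:Lachnospiraceae NID`) are assumptions, not the format used in the analysis.

```python
# Rank prefixes as listed in the text: domain, phylum, class, order, family.
RANK_PREFIXES = ["d", "p", "c", "o", "f"]
# Placeholders treated as "no assignment" (assumed; includes the en dash label).
UNASSIGNED = {"–", "-", ""}

def label_unidentified(lineage):
    """Return the genus, or a 'NID' label built from the lowest named rank.

    `lineage` is a hypothetical list ordered
    [domain, phylum, class, order, family, genus].
    """
    genus = lineage[-1]
    if genus not in UNASSIGNED:
        return genus
    # Walk upward from family to domain and stop at the first named rank.
    for prefix, name in reversed(list(zip(RANK_PREFIXES, lineage[:-1]))):
        if name not in UNASSIGNED:
            return f"{prefix}:{name} NID"
    return "NID"  # nothing assigned at any rank

example = ["Bacteria", "Firmicutes", "Clostridia", "Clostridiales", "Lachnospiraceae", "–"]
print(label_unidentified(example))  # f:Lachnospiraceae NID
```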

Pipelines 2 and 3

  • 16S rRNA gene: the following taxa were merged into a single taxon: Bacteroides and [Bacteroides], Clostridium and [Clostridium], Eubacterium x2 and [Eubacterium] x2. All taxa originally labeled as “–” were renamed to reflect the lowest available taxonomic level (d: domain, p: phylum, c: class, o: order, f: family). Each of these taxa was subsequently tagged with “NID” to indicate “Not Identified.”
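
Merging the bracketed and unbracketed spellings of a taxon amounts to summing their counts under one name. A minimal sketch, assuming each sample is held as a taxon-to-count dictionary; the function name `merge_bracketed` and the toy counts are hypothetical.

```python
from collections import defaultdict

def merge_bracketed(counts):
    """Sum counts of taxa whose names differ only by square brackets."""
    merged = defaultdict(int)
    for taxon, n in counts.items():
        # "[Bacteroides]" and "Bacteroides" collapse to the same key.
        clean = taxon.replace("[", "").replace("]", "")
        merged[clean] += n
    return dict(merged)

sample = {"Bacteroides": 10, "[Bacteroides]": 5, "Blautia": 3}  # made-up counts
print(merge_bracketed(sample))  # {'Bacteroides': 15, 'Blautia': 3}
```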

Data check

Prevalence was calculated for each genus by sequencing type. Results are detailed in Tables 2, 3 and 4. Results from pipeline 1 contained 39 taxa identified as “-”, which were excluded from further analyses.

Table 2- Number of samples per species by sequencing type (pipeline 1)
Table 3- Number of samples per species by sequencing type (pipeline 2)
Table 4- Number of samples per species by sequencing type (pipeline 3)
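
Prevalence here is read as the number of samples in which a taxon has a non-zero count, computed once per sequencing type. The sketch below follows that reading; the table layout (sample ID mapped to a taxon-to-count dictionary), the function name `prevalence`, and all counts are illustrative assumptions.

```python
from collections import Counter

def prevalence(table):
    """Count, per taxon, the samples in which it has a non-zero count.

    `table` maps sample ID -> {taxon: count} (assumed layout).
    """
    seen = Counter()
    for counts in table.values():
        seen.update(t for t, n in counts.items() if n > 0)
    return dict(seen)

# Made-up toy data for two samples sequenced with both methods.
amplicon = {"S1": {"Bacteroides": 120, "Blautia": 0},
            "S2": {"Bacteroides": 80, "Blautia": 4}}
shotgun = {"S1": {"Bacteroides": 300, "Blautia": 9},
           "S2": {"Bacteroides": 0, "Blautia": 7}}

by_type = {"16S": prevalence(amplicon), "shotgun": prevalence(shotgun)}
print(by_type)
```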

Unique and shared taxa

The unique and shared taxa were then counted for each pipeline.

Table 5- Number of unique and shared species
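
Counting unique and shared taxa reduces to set differences and an intersection. The sketch assumes the comparison is between the 16S and shotgun taxon lists within one pipeline; the function name and the example names are hypothetical.

```python
def unique_and_shared(taxa_16s, taxa_shotgun):
    """Return counts of taxa found in only one method and in both."""
    a, b = set(taxa_16s), set(taxa_shotgun)
    return {
        "16S only": len(a - b),
        "shotgun only": len(b - a),
        "shared": len(a & b),
    }

# Made-up taxon lists for one pipeline.
result = unique_and_shared(
    ["Bacteroides", "Blautia", "Roseburia"],
    ["Bacteroides", "Blautia", "Akkermansia", "Prevotella"],
)
print(result)  # {'16S only': 1, 'shotgun only': 2, 'shared': 2}
```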

Beta diversity

First, PCoA plots were generated for each pipeline without rarefaction.
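
The two dissimilarities named in the figure captions can be written out directly. This is a plain-Python sketch for illustration: whether the analysis used raw counts or relative abundances as input is not stated, and the function names are hypothetical. The ordination itself would normally be delegated to a library, for example the `pcoa` function in scikit-bio, applied to the resulting matrix.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x - y| / sum(x + y)."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den

def jaccard(x, y):
    """Jaccard distance on presence/absence: 1 - shared / union."""
    shared = sum(1 for a, b in zip(x, y) if a > 0 and b > 0)
    union = sum(1 for a, b in zip(x, y) if a > 0 or b > 0)
    return 1 - shared / union

def distance_matrix(samples, metric):
    """Symmetric matrix of pairwise distances between sample vectors."""
    n = len(samples)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = metric(samples[i], samples[j])
    return d

# Made-up abundance vectors (same taxon order in every sample).
samples = [[10, 0, 5], [5, 5, 5], [0, 0, 5]]
bc = distance_matrix(samples, bray_curtis)
jc = distance_matrix(samples, jaccard)
```

Both functions divide by zero if two samples are completely empty, so empty samples should be dropped beforehand.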

Pipeline 1

Figure 1- PCoA plots by sequencing type without rarefaction (Bray-Curtis)
Figure 2- PCoA plot by sequencing type without rarefaction (Jaccard)

Pipeline 2

Figure 3- PCoA plots by sequencing type without rarefaction (Bray-Curtis)
Figure 4- PCoA plot by sequencing type without rarefaction (Jaccard)

Pipeline 3

Figure 5- PCoA plots by sequencing type without rarefaction (Bray-Curtis)
Figure 6- PCoA plots by sequencing type without rarefaction (Jaccard)

Rarefaction and Beta diversity

Data were rarefied to 10,000 reads per sample, and beta diversity indices were calculated without any filtering.
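
Rarefaction to a fixed depth is random subsampling of reads without replacement. The sketch below is one simple way to do it; the function name `rarefy`, the fixed seed, and the choice to return `None` for samples with fewer reads than the target depth are assumptions, since the text does not say how such samples were handled.

```python
import random
from collections import Counter

def rarefy(counts, depth=10_000, seed=0):
    """Subsample `depth` reads without replacement from a taxon-count dict.

    Returns None when the sample has fewer reads than `depth` (assumed policy).
    """
    total = sum(counts.values())
    if total < depth:
        return None
    # Expand counts into one entry per read, then draw without replacement.
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    picked = random.Random(seed).sample(reads, depth)
    return dict(Counter(picked))

original = {"Bacteroides": 15000, "Blautia": 5000, "Roseburia": 10}  # made-up counts
rarefied = rarefy(original)
print(sum(rarefied.values()))  # 10000
```

Expanding every read into a list is fine at amplicon depths but memory-hungry for deep shotgun samples, where a multivariate hypergeometric draw would be the more economical choice.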

Pipeline 1

Figure 7- PCoA plots by sequencing type (Bray-Curtis)
Figure 8- PCoA plots by sequencing type (Jaccard)

Pipeline 2

Figure 9- PCoA plots by sequencing type (Bray-Curtis)
Figure 10- PCoA plots by sequencing type (Jaccard)

Pipeline 3

Figure 11- PCoA plots by sequencing type (Bray-Curtis)
Figure 12- PCoA plots by sequencing type (Jaccard)

Filter by relative abundance

Each dataset was filtered by removing all taxa with a median relative abundance below 0.01%. The filter was applied separately to the 16S rRNA gene and shotgun metagenomics data.

Table 6- Number of unique and shared species after filtering by relative abundance
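
The abundance filter can be sketched as follows. It assumes that the median is taken across all samples of one sequencing type, with absent taxa counted as zero, and that 0.01% corresponds to a proportion of 0.0001; the function name and the toy counts are hypothetical.

```python
from statistics import median

def filter_by_median_abundance(table, threshold=0.0001):
    """Drop taxa whose median relative abundance is below `threshold`.

    `table` maps sample ID -> {taxon: count} for ONE sequencing type.
    """
    taxa = {t for counts in table.values() for t in counts}
    rel = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        # Taxa missing from a sample contribute a relative abundance of 0.
        rel[sample] = {t: counts.get(t, 0) / total for t in taxa}
    keep = {t for t in taxa if median(r[t] for r in rel.values()) >= threshold}
    return {s: {t: n for t, n in counts.items() if t in keep}
            for s, counts in table.items()}

# Made-up counts: "B" is present in only one of three samples.
toy = {
    "S1": {"A": 9995, "B": 0, "C": 5},
    "S2": {"A": 9995, "B": 0, "C": 5},
    "S3": {"A": 9985, "B": 10, "C": 5},
}
filtered = filter_by_median_abundance(toy)
print(filtered["S3"])  # {'A': 9985, 'C': 5}
```

Running the function once on the 16S table and once on the shotgun table reproduces the "separately" part of the description.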

Beta diversity after filtering

Pipeline 1

Figure 13- PCoA plots after filtering by relative abundance (Bray-Curtis)
Figure 14- PCoA plots after filtering by relative abundance (Jaccard)

Pipeline 2

Figure 15- PCoA plots after filtering by relative abundance (Bray-Curtis)
Figure 16- PCoA plots after filtering by relative abundance (Jaccard)

Pipeline 3

Figure 17- PCoA plots after filtering by relative abundance (Bray-Curtis)
Figure 18- PCoA plots after filtering by relative abundance (Jaccard)

Variance explained

The percentage of variance explained by subject and by dataset was calculated for each distance matrix across all pipelines. Table 7 shows the R² values for subject after the various data processing steps, and Table 8 shows the values for sequencing technique.

Table 7- Variance explained by subject in each pipeline
Table 8- Variance explained by sequencing technique in each pipeline
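
The text does not name the method behind the variance-explained values. One common choice for distance matrices is the PERMANOVA partition of sums of squares (as in `adonis2` from the R package vegan), so the sketch below assumes that approach; the function name and the toy distances are hypothetical.

```python
def r_squared(d, groups):
    """PERMANOVA-style R2: 1 - SS_within / SS_total for one grouping factor.

    `d` is a symmetric distance matrix (list of lists) and `groups` gives
    the group label of each sample, in the same order as the matrix.
    """
    n = len(groups)
    ss_total = sum(d[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for g in set(groups):
        idx = [i for i, label in enumerate(groups) if label == g]
        pair_sum = sum(d[i][j] ** 2 for a, i in enumerate(idx) for j in idx[a + 1:])
        ss_within += pair_sum / len(idx)
    return 1 - ss_within / ss_total

# Made-up distances: two subjects, each sequenced with both methods.
# Order: S1_16S, S1_shotgun, S2_16S, S2_shotgun.
d = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.1],
    [0.9, 0.9, 0.1, 0.0],
]
r2_subject = r_squared(d, ["S1", "S1", "S2", "S2"])
r2_method = r_squared(d, ["16S", "shotgun", "16S", "shotgun"])
```

In this toy matrix samples cluster by subject, so the subject factor explains almost all of the variation and the sequencing method almost none.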

Data integration

Data from the three pipelines were merged into a single dataset, and beta diversity was calculated.

Figure 19- PCoA plots all pipelines (Jaccard)
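
How the tables were merged is not described. A minimal sketch, assuming each pipeline yields a sample-by-taxon count table, that taxa are matched by name, and that each sample is kept as a separate row per pipeline; the function name, the `sample|pipeline` key format, and the counts are all hypothetical.

```python
def merge_pipelines(tables):
    """Stack per-pipeline tables over the union of taxa, filling gaps with 0.

    `tables` maps pipeline name -> {sample ID: {taxon: count}}.
    """
    taxa = sorted({t for table in tables.values()
                   for counts in table.values() for t in counts})
    merged = {}
    for pipeline, table in tables.items():
        for sample, counts in table.items():
            # One row per sample and pipeline, aligned to the same taxa.
            merged[f"{sample}|{pipeline}"] = {t: counts.get(t, 0) for t in taxa}
    return merged

# Made-up counts for one sample processed by two pipelines.
merged = merge_pipelines({
    "pipeline1": {"S1": {"Bacteroides": 10, "Blautia": 2}},
    "pipeline2": {"S1": {"Bacteroides": 8, "Roseburia": 1}},
})
print(merged["S1|pipeline2"])  # {'Bacteroides': 8, 'Blautia': 0, 'Roseburia': 1}
```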

The merged dataset was then rarefied to 10,000 reads per sample, and beta diversity was recalculated.

Figure 20- PCoA plots after integration and rarefaction