The amplicon library of the BTEX enrichment was processed using qiime2 pipeline with a pre-trained Silva v132 classifier and dada2 actual-sequence-variant denoising. The results clearly revealed a community almost entirely dominated by an Arhodomonas species.
OTU.table <- read.csv("rich.OTU.table.BTEX.csv", header = TRUE)
taxonomy <- data.frame(do.call('rbind', strsplit(as.character(OTU.table$taxonomy),';',fixed=TRUE)))
colnames(taxonomy) <- c("kingdom","phylum","class","order","family","genus","species")
#OTU.table <- cbind(taxonomy, OTU.table[-1])
kable(OTU.table[c(1:3)])
| taxonomy | seq.count | relative.abund. |
|---|---|---|
| D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Nitrococcales;D_4__Nitrococcaceae;D_5__Arhodomonas;D_6__uncultured bacterium | 453177 | 0.9931036 |
| D_0__Bacteria;D_1__Firmicutes;D_2__Clostridia;D_3__Clostridiales;D_4__Clostridiaceae 1;D_5__Clostridium sensu stricto 13;D_6__uncultured bacterium | 1276 | 0.0027963 |
| D_0__Bacteria;D_1__Firmicutes;D_2__Bacilli;;;;; | 591 | 0.0012951 |
| D_0__Bacteria;D_1__Firmicutes;D_2__Clostridia;D_3__Clostridiales;D_4__Peptococcaceae;D_5__Candidatus Dichloromethanomonas;D_6__uncultured bacterium | 201 | 0.0004405 |
| D_0__Bacteria;D_1__Firmicutes;D_2__Bacilli;D_3__Bacillales;D_4__Bacillaceae;D_5__Bacillus;D_6__uncultured compost bacterium | 191 | 0.0004186 |
| D_0__Bacteria;D_1__Proteobacteria;D_2__Deltaproteobacteria;D_3__Desulfovibrionales;D_4__Desulfovibrionaceae;D_5__Desulfovibrio;D_6__Desulfovibrio desulfuricans | 101 | 0.0002213 |
| D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Pseudomonadales;D_4__Pseudomonadaceae;D_5__Pseudomonas;D_6__Pseudomonas fluorescens | 93 | 0.0002038 |
| D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Enterobacteriales;D_4__Enterobacteriaceae;;; | 90 | 0.0001972 |
| D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Nitrococcales;D_4__Nitrococcaceae;D_5__Arhodomonas;D_6__uncultured bacterium | 78 | 0.0001709 |
| D_0__Bacteria;D_1__Firmicutes;D_2__Bacilli;D_3__Bacillales;D_4__Bacillaceae;D_5__Bacillus;; | 75 | 0.0001644 |
| D_0__Bacteria;D_1__Thermotogae;D_2__Thermotogae;D_3__Petrotogales;D_4__Petrotogaceae;D_5__SC103;D_6__metagenome | 45 | 0.0000986 |
| D_0__Bacteria;D_1__Cloacimonetes;D_2__Cloacimonadia;D_3__Cloacimonadales;;;; | 43 | 0.0000942 |
| D_0__Bacteria;D_1__Firmicutes;D_2__Clostridia;D_3__Clostridiales;D_4__Peptococcaceae;D_5__uncultured;; | 39 | 0.0000855 |
| D_0__Bacteria;D_1__Bacteroidetes;D_2__Bacteroidia;D_3__Bacteroidales;D_4__Rikenellaceae;D_5__DMER64;D_6__uncultured bacterium | 37 | 0.0000811 |
| D_0__Bacteria;D_1__Spirochaetes;D_2__Spirochaetia;D_3__Spirochaetales;D_4__Spirochaetaceae;D_5__uncultured;D_6__uncultured bacterium | 36 | 0.0000789 |
| D_0__Bacteria;D_1__Bacteroidetes;D_2__Bacteroidia;D_3__Bacteroidales;D_4__Bacteroidetes vadinHA17;D_5__uncultured bacterium;D_6__uncultured bacterium | 20 | 0.0000438 |
| D_0__Bacteria;D_1__Bacteroidetes;D_2__Rhodothermia;D_3__Balneolales;D_4__Balneolaceae;D_5__uncultured;D_6__uncultured bacterium | 19 | 0.0000416 |
| D_0__Bacteria;D_1__Chloroflexi;D_2__Anaerolineae;D_3__Anaerolineales;D_4__Anaerolineaceae;D_5__Longilinea;D_6__uncultured bacterium | 18 | 0.0000394 |
| D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Xanthomonadales;D_4__Xanthomonadaceae;D_5__Stenotrophomonas;; | 18 | 0.0000394 |
| D_0__Bacteria;D_1__Proteobacteria;D_2__Deltaproteobacteria;D_3__Syntrophobacterales;D_4__Syntrophaceae;D_5__Smithella;D_6__uncultured bacterium | 17 | 0.0000373 |
| D_0__Bacteria;D_1__Chlamydiae;D_2__LD1-PA32;D_3__uncultured bacterium;D_4__uncultured bacterium;D_5__uncultured bacterium;D_6__uncultured bacterium | 17 | 0.0000373 |
| D_0__Archaea;D_1__Euryarchaeota;D_2__Methanobacteria;D_3__Methanobacteriales;D_4__Methanobacteriaceae;D_5__Methanobrevibacter;; | 16 | 0.0000351 |
| D_0__Bacteria;D_1__Synergistetes;D_2__Synergistia;D_3__Synergistales;D_4__Synergistaceae;D_5__Lactivibrio;D_6__uncultured bacterium | 15 | 0.0000329 |
| D_0__Bacteria;D_1__Bacteroidetes;D_2__Bacteroidia;D_3__Bacteroidales;;;; | 15 | 0.0000329 |
| D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Oceanospirillales;D_4__Halomonadaceae;D_5__Chromohalobacter;; | 14 | 0.0000307 |
| D_0__Bacteria;D_1__Cloacimonetes;D_2__Cloacimonadia;D_3__Cloacimonadales;;;; | 12 | 0.0000263 |
| D_0__Archaea;D_1__Euryarchaeota;D_2__Methanomicrobia;D_3__Methanomicrobiales;D_4__Methanocorpusculaceae;D_5__Methanocorpusculum;D_6__uncultured archaeon | 10 | 0.0000219 |
| D_0__Bacteria;D_1__Proteobacteria;D_2__Deltaproteobacteria;D_3__Myxococcales;D_4__Phaselicystidaceae;D_5__Phaselicystis;D_6__uncultured bacterium | 10 | 0.0000219 |
| D_0__Bacteria;D_1__Spirochaetes;D_2__Spirochaetia;D_3__Spirochaetales;D_4__Spirochaetaceae;D_5__uncultured;D_6__uncultured bacterium | 9 | 0.0000197 |
| D_0__Bacteria;D_1__Firmicutes;D_2__Clostridia;D_3__DTU014;D_4__uncultured bacterium;D_5__uncultured bacterium;D_6__uncultured bacterium | 8 | 0.0000175 |
| D_0__Bacteria;D_1__Hydrogenedentes;D_2__Hydrogenedentia;D_3__Hydrogenedentiales;D_4__Hydrogenedensaceae;D_5__uncultured bacterium;D_6__uncultured bacterium | 7 | 0.0000153 |
| D_0__Bacteria;;;;;;; | 7 | 0.0000153 |
| D_0__Bacteria;D_1__Atribacteria;D_2__Caldatribacteriia;D_3__Caldatribacteriales;D_4__Caldatribacteriaceae;D_5__Candidatus Caldatribacterium;D_6__uncultured bacterium | 6 | 0.0000131 |
| D_0__Bacteria;D_1__Actinobacteria;D_2__Thermoleophilia;D_3__Gaiellales;D_4__uncultured;D_5__metagenome;D_6__metagenome | 4 | 0.0000088 |
| D_0__Bacteria;D_1__Actinobacteria;D_2__Actinobacteria;D_3__Micrococcales;D_4__Cellulomonadaceae;D_5__Cellulomonas;D_6__uncultured bacterium | 3 | 0.0000066 |
| D_0__Bacteria;D_1__Firmicutes;D_2__Clostridia;D_3__Clostridiales;D_4__Christensenellaceae;D_5__Christensenellaceae R-7 group;; | 3 | 0.0000066 |
| D_0__Bacteria;D_1__Verrucomicrobia;D_2__Verrucomicrobiae;D_3__Pedosphaerales;D_4__Pedosphaeraceae;;; | 3 | 0.0000066 |
A number of other minor community members are possibly present, although at very low relative abundances. Indeed, the second most abundant community member is present in the amplicon library at 0.3%. This creates significant challenge for shotgun metagenomic analysis, as the vast majority of sequencing reads will be consumed by the dominant species.
We can do a very crude assessment. Assume average prokaryotic genome is 4mb. You need about 10x coverage to confidently assemble. That means, lets say ~40mb of sequencing per genome present. If we assume that the 16S amplicon library accurately represents the community, the second-most abundant species represents 0.3% of the total genomic DNA present. the amount of sequencing required is roughly equivalent to that which would be required to sequence 40mb / 0.003 = 13,333mb or 13.3 gbases (to reach 10X coverage).
A full MiSeq run yields a maximum of roughly 10 gbases. This will yield very approximately 7X coverage of the second-most abundant species; not enough to assemble with any confidence but enough to send down an assembly-independent pipeline.
Note that there is no guarantee that this is the actual sequencing requirement (the amplicon library is imprecise and shotgun sequencing is unpredictable).
https://www.nature.com/articles/srep01968 presents a more formalized approach:
They present an equation for estimating, assuming genus level uniqueness of the target species:
Sequencing required (A) = 0.0706 * (proportion) ^ -0.9959 / 2 A = 0.0706 * (0.003) ^ -0.9959 / 2
This method estimates 11.49 gbases required for 10x coverage, very similar estimate.
The actual metagenome yielded 4.7 gbases following QC (detailed below), thus we can expect ~4X coverage of the second-most abundant community member and thus partial assembly.
Shotgun sequencing of the BTEX enrichment yielded 6.7 Gbases of sequence data.
This was trimmed using Trimmomatic (default settings, sliding window of 6:25). This resulted in 4.7 Gbases. Reads were assembled using SPAdes v 3.12, with “careful” setting and error correction and a minimum contig size of 500bp. Assessing the assembly in QUAST:
| Statistics.without.reference | Kuwait2018.mgenome.SPAdes.con. |
|---|---|
| # contigs | 1066.00 |
| # contigs (>= 0 bp) | 1066.00 |
| # contigs (>= 1000 bp) | 446.00 |
| # contigs (>= 10000 bp) | 121.00 |
| # contigs (>= 100000 bp) | 1.00 |
| # contigs (>= 1000000 bp) | 0.00 |
| Largest contig | 179935.00 |
| Total length | 5104356.00 |
| Total length (>= 0 bp) | 5104356.00 |
| Total length (>= 1000 bp) | 4695727.00 |
| Total length (>= 10000 bp) | 3664180.00 |
| Total length (>= 100000 bp) | 179935.00 |
| Total length (>= 1000000 bp) | 0.00 |
| N50 | 24189.00 |
| N75 | 8250.00 |
| L50 | 47.00 |
| L75 | 140.00 |
| GC (%) | 67.96 |
| Mismatches | NA |
| # N’s | 8511.00 |
| # N’s per 100 kbp | 166.74 |
| Predicted genes | NA |
| # predicted genes (unique) | 4909.00 |
| # predicted genes (>= 0 bp) | 4910.00 |
| # predicted genes (>= 300 bp) | 4273.00 |
| # predicted genes (>= 1500 bp) | 610.00 |
| # predicted genes (>= 3000 bp) | 67.00 |
Total assembly length is ~5.1mb
Total genome size of the only existing Arhodomonas genome is 3.96mb, implying an extra 1mb of assembly (assuming that this Arhodomonas is similar in that respect)
GC content is also similar to that found in studied Arhodomonas genomes, ~70%:
The assembly was binned using MaxBin2 v2.2.4. This yielded two bins:
Bins: 2 Input Contigs: 1066 Binned Contigs: 368 (34.5%) Unbinned Contigs: 698 (65.5%) Contigs Too Short: 620 (58.2%) Summed Length of Binned Contigs: 4204219 (82.4%) Summed Length of Unbinned Contigs: 900137 (17.6%) Summed Length of Short Contigs: 408629 (8.0%)
| Bin.Name. | Read.Coverage. | GC.Content. | Number.of.Contigs. | Total.Contig.Length. |
|---|---|---|---|---|
| Bin.001.fasta | 0.925 | 0.698 | 245 | 3528385 |
| Bin.002.fasta | 0.028 | 0.674 | 123 | 675834 |
Given the similarity in GC content, what we already know about the community make-up, and the total length of the combined bins (4.2mb) compared to existing Arhodomonas genome (3.96mb), it is assumed that these two bins together represent the Arhodomonas
The genome of A. aqeuoli was downloaded from JGI IMG and annotate using prokka v1.13.3 (https://github.com/tseemann/prokka) under default recommended settings. The combined bin was annotated using the same pipeline. Initial whole genome alignment using Mauve yielded a scattered picture:
Iterative contig realignment made the contiguous regions more evident:
There remains approximately 50kb of the binned genome which is not accounted for in the reference genome. These contigs will be recovered and their functional annotations explored. These might represent novel genomic material in the Arhodomonas or they might represent mis-binned contigs.
The non-aligned contigs are named in this file and will be examined later in the pipeline:
| V2 | V3 | V4 | V5 | V6 |
|---|---|---|---|---|
| BTEX1_347 | chromosome | forward | 3718756 | 3722692 |
| BTEX1_290 | chromosome | complement | 3722693 | 3728982 |
| BTEX1_76 | chromosome | forward | 3728983 | 3739463 |
| BTEX1_337 | chromosome | forward | 3739464 | 3740876 |
| BTEX1_105 | chromosome | complement | 3740877 | 3741996 |
| BTEX1_5 | chromosome | forward | 3741997 | 3744665 |
| This is only | the first fi | ve of 139 con | tigs which | do not seem to align with the reference genome. These must derive from either: |
FastANI estimates average nucleotide identity between genomes by calculating pairwise nucleotide identity between homologous genes.
Results: 97.67% identity, out of the total 1218 sequence fragments in the BTEX bin (BTEX1), 1044 were determined to be orthologous to A. aquaeoli.
This is regarded as within-species level of identity based on recent observations described in https://www.biorxiv.org/content/early/2017/11/27/225342
ANI scores between pairs of 90,000 bacterial genomes show that an ANI above 96% is consistent with genomes from the same species
This alignment visualization is based on the FastANI calculations and shows the reference, A. aquaeoli (top) vs. the BTEX enrichment bin (bottom)
Conducted at MiGA web server:
The raw reads were mapped back with very strict requirements to the scaffolds of BTEX1. Reads which did not map back might represent two categories of genomic material: 1 BTEX1 genomic material which is not captured during the binning process and is also not found in the reference Arhodoomonas genome. 2 genomic material from other members of the community.
BowTie2 default parameters are somewhat permissive. Our read-length is maximum 250 bp. The default minimum score is -0.6 + -0.6 * L(ength), or -150.6. Mismatches between high quality reads impose a -6 penalty each. Thus, this default setting allows up to ~25 high quality mismatches, or ~10%. Given the knowledge that the other community members are unlikely to be closely related, this is acceptable.
Thus, the bowtie2 call used was:
bowtie2 -x (index) -1 (forward reads) -2 (reverse reads) –score-min L,-0.6,-0.6 –un-conc (paired reads that fail to map condordantly)
The unmapped reads are expected to be quite a minimal set, representing ~0.003 of the 4.8gbases of trimmed sequencing effort, or 10.5 mbases. However, there are likely still scaffolds representing un-binned parts of the BTEX1 genome also.
paired-read mapping leads to pairs that map only once to BTEX1 ending up in the unmapped sequence bin. This is good for assembly-based analysis, where the paired-end information is essential to good assembly and the contaminating BTEX1 pairs will not have any impact on the assembly. However, this is potentilly very bad for assembly-free pipelines, as BTEX1 reads will still be annotated. The question; How many more reads make it into the unmapped paired data set vs the unmapped unpaired data sets. The difference represents the amount of BTEX1 contamination.
BowTie2 results, paired mapping:
Forward reads: 11154399 reads; of these: 11154399 (100.00%) were unpaired; of these: 1479305 (13.26%) aligned 0 times 8963463 (80.36%) aligned exactly 1 time 711631 (6.38%) aligned >1 times 86.74% overall alignment rate
Reverse reads: 11154399 reads; of these: 11154399 (100.00%) were unpaired; of these: 1468625 (13.17%) aligned 0 times 8971133 (80.43%) aligned exactly 1 time 714641 (6.41%) aligned >1 times 86.83% overall alignment rate
Goal of MG-RAST analysis:
Two data sets:
Annotations which are present at a ratio of at least 0.5 unmapped/full metagenome and are present with less than 500 reads are considered.
Reads processed at MG-RAST without dereplication and without additional quality trimming (data has already been trimmed via Trimmomatic as described above)
Create tables of Silva SSU and RDP non-Arhodomonas taxonomy
Create table of pathways over-represented in the unmapped annotations via KEGG, COG, and Subsystems. Focus on a handful of stories: * putrescine uptake * phosphotransferase modules for carbohydrate uptake
Note that the 16S rRNA gene was NOT included in the BTEX1 genome bin, thus these reads are present in the unmapped read set. Also note that this is not a “classifier” -based approach, it only reports the best hit, it does not present and resolve conflicts.
Reads that map to the RDP database under the following stringency thresholds: e-value = -30, %identity = 70%, length = 50, minimum count = 2
RDP30 <- read.csv("MG-RAST/final.RDP.e30.tsv", header=TRUE, sep="\t")
RDP30 <- RDP30[-8]
RDP30 <- RDP30[order(-RDP30$BTEX1.unmapped.FR.fastq),]
RDP30$BTEX1.unmapped.FR.fastq <- RDP30$BTEX1.unmapped.FR.fastq / sum(RDP30$BTEX1.unmapped.FR.fastq)
RDP30$Kuwait.1810.mgenome.fastq <- RDP30$Kuwait.1810.mgenome.fastq / sum(RDP30$Kuwait.1810.mgenome.fastq)
colnames(RDP30) <- c("domain","phylum","class","order","family","genus","unmapped","full.mgenome")
RDP30 <- RDP30[-c(1:4),]
write.csv(RDP30, "RDP30.csv")
kable(RDP30)
| domain | phylum | class | order | family | genus | unmapped | full.mgenome | |
|---|---|---|---|---|---|---|---|---|
| 4 | Bacteria | Firmicutes | Bacilli | Bacillales | Bacillaceae | Bacillus | 0.0006074 | 0.0003704 |
| 8 | Bacteria | Firmicutes | Bacilli | Bacillales | Bacillaceae | Geobacillus | 0.0002916 | 0.0003010 |
| 9 | Bacteria | Firmicutes | Bacilli | Bacillales | Bacillaceae | Marinococcus | 0.0001701 | 0.0002315 |
| 2 | Bacteria | Firmicutes | Bacilli | Bacillales | Alicyclobacillaceae | Alicyclobacillus | 0.0001458 | 0.0001158 |
| 5 | Bacteria | Firmicutes | Clostridia | Clostridiales | Peptococcaceae | Desulfotomaculum | 0.0000972 | 0.0000000 |
| 16 | Bacteria | Firmicutes | Bacilli | Bacillales | Bacillaceae | Virgibacillus | 0.0000486 | 0.0000695 |
| 1 | Bacteria | Firmicutes | Clostridia | Clostridiales | Eubacteriaceae | Acetobacterium | 0.0000000 | 0.0000695 |
| 6 | Bacteria | Proteobacteria | Gammaproteobacteria | Chromatiales | Ectothiorhodospiraceae | Ectothiorhodospira | 0.0000000 | 0.0000926 |
| 7 | Bacteria | Firmicutes | Clostridia | Clostridiales | Eubacteriaceae | Eubacterium | 0.0000000 | 0.0000695 |
| 10 | Bacteria | Proteobacteria | Gammaproteobacteria | unclassified (derived from Gammaproteobacteria) | unclassified (derived from Gammaproteobacteria) | Methylonatrum | 0.0000000 | 0.0000463 |
| 11 | Bacteria | Proteobacteria | Gammaproteobacteria | Chromatiales | Ectothiorhodospiraceae | Nitrococcus | 0.0000000 | 0.0006020 |
| 12 | Bacteria | Firmicutes | Bacilli | Bacillales | Paenibacillaceae | Paenibacillus | 0.0000000 | 0.0000926 |
| 13 | Bacteria | Proteobacteria | Gammaproteobacteria | Pseudomonadales | Moraxellaceae | Psychrobacter | 0.0000000 | 0.0001158 |
| 14 | Bacteria | Firmicutes | Bacilli | Bacillales | Sporolactobacillaceae | Sporolactobacillus | 0.0000000 | 0.0000695 |
| 15 | Bacteria | Proteobacteria | Gammaproteobacteria | Chromatiales | Ectothiorhodospiraceae | Thioalkalivibrio | 0.0000000 | 0.0000695 |
| 19 | Bacteria | Proteobacteria | unclassified (derived from Proteobacteria) | unclassified (derived from Proteobacteria) | unclassified (derived from Proteobacteria) | unclassified (derived from Proteobacteria) | 0.0000000 | 0.0000695 |
Conducted using the same specificity settings as the RDP
SILVASSU30 <- read.csv("MG-RAST/final.SilvaSSU.e30.tsv", header=TRUE, sep="\t")
SILVASSU30 <- SILVASSU30[-8]
SILVASSU30 <- SILVASSU30[order(-SILVASSU30$BTEX1.unmapped.FR.fastq),]
SILVASSU30$BTEX1.unmapped.FR.fastq <- SILVASSU30$BTEX1.unmapped.FR.fastq / sum(SILVASSU30$BTEX1.unmapped.FR.fastq)
SILVASSU30$Kuwait.1810.mgenome.fastq <- SILVASSU30$Kuwait.1810.mgenome.fastq / sum(SILVASSU30$Kuwait.1810.mgenome.fastq)
colnames(SILVASSU30) <- c("domain","phylum","class","order","family","genus","unmapped","full.mgenome")
SILVASSU30 <- SILVASSU30[-c(1:4),]
kable(SILVASSU30)
| domain | phylum | class | order | family | genus | unmapped | full.mgenome | |
|---|---|---|---|---|---|---|---|---|
| 4 | Bacteria | Firmicutes | Bacilli | Bacillales | Bacillaceae | Bacillus | 0.0006802 | 0.0004398 |
| 11 | Bacteria | Firmicutes | Bacilli | Bacillales | Bacillaceae | Geobacillus | 0.0002915 | 0.0003009 |
| 12 | Bacteria | Firmicutes | Bacilli | Bacillales | Bacillaceae | Marinococcus | 0.0001700 | 0.0002315 |
| 2 | Bacteria | Firmicutes | Bacilli | Bacillales | Alicyclobacillaceae | Alicyclobacillus | 0.0001457 | 0.0001157 |
| 7 | Bacteria | Firmicutes | Clostridia | Clostridiales | Peptococcaceae | Desulfotomaculum | 0.0000972 | 0.0000000 |
| 9 | Bacteria | Firmicutes | Bacilli | Lactobacillales | Enterococcaceae | Enterococcus | 0.0000729 | 0.0000000 |
| 6 | Bacteria | Firmicutes | Clostridia | Clostridiales | Peptococcaceae | Desulfosporosinus | 0.0000486 | 0.0000000 |
| 19 | Bacteria | Firmicutes | Bacilli | Bacillales | Bacillaceae | Virgibacillus | 0.0000486 | 0.0000694 |
| 1 | Bacteria | Firmicutes | Clostridia | Clostridiales | Eubacteriaceae | Acetobacterium | 0.0000000 | 0.0000694 |
| 5 | Eukaryota | Nematoda | Chromadorea | Rhabditida | Rhabditidae | Caenorhabditis | 0.0000000 | 0.0000463 |
| 8 | Bacteria | Proteobacteria | Gammaproteobacteria | Chromatiales | Ectothiorhodospiraceae | Ectothiorhodospira | 0.0000000 | 0.0000926 |
| 10 | Bacteria | Firmicutes | Clostridia | Clostridiales | Eubacteriaceae | Eubacterium | 0.0000000 | 0.0000694 |
| 13 | Bacteria | Proteobacteria | Gammaproteobacteria | unclassified (derived from Gammaproteobacteria) | unclassified (derived from Gammaproteobacteria) | Methylonatrum | 0.0000000 | 0.0000463 |
| 14 | Bacteria | Proteobacteria | Gammaproteobacteria | Chromatiales | Ectothiorhodospiraceae | Nitrococcus | 0.0000000 | 0.0006018 |
| 15 | Bacteria | Firmicutes | Bacilli | Bacillales | Paenibacillaceae | Paenibacillus | 0.0000000 | 0.0001157 |
| 16 | Bacteria | Proteobacteria | Gammaproteobacteria | Pseudomonadales | Moraxellaceae | Psychrobacter | 0.0000000 | 0.0001157 |
| 17 | Bacteria | Firmicutes | Bacilli | Bacillales | Sporolactobacillaceae | Sporolactobacillus | 0.0000000 | 0.0000694 |
| 18 | Bacteria | Proteobacteria | Gammaproteobacteria | Chromatiales | Ectothiorhodospiraceae | Thioalkalivibrio | 0.0000000 | 0.0002315 |
| 22 | Bacteria | Proteobacteria | unclassified (derived from Proteobacteria) | unclassified (derived from Proteobacteria) | unclassified (derived from Proteobacteria) | unclassified (derived from Proteobacteria) | 0.0000000 | 0.0000694 |
library(ggplot2)
library(RColorBrewer)
SILVASSU30.c <- SILVASSU30[c(1:8),-8]
SILVASSU30.c$concat <- paste(SILVASSU30.c$order,";",SILVASSU30.c$family,";",SILVASSU30.c$genus," = (",round(SILVASSU30.c$unmapped*100, 4),"%)", sep="")
write.csv(SILVASSU30.c, "SILVASSU30.c.csv")
par(mar=c(3,3,7,7))
plot <- pie(SILVASSU30.c$unmapped, labels = SILVASSU30.c$concat, main="non-Arhodomonas 16S rRNA gene fragments",
cex = 0.6, col = brewer.pal(n = 8, name = "Dark2"))
plot
## NULL
Not an ideal way to represent the data but it’s something
library(dplyr)
RDP30 <- dplyr::group_by(RDP30, order)
summary.RDP <- dplyr::summarise(RDP30, RDP = sum(unmapped))
summary.RDP$order <- as.character(summary.RDP$order)
SILVASSU30 <- dplyr::group_by(SILVASSU30, order)
summary.silva <- dplyr::summarise(SILVASSU30, SILVASSU30 = sum(unmapped))
summary.silva$order <- as.character(summary.silva$order)
summary.16S.order <- dplyr::full_join(summary.RDP, summary.silva, by="order")
summary.16S.order <- summary.16S.order[order(-summary.16S.order$SILVASSU30),]
colnames(summary.16S.order) <- c("order","RDP","Silva.SSU")
kable(summary.16S.order)
| order | RDP | Silva.SSU |
|---|---|---|
| Bacillales | 0.0012634 | 0.0013360 |
| Clostridiales | 0.0000972 | 0.0001457 |
| Lactobacillales | NA | 0.0000729 |
| Chromatiales | 0.0000000 | 0.0000000 |
| Pseudomonadales | 0.0000000 | 0.0000000 |
| unclassified (derived from Gammaproteobacteria) | 0.0000000 | 0.0000000 |
| unclassified (derived from Proteobacteria) | 0.0000000 | 0.0000000 |
| Rhabditida | NA | 0.0000000 |
The results from the two databases are very close to one another, indicating the presence of a handful of genomes from orders Bacillales and Clostriales. This is largely in accordance with the 16S amplicon library results. The 16S library suggested ~0.0030% other organisms, while the metagenome suggests ~0.0014% other organisms. At most, we should expect to find ~20 reads from any given fragment of genomes of minor community members.
This table is the result of annotation of all reads with the Subsytems database. Annotations are displayed using the stringency measures: e-value = -10 %identity minimum = 60% minimum length of match = 15 minimum counts = 2 maximum counts = 500
(relative counts and min/max read counts were applied in Excel)
subsystems <- read.csv("MG-RAST/final.subsystems.function.tsv", sep="\t", header=TRUE)
subsystems <- subsystems[-6]
subsystems <- subsystems[order(-subsystems$rel),]
subsystems <- subsystems[subsystems$rel > 0.5,] #remove all relative fractions 50% and below
kable(head(subsystems))
| level1 | level2 | level3 | function. | BTEX1.unmapped.FR.fastq | Kuwait.1810.mgenome.fastq | test | rel | |
|---|---|---|---|---|---|---|---|---|
| 20 | RNA Metabolism | RNA processing and modification | tRNA_modification_Archaea | Asparaginyl-tRNA synthetase (EC 6.1.1.22) | 6 | 3 | 6 | 2.000000 |
| 21 | Carbohydrates | Central carbohydrate metabolism | Pyruvate_metabolism_II:_acetyl-CoA,_acetogenesis_from_pyruvate | Pyruvate dehydrogenase E1 component alpha subunit (EC 1.2.4.1) | 4 | 2 | 4 | 2.000000 |
| 22 | Carbohydrates | One-carbon Metabolism | Serine-glyoxylate_cycle | Phosphoenolpyruvate carboxykinase [ATP] (EC 4.1.1.49) | 9 | 5 | 9 | 1.800000 |
| 23 | Cell Wall and Capsule | Capsular and extracellular polysacchrides | Alginate_metabolism | Probable poly(beta-D-mannuronate) O-acetylase (EC 2.3.1.-) | 7 | 4 | 7 | 1.750000 |
| 24 | Membrane Transport | NULL | Tricarboxylate_transport_system | Tricarboxylate transport protein TctC | 7 | 4 | 7 | 1.750000 |
| 25 | Clustering-based subsystems | NULL | Cluster_containing_CofD-like_protein_and_co-occuring_with_DNA_repair | FIG002813: LPPG:FO 2-phospho-L-lactate transferase like, CofD-like | 5 | 3 | 5 | 1.666667 |
This is a small subset of the subsystems analysis results
print(paste("There are",nrow(subsystems),"functions annoted by the subsystems database which meet the criteria for being present primarily in non-Arhodomonas menbers of the community"))
## [1] "There are 227 functions annoted by the subsystems database which meet the criteria for being present primarily in non-Arhodomonas menbers of the community"
subsystems.level3 <- dplyr::group_by(subsystems, level3)
subsystems.level3 <- dplyr::summarise(subsystems.level3, subsystems.path = n())
subsystems.level3 <- subsystems.level3[order(-subsystems.level3$subsystems.path),]
kable(subsystems.level3)
| level3 | subsystems.path |
|---|---|
| Sporulation_gene_orphans | 6 |
| Purine_conversions | 5 |
| Sugar_utilization_in_Thermotogales | 5 |
| D-Galacturonate_and_D-Glucuronate_Utilization | 4 |
| Heme,_hemin_uptake_and_utilization_systems_in_GramPositives | 4 |
| L-rhamnose_utilization | 4 |
| Mannitol_Utilization | 4 |
| COG3533 | 3 |
| Inositol_catabolism | 3 |
| Menaquinone_and_Phylloquinone_Biosynthesis | 3 |
| Phosphate_metabolism | 3 |
| Proteolysis_in_bacteria,_ATP-dependent | 3 |
| SigmaB_stress_responce_regulation | 3 |
| Trehalose_Uptake_and_Utilization | 3 |
| Bacterial_hemoglobins | 2 |
| Calvin-Benson_cycle | 2 |
| CBSS-257314.1.peg.676 | 2 |
| CBSS-393133.3.peg.2787 | 2 |
| Control_of_cell_elongation_-_division_cycle_in_Bacilli | 2 |
| DNA_processing_cluster | 2 |
| DNA_repair,_bacterial | 2 |
| ECF_class_transporters | 2 |
| Exopolysaccharide_Biosynthesis | 2 |
| Experimental_-_Histidine_Degradation | 2 |
| (GlcNAc)2_Catabolic_Operon | 2 |
| Oxidative_stress | 2 |
| Polyglycerolphosphate_lipoteichoic_acid_biosynthesis | 2 |
| Rhamnose_containing_glycans | 2 |
| Serine-glyoxylate_cycle | 2 |
| Tryptophan_catabolism | 2 |
| Xylose_utilization | 2 |
| 4-Hydroxyphenylacetic_acid_catabolic_pathway | 1 |
| Acetone_Butanol_Ethanol_Synthesis | 1 |
| Adenosyl_nucleosidases | 1 |
| Alginate_metabolism | 1 |
| Arginine_and_Ornithine_Degradation | 1 |
| Arsenic_resistance | 1 |
| At4g10620_At3g57180_At3g47450 | 1 |
| At5g63420 | 1 |
| Bacillibactin_Siderophore | 1 |
| Bacillus_biofilm_matrix_protein_component_TasA_and_homologs | 1 |
| Bacitracin_Stress_Response | 1 |
| Bacterial_Chemotaxis | 1 |
| Beta-Glucoside_Metabolism | 1 |
| Beta-lactamase_cluster_in_Streptococcus | 1 |
| Biogenesis_of_c-type_cytochromes | 1 |
| Biotin_biosynthesis | 1 |
| Broadly_distributed_proteins_not_in_subsystems | 1 |
| Cadmium_resistance | 1 |
| Carbon_Starvation | 1 |
| Carboxysome | 1 |
| CBSS-176280.1.peg.1561 | 1 |
| CBSS-176299.4.peg.1292 | 1 |
| CBSS-193567.1.peg.90 | 1 |
| CBSS-262719.3.peg.410 | 1 |
| CBSS-269801.1.peg.1715 | 1 |
| CBSS-288681.3.peg.1039 | 1 |
| CBSS-393121.3.peg.1913 | 1 |
| CBSS-393121.3.peg.2760 | 1 |
| CBSS-393130.3.peg.129 | 1 |
| Chitin_and_N-acetylglucosamine_utilization | 1 |
| Cluster_containing_CofD-like_protein_and_co-occuring_with_DNA_repair | 1 |
| Cluster_with_phosphopentomutase_paralog | 1 |
| Coenzyme_A_Biosynthesis | 1 |
| Coenzyme_PQQ_synthesis | 1 |
| COG0451 | 1 |
| COG0523 | 1 |
| COG2363 | 1 |
| COG2509 | 1 |
| COG3146_experimental | 1 |
| Competence_or_DNA_damage-inducible_protein_CinA_and_related_protein_families | 1 |
| Conserved_gene_cluster_associated_with_Met-tRNA_formyltransferase | 1 |
| Cysteine_Biosynthesis | 1 |
| Dehydrogenase_complexes | 1 |
| De_Novo_Purine_Biosynthesis | 1 |
| De_Novo_Pyrimidine_Synthesis | 1 |
| Deoxyribose_and_Deoxynucleoside_Catabolism | 1 |
| D-galactarate,_D-glucarate_and_D-glycerate_catabolism | 1 |
| Dihydroxyacetone_kinases | 1 |
| Dipicolinate_Synthesis | 1 |
| DMT_transporter | 1 |
| DNA_repair,_bacterial_DinG_and_relatives | 1 |
| DNA_Repair_Base_Excision | 1 |
| DNA_replication_cluster_1 | 1 |
| D-Tagatose_and_Galactitol_Utilization | 1 |
| EcsAB_transporter_affecting_expression_and_secretion_of_secretory_preproteins | 1 |
| Fermentations:_Mixed_acid | 1 |
| Ferrous_iron_transporter_EfeUOB,_low-pH-induced | 1 |
| Formate_hydrogenase | 1 |
| Fosfomycin_resistance | 1 |
| Fructooligosaccharides(FOS)_and_Raffinose_Utilization | 1 |
| Galactose-inducible_PTS | 1 |
| Glutamate_and_Aspartate_uptake_in_Bacteria | 1 |
| Glutamate-mediated_methylamine_utilization_pathway | 1 |
| Glycerolipid_and_Glycerophospholipid_Metabolism_in_Bacteria | 1 |
| Glycine_Biosynthesis | 1 |
| Glycogen_metabolism_cluster | 1 |
| Glycolysis_and_Gluconeogenesis | 1 |
| Heme_and_Siroheme_Biosynthesis | 1 |
| HMG_CoA_Synthesis | 1 |
| HPr_catabolite_repression_system | 1 |
| HPr_kinase_and_hprK_operon_in_Gram-positive_organisms | 1 |
| Hydantoin_metabolism | 1 |
| Isobutyryl-CoA_to_Propionyl-CoA_Module | 1 |
| Isoleucine_degradation | 1 |
| Isoprenoid_Biosynthesis | 1 |
| Lactate_utilization | 1 |
| Lactose_and_Galactose_Uptake_and_Utilization | 1 |
| L-Arabinose_utilization | 1 |
| Listeria_Pathogenicity_Island_LIPI-1_extended | 1 |
| LMPTP_YwlE_cluster | 1 |
| Lysine_Biosynthesis_DAP_Pathway | 1 |
| Mannose_Metabolism | 1 |
| Mediator_of_hyperadherence_YidE_in_Enterobacteria_and_its_conserved_region | 1 |
| Metallocarboxypeptidases_(EC_3.4.17.-) | 1 |
| Metalloendopeptidases_(EC_3.4.24.-) | 1 |
| Methionine_Biosynthesis | 1 |
| Muconate_lactonizing_enzyme_family | 1 |
| Murein_hydrolase_regulation_and_cell_death | 1 |
| Nitrosative_stress | 1 |
| Nudix_proteins_(nucleoside_triphosphate_hydrolases) | 1 |
| Omega_peptidases_(EC_3.4.19.-) | 1 |
| p-Aminobenzoyl-Glutamate_Utilization | 1 |
| Pentose_phosphate_pathway | 1 |
| Peptidoglycan_Crosslinking_of_Peptide_Stems | 1 |
| Phage_replication | 1 |
| Phd-Doc,YdcE-YdcD_toxin-antitoxin(programmed_cell_death)_systems | 1 |
| Phenylalanine_and_Tyrosine_Branches_from_Chorismate | 1 |
| Photorespiration_(oxidative_C2_cycle) | 1 |
| Polyamine_Metabolism | 1 |
| Polymyxin_Synthetase_Gene_Cluster_in_Bacillus | 1 |
| Potassium_homeostasis | 1 |
| Proline,_4-hydroxyproline_uptake_and_utilization | 1 |
| PROSC | 1 |
| Protocatechuate_branch_of_beta-ketoadipate_pathway | 1 |
| pyrimidine_conversions | 1 |
| Pyruvate_metabolism_I:_anaplerotic_reactions,_PEP | 1 |
| Pyruvate_metabolism_II:_acetyl-CoA,_acetogenesis_from_pyruvate | 1 |
| Restriction-Modification_System | 1 |
| RNA_methylation | 1 |
| RNA_modification_cluster | 1 |
| Sialic_Acid_Metabolism | 1 |
| Siderophore_[Alcaligin-like] | 1 |
| Siderophore_assembly_kit | 1 |
| Single-Rhodanese-domain_proteins | 1 |
| Spore_Coat | 1 |
| Spore_Core_Dehydration | 1 |
| Spore_germination | 1 |
| Sporulation-associated_proteins_with_broader_functions | 1 |
| Sporulation_Cluster | 1 |
| Sporulation-related_Hypotheticals | 1 |
| SpoVS_protein_family | 1 |
| Streptococcus_pyogenes_recombinatorial_zone | 1 |
| Sulfatases_and_sulfatase_modifying_factor_1 | 1 |
| Synthesis_of_osmoregulated_periplasmic_glucans | 1 |
| Teichoic_and_lipoteichoic_acids_biosynthesis | 1 |
| Terminal_cytochrome_C_oxidases | 1 |
| Terminal_cytochrome_oxidases | 1 |
| Thiamin_biosynthesis | 1 |
| Tolerance_to_colicin_E2 | 1 |
| Trehalose_Biosynthesis | 1 |
| Tricarboxylate_transport_system | 1 |
| tRNA_modification_Archaea | 1 |
| tRNA_modification_Bacteria | 1 |
| tRNA_modification_yeast_cytoplasmic | 1 |
| tRNA_processing | 1 |
| Two-component_Response_Regulator_of_Virulence_ResDE | 1 |
| Ubiquinone_Menaquinone-cytochrome_c_reductase_complexes | 1 |
| Universal_GTPases | 1 |
| Unspecified_monosaccharide_transport_cluster | 1 |
| YgfZ | 1 |
The Subsystems database is advantageous in that it is large and well annotated, but the organization leaves much to desire.
The presence of sporulation-related genes confirms the 16S results, that there are gram-positives in the community at low levels.
subsystems.level1 <- dplyr::group_by(subsystems, level1)
subsystems.level1 <- dplyr::summarise(subsystems.level1, subsystems.category = n())
subsystems.level1 <- subsystems.level1[order(-subsystems.level1$subsystems.category),]
kable(subsystems.level1)
| level1 | subsystems.category |
|---|---|
| Carbohydrates | 54 |
| Clustering-based subsystems | 26 |
| Miscellaneous | 18 |
| Amino Acids and Derivatives | 13 |
| Dormancy and Sporulation | 13 |
| Nucleosides and Nucleotides | 11 |
| Cell Wall and Capsule | 10 |
| Cofactors, Vitamins, Prosthetic Groups, Pigments | 10 |
| Stress Response | 9 |
| Iron acquisition and metabolism | 8 |
| Virulence, Disease and Defense | 8 |
| DNA Metabolism | 7 |
| Protein Metabolism | 7 |
| Membrane Transport | 5 |
| Respiration | 5 |
| RNA Metabolism | 5 |
| Regulation and Cell signaling | 4 |
| Phosphorus Metabolism | 3 |
| Cell Division and Cell Cycle | 2 |
| Fatty Acids, Lipids, and Isoprenoids | 2 |
| Metabolism of Aromatic Compounds | 2 |
| Phages, Prophages, Transposable elements, Plasmids | 2 |
| Motility and Chemotaxis | 1 |
| Nitrogen Metabolism | 1 |
| Potassium metabolism | 1 |
COG <- read.csv("MG-RAST/final.COG.function.tsv", sep="\t", header=TRUE)
COG <- COG[order(-COG$rel),]
COG <- COG[COG$rel > 0.5,] #remove all relative fractions 50% and below
kable(head(COG))
| level1 | level2 | function. | BTEX1.unmapped.FR.fastq | Kuwait.1810.mgenome.fastq | test | rel |
|---|---|---|---|---|---|---|
| CELLULAR PROCESSES AND SIGNALING | Cell wall/membrane/envelope biogenesis | Predicted membrane protein involved in D-alanine export | 5 | 2 | 5 | 2.5 |
| METABOLISM | Coenzyme transport and metabolism | Na+/panthothenate symporter | 6 | 3 | 6 | 2.0 |
| POORLY CHARACTERIZED | General function prediction only | Predicted phosphohydrolases | 4 | 2 | 4 | 2.0 |
| METABOLISM | Energy production and conversion | Phosphoenolpyruvate carboxykinase (ATP) | 9 | 5 | 9 | 1.8 |
| INFORMATION STORAGE AND PROCESSING | Replication, recombination and repair | ATPase involved in DNA replication | 6 | 4 | 6 | 1.5 |
| INFORMATION STORAGE AND PROCESSING | Replication, recombination and repair | Adenine specific DNA methylase Mod | 3 | 2 | 3 | 1.5 |
This is a small subset of the full COG database results
print(paste("There are",nrow(COG),"functions annoted by the COG database which meet the criteria for being present primarily in non-Arhodomonas menbers of the community"))
## [1] "There are 145 functions annoted by the COG database which meet the criteria for being present primarily in non-Arhodomonas menbers of the community"
COG.level2 <- dplyr::group_by(COG, level2)
COG.level2 <- dplyr::summarise(COG.level2, COG.path = n())
COG.level2 <- COG.level2[order(-COG.level2$COG.path),]
kable(COG.level2)
| level2 | COG.path |
|---|---|
| Carbohydrate transport and metabolism | 27 |
| General function prediction only | 26 |
| Amino acid transport and metabolism | 16 |
| Inorganic ion transport and metabolism | 13 |
| Energy production and conversion | 12 |
| Cell wall/membrane/envelope biogenesis | 8 |
| Replication, recombination and repair | 7 |
| Coenzyme transport and metabolism | 6 |
| Secondary metabolites biosynthesis, transport and catabolism | 6 |
| Signal transduction mechanisms | 6 |
| Nucleotide transport and metabolism | 5 |
| Transcription | 5 |
| Lipid transport and metabolism | 3 |
| Posttranslational modification, protein turnover, chaperones | 2 |
| Cell motility | 1 |
| Function unknown | 1 |
| Translation, ribosomal structure and biogenesis | 1 |
This is only consistent with the subsystems results in that carbohydrate metabolism is the main category containing unique genes.
COG.level1 <- dplyr::group_by(COG, level1)
COG.level1 <- dplyr::summarise(COG.level1, COG.category = n())
COG.level1 <- COG.level1[order(-COG.level1$COG.category),]
kable(COG.level1)
| level1 | COG.category |
|---|---|
| METABOLISM | 88 |
| POORLY CHARACTERIZED | 27 |
| CELLULAR PROCESSES AND SIGNALING | 17 |
| INFORMATION STORAGE AND PROCESSING | 13 |
KEGG <- read.csv("MG-RAST/final.KEGG.function.tsv", sep="\t", header=TRUE)
KEGG <- KEGG[order(-KEGG$rel),]
KEGG <- KEGG[KEGG$rel > 0.5,] #remove all relative fractions 50% and below
kable(head(KEGG))
| level1 | level2 | level3 | function. | BTEX1.unmapped.FR.fastq | Kuwait.1810.mgenome.fastq | test | rel |
|---|---|---|---|---|---|---|---|
| Environmental Information Processing | Membrane transport | 02010 ABC transporters [PATH:ko02010] | araG; L-arabinose transport system ATP-binding protein [EC:3.6.3.17] | 14 | 6 | 14 | 2.333333 |
| Metabolism | Carbohydrate metabolism | 00052 Galactose metabolism [PATH:ko00052] | bgaB, lacA; beta-galactosidase [EC:3.2.1.23] | 4 | 2 | 4 | 2.000000 |
| Environmental Information Processing | Signal transduction | 02020 Two-component system [PATH:ko02020] | desR; two-component system, NarL family, response regulator DesR | 4 | 2 | 4 | 2.000000 |
| Metabolism | Carbohydrate metabolism | 00500 Starch and sucrose metabolism [PATH:ko00500] | treS; maltose alpha-D-glucosyltransferase/ alpha-amylase [EC:5.4.99.16 3.2.1.1] | 6 | 3 | 6 | 2.000000 |
| Metabolism | Carbohydrate metabolism | 00010 Glycolysis / Gluconeogenesis [PATH:ko00010] | E4.1.1.49, pckA; phosphoenolpyruvate carboxykinase (ATP) [EC:4.1.1.49] | 12 | 7 | 12 | 1.714286 |
| Metabolism | Amino acid metabolism | 00270 Cysteine and methionine metabolism [PATH:ko00270] | E2.1.1.10, mmuM; homocysteine S-methyltransferase [EC:2.1.1.10] | 8 | 5 | 8 | 1.600000 |
print(paste("There are",nrow(KEGG),"functions annoted by the KEGG database which meet the criteria for being present primarily in non-Arhodomonas menbers of the community"))
## [1] "There are 144 functions annoted by the KEGG database which meet the criteria for being present primarily in non-Arhodomonas menbers of the community"
KEGG.level3 <- dplyr::group_by(KEGG, level3)
KEGG.level3 <- dplyr::summarise(KEGG.level3, KEGG.pathway = n())
KEGG.level3 <- KEGG.level3[order(-KEGG.level3$KEGG.pathway),]
kable(KEGG.level3)
| level3 | KEGG.pathway |
|---|---|
| 02010 ABC transporters [PATH:ko02010] | 13 |
| 02020 Two-component system [PATH:ko02020] | 13 |
| 02060 Phosphotransferase system (PTS) [PATH:ko02060] | 10 |
| 00010 Glycolysis / Gluconeogenesis [PATH:ko00010] | 5 |
| 00130 Ubiquinone and other terpenoid-quinone biosynthesis [PATH:ko00130] | 5 |
| 00270 Cysteine and methionine metabolism [PATH:ko00270] | 5 |
| 00040 Pentose and glucuronate interconversions [PATH:ko00040] | 4 |
| 00051 Fructose and mannose metabolism [PATH:ko00051] | 4 |
| 00190 Oxidative phosphorylation [PATH:ko00190] | 4 |
| 00230 Purine metabolism [PATH:ko00230] | 4 |
| 00500 Starch and sucrose metabolism [PATH:ko00500] | 4 |
| 00260 Glycine, serine and threonine metabolism [PATH:ko00260] | 3 |
| 00520 Amino sugar and nucleotide sugar metabolism [PATH:ko00520] | 3 |
| 00562 Inositol phosphate metabolism [PATH:ko00562] | 3 |
| 00030 Pentose phosphate pathway [PATH:ko00030] | 2 |
| 00052 Galactose metabolism [PATH:ko00052] | 2 |
| 00053 Ascorbate and aldarate metabolism [PATH:ko00053] | 2 |
| 00280 Valine, leucine and isoleucine degradation [PATH:ko00280] | 2 |
| 00330 Arginine and proline metabolism [PATH:ko00330] | 2 |
| 00340 Histidine metabolism [PATH:ko00340] | 2 |
| 00350 Tyrosine metabolism [PATH:ko00350] | 2 |
| 00561 Glycerolipid metabolism [PATH:ko00561] | 2 |
| 00620 Pyruvate metabolism [PATH:ko00620] | 2 |
| 00730 Thiamine metabolism [PATH:ko00730] | 2 |
| 00770 Pantothenate and CoA biosynthesis [PATH:ko00770] | 2 |
| 00860 Porphyrin and chlorophyll metabolism [PATH:ko00860] | 2 |
| 00940 Phenylpropanoid biosynthesis [PATH:ko00940] | 2 |
| 01053 Biosynthesis of siderophore group nonribosomal peptides [PATH:ko01053] | 2 |
| 04146 Peroxisome [PATH:ko04146] | 2 |
| 00020 Citrate cycle (TCA cycle) [PATH:ko00020] | 1 |
| 00140 Steroid hormone biosynthesis [PATH:ko00140] | 1 |
| 00240 Pyrimidine metabolism [PATH:ko00240] | 1 |
| 00250 Alanine, aspartate and glutamate metabolism [PATH:ko00250] | 1 |
| 00300 Lysine biosynthesis [PATH:ko00300] | 1 |
| 00310 Lysine degradation [PATH:ko00310] | 1 |
| 00360 Phenylalanine metabolism [PATH:ko00360] | 1 |
| 00362 Benzoate degradation [PATH:ko00362] | 1 |
| 00380 Tryptophan metabolism [PATH:ko00380] | 1 |
| 00440 Phosphonate and phosphinate metabolism [PATH:ko00440] | 1 |
| 00511 Other glycan degradation [PATH:ko00511] | 1 |
| 00521 Streptomycin biosynthesis [PATH:ko00521] | 1 |
| 00550 Peptidoglycan biosynthesis [PATH:ko00550] | 1 |
| 00564 Glycerophospholipid metabolism [PATH:ko00564] | 1 |
| 00630 Glyoxylate and dicarboxylate metabolism [PATH:ko00630] | 1 |
| 00633 Nitrotoluene degradation [PATH:ko00633] | 1 |
| 00650 Butanoate metabolism [PATH:ko00650] | 1 |
| 00680 Methane metabolism [PATH:ko00680] | 1 |
| 00760 Nicotinate and nicotinamide metabolism [PATH:ko00760] | 1 |
| 00785 Lipoic acid metabolism [PATH:ko00785] | 1 |
| 00900 Terpenoid backbone biosynthesis [PATH:ko00900] | 1 |
| 00930 Caprolactam degradation [PATH:ko00930] | 1 |
| 00970 Aminoacyl-tRNA biosynthesis [PATH:ko00970] | 1 |
| 02040 Flagellar assembly [PATH:ko02040] | 1 |
| 03008 Ribosome biogenesis in eukaryotes [PATH:ko03008] | 1 |
| 03013 RNA transport [PATH:ko03013] | 1 |
| 03018 RNA degradation [PATH:ko03018] | 1 |
| 03020 RNA polymerase [PATH:ko03020] | 1 |
| 03030 DNA replication [PATH:ko03030] | 1 |
| 03410 Base excision repair [PATH:ko03410] | 1 |
| 03430 Mismatch repair [PATH:ko03430] | 1 |
| 03440 Homologous recombination [PATH:ko03440] | 1 |
| 04142 Lysosome [PATH:ko04142] | 1 |
| 05146 Amoebiasis [PATH:ko05146] | 1 |
KEGG.level2 <- dplyr::group_by(KEGG, level2)
KEGG.level2 <- dplyr::summarise(KEGG.level2, KEGG.level2 = n())
KEGG.level2 <- KEGG.level2[order(-KEGG.level2$KEGG.level2),]
kable(KEGG.level2)
| level2 | KEGG.level2 |
|---|---|
| Carbohydrate metabolism | 34 |
| Membrane transport | 23 |
| Amino acid metabolism | 21 |
| Metabolism of cofactors and vitamins | 13 |
| Signal transduction | 13 |
| Energy metabolism | 5 |
| Nucleotide metabolism | 5 |
| Lipid metabolism | 4 |
| Replication and repair | 4 |
| Biosynthesis of other secondary metabolites | 3 |
| Translation | 3 |
| Transport and catabolism | 3 |
| Xenobiotics biodegradation and metabolism | 3 |
| Glycan biosynthesis and metabolism | 2 |
| Metabolism of terpenoids and polyketides | 2 |
| Cell motility | 1 |
| Folding, sorting and degradation | 1 |
| Infectious diseases | 1 |
| Metabolism of other amino acids | 1 |
| Metabolism of Terpenoids and Polyketides | 1 |
| Transcription | 1 |
#this cell imports and parses the MG-RAST kegg orthology file output to create a name to KO mapping file
library(dplyr)
library(stringr)
substrRight <- function(x, n){
substr(x, nchar(x)-n+1, nchar(x))}
KO.name.full <- read.csv("MG-RAST/ontology.csv", header = TRUE, sep = "\t", stringsAsFactors = F, row.names=NULL)
KO.name <- KO.name.full[13]
KO.name <- data.frame(KO.name[!duplicated(KO.name),])
colnames(KO.name) <- "gene"
KO.name <- data.frame(stringr::str_split_fixed(KO.name$gene, ",", 2), stringsAsFactors = FALSE)
colnames(KO.name) <- c("KO","KO.function")
KO.name$KO <- substr(KO.name$KO, nchar(KO.name$KO)-6, nchar(KO.name$KO)-1)
KO.name$KO.function <- substr(KO.name$KO.function, 11, nchar(KO.name$KO.function))
KO.name$KO.function <- substr(KO.name$KO.function, 0, nchar(KO.name$KO.function) - 1)
KO.name <- KO.name[c(2,1)]
#this chunk takes the KEGG table output from MGRAST analysis and maps the kegg names to KO numbers
KEGG.name <- KEGG[c(4,5)]
KEGG.name.2 <- data.frame(stringr::str_split_fixed(KEGG.name$function.,"; ",2))
KEGG.name <- cbind(KEGG.name.2[2],KEGG.name[2])
colnames(KEGG.name) <- c("KO.function","count")
KEGG.name$KO.function <- as.character(KEGG.name$KO.function)
KEGG.name.KO <- dplyr::left_join(KEGG.name,KO.name,by="KO.function")
Some of the MG-RAST file is not possible to parse cleanly. ~10% of function names are mapped to KO terms by hand. This file is read back in and turned into a list of KO terms that can be fed into KEGG mapper (https://www.genome.jp/kegg/tool/map_pathway1.html)
unmapped.novel.KO <- read.csv("KEGG.name.KO.csv", header = TRUE, stringsAsFactors = FALSE)
KO.terms <- data.frame(unmapped.novel.KO$KO)
write.csv(KO.terms,"KO.terms.csv")
At KEGG mapper website, KO terms can be pasted in as such:
This in turn leads to an index of pathways to which the query functions map.
The individual KEGG maps can be used to guide biological investigations of the implications of the functional annotations.
These pathways represent only an initial survey of the functional annotations
Initial analysis confirms the presence of several minor community members from orders Bacilli and Clostridia. This is supported by analysis of 16S rRNA gene fragments in the metagenome and by the presence of functions unique to gram positive bacteria such as teichoic acid metabolism and sporulation. Their unique activities include uptake of carbohydrates (sugars), waste products (sarcosine), and osmoprotectants (trehalose) while enabling synthesis of Vitamin K (menaqionone). Altogether, this paints a picture of organisms surviving by scavenging trace sugars and waste products and providing an essential vitamin to the community.
Testable hypotheses: