Contents

This tutorial describes an R-package for finding active metabolic modules in atom transition network.

1 Installation

# if (!require("BiocManager", quietly = TRUE))
#     install.packages("BiocManager")
# 
# BiocManager::install("gatom")

2 Example workfow

In this example we will find an active metabolic module based on macrophage activation gene expression data.

library(gatom)
library(data.table)
library(igraph)
library(mwcsr)

First let’s load data with atom mappings (network object), enzyme annotations for mouse (org.Mm.eg.gatom) and metabolite annotations (met.kegg.db.rda):

data("networkEx")
data("org.Mm.eg.gatom.annoEx")
data("met.kegg.dbEx")

Loading input data:

data("met.de.rawEx")
data("gene.de.rawEx")

Getting metabolic graph (depending on topology parameter, the graph vertices can correspond to atoms or metabolites):

g <- makeMetabolicGraph(network=networkEx, 
                        topology="atoms",
                        org.gatom.anno=org.Mm.eg.gatom.annoEx, 
                        gene.de=gene.de.rawEx,
                        met.db=met.kegg.dbEx, 
                        met.de=met.de.rawEx)
## Found DE table for genes with RefSeq IDs
## Found DE table for metabolites with HMDB IDs
print(g)
## IGRAPH d60e1d0 UN-- 176 190 -- 
## + attr: name (v/c), metabolite (v/c), element (v/c), label (v/c), url
## | (v/c), pval (v/n), origin (v/n), HMDB (v/c), log2FC (v/n), baseMean
## | (v/n), logPval (v/n), signal (v/c), label (e/c), pval (e/n), origin
## | (e/n), RefSeq (e/c), gene (e/c), enzyme (e/c), reaction_name (e/c),
## | reaction_equation (e/c), url (e/c), reaction (e/c), log2FC (e/n),
## | baseMean (e/n), logPval (e/n), signal (e/c), signalRank (e/n)
## + edges from d60e1d0 (vertex names):
## [1] C00025_-0.3248_2.8125 --C00026_-0.3248_2.8125
## [2] C00025_-1.6238_3.5625 --C00026_-1.6238_3.5625
## [3] C00025_-2.9228_2.8125 --C00026_-2.9228_2.8125
## + ... omitted several edges

Scoring graph, obtaining an instance of SGMWCS (Signal Generalized Maximum Weight Subgraph) problem instance:

gs <- scoreGraph(g, k.gene=25, k.met=25)

Initialize an SMGWCS solver (a heuristic relax-and-cut solver rnc_solver is used for simplicity, check out mwcsr package documentation for more options, and see Using exact solver section for recommended way):

solver <- rnc_solver()

Finding a module:

res <- solve_mwcsp(solver, gs)
m <- res$graph
print(m)
## IGRAPH 1719da0 UN-- 37 36 -- 
## + attr: signals (g/n), name (v/c), metabolite (v/c), element (v/c),
## | label (v/c), url (v/c), pval (v/n), origin (v/n), HMDB (v/c), log2FC
## | (v/n), baseMean (v/n), logPval (v/n), signal (v/c), score (v/n),
## | label (e/c), pval (e/n), origin (e/n), RefSeq (e/c), gene (e/c),
## | enzyme (e/c), reaction_name (e/c), reaction_equation (e/c), url
## | (e/c), reaction (e/c), log2FC (e/n), baseMean (e/n), logPval (e/n),
## | signal (e/c), signalRank (e/n), score (e/n)
## + edges from 1719da0 (vertex names):
## [1] C00025_-2.9228_2.8125 --C00026_-2.9228_2.8125
## [2] C00024_15.0644_27.8518--C00033_-1.6238_0.5625
## + ... omitted several edges
head(E(m)$label)
## [1] "Psat1" "Acss2" "Gpt2"  "Got2"  "Pkm"   "Tpi1"
head(V(m)$label)
## [1] "Pyruvate"       "Acetyl-CoA"     "L-Glutamate"    "2-Oxoglutarate"
## [5] "Acetate"        "Oxaloacetate"

The module can be plotted in R Viewer with createShinyCyJSWidget():

set.seed(42)
createShinyCyJSWidget(m)

Next, reactions without highly changing genes but with high average expression can be added.

m.ext <- addHighlyExpressedEdges(m, gs)
createShinyCyJSWidget(m.ext)

Sometimes, as in example above, the same metabolite can appear multiple times in the module via different atoms. In such cases it is useful to either connect atoms belonging to the same metabolite with edges or to collapse them into one vertex.

To connect atoms use connectAtomsInsideMetabolite function:

m1 <- connectAtomsInsideMetabolite(m.ext)
createShinyCyJSWidget(m1)

And to collapse them into one vertex use collapseAtomsIntoMetabolites:

m2 <- collapseAtomsIntoMetabolites(m.ext)
createShinyCyJSWidget(m2)

3 Saving modules

We can save the module to graphml format with write_graph() function from igraph:

write_graph(m, file="M0.vs.M1.graphml", format="graphml")

Or it can be saved to an interactive html widget:

saveModuleToHtml(module = m, file = "M0.vs.M1.html", name="M0.vs.M1")
htmltools::includeHTML("M0.vs.M1.html")
M0.vs.M1

M0.vs.M1

Or we can save the module to dot format:

saveModuleToDot(m, file="M0.vs.M1.dot", name="M0.vs.M1")

Such dot file can be further used to generate svg file using neato tool from graphviz suite:

system("neato -Tsvg M0.vs.M1.dot > M0.vs.M1.svg", ignore.stderr=TRUE)

Alternatively, the module can be saved to pdf format with a nice layout. You may vary the meaning of repel force and the number of iterations of repel algorithm for label layout. Note, that the larger your graph is the softer force you should use. You may also set different seed for different variants of edge layout.

set.seed(42)
saveModuleToPdf(m, file="M0.vs.M1.pdf", name="M0.vs.M1", 
                n_iter=100, force=1e-5)

4 Example on full data and full network

Full dataset for the example can be downloaded here:

library(R.utils)
library(data.table)

met.de.raw <- fread("http://artyomovlab.wustl.edu/publications/supp_materials/GAM/Ctrl.vs.MandLPSandIFNg.met.de.tsv.gz")
gene.de.raw <- fread("http://artyomovlab.wustl.edu/publications/supp_materials/GAM/Ctrl.vs.MandLPSandIFNg.gene.de.tsv.gz")

Full pre-generated combined network (ref. Networks for details) and enzyme annotation can be downloaded from http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/:

network.combined <- readRDS(url("http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/network.combined.rds"))
met.combined.db <- readRDS(url("http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/met.combined.db.rds"))

org.Mm.eg.gatom.anno <- readRDS(url("http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/org.Mm.eg.gatom.anno.rds"))

For better work of the combined network we highly recommend using additional supplementary gene files (ref. Supplementary Genes)

gene2reaction.extra <- fread("http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/gene2reaction.combined.mmu.eg.tsv", colClasses="character")

Running gatom on combined network and full dataset:

cg <- makeMetabolicGraph(network=network.combined,
                         topology="atoms",
                         org.gatom.anno=org.Mm.eg.gatom.anno,
                         gene.de=gene.de.raw,
                         met.db=met.combined.db,
                         met.de=met.de.raw,
                         gene2reaction.extra=gene2reaction.extra)
## Found DE table for genes with RefSeq IDs
## Found DE table for metabolites with HMDB IDs
cgs <- scoreGraph(cg, k.gene = 50, k.met = 50)
## Metabolite p-value threshold: 0.038971
## Metabolite BU alpha: 0.078854
## FDR for metabolites: 0.054345
## Gene p-value threshold: 0.001204
## Gene BU alpha: 0.153709
## FDR for genes: 0.006234
solver <- rnc_solver()
set.seed(42)
sol <- solve_mwcsp(solver, cgs)
cm <- sol$graph
cm
## IGRAPH f18f805 UN-- 53 55 -- 
## + attr: signals (g/n), name (v/c), metabolite (v/c), element (v/c),
## | label (v/c), url (v/c), pval (v/n), origin (v/n), HMDB (v/c), log2FC
## | (v/n), baseMean (v/n), logPval (v/n), signal (v/c), score (v/n),
## | label (e/c), pval (e/n), origin (e/n), RefSeq (e/c), gene (e/c),
## | enzyme (e/c), reaction_name (e/c), reaction_equation (e/c), url
## | (e/c), reaction (e/c), log2FC (e/n), baseMean (e/n), logPval (e/n),
## | signal (e/c), signalRank (e/n), score (e/n)
## + edges from f18f805 (vertex names):
## [1] C00025_1.299_0.75--C00025_1.299_0.75  C00022_1.299_0.75--C00026_-1.299_0.75
## [3] C00025_1.299_0.75--C00026_-1.299_0.75 C00025_1.299_0.75--C00049_1.299_0.75 
## + ... omitted several edges
createShinyCyJSWidget(cm)

5 Networks

We provide four types of networks that can be used for analysis:

  1. KEGG network
  2. Rhea network
  3. Combined network
  4. Rhea lipid subnetwork

5.1 KEGG

KEGG network consists of network.kegg.rds & met.kegg.db.rds files and is based on KEGG database.

Both metabolites and reactions have KEGG IDs.

network <- readRDS(url("http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/network.kegg.rds"))
met.db <- readRDS(url("http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/met.kegg.db.rds"))

Running gatom on this dataset:

kg <- makeMetabolicGraph(network=network, 
                        topology="atoms",
                        org.gatom.anno=org.Mm.eg.gatom.anno, 
                        gene.de=gene.de.raw,
                        met.db=met.db, 
                        met.de=met.de.raw)
## Found DE table for genes with RefSeq IDs
## Found DE table for metabolites with HMDB IDs
kgs <- scoreGraph(kg, k.gene = 50, k.met = 50)
## Metabolite p-value threshold: 0.055133
## Metabolite BU alpha: 0.087937
## FDR for metabolites: 0.070765
## Gene p-value threshold: 0.002471
## Gene BU alpha: 0.164266
## FDR for genes: 0.010231
solver <- rnc_solver()
set.seed(42)
sol <- solve_mwcsp(solver, kgs) 
km <- sol$graph
km
## IGRAPH 39f1c03 UN-- 68 69 -- 
## + attr: signals (g/n), name (v/c), metabolite (v/c), element (v/c),
## | label (v/c), url (v/c), pval (v/n), origin (v/n), HMDB (v/c), log2FC
## | (v/n), baseMean (v/n), logPval (v/n), signal (v/c), score (v/n),
## | label (e/c), pval (e/n), origin (e/n), RefSeq (e/c), gene (e/c),
## | enzyme (e/c), reaction_name (e/c), reaction_equation (e/c), url
## | (e/c), reaction (e/c), log2FC (e/n), baseMean (e/n), logPval (e/n),
## | signal (e/c), signalRank (e/n), score (e/n)
## + edges from 39f1c03 (vertex names):
## [1] C00025_-0.3248_2.8125--C00026_-0.3248_2.8125
## [2] C00036_-1.9486_3.375 --C00049_-0.6495_2.625 
## + ... omitted several edges
createShinyCyJSWidget(km)

5.2 Rhea

Rhea network consists of network.rhea.rds & met.rhea.db.rds files and is based on Rhea database.

Reactions in Rhea have their own IDs, but unlike KEGG, metabolite IDs come from a separate database – ChEBI database

To use Rhea network download the following files:

network.rhea <- readRDS(url("http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/network.rhea.rds"))
met.rhea.db <- readRDS(url("http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/met.rhea.db.rds"))

For proper work of the Rhea network we also need a corresponding supplementary gene file (ref. Supplementary Genes)

gene2reaction.extra <- fread("http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/gene2reaction.rhea.mmu.eg.tsv", colClasses="character")

And run gatom on Rhea network:

rg <- makeMetabolicGraph(network=network.rhea,
                         topology="atoms",
                         org.gatom.anno=org.Mm.eg.gatom.anno,
                         gene.de=gene.de.raw,
                         met.db=met.rhea.db,
                         met.de=met.de.raw,
                         gene2reaction.extra=gene2reaction.extra)
## Found DE table for genes with RefSeq IDs
## Found DE table for metabolites with HMDB IDs
rgs <- scoreGraph(rg, k.gene = 50, k.met = 50)
## Metabolite p-value threshold: 0.046206
## Metabolite BU alpha: 0.082053
## FDR for metabolites: 0.092911
## Gene p-value threshold: 0.000695
## Gene BU alpha: 0.161730
## FDR for genes: 0.004909
solver <- rnc_solver()
set.seed(42)
sol <- solve_mwcsp(solver, rgs)
rm <- sol$graph
rm
## IGRAPH 294bdb4 UN-- 63 65 -- 
## + attr: signals (g/n), name (v/c), metabolite (v/c), element (v/c),
## | label (v/c), url (v/c), pval (v/n), origin (v/n), HMDB (v/c), log2FC
## | (v/n), baseMean (v/n), logPval (v/n), signal (v/c), score (v/n),
## | label (e/c), pval (e/n), origin (e/n), RefSeq (e/c), gene (e/c),
## | enzyme (e/c), url (e/c), reaction (e/c), log2FC (e/n), baseMean
## | (e/n), logPval (e/n), signal (e/c), signalRank (e/n), score (e/n)
## + edges from 294bdb4 (vertex names):
## [1] CHEBI:15361_-1.299_3.75   --CHEBI:15589_-0.6495_2.625 
## [2] CHEBI:15562_-2.9228_2.8125--CHEBI:16383_-0.3248_2.8125
## [3] CHEBI:15698_0.4735_-2.9742--CHEBI:16450_-0.236_2.2973 
## + ... omitted several edges
createShinyCyJSWidget(rm)

5.3 Combined network

Combined network comprises not only KEGG and Rhea reactions, but also transport reactions from BIGG database.

This means that reactions in such network have either KEGG or Rhea or BIGG IDs, and metabolite IDs are KEGGs and ChEBIs.

5.4 Rhea lipid subnetwork

Rhea lipid subnetwork is subset of Rhea reactions that involve lipids, and it consists of network.rhea.lipids.rds & met.rhea.lipids.db.rds files.

network.lipids <- readRDS(url("http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/network.rhea.lipids.rds"))
met.lipids.db <- readRDS(url("http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/met.lipids.db.rds"))

For proper work of the lipid network we will also need a corresponding supplementary gene file (ref. Supplementary Genes)

gene2reaction.extra <- fread("http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/gene2reaction.rhea.mmu.eg.tsv", colClasses="character")

To test lipid network we will use a different dataset:

met.de.lipids <- fread("https://artyomovlab.wustl.edu/publications/supp_materials/GATOM/Ctrl.vs.HighFat.lipid.de.csv")

For lipid network we recommend setting topology to metabolites (ref. Using metabolite-level network):

lg <- makeMetabolicGraph(network=network.lipids,
                         topology="metabolites",
                         org.gatom.anno=org.Mm.eg.gatom.anno,
                         gene.de=NULL,
                         met.db=met.lipids.db,
                         met.de=met.de.lipids,
                         gene2reaction.extra=gene2reaction.extra)
## Found DE table for metabolites with Species IDs
lgs <- scoreGraph(lg, k.gene = NULL, k.met = 50)
## Metabolite p-value threshold: 0.009204
## Metabolite BU alpha: 0.145591
## FDR for metabolites: 0.006065
solver <- rnc_solver()
set.seed(42)
sol <- solve_mwcsp(solver, lgs)
lm <- sol$graph
lm
## IGRAPH 32797a3 UN-- 45 44 -- 
## + attr: signals (g/n), name (v/c), metabolite (v/c), element (v/c),
## | label (v/c), url (v/c), LipidMaps_ID (v/c), SwissLipids_ID (v/c),
## | Class_LipidMaps (v/c), Abbreviation_LipidMaps (v/c),
## | Synonyms_LipidMaps (v/c), Class_SwissLipids (v/c),
## | Abbreviation_SwissLipids (v/c), pval (v/n), origin (v/n), Species
## | (v/c), Mean(ctrl) (v/n), Mean(exp) (v/n), FC (v/n), log2FC (v/n),
## | -Log10(p-value) (v/n), -Log10(padj) (v/n), adj.P.Val (v/n),
## | Significance(padj) (v/c), Significance(p-value) (v/c), logPval (v/n),
## | signal (v/c), score (v/n), label (e/c), pval (e/l), gene (e/c),
## | enzyme (e/c), url (e/c), reaction (e/c), score (e/n), signal (e/c)
## + edges from 32797a3 (vertex names):
createShinyCyJSWidget(lm)

If IDs for DE table for metabolites are of type Species we can process metabolite labels into more readable ones:

lm1 <- abbreviateLabels(lm, orig.names = TRUE, abbrev.names = TRUE)

createShinyCyJSWidget(lm1)

6 Misc

6.1 Supplementary gene files

For combined, Rhea and lipid networks we provide supplementary files with genes that either come from proteome or are not linked to a specific enzyme. These files are organism-specific and are also available at http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/.

network.combined <- readRDS(url("http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/network.combined.rds"))
met.combined.db <- readRDS(url("http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/met.combined.db.rds"))
gene2reaction.extra <- fread("http://artyomovlab.wustl.edu/publications/supp_materials/GATOM/gene2reaction.combined.mmu.eg.tsv", colClasses="character")

gg <- makeMetabolicGraph(network=network.combined, 
                         topology="atoms",
                         org.gatom.anno=org.Mm.eg.gatom.anno, 
                         gene.de=gene.de.raw,
                         met.db=met.combined.db, 
                         met.de=met.de.raw, 
                         gene2reaction.extra=gene2reaction.extra)
## Found DE table for genes with RefSeq IDs
## Found DE table for metabolites with HMDB IDs
gg
## IGRAPH aa8564f UN-- 5047 6374 -- 
## + attr: name (v/c), metabolite (v/c), element (v/c), label (v/c), url
## | (v/c), pval (v/n), origin (v/n), HMDB (v/c), log2FC (v/n), baseMean
## | (v/n), logPval (v/n), signal (v/c), label (e/c), pval (e/n), origin
## | (e/n), RefSeq (e/c), gene (e/c), enzyme (e/c), reaction_name (e/c),
## | reaction_equation (e/c), url (e/c), reaction (e/c), log2FC (e/n),
## | baseMean (e/n), logPval (e/n), signal (e/c), signalRank (e/n)
## + edges from aa8564f (vertex names):
## [1] C00019_-0.75_-1.299--C00019_-0.75_-1.299
## [2] C00019_-1.5_0      --C00019_-1.5_0      
## [3] C00019_0.75_-1.299 --C00019_0.75_-1.299 
## + ... omitted several edges

6.2 Non-enzymatic reactions

Optionally, we can also preserve non-enzymatic reactions that are found in the network. This can be done by setting keepReactionsWithoutEnzymes to TRUE:

ge <- makeMetabolicGraph(network=network.combined, 
                         topology="atoms",
                         org.gatom.anno=org.Mm.eg.gatom.anno, 
                         gene.de=gene.de.raw,
                         met.db=met.combined.db, 
                         met.de=met.de.raw, 
                         gene2reaction.extra=gene2reaction.extra, 
                         keepReactionsWithoutEnzymes=TRUE)
## Found DE table for genes with RefSeq IDs
## Found DE table for metabolites with HMDB IDs
ge
## IGRAPH a4d55cc UN-- 5171 6519 -- 
## + attr: name (v/c), metabolite (v/c), element (v/c), label (v/c), url
## | (v/c), pval (v/n), origin (v/n), HMDB (v/c), log2FC (v/n), baseMean
## | (v/n), logPval (v/n), signal (v/c), label (e/c), pval (e/n), origin
## | (e/n), RefSeq (e/c), gene (e/c), enzyme (e/c), reaction_name (e/c),
## | reaction_equation (e/c), url (e/c), reaction (e/c), log2FC (e/n),
## | baseMean (e/n), logPval (e/n), signal (e/c), signalRank (e/n)
## + edges from a4d55cc (vertex names):
## [1] C00019_-0.75_-1.299--C00019_-0.75_-1.299
## [2] C00019_-1.5_0      --C00019_-1.5_0      
## [3] C00019_0.75_-1.299 --C00019_0.75_-1.299 
## + ... omitted several edges

6.3 Using exact solver

For proper analysis quality we recommend to use exact SGMWCS solver virgo_solver(), which requires Java (version >= 11) and CPLEX (version >= 12.7) to be installed. If the requirements are met you can then find a module as following:

vsolver <- virgo_solver(cplex_dir=Sys.getenv("CPLEX_HOME"), 
                        threads=4, penalty=0.001, log=1)
sol <- solve_mwcsp(vsolver, gs) 
m <- sol$graph

Edge penalty option there is used to remove excessive redundancy in genes.

6.4 Running with no metabolite data

If there is no metabolite data in your experiment assign met.de and k.met to NULL:

g <- makeMetabolicGraph(network=networkEx, 
                        topology="atoms",
                        org.gatom.anno=org.Mm.eg.gatom.annoEx, 
                        gene.de=gene.de.rawEx,
                        met.db=met.kegg.dbEx, 
                        met.de=NULL)
## Found DE table for genes with RefSeq IDs
gs <- scoreGraph(g, k.gene = 50, k.met = NULL)
## Warning in bumOptim(x = x, starts): One or both parameters are on the limit of
## the defined parameter space
## Gene p-value threshold: 0.130626
## Gene BU alpha: 0.364527
## FDR for genes: 0.100000

6.5 Running with no gene data

If there is no gene data in your experiment assign gene.de and k.gene to NULL:

g <- makeMetabolicGraph(network=networkEx, 
                        topology="atoms",
                        org.gatom.anno=org.Mm.eg.gatom.annoEx, 
                        gene.de=NULL,
                        met.db=met.kegg.dbEx, 
                        met.de=met.de.rawEx)
## Found DE table for metabolites with HMDB IDs
gs <- scoreGraph(g, k.gene = NULL, k.met = 50)
## Metabolite p-value threshold: 0.260385
## Metabolite BU alpha: 0.080774
## FDR for metabolites: 0.100000

6.6 Using metabolite-level network

Sometimes it could make sense to work with metabolite-metabolite topology of the network, not atom-atom one. Such network is less structured, but contains more genes.

gm <- makeMetabolicGraph(network=network, 
                        topology="metabolite",
                        org.gatom.anno=org.Mm.eg.gatom.anno, 
                        gene.de=gene.de.raw,
                        met.db=met.db, 
                        met.de=met.de.raw)
## Found DE table for genes with RefSeq IDs
## Found DE table for metabolites with HMDB IDs
gms <- scoreGraph(gm, k.gene = 50, k.met = 50)
## Metabolite p-value threshold: 0.037216
## Metabolite BU alpha: 0.078983
## FDR for metabolites: 0.062353
## Gene p-value threshold: 0.001241
## Gene BU alpha: 0.163383
## FDR for genes: 0.006075
solver <- rnc_solver()
set.seed(42)
sol <- solve_mwcsp(solver, gms) 
mm <- sol$graph
mm
## IGRAPH 2cb4142 UN-- 106 105 -- 
## + attr: signals (g/n), name (v/c), metabolite (v/c), element (v/c),
## | label (v/c), url (v/c), pval (v/n), origin (v/n), HMDB (v/c), log2FC
## | (v/n), baseMean (v/n), logPval (v/n), signal (v/c), score (v/n),
## | label (e/c), pval (e/n), origin (e/n), RefSeq (e/c), gene (e/c),
## | enzyme (e/c), reaction_name (e/c), reaction_equation (e/c), url
## | (e/c), reaction (e/c), log2FC (e/n), baseMean (e/n), logPval (e/n),
## | signal (e/c), signalRank (e/n), score (e/n)
## + edges from 2cb4142 (vertex names):
##  [1] C00219--C05956 C00062--C05933 C00327--C05933 C00417--C00490 C00154--C00249
##  [6] C00301--C13050 C04242--C21750 C16634--C21748 C00062--C00086 C00157--C00219
## + ... omitted several edges

6.7 Pathway annotation

To get functional annotation of obtained modules by KEGG and Reactome metabolic pathways, we can use hypergeometric test with fora() function from fgsea package.

foraRes <- fgsea::fora(pathways=org.Mm.eg.gatom.anno$pathways,
                       genes=E(km)$gene,
                       universe=unique(E(kg)$gene),
                       minSize=5)
foraRes[padj < 0.05]
##                                                                  pathway
## 1:                                      mmu00480: Glutathione metabolism
## 2:  mmu_M00002: Glycolysis, core module involving three-carbon compounds
## 3:              mmu_M00003: Gluconeogenesis, oxaloacetate => fructose-6P
## 4:                                   mmu00020: Citrate cycle (TCA cycle)
## 5:                    mmu_M00009: Citrate cycle (TCA cycle, Krebs cycle)
## 6: mmu_M00001: Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate
## 7:                                      mmu00565: Ether lipid metabolism
## 8:                                       mmu00220: Arginine biosynthesis
##            pval        padj overlap size
## 1: 1.785198e-05 0.001249639       9   14
## 2: 5.513707e-05 0.001929798       5    5
## 3: 2.937901e-04 0.006855102       5    6
## 4: 4.238384e-04 0.007417172       7   12
## 5: 2.163785e-03 0.030292991       5    8
## 6: 4.326217e-03 0.042094854       5    9
## 7: 4.326217e-03 0.042094854       5    9
## 8: 4.810840e-03 0.042094854       4    6
##                                 overlapGenes
## 1: 110175,110208,14629,14782,14854,15926,...
## 2:             21991,14433,18648,13806,18746
## 3:             74551,13806,18648,14433,21991
## 4:  104112,11428,12974,15926,18293,74551,...
## 5:             12974,11428,15926,18293,78920
## 6:             21991,14433,18648,13806,18746
## 7:           18783,225845,270084,27226,67916
## 8:                   11846,11898,14719,18126

Optionally, redundancy in pathways can be decreased with collapsePathwaysORA() function:

mainPathways <- fgsea::collapsePathwaysORA(
  foraRes[padj < 0.05],
  pathways=org.Mm.eg.gatom.anno$pathways,
  genes=E(km)$gene,
  universe=unique(E(kg)$gene))
foraRes[pathway %in% mainPathways$mainPathways]
##                                                                 pathway
## 1:                                     mmu00480: Glutathione metabolism
## 2: mmu_M00002: Glycolysis, core module involving three-carbon compounds
## 3:                                  mmu00020: Citrate cycle (TCA cycle)
## 4:                                     mmu00565: Ether lipid metabolism
## 5:                                      mmu00220: Arginine biosynthesis
##            pval        padj overlap size
## 1: 1.785198e-05 0.001249639       9   14
## 2: 5.513707e-05 0.001929798       5    5
## 3: 4.238384e-04 0.007417172       7   12
## 4: 4.326217e-03 0.042094854       5    9
## 5: 4.810840e-03 0.042094854       4    6
##                                 overlapGenes
## 1: 110175,110208,14629,14782,14854,15926,...
## 2:             21991,14433,18648,13806,18746
## 3:  104112,11428,12974,15926,18293,74551,...
## 4:           18783,225845,270084,27226,67916
## 5:                   11846,11898,14719,18126
sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Debian GNU/Linux 12 (bookworm)
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3;  LAPACK version 3.11.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=C                  LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/Chicago
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] R.utils_2.12.2    R.oo_1.25.0       R.methodsS3_1.8.2 mwcsr_0.1.7      
## [5] igraph_1.4.3      data.table_1.14.8 gatom_0.99.0      BiocStyle_2.28.0 
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.2.0        farver_2.1.1            dplyr_1.1.2            
##  [4] blob_1.2.4              Biostrings_2.68.1       bitops_1.0-7           
##  [7] fastmap_1.1.1           RCurl_1.98-1.12         reshape_0.8.9          
## [10] GGally_2.1.2            XML_3.99-0.14           digest_0.6.33          
## [13] lifecycle_1.0.3         ellipsis_0.3.2          KEGGREST_1.40.0        
## [16] RSQLite_2.3.1           magrittr_2.0.3          compiler_4.3.1         
## [19] rlang_1.1.1             sass_0.4.7              tools_4.3.1            
## [22] utf8_1.2.3              yaml_2.3.7              sna_2.7-1              
## [25] knitr_1.43              htmlwidgets_1.6.2       curl_5.0.1             
## [28] bit_4.0.5               plyr_1.8.8              RColorBrewer_1.1-3     
## [31] BiocParallel_1.34.2     withr_2.5.0             BiocGenerics_0.46.0    
## [34] shinyCyJS_1.0.0         grid_4.3.1              stats4_4.3.1           
## [37] fansi_1.0.4             colorspace_2.1-0        ggplot2_3.4.2          
## [40] scales_1.2.1            cli_3.6.1               rmarkdown_2.23         
## [43] crayon_1.5.2            generics_0.1.3          rstudioapi_0.14        
## [46] httr_1.4.6              DBI_1.1.3               cachem_1.0.8           
## [49] stringr_1.5.0           zlibbioc_1.46.0         network_1.18.1         
## [52] parallel_4.3.1          AnnotationDbi_1.62.2    BiocManager_1.30.21.1  
## [55] XVector_0.40.0          vctrs_0.6.3             Matrix_1.5-3           
## [58] jsonlite_1.8.7          bookdown_0.34           IRanges_2.34.1         
## [61] S4Vectors_0.38.1        bit64_4.0.5             BioNet_1.60.0          
## [64] jquerylib_0.1.4         intergraph_2.0-2        glue_1.6.2             
## [67] statnet.common_4.9.0    codetools_0.2-19        cowplot_1.1.1          
## [70] stringi_1.7.12          gtable_0.3.3            GenomeInfoDb_1.36.1    
## [73] munsell_0.5.0           tibble_3.2.1            pillar_1.9.0           
## [76] htmltools_0.5.5         fgsea_1.27.1            GenomeInfoDbData_1.2.10
## [79] R6_2.5.1                evaluate_0.21           Biobase_2.60.0         
## [82] lattice_0.20-45         png_0.1-8               memoise_2.0.1          
## [85] pryr_0.1.6              bslib_0.5.0             fastmatch_1.1-3        
## [88] Rcpp_1.0.11             coda_0.19-4             xfun_0.39              
## [91] pkgconfig_2.0.3