pathfindR is a tool for enrichment analysis via active
subnetworks. The package also offers functionalities to cluster the
enriched terms and identify representative terms in each cluster, score
the enriched terms per sample, and visualize analysis results. As of the
latest version, the package also allows comparison of two pathfindR
results.
The functionalities of pathfindR are described in detail in Ulgen E, Ozisik O, Sezerman OU. 2019. pathfindR: An R Package for Comprehensive Identification of Enriched Pathways in Omics Data Through Active Subnetworks. Front. Genet.
Seurat data# specify your cell type
my_celltype <- "Ast"
# specify time point of interest
my_timept <- "2mo"
# Load Seurat data
# Change the filepath in quotes "" below to match where you
# have your celltype data saved
# The expected filepath is "~/AlzheimersDS/student-notebooks/FinalMergedData-my_celltype.rds"
# (where `my-celltype` is your celltype code)
my_celltype_data <- readRDS("/data/Alzheimers_DS/FinalMergedData-Ast.rds")
# Subset my celltype-specific Seurat object by time
my_seurat <- subset(x = my_celltype_data, subset = Age == my_timept)
# set active identities of my subset seurat to "Mt" attribute (denoting variant)
# for DE testing
Idents(my_seurat) <- "Mt"
summary(as.factor(my_seurat$Mt))
## V337M V337V
## 81 98
# Run D.E. test to identify genes differentially expressed between
# between diseased and normal cells, for my celltype and at this timepoint
marker_genes.df <- FindMarkers(my_seurat,
ident.1 = "V337M", ident.2 = "V337V")
# order results by p-value
marker_genes.df <- marker_genes.df %>% arrange(p_val)
head(marker_genes.df)
## p_val avg_log2FC pct.1 pct.2 p_val_adj
## HMGXB4 2.964098e-05 -0.4700985 0.247 0.541 1
## SNRPGP10 6.754363e-05 0.3308939 0.222 0.031 1
## TTBK2 1.044893e-04 -0.3491722 0.049 0.265 1
## RAB28 1.456217e-04 0.3951541 0.383 0.143 1
## HIRIP3 3.108470e-04 -0.2753372 0.037 0.224 1
## DOK5 3.644166e-04 0.4949083 0.617 0.357 1
# format a new dataframe using the contents of
# marker_genes.df, to pass to pathfinder
pathfinder.df <- marker_genes.df %>%
tibble::rownames_to_column("geneIDs") %>%
dplyr::select(geneIDs, avg_log2FC, p_val)
colnames(pathfinder.df) <- c("geneIDs","logFC","pvalue")
knitr::kable(head(pathfinder.df))
| geneIDs | logFC | pvalue |
|---|---|---|
| HMGXB4 | -0.4700985 | 0.0000296 |
| SNRPGP10 | 0.3308939 | 0.0000675 |
| TTBK2 | -0.3491722 | 0.0001045 |
| RAB28 | 0.3951541 | 0.0001456 |
| HIRIP3 | -0.2753372 | 0.0003108 |
| DOK5 | 0.4949083 | 0.0003644 |
After input testing, any gene symbol that is not in the chosen protein-protein interaction network (PIN) is converted to an alias symbol if there is an alias that is found in the PIN. After mapping the input genes with the associated p-values onto the PIN, active subnetwork search is performed. The resulting active subnetworks are then filtered based on their scores and the number of significant genes they contain.
An active subnetwork can be defined as a group of interconnected genes in a protein-protein interaction network (PIN) that predominantly consists of significantly altered genes. In other words, active subnetworks define distinct disease-associated sets of interacting genes, whether discovered through the original analysis or discovered because of being in interaction with a significant gene.
These filtered lists of active subnetworks are then used for
enrichment analyses, i.e., using the genes in each of the active
subnetworks, the significantly enriched terms (pathways/gene sets) are
identified. Enriched terms with adjusted p-values larger than the given
threshold are discarded, and the lowest adjusted p-value (among all
active subnetworks) for each term is kept. This process of
active subnetwork search + enrichment analyses is repeated
for a selected number of iterations, performed in parallel. Over all
iterations, the lowest and the highest adjusted p-values, and the number
of occurrences among all iterations are reported for each significantly
enriched term.
# change the gene sets used for analysis (default = "KEGG")
# change the PIN for active subnetwork search (default = Biogrid)
# The available PINs are “Biogrid”, “STRING”, “GeneMania”, “IntAct”, “KEGG” and “mmu_STRING”.
# The available gene sets are “KEGG”, “Reactome”, “BioCarta”, “GO-All”, “GO-BP”, “GO-CC”, “GO-MF”, and “mmu_KEGG”.
output_df <- run_pathfindR(pathfinder.df, gene_sets = "Reactome", pin_name_path = "STRING",
p_val_threshold = 0.05, plot_enrichment_chart = FALSE)
## Visualizing: Signaling by high-kinase activity BRAF mutants
## Visualizing: MAP2K and MAPK activation
## Visualizing: Signaling by RAF1 mutants
## Visualizing: Paradoxical activation of RAF signaling by kinase inactive BRAF
## Visualizing: Signaling by RAS mutants
## Visualizing: Signaling by moderate kinase activity BRAF mutants
## Visualizing: Signaling downstream of RAS mutants
## Visualizing: Transcriptional Regulation by MECP2
## Visualizing: Signaling by BRAF and RAF1 fusions
## Visualizing: mRNA Splicing - Major Pathway
## Visualizing: mRNA Splicing
## Visualizing: Oncogenic MAPK signaling
## Visualizing: Processing of Capped Intron-Containing Pre-mRNA
## Visualizing: The citric acid (TCA) cycle and respiratory electron transport
## Visualizing: RAF activation
## Visualizing: Signalling to ERKs
## Visualizing: Glucose metabolism
## Visualizing: Signaling by NOTCH
## Visualizing: Negative regulation of MAPK pathway
## Visualizing: Signaling by NTRKs
## Visualizing: Respiratory electron transport, ATP synthesis by chemiosmotic coupling, and heat production by uncoupling proteins.
## Visualizing: SRP-dependent cotranslational protein targeting to membrane
## Visualizing: mRNA Splicing - Minor Pathway
## Visualizing: Eukaryotic Translation Elongation
## Visualizing: L1CAM interactions
## Visualizing: RAF-MAP kinase cascade
## Visualizing: MAPK1-MAPK3 signaling
## Visualizing: Frs2-mediated activation
## Visualizing: Signaling by NTRK1 (TRKA)
## Visualizing: Prolonged ERK activation events
## Visualizing: Insertion of tail-anchored proteins into the endoplasmic reticulum membrane
## Visualizing: Complex I biogenesis
## Visualizing: ER to Golgi Anterograde Transport
## Visualizing: Glycolysis
## Visualizing: NOTCH3 Intracellular Domain Regulates Transcription
## Visualizing: FOXO-mediated transcription
## Visualizing: Signal transduction by L1
## Visualizing: VEGFA-VEGFR2 Pathway
## Visualizing: Transcriptional regulation by RUNX2
## Visualizing: Transport to the Golgi and subsequent modification
## Visualizing: Signaling by VEGF
## Visualizing: FOXO-mediated transcription of cell cycle genes
## Visualizing: Metabolism of carbohydrates
## Visualizing: Gluconeogenesis
## Visualizing: Regulation of MECP2 expression and activity
## Visualizing: Signaling by NTRK2 (TRKB)
## Visualizing: Intra-Golgi traffic
## Visualizing: Respiratory electron transport
## Visualizing: ER-Phagosome pathway
## Visualizing: HSP90 chaperone cycle for steroid hormone receptors (SHR) in the presence of ligand
## Visualizing: Signaling by NOTCH3
## Visualizing: RET signaling
## Visualizing: Antigen processing-Cross presentation
## Visualizing: Transcriptional regulation by RUNX1
## Visualizing: Deactivation of the beta-catenin transactivating complex
## Visualizing: Asparagine N-linked glycosylation
## Visualizing: NCAM signaling for neurite out-growth
## Visualizing: COPII-mediated vesicle transport
## Visualizing: Regulation of TP53 Activity through Acetylation
## Visualizing: Selective autophagy
## Visualizing: Defective CFTR causes cystic fibrosis
## Visualizing: Pre-NOTCH Expression and Processing
## Visualizing: Signaling by NOTCH4
## Visualizing: Cell death signalling via NRAGE, NRIF and NADE
## Visualizing: EPH-Ephrin signaling
## Visualizing: Signaling by TGF-beta Receptor Complex
## Visualizing: ABC transporter disorders
## Visualizing: Transcriptional Regulation by VENTX
## Visualizing: Semaphorin interactions
## Visualizing: p75 NTR receptor-mediated signalling
## Visualizing: ABC-family proteins mediated transport
## Visualizing: Regulation of TP53 Activity
## Visualizing: Transcriptional regulation by RUNX3
## Visualizing: Uptake and function of anthrax toxins
## Visualizing: Signaling by MET
## Visualizing: Signaling by TGFB family members
## Visualizing: Apoptotic cleavage of cell adhesion proteins
## Visualizing: Clathrin-mediated endocytosis
## Visualizing: Macroautophagy
## Visualizing: Regulation of RAS by GAPs
## Visualizing: Autophagy
## Visualizing: Protein localization
## Visualizing: Death Receptor Signalling
## Visualizing: Neurotransmitter receptors and postsynaptic signal transmission
## Visualizing: Downstream signaling events of B Cell Receptor (BCR)
## Visualizing: Intra-Golgi and retrograde Golgi-to-ER traffic
## Visualizing: CDC42 GTPase cycle
## Visualizing: Diseases of programmed cell death
## Visualizing: MAPK6-MAPK4 signaling
## Visualizing: Rap1 signalling
## Visualizing: RAF-independent MAPK1-3 activation
## Visualizing: Disorders of transmembrane transporters
##
|
| | 0%
|
|....................... | 33%
## inline R code fragments
##
##
|
|............................................... | 67%
## label: setup (with options)
## List of 1
## $ include: logi FALSE
##
##
|
|......................................................................| 100%
## ordinary text without R code
##
##
## /usr/lib/rstudio-server/bin/quarto/bin/pandoc +RTS -K512m -RTS results.knit.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output pandoc14e5e28d34.html --lua-filter /home/heh4/R/x86_64-pc-linux-gnu-library/4.1/rmarkdown/rmarkdown/lua/pagebreak.lua --lua-filter /home/heh4/R/x86_64-pc-linux-gnu-library/4.1/rmarkdown/rmarkdown/lua/latex-div.lua --self-contained --variable bs3=TRUE --standalone --section-divs --template /home/heh4/R/x86_64-pc-linux-gnu-library/4.1/rmarkdown/rmd/h/default.html --no-highlight --variable highlightjs=1 --variable theme=bootstrap --mathjax --variable 'mathjax-url=https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' --include-in-header /tmp/RtmpWWlIBJ/rmarkdown-str14e5357161c8.html
##
|
| | 0%
|
|.................. | 25%
## inline R code fragments
##
##
|
|................................... | 50%
## label: setup (with options)
## List of 1
## $ include: logi FALSE
##
##
|
|.................................................... | 75%
## ordinary text without R code
##
##
|
|......................................................................| 100%
## label: table (with options)
## List of 2
## $ echo : logi FALSE
## $ comment: logi NA
##
##
## /usr/lib/rstudio-server/bin/quarto/bin/pandoc +RTS -K512m -RTS enriched_terms.knit.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output pandoc14e5a5a1f24.html --lua-filter /home/heh4/R/x86_64-pc-linux-gnu-library/4.1/rmarkdown/rmarkdown/lua/pagebreak.lua --lua-filter /home/heh4/R/x86_64-pc-linux-gnu-library/4.1/rmarkdown/rmarkdown/lua/latex-div.lua --self-contained --variable bs3=TRUE --standalone --section-divs --template /home/heh4/R/x86_64-pc-linux-gnu-library/4.1/rmarkdown/rmd/h/default.html --no-highlight --variable highlightjs=1 --variable theme=bootstrap --mathjax --variable 'mathjax-url=https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' --include-in-header /tmp/RtmpWWlIBJ/rmarkdown-str14e559799808.html
##
|
| | 0%
|
|............ | 17%
## inline R code fragments
##
##
|
|....................... | 33%
## label: setup (with options)
## List of 1
## $ include: logi FALSE
##
##
|
|................................... | 50%
## ordinary text without R code
##
##
|
|............................................... | 67%
## label: converted_tbl, table1 (with options)
## List of 1
## $ comment: logi NA
##
##
|
|.......................................................... | 83%
## ordinary text without R code
##
##
|
|......................................................................| 100%
## label: gene_wo_interaction, table2 (with options)
## List of 1
## $ comment: logi NA
##
##
## /usr/lib/rstudio-server/bin/quarto/bin/pandoc +RTS -K512m -RTS conversion_table.knit.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output pandoc14e56364662b.html --lua-filter /home/heh4/R/x86_64-pc-linux-gnu-library/4.1/rmarkdown/rmarkdown/lua/pagebreak.lua --lua-filter /home/heh4/R/x86_64-pc-linux-gnu-library/4.1/rmarkdown/rmarkdown/lua/latex-div.lua --self-contained --variable bs3=TRUE --standalone --section-divs --template /home/heh4/R/x86_64-pc-linux-gnu-library/4.1/rmarkdown/rmd/h/default.html --no-highlight --variable highlightjs=1 --variable theme=bootstrap --mathjax --variable 'mathjax-url=https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' --include-in-header /tmp/RtmpWWlIBJ/rmarkdown-str14e57f4af102.html
This workflow first calculates the pairwise kappa statistics between the enriched terms. The function then performs hierarchical clustering (by default), automatically determines the optimal number of clusters by maximizing the average silhouette width and returns a data frame with cluster assignments.
# display the heatmap of hierarchical clustering
# display the dendrogram and automatically-determined clusters
# change agglomeration method (default = "average") for hierarchical clustering
clustered_df <- cluster_enriched_terms(output_df, plot_hmap = FALSE, plot_dend = FALSE, plot_clusters_graph = FALSE, clu_method = "centroid")
## The maximum average silhouette width was 0.65 for k = 65
## The representative terms
knitr::kable(clustered_df[clustered_df$Status == "Representative", ])
| ID | Term_Description | Fold_Enrichment | occurrence | support | lowest_p | highest_p | Up_regulated | Down_regulated | Cluster | Status | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | R-HSA-6802948 | Signaling by high-kinase activity BRAF mutants | 11.192811 | 10 | 0.1086957 | 0.0000000 | 0.0000000 | MAP2K1, RAP1B | HRAS, MAP2K2 | 1 | Representative |
| 8 | R-HSA-8986944 | Transcriptional Regulation by MECP2 | 6.070677 | 10 | 0.0434783 | 0.0000000 | 0.0000000 | FOXG1, SOX2, TNRC6B | GAD1 | 2 | Representative |
| 10 | R-HSA-72163 | mRNA Splicing - Major Pathway | 2.529449 | 10 | 0.0434783 | 0.0000000 | 0.0000000 | HNRNPA0, HNRNPR, LSM2, SNRPE | SNRNP200 | 3 | Representative |
| 14 | R-HSA-1428517 | The citric acid (TCA) cycle and respiratory electron transport | 3.753278 | 10 | 0.0217391 | 0.0000004 | 0.0000004 | ATP5PF, NDUFA11 | COX8A, IDH2, MDH2, MT-ND5, NDUFS7 | 4 | Representative |
| 16 | R-HSA-187687 | Signalling to ERKs | 10.853634 | 10 | 0.0869565 | 0.0000006 | 0.0000006 | MAP2K1 | HRAS, KIDINS220, MAP2K2 | 5 | Representative |
| 17 | R-HSA-70326 | Glucose metabolism | 5.087641 | 10 | 0.0652174 | 0.0000008 | 0.0000008 | PGK1, TPI1 | MDH2, PRKACB, SLC25A1 | 6 | Representative |
| 18 | R-HSA-157118 | Signaling by NOTCH | 2.249811 | 10 | 0.0652174 | 0.0000009 | 0.0000009 | FABP7, HES1, PSMB7, TMED2, TNRC6B | 7 | Representative | |
| 20 | R-HSA-166520 | Signaling by NTRKs | 4.070113 | 10 | 0.1304348 | 0.0000064 | 0.0000064 | MAP2K1 | AP2B1, CDK5R1, HRAS, KIDINS220, MAP2K2 | 8 | Representative |
| 21 | R-HSA-163200 | Respiratory electron transport, ATP synthesis by chemiosmotic coupling, and heat production by uncoupling proteins. | 3.794173 | 10 | 0.0217391 | 0.0000081 | 0.0000081 | ATP5PF, NDUFA11 | COX8A, MT-ND5, NDUFS7 | 9 | Representative |
| 22 | R-HSA-1799339 | SRP-dependent cotranslational protein targeting to membrane | 3.443942 | 10 | 0.0217391 | 0.0000085 | 0.0000085 | RPS27L, SEC61B, SEC61G, SRP9 | 10 | Representative | |
| 23 | R-HSA-72165 | mRNA Splicing - Minor Pathway | 5.267205 | 10 | 0.0217391 | 0.0000090 | 0.0000090 | LSM2, SNRPE | SNRNP200 | 11 | Representative |
| 24 | R-HSA-156842 | Eukaryotic Translation Elongation | 3.123575 | 10 | 0.0434783 | 0.0000110 | 0.0000110 | RPS27L | EEF1A2, EEF1B2 | 12 | Representative |
| 25 | R-HSA-373760 | L1CAM interactions | 5.655315 | 10 | 0.0869565 | 0.0000164 | 0.0002263 | MAP2K1, SDCBP | AP2B1, MAP2K2, NCAM1, NRP2 | 13 | Representative |
| 26 | R-HSA-5673001 | RAF/MAP kinase cascade | 2.019755 | 10 | 0.0869565 | 0.0000239 | 0.0000239 | MAP2K1, PSMB7, RAP1B | HRAS, MAP2K2, NCAM1 | 14 | Representative |
| 28 | R-HSA-170968 | Frs2-mediated activation | 14.923747 | 10 | 0.1304348 | 0.0000273 | 0.0000273 | MAP2K1 | MAP2K2 | 15 | Representative |
| 31 | R-HSA-9609523 | Insertion of tail-anchored proteins into the endoplasmic reticulum membrane | 8.527856 | 10 | 0.0217391 | 0.0000472 | 0.0000472 | SEC61B, SEC61G | 16 | Representative | |
| 33 | R-HSA-199977 | ER to Golgi Anterograde Transport | 3.366259 | 10 | 0.0217391 | 0.0000641 | 0.0000641 | CAPZB, TMED2 | CNIH2, DYNC1LI2, GOSR1 | 17 | Representative |
| 34 | R-HSA-70171 | Glycolysis | 3.837535 | 10 | 0.0217391 | 0.0000656 | 0.0000656 | PGK1, TPI1 | PRKACB | 18 | Representative |
| 35 | R-HSA-9013508 | NOTCH3 Intracellular Domain Regulates Transcription | 7.461874 | 10 | 0.0434783 | 0.0000717 | 0.0000717 | FABP7, HES1 | 19 | Representative | |
| 36 | R-HSA-9614085 | FOXO-mediated transcription | 4.197304 | 10 | 0.0217391 | 0.0000816 | 0.0000816 | FOXG1 | CCNG2, PINK1 | 20 | Representative |
| 38 | R-HSA-4420097 | VEGFA-VEGFR2 Pathway | 3.893152 | 10 | 0.1956522 | 0.0000999 | 0.0000999 | CTNNB1 | BAIAP2, HRAS, PRKACB | 21 | Representative |
| 39 | R-HSA-8878166 | Transcriptional regulation by RUNX2 | 2.257373 | 10 | 0.0217391 | 0.0001219 | 0.0001219 | HES1, PSMB7 | LGALS3 | 22 | Representative |
| 45 | R-HSA-9022692 | Regulation of MECP2 expression and activity | 5.969499 | 10 | 0.0217391 | 0.0002872 | 0.0002872 | FOXG1, TNRC6B | 23 | Representative | |
| 46 | R-HSA-9006115 | Signaling by NTRK2 (TRKB) | 7.461874 | 10 | 0.0217391 | 0.0004003 | 0.0004003 | CDK5R1, HRAS | 24 | Representative | |
| 47 | R-HSA-6811438 | Intra-Golgi traffic | 4.263928 | 10 | 0.0217391 | 0.0004061 | 0.0004061 | GOLIM4 | GOSR1 | 25 | Representative |
| 49 | R-HSA-1236974 | ER-Phagosome pathway | 4.116896 | 10 | 0.0217391 | 0.0004710 | 0.0004710 | HMGB1, PSMB7, SEC61B, SEC61G | 26 | Representative | |
| 50 | R-HSA-3371497 | HSP90 chaperone cycle for steroid hormone receptors (SHR) in the presence of ligand | 4.712762 | 10 | 0.0652174 | 0.0005959 | 0.0005959 | CAPZB | DYNC1LI2 | 27 | Representative |
| 52 | R-HSA-8853659 | RET signaling | 4.591922 | 10 | 0.0217391 | 0.0006454 | 0.0006454 | DOK5 | PRKACB | 28 | Representative |
| 54 | R-HSA-8878171 | Transcriptional regulation by RUNX1 | 2.205480 | 10 | 0.0217391 | 0.0010441 | 0.0010441 | PSMB7, TJP1, TNRC6B | LGALS3, RYBP | 29 | Representative |
| 55 | R-HSA-3769402 | Deactivation of the beta-catenin transactivating complex | 4.591922 | 10 | 0.0217391 | 0.0011273 | 0.0011273 | CTNNB1, SOX2 | 30 | Representative | |
| 57 | R-HSA-375165 | NCAM signaling for neurite out-growth | 3.378962 | 10 | 0.0652174 | 0.0016507 | 0.0016507 | HRAS, NCAM1 | 31 | Representative | |
| 58 | R-HSA-204005 | COPII-mediated vesicle transport | 2.672910 | 10 | 0.0217391 | 0.0016898 | 0.0016898 | TMED2 | CNIH2 | 32 | Representative |
| 59 | R-HSA-6804758 | Regulation of TP53 Activity through Acetylation | 6.175344 | 10 | 0.0217391 | 0.0021174 | 0.0021174 | PIN1 | KAT6A | 33 | Representative |
| 60 | R-HSA-9663891 | Selective autophagy | 2.984749 | 10 | 0.0217391 | 0.0024084 | 0.0024084 | DYNC1LI2, PINK1 | 34 | Representative | |
| 61 | R-HSA-5678895 | Defective CFTR causes cystic fibrosis | 2.984749 | 10 | 0.0217391 | 0.0024084 | 0.0024084 | PSMB7, RNF5 | 35 | Representative | |
| 62 | R-HSA-1912422 | Pre-NOTCH Expression and Processing | 2.356381 | 10 | 0.0217391 | 0.0024773 | 0.0024773 | TMED2, TNRC6B | 36 | Representative | |
| 64 | R-HSA-204998 | Cell death signalling via NRAGE, NRIF and NADE | 4.009365 | 10 | 0.0217391 | 0.0033677 | 0.0033677 | BEX3, MAGED1 | ITSN1 | 37 | Representative |
| 65 | R-HSA-2682334 | EPH-Ephrin signaling | 4.024381 | 10 | 0.0434783 | 0.0039961 | 0.0220640 | SDCBP | AP2B1, HRAS, ITSN1 | 38 | Representative |
| 66 | R-HSA-170834 | Signaling by TGF-beta Receptor Complex | 2.522324 | 10 | 0.0217391 | 0.0040154 | 0.0040154 | FKBP1A | CCNK | 39 | Representative |
| 68 | R-HSA-8853884 | Transcriptional Regulation by VENTX | 4.591922 | 10 | 0.0217391 | 0.0052726 | 0.0052726 | CTNNB1, TNRC6B | 40 | Representative | |
| 69 | R-HSA-373755 | Semaphorin interactions | 2.888467 | 10 | 0.0217391 | 0.0074026 | 0.0074026 | CDK5R1, PLXNA1 | 41 | Representative | |
| 73 | R-HSA-8853334 | Signaling by FGFR3 fusions in cancer | 9.949165 | 10 | 0.0217391 | 0.0087466 | 0.0087466 | HRAS | 42 | Representative | |
| 74 | R-HSA-8878159 | Transcriptional regulation by RUNX3 | 2.919864 | 10 | 0.0434783 | 0.0087920 | 0.0087920 | CTNNB1, HES1, PSMB7 | 43 | Representative | |
| 76 | R-HSA-6806834 | Signaling by MET | 2.713409 | 10 | 0.0217391 | 0.0089469 | 0.0089469 | RAP1B | HRAS | 44 | Representative |
| 77 | R-HSA-5626467 | RHO GTPases activate IQGAPs | 8.954248 | 10 | 0.0434783 | 0.0109311 | 0.0109311 | CTNNB1 | 45 | Representative | |
| 83 | R-HSA-8941856 | RUNX3 regulates NOTCH signaling | 6.887883 | 10 | 0.0434783 | 0.0126308 | 0.0126308 | HES1 | 46 | Representative | |
| 84 | R-HSA-351906 | Apoptotic cleavage of cell adhesion proteins | 16.280452 | 10 | 0.0652174 | 0.0133576 | 0.0133576 | CTNNB1, TJP1 | 47 | Representative | |
| 87 | R-HSA-177504 | Retrograde neurotrophin signalling | 6.395892 | 9 | 0.0217391 | 0.0147338 | 0.0147338 | AP2B1 | 48 | Representative | |
| 88 | R-HSA-111367 | SLBP independent Processing of Histone Pre-mRNAs | 8.954248 | 10 | 0.0217391 | 0.0152975 | 0.0152975 | SNRPE | 49 | Representative | |
| 89 | R-HSA-8856828 | Clathrin-mediated endocytosis | 1.918767 | 9 | 0.0217391 | 0.0156614 | 0.0156614 | AP2B1, ITSN1, SCARB2 | 50 | Representative | |
| 91 | R-HSA-1614558 | Degradation of cysteine and homocysteine | 7.461874 | 10 | 0.0217391 | 0.0160260 | 0.0160260 | CDO1 | 51 | Representative | |
| 102 | R-HSA-5658442 | Regulation of RAS by GAPs | 2.633602 | 10 | 0.0217391 | 0.0208364 | 0.0208364 | PSMB7 | HRAS | 52 | Representative |
| 103 | R-HSA-1912420 | Pre-NOTCH Processing in Golgi | 5.267205 | 10 | 0.0217391 | 0.0220101 | 0.0220101 | TMED2 | 53 | Representative | |
| 108 | R-HSA-9609507 | Protein localization | 2.224658 | 10 | 0.0217391 | 0.0238312 | 0.0238312 | CHCHD2, PEX2, SEC61B, SEC61G | 54 | Representative | |
| 110 | R-HSA-8943723 | Regulation of PTEN mRNA translation | 8.140226 | 10 | 0.0217391 | 0.0249123 | 0.0249123 | TNRC6B | 55 | Representative | |
| 116 | R-HSA-2892247 | POU5F1 (OCT4), SOX2, NANOG activate genes related to proliferation | 6.887883 | 10 | 0.0217391 | 0.0264964 | 0.0264964 | SOX2 | 56 | Representative | |
| 117 | R-HSA-112314 | Neurotransmitter receptors and postsynaptic signal transmission | 2.035056 | 10 | 0.0217391 | 0.0311303 | 0.0311303 | AP2B1, GABBR2, HRAS, PRKACB | 57 | Representative | |
| 119 | R-HSA-6811442 | Intra-Golgi and retrograde Golgi-to-ER traffic | 3.541228 | 10 | 0.0217391 | 0.0316635 | 0.0316635 | CAPZB, GOLIM4, TMED2 | DYNC1LI2, GOSR1, KIF1B, KIF21A | 58 | Representative |
| 120 | R-HSA-9013148 | CDC42 GTPase cycle | 1.891743 | 10 | 0.0217391 | 0.0324387 | 0.0324387 | KTN1 | BAIAP2, ITSN1 | 59 | Representative |
| 121 | R-HSA-9645723 | Diseases of programmed cell death | 2.522324 | 10 | 0.0217391 | 0.0325151 | 0.0325151 | CDK5R1, DNMT1 | 60 | Representative | |
| 127 | R-HSA-5687128 | MAPK6/MAPK4 signaling | 2.984749 | 10 | 0.0217391 | 0.0340649 | 0.0340649 | PSMB7, TNRC6B | PRKACB | 61 | Representative |
| 128 | R-HSA-354194 | GRB2:SOS provides linkage to MAPK signaling for Integrins | 6.887883 | 10 | 0.0217391 | 0.0353096 | 0.0353096 | RAP1B | 62 | Representative | |
| 133 | R-HSA-392517 | Rap1 signalling | 11.192811 | 10 | 0.0217391 | 0.0407339 | 0.0407339 | RAP1B | PRKACB | 63 | Representative |
| 135 | R-HSA-399955 | SEMA3A-Plexin repulsion signaling by inhibiting Integrin adhesion | 6.395892 | 10 | 0.0217391 | 0.0411825 | 0.0411825 | PLXNA1 | 64 | Representative | |
| 145 | R-HSA-450385 | Butyrate Response Factor 1 (BRF1) binds and destabilizes mRNA | 5.267205 | 10 | 0.0217391 | 0.0461538 | 0.0461538 | ZFP36L1 | 65 | Representative |
After clustering, you may again plot the summary enrichment chart and display the enriched terms by clusters:
# plotting only selected clusters for better visualization
RA_selected <- subset(clustered_df, Cluster %in% 1:3)
enrichment_chart(RA_selected, plot_by_cluster = TRUE)
The function term_gene_heatmap() can be used to
visualize the heatmap of genes that are involved in the enriched terms.
This heatmap allows visual identification of the input genes involved in
the enriched terms, as well as the common or distinct genes between
different terms. If the input data frame (same as in
run_pathfindR() is supplied, the tile colors indicate the
change values.
term_gene_heatmap(result_df = output_df, genes_df = pathfinder.df)
The visualization function term_gene_graph() (adapted
from the “Gene-Concept network visualization” by the R package
enrichplot) can be utilized to visualize which genes are
involved in the enriched terms. The function creates a term-gene graph
which shows the connections between genes and biological terms (enriched
pathways or gene sets). This allows for the investigation of multiple
terms to which significant genes are related. This graph also enables
visual determination of the degree of overlap between the enriched terms
by identifying shared and/or distinct significant genes.
term_gene_graph(result_df = output_df, use_description = TRUE)
UpSet plots are plots of the intersections of sets as a matrix. This
function creates a ggplot object of an UpSet plot where the x-axis is
the UpSet plot of intersections of enriched terms. By default (i.e.,
method = "heatmap"), the main plot is a heatmap of genes at
the corresponding intersections, colored by up/down regulation (if
genes_df is provided, colored by change values). If
method = "barplot", the main plot is bar plots of the
number of genes at the corresponding intersections. Finally, if
method = "boxplot" and genes_df is provided,
then the main plot displays the boxplots of change values of the genes
at the corresponding intersections.
UpSet_plot(result_df = output_df, genes_df = pathfinder.df)
# specify time point of interest
my_timept_2 <- "6mo"
# Subset my celltype-specific Seurat object by time
my_seurat_2 <- subset(x = my_celltype_data, subset = Age == my_timept_2)
# set active identities of my subset seurat to "Mt" attribute (denoting variant)
# for DE testing
Idents(my_seurat_2) <- "Mt"
summary(as.factor(my_seurat_2$Mt))
## V337M V337V
## 120 206
# Run D.E. test to identify genes differentially expressed between
# between diseased and normal cells, for my celltype and at this timepoint
marker_genes_2.df <- FindMarkers(my_seurat_2,
ident.1 = "V337M", ident.2 = "V337V")
# order results by p-value
marker_genes_2.df <- marker_genes_2.df %>% arrange(p_val)
head(marker_genes_2.df)
## p_val avg_log2FC pct.1 pct.2 p_val_adj
## FAM183A 1.057416e-06 0.3126794 0.208 0.039 0.04239813
## RSPH1 6.416327e-06 0.5408231 0.292 0.107 0.25726906
## SDR39U1 8.923202e-06 0.2776002 0.367 0.146 0.35778471
## CHCHD2 7.523072e-05 0.5406251 0.742 0.534 1.00000000
## GSTO1 1.188225e-04 0.2758435 0.483 0.277 1.00000000
## KIF5B 2.266152e-04 -0.2973771 0.917 0.961 1.00000000
# format a new dataframe using the contents of
# marker_genes_2.df, to pass to pathfinder
pathfinder_2.df <- marker_genes_2.df %>%
tibble::rownames_to_column("geneIDs") %>%
dplyr::select(geneIDs, avg_log2FC, p_val)
colnames(pathfinder_2.df) <- c("geneIDs","logFC","pvalue")
output_df_2 <- run_pathfindR(pathfinder_2.df, gene_sets = "Reactome", pin_name_path = "STRING")
## Visualizing: Nuclear signaling by ERBB4
## Visualizing: Signaling by ERBB4
## Visualizing: Cell-extracellular matrix interactions
## Visualizing: NGF-stimulated transcription
## Visualizing: NR1H3 & NR1H2 regulate gene expression linked to cholesterol transport and efflux
## Visualizing: Nuclear Events (kinase and transcription factor activation)
## Visualizing: NR1H2 and NR1H3-mediated signaling
## Visualizing: Signaling by NTRK1 (TRKA)
## Visualizing: Signaling by NTRKs
## Visualizing: Amyloid fiber formation
## Visualizing: Cell junction organization
## Visualizing: Plasma lipoprotein remodeling
## Visualizing: Cell-Cell communication
## Visualizing: Formation of the ternary complex, and subsequently, the 43S complex
## Visualizing: Ribosomal scanning and start codon recognition
## Visualizing: Translation initiation complex formation
## Visualizing: Activation of the mRNA upon binding of the cap-binding complex and eIFs, and subsequent binding to 43S
## Visualizing: Transcriptional Regulation by MECP2
## Visualizing: Chromatin modifying enzymes
## Visualizing: Chromatin organization
## Visualizing: Transcriptional activity of SMAD2-SMAD3:SMAD4 heterotrimer
## Visualizing: Signaling by Nuclear Receptors
## Visualizing: Transcriptional regulation of white adipocyte differentiation
## Visualizing: MyD88 cascade initiated on plasma membrane
## Visualizing: Toll Like Receptor 10 (TLR10) Cascade
## Visualizing: Toll Like Receptor 5 (TLR5) Cascade
## Visualizing: TRAF6 mediated induction of NFkB and MAP kinases upon TLR7-8 or 9 activation
## Visualizing: MyD88 dependent cascade initiated on endosome
## Visualizing: Toll Like Receptor 7-8 (TLR7-8) Cascade
## Visualizing: Toll Like Receptor 3 (TLR3) Cascade
## Visualizing: Toll Like Receptor 9 (TLR9) Cascade
## Visualizing: Plasma lipoprotein assembly, remodeling, and clearance
## Visualizing: Response of EIF2AK4 (GCN2) to amino acid deficiency
## Visualizing: MyD88-independent TLR4 cascade
## Visualizing: TRIF(TICAM1)-mediated TLR4 signaling
## Visualizing: MyD88:MAL(TIRAP) cascade initiated on plasma membrane
## Visualizing: Toll Like Receptor 2 (TLR2) Cascade
## Visualizing: Toll Like Receptor TLR1:TLR2 Cascade
## Visualizing: Toll Like Receptor TLR6:TLR2 Cascade
## Visualizing: Interleukin-4 and Interleukin-13 signaling
## Visualizing: L13a-mediated translational silencing of Ceruloplasmin expression
## Visualizing: GTP hydrolysis and joining of the 60S ribosomal subunit
## Visualizing: Cap-dependent Translation Initiation
## Visualizing: Eukaryotic Translation Initiation
## Visualizing: PPARA activates gene expression
## Visualizing: Regulation of lipid metabolism by PPARalpha
## Visualizing: Signaling by TGF-beta Receptor Complex
## Visualizing: HDL remodeling
## Visualizing: Post-translational protein phosphorylation
## Visualizing: Toll Like Receptor 4 (TLR4) Cascade
## Visualizing: Cellular response to starvation
## Visualizing: Toll-like Receptor Cascades
## Visualizing: Regulation of Insulin-like Growth Factor (IGF) transport and uptake by Insulin-like Growth Factor Binding Proteins (IGFBPs)
## Visualizing: Signaling by TGFB family members
## Visualizing: Cellular Senescence
## Visualizing: Nuclear events stimulated by ALK signaling in cancer
## Visualizing: Retinoid metabolism and transport
## Visualizing: Non-integrin membrane-ECM interactions
## Visualizing: Organelle biogenesis and maintenance
## Visualizing: Metabolism of fat-soluble vitamins
## Visualizing: Regulation of expression of SLITs and ROBOs
##
|
| | 0%
|
|....................... | 33%
## inline R code fragments
##
##
|
|............................................... | 67%
## label: setup (with options)
## List of 1
## $ include: logi FALSE
##
##
|
|......................................................................| 100%
## ordinary text without R code
##
##
## /usr/lib/rstudio-server/bin/quarto/bin/pandoc +RTS -K512m -RTS results.knit.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output pandoc14e57197b857.html --lua-filter /home/heh4/R/x86_64-pc-linux-gnu-library/4.1/rmarkdown/rmarkdown/lua/pagebreak.lua --lua-filter /home/heh4/R/x86_64-pc-linux-gnu-library/4.1/rmarkdown/rmarkdown/lua/latex-div.lua --self-contained --variable bs3=TRUE --standalone --section-divs --template /home/heh4/R/x86_64-pc-linux-gnu-library/4.1/rmarkdown/rmd/h/default.html --no-highlight --variable highlightjs=1 --variable theme=bootstrap --mathjax --variable 'mathjax-url=https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' --include-in-header /tmp/RtmpWWlIBJ/rmarkdown-str14e568f002f6.html
##
|
| | 0%
|
|.................. | 25%
## inline R code fragments
##
##
|
|................................... | 50%
## label: setup (with options)
## List of 1
## $ include: logi FALSE
##
##
|
|.................................................... | 75%
## ordinary text without R code
##
##
|
|......................................................................| 100%
## label: table (with options)
## List of 2
## $ echo : logi FALSE
## $ comment: logi NA
##
##
## /usr/lib/rstudio-server/bin/quarto/bin/pandoc +RTS -K512m -RTS enriched_terms.knit.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output pandoc14e53c6f23df.html --lua-filter /home/heh4/R/x86_64-pc-linux-gnu-library/4.1/rmarkdown/rmarkdown/lua/pagebreak.lua --lua-filter /home/heh4/R/x86_64-pc-linux-gnu-library/4.1/rmarkdown/rmarkdown/lua/latex-div.lua --self-contained --variable bs3=TRUE --standalone --section-divs --template /home/heh4/R/x86_64-pc-linux-gnu-library/4.1/rmarkdown/rmd/h/default.html --no-highlight --variable highlightjs=1 --variable theme=bootstrap --mathjax --variable 'mathjax-url=https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' --include-in-header /tmp/RtmpWWlIBJ/rmarkdown-str14e55b92cc54.html
##
|
| | 0%
|
|............ | 17%
## inline R code fragments
##
##
|
|....................... | 33%
## label: setup (with options)
## List of 1
## $ include: logi FALSE
##
##
|
|................................... | 50%
## ordinary text without R code
##
##
|
|............................................... | 67%
## label: converted_tbl, table1 (with options)
## List of 1
## $ comment: logi NA
##
##
|
|.......................................................... | 83%
## ordinary text without R code
##
##
|
|......................................................................| 100%
## label: gene_wo_interaction, table2 (with options)
## List of 1
## $ comment: logi NA
##
##
## /usr/lib/rstudio-server/bin/quarto/bin/pandoc +RTS -K512m -RTS conversion_table.knit.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output pandoc14e510f6078f.html --lua-filter /home/heh4/R/x86_64-pc-linux-gnu-library/4.1/rmarkdown/rmarkdown/lua/pagebreak.lua --lua-filter /home/heh4/R/x86_64-pc-linux-gnu-library/4.1/rmarkdown/rmarkdown/lua/latex-div.lua --self-contained --variable bs3=TRUE --standalone --section-divs --template /home/heh4/R/x86_64-pc-linux-gnu-library/4.1/rmarkdown/rmd/h/default.html --no-highlight --variable highlightjs=1 --variable theme=bootstrap --mathjax --variable 'mathjax-url=https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' --include-in-header /tmp/RtmpWWlIBJ/rmarkdown-str14e53c7fd9ba.html
combined_df <- combine_pathfindR_results(result_A = output_df,
result_B = output_df_2)
combined_results_graph(combined_df, selected_terms = c("R-HSA-187037", "R-HSA-6802948", "R-HSA-168164"))
genesymbol <- pathfinder.df$geneIDs
pathfinder.df$pvalue <- -log10(pathfinder.df$pvalue)
id <- bitr(genesymbol, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
names(id)[1] = "geneIDs"
merged <- merge(pathfinder.df, id, by="geneIDs")
my_geneList <- as.numeric(merged$pvalue)
names(my_geneList) <- as.character(merged$ENTREZID)
my_geneList <- sort(my_geneList, decreasing = TRUE)
enriched <- enrichPathway(id$ENTREZID,
organism = "human",
pvalueCutoff = 0.05,
pAdjustMethod = "BH",
readable = FALSE)
head(enriched)
## ID
## R-HSA-3371497 R-HSA-3371497
## R-HSA-373760 R-HSA-373760
## R-HSA-6798695 R-HSA-6798695
## R-HSA-70263 R-HSA-70263
## R-HSA-75153 R-HSA-75153
## R-HSA-3371568 R-HSA-3371568
## Description
## R-HSA-3371497 HSP90 chaperone cycle for steroid hormone receptors (SHR) in the presence of ligand
## R-HSA-373760 L1CAM interactions
## R-HSA-6798695 Neutrophil degranulation
## R-HSA-70263 Gluconeogenesis
## R-HSA-75153 Apoptotic execution phase
## R-HSA-3371568 Attenuation phase
## GeneRatio BgRatio pvalue p.adjust qvalue
## R-HSA-3371497 8/205 55/10867 7.959422e-06 0.006218000 0.005319472
## R-HSA-373760 11/205 119/10867 1.464782e-05 0.006218000 0.005319472
## R-HSA-6798695 23/205 480/10867 3.624799e-05 0.007693649 0.006581883
## R-HSA-70263 6/205 34/10867 3.624805e-05 0.007693649 0.006581883
## R-HSA-75153 7/205 52/10867 4.998147e-05 0.008486853 0.007260466
## R-HSA-3371568 4/205 14/10867 1.061360e-04 0.015018241 0.012848040
## geneID
## R-HSA-3371497 832/1783/10383/2288/3303/3320/3337/347733
## R-HSA-373760 163/8828/5604/1641/6386/4684/5962/5605/10383/1173/347733
## R-HSA-6798695 25798/10857/5908/127829/6386/3958/5269/3146/8895/80184/8635/2519/5695/10383/3303/5611/3320/4282/966/2934/7077/57153/2171
## R-HSA-70263 7167/4191/6576/5230/2023/2597
## R-HSA-75153 7082/1499/3146/3148/324/2934/4000
## R-HSA-3371568 2288/3303/3320/3337
## Count
## R-HSA-3371497 8
## R-HSA-373760 11
## R-HSA-6798695 23
## R-HSA-70263 6
## R-HSA-75153 7
## R-HSA-3371568 4
y <- gsePathway(my_geneList,
pvalueCutoff = 1,
pAdjustMethod = "BH",
verbose = FALSE)
head(y)
## ID Description
## R-HSA-72766 R-HSA-72766 Translation
## R-HSA-72163 R-HSA-72163 mRNA Splicing - Major Pathway
## R-HSA-72172 R-HSA-72172 mRNA Splicing
## R-HSA-72203 R-HSA-72203 Processing of Capped Intron-Containing Pre-mRNA
## R-HSA-9012999 R-HSA-9012999 RHO GTPase cycle
## R-HSA-1640170 R-HSA-1640170 Cell Cycle
## setSize enrichmentScore NES pvalue p.adjust qvalues
## R-HSA-72766 11 0.4946562 1.452763 0.06979405 0.9160609 0.9160609
## R-HSA-72163 11 -0.3237055 -1.298800 0.17736169 0.9160609 0.9160609
## R-HSA-72172 11 -0.3237055 -1.298800 0.17736169 0.9160609 0.9160609
## R-HSA-72203 11 -0.3237055 -1.298800 0.17736169 0.9160609 0.9160609
## R-HSA-9012999 15 0.3695965 1.187080 0.24728850 0.9160609 0.9160609
## R-HSA-1640170 14 -0.2730478 -1.202461 0.25846050 0.9160609 0.9160609
## rank leading_edge
## R-HSA-72766 116 tags=73%, list=38%, signal=47%
## R-HSA-72163 166 tags=91%, list=54%, signal=43%
## R-HSA-72172 166 tags=91%, list=54%, signal=43%
## R-HSA-72203 166 tags=91%, list=54%, signal=43%
## R-HSA-9012999 58 tags=33%, list=19%, signal=28%
## R-HSA-1640170 158 tags=86%, list=51%, signal=44%
## core_enrichment
## R-HSA-72766 23480/1917/51065/29093/28977/1933/10952/6726
## R-HSA-72163 6635/57819/10949/10236/3185/3178/10285/6431/11100/3187
## R-HSA-72172 6635/57819/10949/10236/3185/3178/10285/6431/11100/3187
## R-HSA-72203 6635/57819/10949/10236/3185/3178/10285/6431/11100/3187
## R-HSA-9012999 6453/10458/832/57498/57120
## R-HSA-1640170 80184/5695/10383/5704/23122/7324/3320/27338/347733/5577/1032/4000
enriched_set <- setReadable(y, 'org.Hs.eg.db', 'ENTREZID')
p1 <- cnetplot(enriched_set, foldChange=pathfinder.df$logFC, colorEdge = TRUE, node_label = "gene")
p1
## categorySize can be scaled by 'pvalue' or 'geneNum'
p2 <- cnetplot(enriched_set, categorySize = "pvalue", foldChange = pathfinder.df$logFC, colorEdge = TRUE, node_label = "gene")
p2
p3 <- cnetplot(enriched_set, foldChange = pathfinder.df$logFC, circular = TRUE, colorEdge = TRUE, node_label = "gene")
p3
## Signaling by RAS mutants
## MAP2K1 1
## RAP1B 1
## HRAS 1
## MAP2K2 1
## FOXG1 0
## SOX2 0
## Signaling by moderate kinase activity BRAF mutants
## MAP2K1 1
## RAP1B 1
## HRAS 1
## MAP2K2 1
## FOXG1 0
## SOX2 0
## Transcriptional Regulation by MECP2 mRNA Splicing - Major Pathway
## MAP2K1 0 0
## RAP1B 0 0
## HRAS 0 0
## MAP2K2 0 0
## FOXG1 1 0
## SOX2 1 0
## Oncogenic MAPK signaling Glucose metabolism Signaling by NOTCH
## MAP2K1 1 0 0
## RAP1B 1 0 0
## HRAS 1 0 0
## MAP2K2 1 0 0
## FOXG1 0 0 0
## SOX2 0 0 0
## Signaling by NTRKs logFC
## MAP2K1 1 0.3653721
## RAP1B 0 0.2748598
## HRAS 1 -0.3101027
## MAP2K2 1 -0.2605632
## FOXG1 0 0.4565708
## SOX2 0 0.2857072