Overview

The ReactomeGSA package is a client to the web-based Reactome Analysis System. Essentially, it performs a gene set analysis using the latest version of the Reactome pathway database as a backend.

This vignette shows how the ReactomeGSA package can be used to perform a pathway analysis of cell clusters in single-cell RNA-sequencing data.

Part I: Example data

As an example we load single-cell RNA-sequencing data of B cells extracted from the dataset published by Jerby-Arnon et al. (Cell, 2018).

Note: This is not a complete Seurat object. To decrease the size, the object only contains gene expression values and cluster annotations.

Load B cells data

# Read in the B cells data
load("jerby_b_cells.RData")

Pathway analysis

Perform the pathway analysis using ReactomeGSA’s analyse_sc_clusters function, which automatically takes care of all required steps to perform pathway analysis:

  • Calculate the average expression per cluster

  • Perform a ssGSEA gene set variation analysis using ReactomeGSA

  • Return the final ReactomeGSAResult object

# Perform the pathway analysis using ReactomeGSA's analyse_sc_clusters function
gsva_result <- analyse_sc_clusters(jerby_b_cells, verbose = TRUE)
## Calculating average cluster expression...
## Converting expression data to string... (This may take a moment)
## Conversion complete
## Submitting request to Reactome API...
## Compressing request data...
## Reactome Analysis submitted succesfully
## Converting dataset Seurat...
## Mapping identifiers...
## Performing gene set analysis using ssGSEA
## Analysing dataset 'Seurat' using ssGSEA
## Retrieving result...
gsva_result
## ReactomeAnalysisResult object
##   Reactome Release: 80
##   Results:
##   - Seurat:
##     1741 pathways
##     10963 fold changes for genes
##   No Reactome visualizations available
## ReactomeAnalysisResult

The pathways function returns a data.frame with the columns representing the clusters and each row a pathway.

# Combines and returns the pathways of all clusters
pathway_expression <- pathways(gsva_result)

# Simplify the column names by removing the default dataset identifier
colnames(pathway_expression) <- gsub("\\.Seurat", "", colnames(pathway_expression))

# Check the first three pathways
pathway_expression[1:3,]
##                                          Name Cluster_1 Cluster_10 Cluster_11
## R-HSA-1059683         Interleukin-6 signaling 0.1095886 0.10583047  0.1464828
## R-HSA-109606  Intrinsic Pathway for Apoptosis 0.1110049 0.10802694  0.1152236
## R-HSA-109703              PKB-mediated events 0.1272170 0.05259092  0.1061580
##               Cluster_12 Cluster_13  Cluster_2  Cluster_3  Cluster_4  Cluster_5
## R-HSA-1059683 0.11478603 0.10563668 0.12228118 0.11707971 0.11658300 0.10752920
## R-HSA-109606  0.11437148 0.12881599 0.10669255 0.10913197 0.11275952 0.10513836
## R-HSA-109703  0.09549884 0.07342587 0.08303762 0.08388032 0.05555674 0.04629642
##               Cluster_6 Cluster_7  Cluster_8  Cluster_9
## R-HSA-1059683 0.1008306 0.1236283 0.13873595 0.10955212
## R-HSA-109606  0.1098378 0.1166306 0.11872254 0.11558804
## R-HSA-109703  0.1238539 0.0770211 0.07808829 0.01423031

A simple approach to find the most interesting pathways is to evaluate the maximum expression difference for each pathway.

# Find the maximum differently expressed pathway
max_difference <- do.call(rbind, apply(pathway_expression, 1, function(row) {
    values <- as.numeric(row[2:length(row)])
    return(data.frame(name = row[1], min = min(values), max = max(values)))
}))

# Calculate the maximum expression difference for each pathway
max_difference$diff <- max_difference$max - max_difference$min

# Sort based on the difference
max_difference <- max_difference[order(max_difference$diff, decreasing = T), ]

# Check the first few pathways and their maximum expression differences
head(max_difference)
##                                                          name        min
## R-HSA-350864           Regulation of thyroid hormone activity -0.4873137
## R-HSA-8964540                              Alanine metabolism -0.5061703
## R-HSA-190374  FGFR1c and Klotho ligand binding and activation -0.3432086
## R-HSA-140180                                    COX reactions -0.3447800
## R-HSA-9024909           BDNF activates NTRK2 (TRKB) signaling -0.3743353
## R-HSA-9025046           NTF3 activates NTRK2 (TRKB) signaling -0.3953952
##                     max      diff
## R-HSA-350864  0.3746363 0.8619500
## R-HSA-8964540 0.2554885 0.7616588
## R-HSA-190374  0.4152150 0.7584236
## R-HSA-140180  0.3721406 0.7169206
## R-HSA-9024909 0.3230906 0.6974259
## R-HSA-9025046 0.2989622 0.6943574

The plot_gsva_pathway function plots the expression of a single pathway accross the different cell types.

# Plot the expression of a single pathway accross the different cell types
# options(repr.plot.width = 6, repr.plot.height = 4)
plot_obj <- plot_gsva_pathway(gsva_result, pathway_id = "R-HSA-350864")
print(plot_obj)

# Plot a different pathway
plot_gsva_pathway(gsva_result, pathway_id = "R-HSA-1169091")

The plot_gsva_heatmap function is an intuitive way to show differences of multiple pathways across all cell clusters/types.

selected_pathways <- c(
                       "R-HSA-1169091", "R-HSA-1168372", "R-HSA-983705", # BCR signalling
                       "R-HSA-9034013", "R-HSA-9024909", "R-HSA-9025046" # NTRK3 associated pathways 
                       )

# Sort the cells alphabetically in the pathway result
org_cols <- colnames(gsva_result@results[[1]]$pathways)
org_cols <- c(org_cols[1:2], sort(org_cols[3:length(org_cols)]))

gsva_result@results[[1]]$pathways <- gsva_result@results[[1]]$pathways[, org_cols]

# Display the heatmap
options(repr.plot.width = 15, repr.plot.height = 7)
plot_gsva_heatmap(gsva_result, truncate_names = T, pathway_ids = selected_pathways, 
                  dendrogram = "none", 
                  rowsep = 1:length(selected_pathways), colsep = 1:13, sepcolor = "black", sepwidth = c(0.01, 0.0001),
                  scale = "row",
                  margins = c(6, 10),
                  cexRow = 1,
                  Colv = F)

# If pathway_ids is not set,  we can set the maximum number of pathways to include in heatmap
plot_gsva_heatmap(gsva_result, max_pathways = 15, margins = c(6,20))

Pathway-level PCA

The pathway-level expression analysis can also be used to run a Principal Component Analysis on the samples. This is simplified through the function plot_gsva_pca:

plot_gsva_pca(gsva_result)

In this analysis, cluster 11 is a clear outlier from the other B cell subtypes and therefore might be prioritised for further evaluation.

Part II: Pathway analysis with Seurat

# Specify your cell type 
my_celltype <- "Ast"
# Specify time point of interest
my_timept <- "2mo"

my_celltype_data <- readRDS("/data/Alzheimers_DS/FinalMergedData-Ast.rds")

# Subset my celltype-specific Seurat object by time
my_seurat <- subset(x = my_celltype_data, subset = Age == my_timept)

# Set active identities of my subset seurat to "seurat_clusters" attribute 
Idents(my_seurat) <- "seurat_clusters"

# Perform the pathway analysis using ReactomeGSA's analyse_sc_clusters function
gsva_result_seurat <- analyse_sc_clusters(my_seurat, verbose = TRUE)
## Calculating average cluster expression...
## Converting expression data to string... (This may take a moment)
## Conversion complete
## Submitting request to Reactome API...
## Compressing request data...
## Reactome Analysis submitted succesfully
## Converting dataset Seurat...
## Mapping identifiers...
## Performing gene set analysis using ssGSEA
## Analysing dataset 'Seurat' using ssGSEA
## Retrieving result...
gsva_result_seurat
## ReactomeAnalysisResult object
##   Reactome Release: 80
##   Results:
##   - Seurat:
##     1782 pathways
##     5287 fold changes for genes
##   No Reactome visualizations available
## ReactomeAnalysisResult
# Combines and returns the pathways of all clusters
pathway_expression_seurat <- pathways(gsva_result_seurat)

# Simplify the column names by removing the default dataset identifier
colnames(pathway_expression_seurat) <- gsub("\\.Seurat", "", colnames(pathway_expression_seurat))
colnames(pathway_expression_seurat) <- gsub("X", "Cluster", colnames(pathway_expression_seurat))

# Check the first three pathways
pathway_expression_seurat[1:3,]
##                                          Name   Cluster1  Cluster10  Cluster12
## R-HSA-1059683         Interleukin-6 signaling 0.06299457 0.06033846 0.08230918
## R-HSA-109606  Intrinsic Pathway for Apoptosis 0.08587167 0.10122974 0.10351837
## R-HSA-109703              PKB-mediated events 0.49904858 0.49220308 0.48231275
##                Cluster14  Cluster17  Cluster18  Cluster19  Cluster20  Cluster21
## R-HSA-1059683 0.10319353 0.05657831 0.04673243 0.03341037 0.07570325 0.04093702
## R-HSA-109606  0.09976679 0.08822515 0.08699354 0.05768395 0.09941158 0.09780691
## R-HSA-109703  0.50142602 0.48706989 0.47491875 0.47757804 0.47909432 0.47994977
##                Cluster22  Cluster23  Cluster24  Cluster26  Cluster29   Cluster3
## R-HSA-1059683 0.05540514 0.08083686 0.02706257 0.02015672 0.07256262 0.08846758
## R-HSA-109606  0.07664207 0.11026093 0.07764913 0.05425798 0.10779721 0.09354268
## R-HSA-109703  0.47157603 0.49220628 0.48280728 0.38305567 0.47994183 0.48556246
##                Cluster30   Cluster7   Cluster9
## R-HSA-1059683 0.06136025 0.03592487 0.04153281
## R-HSA-109606  0.08564958 0.04974637 0.07498208
## R-HSA-109703  0.47700921 0.42932233 0.47979141
# Find the maximum differently expressed pathway
max_difference_seurat <- do.call(rbind, apply(pathway_expression_seurat, 1, function(row) {
    values <- as.numeric(row[2:length(row)])
    return(data.frame(name = row[1], min = min(values), max = max(values)))
}))

# Calculate the maximum expression difference for each pathway
max_difference_seurat$diff <- max_difference_seurat$max - max_difference_seurat$min

# Sort based on the difference
max_difference_seurat <- max_difference_seurat[order(max_difference_seurat$diff, decreasing = T), ]

# Check the first few pathways and their maximum expression differences
head(max_difference_seurat)
##                                                                                 name
## R-HSA-427975                                      Proton/oligopeptide cotransporters
## R-HSA-2142850                                     Hyaluronan biosynthesis and export
## R-HSA-5653890                                                      Lactose synthesis
## R-HSA-1299308 Tandem of pore domain in a weak inwardly rectifying K+ channels (TWIK)
## R-HSA-8964540                                                     Alanine metabolism
## R-HSA-2161517                                       Abacavir transmembrane transport
##                      min       max      diff
## R-HSA-427975  -0.4324016 0.5016163 0.9340179
## R-HSA-2142850 -0.4945807 0.4352539 0.9298346
## R-HSA-5653890 -0.4579684 0.4398069 0.8977752
## R-HSA-1299308 -0.4305001 0.4407682 0.8712683
## R-HSA-8964540 -0.4833619 0.3852443 0.8686062
## R-HSA-2161517 -0.4725233 0.3704126 0.8429359
# Plot the expression of a single pathway accross the different cell types
options(repr.plot.width = 6, repr.plot.height = 4)
plot_obj_seurat <- plot_gsva_pathway(gsva_result_seurat, pathway_id = "R-HSA-427975")
print(plot_obj_seurat)

# Plot a different pathway
plot_gsva_pathway(gsva_result_seurat, pathway_id = "R-HSA-1660499")

selected_pathways_seurat <- c(
                       "R-HSA-3000157", # Laminin organization
                       "R-HSA-1474244", # Extracellular matrix organization
                       "R-HSA-3000171", # Non-integrin membrane-ECM interactions
                       "R-HSA-3000178", # ECM proteoglycans
                       "R-HSA-216083" # Integrin cell surface interactions
                       )

# Sort the cells alphabetically in the pathway result
org_cols_seurat <- colnames(gsva_result_seurat@results[[1]]$pathways)
org_cols_seurat <- c(org_cols_seurat[1:2], sort(org_cols_seurat[3:length(org_cols_seurat)]))

gsva_result_seurat@results[[1]]$pathways <- gsva_result_seurat@results[[1]]$pathways[, org_cols_seurat]

# Display the heatmap
options(repr.plot.width = 15, repr.plot.height = 7)
plot_gsva_heatmap(gsva_result_seurat, truncate_names = T, pathway_ids = selected_pathways_seurat, 
                  dendrogram = "none", 
                  rowsep = 1:length(selected_pathways_seurat), colsep = 1:13, sepcolor = "black", sepwidth = c(0.01, 0.0001),
                  scale = "row",
                  margins = c(6, 10),
                  cexRow = 1,
                  Colv = F)

# If pathway_ids is not set,  we can set the maximum number of pathways to include in heatmap
plot_gsva_heatmap(gsva_result_seurat, max_pathways = 15, margins = c(6,20))

# Pathway-level PCA
plot_gsva_pca(gsva_result_seurat)