The ReactomeGSA package is a client to the web-based
Reactome Analysis System. Essentially, it performs a gene set analysis
using the latest version of the Reactome pathway database as a
backend.
This vignette shows how the ReactomeGSA package can be
used to perform a pathway analysis of cell clusters in single-cell
RNA-sequencing data.
As an example we load single-cell RNA-sequencing data of B cells extracted from the dataset published by Jerby-Arnon et al. (Cell, 2018).
Note: This is not a complete Seurat object. To decrease the size, the object only contains gene expression values and cluster annotations.
# Read in the B cells data
load("jerby_b_cells.RData")
Perform the pathway analysis using ReactomeGSA’s
analyse_sc_clusters function, which automatically takes
care of all required steps to perform pathway analysis:
Calculate the average expression per cluster
Perform a ssGSEA gene set variation analysis using
ReactomeGSA
Return the final ReactomeGSAResult object
# Perform the pathway analysis using ReactomeGSA's analyse_sc_clusters function
gsva_result <- analyse_sc_clusters(jerby_b_cells, verbose = TRUE)
## Calculating average cluster expression...
## Converting expression data to string... (This may take a moment)
## Conversion complete
## Submitting request to Reactome API...
## Compressing request data...
## Reactome Analysis submitted succesfully
## Converting dataset Seurat...
## Mapping identifiers...
## Performing gene set analysis using ssGSEA
## Analysing dataset 'Seurat' using ssGSEA
## Retrieving result...
gsva_result
## ReactomeAnalysisResult object
## Reactome Release: 80
## Results:
## - Seurat:
## 1741 pathways
## 10963 fold changes for genes
## No Reactome visualizations available
## ReactomeAnalysisResult
The pathways function returns a data.frame with the
columns representing the clusters and each row a pathway.
# Combines and returns the pathways of all clusters
pathway_expression <- pathways(gsva_result)
# Simplify the column names by removing the default dataset identifier
colnames(pathway_expression) <- gsub("\\.Seurat", "", colnames(pathway_expression))
# Check the first three pathways
pathway_expression[1:3,]
## Name Cluster_1 Cluster_10 Cluster_11
## R-HSA-1059683 Interleukin-6 signaling 0.1095886 0.10583047 0.1464828
## R-HSA-109606 Intrinsic Pathway for Apoptosis 0.1110049 0.10802694 0.1152236
## R-HSA-109703 PKB-mediated events 0.1272170 0.05259092 0.1061580
## Cluster_12 Cluster_13 Cluster_2 Cluster_3 Cluster_4 Cluster_5
## R-HSA-1059683 0.11478603 0.10563668 0.12228118 0.11707971 0.11658300 0.10752920
## R-HSA-109606 0.11437148 0.12881599 0.10669255 0.10913197 0.11275952 0.10513836
## R-HSA-109703 0.09549884 0.07342587 0.08303762 0.08388032 0.05555674 0.04629642
## Cluster_6 Cluster_7 Cluster_8 Cluster_9
## R-HSA-1059683 0.1008306 0.1236283 0.13873595 0.10955212
## R-HSA-109606 0.1098378 0.1166306 0.11872254 0.11558804
## R-HSA-109703 0.1238539 0.0770211 0.07808829 0.01423031
A simple approach to find the most interesting pathways is to evaluate the maximum expression difference for each pathway.
# Find the maximum differently expressed pathway
max_difference <- do.call(rbind, apply(pathway_expression, 1, function(row) {
values <- as.numeric(row[2:length(row)])
return(data.frame(name = row[1], min = min(values), max = max(values)))
}))
# Calculate the maximum expression difference for each pathway
max_difference$diff <- max_difference$max - max_difference$min
# Sort based on the difference
max_difference <- max_difference[order(max_difference$diff, decreasing = T), ]
# Check the first few pathways and their maximum expression differences
head(max_difference)
## name min
## R-HSA-350864 Regulation of thyroid hormone activity -0.4873137
## R-HSA-8964540 Alanine metabolism -0.5061703
## R-HSA-190374 FGFR1c and Klotho ligand binding and activation -0.3432086
## R-HSA-140180 COX reactions -0.3447800
## R-HSA-9024909 BDNF activates NTRK2 (TRKB) signaling -0.3743353
## R-HSA-9025046 NTF3 activates NTRK2 (TRKB) signaling -0.3953952
## max diff
## R-HSA-350864 0.3746363 0.8619500
## R-HSA-8964540 0.2554885 0.7616588
## R-HSA-190374 0.4152150 0.7584236
## R-HSA-140180 0.3721406 0.7169206
## R-HSA-9024909 0.3230906 0.6974259
## R-HSA-9025046 0.2989622 0.6943574
The plot_gsva_pathway function plots the expression of a
single pathway accross the different cell types.
# Plot the expression of a single pathway accross the different cell types
# options(repr.plot.width = 6, repr.plot.height = 4)
plot_obj <- plot_gsva_pathway(gsva_result, pathway_id = "R-HSA-350864")
print(plot_obj)
# Plot a different pathway
plot_gsva_pathway(gsva_result, pathway_id = "R-HSA-1169091")
The plot_gsva_heatmap function is an intuitive way to
show differences of multiple pathways across all cell
clusters/types.
selected_pathways <- c(
"R-HSA-1169091", "R-HSA-1168372", "R-HSA-983705", # BCR signalling
"R-HSA-9034013", "R-HSA-9024909", "R-HSA-9025046" # NTRK3 associated pathways
)
# Sort the cells alphabetically in the pathway result
org_cols <- colnames(gsva_result@results[[1]]$pathways)
org_cols <- c(org_cols[1:2], sort(org_cols[3:length(org_cols)]))
gsva_result@results[[1]]$pathways <- gsva_result@results[[1]]$pathways[, org_cols]
# Display the heatmap
options(repr.plot.width = 15, repr.plot.height = 7)
plot_gsva_heatmap(gsva_result, truncate_names = T, pathway_ids = selected_pathways,
dendrogram = "none",
rowsep = 1:length(selected_pathways), colsep = 1:13, sepcolor = "black", sepwidth = c(0.01, 0.0001),
scale = "row",
margins = c(6, 10),
cexRow = 1,
Colv = F)
# If pathway_ids is not set, we can set the maximum number of pathways to include in heatmap
plot_gsva_heatmap(gsva_result, max_pathways = 15, margins = c(6,20))
The pathway-level expression analysis can also be used to run a
Principal Component Analysis on the samples. This is simplified through
the function plot_gsva_pca:
plot_gsva_pca(gsva_result)
In this analysis, cluster 11 is a clear outlier from the other B cell subtypes and therefore might be prioritised for further evaluation.
Seurat# Specify your cell type
my_celltype <- "Ast"
# Specify time point of interest
my_timept <- "2mo"
my_celltype_data <- readRDS("/data/Alzheimers_DS/FinalMergedData-Ast.rds")
# Subset my celltype-specific Seurat object by time
my_seurat <- subset(x = my_celltype_data, subset = Age == my_timept)
# Set active identities of my subset seurat to "seurat_clusters" attribute
Idents(my_seurat) <- "seurat_clusters"
# Perform the pathway analysis using ReactomeGSA's analyse_sc_clusters function
gsva_result_seurat <- analyse_sc_clusters(my_seurat, verbose = TRUE)
## Calculating average cluster expression...
## Converting expression data to string... (This may take a moment)
## Conversion complete
## Submitting request to Reactome API...
## Compressing request data...
## Reactome Analysis submitted succesfully
## Converting dataset Seurat...
## Mapping identifiers...
## Performing gene set analysis using ssGSEA
## Analysing dataset 'Seurat' using ssGSEA
## Retrieving result...
gsva_result_seurat
## ReactomeAnalysisResult object
## Reactome Release: 80
## Results:
## - Seurat:
## 1782 pathways
## 5287 fold changes for genes
## No Reactome visualizations available
## ReactomeAnalysisResult
# Combines and returns the pathways of all clusters
pathway_expression_seurat <- pathways(gsva_result_seurat)
# Simplify the column names by removing the default dataset identifier
colnames(pathway_expression_seurat) <- gsub("\\.Seurat", "", colnames(pathway_expression_seurat))
colnames(pathway_expression_seurat) <- gsub("X", "Cluster", colnames(pathway_expression_seurat))
# Check the first three pathways
pathway_expression_seurat[1:3,]
## Name Cluster1 Cluster10 Cluster12
## R-HSA-1059683 Interleukin-6 signaling 0.06299457 0.06033846 0.08230918
## R-HSA-109606 Intrinsic Pathway for Apoptosis 0.08587167 0.10122974 0.10351837
## R-HSA-109703 PKB-mediated events 0.49904858 0.49220308 0.48231275
## Cluster14 Cluster17 Cluster18 Cluster19 Cluster20 Cluster21
## R-HSA-1059683 0.10319353 0.05657831 0.04673243 0.03341037 0.07570325 0.04093702
## R-HSA-109606 0.09976679 0.08822515 0.08699354 0.05768395 0.09941158 0.09780691
## R-HSA-109703 0.50142602 0.48706989 0.47491875 0.47757804 0.47909432 0.47994977
## Cluster22 Cluster23 Cluster24 Cluster26 Cluster29 Cluster3
## R-HSA-1059683 0.05540514 0.08083686 0.02706257 0.02015672 0.07256262 0.08846758
## R-HSA-109606 0.07664207 0.11026093 0.07764913 0.05425798 0.10779721 0.09354268
## R-HSA-109703 0.47157603 0.49220628 0.48280728 0.38305567 0.47994183 0.48556246
## Cluster30 Cluster7 Cluster9
## R-HSA-1059683 0.06136025 0.03592487 0.04153281
## R-HSA-109606 0.08564958 0.04974637 0.07498208
## R-HSA-109703 0.47700921 0.42932233 0.47979141
# Find the maximum differently expressed pathway
max_difference_seurat <- do.call(rbind, apply(pathway_expression_seurat, 1, function(row) {
values <- as.numeric(row[2:length(row)])
return(data.frame(name = row[1], min = min(values), max = max(values)))
}))
# Calculate the maximum expression difference for each pathway
max_difference_seurat$diff <- max_difference_seurat$max - max_difference_seurat$min
# Sort based on the difference
max_difference_seurat <- max_difference_seurat[order(max_difference_seurat$diff, decreasing = T), ]
# Check the first few pathways and their maximum expression differences
head(max_difference_seurat)
## name
## R-HSA-427975 Proton/oligopeptide cotransporters
## R-HSA-2142850 Hyaluronan biosynthesis and export
## R-HSA-5653890 Lactose synthesis
## R-HSA-1299308 Tandem of pore domain in a weak inwardly rectifying K+ channels (TWIK)
## R-HSA-8964540 Alanine metabolism
## R-HSA-2161517 Abacavir transmembrane transport
## min max diff
## R-HSA-427975 -0.4324016 0.5016163 0.9340179
## R-HSA-2142850 -0.4945807 0.4352539 0.9298346
## R-HSA-5653890 -0.4579684 0.4398069 0.8977752
## R-HSA-1299308 -0.4305001 0.4407682 0.8712683
## R-HSA-8964540 -0.4833619 0.3852443 0.8686062
## R-HSA-2161517 -0.4725233 0.3704126 0.8429359
# Plot the expression of a single pathway accross the different cell types
options(repr.plot.width = 6, repr.plot.height = 4)
plot_obj_seurat <- plot_gsva_pathway(gsva_result_seurat, pathway_id = "R-HSA-427975")
print(plot_obj_seurat)
# Plot a different pathway
plot_gsva_pathway(gsva_result_seurat, pathway_id = "R-HSA-1660499")
selected_pathways_seurat <- c(
"R-HSA-3000157", # Laminin organization
"R-HSA-1474244", # Extracellular matrix organization
"R-HSA-3000171", # Non-integrin membrane-ECM interactions
"R-HSA-3000178", # ECM proteoglycans
"R-HSA-216083" # Integrin cell surface interactions
)
# Sort the cells alphabetically in the pathway result
org_cols_seurat <- colnames(gsva_result_seurat@results[[1]]$pathways)
org_cols_seurat <- c(org_cols_seurat[1:2], sort(org_cols_seurat[3:length(org_cols_seurat)]))
gsva_result_seurat@results[[1]]$pathways <- gsva_result_seurat@results[[1]]$pathways[, org_cols_seurat]
# Display the heatmap
options(repr.plot.width = 15, repr.plot.height = 7)
plot_gsva_heatmap(gsva_result_seurat, truncate_names = T, pathway_ids = selected_pathways_seurat,
dendrogram = "none",
rowsep = 1:length(selected_pathways_seurat), colsep = 1:13, sepcolor = "black", sepwidth = c(0.01, 0.0001),
scale = "row",
margins = c(6, 10),
cexRow = 1,
Colv = F)
# If pathway_ids is not set, we can set the maximum number of pathways to include in heatmap
plot_gsva_heatmap(gsva_result_seurat, max_pathways = 15, margins = c(6,20))
# Pathway-level PCA
plot_gsva_pca(gsva_result_seurat)