In this session we will learn the basics of imaging-derived spatial transcriptomic data. We will learn how to visualise, manipulate and analyse single molecule data.
We will maintain the use of tidyomics that we learned in
Session 2. The programming style, in contrast of
Session 1 will make use of the |> (pipe)
operator.
# https://bioconductor.org/packages/devel/data/experiment/vignettes/SubcellularSpatialData/inst/doc/SubcellularSpatialData.html
# BiocManager::install("stemangiola/SubcellularSpatialData")
# Tidyverse library(tidyverse)
library(ggplot2)
library(plotly)
library(dplyr)
library(tidyr)
library(purrr)
library(glue) # sprintf
library(stringr)
library(forcats)
# Plotting
library(colorspace)
library(dittoSeq)
library(ggspavis)
library(RColorBrewer)
# Analysis
library(scuttle)
library(scater)
library(scran)
# Data download
library(ExperimentHub)
# Tidyomics
library(tidySingleCellExperiment)
library(tidySummarizedExperiment)
library(tidySpatialExperiment)
# Niche analysis
library(hoodscanR)
library(scico)
library(ggspavis)This data
package contains annotated datasets localized at the sub-cellular
level from the STOmics, Xenium, and CosMx platforms, as analyzed in the
publication by Bhuva
et al., 2024. It includes raw transcript detections and provides
functions to convert these into SpatialExperiment
objects.
Let’s preview the object. The data is contained in a simple data frame.
| sample_id | cell | gene | genetype | x | y | counts | region | technology | level | Level0 | Level1 | Level2 | Level3 | Level4 | Level5 | Level6 | Level7 | Level8 | Level9 | Level10 | Level11 | transcript_id | overlaps_nucleus | z_location | qv |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Xenium_V1_FF_Mouse_Brain_MultiSection_2_outs | 136818 | Gng12 | Gene | 6340.038 | 6671.3477 | 1 | NA | Xenium | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2.819560e+14 | 1 | 21.06523 | 40.00000 |
| Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs | 6617 | Strip2 | Gene | 6124.914 | 1707.9254 | 1 | fiber tracts | Xenium | Level1 | root | fiber tracts | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2.815695e+14 | 0 | 13.99666 | 40.00000 |
| Xenium_V1_FF_Mouse_Brain_MultiSection_2_outs | NA | Calb2 | Gene | 6237.329 | 884.7155 | 1 | LZ | Xenium | Level6 | root | grey | BS | IB | NA | HY | LZ | NA | NA | NA | NA | NA | 2.814836e+14 | 0 | 16.01127 | 40.00000 |
| Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs | 8758 | Necab1 | Gene | 9086.885 | 3002.8480 | 1 | TEa5 | Xenium | Level11 | root | grey | CH | CTX | CTXpl | Isocortex | TEa | NA | NA | NA | NA | TEa5 | 2.817327e+14 | 0 | 18.94937 | 40.00000 |
| Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs | 145515 | Igf2 | Gene | 8002.925 | 1222.2137 | 1 | CTXsp | Xenium | Level5 | root | grey | CH | CTX | NA | CTXsp | NA | NA | NA | NA | NA | NA | 2.815824e+14 | 1 | 17.08874 | 40.00000 |
| Xenium_V1_FF_Mouse_Brain_MultiSection_3_outs | 116460 | Car4 | Gene | 2882.634 | 3161.9277 | 1 | VPM | Xenium | Level9 | root | grey | BS | IB | NA | TH | DORsm | VENT | VP | VPM | NA | NA | 2.816940e+14 | 0 | 15.33951 | 17.10761 |
We can appreciate how, even subsampling the data 1 in 500, we still have a vast amount of data to visualise.
tx_small |>
ggplot(aes(x, y, colour = region)) +
geom_point(pch = ".") +
facet_wrap(~sample_id, ncol = 2) +
coord_fixed() +
theme_minimal() +
theme(legend.position = "none")This dataset have been annotated for regions. Let’s have a look how many regions have been annotated
## # A tibble: 146 × 1
## region
## <chr>
## 1 <NA>
## 2 fiber tracts
## 3 LZ
## 4 TEa5
## 5 CTXsp
## 6 VPM
## 7 SSp-bfd5
## 8 cc
## 9 ENTl5
## 10 SSp
## # ℹ 136 more rows
From this large dataset, we select a small reagion for illustrative purposes
Load the pre-saved data
The R package MoleculeExperiment includes functions to create and manipulate objects from the newly introduced MoleculeExperiment class, designed for analyzing molecule-based spatial transcriptomics data from platforms such as Xenium by 10X, CosMx SMI by Nanostring, and Merscope by Vizgen, among others.
Although in this session we will not use
MoleculeExperiment class, because of the lack of
segmentation boundary information (we rather have cell identifiers), we
briefly introduce this package because as an important part of
Bioconductor.
We show how we would import our table of probe location into a
MoleculeExperiment. At the end of the Session, for
knowledge, we will navigate the example code given in the vignette
material.
library(MoleculeExperiment)
repoDir = system.file("extdata", package = "MoleculeExperiment")
repoDir = paste0(repoDir, "/xenium_V1_FF_Mouse_Brain")
me = readXenium(repoDir, keepCols = "essential")
me## MoleculeExperiment class
##
## molecules slot (1): detected
## - detected:
## samples (2): sample1 sample2
## -- sample1:
## ---- features (137): 2010300C02Rik Acsbg1 ... Zfp536 Zfpm2
## ---- molecules (962)
## ---- location range: [4900,4919.98] x [6400.02,6420]
## -- sample2:
## ---- features (143): 2010300C02Rik Acsbg1 ... Zfp536 Zfpm2
## ---- molecules (777)
## ---- location range: [4900.01,4919.98] x [6400.16,6419.97]
##
##
## boundaries slot (1): cell
## - cell:
## samples (2): sample1 sample2
## -- sample1:
## ---- segments (5): 67500 67512 67515 67521 67527
## -- sample2:
## ---- segments (9): 65043 65044 ... 65070 65071
In this object, besides the single molecule location, we have cell segmentation boundaries. We can use these boudaries to understand subcellular localisation of molecules and to aggregate molecules in cells.
ggplot_me() +
geom_polygon_me(me, assayName = "cell", fill = "#F8DE7E", color="grey") +
geom_point_me(me) +
# zoom in to selected patch area
coord_cartesian(
xlim = c(4900, 4919.98),
ylim = c(6400.02, 6420)
)In this object we don’t only have the cell segmentation but the nucleous segmentation as well.
boundaries(me, "nucleus") = readBoundaries(
dataDir = repoDir,
pattern = "nucleus_boundaries.csv",
segmentIDCol = "cell_id",
xCol = "vertex_x",
yCol = "vertex_y",
keepCols = "essential",
boundariesAssay = "nucleus",
scaleFactorVector = 1
)
boundaries(me, "cell")## $cell
## $cell$sample1
## $cell$sample1$`67500`
## # A tibble: 13 × 2
## x_location y_location
## <dbl> <dbl>
## 1 4905. 6400.
## 2 4899. 6401.
## 3 4894. 6408.
## 4 4890. 6418.
## 5 4887. 6423.
## 6 4887. 6425.
## 7 4890. 6427.
## 8 4891. 6427.
## 9 4894. 6426.
## 10 4908. 6421.
## 11 4906. 6404.
## 12 4905. 6400.
## 13 4905. 6400.
##
## $cell$sample1$`67512`
## # A tibble: 13 × 2
## x_location y_location
## <dbl> <dbl>
## 1 4906. 6404.
## 2 4906. 6408.
## 3 4907. 6412.
## 4 4907. 6415.
## 5 4908. 6421.
## 6 4910. 6418.
## 7 4914. 6414.
## 8 4914. 6413.
## 9 4914. 6412.
## 10 4914. 6412.
## 11 4911. 6408.
## 12 4906. 6405.
## 13 4906. 6404.
##
## $cell$sample1$`67515`
## # A tibble: 13 × 2
## x_location y_location
## <dbl> <dbl>
## 1 4909. 6396.
## 2 4905. 6399.
## 3 4906. 6403.
## 4 4906. 6404.
## 5 4912. 6408.
## 6 4914. 6413.
## 7 4917. 6410.
## 8 4920. 6408.
## 9 4922. 6404.
## 10 4916. 6397.
## 11 4913. 6396.
## 12 4910. 6396.
## 13 4909. 6396.
##
## $cell$sample1$`67521`
## # A tibble: 13 × 2
## x_location y_location
## <dbl> <dbl>
## 1 4920. 6408.
## 2 4916. 6411.
## 3 4916. 6412.
## 4 4914. 6413.
## 5 4914. 6414.
## 6 4910. 6418.
## 7 4908. 6421.
## 8 4919. 6428.
## 9 4918. 6422.
## 10 4918. 6418.
## 11 4920. 6413.
## 12 4920. 6410.
## 13 4920. 6408.
##
## $cell$sample1$`67527`
## # A tibble: 13 × 2
## x_location y_location
## <dbl> <dbl>
## 1 4922. 6405.
## 2 4920. 6408.
## 3 4920. 6413.
## 4 4918. 6418.
## 5 4919. 6428.
## 6 4922. 6432.
## 7 4927. 6430.
## 8 4927. 6414.
## 9 4929. 6409.
## 10 4929. 6408.
## 11 4928. 6408.
## 12 4923. 6405.
## 13 4922. 6405.
##
##
## $cell$sample2
## $cell$sample2$`65043`
## # A tibble: 13 × 2
## x_location y_location
## <dbl> <dbl>
## 1 4897. 6413.
## 2 4895. 6414.
## 3 4894. 6418.
## 4 4892. 6421.
## 5 4886. 6423.
## 6 4888. 6426.
## 7 4897. 6430.
## 8 4904. 6429.
## 9 4901. 6425.
## 10 4901. 6419.
## 11 4902. 6417.
## 12 4900. 6413.
## 13 4897. 6413.
##
## $cell$sample2$`65044`
## # A tibble: 13 × 2
## x_location y_location
## <dbl> <dbl>
## 1 4902. 6417.
## 2 4902. 6419.
## 3 4901. 6419.
## 4 4901. 6423.
## 5 4902. 6425.
## 6 4905. 6429.
## 7 4910. 6431.
## 8 4912. 6424.
## 9 4912. 6420.
## 10 4907. 6418.
## 11 4904. 6417.
## 12 4902. 6417.
## 13 4902. 6417.
##
## $cell$sample2$`65051`
## # A tibble: 13 × 2
## x_location y_location
## <dbl> <dbl>
## 1 4916. 6413.
## 2 4912. 6420.
## 3 4910. 6431.
## 4 4914. 6439.
## 5 4919. 6444.
## 6 4924. 6443.
## 7 4928. 6442.
## 8 4934. 6437.
## 9 4938. 6429.
## 10 4939. 6424.
## 11 4922. 6417.
## 12 4917. 6413.
## 13 4916. 6413.
##
## $cell$sample2$`65055`
## # A tibble: 13 × 2
## x_location y_location
## <dbl> <dbl>
## 1 4912. 6398.
## 2 4907. 6401.
## 3 4904. 6407.
## 4 4902. 6410.
## 5 4900. 6414.
## 6 4902. 6417.
## 7 4904. 6417.
## 8 4912. 6420.
## 9 4914. 6418.
## 10 4917. 6409
## 11 4916. 6405.
## 12 4912. 6398.
## 13 4912. 6398.
##
## $cell$sample2$`65063`
## # A tibble: 13 × 2
## x_location y_location
## <dbl> <dbl>
## 1 4930. 6408.
## 2 4925. 6410.
## 3 4921. 6410.
## 4 4918. 6409.
## 5 4917. 6412.
## 6 4922. 6417.
## 7 4927. 6419.
## 8 4931. 6421.
## 9 4939. 6423.
## 10 4939. 6423.
## 11 4938. 6422.
## 12 4930. 6408.
## 13 4930. 6408.
##
## $cell$sample2$`65064`
## # A tibble: 13 × 2
## x_location y_location
## <dbl> <dbl>
## 1 4891. 6399.
## 2 4888. 6400.
## 3 4888. 6410.
## 4 4892. 6411.
## 5 4895. 6414.
## 6 4897. 6413.
## 7 4900. 6413.
## 8 4902. 6410.
## 9 4904. 6407.
## 10 4907. 6401.
## 11 4900. 6401.
## 12 4893. 6399.
## 13 4891. 6399.
##
## $cell$sample2$`65067`
## # A tibble: 13 × 2
## x_location y_location
## <dbl> <dbl>
## 1 4925. 6403.
## 2 4924. 6403.
## 3 4922. 6403.
## 4 4921. 6403.
## 5 4916. 6405.
## 6 4918. 6409
## 7 4921. 6410.
## 8 4925. 6410.
## 9 4927. 6409.
## 10 4930. 6408.
## 11 4930. 6408.
## 12 4925. 6403.
## 13 4925. 6403.
##
## $cell$sample2$`65070`
## # A tibble: 13 × 2
## x_location y_location
## <dbl> <dbl>
## 1 4901. 6389.
## 2 4899. 6391.
## 3 4899. 6392.
## 4 4896. 6395.
## 5 4892. 6399.
## 6 4897. 6400.
## 7 4900. 6400.
## 8 4908. 6400.
## 9 4911. 6398.
## 10 4912. 6397.
## 11 4909. 6390.
## 12 4902. 6389.
## 13 4901. 6389.
##
## $cell$sample2$`65071`
## # A tibble: 13 × 2
## x_location y_location
## <dbl> <dbl>
## 1 4924. 6394.
## 2 4922. 6395.
## 3 4917. 6396.
## 4 4912. 6397.
## 5 4912. 6398.
## 6 4916. 6405.
## 7 4922. 6403.
## 8 4923. 6403.
## 9 4925. 6402.
## 10 4925. 6401.
## 11 4925. 6400.
## 12 4925. 6394.
## 13 4924. 6394.
## List of 1
## $ detected:List of 2
## ..$ sample1:List of 137
## .. ..$ 2010300C02Rik : tibble [11 × 2] (S3: tbl_df/tbl/data.frame)
## .. ..$ Acsbg1 : tibble [6 × 2] (S3: tbl_df/tbl/data.frame)
## .. .. [list output truncated]
## ..$ sample2:List of 143
## .. ..$ 2010300C02Rik: tibble [9 × 2] (S3: tbl_df/tbl/data.frame)
## .. ..$ Acsbg1 : tibble [10 × 2] (S3: tbl_df/tbl/data.frame)
## .. .. [list output truncated]
bds_colours = setNames(
c("#aa0000ff", "#ffaaffff"),
c("Region 1", "Region 2")
)
ggplot_me() +
# add cell segments and colour by cell id
geom_polygon_me(me, byFill = "segment_id", colour = "black", alpha = 0.1) +
# add molecule points and colour by feature name
geom_point_me(me, byColour = "feature_id", size = 0.1) +
# add nuclei segments and colour the border with red
geom_polygon_me(me, assayName = "nucleus", fill = NA, colour = "red") +
# zoom in to selected patch area
coord_cartesian(xlim = c(4900, 4919.98), ylim = c(6400.02, 6420))## used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
## Ncells 10414128 556.2 18222902 973.3 NA 18222902 973.3
## Vcells 44239081 337.6 75186197 573.7 65536 49330982 376.4
We can organise our large data frame containing single molecules into
a more efficient MoleculeExperiment.
library(MoleculeExperiment)
tx_small_me =
tx_small |>
select(sample_id, gene, x, y) |>
dataframeToMEList(
dfType = "molecules",
assayName = "detected",
sampleCol = "sample_id",
factorCol = "gene",
xCol = "x",
yCol = "y"
) |>
MoleculeExperiment()
tx_small_me## MoleculeExperiment class
##
## molecules slot (1): detected
## - detected:
## samples (3): Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs
## Xenium_V1_FF_Mouse_Brain_MultiSection_2_outs
## Xenium_V1_FF_Mouse_Brain_MultiSection_3_outs
## -- Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs:
## ---- features (496): 2010300C02Rik Acsbg1 ... Zfp536 Zfpm2
## ---- molecules (125531)
## ---- location range: [20,10197.56] x [31.25,7021.59]
## -- Xenium_V1_FF_Mouse_Brain_MultiSection_2_outs:
## ---- features (483): 2010300C02Rik Acsbg1 ... Zfp536 Zfpm2
## ---- molecules (116929)
## ---- location range: [28.65,10256.49] x [43.5,7012.21]
## -- Xenium_V1_FF_Mouse_Brain_MultiSection_3_outs:
## ---- features (488): 2010300C02Rik Acsbg1 ... Zfp536 Zfpm2
## ---- molecules (119310)
## ---- location range: [10.6,9656.13] x [45.85,7884.78]
##
##
## boundaries slot: NULL
Here, we can appreciate the difference in size between the redundant data frame
## [1] "69.1 Mb"
and the MoleculeExperiment.
## [1] "7 Mb"
## used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
## Ncells 10415689 556.3 18222902 973.3 NA 18222902 973.3
## Vcells 35124227 268.0 75186197 573.7 65536 62460398 476.6
Now let’s try to visualise just a small section. You can appreciate, coloured by cell, single molecules. You cqn also appreciate the difference in density between regions. An aspect to note, is that not all probes are withiin cells. This depends on the segmentation process.
## [1] "#E41A1C" "#377EB8" "#4DAF4A" "#984EA3" "#FF7F00" "#FFFF33" "#A65628"
tx_small_region |>
filter(!is.na(cell)) |>
slice_sample(prop = 0.3) |>
ggplot(aes(x, y, colour = factor(cell))) +
geom_point(shape=".") +
facet_wrap(~sample_id, ncol = 2) +
scale_color_manual(values = sample(colorRampPalette(brewer.pal(8, "Set2"))(1800))) +
coord_fixed() +
theme_minimal() +
theme(legend.position = "none")Let’s have a look to the probes that have not being unassigned to cells.
tx_small_region |>
filter(is.na(cell)) |>
ggplot(aes(x, y, colour = factor(cell))) +
geom_point(shape=".") +
facet_wrap(~sample_id, ncol = 2) +
scale_color_manual(values = sample(colorRampPalette(brewer.pal(8, "Set2"))(1800))) +
coord_fixed() +
theme_minimal() +
theme(legend.position = "none")## used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
## Ncells 10439300 557.6 18222902 973.3 NA 18222902 973.3
## Vcells 20357963 155.4 60148958 459.0 65536 75162802 573.5
Exercise 3.1
We want to understand how much data we are discarting, that does not have a cell identity.
tidyomicsWe will convert our cell by gene count to a
SpatialExperiment. This object stores a cell by gene matrix
with relative XY coordinates.
SubcellularSpatialData has a utility function that
aggregated the single molecules in cells, where these cell ID have been
identified with segmentation.
A trivial edit to work with ggspavis.
Let’s have a look to our SpatialExperiment.
## # A SpatialExperiment-tibble abstraction: 474,734 × 9
## # [90mFeatures = 541 | Cells = 474734 | Assays = counts[0m
## .cell sample_id cell_id transcript_id qv region in_tissue x y
## <chr> <fct> <fct> <dbl> <dbl> <fct> <lgl> <dbl> <dbl>
## 1 Xenium_V1… 1 1 2.82e14 31.4 CP TRUE 1557. 2529.
## 2 Xenium_V1… 1 10 2.82e14 32.2 CP TRUE 1631. 2543.
## 3 Xenium_V1… 1 100 2.82e14 30.7 Isoco… TRUE 834. 3109.
## 4 Xenium_V1… 1 1000 2.82e14 31.4 RSPv5 TRUE 4932. 5720.
## 5 Xenium_V1… 1 10000 2.82e14 31.5 LA TRUE 1667. 2159.
## 6 Xenium_V1… 1 100000 2.82e14 33.6 VISa1 TRUE 3558. 6587.
## 7 Xenium_V1… 1 100001 2.82e14 31.7 VISa1 TRUE 3570. 6583.
## 8 Xenium_V1… 1 100002 2.82e14 33.9 SSp TRUE 3430. 6157.
## 9 Xenium_V1… 1 100003 2.82e14 32.5 SSp TRUE 3431. 6120.
## 10 Xenium_V1… 1 100004 2.82e14 33.5 SSp TRUE 3436. 6140.
## # ℹ 474,724 more rows
Let’s have a look at how many cells have been detected for each region
tx_spe |>
add_count(region) |>
ggplot(aes(fct_reorder(region, n, .desc = TRUE))) +
geom_bar() +
theme_bw() +
theme(axis.text.x = element_text(angle=90, hjust=1, size = 2))We normalise the SpatialExperiment using
scater.
We then visualise what is the relationship between variance and total expression across cells.
tx_spe |>
# Gene variance
scran::modelGeneVar(block = tx_spe$sample_id) |>
# Reformat for plotting
as_tibble(rownames = "feature") |>
# Plot
ggplot(aes(mean, total)) +
geom_point() +
geom_smooth(color="red")+
xlab("Mean log-expression") +
ylab("Variance") +
theme_bw()## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
For further analysis, we subset the dataset to allow quicker calculations.
As we have done previously, we calculate variable informative genes, for further analyses.
genes <- !grepl(pattern = "NegControl.+|BLANK.+", x = rownames(tx_spe_sample_1))
# Get the top 2000 genes.
top.hvgs =
tx_spe_sample_1 |>
scran::modelGeneVar(subset.row = genes) |>
# Model gene variance and select variable genes per sample
getTopHVGs(n=200) The selected subset of genes can then be passed to the subset.row argument (or equivalent) in downstream steps.
We then use the gene expression to cluster sales based on their similarity and represent these clusters in a two dimensional embeddings (UMAP)
Louvain clustering is a popular method used in single-cell RNA sequencing (scRNA-seq) data analysis to identify groups of cells with similar gene expression profiles. This method is based on the Louvain algorithm, which is widely used for detecting community structures in large networks.
The Louvain algorithm is designed to maximize a metric known as modularity, which measures the density of edges inside communities compared to edges outside communities.
It operates in two phases:
cluster_labels =
tx_spe_sample_1 |>
clusterCells(
use.dimred="PCA",
BLUSPARAM=bluster::NNGraphParam(k=20, cluster.fun="louvain")
) |>
as.character()
cluster_labels |>
head()## [1] "1" "1" "2" "3" "4" "5"
Now we add this cluster column to our
SpatialExperiment
As we have done before, we caculate UMAPs for visualisation purposes.
This step takes long time.
Now, let’s visualise the clusters in UMAP space.
tx_spe_sample_1 |>
plotUMAP(colour_by = "clusters") +
scale_color_discrete(
colorRampPalette(brewer.pal(9, "Set1"))(30)
)## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
Exercise 3.2
Let’s try to understand the identity of these clusters performing gene marker detection.
In the previous sections we have seen how to do gene marker selection for sequencing-based spatial data. We just have to adapt it to our current scenario.
Too understand whether the cell clusters explain morphology as opposed to merely cell identity, we can color cells according to annotated region. As we can see we have a lot of regions. We have more regions that cell clusters.
tx_spe_sample_1 |>
plotUMAP(colour_by = "region") +
scale_color_discrete(
brewer.pal(n = 30, name = "Set1")
) +
guides(color="none")## Warning in brewer.pal(n = 30, name = "Set1"): n too large, allowed maximum for palette Set1 is 9
## Returning the palette you asked for with that many colors
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
Let’s try to understand the morphological distribution of cell clusters in space.
Plot ground truth in tissue map.
# For comparison the annotated regions
tx_spe_sample_1 |>
ggspavis::plotSpots(annotate = "region") +
scale_color_manual(values = colorRampPalette( brewer.pal(9,"Set1") )(150) ) +
guides(color = "none")## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
Exercise 3.3
Spatial-aware clustering: Apply the spatial aware clustering method BANKSY. Taking as example the code run for session 2.
hoodscanR Liu et al., 2024
Algorithm:
Nearest cells detection by Approximate Nearest Neighbor (ANN) search algorithm
Calculating euclidean distance matrix between cells and their k-nearest neighbors
Cell-level annotations provided by users are used to construct a cell annotation matrix
Identify cellular neighborhoods uses the SoftMax function, enhanced by a “shape” parameter that governs the “influence radious”. This measures probability of a cell type to be found in a neighbour.
The K-means clustering algorithm finds recurring neighbours
In order to perform neighborhood scanning, we need to firstly identify k (in this example, k = 100) nearest cells for each cells. The searching algorithm is based on Approximate Near Neighbor (ANN) C++ library from the RANN package.
tx_spe_neighbours =
tx_spe_sample_1 |>
readHoodData(anno_col = "clusters") |>
findNearCells(k = 100)The output of findNearCells function includes two matrix, an annotation matrix and a distance matrix.
## nearest_cell_1
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_42416 6
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_21069 18
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_143418 1
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_161043 22
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_46673 1
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_64715 5
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_5247 6
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_142718 15
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_25988 8
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_128774 6
## nearest_cell_2
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_42416 3
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_21069 18
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_143418 2
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_161043 22
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_46673 5
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_64715 5
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_5247 2
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_142718 15
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_25988 23
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_128774 6
## nearest_cell_3
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_42416 6
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_21069 18
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_143418 2
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_161043 8
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_46673 4
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_64715 16
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_5247 2
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_142718 15
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_25988 23
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_128774 3
## nearest_cell_4
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_42416 6
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_21069 18
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_143418 2
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_161043 1
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_46673 4
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_64715 17
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_5247 2
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_142718 15
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_25988 23
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_128774 3
## nearest_cell_5
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_42416 3
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_21069 6
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_143418 2
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_161043 8
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_46673 6
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_64715 6
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_5247 2
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_142718 7
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_25988 1
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_128774 3
## nearest_cell_1
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_42416 20.521773
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_21069 14.848379
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_143418 11.846450
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_161043 12.881435
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_46673 16.468559
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_64715 14.614673
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_5247 7.758216
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_142718 10.179061
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_25988 29.404332
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_128774 32.198567
## nearest_cell_2
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_42416 26.51370
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_21069 34.84461
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_143418 22.36816
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_161043 14.63369
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_46673 31.74507
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_64715 21.56662
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_5247 18.42152
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_142718 24.94423
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_25988 36.23650
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_128774 32.40975
## nearest_cell_3
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_42416 33.08725
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_21069 34.97209
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_143418 26.63320
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_161043 15.07115
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_46673 31.89583
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_64715 46.39735
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_5247 19.11510
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_142718 39.78288
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_25988 37.02099
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_128774 48.70481
## nearest_cell_4
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_42416 33.90145
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_21069 38.92746
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_143418 31.45865
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_161043 18.36453
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_46673 33.21731
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_64715 54.88479
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_5247 26.49622
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_142718 53.91673
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_25988 37.34226
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_128774 54.49694
## nearest_cell_5
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_42416 36.40826
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_21069 42.13184
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_143418 39.71453
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_161043 24.01902
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_46673 54.53777
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_64715 58.14167
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_5247 28.44833
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_142718 55.31210
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_25988 40.81428
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_128774 57.26670
We can then perform neighborhood analysis using the function scanHoods. This function incldue the modified softmax algorithm, aimming to genereate a matrix with the probability of each cell associating with their 100 nearest cells.
## Tau is set to: 4463.8145259849
# We can then merge the probabilities by the cell types of the 100 nearest cells. We get the probability distribution of each cell all each neighborhood.
hoods <- mergeByGroup(pm, tx_spe_neighbours$cells)
hoods[1:2, 1:10]## 1 10 11
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_42416 3.033042e-03 0 0
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_21069 2.712486e-05 0 0
## 12 13 14 15
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_42416 0.00000000 0.000650276 0 0
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_21069 0.02398078 0.003108832 0 0
## 16 17
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_42416 0.00000000 4.243489e-05
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_21069 0.01849357 3.374641e-02
## 18
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_42416 0.0000000
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_21069 0.7780374
We plot randomly plot 50 cells to see the output of neighborhood scanning using plotHoodMat. In this plot, each value represent the probability of the each cell (each row) located in each cell type neighborhood. The rowSums of the probability maxtrix will always be 1.
We can then merge the neighborhood results with the
SpatialExperiment object using mergeHoodSpe so
that we can conduct more neighborhood-related analysis.
We can see what are the neighborhood distributions look like in each
cluster using plotProbDist.
tx_spe_sample_1 |>
plotProbDist(
pm_cols = colnames(hoods),
by_cluster = TRUE,
plot_all = TRUE,
show_clusters = as.character(seq(10))
)Session Information
## R version 4.4.0 (2024-04-24)
## Platform: x86_64-apple-darwin20
## Running under: macOS Sonoma 14.3.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: Australia/Adelaide
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] MoleculeExperiment_1.4.0 scico_1.5.0
## [3] hoodscanR_1.2.0 tidySpatialExperiment_1.1.1
## [5] SpatialExperiment_1.14.0 tidySummarizedExperiment_1.15.0
## [7] ttservice_0.4.0 tidySingleCellExperiment_1.15.2
## [9] ExperimentHub_2.12.0 AnnotationHub_3.12.0
## [11] BiocFileCache_2.12.0 dbplyr_2.5.0
## [13] scran_1.32.0 scater_1.32.0
## [15] scuttle_1.14.0 SingleCellExperiment_1.26.0
## [17] SummarizedExperiment_1.34.0 Biobase_2.64.0
## [19] GenomicRanges_1.56.0 GenomeInfoDb_1.40.0
## [21] IRanges_2.38.0 S4Vectors_0.42.0
## [23] BiocGenerics_0.50.0 MatrixGenerics_1.16.0
## [25] matrixStats_1.3.0 RColorBrewer_1.1-3
## [27] ggspavis_1.9.10 dittoSeq_1.16.0
## [29] colorspace_2.1-0 forcats_1.0.0
## [31] stringr_1.5.1 glue_1.7.0
## [33] purrr_1.0.2 tidyr_1.3.1
## [35] dplyr_1.1.4 plotly_4.10.4
## [37] ggplot2_3.5.1 here_1.0.1
##
## loaded via a namespace (and not attached):
## [1] RcppAnnoy_0.0.22 splines_4.4.0
## [3] later_1.3.2 bitops_1.0-7
## [5] filelock_1.0.3 tibble_3.2.1
## [7] lifecycle_1.0.4 doParallel_1.0.17
## [9] edgeR_4.2.0 rprojroot_2.0.4
## [11] lattice_0.22-6 magrittr_2.0.3
## [13] limma_3.60.0 sass_0.4.9
## [15] rmarkdown_2.27 jquerylib_0.1.4
## [17] yaml_2.3.8 metapod_1.12.0
## [19] httpuv_1.6.15 ggside_0.3.1
## [21] cowplot_1.1.3 DBI_1.2.2
## [23] abind_1.4-5 zlibbioc_1.50.0
## [25] RCurl_1.98-1.14 rappdirs_0.3.3
## [27] circlize_0.4.16 GenomeInfoDbData_1.2.12
## [29] ggrepel_0.9.5 irlba_2.3.5.1
## [31] terra_1.7-71 pheatmap_1.0.12
## [33] dqrng_0.4.0 DelayedMatrixStats_1.26.0
## [35] codetools_0.2-20 DelayedArray_0.30.1
## [37] shape_1.4.6.1 tidyselect_1.2.1
## [39] UCSC.utils_1.0.0 farver_2.1.2
## [41] ScaledMatrix_1.12.0 viridis_0.6.5
## [43] jsonlite_1.8.8 GetoptLong_1.0.5
## [45] BiocNeighbors_1.22.0 ellipsis_0.3.2
## [47] iterators_1.0.14 ggridges_0.5.6
## [49] foreach_1.5.2 tools_4.4.0
## [51] Rcpp_1.0.12 gridExtra_2.3
## [53] SparseArray_1.4.3 xfun_0.44
## [55] mgcv_1.9-1 EBImage_4.46.0
## [57] withr_3.0.0 BiocManager_1.30.23
## [59] fastmap_1.2.0 bluster_1.14.0
## [61] fansi_1.0.6 digest_0.6.35
## [63] rsvd_1.0.5 R6_2.5.1
## [65] mime_0.12 Cairo_1.6-2
## [67] jpeg_0.1-10 RSQLite_2.3.6
## [69] utf8_1.2.4 generics_0.1.3
## [71] data.table_1.15.4 httr_1.4.7
## [73] htmlwidgets_1.6.4 S4Arrays_1.4.0
## [75] uwot_0.2.2 pkgconfig_2.0.3
## [77] gtable_0.3.5 blob_1.2.4
## [79] ComplexHeatmap_2.20.0 XVector_0.44.0
## [81] htmltools_0.5.8.1 fftwtools_0.9-11
## [83] clue_0.3-65 scales_1.3.0
## [85] png_0.1-8 knitr_1.46
## [87] rstudioapi_0.16.0 rjson_0.2.21
## [89] nlme_3.1-164 curl_5.2.1
## [91] GlobalOptions_0.1.2 cachem_1.1.0
## [93] BiocVersion_3.19.1 parallel_4.4.0
## [95] vipor_0.4.7 AnnotationDbi_1.66.0
## [97] pillar_1.9.0 grid_4.4.0
## [99] vctrs_0.6.5 RANN_2.6.1
## [101] promises_1.3.0 BiocSingular_1.20.0
## [103] beachmat_2.20.0 xtable_1.8-4
## [105] cluster_2.1.6 beeswarm_0.4.0
## [107] evaluate_0.23 magick_2.8.3
## [109] cli_3.6.2 locfit_1.5-9.9
## [111] compiler_4.4.0 rlang_1.1.3
## [113] crayon_1.5.2 labeling_0.4.3
## [115] ggbeeswarm_0.7.2 stringi_1.8.4
## [117] viridisLite_0.4.2 BiocParallel_1.38.0
## [119] munsell_0.5.1 Biostrings_2.72.0
## [121] lazyeval_0.2.2 tiff_0.1-12
## [123] Matrix_1.7-0 sparseMatrixStats_1.16.0
## [125] bit64_4.0.5 KEGGREST_1.44.0
## [127] statmod_1.5.0 shiny_1.8.1.1
## [129] highr_0.10 tidygate_0.5.3
## [131] igraph_2.0.3 memoise_2.0.1
## [133] bslib_0.7.0 bit_4.0.5
References
<mangiola.s at wehi.edu.au>↩︎