Session 3 – Spatial analyses of imaging data

In this session we will learn the basics of imaging-derived spatial transcriptomic data. We will learn how to visualise, manipulate and analyse single molecule data.

We will maintain the use of tidyomics that we learned in Session 2. The programming style, in contrast of Session 1 will make use of the |> (pipe) operator.

1. Working with imaging-based data in Bioconductor with MoleculeExperiment

# https://bioconductor.org/packages/devel/data/experiment/vignettes/SubcellularSpatialData/inst/doc/SubcellularSpatialData.html
# BiocManager::install("stemangiola/SubcellularSpatialData")

# Tidyverse library(tidyverse)
library(ggplot2)
library(plotly)
library(dplyr)
library(tidyr)
library(purrr)
library(glue) # sprintf
library(stringr)
library(forcats)

# Plotting
library(colorspace)
library(dittoSeq)
library(ggspavis)
library(RColorBrewer)

# Analysis
library(scuttle)
library(scater)
library(scran)

# Data download
library(ExperimentHub)

# Tidyomics
library(tidySingleCellExperiment)
library(tidySummarizedExperiment)
library(tidySpatialExperiment)

# Niche analysis
library(hoodscanR)
library(scico)
library(ggspavis)

SubcellularSpatialData

This data package contains annotated datasets localized at the sub-cellular level from the STOmics, Xenium, and CosMx platforms, as analyzed in the publication by Bhuva et al., 2024. It includes raw transcript detections and provides functions to convert these into SpatialExperiment objects.

library(SubcellularSpatialData)
eh = ExperimentHub(cache = "/vast/scratch/users/mangiola.s")
query(eh, "SubcellularSpatialData")

# Brain Mouse data
tx = eh[["EH8230"]]
tx |> filter(sample_id=="Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs") |> nrow()
# 62,744,602

An overview of the data

tx_small =  tx[sample(seq_len(nrow(tx)), size = nrow(tx)/500),]

Let’s preview the object. The data is contained in a simple data frame.

tx_small |> 
  head() |> 
  knitr::kable(
  format = "html"
)

sample_id	cell	gene	genetype	x	y	counts	region	technology	level	Level0	Level1	Level2	Level3	Level4	Level5	Level6	Level7	Level8	Level9	Level10	Level11	transcript_id	overlaps_nucleus	z_location	qv
Xenium_V1_FF_Mouse_Brain_MultiSection_2_outs	136818	Gng12	Gene	6340.038	6671.3477	1	NA	Xenium	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	2.819560e+14	1	21.06523	40.00000
Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs	6617	Strip2	Gene	6124.914	1707.9254	1	fiber tracts	Xenium	Level1	root	fiber tracts	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	2.815695e+14	0	13.99666	40.00000
Xenium_V1_FF_Mouse_Brain_MultiSection_2_outs	NA	Calb2	Gene	6237.329	884.7155	1	LZ	Xenium	Level6	root	grey	BS	IB	NA	HY	LZ	NA	NA	NA	NA	NA	2.814836e+14	0	16.01127	40.00000
Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs	8758	Necab1	Gene	9086.885	3002.8480	1	TEa5	Xenium	Level11	root	grey	CH	CTX	CTXpl	Isocortex	TEa	NA	NA	NA	NA	TEa5	2.817327e+14	0	18.94937	40.00000
Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs	145515	Igf2	Gene	8002.925	1222.2137	1	CTXsp	Xenium	Level5	root	grey	CH	CTX	NA	CTXsp	NA	NA	NA	NA	NA	NA	2.815824e+14	1	17.08874	40.00000
Xenium_V1_FF_Mouse_Brain_MultiSection_3_outs	116460	Car4	Gene	2882.634	3161.9277	1	VPM	Xenium	Level9	root	grey	BS	IB	NA	TH	DORsm	VENT	VP	VPM	NA	NA	2.816940e+14	0	15.33951	17.10761

We can appreciate how, even subsampling the data 1 in 500, we still have a vast amount of data to visualise.

tx_small |>
    ggplot(aes(x, y, colour = region)) +
    geom_point(pch = ".") +
    facet_wrap(~sample_id, ncol = 2) +
    coord_fixed() +
    theme_minimal() +
    theme(legend.position = "none")

This dataset have been annotated for regions. Let’s have a look how many regions have been annotated

tx_small |> 
  distinct(region)

## # A tibble: 146 × 1
##    region      
##    <chr>       
##  1 <NA>        
##  2 fiber tracts
##  3 LZ          
##  4 TEa5        
##  5 CTXsp       
##  6 VPM         
##  7 SSp-bfd5    
##  8 cc          
##  9 ENTl5       
## 10 SSp         
## # ℹ 136 more rows

From this large dataset, we select a small reagion for illustrative purposes

tx_small_region =
  tx |>
    filter(x |> between(3700, 4200), y |> between(5000, 5500))

Load the pre-saved data

2. MoleculeExperiment

The R package MoleculeExperiment includes functions to create and manipulate objects from the newly introduced MoleculeExperiment class, designed for analyzing molecule-based spatial transcriptomics data from platforms such as Xenium by 10X, CosMx SMI by Nanostring, and Merscope by Vizgen, among others.

Although in this session we will not use MoleculeExperiment class, because of the lack of segmentation boundary information (we rather have cell identifiers), we briefly introduce this package because as an important part of Bioconductor.

We show how we would import our table of probe location into a MoleculeExperiment. At the end of the Session, for knowledge, we will navigate the example code given in the vignette material.

library(MoleculeExperiment)

repoDir = system.file("extdata", package = "MoleculeExperiment")
repoDir = paste0(repoDir, "/xenium_V1_FF_Mouse_Brain")

me = readXenium(repoDir, keepCols = "essential")
me

## MoleculeExperiment class
## 
## molecules slot (1): detected
## - detected:
## samples (2): sample1 sample2
## -- sample1:
## ---- features (137): 2010300C02Rik Acsbg1 ... Zfp536 Zfpm2
## ---- molecules (962)
## ---- location range: [4900,4919.98] x [6400.02,6420]
## -- sample2:
## ---- features (143): 2010300C02Rik Acsbg1 ... Zfp536 Zfpm2
## ---- molecules (777)
## ---- location range: [4900.01,4919.98] x [6400.16,6419.97]
## 
## 
## boundaries slot (1): cell
## - cell:
## samples (2): sample1 sample2
## -- sample1:
## ---- segments (5): 67500 67512 67515 67521 67527
## -- sample2:
## ---- segments (9): 65043 65044 ... 65070 65071

In this object, besides the single molecule location, we have cell segmentation boundaries. We can use these boudaries to understand subcellular localisation of molecules and to aggregate molecules in cells.

ggplot_me() +
  geom_polygon_me(me, assayName = "cell", fill = "#F8DE7E", color="grey") +
  geom_point_me(me) +
  # zoom in to selected patch area
  coord_cartesian(
    xlim = c(4900, 4919.98),
    ylim = c(6400.02, 6420)
  )

In this object we don’t only have the cell segmentation but the nucleous segmentation as well.

boundaries(me, "nucleus") = readBoundaries(
  dataDir = repoDir,
  pattern = "nucleus_boundaries.csv",
  segmentIDCol = "cell_id",
  xCol = "vertex_x",
  yCol = "vertex_y",
  keepCols = "essential",
  boundariesAssay = "nucleus",
  scaleFactorVector = 1
)

boundaries(me, "cell")

## $cell
## $cell$sample1
## $cell$sample1$`67500`
## # A tibble: 13 × 2
##    x_location y_location
##         <dbl>      <dbl>
##  1      4905.      6400.
##  2      4899.      6401.
##  3      4894.      6408.
##  4      4890.      6418.
##  5      4887.      6423.
##  6      4887.      6425.
##  7      4890.      6427.
##  8      4891.      6427.
##  9      4894.      6426.
## 10      4908.      6421.
## 11      4906.      6404.
## 12      4905.      6400.
## 13      4905.      6400.
## 
## $cell$sample1$`67512`
## # A tibble: 13 × 2
##    x_location y_location
##         <dbl>      <dbl>
##  1      4906.      6404.
##  2      4906.      6408.
##  3      4907.      6412.
##  4      4907.      6415.
##  5      4908.      6421.
##  6      4910.      6418.
##  7      4914.      6414.
##  8      4914.      6413.
##  9      4914.      6412.
## 10      4914.      6412.
## 11      4911.      6408.
## 12      4906.      6405.
## 13      4906.      6404.
## 
## $cell$sample1$`67515`
## # A tibble: 13 × 2
##    x_location y_location
##         <dbl>      <dbl>
##  1      4909.      6396.
##  2      4905.      6399.
##  3      4906.      6403.
##  4      4906.      6404.
##  5      4912.      6408.
##  6      4914.      6413.
##  7      4917.      6410.
##  8      4920.      6408.
##  9      4922.      6404.
## 10      4916.      6397.
## 11      4913.      6396.
## 12      4910.      6396.
## 13      4909.      6396.
## 
## $cell$sample1$`67521`
## # A tibble: 13 × 2
##    x_location y_location
##         <dbl>      <dbl>
##  1      4920.      6408.
##  2      4916.      6411.
##  3      4916.      6412.
##  4      4914.      6413.
##  5      4914.      6414.
##  6      4910.      6418.
##  7      4908.      6421.
##  8      4919.      6428.
##  9      4918.      6422.
## 10      4918.      6418.
## 11      4920.      6413.
## 12      4920.      6410.
## 13      4920.      6408.
## 
## $cell$sample1$`67527`
## # A tibble: 13 × 2
##    x_location y_location
##         <dbl>      <dbl>
##  1      4922.      6405.
##  2      4920.      6408.
##  3      4920.      6413.
##  4      4918.      6418.
##  5      4919.      6428.
##  6      4922.      6432.
##  7      4927.      6430.
##  8      4927.      6414.
##  9      4929.      6409.
## 10      4929.      6408.
## 11      4928.      6408.
## 12      4923.      6405.
## 13      4922.      6405.
## 
## 
## $cell$sample2
## $cell$sample2$`65043`
## # A tibble: 13 × 2
##    x_location y_location
##         <dbl>      <dbl>
##  1      4897.      6413.
##  2      4895.      6414.
##  3      4894.      6418.
##  4      4892.      6421.
##  5      4886.      6423.
##  6      4888.      6426.
##  7      4897.      6430.
##  8      4904.      6429.
##  9      4901.      6425.
## 10      4901.      6419.
## 11      4902.      6417.
## 12      4900.      6413.
## 13      4897.      6413.
## 
## $cell$sample2$`65044`
## # A tibble: 13 × 2
##    x_location y_location
##         <dbl>      <dbl>
##  1      4902.      6417.
##  2      4902.      6419.
##  3      4901.      6419.
##  4      4901.      6423.
##  5      4902.      6425.
##  6      4905.      6429.
##  7      4910.      6431.
##  8      4912.      6424.
##  9      4912.      6420.
## 10      4907.      6418.
## 11      4904.      6417.
## 12      4902.      6417.
## 13      4902.      6417.
## 
## $cell$sample2$`65051`
## # A tibble: 13 × 2
##    x_location y_location
##         <dbl>      <dbl>
##  1      4916.      6413.
##  2      4912.      6420.
##  3      4910.      6431.
##  4      4914.      6439.
##  5      4919.      6444.
##  6      4924.      6443.
##  7      4928.      6442.
##  8      4934.      6437.
##  9      4938.      6429.
## 10      4939.      6424.
## 11      4922.      6417.
## 12      4917.      6413.
## 13      4916.      6413.
## 
## $cell$sample2$`65055`
## # A tibble: 13 × 2
##    x_location y_location
##         <dbl>      <dbl>
##  1      4912.      6398.
##  2      4907.      6401.
##  3      4904.      6407.
##  4      4902.      6410.
##  5      4900.      6414.
##  6      4902.      6417.
##  7      4904.      6417.
##  8      4912.      6420.
##  9      4914.      6418.
## 10      4917.      6409 
## 11      4916.      6405.
## 12      4912.      6398.
## 13      4912.      6398.
## 
## $cell$sample2$`65063`
## # A tibble: 13 × 2
##    x_location y_location
##         <dbl>      <dbl>
##  1      4930.      6408.
##  2      4925.      6410.
##  3      4921.      6410.
##  4      4918.      6409.
##  5      4917.      6412.
##  6      4922.      6417.
##  7      4927.      6419.
##  8      4931.      6421.
##  9      4939.      6423.
## 10      4939.      6423.
## 11      4938.      6422.
## 12      4930.      6408.
## 13      4930.      6408.
## 
## $cell$sample2$`65064`
## # A tibble: 13 × 2
##    x_location y_location
##         <dbl>      <dbl>
##  1      4891.      6399.
##  2      4888.      6400.
##  3      4888.      6410.
##  4      4892.      6411.
##  5      4895.      6414.
##  6      4897.      6413.
##  7      4900.      6413.
##  8      4902.      6410.
##  9      4904.      6407.
## 10      4907.      6401.
## 11      4900.      6401.
## 12      4893.      6399.
## 13      4891.      6399.
## 
## $cell$sample2$`65067`
## # A tibble: 13 × 2
##    x_location y_location
##         <dbl>      <dbl>
##  1      4925.      6403.
##  2      4924.      6403.
##  3      4922.      6403.
##  4      4921.      6403.
##  5      4916.      6405.
##  6      4918.      6409 
##  7      4921.      6410.
##  8      4925.      6410.
##  9      4927.      6409.
## 10      4930.      6408.
## 11      4930.      6408.
## 12      4925.      6403.
## 13      4925.      6403.
## 
## $cell$sample2$`65070`
## # A tibble: 13 × 2
##    x_location y_location
##         <dbl>      <dbl>
##  1      4901.      6389.
##  2      4899.      6391.
##  3      4899.      6392.
##  4      4896.      6395.
##  5      4892.      6399.
##  6      4897.      6400.
##  7      4900.      6400.
##  8      4908.      6400.
##  9      4911.      6398.
## 10      4912.      6397.
## 11      4909.      6390.
## 12      4902.      6389.
## 13      4901.      6389.
## 
## $cell$sample2$`65071`
## # A tibble: 13 × 2
##    x_location y_location
##         <dbl>      <dbl>
##  1      4924.      6394.
##  2      4922.      6395.
##  3      4917.      6396.
##  4      4912.      6397.
##  5      4912.      6398.
##  6      4916.      6405.
##  7      4922.      6403.
##  8      4923.      6403.
##  9      4925.      6402.
## 10      4925.      6401.
## 11      4925.      6400.
## 12      4925.      6394.
## 13      4924.      6394.

showMolecules(me)

## List of 1
##  $ detected:List of 2
##   ..$ sample1:List of 137
##   .. ..$ 2010300C02Rik        : tibble [11 × 2] (S3: tbl_df/tbl/data.frame)
##   .. ..$ Acsbg1               : tibble [6 × 2] (S3: tbl_df/tbl/data.frame)
##   .. .. [list output truncated]
##   ..$ sample2:List of 143
##   .. ..$ 2010300C02Rik: tibble [9 × 2] (S3: tbl_df/tbl/data.frame)
##   .. ..$ Acsbg1       : tibble [10 × 2] (S3: tbl_df/tbl/data.frame)
##   .. .. [list output truncated]

bds_colours = setNames(
  c("#aa0000ff", "#ffaaffff"),
  c("Region 1", "Region 2")
)

ggplot_me() +
  # add cell segments and colour by cell id
  geom_polygon_me(me, byFill = "segment_id", colour = "black", alpha = 0.1) +
  # add molecule points and colour by feature name
  geom_point_me(me, byColour = "feature_id", size = 0.1) +
  # add nuclei segments and colour the border with red
  geom_polygon_me(me, assayName = "nucleus", fill = NA, colour = "red") +
  # zoom in to selected patch area
  coord_cartesian(xlim = c(4900, 4919.98), ylim = c(6400.02, 6420))

rm(me)
gc()

##            used  (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
## Ncells 10414128 556.2   18222902 973.3         NA 18222902 973.3
## Vcells 44239081 337.6   75186197 573.7      65536 49330982 376.4

We can organise our large data frame containing single molecules into a more efficient MoleculeExperiment.

library(MoleculeExperiment)

tx_small_me = 
  tx_small |> 
    select(sample_id, gene, x, y) |> 
    dataframeToMEList(
        dfType = "molecules",
        assayName = "detected",
        sampleCol = "sample_id",
        factorCol = "gene",
        xCol = "x",
        yCol = "y"
    ) |> 
    MoleculeExperiment()

tx_small_me

## MoleculeExperiment class
## 
## molecules slot (1): detected
## - detected:
## samples (3): Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs
##   Xenium_V1_FF_Mouse_Brain_MultiSection_2_outs
##   Xenium_V1_FF_Mouse_Brain_MultiSection_3_outs
## -- Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs:
## ---- features (496): 2010300C02Rik Acsbg1 ... Zfp536 Zfpm2
## ---- molecules (125531)
## ---- location range: [20,10197.56] x [31.25,7021.59]
## -- Xenium_V1_FF_Mouse_Brain_MultiSection_2_outs:
## ---- features (483): 2010300C02Rik Acsbg1 ... Zfp536 Zfpm2
## ---- molecules (116929)
## ---- location range: [28.65,10256.49] x [43.5,7012.21]
## -- Xenium_V1_FF_Mouse_Brain_MultiSection_3_outs:
## ---- features (488): 2010300C02Rik Acsbg1 ... Zfp536 Zfpm2
## ---- molecules (119310)
## ---- location range: [10.6,9656.13] x [45.85,7884.78]
## 
## 
## boundaries slot: NULL

Here, we can appreciate the difference in size between the redundant data frame

tx_small |> 
  object.size() |> 
  format(units = "auto")

## [1] "69.1 Mb"

and the MoleculeExperiment.

tx_small_me |> 
  object.size() |> 
  format(units = "auto")

## [1] "7 Mb"

rm(tx_small)
rm(tx_small_me)
gc()

##            used  (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
## Ncells 10415689 556.3   18222902 973.3         NA 18222902 973.3
## Vcells 35124227 268.0   75186197 573.7      65536 62460398 476.6

A preview of a zoomed in section of the tissue

Now let’s try to visualise just a small section. You can appreciate, coloured by cell, single molecules. You cqn also appreciate the difference in density between regions. An aspect to note, is that not all probes are withiin cells. This depends on the segmentation process.

brewer.pal(7, "Set1")

## [1] "#E41A1C" "#377EB8" "#4DAF4A" "#984EA3" "#FF7F00" "#FFFF33" "#A65628"

tx_small_region |>
  filter(!is.na(cell)) |> 
  slice_sample(prop = 0.3) |> 
  ggplot(aes(x, y, colour = factor(cell))) +
  geom_point(shape=".") +
  
  facet_wrap(~sample_id, ncol = 2) +
  scale_color_manual(values = sample(colorRampPalette(brewer.pal(8, "Set2"))(1800))) +
  coord_fixed() +
  theme_minimal() +
  theme(legend.position = "none")

Let’s have a look to the probes that have not being unassigned to cells.

tx_small_region |>
  filter(is.na(cell)) |> 
  ggplot(aes(x, y, colour = factor(cell))) +
  geom_point(shape=".") +
  
  facet_wrap(~sample_id, ncol = 2) +
  scale_color_manual(values = sample(colorRampPalette(brewer.pal(8, "Set2"))(1800))) +
  coord_fixed() +
  theme_minimal() +
  theme(legend.position = "none")

rm(tx_small_region)
gc()

##            used  (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
## Ncells 10439300 557.6   18222902 973.3         NA 18222902 973.3
## Vcells 20357963 155.4   60148958 459.0      65536 75162802 573.5

Exercise 3.1

We want to understand how much data we are discarting, that does not have a cell identity.

Using base r/Bioconductor grammar calculate what is the ratio of outside-cell vs within-cell, probes
Reproduce the same calculation with tidyomics

3. Aggregation and analysis

We will convert our cell by gene count to a SpatialExperiment. This object stores a cell by gene matrix with relative XY coordinates.

SubcellularSpatialData has a utility function that aggregated the single molecules in cells, where these cell ID have been identified with segmentation.

tx_spe = SubcellularSpatialData::tx2spe(tx)

 tx_spe = tx_spe |> mutate(in_tissue = TRUE)

A trivial edit to work with ggspavis.

tx_spe = tx_spe |> mutate(in_tissue = TRUE)

Let’s have a look to our SpatialExperiment.

tx_spe

## # A SpatialExperiment-tibble abstraction: 474,734 × 9
## # [90mFeatures = 541 | Cells = 474734 | Assays = counts[0m
##    .cell      sample_id cell_id transcript_id    qv region in_tissue     x     y
##    <chr>      <fct>     <fct>           <dbl> <dbl> <fct>  <lgl>     <dbl> <dbl>
##  1 Xenium_V1… 1         1             2.82e14  31.4 CP     TRUE      1557. 2529.
##  2 Xenium_V1… 1         10            2.82e14  32.2 CP     TRUE      1631. 2543.
##  3 Xenium_V1… 1         100           2.82e14  30.7 Isoco… TRUE       834. 3109.
##  4 Xenium_V1… 1         1000          2.82e14  31.4 RSPv5  TRUE      4932. 5720.
##  5 Xenium_V1… 1         10000         2.82e14  31.5 LA     TRUE      1667. 2159.
##  6 Xenium_V1… 1         100000        2.82e14  33.6 VISa1  TRUE      3558. 6587.
##  7 Xenium_V1… 1         100001        2.82e14  31.7 VISa1  TRUE      3570. 6583.
##  8 Xenium_V1… 1         100002        2.82e14  33.9 SSp    TRUE      3430. 6157.
##  9 Xenium_V1… 1         100003        2.82e14  32.5 SSp    TRUE      3431. 6120.
## 10 Xenium_V1… 1         100004        2.82e14  33.5 SSp    TRUE      3436. 6140.
## # ℹ 474,724 more rows

Let’s have a look at how many cells have been detected for each region

tx_spe |> 
  add_count(region) |> 
  ggplot(aes(fct_reorder(region, n, .desc = TRUE))) +
  geom_bar() +
  theme_bw() +
  theme(axis.text.x  = element_text(angle=90, hjust=1, size = 2))

We normalise the SpatialExperiment using scater.

tx_spe = 
  tx_spe |> 
  
  # Scaling and tranformation
  scater::logNormCounts()

We then visualise what is the relationship between variance and total expression across cells.

tx_spe |> 
  
  # Gene variance
  scran::modelGeneVar(block = tx_spe$sample_id) |> 

  # Reformat for plotting
  as_tibble(rownames  = "feature") |> 
  
  # Plot
  ggplot(aes(mean, total)) +
  geom_point() +
  geom_smooth(color="red")+
  xlab("Mean log-expression") + 
  ylab("Variance") +
  theme_bw()

## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

For further analysis, we subset the dataset to allow quicker calculations.

tx_spe_sample_1 = 
  tx_spe |>
  filter(sample_id=="1") |> 
  slice_sample(prop = 0.2)

As we have done previously, we calculate variable informative genes, for further analyses.

genes <- !grepl(pattern = "NegControl.+|BLANK.+", x = rownames(tx_spe_sample_1))

# Get the top 2000 genes.
top.hvgs = 
  tx_spe_sample_1 |>

  scran::modelGeneVar(subset.row = genes) |> 
  
  # Model gene variance and select variable genes per sample
  getTopHVGs(n=200)

The selected subset of genes can then be passed to the subset.row argument (or equivalent) in downstream steps.

tx_spe_sample_1 =  
  tx_spe_sample_1 |> 
  fixedPCA( subset.row=top.hvgs )

We then use the gene expression to cluster sales based on their similarity and represent these clusters in a two dimensional embeddings (UMAP)

Louvain clustering is a popular method used in single-cell RNA sequencing (scRNA-seq) data analysis to identify groups of cells with similar gene expression profiles. This method is based on the Louvain algorithm, which is widely used for detecting community structures in large networks.

The Louvain algorithm is designed to maximize a metric known as modularity, which measures the density of edges inside communities compared to edges outside communities.

It operates in two phases:

first, it looks for small communities by optimizing modularity locally, and
second it aggregates nodes belonging to the same community and repeats the process.

Blondel et al., 2008

cluster_labels = 
  tx_spe_sample_1 |> 
   clusterCells(
     use.dimred="PCA", 
     BLUSPARAM=bluster::NNGraphParam(k=20, cluster.fun="louvain")
    ) |> 
   as.character()

cluster_labels |> 
  head()

## [1] "1" "1" "2" "3" "4" "5"

Now we add this cluster column to our SpatialExperiment

tx_spe_sample_1 = 
  tx_spe_sample_1 |> 
  mutate(clusters = cluster_labels)

As we have done before, we caculate UMAPs for visualisation purposes.

This step takes long time.

## Check how many
tx_spe_sample_1 = 
  tx_spe_sample_1 |>
  runUMAP()

Now, let’s visualise the clusters in UMAP space.

tx_spe_sample_1 |> 
  plotUMAP(colour_by = "clusters") +
  scale_color_discrete(
    colorRampPalette(brewer.pal(9, "Set1"))(30)
  )

## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.

Exercise 3.2

Let’s try to understand the identity of these clusters performing gene marker detection.

In the previous sections we have seen how to do gene marker selection for sequencing-based spatial data. We just have to adapt it to our current scenario.

Score the markers
Filter top markers
Focus on Cluster one and try to guess the cell type
Plot the umap colouring by the top marker of cluster 1 (plotReducedDim)

Too understand whether the cell clusters explain morphology as opposed to merely cell identity, we can color cells according to annotated region. As we can see we have a lot of regions. We have more regions that cell clusters.

tx_spe_sample_1 |> 
  plotUMAP(colour_by = "region") +
  scale_color_discrete(
    brewer.pal(n = 30, name = "Set1")
  ) +
  guides(color="none")

## Warning in brewer.pal(n = 30, name = "Set1"): n too large, allowed maximum for palette Set1 is 9
## Returning the palette you asked for with that many colors

## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.

Let’s try to understand the morphological distribution of cell clusters in space.

Plot ground truth in tissue map.

tx_spe_sample_1 |> 
    ggspavis::plotSpots(annotate = "clusters") + 
    guides(color = "none")

# For comparison the annotated regions
tx_spe_sample_1 |> 
  ggspavis::plotSpots(annotate = "region") + 
      scale_color_manual(values = colorRampPalette( brewer.pal(9,"Set1") )(150) ) +
  guides(color = "none")

## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.

Exercise 3.3

Spatial-aware clustering: Apply the spatial aware clustering method BANKSY. Taking as example the code run for session 2.

4. Neighborhood analyses

hoodscanR Liu et al., 2024

Source

Algorithm:

Nearest cells detection by Approximate Nearest Neighbor (ANN) search algorithm
Calculating euclidean distance matrix between cells and their k-nearest neighbors
Cell-level annotations provided by users are used to construct a cell annotation matrix
Identify cellular neighborhoods uses the SoftMax function, enhanced by a “shape” parameter that governs the “influence radious”. This measures probability of a cell type to be found in a neighbour.
The K-means clustering algorithm finds recurring neighbours

In order to perform neighborhood scanning, we need to firstly identify k (in this example, k = 100) nearest cells for each cells. The searching algorithm is based on Approximate Near Neighbor (ANN) C++ library from the RANN package.

tx_spe_neighbours = 
  tx_spe_sample_1 |> 
  readHoodData(anno_col = "clusters") |> 
  findNearCells(k = 100)

The output of findNearCells function includes two matrix, an annotation matrix and a distance matrix.

tx_spe_neighbours$cells[1:10, 1:5]

##                                                     nearest_cell_1
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_42416               6
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_21069              18
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_143418              1
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_161043             22
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_46673               1
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_64715               5
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_5247                6
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_142718             15
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_25988               8
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_128774              6
##                                                     nearest_cell_2
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_42416               3
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_21069              18
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_143418              2
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_161043             22
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_46673               5
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_64715               5
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_5247                2
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_142718             15
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_25988              23
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_128774              6
##                                                     nearest_cell_3
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_42416               6
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_21069              18
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_143418              2
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_161043              8
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_46673               4
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_64715              16
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_5247                2
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_142718             15
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_25988              23
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_128774              3
##                                                     nearest_cell_4
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_42416               6
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_21069              18
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_143418              2
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_161043              1
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_46673               4
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_64715              17
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_5247                2
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_142718             15
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_25988              23
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_128774              3
##                                                     nearest_cell_5
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_42416               3
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_21069               6
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_143418              2
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_161043              8
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_46673               6
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_64715               6
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_5247                2
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_142718              7
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_25988               1
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_128774              3

tx_spe_neighbours$distance[1:10, 1:5]

##                                                     nearest_cell_1
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_42416       20.521773
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_21069       14.848379
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_143418      11.846450
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_161043      12.881435
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_46673       16.468559
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_64715       14.614673
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_5247         7.758216
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_142718      10.179061
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_25988       29.404332
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_128774      32.198567
##                                                     nearest_cell_2
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_42416        26.51370
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_21069        34.84461
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_143418       22.36816
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_161043       14.63369
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_46673        31.74507
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_64715        21.56662
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_5247         18.42152
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_142718       24.94423
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_25988        36.23650
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_128774       32.40975
##                                                     nearest_cell_3
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_42416        33.08725
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_21069        34.97209
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_143418       26.63320
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_161043       15.07115
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_46673        31.89583
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_64715        46.39735
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_5247         19.11510
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_142718       39.78288
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_25988        37.02099
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_128774       48.70481
##                                                     nearest_cell_4
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_42416        33.90145
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_21069        38.92746
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_143418       31.45865
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_161043       18.36453
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_46673        33.21731
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_64715        54.88479
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_5247         26.49622
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_142718       53.91673
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_25988        37.34226
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_128774       54.49694
##                                                     nearest_cell_5
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_42416        36.40826
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_21069        42.13184
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_143418       39.71453
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_161043       24.01902
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_46673        54.53777
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_64715        58.14167
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_5247         28.44833
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_142718       55.31210
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_25988        40.81428
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_128774       57.26670

We can then perform neighborhood analysis using the function scanHoods. This function incldue the modified softmax algorithm, aimming to genereate a matrix with the probability of each cell associating with their 100 nearest cells.

  # Calculate neighbours
pm <- scanHoods(tx_spe_neighbours$distance)

## Tau is set to: 4463.8145259849

 # We can then merge the probabilities by the cell types of the 100 nearest cells. We get the probability distribution of each cell all each neighborhood. 
hoods <- mergeByGroup(pm, tx_spe_neighbours$cells)

hoods[1:2, 1:10]

##                                                               1 10 11
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_42416 3.033042e-03  0  0
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_21069 2.712486e-05  0  0
##                                                            12          13 14 15
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_42416 0.00000000 0.000650276  0  0
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_21069 0.02398078 0.003108832  0  0
##                                                            16           17
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_42416 0.00000000 4.243489e-05
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_21069 0.01849357 3.374641e-02
##                                                           18
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_42416 0.0000000
## Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs_21069 0.7780374

We plot randomly plot 50 cells to see the output of neighborhood scanning using plotHoodMat. In this plot, each value represent the probability of the each cell (each row) located in each cell type neighborhood. The rowSums of the probability maxtrix will always be 1.

hoods |> 
plotHoodMat(n = 50)

We can then merge the neighborhood results with the SpatialExperiment object using mergeHoodSpe so that we can conduct more neighborhood-related analysis.

tx_spe_sample_1 =  tx_spe_sample_1 |> mergeHoodSpe(hoods)

We can see what are the neighborhood distributions look like in each cluster using plotProbDist.

tx_spe_sample_1 |> 
  plotProbDist(
    pm_cols = colnames(hoods),
    by_cluster = TRUE, 
    plot_all = TRUE, 
    show_clusters = as.character(seq(10))
    )

Session Information

sessionInfo()

## R version 4.4.0 (2024-04-24)
## Platform: x86_64-apple-darwin20
## Running under: macOS Sonoma 14.3.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: Australia/Adelaide
## tzcode source: internal
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] MoleculeExperiment_1.4.0        scico_1.5.0                    
##  [3] hoodscanR_1.2.0                 tidySpatialExperiment_1.1.1    
##  [5] SpatialExperiment_1.14.0        tidySummarizedExperiment_1.15.0
##  [7] ttservice_0.4.0                 tidySingleCellExperiment_1.15.2
##  [9] ExperimentHub_2.12.0            AnnotationHub_3.12.0           
## [11] BiocFileCache_2.12.0            dbplyr_2.5.0                   
## [13] scran_1.32.0                    scater_1.32.0                  
## [15] scuttle_1.14.0                  SingleCellExperiment_1.26.0    
## [17] SummarizedExperiment_1.34.0     Biobase_2.64.0                 
## [19] GenomicRanges_1.56.0            GenomeInfoDb_1.40.0            
## [21] IRanges_2.38.0                  S4Vectors_0.42.0               
## [23] BiocGenerics_0.50.0             MatrixGenerics_1.16.0          
## [25] matrixStats_1.3.0               RColorBrewer_1.1-3             
## [27] ggspavis_1.9.10                 dittoSeq_1.16.0                
## [29] colorspace_2.1-0                forcats_1.0.0                  
## [31] stringr_1.5.1                   glue_1.7.0                     
## [33] purrr_1.0.2                     tidyr_1.3.1                    
## [35] dplyr_1.1.4                     plotly_4.10.4                  
## [37] ggplot2_3.5.1                   here_1.0.1                     
## 
## loaded via a namespace (and not attached):
##   [1] RcppAnnoy_0.0.22          splines_4.4.0            
##   [3] later_1.3.2               bitops_1.0-7             
##   [5] filelock_1.0.3            tibble_3.2.1             
##   [7] lifecycle_1.0.4           doParallel_1.0.17        
##   [9] edgeR_4.2.0               rprojroot_2.0.4          
##  [11] lattice_0.22-6            magrittr_2.0.3           
##  [13] limma_3.60.0              sass_0.4.9               
##  [15] rmarkdown_2.27            jquerylib_0.1.4          
##  [17] yaml_2.3.8                metapod_1.12.0           
##  [19] httpuv_1.6.15             ggside_0.3.1             
##  [21] cowplot_1.1.3             DBI_1.2.2                
##  [23] abind_1.4-5               zlibbioc_1.50.0          
##  [25] RCurl_1.98-1.14           rappdirs_0.3.3           
##  [27] circlize_0.4.16           GenomeInfoDbData_1.2.12  
##  [29] ggrepel_0.9.5             irlba_2.3.5.1            
##  [31] terra_1.7-71              pheatmap_1.0.12          
##  [33] dqrng_0.4.0               DelayedMatrixStats_1.26.0
##  [35] codetools_0.2-20          DelayedArray_0.30.1      
##  [37] shape_1.4.6.1             tidyselect_1.2.1         
##  [39] UCSC.utils_1.0.0          farver_2.1.2             
##  [41] ScaledMatrix_1.12.0       viridis_0.6.5            
##  [43] jsonlite_1.8.8            GetoptLong_1.0.5         
##  [45] BiocNeighbors_1.22.0      ellipsis_0.3.2           
##  [47] iterators_1.0.14          ggridges_0.5.6           
##  [49] foreach_1.5.2             tools_4.4.0              
##  [51] Rcpp_1.0.12               gridExtra_2.3            
##  [53] SparseArray_1.4.3         xfun_0.44                
##  [55] mgcv_1.9-1                EBImage_4.46.0           
##  [57] withr_3.0.0               BiocManager_1.30.23      
##  [59] fastmap_1.2.0             bluster_1.14.0           
##  [61] fansi_1.0.6               digest_0.6.35            
##  [63] rsvd_1.0.5                R6_2.5.1                 
##  [65] mime_0.12                 Cairo_1.6-2              
##  [67] jpeg_0.1-10               RSQLite_2.3.6            
##  [69] utf8_1.2.4                generics_0.1.3           
##  [71] data.table_1.15.4         httr_1.4.7               
##  [73] htmlwidgets_1.6.4         S4Arrays_1.4.0           
##  [75] uwot_0.2.2                pkgconfig_2.0.3          
##  [77] gtable_0.3.5              blob_1.2.4               
##  [79] ComplexHeatmap_2.20.0     XVector_0.44.0           
##  [81] htmltools_0.5.8.1         fftwtools_0.9-11         
##  [83] clue_0.3-65               scales_1.3.0             
##  [85] png_0.1-8                 knitr_1.46               
##  [87] rstudioapi_0.16.0         rjson_0.2.21             
##  [89] nlme_3.1-164              curl_5.2.1               
##  [91] GlobalOptions_0.1.2       cachem_1.1.0             
##  [93] BiocVersion_3.19.1        parallel_4.4.0           
##  [95] vipor_0.4.7               AnnotationDbi_1.66.0     
##  [97] pillar_1.9.0              grid_4.4.0               
##  [99] vctrs_0.6.5               RANN_2.6.1               
## [101] promises_1.3.0            BiocSingular_1.20.0      
## [103] beachmat_2.20.0           xtable_1.8-4             
## [105] cluster_2.1.6             beeswarm_0.4.0           
## [107] evaluate_0.23             magick_2.8.3             
## [109] cli_3.6.2                 locfit_1.5-9.9           
## [111] compiler_4.4.0            rlang_1.1.3              
## [113] crayon_1.5.2              labeling_0.4.3           
## [115] ggbeeswarm_0.7.2          stringi_1.8.4            
## [117] viridisLite_0.4.2         BiocParallel_1.38.0      
## [119] munsell_0.5.1             Biostrings_2.72.0        
## [121] lazyeval_0.2.2            tiff_0.1-12              
## [123] Matrix_1.7-0              sparseMatrixStats_1.16.0 
## [125] bit64_4.0.5               KEGGREST_1.44.0          
## [127] statmod_1.5.0             shiny_1.8.1.1            
## [129] highr_0.10                tidygate_0.5.3           
## [131] igraph_2.0.3              memoise_2.0.1            
## [133] bslib_0.7.0               bit_4.0.5

References

Imaging assays (tidy)

Stefano Mangiola, South Australian immunoGENomics Cancer Institute¹, Walter and Eliza Hall Institute²

Luciano Martellotto, Adelaide Centre for Epigenetics, South Australian immunoGENomics Cancer Institute³

Session 3 – Spatial analyses of imaging data

1. Working with imaging-based data in Bioconductor with MoleculeExperiment

SubcellularSpatialData

An overview of the data

2. MoleculeExperiment

A preview of a zoomed in section of the tissue

3. Aggregation and analysis

4. Neighborhood analyses

Imaging assays (tidy)

Stefano Mangiola, South Australian immunoGENomics Cancer Institute1, Walter and Eliza Hall Institute2

Luciano Martellotto, Adelaide Centre for Epigenetics, South Australian immunoGENomics Cancer Institute3

Session 3 – Spatial analyses of imaging data

1. Working with imaging-based data in Bioconductor with MoleculeExperiment

SubcellularSpatialData

An overview of the data

2. MoleculeExperiment

A preview of a zoomed in section of the tissue

3. Aggregation and analysis

4. Neighborhood analyses

Stefano Mangiola, South Australian immunoGENomics Cancer Institute¹, Walter and Eliza Hall Institute²

Luciano Martellotto, Adelaide Centre for Epigenetics, South Australian immunoGENomics Cancer Institute³