##Packages

library(Seurat)
library(tidyverse)
library(remotes)
remotes::install_github(repo = 'satijalab/seurat', ref = 'develop')
library(arrow)

Importing Samples

I imported the file folders from Xenium, adding sample ID and treatment group to the metadata info for each cell. In this step, the assay and fov was assigned. Indicating the fov was important to maintain spatial data on merged objects.

This was applied to each sample:

C1<- LoadXenium("/Volumes/PRJ-SEIB/Postdocs/Alex/Xenium/20241002__050614__20241002_ALEX_MCCDEX/output-XETG00179__0029442__Region_3__20241002__050655", fov= "fovC1", assay = "Xenium")# reads file folder
C1$Sample <- "C1" # adds sample label

C1$Group <- "PCL" # adds treatment group

Subsetting

Once the data files have been read into R, I subset the files to only include the cells around the scaffold.

cell_ids_C1 <- read.csv("/Volumes/PRJ-SEIB/Students/Billie/MCCDEX/Scaffold_selection/C1_cells_stats.csv", header = FALSE, stringsAsFactors = FALSE) #reads CSV from downloads. i've added what region it's from
cell_ids_C1 <- as.vector(cell_ids_C1[[1]]) #converts list to vector
C1 <- subset(C1, cells = cell_ids_C1) #subsets file

qc

I didn’t use any QC, but this is what could be done:

hist(subset$nCount_Xenium, xlim=c(0,2000), breaks = 5000)
abline(v = 20)
thres <- quantile(subset$nCount_Xenium, c(0.98))
subset <- subset(subset, subset = nCount_Xenium >= 20 & nCount_Xenium <= thres[1])

Adding cell distance to and within the scaffold

library(sf)
coords <- GetTissueCoordinates(C1)
rownames(coords) <- coords$cell #sets cell barcode as rowname
coords$barcode <- rownames(coords)
coords <- coords[,-3] #rm old barcode column

# Convert to sf POINT object
cell_sf <- st_as_sf(coords, coords = c("x", "y"), crs = NA)
st_crs(cell_sf) <- 32633 # this is important to set scale

#create polygon
plot(st_geometry(cell_sf), asp = 1, main = "Click around implant, then press ESC")
implant_poly <- locator(type = "n") #gotta draw it after this ONLY RUN TO HERE, THEN DRAW, OTHERWISE WON'T WORK
implant_polygon <- load(file = "/Users/billierose/Desktop/RN_day14/C1_implant_poly")
implant_coords <- cbind(implant_poly$x, implant_poly$y)
implant_coords <- rbind(implant_coords, implant_coords[1, ])  # close the loop


# 3. Convert implant polygon to sf
implant_poly_sf <- st_polygon(list(implant_coords))  # make polygon
implant_poly_sf <- st_sfc(implant_poly_sf, crs = 32633)  # wrap in geometry
implant_poly_sf <- st_sf(geometry = implant_poly_sf)  # convert to sf

#check the polygon drawn
plot(st_geometry(cell_sf), asp = 1, main = "Cells and Implant Polygon")
plot(implant_poly_sf$geometry, add = TRUE, border = "red", lwd = 2)

# 4. Compute distance from each cell to polygon
dists <- st_distance(cell_sf, implant_poly_sf)

# 5. Flatten and name
dist_vec <- as.numeric(dists)
names(dist_vec) <- coords$barcode

# 6. Add to Seurat metadata 
C1$dist_outside_implant <- dist_vec[colnames(C1)]

# Add distance as a column to the sf object
cell_sf$dist <- dist_vec[rownames(cell_sf)]

Normalization

In tradition with single cell analysis, Seurat pipelines suggest using log-normalized counts for downstream analysis. However in this context, total counts was a technical artifact resulting from the PCR amplification steps in single cell RNA-seq protocol. In single-cell imaging-based spatially resolved transcriptomics, cell counts can contain biological information relevant to the spatial structure of the tissue. Moreover, as the gene-panel for Xenium is narrower than transcriptome wide scRNA-seq, meaning that counts has a greater biological skew so using it to normalize data could lead to FALSE negatives for differential expression analysis (Atta et al. 2024).

The traditional method of log-count normalization is:

C1 <- NormalizeData(C1, scale.factor=median(C1$nCount_Xenium))
C1 <- FindVariableFeatures(C1, nfeatures=nrow(C1))

SCTransform is also a scRNA-seq normalization method which uses a negative binomial regresssion and theoretically maintains a greater proportional of biological heterogenity. Subsequently, it was developed in the consideration of the artifacts of scRNA-seq (eg. mito contamination, ambient RNA contamination). Many papers push that for scRNA-seq SCTransform is a far better model than log-transormation, however for spatial data, papers split in their methods and there have been arguments for an against its use.

C1 <- SCTransform(C1)

Ideally, to normally distribute the data for differential expression analysis, we would find an alternative to count-based normalization methods. Relevant to Xenium, this could be cell area based, though cell area is also spatially structured and biologically relevant. Unfortunately, I haven’t been able to write code to get this to work. Or perhaps the internal scaling and gene selection panels of Xenium output does not require normalisation?

Banksy

Prior to Banksy, each sample was normalized, using this method. I am working on a way to run Banksy without normalisation but find that the function breaks when I try to run it on raw counts data, perhaps due to the Seurat object structure.

C1 <- readRDS(file = "/Volumes/PRJ-SEIB/Students/Billie/Files_with_distance/C1.RDS")
C1 <- NormalizeData(C1, scale.factor=median(C1$nCount_Xenium))
C1 <- FindVariableFeatures(C1, nfeatures=nrow(C1))

When running Banksy, the lambda parameter is regularization parameter that controls the degree of neighbour augmentation. For cell typing, it is recommended that a lambda of 0.2 is used and then for domain segmentation, 0.8 is recommended. I have used 0.2 for cell typing and have not yet compared this to 0.8 for segmenting spatial domains around the scaffold (Singhal et al., 2024).

library(Seurat)
library(sf)
library(SeuratWrappers)
library(Banksy)

C1 <- RunBanksy(C1, lambda = 0.2, k_geom = 15, assay = "Xenium", slot = "data", features = "all", verbose=TRUE)

Merging and clustering pipeline

After the above is run per sample, I merge all of the samples and follow the usual Seurat clustering pipeline:

library(Seurat)
library(SeuratWrappers)
library(harmony)

#read the merged file
dist <- readRDS(file = "/Volumes/PRJ-SEIB/Students/Billie/Files_with_distance/merged.banksy.RDS")

#workflow
dist <- ScaleData(dist)
dist <- RunPCA(dist, assay = 'BANKSY', features = rownames(dist)) 
ElbowPlot(dist)
dist <- RunHarmony(dist, group.by.vars = "Sample") # could also integrate layers like this: dist <- IntegrateLayers(object = dist, method = HarmonyIntegration, orig.reduction = "pca", new.reduction = "harmony", verbose = TRUE, dims = 1:30)
dist <- FindNeighbors(dist, dims = 1:20, reduction = "harmony")
dist <- FindClusters(dist, resolution = 0.8, cluster.name = "spat.clusters")
dist <- RunUMAP(dist, reduction = "harmony", dims = 1:20, verbose = FALSE, reduction.name = "spat.dim", n.components =2L)

Note: within this workflow, the Elbowplot informs the dimensions of the UMAP. Alternatively, the eigenvalues>1 can be used to compute dimensions. I use n.components = 2 for 2D umaps/3D with space and 3 for 3D umapping.

Phenotyping

For cell typing, I had the choice of manual annotation based on cluster markers or to use an automated supervised method. I chose to use SingleR to phenotype each cell individually, then to use the UMAP clusters to indicate cluster identity based on the most common SingleR assigned phenotype per cluster. I opted for this technique for multiple reasons. Firstly, SingleR was benchmarked as a cell type method on par with manual typing (Cheng et al., 2025) for spatial transcriptomics data. However, I did find that some cell type were seemingly inaccurate, perhaps due to a low count of transcripts being read from them (I have essentially no QC), so cell types that didn’t quite make sense (hepatocytes, neuronal, myocardiocytes) were being labelled. This is why I integrated the umap spatially-informed unsupervised clustering with the SingleR methodology. It seems ‘correct’ though does sacrifice specificity for accuracy.

Note: normalized but not spatially augmented transcript reads were used here. But cluster assignment here is spatially informed.

library(SingleR)
library(Seurat)

#add phenotype labels
phenotyping <- readRDS(file = "/Volumes/PRJ-SEIB/Students/Billie/MCCDEX\ RDS\ files/MCCDEX_subset_raw.RDS")
phenotyping <- NormalizeData(phenotyping)  #should try it with inidivudally normalized?
phenotyping <- ScaleData(phenotyping)
phenotyping.sce <- JoinLayers(object = phenotyping, layers = c("counts", "data")) # if using lognorm
counts_matrix <- GetAssayData(phenotyping.sce, layer = "counts") # if using lognorm
phenotyping.sce <- as.SingleCellExperiment(phenotyping.sce, assay = "Xenium")
mouseref <- celldex::MouseRNAseqData(ensembl=FALSE)
phenotype <- SingleR(test = phenotyping.sce, ref = mouseref, labels = mouseref$label.main)
dist$SingleR.label <- phenotype$labels #add labels to banksy object

#group by %
cluster_annotation <- dist@meta.data %>%
  group_by(seurat_clusters, SingleR.label) %>%
  summarise(n = n(), .groups = 'drop')  # count number of cells per (cluster, label)
dominant_labels <- cluster_annotation %>%
  group_by(seurat_clusters) %>%
  slice_max(order_by = n, n = 1, with_ties = FALSE)
cluster2label <- setNames(dominant_labels$SingleR.label, dominant_labels$seurat_clusters)
cell_labels <- cluster2label[as.character(dist$seurat_clusters)]
names(cell_labels) <- rownames(dist@meta.data)
dist <- AddMetaData(dist, metadata = cell_labels, col.name = "SingleR.cluster.label")

Comparison of methods

This uses median normalise data that has been spatially informed with banksy and harmony integrated (like the above protocol)

library(Seurat)
## Loading required package: SeuratObject
## Loading required package: sp
## 'SeuratObject' was built under R 4.4.1 but the current version is
## 4.4.2; it is recomended that you reinstall 'SeuratObject' as the ABI
## for R may have changed
## 
## Attaching package: 'SeuratObject'
## The following objects are masked from 'package:base':
## 
##     intersect, t
library(ggplot2)

subset <- readRDS(file = "/Volumes/PRJ-SEIB-1/Students/Billie/MCCDEX\ RDS\ files/subset.banksy.dimplot.RDS")

DimPlot(subset, reduction = "spat.dim", group.by = "spat.clusters", split.by = "Group", label = TRUE, label.color = "white", raster = TRUE, pt.size = 0.8, repel = TRUE, raster.dpi = c(700, 750)) +
  theme(
    panel.background = element_rect(fill = "black", color = "black"),
    plot.background = element_rect(fill = "black", color = "black"),
    axis.text = element_text(color = "white"),
    axis.title = element_text(color = "white"),
    axis.line = element_line(color = "white"),
    panel.grid = element_blank(),
    legend.text = element_text(color = "white"),
    legend.title = element_text(color = "white"),
    strip.text = element_text(color = "white", size = 14, face = "bold"))

I can then add the SingleR labels and plot this.

library(Seurat)
library(SingleCellExperiment)
## Loading required package: SummarizedExperiment
## Loading required package: MatrixGenerics
## Loading required package: matrixStats
## 
## Attaching package: 'MatrixGenerics'
## The following objects are masked from 'package:matrixStats':
## 
##     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
##     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
##     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
##     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
##     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
##     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
##     colWeightedMeans, colWeightedMedians, colWeightedSds,
##     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
##     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
##     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
##     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
##     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
##     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
##     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
##     rowWeightedSds, rowWeightedVars
## Loading required package: GenomicRanges
## Loading required package: stats4
## Loading required package: BiocGenerics
## 
## Attaching package: 'BiocGenerics'
## The following object is masked from 'package:SeuratObject':
## 
##     intersect
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
## 
##     anyDuplicated, aperm, append, as.data.frame, basename, cbind,
##     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
##     get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
##     match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
##     Position, rank, rbind, Reduce, rownames, sapply, saveRDS, setdiff,
##     table, tapply, union, unique, unsplit, which.max, which.min
## Loading required package: S4Vectors
## 
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:utils':
## 
##     findMatches
## The following objects are masked from 'package:base':
## 
##     expand.grid, I, unname
## Loading required package: IRanges
## 
## Attaching package: 'IRanges'
## The following object is masked from 'package:sp':
## 
##     %over%
## Loading required package: GenomeInfoDb
## Loading required package: Biobase
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
## 
## Attaching package: 'Biobase'
## The following object is masked from 'package:MatrixGenerics':
## 
##     rowMedians
## The following objects are masked from 'package:matrixStats':
## 
##     anyMissing, rowMedians
## 
## Attaching package: 'SummarizedExperiment'
## The following object is masked from 'package:Seurat':
## 
##     Assays
## The following object is masked from 'package:SeuratObject':
## 
##     Assays
library(SingleR)

phenotyping <- readRDS(file = "/Volumes/PRJ-SEIB-1/Students/Billie/MCCDEX\ RDS\ files/MCCDEX_subset_raw.RDS")
phenotyping <- NormalizeData(phenotyping)  #should try it with inidivudally normalized
## Normalizing layer: counts.1
## Normalizing layer: counts.2
## Normalizing layer: counts.3
## Normalizing layer: counts.4
## Normalizing layer: counts.5
## Normalizing layer: counts.6
## Normalizing layer: counts.7
## Normalizing layer: counts.8
## Normalizing layer: counts.9
## Normalizing layer: counts.10
## Normalizing layer: counts.11
## Normalizing layer: counts.12
## Normalizing layer: counts.13
phenotyping.sce <- JoinLayers(object = phenotyping, layers = c("counts", "data")) # if using lognorm
counts_matrix <- GetAssayData(phenotyping.sce, layer = "counts") # if using lognorm

phenotyping.sce <- as.SingleCellExperiment(phenotyping.sce, assay = "Xenium")
## Warning: `PackageCheck()` was deprecated in SeuratObject 5.0.0.
## ℹ Please use `rlang::check_installed()` instead.
## ℹ The deprecated feature was likely used in the Seurat package.
##   Please report the issue at <https://github.com/satijalab/seurat/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: The `slot` argument of `GetAssayData()` is deprecated as of SeuratObject 5.0.0.
## ℹ Please use the `layer` argument instead.
## ℹ The deprecated feature was likely used in the Seurat package.
##   Please report the issue at <https://github.com/satijalab/seurat/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: Layer 'scale.data' is empty
mouseref <- celldex::MouseRNAseqData(ensembl=FALSE)
phenotype <- SingleR(test = phenotyping.sce, ref = mouseref, labels = mouseref$label.main)

#add labels back to spat.dim file
subset$SingleR.label <- phenotype$labels

#view 
DimPlot(subset, reduction = "spat.dim", group.by = "SingleR.label", split.by = "Group", label = TRUE, label.color = "white", raster = TRUE, pt.size = 0.8, repel = TRUE, raster.dpi = c(700, 750)) +
  theme(
    panel.background = element_rect(fill = "black", color = "black"),
    plot.background = element_rect(fill = "black", color = "black"),
    axis.text = element_text(color = "white"),
    axis.title = element_text(color = "white"),
    axis.line = element_line(color = "white"),
    panel.grid = element_blank(),
    legend.text = element_text(color = "white"),
    legend.title = element_text(color = "white"),
    strip.text = element_text(color = "white", size = 14, face = "bold"))
## Warning: ggrepel: 3 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
## Warning: ggrepel: 10 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

Then I can do % phenotype clustering on the same UMAP.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:Biobase':
## 
##     combine
## The following objects are masked from 'package:GenomicRanges':
## 
##     intersect, setdiff, union
## The following object is masked from 'package:GenomeInfoDb':
## 
##     intersect
## The following objects are masked from 'package:IRanges':
## 
##     collapse, desc, intersect, setdiff, slice, union
## The following objects are masked from 'package:S4Vectors':
## 
##     first, intersect, rename, setdiff, setequal, union
## The following objects are masked from 'package:BiocGenerics':
## 
##     combine, intersect, setdiff, union
## The following object is masked from 'package:matrixStats':
## 
##     count
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
cluster_annotation <- subset@meta.data %>%
  group_by(seurat_clusters, SingleR.label) %>%
  summarise(n = n(), .groups = 'drop')  # count number of cells per (cluster, label)
dominant_labels <- cluster_annotation %>% # finds most common label
  group_by(seurat_clusters) %>%
  slice_max(order_by = n, n = 1, with_ties = FALSE)
cluster2label <- setNames(dominant_labels$SingleR.label, dominant_labels$seurat_clusters) #makes named vector cellID -> label
cell_labels <- cluster2label[as.character(subset$seurat_clusters)] # map cluster ident to cell
names(cell_labels) <- rownames(subset@meta.data)
subset <- AddMetaData(subset, metadata = cell_labels, col.name = "SingleR.cluster.label") # add to metadata

#view % clustering
DimPlot(subset, reduction = "spat.dim", group.by = "SingleR.cluster.label", split.by = "Group", label = TRUE, label.color = "white", raster = TRUE, pt.size = 0.8, repel = TRUE, raster.dpi = c(700, 750)) +
  theme(
    panel.background = element_rect(fill = "black", color = "black"),
    plot.background = element_rect(fill = "black", color = "black"),
    axis.text = element_text(color = "white"),
    axis.title = element_text(color = "white"),
    axis.line = element_line(color = "white"),
    panel.grid = element_blank(),
    legend.text = element_text(color = "white"),
    legend.title = element_text(color = "white"),
    strip.text = element_text(color = "white", size = 14, face = "bold"))

Let’s look at the cluster counts for this

Idents(subset) <- "SingleR.cluster.label"
table(subset$Group, Idents(subset))
##      
##       Fibroblasts Macrophages Endothelial cells Adipocytes Granulocytes
##   DEX       14081       38100              2240       1751          862
##   MCC       20663       63450              5276       2545         4308
##   PCL       19692       52797              4435       3529         2299
##      
##       Cardiomyocytes
##   DEX           1567
##   MCC           1930
##   PCL           3093
#make table
subset_cluster_counts <- subset@meta.data %>%
  dplyr::count(Group, cluster = Idents(subset))

#plot
ggplot(subset_cluster_counts, aes(x = cluster, y = n, fill = Group)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme_minimal() +
  labs(title = "Cell Counts per Cluster by Treatment",
       x = "Cluster",
       y = "Number of Cells") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# normalize within treatment group
subset_props <- subset_cluster_counts %>%
  group_by(Group) %>% 
  mutate(prop = n / sum(n)) %>% #divdes the number by the sum 
  ungroup()

# Plot proportions
ggplot(subset_props, aes(x = cluster, y = prop, fill = Group)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme_minimal() +
  labs(title = "Proportion of Cells per Cluster by Treatment",
       x = "Cluster",
       y = "Proportion of Cells") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

3D umap

With cell distancing and cell type in the metadata, I could then do fun stuff like make 3D umaps in plotly.

Colocalisation

I ran this per sample to get the cell type proportions for each cell and add to the metadata for each sample:

dist_M1 <-subset(dist, subset = Sample == "D5")

# Extract and convert coordinates
coords <- GetTissueCoordinates(dist_M1)
coord_matrix <- as.matrix(coords[, c("x", "y")])

# set radius
radius <- 50

# finds fixed-radius nearest neighbours
frnn_result <- frNN(coord_matrix, eps = radius)

#maps back neighbours to cell types
cell_types <- dist_M1$SingleR.cluster.label
cell_names <- rownames(dist_M1@meta.data)

# removes self from neighbours
neighbor_celltypes <- lapply(frnn_result$id, function(idxs) {
  idxs <- idxs[idxs != 0] 
  cell_types[idxs]
})

#add cell names
names(neighbor_celltypes) <- cell_names

#calculate neighbour proportion type
neighbor_type_props <- lapply(neighbor_celltypes, function(x) {
  prop.table(table(x))
})

I then combined files based on the selection of samples I wanted to look at, and ran this:

# read in saved proportion tables
C1 <- readRDS(file = "/Users/billierose/Desktop/Revised\ MCCDEX/neighbor_data/C1_neighbor_props.RDS")
C2 <- readRDS(file = "/Users/billierose/Desktop/Revised\ MCCDEX/neighbor_data/C2_neighbor_props.RDS")
C3 <- readRDS(file = "/Users/billierose/Desktop/Revised\ MCCDEX/neighbor_data/C3_neighbor_props.RDS")
C5 <- readRDS(file = "/Users/billierose/Desktop/Revised\ MCCDEX/neighbor_data/C5_neighbor_props.RDS")

M1 <- readRDS(file = "/Users/billierose/Desktop/Revised\ MCCDEX/neighbor_data/M1_neighbor_props.RDS")
M2 <- readRDS(file = "/Users/billierose/Desktop/Revised\ MCCDEX/neighbor_data/M2_neighbor_props.RDS")
M3 <- readRDS(file = "/Users/billierose/Desktop/Revised\ MCCDEX/neighbor_data/M3_neighbor_props.RDS")
M4 <- readRDS(file = "/Users/billierose/Desktop/Revised\ MCCDEX/neighbor_data/M4_neighbor_props.RDS")
M5 <- readRDS(file = "/Users/billierose/Desktop/Revised\ MCCDEX/neighbor_data/M5_neighbor_props.RDS")

D1 <- readRDS(file = "/Users/billierose/Desktop/Revised\ MCCDEX/neighbor_data/D1_neighbor_props.RDS")
D2 <- readRDS(file = "/Users/billierose/Desktop/Revised\ MCCDEX/neighbor_data/D2_neighbor_props.RDS")
D4 <- readRDS(file = "/Users/billierose/Desktop/Revised\ MCCDEX/neighbor_data/D4_neighbor_props.RDS")
D5 <- readRDS(file = "/Users/billierose/Desktop/Revised\ MCCDEX/neighbor_data/D5_neighbor_props.RDS")

dist <- readRDS(file = "/Volumes/PRJ-SEIB/Students/Billie/Files_with_distance/merged.banksy.label.2dumap.RDS")

## combine
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)
# Combine into one list
all_props_list <- list(C1, C2, C3, C5, M1, M2, M3, M4, M5, D1, D2, D4, D5)

# gets cell type labels
celltype_lookup <- dist$SingleR.cluster.label

# flatten the neighbor_type_props across all samples
all_cells <- unlist(all_props_list, recursive = FALSE)

# make a table of query cell id, query cell type, neighbour cell type, neighbour proportions
combined_df <- bind_rows(lapply(names(all_cells), function(cell_id) {
  prop_table <- all_cells[[cell_id]]
  if (length(prop_table) == 0) return(NULL)  # skip empty
  tibble(
    query_cell = cell_id,
    query_type = celltype_lookup[[cell_id]],
    neighbor_type = names(prop_table),
    proportion = as.numeric(prop_table)
  )
}))


# find mean neighbor proportions for each query cell type
summary_df <- combined_df %>%
  group_by(query_type, neighbor_type) %>%
  summarise(mean_prop = mean(proportion), .groups = "drop")

# put into matrix format
heatmap_matrix <- summary_df %>%
  pivot_wider(names_from = neighbor_type, values_from = mean_prop, values_fill = 0) %>%
  column_to_rownames("query_type") %>%
  as.matrix()

#mappy map mamp
library(pheatmap)
pheatmap(heatmap_matrix,
         cluster_rows = FALSE,
         cluster_cols = FALSE,
         main = "Combined neighbour composition by cell type",
         fontsize_row = 14,
         fontsize_col = 14)
Alt text
Alt text