=================================================================================================== THIS: https://rpubs.com/sherloconan/1341447

REF: https://xzhoulab.github.io/SRTsim/

MORE: https://github.com/zhengxiaoUVic/Spatial

   

SRT = Spatially-Resolved Transcriptomics

SVG = Spatially Variable Gene

> library(SRTsim)

Warning messages:
1: In rgl.init(initValue, onlyNULL) : RGL: unable to open X11 display
2: 'rgl.init' failed, will use the null device.
See '?rgl.useNULL' for ways to avoid this warning.

> options(rgl.useNULL=T) # macOS needs to install XQuartz, https://www.xquartz.org/

   

1. Reference-Based Example

A data list

[[1]] count:    a sparse matrix (dgCMatrix) with 80 rows (genes) and 3,611 columns (Visium spots).

  • i:    row indices (0-based) of the non-zero entries, within each column.

  • p:    column pointers.

  • Dim:    the matrix dimensions \(80\times3,611\).

  • Dimnames:    row and column names. (1) gene IDs, e.g., Ensembl IDs. (2) spot barcodes, e.g., AAACAAGTATCTCCCA-1.

  • x:    the non-zero values aligned with i.

  • factors:    cache for matrix factorizations (e.g., LU/QR/Cholesky) used internally for speed. Empty.

[[2]] info:    a data frame with 3,611 rows (Visium spots) and 6 columns (metadata).

  • row, col:    spot coordinates on the Visium hex grid.

  • imagerow, imagecol:    pixel coordinates on the histology image.

  • tissue:    factor {“0”, “1”} indicating background versus on-tissue.

  • layer:    annotated cortical layer label.

example_count   <- exampleLIBD$count
example_loc     <- exampleLIBD$info[, c("imagecol", "imagerow", "layer")]
colnames(example_loc) <- c("x", "y", "label")

simSRT  <- createSRT(count_in=example_count,
                     loc_in=example_loc) # create an SRT object

 

The reference-based SRT simulation relies on one of these 80 spatial patterns. The process starts by fitting four count models—poisson, nb, zip, and zinb—and then selecting the best fit using likelihood ratio tests, AIC, and dispersion. Afterward, we generate simulated counts from the best model based on the estimated parameters. The simulation can either be applied across the entire tissue or within specific spatial domains, like cortical layers.

  • Poisson, \(\mathbb{P}(Y=k)=\frac{\lambda^k e^{-\lambda}}{k!}\), \(\mathbb{E}[Y]=\operatorname{Var}(Y)=\lambda\).

  • NB (negative binomial), \(\mathbb{P}(Y=k)=\frac{\Gamma(k+\theta)}{k!\ \Gamma(\theta)}\left(\frac{\theta}{\theta+\mu}\right)^\theta \left(\frac{\mu}{\theta+\mu}\right)^k\), \(\operatorname{Var}(Y)=\mu+\frac{\mu^2}{\theta}\).

  • ZIP (zero-inflated Poisson), \(\mathbb{P}(Y=k)=\begin{cases}\ \pi_0+(1-\pi_0)e^{-\lambda}, &\text{if}\ k=0 \\ \ (1-\pi_0)\frac{\lambda^k e^{-\lambda}}{k!}, &\text{if}\ k>0 \end{cases}\), \(\operatorname{Var}(Y)=(1-\pi_0)\lambda(1+\pi_0\lambda)=\mathbb{E}[Y]+\pi_0(1-\pi_0)\lambda^2\).

  • ZINB (zero-inflated negative binomial)

   

2. Tissue-Wise Simulation

Fit one count model per gene using all on-tissue spots pooled together, then generate counts across the whole tissue from that single set of parameters.

# estimate model parameters for data generation
set.seed(1); simSRT1 <- srtsim_fit(simSRT, sim_schem="tissue")

# generate synthetic data with estimated parameters
simSRT1 <- srtsim_count(simSRT1)

# visualize expression pattern for the gene 
visualize_gene(simsrt=simSRT1, plotgn="ENSG00000183036", rev_y=T)

Note: Visium coordinates come from the image grid where the origin is top-left and \(y\) increases downward. Many spatial transcriptomics plotting helpers therefore reverse the \(y\)-axis to match the histology image.

   

3. Domain-Specific Simulation

Fit a separate count model per gene within each spatial domain (e.g., cortical layers), then simulate each domain and stitch them together. This preserves—and often sharpens—between-domain contrasts, making domain detection easier.

# estimate model parameters for data generation
set.seed(1); simSRT2 <- srtsim_fit(simSRT, sim_schem="domain")

# generate synthetic data with estimated parameters
simSRT2 <- srtsim_count(simSRT2)

# visualize expression pattern for the gene 
visualize_gene(simsrt=simSRT2, plotgn="ENSG00000168314", rev_y=T)

   

Backup

The title uses the last five digits of the gene ID.

plot_gene <- function(gene) {
  #' function to plot a single gene
  df <- data.frame(
    x = example_loc$x,
    y = example_loc$y,
    count = example_count[gene,]
  )
  
  ggplot(df, aes(x = x, y = y, color = count)) +
    geom_point(size = 0.5) +
    scale_color_viridis(option = "D", direction = -1) +
    coord_fixed() +
    scale_y_reverse() +
    labs(title = substr(gene, nchar(gene)-4, nchar(gene))) +
    theme_void() +
    theme(legend.position = "none")
}

# create plots
plots <- lapply(example_count@Dimnames[[1]], plot_gene)

# combine using patchwork
wrap_plots(plots, ncol=5)

 

Yan, Hua, and Li (2025, pp. 3-5) identify three types of SVGs: overall, cell-type-specific, and spatial domain markers.

First, they define overall SVGs as the genes that exhibit non-random spatial expression patterns. The detection of overall SVGs screens informative genes for downstream analyses, including the identification of spatial domains and functional gene modules.

Second, detecting cell-type-specific SVGs aims to reveal spatial variation within a cell type and help identify distinct cell subpopulations or states within cell types.

Third, spatial-domain-marker SVGs detection is used to find marker genes to annotate and interpret spatial domains already detected. These markers help understand the molecular mechanisms underlying spatial domains and assist in annotating tissue layers in other datasets.

   

Local Moran’s \(I\)

\[\begin{equation} I_i\propto(y_i-\bar{y})\sum_{j=1}^N w_{ij}\ \!(y_j-\bar{y}),\quad\text{where}\ \ w_{ij}=\mathbf{1}_{\{j\in\partial i\}} \end{equation}\]

Higher values indicate greater similarity in expression among neighboring cells, while negative values show local dissimilarity.

Non-uniform autocorrelation patterns indicate non-stationarity.

plot_Moran <- function(gene, r=15) {
  #' function to plot the local Moran's I
  Z <- log10(1 + example_count[gene,]) # logcount
  df <- data.frame(
    x = example_loc$x,
    y = example_loc$y,
    value = Z - mean(Z, na.rm=T)
  )
  
  D <- as.matrix(dist(as.matrix(df[,1:2]))) # pairwise distances
  A <- (D > 0 & D <= r) * 1 # binary adjacency (no self-neighbors)  
  #                  ↑ tune the radius r that defines neighbors; hist(D[D>0])

  df$I <- as.numeric(df$value * (A %*% df$value))
  
  ggplot(df, aes(x = x, y = y, color = I)) +
    geom_point(size = 0.5) +
    scale_color_viridis(option = "D", direction = -1) +
    coord_fixed() +
    scale_y_reverse() +
    labs(title = substr(gene, nchar(gene)-4, nchar(gene))) +
    theme_void() +
    theme(legend.position = "none")
}

# create plots
plots2 <- lapply(example_count@Dimnames[[1]], plot_Moran)

# combine using patchwork
wrap_plots(plots2, ncol=5)

 

scroll to top