=================================================================================================== THIS: https://rpubs.com/sherloconan/1341447
REF: https://xzhoulab.github.io/SRTsim/
MORE: https://github.com/zhengxiaoUVic/Spatial
SRT = Spatially-Resolved Transcriptomics
SVG = Spatially Variable Gene
> library(SRTsim)
Warning messages:
1: In rgl.init(initValue, onlyNULL) : RGL: unable to open X11 display
2: 'rgl.init' failed, will use the null device.
See '?rgl.useNULL' for ways to avoid this warning.
> options(rgl.useNULL=T) # macOS needs to install XQuartz, https://www.xquartz.org/
A data list
[[1]] count: a sparse matrix (dgCMatrix)
with 80 rows (genes) and 3,611 columns (Visium
spots).
i: row indices (0-based) of the non-zero entries,
within each column.
p: column pointers.
Dim: the matrix dimensions \(80\times3,611\).
Dimnames: row and column names. (1) gene IDs,
e.g., Ensembl IDs. (2) spot barcodes, e.g., AAACAAGTATCTCCCA-1.
x: the non-zero values aligned with
i.
factors: cache for matrix factorizations (e.g.,
LU/QR/Cholesky) used internally for speed. Empty.
[[2]] info: a data frame with 3,611 rows (Visium
spots) and 6 columns (metadata).
row, col: spot coordinates on the
Visium hex grid.
imagerow, imagecol: pixel
coordinates on the histology image.
tissue: factor {“0”, “1”} indicating background
versus on-tissue.
layer: annotated cortical layer label.
example_count <- exampleLIBD$count
example_loc <- exampleLIBD$info[, c("imagecol", "imagerow", "layer")]
colnames(example_loc) <- c("x", "y", "label")
simSRT <- createSRT(count_in=example_count,
loc_in=example_loc) # create an SRT object
The reference-based SRT simulation relies on one of these 80 spatial
patterns. The process starts by fitting four count
models—poisson, nb, zip, and
zinb—and then selecting the best fit using likelihood ratio
tests, AIC, and dispersion. Afterward, we generate simulated counts from
the best model based on the estimated parameters. The simulation can
either be applied across the entire tissue or within specific spatial
domains, like cortical layers.
Poisson, \(\mathbb{P}(Y=k)=\frac{\lambda^k e^{-\lambda}}{k!}\), \(\mathbb{E}[Y]=\operatorname{Var}(Y)=\lambda\).
NB (negative binomial), \(\mathbb{P}(Y=k)=\frac{\Gamma(k+\theta)}{k!\ \Gamma(\theta)}\left(\frac{\theta}{\theta+\mu}\right)^\theta \left(\frac{\mu}{\theta+\mu}\right)^k\), \(\operatorname{Var}(Y)=\mu+\frac{\mu^2}{\theta}\).
ZIP (zero-inflated Poisson), \(\mathbb{P}(Y=k)=\begin{cases}\ \pi_0+(1-\pi_0)e^{-\lambda}, &\text{if}\ k=0 \\ \ (1-\pi_0)\frac{\lambda^k e^{-\lambda}}{k!}, &\text{if}\ k>0 \end{cases}\), \(\operatorname{Var}(Y)=(1-\pi_0)\lambda(1+\pi_0\lambda)=\mathbb{E}[Y]+\pi_0(1-\pi_0)\lambda^2\).
ZINB (zero-inflated negative binomial)
Fit one count model per gene using all on-tissue spots pooled together, then generate counts across the whole tissue from that single set of parameters.
# estimate model parameters for data generation
set.seed(1); simSRT1 <- srtsim_fit(simSRT, sim_schem="tissue")
# generate synthetic data with estimated parameters
simSRT1 <- srtsim_count(simSRT1)
# visualize expression pattern for the gene
visualize_gene(simsrt=simSRT1, plotgn="ENSG00000183036", rev_y=T)
Note: Visium coordinates come from the image grid where the origin is top-left and \(y\) increases downward. Many spatial transcriptomics plotting helpers therefore reverse the \(y\)-axis to match the histology image.
Fit a separate count model per gene within each spatial domain (e.g., cortical layers), then simulate each domain and stitch them together. This preserves—and often sharpens—between-domain contrasts, making domain detection easier.
# estimate model parameters for data generation
set.seed(1); simSRT2 <- srtsim_fit(simSRT, sim_schem="domain")
# generate synthetic data with estimated parameters
simSRT2 <- srtsim_count(simSRT2)
# visualize expression pattern for the gene
visualize_gene(simsrt=simSRT2, plotgn="ENSG00000168314", rev_y=T)
The title uses the last five digits of the gene ID.
plot_gene <- function(gene) {
#' function to plot a single gene
df <- data.frame(
x = example_loc$x,
y = example_loc$y,
count = example_count[gene,]
)
ggplot(df, aes(x = x, y = y, color = count)) +
geom_point(size = 0.5) +
scale_color_viridis(option = "D", direction = -1) +
coord_fixed() +
scale_y_reverse() +
labs(title = substr(gene, nchar(gene)-4, nchar(gene))) +
theme_void() +
theme(legend.position = "none")
}
# create plots
plots <- lapply(example_count@Dimnames[[1]], plot_gene)
# combine using patchwork
wrap_plots(plots, ncol=5)