Constrained Spatial Point Process Intensity Estimation using Ensembles of Spanning Trees

Authors

Huiyan Sang, Isaac Ray, Ligang Lu

library(BASTION)
library(igraph)
library(ggraph)
library(tidyverse)
library(ggpubr)
library(sf)
library(sp)
library(sn)
library(fdaPDE)
library(mgcv)
library(doParallel)
library(foreach)
library(doRNG)
library(mvtnorm)
library(TruncatedNormal)
library(scoringRules)
library(spatstat)
library(gridExtra)

Introduction

  • Analysis of spatial point patterns (event occurrences) is needed for many applications

  • A reasonable starting point is a nonparametric estimate of the first-order moment (the intensity function)

  • The validity of the most popular approaches (kernel methods1, LGCP2, spline methods3) depends on a Euclidean domain assumption

  • Many data sets are collected on a complex domain (e.g., road/pipeline networks, domains with interior holes, land separated by water)

  • Many point processes may have sharp discontinuities in their intensity

Intensity Estimation

  • We assume we can characterize our point pattern by an inhomogeneous Poisson process

  • Using our data, we want to estimate the function \(\lambda(s)\) on our entire domain

    • \(\lambda(s)\) may not be globally smooth
  • Since \(\lambda(s)=\lim_{\nu(ds)\rightarrow 0}\frac{E(N(ds))}{\nu(ds)}\), we can treat \(\lambda(s)\nu(ds)\) as the probability of an event occurring in the ‘infinitesimal’ set \(ds\)

    • So, in some sense we want to ‘collapse’ \(ds\) at every point \(s\) in our domain while tracking \(N(ds)\) to estimate \(\lambda(s)\)
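
To make this limit concrete, the small simulation below (our illustration, not part of the BASTION workflow) uses spatstat to repeatedly generate an inhomogeneous Poisson pattern from a hypothetical intensity surface and approximates \(\lambda(s_0)\) by the average of \(N(ds)/\nu(ds)\) over shrinking disks \(ds\) around a fixed point \(s_0\); the intensity function and all constants are assumptions made purely for illustration.

library(spatstat)
set.seed(1)

# A hypothetical intensity surface on the unit square (illustration only)
lambda_true = function(x, y) 200 * exp(-3 * ((x - 0.3)^2 + (y - 0.7)^2)) + 20

# Approximate lambda(s0) by E(N(ds)) / nu(ds) for shrinking disks ds around s0
s0 = c(0.5, 0.5)
for (r in c(0.2, 0.1, 0.05)) {
  counts = replicate(200, {
    pp = rpoispp(lambda_true, win = owin(c(0, 1), c(0, 1)))
    sum((pp$x - s0[1])^2 + (pp$y - s0[2])^2 < r^2)
  })
  cat("r =", r,
      " mean(N(ds)) / nu(ds) =", mean(counts) / (pi * r^2),
      " true lambda(s0) =", lambda_true(s0[1], s0[2]), "\n")
}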

Existing Literature

Additive Tree Based Approach

  • Idea: Use the Bayesian Random Spanning Tree Partition model framework instead of directly using Voronoi cells

  • Model \(\lambda(s) = \sum_{m=1}^M \lambda_m(s)\)

  • Assume that \(\lambda_m(s)\) is piecewise constant on the domain

    • The constant pieces (and their values) vary with \(m\)

    • This greatly simplifies the normally challenging integral: on each piece, \(\int_{A_j}\lambda_m(s)\, ds = \lambda_{m,j}\nu(A_j)\), where \(\lambda_{m,j}\) is the constant value of \(\lambda_m\) on \(A_j\) (see the toy sketch after this list)

    • We’ll call each \(\lambda_m(s)\) a ‘learner’

  • Across the MCMC iterations, we merge and split the piecewise-constant regions to obtain flexible, spatially contiguous partitioning
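
As a toy illustration of this additive structure (a sketch under our own assumptions, not BASTION's actual construction), the code below sums \(M\) piecewise-constant learners over a grid of locations, using kmeans partitions of the coordinates as a crude stand-in for the spatially contiguous spanning-tree partitions; every name and constant is hypothetical.

set.seed(2)
# Grid of evaluation locations on the unit square
grid = expand.grid(x = seq(0, 1, length.out = 50), y = seq(0, 1, length.out = 50))

M = 10                       # number of learners
lambda_total = rep(0, nrow(grid))
for (m in 1:M) {
  # Each learner: its own partition of the grid and one non-negative constant per piece
  k_m = sample(3:8, 1)
  piece = kmeans(grid, centers = k_m)$cluster
  lambda_total = lambda_total + rgamma(k_m, shape = 1, rate = 0.1)[piece]
}
# lambda_total now holds a flexible intensity built from piecewise-constant learners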

Model Description

  • Data \(y = \{y_1,\dots,y_N\}\) lying in the constrained domain \(A\)

  • Partition \(\pi = \{A_1,\dots,A_k\}\) of our domain into \(k\) contiguous subregions \(A_j\) for \(j \in \{1,\dots,k\}\)

    • Each one of these subregions is a cluster
  • \(|A_j|\) is the Lebesgue measure \(\nu(A_j)\) of cluster \(A_j\)

    • For most problems this is just an area, but it could be a length or a volume
  • \(n_j = \sum_{i=1}^N I(y_i \in A_j)\) is the number of observations in each cluster (computed concretely in the sketch after the likelihood below)


  • Our likelihood is
\[\begin{equation} \label{jointlik} \left[\underline{y}|\underline{\lambda}, \pi, k\right] = e^{|A|}\prod_{j=1}^k e^{-\lambda_j|A_j|}\lambda_j^{n_j} \end{equation}\] \[\begin{equation} \label{jointloglik} \log\left[\underline{y}|\underline{\lambda}, \pi, k\right] = |A| + \sum_{j=1}^k \left(n_j\log \lambda_j - \lambda_j|A_j|\right) \end{equation}\]
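
The sketch below (our own self-contained illustration with hypothetical toy data; only the formula itself comes from the model) builds a small partition of the unit square with sf, computes \(n_j\) and \(|A_j|\), and evaluates the log-likelihood above. For a fixed partition, each term \(n_j\log\lambda_j - \lambda_j|A_j|\) is maximized at \(\hat{\lambda}_j = n_j/|A_j|\).

library(sf)
set.seed(3)

# Toy domain: the unit square, split into 4 clusters by a Voronoi tessellation of
# random seed points (illustration only; BASTION builds partitions from spanning trees)
domain = st_sfc(st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)))))
seeds = st_sfc(lapply(1:4, function(i) st_point(runif(2))))
partition_sf = st_intersection(
  st_collection_extract(st_voronoi(st_union(seeds)), "POLYGON"),
  domain)

# Toy observations
points_sf = st_sfc(lapply(1:200, function(i) st_point(runif(2))))

# |A_j|: Lebesgue measure of each cluster; n_j: observations per cluster
A_j = as.numeric(st_area(partition_sf))
n_j = lengths(st_intersects(partition_sf, points_sf))

# Log-likelihood above as a function of (n_j, |A_j|, lambda_j)
pp_loglik = function(n_j, A_j, lambda_j, A_total = sum(A_j)) {
  A_total + sum(n_j * log(lambda_j) - lambda_j * A_j)
}

# Within-cluster rate that maximizes the likelihood for this fixed partition
# (assumes every cluster contains at least one observation)
lambda_hat = n_j / A_j
pp_loglik(n_j, A_j, lambda_hat)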

So, per learner, we’ll use Gibbs sampling4 to sample from the full conditional distributions:

\[\begin{align} [k^{(t+1)}&|k^{(t)}] \label{kdist}\\ [\underline{\lambda}^{(t+1)}&|\pi^{(t)},\underline{y},k^{(t+1)}] \label{lambdapost}\\ [\pi^{(t+1)}&|\underline{\lambda}^{(t+1)},\underline{y},k^{(t+1)}] \label{partitionpost} \end{align}\]
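
Conditional on the partition, and assuming the Gamma(shape, rate) prior on each \(\lambda_j\) that the simulation below parameterizes through shape_hp and rate_hp, the draw of \(\underline{\lambda}\) is available in closed form by Poisson-Gamma conjugacy, \(\lambda_j \mid \underline{y}, \pi, k \sim \text{Gamma}(\text{shape} + n_j, \text{rate} + |A_j|)\); the partition move is the Metropolis-within-Gibbs step and is not sketched here. A minimal illustration of the conjugate draw (function and argument names are ours):

# Closed-form update for the piecewise constants under a Gamma(shape, rate) prior
draw_lambda = function(n_j, A_j, shape = 1/3, rate = 1e-7) {
  rgamma(length(n_j), shape = shape + n_j, rate = rate + A_j)
}

# One posterior draw of lambda_1, ..., lambda_k given the current partition
set.seed(4)
draw_lambda(n_j = c(40, 75, 60, 25), A_j = c(0.2, 0.35, 0.3, 0.15))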

Computation

# Domain boundary only
bound_gg = ggraph(example_rsf, layout = example_mesh %>%
        st_centroid() %>%
        st_coordinates()) +
  geom_sf(data = boundary_rot)

# Boundary with the Voronoi mesh and its nodes
mesh_gg = ggraph(example_rsf, layout = example_mesh %>%
        st_centroid() %>%
        st_coordinates()) +
  geom_sf(data = boundary_rot) +
  geom_sf(data = example_mesh) +
  geom_node_point()

# Mesh graph with edges; nodes colored by their partition membership
partition_gg = ggraph(example_rsf, layout = example_mesh %>%
        st_centroid() %>%
        st_coordinates()) +
  geom_sf(data = boundary_rot) +
  geom_sf(data = example_mesh) +
  geom_edge_link() +
  geom_node_point(aes(color = Partition)) +
  scale_color_discrete()

# Observed points colored by their cluster membership
data_gg = ggplot(data = example_points) +
  geom_sf(data = boundary_rot) +
  geom_sf(data = example_mesh) +
  geom_point(aes(x, y, color = mem))

grid.arrange(bound_gg, mesh_gg, partition_gg, data_gg, nrow = 2)

Simulation Example

# boundary specifies the data domain (can be constrained)
# N_LEARNERS specifies the number of simultaneous chains in the bagging setup
# MESH_REF specifies how fine the meshes should be on the domain
# data is the observations in the form of coordinates (sf points)

learners = makeVoronoiMeshes(boundary = boundary_rot,
                             N_LEARNERS = 48,
                             MESH_REF = 100) %>%
           makeLearners(data = simulated_data_rot)
# How long to run the chain
MCMC = 5000
# Initial iterations to discard as burn-in while each chain reaches its stationary distribution
BURNIN = 1000
# Sampling interval to reduce autocorrelation
THIN = 20
# Shape and rate for the Gamma prior distribution on the piecewise constants
shape_hp = (1/3)
rate_hp = 0.0000001
# Lambda for truncated Poisson prior on number of clusters
lambda_k_hp = 50
# Where to truncate the Poisson prior on number of clusters
max_clusters = 100

densityOutputRot = densityBASTunconditioned(learners = learners,
                                             MCMC,
                                             BURNIN,
                                             THIN,
                                             shape_hp,
                                             rate_hp,
                                             lambda_k_hp,
                                             max_clusters,
                                             seed = 12345,
                                             data_prior = TRUE,
                                             normalized = TRUE,
                                             subsetting = "dirichlet",
                                             parallel_cores = 12,
                                             detailed_output = FALSE)
# Evaluation grid built from the coordinate columns of the true-density table
grid_points = truth_density_rot[,1:2] %>% 
  as.matrix() %>% 
  st_multipoint() %>% 
  st_sfc() %>% 
  st_cast("POINT") %>% 
  st_sf()
intensity_samples = getDensitySamples(densityOutputRot, 
                                      grid_points, 
                                      intensity = TRUE)
# Normalize each posterior intensity draw so it sums to one over the grid
density_samples = t(apply(intensity_samples, 1, function(x) x/sum(x)))
# Pointwise posterior summaries at each grid location
grid_post_mean = colMeans(density_samples)
grid_post_median = apply(density_samples, 2, median)
grid_post_sd = apply(density_samples, 2, sd)
grid_post_mean_sqerr = (grid_post_mean - truth_density_rot$full_dens)^2
grid_post_mean_abserr = abs(grid_post_mean - truth_density_rot$full_dens)
grid_post_median_sqerr = (grid_post_median - truth_density_rot$full_dens)^2
grid_post_median_abserr = abs(grid_post_median - truth_density_rot$full_dens)

grid_MISE = mean(grid_post_mean_sqerr)
grid_MISE_median = mean(grid_post_median_sqerr)
cat("MISE based on posterior mean:", grid_MISE, "\n")
MISE based on posterior mean: 1.981295e-10 
cat("MISE based on posterior median:", grid_MISE_median, "\n")
MISE based on posterior median: 1.981582e-10 
grid_crps = crps_sample(y = truth_density_rot$full_dens, 
                        dat = t(density_samples))
cat("Mean CRPS:", mean(grid_crps), "\n")
Mean CRPS: 7.645029e-06 
grid_MAE = mean(grid_post_mean_abserr)
grid_MAE_median = mean(grid_post_median_abserr)
cat("MAE based on posterior mean:", grid_MAE, "\n")
MAE based on posterior mean: 8.137434e-06 
cat("MAE based on posterior median:", grid_MAE_median, "\n")
MAE based on posterior median: 8.130804e-06 

grid_df = cbind(truth_density_rot[,1:2], 
                grid_post_mean, 
                grid_post_median, 
                grid_post_sd, 
                grid_post_mean_abserr,
                grid_post_median_abserr,
                grid_post_mean_sqerr,
                grid_post_median_sqerr)
grid_plot = ggplot(grid_df, aes(x = x, y = y))

# True intensity on the evaluation grid
# (optionally overlay the observed points with
#  geom_point(data = as.data.frame(simulated_data_matrix_rotated), size = 0.1))
truth_plot = ggplot(truth_density_rot, aes(x = x, y = y)) +
  geom_point(aes(color = full_dens)) +
  scale_color_viridis_c(limits = c(0, 0.0002)) +
  guides(fill = "none", shape = "none") +
  labs(title = "True Intensity Field",
       color = "True Intensity")

# Posterior median of the estimated intensity
median_plot = grid_plot +
  geom_point(aes(color = grid_post_median)) +
  scale_color_viridis_c(limits = c(0, 0.0002)) +
  labs(title = "Posterior Median Intensity Field",
       color = "Est. Intensity")

# Pointwise squared error of the posterior mean
sqerr_plot = grid_plot +
  geom_point(aes(color = grid_post_mean_sqerr)) +
  scale_color_viridis_c() +
  labs(title = "Posterior Mean - Squared Error",
       color = "SE(Intensity)")

grid.arrange(truth_plot, median_plot, sqerr_plot, ncol = 3)

Real Data Example


Ongoing Research

  • Boosting vs Bagging

    • Currently bagging has worked better, but boosting using thinning properties should be possible
    • Model Averaging
  • Mixture of Experts

  • Better transition kernel for the partition based on graph message passing

  • Line network data

    • The Lebesgue measure is agnostic to the dimension; we just need a reasonable graph construction (a minimal sketch follows this list)
  • Variational Inference for Speed Improvements
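
A rough sketch of the graph construction step for line network data (our illustration using spatstat's simplenet example network and igraph; a minimum spanning tree stands in here for the random spanning trees used by the model):

library(spatstat)
library(igraph)

net = simplenet                 # a linnet object: vertices joined by line segments
pts = rpoislpp(20, net)         # a Poisson point pattern living on the network

# Graph whose vertices are the network vertices and whose edges are the segments,
# weighted by segment length (the one-dimensional Lebesgue measure)
g = graph_from_edgelist(cbind(net$from, net$to), directed = FALSE)
E(g)$weight = lengths_psp(net$lines)

# A spanning tree of the network graph (minimum-weight, for illustration)
tree = mst(g)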

Conclusion

  • The new method is robust to constrained point patterns

  • It provides uncertainty quantification through its Bayesian framework

    • This allows for better estimation of the discontinuities in the intensity function

    • The method needs data along the boundary to capture this effectively

  • Different meta-model structures are reasonable and possible


References

Baddeley, Adrian, Gopalan Nair, Suman Rakshit, Greg McSwiggan, and Tilman M Davies. 2020. “Analysing Point Patterns on Networks—a Review.” Spatial Statistics, 100435.
Ferraccioli, Federico, Eleonora Arnone, Livio Finos, James O Ramsay, and Laura M Sangalli. 2021. “Nonparametric Density Estimation over Complicated Domains.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 83 (2): 346–68.
Gu, Chong, and Chunfu Qiu. 1993. “Smoothing Spline Density Estimation: Theory.” The Annals of Statistics 21 (1): 217–34.
Moradi, M Mehdi, Ottmar Cronie, Ege Rubak, Raphael Lachieze-Rey, Jorge Mateu, and Adrian Baddeley. 2019. “Resample-Smoothing of Voronoi Intensity Estimators.” Statistics and Computing 29 (5): 995–1010.
Payne, Richard D, Nilabja Guha, Yu Ding, and Bani K Mallick. 2020. “A Conditional Density Estimation Partition Model Using Logistic Gaussian Processes.” Biometrika 107 (1): 173–90.
Wand, Matt P, and M Chris Jones. 1994. Kernel Smoothing. CRC press.

Footnotes

  1. (Wand and Jones 1994)↩︎

  2. (Payne et al. 2020)↩︎

  3. (Gu and Qiu 1993)↩︎

  4. We have to do Metropolis-Within-Gibbs because the posterior of the partition is not available in closed form↩︎