Constrained Spatial Point Process Intensity Estimation using Ensembles of Spanning Trees

Authors

Huiyan Sang, Isaac Ray, Ligang Lu

library(BASTION)
library(igraph)
library(ggraph)
library(tidyverse)
library(ggpubr)
library(sf)
library(sp)
library(sn)
library(fdaPDE)
library(mgcv)
library(doParallel)
library(foreach)
library(doRNG)
library(mvtnorm)
library(TruncatedNormal)
library(scoringRules)
library(spatstat)
library(gridExtra)

Introduction

  • Analysis of spatial point patterns (event occurrences) is needed for many applications

  • A reasonable starting point is a nonparametric estimate of the first-order moment (the intensity function)

  • The validity of the most popular approaches (kernel methods1, LGCP2, spline methods3) depends on a Euclidean domain assumption

  • Many data sets are collected on a complex domain (e.g., road/pipeline networks, domains with interior holes, land separated by water)

  • Many point processes may have sharp discontinuities in their intensity

Intensity Estimation

  • We assume we can characterize our point pattern by an inhomogeneous Poisson process

  • Using our data, we want to estimate the function \(\lambda(s)\) on our entire domain

    • \(\lambda(s)\) may not be globally smooth
  • Since \(\lambda(s)=\lim_{\nu(ds)\rightarrow 0}\frac{E(N(ds))}{\nu(ds)}\), we can treat \(\lambda(s)\nu(ds)\) as the probability of an event occurring in the ‘infinitesimal’ set \(ds\)

    • So, in some sense we want to ‘collapse’ \(ds\) at every point \(s\) in our domain while tracking \(N(ds)\) to estimate \(\lambda(s)\)
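
To make this limit concrete, the small simulation below (our illustration, not part of the BASTION workflow) uses spatstat to repeatedly generate an inhomogeneous Poisson pattern from a hypothetical intensity surface and approximates \(\lambda(s_0)\) by the average of \(N(ds)/\nu(ds)\) over shrinking disks \(ds\) around a fixed point \(s_0\); the intensity function and all constants are assumptions made purely for illustration.

library(spatstat)
set.seed(1)

# A hypothetical intensity surface on the unit square (illustration only)
lambda_true = function(x, y) 200 * exp(-3 * ((x - 0.3)^2 + (y - 0.7)^2)) + 20

# Approximate lambda(s0) by E(N(ds)) / nu(ds) for shrinking disks ds around s0
s0 = c(0.5, 0.5)
for (r in c(0.2, 0.1, 0.05)) {
  counts = replicate(200, {
    pp = rpoispp(lambda_true, win = owin(c(0, 1), c(0, 1)))
    sum((pp$x - s0[1])^2 + (pp$y - s0[2])^2 < r^2)
  })
  cat("r =", r,
      " mean(N(ds)) / nu(ds) =", mean(counts) / (pi * r^2),
      " true lambda(s0) =", lambda_true(s0[1], s0[2]), "\n")
}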

Existing Literature

Additive Tree Based Approach

  • Idea: Use the Bayesian Random Spanning Tree Partition model framework instead of directly using Voronoi cells

  • Model \(\lambda(s) = \sum_{m=1}^M \lambda_m(s)\)

  • Assume that \(\lambda_m(s)\) is piecewise constant on the domain

    • The constant pieces (and their values) vary with \(m\)

    • This greatly simplifies the normally challenging integral: on each piece, \(\int_{A_j}\lambda_m(s)\, ds = \lambda_{m,j}\nu(A_j)\), where \(\lambda_{m,j}\) is the constant value of \(\lambda_m\) on \(A_j\) (see the toy sketch after this list)

    • We’ll call each \(\lambda_m(s)\) a ‘learner’

  • Across the MCMC iterations, we merge and split the piecewise-constant regions to obtain flexible, spatially contiguous partitioning
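
As a toy illustration of this additive structure (a sketch under our own assumptions, not BASTION's actual construction), the code below sums \(M\) piecewise-constant learners over a grid of locations, using kmeans partitions of the coordinates as a crude stand-in for the spatially contiguous spanning-tree partitions; every name and constant is hypothetical.

set.seed(2)
# Grid of evaluation locations on the unit square
grid = expand.grid(x = seq(0, 1, length.out = 50), y = seq(0, 1, length.out = 50))

M = 10                       # number of learners
lambda_total = rep(0, nrow(grid))
for (m in 1:M) {
  # Each learner: its own partition of the grid and one non-negative constant per piece
  k_m = sample(3:8, 1)
  piece = kmeans(grid, centers = k_m)$cluster
  lambda_total = lambda_total + rgamma(k_m, shape = 1, rate = 0.1)[piece]
}
# lambda_total now holds a flexible intensity built from piecewise-constant learners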

Model Description

  • Data \(y = \{y_1,\dots,y_N\}\) lying in the constrained domain \(A\)

  • Partition \(\pi = \{A_1,\dots,A_k\}\) of our domain into \(k\) contiguous subregions \(A_j\) for \(j \in \{1,\dots,k\}\)

    • Each one of these subregions is a cluster
  • \(|A_j|\) is the Lebesgue measure \(\nu(A_j)\) of cluster \(A_j\)

    • For most problems this is just an area, but it could be a length or a volume
  • \(n_j = \sum_{i=1}^N I(y_i \in A_j)\) is the number of observations in each cluster (computed concretely in the sketch after the likelihood below)


  • Our likelihood is
\[\begin{equation} \label{jointlik} \left[\underline{y}|\underline{\lambda}, \pi, k\right] = e^{|A|}\prod_{j=1}^k e^{-\lambda_j|A_j|}\lambda_j^{n_j} \end{equation}\] \[\begin{equation} \label{jointloglik} \log\left[\underline{y}|\underline{\lambda}, \pi, k\right] = |A| + \sum_{j=1}^k \left(n_j\log \lambda_j - \lambda_j|A_j|\right) \end{equation}\]
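
The sketch below (our own self-contained illustration with hypothetical toy data; only the formula itself comes from the model) builds a small partition of the unit square with sf, computes \(n_j\) and \(|A_j|\), and evaluates the log-likelihood above. For a fixed partition, each term \(n_j\log\lambda_j - \lambda_j|A_j|\) is maximized at \(\hat{\lambda}_j = n_j/|A_j|\).

library(sf)
set.seed(3)

# Toy domain: the unit square, split into 4 clusters by a Voronoi tessellation of
# random seed points (illustration only; BASTION builds partitions from spanning trees)
domain = st_sfc(st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)))))
seeds = st_sfc(lapply(1:4, function(i) st_point(runif(2))))
partition_sf = st_intersection(
  st_collection_extract(st_voronoi(st_union(seeds)), "POLYGON"),
  domain)

# Toy observations
points_sf = st_sfc(lapply(1:200, function(i) st_point(runif(2))))

# |A_j|: Lebesgue measure of each cluster; n_j: observations per cluster
A_j = as.numeric(st_area(partition_sf))
n_j = lengths(st_intersects(partition_sf, points_sf))

# Log-likelihood above as a function of (n_j, |A_j|, lambda_j)
pp_loglik = function(n_j, A_j, lambda_j, A_total = sum(A_j)) {
  A_total + sum(n_j * log(lambda_j) - lambda_j * A_j)
}

# Within-cluster rate that maximizes the likelihood for this fixed partition
# (assumes every cluster contains at least one observation)
lambda_hat = n_j / A_j
pp_loglik(n_j, A_j, lambda_hat)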

So, per learner, we’ll use Gibbs sampling4 to sample from the full conditional distributions:

\[\begin{align} [k^{(t+1)}&|k^{(t)}] \label{kdist}\\ [\underline{\lambda}^{(t+1)}&|\pi^{(t)},\underline{y},k^{(t+1)}] \label{lambdapost}\\ [\pi^{(t+1)}&|\underline{\lambda}^{(t+1)},\underline{y},k^{(t+1)}] \label{partitionpost} \end{align}\]
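
Conditional on the partition, and assuming the Gamma(shape, rate) prior on each \(\lambda_j\) that the simulation below parameterizes through shape_hp and rate_hp, the draw of \(\underline{\lambda}\) is available in closed form by Poisson-Gamma conjugacy, \(\lambda_j \mid \underline{y}, \pi, k \sim \text{Gamma}(\text{shape} + n_j, \text{rate} + |A_j|)\); the partition move is the Metropolis-within-Gibbs step and is not sketched here. A minimal illustration of the conjugate draw (function and argument names are ours):

# Closed-form update for the piecewise constants under a Gamma(shape, rate) prior
draw_lambda = function(n_j, A_j, shape = 1/3, rate = 1e-7) {
  rgamma(length(n_j), shape = shape + n_j, rate = rate + A_j)
}

# One posterior draw of lambda_1, ..., lambda_k given the current partition
set.seed(4)
draw_lambda(n_j = c(40, 75, 60, 25), A_j = c(0.2, 0.35, 0.3, 0.15))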

Computation

# Domain boundary only
bound_gg = ggraph(example_rsf, layout = example_mesh %>%
        st_centroid() %>%
        st_coordinates()) +
  geom_sf(data = boundary_rot)

# Boundary with the Voronoi mesh and its nodes
mesh_gg = ggraph(example_rsf, layout = example_mesh %>%
        st_centroid() %>%
        st_coordinates()) +
  geom_sf(data = boundary_rot) +
  geom_sf(data = example_mesh) +
  geom_node_point()

# Mesh graph with edges; nodes colored by their partition membership
partition_gg = ggraph(example_rsf, layout = example_mesh %>%
        st_centroid() %>%
        st_coordinates()) +
  geom_sf(data = boundary_rot) +
  geom_sf(data = example_mesh) +
  geom_edge_link() +
  geom_node_point(aes(color = Partition)) +
  scale_color_discrete()

# Observed points colored by their cluster membership
data_gg = ggplot(data = example_points) +
  geom_sf(data = boundary_rot) +
  geom_sf(data = example_mesh) +
  geom_point(aes(x, y, color = mem))

grid.arrange(bound_gg, mesh_gg, partition_gg, data_gg, nrow = 2)

Simulation Example

# boundary specifies the data domain (can be constrained)
# N_LEARNERS specifies the number of simultaneous chains in the bagging setup
# MESH_REF specifies how fine the meshes should be on the domain
# data is the observations in the form of coordinates (sf points)

learners = makeVoronoiMeshes(boundary = boundary_rot,
                             N_LEARNERS = 48,
                             MESH_REF = 100) %>%
           makeLearners(data = simulated_data_rot)
# How long to run the chain
MCMC = 5000
# Initial iterations to discard as burn-in while each chain reaches its stationary distribution
BURNIN = 1000
# Sampling interval to reduce autocorrelation
THIN = 20
# Shape and rate for the Gamma prior distribution on the piecewise constants
shape_hp = (1/3)
rate_hp = 0.0000001
# Lambda for truncated Poisson prior on number of clusters
lambda_k_hp = 50
# Where to truncate the Poisson prior on number of clusters
max_clusters = 100

densityOutputRot = densityBASTunconditioned(learners = learners,
                                             MCMC,
                                             BURNIN,
                                             THIN,
                                             shape_hp,
                                             rate_hp,
                                             lambda_k_hp,
                                             max_clusters,
                                             seed = 12345,
                                             data_prior = TRUE,
                                             normalized = TRUE,
                                             subsetting = "dirichlet",
                                             parallel_cores = 12,
                                             detailed_output = FALSE)
# Evaluation grid built from the coordinate columns of the true-density table
grid_points = truth_density_rot[,1:2] %>% 
  as.matrix() %>% 
  st_multipoint() %>% 
  st_sfc() %>% 
  st_cast("POINT") %>% 
  st_sf()
intensity_samples = getDensitySamples(densityOutputRot, 
                                      grid_points, 
                                      intensity = TRUE)
# Normalize each posterior intensity draw so it sums to one over the grid
density_samples = t(apply(intensity_samples, 1, function(x) x/sum(x)))
# Pointwise posterior summaries at each grid location
grid_post_mean = colMeans(density_samples)
grid_post_median = apply(density_samples, 2, median)
grid_post_sd = apply(density_samples, 2, sd)
grid_post_mean_sqerr = (grid_post_mean - truth_density_rot$full_dens)^2
grid_post_mean_abserr = abs(grid_post_mean - truth_density_rot$full_dens)
grid_post_median_sqerr = (grid_post_median - truth_density_rot$full_dens)^2
grid_post_median_abserr = abs(grid_post_median - truth_density_rot$full_dens)

grid_MISE = mean(grid_post_mean_sqerr)
grid_MISE_median = mean(grid_post_median_sqerr)
cat("MISE based on posterior mean:", grid_MISE, "\n")
MISE based on posterior mean: 1.981295e-10 
cat("MISE based on posterior median:", grid_MISE_median, "\n")
MISE based on posterior median: 1.981582e-10 
grid_crps = crps_sample(y = truth_density_rot$full_dens, 
                        dat = t(density_samples))
cat("Mean CRPS:", mean(grid_crps), "\n")
Mean CRPS: 7.645029e-06 
grid_MAE = mean(grid_post_mean_abserr)
grid_MAE_median = mean(grid_post_median_abserr)
cat("MAE based on posterior mean:", grid_MAE, "\n")
MAE based on posterior mean: 8.137434e-06 
cat("MAE based on posterior median:", grid_MAE_median, "\n")
MAE based on posterior median: 8.130804e-06 

grid_df = cbind(truth_density_rot[,1:2], 
                grid_post_mean, 
                grid_post_median, 
                grid_post_sd, 
                grid_post_mean_abserr,
                grid_post_median_abserr,
                grid_post_mean_sqerr,
                grid_post_median_sqerr)
grid_plot = ggplot(grid_df, aes(x = x, y = y))

# True intensity on the evaluation grid
# (optionally overlay the observed points with
#  geom_point(data = as.data.frame(simulated_data_matrix_rotated), size = 0.1))
truth_plot = ggplot(truth_density_rot, aes(x = x, y = y)) +
  geom_point(aes(color = full_dens)) +
  scale_color_viridis_c(limits = c(0, 0.0002)) +
  guides(fill = "none", shape = "none") +
  labs(title = "True Intensity Field",
       color = "True Intensity")

# Posterior median of the estimated intensity
median_plot = grid_plot +
  geom_point(aes(color = grid_post_median)) +
  scale_color_viridis_c(limits = c(0, 0.0002)) +
  labs(title = "Posterior Median Intensity Field",
       color = "Est. Intensity")

# Pointwise squared error of the posterior mean
sqerr_plot = grid_plot +
  geom_point(aes(color = grid_post_mean_sqerr)) +
  scale_color_viridis_c() +
  labs(title = "Posterior Mean - Squared Error",
       color = "SE(Intensity)")

grid.arrange(truth_plot, median_plot, sqerr_plot, ncol = 3)

Real Data Example


Ongoing Research

  • Boosting vs Bagging

    • Currently bagging has worked better, but boosting using thinning properties should be possible
    • Model Averaging
  • Mixture of Experts

  • Better transition kernel for the partition based on graph message passing

  • Line network data

    • The Lebesgue measure is agnostic to the dimension; we just need a reasonable graph construction (a minimal sketch follows this list)
  • Variational Inference for Speed Improvements
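
A rough sketch of the graph construction step for line network data (our illustration using spatstat's simplenet example network and igraph; a minimum spanning tree stands in here for the random spanning trees used by the model):

library(spatstat)
library(igraph)

net = simplenet                 # a linnet object: vertices joined by line segments
pts = rpoislpp(20, net)         # a Poisson point pattern living on the network

# Graph whose vertices are the network vertices and whose edges are the segments,
# weighted by segment length (the one-dimensional Lebesgue measure)
g = graph_from_edgelist(cbind(net$from, net$to), directed = FALSE)
E(g)$weight = lengths_psp(net$lines)

# A spanning tree of the network graph (minimum-weight, for illustration)
tree = mst(g)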

Conclusion

  • The new method is robust to constrained point patterns

  • It provides uncertainty quantification through its Bayesian framework

    • This allows for better estimation of the discontinuities in the intensity function

    • The method needs data along the boundary to capture this effectively

  • Different meta-model structures are reasonable and possible


References

Baddeley, Adrian, Gopalan Nair, Suman Rakshit, Greg McSwiggan, and Tilman M Davies. 2020. “Analysing Point Patterns on Networks—a Review.” Spatial Statistics, 100435.
Ferraccioli, Federico, Eleonora Arnone, Livio Finos, James O Ramsay, and Laura M Sangalli. 2021. “Nonparametric Density Estimation over Complicated Domains.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 83 (2): 346–68.
Gu, Chong, and Chunfu Qiu. 1993. “Smoothing Spline Density Estimation: Theory.” The Annals of Statistics 21 (1): 217–34.
Moradi, M Mehdi, Ottmar Cronie, Ege Rubak, Raphael Lachieze-Rey, Jorge Mateu, and Adrian Baddeley. 2019. “Resample-Smoothing of Voronoi Intensity Estimators.” Statistics and Computing 29 (5): 995–1010.
Payne, Richard D, Nilabja Guha, Yu Ding, and Bani K Mallick. 2020. “A Conditional Density Estimation Partition Model Using Logistic Gaussian Processes.” Biometrika 107 (1): 173–90.
Wand, Matt P, and M Chris Jones. 1994. Kernel Smoothing. CRC press.

Footnotes

  1. (Wand and Jones 1994)↩︎

  2. (Payne et al. 2020)↩︎

  3. (Gu and Qiu 1993)↩︎

  4. We have to do Metropolis-Within-Gibbs because the posterior of the partition is not available in closed form↩︎