Motivation

Modern wildlife monitoring rarely relies on a single data source. Systematic line transect surveys, opportunistic sightings, acoustic detections, and telemetry data are increasingly collected in parallel — each providing partial, imperfect information about the same underlying animal population. Data integration, combining these sources into a single coherent model, is therefore a central methodological challenge in ecological statistics.

The Log-Gaussian Cox Process (LGCP) framework provides the most principled approach to this problem for detection-type data. The core idea, as implemented in the inlabru R package (Bachl et al.  2019), is that each data source can be treated as an independent thinned realisation of the same latent spatial intensity surface — with its own thinning mechanism reflecting its specific observation process (detection function, survey effort, observer behaviour). Multiple data sources are combined simply by adding additional likelihoods to the model, each linked to the shared latent field. This provides a statistically coherent way to borrow strength across data sources while propagating uncertainty from each.

Despite its theoretical appeal, the LGCP framework has seen limited uptake among marine mammal researchers compared to conventional density surface modelling (DSM; Miller et al. 2013). A key practical barrier is that many marine mammals are not solitary animals. They are detected as groups, and those groups vary enormously in size: from a handful of individuals to aggregations of hundreds or thousands. A standard LGCP models the intensity of detection events (one point per sighting regardless of group size) and therefore estimates the spatial distribution of groups, not animals. Converting this to animal density requires explicitly modelling group size as part of the statistical framework, which introduces substantial additional complexity: the group size distribution is typically overdispersed, may vary spatially, and may be confounded with the detection process itself (larger groups being visible at greater distances — so-called size bias). Addressing this gap and making the LGCP framework tractable for marine species is therefore a necessary step toward broader adoption of principled data integration methods in marine mammal science.

This document demonstrates the LGCP framework applied to a single data source (systematic line transect surveys of pantropical dolphins in the Gulf of Mexico) as a foundation for multi-source integration. The standard spatial LGCP for this dataset is documented in the inlabru distance sampling vignette (Borchers & Lindgren, 2026; https://inlabru-org.github.io/inlabru/articles/2d_lgcp_distancesampling.html) and the original inlabru paper (Bachl et al., 2019), which we take as the starting point. We extend that standard model to a marked point process that jointly models group locations and group sizes, enabling estimation of animal density (animals/km²) rather than just group intensity (groups/km²).


Overview

This document implements a marked Log-Gaussian Cox Process (LGCP) for estimating dolphin density from line transect distance sampling data. The mexdolphin_sf dataset (inlabru) contains 47 detected dolphin groups in the Gulf of Mexico, each with a perpendicular detection distance and group size.

The core problem: groups vs animals

Standard LGCP models the spatial intensity of detected groups: it tells you where groups are, not how many animals there are. To estimate animal density we need to account for group size, which varies spatially. This requires a marked point process; treating group size as a “mark” attached to each detected location.

Modelling framework

We write:

\[D(\mathbf{s}) = \underbrace{\lambda_{\text{groups}}(\mathbf{s})}_{\text{Likelihood 1: CP}} \times \underbrace{\mu(\mathbf{s})}_{\text{Likelihood 2: NegBin}}\]

where \(D(\mathbf{s})\) is animal density (animals/km²), \(\lambda_{\text{groups}}(\mathbf{s})\) is group intensity (groups/km²) corrected for imperfect detection, and \(\mu(\mathbf{s})\) is expected group size at location \(\mathbf{s}\).

Key assumptions

These (problematic) assumptions are required for the two-likelihood factorisation:

Assumption 1 — No size bias: Detection probability \(g(y)\) depends only on perpendicular distance, not on group size. In reality, larger dolphin groups may be detectable at greater distances. Without this assumption the factorisation breaks down: if larger groups are more detectable, the fitted size field \(\zeta(\mathbf{s})\) absorbs detectability variation rather than true spatial variation in group size — the mark distribution and the thinning process become confounded. The approach of Houldcroft et al. (2024) resolves this by modelling \(g(y, m)\) jointly, at additional computational cost (that becomes unfeasable in this example)

Assumption 2 — Independent spatial fields (\(\phi = 0\)): The spatial process driving group intensity and the spatial process driving group size are completely independent. There is no shared latent field. Areas with more groups are not assumed to have larger (or smaller) groups.

Assumption 3 — Conditional independence of group sizes: The NegBin likelihood treats observed group sizes as conditionally independent given the spatial field \(\zeta(\mathbf{s})\). This requires that groups form independently of one another: that the size of one group does not depend on the sizes or locations of neighbouring groups. In reality this is unlikely: cetacean groups arise from an underlying animal-level aggregation process in which the same individuals cannot belong to multiple groups simultaneously. A group of 200 animals at one location removes those 200 animals from the pool available to form groups elsewhere. Properly accounting for this would require modelling the animal-level point process directly rather than the group-level process


Setup

library(inlabru)
library(fmesher)
library(INLA)
library(ggplot2)
library(patchwork)

bru_options_set(inla.mode = "experimental")

Data exploration

mexdolphin <- mexdolphin_sf
mesh       <- mexdolphin$mesh
W          <- 8  # truncation distance (km)

n_det  <- nrow(mexdolphin$points)
gs     <- mexdolphin$points$size
gs_mean <- round(mean(gs), 1)
gs_var  <- round(var(gs), 1)
gs_vm   <- round(gs_var / gs_mean, 1)

The dataset contains 47 detected groups with a truncation distance of W = 8 km. Group sizes range from 15 to 650 animals with a mean of 98.9 and variance/mean ratio of 113.9 — confirming strong overdispersion.

Survey design

The survey covers the Gulf of Mexico with a series of line transects.

ggplot() +
  gg(mexdolphin$ppoly) +
  gg(mexdolphin$samplers, color = "grey") +
  gg(mexdolphin$points, size = 0.5, alpha = 0.8) +
  coord_sf(datum = fm_crs(mexdolphin$ppoly)) +
  labs(title = "Survey design: transects and detections",
       x = NULL, y = NULL) +
  theme_bw()
Survey transects (grey) and detected groups (black points).

Survey transects (grey) and detected groups (black points).

Perpendicular distances

The distance histogram should show declining detections with increasing distance from the transect — a key assumption of distance sampling.

ggplot(mexdolphin$points) +
  geom_histogram(aes(x = distance),
                 breaks = seq(0, W, length.out = 9),
                 boundary = 0, fill = "steelblue",
                 color = "white", alpha = 0.8) +
  geom_point(aes(x = distance), y = 0, pch = "|", cex = 4, color = "black") +
  labs(title = "Perpendicular distance histogram",
       x = "Distance from transect (km)", y = "Count") +
  theme_bw()
Histogram of perpendicular detection distances. Tick marks show individual observations.

Histogram of perpendicular detection distances. Tick marks show individual observations.

Group size distribution

Key finding: The variance/mean ratio is 113.9, massively exceeding 1. This confirms extreme overdispersion relative to a Poisson distribution and justifies using the Negative Binomial distribution for group sizes rather than Poisson.

p1 <- ggplot(data.frame(mexdolphin$points)) +
  geom_histogram(aes(x = size), bins = 25, fill = "steelblue",
                 color = "white", boundary = 0) +
  labs(title = "Group size distribution",
       x = "Group size (animals)", y = "Count") +
  theme_bw()

p2 <- ggplot() +
  gg(mexdolphin$ppoly) +
  gg(mexdolphin$samplers, color = "grey") +
  gg(mexdolphin$points, aes(col = size), size = 2, alpha = 0.8) +
  scale_color_viridis_c(option = "plasma", name = "Group\nsize") +
  coord_sf(datum = fm_crs(mexdolphin$ppoly)) +
  labs(title = "Detected groups by size", x = NULL, y = NULL) +
  theme_bw()

p1 + p2
Left: raw group size distribution (right-skewed). Right: groups coloured by size overlaid on study region.

Left: raw group size distribution (right-skewed). Right: groups coloured by size overlaid on study region.


Shared model components

# Half-normal detection function
# g(y) = exp(-0.5 * (y/sigma)^2)
# sigma estimated from data — prior is Exponential(1/8) on sigma scale
hn <- function(distance, sigma) {
  exp(-0.5 * (distance / sigma)^2)
}

# Matern SPDE for group intensity — penalised complexity priors
# P(range < 50 km) = 0.01  -> allows large-scale spatial structure
# P(sigma > 2)     = 0.01  -> allows substantial spatial variation
matern <- inla.spde2.pcmatern(
  mexdolphin$mesh,
  prior.sigma = c(2, 0.01),
  prior.range = c(50, 0.01)
)

# Independent Matern SPDE for group size
# Slightly tighter sigma prior (1 vs 2) — group size variation less extreme
matern_size <- inla.spde2.pcmatern(
  mexdolphin$mesh,
  prior.sigma = c(1, 0.01),
  prior.range = c(50, 0.01)
)

Model 1: Standard LGCP (group intensity only)

This model treats each detection as a single point event — group size is ignored. It gives the spatial intensity of groups, not animals. This is the standard inlabru distance sampling LGCP as in Bachl et al. (2019).

The observation model is a thinned Poisson point process:

\[\lambda_{\text{obs}}(\mathbf{s}, y) = \lambda_{\text{groups}}(\mathbf{s}) \cdot g(y)\]

where \(g(y) = \exp(-y^2 / 2\sigma^2)\) is the half-normal detection function.

\[\log \lambda_{\text{groups}}(\mathbf{s}) = \alpha + \xi(\mathbf{s})\]

with \(\alpha\) an intercept and \(\xi(\mathbf{s})\) a Matérn SPDE random field.

cmp_groups <- ~ mySPDE(main = geometry, model = matern) +
  sigma(1,
        prec.linear = 1,
        marginal = bm_marginal(qexp, pexp, dexp, rate = 1 / 8)
  ) +
  Intercept(1)

form_groups <- geometry + distance ~ mySPDE +
  log(hn(distance, sigma)) +
  Intercept + log(2)

fit_groups <- lgcp(
  components = cmp_groups,
  mexdolphin$points,
  samplers = mexdolphin$samplers,
  domain = list(
    geometry = mesh,
    distance = fm_mesh_1d(seq(0, W, length.out = 30))
  ),
  formula = form_groups
)
summary(fit_groups)
## inlabru version: 2.13.0.9032 
## INLA version: 24.12.11 
## Latent components:
## mySPDE: main = spde(geometry)
## sigma: main = linear(1), marginal(NULL)
## Intercept: main = linear(1)
## Observation models:
##   Model tag: <No tag>
##     Family: 'cp'
##     Data class: 'sf', 'tbl_df', 'tbl', 'data.frame'
##     Response class: 'numeric'
##     Predictor: geometry + distance ~ mySPDE + log(hn(distance, sigma)) + Intercept + log(2)
##     Additive/Linear/Rowwise: FALSE/FALSE/TRUE
##     Used components: effect[mySPDE, sigma, Intercept], latent[] 
## Time used:
##     Pre = 0.293, Running = 1.61, Post = 0.858, Total = 2.76 
## Fixed effects:
##             mean    sd 0.025quant 0.5quant 0.975quant   mode   kld
## sigma     -0.050 0.209     -0.461   -0.050      0.360 -0.050 0.000
## Intercept -8.156 0.494     -9.287   -8.107     -7.339 -8.034 0.004
## 
## Random effects:
##   Name     Model
##     mySPDE SPDE2 model
## 
## Model hyperparameters:
##                    mean      sd 0.025quant 0.5quant 0.975quant    mode
## Range for mySPDE 295.50 147.305    110.592  262.948     673.84 209.846
## Stdev for mySPDE   0.81   0.251      0.416    0.777       1.39   0.717
## 
## Marginal log-Likelihood:  -390.35 
##  is computed 
## Posterior summaries for the linear predictor and the fitted values are computed
## (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')

Detection function

distdf   <- data.frame(distance = seq(0, W, length.out = 200))
det_pred <- predict(fit_groups, distdf, ~ hn(distance, sigma))

ggplot(det_pred) +
  geom_ribbon(aes(x = distance, ymin = q0.025, ymax = q0.975),
              fill = "steelblue", alpha = 0.3) +
  geom_line(aes(x = distance, y = mean),
            color = "steelblue", linewidth = 1) +
  geom_hline(yintercept = 0.5, linetype = "dashed", color = "grey50") +
  labs(
    x = "Perpendicular distance (km)",
    y = "Detection probability g(y)"
  ) +
  ylim(0, 1) +
  theme_bw()
Estimated half-normal detection function with 95% posterior credible interval (shaded).

Estimated half-normal detection function with 95% posterior credible interval (shaded).

Group intensity surface

pxl <- fm_pixels(mesh, dims = c(200, 100), mask = mexdolphin$ppoly)

pr.groups <- predict(fit_groups, pxl, ~ exp(mySPDE + Intercept))

ggplot() +
  gg(pr.groups, geom = "tile") +
  gg(mexdolphin$ppoly, linewidth = 1, alpha = 0) +
  gg(mexdolphin$samplers, color = "grey", linewidth = 0.3) +
  gg(mexdolphin$points, size = 0.5, alpha = 0.8) +
  scale_fill_viridis_c(option = "viridis", name = "Groups\n/km\u00b2") +
  labs(title = "Model 1: Group intensity surface (groups/km\u00b2)",
       x = NULL, y = NULL) +
  theme_bw()
Posterior mean group intensity surface (groups/km²) from the standard LGCP.

Posterior mean group intensity surface (groups/km²) from the standard LGCP.


Model 2: Marked LGCP (group intensity × group size)

This extends Model 1 with a second likelihood for group sizes, allowing us to estimate animal density rather than just group intensity.

Model specification

Likelihood 1 (CP): Identical to Model 1 — thinned point process for group locations with half-normal detection.

Likelihood 2 (NegBin): Group sizes modelled as Negative Binomial with spatially varying mean:

\[m_i \mid \mathbf{s}_i \sim \text{NegBin}(\mu(\mathbf{s}_i), \kappa)\]

\[\log \mu(\mathbf{s}) = \gamma + \zeta(\mathbf{s})\]

where \(\gamma\) is an intercept, \(\zeta(\mathbf{s})\) is an independent Matérn SPDE random field (not shared with the intensity field — see Assumption 2), and \(\kappa\) is the NegBin overdispersion parameter.

Animal density surface:

\[D(\mathbf{s}) = \lambda_{\text{groups}}(\mathbf{s}) \cdot \mu(\mathbf{s})\]

with units animals/km².

# Two completely independent spatial fields — no sharing, no copy mechanism
cmp_marked <- ~ mySPDE(main = geometry, model = matern) +
  gp_mySPDE(main = geometry, model = matern_size) +
  sigma(1,
        prec.linear = 1,
        marginal = bm_marginal(qexp, pexp, dexp, rate = 1 / 8)
  ) +
  Intercept(1) +
  gp_Intercept(1)

fit_marked <- bru(
  components = cmp_marked,

  # Likelihood 1: thinned point process for group locations
  like(
    formula = geometry + distance ~ mySPDE +
      log(hn(distance, sigma)) +
      Intercept + log(2),
    family = "cp",
    data = mexdolphin$points,
    samplers = mexdolphin$samplers,
    domain = list(
      geometry = mesh,
      distance = fm_mesh_1d(seq(0, W, length.out = 30))
    )
  ),

  # Likelihood 2: NegBin for group sizes
  like(
    formula = size ~ gp_Intercept + gp_mySPDE,
    family = "nbinomial",
    data = mexdolphin$points,
    domain = list(geometry = mesh)
  ),

  options = bru_options(
    bru_verbose  = FALSE,
    bru_method   = list(rel_tol = 0.15),
    control.compute = list(dic = TRUE)
  )
)
summary(fit_marked)
## inlabru version: 2.13.0.9032 
## INLA version: 24.12.11 
## Latent components:
## mySPDE: main = spde(geometry)
## sigma: main = linear(1), marginal(NULL)
## Intercept: main = linear(1)
## gp_mySPDE: main = spde(geometry)
## gp_Intercept: main = linear(1)
## Observation models:
##   Model tag: <No tag>
##     Family: 'cp'
##     Data class: 'sf', 'tbl_df', 'tbl', 'data.frame'
##     Response class: 'numeric'
##     Predictor: geometry + distance ~ mySPDE + log(hn(distance, sigma)) + Intercept + log(2)
##     Additive/Linear/Rowwise: FALSE/FALSE/TRUE
##     Used components: effect[mySPDE, sigma, Intercept], latent[] 
##   Model tag: <No tag>
##     Family: 'nbinomial'
##     Data class: 'sf', 'data.frame'
##     Response class: 'numeric'
##     Predictor: size ~ gp_Intercept + gp_mySPDE
##     Additive/Linear/Rowwise: TRUE/TRUE/TRUE
##     Used components: effect[gp_mySPDE, gp_Intercept], latent[] 
## Time used:
##     Pre = 0.633, Running = 1.87, Post = 0.662, Total = 3.17 
## Fixed effects:
##                mean    sd 0.025quant 0.5quant 0.975quant  mode kld
## sigma        -0.050 0.209     -0.461   -0.050      0.360 -0.05   0
## Intercept    -8.178 0.514     -9.416   -8.112     -7.356 -7.96   0
## gp_Intercept  4.478 0.248      3.971    4.481      4.966  4.48   0
## 
## Random effects:
##   Name     Model
##     mySPDE SPDE2 model
##    gp_mySPDE SPDE2 model
## 
## Model hyperparameters:
##                                                              mean      sd
## size for the nbinomial observations (1/overdispersion)[2]   4.059   1.206
## Range for mySPDE                                          295.499 147.306
## Stdev for mySPDE                                            0.810   0.251
## Range for gp_mySPDE                                       155.218  76.080
## Stdev for gp_mySPDE                                         0.735   0.150
##                                                           0.025quant 0.5quant
## size for the nbinomial observations (1/overdispersion)[2]      2.175    3.897
## Range for mySPDE                                             110.587  262.946
## Stdev for mySPDE                                               0.416    0.777
## Range for gp_mySPDE                                           57.633  138.918
## Stdev for gp_mySPDE                                            0.483    0.720
##                                                           0.975quant    mode
## size for the nbinomial observations (1/overdispersion)[2]       6.88   3.596
## Range for mySPDE                                              673.84 209.844
## Stdev for mySPDE                                                1.39   0.717
## Range for gp_mySPDE                                           349.09 111.797
## Stdev for gp_mySPDE                                             1.07   0.690
## 
## Deviance Information Criterion (DIC) ...............: 1257.89
## Deviance Information Criterion (DIC, saturated) ....: 24365.61
## Effective number of parameters .....................: 34.56
## 
## Marginal log-Likelihood:  -655.16 
##  is computed 
## Posterior summaries for the linear predictor and the fitted values are computed
## (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')

Spatial predictions

pxl <- fm_pixels(mesh, dims = c(200, 100), mask = mexdolphin$ppoly)

# Group intensity surface — groups per km²
pr.intensity <- predict(fit_marked, pxl, ~ exp(mySPDE + Intercept))

# Expected group size surface — animals per group
pr.size <- predict(fit_marked, pxl, ~ exp(gp_mySPDE + gp_Intercept))

# Animal density surface — animals per km²
# D(s) = lambda_groups(s) * mu(s)
pr.animals <- predict(
  fit_marked, pxl,
  ~ exp(mySPDE + Intercept) * exp(gp_mySPDE + gp_Intercept)
)

Group intensity

ggplot() +
  gg(pr.intensity, geom = "tile") +
  gg(mexdolphin$ppoly, linewidth = 1, alpha = 0) +
  gg(mexdolphin$samplers, color = "grey", linewidth = 0.3) +
  gg(mexdolphin$points, size = 0.5, alpha = 0.8) +
  scale_fill_viridis_c(option = "viridis", name = "Groups\n/km\u00b2") +
  labs(title = "Group intensity surface (groups/km\u00b2)", x = NULL, y = NULL) +
  theme_bw()
Posterior mean group intensity (groups/km²). Driven by the point process likelihood only — where groups were detected after correcting for imperfect detection.

Posterior mean group intensity (groups/km²). Driven by the point process likelihood only — where groups were detected after correcting for imperfect detection.

Expected group size

ggplot() +
  gg(pr.size, geom = "tile") +
  gg(mexdolphin$ppoly, linewidth = 1, alpha = 0) +
  gg(mexdolphin$samplers, color = "grey", linewidth = 0.3) +
  gg(mexdolphin$points, size = 0.5, alpha = 0.8) +
  scale_fill_viridis_c(option = "plasma", name = "Animals\n/group") +
  labs(title = "Expected group size surface (animals/group)", x = NULL, y = NULL) +
  theme_bw()
Posterior mean expected group size (animals/group). Driven by the independent group size SPDE field with shorter range (~155 km) than the intensity field (~295 km).

Posterior mean expected group size (animals/group). Driven by the independent group size SPDE field with shorter range (~155 km) than the intensity field (~295 km).

Animal density surface

ggplot() +
  gg(pr.animals, geom = "tile") +
  gg(mexdolphin$ppoly, linewidth = 1, alpha = 0) +
  gg(mexdolphin$samplers, color = "grey", linewidth = 0.3) +
  gg(mexdolphin$points, size = 0.5, alpha = 0.8) +
  scale_fill_viridis_c(option = "viridis", name = "Animals\n/km\u00b2") +
  labs(title = "Animal density surface (animals/km\u00b2)",
       subtitle = "D(s) = \u03bb_groups(s) \u00d7 \u03bc(s)",
       x = NULL, y = NULL) +
  theme_bw()
Posterior mean animal density (animals/km²). Product of group intensity and expected group size — the primary output of the marked LGCP.

Posterior mean animal density (animals/km²). Product of group intensity and expected group size — the primary output of the marked LGCP.

Uncertainty surface

ggplot() +
  gg(pr.animals, geom = "tile", aes(fill = sd / mean)) +
  gg(mexdolphin$ppoly, linewidth = 1, alpha = 0) +
  gg(mexdolphin$samplers, color = "grey", linewidth = 0.3) +
  scale_fill_viridis_c(option = "plasma", name = "CV\n(sd/mean)") +
  labs(title = "Uncertainty \u2014 coefficient of variation (sd/mean)",
       subtitle = "High CV = high uncertainty relative to estimate magnitude",
       x = NULL, y = NULL) +
  theme_bw()
Coefficient of variation (CV = sd/mean) of the animal density surface. CV > 1 means the standard deviation exceeds the mean.

Coefficient of variation (CV = sd/mean) of the animal density surface. CV > 1 means the standard deviation exceeds the mean.


Total abundance estimate

The posterior expectation and variance of total abundance \(N = \int D(\mathbf{s}) \, d\mathbf{s}\) are:

\[E(N) = E\left(\int \lambda_{\text{groups}}(\mathbf{s}) \cdot \mu(\mathbf{s}) \, d\mathbf{s}\right)\]

\[\text{Var}(N) = \underbrace{E\left[\text{Var}(N \mid \lambda)\right]}_{\text{Poisson/NegBin randomness}} + \underbrace{\text{Var}\left[E(N \mid \lambda)\right]}_{\text{uncertainty in intensity surface}}\]

The second term is obtained from the variance of the predict() samples; the first term requires an additional prediction using the squared expected group size (see code below).

predpts <- fm_int(mesh, mexdolphin$ppoly, format = "sf")

Lambda_post <- predict(
  fit_marked,
  predpts,
  ~ c(
    expect   = sum(weight * exp(mySPDE + Intercept) *
                     exp(gp_mySPDE + gp_Intercept)),
    variance = sum(weight * exp(mySPDE + Intercept) *
                     exp(gp_mySPDE + gp_Intercept)^2)
  ),
  n.samples = 1000
)
count_expectation <- Lambda_post["expect",   "mean"]
count_variance    <- Lambda_post["variance", "mean"] +
                     Lambda_post["expect",   "sd"]^2
count_sd          <- sqrt(count_variance)
count_cv          <- count_sd / count_expectation
count_lo          <- count_expectation - 1.96 * count_sd
count_hi          <- count_expectation + 1.96 * count_sd
v_cond            <- Lambda_post["variance", "mean"]
v_uncond          <- Lambda_post["expect",   "sd"]^2
dominant_source   <- ifelse(v_uncond > v_cond,
                            "uncertainty in intensity surface",
                            "Poisson/NegBin randomness")

The marked LGCP estimates approximately 28,700 dolphins in the Gulf of Mexico survey region:

  • Expected N: 28,714
  • SD: 8,166
  • CV: 0.28
  • 95% CI: [12,708, 44,720]

The variance decomposition shows that the dominant source of uncertainty is uncertainty in intensity surface (E[conditional variance] = 6,672,592 vs Var[conditional expectation] = 60,017,275).


Comparison with density surface modelling

The marked LGCP abundance estimate of 28,700 animals (95% CI: 12,700–44,700) is in remarkably close agreement with the density surface model (DSM) analysis of the same dataset by Miller (2025), which estimates 28,462 animals using a quasi-Poisson bivariate smooth of location — a difference of less than 1.5%.

This convergence is noteworthy because the two frameworks are methodologically distinct:

Marked LGCP DSM
Framework Bayesian (INLA) Frequentist (GAM)
Spatial model Matérn SPDE random field Bivariate spline s(x, y)
Detection function Estimated jointly Estimated separately
Group size Spatially explicit NegBin field Horvitz-Thompson multiplier
Uncertainty Full posterior distribution Delta method CV

The DSM reports a total CV of 0.41 (SD ~11,683), compared to our LGCP CV of 0.28 (SD ~8,166). The somewhat lower CV from the LGCP likely reflects the Bayesian regularisation imposed by the PC priors on the SPDE, which shrinks the spatial field toward smoothness and reduces extrapolation variance in unsurveyed areas.

Both analyses agree that the highest dolphin density is concentrated in the north-central Gulf of Mexico, and that uncertainty is substantial across most of the study region — a consequence of the sparse survey design (n = 47).

The close agreement between frameworks provides informal validation that both approaches are capturing the same underlying signal, and that the marked LGCP assumptions (no size bias, independent spatial fields) do not substantially distort the abundance estimate relative to the standard DSM approach.

Reference: Miller, D.L. (2025). Example dsm analysis: pantropical dolphins in the Gulf of Mexico. dsm package vignette. https://distancesampling.org/dsm/articles/lines_gomex/mexico-analysis.html


References

  • Bachl, F.E., Lindgren, F., Borchers, D.L., Illian, J.B. (2019). inlabru: an R package for Bayesian spatial modelling from ecological survey data. Methods in Ecology and Evolution, 10, 760–766.

  • Houldcroft, A. et al. (2024). Joint spatial modeling of cluster size and density for a heavily hunted primate persisting in a heterogeneous landscape. Ecography.

  • Miller, D.L. (2025). Example dsm analysis: pantropical dolphins in the Gulf of Mexico. dsm package vignette. https://distancesampling.org/dsm/articles/lines_gomex/mexico-analysis.html

  • Miller, D.L., Burt, M.L., Rexstad, E.A., Thomas, L. (2013). Spatial models for distance sampling data: recent developments and future directions. Methods in Ecology and Evolution, 4, 1001–1010.