Modern wildlife monitoring rarely relies on a single data source. Systematic line transect surveys, opportunistic sightings, acoustic detections, and telemetry data are increasingly collected in parallel — each providing partial, imperfect information about the same underlying animal population. Data integration, combining these sources into a single coherent model, is therefore a central methodological challenge in ecological statistics.
The Log-Gaussian Cox Process (LGCP) framework
provides the most principled approach to this problem for detection-type
data. The core idea, as implemented in the inlabru R
package (Bachl et al. 2019), is that each data source can be treated as
an independent thinned realisation of the same latent
spatial intensity surface — with its own thinning mechanism reflecting
its specific observation process (detection function, survey effort,
observer behaviour). Multiple data sources are combined simply by adding
additional likelihoods to the model, each linked to the shared latent
field. This provides a statistically coherent way to borrow strength
across data sources while propagating uncertainty from each.
Despite its theoretical appeal, the LGCP framework has seen limited uptake among marine mammal researchers compared to conventional density surface modelling (DSM; Miller et al. 2013). A key practical barrier is that many marine mammals are not solitary animals. They are detected as groups, and those groups vary enormously in size: from a handful of individuals to aggregations of hundreds or thousands. A standard LGCP models the intensity of detection events (one point per sighting regardless of group size) and therefore estimates the spatial distribution of groups, not animals. Converting this to animal density requires explicitly modelling group size as part of the statistical framework, which introduces substantial additional complexity: the group size distribution is typically overdispersed, may vary spatially, and may be confounded with the detection process itself (larger groups being visible at greater distances — so-called size bias). Addressing this gap and making the LGCP framework tractable for marine species is therefore a necessary step toward broader adoption of principled data integration methods in marine mammal science.
This document demonstrates the LGCP framework applied to a single data source (systematic line transect surveys of pantropical dolphins in the Gulf of Mexico) as a foundation for multi-source integration. The standard spatial LGCP for this dataset is documented in the inlabru distance sampling vignette (Borchers & Lindgren, 2026; https://inlabru-org.github.io/inlabru/articles/2d_lgcp_distancesampling.html) and the original inlabru paper (Bachl et al., 2019), which we take as the starting point. We extend that standard model to a marked point process that jointly models group locations and group sizes, enabling estimation of animal density (animals/km²) rather than just group intensity (groups/km²).
This document implements a marked Log-Gaussian Cox Process
(LGCP) for estimating dolphin density from line transect
distance sampling data. The mexdolphin_sf dataset (inlabru)
contains 47 detected dolphin groups in the Gulf of Mexico, each with a
perpendicular detection distance and group size.
Standard LGCP models the spatial intensity of detected groups: it tells you where groups are, not how many animals there are. To estimate animal density we need to account for group size, which varies spatially. This requires a marked point process; treating group size as a “mark” attached to each detected location.
We write:
\[D(\mathbf{s}) = \underbrace{\lambda_{\text{groups}}(\mathbf{s})}_{\text{Likelihood 1: CP}} \times \underbrace{\mu(\mathbf{s})}_{\text{Likelihood 2: NegBin}}\]
where \(D(\mathbf{s})\) is animal density (animals/km²), \(\lambda_{\text{groups}}(\mathbf{s})\) is group intensity (groups/km²) corrected for imperfect detection, and \(\mu(\mathbf{s})\) is expected group size at location \(\mathbf{s}\).
These (problematic) assumptions are required for the two-likelihood factorisation:
Assumption 1 — No size bias: Detection probability \(g(y)\) depends only on perpendicular distance, not on group size. In reality, larger dolphin groups may be detectable at greater distances. Without this assumption the factorisation breaks down: if larger groups are more detectable, the fitted size field \(\zeta(\mathbf{s})\) absorbs detectability variation rather than true spatial variation in group size — the mark distribution and the thinning process become confounded. The approach of Houldcroft et al. (2024) resolves this by modelling \(g(y, m)\) jointly, at additional computational cost (that becomes unfeasable in this example)
Assumption 2 — Independent spatial fields (\(\phi = 0\)): The spatial process driving group intensity and the spatial process driving group size are completely independent. There is no shared latent field. Areas with more groups are not assumed to have larger (or smaller) groups.
Assumption 3 — Conditional independence of group sizes: The NegBin likelihood treats observed group sizes as conditionally independent given the spatial field \(\zeta(\mathbf{s})\). This requires that groups form independently of one another: that the size of one group does not depend on the sizes or locations of neighbouring groups. In reality this is unlikely: cetacean groups arise from an underlying animal-level aggregation process in which the same individuals cannot belong to multiple groups simultaneously. A group of 200 animals at one location removes those 200 animals from the pool available to form groups elsewhere. Properly accounting for this would require modelling the animal-level point process directly rather than the group-level process
library(inlabru)
library(fmesher)
library(INLA)
library(ggplot2)
library(patchwork)
bru_options_set(inla.mode = "experimental")
mexdolphin <- mexdolphin_sf
mesh <- mexdolphin$mesh
W <- 8 # truncation distance (km)
n_det <- nrow(mexdolphin$points)
gs <- mexdolphin$points$size
gs_mean <- round(mean(gs), 1)
gs_var <- round(var(gs), 1)
gs_vm <- round(gs_var / gs_mean, 1)
The dataset contains 47 detected groups with a truncation distance of W = 8 km. Group sizes range from 15 to 650 animals with a mean of 98.9 and variance/mean ratio of 113.9 — confirming strong overdispersion.
The survey covers the Gulf of Mexico with a series of line transects.
ggplot() +
gg(mexdolphin$ppoly) +
gg(mexdolphin$samplers, color = "grey") +
gg(mexdolphin$points, size = 0.5, alpha = 0.8) +
coord_sf(datum = fm_crs(mexdolphin$ppoly)) +
labs(title = "Survey design: transects and detections",
x = NULL, y = NULL) +
theme_bw()
Survey transects (grey) and detected groups (black points).
The distance histogram should show declining detections with increasing distance from the transect — a key assumption of distance sampling.
ggplot(mexdolphin$points) +
geom_histogram(aes(x = distance),
breaks = seq(0, W, length.out = 9),
boundary = 0, fill = "steelblue",
color = "white", alpha = 0.8) +
geom_point(aes(x = distance), y = 0, pch = "|", cex = 4, color = "black") +
labs(title = "Perpendicular distance histogram",
x = "Distance from transect (km)", y = "Count") +
theme_bw()
Histogram of perpendicular detection distances. Tick marks show individual observations.
Key finding: The variance/mean ratio is 113.9, massively exceeding 1. This confirms extreme overdispersion relative to a Poisson distribution and justifies using the Negative Binomial distribution for group sizes rather than Poisson.
p1 <- ggplot(data.frame(mexdolphin$points)) +
geom_histogram(aes(x = size), bins = 25, fill = "steelblue",
color = "white", boundary = 0) +
labs(title = "Group size distribution",
x = "Group size (animals)", y = "Count") +
theme_bw()
p2 <- ggplot() +
gg(mexdolphin$ppoly) +
gg(mexdolphin$samplers, color = "grey") +
gg(mexdolphin$points, aes(col = size), size = 2, alpha = 0.8) +
scale_color_viridis_c(option = "plasma", name = "Group\nsize") +
coord_sf(datum = fm_crs(mexdolphin$ppoly)) +
labs(title = "Detected groups by size", x = NULL, y = NULL) +
theme_bw()
p1 + p2
Left: raw group size distribution (right-skewed). Right: groups coloured by size overlaid on study region.
This model treats each detection as a single point event — group size is ignored. It gives the spatial intensity of groups, not animals. This is the standard inlabru distance sampling LGCP as in Bachl et al. (2019).
The observation model is a thinned Poisson point process:
\[\lambda_{\text{obs}}(\mathbf{s}, y) = \lambda_{\text{groups}}(\mathbf{s}) \cdot g(y)\]
where \(g(y) = \exp(-y^2 / 2\sigma^2)\) is the half-normal detection function.
\[\log \lambda_{\text{groups}}(\mathbf{s}) = \alpha + \xi(\mathbf{s})\]
with \(\alpha\) an intercept and \(\xi(\mathbf{s})\) a Matérn SPDE random field.
cmp_groups <- ~ mySPDE(main = geometry, model = matern) +
sigma(1,
prec.linear = 1,
marginal = bm_marginal(qexp, pexp, dexp, rate = 1 / 8)
) +
Intercept(1)
form_groups <- geometry + distance ~ mySPDE +
log(hn(distance, sigma)) +
Intercept + log(2)
fit_groups <- lgcp(
components = cmp_groups,
mexdolphin$points,
samplers = mexdolphin$samplers,
domain = list(
geometry = mesh,
distance = fm_mesh_1d(seq(0, W, length.out = 30))
),
formula = form_groups
)
summary(fit_groups)
## inlabru version: 2.13.0.9032
## INLA version: 24.12.11
## Latent components:
## mySPDE: main = spde(geometry)
## sigma: main = linear(1), marginal(NULL)
## Intercept: main = linear(1)
## Observation models:
## Model tag: <No tag>
## Family: 'cp'
## Data class: 'sf', 'tbl_df', 'tbl', 'data.frame'
## Response class: 'numeric'
## Predictor: geometry + distance ~ mySPDE + log(hn(distance, sigma)) + Intercept + log(2)
## Additive/Linear/Rowwise: FALSE/FALSE/TRUE
## Used components: effect[mySPDE, sigma, Intercept], latent[]
## Time used:
## Pre = 0.293, Running = 1.61, Post = 0.858, Total = 2.76
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## sigma -0.050 0.209 -0.461 -0.050 0.360 -0.050 0.000
## Intercept -8.156 0.494 -9.287 -8.107 -7.339 -8.034 0.004
##
## Random effects:
## Name Model
## mySPDE SPDE2 model
##
## Model hyperparameters:
## mean sd 0.025quant 0.5quant 0.975quant mode
## Range for mySPDE 295.50 147.305 110.592 262.948 673.84 209.846
## Stdev for mySPDE 0.81 0.251 0.416 0.777 1.39 0.717
##
## Marginal log-Likelihood: -390.35
## is computed
## Posterior summaries for the linear predictor and the fitted values are computed
## (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
distdf <- data.frame(distance = seq(0, W, length.out = 200))
det_pred <- predict(fit_groups, distdf, ~ hn(distance, sigma))
ggplot(det_pred) +
geom_ribbon(aes(x = distance, ymin = q0.025, ymax = q0.975),
fill = "steelblue", alpha = 0.3) +
geom_line(aes(x = distance, y = mean),
color = "steelblue", linewidth = 1) +
geom_hline(yintercept = 0.5, linetype = "dashed", color = "grey50") +
labs(
x = "Perpendicular distance (km)",
y = "Detection probability g(y)"
) +
ylim(0, 1) +
theme_bw()
Estimated half-normal detection function with 95% posterior credible interval (shaded).
pxl <- fm_pixels(mesh, dims = c(200, 100), mask = mexdolphin$ppoly)
pr.groups <- predict(fit_groups, pxl, ~ exp(mySPDE + Intercept))
ggplot() +
gg(pr.groups, geom = "tile") +
gg(mexdolphin$ppoly, linewidth = 1, alpha = 0) +
gg(mexdolphin$samplers, color = "grey", linewidth = 0.3) +
gg(mexdolphin$points, size = 0.5, alpha = 0.8) +
scale_fill_viridis_c(option = "viridis", name = "Groups\n/km\u00b2") +
labs(title = "Model 1: Group intensity surface (groups/km\u00b2)",
x = NULL, y = NULL) +
theme_bw()
Posterior mean group intensity surface (groups/km²) from the standard LGCP.
This extends Model 1 with a second likelihood for group sizes, allowing us to estimate animal density rather than just group intensity.
Likelihood 1 (CP): Identical to Model 1 — thinned point process for group locations with half-normal detection.
Likelihood 2 (NegBin): Group sizes modelled as Negative Binomial with spatially varying mean:
\[m_i \mid \mathbf{s}_i \sim \text{NegBin}(\mu(\mathbf{s}_i), \kappa)\]
\[\log \mu(\mathbf{s}) = \gamma + \zeta(\mathbf{s})\]
where \(\gamma\) is an intercept, \(\zeta(\mathbf{s})\) is an independent Matérn SPDE random field (not shared with the intensity field — see Assumption 2), and \(\kappa\) is the NegBin overdispersion parameter.
Animal density surface:
\[D(\mathbf{s}) = \lambda_{\text{groups}}(\mathbf{s}) \cdot \mu(\mathbf{s})\]
with units animals/km².
# Two completely independent spatial fields — no sharing, no copy mechanism
cmp_marked <- ~ mySPDE(main = geometry, model = matern) +
gp_mySPDE(main = geometry, model = matern_size) +
sigma(1,
prec.linear = 1,
marginal = bm_marginal(qexp, pexp, dexp, rate = 1 / 8)
) +
Intercept(1) +
gp_Intercept(1)
fit_marked <- bru(
components = cmp_marked,
# Likelihood 1: thinned point process for group locations
like(
formula = geometry + distance ~ mySPDE +
log(hn(distance, sigma)) +
Intercept + log(2),
family = "cp",
data = mexdolphin$points,
samplers = mexdolphin$samplers,
domain = list(
geometry = mesh,
distance = fm_mesh_1d(seq(0, W, length.out = 30))
)
),
# Likelihood 2: NegBin for group sizes
like(
formula = size ~ gp_Intercept + gp_mySPDE,
family = "nbinomial",
data = mexdolphin$points,
domain = list(geometry = mesh)
),
options = bru_options(
bru_verbose = FALSE,
bru_method = list(rel_tol = 0.15),
control.compute = list(dic = TRUE)
)
)
summary(fit_marked)
## inlabru version: 2.13.0.9032
## INLA version: 24.12.11
## Latent components:
## mySPDE: main = spde(geometry)
## sigma: main = linear(1), marginal(NULL)
## Intercept: main = linear(1)
## gp_mySPDE: main = spde(geometry)
## gp_Intercept: main = linear(1)
## Observation models:
## Model tag: <No tag>
## Family: 'cp'
## Data class: 'sf', 'tbl_df', 'tbl', 'data.frame'
## Response class: 'numeric'
## Predictor: geometry + distance ~ mySPDE + log(hn(distance, sigma)) + Intercept + log(2)
## Additive/Linear/Rowwise: FALSE/FALSE/TRUE
## Used components: effect[mySPDE, sigma, Intercept], latent[]
## Model tag: <No tag>
## Family: 'nbinomial'
## Data class: 'sf', 'data.frame'
## Response class: 'numeric'
## Predictor: size ~ gp_Intercept + gp_mySPDE
## Additive/Linear/Rowwise: TRUE/TRUE/TRUE
## Used components: effect[gp_mySPDE, gp_Intercept], latent[]
## Time used:
## Pre = 0.633, Running = 1.87, Post = 0.662, Total = 3.17
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## sigma -0.050 0.209 -0.461 -0.050 0.360 -0.05 0
## Intercept -8.178 0.514 -9.416 -8.112 -7.356 -7.96 0
## gp_Intercept 4.478 0.248 3.971 4.481 4.966 4.48 0
##
## Random effects:
## Name Model
## mySPDE SPDE2 model
## gp_mySPDE SPDE2 model
##
## Model hyperparameters:
## mean sd
## size for the nbinomial observations (1/overdispersion)[2] 4.059 1.206
## Range for mySPDE 295.499 147.306
## Stdev for mySPDE 0.810 0.251
## Range for gp_mySPDE 155.218 76.080
## Stdev for gp_mySPDE 0.735 0.150
## 0.025quant 0.5quant
## size for the nbinomial observations (1/overdispersion)[2] 2.175 3.897
## Range for mySPDE 110.587 262.946
## Stdev for mySPDE 0.416 0.777
## Range for gp_mySPDE 57.633 138.918
## Stdev for gp_mySPDE 0.483 0.720
## 0.975quant mode
## size for the nbinomial observations (1/overdispersion)[2] 6.88 3.596
## Range for mySPDE 673.84 209.844
## Stdev for mySPDE 1.39 0.717
## Range for gp_mySPDE 349.09 111.797
## Stdev for gp_mySPDE 1.07 0.690
##
## Deviance Information Criterion (DIC) ...............: 1257.89
## Deviance Information Criterion (DIC, saturated) ....: 24365.61
## Effective number of parameters .....................: 34.56
##
## Marginal log-Likelihood: -655.16
## is computed
## Posterior summaries for the linear predictor and the fitted values are computed
## (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
pxl <- fm_pixels(mesh, dims = c(200, 100), mask = mexdolphin$ppoly)
# Group intensity surface — groups per km²
pr.intensity <- predict(fit_marked, pxl, ~ exp(mySPDE + Intercept))
# Expected group size surface — animals per group
pr.size <- predict(fit_marked, pxl, ~ exp(gp_mySPDE + gp_Intercept))
# Animal density surface — animals per km²
# D(s) = lambda_groups(s) * mu(s)
pr.animals <- predict(
fit_marked, pxl,
~ exp(mySPDE + Intercept) * exp(gp_mySPDE + gp_Intercept)
)
ggplot() +
gg(pr.intensity, geom = "tile") +
gg(mexdolphin$ppoly, linewidth = 1, alpha = 0) +
gg(mexdolphin$samplers, color = "grey", linewidth = 0.3) +
gg(mexdolphin$points, size = 0.5, alpha = 0.8) +
scale_fill_viridis_c(option = "viridis", name = "Groups\n/km\u00b2") +
labs(title = "Group intensity surface (groups/km\u00b2)", x = NULL, y = NULL) +
theme_bw()
Posterior mean group intensity (groups/km²). Driven by the point process likelihood only — where groups were detected after correcting for imperfect detection.
ggplot() +
gg(pr.size, geom = "tile") +
gg(mexdolphin$ppoly, linewidth = 1, alpha = 0) +
gg(mexdolphin$samplers, color = "grey", linewidth = 0.3) +
gg(mexdolphin$points, size = 0.5, alpha = 0.8) +
scale_fill_viridis_c(option = "plasma", name = "Animals\n/group") +
labs(title = "Expected group size surface (animals/group)", x = NULL, y = NULL) +
theme_bw()
Posterior mean expected group size (animals/group). Driven by the independent group size SPDE field with shorter range (~155 km) than the intensity field (~295 km).
ggplot() +
gg(pr.animals, geom = "tile") +
gg(mexdolphin$ppoly, linewidth = 1, alpha = 0) +
gg(mexdolphin$samplers, color = "grey", linewidth = 0.3) +
gg(mexdolphin$points, size = 0.5, alpha = 0.8) +
scale_fill_viridis_c(option = "viridis", name = "Animals\n/km\u00b2") +
labs(title = "Animal density surface (animals/km\u00b2)",
subtitle = "D(s) = \u03bb_groups(s) \u00d7 \u03bc(s)",
x = NULL, y = NULL) +
theme_bw()
Posterior mean animal density (animals/km²). Product of group intensity and expected group size — the primary output of the marked LGCP.
ggplot() +
gg(pr.animals, geom = "tile", aes(fill = sd / mean)) +
gg(mexdolphin$ppoly, linewidth = 1, alpha = 0) +
gg(mexdolphin$samplers, color = "grey", linewidth = 0.3) +
scale_fill_viridis_c(option = "plasma", name = "CV\n(sd/mean)") +
labs(title = "Uncertainty \u2014 coefficient of variation (sd/mean)",
subtitle = "High CV = high uncertainty relative to estimate magnitude",
x = NULL, y = NULL) +
theme_bw()
Coefficient of variation (CV = sd/mean) of the animal density surface. CV > 1 means the standard deviation exceeds the mean.
The posterior expectation and variance of total abundance \(N = \int D(\mathbf{s}) \, d\mathbf{s}\) are:
\[E(N) = E\left(\int \lambda_{\text{groups}}(\mathbf{s}) \cdot \mu(\mathbf{s}) \, d\mathbf{s}\right)\]
\[\text{Var}(N) = \underbrace{E\left[\text{Var}(N \mid \lambda)\right]}_{\text{Poisson/NegBin randomness}} + \underbrace{\text{Var}\left[E(N \mid \lambda)\right]}_{\text{uncertainty in intensity surface}}\]
The second term is obtained from the variance of the
predict() samples; the first term requires an additional
prediction using the squared expected group size (see code below).
predpts <- fm_int(mesh, mexdolphin$ppoly, format = "sf")
Lambda_post <- predict(
fit_marked,
predpts,
~ c(
expect = sum(weight * exp(mySPDE + Intercept) *
exp(gp_mySPDE + gp_Intercept)),
variance = sum(weight * exp(mySPDE + Intercept) *
exp(gp_mySPDE + gp_Intercept)^2)
),
n.samples = 1000
)
count_expectation <- Lambda_post["expect", "mean"]
count_variance <- Lambda_post["variance", "mean"] +
Lambda_post["expect", "sd"]^2
count_sd <- sqrt(count_variance)
count_cv <- count_sd / count_expectation
count_lo <- count_expectation - 1.96 * count_sd
count_hi <- count_expectation + 1.96 * count_sd
v_cond <- Lambda_post["variance", "mean"]
v_uncond <- Lambda_post["expect", "sd"]^2
dominant_source <- ifelse(v_uncond > v_cond,
"uncertainty in intensity surface",
"Poisson/NegBin randomness")
The marked LGCP estimates approximately 28,700 dolphins in the Gulf of Mexico survey region:
The variance decomposition shows that the dominant source of uncertainty is uncertainty in intensity surface (E[conditional variance] = 6,672,592 vs Var[conditional expectation] = 60,017,275).
The marked LGCP abundance estimate of 28,700 animals (95% CI: 12,700–44,700) is in remarkably close agreement with the density surface model (DSM) analysis of the same dataset by Miller (2025), which estimates 28,462 animals using a quasi-Poisson bivariate smooth of location — a difference of less than 1.5%.
This convergence is noteworthy because the two frameworks are methodologically distinct:
| Marked LGCP | DSM | |
|---|---|---|
| Framework | Bayesian (INLA) | Frequentist (GAM) |
| Spatial model | Matérn SPDE random field | Bivariate spline s(x, y) |
| Detection function | Estimated jointly | Estimated separately |
| Group size | Spatially explicit NegBin field | Horvitz-Thompson multiplier |
| Uncertainty | Full posterior distribution | Delta method CV |
The DSM reports a total CV of 0.41 (SD ~11,683), compared to our LGCP CV of 0.28 (SD ~8,166). The somewhat lower CV from the LGCP likely reflects the Bayesian regularisation imposed by the PC priors on the SPDE, which shrinks the spatial field toward smoothness and reduces extrapolation variance in unsurveyed areas.
Both analyses agree that the highest dolphin density is concentrated in the north-central Gulf of Mexico, and that uncertainty is substantial across most of the study region — a consequence of the sparse survey design (n = 47).
The close agreement between frameworks provides informal validation that both approaches are capturing the same underlying signal, and that the marked LGCP assumptions (no size bias, independent spatial fields) do not substantially distort the abundance estimate relative to the standard DSM approach.
Reference: Miller, D.L. (2025). Example dsm analysis: pantropical dolphins in the Gulf of Mexico. dsm package vignette. https://distancesampling.org/dsm/articles/lines_gomex/mexico-analysis.html
Bachl, F.E., Lindgren, F., Borchers, D.L., Illian, J.B. (2019). inlabru: an R package for Bayesian spatial modelling from ecological survey data. Methods in Ecology and Evolution, 10, 760–766.
Houldcroft, A. et al. (2024). Joint spatial modeling of cluster size and density for a heavily hunted primate persisting in a heterogeneous landscape. Ecography.
Miller, D.L. (2025). Example dsm analysis: pantropical dolphins in the Gulf of Mexico. dsm package vignette. https://distancesampling.org/dsm/articles/lines_gomex/mexico-analysis.html
Miller, D.L., Burt, M.L., Rexstad, E.A., Thomas, L. (2013). Spatial models for distance sampling data: recent developments and future directions. Methods in Ecology and Evolution, 4, 1001–1010.