Spatial Patterns of Beach Litter in Latvia: Evidence from Monitoring Site Data
Introduction
Marine and coastal litter constitutes an important environmental
problem, with consequences for ecosystems, tourism, and coastal
management. While many studies focus on aggregate litter counts or broad
spatial patterns, less attention is paid to the distinction between the
spatial arrangement of monitoring sites themselves and the spatial
structure of observed litter intensity. In coastal settings, this
distinction is particularly important, as monitoring locations are often
distributed along shorelines rather than across a two-dimensional plane.
This study analyses site-level litter data collected at coastal
monitoring locations in Latvia in 2024, with the objective of assessing
whether observed spatial patterns in litter counts reflect genuine
spatial dependence in the phenomenon of interest, or whether they are
primarily driven by the geometry of site placement along the coast. To
address this question, we combine descriptive analysis, point-pattern
diagnostics, spatial autocorrelation tests, and spatial regression
models within a unified and reproducible framework.
Data
The analysis distinguishes between several categories of coastal
litter, each representing a specific type of commonly observed waste
item. These categories include, among others, Cigarette butts, Plastic
bags, and related packaging or consumer residues. Cigarette butts refer
to discarded cigarette filters and remnants, which are among the most
frequently recorded forms of coastal litter worldwide. Due to their
small size, high abundance, and resistance to degradation, cigarette
butts tend to accumulate in large numbers and are often associated with
recreational beach use and nearby urban activity.
Each litter
category is analysed separately, meaning that the dependent variable Y
represents the site-level count of items belonging to a given category.
This category-specific approach allows spatial patterns and potential
clustering to differ across types of litter, reflecting distinct
behavioural, environmental, and usage-related mechanisms rather than a
single aggregated pollution process.
| category | n_sites | total | mean | median | sd | min | max | pct_zero (%) |
|---|---|---|---|---|---|---|---|---|
| Cigarette butts | 44 | 4783 | 108.70 | 67.5 | 114.62 | 2 | 514 | 0.00 |
| Plastic ropes | 44 | 588 | 13.36 | 8.5 | 14.53 | 1 | 70 | 0.00 |
| Styrofoam and polystyrene | 44 | 506 | 11.50 | 6.0 | 16.81 | 0 | 103 | 2.27 |
| Plastic bags | 44 | 450 | 10.23 | 7.0 | 9.35 | 0 | 36 | 6.82 |
| Plastic bottle caps | 44 | 351 | 7.98 | 4.0 | 11.85 | 0 | 66 | 13.64 |
| SUP dishes and cutlery | 44 | 170 | 3.86 | 3.0 | 4.04 | 0 | 19 | 11.36 |
| Beverage containers | 44 | 66 | 1.50 | 1.0 | 1.77 | 0 | 7 | 38.64 |
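The columns of the summary table can be reproduced for any single category with a few lines of code. The sketch below is illustrative only: the function name `summarise` and the example counts are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption about how `sd` was computed.

```python
import statistics

def summarise(counts):
    """Return the summary columns of the table for one litter category."""
    n = len(counts)
    return {
        "n_sites": n,
        "total": sum(counts),
        "mean": round(statistics.mean(counts), 2),
        "median": statistics.median(counts),
        # Sample SD (n - 1 denominator); assumed, not stated in the text.
        "sd": round(statistics.stdev(counts), 2),
        "min": min(counts),
        "max": max(counts),
        # Share of sites with a zero count, in percent.
        "pct_zero": round(100 * sum(c == 0 for c in counts) / n, 2),
    }

# Hypothetical counts for eight sites (not the study data).
example = summarise([0, 3, 1, 0, 7, 2, 0, 1])
print(example)
```

A mean well above the median, as in most rows of the table, points to right-skewed counts, which is consistent with the log(1 + Y) transformation used in the later models.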
Monitoring Sites Distribution
Clark Evans
##
## Clark-Evans test
## No edge correction
## Z-test
##
## data: sites_ppp
## R = 0.35818, p-value = 1.771e-10
## alternative hypothesis: two-sided
##
## Clark-Evans test
## Donnelly correction
## Z-test
##
## data: sites_ppp_bb
## R = 0.32271, p-value < 2.2e-16
## alternative hypothesis: two-sided
The Clark–Evans test indicates strong clustering in the spatial
arrangement of monitoring sites. In both the uncorrected and Donnelly
edge-corrected versions, the estimated R statistic is well below unity
(R ≈ 0.36 and R ≈ 0.32, respectively), and the null hypothesis of
complete spatial randomness is decisively rejected. This result reflects
the non-random placement of sites along the Latvian coastline rather
than clustering in the underlying litter process.
It should be
noted that the Donnelly edge correction applied in the Clark–Evans test
is only an approximation in this context. The correction assumes a
relatively simple observation window, whereas the monitoring sites
analysed here are distributed primarily along the coastline, which
constitutes a highly irregular and essentially one-dimensional sampling
geometry embedded in two-dimensional space. As a result, even the
edge-corrected Clark–Evans statistic should be interpreted with caution,
as it may still reflect artefacts of site placement rather than
meaningful spatial structure. This limitation further motivates the
subsequent analysis of spatial autocorrelation in model residuals and
the comparison of alternative neighbourhood definitions tailored to
coastal settings.
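The effect of a linear site layout on the Clark–Evans ratio can be illustrated with a toy example. The sketch below computes the uncorrected ratio R (observed mean nearest-neighbour distance divided by its expectation under complete spatial randomness) for 44 evenly spaced points on a straight line crossing a unit-square window. The layout, the window, and the function name `clark_evans_r` are assumptions made for illustration and do not use the Latvian coordinates.

```python
import math

def clark_evans_r(points, area):
    """Uncorrected Clark-Evans ratio: observed / expected mean NN distance."""
    n = len(points)
    nn = []
    for i, (xi, yi) in enumerate(points):
        nn.append(min(math.hypot(xi - xj, yi - yj)
                      for j, (xj, yj) in enumerate(points) if j != i))
    observed = sum(nn) / n
    # Mean nearest-neighbour distance under CSR with intensity n / area.
    expected = 1.0 / (2.0 * math.sqrt(n / area))
    return observed / expected

# Hypothetical layout: 44 evenly spaced sites on a straight "coast"
# crossing a unit-square window.
coast_sites = [(i / 43, 0.5) for i in range(44)]
r_line = clark_evans_r(coast_sites, area=1.0)
print(round(r_line, 3))
```

In this toy setting the ratio is about 0.31 even though the points are perfectly regular along the line, which shows how a one-dimensional layout inside a two-dimensional window can by itself produce R values of the magnitude reported above.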
Ripley’s K function and the corresponding L function further confirm strong departures from complete spatial randomness in the arrangement of monitoring sites. Across the entire range of distances, the empirical estimates of both K(r) and L(r) lie well above their theoretical Poisson benchmarks, indicating a persistent excess of neighbouring points relative to a random pattern.
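As a rough illustration of what lying above the Poisson benchmark means, the following sketch computes a naive estimate of L(r) = sqrt(K(r) / π) without edge correction. Under complete spatial randomness L(r) is approximately r. The point layout, window, and function name `ripley_l` are hypothetical, and the estimator is deliberately simpler than an edge-corrected one.

```python
import math

def ripley_l(points, area, r):
    """Naive (no edge correction) estimate of L(r) = sqrt(K(r) / pi)."""
    n = len(points)
    pairs = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= r:
                pairs += 1          # ordered pairs closer than r
    k_hat = area * pairs / (n * (n - 1))
    return math.sqrt(k_hat / math.pi)

# Hypothetical layout: 44 evenly spaced sites on a line in a unit square.
line_sites = [(i / 43, 0.5) for i in range(44)]
l_at_01 = ripley_l(line_sites, area=1.0, r=0.1)
print(l_at_01 > 0.1)   # True means more neighbours than expected under CSR
```

For this linear layout the estimate at r = 0.1 is more than twice the benchmark value, again without any clustering along the line itself.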
HDBSCAN
In this step, density-based spatial clustering (HDBSCAN) is applied separately for each litter category to identify potential local concentrations of litter intensity. Clustering is performed on a three-dimensional space combining site coordinates and a normalized litter count, where extreme values are trimmed at the 95th percentile to reduce the influence of outliers. To distinguish clusters driven by litter intensity from those arising purely from site geometry, a control clustering is also estimated using spatial coordinates only. Summary tables report the number of detected clusters, clustered sites, and noise points for each category, allowing a direct comparison between intensity-informed and purely geometric clustering. This approach helps assess whether observed hotspots reflect genuine spatial patterns in litter accumulation or are primarily artefacts of the coastal sampling layout.
| category | n | n_noise | n_clustered | n_clusters | avg_prob |
|---|---|---|---|---|---|
| Beverage containers | 44 | 4 | 40 | 2 | 0.41 |
| Cigarette butts | 44 | 4 | 40 | 2 | 0.41 |
| Plastic bags | 44 | 4 | 40 | 2 | 0.41 |
| Plastic bottle caps | 44 | 4 | 40 | 2 | 0.41 |
| Plastic ropes | 44 | 4 | 40 | 2 | 0.41 |
| SUP dishes and cutlery | 44 | 4 | 40 | 2 | 0.41 |
| Styrofoam and polystyrene | 44 | 4 | 40 | 2 | 0.41 |
| category | n | n_noise_xy | n_clustered_xy | n_clusters_xy |
|---|---|---|---|---|
| Beverage containers | 44 | 4 | 40 | 2 |
| Cigarette butts | 44 | 4 | 40 | 2 |
| Plastic bags | 44 | 4 | 40 | 2 |
| Plastic bottle caps | 44 | 4 | 40 | 2 |
| Plastic ropes | 44 | 4 | 40 | 2 |
| SUP dishes and cutlery | 44 | 4 | 40 | 2 |
| Styrofoam and polystyrene | 44 | 4 | 40 | 2 |
The HDBSCAN results reveal a highly consistent clustering structure across all seven litter categories. When clustering is performed using both spatial coordinates and normalized litter intensity (x, y, y_norm), exactly the same number of clusters and noise points is obtained as in the control specification based solely on spatial coordinates (x, y). The corresponding maps show that cluster boundaries are nearly identical across categories and do not align with variations in litter intensity, which is reflected only in point size rather than cluster membership. This indicates that the detected clusters are driven primarily by the geometric arrangement of monitoring sites along the coastline rather than by genuine spatial concentrations of litter.
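One possible mechanical reason for identical results deserves a check: the relative scale of the clustering dimensions. The text does not say whether coordinates were rescaled before clustering. If they were left in metres while the normalised count lies in [0, 1], the intensity dimension would contribute almost nothing to Euclidean distances. The sketch below illustrates this under assumed values; the function names, the coordinates, the interpretation of trimming as capping at the 95th percentile, and the linear-interpolation percentile are all assumptions.

```python
import math

def cap_and_scale(counts, q=0.95):
    """Cap counts at the q-th percentile, then min-max scale to [0, 1]."""
    ordered = sorted(counts)
    # Linear-interpolation percentile (assumed definition).
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    cap = ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)
    capped = [min(c, cap) for c in counts]
    low, high = min(capped), max(capped)
    return [(c - low) / (high - low) if high > low else 0.0 for c in capped]

def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

# Hypothetical sites: projected coordinates in metres plus a raw count.
xy = [(500_000.0, 300_000.0), (505_000.0, 300_000.0), (560_000.0, 320_000.0)]
counts = [2, 514, 67]
y_norm = cap_and_scale(counts)

d_xy = dist(xy[0], xy[1])                                    # (x, y) only
d_xyz = dist(xy[0] + (y_norm[0],), xy[1] + (y_norm[1],))     # (x, y, y_norm)
relative_change = (d_xyz - d_xy) / d_xy
print(relative_change)
```

In this example the two sites have the lowest and highest normalised intensity, yet adding the third dimension changes their distance by far less than one part in a million. Rescaling the coordinates to a range comparable to y_norm before clustering would be one way to test whether the identical cluster counts reflect site geometry or simply the feature scaling.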
KDE
In this step, weighted kernel density estimation (KDE) is used to visualize relative spatial hotspots of litter intensity for each category. All spatial data are projected to a metric coordinate system to ensure correct distance-based smoothing, and litter counts are normalized within each category after trimming extreme values at the 95th percentile to limit the influence of outliers. The KDE is computed on a regular grid using data-driven bandwidths and then masked to the national boundary of Latvia, producing continuous density surfaces that are directly comparable across categories.
The weighted kernel density estimates show broadly similar spatial patterns across all litter categories. In each case, the highest relative densities are concentrated in the same central coastal segment, while lower densities are observed toward the eastern and western edges of the study area.
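The weighted density surface can be understood as a sum of Gaussian bumps, one per site, each scaled by the normalised litter count. The sketch below evaluates such a surface at single grid points; the site coordinates, weights, bandwidth, and the function name `weighted_kde` are illustrative assumptions, and the data-driven bandwidth selection and boundary masking described above are not reproduced.

```python
import math

def weighted_kde(sites, weights, grid_point, bandwidth):
    """Gaussian kernel density at one grid point, weighted by site intensity."""
    gx, gy = grid_point
    total = 0.0
    for (x, y), w in zip(sites, weights):
        d2 = (x - gx) ** 2 + (y - gy) ** 2
        total += w * math.exp(-d2 / (2 * bandwidth ** 2))
    # Normalise so the surface integrates to one over the plane.
    return total / (2 * math.pi * bandwidth ** 2 * sum(weights))

# Hypothetical sites (metres) with normalised weights; bandwidth is illustrative.
sites = [(0.0, 0.0), (10_000.0, 0.0), (20_000.0, 0.0)]
weights = [0.1, 1.0, 0.2]
near_heavy = weighted_kde(sites, weights, (10_000.0, 0.0), bandwidth=5_000.0)
near_light = weighted_kde(sites, weights, (0.0, 0.0), bandwidth=5_000.0)
print(near_heavy > near_light)
```

Because every site contributes to the sum, a weighted KDE mixes litter intensity with site density: a coastal segment with many closely spaced sites can appear as a hotspot even when individual counts are moderate, which may help explain why the surfaces look alike across categories.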
Spatial Autocorrelation of Residuals (Moran’s I)
In this step, Moran’s I statistic is used to assess whether spatial autocorrelation remains in the model residuals for each litter category. For every category, a k-nearest-neighbours spatial weights matrix (k = 4) is constructed based on Euclidean distances between monitoring sites, and Moran’s I is computed using this neighbourhood structure.
| category | n | moran_I | p_value |
|---|---|---|---|
| Cigarette butts | 44 | 0.2633 | 0.0010 |
| Plastic bags | 44 | 0.2361 | 0.0032 |
| Styrofoam and polystyrene | 44 | 0.0410 | 0.1792 |
| Beverage containers | 44 | 0.0546 | 0.2045 |
| Plastic bottle caps | 44 | 0.0175 | 0.3027 |
| Plastic ropes | 44 | -0.0020 | 0.4060 |
| SUP dishes and cutlery | 44 | -0.0359 | 0.5546 |
The Moran’s I results indicate statistically significant positive spatial autocorrelation for Cigarette butts and Plastic bags, with Moran’s I values of approximately 0.26 and 0.24, respectively, and p-values below 0.01. This suggests that similar values of litter intensity, whether high or low, tend to occur at neighbouring sites for these two categories under a k-nearest-neighbours structure with k = 4. For the remaining five categories, Moran’s I values are close to zero and statistically insignificant (all p-values above 0.17), indicating no detectable spatial dependence under this neighbourhood structure.
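For readers unfamiliar with the statistic, the sketch below builds row-standardised k-nearest-neighbour weights and computes Moran’s I from its definition, I = (n / S0) · Σ w_ij z_i z_j / Σ z_i², where z are deviations from the mean and S0 is the sum of all weights. The toy coordinates, the values, and the function names are hypothetical; k = 2 is used here only to keep the example small, whereas the analysis above uses k = 4. No significance test is included.

```python
import math

def knn_weights(coords, k=4):
    """Row-standardised k-nearest-neighbour weights as {i: {j: w_ij}}."""
    w = {}
    for i, (xi, yi) in enumerate(coords):
        others = sorted((math.hypot(xi - xj, yi - yj), j)
                        for j, (xj, yj) in enumerate(coords) if j != i)
        w[i] = {j: 1.0 / k for _, j in others[:k]}
    return w

def morans_i(values, w):
    """Moran's I for values under the weights dictionary w."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row.values()) for row in w.values())
    num = sum(wij * z[i] * z[j] for i, row in w.items() for j, wij in row.items())
    den = sum(zi ** 2 for zi in z)
    return (n / s0) * num / den

# Twelve hypothetical sites on a line.
coords = [(float(i), 0.0) for i in range(12)]
trend = [float(i) for i in range(12)]          # smooth gradient
zigzag = [float(i % 2) for i in range(12)]     # alternating high and low
i_trend = morans_i(trend, knn_weights(coords, k=2))
i_zigzag = morans_i(zigzag, knn_weights(coords, k=2))
print(i_trend > 0, i_zigzag < 0)
```

A smooth gradient yields a clearly positive value and an alternating pattern a clearly negative one, which gives a sense of scale for the values of 0.26 and 0.24 reported in the table.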
Baseline Residual Moran’s I with Alternative Neighbourhood Definitions
This section tests whether the apparent spatial dependence in the litter data is genuine or an artefact of sampling along the coastline. A single continuous shoreline was extracted from Latvia’s boundary (EPSG:3059). Two spatial weights matrices were built for the monitoring sites: one based on straight-line (Euclidean) k-nearest neighbours (k = 4), and another based on distance along the coast (also k = 4). For Cigarette butts and Plastic bags, two simple models were fitted under both weighting schemes: an intercept-only model and a model with intercept and covariates. Moran’s I was then calculated on the residuals to check for spatial autocorrelation.
| category | W | model | moran_I | p_value |
|---|---|---|---|---|
| Cigarette butts | coast(k=4) | Baseline 1: lm(y_log~1) | -0.0186 | 0.3926 |
| Cigarette butts | coast(k=4) | Baseline 2: lm(y_log~1+X) | -0.0150 | 0.3290 |
| Cigarette butts | eucl(k=4) | Baseline 1: lm(y_log~1) | 0.2342 | 0.0039 |
| Cigarette butts | eucl(k=4) | Baseline 2: lm(y_log~1+X) | 0.0573 | 0.2008 |
| Plastic bags | coast(k=4) | Baseline 1: lm(y_log~1) | -0.0086 | 0.2051 |
| Plastic bags | coast(k=4) | Baseline 2: lm(y_log~1+X) | -0.0121 | 0.2701 |
| Plastic bags | eucl(k=4) | Baseline 1: lm(y_log~1) | 0.2557 | 0.0019 |
| Plastic bags | eucl(k=4) | Baseline 2: lm(y_log~1+X) | 0.0384 | 0.2610 |
Baseline Moran’s I results indicate that residual spatial autocorrelation depends strongly on how spatial neighbourhoods are defined. Using Euclidean k-nearest-neighbour weights (k = 4), residuals from the intercept-only model exhibit positive and statistically significant spatial autocorrelation for both Cigarette butts (I = 0.23, p = 0.004) and Plastic bags (I = 0.26, p = 0.002). Once covariates are included, Moran’s I falls to about 0.06 and 0.04, respectively, and is no longer significant (p > 0.20). Importantly, when neighbourhoods are defined along the coastline rather than by straight-line distance, Moran’s I is close to zero and statistically insignificant across all baseline specifications. This suggests that the apparent spatial autocorrelation observed under Euclidean weights is primarily driven by the geometry of site placement, rather than by genuine spatial spillovers in litter accumulation.
The comparison highlights a fundamental difference between Euclidean and coastline-based neighbourhood definitions. Under Euclidean kNN weights, neighbouring sites are connected by straight-line distance, which often links locations across bays or inlets, producing connections that cut through open water. In contrast, the coastline-based weights restrict neighbourhoods to proximity along the shoreline, resulting in connections that follow the coastal geometry. As a consequence, the Euclidean specification may induce apparent spatial dependence driven by geometric shortcuts, whereas the coastal specification better reflects the actual spatial continuity of monitoring locations.
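The difference between the two neighbourhood definitions can be made concrete with a small U-shaped bay. In the sketch below, sites are listed in along-shore order and the coastline is approximated by straight segments between consecutive sites, which is a simplifying assumption; in the actual analysis the along-shore distance would come from positions on the extracted shoreline. All coordinates and function names are hypothetical.

```python
import math

def knn(dist_fn, n, k):
    """For each site, the indices of its k nearest neighbours under dist_fn."""
    out = []
    for i in range(n):
        ranked = sorted((dist_fn(i, j), j) for j in range(n) if j != i)
        out.append([j for _, j in ranked[:k]])
    return out

# Hypothetical bay: west shore (sites 0-3), bay head, east shore (sites 4-7).
xy = [(0, 9), (0, 6), (0, 3), (0, 0), (2, 0), (2, 3), (2, 6), (2, 9)]

# Cumulative along-shore distance (chainage) of each site.
chainage = [0.0]
for (x0, y0), (x1, y1) in zip(xy, xy[1:]):
    chainage.append(chainage[-1] + math.hypot(x1 - x0, y1 - y0))

def euclidean(i, j):
    return math.hypot(xy[i][0] - xy[j][0], xy[i][1] - xy[j][1])

def along_shore(i, j):
    return abs(chainage[i] - chainage[j])

eucl = knn(euclidean, len(xy), k=2)
coast = knn(along_shore, len(xy), k=2)
print(eucl[0], coast[0])
```

For the site at the mouth of the bay (index 0), the Euclidean neighbours include the site directly across the water (index 7), while the along-shore neighbours are the next two sites on the same shore (indices 1 and 2).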
Spatial model comparison under alternative neighbourhood definitions
Spatial regression models are fitted separately for Cigarette
butts and Plastic bags to check whether spatial dependence
persists after controlling for covariates and whether results depend on
neighbourhood definition. The response is log(1 + Y); a set of
non-collinear numeric covariates is selected. Two k=4 nearest-neighbour
weights are used: standard Euclidean and coastline-based (along-shore
distance). For each weight matrix, SAR, SEM, and SLX models are
estimated and compared by AIC; the best specification is chosen per
weight. Moran’s I is calculated on residuals to test for remaining
spatial autocorrelation.
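For reference, the three specifications can be written in their standard textbook forms, with y = log(1 + Y), X the covariate matrix, and W the row-standardised weights matrix. The notation (ρ, λ, θ, p for the number of estimated parameters, and the maximised likelihood) is conventional and introduced here for illustration; the text does not state the exact parameterisation used in estimation.

```latex
\begin{aligned}
\text{SAR:}\quad & y = \rho W y + X\beta + \varepsilon \\
\text{SEM:}\quad & y = X\beta + u, \qquad u = \lambda W u + \varepsilon \\
\text{SLX:}\quad & y = X\beta + W X \theta + \varepsilon \\
\text{AIC:}\quad & \mathrm{AIC} = 2p - 2\ln \hat{L}
\end{aligned}
```

In the SAR model the outcome at a site depends on outcomes at neighbouring sites, in the SEM model only the errors are spatially correlated, and in the SLX model the outcome depends on the covariates of neighbouring sites. The SLX model therefore adds one spatially lagged term per covariate, which AIC penalises through p.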
The spatial regression models include the
following site-level covariates describing beach management,
accessibility, surrounding infrastructure, and tourism pressure:
X1_BathingZoneManag – The site is located next to an officially designated bathing zone, which is subject to a stricter management regime (e.g. smoking bans, enhanced monitoring, more frequent clean-ups).
X2_BlueFlag – The site is located on a Blue Flag beach, which must meet specific international standards related to environmental management, cleanliness, and litter prevention.
X3_GarbageBin – A garbage container is available on the beach during the season and located within approximately 200 metres of the site in managed beach areas.
X4_BeachCleanup – Intensity and regularity of municipal beach clean-up activities (e.g. mainly before and after the summer season, with additional cleaning during the season).
X5_BeachCafe – Presence of beach-side commercial facilities, such as a café, restaurant, or kiosk, operating near the site during the summer season.
X6_CoastCafe – Presence of coastal commercial facilities (e.g. cafés or restaurants) located close to the site but not directly on the beach.
X7_Accommod – Presence of tourist accommodation facilities (e.g. campsite, guest house, or hotel) in the vicinity of the site.
X8_Parking – Availability of car parking near the site, either directly adjacent or within approximately 300 metres.
X9_Shops – Presence of retail shops near the site (within about 300 metres or located along the main access route to the beach).
X10_PublTransport – Access to public transport, defined as the presence of a nearby local or intercity transport stop from which the beach can be reached directly.
X11_BeachUseCateg – Beach use intensity category (urban, semi-urban, remote), based on standard HELCOM and EU classifications.
X12_LocalResidents – Number of inhabitants in the parish or city where the site is located (population data for 2024, as of January 2025).
X13_TouristNights2024 – Total number of tourist overnight stays (domestic and foreign) recorded in 2024 in accommodation facilities within the county or major city where the site is located.
X14_LVSeaBasin – Sea basin of the Latvian marine waters in which the site is located.
| category | W | model | AIC | moran_I | p_value |
|---|---|---|---|---|---|
| Cigarette butts | eucl(k=4) | SAR | 116.1212 | 0.0586 | 0.1970 |
| Cigarette butts | eucl(k=4) | SEM | 115.0887 | -0.0061 | 0.4292 |
| Cigarette butts | eucl(k=4) | SLX | 79.3925 | -0.1937 | 0.9645 |
| Cigarette butts | coast(k=4) | SAR | 116.0193 | -0.0144 | 0.3189 |
| Cigarette butts | coast(k=4) | SEM | 115.6527 | -0.0121 | 0.2799 |
| Plastic bags | eucl(k=4) | SAR | 114.6365 | -0.0501 | 0.6098 |
| Plastic bags | eucl(k=4) | SEM | 116.3717 | -0.0042 | 0.4217 |
| Plastic bags | eucl(k=4) | SLX | 104.9446 | -0.2481 | 0.9901 |
| Plastic bags | coast(k=4) | SAR | 116.4141 | -0.0093 | 0.2215 |
| Plastic bags | coast(k=4) | SEM | 116.3542 | -0.0155 | 0.3353 |
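Once covariates are included, no statistically significant positive residual autocorrelation remains in any specification (all p-values exceed 0.19). Under Euclidean weights the SLX model gives the lowest AIC for both categories (79.39 for Cigarette butts and 104.94 for Plastic bags). Under coastline-based weights, for which only SAR and SEM results are reported, the two models differ by less than 0.4 AIC units. The SLX residuals do, however, show clearly negative Moran’s I values (about -0.19 and -0.25), which a test against positive autocorrelation will not flag.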
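The selection rule itself is simple and can be sketched in a few lines. The example below applies it to the Cigarette butts AIC values from the table above; the function name `best_by_aic` is hypothetical, and the gap to the runner-up is reported alongside the winner because a small gap means the ranking carries little information.

```python
def best_by_aic(aic_by_model):
    """Return (model, delta_aic), where delta_aic is the gap to the runner-up."""
    ranked = sorted(aic_by_model.items(), key=lambda kv: kv[1])
    (best, best_aic), (_, second_aic) = ranked[0], ranked[1]
    return best, round(second_aic - best_aic, 4)

# AIC values copied from the table above (Cigarette butts, k = 4).
cig_eucl = {"SAR": 116.1212, "SEM": 115.0887, "SLX": 79.3925}
# Coastline-based weights: only SAR and SEM are reported in the table.
cig_coast = {"SAR": 116.0193, "SEM": 115.6527}

choice_eucl = best_by_aic(cig_eucl)
choice_coast = best_by_aic(cig_coast)
print(choice_eucl, choice_coast)
```

Under Euclidean weights the SLX model leads by roughly 36 AIC units, whereas under coastline-based weights SEM leads SAR by well under one unit, so the latter choice is effectively a tie.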
Comparing fitted values from the SLX model under Euclidean and
coastline-based neighbourhood definitions reveals nearly identical
spatial patterns for both Cigarette butts and Plastic bags. In both
cases, predicted litter intensity varies smoothly along the coastline
and is characterized by localized peaks rather than large-scale
gradients. Changing the definition of spatial proximity from
straight-line distance to along-shore distance does not alter the
location or relative magnitude of these peaks. The fitted spatial
variation in litter accumulation is therefore robust to the choice of
spatial weights and, unlike the baseline residual autocorrelation, is
not an artefact of the geometric shortcuts introduced by Euclidean distance.
Residuals display no spatial clustering under either Euclidean or
coastline-based weights, confirming that spatial dependence has been
effectively removed by the selected models.
Observed–fitted plots show that models estimated with Euclidean neighbourhoods provide a better predictive fit than those using coastline-based weights. Under Euclidean kNN, fitted values are more tightly aligned with the 45-degree reference line, particularly for low and medium litter counts, while coastal weights exhibit greater dispersion. Despite this difference in predictive accuracy, the main spatial patterns and substantive conclusions remain robust across neighbourhood definitions.
Sensitivity Check
To assess the robustness of spatial model results to the choice of neighbourhood size, a sensitivity analysis was conducted by varying the number of nearest neighbours from k = 3 to k = 6. For each litter category and each value of k, spatial models (SAR, SEM, and SLX) were re-estimated under both Euclidean and coastline-based weight definitions, and the best-performing specification was selected using AIC. Moran’s I was then computed on residuals from the selected model to verify whether spatial autocorrelation remained.
| category | W | k | model | AIC | moran_I | p_value |
|---|---|---|---|---|---|---|
| Cigarette butts | coast(k=3) | 3 | SAR | 115.5469 | 0.0008 | 0.1396 |
| Cigarette butts | coast(k=4) | 4 | SEM | 115.6527 | -0.0121 | 0.2799 |
| Cigarette butts | coast(k=5) | 5 | SEM | 115.9209 | -0.0094 | 0.2026 |
| Cigarette butts | coast(k=6) | 6 | SAR | 116.0778 | -0.0019 | 0.0750 |
| Cigarette butts | eucl(k=3) | 3 | SLX | 88.9981 | -0.1385 | 0.8471 |
| Cigarette butts | eucl(k=4) | 4 | SLX | 79.3925 | -0.1937 | 0.9645 |
| Cigarette butts | eucl(k=5) | 5 | SLX | 94.3399 | -0.2094 | 0.9855 |
| Cigarette butts | eucl(k=6) | 6 | SLX | 102.7889 | -0.1806 | 0.9793 |
| Plastic bags | coast(k=3) | 3 | SEM | 116.2326 | -0.0157 | 0.3605 |
| Plastic bags | coast(k=4) | 4 | SEM | 116.3542 | -0.0155 | 0.3353 |
| Plastic bags | coast(k=5) | 5 | SEM | 114.2739 | -0.0168 | 0.3458 |
| Plastic bags | coast(k=6) | 6 | SEM | 116.7706 | -0.0099 | 0.1772 |
| Plastic bags | eucl(k=3) | 3 | SLX | 70.8104 | -0.2357 | 0.9725 |
| Plastic bags | eucl(k=4) | 4 | SLX | 104.9446 | -0.2481 | 0.9901 |
| Plastic bags | eucl(k=5) | 5 | SLX | 100.0741 | -0.2239 | 0.9909 |
| Plastic bags | eucl(k=6) | 6 | SLX | 108.2994 | -0.1616 | 0.9637 |
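Sensitivity analysis indicates that the main modelling conclusions do not depend on the neighbourhood size. Under Euclidean kNN weights, the SLX specification is selected for every tested value of k from 3 to 6 and shows no positive residual spatial autocorrelation. Under coastline-based weights either SAR or SEM is selected, with AIC values between roughly 114 and 117 throughout. The AIC of the Euclidean SLX model does vary with k (79.4 to 102.8 for Cigarette butts, 70.8 to 108.3 for Plastic bags), and for Plastic bags the lowest value occurs at k = 3 rather than k = 4. The SLX model with k = 4 is nonetheless adopted as the baseline specification in the subsequent analysis, in line with the k = 4 neighbourhoods used in the preceding sections.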
GWR
Geographically Weighted Regression (GWR) is estimated separately for Cigarette butts and Plastic bags to examine whether the relationships between litter intensity and site characteristics vary across space, rather than being constant along the coastline. The dependent variable is modelled as log(1 + Y), and the same set of non-collinear numeric covariates is used as in the global spatial models.
The local R² maps from the GWR models reveal spatial heterogeneity in model fit along the Latvian coastline. For Cigarette butts, local R² values are consistently high across most locations, indicating that local covariates explain litter intensity well in most coastal segments. For Plastic bags, local R² varies more strongly across space, with areas of both high and low explanatory power. These results suggest that the strength of relationships between litter accumulation and site characteristics is location-dependent, supporting the use of GWR as an exploratory complement to global spatial models.
Residual maps from the GWR models show no strong or systematic spatial clustering for either Cigarette butts or Plastic bags. Residuals are generally small in magnitude and spatially mixed, indicating that the GWR specification captures most locally varying relationships present in the data. Some minor localized deviations remain, but they do not form coherent spatial patterns along the coastline.
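The core idea of GWR, fitting a separate distance-weighted regression around each site, can be sketched with a single covariate. The example below uses a Gaussian kernel with a fixed bandwidth and the closed-form weighted least-squares solution. The data, the bandwidth, and the function name `local_fit` are hypothetical; the models above use several covariates, and the text does not state how their bandwidth was chosen.

```python
import math

def local_fit(sites, x, y, target, bandwidth):
    """Gaussian-weighted least squares of y on x, centred on one target site."""
    tx, ty = sites[target]
    w = [math.exp(-((sx - tx) ** 2 + (sy - ty) ** 2) / (2 * bandwidth ** 2))
         for sx, sy in sites]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope      # (local intercept, local slope)

# Hypothetical coast: ten sites in a row. The covariate effect is 2.0 at the
# first five sites (west) and 0.5 at the last five (east).
sites = [(float(i), 0.0) for i in range(10)]
x = [0.0, 1.0, 2.0, 1.0, 0.0, 2.0, 1.0, 0.0, 1.0, 2.0]
y = [2.0 * xi if i < 5 else 0.5 * xi for i, xi in enumerate(x)]

west_slope = local_fit(sites, x, y, target=1, bandwidth=1.5)[1]
east_slope = local_fit(sites, x, y, target=8, bandwidth=1.5)[1]
print(west_slope > east_slope)
```

The local slopes recover the two regimes because nearby sites dominate each fit. With only 44 sites and many covariates, local fits of this kind rest on few effective observations, which is one reason to treat GWR results as exploratory.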
Conclusion
This analysis shows that spatial variation in coastal litter accumulation is primarily driven by local site characteristics rather than by substantive spatial spillover effects. Once observable covariates related to beach management, accessibility, and tourism pressure are included, no significant positive residual spatial autocorrelation remains in any model specification. Sensitivity analyses indicate that these conclusions hold across alternative neighbourhood definitions and neighbourhood sizes, with SLX models under Euclidean kNN weights giving the lowest AIC. Overall, the findings suggest that site-specific characteristics, many of which are open to policy intervention, explain litter patterns better than broader spatial diffusion processes along the coastline.