Spatial Patterns of Beach Litter in Latvia: Evidence from Monitoring Site Data



Introduction


Marine and coastal litter constitutes an important environmental problem, with consequences for ecosystems, tourism, and coastal management. While many studies focus on aggregate litter counts or broad spatial patterns, less attention is paid to the distinction between the spatial arrangement of monitoring sites themselves and the spatial structure of observed litter intensity. In coastal settings, this distinction is particularly important, as monitoring locations are often distributed along shorelines rather than across a two-dimensional plane.
This study analyses site-level litter data collected at coastal monitoring locations in Latvia in 2024, with the objective of assessing whether observed spatial patterns in litter counts reflect genuine spatial dependence in the phenomenon of interest, or whether they are primarily driven by the geometry of site placement along the coast. To address this question, we combine descriptive analysis, point-pattern diagnostics, spatial autocorrelation tests, and spatial regression models within a unified and reproducible framework.



Data


The analysis distinguishes between several categories of coastal litter, each representing a specific type of commonly observed waste item. These categories include, among others, Cigarette butts, Plastic bags, and related packaging or consumer residues. Cigarette butts refer to discarded cigarette filters and remnants, which are among the most frequently recorded forms of coastal litter worldwide. Due to their small size, high abundance, and resistance to degradation, cigarette butts tend to accumulate in large numbers and are often associated with recreational beach use and nearby urban activity.
Each litter category is analysed separately, meaning that the dependent variable Y represents the site-level count of items belonging to a given category. This category-specific approach allows spatial patterns and potential clustering to differ across types of litter, reflecting distinct behavioural, environmental, and usage-related mechanisms rather than a single aggregated pollution process.

Descriptive Statistics of Litter Types
category                   n_sites  total    mean  median      sd  min  max  pct_zero
Cigarette butts                 44   4783  108.70    67.5  114.62    2  514      0.00
Plastic ropes                   44    588   13.36     8.5   14.53    1   70      0.00
Styrofoam and polystyrene       44    506   11.50     6.0   16.81    0  103      2.27
Plastic bags                    44    450   10.23     7.0    9.35    0   36      6.82
Plastic bottle caps             44    351    7.98     4.0   11.85    0   66     13.64
SUP dishes and cutlery          44    170    3.86     3.0    4.04    0   19     11.36
Beverage containers             44     66    1.50     1.0    1.77    0    7     38.64



Monitoring Sites Distribution



Clark Evans


In this step, the spatial configuration of the monitoring sites themselves is analysed using point-pattern methods. First, all site coordinates are transformed to a metric coordinate system (EPSG:3059) so that distances are measured in meters. The set of unique site locations is then represented as a planar point pattern (ppp) within an observation window corresponding to Latvia’s boundary. The Clark–Evans test is computed to assess whether site locations are more clustered or more regularly spaced than expected under complete spatial randomness; both the uncorrected version and a Donnelly edge-corrected version are calculated. Finally, Ripley’s K and L functions (with standard edge corrections) are computed to evaluate spatial dependence across multiple distance scales.
## 
##  Clark-Evans test
##  No edge correction
##  Z-test
## 
## data:  sites_ppp
## R = 0.35818, p-value = 1.771e-10
## alternative hypothesis: two-sided
## 
##  Clark-Evans test
##  Donnelly correction
##  Z-test
## 
## data:  sites_ppp_bb
## R = 0.32271, p-value < 2.2e-16
## alternative hypothesis: two-sided

The Clark–Evans test indicates strong clustering in the spatial arrangement of monitoring sites. In both the uncorrected and Donnelly edge-corrected versions, the estimated R statistic is well below unity (R ≈ 0.36 and R ≈ 0.32, respectively), and the null hypothesis of complete spatial randomness is decisively rejected. This result reflects the non-random placement of sites along the Latvian coastline rather than clustering in the underlying litter process.
It should be noted that the Donnelly edge correction applied in the Clark–Evans test is only an approximation in this context. The correction is derived for a rectangular observation window, which is presumably why the corrected test is run on a separate object (sites_ppp_bb) rather than on sites_ppp. The monitoring sites analysed here, however, are distributed primarily along the coastline, which constitutes a highly irregular and essentially one-dimensional sampling geometry embedded in two-dimensional space. As a result, even the edge-corrected Clark–Evans statistic should be interpreted with caution, as it may still reflect artefacts of site placement rather than meaningful spatial structure. This limitation further motivates the subsequent analysis of spatial autocorrelation in model residuals and the comparison of alternative neighbourhood definitions tailored to coastal settings.
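To make the R statistic concrete, the following minimal sketch computes the uncorrected Clark–Evans index as the observed mean nearest-neighbour distance divided by its expectation under complete spatial randomness, 1 / (2 * sqrt(n / area)). The function name and the toy coordinates are illustrative assumptions, not the data or code used in this study.

```python
import math

def clark_evans_R(points, area):
    """Uncorrected Clark-Evans index.

    R = (observed mean nearest-neighbour distance)
        / (expected mean under CSR = 1 / (2 * sqrt(n / area))).
    R < 1 suggests clustering, R > 1 suggests regular spacing.
    """
    n = len(points)
    nn_dist = []
    for i, p in enumerate(points):
        # distance from site i to its closest other site
        nn_dist.append(min(math.dist(p, q) for j, q in enumerate(points) if j != i))
    observed = sum(nn_dist) / n
    expected = 1.0 / (2.0 * math.sqrt(n / area))
    return observed / expected

# Hypothetical toy data inside a 100 x 100 window (area = 10000).
window_area = 100.0 * 100.0

# Sites strung along a line, as along a shoreline: tiny spacing relative to the window.
line_points = [(float(i), 50.0) for i in range(10, 30)]
R_line = clark_evans_R(line_points, window_area)

# Sites on a regular 10 x 10 grid covering the whole window.
grid_points = [(10.0 * i + 5.0, 10.0 * j + 5.0) for i in range(10) for j in range(10)]
R_grid = clark_evans_R(grid_points, window_area)

print(round(R_line, 3), round(R_grid, 3))
```

In this toy setting the line-shaped layout yields R far below 1 even though the sites are evenly spaced along the line, which mirrors the point made above: a coastal layout measured against a two-dimensional window looks clustered regardless of any underlying process.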


Ripley’s K function and the corresponding L function further confirm strong departures from complete spatial randomness in the spatial arrangement of monitoring sites. Across the entire range of distances examined, the empirical estimates of both K(r) and L(r) lie well above the theoretical Poisson benchmarks, indicating a persistent excess of neighbouring points relative to a random pattern.
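As an illustration of what is being compared with the Poisson benchmark, the sketch below implements a naive estimator of K(r) and its transform L(r) = sqrt(K(r) / pi), for which L(r) = r under complete spatial randomness. It omits the edge corrections used in the analysis, and the function names and toy coordinates are assumptions made for the example.

```python
import math

def ripley_K(points, area, r):
    """Naive Ripley's K at distance r, without edge correction:
    K(r) = area / (n * (n - 1)) * number of ordered pairs (i, j), i != j, with d_ij <= r."""
    n = len(points)
    pairs = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and math.dist(points[i], points[j]) <= r
    )
    return area * pairs / (n * (n - 1))

def besag_L(points, area, r):
    """L(r) = sqrt(K(r) / pi); equals r for a homogeneous Poisson pattern."""
    return math.sqrt(ripley_K(points, area, r) / math.pi)

# Hypothetical toy pattern: 20 sites along a line inside a 100 x 100 window.
line_points = [(float(i), 50.0) for i in range(10, 30)]
window_area = 100.0 * 100.0

for r in (2.0, 5.0, 10.0):
    # L(r) - r > 0 indicates more neighbours within r than expected under CSR
    print(r, round(besag_L(line_points, window_area, r) - r, 2))
```

For a line-shaped layout in a large window, L(r) exceeds r at every distance, which is the same qualitative picture as described above.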



HDBSCAN


In this step, density-based spatial clustering (HDBSCAN) is applied separately for each litter category to identify potential local concentrations of litter intensity. Clustering is performed on a three-dimensional space combining site coordinates and a normalized litter count, where extreme values are trimmed at the 95th percentile to reduce the influence of outliers. To distinguish clusters driven by litter intensity from those arising purely from site geometry, a control clustering is also estimated using spatial coordinates only. Summary tables report the number of detected clusters, clustered sites, and noise points for each category, allowing a direct comparison between intensity-informed and purely geometric clustering. This approach helps assess whether observed hotspots reflect genuine spatial patterns in litter accumulation or are primarily artefacts of the coastal sampling layout.
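The feature construction described above can be sketched as follows. Two things are assumptions here, since the text does not state them: "trimmed" is read as capping values at the 95th percentile, and normalization is read as rescaling to [0, 1]. The sketch also shows why the scale of the coordinates matters: if coordinates enter in meters while y_norm lies in [0, 1], distances in (x, y, y_norm) space are numerically almost identical to distances in (x, y) space. All names and values are hypothetical.

```python
import math

def cap_and_normalise(counts, q=0.95):
    """Cap counts at the q-th empirical quantile (linear interpolation),
    then rescale to [0, 1]."""
    s = sorted(counts)
    pos = q * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    cap = s[lo] + (s[hi] - s[lo]) * (pos - lo)
    capped = [min(c, cap) for c in counts]
    top = max(capped)
    return [c / top if top > 0 else 0.0 for c in capped]

# Hypothetical sites in a metric CRS (meters) with very different litter counts.
xy = [(300000.0, 400000.0), (301000.0, 400000.0), (350000.0, 420000.0)]
counts = [2, 514, 67]
y_norm = cap_and_normalise(counts)

# Distance between the first two sites, with and without the intensity dimension.
d_xy = math.dist(xy[0], xy[1])
d_xyz = math.dist((*xy[0], y_norm[0]), (*xy[1], y_norm[1]))
relative_change = (d_xyz - d_xy) / d_xy
print(d_xy, d_xyz, relative_change)
```

Under these assumptions the intensity dimension changes the distance by less than one part in a million, so a density-based algorithm would see essentially the same geometry in both feature spaces. Rescaling the coordinates to a range comparable with y_norm is one way to let intensity influence cluster membership.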

HDBSCAN (x,y,y_norm)
category                    n  n_noise  n_clustered  n_clusters  avg_prob
Beverage containers        44        4           40           2      0.41
Cigarette butts            44        4           40           2      0.41
Plastic bags               44        4           40           2      0.41
Plastic bottle caps        44        4           40           2      0.41
Plastic ropes              44        4           40           2      0.41
SUP dishes and cutlery     44        4           40           2      0.41
Styrofoam and polystyrene  44        4           40           2      0.41

HDBSCAN (x,y)
category                    n  n_noise_xy  n_clustered_xy  n_clusters_xy
Beverage containers        44           4              40              2
Cigarette butts            44           4              40              2
Plastic bags               44           4              40              2
Plastic bottle caps        44           4              40              2
Plastic ropes              44           4              40              2
SUP dishes and cutlery     44           4              40              2
Styrofoam and polystyrene  44           4              40              2



The HDBSCAN results reveal a highly consistent clustering structure across all seven litter categories. When clustering is performed using both spatial coordinates and normalized litter intensity (x, y, y_norm), exactly the same number of clusters and noise points is obtained as in the control specification based solely on spatial coordinates (x, y). The corresponding maps show that cluster boundaries are nearly identical across categories and do not align with variations in litter intensity, which is reflected only in point size rather than cluster membership. This indicates that the detected clusters are driven primarily by the geometric arrangement of monitoring sites along the coastline rather than by genuine spatial concentrations of litter.



KDE


In this step, weighted kernel density estimation (KDE) is used to visualize relative spatial hotspots of litter intensity for each category. All spatial data are projected to a metric coordinate system to ensure correct distance-based smoothing, and litter counts are normalized within each category after trimming extreme values at the 95th percentile to limit the influence of outliers. The KDE is computed on a regular grid using data-driven bandwidths and then masked to the national boundary of Latvia, producing continuous density surfaces that are directly comparable across categories.
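A weighted KDE of this kind can be sketched in a few lines: every site contributes a two-dimensional Gaussian bump scaled by its normalized litter count. The Gaussian kernel, the fixed bandwidth, and all names and values below are assumptions made for illustration; the bandwidth selector and grid used in the analysis are not reproduced here.

```python
import math

def weighted_kde(sites, weights, grid, bandwidth):
    """Weighted Gaussian kernel density evaluated at each grid location.

    density(g) = sum_i w_i * N(g; site_i, bandwidth^2 * I) / sum_i w_i
    """
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * sum(weights))
    density = []
    for g in grid:
        total = 0.0
        for s, w in zip(sites, weights):
            d2 = (g[0] - s[0]) ** 2 + (g[1] - s[1]) ** 2
            total += w * math.exp(-d2 / (2.0 * bandwidth ** 2))
        density.append(norm * total)
    return density

# Case 1 (hypothetical): two isolated sites, one with ten times the weight.
sites_a = [(0.0, 0.0), (10.0, 0.0)]
dens_a = weighted_kde(sites_a, [1.0, 0.1], grid=sites_a, bandwidth=2.0)

# Case 2 (hypothetical): equal weights, but three sites close together and one far away.
sites_b = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (50.0, 0.0)]
dens_b = weighted_kde(sites_b, [1.0, 1.0, 1.0, 1.0],
                      grid=[(1.0, 0.0), (50.0, 0.0)], bandwidth=2.0)

print(dens_a, dens_b)
```

The second case illustrates a general property of the estimator: because contributions from nearby sites add up, the surface is higher where sites are closely spaced even when all weights are equal, so hotspots reflect both litter intensity and site layout.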


The weighted kernel density estimates show broadly similar spatial patterns across all litter categories. In each case, the highest relative densities are concentrated in the same central coastal segment, while lower densities are observed toward the eastern and western edges of the study area.



Spatial Autocorrelation of Residuals (Moran’s I)


In this step, Moran’s I statistic is used to assess whether spatial autocorrelation remains in the model residuals for each litter category. For every category, a k-nearest-neighbours spatial weights matrix (k = 4) is constructed based on Euclidean distances between monitoring sites, and Moran’s I is computed using this neighbourhood structure.

Moran’s I (kNN k=4, euclidean)
category                    n  moran_I  p_value
Cigarette butts            44   0.2633   0.0010
Plastic bags               44   0.2361   0.0032
Styrofoam and polystyrene  44   0.0410   0.1792
Beverage containers        44   0.0546   0.2045
Plastic bottle caps        44   0.0175   0.3027
Plastic ropes              44  -0.0020   0.4060
SUP dishes and cutlery     44  -0.0359   0.5546


The Moran’s I results indicate statistically significant positive spatial autocorrelation for Cigarette butts and Plastic bags, with Moran’s I values of approximately 0.26 and 0.24, respectively, and p-values below 0.01. This suggests that higher (or lower) values of litter intensity tend to be spatially clustered for these two categories under a k-nearest-neighbours structure with k = 4. For the remaining categories, Moran’s I values are close to zero and statistically insignificant, indicating no detectable residual spatial dependence once the basic spatial structure has been accounted for.
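The statistic itself can be sketched as follows, with row-standardised k-nearest-neighbour weights built from Euclidean distances. The permutation-based p-value is an assumption chosen for simplicity (the text does not state how the reported p-values were obtained), and the function names and toy data are hypothetical.

```python
import math
import random

def knn_weights(points, k):
    """Row-standardised kNN weights: one {neighbour index: weight} dict per site."""
    W = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        W.append({j: 1.0 / k for j in others[:k]})
    return W

def morans_I(values, W):
    """I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = values - mean."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(w for row in W for w in row.values())
    num = sum(w * z[i] * z[j] for i, row in enumerate(W) for j, w in row.items())
    return (n / s0) * num / sum(zi * zi for zi in z)

def permutation_p_value(values, W, n_perm=999, seed=0):
    """One-sided pseudo p-value for positive autocorrelation (random relabelling)."""
    rng = random.Random(seed)
    observed = morans_I(values, W)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_I(shuffled, W) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical toy data: 20 sites on a line with a smooth trend in the values.
points = [(float(i), 0.0) for i in range(20)]
values = [float(i) for i in range(20)]
W = knn_weights(points, k=2)
I_obs = morans_I(values, W)
p_obs = permutation_p_value(values, W)
print(round(I_obs, 3), p_obs)
```

A smooth trend along the line gives a Moran's I close to 1, whereas values assigned to sites at random would give a value near -1 / (n - 1).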



Baseline Residual Moran’s I with Alternative Neighbourhood Definitions


This section tests whether the apparent spatial dependence in the litter data is real or an artefact of sampling along the coastline. A single continuous shoreline was extracted from Latvia’s boundary (EPSG:3059). Two spatial weights matrices were built for the monitoring sites: one based on straight-line (Euclidean) k-nearest neighbours (k = 4) and one based on distance along the coast (also k = 4). For Cigarette butts and Plastic bags, two simple models were fitted, an intercept-only model and a model with an intercept and covariates, and Moran’s I was calculated on their residuals under both weighting schemes to check for spatial autocorrelation.

Baseline residual Moran’s I: Euclidean vs coastal weights (k = 4)
category         W           model                        moran_I  p_value
Cigarette butts  coast(k=4)  Baseline 1: lm(y_log~1)      -0.0186   0.3926
Cigarette butts  coast(k=4)  Baseline 2: lm(y_log~1+X)    -0.0150   0.3290
Cigarette butts  eucl(k=4)   Baseline 1: lm(y_log~1)       0.2342   0.0039
Cigarette butts  eucl(k=4)   Baseline 2: lm(y_log~1+X)     0.0573   0.2008
Plastic bags     coast(k=4)  Baseline 1: lm(y_log~1)      -0.0086   0.2051
Plastic bags     coast(k=4)  Baseline 2: lm(y_log~1+X)    -0.0121   0.2701
Plastic bags     eucl(k=4)   Baseline 1: lm(y_log~1)       0.2557   0.0019
Plastic bags     eucl(k=4)   Baseline 2: lm(y_log~1+X)     0.0384   0.2610


Baseline Moran’s I results indicate that residual spatial autocorrelation depends strongly on how spatial neighbourhoods are defined. Using Euclidean k-nearest-neighbour weights (k = 4), residuals from the intercept-only model exhibit positive and statistically significant spatial autocorrelation for both Cigarette butts (I = 0.23, p = 0.004) and Plastic bags (I = 0.26, p = 0.002). Once covariates are included, Moran’s I falls to 0.06 and 0.04, respectively, and is no longer significant (p > 0.20). Importantly, when neighbourhoods are defined along the coastline rather than by straight-line distance, Moran’s I is close to zero and not statistically significant in any baseline specification, including the intercept-only model. This suggests that the apparent spatial autocorrelation observed under Euclidean weights is primarily driven by the geometry of site placement, rather than by genuine spatial spillovers in litter accumulation.


The comparison highlights a fundamental difference between Euclidean and coastline-based neighbourhood definitions. Under Euclidean kNN weights, neighbouring sites are connected by straight-line distance, which often links locations across bays or inlets, producing connections that cut through open water. In contrast, the coastline-based weights restrict neighbourhoods to proximity along the shoreline, resulting in connections that follow the coastal geometry. As a consequence, the Euclidean specification may induce apparent spatial dependence driven by geometric shortcuts, whereas the coastal specification better reflects the actual spatial continuity of monitoring locations.
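The difference between the two neighbourhood definitions can be illustrated with a small sketch. Each site is projected onto a shoreline polyline, its distance along the line (chainage) is recorded, and along-shore distance is taken as the absolute difference in chainage. The projection approach, the function names, and the toy bay geometry are assumptions made for the example; the construction actually used in the analysis is not described in detail.

```python
import math

def chainage(point, shoreline):
    """Distance along the polyline `shoreline` to the closest projection of `point`."""
    best_d2, best_s = float("inf"), 0.0
    run = 0.0  # cumulative length up to the start of the current segment
    for (ax, ay), (bx, by) in zip(shoreline, shoreline[1:]):
        dx, dy = bx - ax, by - ay
        seg2 = dx * dx + dy * dy
        seg = math.sqrt(seg2)
        t = 0.0
        if seg2 > 0:
            t = ((point[0] - ax) * dx + (point[1] - ay) * dy) / seg2
            t = max(0.0, min(1.0, t))  # clamp the projection to the segment
        px, py = ax + t * dx, ay + t * dy
        d2 = (point[0] - px) ** 2 + (point[1] - py) ** 2
        if d2 < best_d2:
            best_d2, best_s = d2, run + t * seg
        run += seg
    return best_s

def knn_neighbours(dist, n, k):
    """Indices of the k nearest neighbours of each site under dist(i, j)."""
    return [sorted((j for j in range(n) if j != i), key=lambda j: dist(i, j))[:k]
            for i in range(n)]

# Hypothetical narrow bay: shoreline runs down one side, across, and up the other.
shoreline = [(0.0, 10.0), (0.0, 0.0), (1.0, 0.0), (1.0, 10.0)]
sites = [(0.0, 9.0),   # A: left shore
         (1.0, 9.0),   # B: right shore, directly across the water from A
         (0.0, 6.0)]   # C: left shore, further along from A

s = [chainage(p, shoreline) for p in sites]
eucl_nn = knn_neighbours(lambda i, j: math.dist(sites[i], sites[j]), len(sites), k=1)
coast_nn = knn_neighbours(lambda i, j: abs(s[i] - s[j]), len(sites), k=1)
print(eucl_nn, coast_nn)
```

In this toy bay the Euclidean nearest neighbour of site A is site B across the water, while the along-shore nearest neighbour is site C on the same stretch of coast, which is exactly the kind of shortcut discussed above.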



Spatial model comparison under alternative neighbourhood definitions


Spatial regression models are fitted separately for Cigarette butts and Plastic bags to check whether spatial dependence persists after controlling for covariates and whether results depend on neighbourhood definition. The response is log(1 + Y); a set of non-collinear numeric covariates is selected. Two k=4 nearest-neighbour weights are used: standard Euclidean and coastline-based (along-shore distance). For each weight matrix, SAR, SEM, and SLX models are estimated and compared by AIC; the best specification is chosen per weight. Moran’s I is calculated on residuals to test for remaining spatial autocorrelation.
The spatial regression models include the following site-level covariates describing beach management, accessibility, surrounding infrastructure, and tourism pressure:
X1_BathingZoneManag – The site is located next to an officially designated bathing zone, which is subject to a stricter management regime (e.g. smoking bans, enhanced monitoring, more frequent clean-ups).
X2_BlueFlag – The site is located on a Blue Flag beach, which must meet specific international standards related to environmental management, cleanliness, and litter prevention.
X3_GarbageBin – A garbage container is available on the beach during the season and located within approximately 200 meters of the site in managed beach areas.
X4_BeachCleanup – Intensity and regularity of municipal beach clean-up activities (e.g. mainly before and after the summer season, with additional cleaning during the season).
X5_BeachCafe – Presence of beach-side commercial facilities, such as a café, restaurant, or kiosk, operating near the site during the summer season.
X6_CoastCafe – Presence of coastal commercial facilities (e.g. cafés or restaurants) located close to the site but not directly on the beach.
X7_Accommod – Presence of tourist accommodation facilities (e.g. campsite, guest house, or hotel) in the vicinity of the site.
X8_Parking – Availability of car parking near the site, either directly adjacent or within approximately 300 meters.
X9_Shops – Presence of retail shops near the site (within about 300 meters or located along the main access route to the beach).
X10_PublTransport – Access to public transport, defined as the presence of a nearby local or intercity transport stop from which the beach can be reached directly.
X11_BeachUseCateg – Beach use intensity category (urban, semi-urban, remote), based on standard HELCOM and EU classifications.
X12_LocalResidents – Number of inhabitants in the parish or city where the site is located (population data for 2024, as of January 2025).
X13_TouristNights2024 – Total number of tourist overnight stays (domestic and foreign) recorded in accommodation facilities within the county or major city where the site is located in 2024.
X14_LVSeaBasin – Sea basin of the Latvian marine waters in which the site is located.
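Of the three specifications, SLX is the simplest to illustrate because it is an ordinary regression with spatially lagged covariates added: y = X*beta + W*X*theta + error (SAR instead adds the lagged response W*y, and SEM places the spatial structure in the error term). The sketch below fits an SLX model by least squares on toy data and compares its Gaussian AIC with a non-spatial regression. Everything in it is a hypothetical illustration: the single covariate, the ring-shaped neighbourhood, the coefficients, and the helper names are assumptions, and the AIC convention may differ from the one behind the reported values.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for q in range(c, n + 1):
                M[r][q] -= f * M[c][q]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][q] * x[q] for q in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """Least squares via the normal equations; returns (coefficients, Gaussian AIC)."""
    n, p = len(y), len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    beta = solve(XtX, Xty)
    rss = sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
              for row, yi in zip(X, y))
    # log-likelihood at the ML variance estimate; one extra parameter for sigma^2
    loglik = -0.5 * n * (math.log(2.0 * math.pi * rss / n) + 1.0)
    return beta, 2.0 * (p + 1) - 2.0 * loglik

def spatial_lag(values, neighbours):
    """Row-standardised spatial lag: the mean value over each site's neighbours."""
    return [sum(values[j] for j in nb) / len(nb) for nb in neighbours]

# Hypothetical toy data: 12 sites on a ring, each with its two adjacent sites as neighbours.
n = 12
neighbours = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
x = [float((i * 7) % 5) for i in range(n)]
wx = spatial_lag(x, neighbours)
noise = [0.1 * ((i * 3) % 4 - 1.5) for i in range(n)]
y = [1.0 + 0.5 * xi + 2.0 * wxi + e for xi, wxi, e in zip(x, wx, noise)]

beta_ols, aic_ols = ols([[1.0, xi] for xi in x], y)
beta_slx, aic_slx = ols([[1.0, xi, wxi] for xi, wxi in zip(x, wx)], y)
print(beta_slx, aic_ols, aic_slx)
```

Note that SLX adds one lagged term per covariate, so with a long covariate list and a small number of sites the number of estimated parameters grows quickly; a lower AIC should therefore be read together with the residual diagnostics.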

Diagnostics
category         W           model       AIC  moran_I  p_value
Cigarette butts  eucl(k=4)   SAR    116.1212   0.0586   0.1970
Cigarette butts  eucl(k=4)   SEM    115.0887  -0.0061   0.4292
Cigarette butts  eucl(k=4)   SLX     79.3925  -0.1937   0.9645
Cigarette butts  coast(k=4)  SAR    116.0193  -0.0144   0.3189
Cigarette butts  coast(k=4)  SEM    115.6527  -0.0121   0.2799
Plastic bags     eucl(k=4)   SAR    114.6365  -0.0501   0.6098
Plastic bags     eucl(k=4)   SEM    116.3717  -0.0042   0.4217
Plastic bags     eucl(k=4)   SLX    104.9446  -0.2481   0.9901
Plastic bags     coast(k=4)  SAR    116.4141  -0.0093   0.2215
Plastic bags     coast(k=4)  SEM    116.3542  -0.0155   0.3353

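Once covariates are included, no statistically significant positive residual spatial autocorrelation remains in any of the fitted specifications (all p > 0.19). Under Euclidean weights, the SLX model has by far the lowest AIC (79.4 for Cigarette butts and 104.9 for Plastic bags, compared with roughly 115 to 116 for SAR and SEM), although its residual Moran’s I is clearly negative (-0.19 and -0.25). Under coastline-based weights, SAR and SEM fit almost equally well.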


Comparing fitted values from the SLX model under Euclidean and coastline-based neighbourhood definitions reveals nearly identical spatial patterns for both Cigarette butts and Plastic bags. In both cases, predicted litter intensity varies smoothly along the coastline and is characterized by localized peaks rather than large-scale gradients. Importantly, changing the definition of spatial proximity from straight-line distance to along-shore distance does not alter the location or relative magnitude of these peaks. This indicates that the fitted spatial variation in litter accumulation is robust to the choice of spatial weights and is not an artefact of the geometric shortcuts introduced by Euclidean distance.
Residuals display no spatial clustering under either Euclidean or coastline-based weights, confirming that spatial dependence has been effectively removed by the selected models.


Observed–fitted plots show that models estimated with Euclidean neighbourhoods provide a better predictive fit than those using coastline-based weights. Under Euclidean kNN, fitted values are more tightly aligned with the 45-degree reference line, particularly for low and medium litter counts, while coastal weights exhibit greater dispersion. Despite this difference in predictive accuracy, the main spatial patterns and substantive conclusions remain robust across neighbourhood definitions.



Sensitivity Check


To assess the robustness of spatial model results to the choice of neighbourhood size, a sensitivity analysis was conducted by varying the number of nearest neighbours from k = 3 to k = 6. For each litter category and each value of k, spatial models (SAR, SEM, and SLX) were re-estimated under both Euclidean and coastline-based weight definitions, and the best-performing specification was selected using AIC. Moran’s I was then computed on residuals from the selected model to verify whether spatial autocorrelation remained.

Sensitivity (k = 3, 4, 5, 6): best model by AIC
category         W           k  model       AIC  moran_I  p_value
Cigarette butts  coast(k=3)  3  SAR    115.5469   0.0008   0.1396
Cigarette butts  coast(k=4)  4  SEM    115.6527  -0.0121   0.2799
Cigarette butts  coast(k=5)  5  SEM    115.9209  -0.0094   0.2026
Cigarette butts  coast(k=6)  6  SAR    116.0778  -0.0019   0.0750
Cigarette butts  eucl(k=3)   3  SLX     88.9981  -0.1385   0.8471
Cigarette butts  eucl(k=4)   4  SLX     79.3925  -0.1937   0.9645
Cigarette butts  eucl(k=5)   5  SLX     94.3399  -0.2094   0.9855
Cigarette butts  eucl(k=6)   6  SLX    102.7889  -0.1806   0.9793
Plastic bags     coast(k=3)  3  SEM    116.2326  -0.0157   0.3605
Plastic bags     coast(k=4)  4  SEM    116.3542  -0.0155   0.3353
Plastic bags     coast(k=5)  5  SEM    114.2739  -0.0168   0.3458
Plastic bags     coast(k=6)  6  SEM    116.7706  -0.0099   0.1772
Plastic bags     eucl(k=3)   3  SLX     70.8104  -0.2357   0.9725
Plastic bags     eucl(k=4)   4  SLX    104.9446  -0.2481   0.9901
Plastic bags     eucl(k=5)   5  SLX    100.0741  -0.2239   0.9909
Plastic bags     eucl(k=6)   6  SLX    108.2994  -0.1616   0.9637

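Sensitivity analysis indicates that the choice of neighbourhood size does not change the main modelling conclusions. Under Euclidean kNN weights, the SLX specification is selected as the best-performing model for every tested value of k (3 to 6) and shows no positive residual spatial autocorrelation, although its AIC varies considerably with k (79.4 to 102.8 for Cigarette butts and 70.8 to 108.3 for Plastic bags). Under coastline-based weights, the selected model alternates between SAR and SEM, with AIC values between 114.3 and 116.8 and no significant residual autocorrelation. Based on this stability and on standard practice, the SLX model with k = 4 is adopted as the baseline specification in the subsequent analysis.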



GWR


Geographically Weighted Regression (GWR) is estimated separately for Cigarette butts and Plastic bags to examine whether the relationships between litter intensity and site characteristics vary across space, rather than being constant along the coastline. The dependent variable is modelled as log(1 + Y), and the same set of non-collinear numeric covariates is used as in the global spatial models.
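The idea behind GWR can be sketched as a separate weighted least-squares fit at every site, with weights that decay with distance from that site. The version below is deliberately reduced to one covariate, a Gaussian kernel, and a fixed bandwidth, and it reports local R2 as the weighted squared correlation, which is one common definition. These choices, the function name, and the toy data are assumptions for illustration and do not reproduce the specification estimated in this study.

```python
import math

def gwr_local_fit(coords, x, y, bandwidth):
    """At each site, fit y ~ a + b * x by weighted least squares with Gaussian
    distance-decay weights centred on that site.
    Returns (local slopes, local R2 values)."""
    slopes, local_r2 = [], []
    for c in coords:
        w = [math.exp(-0.5 * (math.dist(c, d) / bandwidth) ** 2) for d in coords]
        sw = sum(w)
        xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
        syy = sum(wi * (yi - ym) ** 2 for wi, yi in zip(w, y))
        sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
        slopes.append(sxy / sxx)
        local_r2.append(sxy * sxy / (sxx * syy))  # weighted squared correlation
    return slopes, local_r2

# Hypothetical toy data: 20 sites on a line; the covariate raises the response
# in the first half of the transect and lowers it in the second half.
coords = [(float(i), 0.0) for i in range(20)]
x = [float((i * 7) % 5) for i in range(20)]
y = [xi if i < 10 else -xi for i, xi in enumerate(x)]

slopes, local_r2 = gwr_local_fit(coords, x, y, bandwidth=2.0)
print([round(b, 2) for b in slopes])
```

A single global regression would average the two regimes into a slope near zero, whereas the local fits recover a slope close to +1 at one end and close to -1 at the other. With few sites carrying appreciable weight in each local fit, local R2 can be high simply because the fit has few effective degrees of freedom, so it is best read as an exploratory diagnostic.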


The Local R2 maps from the GWR models reveal clear spatial heterogeneity in model fit along the Latvian coastline. For Cigarette butts, Local R2 values are consistently high across most locations, indicating that local covariates explain litter intensity well in many coastal segments. For Plastic bags, Local R2 varies more strongly across space, with areas of both high and lower explanatory power. These results suggest that the strength of relationships between litter accumulation and site characteristics is location-dependent, supporting the use of GWR as an exploratory complement to global spatial models.


Residual maps from the GWR models show no strong or systematic spatial clustering for either Cigarette butts or Plastic bags. Residuals are generally small in magnitude and spatially mixed, indicating that the GWR specification captures most locally varying relationships present in the data. Some minor localized deviations remain, but they do not form coherent spatial patterns along the coastline.



Conclusion


This analysis shows that spatial variation in coastal litter accumulation is primarily driven by local site characteristics rather than by substantive spatial spillover effects. Once observable covariates related to beach management, accessibility, and tourism pressure are included, residual spatial autocorrelation disappears across model specifications. Sensitivity analyses confirm that results are robust to alternative neighbourhood definitions and choices of neighbourhood size, with SLX models under Euclidean kNN weights providing the best and most stable fit. Overall, the findings suggest that policy-relevant, site-specific interventions are more important for explaining litter patterns than broader spatial diffusion processes along the coastline.