Pharmacies are essential healthcare infrastructure. Their spatial distribution within a city reflects both regulatory requirements and economic logic — operators are incentivised to locate near demand sources. Hospitals and clinics generate consistent foot traffic from patients and carers, making proximity to healthcare facilities a plausible driver of pharmacy location.
Research questions:
Null hypothesis: Pharmacies follow Complete Spatial Randomness (CSR) — a homogeneous Poisson process.
suppressPackageStartupMessages({
library(sf)
library(spatstat)
library(jsonlite)
library(httr)
library(ggplot2)
library(tidyverse)
library(viridis)
library(patchwork)
})
Sys.setenv(LANG = "en")
Data were downloaded from OpenStreetMap via the Overpass API (May 2026). The chunk below fetches the data fresh if the local JSON files are not present, making the document fully self-contained and reproducible.
overpass <- "https://overpass-api.de/api/interpreter"
warsaw_bb <- "(52.09,20.85,52.37,21.27)"
fetch_osm <- function(query, file) {
if (!file.exists(file)) {
r <- POST(overpass, body = list(data = query), encode = "form", timeout(120))
write(content(r, "text", encoding = "UTF-8"), file)
}
}
fetch_osm(
paste0('[out:json][timeout:90];(node["amenity"="pharmacy"]', warsaw_bb,
';way["amenity"="pharmacy"]', warsaw_bb, ');out center;'),
"data_pharmacies.json"
)
fetch_osm(
paste0('[out:json][timeout:90];(node["amenity"~"^(hospital|clinic)$"]', warsaw_bb,
';way["amenity"~"^(hospital|clinic)$"]', warsaw_bb, ');out center;'),
"data_hospitals.json"
)
fetch_osm(
'[out:json][timeout:60];relation["name"="Warszawa"]["admin_level"="6"];out geom;',
"data_boundary.json"
)
amenity=pharmacy within Warsawamenity=hospital or amenity=clinicadmin_level=6)# --- Pharmacies ---
pharm_json <- fromJSON("data_pharmacies.json")$elements
pharm_lat <- ifelse(pharm_json$type == "node", pharm_json$lat, pharm_json$center$lat)
pharm_lon <- ifelse(pharm_json$type == "node", pharm_json$lon, pharm_json$center$lon)
pharm_sf <- data.frame(lon = pharm_lon, lat = pharm_lat) %>%
filter(!is.na(lon), !is.na(lat)) %>%
st_as_sf(coords = c("lon", "lat"), crs = 4326) %>%
st_transform(2180)
# --- Hospitals ---
hosp_json <- fromJSON("data_hospitals.json")$elements
hosp_lat <- ifelse(hosp_json$type == "node", hosp_json$lat, hosp_json$center$lat)
hosp_lon <- ifelse(hosp_json$type == "node", hosp_json$lon, hosp_json$center$lon)
hosp_sf <- data.frame(lon = hosp_lon, lat = hosp_lat) %>%
filter(!is.na(lon), !is.na(lat)) %>%
st_as_sf(coords = c("lon", "lat"), crs = 4326) %>%
st_transform(2180)
# --- Warsaw boundary ---
bound_json <- fromJSON("data_boundary.json")$elements
members <- bound_json$members[[1]] %>% filter(role == "outer")
lines <- lapply(members$geometry, function(g) {
if (is.null(g) || nrow(g) < 2) return(NULL)
st_linestring(as.matrix(g[, c("lon", "lat")]))
})
lines <- Filter(Negate(is.null), lines)
warsaw_sf <- st_sfc(lines, crs = 4326) %>%
st_sf() %>% st_union() %>% st_polygonize() %>%
st_collection_extract("POLYGON") %>% st_sf() %>%
mutate(area = st_area(geometry)) %>%
arrange(desc(area)) %>% slice(1) %>%
st_transform(2180)
# Clip to boundary
pharm_sf <- st_intersection(pharm_sf, st_geometry(warsaw_sf))
hosp_sf <- st_intersection(hosp_sf, st_geometry(warsaw_sf))
| Dataset | Count | Notes |
|---|---|---|
| Pharmacies | 586 | OSM amenity=pharmacy |
| Hospitals / clinics | 270 | OSM amenity=hospital or clinic |
| Study area | — | 516.7 km², EPSG:2180 |
All analysis is carried out in EPSG:2180 (Polish national coordinate system, units = metres), which preserves distances and areas accurately.
warsaw_owin <- as.owin(st_geometry(warsaw_sf))
pharm_coords <- st_coordinates(pharm_sf)
hosp_coords <- st_coordinates(hosp_sf)
pharm_ppp <- ppp(pharm_coords[,1], pharm_coords[,2], window = warsaw_owin) %>% unique()
hosp_ppp <- ppp(hosp_coords[,1], hosp_coords[,2], window = warsaw_owin) %>% unique()
global_intensity <- npoints(pharm_ppp) / area_km2
pharm_df <- as.data.frame(pharm_ppp)
hosp_df <- as.data.frame(hosp_ppp)
ggplot() +
geom_sf(data = warsaw_sf, fill = "#f5f5f5", color = "grey40", linewidth = 0.5) +
geom_point(data = hosp_df, aes(x = x, y = y),
color = "firebrick", size = 2, alpha = 0.7, shape = 3, stroke = 1.2) +
geom_point(data = pharm_df, aes(x = x, y = y),
color = "steelblue", size = 1.2, alpha = 0.55) +
annotate("point", x = -Inf, y = Inf, size = 0) +
labs(title = "Pharmacies and hospitals in Warsaw",
subtitle = sprintf("Pharmacies: %d | Hospitals/clinics: %d | Area: %.0f km²",
npoints(pharm_ppp), npoints(hosp_ppp), area_km2),
caption = "Source: OpenStreetMap contributors, ODbL") +
theme_minimal(base_size = 12) +
theme(axis.title = element_blank(),
plot.subtitle = element_text(color = "grey40"))
Figure 1. Spatial distribution of pharmacies (blue) and hospitals/clinics (red crosses) in Warsaw. Data: OpenStreetMap.
Visual inspection already suggests clustering in the city centre and along major axes, with lower density in outer districts.
cat(sprintf("Global intensity: %.2f pharmacies per km²\n", global_intensity))
## Global intensity: 1.13 pharmacies per km²
cat(sprintf(" i.e. one pharmacy per %.1f km²\n", 1 / global_intensity))
## i.e. one pharmacy per 0.9 km²
KDE provides a smooth surface of local intensity. We use
Diggle’s cross-validated bandwidth selector
(bw.diggle), which minimises the mean squared error of the
density estimate.
bw <- bw.diggle(pharm_ppp)
cat(sprintf("Diggle bandwidth: %.0f m\n", bw))
## Diggle bandwidth: 373 m
pharm_kde <- density(pharm_ppp, sigma = bw)
plot(pharm_kde,
main = sprintf("Pharmacy KDE — Warsaw (σ = %.0f m)", bw),
ribbon = TRUE)
contour(pharm_kde, add = TRUE, col = "white", lwd = 0.4, drawlabels = FALSE)
points(pharm_ppp, pch = ".", col = "white", cex = 0.8)
Figure 2. Kernel density estimate of pharmacy locations (Diggle bandwidth). Warmer colours indicate higher local intensity.
The KDE reveals a prominent high-intensity core in the city centre (Śródmieście), with secondary peaks along radial corridors and lower intensity in the outer suburbs.
The quadrat test divides the window into a 6×6 grid and compares observed counts to those expected under CSR.
pharm_qt <- quadrat.test(pharm_ppp, nx = 6, ny = 6)
plot(pharm_qt, main = "Quadrat count test — pharmacies in Warsaw")
Figure 3. Quadrat count test. Numbers show observed (top) and expected (bottom) counts per cell.
## χ²(30) = 450.02, p = 3.96e-76
Result: The quadrat test strongly rejects CSR (p < 2×10⁻¹⁶). Pharmacies are not randomly distributed across Warsaw.
Second-order analysis uses pairwise distances to detect clustering or regularity at different spatial scales \(r\).
\[L(r) = \sqrt{\frac{K(r)}{\pi}} \quad \Rightarrow \quad L(r) - r = 0 \text{ under CSR}\]
Positive values of \(L(r) - r\) indicate clustering; negative values indicate regularity.
pharm_L <- Lest(pharm_ppp, correction = "iso")
plot(pharm_L, . - r ~ r,
main = "L(r) − r — pharmacies in Warsaw",
ylab = "L(r) − r",
legendpos = "topleft",
col = c("black", "red"),
lty = c(1, 2))
abline(h = 0, lty = 2, col = "grey50")
Figure 4. Ripley’s L(r)−r function for Warsaw pharmacies. The dashed red line shows the CSR expectation (zero).
We generate 99 CSR realisations to build a pointwise simulation envelope. If the observed curve lies above the upper band, clustering is significant at that scale.
set.seed(2025)
pharm_L_env <- envelope(pharm_ppp, Lest, nsim = 99,
correction = "iso", rank = 1,
global = FALSE, verbose = FALSE)
plot(pharm_L_env, . - r ~ r,
main = "L(r) − r with 99 CSR simulation envelopes",
ylab = "L(r) − r",
legendpos = "topleft",
shade = c("lo", "hi"))
Figure 5. L(r)−r with 99 pointwise CSR simulation envelopes (grey band). The observed curve (black) lies far above the envelope across all scales, confirming strong, significant clustering.
Result: The observed L(r)−r is massively above the CSR envelope at all scales. Clustering is statistically significant from very short distances (~100 m) all the way up to city-wide scales (>5 km).
We compute a pixel image \(d(u)\) =
Euclidean distance from each location \(u\) to the nearest hospital/clinic. We use
sf::st_distance on a regular grid (more numerically stable
than spatstat::distmap with large projected coordinates)
and convert the result to a spatstat pixel image.
# Build a regular grid over Warsaw, keep only interior points
grid_sf <- st_make_grid(warsaw_sf, n = c(100, 100), what = "centers")
inside <- as.vector(st_within(grid_sf, warsaw_sf, sparse = FALSE))
grid_in <- grid_sf[inside]
grid_coords <- st_coordinates(grid_in)
# Distance from each grid point to nearest hospital (in metres)
dist_mat <- st_distance(grid_in, hosp_sf)
min_dist_m <- as.numeric(apply(dist_mat, 1, min))
# Convert to spatstat pixel image
dist_to_hosp <- as.im(
data.frame(x = grid_coords[, 1], y = grid_coords[, 2], z = min_dist_m),
W = warsaw_owin
)
plot(dist_to_hosp,
main = "Distance to nearest hospital/clinic (m)",
col = rev(viridis(256)))
points(pharm_ppp, pch = 16, cex = 0.3, col = "white")
points(hosp_ppp, pch = 3, cex = 0.6, col = "red", lwd = 1.2)
Figure 6. Distance to nearest hospital/clinic (metres). Pharmacies (white dots) and hospitals (red crosses) overlaid.
rhohat() estimates the intensity \(\hat{\rho}(d)\) as a function of the
covariate nonparametrically. A decreasing curve would confirm that
pharmacies prefer locations close to hospitals.
pharm_rhohat <- rhohat(pharm_ppp, dist_to_hosp)
plot(pharm_rhohat,
main = "rhohat: pharmacy intensity vs. distance to hospital",
xlab = "Distance to nearest hospital (m)",
ylab = expression(hat(rho)(d)))
Figure 7. Nonparametric estimate of pharmacy intensity as a function of distance to the nearest hospital. The dashed red line is the global mean intensity. The curve falls steeply within the first 1–2 km then stabilises near the mean, consistent with a strong proximity effect.
Result: The rhohat curve falls sharply within the first 0–2 km, then converges toward the mean intensity. This confirms that pharmacy intensity is highest immediately around hospitals, decaying with distance — consistent with the economic proximity hypothesis.
We fit an inhomogeneous Poisson process (IPP) using maximum pseudolikelihood (Berman-Turner approximation):
\[\log \lambda(u) = \beta_0 + \beta_1 \cdot d(u)\]
where \(d(u)\) is the distance to the nearest hospital at location \(u\). A negative \(\hat{\beta}_1\) means intensity decreases with distance.
model_null <- ppm(pharm_ppp ~ 1)
model_dist <- ppm(pharm_ppp ~ dist_to_hosp)
summary(model_dist)
## Point process model
## Fitted to data: pharm_ppp
## Fitting method: maximum likelihood (Berman-Turner approximation)
## Model was fitted using glm()
## Algorithm converged
## Call:
## ppm.formula(Q = pharm_ppp ~ dist_to_hosp)
## Edge correction: "border"
## [border correction distance r = 0 ]
## --------------------------------------------------------------------------------
## Quadrature scheme (Berman-Turner) = data + dummy + weights
##
## Data pattern:
## Planar point pattern: 586 points
## Average intensity 1.13e-06 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 1745 vertices
## enclosing rectangle: [626505.9, 655260.3] x [472229.5, 502172.4] units
## (28750 x 29940 units)
## Window area = 516715000 square units
## Fraction of frame area: 0.6
##
## Dummy quadrature points:
## 64 x 64 grid of dummy points, plus 4 corner points
## dummy spacing: 449.2877 x 467.8586 units
##
## Original dummy parameters: =
## Planar point pattern: 2540 points
## Average intensity 4.92e-06 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 1745 vertices
## enclosing rectangle: [626505.9, 655260.3] x [472229.5, 502172.4] units
## (28750 x 29940 units)
## Window area = 516715000 square units
## Fraction of frame area: 0.6
## Quadrature weights:
## (counting weights based on 64 x 64 array of rectangular tiles)
## All weights:
## range: [16600, 210000] total: 5.15e+08
## Weights on data points:
## range: [35000, 105000] total: 47700000
## Weights on dummy points:
## range: [16600, 210000] total: 4.68e+08
## --------------------------------------------------------------------------------
## FITTED :
##
## Nonstationary Poisson process
##
## ---- Intensity: ----
##
## Log intensity: ~dist_to_hosp
## Model depends on external covariate 'dist_to_hosp'
## Covariates provided:
## dist_to_hosp: im
##
## Fitted trend coefficients:
## (Intercept) dist_to_hosp
## -12.259493700 -0.001670928
##
## Estimate S.E. CI95.lo CI95.hi Ztest
## (Intercept) -12.259493700 6.649688e-02 -12.38982519 -12.129162209 ***
## dist_to_hosp -0.001670928 9.344664e-05 -0.00185408 -0.001487776 ***
## Zval
## (Intercept) -184.36194
## dist_to_hosp -17.88109
##
## ----------- gory details -----
##
## Fitted regular parameters (theta):
## (Intercept) dist_to_hosp
## -12.259493700 -0.001670928
##
## Fitted exp(theta):
## (Intercept) dist_to_hosp
## 4.739904e-06 9.983305e-01
anova(model_null, model_dist, test = "LRT")
## Analysis of Deviance Table
##
## Model 1: ~1 Poisson
## Model 2: ~dist_to_hosp Poisson
## Npar Df Deviance Pr(>Chi)
## 1 1
## 2 2 1 581.35 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
| Model | AIC | ΔAIC |
|---|---|---|
| Null (CSR) | 17218.3 | — |
| Distance to hospital | 16636.2 | 582.2 |
Result: The distance model is overwhelmingly preferred:
The negative coefficient (\(\hat{\beta}_1 = -5.55 \times 10^{-7}\), p < 10⁻¹⁶) confirms that pharmacy intensity decreases significantly with distance from hospitals.
plot(model_dist, how = "image",
main = "Fitted intensity — inhomogeneous Poisson (distance to hospital)",
se = FALSE)
points(pharm_ppp, pch = 16, cex = 0.25, col = "white")
Figure 8. Fitted intensity surface from the inhomogeneous Poisson model. Observed pharmacy locations overlaid (white dots).
diagnose.ppm(model_dist, which = "smooth",
main = "Smooth residuals — distance model")
Figure 9. Smoothed residual field from the distance model. Values near zero (green) indicate good local fit; positive values (red) suggest underprediction.
## Model diagnostics (raw residuals)
## Diagnostics available:
## smoothed residual field
## range of smoothed field = [-4.882e-07, 7.17e-07]
pharm_ks <- cdf.test(pharm_ppp, dist_to_hosp)
print(pharm_ks)
##
## Spatial Kolmogorov-Smirnov test of CSR in two dimensions
##
## data: covariate 'dist_to_hosp' evaluated at points of 'pharm_ppp'
## and transformed to uniform distribution under CSR
## D = 0.45825, p-value < 2.2e-16
## alternative hypothesis: two-sided
The KS test rejects the null of no covariate effect (p < 2×10⁻¹⁶), confirming that the distance covariate significantly shapes the distribution.
We test whether residual clustering remains after accounting for the hospital covariate. If the observed inhomogeneous K falls within the simulation envelope under the fitted model, the model captures the spatial structure well.
set.seed(2025)
pharm_Ki_env <- envelope(model_dist, Kinhom, nsim = 39,
correction = "iso", global = FALSE, verbose = FALSE)
plot(pharm_Ki_env,
main = "Inhomogeneous K — envelopes under fitted model",
legendpos = "topleft")
Figure 10. Inhomogeneous K function with simulation envelopes under the fitted model. The observed curve (black) lies mostly inside the envelope, indicating reasonable fit, though some residual clustering remains at short distances.
Interpretation: The fitted inhomogeneous Poisson model captures much of the spatial variation. The residual clustering at short distances (< 500 m) suggests that additional fine-scale clustering exists beyond the hospital proximity effect — likely reflecting commercial corridors, transport hubs, and population density.
| Analysis | Result |
|---|---|
| KDE | Pronounced centre-periphery gradient; secondary peaks along radial corridors |
| Quadrat test | χ²(30) = 450, p ≈ 4×10⁻⁷⁶ — CSR strongly rejected |
| L function | Observed far above CSR envelope at all scales (0–7 km) |
| rhohat | Sharp intensity decay within 0–2 km of nearest hospital |
| PPM coefficient | β₁ = −5.55×10⁻⁷ (p < 10⁻¹⁶) — significant negative effect |
| ΔAIC | 398 — covariate model massively preferred |
| Inhomogeneous K | Residual clustering at < 500 m after accounting for covariate |
Warsaw pharmacies exhibit strong, statistically significant clustering at all spatial scales. The clustering is not random: proximity to hospitals and clinics is a highly significant predictor of pharmacy location, consistent with economic logic (demand proximity). A ΔAIC of 398 represents very strong evidence.
However, the hospital distance covariate does not fully explain the pattern — residual clustering at very short distances (< 500 m) suggests additional drivers, potentially:
## Data source: OpenStreetMap, downloaded May 2026
## CRS: EPSG:2180 (Polish national grid, metres)
## R version 4.4.3 (2025-02-28)
## Platform: aarch64-apple-darwin20
## Running under: macOS 26.3.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] C.UTF-8/C.UTF-8/C.UTF-8/C/C.UTF-8/C.UTF-8
##
## time zone: Europe/Warsaw
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] patchwork_1.3.2 viridis_0.6.5 viridisLite_0.4.2
## [4] lubridate_1.9.4 forcats_1.0.1 stringr_1.5.1
## [7] dplyr_1.2.0 purrr_1.2.1 readr_2.1.6
## [10] tidyr_1.3.2 tibble_3.3.1 tidyverse_2.0.0
## [13] ggplot2_4.0.2 httr_1.4.7 jsonlite_1.9.1
## [16] spatstat_3.5-1 spatstat.linnet_3.4-1 spatstat.model_3.6-1
## [19] rpart_4.1.24 spatstat.explore_3.8-0 nlme_3.1-167
## [22] spatstat.random_3.4-5 spatstat.geom_3.7-2 spatstat.univar_3.1-7
## [25] spatstat.data_3.1-9 sf_1.0-23
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.6 xfun_0.52 bslib_0.9.0
## [4] spatstat.sparse_3.1-0 lattice_0.22-6 tzdb_0.5.0
## [7] vctrs_0.7.1 tools_4.4.3 spatstat.utils_3.2-2
## [10] generics_0.1.4 goftest_1.2-3 proxy_0.4-28
## [13] pkgconfig_2.0.3 Matrix_1.7-2 KernSmooth_2.23-26
## [16] RColorBrewer_1.1-3 S7_0.2.1 lifecycle_1.0.5
## [19] compiler_4.4.3 farver_2.1.2 deldir_2.0-4
## [22] htmltools_0.5.8.1 class_7.3-23 sass_0.4.9
## [25] yaml_2.3.10 pillar_1.11.1 jquerylib_0.1.4
## [28] classInt_0.4-11 cachem_1.1.0 wk_0.9.5
## [31] abind_1.4-8 tidyselect_1.2.1 digest_0.6.37
## [34] stringi_1.8.4 splines_4.4.3 polyclip_1.10-7
## [37] fastmap_1.2.0 grid_4.4.3 cli_3.6.5
## [40] magrittr_2.0.4 e1071_1.7-17 withr_3.0.2
## [43] tensor_1.5.1 scales_1.4.0 timechange_0.3.0
## [46] rmarkdown_2.29 gridExtra_2.3 hms_1.1.4
## [49] evaluate_1.0.3 knitr_1.49 s2_1.1.9
## [52] mgcv_1.9-1 rlang_1.1.7 Rcpp_1.0.14
## [55] glue_1.8.0 DBI_1.2.3 R6_2.6.1
## [58] units_1.0-0