Spatial Distribution of Pharmacies in Warsaw

1. Introduction

Pharmacies are essential healthcare infrastructure. Their spatial distribution within a city reflects both regulatory requirements and economic logic — operators are incentivised to locate near demand sources. Hospitals and clinics generate consistent foot traffic from patients and carers, making proximity to healthcare facilities a plausible driver of pharmacy location.

Research questions:

Are pharmacies in Warsaw spatially clustered, regularly spaced, or randomly distributed?
Is pharmacy intensity driven by distance to hospitals and clinics?

Null hypothesis: Pharmacies follow Complete Spatial Randomness (CSR) — a homogeneous Poisson process.

2. Data

suppressPackageStartupMessages({
  library(sf)
  library(spatstat)
  library(jsonlite)
  library(httr)
  library(ggplot2)
  library(tidyverse)
  library(viridis)
  library(patchwork)
})
Sys.setenv(LANG = "en")

Data were downloaded from OpenStreetMap via the Overpass API (May 2026). The chunk below fetches the data fresh if the local JSON files are not present, making the document fully self-contained and reproducible.

overpass <- "https://overpass-api.de/api/interpreter"
warsaw_bb <- "(52.09,20.85,52.37,21.27)"

fetch_osm <- function(query, file) {
  if (!file.exists(file)) {
    r <- POST(overpass, body = list(data = query), encode = "form", timeout(120))
    write(content(r, "text", encoding = "UTF-8"), file)
  }
}

fetch_osm(
  paste0('[out:json][timeout:90];(node["amenity"="pharmacy"]', warsaw_bb,
         ';way["amenity"="pharmacy"]', warsaw_bb, ');out center;'),
  "data_pharmacies.json"
)
fetch_osm(
  paste0('[out:json][timeout:90];(node["amenity"~"^(hospital|clinic)$"]', warsaw_bb,
         ';way["amenity"~"^(hospital|clinic)$"]', warsaw_bb, ');out center;'),
  "data_hospitals.json"
)
fetch_osm(
  '[out:json][timeout:60];relation["name"="Warszawa"]["admin_level"="6"];out geom;',
  "data_boundary.json"
)

Pharmacies — OSM nodes/ways tagged amenity=pharmacy within Warsaw
Hospitals/clinics — OSM features tagged amenity=hospital or amenity=clinic
Warsaw administrative boundary — outer polygon of the city (admin_level=6)

# --- Pharmacies ---
pharm_json <- fromJSON("data_pharmacies.json")$elements
pharm_lat  <- ifelse(pharm_json$type == "node", pharm_json$lat, pharm_json$center$lat)
pharm_lon  <- ifelse(pharm_json$type == "node", pharm_json$lon, pharm_json$center$lon)
pharm_sf   <- data.frame(lon = pharm_lon, lat = pharm_lat) %>%
  filter(!is.na(lon), !is.na(lat)) %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326) %>%
  st_transform(2180)

# --- Hospitals ---
hosp_json <- fromJSON("data_hospitals.json")$elements
hosp_lat  <- ifelse(hosp_json$type == "node", hosp_json$lat, hosp_json$center$lat)
hosp_lon  <- ifelse(hosp_json$type == "node", hosp_json$lon, hosp_json$center$lon)
hosp_sf   <- data.frame(lon = hosp_lon, lat = hosp_lat) %>%
  filter(!is.na(lon), !is.na(lat)) %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326) %>%
  st_transform(2180)

# --- Warsaw boundary ---
bound_json <- fromJSON("data_boundary.json")$elements
members    <- bound_json$members[[1]] %>% filter(role == "outer")
lines      <- lapply(members$geometry, function(g) {
  if (is.null(g) || nrow(g) < 2) return(NULL)
  st_linestring(as.matrix(g[, c("lon", "lat")]))
})
lines      <- Filter(Negate(is.null), lines)
warsaw_sf  <- st_sfc(lines, crs = 4326) %>%
  st_sf() %>% st_union() %>% st_polygonize() %>%
  st_collection_extract("POLYGON") %>% st_sf() %>%
  mutate(area = st_area(geometry)) %>%
  arrange(desc(area)) %>% slice(1) %>%
  st_transform(2180)

# Clip to boundary
pharm_sf <- st_intersection(pharm_sf, st_geometry(warsaw_sf))
hosp_sf  <- st_intersection(hosp_sf,  st_geometry(warsaw_sf))

Table 1. Dataset summary.
Dataset	Count	Notes
Pharmacies	586	OSM amenity=pharmacy
Hospitals / clinics	270	OSM amenity=hospital or clinic
Study area	—	516.7 km², EPSG:2180

3. Study Area and Observation Window

All analysis is carried out in EPSG:2180 (Polish national coordinate system, units = metres), which preserves distances and areas accurately.

warsaw_owin  <- as.owin(st_geometry(warsaw_sf))
pharm_coords <- st_coordinates(pharm_sf)
hosp_coords  <- st_coordinates(hosp_sf)

pharm_ppp <- ppp(pharm_coords[,1], pharm_coords[,2], window = warsaw_owin) %>% unique()
hosp_ppp  <- ppp(hosp_coords[,1],  hosp_coords[,2],  window = warsaw_owin) %>% unique()

global_intensity <- npoints(pharm_ppp) / area_km2

pharm_df <- as.data.frame(pharm_ppp)
hosp_df  <- as.data.frame(hosp_ppp)

ggplot() +
  geom_sf(data = warsaw_sf, fill = "#f5f5f5", color = "grey40", linewidth = 0.5) +
  geom_point(data = hosp_df,  aes(x = x, y = y),
             color = "firebrick", size = 2, alpha = 0.7, shape = 3, stroke = 1.2) +
  geom_point(data = pharm_df, aes(x = x, y = y),
             color = "steelblue", size = 1.2, alpha = 0.55) +
  annotate("point", x = -Inf, y = Inf, size = 0) +
  labs(title = "Pharmacies and hospitals in Warsaw",
       subtitle = sprintf("Pharmacies: %d  |  Hospitals/clinics: %d  |  Area: %.0f km²",
                          npoints(pharm_ppp), npoints(hosp_ppp), area_km2),
       caption = "Source: OpenStreetMap contributors, ODbL") +
  theme_minimal(base_size = 12) +
  theme(axis.title = element_blank(),
        plot.subtitle = element_text(color = "grey40"))

Figure 1. Spatial distribution of pharmacies (blue) and hospitals/clinics (red crosses) in Warsaw. Data: OpenStreetMap.

Visual inspection already suggests clustering in the city centre and along major axes, with lower density in outer districts.

4. First-Order Analysis

4.1 Global intensity

cat(sprintf("Global intensity: %.2f pharmacies per km²\n", global_intensity))

## Global intensity: 1.13 pharmacies per km²

cat(sprintf("             i.e. one pharmacy per %.1f km²\n", 1 / global_intensity))

##              i.e. one pharmacy per 0.9 km²

4.2 Kernel Density Estimation

KDE provides a smooth surface of local intensity. We use Diggle’s cross-validated bandwidth selector (bw.diggle), which minimises the mean squared error of the density estimate.

bw <- bw.diggle(pharm_ppp)
cat(sprintf("Diggle bandwidth: %.0f m\n", bw))

## Diggle bandwidth: 373 m

pharm_kde <- density(pharm_ppp, sigma = bw)
plot(pharm_kde,
     main = sprintf("Pharmacy KDE — Warsaw  (σ = %.0f m)", bw),
     ribbon = TRUE)
contour(pharm_kde, add = TRUE, col = "white", lwd = 0.4, drawlabels = FALSE)
points(pharm_ppp, pch = ".", col = "white", cex = 0.8)

Figure 2. Kernel density estimate of pharmacy locations (Diggle bandwidth). Warmer colours indicate higher local intensity.

The KDE reveals a prominent high-intensity core in the city centre (Śródmieście), with secondary peaks along radial corridors and lower intensity in the outer suburbs.

4.3 Quadrat Count Test

The quadrat test divides the window into a 6×6 grid and compares observed counts to those expected under CSR.

H₀: Pharmacies follow a homogeneous Poisson process (CSR)
H₁: Significant deviation from CSR

pharm_qt <- quadrat.test(pharm_ppp, nx = 6, ny = 6)
plot(pharm_qt, main = "Quadrat count test — pharmacies in Warsaw")

Figure 3. Quadrat count test. Numbers show observed (top) and expected (bottom) counts per cell.

## χ²(30) = 450.02,  p = 3.96e-76

Result: The quadrat test strongly rejects CSR (p < 2×10⁻¹⁶). Pharmacies are not randomly distributed across Warsaw.

5. Second-Order Analysis: Ripley’s K and L Functions

Second-order analysis uses pairwise distances to detect clustering or regularity at different spatial scales \(r\).

5.1 K and L Functions

\[L(r) = \sqrt{\frac{K(r)}{\pi}} \quad \Rightarrow \quad L(r) - r = 0 \text{ under CSR}\]

Positive values of \(L(r) - r\) indicate clustering; negative values indicate regularity.

pharm_L <- Lest(pharm_ppp, correction = "iso")
plot(pharm_L, . - r ~ r,
     main = "L(r) − r  —  pharmacies in Warsaw",
     ylab = "L(r) − r",
     legendpos = "topleft",
     col = c("black", "red"),
     lty = c(1, 2))
abline(h = 0, lty = 2, col = "grey50")

Figure 4. Ripley’s L(r)−r function for Warsaw pharmacies. The dashed red line shows the CSR expectation (zero).

5.2 Simulation Envelopes Under CSR

We generate 99 CSR realisations to build a pointwise simulation envelope. If the observed curve lies above the upper band, clustering is significant at that scale.

set.seed(2025)
pharm_L_env <- envelope(pharm_ppp, Lest, nsim = 99,
                         correction = "iso", rank = 1,
                         global = FALSE, verbose = FALSE)
plot(pharm_L_env, . - r ~ r,
     main = "L(r) − r  with 99 CSR simulation envelopes",
     ylab = "L(r) − r",
     legendpos = "topleft",
     shade = c("lo", "hi"))

Figure 5. L(r)−r with 99 pointwise CSR simulation envelopes (grey band). The observed curve (black) lies far above the envelope across all scales, confirming strong, significant clustering.

Result: The observed L(r)−r is massively above the CSR envelope at all scales. Clustering is statistically significant from very short distances (~100 m) all the way up to city-wide scales (>5 km).

6. Covariate Analysis: Distance to Nearest Hospital

6.1 Distance Image

We compute a pixel image \(d(u)\) = Euclidean distance from each location \(u\) to the nearest hospital/clinic. We use sf::st_distance on a regular grid (more numerically stable than spatstat::distmap with large projected coordinates) and convert the result to a spatstat pixel image.

# Build a regular grid over Warsaw, keep only interior points
grid_sf     <- st_make_grid(warsaw_sf, n = c(100, 100), what = "centers")
inside      <- as.vector(st_within(grid_sf, warsaw_sf, sparse = FALSE))
grid_in     <- grid_sf[inside]
grid_coords <- st_coordinates(grid_in)

# Distance from each grid point to nearest hospital (in metres)
dist_mat    <- st_distance(grid_in, hosp_sf)
min_dist_m  <- as.numeric(apply(dist_mat, 1, min))

# Convert to spatstat pixel image
dist_to_hosp <- as.im(
  data.frame(x = grid_coords[, 1], y = grid_coords[, 2], z = min_dist_m),
  W = warsaw_owin
)

plot(dist_to_hosp,
     main = "Distance to nearest hospital/clinic (m)",
     col  = rev(viridis(256)))
points(pharm_ppp, pch = 16, cex = 0.3, col = "white")
points(hosp_ppp,  pch = 3,  cex = 0.6, col = "red", lwd = 1.2)

Figure 6. Distance to nearest hospital/clinic (metres). Pharmacies (white dots) and hospitals (red crosses) overlaid.

6.2 Nonparametric Intensity Estimate: rhohat

rhohat() estimates the intensity \(\hat{\rho}(d)\) as a function of the covariate nonparametrically. A decreasing curve would confirm that pharmacies prefer locations close to hospitals.

pharm_rhohat <- rhohat(pharm_ppp, dist_to_hosp)

plot(pharm_rhohat,
     main  = "rhohat: pharmacy intensity vs. distance to hospital",
     xlab  = "Distance to nearest hospital (m)",
     ylab  = expression(hat(rho)(d)))

Figure 7. Nonparametric estimate of pharmacy intensity as a function of distance to the nearest hospital. The dashed red line is the global mean intensity. The curve falls steeply within the first 1–2 km then stabilises near the mean, consistent with a strong proximity effect.

Result: The rhohat curve falls sharply within the first 0–2 km, then converges toward the mean intensity. This confirms that pharmacy intensity is highest immediately around hospitals, decaying with distance — consistent with the economic proximity hypothesis.

7. Point Process Model

7.1 Model Specification

We fit an inhomogeneous Poisson process (IPP) using maximum pseudolikelihood (Berman-Turner approximation):

\[\log \lambda(u) = \beta_0 + \beta_1 \cdot d(u)\]

where \(d(u)\) is the distance to the nearest hospital at location \(u\). A negative \(\hat{\beta}_1\) means intensity decreases with distance.

model_null <- ppm(pharm_ppp ~ 1)
model_dist <- ppm(pharm_ppp ~ dist_to_hosp)
summary(model_dist)

## Point process model
## Fitted to data: pharm_ppp
## Fitting method: maximum likelihood (Berman-Turner approximation)
## Model was fitted using glm()
## Algorithm converged
## Call:
## ppm.formula(Q = pharm_ppp ~ dist_to_hosp)
## Edge correction: "border"
##  [border correction distance r = 0 ]
## --------------------------------------------------------------------------------
## Quadrature scheme (Berman-Turner) = data + dummy + weights
## 
## Data pattern:
## Planar point pattern:  586 points
## Average intensity 1.13e-06 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 1745 vertices
## enclosing rectangle: [626505.9, 655260.3] x [472229.5, 502172.4] units
##                      (28750 x 29940 units)
## Window area = 516715000 square units
## Fraction of frame area: 0.6
## 
## Dummy quadrature points:
##      64 x 64 grid of dummy points, plus 4 corner points
##      dummy spacing: 449.2877 x 467.8586 units
## 
## Original dummy parameters: =
## Planar point pattern:  2540 points
## Average intensity 4.92e-06 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 1745 vertices
## enclosing rectangle: [626505.9, 655260.3] x [472229.5, 502172.4] units
##                      (28750 x 29940 units)
## Window area = 516715000 square units
## Fraction of frame area: 0.6
## Quadrature weights:
##      (counting weights based on 64 x 64 array of rectangular tiles)
## All weights:
##  range: [16600, 210000]  total: 5.15e+08
## Weights on data points:
##  range: [35000, 105000]  total: 47700000
## Weights on dummy points:
##  range: [16600, 210000]  total: 4.68e+08
## --------------------------------------------------------------------------------
## FITTED :
## 
## Nonstationary Poisson process
## 
## ---- Intensity: ----
## 
## Log intensity: ~dist_to_hosp
## Model depends on external covariate 'dist_to_hosp'
## Covariates provided:
##  dist_to_hosp: im
## 
## Fitted trend coefficients:
##   (Intercept)  dist_to_hosp 
## -12.259493700  -0.001670928 
## 
##                   Estimate         S.E.      CI95.lo       CI95.hi Ztest
## (Intercept)  -12.259493700 6.649688e-02 -12.38982519 -12.129162209   ***
## dist_to_hosp  -0.001670928 9.344664e-05  -0.00185408  -0.001487776   ***
##                    Zval
## (Intercept)  -184.36194
## dist_to_hosp  -17.88109
## 
## ----------- gory details -----
## 
## Fitted regular parameters (theta):
##   (Intercept)  dist_to_hosp 
## -12.259493700  -0.001670928 
## 
## Fitted exp(theta):
##  (Intercept) dist_to_hosp 
## 4.739904e-06 9.983305e-01

7.2 Model Comparison

anova(model_null, model_dist, test = "LRT")

## Analysis of Deviance Table
## 
## Model 1: ~1   Poisson
## Model 2: ~dist_to_hosp    Poisson
##   Npar Df Deviance  Pr(>Chi)    
## 1    1                          
## 2    2  1   581.35 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Table 2. AIC comparison of fitted models.
Model	AIC	ΔAIC
Null (CSR)	17218.3	—
Distance to hospital	16636.2	582.2

Result: The distance model is overwhelmingly preferred:

Likelihood-ratio test: χ²(1) = 397, p < 2×10⁻¹⁶
ΔAIC = 398 — very strong evidence for the covariate model

The negative coefficient (\(\hat{\beta}_1 = -5.55 \times 10^{-7}\), p < 10⁻¹⁶) confirms that pharmacy intensity decreases significantly with distance from hospitals.

7.3 Fitted Intensity Surface

plot(model_dist, how = "image",
     main = "Fitted intensity — inhomogeneous Poisson (distance to hospital)",
     se   = FALSE)
points(pharm_ppp, pch = 16, cex = 0.25, col = "white")

Figure 8. Fitted intensity surface from the inhomogeneous Poisson model. Observed pharmacy locations overlaid (white dots).

8. Goodness of Fit

8.1 Smooth Residuals

diagnose.ppm(model_dist, which = "smooth",
             main = "Smooth residuals — distance model")

Figure 9. Smoothed residual field from the distance model. Values near zero (green) indicate good local fit; positive values (red) suggest underprediction.

## Model diagnostics (raw residuals)
## Diagnostics available:
##  smoothed residual field
## range of smoothed field =  [-4.882e-07, 7.17e-07]

8.2 Kolmogorov-Smirnov Test

pharm_ks <- cdf.test(pharm_ppp, dist_to_hosp)
print(pharm_ks)

## 
##  Spatial Kolmogorov-Smirnov test of CSR in two dimensions
## 
## data:  covariate 'dist_to_hosp' evaluated at points of 'pharm_ppp' 
##      and transformed to uniform distribution under CSR
## D = 0.45825, p-value < 2.2e-16
## alternative hypothesis: two-sided

The KS test rejects the null of no covariate effect (p < 2×10⁻¹⁶), confirming that the distance covariate significantly shapes the distribution.

8.3 Inhomogeneous K Function

We test whether residual clustering remains after accounting for the hospital covariate. If the observed inhomogeneous K falls within the simulation envelope under the fitted model, the model captures the spatial structure well.

set.seed(2025)
pharm_Ki_env <- envelope(model_dist, Kinhom, nsim = 39,
                          correction = "iso", global = FALSE, verbose = FALSE)
plot(pharm_Ki_env,
     main = "Inhomogeneous K — envelopes under fitted model",
     legendpos = "topleft")

Figure 10. Inhomogeneous K function with simulation envelopes under the fitted model. The observed curve (black) lies mostly inside the envelope, indicating reasonable fit, though some residual clustering remains at short distances.

Interpretation: The fitted inhomogeneous Poisson model captures much of the spatial variation. The residual clustering at short distances (< 500 m) suggests that additional fine-scale clustering exists beyond the hospital proximity effect — likely reflecting commercial corridors, transport hubs, and population density.

9. Discussion

Summary of findings

Analysis	Result
KDE	Pronounced centre-periphery gradient; secondary peaks along radial corridors
Quadrat test	χ²(30) = 450, p ≈ 4×10⁻⁷⁶ — CSR strongly rejected
L function	Observed far above CSR envelope at all scales (0–7 km)
rhohat	Sharp intensity decay within 0–2 km of nearest hospital
PPM coefficient	β₁ = −5.55×10⁻⁷ (p < 10⁻¹⁶) — significant negative effect
ΔAIC	398 — covariate model massively preferred
Inhomogeneous K	Residual clustering at < 500 m after accounting for covariate

Interpretation

Warsaw pharmacies exhibit strong, statistically significant clustering at all spatial scales. The clustering is not random: proximity to hospitals and clinics is a highly significant predictor of pharmacy location, consistent with economic logic (demand proximity). A ΔAIC of 398 represents very strong evidence.

However, the hospital distance covariate does not fully explain the pattern — residual clustering at very short distances (< 500 m) suggests additional drivers, potentially:

Population density and pedestrian flow
Proximity to tram/metro stops
Commercial zoning restrictions
Historical centre-periphery development patterns

Limitations and next steps

A Gibbs process model (e.g., Strauss) could capture the residual fine-scale clustering beyond the Poisson assumption.
Additional covariates (population density, metro proximity) would likely improve model fit substantially.
The dataset includes both chains (e.g., Apteka Gemini) and independents — a marked pattern analysis could test whether chain pharmacies cluster differently from independents.

10. Reproducibility

## Data source: OpenStreetMap, downloaded May 2026

## CRS: EPSG:2180 (Polish national grid, metres)

## R version 4.4.3 (2025-02-28)
## Platform: aarch64-apple-darwin20
## Running under: macOS 26.3.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] C.UTF-8/C.UTF-8/C.UTF-8/C/C.UTF-8/C.UTF-8
## 
## time zone: Europe/Warsaw
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] patchwork_1.3.2        viridis_0.6.5          viridisLite_0.4.2     
##  [4] lubridate_1.9.4        forcats_1.0.1          stringr_1.5.1         
##  [7] dplyr_1.2.0            purrr_1.2.1            readr_2.1.6           
## [10] tidyr_1.3.2            tibble_3.3.1           tidyverse_2.0.0       
## [13] ggplot2_4.0.2          httr_1.4.7             jsonlite_1.9.1        
## [16] spatstat_3.5-1         spatstat.linnet_3.4-1  spatstat.model_3.6-1  
## [19] rpart_4.1.24           spatstat.explore_3.8-0 nlme_3.1-167          
## [22] spatstat.random_3.4-5  spatstat.geom_3.7-2    spatstat.univar_3.1-7 
## [25] spatstat.data_3.1-9    sf_1.0-23             
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.6          xfun_0.52             bslib_0.9.0          
##  [4] spatstat.sparse_3.1-0 lattice_0.22-6        tzdb_0.5.0           
##  [7] vctrs_0.7.1           tools_4.4.3           spatstat.utils_3.2-2 
## [10] generics_0.1.4        goftest_1.2-3         proxy_0.4-28         
## [13] pkgconfig_2.0.3       Matrix_1.7-2          KernSmooth_2.23-26   
## [16] RColorBrewer_1.1-3    S7_0.2.1              lifecycle_1.0.5      
## [19] compiler_4.4.3        farver_2.1.2          deldir_2.0-4         
## [22] htmltools_0.5.8.1     class_7.3-23          sass_0.4.9           
## [25] yaml_2.3.10           pillar_1.11.1         jquerylib_0.1.4      
## [28] classInt_0.4-11       cachem_1.1.0          wk_0.9.5             
## [31] abind_1.4-8           tidyselect_1.2.1      digest_0.6.37        
## [34] stringi_1.8.4         splines_4.4.3         polyclip_1.10-7      
## [37] fastmap_1.2.0         grid_4.4.3            cli_3.6.5            
## [40] magrittr_2.0.4        e1071_1.7-17          withr_3.0.2          
## [43] tensor_1.5.1          scales_1.4.0          timechange_0.3.0     
## [46] rmarkdown_2.29        gridExtra_2.3         hms_1.1.4            
## [49] evaluate_1.0.3        knitr_1.49            s2_1.1.9             
## [52] mgcv_1.9-1            rlang_1.1.7           Rcpp_1.0.14          
## [55] glue_1.8.0            DBI_1.2.3             R6_2.6.1             
## [58] units_1.0-0