1. Introduction

This project investigates the spatial distribution of large-format supermarkets in Warsaw and their relationship with residential population density in 2021. The primary objective is to identify Food Deserts — areas characterised by high population density but limited access to food retail infrastructure.

The analysis treats supermarket locations as an unmarked point process: each store is recorded solely by its location, with no attribute differentiating one point from another. The research question is therefore one of spatial intensity — where do supermarkets occur, and is their distribution driven by where people live?

Two complementary methods are employed:

  1. Ripley’s K-function — to establish whether the point pattern departs from Complete Spatial Randomness (CSR) and, if so, at what spatial scales clustering occurs.
  2. Poisson Point Process Model (PPM) — to formally model supermarket intensity as a function of population density and identify districts where observed retail supply falls below model predictions.

Spatial accessibility is further characterised using a distance map derived from distfun() and a 500 m Metro buffer overlay, used as contextual tools to interpret model residuals in geographic terms. The analysis concludes with a district-level residual summary that quantifies the magnitude of retail undersupply in each of Warsaw’s 18 districts.


2. Data

2.1 Loading Packages

library(dplyr)
library(sf)
library(ggplot2)
library(spatstat)
library(stars)
library(osmdata)
library(ggrepel)
library(knitr)

2.2 Administrative Boundary

powiaty <- st_read("00_jednostki_administracyjne/A02_Granice_powiatow.shp")
## Reading layer `A02_Granice_powiatow' from data source 
##   `/Users/phuongtrang/Documents/Study/4th semester/Point and Line Pattern analysis/PLPA_Spatial Analysis of Food Deserts in Warsaw_Trang Hoang/00_jednostki_administracyjne/A02_Granice_powiatow.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 380 features and 28 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 14.12298 ymin: 49.00204 xmax: 24.14578 ymax: 54.83642
## Geodetic CRS:  ETRS89
warszawa_sf   <- powiaty %>% filter(JPT_NAZWA_ == "powiat Warszawa")
warszawa_planar <- st_transform(warszawa_sf, 3857)

2.3 Supermarket Locations (OpenStreetMap)

Supermarket locations are fetched from OpenStreetMap via the osmdata package using the tag shop=supermarket. To ensure the dataset captures only large-format food retail — the type of store whose absence constitutes a food desert — results are filtered to the following major chains: Biedronka, Lidl, Carrefour, Auchan, Kaufland, Netto, Intermarché, Alma, and Piotr i Paweł.

Because OSM name fields are not standardised (a node may be recorded as “Biedronka - Białołęka” or “Lidl Polska”), filtering uses case-insensitive partial string matching (grepl) rather than exact equality. This maximises recall, but it also admits any store whose name merely contains a chain name, including small-format outlets such as Carrefour Express, so the "large-format" label is approximate. The final observation count after spatial clipping to the Warsaw administrative boundary is reported below.
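The effect of partial matching can be checked on a few names. The sketch below is a minimal illustration: the chain list comes from this report, but the sample names are hypothetical. It shows that a name containing a chain name is admitted regardless of store format, and that unnamed features are dropped.

```r
# Chain list as used in this report
large_chains <- c("Biedronka", "Lidl", "Carrefour", "Auchan",
                  "Kaufland", "Netto", "Intermarché",
                  "Alma", "Piotr i Paweł")
chain_pattern <- paste(large_chains, collapse = "|")

# Hypothetical OSM name values, for illustration only
sample_names <- c("Biedronka - Tarchomin", "Lidl Polska",
                  "Carrefour Express", "Sklep osiedlowy", NA)

# grepl() returns FALSE for NA, so features without a name fail the filter
matched <- grepl(chain_pattern, sample_names, ignore.case = TRUE)
data.frame(name = sample_names, matched = matched)
```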

options(osmdata_server = "https://overpass.kumi.systems/api/interpreter")

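# Run an Overpass query with up to `max_attempts` tries, pausing between failures
fetch_osm_safe <- function(query_features, max_attempts = 3) {
  res <- NULL; attempt <- 1
  while (is.null(res) && attempt <= max_attempts) {
    message(paste("Attempt", attempt, "..."))
    try({ res <- query_features %>% osmdata_sf() }, silent = TRUE)
    attempt <- attempt + 1
    # Wait only if another attempt will follow
    if (is.null(res) && attempt <= max_attempts) Sys.sleep(2)
  }
  # Fail loudly rather than return NULL, which would break later steps obscurely
  if (is.null(res)) stop("Overpass query failed after ", max_attempts, " attempts")
  return(res)
}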

# Fetch supermarkets
q_shops <- opq(bbox = "Warsaw, Poland", timeout = 300) %>%
  add_osm_feature(key = 'shop', value = 'supermarket') %>%
  fetch_osm_safe()

large_chains <- c("Biedronka", "Lidl", "Carrefour", "Auchan",
                  "Kaufland", "Netto", "Intermarché",
                  "Alma", "Piotr i Paweł")

chain_pattern <- paste(large_chains, collapse = "|")

shops_all <- bind_rows(
  q_shops$osm_points,
  if (!is.null(q_shops$osm_polygons))      st_centroid(q_shops$osm_polygons)      else NULL,
  if (!is.null(q_shops$osm_multipolygons)) st_centroid(q_shops$osm_multipolygons) else NULL
)

shops_sf <- shops_all %>%
  st_transform(3857) %>%
  st_filter(warszawa_planar) %>%
  filter(grepl(chain_pattern, name, ignore.case = TRUE))

cat("Total supermarkets (large-format):", nrow(shops_sf), "\n")
## Total supermarkets (large-format): 304

This yields 304 observations of large-format supermarkets within the Warsaw administrative boundary.

2.4 Population Density Grid (GUS 2021)

The primary spatial covariate is the 1 km² population grid from the 2021 Polish National Census (GUS). Each cell records the number of residents, providing a continuous surface of residential demand.

pop_grid   <- st_read("query/txxs18433.shp")
## Reading layer `txxs18433' from data source 
##   `/Users/phuongtrang/Documents/Study/4th semester/Point and Line Pattern analysis/PLPA_Spatial Analysis of Food Deserts in Warsaw_Trang Hoang/query/txxs18433.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 315856 features and 7 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 1571327 ymin: 6274444 xmax: 2688689 ymax: 7330929
## Projected CRS: WGS 84 / Pseudo-Mercator
pop_planar <- st_transform(pop_grid, 3857)
pop_waw    <- st_intersection(pop_planar, warszawa_planar)

2.5 Metro Network and District Boundaries (OpenStreetMap)

The Warsaw Metro network (lines M1 and M2) and district boundaries (administrative level 9) are fetched from OSM. The Metro network is used as a contextual overlay to interpret spatial patterns in retail distribution.

q_metro <- opq(bbox = "Warsaw, Poland", timeout = 300) %>%
  add_osm_feature(key = 'railway', value = 'subway') %>%
  fetch_osm_safe()
metro_lines <- st_transform(q_metro$osm_lines, 3857)

q_districts <- opq(bbox = "Warsaw, Poland", timeout = 300) %>%
  add_osm_feature(key = 'admin_level', value = '9') %>%
  fetch_osm_safe()

districts_planar <- q_districts$osm_multipolygons %>%
  st_transform(3857) %>%
  st_filter(warszawa_planar)

district_labels <- districts_planar %>%
  st_centroid() %>%
  st_coordinates() %>%
  as.data.frame() %>%
  mutate(label = districts_planar$name)

3. Exploratory Spatial Analysis

3.1 Supermarket Distribution and Population Density

The first map overlays supermarket locations on the GUS population grid to provide an initial visual assessment of spatial correspondence between retail supply and residential demand.

ggplot() +
  geom_sf(data = districts_planar, fill = "gray95", color = "white", linewidth = 0.2) +
  geom_sf(data = pop_waw, aes(fill = res), alpha = 0.65, color = NA) +
  scale_fill_viridis_c(option = "mako", name = "Residents\nper km²") +
  geom_sf(data = shops_sf, color = "red", size = 1, alpha = 0.7) +
  geom_label_repel(data = district_labels,
                   aes(x = X, y = Y, label = label),
                   size = 2.8, fontface = "bold",
                   fill = alpha("white", 0.6),
                   box.padding = 0.5, max.overlaps = Inf) +
  labs(title = "Large-Format Supermarkets and Residential Population Density",
       subtitle = paste("n =", nrow(shops_sf), "supermarkets | Data: GUS 2021 & OpenStreetMap"),
       caption = "CRS: EPSG 3857") +
  theme_minimal() +
  theme(axis.title = element_blank(), axis.text = element_blank())

The overlay reveals a broad positive correspondence between residential density and retail presence in established districts such as Mokotów, Ursynów, and Praga-Południe. However, a clear spatial mismatch is visible in the northern district of Białołęka: despite registering among the highest population grid values in the city, supermarket coverage remains sparse. This pattern suggests a Retail Lag — a structural delay in commercial infrastructure relative to rapid residential growth — and marks Białołęka as a primary candidate for food desert designation. This visual hypothesis is tested formally in Sections 4 and 5.

3.2 Retail Distribution Along the Metro Corridor

The second map introduces the Metro network to assess the degree to which transit infrastructure shapes retail geography.

ggplot() +
  geom_sf(data = districts_planar, fill = NA, color = "gray80", linewidth = 0.2) +
  geom_sf(data = pop_waw, aes(fill = res), color = NA, alpha = 0.3) +
  scale_fill_viridis_c(option = "mako", guide = "none") +
  geom_sf(data = metro_lines, color = "orange", linewidth = 1, alpha = 0.9) +
  geom_sf(data = shops_sf, color = "red", size = 0.8, alpha = 0.7) +
  geom_label_repel(data = district_labels,
                   aes(x = X, y = Y, label = label),
                   size = 2.8, fontface = "bold",
                   fill = alpha("white", 0.7),
                   box.padding = 0.3, max.overlaps = Inf) +
  labs(title = "Supermarket Distribution and Metro Infrastructure",
       subtitle = "Orange: Metro lines M1 & M2  |  Red: Large-format supermarkets",
       caption = "Data: OpenStreetMap") +
  theme_minimal() +
  theme(axis.title = element_blank(), axis.text = element_blank())

A pronounced Transit-Oriented Development (TOD) pattern is evident: supermarkets cluster along and around the M1 and M2 corridors, with the highest concentrations at major interchange nodes in Śródmieście and Wola. This suggests that retail accessibility in Warsaw is strongly subsidised by high-capacity transit. Districts lacking Metro coverage — most notably Białołęka and Wawer — exhibit both low retail density and poor transit connectivity, a condition of double isolation that disproportionately affects residents without private vehicles.


4. Point Pattern Analysis

4.1 Constructing the Point Pattern Object

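Supermarket coordinates are converted to a ppp object in spatstat, with the observation window defined by the Warsaw administrative boundary. Because EPSG:3857 (Web Mercator) inflates lengths away from the equator, its coordinate units are not true metres at Warsaw's latitude. The rescaling factor rsc is therefore derived from the data: it is the square root of the ratio between the window's projected area and the city's area in km² computed on the unprojected boundary, so that one rescaled unit corresponds to approximately one true kilometre. Computing it this way also keeps the factor reproducible if the boundary file changes. A small jitter (0.02 projected units, applied before rescaling) resolves any duplicate coordinates.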

W <- as.owin(warszawa_planar)

# Compute Warsaw area from data rather than hardcoding
area_km2 <- as.numeric(st_area(warszawa_sf)) / 1e6
rsc      <- sqrt(area.owin(W) / area_km2)

shops_ppp <- ppp(
  x = st_coordinates(shops_sf)[, 1],
  y = st_coordinates(shops_sf)[, 2],
  window = W
)
shops_ppp <- rescale(rjitter(shops_ppp, 0.02), rsc, "km")

summary(shops_ppp)
## Planar point pattern:  304 points
## Average intensity 0.59008 points per square km
## 
## Coordinates are given to 12 decimal places
## 
## Window: polygonal boundary
## single connected closed polygon with 3150 vertices
## enclosing rectangle: [1420.1507, 1448.7192] x [4171.273, 4201.332] km
##                      (28.57 x 30.06 km)
## Window area = 515.184 square km
## Unit of length: 1 km
## Fraction of frame area: 0.6
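
As a rough plausibility check on the rescaling factor, the Web Mercator length distortion at a given latitude is 1/cos(latitude). The sketch below assumes a latitude of about 52.23°N for central Warsaw (a value not taken from this report) and suggests that rsc should be close to 1,630 projected units per true kilometre.

```r
lat_deg <- 52.23                      # assumed latitude of central Warsaw
k <- 1 / cos(lat_deg * pi / 180)      # Web Mercator linear scale factor
rsc_approx <- 1000 * k                # projected units per true kilometre

round(k, 3)
round(rsc_approx)

# Approximate width, in EPSG:3857 units, of the 28.57 km wide enclosing
# rectangle reported by summary(shops_ppp)
round(28.57 * rsc_approx)
```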

4.2 Ripley’s K-Function: Testing for Spatial Clustering

Ripley’s K-function measures the expected number of additional events within distance r of an arbitrary event, normalised by the global intensity. Under Complete Spatial Randomness (CSR), K(r) = πr². Departures above this baseline indicate clustering; departures below indicate regularity.
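
To make the CSR baseline concrete, the sketch below evaluates πr² and the implied expected number of neighbours λπr² at a few distances. It uses the average intensity of 0.59008 stores per km² reported in Section 4.1; the choice of distances is illustrative.

```r
lambda <- 0.59008                 # average intensity (stores per km²), Section 4.1
r <- c(0.5, 1, 2)                 # illustrative distances in km

K_csr <- pi * r^2                 # theoretical K(r) under CSR, in km²
expected_neighbours <- lambda * K_csr

data.frame(r_km = r,
           K_csr = round(K_csr, 3),
           expected_neighbours = round(expected_neighbours, 2))
```

An observed K(r) above πr² means that stores have, on average, more neighbours within distance r than this baseline.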

set.seed(42)
K_env <- envelope(shops_ppp, Kest, nsim = 99, nrank = 1, verbose = FALSE)
plot(K_env,
     main = "Ripley's K-Function with 99-Simulation Envelope",
     legendpos = "topleft")

Interpretation. The observed K-function lies substantially and consistently above the upper bound of the 99-simulation Monte Carlo envelope across all tested distances. With 99 simulations and rank 1, the envelope is pointwise: at any single pre-specified distance r it corresponds to a two-sided Monte Carlo test at significance level 2/100 = 0.02. It is not a simultaneous test over all r, but an excursion this large and persistent is strong evidence that large-format supermarkets in Warsaw are spatially clustered relative to CSR rather than randomly distributed. The divergence is most pronounced at distances below 2 km, consistent with an agglomeration effect in which stores concentrate near high-footfall intersections and Metro nodes where catchment populations are largest. Because CSR assumes a constant intensity, part of this departure may simply reflect uneven population, which motivates the inhomogeneous model in Section 5. For food desert analysis, the clustering matters: retail provision is systematically concentrated in certain zones, leaving the gaps between clusters underserved. The food desert pattern is therefore a structural feature of Warsaw’s retail geography rather than a chance outcome.
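
The significance level attached to a Monte Carlo envelope follows from the number of simulations and the rank used. The sketch below encodes the two standard formulas (pointwise two-sided: 2·nrank/(nsim + 1); global: nrank/(nsim + 1)). The helper name envelope_alpha is hypothetical and is not part of spatstat.

```r
# Hypothetical helper: nominal significance level of a Monte Carlo envelope
envelope_alpha <- function(nsim, nrank = 1, global = FALSE) {
  if (global) nrank / (nsim + 1) else 2 * nrank / (nsim + 1)
}

envelope_alpha(99)                  # pointwise envelope, as used above
envelope_alpha(99, global = TRUE)   # simultaneous (global) envelope
```

In spatstat, a simultaneous envelope can be requested with envelope(..., global = TRUE), which supports a statement about all distances at once.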


5. Poisson Point Process Model (PPM)

5.1 Preparing the Population Covariate

The GUS population grid is rasterised to a 128×128 pixel image aligned to the Warsaw bounding box, then rescaled to kilometres to match the ppp object. The resulting image is plotted to verify spatial alignment with the point pattern before model fitting.

target_grid <- st_as_stars(st_bbox(warszawa_planar), nx = 128, ny = 128)
pop_raster  <- st_rasterize(pop_waw["res"], template = target_grid)
pop_im      <- rescale(as.im(pop_raster), rsc, "km")

# Verify alignment: covariate should cover the same extent as the ppp window
plot(pop_im, main = "Population Density Covariate (residents per km², rescaled to km)")
plot(shops_ppp, add = TRUE, pch = 16, cex = 0.4, cols = "white")
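
The resolution implied by a 128×128 raster can be worked out from the window dimensions. The sketch below uses the enclosing rectangle reported by summary(shops_ppp) (28.57 × 30.06 km) and is only a back-of-envelope check, not part of the fitting pipeline.

```r
frame_km <- c(width = 28.57, height = 30.06)  # enclosing rectangle, Section 4.1
n_pixels <- 128

pixel_km <- frame_km / n_pixels               # pixel size along each axis
round(pixel_km, 3)

# Each source grid cell is nominally 1 km across, so it is sampled by
# roughly this many pixels along each axis
round(1 / pixel_km, 1)
```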

5.2 Model Specification and Estimation

A log-linear inhomogeneous Poisson process model is fitted with population density as the sole covariate:

\[\log \lambda(s) = \beta_0 + \beta_1 \cdot \text{pop}(s)\]

where λ(s) is the expected supermarket intensity at location s (stores per km²).

model_food <- ppm(shops_ppp ~ pop_im)
summary(model_food)
## Point process model
## Fitted to data: shops_ppp
## Fitting method: maximum likelihood (Berman-Turner approximation)
## Model was fitted using glm()
## Algorithm converged
## Call:
## ppm.formula(Q = shops_ppp ~ pop_im)
## Edge correction: "border"
##  [border correction distance r = 0 ]
## --------------------------------------------------------------------------------
## Quadrature scheme (Berman-Turner) = data + dummy + weights
## 
## Data pattern:
## Planar point pattern:  304 points
## Average intensity 0.59 points per square km
## Window: polygonal boundary
## single connected closed polygon with 3150 vertices
## enclosing rectangle: [1420.1507, 1448.7192] x [4171.273, 4201.332] km
##                      (28.57 x 30.06 km)
## Window area = 515.184 square km
## Unit of length: 1 km
## Fraction of frame area: 0.6
## 
## Dummy quadrature points:
##      64 x 64 grid of dummy points, plus 4 corner points
##      dummy spacing: 0.4463825 x 0.4696735 km
## 
## Original dummy parameters: =
## Planar point pattern:  2538 points
## Average intensity 4.93 points per square km
## Window: polygonal boundary
## single connected closed polygon with 3150 vertices
## enclosing rectangle: [1420.1507, 1448.7192] x [4171.273, 4201.332] km
##                      (28.57 x 30.06 km)
## Window area = 515.184 square km
## Unit of length: 1 km
## Fraction of frame area: 0.6
## Quadrature weights:
##      (counting weights based on 64 x 64 array of rectangular tiles)
## All weights:
##  range: [0.0182, 0.21]   total: 514
## Weights on data points:
##  range: [0.0524, 0.105]  total: 29.2
## Weights on dummy points:
##  range: [0.0182, 0.21]   total: 485
## --------------------------------------------------------------------------------
## FITTED :
## 
## Nonstationary Poisson process
## 
## ---- Intensity: ----
## 
## Log intensity: ~pop_im
## Model depends on external covariate 'pop_im'
## Covariates provided:
##  pop_im: im
## 
## Fitted trend coefficients:
##  (Intercept)       pop_im 
## -1.333741910  0.000146968 
## 
##                 Estimate         S.E.       CI95.lo       CI95.hi Ztest
## (Intercept) -1.333741910 9.547812e-02 -1.5208755960 -1.1466082250   ***
## pop_im       0.000146968 9.847480e-06  0.0001276673  0.0001662687   ***
##                  Zval
## (Intercept) -13.96908
## pop_im       14.92443
## 
## ----------- gory details -----
## 
## Fitted regular parameters (theta):
##  (Intercept)       pop_im 
## -1.333741910  0.000146968 
## 
## Fitted exp(theta):
## (Intercept)      pop_im 
##   0.2634895   1.0001470

5.3 Model Interpretation

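coef_table <- data.frame(
  Parameter   = c("Intercept (β₀)", "Population density (β₁)"),
  Estimate    = round(coef(model_food), 6),
  `exp(coef)` = round(exp(coef(model_food)), 6),
  check.names = FALSE   # keep the column name "exp(coef)" instead of "exp.coef."
)
# Drop row names so each parameter is labelled once
knitr::kable(coef_table, row.names = FALSE, caption = "PPM coefficient estimates")

PPM coefficient estimates

| Parameter               | Estimate  | exp(coef) |
|-------------------------|-----------|-----------|
| Intercept (β₀)          | -1.333742 | 0.263489  |
| Population density (β₁) | 0.000147  | 1.000147  |
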
# Extract coefficients dynamically for inline use
beta0     <- round(coef(model_food)[["(Intercept)"]], 4)
beta1     <- round(coef(model_food)[["pop_im"]], 6)
exp_beta0 <- round(exp(beta0), 3)
exp_beta1 <- round(exp(beta1), 6)

Population density (β₁). The coefficient is positive and highly significant (p < 0.001), confirming that supermarket intensity increases systematically with residential density. The multiplicative effect exp(β₁) = 1.000147 implies that each additional resident per km² grid cell is associated with a 0.0147% increase in expected retail intensity. While this per-unit increment is small — reflecting the continuous nature of the covariate — the cumulative effect across the full range of Warsaw’s population grid (roughly 0 to 15,000 residents/km²) translates into a substantial and statistically robust gradient in predicted store intensity.
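
The size of the gradient is easier to see at a few population values. The sketch below plugs the fitted coefficients from Section 5.2 into the log-linear formula; the population values are illustrative and are not taken from the grid.

```r
beta0 <- -1.333741910   # fitted intercept (Section 5.2)
beta1 <-  0.000146968   # fitted population coefficient (Section 5.2)

pop <- c(0, 1000, 5000, 10000)            # illustrative residents per km²
lambda <- exp(beta0 + beta1 * pop)        # predicted stores per km²
ratio_to_baseline <- lambda / lambda[1]   # multiplicative effect vs. zero population

data.frame(pop = pop,
           lambda = round(lambda, 3),
           ratio_to_baseline = round(ratio_to_baseline, 2))
```

Under this model, 1,000 additional residents per km² raise the expected intensity by about 16%, and 10,000 raise it by a factor of roughly 4.3.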

Baseline intensity (β₀). The exponentiated intercept exp(β₀) = 0.264 stores per km² is the expected supermarket intensity where the population covariate is zero. This is roughly 45% of the city-wide average of 0.59 stores per km², so the floor is non-trivial. It may reflect stores serving commuters, office workers, and transit users in low-residential commercial or industrial zones.

Overall fit. Population density is a highly significant predictor of retail intensity (z = 14.92), which supports treating it as a primary driver of supermarket location. The report does not compute a goodness-of-fit measure, so the share of spatial variation explained is not quantified. The model is also deliberately parsimonious: a single covariate cannot capture the full complexity of retail geography. The residual spatial structure, examined below, is where food deserts become visible.

5.4 Predicted Intensity Surface

plot(predict(model_food),
     main = "PPM Predicted Supermarket Intensity (stores per km²)",
     col  = hcl.colors(128, "YlOrRd", rev = TRUE))

The predicted surface mirrors the population distribution, with high intensities in the dense residential belt surrounding the city centre. The model confirms that the urban core’s retail concentration is statistically justified by demand. Critically, Białołęka in the north generates a high predicted intensity — meaning the model expects more supermarkets there given its population. The contrast between this prediction and the sparse observed pattern constitutes formal evidence of a food desert.

5.5 Model Residuals

To make the spatial mismatch explicit, we visualise the smoothed residuals of the PPM using the default kernel smoothing bandwidth in diagnose.ppm. Residuals are observed minus fitted: positive values (more stores than predicted) indicate over-supplied areas; negative values (fewer stores than predicted) identify the food deserts.

plot(diagnose.ppm(model_food, which = "smooth"),
     main = "PPM Smoothed Residuals\n(positive = over-supplied; negative = under-supplied)")
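
For intuition, the raw residual of a region is the observed count minus the integral of the fitted intensity over that region. The sketch below applies this to a single hypothetical 1 km² cell with 8,000 residents and no stores, using the coefficients from Section 5.2 and treating the intensity as constant within the cell. The smoothed residuals plotted above are a kernel-weighted version of the same quantity.

```r
beta0 <- -1.333741910   # fitted intercept (Section 5.2)
beta1 <-  0.000146968   # fitted population coefficient (Section 5.2)

# Hypothetical cell: 1 km², 8,000 residents, no supermarkets observed
cell_area_km2 <- 1
cell_pop      <- 8000
observed      <- 0

expected     <- exp(beta0 + beta1 * cell_pop) * cell_area_km2  # fitted count
raw_residual <- observed - expected

round(c(expected = expected, raw_residual = raw_residual), 3)
```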

The residual surface directly localises the food desert zones identified visually in Section 3. Strongly negative residuals, which indicate a systematic shortfall of supermarkets relative to population-based predictions, appear across several districts and are most extensive in the northern and south-eastern periphery. The spatial extent of the shortfall in Białołęka and Wawer/Wesoła is particularly prominent, though the district-level analysis in Section 5.6 shows that the mean residual is in fact most negative in Bielany, Wilanów, and Ursynów. Conversely, positive residuals in Śródmieście and along the Metro corridors indicate that the urban core is over-supplied relative to residential demand alone.

5.6 District-Level Residual Summary

To move beyond visual inspection and provide a quantitative basis for food desert designation, PPM residuals are aggregated to the district level. For each district, the mean smoothed residual is computed: negative values indicate undersupply relative to population-based predictions, positive values indicate oversupply.

resid_diag <- diagnose.ppm(model_food, which = "smooth", plot.it = FALSE)
resid_im   <- resid_diag$smooth$Z

resid_df <- as.data.frame(resid_im)
names(resid_df) <- c("x", "y", "residual")
resid_df <- resid_df %>% filter(!is.na(residual))
resid_sf_m <- resid_df %>%
  mutate(
    x_m = x * rsc,
    y_m = y * rsc
  ) %>%
  st_as_sf(coords = c("x_m", "y_m"), crs = 3857)

district_residuals <- st_join(resid_sf_m, districts_planar["name"]) %>%
  st_drop_geometry() %>%
  filter(!is.na(name)) %>%
  group_by(District = name) %>%
  summarise(
    Mean_Residual = round(mean(residual, na.rm = TRUE), 4),
    .groups = "drop"
  ) %>%
  arrange(Mean_Residual) %>%
  mutate(
    Status = case_when(
      Mean_Residual < -0.005 ~ "Under-supplied (food desert candidate)",
      Mean_Residual >  0.005 ~ "Over-supplied",
      TRUE                   ~ "Approximately balanced"
    )
  )

knitr::kable(district_residuals,
             col.names = c("District", "Mean Residual (stores/km²)", "Status"),
             caption   = "District-level PPM residuals: negative = fewer supermarkets than population predicts")

District-level PPM residuals: negative = fewer supermarkets than population predicts

| District       | Mean Residual (stores/km²) | Status                                 |
|----------------|----------------------------|----------------------------------------|
| Bielany        | -0.0727                    | Under-supplied (food desert candidate) |
| Wilanów        | -0.0696                    | Under-supplied (food desert candidate) |
| Ursynów        | -0.0644                    | Under-supplied (food desert candidate) |
| Wesoła         | -0.0377                    | Under-supplied (food desert candidate) |
| Wawer          | -0.0258                    | Under-supplied (food desert candidate) |
| Białołęka      | -0.0098                    | Under-supplied (food desert candidate) |
| Żoliborz       | -0.0052                    | Under-supplied (food desert candidate) |
| Targówek       | 0.0095                     | Over-supplied                          |
| Bemowo         | 0.0119                     | Over-supplied                          |
| Mokotów        | 0.0251                     | Over-supplied                          |
| Rembertów      | 0.0342                     | Over-supplied                          |
| Praga-Północ   | 0.0348                     | Over-supplied                          |
| Praga-Południe | 0.0588                     | Over-supplied                          |
| Śródmieście    | 0.0746                     | Over-supplied                          |
| Wola           | 0.1060                     | Over-supplied                          |
| Włochy         | 0.1334                     | Over-supplied                          |
| Ochota         | 0.1399                     | Over-supplied                          |
| Ursus          | 0.2502                     | Over-supplied                          |

The district-level table quantifies the spatial mismatch identified visually in Section 5.5 and reveals a pattern more complex than a simple centre–periphery divide. Seven districts record negative mean residuals, indicating that observed supermarket intensity falls below what population density alone would predict; the remaining eleven are classified as over-supplied, and none falls in the approximately balanced band.
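
The classification rule can be re-applied to the mean residuals reported in the table as a consistency check. The sketch below hard-codes those published values (it does not recompute them from the model) and tallies the districts in each category.

```r
# Mean residuals as reported in the district table (stores per km²)
mean_residual <- c(
  "Bielany" = -0.0727, "Wilanów" = -0.0696, "Ursynów" = -0.0644,
  "Wesoła" = -0.0377, "Wawer" = -0.0258, "Białołęka" = -0.0098,
  "Żoliborz" = -0.0052, "Targówek" = 0.0095, "Bemowo" = 0.0119,
  "Mokotów" = 0.0251, "Rembertów" = 0.0342, "Praga-Północ" = 0.0348,
  "Praga-Południe" = 0.0588, "Śródmieście" = 0.0746, "Wola" = 0.1060,
  "Włochy" = 0.1334, "Ochota" = 0.1399, "Ursus" = 0.2502
)

# Same thresholds as the case_when() call in this section
status <- ifelse(mean_residual < -0.005, "Under-supplied",
          ifelse(mean_residual >  0.005, "Over-supplied",
                 "Approximately balanced"))

table(status)
sum(mean_residual < 0)   # number of districts with a negative mean residual
```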

The three most strongly under-supplied districts — Bielany (−0.073), Wilanów (−0.070), and Ursynów (−0.064) — are not peripheral post-communist housing estates but established, densely populated districts with relatively high household incomes. Their appearance at the top of the undersupply ranking suggests that retail shortfall in Warsaw reflects not only a growth lag in fast-developing outer areas but also a density gap in mature neighbourhoods where residential demand has consistently outpaced commercial development. Wesoła (−0.038) and Wawer (−0.026) follow as the expected peripheral candidates, consistent with their large red zones in the distance surface (Section 6.1). Białołęka (−0.010), while visually prominent in the residual map due to the spatial extent of its shortfall, ranks sixth by mean residual magnitude — its food desert designation reflects the breadth of undersupply across a large area more than its per-unit intensity.

At the other end of the spectrum, Ursus records by far the highest positive residual (0.250), likely reflecting a small district area with a disproportionate concentration of large retail formats relative to its resident population. Ochota, Włochy, Wola, and Śródmieście also show strong oversupply, consistent with the Transit-Oriented Development pattern identified in Section 3.2 — retail investment concentrated along Metro corridors generates systematic excess supply in central and inner-western districts.

Taken together, the district residual ranking provides a reproducible, data-driven basis for targeting retail policy. Interventions focused solely on peripheral growth districts would miss the three most severely underserved areas by this measure.

6. Accessibility Analysis

6.1 Distance to Nearest Supermarket

The distfun() function in spatstat computes, for every location in the observation window, the Euclidean distance to the nearest point in the pattern. This yields a continuous accessibility surface.

dist_im  <- as.im(distfun(shops_ppp))

dist_cut <- cut(dist_im,
                breaks = c(0, 0.3, 0.5, 0.8, Inf),
                labels = c("< 300 m", "300–500 m", "500–800 m", "> 800 m"))

plot(dist_cut,
     main = "Pedestrian Accessibility Zones: Distance to Nearest Supermarket",
     col  = c("#2b83ba", "#abdda4", "#fdae61", "#d7191c"))
plot(shops_ppp, add = TRUE, pch = 16, cols = "white", cex = 0.3)

The accessibility surface operationalises food desert designation through four pedestrian distance zones, separated by thresholds at 300 m, 500 m, and 800 m. The < 300 m and 300–500 m zones (blue and green) correspond to short walks, well within the “15-minute city” ideal for grocery access. The > 800 m zone (red) represents a critical threshold: beyond this distance, pedestrian travel to food retail becomes impractical for many residents, generating car dependency. Distances are Euclidean, so actual walking distances along the street network will be longer.
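
How the breaks map distances to zones can be seen on a few values. The sketch below uses base R cut() with the same breaks on hypothetical distances; in the analysis itself the breaks are applied to the pixel image dist_im.

```r
breaks_km   <- c(0, 0.3, 0.5, 0.8, Inf)
zone_labels <- c("< 300 m", "300-500 m", "500-800 m", "> 800 m")

d_km <- c(0.12, 0.30, 0.45, 0.79, 1.60)   # hypothetical distances in km

# Intervals are right-closed by default, so exactly 0.30 km falls in the first zone
zones <- cut(d_km, breaks = breaks_km, labels = zone_labels)
data.frame(d_km = d_km, zone = zones)
```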

The map reveals large contiguous red zones across northern Warsaw (Białołęka) and south-eastern districts (Wawer, Wesoła). These zones are not randomly distributed: they broadly coincide with the peripheral areas of negative PPM residuals identified in Section 5.5, and all three districts record negative mean residuals in Section 5.6 (ranked fourth to sixth, behind Bielany, Wilanów, and Ursynów). The district summary is an aggregation of the model residuals rather than an independent method, so the evidence rests on two complementary analyses, the residual-based results and the distance surface. Their agreement strengthens the designation of these areas as genuine food deserts rather than artefacts of data sparsity.

6.2 Contextual Overlay: Metro Accessibility Buffer

To contextualise the distance surface within Warsaw’s transport geography, a 500 m buffer is drawn around the Metro network. Supermarkets falling within this zone are identified.

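# st_buffer() distances are in CRS units: in EPSG:3857 these are projected units, not ground metres
metro_buffer     <- st_buffer(metro_lines, 500)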
shops_in_buffer  <- st_filter(shops_sf, metro_buffer)
pct_near_transit <- round((nrow(shops_in_buffer) / nrow(shops_sf)) * 100, 1)

cat(pct_near_transit, "% of large-format supermarkets are within 500 m of a Metro line.\n")
## 10.9 % of large-format supermarkets are within 500 m of a Metro line.
ggplot() +
  geom_sf(data = warszawa_planar, fill = "gray95", color = "gray70") +
  geom_sf(data = metro_buffer, fill = "orange", alpha = 0.25, color = NA) +
  geom_sf(data = shops_sf, color = "gray50", size = 1, alpha = 0.5) +
  geom_sf(data = shops_in_buffer, color = "darkred", size = 1.5) +
  labs(title = "Supermarkets Within 500 m of the Metro Network",
       subtitle = paste0(pct_near_transit,
                         "% of stores fall within the Metro catchment zone"),
       caption = "Orange: 500 m Metro buffer  |  Dark red: stores within buffer") +
  theme_minimal() +
  theme(axis.title = element_blank(), axis.text = element_blank())

The Metro buffer analysis shows that 10.9% of Warsaw’s large-format supermarkets lie within the buffer around the Metro network. Whether this share is disproportionate depends on how much of the city's area and population the buffer covers, which is not computed here, so the figure offers only qualified support for the Transit-Oriented Development pattern identified visually in Section 3.2. As a contextual overlay the buffer is still informative: areas outside this zone and outside the blue/green accessibility categories in Section 6.1 face compounded disadvantage.
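
One caveat about units is worth checking. In EPSG:3857 a buffer distance is measured in projected units, which exceed ground metres by a factor of 1/cos(latitude). The sketch below assumes a latitude of about 52.23°N for Warsaw (not a value from this report). It estimates the ground width of a 500-unit buffer and the projected distance that would correspond to 500 m on the ground.

```r
lat_deg <- 52.23                      # assumed latitude of central Warsaw
k <- 1 / cos(lat_deg * pi / 180)      # Web Mercator linear scale factor

ground_m_for_500_units <- 500 / k     # ground distance covered by 500 projected units
units_for_500_m        <- 500 * k     # projected units needed for 500 m on the ground

round(c(ground_m_for_500_units = ground_m_for_500_units,
        units_for_500_m        = units_for_500_m))
```

With the report's own rescaling factor, the equivalent buffer distance would be 0.5 * rsc, since rsc is the number of projected units per kilometre. Under this assumption, the 10.9% figure describes stores within roughly 300 m of the Metro lines rather than 500 m.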


7. Conclusion

7.1 Key Findings

This study applied point pattern analysis and Poisson process modelling to the spatial distribution of large-format supermarkets in Warsaw, with the aim of identifying and characterising food deserts.

Finding 1: Supermarkets are significantly clustered (Ripley’s K-function). The observed K-function lies above the pointwise Monte Carlo envelope at all tested distances, so Complete Spatial Randomness is not a plausible description of the pattern. Clustering is most intense below 2 km, consistent with agglomeration around Metro nodes and commercial corridors. This clustering is the structural precondition for food deserts: the concentration of retail in some areas leaves other areas underserved.

Finding 2: Population density is a significant predictor of retail intensity, but undersupply is not confined to the periphery (PPM). The log-linear PPM confirms a positive and significant relationship between residential density and supermarket intensity (β₁ ≈ 1.47 × 10⁻⁴, p < 0.001). The smoothed residual map (Section 5.5) and district-level summary (Section 5.6) together reveal that retail undersupply affects a broader set of districts than the peripheral narrative would suggest. Bielany, Wilanów, and Ursynów, all established and densely populated districts, record the most strongly negative mean residuals, indicating that their supermarket intensity falls furthest below population-based predictions. Peripheral districts Wesoła and Wawer follow, with Białołęka ranking sixth. The food desert problem in Warsaw is therefore both a peripheral growth lag and a middle-city density gap.

Finding 3: Peripheral districts face compounded spatial disadvantage (accessibility analysis). The distance surface reveals extensive red zones (> 800 m to nearest supermarket) across Białołęka and Wawer. The Metro buffer analysis shows that these same areas lie outside the transit catchment zone. Residents in these districts therefore face double isolation, with neither a walkable supermarket nor Metro access, which makes car ownership a de facto requirement for food shopping.

7.2 Limitations

The analysis has two principal limitations. First, the PPM is intentionally parsimonious, using population density as the sole covariate; additional drivers of retail location, such as land prices, road accessibility, and competitor proximity, are not modelled, so residuals may reflect these omitted factors as well as genuine undersupply. Second, the OSM supermarket dataset reflects the state of crowd-sourced mapping at the time of query and may undercount newer stores, particularly in fast-growing peripheral districts. The second limitation affects interpretation directly: any stores missing from OSM would make the measured shortfall in Białołęka and Wawer larger than the true one, so the food desert designation for these districts should be read as provisional until checked against an independent store register.

7.3 Policy Recommendations

  • Incentivise peripheral retail. Zoning policy should provide density bonuses or reduced commercial rates for large-format supermarkets opening in red-zone accessibility areas, prioritising northern Białołęka and south-eastern Wawer/Wesoła.

  • Mandate ground-floor retail in new residential permits. Rapid-growth districts such as Białołęka should require a minimum share of commercial ground-floor space in new residential developments, ensuring food infrastructure grows alongside population.

  • Improve feeder transit to existing retail nodes. Where supermarkets exist but are spatially isolated (Białołęka’s fragmented blue islands), local bus frequency and cycling infrastructure improvements can reduce effective distance without new store development.

In sum, Warsaw’s food desert problem is not a shortage of supermarkets in aggregate, but a systematic mismatch between where people live and where stores are located — a mismatch amplified by the concentration of retail investment along transit corridors and the lagging commercial development of fast-growing peripheral districts.