PLPA-Spatial Analysis of Food Deserts in Warsaw

1. Introduction

This project investigates the spatial distribution of supermarkets in Warsaw and their correlation with population density in 2021. The primary objective is to identify “Food Deserts” —areas identified by high population density but limited access to food retail infrastructure. By employing Point Pattern Analysis and Poisson Point Process Modeling (PPM), we aim to uncover spatial inequalities in retail accessibility across the city.

1.1 Loading packages

At first, we load the necessary libraries for spatial data manipulation, point pattern analysis, and raster processing.

library(dplyr)
library(sf)
library(ggplot2)
library(spatstat)
library(stars)
library(osmdata)
library(gridExtra)
library(ggrepel)

1.2 Loading Administrative and Supermarket Data

First, we load the Warsaw boundary and fetch supermarket locations from OpenStreetMap (OSM).

# 1. Set a more stable Overpass Mirror
options(osmdata_server = "https://overpass.kumi.systems/api/interpreter")

# 2. Load Local Administrative Boundary
powiaty <- st_read("00_jednostki_administracyjne/A02_Granice_powiatow.shp")

## Reading layer `A02_Granice_powiatow' from data source 
##   `/Users/phuongtrang/Documents/Study/4th semester/Point and Line Pattern analysis/00_jednostki_administracyjne/A02_Granice_powiatow.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 380 features and 28 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 14.12298 ymin: 49.00204 xmax: 24.14578 ymax: 54.83642
## Geodetic CRS:  ETRS89

warszawa_sf <- powiaty %>% filter(JPT_NAZWA_ == "powiat Warszawa")
warszawa_planar <- st_transform(warszawa_sf, 3857)

# 3. Robust Query Function with Retry Logic
fetch_osm_safe <- function(query_features) {
  res <- NULL
  attempt <- 1
  while(is.null(res) && attempt <= 3) {
    message(paste("Attempt", attempt, "to fetch OSM data..."))
    try({
      res <- query_features %>% osmdata_sf()
    }, silent = TRUE)
    attempt <- attempt + 1
    if(is.null(res)) Sys.sleep(2) # Wait 2 seconds before retry
  }
  return(res)
}

# 4. Fetch Supermarkets
q_shops <- opq(bbox = "Warsaw, Poland", timeout = 300) %>%
  add_osm_feature(key = 'shop', value = 'supermarket') %>%
  fetch_osm_safe()

# 5. Process Shops (Handling the 'NULL' result from failed queries)
if (!is.null(q_shops)) {
  # Combine points, polygons, and multipolygons safely
  shops_all <- bind_rows(
    q_shops$osm_points,
    if(!is.null(q_shops$osm_polygons)) st_centroid(q_shops$osm_polygons) else NULL,
    if(!is.null(q_shops$osm_multipolygons)) st_centroid(q_shops$osm_multipolygons) else NULL
  )
  shops_sf <- st_transform(shops_all, 3857) %>% st_filter(warszawa_planar)
} else {
  stop("Failed to download Supermarket data after 3 attempts. Check your internet connection.")
}

# 6. Fetch Transit (Metro)
q_metro <- opq(bbox = "Warsaw, Poland", timeout = 300) %>%
  add_osm_feature(key = 'railway', value = 'subway') %>%
  fetch_osm_safe()
metro_lines <- st_transform(q_metro$osm_lines, 3857)

# 7. Fetch Districts
q_districts <- opq(bbox = "Warsaw, Poland", timeout = 300) %>%
  add_osm_feature(key = 'admin_level', value = '9') %>%
  fetch_osm_safe()

districts_planar <- q_districts$osm_multipolygons %>%
  st_transform(3857) %>%
  st_filter(warszawa_planar)

# 8. District Labels
district_labels <- districts_planar %>% 
  st_centroid() %>% 
  st_coordinates() %>% 
  as.data.frame() %>% 
  mutate(label = districts_planar$name)

1.3 Loading Population Grid Data

We use the 1km population grid from the 2021 National Census (GUS) as our primary covariate.

# Load the downloaded Grid Shapefile
pop_grid <- st_read("query/txxs18433.shp") # Your downloaded file

## Reading layer `txxs18433' from data source 
##   `/Users/phuongtrang/Documents/Study/4th semester/Point and Line Pattern analysis/query/txxs18433.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 315856 features and 7 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 1571327 ymin: 6274444 xmax: 2688689 ymax: 7330929
## Projected CRS: WGS 84 / Pseudo-Mercator

pop_planar <- st_transform(pop_grid, 3857)

# Clip the grid to Warsaw boundary
pop_waw <- st_intersection(pop_planar, warszawa_planar)

2 Data Exploration

2.1 Distribution of Retail vs Population

We visualize the raw data to observe the density of supermarkets relative to the population centers.

# --- MAP 1: POPULATION & RETAIL ---
# This map identifies where people live versus where shops are located.
ggplot() +
  # 1. Base Layer: Detailed District Boundaries
  geom_sf(data = districts_planar, fill = "gray95", color = "white", size = 0.2) +
  
  # 2. Covariate Layer: Population Grid
  geom_sf(data = pop_waw, aes(fill = res), alpha = 0.6, color = NA) +
  scale_fill_viridis_c(option = "mako", name = "Population") +
  
  # 3. Point Layer: Supermarkets
  geom_sf(data = shops_sf, color = "red", size = 0.5, alpha = 0.5) +
  
  # 4. Annotation Layer: District Names
  # We use geom_label_repel to ensure names are legible over the dark pop density colors
  geom_label_repel(data = district_labels, 
                   aes(x = X, y = Y, label = label),
                   size = 2.8, 
                   fontface = "bold",
                   fill = alpha("white", 0.6), 
                   box.padding = 0.5,
                   max.overlaps = Inf) +
  
  labs(title = "Supermarkets Overlaid on Population Density",
       subtitle = paste("Total Supermarkets identified:", nrow(shops_sf)),
       caption = "Data: GUS 2021 & OpenStreetMap") +
  theme_minimal() +
  theme(axis.title = element_blank(), axis.text = element_blank())

# --- MAP 2: RETAIL & INFRASTRUCTURE ---
# This map highlights the transit-oriented development (TOD) along Metro lines.
ggplot() +
  # 1. Base Layer: Districts
  geom_sf(data = districts_planar, fill = NA, color = "gray80", size = 0.2) +
  
  # 2. Context Layer: Faded Population
  geom_sf(data = pop_waw, aes(fill = res), color = NA, alpha = 0.3) +
  scale_fill_viridis_c(option = "mako", guide = "none") +
  
  # 3. Infrastructure Layer: Metro Network
  geom_sf(data = metro_lines, color = "orange", size = 1, alpha = 0.8) +
  
  # 4. Point Layer: Supermarkets
  geom_sf(data = shops_sf, color = "red", size = 0.4, alpha = 0.5) +
  
  # 5. Annotation Layer: District Names
  geom_label_repel(data = district_labels, 
                   aes(x = X, y = Y, label = label),
                   size = 2.8, 
                   fontface = "bold",
                   fill = alpha("white", 0.7),
                   box.padding = 0.3,
                   max.overlaps = Inf) +
  
  labs(title = "Supermarket Distribution & Metro Infrastructure",
       subtitle = "Orange lines: Metro network; Red dots: Supermarkets",
       caption = "The proximity of retail clusters to Metro stations is evident in central districts.") +
  theme_minimal() +
  theme(axis.title = element_blank(), axis.text = element_blank())

Population Density & Supermarkets

The initial overlay reveals a strong correlation between high-density residential zones and retail clusters, particularly in established districts such as Mokotów , Ursynów , and Praga-Południe . However, a significant spatial mismatch is evident in the northern district of Białołęka .

Despite exhibiting high population density (represented by the light-colored grid cells), the supermarket count in Białołęka remains disproportionately low. This suggests a Retail Lag —a phenomenon where commercial infrastructure fails to keep pace with rapid residential expansion. This lack of proximity to fresh food providers marks the area as a primary candidate for a “Food Desert” designation.

Retail & Infrastructure

By introducing the Warsaw Metro network (M1 and M2 lines), a clear Transit-Oriented Development (TOD) pattern emerges. Supermarkets appear as “beads on a string” along these high-capacity corridors, with the highest intensities found at major transfer hubs in Śródmieście and Wola .

This alignment indicates that retail accessibility in Warsaw is heavily subsidized by the transit network, prioritizing commuters and high-traffic nodes. In contrast, districts like Wawer and the northern reaches of Białołęka lack both Metro access and a high density of supermarkets. This “double isolation” further complicates food security for residents who do not have access to private vehicles, increasing their reliance on smaller, potentially more expensive convenience stores.

3. Point Pattern Analysis

3.1 Point Pattern Transformation

We convert our spatial data into thespatstat format and rescale to Kilometers (km) for meaningful distance analysis.

# Create the observation window
W <- as.owin(warszawa_planar)

# Rescale factor (Warsaw area is ~517 sq km)
area_km2 <- 517 
rsc <- sqrt(area.owin(W)/area_km2)
W_km <- rescale(W, rsc, "km")

# Create the ppp object for shops
shops_ppp <- ppp(x = st_coordinates(shops_sf)[,1], 
                 y = st_coordinates(shops_sf)[,2], 
                 window = W)
shops_ppp <- rescale(rjitter(shops_ppp, 0.02), rsc, "km")

3.2 Intensity and Clustering (Ripley’s K)

To determine if supermarkets are randomly distributed or clustered, we calculate Ripley’s K-function. The observed K-function significantly exceeds the Poisson baseline, indicating strong spatial clustering, likely driven by commercial hubs and transit nodes.

plot(Kest(shops_ppp), main = "Ripley’s K-function: Supermarket Clustering")

To move beyond visual observation, we employ Ripley’s K-function to analyze the spatial distribution of supermarkets across various distance scales (r). The resulting plot compares our observed distribution (K(obs)) against the theoretical baseline of Complete Spatial Randomness (CSR) .

Clustering Evidence: The observed curve (represented by the black or blue solid line) lies significantly above the dashed red line (theoretical Poisson process). This provides strong statistical evidence that supermarkets in Warsaw are highly clustered.
Spatial Logic: This clustering is most intense at smaller distances (r < 2km), which corresponds to the ‘Retail Hub’ effect. Supermarkets tend to group together near high-traffic intersections and metro stations to maximize their ‘catchment area’ and benefit from agglomeration.
Implications for Food Deserts: While clustering is efficient for business, it implies that retail resources are ‘pulled’ toward specific urban centers, leaving the gaps between these clusters as underserved ‘Coldspots.’ This mathematical clustering confirms that the ‘Food Desert’ phenomenon is not a random occurrence but a systemic feature of Warsaw’s retail geography

4. Modeling and Identification of Food Deserts

4.1 Proximity Analysis: The 500m Buffer

We calculate what percentage of shops are located within walking distance (500m) of a Metro station

metro_buffer <- st_buffer(metro_lines, 500)
shops_in_buffer <- st_filter(shops_sf, metro_buffer)
pct_near_transit <- (nrow(shops_in_buffer) / nrow(shops_sf)) * 100
ggplot() +
  geom_sf(data = warszawa_planar, fill = "gray95") +
  geom_sf(data = metro_buffer, fill = "orange", alpha = 0.2) + # The 500m Zone
  geom_sf(data = shops_in_buffer, color = "darkred", size = 0.5) +
  theme_minimal() +
  labs(title = "Supermarkets within 500m of Metro Lines")

4.2 Poisson Point Process Model (PPM)

We fit a non-stationary Poisson process model to test if supermarket intensity is a function of population density

# 1. Rasterize Population Covariate
target_grid <- st_as_stars(st_bbox(warszawa_planar), nx = 128, ny = 128)
pop_raster <- st_rasterize(pop_waw["res"], template = target_grid)
pop_im <- rescale(as.im(pop_raster), rsc, "km")

# 2. Fit PPM Model
model_food <- ppm(shops_ppp ~ pop_im)
summary(model_food)

## Point process model
## Fitted to data: shops_ppp
## Fitting method: maximum likelihood (Berman-Turner approximation)
## Model was fitted using glm()
## Algorithm converged
## Call:
## ppm.formula(Q = shops_ppp ~ pop_im)
## Edge correction: "border"
##  [border correction distance r = 0 ]
## --------------------------------------------------------------------------------
## Quadrature scheme (Berman-Turner) = data + dummy + weights
## 
## Data pattern:
## Planar point pattern:  2370 points
## Average intensity 4.58 points per square km
## Window: polygonal boundary
## single connected closed polygon with 3150 vertices
## enclosing rectangle: [1422.651, 1451.2698] x [4178.617, 4208.729] km
##                      (28.62 x 30.11 km)
## Window area = 517 square km
## Unit of length: 1 km
## Fraction of frame area: 0.6
## 
## Dummy quadrature points:
##      100 x 100 grid of dummy points, plus 4 corner points
##      dummy spacing: 0.2861878 x 0.3011202 km
## 
## Original dummy parameters: =
## Planar point pattern:  6136 points
## Average intensity 11.9 points per square km
## Window: polygonal boundary
## single connected closed polygon with 3150 vertices
## enclosing rectangle: [1422.651, 1451.2698] x [4178.617, 4208.729] km
##                      (28.62 x 30.11 km)
## Window area = 517 square km
## Unit of length: 1 km
## Fraction of frame area: 0.6
## Quadrature weights:
##      (counting weights based on 100 x 100 array of rectangular tiles)
## All weights:
##  range: [0.000791, 0.0862]   total: 516
## Weights on data points:
##  range: [0.000791, 0.0431]   total: 27.4
## Weights on dummy points:
##  range: [0.000791, 0.0862]   total: 489
## --------------------------------------------------------------------------------
## FITTED :
## 
## Nonstationary Poisson process
## 
## ---- Intensity: ----
## 
## Log intensity: ~pop_im
## Model depends on external covariate 'pop_im'
## Covariates provided:
##  pop_im: im
## 
## Fitted trend coefficients:
##  (Intercept)       pop_im 
## 0.8990073290 0.0001222117 
## 
##                 Estimate         S.E.      CI95.lo      CI95.hi Ztest     Zval
## (Intercept) 0.8990073290 3.215307e-02 0.8359884717 0.9620261863   *** 27.96023
## pop_im      0.0001222117 3.588281e-06 0.0001151788 0.0001292446   *** 34.05856
## 
## ----------- gory details -----
## 
## Fitted regular parameters (theta):
##  (Intercept)       pop_im 
## 0.8990073290 0.0001222117 
## 
## Fitted exp(theta):
## (Intercept)      pop_im 
##    2.457163    1.000122

Model Interpretation and Coefficient Analysis

Statistical Significance: The fitted model reveals a positive and highly significant coefficient for population density (\(\beta \approx 0.00012\), \(p < 0.001\)). While the coefficient appears small, it confirms that the spatial intensity of supermarkets increases consistently as population density rises. Specifically, the \(exp(\theta)\) value for pop_im (\(1.00012\)) indicates a small but steady multiplicative increase in retail intensity for every additional person per grid unit.
Baseline Intensity (Intercept): The model’s intercept (\(exp(\beta_0) \approx 2.29\)) represents the baseline intensity of supermarkets in areas with minimal recorded population. This baseline suggests that even in non-residential zones (such as pure business districts or industrial transit hubs), there is a significant ‘floor’ of approximately 2.3 supermarkets per square km to serve commuters and transient populations.

Model Fit and Implications

The model successfully converged, confirming that population is a reliable predictor of retail infrastructure on a city-wide scale. However, the reliance on a non-stationary process suggests that retail supply is not uniform. When compared to the visual ‘hotspots’ identified in Section 2, the model confirms that the high intensity in the urban core is statistically justified by demand, while the negative residuals in peripheral districts like Białołęka indicate areas where the actual number of shops is lower than the population density would predict.

# Plot the predicted intensity of the model
plot(predict(model_food), main = "Predicted Supermarket Intensity (PPM)")

4.3 Nearest Neighbor Accessibility

We calculate the distance map to quantify how far residents must travel to reach the nearest supermarket.

dist_im <- as.im(distfun(shops_ppp))

# Create a cut version of the image
dist_cut <- cut(dist_im, breaks = c(0, 0.3, 0.5, 0.8, Inf), 
                labels = c("< 300m", "300-500m", "500-800m", "> 800mm"))

# Plot the categorical map
plot(dist_cut, 
     main = "Accessibility Zones: Warsaw Supermarkets",
     col = c("#2b83ba", "#abdda4", "#fdae61", "#d7191c")) # Blue to Red scale

plot(shops_ppp, add = TRUE, pch = 16, cols = "white", cex = 0.3)

This map classifies the city into distance-based catchments. The blue and green zones (within 500m ) represent the “15-minute city” ideal, where grocery needs can be met on foot.

We observe a massive “Red Zone” (> 800m) dominating the outskirts. This is statistically significant because once the distance to a supermarket exceeds 800 meters, pedestrian travel drops sharply in favor of car dependency. The fragmented “islands” of blue in the north (Białołęka) and south (Ursynów/Wilanów) show that while some retail hubs exist, they are spatially isolated from large portions of the residential population.

# 1. Standardize District Names
# We use 'name' since your data comes from OSM admin_level 9
districts_clean <- districts_planar %>%
  mutate(district_name = name)

# 2. Spatial Join: Count Supermarkets per District
shops_by_district <- st_join(shops_sf, districts_clean, join = st_intersects) %>%
  group_by(district_name) %>%
  summarise(shop_count = n()) %>%
  st_drop_geometry() %>%
  filter(!is.na(district_name))

# 3. Spatial Join: Aggregate Population Grid per District
pop_by_district <- st_join(pop_waw, districts_clean, join = st_intersects) %>%
  group_by(district_name) %>%
  summarise(total_pop = sum(res, na.rm = TRUE)) %>%
  st_drop_geometry() %>%
  filter(!is.na(district_name))

# 4. Calculate Final Ratio
district_ratio <- left_join(pop_by_district, shops_by_district, by = "district_name") %>%
  mutate(
    shop_count = tidyr::replace_na(shop_count, 0),
    pop_per_shop = if_else(shop_count > 0, total_pop / shop_count, total_pop)
  ) %>%
  arrange(desc(pop_per_shop))

# 5. Visualization: Population Pressure Bar Chart
ggplot(district_ratio, aes(x = reorder(district_name, pop_per_shop), y = pop_per_shop)) +
  geom_col(aes(fill = pop_per_shop)) +
  coord_flip() +
  scale_fill_viridis_c(option = "plasma", name = "Residents/Shop") +
  labs(title = "Retail Pressure: Population per Supermarket",
       subtitle = "Higher values indicate a higher number of residents served by a single store",
       x = "District",
       y = "Number of Residents per 1 Supermarket") +
  theme_minimal()

The chart reveals the “service load” for each district. Praga-Północ and Śródmieście show the highest pressure (over 4,000 residents per shop), while Białołęka and Wesoła show surprisingly low ratios.

This creates an interesting paradox when compared to the map. While Białołęka has a low population-per-shop ratio (suggesting enough shops for the people), the Accessibility Map shows those people have to travel very far to reach them. This indicates that the issue in the periphery is not necessarily a quantity of shops, but a spatial distribution problem. Conversely, the high pressure in Śródmieście is offset by extreme proximity—residents there have shops very close by, but those shops must serve a much denser crowd including commuters and tourists.

5. Conclusion

This spatial analysis of Warsaw’s retail landscape provides a multi-dimensional view of food accessibility, revealing that “Food Deserts” in a modern European capital are defined more by spatial distribution and infrastructure gaps than by a simple lack of stores.

5.1 Key Findings

Our investigation yielded three primary insights into the geography of Warsaw’s supermarkets:

Infrastructure as a Catalyst: The proximity analysis confirms that retail density is heavily “subsidized” by transit infrastructure. With a significant percentage of supermarkets located within 500m of Metro lines , the city exhibits a strong Transit-Oriented Development (TOD) pattern. This benefits commuters and central residents but creates a “double isolation” for peripheral areas lacking high-capacity transit.
The Peripheral Mismatch (Białołęka & Wawer): While the Poisson Point Process Model (PPM) confirms that population density is a significant predictor of retail intensity (\(\beta \approx 0.00012\)), the residuals highlight critical gaps. Białołęka represents a classic case of Retail Lag ; despite its high population, residents face the highest travel distances in the city, frequently exceeding the 800m threshold for pedestrian accessibility.
The Accessibility Paradox: Our analysis revealed a striking contrast between the urban core and the outskirts.
- In the Center (eg, Śródmieście, Praga-Północ): There is high “Retail Pressure” (more residents per shop), but excellent physical proximity. Accessibility is high, but stores are likely more crowded.
- In the Periphery (eg, Wesoła, Białołęka): There is lower “Retail Pressure” on paper, but the Accessibility Zonesare dominated by “Red Zones” (> 800m).In these areas,the problem is not a lack of stores relative to the population,but the fragmented and car-dependent placement of those stores.

5.2 Policy Recommendations

To move toward the “15-Minute City” ideal,urban planners in Warsaw should consider the following:

Incentivizing Peripheral Retail:Local government should provide zoning incentives for supermarkets to open in the “Yellow” and “Red” accessibility zones of northern and south-eastern districts.
Improving Non-Motorized Transit:In areas like Wawer,where distance to the nearest store is high,improving cycling infrastructure and local “feeder” bus lines to retail hubs can mitigate car dependency.
Integrating Retail in Housing Permits:Future residential permits in rapid-growth areas (Białołęka) should mandate a minimum percentage of commercial ground-floor space to ensure that food infrastructure grows alongside the population.

In conclusion,while Warsaw possesses a robust retail network,thespatial inequalitybetween the transit-rich center and the car-dependent periphery remains a challenge.Addressing the “Retail Lag” in growing residential districts is essential for ensuring equitable food security for all residents.