1 Introduction

This project analyzes the spatial distribution of discount supermarket chains in Warsaw, focusing on three major retail brands: Biedronka, Lidl, and Aldi. Using spatial point pattern methods, we examine how stores are distributed across the city, whether competing chains cluster near one another, and what environmental covariates — distance to the city centre and distance to public transport stops — explain the observed spatial intensity.

Warsaw is the largest retail market in Poland. The three chains represent contrasting strategies: Biedronka (Jeronimo Martins) pursues city-wide saturation, Lidl (Germany) targets suburban commercial nodes, and Aldi maintains a limited footprint. Comparing their spatial footprints reveals how competing retail formats organize themselves across urban space.

2 Data and Preprocessing

2.1 Data sources

All layers come from OpenStreetMap (OSM) via the Overpass API (osmdata package in R). The table below summarises the OSM tags used and the role of each layer in the analysis.

Layer OSM key/value Geometry Role
stores_clean.gpkg shop = supermarket, brands Biedronka / Lidl / Aldi Points Marked point pattern
warsaw_boundary.gpkg boundary = administrative, admin_level = 6 Polygon Study window
major_roads.gpkg highway ∈ {motorway, trunk, primary, secondary} Lines Accessibility / context
public_transport_stops.gpkg public_transport, highway = bus_stop, railway = tram_stop Points Covariate
building_polygons.gpkg building ∈ {commercial, retail, office} Polygons Urban context

2.2 Data acquisition (OSM via osmdata)

The block below shows the queries used to pull the raw layers. It is set to eval = FALSE because the cleaned .gpkg files are already shipped with the project; re-run it to regenerate the data from scratch.

library(osmdata)
library(sf)
library(dplyr)

bb <- getbb("Warsaw, Poland")

# 1. Stores (shop = supermarket)
stores_raw <- opq(bbox = bb) %>%
  add_osm_feature(key = "shop", value = "supermarket") %>%
  osmdata_sf()
stores <- stores_raw$osm_points

stores <- stores %>%
  filter(
    grepl("biedronka", brand, ignore.case = TRUE) |
      grepl("lidl", brand, ignore.case = TRUE) |
      grepl("aldi", brand, ignore.case = TRUE)
  )


# 2. Warsaw administrative boundary
boundary_raw <- opq(bbox = bb) %>%
  add_osm_feature(key = "boundary", value = "administrative") %>%
  add_osm_feature(key = "admin_level", value = "6") %>%
  osmdata_sf()
warsaw_boundary <- boundary_raw$osm_multipolygons %>%
  filter(name == "Warszawa")

# 3. Major roads
roads_raw <- opq(bbox = bb) %>%
  add_osm_feature(
    key   = "highway",
    value = c("motorway", "trunk", "primary", "secondary")
  ) %>%
  osmdata_sf()
major_roads <- roads_raw$osm_lines

# 4. Public transport stops
pt_raw <- opq(bbox = bb) %>%
  add_osm_feature(
    key   = "public_transport",
    value = c("stop_position", "platform")
  ) %>%
  osmdata_sf()
public_transport_stops <- pt_raw$osm_points

# 5. Commercial / retail / office buildings
build_raw <- opq(bbox = bb) %>%
  add_osm_feature(
    key   = "building",
    value = c("commercial", "retail", "office")
  ) %>%
  osmdata_sf()
building_polygons <- build_raw$osm_polygons

2.3 Data cleaning

OSM tags are inconsistent (different spellings of the same brand, empty columns, invalid polygons). The block below shows the cleaning rules applied to produce the .gpkg files used downstream.

# --- Drop columns that are entirely NA, then rows that are all-NA --------
drop_empty <- function(sfobj) {
  attrs   <- st_drop_geometry(sfobj)
  keep_c  <- names(attrs)[colSums(!is.na(attrs)) > 0]
  sfobj   <- sfobj[, c(keep_c, attr(sfobj, "sf_column"))]
  attrs2  <- st_drop_geometry(sfobj)
  keep_r  <- rowSums(!is.na(attrs2)) > 0
  sfobj[keep_r, ]
}

stores                 <- drop_empty(stores)
major_roads            <- drop_empty(major_roads)
public_transport_stops <- drop_empty(public_transport_stops)
building_polygons      <- drop_empty(building_polygons)

# --- Repair polygon geometries ------------------------------------------
warsaw_boundary   <- st_make_valid(warsaw_boundary)
building_polygons <- st_make_valid(building_polygons)

# --- Keep only POINT features for the transport-stops layer -------------
public_transport_stops <- public_transport_stops %>%
  filter(st_geometry_type(.) == "POINT")

# --- Persist cleaned layers ---------------------------------------------
st_write(stores,                 "stores_clean.gpkg",            delete_dsn = TRUE)
st_write(warsaw_boundary,        "warsaw_boundary.gpkg",         delete_dsn = TRUE)
st_write(major_roads,            "major_roads.gpkg",             delete_dsn = TRUE)
st_write(public_transport_stops, "public_transport_stops.gpkg",  delete_dsn = TRUE)
st_write(building_polygons,      "building_polygons.gpkg",       delete_dsn = TRUE)

2.4 EPSG:4326

OSM ships in EPSG:4326 (degrees), which is unsuitable for distance-based PPA methods. We reproject every layer to EPSG:2180 (PUWG 1992 / CS92) — the official projected CRS for Poland, with units in metres — so that all distances reported below are in metres on the ground.


2.5 Loading Packages

library(sf)
library(dplyr)
library(ggplot2)
library(spatstat.geom)
library(spatstat.explore)
library(spatstat.model)
library(knitr)

2.6 Loading the Data

All data were originally collected from OpenStreetMap and saved as GeoPackage files. They are read directly from the project folder — no internet connection is required.

stores                 <- st_read("stores_clean.gpkg",            quiet = TRUE)
warsaw_boundary        <- st_read("warsaw_boundary.gpkg",         quiet = TRUE)
major_roads            <- st_read("major_roads.gpkg",             quiet = TRUE)
public_transport_stops <- st_read("public_transport_stops.gpkg",  quiet = TRUE)
building_polygons      <- st_read("building_polygons.gpkg",       quiet = TRUE)
# Check column names and basic structure
names(stores)
## [1] "brand"       "brand_clean" "name"        "operator"    "geom"
# Coordinate reference systems
cat("CRS of stores:", st_crs(stores)$epsg, "\n")
## CRS of stores: 4326
cat("CRS of boundary:", st_crs(warsaw_boundary)$epsg, "\n")
## CRS of boundary: 4326
# Check missing values in brand
cat("Missing brand values:", sum(is.na(stores$brand_clean)), "\n")
## Missing brand values: 0
# Brand counts in raw data
kable(
  as.data.frame(table(Brand = stores$brand_clean)),
  caption = "Table 1. Store counts by brand (raw data before clipping)."
)
Table 1. Store counts by brand (raw data before clipping).
Brand Freq
Aldi 8
Biedronka 114
Lidl 16

The raw dataset contains 138 store locations across all brands. Biedronka dominates numerically, consistent with its position as Poland’s largest discount chain. Lidl and Aldi together represent a much smaller share.


3 Data Exploration

3.1 CRS Transformation and Clipping to Warsaw Boundary

All spatial layers are reprojected to EPSG:2180 (Polish CS92), a metric planar projection suited for distance calculations within Poland. Stores are then filtered to retain only those located within the Warsaw administrative boundary.

stores_2180                 <- st_transform(stores,                 2180)
warsaw_boundary_2180        <- st_transform(warsaw_boundary,        2180)
major_roads_2180            <- st_transform(major_roads,            2180)
public_transport_stops_2180 <- st_transform(public_transport_stops, 2180)
building_polygons_2180      <- st_transform(building_polygons,      2180)

# Validate Warsaw boundary geometry
warsaw_boundary_2180 <- st_make_valid(warsaw_boundary_2180)

# Clip stores to Warsaw boundary
stores_2180_clipped <- st_filter(
  stores_2180,
  warsaw_boundary_2180,
  .predicate = st_within
)

cat("Stores before clipping:", nrow(stores_2180), "\n")
## Stores before clipping: 138
cat("Stores after clipping: ", nrow(stores_2180_clipped), "\n")
## Stores after clipping:  138
kable(
  as.data.frame(table(Brand = stores_2180_clipped$brand_clean)),
  caption = "Table 2. Store counts by brand after clipping to the Warsaw administrative boundary."
)
Table 2. Store counts by brand after clipping to the Warsaw administrative boundary.
Brand Freq
Aldi 8
Biedronka 114
Lidl 16

After clipping, 138 stores remain within Warsaw. The small reduction reflects stores in neighbouring municipalities captured by the original bounding box query but located outside the city boundary. Biedronka retains the largest count, with Lidl and Aldi each contributing a much smaller number of locations.

3.2 Spatial Overview Map

ggplot() +
  geom_sf(data = warsaw_boundary_2180,
          fill = "white", color = "black", linewidth = 0.5) +
  geom_sf(data = major_roads_2180,
          color = "grey75", linewidth = 0.2) +
  geom_sf(data = stores_2180_clipped,
          aes(color = brand_clean), size = 2, alpha = 0.85) +
  scale_color_manual(
    values = c("Biedronka" = "#d62828",
               "Lidl"      = "#e9c46a",
               "Aldi"      = "#457b9d")
  ) +
  labs(
    title    = "Discount Supermarket Chains in Warsaw",
    subtitle = "Biedronka, Lidl, and Aldi",
    color    = "Brand"
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "bottom")
Figure 1. Spatial distribution of Biedronka, Lidl, and Aldi discount stores across Warsaw, overlaid on the major road network (EPSG:2180).

Figure 1. Spatial distribution of Biedronka, Lidl, and Aldi discount stores across Warsaw, overlaid on the major road network (EPSG:2180).

The map reveals the stark asymmetry between the three brands. Biedronka (red) forms a dense, city-wide network that covers both central and peripheral districts. Lidl (yellow) and Aldi (blue) are far fewer and appear more concentrated in outer areas, likely reflecting their preference for larger-format stores in suburban commercial zones with good road access.

3.3 Distribution by Brand

ggplot() +
  geom_sf(data = warsaw_boundary_2180,
          fill = "grey97", color = "black", linewidth = 0.4) +
  geom_sf(data = stores_2180_clipped,
          color = "#2b4c7e", size = 0.9, alpha = 0.55) +
  facet_wrap(~ brand_clean) +
  labs(title = "Store Locations by Brand") +
  theme_minimal(base_size = 11) +
  theme(axis.title = element_blank())
Figure 2. Faceted maps showing store locations for each brand separately. The contrast between Biedronka's city-wide saturation and the sparser footprints of Lidl and Aldi is clearly visible.

Figure 2. Faceted maps showing store locations for each brand separately. The contrast between Biedronka’s city-wide saturation and the sparser footprints of Lidl and Aldi is clearly visible.

The faceted view confirms that Biedronka occupies virtually the entire city, while Lidl and Aldi have distinctly thinner networks. Aldi’s footprint is sparse enough that individual locations stand out clearly; Lidl shows some clustering in specific suburban corridors.


4 Point Pattern Analysis

4.1 Data Transformations

4.1.1 Creating the Observation Window

The Warsaw administrative boundary polygon is converted into a spatstat owin object, which serves as the observation window for all subsequent point pattern analyses.

warsaw_window <- as.owin(warsaw_boundary_2180)
warsaw_window
## window: polygonal boundary
## enclosing rectangle: [626505.9, 655260.3] x [472229.5, 502172.4] units

4.1.2 Marked Point Pattern (ppp)

Store locations are converted into a marked planar point pattern where the mark attached to each point is the store’s brand. The spatstat ppp() function requires explicit x/y coordinates and a window object.

coords <- st_coordinates(stores_2180_clipped)

stores_ppp <- ppp(
  x      = coords[, 1],
  y      = coords[, 2],
  window = warsaw_window,
  marks  = factor(stores_2180_clipped$brand_clean)
)

summary(stores_ppp)
## Marked planar point pattern:  138 points
## Average intensity 2.670717e-07 points per square unit
## 
## Coordinates are given to 10 decimal places
## 
## Multitype:
##           frequency proportion    intensity
## Aldi              8 0.05797101 1.548242e-08
## Biedronka       114 0.82608700 2.206245e-07
## Lidl             16 0.11594200 3.096484e-08
## 
## Window: polygonal boundary
## single connected closed polygon with 1745 vertices
## enclosing rectangle: [626505.9, 655260.3] x [472229.5, 502172.4] units
##                      (28750 x 29940 units)
## Window area = 516715000 square units
## Fraction of frame area: 0.6
kable(
  as.data.frame(table(Brand = marks(stores_ppp))),
  caption = "Table 3. Mark (brand) distribution in the ppp object."
)
Table 3. Mark (brand) distribution in the ppp object.
Brand Freq
Aldi 8
Biedronka 114
Lidl 16
plot(
  stores_ppp,
  cols = c("#d62828", "#e9c46a", "#457b9d"),
  pch  = 20,
  cex  = 0.7,
  main = "Marked Point Pattern: Discount Supermarkets in Warsaw"
)
Figure 3. Marked point pattern of discount supermarkets in Warsaw. Each point represents one store; colour denotes brand.

Figure 3. Marked point pattern of discount supermarkets in Warsaw. Each point represents one store; colour denotes brand.

The summary confirms the brand-level intensity (stores per m²) for each mark type. Biedronka’s intensity is roughly four times that of Lidl, which in turn exceeds Aldi’s. The overall pattern already hints at strong spatial clustering rather than a random distribution.


4.2 Nearest-Neighbour Distance Analysis

4.2.1 Summary Statistics

For each store, the distance to the nearest store of any brand is computed using nndist(). This measures how tightly packed the combined network is, and how that packing differs by brand.

nn_all <- nndist(stores_ppp)

# Overall summary
summary(nn_all)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    40.2   375.9   619.0   725.9   908.9  4896.8
stores_2180_clipped$nn_distance <- nn_all

nn_summary <- stores_2180_clipped %>%
  st_drop_geometry() %>%
  group_by(brand_clean) %>%
  summarise(
    N        = n(),
    Mean_m   = round(mean(nn_distance,   na.rm = TRUE)),
    Median_m = round(median(nn_distance, na.rm = TRUE)),
    SD_m     = round(sd(nn_distance,     na.rm = TRUE)),
    Min_m    = round(min(nn_distance,    na.rm = TRUE)),
    Max_m    = round(max(nn_distance,    na.rm = TRUE))
  )

kable(
  nn_summary,
  col.names = c("Brand", "N", "Mean (m)", "Median (m)",
                "SD (m)", "Min (m)", "Max (m)"),
  caption   = "Table 4. Nearest-neighbour distance statistics by brand (metres)."
)
Table 4. Nearest-neighbour distance statistics by brand (metres).
Brand N Mean (m) Median (m) SD (m) Min (m) Max (m)
Aldi 8 406 353 264 40 779
Biedronka 114 751 623 681 40 4897
Lidl 16 704 531 502 114 1692

The table reveals a clear hierarchy. Biedronka stores have the shortest mean nearest-neighbour distance, reflecting how densely its network covers Warsaw — on average, there is always another store (of any brand, but likely Biedronka itself given its dominance) within a few hundred metres. Lidl stores are more widely spaced, and Aldi shows the largest mean distance and highest standard deviation, indicating the most uneven spatial coverage among the three brands.

The median being lower than the mean for all brands suggests a right-skewed distribution: most stores are relatively close to a neighbour, but a small number are isolated in lower-density areas.

4.2.2 Visualisation

ggplot(stores_2180_clipped,
       aes(x = brand_clean, y = nn_distance, fill = brand_clean)) +
  geom_boxplot(alpha = 0.7, outlier.shape = 21, outlier.size = 1.2) +
  scale_fill_manual(
    values = c("Biedronka" = "#d62828",
               "Lidl"      = "#e9c46a",
               "Aldi"      = "#457b9d")
  ) +
  labs(
    title    = "Nearest-Neighbour Distances by Brand",
    subtitle = "Discount supermarket chains in Warsaw",
    x = "Brand",
    y = "Nearest-neighbour distance (metres)"
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none")
Figure 4. Boxplot of nearest-neighbour distances (metres) by brand. Biedronka's compact interquartile range indicates consistently short distances across the city; Aldi's wide spread reflects its uneven network.

Figure 4. Boxplot of nearest-neighbour distances (metres) by brand. Biedronka’s compact interquartile range indicates consistently short distances across the city; Aldi’s wide spread reflects its uneven network.

ggplot(stores_2180_clipped, aes(x = nn_distance)) +
  geom_histogram(bins = 30, fill = "#2b4c7e", color = "white") +
  labs(
    title    = "Distribution of Nearest-Neighbour Distances",
    subtitle = "All discount supermarket stores",
    x = "Nearest-neighbour distance (metres)",
    y = "Number of stores"
  ) +
  theme_minimal(base_size = 12)
Figure 5. Distribution of nearest-neighbour distances for all stores combined. The right-skewed shape indicates that the majority of stores are within a few hundred metres of a neighbour, while a tail of more isolated locations also exists.

Figure 5. Distribution of nearest-neighbour distances for all stores combined. The right-skewed shape indicates that the majority of stores are within a few hundred metres of a neighbour, while a tail of more isolated locations also exists.

The histogram peaks at short distances and has a long right tail. The bulk of stores — most of which are Biedronka — are within roughly 200–600 m of their nearest neighbour, while the isolated tail corresponds to Aldi and Lidl stores in lower-density peripheral areas.


4.3 Nearest-Neighbour Brand Relationships

Beyond raw distance, we can identify whether each store’s nearest neighbour belongs to the same brand or a competing brand.

nn_index      <- nnwhich(stores_ppp)
own_brand     <- marks(stores_ppp)
nearest_brand <- own_brand[nn_index]

# Cross-tabulation of own brand vs nearest-neighbour brand
nn_brand_table <- table(own_brand = own_brand, nearest_brand = nearest_brand)
kable(nn_brand_table,
      caption = "Table 5. Cross-tabulation: brand of each store vs. brand of its nearest neighbour.")
Table 5. Cross-tabulation: brand of each store vs. brand of its nearest neighbour.
Aldi Biedronka Lidl
Aldi 0 7 1
Biedronka 6 88 20
Lidl 0 16 0
nn_brand_prop <- round(prop.table(nn_brand_table, margin = 1), 3)
kable(nn_brand_prop,
      caption = "Table 6. Row proportions: for each brand, the brand composition of its nearest neighbour.")
Table 6. Row proportions: for each brand, the brand composition of its nearest neighbour.
Aldi Biedronka Lidl
Aldi 0.000 0.875 0.125
Biedronka 0.053 0.772 0.175
Lidl 0.000 1.000 0.000
same_brand_nn <- (own_brand == nearest_brand)

same_brand_table <- table(same_brand_nn)
kable(
  as.data.frame(round(prop.table(same_brand_table), 3)),
  col.names = c("Same brand as NN", "Proportion"),
  caption   = "Table 7. Overall proportion of stores whose nearest neighbour is the same brand."
)
Table 7. Overall proportion of stores whose nearest neighbour is the same brand.
Same brand as NN Proportion
FALSE 0.362
TRUE 0.638
stores_2180_clipped$nearest_brand <- as.character(nearest_brand)
stores_2180_clipped$same_brand_nn <- same_brand_nn

same_by_brand <- stores_2180_clipped %>%
  st_drop_geometry() %>%
  group_by(brand_clean, same_brand_nn) %>%
  summarise(n = n(), .groups = "drop") %>%
  group_by(brand_clean) %>%
  mutate(share = round(n / sum(n), 3))

kable(same_by_brand,
      col.names = c("Brand", "Same brand as NN", "Count", "Share"),
      caption   = "Table 8. Same-brand vs different-brand nearest neighbour by brand.")
Table 8. Same-brand vs different-brand nearest neighbour by brand.
Brand Same brand as NN Count Share
Aldi FALSE 8 1.000
Biedronka FALSE 26 0.228
Biedronka TRUE 88 0.772
Lidl FALSE 16 1.000
ggplot(same_by_brand,
       aes(x = brand_clean, y = share, fill = same_brand_nn)) +
  geom_col(position = "stack", width = 0.55) +
  scale_fill_manual(
    values = c("TRUE"  = "#2d6a4f",
               "FALSE" = "#d62828"),
    labels = c("TRUE"  = "Same brand",
               "FALSE" = "Different brand")
  ) +
  labs(
    title    = "Nearest-Neighbour Brand Relationship by Brand",
    subtitle = "Same-brand vs. different-brand nearest neighbours",
    x = "Brand", y = "Proportion", fill = NULL
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "bottom")
Figure 6. Stacked bar chart showing, for each brand, the proportion of stores whose nearest neighbour belongs to the same brand (TRUE) vs. a competing brand (FALSE).

Figure 6. Stacked bar chart showing, for each brand, the proportion of stores whose nearest neighbour belongs to the same brand (TRUE) vs. a competing brand (FALSE).

# Spatstat mark table: who is the nearest neighbour in the ppp?
mark_table <- marktable(stores_ppp, N = 1, collapse = TRUE)
print(mark_table)
##            neighbour
## point       Aldi Biedronka Lidl
##   Aldi         0         7    1
##   Biedronka    6        88   20
##   Lidl         0        16    0

The results are largely driven by the stark difference in store counts. Because Biedronka accounts for the overwhelming majority of stores, it is the nearest neighbour for most Lidl and Aldi locations — not because of deliberate co-location strategy, but simply because its dense network means a Biedronka store is almost always the closest. For Biedronka itself, a meaningful share of nearest neighbours are other Biedronka stores, reflecting high intra-brand density. Aldi and Lidl almost never have a same-brand nearest neighbour — a direct consequence of their sparse networks.


4.4 Intensity and Clustering

4.4.1 Global Intensity

stores_ppp_unmarked <- unmark(stores_ppp)

int_per_m2  <- intensity(stores_ppp_unmarked)
int_per_km2 <- int_per_m2 * 1e6

cat("Overall intensity (stores per m²): ", int_per_m2,  "\n")
## Overall intensity (stores per m²):  2.670717e-07
cat("Overall intensity (stores per km²):", int_per_km2, "\n")
## Overall intensity (stores per km²): 0.2670717

The overall intensity across Warsaw is approximately 0.3 discount stores per km². However, this city-wide average conceals extreme local variation, as shown by the quadrat analysis below.

4.4.2 Quadrat Count and Test of CSR

qdr <- quadratcount(stores_ppp_unmarked)

par(mfrow = c(1, 2))
plot(qdr,                          main = "Quadrat counts (n)")
plot(intensity(qdr, image = TRUE), main = "Intensity (stores per km²)")
Figure 7. Quadrat count grid (5×5) showing the number of stores per cell (left) and intensity per cell (right). Central tiles have far higher counts than peripheral ones.

Figure 7. Quadrat count grid (5×5) showing the number of stores per cell (left) and intensity per cell (right). Central tiles have far higher counts than peripheral ones.

par(mfrow = c(1, 1))
qdr_test <- quadrat.test(stores_ppp_unmarked, alternative = "clustered")
qdr_test
## 
##  Chi-squared test of CSR using quadrat counts
## 
## data:  stores_ppp_unmarked
## X2 = 147.19, df = 21, p-value < 2.2e-16
## alternative hypothesis: clustered
## 
## Quadrats: 22 tiles (irregular windows)

The chi-squared test of Complete Spatial Randomness (CSR) yields X² = 147 with p < 2.2 × 10⁻¹⁶, decisively rejecting the null hypothesis of spatial randomness in favour of clustering. Even at the coarsest resolution (5×5 grid), the concentration of stores in a handful of central tiles is far beyond what random placement would produce.


4.5 Cross-Type Spatial Interaction: Cross-K and Cross-L Functions

The Cross-K and Cross-L functions test whether stores of one brand are spatially attracted to (or repelled from) stores of another brand, across a range of distance scales.

4.5.1 Cross-K Function: Biedronka → Lidl

K_BL <- Kcross(stores_ppp, i = "Biedronka", j = "Lidl", correction = "border")
plot(K_BL, main = "Cross-K function: Biedronka vs Lidl")
Figure 8. Cross-K function from Biedronka to Lidl. The observed curve (solid) lies above the CSR theoretical line (dashed), indicating that Lidl stores tend to have more Biedronka neighbours within distance r than expected under spatial randomness.

Figure 8. Cross-K function from Biedronka to Lidl. The observed curve (solid) lies above the CSR theoretical line (dashed), indicating that Lidl stores tend to have more Biedronka neighbours within distance r than expected under spatial randomness.

4.5.2 Cross-L Functions: Both Directions

L_BL <- Lcross(stores_ppp, i = "Biedronka", j = "Lidl", correction = "border")
L_LB <- Lcross(stores_ppp, i = "Lidl", j = "Biedronka", correction = "border")

par(mfrow = c(1, 2))
plot(L_BL, main = "Cross-L: Biedronka → Lidl")
plot(L_LB, main = "Cross-L: Lidl → Biedronka")
Figure 9. Cross-L functions between Biedronka and Lidl in both directions. The L-function standardises K so that the CSR baseline is zero, making deviations easier to read. Both curves lie above zero, confirming co-location.

Figure 9. Cross-L functions between Biedronka and Lidl in both directions. The L-function standardises K so that the CSR baseline is zero, making deviations easier to read. Both curves lie above zero, confirming co-location.

par(mfrow = c(1, 1))

In both directions, the observed Cross-L curve lies above the CSR baseline across all distance scales. This indicates co-location: Biedronka and Lidl stores are spatially closer to one another than expected if locations were independently and randomly assigned. The most plausible explanation is that both brands are drawn to the same types of urban locations — busy road intersections, transit nodes, and dense residential areas — rather than any deliberate mutual attraction.

4.5.3 Monte Carlo Envelope Test

set.seed(123)

L_BL_env <- envelope(
  stores_ppp,
  fun        = Lcross,
  i          = "Biedronka",
  j          = "Lidl",
  nsim       = 39,
  correction = "border"
)
## Generating 39 simulated realisations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
## 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 
## 39.
## 
## Done.
plot(L_BL_env, main = "Monte Carlo Envelope: Cross-L, Biedronka vs Lidl")
Figure 10. Monte Carlo simulation envelope (39 runs, border correction) for the Cross-L function from Biedronka to Lidl. The observed curve (solid red) lies consistently above the upper simulation boundary, confirming statistically significant co-location at essentially all distance scales tested.

Figure 10. Monte Carlo simulation envelope (39 runs, border correction) for the Cross-L function from Biedronka to Lidl. The observed curve (solid red) lies consistently above the upper simulation boundary, confirming statistically significant co-location at essentially all distance scales tested.

The observed curve exceeds the upper boundary of the simulation envelope at essentially all distances, providing strong statistical evidence that the spatial proximity of Biedronka and Lidl stores is not a product of chance. With 39 simulations, the significance level is approximately 5% (one-sided). Both chains are responding to the same spatial structure of Warsaw — the same transit corridors, the same dense residential blocks — producing systematic co-location even in the absence of any direct interaction between the brands.


4.6 Kernel Density Estimation

Kernel density smoothing provides a continuous intensity surface, useful for identifying spatial hotspots and comparing brand-level distributions.

4.6.1 All Stores

density_all <- density(stores_ppp_unmarked)
plot(density_all, main = "Kernel Density: All Discount Supermarkets")
plot(stores_ppp_unmarked, add = TRUE, pch = 16, cex = 0.3)
Figure 11. Kernel density surface of all discount supermarket locations in Warsaw. Warm colours (yellow/red) indicate higher intensity. The inner-city and western residential districts form the dominant hotspot.

Figure 11. Kernel density surface of all discount supermarket locations in Warsaw. Warm colours (yellow/red) indicate higher intensity. The inner-city and western residential districts form the dominant hotspot.

The overall kernel density map reveals a pronounced hotspot in the central and inner districts of Warsaw, broadly coinciding with the densest residential areas and the core of the public transport network. Intensity decreases steadily towards the city’s peripheral boundary, with the lowest values in forested and less-developed zones.

4.6.2 Kernel Density: Biedronka

biedronka_ppp     <- stores_ppp[marks(stores_ppp) == "Biedronka"]
density_biedronka <- density(biedronka_ppp)
plot(density_biedronka, main = "Kernel Density: Biedronka")
plot(biedronka_ppp, add = TRUE, pch = 16, cex = 0.3)
Figure 12. Kernel density of Biedronka store locations. High intensity extends throughout central Warsaw and along major residential corridors, reflecting a saturation strategy.

Figure 12. Kernel density of Biedronka store locations. High intensity extends throughout central Warsaw and along major residential corridors, reflecting a saturation strategy.

4.6.3 Kernel Density: Lidl

lidl_ppp     <- stores_ppp[marks(stores_ppp) == "Lidl"]
density_lidl <- density(lidl_ppp)
plot(density_lidl, main = "Kernel Density: Lidl")
plot(lidl_ppp, add = TRUE, pch = 16, cex = 0.5)
Figure 13. Kernel density of Lidl store locations. Compared to Biedronka, Lidl shows a more polycentric pattern with moderate-intensity peaks in several suburban areas rather than a uniform central hotspot.

Figure 13. Kernel density of Lidl store locations. Compared to Biedronka, Lidl shows a more polycentric pattern with moderate-intensity peaks in several suburban areas rather than a uniform central hotspot.

The brand-level density maps highlight the fundamental difference in retail strategy. Biedronka’s surface is high and relatively broad across central Warsaw — saturation coverage designed to minimise consumer travel distances citywide. Lidl’s surface, by contrast, is lower in overall intensity with distinct peaks in specific suburban nodes — a selective, large-format strategy prioritising car-accessible locations with concentrated commercial footfall.


5 Poisson Point Process Models

Four inhomogeneous Poisson point process (ppm) models are fitted to the unmarked store pattern to formally quantify how spatial covariates explain the distribution of store intensity across Warsaw.

5.1 Preparing Covariates

# Coordinate means for centring
xy <- coords(stores_ppp_unmarked)

# Covariate 1: Spatial trend — centred X and Y in km
Xkm <- as.im(function(x, y) (x - mean(xy$x)) / 1000, W = warsaw_window)
Ykm <- as.im(function(x, y) (y - mean(xy$y)) / 1000, W = warsaw_window)

# Covariate 2: Distance to Warsaw centroid (km)
city_centre  <- st_centroid(warsaw_boundary_2180)
centre_coords <- st_coordinates(city_centre)

DistCentre <- as.im(
  function(x, y) {
    sqrt((x - centre_coords[1, 1])^2 + (y - centre_coords[1, 2])^2) / 1000
  },
  W = warsaw_window
)

# Covariate 3: Distance to nearest public transport stop (km)
public_transport_points_2180 <- public_transport_stops_2180[
  st_geometry_type(public_transport_stops_2180) == "POINT", ]

stop_coords <- st_coordinates(public_transport_points_2180)

stops_ppp <- ppp(
  x = stop_coords[, 1],
  y = stop_coords[, 2],
  window = warsaw_window
)

DistStop <- distmap(stops_ppp) / 1000
par(mfrow = c(1, 2))
plot(DistCentre, main = "Distance to Warsaw centroid (km)")
plot(stores_ppp_unmarked, add = TRUE, pch = 16, cex = 0.3)
plot(DistStop, main = "Distance to nearest PT stop (km)")
plot(stores_ppp_unmarked, add = TRUE, pch = 16, cex = 0.3)
Figure 14. Spatial covariate images used in the ppm models. Left: distance to the geographic centroid of Warsaw (km). Right: distance to the nearest public transport stop (km). Both show the expected concentric structure centred on the inner city.

Figure 14. Spatial covariate images used in the ppm models. Left: distance to the geographic centroid of Warsaw (km). Right: distance to the nearest public transport stop (km). Both show the expected concentric structure centred on the inner city.

par(mfrow = c(1, 1))

The distance-to-centre covariate captures the urban density gradient — if stores prefer central locations with higher footfall, the coefficient will be negative. The distance-to-transport covariate tests whether stores co-locate with transit stops, which generate pedestrian flow that is essential for walk-in retail. The two covariates are related but distinct: the transit covariate captures fine-grained variation at individual stop locations, whereas the centre covariate captures the broad radial gradient.

5.2 Model A: Spatial Trend (X + Y)

ppm_xy2 <- ppm(stores_ppp_unmarked ~ Xkm + Ykm)
summary(ppm_xy2)
## Point process model
## Fitted to data: stores_ppp_unmarked
## Fitting method: maximum likelihood (Berman-Turner approximation)
## Model was fitted using glm()
## Algorithm converged
## Call:
## ppm.formula(Q = stores_ppp_unmarked ~ Xkm + Ykm)
## Edge correction: "border"
##  [border correction distance r = 0 ]
## --------------------------------------------------------------------------------
## Quadrature scheme (Berman-Turner) = data + dummy + weights
## 
## Data pattern:
## Planar point pattern:  138 points
## Average intensity 2.67e-07 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 1745 vertices
## enclosing rectangle: [626505.9, 655260.3] x [472229.5, 502172.4] units
##                      (28750 x 29940 units)
## Window area = 516715000 square units
## Fraction of frame area: 0.6
## 
## Dummy quadrature points:
##      32 x 32 grid of dummy points, plus 4 corner points
##      dummy spacing: 898.5754 x 935.7172 units
## 
## Original dummy parameters: =
## Planar point pattern:  677 points
## Average intensity 1.31e-06 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 1745 vertices
## enclosing rectangle: [626505.9, 655260.3] x [472229.5, 502172.4] units
##                      (28750 x 29940 units)
## Window area = 516715000 square units
## Fraction of frame area: 0.6
## Quadrature weights:
##      (counting weights based on 32 x 32 array of rectangular tiles)
## All weights:
##  range: [16600, 841000]  total: 5.16e+08
## Weights on data points:
##  range: [140000, 420000] total: 4.9e+07
## Weights on dummy points:
##  range: [16600, 841000]  total: 4.67e+08
## --------------------------------------------------------------------------------
## FITTED :
## 
## Nonstationary Poisson process
## 
## ---- Intensity: ----
## 
## Log intensity: ~Xkm + Ykm
## Model depends on external covariates 'Xkm' and 'Ykm'
## Covariates provided:
##  Xkm: im
##  Ykm: im
## 
## Fitted trend coefficients:
##  (Intercept)          Xkm          Ykm 
## -15.08636205  -0.05038953  -0.01154417 
## 
##                 Estimate       S.E.      CI95.lo      CI95.hi Ztest
## (Intercept) -15.08636205 0.08512566 -15.25320528 -14.91951882   ***
## Xkm          -0.05038953 0.01407427  -0.07797459  -0.02280447   ***
## Ykm          -0.01154417 0.01264640  -0.03633065   0.01324231      
##                     Zval
## (Intercept) -177.2246127
## Xkm           -3.5802597
## Ykm           -0.9128425
## 
## ----------- gory details -----
## 
## Fitted regular parameters (theta):
##  (Intercept)          Xkm          Ykm 
## -15.08636205  -0.05038953  -0.01154417 
## 
## Fitted exp(theta):
##  (Intercept)          Xkm          Ykm 
## 2.805926e-07 9.508590e-01 9.885222e-01
plot(predict(ppm_xy2), main = "Predicted intensity: spatial trend model")
plot(stores_ppp_unmarked, add = TRUE, pch = 16, cex = 0.4)
Figure 15. Predicted intensity surface from the spatial trend model (linear in centred X and Y coordinates). The directional gradient reflects asymmetry in store distribution across Warsaw, but does not correspond to a specific explanatory mechanism.

Figure 15. Predicted intensity surface from the spatial trend model (linear in centred X and Y coordinates). The directional gradient reflects asymmetry in store distribution across Warsaw, but does not correspond to a specific explanatory mechanism.

Both X and Y coordinates are statistically significant, confirming directional variation in store density across Warsaw. However, raw coordinates carry no behavioural interpretation — this model serves primarily as a baseline rather than an explanatory specification.

5.3 Model B: Distance to City Centre

ppm_centre <- ppm(stores_ppp_unmarked ~ DistCentre)
summary(ppm_centre)
## Point process model
## Fitted to data: stores_ppp_unmarked
## Fitting method: maximum likelihood (Berman-Turner approximation)
## Model was fitted using glm()
## Algorithm converged
## Call:
## ppm.formula(Q = stores_ppp_unmarked ~ DistCentre)
## Edge correction: "border"
##  [border correction distance r = 0 ]
## --------------------------------------------------------------------------------
## Quadrature scheme (Berman-Turner) = data + dummy + weights
## 
## Data pattern:
## Planar point pattern:  138 points
## Average intensity 2.67e-07 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 1745 vertices
## enclosing rectangle: [626505.9, 655260.3] x [472229.5, 502172.4] units
##                      (28750 x 29940 units)
## Window area = 516715000 square units
## Fraction of frame area: 0.6
## 
## Dummy quadrature points:
##      32 x 32 grid of dummy points, plus 4 corner points
##      dummy spacing: 898.5754 x 935.7172 units
## 
## Original dummy parameters: =
## Planar point pattern:  677 points
## Average intensity 1.31e-06 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 1745 vertices
## enclosing rectangle: [626505.9, 655260.3] x [472229.5, 502172.4] units
##                      (28750 x 29940 units)
## Window area = 516715000 square units
## Fraction of frame area: 0.6
## Quadrature weights:
##      (counting weights based on 32 x 32 array of rectangular tiles)
## All weights:
##  range: [16600, 841000]  total: 5.16e+08
## Weights on data points:
##  range: [140000, 420000] total: 4.9e+07
## Weights on dummy points:
##  range: [16600, 841000]  total: 4.67e+08
## --------------------------------------------------------------------------------
## FITTED :
## 
## Nonstationary Poisson process
## 
## ---- Intensity: ----
## 
## Log intensity: ~DistCentre
## Model depends on external covariate 'DistCentre'
## Covariates provided:
##  DistCentre: im
## 
## Fitted trend coefficients:
## (Intercept)  DistCentre 
## -13.3684312  -0.2351152 
## 
##                Estimate       S.E.     CI95.lo     CI95.hi Ztest       Zval
## (Intercept) -13.3684312 0.17804856 -13.7173999 -13.0194624   *** -75.083064
## DistCentre   -0.2351152 0.02559277  -0.2852762  -0.1849543   ***  -9.186784
## 
## ----------- gory details -----
## 
## Fitted regular parameters (theta):
## (Intercept)  DistCentre 
## -13.3684312  -0.2351152 
## 
## Fitted exp(theta):
##  (Intercept)   DistCentre 
## 1.563738e-06 7.904797e-01
plot(predict(ppm_centre), main = "Predicted intensity: distance to centre model")
plot(stores_ppp_unmarked, add = TRUE, pch = 16, cex = 0.4)
Figure 16. Predicted intensity from the city centre distance model. The radially symmetric surface correctly captures the concentration of stores in the inner city, but smooths over local variation driven by specific transit or commercial nodes.

Figure 16. Predicted intensity from the city centre distance model. The radially symmetric surface correctly captures the concentration of stores in the inner city, but smooths over local variation driven by specific transit or commercial nodes.

The coefficient for DistCentre is negative and highly significant: store intensity decreases as distance from the city centre increases. Exponentiating the coefficient gives the multiplicative factor per additional kilometre — a value below 1 confirms the centrality premium in discount retail site selection.

5.4 Model C: Distance to Public Transport Stops

ppm_stop <- ppm(stores_ppp_unmarked ~ DistStop)
summary(ppm_stop)
## Point process model
## Fitted to data: stores_ppp_unmarked
## Fitting method: maximum likelihood (Berman-Turner approximation)
## Model was fitted using glm()
## Algorithm converged
## Call:
## ppm.formula(Q = stores_ppp_unmarked ~ DistStop)
## Edge correction: "border"
##  [border correction distance r = 0 ]
## --------------------------------------------------------------------------------
## Quadrature scheme (Berman-Turner) = data + dummy + weights
## 
## Data pattern:
## Planar point pattern:  138 points
## Average intensity 2.67e-07 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 1745 vertices
## enclosing rectangle: [626505.9, 655260.3] x [472229.5, 502172.4] units
##                      (28750 x 29940 units)
## Window area = 516715000 square units
## Fraction of frame area: 0.6
## 
## Dummy quadrature points:
##      32 x 32 grid of dummy points, plus 4 corner points
##      dummy spacing: 898.5754 x 935.7172 units
## 
## Original dummy parameters: =
## Planar point pattern:  677 points
## Average intensity 1.31e-06 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 1745 vertices
## enclosing rectangle: [626505.9, 655260.3] x [472229.5, 502172.4] units
##                      (28750 x 29940 units)
## Window area = 516715000 square units
## Fraction of frame area: 0.6
## Quadrature weights:
##      (counting weights based on 32 x 32 array of rectangular tiles)
## All weights:
##  range: [16600, 841000]  total: 5.16e+08
## Weights on data points:
##  range: [140000, 420000] total: 4.9e+07
## Weights on dummy points:
##  range: [16600, 841000]  total: 4.67e+08
## --------------------------------------------------------------------------------
## FITTED :
## 
## Nonstationary Poisson process
## 
## ---- Intensity: ----
## 
## Log intensity: ~DistStop
## Model depends on external covariate 'DistStop'
## Covariates provided:
##  DistStop: im
## 
## Fitted trend coefficients:
## (Intercept)    DistStop 
##  -13.704822   -7.398389 
## 
##               Estimate      S.E.    CI95.lo   CI95.hi Ztest       Zval
## (Intercept) -13.704822 0.1422505 -13.983628 -13.42602   *** -96.342905
## DistStop     -7.398389 0.8951892  -9.152928  -5.64385   ***  -8.264609
## 
## ----------- gory details -----
## 
## Fitted regular parameters (theta):
## (Intercept)    DistStop 
##  -13.704822   -7.398389 
## 
## Fitted exp(theta):
##  (Intercept)     DistStop 
## 1.117047e-06 6.122383e-04
plot(predict(ppm_stop), main = "Predicted intensity: distance to public transport")
plot(stores_ppp_unmarked, add = TRUE, pch = 16, cex = 0.4)
Figure 17. Predicted intensity from the public transport distance model. Areas well-served by transit have markedly higher predicted store density, and the surface tracks Warsaw's transit network more closely than the simple centre-distance model.

Figure 17. Predicted intensity from the public transport distance model. Areas well-served by transit have markedly higher predicted store density, and the surface tracks Warsaw’s transit network more closely than the simple centre-distance model.

The coefficient for DistStop is also negative and highly significant: stores are more likely to be located near transit stops. This is consistent with discount retail theory — Biedronka, Lidl, and Aldi all depend heavily on pedestrian footfall from commuters, and transit stops are the primary generators of such footfall across Warsaw’s districts.

5.5 Model D: Combined Model

ppm_combined <- ppm(stores_ppp_unmarked ~ DistCentre + DistStop)
summary(ppm_combined)
## Point process model
## Fitted to data: stores_ppp_unmarked
## Fitting method: maximum likelihood (Berman-Turner approximation)
## Model was fitted using glm()
## Algorithm converged
## Call:
## ppm.formula(Q = stores_ppp_unmarked ~ DistCentre + DistStop)
## Edge correction: "border"
##  [border correction distance r = 0 ]
## --------------------------------------------------------------------------------
## Quadrature scheme (Berman-Turner) = data + dummy + weights
## 
## Data pattern:
## Planar point pattern:  138 points
## Average intensity 2.67e-07 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 1745 vertices
## enclosing rectangle: [626505.9, 655260.3] x [472229.5, 502172.4] units
##                      (28750 x 29940 units)
## Window area = 516715000 square units
## Fraction of frame area: 0.6
## 
## Dummy quadrature points:
##      32 x 32 grid of dummy points, plus 4 corner points
##      dummy spacing: 898.5754 x 935.7172 units
## 
## Original dummy parameters: =
## Planar point pattern:  677 points
## Average intensity 1.31e-06 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 1745 vertices
## enclosing rectangle: [626505.9, 655260.3] x [472229.5, 502172.4] units
##                      (28750 x 29940 units)
## Window area = 516715000 square units
## Fraction of frame area: 0.6
## Quadrature weights:
##      (counting weights based on 32 x 32 array of rectangular tiles)
## All weights:
##  range: [16600, 841000]  total: 5.16e+08
## Weights on data points:
##  range: [140000, 420000] total: 4.9e+07
## Weights on dummy points:
##  range: [16600, 841000]  total: 4.67e+08
## --------------------------------------------------------------------------------
## FITTED :
## 
## Nonstationary Poisson process
## 
## ---- Intensity: ----
## 
## Log intensity: ~DistCentre + DistStop
## Model depends on external covariates 'DistCentre' and 'DistStop'
## Covariates provided:
##  DistCentre: im
##  DistStop: im
## 
## Fitted trend coefficients:
## (Intercept)  DistCentre    DistStop 
## -12.5692775  -0.1742546  -6.6829270 
## 
##                Estimate       S.E.     CI95.lo     CI95.hi Ztest       Zval
## (Intercept) -12.5692775 0.20942672 -12.9797464 -12.1588087   *** -60.017543
## DistCentre   -0.1742546 0.02692345  -0.2270236  -0.1214856   ***  -6.472224
## DistStop     -6.6829270 0.91468032  -8.4756675  -4.8901865   ***  -7.306298
## 
## ----------- gory details -----
## 
## Fitted regular parameters (theta):
## (Intercept)  DistCentre    DistStop 
## -12.5692775  -0.1742546  -6.6829270 
## 
## Fitted exp(theta):
##  (Intercept)   DistCentre     DistStop 
## 3.477220e-06 8.400830e-01 1.252108e-03
plot(predict(ppm_combined), main = "Predicted intensity: combined model")
plot(stores_ppp_unmarked, add = TRUE, pch = 16, cex = 0.4)
Figure 18. Predicted intensity from the combined model (distance to centre + distance to public transport). Both covariates contribute independently, producing a surface that tracks the observed data more faithfully than either covariate alone.

Figure 18. Predicted intensity from the combined model (distance to centre + distance to public transport). Both covariates contribute independently, producing a surface that tracks the observed data more faithfully than either covariate alone.

In the combined model, both coefficients remain statistically significant, confirming that they capture complementary rather than redundant aspects of store location. DistCentre reflects the broad radial density gradient of Warsaw’s urban fabric; DistStop captures the additional pull of individual transit nodes throughout the city. The predicted surface shows elevated intensity both in the central hotspot and along transit corridors in outer districts — matching the observed data more faithfully.

5.6 Model Comparison: AIC

aic_raw <- data.frame(
  Model = c(
    "A: Spatial trend (Xkm + Ykm)",
    "B: Distance to city centre",
    "C: Distance to public transport",
    "D: Combined (centre + transport)"
  ),
  AIC = c(
    AIC(ppm_xy2),
    AIC(ppm_centre),
    AIC(ppm_stop),
    AIC(ppm_combined)
  )
)

aic_table <- aic_raw %>%
  dplyr::arrange(AIC) %>%
  dplyr::mutate(
    ΔAIC = round(AIC - min(AIC), 1),
    AIC  = round(AIC, 1)
  )

kable(
  aic_table,
  col.names = c("Model", "AIC", "ΔAIC"),
  caption   = "Table 9. AIC comparison of four ppm models. Lower AIC indicates better fit. ΔAIC > 2 denotes a meaningful difference; ΔAIC > 10 indicates no empirical support."
)
Table 9. AIC comparison of four ppm models. Lower AIC indicates better fit. ΔAIC > 2 denotes a meaningful difference; ΔAIC > 10 indicates no empirical support.
Model AIC ΔAIC
D: Combined (centre + transport) 4279.8 0.0
C: Distance to public transport 4322.5 42.7
B: Distance to city centre 4366.2 86.4
A: Spatial trend (Xkm + Ykm) 4445.8 166.0

The AIC ranking confirms the combined model (D) as the best specification. Model C (public transport alone) ranks second and outperforms Model B (city centre alone) — indicating that fine-grained transit access is a stronger predictor of store placement than simple radial distance from the city centre. Model A (spatial trend) performs worst, as it captures only a directional gradient with no causal interpretation.

The large ΔAIC values for Models A, B, and C (all substantially above 10) indicate that the combined model is the only one with meaningful empirical support. Both covariates are necessary; dropping either produces a substantially inferior fit.


6 Summary

This analysis examined the spatial distribution of three discount supermarket chains — Biedronka, Lidl, and Aldi — across Warsaw using point pattern methods. Five main conclusions emerge:

1. Biedronka dominates the discount supermarket landscape. With 114 outlets within Warsaw, Biedronka’s network far exceeds those of Lidl and Aldi combined. Its mean nearest-neighbour distance of 751 m confirms a saturation strategy designed to minimise consumer access distances across the entire city.

2. Store distribution is strongly clustered, not random. The quadrat test of CSR yields X² = 147 with p < 2.2 × 10⁻¹⁶, rejecting spatial randomness decisively. Stores concentrate heavily in central districts — the quadrat intensity map shows the inner city tiles have substantially more stores than peripheral tiles.

3. Biedronka and Lidl are significantly co-located. The Monte Carlo Cross-L envelope test shows the observed curve exceeds the upper simulation boundary at essentially all distance scales, confirming co-location beyond chance. Both brands respond to the same spatial drivers — transit nodes, dense residential corridors, major commercial roads — producing systematic co-location even without direct interaction between the firms.

4. Public transport access is the strongest single predictor of store intensity. Among individual covariates, the public transport distance model (C) achieves a lower AIC than the city centre model (B). Discount supermarkets — dependent on pedestrian footfall — cluster near transit stops throughout Warsaw, not only in the city centre.

5. The combined model best explains store intensity. Distance to the city centre and distance to public transport contribute independently and significantly. Their combination produces the lowest AIC and the most realistic predicted intensity surface, capturing both the central hotspot and secondary clusters along transit corridors in outer districts.

These findings illustrate how retail spatial strategy is legible in measurable spatial statistics: Biedronka’s saturation model, Lidl’s selective suburban positioning, and the shared dependence on transit-driven footfall are all reflected in the point pattern data.


7 Appendix: Session Information

sessionInfo()
## R version 4.4.1 (2024-06-14 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 10 x64 (build 19045)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=Polish_Poland.utf8  LC_CTYPE=Polish_Poland.utf8   
## [3] LC_MONETARY=Polish_Poland.utf8 LC_NUMERIC=C                  
## [5] LC_TIME=Polish_Poland.utf8    
## 
## time zone: Europe/Warsaw
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] knitr_1.49             spatstat.model_3.7-0   rpart_4.1.23          
##  [4] spatstat.explore_3.8-0 nlme_3.1-164           spatstat.random_3.4-5 
##  [7] spatstat.geom_3.7-3    spatstat.univar_3.1-7  spatstat.data_3.1-9   
## [10] ggplot2_4.0.3          dplyr_1.1.4            sf_1.1-1              
## 
## loaded via a namespace (and not attached):
##  [1] sass_0.4.9            utf8_1.2.4            generics_0.1.3       
##  [4] class_7.3-22          tensor_1.5.1          KernSmooth_2.23-24   
##  [7] lattice_0.22-6        digest_0.6.37         magrittr_2.0.3       
## [10] spatstat.utils_3.2-3  evaluate_1.0.5        grid_4.4.1           
## [13] RColorBrewer_1.1-3    fastmap_1.2.0         jsonlite_1.8.9       
## [16] Matrix_1.7-4          spatstat.sparse_3.1-0 e1071_1.7-16         
## [19] DBI_1.2.3             mgcv_1.9-1            fansi_1.0.6          
## [22] scales_1.4.0          codetools_0.2-20      jquerylib_0.1.4      
## [25] abind_1.4-8           cli_3.6.3             rlang_1.1.4          
## [28] polyclip_1.10-7       units_0.8-7           splines_4.4.1        
## [31] withr_3.0.2           cachem_1.1.0          yaml_2.3.10          
## [34] tools_4.4.1           deldir_2.0-4          vctrs_0.6.5          
## [37] R6_2.6.1              proxy_0.4-27          lifecycle_1.0.5      
## [40] classInt_0.4-11       pkgconfig_2.0.3       pillar_1.9.0         
## [43] bslib_0.8.0           gtable_0.3.6          glue_1.8.0           
## [46] Rcpp_1.1.1            xfun_0.49             tibble_3.2.1         
## [49] tidyselect_1.2.1      rstudioapi_0.17.1     goftest_1.2-3        
## [52] farver_2.1.2          htmltools_0.5.8.1     labeling_0.4.3       
## [55] rmarkdown_2.29        compiler_4.4.1        S7_0.2.0