1. Introduction and Economic Aim

Aim of the Study

The primary aim of this project is to analyze the spatial distribution of banking infrastructure in Warsaw, comparing the locations of Automated Teller Machines (ATMs) with Physical Bank Branches using spatial point pattern analysis methods from the spatstat package.

Economic Context

The banking industry is undergoing a massive digital transformation. Maintaining physical bank branches in prime real estate — especially in a capital city like Warsaw — carries enormous overhead costs: rent, staffing, and security. To optimize costs, banks increasingly close traditional branches and replace them with ATMs, which have a far smaller real estate footprint.

From an urban economics perspective, we expect:

  • Bank Branches → heavily clustered in the Central Business District (CBD), where corporate clients and high-value transactions occur.
  • ATMs → dispersed across residential neighborhoods, shopping centers, and transit hubs to maximize consumer convenience at lower cost.

We use spatial point pattern analysis to formally test these economic hypotheses.


2. Data Preparation

We use real-world datasets extracted from OpenStreetMap for Warsaw (1,103 ATMs and 343 bank branches in the raw data). For computational efficiency, we sample 350 ATMs and 150 banks (500 points in total); after clipping to the city boundary, 476 points remain (332 ATMs and 144 banks). The Warsaw administrative boundary is downloaded from the Eurostat GISCO database with the giscoR package, so no local shapefiles are required.

required_pkgs <- c("sf", "spatstat", "ggplot2", "tidyverse",
                   "viridis", "ggthemes", "giscoR")

for (pkg in required_pkgs) {
  if (!require(pkg, character.only = TRUE))
    install.packages(pkg, repos = "http://cran.us.r-project.org")
  library(pkg, character.only = TRUE)
}
# ── 1. Load and sample datasets ───────────────────────────────────────────────
set.seed(42)
atms  <- read.csv("Datasets/topic5_atms_warsaw.csv")  |> slice_sample(n = 350)
banks <- read.csv("Datasets/topic5_banks_warsaw.csv") |> slice_sample(n = 150)
banking_data <- bind_rows(atms, banks)

# Convert to sf (WGS84)
banking_sf <- st_as_sf(banking_data, coords = c("lon", "lat"), crs = 4326)

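# ── 2. Warsaw boundary from Eurostat GISCO (NUTS3: "Miasto Warszawa") ─────────
# Note: EPSG:3857 (Web Mercator) inflates lengths by about 1/cos(latitude),
# roughly 1.63 at Warsaw, so every "km" below is a Web Mercator map unit.
nuts3_pl     <- gisco_get_nuts(year = "2021", country = "PL", nuts_level = 3)
warsaw_border <- nuts3_pl[nuts3_pl$NUTS_ID == "PL911", ] |>
  st_transform(crs = 3857)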

# ── 3. Transform and clip points to Warsaw ────────────────────────────────────
banking_sf    <- st_transform(banking_sf, crs = 3857)
banking_warsaw <- st_filter(banking_sf, warsaw_border)   # keep points inside Warsaw

# ── 4. Observation window (owin) ──────────────────────────────────────────────
W <- as.owin(warsaw_border)

# ── 5. Marked Point Pattern (ppp) ─────────────────────────────────────────────
pattern <- ppp(
  x      = st_coordinates(banking_warsaw)[, 1],
  y      = st_coordinates(banking_warsaw)[, 2],
  window = W,
  marks  = as.factor(banking_warsaw$type)
)

# Jitter by up to 0.03 m to break duplicate coordinates, then rescale metres → kilometres
pattern <- rjitter(pattern, 0.03)
pattern <- rescale(pattern, 1000, "km")

summary(pattern)
## Marked planar point pattern:  476 points
## Average intensity 0.3832 points per square km
## 
## Coordinates are given to 12 decimal places
## 
## Multitype:
##      frequency proportion intensity
## atm        332     0.6975    0.2672
## bank       144     0.3025    0.1159
## 
## Window: polygonal boundary
## single connected closed polygon with 7 vertices
## enclosing rectangle: [2327, 2365.8] x [6819, 6865] km
##                      (38.85 x 45.92 km)
## Window area = 1242.29 square km
## Unit of length: 1 km
## Fraction of frame area: 0.696
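
The window area and intensity reported above are measured in EPSG:3857 map units, which overstate ground distances away from the equator. The sketch below estimates the size of that distortion at Warsaw's latitude using the spherical Web Mercator scale factor 1/cos(latitude). The variable names are illustrative, and the corrected figures are rough approximations rather than results from the original analysis. A national projected CRS such as EPSG:2180 would avoid the distortion, but that is an alternative the source does not use.

```r
# Latitude of the CBD reference point used later in the document (PKiN)
lat_deg <- 52.2319

# Linear scale factor of spherical Web Mercator at this latitude
k <- 1 / cos(lat_deg * pi / 180)

# Areas are inflated by k^2, so a rough ground-area equivalent of the
# reported window area (1242.29 map square km) is:
window_area_map    <- 1242.29
window_area_ground <- window_area_map / k^2

# The same correction applied to the reported average intensity
# (0.3832 points per map square km)
intensity_ground <- 0.3832 * k^2

round(c(scale = k, area_km2 = window_area_ground, intensity = intensity_ground), 3)
```

Distances in the later G-function, L-function, and model sections are affected by the same linear factor, so a coefficient expressed "per km" refers to map kilometres, not ground kilometres.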

3. Visualization and First-Order Properties

First-order properties describe the intensity (average density) of events across the study area.

3.1 Spatial Distribution Map

plot_data <- banking_warsaw |>
  mutate(x = st_coordinates(geometry)[, 1],
         y = st_coordinates(geometry)[, 2])

ggplot() +
  geom_sf(data = warsaw_border, fill = "#2b2b2b", color = "#444444", linewidth = 0.5) +
  geom_point(data = plot_data, aes(x = x, y = y, color = type),
             size = 1.5, alpha = 0.8) +
  scale_color_manual(values = c("atm" = "#00FFCC", "bank" = "#FF3366")) +
  theme_fivethirtyeight() +
  theme(
    panel.background  = element_rect(fill = "#1e1e1e"),
    plot.background   = element_rect(fill = "#1e1e1e"),
    legend.background = element_rect(fill = "#1e1e1e"),
    legend.text  = element_text(color = "white"),
    legend.title = element_blank(),
    text         = element_text(color = "white"),
    axis.text    = element_blank(),
    axis.title   = element_blank(),
    panel.grid   = element_blank()
  ) +
  labs(title    = "Banking Infrastructure in Warsaw",
       subtitle = "ATMs (Cyan) vs Physical Bank Branches (Pink)")

3.2 Kernel Density Estimation (KDE)

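KDE reveals spatial hotspots of banking infrastructure. The bandwidth is chosen by the Diggle selector (bw.diggle), a cross-validation rule that minimises the mean-squared error criterion of Diggle (1985). Because the selector is applied to each type separately, the two maps may be smoothed with different bandwidths.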

split_pattern <- split(pattern)

d_atm  <- density(split_pattern$atm,  sigma = bw.diggle)
d_bank <- density(split_pattern$bank, sigma = bw.diggle)

par(mfrow = c(1, 2), mar = c(1, 1, 3, 1))
plot(d_atm,  main = "KDE: ATMs",   col = inferno(256))
plot(d_bank, main = "KDE: Banks",  col = inferno(256))

par(mfrow = c(1, 1))

Observation: Bank branches form a sharp, concentrated peak in the city centre. ATMs display multiple high-density clusters spread across all Warsaw districts.
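
How sharp or diffuse a KDE peak looks depends heavily on the bandwidth, so it can help to compare selectors before reading the maps. The sketch below does this on a toy inhomogeneous Poisson pattern on the unit square (not the Warsaw data); the intensity function and object names are assumptions made for illustration only.

```r
library(spatstat)

set.seed(1)
# Toy pattern: intensity decays from left to right across the unit square
X <- rpoispp(function(x, y) 400 * exp(-3 * x), lmax = 400)

# Three bandwidth selectors available in spatstat
sig_diggle <- as.numeric(bw.diggle(X))  # Diggle / Berman-Diggle cross-validation
sig_ppl    <- as.numeric(bw.ppl(X))     # likelihood cross-validation
sig_scott  <- as.numeric(bw.scott(X))   # Scott's rule, one value per axis

list(diggle = sig_diggle, ppl = sig_ppl, scott = sig_scott)

# Fixing one common bandwidth makes two density maps directly comparable
d_common <- density(X, sigma = sig_ppl)
```

For the banking data, the same idea would mean passing one shared numeric sigma to both density() calls instead of letting bw.diggle pick a separate value for each type.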

3.3 Relative Risk

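Relative risk maps the spatially varying probability that a banking point at a given location is an ATM rather than a bank branch. For a two-type pattern, relrisk treats the second factor level (here "bank") as the case by default, so we set case = "atm" explicitly to obtain the ATM probability.

probs_rr <- relrisk(pattern, relative = FALSE, case = "atm")

par(mar = c(1, 1, 3, 1))
plot(probs_rr,
     main = "Relative Risk: Probability of Encountering an ATM vs Bank",
     col  = viridis(256))
contour(probs_rr, add = TRUE, col = adjustcolor("white", alpha.f = 0.5))

Economic Interpretation: With the ATM as the case, dark areas (low ATM probability) mark bank-dominant zones and bright yellow areas mark ATM-dominant zones. Under the hypothesis, the dark zones should coincide with the CBD and the bright zones with residential districts.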
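
Which type relrisk treats as the case determines how the colours must be read. The sketch below uses a toy two-type pattern with levels "a" and "b" (not the Warsaw data) to check that the default returns the probability of the second level and that the case argument selects the other one. The pattern, the fixed bandwidth of 0.15, and all object names are illustrative assumptions.

```r
library(spatstat)

set.seed(2)
a <- runifpoint(200)                     # type "a": uniform on the unit square
b <- rpoint(100, function(x, y) x)       # type "b": denser toward x = 1
X <- superimpose(a = a, b = b)           # marks are a factor with levels a, b

# Default: second level ("b") is the case, so this is P(type = "b")
p_default <- relrisk(X, sigma = 0.15)

# Explicit choice of case: P(type = "a")
p_a <- relrisk(X, sigma = 0.15, case = "a")

# The two probability surfaces should sum to 1 at every pixel
sum_gap <- max(abs(as.matrix(p_a) + as.matrix(p_default) - 1), na.rm = TRUE)

# P(b) should be larger on the right half, where type "b" is denser
m     <- as.matrix(p_default)            # rows index y, columns index x
half  <- ncol(m) %/% 2
left  <- mean(m[, 1:half], na.rm = TRUE)
right <- mean(m[, (half + 1):ncol(m)], na.rm = TRUE)
c(sum_gap = sum_gap, left = left, right = right)
```

In the banking pattern the levels are "atm" and "bank", so a call without the case argument maps the probability of a bank, not of an ATM.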


4. Second-Order Properties

Second-order properties describe spatial dependence — whether events cluster, repel, or are independent of each other.

4.1 Spatial Segregation Test (Monte Carlo)

We formally test H₀: the probability that a point is an ATM rather than a bank is the same everywhere in Warsaw (no spatial segregation of types).

seg_test <- segregation.test.ppp(pattern, nsim = 39)
## Computing observed value... Done.
## Computing 39 simulated values...
## Done.
seg_test
## 
##  Monte Carlo test of spatial segregation of types
## 
## data:  pattern
## T = 4.3, p-value = 0.05

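Conclusion: The test returns p = 0.05, which sits exactly on the conventional 5% threshold rather than below it. With nsim = 39 the p-value can only take multiples of 1/40, so this is borderline evidence of spatial segregation between ATMs and bank branches; a larger nsim would give a more precise p-value.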
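
The resolution of a Monte Carlo p-value is set by the number of simulations. The sketch below assumes the usual convention, in which the observed statistic is ranked among itself and the nsim simulated values; the helper name mc_p_value is hypothetical and not part of spatstat.

```r
# Monte Carlo p-value: (1 + number of simulated statistics at least as
# large as the observed one) / (nsim + 1)
mc_p_value <- function(n_exceed, nsim) (1 + n_exceed) / (nsim + 1)

nsim <- 39

# Attainable p-values when 0, 1, 2 or 3 simulations reach the observed T
p_grid <- mc_p_value(0:3, nsim)
p_grid

# Smallest attainable p-value for a few choices of nsim
p_min <- mc_p_value(0, c(39, 199, 999))
p_min
```

Under this convention, a reported p = 0.05 with nsim = 39 corresponds to exactly one simulated statistic at least as large as the observed T = 4.3.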

4.2 Nearest-Neighbour Distance: G-Function

The G-function (Gest) gives the cumulative distribution of nearest-neighbour distances. If the observed curve lies above the theoretical Poisson (CSR) line, events are more clustered than a random process would produce.

G_atm  <- Gest(split_pattern$atm)
G_bank <- Gest(split_pattern$bank)

par(mfrow = c(1, 2))
plot(G_atm,  main = "G-function: ATMs",   legend = FALSE)
plot(G_bank, main = "G-function: Banks",  legend = FALSE)

par(mfrow = c(1, 1))

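Conclusion: Both observed curves lie above the Poisson baseline, which indicates clustering relative to CSR. Because Gest assumes a homogeneous intensity, part of this excess reflects the uneven density seen in the KDE maps, and without simulation envelopes the plots do not establish statistical significance. The bank G-function rises more steeply at short distances, indicating tighter spatial grouping consistent with CBD concentration.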
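
A summary-function plot on its own does not show whether a departure from the Poisson line is larger than sampling noise. One common way to check is a simulation envelope. The sketch below applies envelope() to a toy clustered pattern (a Matérn cluster process on the unit square, not the Warsaw data); the process parameters and object names are assumptions chosen for illustration.

```r
library(spatstat)

set.seed(3)
# Toy clustered pattern: about 20 parents, cluster radius 0.05, 5 offspring each
X <- rMatClust(20, 0.05, 5)

# Pointwise envelope of the G-function from 39 CSR simulations.
# With nsim = 39 and the extreme simulated values as limits, each distance r
# has a two-sided pointwise significance level of 2 / (39 + 1) = 0.05.
E <- envelope(X, Gest, nsim = 39, verbose = FALSE)

# Does the observed curve leave the envelope anywhere?
above_anywhere <- any(E$obs > E$hi, na.rm = TRUE)
above_anywhere

plot(E, main = "G-function with CSR envelope (toy data)")
```

For the banking data, the first argument would be split_pattern$atm or split_pattern$bank. Pointwise envelopes are a diagnostic rather than a single formal test, since they are evaluated separately at each distance.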

4.3 Inhomogeneous L-Function

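The inhomogeneous L-function (Linhom) tests for clustering while correcting for Warsaw’s spatially varying background density. It is the variance-stabilised transform of the inhomogeneous K-function (Kinhom), for which an inhomogeneous Poisson process gives L(r) = r. When no intensity is supplied, as here, Linhom estimates it from the data by kernel smoothing, so the result depends on that estimate.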

L_atm  <- Linhom(split_pattern$atm)
L_bank <- Linhom(split_pattern$bank)

par(mfrow = c(1, 2))
plot(L_atm,  main = "L-function (inhom): ATMs",  legend = FALSE)
plot(L_bank, main = "L-function (inhom): Banks", legend = FALSE)

par(mfrow = c(1, 1))

Conclusion: Both infrastructure types show clustering beyond what the estimated density gradient alone predicts, although no simulation envelopes were computed, so the strength of this evidence is not quantified. Bank branches exhibit stronger short-range clustering, consistent with agglomeration in commercial districts.
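
For reference, the standard definitions behind these plots can be written as follows. The notation is an assumption introduced here: |W| is the window area, \hat{\lambda} is the estimated intensity, and e is an edge-correction weight.

```latex
\[
\hat{K}_{\mathrm{inhom}}(r)
  = \frac{1}{|W|} \sum_{i} \sum_{j \neq i}
    \frac{\mathbf{1}\{\lVert x_i - x_j \rVert \le r\}}
         {\hat{\lambda}(x_i)\,\hat{\lambda}(x_j)}\; e(x_i, x_j; r),
\qquad
L_{\mathrm{inhom}}(r) = \sqrt{\frac{K_{\mathrm{inhom}}(r)}{\pi}} .
\]
% For an inhomogeneous Poisson process K_inhom(r) = \pi r^2,
% so values of L_inhom(r) above r point to clustering at scale r.
```

Each pair of points is down-weighted by the local intensities, which is how the estimator separates interaction between points from a dense background.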


5. Point Process Modelling (PPM)

We test the central economic hypothesis: Are Bank Branches more strongly attracted to the city centre than ATMs?

The Palace of Culture and Science (PKiN) serves as a proxy for Warsaw’s CBD — the most prominent commercial and transport hub in the city.

# ── 1. City-centre point (PKiN) ───────────────────────────────────────────────
center_sf <- st_as_sf(
  data.frame(lon = 21.0068, lat = 52.2319),
  coords = c("lon", "lat"), crs = 4326
) |> st_transform(crs = 3857)

# Create ppp in the original meters window, then rescale to km
center_ppp <- ppp(
  x      = st_coordinates(center_sf)[, 1],
  y      = st_coordinates(center_sf)[, 2],
  window = W
)
center_ppp <- rescale.ppp(center_ppp, 1000, "km")

# ── 2. Distance-to-centre spatial covariate ───────────────────────────────────
dist_raw     <- crossdist(pattern, center_ppp)
pattern_dist <- ppp(x = pattern$x, y = pattern$y,
                    window = pattern$window, marks = dist_raw[, 1])
dist_map <- Smooth.ppp(pattern_dist)

par(mar = c(1, 1, 3, 1))
plot(dist_map, main = "Distance to Warsaw CBD — Palace of Culture (km)",
     col = magma(256))

# ── 3. Fit models ─────────────────────────────────────────────────────────────
m1 <- ppm(pattern ~ marks)             # baseline: type only
m2 <- ppm(pattern ~ marks * dist_map) # interaction: type × distance to CBD

cat("\n--- Model 2 Coefficients (marks × distance) ---\n")
## 
## --- Model 2 Coefficients (marks × distance) ---
print(coef(summary(m2)))
##                     Estimate    S.E.  CI95.lo  CI95.hi Ztest     Zval
## (Intercept)         0.837332 0.10660  0.62840  1.04626   ***   7.8551
## marksbank          -0.819948 0.19355 -1.19930 -0.44060   ***  -4.2364
## dist_map           -0.201635 0.01146 -0.22410 -0.17917   *** -17.5948
## marksbank:dist_map -0.001933 0.02088 -0.04285  0.03899        -0.0926
# ── 4. Likelihood-Ratio Test ──────────────────────────────────────────────────
cat("\n--- ANOVA Likelihood-Ratio Test: m1 vs m2 ---\n")
## 
## --- ANOVA Likelihood-Ratio Test: m1 vs m2 ---
anova(m1, m2, test = "Chi")
## Analysis of Deviance Table
## 
## Model 1: ~marks   Poisson
## Model 2: ~marks * dist_map    Poisson
##   Npar Df Deviance            Pr(>Chi)    
## 1    2                                    
## 2    4  2      524 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
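
The distance covariate used in these models is interpolated by smoothing the distances observed at the data points, so its values away from the points are approximate. A more direct construction computes the distance at every pixel. The sketch below shows two equivalent ways to do this on a toy 10 × 10 window with a hypothetical centre at (5, 5); the window, the centre, and the object names are illustrative assumptions, and swapping such a covariate into the Warsaw model would change the fitted coefficients reported above.

```r
library(spatstat)

# Toy window and a single "centre" point (not the Warsaw boundary or PKiN)
W_toy  <- owin(c(0, 10), c(0, 10))
centre <- ppp(x = 5, y = 5, window = W_toy)

# Option 1: distance map of the point pattern (distance to its nearest point)
d_map <- distmap(centre)

# Option 2: evaluate the Euclidean distance function on the pixel grid
d_fun <- as.im(function(x, y) sqrt((x - 5)^2 + (y - 5)^2), W = W_toy)

# The two images should agree closely at every pixel
d_range <- range(as.matrix(d_map))
d_gap   <- max(abs(as.matrix(d_map) - as.matrix(d_fun)))
c(min = d_range[1], max = d_range[2], gap = d_gap)
```

Either image can be passed to ppm as a covariate, for example ppm(X ~ marks * d_map) for a multitype pattern X on the same window.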

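Final Conclusion: The likelihood-ratio test (deviance 524 on 2 degrees of freedom, p < 0.001) shows that adding distance to the CBD improves the model substantially. The improvement comes from the main effect: the dist_map coefficient of -0.202 means that intensity falls by roughly 18% per kilometre of map distance for both types (exp(-0.202) ≈ 0.82). The interaction term marksbank:dist_map is -0.002 with a 95% interval of [-0.043, 0.039] and is not significant, so the model gives no evidence that bank branch intensity drops off more sharply with distance than ATM intensity. The results therefore support only part of the economic hypothesis:

  • Supported: both banks and ATMs concentrate near the CBD, and their intensity declines with distance from it.
  • Not supported by this model: a steeper distance decay for bank branches than for ATMs.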
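
The Model 2 coefficients are on the log-intensity scale, so they are easier to read after exponentiation. The sketch below copies the estimates and standard error from the coefficient table and converts them to multiplicative effects per unit of distance, then recomputes the Wald statistic for the interaction. The variable names are illustrative.

```r
# Estimates copied from the Model 2 coefficient table
b_dist   <- -0.201635   # dist_map
b_inter  <- -0.001933   # marksbank:dist_map
se_inter <-  0.02088    # standard error of the interaction

# Multiplicative change in intensity per unit of map distance
atm_factor  <- exp(b_dist)             # ATMs (reference level)
bank_factor <- exp(b_dist + b_inter)   # bank branches

# Wald z statistic and 95% interval for the interaction term
z_inter  <- b_inter / se_inter
ci_inter <- b_inter + c(-1, 1) * qnorm(0.975) * se_inter

round(c(atm = atm_factor, bank = bank_factor, z = z_inter), 4)
round(ci_inter, 4)
```

The two decay factors are nearly identical and the interval for the interaction contains zero, which is why the likelihood-ratio result reflects the distance main effect and not a difference between banks and ATMs.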

6. Summary

Method                   | Category     | Key Finding
-------------------------|--------------|----------------------------------------------------------
KDE                      | First-order  | Banks peak in CBD; ATMs spread city-wide
Relative Risk            | First-order  | CBD is Bank-dominant; suburbs are ATM-dominant
Segregation Test         | Second-order | Borderline spatial segregation (p = 0.05, nsim = 39)
G-function               | Second-order | Both cluster; Banks cluster more tightly
Inhomogeneous L-function | Second-order | Clustering exceeds background density for both types
PPM + ANOVA              | Modelling    | Distance to CBD is highly significant (p < 0.001); type × distance interaction is not significant

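The combined evidence partly supports the hypothesis. Both bank branches and ATMs concentrate around the CBD and thin out with distance, and the KDE and G-function results suggest that branches cluster more tightly. The point process model, however, finds no significant difference in distance decay between the two types, and the segregation test is borderline (p = 0.05). The view that branches concentrate where commercial returns are highest (CBD), while ATMs provide retail banking coverage in lower-cost residential locations, is therefore consistent with the descriptive maps but not confirmed by the formal model. Whether this mirrors patterns in other European capital cities undergoing digital banking transitions would need comparable data.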