The primary aim of this project is to analyze the spatial distribution of banking infrastructure in Warsaw, comparing the locations of automated teller machines (ATMs) with physical bank branches using spatial point pattern analysis methods from the spatstat package.
The banking industry is undergoing a massive digital transformation. Maintaining physical bank branches in prime real estate — especially in a capital city like Warsaw — carries enormous overhead costs: rent, staffing, and security. To optimize costs, banks increasingly close traditional branches and replace them with ATMs, which have a far smaller real estate footprint.
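From an urban economics perspective, we expect bank branches to concentrate in the central business district (CBD), where commercial returns are highest, and ATMs to be spread more widely across lower-cost residential districts.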
We use spatial point pattern analysis to formally test these economic hypotheses.
We use real-world datasets extracted from OpenStreetMap for Warsaw (1,103 ATMs and 343 Bank Branches in the raw data). For computational efficiency, we sample 350 ATMs and 150 Banks (500 points total). The Warsaw administrative boundary is obtained from the Eurostat GISCO database — no local shapefiles required.
required_pkgs <- c("sf", "spatstat", "ggplot2", "tidyverse",
                   "viridis", "ggthemes", "giscoR")
# Install any package that is missing, then attach it
for (pkg in required_pkgs) {
  if (!require(pkg, character.only = TRUE))
    install.packages(pkg, repos = "http://cran.us.r-project.org")
  library(pkg, character.only = TRUE)
}
# ── 1. Load and sample datasets ───────────────────────────────────────────────
set.seed(42)
atms <- read.csv("Datasets/topic5_atms_warsaw.csv") |> slice_sample(n = 350)
banks <- read.csv("Datasets/topic5_banks_warsaw.csv") |> slice_sample(n = 150)
banking_data <- bind_rows(atms, banks)
# Convert to sf (WGS84)
banking_sf <- st_as_sf(banking_data, coords = c("lon", "lat"), crs = 4326)
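# ── 2. Warsaw boundary from Eurostat GISCO (NUTS3: "Miasto Warszawa") ─────────
# Note: EPSG:3857 (Web Mercator) inflates lengths by about 1/cos(latitude),
# so the "km" reported below are Mercator km, not ground km.
nuts3_pl <- gisco_get_nuts(year = "2021", country = "PL", nuts_level = 3)
warsaw_border <- nuts3_pl[nuts3_pl$NUTS_ID == "PL911", ] |>
  st_transform(crs = 3857)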
# ── 3. Transform and clip points to Warsaw ────────────────────────────────────
banking_sf <- st_transform(banking_sf, crs = 3857)
banking_warsaw <- st_filter(banking_sf, warsaw_border) # keep points inside Warsaw
# ── 4. Observation window (owin) ──────────────────────────────────────────────
W <- as.owin(warsaw_border)
# ── 5. Marked Point Pattern (ppp) ─────────────────────────────────────────────
pattern <- ppp(
  x = st_coordinates(banking_warsaw)[, 1],
  y = st_coordinates(banking_warsaw)[, 2],
  window = W,
  marks = as.factor(banking_warsaw$type)
)
# Jitter to remove duplicate coordinates; rescale meters → kilometres
pattern <- rjitter(pattern, 0.03)
pattern <- rescale.ppp(pattern, 1000, "km")
summary(pattern)
## Marked planar point pattern: 476 points
## Average intensity 0.3832 points per square km
##
## Coordinates are given to 12 decimal places
##
## Multitype:
## frequency proportion intensity
## atm 332 0.6975 0.2672
## bank 144 0.3025 0.1159
##
## Window: polygonal boundary
## single connected closed polygon with 7 vertices
## enclosing rectangle: [2327, 2365.8] x [6819, 6865] km
## (38.85 x 45.92 km)
## Window area = 1242.29 square km
## Unit of length: 1 km
## Fraction of frame area: 0.696
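The pattern above is built in EPSG:3857 (Web Mercator), whose local scale factor grows as 1/cos(latitude), so the reported kilometres, window area and intensities are stretched relative to ground distances. The following is a minimal base-R sketch of the size of that effect, assuming the latitude of the Palace of Culture (52.2319) is representative of the whole city; the variable names are illustrative only.

```r
# Web Mercator stretches lengths by 1 / cos(latitude); areas by its square.
lat_deg <- 52.2319                      # assumed representative latitude (PKiN)
scale_factor <- 1 / cos(lat_deg * pi / 180)

# Figures copied from the summary() output above
n_points      <- 476
mercator_area <- 1242.29                # "square km" in EPSG:3857 units

# Reported intensity, reproduced from the raw numbers
mercator_intensity <- n_points / mercator_area

# Approximate ground equivalents (treating the factor as constant city-wide)
ground_area      <- mercator_area / scale_factor^2
ground_intensity <- n_points / ground_area

round(c(scale_factor = scale_factor,
        mercator_intensity = mercator_intensity,
        ground_area = ground_area,
        ground_intensity = ground_intensity), 3)
```

Re-running the pipeline in a metric national grid such as EPSG:2180 (ETRS89 / Poland CS92) would avoid the distortion, at the cost of changing every distance-based figure reported here.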
First-order properties describe the intensity (average density) of events across the study area.
plot_data <- banking_warsaw |>
  mutate(x = st_coordinates(geometry)[, 1],
         y = st_coordinates(geometry)[, 2])
ggplot() +
  geom_sf(data = warsaw_border, fill = "#2b2b2b", color = "#444444", linewidth = 0.5) +
  geom_point(data = plot_data, aes(x = x, y = y, color = type),
             size = 1.5, alpha = 0.8) +
  scale_color_manual(values = c("atm" = "#00FFCC", "bank" = "#FF3366")) +
  theme_fivethirtyeight() +
  theme(
    panel.background = element_rect(fill = "#1e1e1e"),
    plot.background = element_rect(fill = "#1e1e1e"),
    legend.background = element_rect(fill = "#1e1e1e"),
    legend.text = element_text(color = "white"),
    legend.title = element_blank(),
    text = element_text(color = "white"),
    axis.text = element_blank(),
    axis.title = element_blank(),
    panel.grid = element_blank()
  ) +
  labs(title = "Banking Infrastructure in Warsaw",
       subtitle = "ATMs (Cyan) vs Physical Bank Branches (Pink)")
Kernel density estimation (KDE) reveals spatial hotspots of banking infrastructure. We use the Diggle bandwidth selector (bw.diggle), which chooses the bandwidth by cross-validation to minimise a mean-square error criterion for the density estimate.
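To make the estimator concrete, here is a minimal base-R sketch of a fixed-bandwidth Gaussian kernel intensity estimate at a single location. It omits the edge correction that spatstat's density() applies by default, and the coordinates and bandwidth are made-up illustrative values, not the Warsaw data.

```r
# Uncorrected Gaussian kernel intensity estimate at location (u_x, u_y):
#   lambda_hat(u) = sum_i k_sigma(u - x_i), k_sigma an isotropic 2-D Gaussian density
kernel_intensity <- function(u_x, u_y, pts_x, pts_y, sigma) {
  sum(dnorm(u_x - pts_x, sd = sigma) * dnorm(u_y - pts_y, sd = sigma))
}

# Hypothetical toy pattern (km): three points near the origin, one far away
pts_x <- c(0.0, 0.2, -0.1, 5.0)
pts_y <- c(0.0, 0.1,  0.3, 5.0)

near <- kernel_intensity(0, 0, pts_x, pts_y, sigma = 0.5)  # inside the cluster
far  <- kernel_intensity(5, 5, pts_x, pts_y, sigma = 0.5)  # at the isolated point
c(near = near, far = far)
```

A larger sigma spreads each point's contribution further and flattens the surface; a smaller one sharpens the peaks, which is the trade-off the bandwidth selector resolves automatically.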
split_pattern <- split(pattern)
d_atm <- density(split_pattern$atm, sigma = bw.diggle)
d_bank <- density(split_pattern$bank, sigma = bw.diggle)
par(mfrow = c(1, 2), mar = c(1, 1, 3, 1))
plot(d_atm, main = "KDE: ATMs", col = inferno(256))
plot(d_bank, main = "KDE: Banks", col = inferno(256))
Observation: Bank branches form a sharp, concentrated peak in the city centre. ATMs display multiple high-density clusters spread across all Warsaw districts.
The relative risk surface maps the spatially varying probability that a facility at a given location is an ATM rather than a bank branch.
# For a two-type pattern relrisk() treats the second mark level ("bank") as the
# case by default, so the ATM probability is requested explicitly.
probs_rr <- relrisk(pattern, relative = FALSE, case = "atm", control = "bank")
par(mar = c(1, 1, 3, 1))
plot(probs_rr,
     main = "Relative Risk: Probability of Encountering an ATM vs Bank",
     col = viridis(256))
contour(probs_rr, add = TRUE, col = adjustcolor("white", alpha.f = 0.5))
Economic Interpretation: Dark areas indicate Bank-dominant zones (CBD). Bright yellow regions are ATM-dominant residential districts.
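As a sanity check on what this quantity measures: the probability that a facility at a location is an ATM is the ATM share of local intensity, lambda_atm / (lambda_atm + lambda_bank). The sketch below applies that formula to the city-wide intensities from the summary() output, which gives the constant value the surface would take if there were no segregation at all. The local intensities in the last two calls are invented purely for illustration.

```r
# Probability that a facility at a location is an ATM, given local intensities
p_atm <- function(lambda_atm, lambda_bank) lambda_atm / (lambda_atm + lambda_bank)

# City-wide intensities (points per square km) from summary(pattern)
baseline <- p_atm(0.2672, 0.1159)

# Hypothetical local intensities, NOT estimated from the data
cbd_like <- p_atm(lambda_atm = 2.0, lambda_bank = 2.0)   # equal mix
suburb   <- p_atm(lambda_atm = 0.3, lambda_bank = 0.03)  # ATM-dominant

round(c(baseline = baseline, cbd_like = cbd_like, suburb = suburb), 4)
```

The baseline reproduces the ATM proportion in the summary (about 0.70), so departures of the mapped surface from that value are what indicate local dominance of one type.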
Second-order properties describe spatial dependence — whether events cluster, repel, or are independent of each other.
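We formally test H₀: the relative proportions of ATMs and banks are the same everywhere in Warsaw (no spatial segregation), using spatstat's Monte Carlo segregation test with 39 simulations.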
## Computing observed value... Done.
## Computing 39 simulated values... Done.
##
## Monte Carlo test of spatial segregation of types
##
## data: pattern
## T = 4.3, p-value = 0.05
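Conclusion: The p-value of 0.05 sits exactly on the 5% threshold rather than below it, so H₀ is rejected only under the convention p ≤ 0.05. The test therefore gives borderline evidence of spatial segregation between ATMs and bank branches; more simulations would be needed for a firmer verdict.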
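The p-values a Monte Carlo test can return depend on the number of simulations. Assuming the usual rank-based definition, p = (1 + number of simulated statistics at least as large as the observed one) / (1 + nsim), the sketch below lists the values attainable with 39 simulations and works out what p = 0.05 implies. The function name is illustrative.

```r
# Rank-based Monte Carlo p-value (assumed definition, see the lead-in)
mc_p_value <- function(n_exceed, nsim) (1 + n_exceed) / (1 + nsim)

nsim <- 39
attainable <- mc_p_value(0:nsim, nsim)   # 0.025, 0.050, 0.075, ..., 1

# Number of simulated statistics >= observed that yields the reported p = 0.05
n_exceed_reported <- round(0.05 * (1 + nsim)) - 1

head(attainable, 3)
n_exceed_reported
```

Under this definition exactly one of the 39 simulated statistics matched or exceeded T = 4.3, and the smallest p-value this design can return is 0.025.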
The G-function (Gest) gives the
cumulative distribution of nearest-neighbour distances. If the observed
curve lies above the theoretical Poisson (CSR) line,
events are more clustered than a random process would produce.
G_atm <- Gest(split_pattern$atm)
G_bank <- Gest(split_pattern$bank)
par(mfrow = c(1, 2))
plot(G_atm, main = "G-function: ATMs", legend = FALSE)
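plot(G_bank, main = "G-function: Banks", legend = FALSE)
Conclusion: Both curves lie above the Poisson baseline, indicating clustering relative to CSR. The Bank G-function rises more steeply at short distances, which is consistent with tighter grouping in the CBD. Because Gest assumes a homogeneous intensity, part of this excess may reflect Warsaw's uneven background density rather than interaction between facilities.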
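For reference, the CSR benchmark in these plots has the closed form G(r) = 1 - exp(-lambda * pi * r^2), which depends on the intensity lambda. A short base-R sketch using the per-type intensities from the summary() output shows that the two benchmark curves differ, so each observed curve is best judged against its own baseline rather than against the other type's curve.

```r
# Nearest-neighbour distance CDF under complete spatial randomness (CSR)
G_csr <- function(r, lambda) 1 - exp(-lambda * pi * r^2)

r <- c(0.5, 1, 2)                 # distances in the pattern's km units
lambda_atm  <- 0.2672             # from summary(pattern)
lambda_bank <- 0.1159

data.frame(r = r,
           G_atm  = round(G_csr(r, lambda_atm), 3),
           G_bank = round(G_csr(r, lambda_bank), 3))
```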
The inhomogeneous L-function (Linhom) tests for clustering while correcting for Warsaw’s spatially varying background density. It applies the variance-stabilising transformation L(r) = sqrt(K(r)/π) to the inhomogeneous version of Ripley’s K-function.
L_atm <- Linhom(split_pattern$atm)
L_bank <- Linhom(split_pattern$bank)
par(mfrow = c(1, 2))
plot(L_atm, main = "L-function (inhom): ATMs", legend = FALSE)
plot(L_bank, main = "L-function (inhom): Banks", legend = FALSE)
Conclusion: Both infrastructure types show clustering beyond what the local density gradient alone predicts, although no simulation envelopes were computed, so the statistical significance of this excess has not been assessed. Bank branches exhibit stronger short-range clustering, consistent with agglomeration in commercial districts.
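The transformation behind these plots is simple: L(r) = sqrt(K(r) / pi). For a Poisson process K(r) = pi * r^2, so L(r) = r and the benchmark becomes a straight line, with values above it indicating clustering. A base-R sketch follows; the "clustered" K values are invented for illustration and are not taken from the Warsaw data.

```r
# Besag's L-transformation of a K-function
L_from_K <- function(K) sqrt(K / pi)

r <- c(0.5, 1, 2)
K_poisson   <- pi * r^2            # theoretical K under a Poisson process
K_clustered <- 1.5 * pi * r^2      # hypothetical K with 50% more close pairs

cbind(r,
      L_poisson   = L_from_K(K_poisson),
      L_clustered = round(L_from_K(K_clustered), 3))
```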
We test the central economic hypothesis: Are Bank Branches more strongly attracted to the city centre than ATMs?
The Palace of Culture and Science (PKiN) serves as a proxy for Warsaw’s CBD — the most prominent commercial and transport hub in the city.
# ── 1. City-centre point (PKiN) ───────────────────────────────────────────────
center_sf <- st_as_sf(
  data.frame(lon = 21.0068, lat = 52.2319),
  coords = c("lon", "lat"), crs = 4326
) |> st_transform(crs = 3857)
# Create ppp in the original meters window, then rescale to km
center_ppp <- ppp(
  x = st_coordinates(center_sf)[, 1],
  y = st_coordinates(center_sf)[, 2],
  window = W
)
center_ppp <- rescale.ppp(center_ppp, 1000, "km")
# ── 2. Distance-to-centre spatial covariate ───────────────────────────────────
# Distances are measured at the data points and then kernel-smoothed into a
# pixel image, so dist_map approximates the true distance surface.
dist_raw <- crossdist(pattern, center_ppp)
pattern_dist <- ppp(x = pattern$x, y = pattern$y,
                    window = pattern$window, marks = dist_raw[, 1])
dist_map <- Smooth.ppp(pattern_dist)
par(mar = c(1, 1, 3, 1))
plot(dist_map, main = "Distance to Warsaw CBD: Palace of Culture (km)",
     col = magma(256))
# ── 3. Fit models ─────────────────────────────────────────────────────────────
m1 <- ppm(pattern ~ marks) # baseline: type only
m2 <- ppm(pattern ~ marks * dist_map) # interaction: type × distance to CBD
cat("\n--- Model 2 Coefficients (marks × distance) ---\n")
##
## --- Model 2 Coefficients (marks × distance) ---
## Estimate S.E. CI95.lo CI95.hi Ztest Zval
## (Intercept) 0.837332 0.10660 0.62840 1.04626 *** 7.8551
## marksbank -0.819948 0.19355 -1.19930 -0.44060 *** -4.2364
## dist_map -0.201635 0.01146 -0.22410 -0.17917 *** -17.5948
## marksbank:dist_map -0.001933 0.02088 -0.04285 0.03899 -0.0926
# ── 4. Likelihood-Ratio Test ──────────────────────────────────────────────────
cat("\n--- ANOVA Likelihood-Ratio Test: m1 vs m2 ---\n")
##
## --- ANOVA Likelihood-Ratio Test: m1 vs m2 ---
## Analysis of Deviance Table
##
## Model 1: ~marks Poisson
## Model 2: ~marks * dist_map Poisson
## Npar Df Deviance Pr(>Chi)
## 1 2
## 2 4 2 524 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
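The two tables can be unpacked with a little arithmetic. The sketch below takes the printed estimates and standard error as given (they are copied from the output above, not re-estimated) and derives the implied per-km change in intensity for each type, the Wald test for the interaction, and the p-value for a deviance of 524 on 2 degrees of freedom.

```r
# Estimates and standard error copied from the Model 2 coefficient table
b_dist      <- -0.201635   # distance slope for ATMs (reference level)
b_interact  <- -0.001933   # additional slope for banks
se_interact <-  0.02088

# Multiplicative change in intensity per additional km from the centre
decay_atm  <- exp(b_dist)
decay_bank <- exp(b_dist + b_interact)

# Wald test: does the bank slope differ from the ATM slope?
z_interact <- b_interact / se_interact
p_interact <- 2 * pnorm(-abs(z_interact))

# Likelihood-ratio test of m1 vs m2: deviance 524 on 2 degrees of freedom
p_lrt <- pchisq(524, df = 2, lower.tail = FALSE)

round(c(decay_atm = decay_atm, decay_bank = decay_bank,
        z_interact = z_interact, p_interact = p_interact), 4)
p_lrt
```

Both types lose roughly 18% of their intensity per km, and the interaction is far from significant, so the large deviance reflects the shared distance effect rather than a difference between banks and ATMs.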
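Final Conclusion: The highly significant likelihood-ratio test confirms that distance to the CBD is a strong predictor of banking infrastructure location: the dist_map coefficient of -0.20 implies that intensity falls by roughly 18% per km from the centre. However, the interaction term marksbank:dist_map is close to zero (-0.002, Z = -0.09) and not significant, so the model gives no evidence that bank branch intensity drops off more sharply with distance than ATM intensity. The significance of the m1 vs m2 comparison is driven by the distance effect shared by both types. The hypothesis that branches are more strongly attracted to the centre than ATMs is therefore not supported by this model. The table below summarises the findings from each method.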
| Method | Category | Key Finding |
|---|---|---|
| KDE | First-order | Banks peak in CBD; ATMs spread city-wide |
| Relative Risk | First-order | CBD is Bank-dominant; suburbs are ATM-dominant |
| Segregation Test | Second-order | Borderline evidence of spatial segregation (p = 0.05, 39 simulations) |
| G-function | Second-order | Both cluster; Banks cluster more tightly |
| Inhomogeneous L-function | Second-order | Clustering exceeds background density for both types |
| PPM + ANOVA | Modelling | Distance to CBD is highly significant (p < 0.001); type × distance interaction is not significant |
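The combined evidence gives partial support to the hypothesis that physical bank branch rationalisation in Warsaw follows an economic logic. The exploratory maps suggest that branches concentrate where commercial returns are highest (CBD), while ATMs extend retail banking coverage into lower-cost residential locations. The segregation test is borderline, however, and the fitted point process model finds a strong distance-to-centre gradient shared by both facility types rather than a steeper one for branches. Whether this mirrors patterns in other European capital cities undergoing digital banking transitions would require comparable data.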