The aim of this project is to analyse the spatial distribution of public transport stops in central Warsaw. The study focuses on the relationship between stop density, spatial accessibility and the simplification of stop data through 200-metre same-mode clustering.
The analysis is based on two main ideas from the literature. First, GIS (Geographic Information System) is considered an appropriate tool for studying public transport accessibility because each stop has a precise geographic location and can be analysed spatially. Florczak (2013) emphasises that GIS is not only a cartographic tool, but also a spatial database that enables buffer and network-based accessibility analysis. Second, stop spacing is important because very dense stop networks may improve walking access, but can reduce operational performance due to frequent stopping, acceleration and deceleration. Nuworsoo (2011) describes this as a trade-off between accessibility and transport performance.
The main research question is:
Does the spatial distribution of public transport stops in central Warsaw indicate high stop density and overlapping catchment areas, and can a 200-metre same-mode clustering approach reduce data complexity while preserving the main spatial structure of public transport accessibility?
Additional sub-questions are:
The dataset contains representative public transport stop locations in central Warsaw. It was created from the original GeoJSON stop dataset by selecting a 7 km study area around the centre of Warsaw and applying a 200-metre same-mode clustering procedure. This means that stops of the same transport mode located within a local 200-metre radius were represented by one point, whose coordinates were calculated as the average location of the original points.
This preprocessing step was used to reduce data complexity while preserving the general spatial structure of the public transport network. The approach is not intended to replace operational stop-level data, but to create a simplified dataset suitable for exploratory spatial accessibility analysis.
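The report does not state which algorithm produced the 200-metre clusters, so the following is only a minimal sketch of one possible reading: complete-linkage hierarchical clustering within each mode, cut at 200 m, followed by averaging the coordinates. The helper name `cluster_same_mode`, the column names `x` and `y`, and the toy coordinates are assumptions made for illustration; only `mode` and `merged_count` appear in the report.

```r
# Hypothetical helper (not the procedure used in the report): group points of
# the same mode so that no two members of a group are more than `threshold_m`
# apart, then replace each group by the mean of its coordinates.
cluster_same_mode <- function(df, threshold_m = 200) {
  pieces <- lapply(split(df, df$mode), function(sub) {
    if (nrow(sub) == 1) {
      cluster_id <- 1L
    } else {
      tree <- hclust(dist(sub[, c("x", "y")]), method = "complete")
      cluster_id <- cutree(tree, h = threshold_m)
    }
    data.frame(
      mode = sub$mode[1],
      x = as.numeric(tapply(sub$x, cluster_id, mean)),
      y = as.numeric(tapply(sub$y, cluster_id, mean)),
      merged_count = as.integer(tapply(sub$x, cluster_id, length))
    )
  })
  do.call(rbind, unname(pieces))
}

# Toy input in projected metres (made-up coordinates, not the Warsaw data)
toy <- data.frame(
  mode = c("bus", "bus", "bus", "tram", "tram"),
  x = c(0, 100, 1000, 50, 60),
  y = c(0, 0, 0, 0, 0)
)

representative <- cluster_same_mode(toy)
print(representative)
```

Complete linkage bounds the largest distance inside a cluster, whereas single linkage would allow long chains of stops to merge; which of the two is closer to the original preprocessing is not known from the text.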
library(sf)
## Linking to GEOS 3.13.0, GDAL 3.10.1, PROJ 9.5.1; sf_use_s2() is TRUE
library(ggplot2)
library(ggspatial)
library(dplyr)
library(tidyr)
library(stringr)
library(spatstat.geom)
library(spatstat.explore)
library(spatstat.model)
library(spatstat.random)
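# `stops` is the clustered sf point layer described above (loading step not shown)
stops_2180 <- st_transform(stops, 2180) # Poland CS92 (EPSG:2180), projected, units in metres
stops_4326 <- st_transform(stops, 4326) # WGS 84 longitude / latitude
stops_3857 <- st_transform(stops, 3857) # Web Mercator, used by online tile maps (display only)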
# marking centre of Warsaw
warsaw_centre_4326 <- st_sfc(st_point(c(21.0122, 52.2297)), crs = 4326)
warsaw_centre_2180 <- st_transform(warsaw_centre_4326, 2180)
# marking a 7km radius around the centre
study_area_2180 <- st_buffer(warsaw_centre_2180, 7000)
study_area_3857 <- st_transform(study_area_2180, 3857)
# transport mode table based on representative points
mode_summary <- stops |>
st_drop_geometry() |>
count(mode, name = "representative_points") |>
mutate(
share_percent = round(100 * representative_points / sum(representative_points), 1)
) |>
arrange(desc(representative_points))
print(mode_summary)
## mode representative_points share_percent
## 1 bus 742 74.7
## 2 tram 216 21.8
## 3 subway 28 2.8
## 4 rail_station 7 0.7
The mode summary shows the composition of the representative stop dataset. Bus stops account for 74.7% of the 993 representative points, which is expected because bus networks usually provide the most local and fine-grained coverage. Tram, metro (coded as subway in the data) and rail stops are less numerous, but they represent higher-capacity modes that often serve larger catchment areas.
# comparing original n of stops to representative points
cluster_summary <- stops |>
st_drop_geometry() |>
group_by(mode) |>
summarise(
representative_points = n(),
original_points_represented = sum(merged_count, na.rm = TRUE),
mean_merged_count = round(mean(merged_count, na.rm = TRUE), 2),
median_merged_count = median(merged_count, na.rm = TRUE),
max_merged_count = max(merged_count, na.rm = TRUE),
.groups = "drop"
) |>
arrange(desc(original_points_represented))
print(cluster_summary)
## # A tibble: 4 × 6
## mode representative_points original_points_represe…¹ mean_merged_count
## <chr> <int> <int> <dbl>
## 1 bus 742 1925 2.59
## 2 tram 216 893 4.13
## 3 subway 28 28 1
## 4 rail_station 7 7 1
## # ℹ abbreviated name: ¹original_points_represented
## # ℹ 2 more variables: median_merged_count <dbl>, max_merged_count <int>
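The cluster summary shows how strongly the original dataset was reduced by the 200-metre same-mode clustering procedure. The variable original_points_represented indicates how many original stops are represented by the simplified dataset, while mean_merged_count and max_merged_count show whether some representative points aggregate several nearby original stops. In the table, 1,925 original bus points are represented by 742 points (mean 2.59 per cluster) and 893 tram points by 216 (mean 4.13), whereas subway and rail stations have a mean of 1, so no merging took place for these modes. High values suggest local stop density and possible overlap between stop catchment areas.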
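The size of the reduction can be checked directly from the printed counts. The short calculation below copies the figures from the cluster summary and assumes that original_points_represented equals the number of original stops per mode; the object names are introduced here for illustration only.

```r
# Counts copied from the cluster summary printed in the report
cluster_counts <- data.frame(
  mode = c("bus", "tram", "subway", "rail_station"),
  representative_points = c(742, 216, 28, 7),
  original_points_represented = c(1925, 893, 28, 7)
)

# Share of original points removed by clustering, per mode
cluster_counts$reduction_percent <- round(
  100 * (1 - cluster_counts$representative_points /
           cluster_counts$original_points_represented),
  1
)

# The same ratio for the whole dataset
overall_reduction_percent <- round(
  100 * (1 - sum(cluster_counts$representative_points) /
           sum(cluster_counts$original_points_represented)),
  1
)

print(cluster_counts)
print(overall_reduction_percent)
```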
base_tile <- tryCatch(
ggspatial::annotation_map_tile(type = "osm", zoomin = 0, progress = "none"),
error = function(e) NULL
)
ggplot() +
base_tile +
geom_sf(data = stops_3857, aes(color = mode), size = 1.7, alpha = 0.85) +
coord_sf(
xlim = st_bbox(study_area_3857)[c("xmin", "xmax")],
ylim = st_bbox(study_area_3857)[c("ymin", "ymax")],
expand = FALSE
) +
theme_minimal() +
labs(
title = "Public transport stops in central Warsaw",
subtitle = "Representative points after 200 m same-mode clustering",
color = "Transport mode"
)
The map shows the spatial distribution of representative public
transport stops after the 200-metre same-mode clustering procedure. The
purpose of this map is to assess whether stops are evenly distributed
across the study area or concentrated in selected transport corridors
and central zones.
ggplot() +
base_tile +
geom_sf(data = stops_3857, aes(size = merged_count, color = mode), alpha = 0.75) +
coord_sf(
xlim = st_bbox(study_area_3857)[c("xmin", "xmax")],
ylim = st_bbox(study_area_3857)[c("ymin", "ymax")],
expand = FALSE
) +
theme_minimal() +
labs(
title = "Merged stop clusters in central Warsaw",
subtitle = "Larger symbols represent clusters containing more original stop points",
color = "Transport mode",
size = "Original points\nmerged"
)
Larger symbols indicate locations where more original stop points were
merged into one representative point. These areas can be interpreted as
places of high local stop density. This supports the idea that some
parts of the network contain many closely spaced stops, which is
directly related to the stop spacing problem discussed by Nuworsoo
(2011).
# calculating distance from the city centre
stops_2180$distance_from_centre_km <- as.numeric(st_distance(stops_2180, warsaw_centre_2180)) / 1000
centre_distance_summary <- stops_2180 |>
st_drop_geometry() |>
group_by(mode) |>
summarise(
n = n(),
median_km = round(median(distance_from_centre_km, na.rm = TRUE), 2),
mean_km = round(mean(distance_from_centre_km, na.rm = TRUE), 2),
q1_km = round(quantile(distance_from_centre_km, 0.25, na.rm = TRUE), 2),
q3_km = round(quantile(distance_from_centre_km, 0.75, na.rm = TRUE), 2),
.groups = "drop"
) |>
arrange(median_km)
print(centre_distance_summary)
## # A tibble: 4 × 6
## mode n median_km mean_km q1_km q3_km
## <chr> <int> <dbl> <dbl> <dbl> <dbl>
## 1 rail_station 7 3.38 2.49 1.42 3.48
## 2 subway 28 3.55 3.71 1.97 5.39
## 3 tram 216 3.94 4.01 2.87 5.39
## 4 bus 742 4.63 4.41 3.15 5.89
ggplot(st_drop_geometry(stops_2180), aes(x = mode, y = distance_from_centre_km, fill = mode)) +
geom_boxplot(alpha = 0.7, show.legend = FALSE) +
theme_minimal() +
labs(
title = "Distance of representative stops from Warsaw centre",
subtitle = "Comparison by transport mode",
x = "Transport mode",
y = "Distance from centre [km]"
)
The distance-from-centre analysis shows how different transport modes are distributed within the central Warsaw study area. The boxplot compares the distance of representative stops from the city centre by transport mode. Lower median values indicate modes that are more centrally concentrated, while wider distributions suggest modes that are more dispersed across the study area. In the summary table, rail stations and subway stops have the lowest medians (3.38 km and 3.55 km) and bus stops the highest (4.63 km), although the rail and subway figures rest on only 7 and 28 points. This provides additional context for interpreting the spatial structure of the public transport network.
# determining accessibility range
catchment_distances <- stops_2180 |>
mutate(
catchment_300m = 300,
catchment_mode_based = dplyr::case_when(
mode == "bus" ~ 300,
mode == "tram" ~ 800,
mode == "subway" ~ 800,
mode == "rail_station" ~ 800,
TRUE ~ 300
)
)
# Buffer each stop by its catchment radius, clip the buffers to the study area,
# and compare the summed buffer area with the area of the buffer union.
calculate_catchment_stats <- function(sf_points, distance_column, scenario_name) {
  buffers <- st_buffer(sf_points, dist = sf_points[[distance_column]])
  buffers <- st_intersection(buffers, study_area_2180)
  # summed area counts overlapping parts once per buffer
  sum_buffer_area <- sum(as.numeric(st_area(buffers)))
  # union area counts every covered location exactly once
  union_buffer <- st_union(buffers)
  union_area <- as.numeric(st_area(union_buffer))
  study_area <- as.numeric(st_area(study_area_2180))
  data.frame(
    scenario = scenario_name,
    number_of_stops = nrow(sf_points),
    summed_buffer_area_km2 = round(sum_buffer_area / 1e6, 2),
    union_buffer_area_km2 = round(union_area / 1e6, 2),
    study_area_coverage_percent = round(100 * union_area / study_area, 1),
    # share of the summed buffer area that is redundant because buffers overlap
    overlap_percent = round(100 * (1 - union_area / sum_buffer_area), 1)
  )
}
catchment_stats <- bind_rows(
calculate_catchment_stats(catchment_distances, "catchment_300m", "All modes: 300 m"),
calculate_catchment_stats(catchment_distances, "catchment_mode_based", "Mode-based: bus 300 m, tram/metro/rail 800 m")
)
print(catchment_stats)
## scenario number_of_stops
## 1 All modes: 300 m 993
## 2 Mode-based: bus 300 m, tram/metro/rail 800 m 993
## summed_buffer_area_km2 union_buffer_area_km2 study_area_coverage_percent
## 1 275.10 119.62 77.7
## 2 692.45 137.42 89.3
## overlap_percent
## 1 56.5
## 2 80.2
buffers_300 <- st_buffer(stops_2180, 300)
buffers_300_union <- st_sf(scenario = "300 m catchment", geometry = st_union(buffers_300)) |>
st_intersection(study_area_2180) |>
st_transform(3857)
ggplot() +
base_tile +
geom_sf(data = buffers_300_union, fill = "steelblue", alpha = 0.25, color = NA) +
geom_sf(data = stops_3857, aes(color = mode), size = 0.9, alpha = 0.85) +
coord_sf(
xlim = st_bbox(study_area_3857)[c("xmin", "xmax")],
ylim = st_bbox(study_area_3857)[c("ymin", "ymax")],
expand = FALSE
) +
theme_minimal() +
labs(
title = "Approximate 300 m catchment areas",
subtitle = "Euclidean buffer approximation clipped to the 7 km central study area",
color = "Transport mode"
)
‘Generally, the most common values in various studies are 300 and
400 meters for bus stops and 800 meters for tram or metro stops.’ -
Florczak (2013)
The first scenario applies a uniform 300-metre catchment to all stops and represents a conservative estimate of local stop accessibility. The second scenario applies mode-specific catchments: 300 metres for bus stops and 800 metres for tram, metro and rail stops. This scenario reflects the assumption that passengers may be willing to walk further to faster or higher-capacity transport modes. Overlap is measured here as the share of the summed buffer area that is redundant, that is, one minus the ratio of the union area to the summed area. The comparison shows that mode-based assumptions increase spatial coverage from 77.7% to 89.3% of the study area, but also increase catchment overlap from 56.5% to 80.2%. This suggests that most of central Warsaw lies within straight-line walking range of a stop, but also that stop catchment areas duplicate one another substantially.
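The coverage and overlap figures can be reproduced from the printed areas alone. The sketch below copies the two area columns from the output above and assumes that the study area is well approximated by an ideal circle of radius 7 km; the variable names are illustrative. The ratio of summed to union area is added as an extra reading aid: it is the average number of buffers covering a location that is covered at all.

```r
# Areas in km2 copied from the catchment statistics printed in the report
summed_buffer_area_km2 <- c(uniform_300m = 275.10, mode_based = 692.45)
union_buffer_area_km2  <- c(uniform_300m = 119.62, mode_based = 137.42)

# Assumption: the 7 km buffer is treated as an ideal circle
study_area_km2 <- pi * 7^2

# Share of the study area lying inside at least one buffer
coverage_percent <- round(100 * union_buffer_area_km2 / study_area_km2, 1)

# Share of the summed buffer area that is redundant because buffers overlap
overlap_percent <- round(
  100 * (1 - union_buffer_area_km2 / summed_buffer_area_km2), 1
)

# Average number of buffers stacked over each covered location
mean_buffers_per_covered_location <- summed_buffer_area_km2 / union_buffer_area_km2

print(coverage_percent)
print(overlap_percent)
print(round(mean_buffers_per_covered_location, 2))
```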
mode_based_buffers <- st_buffer(
catchment_distances,
dist = catchment_distances$catchment_mode_based
)
mode_based_union <- st_sf(
scenario = "Mode-based catchment",
geometry = st_union(mode_based_buffers)
) |>
st_intersection(study_area_2180) |>
st_transform(3857)
ggplot() +
base_tile +
geom_sf(data = mode_based_union, fill = "darkgreen", alpha = 0.25, color = NA) +
geom_sf(data = stops_3857, aes(color = mode), size = 0.9, alpha = 0.85) +
coord_sf(
xlim = st_bbox(study_area_3857)[c("xmin", "xmax")],
ylim = st_bbox(study_area_3857)[c("ymin", "ymax")],
expand = FALSE
) +
theme_minimal() +
labs(
title = "Mode-based catchment areas",
subtitle = "Bus: 300 m; tram, metro and rail: 800 m",
color = "Transport mode"
)
The mode-based catchment map applies different accessibility thresholds depending on the type of public transport. Bus stops are assigned a 300-metre catchment, while tram, metro and rail stops are assigned an 800-metre catchment. This reflects the idea that passengers may be willing to walk further to higher-capacity or faster transport modes than to local bus stops.
The catchment analysis in this report uses Euclidean buffers. This means that distance is measured as a straight-line radius around each stop. This approach is simple and useful for exploratory spatial analysis, but it does not fully represent real pedestrian accessibility. As noted by Florczak (2013), buffer-based accessibility may differ significantly from network-based accessibility, because pedestrians move along streets and paths rather than in straight lines. Therefore, the results should be interpreted as an approximation of spatial accessibility, not as a precise walking-distance model.
nearest_neighbour_by_mode <- function(sf_points) {
modes <- unique(as.character(sf_points$mode))
out <- list()
for (m in modes) {
sub <- sf_points[sf_points$mode == m, ]
if (nrow(sub) < 2) {
out[[m]] <- data.frame(mode = m, nn_distance_m = NA_real_)
} else {
d <- as.matrix(st_distance(sub))
diag(d) <- Inf
nn <- apply(d, 1, min, na.rm = TRUE)
out[[m]] <- data.frame(mode = m, nn_distance_m = as.numeric(nn))
}
}
bind_rows(out)
}
nn_distances <- nearest_neighbour_by_mode(stops_2180)
nn_summary <- nn_distances |>
group_by(mode) |>
summarise(
n = sum(!is.na(nn_distance_m)),
median_nn_m = round(median(nn_distance_m, na.rm = TRUE), 1),
mean_nn_m = round(mean(nn_distance_m, na.rm = TRUE), 1),
pct_below_200m = round(mean(nn_distance_m < 200, na.rm = TRUE) * 100, 1),
pct_below_300m = round(mean(nn_distance_m < 300, na.rm = TRUE) * 100, 1),
pct_below_400m = round(mean(nn_distance_m < 400, na.rm = TRUE) * 100, 1),
.groups = "drop"
) |>
arrange(median_nn_m)
print(nn_summary)
## # A tibble: 4 × 7
## mode n median_nn_m mean_nn_m pct_below_200m pct_below_300m pct_below_400m
## <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 bus 742 269. 273. 22.1 65.8 91.9
## 2 tram 216 339. 350. 6.5 32.9 74.1
## 3 subw… 28 989. 921. 0 0 0
## 4 rail… 7 1053. 1194. 0 28.6 28.6
ggplot(nn_distances, aes(x = mode, y = nn_distance_m, fill = mode)) +
geom_boxplot(alpha = 0.7, show.legend = FALSE) +
geom_hline(yintercept = 200, linetype = "dashed") +
geom_hline(yintercept = 300, linetype = "dotted") +
geom_hline(yintercept = 400, linetype = "dotdash") +
coord_cartesian(ylim = c(0, 1200)) +
theme_minimal() +
labs(
title = "Nearest-neighbour distances by transport mode",
subtitle = "Reference lines: 200 m (dashed), 300 m (dotted), 400 m (dot-dash)",
x = "Transport mode",
y = "Distance to nearest same-mode representative point [m]"
)
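The nearest-neighbour distance plot shows how far each representative stop is from the closest stop of the same transport mode. According to the summary table, 22.1% of bus points and 6.5% of tram points still have a same-mode neighbour within 200 metres, so some close same-mode stops remain after clustering. The median distances of 269 metres for bus and 339 metres for tram are consistent with the 300 to 400 metre accessibility thresholds commonly used for dense urban areas, while subway and rail stations are spaced roughly one kilometre apart.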
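coords <- st_coordinates(stops_2180)
# observation window: bounding rectangle of the points
# (the study area itself is a 7 km circle)
window_rect <- owin(
xrange = range(coords[, 1]),
yrange = range(coords[, 2])
)
# point pattern with transport mode as marks
stops_ppp <- ppp(
x = coords[, 1],
y = coords[, 2],
window = window_rect,
marks = factor(stops_2180$mode),
check = FALSE
)
# kernel density of all stops; sigma is the kernel bandwidth in metres
density_all <- density(
unmark(stops_ppp),
sigma = 500
)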
density_df <- as.data.frame(density_all)
names(density_df)[3] <- "density"
points_df <- data.frame(
x = coords[, 1],
y = coords[, 2],
mode = stops_2180$mode
)
ggplot() +
geom_raster(
data = density_df,
aes(x = x, y = y, fill = density)
) +
geom_point(
data = points_df,
aes(x = x, y = y, color = mode),
size = 0.5,
alpha = 0.7
) +
coord_equal() +
theme_minimal() +
labs(
title = "Kernel density of representative public transport stops",
subtitle = "Density surface with representative stops by transport mode",
fill = "Density",
color = "Transport mode"
)
The kernel density map identifies spatial hotspots of public transport
stop concentration. Higher density values indicate areas where stops are
more spatially concentrated. In the context of accessibility, these
areas may represent better-served parts of central Warsaw, but they may
also indicate overlapping service areas.
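To make the meaning of the density values more concrete, the following base R sketch evaluates an isotropic Gaussian kernel estimate at a single location. It is a simplified illustration rather than a reimplementation of the spatstat routine: it omits edge correction, and the helper name `kernel_intensity` and the toy coordinates are assumptions. The result is an intensity in points per square metre.

```r
# Hypothetical helper: Gaussian kernel intensity (points per square metre) at
# location (x0, y0), without any edge correction.
kernel_intensity <- function(x0, y0, px, py, sigma) {
  squared_dist <- (px - x0)^2 + (py - y0)^2
  sum(exp(-squared_dist / (2 * sigma^2))) / (2 * pi * sigma^2)
}

# Toy stops in projected metres (made-up coordinates, not the Warsaw data)
px <- c(0, 100, 200, 3000)
py <- c(0, 0, 100, 3000)

# Intensity close to the group of three stops and in an empty area
near_cluster <- kernel_intensity(100, 0, px, py, sigma = 500)
far_away     <- kernel_intensity(1500, 1500, px, py, sigma = 500)

# Convert to stops per square kilometre for readability
print(c(near_cluster = near_cluster, far_away = far_away) * 1e6)
```

A larger bandwidth smooths the surface and merges neighbouring hotspots, while a smaller one emphasises individual stops, so the 500 m value used in the report is a modelling choice rather than a property of the data.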
k_result <- Kest(unmark(stops_ppp))
plot(k_result, main = "K-function for public transport stops")
The K-function plot compares the observed spatial pattern of public transport stops with a theoretical Poisson pattern representing complete spatial randomness. The pois line shows the expected K-function under a random distribution. The iso, trans and border lines are different edge-corrected estimates of the observed K-function. If the observed lines are above the Poisson line at a given distance, the stops are more clustered than expected under randomness at that scale. If they are below it, the pattern is more regular or dispersed. Because the observation window is the bounding rectangle of the points rather than the circular study area, and all modes are pooled, the estimates should be read as indicative.
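The comparison with the Poisson line can be illustrated with a naive estimator written in base R. This is a sketch under simplifying assumptions: it applies no edge correction, so it does not correspond to any of the iso, trans or border estimates, and the function name `naive_k`, the toy coordinates and the 1 km by 1 km window are made up for the example.

```r
# Hypothetical helper: uncorrected Ripley's K estimate,
# K(r) = A / (n (n - 1)) * number of ordered pairs (i, j), i != j, with d_ij <= r
naive_k <- function(x, y, r, window_area) {
  n <- length(x)
  d <- as.matrix(dist(cbind(x, y)))
  diag(d) <- Inf  # exclude self-pairs
  sapply(r, function(radius) window_area * sum(d <= radius) / (n * (n - 1)))
}

# Toy pattern in metres: four points packed together and one isolated point
x <- c(100, 110, 100, 110, 900)
y <- c(100, 100, 110, 110, 900)

r <- c(20, 50, 100)
k_observed <- naive_k(x, y, r, window_area = 1000 * 1000)

# Expected K-function under complete spatial randomness
k_poisson <- pi * r^2

# TRUE where the toy pattern is more clustered than a random one
print(k_observed > k_poisson)
```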
The analysis shows that public transport stops in central Warsaw are not randomly distributed. They form visible concentrations and transport corridors, especially in the central part of the study area. The 200-metre same-mode clustering procedure reduced the complexity of the original dataset while preserving the main spatial structure of the network.
The results are consistent with the literature. Florczak (2013) shows that GIS-based methods are useful for analysing spatial accessibility to public transport stops, while Nuworsoo (2011) emphasises that stop spacing involves a trade-off between walking accessibility and operational efficiency. In this project, dense stop areas and overlapping catchment zones suggest that spatial generalisation is a reasonable method for preparing a simplified but still representative dataset.
However, the results should be treated as exploratory. The analysis uses Euclidean buffers rather than real pedestrian network distances. A more advanced version of the study should measure distances along the pedestrian network and take into account population data, employment density and service frequency.
Florczak, M. (2013). GIS jako narzędzie badania dostępności przestrzennej transportu zbiorowego. Transport Miejski i Regionalny, 5.
Nuworsoo, C. (2011). Guidelines for Transit Bus Stop Spacing: Improving Accessibility and Performance. California Polytechnic State University, San Luis Obispo.