Spatial Accessibility and Stop Density of Public Transport Stops in Central Warsaw

Introduction

The aim of this project is to analyse the spatial distribution of public transport stops in central Warsaw. The study focuses on the relationship between stop density, spatial accessibility and the simplification of stop data through 200-metre same-mode clustering.

The analysis is based on two main ideas from the literature. First, GIS (Geographic Information System) is considered an appropriate tool for studying public transport accessibility because each stop has a precise geographic location and can be analysed spatially. Florczak (2013) emphasises that GIS is not only a cartographic tool, but also a spatial database that enables both buffer-based and network-based accessibility analysis. Second, stop spacing matters because very dense stop networks may improve walking access but can reduce operational performance through frequent stopping, acceleration and deceleration. Nuworsoo (2011) describes this as a trade-off between accessibility and transport performance.

Research question

The main research question is:

Does the spatial distribution of public transport stops in central Warsaw indicate high stop density and overlapping catchment areas, and can a 200-metre same-mode clustering approach reduce data complexity while preserving the main spatial structure of public transport accessibility?

Additional sub-questions are:

Which transport modes dominate the representative stop dataset?
Are stops evenly distributed across the central study area or spatially clustered?
Do approximate catchment areas overlap strongly, suggesting dense local accessibility?
How does the 200-metre clustering procedure affect the interpretation of stop spacing?

Data and preprocessing

The dataset contains representative public transport stop locations in central Warsaw. It was derived from the original GeoJSON stop dataset by selecting stops within a 7 km radius of the centre of Warsaw and applying a 200-metre same-mode clustering procedure: stops of the same transport mode located within a local 200-metre radius were represented by a single point placed at the mean location of the original points.

This preprocessing step was used to reduce data complexity while preserving the general spatial structure of the public transport network. The approach is not intended to replace operational stop-level data, but to create a simplified dataset suitable for exploratory spatial accessibility analysis.
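The clustering step can be sketched as a simple greedy procedure. The function below, cluster_same_mode(), is a hypothetical illustration written for this report, not necessarily the exact procedure used to build the dataset (which may differ, for example, in the order in which seed points are chosen): each unassigned point becomes a seed, all unassigned points of the same mode within 200 metres of it are merged, and the representative point is placed at their mean coordinates, with the number of merged points stored as merged_count.

```r
# HYPOTHETICAL greedy same-mode clustering, for illustration only;
# the procedure actually used to build the dataset may differ.
cluster_same_mode <- function(x, y, mode, radius = 200) {
  n <- length(x)
  cluster_id <- rep(NA_integer_, n)
  next_id <- 0L
  for (i in seq_len(n)) {
    if (!is.na(cluster_id[i])) next  # already merged into an earlier cluster
    next_id <- next_id + 1L
    # unassigned points of the same mode within `radius` metres of seed i
    d <- sqrt((x - x[i])^2 + (y - y[i])^2)
    members <- which(is.na(cluster_id) & mode == mode[i] & d <= radius)
    cluster_id[members] <- next_id
  }
  # one representative per cluster, placed at the mean location of its members
  data.frame(
    mode = mode[match(seq_len(next_id), cluster_id)],
    x = as.numeric(tapply(x, cluster_id, mean)),
    y = as.numeric(tapply(y, cluster_id, mean)),
    merged_count = tabulate(cluster_id, nbins = next_id)
  )
}

# toy example, coordinates in metres: three bus points and one tram point
reps <- cluster_same_mode(
  x = c(0, 200, 210, 100),
  y = c(0, 0, 0, 0),
  mode = c("bus", "bus", "bus", "tram")
)
print(reps)
```

In the toy example the first two bus points (at 0 m and 200 m) are merged at x = 100 m, while the third bus point at 210 m forms its own cluster, so the two bus representatives end up only 110 m apart. The tram point is kept separately even though it coincides with a bus representative. This illustrates why same-mode representative points closer than 200 metres can still appear in the later nearest-neighbour analysis.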

library(sf)
## Linking to GEOS 3.13.0, GDAL 3.10.1, PROJ 9.5.1; sf_use_s2() is TRUE

library(ggplot2)
library(ggspatial)
library(dplyr)
library(tidyr)
library(stringr)
library(spatstat.geom)
library(spatstat.explore)
library(spatstat.model)
library(spatstat.random)

# `stops`: preprocessed representative stop points (sf object with `mode` and `merged_count` columns), loaded beforehand
stops_2180 <- st_transform(stops, 2180) # Polish metric CRS (EPSG:2180, PUWG 1992)
stops_4326 <- st_transform(stops, 4326) # WGS 84 longitude / latitude
stops_3857 <- st_transform(stops, 3857) # Web Mercator, used by online tile maps (not for distances)

# centre of Warsaw
warsaw_centre_4326 <- st_sfc(st_point(c(21.0122, 52.2297)), crs = 4326)
warsaw_centre_2180 <- st_transform(warsaw_centre_4326, 2180)

# study area: 7 km radius around the centre
study_area_2180 <- st_buffer(warsaw_centre_2180, 7000)
study_area_3857 <- st_transform(study_area_2180, 3857)

# transport mode table based on representative points
mode_summary <- stops |>
  st_drop_geometry() |>
  count(mode, name = "representative_points") |>
  mutate(
    share_percent = round(100 * representative_points / sum(representative_points), 1)
  ) |>
  arrange(desc(representative_points))

print(mode_summary)

##           mode representative_points share_percent
## 1          bus                   742          74.7
## 2         tram                   216          21.8
## 3       subway                    28           2.8
## 4 rail_station                     7           0.7

The mode summary shows the composition of the representative stop dataset. Bus points dominate (74.7%), followed by tram (21.8%), metro (2.8%) and rail stations (0.7%). A high share of bus stops is expected, because bus networks usually provide the most local, fine-grained coverage. Tram, metro and rail stops are less numerous, but they serve higher-capacity modes that often have larger catchment areas.

# compare the number of original stops with the number of representative points per mode
cluster_summary <- stops |>
  st_drop_geometry() |>
  group_by(mode) |>
  summarise(
    representative_points = n(),
    original_points_represented = sum(merged_count, na.rm = TRUE),
    mean_merged_count = round(mean(merged_count, na.rm = TRUE), 2),
    median_merged_count = median(merged_count, na.rm = TRUE),
    max_merged_count = max(merged_count, na.rm = TRUE),
    .groups = "drop"
  ) |>
  arrange(desc(original_points_represented))

print(cluster_summary)

## # A tibble: 4 × 6
##   mode         representative_points original_points_represe…¹ mean_merged_count
##   <chr>                        <int>                     <int>             <dbl>
## 1 bus                            742                      1925              2.59
## 2 tram                           216                       893              4.13
## 3 subway                          28                        28              1   
## 4 rail_station                     7                         7              1   
## # ℹ abbreviated name: ¹original_points_represented
## # ℹ 2 more variables: median_merged_count <dbl>, max_merged_count <int>

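The cluster summary shows how strongly the 200-metre same-mode clustering procedure reduced the original dataset. The variable original_points_represented gives the number of original stop points behind the representative points of each mode, while mean_merged_count and max_merged_count show how many original points a representative point aggregates. Bus points were reduced from 1,925 to 742 (2.59 original points per representative point on average) and tram points from 893 to 216 (4.13), whereas metro and rail points were not merged at all. High merge counts suggest local stop density and overlapping catchment areas, although they may partly reflect separate platforms or directions of the same stop rather than distinct stops.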
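The printed cluster summary can be checked with a few ratios: mean_merged_count is simply original_points_represented divided by representative_points, and the percentage reduction follows from the same two columns. The sketch below re-enters the printed counts by hand (cluster_counts is a name introduced here for illustration) rather than recomputing them from the data.

```r
# Counts re-entered by hand from the printed cluster_summary table
cluster_counts <- data.frame(
  mode = c("bus", "tram", "subway", "rail_station"),
  representative_points = c(742, 216, 28, 7),
  original_points_represented = c(1925, 893, 28, 7)
)

# mean_merged_count = original points / representative points
cluster_counts$mean_merged <- round(
  cluster_counts$original_points_represented / cluster_counts$representative_points, 2
)

# percentage reduction per mode
cluster_counts$reduction_percent <- round(
  100 * (1 - cluster_counts$representative_points /
           cluster_counts$original_points_represented), 1
)

# overall reduction across all modes
overall_reduction <- round(
  100 * (1 - sum(cluster_counts$representative_points) /
           sum(cluster_counts$original_points_represented)), 1
)

print(cluster_counts)
print(overall_reduction)
```

With these counts the dataset shrinks from 2,853 original points to 993 representative points, a reduction of about 65.2%. Tram points are reduced most strongly (about 75.8%), bus points by about 61.5%, while metro and rail points are unchanged.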

# OpenStreetMap basemap; falls back to no basemap if the tiles cannot be downloaded
base_tile <- tryCatch(
  ggspatial::annotation_map_tile(type = "osm", zoomin = 0, progress = "none"),
  error = function(e) NULL
)

ggplot() +
  base_tile +
  geom_sf(data = stops_3857, aes(color = mode), size = 1.7, alpha = 0.85) +
  coord_sf(
    xlim = st_bbox(study_area_3857)[c("xmin", "xmax")],
    ylim = st_bbox(study_area_3857)[c("ymin", "ymax")],
    expand = FALSE
  ) +
  theme_minimal() +
  labs(
    title = "Public transport stops in central Warsaw",
    subtitle = "Representative points after 200 m same-mode clustering",
    color = "Transport mode"
  )

The map shows the spatial distribution of representative public transport stops after the 200-metre same-mode clustering procedure. The purpose of this map is to assess whether stops are evenly distributed across the study area or concentrated in selected transport corridors and central zones.

ggplot() +
  base_tile +
  geom_sf(data = stops_3857, aes(size = merged_count, color = mode), alpha = 0.75) +
  coord_sf(
    xlim = st_bbox(study_area_3857)[c("xmin", "xmax")],
    ylim = st_bbox(study_area_3857)[c("ymin", "ymax")],
    expand = FALSE
  ) +
  theme_minimal() +
  labs(
    title = "Merged stop clusters in central Warsaw",
    subtitle = "Larger symbols represent clusters containing more original stop points",
    color = "Transport mode",
    size = "Original points\nmerged"
  )

Larger symbols indicate locations where more original stop points were merged into one representative point. These areas can be interpreted as places of high local stop density. This supports the idea that some parts of the network contain many closely spaced stops, which is directly related to the stop spacing problem discussed by Nuworsoo (2011).

# calculating distance from the city centre
stops_2180$distance_from_centre_km <- as.numeric(st_distance(stops_2180, warsaw_centre_2180)) / 1000

centre_distance_summary <- stops_2180 |>
  st_drop_geometry() |>
  group_by(mode) |>
  summarise(
    n = n(),
    median_km = round(median(distance_from_centre_km, na.rm = TRUE), 2),
    mean_km = round(mean(distance_from_centre_km, na.rm = TRUE), 2),
    q1_km = round(quantile(distance_from_centre_km, 0.25, na.rm = TRUE), 2),
    q3_km = round(quantile(distance_from_centre_km, 0.75, na.rm = TRUE), 2),
    .groups = "drop"
  ) |>
  arrange(median_km)

print(centre_distance_summary)

## # A tibble: 4 × 6
##   mode             n median_km mean_km q1_km q3_km
##   <chr>        <int>     <dbl>   <dbl> <dbl> <dbl>
## 1 rail_station     7      3.38    2.49  1.42  3.48
## 2 subway          28      3.55    3.71  1.97  5.39
## 3 tram           216      3.94    4.01  2.87  5.39
## 4 bus            742      4.63    4.41  3.15  5.89

ggplot(st_drop_geometry(stops_2180), aes(x = mode, y = distance_from_centre_km, fill = mode)) +
  geom_boxplot(alpha = 0.7, show.legend = FALSE) +
  theme_minimal() +
  labs(
    title = "Distance of representative stops from Warsaw centre",
    subtitle = "Comparison by transport mode",
    x = "Transport mode",
    y = "Distance from centre [km]"
  )

The boxplot compares the distance of representative stops from the city centre by transport mode. Lower median values indicate modes that are more centrally concentrated, while wider boxes indicate modes that are more dispersed across the study area. Rail stations (median 3.38 km) and metro stations (3.55 km) lie closest to the centre, followed by tram (3.94 km) and bus (4.63 km). The rail and metro figures rest on only 7 and 28 points respectively, so they should be read with caution.
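A simple benchmark helps interpret these medians. Assuming, hypothetically, that stops could be located uniformly anywhere within the 7 km disc, the probability that a stop lies within distance r of the centre is r²/R², which gives a median distance of R/√2 and a mean of 2R/3. The sketch computes both values and checks them with a small Monte Carlo simulation; it ignores the Vistula, parks and other areas where stops cannot be placed.

```r
R <- 7  # study-area radius in km

# If points were uniform in the disc, P(D <= r) = r^2 / R^2 for 0 <= r <= R
median_uniform <- R / sqrt(2)  # solves r^2 / R^2 = 0.5
mean_uniform <- 2 * R / 3      # integral of r * (2 * r / R^2) over [0, R]

# Monte Carlo check: D = R * sqrt(U), U ~ Uniform(0, 1), is the distance of a
# point drawn uniformly from the disc
set.seed(42)
sim_d <- R * sqrt(runif(1e5))

round(c(
  median_uniform = median_uniform,
  mean_uniform = mean_uniform,
  median_simulated = median(sim_d),
  mean_simulated = mean(sim_d)
), 2)
```

For R = 7 km the benchmark median is about 4.95 km and the mean about 4.67 km. All four observed medians (3.38 to 4.63 km) and means (2.49 to 4.41 km) are below these values, which is consistent with stops being somewhat more concentrated towards the centre than a uniform spread would imply.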

# catchment radius (metres) for each stop in the two scenarios
catchment_distances <- stops_2180 |>
  mutate(
    catchment_300m = 300,
    catchment_mode_based = dplyr::case_when(
      mode == "bus" ~ 300,
      mode == "tram" ~ 800,
      mode == "subway" ~ 800,
      mode == "rail_station" ~ 800,
      TRUE ~ 300
    )
  )

# Coverage and overlap statistics for one catchment scenario:
#   summed_buffer_area_km2 - total buffer area, overlaps counted repeatedly
#   union_buffer_area_km2  - area within at least one buffer
#   overlap_percent        - share of the summed buffer area that is redundant
calculate_catchment_stats <- function(sf_points, distance_column, scenario_name) {
  # per-stop buffer radius taken from `distance_column`, clipped to the study area
  buffers <- st_buffer(sf_points, dist = sf_points[[distance_column]])
  buffers <- st_intersection(buffers, study_area_2180)

  sum_buffer_area <- sum(as.numeric(st_area(buffers)))
  union_buffer <- st_union(buffers)
  union_area <- as.numeric(st_area(union_buffer))
  study_area <- as.numeric(st_area(study_area_2180))

  data.frame(
    scenario = scenario_name,
    number_of_stops = nrow(sf_points),
    summed_buffer_area_km2 = round(sum_buffer_area / 1e6, 2),
    union_buffer_area_km2 = round(union_area / 1e6, 2),
    study_area_coverage_percent = round(100 * union_area / study_area, 1),
    overlap_percent = round(100 * (1 - union_area / sum_buffer_area), 1)
  )
}

catchment_stats <- bind_rows(
  calculate_catchment_stats(catchment_distances, "catchment_300m", "All modes: 300 m"),
  calculate_catchment_stats(catchment_distances, "catchment_mode_based", "Mode-based: bus 300 m, tram/metro/rail 800 m")
)

print(catchment_stats)

##                                       scenario number_of_stops
## 1                             All modes: 300 m             993
## 2 Mode-based: bus 300 m, tram/metro/rail 800 m             993
##   summed_buffer_area_km2 union_buffer_area_km2 study_area_coverage_percent
## 1                 275.10                119.62                        77.7
## 2                 692.45                137.42                        89.3
##   overlap_percent
## 1            56.5
## 2            80.2

buffers_300 <- st_buffer(stops_2180, 300)
buffers_300_union <- st_sf(scenario = "300 m catchment", geometry = st_union(buffers_300)) |>
  st_intersection(study_area_2180) |>
  st_transform(3857)

ggplot() +
  base_tile +
  geom_sf(data = buffers_300_union, fill = "steelblue", alpha = 0.25, color = NA) +
  geom_sf(data = stops_3857, aes(color = mode), size = 0.9, alpha = 0.85) +
  coord_sf(
    xlim = st_bbox(study_area_3857)[c("xmin", "xmax")],
    ylim = st_bbox(study_area_3857)[c("ymin", "ymax")],
    expand = FALSE
  ) +
  theme_minimal() +
  labs(
    title = "Approximate 300 m catchment areas",
    subtitle = "Euclidean buffer approximation clipped to the 7 km central study area",
    color = "Transport mode"
  )

‘Generally, the most common values in various studies are 300 and 400 meters for bus stops and 800 meters for tram or metro stops.’ - Florczak (2013)

The first scenario applies a uniform 300-metre catchment to all stops and gives a conservative estimate of local stop accessibility. The second scenario applies mode-specific catchments, 300 metres for bus stops and 800 metres for tram, metro and rail stops, reflecting the assumption that passengers are willing to walk further to faster or higher-capacity modes. Moving from the first to the second scenario raises the share of the study area within at least one catchment from 77.7% to 89.3%, while the overlap (the share of the summed buffer area that is counted more than once) rises from 56.5% to 80.2%. Part of this increase is mechanical, since an 800-metre buffer is about seven times larger than a 300-metre one. Even under the conservative scenario, however, more than half of the summed catchment area is duplicated, which suggests high spatial coverage in central Warsaw combined with substantial duplication of stop catchment areas.
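To make the overlap metric more concrete, the sketch below applies the same definition as calculate_catchment_stats(), 100 × (1 − union area / summed area), to just two circular buffers of equal radius. The helper circle_pair_overlap_percent() is introduced here for illustration and uses the standard formula for the lens-shaped area where two circles intersect.

```r
# overlap_percent for two equal circular buffers of radius r whose centres are d apart
circle_pair_overlap_percent <- function(r, d) {
  summed <- 2 * pi * r^2
  lens <- if (d >= 2 * r) {
    0  # buffers do not intersect
  } else {
    # area of the lens where the two circles intersect
    2 * r^2 * acos(d / (2 * r)) - (d / 2) * sqrt(4 * r^2 - d^2)
  }
  union_area <- summed - lens
  100 * (1 - union_area / summed)
}

# two 300 m catchments at different stop spacings (metres)
spacings <- c(0, 150, 300, 450, 600)
round(sapply(spacings, circle_pair_overlap_percent, r = 300), 1)
```

Two coincident buffers give 50% (half of the summed area is redundant), two 300-metre buffers 300 metres apart still give about 19.6%, and the overlap falls to zero once the stops are 600 metres apart. Because the result depends only on the ratio of spacing to radius, the same values apply to 800-metre buffers at proportionally larger spacings. With hundreds of buffers a location can fall inside many catchments at once, which is how study-wide values such as 56.5% and 80.2% arise.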

mode_based_buffers <- st_buffer(
  catchment_distances,
  dist = catchment_distances$catchment_mode_based
)

mode_based_union <- st_sf(
  scenario = "Mode-based catchment",
  geometry = st_union(mode_based_buffers)
) |>
  st_intersection(study_area_2180) |>
  st_transform(3857)

ggplot() +
  base_tile +
  geom_sf(data = mode_based_union, fill = "darkgreen", alpha = 0.25, color = NA) +
  geom_sf(data = stops_3857, aes(color = mode), size = 0.9, alpha = 0.85) +
  coord_sf(
    xlim = st_bbox(study_area_3857)[c("xmin", "xmax")],
    ylim = st_bbox(study_area_3857)[c("ymin", "ymax")],
    expand = FALSE
  ) +
  theme_minimal() +
  labs(
    title = "Mode-based catchment areas",
    subtitle = "Bus: 300 m; tram, metro and rail: 800 m",
    color = "Transport mode"
  )

The mode-based catchment map shows the union of the mode-specific catchments clipped to the study area: 300 metres around bus stops and 800 metres around tram, metro and rail stops. Compared with a uniform 300-metre catchment, the larger rings around tram, metro and rail stops fill many of the gaps in bus coverage, raising the covered share of the study area from 77.7% to 89.3%.

Methodological limitation

The catchment analysis in this report uses Euclidean buffers. This means that distance is measured as a straight-line radius around each stop. This approach is simple and useful for exploratory spatial analysis, but it does not fully represent real pedestrian accessibility. As noted by Florczak (2013), buffer-based accessibility may differ substantially from network-based accessibility, because pedestrians move along streets and paths rather than in straight lines. Therefore, the results should be interpreted as an approximation of spatial accessibility, not as a precise walking-distance model.
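The direction of this bias can be illustrated with an idealised case. Assuming, purely for illustration, a perfectly orthogonal street grid on which walking distance equals |dx| + |dy|, the area reachable within r metres is a diamond of area 2r², compared with πr² for a Euclidean buffer. Real streets in Warsaw are not a perfect grid, so this shows only the direction of the bias, not its size.

```r
# ASSUMPTION: idealised orthogonal street grid, walking distance = |dx| + |dy|
r <- 300  # catchment radius in metres

euclidean_area <- pi * r^2  # circular straight-line buffer
grid_area <- 2 * r^2        # diamond reachable within r metres of walking on the grid

share_reachable <- grid_area / euclidean_area  # equals 2 / pi for any r
round(share_reachable, 3)

# a point 200 m east and 200 m north of a stop
dx <- 200
dy <- 200
c(euclidean_m = sqrt(dx^2 + dy^2), grid_walk_m = abs(dx) + abs(dy))
```

On such a grid only about 64% (2/π) of a Euclidean buffer is reachable within the same walking distance, whatever the radius. The example point lies inside a 300-metre buffer (about 283 m away in a straight line) but requires a 400-metre walk.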

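# For every representative point, distance (m) to the nearest other point of the same mode
nearest_neighbour_by_mode <- function(sf_points) {
  modes <- unique(as.character(sf_points$mode))
  out <- list()

  for (m in modes) {
    sub <- sf_points[sf_points$mode == m, ]
    if (nrow(sub) < 2) {
      # a single point has no same-mode neighbour
      out[[m]] <- data.frame(mode = m, nn_distance_m = NA_real_)
    } else {
      # pairwise distance matrix; units are dropped so Inf can be assigned
      d <- matrix(as.numeric(st_distance(sub)), nrow = nrow(sub))
      diag(d) <- Inf  # ignore each point's distance to itself
      nn <- apply(d, 1, min, na.rm = TRUE)
      out[[m]] <- data.frame(mode = m, nn_distance_m = nn)
    }
  }
  bind_rows(out)
}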

nn_distances <- nearest_neighbour_by_mode(stops_2180)

nn_summary <- nn_distances |>
  group_by(mode) |>
  summarise(
    n = sum(!is.na(nn_distance_m)),
    median_nn_m = round(median(nn_distance_m, na.rm = TRUE), 1),
    mean_nn_m = round(mean(nn_distance_m, na.rm = TRUE), 1),
    pct_below_200m = round(mean(nn_distance_m < 200, na.rm = TRUE) * 100, 1),
    pct_below_300m = round(mean(nn_distance_m < 300, na.rm = TRUE) * 100, 1),
    pct_below_400m = round(mean(nn_distance_m < 400, na.rm = TRUE) * 100, 1),
    .groups = "drop"
  ) |>
  arrange(median_nn_m)

print(nn_summary)

## # A tibble: 4 × 7
##   mode      n median_nn_m mean_nn_m pct_below_200m pct_below_300m pct_below_400m
##   <chr> <int>       <dbl>     <dbl>          <dbl>          <dbl>          <dbl>
## 1 bus     742        269.      273.           22.1           65.8           91.9
## 2 tram    216        339.      350.            6.5           32.9           74.1
## 3 subw…    28        989.      921.            0              0              0  
## 4 rail…     7       1053.     1194.            0             28.6           28.6

ggplot(nn_distances, aes(x = mode, y = nn_distance_m, fill = mode)) +
  geom_boxplot(alpha = 0.7, show.legend = FALSE) +
  geom_hline(yintercept = 200, linetype = "dashed") +
  geom_hline(yintercept = 300, linetype = "dotted") +
  geom_hline(yintercept = 400, linetype = "dotdash") +
  coord_cartesian(ylim = c(0, 1200)) +
  theme_minimal() +
  labs(
    title = "Nearest-neighbour distances by transport mode",
    subtitle = "Dashed reference lines: 200 m, 300 m and 400 m",
    x = "Transport mode",
    y = "Distance to nearest same-mode representative point [m]"
  )

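The nearest-neighbour distance plot shows how far each representative stop is from the closest representative point of the same transport mode. Median distances are about 269 metres for bus and 339 metres for tram, compared with roughly 1 km for metro and rail. Distances below 200 metres (22.1% of bus points and 6.5% of tram points) show that some close same-mode pairs remain after clustering; merging stops within a 200-metre radius does not by itself guarantee that the averaged representative points end up at least 200 metres apart. Most bus and tram distances are below 400 metres (91.9% and 74.1% respectively), which is consistent with the 300–400-metre thresholds commonly used for dense urban areas.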
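One way to compare these distances with a random benchmark is the Clark–Evans ratio: the observed mean nearest-neighbour distance divided by 0.5/√λ, the mean expected under complete spatial randomness with intensity λ (points per unit area). Values above 1 suggest a more regular pattern and values below 1 a more clustered one. The sketch below is only an approximation: it uses the rounded means from nn_summary, assumes the study area is the full 7 km disc and applies no edge correction; clark_evans_ratio() is a helper introduced here for illustration.

```r
# Clark-Evans ratio: observed mean NN distance / expected mean NN distance under CSR.
# No edge correction is applied, so the result is indicative only.
clark_evans_ratio <- function(mean_nn_m, n, area_m2) {
  lambda <- n / area_m2  # points per square metre
  mean_nn_m / (0.5 / sqrt(lambda))
}

# ASSUMPTION: study area = full 7 km disc; rounded means copied from nn_summary
study_area_m2 <- pi * 7000^2
nn_means <- data.frame(
  mode = c("bus", "tram", "subway", "rail_station"),
  n = c(742, 216, 28, 7),
  mean_nn_m = c(273, 350, 921, 1194)
)
nn_means$clark_evans <- round(
  clark_evans_ratio(nn_means$mean_nn_m, nn_means$n, study_area_m2), 2
)
print(nn_means)
```

With these inputs the ratio is about 1.20 for bus, which may partly reflect the 200-metre merging removing close same-mode pairs, and about 0.83, 0.79 and 0.51 for tram, metro and rail respectively, consistent with these stops lying closer together along routes than a random pattern would imply. The metro and rail ratios rest on very few points and are sensitive to edge effects.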

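coords <- st_coordinates(stops_2180)

# observation window = the circular 7 km study area; the bounding rectangle of
# the points would add empty corners and bias the density surface and K-function
centre_xy <- st_coordinates(warsaw_centre_2180)[1, ]
window_study <- disc(radius = 7000, centre = centre_xy)

stops_ppp <- ppp(
  x = coords[, 1],
  y = coords[, 2],
  window = window_study,
  marks = factor(stops_2180$mode),
  check = FALSE
)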

density_all <- density.ppp(
  unmark(stops_ppp),
  sigma = 500
)

density_df <- as.data.frame(density_all)
names(density_df)[3] <- "density"

points_df <- data.frame(
  x = coords[, 1],
  y = coords[, 2],
  mode = stops_2180$mode
)

ggplot() +
  geom_raster(
    data = density_df,
    aes(x = x, y = y, fill = density)
  ) +
  geom_point(
    data = points_df,
    aes(x = x, y = y, color = mode),
    size = 0.5,
    alpha = 0.7
  ) +
  coord_equal() +
  theme_minimal() +
  labs(
    title = "Kernel density of representative public transport stops",
    subtitle = "Density surface with representative stops by transport mode",
    fill = "Density",
    color = "Transport mode"
  )

The kernel density map (Gaussian kernel with sigma = 500 m) identifies spatial hotspots of public transport stop concentration. Because the coordinates are in metres, the density values are expressed as stops per square metre. Higher values indicate areas where stops are more concentrated; in terms of accessibility, these may be the better-served parts of central Warsaw, but they may also indicate overlapping service areas.
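The kernel surface can be understood by evaluating a Gaussian kernel by hand. The sketch below, using a hypothetical helper kernel_intensity(), sums a bivariate Gaussian with standard deviation sigma = 500 m centred on each stop, which is the kernel density.ppp() uses by default; unlike density.ppp(), it applies no edge correction. Multiplying the result by 1,000,000 converts stops per square metre to stops per square kilometre.

```r
# Gaussian kernel intensity at (x0, y0), WITHOUT edge correction
kernel_intensity <- function(x0, y0, x, y, sigma = 500) {
  d2 <- (x - x0)^2 + (y - y0)^2                        # squared distances in m^2
  sum(exp(-d2 / (2 * sigma^2)) / (2 * pi * sigma^2))   # points per m^2
}

# one stop at the origin, evaluated at the stop and 500 m away
at_stop <- kernel_intensity(0, 0, x = 0, y = 0)
at_500m <- kernel_intensity(500, 0, x = 0, y = 0)

# convert from points per m^2 to points per km^2
c(at_stop_per_km2 = at_stop * 1e6, at_500m_per_km2 = at_500m * 1e6)
```

A single isolated stop contributes about 0.64 stops per km² at its own location and about 0.39 per km² at 500 metres, so the hotspots on the map arise where the kernels of many nearby stops add up.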

k_result <- Kest(unmark(stops_ppp))
plot(k_result, main = "K-function for public transport stops")

The K-function plot compares the observed spatial pattern of public transport stops with a theoretical Poisson pattern representing complete spatial randomness (CSR). The pois line shows the expected K-function under CSR, K(r) = πr². The iso, trans and border lines are the isotropic (Ripley), translation and border edge-corrected estimates of the observed K-function. Where the observed lines lie above the Poisson line, stops have more neighbours within distance r than expected under randomness, indicating clustering at that scale; where they lie below it, the pattern is more regular or dispersed. Without simulation envelopes (for example from spatstat's envelope()), this comparison is visual rather than a formal test.
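The quantity behind the K-function plot can be written out directly. Ignoring edge effects, K(r) is estimated as the window area divided by n(n − 1), multiplied by the number of ordered pairs of points closer than r; under complete spatial randomness K(r) = πr². The base-R sketch below (naive_K() is a hypothetical helper) follows this formula and is conceptually similar to Kest(..., correction = "none") in spatstat, not to the edge-corrected iso, trans and border curves.

```r
# Naive K-function estimate without edge correction
naive_K <- function(x, y, r, area) {
  n <- length(x)
  d <- as.matrix(dist(cbind(x, y)))  # pairwise Euclidean distances
  diag(d) <- Inf                      # exclude each point paired with itself
  # K(r) = area / (n * (n - 1)) * number of ordered pairs (i, j), i != j, with d_ij <= r
  sapply(r, function(ri) area / (n * (n - 1)) * sum(d <= ri))
}

# toy pattern: the four corners of a unit square (window area = 1)
r_vals <- c(0.5, 1, 1.5)
rbind(
  observed = naive_K(x = c(0, 1, 0, 1), y = c(0, 0, 1, 1), r = r_vals, area = 1),
  poisson = pi * r_vals^2  # K(r) under complete spatial randomness
)
```

For the four corners of a unit square, K is 0 at r = 0.5, 2/3 at r = 1 (each point has two neighbours at distance 1) and 1 at r = 1.5, all below the Poisson values, as expected for a small, evenly spaced pattern.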

Conclusion

The analysis suggests that public transport stops in central Warsaw are not randomly distributed. They form visible concentrations and transport corridors, especially in the central part of the study area. The 200-metre same-mode clustering procedure reduced the dataset from 2,853 original stop points to 993 representative points while preserving the main spatial structure of the network.

The results are consistent with the literature. Florczak (2013) shows that GIS-based methods are useful for analysing spatial accessibility to public transport stops, while Nuworsoo (2011) emphasises that stop spacing involves a trade-off between walking accessibility and operational efficiency. In this project, dense stop areas and overlapping catchment zones suggest that spatial generalisation is a reasonable method for preparing a simplified but still representative dataset.

However, the results should be treated as exploratory. The analysis uses Euclidean buffers rather than real pedestrian network distances. A more advanced version of the study should include pedestrian paths, population data, employment density or service frequency.

References

Florczak, M. (2013). GIS jako narzędzie badania dostępności przestrzennej transportu zbiorowego [GIS as a tool for studying the spatial accessibility of public transport]. Transport Miejski i Regionalny, 5.

Nuworsoo, C. (2011). Guidelines for Transit Bus Stop Spacing: Improving Accessibility and Performance. California Polytechnic State University, San Luis Obispo.