Access to diagnostic healthcare services, such as blood test collection points, is a key component of urban health equity. In cities like Warsaw—characterized by diverse population densities, urban sprawl, and private service operators—the spatial distribution of diagnostic points directly affects both service availability and public health outcomes.
This report presents a comprehensive spatial analysis of blood collection points in Warsaw, with a focus on identifying underserved areas and guiding recommendations for new locations. Using geospatial data, population grids, and a suite of analytical tools (Voronoi diagrams, distance-based indicators, population-weighted metrics, and spatial autocorrelation statistics such as Moran’s I and LISA), we aim to answer the following questions:
• Where are diagnostic collection points currently located?
• Which areas have insufficient or no coverage?
• How evenly are services distributed in relation to population?
• Where should new diagnostic points be placed to maximize accessibility?
The study integrates datasets on existing diagnostic points (ALAB, Diagnostyka, Synevo), population distribution on a 1×1 km grid, and spatial geometries of Warsaw. Special attention is given to equity of access, measured in terms of both geographic proximity and population service load, which allows us to pinpoint critical areas of need and generate data-driven recommendations for future expansion.
By combining visual exploration, quantitative indicators, and spatial statistical techniques, this analysis supports evidence-based planning for improving healthcare accessibility and reducing diagnostic service disparities across Warsaw.
To obtain the full list of blood test collection points in Warsaw, a custom Python-based web scraping script was developed. The scraper targeted the Diagnostyka, ALAB and Synevo websites, specifically the pages listing collection facilities in Warsaw. Because these pages load addresses dynamically into a virtualized scrollable popup, the script used Selenium for browser automation: it interacted with the dynamic elements, waited for the content to render, and extracted both the visible address data and the underlying geolocation or clinic metadata. Each address entry was parsed and saved for further spatial analysis. This approach captures JavaScript-rendered content that static HTML parsers such as BeautifulSoup cannot reach on their own, which helps keep the dataset complete. The resulting dataset provides the geographic foundation for the spatial distribution, coverage, and accessibility analysis of medical services in Warsaw.
This section consolidates address data for blood test collection points from three major providers (Diagnostyka, ALAB and Synevo) into a unified dataset. Using dplyr and other tidyverse packages in R, addresses are harmonized and given a standard suffix (“, Warszawa, Polska”) to improve geocoding accuracy. The script uses the tidygeocoder package with the ArcGIS geocoding service to convert addresses into geographic coordinates (latitude and longitude). After records with missing coordinates are removed, the data is converted into a spatial object (sf class) in WGS84 (EPSG:4326), enabling further spatial analysis.
The code then loads two spatial layers: the administrative boundary of Warsaw (powiat Warszawa) from a shapefile, and the population grid (tsed18337.shp).
The population layer is clipped to the Warsaw boundary using spatial intersection, and each grid cell is given a unique GRID_ID. This prepares the base layers for subsequent tasks such as service coverage analysis, population accessibility, and Voronoi tessellation.
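library(tidyverse)     # readr, dplyr, tidyr, ggplot2
library(tidygeocoder)  # geocode()
library(sf)            # spatial classes and operations
# Load the scraped address lists for each provider
diag <- read_csv("punkty_diagnostyka_warszawa.csv")
alab <- read_csv("punkty_alab_warszawa.csv")
synevo <- read_csv("punkty_synevo_warszawa.csv")
# Combine the providers and append a city/country suffix for geocoding
all_points <- bind_rows(diag, alab, synevo) %>%
  mutate(full_address = paste0(address, ", Warszawa, Polska"))
# Geocode with the ArcGIS service; adds lat and long columns
geocoded <- all_points %>%
  geocode(
    address = full_address,
    method = "arcgis",
    quiet = FALSE,
    full_results = FALSE
  )
# Drop failed geocodes and convert to an sf point layer (WGS84)
geocoded_sf <- geocoded %>%
  filter(!is.na(lat), !is.na(long)) %>%
  st_as_sf(coords = c("long", "lat"), crs = 4326)
# Administrative boundary of Warsaw
WAW <- st_read("data/powiaty/powiaty.shp", quiet = TRUE) %>%
  filter(jpt_nazwa_ == "powiat Warszawa") %>%
  st_transform(crs = 4326)
# Population grid, clipped to the Warsaw boundary
POP <- st_read("data/query/tsed18337.shp", quiet = TRUE) %>%
  st_transform(crs = 4326) %>%
  st_intersection(WAW)
# Give each grid cell a unique identifier
POP <- POP %>%
  mutate(GRID_ID = row_number())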
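Filtering on missing coordinates removes failed geocodes, but it does not catch addresses that were matched to a location outside the city. The following is a minimal sketch of one possible sanity check, run on toy data: the rectangle is a hypothetical stand-in for the WAW boundary, and the names `boundary`, `toy_geocoded`, `in_city` and `suspect` are illustrative only and do not appear in the original script.

```r
library(sf)

# Hypothetical rectangle standing in for the Warsaw boundary (WAW)
boundary <- st_as_sfc(
  st_bbox(c(xmin = 20.85, ymin = 52.10, xmax = 21.27, ymax = 52.37),
          crs = st_crs(4326))
)

# Toy geocoding output: one plausible Warsaw location, one far outside it
toy_geocoded <- st_as_sf(
  data.frame(
    address = c("inside example", "outside example"),
    long = c(21.01, 19.94),
    lat = c(52.23, 50.06)
  ),
  coords = c("long", "lat"), crs = 4326
)

# TRUE where the point lies within the boundary polygon
toy_geocoded$in_city <- st_within(toy_geocoded, boundary, sparse = FALSE)[, 1]

# Rows to review manually before the spatial analysis
suspect <- toy_geocoded[!toy_geocoded$in_city, ]
```

In the report's pipeline the same check would presumably be applied to geocoded_sf and WAW. It is worth doing explicitly because a later inner spatial join against the grid would otherwise drop such points without any warning.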
To gain an initial spatial understanding of the distribution of blood test collection facilities, an interactive dot map was created using the tmap package in view mode (Leaflet-based). Each point represents a single medical collection site in Warsaw, color-coded by provider — ALAB, Diagnostyka, or Synevo. A small number of points were marked as “Missing,” likely due to incomplete or ambiguous source data.
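library(tmap)  # thematic maps
# Interactive (Leaflet) dot map of collection points, colored by operator
tmap_mode("view")
tm_shape(geocoded_sf) +
  tm_dots(
    col = "operator",
    palette = "Dark2",
    size = 0.4,
    title = "Operator"
  )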
To assess the spatial distribution of blood test facilities relative to urban structure, the geocoded service points were spatially joined to a population grid covering the Warsaw area. Each grid cell represents a small territorial unit (1 km²), and the number of points falling within each cell (n_punkty) was calculated. Missing values were replaced with zeros to ensure completeness across the urban area.
# Attach grid-cell attributes to each point; points outside the grid are dropped
joined <- st_join(geocoded_sf, POP, join = st_within, left = FALSE)
# Count points per grid cell
points_per_grid <- joined %>%
  mutate(GRID_ID = as.character(GRID_ID)) %>%
  group_by(GRID_ID) %>%
  summarise(n_punkty = n(), .groups = "drop")
# Join the counts back to the grid; cells without points get 0
POP_with_counts <- POP %>%
  mutate(GRID_ID = as.character(GRID_ID)) %>%
  left_join(st_drop_geometry(points_per_grid), by = "GRID_ID") %>%
  mutate(n_punkty = replace_na(n_punkty, 0))
# Choropleth of counts with the individual points overlaid
tmap_mode("view")
tm_shape(POP_with_counts) +
  tm_polygons("n_punkty", style = "quantile", palette = "YlOrRd", title = "number of points") +
  tm_shape(geocoded_sf) +
  tm_dots(col = "black", size = 0.2)
The resulting choropleth map visualizes the local density of collection sites using a quantile classification:
• Darker red cells indicate higher concentrations of collection points.
• Lighter yellow areas represent lower densities or absence of facilities.
• Each individual point is also plotted in black for additional spatial clarity.
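The point-in-cell counting behind this map can be illustrated on a toy grid. The sketch below is an alternative formulation, not the report's own code: it uses st_intersects() with lengths() instead of the join-and-summarise approach, and the grid, the points and their coordinates are made-up planar data.

```r
library(sf)

# Toy 2 x 2 grid of unit squares (hypothetical data, planar coordinates)
grid <- st_make_grid(
  st_as_sfc(st_bbox(c(xmin = 0, ymin = 0, xmax = 2, ymax = 2))),
  n = c(2, 2)
)
grid_sf <- st_sf(GRID_ID = seq_along(grid), geometry = grid)

# Three toy points: two share one cell, the third falls in another cell
pts <- st_as_sf(
  data.frame(x = c(0.2, 0.7, 1.5), y = c(0.3, 0.6, 1.5)),
  coords = c("x", "y")
)

# st_intersects() lists, for each cell, the indices of the points it contains;
# lengths() turns that list into a count, giving 0 for empty cells
grid_sf$n_punkty <- lengths(st_intersects(grid_sf, pts))
```

Because empty cells receive a count of 0 directly, this variant needs no separate step to replace missing values. A point lying exactly on a shared cell edge would be counted in both neighbouring cells, which the st_within join avoids only partly, so either approach assumes such cases are rare.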
# Keep only grid cells that contain no collection point
without_points <- POP_with_counts %>%
  filter(n_punkty == 0)
# Empty cells in gray, with facility locations overlaid by operator
tm_shape(without_points) +
  tm_polygons(col = "lightgray", border.col = "white") +
  tm_shape(geocoded_sf) +
  tm_dots(col = "operator", palette = "Set2", size = 0.2, title = "Operator") +
  tm_layout(title = "Grid cells without collection points")
This map highlights grid cells within Warsaw that lack any blood collection facilities, based on a spatial join between the geocoded locations and the population grid. Cells with zero service points (n_punkty == 0) were isolated and visualized in light gray, overlaid with the actual facility locations colored by operator.
Large continuous zones without coverage are visible, particularly in the north-eastern and south-western parts of the city. These underserved areas exist despite urbanization in some regions, suggesting a mismatch between population distribution and service accessibility. Central districts show near-complete saturation, while peripheral districts face potential service deserts.
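library(nngeo)  # st_nn() nearest-neighbour search
# Distance from each grid cell to its nearest collection point (meters)
distances <- st_nn(POP, geocoded_sf, k = 1, returnDist = TRUE)
# Convert to kilometers
POP$nearest_dist_km <- as.numeric(distances$dist) / 1000
# Histogram of nearest-facility distances in 0.5 km bins
ggplot(POP, aes(x = nearest_dist_km)) +
  geom_histogram(binwidth = 0.5, fill = "steelblue", color = "white") +
  labs(
    title = "Distance to nearest blood test point",
    x = "Distance (km)",
    y = "Number of grid cells"
  ) +
  theme_minimal()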
This histogram illustrates the distribution of distances from each population grid cell to its closest blood collection facility, based on a nearest-neighbour spatial query (st_nn() with k = 1). The distances are calculated in kilometers and grouped into 0.5 km bins.
Most of Warsaw’s population grid cells lie within 1 to 2 kilometers of the nearest facility, which indicates generally good accessibility in central and moderately urbanized areas. However, the long tail of the distribution shows that some peripheral zones are more than 5 km from the nearest collection point, which may pose accessibility challenges for residents without private transport or with mobility constraints.
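The histogram counts grid cells rather than residents, so a sparsely populated peripheral cell weighs as much as a dense central one. The following sketch shows how cell-based and population-based coverage could be compared. It runs on a made-up table that mimics the tot and nearest_dist_km columns of POP; the numbers and the 1, 2 and 5 km thresholds are illustrative assumptions, not results from the report.

```r
# Hypothetical stand-in for the POP attribute table: population per cell (tot)
# and distance to the nearest collection point in km (nearest_dist_km)
cells <- data.frame(
  tot = c(5000, 3000, 1500, 500, 0),
  nearest_dist_km = c(0.4, 1.2, 2.8, 5.6, 7.1)
)

# Share of grid cells and share of residents within each distance threshold
thresholds <- c(1, 2, 5)
coverage <- data.frame(
  threshold_km = thresholds,
  share_cells = sapply(thresholds, function(d) mean(cells$nearest_dist_km <= d)),
  share_pop = sapply(thresholds, function(d) {
    sum(cells$tot[cells$nearest_dist_km <= d]) / sum(cells$tot)
  })
)

# Population-weighted mean distance to the nearest collection point
weighted_mean_dist <- weighted.mean(cells$nearest_dist_km, w = cells$tot)
```

In this toy table only 40% of cells lie within 2 km of a point, yet they hold 80% of the population, which illustrates why the two views can diverge. With the real data, the same calculation could be applied to st_drop_geometry(POP).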
# Cell area in km² (st_area() returns m²)
POP$area_km2 <- as.numeric(st_area(POP)) / 1e6
POP_with_counts$area_km2 <- as.numeric(st_area(POP_with_counts)) / 1e6
# Points per km² and points per resident (NA where a cell has no population)
POP_with_counts <- POP_with_counts %>%
  mutate(
    density_points_km2 = n_punkty / area_km2,
    density_points_per_capita = ifelse(tot > 0, n_punkty / tot, NA)
  )
# Choropleth of points per km²
tm_shape(POP_with_counts) +
  tm_polygons("density_points_km2", palette = "Blues", style = "jenks", title = "points per km²")
This choropleth map presents the spatial density of blood collection points per square kilometer, calculated by dividing the number of points (n_punkty) within each grid cell by the cell’s surface area.
The highest densities (3+ points per km², darkest blue) are concentrated in central and western Warsaw, where service providers tend to cluster. Peripheral districts generally show low or zero density, confirming earlier findings on service gaps. The density metric highlights redundancy in central zones (overlapping service coverage), versus scarcity in outlying areas, providing evidence for targeted network optimization.
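The two density measures computed in the code above can be written compactly. The symbols are our own notation rather than anything defined in the source data: n_i is the number of collection points in grid cell i (n_punkty), A_i is the cell area in km² (area_km2), and P_i is the cell population (tot).

```latex
\[
  d_i^{\text{area}} = \frac{n_i}{A_i},
  \qquad
  d_i^{\text{pop}} =
  \begin{cases}
    \dfrac{n_i}{P_i}, & P_i > 0,\\[4pt]
    \text{undefined (NA)}, & P_i = 0.
  \end{cases}
\]
```

Cells clipped by the city boundary have A_i below 1 km², so a single point in a small edge cell can produce an area density well above 1 point per km². High values near the boundary should therefore be read with that in mind.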
# Choropleth of points per resident
tm_shape(POP_with_counts) +
  tm_polygons("density_points_per_capita", palette = "Greens", style = "jenks", title = "points per capita")