1 Introduction

This project studies how educational and public facilities are distributed across two central districts of Warsaw — Śródmieście and Żoliborz. The data was collected from OpenStreetMap and includes 394 mapped locations of different facilities such as schools, kindergartens, libraries, universities, and colleges.

The study area covers 24.06 km² and all analysis was performed using the spatstat package in R, with the observation window defined from the merged administrative boundary of both districts reprojected to EPSG:2180 (Polish national grid, metres).

Research questions:

Is the distribution of educational facilities spatially random, clustered, or regular?
Does this distribution show any relationship with proximity to the city centre?

2 Packages and Setup

The following R packages are used throughout this analysis:

| Package | Purpose |
|---|---|
| sf | Spatial data loading, manipulation and reprojection |
| spatstat | Core point pattern analysis framework |
| spatstat.geom | Point pattern geometry and ppp objects |
| spatstat.explore | Density estimation, K/L/G/F functions, envelopes |
| tidyverse | Data manipulation and visualisation |
| viridis | Colour palettes for spatial maps |
| RColorBrewer | Additional colour palettes for marked patterns |
| osmdata | Downloading road network from OpenStreetMap |
| GET | Global envelope tests |

3 Data Preparation

We loaded two GeoJSON files from OpenStreetMap. The boundary file was filtered to extract Śródmieście and Żoliborz districts (admin_level = 9) and dissolved into a single polygon. Facility points were clipped to this polygon using st_within(). All layers were reprojected to EPSG:2180 (Polish national grid, metres) for metric distance calculations. The point pattern was then converted to a ppp object and rescaled to kilometres, giving a study area of 24.06 km² and an average intensity of 16.38 facilities per km².
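The preparation steps described above can be sketched as follows. This is a minimal illustration, not the code used in the report: the boundary rectangle and the random points are hypothetical stand-ins for the two GeoJSON layers, and all object names are assumptions.

```r
library(sf)
library(spatstat.geom)

set.seed(42)

# Hypothetical stand-in for the dissolved district boundary (lon/lat, EPSG:4326).
ring <- rbind(c(20.97, 52.21), c(21.04, 52.21), c(21.04, 52.28),
              c(20.97, 52.28), c(20.97, 52.21))
boundary <- st_sfc(st_polygon(list(ring)), crs = 4326)

# Hypothetical facility points, some of which fall outside the boundary.
pts <- st_as_sf(data.frame(lon = runif(200, 20.95, 21.06),
                           lat = runif(200, 52.20, 52.29)),
                coords = c("lon", "lat"), crs = 4326)

# Reproject both layers to EPSG:2180 so that coordinates are in metres.
boundary_m <- st_transform(boundary, 2180)
pts_m <- st_transform(pts, 2180)

# Keep only the points that lie inside the boundary polygon.
inside <- st_within(pts_m, boundary_m, sparse = FALSE)[, 1]
pts_in <- pts_m[inside, ]

# Build the observation window and the point pattern, then rescale m -> km.
W <- as.owin(boundary_m)
xy <- st_coordinates(pts_in)
X_m <- ppp(xy[, 1], xy[, 2], window = W)
X_km <- rescale(X_m, 1000, "km")

area_km2 <- area(Window(X_km))           # window area in km^2
lambda_bar <- npoints(X_km) / area_km2   # average intensity per km^2
```

Rescaling to kilometres before any analysis keeps every later bandwidth, distance and intensity in the units quoted in the text.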

Figure 1: Study Area — Śródmieście and Żoliborz Districts, Warsaw


The table below summarises the nine facility types included in the dataset:

Table 1: Educational and public facilities by type
| Facility Type | Count | Intensity (per km²) |
|---|---|---|
| school | 132 | 5.49 |
| kindergarten | 99 | 4.12 |
| library | 52 | 2.16 |
| college | 40 | 1.66 |
| university | 39 | 1.62 |
| language_school | 23 | 0.96 |
| driving_school | 4 | 0.17 |
| music_school | 3 | 0.12 |
| dancing_school | 2 | 0.08 |
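The intensities in Table 1 are simply counts divided by the study area. The short check below reproduces that arithmetic from the figures quoted in the text; the variable names are illustrative only.

```r
# Facility counts copied from Table 1; the area is the study-area figure quoted above.
counts <- c(school = 132, kindergarten = 99, library = 52, college = 40,
            university = 39, language_school = 23, driving_school = 4,
            music_school = 3, dancing_school = 2)
area_km2 <- 24.06

total <- sum(counts)                           # total number of facilities
overall_intensity <- total / area_km2          # facilities per km^2
type_intensity <- round(counts / area_km2, 2)  # per-type intensity, per km^2
```

The per-type values sum (up to rounding) to the overall intensity, which is a quick consistency check on the table.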

The map below shows the spatial distribution of all 394 facilities within the study area, coloured by facility type.

Figure 1: Educational and public facilities by type in Śródmieście and Żoliborz


Figure 2: Distribution of facilities by type


The marked point pattern already suggests spatial inhomogeneity — facilities are not evenly spread across the study area. Schools and kindergartens appear more concentrated in the northern Żoliborz section while universities are visible primarily in the southern Śródmieście area. This visual impression is formally tested in the sections that follow.

4 First Order Analysis

First order analysis describes the overall intensity of a point pattern — how many points occur per unit area and whether this intensity is constant or varies spatially across the study region.

We examine intensity through quadrat counts, kernel density estimation, bandwidth selection, and formal statistical tests (KS and Berman) against spatial covariates including geographic coordinates and distance to the Warsaw city centre.

Intensity

The average intensity of the point pattern is 16.38 facilities per km², indicating a high density of educational infrastructure in central Warsaw. Intensity differs considerably by type: schools dominate with 5.49 per km², followed by kindergartens at 4.12 per km².

4.1 Quadrat Test

To formally test whether the pattern deviates from Complete Spatial Randomness (CSR), we applied the Chi-squared quadrat test using a 5×5 grid.

Figure 3: Quadrat count test — all facilities


Quadrat intensities range from 0 in peripheral quadrats to a maximum of 64.5 per km² in the densest central quadrat. The formal test gives X² = 183.05, df = 15, p < 2.2e-16 against the clustered alternative, which is strong evidence to reject CSR in favour of clustering. The regular alternative gives p = 1, so there is no evidence of regularity. On its own this test cannot distinguish clustering between points from a spatially varying intensity; the following sections address that distinction.
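A quadrat test of this kind might look like the sketch below. The pattern is a hypothetical simulated one on a square window, not the Warsaw data, so the statistic and degrees of freedom differ from the reported values. On a full rectangle a 5×5 grid gives 25 tiles and df = 24; the reported df = 15 would be consistent with only 16 grid cells intersecting the irregular district window, though that is an inference rather than something the report states.

```r
library(spatstat)

set.seed(1)
# Hypothetical clustered pattern on a 5 km x 5 km window (not the Warsaw data).
W <- owin(c(0, 5), c(0, 5))
X <- rMatClust(kappa = 2, scale = 0.3, mu = 8, win = W)

# Counts in a 5 x 5 grid of quadrats.
Q <- quadratcount(X, nx = 5, ny = 5)

# One-sided chi-squared tests against CSR.
qt_clustered <- quadrat.test(X, nx = 5, ny = 5, alternative = "clustered")
qt_regular   <- quadrat.test(X, nx = 5, ny = 5, alternative = "regular")
```

The two one-sided p-values are complementary, which is why a vanishingly small p-value for the clustered alternative goes together with p = 1 for the regular alternative.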

4.2 Kernel Density Estimation

Kernel density estimation was applied to visualise the continuous intensity surface across the study area. The default Gaussian kernel was used with automatic bandwidth selection.

Figure 4: Kernel density surface — all facilities


The density surface confirms the pattern observed visually — intensity peaks in the north-central zone at the Żoliborz–Śródmieście boundary, reaching values above 25 per km². Intensity decreases progressively toward the southern tip of the study area. The contour plot shows tight concentric contours in the high-density zone (values 22–24 per km²), with a single dominant hotspot rather than multiple dispersed peaks.

4.3 Bandwidth Selection

Four bandwidth methods were compared — Diggle (0.0099 km), PPL (0.163 km), Scott (0.51 km) and CvL (0.50 km). The Diggle bandwidth is too small for district-scale interpretation. Scott and CvL produce the smoothest and most interpretable surfaces and are preferred for describing the overall spatial trend.
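The four selectors named above correspond to functions in spatstat. The sketch below applies them to a hypothetical simulated pattern, so the resulting bandwidths will not match the values in the text; which selector was paired with which map in the report is not stated, and the use of the Scott bandwidth here is an assumption.

```r
library(spatstat)

set.seed(2)
# Hypothetical clustered pattern on a 5 km x 5 km window (not the Warsaw data).
W <- owin(c(0, 5), c(0, 5))
X <- rMatClust(kappa = 2, scale = 0.3, mu = 8, win = W)

# Candidate smoothing bandwidths (same units as the window, here km).
bw <- list(
  diggle = bw.diggle(X),  # cross-validation, Berman-Diggle criterion
  ppl    = bw.ppl(X),     # likelihood cross-validation
  scott  = bw.scott(X),   # Scott's rule of thumb (one value per axis)
  cvl    = bw.CvL(X)      # Cronie-van Lieshout criterion
)

# Gaussian kernel estimate of intensity using the Scott bandwidth.
lam_scott <- density(X, sigma = bw$scott)
```

Note that `bw.scott()` returns separate bandwidths for the x and y directions, so a single quoted value such as 0.51 km is presumably one of the two or a summary of both.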

4.4 Spatial Distribution Tests

To test whether intensity varies systematically with location, we applied the Kolmogorov-Smirnov CDF test and Berman test along both axes and against distance to the city centre (Palace of Culture).

| Test | Covariate | Statistic | p-value | Conclusion |
|---|---|---|---|---|
| KS test | x (E-W) | D = 0.139 | 4.5e-07 | Significant |
| KS test | y (N-S) | D = 0.121 | 1.86e-05 | Significant |
| KS test | dist to centre | D = 0.178 | 2.6e-11 | Significant |
| Berman Z2 | x | -2.37 | 0.018 | Significant |
| Berman Z2 | dist to centre | -6.47 | 9.997e-11 | Highly significant |

Figure 5: Smoothed distance to Palace of Culture (km)

All tests confirm that facility intensity is not spatially uniform. Facilities concentrate closer to the city centre: the Poisson model fitted in Section 7 estimates that intensity falls by approximately 21% for every 1 km further from the Palace of Culture. Both the east-west and north-south gradients are statistically significant.
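The tests in the table can be illustrated with the sketch below. It uses a hypothetical reference point and a simulated pattern whose intensity decays with distance, so the statistics are not those reported above; `centre`, `dist_im` and `X` are illustrative names.

```r
library(spatstat)

set.seed(3)
W <- owin(c(0, 5), c(0, 5))
centre <- c(2.5, 2.5)  # hypothetical reference point standing in for the Palace of Culture

# Covariate image: Euclidean distance (km) from each pixel to the reference point.
dist_im <- as.im(function(x, y) sqrt((x - centre[1])^2 + (y - centre[2])^2), W)

# Hypothetical pattern whose intensity decays with distance from the centre.
X <- rpoispp(function(x, y) 60 * exp(-0.5 * sqrt((x - 2.5)^2 + (y - 2.5)^2)),
             win = W)

ks_x    <- cdf.test(X, "x", test = "ks")          # east-west trend
ks_y    <- cdf.test(X, "y", test = "ks")          # north-south trend
ks_dist <- cdf.test(X, dist_im, test = "ks")      # distance to centre
bz_dist <- berman.test(X, dist_im, which = "Z2")  # Berman Z2 for the same covariate
```

A negative Z2 statistic means points sit at smaller covariate values than expected under CSR, that is, closer to the centre.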

5 Second Order Analysis

Second order analysis examines the spatial dependence between points — whether the presence of one facility influences the likelihood of another facility nearby. Unlike first order analysis which describes overall intensity, second order analysis captures interactions between points at different distances.

5.1 K, L, G, F and J Functions

We computed the full suite of inhomogeneous summary functions to characterise the spatial structure of the pattern at multiple scales.

Figure 6: Inhomogeneous summary functions — all facilities


All six functions give a consistent result:

| Function | Observed vs Poisson | Interpretation |
|---|---|---|
| Kinhom | Above reference | Clustering at all scales |
| Linhom | Above diagonal | Clustering at all scales |
| PCF | Starts at ~1.9, decreases to 1 | Strong short-range clustering |
| Finhom | Right of reference | Large empty spaces between clusters |
| Ginhom | Left of reference | Short nearest-neighbour distances |
| Jinhom | Below 1, decreasing | Clustering (J < 1 = attraction) |

Even after accounting for the non-uniform intensity gradient, the pattern shows genuine spatial dependence between facility locations — facilities tend to locate near other facilities beyond what intensity variation alone explains.
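A sketch of how the inhomogeneous summary functions are typically computed is given below, again on a hypothetical simulated pattern. The intensity estimate and its bandwidth are assumptions, since the report does not say which estimate was supplied. The nearest-neighbour and empty-space analogues are available in spatstat as `Ginhom()`, `Finhom()` and `Jinhom()`.

```r
library(spatstat)

set.seed(4)
# Hypothetical clustered pattern on a 5 km x 5 km window (not the Warsaw data).
W <- owin(c(0, 5), c(0, 5))
X <- rMatClust(kappa = 2, scale = 0.3, mu = 8, win = W)

# Smooth intensity estimate used to reweight the pattern.
lam <- density(X, sigma = bw.scott(X), positive = TRUE)

K <- Kinhom(X, lambda = lam)    # inhomogeneous K-function
L <- Linhom(X, lambda = lam)    # transformed version; L(r) = r under Poisson
g <- pcfinhom(X, lambda = lam)  # inhomogeneous pair correlation; g(r) = 1 under Poisson
```

The conclusions depend on the intensity estimate: a very small bandwidth absorbs clustering into the intensity surface, while a very large one attributes trend to interaction.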

5.2 Envelope Tests

To formally test whether the observed clustering exceeds what an inhomogeneous Poisson process would generate, we computed pointwise and global envelopes of Linhom. The pointwise envelope is based on 99 simulations; the global envelope is based on 78 simulations, of which 39 estimate the mean and 39 form the envelope.

Figure 7: Pointwise and global envelopes for Linhom

The observed Linhom curve lies outside both the pointwise and the global envelope across virtually all distances. This indicates that the clustering is statistically significant even under the inhomogeneous Poisson null hypothesis.
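The envelope construction can be sketched as below on a hypothetical simulated pattern. The simulation counts for the envelopes follow the figures given above; the `nsim = 19` used for the deviation tests is purely an assumption, because the report does not state it. With 19 simulations the smallest attainable Monte Carlo p-value is 1/20 = 0.05.

```r
library(spatstat)

set.seed(5)
# Hypothetical clustered pattern on a 5 km x 5 km window (not the Warsaw data).
W <- owin(c(0, 5), c(0, 5))
X <- rMatClust(kappa = 2, scale = 0.3, mu = 8, win = W)
lam <- density(X, sigma = bw.scott(X), positive = TRUE)

# Pointwise envelope: 99 patterns simulated from an inhomogeneous Poisson process.
env_pw <- envelope(X, Linhom, lambda = lam, nsim = 99,
                   simulate = expression(rpoispp(lam)), verbose = FALSE)

# Global envelope: nsim = 39 leads to 39 + 39 simulations, because the
# reference mean is itself estimated by simulation.
env_gl <- envelope(X, Linhom, lambda = lam, nsim = 39, global = TRUE,
                   simulate = expression(rpoispp(lam)), verbose = FALSE)

# Monte Carlo deviation tests (hypothetical nsim; the report does not state it).
mad  <- mad.test(X, Linhom, lambda = lam, nsim = 19,
                 simulate = expression(rpoispp(lam)), verbose = FALSE)
dclf <- dclf.test(X, Linhom, lambda = lam, nsim = 19,
                  simulate = expression(rpoispp(lam)), verbose = FALSE)
```

Only the global envelope and the MAD/DCLF tests give a valid overall significance statement; the pointwise envelope is a visual guide at each distance taken separately.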

Formal test results:

| Test | Statistic | p-value | Conclusion |
|---|---|---|---|
| MAD test | 0.0898 | 0.05 | Significant |
| DCLF test | 0.00306 | 0.05 | Significant |
| Clark-Evans | R = 0.633 | < 2.2e-16 | Strong clustering |
| Hopkins-Skellam | A = 0.194 | < 2.2e-16 | Strong clustering |

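The Clark-Evans R = 0.633 means facilities are on average only 63% as far from their nearest neighbour as expected under CSR, which indicates strong clustering. Both the Clark-Evans and Hopkins-Skellam tests take CSR as their null hypothesis, so part of this departure reflects the intensity gradient documented in Section 4.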
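The Clark-Evans index is the ratio of the observed mean nearest-neighbour distance to its CSR expectation, 1 / (2 √λ). The sketch below computes it by hand and via spatstat on a hypothetical simulated pattern; the edge correction and test settings are assumptions, as the report does not list them.

```r
library(spatstat)

set.seed(6)
# Hypothetical clustered pattern on a 5 km x 5 km window (not the Warsaw data).
W <- owin(c(0, 5), c(0, 5))
X <- rMatClust(kappa = 2, scale = 0.3, mu = 8, win = W)

# Clark-Evans index without edge correction, computed by hand:
# observed mean nearest-neighbour distance over 1 / (2 * sqrt(lambda)).
lambda_hat <- intensity(X)
R_manual <- mean(nndist(X)) / (1 / (2 * sqrt(lambda_hat)))

# The same index and a one-sided test from spatstat.
R_naive <- clarkevans(X, correction = "none")
ce <- clarkevans.test(X, correction = "none", alternative = "clustered")

# Hopkins-Skellam test (hypothetical settings; the report does not list them).
hs <- hopskel.test(X, alternative = "clustered")
```

Values of R below 1 indicate nearest neighbours that are closer than CSR predicts.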

6 Marked Pattern Analysis

Marked pattern analysis extends the basic point pattern by incorporating categorical labels (marks) — in our case the nine facility types. This allows us to examine whether different types of facilities show distinct spatial distributions and whether they are spatially segregated from each other.

6.1 Intensity by Facility Type

Figure 8: Kernel density by facility type


The four types shown (schools, kindergartens, libraries and universities) occupy clearly different parts of the study area:

  • Schools — hotspot in northwestern Żoliborz (peak ~10 per km²)
  • Kindergartens — hotspot in northern Żoliborz tip (peak ~8 per km²)
  • Libraries — central-western corridor, most evenly distributed type
  • Universities — sharply concentrated in southern Śródmieście, nearly absent from Żoliborz

6.2 Relative Risk

The relative risk surfaces show the probability of each facility type at each location, conditioning on a facility being present. This removes the effect of overall intensity and reveals pure spatial differentiation between types.

Figure 9: Relative risk — probability of each facility type


Schools have the highest probability (up to 0.45) in northwestern Żoliborz. Kindergartens dominate the northern tip (up to 0.65). Universities show a sharp probability peak in southern Śródmieście (up to 0.15). Libraries show the most even spatial distribution of all types.

6.3 Segregation Test

To formally test whether facility types are spatially segregated we applied the Monte Carlo segregation test with 99 simulations.

The null hypothesis is that all types share the same spatial distribution (random labelling). The result:

T = 9.24, p = 0.01 — the nine facility types are significantly spatially segregated at the 1% level. Schools and kindergartens concentrate in Żoliborz while universities concentrate in southern Śródmieście — this differentiation is far stronger than would arise by chance.
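The relative risk surfaces and the segregation test can be sketched as follows. The three-type pattern is simulated and hypothetical, and the fixed bandwidth of 0.5 km is an assumption; the report does not state how its bandwidth was chosen.

```r
library(spatstat)

set.seed(7)
W <- owin(c(0, 5), c(0, 5))
# Hypothetical three-type pattern: one type favours the west, one the east,
# and one is spread uniformly.
A <- rpoispp(function(x, y) 20 * exp(-0.6 * x), win = W)
B <- rpoispp(function(x, y) 20 * exp(-0.6 * (5 - x)), win = W)
C <- rpoispp(6, win = W)
X <- superimpose(school = A, university = B, library = C, W = W)

# Spatially varying probability of each type, with a fixed bandwidth of 0.5 km.
p <- relrisk(X, sigma = 0.5)

# Monte Carlo test of segregation; the null hypothesis is random labelling.
seg <- segregation.test(X, sigma = 0.5, nsim = 99, verbose = FALSE)
```

With 99 simulations the smallest attainable p-value is 1/100 = 0.01, which matches the value reported above.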

6.4 Schools vs Kindergartens — Independence Test

The random labelling envelope below is based on 99 simulations.

Figure 10: Schools vs Kindergartens, random labelling test

The observed Lcross curve falls within the random labelling envelope for most distances. Despite both types concentrating in Żoliborz, schools and kindergartens show no significant spatial attraction or repulsion between them. Their co-location reflects a shared response to residential demand in Żoliborz, not direct spatial dependence between the two types.
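A random labelling envelope for the cross-type L-function might be built as in the sketch below. The pattern is hypothetical: locations are clustered but the two labels are assigned at random, so the null hypothesis holds by construction.

```r
library(spatstat)

set.seed(8)
W <- owin(c(0, 5), c(0, 5))
# Hypothetical pattern: clustered locations with labels assigned at random.
U <- rMatClust(kappa = 2, scale = 0.3, mu = 8, win = W)
marks(U) <- factor(sample(c("school", "kindergarten"), npoints(U), replace = TRUE))

# Envelope of the cross-type L-function under random relabelling of the points.
env <- envelope(U, Lcross, i = "school", j = "kindergarten", nsim = 99,
                simulate = expression(rlabel(U)), verbose = FALSE)
```

Because `rlabel()` keeps the locations fixed and only permutes the marks, this test is conditional on the observed clustering of locations, which is what allows co-location driven by a shared hotspot to be separated from direct attraction between types.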

7 Point Process Models

Point process models (ppm) allow us to formally estimate how the intensity of the pattern depends on spatial covariates. We fitted four Poisson process models of increasing complexity and compared them using AIC and likelihood ratio tests.

7.1 Model Specifications

| Model | Formula | Description |
|---|---|---|
| M1 | ~ 1 | Homogeneous Poisson (CSR null model) |
| M2 | ~ x + y | Intensity varies with coordinates |
| M3 | ~ dist_im | Intensity driven by distance to city centre |
| M4 | ~ polynom(dist_im, 2) | Non-linear distance effect |

7.2 Model Comparison

Table 3: Model comparison by AIC (lower = better fit)
| Model | Formula | AIC |
|---|---|---|
| M1: Homogeneous | ~1 | -1413.2 |
| M2: x + y trend | ~x + y | -1443.9 |
| M3: Distance to centre | ~dist_im | -1452.7 |
| M4: Polynomial distance | ~polynom(dist_im, 2) | -1450.7 |

Model 3 (distance to city centre) is the best fitting model, with AIC = -1452.7 (Table 3). The likelihood ratio test confirms that distance to centre significantly improves on the null model (deviance = 39.9, p = 2.7e-10). The polynomial term in Model 4 does not improve the fit (p = 0.60), so the linear relationship is adequate.

Key coefficient from Model 3: The distance coefficient is β = -0.233 — for every 1 km further from the Palace of Culture, intensity multiplies by exp(-0.233) = 0.79, a 21% reduction per km.
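The four models and the interpretation of the distance coefficient can be sketched as follows. The data are simulated with a known effect of -0.5 per km from a hypothetical centre, so the fitted values differ from the report; only the final line reproduces the report's own arithmetic.

```r
library(spatstat)

set.seed(9)
W <- owin(c(0, 5), c(0, 5))
# Distance (km) to a hypothetical centre at (2.5, 2.5).
dist_im <- as.im(function(x, y) sqrt((x - 2.5)^2 + (y - 2.5)^2), W)
# Hypothetical data simulated with a log-linear distance effect of -0.5 per km.
X <- rpoispp(function(x, y) 60 * exp(-0.5 * sqrt((x - 2.5)^2 + (y - 2.5)^2)),
             win = W)

m1 <- ppm(X ~ 1)                     # homogeneous Poisson
m2 <- ppm(X ~ x + y)                 # log-linear trend in the coordinates
m3 <- ppm(X ~ dist_im)               # log-linear effect of distance
m4 <- ppm(X ~ polynom(dist_im, 2))   # quadratic effect of distance

aic <- sapply(list(M1 = m1, M2 = m2, M3 = m3, M4 = m4), AIC)
lrt <- anova(m1, m3, test = "LRT")   # does distance improve on the null model?

beta <- coef(m3)[["dist_im"]]        # fitted distance coefficient
change_per_km <- exp(beta) - 1       # proportional change in intensity per km

# Arithmetic behind the reported figure: beta = -0.233 gives about -21% per km.
reported_change <- exp(-0.233) - 1
```

Because the model is log-linear, the effect is multiplicative: each additional kilometre scales intensity by exp(β) regardless of the starting distance.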

7.3 Fitted Intensity

Figure 11: Fitted intensity surface — Model 3 (distance to city centre)


7.4 Model Validation

Figure 12: Relative intensity vs distance to Palace of Culture


The rhohat plot reveals that the model underestimates intensity at ~2–3 km from the Palace — the true hotspot is at the Żoliborz–Śródmieście boundary, not at the Palace itself. The residual K-function confirms remaining clustering after fitting, suggesting a cluster process model would improve the fit further.
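The two diagnostics mentioned above, and one possible cluster model, are sketched below on a hypothetical clustered pattern. The Thomas process is only one candidate; the report does not say which cluster model it has in mind.

```r
library(spatstat)

set.seed(10)
W <- owin(c(0, 5), c(0, 5))
# Distance (km) to a hypothetical centre at (2.5, 2.5).
dist_im <- as.im(function(x, y) sqrt((x - 2.5)^2 + (y - 2.5)^2), W)
# Hypothetical clustered data (not the Warsaw pattern).
X <- rMatClust(kappa = 2, scale = 0.3, mu = 8, win = W)

fit <- ppm(X ~ dist_im)

# Nonparametric estimate of how intensity depends on the covariate,
# expressed relative to the fitted model.
rh <- rhohat(fit, dist_im)

# Residual K-function: values far from zero indicate unexplained interaction.
kr <- Kres(fit)

# One candidate cluster model (Thomas process) with the same trend.
fit_cl <- kppm(X ~ dist_im, "Thomas")
```

A relative intensity near 1 across the covariate range would mean the log-linear distance term captures the trend; departures at particular distances point to where the model misfits.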

8 Line Pattern Analysis

Line pattern analysis examines the relationship between point locations and the underlying street network. Rather than treating space as a uniform plane, we consider facilities as located on or near a network of roads, which reflects how people actually access these facilities in an urban setting.

The road network for Śródmieście and Żoliborz was downloaded from OpenStreetMap using the osmdata package, covering primary, secondary, tertiary and residential streets.

8.1 Facilities on the Street Network

Figure 13: Facilities by type on the street network


The map confirms the spatial segregation identified in earlier sections — kindergartens (purple) and schools (pink) dominate the residential streets of Żoliborz in the northern section, while universities (grey) and colleges (red) are concentrated along the major arteries of southern Śródmieście.

8.2 Facility Density Along Roads

Figure 14: Road segments coloured by nearby facility count (100m buffer)


The road density map shows that most segments are dark purple (low density) while a central corridor of higher-density segments is visible running north-south through the study area.

8.3 Network Summary

Table 4: Road network accessibility statistics
| Metric | Value |
|---|---|
| Road segments with 0 facilities within 100m | 1623 (59%) |
| Road segments with 1+ facilities within 100m | 1149 (41%) |
| Maximum facilities near one road segment | 20 |

41% of road segments are within 100m of at least one facility, confirming that educational infrastructure in central Warsaw is well integrated with the street network. The maximum of 20 facilities near a single road segment occurs on the main central arteries of Śródmieście.
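Counting facilities within 100 m of each road segment can be sketched with sf as below. The three segments and four points are hypothetical, and using a within-distance query rather than an explicit buffer is an assumption about the method; the two approaches should give the same counts.

```r
library(sf)

# Hypothetical road segments and facility points in a metric CRS (EPSG:2180).
roads <- st_sfc(
  st_linestring(rbind(c(0, 0), c(100, 0))),
  st_linestring(rbind(c(0, 500), c(100, 500))),
  st_linestring(rbind(c(1000, 1000), c(1100, 1000))),
  crs = 2180
)
facilities <- st_sfc(
  st_point(c(50, 50)), st_point(c(60, 20)),
  st_point(c(50, 450)), st_point(c(5000, 5000)),
  crs = 2180
)

# For each segment, the indices of facilities within 100 m, then the count.
near <- st_is_within_distance(roads, facilities, dist = 100)
n_near <- lengths(near)

share_served <- mean(n_near >= 1)  # proportion of segments with at least one facility
max_near <- max(n_near)            # largest count near a single segment
```

Applied to the figures in Table 4, the same proportion is 1149 / (1623 + 1149), which rounds to the 41% quoted above.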

9 Conclusions

This project analysed the spatial distribution of 394 educational and public facilities across Śródmieście and Żoliborz, Warsaw. The key findings are:

  • Educational facilities in Śródmieście and Żoliborz are strongly clustered — confirmed by every method applied, with the Clark-Evans R = 0.633 showing facilities are on average only 63% as far from their nearest neighbour as expected under complete spatial randomness

  • The best fitting point process model shows that for every 1 km further from the Palace of Culture, facility intensity decreases by approximately 21% — distance to the city centre is the strongest spatial driver of facility distribution

  • Schools and kindergartens concentrate in the residential streets of Żoliborz while universities and colleges concentrate in the institutional zone of southern Śródmieście — libraries occupy a central intermediate position accessible to both districts

  • The nine facility types are significantly spatially segregated (p = 0.01) — each type occupies a distinct spatial niche reflecting the different urban character of the two districts

  • 41% of road segments are within 100m of at least one facility, confirming good street network accessibility of educational infrastructure in central Warsaw