Spatial Point Pattern Analysis of Educational and Public Facilities
This project studies how educational and public facilities are distributed across two central districts of Warsaw — Śródmieście and Żoliborz. The data was collected from OpenStreetMap and includes 394 mapped locations of different facilities such as schools, kindergartens, libraries, universities, and colleges.
The study area covers 24.06 km² and all analysis was
performed using the spatstat package in R, with the
observation window defined from the merged administrative boundary of
both districts reprojected to EPSG:2180 (Polish national grid,
metres).
Research questions:
1. Is the distribution of educational facilities spatially random, clustered, or regular?
2. Does this distribution show any relationship with proximity to the city centre?
The following R packages are used throughout this analysis:
| Package | Purpose |
|---|---|
| sf | Spatial data loading, manipulation and reprojection |
| spatstat | Core point pattern analysis framework |
| spatstat.geom | Point pattern geometry and ppp objects |
| spatstat.explore | Density estimation, K/L/G/F functions, envelopes |
| tidyverse | Data manipulation and visualisation |
| viridis | Colour palettes for spatial maps |
| RColorBrewer | Additional colour palettes for marked patterns |
| osmdata | Downloading road network from OpenStreetMap |
| GET | Global envelope tests |
We loaded two GeoJSON files from OpenStreetMap. The boundary file was
filtered to extract Śródmieście and Żoliborz districts
(admin_level = 9) and dissolved into a single polygon.
Facility points were clipped to this polygon using
st_within(). All layers were reprojected to
EPSG:2180 (Polish national grid, metres) for metric
distance calculations. The point pattern was then converted to a
ppp object and rescaled to kilometres, giving a study area
of 24.06 km² and an average intensity of 16.38
facilities per km².
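The preparation steps described above can be sketched as follows. This is a minimal illustration, not the project's actual script: the rectangular boundary and the random points are hypothetical stand-ins for the OpenStreetMap GeoJSON layers, so the resulting area and intensity will not match the figures quoted in the text.

```r
library(sf)
library(spatstat)

# Hypothetical stand-in for the dissolved district boundary: a small
# rectangle near central Warsaw in longitude/latitude (EPSG:4326).
boundary_ll <- st_sfc(
  st_polygon(list(rbind(
    c(20.97, 52.21), c(21.03, 52.21), c(21.03, 52.27),
    c(20.97, 52.27), c(20.97, 52.21)
  ))),
  crs = 4326
)

# Hypothetical facility points, some of which fall outside the boundary.
set.seed(1)
pts_ll <- st_as_sf(
  data.frame(lon = runif(200, 20.95, 21.05), lat = runif(200, 52.20, 52.28)),
  coords = c("lon", "lat"), crs = 4326
)

# Reproject both layers to EPSG:2180 so that coordinates are in metres.
boundary <- st_union(st_transform(boundary_ll, 2180))
pts <- st_transform(pts_ll, 2180)

# Keep only the points that lie within the boundary polygon.
inside <- st_within(pts, boundary, sparse = FALSE)[, 1]
pts <- pts[inside, ]

# Build the ppp object in metres, then rescale to kilometres.
xy <- st_coordinates(pts)
X_m <- ppp(xy[, 1], xy[, 2], window = as.owin(boundary))
X <- rescale(X_m, 1000, "km")

area(Window(X))  # study area in km^2
intensity(X)     # average number of points per km^2
```

Rescaling before the analysis means every later distance and bandwidth is reported in kilometres, which matches the units used in the rest of the report.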
Figure 1: Study Area — Śródmieście and Żoliborz Districts, Warsaw
The table below summarises the nine facility types included in the dataset:
| Facility Type | Count | Intensity (per km²) |
|---|---|---|
| school | 132 | 5.49 |
| kindergarten | 99 | 4.12 |
| library | 52 | 2.16 |
| college | 40 | 1.66 |
| university | 39 | 1.62 |
| language_school | 23 | 0.96 |
| driving_school | 4 | 0.17 |
| music_school | 3 | 0.12 |
| dancing_school | 2 | 0.08 |
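As a quick consistency check, the intensity column can be reproduced from the counts and the study area. The sketch below uses only the numbers in the table and the 24.06 km² area stated earlier; the variable names are illustrative.

```r
# Counts per facility type, copied from the table above.
counts <- c(school = 132, kindergarten = 99, library = 52, college = 40,
            university = 39, language_school = 23, driving_school = 4,
            music_school = 3, dancing_school = 2)
area_km2 <- 24.06  # study area stated in the text

total <- sum(counts)                        # total number of facilities
per_type <- round(counts / area_km2, 2)     # intensity of each type per km^2
overall <- round(total / area_km2, 2)       # overall intensity per km^2

print(total)
print(per_type)
print(overall)
```

The per-type intensities sum to the overall intensity up to rounding, because all types share the same observation window.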
The map below shows the spatial distribution of all 394 facilities within the study area, coloured by facility type.
Figure 1: Educational and public facilities by type in Śródmieście and Żoliborz
Figure 2: Distribution of facilities by type
The marked point pattern already suggests spatial inhomogeneity — facilities are not evenly spread across the study area. Schools and kindergartens appear more concentrated in the northern Żoliborz section while universities are visible primarily in the southern Śródmieście area. This visual impression is formally tested in the sections that follow.
First order analysis describes the overall intensity of a point pattern — how many points occur per unit area and whether this intensity is constant or varies spatially across the study region.
We examine intensity through quadrat counts, kernel density estimation, bandwidth selection, and formal statistical tests (KS and Berman) against spatial covariates including geographic coordinates and distance to the Warsaw city centre.
## Intensity
The average intensity of the point pattern is 16.38 facilities per km². This confirms a high density of educational infrastructure in central Warsaw. The intensity varies considerably by type: schools dominate with 5.49 per km², followed by kindergartens at 4.12 per km².
To formally test whether the pattern deviates from Complete Spatial Randomness (CSR), we applied the Chi-squared quadrat test using a 5×5 grid.
Figure 3: Quadrat count test — all facilities
Quadrat intensities range from 0 in peripheral quadrats to a maximum of 64.5 per km² in the densest central quadrat. Against the clustered alternative, the formal test gives X² = 183.05, df = 15, p < 2.2e-16, which is strong evidence to reject CSR in favour of clustering. The regular alternative gives p = 1, so there is no evidence of regularity.
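A quadrat test of this kind can be run in spatstat roughly as sketched here. The built-in `bei` dataset is used as a stand-in for the facility pattern (an assumption made so the example is self-contained), so the statistics differ from those reported above.

```r
library(spatstat)

# Stand-in data: the built-in 'bei' pattern, not the Warsaw facilities.
X <- bei

# Count points in a 5 x 5 grid of quadrats and convert counts to intensities.
Q <- quadratcount(X, nx = 5, ny = 5)
quadrat_intensity <- intensity(Q)

# One-sided chi-squared tests of CSR against each alternative.
qt_clustered <- quadrat.test(X, nx = 5, ny = 5, alternative = "clustered")
qt_regular <- quadrat.test(X, nx = 5, ny = 5, alternative = "regular")

print(range(quadrat_intensity))
print(qt_clustered)
print(qt_regular$p.value)
```

The two one-sided p-values are complementary, which is why a very small p-value for the clustered alternative goes together with a p-value near 1 for the regular alternative.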
Kernel density estimation was applied to visualise the continuous intensity surface across the study area. The default Gaussian kernel was used with automatic bandwidth selection.
Figure 4: Kernel density surface — all facilities
The density surface confirms the pattern observed visually — intensity peaks in the north-central zone at the Żoliborz–Śródmieście boundary, reaching values above 25 per km². Intensity decreases progressively toward the southern tip of the study area. The contour plot shows tight concentric contours in the high-density zone (values 22–24 per km²), with a single dominant hotspot rather than multiple dispersed peaks.
Four bandwidth methods were compared — Diggle (0.0099 km), PPL (0.163 km), Scott (0.51 km) and CvL (0.50 km). The Diggle bandwidth is too small for district-scale interpretation. Scott and CvL produce the smoothest and most interpretable surfaces and are preferred for describing the overall spatial trend.
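The bandwidth comparison can be sketched with the four spatstat selectors named above. The built-in `redwood` pattern is an assumed stand-in, so the selected values will differ from those reported for the Warsaw data.

```r
library(spatstat)

# Stand-in data: the built-in 'redwood' pattern, not the Warsaw facilities.
X <- redwood

# Four automatic bandwidth selectors.
bw <- list(
  diggle = bw.diggle(X),  # mean-square error cross-validation
  ppl    = bw.ppl(X),     # point process likelihood cross-validation
  scott  = bw.scott(X),   # rule of thumb, one value per coordinate axis
  cvl    = bw.CvL(X)      # Cronie and van Lieshout criterion
)
print(lapply(bw, as.numeric))

# Kernel density surfaces under a small and a large bandwidth.
D_diggle <- density(X, sigma = bw$diggle)
D_cvl <- density(X, sigma = bw$cvl)
print(range(D_cvl))
```

Plotting `D_diggle` next to `D_cvl` makes the trade-off visible: the smaller bandwidth produces isolated spikes around individual points, while the larger one shows the broad trend.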
To test whether intensity varies systematically with location, we applied the Kolmogorov-Smirnov CDF test and Berman test along both axes and against distance to the city centre (Palace of Culture).
| Test | Covariate | Statistic | p-value | Conclusion |
|---|---|---|---|---|
| KS test | x (E-W) | D = 0.139 | 4.5e-07 | Significant |
| KS test | y (N-S) | D = 0.121 | 1.86e-05 | Significant |
| KS test | dist to centre | D = 0.178 | 2.6e-11 | Significant |
| Berman Z2 | x | -2.37 | 0.018 | Significant |
| Berman Z2 | dist to centre | -6.47 | 9.997e-11 | Highly significant |
Figure 5: Smoothed distance to Palace of Culture (km)
All tests confirm that facility intensity is not spatially uniform. Facilities concentrate closer to the city centre — for every 1 km further from the Palace of Culture, intensity decreases by approximately 21%. Both east-west and north-south gradients are statistically significant.
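The covariate tests can be illustrated on simulated data. In this sketch the window, the centre location and the decay rate are all hypothetical choices, picked so that intensity falls with distance from a central point, as it does around the Palace of Culture in the report.

```r
library(spatstat)
set.seed(42)

# Hypothetical 5 km x 5 km window with a "city centre" in the middle.
W <- square(5)
centre <- c(2.5, 2.5)
dist_fun <- function(x, y) sqrt((x - centre[1])^2 + (y - centre[2])^2)

# Simulated inhomogeneous Poisson pattern whose intensity decays with distance.
X <- rpoispp(function(x, y) 40 * exp(-0.5 * dist_fun(x, y)), lmax = 40, win = W)

# Distance-to-centre covariate as a pixel image.
dist_im <- as.im(dist_fun, W = W)

# Kolmogorov-Smirnov tests against a coordinate and against distance.
ks_x <- cdf.test(X, "x", test = "ks")
ks_dist <- cdf.test(X, dist_im, test = "ks")

# Berman Z2 test against distance to the centre.
berman_dist <- berman.test(X, dist_im, which = "Z2")

print(ks_x$p.value)
print(ks_dist)
print(berman_dist)
```

A negative Z2 statistic indicates that points sit at smaller covariate values than a uniform pattern would, which is the direction reported in the table above.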
Second order analysis examines the spatial dependence between points — whether the presence of one facility influences the likelihood of another facility nearby. Unlike first order analysis which describes overall intensity, second order analysis captures interactions between points at different distances.
We computed the full suite of inhomogeneous summary functions to characterise the spatial structure of the pattern at multiple scales.
Figure 6: Inhomogeneous summary functions — all facilities
All six functions point to the same conclusion:
| Function | Observed vs Poisson | Interpretation |
|---|---|---|
| Kinhom | Above reference | Clustering at all scales |
| Linhom | Above diagonal | Clustering at all scales |
| PCF | Starts at ~1.9, decreases to 1 | Strong short-range clustering |
| Finhom | Right of reference | Large empty spaces between clusters |
| Ginhom | Left of reference | Short nearest-neighbour distances |
| Jinhom | Below 1, decreasing | Clustering (J < 1 = attraction) |
Even after accounting for the non-uniform intensity gradient, the pattern shows genuine spatial dependence between facility locations — facilities tend to locate near other facilities beyond what intensity variation alone explains.
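The six inhomogeneous summary functions can be computed along the following lines. This is a sketch on the built-in `redwood` pattern with a kernel estimate of the intensity; the choice of `bw.CvL` for that estimate is an assumption, since the report does not say which bandwidth was supplied to these functions.

```r
library(spatstat)

# Stand-in data: the built-in 'redwood' pattern, not the Warsaw facilities.
X <- redwood

# Kernel estimate of the intensity, forced to be strictly positive because
# the F, G and J versions divide by the minimum intensity.
lam <- density(X, sigma = bw.CvL(X), positive = TRUE)

K <- Kinhom(X, lambda = lam)     # inhomogeneous K-function
L <- Linhom(X, lambda = lam)     # variance-stabilised version of K
g <- pcfinhom(X, lambda = lam)   # inhomogeneous pair correlation function
Fi <- Finhom(X, lambda = lam)    # empty-space function
G <- Ginhom(X, lambda = lam)     # nearest-neighbour distance function
J <- Jinhom(X, lambda = lam)     # J-function

funs <- list(K = K, L = L, pcf = g, F = Fi, G = G, J = J)
print(sapply(funs, function(f) class(f)[1]))
```

Each result is an `fv` object, so `plot(L)` or `plot(g)` draws the observed curve against its Poisson reference.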
To formally test whether the observed clustering exceeds what an inhomogeneous Poisson process would generate, we computed pointwise envelopes from 99 simulations and global envelopes from 78 simulations (39 to estimate the mean and 39 to calculate the envelopes).
## Generating 99 simulations by evaluating expression ...
## Done.
## Generating 78 simulations by evaluating expression (39 to estimate the mean and
## 39 to calculate envelopes) ...
## Done.
Figure 7: Pointwise and global envelopes — Linhom
The observed Linhom curve lies outside the envelope across virtually all distances in both the pointwise and global tests. This confirms that the clustering is statistically significant even under the inhomogeneous Poisson null hypothesis.
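Envelopes of this kind can be generated as in the sketch below, which simulates from an inhomogeneous Poisson process with the estimated intensity. The `redwood` data, the bandwidth selector and the reduced number of simulations are assumptions made to keep the example small and fast.

```r
library(spatstat)
set.seed(1)

# Stand-in data and intensity estimate (not the Warsaw facilities).
X <- redwood
lam <- density(X, sigma = bw.CvL(X), positive = TRUE)

# Pointwise envelope: each simulation is an inhomogeneous Poisson pattern
# drawn from the estimated intensity surface.
env_pointwise <- envelope(X, Linhom, nsim = 39,
                          simulate = expression(rpoispp(lam)),
                          lambda = lam, verbose = FALSE)

# Global envelope: constant-width band based on the maximum deviation.
env_global <- envelope(X, Linhom, nsim = 39, global = TRUE,
                       simulate = expression(rpoispp(lam)),
                       lambda = lam, verbose = FALSE)

print(env_pointwise)
print(env_global)
```

When a simulation expression is supplied with `global = TRUE`, spatstat draws a second batch of patterns to estimate the mean curve, which is why the global run reports twice as many simulations as requested.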
Formal test results:
| Test | Statistic | p-value | Conclusion |
|---|---|---|---|
| MAD test | 0.0898 | 0.05 | Significant at the 5% level |
| DCLF test | 0.00306 | 0.05 | Significant at the 5% level |
| Clark-Evans | R = 0.633 | < 2.2e-16 | Strong clustering |
| Hopkins-Skellam | A = 0.194 | < 2.2e-16 | Strong clustering |
The Clark-Evans R = 0.633 means facilities are on average only 63% as far from their nearest neighbour as expected under CSR — confirming extremely strong clustering.
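To put the Clark-Evans ratio in distance terms, the expected mean nearest-neighbour distance under CSR is 1 / (2 * sqrt(lambda)) when edge effects are ignored. The sketch below applies this to the reported intensity; the implied observed distance is back-calculated from R and is not a figure taken from the analysis output.

```r
lambda <- 16.38  # facilities per km^2, from the text
R_obs <- 0.633   # reported Clark-Evans ratio

# Expected mean nearest-neighbour distance under CSR (km), ignoring edge effects.
expected_nn_km <- 1 / (2 * sqrt(lambda))

# Mean nearest-neighbour distance implied by R = observed / expected (km).
implied_nn_km <- R_obs * expected_nn_km

# Report both in metres.
print(round(1000 * c(expected = expected_nn_km, implied = implied_nn_km)))
```

Under these assumptions the expected spacing is a little over 120 m, while the implied observed spacing is just under 80 m.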
Marked pattern analysis extends the basic point pattern by incorporating categorical labels (marks) — in our case the nine facility types. This allows us to examine whether different types of facilities show distinct spatial distributions and whether they are spatially segregated from each other.
Figure 8: Kernel density by facility type
The four types occupy clearly different parts of the study area.
The relative risk surfaces show the probability of each facility type at each location, conditioning on a facility being present. This removes the effect of overall intensity and reveals pure spatial differentiation between types.
Figure 9: Relative risk — probability of each facility type
Schools have the highest probability (up to 0.45) in northwestern Żoliborz. Kindergartens dominate the northern tip (up to 0.65). Universities show a sharp probability peak in southern Śródmieście (up to 0.15). Libraries show the most even spatial distribution of all types.
To formally test whether facility types are spatially segregated we applied the Monte Carlo segregation test with 99 simulations.
The null hypothesis is that all types share the same spatial distribution (random labelling). The result:
T = 9.24, p = 0.01 — the nine facility types are significantly spatially segregated at the 1% level. Schools and kindergartens concentrate in Żoliborz while universities concentrate in southern Śródmieście — this differentiation is far stronger than would arise by chance.
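A Monte Carlo segregation test can be run with `segregation.test()` in spatstat, as sketched here on the built-in two-type `amacrine` pattern. The fixed smoothing bandwidth `sigma = 0.1` and the small number of simulations are illustrative assumptions, not the settings used in the report.

```r
library(spatstat)
set.seed(1)

# Stand-in multitype pattern with two types ("off" and "on").
X <- amacrine

# Probability surface for each type, given that a point is present.
rr <- relrisk(X, sigma = 0.1, casecontrol = FALSE)

# Monte Carlo test of the null hypothesis that all types share the same
# spatial distribution; marks are permuted in each simulation.
seg <- segregation.test(X, nsim = 19, sigma = 0.1, verbose = FALSE)

print(names(rr))
print(seg)
```

With `nsim` simulations the smallest attainable p-value is 1 / (nsim + 1), so 99 simulations allow a p-value of 0.01 and 19 simulations allow 0.05.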
## Generating 99 simulations by evaluating expression ...
## Done.
Figure 10: Schools vs Kindergartens — random labelling test
The observed Lcross curve falls within the random labelling envelope for most distances. Despite both types concentrating in Żoliborz, schools and kindergartens show no significant spatial attraction or repulsion between them. Their co-location reflects a shared response to residential demand in Żoliborz, not direct spatial dependence between the two types.
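A random labelling envelope for a cross-type L-function can be sketched as follows. The `amacrine` pattern and its types "on" and "off" stand in for the school and kindergarten marks, which is an assumption made only to keep the example runnable.

```r
library(spatstat)
set.seed(1)

# Stand-in multitype pattern with types "on" and "off".
X <- amacrine

# Random labelling: keep the locations fixed and shuffle the marks.
env <- envelope(X, Lcross, i = "on", j = "off", nsim = 39,
                simulate = expression(rlabel(X)), verbose = FALSE)

# Share of distances at which the observed curve stays inside the envelope.
inside <- env$obs >= env$lo & env$obs <= env$hi
share_inside <- mean(inside, na.rm = TRUE)
print(share_inside)
```

Because the locations are held fixed, this test conditions on the overall clustering and asks only whether the two types are arranged differently relative to each other.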
Point process models (ppm) allow us to formally estimate how the intensity of the pattern depends on spatial covariates. We fitted four Poisson process models of increasing complexity and compared them using AIC and likelihood ratio tests.
| Model | Formula | Description |
|---|---|---|
| M1 | ~ 1 | Homogeneous Poisson (CSR null model) |
| M2 | ~ x + y | Intensity varies with coordinates |
| M3 | ~ dist_im | Intensity driven by distance to city centre |
| M4 | ~ polynom(dist_im, 2) | Non-linear distance effect |
| Model | Formula | AIC |
|---|---|---|
| M1 — Homogeneous | ~1 | -1413.2 |
| M2 — x + y trend | ~x + y | -1443.9 |
| M3 — Distance to centre | ~dist_im | -1452.7 |
| M4 — Polynomial distance | ~polynom(dist_im, 2) | -1450.7 |
Model 3 (distance to city centre) is the best-fitting model, with the lowest AIC in the table (-1452.7). The likelihood ratio test confirms that distance to centre significantly improves on the null model (deviance = 39.9, p = 2.7e-10). The polynomial term in Model 4 does not improve the fit (p = 0.60), so the linear relationship is adequate.
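The four models and their comparison can be sketched on simulated data. The window, centre and decay rate below are hypothetical; only the model formulas follow the table above, and the name `dist_im` is reused for the distance covariate.

```r
library(spatstat)
set.seed(7)

# Hypothetical window and distance-to-centre covariate.
W <- square(5)
dist_fun <- function(x, y) sqrt((x - 2.5)^2 + (y - 2.5)^2)
dist_im <- as.im(dist_fun, W = W)

# Simulated pattern whose intensity decays with distance from the centre.
X <- rpoispp(function(x, y) 40 * exp(-0.5 * dist_fun(x, y)), lmax = 40, win = W)

# Poisson models of increasing complexity.
m1 <- ppm(X ~ 1)                    # homogeneous Poisson
m2 <- ppm(X ~ x + y)                # log-linear trend in the coordinates
m3 <- ppm(X ~ dist_im)              # log-linear in distance to centre
m4 <- ppm(X ~ polynom(dist_im, 2))  # quadratic in distance to centre

aic <- sapply(list(M1 = m1, M2 = m2, M3 = m3, M4 = m4), AIC)
print(aic)

# Likelihood ratio tests for nested pairs.
print(anova(m1, m3, test = "LRT"))
print(anova(m3, m4, test = "LRT"))
print(coef(m3))
```

M2 and M3 are not nested, so they are compared through AIC only, while the nested pairs M1 against M3 and M3 against M4 can also be compared with likelihood ratio tests.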
Key coefficient from Model 3: The distance coefficient is β = -0.233 — for every 1 km further from the Palace of Culture, intensity multiplies by exp(-0.233) = 0.79, a 21% reduction per km.
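The interpretation of the coefficient, and the likelihood ratio p-value quoted for Model 3, can be checked with a few lines of arithmetic. The sketch uses only the reported values β = -0.233 and deviance = 39.9, and assumes one degree of freedom because Model 3 adds a single parameter to the null model.

```r
beta <- -0.233   # reported distance coefficient from Model 3
deviance <- 39.9 # reported deviance for Model 3 against the null model

# Multiplicative change in intensity per extra kilometre from the centre.
multiplier <- exp(beta)
percent_drop <- 100 * (1 - multiplier)

# Relative intensity at 1, 2 and 3 km compared with the centre.
relative_intensity <- exp(beta * 1:3)

# Chi-squared p-value for the likelihood ratio test with 1 degree of freedom.
p_lrt <- pchisq(deviance, df = 1, lower.tail = FALSE)

print(round(multiplier, 3))
print(round(percent_drop, 1))
print(round(relative_intensity, 2))
print(signif(p_lrt, 2))
```

Because the effect is multiplicative, the decline compounds: under this model the intensity 3 km from the centre is about half of its value at the centre.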
Figure 11: Fitted intensity surface — Model 3 (distance to city centre)
Figure 12: Relative intensity vs distance to Palace of Culture
The rhohat plot reveals that the model underestimates intensity at ~2–3 km from the Palace — the true hotspot is at the Żoliborz–Śródmieście boundary, not at the Palace itself. The residual K-function confirms remaining clustering after fitting, suggesting a cluster process model would improve the fit further.
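These two diagnostics can be sketched with `rhohat()` and `Kres()`. The simulated pattern and covariate are hypothetical stand-ins, and because the simulated data are truly Poisson, the residual K-function here should stay near zero rather than show the leftover clustering described above.

```r
library(spatstat)
set.seed(7)

# Hypothetical window, covariate and simulated pattern.
W <- square(5)
dist_fun <- function(x, y) sqrt((x - 2.5)^2 + (y - 2.5)^2)
dist_im <- as.im(dist_fun, W = W)
X <- rpoispp(function(x, y) 40 * exp(-0.5 * dist_fun(x, y)), lmax = 40, win = W)

# Log-linear model in distance to the centre.
m3 <- ppm(X ~ dist_im)

# Nonparametric estimate of intensity as a function of the covariate.
rho <- rhohat(X, dist_im)

# Residual K-function of the fitted model.
k_res <- Kres(m3)

print(rho)
print(k_res)
```

Comparing `plot(rho)` with the fitted exponential curve shows where a log-linear model over- or under-predicts intensity along the distance axis.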
Line pattern analysis examines the relationship between point locations and the underlying street network. Rather than treating space as a uniform plane, we consider facilities as located on or near a network of roads, which reflects how people actually access these facilities in an urban setting.
The road network for Śródmieście and Żoliborz was downloaded from
OpenStreetMap using the osmdata package, covering primary,
secondary, tertiary and residential streets.
Figure 13: Facilities by type on the street network
The map confirms the spatial segregation identified in earlier sections — kindergartens (purple) and schools (pink) dominate the residential streets of Żoliborz in the northern section, while universities (grey) and colleges (red) are concentrated along the major arteries of southern Śródmieście.
Figure 14: Road segments coloured by nearby facility count (100m buffer)
The road density map shows that most segments are dark purple (low density) while a central corridor of higher-density segments is visible running north-south through the study area.
| Metric | Value |
|---|---|
| Road segments with 0 facilities within 100m | 1623 (59%) |
| Road segments with 1+ facilities within 100m | 1149 (41%) |
| Maximum facilities near one road segment | 20 |
41% of road segments are within 100 m of at least one facility, indicating that educational infrastructure in central Warsaw is well integrated with the street network. The maximum of 20 facilities near a single road segment occurs on the main central arteries of Śródmieście.
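Counting facilities within 100 m of each road segment can be sketched with sf as below. The three road segments and three facility points are hypothetical toy geometries in a metric CRS, and `st_is_within_distance()` is used here in place of an explicit buffer, which is an assumption about the method rather than the project's own code.

```r
library(sf)

# Hypothetical road segments in a metric CRS (coordinates in metres).
roads <- st_sf(
  id = c("A", "B", "C"),
  geometry = st_sfc(
    st_linestring(rbind(c(0, 0), c(1000, 0))),
    st_linestring(rbind(c(0, 500), c(1000, 500))),
    st_linestring(rbind(c(0, 2000), c(1000, 2000))),
    crs = 2180
  )
)

# Hypothetical facility locations.
facilities <- st_sfc(
  st_point(c(500, 50)), st_point(c(200, 450)), st_point(c(800, 560)),
  crs = 2180
)

# For each segment, list the facilities within 100 m and count them.
near <- st_is_within_distance(roads, facilities, dist = 100)
roads$n_facilities <- lengths(near)

# Share of segments with at least one nearby facility.
share_with_any <- mean(roads$n_facilities > 0)
print(st_drop_geometry(roads))
print(share_with_any)

# The same share computed from the counts reported in the table above.
reported_share <- 1149 / (1623 + 1149)
print(round(100 * reported_share))
```

Note that the resulting share depends on how the network is split into segments, so long arterial segments and short residential ones contribute equally to the percentage.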
This project analysed the spatial distribution of 394 educational and public facilities across Śródmieście and Żoliborz, Warsaw. The key findings are:
- Educational facilities in Śródmieście and Żoliborz are strongly clustered. Every method applied confirms this, with the Clark-Evans R = 0.633 showing that facilities are on average only 63% as far from their nearest neighbour as expected under complete spatial randomness.
- The best-fitting point process model shows that for every 1 km further from the Palace of Culture, facility intensity decreases by approximately 21%. Distance to the city centre is the strongest spatial driver of facility distribution among the covariates tested.
- Schools and kindergartens concentrate in the residential streets of Żoliborz, while universities and colleges concentrate in the institutional zone of southern Śródmieście. Libraries occupy a central intermediate position accessible to both districts.
- The nine facility types are significantly spatially segregated (p = 0.01). Each type occupies a distinct spatial niche reflecting the different urban character of the two districts.
- 41% of road segments are within 100 m of at least one facility, indicating good street network accessibility of educational infrastructure in central Warsaw.