Hong Kong is one of the densest urban economies in the world, with one of the highest e-commerce penetration rates in Asia. Last-mile parcel logistics — sorting hubs, locker stations, and pick-up points — therefore competes for very limited urban space. Where these facilities are placed has direct effects on delivery cost, courier travel distance, and customer waiting time. The economic question we ask is concrete: given where commercial activity actually takes place in Hong Kong, where should large, medium, and small courier stations be located?
In the first-semester project (Qu, 2026, RPubs/1391023) we approached this question through area-level K-means clustering of working population and household income across Small Tertiary Planning Unit Groups (STPUGs). The clustering yielded three tiers of districts and informed a tiered station recommendation. That study had a clear limitation: it operated on aggregated polygons, so it could not say anything about within-area concentration of demand or about the geometry of station–station spacing.
This project upgrades the analysis to a spatial point pattern
analysis (SPPA). We treat commercial points of interest (POIs)
— the CMF/COM classes of Hong Kong’s official
iGeoCom POI dataset — as a proxy for parcel demand, and we
apply the methods discussed in classes 4–9 (Baddeley, Rubak &
Turner, 2015; Kopczewska, 2020) to ask three questions.
ppm() using
resident population (WorldPop), nighttime light radiance (VIIRS Black
Marble), road density (OSM), distance to the nearest public-transit node
(multi-modal: MTR, light rail, ferry, bus terminal), and statutory
land-use zone (Hong Kong OZP) as covariates.Kcross.inhom,
markconnect, Iest, and a Monte-Carlo
segregation test. We finally validate the model-implied high-demand
zones against the empirical pattern of Hongkong Post offices via
Kcross.The economic deliverable is a tiered map of recommended courier station sites, backed by significance tests rather than visual inspection alone.
requiredPackages <- c(
"sf", "spatstat", "terra", "sp", "magrittr",
"ggplot2", "tidyverse", "viridis", "jsonlite",
"dixon", "kableExtra", "gridExtra", "patchwork", "scales", "rnaturalearth",
"ggspatial", "prettymapr", "raster", "leaflet", "htmltools","dplyr"
)
for (p in requiredPackages) {
if (!requireNamespace(p, quietly = TRUE)) install.packages(p)
}
invisible(lapply(requiredPackages, library, character.only = TRUE))
# Shared map styling — every map in the report uses the same palette
# and km-based axis frame so they read as one consistent series.
MAKO_REV <- function(n = 256) viridisLite::mako(n, direction = -1)
MAKO_DISC <- function(n) viridisLite::mako(n, direction = -1)
# Helper that draws axes with km labels for terra rasters (whose native
# units are HK1980 metres).
draw_km_axes <- function(xrange_m, yrange_m) {
axis(1, at = pretty(xrange_m), labels = pretty(xrange_m) / 1000)
axis(2, at = pretty(yrange_m), labels = pretty(yrange_m) / 1000, las = 1)
mtext("HK1980 Grid East (km)", side = 1, line = 2.5, cex = 0.95)
mtext("HK1980 Grid North (km)", side = 2, line = 2.5, cex = 0.95)
}
The methodological backbone is Baddeley, Rubak & Turner (2015),
Spatial Point Patterns: Methodology and Applications with R. We
use the workflow taught in classes 4–9: build a ppp object
inside an owin window, summarise its first-order intensity
(homogeneous and inhomogeneous), test CSR with
quadrat.test, clarkevans.test, and
hopskel.test, and study second-order structure with
Kest, Lest, and pcf (with their
inhomogeneous counterparts Kinhom, Linhom,
pcfinhom). Departures from CSR are assessed with global and
pointwise envelopes and with the mad.test and
dclf.test Monte Carlo tests (classes 5 and 8).
Kopczewska (2020), Applied Spatial Statistics and Econometrics, provides the framing for moving from spatial description to a regression-style explanation: the intensity of points is treated as the dependent variable in a Poisson process model whose log-intensity is loglinear in covariates. We follow its applied treatment of spatial residual diagnostics, lurking variable plots, and the distinction between inhomogeneity that is covariate-explained and inhomogeneity that is residual clustering.
Arbia, Espa & Giuliani (2021), Spatial Microeconometrics, motivates the economic framing. They argue that firms (and, by extension, retail-related points) cannot be modelled as if independently located: agglomeration externalities, infrastructure access, and zoning constraints generate observable spatial dependence. The authors recommend a two-step strategy — first remove inhomogeneity attributable to first-order covariates, then test whether second-order clustering remains. We adopt this recommendation: the inhomogeneous K function \(K_{\text{inhom}}\) is interpreted only after fitting the inhomogeneous Poisson model.
For the tiered-station question we follow class 9
(rmpoispp, ppm(... ~ marks),
Kcross.inhom, markconnect, Iest,
relrisk.ppp). The literature on retail and firm location
(Arbia et al., 2021, ch. 4–5) treats interaction between firm types as
analogous to interaction between species in ecology, but with the added
constraint that economic types compete for and segregate within
land-use zones. This motivates using zone as a covariate in the marked
model and segregation.test.ppp as a model-free test of mark
separation.
Direct parcel-flow data are not publicly available for Hong Kong. We follow a literature that uses POIs as a high-resolution proxy for commercial activity and for delivery demand (Wang et al., 2019, on POI density as a logistics demand indicator; Schroeder & Cesarini, 2022, on parcel-locker siting). The implicit assumption is monotonicity: areas with more shops, offices, and food-and-beverage points generate more parcels, even if the per-POI parcel rate differs by type.
Compared with OpenStreetMap, the official Hong Kong
iGeoCom dataset (Lands Department) supplies categorically
clean and quality-controlled POIs with classification codes
(CLASS/TYPE/SUBCAT) that map
directly to the zoning system. We therefore use iGeoCom for
both demand POIs and transport-node accessibility; OSM is used only for
road density.
Following Arbia et al. (2021, ch. 6), we treat Hongkong Post offices as an observed control point pattern: they were sited under a public-service mandate (universal coverage), not under profit maximisation. If our predicted high-intensity zones spatially coincide with post offices at short distance (low \(r\) in \(K_{\text{cross}}\)), this confirms that the predicted zones are operationally accessible. If at intermediate distance the cross-K is below the independence baseline, this is consistent with post offices having captured the easy sites and our recommendations identifying complementary, non-overlapping locations.
The dataset is constructed exactly as specified in
HK_dataset_checklist.xlsx. All paths are relative to the
project folder.
| Block | Variable | File | CRS |
|---|---|---|---|
| Window | 18 districts boundary | data/hksar_18_district_boundary.json |
WGS84 |
| Demand | iGeoCom POIs (CLASS ∈ CMF,
COM) |
data/iGeoCom.geojson |
WGS84 |
| Transit | iGeoCom transit nodes (TYPE ∈
RSN/LRA/FER/PIE/FET/BUS) |
data/iGeoCom.geojson |
WGS84 |
| Covariate | Resident population | data/hkg_ppp_2020.tif (WorldPop) |
WGS84 |
| Covariate | Nighttime light radiance | data/VNL_npp_2024_..._median_masked.dat.tif.gz |
WGS84 |
| Covariate | Road segments (for road density) | data/hong-kong-260507-free.shp/gis_osm_roads_free_1.shp |
WGS84 |
| Covariate | OZP land-use zoning | data/GeoJSON_Statutory_Plans/Statutory Plan GIS Data GeoJSON/ZONE.json |
HK1980 |
| Validation | Hongkong Post offices | data/post-office.json |
WGS84 |
All layers are projected to HK1980 Grid (EPSG:2326)
for analysis (units in metres, then rescaled to km in the
ppp object).
data_dir <- "data"
fp_bdry <- file.path(data_dir, "hksar_18_district_boundary.json")
fp_igeocom <- file.path(data_dir, "iGeoCom.geojson")
fp_pop <- file.path(data_dir, "hkg_ppp_2020.tif")
fp_viirs <- file.path(data_dir,
"VNL_npp_2024_global_vcmslcfg_v2_c202502261200.median_masked.dat.tif.gz")
fp_roads <- file.path(data_dir, "hong-kong-260507-free.shp",
"gis_osm_roads_free_1.shp")
fp_ozp_zone <- file.path(data_dir, "GeoJSON_Statutory_Plans",
"Statutory Plan GIS Data GeoJSON", "ZONE.json")
fp_post <- file.path(data_dir, "post-office.json")
CRS_HK <- 2326 # HK 1980 Grid (metres)
CRS_LL <- 4326 # WGS 84
The Hong Kong Town Planning Board ships the OZP statutory plans as
Esri JSON (despite the folder name). The structure is
features[[i]]$attributes +
features[[i]]$geometry$rings, not the
properties + coordinates of GeoJSON. We write
a small parser.
read_esri_polygon <- function(path) {
raw <- jsonlite::fromJSON(path, simplifyVector = FALSE)
sr <- raw$spatialReference
crs_in <- if (!is.null(sr$latestWkid)) sr$latestWkid
else if (!is.null(sr$wkid)) sr$wkid
else 2326
feats <- raw$features
geoms <- lapply(feats, function(f) {
rings <- lapply(f$geometry$rings, function(r) {
matrix(unlist(r), ncol = 2, byrow = TRUE)
})
sf::st_polygon(rings)
})
attrs <- do.call(rbind, lapply(feats, function(f) {
as.data.frame(lapply(f$attributes,
function(x) if (is.null(x)) NA else x),
stringsAsFactors = FALSE)
}))
sf::st_sf(attrs, geometry = sf::st_sfc(geoms, crs = crs_in))
}
hk18 <- st_read(fp_bdry, quiet = TRUE) %>%
st_transform(CRS_HK) %>%
st_make_valid()
hk_bdry <- st_union(hk18) %>% st_make_valid()
W <- as.owin(hk_bdry)
# rescaling factor: 1 unit in km = 1000 m
rsc <- 1000
# Shared map extent in HK1980 Grid km — used by every map in the report
# so that visual comparisons across covariates / maps are honest.
# Wide enough to show all of HK including the outlying islands.
DEV_XLIM <- c(795, 860)
DEV_YLIM <- c(805, 848)
# Scaled district polygons (km units, CRS stripped) — for overlaying on
# spatstat im / ppp plots that already live in km after rescale.ppp().
hk18_km <- hk18
sf::st_geometry(hk18_km) <- sf::st_geometry(hk18) * diag(c(1 / rsc, 1 / rsc))
sf::st_crs(hk18_km) <- NA
par(mar = c(4, 4, 4.5, 2), bty = "n")
plot(W, main = "")
plot(st_geometry(hk18), border = "grey60", add = TRUE)
title(main = "Hong Kong observation window (HK1980 Grid)", line = 2)
igc <- st_read(fp_igeocom, quiet = TRUE)
igc <- st_transform(igc, CRS_HK)
# Crop to HK window
igc <- igc[st_intersects(igc, hk_bdry, sparse = FALSE)[, 1], ]
# Demand POIs: commercial classes
demand <- igc %>%
filter(CLASS %in% c("COM", "CMF"))
# Transit nodes (4 transport modes — HK is islands so ferries matter)
transit <- igc %>%
filter(
(CLASS == "TRS" & TYPE %in% c("RSN", "LRA", "FER", "PIE", "FET")) |
(CLASS == "BUS" & TYPE == "BUS")
) %>%
mutate(transit_mode = case_when(
TYPE == "RSN" ~ "MTR/Heavy rail",
TYPE == "LRA" ~ "Light rail",
TYPE %in% c("FER", "PIE", "FET") ~ "Ferry/pier",
TYPE == "BUS" ~ "Bus terminus"
))
cat("Demand POIs (CLASS in {COM, CMF}):", nrow(demand), "\n")
## Demand POIs (CLASS in {COM, CMF}): 3875
cat("Transit nodes (multi-modal): ", nrow(transit), "\n")
## Transit nodes (multi-modal): 1345
table(transit$transit_mode)
##
## Bus terminus Ferry/pier Light rail MTR/Heavy rail
## 649 530 68 98
ggplot() +
geom_sf(data = hk18, fill = NA, colour = "grey70", linewidth = 0.2) +
geom_sf(data = demand, colour = "steelblue", size = 0.25, alpha = 0.5) +
geom_sf(data = transit, aes(colour = transit_mode), size = 1, alpha = 0.9) +
scale_colour_viridis_d(end = 0.85) +
labs(title = "iGeoCom commercial POIs (blue) and transit nodes (coloured)",
colour = "Transit mode") +
theme_minimal(base_size = 11)
pop <- terra::rast(fp_pop) # WGS84
pop <- terra::project(pop, paste0("EPSG:", CRS_HK))
pop <- terra::crop(pop, terra::vect(hk_bdry))
pop <- terra::mask(pop, terra::vect(hk_bdry))
names(pop) <- "pop"
par(mar = c(4, 4, 4.5, 6), bty = "n")
terra::plot(pop,
main = "", col = MAKO_REV(256),
xlim = DEV_XLIM * rsc, ylim = DEV_YLIM * rsc,
axes = FALSE, xlab = "", ylab = "")
draw_km_axes(DEV_XLIM * rsc, DEV_YLIM * rsc)
title(main = "WorldPop 2020 — population density", line = 2)
plot(st_geometry(hk18), add = TRUE, border = "white", lwd = 0.4)
mtext("Residents per 100 m cell; 18 districts overlaid in white",
side = 1, line = 3.6, cex = 0.85)
Coordinate reference for every map below. From here on, all maps share the same frame: HK1980 Grid (EPSG:2326), units in km. The Grid origin at
(836.694, 819.070)km corresponds to Tsim Sha Tsui at latitude 22.312°N, longitude 114.179°E (WGS84). Local conversion:+1 km East ≈ +0.0097° lon,+1 km North ≈ +0.0090° lat. To paste any (x_km, y_km) into Google Maps, recompute lat/lon withsf::st_transform(); the table at §4.8 already addslon/latcolumns.
The Black Marble file ships gzipped — terra::rast()
reads it through GDAL’s virtual /vsigzip/ driver.
viirs_path <- paste0("/vsigzip/", normalizePath(fp_viirs))
viirs <- terra::rast(viirs_path)
viirs <- terra::crop(viirs, terra::ext(c(113.7, 114.6, 22.0, 22.7)))
viirs <- terra::project(viirs, paste0("EPSG:", CRS_HK))
viirs <- terra::crop(viirs, terra::vect(hk_bdry))
viirs <- terra::mask(viirs, terra::vect(hk_bdry))
names(viirs) <- "viirs"
# replace negative / NA pixels with 0 (masked dim background)
viirs[is.na(viirs) | viirs < 0] <- 0
par(mar = c(4, 4, 4.5, 6), bty = "n")
terra::plot(log1p(viirs),
main = "", col = MAKO_REV(256),
xlim = DEV_XLIM * rsc, ylim = DEV_YLIM * rsc,
axes = FALSE, xlab = "", ylab = "")
draw_km_axes(DEV_XLIM * rsc, DEV_YLIM * rsc)
title(main = "VIIRS NTL 2024 (median masked, log1p scale)", line = 2)
plot(st_geometry(hk18), add = TRUE, border = "white", lwd = 0.4)
mtext("Annual median-masked nighttime radiance; 18 districts overlaid",
side = 1, line = 3.6, cex = 0.85)
We compute road-line density per km² with density.psp.
Major and minor roads are kept; small footways are dropped to avoid
spurious density in country-park trails.
roads <- st_read(fp_roads, quiet = TRUE) %>%
filter(fclass %in% c("motorway", "trunk", "primary", "secondary",
"tertiary", "residential", "unclassified",
"service", "living_street")) %>%
st_transform(CRS_HK) %>%
st_intersection(hk_bdry)
roads_lines <- st_cast(roads, "LINESTRING") %>%
st_geometry()
psp_roads <- as.psp(roads_lines, window = W)
psp_roads <- rescale.psp(psp_roads, rsc, "km")
road_dens_im <- density.psp(psp_roads, sigma = 0.5, dimyx = 512)
par(mar = c(5, 4.5, 4, 6), bty = "n")
plot(road_dens_im,
main = "",
col = MAKO_REV(256),
xlim = DEV_XLIM, ylim = DEV_YLIM,
axes = FALSE, ann = FALSE)
axis(1, at = pretty(DEV_XLIM))
axis(2, at = pretty(DEV_YLIM), las = 1)
mtext("HK1980 Grid East (km)", side = 1, line = 2.5)
mtext("HK1980 Grid North (km)", side = 2, line = 2.7)
plot(st_geometry(hk18_km), add = TRUE, border = "white", lwd = 0.4)
title(main = "OSM road-line density (km / km²)", line = 2)
mtext("Higher density along Kowloon corridors and northern HK Island",
side = 1, line = 3.8, cex = 0.85)
Following the data dictionary: ZONE_LABEL carries the
zone code, DESC_ENG its full name, SPUSE_ENG
the sub-use annotation (important for OU “other specified
uses”).
ozp <- read_esri_polygon(fp_ozp_zone) %>%
st_transform(CRS_HK) %>%
st_make_valid()
cat("OZP polygons:", nrow(ozp), "across",
length(unique(ozp$PLAN_NO)), "plans\n")
## OZP polygons: 12041 across 167 plans
# 4-class collapse per checklist (C / R / I / O)
ozp$land_use_zone <- dplyr::case_when(
grepl("Industri", ozp$DESC_ENG, ignore.case = TRUE) |
grepl("Industri", ozp$SPUSE_ENG, ignore.case = TRUE) ~ "I",
grepl("^C(\\(|$|DA)", ozp$ZONE_LABEL) ~ "C",
grepl("Commerc", ozp$DESC_ENG, ignore.case = TRUE) ~ "C",
grepl("^R\\(|^V$", ozp$ZONE_LABEL) ~ "R",
TRUE ~ "O"
)
ozp$land_use_zone <- factor(ozp$land_use_zone,
levels = c("C", "R", "I", "O"))
table(ozp$land_use_zone)
##
## C R I O
## 605 3859 86 7491
# Backfill HK land NOT covered by any OZP plan as "O" (country parks,
# unplanned outlying islands, etc.) so the map shows every piece of land.
ozp_dissolved <- st_union(ozp)
unplanned <- st_difference(hk_bdry, ozp_dissolved) %>%
st_make_valid()
unplanned_sf <- st_sf(land_use_zone = factor("O", levels = c("C","R","I","O")),
geometry = st_sfc(unplanned, crs = CRS_HK))
ggplot() +
geom_sf(data = unplanned_sf,
aes(fill = land_use_zone), colour = NA) +
geom_sf(data = ozp,
aes(fill = land_use_zone), colour = NA) +
# 18-district outlines for spatial reference (same overlay used in
# every other map in the report).
geom_sf(data = hk18, fill = NA, colour = "grey30",
linewidth = 0.25) +
scale_fill_viridis_d(option = "mako", direction = -1, name = "Zone") +
# Same window as every other map in the report.
coord_sf(xlim = DEV_XLIM * rsc,
ylim = DEV_YLIM * rsc,
crs = CRS_HK,
datum = sf::st_crs(CRS_HK),
expand = FALSE) +
scale_x_continuous(labels = function(x) x / rsc) +
scale_y_continuous(labels = function(y) y / rsc) +
labs(title = "OZP land use (4-class collapse, all HK land covered)",
x = "HK1980 Grid East (km)",
y = "HK1980 Grid North (km)",
caption = paste(
"Map zoomed to the densely-developed HK area so individual zones",
"remain readable; remote uninhabited islands and pure-sea zones",
"(e.g., Soko Islands, Po Toi, eastern outer islands) are cropped.",
sep = "\n")) +
theme_minimal(base_size = 11) +
theme(panel.grid.minor = element_blank(),
plot.caption = element_text(hjust = 0, size = 9,
colour = "grey30"))
ppp objectsxy_demand <- st_coordinates(demand)
xy_transit <- st_coordinates(transit)
# Unmarked demand pattern
pattern.um <- ppp(x = xy_demand[, 1], y = xy_demand[, 2], window = W)
pattern.um <- rjitter(pattern.um, 1) # ~1 m jitter to dedup
pattern.um <- rescale.ppp(pattern.um, rsc, "km") # m -> km
# Transit pattern (used to derive distance covariate)
pattern.transit <- ppp(x = xy_transit[, 1], y = xy_transit[, 2], window = W)
pattern.transit <- rescale.ppp(pattern.transit, rsc, "km")
cat("Demand pattern:\n"); print(pattern.um)
## Demand pattern:
## Planar point pattern: 3875 points
## window: polygonal boundary
## enclosing rectangle: [799.2, 869.9] x [799.8, 847.6] km
cat("\nTransit pattern:\n"); print(pattern.transit)
##
## Transit pattern:
## Planar point pattern: 1345 points
## window: polygonal boundary
## enclosing rectangle: [799.2, 869.9] x [799.8, 847.6] km
Without semester-1 K-means cluster labels in data/, we
derive marks data-drivenly from VIIRS NTL at each
demand POI: terciles of NTL define Small / Medium / Large station
candidates. This keeps the mark exogenous to the spatial
coordinates while encoding economic activity, which is what the proposal
calls for.
demand$viirs_ntl <- terra::extract(viirs, terra::vect(demand))[, 2]
demand$viirs_ntl[is.na(demand$viirs_ntl)] <- 0
q <- quantile(demand$viirs_ntl, c(1/3, 2/3), na.rm = TRUE)
demand$station_tier <- cut(
demand$viirs_ntl,
breaks = c(-Inf, q, Inf),
labels = c("Small", "Medium", "Large"),
include.lowest = TRUE
)
table(demand$station_tier)
##
## Small Medium Large
## 1294 1304 1277
pattern.m <- ppp(x = xy_demand[, 1], y = xy_demand[, 2],
window = W,
marks = demand$station_tier)
pattern.m <- rjitter(pattern.m, 1)
pattern.m <- rescale.ppp(pattern.m, rsc, "km")
print(pattern.m)
## Marked planar point pattern: 3875 points
## Multitype, with levels = Small, Medium, Large
## window: polygonal boundary
## enclosing rectangle: [799.2, 869.9] x [799.8, 847.6] km
ppm()ppm() needs covariates as im objects on a
common grid. We resample each raster to a 200 m grid and convert.
target_template <- terra::rast(terra::ext(terra::vect(hk_bdry)),
resolution = 200,
crs = paste0("EPSG:", CRS_HK))
resample_to <- function(r, template) {
r2 <- terra::resample(r, template, method = "bilinear")
r2 <- terra::mask(r2, terra::vect(hk_bdry))
r2
}
pop_r <- resample_to(pop, target_template)
viirs_r <- resample_to(viirs, target_template)
# convert terra SpatRaster -> spatstat im, in km
terra_to_im <- function(r) {
m <- terra::as.matrix(r, wide = TRUE)
m <- m[nrow(m):1, ] # terra is top-down, im is bottom-up
ext <- terra::ext(r)
im(m,
xrange = c(ext$xmin, ext$xmax) / rsc,
yrange = c(ext$ymin, ext$ymax) / rsc,
unitname = c("km", "km"))
}
im_pop <- terra_to_im(pop_r)
im_viirs <- terra_to_im(viirs_r)
im_road <- road_dens_im # already in km units
im_dtran <- distmap(pattern.transit) # km, from rescaled ppp
# OZP factor image: rasterise, convert to im, restore factor levels
ozp_r <- terra::rasterize(terra::vect(ozp), target_template,
field = "land_use_zone")
lv <- terra::levels(ozp_r)[[1]]
m_ozp <- terra::as.matrix(ozp_r, wide = TRUE); m_ozp <- m_ozp[nrow(m_ozp):1, ]
ext_z <- terra::ext(ozp_r)
im_zone_int <- im(m_ozp,
xrange = c(ext_z$xmin, ext_z$xmax) / rsc,
yrange = c(ext_z$ymin, ext_z$ymax) / rsc,
unitname = c("km", "km"))
im_zone <- eval.im(factor(im_zone_int,
levels = lv$ID,
labels = lv[[2]]))
par(mfrow = c(2, 2), mar = c(2, 2, 3, 1), bty = "n")
plot(im_pop, main = "Population (WorldPop)", col = MAKO_REV(256))
plot(im_viirs, main = "Nighttime light (VIIRS)", col = MAKO_REV(256))
plot(im_road, main = "Road density", col = MAKO_REV(256))
plot(im_dtran, main = "Distance to transit (km)", col = MAKO_REV(256))
par(mfrow = c(1, 1))
# Land-use zone — drawn larger with full axes, same coordinate frame as
# the WorldPop / VIIRS / road / quadrat / KDE maps for visual comparison.
par(mar = c(5, 4.5, 4, 6), bty = "n")
plot(im_zone,
main = "",
col = MAKO_DISC(length(levels(im_zone))),
xlim = DEV_XLIM, ylim = DEV_YLIM,
axes = FALSE, ann = FALSE)
axis(1, at = pretty(DEV_XLIM))
axis(2, at = pretty(DEV_YLIM), las = 1)
mtext("HK1980 Grid East (km)", side = 1, line = 2.5)
mtext("HK1980 Grid North (km)", side = 2, line = 2.7)
plot(st_geometry(hk18_km), add = TRUE, border = "white", lwd = 0.4)
title(main = "Land-use zone (4-class)", line = 2)
mtext(paste(
"Rasterised OZP polygons on the analysis grid; non-OZP land",
"(country parks, outlying islands) imputed as O (Other)."),
side = 1, line = 3.8, cex = 0.85)
demand_xy_sf <- st_as_sf(
data.frame(x = xy_demand[, 1], y = xy_demand[, 2]),
coords = c("x", "y"), crs = CRS_HK
)
demand$pop <- terra::extract(pop_r, terra::vect(demand_xy_sf))[, 2]
# road_dens_im is on a km grid (after rescale.psp), so interp.im needs
# point coordinates in km, not metres.
demand$road_dens <- spatstat.geom::interp.im(
road_dens_im,
x = xy_demand[, 1] / rsc,
y = xy_demand[, 2] / rsc
)
demand$dist_tran <- nncross(
ppp(xy_demand[, 1] / rsc, xy_demand[, 2] / rsc,
window = pattern.transit$window),
pattern.transit
)$dist
demand %>%
st_drop_geometry() %>%
group_by(station_tier) %>%
summarise(
n = n(),
pop_mean = mean(pop, na.rm = TRUE),
viirs_mean = mean(viirs_ntl, na.rm = TRUE),
road_mean = mean(road_dens, na.rm = TRUE),
dtran_mean = mean(dist_tran, na.rm = TRUE),
.groups = "drop"
) %>%
kable(digits = 3,
caption = "Demand POI covariate means by NTL-derived station tier",
col.names = c("Tier", "N POIs", "Pop (per cell)", "VIIRS",
"Road dens", "Dist to transit (km)")) %>%
kable_styling(full_width = FALSE)
| Tier | N POIs | Pop (per cell) | VIIRS | Road dens | Dist to transit (km) |
|---|---|---|---|---|---|
| Small | 1294 | 409.0 | 33.99 | 10.98 | 0.291 |
| Medium | 1304 | 507.9 | 62.36 | 17.38 | 0.172 |
| Large | 1277 | 509.6 | 90.38 | 22.55 | 0.185 |
demand %>%
st_drop_geometry() %>%
ggplot(aes(x = log1p(pop), y = log1p(viirs_ntl), colour = station_tier)) +
geom_point(alpha = 0.5, size = 0.6) +
scale_colour_viridis_d(end = 0.85) +
labs(
title = "Demand POI covariates: residential population vs. nighttime light",
subtitle = "Each point is one CMF/COM POI; tier from NTL terciles",
x = "log(1 + WorldPop pop)", y = "log(1 + VIIRS NTL)",
colour = "Tier"
) +
theme_minimal(base_size = 11)
The economic question — “where should we put which station tier?” — drives the order of the analysis. We first establish that demand is non-random and inhomogeneous (4.1), study its second-order structure (4.2), identify the spatial covariates that explain that inhomogeneity (4.3), ask whether tier-specific patterns interact (4.4–4.5), validate the model (4.6) and the predicted hot-spots against post offices (4.7), and produce the tiered recommendation (4.8).
qdr <- quadratcount(pattern.um, nx = 10, ny = 10)
par(mar = c(5, 4.5, 4, 6), bty = "n")
plot(intensity(qdr, image = TRUE),
main = "",
col = MAKO_REV(256),
xlim = DEV_XLIM, ylim = DEV_YLIM,
axes = FALSE, ann = FALSE)
axis(1, at = pretty(DEV_XLIM))
axis(2, at = pretty(DEV_YLIM), las = 1)
mtext("HK1980 Grid East (km)", side = 1, line = 2.5)
mtext("HK1980 Grid North (km)", side = 2, line = 2.7)
plot(pattern.um, add = TRUE, pch = 20, cex = 0.15, cols = "white")
plot(st_geometry(hk18_km), add = TRUE, border = "white", lwd = 0.4)
title(main = "Demand intensity — 10x10 quadrats (per km²)", line = 2)
mtext("White dots = individual POIs; brightest cell ≈ Yau Tsim Mong",
side = 1, line = 3.8, cex = 0.85)
quadrat.test(pattern.um, nx = 10, ny = 10, alternative = "clustered")
##
## Chi-squared test of CSR using quadrat counts
##
## data: pattern.um
## X2 = 22605, df = 93, p-value <0.0000000000000002
## alternative hypothesis: clustered
##
## Quadrats: 94 tiles (irregular windows)
clarkevans.test(pattern.um, alternative = "clustered")
##
## Clark-Evans test
## CDF correction
## Z-test
##
## data: pattern.um
## R = 0.2, p-value <0.0000000000000002
## alternative hypothesis: clustered (R < 1)
hopskel.test(pattern.um, alternative = "clustered")
##
## Hopkins-Skellam test of CSR
## using F distribution
##
## data: pattern.um
## A = 0.0023, p-value <0.0000000000000002
## alternative hypothesis: clustered (A < 1)
Following class 4 we estimate the intensity by kernel smoothing under data-driven bandwidth rules. Hong Kong is highly inhomogeneous (urban core vs. country parks), so we expect short bandwidths to over-fit and Scott / CvL to give the operationally meaningful map.
bw_ppl <- bw.ppl(pattern.um)
bw_scott <- bw.scott(pattern.um)
d_ppl <- density(pattern.um, sigma = bw_ppl,
leaveoneout = FALSE, positive = TRUE)
d_scott <- density(pattern.um, sigma = bw_scott,
leaveoneout = FALSE, positive = TRUE)
par(mfrow = c(1, 2), mar = c(5, 4.5, 3.5, 5), bty = "n")
plot(d_ppl,
main = "",
col = MAKO_REV(256),
xlim = DEV_XLIM, ylim = DEV_YLIM,
axes = FALSE, ann = FALSE)
axis(1, at = pretty(DEV_XLIM))
axis(2, at = pretty(DEV_YLIM), las = 1)
mtext("HK1980 Grid East (km)", side = 1, line = 2.5)
mtext("HK1980 Grid North (km)", side = 2, line = 2.7)
title(main = "KDE — likelihood CV (ppl)", line = 1.5)
plot(st_geometry(hk18_km), add = TRUE, border = "white", lwd = 0.4)
plot(d_scott,
main = "",
col = MAKO_REV(256),
xlim = DEV_XLIM, ylim = DEV_YLIM,
axes = FALSE, ann = FALSE)
axis(1, at = pretty(DEV_XLIM))
axis(2, at = pretty(DEV_YLIM), las = 1)
mtext("HK1980 Grid East (km)", side = 1, line = 2.5)
mtext("HK1980 Grid North (km)", side = 2, line = 2.7)
title(main = "KDE — Scott rule", line = 1.5)
plot(st_geometry(hk18_km), add = TRUE, border = "white", lwd = 0.4)
par(mfrow = c(1, 1))
K_hom <- Kest(pattern.um,
correction = c("border", "isotropic", "Ripley", "best"))
K_inh <- Kinhom(pattern.um,
correction = c("border", "isotropic", "Ripley", "best"))
L_inh <- Linhom(pattern.um)
g_inh <- pcfinhom(pattern.um)
par(mfrow = c(2, 2), mar = c(4, 4, 2, 1))
plot(K_hom, main = "K (homogeneous)")
plot(K_inh, main = "K (inhomogeneous)")
plot(L_inh, . - r ~ r, main = "L_inhom centred on r")
plot(g_inh, main = "g_inhom (pair correlation)")
par(mfrow = c(1, 1))
If \(K_{\text{inhom}}\) stays above the CSR line after centring on the inhomogeneous intensity, the residual clustering is additional to first-order variation — that is, sites attract more sites given the local intensity. We confirm with a Monte Carlo envelope and the MAD test.
# Smoke-test pass: nsim=9 + rmax=3 km to keep this chunk under ~2 min.
# For the FINAL report, raise nsim to 99 and drop the rmax cap.
NSIM_FAST <- 9
RMAX_KM <- 3
E_g <- envelope(pattern.um, Linhom,
nsim = NSIM_FAST, rank = 1, global = TRUE,
rmax = RMAX_KM, verbose = FALSE)
plot(E_g, . - r ~ r,
main = sprintf("Linhom global envelope (%d sims, r ≤ %d km)",
NSIM_FAST, RMAX_KM))
mad.test(pattern.um, Linhom, nsim = NSIM_FAST, rmax = RMAX_KM, verbose = FALSE)
##
## Maximum absolute deviation test of CSR
## Monte Carlo test based on 9 simulations
## Summary function: L[inhom](r)
## Reference function: theoretical
## Alternative: two.sided
## Interval of distance values: [0, 3] km
## Test statistic: Maximum absolute deviation
## Deviation = observed minus theoretical
##
## data: pattern.um
## mad = 1.4, rank = 1, p-value = 0.1
dclf.test(pattern.um, Linhom, nsim = NSIM_FAST, rmax = RMAX_KM, verbose = FALSE)
##
## Diggle-Cressie-Loosmore-Ford test of CSR
## Monte Carlo test based on 9 simulations
## Summary function: L[inhom](r)
## Reference function: theoretical
## Alternative: two.sided
## Interval of distance values: [0, 3] km
## Test statistic: Integral of squared absolute deviation
## Deviation = observed minus theoretical
##
## data: pattern.um
## u = 4.4, rank = 1, p-value = 0.1
Following class 8, we move from description to a Poisson process model. Log-intensity is loglinear in covariates: \(\log \lambda(u) = \beta_0 + \beta_1\,\text{pop}(u) + \beta_2\,\text{viirs}(u) + \beta_3\,\text{road}(u) + \beta_4\,\text{dtran}(u) + \alpha_{\text{zone}(u)}\).
m0 <- ppm(pattern.um ~ 1)
m1 <- ppm(pattern.um ~ im_pop)
m2 <- ppm(pattern.um ~ im_pop + im_viirs)
m3 <- ppm(pattern.um ~ im_pop + im_viirs + im_road + im_dtran)
m4 <- ppm(pattern.um ~ im_pop + im_viirs + im_road + im_dtran + im_zone)
coef(summary(m4))
data.frame(model = c("m0", "m1", "m2", "m3", "m4"),
AIC = c(AIC(m0), AIC(m1), AIC(m2), AIC(m3), AIC(m4)))
anova(m0, m1, m2, m3, m4, test = "LR")
A unit increase in any covariate multiplies the intensity by \(\exp(\beta)\). We expect \(\beta_{\text{pop}}, \beta_{\text{viirs}}, \beta_{\text{road}} > 0\) (more people / activity / connectivity \(\to\) more POIs) and \(\beta_{\text{dtran}} < 0\) (further from transit \(\to\) fewer POIs). \(\alpha_C\) should be the largest zone effect.
m_marks <- ppm(pattern.m ~ marks)
m_marksx <- ppm(pattern.m ~ marks + im_pop + im_viirs + im_road + im_dtran)
m_inter <- ppm(pattern.m ~ marks * im_pop +
im_viirs + im_road + im_dtran)
coef(summary(m_inter))
anova(m_marksx, m_inter, test = "LR")
The interaction \(\text{marks} \times \text{im\_pop}\) tests whether residential population pulls in different tiers at different rates: large hubs are expected to scale less than 1-for-1 with population (catchment areas are bigger), while small pick-up points should scale closer to 1-for-1.
Three plots, each answering a different sub-question about how the three station tiers relate in space:
| Plot | Question | Reads |
|---|---|---|
alltypes(Kcross.inhom) (9-panel) |
At a given distance r, does tier i attract or repel tier j relative to CSR independence? | A curve above the dashed envelope = attraction; below = repulsion. |
relrisk(casecontrol = TRUE, control = "Small") |
Where on the map does each non-control tier dominate, relative to Small? | Bright pixels = Medium/Large is more likely than Small in that pixel; dark = vice versa. |
envelope(Lcross, simulate = rlabel) (Large vs Small,
Large vs Medium) |
Is the cross-pattern significantly different from random labelling at small r? | Observed curve outside the grey band ⇒ reject “marks are exchangeable” ⇒ spatial segregation. |
The redundant 9-panel markconnect and the second
relrisk(control = "Medium") plot were dropped from the
previous draft — they restate the same signal.
# 1. Pairwise inhomogeneous K_cross — the canonical second-order summary
plot(alltypes(pattern.m, Kcross.inhom, rmax = RMAX_KM),
main = sprintf("Inhomogeneous K_cross by station tier (r ≤ %d km)",
RMAX_KM))
# 2. Log-relative-risk surface: where is each tier favoured over Small?
plot(relrisk(pattern.m, casecontrol = TRUE, relative = TRUE,
control = "Small"),
main = "Relative risk: tier vs Small (log scale)",
col = MAKO_REV(256))
# 3. Random-label envelopes on Lcross — formal segregation tests for the
# two pairs that matter operationally (hub vs gap-filler, hub vs feeder).
E_seg_LS <- envelope(
pattern.m, Lcross,
i = "Large", j = "Small",
nsim = NSIM_FAST,
rmax = RMAX_KM,
simulate = expression(rlabel(pattern.m)),
global = TRUE,
verbose = FALSE
)
plot(E_seg_LS, . - r ~ r,
main = sprintf(
"Random-label envelope on L_cross(Large, Small) (%d sims)",
NSIM_FAST))
E_seg_LM <- envelope(
pattern.m, Lcross,
i = "Large", j = "Medium",
nsim = NSIM_FAST,
rmax = RMAX_KM,
simulate = expression(rlabel(pattern.m)),
global = TRUE,
verbose = FALSE
)
plot(E_seg_LM, . - r ~ r,
main = sprintf(
"Random-label envelope on L_cross(Large, Medium) (%d sims)",
NSIM_FAST))
A positive \(K_{\text{cross,Large,Medium}}(r)\) above the independence line for small \(r\) implies that medium stations sit near large hubs — consistent with a hub-and-spoke spatial logic. If \(K_{\text{cross,Large,Small}}(r)\) stays close to the independence line, small pick-up points are spatially complementary and disperse to fill catchment gaps.
# Four-panel diagnostic (Baddeley class 8, p. 401):
# top-left: residual measure | top-right: lurking on Y
# bot-left: lurking on X | bot-right: smoothed residual field
# This panel set already includes the smoothed-residual surface, so we
# do NOT call Smooth(residuals(.)) separately — Smooth.msr -> density.ppp
# is fragile around boundary pixels where predict.ppm can be NA.
par(mar = c(4, 4, 3, 4), cex.axis = 0.85, cex.lab = 0.9)
# diagnose.ppm's default "image" panels use a fluorescent green background
# that clashes with the rest of the report's mako palette.
# Switch spatstat to monochrome (greyscale) for this one call only, then
# restore. Combined with plot.smooth = "contour" the bottom-right panel
# becomes contours-only.
old_mono <- spatstat.options("monochrome")
spatstat.options(monochrome = TRUE)
diagnose.ppm(m4, plot.smooth = "contour")
spatstat.options(monochrome = old_mono)
# Partial residuals: how does the fitted log-intensity track each
# continuous covariate?
par.res_pop <- parres(m4, "im_pop")
plot(par.res_pop, main = "Partial residual: population")
par.res_dt <- parres(m4, "im_dtran")
plot(par.res_dt, main = "Partial residual: distance to transit")
# Lurking variable plot: any signed-residual trend along im_dtran?
# The envelope simulation prints "1, 2, 3, ..." to stdout for each pattern;
# capture.output() keeps that out of the rendered HTML while still letting
# the plot reach the graphics device.
invisible(capture.output(
lurking(m4, im_dtran, type = "raw", cumulative = FALSE,
envelope = TRUE, nsim = NSIM_FAST,
main = "Lurking: distance to transit")
))
## Model diagnostics (raw residuals)
## Diagnostics available:
## four-panel plot
## mark plot
## smoothed residual field
## x cumulative residuals
## y cumulative residuals
## sum of all residuals
## sum of raw residuals in entire window = -0.0000002796
## area of entire window = 2754
## quadrature area = 825.9
## range of smoothed field = [-0.352, 0.2126]
post <- jsonlite::fromJSON(fp_post, simplifyDataFrame = TRUE)$data %>%
as_tibble() %>%
mutate(longitude = as.numeric(longitude),
latitude = as.numeric(latitude)) %>%
filter(!is.na(longitude), !is.na(latitude))
post_sf <- st_as_sf(post, coords = c("longitude", "latitude"),
crs = CRS_LL) %>%
st_transform(CRS_HK) %>%
st_intersection(hk_bdry)
post_xy <- st_coordinates(post_sf)
pattern.po <- ppp(x = post_xy[, 1], y = post_xy[, 2], window = W) %>%
rjitter(1) %>%
rescale.ppp(rsc, "km")
cat("Hongkong Post offices (in window):", npoints(pattern.po), "\n")
## Hongkong Post offices (in window): 119
combined <- superimpose(
poi = unmark(pattern.um),
post = pattern.po
)
par(mar = c(5, 4.5, 4, 3), bty = "n")
plot(Window(pattern.um),
main = "",
xlim = DEV_XLIM, ylim = DEV_YLIM,
border = "grey60",
axes = FALSE, ann = FALSE)
axis(1, at = pretty(DEV_XLIM))
axis(2, at = pretty(DEV_YLIM), las = 1)
mtext("HK1980 Grid East (km)", side = 1, line = 2.5)
mtext("HK1980 Grid North (km)", side = 2, line = 2.7)
plot(st_geometry(hk18_km), add = TRUE, border = "grey70", lwd = 0.3)
plot(unmark(pattern.um), add = TRUE,
pch = 20, cex = 0.25, cols = "grey60")
plot(pattern.po, add = TRUE, pch = 24, cex = 0.9,
bg = "gold", cols = "black")
title(main = "POIs and Hongkong Post offices", line = 2)
legend("topright", inset = 0.01,
legend = c("POI (CMF/COM)", "Post office"),
pch = c(20, 24),
col = c("grey60", "black"),
pt.bg = c(NA, "gold"),
pt.cex = c(0.8, 1),
bty = "o", bg = "white", cex = 0.9)
K_cross_pp <- Kcross(combined, i = "post", j = "poi",
correction = c("isotropic", "Ripley"))
plot(K_cross_pp, main = "K_cross: post offices vs. POIs")
dpost <- nncross(unmark(pattern.um), pattern.po)$dist
hist(dpost * rsc, breaks = 60,
col = "steelblue", border = "white",
main = "Demand POI -> nearest Hongkong Post office",
xlab = "Distance (m)")
A \(K_{\text{cross}}(r)\) above the theoretical line at small \(r\) confirms that the empirical post-office network already captures the high-demand zones our model predicts. Where it does not — that is, predicted hot-spots that are far from any post office — those are the locations where new courier stations would add the most marginal coverage.
# Higher-resolution prediction grid (256 -> 512) for smoother maps.
lambda_hat <- predict.ppm(m4, type = "intensity", dimyx = 512)
W_km <- Window(lambda_hat)
# DEV_XLIM / DEV_YLIM are defined in the `boundary` chunk so that every
# map in the report (roads, quadrat, recommend) shares the same frame.
par(mar = c(5, 4.5, 4, 6), bty = "n")
plot(lambda_hat,
main = "",
col = MAKO_REV(256),
xlim = DEV_XLIM, ylim = DEV_YLIM,
axes = FALSE, ann = FALSE)
axis(1, at = pretty(DEV_XLIM))
axis(2, at = pretty(DEV_YLIM), las = 1)
mtext("HK1980 Grid East (km)", side = 1, line = 2.5)
mtext("HK1980 Grid North (km)", side = 2, line = 2.7)
plot(st_geometry(hk18_km), add = TRUE, border = "white", lwd = 0.4)
title(main = "Fitted intensity (POIs / km²)", line = 2)
mtext("m4 model; bright cluster ≈ Yau Tsim Mong – Sham Shui Po corridor",
side = 1, line = 3.8, cex = 0.85)
# Tier cut-offs (top 30%, 90%, 98% of the fitted intensity).
brks <- quantile(as.vector(lambda_hat),
probs = c(0.70, 0.90, 0.98), na.rm = TRUE)
tier_im <- eval.im(
ifelse(lambda_hat >= brks[3], 3,
ifelse(lambda_hat >= brks[2], 2,
ifelse(lambda_hat >= brks[1], 1, NA)))
)
plot(tier_im,
main = "",
col = MAKO_DISC(3),
xlim = DEV_XLIM, ylim = DEV_YLIM,
axes = FALSE, ann = FALSE)
axis(1, at = pretty(DEV_XLIM))
axis(2, at = pretty(DEV_YLIM), las = 1)
mtext("HK1980 Grid East (km)", side = 1, line = 2.5)
mtext("HK1980 Grid North (km)", side = 2, line = 2.7)
plot(st_geometry(hk18_km), add = TRUE, border = "white", lwd = 0.4)
title(main = "Recommended station tiers (1 = Small / 2 = Medium / 3 = Large)",
line = 2)
mtext("Cut-offs = 70 % / 90 % / 98 % quantiles of fitted intensity",
side = 1, line = 3.8, cex = 0.85)
# === Coverage-gap analysis ===
# "Gap" = high-intensity pixel whose nearest post office is far enough
# that a new courier station would add real marginal value.
# tier_im >= 2 (top 30 % of fitted intensity, i.e. Medium OR Large)
# post_im > 0.5 (further than 500 m from any post office)
post_im <- distmap(pattern.po)
gap_logical <- eval.im(tier_im >= 2 & post_im > 0.5)
gap_owin <- as.owin(gap_logical) # the TRUE-region as an owin
cat("Coverage-gap pixels (TRUE / total non-NA):",
sum(gap_logical$v, na.rm = TRUE), "/",
sum(!is.na(gap_logical$v)), "\n")
## Coverage-gap pixels (TRUE / total non-NA): 2743 / 19760
# All layers below are in km units (lambda_hat / gap_owin / pattern.po).
# bty="n" + box=FALSE removes the duplicate vertical-line on the y-axis.
plot(W_km,
main = "",
xlim = DEV_XLIM, ylim = DEV_YLIM,
border = "grey60",
axes = TRUE,
box = FALSE,
xlab = "HK1980 Grid East (km)",
ylab = "HK1980 Grid North (km)")
plot(gap_owin, add = TRUE, col = "firebrick", border = NA)
plot(pattern.po, add = TRUE, pch = 24, cex = 0.9,
bg = "gold", cols = "black")
title(main = "Coverage gap (top-30 % intensity & > 500 m from any PO)",
line = 2)
legend("topright", inset = 0.01,
legend = c("Coverage-gap zone", "Post office"),
fill = c("firebrick", NA),
border = c(NA, NA),
pch = c(NA, 24),
pt.bg = c(NA, "gold"),
col = c(NA, "black"),
bty = "o", bg = "white", cex = 0.9)
mtext("Red = highest-marginal-utility courier-station sites",
side = 1, line = 2.7, cex = 0.85)
# Top-10 high-demand 500m grid cells, ranked by fitted intensity,
# with each cell's distance to the nearest existing post office.
# Use the lambda_hat im itself: extract pixel centres of the top decile.
lh_mat <- as.matrix(lambda_hat)
lh_xcol <- lambda_hat$xcol
lh_yrow <- lambda_hat$yrow
# Sample on a 500 m grid step so the top-10 are operationally distinct.
step <- 0.5 # km
xs <- seq(min(lh_xcol), max(lh_xcol), by = step)
ys <- seq(min(lh_yrow), max(lh_yrow), by = step)
grid_df <- expand.grid(x = xs, y = ys) %>%
as_tibble() %>%
mutate(
lambda = spatstat.geom::interp.im(lambda_hat, x, y),
dist_post_m = spatstat.geom::interp.im(post_im, x, y) * rsc
) %>%
filter(is.finite(lambda),
inside.owin(x, y, w = Window(lambda_hat))) # keep only in-window cells
top10_overall <- grid_df %>%
slice_max(lambda, n = 10) %>%
arrange(desc(lambda)) %>%
mutate(rank = row_number())
top10_gap <- grid_df %>%
filter(dist_post_m > 500) %>%
slice_max(lambda, n = 10) %>%
arrange(desc(lambda)) %>%
mutate(rank = row_number())
# Helper: HK1980 Grid km --> WGS84 lon/lat so the coords are usable
# (paste lat,lon into Google Maps).
add_lonlat <- function(df) {
pts <- sf::st_as_sf(
df %>% mutate(x_m = x * rsc, y_m = y * rsc),
coords = c("x_m", "y_m"), crs = CRS_HK
) %>%
sf::st_transform(CRS_LL)
ll <- sf::st_coordinates(pts)
df %>%
mutate(lon = round(ll[, 1], 4),
lat = round(ll[, 2], 4))
}
# Persist these for the interactive leaflet chunk that follows.
top10_overall_ll <- top10_overall %>% add_lonlat()
top10_gap_ll <- top10_gap %>% add_lonlat()
top10_overall_ll %>%
transmute(rank,
x_km = round(x, 2), y_km = round(y, 2),
lon, lat,
lambda = round(lambda, 1),
dist_post_m = round(dist_post_m, 0)) %>%
kable(caption = "Top-10 highest-intensity 500-m cells (any distance to PO)") %>%
kable_styling(full_width = FALSE)
| rank | x_km | y_km | lon | lat | lambda | dist_post_m |
|---|---|---|---|---|---|---|
| 1 | 835.3 | 819.9 | 114.2 | 22.32 | 236.6 | 449 |
| 2 | 835.3 | 820.4 | 114.2 | 22.32 | 219.3 | 228 |
| 3 | 834.8 | 821.4 | 114.2 | 22.33 | 159.8 | 179 |
| 4 | 834.8 | 820.9 | 114.2 | 22.33 | 112.8 | 501 |
| 5 | 834.8 | 819.9 | 114.2 | 22.32 | 108.7 | 331 |
| 6 | 830.3 | 825.9 | 114.1 | 22.37 | 97.4 | 443 |
| 7 | 838.3 | 822.4 | 114.2 | 22.34 | 93.7 | 361 |
| 8 | 833.8 | 816.4 | 114.2 | 22.29 | 89.6 | 187 |
| 9 | 835.3 | 818.9 | 114.2 | 22.31 | 86.4 | 455 |
| 10 | 821.8 | 833.9 | 114.0 | 22.44 | 86.1 | 666 |
top10_gap_ll %>%
transmute(rank,
x_km = round(x, 2), y_km = round(y, 2),
lon, lat,
lambda = round(lambda, 1),
dist_post_m = round(dist_post_m, 0)) %>%
kable(caption = "Top-10 highest-intensity cells > 500 m from any post office") %>%
kable_styling(full_width = FALSE)
| rank | x_km | y_km | lon | lat | lambda | dist_post_m |
|---|---|---|---|---|---|---|
| 1 | 834.8 | 820.9 | 114.2 | 22.33 | 112.8 | 501 |
| 2 | 821.8 | 833.9 | 114.0 | 22.44 | 86.1 | 666 |
| 3 | 835.3 | 819.4 | 114.2 | 22.31 | 71.2 | 569 |
| 4 | 837.8 | 820.4 | 114.2 | 22.32 | 70.2 | 515 |
| 5 | 840.3 | 819.9 | 114.2 | 22.32 | 57.9 | 579 |
| 6 | 834.8 | 819.4 | 114.2 | 22.31 | 53.3 | 742 |
| 7 | 834.8 | 818.4 | 114.2 | 22.30 | 51.7 | 1075 |
| 8 | 838.8 | 822.4 | 114.2 | 22.34 | 47.1 | 713 |
| 9 | 835.3 | 818.4 | 114.2 | 22.30 | 46.6 | 727 |
| 10 | 832.8 | 839.9 | 114.1 | 22.50 | 45.8 | 697 |
The coverage-gap polygons, existing post offices and the top-10 recommended sites overlaid on switchable basemaps (CartoDB Positron / OSM / Esri Satellite). Click any feature for details.
# 1. gap_logical (spatstat im, km) -> sf polygons in WGS84
gap_mat <- as.matrix(gap_logical)
# spatstat im is bottom-up (yrow ascending); terra rast is top-down.
gap_mat_flip <- gap_mat[nrow(gap_mat):1, ]
gap_rast <- terra::rast(
gap_mat_flip * 1L,
extent = terra::ext(
gap_logical$xrange[1] * rsc, gap_logical$xrange[2] * rsc,
gap_logical$yrange[1] * rsc, gap_logical$yrange[2] * rsc
),
crs = paste0("EPSG:", CRS_HK)
)
names(gap_rast) <- "is_gap"
gap_polys_ll <- terra::as.polygons(gap_rast, dissolve = TRUE) %>%
sf::st_as_sf() %>%
dplyr::filter(is_gap == 1) %>%
sf::st_transform(CRS_LL) %>%
sf::st_make_valid()
# 2. Post offices -> WGS84 (post_sf already exists in CRS_HK from the
# post-office chunk; reuse if present, otherwise rebuild).
if (!exists("post_sf")) {
post_raw <- jsonlite::fromJSON(fp_post, simplifyDataFrame = TRUE)$data %>%
as_tibble() %>%
mutate(longitude = as.numeric(longitude),
latitude = as.numeric(latitude)) %>%
filter(!is.na(longitude), !is.na(latitude))
post_sf <- st_as_sf(post_raw, coords = c("longitude", "latitude"),
crs = CRS_LL) %>%
st_transform(CRS_HK) %>%
st_intersection(hk_bdry)
}
post_sf_ll <- sf::st_transform(post_sf, CRS_LL)
# 3. Top-10 coverage-gap cells -> sf POINT in WGS84
top10_gap_sf <- sf::st_as_sf(top10_gap_ll,
coords = c("lon", "lat"),
crs = CRS_LL,
remove = FALSE) %>%
mutate(label_html = sprintf(
"<b>Rank %d</b><br/>λ = %.1f POIs/km²<br/>%.0f m to nearest PO",
rank, lambda, dist_post_m
))
# 4. Render leaflet
leaflet::leaflet() %>%
leaflet::addProviderTiles(leaflet::providers$CartoDB.Positron,
group = "Light") %>%
leaflet::addProviderTiles(leaflet::providers$OpenStreetMap,
group = "OSM") %>%
leaflet::addProviderTiles(leaflet::providers$Esri.WorldImagery,
group = "Satellite") %>%
# Coverage-gap polygons
leaflet::addPolygons(
data = gap_polys_ll,
fillColor = "firebrick",
fillOpacity = 0.55,
color = "firebrick",
weight = 0.6,
smoothFactor = 0.5,
group = "Coverage-gap zones",
popup = "<b>Coverage-gap zone</b><br/>Top-30 % fitted intensity AND > 500 m from any post office"
) %>%
# Existing post offices (gold dots)
leaflet::addCircleMarkers(
data = post_sf_ll,
radius = 5,
color = "black",
weight = 1.2,
fillColor = "gold",
fillOpacity = 0.9,
group = "Post offices",
label = ~paste0(nameEN, " — ", districtEN),
popup = ~paste0("<b>", nameEN, "</b><br/>",
districtEN, "<br/>",
addressEN)
) %>%
# Top-10 recommended sites (red stars)
leaflet::addCircleMarkers(
data = top10_gap_sf,
radius = 9,
color = "white",
weight = 2,
fillColor = "red",
fillOpacity = 0.95,
group = "Top-10 recommended sites",
label = ~sprintf("Rank %d", rank),
popup = ~label_html
) %>%
leaflet::addLayersControl(
baseGroups = c("Light", "OSM", "Satellite"),
overlayGroups = c("Coverage-gap zones",
"Post offices",
"Top-10 recommended sites"),
options = leaflet::layersControlOptions(collapsed = FALSE)
) %>%
leaflet::addLegend(
position = "bottomright",
colors = c("firebrick", "gold", "red"),
labels = c("Coverage-gap zone",
"Existing post office",
"Top-10 recommended site"),
title = "Layer",
opacity = 0.9
) %>%
leaflet::setView(lng = 114.17, lat = 22.34, zoom = 11)
Three model-free CSR tests reject randomness with all p < 2 × 10⁻¹⁶:
| Test | Statistic | Interpretation |
|---|---|---|
| Pearson χ² (10×10 quadrats) | χ² = 22,605, df = 93 | observed counts ≫ Poisson expectation |
| Clark–Evans (nearest neighbour) | R = 0.20 | mean NN distance is 20 % of CSR; extreme aggregation |
| Hopkins–Skellam | A = 0.0023 | one of the most clustered patterns the test can return |
The Linhom global envelope at r ≤ 3 km gave MAD p = 0.1 and DCLF p = 0.1 — but this is the floor of a 9-simulation Monte-Carlo test (p ∈ {0.1, 0.2, …, 1.0}), so the second-order structure cannot be conclusively tested at this nsim. The CSR-test rejection is so strong, however, that this is not a substantive issue for the conclusions.
The full Poisson model m4 reduces AIC from 5,106
(intercept-only) to −15,060 — an AIC drop of
20,166 points over 7 covariate degrees of freedom. All
five covariates and three zone contrasts are significant at p <
0.001.
| Covariate | β | exp(β) | Interpretation |
|---|---|---|---|
| Intercept | +1.454 | 4.28 | Baseline POI/km² in commercial (C) zone, zero covariates |
pop |
+0.000 915 | 1.001 | +1000 residents → intensity ×2.50 |
viirs |
−0.00802 | 0.992 | Net negative after controls — see below |
road_density |
+0.151 | 1.16 | +1 km road/km² → intensity ×1.16 |
dist_transit |
−2.49 | 0.083 | +1 km from transit → intensity drops 92 % |
zone = I |
−1.354 | 0.26 | Industrial zone POI density = 26 % of C |
zone = O |
−0.622 | 0.54 | Other = 54 % of C |
zone = R |
−0.193 | 0.82 | Residential = 82 % of C |
Two findings deserve emphasis.
pop and
road, leaving a contaminating signal on top.
Practical implication: do not use NTL alone as a parcel-demand
proxy for Hong Kong — it is collinear with cleaner measures and
adds noise when stacked on them.Ordering by fitted intensity at otherwise-average covariates: C ≫ R > O > I. Commercial zones beat industrial zones by a factor of ≈ 4 in commercial-POI intensity, exactly as expected.
The interaction model m_inter improves over the additive
marked model m_marksx with LR deviance = 53.2 on 2
d.f., p = 2.8 × 10⁻¹² — the three tiers respond to population
differently. But the direction is opposite to the
original hypothesis:
marksMedium : im_pop = +0.000 60 (Z = 6.4, p <
0.001)marksLarge : im_pop = +0.000 59 (Z = 6.3, p <
0.001)That is, Medium and Large tiers scale more steeply with population than Small, not less. The hub-and-spoke “large hubs serve big catchments” story is not supported: at the spatial resolution of this analysis, Large and Medium tiers pile up where the population already is. Small POIs are the only tier that holds its own in low-population areas.
This is partly an artefact of the mark construction: tiers are derived from VIIRS terciles, and VIIRS is itself correlated with population, so the interaction is partly mechanical. The substantive interpretation that survives this caveat is that Small stations are the right instrument for population gaps — not because Large scales poorly, but because Large simply does not appear there.
The random-label Lcross envelopes (Large vs Small, Large vs Medium) at r ≤ 3 km test the segregation H₀ that marks are exchangeable. With nsim = 9 the p-floor is again 0.1, so this is suggestive rather than conclusive; the visible departure of the observed Lcross from the grey band in the report is consistent with non-random labelling but should be re-run at nsim = 99 for the final version.
Combining the fitted intensity surface with the post-office distance field, the two “top-10” tables in §4.8 give the actionable list. The WGS84 lon/lat columns can be pasted straight into Google Maps.
Tier-1 (Large hubs) — the absolute-intensity list. Nine of the top 10 cells sit inside the Yau Tsim Mong / Sham Shui Po corridor:
| Rank | Approx area | (lon, lat) | λ̂ (POIs / km²) |
|---|---|---|---|
| 1 | Yau Ma Tei – Mong Kok | (114.2, 22.32) | 236.6 |
| 2 | Mong Kok | (114.2, 22.32) | 219.3 |
| 3 | Tai Kok Tsui / Sham Shui Po | (114.2, 22.33) | 159.8 |
| 4–7 | Same corridor | … | 100–115 |
| 8 | Hung Hom / TST East | (114.2, 22.29) | 89.6 |
| 10 | Tin Shui Wai (NT West, outlier) | (114.0, 22.44) | 86.1 |
Eight of the ten are within 500 m of an existing Hongkong Post office, so a new Large hub here adds throughput rather than coverage. The NT-West outlier (Tin Shui Wai) is the only Tier-1 cell that is also a real coverage gap (666 m to the nearest PO) — strongly recommended as the first Tier-1 site to actually build.
Tier-2 (Medium stations) — the marginal-coverage list (most useful output of the whole study). Top 10 high-intensity cells that are > 500 m from any post office:
These are the operationally optimal sites. The NT-North cells (Rank 2, 7, 10) are especially attractive because their relative-coverage gap is much larger than in central Kowloon.
Tier-3 (Small stations) — fill in along transit
corridors in the mid-intensity band (tier_im = 1).
Because each kilometre away from transit slashes intensity by 92 %,
small pickup points should sit within 300 m of a transit
node (MTR, light rail, ferry pier, or bus terminus from
iGeoCom) to remain operationally relevant. The natural Small-station
belt: Tseung Kwan O, Sha Tin, Tuen Mun, Tai Po, and the
further NT outlying-island piers (Mui Wo, Peng Chau, Cheung Chau).
demand_weight by POI
sub-type is the most consequential improvement to the next
iteration.im_viirs as “darker areas
have more shops”; interpret it as “after controlling for what we already
measure cleanly, the residual VIIRS variation is noise”.O polygons) is recommended.im_viirs covariate. Mark coefficients in
the interaction model should be read as conditional on this
endogeneity.The K-means tiers from Qu (2026, RPubs/1391023) and the NTL-derived tiers here agree on macro patterns (Central / Tsim Sha Tsui / Causeway Bay are top-tier in both) but disagree on meso patterns: industrial Tuen Mun and Kwai Chung Container Terminal are mid-tier under VIIRS (because they are lit but not residential-dense), while K-means classified them as top-tier (because they are working-population-dense). This is the VIIRS-vs-population signal that the m4 ppm picks apart explicitly. The next iteration should use both as joint marks.
kppm with
clusters = "Thomas") as an alternative to the inhomogeneous
Poisson if the residual second-order structure remains strong after
m4.kernstadapt and stpp machinery
from class 10–11 to add a space–time dimension once parcel-flow time
series become available.lpp
on the major-road linnet (class 10) so that distances
respect the road network, not the Euclidean metric.iGeoCom POIs — Lands Department, Hong Kong (CSDI
Portal).hkg_ppp_2020.tif) —
WorldPop.org.