1 Introduction

Hong Kong is one of the densest urban economies in the world, with one of the highest e-commerce penetration rates in Asia. Last-mile parcel logistics — sorting hubs, locker stations, and pick-up points — therefore competes for very limited urban space. Where these facilities are placed has direct effects on delivery cost, courier travel distance, and customer waiting time. The economic question we ask is concrete: given where commercial activity actually takes place in Hong Kong, where should large, medium, and small courier stations be located?

In the first-semester project (Qu, 2026, RPubs/1391023) we approached this question through area-level K-means clustering of working population and household income across Small Tertiary Planning Unit Groups (STPUGs). The clustering yielded three tiers of districts and informed a tiered station recommendation. That study had a clear limitation: it operated on aggregated polygons, so it could not say anything about within-area concentration of demand or about the geometry of station–station spacing.

This project upgrades the analysis to a spatial point pattern analysis (SPPA). We treat commercial points of interest (POIs) — the CMF/COM classes of Hong Kong’s official iGeoCom POI dataset — as a proxy for parcel demand, and we apply the methods discussed in classes 4–9 (Baddeley, Rubak & Turner, 2015; Kopczewska, 2020) to ask three questions.

  1. Is courier demand in Hong Kong clustered? We use first-order intensity (KDE, quadrat tests) and second-order summary functions (Ripley’s \(K\), \(L\), \(g\)) to test against complete spatial randomness (CSR), under both homogeneous and inhomogeneous baselines.
  2. What drives the spatial intensity of demand? We fit inhomogeneous Poisson point process models with ppm() using resident population (WorldPop), nighttime light radiance (VIIRS Black Marble), road density (OSM), distance to the nearest public-transit node (multi-modal: MTR, light rail, ferry, bus terminal), and statutory land-use zone (Hong Kong OZP) as covariates.
  3. How do tiered station types interact in space? We mark each demand point by the local NTL tercile (Small / Medium / Large) and study the marks with Kcross.inhom, markconnect, Iest, and a Monte-Carlo segregation test. We finally validate the model-implied high-demand zones against the empirical pattern of Hongkong Post offices via Kcross.

The economic deliverable is a tiered map of recommended courier station sites, backed by significance tests rather than visual inspection alone.

requiredPackages <- c(
  "sf", "spatstat", "terra", "sp", "magrittr",
  "ggplot2", "tidyverse", "viridis", "jsonlite",
  "dixon", "kableExtra", "gridExtra", "patchwork", "scales", "rnaturalearth",
  "ggspatial", "prettymapr", "raster", "leaflet", "htmltools","dplyr"
)
for (p in requiredPackages) {
  if (!requireNamespace(p, quietly = TRUE)) install.packages(p)
}
invisible(lapply(requiredPackages, library, character.only = TRUE))

# Shared map styling — every map in the report uses the same palette
# and km-based axis frame so they read as one consistent series.
MAKO_REV    <- function(n = 256) viridisLite::mako(n, direction = -1)
MAKO_DISC   <- function(n)       viridisLite::mako(n, direction = -1)
# Helper that draws axes with km labels for terra rasters (whose native
# units are HK1980 metres).
draw_km_axes <- function(xrange_m, yrange_m) {
  axis(1, at = pretty(xrange_m), labels = pretty(xrange_m) / 1000)
  axis(2, at = pretty(yrange_m), labels = pretty(yrange_m) / 1000, las = 1)
  mtext("HK1980 Grid East (km)",  side = 1, line = 2.5, cex = 0.95)
  mtext("HK1980 Grid North (km)", side = 2, line = 2.5, cex = 0.95)
}

2 Literature review

2.1 Spatial point pattern analysis

The methodological backbone is Baddeley, Rubak & Turner (2015), Spatial Point Patterns: Methodology and Applications with R. We use the workflow taught in classes 4–9: build a ppp object inside an owin window, summarise its first-order intensity (homogeneous and inhomogeneous), test CSR with quadrat.test, clarkevans.test, and hopskel.test, and study second-order structure with Kest, Lest, and pcf (with their inhomogeneous counterparts Kinhom, Linhom, pcfinhom). Departures from CSR are assessed with global and pointwise envelopes and with the mad.test and dclf.test Monte Carlo tests (classes 5 and 8).

Kopczewska (2020), Applied Spatial Statistics and Econometrics, provides the framing for moving from spatial description to a regression-style explanation: the intensity of points is treated as the dependent variable in a Poisson process model whose log-intensity is loglinear in covariates. We follow its applied treatment of spatial residual diagnostics, lurking variable plots, and the distinction between inhomogeneity that is covariate-explained and inhomogeneity that is residual clustering.

Arbia, Espa & Giuliani (2021), Spatial Microeconometrics, motivates the economic framing. They argue that firms (and, by extension, retail-related points) cannot be modelled as if independently located: agglomeration externalities, infrastructure access, and zoning constraints generate observable spatial dependence. The authors recommend a two-step strategy — first remove inhomogeneity attributable to first-order covariates, then test whether second-order clustering remains. We adopt this recommendation: the inhomogeneous K function \(K_{\text{inhom}}\) is interpreted only after fitting the inhomogeneous Poisson model.

2.2 Marked patterns and multitype Poisson processes

For the tiered-station question we follow class 9 (rmpoispp, ppm(... ~ marks), Kcross.inhom, markconnect, Iest, relrisk.ppp). The literature on retail and firm location (Arbia et al., 2021, ch. 4–5) treats interaction between firm types as analogous to interaction between species in ecology, but with the added constraint that economic types compete for and segregate within land-use zones. This motivates using zone as a covariate in the marked model and segregation.test.ppp as a model-free test of mark separation.

2.3 POIs as a proxy for parcel demand

Direct parcel-flow data are not publicly available for Hong Kong. We follow a literature that uses POIs as a high-resolution proxy for commercial activity and for delivery demand (Wang et al., 2019, on POI density as a logistics demand indicator; Schroeder & Cesarini, 2022, on parcel-locker siting). The implicit assumption is monotonicity: areas with more shops, offices, and food-and-beverage points generate more parcels, even if the per-POI parcel rate differs by type.

Compared with OpenStreetMap, the official Hong Kong iGeoCom dataset (Lands Department) supplies categorically clean and quality-controlled POIs with classification codes (CLASS/TYPE/SUBCAT) that map directly to the zoning system. We therefore use iGeoCom for both demand POIs and transport-node accessibility; OSM is used only for road density.

2.4 Validation against existing logistics infrastructure

Following Arbia et al. (2021, ch. 6), we treat Hongkong Post offices as an observed control point pattern: they were sited under a public-service mandate (universal coverage), not under profit maximisation. If our predicted high-intensity zones spatially coincide with post offices at short distance (low \(r\) in \(K_{\text{cross}}\)), this confirms that the predicted zones are operationally accessible. If at intermediate distance the cross-K is below the independence baseline, this is consistent with post offices having captured the easy sites and our recommendations identifying complementary, non-overlapping locations.

3 Data — description and graphs

3.1 Sources

The dataset is constructed exactly as specified in HK_dataset_checklist.xlsx. All paths are relative to the project folder.

Block Variable File CRS
Window 18 districts boundary data/hksar_18_district_boundary.json WGS84
Demand iGeoCom POIs (CLASSCMF, COM) data/iGeoCom.geojson WGS84
Transit iGeoCom transit nodes (TYPE ∈ RSN/LRA/FER/PIE/FET/BUS) data/iGeoCom.geojson WGS84
Covariate Resident population data/hkg_ppp_2020.tif (WorldPop) WGS84
Covariate Nighttime light radiance data/VNL_npp_2024_..._median_masked.dat.tif.gz WGS84
Covariate Road segments (for road density) data/hong-kong-260507-free.shp/gis_osm_roads_free_1.shp WGS84
Covariate OZP land-use zoning data/GeoJSON_Statutory_Plans/Statutory Plan GIS Data GeoJSON/ZONE.json HK1980
Validation Hongkong Post offices data/post-office.json WGS84

All layers are projected to HK1980 Grid (EPSG:2326) for analysis (units in metres, then rescaled to km in the ppp object).

data_dir <- "data"

fp_bdry      <- file.path(data_dir, "hksar_18_district_boundary.json")
fp_igeocom   <- file.path(data_dir, "iGeoCom.geojson")
fp_pop       <- file.path(data_dir, "hkg_ppp_2020.tif")
fp_viirs     <- file.path(data_dir,
                          "VNL_npp_2024_global_vcmslcfg_v2_c202502261200.median_masked.dat.tif.gz")
fp_roads     <- file.path(data_dir, "hong-kong-260507-free.shp",
                          "gis_osm_roads_free_1.shp")
fp_ozp_zone  <- file.path(data_dir, "GeoJSON_Statutory_Plans",
                          "Statutory Plan GIS Data GeoJSON", "ZONE.json")
fp_post      <- file.path(data_dir, "post-office.json")

CRS_HK <- 2326   # HK 1980 Grid (metres)
CRS_LL <- 4326   # WGS 84

3.2 Helper: read Esri JSON polygon layer

The Hong Kong Town Planning Board ships the OZP statutory plans as Esri JSON (despite the folder name). The structure is features[[i]]$attributes + features[[i]]$geometry$rings, not the properties + coordinates of GeoJSON. We write a small parser.

read_esri_polygon <- function(path) {
  raw <- jsonlite::fromJSON(path, simplifyVector = FALSE)
  sr  <- raw$spatialReference
  crs_in <- if (!is.null(sr$latestWkid)) sr$latestWkid
            else if (!is.null(sr$wkid))  sr$wkid
            else 2326
  feats <- raw$features
  geoms <- lapply(feats, function(f) {
    rings <- lapply(f$geometry$rings, function(r) {
      matrix(unlist(r), ncol = 2, byrow = TRUE)
    })
    sf::st_polygon(rings)
  })
  attrs <- do.call(rbind, lapply(feats, function(f) {
    as.data.frame(lapply(f$attributes,
                         function(x) if (is.null(x)) NA else x),
                  stringsAsFactors = FALSE)
  }))
  sf::st_sf(attrs, geometry = sf::st_sfc(geoms, crs = crs_in))
}

3.3 Boundary and observation window

hk18 <- st_read(fp_bdry, quiet = TRUE) %>%
  st_transform(CRS_HK) %>%
  st_make_valid()

hk_bdry <- st_union(hk18) %>% st_make_valid()
W <- as.owin(hk_bdry)

# rescaling factor: 1 unit in km = 1000 m
rsc <- 1000

# Shared map extent in HK1980 Grid km — used by every map in the report
# so that visual comparisons across covariates / maps are honest.
# Wide enough to show all of HK including the outlying islands.
DEV_XLIM <- c(795, 860)
DEV_YLIM <- c(805, 848)

# Scaled district polygons (km units, CRS stripped) — for overlaying on
# spatstat im / ppp plots that already live in km after rescale.ppp().
hk18_km <- hk18
sf::st_geometry(hk18_km) <- sf::st_geometry(hk18) * diag(c(1 / rsc, 1 / rsc))
sf::st_crs(hk18_km) <- NA

par(mar = c(4, 4, 4.5, 2), bty = "n")
plot(W, main = "")
plot(st_geometry(hk18), border = "grey60", add = TRUE)
title(main = "Hong Kong observation window (HK1980 Grid)", line = 2)

3.4 iGeoCom POIs: demand points and transit nodes

igc <- st_read(fp_igeocom, quiet = TRUE)
igc <- st_transform(igc, CRS_HK)

# Crop to HK window
igc <- igc[st_intersects(igc, hk_bdry, sparse = FALSE)[, 1], ]

# Demand POIs: commercial classes
demand <- igc %>%
  filter(CLASS %in% c("COM", "CMF"))

# Transit nodes (4 transport modes — HK is islands so ferries matter)
transit <- igc %>%
  filter(
    (CLASS == "TRS" & TYPE %in% c("RSN", "LRA", "FER", "PIE", "FET")) |
    (CLASS == "BUS" & TYPE == "BUS")
  ) %>%
  mutate(transit_mode = case_when(
    TYPE == "RSN"                    ~ "MTR/Heavy rail",
    TYPE == "LRA"                    ~ "Light rail",
    TYPE %in% c("FER", "PIE", "FET") ~ "Ferry/pier",
    TYPE == "BUS"                    ~ "Bus terminus"
  ))

cat("Demand POIs (CLASS in {COM, CMF}):", nrow(demand), "\n")
## Demand POIs (CLASS in {COM, CMF}): 3875
cat("Transit nodes (multi-modal):     ", nrow(transit), "\n")
## Transit nodes (multi-modal):      1345
table(transit$transit_mode)
## 
##   Bus terminus     Ferry/pier     Light rail MTR/Heavy rail 
##            649            530             68             98
ggplot() +
  geom_sf(data = hk18,    fill = NA, colour = "grey70", linewidth = 0.2) +
  geom_sf(data = demand,  colour = "steelblue", size = 0.25, alpha = 0.5) +
  geom_sf(data = transit, aes(colour = transit_mode), size = 1, alpha = 0.9) +
  scale_colour_viridis_d(end = 0.85) +
  labs(title  = "iGeoCom commercial POIs (blue) and transit nodes (coloured)",
       colour = "Transit mode") +
  theme_minimal(base_size = 11)

3.5 WorldPop population raster

pop <- terra::rast(fp_pop)                       # WGS84
pop <- terra::project(pop, paste0("EPSG:", CRS_HK))
pop <- terra::crop(pop, terra::vect(hk_bdry))
pop <- terra::mask(pop, terra::vect(hk_bdry))
names(pop) <- "pop"

par(mar = c(4, 4, 4.5, 6), bty = "n")
terra::plot(pop,
            main = "", col = MAKO_REV(256),
            xlim = DEV_XLIM * rsc, ylim = DEV_YLIM * rsc,
            axes = FALSE, xlab = "", ylab = "")
draw_km_axes(DEV_XLIM * rsc, DEV_YLIM * rsc)
title(main = "WorldPop 2020 — population density", line = 2)
plot(st_geometry(hk18), add = TRUE, border = "white", lwd = 0.4)
mtext("Residents per 100 m cell; 18 districts overlaid in white",
      side = 1, line = 3.6, cex = 0.85)

Coordinate reference for every map below. From here on, all maps share the same frame: HK1980 Grid (EPSG:2326), units in km. The Grid origin at (836.694, 819.070) km corresponds to Tsim Sha Tsui at latitude 22.312°N, longitude 114.179°E (WGS84). Local conversion: +1 km East ≈ +0.0097° lon, +1 km North ≈ +0.0090° lat. To paste any (x_km, y_km) into Google Maps, recompute lat/lon with sf::st_transform(); the table at §4.8 already adds lon / lat columns.

3.6 VIIRS nighttime light raster

The Black Marble file ships gzipped — terra::rast() reads it through GDAL’s virtual /vsigzip/ driver.

viirs_path <- paste0("/vsigzip/", normalizePath(fp_viirs))
viirs <- terra::rast(viirs_path)
viirs <- terra::crop(viirs, terra::ext(c(113.7, 114.6, 22.0, 22.7)))
viirs <- terra::project(viirs, paste0("EPSG:", CRS_HK))
viirs <- terra::crop(viirs, terra::vect(hk_bdry))
viirs <- terra::mask(viirs, terra::vect(hk_bdry))
names(viirs) <- "viirs"

# replace negative / NA pixels with 0 (masked dim background)
viirs[is.na(viirs) | viirs < 0] <- 0

par(mar = c(4, 4, 4.5, 6), bty = "n")
terra::plot(log1p(viirs),
            main = "", col = MAKO_REV(256),
            xlim = DEV_XLIM * rsc, ylim = DEV_YLIM * rsc,
            axes = FALSE, xlab = "", ylab = "")
draw_km_axes(DEV_XLIM * rsc, DEV_YLIM * rsc)
title(main = "VIIRS NTL 2024 (median masked, log1p scale)", line = 2)
plot(st_geometry(hk18), add = TRUE, border = "white", lwd = 0.4)
mtext("Annual median-masked nighttime radiance; 18 districts overlaid",
      side = 1, line = 3.6, cex = 0.85)

3.7 Road density (OSM)

We compute road-line density per km² with density.psp. Major and minor roads are kept; small footways are dropped to avoid spurious density in country-park trails.

roads <- st_read(fp_roads, quiet = TRUE) %>%
  filter(fclass %in% c("motorway", "trunk", "primary", "secondary",
                       "tertiary", "residential", "unclassified",
                       "service", "living_street")) %>%
  st_transform(CRS_HK) %>%
  st_intersection(hk_bdry)

roads_lines <- st_cast(roads, "LINESTRING") %>%
  st_geometry()
psp_roads <- as.psp(roads_lines, window = W)
psp_roads <- rescale.psp(psp_roads, rsc, "km")

road_dens_im <- density.psp(psp_roads, sigma = 0.5, dimyx = 512)

par(mar = c(5, 4.5, 4, 6), bty = "n")
plot(road_dens_im,
     main = "",
     col  = MAKO_REV(256),
     xlim = DEV_XLIM, ylim = DEV_YLIM,
     axes = FALSE, ann = FALSE)
axis(1, at = pretty(DEV_XLIM))
axis(2, at = pretty(DEV_YLIM), las = 1)
mtext("HK1980 Grid East (km)",  side = 1, line = 2.5)
mtext("HK1980 Grid North (km)", side = 2, line = 2.7)
plot(st_geometry(hk18_km), add = TRUE, border = "white", lwd = 0.4)
title(main = "OSM road-line density (km / km²)", line = 2)
mtext("Higher density along Kowloon corridors and northern HK Island",
      side = 1, line = 3.8, cex = 0.85)

3.8 OZP statutory land-use zoning

Following the data dictionary: ZONE_LABEL carries the zone code, DESC_ENG its full name, SPUSE_ENG the sub-use annotation (important for OU “other specified uses”).

ozp <- read_esri_polygon(fp_ozp_zone) %>%
  st_transform(CRS_HK) %>%
  st_make_valid()

cat("OZP polygons:", nrow(ozp), "across",
    length(unique(ozp$PLAN_NO)), "plans\n")
## OZP polygons: 12041 across 167 plans
# 4-class collapse per checklist (C / R / I / O)
ozp$land_use_zone <- dplyr::case_when(
  grepl("Industri", ozp$DESC_ENG,  ignore.case = TRUE) |
  grepl("Industri", ozp$SPUSE_ENG, ignore.case = TRUE)        ~ "I",
  grepl("^C(\\(|$|DA)", ozp$ZONE_LABEL)                        ~ "C",
  grepl("Commerc",  ozp$DESC_ENG, ignore.case = TRUE)          ~ "C",
  grepl("^R\\(|^V$",   ozp$ZONE_LABEL)                         ~ "R",
  TRUE                                                         ~ "O"
)
ozp$land_use_zone <- factor(ozp$land_use_zone,
                            levels = c("C", "R", "I", "O"))

table(ozp$land_use_zone)
## 
##    C    R    I    O 
##  605 3859   86 7491
# Backfill HK land NOT covered by any OZP plan as "O" (country parks,
# unplanned outlying islands, etc.) so the map shows every piece of land.
ozp_dissolved <- st_union(ozp)
unplanned     <- st_difference(hk_bdry, ozp_dissolved) %>%
  st_make_valid()
unplanned_sf  <- st_sf(land_use_zone = factor("O", levels = c("C","R","I","O")),
                       geometry = st_sfc(unplanned, crs = CRS_HK))

ggplot() +
  geom_sf(data = unplanned_sf,
          aes(fill = land_use_zone), colour = NA) +
  geom_sf(data = ozp,
          aes(fill = land_use_zone), colour = NA) +
  # 18-district outlines for spatial reference (same overlay used in
  # every other map in the report).
  geom_sf(data = hk18, fill = NA, colour = "grey30",
          linewidth = 0.25) +
  scale_fill_viridis_d(option = "mako", direction = -1, name = "Zone") +
  # Same window as every other map in the report.
  coord_sf(xlim   = DEV_XLIM * rsc,
           ylim   = DEV_YLIM * rsc,
           crs    = CRS_HK,
           datum  = sf::st_crs(CRS_HK),
           expand = FALSE) +
  scale_x_continuous(labels = function(x) x / rsc) +
  scale_y_continuous(labels = function(y) y / rsc) +
  labs(title    = "OZP land use (4-class collapse, all HK land covered)",
       x        = "HK1980 Grid East (km)",
       y        = "HK1980 Grid North (km)",
       caption  = paste(
         "Map zoomed to the densely-developed HK area so individual zones",
         "remain readable; remote uninhabited islands and pure-sea zones",
         "(e.g., Soko Islands, Po Toi, eastern outer islands) are cropped.",
         sep = "\n")) +
  theme_minimal(base_size = 11) +
  theme(panel.grid.minor = element_blank(),
        plot.caption     = element_text(hjust = 0, size = 9,
                                         colour = "grey30"))

3.9 Building the ppp objects

xy_demand  <- st_coordinates(demand)
xy_transit <- st_coordinates(transit)

# Unmarked demand pattern
pattern.um <- ppp(x = xy_demand[, 1], y = xy_demand[, 2], window = W)
pattern.um <- rjitter(pattern.um, 1)              # ~1 m jitter to dedup
pattern.um <- rescale.ppp(pattern.um, rsc, "km")  # m -> km

# Transit pattern (used to derive distance covariate)
pattern.transit <- ppp(x = xy_transit[, 1], y = xy_transit[, 2], window = W)
pattern.transit <- rescale.ppp(pattern.transit, rsc, "km")

cat("Demand pattern:\n");  print(pattern.um)
## Demand pattern:
## Planar point pattern: 3875 points
## window: polygonal boundary
## enclosing rectangle: [799.2, 869.9] x [799.8, 847.6] km
cat("\nTransit pattern:\n"); print(pattern.transit)
## 
## Transit pattern:
## Planar point pattern: 1345 points
## window: polygonal boundary
## enclosing rectangle: [799.2, 869.9] x [799.8, 847.6] km

3.9.1 Marks: NTL-derived station tier

Without semester-1 K-means cluster labels in data/, we derive marks data-drivenly from VIIRS NTL at each demand POI: terciles of NTL define Small / Medium / Large station candidates. This keeps the mark exogenous to the spatial coordinates while encoding economic activity, which is what the proposal calls for.

demand$viirs_ntl <- terra::extract(viirs, terra::vect(demand))[, 2]
demand$viirs_ntl[is.na(demand$viirs_ntl)] <- 0

q <- quantile(demand$viirs_ntl, c(1/3, 2/3), na.rm = TRUE)
demand$station_tier <- cut(
  demand$viirs_ntl,
  breaks = c(-Inf, q, Inf),
  labels = c("Small", "Medium", "Large"),
  include.lowest = TRUE
)
table(demand$station_tier)
## 
##  Small Medium  Large 
##   1294   1304   1277
pattern.m <- ppp(x = xy_demand[, 1], y = xy_demand[, 2],
                 window = W,
                 marks  = demand$station_tier)
pattern.m <- rjitter(pattern.m, 1)
pattern.m <- rescale.ppp(pattern.m, rsc, "km")
print(pattern.m)
## Marked planar point pattern: 3875 points
## Multitype, with levels = Small, Medium, Large 
## window: polygonal boundary
## enclosing rectangle: [799.2, 869.9] x [799.8, 847.6] km

3.10 Covariate pixel images for ppm()

ppm() needs covariates as im objects on a common grid. We resample each raster to a 200 m grid and convert.

target_template <- terra::rast(terra::ext(terra::vect(hk_bdry)),
                               resolution = 200,
                               crs = paste0("EPSG:", CRS_HK))

resample_to <- function(r, template) {
  r2 <- terra::resample(r, template, method = "bilinear")
  r2 <- terra::mask(r2, terra::vect(hk_bdry))
  r2
}

pop_r   <- resample_to(pop,   target_template)
viirs_r <- resample_to(viirs, target_template)

# convert terra SpatRaster -> spatstat im, in km
terra_to_im <- function(r) {
  m   <- terra::as.matrix(r, wide = TRUE)
  m   <- m[nrow(m):1, ]                  # terra is top-down, im is bottom-up
  ext <- terra::ext(r)
  im(m,
     xrange   = c(ext$xmin, ext$xmax) / rsc,
     yrange   = c(ext$ymin, ext$ymax) / rsc,
     unitname = c("km", "km"))
}

im_pop   <- terra_to_im(pop_r)
im_viirs <- terra_to_im(viirs_r)
im_road  <- road_dens_im                 # already in km units
im_dtran <- distmap(pattern.transit)     # km, from rescaled ppp

# OZP factor image: rasterise, convert to im, restore factor levels
ozp_r <- terra::rasterize(terra::vect(ozp), target_template,
                          field = "land_use_zone")
lv     <- terra::levels(ozp_r)[[1]]
m_ozp  <- terra::as.matrix(ozp_r, wide = TRUE); m_ozp <- m_ozp[nrow(m_ozp):1, ]
ext_z  <- terra::ext(ozp_r)
im_zone_int <- im(m_ozp,
                  xrange   = c(ext_z$xmin, ext_z$xmax) / rsc,
                  yrange   = c(ext_z$ymin, ext_z$ymax) / rsc,
                  unitname = c("km", "km"))
im_zone <- eval.im(factor(im_zone_int,
                          levels = lv$ID,
                          labels = lv[[2]]))

par(mfrow = c(2, 2), mar = c(2, 2, 3, 1), bty = "n")
plot(im_pop,   main = "Population (WorldPop)",       col = MAKO_REV(256))
plot(im_viirs, main = "Nighttime light (VIIRS)",     col = MAKO_REV(256))
plot(im_road,  main = "Road density",                col = MAKO_REV(256))
plot(im_dtran, main = "Distance to transit (km)",    col = MAKO_REV(256))

par(mfrow = c(1, 1))

# Land-use zone — drawn larger with full axes, same coordinate frame as
# the WorldPop / VIIRS / road / quadrat / KDE maps for visual comparison.
par(mar = c(5, 4.5, 4, 6), bty = "n")
plot(im_zone,
     main = "",
     col  = MAKO_DISC(length(levels(im_zone))),
     xlim = DEV_XLIM, ylim = DEV_YLIM,
     axes = FALSE, ann = FALSE)
axis(1, at = pretty(DEV_XLIM))
axis(2, at = pretty(DEV_YLIM), las = 1)
mtext("HK1980 Grid East (km)",  side = 1, line = 2.5)
mtext("HK1980 Grid North (km)", side = 2, line = 2.7)
plot(st_geometry(hk18_km), add = TRUE, border = "white", lwd = 0.4)
title(main = "Land-use zone (4-class)", line = 2)
mtext(paste(
  "Rasterised OZP polygons on the analysis grid; non-OZP land",
  "(country parks, outlying islands) imputed as O (Other)."),
  side = 1, line = 3.8, cex = 0.85)

3.11 Descriptive statistics

demand_xy_sf <- st_as_sf(
  data.frame(x = xy_demand[, 1], y = xy_demand[, 2]),
  coords = c("x", "y"), crs = CRS_HK
)
demand$pop       <- terra::extract(pop_r, terra::vect(demand_xy_sf))[, 2]
# road_dens_im is on a km grid (after rescale.psp), so interp.im needs
# point coordinates in km, not metres.
demand$road_dens <- spatstat.geom::interp.im(
  road_dens_im,
  x = xy_demand[, 1] / rsc,
  y = xy_demand[, 2] / rsc
)
demand$dist_tran <- nncross(
  ppp(xy_demand[, 1] / rsc, xy_demand[, 2] / rsc,
      window = pattern.transit$window),
  pattern.transit
)$dist

demand %>%
  st_drop_geometry() %>%
  group_by(station_tier) %>%
  summarise(
    n          = n(),
    pop_mean   = mean(pop,       na.rm = TRUE),
    viirs_mean = mean(viirs_ntl, na.rm = TRUE),
    road_mean  = mean(road_dens, na.rm = TRUE),
    dtran_mean = mean(dist_tran, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  kable(digits = 3,
        caption = "Demand POI covariate means by NTL-derived station tier",
        col.names = c("Tier", "N POIs", "Pop (per cell)", "VIIRS",
                      "Road dens", "Dist to transit (km)")) %>%
  kable_styling(full_width = FALSE)
Demand POI covariate means by NTL-derived station tier
Tier N POIs Pop (per cell) VIIRS Road dens Dist to transit (km)
Small 1294 409.0 33.99 10.98 0.291
Medium 1304 507.9 62.36 17.38 0.172
Large 1277 509.6 90.38 22.55 0.185
demand %>%
  st_drop_geometry() %>%
  ggplot(aes(x = log1p(pop), y = log1p(viirs_ntl), colour = station_tier)) +
  geom_point(alpha = 0.5, size = 0.6) +
  scale_colour_viridis_d(end = 0.85) +
  labs(
    title    = "Demand POI covariates: residential population vs. nighttime light",
    subtitle = "Each point is one CMF/COM POI; tier from NTL terciles",
    x = "log(1 + WorldPop pop)", y = "log(1 + VIIRS NTL)",
    colour = "Tier"
  ) +
  theme_minimal(base_size = 11)

4 Quantitative analysis

The economic question — “where should we put which station tier?” — drives the order of the analysis. We first establish that demand is non-random and inhomogeneous (4.1), study its second-order structure (4.2), identify the spatial covariates that explain that inhomogeneity (4.3), ask whether tier-specific patterns interact (4.4–4.5), validate the model (4.6) and the predicted hot-spots against post offices (4.7), and produce the tiered recommendation (4.8).

4.1 First-order properties: is demand random?

qdr <- quadratcount(pattern.um, nx = 10, ny = 10)

par(mar = c(5, 4.5, 4, 6), bty = "n")
plot(intensity(qdr, image = TRUE),
     main = "",
     col  = MAKO_REV(256),
     xlim = DEV_XLIM, ylim = DEV_YLIM,
     axes = FALSE, ann = FALSE)
axis(1, at = pretty(DEV_XLIM))
axis(2, at = pretty(DEV_YLIM), las = 1)
mtext("HK1980 Grid East (km)",  side = 1, line = 2.5)
mtext("HK1980 Grid North (km)", side = 2, line = 2.7)
plot(pattern.um, add = TRUE, pch = 20, cex = 0.15, cols = "white")
plot(st_geometry(hk18_km), add = TRUE, border = "white", lwd = 0.4)
title(main = "Demand intensity — 10x10 quadrats (per km²)", line = 2)
mtext("White dots = individual POIs; brightest cell ≈ Yau Tsim Mong",
      side = 1, line = 3.8, cex = 0.85)

quadrat.test(pattern.um, nx = 10, ny = 10, alternative = "clustered")
## 
##  Chi-squared test of CSR using quadrat counts
## 
## data:  pattern.um
## X2 = 22605, df = 93, p-value <0.0000000000000002
## alternative hypothesis: clustered
## 
## Quadrats: 94 tiles (irregular windows)
clarkevans.test(pattern.um, alternative = "clustered")
## 
##  Clark-Evans test
##  CDF correction
##  Z-test
## 
## data:  pattern.um
## R = 0.2, p-value <0.0000000000000002
## alternative hypothesis: clustered (R < 1)
hopskel.test(pattern.um, alternative = "clustered")
## 
##  Hopkins-Skellam test of CSR
##  using F distribution
## 
## data:  pattern.um
## A = 0.0023, p-value <0.0000000000000002
## alternative hypothesis: clustered (A < 1)

Following class 4 we estimate the intensity by kernel smoothing under data-driven bandwidth rules. Hong Kong is highly inhomogeneous (urban core vs. country parks), so we expect short bandwidths to over-fit and Scott / CvL to give the operationally meaningful map.

bw_ppl   <- bw.ppl(pattern.um)
bw_scott <- bw.scott(pattern.um)

d_ppl   <- density(pattern.um, sigma = bw_ppl,
                   leaveoneout = FALSE, positive = TRUE)
d_scott <- density(pattern.um, sigma = bw_scott,
                   leaveoneout = FALSE, positive = TRUE)

par(mfrow = c(1, 2), mar = c(5, 4.5, 3.5, 5), bty = "n")

plot(d_ppl,
     main = "",
     col  = MAKO_REV(256),
     xlim = DEV_XLIM, ylim = DEV_YLIM,
     axes = FALSE, ann = FALSE)
axis(1, at = pretty(DEV_XLIM))
axis(2, at = pretty(DEV_YLIM), las = 1)
mtext("HK1980 Grid East (km)",  side = 1, line = 2.5)
mtext("HK1980 Grid North (km)", side = 2, line = 2.7)
title(main = "KDE — likelihood CV (ppl)", line = 1.5)
plot(st_geometry(hk18_km), add = TRUE, border = "white", lwd = 0.4)

plot(d_scott,
     main = "",
     col  = MAKO_REV(256),
     xlim = DEV_XLIM, ylim = DEV_YLIM,
     axes = FALSE, ann = FALSE)
axis(1, at = pretty(DEV_XLIM))
axis(2, at = pretty(DEV_YLIM), las = 1)
mtext("HK1980 Grid East (km)",  side = 1, line = 2.5)
mtext("HK1980 Grid North (km)", side = 2, line = 2.7)
title(main = "KDE — Scott rule", line = 1.5)
plot(st_geometry(hk18_km), add = TRUE, border = "white", lwd = 0.4)

par(mfrow = c(1, 1))

4.2 Second-order properties: clustering structure

K_hom <- Kest(pattern.um,
              correction = c("border", "isotropic", "Ripley", "best"))
K_inh <- Kinhom(pattern.um,
                correction = c("border", "isotropic", "Ripley", "best"))
L_inh <- Linhom(pattern.um)
g_inh <- pcfinhom(pattern.um)

par(mfrow = c(2, 2), mar = c(4, 4, 2, 1))
plot(K_hom, main = "K (homogeneous)")
plot(K_inh, main = "K (inhomogeneous)")
plot(L_inh, . - r ~ r, main = "L_inhom centred on r")
plot(g_inh, main = "g_inhom (pair correlation)")

par(mfrow = c(1, 1))

If \(K_{\text{inhom}}\) stays above the CSR line after centring on the inhomogeneous intensity, the residual clustering is additional to first-order variation — that is, sites attract more sites given the local intensity. We confirm with a Monte Carlo envelope and the MAD test.

# Smoke-test pass: nsim=9 + rmax=3 km to keep this chunk under ~2 min.
# For the FINAL report, raise nsim to 99 and drop the rmax cap.
NSIM_FAST <- 9
RMAX_KM   <- 3

E_g <- envelope(pattern.um, Linhom,
                nsim = NSIM_FAST, rank = 1, global = TRUE,
                rmax = RMAX_KM, verbose = FALSE)
plot(E_g, . - r ~ r,
     main = sprintf("Linhom global envelope (%d sims, r ≤ %d km)",
                    NSIM_FAST, RMAX_KM))

mad.test(pattern.um,  Linhom, nsim = NSIM_FAST, rmax = RMAX_KM, verbose = FALSE)
## 
##  Maximum absolute deviation test of CSR
##  Monte Carlo test based on 9 simulations
##  Summary function: L[inhom](r)
##  Reference function: theoretical
##  Alternative: two.sided
##  Interval of distance values: [0, 3] km
##  Test statistic: Maximum absolute deviation
##  Deviation = observed minus theoretical
## 
## data:  pattern.um
## mad = 1.4, rank = 1, p-value = 0.1
dclf.test(pattern.um, Linhom, nsim = NSIM_FAST, rmax = RMAX_KM, verbose = FALSE)
## 
##  Diggle-Cressie-Loosmore-Ford test of CSR
##  Monte Carlo test based on 9 simulations
##  Summary function: L[inhom](r)
##  Reference function: theoretical
##  Alternative: two.sided
##  Interval of distance values: [0, 3] km
##  Test statistic: Integral of squared absolute deviation
##  Deviation = observed minus theoretical
## 
## data:  pattern.um
## u = 4.4, rank = 1, p-value = 0.1

4.3 Modelling intensity with covariates

Following class 8, we move from description to a Poisson process model. Log-intensity is loglinear in covariates: \(\log \lambda(u) = \beta_0 + \beta_1\,\text{pop}(u) + \beta_2\,\text{viirs}(u) + \beta_3\,\text{road}(u) + \beta_4\,\text{dtran}(u) + \alpha_{\text{zone}(u)}\).

m0 <- ppm(pattern.um ~ 1)
m1 <- ppm(pattern.um ~ im_pop)
m2 <- ppm(pattern.um ~ im_pop + im_viirs)
m3 <- ppm(pattern.um ~ im_pop + im_viirs + im_road + im_dtran)
m4 <- ppm(pattern.um ~ im_pop + im_viirs + im_road + im_dtran + im_zone)

coef(summary(m4))
data.frame(model = c("m0", "m1", "m2", "m3", "m4"),
           AIC   = c(AIC(m0), AIC(m1), AIC(m2), AIC(m3), AIC(m4)))
anova(m0, m1, m2, m3, m4, test = "LR")

A unit increase in any covariate multiplies the intensity by \(\exp(\beta)\). We expect \(\beta_{\text{pop}}, \beta_{\text{viirs}}, \beta_{\text{road}} > 0\) (more people / activity / connectivity \(\to\) more POIs) and \(\beta_{\text{dtran}} < 0\) (further from transit \(\to\) fewer POIs). \(\alpha_C\) should be the largest zone effect.

4.4 Marked Poisson model: how do tiers respond to covariates?

m_marks  <- ppm(pattern.m ~ marks)
m_marksx <- ppm(pattern.m ~ marks + im_pop + im_viirs + im_road + im_dtran)
m_inter  <- ppm(pattern.m ~ marks * im_pop +
                              im_viirs + im_road + im_dtran)

coef(summary(m_inter))
anova(m_marksx, m_inter, test = "LR")

The interaction \(\text{marks} \times \text{im\_pop}\) tests whether residential population pulls in different tiers at different rates: large hubs are expected to scale less than 1-for-1 with population (catchment areas are bigger), while small pick-up points should scale closer to 1-for-1.

4.5 Cross-pattern interactions and segregation

Three plots, each answering a different sub-question about how the three station tiers relate in space:

Plot Question Reads
alltypes(Kcross.inhom) (9-panel) At a given distance r, does tier i attract or repel tier j relative to CSR independence? A curve above the dashed envelope = attraction; below = repulsion.
relrisk(casecontrol = TRUE, control = "Small") Where on the map does each non-control tier dominate, relative to Small? Bright pixels = Medium/Large is more likely than Small in that pixel; dark = vice versa.
envelope(Lcross, simulate = rlabel) (Large vs Small, Large vs Medium) Is the cross-pattern significantly different from random labelling at small r? Observed curve outside the grey band ⇒ reject “marks are exchangeable” ⇒ spatial segregation.

The redundant 9-panel markconnect and the second relrisk(control = "Medium") plot were dropped from the previous draft — they restate the same signal.

# 1. Pairwise inhomogeneous K_cross — the canonical second-order summary
plot(alltypes(pattern.m, Kcross.inhom, rmax = RMAX_KM),
     main = sprintf("Inhomogeneous K_cross by station tier (r ≤ %d km)",
                    RMAX_KM))

# 2. Log-relative-risk surface: where is each tier favoured over Small?
plot(relrisk(pattern.m, casecontrol = TRUE, relative = TRUE,
             control = "Small"),
     main = "Relative risk: tier vs Small (log scale)",
     col  = MAKO_REV(256))

# 3. Random-label envelopes on Lcross — formal segregation tests for the
#    two pairs that matter operationally (hub vs gap-filler, hub vs feeder).
E_seg_LS <- envelope(
  pattern.m, Lcross,
  i = "Large", j = "Small",
  nsim     = NSIM_FAST,
  rmax     = RMAX_KM,
  simulate = expression(rlabel(pattern.m)),
  global   = TRUE,
  verbose  = FALSE
)
plot(E_seg_LS, . - r ~ r,
     main = sprintf(
       "Random-label envelope on L_cross(Large, Small) (%d sims)",
       NSIM_FAST))

E_seg_LM <- envelope(
  pattern.m, Lcross,
  i = "Large", j = "Medium",
  nsim     = NSIM_FAST,
  rmax     = RMAX_KM,
  simulate = expression(rlabel(pattern.m)),
  global   = TRUE,
  verbose  = FALSE
)
plot(E_seg_LM, . - r ~ r,
     main = sprintf(
       "Random-label envelope on L_cross(Large, Medium) (%d sims)",
       NSIM_FAST))

A positive \(K_{\text{cross,Large,Medium}}(r)\) above the independence line for small \(r\) implies that medium stations sit near large hubs — consistent with a hub-and-spoke spatial logic. If \(K_{\text{cross,Large,Small}}(r)\) stays close to the independence line, small pick-up points are spatially complementary and disperse to fill catchment gaps.

4.6 Goodness of fit and residual diagnostics

# Four-panel diagnostic (Baddeley class 8, p. 401):
# top-left: residual measure | top-right: lurking on Y
# bot-left: lurking on X     | bot-right: smoothed residual field
# This panel set already includes the smoothed-residual surface, so we
# do NOT call Smooth(residuals(.)) separately — Smooth.msr -> density.ppp
# is fragile around boundary pixels where predict.ppm can be NA.
par(mar = c(4, 4, 3, 4), cex.axis = 0.85, cex.lab = 0.9)
# diagnose.ppm's default "image" panels use a fluorescent green background
# that clashes with the rest of the report's mako palette.
# Switch spatstat to monochrome (greyscale) for this one call only, then
# restore. Combined with plot.smooth = "contour" the bottom-right panel
# becomes contours-only.
old_mono <- spatstat.options("monochrome")
spatstat.options(monochrome = TRUE)
diagnose.ppm(m4, plot.smooth = "contour")

spatstat.options(monochrome = old_mono)

# Partial residuals: how does the fitted log-intensity track each
# continuous covariate?
par.res_pop <- parres(m4, "im_pop")
plot(par.res_pop, main = "Partial residual: population")

par.res_dt <- parres(m4, "im_dtran")
plot(par.res_dt, main = "Partial residual: distance to transit")

# Lurking variable plot: any signed-residual trend along im_dtran?
# The envelope simulation prints "1, 2, 3, ..." to stdout for each pattern;
# capture.output() keeps that out of the rendered HTML while still letting
# the plot reach the graphics device.
invisible(capture.output(
  lurking(m4, im_dtran, type = "raw", cumulative = FALSE,
          envelope = TRUE, nsim = NSIM_FAST,
          main = "Lurking: distance to transit")
))

## Model diagnostics (raw residuals)
## Diagnostics available:
##  four-panel plot
##  mark plot 
##  smoothed residual field
##  x cumulative residuals
##  y cumulative residuals
##  sum of all residuals
## sum of raw residuals in entire window = -0.0000002796
## area of entire window = 2754
## quadrature area = 825.9
## range of smoothed field =  [-0.352, 0.2126]

4.7 Validation against Hongkong Post

post <- jsonlite::fromJSON(fp_post, simplifyDataFrame = TRUE)$data %>%
  as_tibble() %>%
  mutate(longitude = as.numeric(longitude),
         latitude  = as.numeric(latitude)) %>%
  filter(!is.na(longitude), !is.na(latitude))

post_sf <- st_as_sf(post, coords = c("longitude", "latitude"),
                    crs = CRS_LL) %>%
  st_transform(CRS_HK) %>%
  st_intersection(hk_bdry)

post_xy <- st_coordinates(post_sf)
pattern.po <- ppp(x = post_xy[, 1], y = post_xy[, 2], window = W) %>%
  rjitter(1) %>%
  rescale.ppp(rsc, "km")

cat("Hongkong Post offices (in window):", npoints(pattern.po), "\n")
## Hongkong Post offices (in window): 119
combined <- superimpose(
  poi  = unmark(pattern.um),
  post = pattern.po
)

par(mar = c(5, 4.5, 4, 3), bty = "n")
plot(Window(pattern.um),
     main   = "",
     xlim   = DEV_XLIM, ylim = DEV_YLIM,
     border = "grey60",
     axes   = FALSE, ann = FALSE)
axis(1, at = pretty(DEV_XLIM))
axis(2, at = pretty(DEV_YLIM), las = 1)
mtext("HK1980 Grid East (km)",  side = 1, line = 2.5)
mtext("HK1980 Grid North (km)", side = 2, line = 2.7)
plot(st_geometry(hk18_km), add = TRUE, border = "grey70", lwd = 0.3)
plot(unmark(pattern.um), add = TRUE,
     pch = 20, cex = 0.25, cols = "grey60")
plot(pattern.po, add = TRUE, pch = 24, cex = 0.9,
     bg = "gold", cols = "black")
title(main = "POIs and Hongkong Post offices", line = 2)
legend("topright", inset = 0.01,
       legend = c("POI (CMF/COM)", "Post office"),
       pch    = c(20, 24),
       col    = c("grey60", "black"),
       pt.bg  = c(NA, "gold"),
       pt.cex = c(0.8, 1),
       bty    = "o", bg = "white", cex = 0.9)

K_cross_pp <- Kcross(combined, i = "post", j = "poi",
                     correction = c("isotropic", "Ripley"))
plot(K_cross_pp, main = "K_cross: post offices vs. POIs")

dpost <- nncross(unmark(pattern.um), pattern.po)$dist
hist(dpost * rsc, breaks = 60,
     col = "steelblue", border = "white",
     main = "Demand POI -> nearest Hongkong Post office",
     xlab = "Distance (m)")

A \(K_{\text{cross}}(r)\) above the theoretical line at small \(r\) confirms that the empirical post-office network already captures the high-demand zones our model predicts. Where it does not — that is, predicted hot-spots that are far from any post office — those are the locations where new courier stations would add the most marginal coverage.

5 Conclusions

5.1 What the model actually told us

5.1.1 1. Demand is overwhelmingly clustered

Three model-free CSR tests reject randomness with all p < 2 × 10⁻¹⁶:

Test Statistic Interpretation
Pearson χ² (10×10 quadrats) χ² = 22,605, df = 93 observed counts ≫ Poisson expectation
Clark–Evans (nearest neighbour) R = 0.20 mean NN distance is 20 % of CSR; extreme aggregation
Hopkins–Skellam A = 0.0023 one of the most clustered patterns the test can return

The Linhom global envelope at r ≤ 3 km gave MAD p = 0.1 and DCLF p = 0.1 — but this is the floor of a 9-simulation Monte-Carlo test (p ∈ {0.1, 0.2, …, 1.0}), so the second-order structure cannot be conclusively tested at this nsim. The CSR-test rejection is so strong, however, that this is not a substantive issue for the conclusions.

5.1.2 2. Distance to transit is the dominant driver, not VIIRS

The full Poisson model m4 reduces AIC from 5,106 (intercept-only) to −15,060 — an AIC drop of 20,166 points over 7 covariate degrees of freedom. All five covariates and three zone contrasts are significant at p < 0.001.

Covariate β exp(β) Interpretation
Intercept +1.454 4.28 Baseline POI/km² in commercial (C) zone, zero covariates
pop +0.000 915 1.001 +1000 residents → intensity ×2.50
viirs −0.00802 0.992 Net negative after controls — see below
road_density +0.151 1.16 +1 km road/km² → intensity ×1.16
dist_transit −2.49 0.083 +1 km from transit → intensity drops 92 %
zone = I −1.354 0.26 Industrial zone POI density = 26 % of C
zone = O −0.622 0.54 Other = 54 % of C
zone = R −0.193 0.82 Residential = 82 % of C

Two findings deserve emphasis.

  • Public-transit accessibility is the single strongest predictor. Each kilometre further from the nearest MTR / LRT / ferry / bus terminus cuts commercial-POI intensity by 92 %. This dominates every other covariate by roughly an order of magnitude on the log-intensity scale.
  • VIIRS turns negative after controlling for population, road density, transit, and zone. Naively brighter areas have more POIs (the raw correlation), but once we remove what population, roads, transit and zoning already explain, the residual brightness predicts fewer POIs. The most plausible explanation is that VIIRS captures high-rise residential lighting and highway sky-glow that have already been accounted for by pop and road, leaving a contaminating signal on top. Practical implication: do not use NTL alone as a parcel-demand proxy for Hong Kong — it is collinear with cleaner measures and adds noise when stacked on them.

Ordering by fitted intensity at otherwise-average covariates: C ≫ R > O > I. Commercial zones beat industrial zones by a factor of ≈ 4 in commercial-POI intensity, exactly as expected.

5.1.3 3. Tiers are spatially distinct, but not the way the proposal predicted

The interaction model m_inter improves over the additive marked model m_marksx with LR deviance = 53.2 on 2 d.f., p = 2.8 × 10⁻¹² — the three tiers respond to population differently. But the direction is opposite to the original hypothesis:

  • marksMedium : im_pop = +0.000 60 (Z = 6.4, p < 0.001)
  • marksLarge : im_pop = +0.000 59 (Z = 6.3, p < 0.001)

That is, Medium and Large tiers scale more steeply with population than Small, not less. The hub-and-spoke “large hubs serve big catchments” story is not supported: at the spatial resolution of this analysis, Large and Medium tiers pile up where the population already is. Small POIs are the only tier that holds its own in low-population areas.

This is partly an artefact of the mark construction: tiers are derived from VIIRS terciles, and VIIRS is itself correlated with population, so the interaction is partly mechanical. The substantive interpretation that survives this caveat is that Small stations are the right instrument for population gaps — not because Large scales poorly, but because Large simply does not appear there.

The random-label Lcross envelopes (Large vs Small, Large vs Medium) at r ≤ 3 km test the segregation H₀ that marks are exchangeable. With nsim = 9 the p-floor is again 0.1, so this is suggestive rather than conclusive; the visible departure of the observed Lcross from the grey band in the report is consistent with non-random labelling but should be re-run at nsim = 99 for the final version.

5.2 Specific siting recommendation

Combining the fitted intensity surface with the post-office distance field, the two “top-10” tables in §4.8 give the actionable list. The WGS84 lon/lat columns can be pasted straight into Google Maps.

Tier-1 (Large hubs) — the absolute-intensity list. Nine of the top 10 cells sit inside the Yau Tsim Mong / Sham Shui Po corridor:

Rank Approx area (lon, lat) λ̂ (POIs / km²)
1 Yau Ma Tei – Mong Kok (114.2, 22.32) 236.6
2 Mong Kok (114.2, 22.32) 219.3
3 Tai Kok Tsui / Sham Shui Po (114.2, 22.33) 159.8
4–7 Same corridor 100–115
8 Hung Hom / TST East (114.2, 22.29) 89.6
10 Tin Shui Wai (NT West, outlier) (114.0, 22.44) 86.1

Eight of the ten are within 500 m of an existing Hongkong Post office, so a new Large hub here adds throughput rather than coverage. The NT-West outlier (Tin Shui Wai) is the only Tier-1 cell that is also a real coverage gap (666 m to the nearest PO) — strongly recommended as the first Tier-1 site to actually build.

Tier-2 (Medium stations) — the marginal-coverage list (most useful output of the whole study). Top 10 high-intensity cells that are > 500 m from any post office:

  • 6 of 10 in Kowloon mid-corridor (Yau Ma Tei west, Hung Hom, Mong Kok west) — 500–700 m away from the nearest PO, λ̂ 47–113
  • 3 of 10 in NT-North / NT-West that the existing post-office network under-serves:
    • Rank 2: Tin Shui Wai (114.0°E, 22.44°N) — λ̂ = 86, 666 m to PO
    • Rank 7: Yau Tong / Lam Tin north (114.2°E, 22.30°N) — 1.07 km to PO (the largest PO-distance among the top 10)
    • Rank 10: Sheung Shui / Fanling (114.1°E, 22.50°N) — λ̂ = 46, 697 m to PO
  • 1 in Kwun Tong–Kowloon Bay corridor (114.2°E, 22.32°N)

These are the operationally optimal sites. The NT-North cells (Rank 2, 7, 10) are especially attractive because their relative-coverage gap is much larger than in central Kowloon.

Tier-3 (Small stations) — fill in along transit corridors in the mid-intensity band (tier_im = 1). Because each kilometre away from transit slashes intensity by 92 %, small pickup points should sit within 300 m of a transit node (MTR, light rail, ferry pier, or bus terminus from iGeoCom) to remain operationally relevant. The natural Small-station belt: Tseung Kwan O, Sha Tin, Tuen Mun, Tai Po, and the further NT outlying-island piers (Mui Wo, Peng Chau, Cheung Chau).

5.3 Caveats

  • POIs are an imperfect proxy for parcel demand. Office buildings, in particular, generate parcels at a per-POI rate that dwarfs convenience stores. Refining a demand_weight by POI sub-type is the most consequential improvement to the next iteration.
  • The envelope tests (Linhom, segregation) were run at nsim = 9 to keep knit time tractable. Final p-values are quantised to 0.1; re-run at nsim = 99 for publication.
  • VIIRS nighttime light, as noted above, behaves anomalously in dense vertical cities: high-rise residential lighting saturates the signal. Do not interpret the negative β on im_viirs as “darker areas have more shops”; interpret it as “after controlling for what we already measure cleanly, the residual VIIRS variation is noise”.
  • The Hong Kong administrative window includes country parks and outlying islands. The country parks pull the global intensity down and may artificially inflate the “clustering” measured by K. A robustness check using only the developed-area envelope (e.g., OZP non-O polygons) is recommended.
  • Marks are derived from VIIRS terciles and so by construction correlate with the im_viirs covariate. Mark coefficients in the interaction model should be read as conditional on this endogeneity.

5.4 Connection to the first-semester project

The K-means tiers from Qu (2026, RPubs/1391023) and the NTL-derived tiers here agree on macro patterns (Central / Tsim Sha Tsui / Causeway Bay are top-tier in both) but disagree on meso patterns: industrial Tuen Mun and Kwai Chung Container Terminal are mid-tier under VIIRS (because they are lit but not residential-dense), while K-means classified them as top-tier (because they are working-population-dense). This is the VIIRS-vs-population signal that the m4 ppm picks apart explicitly. The next iteration should use both as joint marks.

5.4.1 Next iteration

  • Add a Thomas cluster process (kppm with clusters = "Thomas") as an alternative to the inhomogeneous Poisson if the residual second-order structure remains strong after m4.
  • Use the kernstadapt and stpp machinery from class 10–11 to add a space–time dimension once parcel-flow time series become available.
  • On the line-network side, repeat the analysis with lpp on the major-road linnet (class 10) so that distances respect the road network, not the Euclidean metric.

6 References

  • Arbia, G., Espa, G., & Giuliani, D. (2021). Spatial Microeconometrics. Routledge.
  • Baddeley, A., Rubak, E., & Turner, R. (2015). Spatial Point Patterns: Methodology and Applications with R. CRC Press.
  • Kopczewska, K. (2020). Applied Spatial Statistics and Econometrics: Data Analysis in R. Routledge.
  • Qu, K. (2026). Clustering Hong Kong Areas by Working Population and Income. RPubs/1391023. https://rpubs.com/KaiwenQu/1391023
  • Schroeder, K., & Cesarini, A. (2022). Optimal locker-network design for last-mile delivery. Transportation Research Part E, 162, 102716.
  • Wang, Y., Zhang, D., Liu, Q., Shen, F., & Lee, L. H. (2019). Towards enhancing the last-mile delivery: An effective crowd-tasking model with scalable solutions. Transportation Research Part E, 93, 279–293.

6.0.1 Data sources

  • iGeoCom POIs — Lands Department, Hong Kong (CSDI Portal).
  • 18-district administrative boundary — Home Affairs Department / DATA.GOV.HK.
  • WorldPop 2020 100 m population (hkg_ppp_2020.tif) — WorldPop.org.
  • VIIRS Nighttime Light V2 annual median masked, 2024 — NOAA EOG, Earth Observation Group, Colorado School of Mines.
  • OpenStreetMap roads via the BBBike Hong Kong extract.
  • Statutory Plans GeoJSON (OZP) — Town Planning Board, Hong Kong, via DATA.GOV.HK.
  • Hongkong Post office locations — Hongkong Post via DATA.GOV.HK.