1 Introduction 引言

Hong Kong is one of the densest urban economies in the world, with one of the highest e-commerce penetration rates in Asia. Last-mile parcel logistics — sorting hubs, locker stations, and pick-up points — therefore competes for very limited urban space. Where these facilities are placed has direct effects on delivery cost, courier travel distance, and customer waiting time. The economic question we ask is concrete: given where commercial activity actually takes place in Hong Kong, where should large, medium, and small courier stations be located?

香港是全球人口密度最高的城市经济体之一，其电子商务渗透率在亚洲名列前茅。因此，末端包裹物流——包括分拣中心、智能柜站点和自提点——不得不争夺极为有限的城市空间。这些设施的选址直接影响配送成本、快递员行驶距离以及客户等待时间。我们提出的经济问题非常具体：鉴于香港商业活动的实际分布情况，大型、中型和小型快递站应设在何处？

In the first-semester project (Qu, 2026, RPubs/1391023) we approached this question through area-level K-means clustering of working population and household income across Small Tertiary Planning Unit Groups (STPUGs). The clustering yielded three tiers of districts and informed a tiered station recommendation. That study had a clear limitation: it operated on aggregated polygons, so it could not say anything about within-area concentration of demand or about the geometry of station–station spacing.

在第一学期的项目中（Qu, 2026, RPubs/1391023），我们通过对小三级规划单元组（STPUGs）的就业人口与家庭收入进行区域层面的 K 均值聚类来探讨这一问题。聚类结果将各区划分为三个层级，并据此提出了分层站点建议。该研究存在明显局限：它基于聚合多边形进行分析，因而无法描述需求在区域内部的集中程度，也无法刻画站点间的空间几何关系。

This project upgrades the analysis to a spatial point pattern analysis (SPPA). We treat commercial points of interest (POIs) — the CMF/COM classes of Hong Kong’s official iGeoCom POI dataset — as a proxy for parcel demand, and we apply the methods discussed in classes 4–9 (Baddeley, Rubak & Turner, 2015; Kopczewska, 2020) to ask three questions.

本项目将分析升级为空间点模式分析（SPPA）。我们以商业兴趣点（POI）——即香港官方 iGeoCom 数据集中 CMF/COM 类别的点位——作为包裹需求的代理指标，并运用课程第 4–9 章所介绍的方法（Baddeley, Rubak & Turner, 2015；Kopczewska, 2020）探讨三个问题。

Is courier demand in Hong Kong clustered? We use first-order intensity (KDE, quadrat tests) and second-order summary functions (Ripley’s $K$, $L$, $g$) to test against complete spatial randomness (CSR), under both homogeneous and inhomogeneous baselines.
What drives the spatial intensity of demand? We fit inhomogeneous Poisson point process models with ppm() using resident population (WorldPop), nighttime light radiance (VIIRS Black Marble), road density (OSM), distance to the nearest public-transit node (multi-modal: MTR, light rail, ferry, bus terminal), and statutory land-use zone (Hong Kong OZP) as covariates.
How do tiered station types interact in space? We mark each demand point by the local NTL tercile (Small / Medium / Large) and study the marks with Kcross.inhom, markconnect, Iest, and a Monte-Carlo segregation test. We finally validate the model-implied high-demand zones against the empirical pattern of Hongkong Post offices via Kcross.
香港的快递需求是否存在空间聚集？ 我们采用一阶强度（KDE、样方检验）和二阶汇总函数（Ripley 的 $K$、$L$、$g$），在齐次与非齐次基线下，分别对完全空间随机性（CSR）假设进行检验。
哪些因素驱动需求的空间强度？ 我们使用 ppm() 拟合非齐次泊松点过程模型，以常住人口（WorldPop）、夜间灯光辐射亮度（VIIRS Black Marble）、道路密度（OSM）、距最近公共交通节点的距离（多模式：港铁、轻铁、渡轮、巴士总站）以及法定土地用途分区（香港 OZP）作为协变量。
不同层级站点类型在空间上如何互动？ 我们依据各需求点所在位置的 NTL 三分位数（小型/中型/大型）为其打上标记，并借助 Kcross.inhom、markconnect、Iest 以及蒙特卡洛分离检验对标记进行分析。最后，我们通过 Kcross 将模型预测的高需求区域与香港邮政局点位的实际分布进行比对验证。

The economic deliverable is a tiered map of recommended courier station sites, backed by significance tests rather than visual inspection alone.

本研究的经济成果是一份分层次的快递站选址推荐地图，其依据来自显著性检验，而非单纯的目视判断。

requiredPackages <- c(
  "sf", "spatstat", "terra", "sp", "magrittr",
  "ggplot2", "tidyverse", "viridis", "jsonlite",
  "dixon", "kableExtra", "gridExtra", "patchwork", "scales", "rnaturalearth"
)
for (p in requiredPackages) {
  if (!requireNamespace(p, quietly = TRUE)) install.packages(p)
}
invisible(lapply(requiredPackages, library, character.only = TRUE))

2 Literature review 文献综述

2.1 Spatial point pattern analysis - 空间点模式分析

The methodological backbone is Baddeley, Rubak & Turner (2015), Spatial Point Patterns: Methodology and Applications with R. We use the workflow taught in classes 4–9: build a ppp object inside an owin window, summarise its first-order intensity (homogeneous and inhomogeneous), test CSR with quadrat.test, clarkevans.test, and hopskel.test, and study second-order structure with Kest, Lest, and pcf (with their inhomogeneous counterparts Kinhom, Linhom, pcfinhom). Departures from CSR are assessed with global and pointwise envelopes and with the mad.test and dclf.test Monte Carlo tests (classes 5 and 8).

本研究的方法论基础为 Baddeley, Rubak & Turner（2015）所著《空间点模式：方法论及其在 R 中的应用》。我们沿用课程第 4–9 章所教授的工作流：在 owin 窗口内构建 ppp 对象，汇总其一阶强度（齐次与非齐次），用 quadrat.test、clarkevans.test 和 hopskel.test 检验 CSR，并借助 Kest、Lest 和 pcf（及其非齐次版本 Kinhom、Linhom、pcfinhom）研究二阶结构。对 CSR 的偏离程度通过全局与逐点置信带以及 mad.test 和 dclf.test 蒙特卡洛检验进行评估（第 5 和 8 章）。

Kopczewska (2020), Applied Spatial Statistics and Econometrics, provides the framing for moving from spatial description to a regression-style explanation: the intensity of points is treated as the dependent variable in a Poisson process model whose log-intensity is loglinear in covariates. We follow its applied treatment of spatial residual diagnostics, lurking variable plots, and the distinction between inhomogeneity that is covariate-explained and inhomogeneity that is residual clustering.

Kopczewska（2020）所著《应用空间统计与计量经济学》为从空间描述转向回归式解释提供了分析框架：将点的强度视为泊松过程模型中的因变量，其对数强度对协变量呈对数线性关系。本文遵循该书在空间残差诊断、潜在变量图以及区分协变量可解释的非齐次性与残差聚集性方面的应用处理方式。

Arbia, Espa & Giuliani (2021), Spatial Microeconometrics, motivates the economic framing. They argue that firms (and, by extension, retail-related points) cannot be modelled as if independently located: agglomeration externalities, infrastructure access, and zoning constraints generate observable spatial dependence. The authors recommend a two-step strategy — first remove inhomogeneity attributable to first-order covariates, then test whether second-order clustering remains. We adopt this recommendation: the inhomogeneous K function $K_{\text{inhom}}$ is interpreted only after fitting the inhomogeneous Poisson model.

Arbia, Espa & Giuliani（2021）所著《空间微观计量经济学》为本研究的经济分析框架提供了理论动因。他们指出，企业（以及由此延伸的零售相关点位）不能被建模为相互独立的选址决策：集聚外部性、基础设施可达性以及用地管制共同产生了可观测的空间依赖性。作者建议采用两步策略——首先剔除一阶协变量所致的非齐次性，再检验二阶聚集是否依然存在。本文采纳这一建议：非齐次 K 函数 $K_{\text{inhom}}$ 仅在拟合非齐次泊松模型之后才作解释。

2.2 Marked patterns and multitype Poisson processes - 标记模式与多类型泊松过程

For the tiered-station question we follow class 9 (rmpoispp, ppm(... ~ marks), Kcross.inhom, markconnect, Iest, relrisk.ppp). The literature on retail and firm location (Arbia et al., 2021, ch. 4–5) treats interaction between firm types as analogous to interaction between species in ecology, but with the added constraint that economic types compete for and segregate within land-use zones. This motivates using zone as a covariate in the marked model and segregation.test.ppp as a model-free test of mark separation.

针对分层站点问题，本文遵循第 9 章的方法（rmpoispp、ppm(... ~ marks)、Kcross.inhom、markconnect、Iest、relrisk.ppp）。零售与企业区位文献（Arbia 等，2021，第 4–5 章）将企业类型间的互动类比于生态学中的物种间互动，但附加了一个经济学约束：不同经济类型在土地用途分区内相互竞争并发生空间分离。这一逻辑促使本文将分区作为标记模型的协变量，并以 segregation.test.ppp 作为标记分离的无模型检验工具。

2.3 POIs as a proxy for parcel demand - 以兴趣点作为包裹需求的代理指标

Direct parcel-flow data are not publicly available for Hong Kong. We follow a literature that uses POIs as a high-resolution proxy for commercial activity and for delivery demand (Wang et al., 2019, on POI density as a logistics demand indicator; Schroeder & Cesarini, 2022, on parcel-locker siting). The implicit assumption is monotonicity: areas with more shops, offices, and food-and-beverage points generate more parcels, even if the per-POI parcel rate differs by type.

香港没有公开可用的包裹流量直接数据。本文沿用以兴趣点作为商业活动与配送需求高分辨率代理指标的相关文献（Wang 等，2019，关于 POI 密度作为物流需求指标；Schroeder & Cesarini，2022，关于智能柜选址）。其隐含假设是单调性：拥有更多商店、写字楼和餐饮点的地区会产生更多包裹，即便各类型 POI 的单位包裹量存在差异。

Compared with OpenStreetMap, the official Hong Kong iGeoCom dataset (Lands Department) supplies categorically clean and quality-controlled POIs with classification codes (CLASS/TYPE/SUBCAT) that map directly to the zoning system. We therefore use iGeoCom for both demand POIs and transport-node accessibility; OSM is used only for road density.

与 OpenStreetMap 相比，香港地政总署提供的官方 iGeoCom 数据集包含分类清晰、经过质量控制的兴趣点，其分类编码（CLASS/TYPE/SUBCAT）可直接与分区系统对应。因此，本文在需求 POI 和交通节点可达性两方面均采用 iGeoCom 数据集；OSM 仅用于计算道路密度。

2.4 Validation against existing logistics infrastructure - 与现有物流基础设施的验证比对

Following Arbia et al. (2021, ch. 6), we treat Hongkong Post offices as an observed control point pattern: they were sited under a public-service mandate (universal coverage), not under profit maximisation. If our predicted high-intensity zones spatially coincide with post offices at short distance (low $r$ in $K_{\text{cross}}$), this confirms that the predicted zones are operationally accessible. If at intermediate distance the cross-K is below the independence baseline, this is consistent with post offices having captured the easy sites and our recommendations identifying complementary, non-overlapping locations.

遵循 Arbia 等（2021，第 6 章）的做法，本文将香港邮政局点位视为一种观测到的对照点模式：其选址基于公共服务职责（普遍覆盖），而非利润最大化。若本文预测的高强度区域在短距离内（$K_{\text{cross}}$ 中较小的 $r$）与邮政局点位在空间上重合，则证实预测区域具备实际可达性。若在中等距离处，交叉 K 函数值低于独立性基线，则表明邮政局已占据了条件优越的站点，而本文的推荐选址是互补的、非重叠的位置。

3 Data — description and graphs - 数据：描述与图表

3.1 Sources - 数据来源

The dataset is constructed exactly as specified in HK_dataset_checklist.xlsx. All paths are relative to the project folder.

数据集完全按照 HK_dataset_checklist.xlsx 中的规范构建。所有路径均相对于项目文件夹。

Block	Variable	File	CRS
Window	18 districts boundary	`data/hksar_18_district_boundary.json`	WGS84
Demand	iGeoCom POIs (`CLASS` ∈ `CMF`, `COM`)	`data/iGeoCom.geojson`	WGS84
Transit	iGeoCom transit nodes (`TYPE` ∈ RSN/LRA/FER/PIE/FET/BUS)	`data/iGeoCom.geojson`	WGS84
Covariate	Resident population	`data/hkg_ppp_2020.tif` (WorldPop)	WGS84
Covariate	Nighttime light radiance	`data/VNL_npp_2024_..._median_masked.dat.tif.gz`	WGS84
Covariate	Road segments (for road density)	`data/hong-kong-260507-free.shp/gis_osm_roads_free_1.shp`	WGS84
Covariate	OZP land-use zoning	`data/GeoJSON_Statutory_Plans/Statutory Plan GIS Data GeoJSON/ZONE.json`	HK1980
Validation	Hongkong Post offices	`data/post-office.json`	WGS84

All layers are projected to HK1980 Grid (EPSG:2326) for analysis (units in metres, then rescaled to km in the ppp object).

所有图层均投影至 HK1980 格网（EPSG:2326） 进行分析（单位为米，在 ppp 对象中进一步缩放为千米）。

data_dir <- "data"

fp_bdry <- file.path(data_dir, "hksar_18_district_boundary.json")
fp_igeocom <- file.path(data_dir, "iGeoCom.geojson")
fp_pop <- file.path(data_dir, "hkg_ppp_2020.tif")
fp_viirs <- file.path(
  data_dir,
  "VNL_npp_2024_global_vcmslcfg_v2_c202502261200.median_masked.dat.tif.gz"
)
fp_roads <- file.path(
  data_dir, "hong-kong-260507-free.shp",
  "gis_osm_roads_free_1.shp"
)
fp_ozp_zone <- file.path(
  data_dir, "GeoJSON_Statutory_Plans",
  "Statutory Plan GIS Data GeoJSON", "ZONE.json"
)
fp_post <- file.path(data_dir, "post-office.json")

CRS_HK <- 2326 # HK 1980 格网坐标系（单位：米）
CRS_LL <- 4326 # WGS 84 地理坐标系

3.2 Helper: read Esri JSON polygon layer - 辅助函数：读取 Esri JSON 多边形图层

The Hong Kong Town Planning Board ships the OZP statutory plans as Esri JSON (despite the folder name). The structure is features[[i]]$attributes + features[[i]]$geometry$rings, not the properties + coordinates of GeoJSON. We write a small parser.

香港城市规划委员会以 Esri JSON 格式（尽管文件夹名称含有 GeoJSON）发布 OZP 法定图则。其数据结构为 features[[i]]$attributes + features[[i]]$geometry$rings，而非 GeoJSON 的 properties + coordinates。为此，我们编写了一个小型解析器。

read_esri_polygon <- function(path) {
  raw <- jsonlite::fromJSON(path, simplifyVector = FALSE)
  sr <- raw$spatialReference
  # 读取坐标参考系：优先使用 latestWkid，其次 wkid，默认 2326
  crs_in <- if (!is.null(sr$latestWkid)) {
    sr$latestWkid
  } else if (!is.null(sr$wkid)) {
    sr$wkid
  } else {
    2326
  }
  feats <- raw$features
  # 将每个要素的 rings 转换为 sf 多边形
  geoms <- lapply(feats, function(f) {
    rings <- lapply(f$geometry$rings, function(r) {
      matrix(unlist(r), ncol = 2, byrow = TRUE)
    })
    sf::st_polygon(rings)
  })
  # 提取属性表
  attrs <- do.call(rbind, lapply(feats, function(f) {
    as.data.frame(
      lapply(
        f$attributes,
        function(x) if (is.null(x)) NA else x
      ),
      stringsAsFactors = FALSE
    )
  }))
  sf::st_sf(attrs, geometry = sf::st_sfc(geoms, crs = crs_in))
}

3.3 Boundary and observation window - 边界与观测窗口

hk18 <- st_read(fp_bdry, quiet = TRUE) %>%
  st_transform(CRS_HK) %>%
  st_make_valid()

hk_bdry <- st_union(hk18) %>% st_make_valid()
W <- as.owin(hk_bdry)

# 缩放系数：1 km 对应 1000 m
rsc <- 1000

plot(W, main = "Hong Kong observation window (HK1980 Grid)")
plot(st_geometry(hk18), border = "grey60", add = TRUE)

3.4 iGeoCom POIs: demand points and transit nodes - iGeoCom 兴趣点：需求点与交通节点

igc <- st_read(fp_igeocom, quiet = TRUE)
igc <- st_transform(igc, CRS_HK)

# 裁剪至香港研究窗口范围内
igc <- igc[st_intersects(igc, hk_bdry, sparse = FALSE)[, 1], ]

# 需求 POI：商业类别
demand <- igc %>%
  filter(CLASS %in% c("COM", "CMF"))

# 交通节点（4 种交通方式——香港为岛屿地形，渡轮尤为重要）
transit <- igc %>%
  filter(
    (CLASS == "TRS" & TYPE %in% c("RSN", "LRA", "FER", "PIE", "FET")) |
      (CLASS == "BUS" & TYPE == "BUS")
  ) %>%
  mutate(transit_mode = case_when(
    TYPE == "RSN" ~ "MTR/Heavy rail",
    TYPE == "LRA" ~ "Light rail",
    TYPE %in% c("FER", "PIE", "FET") ~ "Ferry/pier",
    TYPE == "BUS" ~ "Bus terminus"
  ))

cat("Demand POIs (CLASS in {COM, CMF}):", nrow(demand), "\n")

## Demand POIs (CLASS in {COM, CMF}): 3875

cat("Transit nodes (multi-modal):     ", nrow(transit), "\n")

## Transit nodes (multi-modal):      1345

table(transit$transit_mode)

## 
##   Bus terminus     Ferry/pier     Light rail MTR/Heavy rail 
##            649            530             68             98

ggplot() +
  geom_sf(data = hk18, fill = NA, colour = "grey70", linewidth = 0.2) +
  geom_sf(data = demand, colour = "steelblue", size = 0.25, alpha = 0.5) +
  geom_sf(data = transit, aes(colour = transit_mode), size = 1, alpha = 0.9) +
  scale_colour_viridis_d(end = 0.85) +
  labs(
    title = "iGeoCom commercial POIs (blue) and transit nodes (coloured)",
    colour = "Transit mode"
  ) +
  theme_minimal(base_size = 11)

3.5 WorldPop population raster - WorldPop 人口栅格

pop <- terra::rast(fp_pop) # WGS84
pop <- terra::project(pop, paste0("EPSG:", CRS_HK))
pop <- terra::crop(pop, terra::vect(hk_bdry))
pop <- terra::mask(pop, terra::vect(hk_bdry))
names(pop) <- "pop"

plot(pop, main = "WorldPop 2020 — population density (HK)")
plot(st_geometry(hk18), add = TRUE, border = "white", lwd = 0.4)

3.6 VIIRS nighttime light raster - VIIRS 夜间灯光栅格

The Black Marble file ships gzipped — terra::rast() reads it through GDAL’s virtual /vsigzip/ driver.

Black Marble 文件以 gzip 压缩格式分发——terra::rast() 通过 GDAL 的虚拟 /vsigzip/ 驱动直接读取。

viirs_path <- paste0("/vsigzip/", normalizePath(fp_viirs))
viirs <- terra::rast(viirs_path)
viirs <- terra::crop(viirs, terra::ext(c(113.7, 114.6, 22.0, 22.7)))
viirs <- terra::project(viirs, paste0("EPSG:", CRS_HK))
viirs <- terra::crop(viirs, terra::vect(hk_bdry))
viirs <- terra::mask(viirs, terra::vect(hk_bdry))
names(viirs) <- "viirs"

# 将负值或 NA 像元替换为 0（被遮蔽的暗背景区域）
viirs[is.na(viirs) | viirs < 0] <- 0

plot(log1p(viirs), main = "VIIRS NTL 2024 (median masked, log1p scale)")
plot(st_geometry(hk18), add = TRUE, border = "white", lwd = 0.4)

3.7 Road density (OSM) - 道路密度（OSM）

We compute road-line density per km² with density.psp. Major and minor roads are kept; small footways are dropped to avoid spurious density in country-park trails.

我们使用 density.psp 计算每平方千米的道路线密度。保留主要及次要道路；剔除小型人行道，以避免郊野公园小径带来的虚假密度。

roads <- st_read(fp_roads, quiet = TRUE) %>%
  filter(fclass %in% c(
    "motorway", "trunk", "primary", "secondary",
    "tertiary", "residential", "unclassified",
    "service", "living_street"
  )) %>%
  st_transform(CRS_HK) %>%
  st_intersection(hk_bdry)

roads_lines <- st_cast(roads, "LINESTRING") %>%
  st_geometry()
psp_roads <- as.psp(roads_lines, window = W)
psp_roads <- rescale.psp(psp_roads, rsc, "km")

road_dens_im <- density.psp(psp_roads, sigma = 0.5, dimyx = 256)
plot(road_dens_im, main = "OSM road-line density (km / km²)")

3.8 OZP statutory land-use zoning - OZP 法定土地用途分区

Following the data dictionary: ZONE_LABEL carries the zone code, DESC_ENG its full name, SPUSE_ENG the sub-use annotation (important for OU “other specified uses”).

依据数据字典：ZONE_LABEL 为分区代码，DESC_ENG 为完整名称，SPUSE_ENG 为子用途注释（对 OU“其他指定用途”尤为重要）。

ozp <- read_esri_polygon(fp_ozp_zone) %>%
  st_transform(CRS_HK) %>%
  st_make_valid()

cat(
  "OZP polygons:", nrow(ozp), "across",
  length(unique(ozp$PLAN_NO)), "plans\n"
)

## OZP polygons: 12041 across 167 plans

# 按清单将分区合并为 4 大类（C / R / I / O）
ozp$land_use_zone <- dplyr::case_when(
  grepl("Industri", ozp$DESC_ENG, ignore.case = TRUE) |
    grepl("Industri", ozp$SPUSE_ENG, ignore.case = TRUE) ~ "I",
  grepl("^C(\\(|$|DA)", ozp$ZONE_LABEL) ~ "C",
  grepl("Commerc", ozp$DESC_ENG, ignore.case = TRUE) ~ "C",
  grepl("^R\\(|^V$", ozp$ZONE_LABEL) ~ "R",
  TRUE ~ "O"
)
ozp$land_use_zone <- factor(ozp$land_use_zone,
  levels = c("C", "R", "I", "O")
)

table(ozp$land_use_zone)

## 
##    C    R    I    O 
##  605 3859   86 7491

ggplot(ozp) +
  geom_sf(aes(fill = land_use_zone), colour = NA) +
  scale_fill_manual(values = c(
    C = "#d62728", R = "#1f77b4",
    I = "#2ca02c", O = "grey80"
  )) +
  labs(title = "OZP land use (4-class collapse)", fill = "Zone") +
  theme_minimal(base_size = 11)

3.9 Building the `ppp` objects - 构建 `ppp` 对象

xy_demand <- st_coordinates(demand)
xy_transit <- st_coordinates(transit)

# 无标记需求点模式
pattern.um <- ppp(x = xy_demand[, 1], y = xy_demand[, 2], window = W)
pattern.um <- rjitter(pattern.um, 1) # 约 1 米的随机抖动，用于去除重叠点
pattern.um <- rescale.ppp(pattern.um, rsc, "km") # 米 -> 千米

# 交通节点模式（用于推导距离协变量）
pattern.transit <- ppp(x = xy_transit[, 1], y = xy_transit[, 2], window = W)
pattern.transit <- rescale.ppp(pattern.transit, rsc, "km")

cat("Demand pattern:\n")

## Demand pattern:

print(pattern.um)

## Planar point pattern: 3875 points
## window: polygonal boundary
## enclosing rectangle: [799.2, 869.9] x [799.8, 847.6] km

cat("\nTransit pattern:\n")

## 
## Transit pattern:

print(pattern.transit)

## Planar point pattern: 1345 points
## window: polygonal boundary
## enclosing rectangle: [799.2, 869.9] x [799.8, 847.6] km

3.9.1 Marks: NTL-derived station tier - 标记：基于 NTL 推导的站点层级

Without semester-1 K-means cluster labels in data/, we derive marks data-drivenly from VIIRS NTL at each demand POI: terciles of NTL define Small / Medium / Large station candidates. This keeps the mark exogenous to the spatial coordinates while encoding economic activity, which is what the proposal calls for.

由于 data/ 中没有第一学期 K 均值聚类标签，本文数据驱动地从各需求 POI 处的 VIIRS NTL 值推导标记：以 NTL 三分位数划定小型/中型/大型站点候选类别。此方法使标记相对于空间坐标保持外生性，同时编码了经济活动信息，符合研究方案的要求。

demand$viirs_ntl <- terra::extract(viirs, terra::vect(demand))[, 2]
demand$viirs_ntl[is.na(demand$viirs_ntl)] <- 0

q <- quantile(demand$viirs_ntl, c(1 / 3, 2 / 3), na.rm = TRUE)
demand$station_tier <- cut(
  demand$viirs_ntl,
  breaks = c(-Inf, q, Inf),
  labels = c("Small", "Medium", "Large"),
  include.lowest = TRUE
)
table(demand$station_tier)

## 
##  Small Medium  Large 
##   1294   1304   1277

pattern.m <- ppp(
  x = xy_demand[, 1], y = xy_demand[, 2],
  window = W,
  marks = demand$station_tier
)
pattern.m <- rjitter(pattern.m, 1)
pattern.m <- rescale.ppp(pattern.m, rsc, "km")
print(pattern.m)

## Marked planar point pattern: 3875 points
## Multitype, with levels = Small, Medium, Large 
## window: polygonal boundary
## enclosing rectangle: [799.2, 869.9] x [799.8, 847.6] km

3.10 Covariate pixel images for `ppm()` - `ppm()` 所需的协变量像元图像

ppm() needs covariates as im objects on a common grid. We resample each raster to a 200 m grid and convert.

ppm() 要求协变量以统一格网上的 im 对象形式输入。我们将每个栅格重采样至 200 米格网后进行转换。

target_template <- terra::rast(terra::ext(terra::vect(hk_bdry)),
  resolution = 200,
  crs = paste0("EPSG:", CRS_HK)
)

resample_to <- function(r, template) {
  r2 <- terra::resample(r, template, method = "bilinear")
  r2 <- terra::mask(r2, terra::vect(hk_bdry))
  r2
}

pop_r <- resample_to(pop, target_template)
viirs_r <- resample_to(viirs, target_template)

# 将 terra SpatRaster 转换为 spatstat im 对象，单位为千米
terra_to_im <- function(r) {
  m <- terra::as.matrix(r, wide = TRUE)
  m <- m[nrow(m):1, ] # terra 从上至下，im 从下至上，需翻转
  ext <- terra::ext(r)
  im(m,
    xrange   = c(ext$xmin, ext$xmax) / rsc,
    yrange   = c(ext$ymin, ext$ymax) / rsc,
    unitname = c("km", "km")
  )
}

im_pop <- terra_to_im(pop_r)
im_viirs <- terra_to_im(viirs_r)
im_road <- road_dens_im # 已为千米单位
im_dtran <- distmap(pattern.transit) # 千米，基于已缩放的 ppp

# OZP 因子图像：栅格化后转为 im，并恢复因子水平
ozp_r <- terra::rasterize(terra::vect(ozp), target_template,
  field = "land_use_zone"
)
lv <- terra::levels(ozp_r)[[1]]
m_ozp <- terra::as.matrix(ozp_r, wide = TRUE)
m_ozp <- m_ozp[nrow(m_ozp):1, ]
ext_z <- terra::ext(ozp_r)
im_zone_int <- im(m_ozp,
  xrange   = c(ext_z$xmin, ext_z$xmax) / rsc,
  yrange   = c(ext_z$ymin, ext_z$ymax) / rsc,
  unitname = c("km", "km")
)
im_zone <- eval.im(factor(im_zone_int,
  levels = lv$ID,
  labels = lv[[2]]
))

par(mfrow = c(2, 2), mar = c(2, 2, 2, 1))
plot(im_pop, main = "Population (WorldPop)")
plot(im_viirs, main = "Nighttime light (VIIRS)")
plot(im_road, main = "Road density")
plot(im_dtran, main = "Distance to transit (km)")

par(mfrow = c(1, 1))
plot(im_zone, main = "Land-use zone (4-class)")

3.11 Descriptive statistics - 描述性统计

demand_xy_sf <- st_as_sf(
  data.frame(x = xy_demand[, 1], y = xy_demand[, 2]),
  coords = c("x", "y"), crs = CRS_HK
)
demand$pop <- terra::extract(pop_r, terra::vect(demand_xy_sf))[, 2]
demand$road_dens <- terra::extract(
  terra::rast(road_dens_im), terra::vect(demand_xy_sf)
)[, 2]
demand$dist_tran <- nncross(
  ppp(xy_demand[, 1] / rsc, xy_demand[, 2] / rsc, window = pattern.transit$window),
  pattern.transit
)$dist

demand %>%
  st_drop_geometry() %>%
  group_by(station_tier) %>%
  summarise(
    n = n(),
    pop_mean = mean(pop, na.rm = TRUE),
    viirs_mean = mean(viirs_ntl, na.rm = TRUE),
    road_mean = mean(road_dens, na.rm = TRUE),
    dtran_mean = mean(dist_tran, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  kable(
    digits = 3,
    caption = "Demand POI covariate means by NTL-derived station tier",
    col.names = c(
      "Tier", "N POIs", "Pop (per cell)", "VIIRS",
      "Road dens", "Dist to transit (km)"
    )
  ) %>%
  kable_styling(full_width = FALSE)

Demand POI covariate means by NTL-derived station tier
Tier	N POIs	Pop (per cell)	VIIRS	Road dens	Dist to transit (km)
Small	1294	409.0	33.99	NaN	0.291
Medium	1304	507.9	62.36	NaN	0.172
Large	1277	509.6	90.38	NaN	0.185

demand %>%
  st_drop_geometry() %>%
  ggplot(aes(x = log1p(pop), y = log1p(viirs_ntl), colour = station_tier)) +
  geom_point(alpha = 0.5, size = 0.6) +
  scale_colour_viridis_d(end = 0.85) +
  labs(
    title = "Demand POI covariates: residential population vs. nighttime light",
    subtitle = "Each point is one CMF/COM POI; tier from NTL terciles",
    x = "log(1 + WorldPop pop)", y = "log(1 + VIIRS NTL)",
    colour = "Tier"
  ) +
  theme_minimal(base_size = 11)

4 Quantitative analysis 定量分析

The economic question — “where should we put which station tier?” — drives the order of the analysis. We first establish that demand is non-random and inhomogeneous (4.1), study its second-order structure (4.2), identify the spatial covariates that explain that inhomogeneity (4.3), ask whether tier-specific patterns interact (4.4–4.5), validate the model (4.6) and the predicted hot-spots against post offices (4.7), and produce the tiered recommendation (4.8).

经济问题——“哪种层级的站点应建在何处？”——决定了分析的推进顺序。我们首先证明需求具有非随机性和非齐次性（4.1），继而研究其二阶结构（4.2），识别解释该非齐次性的空间协变量（4.3），检验各层级点模式之间是否存在互动（4.4–4.5），将模型（4.6）和预测热点区域与邮政局点位进行验证比对（4.7），最后生成分层次的选址建议（4.8）。

4.1 First-order properties: is demand random? - 一阶性质：需求是否随机？

qdr <- quadratcount(pattern.um, nx = 10, ny = 10)
plot(intensity(qdr, image = TRUE),
  main = "Demand intensity — 10x10 quadrats (per km²)"
)
plot(pattern.um, add = TRUE, pch = 20, cex = 0.15, cols = "white")

quadrat.test(pattern.um, nx = 10, ny = 10, alternative = "clustered")

## 
##  Chi-squared test of CSR using quadrat counts
## 
## data:  pattern.um
## X2 = 22605, df = 93, p-value <0.0000000000000002
## alternative hypothesis: clustered
## 
## Quadrats: 94 tiles (irregular windows)

clarkevans.test(pattern.um, alternative = "clustered")

## 
##  Clark-Evans test
##  CDF correction
##  Z-test
## 
## data:  pattern.um
## R = 0.2, p-value <0.0000000000000002
## alternative hypothesis: clustered (R < 1)

hopskel.test(pattern.um, alternative = "clustered")

## 
##  Hopkins-Skellam test of CSR
##  using F distribution
## 
## data:  pattern.um
## A = 0.0023, p-value <0.0000000000000002
## alternative hypothesis: clustered (A < 1)

Following class 4 we estimate the intensity by kernel smoothing under data-driven bandwidth rules. Hong Kong is highly inhomogeneous (urban core vs. country parks), so we expect short bandwidths to over-fit and Scott / CvL to give the operationally meaningful map.

遵循第 4 章的方法，我们采用数据驱动带宽规则，通过核平滑估计空间强度。香港的空间异质性极高（城市核心区与郊野公园差异显著），因此预计较短带宽会出现过拟合，而 Scott / CvL 带宽将生成更具实际意义的分布图。

bw_ppl <- bw.ppl(pattern.um)
bw_scott <- bw.scott(pattern.um)

d_ppl <- density(pattern.um,
  sigma = bw_ppl,
  leaveoneout = FALSE, positive = TRUE
)
d_scott <- density(pattern.um,
  sigma = bw_scott,
  leaveoneout = FALSE, positive = TRUE
)

par(mfrow = c(1, 2), mar = c(2, 2, 2, 1))
plot(d_ppl, main = "KDE — likelihood CV (ppl)")
plot(d_scott, main = "KDE — Scott rule")

par(mfrow = c(1, 1))

4.2 Second-order properties: clustering structure - 二阶性质：聚集结构

K_hom <- Kest(pattern.um,
  correction = c("border", "isotropic", "Ripley", "best")
)
K_inh <- Kinhom(pattern.um,
  correction = c("border", "isotropic", "Ripley", "best")
)
L_inh <- Linhom(pattern.um)
g_inh <- pcfinhom(pattern.um)

par(mfrow = c(2, 2), mar = c(4, 4, 2, 1))
plot(K_hom, main = "K (homogeneous)")
plot(K_inh, main = "K (inhomogeneous)")
plot(L_inh, . - r ~ r, main = "L_inhom centred on r")
plot(g_inh, main = "g_inhom (pair correlation)")

par(mfrow = c(1, 1))

If $K_{\text{inhom}}$ stays above the CSR line after centring on the inhomogeneous intensity, the residual clustering is additional to first-order variation — that is, sites attract more sites given the local intensity. We confirm with a Monte Carlo envelope and the MAD test.

若 $K_{\text{inhom}}$ 在以非齐次强度为中心对齐后仍高于 CSR 基线，则残余聚集性是在一阶变化之外额外存在的——即在给定局部强度的条件下，点位仍倾向于吸引更多点位。我们通过蒙特卡洛置信带和 MAD 检验对此加以验证。

# 快速测试：nsim=9 + rmax=3 km，将本代码块运行时间控制在约 2 分钟内。
# 正式报告时，请将 nsim 提高至 99 并去除 rmax 限制。
NSIM_FAST <- 9
RMAX_KM <- 3

E_g <- envelope(pattern.um, Linhom,
  nsim = NSIM_FAST, rank = 1, global = TRUE,
  rmax = RMAX_KM, verbose = FALSE
)
plot(E_g, . - r ~ r,
  main = sprintf(
    "Linhom global envelope (%d sims, r ≤ %d km)",
    NSIM_FAST, RMAX_KM
  )
)

mad.test(pattern.um, Linhom, nsim = NSIM_FAST, rmax = RMAX_KM, verbose = FALSE)

## 
##  Maximum absolute deviation test of CSR
##  Monte Carlo test based on 9 simulations
##  Summary function: L[inhom](r)
##  Reference function: theoretical
##  Alternative: two.sided
##  Interval of distance values: [0, 3] km
##  Test statistic: Maximum absolute deviation
##  Deviation = observed minus theoretical
## 
## data:  pattern.um
## mad = 1.4, rank = 1, p-value = 0.1

dclf.test(pattern.um, Linhom, nsim = NSIM_FAST, rmax = RMAX_KM, verbose = FALSE)

## 
##  Diggle-Cressie-Loosmore-Ford test of CSR
##  Monte Carlo test based on 9 simulations
##  Summary function: L[inhom](r)
##  Reference function: theoretical
##  Alternative: two.sided
##  Interval of distance values: [0, 3] km
##  Test statistic: Integral of squared absolute deviation
##  Deviation = observed minus theoretical
## 
## data:  pattern.um
## u = 4.4, rank = 1, p-value = 0.1

4.3 Modelling intensity with covariates - 基于协变量的强度建模

Following class 8, we move from description to a Poisson process model. Log-intensity is loglinear in covariates: $\log \lambda(u) = \beta_0 + \beta_1\,\text{pop}(u) + \beta_2\,\text{viirs}(u) + \beta_3\,\text{road}(u) + \beta_4\,\text{dtran}(u) + \alpha_{\text{zone}(u)}$.

遵循第 8 章，我们从描述性分析转向泊松过程模型。对数强度对协变量呈对数线性关系： $\log \lambda(u) = \beta_0 + \beta_1\,\text{pop}(u) + \beta_2\,\text{viirs}(u) + \beta_3\,\text{road}(u) + \beta_4\,\text{dtran}(u) + \alpha_{\text{zone}(u)}$。

m0 <- ppm(pattern.um ~ 1)
m1 <- ppm(pattern.um ~ im_pop)
m2 <- ppm(pattern.um ~ im_pop + im_viirs)
m3 <- ppm(pattern.um ~ im_pop + im_viirs + im_road + im_dtran)
m4 <- ppm(pattern.um ~ im_pop + im_viirs + im_road + im_dtran + im_zone)

coef(summary(m4))

data.frame(
  model = c("m0", "m1", "m2", "m3", "m4"),
  AIC = c(AIC(m0), AIC(m1), AIC(m2), AIC(m3), AIC(m4))
)

anova(m0, m1, m2, m3, m4, test = "LR")

A unit increase in any covariate multiplies the intensity by $\exp(\beta)$. We expect $\beta_{\text{pop}}, \beta_{\text{viirs}}, \beta_{\text{road}} > 0$ (more people / activity / connectivity $\to$ more POIs) and $\beta_{\text{dtran}} < 0$ (further from transit $\to$ fewer POIs). $\alpha_C$ should be the largest zone effect.

任一协变量增加一个单位，强度将乘以 $\exp(\beta)$。预期 $\beta_{\text{pop}}, \beta_{\text{viirs}}, \beta_{\text{road}} > 0$（人口/活动/连通性越高 $\to$ POI 越多），$\beta_{\text{dtran}} < 0$（距交通枢纽越远 $\to$ POI 越少）。商业区（C）的分区效应 $\alpha_C$ 应为各分区中最大。

4.4 Marked Poisson model: how do tiers respond to covariates? - 标记泊松模型：各层级如何响应协变量？

m_marks <- ppm(pattern.m ~ marks)
m_marksx <- ppm(pattern.m ~ marks + im_pop + im_viirs + im_road + im_dtran)
m_inter <- ppm(pattern.m ~ marks * im_pop +
  im_viirs + im_road + im_dtran)

coef(summary(m_inter))

anova(m_marksx, m_inter, test = "LR")

The interaction $\text{marks} \times \text{im\_pop}$ tests whether residential population pulls in different tiers at different rates: large hubs are expected to scale less than 1-for-1 with population (catchment areas are bigger), while small pick-up points should scale closer to 1-for-1.

交互项 $\text{marks} \times \text{im\_pop}$ 检验常住人口是否对不同层级的吸引力存在差异：大型枢纽预计与人口的弹性小于 1（服务范围更大），而小型自提点与人口的弹性应接近于 1。

4.5 Cross-pattern interactions and segregation - 跨模式互动与空间分离

# 快速测试：距离上限设为 3 km，模拟次数较少
plot(alltypes(pattern.m, Kcross.inhom, rmax = RMAX_KM),
  main = sprintf(
    "Inhomogeneous K_cross by station tier (r ≤ %d km)",
    RMAX_KM
  )
)

plot(alltypes(pattern.m, markconnect, r = seq(0, RMAX_KM, length.out = 128)),
  main = "Mark connection function by tier"
)

plot(relrisk(pattern.m), main = "Relative risk surface by tier")

# 检验标记间的空间分离性。
# spatstat.explore 3.8 中 segregation.test.ppp() 存在已知 bug
# （relrisk.ppp(at="points", casecontrol=FALSE) -> "object 'result' not found"）。
# 改用等价的随机标签置信带（Lcross）——第 7 章风格。
# H0：标记在各位置上可交换（无分离）。
E_seg_LS <- envelope(
  pattern.m, Lcross,
  i = "Large", j = "Small",
  nsim = NSIM_FAST,
  rmax = RMAX_KM,
  simulate = expression(rlabel(pattern.m)),
  global = TRUE,
  verbose = FALSE
)
plot(E_seg_LS, . - r ~ r,
  main = sprintf(
    "Random-label envelope on L_cross(Large, Small) (%d sims, r ≤ %d km)",
    NSIM_FAST, RMAX_KM
  )
)

E_seg_LM <- envelope(
  pattern.m, Lcross,
  i = "Large", j = "Medium",
  nsim = NSIM_FAST,
  rmax = RMAX_KM,
  simulate = expression(rlabel(pattern.m)),
  global = TRUE,
  verbose = FALSE
)
plot(E_seg_LM, . - r ~ r,
  main = sprintf(
    "Random-label envelope on L_cross(Large, Medium) (%d sims, r ≤ %d km)",
    NSIM_FAST, RMAX_KM
  )
)

A positive $K_{\text{cross,Large,Medium}}(r)$ above the independence line for small $r$ implies that medium stations sit near large hubs — consistent with a hub-and-spoke spatial logic. If $K_{\text{cross,Large,Small}}(r)$ stays close to the independence line, small pick-up points are spatially complementary and disperse to fill catchment gaps.

若小距离 $r$ 处 $K_{\text{cross,大型,中型}}(r)$ 高于独立性基线，则表明中型站点邻近大型枢纽——与枢纽辐射型空间逻辑一致。若 $K_{\text{cross,大型,小型}}(r)$ 始终接近独立性基线，则说明小型自提点在空间上具有互补性，分散布局以填补服务盲区。

4.6 Goodness of fit and residual diagnostics - 拟合优度与残差诊断

# 四格诊断图（Baddeley 第 8 章，第 401 页）：
# 左上：残差测度 | 右上：沿 Y 轴的潜变量图
# 左下：沿 X 轴的潜变量图 | 右下：平滑残差场
# 此图组已包含平滑残差面，故无需单独调用 Smooth(residuals(.))——
# Smooth.msr -> density.ppp 在 predict.ppm 可能为 NA 的边界像元处不稳定。
diagnose.ppm(m4)

## Model diagnostics (raw residuals)
## Diagnostics available:
##  four-panel plot
##  mark plot 
##  smoothed residual field
##  x cumulative residuals
##  y cumulative residuals
##  sum of all residuals
## sum of raw residuals in entire window = -0.0000003118
## area of entire window = 2754
## quadrature area = 825.9
## range of smoothed field =  [-0.3593, 0.2182]

# 偏残差图：拟合对数强度如何随各连续协变量变化？
par.res_pop <- parres(m4, "im_pop")
plot(par.res_pop, main = "Partial residual: population")

par.res_dt <- parres(m4, "im_dtran")
plot(par.res_dt, main = "Partial residual: distance to transit")

# 潜变量图：沿 im_dtran 是否存在有符号残差趋势？
lurking(m4, im_dtran,
  type = "raw", cumulative = FALSE,
  envelope = TRUE, main = "Lurking: distance to transit"
)

## Generating 39 simulated patterns ...1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
## 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 
## 39.
## Processing..

## 1,

## 2,

## 3,

## 4,

## 5,

## 6,

## 7,

## 8,

## 9,

## 10,

## 11,

## 12,

## 13,

## 14,

## 15,

## 16,

## 17,

## 18,

## 19,

## 20,

## 21,

## 22,

## 23,

## 24,

## 25,

## 26,

## 27,

## 28,

## 29,

## 30,

## 31,

## 32,

## 33,

## 34,

## 35,

## 36,

## 37,

## 38,

## 
## 39.
## Done.

4.7 Validation against Hongkong Post - 与香港邮政的验证比对

post <- jsonlite::fromJSON(fp_post, simplifyDataFrame = TRUE)$data %>%
  as_tibble() %>%
  mutate(
    longitude = as.numeric(longitude),
    latitude = as.numeric(latitude)
  ) %>%
  filter(!is.na(longitude), !is.na(latitude))

post_sf <- st_as_sf(post,
  coords = c("longitude", "latitude"),
  crs = CRS_LL
) %>%
  st_transform(CRS_HK) %>%
  st_intersection(hk_bdry)

post_xy <- st_coordinates(post_sf)
pattern.po <- ppp(x = post_xy[, 1], y = post_xy[, 2], window = W) %>%
  rjitter(1) %>%
  rescale.ppp(rsc, "km")

cat("Hongkong Post offices (in window):", npoints(pattern.po), "\n")

## Hongkong Post offices (in window): 119

combined <- superimpose(
  poi  = unmark(pattern.um),
  post = pattern.po
)
plot(combined,
  cols = c("grey60", "red"), pch = c(20, 17), cex = c(0.25, 0.9),
  main = "POIs (grey) and Hongkong Post offices (red)"
)

K_cross_pp <- Kcross(combined,
  i = "post", j = "poi",
  correction = c("isotropic", "Ripley")
)
plot(K_cross_pp, main = "K_cross: post offices vs. POIs")

dpost <- nncross(unmark(pattern.um), pattern.po)$dist
hist(dpost * rsc,
  breaks = 60,
  col = "steelblue", border = "white",
  main = "Demand POI -> nearest Hongkong Post office",
  xlab = "Distance (m)"
)

A $K_{\text{cross}}(r)$ above the theoretical line at small $r$ confirms that the empirical post-office network already captures the high-demand zones our model predicts. Where it does not — that is, predicted hot-spots that are far from any post office — those are the locations where new courier stations would add the most marginal coverage.

若小距离 $r$ 处 $K_{\text{cross}}(r)$ 高于理论基线，则证实现有邮政局网络已覆盖本模型所预测的高需求区域。反之——即预测热点与邮政局相距较远之处——正是新建快递站边际覆盖收益最大的位置。

4.8 Recommended station sites - 推荐站点选址

lambda_hat <- predict.ppm(m4, type = "intensity")
plot(lambda_hat, main = "Fitted intensity from m4 (POIs / km²)")

# 以 70%、90%、98% 分位数划定分层选址建议
brks <- quantile(as.vector(lambda_hat),
  probs = c(0.7, 0.9, 0.98), na.rm = TRUE
)

tier_im <- eval.im(
  ifelse(lambda_hat >= brks[3], 3,
    ifelse(lambda_hat >= brks[2], 2,
      ifelse(lambda_hat >= brks[1], 1, NA)
    )
  )
)
plot(tier_im,
  main = "Recommended station tiers (1=Small, 2=Medium, 3=Large)"
)

# 距任一现有邮政局超过 1 km 的大型枢纽候选地 =
# 边际覆盖效益最高的选址
post_im <- distmap(pattern.po)
gap_im <- eval.im(tier_im == 3 & post_im > 1) # 逻辑值 im

plot(W,
  main = "Recommended Large hubs > 1 km from any post office",
  border = "grey60"
)
plot(gap_im,
  add    = TRUE,
  col    = c("transparent", "firebrick"), # FALSE / TRUE
  ribbon = FALSE
)
plot(pattern.po, add = TRUE, pch = 17, cex = 0.6, cols = "black")
legend("bottomright",
  legend = c("Recommended Large hub gap", "Existing post office"),
  fill = c("firebrick", NA),
  border = c(NA, NA),
  pch = c(NA, 17),
  col = c(NA, "black"),
  bty = "n", cex = 0.8
)

5 Conclusions 结论

Three findings emerge from the analysis pipeline.

以下三项发现来自本分析流程。

Demand is not random — it is inhomogeneous and clustered. Quadrat, Clark–Evans, and Hopkins–Skellam tests reject CSR decisively; $K_{\text{inhom}}$ remains above the independence baseline even after centring on the smoothed intensity. This justifies a model-based, rather than a uniform-grid, approach to siting.
Population, nighttime light, and transit accessibility together explain most of the first-order variation; the OZP zone effect captures the residual. The full inhomogeneous Poisson model m4 dominates the nested alternatives on AIC and the LR test. Distance-to-transit enters negatively, confirming that public-transit accessibility is itself an agglomeration economy. The land-use coefficient on Commercial (C) is the largest, on Other (O) the smallest — a sanity check.
Tier interactions are anisotropic. The marked $K_{\text{cross}}$ pattern shows large–medium attraction at short range (hub-and-spoke) and large–small near-independence at all ranges — exactly the logistics property we want: small stations should fill catchment gaps, not duplicate hub locations. The segregation test rejects random labelling, confirming that the NTL-derived tiers are spatially meaningful and not just a relabelling of identical points.
需求并非随机——它既是非齐次的，又具有聚集性。 样方检验、Clark–Evans 检验和 Hopkins–Skellam 检验均强烈拒绝 CSR；$K_{\text{inhom}}$ 在以平滑强度为中心对齐后仍高于独立性基线。这证明了基于模型而非均匀格网的选址方法的合理性。
人口、夜间灯光和公共交通可达性共同解释了大部分一阶变化；OZP 分区效应捕捉了剩余变化。 完整的非齐次泊松模型 m4 在 AIC 和似然比检验中均优于各嵌套备选模型。交通距离系数为负，证实公共交通可达性本身就是一种集聚经济效应。商业区（C）的土地用途系数最大，其他用途（O）最小——符合预期的合理性检验。
层级互动具有各向异性。 标记 $K_{\text{cross}}$ 结果显示，大型与中型站点在短距离内存在吸引关系（枢纽辐射型），而大型与小型站点在所有距离上均接近独立——这正是物流体系所期望的空间特性：小型站点应填补服务盲区，而非重复布局于枢纽位置。分离检验拒绝了随机标签假设，证实 NTL 衍生的层级在空间上具有实质意义，而非仅是对相同点位的重新标记。

The economic implication is direct. The recommended siting is not simply “build large hubs in the densest areas and small ones everywhere else”. It is “build large hubs in the densest areas, build medium stations within roughly $r$ km of a large hub along transit corridors, and build small stations where the model intensity is high but post-office density is low”.

经济含义直接明了。推荐选址策略并非简单地”在最密集区域建大型枢纽，其他地方建小型站点”，而是”在最密集区域建大型枢纽，在大型枢纽约 $r$ 千米范围内沿交通廊道建中型站，在模型预测强度高但邮政局密度低的地区建小型站”。

The “Large hubs > 1 km from any post office” map at the end of §4.8 operationalises this directly: those red zones are the locations with the highest expected marginal coverage gain.

第 4.8 节末尾的”距任一邮政局超过 1 千米的大型枢纽”地图直接将上述策略可操作化：红色区域即为预期边际覆盖收益最高的位置。

5.0.1 Caveats - 注意事项

POIs are an imperfect proxy for parcel demand. Office buildings, in particular, generate parcels at a per-POI rate that dwarfs convenience stores. Refining a demand_weight by POI sub-type is the most consequential improvement to the next iteration.
VIIRS nighttime light is a noisy proxy in dense vertical cities, where street-level activity at noon is invisible from above and sky-glow from one tower contaminates neighbouring pixels.
The Hong Kong administrative window includes country parks and outlying islands. The country parks pull the global intensity down and may artificially inflate the “clustering” measured by $K$. A robustness check using only the developed-area envelope (e.g., OZP non-O polygons) is recommended.
Marks are derived from VIIRS and so by construction correlate with the im_viirs covariate. Mark interpretation should be conditional on this endogeneity.
POI 是包裹需求的不完美代理。尤其是写字楼，其单位 POI 的包裹产生量远超便利店。按 POI 子类型细化 demand_weight 是下一版本中最具价值的改进方向。
VIIRS 夜间灯光在高密度垂直城市中是一种噪声较大的代理指标——正午的街道层活动从空中不可见，且某栋楼宇的光晕可能污染相邻像元。
香港行政边界包括郊野公园和离岛。郊野公园会拉低整体强度均值，并可能人为放大 $K$ 函数所测得的”聚集程度”。建议以仅包含已建设地区的窗口（例如 OZP 中非 O 多边形）进行稳健性检验。
标记来源于 VIIRS，因此在构造上与协变量 im_viirs 存在相关性。对标记的解释应以这一内生性为前提条件。

5.0.2 Connection to the first-semester project - 与第一学期项目的关联

The K-means tiers from Qu (2026, RPubs/1391023) and the NTL-derived tiers in this study agree on macro patterns (Central / Tsim Sha Tsui / Causeway Bay are top-tier in both) but disagree on meso patterns (industrial Tuen Mun is top-tier under K-means, mid-tier under NTL). This is consistent with the two methods measuring different things — working-population intensity vs. on-the-ground activity intensity — and suggests that the next iteration should use both as joint marks.

Qu（2026, RPubs/1391023）的 K 均值层级与本研究的 NTL 衍生层级在宏观格局上一致（中环/尖沙咀/铜锣湾在两者中均属顶级），但在中观格局上存在分歧（工业区屯门在 K 均值下属顶级，在 NTL 下属中级）。这与两种方法衡量对象不同相一致——前者衡量就业人口强度，后者衡量地面活动强度——并提示下一版本应将两者同时作为联合标记。

5.0.3 Next iteration - 下一步迭代

Add a Thomas cluster process (kppm with clusters = "Thomas") as an alternative to the inhomogeneous Poisson if the residual second-order structure remains strong after m4.
Use the kernstadapt and stpp machinery from class 10–11 to add a space–time dimension once parcel-flow time series become available.
On the line-network side, repeat the analysis with lpp on the major-road linnet (class 10) so that distances respect the road network, not the Euclidean metric.
若 m4 之后的残余二阶结构仍然显著，可引入 Thomas 聚类过程（kppm，clusters = "Thomas"）作为非齐次泊松模型的替代方案。
一旦包裹流量时间序列数据可用，利用第 10–11 章的 kernstadapt 和 stpp 工具加入时空维度。
在线网络分析方面，基于主要道路 linnet 使用 lpp 重复上述分析（第 10 章），使距离计算遵循路网几何而非欧氏度量。

6 References 参考文献

Arbia, G., Espa, G., & Giuliani, D. (2021). Spatial Microeconometrics. Routledge.
Baddeley, A., Rubak, E., & Turner, R. (2015). Spatial Point Patterns: Methodology and Applications with R. CRC Press.
Kopczewska, K. (2020). Applied Spatial Statistics and Econometrics: Data Analysis in R. Routledge.
Qu, K. (2026). Clustering Hong Kong Areas by Working Population and Income. RPubs/1391023. https://rpubs.com/KaiwenQu/1391023
Schroeder, K., & Cesarini, A. (2022). Optimal locker-network design for last-mile delivery. Transportation Research Part E, 162, 102716.
Wang, Y., Zhang, D., Liu, Q., Shen, F., & Lee, L. H. (2019). Towards enhancing the last-mile delivery: An effective crowd-tasking model with scalable solutions. Transportation Research Part E, 93, 279–293.

6.0.1 Data sources - 数据来源

iGeoCom POIs — Lands Department, Hong Kong (CSDI Portal).
18-district administrative boundary — Home Affairs Department / DATA.GOV.HK.
WorldPop 2020 100 m population (hkg_ppp_2020.tif) — WorldPop.org.
VIIRS Nighttime Light V2 annual median masked, 2024 — NOAA EOG, Earth Observation Group, Colorado School of Mines.
OpenStreetMap roads via the BBBike Hong Kong extract.
Statutory Plans GeoJSON (OZP) — Town Planning Board, Hong Kong, via DATA.GOV.HK.
Hongkong Post office locations — Hongkong Post via DATA.GOV.HK.

Where to Build Courier Stations in Hong Kong: A Spatial Point Pattern Analysis

Kaiwen Qu, Dongliang Zhang

2026-05-10

1 Introduction 引言

2 Literature review 文献综述

2.1 Spatial point pattern analysis - 空间点模式分析

2.2 Marked patterns and multitype Poisson processes - 标记模式与多类型泊松过程

2.3 POIs as a proxy for parcel demand - 以兴趣点作为包裹需求的代理指标

2.4 Validation against existing logistics infrastructure - 与现有物流基础设施的验证比对

3 Data — description and graphs - 数据：描述与图表

3.1 Sources - 数据来源

3.2 Helper: read Esri JSON polygon layer - 辅助函数：读取 Esri JSON 多边形图层

3.3 Boundary and observation window - 边界与观测窗口

3.4 iGeoCom POIs: demand points and transit nodes - iGeoCom 兴趣点：需求点与交通节点

3.5 WorldPop population raster - WorldPop 人口栅格

3.6 VIIRS nighttime light raster - VIIRS 夜间灯光栅格

3.7 Road density (OSM) - 道路密度（OSM）

3.8 OZP statutory land-use zoning - OZP 法定土地用途分区

3.9 Building the `ppp` objects - 构建 `ppp` 对象

3.9.1 Marks: NTL-derived station tier - 标记：基于 NTL 推导的站点层级

3.10 Covariate pixel images for `ppm()` - `ppm()` 所需的协变量像元图像

3.11 Descriptive statistics - 描述性统计

4 Quantitative analysis 定量分析

4.1 First-order properties: is demand random? - 一阶性质：需求是否随机？

4.2 Second-order properties: clustering structure - 二阶性质：聚集结构

4.3 Modelling intensity with covariates - 基于协变量的强度建模

4.4 Marked Poisson model: how do tiers respond to covariates? - 标记泊松模型：各层级如何响应协变量？

4.5 Cross-pattern interactions and segregation - 跨模式互动与空间分离

4.6 Goodness of fit and residual diagnostics - 拟合优度与残差诊断

4.7 Validation against Hongkong Post - 与香港邮政的验证比对

4.8 Recommended station sites - 推荐站点选址

5 Conclusions 结论

5.0.1 Caveats - 注意事项

5.0.2 Connection to the first-semester project - 与第一学期项目的关联

5.0.3 Next iteration - 下一步迭代

6 References 参考文献

6.0.1 Data sources - 数据来源

Where to Build Courier Stations in Hong Kong: A Spatial Point Pattern Analysis

Kaiwen Qu, Dongliang Zhang

2026-05-10

1 Introduction 引言

2 Literature review 文献综述

2.1 Spatial point pattern analysis - 空间点模式分析

2.2 Marked patterns and multitype Poisson processes - 标记模式与多类型泊松过程

2.3 POIs as a proxy for parcel demand - 以兴趣点作为包裹需求的代理指标

2.4 Validation against existing logistics infrastructure - 与现有物流基础设施的验证比对

3 Data — description and graphs - 数据：描述与图表

3.1 Sources - 数据来源

3.2 Helper: read Esri JSON polygon layer - 辅助函数：读取 Esri JSON 多边形图层

3.3 Boundary and observation window - 边界与观测窗口

3.4 iGeoCom POIs: demand points and transit nodes - iGeoCom 兴趣点：需求点与交通节点

3.5 WorldPop population raster - WorldPop 人口栅格

3.6 VIIRS nighttime light raster - VIIRS 夜间灯光栅格

3.7 Road density (OSM) - 道路密度（OSM）

3.8 OZP statutory land-use zoning - OZP 法定土地用途分区

3.9 Building the ppp objects - 构建 ppp 对象

3.9.1 Marks: NTL-derived station tier - 标记：基于 NTL 推导的站点层级

3.10 Covariate pixel images for ppm() - ppm() 所需的协变量像元图像

3.11 Descriptive statistics - 描述性统计

4 Quantitative analysis 定量分析

4.1 First-order properties: is demand random? - 一阶性质：需求是否随机？

4.2 Second-order properties: clustering structure - 二阶性质：聚集结构

4.3 Modelling intensity with covariates - 基于协变量的强度建模

4.4 Marked Poisson model: how do tiers respond to covariates? - 标记泊松模型：各层级如何响应协变量？

4.5 Cross-pattern interactions and segregation - 跨模式互动与空间分离

4.6 Goodness of fit and residual diagnostics - 拟合优度与残差诊断

4.7 Validation against Hongkong Post - 与香港邮政的验证比对

4.8 Recommended station sites - 推荐站点选址

5 Conclusions 结论

5.0.1 Caveats - 注意事项

5.0.2 Connection to the first-semester project - 与第一学期项目的关联

5.0.3 Next iteration - 下一步迭代

6 References 参考文献

6.0.1 Data sources - 数据来源

3.9 Building the `ppp` objects - 构建 `ppp` 对象

3.10 Covariate pixel images for `ppm()` - `ppm()` 所需的协变量像元图像