Hong Kong is one of the densest urban economies in the world, with one of the highest e-commerce penetration rates in Asia. Last-mile parcel logistics — sorting hubs, locker stations, and pick-up points — therefore competes for very limited urban space. Where these facilities are placed has direct effects on delivery cost, courier travel distance, and customer waiting time. The economic question we ask is concrete: given where commercial activity actually takes place in Hong Kong, where should large, medium, and small courier stations be located?
香港是全球人口密度最高的城市经济体之一,其电子商务渗透率在亚洲名列前茅。因此,末端包裹物流——包括分拣中心、智能柜站点和自提点——不得不争夺极为有限的城市空间。这些设施的选址直接影响配送成本、快递员行驶距离以及客户等待时间。我们提出的经济问题非常具体:鉴于香港商业活动的实际分布情况,大型、中型和小型快递站应设在何处?
In the first-semester project (Qu, 2026, RPubs/1391023) we approached this question through area-level K-means clustering of working population and household income across Small Tertiary Planning Unit Groups (STPUGs). The clustering yielded three tiers of districts and informed a tiered station recommendation. That study had a clear limitation: it operated on aggregated polygons, so it could not say anything about within-area concentration of demand or about the geometry of station–station spacing.
在第一学期的项目中(Qu, 2026, RPubs/1391023),我们通过对小三级规划单元组(STPUGs)的就业人口与家庭收入进行区域层面的 K 均值聚类来探讨这一问题。聚类结果将各区划分为三个层级,并据此提出了分层站点建议。该研究存在明显局限:它基于聚合多边形进行分析,因而无法描述需求在区域内部的集中程度,也无法刻画站点间的空间几何关系。
This project upgrades the analysis to a spatial point pattern
analysis (SPPA). We treat commercial points of interest (POIs)
— the CMF/COM classes of Hong Kong’s official
iGeoCom POI dataset — as a proxy for parcel demand, and we
apply the methods discussed in classes 4–9 (Baddeley, Rubak &
Turner, 2015; Kopczewska, 2020) to ask three questions.
本项目将分析升级为空间点模式分析(SPPA)。我们以商业兴趣点(POI)——即香港官方
iGeoCom 数据集中 CMF/COM
类别的点位——作为包裹需求的代理指标,并运用课程第 4–9
章所介绍的方法(Baddeley, Rubak & Turner, 2015;Kopczewska,
2020)探讨三个问题。
Is courier demand in Hong Kong clustered? We use first-order intensity (KDE, quadrat tests) and second-order summary functions (Ripley’s \(K\), \(L\), \(g\)) to test against complete spatial randomness (CSR), under both homogeneous and inhomogeneous baselines.
What drives the spatial intensity of demand? We
fit inhomogeneous Poisson point process models with ppm()
using resident population (WorldPop), nighttime light radiance (VIIRS
Black Marble), road density (OSM), distance to the nearest
public-transit node (multi-modal: MTR, light rail, ferry, bus terminal),
and statutory land-use zone (Hong Kong OZP) as covariates.
How do tiered station types interact in space?
We mark each demand point by the local NTL tercile (Small / Medium /
Large) and study the marks with Kcross.inhom,
markconnect, Iest, and a Monte-Carlo
segregation test. We finally validate the model-implied high-demand
zones against the empirical pattern of Hongkong Post offices via
Kcross.
香港的快递需求是否存在空间聚集? 我们采用一阶强度(KDE、样方检验)和二阶汇总函数(Ripley 的 \(K\)、\(L\)、\(g\)),在齐次与非齐次基线下,分别对完全空间随机性(CSR)假设进行检验。
哪些因素驱动需求的空间强度? 我们使用
ppm()
拟合非齐次泊松点过程模型,以常住人口(WorldPop)、夜间灯光辐射亮度(VIIRS
Black
Marble)、道路密度(OSM)、距最近公共交通节点的距离(多模式:港铁、轻铁、渡轮、巴士总站)以及法定土地用途分区(香港
OZP)作为协变量。
不同层级站点类型在空间上如何互动?
我们依据各需求点所在位置的 NTL
三分位数(小型/中型/大型)为其打上标记,并借助
Kcross.inhom、markconnect、Iest
以及蒙特卡洛分离检验对标记进行分析。最后,我们通过 Kcross
将模型预测的高需求区域与香港邮政局点位的实际分布进行比对验证。
The economic deliverable is a tiered map of recommended courier station sites, backed by significance tests rather than visual inspection alone.
本研究的经济成果是一份分层次的快递站选址推荐地图,其依据来自显著性检验,而非单纯的目视判断。
requiredPackages <- c(
"sf", "spatstat", "terra", "sp", "magrittr",
"ggplot2", "tidyverse", "viridis", "jsonlite",
"dixon", "kableExtra", "gridExtra", "patchwork", "scales", "rnaturalearth"
)
for (p in requiredPackages) {
if (!requireNamespace(p, quietly = TRUE)) install.packages(p)
}
invisible(lapply(requiredPackages, library, character.only = TRUE))
The methodological backbone is Baddeley, Rubak & Turner (2015),
Spatial Point Patterns: Methodology and Applications with R. We
use the workflow taught in classes 4–9: build a ppp object
inside an owin window, summarise its first-order intensity
(homogeneous and inhomogeneous), test CSR with
quadrat.test, clarkevans.test, and
hopskel.test, and study second-order structure with
Kest, Lest, and pcf (with their
inhomogeneous counterparts Kinhom, Linhom,
pcfinhom). Departures from CSR are assessed with global and
pointwise envelopes and with the mad.test and
dclf.test Monte Carlo tests (classes 5 and 8).
本研究的方法论基础为 Baddeley, Rubak &
Turner(2015)所著《空间点模式:方法论及其在 R
中的应用》。我们沿用课程第 4–9 章所教授的工作流:在 owin
窗口内构建 ppp 对象,汇总其一阶强度(齐次与非齐次),用
quadrat.test、clarkevans.test 和
hopskel.test 检验 CSR,并借助
Kest、Lest 和 pcf(及其非齐次版本
Kinhom、Linhom、pcfinhom)研究二阶结构。对
CSR 的偏离程度通过全局与逐点置信带以及 mad.test 和
dclf.test 蒙特卡洛检验进行评估(第 5 和 8 章)。
Kopczewska (2020), Applied Spatial Statistics and Econometrics, provides the framing for moving from spatial description to a regression-style explanation: the intensity of points is treated as the dependent variable in a Poisson process model whose log-intensity is loglinear in covariates. We follow its applied treatment of spatial residual diagnostics, lurking variable plots, and the distinction between inhomogeneity that is covariate-explained and inhomogeneity that is residual clustering.
Kopczewska(2020)所著《应用空间统计与计量经济学》为从空间描述转向回归式解释提供了分析框架:将点的强度视为泊松过程模型中的因变量,其对数强度对协变量呈对数线性关系。本文遵循该书在空间残差诊断、潜在变量图以及区分协变量可解释的非齐次性与残差聚集性方面的应用处理方式。
Arbia, Espa & Giuliani (2021), Spatial Microeconometrics, motivates the economic framing. They argue that firms (and, by extension, retail-related points) cannot be modelled as if independently located: agglomeration externalities, infrastructure access, and zoning constraints generate observable spatial dependence. The authors recommend a two-step strategy — first remove inhomogeneity attributable to first-order covariates, then test whether second-order clustering remains. We adopt this recommendation: the inhomogeneous K function \(K_{\text{inhom}}\) is interpreted only after fitting the inhomogeneous Poisson model.
Arbia, Espa & Giuliani(2021)所著《空间微观计量经济学》为本研究的经济分析框架提供了理论动因。他们指出,企业(以及由此延伸的零售相关点位)不能被建模为相互独立的选址决策:集聚外部性、基础设施可达性以及用地管制共同产生了可观测的空间依赖性。作者建议采用两步策略——首先剔除一阶协变量所致的非齐次性,再检验二阶聚集是否依然存在。本文采纳这一建议:非齐次 K 函数 \(K_{\text{inhom}}\) 仅在拟合非齐次泊松模型之后才作解释。
For the tiered-station question we follow class 9
(rmpoispp, ppm(... ~ marks),
Kcross.inhom, markconnect, Iest,
relrisk.ppp). The literature on retail and firm location
(Arbia et al., 2021, ch. 4–5) treats interaction between firm types as
analogous to interaction between species in ecology, but with the added
constraint that economic types compete for and segregate within
land-use zones. This motivates using zone as a covariate in the marked
model and segregation.test.ppp as a model-free test of mark
separation.
针对分层站点问题,本文遵循第 9
章的方法(rmpoispp、ppm(... ~ marks)、Kcross.inhom、markconnect、Iest、relrisk.ppp)。零售与企业区位文献(Arbia
等,2021,第 4–5
章)将企业类型间的互动类比于生态学中的物种间互动,但附加了一个经济学约束:不同经济类型在土地用途分区内相互竞争并发生空间分离。这一逻辑促使本文将分区作为标记模型的协变量,并以
segregation.test.ppp 作为标记分离的无模型检验工具。
Direct parcel-flow data are not publicly available for Hong Kong. We follow a literature that uses POIs as a high-resolution proxy for commercial activity and for delivery demand (Wang et al., 2019, on POI density as a logistics demand indicator; Schroeder & Cesarini, 2022, on parcel-locker siting). The implicit assumption is monotonicity: areas with more shops, offices, and food-and-beverage points generate more parcels, even if the per-POI parcel rate differs by type.
香港没有公开可用的包裹流量直接数据。本文沿用以兴趣点作为商业活动与配送需求高分辨率代理指标的相关文献(Wang 等,2019,关于 POI 密度作为物流需求指标;Schroeder & Cesarini,2022,关于智能柜选址)。其隐含假设是单调性:拥有更多商店、写字楼和餐饮点的地区会产生更多包裹,即便各类型 POI 的单位包裹量存在差异。
Compared with OpenStreetMap, the official Hong Kong
iGeoCom dataset (Lands Department) supplies categorically
clean and quality-controlled POIs with classification codes
(CLASS/TYPE/SUBCAT) that map
directly to the zoning system. We therefore use iGeoCom for
both demand POIs and transport-node accessibility; OSM is used only for
road density.
与 OpenStreetMap 相比,香港地政总署提供的官方 iGeoCom
数据集包含分类清晰、经过质量控制的兴趣点,其分类编码(CLASS/TYPE/SUBCAT)可直接与分区系统对应。因此,本文在需求
POI 和交通节点可达性两方面均采用 iGeoCom 数据集;OSM
仅用于计算道路密度。
Following Arbia et al. (2021, ch. 6), we treat Hongkong Post offices as an observed control point pattern: they were sited under a public-service mandate (universal coverage), not under profit maximisation. If our predicted high-intensity zones spatially coincide with post offices at short distance (low \(r\) in \(K_{\text{cross}}\)), this confirms that the predicted zones are operationally accessible. If at intermediate distance the cross-K is below the independence baseline, this is consistent with post offices having captured the easy sites and our recommendations identifying complementary, non-overlapping locations.
遵循 Arbia 等(2021,第 6 章)的做法,本文将香港邮政局点位视为一种观测到的对照点模式:其选址基于公共服务职责(普遍覆盖),而非利润最大化。若本文预测的高强度区域在短距离内(\(K_{\text{cross}}\) 中较小的 \(r\))与邮政局点位在空间上重合,则证实预测区域具备实际可达性。若在中等距离处,交叉 K 函数值低于独立性基线,则表明邮政局已占据了条件优越的站点,而本文的推荐选址是互补的、非重叠的位置。
The dataset is constructed exactly as specified in
HK_dataset_checklist.xlsx. All paths are relative to the
project folder.
数据集完全按照 HK_dataset_checklist.xlsx
中的规范构建。所有路径均相对于项目文件夹。
| Block | Variable | File | CRS |
|---|---|---|---|
| Window | 18 districts boundary | data/hksar_18_district_boundary.json |
WGS84 |
| Demand | iGeoCom POIs (CLASS ∈ CMF,
COM) |
data/iGeoCom.geojson |
WGS84 |
| Transit | iGeoCom transit nodes (TYPE ∈
RSN/LRA/FER/PIE/FET/BUS) |
data/iGeoCom.geojson |
WGS84 |
| Covariate | Resident population | data/hkg_ppp_2020.tif (WorldPop) |
WGS84 |
| Covariate | Nighttime light radiance | data/VNL_npp_2024_..._median_masked.dat.tif.gz |
WGS84 |
| Covariate | Road segments (for road density) | data/hong-kong-260507-free.shp/gis_osm_roads_free_1.shp |
WGS84 |
| Covariate | OZP land-use zoning | data/GeoJSON_Statutory_Plans/Statutory Plan GIS Data GeoJSON/ZONE.json |
HK1980 |
| Validation | Hongkong Post offices | data/post-office.json |
WGS84 |
All layers are projected to HK1980 Grid (EPSG:2326)
for analysis (units in metres, then rescaled to km in the
ppp object).
所有图层均投影至 HK1980 格网(EPSG:2326)
进行分析(单位为米,在 ppp 对象中进一步缩放为千米)。
data_dir <- "data"
fp_bdry <- file.path(data_dir, "hksar_18_district_boundary.json")
fp_igeocom <- file.path(data_dir, "iGeoCom.geojson")
fp_pop <- file.path(data_dir, "hkg_ppp_2020.tif")
fp_viirs <- file.path(
data_dir,
"VNL_npp_2024_global_vcmslcfg_v2_c202502261200.median_masked.dat.tif.gz"
)
fp_roads <- file.path(
data_dir, "hong-kong-260507-free.shp",
"gis_osm_roads_free_1.shp"
)
fp_ozp_zone <- file.path(
data_dir, "GeoJSON_Statutory_Plans",
"Statutory Plan GIS Data GeoJSON", "ZONE.json"
)
fp_post <- file.path(data_dir, "post-office.json")
CRS_HK <- 2326 # HK 1980 格网坐标系(单位:米)
CRS_LL <- 4326 # WGS 84 地理坐标系
The Hong Kong Town Planning Board ships the OZP statutory plans as
Esri JSON (despite the folder name). The structure is
features[[i]]$attributes +
features[[i]]$geometry$rings, not the
properties + coordinates of GeoJSON. We write
a small parser.
香港城市规划委员会以 Esri JSON
格式(尽管文件夹名称含有 GeoJSON)发布 OZP 法定图则。其数据结构为
features[[i]]$attributes +
features[[i]]$geometry$rings,而非 GeoJSON 的
properties +
coordinates。为此,我们编写了一个小型解析器。
read_esri_polygon <- function(path) {
raw <- jsonlite::fromJSON(path, simplifyVector = FALSE)
sr <- raw$spatialReference
# 读取坐标参考系:优先使用 latestWkid,其次 wkid,默认 2326
crs_in <- if (!is.null(sr$latestWkid)) {
sr$latestWkid
} else if (!is.null(sr$wkid)) {
sr$wkid
} else {
2326
}
feats <- raw$features
# 将每个要素的 rings 转换为 sf 多边形
geoms <- lapply(feats, function(f) {
rings <- lapply(f$geometry$rings, function(r) {
matrix(unlist(r), ncol = 2, byrow = TRUE)
})
sf::st_polygon(rings)
})
# 提取属性表
attrs <- do.call(rbind, lapply(feats, function(f) {
as.data.frame(
lapply(
f$attributes,
function(x) if (is.null(x)) NA else x
),
stringsAsFactors = FALSE
)
}))
sf::st_sf(attrs, geometry = sf::st_sfc(geoms, crs = crs_in))
}
hk18 <- st_read(fp_bdry, quiet = TRUE) %>%
st_transform(CRS_HK) %>%
st_make_valid()
hk_bdry <- st_union(hk18) %>% st_make_valid()
W <- as.owin(hk_bdry)
# 缩放系数:1 km 对应 1000 m
rsc <- 1000
plot(W, main = "Hong Kong observation window (HK1980 Grid)")
plot(st_geometry(hk18), border = "grey60", add = TRUE)
igc <- st_read(fp_igeocom, quiet = TRUE)
igc <- st_transform(igc, CRS_HK)
# 裁剪至香港研究窗口范围内
igc <- igc[st_intersects(igc, hk_bdry, sparse = FALSE)[, 1], ]
# 需求 POI:商业类别
demand <- igc %>%
filter(CLASS %in% c("COM", "CMF"))
# 交通节点(4 种交通方式——香港为岛屿地形,渡轮尤为重要)
transit <- igc %>%
filter(
(CLASS == "TRS" & TYPE %in% c("RSN", "LRA", "FER", "PIE", "FET")) |
(CLASS == "BUS" & TYPE == "BUS")
) %>%
mutate(transit_mode = case_when(
TYPE == "RSN" ~ "MTR/Heavy rail",
TYPE == "LRA" ~ "Light rail",
TYPE %in% c("FER", "PIE", "FET") ~ "Ferry/pier",
TYPE == "BUS" ~ "Bus terminus"
))
cat("Demand POIs (CLASS in {COM, CMF}):", nrow(demand), "\n")
## Demand POIs (CLASS in {COM, CMF}): 3875
cat("Transit nodes (multi-modal): ", nrow(transit), "\n")
## Transit nodes (multi-modal): 1345
table(transit$transit_mode)
##
## Bus terminus Ferry/pier Light rail MTR/Heavy rail
## 649 530 68 98
ggplot() +
geom_sf(data = hk18, fill = NA, colour = "grey70", linewidth = 0.2) +
geom_sf(data = demand, colour = "steelblue", size = 0.25, alpha = 0.5) +
geom_sf(data = transit, aes(colour = transit_mode), size = 1, alpha = 0.9) +
scale_colour_viridis_d(end = 0.85) +
labs(
title = "iGeoCom commercial POIs (blue) and transit nodes (coloured)",
colour = "Transit mode"
) +
theme_minimal(base_size = 11)
pop <- terra::rast(fp_pop) # WGS84
pop <- terra::project(pop, paste0("EPSG:", CRS_HK))
pop <- terra::crop(pop, terra::vect(hk_bdry))
pop <- terra::mask(pop, terra::vect(hk_bdry))
names(pop) <- "pop"
plot(pop, main = "WorldPop 2020 — population density (HK)")
plot(st_geometry(hk18), add = TRUE, border = "white", lwd = 0.4)
The Black Marble file ships gzipped — terra::rast()
reads it through GDAL’s virtual /vsigzip/ driver.
Black Marble 文件以 gzip 压缩格式分发——terra::rast()
通过 GDAL 的虚拟 /vsigzip/ 驱动直接读取。
viirs_path <- paste0("/vsigzip/", normalizePath(fp_viirs))
viirs <- terra::rast(viirs_path)
viirs <- terra::crop(viirs, terra::ext(c(113.7, 114.6, 22.0, 22.7)))
viirs <- terra::project(viirs, paste0("EPSG:", CRS_HK))
viirs <- terra::crop(viirs, terra::vect(hk_bdry))
viirs <- terra::mask(viirs, terra::vect(hk_bdry))
names(viirs) <- "viirs"
# 将负值或 NA 像元替换为 0(被遮蔽的暗背景区域)
viirs[is.na(viirs) | viirs < 0] <- 0
plot(log1p(viirs), main = "VIIRS NTL 2024 (median masked, log1p scale)")
plot(st_geometry(hk18), add = TRUE, border = "white", lwd = 0.4)
We compute road-line density per km² with density.psp.
Major and minor roads are kept; small footways are dropped to avoid
spurious density in country-park trails.
我们使用 density.psp
计算每平方千米的道路线密度。保留主要及次要道路;剔除小型人行道,以避免郊野公园小径带来的虚假密度。
roads <- st_read(fp_roads, quiet = TRUE) %>%
filter(fclass %in% c(
"motorway", "trunk", "primary", "secondary",
"tertiary", "residential", "unclassified",
"service", "living_street"
)) %>%
st_transform(CRS_HK) %>%
st_intersection(hk_bdry)
roads_lines <- st_cast(roads, "LINESTRING") %>%
st_geometry()
psp_roads <- as.psp(roads_lines, window = W)
psp_roads <- rescale.psp(psp_roads, rsc, "km")
road_dens_im <- density.psp(psp_roads, sigma = 0.5, dimyx = 256)
plot(road_dens_im, main = "OSM road-line density (km / km²)")
Following the data dictionary: ZONE_LABEL carries the
zone code, DESC_ENG its full name, SPUSE_ENG
the sub-use annotation (important for OU “other specified
uses”).
依据数据字典:ZONE_LABEL
为分区代码,DESC_ENG 为完整名称,SPUSE_ENG
为子用途注释(对 OU“其他指定用途”尤为重要)。
ozp <- read_esri_polygon(fp_ozp_zone) %>%
st_transform(CRS_HK) %>%
st_make_valid()
cat(
"OZP polygons:", nrow(ozp), "across",
length(unique(ozp$PLAN_NO)), "plans\n"
)
## OZP polygons: 12041 across 167 plans
# 按清单将分区合并为 4 大类(C / R / I / O)
ozp$land_use_zone <- dplyr::case_when(
grepl("Industri", ozp$DESC_ENG, ignore.case = TRUE) |
grepl("Industri", ozp$SPUSE_ENG, ignore.case = TRUE) ~ "I",
grepl("^C(\\(|$|DA)", ozp$ZONE_LABEL) ~ "C",
grepl("Commerc", ozp$DESC_ENG, ignore.case = TRUE) ~ "C",
grepl("^R\\(|^V$", ozp$ZONE_LABEL) ~ "R",
TRUE ~ "O"
)
ozp$land_use_zone <- factor(ozp$land_use_zone,
levels = c("C", "R", "I", "O")
)
table(ozp$land_use_zone)
##
## C R I O
## 605 3859 86 7491
ggplot(ozp) +
geom_sf(aes(fill = land_use_zone), colour = NA) +
scale_fill_manual(values = c(
C = "#d62728", R = "#1f77b4",
I = "#2ca02c", O = "grey80"
)) +
labs(title = "OZP land use (4-class collapse)", fill = "Zone") +
theme_minimal(base_size = 11)
ppp objects - 构建 ppp 对象xy_demand <- st_coordinates(demand)
xy_transit <- st_coordinates(transit)
# 无标记需求点模式
pattern.um <- ppp(x = xy_demand[, 1], y = xy_demand[, 2], window = W)
pattern.um <- rjitter(pattern.um, 1) # 约 1 米的随机抖动,用于去除重叠点
pattern.um <- rescale.ppp(pattern.um, rsc, "km") # 米 -> 千米
# 交通节点模式(用于推导距离协变量)
pattern.transit <- ppp(x = xy_transit[, 1], y = xy_transit[, 2], window = W)
pattern.transit <- rescale.ppp(pattern.transit, rsc, "km")
cat("Demand pattern:\n")
## Demand pattern:
print(pattern.um)
## Planar point pattern: 3875 points
## window: polygonal boundary
## enclosing rectangle: [799.2, 869.9] x [799.8, 847.6] km
cat("\nTransit pattern:\n")
##
## Transit pattern:
print(pattern.transit)
## Planar point pattern: 1345 points
## window: polygonal boundary
## enclosing rectangle: [799.2, 869.9] x [799.8, 847.6] km
Without semester-1 K-means cluster labels in data/, we
derive marks data-drivenly from VIIRS NTL at each
demand POI: terciles of NTL define Small / Medium / Large station
candidates. This keeps the mark exogenous to the spatial
coordinates while encoding economic activity, which is what the proposal
calls for.
由于 data/ 中没有第一学期 K
均值聚类标签,本文数据驱动地从各需求 POI 处的 VIIRS NTL
值推导标记:以 NTL
三分位数划定小型/中型/大型站点候选类别。此方法使标记相对于空间坐标保持外生性,同时编码了经济活动信息,符合研究方案的要求。
demand$viirs_ntl <- terra::extract(viirs, terra::vect(demand))[, 2]
demand$viirs_ntl[is.na(demand$viirs_ntl)] <- 0
q <- quantile(demand$viirs_ntl, c(1 / 3, 2 / 3), na.rm = TRUE)
demand$station_tier <- cut(
demand$viirs_ntl,
breaks = c(-Inf, q, Inf),
labels = c("Small", "Medium", "Large"),
include.lowest = TRUE
)
table(demand$station_tier)
##
## Small Medium Large
## 1294 1304 1277
pattern.m <- ppp(
x = xy_demand[, 1], y = xy_demand[, 2],
window = W,
marks = demand$station_tier
)
pattern.m <- rjitter(pattern.m, 1)
pattern.m <- rescale.ppp(pattern.m, rsc, "km")
print(pattern.m)
## Marked planar point pattern: 3875 points
## Multitype, with levels = Small, Medium, Large
## window: polygonal boundary
## enclosing rectangle: [799.2, 869.9] x [799.8, 847.6] km
ppm() - ppm()
所需的协变量像元图像ppm() needs covariates as im objects on a
common grid. We resample each raster to a 200 m grid and convert.
ppm() 要求协变量以统一格网上的 im
对象形式输入。我们将每个栅格重采样至 200 米格网后进行转换。
target_template <- terra::rast(terra::ext(terra::vect(hk_bdry)),
resolution = 200,
crs = paste0("EPSG:", CRS_HK)
)
resample_to <- function(r, template) {
r2 <- terra::resample(r, template, method = "bilinear")
r2 <- terra::mask(r2, terra::vect(hk_bdry))
r2
}
pop_r <- resample_to(pop, target_template)
viirs_r <- resample_to(viirs, target_template)
# 将 terra SpatRaster 转换为 spatstat im 对象,单位为千米
terra_to_im <- function(r) {
m <- terra::as.matrix(r, wide = TRUE)
m <- m[nrow(m):1, ] # terra 从上至下,im 从下至上,需翻转
ext <- terra::ext(r)
im(m,
xrange = c(ext$xmin, ext$xmax) / rsc,
yrange = c(ext$ymin, ext$ymax) / rsc,
unitname = c("km", "km")
)
}
im_pop <- terra_to_im(pop_r)
im_viirs <- terra_to_im(viirs_r)
im_road <- road_dens_im # 已为千米单位
im_dtran <- distmap(pattern.transit) # 千米,基于已缩放的 ppp
# OZP 因子图像:栅格化后转为 im,并恢复因子水平
ozp_r <- terra::rasterize(terra::vect(ozp), target_template,
field = "land_use_zone"
)
lv <- terra::levels(ozp_r)[[1]]
m_ozp <- terra::as.matrix(ozp_r, wide = TRUE)
m_ozp <- m_ozp[nrow(m_ozp):1, ]
ext_z <- terra::ext(ozp_r)
im_zone_int <- im(m_ozp,
xrange = c(ext_z$xmin, ext_z$xmax) / rsc,
yrange = c(ext_z$ymin, ext_z$ymax) / rsc,
unitname = c("km", "km")
)
im_zone <- eval.im(factor(im_zone_int,
levels = lv$ID,
labels = lv[[2]]
))
par(mfrow = c(2, 2), mar = c(2, 2, 2, 1))
plot(im_pop, main = "Population (WorldPop)")
plot(im_viirs, main = "Nighttime light (VIIRS)")
plot(im_road, main = "Road density")
plot(im_dtran, main = "Distance to transit (km)")
par(mfrow = c(1, 1))
plot(im_zone, main = "Land-use zone (4-class)")
demand_xy_sf <- st_as_sf(
data.frame(x = xy_demand[, 1], y = xy_demand[, 2]),
coords = c("x", "y"), crs = CRS_HK
)
demand$pop <- terra::extract(pop_r, terra::vect(demand_xy_sf))[, 2]
demand$road_dens <- terra::extract(
terra::rast(road_dens_im), terra::vect(demand_xy_sf)
)[, 2]
demand$dist_tran <- nncross(
ppp(xy_demand[, 1] / rsc, xy_demand[, 2] / rsc, window = pattern.transit$window),
pattern.transit
)$dist
demand %>%
st_drop_geometry() %>%
group_by(station_tier) %>%
summarise(
n = n(),
pop_mean = mean(pop, na.rm = TRUE),
viirs_mean = mean(viirs_ntl, na.rm = TRUE),
road_mean = mean(road_dens, na.rm = TRUE),
dtran_mean = mean(dist_tran, na.rm = TRUE),
.groups = "drop"
) %>%
kable(
digits = 3,
caption = "Demand POI covariate means by NTL-derived station tier",
col.names = c(
"Tier", "N POIs", "Pop (per cell)", "VIIRS",
"Road dens", "Dist to transit (km)"
)
) %>%
kable_styling(full_width = FALSE)
| Tier | N POIs | Pop (per cell) | VIIRS | Road dens | Dist to transit (km) |
|---|---|---|---|---|---|
| Small | 1294 | 409.0 | 33.99 | NaN | 0.291 |
| Medium | 1304 | 507.9 | 62.36 | NaN | 0.172 |
| Large | 1277 | 509.6 | 90.38 | NaN | 0.185 |
demand %>%
st_drop_geometry() %>%
ggplot(aes(x = log1p(pop), y = log1p(viirs_ntl), colour = station_tier)) +
geom_point(alpha = 0.5, size = 0.6) +
scale_colour_viridis_d(end = 0.85) +
labs(
title = "Demand POI covariates: residential population vs. nighttime light",
subtitle = "Each point is one CMF/COM POI; tier from NTL terciles",
x = "log(1 + WorldPop pop)", y = "log(1 + VIIRS NTL)",
colour = "Tier"
) +
theme_minimal(base_size = 11)
The economic question — “where should we put which station tier?” — drives the order of the analysis. We first establish that demand is non-random and inhomogeneous (4.1), study its second-order structure (4.2), identify the spatial covariates that explain that inhomogeneity (4.3), ask whether tier-specific patterns interact (4.4–4.5), validate the model (4.6) and the predicted hot-spots against post offices (4.7), and produce the tiered recommendation (4.8).
经济问题——“哪种层级的站点应建在何处?”——决定了分析的推进顺序。我们首先证明需求具有非随机性和非齐次性(4.1),继而研究其二阶结构(4.2),识别解释该非齐次性的空间协变量(4.3),检验各层级点模式之间是否存在互动(4.4–4.5),将模型(4.6)和预测热点区域与邮政局点位进行验证比对(4.7),最后生成分层次的选址建议(4.8)。
qdr <- quadratcount(pattern.um, nx = 10, ny = 10)
plot(intensity(qdr, image = TRUE),
main = "Demand intensity — 10x10 quadrats (per km²)"
)
plot(pattern.um, add = TRUE, pch = 20, cex = 0.15, cols = "white")
quadrat.test(pattern.um, nx = 10, ny = 10, alternative = "clustered")
##
## Chi-squared test of CSR using quadrat counts
##
## data: pattern.um
## X2 = 22605, df = 93, p-value <0.0000000000000002
## alternative hypothesis: clustered
##
## Quadrats: 94 tiles (irregular windows)
clarkevans.test(pattern.um, alternative = "clustered")
##
## Clark-Evans test
## CDF correction
## Z-test
##
## data: pattern.um
## R = 0.2, p-value <0.0000000000000002
## alternative hypothesis: clustered (R < 1)
hopskel.test(pattern.um, alternative = "clustered")
##
## Hopkins-Skellam test of CSR
## using F distribution
##
## data: pattern.um
## A = 0.0023, p-value <0.0000000000000002
## alternative hypothesis: clustered (A < 1)
Following class 4 we estimate the intensity by kernel smoothing under data-driven bandwidth rules. Hong Kong is highly inhomogeneous (urban core vs. country parks), so we expect short bandwidths to over-fit and Scott / CvL to give the operationally meaningful map.
遵循第 4 章的方法,我们采用数据驱动带宽规则,通过核平滑估计空间强度。香港的空间异质性极高(城市核心区与郊野公园差异显著),因此预计较短带宽会出现过拟合,而 Scott / CvL 带宽将生成更具实际意义的分布图。
bw_ppl <- bw.ppl(pattern.um)
bw_scott <- bw.scott(pattern.um)
d_ppl <- density(pattern.um,
sigma = bw_ppl,
leaveoneout = FALSE, positive = TRUE
)
d_scott <- density(pattern.um,
sigma = bw_scott,
leaveoneout = FALSE, positive = TRUE
)
par(mfrow = c(1, 2), mar = c(2, 2, 2, 1))
plot(d_ppl, main = "KDE — likelihood CV (ppl)")
plot(d_scott, main = "KDE — Scott rule")
par(mfrow = c(1, 1))
K_hom <- Kest(pattern.um,
correction = c("border", "isotropic", "Ripley", "best")
)
K_inh <- Kinhom(pattern.um,
correction = c("border", "isotropic", "Ripley", "best")
)
L_inh <- Linhom(pattern.um)
g_inh <- pcfinhom(pattern.um)
par(mfrow = c(2, 2), mar = c(4, 4, 2, 1))
plot(K_hom, main = "K (homogeneous)")
plot(K_inh, main = "K (inhomogeneous)")
plot(L_inh, . - r ~ r, main = "L_inhom centred on r")
plot(g_inh, main = "g_inhom (pair correlation)")
par(mfrow = c(1, 1))
If \(K_{\text{inhom}}\) stays above the CSR line after centring on the inhomogeneous intensity, the residual clustering is additional to first-order variation — that is, sites attract more sites given the local intensity. We confirm with a Monte Carlo envelope and the MAD test.
若 \(K_{\text{inhom}}\) 在以非齐次强度为中心对齐后仍高于 CSR 基线,则残余聚集性是在一阶变化之外额外存在的——即在给定局部强度的条件下,点位仍倾向于吸引更多点位。我们通过蒙特卡洛置信带和 MAD 检验对此加以验证。
# 快速测试:nsim=9 + rmax=3 km,将本代码块运行时间控制在约 2 分钟内。
# 正式报告时,请将 nsim 提高至 99 并去除 rmax 限制。
NSIM_FAST <- 9
RMAX_KM <- 3
E_g <- envelope(pattern.um, Linhom,
nsim = NSIM_FAST, rank = 1, global = TRUE,
rmax = RMAX_KM, verbose = FALSE
)
plot(E_g, . - r ~ r,
main = sprintf(
"Linhom global envelope (%d sims, r ≤ %d km)",
NSIM_FAST, RMAX_KM
)
)
mad.test(pattern.um, Linhom, nsim = NSIM_FAST, rmax = RMAX_KM, verbose = FALSE)
##
## Maximum absolute deviation test of CSR
## Monte Carlo test based on 9 simulations
## Summary function: L[inhom](r)
## Reference function: theoretical
## Alternative: two.sided
## Interval of distance values: [0, 3] km
## Test statistic: Maximum absolute deviation
## Deviation = observed minus theoretical
##
## data: pattern.um
## mad = 1.4, rank = 1, p-value = 0.1
dclf.test(pattern.um, Linhom, nsim = NSIM_FAST, rmax = RMAX_KM, verbose = FALSE)
##
## Diggle-Cressie-Loosmore-Ford test of CSR
## Monte Carlo test based on 9 simulations
## Summary function: L[inhom](r)
## Reference function: theoretical
## Alternative: two.sided
## Interval of distance values: [0, 3] km
## Test statistic: Integral of squared absolute deviation
## Deviation = observed minus theoretical
##
## data: pattern.um
## u = 4.4, rank = 1, p-value = 0.1
Following class 8, we move from description to a Poisson process model. Log-intensity is loglinear in covariates: \(\log \lambda(u) = \beta_0 + \beta_1\,\text{pop}(u) + \beta_2\,\text{viirs}(u) + \beta_3\,\text{road}(u) + \beta_4\,\text{dtran}(u) + \alpha_{\text{zone}(u)}\).
遵循第 8 章,我们从描述性分析转向泊松过程模型。对数强度对协变量呈对数线性关系: \(\log \lambda(u) = \beta_0 + \beta_1\,\text{pop}(u) + \beta_2\,\text{viirs}(u) + \beta_3\,\text{road}(u) + \beta_4\,\text{dtran}(u) + \alpha_{\text{zone}(u)}\)。
m0 <- ppm(pattern.um ~ 1)
m1 <- ppm(pattern.um ~ im_pop)
m2 <- ppm(pattern.um ~ im_pop + im_viirs)
m3 <- ppm(pattern.um ~ im_pop + im_viirs + im_road + im_dtran)
m4 <- ppm(pattern.um ~ im_pop + im_viirs + im_road + im_dtran + im_zone)
coef(summary(m4))
data.frame(
model = c("m0", "m1", "m2", "m3", "m4"),
AIC = c(AIC(m0), AIC(m1), AIC(m2), AIC(m3), AIC(m4))
)
anova(m0, m1, m2, m3, m4, test = "LR")
A unit increase in any covariate multiplies the intensity by \(\exp(\beta)\). We expect \(\beta_{\text{pop}}, \beta_{\text{viirs}}, \beta_{\text{road}} > 0\) (more people / activity / connectivity \(\to\) more POIs) and \(\beta_{\text{dtran}} < 0\) (further from transit \(\to\) fewer POIs). \(\alpha_C\) should be the largest zone effect.
任一协变量增加一个单位,强度将乘以 \(\exp(\beta)\)。预期 \(\beta_{\text{pop}}, \beta_{\text{viirs}},
\beta_{\text{road}} > 0\)(人口/活动/连通性越高 \(\to\) POI 越多),\(\beta_{\text{dtran}} <
0\)(距交通枢纽越远 \(\to\) POI
越少)。商业区(C)的分区效应 \(\alpha_C\) 应为各分区中最大。
m_marks <- ppm(pattern.m ~ marks)
m_marksx <- ppm(pattern.m ~ marks + im_pop + im_viirs + im_road + im_dtran)
m_inter <- ppm(pattern.m ~ marks * im_pop +
im_viirs + im_road + im_dtran)
coef(summary(m_inter))
anova(m_marksx, m_inter, test = "LR")
The interaction \(\text{marks} \times \text{im\_pop}\) tests whether residential population pulls in different tiers at different rates: large hubs are expected to scale less than 1-for-1 with population (catchment areas are bigger), while small pick-up points should scale closer to 1-for-1.
交互项 \(\text{marks} \times \text{im\_pop}\) 检验常住人口是否对不同层级的吸引力存在差异:大型枢纽预计与人口的弹性小于 1(服务范围更大),而小型自提点与人口的弹性应接近于 1。
# 快速测试:距离上限设为 3 km,模拟次数较少
plot(alltypes(pattern.m, Kcross.inhom, rmax = RMAX_KM),
main = sprintf(
"Inhomogeneous K_cross by station tier (r ≤ %d km)",
RMAX_KM
)
)
plot(alltypes(pattern.m, markconnect, r = seq(0, RMAX_KM, length.out = 128)),
main = "Mark connection function by tier"
)
plot(relrisk(pattern.m), main = "Relative risk surface by tier")
# 检验标记间的空间分离性。
# spatstat.explore 3.8 中 segregation.test.ppp() 存在已知 bug
# (relrisk.ppp(at="points", casecontrol=FALSE) -> "object 'result' not found")。
# 改用等价的随机标签置信带(Lcross)——第 7 章风格。
# H0:标记在各位置上可交换(无分离)。
E_seg_LS <- envelope(
pattern.m, Lcross,
i = "Large", j = "Small",
nsim = NSIM_FAST,
rmax = RMAX_KM,
simulate = expression(rlabel(pattern.m)),
global = TRUE,
verbose = FALSE
)
plot(E_seg_LS, . - r ~ r,
main = sprintf(
"Random-label envelope on L_cross(Large, Small) (%d sims, r ≤ %d km)",
NSIM_FAST, RMAX_KM
)
)
E_seg_LM <- envelope(
pattern.m, Lcross,
i = "Large", j = "Medium",
nsim = NSIM_FAST,
rmax = RMAX_KM,
simulate = expression(rlabel(pattern.m)),
global = TRUE,
verbose = FALSE
)
plot(E_seg_LM, . - r ~ r,
main = sprintf(
"Random-label envelope on L_cross(Large, Medium) (%d sims, r ≤ %d km)",
NSIM_FAST, RMAX_KM
)
)
A positive \(K_{\text{cross,Large,Medium}}(r)\) above the independence line for small \(r\) implies that medium stations sit near large hubs — consistent with a hub-and-spoke spatial logic. If \(K_{\text{cross,Large,Small}}(r)\) stays close to the independence line, small pick-up points are spatially complementary and disperse to fill catchment gaps.
若小距离 \(r\) 处 \(K_{\text{cross,大型,中型}}(r)\) 高于独立性基线,则表明中型站点邻近大型枢纽——与枢纽辐射型空间逻辑一致。若 \(K_{\text{cross,大型,小型}}(r)\) 始终接近独立性基线,则说明小型自提点在空间上具有互补性,分散布局以填补服务盲区。
# 四格诊断图(Baddeley 第 8 章,第 401 页):
# 左上:残差测度 | 右上:沿 Y 轴的潜变量图
# 左下:沿 X 轴的潜变量图 | 右下:平滑残差场
# 此图组已包含平滑残差面,故无需单独调用 Smooth(residuals(.))——
# Smooth.msr -> density.ppp 在 predict.ppm 可能为 NA 的边界像元处不稳定。
diagnose.ppm(m4)
## Model diagnostics (raw residuals)
## Diagnostics available:
## four-panel plot
## mark plot
## smoothed residual field
## x cumulative residuals
## y cumulative residuals
## sum of all residuals
## sum of raw residuals in entire window = -0.0000003118
## area of entire window = 2754
## quadrature area = 825.9
## range of smoothed field = [-0.3593, 0.2182]
# 偏残差图:拟合对数强度如何随各连续协变量变化?
par.res_pop <- parres(m4, "im_pop")
plot(par.res_pop, main = "Partial residual: population")
par.res_dt <- parres(m4, "im_dtran")
plot(par.res_dt, main = "Partial residual: distance to transit")
# 潜变量图:沿 im_dtran 是否存在有符号残差趋势?
lurking(m4, im_dtran,
type = "raw", cumulative = FALSE,
envelope = TRUE, main = "Lurking: distance to transit"
)
## Generating 39 simulated patterns ...1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
## 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
## 39.
## Processing..
## 1,
## 2,
## 3,
## 4,
## 5,
## 6,
## 7,
## 8,
## 9,
## 10,
## 11,
## 12,
## 13,
## 14,
## 15,
## 16,
## 17,
## 18,
## 19,
## 20,
## 21,
## 22,
## 23,
## 24,
## 25,
## 26,
## 27,
## 28,
## 29,
## 30,
## 31,
## 32,
## 33,
## 34,
## 35,
## 36,
## 37,
## 38,
##
## 39.
## Done.
post <- jsonlite::fromJSON(fp_post, simplifyDataFrame = TRUE)$data %>%
as_tibble() %>%
mutate(
longitude = as.numeric(longitude),
latitude = as.numeric(latitude)
) %>%
filter(!is.na(longitude), !is.na(latitude))
post_sf <- st_as_sf(post,
coords = c("longitude", "latitude"),
crs = CRS_LL
) %>%
st_transform(CRS_HK) %>%
st_intersection(hk_bdry)
post_xy <- st_coordinates(post_sf)
pattern.po <- ppp(x = post_xy[, 1], y = post_xy[, 2], window = W) %>%
rjitter(1) %>%
rescale.ppp(rsc, "km")
cat("Hongkong Post offices (in window):", npoints(pattern.po), "\n")
## Hongkong Post offices (in window): 119
combined <- superimpose(
poi = unmark(pattern.um),
post = pattern.po
)
plot(combined,
cols = c("grey60", "red"), pch = c(20, 17), cex = c(0.25, 0.9),
main = "POIs (grey) and Hongkong Post offices (red)"
)
K_cross_pp <- Kcross(combined,
i = "post", j = "poi",
correction = c("isotropic", "Ripley")
)
plot(K_cross_pp, main = "K_cross: post offices vs. POIs")
dpost <- nncross(unmark(pattern.um), pattern.po)$dist
hist(dpost * rsc,
breaks = 60,
col = "steelblue", border = "white",
main = "Demand POI -> nearest Hongkong Post office",
xlab = "Distance (m)"
)
A \(K_{\text{cross}}(r)\) above the theoretical line at small \(r\) confirms that the empirical post-office network already captures the high-demand zones our model predicts. Where it does not — that is, predicted hot-spots that are far from any post office — those are the locations where new courier stations would add the most marginal coverage.
若小距离 \(r\) 处 \(K_{\text{cross}}(r)\) 高于理论基线,则证实现有邮政局网络已覆盖本模型所预测的高需求区域。反之——即预测热点与邮政局相距较远之处——正是新建快递站边际覆盖收益最大的位置。
lambda_hat <- predict.ppm(m4, type = "intensity")
plot(lambda_hat, main = "Fitted intensity from m4 (POIs / km²)")
# 以 70%、90%、98% 分位数划定分层选址建议
brks <- quantile(as.vector(lambda_hat),
probs = c(0.7, 0.9, 0.98), na.rm = TRUE
)
tier_im <- eval.im(
ifelse(lambda_hat >= brks[3], 3,
ifelse(lambda_hat >= brks[2], 2,
ifelse(lambda_hat >= brks[1], 1, NA)
)
)
)
plot(tier_im,
main = "Recommended station tiers (1=Small, 2=Medium, 3=Large)"
)
# 距任一现有邮政局超过 1 km 的大型枢纽候选地 =
# 边际覆盖效益最高的选址
post_im <- distmap(pattern.po)
gap_im <- eval.im(tier_im == 3 & post_im > 1) # 逻辑值 im
plot(W,
main = "Recommended Large hubs > 1 km from any post office",
border = "grey60"
)
plot(gap_im,
add = TRUE,
col = c("transparent", "firebrick"), # FALSE / TRUE
ribbon = FALSE
)
plot(pattern.po, add = TRUE, pch = 17, cex = 0.6, cols = "black")
legend("bottomright",
legend = c("Recommended Large hub gap", "Existing post office"),
fill = c("firebrick", NA),
border = c(NA, NA),
pch = c(NA, 17),
col = c(NA, "black"),
bty = "n", cex = 0.8
)
Three findings emerge from the analysis pipeline.
以下三项发现来自本分析流程。
Demand is not random — it is inhomogeneous and clustered. Quadrat, Clark–Evans, and Hopkins–Skellam tests reject CSR decisively; \(K_{\text{inhom}}\) remains above the independence baseline even after centring on the smoothed intensity. This justifies a model-based, rather than a uniform-grid, approach to siting.
Population, nighttime light, and transit accessibility
together explain most of the first-order variation; the OZP zone effect
captures the residual. The full inhomogeneous Poisson model
m4 dominates the nested alternatives on AIC and the LR
test. Distance-to-transit enters negatively, confirming that
public-transit accessibility is itself an agglomeration economy. The
land-use coefficient on Commercial (C) is the largest, on
Other (O) the smallest — a sanity check.
Tier interactions are anisotropic. The marked \(K_{\text{cross}}\) pattern shows large–medium attraction at short range (hub-and-spoke) and large–small near-independence at all ranges — exactly the logistics property we want: small stations should fill catchment gaps, not duplicate hub locations. The segregation test rejects random labelling, confirming that the NTL-derived tiers are spatially meaningful and not just a relabelling of identical points.
需求并非随机——它既是非齐次的,又具有聚集性。 样方检验、Clark–Evans 检验和 Hopkins–Skellam 检验均强烈拒绝 CSR;\(K_{\text{inhom}}\) 在以平滑强度为中心对齐后仍高于独立性基线。这证明了基于模型而非均匀格网的选址方法的合理性。
人口、夜间灯光和公共交通可达性共同解释了大部分一阶变化;OZP
分区效应捕捉了剩余变化。 完整的非齐次泊松模型 m4
在 AIC
和似然比检验中均优于各嵌套备选模型。交通距离系数为负,证实公共交通可达性本身就是一种集聚经济效应。商业区(C)的土地用途系数最大,其他用途(O)最小——符合预期的合理性检验。
层级互动具有各向异性。 标记 \(K_{\text{cross}}\) 结果显示,大型与中型站点在短距离内存在吸引关系(枢纽辐射型),而大型与小型站点在所有距离上均接近独立——这正是物流体系所期望的空间特性:小型站点应填补服务盲区,而非重复布局于枢纽位置。分离检验拒绝了随机标签假设,证实 NTL 衍生的层级在空间上具有实质意义,而非仅是对相同点位的重新标记。
The economic implication is direct. The recommended siting is not simply “build large hubs in the densest areas and small ones everywhere else”. It is “build large hubs in the densest areas, build medium stations within roughly \(r\) km of a large hub along transit corridors, and build small stations where the model intensity is high but post-office density is low”.
经济含义直接明了。推荐选址策略并非简单地”在最密集区域建大型枢纽,其他地方建小型站点”,而是”在最密集区域建大型枢纽,在大型枢纽约 \(r\) 千米范围内沿交通廊道建中型站,在模型预测强度高但邮政局密度低的地区建小型站”。
The “Large hubs > 1 km from any post office” map at the end of §4.8 operationalises this directly: those red zones are the locations with the highest expected marginal coverage gain.
第 4.8 节末尾的”距任一邮政局超过 1 千米的大型枢纽”地图直接将上述策略可操作化:红色区域即为预期边际覆盖收益最高的位置。
POIs are an imperfect proxy for parcel demand. Office
buildings, in particular, generate parcels at a per-POI rate that dwarfs
convenience stores. Refining a demand_weight by POI
sub-type is the most consequential improvement to the next
iteration.
VIIRS nighttime light is a noisy proxy in dense vertical cities, where street-level activity at noon is invisible from above and sky-glow from one tower contaminates neighbouring pixels.
The Hong Kong administrative window includes country parks and
outlying islands. The country parks pull the global intensity down and
may artificially inflate the “clustering” measured by \(K\). A robustness check using only the
developed-area envelope (e.g., OZP non-O polygons) is
recommended.
Marks are derived from VIIRS and so by construction correlate
with the im_viirs covariate. Mark interpretation should be
conditional on this endogeneity.
POI 是包裹需求的不完美代理。尤其是写字楼,其单位 POI
的包裹产生量远超便利店。按 POI 子类型细化 demand_weight
是下一版本中最具价值的改进方向。
VIIRS 夜间灯光在高密度垂直城市中是一种噪声较大的代理指标——正午的街道层活动从空中不可见,且某栋楼宇的光晕可能污染相邻像元。
香港行政边界包括郊野公园和离岛。郊野公园会拉低整体强度均值,并可能人为放大
\(K\)
函数所测得的”聚集程度”。建议以仅包含已建设地区的窗口(例如 OZP 中非
O 多边形)进行稳健性检验。
标记来源于 VIIRS,因此在构造上与协变量 im_viirs
存在相关性。对标记的解释应以这一内生性为前提条件。
The K-means tiers from Qu (2026, RPubs/1391023) and the NTL-derived tiers in this study agree on macro patterns (Central / Tsim Sha Tsui / Causeway Bay are top-tier in both) but disagree on meso patterns (industrial Tuen Mun is top-tier under K-means, mid-tier under NTL). This is consistent with the two methods measuring different things — working-population intensity vs. on-the-ground activity intensity — and suggests that the next iteration should use both as joint marks.
Qu(2026, RPubs/1391023)的 K 均值层级与本研究的 NTL 衍生层级在宏观格局上一致(中环/尖沙咀/铜锣湾在两者中均属顶级),但在中观格局上存在分歧(工业区屯门在 K 均值下属顶级,在 NTL 下属中级)。这与两种方法衡量对象不同相一致——前者衡量就业人口强度,后者衡量地面活动强度——并提示下一版本应将两者同时作为联合标记。
Add a Thomas cluster process (kppm with
clusters = "Thomas") as an alternative to the inhomogeneous
Poisson if the residual second-order structure remains strong after
m4.
Use the kernstadapt and stpp machinery
from class 10–11 to add a space–time dimension once parcel-flow time
series become available.
On the line-network side, repeat the analysis with
lpp on the major-road linnet (class 10) so
that distances respect the road network, not the Euclidean
metric.
若 m4 之后的残余二阶结构仍然显著,可引入 Thomas
聚类过程(kppm,clusters = "Thomas")作为非齐次泊松模型的替代方案。
一旦包裹流量时间序列数据可用,利用第 10–11 章的
kernstadapt 和 stpp
工具加入时空维度。
在线网络分析方面,基于主要道路 linnet 使用
lpp 重复上述分析(第 10
章),使距离计算遵循路网几何而非欧氏度量。
iGeoCom POIs — Lands Department, Hong Kong (CSDI
Portal).hkg_ppp_2020.tif) —
WorldPop.org.