Spatial Dynamics of Theft Incidents in Taoyuan City: Identifying Hotspots and Spatial Dependence Using Geo-Localized Crime Data

Si Tang Lin

Student number: 476912

Spatial Econometrics Term Project

Project Summary

This project investigates spatial patterns and dependencies of theft incidents in Taoyuan City, Taiwan. Using geo-localized point data, the study evaluates whether theft occurrences exhibit clustering tendencies, identifies significant hotspots, and assesses the extent of spatial autocorrelation. Project include spatial visualizations, statistical diagnostics, kernel density estimation, spatial econometrics and insights to urban crime pattern.

Background and Motivation

Crime demonstrates non-random spatial patterns, influenced by land use, population density and policing allocation. Understanding these mechanisms is essential for promote effective stategies.

This project leverages real Taiwanese police incident records to examine two questions:

  1. Do theft incidents exhibit significant spaital clustering, forming hotspot?

  2. Is there evidence of spatial dependence- fo theft in one area correlate with neighboring areas?

To address above questions, this project applies following methods,to gain insights and provide data-driven policy.

** Spatial statistics

** Spatial weights matrix construction (KNN)

** Spatial dependence models (Moran’s I, LISA)

** Visualization of spatial phenomena

** Spatial econometrics model(SAR, SEM)

Required library

Data Information

(A) National Police Agency – Geo-Localized Crime Dataset

Data Type: Point-based geo-localized crime incident records Source: Taiwan Open Government Data Link: https://data.gov.tw/dataset/167673 Content: Contains the geographic coordinates and attributes of reported crime incidents. Spatial Processing: Filtering applied to remove mislocated or invalid coordinates. Spatial Boundary: Restricted to Taoyuan City, Taiwan.

(B) Resident Population and Population Density of Taoyuan City

Data Type: Demographic and spatial statistical data Source: Taiwan Open Government Data Link: https://data.gov.tw/dataset/18506 Content: Provides resident population counts and population density by administrative district. Spatial Boundary: Taoyuan City, Taiwan.

Load Dataset and Key Insight

theft<-read.csv("/Users/ninalin/Desktop/dsba/winter semester 25/spatial econometrics/term project/Theft Incident Locations.csv")

# View(earthquake)
is.null(theft) # No missing value in our data
## [1] FALSE
# Filter out wrong data focus on Taoyuan City
theft <- theft %>%
  filter(
    Latitude  >= 24.7,
    Latitude  <= 25.2,
    Longitude >= 121.1,
    Longitude <= 121.5)

str(theft)
## 'data.frame':    18043 obs. of  7 variables:
##  $ Case.Type    : chr  "Bicycle Theft" "Bicycle Theft" "Bicycle Theft" "Bicycle Theft" ...
##  $ Incident.Year: int  2023 2023 2023 2023 2023 2023 2023 2023 2023 2023 ...
##  $ Date         : int  20230120 20230113 20230113 20230113 20230113 20230109 20230109 20230109 20230109 20230130 ...
##  $ Agency.Code  : chr  "380132100C" "380132100C" "380132100C" "380132100C" ...
##  $ Precinct     : chr  "Zhongli Police Precinct" "Zhongli Police Precinct" "Zhongli Police Precinct" "Zhongli Police Precinct" ...
##  $ Latitude     : num  25 25 25 25 25 ...
##  $ Longitude    : num  121 121 121 121 121 ...
summary(theft)
##   Case.Type         Incident.Year       Date          Agency.Code       
##  Length:18043       Min.   :2015   Min.   :20150101   Length:18043      
##  Class :character   1st Qu.:2016   1st Qu.:20160320   Class :character  
##  Mode  :character   Median :2017   Median :20170906   Mode  :character  
##                     Mean   :2018   Mean   :20177876                     
##                     3rd Qu.:2019   3rd Qu.:20190920                     
##                     Max.   :2023   Max.   :20231230                     
##    Precinct            Latitude       Longitude    
##  Length:18043       Min.   :24.78   Min.   :121.1  
##  Class :character   1st Qu.:24.94   1st Qu.:121.2  
##  Mode  :character   Median :24.96   Median :121.3  
##                     Mean   :24.97   Mean   :121.3  
##                     3rd Qu.:24.99   3rd Qu.:121.3  
##                     Max.   :25.12   Max.   :121.5
nrow(theft)
## [1] 18043

Spatial Visualization

# Taoyuan city map

twn1 <- st_read("https://geodata.ucdavis.edu/gadm/gadm4.1/gpkg/gadm41_TWN.gpkg",layer = "ADM_ADM_1")
## Reading layer `ADM_ADM_1' from data source 
##   `https://geodata.ucdavis.edu/gadm/gadm4.1/gpkg/gadm41_TWN.gpkg' 
##   using driver `GPKG'
## Simple feature collection with 7 features and 11 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 116.71 ymin: 20.6975 xmax: 122.1085 ymax: 26.38542
## Geodetic CRS:  WGS 84
twn2 <- st_read("https://geodata.ucdavis.edu/gadm/gadm4.1/gpkg/gadm41_TWN.gpkg",layer = "ADM_ADM_2")
## Reading layer `ADM_ADM_2' from data source 
##   `https://geodata.ucdavis.edu/gadm/gadm4.1/gpkg/gadm41_TWN.gpkg' 
##   using driver `GPKG'
## Simple feature collection with 22 features and 13 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 116.71 ymin: 20.6975 xmax: 122.1085 ymax: 26.38542
## Geodetic CRS:  WGS 84
taoyuan_1 <- twn1 %>% filter(NAME_1 == "Taoyuan")
taoyuan_2 <- twn2 %>% filter(NAME_2 == "Taoyuan")

ggplot() +
    geom_sf(data = twn2, color = "gray80", fill = "gray95") +   
    geom_sf(data = taoyuan_2, fill = "white", color = "black", size = 1) +  
    coord_sf(
        xlim = c(120.95, 121.50),
        ylim = c(24.55, 25.15),
        expand = FALSE
    ) +
    theme_minimal() +
    labs(title = "Taoyuan (Figure 1)")

# Raw point data
theft_sf <- st_as_sf(theft, coords = c("Longitude", "Latitude"), crs = 4326)

ggplot() +
    geom_sf(data = taoyuan_1, fill = NA, color = "grey20", size = 0.3) +
    geom_sf(data = theft_sf, color = "grey10", size = 0.02, alpha = 0.25) +
    theme_void() +
    coord_sf(expand = FALSE) +
    theme(
        plot.title = element_text(size = 16, hjust = 0.5)
    ) +
    labs(title = "Spatial Distribution of Theft Incidents(Figure 2)")

# Crime data visualization

leaflet(data = theft) %>%
  addProviderTiles("OpenStreetMap (Figure 3)") %>%    
  addMarkers(~Longitude, ~Latitude, clusterOptions = markerClusterOptions())

The two visualizations reveal a strong pattern in theft incidents across Taoyuan City, particularly in Taoyuan, Zhongli, and Pingzhen districts. Rural areas display sparse activity. The dense point clusters indicates hotspot phenomenon. However, further spatial statistical testing is needed to confirm the significance of this spatial clustering.

Research Question 1

Do theft incidents exhibit significant spaital clustering, forming hotspot?

Kernel Density Estimation

## KDE (Kernel Density Estimation)

theft_sf <- st_as_sf(theft, coords = c("Longitude", "Latitude"), crs = 4326)
theft_sf_3826 <- st_transform(theft_sf, 3826)

coords <- st_coordinates(theft_sf_3826)
win <- as.owin(st_bbox(theft_sf_3826))
theft_ppp <- ppp(
  x = coords[, 1],
  y = coords[, 2],
  window = win)
## Warning: data contain duplicated points
kde_smooth <- density.ppp(theft_ppp, sigma = 1500)
kde_raster <- raster(kde_smooth)
kde_df <- as.data.frame(kde_raster, xy = TRUE)
colnames(kde_df) <- c("x", "y", "value")
ggplot() +
  geom_raster(data = kde_df, aes(x = x, y = y, fill = value), alpha = 0.85) +
  scale_fill_viridis(option = "magma", name = "Density") +
  geom_point(
    data = as.data.frame(coords),
    aes(X, Y),
    color = "white", size = 0.25, alpha = 0.4
  ) +
  coord_equal() +
  theme_minimal() +
  labs(
    title = "KDE Hotspot Map of Theft Incidents (σ = 1500)(Figure 4)",
    x = "X (meters)",
    y = "Y (meters)")

The KDE hotspot patterns indicates that theft incident cluster heavily in major urban districts, confirming the presence of non-random spatial clustering.

Spatial Weights Matrix

# Spatial Weights Matrix

coords_jitter <- jitter(coords, amount = 1)
## KNN = 5
knn5 <- knearneigh(coords_jitter, k = 5)
nb_knn5 <- knn2nb(knn5)
lw_knn5 <- nb2listw(nb_knn5, style = "W")
plot(nb_knn5, coords_jitter, col = "#FFBF00")
points(coords_jitter, pch = 20, cex = 0.5)
title("KNN Spatial Weights (k = 5)(Figure 5)")

## KNN = 8
knn8 <- knearneigh(coords_jitter, k = 8)
nb_knn8 <- knn2nb(knn8)
lw_knn8 <- nb2listw(nb_knn8, style = "W")
plot(nb_knn8, coords_jitter, col = "#134686")
points(coords_jitter, pch = 20, cex = 0.5)
title("KNN Spatial Weights (k = 8)(Figure 6)")

data.frame(
  Min_Neighbors = min(card(nb_knn8)),
  Max_Neighbors = max(card(nb_knn8)),
  Mean_Neighbors = mean(card(nb_knn8))
)
##   Min_Neighbors Max_Neighbors Mean_Neighbors
## 1             8             8              8

Perform K-Nearest Neighbors spatial weight matrix to capture interaction between theft location. No isolated observations and a uniform neighbor distribution which ensure numerical stability for subsequent spatial autocorrelation test. Dense regions show tight connection, while space regions exhibit longer distance.

Spatial Statistics

Global Moran

## Global Moran's I using KDE values at event locations
nn <- get.knnx(kde_df[,1:2], coords_jitter, k = 1)
kde_value <- kde_df$value[nn$nn.index]

moran_knn5 <- moran.test(kde_value, lw_knn5)
moran_knn8 <- moran.test(kde_value, lw_knn8)
moran_knn5
## 
##  Moran I test under randomisation
## 
## data:  kde_value  
## weights: lw_knn5    
## 
## Moran I statistic standard deviate = 228.14, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##      9.979903e-01     -5.542623e-05      1.913859e-05
moran_knn8
## 
##  Moran I test under randomisation
## 
## data:  kde_value  
## weights: lw_knn8    
## 
## Moran I statistic standard deviate = 286.57, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##      9.973059e-01     -5.542623e-05      1.211281e-05

Global Moran’s I shows extremely strong positive spatial autocorrelation (I ≈ 0.99, p < 0.001) in the KDE-estimated theft intensity surface. Confirming that theft intensity is highly spatially clustered across Taoyuan City. The value for both KNN= 5 and KNN=8 indicated that this clustering pattern is robust to different spatial weight specification.

Local Moran’s I (LISA)

# LISA
lisa <- localmoran(kde_value, lw_knn5)

lisa_df <- cbind(
  as.data.frame(coords_jitter),
  Ii = lisa[,1],
  p = lisa[,5])

theft_sf$Local_I  <- lisa[,1]
theft_sf$p_value  <- lisa[,5]
# Define category
alpha <- 0.05

theft_sf$cluster <- "Not Significant"

theft_sf$cluster[theft_sf$Local_I > 0 & kde_value > mean(kde_value) & theft_sf$p_value < alpha] <- "High-High"
theft_sf$cluster[theft_sf$Local_I > 0 & kde_value < mean(kde_value) & theft_sf$p_value < alpha] <- "Low-Low"
theft_sf$cluster[theft_sf$Local_I < 0 & kde_value > mean(kde_value) & theft_sf$p_value < alpha] <- "High-Low"
theft_sf$cluster[theft_sf$Local_I < 0 & kde_value < mean(kde_value) & theft_sf$p_value < alpha] <- "Low-High"

theft_sf$cluster <- factor(
  theft_sf$cluster,
  levels = c("High-High", "Low-Low", "High-Low", "Low-High", "Not Significant"))
tmap_mode("plot")
## tmap mode set to plotting
tm_shape(theft_sf) +
  tm_dots(
    col = "cluster",
    palette = c(
      "High-High"   = "#EA2F14",
      "Low-Low"     = "#FB9E3A",
      "High-Low"    = "#E6521F",
      "Low-High"    = "#FCEF91",
      "Not Significant" = "#EBE8DB"
    ),
    size = 0.1,
    border.col = NA
  ) +
  tm_layout(
    title = "LISA Cluster Map of Theft KDE (KNN = 5)(Figure 7)",
    title.position = c("center", "top"),
    legend.outside = TRUE)

The Lisa cluster map revealed two High-High hotspots, indicating areas where theft intensity is elevated. Surrounding Low-Low regions highlight districts with consistently low theft activity.

Result Interpretation

The results of spatial analysis demonstrate theft incidents in Taoyuan City exhibit a clear and statistically significant pattern of spatial clustering. These areas display dense point intensity and prominent high-density surfaces, indicating persistent crime concentration rather than random spatial occurrence.

Research Question 2

Do population size and density influence the distribution of theft incidents across districts in Taoyuan City in 2023?

Spatial Econometrics model

Model Construction

# Population data
xml <- read_xml("/Users/ninalin/Desktop/dsba/winter semester 25/spatial econometrics/term project/popu.xml")

nodes <- xml_find_all(xml, ".//桃園市常住人口數及人口密度")

extract_block <- function(node) {
  tibble(
    district = xml_text(xml_find_first(node, "./按鄉鎮市區別分_By_Township_City_District")),
    pop_total = xml_text(xml_find_first(node, "./常住人口數_總計_人_Number_of_resident_population_Grand_total_person")),
    pop_male  = xml_text(xml_find_first(node, "./常住人口數_男_人_Number_of_resident_population_Male_person")),
    pop_female = xml_text(xml_find_first(node, "./常住人口數_女_人_Number_of_resident_population_Female_person")),
    area_km2 = xml_text(xml_find_first(node, "./土地面積_平方公里_Total_land_area_km2")),
    density = xml_text(xml_find_first(node, "./人口密度_Population_density"))
  )}

pop_df <- bind_rows(lapply(nodes, extract_block))
head(pop_df)
## # A tibble: 6 × 6
##   district             pop_total pop_male  pop_female area_km2 density  
##   <chr>                <chr>     <chr>     <chr>      <chr>    <chr>    
## 1  桃園市             "2441064" "1230476" "1210588"  "1221"   "1999.3" 
## 2  Taoyuan City       ""        ""        ""         ""       ""       
## 3   桃園區           "458566"  "224189"  "234377"   "34.8"   "13175.4"
## 4   Taoyuan District ""        ""        ""         ""       ""       
## 5   中壢區           "468641"  "231923"  "236718"   "76.5"   "6124.4" 
## 6   Zhongli District ""        ""        ""         ""       ""
# English format
pop_df2 <- pop_df %>% 
  mutate(across(everything(), ~ str_squish(.)))

pop_df2 <- pop_df2 %>% 
  mutate(is_chinese = str_detect(district, "[\u4e00-\u9fff]"))

cols_to_fill <- c("pop_total", "pop_male", "pop_female", "area_km2", "density")

for (col in cols_to_fill) {
  pop_df2[[col]] <- ifelse(
    !pop_df2$is_chinese & (pop_df2[[col]] == "" | is.na(pop_df2[[col]])),
    lag(pop_df2[[col]]),
    pop_df2[[col]]
  )}

clean_pop <- pop_df2 %>% 
  filter(!is_chinese) %>% 
  dplyr::select(-is_chinese)
clean_pop <- clean_pop %>% 
  mutate(across(all_of(cols_to_fill), ~ as.numeric(.)))

head(clean_pop)
## # A tibble: 6 × 6
##   district         pop_total pop_male pop_female area_km2 density
##   <chr>                <dbl>    <dbl>      <dbl>    <dbl>   <dbl>
## 1 Taoyuan City       2441064  1230476    1210588   1221     1999.
## 2 Taoyuan District    458566   224189     234377     34.8  13175.
## 3 Zhongli District    468641   231923     236718     76.5   6124.
## 4 Daxi District       103101    53457      49644    105.     981.
## 5 Yangmei District    182572    94057      88515     89.1   2048.
## 6 Luzhu District      184664    93769      90895     75.5   2446.
# preprocess

theft_2023 <- theft %>%
  filter(Incident.Year == 2023) %>%   
  mutate(Precinct = trimws(Precinct)) %>%    
  filter(!Precinct %in% c("Xinzhuang Police Precinct",
                          "Linkou Police Precinct"))

mapping <- tibble(
  precinct = c(
    "Bade Police Precinct",
    "Daxi Police Precinct",
    "Dayuan Police Precinct",
    "Guishan Police Precinct",
    "Longtan Police Precinct",
    "Luzhu Police Precinct",
    "Pingzhen Police Precinct",
    "Taoyuan Police Precinct",
    "Yangmei Police Precinct",
    "Zhongli Police Precinct"
  ),
  district = c(
    "Bade District",
    "Daxi District",
    "Dayuan District",
    "Guishan District",
    "Longtan District",
    "Luzhu District",
    "Pingzhen District",
    "Taoyuan District",
    "Yangmei District",
    "Zhongli District"
  )
)

theft_2023 <- theft_2023 %>%
  left_join(mapping, by = c("Precinct" = "precinct"))

setdiff(unique(theft_2023$Precinct), mapping$precinct)
## character(0)
theft_district_2023 <- theft_2023 %>%
  group_by(district) %>%
  summarise(theft_count = n(), .groups = "drop")


final_data <- clean_pop %>%
  left_join(theft_district_2023, by = "district") %>%
  mutate(theft_count = ifelse(is.na(theft_count), 0, theft_count)) %>%
  filter(district != "Taoyuan City")   # remove non-district row
# District Neighbnor list
neighbors_list <- list(
  "Bade District"    = c("Taoyuan District", "Daxi District", "Guishan District"),
  "Daxi District"    = c("Bade District", "Zhongli District", "Longtan District"),
  "Dayuan District"  = c("Luzhu District", "Guanyin District", "Zhongli District", "Pingzhen District"),
  "Fuxing District"  = c("Daxi District", "Longtan District"),
  "Guishan District" = c("Bade District", "Taoyuan District", "Luzhu District"),
  "Longtan District" = c("Daxi District", "Zhongli District", "Yangmei District", "Fuxing District", "Pingzhen District"),
  "Luzhu District"   = c("Guishan District", "Taoyuan District", "Dayuan District"),
  "Pingzhen District"= c("Zhongli District", "Longtan District", "Yangmei District", "Dayuan District"),
  "Taoyuan District" = c("Bade District", "Guishan District", "Luzhu District"),
  "Yangmei District" = c("Pingzhen District", "Zhongli District", "Longtan District"),
  "Zhongli District" = c("Pingzhen District", "Longtan District", "Daxi District", "Dayuan District", "Yangmei District"),
  "Xinwu District"   = c("Guanyin District", "Dayuan District"),
  "Guanyin District" = c("Xinwu District", "Dayuan District")
)

district_order <- final_data$district  

nb_obj <- lapply(district_order, function(d){
  match(neighbors_list[[d]], district_order)
})

class(nb_obj) <- "nb"
attr(nb_obj, "region.id") <- district_order
# Spatial weight matrix
W <- nb2listw(nb_obj, style = "W")

Statistical test

# Moran test
moran.test(final_data$theft_count, listw = W)
## 
##  Moran I test under randomisation
## 
## data:  final_data$theft_count  
## weights: W    
## 
## Moran I statistic standard deviate = 0.4974, p-value = 0.3095
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##        0.00342022       -0.08333333        0.03042072

The Global moran test indicate no detectable spatial autocorrelation.

Spatial Autoregressive Model

# SAR model
model_sar <- lagsarlm(
  theft_count ~ pop_total + density,
  data = final_data,
  listw = W
)

summary(model_sar)
## 
## Call:lagsarlm(formula = theft_count ~ pop_total + density, data = final_data, 
##     listw = W)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -36.7827 -17.1705  -4.9493   9.1369  66.1067 
## 
## Type: lag 
## Coefficients: (asymptotic standard errors) 
##                Estimate  Std. Error z value  Pr(>|z|)
## (Intercept) -2.84164690 19.63605644 -0.1447    0.8849
## pop_total    0.00059282  0.00011234  5.2768 1.314e-07
## density     -0.00389647  0.00435451 -0.8948    0.3709
## 
## Rho: -0.18018, LR test value: 0.60753, p-value: 0.43572
## Asymptotic standard error: 0.18392
##     z-value: -0.97965, p-value: 0.32726
## Wald statistic: 0.95972, p-value: 0.32726
## 
## Log likelihood: -61.10698 for lag model
## ML residual variance (sigma squared): 702.3, (sigma: 26.501)
## Number of observations: 13 
## Number of parameters estimated: 5 
## AIC: 132.21, (AIC for lm: 130.82)
## LM test for residual autocorrelation
## test value: 4.4038, p-value: 0.035858

The result indicate theft levle in one district do not significantly depend on theft levels in neighboring districts(no evidence of spatial spillover). Population size shows a positive association with theft incident, areas with more resident tends to have more thefts. While population density reflecting a negative relation, possibly reflecting social monitoring effects.

Spatial Error Model

# SEM model
model_sem <- errorsarlm(
  theft_count ~ pop_total + density,
  data = final_data,
  listw = W
)
summary(model_sem)
## 
## Call:
## errorsarlm(formula = theft_count ~ pop_total + density, data = final_data, 
##     listw = W)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -27.6975 -14.5092  -1.5132  13.1696  38.2280 
## 
## Type: error 
## Coefficients: (asymptotic standard errors) 
##                Estimate  Std. Error z value Pr(>|z|)
## (Intercept) -1.0902e+01  7.3053e+00 -1.4924  0.13560
## pop_total    5.8971e-04  6.9226e-05  8.5187  < 2e-16
## density     -5.5413e-03  2.6296e-03 -2.1073  0.03509
## 
## Lambda: -0.9909, LR test value: 6.174, p-value: 0.012964
## Asymptotic standard error: 0.26221
##     z-value: -3.7791, p-value: 0.00015741
## Wald statistic: 14.281, p-value: 0.00015741
## 
## Log likelihood: -58.32373 for error model
## ML residual variance (sigma squared): 349.09, (sigma: 18.684)
## Number of observations: 13 
## Number of parameters estimated: 5 
## AIC: 126.65, (AIC for lm: 130.82)

Population has a strongly positive effect on theft while population density exhibits a negative effect, suggest that denser district may benefit from stronger natural surveillance.

Result Interpretation

The spatial result indicate demographic factors are signicant in explaining district-level theft incidents in Taoyuan City. Population size shows a high correlation with theft counts, suggesting that districts with larger populations face more theft incidents. In contrast, population density is negatively associated with theft, implying that compact urban areas may benefit from stronger policing presence.

Conclusion

This study demonstrate a examination of theftpatterns in Taoyuan City by combining point hotsopts with spatial econometrics model. The Kernel density and LISA reveal clear and statistically significant spatial clustering, with theft incidents heavily concentrated in major urban districts such as Taoyuan, Zhongli, and Pingzhen. These localized hotspots confirming that theft does not occur randomly but gravitates toward dense activity centers.

However, both SAR and SEM models indicate no significant spatial dependence across districts, suggesting that hotspot formation does not spill over administrative boundaries. Instead, theft variation indicates districts with larger populations experience more theft, while higher population density is associated with lower theft levels.

Together, these findings illustrate how spatial analysis operates at multiple scales: point-level methods detect fine-grained clustering, while spatial econometrics clarifies whether such patterns persist at broader administrative levels. In Taoyuan’s case, theft is spatially clustered locally but not structurally interconnected across districts, highlighting the importance of scale in interpreting urban crime dynamics.