1. Introduction

This project analyses the spatial distribution of crime incidents in New York City using spatial point pattern analysis. Each crime incident is represented as a point in space using its geographical coordinates, allowing the identification of spatial concentrations and distribution patterns across the city.

The main objective of the study is to compare crime patterns between 2012 and 2025 and investigate whether crime incidents are randomly distributed or clustered in specific urban areas. By comparing two different years, the project also aims to observe possible changes in spatial behaviour over time.

New York City represents an interesting case study due to its high population density, socio-economic diversity and large urban structure. Crime distribution in large metropolitan areas is often influenced by factors such as population concentration, commercial activity, transportation systems and tourism. Understanding these spatial dynamics may provide useful insights for urban planning and public safety policies.

Spatial point pattern analysis has become increasingly important in criminology and urban studies because it allows researchers to identify hotspots and detect clustering behaviour. In this project, several spatial statistical techniques are applied using R, including kernel density estimation, nearest neighbour distances and Ripley’s K function. These methods help evaluate whether crime incidents tend to concentrate in particular areas of the city.

The project combines spatial visualisation with quantitative statistical analysis in order to better understand the structure of crime distribution in New York City and how these patterns may evolve over time.

2. Literature review

The spatial analysis of crime has become an important topic in urban studies and criminology. Researchers increasingly use Geographic Information Systems (GIS) and spatial statistics to study how crime incidents are distributed across cities and whether they concentrate in specific urban areas.

Previous studies show that crime is not randomly distributed. Instead, criminal incidents usually form spatial clusters or “hotspots” related to factors such as population density, economic activity, tourism and transportation networks. Large cities such as New York often present higher crime concentrations in highly populated and commercially active areas.

Several methodologies have been developed to analyse these spatial patterns. Kernel density estimation is commonly used to identify areas with higher concentrations of crime incidents through density maps. This technique helps visualise hotspots and is widely applied in urban planning and public safety studies.

Other statistical tools, such as nearest neighbour distances and Ripley’s K function, are also frequently used in spatial point pattern analysis. Nearest neighbour analysis measures how close incidents are to each other, while Ripley’s K function evaluates whether the observed distribution is more clustered than a random spatial pattern. Many studies conclude that crime tends to show significant clustering behaviour.

Recent literature also highlights the importance of comparing crime distributions across different years in order to analyse changes in urban dynamics and socio-economic conditions. These comparisons help researchers better understand how crime patterns evolve over time.

A recent study entitled “Spatio-Temporal Analysis of Crime Patterns in New York City” also investigated the spatial distribution of crime incidents using NYPD data and spatial statistical analysis techniques. The research identified significant clustering behaviour and highlighted the existence of crime hotspots across different boroughs of New York City. This study is closely related to the present project, since both analyses focus on understanding urban crime patterns through spatial point pattern analysis and density-based methods.

Reference: Smith, J. (2025). “Spatio-Temporal Analysis of Crime Patterns in New York City”. arXiv. (https://arxiv.org/abs/2511.14789)

Overall, the literature supports the use of spatial point pattern analysis to study urban crime distribution. Building on these studies, the present project analyses crime incidents in New York City by comparing spatial patterns observed in 2012 and 2025 using statistical techniques implemented in R.

3. Packages

requiredPackages = c("sf", "spatstat", "ggplot2", "dplyr")

for(i in requiredPackages){
  if(!require(i, character.only = TRUE)) install.packages(i)
}

for(i in requiredPackages){
  library(i, character.only = TRUE)
}

Sys.setenv(LANG="en")

4. Filtering and preparing data for New York City

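The crime data come from the NYPD Complaint Data Historic dataset. Only observations with both latitude and longitude are used. Two separate requests are made, one for complaints dated in 2012 and one for complaints dated in 2025 (field cmplnt_fr_dt), and each request is capped at 700 records by the limit argument.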

# NYC borough map
nyc_map <- st_read(
  "https://raw.githubusercontent.com/dwillis/nyc-maps/master/boroughs.geojson",
  quiet = TRUE
)

nyc_map <- st_make_valid(nyc_map)

# Function to download crime data by year
download_crime_year <- function(year, limit = 700) {
  
  start_date <- paste0(year, "-01-01T00:00:00")
  end_date <- paste0(year, "-12-31T23:59:59")
  
  where_query <- paste0(
    "latitude IS NOT NULL AND longitude IS NOT NULL ",
    "AND cmplnt_fr_dt between '", start_date, "' and '", end_date, "'"
  )
  
  url <- paste0(
    "https://data.cityofnewyork.us/resource/qgea-i56i.csv?",
    "$limit=", limit,
    "&$select=cmplnt_fr_dt,ofns_desc,boro_nm,latitude,longitude",
    "&$where=", where_query
  )
  
  read.csv(URLencode(url))
}

crime_2012 <- download_crime_year(2012)
crime_2025 <- download_crime_year(2025)

nrow(crime_2012)
## [1] 700
nrow(crime_2025)
## [1] 700

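The dataset is downloaded directly from NYC Open Data. The variables used are the date of the complaint, type of offence, borough, latitude and longitude. The coordinates allow each crime incident to be treated as a spatial point. Because the query has no $order clause, the rows returned are whichever records the API serves first, not a random sample of the year.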
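The request can also be sketched as a standalone URL builder, which makes the query easy to inspect before anything is downloaded. The helper name build_crime_url and the optional $order clause are illustrative additions, not part of the original script, and cmplnt_num is assumed to be the dataset's complaint identifier column (check it against the dataset's column list before use).

```r
# Hypothetical helper: builds the query URL without downloading anything.
build_crime_url <- function(year, limit = 700, order_by = NULL) {
  where_query <- paste0(
    "latitude IS NOT NULL AND longitude IS NOT NULL ",
    "AND cmplnt_fr_dt between '", year, "-01-01T00:00:00' ",
    "and '", year, "-12-31T23:59:59'"
  )
  url <- paste0(
    "https://data.cityofnewyork.us/resource/qgea-i56i.csv?",
    "$limit=", limit,
    "&$select=cmplnt_fr_dt,ofns_desc,boro_nm,latitude,longitude",
    "&$where=", where_query
  )
  # Optional: a fixed sort order makes repeated downloads return the same rows.
  if (!is.null(order_by)) url <- paste0(url, "&$order=", order_by)
  URLencode(url)
}

url_2012 <- build_crime_url(2012, order_by = "cmplnt_num")  # column name assumed
print(url_2012)
```

A fixed sort order makes the sample reproducible, but it does not make it random; drawing a random subset would need a larger download followed by sampling in R.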

5. Creating spatial objects

Following the structure used in class, the data frame is converted into an sf object using st_as_sf(). Then, both the points and the New York City boundary are transformed into a projected coordinate system.

crime_2012_sf <- st_as_sf(
  crime_2012,
  coords = c("longitude", "latitude"),
  crs = 4326
)

crime_2025_sf <- st_as_sf(
  crime_2025,
  coords = c("longitude", "latitude"),
  crs = 4326
)

# Transform to a planar projection (EPSG:3857, Web Mercator; not distance-preserving)
nyc_map_proj <- st_transform(nyc_map, crs = 3857)
crime_2012_proj <- st_transform(crime_2012_sf, crs = 3857)
crime_2025_proj <- st_transform(crime_2025_sf, crs = 3857)

# Keep only points inside New York City
crime_2012_proj <- st_intersection(crime_2012_proj, nyc_map_proj)
crime_2025_proj <- st_intersection(crime_2025_proj, nyc_map_proj)

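This step follows the same logic used in class: the original coordinate system is geographic, so the data are transformed into a projected coordinate system before distance-based analysis. One caveat is that EPSG:3857 (Web Mercator) does not preserve distances. At New York's latitude (about 40.7° N) it stretches lengths by roughly 1/cos(40.7°) ≈ 1.32, so the distances and areas reported later are overstated. A local projection such as UTM zone 18N (EPSG:32618) would avoid this.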
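The effect of the projection choice on distances can be checked with a small sketch. The two test coordinates are arbitrary points at New York's latitude, and EPSG:32618 (UTM zone 18N) is used here only as an assumed alternative; the report itself works in EPSG:3857.

```r
library(sf)

# Two arbitrary points at the same latitude, 0.1 degrees of longitude apart.
pts <- st_sfc(st_point(c(-74.0, 40.7)), st_point(c(-73.9, 40.7)), crs = 4326)

# Planar (Euclidean) distance between them in each projected CRS, in metres.
d_mercator <- as.numeric(st_distance(st_transform(pts, 3857))[1, 2])
d_utm      <- as.numeric(st_distance(st_transform(pts, 32618))[1, 2])

# Ratio of the Web Mercator distance to the UTM distance.
ratio <- d_mercator / d_utm
print(c(mercator_m = d_mercator, utm_m = d_utm, ratio = ratio))
```

If the ratio is well above 1, distances measured in EPSG:3857 should be read as inflated by about that factor.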

6. Study area

plot(st_geometry(nyc_map_proj),
     main = "Study area: New York City",
     col = "gray90",
     border = "black")

The study area is New York City, formed by five boroughs: Manhattan, Brooklyn, Queens, The Bronx and Staten Island.

7. Visualisation of crime incidents

plot(st_geometry(nyc_map_proj),
     main = "Crime incidents in New York City, 2012",
     col = "white",
     border = "black")

plot(st_geometry(crime_2012_proj),
     add = TRUE,
     col = "blue",
     pch = 16,
     cex = 0.4)

plot(st_geometry(nyc_map_proj),
     main = "Crime incidents in New York City, 2025",
     col = "white",
     border = "black")

plot(st_geometry(crime_2025_proj),
     add = TRUE,
     col = "red",
     pch = 16,
     cex = 0.4)

Each point represents one reported crime incident. The maps provide a first visual comparison between the spatial distribution of crimes in 2012 and 2025.

8. Crime distribution by borough

crime_2012 <- crime_2012 %>%
  filter(!is.na(boro_nm), boro_nm != "(null)")

crime_2025 <- crime_2025 %>%
  filter(!is.na(boro_nm), boro_nm != "(null)")

boro_2012 <- crime_2012 %>%
  count(boro_nm)

boro_2025 <- crime_2025 %>%
  count(boro_nm)

ggplot(boro_2012,
       aes(x = reorder(boro_nm, n), y = n, fill = boro_nm)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(title = "Crime incidents by borough, 2012",
       x = "Borough",
       y = "Number of crimes") +
  theme_minimal() +
  theme(legend.position = "none")

ggplot(boro_2025,
       aes(x = reorder(boro_nm, n), y = n, fill = boro_nm)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(title = "Crime incidents by borough, 2025",
       x = "Borough",
       y = "Number of crimes") +
  theme_minimal() +
  theme(legend.position = "none")

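The bar charts summarise how the sampled incidents are split across boroughs in each year. Differences between boroughs may reflect population density, commercial activity, tourism and other urban characteristics. The counts come from a capped sample and are not normalised by population, so they are descriptive only.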
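Since the two samples need not contain the same number of usable rows, shares can be easier to compare than raw counts. The sketch below uses made-up toy counts and a hypothetical period label; in the report, boro_2012 and boro_2025 would take the place of the toy data frames.

```r
library(dplyr)

# Toy counts standing in for the borough tables (values are made up).
boro_a <- data.frame(boro_nm = c("BROOKLYN", "MANHATTAN", "QUEENS"), n = c(30, 50, 20))
boro_b <- data.frame(boro_nm = c("BROOKLYN", "MANHATTAN", "QUEENS"), n = c(45, 30, 25))

# Stack both tables and convert counts to within-period shares.
boro_shares <- bind_rows(
  mutate(boro_a, period = "period_a"),
  mutate(boro_b, period = "period_b")
) %>%
  group_by(period) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

print(boro_shares)
```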

9. Creating point pattern objects

The sf points are converted into ppp objects, following the approach used in class. The observation window is created from the New York City boundary using as.owin().

# Create observation window
W <- as.owin(nyc_map_proj)

# Extract coordinates
xy_2012 <- st_coordinates(crime_2012_proj)
xy_2025 <- st_coordinates(crime_2025_proj)

# Create ppp objects
ppp_2012 <- ppp(
  x = xy_2012[,1],
  y = xy_2012[,2],
  window = W
)

ppp_2025 <- ppp(
  x = xy_2025[,1],
  y = xy_2025[,2],
  window = W
)

# Remove duplicated points if necessary
ppp_2012 <- unique(ppp_2012)
ppp_2025 <- unique(ppp_2025)

ppp_2012
## Planar point pattern: 350 points
## window: polygonal boundary
## enclosing rectangle: [-8266095, -8204247] x [4938301, 4999891] units

ppp_2025
## Planar point pattern: 640 points
## window: polygonal boundary
## enclosing rectangle: [-8266095, -8204247] x [4938301, 4999891] units

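The ppp object is the main structure used in spatstat for point pattern analysis. It contains the coordinates of the events and the observation window. After clipping to the city boundary and removing duplicated locations, 350 of the 700 records remain for 2012 and 640 for 2025, so the two patterns differ in size. Note that unique() discards repeated coordinates, which for crime data are often distinct incidents geocoded to the same location.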
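To see where points are lost between the raw records and the final pattern, the two filters can be counted separately. This is a minimal sketch on a toy unit-square window with made-up coordinates. In the report, W and the columns of xy_2012 or xy_2025 would take their place.

```r
library(spatstat)

# Toy window and coordinates: one exact duplicate, one point outside the window.
W_toy <- owin(c(0, 1), c(0, 1))
x <- c(0.2, 0.2, 0.7, 1.5)
y <- c(0.3, 0.3, 0.4, 0.5)

# Count points falling outside the window before building the pattern.
inside    <- inside.owin(x, y, W_toy)
n_outside <- sum(!inside)

# Build the pattern from the inside points and count exact duplicates.
# (ppp() may warn about duplicated points; that is expected here.)
X_toy        <- ppp(x[inside], y[inside], window = W_toy)
n_duplicates <- sum(duplicated(X_toy))
X_unique     <- unique(X_toy)

print(c(outside = n_outside, duplicates = n_duplicates, kept = npoints(X_unique)))
```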

10. Rescaling

As in the class scripts, the point patterns are rescaled to make distances easier to interpret.

area.owin(W)
## [1] 1368038987

# Rescale from metres to kilometres
ppp_2012_km <- rescale(ppp_2012, 1000, "km")
ppp_2025_km <- rescale(ppp_2025, 1000, "km")

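Rescaling allows the analysis to be interpreted in kilometres instead of metres. The window area printed above is in square projected metres, that is, about 1,368 km². This figure is inflated by the Web Mercator projection; the city's land area is closer to 780 km².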
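What rescale() does to areas and distances can be illustrated on a toy pattern. The window size and coordinates below are made up for the example.

```r
library(spatstat)

# Toy window of 2000 m x 1000 m containing two points (coordinates in metres).
W_m <- owin(c(0, 2000), c(0, 1000))
X_m <- ppp(c(500, 1500), c(250, 750), window = W_m)

# Divide all coordinates by 1000 and label the new unit as km.
X_km <- rescale(X_m, 1000, "km")

area_km2 <- area.owin(Window(X_km))   # window area, now in km^2
d_km     <- pairdist(X_km)[1, 2]      # distance between the two points, in km

print(c(area_km2 = area_km2, d_km = d_km))
```

Areas shrink by a factor of 1000² and distances by a factor of 1000, so intensities estimated afterwards are expressed in points per km².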

11. Kernel density estimation

density_2012 <- density(ppp_2012_km, sigma = 1, dimyx = 512)

plot(density_2012,
     main = "Kernel density of crime incidents, 2012")

points(ppp_2012_km,
       pch = 16,
       cex = 0.2)

density_2025 <- density(ppp_2025_km, sigma = 1, dimyx = 512)

plot(density_2025,
     main = "Kernel density of crime incidents, 2025")

points(ppp_2025_km,
       pch = 16,
       cex = 0.2)

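Kernel density estimation highlights areas with higher concentrations of crime incidents, which can be interpreted as crime hotspots. A fixed Gaussian bandwidth of sigma = 1 km is used for both years. Because the two patterns contain different numbers of points and each plot uses its own colour scale, the maps are best compared in terms of where the peaks lie, not how high they are.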
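One way to make two density maps more comparable is to divide each surface by its number of points and plot both with a shared colour range. The sketch below uses simulated toy patterns on the unit square and an arbitrary bandwidth of 0.1. It also shows bw.diggle() as one data-driven bandwidth selector, purely for comparison with a fixed value.

```r
library(spatstat)
set.seed(1)

# Toy patterns of different sizes on the unit square (simulated, not crime data).
X_a <- rpoispp(100)
X_b <- rpoispp(300)

# A data-driven bandwidth suggestion, to compare with the fixed choice below.
sigma_a <- bw.diggle(X_a)

# Fixed-bandwidth densities, then divided by the number of points.
dens_a <- density(X_a, sigma = 0.1)
dens_b <- density(X_b, sigma = 0.1)
n_a <- npoints(X_a)
n_b <- npoints(X_b)
rel_a <- eval.im(dens_a / n_a)
rel_b <- eval.im(dens_b / n_b)

# Shared colour range so both maps use the same scale.
zl <- range(c(range(rel_a), range(rel_b)))
plot(rel_a, zlim = zl, main = "Relative density, pattern A (toy)")
plot(rel_b, zlim = zl, main = "Relative density, pattern B (toy)")
```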

12. Nearest neighbour distances

nn_2012 <- nndist(ppp_2012_km)
nn_2025 <- nndist(ppp_2025_km)

summary(nn_2012)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.02169 0.42615 0.66350 0.86054 1.12995 5.04776

summary(nn_2025)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.001107 0.282135 0.456584 0.578794 0.704876 3.886194

hist(nn_2012,
     main = "Nearest neighbour distances, 2012",
     xlab = "Distance in km",
     col = "lightblue")

hist(nn_2025,
     main = "Nearest neighbour distances, 2025",
     xlab = "Distance in km",
     col = "lightcoral")

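Nearest neighbour distances measure how close each crime incident is to its closest neighbouring event. For a given number of points in a given window, shorter distances suggest stronger clustering.

In 2025 the nearest neighbour distances are shorter than in 2012 (median 0.46 km against 0.66 km). However, the 2025 pattern contains 640 points and the 2012 pattern only 350 in the same window, and a denser pattern has shorter nearest neighbour distances even under complete spatial randomness, where the expected mean distance is 1/(2√λ) for intensity λ. The raw difference therefore cannot be read directly as stronger clustering in 2025.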
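A simple way to put the two years on a common footing is the Clark-Evans ratio, the observed mean nearest neighbour distance divided by its expectation 1/(2√λ) under complete spatial randomness. The sketch below uses only the numbers printed above (point counts, mean distances and window area) and applies no edge correction, so it is a rough check only. spatstat's clarkevans() offers edge-corrected versions.

```r
# Values copied from the printed output above.
area_km2 <- 1368038987 / 1e6                     # area.owin(W), m^2 -> km^2
n_pts    <- c(y2012 = 350, y2025 = 640)          # points in each ppp object
mean_nn  <- c(y2012 = 0.86054, y2025 = 0.578794) # mean NN distance, km

lambda      <- n_pts / area_km2        # intensity, points per km^2
expected_nn <- 1 / (2 * sqrt(lambda))  # mean NN distance under CSR
ce_ratio    <- mean_nn / expected_nn   # values below 1 suggest clustering

print(round(rbind(lambda, expected_nn, mean_nn, ce_ratio), 3))
```

With these inputs the ratio comes out at roughly 0.87 for 2012 and 0.79 for 2025. Both years look clustered relative to CSR, and the gap between them is much smaller than the raw distances suggest. Because distances and area are stretched consistently by the projection, the ratio is largely unaffected by the Web Mercator distortion.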

13. Ripley’s K function

K_2012 <- Kest(ppp_2012_km)

plot(K_2012,
     main = "Ripley's K function, 2012")

K_2025 <- Kest(ppp_2025_km)

plot(K_2025,
     main = "Ripley's K function, 2025")

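Ripley’s K function analyses clustering at different distance scales. If the observed curve lies above the theoretical curve for complete spatial randomness (CSR), there are more neighbours within that distance than a random pattern of the same intensity would produce, which suggests clustering.

In both years, the observed curve remains above the theoretical CSR curve across several distance scales, which is consistent with crime incidents forming spatial clusters within the city. Two caveats apply. No simulation envelopes were computed, so the plots alone do not show whether the departure from CSR is statistically significant. In addition, the CSR benchmark assumes a constant intensity over the whole city, so part of the excess may reflect the uneven distribution of people and activity, not interaction between incidents.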
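A pointwise simulation envelope is one common way to judge whether an observed K curve departs from CSR by more than chance. The sketch below runs on a simulated clustered toy pattern (X_demo, generated with rMatClust() using arbitrary parameters). In the report, ppp_2012_km or ppp_2025_km would take its place, and the number of simulations is an assumption that can be raised.

```r
library(spatstat)
set.seed(42)

# Toy clustered pattern on the unit square (simulated, not crime data).
X_demo <- rMatClust(kappa = 10, scale = 0.05, mu = 10)

# Simulate CSR patterns of matching intensity and record the K function range.
# With nsim = 39 the pointwise envelope corresponds to a 5% two-sided level.
K_env <- envelope(X_demo, Kest, nsim = 39, verbose = FALSE)

# Observed curve, theoretical CSR curve and the shaded envelope.
plot(K_env, main = "K function with CSR envelope (toy pattern)")
```

Where the observed curve leaves the shaded band, the departure from CSR at that distance is larger than in any of the simulated patterns. For city data, an inhomogeneous alternative such as Kinhom() may be a more realistic benchmark.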

14. Conclusions

This project analysed the spatial distribution of crime incidents in New York City using spatial point pattern analysis techniques implemented in R. By comparing crime patterns between 2012 and 2025, the study aimed to identify whether crime incidents were randomly distributed or spatially clustered across the city.

The maps and kernel density estimates suggest that crime incidents are not evenly distributed throughout New York City. Instead, several areas show higher concentrations of crime, forming spatial hotspots. These plausibly correspond to highly populated and economically active urban zones, although that association was not tested in this project.

The statistical techniques applied in this project provided additional evidence of spatial clustering. Nearest neighbour distance analysis indicated that many crime incidents occur relatively close to one another, while Ripley’s K function lay above the benchmark for a completely random pattern in both years. These results are consistent with the idea that crime tends to concentrate in specific urban areas, although the comparison between years is limited by the different number of points in each pattern.

From a socio-economic perspective, the spatial concentration of crime may be influenced by factors such as population density, tourism, transportation networks, commercial activity and social inequality. Therefore, spatial crime analysis can provide useful information for urban planning, crime prevention strategies and public safety policies.

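This study has several limitations. Only a small sample was analysed: 700 records were requested per year, and 350 (2012) and 640 (2025) remained after clipping to the city boundary and removing duplicated locations. The sample is not random, since it consists of the first records returned by the API. Distances were measured in Web Mercator, which overstates them at New York's latitude. In addition, the project focused mainly on spatial patterns and did not include explanatory socio-economic variables. Future research could incorporate larger datasets, a local projected coordinate system, simulation envelopes, temporal analysis and demographic or economic indicators in order to achieve a more detailed understanding of urban crime dynamics.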

Overall, this project demonstrates how spatial statistical methods can be effectively used to study crime distribution in large metropolitan areas. The combination of visualisation techniques and quantitative spatial analysis provides valuable insights into urban crime patterns and their evolution over time.