Exploring the Spatial Distribution of Carjackings in Rochester, New York During the 2023 Carjacking Surge

Cody Longbotham | The Pennsylvania State University | GEOG 588

Abstract

Rochester, New York, experienced a sharp increase in carjackings in 2023. City of Rochester policy and decision-makers need to understand how these incidents were distributed across the city during the surge. This research investigates spatial relationships in the 2023 carjacking data to uncover spatial characteristics that can support decisions aimed at reducing and mitigating crime in Rochester. By employing exploratory spatial data analysis, this study uses spatial autocorrelation to find outliers and areas that share distribution properties.

Introduction

The city of Rochester, NY, witnessed a staggering 240% increase in carjackings from 2022 to 2023, solidifying its position as one of the nation’s leading carjacking and theft hotspots (Hernández, 2024). Understanding the geographical distribution and underlying patterns of these incidents is crucial for developing effective crime prevention strategies.

Exploratory spatial data analysis (ESDA) has proven valuable in uncovering trends within crime data (Messner et al., 1999). Messner et al. (1999) used ESDA to explore county homicide rates, applying cluster analysis to uncover spatial patterns and understand the distribution of events. This research proposes the use of spatial autocorrelation techniques, an ESDA tool, to analyze carjackings in Rochester at the census tract level and explore the spatial distribution of the data. This exploratory approach will identify clustering and outliers in the spatial characteristics of carjackings throughout the city.

By leveraging Rochester’s open data portals, which provide access to crime data over several years, this study aims to establish meaningful spatial patterns in the crime data to support the city of Rochester in its efforts to mitigate carjackings. Using ESDA to examine the spatial distribution of the city’s carjacking focus areas will yield insights beyond what raw data alone can provide. The spatial analysis will show which areas experienced the highest concentrations of carjackings, potentially leading to a deeper understanding of the factors driving this localized increase.

This research paper looks to understand:

How does the spatial distribution of carjackings in Rochester, NY during 2023 vary at the census tract level?

Data

This research relies primarily on the R packages “sf” and “rgeoda”. Together they can create a LISA cluster map, which is explained further in the Methodology section.

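library(sf) # reads shapefiles and handles spatial data
library(rgeoda) # supports spatial autocorrelation methods

library(ggplot2) # plotting, including geom_sf() for maps
library(tidyverse) # data import and wrangling (readr, dplyr, tidyr)
library(here) # builds file paths relative to the project root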

Two datasets are used for this research. The first is point data recording crime events in 2023, specifically carjackings. The second is the 2020 Rochester Census Tracts, which provides the spatial boundaries used to aggregate the crime data. The primary data source is Rochester’s open data repository, which includes the Rochester Police Department (RPD) Crime data 2011 to Present and the 2020 Rochester Census Tracts (RPD, n.d.). The City of Rochester stores the crime data in a wide format. It contains crime records from 2011 to the present, along with several additional variables, including spatial properties.

crime2011present <- read_csv(here("Rochester_2011Present_Crime.csv"))

The primary variables used for this research are the spatial properties, report date, and statute text (crime type). Additionally, the statute description variable provides details of the crime. Some robberies involve motor vehicles but are not categorized as carjackings. For this study, carjackings refer to “motor vehicle theft” as captured in the Statute Text variable, along with any records whose statute description mentions a motor vehicle. For example, a robbery in which a motor vehicle was taken is classified as a carjacking for this research.

The 2020 Census Tract data breaks down the city of Rochester into 111 different observations. With unique identifiers for each census tract and distinct boundaries, it is easy to manipulate and visualize crime data distribution throughout the city. The data will be imported as a shapefile that contains spatial properties while encompassing polygons for each observation (City of Rochester, 2020).

rochester_blocks <- st_read(here("RochesterBlocks.shp"), quiet = TRUE)

Two steps are central to working with the two datasets: joining them spatially and counting the number of carjackings within each census tract. Both are required before any spatial autocorrelation can be conducted. Combining the datasets requires manipulating both the shapefile and the csv file. Furthermore, creating different visualizations is essential to exploring the data throughout the research area. Figure 1 highlights the steps from data manipulation through to visualization.

Figure 1: Flowchart highlighting the methodology for the research. The two datasets, crime and census tracts, require manipulation prior to visualization and spatial autocorrelation process. This flowchart highlights the key areas that need manipulation to achieve the desired results.

Methodology

To explore the carjacking data throughout 2023 with ESDA, this research will utilize local indicators of spatial association (LISA) spatial autocorrelation methodology. The LISA method will be utilized over several steps:

Data preparation and manipulation.
Identifying clusters and outliers by analyzing local spatial autocorrelation.
Interpreting LISA spatial patterns using the High-High / High-Low visualizations.

Data Preparation and Manipulation

Data preparation will take a holistic look at the crime data. Although the data incorporates reported crimes over the last 13 years, this research will only explore the spike of crime in 2023. Tidying up the data to capture only 2023 will lower the overall observation count. Furthermore, removing unnecessary variables will simplify the interpretation of the data while only leveraging the necessary information to portray the spatial patterns using the LISA model accurately.

The RPD crime data is a large dataset with 128,066 observations and 40 variables. Tidying the data simplifies the manipulation needed for the spatial methods and visualization. The data is filtered on the primary variables mentioned previously: filtering by crime type, year, and statute description reduces the crime dataset to approximately 11,760 observations.

# filter for only 2023 crime
crime2023 <- filter(crime2011present, OccurredFrom_Date_Year == "2023")

# filter 2023 crime for only motor vehicle crime
motorcrime2023 <- filter(crime2023, grepl("Motor", Statute_Description))
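The effect of the two filters can be illustrated on a small made-up table. This is only a sketch: the column names match those used above, but the rows and statute strings are invented, and `ignore.case = TRUE` is an optional addition that assumes the capitalization of the statute text may vary.

```r
library(dplyr)

# made-up rows standing in for the RPD crime table (all values are hypothetical)
toy_crime <- data.frame(
  OccurredFrom_Date_Year = c(2022, 2023, 2023, 2023),
  Statute_Description = c("Motor Vehicle Theft", "Motor Vehicle Theft",
                          "Burglary", "ROBBERY - MOTOR VEHICLE")
)

# keep 2023 records whose description mentions a motor vehicle, in any letter case
toy_motor2023 <- toy_crime %>%
  filter(OccurredFrom_Date_Year == 2023,
         grepl("motor", Statute_Description, ignore.case = TRUE))

nrow(toy_motor2023)  # 2 rows remain
```

A case-sensitive pattern such as "Motor" would have missed the all-caps row in this toy table, which is why the letter case of the real statute text is worth checking.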

Although deleting the other variables would clean up the data, it is not strictly necessary. Future research may need variables that add depth to the data, including case numbers, times, addresses, location type, and police beat numbers. Since the spatial techniques use only a couple of variables after the manipulation, the remaining variables can be kept intact.

In addition to the filtering, it is important to understand the contents of each variable. For instance, the coordinate variables (X, Y) do not have values for every observation. This causes issues when trying to join and spatially represent each carjacking. To ensure that the data is clean and every observation has a spatial value, we can remove the incomplete cases.

# remove rows with a missing X or Y coordinate so that a spatial (sf) object can be created
motorcrime2023_na <- drop_na(motorcrime2023, X, Y)

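In this dataset, other variables contain NAs only in observations where the coordinates are also NA. Given that, removing the observations with missing coordinates should leave no other missing values behind, which can be confirmed by checking for complete cases.

# counts the rows that still contain at least one NA (0 means every row is complete)
sum(!complete.cases(motorcrime2023_na))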
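Before dropping rows, it can help to see how many values are missing in each coordinate column. The following is a minimal base R sketch on made-up coordinates; the data frame `toy` and its values are hypothetical and only mimic the X and Y columns of the crime data.

```r
# made-up coordinates with some missing values (hypothetical data)
toy <- data.frame(
  X = c(-77.61, NA, -77.59, -77.60),
  Y = c(43.16, NA, NA, 43.17)
)

# number of missing values in each coordinate column
colSums(is.na(toy[, c("X", "Y")]))  # X: 1, Y: 2

# keep only rows where both coordinates are present
toy_complete <- toy[complete.cases(toy[, c("X", "Y")]), ]
nrow(toy_complete)  # 2
```

In this toy table one row is missing only Y, so checking a single coordinate column would not be enough to guarantee that every remaining row can be mapped.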

Once the crime dataset is tidy and filtered to meet the carjacking definition, we can work towards a spatial join with the Rochester census tracts. To join both spatial datasets, they must use the same coordinate reference system. Here, EPSG:4326 (WGS 84) is used for both.

# builds point geometry from the X and Y columns and declares the coordinate reference system as 4326
crime2023sf <- st_as_sf(motorcrime2023_na, coords = c("X", "Y"), crs = 4326)

# ensures the rochester blocks are using 4326
rochester_blocks2 <- st_transform(rochester_blocks, 4326)
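Whether two layers really share a coordinate reference system can be checked directly with `st_crs()`. The sketch below uses a made-up point and polygon near Rochester; the object names and coordinates are hypothetical, and EPSG:26918 (NAD83 / UTM zone 18N) is chosen only to stand in for a layer delivered in a different projection. Note that the `crs` argument of `st_as_sf()` only declares what the coordinates already are, while `st_transform()` reprojects them.

```r
library(sf)

# a made-up point declared as longitude/latitude (EPSG:4326)
pts <- st_as_sf(data.frame(X = -77.61, Y = 43.16),
                coords = c("X", "Y"), crs = 4326)

# a made-up polygon around the point, also in EPSG:4326
ring <- rbind(c(-77.7, 43.1), c(-77.5, 43.1), c(-77.5, 43.3),
              c(-77.7, 43.3), c(-77.7, 43.1))
poly <- st_sfc(st_polygon(list(ring)), crs = 4326)

# simulate a layer that arrives in a projected system
poly_utm <- st_transform(poly, 26918)
st_crs(pts) == st_crs(poly_utm)   # FALSE: the layers do not match yet

# reproject to the CRS of the points before joining
poly_back <- st_transform(poly_utm, st_crs(pts))
st_crs(pts) == st_crs(poly_back)  # TRUE
```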

The spatial join between the two datasets will enable the spatial relationship between the census tracts and the location of the carjackings. Carjackings within a census tract can further be categorized and associated with the census tract to understand the level of activity within that area. This will enable the spatial autocorrelation methodology while determining which tracts have a spatial relationship with neighboring areas based on the number of events.

# spatial join
rochesterblockjoin <- st_join(rochester_blocks2, crime2023sf)

Once the data is joined, the result is still a spatial (sf) object. To add a count field, it first needs to be converted to a regular data frame. Applying the st_drop_geometry function to rochesterblockjoin removes the geometry column and returns a plain data frame.

rochesterblockjoinst <- st_drop_geometry(rochesterblockjoin)

The new rochesterblockjoinst is now ready for counting the number of occurrences in each census tract. The variable “TRACT_FIPS” is the unique identifier for the census tracts. Because the data is in a long format, we can group it by “TRACT_FIPS” and count how many carjackings fall within each tract.

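# group by TRACT_FIPS and count the carjackings matched to each tract
# (st_join keeps a tract with no match as a single NA row, so only rows with a crime record are counted)
crimesum <- rochesterblockjoinst %>% 
  group_by(TRACT_FIPS) %>% 
  summarize(crime_count = sum(!is.na(Statute_Description)))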
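One detail worth checking at this step is that `st_join()` performs a left join by default, so a polygon with no matching points still appears as one row in the joined table. The toy sketch below, built from made-up squares and points (all names and coordinates are hypothetical), shows this behavior and a simple cross-check that counts points per polygon with `lengths(st_intersects())`.

```r
library(sf)

# helper that builds a unit square whose lower-left corner is at (x0, 0)
sq <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1),
                        c(x0, 1), c(x0, 0))))
}

# two made-up "tracts": A contains three points, B contains none
tracts <- st_sf(TRACT_FIPS = c("A", "B"),
                geometry = st_sfc(sq(0), sq(2)))
pts <- st_sf(id = 1:3,
             geometry = st_sfc(st_point(c(0.2, 0.2)),
                               st_point(c(0.5, 0.5)),
                               st_point(c(0.8, 0.3))))

# the left join returns 4 rows: three matches for A and one NA row for B
joined <- st_join(tracts, pts)
nrow(joined)  # 4

# cross-check: number of points intersecting each polygon
tracts$crime_count <- lengths(st_intersects(tracts, pts))
tracts$crime_count  # 3 0
```

Counting rows per tract in `joined` would give tract B a count of 1, whereas the intersection count gives 0, so comparing the two is a quick way to validate the tract totals.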

Once the data is counted, we will have no other use for the rochesterblockjoinst dataset. We can now add the count field to the previous rochester_blocks2 file with a left_join function.

# join the count back onto the Rochester census tract polygons
rochestermap <- rochester_blocks2 %>% 
  left_join(crimesum, by = "TRACT_FIPS")

The new rochestermap will now be the primary dataset moving forward with the analysis.

Identifying Clusters and Outliers

The counts of carjackings within the census tracts can now be visualized in several different ways. One quick way to confirm that the counts were assigned correctly to each census tract is to create a choropleth map. This map shades each census tract along a color gradient based on the carjacking count stored in the crime_count variable.

ggplot(data = rochestermap) +
  geom_sf(aes(fill = crime_count)) + 
  theme_void()

Figure 2: Choropleth map using the crime_count variable to display the results. The choropleth map is an easy way to confirm that the data manipulation was successful.
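A slightly more annotated version of the choropleth can make the check easier to read. The sketch below is one possible variant, shown on a made-up three-polygon layer rather than the Rochester tracts; the object `toy_map`, the legend name, and the title are illustrative assumptions.

```r
library(sf)
library(ggplot2)

# helper that builds a unit square whose lower-left corner is at (x0, 0)
sq <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1),
                        c(x0, 1), c(x0, 0))))
}

# made-up polygons with hypothetical counts
toy_map <- st_sf(crime_count = c(3, 0, 12),
                 geometry = st_sfc(sq(0), sq(1), sq(2)))

# choropleth with a continuous color scale, a legend name, and a title
p <- ggplot(data = toy_map) +
  geom_sf(aes(fill = crime_count), color = "grey30") +
  scale_fill_viridis_c(name = "Carjackings") +
  labs(title = "Carjackings per tract (toy data)") +
  theme_void()

p
```

Replacing `toy_map` with the joined tract layer would apply the same styling to the real counts.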

LISA is a statistical method based on Moran’s I that identifies clusters and spatial outliers within the data. It compares each spatial unit with its neighbors and classifies the resulting relationships and patterns into different categories (Blanford et al., n.d.).

Determining whether the data is clustered or randomly dispersed helps manage expectations before conducting the LISA analysis. A high-low clustering report examines several values tied to the crime data to help determine whether the null hypothesis can be rejected. The null hypothesis for LISA Moran’s I states that the data has no significant clustering, while the alternative hypothesis states that there is significant clustering within the data. This is used initially to understand the depth of the cluster analysis (Anselin, 2020). Figure 3 shows that there is a low likelihood (less than 1%) that the data is randomly distributed.

Figure 3: High-Low Cluster Report. The High-Low cluster report, produced in ArcGIS Pro, shows a bell curve graph depicting the likelihood that the crime data is not randomly dispersed.
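The logic of testing against a null hypothesis of spatial randomness can also be sketched in base R. The example below is not the ArcGIS Pro High-Low report used for Figure 3; it swaps in a global Moran’s I with a permutation test, applied to a made-up 4 x 4 grid of counts. All values and object names are hypothetical.

```r
# made-up 4 x 4 grid of counts, stored row by row, with high values in one corner
counts <- c(9, 8, 1, 0,
            8, 7, 1, 1,
            1, 1, 0, 0,
            0, 1, 0, 1)
n_side <- 4
row_id <- rep(1:n_side, each = n_side)
col_id <- rep(1:n_side, times = n_side)

# queen contiguity: cells whose row and column indices both differ by at most 1
w <- (abs(outer(row_id, row_id, "-")) <= 1 &
      abs(outer(col_id, col_id, "-")) <= 1) * 1
diag(w) <- 0  # a cell is not its own neighbor

# global Moran's I: (n / sum of weights) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2)
morans_i <- function(x, w) {
  z <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

observed <- morans_i(counts, w)

# permutation test: shuffle the counts across cells to simulate spatial randomness
set.seed(588)
permuted <- replicate(999, morans_i(sample(counts), w))
p_value <- (sum(permuted >= observed) + 1) / (999 + 1)

observed
p_value
```

A small pseudo p-value means that very few random rearrangements of the same counts produce a statistic as large as the observed one, which is the sense in which the null hypothesis of no clustering is rejected.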

LISA focuses on individual spatial units and identifies whether each one belongs to a cluster or is an outlier. Data preparation is an important aspect of the methodology: without the right attribute data for each spatial unit, the spatial relationships cannot be visualized. For this research, the attribute is the number of carjackings in each census tract. A spatial weights matrix defines which spatial units count as neighbors of one another. Once the weights are defined, the value of each tract is compared to the values of its neighbors. There are several types of spatial weights; this research uses the queen weighted matrix, one of the most common choices.

# creates the weights for the LISA
queen_w <- queen_weights(rochestermap)

Queen Weighted Matrix - treats as neighbors all surrounding units that share an edge or a common vertex.

Rook Weighted Matrix - treats as neighbors only the units that share a common edge (Anselin, 2020).
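The difference between the two definitions is easiest to see on a regular grid. The base R sketch below builds both neighbor matrices by hand for a made-up 3 x 3 grid (it does not use rgeoda) and counts the neighbors of each cell.

```r
# cells of a 3 x 3 grid, numbered row by row
row_id <- rep(1:3, each = 3)
col_id <- rep(1:3, times = 3)
d_row <- abs(outer(row_id, row_id, "-"))
d_col <- abs(outer(col_id, col_id, "-"))

# queen: shares an edge or a corner (row and column each differ by at most 1)
queen <- (d_row <= 1 & d_col <= 1) * 1
diag(queen) <- 0

# rook: shares an edge only (exactly one step up, down, left, or right)
rook <- (d_row + d_col == 1) * 1

rowSums(queen)  # 3 5 3 5 8 5 3 5 3
rowSums(rook)   # 2 3 2 3 4 3 2 3 2
```

The center cell has eight queen neighbors but only four rook neighbors, so queen weights generally connect each tract to more surrounding tracts than rook weights do.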

Creating the LISA with rgeoda is as easy as one line of code. What adds depth into the process are the functions that fall within rgeoda such as the colors, label, and clustering methods. These can easily be added to the final visualization plot for interpretation.

# selects the crime_count variable from rochestermap
mapcrimecounts <- rochestermap["crime_count"]

# computes the local Moran (LISA) statistics using the queen weights
lisa <- local_moran(queen_w, mapcrimecounts)

# extracts the colors, labels, and cluster assignments used for the LISA map
lisa_colors <- lisa_colors(lisa)
lisa_labels <- lisa_labels(lisa)
lisa_clusters <- lisa_clusters(lisa)

Interpreting LISA Spatial Patterns Using the High-High / High-Low Visualizations

The LISA results are grouped into five categories based on each tract’s value and the values of its neighbors.

High-High Cluster: high counts of carjackings surrounded by neighbors with high counts as well.
High-Low Outlier: high counts of carjackings surrounded by neighbors with low counts.
Low-High Outlier: low counts of carjackings surrounded by neighbors with high counts.
Low-Low Cluster: low counts of carjackings surrounded by neighbors with low counts.
Non-significant: tracts whose relationship with their neighbors is not statistically distinguishable from a random arrangement.
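The four directional categories correspond to the quadrants of a Moran scatterplot: the sign of a tract’s standardized value against the sign of the average standardized value of its neighbors. The base R sketch below illustrates only that sign-based classification on a made-up 3 x 3 grid; it is not the rgeoda implementation and it omits the significance test that decides which tracts end up labeled non-significant.

```r
# made-up 3 x 3 grid of counts, stored row by row (hypothetical values)
counts <- c(9, 8, 1,
            8, 7, 0,
            1, 0, 6)
row_id <- rep(1:3, each = 3)
col_id <- rep(1:3, times = 3)

# queen contiguity, then row-standardize so each row of weights sums to 1
w <- (abs(outer(row_id, row_id, "-")) <= 1 &
      abs(outer(col_id, col_id, "-")) <= 1) * 1
diag(w) <- 0
w <- w / rowSums(w)

# standardized value of each cell and the weighted average of its neighbors
z <- (counts - mean(counts)) / sd(counts)
lag_z <- as.vector(w %*% z)

# quadrant: own value (high or low) versus neighbors' average (high or low)
quadrant <- ifelse(z >= 0 & lag_z >= 0, "High-High",
            ifelse(z < 0 & lag_z < 0, "Low-Low",
            ifelse(z >= 0, "High-Low", "Low-High")))

data.frame(counts, z = round(z, 2), lag_z = round(lag_z, 2), quadrant)
```

In this toy grid the top-left cell falls in the High-High quadrant, while the bottom-right cell, a high count next to mostly low counts, falls in the High-Low quadrant.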

LISA’s spatial pattern analysis is used here to characterize the 2023 carjackings in Rochester. The method can also assist in measuring changes in patterns using the high-high, low-low, high-low, and low-high categories. The rgeoda package takes the carjacking counts joined to the city tract data, categorizes the spatial distribution throughout the city, and returns the LISA results.

With the Local Moran’s I LISA created with rgeoda, the results can now be displayed using a basic plot function together with the other rgeoda helper functions. The LISA colors, labels, and clusters add depth to the basic plot map.

# plot function uses the rochester map to identify the lisa_clusters with lisa_colors
plot(st_geometry(rochestermap), 
     col=sapply(lisa_clusters, function(x){return(lisa_colors[[x+1]])}), 
     border = "#333333", lwd=0.2)
# adds a title and a legend built from the rgeoda labels and colors
title(main = "Local Moran Cluster Map for 2023 Rochester Crime")
legend('bottomleft', legend = lisa_labels, fill = lisa_colors)

Figure 4: Local Moran LISA Cluster Map for 2023. The LISA cluster map highlights significant and non-significant areas based on the crime_count variable derived from the crime data and census tract polygons.

Results

The results highlight areas in Rochester that have high numbers of carjackings. The aim is to show areas whose values are strongly positively or negatively associated with those of the neighboring tracts (Blanford et al., n.d.). It is anticipated that some otherwise low-crime areas will show higher crime counts in 2023. These areas are of interest and should be researched further to understand what external factors influence the increase in crime. Based on initial visualizations, one census tract is classified as high-low. This tract is just as noteworthy as the two tracts classified as low-high. Exploring LISA cluster maps with different filters and weights may provide different perspectives on these tracts.

Figure 5: LISA Cluster Map compared to Choropleth Map. The LISA Cluster adds a layer of depth that is difficult to discern with a general choropleth map.

Figure 5 compares the LISA cluster map to the choropleth map of the crime count. From the choropleth map alone, it is not easy to determine which areas have high- or low-crime neighbors. The LISA color scheme draws the reader to the center of Rochester and shows where most carjackings occur. The pattern of high-high carjackings is somewhat noticeable on the choropleth map. Still, the additional detail in the LISA map adds depth and supports a clearer understanding of the end results.

At the census tract level, the incidence of carjacking varies significantly. Notably, tracts within central Rochester have high carjacking counts and are surrounded by neighbors with high counts as well. While the number of high-low and low-high areas is relatively low, certain census tracts demand attention. For instance, the single high-low census tract is intriguing because it features high carjacking counts amid low-count neighbors. This anomaly is a potential area for further research aimed at understanding why this specific census tract experiences more carjackings than its neighbors. A similar approach could be taken for the two low-high areas, which also present unique characteristics.
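Counts of tracts per category, such as those cited above, can be tabulated from the cluster codes and labels. The sketch below uses made-up cluster codes rather than the Rochester results, and it assumes the label order matches rgeoda’s default ordering, in which code 0 is "Not significant"; the vectors `lisa_clusters_toy` and `lisa_labels_toy` are hypothetical stand-ins for the objects returned by rgeoda.

```r
# labels assumed to follow rgeoda's default ordering (code 0 = first label)
lisa_labels_toy <- c("Not significant", "High-High", "Low-Low",
                     "Low-High", "High-Low", "Undefined", "Isolated")

# made-up cluster codes for ten tracts (hypothetical values)
lisa_clusters_toy <- c(0, 1, 1, 0, 2, 3, 0, 4, 1, 3)

# convert the codes to labeled categories and count the tracts in each
cluster_factor <- factor(lisa_clusters_toy,
                         levels = seq_along(lisa_labels_toy) - 1,
                         labels = lisa_labels_toy)
table(cluster_factor)
```

With the real objects, the same two lines would report how many census tracts fall into each LISA category, including categories with zero tracts.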

Reflection

During the research there were several aspects that went well, while other aspects could see improvement.

Went well

Available data and level of detail. With hundreds to thousands of observations throughout Rochester, NY, the data added a lot of value.
The Rochester Census Tracts provided boundaries that encompassed the crime data in a manageable format.
The application of the Local Moran’s I LISA Cluster was intuitive and supported the research question with the available data.

Could use improvement

The additional analysis could be improved by adding a layer of detail that explains the why behind the patterns in the data.
Comparing the 2023 data to historical data such as 2022 would show which census tracts changed over the carjacking surge.
Applying different weight matrices to explore other LISA configurations could provide alternative analyses of the 2023 carjacking data.

References

Anselin, L. (2020). Local Spatial Autocorrelation. GeoDa. An Introduction to Spatial Data Science. https://geodacenter.github.io/workbook/6a_local_auto/lab6a.html

Anselin, L. (2020). GeoDa. Contiguity-Based Spatial Weights. https://geodacenter.github.io/workbook/4a_contig_weights/lab4a.html#spatial-weights---basic-concepts

Blanford, J., Kessler, F., Griffin, A., & O’Sullivan, D. (n.d.). Project 4: Local indicators of spatial association. GEOG 586 Geographic Information Analysis. https://www.e-education.psu.edu/geog586/node/673

City of Rochester. (2020). 2020 Census Tracts in Rochester, NY Web Map. https://data.cityofrochester.gov/maps/5ac4da20bb814f63b0180d970588e787/explore?location=43.188244%2C-77.596982%2C12.11

Hernández, A. (2024, February 9). Car thefts and carjackings are up. Unreliable data makes it hard to pinpoint why. Stateline. https://stateline.org/2024/02/09/car-thefts-and-carjackings-are-up-unreliable-data-makes-it-hard-to-pinpoint-why/

Messner, S. F., Anselin, L., Baller, R. D., Hawkins, D. F., Deane, G., & Tolnay, S. E. (1999). The Spatial Patterning of County Homicide Rates: An Application of Exploratory Spatial Data Analysis. Journal of Quantitative Criminology, 15(4), 423–450. http://www.jstor.org/stable/23366751

Rochester Police Department (RPD). (n.d.). RPD - Part I Crime 2011 to Present. https://data-rpdny.opendata.arcgis.com/datasets/rpdny::rpd-part-i-crime-2011-to-present/about

Tan, S.-Y., & Haining, R. (2009). An urban study of crime and health using an exploratory spatial data analysis approach. Computational Science and Its Applications – ICCSA 2009, 269–284. https://doi.org/10.1007/978-3-642-02454-2_19