1. Introduction

Research Question: “Can we identify the ‘Hidden Logic’ of criminal operations in Los Angeles by analyzing the non-random associations between environment, method, and weaponry?”

Core Objective: To transition from traditional patrolling to Context-Aware Dispatching. By applying the Apriori Algorithm, we treat each incident as a “behavioral transaction.” Our goal is to let dispatchers anticipate the likely weapon type or a specific criminal signature the moment a location is reported, long before an officer arrives at the scene.

2. Data Source & Overview

This dataset contains 50,000 crime incident reports from the Los Angeles Police Department (LAPD) covering January 2020 to 2025. Each record includes detailed information about crime types, locations, victim demographics, and case outcomes.

Primary Source: data.lacity.org
Dataset Curator: Hammad Zafar (2025)
Access Date: December 2025
Kaggle Link: Crime Data Set

2.1. Data Dictionary for Association Rules

For this project, we prioritize Categorical Variables to discover qualitative links:

AREA NAME: The 21 geographic patrol divisions.
Crm Cd Desc: Description of the crime committed.
Premis Desc: The type of structure or location where the crime took place.
Weapon Desc: Type of weapon involved.

2.2. Loading and Cleaning Data

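To keep plot labels readable, we apply a cleaning function that renames a few long categories and truncates every description to 22 characters. Rows with a missing value in any of the four selected columns are then dropped, so the mined rules describe only incidents with a recorded weapon.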

library(kableExtra)
library(arules)
library(arulesViz)
library(tidyverse)
library(stringr)
library(DT)

data <- read_csv("Crime_Data_from_2020_to_Present.csv")

# Shorten long category labels so they stay readable on plot axes
clean_labels <- function(x) {
  x <- str_replace_all(x, "MULTI-UNIT DWELLING \\(APARTMENT, DUPLEX, ETC\\)", "Apt/Duplex")
  x <- str_replace_all(x, "SINGLE FAMILY DWELLING", "Single House")
  x <- str_replace_all(x, "STRONG-ARM \\(HANDS, FIST, FEET OR BODILY FORCE\\)", "Strong-Arm")
  x <- str_replace_all(x, "UNKNOWN WEAPON/OTHER WEAPON", "Other Weapon")
  x <- str_replace_all(x, "INTIMATE PARTNER - SIMPLE ASSAULT", "Domestic Assault")
  # Cap every label at 22 characters (str_trunc appends "..." when it cuts)
  x <- str_trunc(x, 22, "right")
  return(x)
}

# Keep the four categorical columns, clean their labels, and convert to factors.
# na.omit() then drops any incident with a missing value in one of these columns.
arules_data <- data %>%
  select(`AREA NAME`, `Crm Cd Desc`, `Premis Desc`, `Weapon Desc`) %>%
  mutate(across(everything(), ~clean_labels(as.character(.)))) %>%
  mutate(across(everything(), as.factor)) %>%
  na.omit()
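
Truncating labels to a fixed width can merge categories that share a long prefix, which would silently change the items Apriori sees. The sketch below is one way to check for such collisions. The three labels are illustrative strings chosen for the demonstration, and `collisions` is a hypothetical helper name; in practice the same check would be run on the unique values of each column before truncation.

```r
library(stringr)

# Illustrative labels (assumed for this sketch, not read from the dataset)
labels <- c("ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT",
            "ASSAULT WITH DEADLY WEAPON ON POLICE OFFICER",
            "BURGLARY")

# Same truncation as in clean_labels()
short <- str_trunc(labels, 22, "right")

# Count how many distinct original labels map to each truncated label
collisions <- tapply(labels, short, function(v) length(unique(v)))

# Truncated labels that now stand for more than one original category
collisions[collisions > 1]
```

If this check returns any entries on the real columns, a wider truncation width or an explicit rename in `clean_labels` would keep the categories distinct.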

3. Exploratory Data Analysis (EDA)

Before mining rules, we must understand the distribution of criminal attributes.

3.1. Crime Occurrence by Time (Histogram)

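# TIME OCC is assumed to be recorded as HHMM (e.g., 1830), so extract the hour before binning
data %>%
  mutate(hour = as.integer(`TIME OCC`) %/% 100) %>%
  ggplot(aes(x = hour)) +
  geom_histogram(binwidth = 1, boundary = 0, closed = "left", fill = "firebrick", color = "white") +
  theme_minimal() +
  labs(title = "Distribution of Crime by Time of Day", x = "Hour of Day (24-hour clock)", y = "Count")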

The distribution is bimodal. The first peak occurs around noon (12:00), likely corresponding to daytime residential and commercial thefts. The larger rise begins at 18:00 (6:00 PM) and remains elevated through the late evening.

Operational Insight: This “Evening Surge” suggests that a disproportionate share of incidents occurs after dark or during the transition from work to home, which supports increased patrol visibility during the 18:00–22:00 window.

3.2. Top 10 Attribute Prevalence

This plot shows the most frequent “items” in our crime transactions.

crime_trans <- as(arules_data, "transactions")
par(mar = c(12, 5, 4, 2))
itemFrequencyPlot(crime_trans, topN = 10, type = "relative", 
                  col = "steelblue", main = "Top 10 Criminal Attributes",
                  cex.names = 0.8, las = 2)

The prevalence plot shows that “Street” and “Single House” are the dominant premises, while “Strong-Arm” is the most frequent weapon attribute.

Critical Context: While these items have high Support, they represent the “baseline” of urban crime. The challenge for the Apriori algorithm is to move beyond these commonalities to find high-Lift associations—identifying when a specific premise (like a parking lot) significantly increases the probability of a specific crime type beyond this baseline frequency.
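
To make the distinction between Support, Confidence, and Lift concrete, the three measures can be computed by hand for a single rule. The sketch below uses a small hypothetical set of incidents (the counts are invented for illustration and are not LAPD figures) and evaluates the rule {PARKING LOT} => {VEHICLE THEFT}.

```r
# Hypothetical toy incidents, not LAPD data
toy <- data.frame(
  premis = c(rep("PARKING LOT", 20), rep("STREET", 80)),
  crime  = c(rep("VEHICLE THEFT", 15), rep("ROBBERY", 5),
             rep("VEHICLE THEFT", 15), rep("ROBBERY", 65))
)

n   <- nrow(toy)
lhs <- toy$premis == "PARKING LOT"    # antecedent holds
rhs <- toy$crime  == "VEHICLE THEFT"  # consequent holds

support    <- sum(lhs & rhs) / n          # share of all incidents matching the whole rule
confidence <- sum(lhs & rhs) / sum(lhs)   # share of parking-lot incidents that are vehicle thefts
baseline   <- sum(rhs) / n                # share of vehicle thefts overall
lift       <- confidence / baseline       # how much the premise raises that share

c(support = support, confidence = confidence, lift = lift)
```

In this toy example the rule has modest support (0.15) but a lift of 2.5: vehicle theft is two and a half times as common in parking lots as in the data overall, which is the kind of signal the baseline frequencies alone would hide.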

4. Association Rule Mining

rules <- apriori(crime_trans, parameter = list(supp = 0.005, conf = 0.5, minlen = 2))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5   0.005      2
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 81 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[363 item(s), 16225 transaction(s)] done [0.00s].
## sorting and recoding items ... [80 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [132 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

significant_rules <- subset(rules, confidence > 0.5 & lift > 1.2)
rules_sorted <- sort(significant_rules, by = "lift")
inspectDT(head(rules_sorted, 50))

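Interpretation:

The Pacific Anomaly: A Confidence of 1.0 for Transportation Facilities in the Pacific division means that every matching incident in the sample followed the pattern. Because a rule can qualify with as few as 81 incidents (the minimum support count above), this rare finding warrants immediate tactical investigation rather than being read as a guarantee.

Residential Verbal Nexus: The Lift of ~13 for Verbal Threats in Houses/Apartments indicates that this combination occurs roughly 13 times more often than it would if premise and crime type were independent, far exceeding the city average. This is a strong association; it does not by itself show that the premises cause the behavior.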
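
High confidence or lift on a small number of incidents can arise by chance, so it may help to look at absolute counts and a significance test alongside the rule table. The sketch below is one possible check, run on hypothetical toy data rather than on `crime_trans`; the object names `toy`, `toy_trans`, and `toy_rules` are assumptions for illustration. It uses `is.significant()` from arules with Fisher's exact test and a Bonferroni correction.

```r
library(arules)

# Hypothetical toy incidents, not LAPD data
toy <- data.frame(
  premis = factor(c(rep("Apt/Duplex", 40), rep("STREET", 60))),
  crime  = factor(c(rep("Domestic Assault", 30), rep("Robbery", 10),
                    rep("Domestic Assault", 5),  rep("Robbery", 55)))
)
toy_trans <- as(toy, "transactions")

toy_rules <- apriori(toy_trans,
                     parameter = list(supp = 0.05, conf = 0.5, minlen = 2),
                     control = list(verbose = FALSE))

# Absolute number of incidents behind each rule
n_incidents <- round(quality(toy_rules)$support * length(toy_trans))

# Fisher's exact test per rule, corrected for multiple comparisons
sig <- is.significant(toy_rules, toy_trans,
                      method = "fisher", alpha = 0.01, adjust = "bonferroni")

cbind(quality(toy_rules)[, c("support", "confidence", "lift")],
      n_incidents = n_incidents, significant = sig)
```

Applied to the real rules, the same two columns would show whether a headline rule rests on hundreds of incidents or only on the minimum of 81.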

5. Visualizing the Logic of Criminal Behavior

5.1. The Support-Confidence Landscape

The scatterplot reveals the “Behavioral Signatures” (high lift) versus common occurrences.

plot(rules, method = "scatterplot", engine = "plotly")

The scatterplot displays a high concentration of rules with Lift > 2.0, indicated by the darker shading. Most rules cluster in the low-support, high-confidence zone.

Operational Insight: This suggests that LA crime is driven by Specialized Patterns rather than generic ones. We are not just seeing random events; we are seeing specific “criminal recipes” that, while they may not happen everywhere (low support), are highly predictable (high confidence) once the initial conditions are met.

5.2. Structural Clustering (Grouped Matrix)

As seen in high-level association analyses, grouping antecedents allows us to see “clusters of risk.”

filtered_rules <- subset(rules, !(lhs %pin% "STREET") & lift > 1.5)
top_rules <- head(sort(filtered_rules, by = "lift"), 50)
plot(top_rules, method = "grouped", control = list(k = 10))

The Grouped Matrix reveals several “Risk Anchors.” Notably, the cluster around “Apt/Duplex” and “Single House” shows a large, dark bubble linked to “Domestic Assault” and “Strong-Arm” tactics.

Strategic Insight: This grouping identifies a clear Residential Violence Fingerprint. In this plot, bubble size encodes support and color encodes lift, so the large bubbles indicate that residential premises account for a large share of physical altercations. Conversely, the smaller, high-lift bubbles associated with commercial premises point to rarer but more distinctive patterns, such as shoplifting or specialized thefts, which may reflect more professional, premeditated “transactions.”

5.3. Modus Operandi Interconnectivity (Network Graph)

The network graph visualizes the “Hubs” of criminal activity.

plot(head(rules_sorted, 20), method = "graph", engine = "htmlwidget")

The network graph highlights “Strong-Arm” and “Domestic Assault” as the primary Central Hubs of the criminal ecosystem in Los Angeles. These hubs are heavily fed by residential premises.

Operational Intelligence: The arrows pointing from “Apt/Duplex” to “Domestic Assault” represent a high-probability behavioral path. Interestingly, “Other Weapon” and “Verbal Threats” form a separate sub-network, suggesting a different Modus Operandi for intimidation-based crimes compared to physical force crimes. This allows dispatchers to differentiate between “Physical Force Risks” and “Intimidation Risks” based on the reported environment.

6. Conclusion & Strategic Recommendations

6.1. Strategic Insights

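The Multiplier Effect: Rules with Lift > 10 (like the Verbal Threat nexus) show that the outcome is more than 10 times as likely in that environment as in the data overall, an increase of more than 900% over the baseline. This is a strong statistical association, not a certainty, and it should be communicated to field officers as such.
Predictive Certainty: The 100% confidence rules in the Pacific Division show that data-driven policing can surface near-deterministic patterns for targeted resource allocation, provided the number of supporting incidents is large enough to trust.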
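
Two quick calculations help calibrate these insights. The sketch below first converts a lift value into a percentage increase over the baseline, then computes an exact binomial confidence interval for a rule that held in every observed case. The count of 81 is an assumption taken from the minimum support count in the Apriori output; the actual counts behind the Pacific rules are not reported here. `lift_to_pct_increase` is a hypothetical helper name.

```r
# Lift is a ratio to the baseline, so the percentage increase is (lift - 1) * 100
lift_to_pct_increase <- function(lift) (lift - 1) * 100
lift_to_pct_increase(c(10, 13))

# Hypothetical rule that held in 81 of 81 matching incidents
# (81 is the minimum support count; the real count may be higher)
ci <- binom.test(x = 81, n = 81)$conf.int

# Lower and upper 95% bounds on the true confidence of the rule
ci
```

Under this assumption the lower bound sits a little above 0.95, so a sample confidence of 100% is compatible with a true rate that is high but not perfect.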

6.2. Tactical Roadmap (The 3 Pillars)

Context-Aware Alerting: Integrate the “Residential-Verbal” rule into the Dispatch system. When a call comes from an Apt/Duplex, the system should trigger a “Verbal Escalation Warning” based on the 92% confidence level.
Specialized Deployment: The Pacific Division should prioritize “Transportation Facility” surveillance, as the data shows an absolute link to specific reported incidents in that area.
Environmental Deterrence: Focus on “Risk Anchors.” Since residential areas are hubs for verbal-threat escalations, community-based conflict de-escalation programs should be centered in these high-support zones.
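
As a rough illustration of Context-Aware Alerting, the lookup a dispatch system would need can be sketched as a filter over a rule table keyed by premise. Everything in the example is hypothetical: `rule_table`, `alerts_for`, the `min_lift` threshold, and the numeric values are placeholders, not results from this analysis. In practice the table could be exported from the mined rules.

```r
# Hypothetical rule table; the values are placeholders, not results from this analysis
rule_table <- data.frame(
  lhs = c("Premis Desc=Apt/Duplex", "Premis Desc=Apt/Duplex", "Premis Desc=STREET"),
  rhs = c("Crm Cd Desc=Domestic Assault", "Weapon Desc=Strong-Arm", "Weapon Desc=Other Weapon"),
  confidence = c(0.60, 0.70, 0.55),
  lift = c(2.1, 1.4, 1.1),
  stringsAsFactors = FALSE
)

# Return the consequents worth flagging for a reported premise, strongest lift first
alerts_for <- function(premise, rules, min_lift = 1.2) {
  hits <- rules[rules$lhs == paste0("Premis Desc=", premise) & rules$lift >= min_lift, ]
  hits[order(-hits$lift), c("rhs", "confidence", "lift")]
}

alerts_for("Apt/Duplex", rule_table)
```

Filtering on lift as well as on the premise keeps baseline associations (lift near 1) from triggering warnings on every call.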

6.3. Final Thought

By mastering the Modus Operandi through Association Rules, the LAPD can shift from a reactive force to a proactive intelligence agency. We no longer just know where crime happens; we now understand the logic of how it unfolds.

Association Rule Mining for Urban Crime Patterns

Thi Yen Nhi Pham

2026-02-25