Research Question: “Can we identify the ‘Hidden Logic’ of criminal operations in Los Angeles by analyzing the non-random associations between environment, method, and weaponry?”
Core Objective: To transition from traditional patrolling to Context-Aware Dispatching. By applying the Apriori Algorithm, we treat each incident as a “behavioral transaction.” Our goal is to empower dispatchers with the ability to predict weapon presence or specific criminal signatures the moment a location is reported, long before an officer arrives at the scene.
This dataset contains 50,000 crime incident reports from the Los Angeles Police Department (LAPD) covering January 2020 to 2025. Each record includes detailed information about crime types, locations, victim demographics, and case outcomes.
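For this project, we prioritize categorical variables to discover qualitative links: police area (`AREA NAME`), crime description (`Crm Cd Desc`), premise (`Premis Desc`), and weapon (`Weapon Desc`).
To keep plot labels readable, we apply a cleaning function that shortens the longest technical descriptions and truncates everything else to 22 characters.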
library(kableExtra)
library(arules)
library(arulesViz)
library(tidyverse)
library(stringr)
library(DT)
data <- read_csv("Crime_Data_from_2020_to_Present.csv")
# Label cleaning for scannable visuals
clean_labels <- function(x) {
  # Replace the longest premise, weapon, and crime descriptions with short aliases
  x <- str_replace_all(x, "MULTI-UNIT DWELLING \\(APARTMENT, DUPLEX, ETC\\)", "Apt/Duplex")
  x <- str_replace_all(x, "SINGLE FAMILY DWELLING", "Single House")
  x <- str_replace_all(x, "STRONG-ARM \\(HANDS, FIST, FEET OR BODILY FORCE\\)", "Strong-Arm")
  x <- str_replace_all(x, "UNKNOWN WEAPON/OTHER WEAPON", "Other Weapon")
  x <- str_replace_all(x, "INTIMATE PARTNER - SIMPLE ASSAULT", "Domestic Assault")
  # Cap every remaining label at 22 characters (ellipsis included);
  # labels that share their opening characters may collapse into one
  x <- str_trunc(x, 22, "right")
  return(x)
}
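arules_data <- data %>%
  select(`AREA NAME`, `Crm Cd Desc`, `Premis Desc`, `Weapon Desc`) %>%
  mutate(across(everything(), ~clean_labels(as.character(.)))) %>%
  mutate(across(everything(), as.factor)) %>%
  na.omit()  # drops every incident with a missing value in any of the four columns

Because `na.omit()` discards incomplete rows, the mining step works on the 16,225 complete incidents reported in the Apriori log, not the full 50,000.
Before mining rules, we must understand the distribution of criminal attributes.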
ggplot(data, aes(x = `TIME OCC`)) +
  geom_histogram(bins = 24, fill = "firebrick", color = "white") +
  theme_minimal() +
  labs(title = "Distribution of Crime by Time of Day", x = "Military Time", y = "Count")

The distribution is bimodal. The first peak occurs around noon (12:00), likely corresponding to daytime residential and commercial thefts. The larger rise begins at 18:00 (6:00 PM), and activity remains elevated through the late evening.
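The histogram above bins raw HHMM values, so 24 bins spread over 0 to 2359 do not line up exactly with clock hours. A minimal sketch of one way to align them, assuming `TIME OCC` holds military-time values such as 1830 (the sample vector and the names `time_occ` and `hour_occ` are hypothetical):

```r
# Hypothetical sample of HHMM values; the real column may be numeric or character
time_occ <- c("0030", "1200", "1830", "2359")

# Integer division by 100 keeps the hour and discards the minutes
hour_occ <- as.integer(time_occ) %/% 100L

print(hour_occ)
```

With an hour column like this, a bar chart or a histogram with a bin width of 1 would give one bar per hour of the day.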
This plot shows the most frequent “items” in our crime transactions.
crime_trans <- as(arules_data, "transactions")
par(mar = c(12, 5, 4, 2))
itemFrequencyPlot(crime_trans, topN = 10, type = "relative",
                  col = "steelblue", main = "Top 10 Criminal Attributes",
                  cex.names = 0.8, las = 2)

The prevalence plot shows that “Street” and “Single House” are the dominant premises, while “Strong-Arm” is the most frequent weapon attribute.
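The call that produced the mining log that follows is not shown. Below is a sketch consistent with the printed parameters (support 0.005, confidence 0.5, minlen 2), run on a small hypothetical data frame rather than `crime_trans`; the names `toy`, `toy_trans`, and `toy_rules` are illustrative only.

```r
library(arules)

# Hypothetical stand-in for arules_data: factor columns with no missing values
toy <- data.frame(
  premise = factor(c("Apt/Duplex", "Apt/Duplex", "STREET",
                     "Apt/Duplex", "STREET", "STREET")),
  weapon  = factor(c("Strong-Arm", "Strong-Arm", "HAND GUN",
                     "Strong-Arm", "HAND GUN", "Strong-Arm"))
)

# Each row becomes one transaction; each column=value pair becomes one item
toy_trans <- as(toy, "transactions")

# Thresholds copied from the parameter block of the log
toy_rules <- apriori(
  toy_trans,
  parameter = list(support = 0.005, confidence = 0.5, minlen = 2),
  control   = list(verbose = FALSE)
)

inspect(sort(toy_rules, by = "lift"))
```

On the real data, the equivalent call would presumably take `crime_trans` as its first argument and assign the result to `rules`, the object that the later chunks filter.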
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.005 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 81
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[363 item(s), 16225 transaction(s)] done [0.00s].
## sorting and recoding items ... [80 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [132 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
significant_rules <- subset(rules, confidence > 0.5 & lift > 1.2)
rules_sorted <- sort(significant_rules, by = "lift")
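inspectDT(head(rules_sorted, 50))

Interpretation:
* The Pacific Anomaly: A confidence of 1.0 at Pacific transportation facilities means that every incident matching the rule's antecedent also matched its consequent, across at least 81 incidents (the minimum support count in the log above). Such an absolute localized pattern is rare and demands immediate tactical investigation.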
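To make the `confidence` and `lift` thresholds concrete, the measures can be computed by hand for a single rule. This is a base R sketch on hypothetical toy incidents, not values taken from the LAPD data:

```r
# Hypothetical toy incidents (not drawn from the LAPD data)
incidents <- data.frame(
  premise = c("Apt/Duplex", "Apt/Duplex", "STREET",
              "Apt/Duplex", "STREET", "STREET"),
  weapon  = c("Strong-Arm", "Strong-Arm", "HAND GUN",
              "Strong-Arm", "HAND GUN", "Strong-Arm")
)

# Rule under test: {premise = Apt/Duplex} => {weapon = Strong-Arm}
lhs <- incidents$premise == "Apt/Duplex"
rhs <- incidents$weapon == "Strong-Arm"

count      <- sum(lhs & rhs)        # incidents matching both sides
support    <- mean(lhs & rhs)       # share of all incidents matching both sides
confidence <- count / sum(lhs)      # share of LHS incidents that also match the RHS
lift       <- confidence / mean(rhs) # confidence relative to the RHS base rate

print(c(count = count, support = support, confidence = confidence, lift = lift))
```

In this toy data the perfect confidence rests on only three incidents, which is why it helps to read confidence together with the count or support column.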
The scatterplot reveals the “Behavioral Signatures” (high lift) versus common occurrences.
The scatterplot displays a high concentration of rules with Lift > 2.0, indicated by the darker shading. Most rules cluster in the low-support, high-confidence zone.
Grouping rules by similar antecedents lets us see “clusters of risk.”
# Exclude rules whose antecedent mentions STREET, then keep the strongest associations
filtered_rules <- subset(rules, !(lhs %pin% "STREET") & lift > 1.5)
top_rules <- head(sort(filtered_rules, by = "lift"), 50)
plot(top_rules, method = "grouped", control = list(k = 10))

The grouped matrix reveals several “Risk Anchors.” Notably, the antecedent group around “Apt/Duplex” and “Single House” shows a dark circle, indicating high lift, linked to “Domestic Assault” and “Strong-Arm” tactics.
The network graph visualizes the “Hubs” of criminal activity.
The network graph highlights “Strong-Arm” and “Domestic Assault” as the primary Central Hubs of the criminal ecosystem in Los Angeles. These hubs are heavily fed by residential premises.
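One simple way to check which items act as hubs is to count how many rules each item appears in, which is its degree in the rule graph. The sketch below uses a hypothetical, hand-written rule list (`toy_rules` is not the output of the analysis above):

```r
# Hypothetical rule list written out by hand for illustration
toy_rules <- data.frame(
  lhs = c("Apt/Duplex", "Single House", "Apt/Duplex", "STREET"),
  rhs = c("Strong-Arm", "Strong-Arm", "Domestic Assault", "HAND GUN"),
  stringsAsFactors = FALSE
)

# Count how many rules mention each item on either side
degree <- sort(table(c(toy_rules$lhs, toy_rules$rhs)), decreasing = TRUE)

print(degree)
```

Items with the highest counts are the candidates for hubs; on the mined rules, the same tally would need the antecedent and consequent labels extracted from the rule object first.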
By mastering the Modus Operandi through Association Rules, the LAPD can shift from a reactive force to a proactive intelligence agency. We no longer just know where crime happens; we now understand the logic of how it unfolds.