  html_document:
    theme: darkly
    highlight: tango
    toc: true
    toc_float:
      collapsed: false
      smooth_scroll: true
    toc_depth: 3
    code_folding: hide
    df_print: paged
    self_contained: true
---
Final Deliverable | Detroit, Michigan | 2019
- **Dataset:** Crime Open Database (CODE, abbreviated OCDB in this report), accessed via the `crimedata` R package
- **Source URL:** https://osf.io/zyaqn/
- **Time period:** January 1, 2019 to December 31, 2019
This analysis uses the Crime Open Database (CODE, abbreviated OCDB in this report), accessed through the `crimedata` R package developed by Matthew Ashby. The OCDB aggregates standardized, geocoded crime incident records from police departments in large U.S. cities, which makes it one of the most accessible open crime datasets for academic research.
Dataset URL: https://osf.io/zyaqn/
The specific dataset used here covers Detroit, Michigan for the full calendar year 2019, drawing from incident-level records reported by the Detroit Police Department (DPD). Detroit was selected due to its historically elevated crime rates relative to other U.S. cities, making it a meaningful subject for criminological prediction and pattern analysis.
The OCDB provides standardized variable documentation across all participating cities. Key variables relevant to this analysis include:
| Variable | Description |
|---|---|
| `offense_type` | Specific offense classification (e.g., aggravated assault) |
| `offense_group` | Broader offense category (e.g., assault offenses) |
| `offense_against` | Whether the offense is against a person, property, or society |
| `date_single` | Date and time of the incident |
| `latitude` / `longitude` | Geographic coordinates of the incident |
| `census_block` | U.S. Census block identifier for spatial aggregation |
The OCDB maps each city's local offense codes onto the FBI's National Incident-Based Reporting System (NIBRS) categories, which enables consistent cross-city comparisons. Full codebook documentation is available at the OSF repository linked above.
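To make the three-level hierarchy concrete, the sketch below builds a small hypothetical data frame (`toy_incidents`, invented for illustration and not drawn from the OCDB). It tabulates how each `offense_type` rolls up into an `offense_group` and an `offense_against` category. The category strings, including the assumed `offense_against` values `"persons"` and `"property"`, mimic NIBRS-style labels and are not a codebook extract.

```r
# Hypothetical toy incidents; values are illustrative, not real OCDB records.
# The offense_against labels ("persons", "property") are assumptions.
toy_incidents <- data.frame(
  offense_type = c("aggravated assault", "simple assault",
                   "burglary/breaking & entering", "motor vehicle theft",
                   "robbery", "simple assault"),
  offense_group = c("assault offenses", "assault offenses",
                    "burglary/breaking & entering", "motor vehicle theft",
                    "robbery", "assault offenses"),
  offense_against = c("persons", "persons", "property", "property",
                      "property", "persons"),
  stringsAsFactors = FALSE
)

# Incident counts at the broadest level of the hierarchy
against_counts <- table(toy_incidents$offense_against)
print(against_counts)

# Cross-tabulate specific offense types against their broader groups
type_by_group <- table(toy_incidents$offense_type, toy_incidents$offense_group)
print(type_by_group)

# Consistency check: each offense_type should belong to exactly one group
groups_per_type <- tapply(toy_incidents$offense_group,
                          toy_incidents$offense_type,
                          function(g) length(unique(g)))
stopifnot(all(groups_per_type == 1))
```

The same `table()` calls applied to the real Detroit extract would show which NIBRS-style types fall under each of the five target groups.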
The data originates from the Detroit Police Department's public crime incident reporting system and was aggregated into the OCDB by Ashby. Detroit publishes crime incidents on its open data portal, reflecting the broader commitment to government transparency embodied in Michigan's Freedom of Information Act (FOIA).
The OCDB project is hosted on the Open Science Framework (OSF), a free, open-source platform for research data sharing run by the non-profit Center for Open Science, which supports long-term accessibility and reproducibility.
Crime incident data is collected by law enforcement agencies for operational and legal purposes: dispatching officers, prosecuting offenses, and allocating departmental resources. Secondary academic use, as in this analysis, repurposes these administrative records to identify spatial and temporal crime patterns, test criminological theories, and inform evidence-based policy recommendations.
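```r
# Packages for data access, wrangling, mapping, and interactive tables
library(crimedata)
library(leaflet)
library(leaflet.extras)
library(dplyr)
library(RColorBrewer)
library(DT)

# Download the core OCDB records for Detroit, 2019
crimes_raw <- get_crime_data(
  years = 2019,
  cities = "Detroit",
  type = "core"
)

# Five high-impact offense groups analyzed in this report
target_offenses <- c(
  "assault offenses",
  "burglary/breaking & entering",
  "motor vehicle theft",
  "robbery",
  "homicide offenses"
)

# Full filtered dataset: geocoded incidents in the target offense groups
crimes <- crimes_raw |>
  filter(
    !is.na(longitude),
    !is.na(latitude),
    offense_group %in% target_offenses
  ) |>
  mutate(
    # Convert from factor so string functions return plain text
    offense_group = as.character(offense_group),
    # Capitalize the first letter for display labels
    offense_label = paste0(
      toupper(substring(offense_group, 1, 1)),
      substring(offense_group, 2)
    ),
    date_fmt = format(date_single, "%b %d, %Y")
  )

# Random sample of up to 8,000 rows for the interactive map only;
# `crimes` keeps the full filtered data for all other visualizations
set.seed(2019)
crimes_map <- crimes |>
  slice_sample(n = min(8000, nrow(crimes)))
```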
The raw dataset required the following preparation steps before analysis:
- **Filtering:** The full OCDB dataset for Detroit 2019 contains many offense categories. This analysis focuses on five high-impact violent and property crime categories (assault, burglary, motor vehicle theft, robbery, and homicide), which are the offenses most closely associated with public safety outcomes and predictive policing research.
- **Geocoding validation:** Incidents missing latitude or longitude coordinates were removed, because spatial visualization is a central component of this analysis. No records were excluded for missing coordinates.
- **Sampling:** To keep the interactive map responsive in the browser, a random sample of up to 8,000 incidents was drawn from the filtered dataset. All non-map visualizations use the full filtered dataset.
- **Variable recoding:** The `offense_group` variable was stored as a factor and was converted to character before string formatting functions were applied.
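The preparation steps can be made auditable with explicit bookkeeping. The base R sketch below runs on a hypothetical six-row table (`toy_raw`, not real OCDB data), where `map_cap = 3` stands in for the 8,000-row cap. It counts the records dropped for missing coordinates before applying the offense filter. It then draws a seeded sample into a separate object, so the full filtered data stays available for non-map work. The seed value is an arbitrary assumption.

```r
# Hypothetical toy extract standing in for crimes_raw (not real data)
toy_raw <- data.frame(
  offense_group = c("robbery", "assault offenses", "fraud offenses",
                    "motor vehicle theft", "robbery", "homicide offenses"),
  latitude  = c(42.33, 42.35, NA, 42.38, 42.40, 42.31),
  longitude = c(-83.05, -83.10, -83.00, NA, -83.12, -83.20),
  stringsAsFactors = FALSE
)

target_offenses <- c("assault offenses", "burglary/breaking & entering",
                     "motor vehicle theft", "robbery", "homicide offenses")

# Step 1: flag incidents with usable coordinates and count the exclusions
has_coords <- !is.na(toy_raw$latitude) & !is.na(toy_raw$longitude)
n_missing_coords <- sum(!has_coords)

# Step 2: keep geocoded incidents in the target offense groups
geocoded <- toy_raw[has_coords, ]
filtered <- geocoded[geocoded$offense_group %in% target_offenses, ]

# Step 3: reproducible map sample stored separately from the full data
set.seed(2019)
map_cap <- 3  # stand-in for the 8,000-row cap used for the real map
map_rows <- sample.int(nrow(filtered), size = min(map_cap, nrow(filtered)))
map_sample <- filtered[map_rows, ]

cat("Excluded for missing coordinates:", n_missing_coords, "\n")
cat("Rows after filtering:", nrow(filtered), "\n")
cat("Rows in map sample:", nrow(map_sample), "\n")
```

Counting exclusions explicitly lets the report state a computed figure for the geocoding step. The wording does not depend on the value happening to be zero.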
Overall, the dataset was relatively clean, with variable names and offense categories that are consistent across cities and years, a key advantage of the OCDB's standardized schema.
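Because the analysis relies on that standardized schema, a lightweight guard can confirm that the expected columns are present before any processing runs. A minimal base R sketch follows. `check_schema()` is a hypothetical helper, not part of `crimedata`. The required column list mirrors the variable table, and the two data frames are empty toy examples.

```r
# Hypothetical helper: stop early if any required column is missing
check_schema <- function(df, required) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Columns this analysis relies on, taken from the variable table
required_cols <- c("offense_type", "offense_group", "offense_against",
                   "date_single", "latitude", "longitude", "census_block")

# Toy zero-row data frames for illustration only
complete_df <- as.data.frame(
  setNames(lapply(required_cols, function(col) character(0)), required_cols)
)
incomplete_df <- complete_df[, setdiff(required_cols, "census_block"), drop = FALSE]

ok <- check_schema(complete_df, required_cols)
err <- tryCatch(check_schema(incomplete_df, required_cols),
                error = function(e) conditionMessage(e))
print(err)
```

In the real workflow, the same call could run on `crimes_raw` immediately after `get_crime_data()`. It would fail loudly if a future OCDB release renamed a column.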