

Title Page

Detroit Crime Incident Analysis

CJS 310 — Crime Prediction with Open Data

Final Deliverable  |  Detroit, Michigan  |  2019


Dataset: Crime Open Database (CODE) via the crimedata R package
Source URL: https://osf.io/zyaqn/
Time Period: January 1, 2019 — December 31, 2019


Overview

1. Dataset Description

This analysis uses the Crime Open Database (CODE), accessed through the crimedata R package developed by criminologist Dr. Matthew Ashby. CODE aggregates standardized, geocoded crime incident data from police departments in several large U.S. cities, making it one of the most accessible open crime datasets available for academic research.

Dataset URL: https://osf.io/zyaqn/

The specific extract used here covers Detroit, Michigan, for the full calendar year 2019 and is drawn from incident-level records reported by the Detroit Police Department (DPD). Detroit was selected because of its historically high crime rates relative to other large U.S. cities, which make it a meaningful subject for criminological prediction and pattern analysis.

2. Codebooks & Variable Documentation

The database provides standardized variable documentation across all participating cities. Key variables relevant to this analysis include:

| Variable | Description |
|---|---|
| offense_type | Specific offense classification (e.g., aggravated assault) |
| offense_group | Broader offense category (e.g., assault offenses) |
| offense_against | Whether the offense is against a person, property, or society |
| date_single | Date and time of the incident |
| latitude / longitude | Geographic coordinates of the incident |
| census_block | U.S. Census block identifier for spatial aggregation |

The database harmonizes offense classifications to the categories of the FBI's National Incident-Based Reporting System (NIBRS), enabling consistent cross-city comparisons. Full codebook documentation is available at the OSF repository linked above.
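The three offense variables form a nested hierarchy: each offense_type belongs to an offense_group, and each group is classed by offense_against. The base-R sketch below illustrates this on a few hypothetical rows. The data frame `incidents` and all of its values, including the `persons`/`property` labels used for offense_against, are assumptions for illustration, not real Detroit records or guaranteed codebook values.

```r
# Hypothetical rows mimicking the offense hierarchy
# (offense_type within offense_group within offense_against).
# Values are illustrative only, not real Detroit records.
incidents <- data.frame(
  offense_type    = c("aggravated assault", "simple assault",
                      "robbery", "motor vehicle theft"),
  offense_group   = c("assault offenses", "assault offenses",
                      "robbery", "motor vehicle theft"),
  offense_against = c("persons", "persons", "property", "property"),
  stringsAsFactors = TRUE
)

# Count incidents in each broad "offense against" category
against_counts <- table(incidents$offense_against)
print(against_counts)

# List the specific offense types nested under each offense group
types_by_group <- tapply(
  as.character(incidents$offense_type),
  incidents$offense_group,
  function(x) paste(sort(unique(x)), collapse = ", ")
)
print(types_by_group)
```

Filtering on offense_group, as this report does, therefore keeps every offense_type nested under the selected groups.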

3. Data Origin & Collection

The data originates from the Detroit Police Department's public crime incident records, which were aggregated into the database by Matthew Ashby. The City of Detroit publishes these incidents on its open data portal as part of its broader government transparency efforts.

The database is hosted on the Open Science Framework (OSF), a free, open-source research platform run by the non-profit Center for Open Science, which supports long-term accessibility and reproducibility.

4. Purpose of Data Collection

Crime incident data is collected by law enforcement agencies for operational and legal purposes: dispatching officers, prosecuting offenses, and allocating departmental resources. Secondary academic use, as in this analysis, repurposes these administrative records to identify spatial and temporal crime patterns, test criminological theories, and inform evidence-based policy recommendations.


Body

1. Data Preparation & Cleaning

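# Packages used throughout the report
library(crimedata)      # open police-recorded crime data for large U.S. cities
library(leaflet)        # interactive maps
library(leaflet.extras) # leaflet add-ons (e.g., heatmaps)
library(dplyr)          # data manipulation
library(RColorBrewer)   # color palettes
library(DT)             # interactive tables

# Download the core variables for Detroit, 2019
crimes_raw <- get_crime_data(
  years  = 2019,
  cities = "Detroit",
  type   = "core"
)

# Offense groups analyzed in this report (labels as they appear in offense_group)
target_offenses <- c(
  "assault offenses",
  "burglary/breaking & entering",
  "motor vehicle theft",
  "robbery",
  "homicide offenses"
)

# Full filtered dataset: used for all non-map visualizations
crimes <- crimes_raw |>
  filter(
    !is.na(longitude),
    !is.na(latitude),
    offense_group %in% target_offenses
  ) |>
  mutate(
    offense_group = as.character(offense_group),
    # Capitalize the first letter for display, e.g. "Assault offenses"
    offense_label = paste0(
      toupper(substring(offense_group, 1, 1)),
      substring(offense_group, 2)
    ),
    date_fmt = format(date_single, "%b %d, %Y")
  )

# Map-only random sample of up to 8,000 incidents. The cap is based on the
# filtered data rather than crimes_raw, and the (arbitrary) seed makes the
# sample reproducible each time the report is knit.
set.seed(310)
crimes_map <- crimes |>
  slice_sample(n = min(8000, nrow(crimes)))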

The raw dataset required the following preparation steps before analysis:

Filtering: The full Detroit 2019 dataset contains many offense categories. This analysis focuses on five high-impact violent and property offense groups (assault, burglary, motor vehicle theft, robbery, and homicide), which are the offenses most closely associated with public safety outcomes and predictive policing research.

Geocoding validation: Incidents missing latitude or longitude coordinates were removed, because spatial visualization is a central component of this analysis. In this extract, 0 records were excluded for missing coordinates.

Sampling: For browser performance in the interactive map, a random sample of up to 8,000 incidents was drawn from the filtered dataset. All non-map visualizations use the full filtered dataset.

Variable recoding: The offense_group variable is stored as a factor and was converted to character before string-formatting functions were used to build the capitalized offense_label display variable.
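The four preparation steps can be sketched in base R on a hypothetical toy table, `toy_raw`, so the logic can be checked without downloading any data. The column names follow the codebook above; the rows, the `fraud offenses` value, and `max_map_points` (standing in for the 8,000-incident cap) are made up for illustration. `n_missing_coords` shows one way a count like the one reported under Geocoding validation could be computed.

```r
# Hypothetical toy data standing in for crimes_raw (not real records)
toy_raw <- data.frame(
  offense_group = factor(c("assault offenses", "robbery", "fraud offenses",
                           "homicide offenses", "assault offenses")),
  longitude = c(-83.05, NA, -83.10, -83.02, -83.07),
  latitude  = c(42.33, 42.35, NA, 42.38, 42.31)
)

target_offenses <- c("assault offenses", "burglary/breaking & entering",
                     "motor vehicle theft", "robbery", "homicide offenses")

# %in% compares the factor's labels against the character vector
in_target  <- toy_raw$offense_group %in% target_offenses
has_coords <- !is.na(toy_raw$longitude) & !is.na(toy_raw$latitude)

# Geocoding validation: target-offense records dropped for missing coordinates
n_missing_coords <- sum(in_target & !has_coords)
print(n_missing_coords)

# Filtering: keep target offenses with valid coordinates (the full dataset)
toy <- toy_raw[in_target & has_coords, ]

# Variable recoding: factor -> character, then capitalize the first letter
toy$offense_group <- as.character(toy$offense_group)
toy$offense_label <- paste0(toupper(substring(toy$offense_group, 1, 1)),
                            substring(toy$offense_group, 2))

# Sampling: map-only subset of at most max_map_points rows, drawn from the
# filtered data (not the raw data); the full `toy` table is left untouched
max_map_points <- 2  # hypothetical stand-in for the report's 8,000
set.seed(1)
toy_map <- toy[sample.int(nrow(toy), min(max_map_points, nrow(toy))), ]
print(toy_map)
```

Drawing the map sample from the filtered rows, and storing it under a separate name, keeps the full filtered table available for the non-map visualizations.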

Overall, the dataset was relatively clean, with consistent variable names and offense categories, which is a key advantage of the database's standardized schema.
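Claims about data cleanliness and coverage can also be checked programmatically. The sketch below defines a hypothetical helper, `check_incidents()` (not part of crimedata), that reports which codebook columns are missing and counts dates falling outside the study year; it is run here on a made-up two-row table, `sample_incidents`.

```r
# Hypothetical helper (not part of crimedata): checks that an incident table
# has the expected codebook columns and that dates fall within one year.
check_incidents <- function(df, year,
                            required = c("offense_type", "offense_group",
                                         "offense_against", "date_single",
                                         "longitude", "latitude",
                                         "census_block")) {
  missing_cols <- setdiff(required, names(df))
  has_dates <- "date_single" %in% names(df)
  # Year of each incident, using the time zone stored in date_single
  years <- if (has_dates) as.integer(format(df$date_single, "%Y")) else integer(0)
  list(
    missing_cols  = missing_cols,
    n_out_of_year = sum(years != year, na.rm = TRUE),
    n_na_dates    = if (has_dates) sum(is.na(df$date_single)) else NA_integer_
  )
}

# Made-up rows: one 2019 incident, one just past the end of 2019,
# and no census_block column
sample_incidents <- data.frame(
  offense_type    = c("robbery", "aggravated assault"),
  offense_group   = c("robbery", "assault offenses"),
  offense_against = c("property", "persons"),
  date_single     = as.POSIXct(c("2019-03-14 21:30:00", "2020-01-01 00:15:00"),
                               tz = "UTC"),
  longitude       = c(-83.05, -83.10),
  latitude        = c(42.33, 42.36)
)

result <- check_incidents(sample_incidents, year = 2019)
str(result)
```

On the real data, a call such as `check_incidents(crimes_raw, year = 2019)` would apply the same checks. Because the year is extracted with `format()`, the result depends on the time zone stored in `date_single`, so incidents recorded near midnight on December 31 or January 1 may be worth inspecting by hand.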