This report analyzes arrest data from the Baltimore Police Department, focusing on crime trends and demographic patterns since 2020. The dataset contains over 370,000 records with 21 variables that provide detailed information on arrests, such as arrest date and time, incident offense, charge descriptions, district, neighborhood, and geographic coordinates. Some fields, particularly district, neighborhood, and location data, contain missing values; district, for example, is blank in 160,239 of the 379,853 records (about 42%), which limits certain geographic analyses.
By examining this dataset, the analysis aims to explore overall crime frequency, gender differences in offenses, and spatial distribution trends over time. Descriptive statistics and various visualizations are used to highlight key insights into post-COVID crime patterns in Baltimore.
This section provides an overview of the dataset, including its structure, missing values, key variables, and trends in arrests. The summaries below are split into numeric and categorical variables to briefly identify common patterns in the Baltimore arrest data.
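Since missing values affect several of the fields used later, it can help to count them per column before plotting. The following is a minimal sketch, not part of the original analysis: `count_missing` is a hypothetical helper, and it assumes that missing text values appear either as `NA` or as empty strings. The toy data frame is for illustration only.

```r
# Hypothetical helper: counts values that are NA or, for text columns,
# empty or whitespace-only strings.
count_missing <- function(data) {
  sapply(data, function(col) {
    if (is.character(col)) {
      sum(is.na(col) | trimws(col) == "")
    } else {
      sum(is.na(col))
    }
  })
}

# Toy data for illustration only (not taken from the arrest records)
toy <- data.frame(
  Age = c(25, NA, 40),
  District = c("Western", "", "Central"),
  stringsAsFactors = FALSE
)
missing_counts <- count_missing(toy)
print(missing_counts)
```

Applied to the report's data, the call would be `count_missing(df)` once the CSV has been read.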
#getwd()
#setwd("/Users/kellyratigan/Desktop/ASYN 460/Asyn R")
df <- read.csv("BPD_Arrests.csv", stringsAsFactors = FALSE)
#-----------------------------------------------------------------------------
#cleaning data:
#df <- subset(df, select = -c(Shape, GeoLocation))
#df <- na.omit(df)
#df <- df[, !colnames(df) %in% c("Latitude", "Longitude", "Post")]
#colnames(df)
# Convert ArrestDateTime to Date Time Format
#df$ArrestDateTime <- as.POSIXct(df$ArrestDateTime, format="%Y/%m/%d %H:%M:%S", tz="UTC")
# Remove duplicate rows
#df <- unique(df)
#Missing values:
#df <- df[!is.na(df$IncidentOffence) & !is.na(df$ChargeDescription), ]
# Standardize categorical columns by converting them to lowercase
#df$Gender <- tolower(df$Gender)
#df$Race <- tolower(df$Race)
#df$District <- tolower(df$District)
#df$IncidentOffence <- tolower(df$IncidentOffence)
#Unrealistic ages (keeping ages between 10 and 100)
#df <- df[df$Age >= 10 & df$Age <= 100, ]
#colnames(df)
#-------------------------------------------------------------------------------------
summary(df$Age)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 8.00 24.00 31.00 33.29 42.00 100.00 132
table(df$Gender)
##
##             F      M      Y      Z
##     34  69631 310186      1      1
table(df$Race)
##
##             A      B      H      I      U      W
##     98   1003 313892      1    918   8684  55257
sort(table(df$District), decreasing = TRUE)
##
##             Western   Central   Eastern  Southern Northeast Southeast Northwest
##    160239     30354     29216     28332     25964     24099     24022     22212
## Southwest  Northern
##     22146     13269
This bar chart visualizes the top 10 most frequently committed crimes in Baltimore since 2020. To create this chart, the arrest data was first filtered to include only records from 2020 onward, and entries labeled as “unknown offense” were removed to focus on interpretable crime categories. The dataset’s date-time field was converted into a proper date format to facilitate this filtering.
Next, the number of occurrences for each crime type was calculated and the crime types were ranked in descending order. The top 10 crimes were then selected and plotted as a horizontal bar chart (using a coordinate flip) to enhance the readability of crime names. The chart employs a minimal theme with a steel blue color palette for clarity. This visualization provides insight into which offenses are most common in the dataset, serving as a basis for further analysis of crime trends and enforcement strategies in the post-COVID era.
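The plotting code that follows expects a `top_crimes` table whose construction is not shown. The sketch below is one way the steps described above could be implemented. It is an assumption rather than the original code: `top_offenses` is a hypothetical helper, timestamps are assumed to be year-first (for example `2020/03/15 12:30:00`), and the toy data is illustrative only.

```r
library(dplyr)

# Hypothetical helper: returns the n most frequent offenses on or after `start`,
# with one row per offense and the columns IncidentOffence and count.
top_offenses <- function(data, start = as.Date("2020-01-01"), n = 10) {
  data %>%
    mutate(ArrestDate = as.Date(ArrestDateTime)) %>%  # assumes year-first timestamps
    filter(ArrestDate >= start,
           tolower(IncidentOffence) != "unknown offense") %>%
    count(IncidentOffence, name = "count") %>%
    arrange(desc(count)) %>%
    head(n)
}

# Toy data for illustration only (not taken from the arrest records)
toy <- data.frame(
  ArrestDateTime = c("2019/06/01 10:00:00", "2020/03/15 12:30:00",
                     "2021/07/04 23:10:00", "2022/01/20 08:45:00"),
  IncidentOffence = c("Common Assault", "Common Assault",
                      "Unknown Offense", "Common Assault"),
  stringsAsFactors = FALSE
)
top_crimes_toy <- top_offenses(toy)
print(top_crimes_toy)
```

With the real data, `top_crimes <- top_offenses(df)` would produce a table in the shape the bar chart needs.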
library(ggplot2)
# top_crimes: one row per offense, with columns IncidentOffence and count
ggplot(top_crimes, aes(x = reorder(IncidentOffence, count), y = count)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  coord_flip() + # Flip the axes for better readability
  labs(title = "Top 10 Most Common Crimes in Baltimore (Since 2020)",
       x = "Crime Type",
       y = "Number of Arrests") +
  theme_minimal()
The bar chart shows that “Common Assault” is the most frequently recorded offense in Baltimore in the post-COVID period, with close to 3,000 arrests over the course of four years.
This chart explores the top 10 crime types in Baltimore from 2020 onward (excluding “unknown offense”) and visualizes the gender distribution for each offense category. The data is first filtered to include only records since January 1, 2020, ensuring a focus on recent trends. Next, the total arrests for each crime type are calculated, and the top 10 offenses are selected. By stacking the bars by gender, the chart highlights potential differences in offense patterns between men and women.
X-Axis (Crime Type): Lists the most frequently occurring crimes in descending order (left to right).
Y-Axis (Number of Arrests): Shows the total arrests for each crime type, segmented by gender.
Stacked Bars (Gender Split): Each bar’s segments indicate how many of the total arrests for that crime were male vs. female. Larger segments suggest a particular gender’s higher involvement in that offense.
By comparing the relative sizes of the stacked segments, you can quickly see if certain crimes show a strong gender disparity. This information may be useful for understanding demographic patterns, shaping targeted interventions, or guiding policy decisions aimed at reducing specific types of crime.
#stacked bar chart --> Gender Insights
library(ggplot2)
library(dplyr)
library(lubridate)
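# Convert ArrestDateTime to Date format and filter data from 2020 onwards.
# The format string must match how dates appear in the CSV; values that fail
# to parse become NA and are dropped by the filter below.
df_clean <- df %>%
  mutate(ArrestDateTime = as.Date(ArrestDateTime, format = "%m/%d/%Y")) %>%
  filter(ArrestDateTime >= as.Date("2020-01-01") &
           IncidentOffence != "Unknown Offense" &
           Gender != "")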
# Count arrests by Gender & Crime Type
gender_crime_counts <- df_clean %>%
  group_by(Gender, IncidentOffence) %>%
  summarise(ArrestCount = n(), .groups = 'drop')
#Top 10 Most Common Crimes since 2020
top_crimes <- gender_crime_counts %>%
  group_by(IncidentOffence) %>%
  summarise(TotalArrests = sum(ArrestCount)) %>%
  arrange(desc(TotalArrests)) %>%
  head(10) %>%
  pull(IncidentOffence)
#only the top crimes
gender_crime_counts_filtered <- gender_crime_counts %>%
  filter(IncidentOffence %in% top_crimes)
#stacked bar chart
# FUN = sum orders the bars by total arrests per crime (reorder() defaults to the mean)
ggplot(gender_crime_counts_filtered,
       aes(x = reorder(IncidentOffence, -ArrestCount, FUN = sum),
           y = ArrestCount, fill = Gender)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(title = "Top 10 Most Common Crimes by Gender (Since 2020, Excluding 'Unknown Offense')",
       x = "Crime Type",
       y = "Number of Arrests",
       fill = "Gender") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) #rotate
Having identified the most frequently recorded crimes in Baltimore, I was interested in how arrests for these crimes are distributed between men and women. The chart shows that most of the recorded arrests for these offenses involved men.
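To put a number on this split, each gender's share of arrests within an offense could be computed from the same counts. This is a minimal sketch under stated assumptions: `add_gender_share` is a hypothetical helper, it expects the columns `Gender`, `IncidentOffence`, and `ArrestCount` as produced in the code above, and the toy counts are invented for illustration.

```r
library(dplyr)

# Hypothetical helper: adds each gender's percentage share of arrests
# within every offense category.
add_gender_share <- function(counts) {
  counts %>%
    group_by(IncidentOffence) %>%
    mutate(Share = round(100 * ArrestCount / sum(ArrestCount), 1)) %>%
    ungroup()
}

# Toy counts for illustration only (not taken from the arrest records)
toy_counts <- data.frame(
  Gender = c("F", "M", "F", "M"),
  IncidentOffence = c("Offense A", "Offense A", "Offense B", "Offense B"),
  ArrestCount = c(20, 80, 5, 15),
  stringsAsFactors = FALSE
)
shares <- add_gender_share(toy_counts)
print(shares)
```

Calling `add_gender_share(gender_crime_counts_filtered)` would give the corresponding shares for the top 10 offenses.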
This visualization highlights how arrest frequencies in Baltimore have evolved on an annual basis. After converting the arrest timestamps to a simple date format, each record is grouped by its year of occurrence, and the total arrests are tallied for each year. The resulting line chart provides a clear temporal overview:
X-Axis (Year): Represents the calendar year extracted from the ArrestDate.
Y-Axis (Number of Arrests): Indicates the total arrests recorded for that year.
Line & Points: Each point corresponds to the total arrest count for a specific year, with a connecting line to illustrate the trend over time.
A steady rise might suggest increasing enforcement activity or changes in crime rate. A decline could reflect successful prevention strategies, demographic shifts, or changes in police procedures. By examining peaks, valleys, or plateaus in the line, you can identify potential external factors (e.g., policy changes, societal events) that correlate with notable shifts in arrest numbers. This information is essential for stakeholders seeking to understand long-term patterns in Baltimore’s crime and enforcement landscape.
#year based data
library(dplyr)
library(ggplot2)
# Convert ArrestDateTime to Date format
df$ArrestDate <- as.Date(df$ArrestDateTime)
# Count arrests per year
arrest_trend <- df %>%
  mutate(Year = format(ArrestDate, "%Y")) %>%
  group_by(Year) %>%
  summarise(Count = n())
# Line plot of arrests over the years
ggplot(arrest_trend, aes(x = Year, y = Count, group = 1)) +
  geom_line(color = "blue") +
  geom_point() +
  labs(title = "Trend of Arrests Over the Years",
       x = "Year",
       y = "Number of Arrests") +
  theme_minimal()
The line chart reveals an intriguing trend in Baltimore’s arrest data from 2010 onward. Over the course of the decade, there has been a steady decline in overall arrests, which may suggest improvements in crime prevention strategies, changes in policing practices, or broader societal shifts. However, in 2020—a year marked by significant social and economic upheaval—there is a noticeable spike in arrests. This increase could be linked to several factors, such as heightened police activity in response to rising tensions, changes in crime reporting, or the impact of the COVID-19 pandemic on community dynamics and law enforcement policies. These observations provide a compelling basis for further investigation into how external events and policy shifts have shaped crime trends in Baltimore over time.
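The size of year-to-year shifts such as the one in 2020 can be expressed as a percent change relative to the previous year. The sketch below is illustrative rather than part of the original analysis: `yoy_change` is a hypothetical helper, it expects the columns `Year` and `Count` as in `arrest_trend`, and the toy figures are invented.

```r
# Hypothetical helper: percent change in Count relative to the previous year.
yoy_change <- function(trend) {
  trend <- trend[order(trend$Year), ]
  previous <- c(NA, head(trend$Count, -1))  # previous year's count; NA for the first year
  trend$PctChange <- round(100 * (trend$Count - previous) / previous, 1)
  trend
}

# Toy figures for illustration only (not taken from the arrest records)
toy_trend <- data.frame(Year = c("2018", "2019", "2020"),
                        Count = c(200, 150, 180))
result <- yoy_change(toy_trend)
print(result)
```

Running `yoy_change(arrest_trend)` would show how large each yearly change is compared with the year before it.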
This donut chart visually represents the distribution of arrests by race within the Baltimore dataset. The chart is created by first grouping the data by race, calculating the total number of arrests for each group, and then converting these counts into proportions (percentages). These proportions are used to determine the size of each slice in the donut chart.
Visualization Technique: The chart is constructed using a modified pie chart (a donut chart) where a hole is created in the center for a modern aesthetic. The slices are arranged around the circle, with each slice’s arc length proportional to the percentage of arrests for that race.
Data Insights: The relative size of each slice immediately shows which racial groups account for the largest and smallest shares of arrest records. By including percentage labels, the chart offers a clear quantitative perspective on the contribution of each racial group to the overall arrest count.
Purpose: This visualization helps identify demographic patterns in arrests, which can inform discussions about potential disparities or focus areas for further investigation in law enforcement practices.
Including this chart in the report supports a deeper understanding of the dataset’s demographic composition and provides a basis for analyzing broader trends in crime and community impact.
#Donut Chart: Arrests By Race
library(ggplot2)
library(dplyr)
# Summarize arrests by Race and convert counts to percentages
race_counts <- df %>%
  group_by(Race) %>%
  summarise(count = n()) %>%
  arrange(desc(count)) %>%
  mutate(prop = count / sum(count) * 100)
# Create donut chart using a pie chart approach with a hole in the middle
ggplot(race_counts, aes(x = 2, y = prop, fill = Race)) +
  geom_bar(stat = "identity", width = 1, color = "white") +
  coord_polar(theta = "y") +
  xlim(0.5, 2.5) +
  theme_void() +
  labs(title = "Donut Chart: Arrests by Race") +
  theme(legend.position = "right") +
  # position_stack() centers each label in its own slice; a cumulative sum over
  # the sorted rows would not match the order in which ggplot stacks the slices
  geom_text(aes(label = paste0(round(prop, 1), "%"), group = Race),
            position = position_stack(vjust = 0.5),
            color = "white", size = 3)
In the dataset, while significant numbers of arrests are recorded for other racial groups, including White individuals and those listed as ‘unknown,’ African American individuals constitute a disproportionately high percentage of arrests, roughly 83% of all records (313,892 of 379,853, reading race code B as Black or African American). It is important to interpret these statistics with caution, as they may reflect systemic factors such as socioeconomic disparities, variations in law enforcement practices, and potential reporting biases, rather than a direct measure of criminal behavior.
This heatmap provides a visual representation of arrest patterns across Baltimore’s districts over time. In the chart, the x-axis represents different years, while the y-axis lists the districts. Each cell is color-coded according to the number of arrests in that district for a given year, with a gradient from light blue (lower counts) to dark blue (higher counts).
This visualization helps to quickly identify trends and disparities across different geographic areas. For example, consistently darker cells in certain districts may indicate persistently high arrest rates, while lighter cells in others could suggest lower levels of recorded criminal activity. Such patterns might reflect variations in population density, socioeconomic conditions, or differences in local law enforcement practices.
By examining these trends, stakeholders can better understand the spatial distribution of arrests over time in Baltimore and potentially focus further investigation or policy interventions in districts that exhibit unusual patterns or significant shifts in arrest activity.
#Heatmap: Arrests by District and Year
library(ggplot2)
library(dplyr)
library(lubridate)
# Prepare data: Group by District and Year
heatmap_data <- df %>%
  mutate(Year = year(ArrestDateTime)) %>%
  group_by(District, Year) %>%
  summarise(count = n(), .groups = 'drop')
# Create heatmap
ggplot(heatmap_data, aes(x = factor(Year), y = District, fill = count)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "lightblue", high = "darkblue") +
  labs(title = "Heatmap: Arrests by District and Year",
       x = "Year",
       y = "District",
       fill = "Number of Arrests") +
  theme_minimal()
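The heatmap shows that the Central District of Baltimore has consistently reported among the highest numbers of arrests from 2010 to 2024, although the overall district totals put the Western District slightly ahead (30,354 arrests versus 29,216).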
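One way to check which district leads in each year is to pick the row with the highest count per year from the heatmap table. This is a sketch under stated assumptions, not the original method: `top_district_by_year` is a hypothetical helper, it expects the columns `District`, `Year`, and `count` as in `heatmap_data`, and it drops rows with a blank or missing district first. The toy values are invented.

```r
# Hypothetical helper: for each year, returns the district with the highest count.
# Rows with a blank or missing District are excluded (an assumption of this sketch).
top_district_by_year <- function(counts) {
  counts <- counts[!is.na(counts$District) & counts$District != "", ]
  counts <- counts[order(counts$Year, -counts$count), ]  # largest count first within each year
  counts[!duplicated(counts$Year), c("Year", "District", "count")]
}

# Toy values for illustration only (not taken from the arrest records)
toy_heat <- data.frame(
  District = c("", "Central", "Western", "Central", "Western"),
  Year = c(2020, 2020, 2020, 2021, 2021),
  count = c(900, 120, 100, 90, 110),
  stringsAsFactors = FALSE
)
leaders <- top_district_by_year(toy_heat)
print(leaders)
```

Applied as `top_district_by_year(heatmap_data)`, this would list the leading district for every year in the data.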
In summary, this analysis of Baltimore’s arrest data has revealed several noteworthy trends and patterns. Over the past decade, overall arrests have shown a steady decline, yet the spike observed in 2020 indicates that external factors—possibly related to the COVID-19 pandemic or shifts in policing practices—have had a significant impact on crime patterns. The bar chart of top crimes underscores that certain offenses, such as common assault, remain prevalent, while the stacked bar chart highlights a pronounced gender disparity, with males accounting for the majority of arrests.
The donut chart further reveals that African American individuals represent a disproportionately high percentage of arrests compared to other racial groups. This suggests that systemic factors—such as socioeconomic disparities, enforcement practices, and potential reporting biases—may be influencing these outcomes. Additionally, the heatmap of arrests by district shows that specific areas, notably the Central District, consistently report higher arrest counts, pointing to geographic disparities that warrant further investigation.
Collectively, these visualizations offer a comprehensive view of Baltimore’s crime landscape in the post-COVID era, highlighting the need for a deeper exploration of the underlying causes of these trends. The findings from this report provide a valuable foundation for future research and can help inform targeted policy interventions and community-based strategies to address persistent issues in public safety.