My data source is Baltimore 2025 crime incident data collected from city records. In this report, I examine patterns in crime frequency across time, location, and demographic breakdowns. The dataset includes variables such as crime description, date and time, location coordinates, and demographic attributes like gender. After cleaning the data and converting date fields, the dataset allows for analysis of trends by weekday, month, crime category, and geographic distribution.
#imports; read file into dataframe
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import StrMethodFormatter
from matplotlib.ticker import FuncFormatter
import seaborn as sns
import warnings
import folium
warnings.filterwarnings('ignore')
import os
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = 'c:/users/mcshe/anaconda3/library/plugins/platforms'
path = "C:/Users/mcshe/OneDrive/Documents/Python Project Data Vis/"
filename = "Part1_Crime_Beta_5960161298247612570.csv"
df = pd.read_csv(
    path + filename,
    usecols=['CCNumber', 'CrimeDateTime', 'Description', 'Inside_Outside',
             'Weapon', 'Gender', 'Age', 'Race', 'Neighborhood',
             'Latitude', 'Longitude', 'PremiseType'],
    low_memory=False
)
#convert to datetime
df['CrimeDateTime'] = pd.to_datetime(
    df['CrimeDateTime'],
    format='%m/%d/%Y %I:%M:%S %p',
    errors='coerce'
)
#create date parts
df['Year'] = df['CrimeDateTime'].dt.year
df['Month'] = df['CrimeDateTime'].dt.month
df['Day'] = df['CrimeDateTime'].dt.day
df['Day of the Week'] = df['CrimeDateTime'].dt.dayofweek
df['Month Name'] = df['CrimeDateTime'].dt.strftime('%b')
df['Weekday Name'] = df['CrimeDateTime'].dt.strftime('%a')
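Because `errors='coerce'` silently turns any timestamp that does not match the format into `NaT`, it can be worth counting how many rows were lost before grouping by date. The sketch below uses a small synthetic series rather than the real file, so the strings and the resulting counts are illustrative assumptions only.

```python
import pandas as pd

# Synthetic timestamps for illustration only; the last two do not match the format
raw = pd.Series([
    '01/15/2025 02:30:00 PM',
    '02/01/2025 11:05:00 AM',
    '2025-03-01 08:00:00',
    'not a date',
])

parsed = pd.to_datetime(raw, format='%m/%d/%Y %I:%M:%S %p', errors='coerce')

# Rows that failed to parse become NaT and drop out of every date-based groupby
n_failed = int(parsed.isna().sum())
share_failed = n_failed / len(parsed)

# Confirm which years the successfully parsed rows actually cover
years_present = sorted(parsed.dropna().dt.year.unique())

print(f'{n_failed} of {len(parsed)} rows failed to parse ({share_failed:.0%})')
print('Years present:', years_present)
```

Running the same two checks on `df['CrimeDateTime']` would show whether the chosen format fits the file and whether the data is limited to the year being reported.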
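Crime incidents were grouped by weekday and crime category, then displayed as a stacked bar chart. Every description containing "larceny" was consolidated into a single Larceny group, and every description containing "robbery" into a single Robbery group, to reduce clutter and improve readability.
#collapse all larceny descriptions into 'Larceny' and all robbery descriptions into 'Robbery' to reduce clutter in graph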
df['Crime Group'] = df['Description']
df.loc[df['Description'].str.contains('LARCENY', case=False, na=False), 'Crime Group'] = 'Larceny'
df.loc[df['Description'].str.contains('ROBBERY', case=False, na=False), 'Crime Group'] = 'Robbery'
#convert legend/category labels to Title Case
df['Crime Group'] = df['Crime Group'].str.title()
#create count values for each crime group by weekday
counts = df.groupby(['Weekday Name', 'Crime Group']).size().unstack(fill_value=0)
#reorder weekdays so it starts with Sun
weekday_order = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']
counts = counts.reindex(weekday_order)
#create stacked bar chart
ax = counts.plot(
    kind='bar',
    stacked=True,
    figsize=(12, 6)
)
plt.title('Crime Incidents by Weekday and Type')
plt.xlabel('Weekday')
plt.ylabel('Number of Incidents')
plt.legend(title='Description', bbox_to_anchor=(1.05, 1), loc='upper left')
#add commas to Y-axis labels
ax.yaxis.set_major_formatter(StrMethodFormatter('{x:,.0f}'))
#keep X-axis labels unrotated
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()
The chart shows that crime levels stay relatively consistent throughout the week, with slight increases toward the weekend. Larceny is the most frequent category on every day and forms the largest portion of each bar, while other categories, such as robbery, contribute smaller but noticeable portions. No single weekday dramatically exceeds the others, which suggests that crime in Baltimore is spread fairly evenly across the week with only a mild weekend effect.
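One way to put a number on "relatively consistent" is to convert the daily totals into shares of the weekly total and compare them with an even split of about 14.3% per day. This is a minimal sketch on synthetic counts shaped like the `counts` table above; the numbers are made up for illustration and are not the Baltimore figures.

```python
import pandas as pd

# Synthetic counts for illustration only (not the Baltimore figures)
counts = pd.DataFrame(
    {'Larceny': [50, 48, 47, 49, 51, 55, 50],
     'Robbery': [10, 9, 11, 10, 10, 12, 13]},
    index=['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'],
)

# Total incidents per weekday across all crime groups
daily_totals = counts.sum(axis=1)

# Each weekday's share of the weekly total, in percent
weekday_share = daily_totals / daily_totals.sum() * 100

# Gap between the busiest and quietest day, in percentage points
spread = weekday_share.max() - weekday_share.min()

print(weekday_share.round(1))
print(f'Spread between busiest and quietest day: {spread:.1f} percentage points')
```

A small spread would support the reading that the weekend effect is mild; a large one would call for a closer look at the busiest day.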
Crime incidents were aggregated by month and plotted as a line chart to observe seasonal trends.
#group by month number
monthly_counts = df.groupby(df['CrimeDateTime'].dt.month).size()
#ensure all 12 months appear on the axis
monthly_counts = monthly_counts.reindex(range(1, 13), fill_value=0)
#create plot
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(monthly_counts.index, monthly_counts.values, marker='o')
plt.title('Crime Incidents by Month')
plt.xlabel('Month')
plt.ylabel('Number of Incidents')
#set month labels
plt.xticks(
    ticks=range(1, 13),
    labels=['Jan','Feb','Mar','Apr','May','Jun',
            'Jul','Aug','Sep','Oct','Nov','Dec']
)
#add commas to y-axis
ax.yaxis.set_major_formatter(StrMethodFormatter('{x:,.0f}'))
plt.show()
The chart shows moderate variation in crime levels throughout the year, with no extreme spikes or drops. February has the lowest count, while May through October show slightly higher activity, which could reflect seasonal influences such as weather or increased outdoor activity. Overall, crime remains relatively stable across months, with gradual fluctuations rather than sharp changes.
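Months differ in length, so a raw monthly total can make a short month like February look quieter than it is. A hedged way to check is to divide each month's count by its number of days. The sketch below uses synthetic monthly totals and assumes the year is 2025; neither the numbers nor the outcome reflect the actual dataset.

```python
import calendar
import pandas as pd

# Synthetic monthly totals for illustration only (not the Baltimore figures)
monthly_counts = pd.Series(
    [310, 280, 310, 300, 341, 330, 341, 341, 330, 341, 300, 310],
    index=range(1, 13),
)

year = 2025  # assumed reporting year

# Number of calendar days in each month of the assumed year
days_in_month = pd.Series(
    [calendar.monthrange(year, m)[1] for m in monthly_counts.index],
    index=monthly_counts.index,
)

# Average incidents per day, which is comparable across months of different length
per_day = monthly_counts / days_in_month

print(per_day.round(2))
```

In this made-up example February has the lowest raw total but the same daily rate as January, which is the kind of effect the per-day view is meant to expose.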
A heatmap was created to visualize the frequency of each crime group across weekdays. Each cell represents the number of incidents for a specific crime type on a given day, with darker colors indicating higher counts.
#create heatmap dataframe using pivot_table
heatmap_df = pd.pivot_table(
    df,
    index='Crime Group',
    columns='Weekday Name',
    aggfunc='size',
    fill_value=0
)
#reorder columns in weekday order
heatmap_df = heatmap_df[weekday_order]
#set style and comma format
sns.set_theme(style="white")
comma_fmt = FuncFormatter(lambda x, p: format(int(x), ','))
#create figure
fig = plt.figure(figsize=(20, 12))
ax = fig.add_subplot(1, 1, 1)
#build heatmap
ax = sns.heatmap(
    heatmap_df,
    linewidths=0.2,
    annot=True,
    cmap=sns.light_palette("#4b0082", as_cmap=True),
    fmt=',.0f',
    annot_kws={'size': 11},
    cbar_kws={
        'format': comma_fmt,
        'orientation': 'vertical'
    }
)
#add titles and labels
plt.title('Crime Frequency by Group and Day of Week', fontsize=18, pad=15)
plt.xlabel('Day of Week', fontsize=18, labelpad=10)
plt.ylabel('Crime Group', fontsize=18, labelpad=10)
#tick formatting and color customization
plt.xticks(size=14)
plt.yticks(rotation=0, size=14)
cbar = ax.collections[0].colorbar
max_count = heatmap_df.to_numpy().max()
ticks = [*range(0, max_count + 5000, 5000)]
cbar.set_ticks(ticks)
cbar.set_ticklabels(['{:,.0f}'.format(x) for x in ticks])
cbar.set_label(
    'Number of Incidents',
    rotation=270,
    fontsize=14,
    labelpad=20
)
plt.show()
The heatmap highlights that larceny is consistently the most frequent crime across all days. Other categories, such as assault and burglary, also show steady activity but at lower levels. There are no strong day-specific spikes for most crime types, reinforcing the idea that crime patterns are fairly consistent throughout the week. The visualization makes it easy to compare relative intensity across both crime types and days simultaneously.
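Since one large category sets the color scale, day-to-day differences in smaller categories can be hard to see in raw counts. A possible complement is to divide each row by its own total, so every crime group is shown as a share of its weekly incidents. The sketch below uses a synthetic table with the same layout as `heatmap_df`; the values and the "Arson" row are illustrative assumptions.

```python
import pandas as pd

weekday_order = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']

# Synthetic counts for illustration only (not the Baltimore figures)
heatmap_df = pd.DataFrame(
    [[700, 700, 700, 700, 700, 700, 700],
     [10, 10, 10, 10, 10, 20, 30]],
    index=['Larceny', 'Arson'],
    columns=weekday_order,
)

# Divide each row by its own total so every crime group sums to 1.0
row_share = heatmap_df.div(heatmap_df.sum(axis=1), axis=0)

print(row_share.round(2))
```

Passing `row_share` to `sns.heatmap` with `fmt='.0%'` would then color each cell by share rather than count, so a weekend spike in a small category is no longer washed out by the largest one.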
A geographic dot map was created using a sampled subset of the data to display the spatial distribution of selected crime types. Points were color-coded by category, with larceny, assault, and robbery represented by different colors.
#prepare data: drop NAs, use sampling to create smaller version of the dataframe
map_df = df.dropna(subset=['Latitude', 'Longitude']).copy()
map_df['Latitude'] = pd.to_numeric(map_df['Latitude'], errors='coerce')
map_df['Longitude'] = pd.to_numeric(map_df['Longitude'], errors='coerce')
map_df = map_df.dropna(subset=['Latitude', 'Longitude']).reset_index(drop=True)
map_df = map_df.sample(1000, random_state=1).reset_index(drop=True)
#make Penn Station center of map
center_of_map = [39.3024723, -76.6195023]
bmore_map = folium.Map(
    location=center_of_map,
    zoom_start=12,
    width='90%',
    height='100%',
    left='5%',
    right='5%',
    top='0%'
)
#add tile layers
folium.TileLayer('OpenStreetMap').add_to(bmore_map)
folium.TileLayer('CartoDB positron').add_to(bmore_map)
folium.LayerControl().add_to(bmore_map)
#add points
for _, row in map_df.iterrows():
    crime = str(row['Crime Group']).lower()
    if 'larceny' in crime:
        color = 'green'
    elif 'assault' in crime:
        color = 'red'
    elif 'robbery' in crime:
        color = 'blue'
    else:
        #skip crime types that are not color-coded on the map
        continue
    folium.CircleMarker(
        location=[row['Latitude'], row['Longitude']],
        tooltip=row['Crime Group'],
        popup='Description: {}'.format(row['Description']),
        radius=3,
        color=color,
        fill=True,
        fill_color=color,
        fill_opacity=0.4
    ).add_to(bmore_map)
bmore_map.save(path + 'Baltimore Crime Dot Map.html')
bmore_map
The map shows that crime incidents are concentrated in certain areas of the city rather than evenly distributed. Clusters appear in more densely populated or active urban areas, suggesting that crime is influenced by population density, economic activity, and environmental factors. The visualization also highlights how different crime types overlap geographically, indicating that high-activity areas tend to experience multiple types of crime rather than just one.
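The dot map is based on a 1,000-row sample, so the visual impression of clustering could be backed up with a count over the full data. One hedged option is to rank the `Neighborhood` column by incident count and look at the cumulative share held by the top few areas. The sketch below uses placeholder neighborhood names and synthetic counts, not the real records.

```python
import pandas as pd

# Synthetic records for illustration only; the neighborhood names are placeholders
df_demo = pd.DataFrame({
    'Neighborhood': ['A'] * 50 + ['B'] * 30 + ['C'] * 10 + ['D'] * 6 + ['E'] * 4
})

# Incidents per neighborhood, sorted from most to fewest
by_area = df_demo['Neighborhood'].value_counts()

# Running share of all incidents as neighborhoods are added in rank order
cumulative_share = by_area.cumsum() / by_area.sum()

# Share of incidents accounted for by the two busiest neighborhoods
top2_share = cumulative_share.iloc[1]

print(by_area.head())
print(f'Top 2 neighborhoods hold {top2_share:.0%} of incidents')
```

A count like this does not adjust for population or area, so it describes where incidents are recorded rather than explaining why.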
A nested pie chart was used to show the distribution of crimes by gender (outer ring) and further break down each gender category by crime type (inner ring). To improve clarity, all assault-related categories were combined, and only the most common crime types were displayed explicitly.
#clean data
df['Gender'] = df['Gender'].fillna('U')
df['Gender'] = df['Gender'].replace('Unknown', 'U')
#combine any crime containing "assault" into one category
df['Crime Group'] = df['Crime Group'].fillna('').astype(str)
df['Crime Group'] = df['Crime Group'].apply(
    lambda x: 'Assault' if 'assault' in x.lower() else x.title()
)
#keep only top crime groups to reduce clutter
top_crimes = df['Crime Group'].value_counts().nlargest(3).index
df['Crime Group Clean'] = df['Crime Group'].apply(
    lambda x: x if x in top_crimes else 'Other'
)
#build dataframe
pie_df = (
    df.groupby(['Gender', 'Crime Group Clean'])
    .size()
    .reset_index(name='Count')
)
#shorten labels
pie_df['Crime Label'] = pie_df['Crime Group Clean'].replace({
    'Common Assault': 'Assault',
    'Agg. Assault': 'Agg Assault'
})
#force gender order
gender_order = ['F', 'M', 'U']
pie_df['Gender'] = pd.Categorical(
    pie_df['Gender'],
    categories=gender_order,
    ordered=True
)
#sort so inner slices line up properly within each outer slice
pie_df.sort_values(by=['Gender', 'Crime Group Clean'], inplace=True)
pie_df.reset_index(drop=True, inplace=True)
#create counts and total
outer_counts = pie_df.groupby('Gender')['Count'].sum()
inner_counts = pie_df['Count']
inner_labels = pie_df['Crime Label']
all_crimes = pie_df['Count'].sum()
#add colors
outer_color_map = {
    'F': 'red',
    'M': 'blue',
    'U': '#66aa66'
}
outer_colors = [outer_color_map[g] for g in outer_counts.index]
inner_colors = []
for gender in pie_df['Gender']:
    if gender == 'F':
        inner_colors.append('#ff9999')  # light red
    elif gender == 'M':
        inner_colors.append('#99ccff')  # light blue
    else:
        inner_colors.append('#66ff66')  # light green
#create plot
fig = plt.figure(figsize=(14, 14))
ax = fig.add_subplot(1, 1, 1)
#outside pie = Gender
outer_counts.plot(
    kind='pie',
    radius=1,
    colors=outer_colors,
    labels=outer_counts.index,
    labeldistance=1.12,
    pctdistance=0.86,
    wedgeprops=dict(edgecolor='w', linewidth=1),
    textprops=dict(fontsize=12),
    autopct=lambda p: '{:.1f}%'.format(p),
    startangle=90,
    ax=ax
)
#inside pie = Crime group within gender
ax.pie(
    inner_counts,
    radius=0.62,
    colors=inner_colors,
    labels=inner_labels,
    labeldistance=0.66,
    wedgeprops=dict(width=0.28, edgecolor='w', linewidth=1),
    textprops=dict(fontsize=12),
    pctdistance=0.86,
    autopct=lambda p: '{:.1f}%'.format(p),
    startangle=90
)
#hole in middle
hole = plt.Circle((0, 0), 0.35, fc='white')
ax.add_artist(hole)
#clean up
ax.yaxis.set_visible(False)
ax.axis('equal')
plt.title('Crime Distribution by Gender and Crime Group', fontsize=16, pad=10)
#center text
ax.text(
    0, 0,
    'Total Crimes\n{:,.0f}'.format(all_crimes),
    ha='center',
    va='center',
    size=13
)
plt.tight_layout()
plt.show()
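One detail in the pie code may be worth checking: `pd.Categorical` turns any value not listed in `categories` into NaN, and a later `groupby` on that column leaves those rows out, while a plain column sum still counts them. If the Gender column held codes other than F, M, and U, the two rings would then cover different rows. Whether it does is not known here; the stray values in this sketch are hypothetical, and the guard shown is only one possible approach, not the report's method.

```python
import pandas as pd

# Synthetic gender codes for illustration only; 'Male' and 'X' are hypothetical stray values
gender = pd.Series(['F', 'M', 'U', 'Male', 'X', 'M'])

# Values outside the listed categories become NaN
as_cat = pd.Categorical(gender, categories=['F', 'M', 'U'], ordered=True)
n_dropped = int(as_cat.isna().sum())

# One possible guard (an assumption): fold every unlisted code into 'U' before grouping
cleaned = gender.where(gender.isin(['F', 'M']), 'U')

print(f'{n_dropped} values fall outside the listed categories')
print(cleaned.value_counts())
```

If `n_dropped` is zero on the real column, the rings already line up and no guard is needed. If stray codes such as a spelled-out "Male" do appear, mapping them to the matching letter would be more faithful than folding them into unknown.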
The outer ring shows that the male and female categories account for the majority of incidents, with a smaller portion classified as unknown. Female is the largest of the three categories. The inner ring shows that larceny and assault make up a large share of crimes within each gender group, while other categories contribute smaller portions. The similar composition across genders suggests that the distribution of crime types does not vary dramatically by gender, even if total counts differ.
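The claim that composition is similar across genders can be checked directly with a row-normalized cross-tabulation, which gives each crime group's share within each gender. The sketch below is built on synthetic records using the same column names as the pie chart code; the counts are invented so that the two rows come out identical.

```python
import pandas as pd

# Synthetic records for illustration only (not the Baltimore figures)
demo = pd.DataFrame({
    'Gender': ['F'] * 10 + ['M'] * 20,
    'Crime Group Clean': (['Larceny'] * 5 + ['Assault'] * 3 + ['Other'] * 2
                          + ['Larceny'] * 10 + ['Assault'] * 6 + ['Other'] * 4),
})

# normalize='index' makes each gender's row sum to 1.0
composition = pd.crosstab(
    demo['Gender'], demo['Crime Group Clean'], normalize='index'
)

# Largest difference in any crime group's share between the two genders
max_gap = (composition.loc['F'] - composition.loc['M']).abs().max()

print(composition.round(2))
print(f'Largest share difference: {max_gap:.2f}')
```

Here the totals differ by a factor of two but the shares match exactly, which is the pattern the paragraph above describes; on real data the size of `max_gap` would indicate how closely it holds.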
Overall, the visualizations reveal that crime in Baltimore during 2025 is relatively consistent across time but concentrated geographically and dominated by a few key categories. Larceny emerges as the most frequent crime type across all analyses, while assault and burglary also contribute significantly. Temporal patterns show only moderate variation by weekday and month, suggesting that crime is a steady phenomenon rather than one driven by extreme seasonal or weekly spikes. Spatial analysis highlights clustering in specific areas, indicating the importance of location-based factors. Together, these findings suggest that crime patterns are shaped more by environment and activity levels than by time alone, with consistent trends across demographic categories.