Baltimore 2025 Crime Data

My data source is Baltimore 2025 crime incident data collected from city records. In this report, I examine patterns in crime frequency across time, location, and gender. The dataset includes variables such as crime description, date and time, location coordinates, and demographic attributes like gender. Once the data are cleaned and the date fields converted, the dataset supports analysis of trends by weekday, month, crime category, and geographic distribution.

#imports; read file into dataframe
import os
import warnings

import folium
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib.ticker import FuncFormatter, StrMethodFormatter

#suppress library warnings to keep notebook output readable
warnings.filterwarnings('ignore')

#point Qt at its platform plugins on this Windows/Anaconda install
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = 'c:/users/mcshe/anaconda3/library/plugins/platforms'

path = "C:/Users/mcshe/OneDrive/Documents/Python Project Data Vis/"
filename = "Part1_Crime_Beta_5960161298247612570.csv"

df = pd.read_csv(
    path + filename,
    usecols=['CCNumber', 'CrimeDateTime', 'Description', 'Inside_Outside',
             'Weapon', 'Gender', 'Age', 'Race', 'Neighborhood',
             'Latitude', 'Longitude', 'PremiseType'],
    low_memory=False
)

#convert to datetime
df['CrimeDateTime'] = pd.to_datetime(
    df['CrimeDateTime'],
    format='%m/%d/%Y %I:%M:%S %p',
    errors='coerce'
)

#create date parts
df['Year'] = df['CrimeDateTime'].dt.year
df['Month'] = df['CrimeDateTime'].dt.month
df['Day'] = df['CrimeDateTime'].dt.day
df['Day of the Week'] = df['CrimeDateTime'].dt.dayofweek
df['Month Name'] = df['CrimeDateTime'].dt.strftime('%b')
df['Weekday Name'] = df['CrimeDateTime'].dt.strftime('%a')
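Because `errors='coerce'` silently turns any timestamp that does not match the format into `NaT`, and the report covers 2025 only, it can help to check both points before plotting. The following is a minimal sketch on a small made-up frame; the `sample` rows and the `TARGET_YEAR` name are assumptions for illustration, and whether the real file contains unparseable dates or other years is not known from this report.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real CSV
sample = pd.DataFrame({
    'CrimeDateTime': ['01/15/2025 08:30:00 PM', '12/31/2024 11:59:00 PM',
                      'not a date', '07/04/2025 01:05:00 AM']
})
sample['CrimeDateTime'] = pd.to_datetime(
    sample['CrimeDateTime'], format='%m/%d/%Y %I:%M:%S %p', errors='coerce'
)

# Count rows whose timestamp could not be parsed (they became NaT)
n_unparsed = int(sample['CrimeDateTime'].isna().sum())

# Keep only rows from the year under study
TARGET_YEAR = 2025
sample_2025 = sample[sample['CrimeDateTime'].dt.year == TARGET_YEAR]

print(n_unparsed, len(sample_2025))
```

Applied to the notebook's `df`, the same two checks would show how many rows drop out of the weekday and month charts and whether a year filter changes the totals.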

Visualization 1: Stacked Bar Chart

Comparing crime category counts per weekday

Crime incidents were grouped by weekday and crime category, then displayed as a stacked bar chart. To reduce clutter and improve readability, every description containing "larceny" was consolidated into a single Larceny group, and every description containing "robbery" into a single Robbery group.

#combine Larceny and Robbery categories to reduce clutter in graph
df['Crime Group'] = df['Description']

df.loc[df['Description'].str.contains('LARCENY', case=False, na=False), 'Crime Group'] = 'Larceny'
df.loc[df['Description'].str.contains('ROBBERY', case=False, na=False), 'Crime Group'] = 'Robbery'

#convert legend/category labels to Title Case
df['Crime Group'] = df['Crime Group'].str.title()

#create count values for each crime group by weekday
counts = df.groupby(['Weekday Name', 'Crime Group']).size().unstack(fill_value=0)

#reorder weekdays so it starts with Sun
weekday_order = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']
counts = counts.reindex(weekday_order)

#create stacked bar chart
ax = counts.plot(
    kind='bar',
    stacked=True,
    figsize=(12, 6)
)

plt.title('Crime Incidents by Weekday and Type')
plt.xlabel('Weekday')
plt.ylabel('Number of Incidents')
#place legend outside the plot; it lists crime groups, not raw descriptions
plt.legend(title='Crime Group', bbox_to_anchor=(1.05, 1), loc='upper left')

#add commas to Y-axis labels
ax.yaxis.set_major_formatter(StrMethodFormatter('{x:,.0f}'))

#keep X-axis labels unrotated
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()

The chart shows that crime levels stay fairly consistent throughout the week, with slight increases toward the weekend. Larceny is the most frequent category on every day and forms the largest portion of each bar. Other categories, such as robbery, contribute smaller but noticeable portions. No single weekday dramatically exceeds the others, suggesting that crime in Baltimore is spread fairly evenly across the week, with only a mild weekend effect.
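One way to put a number on "fairly consistent" is to express each weekday as a share of all incidents and look at the gap between the busiest and quietest day. This is a minimal sketch with invented totals (the `weekday_totals` values are assumptions, not figures from the dataset).

```python
import pandas as pd

# Hypothetical incident counts per weekday (not the real totals)
weekday_totals = pd.Series(
    [90, 100, 100, 100, 100, 110, 100],
    index=['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']
)

# Share of all incidents that fall on each weekday, in percent
weekday_share = weekday_totals / weekday_totals.sum() * 100

# Gap between the busiest and quietest day, in percentage points
spread = weekday_share.max() - weekday_share.min()

print(weekday_share.round(1))
print(round(spread, 1))
```

With the notebook's `counts` table, `counts.sum(axis=1)` would supply the real weekday totals in place of the toy series. A perfectly even week would put each day at about 14.3 percent.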

Visualization 2: Line Chart

Crimes per month

Crime incidents were aggregated by month and plotted as a line chart to observe seasonal trends.

#group by month number
monthly_counts = df.groupby(df['CrimeDateTime'].dt.month).size()

#ensure all 12 months appear on the axis
monthly_counts = monthly_counts.reindex(range(1, 13), fill_value=0)

#create plot
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(monthly_counts.index, monthly_counts.values, marker='o')

plt.title('Crime Incidents by Month')
plt.xlabel('Month')
plt.ylabel('Number of Incidents')

#set month labels
plt.xticks(
    ticks=range(1, 13),
    labels=['Jan','Feb','Mar','Apr','May','Jun',
            'Jul','Aug','Sep','Oct','Nov','Dec']
)

#add commas to Y-axis labels
ax.yaxis.set_major_formatter(StrMethodFormatter('{x:,.0f}'))

plt.show()

The chart shows moderate variation in crime levels throughout the year, with no extreme spikes or drops. February has the lowest count, though it is also the shortest month, so part of that dip may simply reflect fewer days. May through October show slightly higher activity, which could indicate seasonal influences such as weather or increased outdoor activity. Overall, crime remains relatively stable across months, with gradual fluctuations rather than sharp changes.
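Since months differ in length, dividing each monthly count by the number of days in that month gives a fairer comparison. The sketch below uses invented counts for January through March (the `monthly` values are assumptions) and the standard-library `calendar` module to look up month lengths for 2025.

```python
import calendar

import pandas as pd

# Hypothetical monthly incident counts for Jan-Mar 2025 (not real totals)
monthly = pd.Series([3100, 2800, 3100], index=[1, 2, 3])

# Number of days in each of those months in 2025
days = pd.Series(
    [calendar.monthrange(2025, m)[1] for m in monthly.index],
    index=monthly.index
)

# Average incidents per day, which removes the month-length effect
per_day = monthly / days

print(per_day.round(1))
```

In this toy case February has the lowest raw count but the same daily rate as its neighbors. Running the same division on the notebook's `monthly_counts` would show whether the real February dip survives the adjustment.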

Visualization 3: Heatmap

Crime group incidents by weekday

A heatmap was created to visualize the frequency of each crime group across weekdays. Each cell represents the number of incidents for a specific crime type on a given day, with darker colors indicating higher counts.

#create heatmap dataframe using pivot_table
heatmap_df = pd.pivot_table(
    df,
    index='Crime Group',
    columns='Weekday Name',
    aggfunc='size',
    fill_value=0
)

#reorder columns in weekday order
heatmap_df = heatmap_df[weekday_order]

#set style and comma format
sns.set_theme(style="white")
comma_fmt = FuncFormatter(lambda x, p: format(int(x), ','))

#create figure
fig = plt.figure(figsize=(20, 12))
ax = fig.add_subplot(1, 1, 1)

#build heatmap
ax = sns.heatmap(
    heatmap_df,
    linewidths=0.2,
    annot=True,
    cmap=sns.light_palette("#4b0082", as_cmap=True),
    fmt=',.0f',
    annot_kws={'size': 11},
    cbar_kws={
        'format': comma_fmt,
        'orientation': 'vertical'
    }
)

#add titles and labels
plt.title('Crime Frequency by Group and Day of Week', fontsize=18, pad=15)
plt.xlabel('Day of Week', fontsize=18, labelpad=10)
plt.ylabel('Crime Group', fontsize=18, labelpad=10)

#tick formatting and color customization
plt.xticks(size=14)
plt.yticks(rotation=0, size=14)
cbar = ax.collections[0].colorbar

max_count = heatmap_df.to_numpy().max()
ticks = [*range(0, max_count + 5000, 5000)]
cbar.set_ticks(ticks)
cbar.set_ticklabels(['{:,.0f}'.format(x) for x in ticks])

cbar.set_label(
    'Number of Incidents',
    rotation=270,
    fontsize=14,
    labelpad=20
)

plt.show()

The heatmap highlights that larceny is consistently the most frequent crime across all days. Other categories, such as assault and burglary, also show steady activity but at lower levels. There are no strong day-specific spikes for most crime types, reinforcing the idea that crime patterns are fairly consistent throughout the week. The visualization makes it easy to compare relative intensity across both crime types and days simultaneously.
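Because larceny dominates the color scale, weekday differences within the smaller groups are hard to see in raw counts. A possible complement is to divide each row by its own total so that every crime group shows its weekday shares. This is a minimal sketch on invented counts (the `toy` table is an assumption, not data from the report).

```python
import pandas as pd

# Hypothetical counts: rows are crime groups, columns are weekdays
toy = pd.DataFrame(
    {'Sat': [300, 30], 'Sun': [100, 30], 'Mon': [100, 40]},
    index=['Larceny', 'Burglary']
)

# Divide each row by its own total so every row sums to 1
row_share = toy.div(toy.sum(axis=1), axis=0)

print(row_share.round(2))
```

The same `.div(...)` call applies to the notebook's `heatmap_df`, and passing the result to `sns.heatmap` with `fmt='.0%'` would print the shares as percentages.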

Visualization 4: Folium Dot Map

Plotting incidents of larceny, assault, and robbery

A geographic dot map was created from a random sample of 1,000 incidents with valid coordinates to display the spatial distribution of selected crime types. Only larceny, assault, and robbery incidents in the sample are plotted, each in its own color (green, red, and blue, respectively); other crime types are skipped.

#prepare data: drop NAs, use sampling to create smaller version of the dataframe
map_df = df.dropna(subset=['Latitude', 'Longitude']).copy()
map_df['Latitude'] = pd.to_numeric(map_df['Latitude'], errors='coerce')
map_df['Longitude'] = pd.to_numeric(map_df['Longitude'], errors='coerce')
map_df = map_df.dropna(subset=['Latitude', 'Longitude']).reset_index(drop=True)
#sample at most 1,000 rows so the call cannot fail on a smaller dataset
map_df = map_df.sample(min(1000, len(map_df)), random_state=1).reset_index(drop=True)

#make Penn Station center of map
center_of_map = [39.3024723, -76.6195023]

#width 90% with a 5% left offset leaves an equal 5% margin on the right
bmore_map = folium.Map(
    location=center_of_map,
    zoom_start=12,
    width='90%',
    height='100%',
    left='5%',
    top='0%'
)

#add tile layers
folium.TileLayer('OpenStreetMap').add_to(bmore_map)
folium.TileLayer('CartoDB positron').add_to(bmore_map)
folium.LayerControl().add_to(bmore_map)

#add points
for _, row in map_df.iterrows():

    crime = str(row['Crime Group']).lower()

    #pick a marker color by crime type; skip all other types
    if 'larceny' in crime:
        color = 'green'
    elif 'assault' in crime:
        color = 'red'
    elif 'robbery' in crime:
        color = 'blue'
    else:
        continue

    folium.CircleMarker(
        location=[row['Latitude'], row['Longitude']],
        tooltip=row['Crime Group'],
        popup='Description: {}'.format(row['Description']),
        radius=3,
        color=color,
        fill=True,
        fill_color=color,
        fill_opacity=0.4
    ).add_to(bmore_map)

#save map as a standalone HTML file
bmore_map.save(path + 'Baltimore Crime Dot Map.html')
bmore_map

The map shows that crime incidents are concentrated in certain areas of the city rather than evenly distributed. Clusters appear in what look like the more densely populated or active urban areas, which is consistent with crime being influenced by population density, economic activity, and environmental factors, although the map alone cannot confirm those causes. The visualization also shows different crime types overlapping geographically, indicating that high-activity areas tend to experience multiple types of crime rather than just one.
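The visual impression of clustering can be checked roughly by snapping each point to a coarse grid and counting incidents per cell. The sketch below rounds coordinates to two decimals, which gives cells on the order of a kilometer across at Baltimore's latitude; the `pts` coordinates are invented for illustration and the grid size is an arbitrary choice.

```python
import pandas as pd

# Hypothetical coordinates (not real incidents)
pts = pd.DataFrame({
    'Latitude':  [39.301, 39.302, 39.304, 39.331, 39.352],
    'Longitude': [-76.611, -76.612, -76.614, -76.651, -76.672],
})

# Snap each point to a grid cell by rounding to 2 decimal places
cells = pts.round({'Latitude': 2, 'Longitude': 2})

# Count points per cell, busiest cell first
cell_counts = (
    cells.groupby(['Latitude', 'Longitude'])
         .size()
         .sort_values(ascending=False)
)

# Share of all points that fall in the single busiest cell
top_share = cell_counts.iloc[0] / len(pts)

print(cell_counts)
print(top_share)
```

Run on the full set of geocoded rows rather than the 1,000-point sample, the share held by the top few cells would give a simple measure of how concentrated incidents are.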

Visualization 5: Nested Pie Chart

Crime counts broken down by gender

A nested pie chart was used to show the distribution of crimes by gender (outer ring) and to break down each gender category by crime type (inner ring). To improve clarity, all assault-related categories were combined, and only the three most common crime groups are shown explicitly; everything else is grouped as "Other".
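One thing worth checking before building the rings: `pd.Categorical` turns any value outside `['F', 'M', 'U']` into a missing value, and `groupby` then drops those rows from the outer ring while they still count toward the inner ring, so the two rings would no longer line up. Whether the real `Gender` column contains other codes is not known here. A minimal sketch, on made-up values, of folding anything unexpected into `'U'` first:

```python
import pandas as pd

# Hypothetical raw values; 'Male' and 'X' are invented for illustration
gender = pd.Series(['F', 'M', None, 'Unknown', 'Male', 'X'])

valid = ['F', 'M', 'U']

# Keep valid codes; anything missing or unexpected becomes 'U'
gender_clean = gender.where(gender.isin(valid), 'U')

print(gender_clean.tolist())
```

If spelled-out values such as 'Male' did turn out to exist, mapping them to 'M' before this step would be a better choice than treating them as unknown.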

#clean data
df['Gender'] = df['Gender'].fillna('U')
df['Gender'] = df['Gender'].replace('Unknown', 'U')

#combine any crime containing "assault" into one category
df['Crime Group'] = df['Crime Group'].fillna('').astype(str)
df['Crime Group'] = df['Crime Group'].apply(
    lambda x: 'Assault' if 'assault' in x.lower() else x.title()
)

#keep only top crime groups to reduce clutter
top_crimes = df['Crime Group'].value_counts().nlargest(3).index
df['Crime Group Clean'] = df['Crime Group'].apply(
    lambda x: x if x in top_crimes else 'Other'
)

#build dataframe
pie_df = (
    df.groupby(['Gender', 'Crime Group Clean'])
      .size()
      .reset_index(name='Count')
)

#labels for inner slices (assault subtypes were already merged into 'Assault' above)
pie_df['Crime Label'] = pie_df['Crime Group Clean']

#force gender order
gender_order = ['F', 'M', 'U']
pie_df['Gender'] = pd.Categorical(
    pie_df['Gender'],
    categories=gender_order,
    ordered=True
)

#sort so inner slices line up properly within each outer slice
pie_df.sort_values(by=['Gender', 'Crime Group Clean'], inplace=True)
pie_df.reset_index(drop=True, inplace=True)

#create counts and total
outer_counts = pie_df.groupby('Gender')['Count'].sum()
inner_counts = pie_df['Count']
inner_labels = pie_df['Crime Label']
all_crimes = pie_df['Count'].sum()

#add colors
outer_color_map = {
    'F': 'red',
    'M': 'blue',
    'U': '#66aa66'
}
outer_colors = [outer_color_map[g] for g in outer_counts.index]

inner_colors = []
for gender in pie_df['Gender']:
    if gender == 'F':
        inner_colors.append('#ff9999')   # light red
    elif gender == 'M':
        inner_colors.append('#99ccff')   # light blue
    else:
        inner_colors.append('#66ff66')   # light green

#create plot
fig = plt.figure(figsize=(14, 14))
ax = fig.add_subplot(1, 1, 1)

#outside pie = Gender
outer_counts.plot(
    kind='pie',
    radius=1,
    colors=outer_colors,
    labels=outer_counts.index,
    labeldistance=1.12,
    pctdistance=0.86,
    wedgeprops=dict(edgecolor='w', linewidth=1),
    textprops=dict(fontsize=12),
    autopct=lambda p: '{:.1f}%'.format(p),
    startangle=90,
    ax=ax
)

#inside pie = Crime group within gender
ax.pie(
    inner_counts,
    radius=0.62,
    colors=inner_colors,
    labels=inner_labels,
    labeldistance=0.66,
    wedgeprops=dict(width=0.28, edgecolor='w', linewidth=1),
    textprops=dict(fontsize=12),
    pctdistance=0.86,
    autopct=lambda p: '{:.1f}%'.format(p),
    startangle=90
)

#hole in middle
hole = plt.Circle((0, 0), 0.35, fc='white')
ax.add_artist(hole)

#clean up
ax.yaxis.set_visible(False)
ax.axis('equal')
plt.title('Crime Distribution by Gender and Crime Group', fontsize=16, pad=10)

#center text
ax.text(
    0, 0,
    'Total Crimes\n{:,.0f}'.format(all_crimes),
    ha='center',
    va='center',
    size=13
)

plt.tight_layout()
plt.show()

The outer ring shows how incidents are distributed across the recorded gender categories: male and female account for the majority, and a smaller portion is classified as unknown. Female is the largest of the three categories. The inner ring shows that larceny and assault make up a large share of crimes within each gender group, while other categories contribute smaller portions. The similar composition across genders suggests that the mix of crime types does not vary dramatically by gender, even if total counts differ.
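The claim that the crime mix is similar across genders can be read more directly from a row-normalized cross-tabulation than from pie slices. The following is a minimal sketch on invented rows (the `toy` frame is an assumption, not data from the report), using `pd.crosstab` with `normalize='index'`.

```python
import pandas as pd

# Hypothetical rows (not real incidents)
toy = pd.DataFrame({
    'Gender': ['F', 'F', 'F', 'F', 'M', 'M', 'M', 'M'],
    'Crime Group Clean': ['Larceny', 'Larceny', 'Assault', 'Other',
                          'Larceny', 'Larceny', 'Assault', 'Assault'],
})

# Each row shows how one gender's incidents split across crime groups
mix = pd.crosstab(toy['Gender'], toy['Crime Group Clean'], normalize='index')

print(mix.round(2))
```

Applied to the notebook's `df`, rows that look alike would support the similarity claim, while large gaps in any column would point to a crime type that differs by gender.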

Conclusion

Overall, the visualizations show that crime in Baltimore during 2025 is relatively consistent across time but concentrated geographically and dominated by a few key categories. Larceny is the most frequent crime type in every analysis, while assault and burglary also contribute significantly. Temporal patterns show only moderate variation by weekday and month, suggesting that crime is a steady phenomenon rather than one driven by extreme seasonal or weekly spikes. Spatial analysis highlights clustering in specific areas, indicating the importance of location-based factors. Together, these findings suggest that crime patterns are shaped more by environment and activity levels than by time alone, with a similar mix of crime types across gender categories.