Introduction

Data visualization is essential for analyzing crime trends and informing decision-making. This report explores Boston’s 2020 crime data using a scatter plot, a bar chart, a line graph, a heat map, and a donut chart to show crime frequency, distribution, and trends. Together, these visualizations surface patterns and anomalies that offer useful insight.

Dataset

The Boston Crime 2020 dataset contains 70,894 records of reported crime incidents in Boston throughout the year. It includes various details such as a unique incident number, offense code, and offense description. Location data is also provided, including district, street name, latitude, and longitude, though some values are missing. Temporal information such as the date, time, day of the week, month, and year of occurrence allows for time-based analysis of crime trends. Additionally, the dataset identifies whether an incident involved a shooting and includes reporting area codes. However, some columns, such as OFFENSE_CODE_GROUP and UCR_PART, lack data. Despite these gaps, the dataset offers valuable insights into crime patterns across Boston, making it useful for visualization and analysis.
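
The gaps described above can be measured before plotting. The following is a minimal sketch, assuming a small hypothetical sample in place of the real file; the `sample` frame and its values are illustrative only, while the column names come from the dataset.

```python
import pandas as pd

# Hypothetical miniature of the dataset; the values are illustrative, not real records.
sample = pd.DataFrame({
    "INCIDENT_NUMBER": ["A1", "A2", "A3", "A4"],
    "OFFENSE_CODE_GROUP": [None, None, None, None],  # a column with no data
    "DISTRICT": ["B2", "D4", None, "C11"],
    "Lat": [42.31, None, 42.34, 42.36],
})

# Fraction of missing values per column, largest first.
missing_share = sample.isna().mean().sort_values(ascending=False)
print(missing_share)

# Columns with no data at all are candidates to exclude from charts.
empty_columns = missing_share[missing_share == 1.0].index.tolist()
print(empty_columns)
```

Running the same two steps on the full DataFrame would show which columns can support a chart and which cannot.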

Findings

The findings below cover several crime trends in Boston in 2020. The visualizations show crime frequency by hour of the day, by district, and by day of the week, as well as the most common types of crime in the city.

Visualization 1

This scatter plot shows crime frequency by hour of the day and reveals distinct trends over a 24-hour period. Incident counts are lowest in the early morning (around 3 AM to 6 AM) and increase gradually as the day progresses. Activity peaks during the late morning and afternoon, with the highest counts around noon and in the early evening. Counts decline after 9 PM, apart from an unusual spike at midnight. This pattern suggests that crime is more prevalent during daytime and evening hours, when more people are out and about, and dips in the late-night and early-morning hours, when the city is less active.

import os
# Point Qt at its platform plugins (specific to the author's environment).
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = '//apporto.com/dfs/LOYOLA/Users/cgfallon_loyola/desktop/'

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
import seaborn as sns
import plotly.graph_objects as go
import matplotlib.ticker as ticker

path = "U:\\"

df = pd.read_csv("Boston_Crime_2020.csv", low_memory = False)

df 
##       INCIDENT_NUMBER  ...                               Location
## 0           212012996  ...  (42.3169424347871, -71.0699121273113)
## 1           212008096  ...  (42.3524181472861, -71.0652549858121)
## 2           202000034  ...   (42.340069862647, -71.0527942008028)
## 3           202007210  ...   (42.3412875043904, -71.054679326494)
## 4           202000355  ...  (42.3618385665647, -71.0597648909416)
## ...               ...  ...                                    ...
## 70889       202095668  ...  (42.3166174973565, -71.0640782725732)
## 70890       202095647  ...  (42.2992630326008, -71.0762651335127)
## 70891       212000339  ...  (42.3618385665647, -71.0597648909416)
## 70892       212000004  ...                                    NaN
## 70893       212000003  ...  (42.2878296258842, -71.0711928171256)
## 
## [70894 rows x 17 columns]
df.columns
## Index(['INCIDENT_NUMBER', 'OFFENSE_CODE', 'OFFENSE_CODE_GROUP',
##        'OFFENSE_DESCRIPTION', 'DISTRICT', 'REPORTING_AREA', 'SHOOTING',
##        'OCCURRED_ON_DATE', 'YEAR', 'MONTH', 'DAY_OF_WEEK', 'HOUR', 'UCR_PART',
##        'STREET', 'Lat', 'Long', 'Location'],
##       dtype='object')
time_counts = df['HOUR'].value_counts().sort_index()

plt.figure(figsize=(10, 6))
sns.scatterplot(x=time_counts.index, y=time_counts.values, color='blue', edgecolor='black', s=100)
plt.xlabel("Hour of the Day", fontsize=12)
plt.ylabel("Number of Crimes", fontsize=12)
plt.title("Crime Frequency by Hour of the Day", fontsize=14)
plt.xticks(range(0, 24))
plt.grid(True, linestyle='--', alpha=0.6)
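
The busiest and quietest hours can also be read directly from the hourly counts rather than from the plot. This is a minimal sketch, assuming a short illustrative series in place of `df['HOUR']`; the names `hours`, `hour_counts`, `busiest_hour`, and `quietest_hours` are hypothetical. Reindexing to all 24 hours keeps hours with no incidents in the result as zeros.

```python
import pandas as pd

# Illustrative hour values standing in for df['HOUR']; not real data.
hours = pd.Series([0, 0, 0, 4, 12, 12, 12, 12, 17, 17, 17, 22])

# Count incidents per hour and make sure all 24 hours appear, even if empty.
hour_counts = hours.value_counts().reindex(range(24), fill_value=0)

# Hour with the most incidents.
busiest_hour = hour_counts.idxmax()

# All hours that share the minimum count.
quietest_hours = hour_counts[hour_counts == hour_counts.min()].index.tolist()

print(busiest_hour)
print(quietest_hours)
```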

Visualization 2

The bar chart shows the number of crimes reported in each Boston police district in 2020. District B2 has the highest count, followed closely by D4 and C11. These three districts have noticeably higher counts than the others, indicating that reported crime is concentrated in these areas. In contrast, district A15 and the category labeled “External” have the lowest counts. Because the bars are sorted by count, the chart declines from left to right, illustrating the disparity in crime distribution across districts. This visualization highlights potential high-crime areas that may require increased law enforcement or community interventions.

district_counts = df["DISTRICT"].value_counts().reset_index()
district_counts.columns = ["District", "Crime Count"]
plt.figure(figsize=(10, 6))
sns.barplot(data=district_counts, x="District", y="Crime Count", palette="viridis")
plt.xlabel("District")
plt.ylabel("Crime Frequency")
plt.title("Crime Frequency by District in Boston (2020)")
plt.xticks(rotation=45)
# Horizontal gridlines make bar heights easier to compare.
plt.grid(axis='y', linestyle='--', alpha=0.6)
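
To put a number on how concentrated crime is in the leading districts, the counts can be converted to percentage shares. This is a minimal sketch, assuming an illustrative series in place of `df["DISTRICT"]`; the values and the names `districts`, `district_share`, and `top_three_share` are hypothetical.

```python
import pandas as pd

# Illustrative district labels standing in for df["DISTRICT"]; not real data.
districts = pd.Series(["B2", "B2", "B2", "D4", "D4", "C11", "A15", None])

# normalize=True returns proportions; missing districts are excluded by default.
district_share = districts.value_counts(normalize=True).mul(100).round(1)

# Combined share of the three districts with the most incidents.
top_three_share = district_share.head(3).sum()

print(district_share)
print(top_three_share)
```

Shares make districts comparable in terms of their portion of all reported incidents, which raw counts alone do not show.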

Visualization 3

This line graph shows the hourly crime frequency for the top five crime types in Boston during 2020. The x-axis represents the hour of the day, from 0 to 23, and the y-axis shows the crime frequency. Each crime type is drawn as a differently colored line, as shown in the legend. “Investigate Person” incidents occur most frequently, with a sharp peak around midnight and another increase in the evening. “Investigate Property” and “M/V - Leaving Scene - Property Damage” follow similar patterns, with fewer occurrences in the early morning and a steady rise through the day that peaks in the late afternoon or early evening. “Sick Assist” incidents increase gradually from morning to evening and are most frequent in the late afternoon. “Vandalism” follows a comparable trend, with minimal activity in the early morning and a noticeable rise later in the day. Overall, crime frequency is lowest between 3 AM and 6 AM, increases gradually through the day, and peaks during the evening hours.


top_crimes = df["OFFENSE_DESCRIPTION"].value_counts().head(5).index

filtered_crimes = df[df["OFFENSE_DESCRIPTION"].isin(top_crimes)]

crime_trends_type = filtered_crimes.groupby(["HOUR", "OFFENSE_DESCRIPTION"]).size().reset_index(name="Crime Count")

plt.figure(figsize=(12, 6))
sns.lineplot(data=crime_trends_type, x="HOUR", y="Crime Count", hue="OFFENSE_DESCRIPTION", marker="o", palette="tab10")

plt.xlabel("Hour of the Day")
plt.ylabel("Crime Frequency")
plt.title("Crime Frequency by Hour for Top 5 Crime Types in Boston (2020)")
plt.legend(title="Crime Type")
plt.grid(True, linestyle="--", alpha=0.6)
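
The peak hour for each crime type can be extracted from the same grouped counts instead of being read off the lines. This is a minimal sketch, assuming a small illustrative frame in place of `filtered_crimes`; the records and the names `records`, `hourly`, and `peak_hour` are hypothetical.

```python
import pandas as pd

# Illustrative records standing in for filtered_crimes; not real data.
records = pd.DataFrame({
    "HOUR": [0, 0, 0, 17, 17, 9, 16, 16],
    "OFFENSE_DESCRIPTION": ["INVESTIGATE PERSON"] * 5 + ["VANDALISM"] * 3,
})

# One row per offense, one column per hour, cells holding incident counts.
hourly = (
    records.groupby(["OFFENSE_DESCRIPTION", "HOUR"])
    .size()
    .unstack(fill_value=0)
)

# For each offense, the hour (column label) with the highest count.
peak_hour = hourly.idxmax(axis=1)
print(peak_hour)
```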

Visualization 4

The heatmap shows crime frequency by hour of the day and day of the week, with color representing intensity: blue for lower counts and red for higher counts. A pronounced spike appears at midnight on every day, with Monday showing the highest peak at 792 incidents. Counts are lowest in the early morning (3 AM to 6 AM) and rise gradually through the day. Outside the midnight spike, the highest concentrations occur between noon and early evening (12 PM to 6 PM), with further noticeable peaks between 6 PM and 9 PM. Weekend patterns differ slightly, with more activity later at night than on weekdays. This pattern suggests that crime is more prevalent during active social hours and declines during typical sleep hours.

datetime_column = 'OCCURRED_ON_DATE'

comma_fmt = ticker.FuncFormatter(lambda x, pos: f'{x:,.0f}')

# Parse the timestamp column, then derive the hour and weekday (0 = Monday) from it.
df[datetime_column] = pd.to_datetime(df[datetime_column])
df['Hour'] = df[datetime_column].dt.hour
df['DayOfWeek'] = df[datetime_column].dt.dayofweek
# Count incidents in each (weekday, hour) cell; empty cells become 0.
heatmap_data = df.pivot_table(index='DayOfWeek', columns='Hour', aggfunc='size', fill_value=0)
plt.figure(figsize=(15, 5))
ax = sns.heatmap(heatmap_data, cmap='coolwarm', annot=True, fmt=',.0f', 
                 square = True, annot_kws = {'size': 9}, 
                 cbar_kws = {'format': comma_fmt, 'orientation':'vertical'})
plt.xlabel('Hour of the Day')
plt.ylabel('Day of the Week')
plt.title('Heatmap of Crime Frequency by Hour and Day of the Week')
plt.yticks(ticks=range(7), labels=['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'], rotation=0)

cbar = ax.collections[0].colorbar 
cbar.set_label('Crime Count', rotation = 270, fontsize = 12, color='black', labelpad=20)
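
A day-by-hour grid like the one behind the heatmap can also be built with named weekdays and a fixed 7 x 24 shape, so the axis labels never depend on which cells happen to contain data. This is a minimal sketch, assuming three illustrative timestamps in place of `OCCURRED_ON_DATE`; the names `timestamps`, `events`, `day_order`, and `grid` are hypothetical.

```python
import pandas as pd

# Illustrative timestamps standing in for OCCURRED_ON_DATE; not real data.
timestamps = pd.to_datetime(pd.Series([
    "2020-01-06 00:15:00",  # a Monday
    "2020-01-06 00:45:00",  # the same Monday
    "2020-01-11 23:30:00",  # a Saturday
]))

day_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

events = pd.DataFrame({
    "day": timestamps.dt.day_name(),
    "hour": timestamps.dt.hour,
})

# Count events per (day, hour), then force all 7 days and 24 hours to appear.
grid = (
    pd.crosstab(events["day"], events["hour"])
    .reindex(index=day_order, columns=range(24), fill_value=0)
)
print(grid.loc["Monday", 0], grid.shape)
```

A grid built this way can be passed to `sns.heatmap` without setting the tick labels by hand.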

Visualization 5

The donut chart shows the ten most frequent offense descriptions and their relative frequency. The percentages are shares of those ten offenses combined, not of all 70,894 incidents, because each slice is scaled to the total of the plotted values. The outer ring, intended to show the top five OFFENSE_CODE_GROUP categories, is empty because that column lacks data. “Investigate Person” is the most common offense at 16.1%, followed by “Sick Assist” at 13.3% and “M/V - Leaving Scene - Property Damage” at 11.3%. Other notable offenses include “Vandalism” (10.4%) and “Investigate Property” (10.1%). The least frequent of the ten are “Sick/Injured/Medical - Person” (7.7%) and “Larceny Shoplifting” (6.4%). The data suggests that a significant portion of incidents involve investigative actions or property-related issues, with medical and theft-related cases also contributing to crime reports. The distribution highlights the need for law enforcement and emergency response services to address a range of issues, from crime prevention to medical assistance and property protection.

category_column = 'OFFENSE_CODE_GROUP'

sub_category_column = 'OFFENSE_DESCRIPTION'

category_counts = df[category_column].value_counts().nlargest(5)

sub_category_counts = df[sub_category_column].value_counts().nlargest(10)

outer_labels = category_counts.index.tolist()
outer_sizes = category_counts.values.tolist()
inner_labels = sub_category_counts.index.tolist()
inner_sizes = sub_category_counts.values.tolist()
fig, ax = plt.subplots(figsize=(10, 6))
ax.pie(outer_sizes, labels=outer_labels, radius=1.2, wedgeprops=dict(width=0.3), autopct='%1.1f%%')
## ([], [], [])
ax.pie(inner_sizes, labels=inner_labels, radius=0.9, wedgeprops=dict(width=0.3), autopct='%1.1f%%')
plt.title("Donut Chart of Crime Category Distribution")
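
Percentages drawn by `autopct` are shares of the plotted slices only, so a chart of the top offenses reports larger percentages than the same offenses hold across all incidents. This is a minimal sketch of the difference, assuming an illustrative series in place of `df['OFFENSE_DESCRIPTION']`; the data and the names `offenses`, `share_within_chart`, and `share_of_all` are hypothetical.

```python
import pandas as pd

# Illustrative offense labels standing in for df['OFFENSE_DESCRIPTION']; not real data.
offenses = pd.Series(
    ["INVESTIGATE PERSON"] * 4
    + ["SICK ASSIST"] * 3
    + ["VANDALISM"] * 2
    + [f"RARE OFFENSE {i}" for i in range(11)]
)

counts = offenses.value_counts()
top = counts.nlargest(3)

# What a pie or donut chart of `top` would display: shares of the plotted total.
share_within_chart = top / top.sum() * 100

# Shares of every incident in the data, including offenses left off the chart.
share_of_all = top / counts.sum() * 100

print(share_within_chart.round(1))
print(share_of_all.round(1))
```

Reporting both figures, or labeling the chart as covering only the top offenses, avoids reading a within-chart share as a share of all cases.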