library(reticulate)
    use_python("C:/ProgramData/Anaconda3")

Analysis of Crime in Boston (2015-2021)

Introduction:

This page was created by Reggie Fay on March 30th, 2022. It presents graphs of Boston crime data and analyzes them to show where, when, and what types of crime occur in Boston. The overall goal is to identify when it is statistically safest and most dangerous to be in Boston, which areas are safest and most dangerous, and which types of crime are committed most and least frequently.

Dataset:

The dataset analyzed on this page covers 2015-2021. It contains 535,292 rows and 11 columns, with each column representing a specific crime or location statistic. Dataset variables: Hour, Offense Code Group, Offense Description, District, Occurred on Date, Year, Month, Day of Week, Latitude, Longitude, and Location.

Findings:

The graphs show that crime in Boston fluctuates with the hour of the day, the location, and the month. The analysis also identifies the top three crimes reported in Boston: a person in need of medical personnel, property damage, and vandalism. These findings could help Boston residents and visitors stay safe when planning when and where to go, where to live, and what to avoid.

Bar Chart of Crime by Hour:

This bar chart shows crime by hour in Boston, from the early morning through the late evening. Green bars mark hours when crime is below average, pink bars mark hours when it is above average, and a bar within 1% of the average is drawn in black. The dashed black line across the bars marks the average number of crimes reported per hour of the day, 23,478. Most crime occurs at 5 PM, with approximately 35,000 crimes reported, which could partially be due to this being the time of day when people are getting out of work. The least crime occurs between 4 AM and 5 AM, with approximately 6,000 crimes recorded, which could partially be due to most people being at home asleep. Crime gradually increases between 4 AM and 5 PM, with some slight dips. It then gradually decreases between 5 PM and 11 PM before spiking at midnight to approximately 30,000 crimes reported.

import warnings

import folium
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import FuncFormatter

# Suppress library warnings so they do not clutter the rendered page
warnings.filterwarnings("ignore")

dst = "U:/boston.csv"
df = pd.read_csv(dst, usecols = ['HOUR', 'OFFENSE_CODE_GROUP', 'OFFENSE_DESCRIPTION', 'DISTRICT','OCCURRED_ON_DATE','YEAR','MONTH','DAY_OF_WEEK', 'Lat', 'Long', 'Location'])

x = df.groupby(['HOUR']).agg({'HOUR':['count']}).reset_index()
x.columns = ['Hour','CrimeCount']

def pick_colors_according_to_mean_count(this_data):
    colors=[]
    avg = this_data.CrimeCount.mean()
    for each in this_data.CrimeCount:
        if each > avg*1.01:
            colors.append('lightcoral')
        elif each < avg*0.99:
            colors.append('green')
        else:
            colors.append('black')
    return colors
    
my_colors = pick_colors_according_to_mean_count(x)

Above = mpatches.Patch(color='lightcoral', label='Above Average')
At = mpatches.Patch(color='black', label='Within 1% of the Average')
Below = mpatches.Patch(color='green', label='Below Average')

fig=plt.figure(figsize=(10,5));
ax = fig.add_subplot(1,1,1);
ax.bar(x.Hour, x.CrimeCount, label='Count', color=my_colors);
ax.legend(handles=[Above, At, Below],fontsize = 10);
plt.axhline(x.CrimeCount.mean(), color='black', linestyle='dashed');
ax.set_title('Crime by Hour', size=30);
plt.xlabel('Hour of Day', fontsize=14);
plt.ylabel('Crime Count', fontsize=14);
ax.text(22, x.CrimeCount.mean()+500, 'Mean ='+str("{:,.0f}".format(x.CrimeCount.mean())));

ax.get_yaxis().set_major_formatter(FuncFormatter(lambda x, p: format(int(x), ',')));

my_x_labels = [*range(0,24,1)];
plt.xticks(my_x_labels, fontsize=14, color='black');

plt.show()
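
The color rule used for the bars (above, within 1% of, or below the mean) can also be written without an explicit loop. The sketch below is one possible vectorized version using `np.select`. The function name `classify_against_mean`, the `tolerance` parameter, and the toy counts are assumptions made for illustration and are not part of the analysis above.

```python
import numpy as np
import pandas as pd


def classify_against_mean(counts, tolerance=0.01):
    """Label each count as 'above', 'within', or 'below' a band around the mean."""
    counts = pd.Series(counts, dtype=float)
    avg = counts.mean()
    conditions = [
        counts > avg * (1 + tolerance),  # clearly above the mean
        counts < avg * (1 - tolerance),  # clearly below the mean
    ]
    labels = np.select(conditions, ['above', 'below'], default='within')
    return pd.Series(labels, index=counts.index)


# Hypothetical hourly counts, not taken from the Boston dataset
toy_counts = pd.Series([60, 100, 140, 100], index=[4, 11, 17, 20])
toy_labels = classify_against_mean(toy_counts)

# Same color names as the bar chart above
color_for_label = {'above': 'lightcoral', 'within': 'black', 'below': 'green'}
toy_colors = toy_labels.map(color_for_label).tolist()
```

The resulting list could be passed as the `color` argument of `ax.bar`, in the same way `my_colors` is used above.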

Scatterplot of Crime by Month and Year:

This scatter plot shows crime count by month and by year. The color bar on the right side of the graph encodes the amount of crime, ranging from dark purple for the lowest amount to yellow for the highest. Each octagon represents one month of one of the seven years, and its size grows with the amount of crime in that month. It is apparent that more crime occurs in the warmer months, such as June, July, August, and September, which could be because people are out in the city more frequently when the weather is nice. Less crime occurs in the colder months, such as January and February, which could be due to people staying in their homes and not being out in the city as often. The blank spaces are present because either not enough data was collected for those months or far less crime occurred in them than in the other months.


df = df[df['MONTH'] !=0]

x = df.groupby(['MONTH', 'YEAR'])['YEAR'].count().reset_index(name='count')
x = pd.DataFrame(x)

x['count_hundreds'] = round(x['count']/100, 0)

plt.figure(figsize=(10,5));

plt.scatter(x['MONTH'], x['YEAR'], marker='8', cmap='viridis', 
            c=x['count_hundreds'], s=7*x['count_hundreds'],edgecolors='black');

plt.title('Crime Count by Month and by Year', fontsize = 18);
plt.xlabel('Months of the Year', fontsize=14);
plt.ylabel('Year', fontsize = 14);

cbar = plt.colorbar();
# The plotted values are counts divided by 100, so the unit is hundreds
cbar.set_label('Amount of Crime (Hundreds)', rotation=270, fontsize=14, color='black', labelpad=30);

#my_colorbar_tick_labels = [*range(10000, int(x['count'].max()), 10000)];
#my_colorbar_tick_labels = ['{:,}'.format(each)for each in my_colorbar_tick_labels];
#cbar.set_ticklabels(my_colorbar_tick_labels);

#my_x_ticks = [*range(x['MONTH'].min(), x['MONTH'].max()+1, 1)]
#plt.xticks(my_x_ticks, fontsize=14, color='black');
 
#my_y_ticks = [*range(x['YEAR'].min(), x['YEAR'].max()+1, 1)];
#plt.yticks(my_y_ticks, fontsize=14, color='black');
 
plt.show()
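
To tell whether a blank space in the scatter plot means that no records exist for that month, the month and year combinations with zero rows can be listed directly. The sketch below is one way to do that, assuming the `YEAR` and `MONTH` column names used above. The helper name `missing_month_year_pairs` and the toy frame are hypothetical.

```python
import pandas as pd


def missing_month_year_pairs(frame, year_col='YEAR', month_col='MONTH'):
    """Return (year, month) pairs in the covered year range that have no rows."""
    counts = frame.groupby([year_col, month_col]).size()
    years = range(int(frame[year_col].min()), int(frame[year_col].max()) + 1)
    # Every year/month combination that could appear in the plot
    full_index = pd.MultiIndex.from_product(
        [years, range(1, 13)], names=[year_col, month_col])
    counts = counts.reindex(full_index, fill_value=0)
    return [(int(y), int(m)) for (y, m), n in counts.items() if n == 0]


# Hypothetical records, not taken from the Boston dataset
toy_records = pd.DataFrame({'YEAR': [2015, 2015, 2016], 'MONTH': [6, 7, 1]})
toy_missing = missing_month_year_pairs(toy_records)
```

A pair returned by this helper has no rows at all, while a pair that is absent from the result but drawn very small in the plot simply has a low count.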

Stacked Bar Chart of Total Crimes Recorded by Hour and by Day:

This stacked bar chart shows the total crimes recorded by hour and by day of the week. The legend in the top right corner color-codes the days of the week, from pink for Monday at the top down to blue for Sunday. The bar heights match the first visualization, which was also a bar chart by hour. However, we can now see that crime is, for the most part, evenly distributed across the days of the week: as the hourly total rises, the count for each of the seven days rises with it, not just for one or two days. Friday and Saturday are slightly more prominent than other days at some points of the day, which could be because people go out in the city during the afternoon and evening more than they would in the middle of the workweek.


x = df.groupby(['DAY_OF_WEEK', 'HOUR'])['HOUR'].count().reset_index(name='count')
x = pd.DataFrame(x)

stacked_df = x.groupby(['HOUR', 'DAY_OF_WEEK'])['count'].sum().reset_index(name='TotalCrime')

stacked_df.columns = ['Hour','WeekDay','TotalCrime']
stacked_df = stacked_df.pivot(index = 'Hour', columns='WeekDay', values='TotalCrime')

day_order = ['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']
# Reverse the day order so that Monday ends up at the top of each stacked bar
stacked_df = stacked_df.reindex(columns=day_order[::-1])

fig = plt.figure(figsize = (12,7));
ax = fig.add_subplot(1,1,1);

stacked_df.plot(kind='bar', stacked=True, ax=ax)

plt.ylabel('Total Crimes Recorded', fontsize=14);
plt.title('Total Crimes Recorded by Hour and by Day \n Stacked Bar Plot', fontsize=18);
plt.xticks(rotation=0, horizontalalignment = 'center', fontsize= 14);
plt.yticks(fontsize=14);
ax.set_xlabel('Hour (24 Hour Interval)', fontsize = 14);
ax.get_yaxis().set_major_formatter(FuncFormatter(lambda x, p: format(int(x), ',')));

# Reverse the legend entries so they list Monday first, matching the top-to-bottom order of the stacks
handles, labels = ax.get_legend_handles_labels();
plt.legend(handles[::-1], labels[::-1], loc = 'best', fontsize = 7);

plt.show()
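
Whether crime is evenly distributed across the days of the week is easier to judge from shares than from stacked counts. The sketch below normalizes each hour's counts so the seven days sum to 1, which would put a perfectly even split at about 0.143 per day. It assumes the `HOUR` and `DAY_OF_WEEK` column names used above. The helper name `day_share_by_hour` and the toy data are hypothetical.

```python
import pandas as pd


def day_share_by_hour(frame, hour_col='HOUR', day_col='DAY_OF_WEEK'):
    """Return, for each hour, the fraction of records that fall on each day."""
    counts = pd.crosstab(frame[hour_col], frame[day_col])
    # Divide every row by its own total so each hour sums to 1
    return counts.div(counts.sum(axis=1), axis=0)


# Hypothetical records, not taken from the Boston dataset
toy_events = pd.DataFrame({
    'HOUR': [0, 0, 0, 0, 17, 17],
    'DAY_OF_WEEK': ['Friday', 'Friday', 'Friday', 'Monday', 'Friday', 'Monday'],
})
toy_day_shares = day_share_by_hour(toy_events)
```

In the toy data, Friday accounts for three quarters of the records at hour 0 and half of them at hour 17.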

Nested Pie Chart of Total Crime by Quarter and Month:

This nested pie chart shows the total crime by quarter and month. The outer ring represents the four quarters of the year and shows the percentage and number of crimes in each. The inner ring represents the months of the year and the percentage of crime that occurs in each. The center of the chart shows the total number of crimes, which is 563,477. The most crime occurs in the third quarter of the year, with 29.86% of the total, and the least occurs in the first quarter, with 23.74% of the total. This could be due to more people being outside in the third quarter, when the weather is warmest in Boston, and staying indoors in the first quarter, when the weather is much colder.


df.OCCURRED_ON_DATE = pd.to_datetime(df['OCCURRED_ON_DATE'], format='%Y-%m-%d %H:%M:%S')
df['Quarter'] = df.OCCURRED_ON_DATE.dt.quarter

pie_df = df.groupby(['Quarter', 'MONTH'])['Quarter'].count().reset_index(name='TotalCrime')

number_outside_colors = len(pie_df.Quarter.unique())
outside_color_ref_number = np.arange(number_outside_colors)*4

number_inside_colors = len(pie_df.MONTH.unique())
all_color_ref_number = np.arange(number_outside_colors + number_inside_colors)

inside_color_ref_number = []
for each in all_color_ref_number:
    if each not in outside_color_ref_number:
        inside_color_ref_number.append(each)
        
fig = plt.figure(figsize=(12,7));
ax = fig.add_subplot(1,1,1);

colormap = plt.get_cmap("tab20c")
outer_colors = colormap(outside_color_ref_number)

all_crimes = pie_df.TotalCrime.sum()

pie_df.groupby(['Quarter'])['TotalCrime'].sum().plot(
        kind='pie', radius=1, colors = outer_colors, pctdistance = 0.84, labeldistance = 1.1,
        wedgeprops = dict(edgecolor='white'), textprops={'fontsize':13},
        autopct = lambda p: '{:,.2f}%\n({:,.0f})'.format(p,(p/100)*all_crimes),
        startangle=90);

inner_colors = colormap(inside_color_ref_number)
pie_df.TotalCrime.plot(
        kind='pie', radius=0.7, colors = inner_colors, pctdistance = 0.55, labeldistance = 0.8,
        wedgeprops = dict(edgecolor='white'), textprops={'fontsize':10},
        labels = pie_df.MONTH, 
        autopct = '%1.2f%%',
        startangle=90);

hole = plt.Circle((0,0), 0.3, fc='white');
fig1 = plt.gcf();
fig1.gca().add_artist(hole);

ax.yaxis.set_visible(False);
plt.title('Total Crime by Quarter and Month', fontsize=18);

ax.text(0,0, 'Total Crimes\n' + '{:,}'.format(all_crimes), size= 13, ha='center', va='center');

ax.axis('equal');
plt.tight_layout();

plt.show()
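
The quarter percentages in the outer ring can be checked by hand from monthly totals. The sketch below maps each month number to its calendar quarter with integer arithmetic, rather than the `dt.quarter` accessor used above, and converts the quarterly sums to percentages. The helper name `quarter_shares` and the toy monthly totals are assumptions for illustration only.

```python
import pandas as pd


def quarter_shares(month_counts):
    """Return each calendar quarter's share of the total, in percent.

    month_counts is a Series indexed by month number (1 to 12).
    """
    month_counts = pd.Series(month_counts)
    # Months 1-3 map to quarter 1, 4-6 to quarter 2, and so on
    quarters = (month_counts.index.to_numpy() - 1) // 3 + 1
    by_quarter = month_counts.groupby(quarters).sum()
    return 100 * by_quarter / by_quarter.sum()


# Hypothetical monthly totals, not taken from the Boston dataset
toy_month_counts = pd.Series([10] * 3 + [20] * 3 + [30] * 3 + [40] * 3,
                             index=range(1, 13))
toy_quarter_shares = quarter_shares(toy_month_counts)
```

With these toy totals the four quarters hold 10%, 20%, 30%, and 40% of the records.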

Map of Boston Reporting Locations of Crime (Top 3):

This visualization is a map of Boston with colored points marking the top three crimes that occur: a person in need of medical personnel in green, property damage in red, and vandalism in blue. The map has many data points because it shows these three crimes over seven years. Much of the crime occurs near the coast and in the more southern parts of Boston. Additionally, each of the three crimes tends to cluster in specific areas, vandalism especially. This could be due to some areas of Boston being less supervised and more populated than others.


import folium;

map_df = df.groupby(['OFFENSE_DESCRIPTION', 'Location', 'Lat', 'Long', 'OCCURRED_ON_DATE', 'DISTRICT'])['Location'].count().reset_index(name='TotalCrime');

center_of_map = [42.334453, -71.089938];

my_map = folium.Map(location = center_of_map,
                    zoom_start = 11,
                    tiles = 'cartodbpositron',
                    width = '90%',
                    height = '100%',
                    left = '5%',
                    right = '5%',
                    top = '0%');

tiles = ['cartodbpositron','openstreetmap','stamenterrain','stamentoner'];
for tile in tiles:
    folium.TileLayer(tile).add_to(my_map);
    
folium.LayerControl().add_to(my_map);

# Marker color for each of the three offense descriptions shown on the map
offense_colors = {'SICK/INJURED/MEDICAL - PERSON': 'green',
                  'M/V - LEAVING SCENE - PROPERTY DAMAGE': 'red',
                  'VANDALISM': 'blue'}

for i in range(0, len(map_df)):
    color = offense_colors.get(map_df.loc[i, 'OFFENSE_DESCRIPTION'])
    if color is None:
        # Skip every offense that is not one of the three being mapped
        continue

    try:
        folium.Circle(location = [map_df.loc[i, 'Lat'], map_df.loc[i, 'Long']],
                      tooltip = map_df.loc[i, 'OFFENSE_DESCRIPTION'],
                      popup = 'Date: {}: \n District: {}'.format(map_df.loc[i, 'OCCURRED_ON_DATE'], map_df.loc[i, 'DISTRICT']),
                      radius = 10,
                      color = color,
                      fill = True,
                      fill_color = color,
                      fill_opacity = 0.5).add_to(my_map);
    except Exception:
        # Skip any row that folium cannot draw, for example one with invalid coordinates
        pass

my_map
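
The map code names the three offense descriptions explicitly. As a cross-check, the most frequent descriptions could be derived from the data rather than typed in. The sketch below shows one way to do that with `value_counts`, assuming the `OFFENSE_DESCRIPTION` column used above. The helper name `top_offenses`, the `'OTHER'` label, and the toy records are hypothetical.

```python
import pandas as pd


def top_offenses(frame, n=3, column='OFFENSE_DESCRIPTION'):
    """Return the n most frequent offense descriptions, most frequent first."""
    return frame[column].value_counts().head(n).index.tolist()


# Hypothetical records, not taken from the Boston dataset
toy_offenses = pd.DataFrame({
    'OFFENSE_DESCRIPTION': (
        ['SICK/INJURED/MEDICAL - PERSON'] * 4
        + ['VANDALISM'] * 3
        + ['M/V - LEAVING SCENE - PROPERTY DAMAGE'] * 2
        + ['OTHER']
    )
})
toy_top = top_offenses(toy_offenses)

# Pair each top offense with a marker color, in rank order
toy_color_for_offense = dict(zip(toy_top, ['green', 'red', 'blue']))
```

Because the colors are assigned by rank here, they will not necessarily match the fixed color of each offense on the map above.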

Conclusion:

Overall, the key takeaway from the analysis of these charts is that no single statistic describes crime in Boston as a whole. Instead, the picture comes from combining the type of crime, location, month of the year, day of the week, and hour of the day, along with factors beyond this analysis. This analysis offers some parts of the answer to the question of which combination of statistics gives the best understanding of Boston crime.

Crime is at its lowest between 4 AM and 5 AM and at its highest around 5 PM, which could be due to people sleeping in the early morning hours and leaving work in the late afternoon. The average number of crimes recorded per hour of the day in Boston is 23,478, similar to the number recorded from the late morning into the early afternoon. More crime occurs in the warmer months of the year, specifically quarter 3 with 29.86%, which could be due to more people being out in Boston in the warmer weather. The least crime happens in the colder months of the year, specifically quarter 1 with 21.96%, when people tend not to be out and about in the city as much. Crime is slightly more frequent on Fridays and Saturdays than on other days, which could be because people are out in the city more and tourists come to visit on the weekends. A large portion of crime occurs on the coast of Boston and in its southern parts, which could be due to those parts being more populated than others.