The dataset used for the following series of visualizations was downloaded from https://opendata.cityofnewyork.us/data/. It contains a range of variables describing shootings (fatal and non-fatal) in New York City's five boroughs (Bronx, Brooklyn, Manhattan, Queens, Staten Island). The visualizations draw on both original and newly created columns: INCIDENT_KEY, BORO, PRECINCT, STATISTICAL_MURDER_FLAG, Longitude, Latitude, year, month_str, and DayNameAbbrev. For a real-world dataset about violent incidents like these, location and time are the natural dimensions to focus on.
The following visualizations seek to answer: when and where do shootings in NYC occur most often?
Under the hypothesis that location and time strongly affect shooting frequency, these visualizations aim to paint an image of when and where shootings occur, in the hope that preventative measures can be strengthened in the most vulnerable areas. This is a preliminary analysis intended to give better insight into NYC crime.
# Imports for various charts and functions
import pandas as pd
import numpy as np
import plotly
import matplotlib.pyplot as plt
import warnings
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.express as px
import seaborn as sns
from matplotlib.ticker import FuncFormatter
import matplotlib.style as style
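# The 'seaborn' style was renamed 'seaborn-v0_8' in Matplotlib 3.6; fall back to the old name on earlier versions
style.use('seaborn-v0_8' if 'seaborn-v0_8' in style.available else 'seaborn')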
import os
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = 'c:/ProgramData/Anaconda3/Library/plugins/platforms'
#Read in file/data
shootings = pd.read_csv("U:/NYPD_shooting_Data.csv")
shootings['date'] = pd.to_datetime(shootings['OCCUR_DATE'])
shootings['year'] = shootings['date'].dt.year
shootings['month'] = shootings['date'].dt.month
shootings['month_str'] = shootings['date'].dt.month_name()
shootings['day'] = shootings['date'].dt.day
shootings['DayOfTheWeek'] = shootings['date'].dt.dayofweek
shootings['DayName'] = shootings['date'].dt.strftime('%A')
shootings['DayNameAbbrev'] = shootings['date'].dt.strftime('%a')
shootings['hour'] = shootings['OCCUR_TIME'].apply(lambda date : int(date.split(':')[0]))
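# Total shootings per (year, hour)
linegraph = shootings.groupby(['year', 'hour']).agg('count')['INCIDENT_KEY'].to_frame(name = "count").reset_index()
# Fatal shootings per (year, hour): sum only the murder flag so text and date columns are not aggregated
linegraph['fatal shootings'] = shootings.groupby(['year', 'hour'])['STATISTICAL_MURDER_FLAG'].sum().reset_index(drop = True)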
stacked = shootings.groupby(['year', 'BORO']).agg('count')['INCIDENT_KEY'].to_frame(name = "count").reset_index()
stacked = stacked.pivot(index = 'year', columns = 'BORO', values = 'count')
import matplotlib
Heatmap = shootings.groupby(['DayNameAbbrev', 'month_str']).agg('count')['INCIDENT_KEY'].to_frame(name = "count").reset_index()
Heatmap = Heatmap[ Heatmap['DayNameAbbrev'].notna() & Heatmap['month_str'].notna()]
Heatmap1 = pd.pivot_table(Heatmap, index = 'DayNameAbbrev', columns = 'month_str', values = 'count')
# All twelve months in calendar order
column_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
Heatmap1 = Heatmap1[column_order]
row_order = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
Heatmap1 = Heatmap1.reindex(row_order)
chart2 = shootings.groupby(['PRECINCT']).agg('count')['INCIDENT_KEY'].to_frame(name = "Count")
chart3 = chart2.sort_values(by=['Count'], ascending=False).reset_index()
chart3['PRECINCT'] = chart3['PRECINCT'].apply(str)
def pick_colors_according_to_mean_count(this_data):
    # Color each bar by how its count compares with a +/-5% band around the mean
    colors = []
    avg = this_data.Count.mean()
    for each in this_data.Count:
        if each > avg*1.05:
            colors.append('indigo')
        elif each < avg*0.95:
            colors.append('mediumseagreen')
        else:
            colors.append('mediumslateblue')
    return colors
import matplotlib.patches as mpatches
# chart3 is sorted by count and indexed from 0, so rows 0..top-1 are the top `top` precincts
bottom1 = 0
top1 = 121
d1 = chart3.iloc[bottom1:top1]
my_colors1 = pick_colors_according_to_mean_count(d1)
bottom2 = 0
top2 = 15
d2 = chart3.iloc[bottom2:top2]
my_colors2 = pick_colors_according_to_mean_count(d2)
# Legend patches; colors match pick_colors_according_to_mean_count
Above = mpatches.Patch(color='indigo', label='Above Average')
At = mpatches.Patch(color='mediumslateblue', label='Within 5% of Average')
Below = mpatches.Patch(color='mediumseagreen', label='Below Average')

The first graph illustrates the count of fatal shootings in NYC by hour for the years 2006-2019. The main purpose is to identify trends by time of day. Plotting all 14 years separately shows whether those trends are well defined and recurring.
As seen below, the count drops off from 12:00am to 5:00am and picks back up at approximately 4:00pm. The maximum is reached around midnight, signalling that most shootings occur in the middle of the night. Across the years the trend remains nearly identical despite slight differences in the overall frequency of shootings.
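The peak hour can also be checked numerically rather than by eye. The following is a minimal sketch on a tiny made-up frame (the `toy` data and its values are illustrative, not drawn from the NYPD file), assuming the time column holds text whose hour precedes the first colon:

```python
import pandas as pd

# Toy stand-in for the shootings table; values are made up for illustration.
toy = pd.DataFrame({
    "OCCUR_TIME": ["23:15:00", "23:40:00", "00:05:00", "14:30:00", "23:59:00"],
    "STATISTICAL_MURDER_FLAG": [True, False, False, True, True],
})

# Extract the hour as the text before the first colon.
toy["hour"] = toy["OCCUR_TIME"].str.split(":").str[0].astype(int)

# Count all incidents and fatal incidents per hour.
by_hour = toy.groupby("hour").agg(
    count=("STATISTICAL_MURDER_FLAG", "size"),
    fatal=("STATISTICAL_MURDER_FLAG", "sum"),
)

peak_hour = by_hour["count"].idxmax()  # hour with the most incidents
print(by_hour)
print("Peak hour:", peak_hour)
```

Applied to the full table, the same grouping would give one row per hour, and `idxmax` would name the hour that the line graph shows as its highest point.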
## Line Graph: Shootings by Year/Hour
fig = plt.figure(figsize = (18,10))
ax = fig.add_subplot(1,1,1)
my_colors = {2006:'tomato',
2007:'greenyellow',
2008:'purple',
2009:'gold',
2010:'cornflowerblue',
2011:'navy',
2012:'violet',
2013:'darkorchid',
2014:'black',
2015:'cyan',
2016:'seagreen',
2017:'maroon',
2018:'blue',
2019:'navajowhite'}
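# Group by the scalar column name so each key is a plain year that can index my_colors
for key, grp in linegraph.groupby('year'):
    # Plot the fatal-shooting totals so the data matches the title and axis label
    grp.plot(ax=ax, kind='line', x='hour', y='fatal shootings', color=my_colors[key], label=key)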
plt.title('Fatal Shootings by Hour/Year', fontsize=25, color = 'Black')
ax.set_xlabel('Hour (24 Hour Interval)', fontsize=23, color = 'Black')
ax.set_ylabel('Fatal shootings', fontsize=23, color = 'Black')
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.legend(fontsize=14)
ax.set_xticks(np.arange(24))
plt.show()

A stacked bar chart shows how shooting totals change over the years. Each bar is divided by borough to give a glimpse of where the most shootings occur. Although boroughs are not very specific areas, they indicate where to look in more depth.
Over the course of 2006-2019 there is a general decrease in total shootings: with slight exceptions, each year sets a new low. The borough proportions are fairly similar from year to year, meaning that the general locations of shootings are consistent over time, with most shootings occurring in Brooklyn by a strong margin.
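Whether the borough proportions really stay similar is easier to judge from shares than from raw bar heights. A minimal sketch, using an invented year-by-borough table (`toy_stacked` and its numbers are hypothetical) shaped like the pivot used for the chart:

```python
import pandas as pd

# Toy year-by-borough counts; the numbers are invented for illustration.
toy_stacked = pd.DataFrame(
    {"BRONX": [30, 20], "BROOKLYN": [50, 40], "QUEENS": [20, 20]},
    index=pd.Index([2006, 2007], name="year"),
)

# Divide each row by its row total so every year sums to 1.
shares = toy_stacked.div(toy_stacked.sum(axis=1), axis=0)

# Borough with the largest share in each year.
largest = shares.idxmax(axis=1)

print(shares.round(2))
print(largest)
```

If the shares in each column barely move from row to row, the split across boroughs is stable even when the yearly totals fall.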
## Stacked Bar: Shootings by Year/Borough
fig = plt.figure(figsize = (18,10))
ax = fig.add_subplot(1, 1, 1)
stacked.plot(kind = 'bar', stacked = True, ax=ax, cmap = 'viridis')
plt.ylabel('Total Shootings', fontsize = 23, labelpad = 15, color = 'Black')
plt.xlabel('Year', fontsize = 18, labelpad = 15, color = 'Black')
plt.title('Total Shootings by Year and Borough', fontsize = 25, color = 'Black')
plt.xticks(rotation = 0, horizontalalignment = 'center', fontsize = 16)
plt.yticks(fontsize = 14)
plt.legend(fontsize = 15)
# Show thousands separators on the y-axis
ax.get_yaxis().set_major_formatter(FuncFormatter(lambda x, p: format(int(x), ',')))
plt.show()
This heatmap shows when shootings occur across the calendar. It combines month and day of the week in one view, giving a fuller picture of when shootings happen than either variable alone.
With darker purple tones representing lower frequencies and yellows representing higher frequencies, it is evident that shootings are most frequent on weekends in the summer months, and most specifically on Sundays in July. This makes sense, as there is more activity in general on these dates; people are also more likely to be out late, which is consistent with the night-time shootings seen earlier.
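The busiest day and month combination can be located directly from the counts, and the axis orders can be generated rather than typed by hand so that no month is left out. A minimal sketch with invented counts (`toy_counts` is hypothetical), assuming the default English names returned by Python's `calendar` module:

```python
import calendar
import pandas as pd

# Toy day-by-month counts; values are invented for illustration.
toy_counts = pd.DataFrame({
    "DayNameAbbrev": ["Sun", "Sat", "Mon", "Sun"],
    "month_str": ["July", "July", "January", "August"],
    "count": [12, 9, 3, 10],
})
table = toy_counts.pivot(index="DayNameAbbrev", columns="month_str", values="count")

# Build the axis orders from the calendar module (English names assumed).
month_order = list(calendar.month_name)[1:]   # 'January' ... 'December'
day_order = list(calendar.day_abbr)           # 'Mon' ... 'Sun'
table = table.reindex(index=day_order, columns=month_order)

# Find the busiest (day, month) pair from the long-form counts.
top_row = toy_counts.loc[toy_counts["count"].idxmax()]
busiest = (top_row["DayNameAbbrev"], top_row["month_str"])
print(busiest)
```

Months or days with no data appear as empty cells after `reindex` instead of silently disappearing from the heatmap.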
fig = plt.figure(figsize=(18, 10))
ax = fig.add_subplot(1, 1, 1)
comma_fmt = FuncFormatter(lambda x, p:format(int(x), ','))
ax = sns.heatmap(Heatmap1, linewidth = 0.2, annot = True, cmap = 'viridis', fmt=',.0f',
annot_kws = {'size': 14}, cbar_kws = {'format': comma_fmt,'orientation':'vertical'})
plt.ylabel('Days of the Week', fontsize = 24, labelpad = 15, color = 'Black')
plt.xlabel('Month', fontsize = 22, labelpad =15, color = 'Black')
plt.title('Heatmap: Shootings by Month and Day of the Week', fontsize = 22, color = 'Black')
plt.xticks(fontsize=15)
plt.yticks(fontsize=14)
plt.show()

This multi-panel frequency plot displays counts of shootings by precinct. Because the police are the main line of defense against such incidents, it is important to understand which precincts are most exposed or may need greater staffing or training. The counts may even correlate with the performance of precinct employees.
Below we first see all 121 precincts, which form a fairly even distribution above and below the mean; no dominant precinct skews the data strongly one way or the other. The plot of the top 15 precincts confirms this: relative to the mean line, the top end of the data is evenly distributed as well.
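The above/within/below grouping used to color the bars compares each count with a band of plus or minus 5% around the mean. A minimal vectorized sketch of that rule (the helper name `band_labels` and its `tolerance` parameter are hypothetical, and `np.select` is swapped in for the explicit loop):

```python
import numpy as np

def band_labels(counts, tolerance=0.05):
    """Label each count as above, within, or below a +/- tolerance band around the mean."""
    counts = np.asarray(counts, dtype=float)
    avg = counts.mean()
    return np.select(
        [counts > avg * (1 + tolerance), counts < avg * (1 - tolerance)],
        ["above", "below"],
        default="within",
    )

# Made-up counts with mean 100: 120 is above, 100 sits inside the band, 80 is below.
labels = band_labels([120, 100, 80])
print(labels.tolist())
```

Because the band is relative to the mean of whatever subset is passed in, the same precinct can change label between the full plot and the top-15 plot.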
fig = plt.figure(figsize=(18,16))
fig.suptitle('Frequency of Shootings Analysis by Precinct:\n Top ' + str(top1) + ' and Top ' + str(top2), fontsize=25, fontweight='bold', color='black')
# Upper panel: top `top1` precincts, x labels hidden because there are too many to read
ax1 = fig.add_subplot(2, 1, 1)
ax1.bar(d1.PRECINCT, d1.Count, label='Count', color=my_colors1)
ax1.legend(handles=[Above, At, Below], fontsize=14)
plt.axhline(d1.Count.mean(), color='black', linestyle='dashed')
ax1.spines['right'].set_visible(False)
ax1.spines['top'].set_visible(False)
plt.yticks(fontsize=16)
ax1.axes.xaxis.set_visible(False)
ax1.set_title('Top ' + str(top1) + ' Precincts (Shootings)', size=23, color = 'black')
ax1.text(top1-65, d1.Count.mean()+25, 'Mean = ' + str(d1.Count.mean()), rotation=0, fontsize=14, color='black')
# Lower panel: top `top2` precincts with precinct numbers on the x-axis
ax2 = fig.add_subplot(2, 1, 2)
ax2.bar(d2.PRECINCT, d2.Count, label='Count', color=my_colors2)
ax2.legend(handles=[Above, At, Below], fontsize=14)
plt.axhline(d2.Count.mean(), color='black', linestyle='dashed')
ax2.spines['right'].set_visible(False)
ax2.spines['top'].set_visible(False)
plt.xticks(fontsize=15)
plt.yticks(fontsize=16)
ax2.axes.xaxis.set_visible(True)
ax2.set_title('Top ' + str(top2) + ' Precincts (Shootings)', size=23, color = 'black')
plt.show()

The map below represents fatal and non-fatal shootings in NYC. In the time-lapse over the hours of the day, the number of shootings (both fatal and non-fatal) drops off during the middle of the day and picks up again as night returns. The map also supports the finding from the stacked bar chart that most shootings occur in Brooklyn and the Bronx.
NOTE: The following chart is interactive so feel free to click the play button.
path = "U:/Map Files/"
my_map = px.scatter_mapbox(shootings.sort_values("hour"), lat="Latitude", lon="Longitude",
zoom=10, animation_frame="hour", color = 'STATISTICAL_MURDER_FLAG',
labels={"STATISTICAL_MURDER_FLAG": "Fatal Shooting"})
my_map.update_layout(height=1000, width=1000)
my_map.update_layout(mapbox_style='open-street-map')
my_map.update_layout(title='Locations of NY shootings (1:00am-12:00am)')
plotly.offline.plot(my_map, filename=path + 'Map_Output.html')

Overall, it is evident that time has a strong effect on the likelihood that a shooting will take place. Looking at time in terms of years, months, and hours, the hour of the day shows the strongest pattern, although the month is a large factor as well. Across the plots, shootings occur more frequently at night and in summer. In terms of location, the large majority of shootings occurred in Brooklyn and the Bronx. At the top end of the data, the spread across precincts was even, with no single precinct dominating. What this analysis provides is a picture of when and where shootings occur, in the hope that police forces can staff accordingly and organize shifts and preventative measures to minimize harm. Shootings have decreased over the years, and similar data may have contributed to those efforts.