NYC Shootings Data (2006-2019)

The dataset used for the following series of visualizations was downloaded from https://opendata.cityofnewyork.us/data/. It contains a multitude of variables related to shootings (fatal and non-fatal) in New York City’s five boroughs (Bronx, Brooklyn, Manhattan, Queens, Staten Island). These visualizations draw on the existing columns INCIDENT_KEY, BORO, PRECINCT, STATISTICAL_MURDER_FLAG, Latitude, and Longitude, as well as the newly created columns year, month_str, DayNameAbbrev, and hour. When looking at a real-world dataset involving such dangerous acts, it is important to focus on location and time.

The following visualizations seek to answer:

  1. What time of the day is most common for shootings to occur?
  2. What time of the year do most shootings occur?
  3. Where are shootings happening most frequently?
  4. What trend has developed over recent years?

Under the hypothesis that location and time have a strong effect on how often shootings occur, these visualizations seek to paint a picture of when and where shootings happen, in the hope that preventative measures can be strengthened in vulnerable areas. This is simply a preliminary analysis to gain better insights into NYC crime.

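# Imports for various charts and functions

import os
import matplotlib
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import matplotlib.style as style
import numpy as np
import pandas as pd
import plotly
import plotly.express as px
import plotly.graph_objects as go
import seaborn as sns
from matplotlib.ticker import FuncFormatter
from plotly.subplots import make_subplots

# Matplotlib 3.6 renamed the 'seaborn' style to 'seaborn-v0_8'; fall back to the old name on older versions
style.use('seaborn-v0_8' if 'seaborn-v0_8' in style.available else 'seaborn')

# Machine-specific Qt plugin path for this Anaconda install; adjust or remove on other machines
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = 'c:/ProgramData/Anaconda3/Library/plugins/platforms'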


# Read in the data (machine-specific path; point this at a local copy of the CSV)
shootings = pd.read_csv("U:/NYPD_shooting_Data.csv")



shootings['date'] = pd.to_datetime(shootings['OCCUR_DATE']) 
shootings['year'] = shootings['date'].dt.year 
shootings['month'] = shootings['date'].dt.month 
shootings['month_str'] = shootings['date'].dt.month_name() 
shootings['day'] = shootings['date'].dt.day 
shootings['DayOfTheWeek'] = shootings['date'].dt.dayofweek
shootings['DayName'] = shootings['date'].dt.strftime('%A')
shootings['DayNameAbbrev'] = shootings['date'].dt.strftime('%a') 
shootings['hour'] = shootings['OCCUR_TIME'].apply(lambda t: int(t.split(':')[0]))  # hour of day, 0-23

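# Number of shootings per (year, hour), plus how many of those were fatal
linegraph = shootings.groupby(['year', 'hour']).size().to_frame(name='count').reset_index()
# Sum only the boolean flag column; summing the whole frame fails on text columns in recent pandas
linegraph['fatal shootings'] = shootings.groupby(['year', 'hour'])['STATISTICAL_MURDER_FLAG'].sum().values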


stacked = shootings.groupby(['year', 'BORO']).agg('count')['INCIDENT_KEY'].to_frame(name = "count").reset_index() 
stacked = stacked.pivot(index = 'year', columns = 'BORO', values = 'count')  

import matplotlib 

Heatmap = shootings.groupby(['DayNameAbbrev', 'month_str']).agg('count')['INCIDENT_KEY'].to_frame(name = "count").reset_index() 
Heatmap = Heatmap[ Heatmap['DayNameAbbrev'].notna() & Heatmap['month_str'].notna()] 

Heatmap1 = pd.pivot_table(Heatmap, index = 'DayNameAbbrev', columns = 'month_str', values = 'count') 
 
column_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'] 
Heatmap1 = Heatmap1[column_order] 
 
row_order = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'] 
Heatmap1 = Heatmap1.reindex(row_order) 


chart2 = shootings.groupby(['PRECINCT']).agg('count')['INCIDENT_KEY'].to_frame(name = "Count")
chart3 = chart2.sort_values(by=['Count'], ascending=False).reset_index()
chart3['PRECINCT'] = chart3['PRECINCT'].apply(str)


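def pick_colors_according_to_mean_count(this_data):
    """Colour each bar by how its Count compares with the mean Count:
    indigo = more than 5% above, mediumseagreen = more than 5% below,
    mediumslateblue = within 5% of the mean."""
    colors = []
    avg = this_data.Count.mean()
    for each in this_data.Count:
        if each > avg * 1.05:
            colors.append('indigo')
        elif each < avg * 0.95:
            colors.append('mediumseagreen')
        else:
            colors.append('mediumslateblue')
    return colors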

import matplotlib.patches as mpatches

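# chart3 is sorted by descending count, so its first rows are the busiest precincts.
# Row 0 is the top precinct, so slices start at 0 (starting at 1 would drop it).
top1 = len(chart3)  # every precinct in the data
d1 = chart3.iloc[:top1]
my_colors1 = pick_colors_according_to_mean_count(d1)

top2 = 15
d2 = chart3.iloc[:top2]
my_colors2 = pick_colors_according_to_mean_count(d2)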

# Legend colours must match pick_colors_according_to_mean_count
Above = mpatches.Patch(color='indigo', label='More than 5% Above Average')
At = mpatches.Patch(color='mediumslateblue', label='Within 5% of Average')
Below = mpatches.Patch(color='mediumseagreen', label='More than 5% Below Average')

Line Graph: Shootings by Year/Hour

The first graph shows the number of shootings in NYC by hour of the day, for each year from 2006 to 2019. The main purpose is to identify trends by time of day. Plotting all 14 years provides a check that these patterns recur from year to year rather than appearing in a single year only.

As seen below, shootings drop off from 12:00am to about 5:00am and pick back up at approximately 4:00pm. The maximum is reached around midnight, signalling that shootings are most frequent late at night. Across the years the shape of the curve remains nearly identical, despite differences in the overall number of shootings.
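The "peak around midnight" reading can also be checked numerically rather than by eye. The sketch below is a minimal example, assuming the derived integer `year` and `hour` columns created earlier; `peak_hour_by_year` is a hypothetical helper and the toy rows are made up for illustration. It cross-tabulates year against hour and takes the hour with the largest count in each year.

```python
import pandas as pd


def peak_hour_by_year(df: pd.DataFrame) -> pd.Series:
    """Return the hour (0-23) with the most shootings in each year.

    Assumes `df` carries the derived integer columns 'year' and 'hour'.
    """
    # Rows are years, columns are hours, cells are shooting counts
    counts = pd.crosstab(df["year"], df["hour"])
    # idxmax(axis=1) gives the column label (the hour) of each row's maximum
    return counts.idxmax(axis=1)


# Hypothetical toy rows, not taken from the NYPD file
toy = pd.DataFrame({
    "year": [2006, 2006, 2006, 2007, 2007, 2007, 2007],
    "hour": [23, 23, 14, 0, 0, 1, 22],
})
peaks = peak_hour_by_year(toy)
print(peaks)  # peak hour: 23 in 2006, 0 in 2007
```

Applied to `shootings`, it would return one peak hour per year; note that `idxmax` returns the first hour when two hours tie.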



fig = plt.figure(figsize=(18, 10))
ax = fig.add_subplot(1, 1, 1)

# One colour per year
my_colors = {2006: 'tomato',
             2007: 'greenyellow',
             2008: 'purple',
             2009: 'gold',
             2010: 'cornflowerblue',
             2011: 'navy',
             2012: 'violet',
             2013: 'darkorchid',
             2014: 'black',
             2015: 'cyan',
             2016: 'seagreen',
             2017: 'maroon',
             2018: 'blue',
             2019: 'navajowhite'}

# Group by the column name, not ['year']: with a one-element list, pandas 2.x
# yields tuple keys such as (2006,), which my_colors[key] would not find
for key, grp in linegraph.groupby('year'):
    grp.plot(ax=ax, kind='line', x='hour', y='count', color=my_colors[key], label=key)

# 'count' includes every shooting; plot y='fatal shootings' to show fatal ones only
ax.set_title('Shootings by Hour/Year', fontsize=25, color='Black')
ax.set_xlabel('Hour (24 Hour Interval)', fontsize=23, color='Black')
ax.set_ylabel('Shootings', fontsize=23, color='Black')
ax.set_xticks(np.arange(24))
ax.tick_params(axis='both', labelsize=15)
ax.legend(fontsize=14)
plt.show()

Stacked Bar: Shootings by Year/Borough

A stacked bar chart shows the overall change in yearly shooting totals. Each bar is divided by borough to show where shootings occurred in each year. Although boroughs are broad areas, they indicate where to look in more depth.

Over 2006-2019 total shootings generally decrease, reaching new lows year after year with a few exceptions. The borough proportions are fairly similar from year to year, meaning that the general locations of shootings are consistent over time, with most shootings occurring in Brooklyn by a strong margin.
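Whether the borough proportions really stay "fairly similar" can be checked by converting each year's counts into shares. A minimal sketch, assuming the `year` and `BORO` columns used for `stacked`; `borough_share_by_year` is a hypothetical helper, and the toy rows (including the upper-case borough spellings) are illustrative only. `pd.crosstab(..., normalize="index")` divides each row by its total, so every year sums to 1.

```python
import pandas as pd


def borough_share_by_year(df: pd.DataFrame) -> pd.DataFrame:
    """Fraction of each year's shootings that occurred in each borough."""
    # normalize="index" divides every row by its total, so each year sums to 1
    return pd.crosstab(df["year"], df["BORO"], normalize="index")


# Hypothetical toy rows, not taken from the NYPD file
toy = pd.DataFrame({
    "year": [2006, 2006, 2006, 2006, 2007, 2007],
    "BORO": ["BROOKLYN", "BROOKLYN", "BRONX", "QUEENS", "BROOKLYN", "BRONX"],
})
shares = borough_share_by_year(toy)
print(shares.round(2))

# Year-to-year spread of each borough's share; small values mean stable proportions
print(shares.std())
```

A small standard deviation of a borough's share across years supports the reading that proportions are stable, even while the yearly totals fall.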


 
fig = plt.figure(figsize=(18, 10))
ax = fig.add_subplot(1, 1, 1)

stacked.plot(kind='bar', stacked=True, ax=ax, cmap='viridis')
ax.set_ylabel('Total Shootings', fontsize=23, labelpad=15, color='Black')
ax.set_xlabel('Year', fontsize=23, labelpad=15, color='Black')
ax.set_title('Total Shootings by Year and Borough', fontsize=25, color='Black')
plt.xticks(rotation=0, horizontalalignment='center', fontsize=16)
plt.yticks(fontsize=14)
ax.legend(fontsize=15)
# Thousands separators on the y-axis (e.g. 1,000)
ax.yaxis.set_major_formatter(FuncFormatter(lambda x, p: format(int(x), ',')))

plt.show()
 

Heatmap: Shootings by Day/Month

This heatmap shows when shootings occur within each month. Rather than looking at month or day of the week alone, it crosses the two, so each cell counts the shootings that fell on a given weekday in a given month across all years, giving a fuller picture of when these shootings happen.

With dark purple representing lower counts and yellow representing higher counts, shootings are clearly concentrated on weekends in the summer months, most of all on Sundays in July. This makes sense, as there is more activity in general on these dates and people are more likely to be out late, which is consistent with the night-time peak seen in the line graph.
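The same weekday-by-month table can be built so that its layout is guaranteed instead of depending on hand-typed orderings. A minimal sketch, assuming the `DayNameAbbrev` and `month_str` columns created earlier and an English locale for `calendar.month_name`; `day_month_table` and `busiest_cell` are hypothetical helpers and the toy rows are made up. Reindexing against full lists of weekdays and months fixes the order and fills empty cells with 0.

```python
import calendar

import numpy as np
import pandas as pd

MONTHS = list(calendar.month_name)[1:]  # 'January' ... 'December' (English locale)
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def day_month_table(df: pd.DataFrame) -> pd.DataFrame:
    """Count shootings per (weekday, month) cell in a fixed 7 x 12 layout."""
    counts = pd.crosstab(df["DayNameAbbrev"], df["month_str"])
    # Reindexing against the full lists fixes the order and fills gaps with 0,
    # so no weekday or month can be silently dropped
    return counts.reindex(index=DAYS, columns=MONTHS, fill_value=0)


def busiest_cell(table: pd.DataFrame) -> tuple:
    """Return the (weekday, month) label pair with the largest count."""
    values = table.to_numpy()
    row, col = np.unravel_index(values.argmax(), values.shape)
    return table.index[row], table.columns[col]


# Hypothetical toy rows, not taken from the NYPD file
toy = pd.DataFrame({
    "DayNameAbbrev": ["Sun", "Sun", "Sat", "Mon"],
    "month_str": ["July", "July", "August", "January"],
})
table = day_month_table(toy)
print(table.shape)          # (7, 12)
print(busiest_cell(table))  # ('Sun', 'July')
```

On the full data, `busiest_cell(day_month_table(shootings))` would report the single brightest cell of the heatmap.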


fig = plt.figure(figsize=(18, 10))
ax = fig.add_subplot(1, 1, 1)

# Colour-bar ticks with thousands separators
comma_fmt = FuncFormatter(lambda x, p: format(int(x), ','))
# seaborn's keyword for cell borders is linewidths
sns.heatmap(Heatmap1, ax=ax, linewidths=0.2, annot=True, cmap='viridis', fmt=',.0f',
            annot_kws={'size': 14}, cbar_kws={'format': comma_fmt, 'orientation': 'vertical'})

ax.set_ylabel('Days of the Week', fontsize=24, labelpad=15, color='Black')
ax.set_xlabel('Month', fontsize=22, labelpad=15, color='Black')
ax.set_title('Heatmap: Shootings by Month and Day of the Week', fontsize=22, color='Black')
plt.xticks(fontsize=15)
plt.yticks(fontsize=14)

plt.show()

Frequency Plot: Shootings by Precinct

This multi-panel frequency plot displays the number of shootings by precinct. Because police are the main line of defense against shootings, it is important to understand which precincts are most affected and may need more staffing or training. Counts may even correlate with the performance of each precinct’s officers, although this analysis does not test that.

The first panel shows every precinct in the dataset, split fairly evenly above and below the mean; no dominant precinct skews the data one way or the other. The plot of the top 15 precincts confirms this: even at the top end of the data, counts are evenly distributed around that group’s mean line.
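The colour coding behind these panels sorts each precinct into one of three bands around the mean: more than 5% above, within 5%, or more than 5% below. A minimal vectorized sketch of that rule, using `np.select` in place of the loop in `pick_colors_according_to_mean_count`; `band_labels` is a hypothetical helper and the counts are toy values chosen so the mean is 100. Tallying the labels gives a quick numeric check of how balanced the precincts are around the mean.

```python
import numpy as np
import pandas as pd


def band_labels(counts: pd.Series, tol: float = 0.05) -> np.ndarray:
    """Label each count 'above', 'below' or 'within' a +/- tol band around the mean."""
    values = counts.to_numpy(dtype=float)
    avg = values.mean()
    conditions = [values > avg * (1 + tol), values < avg * (1 - tol)]
    # np.select uses the first matching condition; anything else gets the default
    return np.select(conditions, ["above", "below"], default="within")


# Hypothetical toy counts, not taken from the NYPD file (mean = 100)
counts = pd.Series([100, 102, 48, 150])
labels = band_labels(counts)
print(labels)  # within, within, below, above
print(pd.Series(labels).value_counts())
```

Passing `chart3.Count` would label every precinct; the labels above, within and below correspond to the indigo, mediumslateblue and mediumseagreen bars respectively.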


fig = plt.figure(figsize=(18, 16))
fig.suptitle('Frequency of Shootings Analysis by Precinct:\n Top ' + str(top1) + ' and Top ' + str(top2),
             fontsize=25, fontweight='bold', color='black')

# Upper panel: precincts in d1, coloured by band around d1's mean
ax1 = fig.add_subplot(2, 1, 1)
ax1.bar(d1.PRECINCT, d1.Count, label='Count', color=my_colors1)
ax1.legend(handles=[Above, At, Below], fontsize=14)
ax1.axhline(d1.Count.mean(), color='black', linestyle='dashed')  # mean line
ax1.spines['right'].set_visible(False)
ax1.spines['top'].set_visible(False)
ax1.tick_params(axis='y', labelsize=16)
ax1.xaxis.set_visible(False)
ax1.set_title('Top ' + str(top1) + ' Precincts (Shootings)', size=23, color='black')
ax1.text(len(d1) / 2, d1.Count.mean() + 25, 'Mean = {:.1f}'.format(d1.Count.mean()),
         fontsize=14, color='black')

# Lower panel: the top 15 precincts, coloured relative to their own mean
ax2 = fig.add_subplot(2, 1, 2)
ax2.bar(d2.PRECINCT, d2.Count, label='Count', color=my_colors2)
ax2.legend(handles=[Above, At, Below], fontsize=14)
ax2.axhline(d2.Count.mean(), color='black', linestyle='dashed')  # mean line
ax2.spines['right'].set_visible(False)
ax2.spines['top'].set_visible(False)
ax2.tick_params(axis='x', labelsize=15)
ax2.tick_params(axis='y', labelsize=16)
ax2.set_title('Top ' + str(top2) + ' Precincts (Shootings)', size=23, color='black')

plt.show()

Map: Fatal/Non-Fatal Shootings by Hour

The map below shows fatal and non-fatal shootings in NYC. Stepping through the hours of the day, the number of shootings (both fatal and non-fatal) thins out during the middle of the day and picks up again as night returns. The map also reinforces the stacked bar chart: shootings are concentrated most heavily in Brooklyn and the Bronx.
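The hourly rise and fall visible in the animation can be summarized as a table of fatal and non-fatal counts per hour. A minimal sketch, assuming the derived `hour` column and a boolean `STATISTICAL_MURDER_FLAG` column; `fatal_counts_by_hour` is a hypothetical helper and the toy rows are made up. Reindexing to all 24 hours keeps quiet hours visible as zero rows instead of dropping them.

```python
import pandas as pd


def fatal_counts_by_hour(df: pd.DataFrame) -> pd.DataFrame:
    """Fatal, non-fatal and total shooting counts for every hour 0-23."""
    table = pd.crosstab(df["hour"], df["STATISTICAL_MURDER_FLAG"])
    # Guarantee all 24 hours and both flag values, filling gaps with 0
    table = table.reindex(index=range(24), columns=[False, True], fill_value=0)
    table.columns = ["non_fatal", "fatal"]
    table["total"] = table["non_fatal"] + table["fatal"]
    return table


# Hypothetical toy rows, not taken from the NYPD file
toy = pd.DataFrame({
    "hour": [0, 0, 0, 1, 13, 23],
    "STATISTICAL_MURDER_FLAG": [True, False, False, False, False, True],
})
by_hour = fatal_counts_by_hour(toy)
print(by_hour.loc[[0, 12, 23]])
```

On the full data, plotting `by_hour["total"]` against the hour would give a static companion to the animated map.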

NOTE: The following chart is interactive so feel free to click the play button.

path = "U:/Map Files/"  # machine-specific output folder

# One animation frame per hour; points coloured by whether the shooting was fatal
my_map = px.scatter_mapbox(shootings.sort_values("hour"), lat="Latitude", lon="Longitude",
                           zoom=10, animation_frame="hour", color='STATISTICAL_MURDER_FLAG',
                           labels={"STATISTICAL_MURDER_FLAG": "Fatal Shooting"})

# 'open-street-map' tiles need no Mapbox access token
my_map.update_layout(height=1000, width=1000, mapbox_style='open-street-map',
                     title='Locations of NYC Shootings by Hour (12:00am-11:00pm)')

# Writes a standalone HTML file and, by default, opens it in a browser
plotly.offline.plot(my_map, filename=path + 'Map_Output.html')

Summary

Overall, time is clearly associated with how likely a shooting is to take place. Looking at time in terms of years, months, and hours, hour of day shows the strongest pattern, although month is a large factor as well. Taken together, the plots show that shootings occur more frequently at night and in the summer. In terms of location, the large majority of shootings occurred in Brooklyn and the Bronx. Among the precincts with the most shootings, counts were spread evenly, with no single precinct dominating the data. What we gain from this analysis is a picture of when and where shootings occur, in the hope that police forces can staff accordingly and organize shifts and preventative measures to minimize harm. Shootings decreased over 2006-2019, and it is plausible that analyses of similar data contributed to those efforts.