1. Overview

New York City has often been lauded as a role model for other sprawling American metropolises in terms of cutting down violent crime and ensuring that it is one of the safest mega-cities to live in. However, in June 2020, just as New York City began to gradually emerge after several months of lockdown due to the coronavirus, there has been a significant uptick in gun violence - over the Independence Day (otherwise known as the Fourth of July) weekend, there were 64 shootings, while the subsequent weekend had 53 shootings (Slotnik, 2020). Moreover, when comparing the number of shootings in the same period in 2019, there has already been 634 shootings in 2020, compared with 394 in 2019 (Southall, 2020). Therefore, it would be insightful to have a closer look into the information behind these shootings for 2020, and see which areas have suffered the brunt of this unprecedented surge in gun violence.

1.1 Purpose

The purpose of this visualisation is to make use of the dataset that has been retrieved from NYC Open Data (https://data.cityofnewyork.us/Public-Safety/NYPD-Shooting-Incident-Data-Year-To-Date-/5ucz-vwe8), which lists every shooting incident that has occurred in NYC for the current calendar year. Through this dataset, we can endeavour to visualise the information related to gun violence for the year 2020, and see if there are any discernible trends that can be determined such as which boroughs or neighbourhoods are more prone to gun violence.

1.2 Data & Design Challenges

The dataset comprises shooting incident-level data records, which comprises information such as the perpetrator and victim demographics, the borough it took place in, and most importantly, the geographic coordinates of the shooting location. One challenge is that the dataset does not have zip code level data, which adversely affects the granularity of the dataset. The first workaround is to use the Google API to grab the map view directly and visualise the shooting locations directly. The second workaround will be to use the precinct as a field for joining with the precinct-level shapefile. Additionally, there are no numerical fields to visualise, so data preparation will have to be done to derive the count of the number of incidents per precinct. There is also some missing data for the demographic fields which cannot be imputed.

With regards to the design challenges, making an interactive map with the police precinct number showing as the tooltip is not very meaningful, as the viewer would need to know which neighbourhoods correspond to which precinct. After not having any luck finding any neighbourhood to precinct mapping, I decided to create my own mapping. This is done by checking the nyc.gov website’s precinct details, which shows which neighbourhoods each precinct is serving. As sometimes the number of neighbourhoods each precinct serves can be a lot, I have only included up to the first 5 neighbourhoods on each precinct’s website. This would ensure that the tooltip would not be overly long when one mouses over a specific portion of the map.

1.3 Sketch of Proposed Design

2. Step-by-step description

2.1 Install and load R packages

  • ggmap - Helps to visualise spatial data and models.
  • tidyverse - Helps with cleaning, processing, modelling and visualising data, and includes the ggplot2 and dplyr packages.
  • readxl - Helps to retrieve data from .xlsx files.
  • dplyr - Helps with data manipulation.
  • ggthemes - Provides additional themes for ggplot2.
  • sf - Used for encoding spatial vector data.
  • tmap - Used to generate thematic maps
packages = c("ggmap", "tidyverse", "readxl", "dplyr", "ggthemes", 'sf', 'tmap')

for(p in packages){
  if(!require(p, character.only=T)){
    install.packages(p)
  }
  library(p, character.only=T)
}

2.2 Loading the data

The shapefile for the police precincts is downloaded from NYC Open Data at https://data.cityofnewyork.us/Public-Safety/Police-Precincts/78dh-3ptz. The shooting incident dataset mentioned in section 1.1 is loaded as well. Additionally, the precinct to neighbourhood level dataset which I created is also loaded.

shp <- st_read(dsn= 'Police Precincts',layer='geo_export_cf489a06-5548-4944-a259-56a344376e96')
## Reading layer `geo_export_cf489a06-5548-4944-a259-56a344376e96' from data source `C:\Users\winst\OneDrive\Documents\Police Precincts' using driver `ESRI Shapefile'
## Simple feature collection with 77 features and 3 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -74.25559 ymin: 40.49612 xmax: -73.70001 ymax: 40.91553
## geographic CRS: WGS84(DD)
shooting <- read_csv('NYPD_Shooting_Incident_Data__Year_To_Date_.csv')
precinct_map <- read_excel('Precinct_Neighbourhood_Mapping.xlsx')

2.3 Data Preparation

Data preparation and processing is needed in order to get derived fields from the dataset, and to improve on the interactive map’s tooltip.

2.3.1 Deriving the count of the number of incidents

The count of the number of incidents per precinct is derived first, using a combination of the group_by and summarise functions. This summarised dataset is then merged with the precinct-level shapefile dataset.

precinct_groupby <- shooting %>%
  group_by(PRECINCT)%>%
  summarise(lat=as.numeric(Latitude),
            long=as.numeric(Longitude),
            borough=BORO,
            incidents=n())

2.3.2 Merging the datasets

The datasets are then merged. The shapefile is left-joined with the newly created precinct_groupby file, joining on the precinct keys. Subsequetnly, this new dataset, shp_shooting_2020, is left-joined with the precinct mapping dataset, joining on the precinct keys.

shp_shooting_2020 <- left_join(shp, precinct_groupby, by = c('precinct' = 'PRECINCT'))
neighbourhood_shooting <- left_join(shp_shooting_2020,precinct_map,by = c('precinct'='Precinct'))

2.3.3 Data Cleaning

Some data cleaning is done, to ensure that NA or missing values will not show up on the map, they are changed to 0 instead, because having a NA for the incidents column equates to having no shooting incidents. Additionally, a new column, Neighbourhood_Incident_Count, is created and shifted to the front, to be used as the tooltip for the interactive map.

neighbourhood_shooting[is.na(neighbourhood_shooting)] <- 0
neighbourhood_shooting$Neighbourhood_Incident_Count <- paste(neighbourhood_shooting$Neighbourhoods, neighbourhood_shooting$incidents, sep=": ")
neighbourhood_shooting <- neighbourhood_shooting %>%
  select(Neighbourhood_Incident_Count, everything())

2.3.4 Google Maps API Key

For the initial maps, I will be using ggmap’s get_map function, which requires the use of a Google API Key. I have generated and restricted the key’s usage to only the relevant APIs.

register_google(key=<enter your key here>)

2.4 Exploratory Data Analysis

Some exploratory data analysis will be performed with ggmap first with the original dataset.

2.4.1 Map of shootings by borough and murder flag

The first chart looks at the points where shootings have taken place in each borough. This is to allow clearer views of each borough, as certain boroughs such as Queens and Brooklyn share a land border and the points can overlap on a single map. Additionally, the murder flag adds more information whether the victim has died as a result of the shooting incident.

newyork.map_color <- get_map(location= 'New York City', 
                       maptype='roadmap', color='color',source='google',zoom=11)

ggmap(newyork.map_color) +
  geom_point(data=shooting, aes(x=Longitude, y=Latitude, color=STATISTICAL_MURDER_FLAG), size =2, alpha=0.5)+facet_wrap(~BORO)+scale_color_manual(values=c("steelblue", "red"))+
  labs(title='Map of shootings by borough and murder flag',
       subtitle='The maps show the distribution of shootings across the five boroughs. \nMurder flags indicate whether the victim died as a result of the shooting.')+
    theme(plot.title=element_text(hjust=0),
          plot.subtitle=element_text(hjust=0),
        axis.ticks = element_blank(),
        axis.text = element_blank(),
        axis.title = element_blank(),
        legend.position = 'bottom') 

2.4.2 Shooting heatmap across NYC

The second map is a heatmap of shooting incidents across NYC. This allows the viewers to quickly identify hotspots for gun violence quickly across the five boroughs.

ggmap(newyork.map_color) + 
  stat_density2d(data=shooting,aes(x=Longitude, y=Latitude, fill=..level.., alpha=..level..),geom='polygon')  + 
  scale_fill_gradient(low = "green", high = "red") + 
  scale_alpha(range = c(0, 0.5), guide = FALSE) + 
  labs(title='Gun violence incidents heatmap',
       subtitle='The heatmap shows the hotspots for gun violence across New York City') +
  theme(axis.ticks = element_blank(),
        axis.text = element_blank(),
        axis.title = element_blank(),
        legend.position="none") 

3 Final Visualisation - Interactive Choropleth Map

The choropleth map gives a quick overview of which precincts are more prone to gun violence, with the tooltip showing the neighbourhood names and the count of shooting incidents.

tmap_mode('view')
tm_shape(neighbourhood_shooting) + tm_fill('incidents', n=5, style='quantile', palette='Blues')+tm_borders(alpha=0.8)+ 
  tm_scale_bar(width = 0.15) +
  tm_layout(legend.format=list(fun=function(x) paste0(formatC(x, digits=0, format= "f"),"")))

4 Insights

  • The first map shows the distribution of shootings across each borough. One noteworthy observation is that most of the shootings that result in deaths take place in areas that are known to have high crime, such as East Brooklyn and South Bronx. East Brooklyn contains the neighbourhoods of Brownsville and East New York, which are known to be the murder capitals of New York City, while the South Bronx contains the neighbourhoods of Mott Haven and Concourse Village, which have a high concentration of working-class people staying in low-income housing projects. In contrast, South and West Brooklyn, along with the south of Staten Island, have traditionally known to be safe areas and this is shown in this chart, in which almost no shootings take place in those areas.

  • The second map shows the heatmap of shootings across the entire New York City. This shows a clearer view of the hotspots where the shootings have been taken place. It is clear that the concentration of shootings are the eastern and northern borders of Brooklyn that joins with Queens, along with the northern border of Manhattan which joins with South Bronx. Even though these are known crime hotspot areas, the surge in shootings this year have been attributed to a myriad of reasons, ranging from the fallout arising from the pandemic, to heightened tension due to the death of George Floyd (Closson, 2020).

  • The interactive map shows the number of shooting incidents at a precinct level. Mousing over each precinct will show the main neighbourhood names and the count of the shooting incidents. Brownsville and East New York in Central/East Brooklyn remain the two most dangerous neighbourhoods for gun shootings, with 38 and 41 shooting incidents respectively. Both neighbourhoods have also been hit disparately hard due to the coronavirus pandemic, with Starett City (within East New York) having the highest death rate due to the coronavirus (Schwirtz & Cook, 2020).

5. References