New York City has often been lauded as a role model for other sprawling American metropolises in terms of cutting down violent crime and ensuring that it is one of the safest mega-cities to live in. However, in June 2020, just as New York City began to gradually emerge after several months of lockdown due to the coronavirus, there has been a significant uptick in gun violence - over the Independence Day (otherwise known as the Fourth of July) weekend, there were 64 shootings, while the subsequent weekend had 53 shootings (Slotnik, 2020). Moreover, when comparing the number of shootings in the same period in 2019, there has already been 634 shootings in 2020, compared with 394 in 2019 (Southall, 2020). Therefore, it would be insightful to have a closer look into the information behind these shootings for 2020, and see which areas have suffered the brunt of this unprecedented surge in gun violence.
The purpose of this visualisation is to make use of the dataset that has been retrieved from NYC Open Data (https://data.cityofnewyork.us/Public-Safety/NYPD-Shooting-Incident-Data-Year-To-Date-/5ucz-vwe8), which lists every shooting incident that has occurred in NYC for the current calendar year. Through this dataset, we can endeavour to visualise the information related to gun violence for the year 2020, and see if there are any discernible trends that can be determined such as which boroughs or neighbourhoods are more prone to gun violence.
The dataset comprises shooting incident-level data records, which comprises information such as the perpetrator and victim demographics, the borough it took place in, and most importantly, the geographic coordinates of the shooting location. One challenge is that the dataset does not have zip code level data, which adversely affects the granularity of the dataset. The first workaround is to use the Google API to grab the map view directly and visualise the shooting locations directly. The second workaround will be to use the precinct as a field for joining with the precinct-level shapefile. Additionally, there are no numerical fields to visualise, so data preparation will have to be done to derive the count of the number of incidents per precinct. There is also some missing data for the demographic fields which cannot be imputed.
With regards to the design challenges, making an interactive map with the police precinct number showing as the tooltip is not very meaningful, as the viewer would need to know which neighbourhoods correspond to which precinct. After not having any luck finding any neighbourhood to precinct mapping, I decided to create my own mapping. This is done by checking the nyc.gov website’s precinct details, which shows which neighbourhoods each precinct is serving. As sometimes the number of neighbourhoods each precinct serves can be a lot, I have only included up to the first 5 neighbourhoods on each precinct’s website. This would ensure that the tooltip would not be overly long when one mouses over a specific portion of the map.
packages = c("ggmap", "tidyverse", "readxl", "dplyr", "ggthemes", 'sf', 'tmap')
for(p in packages){
if(!require(p, character.only=T)){
install.packages(p)
}
library(p, character.only=T)
}
The shapefile for the police precincts is downloaded from NYC Open Data at https://data.cityofnewyork.us/Public-Safety/Police-Precincts/78dh-3ptz. The shooting incident dataset mentioned in section 1.1 is loaded as well. Additionally, the precinct to neighbourhood level dataset which I created is also loaded.
shp <- st_read(dsn= 'Police Precincts',layer='geo_export_cf489a06-5548-4944-a259-56a344376e96')
## Reading layer `geo_export_cf489a06-5548-4944-a259-56a344376e96' from data source `C:\Users\winst\OneDrive\Documents\Police Precincts' using driver `ESRI Shapefile'
## Simple feature collection with 77 features and 3 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -74.25559 ymin: 40.49612 xmax: -73.70001 ymax: 40.91553
## geographic CRS: WGS84(DD)
shooting <- read_csv('NYPD_Shooting_Incident_Data__Year_To_Date_.csv')
precinct_map <- read_excel('Precinct_Neighbourhood_Mapping.xlsx')
Data preparation and processing is needed in order to get derived fields from the dataset, and to improve on the interactive map’s tooltip.
The count of the number of incidents per precinct is derived first, using a combination of the group_by and summarise functions. This summarised dataset is then merged with the precinct-level shapefile dataset.
precinct_groupby <- shooting %>%
group_by(PRECINCT)%>%
summarise(lat=as.numeric(Latitude),
long=as.numeric(Longitude),
borough=BORO,
incidents=n())
The datasets are then merged. The shapefile is left-joined with the newly created precinct_groupby file, joining on the precinct keys. Subsequetnly, this new dataset, shp_shooting_2020, is left-joined with the precinct mapping dataset, joining on the precinct keys.
shp_shooting_2020 <- left_join(shp, precinct_groupby, by = c('precinct' = 'PRECINCT'))
neighbourhood_shooting <- left_join(shp_shooting_2020,precinct_map,by = c('precinct'='Precinct'))
Some data cleaning is done, to ensure that NA or missing values will not show up on the map, they are changed to 0 instead, because having a NA for the incidents column equates to having no shooting incidents. Additionally, a new column, Neighbourhood_Incident_Count, is created and shifted to the front, to be used as the tooltip for the interactive map.
neighbourhood_shooting[is.na(neighbourhood_shooting)] <- 0
neighbourhood_shooting$Neighbourhood_Incident_Count <- paste(neighbourhood_shooting$Neighbourhoods, neighbourhood_shooting$incidents, sep=": ")
neighbourhood_shooting <- neighbourhood_shooting %>%
select(Neighbourhood_Incident_Count, everything())
For the initial maps, I will be using ggmap’s get_map function, which requires the use of a Google API Key. I have generated and restricted the key’s usage to only the relevant APIs.
register_google(key=<enter your key here>)
Some exploratory data analysis will be performed with ggmap first with the original dataset.
The first chart looks at the points where shootings have taken place in each borough. This is to allow clearer views of each borough, as certain boroughs such as Queens and Brooklyn share a land border and the points can overlap on a single map. Additionally, the murder flag adds more information whether the victim has died as a result of the shooting incident.
newyork.map_color <- get_map(location= 'New York City',
maptype='roadmap', color='color',source='google',zoom=11)
ggmap(newyork.map_color) +
geom_point(data=shooting, aes(x=Longitude, y=Latitude, color=STATISTICAL_MURDER_FLAG), size =2, alpha=0.5)+facet_wrap(~BORO)+scale_color_manual(values=c("steelblue", "red"))+
labs(title='Map of shootings by borough and murder flag',
subtitle='The maps show the distribution of shootings across the five boroughs. \nMurder flags indicate whether the victim died as a result of the shooting.')+
theme(plot.title=element_text(hjust=0),
plot.subtitle=element_text(hjust=0),
axis.ticks = element_blank(),
axis.text = element_blank(),
axis.title = element_blank(),
legend.position = 'bottom')
The second map is a heatmap of shooting incidents across NYC. This allows the viewers to quickly identify hotspots for gun violence quickly across the five boroughs.
ggmap(newyork.map_color) +
stat_density2d(data=shooting,aes(x=Longitude, y=Latitude, fill=..level.., alpha=..level..),geom='polygon') +
scale_fill_gradient(low = "green", high = "red") +
scale_alpha(range = c(0, 0.5), guide = FALSE) +
labs(title='Gun violence incidents heatmap',
subtitle='The heatmap shows the hotspots for gun violence across New York City') +
theme(axis.ticks = element_blank(),
axis.text = element_blank(),
axis.title = element_blank(),
legend.position="none")
The choropleth map gives a quick overview of which precincts are more prone to gun violence, with the tooltip showing the neighbourhood names and the count of shooting incidents.
tmap_mode('view')
tm_shape(neighbourhood_shooting) + tm_fill('incidents', n=5, style='quantile', palette='Blues')+tm_borders(alpha=0.8)+
tm_scale_bar(width = 0.15) +
tm_layout(legend.format=list(fun=function(x) paste0(formatC(x, digits=0, format= "f"),"")))