Presented by:
This project gave us the opportunity to showcase many of the skills we learned throughout this course and to apply them to an investigation of datasets of our choosing. We narrowed our scope to a few datasets containing socioeconomic information, namely unemployment and crime data for NYC. We hoped that this investigation would reveal valuable information that could be used to formulate policy proposals.
We used the following workflow for each dataset:
We then merged the datasets to explore further and try to draw some final conclusions.
[Figure: workflow chart]
source("environment_setup.R", echo = TRUE, prompt.echo = "", spaced = FALSE)
## if (!require("dplyr")) install.packages("dplyr")
## if (!require("RSocrata")) install.packages("RSocrata")
## if (!require("tidyverse")) install.packages("tidyverse")
## if (!require("ggplot2")) install.packages("ggplot2")
## if (!require("readxl")) install.packages("readxl")
## if (!require("plyr")) install.packages("plyr")
## if (!require("treemap")) install.packages("treemap")
## if (!require("leaflet")) install.packages("leaflet")
## if (!require("forcats")) install.packages("forcats")
## if (!require("ggExtra")) install.packages("ggExtra")
## if (!require("GGally")) install.packages("GGally")
We start with the NYPD Arrests Data (Historic) dataset from NYC Open Data and conduct some exploratory data analysis to find out how arrests are distributed in general. We explore trends such as seasonality, trends in particular kinds of arrest, and differences between boroughs.
The dataset has 4.8M rows and 18 columns; each row represents one arrest.
variable | description |
---|---|
arrest_date | Exact date of arrest for the reported event. |
ofns_desc | Description of internal classification corresponding with KY code (more general category than PD description). |
arrest_boro | Borough of arrest. B (Bronx), S (Staten Island), K (Brooklyn), M (Manhattan), Q (Queens). |
age_group | Perpetrator’s age within a category. |
perp_sex | Perpetrator’s sex description. |
perp_race | Perpetrator’s race description. |
x_coord_cd | Midblock X-coordinate for New York State Plane Coordinate System, Long Island Zone, NAD 83, units feet (FIPS 3104). |
y_coord_cd | Midblock Y-coordinate for New York State Plane Coordinate System, Long Island Zone, NAD 83, units feet (FIPS 3104). |
latitude | Latitude coordinate for Global Coordinate System, WGS 1984, decimal degrees (EPSG 4326). |
longitude | Longitude coordinate for Global Coordinate System, WGS 1984, decimal degrees (EPSG 4326). |
Load the data into R using the RSocrata API.
source("arrests_dataset.R", echo = FALSE, prompt.echo = "", spaced = FALSE)
head(arrests_df, 10)
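The contents of `arrests_dataset.R` are not shown here. As a rough sketch of what loading through RSocrata might look like, assume the data is served from the NYC Open Data Socrata endpoint and that `8h9b-rp9u` is the identifier of the NYPD Arrests Data (Historic) dataset. Both are assumptions to check against the portal. `arrests_url()` and `load_arrests()` are hypothetical helper names, not functions from the report.

```r
# Build the Socrata resource URL for a dataset identifier.
# The host and URL pattern are assumptions about the NYC Open Data endpoint.
arrests_url <- function(dataset_id) {
  paste0("https://data.cityofnewyork.us/resource/", dataset_id, ".json")
}

# Download the dataset with RSocrata::read.socrata().
# Defined but not called here: it needs network access and the
# full dataset has millions of rows.
load_arrests <- function(dataset_id = "8h9b-rp9u") {
  RSocrata::read.socrata(arrests_url(dataset_id))
}

# arrests_df <- load_arrests()
```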
Rename the borough letters to proper names.
arrests_df$arrest_boro <- revalue(arrests_df$arrest_boro, c("Q" = "Queens", "K" = "Brooklyn", "M" = "Manhattan", "S" = "Staten Island", "B" = "Bronx"))
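`revalue()` comes from plyr, and loading plyr after dplyr masks several dplyr verbs, which is presumably why later calls are written as `dplyr::summarise`. As an alternative sketch, not the approach used in this report, the same renaming can be done with a base R named-vector lookup. The toy `boro_codes` vector below is made up for illustration.

```r
# Lookup table from borough code to full name.
boro_names <- c(Q = "Queens", K = "Brooklyn", M = "Manhattan",
                S = "Staten Island", B = "Bronx")

# Toy input standing in for arrests_df$arrest_boro.
boro_codes <- c("K", "M", "B", "Q", "S", "K")

# Index the lookup by code; unname() drops the codes from the result.
boro_full <- unname(boro_names[boro_codes])

# A code missing from the lookup would come back as NA, so this
# check fails loudly if any value was left unmapped.
stopifnot(!anyNA(boro_full))
boro_full
```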
Remove missing values where no offense description is recorded.
arrests_df <- arrests_df %>% filter(ofns_desc != "")
We generate a series of data frames that aggregate the data in different ways for analysis and plotting. For example, we look at arrests by race, borough, and offense.
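# count arrests per borough, year, and race
# (no offense filter is applied, so despite the name these counts cover all arrests)
murder_counts <- arrests_df %>%
  group_by(arrest_boro, year, perp_race) %>%
  dplyr::summarise(murder_counts = n()) %>%
  arrange(desc(year))
murder_counts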
# get the count of arrests per year, by borough
grouped_boro <- arrests_df %>%
  group_by(year, arrest_boro) %>%
  dplyr::summarize(count = n()) %>%
  arrange(desc(count))
# get the count of offenses per year, by borough
grouped_offenses <- arrests_df %>%
  group_by(year, arrest_boro, ofns_desc) %>%
  dplyr::summarize(count = n()) %>%
  arrange(desc(count))
# get the top five offenses per year and borough
# (grouped_offenses is still grouped by year and arrest_boro)
t5 <- grouped_offenses %>% top_n(5, count)
# get the counts of offenses overall
crime_counts <- arrests_df %>%
  group_by(ofns_desc) %>%
  dplyr::summarize(count = n()) %>%
  arrange(desc(count))
# get the count of arrests related to dangerous drugs by year, by borough
drugs <- arrests_df %>%
  filter(ofns_desc == 'DANGEROUS DRUGS') %>%
  group_by(year, arrest_boro) %>%
  dplyr::summarize(count = n())
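The aggregations above group by a `year` column that does not appear in the variable table, so it is presumably derived from `arrest_date` inside `arrests_dataset.R`. A minimal sketch of one way to do that, assuming `arrest_date` can be parsed as an ISO `YYYY-MM-DD` date (the toy dates are invented for illustration):

```r
# Toy stand-in for arrests_df with only the date column.
toy_arrests <- data.frame(
  arrest_date = c("2014-03-15", "2016-11-02", "2018-07-30"),
  stringsAsFactors = FALSE
)

# Parse the date and keep the four-digit year as an integer.
toy_arrests$year <- as.integer(format(as.Date(toy_arrests$arrest_date), "%Y"))
toy_arrests
```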
Let’s study the evolution of crime over the period of interest (2014-2018).
The plot below reveals that overall crime is decreasing in all boroughs of NYC. The year-over-year pattern is very similar, appearing simply to scale down over time.
What is surprising is that total crime in Manhattan and Brooklyn is at fairly similar levels. Total crime is aggregated without accounting for different types of crime, so we further our investigation by looking at the top crimes overall and then dissecting crime by borough.
Here is a plot of the 10 most common crimes for the 2014-2018 period across all boroughs. We learn that dangerous-drug offenses are the most prevalent, followed by 3rd degree assaults.
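One way to check the "scaling down" impression numerically is to compute the year-over-year percentage change in arrests per borough. The sketch below uses a small made-up table in the same shape as `grouped_boro`; the counts are placeholders, not results from the dataset.

```r
library(dplyr)

# Placeholder counts, invented purely to illustrate the calculation.
toy_boro <- data.frame(
  year = c(2014, 2015, 2016, 2014, 2015, 2016),
  arrest_boro = c("Bronx", "Bronx", "Bronx", "Queens", "Queens", "Queens"),
  count = c(100, 90, 81, 200, 150, 120)
)

# Within each borough, order by year and compare each count
# with the previous year's count.
yoy <- toy_boro %>%
  dplyr::group_by(arrest_boro) %>%
  dplyr::arrange(year, .by_group = TRUE) %>%
  dplyr::mutate(pct_change = (count - dplyr::lag(count)) / dplyr::lag(count) * 100) %>%
  dplyr::ungroup()
yoy
```

The first year of each borough has no previous year to compare against, so its `pct_change` is `NA`.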
A peek at the bottom 10 crimes for the same period reveals somewhat unexpected crimes like disruption of a religious service. It is interesting to note that while dangerous drugs offenses are the most common crime, only 1 person was arrested for being under the influence of drugs.
Following from the exploration above, we take a deeper look at the most common crimes by borough. In the plot below, we once again see that drug-related offenses are the most common crimes and that this is consistent across boroughs. We notice that while Brooklyn and Manhattan had the most crimes, the Bronx had the most drug arrests.
The plot below explores that relationship over time for each borough. We observe that, like crime in general, drug-related arrests are going down.
We continue investigating the demographics and take a look at the distribution of crime by gender and age group. Male adults between the ages of 25 and 44 remain the most common perpetrators.
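The counts behind a sex and age breakdown can be sketched as below, in the same style as the earlier aggregations. The toy rows are invented for illustration, and the age labels are assumed to follow the `age_group` categories in the dataset.

```r
library(dplyr)

# Toy stand-in for the perp_sex and age_group columns of arrests_df.
toy_perps <- data.frame(
  perp_sex = c("M", "M", "F", "M", "F", "M"),
  age_group = c("25-44", "25-44", "18-24", "45-64", "25-44", "25-44"),
  stringsAsFactors = FALSE
)

# Count arrests for each sex and age group combination, largest first.
demo_counts <- toy_perps %>%
  dplyr::count(perp_sex, age_group, name = "arrests") %>%
  dplyr::arrange(dplyr::desc(arrests))
demo_counts
```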
This interactive map will let you explore the distribution of crime geographically.