Bronx Crime in 2018
The Bronx has historically held the reputation of being a hub for violent crime. However, in recent years things have changed for the better. I will explore the 2018 crime dataset provided by the New York City Police Department to gain insight into the volume and types of crime affecting my favourite NYC borough. The data is hosted at data.cityofnewyork.us and contains all reported crimes in NYC. Let's explore how the incidents of crime in the Bronx varied throughout last year.
The Data
This dataset includes all confirmed valid felony, misdemeanor, and violation crimes in the Bronx reported to the New York City Police Department (NYPD) for all of 2018. It does not contain crime that went unreported or reports of crime the NYPD could not confirm. The dataset contains a total of 101,018 observations, each representing a single reported case. Each observation carries relevant information about the reported crime, such as the description and category of the offense, the date and time, and descriptions of the victim and the suspect.
# Importing the data
data <- read.csv("https://raw.githubusercontent.com/Chris-Ayre/DS-Final/master/NYPD_Complaint_Data_2018.csv")
library(dplyr)
library(tidyr)
library(ggplot2)
Pre-processing the data
The data must be organised so that it makes sense to our end users. We do not need every column in the dataset to explore the types of crime being reported in the Bronx. Specific variables are chosen, merged and renamed to make them easier for end users to consume.
# Selecting only the Columns I wish to work with from the dataset
data <- data %>% select(OFNS_DESC, ADDR_PCT_CD, LAW_CAT_CD, CMPLNT_FR_DT, SUSP_AGE_GROUP, SUSP_RACE, SUSP_SEX, VIC_AGE_GROUP, VIC_RACE, VIC_SEX)
# Rename the columns with end-user-friendly names that better represent the variables.
names(data) <- c("Offense", "Precinct", "Category of Offense", "Date Filed", "Suspect Age", "Suspect Race", "Suspect Sex", "Victim Age", "Victim Race", "Victim Sex")
# Merge the 3 suspect description columns into one
# (unite() replaces the deprecated unite_(); values are joined with "_" by default)
data <- unite(data, "Suspect", c("Suspect Age", "Suspect Race", "Suspect Sex"))
# Merge the 3 victim description columns into one
data <- unite(data, "Victim", c("Victim Age", "Victim Race", "Victim Sex"))
Analysis
datacat <- sort(table(data$"Category of Offense"),decreasing = TRUE)
datacat <- data.frame(datacat[datacat > 5000])
colnames(datacat) <- c("Category", "Frequency")
datacat$Percentage <- datacat$Frequency / sum(datacat$Frequency)*100
datacat
## Category Frequency Percentage
## 1 MISDEMEANOR 57474 56.89481
## 2 FELONY 27194 26.91995
## 3 VIOLATION 16350 16.18523
bp <- ggplot(datacat, aes(x=Category, y=Frequency, fill=Category)) + geom_bar(stat="identity") +
theme(axis.text.x=element_blank()) + geom_text(data=datacat, aes(label=""))
bp
pie2 <- ggplot(datacat,
aes(x="", y=Frequency, fill=Category)) +
geom_bar(stat="identity") +
coord_polar(theta = "y") +
scale_x_discrete("")
pie2
Misdemeanors accounted for more than half of all reported crime in the Bronx (57%). A misdemeanor is an offense, other than a traffic infraction, for which a sentence of more than 15 days but not more than one year may be imposed.
Just under half as many felonies (27,194) were reported as misdemeanors. A felony is the most serious type of criminal charge in New York State: an offense for which a sentence to a term of imprisonment in excess of one year may be imposed.
A violation is an offense, other than a traffic infraction, for which a sentence to a term of imprisonment of up to 15 days may be imposed. Many violations go unreported because of their low severity. Harassment is a common form of violation. Violations accounted for 16% of the crime reported in the Bronx.
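The shares quoted above can be re-derived from the three counts in the category table. The snippet below is only a quick arithmetic check, not part of the original analysis; the variable names (`counts`, `share`, `felony_ratio`) are made up for illustration, and the counts are copied by hand from the table.

```r
# Counts copied from the category table above
counts <- c(MISDEMEANOR = 57474, FELONY = 27194, VIOLATION = 16350)

# Total number of reported cases across the three categories
total <- sum(counts)

# Percentage of all reports that falls into each category
share <- round(counts / total * 100, 2)

# Number of felonies reported per misdemeanor reported
felony_ratio <- counts[["FELONY"]] / counts[["MISDEMEANOR"]]

total
share
round(felony_ratio, 2)
```

The total should come out to the number of observations in the dataset, and the felony-to-misdemeanor ratio should land a little below one half.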
Summarizing the data by high and low occurring incident categories.
datalow <- sort(table(data$Offense),decreasing = TRUE)
datalow <- data.frame(datalow[datalow < 10])
colnames(datalow) <- c("Category", "Frequency")
datalow$Percentage <- datalow$Frequency / sum(datalow$Frequency)*100
datahigh <- sort(table(data$Offense),decreasing = TRUE)
datahigh <- data.frame(datahigh[datahigh > 3000])
colnames(datahigh) <- c("Category", "Frequency")
datahigh$Percentage <- datahigh$Frequency / sum(datahigh$Frequency)*100
The most reported crimes
pie <- ggplot(datahigh,
aes(x=Category, y=Frequency, fill=Category)) +
geom_bar(stat="identity") +
coord_polar(theta = "x") +
scale_x_discrete("")
pie
Harassment, Petit Larceny, Assault and Criminal Mischief account for the vast majority of reported crimes in the Bronx. This is consistent with daily life throughout NYC; I imagine all New Yorkers have witnessed a crime that falls under one of these offenses.
ofbp <- ggplot(datahigh,
aes(x=Category, y=Frequency, fill=Category)) +
geom_bar(stat="identity") +
theme(axis.text.x=element_blank()) +
geom_text(data=datahigh, aes(label=""))
ofbp
The least reported crimes
ofbp2 <- ggplot(datalow,
aes(x=Category, y=Frequency, fill=Category)) +
geom_bar(stat="identity") +
theme(axis.text.x=element_blank()) +
geom_text(data=datalow, aes(label=""))
ofbp2
Having been reported only twice last year, driving while impaired or intoxicated is among the least reported crimes. This is not because it does not happen, but because filing a report against an impaired driver is inconvenient, and the police are unlikely to be able to confirm that the offense happened and charge the driver.
The TreeMap
The treemap represents the hierarchical crime data in a tree-like structure. Data, organized as branches and sub-branches, is drawn as nested rectangles. This makes it easy to distinguish between categories and compare their values at a glance.
library(treemap)
## Warning: package 'treemap' was built under R version 3.5.3
treemap(data,
index=c("Offense","Category of Offense"),
vSize="Precinct",
type="index",
fontsize.labels=c(15,12),
fontcolor.labels=c("white","orange"),
fontface.labels=c(2,1),
bg.labels=c("transparent"),
align.labels=list(
c("center", "center"),
c("center", "top")
),
overlap.labels=0.2,
inflate.labels=F
)
The treemap indicates the most commonly reported crimes in the Bronx are Harassment, Petit Larceny and Assault.
The most reported felonies are Grand Larceny, Felony Assault and Robbery.
Harassment is by far the most common type of violation reported, accounting for over 15% of all crime reported in the Bronx.
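One caveat about the treemap call above: as far as I understand `treemap()`, it sums the `vSize` column within each rectangle, so `vSize="Precinct"` sizes the rectangles by summed precinct numbers rather than by the number of reports. Below is a minimal sketch of one way to size by report count instead. The `toy` data frame, its values, and the `Category` and `Reports` column names are made up for illustration; the real data frame uses "Category of Offense".

```r
# Toy stand-in for the pre-processed data frame (values are made up)
toy <- data.frame(
  Offense  = c("HARASSMENT", "HARASSMENT", "PETIT LARCENY", "ROBBERY"),
  Category = c("VIOLATION", "VIOLATION", "MISDEMEANOR", "FELONY"),
  Precinct = c(40, 52, 44, 46),
  stringsAsFactors = FALSE
)

# Give every report a weight of 1, so that summing the column counts reports
toy$Reports <- 1

# One row per offense/category pair with its number of reports
report_counts <- aggregate(Reports ~ Offense + Category, data = toy, FUN = sum)
report_counts

# Draw the treemap only if the package is installed
if (requireNamespace("treemap", quietly = TRUE)) {
  treemap::treemap(report_counts,
                   index = c("Category", "Offense"),  # category as the parent level
                   vSize = "Reports",                 # rectangle area = number of reports
                   type  = "index")
}
```

On the real data, the same idea would amount to adding a column of ones and passing its name as `vSize`. Listing the category before the offense in `index` makes the category the outer level of the hierarchy, which may or may not be the layout you want.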
What I learned
Some crimes are more likely to be reported than others, and this dataset is affected by that bias. Crime is nearly always reported by the victim. Many 'victimless' offenses, such as prostitution and drug offenses, are underreported in this dataset for that reason.
Public perception of crime in the Bronx does not align with the data. Crime has been on the decline in New York for the last 10 years, especially in the Bronx. However, peace never makes it to the news, so public perception is still quite negative despite the strides being made here in the BX.