Data607-Final-Project

Chris Ayre

5/12/2019

Bronx Crime in 2018

The Bronx has historically held the reputation of being a hub for violent crime. However, in recent years things have changed for the better. I will explore the 2018 crime dataset provided by the New York City Police Department to gain insight into the volume and types of crime affecting my favourite NYC borough. The data is hosted at data.cityofnewyork.us, which publishes all reported crimes in NYC. Let’s explore how incidents of crime in the Bronx varied throughout last year.

The Data

This dataset includes all confirmed valid felony, misdemeanor, and violation crimes in the Bronx reported to the New York City Police Department (NYPD) during 2018. It does not contain crime that went unreported or reports the NYPD could not confirm. It contains a total of 101,018 observations, each representing a single reported case. Each observation carries information about the reported crime, such as the description and category of the offense, the date and time, and descriptions of the victim and the suspect.

# Load the packages used throughout the analysis
library(dplyr)
library(tidyr)
library(ggplot2)

# Importing the data
data <- read.csv("https://raw.githubusercontent.com/Chris-Ayre/DS-Final/master/NYPD_Complaint_Data_2018.csv")

Pre-processing the data

The data must be organised so that it makes sense to our end users. We do not need every column of the dataset to explore the types of crime being reported in the Bronx. Specific variables are chosen, merged and renamed to make them easier for end users to consume.

# Selecting only the Columns I wish to work with from the dataset
data <- data %>% select(OFNS_DESC, ADDR_PCT_CD, LAW_CAT_CD, CMPLNT_FR_DT, SUSP_AGE_GROUP, SUSP_RACE, SUSP_SEX, VIC_AGE_GROUP, VIC_RACE, VIC_SEX)

# Rename the columns with end-user-friendly names that better represent the variables.
names(data) <- c("Offense", "Precinct", "Category of Offense", "Date Filed", "Suspect Age", "Suspect Race", "Suspect Sex", "Victim Age", "Victim Race", "Victim Sex")

# Merge the 3 suspect description columns into one (unite() replaces the deprecated unite_())
data <- unite(data, "Suspect", c("Suspect Age", "Suspect Race", "Suspect Sex"))

# Merge the 3 victim description columns into one
data <- unite(data, "Victim", c("Victim Age", "Victim Race", "Victim Sex"))
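
To make the merge step concrete, here is a minimal sketch on a hypothetical two-row frame (the values are made up for illustration, not taken from the dataset). It uses base R's paste() instead of tidyr so that it runs without extra packages. With its default separator "_", unite produces the same strings and also drops the source columns.

```r
# Hypothetical toy rows; the values are illustrative only
toy <- data.frame(
  `Suspect Age`  = c("25-44", "18-24"),
  `Suspect Race` = c("BLACK", "WHITE HISPANIC"),
  `Suspect Sex`  = c("M", "F"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

parts <- c("Suspect Age", "Suspect Race", "Suspect Sex")

# Join the three descriptors with "_" (the default separator of tidyr::unite)
toy$Suspect <- paste(
  toy[["Suspect Age"]], toy[["Suspect Race"]], toy[["Suspect Sex"]],
  sep = "_"
)

# Drop the source columns, as unite does by default
toy <- toy[, setdiff(names(toy), parts), drop = FALSE]

print(toy)
```

Missing descriptors would show up as the literal text "NA" inside the merged string, so it may be worth checking for them before merging.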

Analysis

datacat <- sort(table(data$"Category of Offense"),decreasing = TRUE)
datacat <- data.frame(datacat[datacat > 5000])
colnames(datacat) <- c("Category", "Frequency")
datacat$Percentage <- datacat$Frequency / sum(datacat$Frequency)*100

datacat
##      Category Frequency Percentage
## 1 MISDEMEANOR     57474   56.89481
## 2      FELONY     27194   26.91995
## 3   VIOLATION     16350   16.18523
# Bar chart of complaint counts per category; the legend identifies each bar
bp <- ggplot(datacat, aes(x=Category, y=Frequency, fill=Category)) +
  geom_bar(stat="identity") +
  theme(axis.text.x=element_blank())
bp

pie2 <- ggplot(datacat,
  aes(x="", y=Frequency, fill=Category)) +  
  geom_bar(stat="identity") +
  coord_polar(theta = "y") +
  scale_x_discrete("")
pie2

Misdemeanors accounted for more than half of all reported crime in the Bronx (57%). A Misdemeanor is an offense, other than a traffic infraction, for which a sentence of imprisonment of more than 15 days but not more than one year may be imposed.

Just under half as many Felonies (27,194) were reported as Misdemeanors. A Felony is the most serious type of criminal charge in New York State: an offense for which a sentence of imprisonment in excess of one year may be imposed.

A Violation is an offense, other than a traffic infraction, for which a sentence of imprisonment of up to 15 days may be imposed. Many Violations go unreported due to their low severity. Harassment is a common form of violation. Violations accounted for 16% of the crime reported in the Bronx.
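
The shares quoted above can be re-derived from the counts in the printed frequency table. The sketch below only repeats that arithmetic; the variable names are my own.

```r
# Counts copied from the frequency table printed earlier in this section
freq <- c(MISDEMEANOR = 57474, FELONY = 27194, VIOLATION = 16350)

total <- sum(freq)                     # total number of complaints
share <- round(freq / total * 100, 2)  # percentage share per category

print(total)
print(share)

# Felonies relative to misdemeanors
ratio <- round(freq[["FELONY"]] / freq[["MISDEMEANOR"]], 2)
print(ratio)
```

The three counts add up to 101,018 complaints, and the felony-to-misdemeanor ratio comes out at roughly 0.47, which is where "just under half" comes from.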

Summarizing the data by high and low occurring incident categories.

datalow <- sort(table(data$Offense),decreasing = TRUE)
datalow <- data.frame(datalow[datalow < 10])
colnames(datalow) <- c("Category", "Frequency")
datalow$Percentage <- datalow$Frequency / sum(datalow$Frequency)*100

datahigh <- sort(table(data$Offense),decreasing = TRUE)
datahigh <- data.frame(datahigh[datahigh > 3000])
colnames(datahigh) <- c("Category", "Frequency")
datahigh$Percentage <- datahigh$Frequency / sum(datahigh$Frequency)*100
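
The two blocks above repeat the same count, filter and percentage pattern. As a sketch, it could be wrapped in a small helper. The function name summarise_counts and the toy offense labels are hypothetical. Note that Percentage is each category's share of the categories that pass the threshold, not of all complaints.

```r
# Hypothetical helper: count categories, keep those passing `keep`,
# and express each kept count as a share of the kept total
summarise_counts <- function(x, keep) {
  counts <- sort(table(x), decreasing = TRUE)
  counts <- counts[keep(counts)]
  out <- data.frame(
    Category  = names(counts),
    Frequency = as.integer(counts),
    stringsAsFactors = FALSE
  )
  out$Percentage <- out$Frequency / sum(out$Frequency) * 100
  out
}

# Made-up offense labels for illustration only
offenses <- c(rep("HARASSMENT", 5), rep("PETIT LARCENY", 3), "ARSON")

high <- summarise_counts(offenses, function(n) n > 2)
low  <- summarise_counts(offenses, function(n) n < 2)

print(high)
print(low)
```

With the real data, the thresholds would be `n > 3000` and `n < 10`, matching the code above.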

The most reported crimes

pie <- ggplot(datahigh,
  aes(x=Category, y=Frequency, fill=Category)) +  
  geom_bar(stat="identity") +
  coord_polar(theta = "x") +
  scale_x_discrete("")

pie

Harassment, Petit Larceny, Assault and Criminal Mischief account for the vast majority of reported crimes in the Bronx. This is consistent with daily life throughout NYC - I imagine all New Yorkers have witnessed a crime falling into one of these categories.

# Bar chart of the most reported offenses; the legend identifies each bar
ofbp <- ggplot(datahigh, 
  aes(x=Category, y=Frequency, fill=Category)) + 
  geom_bar(stat="identity") + 
  theme(axis.text.x=element_blank())

ofbp

The least reported crimes

# Bar chart of the least reported offenses; the legend identifies each bar
ofbp2 <- ggplot(datalow, 
  aes(x=Category, y=Frequency, fill=Category)) + 
  geom_bar(stat="identity") + 
  theme(axis.text.x=element_blank())

ofbp2

Reported only twice last year, driving while impaired or intoxicated is among the least reported crimes. This is not because it does not happen, but because it is inconvenient to file a report against an impaired driver, and the police are then unlikely to be able to confirm that the offense happened and charge the driver.

The TreeMap

The treemap represents the hierarchical crime data in a tree-like structure. Data, organized as branches and sub-branches, is represented using nested rectangles. This makes it easy to distinguish categories and compare their sizes at a glance.

library(treemap)
# Size each rectangle by the number of complaints rather than by the precinct code
data$Count <- 1
treemap(data, 
        index=c("Offense","Category of Offense"),
        vSize="Count", 
        type="index",
        fontsize.labels=c(15,12),
        fontcolor.labels=c("white","orange"),
        fontface.labels=c(2,1), 
        bg.labels=c("transparent"),
        align.labels=list(
          c("center", "center"), 
          c("center", "top")
        ),                                 
        overlap.labels=0.2,                     
        inflate.labels=FALSE
      )

The treemap indicates that the most commonly reported crimes in the Bronx are Harassment, Petit Larceny and Assault.

The most reported Felonies are Grand Larceny, Felony Assault and Robbery.

Harassment is by far the most common type of violation reported, accounting for over 15% of all crime reported in the Bronx.
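
Rankings read off a treemap can be cross-checked with a plain count table. Below is a minimal sketch on a hypothetical toy frame (the column names Offense and Category and every row are made up for illustration), using base R's aggregate() to count complaints per offense and category.

```r
# Made-up rows for illustration; the real frame has one row per complaint
toy <- data.frame(
  Offense  = c("HARASSMENT", "HARASSMENT", "HARASSMENT",
               "GRAND LARCENY", "GRAND LARCENY", "ROBBERY"),
  Category = c("VIOLATION", "VIOLATION", "VIOLATION",
               "FELONY", "FELONY", "FELONY"),
  stringsAsFactors = FALSE
)

# One unit per complaint, summed within each offense/category pair
toy$Count <- 1L
agg <- aggregate(Count ~ Offense + Category, data = toy, FUN = sum)

# Largest groups first, then each group's share of all complaints
agg <- agg[order(-agg$Count), ]
agg$Share <- agg$Count / sum(agg$Count) * 100

print(agg)
```

On the real data the same steps would use the "Offense" and "Category of Offense" columns, and the Share column would give the percentage of all Bronx complaints in each group.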

What I learned

Some crimes are more likely to be reported than others, and this dataset is affected by that bias. Crime is nearly always reported by the victim. Many ‘victimless’ offenses, such as prostitution and drug offenses, are underreported in this dataset for that reason.

Public perception about crime in the Bronx does not align with the data. Crime has been on the decline in New York for the last 10 years, especially in the Bronx. However, peace never makes it to the news, so public perception is still quite negative despite the strides being made here in the BX.