Exploratory Data Analysis (EDA) of NYC Civilian Complaint Review Board complaints closed in or after 2006

Objectives

The objective of this assignment is to conduct an exploratory data analysis of the NYC Data Transparency Initiative dataset. This is a database of complaints that fall within the jurisdiction of the Civilian Complaint Review Board (CCRB), an independent municipal agency. We aim to identify interesting patterns in the data that may be indicative of large-scale trends.

Load data

In this section we use the “readxl” package, which reads Excel files, to load the data. We also load “ggplot2” for plotting.

library(readxl)
library(ggplot2)
ccrb_data<- read_excel("/Users/yousiyan/Downloads/ccrb_datatransparencyinitiative.xlsx",sheet = "Complaints_Allegations")

1. Distribution of incidents over years.

First let’s look at the distribution of incidents over years.

# Note: the same complaint can appear in more than one row, so we keep only unique complaint IDs to count each incident once
inci.year<- unique(ccrb_data[c("UniqueComplaintId","Incident Year")])
inci.year<- data.frame(inci.year)
ggplot(inci.year,aes(Incident.Year))+geom_bar()

From the plot above, the number of reported incidents increased dramatically from 2005 to 2006, stayed roughly steady from 2006 to 2009, and then declined gradually from 2009 to 2016. Because the dataset only covers complaints closed in or after 2006, the low counts before 2006 likely reflect that cutoff. The later decline may reflect an actual drop in incidents, or it may simply be that fewer people reported them.
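The effect of the deduplication step can be seen on a small toy table. This is a minimal sketch that assumes made-up complaint IDs and years (they are not values from the CCRB file); it illustrates why counting rows without unique() would count allegations, not complaints.

```r
# Toy data (hypothetical, not from the CCRB file):
# complaint 1 has two rows and complaint 3 has three rows.
toy <- data.frame(
  UniqueComplaintId = c(1, 1, 2, 3, 3, 3),
  Incident.Year     = c(2006, 2006, 2006, 2007, 2007, 2007)
)

# Counting raw rows counts every allegation row.
raw.counts <- table(toy$Incident.Year)

# Keep one row per (complaint, year) pair, then count complaints per year.
toy.unique    <- unique(toy[c("UniqueComplaintId", "Incident.Year")])
unique.counts <- table(toy.unique$Incident.Year)

print(raw.counts)
print(unique.counts)
```

In this toy example the raw table reports three rows for each year, while the deduplicated table reports two complaints in 2006 and one in 2007.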

Stem-and-Leaf Plots

stem(inci.year$Incident.Year)
## 
##   The decimal point is at the |
## 
##   1999 | 00
##   2000 | 0
##   2001 | 
##   2002 | 000
##   2003 | 0000000
##   2004 | 00000000000000000000000000000000000000000000000000000000000000000000+122
##   2005 | 00000000000000000000000000000000000000000000000000000000000000000000+3344
##   2006 | 00000000000000000000000000000000000000000000000000000000000000000000+7618
##   2007 | 00000000000000000000000000000000000000000000000000000000000000000000+7464
##   2008 | 00000000000000000000000000000000000000000000000000000000000000000000+7263
##   2009 | 00000000000000000000000000000000000000000000000000000000000000000000+7549
##   2010 | 00000000000000000000000000000000000000000000000000000000000000000000+6381
##   2011 | 00000000000000000000000000000000000000000000000000000000000000000000+5932
##   2012 | 00000000000000000000000000000000000000000000000000000000000000000000+5675
##   2013 | 00000000000000000000000000000000000000000000000000000000000000000000+5330
##   2014 | 00000000000000000000000000000000000000000000000000000000000000000000+4670
##   2015 | 00000000000000000000000000000000000000000000000000000000000000000000+4322
##   2016 | 00000000000000000000000000000000000000000000000000000000000000000000+2769

From the stem-and-leaf plot above we can see that the number of incidents peaked in 2006 and decreased from 2009 to 2016.

2. Distribution of incidents across different areas in NYC.

Now let’s look at the distribution of incidents over different areas in NYC.

area.year<- unique(ccrb_data[c("UniqueComplaintId","Incident Year","Borough of Occurrence")])
area.year<- data.frame(area.year)
ggplot(area.year,aes(Incident.Year,fill=Borough.of.Occurrence))+geom_bar()

From the plot above, incidents appear to decrease fairly evenly over the years in all boroughs. Staten Island has the fewest incidents and Brooklyn has the most.

Histograms and frequency polygons of occurrence areas

ggplot(area.year,aes(Incident.Year,color=Borough.of.Occurrence))+geom_freqpoly(binwidth=1)

From the plot above, we can see that almost all areas show a tremendous increase from 2004 to 2005 and start to decrease after 2006. By 2016 the boroughs have almost the same number of incidents. Brooklyn has the highest incident counts, Manhattan and the Bronx are almost the same, Queens comes next, and Staten Island has the lowest counts.
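The borough ranking read off the plots can also be checked numerically with a cross-tabulation. The following is a minimal sketch on a hypothetical toy data frame (the rows are invented for illustration; only the column names mirror those used above).

```r
# Toy data (hypothetical): one row per unique complaint.
toy <- data.frame(
  Incident.Year         = c(2006, 2006, 2006, 2007, 2007, 2016),
  Borough.of.Occurrence = c("Brooklyn", "Brooklyn", "Queens",
                            "Brooklyn", "Bronx", "Queens")
)

# Rows are years, columns are boroughs.
counts <- table(toy$Incident.Year, toy$Borough.of.Occurrence)
print(counts)

# Total incidents per borough, largest first.
borough.totals <- sort(colSums(counts), decreasing = TRUE)
print(borough.totals)
```

Applying the same two calls to area.year would give the exact per-year counts behind the frequency polygons.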

3. Location of Incidents over different areas in NYC

Now let’s look at the location of incidents over different areas in NYC.

loc.area.year<- unique(ccrb_data[c("UniqueComplaintId","Incident Year","Incident Location")])
loc.area.year<- as.data.frame(table(loc.area.year$`Incident Year`,loc.area.year$`Incident Location`))
ggplot(loc.area.year,aes(Var1,Freq,color=Var2))+geom_point()

From the plot above, incidents reported at Street/highway locations show the biggest change over the years: the frequency was higher in 2006-2009 and then decreased steadily from 2010 to 2016. Apartment/house incidents increased slightly from 2006 to 2009 and remained steady from 2010 to 2015. The other locations do not show changes of this size.

4. Mode of Reporting per year

Now let’s look at mode of reporting incidents and whether there is a preference for one method over the other.

mode.year<- data.frame(unique(ccrb_data[c("UniqueComplaintId","Incident Year","Complaint Filed Mode")]))
ggplot(mode.year,aes(Complaint.Filed.Mode,fill = Complaint.Filed.Mode))+geom_bar()

From the above plot, Phone is the most used reporting mode, followed by the Call Processing System and then the online website.

5. Exploring Reasons for initial contact of incident reporting

Now let’s look at the reasons for initial contact of incident reporting.

reason.year<- data.frame(unique(ccrb_data[c("UniqueComplaintId","Incident Year","Reason For Initial Contact")]))
order<- data.frame(sort(table(reason.year$Reason.For.Initial.Contact),decreasing = TRUE))
ggplot(order[1:15,],aes(Var1,Freq))+geom_point()+coord_flip()

From the above plot, “P/D suspected C/V of Violation/Crime - Street” is the most frequent reason for initial contact. “Other” is the second most frequent; none of the remaining reasons reaches a comparable frequency.
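The ranking step used above (tabulate, sort in decreasing order, keep the top rows) can be sketched on a toy vector. The reason labels below are hypothetical placeholders, and the column name Reason is set explicitly for this example only.

```r
# Toy data (hypothetical labels, not taken from the CCRB file).
reasons <- c("Reason A", "Reason B", "Reason A", "Other",
             "Reason A", "Other", "Reason C")

# Tabulate, sort from most to least frequent, and convert to a data frame.
reason.counts <- as.data.frame(sort(table(reasons), decreasing = TRUE),
                               stringsAsFactors = FALSE)
names(reason.counts) <- c("Reason", "Freq")

# Keep the two most frequent reasons.
top.two <- head(reason.counts, 2)
print(top.two)
```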

6. Exploring Encounter outcomes for incidents

Now let’s look at Encounter outcomes for incidents.

outcome.year<- data.frame(unique(ccrb_data[c("UniqueComplaintId","Incident Year","Reason For Initial Contact","Encounter Outcome")]))
order<- data.frame(sort(table(outcome.year$Encounter.Outcome),decreasing = TRUE))
ggplot(order[1:4,],aes(Var1,Freq))+geom_point()

The majority of complaints fall under “No Arrests or Summons”. The second most common outcome is “Arrest”, which we explore further in the next section.

7. Exploring Encounter outcomes for incidents and its relation to reasons for initial contact

Now let’s look at encounter outcomes for incidents and how they relate to the reasons for initial contact.

reasons<- data.frame(sort(table(outcome.year$Reason.For.Initial.Contact),decreasing = TRUE))
outcome.year<- as.data.frame(outcome.year[outcome.year$Reason.For.Initial.Contact %in% reasons$Var1[1:5],])
ggplot(outcome.year,aes(Reason.For.Initial.Contact,fill=Encounter.Outcome))+geom_bar()+coord_flip()

This plot shows that the majority of cases in which the initial contact was a suspected violation/crime in the street led to arrests. The majority of cases with the reason “Other” led to no arrest or summons.
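A stacked bar chart shows counts, so a claim about a “majority” within each reason is easier to verify from row proportions. The following is a minimal sketch on hypothetical toy data (the reason and outcome rows are invented); it uses prop.table() with margin = 1 to compute the share of each outcome within each reason.

```r
# Toy data (hypothetical): one row per unique complaint.
toy <- data.frame(
  Reason.For.Initial.Contact = c("Reason A", "Reason A", "Reason A",
                                 "Reason B", "Reason B", "Reason B", "Reason B"),
  Encounter.Outcome          = c("Arrest", "Arrest", "Summons",
                                 "No Arrest or Summons", "No Arrest or Summons",
                                 "No Arrest or Summons", "Arrest")
)

# Counts of outcomes (columns) for each reason (rows).
counts <- table(toy$Reason.For.Initial.Contact, toy$Encounter.Outcome)

# Share of each outcome within a reason; every row sums to 1.
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))
```

An outcome is a majority for a given reason when its share in that row exceeds 0.5. In ggplot2, geom_bar(position = "fill") draws the same within-group proportions.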

Summary

In this exploratory data analysis of civilian complaints from the CCRB, we found several notable trends:

  1. Reported incidents peaked in 2006 and declined from 2009 to 2016.
  2. Brooklyn has the most incidents, whereas Staten Island has the fewest.
  3. Incidents reported at Street/highway locations show the biggest change over the years: the frequency was higher in 2006-2009 and then decreased steadily from 2010 to 2016. Apartment/house incidents increased slightly from 2006 to 2009 and remained steady from 2010 to 2015. The other locations do not show changes of this size.
  4. Phone is the most used reporting mode, followed by the Call Processing System.
  5. “P/D suspected C/V of Violation/Crime - Street” is the most frequent reason for initial contact.
  6. The majority of complaints fall under “No Arrests or Summons”.
  7. The majority of cases in which the initial contact was a suspected violation/crime in the street led to arrests. The majority of cases with the reason “Other” led to no arrest or summons.