Objectives

Data source: download the “Complaints_Allegations” data set in .xlsx format from the link.

Import Data

library(readxl)
library(dplyr)
library(ggplot2)
library(ggthemes)

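# After downloading the file to my computer, I removed the first sheet (the data description).
ccrb <- read_excel("~/Downloads/ccrb.xlsx")
View(ccrb)
# data.frame() makes column names syntactic, e.g. "Complaint Filed Mode" becomes Complaint.Filed.Mode
df <- data.frame(ccrb)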

Viz 1 The Number of Complaints Filed in Different Modes

Pie Chart

#Pie Chart
pie_data = table(df$Complaint.Filed.Mode) 
pie(pie_data,radius = 1, col = c("blue","yellow","green3","pink","violet","red","orange"))

## However, the pie chart does not show all the complaint filing modes clearly, so I created a bar chart.

Bar Chart

Temp <- table(ccrb$"Complaint Filed Mode")
barplot(Temp, xlab="Complaint Filed Mode", col="light blue")

## As the bar chart shows, most people file their complaints by phone. The second most frequent method is the Call Processing System. Next, we would like to find out which NYC borough has the most complaints.
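To put numbers behind the bar chart, the counts can be sorted and converted to shares. The following is a minimal sketch on a small made-up data frame: the rows in `toy` are illustrative and not taken from the CCRB file, and only the column name matches. With the real data, `toy` would be replaced by `df`.

```r
# Made-up example rows; only the column name matches the real data frame.
toy <- data.frame(
  Complaint.Filed.Mode = c("Phone", "Phone", "Phone", "Call Processing System",
                           "Call Processing System", "Mail", "In-person", "Phone")
)

# Count each filing mode and sort from most to least frequent.
mode_counts <- sort(table(toy$Complaint.Filed.Mode), decreasing = TRUE)

# Convert the counts to percentages of all rows.
mode_share <- round(100 * prop.table(mode_counts), 1)

print(mode_counts)
print(mode_share)
```

Passing `mode_counts` to `barplot()` would draw the bars in descending order, which makes the ranking of the modes easier to read than the default alphabetical order.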

Viz 2 Distribution of Complaints in 5 Boroughs and Out of NYC

Pie Chart

loc <- table(ccrb$"Borough of Occurrence")
pie(loc, radius = 1, col = c("pink", "light blue", "green3","cyan", "cornsilk", "yellow","violet"))

## Examining the pie chart, I find that most complaints occur in Brooklyn, followed by the Bronx and Manhattan. The shares of the Bronx and Manhattan are close. Therefore, I use a bar chart to dig into the details of the incident locations in each NYC borough.

Viz 3 Bar Chart of Incident Location

# labs() needs a lowercase y to set the y-axis label
ggplot(df, aes(x = Borough.of.Occurrence, fill = Incident.Location)) +
  geom_bar(stat = 'count') +
  labs(title = "Incident Location", x = "Location", y = "Number of Complaints") +
  theme_bw()

# As the bar chart shows, street/highway is the most common incident location, followed by residential buildings. Citizens can usually report complaints by phone. Having looked at how and where complaints are filed, I would like to learn more about how efficiently they are resolved.

Viz 4 Complaints by Received Year

df.by.receiveyear <- df %>%
  group_by(Received.Year) %>%
  summarize(num_case = n_distinct(UniqueComplaintId)) %>%
  select(Received.Year, num_case)

ggplot(data = df.by.receiveyear, aes(x = Received.Year, y = num_case)) + 
  geom_line(alpha = 0.5) + 
  ggtitle('Number of Complaints by Received Year') + 
  xlab('Received Year') + 
  ylab('Number of Cases') + 
  theme_economist()

The most complaints were received between 2005 and 2010. After 2009, the number of complaints received gradually decreased.

Viz 5 The Number of Incident Area by Year

# Refer to columns by name inside aes() rather than with ccrb$...
ggplot(ccrb, aes(x = `Incident Year`, fill = `Borough of Occurrence`)) +
  geom_bar() +
  guides(fill = guide_legend(title = "Incident Boroughs")) +
  theme_dark()

## Incidents peaked in 2006. The number of incidents in Brooklyn remained high for four to five years after 2006.

Viz 6 Summary of Full Investigation Or Not

# labs() needs a lowercase y to set the y-axis label
ggplot(df, aes(x = Borough.of.Occurrence, fill = Is.Full.Investigation)) +
  geom_bar(stat = 'count') +
  labs(title = "Full Investigation True or False", x = "Location", y = "Count")

## The depth of the investigation is one of the key factors affecting how efficiently complaints are resolved. According to the data, half of the complaints were fully investigated.

Viz 7 Summary of Complaints Have Video Evidence or Not

# labs() needs a lowercase y to set the y-axis label
ggplot(df, aes(x = Borough.of.Occurrence, fill = Complaint.Has.Video.Evidence)) +
  geom_bar(stat = 'count') +
  labs(title = "Complaints Have Video Evidence or Not",
       x = "Location", y = "Number of Complaints") +
  theme_bw()

## If video evidence is available, complaints can be resolved much more efficiently. According to the data, no more than 10% of complaints have video evidence, which points to how difficult the complaints are to resolve.
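A share such as the one quoted above can be computed directly instead of being read off the chart. The sketch below uses made-up rows, not the CCRB data, and rests on two assumptions: that `Complaint.Has.Video.Evidence` is a logical column, and that a complaint can appear on more than one row (as the use of `distinct()` on `UniqueComplaintId` in Viz 10 suggests), so duplicates are dropped before taking the proportion.

```r
# Made-up example rows: a complaint ID can repeat across rows.
toy <- data.frame(
  UniqueComplaintId = c(1, 1, 2, 3, 3, 3, 4, 5),
  Complaint.Has.Video.Evidence = c(FALSE, FALSE, TRUE, FALSE,
                                   FALSE, FALSE, FALSE, FALSE)
)

# Keep one row per complaint so complaints with many rows are not over-counted.
complaints <- toy[!duplicated(toy$UniqueComplaintId), ]

# mean() of a logical vector is the proportion of TRUE values.
video_share <- mean(complaints$Complaint.Has.Video.Evidence)

print(nrow(complaints))
print(video_share)
```

The same pattern applies to `Is.Full.Investigation` in Viz 6. If the column were stored as text rather than as logical values, it would need to be compared against the "true" label first.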

Viz 8 The Efficiency of Complaint Case Closing Time

# Rows are incident years (stacked segments); columns are close years (bars).
Temp <- table(ccrb$"Incident Year", ccrb$"Close Year")
barplot(Temp, main = "Cases Closed in One year", xlab = "Close Year", ylab = "Number of Cases",
        col = c("pink", "violet", "lightgreen", "grey", "yellow", "lightblue", "white",
                "darkolivegreen", "orange4", "darkorchid", "blue3", "darkgreen"),
        legend = rownames(Temp))

## Comparing the close year with the incident year shows how long complaints take to resolve. Taking one year as the standard for resolving a complaint, I found that most complaints took more than one year to close.

Viz 9 Number Of Complaints by Close Year

df.by.closeyear <- df %>%
  group_by(Close.Year) %>%
  summarize(num_case = n_distinct(UniqueComplaintId)) %>%
  select(Close.Year, num_case)

ggplot(data = df.by.closeyear, aes(x = Close.Year, y = num_case)) + 
  geom_line(alpha = 0.5) + 
  ggtitle('Number of Complaints by Close Year') + 
  xlab('Close Year') + 
  ylab('Number of Cases') + 
  theme_economist()

## Looking at the data, I found that 2008 and 2012 had the fewest closed complaints. We might need more data to explain why so few complaints were closed in those years.

Viz 10 Time Length for Complaints to Be Processed

df.dif <- df %>%
  distinct(UniqueComplaintId, .keep_all = TRUE) %>%
  mutate(time_length = Close.Year - Received.Year)

ggplot(data = df.dif, aes(x = time_length)) + 
  geom_bar(width = 0.5, alpha = 0.5) + 
  labs(title = 'Time Length for Complaints to Be Processed', x = 'Time Length (Years)') +
  theme_economist()
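The bar chart of `time_length` can be backed by a cumulative share, which shows directly what fraction of complaints closed within a given number of years. This is a minimal base R sketch on made-up rows (not the CCRB data) that mirrors the `Close.Year - Received.Year` calculation above.

```r
# Made-up example rows; complaint 1 appears twice to mimic repeated IDs.
toy <- data.frame(
  UniqueComplaintId = c(1, 1, 2, 3, 4, 5, 6),
  Received.Year     = c(2005, 2005, 2006, 2006, 2007, 2008, 2009),
  Close.Year        = c(2005, 2005, 2007, 2006, 2009, 2009, 2009)
)

# One row per complaint, then the difference in calendar years.
complaints <- toy[!duplicated(toy$UniqueComplaintId), ]
complaints$time_length <- complaints$Close.Year - complaints$Received.Year

# Number of complaints for each time length, and the running share.
length_counts <- table(complaints$time_length)
cum_share <- cumsum(prop.table(length_counts))

print(length_counts)
print(cum_share)
```

Because only years are compared, the measure is coarse: a difference of 0 means the complaint closed in the calendar year it was received, while a difference of 1 can correspond to anything from a few days to almost two years. Full received and closed dates, if available, would give a more precise duration.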

Summary

Although the charts in Viz 5 (The Number of Incident Area by Year) and Viz 9 (Number Of Complaints by Close Year) might not make us optimistic about how efficiently complaints are resolved, we can still work out exactly how long the complaints took to resolve. I calculated the resolution time by mutating Close.Year - Received.Year. In fact, half of the complaints were resolved within a year. Since most of the complaints were filed without video evidence, I consider this a good level of efficiency for the police officers and the investigators. Finally, I believe that as cell phones improve, bystanders will have the chance to record incidents on their phones and upload the footage over Wi-Fi, and I think the number of complaints will decrease.