Objectives

The objective of this assignment is to conduct an exploratory data analysis of a data set that you are not familiar with. In this week's lecture we discussed a number of visualization approaches to exploring a data set; this assignment will apply those tools and techniques. An important distinction between class examples and applied data science work is the iterative and repetitive nature of exploring a data set. It takes time to understand what is in the data and what is interesting in the data.

For this week we will be exploring data from the NYC Data Transparency Initiative. They maintain a database of complaints that fall within the jurisdiction of the Civilian Complaint Review Board (CCRB), an independent municipal agency. Your objective is to identify interesting patterns and trends within the data that may be indicative of large-scale trends.

This link will allow you to download the data set in .xlsx format. The data file has two tabs: one with metadata, and the “Complaints_Allegations” tab with the actual data.
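
As a minimal sketch of the loading step, the data tab can be read by name with `readxl::read_excel()`. The file name below is a hypothetical placeholder (the download path is not given here), and the helper `load_complaints()` is introduced only for illustration.

```r
# Hypothetical local file name; replace with wherever the download was saved.
ccrb_path <- "ccrb_complaints.xlsx"

# Illustrative helper (not part of any package): read one tab of the workbook.
load_complaints <- function(path, sheet = "Complaints_Allegations") {
  if (!file.exists(path)) {
    message("File not found: ", path)
    return(NULL)
  }
  # read_excel() selects a worksheet by name or position through `sheet`.
  as.data.frame(readxl::read_excel(path, sheet = sheet))
}

complaints <- load_complaints(ccrb_path)
if (!is.null(complaints)) {
  str(complaints)  # inspect column names and types before plotting
}
```

Calling `readxl::excel_sheets(ccrb_path)` first is a quick way to confirm the tab names before reading.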


Number of complaints in each borough.
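
A count like this could be produced roughly as follows. This is a sketch on a small made-up data frame, not the report's actual code: the column names `complaint_id` and `borough` are hypothetical stand-ins, and it assumes the data tab holds one row per allegation, so that several rows can share a complaint. The plot is only built if ggplot2 is installed.

```r
# Toy stand-in for the real sheet; values and column names are made up.
toy <- data.frame(
  complaint_id = c(1, 1, 2, 3, 3, 3, 4, 5),
  borough      = c("Brooklyn", "Brooklyn", "Bronx", "Brooklyn",
                   "Brooklyn", "Brooklyn", "Queens", "Bronx"),
  stringsAsFactors = FALSE
)

# Counting rows counts allegations; keeping the first row per id counts complaints.
allegations_by_borough <- table(toy$borough)
one_per_complaint      <- toy[!duplicated(toy$complaint_id), ]
complaints_by_borough  <- table(one_per_complaint$borough)

print(sort(allegations_by_borough, decreasing = TRUE))
print(sort(complaints_by_borough, decreasing = TRUE))

# Bar chart of complaints per borough; call print(p) to draw it.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(one_per_complaint, aes(x = borough)) +
    geom_bar() +
    labs(x = "Borough", y = "Number of complaints")
}
```

The two tables can rank boroughs differently, so it is worth stating whether a chart counts complaints or allegations.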

Full investigation closure by borough.

How complaints are filed in each borough.

How complaints are filed in each borough and with video evidence
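
One way to combine filing mode, borough, and video evidence is a cross-tabulation plus a faceted bar chart. The sketch below uses made-up rows and hypothetical column names (`borough`, `filed_mode`, `has_video`); the real sheet's names and category labels may differ.

```r
# Toy stand-in data; values and column names are illustrative only.
toy <- data.frame(
  borough    = c("Brooklyn", "Brooklyn", "Bronx", "Bronx", "Queens", "Brooklyn"),
  filed_mode = c("Phone", "Phone", "Phone", "In-person", "Online", "Mail"),
  has_video  = c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Borough x filing mode counts.
mode_by_borough <- table(toy$borough, toy$filed_mode)
print(mode_by_borough)

# Share of rows with video evidence in each borough (mean of a logical vector).
video_share <- tapply(toy$has_video, toy$borough, mean)
print(round(video_share, 2))

# Stacked bars by filing mode, one panel per video-evidence value; print(p) draws it.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = borough, fill = filed_mode)) +
    geom_bar() +
    facet_wrap(~ has_video, labeller = label_both) +
    labs(x = "Borough", y = "Count", fill = "Filing mode")
}
```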

Where incidents happened in each borough.

Where complaints were filed in each borough.

How incidents were tackled each year since 1999.

Encounter outcome of incidents by borough.

Arrests per year.

Arrests per year per FADO type.
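
A yearly arrest count split by FADO type (Force, Abuse of Authority, Discourtesy, Offensive Language) could be computed along these lines. The rows are made up, and the column names `incident_year`, `encounter_outcome`, and `fado_type`, as well as the outcome label `"Arrest"`, are assumptions about how the sheet encodes these fields.

```r
# Toy stand-in data; values, labels, and column names are illustrative only.
toy <- data.frame(
  incident_year     = c(2005, 2005, 2005, 2006, 2006, 2006, 2006),
  encounter_outcome = c("Arrest", "Summons", "Arrest", "Arrest",
                        "No Arrest or Summons", "Arrest", "Arrest"),
  fado_type         = c("Force", "Discourtesy", "Force", "Force",
                        "Abuse of Authority", "Discourtesy", "Force"),
  stringsAsFactors = FALSE
)

# Keep only rows whose encounter ended in an arrest.
arrests <- toy[toy$encounter_outcome == "Arrest", ]

# Count rows per (year, FADO type); combinations with no rows get n = 0.
arrest_counts <- as.data.frame(
  table(year = arrests$incident_year, fado_type = arrests$fado_type),
  responseName = "n",
  stringsAsFactors = FALSE
)
arrest_counts$year <- as.integer(arrest_counts$year)
print(arrest_counts)

# One line per FADO type over time; print(p) draws it.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(arrest_counts, aes(x = year, y = n, colour = fado_type)) +
    geom_line() +
    geom_point() +
    labs(x = "Incident year", y = "Arrests", colour = "FADO type")
}
```

Dropping the `fado_type` dimension from the `table()` call gives the plain arrests-per-year series.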

Summary

The visualizations above suggest the following:

1. The most allegations came from Brooklyn; the Bronx is in second position.
2. Not all allegations or complaints are fully investigated and closed.
3. The largest number of complaints were logged through phone calls.
4. It is rare for a complaint to be filed with video evidence.
5. Most incidents happened on the street, so streets appear to be the least safe location.
6. Complaints were mostly filed with the CCRB itself.
7. The incident encounter rate has been falling since 2005.
8. Discourtesy and offensive language complaint rates are high, but the encounter rate for those kinds of complaints is low.

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.