Objectives

The objective of this assignment is to conduct an exploratory data analysis of a data set that you are not familiar with. In this week's lecture we discussed a number of visualization approaches to exploring a data set; this assignment will apply those tools and techniques. An important distinction between class examples and applied data science work is the iterative and repetitive nature of exploring a data set. It takes time to understand what is in the data and what is interesting in it.

For this week we will be exploring data from the NYC Data Transparency Initiative. They maintain a database of complaints that fall under the Civilian Complaint Review Board (CCRB), an independent municipal agency. Your objective is to identify interesting patterns and trends within the data that may be indicative of large-scale trends.

This link will allow you to download the data set in .xlsx format. The data file has two tabs: one with metadata, and the “Complaints_Allegations” tab with the actual data.
### Deliverable and Grades

For this assignment you should submit a link to a knitr-rendered HTML document that shows your exploratory data analysis. Organize your analysis using section headings:

# This is a top section

## This is a subsection

Your final document should include at least 10 visualizations. Each should include a brief statement of why you made the graphic.

A final section should summarize what you learned from your EDA. Your grade will be based on the quality of your graphics and the sophistication of your findings.

Load data from Excel

library(readxl)
library(ggplot2)


help("read_excel")
## starting httpd help server ...
##  done
data_CCRB <- read_excel("C:/Users/RBHARADWAJ/Desktop/HU/May 2017/Data Visualizations - Anly 512/ccrb_datatransparencyinitiative.xlsx")

Viz-1

Number of complaints by filing mode.

Temp <- table(data_CCRB$"Complaint Filed Mode")

barplot(Temp, xlab="Complaint Filed Mode", col="purple")
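The rows printed later in this report repeat the same UniqueComplaintId, which suggests that the sheet holds one row per allegation rather than one row per complaint. If that is the case, a plain table() over all rows counts allegations. A minimal sketch of counting each complaint once, using a small made-up data frame (the `toy` values are hypothetical, not taken from the CCRB file):

```r
# Toy data standing in for data_CCRB; the values are invented for illustration.
toy <- data.frame(
  UniqueComplaintId = c(11, 18, 18, 18, 25, 25),
  `Complaint Filed Mode` = c("Phone", "In-person", "In-person",
                             "In-person", "Phone", "Phone"),
  check.names = FALSE
)

# Keep the first row for each complaint id so every complaint is counted once.
complaints <- toy[!duplicated(toy$UniqueComplaintId), ]

# Count complaints (not allegations) per filing mode and plot them.
mode_counts <- table(complaints$`Complaint Filed Mode`)
barplot(mode_counts, xlab = "Complaint Filed Mode",
        ylab = "Number of complaints", col = "purple")
```

The same two steps should carry over to data_CCRB, assuming the filing mode is constant within a complaint.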

Viz-2

Complaints closed in a given year, broken down by incident year.

Temp <- table(data_CCRB$"Incident Year", data_CCRB$"Close Year")
# Stacked bars: one bar per close year, one colour per incident year
barplot(Temp, main="Cases closed each year, by incident year", xlab="Close Year", ylab="Number of cases",
        col=c("darkblue","red","green","grey","yellow","dodgerblue","orange","darkolivegreen","darkorange4","darkorchid","darkorchid1","darkorchid2"),
        legend.text = rownames(Temp))

Viz-3

Number of incidents that took place in each year. The largest number of incidents was reported in 2007.

hist(data_CCRB$"Incident Year", main="Histogram for Incident Year", xlab="Incident Year", border="red", breaks = 15, col="darkolivegreen3")
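hist() groups values into bins, and a single number passed as `breaks` is only a suggestion, so a bar may end up covering more than one year. Since the year is a discrete value, one way to check which year has the most rows is to count each year directly. A small sketch with made-up years (`incident_year` is a hypothetical vector, not the CCRB column):

```r
# Toy incident years; invented values for illustration only.
incident_year <- c(2005, 2006, 2006, 2007, 2007, 2007, 2008)

# One count per distinct year, with no binning involved.
year_counts <- table(incident_year)

# Name of the year with the largest count.
peak_year <- names(year_counts)[which.max(year_counts)]

# Bar plot with exactly one bar per year.
barplot(year_counts, xlab = "Incident Year", ylab = "Number of rows",
        col = "darkolivegreen3")
```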

Viz-4

Share of complaints by borough of occurrence.

Temp <- table(data_CCRB$"Borough of Occurrence")
pie(Temp, radius = 1, col = c("purple", "red", "green3","cornsilk", "cyan", "yellow","violet"))
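Slices of similar size are hard to compare by eye, so it can help to compute the shares explicitly. A minimal sketch using prop.table() on a made-up borough vector (`borough` is hypothetical data, not the CCRB column):

```r
# Toy borough labels; invented values for illustration only.
borough <- c("Brooklyn", "Brooklyn", "Bronx", "Manhattan",
             "Queens", "Brooklyn", "Manhattan", "Bronx")

# Counts per borough, largest first.
borough_counts <- sort(table(borough), decreasing = TRUE)

# Convert counts to percentages of the total.
borough_share <- round(100 * prop.table(borough_counts), 1)

# Sorted bars make the ranking easier to read than pie slices.
barplot(borough_share, ylab = "Share of rows (%)", col = "purple")
```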

Viz-5

Scatter plot comparing the year each complaint was received with the year it was closed.

# x axis: received year, y axis: close year
plot(data_CCRB$"Received Year", data_CCRB$"Close Year", main="Received Year vs Close Year", xlab="Received Year", ylab="Close Year", pch=15, col="red")
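Because both axes are whole years, many rows land on the same point and the plot cannot show how many cases each point represents. One option is to compute the gap between the two years directly. A sketch on a made-up data frame (the `toy` values are hypothetical), again keeping one row per complaint:

```r
# Toy data standing in for data_CCRB; the values are invented for illustration.
toy <- data.frame(
  UniqueComplaintId = c(11, 18, 18, 30, 41),
  `Received Year` = c(2005, 2004, 2004, 2006, 2006),
  `Close Year` = c(2006, 2006, 2006, 2006, 2007),
  check.names = FALSE
)

# One row per complaint so repeated allegations do not inflate the counts.
complaints <- toy[!duplicated(toy$UniqueComplaintId), ]

# Whole years between receiving and closing each complaint.
years_to_close <- complaints$`Close Year` - complaints$`Received Year`

# Distribution and average of the gap.
lag_counts <- table(years_to_close)
mean_lag <- mean(years_to_close)

barplot(lag_counts, xlab = "Years from received to closed",
        ylab = "Number of complaints", col = "red")
```

With only year-level columns the gap is coarse: a complaint received in December and closed in January still counts as one year.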

Viz-6

Print first 10 rows of data_CCRB

head(data_CCRB, n=10)
## # A tibble: 10 x 16
##     DateStamp UniqueComplaintId `Close Year` `Received Year`
##        <dttm>             <dbl>        <dbl>           <dbl>
##  1 2016-11-29                11         2006            2005
##  2 2016-11-29                18         2006            2004
##  3 2016-11-29                18         2006            2004
##  4 2016-11-29                18         2006            2004
##  5 2016-11-29                18         2006            2004
##  6 2016-11-29                18         2006            2004
##  7 2016-11-29                18         2006            2004
##  8 2016-11-29                18         2006            2004
##  9 2016-11-29                18         2006            2004
## 10 2016-11-29                18         2006            2004
## # ... with 12 more variables: `Borough of Occurrence` <chr>, `Is Full
## #   Investigation` <lgl>, `Complaint Has Video Evidence` <lgl>, `Complaint
## #   Filed Mode` <chr>, `Complaint Filed Place` <chr>, `Complaint Contains
## #   Stop & Frisk Allegations` <lgl>, `Incident Location` <chr>, `Incident
## #   Year` <dbl>, `Encounter Outcome` <chr>, `Reason For Initial
## #   Contact` <chr>, `Allegation FADO Type` <chr>, `Allegation
## #   Description` <chr>

Viz-7

Print last 10 rows of data_CCRB

tail(data_CCRB, n=10)
## # A tibble: 10 x 16
##     DateStamp UniqueComplaintId `Close Year` `Received Year`
##        <dttm>             <dbl>        <dbl>           <dbl>
##  1 2016-11-29             69463         2016            2016
##  2 2016-11-29             69469         2016            2016
##  3 2016-11-29             69475         2016            2016
##  4 2016-11-29             69475         2016            2016
##  5 2016-11-29             69476         2016            2016
##  6 2016-11-29             69476         2016            2016
##  7 2016-11-29             69476         2016            2016
##  8 2016-11-29             69476         2016            2016
##  9 2016-11-29             69476         2016            2016
## 10 2016-11-29             69476         2016            2016
## # ... with 12 more variables: `Borough of Occurrence` <chr>, `Is Full
## #   Investigation` <lgl>, `Complaint Has Video Evidence` <lgl>, `Complaint
## #   Filed Mode` <chr>, `Complaint Filed Place` <chr>, `Complaint Contains
## #   Stop & Frisk Allegations` <lgl>, `Incident Location` <chr>, `Incident
## #   Year` <dbl>, `Encounter Outcome` <chr>, `Reason For Initial
## #   Contact` <chr>, `Allegation FADO Type` <chr>, `Allegation
## #   Description` <chr>

Viz-8

Encounter outcomes by incident year.

Temp <- table(data_CCRB$"Encounter Outcome", data_CCRB$"Incident Year")
# Stacked bars: one bar per incident year, one colour per encounter outcome
barplot(Temp, main="Encounter outcome by incident year", xlab="Incident Year", ylab="Number of cases",
        col=c("red","green","dodgerblue","orange"), legend.text = rownames(Temp))
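Stacked counts mix two things: how many rows each year has and how the outcomes split within a year. To compare the split alone, the counts can be turned into within-year shares with prop.table(). A minimal sketch on made-up data (the `toy` outcome labels and years are hypothetical):

```r
# Toy outcomes and years; invented values for illustration only.
toy <- data.frame(
  outcome = c("Arrest", "Summons", "Arrest",
              "No Arrest or Summons", "Arrest", "Summons"),
  year = c(2005, 2005, 2006, 2006, 2006, 2006)
)

# Cross-tabulate outcome (rows) by year (columns).
counts <- table(toy$outcome, toy$year)

# margin = 2 divides each column by its total, so every year sums to 1.
shares <- prop.table(counts, margin = 2)

barplot(shares, xlab = "Incident Year", ylab = "Share of rows",
        col = c("red", "green", "dodgerblue"),
        legend.text = rownames(shares))
```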

Viz-9

Line graph showing how the number of complaints varies across closing years.

Temp <- table(data_CCRB$"Close Year")  # number of rows per close year
plot(as.numeric(names(Temp)), as.vector(Temp), type = "l", lwd=2, col="blue", xlab = "Close Year", ylab="Number of Complaints")

Viz-10

Distribution of complaints by allegation type and close year.

Temp <- table(data_CCRB$"Allegation FADO Type", data_CCRB$"Close Year")
# horiz = TRUE puts the counts on the x axis and the close years on the y axis
barplot(Temp, main="Complaints distributed over allegations each close year", xlab="Number of Complaints", ylab="Close Year",
        horiz = TRUE, col=c("coral4","coral3","coral2","coral1","coral"), legend.text = rownames(Temp))

Conclusion

This exploratory data exercise was very helpful in understanding the distribution and trends of the underlying data. Using visualization techniques with the CCRB data, we can easily see patterns that would otherwise stay hidden in a sea of data. Comparing the number of cases filed in a year with the number closed in that year gives us an idea of how long, on average, it takes to conclude a complaint. We can also explore which years or locations received more or fewer complaints, and then investigate whether a higher number of complaints can be attributed to more crime or to stricter policing. This exercise was very helpful for both data exploration and learning R.

Thanks,

Rupesh