Objectives

The objective of this assignment is to conduct an exploratory data analysis (EDA) of a data set that you are not familiar with. In this week's lecture we discussed a number of visualization approaches to exploring a data set; this assignment applies those tools and techniques. An important distinction between class examples and applied data science work is the iterative and repetitive nature of exploring a data set. It takes time to understand what is in the data and what is interesting about it.

For this week we will be exploring data from the NYC Data Transparency Initiative. They maintain a database of complaints that fall within the jurisdiction of the Civilian Complaint Review Board (CCRB), an independent municipal agency. Your objective is to identify interesting patterns within the data that may be indicative of large-scale trends.

This link will allow you to download the data set in .xlsx format. The data file has two tabs: one with metadata, and the "Complaints_Allegations" tab with the actual data.

Deliverables and Grades

Your final document should include at least 10 visualizations. Each should include a brief statement of why you made the graphic.

A final section should summarize what you learned from your EDA. Your grade will be based on the quality of your graphics and the sophistication of your findings.

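# readxl reads .xlsx workbooks; ggplot2 is used for most of the plots below
library(readxl)
library(ggplot2)

# The workbook has a metadata tab and a "Complaints_Allegations" tab;
# read_excel() reads the first sheet by default, so name the data tab explicitly
nyc <- read_excel("F:/HU/ANLY 512/Problem Set 4/nyc.xlsx", sheet = "Complaints_Allegations")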
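Before plotting, it can help to confirm that the columns used below are present and to see how many values are missing in each. The sketch below runs on a small hypothetical data frame standing in for nyc; the column names are taken from the code in this report, but every value is an invented placeholder, not CCRB data.

```r
# Hypothetical stand-in for `nyc`: column names match this report,
# values are made up for illustration only
nyc_demo <- data.frame(
  `Incident Year`         = c(2006, 2007, 2007, NA),
  `Close Year`            = c(2007, 2008, 2007, 2009),
  `Borough of Occurrence` = c("Brooklyn", "Bronx", NA, "Brooklyn"),
  check.names = FALSE  # keep the spaces in the column names
)

# Columns that the plots in this report rely on
needed <- c("Incident Year", "Close Year", "Borough of Occurrence")

# Any required column that is absent from the data (empty if all present)
missing_cols <- setdiff(needed, names(nyc_demo))

# Number of NA values in each column
na_counts <- colSums(is.na(nyc_demo))
print(missing_cols)
print(na_counts)
```

On the real data, replace nyc_demo with nyc and extend needed with the other column names used in the plots.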

Visualization 1

Number of Incidents Per Year

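yrs <- nyc$`Incident Year`  # breaks below give one histogram bin per calendar year
hist(yrs, main = "Incidents per Year", xlab = "Year", border = "blue", col = "lightblue", breaks = seq(min(yrs, na.rm = TRUE) - 0.5, max(yrs, na.rm = TRUE) + 0.5, by = 1))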

2007 was the year with the most reported incidents.
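Because a histogram shows bar heights rather than exact counts, the peak year can be double-checked with a frequency table. A minimal sketch, using a small hypothetical vector of years in place of nyc$`Incident Year` (the values are placeholders, not CCRB data):

```r
# Hypothetical incident years standing in for nyc$`Incident Year`
years <- c(2005, 2006, 2007, 2007, 2007, 2008, 2008, 2009)

# Count incidents per year (table() drops NA values by default)
per_year <- table(years)

# Year with the highest count; which.max() returns the first maximum on ties
peak_year <- names(which.max(per_year))
print(per_year)
print(peak_year)
```

With the real data, a tie between years would only report the earlier one, so it is worth printing per_year as well.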

Visualization 2

Action on Complaints Per Year

# Cross-tabulate outcome (rows) by year (columns); barplot() stacks the rows
Temp <- table(nyc$`Encounter Outcome`, nyc$`Incident Year`)
barplot(Temp, main = "Action on Complaints", xlab = "Incident Year", ylab = "Number of cases", col = rainbow(nrow(Temp)), legend.text = rownames(Temp))

In 2007, there were more arrests than in any other year.
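Raw counts in a stacked bar chart grow with the total number of complaints in a year, so a busy year will tend to show more arrests even if arrests are not more likely. Column proportions compare the share of each outcome within each year instead. A minimal sketch on hypothetical data; the outcome labels are placeholders, not the actual codes in the Encounter Outcome column:

```r
# Hypothetical outcomes and years standing in for nyc$`Encounter Outcome`
# and nyc$`Incident Year`; labels are placeholders, not the real codes
outcome <- c("Arrest", "Arrest", "Summons", "Other",
             "Arrest", "Other", "Other", "Summons")
year    <- c(2006, 2006, 2006, 2006, 2007, 2007, 2007, 2007)

counts <- table(outcome, year)

# margin = 2 divides each column by its total, so every year sums to 1
shares <- prop.table(counts, margin = 2)
print(round(shares, 2))
```

The resulting shares table can be passed to barplot() in place of Temp to draw a 100% stacked bar chart.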

Visualization 3

Proportion of Complaints per Borough

Temp <- table(nyc$`Borough of Occurrence`)
pie(Temp, radius = 1, col = rainbow(length(Temp)), main = "Complaints by Borough of Occurrence")  # one distinct color per borough

Brooklyn accounted for the largest share of complaints.
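Pie charts make it hard to tell whether the largest slice is a majority (more than half) or only a plurality. A minimal sketch that checks this numerically, using a hypothetical vector of boroughs in place of nyc$`Borough of Occurrence` (the counts are placeholders):

```r
# Hypothetical boroughs standing in for nyc$`Borough of Occurrence`
borough <- c("Brooklyn", "Brooklyn", "Brooklyn", "Bronx",
             "Manhattan", "Queens", "Bronx", "Staten Island")

# Share of complaints per borough, largest first
shares <- sort(prop.table(table(borough)), decreasing = TRUE)
top <- names(shares)[1]

# "Majority" means more than half; otherwise the top borough is a plurality
is_majority <- shares[[1]] > 0.5
print(round(100 * shares, 1))
cat(top, if (is_majority) "holds a majority" else "holds a plurality", "\n")
```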

Visualization 4

Incident Year by Incident Location

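# One box per location type, showing the spread of incident years
loc <- ggplot(nyc, aes(x = `Incident Location`, y = `Incident Year`))
loc + geom_boxplot() + coord_flip()

Each box summarizes the incident years recorded for one type of incident location.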
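A boxplot does not show how many complaints sit behind each box, and a box drawn from a handful of cases is much less reliable than one drawn from thousands. A minimal sketch that prints the median year and the case count per location, using hypothetical data; the location labels are placeholders, not the actual values in the Incident Location column:

```r
# Hypothetical data standing in for nyc$`Incident Location` and
# nyc$`Incident Year`; location labels are placeholders
location <- c("Street", "Street", "Street", "Residence", "Residence", "Other")
year     <- c(2005, 2007, 2009, 2010, 2012, 2008)

# Median incident year per location: the center line of each box
med_year <- tapply(year, location, median)

# Number of cases per location, to judge how much each box can be trusted
n_cases <- table(location)
print(med_year)
print(n_cases)
```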

Visualization 5

Incident Year Compared to Close Year

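# Both axes are whole years, so jitter the points to reduce overplotting; the lm fit uses the unjittered data
ggplot(nyc, aes(x = `Incident Year`, y = `Close Year`)) + geom_jitter(width = 0.2, height = 0.2, alpha = 0.2) + geom_smooth(method = "lm") + labs(title = "Incident Year vs Close Year", x = "Incident Year", y = "Close Year")

This plot compares the year in which each complaint's incident occurred with the year in which the complaint was closed.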
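The gap between the two years is often easier to read directly than from a scatter plot. A minimal sketch that tabulates how many years each complaint stayed open, using hypothetical years in place of nyc$`Incident Year` and nyc$`Close Year`:

```r
# Hypothetical years standing in for nyc$`Incident Year` and nyc$`Close Year`
incident <- c(2006, 2006, 2007, 2007, 2008, 2008)
closed   <- c(2006, 2007, 2007, 2008, 2009, 2010)

# Whole years elapsed between the incident and the close of the complaint
years_open <- closed - incident

# How many complaints closed after 0, 1, 2, ... years
open_counts <- table(years_open)
print(open_counts)
print(summary(years_open))
```

Because both columns are calendar years, a value of 0 means "closed in the same calendar year", not "closed immediately".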

Visualization 6

Most Frequent Incident Types

# Rotate the x labels because the allegation descriptions are long
ggplot(nyc, aes(x = `Allegation Description`)) + geom_bar(width = 0.5) + theme(axis.text.x = element_text(angle = 90, hjust = 1))

The most frequent incident types are "Physical force" and "Word".
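With many distinct descriptions, the bar chart's x-axis gets crowded and the leaders are hard to pick out. A minimal sketch that lists the top few descriptions by count, using a hypothetical vector in place of nyc$`Allegation Description`; apart from the two descriptions named above, the labels are placeholders:

```r
# Hypothetical descriptions standing in for nyc$`Allegation Description`;
# "Other A/B/C" are placeholder labels
desc <- c("Physical force", "Word", "Physical force", "Other A",
          "Word", "Physical force", "Other B", "Word", "Other C")

# Most common descriptions first, keeping only the top few
top_n <- 3
top_desc <- head(sort(table(desc), decreasing = TRUE), top_n)
print(top_desc)
```

The names of top_desc can then be used to subset the data before plotting, which keeps the x-axis readable.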

Visualization 7

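Video Evidence and Full Investigation

ggplot(nyc, aes(x = `Complaint Has Video Evidence`, fill = `Is Full Investigation`)) + geom_bar() + guides(fill = guide_legend(title = "Full investigation")) + theme_classic()

Each bar counts complaints with or without video evidence, split by whether the complaint was fully investigated.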
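Stacked counts are hard to compare when one group is much larger than the other. The rate of full investigation within each group is easier to compare: the mean of a logical vector is the share of TRUE values. A minimal sketch on hypothetical flags, assuming both columns are logical (if they are stored as text, convert them to TRUE/FALSE first):

```r
# Hypothetical flags standing in for nyc$`Complaint Has Video Evidence`
# and nyc$`Is Full Investigation` (assumed to be logical)
has_video <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)
full_inv  <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE)

# Share of complaints fully investigated, with and without video evidence
full_rate <- tapply(full_inv, has_video, mean)
print(full_rate)
```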

Visualization 8

Complaints With Full Investigation

ggplot(nyc, aes(x = `Incident Year`, fill = `Is Full Investigation`)) + geom_bar() + labs(title = "Complaints with Full Investigation", x = "Incident Year", y = "Number") + scale_fill_discrete(name = "Fully Investigated")

This graph shows how many complaints from each incident year were fully investigated.
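Since the number of complaints changes from year to year, the share fully investigated per year can reveal trends that the stacked counts hide. A minimal sketch with aggregate(), using a hypothetical data frame in place of the real Incident Year and Is Full Investigation columns (assumed logical):

```r
# Hypothetical data standing in for nyc$`Incident Year` and
# nyc$`Is Full Investigation`
complaints <- data.frame(
  year = c(2006, 2006, 2006, 2006, 2007, 2007),
  full = c(TRUE, FALSE, FALSE, TRUE, TRUE, TRUE)
)

# Share of complaints fully investigated in each year (mean of a logical)
by_year <- aggregate(full ~ year, data = complaints, FUN = mean)
print(by_year)
```

The result can be plotted as a line with ggplot(by_year, aes(year, full)) + geom_line().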

Visualization 9

Complaints by Allegation Type

ggplot(nyc, aes(x = `Allegation FADO Type`, fill = `Allegation FADO Type`)) + geom_bar() + labs(title = "Number of Complaints by Allegation Type", x = "Type", y = "Number") + theme(legend.position = "bottom") + scale_fill_discrete(name = "Type")

This graph summarizes complaints by FADO allegation type (Force, Abuse of Authority, Discourtesy, and Offensive Language), showing which kinds of alleged misconduct are most common and therefore deserve the most attention.
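Bars are easier to rank when they are sorted by height. One way to do this is to reorder the factor levels from most to least frequent before plotting. A minimal sketch on a hypothetical vector; the exact strings stored in the Allegation FADO Type column are an assumption here:

```r
# Hypothetical FADO types standing in for nyc$`Allegation FADO Type`
fado <- c("Force", "Abuse of Authority", "Abuse of Authority",
          "Discourtesy", "Abuse of Authority", "Force",
          "Offensive Language")

# Reorder the factor levels from most to least frequent
fado_sorted <- factor(fado, levels = names(sort(table(fado), decreasing = TRUE)))
counts <- table(fado_sorted)
print(counts)
```

Assigning the reordered factor back to the column (nyc$`Allegation FADO Type` <- factor(...)) makes the ggplot call above draw the bars in descending order.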

Visualization 10

Complaints by Filing Mode

Temp <- table(nyc$`Complaint Filed Mode`)
barplot(Temp, main = "Complaints by Filing Mode", xlab = "Complaint Filed Mode", ylab = "Number of complaints", col = "blue")

Most complaints were reported by phone.