Your final document should include at least 10 visualizations. Each should include a brief statement of why you made the graphic.

A final section should summarize what you learned from your EDA. Your grade will be based on the quality of your graphics and the sophistication of your findings.

library(ggplot2)
data_CCRB <- read.csv(file="C:/Users/Calmth of Life/Dropbox/Harrisburg Semesters/ANLY 512/Problem Set 4/ccrb_datatransparencyinitiative.csv")
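
The column names used in the plots below (such as Received.Year) contain dots because read.csv, by default, passes header names through make.names, which replaces spaces with dots. A minimal sketch with an inline CSV illustrates this; the values and the three headers shown are made up for the illustration and are not read from the CCRB file:

```r
# Inline CSV with made-up values; the real file is not needed for this sketch.
csv_text <- "Received Year,Close Year,Borough of Occurrence
2015,2016,Bronx
2016,2016,Queens"

# Default check.names = TRUE turns "Received Year" into "Received.Year".
demo <- read.csv(text = csv_text)
names(demo)

# check.names = FALSE keeps the original headers, which then need backticks
# when referenced, e.g. demo_raw$`Received Year`.
demo_raw <- read.csv(text = csv_text, check.names = FALSE)
names(demo_raw)
```

Keeping the default is usually the more convenient choice with ggplot2, since dotted names can be used in aes() without backticks.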

Vis 1

This graphic is a bar chart. This graphic shows us the number of cases received each year.

ggplot(data_CCRB, aes(x=Received.Year)) + geom_bar(stat = "count") + labs(title="Complaints Received Each Year", x="Received Year", y="Number of Complaints")
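
A bar chart with stat = "count" counts rows. If the file holds one row per allegation rather than one row per complaint (an assumption worth checking against your copy of the data), row counts will overstate the number of complaints. The sketch below uses made-up data and a hypothetical id column named complaint_id to show the difference:

```r
# Hypothetical data: complaint 1 has two allegation rows, complaint 3 has three.
demo <- data.frame(
  complaint_id  = c(1, 1, 2, 3, 3, 3),   # hypothetical id column name
  Received.Year = c(2015, 2015, 2015, 2016, 2016, 2016)
)

# Counting rows counts allegations.
rows_per_year <- table(demo$Received.Year)

# Keeping the first row for each complaint id counts complaints instead.
unique_complaints <- demo[!duplicated(demo$complaint_id), ]
complaints_per_year <- table(unique_complaints$Received.Year)

rows_per_year
complaints_per_year
```

Passing the de-duplicated data frame to ggplot() instead of the full one would then give a per-complaint bar chart.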

Vis 2

This graphic is a bar chart. This graphic shows us the different types of complaints.

ggplot(data_CCRB, aes(x=Allegation.FADO.Type, fill=Allegation.FADO.Type)) + geom_bar(stat = "count") + labs(title="Number of Complaints by Allegation Type", x="Type", y="Number") + theme(legend.position = "bottom") + scale_fill_discrete(name="Type")

Vis 3

This graphic is a stacked bar chart. It shows the number of cases closed each year, split by whether they were fully investigated.

ggplot(data_CCRB, aes(x=Close.Year, fill=Is.Full.Investigation)) + geom_bar(stat = "count") + labs(title="No. of Cases Closed Each Year by Investigation", x="Close Year", y="Number") + scale_fill_discrete(name="Fully Investigated")

Vis 4

This graphic is a stacked bar chart. It shows the number of cases closed each year, split by encounter outcome.

ggplot(data_CCRB, aes(x=Close.Year, fill=Encounter.Outcome)) + geom_bar(stat = "count") + labs(title="Number of Incidents Closed Each Year by Outcome", x="Close Year", y="Number") + scale_fill_discrete(name="Encounter Outcome")

Vis 5

This graphic is a stacked bar chart. It shows the number of complaints by incident year, split by whether they were fully investigated.

ggplot(data_CCRB, aes(x=Incident.Year, fill=Is.Full.Investigation)) + geom_bar(stat = "count") + labs(title="Complaints with Full Investigation", x="Incident Year", y="Number") + scale_fill_discrete(name="Fully Investigated")

Vis 6

This graphic is a stacked bar chart. It shows the number of incidents that occurred each year, split by encounter outcome.

ggplot(data_CCRB, aes(x=Incident.Year, fill=Encounter.Outcome)) + geom_bar(stat = "count") + labs(title="Number of Incidents Each Year by Outcome", x="Incident Year", y="Number") + scale_fill_discrete(name="Outcome")

Vis 7

This graphic is a stacked bar chart. It shows the number of cases each year that have video evidence.

ggplot(data_CCRB, aes(x=Incident.Year, fill=Complaint.Has.Video.Evidence)) + geom_bar(stat = "count") + labs(title="Complaints with Video Evidence", x="Incident Year", y="Number") + scale_fill_discrete(name="Has Video Evidence")
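
Stacked counts make it hard to compare years with very different totals. One way to check whether video evidence is becoming more common is to look at the share per year rather than the count. The sketch below uses made-up values (not CCRB figures) and base R's prop.table:

```r
# Made-up data: 1 of 4 complaints in 2014 and 1 of 2 in 2015 have video.
demo <- data.frame(
  Incident.Year = c(2014, 2014, 2014, 2014, 2015, 2015),
  Complaint.Has.Video.Evidence = c(FALSE, FALSE, FALSE, TRUE, TRUE, FALSE)
)

# Year-by-evidence contingency table of counts.
counts <- table(demo$Incident.Year, demo$Complaint.Has.Video.Evidence)

# margin = 1 normalizes within each row, so every year sums to 1.
shares <- prop.table(counts, margin = 1)
shares
```

In ggplot2, the analogous view is a bar chart drawn with geom_bar(position = "fill"), which rescales each bar to the same height.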

Vis 8

This graphic is a scatter plot with a fitted linear trend line. It compares the year in which each incident happened with the year in which the case was closed.

ggplot(data_CCRB, aes(x=Incident.Year, y=Close.Year)) + geom_point() + geom_smooth(method = lm) + labs(title="Incident Year vs Close Year", x="Incident Year", y="Close Year")
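
Because both axes are whole years, many points overlap in the scatter plot. A complementary view is to compute the gap between the two years directly. This is a rough sketch on made-up values; years_to_close is a name introduced here for illustration, and year-level data can only resolve durations to the nearest year:

```r
# Made-up incident and close years for five cases.
demo <- data.frame(
  Incident.Year = c(2012, 2013, 2013, 2014, 2015),
  Close.Year    = c(2013, 2013, 2015, 2015, 2015)
)

# Whole-year gap between incident and closure for each case.
demo$years_to_close <- demo$Close.Year - demo$Incident.Year

# Distribution of gaps and their average.
gap_counts <- table(demo$years_to_close)
mean_years <- mean(demo$years_to_close)

gap_counts
mean_years
```

A bar chart of years_to_close would show the distribution of closure times more directly than the trend line does.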

Vis 9

This graphic is a stacked bar chart. It shows how complaints were filed in each borough.

ggplot(data_CCRB, aes(x=Borough.of.Occurrence, fill=Complaint.Filed.Mode)) +  geom_bar(stat = "count") + labs(title="Borough of Occurrence by Filed Mode", x="Borough of Occurrence", y="Number") +  scale_fill_discrete(name="Complaint Filed Mode")

Vis 10

This graphic is a bar chart. It shows the number of complaints filed through each mode.

ggplot(data_CCRB, aes(x=Complaint.Filed.Mode, fill=Complaint.Filed.Mode)) + geom_bar(stat = "count") + labs(title="Number of Complaints by Filed Mode", x="Mode", y="Number") + theme(legend.position = "bottom") + scale_fill_discrete(name="Mode")

Summary

Exploratory Data Analysis is very helpful in understanding the distribution and trends of the underlying data. Using visualization techniques with the CCRB data, we can easily see patterns that would otherwise stay hidden in a sea of data. Comparing the number of cases filed in a year with the number closed in that year gives us an idea of how long, on average, it takes to conclude a complaint. We can also explore which years or locations received more or fewer complaints, and then investigate whether a higher number of complaints can be attributed to more crime or to stricter policing. This exercise was very helpful both for exploring the data and for learning R.

ANLY 512 - Problem Set 4

Exploratory Data Analysis

Megha S

May 15, 2018

Objectives

Deliverable and Grades