Objectives

The objective of this assignment is to conduct an exploratory data analysis of a data set that you are not familiar with. In this week's lecture we discussed a number of visualization approaches to exploring a data set; this assignment will apply those tools and techniques. An important distinction between class examples and applied data science work is the iterative and repetitive nature of exploring a data set. It takes time to understand what is in the data and what is interesting in the data.

For this week we will be exploring data from the NYC Data Transparency Initiative. They maintain a database of complaints that fall under the jurisdiction of the Civilian Complaint Review Board (CCRB), an independent municipal agency. Your objective is to identify interesting patterns and trends within the data that may be indicative of large-scale trends.

This link will allow you to download the data set in .xlsx format. The data file has two tabs: one with metadata, and the “Complaints_Allegations” tab with the actual data.
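
Since the workbook has two tabs, reading it directly means naming the data tab. The following is a minimal sketch, assuming the readxl package is installed; the helper name `read_ccrb_sheet` and the file name `ccrb_data.xlsx` are placeholders for illustration, and only the sheet name comes from the description above.

```r
# Hypothetical helper: reads one worksheet of the downloaded workbook.
# Assumes the readxl package is installed; the default sheet name is the
# data tab named in the assignment text.
read_ccrb_sheet <- function(path, sheet = "Complaints_Allegations") {
  if (!file.exists(path)) {
    stop("Workbook not found: ", path)
  }
  # read_excel() returns a tibble; convert to a plain data frame
  as.data.frame(readxl::read_excel(path, sheet = sheet))
}

# Example call; "ccrb_data.xlsx" is a placeholder file name.
xlsx_path <- "ccrb_data.xlsx"
if (file.exists(xlsx_path)) {
  ccrb_raw <- read_ccrb_sheet(xlsx_path)
  print(dim(ccrb_raw))
}
```

Note that `read_excel()` does not convert column names to syntactic R names by default, whereas `read.csv()` does, so column names read this way may contain spaces rather than the dotted names used in the plots below.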

Deliverable and Grades

For this assignment you should submit a link to a knitr-rendered HTML document that shows your exploratory data analysis. Organize your analysis using section headings:

library(ggplot2)
library(ggthemes)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(forcats)

ccrb <- unique(read.csv("/Users/arka/Desktop/Harrisburg_University_Courses/Semester_2_Late_Fall/ANLY 512-50/ccrb.csv"))
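
Before plotting, a quick structural check can help show what is in the data. The sketch below uses a small made-up data frame (`toy`) as a stand-in so that it runs on its own; the two column names mirror ones used in this analysis, but the values are invented for illustration. The same calls could be applied to `ccrb`.

```r
# Toy stand-in for the real data; the values are invented for illustration only.
toy <- data.frame(
  Received.Year = c(2005, 2005, 2010, NA),
  Borough.of.Occurrence = c("Bronx", "Brooklyn", "Bronx", "Queens"),
  stringsAsFactors = FALSE
)

# Drop exact duplicate rows, as done for ccrb above
toy <- unique(toy)

# Dimensions and column types
print(dim(toy))
str(toy)

# Number of missing values in each column
missing_per_column <- colSums(is.na(toy))
print(missing_per_column)

# Frequency of each category in a column of interest
borough_counts <- table(toy$Borough.of.Occurrence)
print(borough_counts)
```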

Full Investigation Location Analysis (Stacked Bar Chart)

ggplot(ccrb, aes(x = Borough.of.Occurrence, fill = Is.Full.Investigation)) +
       geom_bar(stat = "count") + 
       labs(title = "Full Investigation Location Analysis", x = "Location", y = "Count") +
       theme_economist()

Complaints received by Year from 2000 to 2020

ggplot(ccrb, aes(x = Received.Year, fill = Allegation.FADO.Type)) +
       geom_bar(stat = "count") +
       labs(title = "Complaints Received from 2000 to 2020", x = "Year", y = "Complaints")

Complaints received by Year (Scatter Plot)

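# Count complaints per received year
# (named received_by_year to avoid masking stats::aggregate)
received_by_year <- ccrb %>% 
                  group_by(Received.Year) %>% 
                  summarise(count = n())

# Yearly counts as points and lines, with a loess trend in red
ggplot(received_by_year, aes(x = Received.Year, y = count)) +
       xlab("Received Year") +
       ylab("Count") +
       geom_point() +
       geom_line() +
       ggtitle("Frequency Count vs Received Year") +
       stat_smooth(method = "loess", formula = y ~ x, size = 0.7, col = "red")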

Complaints closed by Year (Scatter Plot)

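# Count complaints per closing year
# (named closed_by_year to avoid masking stats::aggregate)
closed_by_year <- ccrb %>% 
                  group_by(Close.Year) %>% 
                  summarise(count = n())

# Yearly counts as points and lines, with a loess trend in red
ggplot(closed_by_year, aes(x = Close.Year, y = count)) +
       xlab("Closed Year") +
       ylab("Count") +
       geom_point() +
       geom_line() +
       ggtitle("Frequency Count vs Closed Year") +
       stat_smooth(method = "loess", formula = y ~ x, size = 0.7, col = "red")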

Complaints by Borough Locations and Allegation Type

ggplot(ccrb, aes(x = Borough.of.Occurrence, fill = Allegation.FADO.Type)) +
       geom_bar(stat = "count") +
       labs(title = "Received complaints by Borough and Allegation Type", x = "Borough", y = "Complaints")

Classification of Allegations

ggplot(ccrb, aes(x = fct_infreq(Allegation.FADO.Type))) +
       geom_bar() +
       xlab("Allegation Type")

Incident Location in descending order of frequency counts

ggplot(ccrb, aes(x = fct_infreq(Incident.Location))) +
       geom_bar() +
       xlab("Incident Location") +
       ylab("Frequency Count") +
       coord_flip()

Analysis of complaints with video evidence

ggplot(ccrb, aes(x = Borough.of.Occurrence, fill = Complaint.Has.Video.Evidence)) +
       geom_bar(stat = "count") +
       labs(title = "Complaints with Video Evidence", x = "Location", y = "Frequency Count") +
       theme_economist()

Analysis of full investigation with complaint filing mode

ggplot(ccrb, aes(x = Complaint.Filed.Mode, fill = Is.Full.Investigation)) +
       geom_bar(stat = "count") +
       labs(title = "Full investigation with complaint filing mode", x = "Complaint Filing Mode", y = "Complaints")

Outcome of Encounter by Borough

ggplot(ccrb, aes(x = Encounter.Outcome, fill = Borough.of.Occurrence)) +
       geom_bar(stat = "count") +
       labs(title = "Borough vs Encounter Outcome", x = "Encounter Outcome", y = "Frequency Count")