Objective

The objective of this assignment is to conduct an exploratory data analysis (EDA) of the NYC Data Transparency Initiative dataset. The purpose is to apply the visualization tools and techniques learned in the regular class sessions to exploring a data set. The NYC Data Transparency Initiative is a database of complaints that fall within the jurisdiction of the Civilian Complaint Review Board (CCRB), an independent municipal agency. The intention is to identify patterns in the data that may indicate larger-scale trends.

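# Reset the R workspace (variables and open plots), then load the libraries
# and data required to perform the EDA.
rm(list = ls())
# dev.off() errors when no graphics device is open, so only close one if it exists
if (!is.null(dev.list())) dev.off()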
## null device 
##           1
library(readxl)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
CCRB.Data <- read_excel("D:\\HU\\Sem 2\\ANLY 512-50\\Assignment 4\\ccrb_datatransparencyinitiative.xlsx",
                        sheet = "Complaints_Allegations")
str(CCRB.Data)
## Classes 'tbl_df', 'tbl' and 'data.frame':    204397 obs. of  16 variables:
##  $ DateStamp                                  : POSIXct, format: "2016-11-29" "2016-11-29" ...
##  $ UniqueComplaintId                          : num  11 18 18 18 18 18 18 18 18 18 ...
##  $ Close Year                                 : num  2006 2006 2006 2006 2006 ...
##  $ Received Year                              : num  2005 2004 2004 2004 2004 ...
##  $ Borough of Occurrence                      : chr  "Manhattan" "Brooklyn" "Brooklyn" "Brooklyn" ...
##  $ Is Full Investigation                      : logi  FALSE TRUE TRUE TRUE TRUE TRUE ...
##  $ Complaint Has Video Evidence               : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ Complaint Filed Mode                       : chr  "On-line website" "Phone" "Phone" "Phone" ...
##  $ Complaint Filed Place                      : chr  "CCRB" "CCRB" "CCRB" "CCRB" ...
##  $ Complaint Contains Stop & Frisk Allegations: logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ Incident Location                          : chr  "Street/highway" "Street/highway" "Street/highway" "Street/highway" ...
##  $ Incident Year                              : num  2005 2004 2004 2004 2004 ...
##  $ Encounter Outcome                          : chr  "No Arrest or Summons" "Arrest" "Arrest" "Arrest" ...
##  $ Reason For Initial Contact                 : chr  "Other" "PD suspected C/V of violation/crime - street" "PD suspected C/V of violation/crime - street" "PD suspected C/V of violation/crime - street" ...
##  $ Allegation FADO Type                       : chr  "Abuse of Authority" "Abuse of Authority" "Discourtesy" "Discourtesy" ...
##  $ Allegation Description                     : chr  "Threat of arrest" "Refusal to obtain medical treatment" "Word" "Word" ...
# Remove duplicate rows from the data, if any
CCRB.Data.No.Dupes <- CCRB.Data[!duplicated(CCRB.Data),]

str(CCRB.Data.No.Dupes)
## Classes 'tbl_df', 'tbl' and 'data.frame':    166916 obs. of  16 variables:
##  $ DateStamp                                  : POSIXct, format: "2016-11-29" "2016-11-29" ...
##  $ UniqueComplaintId                          : num  11 18 18 18 18 34 34 40 55 55 ...
##  $ Close Year                                 : num  2006 2006 2006 2006 2006 ...
##  $ Received Year                              : num  2005 2004 2004 2004 2004 ...
##  $ Borough of Occurrence                      : chr  "Manhattan" "Brooklyn" "Brooklyn" "Brooklyn" ...
##  $ Is Full Investigation                      : logi  FALSE TRUE TRUE TRUE TRUE TRUE ...
##  $ Complaint Has Video Evidence               : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ Complaint Filed Mode                       : chr  "On-line website" "Phone" "Phone" "Phone" ...
##  $ Complaint Filed Place                      : chr  "CCRB" "CCRB" "CCRB" "CCRB" ...
##  $ Complaint Contains Stop & Frisk Allegations: logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ Incident Location                          : chr  "Street/highway" "Street/highway" "Street/highway" "Street/highway" ...
##  $ Incident Year                              : num  2005 2004 2004 2004 2004 ...
##  $ Encounter Outcome                          : chr  "No Arrest or Summons" "Arrest" "Arrest" "Arrest" ...
##  $ Reason For Initial Contact                 : chr  "Other" "PD suspected C/V of violation/crime - street" "PD suspected C/V of violation/crime - street" "PD suspected C/V of violation/crime - street" ...
##  $ Allegation FADO Type                       : chr  "Abuse of Authority" "Abuse of Authority" "Discourtesy" "Force" ...
##  $ Allegation Description                     : chr  "Threat of arrest" "Refusal to obtain medical treatment" "Word" "Physical force" ...
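Going from 204,397 to 166,916 rows means 37,481 rows were exact repeats across all 16 columns. The sketch below shows, on a tiny hypothetical table (the object `toy` and its values are made up, not CCRB data; only the column names come from `str()` above), how `duplicated()` flags repeats so they can be counted before being dropped.

```r
# Hypothetical toy data (not real CCRB values): each row is one allegation,
# so a single UniqueComplaintId can appear on several rows
toy <- data.frame(
  UniqueComplaintId = c(11, 18, 18, 18, 34),
  `Allegation FADO Type` = c("Abuse of Authority", "Discourtesy",
                             "Discourtesy", "Force", "Force"),
  `Allegation Description` = c("Threat of arrest", "Word", "Word",
                               "Physical force", "Physical force"),
  check.names = FALSE
)

# duplicated() is TRUE for every row that repeats an earlier row in all columns
dup_flags <- duplicated(toy)
n_dupes <- sum(dup_flags)

# Keep only the first occurrence of each distinct row
toy_no_dupes <- toy[!dup_flags, ]

cat(sprintf("Rows before: %d, duplicates: %d, rows after: %d\n",
            nrow(toy), n_dupes, nrow(toy_no_dupes)))
```

Counting before dropping makes the size of the cut visible. Whether fully identical rows are true duplicates or separate allegations that happen to share every recorded field is a judgment call the columns alone cannot settle.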
head(CCRB.Data.No.Dupes)
## # A tibble: 6 x 16
##    DateStamp UniqueComplaintId `Close Year` `Received Year`
##       <dttm>             <dbl>        <dbl>           <dbl>
## 1 2016-11-29                11         2006            2005
## 2 2016-11-29                18         2006            2004
## 3 2016-11-29                18         2006            2004
## 4 2016-11-29                18         2006            2004
## 5 2016-11-29                18         2006            2004
## 6 2016-11-29                34         2006            2005
## # ... with 12 more variables: `Borough of Occurrence` <chr>, `Is Full
## #   Investigation` <lgl>, `Complaint Has Video Evidence` <lgl>, `Complaint
## #   Filed Mode` <chr>, `Complaint Filed Place` <chr>, `Complaint Contains
## #   Stop & Frisk Allegations` <lgl>, `Incident Location` <chr>, `Incident
## #   Year` <dbl>, `Encounter Outcome` <chr>, `Reason For Initial
## #   Contact` <chr>, `Allegation FADO Type` <chr>, `Allegation
## #   Description` <chr>
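# Convert the DateStamp column itself to Date; assigning the result to a
# standalone variable would leave the column in the data frame as POSIXct
CCRB.Data.No.Dupes$DateStamp <- as.Date(CCRB.Data.No.Dupes$DateStamp)
str(CCRB.Data.No.Dupes$DateStamp)
##  Date[1:166916], format: "2016-11-29" "2016-11-29" "2016-11-29" "2016-11-29" "2016-11-29" ...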

VIZ 01

Number of complaints received from each borough, by year of incident occurrence.
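The plotting code is not shown here, so as an assumption about how such a chart is built, this sketch computes the borough-by-year counts that would feed it with base R's `table()`. The data frame `toy` and its values are hypothetical. Because each row is an allegation record, these are allegation counts unless the rows are first reduced to one per `UniqueComplaintId`.

```r
# Hypothetical toy rows standing in for CCRB.Data.No.Dupes (not real data)
toy <- data.frame(
  `Borough of Occurrence` = c("Manhattan", "Brooklyn", "Brooklyn",
                              "Bronx", "Manhattan", "Brooklyn"),
  `Incident Year` = c(2005, 2004, 2005, 2005, 2005, 2004),
  check.names = FALSE
)

# Count rows for every borough/year combination (zeros included)
counts <- table(Borough = toy[["Borough of Occurrence"]],
                Year = toy[["Incident Year"]])
print(counts)
```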

VIZ 02

Number of complaints received per year, split by whether or not a full investigation took place.
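Raw counts per year mix two effects: how many complaints arrived and how often they were fully investigated. A minimal sketch, assuming hypothetical values in `toy`, that adds the row-wise share with `prop.table(..., margin = 1)` so years of different sizes can be compared directly.

```r
# Hypothetical toy rows (not real CCRB values)
toy <- data.frame(
  `Received Year` = c(2004, 2004, 2004, 2005, 2005),
  `Is Full Investigation` = c(TRUE, TRUE, FALSE, FALSE, FALSE),
  check.names = FALSE
)

# Counts by received year and investigation status
by_year <- table(Year = toy[["Received Year"]],
                 Full = toy[["Is Full Investigation"]])
print(by_year)

# Row-wise proportions: share of each year's records that were fully investigated
share_full <- prop.table(by_year, margin = 1)[, "TRUE"]
print(round(share_full, 2))
```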

VIZ 03

To see how long it takes to close a case, the line chart below plots time to close against the year in which complaints were received. The trend suggests that cases are closed faster in the current decade than in the previous one.
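The dataset only records years, so time to close can only be measured in whole calendar years (`Close Year` minus `Received Year`), a coarse proxy. This sketch, using hypothetical values in `toy` and a hypothetical `decade` split at 2010, averages that difference per received year and per decade with `tapply()`.

```r
# Hypothetical toy rows (not real CCRB values)
toy <- data.frame(
  `Received Year` = c(2004, 2004, 2005, 2014, 2015),
  `Close Year`    = c(2006, 2005, 2006, 2014, 2016),
  check.names = FALSE
)

# Elapsed calendar years between receipt and closure
toy$years_to_close <- toy[["Close Year"]] - toy[["Received Year"]]

# Average time to close for complaints received in each year
avg_by_year <- tapply(toy$years_to_close, toy[["Received Year"]], mean)
print(avg_by_year)

# Hypothetical decade grouping for the decade-to-decade comparison
decade <- ifelse(toy[["Received Year"]] < 2010, "2000s", "2010s")
avg_by_decade <- tapply(toy$years_to_close, decade, mean)
print(round(avg_by_decade, 2))
```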

VIZ 04

The graph below shows the number of complaints by borough and by filing mode, to see how complainants registered their complaints with the CCRB.
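Grouped bar charts in ggplot2 are easiest to build from long-format counts: one row per borough/mode pair. A sketch under that assumption, with hypothetical values in `toy`, that flattens a cross-tabulation with `as.data.frame()` and drops combinations that never occur.

```r
# Hypothetical toy rows (not real CCRB values)
toy <- data.frame(
  `Borough of Occurrence` = c("Manhattan", "Manhattan", "Brooklyn",
                              "Brooklyn", "Bronx"),
  `Complaint Filed Mode` = c("Phone", "On-line website", "Phone",
                             "Phone", "Phone"),
  check.names = FALSE
)

# Cross-tabulate, then flatten to columns Borough, Mode, Freq
mode_counts <- as.data.frame(
  table(Borough = toy[["Borough of Occurrence"]],
        Mode = toy[["Complaint Filed Mode"]]),
  stringsAsFactors = FALSE
)

# Drop borough/mode combinations with no records
mode_counts <- mode_counts[mode_counts$Freq > 0, ]
print(mode_counts)
```

The result could then be plotted with, for example, `ggplot(mode_counts, aes(Borough, Freq, fill = Mode)) + geom_col(position = "dodge")`.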

VIZ 05

The graph below shows the number of complaints by incident location and filing mode, to see how complainants registered with the CCRB depending on where the incident took place.
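Incident location has many more categories than borough, which can crowd a chart. One common option, offered here as an assumption rather than what the original chart did, is to keep the most frequent locations and relabel the rest as "Other" before cross-tabulating. `toy`, its values, and the cutoff `n_keep` are hypothetical.

```r
# Hypothetical toy rows (not real CCRB values)
toy <- data.frame(
  `Incident Location` = c("Street/highway", "Street/highway", "Street/highway",
                          "Apartment/house", "Apartment/house", "Park"),
  `Complaint Filed Mode` = c("Phone", "Phone", "On-line website",
                             "Phone", "On-line website", "Phone"),
  check.names = FALSE
)

# Hypothetical cutoff: keep the n_keep most frequent locations
n_keep <- 2
loc <- toy[["Incident Location"]]
top_locs <- names(sort(table(loc), decreasing = TRUE))[seq_len(n_keep)]

# Relabel all other locations as "Other"
loc_grouped <- ifelse(loc %in% top_locs, loc, "Other")

loc_by_mode <- table(Location = loc_grouped, Mode = toy[["Complaint Filed Mode"]])
print(loc_by_mode)
```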

VIZ 06

This visualization shows whether an arrest was made, depending on whether the complaint contains stop-and-frisk allegations.
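Complaints with and without stop-and-frisk allegations are unlikely to be equally common, so comparing raw counts can mislead. A sketch, with hypothetical values in `toy`, that computes the share of each encounter outcome within each stop-and-frisk group.

```r
# Hypothetical toy rows (not real CCRB values)
toy <- data.frame(
  `Complaint Contains Stop & Frisk Allegations` = c(TRUE, TRUE, TRUE, FALSE, FALSE),
  `Encounter Outcome` = c("Arrest", "No Arrest or Summons", "No Arrest or Summons",
                          "Arrest", "Arrest"),
  check.names = FALSE
)

outcome_tab <- table(StopFrisk = toy[["Complaint Contains Stop & Frisk Allegations"]],
                     Outcome = toy[["Encounter Outcome"]])

# Within each stop-and-frisk group (row), the share of records with each outcome
outcome_share <- prop.table(outcome_tab, margin = 1)
print(round(outcome_share, 2))
```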

VIZ 07

This visualization shows which types of allegations are most prevalent in each borough, to help identify trends and patterns that could inform efforts to prevent such allegations in the future.
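To read the answer off directly rather than from the chart, this sketch picks the most frequent FADO type in each borough with `apply()` and `which.max()`. `toy` and its values are hypothetical; note that `which.max()` returns the first maximum, so ties resolve to the alphabetically first type.

```r
# Hypothetical toy rows (not real CCRB values)
toy <- data.frame(
  `Borough of Occurrence` = c("Bronx", "Bronx", "Bronx", "Queens", "Queens"),
  `Allegation FADO Type` = c("Force", "Force", "Discourtesy",
                             "Abuse of Authority", "Abuse of Authority"),
  check.names = FALSE
)

fado_tab <- table(Borough = toy[["Borough of Occurrence"]],
                  FADO = toy[["Allegation FADO Type"]])

# For each borough (row), the FADO type with the highest count
top_fado <- apply(fado_tab, 1, function(row) colnames(fado_tab)[which.max(row)])
print(top_fado)
```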

VIZ 08

This visualization shows that complaints peaked in the early years of the current decade and have been trending downward since, which may suggest that authorities have been responding effectively.
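Because each row is an allegation, counting rows per year measures allegations, not complaints. This sketch counts distinct `UniqueComplaintId` values per year and finds the peak year. Using `Received Year` as the time axis is an assumption, and `toy` and its values are hypothetical.

```r
# Hypothetical toy rows (not real CCRB values)
toy <- data.frame(
  UniqueComplaintId = c(11, 18, 18, 18, 34, 40, 55, 55),
  `Received Year` = c(2005, 2004, 2004, 2004, 2005, 2011, 2012, 2012),
  check.names = FALSE
)

# Rows are allegations, so count distinct complaint IDs per year
complaints_per_year <- tapply(toy$UniqueComplaintId, toy[["Received Year"]],
                              function(ids) length(unique(ids)))
print(complaints_per_year)

# Year with the most complaints (the first one if tied)
peak_year <- names(complaints_per_year)[which.max(complaints_per_year)]
print(peak_year)
```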

VIZ 09

The stacked plot below shows that streets and highways are where the largest number of incidents took place.
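A stacked plot makes the leading category visible but not its exact share. This sketch, using a hypothetical vector `locations`, ranks incident locations by their percentage of all records with `prop.table()` and `sort()`.

```r
# Hypothetical incident locations (not real CCRB values)
locations <- c("Street/highway", "Street/highway", "Street/highway",
               "Apartment/house", "Apartment/house", "Park")

# Share of records per location, largest first
loc_share <- sort(prop.table(table(locations)), decreasing = TRUE)
print(round(100 * loc_share, 1))
```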

VIZ 10

Below is a density plot of video evidence by allegation type.
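Since both video evidence (TRUE/FALSE) and allegation type are categorical, a per-type proportion is a simpler summary than a density; this sketch swaps in that alternative rather than reproducing the original plot. It relies on `mean()` of a logical vector being the share of TRUE values. `toy` and its values are hypothetical.

```r
# Hypothetical toy rows (not real CCRB values)
toy <- data.frame(
  `Allegation FADO Type` = c("Force", "Force", "Force", "Discourtesy", "Discourtesy"),
  `Complaint Has Video Evidence` = c(TRUE, FALSE, TRUE, FALSE, FALSE),
  check.names = FALSE
)

# mean() of a logical vector is the proportion of TRUE values
video_share <- tapply(toy[["Complaint Has Video Evidence"]],
                      toy[["Allegation FADO Type"]], mean)
print(round(video_share, 2))
```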

Summary:

Data visualization and exploratory data analysis (EDA) are interdependent in solving any business problem. EDA helps tell the story of the data accurately, it is a very important step in any statistical analysis, and it makes data-driven decisions possible. In this assignment in particular, it helped me find trends and patterns in the data to answer my questions without making any assumptions.