The objective of this assignment is to conduct an exploratory data analysis of the NYC Data Transparency Initiative data set. Its purpose is to apply the tools and techniques learned in the regular class sessions, particularly visualization approaches to exploring a data set. The NYC Data Transparency Initiative is a database of complaints that fall within the jurisdiction of the Civilian Complaint Review Board (CCRB), an independent municipal agency. The intention is to identify patterns within the data that may be indicative of large-scale trends.
# Reset the RStudio environment (variables and plots), then load the libraries and data required for the EDA.
rm(list = ls())
dev.off()
## null device
## 1
library(readxl)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
CCRB.Data <- read_excel("D:\\HU\\Sem 2\\ANLY 512-50\\Assignment 4\\ccrb_datatransparencyinitiative.xlsx",
sheet = "Complaints_Allegations")
str(CCRB.Data)
## Classes 'tbl_df', 'tbl' and 'data.frame': 204397 obs. of 16 variables:
## $ DateStamp : POSIXct, format: "2016-11-29" "2016-11-29" ...
## $ UniqueComplaintId : num 11 18 18 18 18 18 18 18 18 18 ...
## $ Close Year : num 2006 2006 2006 2006 2006 ...
## $ Received Year : num 2005 2004 2004 2004 2004 ...
## $ Borough of Occurrence : chr "Manhattan" "Brooklyn" "Brooklyn" "Brooklyn" ...
## $ Is Full Investigation : logi FALSE TRUE TRUE TRUE TRUE TRUE ...
## $ Complaint Has Video Evidence : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ Complaint Filed Mode : chr "On-line website" "Phone" "Phone" "Phone" ...
## $ Complaint Filed Place : chr "CCRB" "CCRB" "CCRB" "CCRB" ...
## $ Complaint Contains Stop & Frisk Allegations: logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ Incident Location : chr "Street/highway" "Street/highway" "Street/highway" "Street/highway" ...
## $ Incident Year : num 2005 2004 2004 2004 2004 ...
## $ Encounter Outcome : chr "No Arrest or Summons" "Arrest" "Arrest" "Arrest" ...
## $ Reason For Initial Contact : chr "Other" "PD suspected C/V of violation/crime - street" "PD suspected C/V of violation/crime - street" "PD suspected C/V of violation/crime - street" ...
## $ Allegation FADO Type : chr "Abuse of Authority" "Abuse of Authority" "Discourtesy" "Discourtesy" ...
## $ Allegation Description : chr "Threat of arrest" "Refusal to obtain medical treatment" "Word" "Word" ...
# Remove duplicate rows from the data, if any
CCRB.Data.No.Dupes <- CCRB.Data[!duplicated(CCRB.Data),]
str(CCRB.Data.No.Dupes)
## Classes 'tbl_df', 'tbl' and 'data.frame': 166916 obs. of 16 variables:
## $ DateStamp : POSIXct, format: "2016-11-29" "2016-11-29" ...
## $ UniqueComplaintId : num 11 18 18 18 18 34 34 40 55 55 ...
## $ Close Year : num 2006 2006 2006 2006 2006 ...
## $ Received Year : num 2005 2004 2004 2004 2004 ...
## $ Borough of Occurrence : chr "Manhattan" "Brooklyn" "Brooklyn" "Brooklyn" ...
## $ Is Full Investigation : logi FALSE TRUE TRUE TRUE TRUE TRUE ...
## $ Complaint Has Video Evidence : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ Complaint Filed Mode : chr "On-line website" "Phone" "Phone" "Phone" ...
## $ Complaint Filed Place : chr "CCRB" "CCRB" "CCRB" "CCRB" ...
## $ Complaint Contains Stop & Frisk Allegations: logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ Incident Location : chr "Street/highway" "Street/highway" "Street/highway" "Street/highway" ...
## $ Incident Year : num 2005 2004 2004 2004 2004 ...
## $ Encounter Outcome : chr "No Arrest or Summons" "Arrest" "Arrest" "Arrest" ...
## $ Reason For Initial Contact : chr "Other" "PD suspected C/V of violation/crime - street" "PD suspected C/V of violation/crime - street" "PD suspected C/V of violation/crime - street" ...
## $ Allegation FADO Type : chr "Abuse of Authority" "Abuse of Authority" "Discourtesy" "Force" ...
## $ Allegation Description : chr "Threat of arrest" "Refusal to obtain medical treatment" "Word" "Physical force" ...
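The two str() outputs imply that deduplication removed 204,397 - 166,916 = 37,481 rows. As a minimal sketch of the same step on a made-up toy table (the values are illustrative and not taken from the CCRB file), base R's duplicated() and dplyr's distinct() should give the same result:

```r
library(dplyr)

# Toy stand-in for CCRB.Data; the values are made up for illustration
toy <- tibble(
  UniqueComplaintId      = c(18, 18, 18, 34),
  `Allegation FADO Type` = c("Discourtesy", "Discourtesy", "Force", "Force")
)

# Base R: keep only the first occurrence of each fully identical row
no_dupes_base <- toy[!duplicated(toy), ]

# dplyr equivalent: distinct() with no columns named compares whole rows
no_dupes_dplyr <- distinct(toy)

# Number of rows dropped as exact duplicates
rows_removed <- nrow(toy) - nrow(no_dupes_base)
```

Comparing nrow() before and after, as done with str() above, is a quick check on how much data the step discards.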
head(CCRB.Data.No.Dupes)
## # A tibble: 6 x 16
## DateStamp UniqueComplaintId `Close Year` `Received Year`
## <dttm> <dbl> <dbl> <dbl>
## 1 2016-11-29 11 2006 2005
## 2 2016-11-29 18 2006 2004
## 3 2016-11-29 18 2006 2004
## 4 2016-11-29 18 2006 2004
## 5 2016-11-29 18 2006 2004
## 6 2016-11-29 34 2006 2005
## # ... with 12 more variables: `Borough of Occurrence` <chr>, `Is Full
## # Investigation` <lgl>, `Complaint Has Video Evidence` <lgl>, `Complaint
## # Filed Mode` <chr>, `Complaint Filed Place` <chr>, `Complaint Contains
## # Stop & Frisk Allegations` <lgl>, `Incident Location` <chr>, `Incident
## # Year` <dbl>, `Encounter Outcome` <chr>, `Reason For Initial
## # Contact` <chr>, `Allegation FADO Type` <chr>, `Allegation
## # Description` <chr>
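# Note: this stores the converted dates in a separate DateStamp vector; the DateStamp
# column inside the data frame remains POSIXct, as the str() output below confirms.
DateStamp <- as.Date(CCRB.Data.No.Dupes$DateStamp)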
str(CCRB.Data.No.Dupes$DateStamp)
## POSIXct[1:166916], format: "2016-11-29" "2016-11-29" "2016-11-29" "2016-11-29" "2016-11-29" ...
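The str() output shows that the column in the data frame is still POSIXct, because the converted values were assigned to a standalone variable. If the intent was to convert the column itself, one possible sketch is to assign the result back to the column. The toy data frame below is a made-up stand-in, not the CCRB data:

```r
# Toy stand-in with a POSIXct column, similar to what read_excel() produced above
toy <- data.frame(DateStamp = as.POSIXct("2016-11-29", tz = "UTC"))

# Assign the result back to the column so the change persists in the data frame
toy$DateStamp <- as.Date(toy$DateStamp)

# Inspect the column type after conversion
str(toy$DateStamp)
```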
Number of complaints received from each borough, by year of incident occurrence.
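The plotting code for this chart is not included in the report. A minimal sketch of how such a chart could be built is shown here, using a made-up toy table with the same column names as the data set. Because the data holds one row per allegation, the sketch assumes that complaints should be counted by distinct UniqueComplaintId rather than by row:

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for CCRB.Data.No.Dupes; the values are made up for illustration
toy <- tibble(
  UniqueComplaintId       = c(11, 18, 18, 34, 40, 55),
  `Borough of Occurrence` = c("Manhattan", "Brooklyn", "Brooklyn",
                              "Brooklyn", "Bronx", "Manhattan"),
  `Incident Year`         = c(2005, 2004, 2004, 2005, 2005, 2005)
)

# Collapse to one row per complaint, then count complaints per borough and year
complaints_by_borough_year <- toy %>%
  distinct(UniqueComplaintId, `Borough of Occurrence`, `Incident Year`) %>%
  count(`Borough of Occurrence`, `Incident Year`, name = "Complaints")

# Grouped bar chart: one bar per borough within each incident year
p_borough_year <- ggplot(complaints_by_borough_year,
                         aes(x = factor(`Incident Year`), y = Complaints,
                             fill = `Borough of Occurrence`)) +
  geom_col(position = "dodge") +
  labs(x = "Incident Year", title = "Complaints by borough and incident year")
```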
Number of complaints received by year, split by whether or not a full investigation took place.
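One way this breakdown could be produced is sketched here on made-up toy data; the original plotting code is not shown in the report, so the stacked-bar layout is an assumption:

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for CCRB.Data.No.Dupes; the values are made up for illustration
toy <- tibble(
  UniqueComplaintId       = c(11, 18, 18, 34, 40, 55),
  `Received Year`         = c(2005, 2004, 2004, 2005, 2005, 2004),
  `Is Full Investigation` = c(FALSE, TRUE, TRUE, TRUE, FALSE, FALSE)
)

# One row per complaint, then count by year and investigation status
investigations_by_year <- toy %>%
  distinct(UniqueComplaintId, `Received Year`, `Is Full Investigation`) %>%
  count(`Received Year`, `Is Full Investigation`, name = "Complaints")

# Stacked bars: total height is complaints received, fill shows full investigations
p_investigation <- ggplot(investigations_by_year,
                          aes(x = factor(`Received Year`), y = Complaints,
                              fill = `Is Full Investigation`)) +
  geom_col() +
  labs(x = "Received Year")
```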
To see the time taken to close a case, the line chart below plots closure time against the year in which the complaints were received. The trend suggests that response time is better in the current decade than in the previous one.
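Since the data set records only the received and close years, closure time can be approximated at yearly resolution as Close Year minus Received Year. The sketch below works under that assumption, uses made-up toy values, and averages the difference per received year:

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for CCRB.Data.No.Dupes; the values are made up for illustration
toy <- tibble(
  UniqueComplaintId = c(11, 18, 18, 34, 40, 55),
  `Received Year`   = c(2005, 2004, 2004, 2005, 2011, 2011),
  `Close Year`      = c(2006, 2006, 2006, 2006, 2011, 2012)
)

# One row per complaint, then the mean number of years between receipt and closure
closure_time <- toy %>%
  distinct(UniqueComplaintId, `Received Year`, `Close Year`) %>%
  mutate(YearsToClose = `Close Year` - `Received Year`) %>%
  group_by(`Received Year`) %>%
  summarise(MeanYearsToClose = mean(YearsToClose))

# Line chart of average closure time by received year
p_closure <- ggplot(closure_time, aes(x = `Received Year`, y = MeanYearsToClose)) +
  geom_line() +
  geom_point() +
  labs(y = "Mean years to close")
```

A yearly difference is coarse: a complaint received in December and closed in January counts as one year, so small differences between years should be read with caution.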
The graph below shows the number of complaints by borough and complaint filing mode, to see how complainants registered with the CCRB.
The graph below shows the number of complaints by incident location and complaint filing mode, to see how complainants registered with the CCRB at each location.
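The borough and location breakdowns by filing mode follow the same pattern, so they could share one helper. The function count_by_mode() below is a hypothetical name introduced for illustration, and the toy values (including the "Apartment/house" location) are made up:

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for CCRB.Data.No.Dupes; the values are made up for illustration
toy <- tibble(
  UniqueComplaintId       = c(11, 18, 18, 34, 40, 55),
  `Borough of Occurrence` = c("Manhattan", "Brooklyn", "Brooklyn",
                              "Brooklyn", "Bronx", "Manhattan"),
  `Incident Location`     = c("Street/highway", "Street/highway", "Street/highway",
                              "Apartment/house", "Street/highway", "Apartment/house"),
  `Complaint Filed Mode`  = c("On-line website", "Phone", "Phone",
                              "Phone", "Phone", "On-line website")
)

# Hypothetical helper: counts distinct complaints for a grouping column and the
# filing mode, and returns both the summary table and a dodged bar chart
count_by_mode <- function(data, group_col, mode_col = "Complaint Filed Mode") {
  counts <- data %>%
    group_by(across(all_of(c(group_col, mode_col)))) %>%
    summarise(Complaints = n_distinct(UniqueComplaintId), .groups = "drop")

  chart <- ggplot(counts,
                  aes(x = .data[[group_col]], y = Complaints,
                      fill = .data[[mode_col]])) +
    geom_col(position = "dodge") +
    coord_flip() +
    labs(x = group_col, fill = mode_col)

  list(counts = counts, chart = chart)
}

by_borough  <- count_by_mode(toy, "Borough of Occurrence")
by_location <- count_by_mode(toy, "Incident Location")
```

Passing column names as strings keeps the helper usable with the data set's column names, which contain spaces.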
This visualization helps to identify whether or not arrests were made in complaints that contain stop & frisk allegations.
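Because complaints with and without stop & frisk allegations differ in number, comparing shares rather than raw counts may make the arrest pattern easier to read. The following sketch uses made-up toy data and assumes a share-based comparison, which may differ from the original chart:

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for CCRB.Data.No.Dupes; the values are made up for illustration
toy <- tibble(
  UniqueComplaintId = c(11, 18, 18, 34, 40, 55),
  `Complaint Contains Stop & Frisk Allegations` =
    c(FALSE, TRUE, TRUE, TRUE, FALSE, FALSE),
  `Encounter Outcome` = c("No Arrest or Summons", "Arrest", "Arrest",
                          "No Arrest or Summons", "Arrest", "No Arrest or Summons")
)

# One row per complaint, then the share of each outcome within each group
outcome_share <- toy %>%
  distinct(UniqueComplaintId, `Complaint Contains Stop & Frisk Allegations`,
           `Encounter Outcome`) %>%
  count(`Complaint Contains Stop & Frisk Allegations`, `Encounter Outcome`,
        name = "Complaints") %>%
  group_by(`Complaint Contains Stop & Frisk Allegations`) %>%
  mutate(Share = Complaints / sum(Complaints)) %>%
  ungroup()

# Stacked bars of shares: each bar sums to 1
p_outcome <- ggplot(outcome_share,
                    aes(x = `Complaint Contains Stop & Frisk Allegations`,
                        y = Share, fill = `Encounter Outcome`)) +
  geom_col()
```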
Which types of allegations are more prevalent in each borough? Identifying such trends and patterns may help to prevent similar allegations in the future.
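A cross-tabulation is one way to compare the allegation mix across boroughs. The sketch below counts allegations (rows) rather than complaints, which is an assumption about the intended unit, and runs on made-up toy data:

```r
library(ggplot2)

# Toy stand-in for CCRB.Data.No.Dupes; the values are made up for illustration
toy <- data.frame(
  Borough  = c("Manhattan", "Brooklyn", "Brooklyn", "Brooklyn", "Bronx", "Manhattan"),
  FadoType = c("Abuse of Authority", "Abuse of Authority", "Discourtesy",
               "Force", "Force", "Discourtesy")
)

# Allegation counts per borough (rows) and FADO type (columns)
fado_counts <- table(toy$Borough, toy$FadoType)

# Row-wise proportions: the allegation mix within each borough
fado_share <- prop.table(fado_counts, margin = 1)

# position = "fill" draws the same within-borough proportions as stacked bars
p_fado <- ggplot(toy, aes(x = Borough, fill = FadoType)) +
  geom_bar(position = "fill") +
  labs(y = "Share of allegations", fill = "Allegation FADO Type")
```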
This visualization shows that complaints peaked in the early part of the current decade and have been on a downward trend since, which may suggest that the authorities have been responding effectively.
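The peak can also be located directly from the yearly counts instead of being read off the chart. A minimal sketch on made-up toy data, assuming complaints are counted by Received Year:

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for CCRB.Data.No.Dupes; the values are made up for illustration
toy <- tibble(
  UniqueComplaintId = c(1, 2, 3, 3, 4, 5, 6, 7),
  `Received Year`   = c(2009, 2010, 2010, 2010, 2010, 2011, 2011, 2012)
)

# One row per complaint, then complaints received per year
complaints_per_year <- toy %>%
  distinct(UniqueComplaintId, `Received Year`) %>%
  count(`Received Year`, name = "Complaints")

# Year with the highest number of complaints
peak_year <- complaints_per_year$`Received Year`[
  which.max(complaints_per_year$Complaints)
]

# Line chart of the yearly trend, with the peak year marked
p_trend <- ggplot(complaints_per_year, aes(x = `Received Year`, y = Complaints)) +
  geom_line() +
  geom_point() +
  geom_vline(xintercept = peak_year, linetype = "dashed")
```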
The stacked plot below shows that streets and highways are where the largest number of incidents took place.
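A ranked count makes the leading location explicit. In the sketch below, the toy values are made up, and stacking by allegation type is an assumption, since the report does not say which variable the original plot stacks by:

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for CCRB.Data.No.Dupes; the values are made up for illustration
toy <- tibble(
  `Incident Location`    = c("Street/highway", "Street/highway", "Street/highway",
                             "Apartment/house", "Apartment/house", "Subway station"),
  `Allegation FADO Type` = c("Force", "Discourtesy", "Abuse of Authority",
                             "Force", "Abuse of Authority", "Discourtesy")
)

# Locations ranked by number of allegation rows, most frequent first
location_counts <- count(toy, `Incident Location`, sort = TRUE, name = "Allegations")

# Stacked bars ordered by frequency; reorder() with FUN = length sorts by row count
p_location <- ggplot(toy,
                     aes(x = reorder(`Incident Location`, `Incident Location`,
                                     FUN = length),
                         fill = `Allegation FADO Type`)) +
  geom_bar() +
  coord_flip() +
  labs(x = "Incident Location", y = "Allegations")
```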
Below is a density plot of video evidence and allegation type.
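A density plot needs a continuous variable on the x-axis, and the report does not state which one was used. The sketch below assumes Received Year, with video evidence as the fill and one panel per allegation type; the toy values are made up:

```r
library(ggplot2)

# Toy stand-in for CCRB.Data.No.Dupes; the values are made up for illustration.
# Each panel/fill combination has two rows, the minimum a density estimate needs.
toy <- data.frame(
  ReceivedYear     = c(2006, 2008, 2012, 2014, 2007, 2009, 2013, 2015),
  HasVideoEvidence = c(FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE),
  FadoType         = c("Force", "Force", "Force", "Force",
                       "Discourtesy", "Discourtesy", "Discourtesy", "Discourtesy")
)

# Overlaid densities of received year, split by video evidence, per allegation type
p_density <- ggplot(toy, aes(x = ReceivedYear, fill = HasVideoEvidence)) +
  geom_density(alpha = 0.5) +
  facet_wrap(~ FadoType) +
  labs(x = "Received Year", fill = "Complaint Has Video Evidence")
```

Density curves show the shape of each distribution rather than its size, so they say nothing about how many complaints in each group include video.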
Summary:
Data visualization and exploratory data analysis (EDA) are interdependent in solving any business problem. EDA helps to tell the story of the data accurately and is a very important step in any statistical analysis. With the help of EDA, data-driven decisions are possible. In this assignment in particular, it helped me to find the trends and patterns in the data and to answer my questions without making any assumptions.