R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.3.3
library(ggthemes)
## Warning: package 'ggthemes' was built under R version 3.3.3
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.3.3
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Including Plots

You can also embed plots, for example:

df <- read.csv("C:/Users/Vivek/Desktop/Data Visualization/ccrb_datatransparencyinitiative.csv")
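The column names used in the plots below (Received.Year, Borough.of.Occurrence, and so on) contain dots. A likely explanation, assuming the CSV header uses spaces, is that read.csv() rewrites headers into syntactic R names by default (check.names = TRUE). The sketch below illustrates this with a made-up two-line CSV passed through the text argument; the sample header and row are hypothetical and not taken from the real file.

```r
# Hypothetical two-line CSV; the real file's header is assumed, not known.
sample_csv <- "Received Year,Borough of Occurrence\n2007,Bronx\n"

# check.names = TRUE (the default) converts spaces to dots via make.names().
toy <- read.csv(text = sample_csv)
print(names(toy))

# With check.names = FALSE the original header text is kept as-is.
raw <- read.csv(text = sample_csv, check.names = FALSE)
print(names(raw))
```

Running names(df) on the real data frame right after loading is a quick way to confirm the exact spellings before writing any aes() mapping.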

# Vis.1 Year by year complaints by borough through Histogram
## This chart shows the number of complaints received between 2004 and 2016 in each borough. The Bronx and Brooklyn are the two most crime-sensitive boroughs. However, the crime rate in the Bronx decreased gradually between 2007 and 2016. This plot helps identify which borough was most sensitive to crime over the period of study.

# geom_bar() counts rows per year, matching geom_histogram(stat = "count")
ggplot(df, aes(x = Received.Year, fill = Borough.of.Occurrence)) +
  geom_bar() +
  labs(title = "Year by year complaints by borough", x = "Year", y = "Number of Complaints") +
  scale_fill_discrete(name = "Borough of Occurrence") +
  theme(legend.position = "bottom")
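The counts behind the chart above can also be tabulated directly, which makes year-to-year comparisons easier to check than reading bar heights. A minimal sketch using count() from dplyr (already loaded at the top of this document); the toy rows are hypothetical stand-ins for df, and only the column names come from the plot code.

```r
library(dplyr)

# Hypothetical toy rows standing in for df; the values are illustrative only.
toy <- data.frame(
  Received.Year = c(2007, 2007, 2007, 2016, 2016),
  Borough.of.Occurrence = c("Bronx", "Bronx", "Brooklyn", "Bronx", "Brooklyn")
)

# One row per year/borough pair, with the number of complaints in column n.
complaints_by_year <- toy %>%
  count(Received.Year, Borough.of.Occurrence)

print(complaints_by_year)
```

Replacing toy with df should give the same numbers the stacked bars are drawn from.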

# Vis.2 Incidence location in each borough
## This plot shows the locations where crime occurred in each borough. Of the three most violent boroughs, Brooklyn had the most street-related and household crimes. Street crimes in the Bronx and Manhattan were almost equal. This analysis is useful for law enforcement officers because it identifies the locations most susceptible to crime.
ggplot(df, aes(x = Borough.of.Occurrence, fill = Incident.Location)) +
  geom_bar() +
  labs(title = "Incidence location in each borough", x = "Borough", y = "Number of Complaints") +
  scale_fill_discrete(name = "Incident Location") +
  theme(legend.position = "bottom")

# Vis.3 Crime occurrence by borough and mode of complaint filed
## This plot visualizes the mode of complaints filed in each borough. Telephone and the call processing system are clearly the two most common modes used to file complaints in every borough. This type of visualization is useful for analyzing the technology victims use to report a crime, and hence for building the infrastructure that supports those modes.

ggplot(df, aes(x = Borough.of.Occurrence, fill = Complaint.Filed.Mode)) +
  geom_bar() +
  labs(title = "Crime occurrence by borough and mode of complaint filed", x = "Borough of Occurrence", y = "Number of Complaints") +
  theme(legend.position = "bottom")

# Vis.4 Mode of complaint filed each year
## This plot is similar to the previous one, except that it breaks down all complaints filed each year by filing mode rather than by borough. As we see, the number of complaints filed through phone and the call processing system gradually decreases, while more and more victims use the online system, as we live in the era of the internet. This plot can therefore help borough officials decide to spend more on developing websites and making them convenient for everyone to use.

ggplot(df, aes(x = Received.Year, fill = Complaint.Filed.Mode)) +
  geom_bar() +
  labs(title = "Mode of complaint filed each year", x = "Year", y = "Number of Complaints") +
  theme(legend.position = "bottom")
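Because the total number of complaints changes from year to year, a shift between filing modes can be easier to judge from shares than from raw counts. A minimal sketch, assuming the same column names as the plot above and using made-up rows in place of df: position = "fill" rescales every bar to a height of 1, so each segment shows a mode's share of that year's complaints. The mode labels in the toy data are hypothetical.

```r
library(ggplot2)

# Hypothetical toy rows; the mode labels and counts are illustrative only.
toy <- data.frame(
  Received.Year = c(2006, 2006, 2006, 2016, 2016, 2016),
  Complaint.Filed.Mode = c("Phone", "Phone", "On-line website",
                           "Phone", "On-line website", "On-line website")
)

# position = "fill" stacks the counts and normalises each bar to 1.
p <- ggplot(toy, aes(x = factor(Received.Year), fill = Complaint.Filed.Mode)) +
  geom_bar(position = "fill") +
  labs(title = "Share of complaints by filing mode each year",
       x = "Year", y = "Share of complaints") +
  theme(legend.position = "bottom")

print(p)
```

The count version and the share version answer different questions, so it may be worth showing both: one for volume, one for the mix of modes.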

# Vis.5 Number of complaints by Allegation FADO type
## This plot visualizes the four categories of police misconduct defined in the New York City Charter, namely force, abuse of authority, discourtesy, and offensive language. The plot clearly shows that most complaints against police officers alleged abuse of authority. This is a critical analysis at a time when police-related crimes are in the news all across the US.

ggplot(df, aes(x = Allegation.FADO.Type, fill = Allegation.FADO.Type)) +
  geom_bar() +
  labs(title = "Number of complaints by Allegation FADO type", x = "Type of Allegations", y = "Number of Complaints") +
  theme(legend.position = "bottom") +
  scale_fill_discrete(name = "Type of Allegations")

# Vis.6 Number of complaints by Allegation FADO type in each borough
## Another important aspect of the previous analysis is to visualize the four categories of police misconduct in each borough. Abuse of authority tops the list in every borough. However, the most alarming finding is that Brooklyn, rather than the Bronx, comes first in the use of excessive force by police.

ggplot(df, aes(x = Borough.of.Occurrence, fill = Allegation.FADO.Type)) +
  geom_bar() +
  labs(title = "Number of complaints by Allegation FADO type in each Borough", x = "Borough of Occurrence", y = "Number of Complaints") +
  theme(legend.position = "bottom") +
  scale_fill_discrete(name = "Type of Allegations")
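Comparing boroughs on raw counts can be misleading when one borough simply has more complaints overall. A minimal sketch of a cross-tabulation with row proportions, using base R's table() and prop.table() on made-up rows (the values are hypothetical; only the column names come from the plot above):

```r
# Hypothetical toy rows standing in for df.
toy <- data.frame(
  Borough.of.Occurrence = c("Bronx", "Bronx", "Bronx", "Brooklyn",
                            "Brooklyn", "Brooklyn", "Brooklyn"),
  Allegation.FADO.Type = c("Force", "Abuse of Authority", "Abuse of Authority",
                           "Force", "Force", "Abuse of Authority", "Discourtesy")
)

# Counts of each allegation type per borough (rows = boroughs).
counts <- table(toy$Borough.of.Occurrence, toy$Allegation.FADO.Type)
print(counts)

# margin = 1 divides each row by its total, giving within-borough shares.
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))
```

With the real df in place of toy, the shares table would show whether a borough leads on force complaints in relative terms or only because it has more complaints in total.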

# Vis.7 Variation of Allegations with time
## This plot continues our study of police misconduct. It visualizes the trend in allegations over time. The results are good news for civil rights groups, as we see a big decrease in all four categories of police misconduct.

ggplot(df, aes(x = Incident.Year, color = Allegation.FADO.Type)) +
  geom_line(stat = "count") +
  labs(title = "Variation of Allegations FADO type with time", x = "Incident Year", y = "Number of Complaints")

# Vis.8 Crime location with video evidence
## This plot visualizes whether video evidence was present at the place of crime. As we see, there were many subway and train stations with no video recording when the crime took place. This is again an alarming finding, especially when terror-related activities are at a peak in NYC. This visualization can be vital for deciding where to install CCTV cameras in crime-prone locations.

ggplot(df, aes(x = Complaint.Has.Video.Evidence, fill = Incident.Location)) +
  geom_bar() +
  labs(title = "Crime location with video evidence", x = "Video evidence", y = "Number of Complaints") +
  theme(legend.position = "bottom")

# Vis.9 Complaint filed mode vs Encounter outcome
## Most of the complaints were filed by phone, and as a result that mode saw the most arrests as an outcome. This plot is critical for finding the fastest mode of complaint filing to get prompt action from law enforcement officers.

ggplot(df, aes(x = Complaint.Filed.Mode, fill = Encounter.Outcome)) +
  geom_bar() +
  labs(title = "Complaint filed mode vs Encounter outcome", x = "Complaint filed mode", y = "Number of Complaints") +
  theme(legend.position = "bottom")

# Vis.10 Borough of occurrence vs reason for initial contact
## This plot shows the different types of crime or violation for which police contacted people in each borough. Parking violations are among the most common reasons for contact. Police can use the results of this plot to find the most sensitive areas in each borough so that crime in those areas can be minimised in future.

ggplot(df, aes(x = Borough.of.Occurrence, fill = Reason.For.Initial.Contact)) +
  geom_bar() +
  labs(title = "Crime occurrence by borough and reason for initial contact", x = "Borough of Occurrence", y = "Number of Complaints") +
  theme(legend.position = "bottom")

# EDA Summary:
## EDA helps us gain insights into datasets that can then be explored for meaningful analysis. This CCRB dataset visualization project helped us obtain additional information beyond direct correlation analysis between two variables. For example, the plot 'Borough of occurrence vs reason for initial contact' can be analysed further to understand which areas in each borough are more prone to crime, so that law enforcement personnel can be deployed there to control future crimes.

## Technically, I found that histograms and bar charts in ggplot are the most suitable for studying the relationship between two variables.
