Objectives

The objective of this assignment is to conduct an exploratory data analysis of a data set that you are not familiar with. In this week's lecture we discussed a number of visualisation approaches to exploring a data set; this assignment applies those tools and techniques. An important distinction between class examples and applied data science work is the iterative and repetitive nature of exploring a data set. It takes time to understand what is in the data and what is interesting in it.

For this week we will be exploring data from the NYC Data Transparency Initiative. They maintain a database of complaints that fall within the jurisdiction of the Civilian Complaint Review Board (CCRB), an independent municipal agency. Your objective is to identify interesting patterns within the data that may be indicative of large-scale trends.

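# stringsAsFactors = TRUE reproduces the factor columns shown in the summary (the default changed to FALSE in R 4.0.0)
data <- read.csv("/Users/jain/Desktop/EDAData.csv", stringsAsFactors = TRUE)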
summary(data)
##       DateStamp      UniqueComplaintId   Close.Year   Received.Year 
##  11/29/2016:204397   Min.   :    1     Min.   :2006   Min.   :1999  
##                      1st Qu.:17356     1st Qu.:2008   1st Qu.:2007  
##                      Median :34794     Median :2010   Median :2009  
##                      Mean   :34778     Mean   :2010   Mean   :2010  
##                      3rd Qu.:52204     3rd Qu.:2013   3rd Qu.:2012  
##                      Max.   :69492     Max.   :2016   Max.   :2016  
##                                                                     
##    Borough.of.Occurrence Is.Full.Investigation
##  Bronx        :49442     Mode :logical        
##  Brooklyn     :72215     FALSE:107084         
##  Manhattan    :42104     TRUE :97313          
##  Outside NYC  :  170                          
##  Queens       :30883                          
##  Staten Island: 9100                          
##  NA's         :  483                          
##  Complaint.Has.Video.Evidence             Complaint.Filed.Mode
##  Mode :logical                Call Processing System: 42447   
##  FALSE:195530                 E-mail                :   799   
##  TRUE :8867                   Fax                   :   356   
##                               In-person             :  9586   
##                               Mail                  :  3424   
##                               On-line website       : 14197   
##                               Phone                 :133588   
##        Complaint.Filed.Place Complaint.Contains.Stop...Frisk.Allegations
##  CCRB             :130877    Mode :logical                              
##  IAB              : 69214    FALSE:119856                               
##  Precinct         :  3548    TRUE :84541                                
##  Other City agency:   295                                               
##  Mayor's Office   :   157                                               
##  Other            :   110                                               
##  (Other)          :   196                                               
##             Incident.Location  Incident.Year             Encounter.Outcome
##  Street/highway      :123274   Min.   :1999   Arrest              :89139  
##  Apartment/house     : 34720   1st Qu.:2007   No Arrest or Summons:82964  
##  Residential building: 12421   Median :2009   Other/NA            : 1050  
##  Police building     :  8968   Mean   :2010   Summons             :31244  
##  Subway station/train:  6077   3rd Qu.:2012                               
##  (Other)             : 15581   Max.   :2016                               
##  NA's                :  3356                                              
##                                 Reason.For.Initial.Contact
##  PD suspected C/V of violation/crime - street:60107       
##  Other                                       :39030       
##  PD suspected C/V of violation/crime - bldg  :16067       
##  PD suspected C/V of violation/crime - auto  :12953       
##  Moving violation                            : 8843       
##  (Other)                                     :66542       
##  NA's                                        :  855       
##          Allegation.FADO.Type
##  Abuse of Authority:102173   
##  Discourtesy       : 34452   
##  Force             : 61761   
##  Offensive Language:  6008   
##  NA's              :     3   
##                              
##                              
##                            Allegation.Description
##  Physical force                       :44116     
##  Word                                 :31704     
##  Stop                                 :12944     
##  Search (of person)                   :12250     
##  Refusal to provide name/shield number:10359     
##  (Other)                              :93021     
##  NA's                                 :    3

Section 1: Visualisation

Vis 1: Number of Complaints received each year

This graph gives an idea of whether the number of complaints has increased, decreased, or stayed roughly constant over time.

install.packages("ggplot2",repos = "http://cran.us.r-project.org")
## 
## The downloaded binary packages are in
##  /var/folders/_f/hk0n1w157s5bfmt2ykvxt1yh0000gn/T//Rtmp9EVb7U/downloaded_packages
library(ggplot2)
# A line has no fill, so only x is mapped; stat = "count" counts rows per year
ggplot(data, aes(x = Received.Year)) +
  geom_line(stat = "count") +
  labs(title = "Number of Complaints received by Year", x = "Received Year", y = "Number of Complaints")
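
The summary lists 204,397 rows, but UniqueComplaintId only runs up to 69,492, so a complaint may well occupy several rows. That each row is one allegation is an assumption here, not something the data confirms. If it holds, counting rows per year overstates the number of complaints. A minimal sketch of counting each complaint once, using a small made-up data frame `toy` in place of the real file:

```r
# Toy stand-in for the real data; the values are invented for illustration.
toy <- data.frame(
  UniqueComplaintId = c(1, 1, 2, 3, 3, 4),
  Received.Year     = c(2006, 2006, 2006, 2007, 2007, 2008)
)

# Keep the first row of every complaint so multi-row complaints count once.
complaints <- toy[!duplicated(toy$UniqueComplaintId), ]

# Number of distinct complaints received in each year.
per_year <- table(complaints$Received.Year)
print(per_year)
```

A data frame deduplicated this way could be passed to the same ggplot() call in place of data.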

Vis 2: Number of Complaints from a Borough

The graph shows which boroughs account for the most complaints, which can help determine the focus areas.

graph2 <- table(data$Borough.of.Occurrence)
pie(graph2, radius = 0.5, col = c("green", "red", "violet","cornsilk", "cyan", "yellow","pink"))
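
Slices of similar size are hard to compare in a pie chart, and table() drops NA values by default, so rows without a borough are silently left out. As a rough sketch, the same counts can be printed as percentages with the missing values kept. The data frame `toy` below is invented for illustration:

```r
# Toy stand-in for the real data; the values are invented for illustration.
toy <- data.frame(
  Borough.of.Occurrence = c("Brooklyn", "Brooklyn", "Bronx", "Queens", NA, "Brooklyn")
)

# useNA = "ifany" keeps rows with a missing borough as their own category.
counts <- table(toy$Borough.of.Occurrence, useNA = "ifany")

# Convert counts to percentages of all rows, rounded to one decimal place.
shares <- round(100 * prop.table(counts), 1)
print(sort(shares, decreasing = TRUE))
```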

Vis 3: Video Evidence exists or not

This graph shows, for each year, how many complaints have video evidence available.

# geom_bar() counts rows per year; the x axis is the received year, not the video flag
ggplot(data, aes(x = Received.Year, fill = Complaint.Has.Video.Evidence)) +
  geom_bar() +
  labs(title = "Availability of Video Evidence", x = "Received Year", y = "Number of Complaints")
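
Because so few records have video evidence, stacked counts make it hard to see whether the share changes over time. One possible way to look at the proportion per year is sketched below, with a made-up data frame `toy` standing in for the real data:

```r
# Toy stand-in for the real data; the values are invented for illustration.
toy <- data.frame(
  Received.Year = c(2010, 2010, 2010, 2010, 2015, 2015, 2015, 2015),
  Complaint.Has.Video.Evidence = c(FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE)
)

# The mean of a logical vector is the proportion of TRUE values,
# so this gives the share of rows with video evidence in each year.
video_share <- tapply(toy$Complaint.Has.Video.Evidence, toy$Received.Year, mean)
print(round(100 * video_share, 1))
```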

Vis 4: Complaints by Allegation Type

This graph shows the number of complaints by each allegation type

# Columns are referenced by name inside aes(); geom_bar() counts rows per category
ggplot(data, aes(x = Allegation.FADO.Type, fill = Allegation.FADO.Type)) +
  geom_bar() +
  labs(title = "Complaints by Type of Allegation", x = "Allegation Type", y = "Number of Complaints") +
  theme(legend.position = "bottom") +
  scale_fill_discrete(name = "Allegation Type")

Vis 5: Different modes used to make Complaints

This graph shows the number of complaints filed through each mode (phone, e-mail, in person, and so on).

# Columns are referenced by name inside aes(); geom_bar() counts rows per category
ggplot(data, aes(x = Complaint.Filed.Mode, fill = Complaint.Filed.Mode)) +
  geom_bar() +
  labs(title = "Mode of Complaint", x = "Mode", y = "Number of Complaints") +
  theme(legend.position = "bottom") +
  scale_fill_discrete(name = "Mode")

Vis 6: Histogram of incident year

This graph shows which years had the most incidents.

hist(data$Incident.Year, main="Histogram for Incident Year", xlab="Incident Year", border="black", breaks = 20, col="red")

Vis 7: Full Investigation was done or not

This graph shows, for each borough, whether a full investigation of the complaint was carried out.

# The axis label argument is lowercase y; an uppercase Y does not label the y axis
ggplot(data, aes(x = Borough.of.Occurrence, fill = Is.Full.Investigation)) +
  geom_bar() +
  labs(title = "Full Investigation (True or False) by Borough", x = "Borough", y = "Count") +
  theme_minimal()
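
Stacked counts are dominated by the larger boroughs, so the share of full investigations is easier to compare as a row-wise proportion. A minimal sketch, assuming a made-up data frame `toy` in place of the real data:

```r
# Toy stand-in for the real data; the values are invented for illustration.
toy <- data.frame(
  Borough.of.Occurrence = c("Bronx", "Bronx", "Brooklyn", "Brooklyn", "Brooklyn", "Brooklyn"),
  Is.Full.Investigation = c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)
)

# Two-way table: boroughs in rows, FALSE/TRUE in columns.
counts <- table(toy$Borough.of.Occurrence, toy$Is.Full.Investigation)

# margin = 1 normalises each row, so every borough sums to 1.
row_shares <- prop.table(counts, margin = 1)
print(round(row_shares, 2))
```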

Vis 8: Analysis of received year versus close year

This graph shows the comparison between when the case was received versus when it was closed.

# group = Close.Year draws one box per closing year
boxp <- ggplot(data, aes(x = Close.Year, y = Received.Year, group = Close.Year))
boxp + geom_jitter(width = 0.3, alpha = .2) + geom_boxplot(alpha = .25)
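
The gap between the two years can also be computed directly, which makes slow cases easier to list than reading them off the plot. The sketch below uses an invented data frame `toy`. The cut-off of two years is an arbitrary choice for illustration, not a rule taken from the data:

```r
# Toy stand-in for the real data; the values are invented for illustration.
toy <- data.frame(
  Received.Year = c(2006, 2006, 2007, 2008, 2008, 1999),
  Close.Year    = c(2006, 2007, 2007, 2008, 2009, 2006)
)

# Whole years elapsed between receiving and closing a case.
years_open <- toy$Close.Year - toy$Received.Year
print(table(years_open))

# Rows far above the typical lag are candidates for the outliers in the plot.
# The threshold of 2 years is an assumption made for this example.
slow <- toy[years_open > 2, ]
print(slow)
```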

Vis 9: Where incidents occurred in each borough

This graph shows the type of location where incidents took place in each borough, which can help pinpoint the trouble areas.

# geom_bar() counts rows per borough; geom_histogram() is meant for a continuous x
ggplot(data, aes(x = Borough.of.Occurrence, fill = Incident.Location)) +
  geom_bar() +
  labs(title = "Complaints by the location of the Incident", x = "Borough", y = "Number of Complaints") +
  theme(legend.position = "bottom") +
  scale_fill_discrete(name = "Incident Location")

Vis 10: What was the outcome of the encounter in each Borough

This graph shows the outcome of the encounter in each Borough.

# The fill legend shows the encounter outcome, so it is named accordingly
ggplot(data, aes(x = Borough.of.Occurrence, fill = Encounter.Outcome)) +
  geom_bar() +
  labs(title = "Complaints by Outcome of the Encounter", x = "Borough", y = "Number of Complaints") +
  theme(legend.position = "bottom") +
  scale_fill_discrete(name = "Encounter Outcome")

Section 2: Summary

From the first visualisation, we can see that the number of complaints was high during the 2005-2010 time span. The department needs to do an analysis to understand the reasons. Part of the explanation might be linked to the housing-market recession, which left many people jobless and may have pushed some towards crime, although the data alone cannot confirm this. From the next visualisation we can observe that Brooklyn is the borough with the most complaints (72,215 records). Vis 3 shows that for the majority of complaints no video evidence was available.

Most complaints concern abuse of authority, followed by force. This suggests that complainants most often allege that officers overstepped their authority. The most popular way of filing a complaint is by phone, which comes as no surprise given how easy it is to make a call. From Visualisation 1 and Visualisation 6 it appears that the years with the most incidents are also the years with the most complaints received.

In most boroughs, roughly half of the complaints received a full investigation. There were a lot of outliers in the comparison of when a case was received versus when it was closed. Street/highway was the most common incident location, especially in Brooklyn. In each borough, an encounter was about as likely to end in an arrest as in no arrest or summons. Overall, this analysis can help the department identify problem areas and the kinds of complaints concentrated in each borough, so that it can take measures to improve the safety and security of the public.
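
The link between incident year and received year mentioned in the summary can be checked numerically instead of by comparing two plots. A rough sketch follows, with a made-up data frame `toy` in place of the real data. Both measures are illustrative choices, not part of the original analysis:

```r
# Toy stand-in for the real data; the values are invented for illustration.
toy <- data.frame(
  Incident.Year = c(2006, 2006, 2007, 2008, 2008, 2009),
  Received.Year = c(2006, 2007, 2007, 2008, 2008, 2009)
)

# Share of rows where the complaint was received in the year of the incident.
same_year <- mean(toy$Incident.Year == toy$Received.Year)

# Pearson correlation between the two year columns.
r <- cor(toy$Incident.Year, toy$Received.Year)

cat("same-year share:", round(same_year, 2), " correlation:", round(r, 2), "\n")
```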