Objectives

The objective of this assignment is to conduct an exploratory data analysis of a data set that you are not familiar with. In this week's lecture we discussed a number of visualization approaches to exploring a data set; this assignment applies those tools and techniques. An important distinction between class examples and applied data science work is the iterative and repetitive nature of exploring a data set. It takes time to understand what is in the data and what is interesting in it.

This week we will be exploring data from the NYC Data Transparency Initiative. The initiative maintains a database of complaints that fall under the Civilian Complaint Review Board (CCRB), an independent municipal agency. Your objective is to identify interesting patterns and trends within the data that may be indicative of large-scale trends.

Loading the data set into RStudio.

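library(readxl)        # read_excel()
library(ggplot2)       # ggplot()
library(RColorBrewer)  # brewer.pal()
library(sqldf)         # sqldf()
ccrb_datatransparencyinitiative <- read_excel("C:/Users/nprak/Desktop/Harrisburg Courses/ANLY_512/ccrb_datatransparencyinitiative.xlsx", sheet = "Complaints_Allegations")
View(ccrb_datatransparencyinitiative)
Problemset_4 <- ccrb_datatransparencyinitiative
names(Problemset_4) <- gsub(" ", "_", names(Problemset_4))  # replace spaces in column names with underscores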
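
The renaming step can be checked in isolation. The sketch below uses a small made-up data frame (toy_complaints and its values are illustrative, not the CCRB data) to show how gsub() turns spaces in column names into underscores, so that columns can be referenced with $ without backticks.

```r
# Toy data frame standing in for the imported sheet (values are made up).
toy_complaints <- data.frame(
  "Incident Year" = c(2007, 2008),
  "Close Year"    = c(2008, 2008),
  check.names = FALSE  # keep the spaces in the names, as the imported sheet does
)

# Replace every space in the column names with an underscore.
names(toy_complaints) <- gsub(" ", "_", names(toy_complaints))

print(names(toy_complaints))
```

After this step, toy_complaints$Incident_Year is a valid reference, which is what the plots in this report rely on.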

Let's first analyze case response time by comparing the incident year with the case closed year.

ggplot(Problemset_4, aes(x=Incident_Year, y=Close_Year)) + geom_point(shape=14, color="purple") + geom_smooth(method=lm, se=FALSE,  color="red") +labs(title="Relationship between Incident Year and Case Closed Year", x="Incident Year", y="Case Closed Year")
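
Because both axes are whole years, many points in the scatter plot overlap. A complementary view is to tabulate how many years pass between an incident and the closing of its case. This is a minimal sketch on made-up values (toy_cases is hypothetical), assuming both columns are numeric years as in the plot above.

```r
# Made-up incident and close years; the real columns are assumed to be numeric.
toy_cases <- data.frame(
  Incident_Year = c(2006, 2006, 2007, 2007, 2008),
  Close_Year    = c(2006, 2007, 2007, 2008, 2008)
)

# Years elapsed between the incident and the case being closed.
toy_cases$Years_To_Close <- toy_cases$Close_Year - toy_cases$Incident_Year

# Count how many cases closed in the same year, one year later, and so on.
years_to_close_counts <- table(toy_cases$Years_To_Close)
print(years_to_close_counts)
```

A table like this shows directly whether most cases close in the year of the incident or the year after, which the overlapping points can hide.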

Number of incidents that took place in a given year.

The histogram shows that the largest number of incidents was reported in 2007.

hist(Problemset_4$Incident_Year, main="Histogram for Incident Year", xlab="Incident Year", border="red", breaks = 15, col="blue")
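
One caveat: the sheet is named Complaints_Allegations, which suggests that a single complaint may occupy several rows, one per allegation. If that is the case, a histogram over all rows counts allegations rather than complaints. The sketch below shows one way to count each complaint once, using made-up rows (toy_rows is hypothetical) and assuming UniqueComplaintId identifies a complaint.

```r
# Made-up rows: complaint 1 has two allegations, complaint 2 has one,
# and complaint 3 has three.
toy_rows <- data.frame(
  UniqueComplaintId = c(1, 1, 2, 3, 3, 3),
  Incident_Year     = c(2007, 2007, 2007, 2008, 2008, 2008)
)

# Keep one row per complaint before counting by year.
toy_unique <- unique(toy_rows[c("UniqueComplaintId", "Incident_Year")])

# Complaints per year (not allegations per year).
complaints_per_year <- table(toy_unique$Incident_Year)
print(complaints_per_year)
```

In this toy example the raw rows would show three entries for each year, while the deduplicated count is two complaints in 2007 and one in 2008.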

Analysis of video evidence over time, which speaks to advances in the technology available to the police force.

In recent years there has been a relative increase in incidents with video evidence.

Legend_color <- brewer.pal(8, "Spectral")
Viz_4 <- table(Problemset_4$Complaint_Has_Video_Evidence, Problemset_4$Incident_Year)
barplot((Viz_4),main="Complaints filed with Video evidence each incident year", xlab="Incident Year", ylab="Number of Complaints",horiz = FALSE, col=c(Legend_color), legend = rownames(Viz_4)) 
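
Raw counts make it hard to judge a relative increase when the total number of complaints changes from year to year. One option is to convert each year's column to proportions with prop.table(). The sketch below uses made-up counts (toy_video is hypothetical, not the CCRB figures).

```r
# Made-up counts of complaints without and with video evidence per year.
# matrix() fills column by column, so each pair below is one year.
toy_video <- matrix(
  c(90, 10,
    60, 40),
  nrow = 2,
  dimnames = list(Has_Video = c("FALSE", "TRUE"), Year = c("2012", "2015"))
)

# margin = 2 divides each column (year) by its own total.
video_share <- prop.table(toy_video, margin = 2)
print(round(video_share, 2))
```

Passing a table of shares like video_share to barplot() in place of the raw counts would make every bar the same height, so the video share can be compared across years directly.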

Distribution of complaints over allegation type and Incident year.

Abuse of Authority appears to be the most prominent allegation type in the visualization.

Viz_5 <- table(Problemset_4$Allegation_FADO_Type, Problemset_4$Incident_Year)
barplot(Viz_5, main="Complaints distributed over allegations each incident year", xlab="Number of Complaints", ylab="Incident Year", horiz = TRUE, col=c("coral4","coral3","coral2","coral1","coral"), legend = rownames(Viz_5))  # with horiz = TRUE the counts run along the x-axis

Distribution of complaints over mode of complaint filed and Incident year.

Complaints reported by telephone remain the most common filing mode in the NYC area.

Viz_6 <- table(Problemset_4$Complaint_Filed_Mode, Problemset_4$Incident_Year)
barplot((Viz_6),main="Complaints filed mode each incident year", xlab="Incident Year", ylab="Number of Complaints",horiz = FALSE, col=c(Legend_color), legend = rownames(Viz_6))

Distribution of incidents over NYC boroughs

Over time, we can see a decrease in the number of incidents from 2010 onwards.

viz_7 <- unique(Problemset_4[c("UniqueComplaintId","Incident_Year","Borough_of_Occurrence")])
viz_7 <- data.frame(viz_7)
ggplot(viz_7, aes(Incident_Year, fill = Borough_of_Occurrence)) + geom_bar() + labs(title="Incidents over time by borough in NYC", x="Incident Year", y="Count of Incidents") + theme(legend.title=element_blank())  # aes() takes bare column names; viz_7$... is not needed

Since the Bronx is the most crime-centric borough in the analysis above, let's take a closer look at the types of crimes happening there.

Arrests in the Bronx are declining roughly in proportion to incidents.

viz_8 <- sqldf("select * from Problemset_4 where Borough_of_Occurrence = 'Bronx'")
viz_8 <- data.frame(viz_8)
ggplot(viz_8, aes(Incident_Year, fill = Encounter_Outcome)) + geom_bar() + labs(title="Criminal outcome in Bronx", x="Incident Year", y="Count of Incidents") + theme(legend.title=element_blank())  # aes() takes bare column names

Incident locations for crimes in the Bronx, which identify the locations most prone to criminal activity.

Based on the visualization below, parks appear to be the most common place where incidents occur. This information could be used to caution the public in parks.

VIZ_9 <- ggplot(viz_8, aes(Incident_Year, fill = Incident_Location)) + geom_bar() + labs(title="Criminal Activities by location in Bronx", x="Incident Year", y="Count of Incidents") + theme(legend.title=element_blank())  # aes() takes bare column names
VIZ_9

Crimes involving guns by borough

The pie chart below shows the boroughs most prone to gun violence: 38% of the gun violence in NYC is concentrated in Brooklyn, followed by the Bronx at 23%.

viz_10 <- sqldf("select * from Problemset_4 where Allegation_description like '%Gun%'")
Viz_10_count <- sqldf("SELECT COUNT(*) AS No_of_Incidents, Borough_of_Occurrence FROM viz_10 GROUP BY Borough_of_Occurrence")
pct <-  round(Viz_10_count$No_of_Incidents/sum(Viz_10_count$No_of_Incidents)*100)
lbls <- paste(pct,"%")
lbls<- paste(Viz_10_count$Borough_of_Occurrence,lbls)
pie(Viz_10_count$No_of_Incidents,labels = lbls, col=c(Legend_color),main="Pie Chart of Borough with Gun involved Crimes",cex=.5)
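
The same borough shares can also be computed without SQL. The sketch below is an alternative in base R on made-up rows (toy_gun and its borough values are illustrative only, not the CCRB figures). It also sorts the result, which makes the ranking easier to read than comparing pie slices.

```r
# Made-up borough labels for gun-related allegations.
toy_gun <- data.frame(
  Borough_of_Occurrence = c("Brooklyn", "Brooklyn", "Bronx", "Queens",
                            "Brooklyn", "Bronx", "Manhattan", "Brooklyn",
                            "Bronx", "Brooklyn")
)

# Count rows per borough, then convert the counts to rounded percentages.
borough_counts <- table(toy_gun$Borough_of_Occurrence)
borough_pct <- round(100 * prop.table(borough_counts))

# Largest share first.
borough_pct <- sort(borough_pct, decreasing = TRUE)
print(borough_pct)
```

Because each percentage is rounded on its own, the rounded values may not always sum to exactly 100; that applies to the pie chart labels above as well.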