library(readxl)
library(ggplot2)
ccrb = read_excel("/Users/Jade/Desktop/Harrisburg University/Second Semester/Data Visualization ANLY 512 2017LateFall/Visual Data Exploratio (VDE)/ccrb.xlsx", sheet = "Complaints_Allegations")
I would like to know the number of allegations of each type. This helps show which type of allegation against the NYPD is most common. After running and analyzing this Viz, I think the NYPD needs to train its police officers to use their authority correctly (within a reasonable range).
ggplot(ccrb, aes(x = `Allegation FADO Type`, fill = `Allegation FADO Type`)) +
  geom_bar() +
  labs(title = "Number of Complaints by Allegation Type", x = "Allegation Type", y = "Number of Complaints") +
  scale_fill_discrete(name = "Allegation Type")
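To read the ranking directly instead of comparing bar heights, the same counts can be tabulated. A minimal base-R sketch: `count_by_type` is a hypothetical helper, and it assumes the column is named `Allegation FADO Type` as in the ccrb sheet.

```r
# Hypothetical helper: count allegations per FADO type, most frequent first.
# Assumes `df` has a column named "Allegation FADO Type" (as in the ccrb sheet).
count_by_type <- function(df, col = "Allegation FADO Type") {
  counts <- table(df[[col]])       # frequency of each allegation type
  sort(counts, decreasing = TRUE)  # most common type first
}
```

For example, `count_by_type(ccrb)` prints one count per allegation type, largest first.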
I would like to know how many complaints the NYPD received each year. This Viz shows a decreasing trend in the number of complaints over the past few years. As the bar chart shows, the number of complaints increased sharply from 2005 to 2006. A future study of why complaints increased in 2006 could help the NYPD improve its service.
ggplot(ccrb, aes(x = `Received Year`)) +
  geom_bar(fill = "blue") +
  labs(title = "Number of Complaints Received Each Year", x = "Received Year", y = "Number of Complaints")
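The 2005 to 2006 jump and the later decline can be quantified as year-over-year differences. A sketch, assuming `Received Year` is numeric; `yearly_change` is a hypothetical helper. Note that `diff()` compares adjacent years that appear in the data, so a year with no complaints at all would be skipped rather than counted as zero.

```r
# Hypothetical helper: complaints per received year and the change from the previous year.
yearly_change <- function(df, col = "Received Year") {
  counts <- table(df[[col]])                   # one count per year, ordered by year
  n <- as.integer(counts)
  data.frame(
    year   = as.integer(names(counts)),
    n      = n,
    change = c(NA, diff(n))                    # first year has no previous year
  )
}
```

For example, `yearly_change(ccrb)` lists each year's count and its change, so the 2005 to 2006 increase shows up as a single positive value.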
I wondered why the total number of complaints dropped over the past few years. After running several analyses, I found some interesting trends in the location of incidents. Based on the graph below, incidents on Street/highway have decreased tremendously, while the number of incidents at other locations did not change much. The decrease in Street/highway incidents might therefore be the reason why the overall number of complaints dropped.
ggplot(ccrb, aes(x = `Incident Year`, color = `Incident Location`)) +
  geom_point(stat = "count") +
  labs(title = "Frequency of Incident by Location", x = "Year", y = "Number") +
  # the points are mapped to color, so the legend title needs the color scale
  scale_color_discrete(name = "Location") +
  theme(legend.position = "bottom")
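The claim that Street/highway incidents fell while other locations stayed flat can be checked by comparing each location's count between two years. A sketch: `location_change` is a hypothetical helper, and both `from` and `to` must be incident years that actually appear in the data.

```r
# Hypothetical helper: change in incident count per location between two years.
# Both `from` and `to` must appear in the "Incident Year" column.
location_change <- function(df, from, to,
                            year_col = "Incident Year",
                            loc_col = "Incident Location") {
  tab <- table(df[[loc_col]], df[[year_col]])  # rows = location, columns = year
  change <- tab[, as.character(to)] - tab[, as.character(from)]
  sort(change)                                 # largest decrease first
}
```

If the reasoning above holds, Street/highway should appear first with a large negative value and the other locations should sit near zero.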
Then I kept only the cases that happened on Street/highway and ran a bar chart to see whether the mix of Allegation FADO Types changed while these incidents were decreasing. The answer is no, which means the decrease in Street/highway cases is not caused by a drop in any single type of complaint.
library("dplyr")
# keep only incidents that occurred on Street/highway
sthw = filter(ccrb, `Incident Location` == "Street/highway")
ggplot(sthw, aes(x = `Incident Year`, fill = `Allegation FADO Type`)) +
  geom_bar() +
  labs(title = "Frequency of Street/highway Incident by Type", x = "Year", y = "Number") +
  scale_fill_discrete(name = "Type") +
  theme(legend.position = "bottom")
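Whether the mix of types is stable is easier to judge from within-year shares than from stacked counts, because the bars shrink as the total falls. A sketch: `type_share_by_year` is a hypothetical helper that divides each year's counts by that year's total.

```r
# Hypothetical helper: share of each allegation type within each incident year.
type_share_by_year <- function(df,
                               year_col = "Incident Year",
                               type_col = "Allegation FADO Type") {
  tab <- table(df[[type_col]], df[[year_col]])  # rows = type, columns = year
  prop.table(tab, margin = 2)                   # each year's column sums to 1
}
```

For example, `round(type_share_by_year(sthw), 2)`: if the columns look roughly alike across years, the type mix did not change as the cases declined.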
By plotting the counts of “Borough of Occurrence”, we can see that Brooklyn has the most occurrences and Staten Island has the fewest. Based on research, Brooklyn has the largest population, so it is expected that Brooklyn has the most occurrences. Yet although Queens has the second-largest population among the boroughs, it ranks only 4th in occurrences.
# geom_bar() counts a categorical variable directly; geom_histogram(stat = "count")
# triggered the "Ignoring unknown parameters" warning
ggplot(ccrb, aes(x = `Borough of Occurrence`)) +
  geom_bar(fill = "blue") +
  labs(title = "Frequency of Incident Occurrence by Borough", x = "Borough of Occurrence", y = "Frequency of Occurrence")
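Comparing raw counts against population is easier with a per-capita rate. The CCRB sheet has no population column, so this sketch takes a user-supplied named vector `population` (an assumption: borough populations would have to be sourced separately, with names matching the `Borough of Occurrence` labels). `per_capita_rate` is a hypothetical helper.

```r
# Hypothetical helper: incidents per `per` residents for each borough.
# `population` is a named numeric vector supplied by the user (not in the CCRB data);
# boroughs without a matching population entry are dropped.
per_capita_rate <- function(df, population,
                            col = "Borough of Occurrence", per = 1e5) {
  counts <- table(df[[col]])
  boroughs <- intersect(names(counts), names(population))
  rate <- as.numeric(counts[boroughs]) / population[boroughs] * per
  sort(rate, decreasing = TRUE)  # highest rate first
}
```

A low rate for Queens relative to Brooklyn would quantify the gap described above.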
I created a new variable called “Case Duration”, calculated as “Close Year - Received Year”. This variable indicates how long it takes to close a case. Based on the chart, most cases closed within 2 years.
ccrb$"Case Duration" = ccrb$"Close Year" - ccrb$"Received Year"
# Case Duration takes whole-year values, so a bar per value replaces the
# histogram whose binwidth was being ignored
ggplot(ccrb, aes(x = `Case Duration`)) +
  geom_bar(fill = "blue") +
  labs(title = "Case Duration (How Long It Takes to Close a Case)", x = "Duration (years)", y = "Frequency")
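The statement that most cases closed within 2 years can be turned into a single number. One caveat of the definition: it is a difference of calendar years, so a case received in December and closed in January counts as 1 year, and 0 means received and closed in the same calendar year. `share_closed_within` is a hypothetical helper.

```r
# Hypothetical helper: share of cases closed within `max_years`.
# Duration is the calendar-year difference used in the text.
share_closed_within <- function(df, max_years = 2) {
  duration <- df[["Close Year"]] - df[["Received Year"]]
  mean(duration <= max_years, na.rm = TRUE)  # cases with a missing year are dropped
}
```

For example, `share_closed_within(ccrb)` returns a fraction between 0 and 1.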
To understand why some cases take so long, I kept only the records with a case duration of more than 2 years. First, I checked the allegation type. “Force” is clearly the most common type, especially for cases lasting more than 3 years.
# keep only cases that took more than 2 years to close
cd2 = filter(ccrb, `Case Duration` > 2)
ggplot(cd2, aes(x = `Case Duration`, fill = `Allegation FADO Type`)) +
  geom_bar() +
  labs(title = "Frequency of Case Duration > 2 by Allegation Type", x = "Case Duration", y = "Number") +
  # the fill shows allegation type, so the legend is named accordingly
  scale_fill_discrete(name = "Allegation Type")
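The share of “Force” at each duration makes the "especially for cases lasting more than 3 years" observation measurable. A sketch: `type_share_by_duration` is a hypothetical helper that averages a TRUE/FALSE indicator within each duration.

```r
# Hypothetical helper: fraction of allegations of a given type at each case duration.
type_share_by_duration <- function(df, type = "Force",
                                   dur_col = "Case Duration",
                                   type_col = "Allegation FADO Type") {
  is_type <- df[[type_col]] == type          # TRUE where the allegation is `type`
  tapply(is_type, df[[dur_col]], mean, na.rm = TRUE)
}
```

For example, `type_share_by_duration(cd2)` gives one share per duration value above 2 years.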
Then I checked the frequency of case durations (>2) by whether the complaint has video evidence. Based on the result, most cases lasting more than 3 years do not have video evidence. So, to decrease the duration of cases, video recording devices are recommended.
ggplot(cd2, aes(x = `Case Duration`, fill = `Complaint Has Video Evidence`)) +
  geom_bar() +
  labs(title = "Frequency of Case Duration > 2 by Whether Complaint Has Video Evidence", x = "Case Duration", y = "Number") +
  scale_fill_discrete(name = "Complaint Has Video Evidence")
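If most complaints lack video in the first place, most long cases will lack video too, so comparing the average duration with and without video across all cases is a more direct check of the recommendation. A sketch, assuming the video column holds TRUE/FALSE values and `Case Duration` has been computed on ccrb; `mean_duration_by_video` is a hypothetical helper.

```r
# Hypothetical helper: mean case duration (calendar years) with vs. without video,
# computed over all cases rather than only those lasting more than 2 years.
mean_duration_by_video <- function(df,
                                   dur_col = "Case Duration",
                                   video_col = "Complaint Has Video Evidence") {
  tapply(df[[dur_col]], df[[video_col]], mean, na.rm = TRUE)
}
```

For example, `mean_duration_by_video(ccrb)` returns one mean for the `FALSE` group and one for the `TRUE` group.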
Then I ran a bar chart to see the trend in cases with video evidence. The graph below shows that the number of cases with video evidence has increased since 2010. This is a good trend, since cases with video evidence have a higher full investigation rate.
ggplot(ccrb, aes(x = `Received Year`, fill = `Complaint Has Video Evidence`)) +
  geom_bar() +
  labs(title = "Number of Cases Received Each Year by Evidence", x = "Received Year", y = "Number") +
  scale_fill_discrete(name = "Has Video Evidence")
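Because the total number of complaints also changes from year to year, the share of cases with video is a cleaner trend measure than the stacked count. A sketch, assuming the column holds TRUE/FALSE (or the strings "TRUE"/"FALSE"); `video_share_by_year` is a hypothetical helper.

```r
# Hypothetical helper: fraction of complaints with video evidence per received year.
video_share_by_year <- function(df,
                                year_col = "Received Year",
                                video_col = "Complaint Has Video Evidence") {
  has_video <- as.logical(df[[video_col]])  # assumes TRUE/FALSE-like values
  tapply(has_video, df[[year_col]], mean, na.rm = TRUE)
}
```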
When I dug further into how video evidence affects the investigation result, I found that cases with video evidence have a much higher full investigation rate than cases without it.
ggplot(ccrb, aes(x = `Complaint Has Video Evidence`, fill = `Is Full Investigation`)) +
  geom_bar() +
  labs(title = "Complaint Has Video Evidence vs. Complaint Is Fully Investigated", x = "Complaint Has Video Evidence", y = "Number") +
  scale_fill_discrete(name = "Is Full Investigation")
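The "much higher full investigation rate" can be stated as two rates, one per video group, by normalizing each row of the cross-tabulation. A sketch: `full_investigation_rate` is a hypothetical helper, and it assumes both columns hold TRUE/FALSE-like values.

```r
# Hypothetical helper: full-investigation rate with and without video evidence.
full_investigation_rate <- function(df,
                                    video_col = "Complaint Has Video Evidence",
                                    full_col = "Is Full Investigation") {
  tab <- table(df[[video_col]], df[[full_col]],
               dnn = c("Has video", "Full investigation"))
  prop.table(tab, margin = 1)  # each row (with / without video) sums to 1
}
```

For example, in `full_investigation_rate(ccrb)` the `"TRUE"` column gives the full investigation rate for each video group.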
I was curious about how people submit their complaints and how that has changed over the past few years. The bar chart below shows that phone is still the most popular method of submitting complaints. Another trend is that the share of online submissions has increased over the past few years. So the NYPD could optimize its website to improve its service.
ggplot(ccrb, aes(x = `Received Year`, fill = `Complaint Filed Mode`)) +
  geom_bar() +
  labs(title = "Complaint Received Year by Filed Mode", x = "Year", y = "Number") +
  scale_fill_discrete(name = "Complaint Filed Mode")
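Because the bars are stacked counts, a rising online share is easier to confirm numerically. This sketch computes one filing mode's yearly share and fits a simple linear trend with `lm()`. `mode_share_trend` is a hypothetical helper, and `mode` must be the exact label the data uses for online submissions (the label is not given in the text).

```r
# Hypothetical helper: yearly share of one filing mode plus a linear trend (share per year).
mode_share_trend <- function(df, mode,
                             year_col = "Received Year",
                             mode_col = "Complaint Filed Mode") {
  share <- tapply(df[[mode_col]] == mode, df[[year_col]], mean, na.rm = TRUE)
  years <- as.numeric(names(share))
  fit <- lm(as.numeric(share) ~ years)  # ordinary least-squares line through the yearly shares
  list(share = share, slope_per_year = unname(coef(fit)[2]))
}
```

A positive `slope_per_year` is consistent with a growing online share; it describes the trend only and says nothing about its cause.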
After making more than 30 different Viz, I chose the 11 most valuable ones and listed them above. This study aims to identify factors affecting complaint allegations that the NYPD can improve or study further. Based on the results:
1. The overall number of complaints is decreasing because Street/highway cases are decreasing.
2. Most cases closed within 2 years. However, among cases lasting more than 2 years, “Force” is the most common allegation type, so the NYPD needs to improve its ability to resolve “Force” cases. Additionally, cases without video evidence tend to have a longer duration. So, to decrease the duration of cases, video recording devices are recommended.
3. Cases with video evidence have a higher chance of getting a full investigation. As the overall number of cases with video evidence increases, a higher full investigation rate is expected.
4. Phone is still the most popular method of submitting complaints. Another trend is that the share of online submissions has increased over the past few years. So, the NYPD could optimize its website to improve its service.