Charles Palmer
Between 1999 and 2016, 204,397 complaints and allegations of excessive force, abuse of authority, discourtesy, or offensive language were reported across the five boroughs of New York City. These allegations and complaints were reported to and collected by the NYC Civilian Complaint Review Board (CCRB).
For this exploratory data analysis, I present 10 data visualizations that summarize a few of the primary characteristics of the dataset.
summary(ccrb_raw)
## DateStamp UniqueComplaintId Close Year Received Year
## Length:204397 Min. : 1 Min. :2006 Min. :1999
## Class :character 1st Qu.:17356 1st Qu.:2008 1st Qu.:2007
## Mode :character Median :34794 Median :2010 Median :2009
## Mean :34778 Mean :2010 Mean :2010
## 3rd Qu.:52204 3rd Qu.:2013 3rd Qu.:2012
## Max. :69492 Max. :2016 Max. :2016
## Borough of Occurrence Is Full Investigation Complaint Has Video Evidence
## Length:204397 Mode :logical Mode :logical
## Class :character FALSE:107084 FALSE:195530
## Mode :character TRUE :97313 TRUE :8867
## NA's :0 NA's :0
##
##
## Complaint Filed Mode Complaint Filed Place
## Length:204397 Length:204397
## Class :character Class :character
## Mode :character Mode :character
##
##
##
## Complaint Contains Stop & Frisk Allegations Incident Location
## Mode :logical Length:204397
## FALSE:119856 Class :character
## TRUE :84541 Mode :character
## NA's :0
##
##
## Incident Year Encounter Outcome Reason For Initial Contact
## Min. :1999 Length:204397 Length:204397
## 1st Qu.:2007 Class :character Class :character
## Median :2009 Mode :character Mode :character
## Mean :2010
## 3rd Qu.:2012
## Max. :2016
## Allegation FADO Type Allegation Description
## Length:204397 Length:204397
## Class :character Class :character
## Mode :character Mode :character
##
##
##
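The plots that follow use several subsets of ccrb_raw (ccrb_boros, ccrb_2010, ccrb_2004 and ccrb_frisk) whose construction is not shown in this report. A minimal sketch of how such subsets might be derived is given here on a small made-up data frame. The filters are assumptions inferred from the subset names and the descriptions in the text, not the code actually used, and every object ending in _toy is hypothetical.

```r
# Made-up stand-in for ccrb_raw with three of its columns (illustrative only).
ccrb_toy <- data.frame(
  `Received Year` = c(2003, 2004, 2010, 2010, 2012, 2015),
  `Borough of Occurrence` = c("Bronx", "Brooklyn", "Queens",
                              "Outside NYC", NA, "Manhattan"),
  `Complaint Contains Stop & Frisk Allegations` =
    c(FALSE, TRUE, TRUE, FALSE, TRUE, FALSE),
  check.names = FALSE,        # keep the spaces in the column names
  stringsAsFactors = FALSE
)

five_boros <- c("Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island")

# Assumed: keep only the five boroughs (drops 'Outside NYC' and NA).
boros_toy <- ccrb_toy[ccrb_toy$`Borough of Occurrence` %in% five_boros, ]

# Assumed: complaints received in 2010 only.
toy_2010 <- boros_toy[boros_toy$`Received Year` == 2010, ]

# Assumed: complaints received from 2004 onward.
toy_2004 <- boros_toy[boros_toy$`Received Year` >= 2004, ]

# Assumed: complaints flagged as containing stop & frisk allegations.
toy_frisk <- boros_toy[boros_toy$`Complaint Contains Stop & Frisk Allegations`, ]
```

Using %in% rather than != has the side effect of dropping NA boroughs as well, since NA never matches any of the five names.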
Vis 1: Incident Reports
A histogram of the complaints received across the five boroughs of New York City between 1999 and 2016. This image shows two things: 1) incident reports have steadily declined since 2007, and 2) although the incidents occurred between 1999 and 2016, the histogram shows few complaints received before 2004.
hist(ccrb_boros$`Received Year`, main="#1 Histogram for Received Year", xlab="Received Year", border="black", breaks=18, col="green", las=3)
Vis 2: Summary of all Incidents by Borough in 2010
The original data provided by the Civilian Complaint Review Board included complaints from the five boroughs, plus the categories ‘Outside NYC’ and ‘NA’. For this visualization, Outside NYC and NA have been removed to focus on complaints with a known origin. I am also presenting only the 2010 reports. The goal was to compare this data against the 2010 Census report to see whether the density of reporting matches the borough populations. Unfortunately, I could not layer this additional information on the bar chart, so I have included the census data (as a percentage of the region) on a separate chart.
It is clear that Brooklyn has the most complaints, which makes sense because the borough has 30.6% of the population. Surprisingly, Queens, at 27.3% of the population, shows far fewer reports than its share of the population would suggest.
# Refer to columns by name inside aes() instead of ccrb_2010$...; geom_bar() counts rows by default
p <- ggplot(ccrb_2010, aes(x = `Borough of Occurrence`)) + geom_bar() + labs(title = "#2 2010 Complaints and Allegations", x = "Boroughs", y = "", subtitle = "An accounting of all allegations by NYC Borough", caption = "(Source: Civilian Complaint Review Board (CCRB))") + theme_light()
p
# Census share of the five-borough population, in percent
dat1 <- data.frame(census = c(16.9, 30.6, 19.4, 27.3, 5.7), boro = factor(c("Bronx","Brooklyn","Manhattan","Queens","Staten Island")))
# group = 1 lets geom_line() connect the points across the discrete x axis
census_plot <- ggplot(data=dat1, aes(x=boro, y=census, group=1)) +
geom_line() +
geom_point()
census_plot
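The comparison between complaint volume and population can be made explicit by putting both on the same percentage scale. The sketch below does this under stated assumptions: the census percentages are the ones quoted above, but the complaint counts are made-up placeholders (the real counts would come from something like table(ccrb_2010$`Borough of Occurrence`)), and the names complaints_toy and shares are hypothetical.

```r
# Census shares quoted in the text (percent of the five-borough population).
census <- data.frame(
  boro = c("Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"),
  census_pct = c(16.9, 30.6, 19.4, 27.3, 5.7)
)

# Made-up complaint counts, for illustration only (NOT the real 2010 figures).
complaints_toy <- data.frame(
  boro = c("Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"),
  n = c(250, 350, 250, 100, 50)
)

# Join on borough, then express the counts as a share of all complaints.
shares <- merge(complaints_toy, census, by = "boro")
shares$complaint_pct <- 100 * shares$n / sum(shares$n)

# Ratio > 1: more complaints than the population share alone would predict.
shares$ratio <- shares$complaint_pct / shares$census_pct

# Optional overlay: bars for complaint share, points for census share.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  overlay <- ggplot(shares, aes(x = boro)) +
    geom_col(aes(y = complaint_pct)) +
    geom_point(aes(y = census_pct), size = 3) +
    labs(x = "Boroughs", y = "Percent of total")
  print(overlay)
}
```

Because both series are percentages, they can share one y axis, which is one way to layer the census figures over the bars.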
Vis 3: Filing Mode
Direct telephone calls have been the primary source of complaints since 2004. Given the number of cellphones in use since 2008, I am not surprised by these results.
# The x axis shows the filing mode, so it is labeled accordingly
p <- ggplot(ccrb_2004, aes(x = `Complaint Filed Mode`)) + geom_bar() + labs(title = "#3 Filing Mode", x = "Filing mode", y = "Number of reports", subtitle = "A comparison of how complaints were filed", caption = "(Source: Civilian Complaint Review Board (CCRB))")
p
Vis 4: Filing modes over time
Where Vis 3 showed that most complaints have been received by phone, this stacked bar chart highlights that the reporting of incidents has been declining over the last decade. Although this image does not provide any new insights, it does reinforce the results identified in Vis 1 & 3. The addition of color helps illustrate the point.
# One bar per year, stacked by filing mode
p <- ggplot(data = ccrb_2004, aes(x = `Received Year`)) + geom_bar(aes(fill = `Complaint Filed Mode`)) + scale_fill_discrete(name = "Reporting type") + labs(title = "#4 Filing Mode over time", x = "Year Complaint Received", y = "Report count", subtitle = "A breakdown of how complaints were filed", caption = "(Source: Civilian Complaint Review Board (CCRB))") + theme_light()
p
Vis 5: Summary of Stop and Frisk Allegations over Time
Stop and frisk has been a very controversial issue over the years. Compared with Vis 2, this chart shows that the occurrence of these reports is consistent with the overall frequency of all reporting. I would have expected a steeper drop-off in these incidents, given that the stop-and-frisk policy was deemed unconstitutional by a federal judge in August 2013. But as the chart shows, the practice was still being employed.
# No fill aesthetic is mapped here, so the unused fill scale has been dropped
p <- ggplot(ccrb_frisk, aes(x = `Received Year`)) + geom_bar() + labs(title = "#5 Stop & Frisk Allegations", x = "Year Complaint Received", y = "Report count", subtitle = "A comparison of Stop & Frisk complaints over time", caption = "(Source: Civilian Complaint Review Board (CCRB))") + theme_light()
p
Vis 6: Allegations by Borough
This visualization illustrates the breakdown of various allegation types for each borough. It shows that ‘Abuse of Authority’ represents the bulk of complaints from all five boroughs. Personally, I was surprised by the consistency of the complaints across the region.
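# position = "fill" rescales each bar to proportions; format the axis as percentages to match the label
p <- ggplot(ccrb_boros, aes(x = `Borough of Occurrence`, fill = `Allegation FADO Type`)) + geom_bar(position = "fill") + scale_y_continuous(labels = scales::percent) + labs(title = "#6 Allegations by Borough", x = "Boroughs", y = "Percentage") + scale_fill_discrete(name = "Allegation Types")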
p
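The proportions drawn by the filled bars in Vis 6 can also be read off as numbers with a cross-tabulation. The sketch below uses a handful of made-up rows rather than ccrb_boros; the vectors boro and fado are hypothetical stand-ins for the `Borough of Occurrence` and `Allegation FADO Type` columns.

```r
# Made-up rows for illustration only (not the real CCRB data).
boro <- c("Bronx", "Bronx", "Bronx", "Bronx",
          "Queens", "Queens", "Queens", "Queens")
fado <- c("Abuse of Authority", "Abuse of Authority", "Force", "Discourtesy",
          "Abuse of Authority", "Abuse of Authority", "Abuse of Authority", "Force")

# Count allegations by borough (rows) and FADO type (columns).
tab <- table(boro, fado)

# margin = 1 normalizes within each row, so every borough sums to 1.
# This is the same quantity that geom_bar(position = "fill") draws.
props <- prop.table(tab, margin = 1)

print(round(props, 2))
```

Comparing the rows of props side by side is a quick way to check how consistent the allegation mix really is across boroughs.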
Vis 7: Presence of video evidence
With the proliferation of cellphones, 2011-2015 saw a rise in the release of supporting video evidence. Although 2016 saw a reduction in this evidence, continued study would be needed to determine whether this is a new trend. It should also be noted that in April 2017, the first group of NYC police officers was issued “body cams”. With these devices in place, we should see a marked increase in video evidence in the coming years.
# Treat the year as a factor so every year gets its own bar; fill by video evidence (FALSE/TRUE)
p <- ggplot(data = ccrb_2004, aes(x = as.factor(`Received Year`))) + geom_bar(aes(fill = `Complaint Has Video Evidence`)) + labs(title = "#7 Presence of Video Evidence",x = "Year Complaint Received", y="") + scale_fill_manual(name = "Video evidence", values=c("#999999", "#E69F00"))
p
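Because the total number of complaints falls over this period (Vis 1), raw counts of video evidence can be hard to compare between years. One way to check the trend, sketched here on made-up rows, is to compute the share of complaints with video in each year. The data frame video_toy and its column names are hypothetical; with the real data the inputs would be `Complaint Has Video Evidence` and `Received Year` from ccrb_2004.

```r
# Made-up rows for illustration only (not the real CCRB data).
video_toy <- data.frame(
  received_year = c(2011, 2011, 2011, 2011, 2015, 2015, 2015, 2015),
  has_video     = c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE)
)

# The mean of a logical vector is the proportion of TRUE values,
# so this gives the share of complaints with video for each year.
video_share <- tapply(video_toy$has_video, video_toy$received_year, mean)

print(video_share)
```

A rising share alongside falling totals would support the cellphone explanation more directly than the stacked counts alone.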
Vis 8: Encounter outcome
This chart summarizes the encounter outcomes of the 203,744 reports from the five boroughs.
p <- ggplot(ccrb_boros, aes(x = `Encounter Outcome`)) + geom_bar() + labs(title = "#8 Encounter outcome",x = "", y="", subtitle = "An illustration of the encounter outcomes")
p
Vis 9: Incident Locations
Another summary, this one indicating where the incidents took place. Here we see that an incident is more likely to be reported on a public street or highway than anywhere else.
# geom_bar() counts rows per category; geom_histogram(stat="count") draws the same bars but warns about ignored binning parameters
ggplot(ccrb_boros, aes(x = `Incident Location`)) + geom_bar() + coord_flip() + labs(title = "#9 Incident Locations", x = "", y="Incidents per location", subtitle = "A general description of the incident location")
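With this many location categories, the chart is easier to read when the bars are sorted by frequency. A minimal sketch of one way to do that is to reorder the factor levels by count before plotting. The vector loc and its category names are made up for illustration; with the real data the input would be ccrb_boros$`Incident Location`.

```r
# Made-up location values for illustration only (not the real CCRB data).
loc <- c("Street/highway", "Street/highway", "Street/highway",
         "Apartment/house", "Apartment/house", "Subway station")

# Count each category and sort ascending by frequency.
counts <- sort(table(loc))

# Rebuild the factor with levels in that order. After coord_flip(),
# the last level (the most frequent category) is drawn at the top.
loc_ordered <- factor(loc, levels = names(counts))

print(levels(loc_ordered))
```

Passing loc_ordered to aes(x = ...) in place of the raw column would then draw the bars in frequency order.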
Vis 10: Incidents by Borough of Occurrence
And finally, here is a box plot comparing the distribution of incident years for each borough.
p <- ggplot(ccrb_boros, aes(y = `Incident Year`, x = `Borough of Occurrence`)) + geom_boxplot() + labs(title = "#10 Incidents across the Boroughs", y = "Incident Years", x = "New York Boroughs") + coord_flip()
p