To perform an EDA on this dataset, I will mostly use the ggplot package. Because the data is mostly categorical, we will use pie charts and bar graphs to show breakdowns of the different categories. We will also use bar graphs to see trends over time, based on the year each complaint was received. We will explore the data from several angles: the types of incidents, the outcomes of the complaints, whether the complaints involve “stop&frisk”, etc. We will examine these not only overall (across the board), but also broken down by borough and by year received, to see whether any trend appears within these groupings. Then we will discuss any major findings we come across.

1- Number of allegations

We have a total of 204,397 allegations in our dataset. Let’s see how they are broken down by borough and how the number of allegations reported has changed over the years.

1.a- By borough

The pie chart shows that more allegations occurred in Brooklyn than in any other borough, followed by the Bronx and then Manhattan.
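As a minimal sketch of how such a pie chart could be drawn with ggplot, the snippet below counts allegations per borough and plots them in polar coordinates. The data frame `allegations`, its `borough` column, and the toy rows are assumptions for illustration only, not the actual dataset or its schema.

```r
library(ggplot2)

# Hypothetical toy data: one row per allegation. The column name `borough`
# is an assumption, not the dataset's actual schema.
allegations <- data.frame(
  borough = c("Brooklyn", "Brooklyn", "Brooklyn", "Bronx", "Bronx", "Manhattan")
)

# Count allegations per borough and convert the counts to shares.
borough_counts <- as.data.frame(table(borough = allegations$borough))
borough_counts$share <- borough_counts$Freq / sum(borough_counts$Freq)

# A pie chart in ggplot2 is a single stacked bar drawn in polar coordinates.
pie_chart <- ggplot(borough_counts, aes(x = "", y = Freq, fill = borough)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  labs(x = NULL, y = NULL, fill = "Borough", title = "Allegations by borough") +
  theme_void()

print(pie_chart)
```

Computing `borough_counts` separately keeps the numbers behind the chart available for inspection, which helps when slices are close in size.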

1.b- By year received

The bar chart shows a huge increase in complaints received between 2005 and 2006. Since then, the number of complaints has gradually decreased.
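A yearly bar chart of this kind might be built as in the sketch below. The column name `year_received` and the toy values are assumptions made for illustration.

```r
library(ggplot2)

# Hypothetical toy data; `year_received` is an assumed column name.
allegations <- data.frame(
  year_received = c(2005, 2006, 2006, 2006, 2007, 2007)
)

# Tabulate the counts to check the numbers behind the plot.
yearly_counts <- table(allegations$year_received)

# geom_bar() counts rows per x value; factor() keeps each year as its own bar.
year_chart <- ggplot(allegations, aes(x = factor(year_received))) +
  geom_bar() +
  labs(x = "Year received", y = "Number of allegations")

print(year_chart)
```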

2- Incidents breakdown

In this section, we look at how many allegations fall under each type of incident in the FADO categories: force, abuse of authority, discourtesy, and offensive language. We first look at the big picture, then break it down by different categories.

2.a- Overall

Almost half of all reported incidents are abuse of authority. Force and discourtesy follow at a distance, and there are very few reported cases of offensive language.

2.b- By borough

The breakdown of the types of allegations reported is similar across all the boroughs.
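One way to compare breakdowns across groups is a stacked bar chart rescaled to proportions, so that boroughs with different totals can be compared directly. The sketch below assumes hypothetical column names `borough` and `fado_type` and uses toy rows. The same pattern would apply to the by-location and by-year breakdowns by changing the `x` variable.

```r
library(ggplot2)

# Hypothetical toy data; `borough` and `fado_type` are assumed column names.
allegations <- data.frame(
  borough   = c("Brooklyn", "Brooklyn", "Brooklyn", "Brooklyn",
                "Bronx", "Bronx", "Bronx", "Bronx"),
  fado_type = c("Abuse of Authority", "Abuse of Authority", "Force", "Discourtesy",
                "Abuse of Authority", "Abuse of Authority", "Force", "Discourtesy")
)

# Row-wise proportions: the share of each allegation type within each borough.
type_shares <- prop.table(table(allegations$borough, allegations$fado_type), margin = 1)

# position = "fill" rescales every bar to height 1, so bars show shares, not counts.
share_chart <- ggplot(allegations, aes(x = borough, fill = fado_type)) +
  geom_bar(position = "fill") +
  labs(x = "Borough", y = "Share of allegations", fill = "Allegation type")

print(share_chart)
```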

2.c- By incident location

Most of the incidents reported happened on the street or highway.

2.d- By year received

The breakdown of reported incidents by allegation type follows a similar pattern each year: mostly abuse of authority, then force and discourtesy.

3- Encounter outcome

In the third section, we look at the outcome of all the incidents reported in our dataset. The different outcomes are arrest, no arrest or summons, summons, and other. We start with an overall visualization, then take a deeper look by borough, incident location, incident type, and year to see if we can identify any pattern.

3.a- Overall

Most allegations end in either an arrest or no arrest or summons. A summons is issued in approximately 15-20% of cases.

3.b- By borough

Here too, the breakdown of outcomes is similar across all the boroughs.

3.c- By incident location

The outcome breakdown by location is fairly even as well, with one exception: events occurring in a police building tend to result in more arrests.

3.d- By allegation type

Broken down by incident type, force allegations seem to result in more arrests, while the other types mostly result in no arrest or summons.

3.e- By year received

The outcome breakdown seems to be the same throughout the years.

4- Complaint of stop and frisk

In this section, I am particularly interested in the “Stop&Frisk” incidents because they could potentially be an angle to work on during the hypothesis testing part of a project. It would be interesting to see if there is any distinctive pattern that we can identify. Therefore, we look at the number of incidents that include “Stop&Frisk” and see how that compares with the number of “Stop&Frisk” incidents reported under more specific circumstances (by borough or by year).

4.a- Overall

More than half of the incidents reported do not include any case of “Stop and Frisk”.

4.b- By borough

There seems to be no notable difference between boroughs in the share of reported incidents that involve “Stop and Frisk”.

4.c- By year received

Similarly, the share of reported incidents that involve “Stop and Frisk” does not appear to change over these 10 years.

5- Number of allegations by incident location

Let us take a look at the incident locations in order to see if there is a pattern in where most of them happen.

We see that most incidents happened on the street or highway. The distant second most common location is the apartment.

6- Complaint filing mode

Let’s take a look at people’s preferred way of reporting an incident.

6.a- Overall

Most complaints are filed by phone, followed by the call processing system.

6.b- By year received

We notice that the phone remains the preferred method of submitting complaints throughout the period. This is a bit of a surprise, as we would expect more complaints to be submitted online as the years go by and technology makes online submissions more convenient.

EDA Summary

At the end of our exploratory data analysis, we identified a few trends. First, there is the sharp increase in the total number of allegations from 2005 to 2006, which could be investigated further. Another is the notably higher number of complaints coming from Brooklyn. It would be a good idea to dive into it and see why that is the case. We also notice that there have not been any major changes in the breakdown of incidents, incident locations, or filing mode over the course of a decade. This EDA enables us to develop several ideas for further hypothesis tests, which could confirm the statistical significance of some of the trends (or lack thereof) that we identified from our visualizations.
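As one possible starting point for such a follow-up, a chi-squared test of independence can check whether two categorical variables are associated. The sketch below is only an illustration on toy data: the column names `borough` and `stop_and_frisk` and all counts are assumptions, and the choice of test is a suggestion, not a method this analysis has applied.

```r
# Hypothetical toy data; the column names and values are assumptions for illustration.
allegations <- data.frame(
  borough = rep(c("Brooklyn", "Bronx"), each = 40),
  stop_and_frisk = rep(c("Yes", "No", "Yes", "No"), times = c(15, 25, 15, 25))
)

# Contingency table of borough against the stop-and-frisk flag.
contingency <- table(allegations$borough, allegations$stop_and_frisk)

# Pearson's chi-squared test of independence (base R, stats package).
result <- chisq.test(contingency)
print(result$p.value)
```

A large p-value would be consistent with the visual impression of no difference between boroughs, while a small one would suggest the breakdown is worth a closer look.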

Problem Set 4 - Exploratory Data Analysis

Ahien C. Djouka

10/9/2017