We want to answer a general question: what happens when there is a spike in arrests by the NYPD?
We will start with the NYPD Arrests Data (Historic) dataset from NYC Open Data and conduct exploratory data analysis to find out how arrests are distributed in general. We will explore trends such as seasonality, trends in particular kinds of arrest, and trends by borough.
To dig into the research question further, we will include data from the Civilian Complaint Review Board. We will look at the number and types of complaints received and see if there is any link to the number of arrests by the NYPD during the period of study.
[Optional] We will then supplement our study with sentiment analysis based on data extracted via the Twitter API. We will search for NYPD-related tweets for the study period of interest (specify years), score each tweet for sentiment, and match the scores against the number of arrests.
Overall, through the exploratory analysis we hope to uncover what happens when there is a spike in arrests by the NYPD, and whether civilian discontent follows, as expressed by complaints filed against the police force. We will support this with the following hypothesis test:
\[H_0: \text{There is no change in public discontent following NYPD arrests}\]
\[H_A: \text{There is a change in public discontent following NYPD arrests}\]
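One possible way to make this test concrete, sketched here under assumptions the proposal does not fix, is to treat yearly complaint counts as the measure of discontent and test whether arrests in one year are correlated with complaints in the following year. The one-year lag, the use of a Pearson correlation test, and all the numbers below are illustrative assumptions, not results.

```r
# Made-up yearly counts, for illustration only (not real NYPD or CCRB figures).
yearly <- data.frame(
  year         = 2010:2017,
  n_arrests    = c(100, 120, 90, 150, 130, 110, 160, 140),
  n_complaints = c(10, 12, 14, 9, 18, 15, 12, 19)
)

# Pair arrests in year t with complaints in year t + 1.
# The one-year lag is an assumption about what "following" means.
arrests_t       <- head(yearly$n_arrests, -1)
complaints_next <- tail(yearly$n_complaints, -1)

# H0: no linear association between arrests and next-year complaints.
lag_test <- cor.test(arrests_t, complaints_next)
lag_test$estimate  # sample correlation
lag_test$p.value   # compare against the chosen significance level
```

With only yearly data the sample is small, so a test like this would have limited power; it is meant as a starting point rather than the final design.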
Our analysis will center on two datasets.
List of every arrest in NYC going back to 2006. Each record represents an arrest effected in NYC by the NYPD and includes information about the type of crime, suspect demographic, the location and time of enforcement.
This is a breakdown of every arrest effected in NYC by the NYPD going back to 2006 through the end of the previous calendar year. The data is manually extracted every quarter and reviewed by the Office of Management Analysis and Planning before being posted on the NYPD website. Each record represents an arrest effected in NYC by the NYPD and includes information about the type of crime, the location and time of enforcement. Information related to suspect demographics is also included. This data can be used by the public to explore the nature of police enforcement activity. Please refer to the attached data footnotes for additional information about this dataset. The dataset has about 4.8M rows and 18 columns.
We will access it via the Socrata API as shown below. The preview below is a subset of the NYPD data from 2019. The full dataset is very large, so we will use a downsampled version for our analysis.
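## install.packages("RSocrata")
library("RSocrata")
# Socrata endpoints: year-to-date arrests (previewed here) and the full historic dataset
year_to_date <- "https://data.cityofnewyork.us/resource/uip8-fykc.json"
nypd_fullset <- "https://data.cityofnewyork.us/resource/8h9b-rp9u.json"
# app_token must already hold a Socrata app token; showtable() is a display helper not defined in this section
nypd <- read.socrata(year_to_date, app_token = app_token)
showtable(nypd, "NYPD Data", 10)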
| arrest_key | arrest_date | pd_cd | pd_desc | ky_cd | ofns_desc | law_code | law_cat_cd | arrest_boro | arrest_precinct | jurisdiction_code | age_group | perp_sex | perp_race | x_coord_cd | y_coord_cd | latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 203078287 | 2019-09-30 | 397 | ROBBERY,OPEN AREA UNCLASSIFIED | 105.0 | ROBBERY | PL 1600500 | F | M | 9 | 0 | 25-44 | M | BLACK HISPANIC | 990563 | 203120 | 40.72420015400007 | -73.97722564299994 |
| 203072424 | 2019-09-30 | 268 | CRIMINAL MIS 2 & 3 | 121.0 | CRIMINAL MISCHIEF & RELATED OF | PL 1450502 | F | Q | 113 | 0 | 18-24 | M | BLACK | 1040611 | 190715 | 40.68997415500007 | -73.79676854399997 |
| 203061215 | 2019-09-30 | 905 | INTOXICATED DRIVING,ALCOHOL | 347.0 | INTOXICATED & IMPAIRED DRIVING | VTL11920U3 | M | S | 122 | 0 | 18-24 | M | WHITE | 962989 | 160112 | 40.60612948000005 | -74.07657042999993 |
| 203061218 | 2019-09-30 | 397 | ROBBERY,OPEN AREA UNCLASSIFIED | 105.0 | ROBBERY | PL 1601502 | F | S | 120 | 0 | 25-44 | M | WHITE HISPANIC | 962822 | 174282 | 40.645022746000045 | -74.077216847 |
| 203063729 | 2019-09-30 | 397 | ROBBERY,OPEN AREA UNCLASSIFIED | 105.0 | ROBBERY | PL 1601001 | F | Q | 106 | 0 | 18-24 | M | ASIAN / PACIFIC ISLANDER | 1035521 | 188624 | 40.68426580100004 | -73.81513849899994 |
| 203066320 | 2019-09-30 | 109 | ASSAULT 2,1,UNCLASSIFIED | 106.0 | FELONY ASSAULT | PL 1200502 | F | B | 47 | 0 | 25-44 | M | BLACK | 1021271 | 257055 | 40.87216165000007 | -73.86614123499999 |
| 203066328 | 2019-09-30 | 779 | PUBLIC ADMINISTRATION,UNCLASSI | 126.0 | MISCELLANEOUS PENAL LAW | PL 215510B | F | K | 63 | 0 | 45-64 | M | BLACK | 1000521 | 168264 | 40.62851560000007 | -73.94138369799998 |
| 203068778 | 2019-09-30 | 922 | TRAFFIC,UNCLASSIFIED MISDEMEAN | 348.0 | VEHICLE AND TRAFFIC LAWS | VTL0511001 | M | K | 70 | 0 | 45-64 | M | BLACK | 995070 | 176121 | 40.65008972100002 | -73.96100881399997 |
| 203070783 | 2019-09-30 | 175 | SEXUAL ABUSE 3,2 | 233.0 | SEX CRIMES | PL 13052A1 | M | M | 25 | 0 | 45-64 | M | BLACK | 1000555 | 230994 | 40.800694331000045 | -73.94110928599997 |
| 203072401 | 2019-09-30 | 101 | ASSAULT 3 | 344.0 | ASSAULT 3 & RELATED OFFENSES | PL 1200001 | M | B | 43 | 0 | 25-44 | F | BLACK | 1022817 | 238673 | 40.82170190800008 | -73.86065695699995 |
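The downsampling mentioned above could be done in several ways. A minimal sketch, assuming a simple random sample of rows is acceptable, is shown here on a toy stand-in for the arrests table. The 10% fraction, the seed, and the toy values are all assumptions made for illustration.

```r
# Toy stand-in for the arrests table; values are illustrative, not real records.
set.seed(607)  # fixed seed so the sample is reproducible
arrests <- data.frame(
  arrest_key  = 1:1000,
  arrest_boro = sample(c("B", "K", "M", "Q", "S"), 1000, replace = TRUE)
)

# Keep a simple random sample of rows without replacement.
# The 10% fraction is an assumption, not a value from the proposal.
sample_frac   <- 0.10
keep          <- sample(nrow(arrests), size = round(sample_frac * nrow(arrests)))
arrests_small <- arrests[keep, ]

nrow(arrests_small)
```

A simple random sample keeps the borough and date mix roughly proportional to the full data; if some years or boroughs are sparse, sampling within each group would be a safer choice.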
List of complaints against New York City police officers alleging the use of excessive or unnecessary force, abuse of authority, discourtesy, or the use of offensive language.
Civilian Complaint Review Board (CCRB) - Complaints Received
The New York City Civilian Complaint Review Board (CCRB) is an independent agency. It is empowered to receive, investigate, mediate, hear, make findings, and recommend action on complaints against New York City police officers alleging the use of excessive or unnecessary force, abuse of authority, discourtesy, or the use of offensive language. The Board's investigative staff, composed entirely of civilian employees, conducts investigations in an impartial fashion. The Board forwards its findings to the police commissioner. This dataset has about 240K rows and 10 columns.
library(readr)
ccrb <- read_csv("https://raw.githubusercontent.com/dhairavc/DATA607-FinalProject/master/ccrb.csv", col_names = TRUE)
showtable(ccrb, "CCRB Data", 10)
| Extract Run Date | Randomized Id | CCRB Received Year | Days Between Incident Date and Received Date | Case Type | Complaint Received Place | Complaint Received Mode | Borough Of Incident | Patrol Borough Of Incident | Reason For Initial Contact |
|---|---|---|---|---|---|---|---|---|---|
| 05/25/2018 | 1 | 2000 | 2 | IAB | CCRB | Phone | Bronx | Bronx | PD suspected C/V of violation/crime - street |
| 05/25/2018 | 2 | 2000 | 86 | OCD | Other NYPD unit | In-person | Brooklyn | Brooklyn North | Parking violation |
| 05/25/2018 | 3 | 2000 | 0 | OCD | CCRB | Phone | Queens | Other | NA |
| 05/25/2018 | 4 | 2000 | 0 | OCD | Other NYPD unit | Phone | Bronx | Bronx | NA |
| 05/25/2018 | 5 | 2000 | 117 | OCD | CCRB | Phone | Manhattan | Manhattan North | Traffic accident |
| 05/25/2018 | 6 | 2000 | 1 | CCRB | CCRB | Phone | Brooklyn | Brooklyn South | Other |
| 05/25/2018 | 7 | 2000 | 15 | OCD | CCRB | Phone | Brooklyn | Brooklyn South | Other violation of VTL |
| 05/25/2018 | 8 | 2000 | 10 | CCRB | IAB | Phone | Bronx | Bronx | PD suspected C/V of violation/crime - street |
| 05/25/2018 | 9 | 2000 | 11 | CCRB | IAB | Phone | Brooklyn | Brooklyn North | NA |
| 05/25/2018 | 10 | 2000 | 6 | CCRB | Precinct | In-person | Brooklyn | Brooklyn North | PD suspected C/V of violation/crime - auto |
We will look at:
- Trends in seasonality of arrests
- Trends in types of arrests
- Trends in location of arrests
- Geographical distribution of arrests in NYC
- Trends in types of complaints
- Trends in reporting method of complaints
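As a rough illustration of the first items in this list, seasonality and location trends can be explored by tabulating arrests by month and by borough. The sketch below uses the `arrest_date` and `arrest_boro` column names from the arrests preview, but the rows themselves are toy values invented for the example.

```r
# Toy rows using the arrests column names; dates and boroughs are made up.
arrests <- data.frame(
  arrest_date = as.Date(c("2019-01-05", "2019-01-20", "2019-02-11",
                          "2019-07-04", "2019-07-15", "2019-07-30")),
  arrest_boro = c("M", "K", "M", "Q", "M", "K")
)

# Extract the calendar month as a two-digit label ("01" to "12").
arrests$month <- format(arrests$arrest_date, "%m")

# Arrest counts per month (seasonality) and per month by borough (location).
by_month      <- table(arrests$month)
by_month_boro <- table(arrests$month, arrests$arrest_boro)

by_month
by_month_boro
```

The same pattern applies to the complaints table, for example tabulating the `Complaint Received Mode` column to look at reporting methods.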
Our two datasets share two main variables: date and borough of incident. The date variable differs slightly between datasets. The NYPD data uses the MM/DD/YYYY format and ranges from 01/01/2006 to 12/31/2018. The CCRB data provides only the year, in YYYY format, ranging from 2000 to 2017. We will need to split the arrest date variable to extract the year in order to join the datasets on year.
In order to combine data from the two sources, we will need to aggregate the variables by year for ease of comparison.
We will try to build a model to predict crime rates and the subsequent effect on complaints filed against the NYPD. We will model this using linear regression and possibly other methods.
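The borough variable also needs reconciling before it can serve as a join key: the arrests preview codes `arrest_boro` as single letters, while the CCRB table spells out borough names. A minimal sketch of the recode follows, assuming the usual letter-to-borough mapping for the NYPD data (worth checking against the dataset's data dictionary).

```r
# Assumed mapping from NYPD one-letter borough codes to borough names.
boro_lookup <- c(
  B = "Bronx",
  K = "Brooklyn",
  M = "Manhattan",
  Q = "Queens",
  S = "Staten Island"
)

# Example codes as they appear in the arrest_boro column.
arrest_boro <- c("M", "Q", "S", "B", "K")

# Look up each code; a code missing from the lookup would become NA.
borough <- unname(boro_lookup[arrest_boro])
borough
```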
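The year extraction, aggregation, and join described above might look like the following sketch in base R. The toy rows are invented for illustration, `count_by_year` is a hypothetical helper, and an inner join is assumed so that only years present in both sources are kept.

```r
# Toy inputs; dates and years are made up for illustration.
arrests <- data.frame(
  arrest_date = c("01/15/2016", "06/30/2016", "03/02/2017",
                  "11/20/2017", "12/31/2017"),
  stringsAsFactors = FALSE
)
ccrb <- data.frame(
  "CCRB Received Year" = c(2016, 2016, 2016, 2017),
  check.names = FALSE
)

# Parse MM/DD/YYYY dates and keep only the year.
arrest_year <- as.integer(format(as.Date(arrests$arrest_date, format = "%m/%d/%Y"), "%Y"))

# Hypothetical helper: count rows per year and name the count column.
count_by_year <- function(years, count_name) {
  counts <- table(years)
  out <- data.frame(year = as.integer(names(counts)), n = as.vector(counts))
  names(out)[2] <- count_name
  out
}

arrests_by_year    <- count_by_year(arrest_year, "n_arrests")
complaints_by_year <- count_by_year(ccrb[["CCRB Received Year"]], "n_complaints")

# Inner join on year: only years present in both tables survive.
yearly <- merge(arrests_by_year, complaints_by_year, by = "year")
yearly
```

Given the date ranges stated above, an inner join on the real data would keep only the years where the two sources overlap.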
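A minimal sketch of the linear regression step is shown below, assuming a yearly table with one row per year and regressing complaint counts on arrest counts. The model specification and the numbers are illustrative assumptions; the proposal does not commit to a particular formula.

```r
# Made-up yearly counts, for illustration only (not real figures).
yearly <- data.frame(
  year         = 2013:2017,
  n_arrests    = c(100, 120, 90, 150, 130),
  n_complaints = c(12, 15, 10, 19, 16)
)

# Simple linear model: complaints as a function of arrests.
fit <- lm(n_complaints ~ n_arrests, data = yearly)

# Slope: estimated change in complaints per additional arrest.
coef(fit)

# Predicted complaints for a hypothetical year with 140 arrests.
predict(fit, newdata = data.frame(n_arrests = 140))
```

With so few yearly observations the fit is only indicative; `summary(fit)` reports the standard errors needed to judge how much weight the slope can bear.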
The diagram below is a representation of our overall methodology. Two factors were determinant in how the team approached this project;
So we set a preliminary timeline with the required tasks to keep the team on track and make it easy for us to track progress.