Motivation

We want to answer a general question: what happens when there is a spike in arrests by the NYPD?

We will start with the NYPD Arrests Data (Historic) dataset from NYC Open Data and conduct exploratory data analysis to find out how arrests are distributed in general. We will explore trends such as seasonality, and trends in particular kinds of arrest or by borough.

To dig into the research question further, we will include data from the Civilian Complaint Review Board. We will look at the number and types of complaints received and see if there is any link to the number of arrests by the NYPD during the period of study.

[Optional] We will then supplement our study with sentiment analysis based on data extracted via the Twitter API. We will search for NYPD-related tweets from the study period of interest (specify years), score each tweet for sentiment, and match the scores against the number of arrests.

Overall, through this exploratory analysis we hope to uncover what happens when there is a spike in arrests by the NYPD, and whether civilian discontent follows, as expressed by complaints filed against the police force. We will support this with the following hypothesis test:

\[H_0: \text{There is no change in public discontent following NYPD arrests}\]
\[H_A: \text{There is a change in public discontent following NYPD arrests}\]

Datasets

Our analysis will draw on two datasets.

NYPD Arrests

List of every arrest in NYC going back to 2006. Each record represents an arrest effected in NYC by the NYPD and includes information about the type of crime, suspect demographic, the location and time of enforcement.

NYPD Arrests Data Historic

Dataset description

This is a breakdown of every arrest effected in NYC by the NYPD from 2006 through the end of the previous calendar year. The data is manually extracted every quarter and reviewed by the Office of Management Analysis and Planning before being posted on the NYPD website. Each record represents an arrest effected in NYC by the NYPD and includes information about the type of crime, the location and time of enforcement, and suspect demographics. This data can be used by the public to explore the nature of police enforcement activity. Please refer to the attached data footnotes for additional information about this dataset. The dataset has about 4.8M rows and 18 columns.

We will access it via the Socrata API as shown below. The table below shows a subset of the NYPD data from 2019. The full dataset is very large, so we will use a downsampled version for our analysis.

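## install.packages("RSocrata")
library("RSocrata")

# Assumption: the Socrata app token is kept in an environment variable
# (the variable name SOCRATA_APP_TOKEN is hypothetical).
app_token <- Sys.getenv("SOCRATA_APP_TOKEN")

# Endpoints for the year-to-date data and the full historic dataset
year_to_date <- "https://data.cityofnewyork.us/resource/uip8-fykc.json"
nypd_fullset <- "https://data.cityofnewyork.us/resource/8h9b-rp9u.json"
nypd <- read.socrata(year_to_date, app_token = app_token)
# showtable() is assumed to be a project helper defined elsewhere; it is not part of RSocrata
showtable(nypd, "NYPD Data", 10)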
NYPD Data
arrest_key | arrest_date | pd_cd | pd_desc | ky_cd | ofns_desc | law_code | law_cat_cd | arrest_boro | arrest_precinct | jurisdiction_code | age_group | perp_sex | perp_race | x_coord_cd | y_coord_cd | latitude | longitude
203078287 | 2019-09-30 | 397 | ROBBERY,OPEN AREA UNCLASSIFIED | 105.0 | ROBBERY | PL 1600500 | F | M | 9 | 0 | 25-44 | M | BLACK HISPANIC | 990563 | 203120 | 40.72420015400007 | -73.97722564299994
203072424 | 2019-09-30 | 268 | CRIMINAL MIS 2 & 3 | 121.0 | CRIMINAL MISCHIEF & RELATED OF | PL 1450502 | F | Q | 113 | 0 | 18-24 | M | BLACK | 1040611 | 190715 | 40.68997415500007 | -73.79676854399997
203061215 | 2019-09-30 | 905 | INTOXICATED DRIVING,ALCOHOL | 347.0 | INTOXICATED & IMPAIRED DRIVING | VTL11920U3 | M | S | 122 | 0 | 18-24 | M | WHITE | 962989 | 160112 | 40.60612948000005 | -74.07657042999993
203061218 | 2019-09-30 | 397 | ROBBERY,OPEN AREA UNCLASSIFIED | 105.0 | ROBBERY | PL 1601502 | F | S | 120 | 0 | 25-44 | M | WHITE HISPANIC | 962822 | 174282 | 40.645022746000045 | -74.077216847
203063729 | 2019-09-30 | 397 | ROBBERY,OPEN AREA UNCLASSIFIED | 105.0 | ROBBERY | PL 1601001 | F | Q | 106 | 0 | 18-24 | M | ASIAN / PACIFIC ISLANDER | 1035521 | 188624 | 40.68426580100004 | -73.81513849899994
203066320 | 2019-09-30 | 109 | ASSAULT 2,1,UNCLASSIFIED | 106.0 | FELONY ASSAULT | PL 1200502 | F | B | 47 | 0 | 25-44 | M | BLACK | 1021271 | 257055 | 40.87216165000007 | -73.86614123499999
203066328 | 2019-09-30 | 779 | PUBLIC ADMINISTRATION,UNCLASSI | 126.0 | MISCELLANEOUS PENAL LAW | PL 215510B | F | K | 63 | 0 | 45-64 | M | BLACK | 1000521 | 168264 | 40.62851560000007 | -73.94138369799998
203068778 | 2019-09-30 | 922 | TRAFFIC,UNCLASSIFIED MISDEMEAN | 348.0 | VEHICLE AND TRAFFIC LAWS | VTL0511001 | M | K | 70 | 0 | 45-64 | M | BLACK | 995070 | 176121 | 40.65008972100002 | -73.96100881399997
203070783 | 2019-09-30 | 175 | SEXUAL ABUSE 3,2 | 233.0 | SEX CRIMES | PL 13052A1 | M | M | 25 | 0 | 45-64 | M | BLACK | 1000555 | 230994 | 40.800694331000045 | -73.94110928599997
203072401 | 2019-09-30 | 101 | ASSAULT 3 | 344.0 | ASSAULT 3 & RELATED OFFENSES | PL 1200001 | M | B | 43 | 0 | 25-44 | F | BLACK | 1022817 | 238673 | 40.82170190800008 | -73.86065695699995
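
The downsampling step mentioned above could look like the following minimal sketch. It uses a toy data frame in place of the real arrests table, and the 10% sampling fraction, the seed, and the object names are assumptions made for illustration, not choices fixed by this proposal.

```r
# Toy stand-in for the arrests table; the real data has about 4.8M rows.
set.seed(607)  # fixed seed so the random sample is reproducible
arrests <- data.frame(
  arrest_key  = 1:1000,
  arrest_boro = sample(c("B", "K", "M", "Q", "S"), 1000, replace = TRUE)
)

# Keep a simple random 10% of the rows (the fraction is an assumption).
sample_frac <- 0.10
keep <- sample(nrow(arrests), size = round(sample_frac * nrow(arrests)))
arrests_small <- arrests[keep, ]

nrow(arrests_small)
```

Simple random sampling keeps the borough and offense proportions roughly intact on average. If rare offense types matter, sampling within groups may be a better fit.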

Civilian Complaint Review Board (CCRB)

List of complaints against New York City police officers alleging the use of excessive or unnecessary force, abuse of authority, discourtesy, or the use of offensive language.

Civilian Complaint Review Board (CCRB) - Complaints Received

Dataset description

The New York City Civilian Complaint Review Board (CCRB) is an independent agency. It is empowered to receive, investigate, mediate, hear, make findings, and recommend action on complaints against New York City police officers alleging the use of excessive or unnecessary force, abuse of authority, discourtesy, or the use of offensive language. The Board's investigative staff, composed entirely of civilian employees, conducts investigations in an impartial fashion. The Board forwards its findings to the police commissioner. This dataset has about 240K rows and 10 columns.

library(readr)
ccrb <- read_csv("https://raw.githubusercontent.com/dhairavc/DATA607-FinalProject/master/ccrb.csv", col_names = TRUE)
showtable(ccrb, "CCRB Data", 10)
CCRB Data
Extract Run Date | Randomized Id | CCRB Received Year | Days Between Incident Date and Received Date | Case Type | Complaint Received Place | Complaint Received Mode | Borough Of Incident | Patrol Borough Of Incident | Reason For Initial Contact
05/25/2018 | 1 | 2000 | 2 | IAB | CCRB | Phone | Bronx | Bronx | PD suspected C/V of violation/crime - street
05/25/2018 | 2 | 2000 | 86 | OCD | Other NYPD unit | In-person | Brooklyn | Brooklyn North | Parking violation
05/25/2018 | 3 | 2000 | 0 | OCD | CCRB | Phone | Queens | Other | NA
05/25/2018 | 4 | 2000 | 0 | OCD | Other NYPD unit | Phone | Bronx | Bronx | NA
05/25/2018 | 5 | 2000 | 117 | OCD | CCRB | Phone | Manhattan | Manhattan North | Traffic accident
05/25/2018 | 6 | 2000 | 1 | CCRB | CCRB | Phone | Brooklyn | Brooklyn South | Other
05/25/2018 | 7 | 2000 | 15 | OCD | CCRB | Phone | Brooklyn | Brooklyn South | Other violation of VTL
05/25/2018 | 8 | 2000 | 10 | CCRB | IAB | Phone | Bronx | Bronx | PD suspected C/V of violation/crime - street
05/25/2018 | 9 | 2000 | 11 | CCRB | IAB | Phone | Brooklyn | Brooklyn North | NA
05/25/2018 | 10 | 2000 | 6 | CCRB | Precinct | In-person | Brooklyn | Brooklyn North | PD suspected C/V of violation/crime - auto

Exploratory Data Analysis

We will look at:
- Trends in seasonality of arrests
- Trends in types of arrests
- Trends in location of arrests
- Geographical distribution of arrests in NYC
- Trends in types of complaints
- Trends in reporting method of complaints
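
As an illustration of the first item, the sketch below counts arrests per calendar month, which is one simple way to look for seasonality. The dates are toy values standing in for the arrest_date column, and the object names are hypothetical.

```r
# Toy arrest dates standing in for the arrest_date column.
arrest_date <- as.Date(c("2019-01-05", "2019-01-20", "2019-02-11",
                         "2019-07-04", "2019-07-15", "2019-07-30"))

# Month number (01-12) as a factor, so months with no arrests still show 0.
arrest_month <- factor(format(arrest_date, "%m"),
                       levels = sprintf("%02d", 1:12))

# Number of arrests in each calendar month.
monthly_counts <- table(arrest_month)
monthly_counts
```

The same pattern applies to the other items in the list: replacing the month factor with the offense description, borough, or complaint mode gives the corresponding frequency table.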

Data Transformation

Our two datasets share two main variables: the date and the borough of the incident. The date variable differs between the datasets. The NYPD data uses the MM/DD/YYYY format and ranges from 01/01/2006 to 12/31/2018, while the CCRB data provides only the year (YYYY), ranging from 2000 to 2017. We will need to extract the year from the NYPD date in order to join the datasets on year.

In order to combine data from the two sources, we will need to aggregate the variables by year for ease of comparison.
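
A minimal sketch of this transformation, using toy data frames in base R, is shown below. The simplified column names (year, borough, n) are assumptions for illustration, and the sketch also translates the one-letter arrest_boro codes into the borough names used by the CCRB data so that the borough key lines up.

```r
# Toy arrests in the MM/DD/YYYY format described above.
arrests <- data.frame(
  arrest_date = c("01/15/2016", "06/30/2016", "03/02/2017", "11/20/2017"),
  arrest_boro = c("K", "K", "M", "K"),
  stringsAsFactors = FALSE
)
# Toy complaints; column names are simplified stand-ins for the CCRB fields.
ccrb <- data.frame(
  year    = c(2016L, 2016L, 2017L, 2017L, 2017L),
  borough = c("Brooklyn", "Manhattan", "Brooklyn", "Brooklyn", "Manhattan"),
  stringsAsFactors = FALSE
)

# Split the date: parse it, then keep only the year.
arrests$year <- as.integer(format(as.Date(arrests$arrest_date, "%m/%d/%Y"), "%Y"))

# Translate the one-letter borough codes to full borough names.
boro_names <- c(B = "Bronx", K = "Brooklyn", M = "Manhattan",
                Q = "Queens", S = "Staten Island")
arrests$borough <- unname(boro_names[arrests$arrest_boro])

# Count rows per year and borough in each source.
arrests$n <- 1
ccrb$n <- 1
arrests_by    <- aggregate(n ~ year + borough, data = arrests, FUN = sum)
complaints_by <- aggregate(n ~ year + borough, data = ccrb, FUN = sum)
names(arrests_by)[3]    <- "arrests"
names(complaints_by)[3] <- "complaints"

# Inner join on the shared keys: only year/borough pairs present in both remain.
combined <- merge(arrests_by, complaints_by, by = c("year", "borough"))
combined
```

Because the join is an inner join, only the years covered by both sources survive, which matters here since the two date ranges overlap only partially.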

Data Analysis

We will try to build a model to predict crime rates and the subsequent effect on complaints filed against the NYPD. Linear regression will be our starting point, and we may consider other methods as needed.
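
As a rough sketch of the linear regression step, the example below fits complaints as a linear function of arrests. The numbers are synthetic, generated from a known line plus small deterministic noise; they are not real NYPD or CCRB figures. Regressing complaints on arrests is one possible specification, assumed here for illustration.

```r
# Synthetic yearly totals built from a known line (slope 0.5) plus small noise.
# These are NOT real NYPD or CCRB figures.
arrests    <- seq(100, by = 10, length.out = 12)
complaints <- 20 + 0.5 * arrests + rep(c(1, -1), times = 6)
yearly <- data.frame(arrests, complaints)

# Fit complaints as a linear function of arrests.
fit <- lm(complaints ~ arrests, data = yearly)

# Estimated slope, and the p-value of its t-test (null: slope equals zero).
slope   <- coef(fit)[["arrests"]]
p_value <- summary(fit)$coefficients["arrests", "Pr(>|t|)"]
```

A small p-value for the slope would count as evidence against the null hypothesis of no change in public discontent, under the usual linear model assumptions. With only a handful of overlapping years, any such result should be read cautiously.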

Workflow scheme

The diagram below is a representation of our overall methodology. Two factors were determinant in how the team approached this project;

So we set a preliminary timeline with the required tasks to keep the team on track and make progress easy to monitor.