Introduction

Stop and frisk (called “stop, question, and frisk” by the NYPD) has been a controversial policing tactic. In Floyd v. City of New York, decided in 2013, the plaintiffs alleged that the program was unconstitutional and discriminated based on race. Eventually, the city moved ahead with an overhaul of its stop-and-frisk practices. In this project, I investigate whether those changes truly led the NYPD to enforce the law equally across races.

Data

Research Question

From the self-reported NYPD data, is there evidence of racialized policing? We will use a significance level of \(\alpha = .01\).

\(H_0:\) The chance of being stopped by the police is independent of a person’s race.
\(H_A:\) The chance of being stopped by the police is not independent of a person’s race.

Observations

I downloaded the 2016 arrest data from NYC Open Data, as it was the most recent data available. It contains the records of 1.7 million arrests, roughly one for every five New Yorkers (the city’s 2010 census population was about 8.2 million). At 260 MB, the original file was far too large to read into memory at once. Additionally, the \(X^2\) statistic grows with sample size, so a very large sample would make even trivial differences look significant. Instead, I used the GNU coreutils utility shuf to take a simple random sample of 300 arrests with the line of code below.

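# keep the CSV header row, then sample 300 data rows (plain shuf would shuffle the header in with the data)
(head -n 1 big-sqf-2016.csv; tail -n +2 big-sqf-2016.csv | shuf -n 300) > sqf-2016.csv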
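shuf draws the sample in a single streaming pass. The sketch below shows the same idea in Python using reservoir sampling (Algorithm R). This is a swapped-in illustration, not a claim about how shuf is implemented internally. It keeps the CSV header out of the draw, and the function names `sample_csv_lines` and `write_sample` are hypothetical.

```python
import random


def sample_csv_lines(path, k, seed=None):
    """Return (header, sample): the header line plus k data lines drawn
    uniformly at random, using reservoir sampling (Algorithm R).

    Only k lines are held in memory, however large the file is. Like
    shuf, this treats each physical line as one record, so it assumes no
    quoted field contains an embedded newline.
    """
    rng = random.Random(seed)
    with open(path, newline="") as f:
        header = f.readline()
        reservoir = []
        for i, line in enumerate(f):
            if i < k:
                # Fill the reservoir with the first k data lines.
                reservoir.append(line)
            else:
                # Keep line i with probability k / (i + 1), evicting a
                # uniformly chosen earlier pick.
                j = rng.randint(0, i)
                if j < k:
                    reservoir[j] = line
    return header, reservoir


def write_sample(src, dst, k, seed=None):
    """Write the header plus a k-line random sample of src to dst."""
    header, rows = sample_csv_lines(src, k, seed)
    with open(dst, "w", newline="") as out:
        out.write(header)
        # Guard against a final input line that lacks a trailing newline.
        out.writelines(r if r.endswith("\n") else r + "\n" for r in rows)
```

For example, `write_sample("big-sqf-2016.csv", "sqf-2016.csv", 300, seed=2016)` would write the header plus 300 sampled rows. Passing a fixed `seed` makes the sample reproducible.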

Data Collection

Demographic data were obtained from the 2010 Census, and the 2016 arrest data were randomly sampled from all arrests in NYC. I then recorded the race of each person in the sample.
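Recording the race of each sampled person amounts to tallying one column of the sample. A minimal sketch, assuming the race column is named `race` (a hypothetical name; the header in the actual NYC Open Data file may differ, and its values may be short codes that need mapping to the census categories):

```python
import csv
from collections import Counter


def count_by_race(path, race_column="race"):
    """Tally the sampled records by the value in race_column.

    "race" is an assumed column name; check it against the header of the
    downloaded file before use.
    """
    with open(path, newline="") as f:
        # Strip stray whitespace so " B" and "B" count as the same value.
        return Counter(row[race_column].strip() for row in csv.DictReader(f))
```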

Type of Study

This is an observational study that relates arrest rates to the race of the person being arrested, so it can show association but not causation. The relevant data are arrest records and racial demographics.

Scope of Inference

The data meet the conditions for a chi-square test. Our observations are a simple random sample of all arrests, and 300 is well under 10% of the 1.7 million records. Each cell in the contingency table has more than 5 expected cases. If we assume that each arrest is independent of the others, then the observations are independent. Because the 2010 census data was the most recent available, large demographic changes since then could influence the results. However, the rapid gentrification of the city suggests a whiter population, which would amplify any existing disparity beyond what we measure here.

Pearson’s chi-square test is appropriate because it compares two distributions: for each category, it takes the difference between the observed and expected counts, squares it, divides by the expected count, and sums these terms across categories. The result describes how far the observed counts are from the expected ones. Dividing by the expected count puts each category’s deviation on a comparable scale, but the test still requires observed and expected values to be counts on the same scale, so I scaled the census proportions by the total number of arrests to obtain the expected values.
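The two steps described above can be written out directly. With census share \(p_i\) for group \(i\) and \(n\) sampled arrests, the expected count is \(E_i = n\,p_i\), and Pearson’s statistic is \(X^2 = \sum_i (O_i - E_i)^2 / E_i\) for observed counts \(O_i\). This is a minimal goodness-of-fit sketch; the group labels and numbers are made up for illustration and are not the project’s data.

```python
def expected_counts(population_shares, n):
    """Scale census shares (proportions or raw counts) to expected counts summing to n."""
    total = sum(population_shares.values())
    return {g: n * share / total for g, share in population_shares.items()}


def pearson_x2(observed, expected):
    """Pearson's statistic: sum over groups of (O - E)^2 / E.

    Groups missing from `observed` are counted as zero.
    """
    return sum((observed.get(g, 0) - e) ** 2 / e for g, e in expected.items())


# Hypothetical example with three made-up groups.
shares = {"group A": 0.25, "group B": 0.25, "group C": 0.50}
observed = {"group A": 40, "group B": 20, "group C": 40}
expected = expected_counts(shares, sum(observed.values()))
x2 = pearson_x2(observed, expected)  # 225/25 + 25/25 + 100/50 = 12.0
```

Because `expected_counts` divides by the sum of the shares, it accepts raw census population counts as well as proportions.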

Exploratory Data Analysis

We can see the demographic breakdown of NYC by race below.

Likewise, we can see the same categories for police stops. Even from these simple bar graphs, the differences are substantial. In particular, black people appear among those stopped at roughly twice their share of the population, while white people appear at only a fraction of theirs. It is also striking that there were 1.7 million arrests in 2016, roughly one for every five New Yorkers.
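The “twice as likely” comparison can be made precise as a ratio of shares: a group’s share of stops divided by its share of the population. Because \(P(\text{race} \mid \text{stopped}) / P(\text{race}) = P(\text{stopped} \mid \text{race}) / P(\text{stopped})\), a ratio of 2 means members of that group are stopped at about twice the citywide rate, assuming the census shares describe the population at risk of being stopped. A sketch with hypothetical numbers:

```python
def representation_ratios(observed, population_shares):
    """Each group's share of stops divided by its share of the population.

    By Bayes' rule, P(group | stopped) / P(group) equals
    P(stopped | group) / P(stopped), so a ratio of 2.0 means members of the
    group are stopped at about twice the citywide rate.
    """
    n = sum(observed.values())
    pop_total = sum(population_shares.values())
    return {
        g: (observed.get(g, 0) / n) / (share / pop_total)
        for g, share in population_shares.items()
    }


# Hypothetical numbers only, not the project's data.
ratios = representation_ratios(
    {"group A": 40, "group B": 20, "group C": 40},
    {"group A": 0.25, "group B": 0.25, "group C": 0.50},
)
```

Here group A would have a ratio of 1.6 (over-represented among stops) and groups B and C a ratio of 0.8.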

Inference

## 
##  Pearson's Chi-squared test
## 
## data:  as.table(rbind(measured, expected))
## X-squared = 159.03, df = 3, p-value < 2.2e-16
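The R call in the output above passes a two-row table to `chisq.test`, which runs a test of independence on it: each expected cell is row total × column total / grand total, \(df = (r-1)(c-1)\), and R applies no continuity correction to tables larger than 2 × 2. The sketch below mirrors that computation in plain Python so it runs without SciPy; `chi2_sf` uses the closed forms of the chi-square upper tail for integer degrees of freedom. The function names and table values are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df.

    Uses closed forms of the regularized upper incomplete gamma Q(df/2, x/2).
    """
    y = x / 2.0
    if df % 2 == 0:
        # Even df: Q(m, y) = e^{-y} * sum_{k=0}^{m-1} y^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    # Odd df: Q(m + 1/2, y) = erfc(sqrt(y)) + e^{-y} * sum_{k=1}^{m} y^(k-1/2) / Gamma(k+1/2)
    total = math.erfc(math.sqrt(y))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (k - 0.5) / math.gamma(k + 0.5)
    return total


def chisq_contingency(table):
    """Pearson's test of independence for an r x c table of counts (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand  # expected count for cell (i, j)
            x2 += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(col_totals) - 1)
    return x2, df, chi2_sf(x2, df)


# Hypothetical 2 x 3 table: row 1 observed counts, row 2 scaled census counts.
x2, df, p = chisq_contingency([[40, 20, 40], [25, 25, 50]])
```

Passing the project’s two count rows as `chisq_contingency([measured, expected])` should reproduce the X-squared and df in the R output, up to rounding. If the census shares are instead treated as fixed rather than as a second sample, R’s goodness-of-fit form `chisq.test(measured, p = shares, rescale.p = TRUE)` is an alternative, and it generally yields a different \(X^2\).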

Conclusions

Because the p-value is below \(2.2 \times 10^{-16}\) (the smallest value R reports), far under \(\alpha = .01\), we reject the null hypothesis. The data provide evidence that a person’s likelihood of being stopped by the police is associated with their race. However, the lack of current demographic data makes the population figures an estimate, which could skew the results. As discussed above, I’d expect up-to-date figures to show an even larger disparity as the city becomes whiter, but without annual demographic data it is impossible to tell.

Since the study focuses on stops rather than arrests, it reflects the collective behavior of individual officers rather than the outcome of a long court process. To that end, it may be highlighting individual racial biases rather than systemic policy. Additionally, other factors such as socioeconomic indicators and neighborhood could drive some of the underlying relationships. What is clear from this data is that NYPD policing disproportionately affects communities of color, whether through explicit policy, systemic bias, or a confluence of factors outside the NYPD’s control. To separate one cause from another, it would be interesting to look at the breakdown of offenses and criminal outcomes across race rather than just the number of stops. However, that was outside the scope of this project.

Sources:

Chi-Square Test

Source for expected values: 2010 Census
Source for measured values: NYC Open Data
Source for Stop and Frisk Lawsuit: Floyd v. City of New York