Project Proposal

I intend to investigate the link between policing and race by examinging the distribution of arrests by race and comparing that to the population by race.

\(H_0:\) The NYPD does not target certain racial groups for enforcement more than others.

\(H_A:\) The NYPD does target certain racial groups for enforcement nore than others.

Data Preparation

stops <- read.csv("sqf-2017.csv")
with(stops, table(stops$race, stops$year));
##    
##     2016
##        0
##   A  737
##   B 6498
##   I   38
##   P  873
##   Q 2753
##   U   95
##   W 1270
##   Z  140
stop.amer.ind <- 38
stop.asian.pacific <-737
stop.white <- 1270
stop.hispanic <- 873 + 2753 
stop.black <-6498+873
stops <- c(stop.amer.ind, stop.asian.pacific, stop.black, stop.hispanic, stop.white) 


stop.total <- sum(stops)

total <- (57512 + 3597341 + 2158851 + 1038388 + 5147 + 2088510 + 177225) #This is different than population total because census data treats 'Hispanic' separately than race, leading to double counting of ~2 million people. This is fine for my purposes however as the census data still accurately reports who identifies as 'black'  and 'white' or 'Hispanic' and people are free to choose multiple options. Because of the discrepancies in racial bookeeping in the agencies, each hispanic person will be double counted as either white or black from the information provided by the NYPD. This ensures that a person's bi-racial status does not affect their race or the independence between sets.


exp.amer.ind <- 57512
exp.white <- 3597341
exp.asian.pacific <- (1038388+5147)
exp.black <- 2088510
exp.hispanic <-(2158851+177225)



measured <- c(stop.amer.ind, stop.asian.pacific, stop.black, stop.hispanic, stop.white)
expected <- c(exp.amer.ind, exp.asian.pacific, exp.black, exp.hispanic, exp.white)/total*stop.total
expected.total <- sum(expected)

barplot(measured)

barplot(expected)

top <- (measured-expected)^2
sum(top/expected)
## [1] 9787.617
pchisq(900, df = 4, lower.tail=FALSE)
## [1] 1.665941e-193

Research Question

From the self-reported NYPD data, is there evidence of racialized policing?

Observations

I downloaded the 2017 arrest data from NYC Open Data. It contains the records of 1.7 million arrests. That’s almost 1/4 peolple in NYC! From there, I took a simple random sample of these arrests using the Linux kernel utility shuf. At 260 MB the original file was far too large to read into memory at once.

shuf -n 60000 NYPD_Arrests_Data__Historic_.csv > arrest_sample.csv

Data Collection

Demographic data was obtained from the 2010 census records and the 2017 data was randomly sampled from all arrest data in NYC.

Type of Study

This is an observational study.

Response Variable

Number of Stops/ Race # Explanatory Variable Race

Relevant Statistics

chi.sq.statistic <- sum(((measured-expected)^2)/expected)
pchisq(chi.sq.statistic, df = 4, lower.tail=FALSE)
## [1] 0
chi <- chisq.test(as.table(rbind(measured,expected)), simulate.p.value = FALSE)
chi
## 
##  Pearson's Chi-squared test
## 
## data:  as.table(rbind(measured, expected))
## X-squared = 4479.3, df = 4, p-value < 2.2e-16

Because \(P(x) < \alpha\), we can reject the null hypothesis.

Inference

Source:

Chi-Square Test

Source for expected values:2010 Census: Source for measured values: NYC Open Data