# Load the packages used below (read_csv, tibble, mutate, and the pipe)
library(tidyverse)

# Load the data
# Replace the path below with the path to where your data lives
# data_path <- file.choose()
# stops <- read_csv(data_path)
stops <- read_csv('/Users/TASNEEM/OneDrive - SUNY - The College at Brockport/P-Drive copy 2020/COVID2020 Changes/Research/SABBATICAL/SABBATICAL/StopnFrisk/ny_statewide_2020_04_01.csv')
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   raw_row_number = col_double(),
##   date = col_date(format = ""),
##   time = col_time(format = ""),
##   location = col_character(),
##   county_name = col_character(),
##   subject_age = col_double(),
##   subject_race = col_character(),
##   subject_sex = col_character(),
##   type = col_character(),
##   violation = col_character(),
##   speed = col_double(),
##   posted_speed = col_double(),
##   vehicle_color = col_character(),
##   vehicle_make = col_character(),
##   vehicle_model = col_logical(),
##   vehicle_type = col_character(),
##   vehicle_registration_state = col_character(),
##   vehicle_year = col_double(),
##   raw_RACE = col_character()
## )
## Warning: 93 parsing failures.
##    row           col           expected     actual                                                                                                                                                                  file
##  39345 vehicle_model 1/0/T/F/TRUE/FALSE grand am   '/Users/TASNEEM/OneDrive - SUNY - The College at Brockport/P-Drive copy 2020/COVID2020 Changes/Research/SABBATICAL/SABBATICAL/StopnFrisk/ny_statewide_2020_04_01.csv'
##  70509 vehicle_model 1/0/T/F/TRUE/FALSE blazer     '/Users/TASNEEM/OneDrive - SUNY - The College at Brockport/P-Drive copy 2020/COVID2020 Changes/Research/SABBATICAL/SABBATICAL/StopnFrisk/ny_statewide_2020_04_01.csv'
##  84597 vehicle_model 1/0/T/F/TRUE/FALSE blazer     '/Users/TASNEEM/OneDrive - SUNY - The College at Brockport/P-Drive copy 2020/COVID2020 Changes/Research/SABBATICAL/SABBATICAL/StopnFrisk/ny_statewide_2020_04_01.csv'
## 199200 vehicle_model 1/0/T/F/TRUE/FALSE rendezvous '/Users/TASNEEM/OneDrive - SUNY - The College at Brockport/P-Drive copy 2020/COVID2020 Changes/Research/SABBATICAL/SABBATICAL/StopnFrisk/ny_statewide_2020_04_01.csv'
## 284823 vehicle_model 1/0/T/F/TRUE/FALSE cobalt     '/Users/TASNEEM/OneDrive - SUNY - The College at Brockport/P-Drive copy 2020/COVID2020 Changes/Research/SABBATICAL/SABBATICAL/StopnFrisk/ny_statewide_2020_04_01.csv'
## ...... ............. .................. .......... .....................................................................................................................................................................
## See problems(...) for more details.
# Additional data and fixed values we'll be using
population_2019 <- tibble(
  subject_race = c(
    "asian/pacific islander", "black", "hispanic", "other/unknown","white"
  ),
  num_people = c(1770274, 3423826, 3754537, 719781, 10757819)
) %>% 
  mutate(subject_race = as.factor(subject_race))

center_lat <- 43.00035
center_lng <- -75.4999
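
The 93 parsing failures reported above most likely arise because `read_csv()` guessed `vehicle_model` to be a logical column from its early, empty rows, so later values such as "blazer" could not be parsed. One way to avoid this, sketched below and assuming the readr package is installed, is to declare the column type explicitly. The three-line file written here is a hypothetical stand-in for the real CSV.

```r
library(readr)

# Hypothetical stand-in for the real file: vehicle_model is empty at first.
demo_path <- tempfile(fileext = ".csv")
writeLines(
  c("raw_row_number,vehicle_make,vehicle_model",
    "1,CHEV,",
    "2,CHEV,blazer"),
  demo_path
)

# Declare vehicle_model as character instead of letting readr guess its type.
demo <- read_csv(
  demo_path,
  col_types = cols(
    raw_row_number = col_double(),
    vehicle_make   = col_character(),
    vehicle_model  = col_character()
  )
)

is.character(demo$vehicle_model)
```

The same `col_types` argument could be passed to the call that reads the full file, leaving the other columns to be guessed.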

Background:

In the last few years, videos of traffic stops have created a national debate about the way law enforcement treats minorities (Vignesh 2016). These cases have spurred calls for more reliable information, from police video to data logged every time someone is pulled over, since traffic stops are one of the most common ways members of the public interact with the police (Vignesh 2016). The Fourth Amendment requires that before stopping the suspect, the police must have a reasonable suspicion that a crime has been, is being, or is about to be committed by the suspect. If the police reasonably suspect that the suspect is armed and dangerous, the police may frisk the suspect. A reasonable stop is one “in which a reasonably prudent officer is warranted in the circumstances of a given case in believing that his safety or that of others is endangered, he may make a reasonable search for weapons of the person believed by him to be armed and dangerous.” Stops fall under criminal law, as opposed to civil law (Cornell Law School 1992).

Our data are taken from the Stanford Open Policing Project, a partnership between the Stanford Computational Journalism Lab and the Stanford Computational Policy Lab. Starting in 2015, the Open Policing Project began requesting traffic stop data from state after state. To date, the project has collected and standardized over 200 million records of traffic stop and search data from across the country (Pierson 2020). Some states do not collect the demographic information of the drivers that police pull over. States that do collect the information do not always release the data. Even when states do provide the information, the way they track and then process the data varies widely across the country, creating challenges for standardizing the information.
Data from 21 state patrol agencies and 29 municipal police departments, comprising nearly 100 million traffic stops, are sufficiently detailed to facilitate rigorous statistical analysis (Pierson 2020). The project has found significant racial disparities in policing. These disparities can occur for many reasons, differences in driving behavior among them. In some cases, however, the project finds evidence that bias also plays a role.

Benchmark analysis is the most common statistical method for assessing racial bias in police stops and searches. The key methodological challenge with this approach is estimating the race distribution of the at-risk, or benchmark, population. Traditional benchmarks include the residential population, licensed drivers, arrestees, and reported crime suspects (Engel and Calnon 2004). McConnell and Scheidegger (2001) looked at stops initiated by aerial patrols, and others have looked at stops based on radar and cameras (Lange, Blackman, and Johnson 2001), arguing that such stops are less prone to potential bias and thus more likely to reflect the true population of traffic violators. Gelman, Fagan, and Kiss (2007) use a hierarchical Bayesian model to construct a benchmark based on neighborhood- and race-specific crime rates. Ridgeway studies post-stop police actions by creating benchmarks based on propensity scores, with minority and white drivers matched using demographics and the time, location, and purpose of the stops (Ridgeway and MacDonald 2009). Grogger and Ridgeway construct benchmarks by considering stops at night, when a “veil of darkness” masks race (Grogger and Ridgeway 2006). Antonovics and Knight use officer-level demographics in a variation of the standard benchmark test: they argue that higher search rates when the officer’s race differs from that of the suspect are evidence of discrimination (Antonovics and Knight 2009). Finally, “internal benchmarks” have been used to flag potentially biased officers by comparing each officer’s stop decisions to those made by others patrolling the same area at the same time (Ridgeway and MacDonald 2009).

In benchmarking, one compares the rates at which whites and minorities are treated. For example, in the case of stopping decisions, if white drivers are stopped less often than non-white drivers, that may be the result of bias against minorities. However, if minority drivers in reality make up a larger share of the people at risk of being stopped than the benchmark population suggests, for example because they drive more or commit more traffic violations, then such disparities in stop rates may simply reflect reasonable practices rather than discrimination. This limitation of benchmarking is referred to in the literature as the qualified pool or denominator problem (Ayres 2002) and is a specific instance of omitted variable bias. Ideally, one would like to compare similarly situated white and minority drivers, but such a comparison requires detailed individual-level data and is often infeasible in practice.

Methods:

We use a widely used statistical method, benchmarking, to detect disparities in policing decisions, specifically a benchmark comparison of stop rates. We use descriptive and inferential statistics to explore and analyze the New York State Patrol data, along with quantitative observations about the proportion of stops for each race. We define the proportion of stops for a race as the number of stops of drivers of that race divided by the total number of stops. We use the population of each race as our baseline and define the stop rate for a race as the number of stops of drivers of that race divided by the population of that race. Furthermore, we quantify disparities in stop rates by dividing the White stop rate by the stop rate of each other race. Lastly, we compute Cohen’s H, which measures the distance between the proportion of stops for drivers of each race and the proportion of stops for White drivers (Wikipedia, n.d.). We use Cohen’s H to describe the difference between two proportions as small, medium, or large (Wikipedia, n.d.). A large value indicates that the difference between the two proportions is meaningful.
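
The quantities defined in this section can be written compactly. The symbols are introduced here only for illustration and do not appear in the original analysis: $s_r$ is the number of stops of drivers of race $r$, $S$ is the total number of stops, and $N_r$ is the population of race $r$. The last expression is the standard arcsine-based form of Cohen's h.

```latex
\[
p_r = \frac{s_r}{S}, \qquad
\mathrm{rate}_r = \frac{s_r}{N_r}, \qquad
\mathrm{ratio}_r = \frac{\mathrm{rate}_{\mathrm{white}}}{\mathrm{rate}_r}, \qquad
h_r = 2\arcsin\sqrt{p_{\mathrm{white}}} - 2\arcsin\sqrt{p_r}.
\]
```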

The Data:

We consider a comprehensive dataset of 7.96 million traffic stops conducted in the state of New York between January 2010 and December 2017, obtained through the Stanford Open Policing Project website (Pierson 2020). Several variables are recorded for each stop, including the location of the stop, the race of the driver (White, Black, Hispanic, Asian/Pacific Islander, Other, or NA), the subject’s sex (Male or Female), the subject’s age, the violation, the county the stop happened in, and several characteristics of the vehicle that was being driven, including vehicle make, model, color, and type. Our data set does not contain data on whether a search was conducted, whether a frisk was performed, whether an arrest was made, or the reason for the stop. In our primary analysis, we include all stops. We acquired population data for New York State from the Census Bureau website (New York State Census Bureau, n.d.). These data give us the population of each race within New York State during 2017. We also acquired population data for Monroe County, New York from the Census Bureau website (Monroe County Census Bureau, n.d.). These data give us the population of each race within Monroe County, New York during 2017.

Results:

There are 19 different variables contained in our dataset. They are listed in the table below.

New York State Variables

| Variable |
|---|
| raw_row_number |
| date |
| time |
| location |
| county_name |
| subject_age |
| subject_race |
| subject_sex |
| type |
| violation |
| speed |
| posted_speed |
| vehicle_color |
| vehicle_make |
| vehicle_model |
| vehicle_type |
| vehicle_registration_state |
| vehicle_year |
| raw_RACE |

There are 7,962,169 rows in our NYS dataset, hence there were 7,962,169 recorded stops within NYS. As we stated before, our range of data is from January 2010 to December 2017.

Number of Rows in Data

| Rows |
|---|
| 7962169 |

Range of Dates in Data

| Earliest date | Latest date |
|---|---|
| 2010-01-01 | 2017-12-14 |

There are 63 counties in our dataset (62 counties and one NA “county”). We find that the number of stops per year from 2010 to 2017 has stayed relatively stable. The year 2010 had the highest number of stops, with 1,120,192.

Number of Stops by Year

| Year | Number of Stops |
|---|---|
| 2010 | 1120192 |
| 2011 | 994119 |
| 2012 | 943322 |
| 2013 | 955896 |
| 2014 | 994232 |
| 2015 | 982626 |
| 2016 | 984923 |
| 2017 | 986859 |
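
The yearly counts in this table can be totaled and averaged directly. The sketch below retypes the counts by hand as a check; in the analysis itself they would be derived from `stops`, and the variable names are illustrative.

```r
# Stops per year, copied from the table above.
stops_by_year <- c(
  `2010` = 1120192, `2011` = 994119, `2012` = 943322, `2013` = 955896,
  `2014` = 994232,  `2015` = 982626, `2016` = 984923, `2017` = 986859
)

total_stops <- sum(stops_by_year)                   # total recorded stops
mean_stops  <- mean(stops_by_year)                  # average stops per year
peak_year   <- names(which.max(stops_by_year))      # year with the most stops

total_stops
round(mean_stops)
peak_year
```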

The NYS Patrol data consist of 7,962,169 stops from 2010-2017, an average of about 995,271 stops per year. We then look at the number of stops per race and the respective proportions of stops per race to determine whether any specific race is being pulled over consistently more than the others.

Number of Stops by Race

| Race | Number of Stops |
|---|---|
| asian/pacific islander | 278075 |
| black | 888696 |
| hispanic | 553552 |
| other | 315590 |
| white | 5926237 |
| NA | 19 |

Number and Proportion of Stops by Race

| Race | Number of Stops | Proportion of Stops |
|---|---|---|
| asian/pacific islander | 278075 | 0.0349245 |
| black | 888696 | 0.1116148 |
| hispanic | 553552 | 0.0695228 |
| other | 315590 | 0.0396362 |
| white | 5926237 | 0.7442993 |
| NA | 19 | 0.0000024 |

Among this set of stops, we find that 74.43% of drivers are White, 11.16% are Black, 6.95% are Hispanic, 3.49% are Asian/Pacific Islander, and 3.96% are unknown/other. White drivers account for the highest percentage of stops and Black drivers for the second highest.
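
As a check on these percentages, the proportions can be recomputed from the counts. This is a minimal sketch in base R with the counts retyped from the table; `stops_by_race` and `prop_stops` are illustrative names.

```r
# Stops by race, copied from the table above ("NA" = race not recorded).
stops_by_race <- c(
  "asian/pacific islander" = 278075,
  "black"    = 888696,
  "hispanic" = 553552,
  "other"    = 315590,
  "white"    = 5926237,
  "NA"       = 19
)

# Proportion of stops = stops for a race / total number of stops.
prop_stops <- stops_by_race / sum(stops_by_race)

# Express as percentages rounded to two decimals.
round(100 * prop_stops, 2)
```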

We then computed Cohen’s H value which measures the distance between the proportions of stops for drivers of each specific race compared to proportion of stops of White drivers.

Cohen’s H Value for Comparison of Proportion of Stops by Race

| Comparison | Cohen’s H Value |
|---|---|
| White vs. Non-White | 1.02 |
| White vs. Asian/Pacific Islander | 1.71 |
| White vs. Black | 1.40 |
| White vs. Hispanic | 1.55 |
| White vs. Other | 1.68 |
| White vs. NA | 2.08 |

Here, we find that when each race is compared to White drivers, the Cohen’s H value is above 1. Cohen’s conventional thresholds describe a value of 0.2 as small, 0.5 as medium, and 0.8 as large, so each difference in the proportion of stops between White drivers and drivers of every other race is large. Hence, each difference in proportions is meaningful in size. Cohen’s H is a measure of effect size rather than a significance test, so this result on its own does not establish statistical significance.
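
The Cohen's h values can be reproduced from the proportions of stops. The sketch below uses the standard arcsine-based formula; the function name `cohens_h` is illustrative, and the proportions are retyped from the table of stops by race.

```r
# Cohen's h for two proportions: difference of arcsine-transformed values.
cohens_h <- function(p1, p2) {
  2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))
}

# Proportions of stops by race, copied from the table of stops by race.
p_white  <- 0.7442993
p_others <- c(
  "Non-White"              = 1 - 0.7442993,
  "Asian/Pacific Islander" = 0.0349245,
  "Black"                  = 0.1116148,
  "Hispanic"               = 0.0695228,
  "Other"                  = 0.0396362,
  "NA"                     = 0.0000024
)

# One h value per comparison against White drivers, rounded to two decimals.
h_values <- round(cohens_h(p_white, p_others), 2)
h_values
```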

We then look graphically at the number of stops per year broken down by race to see the trend of stops over time for each race.

Graphically, we can see that the number of stops of White drivers is far higher than that of any other race throughout the years. From this plot we see that, at least for Black and White drivers, the annual trends are very different by race. It is difficult to tell from this plot for drivers of other races because the counts are comparatively so much smaller. All races experienced a spike in stops in 2014. Thereafter, fewer White drivers were stopped from 2015-2017, whereas the number of Black and Hispanic drivers stopped continued to increase over those two years.

This is already a potential lead. We investigated further to see what the trend looks like when adjusting for gender within each specific race. Thus, we can see if there was a difference between male drivers and female drivers within each specific race over time.

The figures for male and female drivers show the same trends for each race over the years 2010-2017. Regardless of gender, White drivers are stopped in far larger numbers than drivers of any other race. However, notice that the number of male drivers stopped is drastically higher than the number of female drivers stopped, regardless of race. This could be due to there being more males than females in NYS, more males than females driving, and so on.
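
The counts behind plots of this kind can be tabulated by year, race, and sex. The sketch below uses a hypothetical six-row data frame, `toy_stops`, whose column names follow the data set; it is not the code used to produce the figures.

```r
# Hypothetical miniature of the stops data; column names follow the data set.
toy_stops <- data.frame(
  date = as.Date(c("2010-03-01", "2010-07-15", "2010-09-30",
                   "2011-01-20", "2011-05-05", "2011-11-11")),
  subject_race = c("white", "white", "black", "white", "hispanic", "black"),
  subject_sex  = c("male", "female", "male", "male", "male", "female")
)

# Extract the year of each stop, then count stops per year, race, and sex.
toy_stops$year <- as.integer(format(toy_stops$date, "%Y"))
counts <- as.data.frame(
  table(year = toy_stops$year,
        race = toy_stops$subject_race,
        sex  = toy_stops$subject_sex),
  responseName = "n_stops"
)

# Keep only the combinations that actually occur in the data.
counts[counts$n_stops > 0, ]
```

A long table like `counts` is the usual input for a line plot of stops per year with one line per race and one panel per sex.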

To draw any conclusion about racial discrimination in police stops, we need to conduct a benchmark analysis. With a benchmark analysis, we compare the number of stops of each race with the population of that race as a whole. This lets us determine whether any specific race is over-represented in the number of stops compared to its respective population. First, we retrieve the population numbers of each specific race within NYS.

Population Number and Proportion of People by Race in NYS

| Race | Population Number of People | Population Proportion |
|---|---|---|
| asian/pacific islander | 1770274 | 0.0866667 |
| black | 3423826 | 0.1676190 |
| hispanic | 3754537 | 0.1838095 |
| other/unknown | 719781 | 0.0352381 |
| white | 10757819 | 0.5266667 |

Among the population data for each race in NYS, we find that Whites make up 52.67% of the population while every other race makes up far less. The NYS population is 8.67% Asian/Pacific Islander, 16.76% Black, 18.38% Hispanic, and 3.52% other/unknown. The NYS population is thus mostly White, with Hispanic residents forming the second largest group and Black residents the third.

We then want to investigate the stop rates for each race to determine if there is any racial bias within police stops.

Stop Rates of each Race using Population of NYS

| Race | Number of Stops | Number of People | Stop Rate |
|---|---|---|---|
| asian/pacific islander | 278075 | 1770274 | 0.1570802 |
| black | 888696 | 3423826 | 0.2595623 |
| hispanic | 553552 | 3754537 | 0.1474355 |
| other | 315590 | NA | NA |
| white | 5926237 | 10757819 | 0.5508772 |
| NA | 19 | NA | NA |

By conducting a benchmark analysis and comparing the number of stops for each race with the population of each race in NYS, we can determine whether a certain race is over-represented, which would point to racial bias within police stops. According to this table, the highest stop rate in NYS is that of White drivers, at 0.5509 stops per White resident over the study period (55.09%). The next highest is that of Black drivers, at about 0.26 (26%). Here, we can conclude that White drivers are stopped the most of any race, even after accounting for the large White population within NYS. The White stop rate is 2.12 times the Black stop rate and 3.74 times the Hispanic stop rate.
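
A sketch of this benchmark computation in base R, with counts retyped from the tables above. It also suggests why the "other" row shows NA: the stops data label the category `other` while the population table labels it `other/unknown`, so a join on race finds no match. That explanation is an inference from the labels in the tables, not something the original analysis states.

```r
stops_by_race <- data.frame(
  subject_race = c("asian/pacific islander", "black", "hispanic",
                   "other", "white"),
  n_stops = c(278075, 888696, 553552, 315590, 5926237)
)
population <- data.frame(
  subject_race = c("asian/pacific islander", "black", "hispanic",
                   "other/unknown", "white"),
  num_people = c(1770274, 3423826, 3754537, 719781, 10757819)
)

# Left join: keep every race in the stops table, even without a population match.
rates <- merge(stops_by_race, population, by = "subject_race", all.x = TRUE)

# Stop rate = stops of a race / population of that race.
rates$stop_rate <- rates$n_stops / rates$num_people

# Ratio of the White stop rate to each race's stop rate.
white_rate <- rates$stop_rate[rates$subject_race == "white"]
rates$white_ratio <- white_rate / rates$stop_rate

rates
```

Relabeling one of the two categories before joining would give the "other" group a stop rate as well, if the two labels are in fact meant to cover the same people.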

We then displayed the number of stops in NYS as a heat map. The following heat map is a map of New York State with different shades of blue representing the number of stops per county. The lighter the shade of blue, the more stops occurred in that county. We also recreated the same map with a different color palette.

We further restrict our stops data set to look at Monroe County specifically.
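
Restricting the data to one county is a row filter on `county_name`. The sketch below uses a hypothetical four-row data frame, and the label "Monroe County" is an assumption about how the county is spelled in the data.

```r
# Hypothetical miniature of the stops data, including a missing county name.
toy_stops <- data.frame(
  county_name  = c("Monroe County", "Erie County", NA, "Monroe County"),
  subject_race = c("white", "black", "white", "hispanic")
)

# %in% treats NA county names as non-matches, so no NA rows are returned.
monroe_stops <- toy_stops[toy_stops$county_name %in% "Monroe County", ]

nrow(monroe_stops)
```

Using `%in%` rather than `==` matters here because the data contain stops with no recorded county, and `==` would carry those through as rows of NA.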

First we see how many rows we have in our new data set consisting of stops only in Monroe County.

Number of Rows in Monroe County Data

| Rows |
|---|
| 336879 |

Number of Stops by Year in Monroe County

| Year | Number of Stops |
|---|---|
| 2010 | 47604 |
| 2011 | 42504 |
| 2012 | 42244 |
| 2013 | 41063 |
| 2014 | 48664 |
| 2015 | 43416 |
| 2016 | 38867 |
| 2017 | 32517 |

We find that the average number of stops per year from 2010-2017 in Monroe County is about 42,110. We will use these numbers for our primary analysis and look at the number of stops and proportion of stops per race to determine whether any specific race is being pulled over consistently more than the others in Monroe County.

Number of Stops by Race in Monroe County

| Race | Number of Stops |
|---|---|
| asian/pacific islander | 9696 |
| black | 88083 |
| hispanic | 27444 |
| other | 3744 |
| white | 207912 |

Number and Proportion of Stops by Race in Monroe County

| Race | Number of Stops | Proportion of Stops |
|---|---|---|
| asian/pacific islander | 9696 | 0.0287818 |
| black | 88083 | 0.2614678 |
| hispanic | 27444 | 0.0814655 |
| other | 3744 | 0.0111138 |
| white | 207912 | 0.6171712 |

Among this set of stops for Monroe County, we find that 61.72% of drivers are White, 26.15% are Black, 8.15% are Hispanic, 2.88% are Asian/Pacific Islander, and 1.11% are unknown/other. The ranking for Monroe County is the same as for NYS: White drivers account for the highest percentage of stops and Black drivers for the second highest.

We then look graphically at the number of stops per year broken down by race to see the trend of stops over time for each race in Monroe County.

Graphically, we can see that in Monroe County, the numbers of stops of White and Black drivers are far higher than those of drivers of any other race throughout the years. From this plot we see that, at least for Black and White drivers, the annual trends are very different by race. It is difficult to tell from this plot for drivers of other races because the counts are much smaller. Note how all races experienced a spike in stops in 2014, in line with the yearly totals above, followed by a drop starting in 2015. Thereafter, fewer drivers were stopped from 2015-2017 for all races, except that the number of Hispanic drivers stopped increased in the last year.

As we did with NYS, to draw any conclusion about racial discrimination in police stops within Monroe County, we need to conduct a benchmark analysis. This lets us determine whether any specific race is over-represented in the number of stops compared to its respective population within Monroe County.

Population Number and Proportion of People by Race in Monroe County

| Race | Number of People | Proportion of People |
|---|---|---|
| asian/pacific islander | 28187 | 0.0371092 |
| black | 120166 | 0.1582029 |
| hispanic | 68242 | 0.0898431 |
| other/unknown | 22994 | 0.0302724 |
| white | 519980 | 0.6845724 |

Among the population data for each race in Monroe County, we find that Whites make up 68.46% of the population while every other race makes up far less. The Monroe County population is 3.71% Asian/Pacific Islander, 15.82% Black, 8.98% Hispanic, and 3.02% other/unknown. Notice how the Monroe County population consists mostly of people who are White, as in NYS as a whole, and that the second largest group is people who are Black.

Stop Rates of each Race using Population of Monroe County

| Race | Number of Stops | Number of People | Stop Rate |
|---|---|---|---|
| asian/pacific islander | 9696 | 28187 | 0.3439884 |
| black | 88083 | 120166 | 0.7330110 |
| hispanic | 27444 | 68242 | 0.4021570 |
| other | 3744 | NA | NA |
| white | 207912 | 519980 | 0.3998461 |

According to this table, the results differ from those for NYS. The highest stop rate in Monroe County is that of Black drivers, at 0.7330 stops per Black resident over the study period (73.30%). The next highest is that of Hispanic drivers, at 0.4022 (40.22%), nearly identical to the White rate of 0.3998. Here, we can conclude that Black and Hispanic drivers have the highest stop rates of any race. The Black stop rate is 1.83 times the White stop rate, and the Hispanic stop rate is 1.01 times the White stop rate.
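
The Monroe County ratios can be checked with a few lines of base R. The counts are retyped from the tables above, and the vector names are illustrative.

```r
# Monroe County counts, copied from the tables above.
monroe_stops_n <- c(asian = 9696,  black = 88083,  hispanic = 27444, white = 207912)
monroe_people  <- c(asian = 28187, black = 120166, hispanic = 68242, white = 519980)

# Stop rate = stops of a race / population of that race.
monroe_rate <- monroe_stops_n / monroe_people

# Each race's stop rate relative to the White stop rate.
relative_to_white <- monroe_rate / monroe_rate[["white"]]

round(monroe_rate, 4)
round(relative_to_white, 2)
```

Here the ratio is taken with the White rate in the denominator, the reverse of the NYS comparison, so values above 1 mean a group is stopped at a higher rate than White drivers.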

Benchmark Results:

We saw before that about 11.16% of stops in NYS were of Black drivers, while the NYS population was about 16.76% Black, giving a stop rate of 0.26. Given this population share, 11.16% of stops being of Black drivers is not surprising. We also saw that 6.95% of stops in NYS were of Hispanic drivers, while the NYS population was about 18.38% Hispanic, giving a stop rate of only 0.1474. Thus, 6.95% of stops being of Hispanic drivers is not surprising either. Lastly, we saw that 74.43% of stops in NYS were of White drivers, while the NYS population was about 52.67% White, giving a stop rate of 0.5509. Since White residents form the majority of the population, White drivers accounting for most stops is also expected. White drivers are stopped at a rate 2.12 times that of Black drivers and 3.74 times that of Hispanic drivers. Here, no minority group is over-represented among stops relative to its share of the population. Thus, the benchmark test shows no evidence of racial discrimination against minority drivers in police stops within NYS.

For Monroe County, we obtain different results. We saw before that about 26.15% of total stops in Monroe County were of Black drivers, while the Monroe County population was about 15.82% Black, giving a stop rate of 0.733. This is very surprising: Black drivers make up only 15.82% of the population but account for 26.15% of stops. We also saw that 8.15% of stops in Monroe County were of Hispanic drivers, while the Monroe County population was about 8.98% Hispanic, giving a stop rate of 0.4022. The Hispanic share of stops is thus close to the Hispanic share of the population, although the Hispanic stop rate is still marginally above the White rate. Lastly, we saw that 61.72% of stops in Monroe County were of White drivers, while the Monroe County population was about 68.46% White, giving a stop rate of only 0.3998. White drivers make up a large majority of the population in Monroe County but a smaller share of its stops. In Monroe County, Black drivers are stopped at a rate 1.83 times that of White drivers, and Hispanic drivers at a rate 1.01 times that of White drivers. Even though Whites make up most of the population and most of the stops in Monroe County, Black drivers are stopped at a much higher rate than Whites, and Hispanic drivers at a marginally higher rate. Thus, there is evidence of racial discrimination in police stops within Monroe County.

Discussion:

These baseline statistics give us a sense that there are racial disparities in policing practices in NYS, but they are not evidence of discrimination. The main argument against the benchmark test is that we may not have identified the correct baseline to compare against.

These same baseline statistics show that there are racial disparities in policing practices in Monroe County, and here they are evidence of discrimination. In Monroe County, Black drivers were over-represented and disproportionately stopped relative to their population numbers, and Hispanic drivers were stopped at a slightly higher rate than White drivers. Specifically, the White share of the population was 52.64 percentage points higher than the Black share and 59.48 percentage points higher than the Hispanic share, yet the stop rate for Black drivers was almost double that for White drivers, which indicates that racial bias is present in Monroe County.

Limitations:

We ran into a few limitations within our data sets. Within the NYS Patrol data, which includes Monroe County, several variables were not collected: search conducted, contraband found, citation issued, warning issued, frisk performed, arrest made, and reason for stop. The most important of these for our exploration of racial disparities were search conducted and frisk performed. These variables would have given us a better understanding of how these drivers are treated once stopped and whether, and how, they are being racially discriminated against. Because of this missing information, our investigation into the Stop-and-Frisk data was narrowly focused on stops. Racial discrimination could be present in these post-stop decisions, but without the data we could not perform any test.

Had search and frisk outcomes been available, an outcome test would have been a natural complement to benchmarking. Outcome tests, however, are known to suffer from the problem of infra-marginality: even absent discrimination, the rates at which searches of minority and white drivers are successful might differ if the two groups have different risk distributions. Thus, at least in theory, outcome tests can fail to accurately detect discrimination.

Despite these shortcomings, benchmark tests provide a useful if imperfect measure of bias in stop decisions.

References:

Antonovics, K., and B. G. Knight. 2009. “A New Look at Racial Profiling: Evidence from the Boston Police Department.” Review of Economics and Statistics 91: 163–77.

Ayres, I. 2002. “Outcome Tests of Racial Disparities in Police Practices.” Justice Research and Policy 4: 131–42.

Cornell Law School. 1992. “Stop and Frisk.” Legal Information Institute. https://www.law.cornell.edu/wex/stop_and_frisk.

Engel, R. S., and J. M. Calnon. 2004. “Comparing Benchmark Methodologies for Police-Citizen Contacts: Traffic Stop Data Collection for the Pennsylvania State Police.” Police Quarterly 7: 97–125.

Gelman, A., J. Fagan, and A. Kiss. 2007. “An Analysis of the New York City Police Department’s ‘Stop-and-Frisk’ Policy in the Context of Claims of Racial Bias.” Journal of the American Statistical Association 102: 813–23.

Grogger, J., and G. Ridgeway. 2006. “Testing for Racial Profiling in Traffic Stops from Behind a Veil of Darkness.” Journal of the American Statistical Association 101: 878–87.

Lange, J. E., K. O. Blackman, and M. B. Johnson. 2001. “Speed Violation Survey of the New Jersey Turnpike.”

McConnell, E. H., and A. R. Scheidegger. 2001. “Race and Speeding Citations; Comparing Speeding Citations Issued by Air Traffic Officers with Those Issued by Ground Traffic Officers.”

Monroe County Census Bureau. n.d. “Population Data of Monroe County, New York State from the Census Bureau Website.” https://www.census.gov/quickfacts/fact/table/monroecountynewyork/PST045219.

New York State Census Bureau. n.d. “Population Data of New York State from the Census Bureau Website.” https://www.census.gov/quickfacts/NY.

Pierson, E. et al. 2020. “A Large-Scale Analysis of Racial Disparities in Police Stops Across the United States.” Nature Human Behaviour 4. https://openpolicing.stanford.edu/data/.

Ridgeway, G., and J. M. MacDonald. 2009. “Doubly Robust Internal Benchmarking and False Discovery Rates for Detecting Racial Bias in Police Stops.” Journal of the American Statistical Association 104: 661–68.

Vignesh, Ramachandran et al. 2016. “Are Traffic Stops Prone to Racial Bias?” The Marshall Project. https://www.themarshallproject.org/2016/06/21/are-traffic-stops-prone-to-racial-bias.

Wikipedia. n.d. “Cohen’s H Effect Size.” https://en.wikipedia.org/wiki/Cohen's_h.