Stat 360 Project 4

Author

Griffin Lessinger

Published

December 8, 2025

Introduction

In this report, we analyze different aspects of the official 2022 NYPD “Stop, Question, and Frisk” dataset. The dataset comprises roughly 15,000 stop-and-frisk incidents conducted by the NYPD during the 2022 calendar year, with 82 columns capturing data such as the demographics of the subject being stopped, the context of the stop, arrests made (if any), citations/summonses issued, etc.

Here is a sample of the first few columns of the first few entries of the data:

Sample of stop-and-frisk data

Code

library(readxl)
library(ggplot2)
library(dplyr)

frisk <- read_xlsx("/home/user/School/STAT360/Project 4 (NYPD)/sqf-2022.xlsx")
frisk[1:10, 2:7]

# A tibble: 10 × 6
   STOP_FRISK_DATE     STOP_FRISK_TIME     YEAR2 MONTH2 DAY2  STOP_WAS_INITIATED
   <dttm>              <dttm>              <dbl> <chr>  <chr> <chr>             
 1 2022-01-01 00:00:00 1899-12-31 08:40:00  2022 Janua… Satu… Based on Self Ini…
 2 2022-01-01 00:00:00 1899-12-31 03:25:00  2022 Janua… Satu… Based on Self Ini…
 3 2022-01-01 00:00:00 1899-12-31 00:19:00  2022 Janua… Satu… Based on Self Ini…
 4 2022-01-01 00:00:00 1899-12-31 03:00:00  2022 Janua… Satu… Based on Radio Run
 5 2022-01-01 00:00:00 1899-12-31 03:00:00  2022 Janua… Satu… Based on Radio Run
 6 2022-01-01 00:00:00 1899-12-31 10:30:00  2022 Janua… Satu… Based on C/W on S…
 7 2022-01-01 00:00:00 1899-12-31 12:00:00  2022 Janua… Satu… Based on C/W on S…
 8 2022-01-01 00:00:00 1899-12-31 01:19:00  2022 Janua… Satu… Based on Self Ini…
 9 2022-01-01 00:00:00 1899-12-31 00:39:00  2022 Janua… Satu… Based on Self Ini…
10 2022-01-01 00:00:00 1899-12-31 03:25:00  2022 Janua… Satu… Based on Self Ini…

(not all that useful, sorry!)
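
One detail worth noting in the sample: STOP_FRISK_TIME carries a placeholder date of 1899-12-31, most likely an artifact of how the spreadsheet stores time-only cells. A minimal sketch of pulling out just the time of day, using a two-element stand-in for the real column (the names `times`, `time.of.day`, and `stop.hour` are illustrative and not part of the original analysis):

```r
# Stand-in for frisk$STOP_FRISK_TIME (two values copied from the sample above)
times <- as.POSIXct(
  c("1899-12-31 08:40:00", "1899-12-31 03:25:00"),
  tz = "UTC"
)

# Keep only the clock time, dropping the placeholder date
time.of.day <- format(times, "%H:%M")

# Hour of day as an integer, handy for grouping stops by hour
stop.hour <- as.integer(format(times, "%H"))
```

The same two `format()` calls could be applied to the full column if an hour-of-day breakdown were wanted.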

Column names of frisk data (for the brave)

Code

colnames(frisk)

 [1] "STOP_ID"                                                     
 [2] "STOP_FRISK_DATE"                                             
 [3] "STOP_FRISK_TIME"                                             
 [4] "YEAR2"                                                       
 [5] "MONTH2"                                                      
 [6] "DAY2"                                                        
 [7] "STOP_WAS_INITIATED"                                          
 [8] "RECORD_STATUS_CODE"                                          
 [9] "ISSUING_OFFICER_RANK"                                        
[10] "ISSUING_OFFICER_COMMAND_CODE"                                
[11] "SUPERVISING_OFFICER_RANK"                                    
[12] "SUPERVISING_OFFICER_COMMAND_CODE"                            
[13] "SUPERVISING_ACTION_CORRESPONDING_ACTIVITY_LOG_ENTRY_REVIEWED"
[14] "LOCATION_IN_OUT_CODE"                                        
[15] "JURISDICTION_CODE"                                           
[16] "JURISDICTION_DESCRIPTION"                                    
[17] "OBSERVED_DURATION_MINUTES"                                   
[18] "SUSPECTED_CRIME_DESCRIPTION"                                 
[19] "STOP_DURATION_MINUTES"                                       
[20] "OFFICER_EXPLAINED_STOP_FLAG"                                 
[21] "OFFICER_NOT_EXPLAINED_STOP_DESCRIPTION"                      
[22] "OTHER_PERSON_STOPPED_FLAG"                                   
[23] "SUSPECT_ARRESTED_FLAG"                                       
[24] "SUSPECT_ARREST_OFFENSE"                                      
[25] "SUMMONS_ISSUED_FLAG"                                         
[26] "SUMMONS_OFFENSE_DESCRIPTION"                                 
[27] "OFFICER_IN_UNIFORM_FLAG"                                     
[28] "ID_CARD_IDENTIFIES_OFFICER_FLAG"                             
[29] "SHIELD_IDENTIFIES_OFFICER_FLAG"                              
[30] "VERBAL_IDENTIFIES_OFFICER_FLAG"                              
[31] "FRISKED_FLAG"                                                
[32] "SEARCHED_FLAG"                                               
[33] "ASK_FOR_CONSENT_FLG"                                         
[34] "CONSENT_GIVEN_FLG"                                           
[35] "OTHER_CONTRABAND_FLAG"                                       
[36] "FIREARM_FLAG"                                                
[37] "KNIFE_CUTTER_FLAG"                                           
[38] "OTHER_WEAPON_FLAG"                                           
[39] "WEAPON_FOUND_FLAG"                                           
[40] "PHYSICAL_FORCE_CEW_FLAG"                                     
[41] "PHYSICAL_FORCE_DRAW_POINT_FIREARM_FLAG"                      
[42] "PHYSICAL_FORCE_HANDCUFF_SUSPECT_FLAG"                        
[43] "PHYSICAL_FORCE_OC_SPRAY_USED_FLAG"                           
[44] "PHYSICAL_FORCE_OTHER_FLAG"                                   
[45] "PHYSICAL_FORCE_RESTRAINT_USED_FLAG"                          
[46] "PHYSICAL_FORCE_VERBAL_INSTRUCTION_FLAG"                      
[47] "PHYSICAL_FORCE_WEAPON_IMPACT_FLAG"                           
[48] "BACKROUND_CIRCUMSTANCES_VIOLENT_CRIME_FLAG"                  
[49] "BACKROUND_CIRCUMSTANCES_SUSPECT_KNOWN_TO_CARRY_WEAPON_FLAG"  
[50] "SUSPECTS_ACTIONS_CASING_FLAG"                                
[51] "SUSPECTS_ACTIONS_CONCEALED_POSSESSION_WEAPON_FLAG"           
[52] "SUSPECTS_ACTIONS_DECRIPTION_FLAG"                            
[53] "SUSPECTS_ACTIONS_DRUG_TRANSACTIONS_FLAG"                     
[54] "SUSPECTS_ACTIONS_IDENTIFY_CRIME_PATTERN_FLAG"                
[55] "SUSPECTS_ACTIONS_LOOKOUT_FLAG"                               
[56] "SUSPECTS_ACTIONS_OTHER_FLAG"                                 
[57] "SUSPECTS_ACTIONS_PROXIMITY_TO_SCENE_FLAG"                    
[58] "SEARCH_BASIS_ADMISSION_FLAG"                                 
[59] "SEARCH_BASIS_CONSENT_FLAG"                                   
[60] "SEARCH_BASIS_HARD_OBJECT_FLAG"                               
[61] "SEARCH_BASIS_INCIDENTAL_TO_ARREST_FLAG"                      
[62] "SEARCH_BASIS_OTHER_FLAG"                                     
[63] "SEARCH_BASIS_OUTLINE_FLAG"                                   
[64] "DEMEANOR_OF_PERSON_STOPPED"                                  
[65] "SUSPECT_REPORTED_AGE"                                        
[66] "SUSPECT_SEX"                                                 
[67] "SUSPECT_RACE_DESCRIPTION"                                    
[68] "SUSPECT_HEIGHT"                                              
[69] "SUSPECT_WEIGHT"                                              
[70] "SUSPECT_BODY_BUILD_TYPE"                                     
[71] "SUSPECT_EYE_COLOR"                                           
[72] "SUSPECT_HAIR_COLOR"                                          
[73] "SUSPECT_OTHER_DESCRIPTION"                                   
[74] "STOP_LOCATION_PRECINCT"                                      
[75] "STOP_LOCATION_SECTOR_CODE"                                   
[76] "STOP_LOCATION_APARTMENT"                                     
[77] "STOP_LOCATION_FULL_ADDRESS"                                  
[78] "STOP_LOCATION_STREET_NAME"                                   
[79] "STOP_LOCATION_X"                                             
[80] "STOP_LOCATION_Y"                                             
[81] "STOP_LOCATION_PATROL_BORO_NAME"                              
[82] "STOP_LOCATION_BORO_NAME"

Many of the 82 columns are simple flags or indicators. The goal of this report is to supply a visual analysis of the above dataset, answering questions about the demographics of those stopped, trends over the year, reasons for stops, etc.

1. Demographic Analysis

What is the racial distribution of individuals who were stopped and frisked by the police in 2022?

This is a major point of contention in the US today: whether or not police unfairly discriminate against certain groups of people in their responses and investigations.

Demographic analysis of racial distribution of stop-and-frisk incidents

Code

demographic.race <- frisk |>
  group_by(SUSPECT_RACE_DESCRIPTION) |>
  summarize(count = n())

demographic.race.bp <- barplot(
  height = demographic.race$count[-1],
  names.arg = c("Native A.", "Asian", "Black", "Black His.", "Middle E.", "White", "White His."),
  col = colorRampPalette(c("skyblue", "royalblue4"))(7),
  cex.names = 0.9,
  xlab = "Racial description",
  ylab = "Count",
  main = "Stop-and-frisk incidents by race"
)
text(
  x = demographic.race.bp,
  y = demographic.race$count[-1] + 350,
  labels = demographic.race$count[-1],
  cex = 0.8,
  xpd = TRUE
)

These raw counts do little to convey the overall picture, however. Instead, we should look at them relative to the size of each group’s population:

Code

pops <- c(143632, 1373502, 1776891, 100000, 2719856)

demographic.race.bpadj <- barplot(
  height = demographic.race$count[-c(1, 5, 8)]/pops,
  names.arg = c("Native A.", "Asian", "Black", "Middle E.", "White"),
  col = colorRampPalette(c("skyblue", "royalblue4"))(5),
  cex.names = 0.9,
  xlab = "Racial description",
  ylab = "Proportion of pop.",
  ylim = c(0, 0.0056),
  main = "Stop-and-frisk incidents by race"
)
text(
  x = demographic.race.bpadj,
  y = demographic.race$count[-c(1, 5, 8)]/pops + 0.00023,
  labels = paste0(trunc(10000*demographic.race$count[-c(1, 5, 8)]/pops)/100, "%"),
  cex = 0.8,
  xpd = TRUE
)
text(
  x = 3.1,
  y = 0.0059,
  labels = "Pop. adjusted with 2020 census data",
  cex = 0.9,
  xpd = TRUE
)

Unfortunately, I was not able to track down population estimates for individuals who identify as “Black Hispanic” or “White Hispanic”. Given the nature of the data, it may be best to then omit a population-adjusted proportion of those individuals who are stopped-and-frisked, though it would still likely be high (as seen in the first plot).

If we continue with these population-adjusted figures, however, we still see that Black people tend to be disproportionately represented in terms of counts of stop-and-frisk incidents. This is also true with the Middle-Eastern-descendant population of NYC.
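
The population adjustment above relies on the counts and the population vector being in the same order. A minimal sketch of the same calculation keyed by name instead, expressed as stops per 10,000 residents. The population figures are the ones used in the report; the stop counts are hypothetical placeholders, and the names `stops` and `rate.per.10k` are illustrative:

```r
# Population figures as used in the report (2020 census), keyed by group name
pops <- c(
  "Native A." = 143632,
  "Asian"     = 1373502,
  "Black"     = 1776891,
  "Middle E." = 100000,
  "White"     = 2719856
)

# HYPOTHETICAL stop counts, for illustration only (not taken from the data),
# deliberately listed in a different order than `pops`
stops <- c("White" = 1000, "Black" = 8000, "Asian" = 300,
           "Native A." = 20, "Middle E." = 200)

# Index by name so that each count is divided by the matching population
rate.per.10k <- 1e4 * stops[names(pops)] / pops
round(rate.per.10k, 1)
```

Matching by name means a reordered or newly added group cannot silently be divided by the wrong population.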

Additionally, the absolute count of Native Americans stopped by NYPD during 2022 is very low, mostly because the overall population of Native Americans in NYC itself is low. As such, we can’t extrapolate much without looking at more data.

Demographic analysis of sex distribution of stop-and-frisk incidents

Code

demographic.sex <- frisk |>
  group_by(SUSPECT_SEX) |>
  summarize(count = n())

par(mar = c(5, 13, 5, 9))

demographic.sex.bp <- barplot(
  height = matrix(data = demographic.sex$count[-1]/14968, nrow = 2),
  col = c("pink2", "skyblue"),
  ylab = "Proportion",
  main = "Stop-and-frisk incidents by sex",
  xlim = c(0, 1),
  width = 0.2
)
legend(
  x = "right",
  fill = c("skyblue", "pink2"),
  legend = c(
    paste0("Male: ", demographic.sex$count[3]),
    paste0("Female: ", demographic.sex$count[2])
  ),
  bty = "n"
)

Code

par(mar = c(5, 6, 4, 2))

A little under 10% of all stops involve women; the vast majority involve men.

The conclusions here are that Black people are, by a wide margin, the most disproportionately stopped group, and that men are stopped and frisked far more often than women. Explanations for these observations tend to be quite nuanced and are beyond the scope of this specific assignment.

2. Temporal Trends

How has the number of stop-and-frisk encounters changed over the months of 2022? Does the day of the week have an impact?

We will first examine the trend over the entire year by month, then over the week by weekday:

Analysis of incidents by month

Code

count.month <- frisk |>
  group_by(MONTH2) |>
  summarise(count = n())

count.month$month <- substr(count.month$MONTH2, 1, 3)
count.month$month <- factor(
  count.month$month,
  levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"),
  ordered = TRUE
)

ggplot(data = count.month, aes(x = month, y = count)) +
  ylim(0, 2000) +
  geom_line(group = 1) +
  labs(
    title = "Stop-and-frisk incidents by month",
    x = "Month",
    y = "Count"
  ) +
  theme_classic()

The monthly total seems to hover at roughly 1000-1500 incidents, except in December, where the count drops rather sharply. New York can have cold winters, so fewer people are likely to be walking around outside, which may partly explain the dip in stops, although January and February, which are also cold, show no comparable drop.

Analysis of incidents by weekday

Code

count.week <- frisk |>
  group_by(DAY2) |>
  summarise(count = n())

count.week$day <- substr(count.week$DAY2, 1, 3)
count.week$day <- factor(
  count.week$day,
  levels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"),
  ordered = TRUE
)

ggplot(data = count.week, aes(x = day, y = count)) +
  ylim(0, 3000) +
  geom_line(group = 1) +
  labs(
    title = "Stop-and-frisk incidents by weekday",
    x = "Weekday",
    y = "Count"
  ) +
  theme_classic()

There is a very real, but odd, dip for incident counts on Mondays. It is almost certainly related to the fact that Monday is the start of the work week, but exactly why this would cause a dip is difficult to explain.

Overall, it is reasonable to conclude that there is a noticeable temporal component to the overall counts (and thus, the monthly/weekly averages) of stop-and-frisk occurrences. Counts decline in December, and they also dip on Mondays (aggregated throughout the year).
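
Whether a weekday dip is larger than chance variation could be checked with a chi-squared goodness-of-fit test against equal counts per weekday. The sketch below is not part of the original analysis and uses hypothetical counts; `day.counts` and `day.test` are illustrative names:

```r
# HYPOTHETICAL weekday counts (Sun..Sat), chosen only to illustrate the call
day.counts <- c(Sun = 2200, Mon = 1700, Tue = 2150, Wed = 2250,
                Thu = 2200, Fri = 2300, Sat = 2168)

# Null hypothesis: stops are spread evenly across the seven weekdays
day.test <- chisq.test(day.counts)

day.test$statistic  # chi-squared statistic
day.test$p.value    # small values suggest the counts are not uniform
```

With the real data, `count.week$count` would take the place of `day.counts`. Note that 2022 contained 53 Saturdays and 52 of every other weekday, so equal expected counts are only an approximation.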

3 + 4. Reasons for Stops & Outcomes of Stops

What are the most common reasons cited by police officers for conducting stop-and-frisk encounters?

What are the NYPD officers typically stopping someone for? What is the suspected crime? We are given the suspected crime in the data, along with outcome of the encounter (nothing, arrest, etc.):

10 most commonly cited reasons for stop

Code

count.reasons <- frisk |>
  group_by(SUSPECTED_CRIME_DESCRIPTION) |>
  summarise(count = n()) |>
  arrange(desc(count))

count.arrests <- frisk |>
  filter(SUSPECT_ARRESTED_FLAG == "Y") |>
  group_by(SUSPECTED_CRIME_DESCRIPTION) |>
  summarise(count = n())

count.reasons <- left_join(
  x = count.reasons,
  y = count.arrests,
  by = "SUSPECTED_CRIME_DESCRIPTION"
)

# Reasons with no arrests come back as NA from the join; set them to zero
count.reasons$count.y[is.na(count.reasons$count.y)] <- 0
colnames(count.reasons) <- c("reason", "stops", "arrests")

count.reasons[1:10, ]

# A tibble: 10 × 3
   reason             stops arrests
   <chr>              <int>   <dbl>
 1 CPW                 6908    1432
 2 ROBBERY             1544     717
 3 PETIT LARCENY       1336     820
 4 ASSAULT             1269     513
 5 BURGLARY            1077     379
 6 OTHER                557     179
 7 GRAND LARCENY        439     222
 8 GRAND LARCENY AUTO   434     128
 9 MENACING             414     173
10 CRIMINAL MISCHIEF    238     108

The most common reasons seem expected, all things considered. “CPW” stands for “Criminal Possession of a Weapon”, “Petit” means “Petty” in this context, and the others are all self-explanatory. Note: being stopped by the NYPD on suspicion of one of these crimes does not mean that the person actually committed it, nor that they were arrested; it means only that the individual was a suspect.
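
The join between stop counts and arrest counts leaves missing values for any reason that never led to an arrest. A minimal base-R sketch of that step on hypothetical tallies, filling the gaps by testing for `NA` rather than by row position (the data frames and the reason "LOITERING" are made up for illustration):

```r
# HYPOTHETICAL per-reason tallies; "LOITERING" has no arrests on purpose
stops <- data.frame(reason = c("CPW", "ROBBERY", "LOITERING"),
                    stops = c(50, 20, 3))
arrests <- data.frame(reason = c("CPW", "ROBBERY"),
                      arrests = c(10, 9))

# Left join: keep every reason, even those missing from `arrests`
joined <- merge(stops, arrests, by = "reason", all.x = TRUE)

# Reasons with no matching arrest row come back as NA; treat them as zero
joined$arrests[is.na(joined$arrests)] <- 0

# Arrest rate per reason
joined$rate <- joined$arrests / joined$stops
joined
```

Testing for `NA` keeps working if the data, and therefore the row order, changes.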

Is there a relationship between the stated reason for the stop and the outcome of the encounter?

Analysis of arrest rates for suspected crimes

Code

par(oma = c(2, 0, 0, 0))

reasons.bp <- barplot(
  height = count.reasons[1:20, ]$arrests/count.reasons[1:20, ]$stops,
  col = colorRampPalette(c("grey90", "royalblue4"))(20),
  ylim = c(0, 1),
  main = "Proportions of stop counts to arrest outcomes",
  ylab = "Proportion arrested"
)
text(
  x = reasons.bp + 0.05,
  y = -0.03,
#  labels = count.reasons[1:20, ]$reason,
  labels = c("CPW", "Robbery", "P. Larceny", "Assault", "Burglary", "Other", "G. Larceny", "G. Larceny Auto", "Menacing", "Mischief", "Trespassing", "Endangering", "CPSP", "Graffiti", "Auto stripping", "P. of Substance", "S. of Substance", "Use of Vehicle", "Murder", "Forced Touching"),
  cex = 0.85,
  adj = 1,
  srt = 60,
  xpd = TRUE
)
text(
  x = 11.9,
  y = 1.05,
  labels = "Top 20 cited reasons for stop",
  cex = 0.9,
  xpd = TRUE
)
abline(
  h = nrow(frisk[frisk$SUSPECT_ARRESTED_FLAG == "Y", ])/nrow(frisk),
  col = "red3",
  lwd = 2,
  lty = 2
)
legend(
  x = "topright",
  fill = "red3",
  legend = "Mean arrest rate: 0.33",
  bty = "n"
)
mtext(
  "Reason for stop",
  side = 1,
  outer = TRUE
)

Code

par(oma = c(0, 0, 0, 0))

Yes, it seems so! Some suspected crimes are far more likely than others to end in an arrest. Petit larceny, for example, has an arrest rate nearly double the overall mean arrest rate, while CPW has a considerably lower arrest rate. Most, however, fall fairly close to the mean, within roughly ±10 percentage points.

Again, note that an arrest being made does not mean that the arrested person necessarily committed the crime, only that the arresting officer believed there was probable cause to think so.

We can also compute the following statistics: 60.03% of stops resulted in the suspect being frisked, 42.22% of stops resulted in the suspect being searched, and 2.78% of stops resulted in a court summons being issued.
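
Percentages like these can be computed as the mean of a logical comparison on each flag column. A minimal sketch on a hypothetical five-row stand-in for the data (the column names match the dataset; `toy` and `pct.yes` are illustrative):

```r
# HYPOTHETICAL five-row stand-in for the frisk data
toy <- data.frame(
  FRISKED_FLAG        = c("Y", "Y", "N", "Y", "N"),
  SEARCHED_FLAG       = c("Y", "N", "N", "Y", "N"),
  SUMMONS_ISSUED_FLAG = c("N", "N", "N", "Y", "N")
)

# mean() of a logical vector is the share of TRUE values
pct.yes <- sapply(toy, function(flag) 100 * mean(flag == "Y"))
round(pct.yes, 2)
```

On the real data, the same call would be applied to `frisk[, c("FRISKED_FLAG", "SEARCHED_FLAG", "SUMMONS_ISSUED_FLAG")]`.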

Are there differences in outcomes based on the demographics of the individuals stopped?

Now that we understand how certain suspected crimes result in more arrests than others, how do the proportions of arrests for those crimes depend on demographics (primarily race and sex)?

Analysis of outcomes by race and sex

Code

count.race <- frisk |>
  filter(SUSPECT_RACE_DESCRIPTION != "(null)") |>
  group_by(SUSPECTED_CRIME_DESCRIPTION, SUSPECT_RACE_DESCRIPTION) |>
  summarise(
    stops = n(),
    arrests = sum(SUSPECT_ARRESTED_FLAG == "Y")
  ) |>
  ungroup() |>
  group_by(SUSPECTED_CRIME_DESCRIPTION) |>
  mutate(total = sum(stops)) |>
  ungroup() |>
  arrange(desc(total)) |>
  select(-total) |>
  mutate(rate = arrests/stops)

count.sex <- frisk |>
  filter(SUSPECT_SEX != "(null)") |>
  group_by(SUSPECTED_CRIME_DESCRIPTION, SUSPECT_SEX) |>
  summarise(
    stops = n(),
    arrests = sum(SUSPECT_ARRESTED_FLAG == "Y")
  ) |>
  ungroup() |>
  group_by(SUSPECTED_CRIME_DESCRIPTION) |>
  mutate(total = sum(stops)) |>
  ungroup() |>
  arrange(desc(total)) |>
  select(-total) |>
  mutate(rate = arrests/stops)

race.bp <- barplot(
  height = matrix(
    data = count.race$rate[c(2:7, 9:14, 23:28)],
    nrow = 6,
    byrow = FALSE
  ),
  beside = TRUE,
  col = colorRampPalette(c("grey90", "royalblue4"))(6),
  ylim = c(0, 1),
  main = "Proportions of stop counts to arrests by race",
  xlab = "Reason for stop",
  ylab = "Proportion arrested",
  names.arg = c("CPW", "Robbery", "Assault")
)
legend(
  x = "topleft",
  fill = colorRampPalette(c("grey90", "royalblue4"))(6),
  legend = c("Asian", "Black", "Black Hispanic", "Middle Eastern", "White", "White Hispanic"),
  cex = 0.8,
  bty = "n"
)

Above are a few selected suspected crimes (chosen mainly because they were among the few with enough data to show patterns). Race appears to be related to stop-to-arrest rates, but the pattern is likely not as simple as one race being arrested more than others. It probably depends on factors such as class of crime, severity, and others.

Code

sex.bp <- barplot(
  height = matrix(
    data = count.sex$rate[c(1:2, 3:4, 5:6, 7:8, 9:10)],
    nrow = 2,
    byrow = FALSE
  ),
  beside = TRUE,
  col = c("pink2", "skyblue"),
  ylim = c(0, 1),
  main = "Proportions of stop counts to arrests by sex",
  xlab = "Reason for stop",
  ylab = "Proportion arrested",
  names.arg = c("CPW", "Robbery", "P. Larceny", "Assault", "Burglary")
)
legend(
  x = "topleft",
  fill = c("pink2", "skyblue"),
  legend = c("Female", "Male"),
  bty = "n"
)

The same is true for male vs. female suspects. There is probably a pattern, but it is likely a complex one that depends on many factors. It may also reflect a data limitation: many groupings of this dataset produce groups with fewer than 100 observations, which is a shaky basis for generalization. More data would be better.
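
To make the small-group concern concrete, one option is to attach a confidence interval to each arrest rate. The sketch below is not part of the original analysis; it compares two hypothetical groups with the same observed rate but different sizes, using exact binomial intervals:

```r
# HYPOTHETICAL groups with the same observed arrest rate (30%) but different sizes
small <- binom.test(x = 12,  n = 40)
large <- binom.test(x = 300, n = 1000)

# 95% confidence intervals for the underlying arrest probability
small$conf.int
large$conf.int

# Interval width shrinks as the group grows
ci.width <- c(small = diff(small$conf.int), large = diff(large$conf.int))
ci.width
```

A wide interval for a small group signals that apparent differences between groups may not be meaningful.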

5. Location Analysis

Which neighborhoods or precincts have the highest number of stop-and-frisk incidents?

This is a big question, because neighborhood rates of stop-and-frisk incidents are highly correlated with the demographic makeup of the neighborhood, though that correlation does not by itself show that demographics drive the rates:

Analysis of incidents by precinct

Code

count.loc <- frisk |>
  group_by(STOP_LOCATION_BORO_NAME, STOP_LOCATION_PRECINCT) |>
  summarize(count = n()) |>
  ungroup() |>
  group_by(STOP_LOCATION_BORO_NAME) |>
  mutate(total = sum(count)) |>
  ungroup() |>
  arrange(desc(total))

pops <- c(1472000, 2736000, 1695000, 2405000, 496000)

boro.bp <- barplot(
  height = unique(count.loc$total)/pops,
  names.arg = c("Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"),
  main = "Incident counts by borough",
  xlab = "Borough",
  ylab = "Proportion",
  col = colorRampPalette(c("grey85", "royalblue4"))(5)
)
text(
  x = 3.1,
  y = 0.00325,
  labels = "Pop. adjusted with 2020 census data",
  cex = 0.85,
  xpd = TRUE
)
text(
  x = boro.bp,
  y = unique(count.loc$total)/pops + 0.00015,
  labels = unique(count.loc$total),
  cex = 0.8,
  xpd = TRUE
)

The Bronx sees both a high absolute count of stop-and-frisk incidents (4495 in 2022) and a high population-adjusted rate. This aligns with the Bronx having the lowest white population of any of the boroughs, as well as a Hispanic majority, a group that we have already seen to be frequently stopped.

Code

precinct.bp <- barplot(
  height = count.loc$count[1:12],
  names.arg = c(40:50, 52),
  main = "Incident counts by Bronx precincts",
  xlab = "Precinct",
  ylab = "Count",
  col = colorRampPalette(c("grey85", "royalblue4"))(12)
)
text(
  x = precinct.bp,
  y = count.loc$count[1:12] + 40,
  labels = count.loc$count[1:12],
  cex = 0.8,
  xpd = TRUE
)
abline(
  h = mean(count.loc$count[1:12]),
  col = "red3",
  lwd = 2,
  lty = 2
)
legend(
  x = "topleft",
  fill = "red3",
  legend = "Mean: 374.6",
  cex = 0.8,
  bty = "n"
)

We also see that within the Bronx, precinct 46 has, by far, the most stop-and-frisk incidents of all the precinct locations. In fact, it has the most incidents out of any NYPD precinct, beating 2nd place by over 600 incidents.

So yes, location matters greatly when it comes to rates of stop-and-frisk events, for multiple reasons. The Bronx sees the most of these, with precinct 46 alone accounting for over 1100 in 2022.

6. Weapon and Contraband Discovery

How frequently do police officers find weapons or contraband during stop-and-frisk encounters?

Given that a large number of the total set of stop events involve searches, it makes sense to wonder how many of those searches actually yield weapons or contraband. Note that only 42.22% of all stops actually lead to searches.

Analysis of weapons/contraband searches

Code

count.search <- frisk |>
  group_by(SUSPECTED_CRIME_DESCRIPTION) |>
  summarize(
    search = sum(SEARCHED_FLAG == "Y"),
    weapon = sum(WEAPON_FOUND_FLAG == "Y"),
    gun = sum(FIREARM_FLAG == "Y"),
    knife = sum(KNIFE_CUTTER_FLAG == "Y"),
    otherweapon = sum(OTHER_WEAPON_FLAG == "Y"),
    contra = sum(OTHER_CONTRABAND_FLAG == "Y"),
    total = n()
  ) |>
  arrange(desc(total)) |>
  select(-total)

count.search <- count.search |>
  mutate(percent.weapon = weapon/search) |>
  mutate(percent.gun = gun/search) |>
  mutate(percent.knife = knife/search) |>
  mutate(percent.otherweapon = otherweapon/search) |>
  mutate(percent.contra = contra/search)

search.bp <- barplot(
  height = matrix(
    data = c(count.search$percent.weapon[c(1, 2, 4, 11, 14)], count.search$percent.contra[c(1, 2, 4, 11, 14)]),
    nrow = 2,
    byrow = TRUE
  ),
  beside = TRUE,
  col = c("grey", "tan"),
  ylim = c(0, 1),
  main = "Proportions of searches to found-illicit items",
  xlab = "Reason for stop",
  ylab = "Proportion",
  names.arg = c("CPW", "Robbery", "Assault", "Trespassing", "Graffiti")
)
legend(
  x = "topleft",
  fill = c("grey", "tan"),
  legend = c("Weapon", "Contraband"),
  cex = 0.8,
  bty = "n"
)

Above is a bar plot showing, for each suspected offense, the number of stops in which each type of item was found, divided by the total number of searches. Clearly, there is a strong relationship between these ratios and the suspected crime (as one might expect). “Criminal possession of a weapon” (CPW) searches lead to a very high chance of finding a weapon, and graffiti searches lead to a high chance of finding contraband.

Overall, if the officers choose to search, they find weapons 37.81% of the time, and they find other contraband 13.72% of the time.
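
One caveat on the denominator: in the code above, the weapon tally counts every stop with WEAPON_FOUND_FLAG == "Y", whether or not SEARCHED_FLAG is also "Y". If weapons are sometimes recorded on stops without a search (for example, during a frisk), dividing by the number of searches overstates the per-search hit rate. A minimal sketch of the two calculations on hypothetical rows (`toy`, `ratio.all`, and `ratio.searched` are illustrative names):

```r
# HYPOTHETICAL rows; row 3 records a weapon without a search
toy <- data.frame(
  SEARCHED_FLAG     = c("Y", "Y", "N", "Y", "N"),
  WEAPON_FOUND_FLAG = c("Y", "N", "Y", "N", "N")
)

searched <- toy$SEARCHED_FLAG == "Y"
weapon   <- toy$WEAPON_FOUND_FLAG == "Y"

# Ratio in the style used above: all weapons found / all searches
ratio.all <- sum(weapon) / sum(searched)

# Hit rate restricted to stops that actually involved a search
ratio.searched <- mean(weapon[searched])

c(all = ratio.all, searched.only = ratio.searched)
```

If the two numbers agree on the real data, the distinction does not matter here; if they differ, the restricted version is the one that answers "how often does a search find a weapon".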

7. Repeat Encounters

I could not find any section of the data that disclosed information about repeat encounters. Thus, we will have to move on for now.

8. Age and Stop Frequency

Is there a relationship between the age of individuals and their likelihood of being stopped and frisked?

Analysis of age vs. stop rates

Code

frisk.age <- frisk |>
  filter(SUSPECT_REPORTED_AGE != "(null)") |>
  mutate(SUSPECT_REPORTED_AGE = as.numeric(SUSPECT_REPORTED_AGE))

hist(
  frisk.age$SUSPECT_REPORTED_AGE,
  main = "Counts of stops by age",
  xlab = "Age",
  ylab = "Count",
  col = "magenta3"
)

It seems that there exists a strong relationship between the suspect ages and the rates at which they are stopped. Specifically, the age with the highest stop counts was 15 years old, decreasing as the suspects get older.

Five records list a reported age of “0”. It is unclear why; they fall outside the age groups used in the next plot.

Code

count.age <- frisk.age |>
  group_by(SUSPECTED_CRIME_DESCRIPTION) |>
  summarise(
    age1.14 = sum(SUSPECT_REPORTED_AGE >= 1 & SUSPECT_REPORTED_AGE < 15),
    age15.29 = sum(SUSPECT_REPORTED_AGE >= 15 & SUSPECT_REPORTED_AGE < 30),
    age30.44 = sum(SUSPECT_REPORTED_AGE >= 30 & SUSPECT_REPORTED_AGE < 45),
    age45.59 = sum(SUSPECT_REPORTED_AGE >= 45 & SUSPECT_REPORTED_AGE < 60),
    age60.up = sum(SUSPECT_REPORTED_AGE >= 60),
    total = n()
  ) |>
  arrange(desc(total)) |>
  select(-total)

age.bp <- barplot(
  height = matrix(
    data = c(
      count.age$age1.14[c(2, 3, 5, 10)],
      count.age$age15.29[c(2, 3, 5, 10)],
      count.age$age30.44[c(2, 3, 5, 10)],
      count.age$age45.59[c(2, 3, 5, 10)],
      count.age$age60.up[c(2, 3, 5, 10)]
    ),
    nrow = 5,
    byrow = TRUE
  ),
  beside = TRUE,
  col = colorRampPalette(c("grey", "red3", "purple3"))(5),
  names.arg = c("Robbery", "P. Larceny", "Burglary", "Mischief"),
  main = "Suspected crimes by age groups",
  xlab = "Reason for stop",
  ylab = "Count"
)
legend(
  x = "topright",
  title = "Age group (years)",
  fill = colorRampPalette(c("grey", "red3", "purple3"))(5),
  legend = c("1-14", "15-29", "30-44", "45-59", "60+"),
  cex = 0.8,
  bty = "n"
)

There is a somewhat clear distribution of counts of stops when broken down by age, with suspects placed in the 15-29 or 30-44 year old age groups being stopped the majority of the time for most suspected crimes.
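
The age groups above are built from explicit `>=` and `<` comparisons. An equivalent grouping can be sketched with `cut()`, shown here on hypothetical ages (the vector `ages` is made up; the breaks and labels mirror the groups used in the plot):

```r
# HYPOTHETICAL ages, including a reported age of 0
ages <- c(0, 14, 15, 29, 30, 44, 45, 59, 60, 71)

age.group <- cut(
  ages,
  breaks = c(1, 15, 30, 45, 60, Inf),
  labels = c("1-14", "15-29", "30-44", "45-59", "60+"),
  right = FALSE   # intervals are [lower, upper), matching the >= / < tests above
)

# Ages below 1 fall outside every bin and become NA
table(age.group, useNA = "ifany")
```

Because the result is a single factor column, it could also be used directly as a grouping variable instead of maintaining one summary column per age band.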

9. Officer Identification

Which police officers have conducted the highest number of stop-and-frisk encounters?

Code

count.officer <- frisk |>
  group_by(ISSUING_OFFICER_COMMAND_CODE) |>
  summarize(
    total = n(),
    searches = sum(SEARCHED_FLAG == "Y"),
    arrests = sum(SUSPECT_ARRESTED_FLAG == "Y")
  ) |>
  mutate(search.ratio = searches/total,
         arrest.ratio = arrests/total) |>
  arrange(desc(arrest.ratio)) |>
  filter(total >= 15)

names(count.officer)[1] <- "officer.code"

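First, something interesting to note is that many of the officers have a 100% stop-to-arrest ratio because they have only made one stop. Likewise, there are many officers with a 0% stop-to-arrest ratio for the same reason. Note also that the data identifies officers only by ISSUING_OFFICER_COMMAND_CODE, which appears to denote the officer’s command (such as a precinct) rather than an individual, so each “officer” below likely aggregates several people. As we care about trends, we will only consider officers who have conducted at least 15 stops: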

Top 10 officers in terms of stops-to-arrests

Code

count.officer[1:10, ]

# A tibble: 10 × 6
   officer.code total searches arrests search.ratio arrest.ratio
          <dbl> <int>    <int>   <int>        <dbl>        <dbl>
 1           13   108       77      73        0.713        0.676
 2          112   189      132     124        0.698        0.656
 3          182    86       55      55        0.640        0.640
 4           10    52       41      32        0.788        0.615
 5          108    70       47      42        0.671        0.6  
 6          169    17       11      10        0.647        0.588
 7           17    42       24      24        0.571        0.571
 8           19   155      111      88        0.716        0.568
 9           20   131       74      74        0.565        0.565
10           76    80       60      45        0.75         0.562

Remaining officers

Code

as.data.frame(count.officer[11:113, ], row.names = 11:113)

    officer.code total searches arrests search.ratio arrest.ratio
1             72   121       64      67    0.5289256    0.5537190
2            863    35       22      19    0.6285714    0.5428571
3            123    61       32      32    0.5245902    0.5245902
4            102    85       57      44    0.6705882    0.5176471
5             90   118       68      61    0.5762712    0.5169492
6              1    78       46      40    0.5897436    0.5128205
7             68    65       35      33    0.5384615    0.5076923
8              7    66       36      33    0.5454545    0.5000000
9            483    38       23      19    0.6052632    0.5000000
10             6   105       23      51    0.2190476    0.4857143
11           181    56       26      26    0.4642857    0.4642857
12            79   168       61      77    0.3630952    0.4583333
13           106   162       83      74    0.5123457    0.4567901
14            62   119       51      54    0.4285714    0.4537815
15            24   225       62     102    0.2755556    0.4533333
16            63   199      105      88    0.5276382    0.4422111
17            94   185       67      81    0.3621622    0.4378378
18           105   141       86      61    0.6099291    0.4326241
19           122    98       61      42    0.6224490    0.4285714
20           804    54       27      23    0.5000000    0.4259259
21           809    48       17      20    0.3541667    0.4166667
22            30   171       73      71    0.4269006    0.4152047
23            47   198      111      82    0.5606061    0.4141414
24            14   225      113      93    0.5022222    0.4133333
25            26    34       19      14    0.5588235    0.4117647
26            18   113       47      46    0.4159292    0.4070796
27           120    79       44      32    0.5569620    0.4050633
28            23   132       76      53    0.5757576    0.4015152
29            78    70       29      28    0.4142857    0.4000000
30           107   123       54      49    0.4390244    0.3983740
31           121   152       67      60    0.4407895    0.3947368
32            49   113       46      44    0.4070796    0.3893805
33           104   160       78      62    0.4875000    0.3875000
34           163    26        6      10    0.2307692    0.3846154
35           185    34       17      13    0.5000000    0.3823529
36           866    34        9      13    0.2647059    0.3823529
37           161    55       12      21    0.2181818    0.3818182
38            45   109       40      41    0.3669725    0.3761468
39            41   173       73      65    0.4219653    0.3757225
40           101   128       70      48    0.5468750    0.3750000
41             9   117       53      43    0.4529915    0.3675214
42           103   192       94      70    0.4895833    0.3645833
43           114   173       81      61    0.4682081    0.3526012
44            84   165       58      58    0.3515152    0.3515152
45           109   188       84      66    0.4468085    0.3510638
46           801    38       15      13    0.3947368    0.3421053
47           805   152       83      52    0.5460526    0.3421053
48           806    44       21      15    0.4772727    0.3409091
49            61   149       66      50    0.4429530    0.3355705
50            43   273      110      91    0.4029304    0.3333333
51           186    21       13       7    0.6190476    0.3333333
52            69   187       73      62    0.3903743    0.3315508
53           100    64       30      21    0.4687500    0.3281250
54            42   237      104      77    0.4388186    0.3248945
55            60   182       66      59    0.3626374    0.3241758
56           802   113       36      36    0.3185841    0.3185841
57            50   236       84      75    0.3559322    0.3177966
58            67   272      111      86    0.4080882    0.3161765
59            25   130       45      41    0.3461538    0.3153846
60            70   102       38      30    0.3725490    0.2941176
61            88    99       36      29    0.3636364    0.2929293
62            33    82       27      24    0.3292683    0.2926829
63           318   123       61      36    0.4959350    0.2926829
64           860    24       11       7    0.4583333    0.2916667
65           113   222      107      64    0.4819820    0.2882883
66           807   108       59      31    0.5462963    0.2870370
67           870    35        8      10    0.2285714    0.2857143
68            28   203       76      57    0.3743842    0.2807882
69           110   100       54      28    0.5400000    0.2800000
70           165    18        9       5    0.5000000    0.2777778
71           849    18        5       5    0.2777778    0.2777778
72            44   296      100      81    0.3378378    0.2736486
73           865    22        8       6    0.3636364    0.2727273
74           115    67       34      18    0.5074627    0.2686567
75           183   177       57      47    0.3220339    0.2655367
76            81   212       59      56    0.2783019    0.2641509
77           803   226       92      59    0.4070796    0.2610619
78            40   363      113      94    0.3112948    0.2589532
79           138    39       14      10    0.3589744    0.2564103
80             5    55       21      14    0.3818182    0.2545455
81           187    67       16      17    0.2388060    0.2537313
82           162    16        3       4    0.1875000    0.2500000
83            77   162       47      40    0.2901235    0.2469136
84            73   244       88      60    0.3606557    0.2459016
85           862    41       14      10    0.3414634    0.2439024
86           808   167       62      40    0.3712575    0.2395210
87           869    21        6       5    0.2857143    0.2380952
88           111    80       32      19    0.4000000    0.2375000
89            48   276       99      65    0.3586957    0.2355072
90            83   199       51      45    0.2562814    0.2261307
91            71   129       36      29    0.2790698    0.2248062
92           861    58       23      13    0.3965517    0.2241379
93            34   183       47      41    0.2568306    0.2240437
94           164    23        5       5    0.2173913    0.2173913
95            52   360      151      76    0.4194444    0.2111111
96            75   406      161      84    0.3965517    0.2068966
97            32   123       41      23    0.3333333    0.1869919
98            66    83       19      15    0.2289157    0.1807229
99           868    52       14       9    0.2692308    0.1730769
100          437    35       13       6    0.3714286    0.1714286
101          136    26        7       4    0.2692308    0.1538462
102          871    17        2       2    0.1176471    0.1176471
103           46  1234      462     139    0.3743922    0.1126418

For most officers, the stop-to-search ratio closely tracks the stop-to-arrest ratio. This is expected, as ideally, officers only search when they believe that they have a reason to search. The highest stop-to-arrest ratio belongs to the officer with code 13, who made arrests in 67.59% of stops!
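
How closely the two ratios move together can be quantified with a correlation coefficient. The sketch below uses only the ten rows printed in the top-10 table above, so its result is not representative of all 113 codes; on the full table one would pass `count.officer$search.ratio` and `count.officer$arrest.ratio` instead:

```r
# Values copied from the top-10 table above (rounded as printed)
search.ratio <- c(0.713, 0.698, 0.640, 0.788, 0.671,
                  0.647, 0.571, 0.716, 0.565, 0.750)
arrest.ratio <- c(0.676, 0.656, 0.640, 0.615, 0.600,
                  0.588, 0.571, 0.568, 0.565, 0.562)

# Pearson correlation between the two ratios (illustrative subset only)
r <- cor(search.ratio, arrest.ratio)
r
```

Since these ten rows were selected for having the highest arrest ratios, the coefficient from this subset should be read as a demonstration of the call, not as a finding.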

The average search-to-arrest ratio was 85.64%.