Exploring NYPD Stop-and-Frisk Data

1. Abstract

This project explores and analyzes data from the NYPD on stop-and-frisk encounters from 2022 to recognize relationships between variables and answer questions about demographics. Exploring these relationships and answering these research questions ultimately provides insights into the effect of police bias on stop-and-frisk encounters. We use R programming to visualize these relationships and explore trends.

2. Introduction

Psychologists have established that racial and other types of bias affect the way we treat one another, and police officers are not exempt from this. In this project, we will complete a demographic, location, and comparative analysis and inspect temporal trends, reasons for stops, stop outcomes, weapon and contraband discovery, repeat encounters, and officer identification. We complete our analyses and create graphs using the R programming language.

3. Methodology

Data Source: NYC Government Website NYPD Stop, Question, and Frisk Datasets.
Tools: R programming language, readxl, dplr, ggplot2.
Data Pre-processing: Cleaning datasets, handling missing values.
Data Visualization: Creating bar plots, line plots, and scatter plots using ggplot2.

4. Results

4.1 Demographic Analysis

4.1.1 Bar plot of Racial Distribution of Stop-and-Frisk Encounters

4.1.2 Bar plot of Gender Distribution of Stop-and-Frisk Encounters

The two bar plots above show the racial and gender distribution of stop-and-frisk encounters. We can see that the majority of individuals stopped and frisked by the police are Black, making up nearly 60% of all encounters. The bar plot of the gender distribution shows us that the majority of the individuals stopped by police are male. We would expect most individuals in stop-and-frisk encounters to be Black males. These findings are disparities because it is unlikely that Black people make up 60% of the entire New York City population, nor is it likely that males make up 92% of the New York City population either.

4.2 Temporal Trends

4.2.1 Line plot of Stop-and-Frisk Encounters Over the Months of 2022

In this line plot of stop-and-frisk encounters throughout 2022 by month, we can see there are the most stops between March, April, and May. The number of encounters decreases until July. They then increase as October approaches. There is then a dramatic decrease near the end of the year leading into December.

4.2.2 Bar plot of Most Common Days of Week for Stop-and-Frisk Encounters

This bar plot shows that stop-and-frisk encounters are most common on Wednesdays, although Thursdays, Fridays, and Saturdays follow closely behind in the number of stops. Stops are least common on Mondays, and Sundays and Tuesdays are also less common than the days with the most stops but are significantly higher than Mondays.

4.2.3 Bar plot of Most Common Times of Day for Stop-and-Frisk Encounters

In this bar plot, we observe that it is most common for stop-and-frisk encounters to occur in the evenings. The afternoons and late nights have similar occurrences of stops, while stops in the morning are the least common.

4.3 Reasons for Stops

4.3.1 Bar plot for Most Common Reasons for Stop-and-Frisk Encounters

The bar plot above shows the top 5 most common reasons for police to initiate a stop-and-frisk encounter. The suspect may or may not be guilty. However, the police must report a suspected crime as their reason for initiating a stop, so this graph reflects that. We decided on the top five reasons because the number of stops per reason decreases by at least 50 percent from the 5th to the 6th most common reason. We can see that the most common reason for the police to stop an individual is CPW, which stands for Criminal Possession of a Weapon.

4.3.2.a Bar plot of Reason for Stop vs. Outcome of Encounter

In this bar plot, we showcase the relationship between reasons for stopping an individual and the encounter’s outcome. Each of the suspected crimes belongs to a category, which is shown in the table below. We can see a large disparity between the people suspected of and stopped for Weapon Possession Crimes and the number of people arrested. While with all the other crime categories, we can see pretty consistently across the board that only slightly more people are not arrested than those who are.

4.3.2.b Table of Suspected Crimes in Crime Category

## # A tibble: 26 × 2
##    SUSPECTED_CRIME_DESCRIPTION                 crime_category          
##    <chr>                                       <chr>                   
##  1 CRIMINAL POSSESSION OF CONTROLLED SUBSTANCE Drug-Related Crimes     
##  2 CRIMINAL POSSESSION OF MARIHUANA            Drug-Related Crimes     
##  3 CRIMINAL SALE OF CONTROLLED SUBSTANCE       Drug-Related Crimes     
##  4 CRIMINAL SALE OF MARIHUANA                  Drug-Related Crimes     
##  5 CPSP                                        Other Crimes            
##  6 MAKING GRAFFITI                             Other Crimes            
##  7 OTHER                                       Other Crimes            
##  8 UNAUTHORIZED USE OF A VEHICLE               Other Crimes            
##  9 CRIMINAL MISCHIEF                           Property-Related Crimes 
## 10 CRIMINAL POSSESSION OF FORGED INSTRUMENT    Property-Related Crimes 
## 11 CRIMINAL TRESPASS                           Property-Related Crimes 
## 12 AUTO STRIPPIG                               Theft-Related Crimes    
## 13 BURGLARY                                    Theft-Related Crimes    
## 14 GRAND LARCENY                               Theft-Related Crimes    
## 15 GRAND LARCENY AUTO                          Theft-Related Crimes    
## 16 PETIT LARCENY                               Theft-Related Crimes    
## 17 THEFT OF SERVICES                           Theft-Related Crimes    
## 18 ASSAULT                                     Violent Crimes          
## 19 FORCIBLE TOUCHING                           Violent Crimes          
## 20 MENACING                                    Violent Crimes          
## 21 MURDER                                      Violent Crimes          
## 22 RAPE                                        Violent Crimes          
## 23 RECKLESS ENDANGERMENT                       Violent Crimes          
## 24 ROBBERY                                     Violent Crimes          
## 25 TERRORISM                                   Violent Crimes          
## 26 CPW                                         Weapon Possession Crimes

4.4 Outcomes of Stops

4.4.1 Bar plot of Percentage of Stops that Result in Arrests or Summonses

This bar plot shows the percentage of stop-and-frisk encounters that result in arrests, summonses, or both. We can see that summonses are relatively rare for arrests and no arrests. Most encounters do not result in an arrest or summons, while with most encounters that result in arrests, there are no summons.

4.4.2 Bar plot of Percentage of Stops for Frisk, Search, and Arrest Outcomes

The bar plot shows the percentage of all combinations of Frisk/No Frisk, Search/No Search, and Arrest/No Arrest outcomes. Most outcomes where there is a frisk do not result in an arrest. There are more outcomes where a search leads to an arrest than encounters with searches that do not result in an arrest.

4.4.3 Bar plots of Outcomes Based on Racial Demographics of Individuals

The following seven bar plots show the differences in outcomes based on the racial demographics of the individuals stopped. The most significant observation about each race is that White individuals are more likely to be let go after being stopped without a frisk, search, or arrest and are less likely to be arrested overall. Black and Black Hispanic individuals are more likely to be frisked and searched regardless of arrest outcome and more likely to be frisked, searched, and arrested. White Hispanic individuals are more likely to be frisked and searched regardless of arrest outcome but are also more likely to be let go without being frisked, searched, or arrested. Asian/Pacific Islander individuals are less likely to be frisked and searched than other non-White races and are more likely to be let go without incident. Middle Eastern/Southwest Asian individuals are more likely to be frisked without a search or arrest and are also more likely to be let go without incident. American Indian/Alaskan Native individuals are more likely to be searched regardless of arrest outcome but are also more likely to be let go without incident.

4.5 Location Analysis

4.5.1 Bar plot of Stop Frequency by Neighborhood

This bar plot shows which neighborhoods in New York City have the highest incidents of stop-and-frisk encounters. The top three are the Bronx, Brooklyn, and Manhattan.

4.5.2 Scatter plot of Percentage of Stop-and-Frisk Encounters Across Neighborhoods by Race

This scatter plot shows the percentage of stop-and-frisk incidents by race across neighborhoods. It would be incorrect to draw any conclusions about the actual population of these neighborhoods based on our dataset, which only gives us information about the people stopped by police, not the population of those neighborhoods as a whole. We are unable to determine if there is a correlation between the location of stops and the racial demographics of the population based on the available data from the NYPD.

4.6 Weapon and Contraband Discovery

4.6.1 Bar plot for Weapon and Contraband Discovery Frequency

The bar plot above shows the frequency in which each type of weapon or contraband is found. Weapons are not found in the majority of cases, as we can see by the red bar that any weapon is found in approximately 15% of cases. Specific types of weapons or other contraband are found even less.

4.6.2 Bar plot of Top 5 Reasons for Stops Where Weapons are Found

In this bar plot, we can see the top 5 reasons for stops where police find weapons. Weapons are overwhelmingly found when a person is suspected crime of criminal possession of a weapon, which makes logical sense. The other reasons for stops where weapons are found focus on violent crimes, with the exception of burglary. The distinction between robbery and burglary is that robbery focuses on an act of violence in the eyes of the law. Burglary focuses on entering a location illegally to commit a crime.

4.7 Repeat Encounters

Based on our dataset, we could not explore repeat stop-and-frisk encounters with their demographics and outcomes. There is not enough information to help us identify unique individuals. Undoubtedly, there probably are some repeat encounters in the dataset, but we cannot prove that based on the information given.

4.8 Age and Stop Frequency

4.8.1 Bar plot of Frequency of Stops by Age Group

The bar plot above showcases the relationship between the age of individuals and their likelihood of being stopped and frisked. We can see that the age group most likely to be stopped is between 18 and 34. Before the age of 18 and after the age of 34 are significantly less likely.

4.8.2 Stacked Bar plot of Top 3 Reasons for Stops by Age Group

In this bar plot, we can see the top three reasons for stops across age groups. Overwhelming, the top reason for stopping in nearly all age groups is CPW. With individuals 55 and older, petit larceny (or minor theft) as a reason for stopping is given by police slightly more than CPW. For individuals up to the age of 24, the other two reasons are robbery and assault. For people in the 25-34 age group, the other top two reasons are assault and petit larceny. Finally, for people ages 35 and over, the top three reasons for a stop are CPW, burglary, and petit larceny.

4.9 Officer Identification

4.9.1 Bar plot of Top 10 Officers with the Highest Number of Stops

The bar plot above shows the top ten officers with the most stops throughout 2022. There is one officer who has way more stops than any of the other officers. For the rest of the plot, the number of stops reduces to about 300 and slowly reduces onward. This slow reduction of the number of stops from 300 is true for the rest of the dataset as well.

4.9.2 Bar plot of Outcomes for Randomly Sampled Officers with at least 75 Stops

This bar plot shows the variations of stop-and-frisk encounters based on a random sample of individual officers. There were 1,594 individual officers in the dataset, so it became necessary to get a random sample of officers with similar stop frequencies to show the variations properly. As we can see, some officers are more likely to let the people they stop go without incident than others. Some officers are more likely to perform an arrest after frisking someone without performing a search than others.

4.10 Comparative Analysis

Based on this data alone, we could not complete a comparative analysis of the overall numbers and demographic distributions to previous years because this data set only contains data from 2022. Even though the data sets are available for earlier years, like from 2022-2018, attempting to import the datasets to R repeatedly caused the program to crash due to the size of the datasets and lack of memory in RStudio. The issue persisted even when we excluded some of these datasets to free up memory. Working with datasets of this size may need to be done in a different programming language, such as SQL, with the necessary RAM.

5. Conclusion

This project showcases the effects of bias on police stop-and-frisk encounters. It provides helpful insights to convince people in the law enforcement field to implement or prioritize anti-bias training for their officers and DEI departments that can specifically help tackle these issues..

6. References

NYC Government Website NYPD Stop, Question, and Frisk 2022 Dataset [https://www.nyc.gov/assets/nypd/downloads/excel/analysis_and_planning/stop-question-frisk/sqf-2022.xlsx]

7. Appendices

7.1 Setup Code, Loading Libraries, Reading in Data

knitr::opts_chunk$set(echo = TRUE)
library(readxl)
library(ggplot2)
library(dplyr)

stop_frisk_data <- read_excel("sqf-2022.xlsx")

7.2 Results 4.1 Demographic Analysis

7.2.1 Cleaning and Grouping Data, Bar plot 4.1.1

stop_frisk_data_clean_DA <- stop_frisk_data[!stop_frisk_data$SUSPECT_RACE_DESCRIPTION %in% c("(null)") &
                                           !stop_frisk_data$SUSPECT_SEX %in% c("(null)"), ]

race_distribution <- stop_frisk_data_clean_DA %>%
  count(SUSPECT_RACE_DESCRIPTION) %>%
  mutate(proportion = n / sum(n))

ggplot(race_distribution, aes(x = SUSPECT_RACE_DESCRIPTION, y = n, fill = SUSPECT_RACE_DESCRIPTION)) +
  geom_bar(stat = "identity", color = "black") + 
  geom_text(aes(label = scales::percent(proportion), y = n + 0.15),  
            hjust = 0.5, vjust = -0.3, size = 3.75) +  
  labs(title = "Racial Distribution of Individuals Stopped and Frisked",
       x = "Race",
       y = "Count with Proportion") +
  scale_fill_brewer(palette = "Set3") +  
  theme_minimal() +
  theme(
    axis.text.x = element_blank(),  
    axis.text.y = element_text(size = 10),  
    plot.margin = margin(10, 10, 20, 10),  
    plot.title = element_text(hjust = 0.5), 
    legend.title = element_blank(),  
    legend.text = element_text(size = 6.5), 
    legend.key.size = unit(0.5, "cm") 
  ) +
  theme(legend.position = "right") +  
  coord_cartesian(ylim = c(0, max(race_distribution$n) + 100))

7.2.2 Cleaning, Filtering Data, Bar plot 4.1.2

gender_distribution <- stop_frisk_data %>%
  filter(!is.na(SUSPECT_SEX) & SUSPECT_SEX != "(null)") %>%  
  count(SUSPECT_SEX) %>%  
  mutate(proportion = n / sum(n))

ggplot(gender_distribution, aes(x = SUSPECT_SEX, y = n, fill = SUSPECT_SEX)) +
  geom_bar(stat = "identity", color = "black") +  
  geom_text(aes(label = scales::percent(proportion), y = n + 0.15),  
            hjust = 0.5, vjust = -0.5, size = 3.75) +  
  labs(title = "Gender Distribution of Individuals Stopped and Frisked",
       x = "Gender",
       y = "Count with Proportion") +
  scale_fill_brewer(palette = "Set2") +
  theme_minimal() +
  theme(
    axis.text.x = element_text(size = 10),  
    axis.text.y = element_text(size = 10),  
    plot.margin = margin(10, 10, 20, 10),  
    plot.title = element_text(hjust = 0.5),  
    legend.position = "none", 
    legend.title = element_blank(),  
    legend.text = element_text(size = 6.5), 
    legend.key.size = unit(0.5, "cm") 
  ) +
  coord_cartesian(ylim = c(0, max(gender_distribution$n) + 1000))

7.3 Results 4.2 Temporal Trends

7.3.1 Cleaning, Grouping Data, Line plot 4.2.1

stop_frisk_data$STOP_FRISK_DATE <- as.Date(stop_frisk_data$STOP_FRISK_DATE)

monthly_counts <- stop_frisk_data %>%
  group_by(Month = format(STOP_FRISK_DATE, "%m-%Y")) %>%  
  summarise(count = n())

ggplot(monthly_counts, aes(x = Month, y = count)) +
  geom_line(group = 1, color = "skyblue", linewidth = 1) +  
  geom_point(color = "blue", size = 2) +  
  labs(title = "Stop-and-Frisk Encounters in 2022 by Month",
       x = "Month",
       y = "Number of Encounters") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

7.3.2 Cleaning, Grouping Data, Bar plot 4.2.2

stop_frisk_data <- stop_frisk_data %>%
  mutate(
    day_of_week = weekdays(STOP_FRISK_DATE),  
    hour_of_day = as.integer(format(STOP_FRISK_TIME, "%H"))  
  )


stop_frisk_data <- stop_frisk_data %>%
  mutate(
    day_of_week = factor(weekdays(STOP_FRISK_DATE), 
                         levels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
  )

day_of_week_counts <- stop_frisk_data %>%
  group_by(day_of_week) %>%
  summarise(count = n()) %>%
  arrange(desc(count))  

day_of_week_counts$day_of_week <- factor(day_of_week_counts$day_of_week, 
                                         levels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))

stop_frisk_data$day_of_week <- factor(stop_frisk_data$day_of_week, 
                                      levels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))

ggplot(day_of_week_counts, aes(x = day_of_week, y = count, fill = day_of_week)) +
  geom_bar(stat = "identity", color = "black") +
  labs(title = "Stop-and-Frisk Encounters by Day of the Week",
       x = "Day of the Week", y = "Number of Encounters") +
  scale_fill_brewer(palette = "Set3", name = "Day of the Week") +  
  theme_minimal() +
  theme(
    axis.text.y = element_text(angle = 0, hjust = 1),  
    axis.text.x = element_text(size = 10),  
    legend.title = element_text(size = 10),  
    legend.text = element_text(size = 8),  
    legend.position = "right",  
    plot.margin = margin(10, 10, 20, 10)  
  ) +
  coord_flip() +  
  scale_x_discrete(limits = rev(levels(stop_frisk_data$day_of_week)))

7.3.3 Data Transformation, Bar plot 4.2.3

stop_frisk_data <- stop_frisk_data %>%
  mutate(
    time_of_day = case_when(
      hour_of_day >= 0 & hour_of_day < 6 ~ "Late Night",
      hour_of_day >= 6 & hour_of_day < 12 ~ "Morning",
      hour_of_day >= 12 & hour_of_day < 18 ~ "Afternoon",
      hour_of_day >= 18 & hour_of_day < 24 ~ "Evening",
      TRUE ~ "Unknown"  
    ),
    time_of_day = factor(time_of_day, levels = c("Morning", "Afternoon", "Evening", "Late Night"))
  )

time_of_day_counts <- stop_frisk_data %>%
  count(time_of_day)

ggplot(stop_frisk_data, aes(x = time_of_day, fill = time_of_day)) +
  geom_bar(stat = "count", color = "black") +
  labs(title = "Stop-and-Frisk Encounters by Time of Day",
       x = "Time of Day", y = "Number of Encounters", fill = "Time of Day") +
  scale_fill_brewer(palette = "Set3") +  
  theme_minimal() +
  theme(
    axis.text.x = element_text(size = 10),  
    axis.text.y = element_text(size = 10),  
    plot.margin = margin(10, 10, 20, 10),  
    plot.title = element_text(hjust = 0.5),  
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 8),  
    legend.position = "right"  
  ) +
  coord_flip() +  
  scale_x_discrete(limits = rev(levels(stop_frisk_data$time_of_day)))

7.4 Results 4.3 Reasons for Stops

7.4.1 Data Grouping, Bar plot 4.3.1

reason_counts <- stop_frisk_data %>%
  group_by(SUSPECTED_CRIME_DESCRIPTION) %>%
  tally() %>%
  arrange(desc(n))

top_5_reasons <- stop_frisk_data %>%
  group_by(SUSPECTED_CRIME_DESCRIPTION) %>%
  tally() %>%
  arrange(desc(n)) %>%
  top_n(5) 

top_5_reasons <- top_5_reasons %>%
  mutate(SUSPECTED_CRIME_DESCRIPTION = factor(SUSPECTED_CRIME_DESCRIPTION, 
                                              levels = top_5_reasons$SUSPECTED_CRIME_DESCRIPTION[order(-top_5_reasons$n)]))

ggplot(top_5_reasons, aes(x = reorder(SUSPECTED_CRIME_DESCRIPTION, -n), y = n, fill = SUSPECTED_CRIME_DESCRIPTION)) +
  geom_bar(stat = "identity", color = "black") +
  labs(title = "Top 5 Most Common Reasons for Stop and Frisk Encounters",
       x = "Reason for Stop", 
       y = "Frequency",
       fill = "Reason for Stop") +  
  theme_minimal() +
  theme(
    axis.text.x = element_blank(),  
    axis.text.y = element_text(size = 10),  
    plot.margin = margin(10, 10, 20, 10)  
  )

7.4.2 Transforming, Grouping Data, Bar plot 4.3.2.a

stop_frisk_data <- stop_frisk_data %>%
  mutate(
    crime_category = case_when(
      SUSPECTED_CRIME_DESCRIPTION %in% c("ASSAULT", "MENACING", "RECKLESS ENDANGERMENT", "ROBBERY", 
                                         "RAPE", "MURDER", "FORCIBLE TOUCHING", "TERRORISM") ~ "Violent Crimes",
      SUSPECTED_CRIME_DESCRIPTION %in% c("GRAND LARCENY AUTO", "PETIT LARCENY", "GRAND LARCENY", "BURGLARY", 
                                         "THEFT OF SERVICES", "AUTO STRIPPING", "AUTO STRIPPIG") ~ "Theft-Related Crimes",
      SUSPECTED_CRIME_DESCRIPTION %in% c("CRIMINAL POSSESSION OF CONTROLLED SUBSTANCE", "CRIMINAL SALE OF CONTROLLED SUBSTANCE", 
                                         "CRIMINAL POSSESSION OF MARIHUANA", "CRIMINAL SALE OF MARIHUANA") ~ "Drug-Related Crimes",
      SUSPECTED_CRIME_DESCRIPTION %in% c("CRIMINAL TRESPASS", "CRIMINAL MISCHIEF", "CRIMINAL POSSESSION OF FORGED INSTRUMENT") ~ "Property-Related Crimes",
      SUSPECTED_CRIME_DESCRIPTION %in% c("CPSP", "UNAUTHORIZED USE OF A VEHICLE", "MAKING GRAFFITI", "OTHER") ~ "Other Crimes",
      SUSPECTED_CRIME_DESCRIPTION == "CPW" ~ "Weapon Possession Crimes",  
      TRUE ~ "Unknown"
    )
  )

outcome_by_category <- stop_frisk_data %>%
  group_by(crime_category, SUSPECT_ARRESTED_FLAG) %>%
  summarise(count = n(), .groups = "drop")

ggplot(outcome_by_category, aes(x = crime_category, y = count, fill = SUSPECT_ARRESTED_FLAG)) +
  geom_bar(stat = "identity", position = "dodge") +  
  labs(title = "Outcome of Stop-and-Frisk Encounters by Crime Category",
       x = "Crime Category", y = "Number of Encounters",
       fill = "Arrested") +  
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 10), 
        axis.text.y = element_text(size = 10),
        legend.title = element_text(size = 12),  
        legend.text = element_text(size = 10),   
        plot.margin = margin(10, 10, 20, 10))

7.4.3 Table 4.3.2.b

crime_category_table <- stop_frisk_data %>%
  select(SUSPECTED_CRIME_DESCRIPTION, crime_category) %>%
  distinct() %>%  
  arrange(crime_category, SUSPECTED_CRIME_DESCRIPTION) 

print(crime_category_table, n = 26)

7.5 Results 4.4 Outcomes of Stops

7.5.1 Transforming, Grouping Data, Bar plot 4.4.1

outcome_category_data <- stop_frisk_data %>%
  mutate(
    outcome_category = case_when(
      SUSPECT_ARRESTED_FLAG == "Y" & SUMMONS_ISSUED_FLAG == "Y" ~ "Arrest, Summons", 
      SUSPECT_ARRESTED_FLAG == "Y" & SUMMONS_ISSUED_FLAG == "N" ~ "Arrest, No Summons",     
      SUSPECT_ARRESTED_FLAG == "N" & SUMMONS_ISSUED_FLAG == "Y" ~ "No Arrest, Summons",   
      SUSPECT_ARRESTED_FLAG == "N" & SUMMONS_ISSUED_FLAG == "N" ~ "No Arrest, No Summons",    
      TRUE ~ "Unknown"  
    )
  ) %>%
  count(outcome_category)
total_count_outcome_category <- sum(outcome_category_data$n)

outcome_category_data <- outcome_category_data %>%
  mutate(percentage = (n / total_count_outcome_category) * 100)

ggplot(outcome_category_data, aes(x = outcome_category, y = percentage, fill = outcome_category)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(title = "Outcome Category: Arrests and Summonses",
       x = "Outcome", y = "Percentage of Encounters", fill = "Outcome") +
  theme_minimal() +
  theme(axis.text.x = element_blank(), 
        axis.text.y = element_text(size = 10), 
        legend.title = element_text(size = 12), 
        legend.text = element_text(size = 10))

7.5.2 Transforming, Grouping Data, Bar plot 4.4.2

encounter_outcome_data <- stop_frisk_data %>%
  mutate(
    encounter_outcome = case_when(
      FRISKED_FLAG == "Y" & SEARCHED_FLAG == "Y" & SUSPECT_ARRESTED_FLAG == "Y" ~ "Frisk, Search, Arrest",
      FRISKED_FLAG == "Y" & SEARCHED_FLAG == "Y" & SUSPECT_ARRESTED_FLAG == "N" ~ "Frisk, Search, No Arrest",
      FRISKED_FLAG == "Y" & SEARCHED_FLAG == "N" & SUSPECT_ARRESTED_FLAG == "Y" ~ "Frisk, No Search, Arrest",
      FRISKED_FLAG == "Y" & SEARCHED_FLAG == "N" & SUSPECT_ARRESTED_FLAG == "N" ~ "Frisk, No Search, No Arrest",
      FRISKED_FLAG == "N" & SEARCHED_FLAG == "Y" & SUSPECT_ARRESTED_FLAG == "Y" ~ "No Frisk, Search, Arrest",
      FRISKED_FLAG == "N" & SEARCHED_FLAG == "Y" & SUSPECT_ARRESTED_FLAG == "N" ~ "No Frisk, Search, No Arrest",
      FRISKED_FLAG == "N" & SEARCHED_FLAG == "N" & SUSPECT_ARRESTED_FLAG == "Y" ~ "No Frisk, No Search, Arrest",
      FRISKED_FLAG == "N" & SEARCHED_FLAG == "N" & SUSPECT_ARRESTED_FLAG == "N" ~ "No Frisk, No Search, No Arrest",
      TRUE ~ "Other"
    )
  )

encounter_outcome_counts <- encounter_outcome_data %>%
  group_by(encounter_outcome) %>%
  summarise(count = n(), .groups = "drop")

total_count_encounter_outcome <- sum(encounter_outcome_counts$count)

encounter_outcome_counts <- encounter_outcome_counts %>%
  mutate(percentage = (count / total_count_encounter_outcome) * 100)

ggplot(encounter_outcome_counts, aes(x = encounter_outcome, y = percentage, fill = encounter_outcome)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(title = "Encounter Outcome: Frisks, Searches, and Arrests",
       x = "Encounter Outcome", y = "Percentage of Encounters", fill = "Encounter Outcome") +
  theme_minimal() +
  theme(axis.text.x = element_blank(),  # Remove x-axis labels
        axis.text.y = element_text(size = 10), 
        legend.title = element_text(size = 12), 
        legend.text = element_text(size = 10),
        plot.title = element_text(size = 14))

7.5.3 Transforming, Grouping, Merging Data, Bar plots 4.4.3

encounter_outcome_data_clean <- stop_frisk_data %>%
  filter(SUSPECT_RACE_DESCRIPTION != "(null)", 
         !is.na(SUSPECT_RACE_DESCRIPTION)) %>%
  mutate(
    encounter_outcome = case_when(
      FRISKED_FLAG == "Y" & SEARCHED_FLAG == "Y" & SUSPECT_ARRESTED_FLAG == "Y" ~ "Frisk, Search, Arrest",
      FRISKED_FLAG == "Y" & SEARCHED_FLAG == "Y" & SUSPECT_ARRESTED_FLAG == "N" ~ "Frisk, Search, No Arrest",
      FRISKED_FLAG == "Y" & SEARCHED_FLAG == "N" & SUSPECT_ARRESTED_FLAG == "Y" ~ "Frisk, No Search, Arrest",
      FRISKED_FLAG == "Y" & SEARCHED_FLAG == "N" & SUSPECT_ARRESTED_FLAG == "N" ~ "Frisk, No Search, No Arrest",
      FRISKED_FLAG == "N" & SEARCHED_FLAG == "Y" & SUSPECT_ARRESTED_FLAG == "Y" ~ "No Frisk, Search, Arrest",
      FRISKED_FLAG == "N" & SEARCHED_FLAG == "Y" & SUSPECT_ARRESTED_FLAG == "N" ~ "No Frisk, Search, No Arrest",
      FRISKED_FLAG == "N" & SEARCHED_FLAG == "N" & SUSPECT_ARRESTED_FLAG == "Y" ~ "No Frisk, No Search, Arrest",
      FRISKED_FLAG == "N" & SEARCHED_FLAG == "N" & SUSPECT_ARRESTED_FLAG == "N" ~ "No Frisk, No Search, No Arrest",
      TRUE ~ "Other"
    )
  )

encounter_outcome_by_race <- encounter_outcome_data_clean %>%
  group_by(SUSPECT_RACE_DESCRIPTION, encounter_outcome) %>%
  summarise(count = n(), .groups = "drop")

total_count_race <- encounter_outcome_by_race %>%
  group_by(SUSPECT_RACE_DESCRIPTION) %>%
  summarise(total = sum(count), .groups = "drop")

encounter_outcome_by_race <- encounter_outcome_by_race %>%
  left_join(total_count_race, by = "SUSPECT_RACE_DESCRIPTION") %>%
  mutate(percentage = (count / total) * 100)  # Calculate percentage for each outcome by race

race_of_interest <- c("WHITE", "BLACK", "BLACK HISPANIC", "WHITE HISPANIC", 
                      "ASIAN / PACIFIC ISLANDER", "MIDDLE EASTERN/SOUTHWEST ASIAN", 
                      "AMERICAN INDIAN/ALASKAN NATIVE")

for (race in race_of_interest) {
  race_data <- encounter_outcome_by_race %>%
    filter(SUSPECT_RACE_DESCRIPTION == race)
  
  p <- ggplot(race_data, aes(x = encounter_outcome, y = percentage, fill = encounter_outcome)) +
    geom_bar(stat = "identity", position = "stack") +
    labs(title = paste("Encounter Outcomes for", race),
         x = "Encounter Outcome", y = "Percentage of Encounters by Racial Category", fill = "Encounter Outcome") +
    theme_minimal() +
    theme(axis.text.x = element_blank(),  
          axis.text.y = element_text(size = 10),
          plot.title = element_text(size = 12))  
  
  print(p)
}

7.6 Results 4.5 Location Analysis

7.6.1 Grouping Data, Bar plot 4.5.1

stop_by_borough <- stop_frisk_data %>%
  group_by(STOP_LOCATION_BORO_NAME) %>%
  summarise(count = n()) %>%
  arrange(desc(count))

ggplot(stop_by_borough, aes(x = reorder(STOP_LOCATION_BORO_NAME, -count), y = count)) +
  geom_bar(stat = "identity", fill = "lightcoral") +
  labs(title = "Number of Stop-and-Frisk Incidents by Neighborhood",
       x = "Neighborhood", y = "Number of Incidents") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

7.6.2 Data Grouping, Scatter plot 4.5.2

stop_frisk_data_clean <- stop_frisk_data %>%
  filter(SUSPECT_RACE_DESCRIPTION != "(null)")

stop_frisk_demo_by_borough <- stop_frisk_data_clean %>%
  group_by(STOP_LOCATION_BORO_NAME, SUSPECT_RACE_DESCRIPTION) %>%
  summarise(count = n()) %>%
  ungroup()

stop_frisk_demo_by_borough_percentage <- stop_frisk_demo_by_borough %>%
  group_by(STOP_LOCATION_BORO_NAME) %>%
  mutate(percentage = (count / sum(count)) * 100) %>%
  ungroup()

ggplot(stop_frisk_demo_by_borough_percentage, aes(x = STOP_LOCATION_BORO_NAME, y = percentage, color = SUSPECT_RACE_DESCRIPTION)) +
  geom_point(size = 3, alpha = 0.7) + 
  labs(title = "Percentage of Stop-and-Frisk Incidents by Race Across Neighborhoods",
       x = "Neighborhood",
       y = "Percentage of Stop-and-Frisk Incidents",
       color = "Race") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

7.7 Result 4.6 Weapon and Contraband Discovery

7.7.1 Transforming Data, Barplot 4.6.1

cleaned_data <- stop_frisk_data %>%
  mutate(
    FIREARM_FLAG = ifelse(FIREARM_FLAG == "Y", TRUE, FALSE),
    KNIFE_CUTTER_FLAG = ifelse(KNIFE_CUTTER_FLAG == "Y", TRUE, FALSE),
    OTHER_WEAPON_FLAG = ifelse(OTHER_WEAPON_FLAG == "Y", TRUE, FALSE),
    WEAPON_FOUND_FLAG = ifelse(WEAPON_FOUND_FLAG == "Y", TRUE, FALSE),
    OTHER_CONTRABAND_FLAG = ifelse(OTHER_CONTRABAND_FLAG == "Y", TRUE, FALSE)
  )

findings_summary <- cleaned_data %>%
  summarise(
    firearm_found = sum(FIREARM_FLAG, na.rm = TRUE),
    knife_found = sum(KNIFE_CUTTER_FLAG, na.rm = TRUE),
    other_weapon_found = sum(OTHER_WEAPON_FLAG, na.rm = TRUE),
    contraband_found = sum(OTHER_CONTRABAND_FLAG, na.rm = TRUE),
    weapon_found = sum(WEAPON_FOUND_FLAG, na.rm = TRUE)
  )

total_encounters <- nrow(cleaned_data)

findings_percentage <- findings_summary %>%
  mutate(
    firearm_found_percentage = (firearm_found / total_encounters) * 100,
    knife_found_percentage = (knife_found / total_encounters) * 100,
    other_weapon_found_percentage = (other_weapon_found / total_encounters) * 100,
    contraband_found_percentage = (contraband_found / total_encounters) * 100,
    weapon_found_percentage = (weapon_found / total_encounters) * 100
  )

findings_percentage_data <- data.frame(
  category = c("Firearm Found", "Knife Found", "Other Weapon Found", "Contraband Found", "Any Weapon Found"),
  percentage = c(findings_percentage$firearm_found_percentage, 
                 findings_percentage$knife_found_percentage, 
                 findings_percentage$other_weapon_found_percentage, 
                 findings_percentage$contraband_found_percentage,
                 findings_percentage$weapon_found_percentage)
)

ggplot(findings_percentage_data, aes(x = category, y = percentage, fill = category)) +
  geom_bar(stat = "identity") +
  labs(title = "Percentage of Weapons or Contraband Found in Stops",
       x = "Type of Finding",
       y = "Percentage of Findings",
       fill = "Category") +  
  theme_minimal() +
  theme(
    axis.text.x = element_blank(),  
    axis.text.y = element_text(size = 12),  
    plot.title = element_text(size = 14)
  )

7.7.2 Data Filtering, Bar plot 4.6.2

stop_frisk_data_modified <- stop_frisk_data_clean %>%
  mutate(
    FIREARM_FLAG = ifelse(FIREARM_FLAG == "(null)", NA, FIREARM_FLAG),
    KNIFE_CUTTER_FLAG = ifelse(KNIFE_CUTTER_FLAG == "(null)", NA, KNIFE_CUTTER_FLAG),
    OTHER_WEAPON_FLAG = ifelse(OTHER_WEAPON_FLAG == "(null)", NA, OTHER_WEAPON_FLAG),
    OTHER_CONTRABAND_FLAG = ifelse(OTHER_CONTRABAND_FLAG == "(null)", NA, OTHER_CONTRABAND_FLAG),
    WEAPON_FOUND_FLAG = ifelse(WEAPON_FOUND_FLAG == "(null)", NA, WEAPON_FOUND_FLAG)
  )

stop_weapon_contraband_data <- stop_frisk_data_modified %>%
  filter(!is.na(SUSPECTED_CRIME_DESCRIPTION)) %>%
  mutate(
    weapon_found = ifelse(FIREARM_FLAG == "Y" | KNIFE_CUTTER_FLAG == "Y" | OTHER_WEAPON_FLAG == "Y" | WEAPON_FOUND_FLAG == "Y", "Weapon Found", "No Weapon Found"),
    contraband_found = ifelse(OTHER_CONTRABAND_FLAG == "Y", "Contraband Found", "No Contraband Found")
  )

weapon_found_data <- stop_weapon_contraband_data %>%
  filter(weapon_found == "Weapon Found")

top_weapon_reasons <- weapon_found_data %>%
  count(SUSPECTED_CRIME_DESCRIPTION) %>%
  top_n(5, n)

top_weapon_reasons <- top_weapon_reasons %>%
  mutate(SUSPECTED_CRIME_DESCRIPTION = factor(SUSPECTED_CRIME_DESCRIPTION, levels = SUSPECTED_CRIME_DESCRIPTION[order(-n)]))

ggplot(top_weapon_reasons, aes(x = SUSPECTED_CRIME_DESCRIPTION, y = n, fill = SUSPECTED_CRIME_DESCRIPTION)) +
  geom_bar(stat = "identity") +
  labs(title = "Top 5 Reasons for Stops Where Weapons Are Found",
       x = "Reason for Stop (Crime Description)",
       y = "Frequency of Stop",
       fill = "Reason for Stop") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),  
        axis.text.y = element_text(size = 12), 
        plot.title = element_text(size = 14))

7.8 Results 4.8 Age and Stop Frequency

7.8.1 Data Grouping, Bar plot 4.8.1

age_group_data <- stop_frisk_data %>%
  filter(SUSPECT_REPORTED_AGE != "(null)") %>%  
  mutate(
    age_group = case_when(
      as.numeric(SUSPECT_REPORTED_AGE) < 18 ~ "Under 18",
      as.numeric(SUSPECT_REPORTED_AGE) >= 18 & as.numeric(SUSPECT_REPORTED_AGE) <= 24 ~ "18-24",
      as.numeric(SUSPECT_REPORTED_AGE) >= 25 & as.numeric(SUSPECT_REPORTED_AGE) <= 34 ~ "25-34",
      as.numeric(SUSPECT_REPORTED_AGE) >= 35 & as.numeric(SUSPECT_REPORTED_AGE) <= 44 ~ "35-44",
      as.numeric(SUSPECT_REPORTED_AGE) >= 45 & as.numeric(SUSPECT_REPORTED_AGE) <= 54 ~ "45-54",
      as.numeric(SUSPECT_REPORTED_AGE) >= 55 ~ "55 and older",
      TRUE ~ "Unknown" 
    )
  ) %>%
  group_by(age_group) %>% 
  summarise(stop_count = n())

age_group_data$age_group <- factor(age_group_data$age_group, 
                                   levels = c("Under 18", "18-24", "25-34", "35-44", "45-54", "55 and older", "Unknown"))

ggplot(age_group_data, aes(x = age_group, y = stop_count, fill = age_group)) +
  geom_bar(stat = "identity") +
  labs(title = "Frequency of Stop-and-Frisk Encounters by Age Group",
       x = "Age Group",
       y = "Number of Stops",
       fill = "Age Group") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

7.8.2 Data Grouping, Bar plot 4.8.2

age_reason_data <- stop_frisk_data %>%
  filter(SUSPECT_REPORTED_AGE != "(null)") %>% 
  mutate(
    age_group = case_when(
      as.numeric(SUSPECT_REPORTED_AGE) < 18 ~ "Under 18",
      as.numeric(SUSPECT_REPORTED_AGE) >= 18 & as.numeric(SUSPECT_REPORTED_AGE) <= 24 ~ "18-24",
      as.numeric(SUSPECT_REPORTED_AGE) >= 25 & as.numeric(SUSPECT_REPORTED_AGE) <= 34 ~ "25-34",
      as.numeric(SUSPECT_REPORTED_AGE) >= 35 & as.numeric(SUSPECT_REPORTED_AGE) <= 44 ~ "35-44",
      as.numeric(SUSPECT_REPORTED_AGE) >= 45 & as.numeric(SUSPECT_REPORTED_AGE) <= 54 ~ "45-54",
      as.numeric(SUSPECT_REPORTED_AGE) >= 55 ~ "55 and older",
      TRUE ~ "Unknown"  
    )
  ) %>%
  group_by(age_group, SUSPECTED_CRIME_DESCRIPTION) %>%
  summarise(stop_count = n(), .groups = "drop") %>%
  arrange(age_group, desc(stop_count))  

age_reason_top3 <- age_reason_data %>%
  group_by(age_group) %>%
  top_n(3, stop_count) %>%
  ungroup()

age_reason_top3$age_group <- factor(age_reason_top3$age_group, 
                                    levels = c("Under 18", "18-24", "25-34", "35-44", "45-54", "55 and older", "Unknown"))

ggplot(age_reason_top3, aes(x = age_group, y = stop_count, fill = SUSPECTED_CRIME_DESCRIPTION)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(title = "Top 3 Reasons for Stops by Age Group",
       x = "Age Group",
       y = "Number of Stops",
       fill = "Reason for Stop") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

7.9 Results 4.9 Officer Identification

7.9.1 Grouping Data, Bar plot 4.9.1

officer_stop_count <- stop_frisk_data %>%
  group_by(ISSUING_OFFICER_RANK, ISSUING_OFFICER_COMMAND_CODE) %>%  
  summarise(stop_count = n(), .groups = "drop") %>%  
  arrange(desc(stop_count))

top_10_officers <- officer_stop_count %>%
  top_n(10, stop_count)

ggplot(top_10_officers, aes(x = reorder(paste(ISSUING_OFFICER_RANK, ISSUING_OFFICER_COMMAND_CODE), stop_count), 
                            y = stop_count, fill = stop_count)) +
  geom_bar(stat = "identity") +
  labs(title = "Top 10 Officers with Highest Number of Stop-and-Frisk Encounters",
       x = "Officer (Rank and Command Code)",
       y = "Number of Stops",
       fill = "Number of Stops") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

7.9.2 Data Sampling, Summarizing, and Grouping, Bar plot 4.9.2

set.seed(123)

officer_stop_counts <- stop_frisk_data %>%
  group_by(ISSUING_OFFICER_RANK, ISSUING_OFFICER_COMMAND_CODE) %>%
  summarise(num_stops = n(), .groups = 'drop') 

officers_with_100_stops <- officer_stop_counts %>%
  filter(num_stops >= 75)

sampled_officers <- officers_with_100_stops %>%
  sample_n(10) 

sampled_data <- stop_frisk_data %>%
  filter(paste(ISSUING_OFFICER_RANK, ISSUING_OFFICER_COMMAND_CODE) %in% 
           paste(sampled_officers$ISSUING_OFFICER_RANK, sampled_officers$ISSUING_OFFICER_COMMAND_CODE))

encounter_outcome_sampled_data <- sampled_data %>%
  mutate(
    encounter_outcome = case_when(
      FRISKED_FLAG == "Y" & SEARCHED_FLAG == "Y" & SUSPECT_ARRESTED_FLAG == "Y" ~ "Frisk, Search, Arrest",
      FRISKED_FLAG == "Y" & SEARCHED_FLAG == "Y" & SUSPECT_ARRESTED_FLAG == "N" ~ "Frisk, Search, No Arrest",
      FRISKED_FLAG == "Y" & SEARCHED_FLAG == "N" & SUSPECT_ARRESTED_FLAG == "Y" ~ "Frisk, No Search, Arrest",
      FRISKED_FLAG == "Y" & SEARCHED_FLAG == "N" & SUSPECT_ARRESTED_FLAG == "N" ~ "Frisk, No Search, No Arrest",
      FRISKED_FLAG == "N" & SEARCHED_FLAG == "Y" & SUSPECT_ARRESTED_FLAG == "Y" ~ "No Frisk, Search, Arrest",
      FRISKED_FLAG == "N" & SEARCHED_FLAG == "Y" & SUSPECT_ARRESTED_FLAG == "N" ~ "No Frisk, Search, No Arrest",
      FRISKED_FLAG == "N" & SEARCHED_FLAG == "N" & SUSPECT_ARRESTED_FLAG == "Y" ~ "No Frisk, No Search, Arrest",
      FRISKED_FLAG == "N" & SEARCHED_FLAG == "N" & SUSPECT_ARRESTED_FLAG == "N" ~ "No Frisk, No Search, No Arrest",
      TRUE ~ "Other"
    )
  )

outcome_by_sampled_officer <- encounter_outcome_sampled_data %>%
  group_by(ISSUING_OFFICER_COMMAND_CODE, ISSUING_OFFICER_RANK, encounter_outcome) %>%
  summarise(count = n(), .groups = 'drop')

outcome_by_sampled_officer <- outcome_by_sampled_officer %>%
  mutate(officer_label = paste(ISSUING_OFFICER_RANK, ISSUING_OFFICER_COMMAND_CODE))

ggplot(outcome_by_sampled_officer, aes(x = officer_label, y = count, fill = encounter_outcome)) +
  geom_bar(stat = "identity") +
  labs(
    title = "Variation of Encounter Outcomes for Officers with At Least 75 Stops",
    x = "Issuing Officer Rank & Command Code",
    y = "Count of Outcomes",
    fill = "Encounter Outcome"  
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))