This project explores and analyzes data from the NYPD on stop-and-frisk encounters from 2022 to recognize relationships between variables and answer questions about demographics. Exploring these relationships and answering these research questions ultimately provides insights into the effect of police bias on stop-and-frisk encounters. We use R programming to visualize these relationships and explore trends.
Psychologists have established that racial and other types of bias affect the way we treat one another, and police officers are not exempt from this. In this project, we will complete a demographic, location, and comparative analysis and inspect temporal trends, reasons for stops, stop outcomes, weapon and contraband discovery, repeat encounters, and officer identification. We complete our analyses and create graphs using the R programming language.
The two bar plots above show the racial and gender distribution of stop-and-frisk encounters. We can see that the majority of individuals stopped and frisked by the police are Black, making up nearly 60% of all encounters. The bar plot of the gender distribution shows us that the majority of the individuals stopped by police are male. We would expect most individuals in stop-and-frisk encounters to be Black males. These findings are disparities because it is unlikely that Black people make up 60% of the entire New York City population, nor is it likely that males make up 92% of the New York City population either.
In this line plot of stop-and-frisk encounters throughout 2022 by month, we can see there are the most stops between March, April, and May. The number of encounters decreases until July. They then increase as October approaches. There is then a dramatic decrease near the end of the year leading into December.
This bar plot shows that stop-and-frisk encounters are most common on Wednesdays, although Thursdays, Fridays, and Saturdays follow closely behind in the number of stops. Stops are least common on Mondays, and Sundays and Tuesdays are also less common than the days with the most stops but are significantly higher than Mondays.
In this bar plot, we observe that it is most common for stop-and-frisk encounters to occur in the evenings. The afternoons and late nights have similar occurrences of stops, while stops in the morning are the least common.
The bar plot above shows the top 5 most common reasons for police to initiate a stop-and-frisk encounter. The suspect may or may not be guilty. However, the police must report a suspected crime as their reason for initiating a stop, so this graph reflects that. We decided on the top five reasons because the number of stops per reason decreases by at least 50 percent from the 5th to the 6th most common reason. We can see that the most common reason for the police to stop an individual is CPW, which stands for Criminal Possession of a Weapon.
In this bar plot, we showcase the relationship between reasons for stopping an individual and the encounter’s outcome. Each of the suspected crimes belongs to a category, which is shown in the table below. We can see a large disparity between the people suspected of and stopped for Weapon Possession Crimes and the number of people arrested. While with all the other crime categories, we can see pretty consistently across the board that only slightly more people are not arrested than those who are.
## # A tibble: 26 × 2
## SUSPECTED_CRIME_DESCRIPTION crime_category
## <chr> <chr>
## 1 CRIMINAL POSSESSION OF CONTROLLED SUBSTANCE Drug-Related Crimes
## 2 CRIMINAL POSSESSION OF MARIHUANA Drug-Related Crimes
## 3 CRIMINAL SALE OF CONTROLLED SUBSTANCE Drug-Related Crimes
## 4 CRIMINAL SALE OF MARIHUANA Drug-Related Crimes
## 5 CPSP Other Crimes
## 6 MAKING GRAFFITI Other Crimes
## 7 OTHER Other Crimes
## 8 UNAUTHORIZED USE OF A VEHICLE Other Crimes
## 9 CRIMINAL MISCHIEF Property-Related Crimes
## 10 CRIMINAL POSSESSION OF FORGED INSTRUMENT Property-Related Crimes
## 11 CRIMINAL TRESPASS Property-Related Crimes
## 12 AUTO STRIPPIG Theft-Related Crimes
## 13 BURGLARY Theft-Related Crimes
## 14 GRAND LARCENY Theft-Related Crimes
## 15 GRAND LARCENY AUTO Theft-Related Crimes
## 16 PETIT LARCENY Theft-Related Crimes
## 17 THEFT OF SERVICES Theft-Related Crimes
## 18 ASSAULT Violent Crimes
## 19 FORCIBLE TOUCHING Violent Crimes
## 20 MENACING Violent Crimes
## 21 MURDER Violent Crimes
## 22 RAPE Violent Crimes
## 23 RECKLESS ENDANGERMENT Violent Crimes
## 24 ROBBERY Violent Crimes
## 25 TERRORISM Violent Crimes
## 26 CPW Weapon Possession Crimes
This bar plot shows the percentage of stop-and-frisk encounters that result in arrests, summonses, or both. We can see that summonses are relatively rare for arrests and no arrests. Most encounters do not result in an arrest or summons, while with most encounters that result in arrests, there are no summons.
The bar plot shows the percentage of all combinations of Frisk/No Frisk, Search/No Search, and Arrest/No Arrest outcomes. Most outcomes where there is a frisk do not result in an arrest. There are more outcomes where a search leads to an arrest than encounters with searches that do not result in an arrest.
The following seven bar plots show the differences in outcomes based on the racial demographics of the individuals stopped. The most significant observation about each race is that White individuals are more likely to be let go after being stopped without a frisk, search, or arrest and are less likely to be arrested overall. Black and Black Hispanic individuals are more likely to be frisked and searched regardless of arrest outcome and more likely to be frisked, searched, and arrested. White Hispanic individuals are more likely to be frisked and searched regardless of arrest outcome but are also more likely to be let go without being frisked, searched, or arrested. Asian/Pacific Islander individuals are less likely to be frisked and searched than other non-White races and are more likely to be let go without incident. Middle Eastern/Southwest Asian individuals are more likely to be frisked without a search or arrest and are also more likely to be let go without incident. American Indian/Alaskan Native individuals are more likely to be searched regardless of arrest outcome but are also more likely to be let go without incident.
This bar plot shows which neighborhoods in New York City have the highest incidents of stop-and-frisk encounters. The top three are the Bronx, Brooklyn, and Manhattan.
This scatter plot shows the percentage of stop-and-frisk incidents by race across neighborhoods. It would be incorrect to draw any conclusions about the actual population of these neighborhoods based on our dataset, which only gives us information about the people stopped by police, not the population of those neighborhoods as a whole. We are unable to determine if there is a correlation between the location of stops and the racial demographics of the population based on the available data from the NYPD.
The bar plot above shows the frequency in which each type of weapon or contraband is found. Weapons are not found in the majority of cases, as we can see by the red bar that any weapon is found in approximately 15% of cases. Specific types of weapons or other contraband are found even less.
In this bar plot, we can see the top 5 reasons for stops where police find weapons. Weapons are overwhelmingly found when a person is suspected crime of criminal possession of a weapon, which makes logical sense. The other reasons for stops where weapons are found focus on violent crimes, with the exception of burglary. The distinction between robbery and burglary is that robbery focuses on an act of violence in the eyes of the law. Burglary focuses on entering a location illegally to commit a crime.
Based on our dataset, we could not explore repeat stop-and-frisk encounters with their demographics and outcomes. There is not enough information to help us identify unique individuals. Undoubtedly, there probably are some repeat encounters in the dataset, but we cannot prove that based on the information given.
The bar plot above showcases the relationship between the age of individuals and their likelihood of being stopped and frisked. We can see that the age group most likely to be stopped is between 18 and 34. Before the age of 18 and after the age of 34 are significantly less likely.
In this bar plot, we can see the top three reasons for stops across age groups. Overwhelming, the top reason for stopping in nearly all age groups is CPW. With individuals 55 and older, petit larceny (or minor theft) as a reason for stopping is given by police slightly more than CPW. For individuals up to the age of 24, the other two reasons are robbery and assault. For people in the 25-34 age group, the other top two reasons are assault and petit larceny. Finally, for people ages 35 and over, the top three reasons for a stop are CPW, burglary, and petit larceny.
The bar plot above shows the top ten officers with the most stops throughout 2022. There is one officer who has way more stops than any of the other officers. For the rest of the plot, the number of stops reduces to about 300 and slowly reduces onward. This slow reduction of the number of stops from 300 is true for the rest of the dataset as well.
This bar plot shows the variations of stop-and-frisk encounters based on a random sample of individual officers. There were 1,594 individual officers in the dataset, so it became necessary to get a random sample of officers with similar stop frequencies to show the variations properly. As we can see, some officers are more likely to let the people they stop go without incident than others. Some officers are more likely to perform an arrest after frisking someone without performing a search than others.
Based on this data alone, we could not complete a comparative analysis of the overall numbers and demographic distributions to previous years because this data set only contains data from 2022. Even though the data sets are available for earlier years, like from 2022-2018, attempting to import the datasets to R repeatedly caused the program to crash due to the size of the datasets and lack of memory in RStudio. The issue persisted even when we excluded some of these datasets to free up memory. Working with datasets of this size may need to be done in a different programming language, such as SQL, with the necessary RAM.
This project showcases the effects of bias on police stop-and-frisk encounters. It provides helpful insights to convince people in the law enforcement field to implement or prioritize anti-bias training for their officers and DEI departments that can specifically help tackle these issues..
knitr::opts_chunk$set(echo = TRUE)
library(readxl)
library(ggplot2)
library(dplyr)
stop_frisk_data <- read_excel("sqf-2022.xlsx")
stop_frisk_data_clean_DA <- stop_frisk_data[!stop_frisk_data$SUSPECT_RACE_DESCRIPTION %in% c("(null)") &
!stop_frisk_data$SUSPECT_SEX %in% c("(null)"), ]
race_distribution <- stop_frisk_data_clean_DA %>%
count(SUSPECT_RACE_DESCRIPTION) %>%
mutate(proportion = n / sum(n))
ggplot(race_distribution, aes(x = SUSPECT_RACE_DESCRIPTION, y = n, fill = SUSPECT_RACE_DESCRIPTION)) +
geom_bar(stat = "identity", color = "black") +
geom_text(aes(label = scales::percent(proportion), y = n + 0.15),
hjust = 0.5, vjust = -0.3, size = 3.75) +
labs(title = "Racial Distribution of Individuals Stopped and Frisked",
x = "Race",
y = "Count with Proportion") +
scale_fill_brewer(palette = "Set3") +
theme_minimal() +
theme(
axis.text.x = element_blank(),
axis.text.y = element_text(size = 10),
plot.margin = margin(10, 10, 20, 10),
plot.title = element_text(hjust = 0.5),
legend.title = element_blank(),
legend.text = element_text(size = 6.5),
legend.key.size = unit(0.5, "cm")
) +
theme(legend.position = "right") +
coord_cartesian(ylim = c(0, max(race_distribution$n) + 100))
gender_distribution <- stop_frisk_data %>%
filter(!is.na(SUSPECT_SEX) & SUSPECT_SEX != "(null)") %>%
count(SUSPECT_SEX) %>%
mutate(proportion = n / sum(n))
ggplot(gender_distribution, aes(x = SUSPECT_SEX, y = n, fill = SUSPECT_SEX)) +
geom_bar(stat = "identity", color = "black") +
geom_text(aes(label = scales::percent(proportion), y = n + 0.15),
hjust = 0.5, vjust = -0.5, size = 3.75) +
labs(title = "Gender Distribution of Individuals Stopped and Frisked",
x = "Gender",
y = "Count with Proportion") +
scale_fill_brewer(palette = "Set2") +
theme_minimal() +
theme(
axis.text.x = element_text(size = 10),
axis.text.y = element_text(size = 10),
plot.margin = margin(10, 10, 20, 10),
plot.title = element_text(hjust = 0.5),
legend.position = "none",
legend.title = element_blank(),
legend.text = element_text(size = 6.5),
legend.key.size = unit(0.5, "cm")
) +
coord_cartesian(ylim = c(0, max(gender_distribution$n) + 1000))
stop_frisk_data$STOP_FRISK_DATE <- as.Date(stop_frisk_data$STOP_FRISK_DATE)
monthly_counts <- stop_frisk_data %>%
group_by(Month = format(STOP_FRISK_DATE, "%m-%Y")) %>%
summarise(count = n())
ggplot(monthly_counts, aes(x = Month, y = count)) +
geom_line(group = 1, color = "skyblue", linewidth = 1) +
geom_point(color = "blue", size = 2) +
labs(title = "Stop-and-Frisk Encounters in 2022 by Month",
x = "Month",
y = "Number of Encounters") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
stop_frisk_data <- stop_frisk_data %>%
mutate(
day_of_week = weekdays(STOP_FRISK_DATE),
hour_of_day = as.integer(format(STOP_FRISK_TIME, "%H"))
)
stop_frisk_data <- stop_frisk_data %>%
mutate(
day_of_week = factor(weekdays(STOP_FRISK_DATE),
levels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
)
day_of_week_counts <- stop_frisk_data %>%
group_by(day_of_week) %>%
summarise(count = n()) %>%
arrange(desc(count))
day_of_week_counts$day_of_week <- factor(day_of_week_counts$day_of_week,
levels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
stop_frisk_data$day_of_week <- factor(stop_frisk_data$day_of_week,
levels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
ggplot(day_of_week_counts, aes(x = day_of_week, y = count, fill = day_of_week)) +
geom_bar(stat = "identity", color = "black") +
labs(title = "Stop-and-Frisk Encounters by Day of the Week",
x = "Day of the Week", y = "Number of Encounters") +
scale_fill_brewer(palette = "Set3", name = "Day of the Week") +
theme_minimal() +
theme(
axis.text.y = element_text(angle = 0, hjust = 1),
axis.text.x = element_text(size = 10),
legend.title = element_text(size = 10),
legend.text = element_text(size = 8),
legend.position = "right",
plot.margin = margin(10, 10, 20, 10)
) +
coord_flip() +
scale_x_discrete(limits = rev(levels(stop_frisk_data$day_of_week)))
stop_frisk_data <- stop_frisk_data %>%
mutate(
time_of_day = case_when(
hour_of_day >= 0 & hour_of_day < 6 ~ "Late Night",
hour_of_day >= 6 & hour_of_day < 12 ~ "Morning",
hour_of_day >= 12 & hour_of_day < 18 ~ "Afternoon",
hour_of_day >= 18 & hour_of_day < 24 ~ "Evening",
TRUE ~ "Unknown"
),
time_of_day = factor(time_of_day, levels = c("Morning", "Afternoon", "Evening", "Late Night"))
)
time_of_day_counts <- stop_frisk_data %>%
count(time_of_day)
ggplot(stop_frisk_data, aes(x = time_of_day, fill = time_of_day)) +
geom_bar(stat = "count", color = "black") +
labs(title = "Stop-and-Frisk Encounters by Time of Day",
x = "Time of Day", y = "Number of Encounters", fill = "Time of Day") +
scale_fill_brewer(palette = "Set3") +
theme_minimal() +
theme(
axis.text.x = element_text(size = 10),
axis.text.y = element_text(size = 10),
plot.margin = margin(10, 10, 20, 10),
plot.title = element_text(hjust = 0.5),
legend.title = element_text(size = 10),
legend.text = element_text(size = 8),
legend.position = "right"
) +
coord_flip() +
scale_x_discrete(limits = rev(levels(stop_frisk_data$time_of_day)))
reason_counts <- stop_frisk_data %>%
group_by(SUSPECTED_CRIME_DESCRIPTION) %>%
tally() %>%
arrange(desc(n))
top_5_reasons <- stop_frisk_data %>%
group_by(SUSPECTED_CRIME_DESCRIPTION) %>%
tally() %>%
arrange(desc(n)) %>%
top_n(5)
top_5_reasons <- top_5_reasons %>%
mutate(SUSPECTED_CRIME_DESCRIPTION = factor(SUSPECTED_CRIME_DESCRIPTION,
levels = top_5_reasons$SUSPECTED_CRIME_DESCRIPTION[order(-top_5_reasons$n)]))
ggplot(top_5_reasons, aes(x = reorder(SUSPECTED_CRIME_DESCRIPTION, -n), y = n, fill = SUSPECTED_CRIME_DESCRIPTION)) +
geom_bar(stat = "identity", color = "black") +
labs(title = "Top 5 Most Common Reasons for Stop and Frisk Encounters",
x = "Reason for Stop",
y = "Frequency",
fill = "Reason for Stop") +
theme_minimal() +
theme(
axis.text.x = element_blank(),
axis.text.y = element_text(size = 10),
plot.margin = margin(10, 10, 20, 10)
)
stop_frisk_data <- stop_frisk_data %>%
mutate(
crime_category = case_when(
SUSPECTED_CRIME_DESCRIPTION %in% c("ASSAULT", "MENACING", "RECKLESS ENDANGERMENT", "ROBBERY",
"RAPE", "MURDER", "FORCIBLE TOUCHING", "TERRORISM") ~ "Violent Crimes",
SUSPECTED_CRIME_DESCRIPTION %in% c("GRAND LARCENY AUTO", "PETIT LARCENY", "GRAND LARCENY", "BURGLARY",
"THEFT OF SERVICES", "AUTO STRIPPING", "AUTO STRIPPIG") ~ "Theft-Related Crimes",
SUSPECTED_CRIME_DESCRIPTION %in% c("CRIMINAL POSSESSION OF CONTROLLED SUBSTANCE", "CRIMINAL SALE OF CONTROLLED SUBSTANCE",
"CRIMINAL POSSESSION OF MARIHUANA", "CRIMINAL SALE OF MARIHUANA") ~ "Drug-Related Crimes",
SUSPECTED_CRIME_DESCRIPTION %in% c("CRIMINAL TRESPASS", "CRIMINAL MISCHIEF", "CRIMINAL POSSESSION OF FORGED INSTRUMENT") ~ "Property-Related Crimes",
SUSPECTED_CRIME_DESCRIPTION %in% c("CPSP", "UNAUTHORIZED USE OF A VEHICLE", "MAKING GRAFFITI", "OTHER") ~ "Other Crimes",
SUSPECTED_CRIME_DESCRIPTION == "CPW" ~ "Weapon Possession Crimes",
TRUE ~ "Unknown"
)
)
outcome_by_category <- stop_frisk_data %>%
group_by(crime_category, SUSPECT_ARRESTED_FLAG) %>%
summarise(count = n(), .groups = "drop")
ggplot(outcome_by_category, aes(x = crime_category, y = count, fill = SUSPECT_ARRESTED_FLAG)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Outcome of Stop-and-Frisk Encounters by Crime Category",
x = "Crime Category", y = "Number of Encounters",
fill = "Arrested") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 10),
axis.text.y = element_text(size = 10),
legend.title = element_text(size = 12),
legend.text = element_text(size = 10),
plot.margin = margin(10, 10, 20, 10))
crime_category_table <- stop_frisk_data %>%
select(SUSPECTED_CRIME_DESCRIPTION, crime_category) %>%
distinct() %>%
arrange(crime_category, SUSPECTED_CRIME_DESCRIPTION)
print(crime_category_table, n = 26)
outcome_category_data <- stop_frisk_data %>%
mutate(
outcome_category = case_when(
SUSPECT_ARRESTED_FLAG == "Y" & SUMMONS_ISSUED_FLAG == "Y" ~ "Arrest, Summons",
SUSPECT_ARRESTED_FLAG == "Y" & SUMMONS_ISSUED_FLAG == "N" ~ "Arrest, No Summons",
SUSPECT_ARRESTED_FLAG == "N" & SUMMONS_ISSUED_FLAG == "Y" ~ "No Arrest, Summons",
SUSPECT_ARRESTED_FLAG == "N" & SUMMONS_ISSUED_FLAG == "N" ~ "No Arrest, No Summons",
TRUE ~ "Unknown"
)
) %>%
count(outcome_category)
total_count_outcome_category <- sum(outcome_category_data$n)
outcome_category_data <- outcome_category_data %>%
mutate(percentage = (n / total_count_outcome_category) * 100)
ggplot(outcome_category_data, aes(x = outcome_category, y = percentage, fill = outcome_category)) +
geom_bar(stat = "identity", position = "stack") +
labs(title = "Outcome Category: Arrests and Summonses",
x = "Outcome", y = "Percentage of Encounters", fill = "Outcome") +
theme_minimal() +
theme(axis.text.x = element_blank(),
axis.text.y = element_text(size = 10),
legend.title = element_text(size = 12),
legend.text = element_text(size = 10))
encounter_outcome_data <- stop_frisk_data %>%
mutate(
encounter_outcome = case_when(
FRISKED_FLAG == "Y" & SEARCHED_FLAG == "Y" & SUSPECT_ARRESTED_FLAG == "Y" ~ "Frisk, Search, Arrest",
FRISKED_FLAG == "Y" & SEARCHED_FLAG == "Y" & SUSPECT_ARRESTED_FLAG == "N" ~ "Frisk, Search, No Arrest",
FRISKED_FLAG == "Y" & SEARCHED_FLAG == "N" & SUSPECT_ARRESTED_FLAG == "Y" ~ "Frisk, No Search, Arrest",
FRISKED_FLAG == "Y" & SEARCHED_FLAG == "N" & SUSPECT_ARRESTED_FLAG == "N" ~ "Frisk, No Search, No Arrest",
FRISKED_FLAG == "N" & SEARCHED_FLAG == "Y" & SUSPECT_ARRESTED_FLAG == "Y" ~ "No Frisk, Search, Arrest",
FRISKED_FLAG == "N" & SEARCHED_FLAG == "Y" & SUSPECT_ARRESTED_FLAG == "N" ~ "No Frisk, Search, No Arrest",
FRISKED_FLAG == "N" & SEARCHED_FLAG == "N" & SUSPECT_ARRESTED_FLAG == "Y" ~ "No Frisk, No Search, Arrest",
FRISKED_FLAG == "N" & SEARCHED_FLAG == "N" & SUSPECT_ARRESTED_FLAG == "N" ~ "No Frisk, No Search, No Arrest",
TRUE ~ "Other"
)
)
encounter_outcome_counts <- encounter_outcome_data %>%
group_by(encounter_outcome) %>%
summarise(count = n(), .groups = "drop")
total_count_encounter_outcome <- sum(encounter_outcome_counts$count)
encounter_outcome_counts <- encounter_outcome_counts %>%
mutate(percentage = (count / total_count_encounter_outcome) * 100)
ggplot(encounter_outcome_counts, aes(x = encounter_outcome, y = percentage, fill = encounter_outcome)) +
geom_bar(stat = "identity", position = "stack") +
labs(title = "Encounter Outcome: Frisks, Searches, and Arrests",
x = "Encounter Outcome", y = "Percentage of Encounters", fill = "Encounter Outcome") +
theme_minimal() +
theme(axis.text.x = element_blank(), # Remove x-axis labels
axis.text.y = element_text(size = 10),
legend.title = element_text(size = 12),
legend.text = element_text(size = 10),
plot.title = element_text(size = 14))
encounter_outcome_data_clean <- stop_frisk_data %>%
filter(SUSPECT_RACE_DESCRIPTION != "(null)",
!is.na(SUSPECT_RACE_DESCRIPTION)) %>%
mutate(
encounter_outcome = case_when(
FRISKED_FLAG == "Y" & SEARCHED_FLAG == "Y" & SUSPECT_ARRESTED_FLAG == "Y" ~ "Frisk, Search, Arrest",
FRISKED_FLAG == "Y" & SEARCHED_FLAG == "Y" & SUSPECT_ARRESTED_FLAG == "N" ~ "Frisk, Search, No Arrest",
FRISKED_FLAG == "Y" & SEARCHED_FLAG == "N" & SUSPECT_ARRESTED_FLAG == "Y" ~ "Frisk, No Search, Arrest",
FRISKED_FLAG == "Y" & SEARCHED_FLAG == "N" & SUSPECT_ARRESTED_FLAG == "N" ~ "Frisk, No Search, No Arrest",
FRISKED_FLAG == "N" & SEARCHED_FLAG == "Y" & SUSPECT_ARRESTED_FLAG == "Y" ~ "No Frisk, Search, Arrest",
FRISKED_FLAG == "N" & SEARCHED_FLAG == "Y" & SUSPECT_ARRESTED_FLAG == "N" ~ "No Frisk, Search, No Arrest",
FRISKED_FLAG == "N" & SEARCHED_FLAG == "N" & SUSPECT_ARRESTED_FLAG == "Y" ~ "No Frisk, No Search, Arrest",
FRISKED_FLAG == "N" & SEARCHED_FLAG == "N" & SUSPECT_ARRESTED_FLAG == "N" ~ "No Frisk, No Search, No Arrest",
TRUE ~ "Other"
)
)
encounter_outcome_by_race <- encounter_outcome_data_clean %>%
group_by(SUSPECT_RACE_DESCRIPTION, encounter_outcome) %>%
summarise(count = n(), .groups = "drop")
total_count_race <- encounter_outcome_by_race %>%
group_by(SUSPECT_RACE_DESCRIPTION) %>%
summarise(total = sum(count), .groups = "drop")
encounter_outcome_by_race <- encounter_outcome_by_race %>%
left_join(total_count_race, by = "SUSPECT_RACE_DESCRIPTION") %>%
mutate(percentage = (count / total) * 100) # Calculate percentage for each outcome by race
race_of_interest <- c("WHITE", "BLACK", "BLACK HISPANIC", "WHITE HISPANIC",
"ASIAN / PACIFIC ISLANDER", "MIDDLE EASTERN/SOUTHWEST ASIAN",
"AMERICAN INDIAN/ALASKAN NATIVE")
for (race in race_of_interest) {
race_data <- encounter_outcome_by_race %>%
filter(SUSPECT_RACE_DESCRIPTION == race)
p <- ggplot(race_data, aes(x = encounter_outcome, y = percentage, fill = encounter_outcome)) +
geom_bar(stat = "identity", position = "stack") +
labs(title = paste("Encounter Outcomes for", race),
x = "Encounter Outcome", y = "Percentage of Encounters by Racial Category", fill = "Encounter Outcome") +
theme_minimal() +
theme(axis.text.x = element_blank(),
axis.text.y = element_text(size = 10),
plot.title = element_text(size = 12))
print(p)
}
stop_by_borough <- stop_frisk_data %>%
group_by(STOP_LOCATION_BORO_NAME) %>%
summarise(count = n()) %>%
arrange(desc(count))
ggplot(stop_by_borough, aes(x = reorder(STOP_LOCATION_BORO_NAME, -count), y = count)) +
geom_bar(stat = "identity", fill = "lightcoral") +
labs(title = "Number of Stop-and-Frisk Incidents by Neighborhood",
x = "Neighborhood", y = "Number of Incidents") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
stop_frisk_data_clean <- stop_frisk_data %>%
filter(SUSPECT_RACE_DESCRIPTION != "(null)")
stop_frisk_demo_by_borough <- stop_frisk_data_clean %>%
group_by(STOP_LOCATION_BORO_NAME, SUSPECT_RACE_DESCRIPTION) %>%
summarise(count = n()) %>%
ungroup()
stop_frisk_demo_by_borough_percentage <- stop_frisk_demo_by_borough %>%
group_by(STOP_LOCATION_BORO_NAME) %>%
mutate(percentage = (count / sum(count)) * 100) %>%
ungroup()
ggplot(stop_frisk_demo_by_borough_percentage, aes(x = STOP_LOCATION_BORO_NAME, y = percentage, color = SUSPECT_RACE_DESCRIPTION)) +
geom_point(size = 3, alpha = 0.7) +
labs(title = "Percentage of Stop-and-Frisk Incidents by Race Across Neighborhoods",
x = "Neighborhood",
y = "Percentage of Stop-and-Frisk Incidents",
color = "Race") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
cleaned_data <- stop_frisk_data %>%
mutate(
FIREARM_FLAG = ifelse(FIREARM_FLAG == "Y", TRUE, FALSE),
KNIFE_CUTTER_FLAG = ifelse(KNIFE_CUTTER_FLAG == "Y", TRUE, FALSE),
OTHER_WEAPON_FLAG = ifelse(OTHER_WEAPON_FLAG == "Y", TRUE, FALSE),
WEAPON_FOUND_FLAG = ifelse(WEAPON_FOUND_FLAG == "Y", TRUE, FALSE),
OTHER_CONTRABAND_FLAG = ifelse(OTHER_CONTRABAND_FLAG == "Y", TRUE, FALSE)
)
findings_summary <- cleaned_data %>%
summarise(
firearm_found = sum(FIREARM_FLAG, na.rm = TRUE),
knife_found = sum(KNIFE_CUTTER_FLAG, na.rm = TRUE),
other_weapon_found = sum(OTHER_WEAPON_FLAG, na.rm = TRUE),
contraband_found = sum(OTHER_CONTRABAND_FLAG, na.rm = TRUE),
weapon_found = sum(WEAPON_FOUND_FLAG, na.rm = TRUE)
)
total_encounters <- nrow(cleaned_data)
findings_percentage <- findings_summary %>%
mutate(
firearm_found_percentage = (firearm_found / total_encounters) * 100,
knife_found_percentage = (knife_found / total_encounters) * 100,
other_weapon_found_percentage = (other_weapon_found / total_encounters) * 100,
contraband_found_percentage = (contraband_found / total_encounters) * 100,
weapon_found_percentage = (weapon_found / total_encounters) * 100
)
findings_percentage_data <- data.frame(
category = c("Firearm Found", "Knife Found", "Other Weapon Found", "Contraband Found", "Any Weapon Found"),
percentage = c(findings_percentage$firearm_found_percentage,
findings_percentage$knife_found_percentage,
findings_percentage$other_weapon_found_percentage,
findings_percentage$contraband_found_percentage,
findings_percentage$weapon_found_percentage)
)
ggplot(findings_percentage_data, aes(x = category, y = percentage, fill = category)) +
geom_bar(stat = "identity") +
labs(title = "Percentage of Weapons or Contraband Found in Stops",
x = "Type of Finding",
y = "Percentage of Findings",
fill = "Category") +
theme_minimal() +
theme(
axis.text.x = element_blank(),
axis.text.y = element_text(size = 12),
plot.title = element_text(size = 14)
)
stop_frisk_data_modified <- stop_frisk_data_clean %>%
mutate(
FIREARM_FLAG = ifelse(FIREARM_FLAG == "(null)", NA, FIREARM_FLAG),
KNIFE_CUTTER_FLAG = ifelse(KNIFE_CUTTER_FLAG == "(null)", NA, KNIFE_CUTTER_FLAG),
OTHER_WEAPON_FLAG = ifelse(OTHER_WEAPON_FLAG == "(null)", NA, OTHER_WEAPON_FLAG),
OTHER_CONTRABAND_FLAG = ifelse(OTHER_CONTRABAND_FLAG == "(null)", NA, OTHER_CONTRABAND_FLAG),
WEAPON_FOUND_FLAG = ifelse(WEAPON_FOUND_FLAG == "(null)", NA, WEAPON_FOUND_FLAG)
)
stop_weapon_contraband_data <- stop_frisk_data_modified %>%
filter(!is.na(SUSPECTED_CRIME_DESCRIPTION)) %>%
mutate(
weapon_found = ifelse(FIREARM_FLAG == "Y" | KNIFE_CUTTER_FLAG == "Y" | OTHER_WEAPON_FLAG == "Y" | WEAPON_FOUND_FLAG == "Y", "Weapon Found", "No Weapon Found"),
contraband_found = ifelse(OTHER_CONTRABAND_FLAG == "Y", "Contraband Found", "No Contraband Found")
)
weapon_found_data <- stop_weapon_contraband_data %>%
filter(weapon_found == "Weapon Found")
top_weapon_reasons <- weapon_found_data %>%
count(SUSPECTED_CRIME_DESCRIPTION) %>%
top_n(5, n)
top_weapon_reasons <- top_weapon_reasons %>%
mutate(SUSPECTED_CRIME_DESCRIPTION = factor(SUSPECTED_CRIME_DESCRIPTION, levels = SUSPECTED_CRIME_DESCRIPTION[order(-n)]))
ggplot(top_weapon_reasons, aes(x = SUSPECTED_CRIME_DESCRIPTION, y = n, fill = SUSPECTED_CRIME_DESCRIPTION)) +
geom_bar(stat = "identity") +
labs(title = "Top 5 Reasons for Stops Where Weapons Are Found",
x = "Reason for Stop (Crime Description)",
y = "Frequency of Stop",
fill = "Reason for Stop") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
axis.text.y = element_text(size = 12),
plot.title = element_text(size = 14))
age_group_data <- stop_frisk_data %>%
filter(SUSPECT_REPORTED_AGE != "(null)") %>%
mutate(
age_group = case_when(
as.numeric(SUSPECT_REPORTED_AGE) < 18 ~ "Under 18",
as.numeric(SUSPECT_REPORTED_AGE) >= 18 & as.numeric(SUSPECT_REPORTED_AGE) <= 24 ~ "18-24",
as.numeric(SUSPECT_REPORTED_AGE) >= 25 & as.numeric(SUSPECT_REPORTED_AGE) <= 34 ~ "25-34",
as.numeric(SUSPECT_REPORTED_AGE) >= 35 & as.numeric(SUSPECT_REPORTED_AGE) <= 44 ~ "35-44",
as.numeric(SUSPECT_REPORTED_AGE) >= 45 & as.numeric(SUSPECT_REPORTED_AGE) <= 54 ~ "45-54",
as.numeric(SUSPECT_REPORTED_AGE) >= 55 ~ "55 and older",
TRUE ~ "Unknown"
)
) %>%
group_by(age_group) %>%
summarise(stop_count = n())
age_group_data$age_group <- factor(age_group_data$age_group,
levels = c("Under 18", "18-24", "25-34", "35-44", "45-54", "55 and older", "Unknown"))
ggplot(age_group_data, aes(x = age_group, y = stop_count, fill = age_group)) +
geom_bar(stat = "identity") +
labs(title = "Frequency of Stop-and-Frisk Encounters by Age Group",
x = "Age Group",
y = "Number of Stops",
fill = "Age Group") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
age_reason_data <- stop_frisk_data %>%
filter(SUSPECT_REPORTED_AGE != "(null)") %>%
mutate(
age_group = case_when(
as.numeric(SUSPECT_REPORTED_AGE) < 18 ~ "Under 18",
as.numeric(SUSPECT_REPORTED_AGE) >= 18 & as.numeric(SUSPECT_REPORTED_AGE) <= 24 ~ "18-24",
as.numeric(SUSPECT_REPORTED_AGE) >= 25 & as.numeric(SUSPECT_REPORTED_AGE) <= 34 ~ "25-34",
as.numeric(SUSPECT_REPORTED_AGE) >= 35 & as.numeric(SUSPECT_REPORTED_AGE) <= 44 ~ "35-44",
as.numeric(SUSPECT_REPORTED_AGE) >= 45 & as.numeric(SUSPECT_REPORTED_AGE) <= 54 ~ "45-54",
as.numeric(SUSPECT_REPORTED_AGE) >= 55 ~ "55 and older",
TRUE ~ "Unknown"
)
) %>%
group_by(age_group, SUSPECTED_CRIME_DESCRIPTION) %>%
summarise(stop_count = n(), .groups = "drop") %>%
arrange(age_group, desc(stop_count))
age_reason_top3 <- age_reason_data %>%
group_by(age_group) %>%
top_n(3, stop_count) %>%
ungroup()
age_reason_top3$age_group <- factor(age_reason_top3$age_group,
levels = c("Under 18", "18-24", "25-34", "35-44", "45-54", "55 and older", "Unknown"))
ggplot(age_reason_top3, aes(x = age_group, y = stop_count, fill = SUSPECTED_CRIME_DESCRIPTION)) +
geom_bar(stat = "identity", position = "stack") +
labs(title = "Top 3 Reasons for Stops by Age Group",
x = "Age Group",
y = "Number of Stops",
fill = "Reason for Stop") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
officer_stop_count <- stop_frisk_data %>%
group_by(ISSUING_OFFICER_RANK, ISSUING_OFFICER_COMMAND_CODE) %>%
summarise(stop_count = n(), .groups = "drop") %>%
arrange(desc(stop_count))
top_10_officers <- officer_stop_count %>%
top_n(10, stop_count)
ggplot(top_10_officers, aes(x = reorder(paste(ISSUING_OFFICER_RANK, ISSUING_OFFICER_COMMAND_CODE), stop_count),
y = stop_count, fill = stop_count)) +
geom_bar(stat = "identity") +
labs(title = "Top 10 Officers with Highest Number of Stop-and-Frisk Encounters",
x = "Officer (Rank and Command Code)",
y = "Number of Stops",
fill = "Number of Stops") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
set.seed(123)
officer_stop_counts <- stop_frisk_data %>%
group_by(ISSUING_OFFICER_RANK, ISSUING_OFFICER_COMMAND_CODE) %>%
summarise(num_stops = n(), .groups = 'drop')
officers_with_100_stops <- officer_stop_counts %>%
filter(num_stops >= 75)
sampled_officers <- officers_with_100_stops %>%
sample_n(10)
sampled_data <- stop_frisk_data %>%
filter(paste(ISSUING_OFFICER_RANK, ISSUING_OFFICER_COMMAND_CODE) %in%
paste(sampled_officers$ISSUING_OFFICER_RANK, sampled_officers$ISSUING_OFFICER_COMMAND_CODE))
encounter_outcome_sampled_data <- sampled_data %>%
mutate(
encounter_outcome = case_when(
FRISKED_FLAG == "Y" & SEARCHED_FLAG == "Y" & SUSPECT_ARRESTED_FLAG == "Y" ~ "Frisk, Search, Arrest",
FRISKED_FLAG == "Y" & SEARCHED_FLAG == "Y" & SUSPECT_ARRESTED_FLAG == "N" ~ "Frisk, Search, No Arrest",
FRISKED_FLAG == "Y" & SEARCHED_FLAG == "N" & SUSPECT_ARRESTED_FLAG == "Y" ~ "Frisk, No Search, Arrest",
FRISKED_FLAG == "Y" & SEARCHED_FLAG == "N" & SUSPECT_ARRESTED_FLAG == "N" ~ "Frisk, No Search, No Arrest",
FRISKED_FLAG == "N" & SEARCHED_FLAG == "Y" & SUSPECT_ARRESTED_FLAG == "Y" ~ "No Frisk, Search, Arrest",
FRISKED_FLAG == "N" & SEARCHED_FLAG == "Y" & SUSPECT_ARRESTED_FLAG == "N" ~ "No Frisk, Search, No Arrest",
FRISKED_FLAG == "N" & SEARCHED_FLAG == "N" & SUSPECT_ARRESTED_FLAG == "Y" ~ "No Frisk, No Search, Arrest",
FRISKED_FLAG == "N" & SEARCHED_FLAG == "N" & SUSPECT_ARRESTED_FLAG == "N" ~ "No Frisk, No Search, No Arrest",
TRUE ~ "Other"
)
)
outcome_by_sampled_officer <- encounter_outcome_sampled_data %>%
group_by(ISSUING_OFFICER_COMMAND_CODE, ISSUING_OFFICER_RANK, encounter_outcome) %>%
summarise(count = n(), .groups = 'drop')
outcome_by_sampled_officer <- outcome_by_sampled_officer %>%
mutate(officer_label = paste(ISSUING_OFFICER_RANK, ISSUING_OFFICER_COMMAND_CODE))
ggplot(outcome_by_sampled_officer, aes(x = officer_label, y = count, fill = encounter_outcome)) +
geom_bar(stat = "identity") +
labs(
title = "Variation of Encounter Outcomes for Officers with At Least 75 Stops",
x = "Issuing Officer Rank & Command Code",
y = "Count of Outcomes",
fill = "Encounter Outcome"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))