In this report, we analyze different aspects of the official 2022 NYPD “Stop, Question, and Frisk” dataset. The dataset comprises roughly 15,000 stop-and-frisk incidents conducted by the NYPD during the 2022 calendar year, with 82 columns capturing data such as the demographics of the subject being stopped, the context of the stop, arrests made (if any), citations/summonses issued, etc.
Here is a sample of the first few columns of the first few entries of the data:
Many of the columns are just simple flags or indicators, but there are a lot of columns. The goal of this report is to supply a visual analysis of the above dataset, answering questions about demographics of those stopped, trends over the year, reasons for stops, etc.
1. Demographic Analysis
What is the racial distribution of individuals who were stopped and frisked by the police in 2022?
This is a major point of contention in the US today: whether or not police unfairly discriminate against certain groups of people when it comes to responses or investigations.
Demographic analysis of racial distribution of stop-and-frisk incidents
Unfortunately, I was not able to track down population estimates for individuals who identify as “Black Hispanic” or “White Hispanic”. Given the nature of the data, it may be best to then omit a population-adjusted proportion of those individuals who are stopped-and-frisked, though it would still likely be high (as seen in the first plot).
If we continue with these population-adjusted figures, however, we still see that Black people tend to be disproportionately represented in terms of counts of stop-and-frisk incidents. This is also true with the Middle-Eastern-descendant population of NYC.
Additionally, the absolute count of Native Americans stopped by NYPD during 2022 is very low, mostly because the overall population of Native Americans in NYC itself is low. As such, we can’t extrapolate much without looking at more data.
Demographic analysis of sex distribution of stop-and-frisk incidents
A little under 10% of all stop-and-frisk incidents involve women; the vast majority involve men.
Conclusions here would be that Black people are disproportionately stopped the most (by a lot). Men are also stopped and frisked far more often than women. Explanations for these observations tend to be quite nuanced and are beyond the scope of this specific assignment.
2. Temporal Trends
How has the number of stop-and-frisk encounters changed over the months of 2022? Does the day of the week have an impact?
We will first examine the trend over the entire year by month, then over the week by weekday:
The monthly total seems to hover at roughly 1000-1500 incidents, except in December, where the count drops rather sharply. New York can have some cold winters, so fewer people are likely walking around outside. This is one plausible explanation for the dip in stop-and-frisk incidents, though the data alone cannot confirm it.
There is a very real, but odd, dip for incident counts on Mondays. It is almost certainly related to the fact that Monday is the start of the work week, but exactly why this would cause a dip is difficult to explain.
Overall, it is reasonable to conclude that there exists a significant temporal component related to overall counts (and thus, monthly/weekly averages) of stop-and-frisk occurrences. There is an expected decline in rates during December, then also on each Monday (aggregated throughout the year).
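The weekday tabulation described above can be sketched in base R. This is a minimal illustration, not the report's actual pipeline: the `stop_dates` vector is a hypothetical toy sample, and the weekday is derived from a date here rather than read from a column of the dataset.

```r
# Hypothetical stop dates (toy sample, NOT taken from the NYPD data).
stop_dates <- as.Date(c("2022-01-03", "2022-01-04", "2022-01-04",
                        "2022-01-08", "2022-01-09", "2022-01-09"))

# POSIXlt$wday is locale-independent: 0 = Sunday, ..., 6 = Saturday.
day_labels <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
weekday <- factor(day_labels[as.POSIXlt(stop_dates)$wday + 1],
                  levels = day_labels, ordered = TRUE)

# table() keeps every factor level, so weekdays with zero stops still
# appear and the counts stay in calendar order rather than alphabetical.
weekday_counts <- table(weekday)
print(weekday_counts)
```

Using an ordered factor is what keeps the x-axis of a weekday plot in calendar order; without it, R would sort the labels alphabetically.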
3 + 4. Reasons for Stops & Outcomes of Stops
What are the most common reasons cited by police officers for conducting stop-and-frisk encounters?
What are the NYPD officers typically stopping someone for? What is the suspected crime? We are given the suspected crime in the data, along with outcome of the encounter (nothing, arrest, etc.):
# A tibble: 10 × 3
   reason             stops arrests
   <chr>              <int>   <dbl>
 1 CPW                 6908    1432
 2 ROBBERY             1544     717
 3 PETIT LARCENY       1336     820
 4 ASSAULT             1269     513
 5 BURGLARY            1077     379
 6 OTHER                557     179
 7 GRAND LARCENY        439     222
 8 GRAND LARCENY AUTO   434     128
 9 MENACING             414     173
10 CRIMINAL MISCHIEF    238     108
The common stops seem expected, all things considered. “CPW” stands for “Criminal Possession of Weapon”, “Petit” means “Petty” in this context, and the others are all self-explanatory. Note: Just because one was stopped by the NYPD for these suspected crimes does not mean that they actually committed them, nor that they were arrested; rather, it’s just that the individual was a suspect.
Is there a relationship between the stated reason for the stop and the outcome of the encounter?
Yes, it seems so! Some suspected crimes are far more likely to end in an arrest than others. Petit Larceny, for example, has an arrest rate nearly double the overall mean arrest rate, while CPW has a much lower arrest rate. Most of the others, however, sit fairly close to the mean, within roughly 10 percentage points of it.
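The per-reason arrest rates behind this comparison can be recomputed directly from the top-10 table. The sketch below uses only the stop and arrest counts printed in that table; the overall mean arrest rate of 0.33 is the rounded figure reported alongside the plot, so the ratios are approximate.

```r
# Stop and arrest counts copied from the top-10 table in this report.
top10 <- data.frame(
  reason  = c("CPW", "ROBBERY", "PETIT LARCENY", "ASSAULT", "BURGLARY",
              "OTHER", "GRAND LARCENY", "GRAND LARCENY AUTO", "MENACING",
              "CRIMINAL MISCHIEF"),
  stops   = c(6908, 1544, 1336, 1269, 1077, 557, 439, 434, 414, 238),
  arrests = c(1432, 717, 820, 513, 379, 179, 222, 128, 173, 108)
)

# Arrest rate per suspected crime, and how it compares to the overall mean.
mean_rate <- 0.33  # rounded overall arrest rate reported with the plot
top10$arrest_rate   <- top10$arrests / top10$stops
top10$ratio_to_mean <- top10$arrest_rate / mean_rate

# Sort from highest to lowest arrest rate for easier reading.
ranked <- top10[order(-top10$arrest_rate),
                c("reason", "arrest_rate", "ratio_to_mean")]
print(ranked, digits = 3, row.names = FALSE)
```

With these counts, Petit Larceny comes out at roughly 0.61 and CPW at roughly 0.21, which is the contrast described above.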
Again, note that an arrest being made does not mean that the arrested person necessarily committed the crime, just that the attending officer believed there was probable cause that a crime had occurred.
We can also compute the following statistics: 60.03% of stops resulted in the suspect being frisked, 42.22% of stops resulted in the suspect being searched, and 2.78% of stops resulted in a court summons being issued.
Are there differences in outcomes based on the demographics of the individuals stopped?
Now that we understand how certain suspected crimes result in more arrests than others, how do the proportions of arrests for those crimes depend on demographics (primarily, race and sex)?
The plot above covers a selection of suspected crimes (chosen mainly because they were among the few with enough data to indicate patterns). It appears that race is likely an indicator of stop-to-arrest rates, but the pattern is likely not as simple as one race being arrested more than others. It probably depends on factors such as class of crime, severity, and others.
The same is true for male vs. female suspect stops. There is probably a pattern, but it is likely a complex one, dependent on many factors. It may also be a data limitation: many groupings of this dataset result in groups with fewer than 100 observations, which is a shaky basis for generalization. More data would be better.
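To make the small-group caveat concrete, one option is to attach a confidence interval to each arrest proportion. The sketch below uses base R's `binom.test()` (an exact Clopper-Pearson interval); this is an illustration we add here, not a step the original analysis performed. The large group is the CPW row from the top-10 table, while the small group of 60 stops is hypothetical.

```r
# Exact (Clopper-Pearson) 95% confidence interval for an arrest proportion.
arrest_ci <- function(arrests, stops) {
  as.numeric(binom.test(arrests, stops)$conf.int)
}

# CPW row from the report's table: 1432 arrests out of 6908 stops.
large_group <- arrest_ci(1432, 6908)

# Hypothetical subgroup with a similar ~20% rate but only 60 stops.
small_group <- arrest_ci(12, 60)

# Interval widths: the small group is far less certain.
ci_width <- c(large = diff(large_group), small = diff(small_group))
print(round(ci_width, 3))
```

A much wider interval for the small group is the quantitative version of "a bit shaky to generalize on": two subgroups whose intervals overlap heavily cannot really be told apart.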
5. Location Analysis
Which neighborhoods or precincts have the highest number of stop-and-frisk incidents?
This is a big question, because neighborhood rates of stop-and-frisk incidents are highly correlated with the demographic makeup of the neighborhood, but not necessarily caused by that makeup:
The Bronx sees both a high absolute count of stop-and-frisk incidents (4495 in 2022) and a high proportional rate (after population adjustment). This aligns with the Bronx having the lowest white population of any of the boroughs, as well as a Hispanic majority, a group we have already seen to be stopped frequently.
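The population adjustment for the Bronx can be reproduced as a short worked calculation. The stop count of 4495 comes from the text; the population figure of 1,472,000 is the value this report's adjustment uses (based on 2020 census data), and the count of twelve Bronx precincts is likewise taken from the report's own precinct plot. Treat all three as inputs to a sketch rather than authoritative statistics.

```r
bronx_stops <- 4495      # stops in the Bronx in 2022 (from the text)
bronx_pop   <- 1472000   # population used for the adjustment (2020 census based)
n_precincts <- 12        # Bronx precincts shown in the report's precinct plot

# Stops per 100,000 residents is easier to read than a raw proportion.
rate_per_100k <- bronx_stops / bronx_pop * 1e5

# Average number of stops per Bronx precinct.
mean_per_precinct <- bronx_stops / n_precincts

print(round(c(rate_per_100k = rate_per_100k,
              mean_per_precinct = mean_per_precinct), 1))
```

This works out to roughly 305 stops per 100,000 residents and about 375 stops per precinct on average.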
We also see that within the Bronx, precinct 46 has, by far, the most stop-and-frisk incidents of all the precinct locations. In fact, it has the most incidents out of any NYPD precinct, beating 2nd place by over 600 incidents.
So yes, location matters greatly when it comes to rates of stop-and-frisk events, for multiple reasons. The Bronx sees the most of these, with precinct 46 initiating over 1100 in 2022 alone.
6. Weapon and Contraband Discovery
How frequently do police officers find weapons or contraband during stop-and-frisk encounters?
Given that a large number of the total set of stop events involve searches, it makes sense to wonder how many of those searches actually yield weapons or contraband. Note that only 42.22% of all stops actually lead to searches.
Above is a barplot displaying the proportion of types of items found to the total number of searches, per suspected offense. Clearly, there is a strong relationship between the ratios and the suspected crime (as one might expect). “Criminal possession of weapon” (CPW) searches lead to a very high chance of finding a weapon, and Graffiti searches lead to a high chance of finding contraband.
Overall, if the officers choose to search, they find weapons 37.81% of the time, and they find other contraband 13.72% of the time.
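These percentages are conditional on a search taking place, so it can help to translate them into shares of all stops. The sketch below multiplies the figures quoted in the text; it assumes that weapons and contraband are only recorded as found when a search occurred, which is a simplification (items found during a frisk alone would break it).

```r
# Figures quoted in the report.
p_search               <- 0.4222  # share of stops that led to a search
p_weapon_given_search  <- 0.3781  # weapons found, per search
p_contra_given_search  <- 0.1372  # other contraband found, per search

# Multiplication rule: P(search and find) = P(search) * P(find | search).
p_weapon <- p_search * p_weapon_given_search
p_contra <- p_search * p_contra_given_search

# Express as percentages of ALL stops.
print(round(100 * c(weapon = p_weapon, contraband = p_contra), 2))
```

Under that assumption, roughly 16% of all stops end with a weapon being found and roughly 6% with other contraband.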
7. Repeat Encounters
I could not find any part of the data that disclosed information about repeat encounters. Thus, we will have to move on for now.
8. Age and Stop Frequency
Is there a relationship between the age of individuals and their likelihood of being stopped and frisked?
Analysis of age vs. stop rates
frisk.age <- frisk |>
  filter(SUSPECT_REPORTED_AGE != "(null)") |>
  mutate(SUSPECT_REPORTED_AGE = as.numeric(SUSPECT_REPORTED_AGE))

hist(
  frisk.age$SUSPECT_REPORTED_AGE,
  main = "Counts of stops by age",
  xlab = "Age",
  ylab = "Count",
  col = "magenta3"
)
It seems that there exists a strong relationship between the suspect ages and the rates at which they are stopped. Specifically, the age with the highest stop counts was 15 years old, decreasing as the suspects get older.
There were 5 people who had reported their age as “0” years old. I’m not exactly sure why they did so.
There is a somewhat clear distribution of counts of stops when broken down by age, with suspects placed in the 15-29 or 30-44 year old age groups being stopped the majority of the time for most suspected crimes.
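The age grouping can be sketched with base R's `cut()`. The bins match the groups named in this report (1-14, 15-29, 30-44, 45-59, 60+); the `raw_age` vector is a hypothetical toy sample that includes the "(null)" marker and a reported age of 0, to show how both fall outside the bins.

```r
# Hypothetical reported ages as text, including a "(null)" entry and a 0.
raw_age <- c("15", "(null)", "0", "22", "37", "61", "14", "45")

# Drop the "(null)" marker before converting, to avoid coercion warnings.
age <- as.numeric(raw_age[raw_age != "(null)"])

# right = FALSE makes the bins [1,15), [15,30), [30,45), [45,60), [60,Inf).
age_group <- cut(age,
                 breaks = c(1, 15, 30, 45, 60, Inf),
                 right  = FALSE,
                 labels = c("1-14", "15-29", "30-44", "45-59", "60+"))

# An age of 0 falls below the first break and becomes NA, so it is
# counted separately instead of silently joining the youngest group.
counts <- table(age_group, useNA = "ifany")
print(counts)
```

Keeping the NA bucket visible makes it easy to see how many records (such as the entries with age "0") are excluded from the grouped plot.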
9. Officer Identification
Which police officers have conducted the highest number of stop-and-frisk encounters?
First, something interesting to note is that many of the officers have a 100% stop-to-arrest ratio because they have only made one stop. Likewise, there are many officers with a 0% stop-to-arrest ratio for the same reason. As we care about trends, we will only consider officers who have conducted at least 15 stops:
For most officers, the stop-to-search ratio is closely related to the stop-to-arrest ratio. This is expected, as ideally, officers only search when they believe that they have a reason to search. The officer with the highest stop-to-arrest ratio was the officer with code 13, making arrests 67.59% of the time!
The average search-to-arrest ratio was 85.64%.
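The effect of the minimum-stops threshold can be sketched on toy data. Everything in the `stops` data frame below is hypothetical (three made-up officer codes); only the threshold of 15 stops comes from the report. Base R is used here to keep the example self-contained.

```r
# Hypothetical stops: officer A has 20 stops, B has 1, C has 16.
stops <- data.frame(
  officer_code = c(rep("A", 20), "B", rep("C", 16)),
  arrested     = c(rep(c("Y", "N"), times = c(5, 15)),
                   "Y",
                   rep(c("Y", "N"), times = c(8, 8)))
)

# Count stops and arrests per officer code.
total   <- tapply(stops$arrested, stops$officer_code, length)
arrests <- tapply(stops$arrested == "Y", stops$officer_code, sum)

per_officer <- data.frame(
  officer_code = names(total),
  total        = as.vector(total),
  arrests      = as.vector(arrests)
)
per_officer$arrest_ratio <- per_officer$arrests / per_officer$total

# Keep only officers with at least 15 stops, then rank by arrest ratio.
reliable <- per_officer[per_officer$total >= 15, ]
reliable <- reliable[order(-reliable$arrest_ratio), ]
print(reliable, row.names = FALSE)
```

Without the filter, officer B would top the ranking with a 100% ratio from a single stop, which is exactly the artifact the threshold is meant to remove.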