NY Hate Crimes Tutorial

Author

Z Griffin

NY Hate Crimes, 2019-2026

About the data set

This data set covers hate crimes recorded by the NYPD in the five New York City counties from 2019 to 2026, broken down by bias motive. Source: https://data.cityofnewyork.us/Public-Safety/NYPD-Hate-Crimes/bqiq-cu78/about_data

However, the data set is flawed: it relies first on the victim reporting a crime to the police, and then on the police recognizing that crime as a hate crime. Even when both happen, the crime must still be attributed to the correct group or characteristic.

We know there is bias in the data. What can we do with it?

library(tidyverse)
library(knitr)
setwd("~/Schol Stuff/Montgomery College 2025/Data 110 Data Visualization/Hate Crimes HW")
hatecrimes <- read_csv('NYPD_Hate_Crimes_19-26.csv')

Clean up the data:

Make all headers lowercase and remove spaces

Use tolower to make all the column names lowercase, and gsub to remove the spaces.

names(hatecrimes) <- tolower(names(hatecrimes))
names(hatecrimes) <- gsub(" ","",names(hatecrimes)) #replace the space with no space in the names of hatecrimes columns
head(hatecrimes)

# A tibble: 6 × 14
  fullcomplaintid complaintyearnumber monthnumber recordcreatedate
            <dbl>               <dbl>       <dbl> <chr>           
1         2.02e14                2019           1 1/23/2019       
2         2.02e14                2019           2 2/25/2019       
3         2.02e14                2019           2 2/27/2019       
4         2.02e14                2019           4 4/16/2019       
5         2.02e14                2019           6 6/20/2019       
6         2.02e14                2019           7 7/31/2019       
# ℹ 10 more variables: complaintprecinctcode <dbl>, patrolboroughname <chr>,
#   county <chr>, lawcodecategorydescription <chr>, offensedescription <chr>,
#   pdcodedescription <chr>, biasmotivedescription <chr>,
#   offensecategory <chr>, arrestdate <lgl>, arrestid <chr>
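
As a minimal sketch of the same renaming step on a toy data frame (the column names below are made up for illustration and are not the real headers of the CSV), the two calls can also be nested into one line:

```r
# Toy data frame; check.names = FALSE keeps the spaces in the column names
toy <- data.frame(`Full Complaint ID` = 1:2,
                  `Bias Motive Description` = c("A", "B"),
                  check.names = FALSE)

# Lowercase first, then remove every literal space
names(toy) <- gsub(" ", "", tolower(names(toy)), fixed = TRUE)
print(names(toy))
```

Using fixed = TRUE treats the space as a literal character rather than a regular expression, which makes no difference here but avoids surprises with other characters.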

Explore the Bias Motive (biasmotivedescription)

bias_count <- hatecrimes |>
  select(biasmotivedescription) |>
  group_by(biasmotivedescription) |>
  count() |>
  arrange(desc(n))
head(bias_count)

# A tibble: 6 × 2
# Groups:   biasmotivedescription [6]
  biasmotivedescription          n
  <chr>                      <int>
1 ANTI-JEWISH                 1906
2 ANTI-MALE HOMOSEXUAL (GAY)   489
3 ANTI-ASIAN                   401
4 ANTI-BLACK                   315
5 ANTI-OTHER ETHNICITY         168
6 ANTI-MUSLIM                  156
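
The group_by, count, and arrange steps can be collapsed into a single call. A small sketch on toy data (the motive labels A, B, and C are placeholders, not values from the data set):

```r
library(dplyr)

# Toy stand-in for the hatecrimes table
toy <- data.frame(motive = c("A", "B", "A", "C", "A", "B"))

# Long form, mirroring the steps used above
long_form <- toy |>
  group_by(motive) |>
  count() |>
  arrange(desc(n)) |>
  ungroup()

# Short form: count() groups, tallies, and sorts in one step
short_form <- toy |>
  count(motive, sort = TRUE)

print(short_form)
```

The short form also returns an ungrouped table, so later steps such as head(10) do not carry the grouping along.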

Visualize these counts as a bar graph

ggplot(hatecrimes, aes(x = biasmotivedescription)) +
  geom_bar()

Use inclusion/exclusion criteria to filter

There are 29 different motives, some with only one or two crimes. Keep the top ten rows of the bias_count subset and plot them with geom_col().

bias_count |>
  head(10) |>
  ggplot(aes(x = biasmotivedescription, y = n)) +
  geom_col()

Arrange the bars according to height, and rotate

Use ‘reorder’ and ‘coord_flip’

bias_count |>
  head(10) |>
  ggplot(aes(x = reorder(biasmotivedescription, n), y = n)) +
  geom_col() +
  coord_flip()
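
To see what reorder does, here is a small sketch on made-up values. It turns the labels into a factor whose levels are sorted by the second argument, and ggplot draws bars in level order:

```r
# Made-up labels and counts, for illustration only
motive <- c("A", "B", "C")
n <- c(5, 20, 10)

# Levels are sorted by increasing n, not alphabetically
ordered_motive <- reorder(motive, n)
print(levels(ordered_motive))
```

After coord_flip() the first level sits at the bottom of the plot, so the largest count ends up at the top. Using reorder(motive, -n) would reverse that order.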

Add title, caption for data source, and x-axis label

bias_count |>
  head(10) |>
  ggplot(aes(x = reorder(biasmotivedescription, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(x = "",
       y = "counts of hate crime types based on motive",
       title = "Bar Graph of Hate Crimes from 2019-2026",
       subtitle = "Counts Based on the Hate Crime Motive",
       caption = "Source: NYPD Hate Crimes, NYC Open Data")

Finally add color and change the theme

bias_count |>
  head(10) |>
  ggplot(aes(x = reorder(biasmotivedescription, n), y = n)) +
  geom_col(fill = "darkred") +
  coord_flip() +
  labs(x = "",
       y = "counts of hate crime types based on motive",
       title = "Bar Graph of Hate Crimes from 2019-2026",
       subtitle = "Counts Based on the Hate Crime Motive",
       caption = "Source: NYPD Hate Crimes, NYC Open Data") +
  theme_bw()

Add annotations for counts and remove the x-axis values

Also expand the y-axis limit so the count for the anti-Jewish crimes fits on the plot.

bias_count |>
  head(10) |>
  ggplot(aes(x = reorder(biasmotivedescription, n), y = n)) +
  geom_col(fill = "darkred") +
  ylim(0,2050) +
  coord_flip() +
  labs(x = "",
       y = "counts of hate crime types based on motive",
       title = "Bar Graph of Hate Crimes from 2019-2026",
       subtitle = "Counts Based on the Hate Crime Motive",
       caption = "Source: NYPD Hate Crimes, NYC Open Data") +
  theme_bw() +
  geom_text(aes(label = n), hjust = -.05, size = 3.6) +
  theme(axis.text.x = element_blank())

Look deeper into crimes against Jewish, Asian, and Black people and gay men

Remember to mind spelling.

First check the year totals

hate_year <- hatecrimes |>
  filter(biasmotivedescription %in% c("ANTI-JEWISH", "ANTI-MALE HOMOSEXUAL (GAY)", "ANTI-ASIAN", "ANTI-BLACK")) |>
  group_by(complaintyearnumber) |>
  count(biasmotivedescription) |>
  arrange(desc(n))
hate_year

# A tibble: 28 × 3
# Groups:   complaintyearnumber [7]
   complaintyearnumber biasmotivedescription          n
                 <dbl> <chr>                      <int>
 1                2024 ANTI-JEWISH                  371
 2                2023 ANTI-JEWISH                  343
 3                2025 ANTI-JEWISH                  320
 4                2022 ANTI-JEWISH                  279
 5                2019 ANTI-JEWISH                  252
 6                2021 ANTI-JEWISH                  215
 7                2021 ANTI-ASIAN                   150
 8                2020 ANTI-JEWISH                  126
 9                2023 ANTI-MALE HOMOSEXUAL (GAY)   116
10                2022 ANTI-ASIAN                    91
# ℹ 18 more rows

Then check the county totals

hate_county <- hatecrimes |>
  filter(biasmotivedescription %in% c("ANTI-JEWISH", "ANTI-MALE HOMOSEXUAL (GAY)", "ANTI-ASIAN", "ANTI-BLACK")) |>
  group_by(county) |>
  count(biasmotivedescription) |>
  arrange(desc(n))
hate_county

# A tibble: 20 × 3
# Groups:   county [5]
   county   biasmotivedescription          n
   <chr>    <chr>                      <int>
 1 KINGS    ANTI-JEWISH                  798
 2 NEW YORK ANTI-JEWISH                  651
 3 QUEENS   ANTI-JEWISH                  289
 4 NEW YORK ANTI-MALE HOMOSEXUAL (GAY)   237
 5 NEW YORK ANTI-ASIAN                   228
 6 KINGS    ANTI-MALE HOMOSEXUAL (GAY)   120
 7 KINGS    ANTI-BLACK                    99
 8 BRONX    ANTI-JEWISH                   92
 9 QUEENS   ANTI-MALE HOMOSEXUAL (GAY)    91
10 KINGS    ANTI-ASIAN                    80
11 NEW YORK ANTI-BLACK                    79
12 QUEENS   ANTI-ASIAN                    78
13 RICHMOND ANTI-JEWISH                   76
14 QUEENS   ANTI-BLACK                    75
15 BRONX    ANTI-MALE HOMOSEXUAL (GAY)    35
16 RICHMOND ANTI-BLACK                    35
17 BRONX    ANTI-BLACK                    27
18 BRONX    ANTI-ASIAN                    10
19 RICHMOND ANTI-MALE HOMOSEXUAL (GAY)     6
20 RICHMOND ANTI-ASIAN                     5

Check information combining totals from counties and years

hate2 <- hatecrimes |>
  filter(biasmotivedescription %in% c("ANTI-JEWISH", "ANTI-MALE HOMOSEXUAL (GAY)", "ANTI-ASIAN", "ANTI-BLACK")) |>
  group_by(complaintyearnumber, county) |>
  rename(year = complaintyearnumber) |> # shortened the name of this column so all the columns fit in the output without scrolling sideways (working in minimized half-screen windows)
  count(biasmotivedescription) |>
  arrange(desc(n))
hate2

# A tibble: 127 × 4
# Groups:   year, county [35]
    year county   biasmotivedescription     n
   <dbl> <chr>    <chr>                 <int>
 1  2024 KINGS    ANTI-JEWISH             152
 2  2024 NEW YORK ANTI-JEWISH             136
 3  2025 KINGS    ANTI-JEWISH             136
 4  2019 KINGS    ANTI-JEWISH             128
 5  2023 KINGS    ANTI-JEWISH             126
 6  2022 KINGS    ANTI-JEWISH             125
 7  2023 NEW YORK ANTI-JEWISH             124
 8  2025 NEW YORK ANTI-JEWISH             110
 9  2022 NEW YORK ANTI-JEWISH             104
10  2021 NEW YORK ANTI-ASIAN               84
# ℹ 117 more rows

Plot these four types of hate crimes together

position = “dodge” makes side-by-side bars. stat = “identity” uses the values of n as the bar heights instead of counting rows, so you can plot bars for each group for each year (2019-2026). In labs, title titles the entire plot and fill titles the legend.

ggplot(data = hate2) +
  geom_bar(aes(x = year, y = n, fill = biasmotivedescription),
          position = "dodge", stat = "identity") +
  labs(fill = "Hate Crime Type",
       y = "Number of Hate Crime Incidents",
       title = "Hate Crime Type in NY Counties Between 2019-2026",
       caption = "Source: NYPD Hate Crimes, NYC Open Data") +
  scale_fill_brewer(palette = "Dark2") +
  theme_bw()
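
One thing to watch for: hate2 holds one row per year, county, and motive, so several rows share the same year and fill color. With position = "dodge" those rows are drawn at the same spot, and the tallest bar hides the others. If citywide totals per year are what you want, one option is to sum over counties first. A sketch on toy data (all values and the motive labels A and B are made up):

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for hate2: one row per year, county, and motive
toy <- data.frame(
  year   = c(2019, 2019, 2019, 2019, 2020, 2020, 2020, 2020),
  county = c("KINGS", "QUEENS", "KINGS", "QUEENS",
             "KINGS", "QUEENS", "KINGS", "QUEENS"),
  motive = c("A", "A", "B", "B", "A", "A", "B", "B"),
  n      = c(10, 4, 3, 2, 12, 5, 1, 6)
)

# Sum over counties so each year/motive pair has exactly one row
toy_year <- toy |>
  group_by(year, motive) |>
  summarise(n = sum(n), .groups = "drop")

# geom_col() is shorthand for geom_bar(stat = "identity")
p <- ggplot(toy_year, aes(x = year, y = n, fill = motive)) +
  geom_col(position = "dodge")

print(toy_year)
```

The same idea applies to a plot by county: group by county and motive, then sum over years.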

What about the counties?

Make the bar graph by county instead of year

ggplot(data = hate2) +
  geom_bar(aes(x = county, y = n, fill = biasmotivedescription),
          position = "dodge", stat = "identity") +
  labs(fill = "Hate Crime Type",
       y = "Number of Hate Crime Incidents",
       title = "Hate Crime in NY Counties Between 2019-2026",
       caption = "Source: NYPD Hate Crimes, NYC Open Data") +
  scale_fill_brewer(palette = "Accent") +
  theme_minimal() + # apply the complete theme first; placed last, it would reset the angle set below
  theme(axis.text.x = element_text(angle = 45))

The Highest Counts

We can see that the highest counts of hate crimes against Jewish, Asian, and Black people took place in Kings County (Brooklyn) and New York County (Manhattan).

Put it all together with years and counties using “facet”

ggplot(data = hate2) +
  geom_bar(aes(x = year, y = n, fill = biasmotivedescription),
          position = "dodge", stat = "identity") +
  facet_wrap(~county) +
  labs(fill = "Hate Crime Type",
       y = "Number of Hate Crime Incidents",
       title = "Hate Crime Type in NY Counties Between 2019-2026",
       caption = "Source: NYPD Hate Crimes, NYC Open Data") +
  scale_fill_brewer(palette = "Dark2") +
  theme_bw()

How would calculations be affected by looking at hate crimes in counties per year relative to population?

Bring in census data for populations of New York counties. These are population figures from the 2020 census.

setwd("~/Schol Stuff/Montgomery College 2025/Data 110 Data Visualization/Hate Crimes HW")
nypop <- read_csv("nyc_census_pop_2020.csv")

Clean the county name to match the other dataset

Rename the variable “Area name” as “county” so it matches the first dataset

nypop$`Area Name` <- gsub(" County", "", nypop$`Area Name`) # "X County" becomes "X"
nypop2 <- nypop |>
  rename(county = `Area Name`)|>
  select(county, `2020 Census Population`)
head(nypop2)

# A tibble: 6 × 2
  county      `2020 Census Population`
  <chr>                          <dbl>
1 Albany                        314848
2 Allegany                       46456
3 Bronx                        1472654
4 Broome                        198683
5 Cattaraugus                    77042
6 Cayuga                         76248

Join the hate2 data with nypop2

datajoin <- left_join(hate2, nypop2, by=c("county"))
datajoin

# A tibble: 127 × 5
# Groups:   year, county [35]
    year county   biasmotivedescription     n `2020 Census Population`
   <dbl> <chr>    <chr>                 <int>                    <dbl>
 1  2024 KINGS    ANTI-JEWISH             152                       NA
 2  2024 NEW YORK ANTI-JEWISH             136                       NA
 3  2025 KINGS    ANTI-JEWISH             136                       NA
 4  2019 KINGS    ANTI-JEWISH             128                       NA
 5  2023 KINGS    ANTI-JEWISH             126                       NA
 6  2022 KINGS    ANTI-JEWISH             125                       NA
 7  2023 NEW YORK ANTI-JEWISH             124                       NA
 8  2025 NEW YORK ANTI-JEWISH             110                       NA
 9  2022 NEW YORK ANTI-JEWISH             104                       NA
10  2021 NEW YORK ANTI-ASIAN               84                       NA
# ℹ 117 more rows

It didn’t work: the new column has only NA values

The county names are all uppercase in hate2 and mixed case in nypop2, so none of the keys match

hate_new <- hate2 |> 
  mutate(county = as.factor(str_to_lower(as.character(county))))
nypop_new <- nypop2 |>  
  mutate(county = as.factor(str_to_lower(as.character(county))))

Try joining again

datajoin <- left_join(hate_new, nypop_new, by=c("county"))
datajoin

# A tibble: 127 × 5
# Groups:   year, county [35]
    year county   biasmotivedescription     n `2020 Census Population`
   <dbl> <fct>    <chr>                 <int>                    <dbl>
 1  2024 kings    ANTI-JEWISH             152                  2736074
 2  2024 new york ANTI-JEWISH             136                  1694251
 3  2025 kings    ANTI-JEWISH             136                  2736074
 4  2019 kings    ANTI-JEWISH             128                  2736074
 5  2023 kings    ANTI-JEWISH             126                  2736074
 6  2022 kings    ANTI-JEWISH             125                  2736074
 7  2023 new york ANTI-JEWISH             124                  1694251
 8  2025 new york ANTI-JEWISH             110                  1694251
 9  2022 new york ANTI-JEWISH             104                  1694251
10  2021 new york ANTI-ASIAN               84                  1694251
# ℹ 117 more rows
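
A quick way to spot this kind of key mismatch is anti_join, which returns the rows of the first table that have no match in the second. The sketch below uses two tiny stand-in tables; the county names and populations are taken from the output above, while the table and column names (crimes, pop, population) are made up for illustration:

```r
library(dplyr)

crimes <- data.frame(county = c("KINGS", "NEW YORK"), n = c(152, 136))
pop <- data.frame(county = c("Kings", "New York"),
                  population = c(2736074, 1694251))

# Every row comes back, so no county matched on the raw keys
unmatched <- anti_join(crimes, pop, by = "county")
print(unmatched)

# Lowercasing both keys makes the join succeed
fixed <- left_join(
  mutate(crimes, county = tolower(county)),
  mutate(pop, county = tolower(county)),
  by = "county"
)
print(fixed)
```

Plain character keys are enough for the join; converting to a factor is optional.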

Calculate the rate of incidents per 100,000. Then arrange in descending order

datajoinrate <- datajoin |>
  mutate(rate = n / `2020 Census Population`* 100000) |>
  arrange(desc(rate))
datajoinrate

# A tibble: 127 × 6
# Groups:   year, county [35]
    year county   biasmotivedescription     n `2020 Census Population`  rate
   <dbl> <fct>    <chr>                 <int>                    <dbl> <dbl>
 1  2024 new york ANTI-JEWISH             136                  1694251  8.03
 2  2023 new york ANTI-JEWISH             124                  1694251  7.32
 3  2025 new york ANTI-JEWISH             110                  1694251  6.49
 4  2022 new york ANTI-JEWISH             104                  1694251  6.14
 5  2024 kings    ANTI-JEWISH             152                  2736074  5.56
 6  2025 kings    ANTI-JEWISH             136                  2736074  4.97
 7  2021 new york ANTI-ASIAN               84                  1694251  4.96
 8  2021 new york ANTI-JEWISH              84                  1694251  4.96
 9  2019 kings    ANTI-JEWISH             128                  2736074  4.68
10  2023 kings    ANTI-JEWISH             126                  2736074  4.61
# ℹ 117 more rows
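
As a worked check of the rate formula, the top two county-level anti-Jewish counts for 2024 can be recomputed by hand using the numbers from the table above (the variable names are just for this sketch):

```r
# rate per 100,000 = incidents / population * 100000
rate_ny    <- 136 / 1694251 * 100000   # New York County, 2024
rate_kings <- 152 / 2736074 * 100000   # Kings County, 2024

print(round(rate_ny, 2))
print(round(rate_kings, 2))
```

Kings County has more incidents in absolute terms, but New York County has the higher rate because its population is smaller.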

Your Turn!

As mentioned at the top, this data set is biased: populations that don’t trust the police may not report crimes against them, police may not recognize a crime as a hate crime, or police may miscategorize a crime. I noticed that the offense category column classifies anti-transgender and anti-gender non-conforming (GNC) crimes as “gender” based and anti-homosexuality crimes as “sexual orientation” based. This is excellent, as it shows that at least someone in the NYPD understands that sexual orientation and gender identity are not the same thing and that there are distinct populations within the LGBTQ+ community. However, it also means that there is no aggregate LGBTQ+ category. There are five bias motive descriptions (counting anti-GNC) across two offense categories.

I’d like to do further analysis on the anti-LGBTQ+ crimes. I suspect that anti-transgender crimes are underreported: 79 seems suspiciously low, and it’s possible that crimes against trans women were improperly classified as anti-male homosexual. I would first like to find the number of crimes against the aggregate and against each population by year (and county).

I’d also like to analyze the crimes by offense description or PD code description, and see whether the severity of crimes is consistent across populations and years.
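
A possible starting point for the aggregate count is sketched below on toy data. Only "ANTI-MALE HOMOSEXUAL (GAY)" is a label confirmed by the output above; "ANTI-TRANSGENDER" and the contents of lgbtq_motives are assumptions that would need to be checked against the actual values of biasmotivedescription, and the rows are made up:

```r
library(dplyr)

# Assumed list of motive labels to aggregate; verify against the real data
lgbtq_motives <- c("ANTI-MALE HOMOSEXUAL (GAY)", "ANTI-TRANSGENDER")

# Toy stand-in for hatecrimes
toy <- data.frame(
  year = c(2019, 2019, 2019, 2020, 2020),
  biasmotivedescription = c("ANTI-MALE HOMOSEXUAL (GAY)", "ANTI-TRANSGENDER",
                            "ANTI-JEWISH", "ANTI-TRANSGENDER",
                            "ANTI-MALE HOMOSEXUAL (GAY)")
)

lgbtq <- toy |>
  filter(biasmotivedescription %in% lgbtq_motives)

# Counts for each population by year
by_group <- lgbtq |>
  count(year, biasmotivedescription)

# Aggregate count by year
aggregate_by_year <- lgbtq |>
  count(year, name = "lgbtq_total")

print(by_group)
print(aggregate_by_year)
```

Adding county to each count() call would give the same breakdown by county.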
