Hatecrimes Assignment

Author

Qian He

NY Hate Crimes 2019-2026

About this dataset

Flawed hate crime data collection - we should know how the data was collected

Now that we know there is possible bias in the dataset, what can we do with it?

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(knitr)
hatecrimes <-read_csv("NYPD_Hate_Crimes_19-26.csv")

Rows: 4029 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (9): Record Create Date, Patrol Borough Name, County, Law Code Category ...
dbl (4): Full Complaint ID, Complaint Year Number, Month Number, Complaint P...
lgl (1): Arrest Date

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Clean up the data:

Make all headers lowercase and remove spaces

#make lowercase
names(hatecrimes) <- tolower(names(hatecrimes))
# gsub = global substitution (replaces every match; sub replaces only the first)
# usage: gsub(pattern, replacement, x)
names(hatecrimes) <- gsub(" ","",names(hatecrimes))
#show first 6 rows
head(hatecrimes)

# A tibble: 6 × 14
  fullcomplaintid complaintyearnumber monthnumber recordcreatedate
            <dbl>               <dbl>       <dbl> <chr>           
1         2.02e14                2019           1 1/23/2019       
2         2.02e14                2019           2 2/25/2019       
3         2.02e14                2019           2 2/27/2019       
4         2.02e14                2019           4 4/16/2019       
5         2.02e14                2019           6 6/20/2019       
6         2.02e14                2019           7 7/31/2019       
# ℹ 10 more variables: complaintprecinctcode <dbl>, patrolboroughname <chr>,
#   county <chr>, lawcodecategorydescription <chr>, offensedescription <chr>,
#   pdcodedescription <chr>, biasmotivedescription <chr>,
#   offensecategory <chr>, arrestdate <lgl>, arrestid <chr>
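
The two renaming steps can be checked on a small vector before touching the real data. This is a minimal sketch; the names in `raw_names` are illustrative, modeled on the headers that `read_csv()` reported.

```r
# Illustrative column names, modeled on the headers reported by read_csv()
raw_names <- c("Full Complaint ID", "Record Create Date", "County")

# Step 1: lowercase everything
lower_names <- tolower(raw_names)

# Step 2: remove every space (gsub replaces all matches, sub only the first)
clean_names <- gsub(" ", "", lower_names)

print(clean_names)

# For comparison: sub() would leave the second space in place
first_only <- sub(" ", "", lower_names[1])
print(first_only)
```

Using `sub()` here would leave names such as `fullcomplaint id`, which is why the global version is needed.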

Explore the bias motive (biasmotivedescription)

# bias_count: one row per bias motive with its number of incidents
bias_count <- hatecrimes |>
  select(biasmotivedescription) |>
  group_by(biasmotivedescription) |>
  # count the incidents in each bias motive group
  count() |>
  arrange(desc(n))
head(bias_count)

# A tibble: 6 × 2
# Groups:   biasmotivedescription [6]
  biasmotivedescription          n
  <chr>                      <int>
1 ANTI-JEWISH                 1906
2 ANTI-MALE HOMOSEXUAL (GAY)   489
3 ANTI-ASIAN                   401
4 ANTI-BLACK                   315
5 ANTI-OTHER ETHNICITY         168
6 ANTI-MUSLIM                  156
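
The same table can probably be produced in one step, since `count()` accepts the grouping column directly and has a `sort` argument. A minimal sketch on made-up data (the `toy` tibble is hypothetical and only stands in for `hatecrimes`):

```r
library(dplyr)

# Toy data standing in for hatecrimes; the rows are made up
toy <- tibble(
  biasmotivedescription = c("ANTI-ASIAN", "ANTI-JEWISH", "ANTI-JEWISH",
                            "ANTI-BLACK", "ANTI-JEWISH", "ANTI-ASIAN")
)

# count() groups, tallies, and (with sort = TRUE) orders by descending n
toy_count <- toy |>
  count(biasmotivedescription, sort = TRUE)

print(toy_count)
```

Unlike the `group_by()` version, the result is not grouped, so the `# Groups:` line disappears from the printout.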

Visualize these counts as a bar graph

#grammar of graphics plot
ggplot(hatecrimes,aes(x=biasmotivedescription))+
  #bar chart
  geom_bar()

Use inclusion/exclusion criteria to filter: keep only the ten most frequent bias motives

bias_count |>
  head(10) |>
  ggplot(aes(x=biasmotivedescription, y=n))+
  geom_col()

Arrange the bars according to height and rotate

bias_count |>
  head(10) |>
  # reorder() sorts the motives by their count n
  ggplot(aes(x=reorder(biasmotivedescription,n),y=n))+
  geom_col()+
  # layers added to a ggplot (coord_flip(), labs(), ...) are chained with +
  coord_flip()
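
What `reorder()` does is easier to see outside the plot. A minimal sketch with made-up labels and counts:

```r
# Made-up motives and counts
motive <- c("A", "B", "C")
n <- c(30, 10, 20)

# reorder() returns a factor whose levels are sorted by the second argument
ordered_motive <- reorder(motive, n)

print(levels(ordered_motive))
```

ggplot2 draws a discrete axis in level order, so the smallest count comes first. After `coord_flip()` that puts the longest bar at the top.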

Add title, caption for the data source, and x-axis label

bias_count |>
  head(10) |>
  ggplot(aes(x=reorder(biasmotivedescription, n), y=n))+
  geom_col()+
  coord_flip()+
  # x="" drops the axis title; the bar labels already name each motive
  labs(x="",
      y="Counts of hatecrime types based on motive",
      title="Bar Graph of Hate Crimes from 2019-2026",
      subtitle="Counts based on the hatecrime motive",
      caption="Source: NY State Division of Criminal Justice Services")

Finally add color and change the theme

bias_count |>
  head(10) |>
  ggplot(aes(x=reorder(biasmotivedescription,n), y=n)) +
  geom_col(fill="salmon")+
  coord_flip()+
  labs(x="",
       y="Counts of hatecrime types based on motive",
       title="Bar Graph of Hate Crimes from 2019-2026",
       subtitle="Counts based on the hatecrime motive",
       caption="Source: NY State Division of Criminal Justice Services")+
  # theme_minimal() swaps the default grey background for a plain white one
  theme_minimal()

Add annotations for counts and remove the x-axis values

bias_count |>
  head(10) |>
  ggplot(aes(x=reorder(biasmotivedescription, n), y=n))+
  geom_col(fill="lightblue")+
  coord_flip()+
  labs(x="",
       y="Counts of hatecrime types based on motive",
       title="Bar Graph of Hate Crimes from 2019-2026",
       subtitle="Counts based on the hatecrime motive",
       caption="Source: NY State Division of Criminal Justice Services")+
  theme_minimal()+
  # geom_text() prints each count just past the end of its bar
  geom_text(aes(label=n),hjust=-.05, size=4)+
  theme(axis.text.x=element_blank())

Look deeper into crimes against Jewish, Asian, Black people, and gay males

First check the year totals

hate_year <- hatecrimes |>
  # %in% keeps rows whose bias motive appears in the vector
  filter(biasmotivedescription %in% c("ANTI-JEWISH", "ANTI-MALE HOMOSEXUAL (GAY)", "ANTI-ASIAN", "ANTI-BLACK")) |>
  group_by(complaintyearnumber) |>
  count(biasmotivedescription) |>
  arrange(desc(n))
hate_year

# A tibble: 28 × 3
# Groups:   complaintyearnumber [7]
   complaintyearnumber biasmotivedescription          n
                 <dbl> <chr>                      <int>
 1                2024 ANTI-JEWISH                  371
 2                2023 ANTI-JEWISH                  343
 3                2025 ANTI-JEWISH                  320
 4                2022 ANTI-JEWISH                  279
 5                2019 ANTI-JEWISH                  252
 6                2021 ANTI-JEWISH                  215
 7                2021 ANTI-ASIAN                   150
 8                2020 ANTI-JEWISH                  126
 9                2023 ANTI-MALE HOMOSEXUAL (GAY)   116
10                2022 ANTI-ASIAN                    91
# ℹ 18 more rows
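
The `%in%` test used in the filter can be tried on its own. A minimal sketch; the vector `motives` is a hand-picked sample of values from the bias motive table above:

```r
# A few bias motives taken from the counts table
motives <- c("ANTI-JEWISH", "ANTI-MUSLIM", "ANTI-ASIAN", "ANTI-OTHER ETHNICITY")

# The motives we want to keep
keep <- c("ANTI-JEWISH", "ANTI-ASIAN")

# %in% returns one TRUE/FALSE per element of the left-hand side
is_kept <- motives %in% keep
print(is_kept)

# filter() keeps the rows where this test is TRUE
print(motives[is_kept])
```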

Then check the county totals

hate_county <- hatecrimes |>
  filter(biasmotivedescription %in% c("ANTI-JEWISH", "ANTI-MALE HOMOSEXUAL (GAY)", "ANTI-ASIAN", "ANTI-BLACK"))|>
  group_by(county) |>
  count(biasmotivedescription)|>
  arrange(desc(n))
hate_county

# A tibble: 20 × 3
# Groups:   county [5]
   county   biasmotivedescription          n
   <chr>    <chr>                      <int>
 1 KINGS    ANTI-JEWISH                  798
 2 NEW YORK ANTI-JEWISH                  651
 3 QUEENS   ANTI-JEWISH                  289
 4 NEW YORK ANTI-MALE HOMOSEXUAL (GAY)   237
 5 NEW YORK ANTI-ASIAN                   228
 6 KINGS    ANTI-MALE HOMOSEXUAL (GAY)   120
 7 KINGS    ANTI-BLACK                    99
 8 BRONX    ANTI-JEWISH                   92
 9 QUEENS   ANTI-MALE HOMOSEXUAL (GAY)    91
10 KINGS    ANTI-ASIAN                    80
11 NEW YORK ANTI-BLACK                    79
12 QUEENS   ANTI-ASIAN                    78
13 RICHMOND ANTI-JEWISH                   76
14 QUEENS   ANTI-BLACK                    75
15 BRONX    ANTI-MALE HOMOSEXUAL (GAY)    35
16 RICHMOND ANTI-BLACK                    35
17 BRONX    ANTI-BLACK                    27
18 BRONX    ANTI-ASIAN                    10
19 RICHMOND ANTI-MALE HOMOSEXUAL (GAY)     6
20 RICHMOND ANTI-ASIAN                     5

Check information combining totals from counties and years

hate2 <- hatecrimes |>
  filter(biasmotivedescription %in% c("ANTI-JEWISH", "ANTI-MALE HOMOSEXUAL (GAY)", "ANTI-ASIAN", "ANTI-BLACK"))|>
  group_by(complaintyearnumber, county) |>
  count(biasmotivedescription)|>
  arrange(desc(n))
hate2

# A tibble: 127 × 4
# Groups:   complaintyearnumber, county [35]
   complaintyearnumber county   biasmotivedescription     n
                 <dbl> <chr>    <chr>                 <int>
 1                2024 KINGS    ANTI-JEWISH             152
 2                2024 NEW YORK ANTI-JEWISH             136
 3                2025 KINGS    ANTI-JEWISH             136
 4                2019 KINGS    ANTI-JEWISH             128
 5                2023 KINGS    ANTI-JEWISH             126
 6                2022 KINGS    ANTI-JEWISH             125
 7                2023 NEW YORK ANTI-JEWISH             124
 8                2025 NEW YORK ANTI-JEWISH             110
 9                2022 NEW YORK ANTI-JEWISH             104
10                2021 NEW YORK ANTI-ASIAN               84
# ℹ 117 more rows

Plot these four types of hate crimes together

ggplot(data=hate2)+
  geom_bar(aes(x=complaintyearnumber, y=n,fill=biasmotivedescription),
           # position="dodge" places the bars side by side
           position="dodge",stat="identity")+
  labs(fill="Hate Crime Type",
       y="Number of Hate Crime Incidents",
       title="Hate Crime Type in NY Counties Between 2019-2026",
       caption="Source: NY State Division of Criminal Justice Services")
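
One caveat about this plot: `hate2` has one row per year, county, and motive, so each year/motive pair appears up to five times. As far as I can tell, dodged bars for the same pair are drawn on top of each other rather than added up, so the bar heights may not show yearly totals. A minimal sketch of summing over counties first, on a toy table (three counts come from the `hate2` output above, and the ANTI-ASIAN row is made up):

```r
library(dplyr)

# Toy stand-in for hate2: one row per year, county, and motive
toy_hate2 <- tibble(
  complaintyearnumber = c(2024, 2024, 2024, 2025),
  county = c("KINGS", "NEW YORK", "KINGS", "KINGS"),
  biasmotivedescription = c("ANTI-JEWISH", "ANTI-JEWISH", "ANTI-ASIAN", "ANTI-JEWISH"),
  n = c(152L, 136L, 10L, 136L)  # the ANTI-ASIAN count is made up
)

# Collapse the county dimension so each year/motive pair has a single total
toy_year <- toy_hate2 |>
  group_by(complaintyearnumber, biasmotivedescription) |>
  summarise(n = sum(n), .groups = "drop")

print(toy_year)
```

The summarised table (or `hate_year`, which already holds yearly totals) could then be passed to the same `ggplot()` call.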

What about the counties?

ggplot(data = hate2) +
  geom_bar(aes(x=county, y=n, fill = biasmotivedescription),
      position = "dodge", stat = "identity") +
  labs(fill = "Hate Crime Type",
       y = "Number of Hate Crime Incidents",
       title = "Hate Crime Type in NY Counties Between 2019-2026",
       caption = "Source: NY State Division of Criminal Justice Services")

The highest counts are anti-Jewish incidents in Kings and New York counties

Put it all together with years and counties using “facet”

ggplot(data = hate2) +
  geom_bar(aes(x=complaintyearnumber, y=n, fill = biasmotivedescription),
      position = "dodge", stat = "identity") +
  facet_wrap(~county)+
  labs(fill = "Hate Crime Type",
       y = "Number of Hate Crime Incidents",
       title = "Hate Crime Type in NY Counties Between 2019-2026",
       caption = "Source: NY State Division of Criminal Justice Services")

How would the comparison change if hate crimes in each county and year were measured relative to population?

nypop <- read_csv("nyc_census_pop_2020.csv")

Rows: 62 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Area Name, Population Percent Change
num (2): 2020 Census Population, Population Change

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Clean the county name to match the other dataset

nypop$`Area Name` <-gsub("county", "", nypop$`Area Name`)
nypop2 <- nypop |>
  rename(county='Area Name') |>
  select(county, '2020 Census Population')
head(nypop2)

# A tibble: 6 × 2
  county             `2020 Census Population`
  <chr>                                 <dbl>
1 Albany County                        314848
2 Allegany County                       46456
3 Bronx County                        1472654
4 Broome County                        198683
5 Cattaraugus County                    77042
6 Cayuga County                         76248

Join the hate2 data with nypop2

# left_join keeps every row of hate2 and adds the matching columns from nypop2
datajoin <-left_join(hate2, nypop2, by=c("county"))
datajoin

# A tibble: 127 × 5
# Groups:   complaintyearnumber, county [35]
   complaintyearnumber county biasmotivedescription     n 2020 Census Populati…¹
                 <dbl> <chr>  <chr>                 <int>                  <dbl>
 1                2024 KINGS  ANTI-JEWISH             152                     NA
 2                2024 NEW Y… ANTI-JEWISH             136                     NA
 3                2025 KINGS  ANTI-JEWISH             136                     NA
 4                2019 KINGS  ANTI-JEWISH             128                     NA
 5                2023 KINGS  ANTI-JEWISH             126                     NA
 6                2022 KINGS  ANTI-JEWISH             125                     NA
 7                2023 NEW Y… ANTI-JEWISH             124                     NA
 8                2025 NEW Y… ANTI-JEWISH             110                     NA
 9                2022 NEW Y… ANTI-JEWISH             104                     NA
10                2021 NEW Y… ANTI-ASIAN               84                     NA
# ℹ 117 more rows
# ℹ abbreviated name: ¹`2020 Census Population`

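It didn’t work: the new column has only NA values, because no county key matches ("BRONX" in the hate crime data versus "Bronx County" in the population data). Start by converting both keys to lowercase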

hate_new <- hate2 |>
  # str_to_lower() converts the county names to lowercase
  mutate(county=as_factor(str_to_lower(as.character(county))))
nypop_new <- nypop2 |>
  mutate(county=as_factor(str_to_lower(as.character(county))))

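Try joining again. The population column is still NA: lowercasing alone leaves "bronx" in one table and "bronx county" in the other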

datajoin <- left_join(hate_new,nypop_new, by=c("county"))
datajoin

# A tibble: 127 × 5
# Groups:   complaintyearnumber, county [35]
   complaintyearnumber county biasmotivedescription     n 2020 Census Populati…¹
                 <dbl> <fct>  <chr>                 <int>                  <dbl>
 1                2024 kings  ANTI-JEWISH             152                     NA
 2                2024 new y… ANTI-JEWISH             136                     NA
 3                2025 kings  ANTI-JEWISH             136                     NA
 4                2019 kings  ANTI-JEWISH             128                     NA
 5                2023 kings  ANTI-JEWISH             126                     NA
 6                2022 kings  ANTI-JEWISH             125                     NA
 7                2023 new y… ANTI-JEWISH             124                     NA
 8                2025 new y… ANTI-JEWISH             110                     NA
 9                2022 new y… ANTI-JEWISH             104                     NA
10                2021 new y… ANTI-ASIAN               84                     NA
# ℹ 117 more rows
# ℹ abbreviated name: ¹`2020 Census Population`
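
One way the keys could be reconciled is to strip the " county" suffix as well as lowercasing. The sketch below is not the original workflow: `clean_key()` is a hypothetical helper, and the toy tables only stand in for `hate2` and `nypop2`. The Bronx counts and both population figures are taken from the outputs above.

```r
library(dplyr)
library(stringr)

# Toy stand-ins for hate2 and nypop2 (values copied from the outputs above)
toy_hate <- tibble(
  county = c("BRONX", "BRONX"),
  biasmotivedescription = c("ANTI-JEWISH", "ANTI-BLACK"),
  n = c(92L, 27L)
)
toy_pop <- tibble(
  county = c("Albany County", "Bronx County"),
  `2020 Census Population` = c(314848, 1472654)
)

# Hypothetical helper: lowercase, drop a trailing " county", trim whitespace
clean_key <- function(x) {
  x |>
    str_to_lower() |>
    str_remove(" county$") |>
    str_trim()
}

# Clean both keys the same way, then join
toy_join <- left_join(
  mutate(toy_hate, county = clean_key(county)),
  mutate(toy_pop, county = clean_key(county)),
  by = "county"
)

print(toy_join)
```

This assumes the population file names the five New York City counties the same way as the rows shown by `head(nypop2)`. If it does not, `anti_join()` on the cleaned keys would list the counties that still fail to match.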

Calculate the rate of incidents per 100,000. Then arrange in descending order

datajoinrate <- datajoin |>
  # n = incident count; backticks are needed because the column name contains spaces; the rate is per 100,000 residents
  mutate(rate = n/`2020 Census Population`* 100000) |>
  arrange(desc(rate))
datajoinrate

# A tibble: 127 × 6
# Groups:   complaintyearnumber, county [35]
   complaintyearnumber county biasmotivedescription     n 2020 Census Populati…¹
                 <dbl> <fct>  <chr>                 <int>                  <dbl>
 1                2024 kings  ANTI-JEWISH             152                     NA
 2                2024 new y… ANTI-JEWISH             136                     NA
 3                2025 kings  ANTI-JEWISH             136                     NA
 4                2019 kings  ANTI-JEWISH             128                     NA
 5                2023 kings  ANTI-JEWISH             126                     NA
 6                2022 kings  ANTI-JEWISH             125                     NA
 7                2023 new y… ANTI-JEWISH             124                     NA
 8                2025 new y… ANTI-JEWISH             110                     NA
 9                2022 new y… ANTI-JEWISH             104                     NA
10                2021 new y… ANTI-ASIAN               84                     NA
# ℹ 117 more rows
# ℹ abbreviated name: ¹`2020 Census Population`
# ℹ 1 more variable: rate <dbl>
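
The rate formula can be worked through by hand for one county. The sketch below uses the Bronx figures shown earlier: 92 anti-Jewish incidents (the all-years total from `hate_county`, not a single year) and a 2020 population of 1,472,654. It also shows why every rate above is NA while the population column is NA.

```r
# Bronx, ANTI-JEWISH: all-years total from hate_county, population from nypop2
n_incidents <- 92
population <- 1472654

# Incidents per 100,000 residents
rate <- n_incidents / population * 100000
print(round(rate, 2))

# Arithmetic with NA gives NA, so a failed join yields an all-NA rate column
rate_missing <- n_incidents / NA_real_ * 100000
print(rate_missing)
```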

Essay

I think the positive aspect of this hate crimes dataset is that it spans multiple years (2019 to 2026) and compares different types of bias motives. The dataset is relatively complete and diverse, which helps us draw fairer conclusions.

However, some cases probably go unreported for various reasons. Additionally, the definition of “hate crime” is not standardized or precise. Different people interpret it differently, which may compromise the dataset’s quality. Last but not least, it would be better if this dataset included details of individual cases, so that we, as readers, could get a clearer picture of the victims’ family and educational backgrounds and conduct a more comprehensive, in-depth analysis.

In the future, I would like to explore the relationships between major political or social events and hate crime rates, as well as between educational backgrounds and crime rates.