DATA 110_Hate Crimes Tutorial Assignment

Author

Catherine Z. Matenje

Hate Crimes in NY from 2019-2026

This assignment explores a hate crimes data set that records reported hate crimes in New York City's five counties, by bias motive, from 2019 to 2026.

Step 1: Load required libraries and Hate Crimes Data set, Explore Data set Columns

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(knitr)
setwd("C:/Users/cathe/OneDrive/Desktop/Montgomery College Transition/2025-2026 MONTGOMERY COLLEGE TRANSITION/MC COURSES 25-26/Spring 2026/DATA 110/01. Assignments")
hatecrimes <- read_csv("NYPD_Hate_Crimes_19-26.csv")

Rows: 4029 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (9): Record Create Date, Patrol Borough Name, County, Law Code Category ...
dbl (4): Full Complaint ID, Complaint Year Number, Month Number, Complaint P...
lgl (1): Arrest Date

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# The dataset contains 4029 observations and 14 variables
head(hatecrimes) # View first 6 rows

# A tibble: 6 × 14
  `Full Complaint ID` Complaint Year Numbe…¹ `Month Number` `Record Create Date`
                <dbl>                  <dbl>          <dbl> <chr>               
1             2.02e14                   2019              1 1/23/2019           
2             2.02e14                   2019              2 2/25/2019           
3             2.02e14                   2019              2 2/27/2019           
4             2.02e14                   2019              4 4/16/2019           
5             2.02e14                   2019              6 6/20/2019           
6             2.02e14                   2019              7 7/31/2019           
# ℹ abbreviated name: ¹`Complaint Year Number`
# ℹ 10 more variables: `Complaint Precinct Code` <dbl>,
#   `Patrol Borough Name` <chr>, County <chr>,
#   `Law Code Category Description` <chr>, `Offense Description` <chr>,
#   `PD Code Description` <chr>, `Bias Motive Description` <chr>,
#   `Offense Category` <chr>, `Arrest Date` <lgl>, `Arrest Id` <chr>

names(hatecrimes) # I learned this code in data 101, it reveals the names of all columns/variables

 [1] "Full Complaint ID"             "Complaint Year Number"        
 [3] "Month Number"                  "Record Create Date"           
 [5] "Complaint Precinct Code"       "Patrol Borough Name"          
 [7] "County"                        "Law Code Category Description"
 [9] "Offense Description"           "PD Code Description"          
[11] "Bias Motive Description"       "Offense Category"             
[13] "Arrest Date"                   "Arrest Id"

Step 2: Clean up variable names in data set

names(hatecrimes) <- tolower(names(hatecrimes))
names(hatecrimes) <- gsub(" ","",names(hatecrimes))
head(hatecrimes) # View first 6 rows

# A tibble: 6 × 14
  fullcomplaintid complaintyearnumber monthnumber recordcreatedate
            <dbl>               <dbl>       <dbl> <chr>           
1         2.02e14                2019           1 1/23/2019       
2         2.02e14                2019           2 2/25/2019       
3         2.02e14                2019           2 2/27/2019       
4         2.02e14                2019           4 4/16/2019       
5         2.02e14                2019           6 6/20/2019       
6         2.02e14                2019           7 7/31/2019       
# ℹ 10 more variables: complaintprecinctcode <dbl>, patrolboroughname <chr>,
#   county <chr>, lawcodecategorydescription <chr>, offensedescription <chr>,
#   pdcodedescription <chr>, biasmotivedescription <chr>,
#   offensecategory <chr>, arrestdate <lgl>, arrestid <chr>

# The above code makes all variable names lowercase and removes the spaces from them
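
As a small illustration of what the two cleaning calls do, the sketch below applies them to a hand-typed vector of three of the column names shown above. The vector `example_names` is hypothetical and exists only for this demonstration; it is not part of the assignment code.

```r
# Hypothetical vector holding three of the original column names
example_names <- c("Full Complaint ID", "Bias Motive Description", "County")

# First convert every name to lowercase
clean_names <- tolower(example_names)

# Then remove every space (gsub replaces all matches, not just the first one)
clean_names <- gsub(" ", "", clean_names)

print(clean_names)
```

Using `sub()` instead of `gsub()` would remove only the first space in each name, which is why `gsub()` is the safer choice here.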

Step 3: Explore the bias motive (Variable Name: biasmotivedescription)

bias_count <- hatecrimes |>
  select(biasmotivedescription) |>
  group_by(biasmotivedescription) |>
  count() |>
  arrange(desc(n))
# The above code generates a frequency table: the number of hate crimes for each bias type, sorted from most to least frequent
head(bias_count) # displays the first 6 rows of the frequency table

# A tibble: 6 × 2
# Groups:   biasmotivedescription [6]
  biasmotivedescription          n
  <chr>                      <int>
1 ANTI-JEWISH                 1906
2 ANTI-MALE HOMOSEXUAL (GAY)   489
3 ANTI-ASIAN                   401
4 ANTI-BLACK                   315
5 ANTI-OTHER ETHNICITY         168
6 ANTI-MUSLIM                  156
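
The same kind of frequency table can probably be built more compactly, since `count()` accepts a `sort` argument. The sketch below shows this on a hypothetical toy tibble (`toy` and its `motive` column are made up for illustration), assuming only that dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data: one row per reported incident
toy <- tibble(motive = c("A", "B", "A", "C", "A", "B"))

# count() groups, tallies, and (with sort = TRUE) orders by n in a single call
toy_count <- toy |>
  count(motive, sort = TRUE)

print(toy_count)
```

One difference from the pipeline above is that the result here is not grouped, which avoids surprises in later steps that behave differently on grouped data.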

Step 4: Visualize these counts as a bar graph

Create a bar graph to explore bias motives further

ggplot(hatecrimes, aes(x = biasmotivedescription))+
  geom_bar()

# The above code produces a simple bar graph of the bias motives variable

There are many bias motives (29), and some have very low counts

Step 5: Use inclusion/exclusion criteria to filter

We want a graph that is easy to read, so we will keep only the 10 most frequent bias motives

bias_count |>
  head(10) |>
  ggplot(aes(x=biasmotivedescription, y = n)) +
  geom_col()
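
`head(10)` returns the top 10 only because `bias_count` was already sorted in descending order. A sketch of an alternative that does not depend on row order is `slice_max()`, shown here on hypothetical toy counts. Note that `bias_count` itself is still grouped by `biasmotivedescription`, so it would likely need `ungroup()` before `slice_max()` could pick rows across the whole table.

```r
library(dplyr)

# Hypothetical counts, deliberately left unsorted
toy_count <- tibble(
  motive = c("A", "B", "C", "D"),
  n      = c(5L, 40L, 12L, 33L)
)

# Keep the 2 rows with the largest n, regardless of the original row order
top2 <- toy_count |>
  slice_max(n, n = 2, with_ties = FALSE)

print(top2)
```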

Step 6: Arrange the bars according to height and rotate

bias_count |>
  head(10) |>
  ggplot(aes(x=reorder(biasmotivedescription, n), y = n)) +
  geom_col() +
  coord_flip()

# The above code rearranges the bars from highest to lowest and flips the bar chart sideways
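
To see what `reorder()` is doing to the axis, the sketch below applies it to hypothetical toy vectors and prints the resulting factor levels. The levels are ordered by the second argument from smallest to largest, and after `coord_flip()` the first level is drawn at the bottom, which is why the tallest bar ends up on top.

```r
# Hypothetical motive labels and their counts
motive <- c("A", "B", "C")
n      <- c(5, 40, 12)

# reorder() returns a factor whose levels are sorted by n (ascending)
ordered_motive <- reorder(motive, n)

print(levels(ordered_motive))
```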

Step 7: Add title, caption for the data source, and x-axis label

bias_count |>
  head(10) |>
  ggplot(aes(x=reorder(biasmotivedescription, n), y = n)) +
  geom_col() +
  coord_flip()+
  labs(x = "",
       y = "Counts of hatecrime types based on motive",
       title = "Bar Graph of Hate Crimes from 2019-2026",
       subtitle = "Counts based on the hatecrime motive",
       caption = "Source: NY State Division of Criminal Justice Services")

# The above code adds a title to the graph and labels the axis

Step 8: Finally add color and change the theme

bias_count |>
  head(10) |>
  ggplot(aes(x=reorder(biasmotivedescription, n), y = n)) +
  geom_col(fill = "salmon") +
  coord_flip()+
  labs(x = "",
       y = "Counts of hatecrime types based on motive",
       title = "Bar Graph of Hate Crimes from 2019-2026",
       subtitle = "Counts based on the hatecrime motive",
       caption = "Source: NY State Division of Criminal Justice Services") +
  theme_minimal()

Step 9: Add annotations for counts and remove the x-axis values

bias_count |>
  head(10) |>
  ggplot(aes(x=reorder(biasmotivedescription, n), y = n)) +
  geom_col(fill = "salmon") +
  coord_flip()+
  labs(x = "",
       y = "Counts of hatecrime types based on motive",
       title = "Bar Graph of Hate Crimes from 2019-2026",
       subtitle = "Counts based on the hatecrime motive",
       caption = "Source: NY State Division of Criminal Justice Services") +
  theme_minimal()+
  geom_text(aes(label = n), hjust = -.05, size = 3) +
  theme(axis.text.x = element_blank())

Step 10: Look deeper into crimes against Jewish, Asian, Black people, and gay males

Explore by Year

hate_year <- hatecrimes |>
  filter(biasmotivedescription %in% c("ANTI-JEWISH", "ANTI-MALE HOMOSEXUAL (GAY)", "ANTI-ASIAN", "ANTI-BLACK"))|>
  group_by(complaintyearnumber) |>
  count(biasmotivedescription)|>
  arrange(desc(n))
hate_year

# A tibble: 28 × 3
# Groups:   complaintyearnumber [7]
   complaintyearnumber biasmotivedescription          n
                 <dbl> <chr>                      <int>
 1                2024 ANTI-JEWISH                  371
 2                2023 ANTI-JEWISH                  343
 3                2025 ANTI-JEWISH                  320
 4                2022 ANTI-JEWISH                  279
 5                2019 ANTI-JEWISH                  252
 6                2021 ANTI-JEWISH                  215
 7                2021 ANTI-ASIAN                   150
 8                2020 ANTI-JEWISH                  126
 9                2023 ANTI-MALE HOMOSEXUAL (GAY)   116
10                2022 ANTI-ASIAN                    91
# ℹ 18 more rows

Explore by County

hate_county <- hatecrimes |>
  filter(biasmotivedescription %in% c("ANTI-JEWISH", "ANTI-MALE HOMOSEXUAL (GAY)", "ANTI-ASIAN", "ANTI-BLACK"))|>
  group_by(county) |>
  count(biasmotivedescription)|>
  arrange(desc(n))
hate_county

# A tibble: 20 × 3
# Groups:   county [5]
   county   biasmotivedescription          n
   <chr>    <chr>                      <int>
 1 KINGS    ANTI-JEWISH                  798
 2 NEW YORK ANTI-JEWISH                  651
 3 QUEENS   ANTI-JEWISH                  289
 4 NEW YORK ANTI-MALE HOMOSEXUAL (GAY)   237
 5 NEW YORK ANTI-ASIAN                   228
 6 KINGS    ANTI-MALE HOMOSEXUAL (GAY)   120
 7 KINGS    ANTI-BLACK                    99
 8 BRONX    ANTI-JEWISH                   92
 9 QUEENS   ANTI-MALE HOMOSEXUAL (GAY)    91
10 KINGS    ANTI-ASIAN                    80
11 NEW YORK ANTI-BLACK                    79
12 QUEENS   ANTI-ASIAN                    78
13 RICHMOND ANTI-JEWISH                   76
14 QUEENS   ANTI-BLACK                    75
15 BRONX    ANTI-MALE HOMOSEXUAL (GAY)    35
16 RICHMOND ANTI-BLACK                    35
17 BRONX    ANTI-BLACK                    27
18 BRONX    ANTI-ASIAN                    10
19 RICHMOND ANTI-MALE HOMOSEXUAL (GAY)     6
20 RICHMOND ANTI-ASIAN                     5

Step 10 (continued): Check information combining totals from counties and years

Combine Year and County

hate2 <- hatecrimes |>
  filter(biasmotivedescription %in% c("ANTI-JEWISH", "ANTI-MALE HOMOSEXUAL (GAY)", "ANTI-ASIAN", "ANTI-BLACK"))|>
  group_by(complaintyearnumber, county) |>
  count(biasmotivedescription)|>
  arrange(desc(n))
hate2

# A tibble: 127 × 4
# Groups:   complaintyearnumber, county [35]
   complaintyearnumber county   biasmotivedescription     n
                 <dbl> <chr>    <chr>                 <int>
 1                2024 KINGS    ANTI-JEWISH             152
 2                2024 NEW YORK ANTI-JEWISH             136
 3                2025 KINGS    ANTI-JEWISH             136
 4                2019 KINGS    ANTI-JEWISH             128
 5                2023 KINGS    ANTI-JEWISH             126
 6                2022 KINGS    ANTI-JEWISH             125
 7                2023 NEW YORK ANTI-JEWISH             124
 8                2025 NEW YORK ANTI-JEWISH             110
 9                2022 NEW YORK ANTI-JEWISH             104
10                2021 NEW YORK ANTI-ASIAN               84
# ℹ 117 more rows

Plot by Year

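# hate2 has one row per year, county, and motive, so plotting it by year alone
# would draw the county bars on top of each other. hate_year already holds one
# total per year and motive.
ggplot(data = hate_year) +
  geom_col(aes(x=complaintyearnumber, y=n, fill = biasmotivedescription),
      position = "dodge") +
  labs(fill = "Hate Crime Type",
       y = "Number of Hate Crime Incidents",
       title = "Hate Crime Type in NY Counties Between 2019-2026",
       caption = "Source: NY State Division of Criminal Justice Services")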

Plot by County

# hate_county holds one total per county and motive, so the bars do not overlap
# the way they would with the year-by-county rows in hate2.
ggplot(data = hate_county) +
  geom_col(aes(x=county, y=n, fill = biasmotivedescription),
      position = "dodge") +
  labs(fill = "Hate Crime Type",
       y = "Number of Hate Crime Incidents",
       title = "Hate Crime Type in NY Counties Between 2019-2026",
       caption = "Source: NY State Division of Criminal Justice Services")

The highest counts of hate crimes were against Jewish people, gay men, and Asian people, and they were concentrated in Kings County (Brooklyn) and New York County (Manhattan)

Step 11: Facet by County

ggplot(data = hate2) +
  geom_bar(aes(x=complaintyearnumber, y=n, fill = biasmotivedescription),
      position = "dodge", stat = "identity") +
  facet_wrap(~county) +
  labs(fill = "Hate Crime Type",
       y = "Number of Hate Crime Incidents",
       title = "Hate Crime Type in NY Counties Between 2019-2026",
       caption = "Source: NY State Division of Criminal Justice Services")

Step 12: How would calculations be affected by looking at hate crimes in counties per year by population densities?

We will merge the current data set with population data from the NYC 2020 Census

Load NYC Census Population Data

nypop <- read_csv("nyc_census_pop_2020.csv")

Rows: 62 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Area Name, Population Percent Change
num (2): 2020 Census Population, Population Change

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Clean County names in Population data set to match current data set

nypop$`Area Name` <- gsub(" County", "", nypop$`Area Name`)
nypop2 <- nypop |>
  rename(county = `Area Name`)|>
  select(county, `2020 Census Population`)
head(nypop2)

# A tibble: 6 × 2
  county      `2020 Census Population`
  <chr>                          <dbl>
1 Albany                        314848
2 Allegany                       46456
3 Bronx                        1472654
4 Broome                        198683
5 Cattaraugus                    77042
6 Cayuga                         76248

# This code also renames the variable "Area Name" in the census data to "county" so it matches the current data set

Join the hate2 data with nypop data

datajoin <- left_join(hate2, nypop2, by=c("county"))
datajoin

# A tibble: 127 × 5
# Groups:   complaintyearnumber, county [35]
   complaintyearnumber county biasmotivedescription     n 2020 Census Populati…¹
                 <dbl> <chr>  <chr>                 <int>                  <dbl>
 1                2024 KINGS  ANTI-JEWISH             152                     NA
 2                2024 NEW Y… ANTI-JEWISH             136                     NA
 3                2025 KINGS  ANTI-JEWISH             136                     NA
 4                2019 KINGS  ANTI-JEWISH             128                     NA
 5                2023 KINGS  ANTI-JEWISH             126                     NA
 6                2022 KINGS  ANTI-JEWISH             125                     NA
 7                2023 NEW Y… ANTI-JEWISH             124                     NA
 8                2025 NEW Y… ANTI-JEWISH             110                     NA
 9                2022 NEW Y… ANTI-JEWISH             104                     NA
10                2021 NEW Y… ANTI-ASIAN               84                     NA
# ℹ 117 more rows
# ℹ abbreviated name: ¹`2020 Census Population`

# Problem identified: the county names do not match, so every population value is NA. In hate2 the counties are in upper case; in nypop2 they are in mixed case
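
One way to confirm this kind of key mismatch is `anti_join()`, which returns the rows of the first table that have no match in the second. The sketch below uses two hypothetical miniature tables (`crimes` and `pop`, with counts and populations copied from the output above) rather than the real data.

```r
library(dplyr)

# Hypothetical miniature versions of the two tables
crimes <- tibble(county = c("KINGS", "NEW YORK"), n = c(152L, 136L))
pop    <- tibble(county = c("Kings", "New York"),
                 population = c(2736074, 1694251))

# Rows of crimes with no matching county in pop: both rows, because case differs
unmatched_before <- anti_join(crimes, pop, by = "county")

# Normalize the case on both sides, then check again
crimes_lc <- crimes |> mutate(county = tolower(county))
pop_lc    <- pop    |> mutate(county = tolower(county))
unmatched_after <- anti_join(crimes_lc, pop_lc, by = "county")

print(nrow(unmatched_before))
print(nrow(unmatched_after))
```

An empty `anti_join()` result after normalizing suggests that every county now has a population to join to.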

Fix the County Names

hate_new <- hate2 |>
  mutate(county = as_factor(str_to_lower(as.character(county))))
nypop_new <- nypop2 |>
  mutate(county = as_factor(str_to_lower(as.character(county))))

#Changes county names to lowercase

Attempt to Join the hate2 data with nypop data AGAIN

datajoin <- left_join(hate_new, nypop_new, by=c("county"))
datajoin

# A tibble: 127 × 5
# Groups:   complaintyearnumber, county [35]
   complaintyearnumber county biasmotivedescription     n 2020 Census Populati…¹
                 <dbl> <fct>  <chr>                 <int>                  <dbl>
 1                2024 kings  ANTI-JEWISH             152                2736074
 2                2024 new y… ANTI-JEWISH             136                1694251
 3                2025 kings  ANTI-JEWISH             136                2736074
 4                2019 kings  ANTI-JEWISH             128                2736074
 5                2023 kings  ANTI-JEWISH             126                2736074
 6                2022 kings  ANTI-JEWISH             125                2736074
 7                2023 new y… ANTI-JEWISH             124                1694251
 8                2025 new y… ANTI-JEWISH             110                1694251
 9                2022 new y… ANTI-JEWISH             104                1694251
10                2021 new y… ANTI-ASIAN               84                1694251
# ℹ 117 more rows
# ℹ abbreviated name: ¹`2020 Census Population`

Step 13: Calculate the rate of incidents per 100,000. Then arrange in descending order

datajoinrate <- datajoin |>
  mutate(rate = n/`2020 Census Population`* 100000) |>
  arrange(desc(rate))
datajoinrate

# A tibble: 127 × 6
# Groups:   complaintyearnumber, county [35]
   complaintyearnumber county biasmotivedescription     n 2020 Census Populati…¹
                 <dbl> <fct>  <chr>                 <int>                  <dbl>
 1                2024 new y… ANTI-JEWISH             136                1694251
 2                2023 new y… ANTI-JEWISH             124                1694251
 3                2025 new y… ANTI-JEWISH             110                1694251
 4                2022 new y… ANTI-JEWISH             104                1694251
 5                2024 kings  ANTI-JEWISH             152                2736074
 6                2025 kings  ANTI-JEWISH             136                2736074
 7                2021 new y… ANTI-ASIAN               84                1694251
 8                2021 new y… ANTI-JEWISH              84                1694251
 9                2019 kings  ANTI-JEWISH             128                2736074
10                2023 kings  ANTI-JEWISH             126                2736074
# ℹ 117 more rows
# ℹ abbreviated name: ¹`2020 Census Population`
# ℹ 1 more variable: rate <dbl>

#This code standardizes the counts by population (per 100,000)

The highest hate crime rates are in New York and Kings counties
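
As a worked check of the rate formula, the sketch below recomputes two rows by hand, using the counts and 2020 populations printed in the joined table. The variable names are hypothetical. It illustrates why New York County ranks above Kings County on rate even though Kings has the larger raw count.

```r
# Counts and 2020 Census populations copied from the joined table
n_new_york   <- 136      # anti-Jewish incidents, New York County, 2024
pop_new_york <- 1694251
n_kings      <- 152      # anti-Jewish incidents, Kings County, 2024
pop_kings    <- 2736074

# Incidents per 100,000 residents
rate_new_york <- n_new_york / pop_new_york * 100000
rate_kings    <- n_kings / pop_kings * 100000

print(round(c(new_york = rate_new_york, kings = rate_kings), 2))
```

Because the same 2020 population is used for every year, these rates are approximations for years other than 2020.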

Essay

Write about the positive and negative aspects of this hate crimes data set.
List 2 different paths you would hypothetically like to study about this data set at some future point.

One positive aspect of the NY hate crimes data set is that it provides detailed information about reported hate crimes by type (bias motive), county, and year. This allows for comparisons across time and location, which is crucial for identifying trends over time. Another positive aspect is that the data set includes additional descriptive variables, such as record create date, which can be useful in understanding trends in hate crimes within specific date ranges.

Some negative aspects are that hate crimes are very likely underreported, which means this data set may not represent the true number of incidents from 2019 to 2026. Additionally, since the data span several counties, boroughs, and precincts, reporting practices may vary and be inconsistent. Another limitation is that there is little qualitative information about each incident. Including details about each incident could allow for further analysis using mixed methods to better understand the patterns of hate crimes in NY.

One path I would want to take is a time series analysis focusing specifically on election years to observe trends in hate crimes during that time. Another possible path would be to incorporate more detailed incident information to identify patterns in types of hate crimes, which could help inform prevention strategies in the future.