Project 1

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.1
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(dplyr)
library(lubridate)
library(zoo)

## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

setwd("~/Desktop/Project 1 DATA101")

df <- read_csv("Police_Department_Investigated_Hate_Crimes.csv")

## Rows: 1790 Columns: 23
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (13): record_id, suspects_race_as_a_group, most_serious_ucr, most_serio...
## dbl   (8): ncic, total_number_of_victims, total_number_of_individual_victims...
## lgl   (1): is_multiple_bias
## date  (1): occurence_month
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Introduction:

Which type of hate crime bias was reported most often in San Francisco between 2016 and 2025?

What is a hate crime? A hate crime is a crime motivated by bias against an individual’s real or perceived race, color, religion, ethnicity, sexual orientation, gender identity, or disability. The data set contains hate crimes reported by the SFPD (San Francisco Police Department) to the California Department of Justice from 2001 to September 2025. It was obtained from data.gov.

This data set consists of 1790 observations across 23 variables, including suspects_race_as_a_group, offensive_act, most_serious_ucr, and most_serious_bias. These variables describe each report in detail, such as when the crime occurred, where it occurred, and the type of bias involved. The variables I will use in this project are most_serious_bias and occurence_month (the column name is spelled this way in the data set). With these variables, I plan to compare which type of hate crime bias was reported most in each year between 2016 and 2025.

Data Analysis:

In this data analysis, I will count the reports for each type of hate crime bias in each year and identify the type that is reported the most. I first create a data frame that contains only records from 2016 to 2025 and tabulate the number of reports for each bias type over that period. The next objective is to find the most reported bias for every year. For each year, I filter the 2016 to 2025 data frame down to that year, group by ‘most_serious_bias’, and summarize the count to get the number of reports for each bias. I then compare the counts to find the maximum and output the bias type (or types, when there is a tie) with the most reports.
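The steps described above can be sketched end to end on a small example. This is a minimal sketch, not the analysis itself: the `toy` data frame and its values are hypothetical, and `slice_max()` with `with_ties = TRUE` is one possible way to keep every bias type that shares the yearly maximum.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy records; the real analysis uses the SFPD data set.
toy <- data.frame(
  occurence_month = as.Date(c("2015-05-01", "2016-03-01", "2016-07-01",
                              "2016-09-01", "2017-01-01", "2017-02-01")),
  most_serious_bias = c("Anti-Asian", "Anti-Gay (Male)", "Anti-Gay (Male)",
                        "Anti-Jewish", "Anti-Asian", "Anti-Jewish")
)

top_by_year <- toy |>
  mutate(year = year(occurence_month)) |>          # extract the year from the date
  filter(between(year, 2016, 2025)) |>             # keep only 2016 to 2025
  count(year, most_serious_bias, name = "Count") |> # reports per year and bias
  group_by(year) |>
  slice_max(Count, n = 1, with_ties = TRUE) |>     # keep all rows tied for the yearly maximum
  ungroup()

print(top_by_year)
```

In this toy example, 2016 has a single most reported bias, while 2017 keeps two rows because two bias types tie.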

Explore data

str(df)

## spc_tbl_ [1,790 × 23] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ record_id                                  : chr [1:1790] "CA01-0000001892" "CA01-0000001899" "CA01-0000001906" "CA01-0000001913" ...
##  $ occurence_month                            : Date[1:1790], format: "2001-12-01" "2001-01-01" ...
##  $ ncic                                       : num [1:1790] 3801 3801 3801 3801 NA ...
##  $ total_number_of_victims                    : num [1:1790] 1 1 1 1 1 1 1 1 1 1 ...
##  $ total_number_of_individual_victims         : num [1:1790] 1 1 1 1 1 1 1 0 0 2 ...
##  $ suspects_race_as_a_group                   : chr [1:1790] "White" "Unknown" "White" "White" ...
##  $ total_number_of_suspects                   : num [1:1790] 1 0 2 1 2 1 5 1 0 1 ...
##  $ most_serious_ucr                           : chr [1:1790] "Intimidation" "Intimidation" "Intimidation" "Simple Assault" ...
##  $ most_serious_ucr_type                      : chr [1:1790] "Violent Crimes" "Violent Crimes" "Violent Crimes" "Violent Crimes" ...
##  $ most_serious_location                      : chr [1:1790] "Highway/Road/Alley/Street" "Commercial/Office Building" "Highway/Road/Alley/Street" "Parking Lot/Garage" ...
##  $ most_serious_bias                          : chr [1:1790] "Anti-Transgender" "Anti-Lesbian/Gay/Bisexual or Transgender (Mixed Group)" "Anti-Black or African American" "Anti-Other Race/Ethnicity/Ancestry" ...
##  $ most_serious_bias_type                     : chr [1:1790] "Gender Nonconforming" "Sexual Orientation" "Race/Ethnicity/Ancestry" "Race/Ethnicity/Ancestry" ...
##  $ most_serious_victim_type                   : chr [1:1790] "Person" "Person" "Person" "Person" ...
##  $ weapon_type                                : chr [1:1790] NA NA NA "Other (bottle, rocks, spitting)" ...
##  $ offensive_act                              : chr [1:1790] "Verbal slurs" "Threatening letters/flyers/email" "Verbal slurs" "Verbal slurs" ...
##  $ is_multiple_bias                           : logi [1:1790] NA NA NA NA NA NA ...
##  $ total_number_of_individual_victims_adult   : num [1:1790] NA NA NA NA 1 1 1 0 0 2 ...
##  $ total_number_of_individual_victims_juvenile: num [1:1790] NA NA NA NA 0 0 0 0 0 0 ...
##  $ total_number_of_suspects_adult             : num [1:1790] NA NA NA NA 2 0 0 1 0 1 ...
##  $ total_number_of_suspects_juvenile          : num [1:1790] NA NA NA NA 0 0 0 0 0 0 ...
##  $ suspects_ethnicity_as_a_group              : chr [1:1790] NA NA NA NA ...
##  $ data_as_of                                 : chr [1:1790] "2025/09/18 12:00:00 AM" "2025/09/18 12:00:00 AM" "2025/09/18 12:00:00 AM" "2025/09/18 12:00:00 AM" ...
##  $ data_loaded_at                             : chr [1:1790] "2025/09/20 11:03:55 AM" "2025/09/20 11:03:55 AM" "2025/09/20 11:03:55 AM" "2025/09/20 11:03:55 AM" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   record_id = col_character(),
##   ..   occurence_month = col_date(format = ""),
##   ..   ncic = col_double(),
##   ..   total_number_of_victims = col_double(),
##   ..   total_number_of_individual_victims = col_double(),
##   ..   suspects_race_as_a_group = col_character(),
##   ..   total_number_of_suspects = col_double(),
##   ..   most_serious_ucr = col_character(),
##   ..   most_serious_ucr_type = col_character(),
##   ..   most_serious_location = col_character(),
##   ..   most_serious_bias = col_character(),
##   ..   most_serious_bias_type = col_character(),
##   ..   most_serious_victim_type = col_character(),
##   ..   weapon_type = col_character(),
##   ..   offensive_act = col_character(),
##   ..   is_multiple_bias = col_logical(),
##   ..   total_number_of_individual_victims_adult = col_double(),
##   ..   total_number_of_individual_victims_juvenile = col_double(),
##   ..   total_number_of_suspects_adult = col_double(),
##   ..   total_number_of_suspects_juvenile = col_double(),
##   ..   suspects_ethnicity_as_a_group = col_character(),
##   ..   data_as_of = col_character(),
##   ..   data_loaded_at = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>

head(df)

## # A tibble: 6 × 23
##   record_id  occurence_month  ncic total_number_of_vict…¹ total_number_of_indi…²
##   <chr>      <date>          <dbl>                  <dbl>                  <dbl>
## 1 CA01-0000… 2001-12-01       3801                      1                      1
## 2 CA01-0000… 2001-01-01       3801                      1                      1
## 3 CA01-0000… 2001-03-01       3801                      1                      1
## 4 CA01-0000… 2001-01-01       3801                      1                      1
## 5 220015368  2022-01-01         NA                      1                      1
## 6 220078413  2022-02-01         NA                      1                      1
## # ℹ abbreviated names: ¹total_number_of_victims,
## #   ²total_number_of_individual_victims
## # ℹ 18 more variables: suspects_race_as_a_group <chr>,
## #   total_number_of_suspects <dbl>, most_serious_ucr <chr>,
## #   most_serious_ucr_type <chr>, most_serious_location <chr>,
## #   most_serious_bias <chr>, most_serious_bias_type <chr>,
## #   most_serious_victim_type <chr>, weapon_type <chr>, offensive_act <chr>, …

Check whether any NAs are present

colSums(is.na(df))

##                                   record_id 
##                                           0 
##                             occurence_month 
##                                           0 
##                                        ncic 
##                                         182 
##                     total_number_of_victims 
##                                           0 
##          total_number_of_individual_victims 
##                                           0 
##                    suspects_race_as_a_group 
##                                           0 
##                    total_number_of_suspects 
##                                           0 
##                            most_serious_ucr 
##                                           0 
##                       most_serious_ucr_type 
##                                         182 
##                       most_serious_location 
##                                           0 
##                           most_serious_bias 
##                                           0 
##                      most_serious_bias_type 
##                                           0 
##                    most_serious_victim_type 
##                                           0 
##                                 weapon_type 
##                                         774 
##                               offensive_act 
##                                          47 
##                            is_multiple_bias 
##                                        1789 
##    total_number_of_individual_victims_adult 
##                                        1221 
## total_number_of_individual_victims_juvenile 
##                                        1221 
##              total_number_of_suspects_adult 
##                                        1221 
##           total_number_of_suspects_juvenile 
##                                        1221 
##               suspects_ethnicity_as_a_group 
##                                        1422 
##                                  data_as_of 
##                                           0 
##                              data_loaded_at 
##                                           0

# No NAs are present in the columns I will be using (occurence_month and most_serious_bias)
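The same check can be limited to the columns the analysis actually uses. The following is a minimal base R sketch on a hypothetical `toy` data frame; `cols_used` is an illustrative name, not part of the original code.

```r
# Hypothetical toy data frame that reuses the column names from the analysis.
toy <- data.frame(
  occurence_month = as.Date(c("2016-03-01", "2016-07-01", "2017-01-01")),
  most_serious_bias = c("Anti-Asian", "Anti-Jewish", "Anti-Asian"),
  weapon_type = c(NA, "Other", NA)
)

# Count missing values only in the columns needed for the analysis.
cols_used <- c("occurence_month", "most_serious_bias")
na_counts <- colSums(is.na(toy[, cols_used]))
print(na_counts)

# Stop early if either column contains a missing value.
stopifnot(all(na_counts == 0))
```

Restricting the check this way keeps the output short and makes the assumption (no missing values in the analysis columns) fail loudly if the data change.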

Explore the number of reports for each bias category

#Frequency table
table(df$most_serious_bias)

## 
##                                              Anti-Arab 
##                                                     42 
##                                             Anti-Asian 
##                                                    210 
##                                          Anti-Bisexual 
##                                                      2 
##                         Anti-Black or African American 
##                                                    229 
##                                          Anti-Catholic 
##                                                     16 
##                                Anti-Citizenship Status 
##                                                      2 
##                                            Anti-Female 
##                                                      4 
##                                        Anti-Gay (Male) 
##                                                    427 
##                             Anti-Gender Non-Conforming 
##                                                      1 
##                                             Anti-Hindu 
##                                                      1 
##                                Anti-Hispanic or Latino 
##                                                     96 
##                                  Anti-Islamic (Muslim) 
##                                                     47 
##                                            Anti-Jewish 
##                                                    187 
##                                           Anti-Lesbian 
##                                                     62 
## Anti-Lesbian/Gay/Bisexual or Transgender (Mixed Group) 
##                                                     54 
##                  Anti-Lesbian/Gay/Bisexual/Transgender 
##                                                     10 
##                                              Anti-Male 
##                                                      1 
##                                 Anti-Mental Disability 
##                                                      1 
##                            Anti-Multiple Races (Group) 
##                                                     36 
##                        Anti-Multiple Religions (Group) 
##                                                      4 
##                                   Anti-Other Christian 
##                                                      5 
##                     Anti-Other Race/Ethnicity/Ancestry 
##                                                    141 
##                                    Anti-Other Religion 
##                                                     13 
##                               Anti-Physical Disability 
##                                                      4 
##                                        Anti-Protestant 
##                                                      1 
##                                       Anti-Transgender 
##                                                     85 
##                                             Anti-White 
##                                                    109

Create a new column containing only the year of the occurrence date

df <- df |>
  mutate(year = year(occurence_month)) #extract the year from the date the crime occurred
head(df)

## # A tibble: 6 × 24
##   record_id  occurence_month  ncic total_number_of_vict…¹ total_number_of_indi…²
##   <chr>      <date>          <dbl>                  <dbl>                  <dbl>
## 1 CA01-0000… 2001-12-01       3801                      1                      1
## 2 CA01-0000… 2001-01-01       3801                      1                      1
## 3 CA01-0000… 2001-03-01       3801                      1                      1
## 4 CA01-0000… 2001-01-01       3801                      1                      1
## 5 220015368  2022-01-01         NA                      1                      1
## 6 220078413  2022-02-01         NA                      1                      1
## # ℹ abbreviated names: ¹total_number_of_victims,
## #   ²total_number_of_individual_victims
## # ℹ 19 more variables: suspects_race_as_a_group <chr>,
## #   total_number_of_suspects <dbl>, most_serious_ucr <chr>,
## #   most_serious_ucr_type <chr>, most_serious_location <chr>,
## #   most_serious_bias <chr>, most_serious_bias_type <chr>,
## #   most_serious_victim_type <chr>, weapon_type <chr>, offensive_act <chr>, …

Records from 2016 to 2025

df_post2016 <- df |>
  filter(year %in% c(2016:2025)) |>
  arrange(year)
print(df_post2016)

## # A tibble: 560 × 24
##    record_id occurence_month  ncic total_number_of_vict…¹ total_number_of_indi…²
##    <chr>     <date>          <dbl>                  <dbl>                  <dbl>
##  1 CA00-000… 2016-03-01       3801                      1                      1
##  2 CA00-000… 2016-06-01       3801                      1                      1
##  3 CA00-000… 2016-06-01       3801                      2                      2
##  4 CA00-000… 2016-08-01       3801                      1                      1
##  5 CA00-000… 2016-10-01       3801                      1                      1
##  6 CA00-000… 2016-09-01       3801                      1                      1
##  7 CA00-000… 2016-02-01       3801                      1                      1
##  8 CA00-000… 2016-04-01       3801                      2                      2
##  9 CA00-000… 2016-06-01       3801                      1                      1
## 10 CA00-000… 2016-08-01       3801                      1                      1
## # ℹ 550 more rows
## # ℹ abbreviated names: ¹total_number_of_victims,
## #   ²total_number_of_individual_victims
## # ℹ 19 more variables: suspects_race_as_a_group <chr>,
## #   total_number_of_suspects <dbl>, most_serious_ucr <chr>,
## #   most_serious_ucr_type <chr>, most_serious_location <chr>,
## #   most_serious_bias <chr>, most_serious_bias_type <chr>, …

Frequency table of bias from 2016-2025

table(df_post2016$most_serious_bias)

## 
##                                              Anti-Arab 
##                                                     11 
##                                             Anti-Asian 
##                                                    118 
##                                          Anti-Bisexual 
##                                                      1 
##                         Anti-Black or African American 
##                                                     77 
##                                          Anti-Catholic 
##                                                      2 
##                                Anti-Citizenship Status 
##                                                      2 
##                                            Anti-Female 
##                                                      2 
##                                        Anti-Gay (Male) 
##                                                     97 
##                             Anti-Gender Non-Conforming 
##                                                      1 
##                                             Anti-Hindu 
##                                                      1 
##                                Anti-Hispanic or Latino 
##                                                     53 
##                                  Anti-Islamic (Muslim) 
##                                                     17 
##                                            Anti-Jewish 
##                                                     71 
##                                           Anti-Lesbian 
##                                                      8 
## Anti-Lesbian/Gay/Bisexual or Transgender (Mixed Group) 
##                                                      6 
##                  Anti-Lesbian/Gay/Bisexual/Transgender 
##                                                     10 
##                                              Anti-Male 
##                                                      1 
##                                 Anti-Mental Disability 
##                                                      1 
##                            Anti-Multiple Races (Group) 
##                                                      6 
##                                   Anti-Other Christian 
##                                                      5 
##                     Anti-Other Race/Ethnicity/Ancestry 
##                                                     15 
##                                    Anti-Other Religion 
##                                                      2 
##                               Anti-Physical Disability 
##                                                      1 
##                                        Anti-Protestant 
##                                                      1 
##                                       Anti-Transgender 
##                                                     28 
##                                             Anti-White 
##                                                     23

# According to the table, Anti-Asian is the most reported hate crime bias in San Francisco between 2016 and 2025 (118 reports)
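Reading the maximum off an alphabetical table is easy to get wrong, so sorting the counts can help. Below is a minimal base R sketch on a hypothetical vector of bias labels; the values are made up for illustration.

```r
# Hypothetical bias labels; the real column is df_post2016$most_serious_bias.
bias <- c("Anti-Asian", "Anti-Jewish", "Anti-Asian",
          "Anti-Gay (Male)", "Anti-Asian", "Anti-Jewish")

# Frequency table sorted from most to least reported.
bias_counts <- sort(table(bias), decreasing = TRUE)
print(bias_counts)

# Share of all reports for each bias type, rounded to two decimals.
bias_share <- round(prop.table(bias_counts), 2)
print(bias_share)
```

With the counts sorted, the most reported bias is the first entry, and the shares show how far ahead it is.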

Count the reports for each bias type, then find the most reported bias for each year

2016

year_2016 <- df_post2016 |>
  filter(year == 2016) |>
  group_by(most_serious_bias) |>
  summarize(Count = n())

print(year_2016)

## # A tibble: 13 × 2
##    most_serious_bias                                      Count
##    <chr>                                                  <int>
##  1 Anti-Arab                                                  1
##  2 Anti-Asian                                                 3
##  3 Anti-Black or African American                             3
##  4 Anti-Female                                                1
##  5 Anti-Gay (Male)                                           11
##  6 Anti-Hindu                                                 1
##  7 Anti-Jewish                                                4
##  8 Anti-Lesbian                                               2
##  9 Anti-Lesbian/Gay/Bisexual or Transgender (Mixed Group)     1
## 10 Anti-Other Religion                                        1
## 11 Anti-Protestant                                            1
## 12 Anti-Transgender                                           4
## 13 Anti-White                                                 3

most_bias_2016 <- year_2016$most_serious_bias[which.max(year_2016$Count)]
print(most_bias_2016)

## [1] "Anti-Gay (Male)"

2017

year_2017 <- df_post2016 |>
  filter(year == 2017) |>
  group_by(most_serious_bias) |>
  summarize(Count = n())

print(year_2017)

## # A tibble: 12 × 2
##    most_serious_bias                                      Count
##    <chr>                                                  <int>
##  1 Anti-Asian                                                 7
##  2 Anti-Bisexual                                              1
##  3 Anti-Black or African American                             5
##  4 Anti-Catholic                                              1
##  5 Anti-Gay (Male)                                            9
##  6 Anti-Hispanic or Latino                                    5
##  7 Anti-Islamic (Muslim)                                      3
##  8 Anti-Jewish                                                3
##  9 Anti-Lesbian/Gay/Bisexual or Transgender (Mixed Group)     1
## 10 Anti-Other Religion                                        1
## 11 Anti-Transgender                                           4
## 12 Anti-White                                                 2

most_bias_2017 <- year_2017$most_serious_bias[which.max(year_2017$Count)]
print(most_bias_2017)

## [1] "Anti-Gay (Male)"

2018

year_2018 <- df_post2016 |>
  filter(year == 2018) |>
  group_by(most_serious_bias) |>
  summarize(Count = n())

print(year_2018)

## # A tibble: 15 × 2
##    most_serious_bias                                      Count
##    <chr>                                                  <int>
##  1 Anti-Arab                                                  1
##  2 Anti-Asian                                                 4
##  3 Anti-Black or African American                            12
##  4 Anti-Citizenship Status                                    1
##  5 Anti-Gay (Male)                                           13
##  6 Anti-Hispanic or Latino                                   14
##  7 Anti-Islamic (Muslim)                                      3
##  8 Anti-Jewish                                                4
##  9 Anti-Lesbian                                               1
## 10 Anti-Lesbian/Gay/Bisexual or Transgender (Mixed Group)     3
## 11 Anti-Multiple Races (Group)                                1
## 12 Anti-Other Christian                                       1
## 13 Anti-Other Race/Ethnicity/Ancestry                         4
## 14 Anti-Transgender                                           2
## 15 Anti-White                                                 4

most_bias_2018 <- year_2018$most_serious_bias[which.max(year_2018$Count)]
print(most_bias_2018)

## [1] "Anti-Hispanic or Latino"

2019

year_2019 <- df_post2016 |>
  filter(year == 2019) |>
  group_by(most_serious_bias) |>
  summarize(Count = n())

print(year_2019)

## # A tibble: 12 × 2
##    most_serious_bias                  Count
##    <chr>                              <int>
##  1 Anti-Arab                              4
##  2 Anti-Asian                             6
##  3 Anti-Black or African American        11
##  4 Anti-Gay (Male)                       21
##  5 Anti-Hispanic or Latino                6
##  6 Anti-Islamic (Muslim)                  3
##  7 Anti-Jewish                            2
##  8 Anti-Lesbian                           1
##  9 Anti-Multiple Races (Group)            3
## 10 Anti-Other Race/Ethnicity/Ancestry     1
## 11 Anti-Transgender                       2
## 12 Anti-White                             4

most_bias_2019 <- year_2019$most_serious_bias[which.max(year_2019$Count)]
print(most_bias_2019)

## [1] "Anti-Gay (Male)"

2020

year_2020 <- df_post2016 |>
  filter(year == 2020) |>
  group_by(most_serious_bias) |>
  summarize(Count = n())

print(year_2020)

## # A tibble: 14 × 2
##    most_serious_bias                                      Count
##    <chr>                                                  <int>
##  1 Anti-Arab                                                  1
##  2 Anti-Asian                                                 9
##  3 Anti-Black or African American                            10
##  4 Anti-Catholic                                              1
##  5 Anti-Citizenship Status                                    1
##  6 Anti-Gay (Male)                                            5
##  7 Anti-Hispanic or Latino                                    9
##  8 Anti-Islamic (Muslim)                                      2
##  9 Anti-Jewish                                                4
## 10 Anti-Lesbian                                               2
## 11 Anti-Lesbian/Gay/Bisexual or Transgender (Mixed Group)     1
## 12 Anti-Other Race/Ethnicity/Ancestry                         2
## 13 Anti-Transgender                                           2
## 14 Anti-White                                                 5

most_bias_2020 <- year_2020$most_serious_bias[which.max(year_2020$Count)]
print(most_bias_2020)

## [1] "Anti-Black or African American"

2021

year_2021 <- df_post2016 |>
  filter(year == 2021) |>
  group_by(most_serious_bias) |>
  summarize(Count = n())

print(year_2021)

## # A tibble: 10 × 2
##    most_serious_bias                  Count
##    <chr>                              <int>
##  1 Anti-Asian                            60
##  2 Anti-Black or African American        14
##  3 Anti-Gay (Male)                       15
##  4 Anti-Hispanic or Latino                5
##  5 Anti-Islamic (Muslim)                  2
##  6 Anti-Jewish                            8
##  7 Anti-Multiple Races (Group)            1
##  8 Anti-Other Race/Ethnicity/Ancestry     6
##  9 Anti-Transgender                       1
## 10 Anti-White                             2

most_bias_2021 <- year_2021$most_serious_bias[which.max(year_2021$Count)]
print(most_bias_2021)

## [1] "Anti-Asian"

2022

year_2022 <- df_post2016 |>
  filter(year == 2022) |>
  group_by(most_serious_bias) |>
  summarize(Count = n())

print(year_2022)

## # A tibble: 13 × 2
##    most_serious_bias                     Count
##    <chr>                                 <int>
##  1 Anti-Arab                                 1
##  2 Anti-Asian                                6
##  3 Anti-Black or African American            3
##  4 Anti-Female                               1
##  5 Anti-Gay (Male)                           6
##  6 Anti-Gender Non-Conforming                1
##  7 Anti-Hispanic or Latino                   3
##  8 Anti-Jewish                               5
##  9 Anti-Lesbian/Gay/Bisexual/Transgender     2
## 10 Anti-Male                                 1
## 11 Anti-Other Christian                      3
## 12 Anti-Transgender                          2
## 13 Anti-White                                2

most_bias_2022 <- year_2022$most_serious_bias[which.max(year_2022$Count)]
print(most_bias_2022)  # which.max() returns only the first maximum; Anti-Gay (Male) also has 6 reports, so 2022 is a tie

## [1] "Anti-Asian"

2023

year_2023 <- df_post2016 |>
  filter(year == 2023) |>
  group_by(most_serious_bias) |>
  summarize(Count = n())

print(year_2023)

## # A tibble: 11 × 2
##    most_serious_bias                     Count
##    <chr>                                 <int>
##  1 Anti-Arab                                 2
##  2 Anti-Asian                               13
##  3 Anti-Black or African American            5
##  4 Anti-Gay (Male)                           4
##  5 Anti-Hispanic or Latino                   3
##  6 Anti-Islamic (Muslim)                     1
##  7 Anti-Jewish                              23
##  8 Anti-Lesbian                              2
##  9 Anti-Lesbian/Gay/Bisexual/Transgender     3
## 10 Anti-Other Christian                      1
## 11 Anti-Transgender                          6

most_bias_2023 <- year_2023$most_serious_bias[which.max(year_2023$Count)]
print(most_bias_2023)

## [1] "Anti-Jewish"

2024

year_2024 <- df_post2016 |>
  filter(year == 2024) |>
  group_by(most_serious_bias) |>
  summarize(Count = n())

print(year_2024)

## # A tibble: 11 × 2
##    most_serious_bias                     Count
##    <chr>                                 <int>
##  1 Anti-Arab                                 1
##  2 Anti-Asian                                5
##  3 Anti-Black or African American           13
##  4 Anti-Gay (Male)                           9
##  5 Anti-Hispanic or Latino                   2
##  6 Anti-Jewish                              13
##  7 Anti-Lesbian/Gay/Bisexual/Transgender     3
##  8 Anti-Mental Disability                    1
##  9 Anti-Other Race/Ethnicity/Ancestry        2
## 10 Anti-Physical Disability                  1
## 11 Anti-Transgender                          2

most_bias_2024 <- year_2024$most_serious_bias[which(year_2024$Count == max(year_2024$Count) )]
print(most_bias_2024)  # Two bias types tie for the maximum

## [1] "Anti-Black or African American" "Anti-Jewish"

2025

year_2025 <- df_post2016 |>
  filter(year == 2025) |>
  group_by(most_serious_bias) |>
  summarize(Count = n())

print(year_2025)

## # A tibble: 10 × 2
##    most_serious_bias                     Count
##    <chr>                                 <int>
##  1 Anti-Asian                                5
##  2 Anti-Black or African American            1
##  3 Anti-Gay (Male)                           4
##  4 Anti-Hispanic or Latino                   6
##  5 Anti-Islamic (Muslim)                     3
##  6 Anti-Jewish                               5
##  7 Anti-Lesbian/Gay/Bisexual/Transgender     2
##  8 Anti-Multiple Races (Group)               1
##  9 Anti-Transgender                          3
## 10 Anti-White                                1

most_bias_2025 <- year_2025$most_serious_bias[which.max(year_2025$Count)]
print(most_bias_2025)

## [1] "Anti-Hispanic or Latino"
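The per-year blocks above repeat the same steps, so they could be wrapped in a small function. This is a minimal sketch under assumptions: `top_bias_for_year()` is a hypothetical helper that is not part of the original analysis, and the `toy` data are made up. It compares each count with the maximum, so it returns every bias type in a tie, whereas `which.max()` returns only the first.

```r
library(dplyr)

# Hypothetical toy data with the same column names as df_post2016.
toy <- data.frame(
  year = c(2021, 2021, 2021, 2022, 2022),
  most_serious_bias = c("Anti-Asian", "Anti-Asian", "Anti-Jewish",
                        "Anti-Asian", "Anti-Gay (Male)")
)

# Hypothetical helper: all bias types tied for the highest count in one year.
top_bias_for_year <- function(data, target_year) {
  counts <- data |>
    filter(year == target_year) |>
    count(most_serious_bias, name = "Count")
  counts$most_serious_bias[counts$Count == max(counts$Count)]
}

# Apply the helper to every year present in the data.
years <- sort(unique(toy$year))
top_by_year <- lapply(setNames(years, years), function(y) top_bias_for_year(toy, y))
print(top_by_year)
```

In the toy data, 2021 yields one bias type and 2022 yields two, because both bias types in 2022 have the same count.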

Summary of bias between 2016 and 2025

df_most_bias <- df_post2016 |>
  group_by(year, most_serious_bias) |>
  summarize(Count = n())

## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.

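# Sum the yearly counts per bias type so the result reflects the whole period,
# not just the largest single year/bias cell
bias_totals <- tapply(df_most_bias$Count, df_most_bias$most_serious_bias, sum)
most_bias <- names(bias_totals)[which.max(bias_totals)]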
print(df_most_bias)

## # A tibble: 121 × 3
## # Groups:   year [10]
##     year most_serious_bias                                      Count
##    <dbl> <chr>                                                  <int>
##  1  2016 Anti-Arab                                                  1
##  2  2016 Anti-Asian                                                 3
##  3  2016 Anti-Black or African American                             3
##  4  2016 Anti-Female                                                1
##  5  2016 Anti-Gay (Male)                                           11
##  6  2016 Anti-Hindu                                                 1
##  7  2016 Anti-Jewish                                                4
##  8  2016 Anti-Lesbian                                               2
##  9  2016 Anti-Lesbian/Gay/Bisexual or Transgender (Mixed Group)     1
## 10  2016 Anti-Other Religion                                        1
## # ℹ 111 more rows

print(most_bias) # Between 2016 and 2025, Anti-Asian is the most reported hate crime bias

## [1] "Anti-Asian"
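When counts are stored per year and bias, the largest single cell and the largest period total can point to different bias types. The following is a minimal base R sketch on hypothetical counts (`toy_counts` and its values are made up) that shows the difference.

```r
# Hypothetical yearly counts, chosen so the two answers differ.
toy_counts <- data.frame(
  year = c(2016, 2016, 2017, 2017),
  most_serious_bias = c("Anti-Asian", "Anti-Jewish", "Anti-Asian", "Anti-Jewish"),
  Count = c(10, 6, 1, 7)
)

# Bias type of the largest single year/bias cell.
largest_cell <- toy_counts$most_serious_bias[which.max(toy_counts$Count)]

# Bias type with the largest total over all years.
totals <- tapply(toy_counts$Count, toy_counts$most_serious_bias, sum)
top_overall <- names(totals)[totals == max(totals)]

print(largest_cell)
print(top_overall)
```

Here the largest cell belongs to Anti-Asian (10 reports in 2016), but Anti-Jewish has the larger total (13 versus 11), so a question about the whole period should be answered from the totals.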

Conclusion and Future Directions:

The analysis of hate crime reports in San Francisco from 2016 to 2025 shows that the most reported bias type changed from year to year. Anti-Gay (Male) bias was the most reported in 2016, 2017, and 2019. Anti-Hispanic or Latino bias was the most reported in 2018 and 2025. Anti-Black or African American bias was the most reported in 2020 and tied with Anti-Jewish bias in 2024. Anti-Asian bias was the most reported in 2021 and tied with Anti-Gay (Male) bias in 2022. Anti-Jewish bias was the most reported in 2023, when it reached its highest yearly count of the period (23 reports). Over the whole period, Anti-Asian bias had the most reports.

This variation may suggest a relationship between incidents and the political timeline. By connecting these data points with historical and political context, we can better understand which communities are impacted the most and explore the underlying reasons for the rise in hate crimes against certain groups. For example, Anti-Hispanic or Latino reports rose from 2 in 2024 to 6 in 2025, even though the 2025 data only run through September. This increase may be related to the rise of anti-immigration sentiment, although the counts are small and this data set alone cannot establish a cause. This exploratory data analysis provides a foundation for research on how and why these types of hate crime bias fluctuate over the years. Analyses like this one help in understanding reports and identifying trends and relationships, which in turn can inform preventative measures against hate crimes and new hypotheses for further analysis.

References:

Morey, Brittany N. “Mechanisms by Which Anti-Immigrant Stigma Exacerbates Racial/Ethnic Health Disparities.” American Journal of Public Health, vol. 108, no. 4, 2018, pp. 460-463. doi:10.2105/AJPH.2017.304266
“Data.gov.” Data.gov, 2025, catalog.data.gov/dataset/police-department-investigated-hate-crimes.