Rows: 4029 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (9): Record Create Date, Patrol Borough Name, County, Law Code Category ...
dbl (4): Full Complaint ID, Complaint Year Number, Month Number, Complaint P...
lgl (1): Arrest Date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Clean up the data:
Make all headers lowercase and remove spaces
# make all column names lowercase
names(hatecrimes) <- tolower(names(hatecrimes))
# gsub = global substitution; format: gsub(pattern, replacement, x)
names(hatecrimes) <- gsub(" ", "", names(hatecrimes))
# show the first 6 rows
head(hatecrimes)
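The plots that follow use a table named `bias_count`, whose construction does not appear in this section. Here is a minimal sketch of one way such a table could be built, assuming it is a count of incidents per `biasmotivedescription`, sorted from most to least frequent. The data frame `toy_hatecrimes` and its rows are hypothetical stand-ins for the cleaned `hatecrimes` data.

```r
library(dplyr)

# Hypothetical stand-in for the cleaned hatecrimes data (rows are made up)
toy_hatecrimes <- tibble(
  biasmotivedescription = c("ANTI-JEWISH", "ANTI-ASIAN", "ANTI-JEWISH",
                            "ANTI-JEWISH", "ANTI-ASIAN")
)

# count() tallies rows per category into a column named n;
# sort = TRUE orders the result from the largest count to the smallest
toy_bias_count <- toy_hatecrimes |>
  count(biasmotivedescription, sort = TRUE)

toy_bias_count
```

A table of this shape has one row per motive and a column `n`, which is what `reorder(biasmotivedescription, n)` in the plotting code expects.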
bias_count |>
  head(10) |>
  # reorder() sorts the bars by their count n
  ggplot(aes(x = reorder(biasmotivedescription, n), y = n)) +
  geom_col() +
  # use + (not |>) to add layers inside a ggplot, e.g. labs(), coord_flip(), ...
  coord_flip()
Add title, caption for the data source, and x-axis label
bias_count |>
  head(10) |>
  ggplot(aes(x = reorder(biasmotivedescription, n), y = n)) +
  geom_col() +
  coord_flip() +
  # labs() sets the axis labels, title, subtitle, and caption
  labs(x = "",
       y = "Counts of hatecrime types based on motive",
       title = "Bar Graph of Hate Crimes from 2019-2026",
       subtitle = "Counts based on the hatecrime motive",
       caption = "Source: NY State Division of Criminal Justice Services")
Finally add color and change the theme
bias_count |>
  head(10) |>
  ggplot(aes(x = reorder(biasmotivedescription, n), y = n)) +
  geom_col(fill = "salmon") +
  coord_flip() +
  labs(x = "",
       y = "Counts of hatecrime types based on motive",
       title = "Bar Graph of Hate Crimes from 2019-2026",
       subtitle = "Counts based on the hatecrime motive",
       caption = "Source: NY State Division of Criminal Justice Services") +
  # theme_minimal() removes the gray background panel
  theme_minimal()
Add annotations for counts and remove the x-axis values
bias_count |>
  head(10) |>
  ggplot(aes(x = reorder(biasmotivedescription, n), y = n)) +
  geom_col(fill = "lightblue") +
  coord_flip() +
  labs(x = "",
       y = "Counts of hatecrime types based on motive",
       title = "Bar Graph of Hate Crimes from 2019-2026",
       subtitle = "Counts based on the hatecrime motive",
       caption = "Source: NY State Division of Criminal Justice Services") +
  theme_minimal() +
  # geom_text() prints each count just past the end of its bar
  geom_text(aes(label = n), hjust = -.05, size = 4) +
  theme(axis.text.x = element_blank())
Look deeper into crimes against Jewish, Asian, Black people, and gay males
# A tibble: 127 × 4
# Groups: complaintyearnumber, county [35]
complaintyearnumber county biasmotivedescription n
<dbl> <chr> <chr> <int>
1 2024 KINGS ANTI-JEWISH 152
2 2024 NEW YORK ANTI-JEWISH 136
3 2025 KINGS ANTI-JEWISH 136
4 2019 KINGS ANTI-JEWISH 128
5 2023 KINGS ANTI-JEWISH 126
6 2022 KINGS ANTI-JEWISH 125
7 2023 NEW YORK ANTI-JEWISH 124
8 2025 NEW YORK ANTI-JEWISH 110
9 2022 NEW YORK ANTI-JEWISH 104
10 2021 NEW YORK ANTI-ASIAN 84
# ℹ 117 more rows
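The code that produced this table is not shown. Here is a minimal sketch of one pipeline that yields a table with the same columns and grouping, assuming a filter on the motives of interest followed by a grouped count. The data frame `toy_hatecrimes`, its rows, and the label `SOME-OTHER-MOTIVE` are hypothetical. Only the two motive labels visible in the output above are used, because the exact strings for the other groups are not shown here.

```r
library(dplyr)

# Hypothetical stand-in for the cleaned hatecrimes data (rows are made up)
toy_hatecrimes <- tibble(
  complaintyearnumber   = c(2024, 2024, 2024, 2021, 2021, 2021),
  county                = c("KINGS", "KINGS", "NEW YORK",
                            "NEW YORK", "NEW YORK", "KINGS"),
  biasmotivedescription = c("ANTI-JEWISH", "ANTI-JEWISH", "ANTI-JEWISH",
                            "ANTI-ASIAN", "ANTI-ASIAN", "SOME-OTHER-MOTIVE")
)

# Motives to keep; check unique(hatecrimes$biasmotivedescription) for the
# exact spelling of any further labels before adding them here
motives <- c("ANTI-JEWISH", "ANTI-ASIAN")

toy_hate2 <- toy_hatecrimes |>
  filter(biasmotivedescription %in% motives) |>  # keep selected motives only
  group_by(complaintyearnumber, county) |>       # one group per year and county
  count(biasmotivedescription) |>                # incidents per motive in each group
  arrange(desc(n))                               # largest counts first

toy_hate2
```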
Plot these types of hate crimes together
ggplot(data = hate2) +
  geom_bar(aes(x = complaintyearnumber, y = n, fill = biasmotivedescription),
           # position = "dodge" places the bars side by side
           position = "dodge", stat = "identity") +
  labs(fill = "Hate Crime Type",
       y = "Number of Hate Crime Incidents",
       title = "Hate Crime Type in NY Counties Between 2019-2026",
       caption = "Source: NY State Division of Criminal Justice Services")
What about the counties?
ggplot(data = hate2) +
  geom_bar(aes(x = county, y = n, fill = biasmotivedescription),
           position = "dodge", stat = "identity") +
  labs(fill = "Hate Crime Type",
       y = "Number of Hate Crime Incidents",
       title = "Hate Crime Type in NY Counties Between 2019-2026",
       caption = "Source: NY State Division of Criminal Justice Services")
The highest counts
Put it all together with years and counties using “facet”
ggplot(data = hate2) +
  geom_bar(aes(x = complaintyearnumber, y = n, fill = biasmotivedescription),
           position = "dodge", stat = "identity") +
  # facet_wrap() draws one panel per county
  facet_wrap(~county) +
  labs(fill = "Hate Crime Type",
       y = "Number of Hate Crime Incidents",
       title = "Hate Crime Type in NY Counties Between 2019-2026",
       caption = "Source: NY State Division of Criminal Justice Services")
How would the results change if hate crimes per county per year were measured relative to each county's population?
nypop <- read_csv("nyc_census_pop_2020.csv")
Rows: 62 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Area Name, Population Percent Change
num (2): 2020 Census Population, Population Change
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 6 × 2
county `2020 Census Population`
<chr> <dbl>
1 Albany County 314848
2 Allegany County 46456
3 Bronx County 1472654
4 Broome County 198683
5 Cattaraugus County 77042
6 Cayuga County 76248
# A tibble: 127 × 5
# Groups: complaintyearnumber, county [35]
complaintyearnumber county biasmotivedescription n 2020 Census Populati…¹
<dbl> <chr> <chr> <int> <dbl>
1 2024 KINGS ANTI-JEWISH 152 NA
2 2024 NEW Y… ANTI-JEWISH 136 NA
3 2025 KINGS ANTI-JEWISH 136 NA
4 2019 KINGS ANTI-JEWISH 128 NA
5 2023 KINGS ANTI-JEWISH 126 NA
6 2022 KINGS ANTI-JEWISH 125 NA
7 2023 NEW Y… ANTI-JEWISH 124 NA
8 2025 NEW Y… ANTI-JEWISH 110 NA
9 2022 NEW Y… ANTI-JEWISH 104 NA
10 2021 NEW Y… ANTI-ASIAN 84 NA
# ℹ 117 more rows
# ℹ abbreviated name: ¹`2020 Census Population`
# A tibble: 127 × 5
# Groups: complaintyearnumber, county [35]
complaintyearnumber county biasmotivedescription n 2020 Census Populati…¹
<dbl> <fct> <chr> <int> <dbl>
1 2024 kings ANTI-JEWISH 152 NA
2 2024 new y… ANTI-JEWISH 136 NA
3 2025 kings ANTI-JEWISH 136 NA
4 2019 kings ANTI-JEWISH 128 NA
5 2023 kings ANTI-JEWISH 126 NA
6 2022 kings ANTI-JEWISH 125 NA
7 2023 new y… ANTI-JEWISH 124 NA
8 2025 new y… ANTI-JEWISH 110 NA
9 2022 new y… ANTI-JEWISH 104 NA
10 2021 new y… ANTI-ASIAN 84 NA
# ℹ 117 more rows
# ℹ abbreviated name: ¹`2020 Census Population`
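Every population value in the joined tables above is `NA`. That is what a left join returns when no key matches: the hate crime table stores counties as `KINGS` (or `kings` after lowercasing), while the census table uses names such as `Bronx County`. Here is a minimal sketch of one way to align the keys before joining. The tables `toy_counts` and `toy_pop`, the helper column `county_key`, and the incident counts are hypothetical; the two population figures are copied from the census output above.

```r
library(dplyr)

# Toy hate crime counts; the n values are made up for illustration
toy_counts <- tibble(
  complaintyearnumber = c(2024, 2024),
  county              = c("BRONX", "ALBANY"),
  n                   = c(30L, 2L)
)

# Census rows as they appear in the population table above
toy_pop <- tibble(
  county = c("Albany County", "Bronx County"),
  `2020 Census Population` = c(314848, 1472654)
)

# Build a shared key: drop the trailing " County" and upper-case the rest
toy_pop_keyed <- toy_pop |>
  mutate(county_key = toupper(sub(" County$", "", county))) |>
  select(county_key, `2020 Census Population`)

toy_joined <- toy_counts |>
  mutate(county_key = toupper(trimws(county))) |>
  left_join(toy_pop_keyed, by = "county_key")

# Number of rows that still have no population after the join
sum(is.na(toy_joined$`2020 Census Population`))
```

On the real data, checking the unmatched rows (for example with `anti_join()`) would show whether any county names need a manual mapping.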
Calculate the rate of incidents per 100,000. Then arrange in descending order
datajoinrate <- datajoin |>
  # n = count; backticks are needed because the column name contains spaces;
  # multiplying by 100000 gives the rate per 100,000 people
  mutate(rate = n / `2020 Census Population` * 100000) |>
  arrange(desc(rate))
datajoinrate
# A tibble: 127 × 6
# Groups: complaintyearnumber, county [35]
complaintyearnumber county biasmotivedescription n 2020 Census Populati…¹
<dbl> <fct> <chr> <int> <dbl>
1 2024 kings ANTI-JEWISH 152 NA
2 2024 new y… ANTI-JEWISH 136 NA
3 2025 kings ANTI-JEWISH 136 NA
4 2019 kings ANTI-JEWISH 128 NA
5 2023 kings ANTI-JEWISH 126 NA
6 2022 kings ANTI-JEWISH 125 NA
7 2023 new y… ANTI-JEWISH 124 NA
8 2025 new y… ANTI-JEWISH 110 NA
9 2022 new y… ANTI-JEWISH 104 NA
10 2021 new y… ANTI-ASIAN 84 NA
# ℹ 117 more rows
# ℹ abbreviated name: ¹`2020 Census Population`
# ℹ 1 more variable: rate <dbl>
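Because the population column is `NA` in every row shown, the computed `rate` is `NA` as well, and `arrange(desc(rate))` has nothing to sort on. As a worked check of the formula itself, here is a sketch on a toy table with populations filled in. The table `toy_joined` and its incident counts are hypothetical; the populations are the Bronx and Albany figures from the census output above.

```r
library(dplyr)

toy_joined <- tibble(
  county = c("ALBANY", "BRONX"),
  n      = c(2L, 30L),                           # made-up incident counts
  `2020 Census Population` = c(314848, 1472654)  # from the census table above
)

toy_rate <- toy_joined |>
  # incidents per 100,000 residents = count / population * 100,000
  mutate(rate = n / `2020 Census Population` * 100000) |>
  arrange(desc(rate))

toy_rate
```

A per-capita rate can reorder the counties compared with raw counts, since a large county with many incidents may still have a lower rate than a small county with few.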
Essay
I think the main strength of this hate crimes dataset is that it covers multiple years (2019 through 2025 in the output above) and distinguishes many types of bias motive. The dataset is relatively complete and diverse, which supports fair comparisons across motives, counties, and years.
However, hate crimes may be underreported for various reasons. Additionally, the definition of “hate crime” is not fully standardized, and people disagree about what counts as one, which may compromise the dataset’s quality. Last but not least, the dataset would be more useful if it included details on individual cases, so that readers could get a clearer picture of the victims’ family and educational backgrounds and conduct a more comprehensive, in-depth analysis.
In the future, I would like to explore the relationships between major political or social events and hate crime rates, as well as between educational backgrounds and crime rates.