Rows: 4029 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (9): Record Create Date, Patrol Borough Name, County, Law Code Category ...
dbl (4): Full Complaint ID, Complaint Year Number, Month Number, Complaint P...
lgl (1): Arrest Date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
bias_count |>
  head(10) |>
  ggplot(aes(x = biasmotivedescription, y = n)) +
  geom_col()
Arrange the bars according to height and rotate
bias_count |>
  head(10) |>
  ggplot(aes(x = reorder(biasmotivedescription, n), y = n)) +
  geom_col() +
  coord_flip()
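To see what `reorder()` contributes before any plotting happens, here is a minimal base R sketch. The `toy` data frame and its counts are made up for illustration and are not taken from the hate crimes data.

```r
# Toy counts (hypothetical values, not from the dataset)
toy <- data.frame(
  motive = c("A", "B", "C"),
  n = c(5, 20, 10)
)

# reorder() returns a factor whose levels are sorted by n, ascending
toy$motive_sorted <- reorder(toy$motive, toy$n)

# The level order is what ggplot uses to place the bars
levels(toy$motive_sorted)
```

Because the levels run from the smallest to the largest count, `coord_flip()` places the largest bar at the top of the flipped chart.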
#Add title, caption for the data source, and x-axis label
bias_count |>
  head(10) |>
  ggplot(aes(x = reorder(biasmotivedescription, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(
    x = "",
    y = "Counts of hatecrime types based on motive",
    title = "Bar Graph of Hate Crimes from 2019-2026",
    subtitle = "Counts based on the hatecrime motive",
    caption = "Source: NY State Division of Criminal Justice Services"
  )
#Finally add color and change the theme
bias_count |>
  head(10) |>
  ggplot(aes(x = reorder(biasmotivedescription, n), y = n)) +
  geom_col(fill = "salmon") +
  coord_flip() +
  labs(
    x = "",
    y = "Counts of hatecrime types based on motive",
    title = "Bar Graph of Hate Crimes from 2019-2026",
    subtitle = "Counts based on the hatecrime motive",
    caption = "Source: NY State Division of Criminal Justice Services"
  ) +
  theme_minimal()
#Add annotations for counts and remove the x-axis values
bias_count |>
  head(10) |>
  ggplot(aes(x = reorder(biasmotivedescription, n), y = n)) +
  geom_col(fill = "salmon") +
  coord_flip() +
  labs(
    x = "",
    y = "Counts of hatecrime types based on motive",
    title = "Bar Graph of Hate Crimes from 2019-2026",
    subtitle = "Counts based on the hatecrime motive",
    caption = "Source: NY State Division of Criminal Justice Services"
  ) +
  theme_minimal() +
  geom_text(aes(label = n), hjust = -.05, size = 3) +
  theme(axis.text.x = element_blank())
Look deeper into crimes against Jewish, Asian, Black people, and gay males
# A tibble: 127 × 4
# Groups: complaintyearnumber, county [35]
complaintyearnumber county biasmotivedescription n
<dbl> <chr> <chr> <int>
1 2024 KINGS ANTI-JEWISH 152
2 2024 NEW YORK ANTI-JEWISH 136
3 2025 KINGS ANTI-JEWISH 136
4 2019 KINGS ANTI-JEWISH 128
5 2023 KINGS ANTI-JEWISH 126
6 2022 KINGS ANTI-JEWISH 125
7 2023 NEW YORK ANTI-JEWISH 124
8 2025 NEW YORK ANTI-JEWISH 110
9 2022 NEW YORK ANTI-JEWISH 104
10 2021 NEW YORK ANTI-ASIAN 84
# ℹ 117 more rows
#Plot these types of hate crimes together
ggplot(data = hate2) +
  geom_bar(
    aes(x = complaintyearnumber, y = n, fill = biasmotivedescription),
    position = "dodge", stat = "identity"
  ) +
  labs(
    fill = "Hate Crime Type",
    y = "Number of Hate Crime Incidents",
    title = "Hate Crime Type in NY Counties Between 2019-2026",
    caption = "Source: NY State Division of Criminal Justice Services"
  )
#What about the counties?
ggplot(data = hate2) +
  geom_bar(
    aes(x = county, y = n, fill = biasmotivedescription),
    position = "dodge", stat = "identity"
  ) +
  labs(
    fill = "Hate Crime Type",
    y = "Number of Hate Crime Incidents",
    title = "Hate Crime Type in NY Counties Between 2019-2026",
    caption = "Source: NY State Division of Criminal Justice Services"
  )
#Put it all together with years and counties using “facet”
ggplot(data = hate2) +
  geom_bar(
    aes(x = complaintyearnumber, y = n, fill = biasmotivedescription),
    position = "dodge", stat = "identity"
  ) +
  facet_wrap(~county) +
  labs(
    fill = "Hate Crime Type",
    y = "Number of Hate Crime Incidents",
    title = "Hate Crime Type in NY Counties Between 2019-2026",
    caption = "Source: NY State Division of Criminal Justice Services"
  )
#How would calculations be affected by looking at hate crimes in counties per year by population densities?
Rows: 62 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Area Name, Population Percent Change
num (2): 2020 Census Population, Population Change
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 127 × 5
# Groups: complaintyearnumber, county [35]
complaintyearnumber county biasmotivedescription n 2020 Census Populati…¹
<dbl> <chr> <chr> <int> <dbl>
1 2024 KINGS ANTI-JEWISH 152 NA
2 2024 NEW Y… ANTI-JEWISH 136 NA
3 2025 KINGS ANTI-JEWISH 136 NA
4 2019 KINGS ANTI-JEWISH 128 NA
5 2023 KINGS ANTI-JEWISH 126 NA
6 2022 KINGS ANTI-JEWISH 125 NA
7 2023 NEW Y… ANTI-JEWISH 124 NA
8 2025 NEW Y… ANTI-JEWISH 110 NA
9 2022 NEW Y… ANTI-JEWISH 104 NA
10 2021 NEW Y… ANTI-ASIAN 84 NA
# ℹ 117 more rows
# ℹ abbreviated name: ¹`2020 Census Population`
# A tibble: 127 × 5
# Groups: complaintyearnumber, county [35]
complaintyearnumber county biasmotivedescription n 2020 Census Populati…¹
<dbl> <fct> <chr> <int> <dbl>
1 2024 kings ANTI-JEWISH 152 2736074
2 2024 new y… ANTI-JEWISH 136 1694251
3 2025 kings ANTI-JEWISH 136 2736074
4 2019 kings ANTI-JEWISH 128 2736074
5 2023 kings ANTI-JEWISH 126 2736074
6 2022 kings ANTI-JEWISH 125 2736074
7 2023 new y… ANTI-JEWISH 124 1694251
8 2025 new y… ANTI-JEWISH 110 1694251
9 2022 new y… ANTI-JEWISH 104 1694251
10 2021 new y… ANTI-ASIAN 84 1694251
# ℹ 117 more rows
# ℹ abbreviated name: ¹`2020 Census Population`
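The two join results above differ in one visible way: the first has uppercase county names and `NA` populations, while the second has lowercase names and populated values. One plausible reading is that the join keys did not match until the county names were normalised. The sketch below illustrates that idea with base R `merge()` on two hypothetical miniature tables (`crimes` and `census` are assumed names, and a letter-case difference is assumed as the only mismatch). The cleanup steps the tutorial actually used are not shown in this output.

```r
# Hypothetical miniature versions of the two tables
crimes <- data.frame(county = c("KINGS", "NEW YORK"), n = c(152, 136))
census <- data.frame(
  county = c("kings", "new york"),
  population = c(2736074, 1694251)
)

# Keys differ in case, so no row finds a match and population is NA
mismatched <- merge(crimes, census, by = "county", all.x = TRUE)

# Normalising the key before joining restores the matches
crimes$county <- tolower(crimes$county)
matched <- merge(crimes, census, by = "county", all.x = TRUE)

mismatched
matched
```

Checking a join result for unexpected `NA` values in the joined column is a quick way to catch this kind of key mismatch.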
Calculate the rate of incidents per 100,000. Then arrange in descending order
datajoinrate <- datajoin |>
  mutate(rate = n / `2020 Census Population` * 100000) |>
  arrange(desc(rate))
datajoinrate
# A tibble: 127 × 6
# Groups: complaintyearnumber, county [35]
complaintyearnumber county biasmotivedescription n 2020 Census Populati…¹
<dbl> <fct> <chr> <int> <dbl>
1 2024 new y… ANTI-JEWISH 136 1694251
2 2023 new y… ANTI-JEWISH 124 1694251
3 2025 new y… ANTI-JEWISH 110 1694251
4 2022 new y… ANTI-JEWISH 104 1694251
5 2024 kings ANTI-JEWISH 152 2736074
6 2025 kings ANTI-JEWISH 136 2736074
7 2021 new y… ANTI-ASIAN 84 1694251
8 2021 new y… ANTI-JEWISH 84 1694251
9 2019 kings ANTI-JEWISH 128 2736074
10 2023 kings ANTI-JEWISH 126 2736074
# ℹ 117 more rows
# ℹ abbreviated name: ¹`2020 Census Population`
# ℹ 1 more variable: rate <dbl>
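The reordering in this output can be checked by hand. The short sketch below recomputes the rate for two rows shown above (Kings 2024 and New York 2024), using plain named vectors instead of the full `datajoin` table. The vector names are chosen for illustration only.

```r
# Counts and 2020 Census populations copied from the output above
n <- c(kings_2024 = 152, new_york_2024 = 136)
population <- c(kings_2024 = 2736074, new_york_2024 = 1694251)

# Incidents per 100,000 residents
rate <- n / population * 100000

# Sort from highest to lowest rate
sorted_rate <- sort(rate, decreasing = TRUE)
round(sorted_rate, 2)
```

Kings has the larger raw count, but New York County has the smaller population, so its rate (roughly 8.0 per 100,000) is higher than that of Kings (roughly 5.6 per 100,000). That is why the New York rows move to the top once the table is sorted by rate.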
Your turn!
Once you complete this tutorial, include an essay of about 150-200 words that answers the following questions:
Write about the positive and negative aspects of this hatecrimes dataset.
List 2 different paths you would hypothetically like to study about this dataset at some future point.
One positive aspect of the hatecrimes dataset is the breadth of its variables. They range from the offense category, such as religion and religious practice, all the way to the complaint year. This structure makes it easy to show correlations between variables and makes for a well-rounded dataset. One negative aspect is the two variables at the end: Arrestdate and arrestid contain only NA values all the way down, which makes me question why they are included in the first place. Also, the x-axis of the vertical bar plots was very cluttered, which made those visualizations hard to follow and understand.
One thing that I would like to study about this dataset in the future is the correlation between the population density of each county (Bronx, Kings, Queens, New York, and Richmond) and the actual counts of hate crimes. This would show whether higher hate crime counts are explained by a larger population or by an actual increase in hate crime incidents. Another thing that I would like to study in the future is whether hate crimes increase when major social or political events happen, for example, whether more antisemitic crimes occur during the Israel-Palestine conflict.