library(here)
knitr::include_graphics(here("BiasType.png"))
The visualisation critiqued here is part of a larger project published by user Ola Alsukour on Observable HQ in November 2022. According to Alsukour, the project aims to inform the general public about the different motives behind hate crime in the US and how they have changed over the years.
The data was taken from the Crime Data Explorer, which belongs to the Federal Bureau of Investigation (FBI). Its mission is to help expand knowledge and awareness of crime statistics and improve accountability for law enforcement. It also aims to show victims that they are not alone and to encourage lawmakers to address these issues for their communities.
Hate crime is defined as a committed criminal offense motivated by the offender’s prejudice against a race, ethnicity, religion, disability, gender identity or sexual orientation (FBI 2022).
The chosen visualisation is one of several created by Alsukour. It focuses on the number of hate crimes committed during the period from 2010 to 2022, broken down by motive. The project is public, so it can be viewed via the URL in the reference below.
In terms of Kaiser’s Trifecta check-up, I believe the author starts with a very good question. We all know that hate crime affects many people in today’s society, especially in the US, where people come from many different backgrounds and hold many different beliefs. Some of us, however, may not be sure how the trends have changed over the years, or whether the situation is getting better or worse. I personally find this question fascinating.
The data is taken from the FBI, so it is highly reliable and very relevant to the question. The full dataset records a lot of information for each crime, such as the time and place of the crime, the incident reporting agency, and basic information about the offenders and the victims.
I believe the actual number of hate crimes may be higher than reported due to many factors, but at least we have a trusted source of data here. It also has minimal missing values or errors.
However, some issues can be identified in the visual:
First of all, what really stands out is the green line at the top of the chart. It is clearly separated from the other lines and is probably the first thing most viewers look at. However, when viewers try to find out which motive this green line represents, they run into a problem: more than one shade of green is used in the ‘Motives’ legend.
The same apple green colour is used for two groups: “Anti-Black or African American” and “Anti-Native Hawaiian or Other Pa…” (truncated here as it is in the legend). A slightly darker shade of green is then used for “Anti-Bisexual” and “Anti-Multiple Religions”. To make it worse, these two shades are not clearly different from each other (some may find them almost identical), especially when nearly 20 other lines overlap at the bottom of the chart.
Of course, based on general knowledge and recent media coverage, many would correctly guess that the green line is for Anti-Black/African American, but that means the colour choice serves no purpose in this visualisation.
Many similar shades of pink and purple are also used, and they are not easily distinguished.
Last but not least, the colour palette is not colour-blind friendly. Since the target audience is defined as the general public, my recommendation is to take this population segment into consideration as well.
The colour issue discussed above stems from the fact that too many lines are presented in the graph. It is impossible to tell them apart, especially those below the 200 mark on the y-axis. The lines almost meld into each other, so they convey very little information.
It is reasonable that the author chose a line chart to visualise time-series data like this. However, the number of lines needs to be reduced, either by grouping or by showing only the most important data. Line charts with multiple lines are supposed to allow meaningful comparison between categories, but the only comparison viewers may be able to make in this visualisation is that Anti-Black/African American bias is the most common motive, compared with the rest.
In this case, for our line chart to be easy to comprehend, simplicity is preferred over complexity.
The legend is organised in alphabetical order, which is of little use in this case: there are too many entries listed for our minds to pick out either similarities or differences.
Not all of the motives are shown in the legend. The last line only says “…6 entries” and is not expandable. Some motive descriptions are too long and hence not fully shown (e.g. “Anti-American Indian or Alaska Native”, “Anti-Eastern Orthodox (Russian, Greek, Other)” and “Anti-Native Hawaiian or Other Pacific Islander”). The big picture cannot be fully grasped without revisiting the original data source, which defeats the purpose of building the visualisation in the first place.
The years along the x-axis are placed vertically, which is quite hard to read, although I understand that the author may have intended to prevent the year labels from becoming crowded and overlapping.
The title is intriguing but a bit verbose. It could serve as a subheading, with a shorter, simpler title used to grab attention.
Alsukour O (2022) Hate Crime, Observable HQ, accessed 17 November 2022. https://observablehq.com/d/379c4a69e74c9cbf#imports
The following code was used to fix the issues identified in the original.
The packages needed to produce the report are loaded here:
library(dplyr)
library(ggplot2)
library(knitr)
library(tidyverse)
library(stringr)
library(lubridate)
# load dataset
hate_crime <- read_csv(here("hate_crime.csv"))
# take a glimpse
glimpse(hate_crime)
## Rows: 219,577
## Columns: 28
## $ INCIDENT_ID <dbl> 3015, 3016, 43, 44, 3017, 3018, 3019, 45, 46,…
## $ DATA_YEAR <dbl> 1991, 1991, 1991, 1991, 1991, 1991, 1991, 199…
## $ ORI <chr> "AR0040200", "AR0290100", "AR0350100", "AR035…
## $ PUB_AGENCY_NAME <chr> "Rogers", "Hope", "Pine Bluff", "Pine Bluff",…
## $ PUB_AGENCY_UNIT <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ AGENCY_TYPE_NAME <chr> "City", "City", "City", "City", "City", "City…
## $ STATE_ABBR <chr> "AR", "AR", "AR", "AR", "AR", "AR", "AR", "AR…
## $ STATE_NAME <chr> "Arkansas", "Arkansas", "Arkansas", "Arkansas…
## $ DIVISION_NAME <chr> "West South Central", "West South Central", "…
## $ REGION_NAME <chr> "South", "South", "South", "South", "South", …
## $ POPULATION_GROUP_CODE <chr> "5", "6", "3", "3", "3", "3", "2", "3", "3", …
## $ POPULATION_GROUP_DESC <chr> "Cities from 10,000 thru 24,999", "Cities fro…
## $ INCIDENT_DATE <chr> "31-AUG-91", "19-SEP-91", "04-JUL-91", "24-DE…
## $ ADULT_VICTIM_COUNT <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ JUVENILE_VICTIM_COUNT <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ TOTAL_OFFENDER_COUNT <dbl> 1, 1, 1, 1, 1, 1, 2, 1, 2, 10, 2, 1, 0, 1, 1,…
## $ ADULT_OFFENDER_COUNT <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ JUVENILE_OFFENDER_COUNT <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ OFFENDER_RACE <chr> "White", "Black or African American", "Black …
## $ OFFENDER_ETHNICITY <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ VICTIM_COUNT <dbl> 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 1, …
## $ OFFENSE_NAME <chr> "Intimidation", "Simple Assault", "Aggravated…
## $ TOTAL_INDIVIDUAL_VICTIMS <dbl> 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 1, …
## $ LOCATION_NAME <chr> "Highway/Road/Alley/Street/Sidewalk", "Highwa…
## $ BIAS_DESC <chr> "Anti-Black or African American", "Anti-White…
## $ VICTIM_TYPES <chr> "Individual", "Individual", "Individual", "In…
## $ MULTIPLE_OFFENSE <chr> "S", "S", "S", "M", "S", "S", "S", "M", "S", …
## $ MULTIPLE_BIAS <chr> "S", "S", "S", "S", "S", "S", "S", "S", "S", …
There is a lot of useful information here but we only need a few
variables to build the visualisation, namely the INCIDENT_ID, the year
when the incident happened (DATA_YEAR) and the underlying bias
(BIAS_DESC). We will also keep only the incidents that happened from
2010 to 2020.
Some observations in the BIAS_DESC column have more than one bias
listed, separated by a semi-colon (;), so we need to split them into
separate rows and include each one in the total count by year and
bias.
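To see what the split does before applying it to the full dataset, here is a minimal sketch on a made-up three-row tibble. The `toy` data and its values are hypothetical, not taken from the FBI file, and the sketch assumes the separator is a bare semi-colon with no surrounding spaces.

```r
library(dplyr)
library(tidyr)

# Hypothetical incidents; the second one lists two biases
toy <- tibble(
  INCIDENT_ID = c(1, 2, 3),
  DATA_YEAR   = c(2010, 2010, 2011),
  BIAS_DESC   = c("Anti-White", "Anti-White;Anti-Jewish", "Anti-Jewish")
)

toy_counts <- toy %>%
  separate_rows(BIAS_DESC, sep = ";") %>%          # one row per incident-bias pair
  count(DATA_YEAR, BIAS_DESC, name = "CRIMES_NO")  # crimes per year and bias

toy_counts
```

Incident 2 is counted once under each of its two biases, so the counts sum to 4 although there are only 3 incidents.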
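# keep the needed columns, split multi-bias incidents, count per year and bias
hate_crime <- hate_crime %>%
  select(INCIDENT_ID, DATA_YEAR, BIAS_DESC) %>%
  filter(DATA_YEAR > 2009) %>%
  separate_rows(BIAS_DESC, sep = ";") %>%
  group_by(DATA_YEAR, BIAS_DESC) %>%
  summarise(CRIMES_NO = n(), .groups = "drop")  # drop grouping to avoid the summarise() message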
To tackle the overlapping issue, I group the motives into five more general groups: Race/Ethnicity/Ancestry, Religion, Sexual Orientation, Disability and Gender Identity.
The grouping system is based on the information on the FBI’s website. The only difference is that I combine “Gender” (including “Anti-Male” and “Anti-Female”) and “Gender Identity” (including “Anti-Transgender” and “Anti-Gender Non-Conforming”) into a single group so that the line chart is not too crowded and cluttered.
# divide into groups
race <- c("Anti-Black or African American",
          "Anti-White",
          "Anti-American Indian or Alaska Native",
          "Anti-Asian",
          "Anti-Native Hawaiian or Other Pacific Islander",
          "Anti-Multiple Races, Group",
          "Anti-Arab",
          "Anti-Hispanic or Latino",
          "Anti-Other Race/Ethnicity/Ancestry")
religion <- c("Anti-Jewish",
              "Anti-Catholic",
              "Anti-Protestant",
              "Anti-Islamic (Muslim)",
              "Anti-Other Religion",
              "Anti-Multiple Religions, Group",
              "Anti-Mormon",
              "Anti-Jehovah's Witness",
              "Anti-Eastern Orthodox (Russian, Greek, Other)",
              "Anti-Other Christian",
              "Anti-Buddhist",
              "Anti-Hindu",
              "Anti-Sikh",
              "Anti-Atheism/Agnosticism")
sexual <- c("Anti-Gay (Male)",
            "Anti-Lesbian (Female)",
            "Anti-Lesbian, Gay, Bisexual, or Transgender (Mixed Group)",
            "Anti-Heterosexual",
            "Anti-Bisexual")
disability <- c("Anti-Physical Disability",
                "Anti-Mental Disability")
gender <- c("Anti-Male",
            "Anti-Female",
            "Anti-Transgender",
            "Anti-Gender Non-Conforming")
After defining the bias-to-category relationship, I add another
column called BIAS_GROUP to the dataset and remove any NA
values.
hate_crime <- hate_crime %>%
  mutate(BIAS_GROUP = case_when(
    BIAS_DESC %in% race       ~ "Race/Ethnicity/Ancestry",
    BIAS_DESC %in% religion   ~ "Religion",
    BIAS_DESC %in% sexual     ~ "Sexual Orientation",
    BIAS_DESC %in% disability ~ "Disability",
    BIAS_DESC %in% gender     ~ "Gender Identity",
    TRUE                      ~ NA_character_  # bias not listed in any group
  )) %>%
  drop_na()
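Because drop_na() discards every row whose bias description is not in one of the five vectors, a misspelt or missing label would silently vanish from the chart. One way to guard against that, sketched here on hypothetical stand-in vectors rather than the real ones, is to list the labels that no group claims before dropping anything.

```r
# Stand-ins for the grouping vectors and the BIAS_DESC column (hypothetical values)
race_demo     <- c("Anti-White", "Anti-Asian")
religion_demo <- c("Anti-Jewish")
bias_seen     <- c("Anti-White", "Anti-Jewish", "Anti-Asian", "Unknown label")

# Labels present in the data but absent from every group
unmatched <- setdiff(unique(bias_seen), c(race_demo, religion_demo))
unmatched
```

With the real data, the equivalent call would be setdiff(unique(hate_crime$BIAS_DESC), c(race, religion, sexual, disability, gender)), run before the grouping step. An empty result would mean that no row is lost to the NA branch.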
The following plot addresses the main issues in the original one.
I sum the counts by year and bias group, convert the group to a factor
and visualise the result using ggplot().
I think the points marking the number of crimes in each year are useful
in guiding our eyes, so I keep that feature by adding a geom_point()
layer on top of the geom_line() one. I also convert BIAS_GROUP to a
factor and re-order the levels so the legend is easier to read. The
years are displayed horizontally instead of vertically, with a 2-year
gap so they are not too crowded. This helps the visualisation focus on
the question about the trends over the years. Last but not least, I
make the text slightly bigger and use a minimal theme to keep the graph
clear and focused, as the lines already use quite a few colours.
# total crimes per bias motive groups
hate_crime_grouped <- hate_crime %>%
  group_by(DATA_YEAR, BIAS_GROUP) %>%
  summarise(CRIMES_NO = sum(CRIMES_NO), .groups = "drop")
# change to factor and specify factor levels (this sets the legend order)
hate_crime_grouped$BIAS_GROUP <- factor(hate_crime_grouped$BIAS_GROUP,
                                        levels = c("Race/Ethnicity/Ancestry",
                                                   "Religion",
                                                   "Sexual Orientation",
                                                   "Disability",
                                                   "Gender Identity"))
# The color palette:
cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#CC79A7", "#0072B2", "#D55E00")
# viz
ggplot(hate_crime_grouped, aes(DATA_YEAR, CRIMES_NO, colour = BIAS_GROUP, group = BIAS_GROUP)) +
  geom_line(linewidth = 1) +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_point(size = 1.5) +
  labs(title = "Changes in Hate Crimes motives from 2010 to 2020",
       x = "Year",
       y = "Number of Crimes",
       colour = "Bias motives") +
  scale_x_continuous(breaks = seq(2010, 2020, 2)) +
  scale_y_continuous(breaks = seq(0, 7000, 1000)) +
  scale_colour_manual(values = cbPalette) +
  theme_minimal() +
  theme(text = element_text(size = 16))
Another way to visualise the trends in hate crime over the years is to
look at the percentage change from one year to the next. To do this, I
add a new column called PCT_CHANGE, which is the percentage difference
between the current year and the year before. Plotting the data this
way means all the lines start from the same point of zero, so we can
compare how much each of them has changed.
It is quite interesting that, when we plot the percentage change, the pink line, which represents hate crime motivated by bias against the victim’s gender identity, rises sharply in 2013. In fact, the raw data shows only 6 cases recorded in 2012, but this number surged to 51 in 2013 and then to 135 in 2014.
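The size of that jump can be checked by hand with the formula (current / previous - 1) * 100. The short sketch below applies it to the three gender identity counts quoted above; the vector name `cases` is only for illustration.

```r
library(dplyr)

# Gender identity counts for 2012, 2013 and 2014, as quoted in the text
cases <- c(6, 51, 135)

# Year-on-year percentage change; the first year has no predecessor, hence NA
pct <- round((cases / dplyr::lag(cases) - 1) * 100, 2)
pct
```

A rise from 6 to 51 is a 750% increase, and the further rise from 51 to 135 is about 165%, far larger than anything a group with thousands of cases per year is likely to show.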
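hate_crime_pct <- hate_crime_grouped %>%
  group_by(BIAS_GROUP) %>%
  arrange(DATA_YEAR, .by_group = TRUE) %>%
  # percentage change relative to the previous year within each group
  mutate(PCT_CHANGE = round((CRIMES_NO / lag(CRIMES_NO) - 1) * 100, digits = 2),
         # the first year has no previous year, so set its change to 0
         PCT_CHANGE = replace_na(PCT_CHANGE, 0)) %>%
  ungroup()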
ggplot(hate_crime_pct, aes(DATA_YEAR, PCT_CHANGE, colour = BIAS_GROUP, group = BIAS_GROUP)) +
  geom_line(linewidth = 1) +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_point(size = 1.5) +
  labs(title = "Changes in Hate Crimes motives from 2010 to 2020",
       x = "Year",
       y = "Percentage change in Crimes",
       colour = "Bias motives") +
  scale_x_continuous(breaks = seq(2010, 2020, 2)) +
  scale_colour_manual(values = cbPalette) +
  theme_minimal() +
  theme(text = element_text(size = 16))
The other lines, however, are not very clear because the y-axis scale has been stretched by the extreme values for gender identity bias crime. One way to overcome this is to use faceting to see each line more clearly.
ggplot(hate_crime_pct, aes(DATA_YEAR, PCT_CHANGE, colour = BIAS_GROUP, group = BIAS_GROUP)) +
  geom_line(linewidth = 1) +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_point(size = 1.5) +
  facet_wrap(~ BIAS_GROUP, ncol = 2) +  # one panel per bias group
  labs(title = "Changes in Hate Crimes motives from 2010 to 2020",
       x = "Year",
       y = "Percentage change in Crimes",
       colour = "Bias motives") +
  scale_x_continuous(breaks = seq(2010, 2020, 2)) +
  # ylim(-25, 75) +
  scale_colour_manual(values = cbPalette) +
  theme_minimal() +
  theme(text = element_text(size = 16))
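With a shared y-axis, the panel with the most extreme values still sets the range for all the others. One possible variation, not used in the plots above, is to let each facet choose its own y-range with scales = "free_y". The sketch below uses a small made-up data frame (`demo`, with hypothetical values) so that it runs on its own; the trade-off is that panels can no longer be compared by height alone.

```r
library(ggplot2)

# Hypothetical percentage changes for two groups with very different ranges
demo <- data.frame(
  DATA_YEAR  = rep(2010:2013, times = 2),
  BIAS_GROUP = rep(c("Group A", "Group B"), each = 4),
  PCT_CHANGE = c(0, 5, -3, 4, 0, 20, 750, 165)
)

p <- ggplot(demo, aes(DATA_YEAR, PCT_CHANGE, colour = BIAS_GROUP)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ BIAS_GROUP, ncol = 2, scales = "free_y") +  # independent y-axis per panel
  theme_minimal() +
  theme(legend.position = "none")  # facet titles already name each group

p
```

Hiding the legend is a design choice for this sketch only: once every panel carries its group name, the colour key repeats the same information.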
FBI (Federal Bureau of Investigation) (2022) Hate Crime Statistics: 1991-2020 [data set], Crime Data Explorer, FBI website, accessed 17 November 2022. https://crime-data-explorer.fr.cloud.gov/pages/downloads#datasets