Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.
Objective
Here is a visualisation from the Statista website published by Katharina Buchholz on March 3rd, 2022.
The objective is to show that the refugee crisis is growing worldwide.
The data is sourced from the UNHCR website and covers refugees, asylum-seekers and Venezuelans abroad in 2013 and 2020; the chart also depicts Ukraine in 2022.
It is aimed at an international audience who might be concerned about the escalating crisis in Ukraine following the Russian military invasion. The purpose might be to increase aid to countries experiencing war and displacement.
The visualisation chosen had the following three main issues:
So what? The stacked bar graph makes it hard to compare countries with each other and with previous years. The title “World Refugee Crises” is generic and neither tells a compelling story nor poses a practical question, such as: which country or war is causing the world refugee population to grow?
Perceptual and color issue: The color scheme appears arbitrary rather than intentional. Two countries are shown in green, and one of them sits next to a country in red, which fails the colorblindness test. On close observation, one might recognize that “Others” has the largest increase over time, but it is grey, so the chart fails to draw attention to an important question. The lines in the background distract from the quantity in each stack and make the stacks appear smaller or larger depending on where they sit on the graph. This is misleading.
Text: Quite a few issues with text and arrangement are worth mentioning:
Reference
The following code was used to fix the issues identified in the original.
library(tidyverse) # for reshaping the data (includes ggplot2 and readr)
library(readxl)    # for reading data from Excel spreadsheets
library(ggalt)     # extra coordinate systems, geoms, scales and fonts for ggplot2 (provides geom_dumbbell)
library(scales)    # for customising axis and legend labels
library(here)      # for building file paths relative to the project root
theme_set(theme_classic())
# Import the dataset as a data frame, skip the first 14 rows and parse the year columns as numbers.
# data.frame() converts the column names to syntactic ones, e.g. `2018` becomes X2018.
rawdata <- data.frame(read_csv(here("populationdata_18_20_22_untidy.csv"),
                               col_types = cols(
                                 `2018` = col_number(),
                                 `2020` = col_number(),
                                 `2022` = col_number()),
                               skip = 14))
# Filter the data on the countries of interest
filterdata <- rawdata %>%
  filter(Country.of.origin %in% c("Ukraine",
                                  "Serbia",
                                  "Turkey",
                                  "Palestinian",
                                  "Syria",
                                  "Pakistan",
                                  "Columbia",
                                  "Congo",
                                  "Honduras",
                                  "Sudan",
                                  "Afghanistan",
                                  "Myanmar",
                                  "Somalia",
                                  "Uganda",
                                  "Venezuela")) %>%
  # Create a new column with the difference between 2018 and 2020
  mutate(Delta = X2020 - X2018) %>%
  # Sort by the 2020 value for maximum impact in the graph
  arrange(X2020)
#check structure of our dataset
str(filterdata)
## 'data.frame': 14 obs. of 5 variables:
## $ Country.of.origin: chr "Uganda" "Serbia" "Palestinian" "Turkey" ...
## $ X2018 : num 197990 42070 115648 111825 207585 ...
## $ X2020 : num 19191 37220 113578 138274 197444 ...
## $ X2022 : num NA NA NA NA NA NA NA NA NA NA ...
## $ Delta : num -178799 -4850 -2070 26449 -10141 ...
# For correct ordering of the dumbbells, convert the y-axis variable to a factor
# whose levels follow the sorted rows
filterdata$Country.of.origin <- factor(filterdata$Country.of.origin,
                                       levels = as.character(filterdata$Country.of.origin))
# Dumbbell plot: the light dot marks 2018 (x) and the dark dot marks 2020 (xend)
gg <- ggplot(filterdata, aes(x = X2018, xend = X2020, y = Country.of.origin, group = Country.of.origin)) +
  geom_dumbbell(color = "#9ecae1",
                size = 2.25, colour_x = "#9ecae1", colour_xend = "#094159",
                dot_guide = TRUE, dot_guide_size = 0.25) +
  # Label the x axis in millions, e.g. 2500000 becomes "2.5M"
  scale_x_continuous(labels = function(l) {
    paste0(round(l / 1e6, 1), "M")
  }) +
  labs(x = "in Millions",
       y = NULL,
       title = "The refugee crisis has grown considerably in some regions",
       subtitle = "Total refugee population by country of origin: 2018 to 2020",
       caption = "Excluding internally displaced people. Source: https://www.unhcr.org/en-au/data.html") +
  theme(plot.title = element_text(hjust = 0, face = "bold"),
        plot.background = element_rect(fill = "#f7f7f7"),
        panel.background = element_rect(fill = "#f7f7f7"),
        panel.grid.minor = element_blank(),
        panel.grid.major.y = element_blank(),
        panel.grid.major.x = element_line(),
        axis.ticks = element_blank(),
        legend.position = "top",
        panel.border = element_blank())
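The two steps that most affect how the plot reads are the sort-then-factor ordering and the millions formatter. A minimal base R sketch of both is given here, using only the four rows visible in the str() output above; the names toy and label_millions are hypothetical and used for illustration only.

```r
# Toy subset: the four rows visible in the str() output (deliberately unsorted here)
toy <- data.frame(
  Country.of.origin = c("Turkey", "Uganda", "Palestinian", "Serbia"),
  X2018 = c(111825, 197990, 115648, 42070),
  X2020 = c(138274, 19191, 113578, 37220),
  stringsAsFactors = FALSE
)

# Change between the two years, then sort ascending by the 2020 value
toy$Delta <- toy$X2020 - toy$X2018
toy <- toy[order(toy$X2020), ]

# Freeze the sorted order as factor levels; ggplot2 places the first level
# at the bottom of a discrete y axis, so the largest 2020 value ends up on top
toy$Country.of.origin <- factor(toy$Country.of.origin,
                                levels = as.character(toy$Country.of.origin))

# Same logic as the x-axis label function (label_millions is a hypothetical name)
label_millions <- function(l) paste0(round(l / 1e6, 1), "M")

print(levels(toy$Country.of.origin))
print(label_millions(c(0, 2500000, 6700000)))
```

Without the factor() step, ggplot2 would order the countries alphabetically on the y axis, and the sort applied with arrange() would have no visible effect.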
Data Reference
UNHCR The UN Refugee Agency. (2022). Global Trends report. Retrieved July 23rd, 2022, from UNHCR website: https://www.unhcr.org/en-au/data.html
UNHCR The UN Refugee Agency. (2022). Operational Data Portal - Ukraine situation. Retrieved July 23rd, 2022, from UNHCR website: https://data.unhcr.org/en/situations/ukraine
Code Reference
Selva Prabhakaran (2017), Top 50 ggplot2 Visualizations - The Master List, R-statistics.co http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html
Margaret (2018), Display an axis value in millions in ggplot, retrieved July 23rd 2022 from stackoverflow.com website: https://stackoverflow.com/questions/52602503/display-an-axis-value-in-millions-in-ggplot
Bob Rudis (2014), ggalt examples, cran.r-project.org accessed July 23rd, 2022, https://cran.r-project.org/web/packages/ggalt/vignettes/ggalt_examples.html
The following plot fixes the main issues in the original. The visualisation is clutter free and reinforces the message that some countries are experiencing unprecedented refugee crises. The y axis displays the countries of interest, and the order is intentional: countries are sorted by their 2020 refugee population, from the largest at the top to the smallest at the bottom. The x axis displays the refugee population in millions. The bar between the light blue and dark blue dots shows the change from 2018 to 2020. The eye is drawn to the darker blue dot, but the earlier value and the direction (+/-) of the change are now clearly visible.
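One rough way to check that the two dot colours stay distinguishable by lightness alone, rather than by hue, is to compare their relative luminance. The sketch below applies the WCAG 2.x relative-luminance and contrast-ratio formulas to the two hex codes used in the code; this check is an assumption added for illustration and was not part of the original analysis, and rel_luminance and contrast_ratio are hypothetical helper names.

```r
# Relative luminance of a hex colour, following the WCAG 2.x definition
rel_luminance <- function(hex) {
  rgb <- as.numeric(grDevices::col2rgb(hex)) / 255         # scale R, G, B to 0..1
  lin <- ifelse(rgb <= 0.03928,
                rgb / 12.92,
                ((rgb + 0.055) / 1.055)^2.4)               # linearise sRGB channels
  sum(c(0.2126, 0.7152, 0.0722) * lin)                     # weighted sum of channels
}

# Contrast ratio between two colours: (lighter + 0.05) / (darker + 0.05)
contrast_ratio <- function(a, b) {
  l <- sort(c(rel_luminance(a), rel_luminance(b)), decreasing = TRUE)
  (l[1] + 0.05) / (l[2] + 0.05)
}

light_dot <- "#9ecae1"  # 2018 dot colour from the plotting code
dark_dot  <- "#094159"  # 2020 dot colour from the plotting code

print(contrast_ratio(light_dot, dark_dot))
```

A ratio well above 1 indicates that the two dots differ clearly in lightness, so the 2018 and 2020 values should remain separable even for readers who cannot distinguish the hues.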