
Original


Source: Statista Refugee Crises Worldwide (2022).


Objective

Here is a visualisation from the Statista website published by Katharina Buchholz on March 3rd, 2022.

The objective is to show that the refugee crisis is growing worldwide.

The data is sourced from the UNHCR website and covers refugees, asylum-seekers and Venezuelans displaced abroad in 2013 and 2020; the chart also shows Ukraine in 2022.

It is aimed at an international audience who may be concerned about the escalating crisis in Ukraine following the Russian military invasion. Its purpose may be to increase aid to countries experiencing war and displacement.

The visualisation chosen had the following three main issues:

  • So What? The stacked bar graph makes it hard to compare countries with each other and with previous years. The title “World Refugee Crises” is generic. It neither tells a compelling story nor poses a practical question, such as: which country or war is causing the world refugee population to grow?

  • Perceptual and color issues: The color scheme appears arbitrary. Two countries are shown in green, and one of them sits next to a country in red. Readers with red-green colorblindness cannot easily tell that pair apart. On close inspection, “Others” shows the largest increase over time. Because it is grey, however, it does not draw attention to this important point. The background gridlines also distract from the quantity in each stack. Segments can look smaller or larger depending on where they sit relative to the lines, which is misleading.

  • Text: Quite a few issues with text and arrangement are worth mentioning:

    • The legend causes eyes to wander back and forth over the data.
    • The y axis carries too many labels; fewer labels at larger intervals would reduce noise. Data labels also appear on the 2020 bar but not on the other bars, so the reader has to guess those values instead of seeing a clear picture.
    • Ukraine is added as a text annotation above the 2013 tick on the x axis, even though it refers to 2022.
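One way to address the color issue is to grey out every category except the one the story is about, and to give that one a colorblind-safe accent. The sketch below is a hypothetical helper: `highlight_palette()` is not part of any package, and the category names are illustrative only. The accent `#D55E00` is the vermillion from the Okabe-Ito palette. The returned named vector could be passed to ggplot2's `scale_fill_manual(values = ...)`.

```r
# Hypothetical helper: grey out every category except the one(s) in focus.
# Returns a named character vector of colours, one entry per category.
highlight_palette <- function(categories, focus,
                              focus_colour = "#D55E00", # Okabe-Ito vermillion
                              base_colour = "grey75") {
  cols <- rep(base_colour, length(categories))
  names(cols) <- categories
  cols[categories %in% focus] <- focus_colour
  cols
}

# Illustrative categories; "Others" is the group the critique says is hidden in grey
pal <- highlight_palette(c("Syria", "Afghanistan", "Venezuela", "Others"),
                         focus = "Others")
pal
```

A single accent against neutral grey avoids relying on a red/green contrast altogether.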

Reference

Code

The following code was used to fix the issues identified in the original.

library(tidyverse) # data wrangling (dplyr, readr) and plotting (ggplot2)
library(readxl)    # reading data from Excel spreadsheets
library(ggalt)     # extra geoms for 'ggplot2', including geom_dumbbell()
library(scales)    # helpers for formatting axis and legend labels
library(here)      # builds file paths relative to the project root
theme_set(theme_classic()) # clean base theme for all plots

# import the CSV, skipping its first 14 lines, and parse the year columns as numbers;
# data.frame() turns the headers into syntactic names (e.g. `2018` becomes X2018)
rawdata <- data.frame(read_csv(here("populationdata_18_20_22_untidy.csv"),
  col_types = cols(
    `2018` = col_number(),
    `2020` = col_number(),
    `2022` = col_number()
  ),
  skip = 14
))

# keep only the countries of interest
filterdata <- rawdata %>%
  filter(Country.of.origin %in% c("Ukraine", "Serbia", "Turkey", "Palestinian",
                                  "Syria", "Pakistan", "Columbia", "Congo",
                                  "Honduras", "Sudan", "Afghanistan", "Myanmar",
                                  "Somalia", "Uganda", "Venezuela")) %>%
  # change in refugee population between 2018 and 2020
  mutate(Delta = X2020 - X2018) %>%
  # sort by 2020 population so the largest values end up at the top of the plot
  arrange(X2020)

#check structure of our dataset
str(filterdata)
## 'data.frame':    14 obs. of  5 variables:
##  $ Country.of.origin: chr  "Uganda" "Serbia" "Palestinian" "Turkey" ...
##  $ X2018            : num  197990 42070 115648 111825 207585 ...
##  $ X2020            : num  19191 37220 113578 138274 197444 ...
##  $ X2022            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Delta            : num  -178799 -4850 -2070 26449 -10141 ...

# lock in the sorted order: ggplot2 draws the first factor level at the bottom of the y axis
filterdata$Country.of.origin <- factor(filterdata$Country.of.origin,
                                       levels = as.character(filterdata$Country.of.origin))

gg <- ggplot(filterdata, aes(x = X2018, xend = X2020, y = Country.of.origin,
                             group = Country.of.origin)) +
  # light dot = 2018, dark dot = 2020, joined by a light blue bar
  geom_dumbbell(color = "#9ecae1", size = 2.25,
                colour_x = "#9ecae1", colour_xend = "#094159",
                dot_guide = TRUE, dot_guide_size = 0.25) +
  # show axis values in millions, e.g. 2000000 -> "2M"
  scale_x_continuous(labels = function(l) paste0(round(l / 1e6, 1), "M")) +
  labs(x = "Refugee population (millions)",
       y = NULL,
       title = "The refugee crisis has grown considerably in some regions",
       subtitle = "Total refugee population by country of origin: 2018 to 2020",
       caption = "Excluding internally displaced people. Source: https://www.unhcr.org/en-au/data.html") +
  theme(plot.title = element_text(hjust = 0, face = "bold"),
        plot.background = element_rect(fill = "#f7f7f7"),
        panel.background = element_rect(fill = "#f7f7f7"),
        panel.grid.minor = element_blank(),
        panel.grid.major.y = element_blank(),
        # theme_classic() blanks the parent grid element, so set the colour explicitly
        panel.grid.major.x = element_line(colour = "grey85"),
        axis.ticks = element_blank(),
        legend.position = "top",
        panel.border = element_blank())

gg # print the plot
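You can check the preparation and ordering logic without the CSV. The sketch below reproduces it in base R on the four rows visible in the `str()` output above. It swaps the dplyr pipeline for `order()` and `factor()` and uses only those four countries, so treat it as an illustration of the technique rather than the full analysis. ggplot2 draws the first factor level at the bottom of a discrete y axis, so sorting in ascending 2020 order places the largest 2020 population at the top.

```r
# Four rows taken from the str() output of filterdata (2018 and 2020 totals)
df <- data.frame(
  Country.of.origin = c("Turkey", "Uganda", "Palestinian", "Serbia"),
  X2018 = c(111825, 197990, 115648, 42070),
  X2020 = c(138274, 19191, 113578, 37220),
  stringsAsFactors = FALSE
)

# Same steps as the dplyr pipeline: change between years, then sort by 2020
df$Delta <- df$X2020 - df$X2018
df <- df[order(df$X2020), ]

# Lock the display order: the first level is drawn at the bottom of the y axis
df$Country.of.origin <- factor(df$Country.of.origin, levels = df$Country.of.origin)

# Same axis label function as in the plot code (values in millions)
millions <- function(l) paste0(round(l / 1e6, 1), "M")

levels(df$Country.of.origin) # "Uganda" "Serbia" "Palestinian" "Turkey"
df$Delta                     # -178799 -4850 -2070 26449
millions(c(0, 5e5, 2e6))     # "0M" "0.5M" "2M"
```

The `Delta` values match the `str()` output, which is a quick sanity check that the subtraction is the right way round.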

Data Reference

Code Reference

Reconstruction

The following plot fixes the main issues in the original. The visualisation is free of clutter and reinforces the message that some countries are experiencing large and, in some cases, growing refugee crises. The y axis lists the countries of interest. Their order is intentional: countries are sorted by 2020 refugee population, with the largest at the top. The x axis shows the refugee population in millions. The bar between the light blue (2018) and dark blue (2020) dots shows the change from 2018 to 2020. The eye is drawn to the darker 2020 dot, but the reader can still see where each country started and whether its refugee population rose or fell.
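The dumbbell reading, from light dot to dark dot, can also be expressed numerically. The sketch below again uses only the four rows from the `str()` output. The sign of `Delta` gives the direction of each dumbbell, and dividing by the 2018 value gives the percentage change. The `direction` and `pct_change` columns are illustrative additions, not part of the original code.

```r
# Four rows from the str() output of filterdata
df <- data.frame(
  Country.of.origin = c("Uganda", "Serbia", "Palestinian", "Turkey"),
  X2018 = c(197990, 42070, 115648, 111825),
  X2020 = c(19191, 37220, 113578, 138274),
  stringsAsFactors = FALSE
)

df$Delta <- df$X2020 - df$X2018

# Dark (2020) dot right of the light (2018) dot = increase; left of it = decrease
df$direction <- ifelse(df$Delta > 0, "increase",
                       ifelse(df$Delta < 0, "decrease", "no change"))

# Relative change, as a percentage of the 2018 population
df$pct_change <- round(100 * df$Delta / df$X2018, 1)

df[, c("Country.of.origin", "direction", "pct_change")]
```

Percentage change puts small and large populations on a common scale, which the absolute x axis of the dumbbell chart does not.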