The Original, Code and Reconstruction sections below describe the issues and how they were fixed.

Original


Source: HowMuch.net: Visualizing the Negative Economic Impact of COVID-19 on Tourism (2021).


Objective

The objective of the data visualisation is to show the economic impact of the COVID-19 pandemic on tourism in “high tourism” countries. Because international borders were closed to prevent the spread of COVID-19, countries with high tourism experienced a decline in visitors and, therefore, in income.

The target audience is the general public: the bold colours and country flags suggest the chart is designed to appeal to a broad audience.

The visualisation chosen had the following three main issues:

  • Data deception: It is unclear why the countries in the visualisation were chosen. One would assume the creator picked the 14 countries with the highest tourism income in 2019. However, an analysis of similar data from the World Bank and the World Tourism Organisation showed that Macao and the United Arab Emirates would have been in the top 14 if countries had been chosen by income. The inclusion of Russia, which had a much lower tourism income in 2019 by comparison, suggests the creator may have wanted to include “recognisable” countries. However, the audience could easily assume that the chosen countries had the highest tourism incomes in the world, which is not the case. The countries also appear to be ordered by the percentage decline in income between 2019 and 2020, which could create further confusion.

  • Ignoring convention: When visualising numerical data such as tourism income, it is ideal to show it on a plot with a common scale so the audience can compare values between countries. This visualisation does not resemble any conventional plot, so it is virtually impossible to compare values within each country, or across the plot between countries. Apart from the 2019 and 2020 income values printed above and below the circles, the only representation of change in income is a progressively darker red inside each circle indicating the percentage decrease. While readers can identify the percentage decline bracket by comparing the country’s colour to the key, they can only find the exact value by calculating it themselves. As a result, it is difficult to understand and compare the change in tourism income within and between countries, even though this is the intended purpose of the plot.

  • Visual bombardment: While the country flags inside the circles may help the audience identify each country faster, the flags’ varied colours and the changing size of the circles confuse the message of the visualisation. The United States has by far the largest circle and therefore becomes the focal point. It takes the viewer some time to work out why: it is not because the United States had the biggest percentage change, but because it had the highest tourism income in 2019. While this is valuable to know, it is not the point of the visualisation, and it distracts from the intended message, which is the decline in tourism income.
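The percentage decline that the original leaves readers to calculate is (2020 income − 2019 income) / 2019 income × 100. As a quick check, this base R sketch applies that formula to the United States, Spain and France receipts shown in the cleaned data in the Code section, and reproduces their `perc` values:

```r
# Tourism receipts in US dollars, as printed from the cleaned data in the Code section
receipts <- data.frame(
  country = c("United States", "Spain", "France"),
  y2019 = c(239447000000, 79700000000, 70776000000),
  y2020 = c(84205000000, 18091900000, 35958000000)
)

# Percentage change from 2019 to 2020, rounded to one decimal place
receipts$perc <- round((receipts$y2020 - receipts$y2019) / receipts$y2019 * 100, 1)

print(receipts[, c("country", "perc")])
```

The results are -64.8 for the United States, -77.3 for Spain and -49.2 for France, matching the `perc` column.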

Reference

Code

The following code was used to fix the issues identified in the original.

Load packages and read CSV

library(readr)
library(tidyr)
library(forcats)
library(dplyr)
tourism <- read_csv('clean_tourism.csv')

Sort the rows by 2019 income in descending order

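# Sort by 2019 income, highest first. The column name needs backticks:
# desc(2019) would sort by the constant 2019 and leave the order unchanged.
tourism <- tourism %>% arrange(desc(`2019`))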
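The Original section argues that the chart should have shown the countries with the highest 2019 income. A minimal sketch of that selection with dplyr's `slice_max()`, assuming a wide table with a `2019` column like `tourism`; the `toy` table uses rounded values from the rows printed later in this section, and `n = 2` is purely illustrative (the reconstruction keeps the top 16):

```r
library(dplyr)

# Toy wide table: one row per country, one column per year (US$ billions)
toy <- tibble(
  country = c("France", "United States", "Spain"),
  `2019` = c(70.8, 239.4, 79.7),
  `2020` = c(36.0, 84.2, 18.1)
)

# Keep the n countries with the highest 2019 income, largest first.
# Backticks are required because `2019` is a non-syntactic column name.
top_n_2019 <- toy %>%
  slice_max(`2019`, n = 2) %>%
  arrange(desc(`2019`))

print(top_n_2019)
```

Selecting by rank like this makes the inclusion rule explicit, so a chart built from it cannot silently drop high-income countries or add lower-income ones.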

Convert the country column to a factor ordered by 2019 income, which controls the order of the panels in the visualisation

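# Order the country levels by 2019 income, highest first;
# facet_wrap() draws the panels in this level order
tourism$country <- factor(tourism$country)
tourism$country <- fct_reorder(tourism$country, tourism$`2019`, .desc = TRUE)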
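To see what `fct_reorder()` does to the levels, here is a small sketch with toy values (rounded 2019 incomes for three countries, not the full dataset). By default the levels are alphabetical; reordering by 2019 income with `.desc = TRUE` puts the highest earner first:

```r
library(forcats)

# Toy data: countries and their 2019 income (US$ billions, rounded)
countries <- c("France", "United States", "Spain")
income_2019 <- c(70.8, 239.4, 79.7)

# factor() alone gives alphabetical levels: France, Spain, United States.
# fct_reorder() re-sorts them by income, largest first.
f <- fct_reorder(factor(countries), income_2019, .desc = TRUE)
print(levels(f))
```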

Make the data long for ease of visualisation

tourism <- tourism %>% pivot_longer(!c(country, perc, half), names_to = "year", values_to = "count")
tourism %>% head()
## # A tibble: 6 x 5
##   country        perc  half year         count
##   <fct>         <dbl> <dbl> <chr>        <dbl>
## 1 United States -64.8 162.  2019  239447000000
## 2 United States -64.8 162.  2020   84205000000
## 3 Spain*        -77.3  48.9 2019   79700000000
## 4 Spain*        -77.3  48.9 2020   18091900000
## 5 France        -49.2  53.4 2019   70776000000
## 6 France        -49.2  53.4 2020   35958000000
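The reshape above turns each year column into rows. A self-contained sketch of the same `pivot_longer()` call on a two-country toy table, using the values printed above (so `half` is the rounded display value), shows how each wide row becomes one row per year:

```r
library(tidyr)
library(tibble)

# Wide toy table with the same columns as the cleaned data
wide <- tibble(
  country = c("United States", "Spain*"),
  perc = c(-64.8, -77.3),
  half = c(162, 48.9),
  `2019` = c(239447000000, 79700000000),
  `2020` = c(84205000000, 18091900000)
)

# Every column except country, perc and half becomes a (year, count) pair
long <- pivot_longer(wide, !c(country, perc, half),
                     names_to = "year", values_to = "count")
print(long)
```

Note that `year` comes out as a character column, which is why ggplot2 treats it as a discrete x-axis with two positions.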

Divide ‘count’ value by one billion to make it easier to visualise

tourism$count <- tourism$count / 1000000000
tourism %>% head()
## # A tibble: 6 x 5
##   country        perc  half year  count
##   <fct>         <dbl> <dbl> <chr> <dbl>
## 1 United States -64.8 162.  2019  239. 
## 2 United States -64.8 162.  2020   84.2
## 3 Spain*        -77.3  48.9 2019   79.7
## 4 Spain*        -77.3  48.9 2020   18.1
## 5 France        -49.2  53.4 2019   70.8
## 6 France        -49.2  53.4 2020   36.0

Build visualisation

library(ggplot2)
# One panel per country: a line joins the 2019 and 2020 income,
# coloured by the percentage change (perc)
p <- ggplot(data = tourism, aes(x = year, y = count, group = country))
p1 <- p + geom_point(aes(color = perc), alpha = 0.7) +
  geom_line(aes(color = perc), alpha = 0.7) +
  labs(
    title = 'International tourism income decrease due to COVID pandemic (2019-2020)',
    subtitle = 'For the 16 countries with the highest international tourism income** in 2019',
    caption = paste0(
      'Source: The World Bank and World Tourism Organisation\n',
      'https://data.worldbank.org/indicator/ST.INT.RCPT.CD\n',
      'https://www.wto.org/english/tratop_e/envir_e/unwto_barom21.pdf\n',
      '*Provisional data from World Tourism Organisation estimated in May 2021 ',
      '(confirmed figures for these countries unavailable in recent data)\n',
      '**Income measured as "receipts", which are expenditures by international inbound visitors, ',
      'including transport, goods and services'
    ),
    x = 'Year',
    y = 'Income ($US billions)'
  ) + facet_wrap(vars(country), scales = 'fixed') +
  # Put the largest decline (darkest red) at the top of the colour bar
  guides(color = guide_colorbar(reverse = TRUE)) +
  scale_color_gradient(low = '#660000', high = '#FF3300', name = '% change') +
  # Percentage change, drawn once per panel; half is the midpoint of the
  # 2019 and 2020 values, used to position the label above the line
  geom_text(data = tourism %>% filter(year == '2019'),
            aes(x = 1.65, y = half + 25, label = paste0(perc, "%"), colour = perc),
            size = 2.5, fontface = 'bold') +
  # 2019 value, to the left of the first point
  geom_text(data = tourism %>% filter(year == '2019'),
            aes(label = sprintf("%0.1f", count)), hjust = 1.3,
            vjust = 0.6, size = 2.25, fontface = 'bold') +
  # 2020 value, to the right of the second point
  geom_text(data = tourism %>% filter(year == '2020'),
            aes(label = sprintf("%0.1f", count)), hjust = -0.3,
            vjust = -0.1, size = 2.25, fontface = 'bold') +
  theme_light() +
  theme(text = element_text(family = "Arial"),
        title = element_text(face = 'bold'),
        plot.subtitle = element_text(face = 'plain'),
        plot.caption = element_text(face = 'italic', size = 8),
        axis.title.x = element_text(face = 'bold', size = 9),
        axis.title.y = element_text(face = 'bold', size = 9),
        axis.text.x = element_text(size = 7),
        axis.text.y = element_text(size = 7))

# Draw the plot
p1
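Because `clean_tourism.csv` is not reproduced here, the following is a minimal self-contained sketch of the same design (one panel per country, with 2019 and 2020 joined by a line coloured by percentage change), using the three countries printed in the Code section. It is an illustration only: it omits the labels, caption and theme settings of the full plot.

```r
library(ggplot2)

# Long-format toy data in US$ billions, from the rows printed in the Code section
toy <- data.frame(
  country = factor(rep(c("United States", "Spain*", "France"), each = 2),
                   levels = c("United States", "Spain*", "France")),
  year = rep(c("2019", "2020"), times = 3),
  count = c(239.4, 84.2, 79.7, 18.1, 70.8, 36.0),
  perc = rep(c(-64.8, -77.3, -49.2), each = 2)
)

# Same encoding as the reconstruction: shared y scale, colour = % change
p_sketch <- ggplot(toy, aes(x = year, y = count, group = country, colour = perc)) +
  geom_point() +
  geom_line() +
  facet_wrap(vars(country)) +
  scale_colour_gradient(low = "#660000", high = "#FF3300", name = "% change") +
  labs(x = "Year", y = "Income ($US billions)")

print(p_sketch)
```

Keeping the y scale fixed across panels is what allows incomes to be compared between countries, which the original chart did not support.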

Data Reference

Reconstruction

The following plot fixes the main issues in the original.