Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: symmy589 (2020).


The Author’s Intended Audience and Objectives

The author posted this graph to the dataisbeautiful subreddit to educate other Reddit users about the state of COVID-19 at the time.

Their intention was to display the following information for each country:

  • Each country’s share of the world’s COVID-19 cases. The outer ring indicates this intention.

  • Each country’s share of the world’s COVID-19 deaths. The inner ring indicates this intention.

  • The percentage of each country’s COVID-19 cases that resulted in death. The decision to nest the two rings together indicates this intention.
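A minimal sketch of how these three quantities are calculated, written in R to match the Code tab. The world totals are the 2020-05-05 figures used in the Code tab (3,544,222 cases and 250,977 deaths); the country counts are made-up numbers for illustration only.

```r
# World totals for 2020-05-05, as used in the Code tab
world_cases  <- 3544222
world_deaths <- 250977

# Hypothetical country counts, for illustration only
country_cases  <- 100000
country_deaths <- 10000

case_share  <- country_cases  / world_cases  * 100  # outer ring: % of world cases
death_share <- country_deaths / world_deaths * 100  # inner ring: % of world deaths
death_rate  <- country_deaths / country_cases * 100 # % of cases resulting in death

round(c(case_share = case_share, death_share = death_share,
        death_rate = death_rate), 2)
```

Note that the first two quantities share a world-level denominator, while the third uses the country’s own case count.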

Issues with the Data Visualisation

Issues surrounding the decision to use pie charts:

  • It is difficult to compare proportions within an individual pie chart, because readers must judge slice angles rather than bar lengths. For instance, in the outer ring it is extremely difficult to tell whether France or Turkey has the larger value.
  • Replacing each pie chart with a bar chart would solve this problem.
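The difficulty comes from how pie charts encode values: each slice’s angle is its share multiplied by 360 degrees. The sketch below uses two hypothetical shares (not the real France and Turkey values) to show how small the angular difference is, and how sorting the values, as a bar chart does, states the ranking directly.

```r
# Two hypothetical shares of world cases (%), close together like France and
# Turkey in the original; these are not the real values
shares <- c(A = 3.7, B = 3.6)

# A pie or ring chart encodes each share as an angle out of 360 degrees
angles <- shares / 100 * 360
angles                                  # A: 13.32, B: 12.96
unname(angles["A"] - angles["B"])       # only about 0.36 degrees apart

# A bar chart sorted by value makes the ranking explicit
names(sort(shares, decreasing = TRUE))  # "A" "B"
```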

Issues surrounding the decision to nest two pie charts:

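  • The slices for the same country do not line up between the rings, because each slice’s position depends on the combined size of the slices before it, which differs from one ring to the other. For instance, Spain’s slice in the outer ring does not line up with its slice in the inner ring, making the two values hard to compare.
  • Replacing the nested pie charts with stacked bar charts would solve this problem.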
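The misalignment is built into nested rings: both rings start at the same angle and stack their slices in the same order, so a country’s slice begins wherever the cumulative share of the preceding countries ends, and those cumulative shares differ between the cases ring and the deaths ring. The sketch uses hypothetical shares, with "X" standing in for Spain; `slice_angles()` is a helper written only for this illustration.

```r
# Hypothetical shares (%) for three countries drawn in the same order in
# both rings; "X" plays the role of Spain
case_share  <- c(W = 30, V = 5,  X = 6)
death_share <- c(W = 25, V = 12, X = 10)

# Start and end angle (degrees) of each slice in a ring that begins at 0
slice_angles <- function(share) {
  end   <- cumsum(share) / 100 * 360
  start <- end - share / 100 * 360
  data.frame(start = start, end = end, row.names = names(share))
}

outer_ring <- slice_angles(case_share)
inner_ring <- slice_angles(death_share)
outer_ring["X", ]   # X starts at about 126 degrees in the outer ring
inner_ring["X", ]   # but at about 133.2 degrees in the inner ring
```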

Issues surrounding the choice of data to compare:

  • Comparing values across the two rings is misleading. For instance, at first glance the United Kingdom appears to have more deaths than cases, which is impossible; its inner-ring slice is larger than its outer-ring slice only because the two rings measure shares of different world totals.
  • Splitting this visualisation into two facets would allow the audience to treat cases and deaths as separate questions.
  • Adding a third facet that displays each country’s ‘percentage of cases resulting in death’ would be a more appropriate way to visualise the relationship between cases and deaths.
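Why a country can look as if it has more deaths than cases: dividing its share of world deaths by its share of world cases gives (deaths / cases) / (world deaths / world cases), i.e. its percentage of cases resulting in death relative to the world’s. The inner slice is therefore larger than the outer one whenever the country’s rate exceeds the world rate. The sketch checks this with hypothetical country counts and the 2020-05-05 world totals used in the Code tab.

```r
# World totals for 2020-05-05, as used in the Code tab
world_cases  <- 3544222
world_deaths <- 250977

# Hypothetical country: far fewer deaths than cases, but a higher
# percentage of cases resulting in death than the world overall
cases  <- 150000
deaths <- 20000

case_share  <- cases  / world_cases  * 100   # outer-ring share
death_share <- deaths / world_deaths * 100   # inner-ring share
death_share > case_share                     # TRUE, although deaths < cases

# The ratio of the two shares equals the ratio of the two death rates
country_rate <- deaths / cases * 100
world_rate   <- world_deaths / world_cases * 100
all.equal(death_share / case_share, country_rate / world_rate)  # TRUE
```

This is why the third facet plots the rate directly instead of leaving readers to infer it from two differently scaled rings.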

The Way Forward

I’ve decided to respect the author’s intended objectives and re-visualise them as a single plot with three facets:

  1. I’ve displayed the proportion of COVID-19 cases compared to the world by using a bar chart in the first facet. I’ve placed this facet first as it was the most prominent part of the original visualisation.
  2. I’ve displayed the proportion of COVID-19 deaths compared to the world by using another bar chart in the second facet.
  3. I’ve displayed the percentage of cases resulting in death with a third bar chart in the third facet.
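In ggplot2, each facet corresponds to one level of a factor, and `facet_grid()` lays out the panels in level order, which is how the cases facet ends up first. A base-R sketch of reshaping a wide table into that long form, using hypothetical data and the facet labels from the Code tab (the Code tab itself uses tidyr’s `gather()` for this step):

```r
# Hypothetical wide table: one row per country, one column per measure
wide <- data.frame(
  location   = c("A", "B"),
  case_prop  = c(30, 5),
  death_prop = c(25, 12),
  death_rate = c(6, 15)
)
measures <- c("case_prop", "death_prop", "death_rate")

# Stack the measures column by column; Variable becomes the facet factor,
# and its level order fixes the left-to-right order of the panels
long <- data.frame(
  location = rep(wide$location, times = length(measures)),
  Variable = factor(rep(measures, each = nrow(wide)),
                    levels = measures,
                    labels = c("% of World Cases", "% of World Deaths",
                               "% of Cases Resulting in Death")),
  values   = unlist(wide[measures], use.names = FALSE)
)

levels(long$Variable)
long
```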

Respecting the original visualisation

For the third facet, I considered displaying cases as a proportion of each country’s population, but this was not part of the author’s original intentions. I settled on using each country’s case and death counts to respect the original.

I used a colour theme similar to the original’s to respect the author’s intentions and audience.

Reference

Code

The following code was used to fix the issues identified in the original.

# LOADING LIBRARIES -----------------------------------------------------

library(readr)
library(dplyr)
library(ggplot2)
library(tidyr)
library(forcats)

# LOADING DATA ----------------------------------------------------------

# Our World in Data COVID-19 dataset (one row per location per date)
covid_cases <- 
  read_csv("owid-covid-data.csv")

# MANIPULATING DATA -----------------------------------------------------

covids <- 
  covid_cases %>% 
  filter(date == as.Date("2020-05-05")) %>% 
  select(location, total_cases, total_deaths) %>% 
  arrange(desc(total_cases)) %>%
  # Row 1 is assumed to be the "World" aggregate, so shares are relative to it
  mutate(case_prop  = (total_cases/total_cases[1])*100,
         death_prop = (total_deaths/total_deaths[1])*100,
         death_rate = (total_deaths/total_cases)*100)
  

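# Rows 2-16 hold the 15 countries with the most cases (row 1 is assumed to be
# "World"); every remaining row is pooled into "Other"
covid <- covids %>% slice(2:16)
others <- covids %>% slice(17:n())

# Use the "World" row as the denominator so "Other" is computed like every
# other row (3,544,222 cases and 250,977 deaths on 2020-05-05)
world_cases <- covids$total_cases[1]
world_deaths <- covids$total_deaths[1]
total_other_cases <- sum(others$total_cases, na.rm = TRUE)
total_other_deaths <- sum(others$total_deaths, na.rm = TRUE)
other_case_prop <- (total_other_cases/world_cases)*100
other_death_prop <- (total_other_deaths/world_deaths)*100
other_death_rate <- (total_other_deaths/total_other_cases)*100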

covid <- covid %>% add_row(location = "Other", total_cases = total_other_cases, total_deaths = total_other_deaths, 
                  case_prop = other_case_prop, death_prop = other_death_prop, death_rate = other_death_rate)

covid <- covid %>% arrange(desc(total_cases))

# ORDERING LOCATIONS -----------------------------------------------------

covid$location <- fct_reorder(covid$location, covid$case_prop)

# CONVERTING WIDE DATA TO LONG DATA -------------------------------------

covid_long <- 
  covid %>% 
  gather(Variable, values, case_prop, death_prop, death_rate, factor_key = TRUE)

# RENAMING VARIABLES ----------------------------------------------------

covid_long$Variable <- factor(covid_long$Variable,
                         levels=c("case_prop","death_prop","death_rate"),
                         labels=c("% of World Cases", "% of World Deaths", "% of Cases Resulting in Death"))

# GRAPHING THE DATA ----------------------------------------------------

# Fill colours: the same values as location, with the factor levels reversed
covid_long$colours <- fct_rev(covid_long$location)

p <- ggplot(covid_long, aes(x = location, y = values, fill = colours)) + 
  geom_col() +                                 # bar lengths taken from `values`
  coord_flip() +                               # horizontal bars
  facet_grid(. ~ Variable, scales = "free") +  # one panel per measure
  ylab("Percentage") +
  xlab("Country") +
  ggtitle("COVID-19 Case and Death Statistics")

p
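The `slice()` calls above select rows by position, assuming the largest row is the "World" aggregate. A base-R sketch of the same top-n plus "Other" step that picks the World row by name instead and lets you check that the shares add up to 100%. The data frame `snap` and the function `top_n_plus_other()` are hypothetical, made up for illustration.

```r
# Hypothetical snapshot: a "World" aggregate row plus five countries
snap <- data.frame(
  location     = c("World", "A", "B", "C", "D", "E"),
  total_cases  = c(1000, 400, 250, 150, 120, 80),
  total_deaths = c(100, 30, 35, 10, 15, 10)
)

# Keep the n countries with the most cases and pool the rest into "Other";
# shares are computed against the "World" row, selected by name
top_n_plus_other <- function(df, n) {
  world     <- df[df$location == "World", ]
  countries <- df[df$location != "World", ]
  countries <- countries[order(countries$total_cases, decreasing = TRUE), ]
  ranked    <- seq_len(nrow(countries))
  top  <- countries[ranked <= n, ]
  rest <- countries[ranked > n, ]
  other <- data.frame(
    location     = "Other",
    total_cases  = sum(rest$total_cases, na.rm = TRUE),
    total_deaths = sum(rest$total_deaths, na.rm = TRUE)
  )
  out <- rbind(top, other)
  out$case_prop  <- out$total_cases  / world$total_cases  * 100
  out$death_prop <- out$total_deaths / world$total_deaths * 100
  out$death_rate <- out$total_deaths / out$total_cases    * 100
  out
}

result <- top_n_plus_other(snap, n = 3)
colSums(result[c("case_prop", "death_prop")])  # both 100, up to rounding
```

Selecting the aggregate by name keeps the calculation correct even if the number of rows in the data file changes.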

Data Reference

Reconstruction

The following plot fixes the main issues in the original.