Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.
The Author’s Intended Audience and Objectives
The author posted this graph to the dataisbeautiful subreddit to educate other Reddit users about the current state of COVID-19.
Their intention was to display the following information for each country:
Its share of the world’s COVID-19 cases, shown by the outer ring.
Its share of the world’s COVID-19 deaths, shown by the inner ring.
The proportion of its COVID-19 cases resulting in death, implied by the decision to nest the two rings together.
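As a rough sketch of how these three measures relate, the arithmetic can be written out in base R. The world totals are the ones hard-coded in the reconstruction script; the country figures are made-up numbers for illustration only, not real data.

```r
# World totals as hard-coded in the reconstruction script
world_cases  <- 3544222
world_deaths <- 250977

# Hypothetical country: these two numbers are invented for illustration
country_cases  <- 100000
country_deaths <- 5000

# Outer ring: the country's share of world cases (per cent)
case_prop <- country_cases / world_cases * 100

# Inner ring: the country's share of world deaths (per cent)
death_prop <- country_deaths / world_deaths * 100

# Implied by comparing the rings: per cent of the country's cases ending in death
death_rate <- country_deaths / country_cases * 100

# The same ratio for the world as a whole
world_death_rate <- world_deaths / world_cases * 100

print(round(c(case_prop = case_prop,
              death_prop = death_prop,
              death_rate = death_rate,
              world_death_rate = world_death_rate), 2))
```

The first two measures share a denominator with every other country, while the third is computed within a single country, which is why it cannot be read directly off either ring.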
Issues with the Data Visualisation
Issues surrounding the decision to use pie charts:
Issues surrounding the decision to nest two pie charts:
Issues surrounding the choice of data to compare:
The Way Forward
I’ve decided to respect and re-visualise the author’s intended objectives by creating a visualisation with 3 facets:
Respecting the original visualisation
In the third facet, I considered displaying cases as a proportion of each national population, but this wasn’t part of the author’s original intentions. I settled on using the case and death data for each country to respect the original.
I used a similar colour theme to the original in order to respect the author’s intentions and audience.
Reference
The following code was used to fix the issues identified in the original.
# LOADING LIBRARIES -----------------------------------------------------
library(readr)
library(dplyr)
library(ggplot2)
library(tidyr)
library(forcats)
# LOADING DATA ----------------------------------------------------------
covid_cases <-
  read_csv("owid-covid-data.csv")
# MANIPULATING DATA -----------------------------------------------------
# Keep one day's snapshot and sort by total cases. The proportions below
# assume the first row after sorting is the World aggregate.
covids <-
  covid_cases %>%
  filter(date == "2020-05-05") %>%
  select(location, total_cases, total_deaths) %>%
  arrange(desc(total_cases)) %>%
  mutate(case_prop = (total_cases / total_cases[1]) * 100) %>%
  mutate(death_prop = (total_deaths / total_deaths[1]) * 100) %>%
  mutate(death_rate = (total_deaths / total_cases) * 100)
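# Rows 2-16 are the 15 locations with the most cases; the rest become "Other"
covid <- covids %>% slice(2:16)
others <- covids %>% slice(17:n())
# Hard-coded world totals
world_cases <- 3544222
world_deaths <- 250977
# na.rm = TRUE stops a single missing value from turning the sum into NA
total_other_cases <- sum(others$total_cases, na.rm = TRUE)
total_other_deaths <- sum(others$total_deaths, na.rm = TRUE)
other_case_prop <- round((total_other_cases / world_cases) * 100)
other_death_prop <- round((total_other_deaths / world_deaths) * 100)
other_death_rate <- round((total_other_deaths / total_other_cases) * 100)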
covid <- covid %>%
  add_row(location = "Other",
          total_cases = total_other_cases,
          total_deaths = total_other_deaths,
          case_prop = other_case_prop,
          death_prop = other_death_prop,
          death_rate = other_death_rate)
covid <- covid %>% arrange(desc(total_cases))
# ORDERING LOCATIONS -----------------------------------------------------
covid$location <- fct_reorder(covid$location, covid$case_prop)
# CONVERTING WIDE DATA TO LONG DATA -------------------------------------
covid_long <-
  covid %>%
  gather(Variable, values, case_prop, death_prop, death_rate, factor_key = TRUE)
# RENAMING VARIABLES ----------------------------------------------------
covid_long$Variable <- factor(covid_long$Variable,
                              levels = c("case_prop", "death_prop", "death_rate"),
                              labels = c("% of World Cases", "% of World Deaths", "% of Cases Resulting in Death"))
# GRAPHING THE DATA ----------------------------------------------------
# Fill colour follows location, with the factor level order reversed.
# fct_rev() avoids recycling a 16-element vector across all rows.
covid_long$colours <- fct_rev(covid_long$location)
p <- ggplot(covid_long, aes(x = location, y = values, fill = colours)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  facet_grid(. ~ Variable, scales = "free") +
  ylab("Percentage") +
  xlab("Country") +
  ggtitle("COVID-19 Case and Death Statistics")
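The wide-to-long step in the code above uses gather(), which tidyr now marks as superseded in favour of pivot_longer(). The sketch below shows what an equivalent reshape might look like. It is an alternative, not the code used for this reconstruction, and the toy data frame holds invented values purely to keep the example self-contained.

```r
library(tidyr)

# Toy stand-in for the `covid` data frame; all values are invented
toy <- data.frame(
  location   = c("A", "B"),
  case_prop  = c(10, 5),
  death_prop = c(12, 3),
  death_rate = c(7, 4)
)

# Stack the three measure columns into a name column and a value column
toy_long <- pivot_longer(
  toy,
  cols      = c(case_prop, death_prop, death_rate),
  names_to  = "Variable",
  values_to = "values"
)

# pivot_longer() returns the names as character, so set the facet order
# and labels explicitly, as the RENAMING VARIABLES step does
toy_long$Variable <- factor(
  toy_long$Variable,
  levels = c("case_prop", "death_prop", "death_rate"),
  labels = c("% of World Cases", "% of World Deaths", "% of Cases Resulting in Death")
)

print(toy_long)
```

Unlike gather(), pivot_longer() orders the output by original row rather than by column, which makes no difference to the faceted bar chart because ggplot2 groups on the factor levels.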
Data Reference
The following plot fixes the main issues in the original.