Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: (McCandless and Kashan, 2020)


Objective

The objective of the original data visualisation is to show which countries were hit hardest by the ongoing coronavirus pandemic, measured by the total number of COVID-19 cases. The target audience is the general public worldwide.

The visualisation chosen had the following three main issues:

    1. Deceptive methods - Visual Bombardment
      Clustering some 200 countries, each with fewer than a million total cases, at the bottom of the chart confuses the audience and misleads them into assuming that the US, India and Brazil took the bulk of the damage while the rest suffered only minor consequences.
    2. Poor choice of graph
      Although line graphs are a great choice for representing trends, the messy cluster at the bottom of the chart makes this one hard to read, and plotting the cumulative total of cases per country over time is a poor fit for the stated objective. A better decision would have been to show only the top 10 affected countries; a line chart would be better suited to day-to-day new cases than to the total number of cases.
    3. Failure to provide accurate labels
      Essential labels such as x- and y-axis titles, gridlines and point-value labels are missing, which leaves the audience to guess what the data in the chart represents. For example, the x-axis values 20, 40, 60, etc. are very vague because no unit is given. Another example is the total number of cases for countries like India and Brazil, which cannot be read off the chart.
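The second issue suggests that a line chart fits day-to-day new cases better than cumulative totals. As a minimal sketch of that idea (not the code used for the reconstruction), daily new cases can be derived from a cumulative series by differencing within each country. The data frame `toy`, its values and the names `daily` and `p_daily` are made up for illustration; only the column names `location` and `total_cases` come from the dataset used in this report, and `date` is assumed.

```r
library(dplyr)
library(ggplot2)

# Toy cumulative totals: made-up numbers for illustration only, not real data
toy <- data.frame(
  location    = rep(c("Country A", "Country B"), each = 4),
  date        = rep(as.Date("2020-03-01") + 0:3, times = 2),
  total_cases = c(10, 15, 25, 40, 5, 5, 9, 20)
)

# New cases per day = today's cumulative total minus the previous day's total.
# The first day of each country has no previous value, so it stays NA.
daily <- toy %>%
  group_by(location) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(new_cases = total_cases - dplyr::lag(total_cases)) %>%
  ungroup()
# Country A: NA, 5, 10, 15   Country B: NA, 0, 4, 11

# One line per country, with labelled axes and legend
p_daily <- daily %>%
  filter(!is.na(new_cases)) %>%
  ggplot(aes(x = date, y = new_cases, colour = location)) +
  geom_line() +
  theme_minimal() +
  labs(x = "Date", y = "New cases per day", colour = "Country",
       title = "Daily new cases (toy data)")
```

With real data, the same differencing step would be applied after restricting the data to the top 10 countries, so that the chart stays readable.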

Reference

Code

The following code was used to fix the issues identified in the original.

library(readxl)
library(magrittr)
library(dplyr)
library(ggplot2)

# Load the data and keep only the country name and cumulative case count
covid <- read_excel("owid-covid-data.xlsx")
covid <- covid[, c("location", "total_cases")] %>% na.omit() %>% as.data.frame() %>% filter(location != "World")
# Highest cumulative total per country, then keep the 10 largest
covid <- covid %>% group_by(location) %>% summarise(max_cases = max(total_cases)) %>% arrange(desc(max_cases)) %>% top_n(10, max_cases)
# scaling y axis to millions
covid$max_cases <- covid$max_cases / 1000000

# Bar chart ordered from highest to lowest, with a value label above each bar
p1 <- covid %>% ggplot(aes(x = reorder(location, -max_cases), y = max_cases)) + geom_col() + theme_minimal() +
  labs(x = "Countries", y = "Total covid cases (in million)", title = "Top 10 countries with highest number of covid cases") +
  geom_text(aes(label = sprintf("%.2fM", max_cases)), nudge_y = 0.2)

Data Reference

Reconstruction

The following plot fixes the main issues in the original.