The Original, Code and Reconstruction sections below describe the issues and how they were fixed.

Original


Source: Rami Krispin coronavirus dashboard.


Objective

The objective of this visualization is to represent the hierarchical structure of coronavirus cases by country. It is meant to show which countries stand out in the number of cases and to let viewers compare case counts between countries through the relative sizes of the rectangles. The target audience is authorities, businesses and individuals.

The visualization chosen had the following three main issues:

  • Area as a measure of quantity: Encoding values as rectangle area makes precise quantitative comparisons difficult, because viewers cannot judge magnitude from a rectangle's length or height alone. For example, Mexico (a tall rectangle) and Iran (a wide rectangle) have nearly the same number of cases and percentage, but their different shapes make this hard to see.
  • Color perception: The colors are not mapped to any variable in the data. This leads viewers to scan the rectangles at random unless they are searching for a specific label. For example, Argentina and Germany share the same rectangle color, but the shared color carries no meaning and may confuse the viewer.
  • Accuracy: The displayed percentages do not reflect the true differences in the number of cases. For example, the UK and Italy show the same percentage and a similar rectangle size, yet their case counts differ by 400,000. Similarly, Ukraine and Iraq are both labelled 1%, yet their rectangles differ in size and their case counts differ by about 1 million. In addition, the labels in small rectangles are hard to read.
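The accuracy problem can be illustrated with a small R sketch. It assumes that the treemap labels show each country's share of the total rounded to a whole percentage, and it uses hypothetical totals (not real case counts) chosen only to show how two very different counts can end up with the same label.

```r
# Hypothetical totals for illustration only; these are not real case counts.
world_total <- 200e6
cases <- c(country_a = 1.2e6, country_b = 2.2e6)

# Exact share of the world total, in percent (about 0.6 and 1.1)
share <- cases / world_total * 100

# Assumed label format: the share rounded to a whole percentage
shown <- round(share)

print(share)
print(shown)        # both countries are labelled 1%
print(diff(cases))  # yet their counts differ by one million
```

Labelling bars with absolute counts, as the reconstruction does, avoids this loss of precision.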

Reference

Code

The following code was used to fix the issues identified in the original.

library(dplyr)
library(tidyr)
library(plotly)
library(ggplot2)

# Load the case data and shorten or clean some country names
df <- read.csv("https://raw.githubusercontent.com/RamiKrispin/coronavirus/master/csv/coronavirus.csv",
               stringsAsFactors = FALSE) %>%
  mutate(country = ifelse(country == "United Arab Emirates", "UAE", country),
         country = ifelse(country == "Mainland China", "China", country),
         country = ifelse(country == "North Macedonia", "N.Macedonia", country),
         country = trimws(country),
         country = factor(country, levels = unique(country)))


# Total cases per country and type, then derive the number of active cases
df_tree <- df %>%
  group_by(country, type) %>%
  summarise(total = sum(cases)) %>%
  mutate(type = ifelse(type == "confirmed", "Confirmed", type),
         type = ifelse(type == "recovered", "Recovered", type),
         type = ifelse(type == "death", "Death", type)) %>%
  # One column per case type, so Active can be computed row by row
  pivot_wider(names_from = type, values_from = total) %>%
  mutate(Active = Confirmed - Death - Recovered) %>%
  # Back to long format: one row per country and type
  pivot_longer(cols = -country, names_to = "type", values_to = "total")
# Original treemap (Cases Distribution by Type)
plot_ly(
  data = df_tree %>% dplyr::filter(type == "Confirmed"),
  type = "treemap",
  values = ~total,
  labels = ~country,
  parents = ~type,
  domain = list(column = 0),
  name = "Confirmed",
  textinfo = "label+value+percent parent")
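# Reconstruction: bar chart of the 20 countries with the most confirmed cases

tree_df <- df_tree %>%
  ungroup() %>%
  filter(type == "Confirmed") %>%
  slice_max(total, n = 20)

ggplot(data = tree_df, aes(x = reorder(country, total), y = total)) +
  geom_col(fill = "#FF6666") +
  # Exact counts with thousands separators next to each bar
  geom_text(aes(label = scales::comma(total)), hjust = -0.1) +
  # Extra room on the right so the labels are not clipped
  scale_y_continuous(labels = scales::comma, expand = expansion(mult = c(0, 0.15))) +
  coord_flip() +
  labs(x = "Country", y = "Confirmed cases") +
  ggtitle("Confirmed COVID-19 cases: top 20 countries")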
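The Active count in df_tree is Confirmed minus Death minus Recovered, computed after summing cases per country and type. A base-R sketch on a hypothetical toy table with the same long layout (country, type, cases) shows the arithmetic; here tapply stands in for the group_by/summarise and pivot_wider steps, and all numbers are made up.

```r
# Hypothetical toy data in the same long layout as the dashboard data (not real counts)
toy <- data.frame(
  country = rep(c("A", "B"), each = 6),
  type    = rep(c("confirmed", "death", "recovered"), times = 4),
  cases   = c(100, 2, 50, 120, 3, 60,   # country A, two days
              80,  1, 40,  90, 1, 70)   # country B, two days
)

# Sum cases per country and type: a country-by-type matrix (like pivot_wider)
totals <- tapply(toy$cases, list(toy$country, toy$type), sum)

# Active = Confirmed - Death - Recovered, as in the df_tree pipeline
active <- totals[, "confirmed"] - totals[, "death"] - totals[, "recovered"]
print(active)
```

For country A this gives 220 - 5 - 110 = 105 active cases.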

Data Reference

Reconstruction

The following plot fixes the main issues in the original.


Source: Rami Krispin coronavirus dashboard.