Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: [OC] Total & New daily COVID-19 cases per country : dataisbeautiful, 2020


Objective

The above data visualisation aims to show the total number of COVID-19 cases from January 2020 to September 2020. It plots the cumulative number of cases for several countries, including the United States, India, Peru, Russia and Brazil.

The target audience includes every sector affected by COVID-19, along with the government agencies and healthcare workers who are researching the new virus and tracking its spread. The general public also benefits from the visualisation, as it shows the monthly change in the number of COVID-19 cases.

The visualisation chosen had the following three main issues:

  • Incorrect representation of data
    • The graph represents total COVID-19 cases by country since January 2020. According to the graph, the United States has the lowest number of total COVID-19 cases of all the countries shown, whereas in reality the US has the highest number of cases. The error arises because, for each x-axis value, the graph takes a column-wise cumulative sum of COVID-19 cases across countries. For instance, while the line for the US represents the total cases in that country, the line for India combines the values of both the US and India. The same cumulative addition can be observed for the other countries, which misrepresents each country's own total.
  • Incorrect choice of graph
    • Even if the data had been represented correctly, a density graph would not have been the most appropriate choice. Public data show that countries such as India and Brazil have seen a recent surge in cases compared with the rest of the world. The graph fails to capture this change, and a correct representation of it would have been untidy because of the colour overlap in a density graph.
  • Improper axis labels
    • The label for the y-axis is missing from the graph, which makes it difficult to understand what the values represent. The font size of the labels and title is also too small to read easily.
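
The first issue can be illustrated with a minimal sketch in base R. The case counts below are hypothetical values chosen only for illustration, not figures from the original graph or from results.csv. The sketch shows how stacking makes the upper boundary of each series the running total of that series and every series beneath it, so the bottom series (here the United States) is the only one whose line matches its own value.

```r
# Hypothetical cumulative case counts (in millions) on a single day.
# These numbers are illustrative assumptions, not data from the source.
cases <- c("United States" = 6.5, "India" = 4.8, "Brazil" = 4.3)

# In a stacked chart, the upper boundary of each series is the running
# total of its own value plus every series stacked beneath it.
stacked_top <- cumsum(cases)

# Compare each country's own total with the height at which it is drawn.
comparison <- data.frame(
  country     = names(cases),
  actual      = unname(cases),
  stacked_top = unname(stacked_top)
)
print(comparison)
```

Reading the `stacked_top` column as if it were each country's own total reproduces the misreading described above: the United States appears lowest simply because it sits at the bottom of the stack.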

Reference

Code

The following code was used to fix the issues identified in the original.

library(readr)
library(ggplot2)
library(tidyr)
library(magrittr)
library(dplyr)
library(lubridate)

covid<- read_csv("results.csv")
head(covid)
## # A tibble: 6 x 7
##   day        `United States` India Brazil Russia  Peru Other
##   <chr>                <dbl> <dbl>  <dbl>  <dbl> <dbl> <dbl>
## 1 31-12-2019               0     0      0      0     0    27
## 2 01-01-2020               0     0      0      0     0    27
## 3 02-01-2020               0     0      0      0     0    27
## 4 03-01-2020               0     0      0      0     0    44
## 5 04-01-2020               0     0      0      0     0    44
## 6 05-01-2020               0     0      0      0     0    59
#Tidying up the dataset: reshaping from wide (one column per country) to long (one row per day and country)
covid<- covid %>% gather(`United States`, `India`, `Brazil`, `Russia`, `Peru`, `Other`, key = "Country", value = "Cases")
head(covid)
## # A tibble: 6 x 3
##   day        Country       Cases
##   <chr>      <chr>         <dbl>
## 1 31-12-2019 United States     0
## 2 01-01-2020 United States     0
## 3 02-01-2020 United States     0
## 4 03-01-2020 United States     0
## 5 04-01-2020 United States     0
## 6 05-01-2020 United States     0
#Converting day to date format
covid$day<- dmy(covid$day)
#Keeping only day and month, then filtering to the first of each month plus the latest date (14 Sep)
covid$day<- format(covid$day, "%d-%m")
x<-c("01-01","01-02","01-03","01-04","01-05","01-06","01-07","01-08","01-09", "14-09")
covid<-covid[covid$day %in% x,]

#Converting day to an ordered factor with the following labels for better representation.
#levels = x pairs each date string with its label explicitly instead of relying on sort order.
covid$day<- factor(covid$day, levels = x, labels = c("01 Jan","01 Feb","01 Mar","01 Apr","01 May","01 Jun","01 Jul","01 Aug","01 Sep", "14 Sep"), ordered = TRUE)
colnames(covid)<- c("Month", "Country", "Cases")

#Displaying cases in Millions, rounded to one decimal place
covid$Cases<- round(covid$Cases / 1000000, 1)

#Reconstruction
p1<- ggplot(data = covid, mapping = aes(x=Month, y=Cases, colour= Country, group= Country))
p1<- p1 + geom_line( size=1, linetype=1) +
  geom_point(shape=20, fill="black", size=2, colour="black") +
  scale_y_continuous(limits = c(0,12.5), name = "Number of cases (Millions)", breaks = seq(0,12,2)) +
  scale_x_discrete(name= "Months")  +
  theme_bw() +
  scale_color_brewer(palette = "Set2") +
  theme(axis.line = element_line(color = 'black')) +
  theme(panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        legend.background = element_rect(fill = "gray94"),
        legend.title = element_text(face = "bold"),
        plot.title = element_text(color = "black", size = 12, face = "bold"),
        plot.subtitle = element_text(color = "grey33")) +
  ggtitle(label = "COVID-19 cases from January-September 2020", subtitle = "Plot of total COVID-19 cases by each country")
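
The date handling in the code above filters on "dd-mm" strings, which works only while the data stay within a single calendar year. As a hedged alternative, the base R sketch below keeps full dates for the filter and names the factor levels explicitly. The `day` vector and the three dates in `keep` are illustrative assumptions that follow the dd-mm-yyyy format shown in the output of results.csv; they are not the full data set.

```r
# Illustrative day strings in the same dd-mm-yyyy format as results.csv.
day <- c("31-12-2019", "01-01-2020", "15-01-2020", "01-02-2020", "14-09-2020")

# Parse to Date with base R (the same result lubridate::dmy gives for this format).
day_date <- as.Date(day, format = "%d-%m-%Y")

# Dates to keep, written as full dates rather than "dd-mm" strings,
# so the same day and month in another year cannot match by accident.
keep <- as.Date(c("2020-01-01", "2020-02-01", "2020-09-14"))
kept <- day_date[day_date %in% keep]

# Explicit levels pair each date with its label, so the ordering does not
# depend on how the strings happen to sort or on the system locale.
month <- factor(format(kept, "%Y-%m-%d"),
                levels  = format(keep, "%Y-%m-%d"),
                labels  = c("01 Jan", "01 Feb", "14 Sep"),
                ordered = TRUE)
print(month)
```

The labels are typed out rather than generated with `format(kept, "%d %b")` because month abbreviations produced by `%b` vary with the locale.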

Data Reference

Reconstruction

The following plot fixes the main issues in the original.