Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: Bankova & Ovaska (2021)


Objective

The visualisation chosen had the following three main issues:

  • Ignoring convention – by convention, a trend over time is plotted with time on the x-axis and the measured values on the y-axis. This visualisation reverses that: the vertical axis shows the time period, and the horizontal axis shows whether confirmed cases or deaths are at the pre-vaccination peak or at twice that peak.
  • Horizontal axis – the x-axis is scaled relative to the pre-vaccination peak of cases/deaths (peak and 2x peak) rather than in actual numbers of confirmed cases or deaths. Readers who want to know, for example, how many confirmed cases or deaths Europe recorded at the peak cannot read this from the chart. In addition, because the actual scale of the x-axis is not shown, readers looking at the top of the visualisation may be confused about why deaths appear high while cases appear low.
  • Colour issue – the choice of colours in the graph is questionable. First, with reference to MacDonald (1999), blue and yellow may not conventionally represent confirmed cases of illness and deaths respectively. If the purpose of the article is to show that vaccination helps reduce the number of deaths from COVID-19 in Europe, these colours may downplay the message. Moreover, a later graph in the same article uses blue for the vaccinated population and yellow for the number of people who have received a booster shot across Europe, which may confuse the audience further.

Reference

Code

The following code was used to fix the issues identified in the original.

library(dplyr)
library(ggplot2)
library(lubridate)

covid <- read.csv("owid-covid-data.csv")

covid$date <- as.Date(covid$date)

europe <- covid %>% 
  select(continent, location, date, total_cases,total_deaths) %>%
  filter(continent == "Europe") %>%
  group_by(date) %>%
  summarise(Confirmed_Cases = sum(total_cases, na.rm = TRUE),
            Deaths = sum(total_deaths, na.rm = TRUE)) %>%
  mutate(CFR = round(Deaths / Confirmed_Cases * 100, 2))

europe <- subset(europe, date >= "2020-03-01" & date < "2022-04-30")

# Plot CFR over time, with time on the x-axis by convention
p <- ggplot(data = europe, aes(x = date, y = CFR)) +
  geom_line() +
  # Dashed vertical line at the date used to split the two labels below
  geom_vline(xintercept = as.numeric(ymd("2021-05-01")), linetype = "dashed", color = "blue") +
  # annotate() draws each label once; geom_text(aes(...)) would redraw it for every row.
  # geom_text() has no `text` argument; label size is set with `size` (in mm, 3 is roughly 8 pt).
  annotate("text", x = as.Date("2021-05-01"), y = 5, label = "After 10% double vaccinated",
           angle = 90, vjust = 1.5, size = 3) +
  annotate("text", x = as.Date("2021-05-01"), y = 5, label = "Before 10% double vaccinated",
           angle = 90, vjust = -1, size = 3) +
  labs(title = "Europe's Case Fatality Rate (CFR)",
       y = "Case Fatality Rate (CFR, %)",
       x = "Time Period") +
  theme_minimal()
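
The aggregation step in the code above can be checked on a small example. The sketch below is illustrative only: the `toy` data frame, its two locations and all of its figures are made up, and it simply repeats the same `group_by()`, `summarise()` and `mutate()` calls to show how CFR is derived as deaths divided by confirmed cases, times 100.

```r
library(dplyr)

# Hypothetical toy data: two made-up locations over two dates (not real figures)
toy <- data.frame(
  location     = c("A", "B", "A", "B"),
  date         = as.Date(c("2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02")),
  total_cases  = c(1000, 3000, 1500, NA),
  total_deaths = c(20, 40, 30, NA)
)

# Same steps as the main code: sum cumulative counts per date, then compute CFR in percent
toy_cfr <- toy %>%
  group_by(date) %>%
  summarise(Confirmed_Cases = sum(total_cases, na.rm = TRUE),
            Deaths = sum(total_deaths, na.rm = TRUE)) %>%
  mutate(CFR = round(Deaths / Confirmed_Cases * 100, 2))

print(toy_cfr)
```

On the first date the totals are 4000 cases and 60 deaths, giving a CFR of 1.5. On the second date location B has no reported values, so `na.rm = TRUE` drops it from both sums and the CFR of 2 reflects location A alone. This is worth bearing in mind when reading day-to-day movements in the aggregated series.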

Data Reference

Reconstruction

The following plot fixes the main issues in the original.