Original


Source: Ting, I., Scott, N. & Workman, M. (2020).


Objective

The original visualisation was published on the ABC News website. It was intended to show readers that people in their 60s were the age group at highest risk of being infected with COVID-19.

However, it had the following three main issues:

  • The total rate is split into separate male and female rates, so the chart does not make clear which age group has the highest overall rate. Moreover, the overall rate cannot be obtained by simply adding the two sex-specific rates, because the two groups have different population bases. Put simply, \[\frac{1}{3}+\frac{2}{5} \neq \frac{1+2}{3+5}\]
  • The visualisation did not show the x-axis, which may lead the audience to misread the actual rate of confirmed COVID-19 cases.
  • The data also contain a pattern that the visualisation does not show well. In some age groups the confirmed case rate for women was higher than the rate for men, while in others the reverse was true.
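The following worked equation makes the first issue concrete. For a single age group, let c_m and c_f be the confirmed male and female cases, and let p_m and p_f be the corresponding populations. These symbols are introduced here for illustration only. The overall rate per 100,000 is a population-weighted average of the two sex-specific rates, not their sum:

\[
r_m = \frac{c_m}{p_m}\times 10^5, \qquad
r_f = \frac{c_f}{p_f}\times 10^5, \qquad
r_{\text{total}} = \frac{c_m + c_f}{p_m + p_f}\times 10^5
= \frac{p_m}{p_m + p_f}\, r_m + \frac{p_f}{p_m + p_f}\, r_f
\]

Applying this to the fractions above, with c_m = 1, p_m = 3, c_f = 2 and p_f = 5:

\[
\frac{1+2}{3+5} = \frac{3}{8}\cdot\frac{1}{3} + \frac{5}{8}\cdot\frac{2}{5} = \frac{3}{8} = 0.375,
\qquad \text{whereas} \qquad \frac{1}{3}+\frac{2}{5} = \frac{11}{15} \approx 0.733
\]

The overall rate therefore always lies between the male and female rates. This is one reason to show it as its own series rather than as the sum of the two sex-specific rates.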

Reference

Code

The following code was used to fix the issues identified in the original.

# Load packages
library(rvest)    # read_html(), html_table()
library(tidyr)    # pivot_longer()
library(dplyr)    # filter()
library(ggplot2)

# Read the first table from the saved HTML page (html() is defunct; use read_html())
page <- read_html("Rates - Disease by age group and sex (5.2) (COVID-19).html")
table <- html_table(page)[[1]]
names(table)[1] <- "Age"
table <- head(table, -2) # drop the last two rows (Unknown row and Total row)

# Keep the age groups in table order instead of alphabetical order on the x-axis
table$Age <- factor(table$Age, levels = unique(table$Age))

# Reshape to long format: one row per age group and population group
df <- pivot_longer(table, cols = c(Male, Female, Aust),
                   names_to = "Group", values_to = "Case_Rate")
sex_df <- filter(df, Group %in% c("Male", "Female"))

# Plot: bars for the overall Australian rate, points and lines for each sex
p <- ggplot() +
  geom_col(data = filter(df, Group == "Aust"),
           aes(x = Age, y = Case_Rate, fill = Group)) +
  scale_fill_manual(values = "purple", labels = "All Australia") +
  geom_point(data = sex_df, aes(x = Age, y = Case_Rate, colour = Group)) +
  geom_line(data = sex_df, aes(x = Age, y = Case_Rate, colour = Group, group = Group)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Case Rate (per 100,000 population)") +
  ylab("Case Rate")

Data Reference

Reconstruction

The following plot fixes the main issues in the original.

p