Objective
The original visualisation was published on the ABC News website. It aimed to show readers that people in their 60s are the age group at highest risk of being infected with COVID-19.
However, it had the following three main issues:
Reference
The following code was used to fix the issues identified in the original.
# Read data from html
library(rvest)
page <- read_html("Rates - Disease by age group and sex (5.2) (COVID-19).html") # html() is deprecated in rvest; read_html() replaces it
table <- html_table(page)[[1]]
names(table)[1] <- "Age"
table <- head(table, -2) # drop the last two rows (Unknown row and Total row)
# Reorganise the data with tidyr
library(tidyr)
df <- gather(table, "Male", "Female", "Aust", key = "Group", value = "Case_Rate")
# Plot the overall rate as bars, with male and female rates overlaid as points and lines
library(ggplot2)
library(dplyr)
p <- ggplot() +
  geom_bar(data = filter(df, Group == "Aust"),
           aes(x = Age, y = Case_Rate, fill = Group), stat = "identity") +
  scale_fill_manual(values = "purple", labels = "All Australia") +
  geom_point(data = filter(df, Group %in% c("Male", "Female")),
             aes(x = Age, y = Case_Rate, colour = Group)) +
  geom_line(data = filter(df, Group %in% c("Male", "Female")),
            aes(x = Age, y = Case_Rate, colour = Group, group = Group)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Case Rate (per 100,000 population)") +
  ylab("Case Rate")
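The reshaping step in the code above turns one column per group into a single Group column, which is what lets ggplot2 map Group to colour. The sketch below illustrates that wide-to-long step on a small hypothetical table; the `toy` data frame and its values are made-up placeholders, not figures from the source data. It uses `pivot_longer()`, the current tidyr function for this kind of reshape, which should give the same long layout as the `gather()` call, although the row order may differ.

```r
library(tidyr)

# Hypothetical two-row table in the same wide layout as the scraped data.
# The rates are placeholders for illustration only.
toy <- data.frame(
  Age    = c("0-9", "10-19"),
  Male   = c(1.0, 2.0),
  Female = c(1.5, 2.5),
  Aust   = c(1.2, 2.2)
)

# Stack the three rate columns into Group / Case_Rate pairs, one row per age and group
long <- pivot_longer(toy, cols = c(Male, Female, Aust),
                     names_to = "Group", values_to = "Case_Rate")

print(long)
```

Each age group now appears once per group, so the two rows of the toy table become six.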
Data Reference
The following plot fixes the main issues in the original.
p