This R Markdown document creates a heatmap of lab-confirmed SARS-CoV-2 cases per 100,000 estimated people in each age group. This is based on the Public Health England National COVID-19 Surveillance report for week 39.
First, we install the tidyverse group of packages.
library(tidyverse)
library(readxl)
Next, I set the plotting theme.
theme_clean <- theme_bw(base_family="Calibri") +
theme(legend.position = "top",
legend.title = element_text(size = 12),
legend.text = element_text(size = 12),
plot.title = element_text(size = 18, face = "bold"),
plot.subtitle = element_text(size = 12, face = "italic", margin = margin(b=12)),
plot.caption = element_text(size = 10),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank())
theme_set(theme_clean)
For this graph, we are interested in weeks 27 and later.
week_number_min <- 27
Next, I draw in the table from the prepared file:
phe_figure4table_df <- read_excel("PHE National COVID-19 Surveillance Data Report - Figure 4 Table - 2020-09-25.xlsx",
sheet = "DATA",
col_types = c("numeric", "date", "date",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric", "numeric"))
This data frame is then put into a tidy format:
phe_caserates_agegroup_df <- phe_figure4table_df %>%
pivot_longer(cols = 4:13,
names_to = "age_group",
values_to = "case_rate") %>%
filter(week_number >= week_number_min)
As ggplot2 can sometimes create misaligned date ticks, I want to specify the breaks:
week_end_breaks <- phe_figure4table_df %>%
filter(week_number >= week_number_min) %>%
pull(week_end_date)
Finally, I set the factors. This is so the age groups show in the correct order.
phe_caserates_agegroup_df$age_group <- factor(phe_caserates_agegroup_df$age_group,
levels = c("0 to 4", "5 to 9", "10 to 19",
"20 to 29", "30 to 39", "40 to 49",
"50 to 59", "60 to 69", "70 to 79", "80 or over"))
This is the code for the heatmap:
phe_caserates_agegroup_gg <- ggplot(data = phe_caserates_agegroup_df,
mapping = aes(x = week_end_date,
y = age_group,
fill = case_rate)) +
geom_raster() +
scale_fill_gradient(name = "",
low = "#FFFFFF",
high = "#d1112e",
guide = FALSE) +
geom_text(aes(label = round(case_rate))) +
scale_x_datetime(expand = c(0,0),
breaks = week_end_breaks,
date_labels = "%d-%b") +
labs(title = "In England, viral resurgence does not remain contained within one age group.",
subtitle = "Weekly COVID-19 lab-confirmed cases (pillar 1 and 2) in England per 100,000 people based on ONS population estimates, by age group. The latest statistics are provisional.",
x = "Week End Date (Sunday)",
y = "",
caption = "Data: Public Health England: National COVID-19 surveillance data report: 25 September 2020 (week 39), Figure 4.")
The graph is here:
phe_caserates_agegroup_gg