Objective
This is a critique and reconstruction of this fun and magical image by Prof. Danny Dorling of Oxford University, visualising the increase or decrease in Covid-19 deaths per day.
The title of the plot is on his website, which has been added to the in-document reference. The plot is part of a series titled Slowdown Covid-19, which has numerous phase-portrait diagrams showing Covid-19 mortality rates. It is a two-dimensional plot with two axes; however, the underlying data is a timeseries, and time points are often added as annotations.
One axis, typically y, is the Average Number of Deaths Per Day, and the x-axis is the Increase or Decrease in Deaths Per Day. Dates are then annotated along the curve. The goal of the plot, according to his website, is to show how fast the death rate rose compared to how slowly it will decline (Slowdown Covid-19). It is full of vibrant colour, with a coronavirus icon legend. As the death rate increases, the curve pushes towards the right of the plot, and as the death rate decreases, it moves to the left.
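To make the two axes concrete, here is a minimal sketch of how phase-portrait coordinates could be derived from a smoothed daily series. It assumes the change per day is taken as a centred difference; the author's exact calculation is not stated here, and the numbers and the names `avg_deaths`, `change` and `phase` are made up for illustration.

```r
# Toy smoothed daily deaths (made-up numbers, for illustration only)
avg_deaths <- c(10, 20, 40, 70, 90, 100, 95, 85, 70)

# Change per day as a centred difference: (next day - previous day) / 2.
# The first and last days have no neighbour on one side, so they are NA.
n <- length(avg_deaths)
change <- c(NA, (avg_deaths[3:n] - avg_deaths[1:(n - 2)]) / 2, NA)

# x = change in deaths per day, y = average deaths per day, one row per day
phase <- data.frame(day = seq_len(n), change = change, avg_deaths = avg_deaths)
print(phase)
```

Calling `plot(phase$change, phase$avg_deaths, type = "b")` joins the points in date order: the curve sits to the right while deaths are rising and crosses to the left once they start to fall.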
As for the target audience, the colours and illustrations suggest the author may have been trying to attract people who are not data-savvy to his post, possibly in the hope they would read further. However, the story being told with the data is not as straightforward as the simple daily death counts that everyday people generally care about, so the plot could just as easily be aimed at the Oxford University community, where he is a professor. The content of the plot focuses on European countries close to Britain, with the USA and China included, likely for comparison.
The visualisation chosen had the following three main issues:
Visual bombardment, with what I would call 2.5 dimensions of data.
Accessibility and readability issues through overlapping colours of similar tones.
Perception problems with the passing of time due to the chosen plot type. The author is aware this is a timeseries but believes conventional plots don't show the full picture. That may be so, but is a phase portrait really better for this data, with this many overlapping data points?
Reference
The following code was used to fix the issues identified in the original.
# Load the tidyverse (readr, tidyr, dplyr, ggplot2); lubridate and zoo are called via ::
library(tidyverse)
theme_set(theme_minimal())

# Read the global deaths time series (cumulative counts, one column per date)
covid19.raw.deaths <- read_csv("https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv")

# Check data frame structure, tibble [266 x 259] - needs some manipulation to use
# str(covid19.raw.deaths)

# Reshape to long form, drop unused columns, rename, sum provinces up to country level,
# and keep the locations from the original plot
C19.deaths <- covid19.raw.deaths %>%
  pivot_longer(-c(`Province/State`, `Country/Region`, Lat, Long),
               names_to = "date",
               values_to = "deaths") %>%
  select(-c(`Province/State`, Lat, Long)) %>%
  rename(location = `Country/Region`) %>%
  mutate(date = lubridate::mdy(date)) %>%
  group_by(location, date) %>%
  summarise(deaths = sum(deaths), .groups = "drop") %>%
  filter(location %in% c("China", "Germany", "France", "Italy", "Netherlands", "Spain", "United Kingdom", "US"))

# The timeseries is cumulative, so use lag() to get the daily change (new_deaths),
# then smooth new_deaths with a centred 5-day moving average. The non-smoothed plots
# are quite volatile and the pattern is much harder to see.
# The date window from the original plot is applied AFTER differencing and smoothing;
# filtering first would make the first day's new_deaths equal to the cumulative total.
C19.deaths <- C19.deaths %>%
  arrange(location, date) %>%
  group_by(location) %>%
  mutate(new_deaths = deaths - lag(deaths, default = 0),
         ma_5d = zoo::rollmean(new_deaths, k = 5, fill = NA)) %>%
  ungroup() %>%
  filter(date >= as.Date("2020-03-01"), date <= as.Date("2020-04-30"))

# Style the plot. Simple timeseries with a facet and free y scales because we are
# interested in the trend, not the absolute amounts.
p.1 <- C19.deaths %>%
  ggplot(aes(x = date, y = ma_5d, color = location)) +
  geom_line(show.legend = FALSE) +
  scale_x_date(date_breaks = "2 weeks", date_labels = "%d %b") +
  scale_y_continuous() +
  facet_wrap(~ location, ncol = 2, scales = "free_y") +
  labs(x = "Date", y = "Daily Deaths Smoothed",
       title = "Covid-19 Death Trends MAR/APR 2020",
       subtitle = "Using a 5-day Moving Average")
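The differencing and smoothing step in the code above can be checked on a toy series. This is a minimal sketch in base R so it runs without extra packages; it assumes `stats::filter` with five equal weights as a stand-in for a centred 5-day `zoo::rollmean`, and the counts are made up.

```r
# Toy cumulative death counts for one location (made-up numbers)
cumulative <- c(0, 1, 3, 6, 10, 15, 21, 28)

# Daily new deaths: difference from the previous day.
# The first day has no predecessor, so it is left as NA rather than guessed.
new_deaths <- c(NA, diff(cumulative))

# Centred 5-day moving average: each value is the mean of the day itself,
# the two days before and the two days after. Windows that run off either
# end of the series, or that include an NA, give NA.
ma_5d <- as.numeric(stats::filter(new_deaths, rep(1 / 5, 5), sides = 2))

print(data.frame(cumulative, new_deaths, ma_5d))
```

Only days 4 to 6 have a full window here, which is why the smoothing is best done on a series that extends beyond the dates being plotted.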
Data Reference
I am not completely against the use of the phase portrait, but with this timescale I would have fewer data points and I would also animate it.
I simply took it back to its timeseries basics. I took the liberty of using just the UK rather than England and Wales, though the author may have had a reason for that choice. The main things I consider fixed:
Removed the distracting legend and created a timeseries running over the two months. Through the long tail forming in the daily death count in most plots, we can now observe how deaths will likely come down much more slowly than they rose. Although this pandemic is far from over, there is a visible peak even as early as April.
Turned it into a faceted timeseries. Because faceting creates a sub-title for each country, we no longer need the legend. The facets separate the countries for better viewing and understanding, as well as pattern matching.
The y-axis is free in each facet. This was a deliberate choice because we are comparing patterns rather than amounts. Deaths per million was considered, but on a shared scale the pattern would still be lost because some countries managed to contain the outbreak better than others.
The plot has been given a title that states what we are looking at.
The author did annotate lockdown dates for each country. I have left these out, but I do think they add some context to the curve of each timeseries and should be mentioned.
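If the lockdown dates were to be added back, one option is a dashed vertical line per facet. The sketch below is only an illustration of that idea: the two locations, their curves and the lockdown dates are all made-up placeholders (not the dates from the original plot), and `toy`, `lockdowns` and `p.2` are hypothetical names.

```r
library(ggplot2)

# Toy smoothed series for two made-up locations (illustration only)
dates <- seq(as.Date("2020-03-01"), as.Date("2020-04-30"), by = "day")
toy <- data.frame(
  location = rep(c("Country A", "Country B"), each = length(dates)),
  date = rep(dates, times = 2),
  ma_5d = c(dnorm(seq_along(dates), mean = 35, sd = 10) * 1000,
            dnorm(seq_along(dates), mean = 40, sd = 12) * 500)
)

# Hypothetical lockdown dates: placeholders, NOT taken from the original plot
lockdowns <- data.frame(
  location = c("Country A", "Country B"),
  lockdown = as.Date(c("2020-03-10", "2020-03-20"))
)

# The location column in lockdowns matches each line to its own facet
p.2 <- ggplot(toy, aes(x = date, y = ma_5d)) +
  geom_line() +
  geom_vline(data = lockdowns, aes(xintercept = lockdown), linetype = "dashed") +
  facet_wrap(~ location, ncol = 2, scales = "free_y") +
  labs(x = "Date", y = "Daily Deaths Smoothed")
```

Keeping the dates in their own small data frame means they can be corrected or extended without touching the plotting code.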