Objective
This visualization aims to address the general audience all over the world with an intention to communicate the rate of change in the number of deaths per day together with the average number of deaths per day (including on that day, a day before and a day after). The aggregated number of deaths always tends to rise and never fall, therefore author has taken moving average of three consecutive days(including for that day, a day before and a day after). Readers who are keen to understand the rise or fall in the number of deaths per day and whether this rate is larger or smaller from one day to the next would have an amazing experience in understanding this brilliant visualization of data. With the help of this visualization, the author is able to convey the slowing trend of the number of deaths and is able to explain the effects of certain measures like lockdown and social distancing taken by each country according to the worsening of the situation.
The visualisation chosen had the following three main issues:
Trifecta Checkup: This visualization is failing the V of the trifecta Checkup because it seems like a hot mess of data, creating confusion, and unable to express data properly to the reader. The visualization is fairly complex and needs an explanation to understand. It is hard to connect the visual with the question asked and get powerful and quick insight from this visualization.
Poor Scaling Methods: The visualization is failing to choose the right scales as time-series data is poorly visualized on swirls. Taking the rate of change or the average number of deaths over time would have made more sense.
Ethically Wrong(not Beneficence): In the article as well as in the visualization from 3rd April to 4th April, it is depicted that curve for the US is smoothened but it’s not the case and data show a spike. The visualization is not serving its purpose succinctly and accurately. It is misleading the audience by showing a smoothening in the curve to show the chances of a downward trend soon.
Reference
The following code was used to fix the issues identified in the original.
library(readxl)
library(ggplot2)
library(tidyr)
library(dplyr)
library(magrittr)
library(egg)
library(scales)
library(plotly)
#Reading the data
Data <- read_excel("death_data.xlsx")
class(Data$Date)
## [1] "POSIXct" "POSIXt"
#Converting to Date format
Data$Date <- as.Date(Data$Date)
class(Data$Date)
## [1] "Date"
head(Data)
## # A tibble: 6 x 22
## Date USA France Spain Italy UK China Germany Avg_USA Avg_France
## <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2020-01-23 0 0 0 0 0 18 0 0 0
## 2 2020-01-24 0 0 0 0 0 26 0 0 0
## 3 2020-01-25 0 0 0 0 0 42 0 0 0
## 4 2020-01-26 0 0 0 0 0 56 0 0 0
## 5 2020-01-27 0 0 0 0 0 82 0 0 0
## 6 2020-01-28 0 0 0 0 0 131 0 0 0
## # ... with 12 more variables: Avg_Spain <dbl>, Avg_Italy <dbl>, Avg_UK <dbl>,
## # Avg_China <dbl>, Avg_Germany <dbl>, Change_USA <dbl>, Change_France <dbl>,
## # Change_Spain <dbl>, Change_Italy <dbl>, Change_UK <dbl>,
## # Change_China <dbl>, Change_Germany <dbl>
#Selecting the columns
new_data1 <- Data %>% select("Date","Avg_USA", "Avg_France","Avg_UK","Avg_Italy", "Avg_Spain")
new_data2 <- Data %>% select("Date","Avg_China", "Avg_Germany")
new_data3 <- Data %>% select("Date","Change_USA", "Change_France","Change_UK","Change_Italy", "Change_Spain","Change_China","Change_Germany")
#Renaming the columns
new_data1 <- new_data1 %>% rename("USA" = "Avg_USA", "France" = "Avg_France","UK" = "Avg_UK","Italy" = "Avg_Italy", "Spain" = "Avg_Spain")
new_data2 <- new_data2 %>% rename("China" = "Avg_China", "Germany" = "Avg_Germany")
new_data3 <- new_data3 %>% rename("USA" = "Change_USA", "France" = "Change_France","UK" = "Change_UK","Italy" = "Change_Italy", "Spain" = "Change_Spain","China" = "Change_China", "Germany" = "Change_Germany")
#Converting data from wide to long
data2 <- new_data1 %>% gather("USA", "France","UK", "Italy", "Spain", key = "Country", value = "cases")
data3 <- new_data2 %>% gather( "China", "Germany", key = "Country", value = "cases")
data4 <- new_data3 %>% gather("USA", "France","UK", "Italy", "Spain","China","Germany", key = "Country", value = "cases")
p1 <- ggplot(data = data2, aes(x=Date, y=cases)) +
geom_point(shape=15,aes(color=`Country`),lwd = 1.0) +
geom_line(aes(group=`Country`,color=`Country`))+
labs(x= "Date",
y = "Avg Deaths(3 consecutive days)")+
scale_x_date(date_breaks = "7 days", date_labels = "%d %b")+
theme(axis.text.x=element_text(angle=45, hjust=1))
p2 <- ggplot(data = data3, aes(x=Date, y=cases)) +
geom_point(shape=15,aes(color=`Country`),lwd = 1.0) +
geom_line(aes(group=`Country`,color=`Country`))+
labs(title = "Trend of COVID-19 Deaths",
y = "") +
scale_x_date(date_breaks = "7 days", date_labels = "%d %b")+
theme(axis.text.x=element_text(angle=45, hjust=1))
p3 <- ggplot(data = data4, aes(x=Date, y=cases)) +
geom_point(shape=15,aes(color=`Country`),lwd = 1.0) +
geom_line(aes(group=`Country`,color=`Country`))+
labs(title = "Change in Death Rate",
x= "Date",
y = "Rate of Change")+
scale_x_date(date_breaks = "7 days", date_labels = "%d %b")+
theme(axis.text.x=element_text(angle=45, hjust=1))
#subplot( p1,p2+scale_color_brewer(palette="Dark2"),nrows = 2, shareX = T, heights = c(0.7, 0.3),titleY= TRUE)
#ggplotly(p3)
Data Reference
The following plot fixes the main issues in the original.
Please click on country in legend to exclude it in plot.