This report will include 3 parts: (1) Original data visualization (2) Code to improve the issues in original data visulisation and (3) Reconstruction of original data visulisation. The visulisation shows the mortality in 7 countries: USA, France, Spain, Italy, UK, China and Germany, attributed to Covid 19 during period 23 January 2020 to 31 May 2020. Here, the average number of deaths per day is calculated as seven day rolling average whereas increase/ decrease in deaths per day is obtained from smooth rate of change from the date before to the date after the date shown.
Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.
Objective
The objective of this data visualization is to show the mortality in seven countries USA, France, Spain, Italy, UK, China and Germany attributed to Covid 19 using two factors (1) Average number of deaths per day and (2) Increase/ Decrease in deaths per day during period 23 January 2020 to 31 May 2020.
As conventional graphs focus to show the number of deaths reported each day for various countries, the audience is still left trying to discern the extent to which the rise from one day to the next is larger or slower. So, these graphs is targeted to audience who are willing to know the rate of death changes per day along with average number of deaths per day.
The visualization chosen had the following three main issues:
Reference
The following code was used to fix the issues identified in the original.
#Installing necessary packages
library(ggplot2)
library(magrittr)
library(dplyr)
library(ggpubr)
library(readxl)
library(tidyr)
library(ggh4x)
#Setting up working directory and loading data file into working environment
setwd("~/Desktop/2nd Sem/Visualisation/Assignment 2")
data <- read_excel("~/Desktop/2nd Sem/Visualisation/Assignment 2/Visualisation.xlsx")
#Tidying and reconstructing the dataset
unitedata <- unite(data, col='Countryanddate', c('Country', 'Date'), sep='@')
tidydata <- gather(unitedata, key='Type', value = 'Value', 2:3 )
finaldata <- tidydata %>% separate(Countryanddate, into = c("Country","Date"), sep='@')
finaldata$Date <- as.Date(finaldata$Date)
#Deconstruct and Reconstructing
datavis <- ggplot(finaldata, aes(Date, Value, group = 1)) +
geom_line(aes(color = Country), size=1.4)+
geom_point(color="white", size=0.05) +
scale_x_date(date_breaks = "1 months", date_labels = "%e-%b") +
labs(title = "Mortality in Seven Countries attributed to Covid-19",
subtitle = "(From 23 January to 31 May 2020.)",
y = "",
x = "Date",
caption = "Data Source: COVID-19 Data Repository by CSSE at Johns Hopkins University") +
facet_grid(Country~Type, switch = "y", scale= "free_y") +
theme(
panel.background = element_rect(fill = "#BFD5E3", colour = "#6D9EC1", size = 2.5, linetype = "solid"),
panel.grid.minor = element_line(size = 0.15, linetype = 'solid', colour = "white"),
panel.grid.major = element_line(size = 0.3, linetype = 'solid', colour = "white"),
plot.background = element_rect(fill = "#BFD5E3"),
strip.background = element_rect(fill="#6D9EC1"), #Changing color of facet label background color
strip.text = element_text(color = "white", size = 14), #Changing text size and color of facet grid label
plot.title = element_text(size = 16), #Changing text size of visualization title
plot.subtitle = element_text (size=12), #Changing text size of visualization subtitle
plot.caption = element_text(size = 12), #Changing text size of visualization caption
legend.position = "bottom", #Changing legend location to bottom
legend.title = element_text(size=10), #Changing legend title size
legend.text = element_text(size=9) #Changing legend text size
) +
scale_color_discrete(name = "Lockdown Period",
labels=c("China : 23-Jan to 8-Apr",
"France : 17-Mar to 11-May",
"Germany : 23-Mar to 10-May ",
"Italy : 9-Mar to 18-May",
"Spain : 14-Mar to 9-May",
"USA : 19-Mar to 13-Apr",
"UK : 23-Mar to 3-Jun"))
Data Reference
The following plot fixes the main issues in the original.