Objective
The visualisation above shows the total number of COVID-19 cases from January 2020 to September 2020. It plots cumulative case counts for several countries, including the United States, India, Peru, Russia and Brazil.
The target audience includes every sector affected by COVID-19, along with government agencies and healthcare workers who are researching the new virus and tracking its spread. The general public also benefits, as the visualisation shows the monthly change in the number of COVID-19 cases.
The visualisation chosen had the following three main issues:
Reference
The following code was used to fix the issues identified in the original.
library(readr)
library(ggplot2)
library(tidyr)
library(magrittr)
library(dplyr)
library(lubridate)
# Read the daily cumulative case counts (one column per country)
covid <- read_csv("results.csv")
head(covid)
## # A tibble: 6 x 7
## day `United States` India Brazil Russia Peru Other
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 31-12-2019 0 0 0 0 0 27
## 2 01-01-2020 0 0 0 0 0 27
## 3 02-01-2020 0 0 0 0 0 27
## 4 03-01-2020 0 0 0 0 0 44
## 5 04-01-2020 0 0 0 0 0 44
## 6 05-01-2020 0 0 0 0 0 59
# Tidy the dataset: reshape from wide (one column per country) to long format
covid <- covid %>%
  gather(`United States`, `India`, `Brazil`, `Russia`, `Peru`, `Other`,
         key = "Country", value = "Cases")
head(covid)
## # A tibble: 6 x 3
## day Country Cases
## <chr> <chr> <dbl>
## 1 31-12-2019 United States 0
## 2 01-01-2020 United States 0
## 3 02-01-2020 United States 0
## 4 03-01-2020 United States 0
## 5 04-01-2020 United States 0
## 6 05-01-2020 United States 0
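In current tidyr releases `gather()` is superseded by `pivot_longer()`. The following is a minimal sketch of the same reshaping step, assuming a small made-up tibble (`toy`, with invented case counts) in place of `results.csv`. Note that `pivot_longer()` orders rows by day rather than by country, so `head()` would print rows in a different order from the output shown above.

```r
library(tidyr)
library(tibble)

# Hypothetical two-day sample in the same wide layout as results.csv
# (the counts are invented for illustration)
toy <- tibble(
  day = c("01-08-2020", "01-09-2020"),
  `United States` = c(100, 150),
  India = c(40, 90)
)

# Every column except day becomes a (Country, Cases) pair
toy_long <- pivot_longer(toy, cols = -day,
                         names_to = "Country", values_to = "Cases")
print(toy_long)
```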
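# Parse day as a date, then keep only the day-month part
covid$day <- dmy(covid$day)
covid$day <- format(covid$day, "%d-%m")
# Keep the first day of each month, plus 14 September (the last date plotted)
x <- c("01-01", "01-02", "01-03", "01-04", "01-05", "01-06", "01-07", "01-08", "01-09", "14-09")
covid <- covid[covid$day %in% x, ]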
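Filtering on `dd-mm` strings discards the year, which is harmless here because every retained date falls in 2020. As a minimal sketch of an alternative that keeps full dates, the snippet below filters a made-up vector (`days`) against a generated sequence of month starts; the names `days`, `keep` and `kept` are illustrative and not part of the original script.

```r
# Hypothetical sample of dates in the dd-mm-yyyy format used by results.csv
days <- as.Date(c("31-12-2019", "01-01-2020", "15-01-2020",
                  "01-02-2020", "14-09-2020"), format = "%d-%m-%Y")

# First day of each month from January to September 2020, plus 14 September
keep <- c(seq(as.Date("2020-01-01"), as.Date("2020-09-01"), by = "month"),
          as.Date("2020-09-14"))

# Retain only the dates that appear in keep
kept <- days[days %in% keep]
print(kept)
```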
# Convert day to an ordered factor with readable labels
# (labels are assigned to the sorted unique values of day, which here sort in calendar order)
covid$day <- factor(covid$day,
                    labels = c("01 Jan", "01 Feb", "01 Mar", "01 Apr", "01 May",
                               "01 Jun", "01 Jul", "01 Aug", "01 Sep", "14 Sep"),
                    ordered = TRUE)
colnames(covid) <- c("Month", "Country", "Cases")
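When `levels` is not supplied, `factor()` pairs `labels` with the sorted unique values of its input, so the mapping depends on how the strings happen to sort. A minimal sketch that states the pairing explicitly, using a made-up vector (`days`) and illustrative names (`lvls`, `labs`, `month_f`) rather than the real data:

```r
# Hypothetical day strings, deliberately out of calendar order
days <- c("14-09", "01-01", "01-09", "01-01")

# The i-th level is paired with the i-th label
lvls <- c("01-01", "01-09", "14-09")
labs <- c("01 Jan", "01 Sep", "14 Sep")

month_f <- factor(days, levels = lvls, labels = labs, ordered = TRUE)
print(month_f)
```

With explicit levels, the label order no longer relies on the alphabetical sort order of the underlying strings.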
# Express cases in millions, rounded to one decimal place
covid$Cases <- round(covid$Cases / 1000000, 1)
# Reconstruction
p1 <- ggplot(data = covid, mapping = aes(x = Month, y = Cases, colour = Country, group = Country))
p1 <- p1 +
  # One line per country; ggplot2 3.4.0 and later prefer linewidth = 1 over size = 1
  geom_line(size = 1, linetype = 1) +
  # Shape 20 has no separate fill, so colour alone sets the point colour
  geom_point(shape = 20, size = 2, colour = "black") +
  scale_y_continuous(limits = c(0, 12.5), name = "Number of cases (Millions)", breaks = seq(0, 12, 2)) +
  scale_x_discrete(name = "Months") +
  theme_bw() +
  scale_color_brewer(palette = "Set2") +
  theme(axis.line = element_line(color = 'black')) +
  theme(panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        legend.background = element_rect(fill = "gray94"),
        legend.title = element_text(face = "bold"),
        plot.title = element_text(color = "black", size = 12, face = "bold"),
        plot.subtitle = element_text(color = "grey33")) +
  ggtitle(label = "COVID-19 cases from January-September 2020",
          subtitle = "Plot of total COVID-19 cases by each country")
Data Reference
[OC] Total & New daily COVID-19 cases per country : dataisbeautiful. (2020). Retrieved 19 September 2020, from https://www.reddit.com/r/dataisbeautiful/comments/isvfm5/oc_total_new_daily_covid19_cases_per_country/
https://gist.github.com/liamstrilchuk/244cdac2e414299b1243f9b5d269cf42
The following plot fixes the main issues in the original.