Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.
Objective
This data visualisation aims to compare the death tolls of major disease outbreaks over the last 2000 years. Given when it was created, its primary intent appears to be to compare the potential impact of COVID-19 with that of previous major diseases. The target audience appears to be members of the public who are interested in how COVID-19 might compare with diseases of the past and who are also looking for visually appealing visualisations, given the nature of the subreddit r/dataisbeautiful. Because r/dataisbeautiful is one of the default subreddits, its content is shown to all users, not just those specifically interested in data visualisation. The audience therefore skews younger and is concentrated in the US [1], so it is less likely to be aware of disease outbreaks from the last century, or of those that occurred primarily outside the US.
The specific variables which it is attempting to communicate are the following:
The visualisation chosen had the following three main issues:
References
[1] TechJunkie Social. (2020). The Demographics of Reddit: Who uses the site? Retrieved September 20, 2020, from TechJunkie website: https://social.techjunkie.com/demographics-reddit/
[2] User: u/jacobthejones. (2020). Death count of various pandemics as a ratio of world population [OC]. Retrieved September 9, 2020, from Reddit.com: https://www.reddit.com/r/dataisbeautiful/comments/fp76db/death_count_of_various_pandemics_as_a_ratio_of/
[3] Visual Capitalist. (2020). Visualising the History of Pandemics. Retrieved September 9, 2020, from Visual Capitalist: https://www.visualcapitalist.com/history-of-pandemics-deadliest/
[4] Worldometer. (2020). World Population by Year. Retrieved September 12, 2020, from Worldometer: https://www.worldometers.info/world-population/world-population-by-year/
[5] Rogers, D.J.; Wilson, A.J.; Hay, S.I.; Graham, A.J. (2006). The Global Distribution of Yellow Fever and Dengue. Advances in Parasitology. 62: 181-220.
[6] World Health Organisation. (2019). WHO | Smallpox. Retrieved September 20, 2020, from World Health Organisation website: https://www.who.int/csr/disease/smallpox/en/
[7] Johns Hopkins University & Medicine. (2020). COVID-19 Map - Johns Hopkins Coronavirus Resource Center. Retrieved August 31, 2020, from Johns Hopkins University website: https://coronavirus.jhu.edu/map.html
[8] Institute for Health Metrics and Evaluation (IHME). (2020). COVID-19. Retrieved September 20, 2020, from IHME website: https://covid19.healthdata.org/global?view=total-deaths&tab=trend
The following code was used to fix the issues identified in the original.
library(ggplot2)
library(tidyr)
#The data for each disease is taken from Visual Capitalist [3], and the world population data is from Worldometer [4]. Where Worldometer had no figure for the specific year of an outbreak, the population was interpolated from the available years.
#The yellow fever data has been updated using [5], as the data in [3] only counted a single outbreak in the US
#The smallpox end year has been updated to reflect its eradication, as per [6]
#The projected COVID-19 deaths have been updated using [8], as it is more accurate than the predictions in [2]
diseases<-data.frame(
  Year = c(165, 735, 541, 1347, 1520, 1665, 1629, 1817, 1885, 1648, 1889, 1918, 1957, 1968, 1981, 2009, 2002, 2014, 2015, 2019, 2019),
  End = c(180, 737, 542, 1351, 1980, 1665, 1631, 1923, 1885, 2016, 1890, 1919, 1958, 1970, 2020, 2010, 2003, 2016, 2020, 2020, 2020),
  Deaths = c(5, 1, 40, 200, 56, 0.1, 1, 1, 12, 0.28, 1, 45, 1.1, 1, 30, 0.2, 0.00077, 0.011, 0.00085, 0.848, 2.667),
  World_Population = c(190, 210, 200, 400, 450, 550, 525, 1050, 1500, 1400, 1450, 1800, 2873, 3552, 4537, 6873, 6302, 7295, 7380, 7713, 7713),
  Cause = c(1,1,2,2,1,2,2,2,2,1,1,1,1,1,1,1,1,1,1,1,1),
  Disease_Name = c(1:21)
)
#Formats the data to allow visualisation, including faceting and ordering
diseases$Disease_Name<-factor(diseases$Disease_Name, levels = c(1:21), labels = c("Antonine Plague", "Japanese Smallpox Epidemic", "Plague of Justinian", "Black Death", "New World Smallpox Outbreak", "Great Plague of London", "Italian Plague", "Cholera Pandemics 1-6","Third Plague","Yellow Fever", "Russian Flu","Spanish Flu","Asian Flu","Hong Kong Flu","HIV/AIDS","Swine Flu","SARS","Ebola","MERS","COVID-19 Current", "COVID-19 Projected"))
diseases<-diseases[order(diseases$Year),]
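#Deaths and World_Population are both in millions, so their ratio is scaled by 100 to give a percentage
diseases$Deaths_Percentage <-100*diseases$Deaths/diseases$World_Population
#Bacterial diseases (Cause 2) are marked with an asterisk, which is later appended to their names
diseases$Cause<-factor(diseases$Cause,levels=c(1,2), labels = c("", "*"))
#Outbreaks with an end year of 2020 are treated as still active
diseases$Activity<-diseases$End==2020
diseases$Activity<-factor(diseases$Activity,levels=c(FALSE,TRUE), labels = c("Inactive","Active"))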
#Combine name and year into a single variable to allow them to label each disease
diseases$nameYear<-paste(diseases$Disease_Name, diseases$Cause, " (",diseases$Year,"-",diseases$End ,")", sep = "")
diseases_long<-gather(diseases, key = "Variable", value = "Value", c("End", "Deaths", "Deaths_Percentage"))
diseases_long$Variable <- factor(diseases_long$Variable,
                                 levels = c("End", "Deaths","Deaths_Percentage"),
                                 labels = c("Time Period Active",
                                            "Total Deaths (Millions)",
                                            "Percentage of World Population Dead"))
diseases_long$nameYear<-factor(diseases_long$nameYear, levels=unique(diseases_long$nameYear[order(diseases_long$Year)]))
#Create offsets for each variable to allow correct label placement in facets
diseases_long$Offset[diseases_long$Variable =="Total Deaths (Millions)"]<-22
diseases_long$Offset[diseases_long$Variable =="Percentage of World Population Dead"]<-7
diseases_long$Offset[diseases_long$Variable =="Time Period Active"]<-0
diseases_long$Start[diseases_long$Variable == "Time Period Active"]<-diseases_long$Year[diseases_long$Variable == "Time Period Active"]
diseases_long$Start[diseases_long$Variable != "Time Period Active"]<-0
diseases_long$Value[diseases_long$Variable =="Total Deaths (Millions)"]<-signif(diseases_long$Value[diseases_long$Variable =="Total Deaths (Millions)"],2)
diseases_long$Value[diseases_long$Variable =="Percentage of World Population Dead"]<-signif(diseases_long$Value[diseases_long$Variable =="Percentage of World Population Dead"],2)
diseases_long$Labels[diseases_long$Variable =="Total Deaths (Millions)"]<-as.character(diseases_long$Value[diseases_long$Variable =="Total Deaths (Millions)"])
diseases_long$Labels[diseases_long$Variable =="Percentage of World Population Dead"]<-paste(diseases_long$Value[diseases_long$Variable =="Percentage of World Population Dead"],"%", sep="")
diseases_long$Labels[diseases_long$Variable =="Time Period Active"]<-""
#Create flipped bar plot, to ensure disease names are displayed horizontally for ease of reading
pdiseases <- ggplot(data = diseases_long,
                    aes(x = nameYear, xend = nameYear, y = Start, yend = Value, colour = Activity)) +
  geom_segment(stat = "identity", size = 2, lineend="butt",
               arrow = arrow(angle=90, ends="both", length = unit(0.03, "npc"))) +
  coord_flip() +
  facet_grid(.~Variable, scales = "free")
#Add values to plot
pdiseases <- pdiseases +
  geom_text(aes(x = nameYear, label = Labels, y = Value+Offset*(length(Labels)/6-7)),
            hjust = "top",vjust="middle", family="sans", colour = "black") +
  labs(
    title = "Deaths from Major Disease Outbreaks Throughout History" ,x = "", y = "* Indicates bacterial rather than viral cause.", caption = "Data Obtained From:\nVisual Capitalist (09/09/2020). Visualising the History of Pandemics. https://www.visualcapitalist.com/history-of-pandemics-deadliest/\nWorldometer (12/09/2020). World Population by Year. https://www.worldometers.info/world-population/world-population-by-year/\nRogers, D.J.; Wilson, A.J.; Hay, S.I.; Graham, A.J. (2006). The Global Distribution of Yellow Fever and Dengue. Advances in Parasitology. 62: 181-220\nWorld Health Organisation (20/09/2020). WHO | Smallpox. https://www.who.int/csr/disease/smallpox/en/\nJohns Hopkins University & Medicine (20/09/2020). COVID-19 Map - Johns Hopkins Coronavirus Resource Center. https://coronavirus.jhu.edu/map.html\nInstitute for Health Metrics and Evaluation (IHME) (20/09/2020). COVID-19 https://covid19.healthdata.org/global?view=total-deaths&tab=trend"
)+
theme(plot.title=element_text(hjust=0.5), plot.caption=element_text(hjust=0), axis.text.x=element_text(size=14, face="bold"), axis.text.y=element_text(size=12, face="bold"))
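The data comments in the code above note that world population was interpolated where Worldometer [4] had no figure for the year of an outbreak. The sketch below shows one way this could be done, assuming linear interpolation between the two nearest years with data; the report does not state which method was actually used. The helper name `interpolate_population` and the two reference points are hypothetical placeholders chosen for illustration, not Worldometer values. The last line repeats the percentage calculation from the code above for the Black Death row (200 million deaths against a world population of 400 million).

```r
# Hypothetical reference points (illustrative placeholders, NOT Worldometer figures)
known_years      <- c(1900, 1950)
known_population <- c(1000, 2000)  # millions

# Hypothetical helper: linearly interpolate the population (in millions)
# for a year that lies between the known data points
interpolate_population <- function(year, years, population) {
  approx(x = years, y = population, xout = year, method = "linear")$y
}

# Midway between the two reference years, so midway between the two populations
pop_1925 <- interpolate_population(1925, known_years, known_population)

# Percentage of the world population, using the Black Death row from the data above
black_death_pct <- 100 * 200 / 400
```

Linear interpolation is only a reasonable approximation over short gaps; over centuries, population growth is far from linear, so the choice of reference years matters.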
Data References
[3] Visual Capitalist. (2020). Visualising the History of Pandemics. Retrieved September 9, 2020, from Visual Capitalist: https://www.visualcapitalist.com/history-of-pandemics-deadliest/
[4] Worldometer. (2020). World Population by Year. Retrieved September 12, 2020, from Worldometer: https://www.worldometers.info/world-population/world-population-by-year/
[5] Rogers, D.J.; Wilson, A.J.; Hay, S.I.; Graham, A.J. (2006). The Global Distribution of Yellow Fever and Dengue. Advances in Parasitology. 62: 181-220.
[6] World Health Organisation. (2019). WHO | Smallpox. Retrieved September 20, 2020, from World Health Organisation website: https://www.who.int/csr/disease/smallpox/en/
[7] Johns Hopkins University & Medicine. (2020). COVID-19 Map - Johns Hopkins Coronavirus Resource Center. Retrieved August 31, 2020, from Johns Hopkins University website: https://coronavirus.jhu.edu/map.html
[8] Institute for Health Metrics and Evaluation (IHME). (2020). COVID-19. Retrieved September 20, 2020, from IHME website: https://covid19.healthdata.org/global?view=total-deaths&tab=trend
The following plot fixes the main issues in the original.