Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.
Original Visualisation
Objective
To show visually the history of viruses and pandemics over time, with a count of the deaths caused by each outbreak. The target audience is medical professionals: epidemiologists, general practitioners, medical students and government staff dealing with viruses and diseases. The purpose of the visualisation is to present the historic pandemics over time to help this audience reduce death rates and raise awareness of such diseases in the medical community. I used the first graph, titled “History of Pandemics”, and not the second graph, titled “Death toll”.
The visualisation chosen had the following three main issues:
The first issue is the 3D effect used for the years and the area distortion of the trapezoid layout. The graphics give the wrong impression of the number of deaths: the Bubonic plague graphic for 200 million deaths appears alongside same-sized graphics for HIV/AIDS (35 million) and the Spanish Flu (50 million), and the area of the Bubonic plague graphic is half the size of both.
The second issue is volume and density. The size of each virus graphic is misleading and forces the reader to read the death values, which prevents direct comparison on a common scale.
The third issue is that the raw data is misleading. Some viruses are shown spanning multiple years, although more recent ones such as HIV and coronavirus would have spiked early, after which modern drugs slowed transmission or extended life expectancy after infection. The years are therefore spread over a longer period even though the main impact was shorter and earlier. For some of the older pandemics, the death counts have wide lower and upper bounds, so the total effect is not clear. Some data wrangling was done in Excel on the original data: columns were renamed and death counts were expanded to whole numbers.
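The renaming and whole-number expansion described above was done in Excel, but the same step could be sketched in R. This is only a minimal illustration: the input column names (`name`, `death_toll`) and the abbreviated "200M" text format are assumptions, not the layout of the actual source file. Only the three death counts quoted in the issues above are used.

```r
# Hypothetical raw data: column names and the "200M" text format are assumed
raw <- data.frame(
  name       = c("Bubonic plague", "Spanish Flu", "HIV/AIDS"),
  death_toll = c("200M", "50M", "35M"),
  stringsAsFactors = FALSE
)

# Rename the columns to the names used in the plotting code
names(raw) <- c("Name", "Deaths")

# Expand the abbreviated counts to whole numbers, e.g. "200M" becomes 200000000
raw$Deaths <- as.numeric(sub("M$", "", raw$Deaths)) * 1e6

print(raw)
```

Doing this step in code rather than by hand keeps a record of how the original values were changed.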
Reference
The following code was used to fix the issues identified in the original.
library(ggplot2)
library(tidyr)
library(dplyr)
library(readr)
library(readxl)
library(swirl)
library(knitr)
library(htmltools)
library(Hmisc)
library(stringr)
library(outliers)
library(openxlsx)
library(foreign)
library(haven)
library(MVN)
library(forecast)
library(infotheo)
library(scales)
pandemic <- read.csv("historyofpandemicsdata2.csv")
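# Bar chart of total deaths per pandemic, positioned by average year on a common scale
p1 <- ggplot(data = pandemic, aes(x = Average.year, y = Deaths)) +
  geom_bar(stat = "identity", fill = "steelblue", width = 12) +
  labs(title = "History of Pandemics", x = "Years", y = "Total deaths") +
  # Show whole numbers with comma separators instead of scientific notation
  scale_y_continuous(labels = comma) +
  # Label each bar with the pandemic name; overlapping labels are dropped
  geom_text(aes(label = Name), vjust = -0.5, check_overlap = TRUE, size = 2.5)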
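Because `check_overlap = TRUE` drops labels that collide, some names may not appear on the bars. One possible alternative, shown here only as a sketch, is a horizontal bar chart sorted by deaths, where each name becomes an axis label. The data frame `pandemic_subset` and the object `p2` are hypothetical names, and the subset contains only the three death counts quoted in the issues; the full `pandemic` data would be used in practice. This version gives up the time axis in exchange for readable names.

```r
library(ggplot2)
library(scales)

# Illustrative subset only (assumption): the three counts quoted in the issues
pandemic_subset <- data.frame(
  Name   = c("Bubonic plague", "Spanish Flu", "HIV/AIDS"),
  Deaths = c(200e6, 50e6, 35e6)
)

# Sort the bars by deaths and flip the axes so every name is readable
p2 <- ggplot(pandemic_subset, aes(x = reorder(Name, Deaths), y = Deaths)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  scale_y_continuous(labels = comma) +
  labs(title = "History of Pandemics", x = NULL, y = "Total deaths")

print(p2)
```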
Data Reference
*Data was copied from the following website and wrangled in Excel: https://www.visualcapitalist.com/history-of-pandemics-deadliest/
*Other websites used for fact-finding were as follows:
*Wikipedia for smallpox: https://en.wikipedia.org/wiki/History_of_smallpox#:~:text=During%20the%201770s%2C%20smallpox%20killed,immunity%20and%20non%2DEuropean%20vulnerability.
*Wikipedia for yellow fever: https://en.wikipedia.org/wiki/Yellow_fever
*Britannica for the 1968 flu pandemic: https://www.britannica.com/event/1968-flu-pandemic
*CDC for swine flu (2009 H1N1): https://www.cdc.gov/flu/pandemic-resources/2009-h1n1-pandemic.html
*Stack Overflow for integer values: https://stackoverflow.com/questions/15622001/how-to-display-only-integer-values-on-an-axis-using-ggplot2
*R cheat sheets for ggplot2: https://www.rstudio.com/resources/cheatsheets/
The following plot fixes the main issues in the original. I had technical problems adding the virus names to the columns and ran out of time.