Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: Visualizing the History of Pandemics


Objective

The original data visualization aims to educate the general public about pandemics throughout human history. It was published while COVID-19 was spreading across the world and many people were unaware of how the intensity of a pandemic is measured. The visualization tries to normalise the situation by giving insight into how pandemics actually occur and how ancient civilizations perceived them.

The visualisation chosen had the following four main issues:

  • Irregular scaling: The years on the time axis of the original graph are not spaced equally. Each unit initially represents 100 years, then 50, then 25. This misleads the viewer into believing that there were many pandemics in the early years.
  • Misleading use of area: Area is used to show the effect of each pandemic on civilization. Because the graphic is rendered in 3D, the Black Death appears less severe than the Spanish flu, which creates a misconception.
  • Death toll without context: The visualization focuses on the death toll without considering the total population of the time or the infected population. This distorts the perception of viewers who are unfamiliar with the population of each period. The world population has increased drastically, so raw counts from different eras cannot be plotted on the same graph. Instead, we should consider the infected rate, i.e., the percentage of the population infected by the pandemic, which shows how infectious the pandemic was.
  • Meaningless color scheme: The colors used in the original carry no significant meaning. In the reconstructed visualization, the color scheme and hues are used to encode information.
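The third issue can be made concrete with a small sketch. The figures below are hypothetical and chosen only for illustration; they are not taken from the source data. Two outbreaks with the same number of cases look identical as raw counts but very different once divided by the population of the time.

```r
# Hypothetical figures for illustration only; not taken from the source data.
outbreaks <- data.frame(
  name       = c("Outbreak A", "Outbreak B"),
  infected   = c(50e6, 50e6),   # same absolute number of cases
  population = c(200e6, 5e9)    # very different population sizes
)

# Infected rate = percentage of the population of the time that was infected.
outbreaks$infected_rate <- 100 * outbreaks$infected / outbreaks$population

print(outbreaks)
```

Although both hypothetical outbreaks have the same case count, their infected rates differ by a factor of 25, which is the kind of difference a raw death toll hides.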

Reference

Code

The following code was used to fix the issues identified in the original.

library(ggplot2)

# Create the data frame from the data available on the source website.
# infected_rate is the percentage of the world population at the time
# that was infected with the pandemic.
pan <- data.frame(
  pandemic = c("Antonine Plague", "Plague of Justinian",
               "Black Death", "Smallpox Outbreak", "Third Plague",
               "Spanish Flu", "HIV/AIDS", "COVID-19"),
  mortality_rate = c(10, 50, 60, 30, 60, 10, 40, 2),
  Year = c(200, 500, 1300, 1500, 1850, 1919, 1981, 2020),
  infected_rate = c(25, 47.6, 85.3, 21.74, 1.58, 27.47, 1.68, 5)
)

# Create the plot: one bar per pandemic, positioned on a continuous year axis.
# Bar length shows the infected rate and bar color shows the mortality rate.
p <- ggplot(pan, aes(x = Year, y = infected_rate, fill = mortality_rate)) +
  geom_bar(stat = "identity", width = 45) +
  ylim(0, 100) +
  # Label each bar with the name of the pandemic, just past the end of the bar.
  geom_text(aes(label = pandemic), hjust = -0.05,
            position = position_dodge(1), size = 3) +
  xlab("Year") + ylab("Percentage of people infected") +
  ggtitle("History of Pandemics") +
  theme(plot.title = element_text(color = "black", size = 20, hjust = 0.5)) +
  # Flip the axes so that time runs vertically and the bars are horizontal.
  coord_flip()

# Give the fill legend a readable title.
p2 <- p +
  labs(fill = "Mortality rate")
# A bar that is longer and lighter represents a deadlier pandemic.

# Display the final plot.
print(p2)
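
As a possible variation on the plotting code above, the sketch below sets the fill scale and the year breaks explicitly instead of relying on the ggplot2 defaults. It is not the method used for the reconstruction. The names `year_breaks` and `p_alt`, the two hex colors, the 250-year spacing and the 0 to 100 fill limits are all assumptions made for illustration.

```r
library(ggplot2)

# Same data as in the reconstruction code.
pan <- data.frame(
  pandemic = c("Antonine Plague", "Plague of Justinian",
               "Black Death", "Smallpox Outbreak", "Third Plague",
               "Spanish Flu", "HIV/AIDS", "COVID-19"),
  mortality_rate = c(10, 50, 60, 30, 60, 10, 40, 2),
  Year = c(200, 500, 1300, 1500, 1850, 1919, 1981, 2020),
  infected_rate = c(25, 47.6, 85.3, 21.74, 1.58, 27.47, 1.68, 5)
)

# Assumed choice: evenly spaced breaks so every axis unit spans 250 years.
year_breaks <- seq(0, 2000, by = 250)

p_alt <- ggplot(pan, aes(x = Year, y = infected_rate, fill = mortality_rate)) +
  geom_col(width = 45) +
  scale_x_continuous(breaks = year_breaks) +
  # Assumed palette: light red for low mortality, dark red for high mortality.
  scale_fill_gradient(low = "#fee0d2", high = "#a50f15", limits = c(0, 100)) +
  labs(x = "Year", y = "Percentage of people infected",
       fill = "Mortality rate", title = "History of Pandemics") +
  coord_flip()

print(p_alt)
```

With this palette a darker bar indicates a higher mortality rate, which reverses the light-is-deadlier reading of the default gradient; whether that is preferable is a design choice.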

Data Reference

Reconstruction

The following plot fixes the main issues in the original.