Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.
Objective
The objective of the original data visualisation was to show the number of deaths that occurred in the 20th century due to various causes, such as wars, natural disasters, infectious diseases, and non-communicable diseases. The visualisation allowed users to compare the number of deaths attributed to each cause.
The target audience for this data visualisation was likely to be those interested in understanding the patterns and trends in global mortality over the past century. The visualisation might be of particular interest to students, researchers, policymakers, and journalists working on public health, international relations, and global development.
However, the visualisation chosen had the following three main issues:
Of the four sources the site provided, three were inactive or could not be traced. The WHO Mortality report (PDF) link led to a UN website, which appeared to be a reliable source, and both the WHO and the OECD are highly reputable organisations known for providing reliable, high-quality data on health-related and other development indicators. Data from these organisations is often used by researchers, policymakers, and international organisations to inform decision-making and policy development, yet only the OECD source could actually be traced. This raised a data integrity issue, because the figures rested on sources that could not be verified. Citing active sources would improve the reliability of the presentation.
The data visualisation presented an imaginative and visually attractive representation of the viruses and diseases that could cause death. However, the use of colour could be confusing, especially when multiple figures were presented together. The darker shades were intended to highlight the main categories, but without the connecting lines it was difficult to tell apart circles of various sizes and colours. This could cause confusion about which information took priority. To avoid such issues, it was essential to be mindful of the colours used and their effect on perception, and to ensure that the data was presented clearly and accurately. The colour and perception issues also contributed to the next issue: deception.
The use of circle sizes to represent the number of cases in each category and subcategory can be misleading, especially when combined with colour. Some subcategories can appear more significant than they are simply because their circles are larger. A bar graph is an alternative that makes the comparison between categories and subcategories clearer and easier to comprehend. Because the length of each bar is proportional to the value in each category, it is simpler to compare categories and to draw accurate conclusions. A bar graph can also mitigate the colour perception issue that arises in circle visualisations: since the bars are typically coloured uniformly, there is less room for ambiguity in the interpretation of the data.
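The difference between the two encodings can be illustrated with a small calculation. The sketch below uses two values from the reconstruction data and assumes, purely for illustration, that the original circles were scaled by area; the source does not state how the circles were actually sized.

```r
# Illustrative sketch: assumes circles are sized by area, which is an
# assumption and not something the original visualisation confirms.
cases <- c("Natural Disasters" = 24, "Noncommunicable Diseases" = 1917)

# Ratio of the two values; a bar chart shows this directly as bar length
value_ratio <- cases[["Noncommunicable Diseases"]] / cases[["Natural Disasters"]]

# For area-scaled circles, diameter grows with the square root of the value,
# so the visible width difference is much smaller than the value difference
diameter_ratio <- sqrt(value_ratio)

cat("Value ratio:   ", round(value_ratio, 1), "\n")
cat("Diameter ratio:", round(diameter_ratio, 1), "\n")
```

Under this assumption, a category roughly 80 times larger is drawn with a circle only about 9 times wider, whereas its bar would be roughly 80 times longer.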
Reference
The following code was used to fix the issues identified in the original.
library(ggplot2)

# Estimated deaths by cause, in millions (per the axis label below)
death <- data.frame(
  Cause = c("Natural Disasters", "Infectious Disease", "Pregnancy",
            "Nutritional Deficiencies", "Noncommunicable Diseases",
            "Cancer", "War", "Drug", "Accident"),
  Total_Cases = c(24, 1680, 435, 59, 1917, 533, 131, 115, 298)
)

# Horizontal bar chart with one uniform fill colour
p1 <- ggplot(data = death, aes(x = Total_Cases, y = Cause)) +
  geom_col(fill = "#1F78B4") +
  labs(title = "Estimated Major Causes of Death, 20th Century (1900-1999)",
       x = "Number of Cases (Millions)",
       y = "Cause") +
  theme_minimal()
p1  # display the plot
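One optional refinement, not part of the original reconstruction, is to order the bars by value so that the ranking of causes can be read at a glance. The sketch below is a suggestion only; the name `p_sorted` is hypothetical, and the data is repeated so the block runs on its own.

```r
library(ggplot2)

# Same data as the reconstruction (millions of deaths)
death <- data.frame(
  Cause = c("Natural Disasters", "Infectious Disease", "Pregnancy",
            "Nutritional Deficiencies", "Noncommunicable Diseases",
            "Cancer", "War", "Drug", "Accident"),
  Total_Cases = c(24, 1680, 435, 59, 1917, 533, 131, 115, 298)
)

# reorder() turns Cause into a factor whose levels are sorted by Total_Cases
# in ascending order, so the largest category is drawn at the top
death$Cause <- reorder(death$Cause, death$Total_Cases)

p_sorted <- ggplot(death, aes(x = Total_Cases, y = Cause)) +
  geom_col(fill = "#1F78B4") +
  labs(title = "Estimated Major Causes of Death, 20th Century (1900-1999)",
       x = "Number of Cases (Millions)",
       y = "Cause") +
  theme_minimal()
p_sorted
```

Sorting changes only the order of the bars, not the values, so it leaves the data unchanged while making the largest and smallest causes easier to spot.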
Data Reference
The following plot fixes the main issues in the original.