The Original, Code and Reconstruction sections below describe the issues and how they were fixed.

Original


Source: Reddit, r/dataisugly.


Objective

From my point of view, the objective of the original data visualisation is to show clearly and efficiently how the average height of women differs between countries.

The visualisation is deliberately simple. Its audience is likely to be non-technical, and a complex visualisation may confuse such an audience and prevent them from extracting value. Visualisations of this kind are common online.

The visualisation chosen had the following three main issues:

  • It is not clear where the y-axis starts

    Without a visible baseline, readers misjudge the actual differences between the countries. For example, the heights shown for Latvia and India differ by only 0.5 ft (about 15 cm), yet the Indian figure reaches only to the Latvian figure's knees, which is impossible for a gap that small.

  • The colour scheme is inappropriate

    The visualisation uses three different colours for six countries, which is unnecessary and misleading. For example, Latvia and South Africa share a colour that no other country uses, yet the two countries have no apparent connection. Colour should be reserved for differentiating meaningful features of the data.

  • The data are not presented accurately

    The visualisation uses simple images of women to represent average heights, labelled in feet, a unit that is not standard in most countries. Readers cannot read the exact heights from the chart: for example, the average female heights in Australia, Scotland and Peru appear to be equal, but they are not. The choice of countries is also neither systematic nor representative, and Scotland is not a sovereign country.
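The axis problem can be made concrete with a little arithmetic. The sketch below, in R like the rest of this report, converts the 0.5 ft gap to centimetres and compares the ratio of drawn lengths when bars start at zero versus at a truncated baseline. The helper names `ft_to_cm` and `apparent_ratio`, the 170 cm reference height and the 150 cm baseline are hypothetical values chosen for illustration, not read from the original chart.

```r
# Hypothetical helpers for illustration; not part of the original analysis.

# 1 foot is exactly 30.48 cm
ft_to_cm <- function(ft) ft * 30.48

# Ratio of the drawn lengths of two bars that both start at `baseline`.
# With baseline = 0 this equals the true ratio of the two values.
apparent_ratio <- function(taller, shorter, baseline = 0) {
  (taller - baseline) / (shorter - baseline)
}

gap_cm  <- ft_to_cm(0.5)    # 0.5 ft expressed in cm
taller  <- 170              # assumed reference height in cm (illustrative only)
shorter <- taller - gap_cm  # the shorter height, 0.5 ft less

true_ratio      <- apparent_ratio(taller, shorter)                  # axis from 0
truncated_ratio <- apparent_ratio(taller, shorter, baseline = 150)  # axis from 150 cm

cat(sprintf("Gap: %.2f cm\n", gap_cm))
cat(sprintf("Ratio with axis from 0:   %.2f\n", true_ratio))
cat(sprintf("Ratio with axis from 150: %.2f\n", truncated_ratio))
```

With a zero baseline the taller bar is only about 10% longer than the shorter one; starting the axis at 150 cm makes it roughly four times longer, which is the kind of exaggeration an unseen baseline can produce.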

Reference

Code

The following code was used to fix the issues identified in the original.

library(ggplot2)
library(readxl)

# Long-format data: one row per Country/Gender pair, Height in cm
average_height <- read_excel("~/Desktop/DV/Assignment/assignment2template1950/average height.xlsx")
average_height$Country <- as.factor(average_height$Country)
average_height$Height  <- as.numeric(average_height$Height)

# Refer to columns by name inside aes(), not as average_height$...
p <- ggplot(average_height, aes(x = Country, y = Height, fill = Gender)) +
  geom_col(position = position_dodge(width = 0.9)) +
  # Dodge the labels like the bars so each label sits above its own bar
  geom_text(aes(label = paste0(round(Height, 1), " cm")),
            position = position_dodge(width = 0.9), vjust = -0.3, size = 3) +
  labs(title = "Average Height by Country", x = "Country", y = "Height (cm)") +
  theme_minimal() +
  scale_y_continuous(limits = c(0, 200))  # start the axis at zero
p
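Because `scale_y_continuous(limits = c(0, 200))` turns any value outside the limits into `NA` by default, and ggplot2 then drops that bar with a warning, it can be worth checking the data before plotting. The sketch below assumes the spreadsheet has the `Country`, `Gender` and `Height` columns used above; `check_heights` is a hypothetical helper, and the two-row data frame is made-up test input, not NCD-RisC data.

```r
# Hypothetical pre-plot check; not part of the original analysis.
# Returns the rows whose Height is missing or outside [lower, upper].
check_heights <- function(df, lower = 0, upper = 200) {
  required <- c("Country", "Gender", "Height")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  h <- as.numeric(df$Height)
  # Rows that would be censored by the axis limits or are not numeric
  bad <- is.na(h) | h < lower | h > upper
  df[bad, , drop = FALSE]
}

# Made-up example input (not real survey values)
example <- data.frame(
  Country = c("A", "A"),
  Gender  = c("Female", "Male"),
  Height  = c(160, 250)
)
problems <- check_heights(example)
print(problems)
```

An empty result means every bar will be drawn within the 0 to 200 cm axis.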

Data Reference

NCD-RisC. (2020). Height: Data download. Ncdrisc.org. Retrieved 17 September 2020, from http://www.ncdrisc.org/data-downloads-height.html.

Reconstruction

The following plot fixes the main issues in the original.

Changed the unit from feet to centimetres

Extended the y-axis to run from 0 to 200 cm, so that bar lengths are proportional to height

Added the average male height, in a contrasting colour, as a point of comparison for the female height

Added more representative countries for a broader comparison