Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Original Source: morethanmyheight.com.


Objective

The visualisation shows average female height by country, covering a selection of 6 seemingly unrelated countries - Latvia, Australia, Scotland, Peru, South Africa and India.

The visualisation was shared on Reddit in January 2020, and the post author credits morethanmyheight.com, a fashion and clothing line for tall women, as the original source. I was not able to locate the visualisation on the original website. Given that it is a fashion business, I expect the visualisation was designed to show the variation in average heights of women in markets where many of its customers are based (e.g. Australia, Scotland, Peru, South Africa) and to compare these with average heights at the upper and lower ends of the spread (Latvia, India).

The visualisation has the following three main issues:

  • Data integrity: The context and parameters of the data and its measurement are not clear. For example, the data source is not given, and it is not clear whether the average is a mean, median or mode. There is no way to know when the measurements were taken (what year, or over what period of years) or how old the sample participants were, even though age may have a significant influence on height. It is also not clear why these particular countries were selected, or why they appear to be arranged in descending order of height.
  • Perceptual scale and colour issues: The use of a female gender symbol instead of bars is confusing and unhelpful for many reasons. The scaling of the symbols on the x-axis is deceptive and suggests to the audience that Indian women are the size of the lower leg of an average Latvian woman, despite an average height difference of less than 6 inches. The overlapping symbols and the different shades of pink serve no purpose in conveying information, and the absence of value labels on the bars/symbols makes it difficult to see any difference between the heights of women in Australia, Scotland and Peru.
  • Deceptive y-axis range: The y-axis has been truncated to begin somewhere near 5 feet instead of starting at zero. This greatly exaggerates the differences in height, and is made worse by the fact that the truncation is not indicated or emphasised.
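
The effect of a truncated axis can be checked with simple arithmetic. The sketch below is illustrative only: the heights (170 cm and 155 cm, a gap of just under 6 inches) and the 150 cm baseline are assumed values, not figures read from the original visualisation, and `drawn_ratio` is a hypothetical helper name.

```r
# Ratio of the drawn sizes of two bars (or symbols) when the axis starts
# at `baseline` rather than at zero. All numbers here are assumptions
# chosen for illustration, not data from the original chart.
drawn_ratio <- function(tall, short, baseline = 0) {
  (tall - baseline) / (short - baseline)
}

drawn_ratio(170, 155)                  # axis from zero: about 1.1
drawn_ratio(170, 155, baseline = 150)  # axis from 150 cm: 4
```

Under these assumed values, a real difference of roughly 10% is drawn as a fourfold difference once the axis starts at 150 cm.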

Reference

Code

The following code was used to fix the issues identified in the original:

library(ggplot2)
library(dplyr)

# Read the NCD-RisC height data (comma-separated despite the .txt extension).
# read.csv() converts the header "Mean height (cm)" to Mean.height..cm.
height <- read.csv(file = "NCD_RisC_eLife_2016_height_age18_countries.txt")

# Keep women born in 1996 in the countries of interest
# (United Kingdom stands in for Scotland).
h2 <- height %>%
  filter(
    Year.of.birth == 1996,
    Sex == "Women",
    Country %in% c("Australia", "India", "Latvia", "Peru", "South Africa", "United Kingdom")
  )

# Bar chart with a zero-based y-axis and a value label inside each bar.
plot1 <- ggplot(data = h2, aes(x = Country, y = Mean.height..cm.)) +
  geom_col(fill = "#FD61D1") +
  geom_text(aes(label = round(Mean.height..cm., digits = 1)), vjust = 1.6, color = "white", size = 6) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14),
    plot.subtitle = element_text(hjust = 0.5),
    plot.caption = element_text(hjust = 1, face = "italic"),
    axis.text.x = element_text(size = 11)
  ) +
  labs(
    title = "Average Female Height by Country",
    subtitle = "Mean height at age 18 for those born in 1996",
    y = "Height (cm)",
    caption = "Data source: NCD Risk Factor Collaboration"
  )

# Display the plot.
print(plot1)

Data Reference

Reconstruction

The following plot fixes the main issues in the original. I obtained height data that I judged likely to be equivalent to the data behind the original. It covers men and women born from 1896 to 1996, with height measured at 18 years old. I filtered for those born in 1996 so that the most recent real-world measurements were used, given that the audience is people alive and purchasing clothes today. The data is in centimetres, which I consider more helpful than imperial measurements, although it could be converted for an audience that prefers imperial units. Scotland is not recorded as a country in the dataset, so I have substituted the United Kingdom (which includes Scotland).
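
For an audience that prefers imperial units, the conversion could be sketched as follows. This is a minimal example, not part of the original code: `cm_to_ft_in` is a hypothetical helper, and rounding to the nearest whole inch is an assumption about how the labels would be presented.

```r
# Hypothetical helper: convert heights in centimetres to feet-and-inches
# labels, rounded to the nearest whole inch (1 inch = 2.54 cm).
cm_to_ft_in <- function(cm) {
  total_in <- round(cm / 2.54)
  feet <- total_in %/% 12     # whole feet
  inches <- total_in %% 12    # remaining inches
  paste0(feet, "'", inches, "\"")
}

cm_to_ft_in(c(152.4, 165.1, 170))
```

The resulting labels could be passed to `geom_text()` in place of the rounded centimetre values, while keeping the bars themselves on a zero-based scale.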

I then focused on those countries from the original visualisation, on the assumption that these countries are of significance to the audience at More Than My Height. The visualisation otherwise addresses all of the issues raised in my analysis of the original, and includes labels to clearly indicate the average height for each country depicted. The countries are ordered alphabetically, and the values on the y-axis start from zero.