Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.
Objective
The visualisation shows average female height by country, covering a selection of six seemingly unrelated countries: Latvia, Australia, Scotland, Peru, South Africa and India.
The visualisation was shared on Reddit in January 2020. The post author credits morethanmyheight.com, the website of a fashion and clothing line for tall women, as the original source. I was not able to locate the visualisation on that website, but given that it is a fashion business, I expect that the visualisation may have been designed to show the variation in average women's heights in markets where many of its customers are based (e.g. Australia, Scotland, Peru, South Africa) and to compare these with average heights at the upper and lower ends of the spread (Latvia, India).
The visualisation has the following three main issues:
Reference
r/CrappyDesign (2020). This graph comparing average women’s height around the world is…well… Retrieved 19 September, 2020, from Reddit: https://www.reddit.com/r/CrappyDesign/comments/eqhtos/this_graph_comparing_average_womens_height_around/
More Than My Height (2018). What is MTMH? - More Than My Height. Retrieved 19 September, 2020, from More Than My Height website: https://morethanmyheight.com/what-is-mtmh/
The following code was used to fix the issues identified in the original:
# Load plotting and data-manipulation packages
library(ggplot2)
library(dplyr)

# Read the NCD-RisC country-level height data
height <- read.csv(file = "NCD_RisC_eLife_2016_height_age18_countries.txt")

# Keep women born in 1996 in the six countries of interest (United Kingdom stands in for Scotland)
h2 <- height %>%
  filter(
    Year.of.birth == 1996,
    Sex == "Women",
    Country %in% c("Australia", "India", "Latvia", "Peru", "South Africa", "United Kingdom")
  )

# Bar chart with the rounded mean height printed inside each bar
plot1 <- ggplot(data = h2, aes(x = Country, y = Mean.height..cm.)) +
  geom_col(fill = "#FD61D1") +
  geom_text(aes(label = round(Mean.height..cm., digits = 1)), vjust = 1.6, color = "white", size = 6) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14),
    plot.subtitle = element_text(hjust = 0.5),
    plot.caption = element_text(hjust = 1, face = "italic"),
    axis.text.x = element_text(size = 11)
  ) +
  labs(
    title = "Average Female Height by Country",
    subtitle = "Mean height at age 18 for those born in 1996",
    y = "Height (cm)",
    caption = "Data source: NCD Risk Factor Collaboration"
  )

# Display the plot
plot1
Data Reference
The following plot fixes the main issues in the original. I obtained what I judged to be likely equivalent height data: mean height at 18 years old for men and women born from 1896 to 1996. I filtered for those born in 1996 to use the most recent real-world measurements, given that the audience is people alive and purchasing clothes today. The data is in centimetres, which I consider more helpful than imperial measurements, although it could be converted for an audience that prefers imperial units. Scotland is not recorded as a country in the dataset, so I have substituted the United Kingdom (which includes Scotland).
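For an audience that prefers imperial units, the conversion mentioned above could be sketched as follows. The helper name `cm_to_ft_in` and the sample heights are hypothetical and are not part of the original code or the dataset. The only fixed fact used is that one inch is exactly 2.54 cm.

```r
# Hypothetical helper (not in the original code): convert heights in
# centimetres to a "feet and inches" label, e.g. for bar labels.
cm_to_ft_in <- function(cm) {
  # Round total inches first so the inch part can never display as 12.0
  total_inches <- round(cm / 2.54, 1)  # 1 inch = 2.54 cm exactly
  feet <- total_inches %/% 12          # whole feet
  inches <- total_inches %% 12         # remaining inches
  sprintf("%d ft %.1f in", as.integer(feet), inches)
}

# Illustrative values only (not taken from the dataset)
cm_to_ft_in(c(152.4, 182.88))
```

In the plotting code, a label like this could replace the rounded centimetre value inside `geom_text()`, with the y-axis title changed to match.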
I then focused on those countries from the original visualisation, on the assumption that these countries are of significance to the audience at More Than My Height. The visualisation otherwise addresses all of the issues raised in my analysis of the original, and includes labels to clearly indicate the average height for each country depicted. The countries are ordered alphabetically, and the values on the y-axis start from zero.
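The alphabetical ordering relies on the default behaviour for a character variable on a discrete axis. A minimal sketch of making that order explicit is shown below. The vector `countries` is an illustrative stand-in for the `Country` column, and `country_factor` is a hypothetical name.

```r
# Illustrative stand-in for the Country column, in the original chart's order
# with the United Kingdom substituted for Scotland
countries <- c("Latvia", "Australia", "United Kingdom", "Peru", "South Africa", "India")

# Fix the display order explicitly by setting sorted factor levels
country_factor <- factor(countries, levels = sort(unique(countries)))
levels(country_factor)

# Assumption: if a zero baseline ever needed to be forced in ggplot2,
# adding expand_limits(y = 0) to the plot is one way to do so.
```

Setting the levels explicitly means the order would survive a later change, such as sorting the bars by height instead.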