Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: Reddit (misteregamer1, 2020).


Objective

This visualisation shows the average female height in six countries: India, South Africa, Peru, Scotland, Australia and Latvia. The graph was posted on Reddit in 2020, with a website (morethanmyheight.com) given as its source. That website (More Than My Height) appears to have moved to another domain (amallitalli.com), an online store selling clothes for tall women. The audience of the original visualisation is therefore tall women, the store's targeted customers as mentioned on the website.

The visualisation chosen had the following three main issues:

  • Wrong visualisation method: The visualisation uses female symbols rather than bars. This makes it difficult to distinguish the differences between the countries in the graph.
  • Truncated scale: The height (y-axis) starts at 5 ft rather than zero, which exaggerates the disparities in the data. In the graph, the difference between Indian and Latvian females appears far larger than it really is.
  • Accuracy, completeness, and consistency of data: The graph gives no further details about its data, such as the context, what the values represent, or the method used to represent them in the chart.
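The effect of the truncated scale can be illustrated with a little arithmetic. The following is a minimal sketch, not part of the original analysis: the helper `visual_ratio` and the two heights are hypothetical, chosen only to show how a non-zero baseline changes the ratio of the drawn bar lengths.

```r
# Ratio of the drawn lengths of two bars when the axis starts at `baseline`.
# `visual_ratio` is a hypothetical helper written for this illustration.
visual_ratio <- function(taller, shorter, baseline = 0) {
  stopifnot(shorter > baseline)
  (taller - baseline) / (shorter - baseline)
}

# Hypothetical heights in cm (NOT taken from the dataset).
taller  <- 165
shorter <- 155

# Axis starting at zero: bar lengths are proportional to the real values.
true_ratio <- visual_ratio(taller, shorter)

# Axis starting at 150 cm: the same values drawn on a truncated scale.
truncated_ratio <- visual_ratio(taller, shorter, baseline = 150)

cat("Zero baseline:     ", round(true_ratio, 2), "\n")
cat("Truncated baseline:", round(truncated_ratio, 2), "\n")
```

With these illustrative numbers, a real difference of under 7% is drawn as one bar being three times as long as the other, which is the kind of distortion a zero-based axis avoids.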

Reference

Code

The following code was used to fix the issues identified in the original.

library(dplyr)
library(ggplot2)

avg_female_height <- read.csv("Height_of_Male_and_Female_by_Country_2022.csv")
avg_female_height <- setNames(avg_female_height, c("Rank_No", "Country","Male_H_Cm","Female_H_Cm","Male_H_Ft","Female_H_Ft"))

avg_female_height <- avg_female_height %>% select(-c('Rank_No', 'Male_H_Cm', 'Male_H_Ft', 'Female_H_Ft'))

# Select rows by using list of column values
# Scotland data doesn't exist in the dataset
# We are going to select United Kingdom
avg_female_height <- avg_female_height[is.element(avg_female_height$Country, c('Latvia', 'Australia', 'United Kingdom','Peru', 'South Africa', 'India')),]

# Sort rows descending order
avg_female_height <- avg_female_height[order(avg_female_height$Female_H_Cm, decreasing=TRUE),]

# Plot the bar chart
# Re-order bars by value (highest to lowest)
# Columns are referenced by bare name inside aes(), not with data$column
bar_chart <- ggplot(data = avg_female_height, aes(x = reorder(Country, -Female_H_Cm), y = Female_H_Cm)) +
  geom_col(fill = "orange3") +
  geom_text(aes(label = Female_H_Cm), vjust = -0.25) +
  labs(title = "Average Height of Females per Country", subtitle = "The Average Height of Females (2022)", caption = "Source of data: Kaggle (MAJYHAIN, 2022)", x = "Country", y = "Average height (cm)") +
  theme(plot.title = element_text(hjust = 0.6), plot.subtitle = element_text(hjust = 0.6))

# Display the chart
bar_chart
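
The comments in the code note that Scotland is absent from the dataset and that the United Kingdom is used in its place. A minimal sketch of how such a gap could be detected before filtering is shown below. The vector `available` is a hypothetical stand-in for the `Country` column of the real data, and `wanted` lists the countries shown in the original graphic.

```r
# Hypothetical stand-in for avg_female_height$Country (assumed values).
available <- c("Latvia", "Australia", "United Kingdom",
               "Peru", "South Africa", "India")

# Countries that appear in the original graphic.
wanted <- c("Latvia", "Australia", "Scotland",
            "Peru", "South Africa", "India")

# Countries requested but not present in the data.
not_found <- setdiff(wanted, available)

if (length(not_found) > 0) {
  message("Not found in the data: ", paste(not_found, collapse = ", "))
}
```

A check like this makes the substitution explicit, so a missing country is noticed rather than silently dropped from the chart.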

Data Reference

Reconstruction

The following plot fixes the main issues in the original.