Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.
Objective
The original visualisation compared average adult male heights (in feet and inches) in different countries against the average adult male height in the US. It focuses on identifying the countries with the tallest male populations and on whether climate, geography and ethnic background have a significant influence on the findings. The target audience is anyone interested in average male height data. Such a diagram may also be of interest to health organisations, nutrition experts and scientists.
The visualisation chosen had the following three main issues:
Colour issue. As the diagram shows, darker colours represent taller average male heights and lighter colours represent shorter ones. However, the shades run from near-black to a light bluish hue, which makes the intermediate heights difficult to tell apart. Placed on a world map, the shades make it almost impossible to communicate the objective of the data visually. The palette is especially difficult for people with achromatopsia (monochromatic vision), who must distinguish the greyish shades from the light bluish ones.
The second issue is the lines used to label each country, which add unnecessary complexity to the visualisation. Using a world map for this dataset is also misleading, since some countries cover a much larger area than others and the aim is not to compare average adult male heights by continent.
The third issue is that the original dataset is not properly referenced: the visualisation cites only the website from which the visual was sourced. This raises concerns about the unknown source of the data, its integrity and the risk of misleading the audience.
Reference
The following code was used to fix the issues identified in the original.
library(ggplot2)
library(dplyr)
## Load the data and display
df = read.csv("data.csv")
head(df)
##   place  pop2023 growthRate   area                country cca3 cca2 ccn3 region
## 1   528 17618299    0.00309  41850            Netherlands  NLD   NL  528 Europe
## 2   499   626485   -0.00095  13812             Montenegro  MNE   ME  499 Europe
## 3    70  3210847   -0.00701  51209 Bosnia and Herzegovina  BIH   BA   70 Europe
## 4   352   375318    0.00649 103000                Iceland  ISL   IS  352 Europe
## 5   208  5910913    0.00487  43094                Denmark  DNK   DK  208 Europe
## 6   203 10495295    0.00013  78865         Czech Republic  CZE   CZ  203 Europe
##         subregion landAreaKm  density densityMi Rank year meanHeightMale
## 1  Western Europe    33670.0 523.2640 1355.2538   72 2019       183.7824
## 2 Southern Europe    13450.0  46.5788  120.6391  169 2019       183.3022
## 3 Southern Europe    51200.0  62.7119  162.4237  137 2019       182.4740
## 4 Northern Europe   100830.0   3.7223    9.6407  179 2019       182.1016
## 5 Northern Europe    40000.0 147.7728  382.7316  115 2019       181.8927
## 6  Eastern Europe    77198.5 135.9521  352.1158   89 2019       181.1866
##   meanHeightFemale rank
## 1         170.3612    1
## 2         169.9609    2
## 3         167.4704    3
## 4         168.9135    4
## 5         169.4706    5
## 6         167.9635    6
## Choosing only columns relevant to original visualisation
Male_height = subset(df, select = c(country, meanHeightMale))
## Choosing only rows/country in the original visualisation
Male_height_sel <- filter(Male_height, country %in% c(
  "India", "China", "Mexico", "Japan", "Brazil",
  "Australia", "Canada", "Italy", "France", "Russia",
  "United States", "United Kingdom", "Spain", "Greece", "Germany"
))
## Reconstructed Visualisation
plot <- ggplot(Male_height_sel, aes(x = reorder(country, -meanHeightMale), y = meanHeightMale))
plot <- plot +
  geom_bar(stat = "identity", position = "dodge", colour = "skyblue", fill = "steelblue", width = 0.6) +
  ## Label each bar with the height rounded to one decimal place
  geom_text(aes(label = round(meanHeightMale, 1)), size = 3, vjust = -0.3) +
  labs(title = "Average Adult Male Height by Country", x = "Country", y = "Height (cm)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5, size = 15, face = "bold"),
        axis.title.x = element_text(face = "bold"),
        axis.title.y = element_text(face = "bold")) +
  ## Zoom the y-axis to 150-185 cm without dropping any data
  coord_cartesian(ylim = c(150, 185))
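Because filtering with `%in%` silently drops any name that does not match the spelling in the `country` column, it may be worth confirming that all 15 requested countries were actually found before plotting. The following is only a minimal sketch: the helper name `missing_countries` is hypothetical and not part of the original analysis, and the demo data frame reuses three rows from the `head(df)` output rather than the full dataset.

```r
## Hypothetical helper (an assumption, not from the original code):
## return the requested country names that are absent from the data,
## so that spelling mismatches are noticed instead of silently dropped.
missing_countries <- function(data, wanted) {
  setdiff(wanted, unique(data$country))
}

## Small demo built from rows shown in the head(df) output
demo <- data.frame(
  country = c("Netherlands", "Montenegro", "Denmark"),
  meanHeightMale = c(183.7824, 183.3022, 181.8927)
)

## "United States" is not in the demo rows, so it is reported as missing
missing <- missing_countries(demo, c("Netherlands", "Denmark", "United States"))
print(missing)
```

Applied to the real data, the same call would take `Male_height` and the vector of 15 country names; an empty result would suggest every country was matched.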
Data Reference
The following plot fixes the main issues in the original.