Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: DataIsBeautiful reddit.com - Distribution of brick colours in current Lego sets [OC] (2021).


Objective

This data visualisation, titled “Distribution (%) of Lego brick colours in current Lego sets”, uses a stylised, minimalistic 3D bar chart in which stacks of Lego bricks of different heights represent each colour's share of all bricks. The target audience is Lego enthusiasts and people who enjoy looking at interesting data visualisations.

The visualisation chosen had the following three main issues:

  • Issue 1 - Orientation - The chart plots percentage on the Y axis and colour on the X axis in a very minimalist, stylised layout. Because each bar is identified only by its own colour along the X axis, it is difficult to match a colour to its name and to its percentage.
  • Issue 2 - X axis labels missing - Without labels on the X axis, the colour name can only be guessed from the colour of the bricks. With a large number of colours, many of them very similar, it is difficult to tell particular Lego brick colours apart in the chart's current form.
  • Issue 3 - Data values missing - The chart uses 3D-rendered stacks of Lego bricks as bars. Although this has aesthetic appeal, it lacks detail: the actual percentage for each colour is not shown, so the values are difficult to read. In addition, as the values descend along the X axis, the brick stacks suggest that many colours share the same percentage. Further analysis showed this to be incorrect and misleading, as very few colours actually share the same percentage value.
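
The misleading ties described in Issue 3 can be illustrated with a minimal sketch. It assumes that each stack is built from whole bricks, one brick per percentage point; the original chart does not document how the heights were derived, and the percentages below are hypothetical values rather than figures from the data.

```r
# Hypothetical percentages, not taken from the Rebrickable data
percent <- c(4.37, 4.12, 3.96, 3.58)

# Assumption: one whole brick per percentage point
brick_height <- round(percent)

# Count the distinct values before and after quantising to whole bricks
n_distinct_percent <- length(unique(percent))
n_distinct_height <- length(unique(brick_height))

print(n_distinct_percent)  # 4 distinct percentages
print(n_distinct_height)   # 1 distinct brick height
```

Under these assumptions, four different percentages collapse into a single stack height, which is why printing the data value next to each bar removes the ambiguity.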

Reference

Code

The following code was used to fix the issues identified in the original.

# Required libraries
library(readr)
library(ggplot2)
library(dplyr)

# Read in data
# Data downloaded from https://rebrickable.com/downloads/
inventory_parts_df <- read_csv(file = "inventory_parts.csv")
colors_df <- read_csv(file = "colors.csv")

# Generate summary data frame that holds distinct color counts
summary_df <- inventory_parts_df %>% 
  count(color_id)

# Rename the "id" variable in the "colors_df" data frame to "color_id" to match the "color_id" variable in the "summary_df" data frame
names(colors_df)[1] <- "color_id"

# Add color details to "summary_df" data frame (join on "color_id")
summary_df <- merge(summary_df, colors_df, by = "color_id")

# Calculate and add new "percent" variable to the "summary_df" data frame
summary_df$percent <- 100 / sum(summary_df$n) * summary_df$n

# Round "percent" variable to 2 decimal places
summary_df$percent <- round(summary_df$percent, 2)

# Reorder "summary_df" data frame by "n" (count), largest first
summary_df <- summary_df[order(summary_df$n, decreasing = TRUE), ]

# Create new "chart_df" data frame that only holds top 34 observations to match original chart
chart_df <- summary_df[1:34, ]

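# Generate bar chart
# geom_col() already plots the values in "percent" as bar heights, so a separate
# geom_bar(stat = "identity") layer would only draw the same bars twice
p1 <- ggplot(chart_df, aes(x = reorder(name, percent), y = percent, fill = name)) +
  geom_col(colour = "black") +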
  scale_fill_manual(values=c(
    "[No Color/Any Color]" = "#05131D",
    "Black" = "#05131D",
    "Blue" = "#0055BF",
    "Bright Green" = "#4B9F4A",
    "Bright Light Orange" = "#F8BB3D",
    "Bright Pink" = "#E4ADC8",
    "Brown" = "#583927",
    "Dark Blue" = "#0A3463",
    "Dark Bluish Gray" = "#6C6E68",
    "Dark Brown" = "#352100",
    "Dark Gray" = "#6D6E5C",
    "Dark Pink" = "#C870A0",
    "Dark Purple" = "#3F3691",
    "Dark Red" = "#720E0F",
    "Dark Tan" = "#958A73",
    "Flat Silver" = "#898788",
    "Green" = "#237841",
    "Light Bluish Gray" = "#A0A5A9",
    "Light Gray" = "#9BA19D",
    "Lime" = "#BBE90B",
    "Medium Azure" = "#36AEBF",
    "Medium Dark Flesh" = "#CC702A",
    "Orange" = "#FE8A18",
    "Pearl Gold" = "#AA7F2E",
    "Red" = "#C91A09",
    "Reddish Brown" = "#582A12",
    "Tan" = "#E4CD9E",
    "Trans-Clear" = "#FCFCFC",
    "Trans-Light Blue" = "#AEEFEC",
    "Trans-Orange" = "#F08F1C",
    "Trans-Red" = "#C91A09",
    "Trans-Yellow" = "#F5CD2F",
    "White" = "#FFFFFF",
    "Yellow" = "#F2CD37"
    ))+  
  geom_text(aes(label = percent), hjust = -0.2, size = 3.5) + 
  labs(title = "Distribution (%) of brick colours in current Lego sets", 
       x = "Colour",
       y = "Percentage") +
  ylim(0, 20) +
  coord_flip() +
  theme_classic() +
  theme(legend.position = "none")

# Display the chart
p1
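
As an alternative to typing the fill colours by hand, the palette could be derived from the colour table itself. The sketch below is not part of the original code. It assumes that "colors.csv" provides an "rgb" column holding six-digit hex codes without a leading "#", and it uses a small hypothetical data frame in place of the real file so that it runs on its own; "fill_palette" is an illustrative name.

```r
# Hypothetical stand-in for "colors_df" after the "id" column was renamed.
# Assumption: the real file has an "rgb" column with hex codes and no leading "#".
colors_df <- data.frame(
  color_id = c(0, 1, 4),
  name = c("Black", "Blue", "Red"),
  rgb = c("05131D", "0055BF", "C91A09"),
  stringsAsFactors = FALSE
)

# Build a named character vector: names are colour names, values are "#RRGGBB"
fill_palette <- setNames(paste0("#", colors_df$rgb), colors_df$name)

print(fill_palette)
```

A named vector of this shape can be passed as scale_fill_manual(values = fill_palette), which keeps the bar colours in step with the data if the set of colours changes.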

Data Reference

Reconstruction

The following plot fixes the main issues in the original.