Exploring the LEGO world through mini-doll hairstyles
Introduction
In July 2021, LEGO officially released the most extensive mini-doll set ever made, the Frozen Castle. Besides the castle itself, one of its most notable features is the five mini-dolls it includes, all characters featured in Frozen I and Frozen II: Elsa in her coronation robe, Anna, Kristoff, Olaf the Snowman and a second Elsa in her legendary costume.
LEGO's mini-doll is, however, somewhat controversial. When it was introduced in 2012, observers noted that, unlike the traditional LEGO Minifigure, the mini-doll was less blocky, more stylised and taller; it also had a more feminine appearance (Gutwald, 2017). Many LEGO fans took the resulting gender imbalance to be natural, which was seen as a reflection of stereotypes. In fact, LEGO launched the mini-doll to appeal to female players and to balance the gender ratio of its audience (Reich et al., 2017). Mini-dolls are about 5mm taller than standard Minifigures and have a plain stud on the head that accepts any hairpiece, although each figure ships with its own original hairstyle.
This study focuses on mini-doll hairstyles, or more specifically hair colour, to explore whether the mix of mini-doll hair colours reflects a pursuit of diversity. Because the mini-doll audience is mainly young people, the distribution needs to be compared with real life to see whether the mini-doll constructs a world close to the real one and so contributes to the worldview young players form. In addition, LEGO began experimenting in 2012 to tap into the female market, so the number of sets LEGO has produced since 2012 is also visualised and used to explore how the LEGO world has been built up.
Data Source
The data files used in this markdown file, listed below, were downloaded from https://rebrickable.com/downloads/ (@rebrickable2022).
📄inventories.csv: stores the unique id and set_num of each mini-doll.
📄inventory_sets.csv: inventory_id for a quick inventory search and set_num as the unique identification number of the set; this file is not very useful for mini-doll research.
📄sets.csv: stores information about the year, name, theme, etc. of each set release, with no information related to mini dolls.
📄themes.csv: stores information on the theme of Lego sets.
📄inventory_minifigs.csv: inventory id and figure number of the minifigures in each inventory.
📄minifigs.csv: stores images and names of each mini doll.
📄inventory_parts.csv: inventory_id can be combined with part_num in the file to help us find the exact part in the official sets.
📄parts.csv: stores information about the part, including name, number and material.
📄colors.csv: stores the colour codes (hex) of all the parts and the parts’ colour names, the ids of which we can use to link the datasets, an extremely important file in this study.
📄elements.csv: linking part_num and color_id together.
📄part_categories.csv: figure out the category id of mini dolls, like Minidoll Heads.
📄part_relationships.csv: provides relationships between parts – parent and child parts.
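The code chunks in this report use the tables by their file names (sets, parts, colors and so on) without showing how they are loaded. A minimal sketch of one way to do it follows; read_rebrickable is a hypothetical helper, not part of any package, and the demo folder and its two rows are invented so that the sketch runs without the real download.

```r
# Hypothetical helper: read the named Rebrickable CSV files from one folder
# into a named list of data frames.
read_rebrickable = function(dir, tables) {
  paths = file.path(dir, paste0(tables, ".csv"))
  missing = paths[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Missing file(s): ", paste(missing, collapse = ", "))
  }
  setNames(lapply(paths, read.csv, stringsAsFactors = FALSE), tables)
}

# Demo with a throwaway folder and invented rows (not real Rebrickable data)
demo_dir = tempfile("rebrickable_")
dir.create(demo_dir)
write.csv(data.frame(set_num = c("001-1", "002-1"), year = c(2012, 2013)),
          file.path(demo_dir, "sets.csv"), row.names = FALSE)

tables = read_rebrickable(demo_dir, "sets")
sets_demo = tables$sets
```

With the real files in one folder, the same call with tables such as c("sets", "parts", "colors") would return one data frame per file, which can then be assigned to the names used in the chunks.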
Research Question 1: How has the number of toy sets produced by LEGO changed since 2012?
1.Data Preparation
After downloading the required data file sets.csv, filter it to exclude sets released before 2012. It is then relatively simple to count the number of sets LEGO has released in each year since 2012. The counts are stored in a new data frame, years_product, which holds each year and the number of sets produced in it.
#Prepare Data
#sets is the data frame read from sets.csv
sub_sets = sets[sets$year >= 2012,] #Exclude data prior to 2012
number = table(sub_sets$year) #Count how often each year occurs, i.e. the number of sets released per year
years_product = data.frame(number) #Create a new data table (named years_product)
colnames(years_product) = c('year','number') #Columns: the year and its production
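One detail of this step is easy to miss: table() stores the years as labels, so the year column of the resulting data frame is not numeric. The sketch below repeats the count on a small invented stand-in for sets.csv and converts the years back to integers; the names sets_demo, sub_demo and years_demo are illustrative only.

```r
# Toy stand-in for sets.csv (values are made up for illustration)
sets_demo = data.frame(
  set_num = c("a-1", "b-1", "c-1", "d-1", "e-1", "f-1"),
  year    = c(2010, 2012, 2012, 2013, 2013, 2013)
)

sub_demo = sets_demo[sets_demo$year >= 2012, ]   # keep 2012 onwards

# Count sets per year; stringsAsFactors = FALSE keeps the year labels as text
years_demo = as.data.frame(table(sub_demo$year), stringsAsFactors = FALSE)
colnames(years_demo) = c("year", "number")

# Convert the year labels back to numbers so they behave as a continuous axis
years_demo$year = as.integer(years_demo$year)
years_demo
```

With a numeric year column, a line layer can connect the points without an explicit group aesthetic; with the factor column produced by data.frame(table(...)), the group has to be set by hand.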
2.Data Visualization (Line chart)
A line chart is a form of scatter plot in which the x-axis is (usually) time; it is used to show the trend of a particular quantity over time (Anon, 2016). For this question, I therefore visualise the data with a line chart, which makes the change in LEGO set production from 2012 to date, that is, the growth of LEGO's empire, easier to see.
#Generate Visualization
library(ggplot2)
#Create a line chart
line_chart = ggplot(years_product, aes(x=year, y=number)) +
geom_line(aes(group=1), color="black") + #year is a factor, so all points are placed in one group to connect them
geom_point(size=9, shape=21, fill="#720E0F")+
labs(title='Lego sets produced by year from 2012', x = "Years",y = "Number of sets",caption="——Data from http://rebrickable.com")+
theme(title = element_text(face = 'bold', size = 14), plot.margin = margin(0, 0, 1, 1, "cm"))
line_chart
3.Findings
Since 2012, the number of sets LEGO produces each year has followed an overall rising trend. There are slight dips in 2015 and 2020, but they do not change the overall trend. The line chart shows that set production increased sharply in 2014 and reached a new plateau. That was also the year in which LEGO started using the mini-doll for the Disney Princess and Fusion themes, so the range appears to have been a success in that it coincided with a boost in production. Note that because this report was written in May 2022, the data for 2022 are incomplete, which is why that year's count is so much lower than in other years.
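The dips and the incomplete final year can also be checked numerically rather than read off the chart. The following is a small sketch with invented counts (they are not the Rebrickable figures); years_demo, change and dips are illustrative names.

```r
# Made-up yearly counts, for illustration only
years_demo = data.frame(year = 2012:2016,
                        number = c(100, 120, 150, 140, 160))

# Difference from the previous year; the first year has nothing to compare with
years_demo$change = c(NA, diff(years_demo$number))

# Years in which production fell compared with the year before
dips = years_demo$year[!is.na(years_demo$change) & years_demo$change < 0]
dips

# An incomplete final year can be left out before judging the trend
complete_years = years_demo[years_demo$year < max(years_demo$year), ]
```

Applied to years_product, the same steps would list the years with a drop and allow 2022 to be excluded from the comparison, provided the year column has first been converted to numbers.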
As production grew, LEGO players were increasingly exposed to this new toy, and its influence grew with it. With this in mind, the report goes on to examine the distribution of hair colours among the mini-dolls LEGO has released. From the narrow viewpoint of hair colour, we assess whether the range creates an “imbalance” or builds a world close to reality.
Research Question 2: What does the LEGO mini-doll hair colour distribution look like?
1.Data Preparation
To answer this question, we need the hairstyle of each mini-doll (if it has hair) and must match each hairpiece with its colour. Before that, the data has to be narrowed step by step: from minifigures, to minifigure headwear, to mini-doll hair. Unfortunately, this information is not held in a single file, so the data in the downloaded files above must be extracted and filtered to build the dataset we need, following the process below.
First, the three tables part_categories, parts and inventory_parts were merged, keeping only parts in the Minifig Headwear category.
The result was then linked to the colors table through color_id to obtain the colour information. With the colour of each headwear part in place, inventory_id was used to merge with the inventories table, creating the table subminifigs. The next step caused a little trouble: the inventory_minifigs table and the inventories table did not share the same inventory_id values (I was curious why). After several attempts, I found that the set_num value in the result is unique for each mini-doll inventory entry and matches the fig_num column in minifigs. These two columns were therefore used for the final merge, which produced the dataset subminifig_char.
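The merges described here mix two kinds of join, and it may help to see the difference on a tiny example. The sketch below uses invented tables (parts_demo, inv_demo and colors_demo are not the real files) and only base R's merge(); it also shows where a suffixed column such as name.x comes from.

```r
# Toy tables; the ids and names are invented for illustration
parts_demo  = data.frame(part_num = c("p1", "p2", "p3"),
                         name = c("Minidoll Hair Long", "Minifig Helmet",
                                  "Minidoll Hair Short"))
inv_demo    = data.frame(inventory_id = c(10, 10, 11),
                         part_num = c("p1", "p2", "p4"),
                         color_id = c(0, 99, 0))
colors_demo = data.frame(id = 0, name = "Black", rgb = "05131D")

# all = FALSE keeps only part_num values present in both tables (inner join)
inner = merge(parts_demo, inv_demo, by = "part_num", all = FALSE)

# all.x = TRUE keeps every row of the left table, even without a matching colour
# (left join); both tables have a "name" column, so merge() renames them
# name.x (part name) and name.y (colour name)
left = merge(inner, colors_demo, by.x = "color_id", by.y = "id", all.x = TRUE)
names(left)
```

In this toy case the part with colour id 99 survives the second merge with missing colour fields, whereas an inner join would have dropped it.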
The data this project needs for each mini-doll are the unique id of the inventory, the id of the hair part, the hair colour name and colour code, and the unique id of the mini-doll. However, the matched data still contain headwear for ordinary Minifigures, so the 10565 rows in subminifig_char need to be filtered again. I filtered on the name.x column using Minidoll Hair as the keyword, which yielded the minidoll table (566 rows).
Although the minidoll table contains almost everything needed for this project, a problem remains: the same character is recorded under many different names. For example, Emma - Lavender Top, Bright Light Blue Skirt and Emma - Medium Blue Crop Top, Dark Pink Shorts are the same character but are recorded as two different mini-dolls. This affects the hair colour statistics, so a separate column for the character name is necessary. In the next step, str_split_fixed() from the stringr package, a function that splits a string into a fixed number of pieces at a separator, is used to cut the character name off the front of each recorded name and add it to minidoll as a new column. The dataset is now fully prepared for visualisation and covers all released LEGO mini-doll characters. The columns with the doll name, hair name, hair colour and the corresponding code are extracted into the subset minidoll_hair.
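To make the name clean-up concrete, here is a small sketch that runs without extra packages. It swaps in base R's sub() for str_split_fixed(): instead of splitting into two pieces and keeping the first, it deletes everything from the first separator onwards, which gives the same character name. The two Emma names are the ones quoted above; the other two are invented test cases.

```r
# Recorded names: the first two are quoted in the text, the last two are invented
recorded = c("Emma - Lavender Top, Bright Light Blue Skirt",
             "Emma - Medium Blue Crop Top, Dark Pink Shorts",
             "Mia with Helmet",   # hypothetical
             "Olaf")              # hypothetical, contains no separator

# Same separators as in the report: hyphen, " with", " in" and comma
separators = "( -|-| with| in|,)"

# Delete everything from the first separator to the end, then trim stray spaces
doll_name = trimws(sub(paste0(separators, ".*$"), "", recorded))
doll_name
```

Both Emma records collapse to the same name, so they are counted as one character. A caveat of this separator list is that a hyphen inside a character's own name would also be treated as a cut point.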
#Prepare Data
library(stringr) #Provides str_split_fixed()
minifig_id = part_categories$id[part_categories$name == "Minifig Headwear"]
minifig_parts = subset(parts, part_cat_id == minifig_id) #Retrieve all parts classified as "Minifig Headwear" from the table "parts" using part_cat_id
subparts = subset(inventory_parts, select = inventory_id:quantity)
inventory_minifig = merge(minifig_parts, subparts, by = "part_num", all = FALSE) #Keep only headwear parts that appear in an inventory
color_minifig = merge(inventory_minifig, colors, by.x = "color_id", by.y = "id", all.x = TRUE, all.y = FALSE) #Attach the colour name and code
subminifigs = merge(color_minifig, inventories, by.x = "inventory_id", by.y = "id", all.x=TRUE, all.y=FALSE) #Attach the set_num of each inventory
subminifig_char = merge(subminifigs, minifigs, by.x = "set_num", by.y = "fig_num", all = FALSE) #Match set_num against fig_num so that only figure inventories remain
minidoll = subset(subminifig_char, grepl("Minidoll Hair", name.x, fixed = TRUE)) #Select rows containing Minidoll Hair
minidolls = str_split_fixed(minidoll$name, "-| with| in| -|,", 2) #Split each recorded name at the first separator
minidoll$doll_name = minidolls[,1] #The first piece is the character name; taking the whole column avoids a hard-coded row count
colnames(minidoll) = c('set_num','inventory_id','color_id','part_num','name_hair_part','part_cat_id','part_material','quantity','color_name','rgb','is_trans','version','name','num_parts','doll_name')
minidoll_hair = subset(minidoll, select = c(name_hair_part, color_name, rgb, doll_name)) #Keep only the columns needed for the hair colour analysis
colnames(minidoll_hair) = c('hairstyle','color','code','doll_name')
Using the minidoll_hair table assembled above, we can count how often each doll appears with each hair colour (grouping by doll name) and how many dolls have each colour (grouping by colour name). Both results are sorted in descending order so that they are easier to use in the later steps.
#Subsequent processing of the data
##Frequency of occurrence of each doll's hair colour
dolls = aggregate(minidoll_hair$doll_name,list(minidoll_hair$doll_name, minidoll_hair$color, minidoll_hair$code),length)
colnames(dolls) = c('name','Hair color','code','appear_frequency')
dolls = dolls[order(dolls$appear_frequency, decreasing = TRUE),]
##Frequency of occurrence of each hair colour (number of distinct dolls with that colour)
color = aggregate(dolls$'Hair color',list(dolls$'Hair color', dolls$code),length)
colnames(color) = c('name','code','ratio') #Note: 'ratio' holds a count, not a proportion
color$'Hair color' = factor(color$name, unique(color$name))
color = color[order(color$ratio, decreasing = TRUE),]
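The two counting steps measure different things, which a toy example makes easier to see. The sketch below applies the same aggregate() calls to six invented rows; hair_demo, dolls_demo and color_demo are illustrative names, and the rows are not taken from the real data.

```r
# Invented rows in the shape of minidoll_hair
hair_demo = data.frame(
  doll_name = c("Emma", "Emma", "Emma", "Mia", "Mia", "Olivia"),
  color     = c("Black", "Black", "Black", "Orange", "Orange", "Black"),
  code      = c("05131D", "05131D", "05131D", "FE8A18", "FE8A18", "05131D")
)

# Step 1: one row per doll/colour pair, with the number of times it was recorded
dolls_demo = aggregate(hair_demo$doll_name,
                       list(hair_demo$doll_name, hair_demo$color, hair_demo$code),
                       length)
colnames(dolls_demo) = c("name", "Hair color", "code", "appear_frequency")

# Step 2: one row per colour, with the number of distinct dolls that wear it
color_demo = aggregate(dolls_demo$`Hair color`,
                       list(dolls_demo$`Hair color`, dolls_demo$code),
                       length)
colnames(color_demo) = c("name", "code", "ratio")
color_demo
```

Here Emma is recorded three times, but in the second table she contributes only once to the count for Black, alongside Olivia. The colour table therefore counts characters, not individual figures.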
2.Data Visualization (Pie chart & Bar chart)
The second question centres on the hair colour distribution across all the mini-dolls studied. The data table color, prepared above, contains the column Hair color and is used for this. Choosing a suitable chart type is the main consideration before plotting. Pie charts encode values as angles and bar charts as lengths, and both are commonly used to give an intuitive, practical view of tabular data (Gao, 2012). Because the aim here is to show how many items fall into each category, pie and bar charts were considered first.
When no colours are specified, ggplot fills bar and pie charts from its default palette, which bears no relation to the data. Since the subject of this study is colour itself, each hair colour is mapped to its own colour code (red is drawn in red, and so on) so that the number and proportion of each colour can be read directly from the chart. The 25 colours in the color table are therefore matched one by one. With this in mind, I carried out the following visualisation steps.
#Specify Colour
color_pattern = c("Orange" = "#FE8A18","Dark red" = "#720E0F","Reddish Brown" = "#582A12","Black" = "#05131D","Bright Light Yellow" = "#FFF03A","Dark Brown" = "#352100","Light Aqua" = "#ADC3C0","Dark Azure" = "#078BC9","Lavender" = "#E1D5ED","Dark Orange" = "#A95500", "Yellowish Green" = "#DFEEA5","Magenta" = "#923978","Tan" = "#E4CD9E", "Dark Pink" = "#C870A0", "Red" = "#C91A09", "White" = "#FFFFFF", "Dark Green" = "#184632" ,"Dark Blue" = "#0A3463", "Medium Nougat" = "#AA7D55", "Medium Azure" = "#36AEBF", "Bright Green" = "#4B9F4A", "Bright Light Orange" = "#F8BB3D", "Bright Light Blue" = "#9FC3E9", "Dark Red" = "#720E0F", "Dark Purple" = "#3F3691", "Medium Lavender" = "#AC78BA")
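A named vector like this can also be derived from the data instead of being typed by hand, which avoids spelling mismatches between the colour names and the palette. The sketch below assumes the code column holds six-digit hex strings without a leading "#" (as in the invented rows here); color_demo and palette_demo are illustrative names.

```r
# Invented rows in the shape of the color table (name plus hex code without "#")
color_demo = data.frame(name = c("Black", "Orange", "Magenta"),
                        code = c("05131D", "FE8A18", "923978"))

# Prefix each code with "#" and name the entries after the colours,
# which is the form scale_fill_manual(values = ...) expects
palette_demo = setNames(paste0("#", color_demo$code), color_demo$name)
palette_demo
```

Under the same assumption, setNames(paste0("#", color$code), color$name) would produce a palette covering every colour that actually occurs in the color table.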
#Generate Visualization
#Create a pie chart
pie_chart = ggplot(color, aes(x = 2, y = ratio, fill = `Hair color`))+ geom_bar(stat = "identity")+ scale_fill_manual(values = color_pattern)+ coord_polar(theta = "y", start = 0, direction = -1)+ xlim(1, 3)+ labs(title = 'Hair Color Distribution Among LEGO mini-doll', caption = "——Data From Rebrickable.com")+ theme_void()+ theme(title = element_text(face = 'bold', size = 14), plot.margin = margin(0, 0, 1, 1, "cm"),legend.text = element_text(size = 7.5))
pie_chart
#Generate Visualization
#Create a bar chart
bar_chart = ggplot(color, aes(x = reorder(name, ratio), y = ratio, fill = `Hair color`))+ geom_bar(stat = "identity")+ scale_fill_manual(values = color_pattern)+ scale_y_continuous(expand = c(0, 1), limits = c(0, 40))+ labs(title='Hair Color Distribution Among LEGO mini-doll', x = "Color of Hair", y = "The number of characters with this hair color",caption = "——Data From Rebrickable.com")+ theme(title = element_text(face = 'bold', size = 14),plot.margin = margin(0, 0, 1, 1, "cm"), legend.text = element_text(size = 7.5))
bar_chart = bar_chart + coord_flip()
bar_chart
I then visualised the 10 most common mini-doll characters across LEGO sets together with their hair colours. They have the largest circulation and are therefore the most likely to shape players' stereotypes.
doll_hair = head(dolls,10) #Take the 10 most frequent doll/hair colour combinations from the table, which is already sorted in descending order
#Generate Visualization
#Create a bar chart
bar_chart = ggplot(doll_hair, aes(x = reorder(name, appear_frequency), y = appear_frequency, fill = `Hair color`))+ geom_bar(stat = "identity")+scale_fill_manual(values = color_pattern)+ scale_y_continuous(expand = c(0, 1), limits = c(0, 90))+ labs(title='Top 10 LEGO mini-doll characters and their hair colour', x = "Name", y = "Number of appearances with this hair colour",caption = "——Data From Rebrickable.com")+ theme(title = element_text(face = 'bold', size = 14),plot.margin = margin(0, 0, 0, 1, "cm"), legend.text = element_text(size = 7.5))
bar_chart = bar_chart + coord_flip()
bar_chart
3.Findings
As the three visualisations show, most mini-doll hair colours are ones that are common in real life. Black, bright light yellow and brown account for the hair of most mini-dolls, and only about a quarter of LEGO's mini-dolls have colourful hair. LEGO's stated position is that it will enhance the diversity of the mini-doll line's appearance to bring the LEGO world closer to the real one (Ramblingbrick, 2017). I have no quarrel with that statement, and as far as hair is concerned LEGO has done well: the proportions lean clearly towards real-life hair colours.
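A share such as "about a quarter" can be computed directly from the colour counts. The sketch below shows one way to do so on invented counts. Which colours count as "natural" is an assumption made here for illustration (the natural vector is not defined anywhere in the data), and the resulting share depends on that choice.

```r
# Invented counts in the shape of the color table
color_demo = data.frame(
  name  = c("Black", "Reddish Brown", "Bright Light Yellow", "Magenta"),
  ratio = c(6, 5, 4, 5)
)

# Assumed list of real-life hair colours; adjust to taste
natural = c("Black", "Reddish Brown", "Dark Brown", "Bright Light Yellow",
            "Tan", "Dark Orange")

# Share of each colour, then the combined share of everything not in the list
color_demo$share = color_demo$ratio / sum(color_demo$ratio)
colourful_share = sum(color_demo$share[!color_demo$name %in% natural])
colourful_share
```

Running the same steps on the real color table would give the proportion of characters with colourful hair under whatever definition of natural colours is chosen.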
Hair colour is often read as a marker of ethnicity. As a very rough generalisation, black hair can be associated with people of East Asian descent, reddish-brown hair with people of African descent and yellow (blond) hair with people of European descent, and all three colours are clearly visible in the visualisations. One view is that LEGO could do better: black is more common than yellow, which in turn is more common than reddish-brown, and this imbalance carries a potential risk of racial bias. Globally, about 80% of people are said to have black hair, yet the share of black in LEGO toys is nowhere near that large. It seems that LEGO has chosen a compromise, making black the most common hair colour without letting it dwarf the others, and so creating a LEGO world that is drawn from life yet goes beyond it. The charts also show that LEGO offers some rare hair colours, such as green and purple, which is welcome. These may help young people understand a diverse society better, since hair is also an expression of personality.
Conclusion
Since the launch of the mini-doll in 2012, the number of sets LEGO releases each year has risen steadily, which is likely due in part to the female market that the mini-doll has opened up. As the mini-doll has risen in the LEGO world, some problems have also been identified, including issues of gender and race. The analysis of mini-doll hair colours shows that their distribution broadly reproduces the composition of real-world hair colours, while also offering players a small number of novelty hair colours with which to personalise their mini-dolls.
Finally, the chart types for each dataset in this study were chosen with care. The line chart shows the change in quantity over the years, the bar chart shows how many of each colour are present, and the pie chart shows the proportion of each colour. A shortcoming is that only the three most basic chart types were used. Future studies could use more chart types, such as bubble charts, grid charts and radar charts, to improve the reader's experience.
References
Brickipedia (no date) ‘Mini-doll figure’. Available at: https://brickipedia.fandom.com/wiki/Mini-doll_figure (Accessed: May 18, 2022).
Gao, S. (2012) Manage visual development of information. Tianjin: Tianjin University.
Anon (2016) ‘Line chart’, in A Dictionary of Social Research Methods.
Gutwald, R. (2017) ‘Girl, LEGO® Friends is not your Friend! Does LEGO® Construct Gender Stereotypes?’, in LEGO® and Philosophy. [Online]. Hoboken, NJ, USA: John Wiley & Sons, Inc. pp. 103–112.
Reich, S. M. et al. (2017) Constructing Difference: Lego® Set Narratives Promote Stereotypic Gender Roles and Play. Sex roles. [Online] 79 (5-6), 285–298.
Ramblingbrick (2017). ‘Do you know who your Friends are? (The official word on the new look for LEGO Friends in 2018)’, The Rambling Brick. 7 December. Available at: https://ramblingbrick.com/2017/12/07/do-you-know-who-your-friends-are-the-official-word-on-the-new-look-for-lego-friends-in-2018/ (Accessed: May 18, 2022).