LEGO is a popular brand of toy building bricks. They are often sold in sets to build a specific object. Each set contains several parts in different shapes, sizes, and colors.
From the LEGO.com website, on their history:
“The name ‘LEGO’ is an abbreviation of the two Danish words “leg godt”, meaning “play well”. It’s our name and it’s our ideal. The LEGO Group was founded in 1932 by Ole Kirk Kristiansen. The company has passed from father to son and is now owned by Kjeld Kirk Kristiansen, a grandchild of the founder. It has come a long way over the past almost 90 years - from a small carpenter’s workshop to a modern, global enterprise that is now one of the world’s largest manufacturers of toys.
The LEGO brick is our most important product. We are proud to have been named “Toy of the Century” twice. Our products have undergone extensive development over the years – but the foundation remains the traditional LEGO brick. The brick in its present form was launched in 1958. The interlocking principle with its tubes makes it unique and offers unlimited building possibilities. It’s just a matter of getting the imagination going – and letting a wealth of creative ideas emerge through play.”
While exact sales numbers for sets or themes were not provided for the analysis, net profit information for 2000 through 2022 was available via the Annual Financial Statements on the LEGO website. These figures show a steep increase in profits between 2004 and 2015. This is believed to be due to the introduction of movie- and sports-related themes, and presumably also to LEGO being sold in new countries and markets. LEGO is not just for kids anymore: there are now more complex sets targeting older teens and adults, and these sets are priced higher than LEGO customers have seen in the past.
In 2022, LEGO announced that it was going to increase the price of existing sets, citing inflation in raw material and operational costs. Part of the LEGO business model has been to keep a set's price fixed for the lifetime of that set. Brickset published an article detailing the price increases for the 100 sets in North America in the first week of August 2022. In the US, the price increases were between 5% and 25%; in Canada, they ranged from 3% to 32%. In an article for BrickNerd, David Schefcik takes an in-depth economic look at why LEGO decided to increase the prices. He uses a simple regression model to show that the price increases have an almost perfect slope (0.9943) when compared with the inflation-adjusted original prices. This suggests that LEGO changed the prices to account for inflation over time, and not to take advantage of consumers during a time of inflation.
How does LEGO decide on innovative designs, sets, and themes? The design teams engage with the public directly in stores during holiday seasons, conduct interviews with children, and use web-based platforms where customers can submit ideas and designs to the designers. Design teams are stationed around the globe to better design products for particular markets: Denmark (headquarters), Japan, Germany, the UK, and Spain.
The goal of this analysis is to help current and potential investors better understand how the LEGO line has evolved, in sets and themes, over the years.
By answering the following questions, we will be able to illustrate the past development of sets and themes.
A variety of analytic methods will be used to analyze the data sets, using the techniques below. The analytic tools used will be Excel, RStudio, Tableau Prep, and Google Colab (Python).
Rebrickable created a database which holds information on which parts are included in different LEGO sets and which sets were in which themes. It was originally compiled to help people who owned LEGO sets figure out what other sets they could build with the pieces they already owned. This dataset has the LEGO parts, sets, colors, themes, and inventories of every official LEGO set in the Rebrickable database. This data set is automatically updated daily and contains 12 data files and a schema map for how to join the files together. The files used in this project are current as of February 7, 2023. Not all files will be used in this analysis project.
Statista has compiled the LEGO Group’s net profit from 2000 through 2021; the 2022 net profit was taken from the LEGO Group’s 2022 Annual Financial Report.
Exchange-Rates.org is a website that converts DKK to USD. The data in the LEGO Net Profit table has a column for the Net Profit in USD and this site was used for the conversion. As of 3/19/2023, the current conversion is 1 DKK equals 0.1433 USD.
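The conversion described above amounts to a single multiplication. The following is a minimal sketch, assuming one fixed rate (the 3/19/2023 rate quoted above) is applied to every year; the function name and the example amount are hypothetical and are not taken from the LEGO Net Profit table.

```python
# Rate quoted in the text for 3/19/2023. Applying one fixed rate to all years
# is an assumption of this sketch.
DKK_TO_USD = 0.1433


def dkk_to_usd(amount_dkk: float, rate: float = DKK_TO_USD) -> float:
    """Convert an amount in Danish kroner to US dollars at a fixed rate."""
    return amount_dkk * rate


# Hypothetical amount, for illustration only (not a figure from the LEGO reports).
example_dkk = 1_000_000_000
example_usd = dkk_to_usd(example_dkk)
print(f"{example_dkk:,} DKK is about {example_usd:,.0f} USD")
```

Because a single current rate is used, the USD column is best read as a convenience conversion rather than as historically exact dollar amounts.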
colors.csv: Name of the colors of the bricks by Color ID
inventories.csv: Assigns Set #’s to the Inventory Parts
inventory_parts.csv: Used to join the parts, inventories, and colors tables
part_categories.csv: The Category the parts are in
parts.csv: Used to join from the part categories and inventory parts tables
sets.csv: Used to join the themes and inventories table. Also, indicates the year the set was published and the name of the set
themes.csv: Name of the theme the sets are in
LEGO Net Profit.csv: The Net Profits by year in DKK and USD
Tableau Prep was used to join tables, remove columns, and clean the data. The final data set (not all the fields from all the tables were used) is clean, and there do not seem to be any erroneous data values.
General Quality Notes: For most of the project a joined data table was used. However, analysis was also performed on some of the original data tables. There do not seem to be any issues with the data in the original data files or in the post-joined data set.
Cleaning - Removed rows with color_name = “[Unknown]”, as there are only 7 such rows and they are not related to parts needed for sets. - Removed build year 2023, as the data covers only 2 months of that year.
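The two cleaning rules were applied in Tableau Prep. As a rough illustration of the same logic, here is a minimal Python sketch on made-up rows; the column names color_name and year follow the fields mentioned in this report, and the row values are invented for the example.

```python
# Toy rows standing in for the joined data set; the values are made up.
rows = [
    {"set_num": "A-1", "color_name": "Red", "year": 2021},
    {"set_num": "B-1", "color_name": "[Unknown]", "year": 2019},
    {"set_num": "C-1", "color_name": "Blue", "year": 2023},
    {"set_num": "D-1", "color_name": "White", "year": 2022},
]


def clean(rows, incomplete_year=2023):
    """Drop rows with an unknown color and rows from the partial final year."""
    return [
        r for r in rows
        if r["color_name"] != "[Unknown]" and r["year"] != incomplete_year
    ]


cleaned = clean(rows)
print([r["set_num"] for r in cleaned])  # ['A-1', 'D-1']
```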
Column Renaming – renamed fields in the base tables so the joins are cleaner, and the fields are easily identified throughout the process.
Fields Removed – fields were removed due to duplication in the joining process or if the field has been deemed not needed for the analysis.
colors table - is_trans
inventories table - version
inventory_parts table - img_url - is_spare - quantity
themes table - parent_id
sets table - img_url - num_parts
Join 1 – left join sets to themes - removed theme_id-1
Join 2 – inner join parts to part_categories - removed part_cat_id-1
Join 3 – left join inventory_parts to colors - removed color_id-1
Join 4 – inner join Join 3 to inventories - removed inventory_id-1
Join 5 – inner join Join 4 to Join 2 - removed part_num-1
Join 6 – inner join Join 5 to Join 1 - removed set_num-1, color_id, part_cat_id, inventory_id, rgb, theme_id
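The six joins above were built in Tableau Prep. The sketch below only illustrates the same join order and the left versus inner semantics in plain Python. The join helper, the toy rows, and the already-renamed key columns (theme_id, part_cat_id, color_id, inventory_id) are assumptions made for the example, not the actual Tableau Prep flow.

```python
def join(left, right, key, how="inner"):
    """Join two lists of dict rows on a shared key column.

    how="inner" keeps only left rows with a match; how="left" keeps every
    left row and fills the right-hand columns with None when nothing matches.
    """
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    right_cols = [c for c in (right[0] if right else {}) if c != key]
    out = []
    for l in left:
        matches = index.get(l[key], [])
        if matches:
            out.extend({**l, **m} for m in matches)
        elif how == "left":
            out.append({**l, **{c: None for c in right_cols}})
    return out


# Toy tables (made-up values) using the renamed field names from this report.
sets = [{"set_num": "S1", "sets_name": "Toy Set One", "year": 2020, "theme_id": 1},
        {"set_num": "S2", "sets_name": "Toy Set Two", "year": 2021, "theme_id": 99}]
themes = [{"theme_id": 1, "themes_name": "Toy Theme"}]
parts = [{"part_num": "P1", "parts_name": "Toy Brick", "part_cat_id": 11}]
part_categories = [{"part_cat_id": 11, "pc_name": "Bricks"}]
inventory_parts = [{"inventory_id": 10, "part_num": "P1", "color_id": 4},
                   {"inventory_id": 10, "part_num": "P1", "color_id": 5}]
colors = [{"color_id": 4, "color_name": "Red"}]
inventories = [{"inventory_id": 10, "set_num": "S1"}]

join1 = join(sets, themes, "theme_id", how="left")           # sets + themes
join2 = join(parts, part_categories, "part_cat_id")          # parts + categories
join3 = join(inventory_parts, colors, "color_id", how="left")
join4 = join(join3, inventories, "inventory_id")
join5 = join(join4, join2, "part_num")
join6 = join(join5, join1, "set_num")

# Drop the key columns that are no longer needed after joining.
drop = {"color_id", "part_cat_id", "inventory_id", "theme_id"}
joined = [{k: v for k, v in r.items() if k not in drop} for r in join6]
print(len(joined), "rows:", [r["color_name"] for r in joined])
```

In this toy data, set S2 survives the left join in Join 1 with an empty theme name, but it is dropped by the final inner join because it has no inventory. The real data behaves the same way for any set without inventory rows.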
In RStudio the following packages were installed:
In RStudio the following libraries were used:
In Python, via Google Colab, the following items were imported:
colors <- read.csv("C:/Users/justt/Desktop/School/624/Project/Orig Data/colors.csv")
colnames(colors)[2] = "color_name"
sets <- read.csv("C:/Users/justt/Desktop/School/624/Project/Orig Data/sets.csv")
colnames(sets)[2] = "sets_name"
parts <- read.csv("C:/Users/justt/Desktop/School/624/Project/Orig Data/parts.csv")
colnames(parts)[2] = "parts_name"
inventory_sets <- read.csv("C:/Users/justt/Desktop/School/624/Project/Orig Data/inventory_sets.csv")
inventory <- read.csv("C:/Users/justt/Desktop/School/624/Project/Orig Data/inventories.csv")
inventory_parts <- read.csv("C:/Users/justt/Desktop/School/624/Project/Orig Data/inventory_parts.csv")
themes <- read.csv("C:/Users/justt/Desktop/School/624/Project/Orig Data/themes.csv")
colnames(themes)[1] = "theme_id"
colnames(themes)[2] = "themes_name"
part_category <- read.csv("C:/Users/justt/Desktop/School/624/Project/Orig Data/part_categories.csv")
colnames(part_category)[2] = "pc_name"
joined_data <- read.csv("C:/Users/justt/Desktop/School/624/Project/Lego Tableau Prep v3.csv")
LEGO has used a number of assorted colors in their pieces over the past 70 years. In the color data set, there are 217 colors used for pieces.
num_colors <- length(unique(colors$color_name))
colors <- colors %>% mutate(rgb = paste0("#",str_trim(rgb)))
my_color <- colors$rgb
names(my_color) <- my_color
Looking at the colors used in the first 40 years of piece creation, 532 different colors were used.
options(warn=-1)
options(repr.plot.width = 10, repr.plot.height = 5)
set_color <- sets %>%
left_join(inventory, by = "set_num") %>%
left_join(inventory_parts, by = c("id" = "inventory_id")) %>%
left_join(colors, by = c("color_id" = "id"))
this_year <- format(Sys.Date(), "%Y")
cyear <- as.numeric(this_year)
scolor <- set_color %>% select(color_name,rgb,year) %>% mutate(ys = cyear-year)
color_years <- scolor %>% select(color_name,ys,year,rgb) %>% group_by(color_name,rgb) %>% summarise(mx_year = max(ys),mn_year = min(year)) %>% arrange(desc(mx_year)) %>% filter(!is.na(rgb)) %>% filter(mx_year >= 40 & !color_name == "[No Color]")
label_data <- data.frame(id = seq(1:nrow(color_years)),
lbl = paste(color_years$color_name,color_years$mx_year," Years"),
value = color_years$mx_year)
no_bars <- nrow(label_data)
cl <- color_years %>% filter(!is.na(rgb)) %>% filter(!color_name == "[No Color]") %>% select(rgb)
cl <- cl$rgb
names(cl) <- cl
angle = 90-360*(label_data$id-0.5) / no_bars
label_data$hjust <- ifelse( angle < -90, 1, 0)
label_data$angle <- ifelse(angle < -90, angle + 180, angle)
ggplot(color_years,aes(x = color_name,y = mx_year,fill = color_name)) + geom_bar(stat = "identity")+
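# Name each hex code by its color so every bar is filled with its own LEGO color
scale_fill_manual(values = setNames(color_years$rgb, color_years$color_name)) + ylim(-60,90) + coord_polar(start=0) +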
theme(
axis.text = element_blank(),
axis.title = element_blank(),
panel.grid = element_blank(),
legend.position = "none",legend.text = element_text(size = 7) )+
geom_text(data = label_data, aes(x = id, y = value+10, label = lbl, hjust=hjust), color="blue", fontface = "bold",alpha = 0.6, size = 3, angle = label_data$angle, inherit.aes = FALSE) +
labs(title = "LEGO Colors the first 40 Years")
Here is a look at the colors used in the past 40 years of piece creation. The chart looks like a jumbled mess because 2,190 colors are used in new set creations. There is quite a difference between the number of colors used in the past 40 years (2,190) and the first 40 years (532).
color_years_recent <- scolor %>% select(color_name,ys,year,rgb) %>% group_by(color_name,rgb) %>% summarise(mx_year = max(ys),mn_year = min(year)) %>% arrange(desc(mx_year)) %>% filter(!is.na(rgb)) %>% filter(mx_year < 40 & !color_name == "[No Color]")
label_data1 <- data.frame(id = seq(1:nrow(color_years_recent)),
lbl = paste(color_years_recent$color_name,color_years_recent$mx_year," Years"),
value = color_years_recent$mx_year)
no_bars <- nrow(label_data1)
cl1 <- color_years_recent %>% filter(!is.na(rgb)) %>% filter(!color_name == "[No Color]") %>% select(rgb)
cl1 <- cl1$rgb
names(cl1) <- cl1
options(repr.plot.width = 10, repr.plot.height = 5)
angle = 90 - 360 * (label_data1$id - 0.5) / no_bars
label_data1$hjust <- ifelse(angle < -90, 1, 0)
label_data1$angle <- ifelse(angle < -90, angle + 180, angle)
ggplot(color_years_recent,aes(x = color_name,y = mx_year,fill = color_name)) + geom_bar(stat = "identity") +
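# Name each hex code by its color so every bar is filled with its own LEGO color
scale_fill_manual(values = setNames(color_years_recent$rgb, color_years_recent$color_name)) + ylim(-40,60) + coord_polar(start = 0) +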
theme(
axis.text = element_blank(),
axis.title = element_blank(),
panel.grid = element_blank(),
legend.position = "none",legend.text = element_text(size = 7) ) +
geom_text(data = label_data1, aes(x = id, y = value + 10, label = lbl, hjust = hjust), color = "blue", fontface = "bold",alpha = 0.6, size = 3, angle = label_data1$angle, inherit.aes = FALSE) +
labs(title="LEGO Colors the Past 40 Years")
Now, here is a look at the colors used in just the past 20 years of set creation. There are 1,459 colors used in just the past 20 years, which is almost 3 times the number of colors used in the first 40 years.
color_years_recent <- scolor %>% select(color_name,ys,year,rgb) %>% group_by(color_name,rgb) %>% summarise(mx_year = max(ys),mn_year = min(year)) %>% arrange(desc(mx_year)) %>% filter(!is.na(rgb)) %>% filter(mx_year < 20 & !color_name == "[No Color]")
label_data1 <- data.frame(id = seq(1:nrow(color_years_recent)),
lbl = paste(color_years_recent$color_name,color_years_recent$mx_year," Years"),
value = color_years_recent$mx_year)
no_bars <- nrow(label_data1)
cl1 <- color_years_recent %>% filter(!is.na(rgb)) %>% filter(!color_name == "[No Color]") %>% select(rgb)
cl1 <- cl1$rgb
names(cl1) <- cl1
options(repr.plot.width = 10, repr.plot.height = 5)
angle = 90 - 360 * (label_data1$id - 0.5) / no_bars
label_data1$hjust <- ifelse(angle < -90, 1, 0)
label_data1$angle <- ifelse(angle < -90, angle + 180, angle)
ggplot(color_years_recent,aes(x = color_name,y = mx_year,fill = color_name)) + geom_bar(stat = "identity") +
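# Name each hex code by its color so every bar is filled with its own LEGO color
scale_fill_manual(values = setNames(color_years_recent$rgb, color_years_recent$color_name)) + ylim(-40,60) + coord_polar(start = 0) +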
theme(
axis.text = element_blank(),
axis.title = element_blank(),
panel.grid = element_blank(),
legend.position = "none",legend.text = element_text(size = 7) ) +
geom_text(data = label_data1, aes(x = id, y = value + 10, label = lbl, hjust = hjust), color = "blue", fontface = "bold",alpha = 0.6, size = 3, angle = label_data1$angle, inherit.aes = FALSE) +
labs(title="LEGO Colors the Past 20 Years")
In the parts data set, there are 66 unique part categories and 32,563 unique part pieces.
LEGO started making parts in 1949, when 8 parts were created. In 2022, 4,112 parts were created.
LEGO has had steady growth in the number of pieces created. There have been a few notable spikes: one in the mid-1960s, and a nice bump in the early 2000s that lasted a few years. There is then a sharp increase around 2014, which will be discussed further in the presentation.
It’s interesting that the LEGO company produces so much more these days than just bricks. Some of these parts belong to a set, such as stickers for a house set, but many are stand-alone parts. The company creates minifigs, accessories, stickers, and trading cards.
This shows the top ten part categories per set by the number of unique parts used in the sets. It does not represent the total number of pieces used in the sets. The two largest categories are Plates and Plates Special, with over 8,000 sets using parts from these categories.
In the sets data set, there are 16,155 unique sets.
As shown in this chart, the sets with the greatest number of unique parts are the Basic Building Set and Police Station, with right at, or just over, 3,000 parts per set.
This graph shows that the number of unique parts created has been increasing at pretty much the same rate as the number of unique sets over the years. There is a noticeable spike in the 1960s, when far more new parts than new sets were created.
While the number of pieces (total, not unique) per set has increased year over year, most sets still have fewer than 1,500 pieces. There are a few sets with over 6,000 pieces, and the maximum is just under 12,000 pieces. Based on a simple linear regression model, over the last 74 years LEGO sets have gotten larger by just over three pieces a year (3.04).
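The slope quoted above comes from a simple linear regression of piece count on year. The following is a minimal sketch of how such a slope is computed with ordinary least squares. It assumes one observation per set (release year, number of pieces), which the report does not state explicitly, and the data points are synthetic, chosen so the result is easy to verify by hand.

```python
def ols_fit(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic (year, pieces) pairs, made up for illustration only.
years = [2000, 2001, 2002, 2003, 2004]
pieces = [100, 103, 106, 109, 112]

slope, intercept = ols_fit(years, pieces)
print(f"Sets grow by about {slope:.2f} pieces per year")
```

The slope is the average change in pieces for each additional year, so a value of 3.04 on the real data would read as "about three more pieces per set each year".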
The yellow line shows that the mean number of pieces has not really changed over time. Similarly, the green line shows that the median number of pieces has not really changed either.
This graph shows the top 10 sets with the greatest number of pieces in the set. The World Map and the Eiffel Tower are the two sets with the greatest number of parts. And as expected, there are a couple of Star Wars related sets on here.
names_with_most_parts <- sets %>% select(sets_name, year, num_parts) %>% arrange(desc(num_parts))
top_10_names <- names_with_most_parts[1:10,]
ggplot(top_10_names, aes(x = reorder(sets_name,-num_parts), y = num_parts, fill = sets_name)) +
geom_bar(stat = "identity") +
theme_fivethirtyeight() +
theme(axis.text.x = element_text(angle = 75, face = "bold", hjust = 1), legend.position = "none") +
geom_label(aes(label = num_parts)) +
ggtitle("Top Ten Sets with Most Pieces")
The chart below shows that LEGO created fewer than 100 different sets per year during its first 15 years of operation. For the first 45 years or so, LEGO had steady growth in its product offering. It was really in the mid-1990s that the number of sets created by the company increased dramatically. There was also a brief decline in the early 2000s and a peak in 2021.
Around 2014, there was significant growth in the number of sets created. This is mostly due to the release of the first LEGO movie, which began a resurgence of kids and adults wanting new LEGO sets in new types of themes.
Understanding LEGO Themes vs. LEGO Sets
A LEGO theme is a product line that groups related sets under a common subject, while a LEGO set is a particular boxed LEGO product. Therefore, a single theme typically has many different sets.
There are 368 unique themes that have been created throughout the years.
LEGO had only 2 themes, Supplemental and System, during the first few years in which new sets were created. However, just like the number of sets, the number of themes expanded over the years.
This shows that the number of sets and themes created has been on an upward trend except around 2008, when numbers were lower. Could the dip in 2008 be related to a lack of creativity, or to a reassessment of the company and a new direction? Was the company ramping up for the new movie that came out in 2014?
LEGO has licensed many hit franchises, from Star Wars to Super Mario, to create sets and themes based on them. A few themes, like Star Wars, Technic, or Duplo, are what many people think of when they think of LEGO.
This graph shows the LEGO themes with the greatest number of pieces. Not too surprisingly, Technic and Star Wars are at the top of the list. These themes have over 100,000 more pieces than the third-place theme, Friends (not the TV series).
options(repr.plot.width = 10, repr.plot.height = 4)
themes <- themes[,-3]
set_themes <- themes %>% left_join(sets,by = "theme_id")
# Total pieces per theme; na.rm = TRUE because themes without sets get NA num_parts from the left join
set_themes %>%
group_by(themes_name) %>%
summarise(set_part_cnt = sum(num_parts, na.rm = TRUE)) %>%
arrange(desc(set_part_cnt)) %>%
head(25) %>%
ggplot(aes(x = reorder(themes_name,set_part_cnt),y = set_part_cnt,fill = themes_name,group= 1)) +
geom_line(stat = "identity") + geom_point() +
scale_color_manual(values = names(my_color)) +
geom_label_repel(aes(label = themes_name),size = 2) +
theme(legend.position = "none",axis.text.x = element_blank()) +
labs(title = "LEGO Sets with Maximum Pieces Per Theme - Top 25",x = "Theme", y = "# of Pieces")
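For readers following along in Python rather than R, the ranking behind this chart is a group-and-sum over themes. This is a minimal sketch of that aggregation in plain Python, assuming rows that carry a themes_name and a num_parts field. The rows are made up, and the function name is hypothetical.

```python
from collections import defaultdict

# Made-up rows: one per set, with its theme and piece count.
# A theme with no sets appears with num_parts = None, as after a left join.
set_rows = [
    {"themes_name": "Theme A", "num_parts": 120},
    {"themes_name": "Theme A", "num_parts": 80},
    {"themes_name": "Theme B", "num_parts": 500},
    {"themes_name": "Theme C", "num_parts": None},
]


def top_themes_by_pieces(rows, n=25):
    """Sum pieces per theme (missing counts treated as 0) and return the top n."""
    totals = defaultdict(int)
    for r in rows:
        totals[r["themes_name"]] += r["num_parts"] or 0
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


top2 = top_themes_by_pieces(set_rows, n=2)
print(top2)  # [('Theme B', 500), ('Theme A', 200)]
```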
Here is a list of themes with the fewest sets. These should be reviewed to determine whether they are new themes in the process of being built out, or whether there just weren’t enough sales for them to be built out. This cannot be determined with this data set, but it could be a good idea for future analysis.
The goal of this analysis was to help current and potential investors better understand how the LEGO line has evolved, in sets and themes, over the years.
Below are the original questions for this analysis.
A variety of analytic methods were used to analyze the data sets, using the techniques below. The analytic tools used were Excel, RStudio, Tableau Prep, and Google Colab (Python).
The analysis was limited to historical fact analysis rather than the predictive or association analysis that was planned. The lack of sales information on how many sets were sold, and when, limited the ability to present actionable information.
Exploring LEGO’s color data set, we can see that the number of unique colors used in the past 20 years is almost 3 times the number used in the first 40 years of set creation. This increase in color variations helps the LEGO company create more creative, diverse, and colorful sets. Not much can really be gleaned from this data other than that more color variations have been used in sets in the past 20 years. Plus, the visuals are pretty cool.
Exploring LEGO’s parts data set shows that LEGO has had steady growth in the number of pieces used in the sets created. There was a notable spike in 1963, then another one starting in 1996, when over 10,000 pieces were being used. We then see a sharp increase around 2014, which will be discussed further in the presentation, where the count exceeds 40,000 pieces.
The Duplo, Quatro, and Primo part categories have the highest number of parts, while Plates and Plates Special are the categories with the greatest number of sets using their pieces.
Exploring LEGO’s sets data shows that the number of sets created and the number of parts used in sets each year have been mostly on a similar upward trend. Again, there is that outlier spike in the number of parts created in 1963, which is when LEGO started including accessories and stickers for sets. The sets with the greatest number of unique parts are the Basic Building Set and Police Station, with right at, or just over, 3,000 parts per set. The sets with the greatest number of pieces are the World Map, the Eiffel Tower, and a couple from movie franchises like Star Wars and Harry Potter.
In 2014, we saw an increase in the number of parts used and the number of sets created. This increase is related to the release of the LEGO movie. Using other movie franchises is good, but nothing compares to LEGO having its own franchise to help make the brand relevant to kids and adults alike.
In exploring the themes data set, we can see growth trends similar to those for sets and parts. Around 2008, there was a dip in the number of themes used in the sets created. It would be interesting to get further insight into what caused this lag. Was it related to a lack of creativity, or to a reassessment of the company and a new direction? We are not able to determine this with the provided data.
It is also shown that the themes with the greatest number of sets created, and the greatest number of pieces used, were Technic and Star Wars. Technic sets provide an advanced and complex building experience based on real-life vehicles big and small, like sports cars, motorcycles, and construction vehicles. The Star Wars theme continues its growth with the newer movies that have come out and with the Mandalorian series.
Finally, information was provided about the themes with the fewest sets. It is possible that these are new themes that are ramping up, or maybe there just weren’t enough sales for them to be built out. This cannot be determined with this data set, but it could be a good idea for future analysis.
The most that can be determined with the provided data is that in 1963, more parts started being included in the new sets created. In 2014, thanks to the release of a LEGO-owned franchise movie, the number of sets created, the number of parts used in the sets, and the number of sets in themes all increased.
LEGO has proven itself a dominating force in children’s, and now adults’, toys. With its expansion into various franchises and into educational and developmental sets, it is proving that it is still as relevant today as it was when it created its first blocks.
I do wonder, though, why LEGO hasn’t branched into some of the sports league themes. Has it done so, with the data so middle of the road that it did not show up in the min and max stats we provided? Or will the sports leagues not let LEGO into their market?