LEGO sets are buildable toys made of many interlocking pieces known as bricks, and the product line has grown commercially to include many popular themes and characters. The company was founded in 1932 by Ole Kirk Christiansen, a Danish carpenter, and the name LEGO is an abbreviation of the Danish phrase “leg godt”, meaning “play well”. LEGO offers a wide variety of themes, each designed for a particular age group or interest. Most sets cost under 100 dollars, but some larger sets cost significantly more, such as the $849.99 Millennium Falcon released in 2017. Fans often debate what drives prices most: the theme, the piece count, or the number of minifigures (including minifigures exclusive to one set). The research question we examine is “What can we learn about prices based on the price of pieces and theme of the set?” This question is relevant because it reflects how real-world analysts study business, marketing, and pricing strategies, and how they account for complicating factors such as inflation. Observational datasets like this one are used every day by company analysts who examine trends to inform policy, and by people who track and report on companies’ pricing and marketing decisions. This dataset provides a very basic representation of that kind of analysis.
The original dataset compiles 14,936 LEGO sets released from 1975
onward. We reduced it to 4,328 observations by removing sets with
missing values (mostly older sets), dropping variables we are not
interested in, and excluding sets with fewer than 50 pieces (these are
small, unusual products like keychains that skew measures such as
price). This reduced dataset is the main one we use, and it covers the
years 1991 to 2023. The information includes year released, number of
pieces, retail price, current resale price, number of minifigures
included, and more. The data originates from two large, reputable
websites that document and resell LEGO sets, Brickset and BrickLink. Our
narrowed version has 7 variables, including the key variables we
examine: Price, Pieces, Theme, and Minifigures. Price is the initial
retail price in dollars, Pieces is the number of bricks in each
recorded set, Theme is the category of the set, such as licensed media
like “Jurassic Park” or original themes like “Ninjago”, and Minifigures
is the number of small plastic figures made of LEGO bricks included in
the set.
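The cleaning steps described above can be sketched as follows. This is a minimal illustration, not the exact code we ran: `raw_sets` and `legosets_sketch` are hypothetical names, the four rows are made-up values, and we assume that the `PricePerPiece` variable used in the later figures is simply `Price` divided by `Pieces`.

```r
library(dplyr)

# Hypothetical stand-in for the raw data; these values are made up for illustration.
raw_sets <- tibble(
  Price  = c(9.99, 49.99, NA, 4.99),
  Pieces = c(100, 500, 250, 20),
  Theme  = c("City", "Star Wars", "Ninjago", "Gear")
)

legosets_sketch <- raw_sets |>
  # drop sets with a missing price and sets with fewer than 50 pieces
  filter(!is.na(Price), Pieces >= 50) |>
  # assumed definition: retail price in dollars divided by piece count
  mutate(PricePerPiece = Price / Pieces)

legosets_sketch
```

In this toy example only the first two rows survive the filter, since the third has a missing price and the fourth has fewer than 50 pieces.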
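library(tidyverse)  # dplyr, forcats, and ggplot2

legosets2 |>
filter(Pieces <= 6000) |>
# keep Star Wars and City as their own levels and pool every other theme
mutate(Theme = fct_collapse(Theme, "Star Wars" = "Star Wars", "City" = "City", other_level = "All Other")) |>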
ggplot(aes(x = Pieces, y = Price, color = Theme)) +
geom_point() +
geom_smooth(method = lm, se = FALSE, linetype = 5) +
theme_classic() +
scale_color_manual(values = c("darkorange", "mediumblue", "black")) +
labs(title = "Figure 1: Price of Themed LEGO Sets Across Number of Pieces in Sets Below 6000 Pieces", caption = "")
## `geom_smooth()` using formula = 'y ~ x'
Figure 1 shows how the price of LEGO sets changes as the number of pieces increases. As expected, there is a strong positive relationship between piece count and price. However, exploratory data analysis showed that two themes, City and Star Wars, have a higher price for each additional piece. Sets above 6000 pieces were removed because of their high influence on the fitted lines, primarily because only the Star Wars theme includes sets this large. The City theme has the steepest slope, followed by Star Wars, and then all other themes combined. Because the range of piece counts differs so much between these themes, we next look at price per piece to visualize the relationship better.
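One way to put numbers on the difference in slopes is a linear model with an interaction between piece count and theme. The sketch below is an assumption about how this could be done, not an analysis we ran: `toy` and `rate` are hypothetical, and the prices are generated from made-up per-piece rates so that the coefficients are easy to read.

```r
# Hypothetical data: four piece counts for each of the three theme groups.
toy <- data.frame(
  Pieces = rep(c(100, 200, 400, 800), times = 3),
  Theme  = factor(rep(c("All Other", "City", "Star Wars"), each = 4))
)

# Made-up dollars-per-piece rates, used only to construct the toy prices.
rate <- c("All Other" = 0.10, "City" = 0.14, "Star Wars" = 0.12)
toy$Price <- 5 + unname(rate[as.character(toy$Theme)]) * toy$Pieces

# Pieces * Theme fits a separate intercept and slope for each theme.
fit <- lm(Price ~ Pieces * Theme, data = toy)
coef(fit)
```

The `Pieces` coefficient is the slope for the reference level ("All Other"), and each `Pieces:Theme...` coefficient is how much steeper that theme's slope is than the reference. Applied to the real data, the same formula would give the three slopes drawn in Figure 1.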
library(ggridges)  # provides geom_density_ridges2()

legosets2 |>
filter(PricePerPiece <= .4) |>
mutate(Theme = fct_collapse(Theme, "Star Wars" = "Star Wars", "City" = "City", other_level = "All Other")) |>
ggplot(aes(x = PricePerPiece, color = Theme)) +
geom_density_ridges2(aes(y = Theme, fill = Theme), color = "black", show.legend = FALSE) +
theme_classic() +
scale_fill_manual(values = c("darkorange", "darkblue", "darkgrey")) +
labs(x = "Price per Piece", title = "Figure 2: Distribution of Price per Piece of Different LEGO Themes", caption = "")
## Picking joint bandwidth of 0.00728
Figure 2 shows the distribution of price per piece by theme, confirming that price per piece tends to be higher for the City and Star Wars themes than for all other themes, even though all three groups average roughly 10 to 15 cents per piece. This ridgeline plot shows that even though City sets have fewer pieces, their price per piece is higher than that of Star Wars sets. The visualizations so far pool sets across all years, however, and the change in price per piece over time may tell a more complex story.
legosets2 |>
mutate(Theme = fct_collapse(Theme, "Star Wars" = "Star Wars", "City" = "City", other_level = "All Other")) |>
group_by(Year, Theme) |>
summarize(PricePerPiece = mean(PricePerPiece)) |>
filter(PricePerPiece <= .3) |>
ggplot(aes(x = Year, y = PricePerPiece, color = Theme)) +
geom_point(size = 1) +
geom_smooth(method = lm, se = FALSE, linetype = 5, linewidth = .5) +
theme_classic() +
scale_color_manual(values = c("darkorange", "mediumblue", "black")) +
labs(y = "Price per Piece", title = "Figure 3: Average Price per Piece of Different LEGO Themes by Year", caption = "")
## `summarise()` has grouped output by 'Year'. You can override using the
## `.groups` argument.
## `geom_smooth()` using formula = 'y ~ x'
Figure 3 shows that the average price per piece of LEGO sets is fairly consistent over time, at around 13 cents per piece. However, certain themes increase over time, and exploratory data analysis found that Star Wars and City were the only two to do so. The graph uses a simple linear regression for each group to show how the average price per piece changes by theme: all themes other than Star Wars and City, taken together, have a roughly horizontal trend line from 2000 to 2023. City sets tend to have a higher average price per piece than Star Wars sets, but the two themes appear to increase at about the same rate. As of 2023, both themes are above the overall average.
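The per-theme trend lines in Figure 3 correspond to one simple regression of yearly average price per piece on year for each group. A minimal sketch of extracting those slopes is shown here, assuming a table with one row per year and theme. The `yearly` data frame and its values are hypothetical and chosen so that one group is flat and the other rises by one cent per year.

```r
# Hypothetical yearly averages; these values are made up for illustration.
yearly <- data.frame(
  Year          = rep(2000:2003, times = 2),
  Theme         = rep(c("All Other", "City"), each = 4),
  PricePerPiece = c(0.13, 0.13, 0.13, 0.13, 0.10, 0.11, 0.12, 0.13)
)

# Fit PricePerPiece ~ Year separately for each theme and keep the Year slope,
# i.e. the estimated change in dollars per piece per year.
trend_per_year <- sapply(
  split(yearly, yearly$Theme),
  function(d) coef(lm(PricePerPiece ~ Year, data = d))[["Year"]]
)

round(trend_per_year, 4)
```

Comparing these slopes across themes is a numeric counterpart to comparing the steepness of the dashed lines in the figure.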
legosets2 |>
filter(Theme == "Star Wars") |>
group_by(Year, Theme) |>
summarize(PricePerPiece = mean(PricePerPiece)) |>
ggplot(aes(x = Year, y = PricePerPiece)) +
geom_point(color = "darkblue", size = 1) +
geom_line(aes(color = Theme)) +
geom_vline(xintercept = c(2012.92, 2020.4), linetype = 2, colour = "red") +
geom_vline(xintercept = c(2015.92, 2017.92, 2019.92), linetype = 2) +
# annotate() draws each label once; geom_text(aes(...)) with constant values redraws it for every row of data
annotate("text", x = 2012.43, y = .102, label = "Disney Acquires Star Wars", angle = 90, colour = "red", size = 4.5) +
annotate("text", x = 2015.43, y = .103, label = "The Force Awakens Release", angle = 90, colour = "darkblue", size = 4.5) +
annotate("text", x = 2017.43, y = .098, label = "The Last Jedi Release", angle = 90, colour = "darkblue", size = 4.5) +
annotate("text", x = 2019.43, y = .102, label = "Rise of Skywalker Release", angle = 90, colour = "darkblue", size = 4.5) +
annotate("text", x = 2020.9, y = .097, label = "COVID-19 Pandemic", angle = 90, colour = "red", size = 4.5) +
theme_classic() +
scale_color_manual(values = c("darkblue")) +
labs(y = "Price per Piece", title = "Figure 4: Average Price per Piece of the Star Wars LEGO Theme by Year", caption = "CDC. (2024, July 8). COVID-19 Timeline. Centers for Disease Control and Prevention; CDC.
https://www.cdc.gov/museum/timeline/covid19.html
Wikipedia Contributors. (2020, April 7). List of Star Wars films. Wikipedia; Wikimedia Foundation.
https://en.wikipedia.org/wiki/List_of_Star_Wars_films")
## `summarise()` has grouped output by 'Year'. You can override using the
## `.groups` argument.
Figure 4 shows that the average price per piece of Star Wars themed LEGO sets increased from 2000 to 2023. Yearly averages range from around 8 cents to 15 cents per piece: the price started low before 2005 and rose to around 13 cents per piece by 2023. The price per piece appeared to increase at a relatively constant rate from 2005 to 2010, reaching a peak of around 14 cents. After Disney bought the Star Wars brand in 2012, we see a high amount of fluctuation, starting with a rapid increase to an all-time peak of around 15 cents in 2016, shortly after The Force Awakens, the first major Star Wars movie in a decade, was released in December 2015. The price then dropped with each subsequent release in the trilogy (The Last Jedi, The Rise of Skywalker), until a second spike that was likely related to the declaration of the COVID-19 pandemic in March 2020.
legosets2 |>
filter(Theme == "City") |>
group_by(Year, Theme) |>
summarize(PricePerPiece = mean(PricePerPiece)) |>
ggplot(aes(x = Year, y = PricePerPiece)) +
geom_point() +
geom_line() +
geom_smooth(se = FALSE, aes(color = Theme)) +
theme_classic() +
scale_color_manual(values = c("darkorange")) +
labs(y = "Price per Piece", title = "Figure 5: Average Price per Piece of the City LEGO Theme by Year", caption = "Morgan, C. (2024, August 19). Why Lego sets are so expensive. Business Insider; Insider.
https://www.businessinsider.com/lego-sets-so-expensive-star-wars-2024-8")
## `summarise()` has grouped output by 'Year'. You can override using the
## `.groups` argument.
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Figure 5 shows the change over time in the average price per piece for the City theme, including a sharp increase since 2015. The price peaked in 2022 at around 17 cents. Prior to 2015, the price fluctuated between lows of about 10 cents and highs of about 16 cents. The cause of these year-to-year fluctuations is less obvious, as these are original, non-licensed sets that do not depend on media releases. What we do know about City sets, however, is that they are marketed towards younger audiences and feature simpler builds than other themes. According to Business Insider, the younger the play level (and consequently the larger the pieces and the simpler the build), the higher the price per piece.
Our explorations have shown that numerous complex factors affect the price of LEGO sets, but some are easily quantifiable, including number of pieces and theme (specifically Star Wars and City). We find that the price per piece of LEGO has remained consistent on average from 2000 to 2023, and that only City and Star Wars sets show an increase over the years. For the City theme, the drivers are hard to diagnose but likely include the target audience and the size of the pieces. For Star Wars sets, we can infer more because the price changes line up with events affecting the Star Wars brand. After Disney bought Star Wars, the price per piece was highly volatile, decreased after each film’s release, and then spiked during the COVID-19 pandemic. This information is valuable to parents and fans deciding which themes to buy as gifts or collect, and which sets are a better value.