LEGO®

Analyzing the price, brick by brick

Joel Sandbäck, Alva Rolandson, Klara Persson, Porsche Thichan

Introduction

We chose to study the “lego_sample” dataset from OpenIntro because we found it interesting and because LEGO is a subject most people are familiar with.

The dataset contains 75 LEGO sets. In this study we use the variables we found most relevant to our question: “pieces”, “price”, “unique_pieces”, “theme”, “ages” and “set_name”.

Hypothesis

The price of LEGO sets is determined by the number of pieces and the theme of the set, with more complex sets tending to be more expensive.

Method

We use data visualization to examine the “lego_sample” dataset and, with the help of the R packages tidyverse, ggplot2 and dplyr, determine whether our hypothesis is correct.


Visualization

We begin by exploring the correlations among all numeric variables in the LEGO dataset. This helps us identify relationships between price, number of pieces, weight, and other variables.

Correlation

Price and pieces

The scatter plots show the relationship between LEGO set price and the number of pieces. The first plot displays separate regression lines for each theme, while the second shows a single overall regression line for all themes.

Scatterplot with trendline for each theme

Combined trendline

Complexity

We define complexity by the number of unique pieces and the suggested age rating: sets with more unique pieces or a higher age rating are considered more complex.

The violin plot shows the distribution of unique pieces by theme, and the boxplot shows how price varies across age groups.

Violin plot

Boxplot
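A boxplot is built from the quartiles of each group; in ggplot2 the whiskers extend to the most extreme values within 1.5 × IQR of the box, and points beyond that are drawn as outliers. The sketch below computes the simpler five-number summary (minimum, quartiles, maximum) of price per age group. The age labels and prices are hypothetical, the helper names are our own, and using `method="inclusive"` to approximate R's default quantile rule is an assumption. The same grouping could summarize unique pieces per theme, although a violin plot draws a full density estimate rather than just these five values.

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical (age_group, price) pairs for illustration only; NOT lego_sample data.
rows = [
    ("4+", 10.0), ("4+", 20.0), ("4+", 30.0), ("4+", 40.0), ("4+", 50.0),
    ("8+", 20.0), ("8+", 40.0), ("8+", 60.0), ("8+", 80.0), ("8+", 100.0),
]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for one group of values."""
    # "inclusive" interpolates between observed points, including the min and max
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

def summarize_by_group(rows):
    """Group values by their label and summarize each group."""
    groups = defaultdict(list)
    for label, value in rows:
        groups[label].append(value)
    return {label: five_number_summary(vals) for label, vals in groups.items()}

if __name__ == "__main__":
    for label, summary in summarize_by_group(rows).items():
        print(label, summary)
```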

Average price per piece

This column chart shows the average price per piece for each LEGO theme, highlighting the themes with the highest cost per piece and how much pricing varies between themes.

Column chart
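The average price per piece can be computed by dividing each set's price by its piece count and averaging those ratios within each theme. The sketch below does this in Python with hypothetical themes and values, skipping sets with a missing price or zero pieces; the helper name `price_per_piece_by_theme` is our own. Whether the original chart averaged per-set ratios or divided total price by total pieces per theme is not stated, so this choice is an assumption.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (theme, pieces, price) rows for illustration only; NOT lego_sample data.
rows = [
    ("Theme A", 100, 10.0), ("Theme A", 400, 60.0),
    ("Theme B", 20, 20.0), ("Theme B", 50, 40.0),
    ("Theme C", 200, None),  # a set with a missing price
]

def price_per_piece_by_theme(rows):
    """Average the per-set price/pieces ratio within each theme, highest first."""
    ratios = defaultdict(list)
    for theme, pieces, price in rows:
        if price is None or not pieces:  # skip missing prices and zero piece counts
            continue
        ratios[theme].append(price / pieces)
    averages = {theme: mean(vals) for theme, vals in ratios.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    for theme, avg in price_per_piece_by_theme(rows):
        print(f"{theme}: {avg:.3f} per piece")
```

Averaging per-set ratios gives every set the same weight; dividing a theme's total price by its total piece count would instead let large sets dominate, and the two can differ noticeably for themes with a mix of small and large sets.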

Conclusion

Hypothesis: The price of LEGO sets is determined by the number of pieces and the theme of the set, with more complex sets tending to be more expensive.

Conclusion: We conclude that the hypothesis does not hold, since DUPLO sets cost about the same on average as sets from other themes despite their lower complexity and lower piece count.

Reflection

Overall, the project went well, but we faced some challenges, such as:

  • Figuring out which visualization to use

  • Showing the name of each set for each point in one of the interactive plots