Analyzing the price, brick by brick
We chose to study the “lego_sample” dataset from OpenIntro because we found it interesting and because LEGO is a subject most people are familiar with.
The dataset contains 75 LEGO sets. In this study we use the variables we found relevant to our question: “pieces”, “price”, “unique_pieces”, “theme”, “ages” and “set_name”.
Our hypothesis is that the price of a LEGO set is determined by its number of pieces and its theme, with more complex sets tending to be more expensive.
We use data visualization to examine the “lego_sample” dataset and, with the help of the tidyverse (in particular ggplot2 and dplyr), determine whether our hypothesis is correct.
We begin by exploring the correlation between all numeric variables in the LEGO dataset. This can help us identify relationships between price, number of pieces, weight, and other variables.
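A minimal sketch of how such a correlation overview could be computed with dplyr. The helper name `numeric_correlations` is our own illustration, not a function from any package, and the example assumes the openintro package is installed. Note that every numeric column is included, so identifier-like columns would also show up in the matrix.

```r
library(dplyr)

# Pairwise Pearson correlations between every numeric column of `df`.
# `numeric_correlations` is a hypothetical helper name used for illustration.
numeric_correlations <- function(df) {
  df %>%
    select(where(is.numeric)) %>%          # keep only numeric variables
    cor(use = "pairwise.complete.obs")     # ignore missing values pair by pair
}

# Assumes the openintro package is installed; skipped otherwise.
if (requireNamespace("openintro", quietly = TRUE)) {
  lego_cor <- numeric_correlations(openintro::lego_sample)
  # Correlation of each numeric variable with price
  print(round(lego_cor[, "price"], 2))
}
```

Values close to 1 or -1 in the "price" column point to variables that move strongly with price and are worth plotting in more detail.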
The scatter plots show the relationship between LEGO set price and the number of pieces. The first plot displays separate regression lines for each theme, while the second shows a single overall regression line for all themes.
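The two scatter plots described above could be built roughly as follows. This is a sketch with ggplot2, assuming a data frame with the columns `pieces`, `price` and `theme`; the function names are hypothetical and the regression lines are ordinary least-squares fits drawn with `geom_smooth(method = "lm")`.

```r
library(ggplot2)

# Price against piece count, with one least-squares line per theme.
# Mapping `colour = theme` at the top level makes geom_smooth fit per theme.
plot_price_by_theme <- function(df) {
  ggplot(df, aes(x = pieces, y = price, colour = theme)) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
    labs(x = "Number of pieces", y = "Price", colour = "Theme")
}

# Same points, but a single least-squares line fitted to all sets together.
# Colour is mapped only inside geom_point, so the smoother sees one group.
plot_price_overall <- function(df) {
  ggplot(df, aes(x = pieces, y = price)) +
    geom_point(aes(colour = theme)) +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "black") +
    labs(x = "Number of pieces", y = "Price", colour = "Theme")
}

# Assumes the openintro package is installed; skipped otherwise.
if (requireNamespace("openintro", quietly = TRUE)) {
  print(plot_price_by_theme(openintro::lego_sample))
  print(plot_price_overall(openintro::lego_sample))
}
```

The only difference between the two plots is where the colour aesthetic is mapped: at the top level it splits the regression by theme, while inside `geom_point` it only colours the points.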
We define complexity by the number of unique pieces and the suggested age rating: sets with a higher age rating are treated as more complex.
The violin chart shows the distribution of unique pieces by theme, and the boxplot displays price variation by age group.
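A sketch of how these two charts could be produced with ggplot2, assuming a data frame with the columns `theme`, `unique_pieces`, `ages` and `price`. The function names are hypothetical. The `ages` column is treated as a plain categorical label here, so the age groups appear in alphabetical order unless they are converted to an ordered factor first.

```r
library(ggplot2)

# Distribution of unique pieces within each theme, shown as violins.
plot_unique_pieces_by_theme <- function(df) {
  ggplot(df, aes(x = theme, y = unique_pieces, fill = theme)) +
    geom_violin() +
    labs(x = "Theme", y = "Number of unique pieces") +
    theme(legend.position = "none")  # the x axis already names the themes
}

# Price variation within each suggested age group, shown as boxplots.
plot_price_by_age <- function(df) {
  ggplot(df, aes(x = ages, y = price)) +
    geom_boxplot() +
    labs(x = "Suggested age group", y = "Price")
}

# Assumes the openintro package is installed; skipped otherwise.
if (requireNamespace("openintro", quietly = TRUE)) {
  print(plot_unique_pieces_by_theme(openintro::lego_sample))
  print(plot_price_by_age(openintro::lego_sample))
}
```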
This column chart shows the average price per piece for each LEGO theme, highlighting the themes with the highest cost per piece. The data reveals how pricing varies across different themes.
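One way to compute and plot this summary is sketched below. It assumes that "average price per piece" means the mean of each set's own price-to-pieces ratio within a theme; dividing total price by total pieces per theme would be a different, equally defensible definition. The function and column names (`price_per_piece_by_theme`, `avg_price_per_piece`) are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Mean of the per-set price/pieces ratio for each theme, highest first.
price_per_piece_by_theme <- function(df) {
  df %>%
    filter(!is.na(price), !is.na(pieces), pieces > 0) %>%  # avoid NA and division by zero
    group_by(theme) %>%
    summarise(avg_price_per_piece = mean(price / pieces), .groups = "drop") %>%
    arrange(desc(avg_price_per_piece))
}

# Column chart of the summary, with themes ordered from most to least expensive per piece.
plot_price_per_piece <- function(summary_df) {
  ggplot(summary_df,
         aes(x = reorder(theme, -avg_price_per_piece), y = avg_price_per_piece)) +
    geom_col() +
    labs(x = "Theme", y = "Average price per piece")
}

# Assumes the openintro package is installed; skipped otherwise.
if (requireNamespace("openintro", quietly = TRUE)) {
  lego_summary <- price_per_piece_by_theme(openintro::lego_sample)
  print(lego_summary)
  print(plot_price_per_piece(lego_summary))
}
```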
Hypothesis: The price of LEGO sets is determined by the number of pieces and the theme of the set, with more complex sets tending to be more expensive.
Conclusion: The hypothesis does not hold. On average, DUPLO sets cost about the same as sets from other themes, despite their lower complexity and lower piece count.
Overall, the project went well, but there were some challenges, such as:
Figuring out which visualization to use
Showing the name of each set for each point in one of the interactive plots