1. Introduction

This project explores data on a perennial favorite toy, LEGO. We analyze a dataset that aims to cover every LEGO block ever produced, starting with its colors and sets.

import pandas as pd

colors = pd.read_csv('https://raw.githubusercontent.com/indianspice/Lego/master/Data/colors.csv')
colors.head()
# How many distinct colors are available?
num_colors = colors['name'].nunique()
print(num_colors)
177

3. Transparent Colors in Lego Sets

The colors data has a column named is_trans that indicates whether a color is transparent or not. It would be interesting to explore the distribution of transparent vs. non-transparent colors.

colors_summary = colors.groupby('is_trans').count()
colors_summary
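
Because `count()` reports a non-null count for every remaining column, the split can be easier to read as a single tally or as a share of all colors. The following is a minimal sketch, not the notebook's own method. It assumes `is_trans` holds the flags `'t'` and `'f'`, and it uses a small made-up frame, `sample_colors`, as a hypothetical stand-in for `colors`:

```python
import pandas as pd

# Hypothetical stand-in for `colors`; rows are invented for illustration.
# Assumption: `is_trans` uses 't' for transparent and 'f' for opaque.
sample_colors = pd.DataFrame({
    'name': ['Black', 'Blue', 'Trans-Clear', 'Trans-Red', 'White'],
    'is_trans': ['f', 'f', 't', 't', 'f'],
})

# Tally the number of colors per transparency flag
trans_counts = sample_colors['is_trans'].value_counts()

# Express the same split as a fraction of all colors
trans_share = sample_colors['is_trans'].value_counts(normalize=True).round(2)

print(trans_counts)
print(trans_share)
```

Swapping `sample_colors` for `colors` should give the same kind of summary on the real data, provided the column is encoded as assumed.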

4. Explore Lego Sets

%matplotlib inline

sets = pd.read_csv('https://raw.githubusercontent.com/indianspice/Lego/master/Data/sets.csv')

# Create a summary of average number of parts by year: `parts_by_year`
parts_by_year = sets.groupby('year', as_index=False)['num_parts'].mean().round(2)


# Plot trends in average number of parts by year
import matplotlib.pyplot as plt
plt.plot('year', 'num_parts', data=parts_by_year)
plt.xlabel('Year')
plt.ylabel('Average Number of Parts')
plt.title('Average Lego Parts by Year')
plt.show()


5. Lego Themes Over Years

# themes_by_year: number of distinct themes released per year
themes_by_year = sets.groupby('year')['theme_id'].nunique()
themes_by_year.head()
year
1950    2
1953    1
1954    2
1955    4
1956    3
Name: theme_id, dtype: int64
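
A natural follow-up to a per-year series like this is to ask which year had the most distinct themes. The sketch below shows one way to do that with `idxmax()`. The frame `sample_sets` and its values are hypothetical, invented only so the example runs without downloading the data:

```python
import pandas as pd

# Hypothetical stand-in for `sets`; years and theme ids are invented.
sample_sets = pd.DataFrame({
    'year': [1950, 1950, 1953, 1954, 1954, 1954, 1954],
    'theme_id': [371, 366, 371, 371, 372, 373, 373],
})

# Distinct themes per year; repeated theme ids within a year count once
sample_themes_by_year = sample_sets.groupby('year')['theme_id'].nunique()

# Year with the largest number of distinct themes, and that number
peak_year = sample_themes_by_year.idxmax()
peak_count = sample_themes_by_year.max()

print(peak_year, peak_count)
```

Note that `idxmax()` returns the first year when several years tie for the maximum, so ties would need separate handling.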