1. Initial Data Exploration
  1. Display general information about the dataset (number of rows, columns, data types).
  2. Perform simple data cleaning if needed (null values, duplicates, etc.).
# Preview the first 11 rows as a styled HTML table
kbl_table <- lego_sales %>%
  head(11) %>%
  kable("html") %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "hover")) %>%
  row_spec(0, background = "#e30b5d", color = "white")  # header row

# Apply the same color scheme to every column
num_cols <- ncol(lego_sales)

kbl_table <- kbl_table %>%
  column_spec(1:num_cols, background = "#ff1493", color = "white")

kbl_table
| first_name | last_name | age | phone_number | set_id | number | theme | subtheme | year | name | pieces | us_price | image_url | quantity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Kimberly | Beckstead | 24 | 216-555-2549 | 24701 | 76062 | DC Comics Super Heroes | Mighty Micros | 2018 | Robin vs. Bane | 77 | 9.99 | http://images.brickset.com/sets/images/76062-1.jpg | 1 |
| Neel | Garvin | 35 | 819-555-3189 | 25626 | 70595 | Ninjago | Rise of the Villains | 2018 | Ultra Stealth Raider | 1093 | 119.99 | http://images.brickset.com/sets/images/70595-1.jpg | 1 |
| Neel | Garvin | 35 | 819-555-3189 | 24665 | 21031 | Architecture | NA | 2018 | Burj Khalifa | 333 | 39.99 | http://images.brickset.com/sets/images/21031-1.jpg | 1 |
| Chelsea | Bouchard | 41 | NA | 24695 | 31048 | Creator | NA | 2018 | Lakeside Lodge | 368 | 29.99 | http://images.brickset.com/sets/images/31048-1.jpg | 1 |
| Chelsea | Bouchard | 41 | NA | 25626 | 70595 | Ninjago | Rise of the Villains | 2018 | Ultra Stealth Raider | 1093 | 119.99 | http://images.brickset.com/sets/images/70595-1.jpg | 1 |
| Chelsea | Bouchard | 41 | NA | 24721 | 10831 | Duplo | NA | 2018 | My First Caterpillar | 19 | 9.99 | http://images.brickset.com/sets/images/10831-1.jpg | 1 |
| Bryanna | Welsh | 19 | NA | 24797 | 75138 | Star Wars | Episode V | 2018 | Hoth Attack | 233 | 24.99 | http://images.brickset.com/sets/images/75138-1.jpg | 1 |
| Bryanna | Welsh | 19 | NA | 24701 | 76062 | DC Comics Super Heroes | Mighty Micros | 2018 | Robin vs. Bane | 77 | 9.99 | http://images.brickset.com/sets/images/76062-1.jpg | 3 |
| Caleb | Garcia-Wideman | 37 | 907-555-9236 | 24730 | 41115 | Friends | NA | 2018 | Emma’s Creative Workshop | 108 | 9.99 | http://images.brickset.com/sets/images/41115-1.jpg | 1 |
| Caleb | Garcia-Wideman | 37 | 907-555-9236 | 25611 | 21127 | Minecraft | Minifig-scale | 2018 | The Fortress | NA | 109.99 | http://images.brickset.com/sets/images/21127-1.jpg | 2 |
| Chase | Fortenberry | 19 | 205-555-3704 | 24707 | 10801 | Duplo | NA | 2018 | Baby Animals | 13 | 9.99 | http://images.brickset.com/sets/images/10801-1.jpg | 1 |
str(lego_sales)
## spc_tbl_ [620 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ first_name  : chr [1:620] "Kimberly" "Neel" "Neel" "Chelsea" ...
##  $ last_name   : chr [1:620] "Beckstead" "Garvin" "Garvin" "Bouchard" ...
##  $ age         : num [1:620] 24 35 35 41 41 41 19 19 37 37 ...
##  $ phone_number: chr [1:620] "216-555-2549" "819-555-3189" "819-555-3189" NA ...
##  $ set_id      : num [1:620] 24701 25626 24665 24695 25626 ...
##  $ number      : chr [1:620] "76062" "70595" "21031" "31048" ...
##  $ theme       : chr [1:620] "DC Comics Super Heroes" "Ninjago" "Architecture" "Creator" ...
##  $ subtheme    : chr [1:620] "Mighty Micros" "Rise of the Villains" NA NA ...
##  $ year        : num [1:620] 2018 2018 2018 2018 2018 ...
##  $ name        : chr [1:620] "Robin vs. Bane" "Ultra Stealth Raider" "Burj Khalifa" "Lakeside Lodge" ...
##  $ pieces      : num [1:620] 77 1093 333 368 1093 ...
##  $ us_price    : num [1:620] 9.99 119.99 39.99 29.99 119.99 ...
##  $ image_url   : chr [1:620] "http://images.brickset.com/sets/images/76062-1.jpg" "http://images.brickset.com/sets/images/70595-1.jpg" "http://images.brickset.com/sets/images/21031-1.jpg" "http://images.brickset.com/sets/images/31048-1.jpg" ...
##  $ quantity    : num [1:620] 1 1 1 1 1 1 1 3 1 2 ...
##  - attr(*, "spec")=List of 3
##   ..$ cols   :List of 14
##   .. ..$ first_name  : list()
##   .. .. ..- attr(*, "class")= chr [1:2] "collector_character" "collector"
##   .. ..$ last_name   : list()
##   .. .. ..- attr(*, "class")= chr [1:2] "collector_character" "collector"
##   .. ..$ age         : list()
##   .. .. ..- attr(*, "class")= chr [1:2] "collector_double" "collector"
##   .. ..$ phone_number: list()
##   .. .. ..- attr(*, "class")= chr [1:2] "collector_character" "collector"
##   .. ..$ set_id      : list()
##   .. .. ..- attr(*, "class")= chr [1:2] "collector_double" "collector"
##   .. ..$ number      : list()
##   .. .. ..- attr(*, "class")= chr [1:2] "collector_character" "collector"
##   .. ..$ theme       : list()
##   .. .. ..- attr(*, "class")= chr [1:2] "collector_character" "collector"
##   .. ..$ subtheme    : list()
##   .. .. ..- attr(*, "class")= chr [1:2] "collector_character" "collector"
##   .. ..$ year        : list()
##   .. .. ..- attr(*, "class")= chr [1:2] "collector_double" "collector"
##   .. ..$ name        : list()
##   .. .. ..- attr(*, "class")= chr [1:2] "collector_character" "collector"
##   .. ..$ pieces      : list()
##   .. .. ..- attr(*, "class")= chr [1:2] "collector_double" "collector"
##   .. ..$ us_price    : list()
##   .. .. ..- attr(*, "class")= chr [1:2] "collector_double" "collector"
##   .. ..$ image_url   : list()
##   .. .. ..- attr(*, "class")= chr [1:2] "collector_character" "collector"
##   .. ..$ quantity    : list()
##   .. .. ..- attr(*, "class")= chr [1:2] "collector_double" "collector"
##   ..$ default: list()
##   .. ..- attr(*, "class")= chr [1:2] "collector_guess" "collector"
##   ..$ skip   : num 1
##   ..- attr(*, "class")= chr "col_spec"
dim(lego_sales)
## [1] 620  14
names(lego_sales)
##  [1] "first_name"   "last_name"    "age"          "phone_number" "set_id"      
##  [6] "number"       "theme"        "subtheme"     "year"         "name"        
## [11] "pieces"       "us_price"     "image_url"    "quantity"
colSums(is.na(lego_sales))
##   first_name    last_name          age phone_number       set_id       number 
##            0            0            0           92            0            0 
##        theme     subtheme         year         name       pieces     us_price 
##            0          172            0            0           69            0 
##    image_url     quantity 
##           59            0
sum(duplicated(lego_sales))
## [1] 0
# Strict cleaning: drop every row with an NA in any column, then remove exact duplicates
lego_sales_clean <- lego_sales %>%
  tidyr::drop_na() %>%
  dplyr::distinct()

dim(lego_sales_clean)
## [1] 287  14
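Calling `drop_na()` with no arguments removes every row that has a missing value in any column, which is why the table shrinks from 620 to 287 rows. A minimal sketch of a more targeted alternative follows, using a small made-up tibble (`toy_sales` and all of its values are hypothetical and are not taken from `lego_sales`):

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: NAs are scattered across different columns
toy_sales <- tibble(
  first_name   = c("Ana", "Ben", "Cleo", "Dev"),
  phone_number = c("111-555-0001", NA, "111-555-0003", "111-555-0004"),
  subtheme     = c(NA, "City", NA, "Space"),
  pieces       = c(120, 80, NA, 45),
  quantity     = c(1, 2, 1, 3)
)

# Share of missing values per column, to decide which NAs actually matter
na_share <- colMeans(is.na(toy_sales))

# Dropping on every column keeps only fully complete rows
strict <- toy_sales %>% drop_na()

# Dropping only on the column needed for a given analysis keeps more rows
targeted <- toy_sales %>% drop_na(pieces)

nrow(strict)
nrow(targeted)
```

Whether the strict or the targeted version is appropriate depends on which columns a given analysis actually uses.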
summary(lego_sales_clean)
##   first_name         last_name              age       phone_number      
##  Length:287         Length:287         Min.   :16.0   Length:287        
##  Class :character   Class :character   1st Qu.:25.5   Class :character  
##  Mode  :character   Mode  :character   Median :33.0   Mode  :character  
##                                        Mean   :34.4                     
##                                        3rd Qu.:41.0                     
##                                        Max.   :68.0                     
##      set_id         number             theme             subtheme        
##  Min.   :24548   Length:287         Length:287         Length:287        
##  1st Qu.:24703   Class :character   Class :character   Class :character  
##  Median :24783   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :24923                                                           
##  3rd Qu.:24920                                                           
##  Max.   :25969                                                           
##       year          name               pieces          us_price     
##  Min.   :2018   Length:287         Min.   :  18.0   Min.   :  3.99  
##  1st Qu.:2018   Class :character   1st Qu.:  76.0   1st Qu.:  9.99  
##  Median :2018   Mode  :character   Median : 102.0   Median : 19.99  
##  Mean   :2018                      Mean   : 257.5   Mean   : 28.69  
##  3rd Qu.:2018                      3rd Qu.: 294.0   3rd Qu.: 29.99  
##  Max.   :2018                      Max.   :2380.0   Max.   :249.99  
##   image_url            quantity    
##  Length:287         Min.   :1.000  
##  Class :character   1st Qu.:1.000  
##  Mode  :character   Median :1.000  
##                     Mean   :1.467  
##                     3rd Qu.:2.000  
##                     Max.   :5.000

2. Required Visualizations
Create at least 5 visualizations from the following categories:
a) Top 10 customers by number of transactions

library(stringr)
## Warning: package 'stringr' was built under R version 4.3.3
# Build a customer key from name and phone number.
# Note: this overwrites the earlier lego_sales_clean and starts again from the raw data.
lego_sales_clean <- lego_sales %>%
  drop_na(first_name, last_name, phone_number) %>%  # drop rows missing any part of the customer key
  mutate(customer = str_c(first_name, last_name, phone_number, sep = "_"))

lego_sales_clean %>%
  group_by(customer) %>%
  summarise(total_transaksi = n()) %>%
  # with_ties = FALSE guarantees exactly 10 rows, matching the 10 fill colors below
  slice_max(order_by = total_transaksi, n = 10, with_ties = FALSE) %>%
  ggplot(aes(x = reorder(customer, total_transaksi), y = total_transaksi, fill = customer)) +
  geom_col(show.legend = FALSE) +
  coord_flip() +
  scale_fill_manual(values = rep(c("#ff1493", "#e30b5d", "#da3287"), length.out = 10)) +
  labs(title = "Top 10 Customers by Number of Transactions",
       x = "Customer (Name_Phone)", y = "Total Transactions") +
  theme_minimal(base_size = 13) +
  theme(
    plot.margin = margin(10, 30, 10, 10),
    axis.text.x = element_text(angle = 45, hjust = 1)
  )
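When several customers have the same number of transactions, `slice_max()` keeps ties by default and can return more than `n` rows. That matters for this chart because `scale_fill_manual()` is given exactly 10 colors. A small sketch of the behavior, using a hypothetical transaction log (`toy_tx` is made up for illustration):

```r
library(dplyr)

# Hypothetical transaction log: one row per purchase
toy_tx <- tibble(customer = c("A", "A", "A", "B", "B", "C", "C", "D"))

counts <- toy_tx %>% count(customer, name = "total_transactions")

# Default behaviour keeps ties, so asking for 2 rows can return more than 2
with_ties <- counts %>% slice_max(total_transactions, n = 2)

# with_ties = FALSE returns exactly n rows
exact <- counts %>% slice_max(total_transactions, n = 2, with_ties = FALSE)

nrow(with_ties)
nrow(exact)
```

Here customers B and C are tied for second place, so the default call returns both of them in addition to A.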

b) Top 10 Most Popular LEGO Themes by Sales
# Visualization 1: total units sold per theme
lego_sales %>%
  group_by(theme) %>%
  summarise(total_sales = sum(quantity)) %>%
  arrange(desc(total_sales)) %>%
  slice_head(n = 10) %>%
  ggplot(aes(x = reorder(theme, total_sales), y = total_sales)) +
  geom_col(fill = "#da3287") +
  coord_flip() +
  labs(title = "Top 10 Most Popular LEGO Themes", x = "Theme", y = "Total Units Sold") +
  theme_minimal()

# Visualization 2: pieces vs. price for the 5 most popular themes
top5_themes <- lego_sales %>%
  group_by(theme) %>%
  summarise(total_sales = sum(quantity)) %>%
  arrange(desc(total_sales)) %>%
  slice_head(n = 5) %>%
  pull(theme)

lego_top5 <- lego_sales %>%
  filter(theme %in% top5_themes)

manual_colors <- c("#ff1493", "#e30b5d", "#da3287", "#ff69b4", "#c71585")

ggplot(lego_top5, aes(x = pieces, y = us_price)) +
  geom_point(aes(color = theme), alpha = 0.8, size = 3) +
  scale_color_manual(values = manual_colors) +
  labs(title = "Pieces vs Price (Top 5 Themes)",
       x = "Number of Pieces", y = "Price (USD)") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "bottom")
## Warning: Removed 55 rows containing missing values or values outside the scale range
## (`geom_point()`).

c) Distribution of Piece Count and Price
ggplot(lego_sales, aes(x = pieces, y = us_price)) +
  geom_point(aes(color = theme), alpha = 0.7) +
  scale_color_viridis_d() +
  labs(title = "Pieces vs Price (Auto Color)",
       x = "Number of Pieces", y = "Price (USD)") +
  theme_minimal(base_size = 13)
## Warning: Removed 69 rows containing missing values or values outside the scale range
## (`geom_point()`).
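The 69 removed rows reported in the warning match the 69 missing `pieces` values found earlier by `colSums(is.na(lego_sales))`, so the plot silently leaves out sets with an unknown piece count. One way to make that exclusion explicit is to filter before plotting. A minimal sketch on made-up data (`toy_sets` and its values are hypothetical):

```r
library(dplyr)

# Hypothetical sets, two of them with an unknown piece count
toy_sets <- tibble(
  theme    = c("A", "A", "B", "B"),
  pieces   = c(100, NA, 250, NA),
  us_price = c(9.99, 19.99, 24.99, 109.99)
)

# Rows that a pieces-vs-price scatter plot would drop with a warning
n_dropped <- toy_sets %>%
  filter(is.na(pieces) | is.na(us_price)) %>%
  nrow()

# Filtering up front documents the exclusion and avoids the warning
plot_data <- toy_sets %>%
  filter(!is.na(pieces), !is.na(us_price))

n_dropped
nrow(plot_data)
```

Passing `plot_data` to `ggplot()` instead of the raw table would then plot the same points without the warning.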

d) LEGO Purchases by Customer Age
# Total sets bought at each customer age
lego_sales %>%
  group_by(age) %>%
  summarise(total = sum(quantity, na.rm = TRUE)) %>%
  ggplot(aes(x = age, y = total)) +
  geom_col(fill = "#da3287") +
  labs(title = "LEGO Purchases by Customer Age",
       x = "Age", y = "Total Sets Bought") +
  theme_minimal(base_size = 13)

e) Correlation Heatmap of Numeric Variables

library(corrplot)
## Warning: package 'corrplot' was built under R version 4.3.3
## corrplot 0.95 loaded
# Keep numeric columns, use complete rows only, and drop constant columns.
# A column with zero variance (for example year, if it only contains 2018)
# has no defined correlation and would otherwise produce NA cells.
numeric_data <- lego_sales %>%
  select(where(is.numeric)) %>%
  drop_na() %>%
  select(where(~ sd(.x) > 0))

cor_matrix <- cor(numeric_data)
corrplot(cor_matrix, method = "color",
         col = colorRampPalette(c("#da3287", "#e30b5d", "#ff1493"))(200),
         type = "upper", addCoef.col = "white",
         title = "Numeric Variable Correlation", mar = c(0,0,2,0))

3. Insights and Narrative

1. Visualization: 2a - Top 10 Customers by Transactions

The Top 10 Customers by Transactions chart highlights a few customers who shop far more often than the rest. They may be very loyal to LEGO, or collectors who buy as a hobby.
This is a big opportunity for LEGO to build a loyalty program, special promotions, or even a membership tier for these VIP-level customers.


2. Visualization: 2b - Top 10 LEGO Themes

In the Top 10 LEGO Themes chart, themes such as Star Wars and Nexo Knights clearly dominate sales. This suggests LEGO is not just for children but also a collectible for adults. LEGO could use this in its marketing strategy, for example by releasing limited editions aimed at adult fans who are already invested in the LEGO world, or by sending an email promo along the lines of “Your Inner Child Misses You”.


3. Visualization: 2c - Pieces vs. Price

The Pieces vs. Price plot shows that sets with many pieces are not always expensive. Conversely, some sets with few pieces carry a high price, possibly because they are limited editions or collaborations with big brands. The insight is that price is not only about piece count; brand, theme, and exclusivity matter as well. LEGO's pricing strategy appears to rest on perceived value too, not just on quantity. In other words: “it's not the number of pieces that matters, it's the vibes.”
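One way to put a number on this idea is to compare price per piece across themes. The sketch below is only an illustration on made-up data: `toy_sets`, the theme names, and the `ppp` column are hypothetical, and the same steps would need to be run on `lego_sales` to say anything about the real sets.

```r
library(dplyr)

# Hypothetical sets; the values are illustrative only
toy_sets <- tibble(
  theme    = c("Theme X", "Theme X", "Theme Y", "Theme Y"),
  pieces   = c(100, 300, 50, 200),
  us_price = c(10, 30, 20, 80)
)

price_per_piece <- toy_sets %>%
  filter(!is.na(pieces), pieces > 0) %>%      # skip sets with unknown or zero piece count
  mutate(ppp = us_price / pieces) %>%         # price per piece for each set
  group_by(theme) %>%
  summarise(median_ppp = median(ppp), n_sets = n(), .groups = "drop") %>%
  arrange(desc(median_ppp))

price_per_piece
```

A theme with a clearly higher median price per piece than the others would support the claim that price reflects more than piece count.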


4. Visualization: 2d - LEGO Purchases by Customer Age

In the LEGO Purchases by Customer Age chart, customers in their 30s are the dominant buyers. This could mean they are buying for their children, or that they are nostalgic themselves and building a personal collection. LEGO could focus its marketing on millennial parents or on adult fans who want more exclusive sets.
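A statement about "customers in their 30s" is easier to check when ages are grouped into decades instead of being read off a bar per individual age. A minimal sketch with `cut()`, on hypothetical data (`toy_buyers` is made up; on the real data the breaks would need to cover the full age range, which `summary()` reports as 16 to 68 for the cleaned table):

```r
library(dplyr)

# Hypothetical buyers; the values are illustrative only
toy_buyers <- tibble(
  age      = c(19, 24, 35, 37, 41, 33),
  quantity = c(1, 1, 2, 1, 1, 3)
)

by_age_group <- toy_buyers %>%
  mutate(age_group = cut(age,
                         breaks = c(10, 20, 30, 40, 50),
                         right  = FALSE,  # intervals are [10, 20), [20, 30), ...
                         labels = c("10-19", "20-29", "30-39", "40-49"))) %>%
  group_by(age_group) %>%
  summarise(total_sets = sum(quantity), .groups = "drop")

by_age_group
```

Plotting `total_sets` by `age_group` would then show directly which decade buys the most sets.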


5. Visualization: 2e - Correlation Heatmap

The correlation heatmap shows a strong relationship between pieces and us_price. The more pieces a set has, the higher its price tends to be, although theme and brand value also play a large role. LEGO's pricing strategy is therefore not only about quantity but also about perceived value and branding. “Value is not always about quantity, it is also about perception.”