Email: nikitaindriyni@gmail.com
RPubs: https://rpubs.com/nikitaindriyani/
Packages attached in this session: ggplot2, plotly, zoo, lubridate, glue, gridExtra, dplyr.
Suppose you are the manager of a Data Science project and want to analyze customer behavior using transaction data from an online retail company in the UK (the dataset covers the period from 01/12/2010 to 09/12/2011). Many of the company's customers are known to be wholesalers. In addition, the following points about the data are important to note:
| Variable | Description |
|---|---|
| invoice_no | Invoice number: a 6-digit number unique to each transaction. A leading letter C marks the transaction as cancelled. |
| stock_code | Product code: a 5-digit number unique to each product (see the product name description). |
| quantity | Number of units purchased |
| invoice_date | Transaction date and time |
| unit_price | Price per unit of the product |
| customer_id | Customer ID: a 5-digit number unique to each customer. |
| country | Customer's country |
You can download the data used in this case from Google Classroom, or click Retail.xlsx and Retail.rds.
Import both files into RStudio using the method appropriate to each file type (which import process do you think is better?).
## # A tibble: 6 x 8
## InvoiceNo StockCode Description Quantity InvoiceDate UnitPrice
## <chr> <chr> <chr> <dbl> <dttm> <dbl>
## 1 536365 85123A WHITE HANG~ 6 2010-12-01 08:26:00 2.55
## 2 536365 71053 WHITE META~ 6 2010-12-01 08:26:00 3.39
## 3 536365 84406B CREAM CUPI~ 8 2010-12-01 08:26:00 2.75
## 4 536365 84029G KNITTED UN~ 6 2010-12-01 08:26:00 3.39
## 5 536365 84029E RED WOOLLY~ 6 2010-12-01 08:26:00 3.39
## 6 536365 22752 SET 7 BABU~ 2 2010-12-01 08:26:00 7.65
## # ... with 2 more variables: CustomerID <dbl>, Country <chr>
## # A tibble: 6 x 8
## InvoiceNo StockCode Description Quantity InvoiceDate UnitPrice
## <chr> <chr> <chr> <dbl> <dttm> <dbl>
## 1 536365 85123A WHITE HANG~ 6 2010-12-01 08:26:00 2.55
## 2 536365 71053 WHITE META~ 6 2010-12-01 08:26:00 3.39
## 3 536365 84406B CREAM CUPI~ 8 2010-12-01 08:26:00 2.75
## 4 536365 84029G KNITTED UN~ 6 2010-12-01 08:26:00 3.39
## 5 536365 84029E RED WOOLLY~ 6 2010-12-01 08:26:00 3.39
## 6 536365 22752 SET 7 BABU~ 2 2010-12-01 08:26:00 7.65
## # ... with 2 more variables: CustomerID <dbl>, Country <chr>
Your argument: I prefer the .rds file because it is easier to use: it loads with a single readRDS() call and restores the column types exactly as they were saved.
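To make the argument for .rds concrete, here is a minimal base R sketch on made-up toy data. It uses a CSV file as a stand-in for a non-native format (an assumption made only so the example runs without extra packages; the original exercise compares .xlsx with .rds). The point is that an .rds round trip returns an identical object, while a text format needs its column types restored by hand.

```r
# Toy data with a date-time column (values are made up for illustration).
toy <- data.frame(
  InvoiceNo   = c("536365", "C536379"),
  InvoiceDate = as.POSIXct(c("2010-12-01 08:26:00", "2010-12-01 09:41:00"),
                           tz = "UTC"),
  UnitPrice   = c(2.55, 27.50),
  stringsAsFactors = FALSE
)

rds_path <- tempfile(fileext = ".rds")
csv_path <- tempfile(fileext = ".csv")

saveRDS(toy, rds_path)
write.csv(toy, csv_path, row.names = FALSE)

from_rds <- readRDS(rds_path)
from_csv <- read.csv(csv_path, stringsAsFactors = FALSE)

class(from_rds$InvoiceDate)  # the POSIXct class survives the round trip
class(from_csv$InvoiceDate)  # read back as plain character
identical(toy, from_rds)     # the .rds copy matches the original object
```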
Rename the variables so that they are easier for readers to understand.
# Assign the new names positionally: the vector must list all 8 columns in order
names(data_input_2) <- c(
  "Invoice_No",
  "Kode_Stock",
  "Deskripsi",
  "Kuantitas",
  "Tanggal_Invoice",
  "Harga_per_Barang",
  "ID_Customer",
  "Negara"
)
data_input_2
## # A tibble: 541,909 x 8
## Invoice_No Kode_Stock Deskripsi Kuantitas Tanggal_Invoice
## <chr> <chr> <chr> <dbl> <dttm>
## 1 536365 85123A WHITE HA~ 6 2010-12-01 08:26:00
## 2 536365 71053 WHITE ME~ 6 2010-12-01 08:26:00
## 3 536365 84406B CREAM CU~ 8 2010-12-01 08:26:00
## 4 536365 84029G KNITTED ~ 6 2010-12-01 08:26:00
## 5 536365 84029E RED WOOL~ 6 2010-12-01 08:26:00
## 6 536365 22752 SET 7 BA~ 2 2010-12-01 08:26:00
## 7 536365 21730 GLASS ST~ 6 2010-12-01 08:26:00
## 8 536366 22633 HAND WAR~ 6 2010-12-01 08:28:00
## 9 536366 22632 HAND WAR~ 6 2010-12-01 08:28:00
## 10 536367 84879 ASSORTED~ 32 2010-12-01 08:34:00
## # ... with 541,899 more rows, and 3 more variables: Harga_per_Barang <dbl>,
## # ID_Customer <dbl>, Negara <chr>
Your argument: I find the data easier to read when all column names are in Indonesian.
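Positional renaming breaks silently if the column order ever changes. As a hedged alternative, the sketch below renames by name through a lookup vector, in base R, on a one-row toy data frame. The name `rename_map` is hypothetical and not part of the original analysis.

```r
toy <- data.frame(InvoiceNo = "536365", StockCode = "85123A", Quantity = 6,
                  stringsAsFactors = FALSE)

# old name = new name; only the columns listed here are renamed
rename_map <- c(InvoiceNo = "Invoice_No",
                StockCode = "Kode_Stock",
                Quantity  = "Kuantitas")

# Fail early if a column we expect to rename is missing
stopifnot(all(names(rename_map) %in% names(toy)))

names(toy)[match(names(rename_map), names(toy))] <- unname(rename_map)
names(toy)
```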
Inspect the structure of the data and change column types where needed.
## Rows: 541,909
## Columns: 8
## $ InvoiceNo <chr> "536365", "536365", "536365", "536365", "536365", "5363...
## $ StockCode <chr> "85123A", "71053", "84406B", "84029G", "84029E", "22752...
## $ Description <chr> "WHITE HANGING HEART T-LIGHT HOLDER", "WHITE METAL LANT...
## $ Quantity <dbl> 6, 6, 8, 6, 6, 2, 6, 6, 6, 32, 6, 6, 8, 6, 6, 3, 2, 3, ...
## $ InvoiceDate <dttm> 2010-12-01 08:26:00, 2010-12-01 08:26:00, 2010-12-01 0...
## $ UnitPrice <dbl> 2.55, 3.39, 2.75, 3.39, 3.39, 7.65, 4.25, 1.85, 1.85, 1...
## $ CustomerID <dbl> 17850, 17850, 17850, 17850, 17850, 17850, 17850, 17850,...
## $ Country <chr> "United Kingdom", "United Kingdom", "United Kingdom", "...
data.frame(
invoice_Nomor = as.integer(data_input_1$InvoiceNo %>% unique() %>% length()),
stock_code_Barang = data_input_1$Kode_Stock %>% unique() %>% length(),
description_Barang = data_input_1$Deskripsi %>% unique() %>% length(),
Nama_Negara = data_input_1$Negara %>% unique() %>% length(),
customer = data_input_1$ID_Customer %>% unique() %>% length()
)
## Warning: Unknown or uninitialised column: `Kode_Stock`.
## Warning: Unknown or uninitialised column: `Deskripsi`.
## Warning: Unknown or uninitialised column: `Negara`.
## Warning: Unknown or uninitialised column: `ID_Customer`.
## invoice_Nomor stock_code_Barang description_Barang Nama_Negara customer
## 1 25900 0 0 0 0
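Your argument: This check lets us see the structure of the data clearly. Note that the zero counts above are not real: as the warnings show, `Kode_Stock`, `Deskripsi`, `Negara`, and `ID_Customer` exist only in the renamed `data_input_2`, while `data_input_1` still uses `StockCode`, `Description`, `Country`, and `CustomerID`.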
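A minimal sketch of the same count using the column names that `data_input_1` actually has, shown in base R on a made-up toy data frame. Selecting columns by name with `[` fails loudly on a misspelled name, unlike `$` on a tibble, which only warns.

```r
toy <- data.frame(
  InvoiceNo  = c("536365", "536365", "536366", "C536379"),
  StockCode  = c("85123A", "71053", "22633", "22633"),
  CustomerID = c(17850, 17850, 17850, 14527),
  Country    = rep("United Kingdom", 4),
  stringsAsFactors = FALSE
)

cols <- c("InvoiceNo", "StockCode", "CustomerID", "Country")

# Number of distinct values in each selected column
unique_counts <- sapply(toy[cols], function(col) length(unique(col)))
unique_counts
```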
Data cleaning, also called data scrubbing, is the process of analyzing data quality and modifying the data accordingly. As the manager, you may also correct or delete records. Some steps that may be taken in this project are:
# Cancelled invoices start with the letter C, so anchor the pattern at the start
data_input_1 %>% filter(grepl("^C", InvoiceNo)) %>% summarise(total_cancelled_transaction = n())
## # A tibble: 1 x 1
## total_cancelled_transaction
## <int>
## 1 9288
Your argument: This shows that 9,288 transaction rows belong to cancelled invoices.
## # A tibble: 3 x 8
## InvoiceNo StockCode Description Quantity InvoiceDate UnitPrice
## <chr> <chr> <chr> <dbl> <dttm> <dbl>
## 1 A563185 B Adjust bad~ 1 2011-08-12 14:50:00 11062.
## 2 A563186 B Adjust bad~ 1 2011-08-12 14:51:00 -11062.
## 3 A563187 B Adjust bad~ 1 2011-08-12 14:52:00 -11062.
## # ... with 2 more variables: CustomerID <dbl>, Country <chr>
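Your argument: According to the description, a valid invoice number consists of 6 digits. The three rows above start with the letter A and record bad-debt adjustments, so they are not valid sales invoices.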
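The invoice-number rule can be expressed as a regular expression. The sketch below is base R on made-up values, and it assumes that a C-prefixed cancellation still counts as a well-formed invoice number. That assumption is ours; drop the `C?` part for a stricter rule.

```r
invoice_no <- c("536365", "C536379", "A563185", "53636")

# Six digits, optionally preceded by "C" for a cancellation
is_valid     <- grepl("^C?[0-9]{6}$", invoice_no)
is_cancelled <- startsWith(invoice_no, "C")

data.frame(invoice_no, is_valid, is_cancelled)
```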
## [1] 1336
Your argument: There are 1,336 transaction rows with Quantity <= 0.
## [1] 1179
Your argument: There are 1,179 transaction rows with UnitPrice <= 0.
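The code behind the two counts is not shown, so the following is only a sketch of how such checks could be written in base R on made-up toy values. Whether the flagged rows should then be dropped is an analysis decision; the last line shows one possible filter.

```r
toy <- data.frame(
  Quantity  = c(6, -1, 0, 12),
  UnitPrice = c(2.55, 1.25, 0, -11062.06)
)

n_bad_quantity <- sum(toy$Quantity <= 0)    # rows with non-positive quantity
n_bad_price    <- sum(toy$UnitPrice <= 0)   # rows with non-positive price

# Keep only rows where both values are strictly positive
toy_clean <- toy[toy$Quantity > 0 & toy$UnitPrice > 0, ]
nrow(toy_clean)
```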
stock_valid <- data_input_1 %>% mutate(
stock_code_validation = substr(StockCode,start = 1, stop = 5)
) %>% mutate(
stock_code_validation = as.numeric(stock_code_validation)
) %>% select(StockCode,stock_code_validation,Description) %>% distinct()
## Warning: Problem with `mutate()` input `stock_code_validation`.
## i NAs introduced by coercion
## i Input `stock_code_validation` is `as.numeric(stock_code_validation)`.
## Warning in mask$eval_all_mutate(dots[[i]]): NAs introduced by coercion
data_input_1 %>% filter(Description %in% (stock_valid %>% filter(is.na(stock_code_validation)) %>% .$Description))
## # A tibble: 2,378 x 8
## InvoiceNo StockCode Description Quantity InvoiceDate UnitPrice
## <chr> <chr> <chr> <dbl> <dttm> <dbl>
## 1 536370 POST POSTAGE 3 2010-12-01 08:45:00 18
## 2 536403 POST POSTAGE 1 2010-12-01 11:27:00 15
## 3 536527 POST POSTAGE 1 2010-12-01 13:04:00 18
## 4 536540 C2 CARRIAGE 1 2010-12-01 14:05:00 50
## 5 536544 DOT DOTCOM POS~ 1 2010-12-01 14:32:00 570.
## 6 536569 M Manual 1 2010-12-01 15:35:00 1.25
## 7 536569 M Manual 1 2010-12-01 15:35:00 19.0
## 8 536592 DOT DOTCOM POS~ 1 2010-12-01 17:06:00 607.
## 9 536779 BANK CHA~ Bank Charg~ 1 2010-12-02 15:08:00 15
## 10 536840 POST POSTAGE 1 2010-12-02 18:27:00 18
## # ... with 2,368 more rows, and 2 more variables: CustomerID <dbl>,
## # Country <chr>
Your argument: There are 2,378 transaction rows with an invalid stock code.
invalid_stock <- stock_valid %>% filter(is.na(stock_code_validation)) %>% .$Description
data_input_1 <- data_input_1 %>% filter(!Description %in% invalid_stock)
# Additional adjustment codes to remove
descr <- c( "check", "check?", "?", "??", "damaged", "found",
"adjustment", "Amazon", "AMAZON", "amazon adjust",
"Amazon Adjustment", "amazon sales", "Found", "FOUND",
"found box", "Found by jackie ","Found in w/hse","dotcom",
"dotcom adjust", "allocate stock for dotcom orders ta", "FBA",
"Dotcomgiftshop Gift Voucher £100.00", "on cargo order",
"wrongly sold (22719) barcode", "wrongly marked 23343",
"dotcomstock", "rcvd be air temp fix for dotcom sit",
"Manual", "John Lewis", "had been put aside",
"for online retail orders", "taig adjust", "amazon",
"incorrectly credited C550456 see 47", "returned",
"wrongly coded 20713", "came coded as 20713",
"add stock to allocate online orders", "Adjust bad debt",
"alan hodge cant mamage this section", "website fixed",
"did a credit and did not tick ret", "michel oops",
"incorrectly credited C550456 see 47", "mailout", "test",
"Sale error", "Lighthouse Trading zero invc incorr", "SAMPLES",
"Marked as 23343", "wrongly coded 23343","Adjustment",
"rcvd be air temp fix for dotcom sit", "Had been put aside." )
descr
## [1] "check"
## [2] "check?"
## [3] "?"
## [4] "??"
## [5] "damaged"
## [6] "found"
## [7] "adjustment"
## [8] "Amazon"
## [9] "AMAZON"
## [10] "amazon adjust"
## [11] "Amazon Adjustment"
## [12] "amazon sales"
## [13] "Found"
## [14] "FOUND"
## [15] "found box"
## [16] "Found by jackie "
## [17] "Found in w/hse"
## [18] "dotcom"
## [19] "dotcom adjust"
## [20] "allocate stock for dotcom orders ta"
## [21] "FBA"
## [22] "Dotcomgiftshop Gift Voucher <U+00A3>100.00"
## [23] "on cargo order"
## [24] "wrongly sold (22719) barcode"
## [25] "wrongly marked 23343"
## [26] "dotcomstock"
## [27] "rcvd be air temp fix for dotcom sit"
## [28] "Manual"
## [29] "John Lewis"
## [30] "had been put aside"
## [31] "for online retail orders"
## [32] "taig adjust"
## [33] "amazon"
## [34] "incorrectly credited C550456 see 47"
## [35] "returned"
## [36] "wrongly coded 20713"
## [37] "came coded as 20713"
## [38] "add stock to allocate online orders"
## [39] "Adjust bad debt"
## [40] "alan hodge cant mamage this section"
## [41] "website fixed"
## [42] "did a credit and did not tick ret"
## [43] "michel oops"
## [44] "incorrectly credited C550456 see 47"
## [45] "mailout"
## [46] "test"
## [47] "Sale error"
## [48] "Lighthouse Trading zero invc incorr"
## [49] "SAMPLES"
## [50] "Marked as 23343"
## [51] "wrongly coded 23343"
## [52] "Adjustment"
## [53] "rcvd be air temp fix for dotcom sit"
## [54] "Had been put aside."
I agree with these findings, so transaction rows with the descriptions above need to be removed.
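The removal step itself is not shown, so here is a minimal base R sketch of one way to do it, on a toy data frame with a shortened stand-in list (`descr_toy` is hypothetical). It negates `%in%`, which treats a missing description as "not in the list" and therefore keeps it.

```r
toy <- data.frame(
  StockCode   = c("20725", "20622", "15056BL", "82580"),
  Description = c("LUNCH BAG RED RETROSPOT", "check", "damaged",
                  "BATHROOM METAL SIGN"),
  stringsAsFactors = FALSE
)

descr_toy <- c("check", "damaged", "found")  # short stand-in for descr

# Drop every row whose description is one of the adjustment notes
toy_clean <- toy[!toy$Description %in% descr_toy, ]
toy_clean
```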
df_product <- data_input_1 %>% select(StockCode,Description) %>% distinct()
df_product <- df_product %>%
mutate(stock_code_lowercase = tolower(StockCode),
         description_lowercase = tolower(Description))
# tibble() replaces the deprecated data_frame()
tibble(
  stock_code_unik = df_product$StockCode %>% unique() %>% length(),
  stock_code_unik_lowercase = df_product$stock_code_lowercase %>% unique() %>% length(),
  description_unik = df_product$Description %>% unique() %>% length(),
  description_unik_lower = df_product$description_lowercase %>% unique() %>% length()
)
## # A tibble: 1 x 4
## stock_code_unik stock_code_unik_lowerca~ description_unik description_unik_lo~
## <int> <int> <int> <int>
## 1 3900 3791 3994 3994
The number of unique stock codes and the number of unique products do not match, so let us check the stock codes first.
stock_code_check_dupli <- df_product %>% select(StockCode,stock_code_lowercase) %>%
distinct() %>%
group_by(stock_code_lowercase) %>%
summarise(freq = n()) %>%
ungroup() %>%
filter(freq>1)
## `summarise()` ungrouping output (override with `.groups` argument)
The result holds the stock codes that become duplicates once converted to lowercase. Let us check whether they really are duplicates.
df_product %>% filter(stock_code_lowercase %in% stock_code_check_dupli$stock_code_lowercase) %>%
arrange(stock_code_lowercase)
## # A tibble: 230 x 4
## StockCode Description stock_code_lowerc~ description_lowercase
## <chr> <chr> <chr> <chr>
## 1 15056BL EDWARDIAN PARASOL BLACK 15056bl edwardian parasol black
## 2 15056bl EDWARDIAN PARASOL BLACK 15056bl edwardian parasol black
## 3 15056N EDWARDIAN PARASOL NATU~ 15056n edwardian parasol natur~
## 4 15056n EDWARDIAN PARASOL NATU~ 15056n edwardian parasol natur~
## 5 15056P EDWARDIAN PARASOL PINK 15056p edwardian parasol pink
## 6 15056p EDWARDIAN PARASOL PINK 15056p edwardian parasol pink
## 7 15060B FAIRY CAKE DESIGN UMBR~ 15060b fairy cake design umbre~
## 8 15060b FAIRY CAKE DESIGN UMBR~ 15060b fairy cake design umbre~
## 9 18098C PORCELAIN BUTTERFLY OI~ 18098c porcelain butterfly oil~
## 10 18098c PORCELAIN BUTTERFLY OI~ 18098c porcelain butterfly oil~
## # ... with 220 more rows
It turns out that some stock codes are duplicated only because of case sensitivity. We therefore convert every stock_code to UPPERCASE to remove the case-sensitivity effect.
data_input_1 <- data_input_1 %>% mutate(
StockCode = toupper(StockCode)
)
tibble(
  jumlah_stock_code_unik = data_input_1$StockCode %>% unique() %>% length(),
  stock_code_unik_lowercase = data_input_1$StockCode %>% tolower() %>% unique() %>% length(),
  description_unik = data_input_1$Description %>% unique() %>% length(),
  description_unik_lower = data_input_1$Description %>% tolower() %>% unique() %>% length()
)
## # A tibble: 1 x 4
## jumlah_stock_code_u~ stock_code_unik_low~ description_unik description_unik_l~
## <int> <int> <int> <int>
## 1 3791 3791 3994 3994
The stock_code data is now clean. However, the numbers of unique stock codes and unique descriptions should be equal, because each is supposed to be unique. The mismatch indicates duplicated data. Let us check.
description_check <- data_input_1 %>% select(StockCode,Description) %>%
distinct() %>%
group_by(StockCode) %>%
summarise(freq = n()) %>%
ungroup() %>%
filter(freq>1)
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 212 x 2
## StockCode freq
## <chr> <int>
## 1 16156L 2
## 2 17107D 3
## 3 20622 2
## 4 20725 2
## 5 20914 2
## 6 21109 2
## 7 21112 2
## 8 21175 2
## 9 21232 2
## 10 21243 2
## # ... with 202 more rows
The data above lists the stock codes that have more than one description. Let us look at a sample.
data_input_1 %>% filter(StockCode %in% description_check$StockCode) %>%
select(StockCode,Description) %>%
distinct() %>%
arrange(StockCode,Description)
## # A tibble: 443 x 2
## StockCode Description
## <chr> <chr>
## 1 16156L WRAP CAROUSEL
## 2 16156L WRAP, CAROUSEL
## 3 17107D FLOWER FAIRY 5 DRAWER LINERS
## 4 17107D FLOWER FAIRY 5 SUMMER DRAW LINERS
## 5 17107D FLOWER FAIRY,5 SUMMER B'DRAW LINERS
## 6 20622 VIP PASSPORT COVER
## 7 20622 VIPPASSPORT COVER
## 8 20725 LUNCH BAG RED RETROSPOT
## 9 20725 LUNCH BAG RED SPOTTY
## 10 20914 SET/5 RED RETROSPOT LID GLASS BOWLS
## # ... with 433 more rows
From the check above we can conclude that the descriptions contain errors in punctuation, spacing, and even the wording of the product name. We will therefore regenerate each product description from its first occurrence.
df_description <- data_input_1 %>% select(StockCode,Description) %>%
filter(StockCode %in% description_check$StockCode) %>%
distinct() %>%
group_by(StockCode) %>%
slice(1) %>%
ungroup()
data_input_1 <- data_input_1 %>% left_join(df_description, by=c("StockCode")) %>%
mutate(Description = ifelse(is.na(Description.y),Description.x,Description.y)) %>%
select(-c(Description.y,Description.x))
tibble(
  jumlah_stock_code_unik = data_input_1$StockCode %>% unique() %>% length(),
  jumlah_description_unik = data_input_1$Description %>% unique() %>% length()
)
## # A tibble: 1 x 2
## jumlah_stock_code_unik jumlah_description_unik
## <int> <int>
## 1 3791 3765
The gap has narrowed. The counts above indicate that some stock codes share the same description.
df_description <- data_input_1 %>% select(StockCode,Description) %>%
distinct() %>%
group_by(Description) %>%
summarise(freq = n()) %>%
ungroup() %>%
filter(freq>1)
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 24 x 2
## Description freq
## <chr> <int>
## 1 BATHROOM METAL SIGN 2
## 2 COLOURING PENCILS BROWN TUBE 2
## 3 COLUMBIAN CANDLE RECTANGLE 2
## 4 COLUMBIAN CANDLE ROUND 3
## 5 EAU DE NILE JEWELLED PHOTOFRAME 2
## 6 FRENCH FLORAL CUSHION COVER 2
## 7 FRENCH LATTICE CUSHION COVER 2
## 8 FRENCH PAISLEY CUSHION COVER 2
## 9 FROSTED WHITE BASE 2
## 10 HEART T-LIGHT HOLDER 2
## # ... with 14 more rows
There are 24 descriptions that are shared by more than one stock code.
data_input_1 %>% select(StockCode,Description) %>%
distinct() %>%
filter(Description %in% c(df_description$Description)) %>%
arrange(Description)
## # A tibble: 50 x 2
## StockCode Description
## <chr> <chr>
## 1 82580 BATHROOM METAL SIGN
## 2 21171 BATHROOM METAL SIGN
## 3 10133 COLOURING PENCILS BROWN TUBE
## 4 10135 COLOURING PENCILS BROWN TUBE
## 5 72133 COLUMBIAN CANDLE RECTANGLE
## 6 72131 COLUMBIAN CANDLE RECTANGLE
## 7 72127 COLUMBIAN CANDLE ROUND
## 8 72130 COLUMBIAN CANDLE ROUND
## 9 72128 COLUMBIAN CANDLE ROUND
## 10 85023B EAU DE NILE JEWELLED PHOTOFRAME
## # ... with 40 more rows
For this case we can assume that the products are the same. We will therefore assign each description the first stock code found for it.
df_product_unik <- data_input_1 %>% select(StockCode,Description) %>%
filter(Description %in% c(df_description$Description)) %>%
distinct() %>%
group_by(Description) %>%
slice(1) %>%
ungroup()
data_input_1 <- data_input_1 %>% left_join(df_product_unik, by=c("Description")) %>%
mutate(StockCode = ifelse(is.na(StockCode.y),StockCode.x,StockCode.y)) %>%
select(-c("StockCode.x","StockCode.y")) %>%
select(InvoiceNo,InvoiceDate,CustomerID,Country,StockCode,Description,Quantity,UnitPrice)
tibble(
  jumlah_stock_code_unik = data_input_1$StockCode %>% unique() %>% length(),
  jumlah_description_unik = data_input_1$Description %>% unique() %>% length(),
  jumlah_code_description_unik = data_input_1 %>% select(StockCode,Description) %>% distinct() %>% nrow()
)
## # A tibble: 1 x 3
## jumlah_stock_code_unik jumlah_description_unik jumlah_code_description_unik
## <int> <int> <int>
## 1 3765 3765 3765
Your argument: Each stock code now maps to exactly one description, and vice versa.
tibble(
  # counts distinct (CustomerID, Country) pairs
  customer_id_unik = data_input_1 %>% select(CustomerID,Country) %>% distinct() %>% nrow(),
  # counts distinct CustomerID values
  customer_country_unik = data_input_1 %>% select(CustomerID) %>% distinct() %>% nrow()
)
## # A tibble: 1 x 2
## customer_id_unik customer_country_unik
## <int> <int>
## 1 4351 4335
Based on the data above, there are 16 more customer-country pairs (4,351) than unique customer IDs (4,335), so some customers appear under more than one country. This may be because those customers moved, so we take each customer's country from their most recent transaction.
df_master_customer <- data_input_1 %>%
  # newest transaction first; desc() accepts a single column
  arrange(desc(InvoiceDate)) %>%
  select(CustomerID, Country) %>%
  group_by(CustomerID) %>%
  slice(1) %>%
  ungroup()
data_input_1 <- data_input_1 %>% select(-Country) %>%
left_join(df_master_customer, by = c("CustomerID"))
tibble(
  customer_id_unik = data_input_1 %>% select(CustomerID,Country) %>% distinct() %>% nrow(),
  customer_country_unik = data_input_1 %>% select(CustomerID) %>% distinct() %>% nrow())
## # A tibble: 1 x 2
## customer_id_unik customer_country_unik
## <int> <int>
## 1 4335 4335
Your argument: Every customer is now associated with exactly one country.
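The "most recent country wins" rule can be illustrated in base R on a made-up toy data frame. This is a sketch of the idea rather than the exact pipeline used above: sort newest first, then keep the first row seen for each customer.

```r
toy <- data.frame(
  CustomerID  = c(12345, 12345, 12346),
  Country     = c("Belgium", "Spain", "France"),
  InvoiceDate = as.POSIXct(c("2011-01-10 10:00:00",
                             "2011-06-02 12:30:00",
                             "2011-03-15 09:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Newest transaction first
newest_first <- toy[order(toy$InvoiceDate, decreasing = TRUE), ]

# duplicated() marks every repeat of a CustomerID, so negating it keeps
# only the first (that is, the most recent) row per customer
master <- newest_first[!duplicated(newest_first$CustomerID),
                       c("CustomerID", "Country")]
master
```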
data_input_1 <- data_input_1 %>% filter(as.Date(InvoiceDate) > ymd("2010-11-30"), as.Date(InvoiceDate) < ymd("2011-12-01"))
tail(data_input_1,10)
## # A tibble: 10 x 8
## InvoiceNo InvoiceDate CustomerID StockCode Description Quantity
## <chr> <dttm> <dbl> <chr> <chr> <dbl>
## 1 579885 2011-11-30 17:37:00 15444 22118 JOY WOODEN~ 2
## 2 579885 2011-11-30 17:37:00 15444 21287 SCENTED VE~ 12
## 3 579885 2011-11-30 17:37:00 15444 23035 DRAWER KNO~ 6
## 4 579885 2011-11-30 17:37:00 15444 23240 SET OF 4 K~ 1
## 5 579885 2011-11-30 17:37:00 15444 84882 GREEN WIRE~ 2
## 6 579885 2011-11-30 17:37:00 15444 85034C 3 ROSE MOR~ 4
## 7 579885 2011-11-30 17:37:00 15444 21742 LARGE ROUN~ 2
## 8 579885 2011-11-30 17:37:00 15444 23084 RABBIT NIG~ 6
## 9 579885 2011-11-30 17:37:00 15444 21257 VICTORIAN ~ 1
## 10 579885 2011-11-30 17:37:00 15444 21259 VICTORIAN ~ 1
## # ... with 2 more variables: UnitPrice <dbl>, Country <chr>
Your argument: The data contains transactions from 01/12/2010 to 09/12/2011. December 2011 is not a full month, so I chose to analyze the period from 01/12/2010 to 30/11/2011.
data.frame(
jumlah_data = data_input_1 %>% nrow(),
jumlah_data_unik = data_input_1 %>% distinct() %>% nrow()
)
## jumlah_data jumlah_data_unik
## 1 502695 497672
The dataset contains duplicate rows (502,695 rows versus 497,672 unique rows), so we need to remove them.
## [1] 497672
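The deduplication code is not shown, so the following base R sketch on a toy data frame only illustrates one way to drop exact duplicate rows. In the dplyr style used elsewhere in this report, `distinct()` serves the same purpose.

```r
toy <- data.frame(
  InvoiceNo = c("536365", "536365", "536366"),
  StockCode = c("85123A", "85123A", "22633"),
  Quantity  = c(6, 6, 6),
  stringsAsFactors = FALSE
)

# duplicated() flags rows identical to an earlier row across all columns
toy_unique <- toy[!duplicated(toy), ]

c(before = nrow(toy), after = nrow(toy_unique))
```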
data_input_1 <- data_input_1 %>% mutate(total_amount = Quantity * UnitPrice) %>%
select(InvoiceNo,InvoiceDate,CustomerID,Country,StockCode,Description,Quantity,UnitPrice,total_amount)
head(data_input_1)
## # A tibble: 6 x 9
## InvoiceNo InvoiceDate CustomerID Country StockCode Description
## <chr> <dttm> <dbl> <chr> <chr> <chr>
## 1 536365 2010-12-01 08:26:00 17850 United~ 85123A WHITE HANG~
## 2 536365 2010-12-01 08:26:00 17850 United~ 71053 WHITE META~
## 3 536365 2010-12-01 08:26:00 17850 United~ 84406B CREAM CUPI~
## 4 536365 2010-12-01 08:26:00 17850 United~ 84029G KNITTED UN~
## 5 536365 2010-12-01 08:26:00 17850 United~ 84029E RED WOOLLY~
## 6 536365 2010-12-01 08:26:00 17850 United~ 22752 SET 7 BABU~
## # ... with 3 more variables: Quantity <dbl>, UnitPrice <dbl>,
## # total_amount <dbl>
Your argument: Computing the total amount of each transaction row as quantity * unit_price simplifies the subsequent analysis.
## InvoiceNo InvoiceDate CustomerID Country StockCode Description
## 0 0 123531 0 0 0
## Quantity UnitPrice total_amount
## 0 0 0
This project aims to segment customers and to build personalized product recommendations. Customer segmentation clearly requires knowing who the customers are, so transaction rows without a customer_id must be excluded. To build the recommendation system, by contrast, we can ignore the customer and focus on the products bought, so the invoice_no and stock_code data are sufficient.
We therefore split the data into three datasets:
1. df_general_transaction: the original dataset.
2. df_customer_transaction: used for customer segmentation, so rows with missing values must be excluded.
3. df_product_recomm: used to build the product recommendation system; it contains only customer_id, invoice_no, stock_code, and description.
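A minimal base R sketch of this three-way split on a made-up toy data frame. The three dataset names come from the list above; everything else (toy values, and the choice to deduplicate the recommendation table) is an assumption for illustration.

```r
toy <- data.frame(
  InvoiceNo   = c("536365", "536366", "536367"),
  CustomerID  = c(17850, NA, 15444),
  StockCode   = c("85123A", "20622", "20725"),
  Description = c("WHITE HANGING HEART T-LIGHT HOLDER",
                  "VIP PASSPORT COVER",
                  "LUNCH BAG RED RETROSPOT"),
  Quantity    = c(6, 1, 2),
  stringsAsFactors = FALSE
)

# 1. Everything, unchanged
df_general_transaction <- toy

# 2. Only rows with a known customer
df_customer_transaction <- toy[!is.na(toy$CustomerID), ]

# 3. Only the columns needed for recommendations, without repeated rows
recomm_cols <- c("CustomerID", "InvoiceNo", "StockCode", "Description")
df_product_recomm <- unique(toy[recomm_cols])
```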
Save the cleaned data to a folder in .json, .xml, or .rds format.
Import the data you saved in Task 5; choose just one of the file types. Then carry out an Exploratory Data Analysis, using the visualizations you have learned, to answer each of the following questions:
• Use a bar chart to show how many customers made transactions each month.
Customer_per_Bulan <- data_input_1 %>%
select(InvoiceDate, CustomerID) %>%
distinct() %>%
mutate(yearmonth = format(InvoiceDate, format = "%y-%b-1"),
yearmonth = ymd(yearmonth)) %>%
group_by(yearmonth) %>%
summarise(total= n()) %>%
ungroup()
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 12 x 2
## yearmonth total
## <date> <int>
## 1 2010-12-01 1529
## 2 2011-01-01 1081
## 3 2011-02-01 1089
## 4 2011-03-01 1424
## 5 2011-04-01 1231
## 6 2011-05-01 1646
## 7 2011-06-01 1488
## 8 2011-07-01 1423
## 9 2011-08-01 1335
## 10 2011-09-01 1796
## 11 2011-10-01 1987
## 12 2011-11-01 2738
ggplot(Customer_per_Bulan,
aes(x = yearmonth, y = total)) +
geom_bar(width=24, fill = rainbow(12), color="azure4", stat= "identity" ) +
theme_minimal() +
labs(
x = "Bulan" ,
y = "Customer",
title = "Transaksi Pelanggan Setiap Bulan"
) +
theme(axis.text.x = element_text(angle = -45, hjust = -.2)) +
scale_x_date(breaks = date_breaks('1 month'),
labels = date_format("%b %y"))
Your argument: The number of customers transacting per month peaks in November 2011, at 2,738.
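The pipeline above counts distinct (InvoiceDate, CustomerID) pairs, so a customer who orders at several different times in one month is counted more than once. If the goal is the number of distinct customers per month, one possible variant is sketched below in base R on made-up toy data. It is an alternative, not the method that produced the figures above.

```r
toy <- data.frame(
  CustomerID  = c(1, 1, 2, 1),
  InvoiceDate = as.POSIXct(c("2010-12-01 08:26:00",
                             "2010-12-15 10:00:00",
                             "2010-12-20 11:00:00",
                             "2011-01-05 09:00:00"), tz = "UTC")
)

# Month key such as "2010-12"
toy$yearmonth <- format(toy$InvoiceDate, "%Y-%m")

# Distinct customers within each month
customers_per_month <- tapply(toy$CustomerID, toy$yearmonth,
                              function(ids) length(unique(ids)))
customers_per_month
```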
• Use an interactive line chart to show how the number of new customers grows each month.
plot_monthly_new_customer <- data_input_1 %>%
group_by(CustomerID) %>%
summarise(first_order = min(InvoiceDate)) %>%
ungroup() %>%
mutate( yearmonth = format(first_order, format="%Y-%m-1"),
yearmonth = ymd(yearmonth),
ym = as.yearmon(first_order)) %>%
group_by(ym,yearmonth) %>%
summarise(total_new_customer = n()) %>%
ungroup() %>%
mutate(
normalisasi = (total_new_customer-min(total_new_customer))/(max(total_new_customer)-min(total_new_customer)),
popup=glue("Year-Month : {ym}
Total New Customer : {total_new_customer} ({round((total_new_customer/sum(total_new_customer))*100,1)}%)")
) %>%
ggplot(aes(yearmonth,normalisasi))+
geom_area(fill="blue",alpha=0.7)+
geom_line(size=0.7,color="#181818") +
labs(
title = "Growth of New Customer",
x = "Month-Year",
y = NULL
)+
geom_point(color="#181818", size = 2, alpha = 0.9, aes(text=popup))+
scale_x_date(breaks=date_breaks('1 months'),
labels=date_format('%b %y'))+
theme(
axis.text.y = element_blank()
)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` regrouping output by 'ym' (override with `.groups` argument)
## Warning: Ignoring unknown aesthetics: text
Your argument: New-customer growth is highest in December 2010 and lowest in August 2011. Note that December 2010 is the first month in the data, so every customer who ordered then counts as new.
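The definition of a new customer used here (a customer counts in the month of their first order) can be sketched in base R on made-up toy data. It also shows why the first month of any dataset tends to dominate.

```r
toy <- data.frame(
  CustomerID  = c(1, 2, 1, 3, 2),
  InvoiceDate = as.POSIXct(c("2010-12-01 08:26:00",
                             "2010-12-05 10:00:00",
                             "2011-01-10 11:00:00",
                             "2011-01-12 09:00:00",
                             "2011-02-01 14:00:00"), tz = "UTC")
)

# Oldest order first, then keep each customer's first row
oldest_first <- toy[order(toy$InvoiceDate), ]
first_orders <- oldest_first[!duplicated(oldest_first$CustomerID), ]

# Count first orders per month
first_orders$yearmonth <- format(first_orders$InvoiceDate, "%Y-%m")
new_customers <- table(first_orders$yearmonth)
new_customers
```

In this toy example customers 1 and 2 are new in December 2010 and customer 3 in January 2011. February 2011 has an order but no new customer, so it does not appear in the table.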
• Use radar charts to analyze the most recent order times (monthly, daily, and hourly).
# Bulan
Bulanan <- data_input_1 %>% select(InvoiceDate,InvoiceNo) %>%
distinct() %>%
mutate(month = month(InvoiceDate),
day = day(InvoiceDate)) %>%
group_by(month,day) %>%
summarise(total = n()) %>%
ungroup() %>%
group_by(day) %>%
summarise(avg_monthly_trans = as.integer(median(total))) %>%
ungroup() %>%
mutate(popup = glue("Date : {day}
Total Transaction: {avg_monthly_trans}"))
## `summarise()` regrouping output by 'month' (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
Bulanan %>%
ggplot(aes(x=as.factor(day), y=avg_monthly_trans)) +
geom_bar(stat="identity", aes(fill=avg_monthly_trans),show.legend = FALSE)+
labs(title = "Transaksi perbulan",
x = "Tanggal",
y = NULL)+
theme_minimal()+
scale_fill_gradient(low = "#FFF8DC", high="#00008B")+
theme(axis.title = element_blank(),
legend.position = "none",
plot.title = element_text(hjust = 0.5,size=12, face="bold"),
plot.subtitle = element_text(hjust = 0.5,size=10),
axis.text.y = element_blank(),
axis.text.x=element_text(size=11, face="bold"))+
coord_polar() -> Bulan
# Hari
Harian <- data_input_1 %>% select(InvoiceNo,InvoiceDate) %>%
distinct() %>%
mutate(month = month(InvoiceDate),
wday = wday(InvoiceDate,week_start = getOption("lubridate.week.start", 1))) %>%
group_by(month,wday) %>%
summarise(total = n()) %>%
ungroup() %>%
group_by(wday) %>%
summarise(avg_wday_trans = as.integer(median(total))) %>%
ungroup() %>%
mutate(popup = glue("wday : {wday}
Total Transaction: {avg_wday_trans}"))
## `summarise()` regrouping output by 'month' (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
Harian %>%
ggplot(aes(wday,avg_wday_trans))+
geom_bar(width=1, stat="identity", show.legend = FALSE, aes(fill=avg_wday_trans))+
labs(
title = "Transaksi Harian",
x = "Hari",
y = NULL)+
scale_x_continuous(breaks = c(1,2,3,4,5,6,7),
labels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))+
theme_minimal()+
scale_fill_gradient(low = "#adff2f", high="#006400")+
theme(axis.title = element_blank(),
legend.position = "none",
plot.title = element_text(hjust = 0.5,size=12, face="bold"),
plot.subtitle = element_text(hjust = 0.5,size=10),
axis.text.y = element_blank(),
axis.text.x=element_text(size=11, face="bold"))+
coord_polar() -> Hari
#Jam
Jaman <- data_input_1 %>% select(InvoiceDate,InvoiceNo) %>%
distinct() %>%
mutate(day = day(InvoiceDate),
hour = hour(InvoiceDate)) %>%
group_by(day,hour) %>%
summarise(total = n()) %>%
ungroup() %>%
group_by(hour) %>%
summarise(avg_hourly_trans = as.integer(median(total))) %>%
ungroup() %>%
mutate(popup = glue("Hour of Day : {hour}
Total Transaction: {avg_hourly_trans}"))
## `summarise()` regrouping output by 'day' (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
# All 24 hours, so hours without transactions still get a slot on the chart
time_range <- tibble(hour = 0:23)
time_range %>%
left_join(Jaman ,by="hour") %>%
mutate(hour = as.factor(hour)) %>%
ggplot(aes(x=hour,y=avg_hourly_trans))+
geom_bar(stat="identity",show.legend = FALSE, aes(fill=avg_hourly_trans))+
labs(title = "Transaksi Perjam",
x = "Jam",
y = NULL)+
theme_minimal()+
scale_fill_gradient(low = "#FFFF00", high="#FF4500")+
theme(axis.title = element_blank(),
legend.position = "none",
plot.title = element_text(hjust = 0.5,size=12, face="bold"),
plot.subtitle = element_text(hjust = 0.5,size=10),
axis.text.y = element_blank(),
axis.text.x=element_text(size=11, face="bold"))+
coord_polar() -> Jam
gridExtra::grid.arrange(Bulan,Hari,Jam, ncol = 3)
## Warning: Removed 9 rows containing missing values (position_stack).
Your argument: In these charts, the darker the color, the more transactions occurred. By day of the month, transactions peak on the 30th. By day of the week, transactions peak on Thursday. By hour, transactions peak at 12 noon.
• Use an interactive bar chart to show the transaction frequency for each month.
Transaksi_Bulan <- data_input_1 %>%
select(InvoiceDate,InvoiceNo) %>%
distinct() %>%
mutate( yearmonth = format(InvoiceDate, format = "%y - %m - 1"),
yearmonth = ymd(yearmonth)) %>%
group_by(yearmonth) %>%
summarise(total_transaksi = n()) %>%
ungroup()
## `summarise()` ungrouping output (override with `.groups` argument)
freq_bulanan <- plot_ly(Transaksi_Bulan,
x = ~yearmonth,
y = ~total_transaksi,
type = 'bar',
marker = list(color = rainbow(12),
line = list(color = "black",
width = 1.5)))
freq_bulanan <- freq_bulanan %>% layout(title = "Frekuensi Transaksi Perbulan",
xaxis = list(title = "Bulan",
type = "date",
tickformat = "%b %y"),
yaxis = list(title = "Total Transaksi"))
freq_bulanan
## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
Your argument: Transaction frequency is highest in November 2011, at 2,754 transactions, and lowest in January 2011, at 1,088 transactions.
• Use an interactive horizontal bar chart to show the top 10 most popular products by transaction frequency!
plot_most_frequency <- data_input_1 %>% group_by(StockCode,Description) %>%
summarise(frequency = n()) %>%
ungroup() %>%
arrange(desc(frequency)) %>%
head(10) %>%
mutate(Description = as.factor(Description),
Description = reorder(Description,frequency)) %>%
ggplot(aes(x=Description, y=frequency))+
geom_bar(stat="identity", aes(fill=Description, text=frequency), show.legend = FALSE)+
labs(title="Most 10 Popular Product by Frequency Order",
x=NULL)+
coord_flip()
## `summarise()` regrouping output by 'StockCode' (override with `.groups` argument)
## Warning: Ignoring unknown aesthetics: text
ggplotly(plot_most_frequency, tooltip="text") %>%
layout(showlegend=FALSE) %>%
config(displayModeBar = FALSE, scrollZoom = FALSE)
Your argument: By transaction frequency, the most frequently purchased product is white hanging heart T-Light Holder, and the least frequently purchased of the ten shown is lunch bag black skull and suki design.
• Use an interactive horizontal bar chart to show the 10 most popular products by order quantity.
plot_most_quantity <- data_input_1 %>%
group_by(StockCode,Description) %>%
summarise(total_quantity = sum(Quantity)) %>%
ungroup() %>%
arrange(desc(total_quantity)) %>%
head(10) %>%
mutate(Description = as.factor(Description),
Description = reorder(Description,total_quantity)) %>%
ggplot(aes(x=Description, y=total_quantity))+
geom_bar(stat="identity", aes(fill=Description, text=total_quantity), show.legend = FALSE)+
labs(title="Most 10 Popular Product by Quantity Order",
x=NULL)+
coord_flip()
## `summarise()` regrouping output by 'StockCode' (override with `.groups` argument)
## Warning: Ignoring unknown aesthetics: text
ggplotly(plot_most_quantity, tooltip = "text") %>%
layout(showlegend=FALSE) %>%
config(displayModeBar = FALSE, scrollZoom = FALSE)
Your argument: By order quantity, the most popular product is medium ceramic top storage jar, and the lowest of the ten shown is pack of 60 pink paisley cake cases.
• Use an interactive horizontal bar chart to show the 10 most popular products by total number of customers.
plot_most_customer <- data_input_1 %>%
select(CustomerID, StockCode,Description) %>%
distinct() %>%
group_by(StockCode,Description) %>%
summarise(total_customer = n()) %>%
ungroup() %>%
arrange(desc(total_customer)) %>%
head(10) %>%
mutate(Description = as.factor(Description),
Description = reorder(Description,total_customer)) %>%
ggplot(aes(x=Description, y=total_customer))+
geom_bar(stat="identity", aes(fill=Description, text=total_customer), show.legend = FALSE)+
labs(title="Most 10 Popular Product by Total Customer",
x=NULL)+
coord_flip()
## `summarise()` regrouping output by 'StockCode' (override with `.groups` argument)
## Warning: Ignoring unknown aesthetics: text
ggplotly(plot_most_customer, tooltip = "text") %>%
layout(showlegend=FALSE) %>%
config(displayModeBar = FALSE, scrollZoom = FALSE)
Your argument: By total number of customers, the most popular product is regency cakestand 3 tier, and the lowest of the ten shown is baking set 9 piece retrospot.
• Use an interactive horizontal bar chart to show the 10 most popular products by monetary value.
plot_most_profit <- data_input_1 %>%
group_by(StockCode,Description) %>%
summarise(total_amount = sum(total_amount)) %>%
ungroup() %>%
arrange(desc(total_amount)) %>%
head(10) %>%
mutate(Description = as.factor(Description),
Description = reorder(Description,total_amount)) %>%
ggplot(aes(x=Description, y=total_amount))+
geom_bar(stat="identity", aes(fill=Description, text=paste0("GBP ",total_amount)), show.legend = FALSE)+
labs(title="Most 10 Popular Product by Order Value",
x=NULL)+
coord_flip()
## `summarise()` regrouping output by 'StockCode' (override with `.groups` argument)
## Warning: Ignoring unknown aesthetics: text
ggplotly(plot_most_profit, tooltip = "text") %>%
layout(showlegend=FALSE) %>%
config(displayModeBar = FALSE, scrollZoom = FALSE)
Your argument: By monetary value, the most popular product is regency cakestand 3 tier, and the lowest of the ten shown is paper chain kit 50's christmas.
• Use a time series analysis to determine whether sales by monetary value are rising or falling.
income_bulan <- data_input_1 %>%
group_by(Description, total_amount) %>%
summarise(awal = min(InvoiceDate)) %>%
ungroup() %>%
mutate(ym = format(awal, format = "%y-%m-1"),
ym = ymd(ym)) %>%
group_by(ym) %>%
summarise(income = sum(total_amount)) %>%
ungroup() %>%
mutate(popup = glue("Bulan : {ym}
Income : {income}"))
## `summarise()` regrouping output by 'Description' (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
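Note that the pipeline above first keeps one row per distinct (`Description`, `total_amount`) pair, dated at its earliest invoice, so repeated sales of the same product at the same line value are counted only once. If the goal is the total sales value per month, a more direct aggregation might look like the sketch below. It assumes the same column names as `data_input_1`; `penjualan_demo` and its values are hypothetical placeholders.

```r
library(dplyr)
library(lubridate)

# Hypothetical line items (placeholders, not the retail data).
penjualan_demo <- tibble(
  InvoiceDate = ymd_hms(c("2011-01-05 10:00:00", "2011-01-20 12:30:00",
                          "2011-02-03 09:15:00")),
  Description = c("ITEM A", "ITEM A", "ITEM B"),
  total_amount = c(10, 10, 25)
)

income_bulan_langsung <- penjualan_demo %>%
  mutate(ym = ymd(format(InvoiceDate, "%Y-%m-01"))) %>%  # first day of each month
  group_by(ym) %>%
  summarise(income = sum(total_amount), .groups = "drop")  # sum every line item

print(income_bulan_langsung)
```

In this toy data both January sales of ITEM A are summed, whereas grouping by `Description` and `total_amount` first would keep only one of them.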
income_bulan_plot <- income_bulan %>%
ggplot(aes(x = ym,
y = income,
group = 1)) +
geom_line(size = 1, colour="red") +
labs( title = "Perubahan Income Perbulan",
x = "Bulan",
y = "Income")+
scale_x_date(breaks = date_breaks(width = "1 month"),
labels = date_format("%b %y")) +
scale_y_continuous(label = scales::format_format(big.mark = ",",
decimal.mark = ".",
scientific = F))
income_harian <- df_customer_transaction %>%
group_by(Description, total_amount) %>%
summarise(awal = min(InvoiceDate)) %>%
ungroup() %>%
mutate(ym = format(awal, format = "%y-%m-%d"),
ym = ymd(ym)) %>%
group_by(ym) %>%
summarise(income = sum(total_amount)) %>%
ungroup() %>%
mutate(popup = glue("Bulan : {ym}
Income : {income}"))
## `summarise()` regrouping output by 'Description' (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
income_hari_plot <- income_harian %>%
ggplot(aes(x = ym,
y = income,
group = 1)) +
geom_line(size = 1, colour="green") +
labs( title = "Perubahan Income Perbulan - Perhari",
x = "Bulan",
y = "Income")+
scale_x_date(breaks = date_breaks(width = "1 month"),
labels = date_format("%b %y")) +
scale_y_continuous(label = scales::format_format(big.mark = ",",
decimal.mark = ".",
scientific = F))
hasil <- gridExtra::grid.arrange(income_bulan_plot, income_hari_plot, ncol = 1)
• Use a treemap to visualize where most customers are located, by country.
Your argument: The largest share of customers is in the United Kingdom.
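The code for this treemap does not appear in the text, so the following is only a sketch of one possible approach with plotly's `treemap` trace type. It assumes `CustomerID` and `Country` columns (the `Country` column name is an assumption), and `pelanggan_demo` holds hypothetical placeholder values, not the retail data.

```r
library(dplyr)
library(plotly)

# Hypothetical customer records (placeholders, not the retail data).
pelanggan_demo <- tibble(
  CustomerID = c(1, 2, 3, 4, 5, 5),
  Country = c("United Kingdom", "United Kingdom", "United Kingdom",
              "Germany", "France", "France")
)

# Count distinct customers per country; an empty parent puts every country at the top level.
pelanggan_negara <- pelanggan_demo %>%
  distinct(CustomerID, Country) %>%
  count(Country, name = "total_customer") %>%
  mutate(parent = "")

# One rectangle per country, sized by its number of customers.
treemap_negara <- plot_ly(
  pelanggan_negara,
  type = "treemap",
  labels = ~Country,
  parents = ~parent,
  values = ~total_customer
)
```

Printing `treemap_negara` in an interactive session renders the chart.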
Give your views and opinions on the case you worked through above (what would you do as a manager to grow this online retail business, based on the findings of your analysis?).
Your argument:
Effective ways to grow the online retail business:
- Improve product quality
- Improve service quality
- Offer product discounts
- Run promotions regularly
- Introduce the newest product innovations