Project Status : in progress (description being updated)

Email :
Linkedin : www.linkedin.com/in/gabrielerichson
Github : www.github.com/gabrielerichsonmrp

"Personalized Product
# Customize colors and chart visualization
my_color = c(
  col1="#d3f2a3",
  col2="#97e196",
  col3="#6cc08b",
  col4="#4c9b82",
  col5="#217a79",
  col6="#105965",
  col7="#074050"
)

my_theme_fill  <- get_scale_fill(get_pal(my_color))
my_theme_color <- get_scale_color(get_pal(my_color))
my_theme_hex <- get_hex(my_color)


color_dark_text = "#222629"

# MY PLOT THEME
my_plot_theme <- function (base_size, base_family="Segoe UI Semibold"){ 
  dark_color="#222629"
  facet_header = "#78767647"
  dark_text = "#222629"
  
  half_line <- base_size/2
  theme_algoritma <- theme(
    
    plot.background = element_rect(fill=NA,colour = NA), #background plot
    plot.title = element_text(size = rel(1.2), margin = margin(b = half_line * 1.2), 
                              color= dark_text, hjust = 0, family=base_family, face = "bold"),
    plot.subtitle = element_text(size = rel(1.0), margin = margin(b = half_line * 1.2), color= dark_text, hjust=0),
    plot.margin=unit(c(0.5,0.5,0.5,0.5),"cm"),
    #plot.margin=unit(c(0.5,r=5,1,0.5),"cm"),
    
    panel.background = element_rect(fill="#18181800",colour = "#e8e8e8"), #background chart
    panel.border = element_rect(fill=NA,color = NA),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.x = element_blank(),
    panel.grid.major.y = element_line(color="#e8e8e8", linetype=2),
    panel.grid.minor.y = element_blank(),
    # panel.margin* is deprecated in ggplot2; spacing is set through panel.spacing below.
    panel.ontop = FALSE,
    panel.spacing = unit(1.2,"lines"),
    
    legend.background = element_rect(fill="#18181800",colour = NA),
    legend.text = element_text(size = rel(0.7),color=dark_text),
    legend.title =  element_text(colour = dark_text, size = base_size, lineheight = 0.8),
    legend.box = NULL, 
    
    # text = element_text(colour = "white", size = base_size, lineheight = 0.9, 
    #                    angle = 0, margin = margin(), debug = FALSE),
    axis.text = element_text(size = rel(0.8), color=dark_text),
    axis.text.x = element_text(colour = dark_text, size = base_size, margin = margin(t = 0.8 * half_line/2)),
    axis.text.y = element_text(colour = dark_text, size = base_size, margin = margin(r = 0.8 * half_line/2)),
    axis.title.x = element_text(colour = dark_text, size = base_size, lineheight = 0.8,
                                margin = margin(t = 0.8 * half_line, b = 0.8 * half_line/2)), 
    axis.title.y = element_text(colour = dark_text, size = base_size, lineheight = 0.8,
                                angle = 90, margin = margin(r = 0.8 * half_line, l = 0.8 * half_line/2)),
    axis.ticks = element_blank(),
    
    strip.background = element_rect(fill=facet_header,colour = NA),
    strip.text = element_text(colour = dark_text, size = rel(0.8)), 
    strip.text.x = element_text(margin = margin(t = half_line*0.8, b = half_line*0.8)), 
    strip.text.y = element_text(angle = -90, margin = margin(l = half_line, r = half_line)),
    strip.switch.pad.grid = unit(0.1, "cm"), 
    strip.switch.pad.wrap = unit(0.1, "cm"),
    complete = TRUE
    
  )
  
  # Return the theme explicitly so the result is visible to the caller.
  theme_algoritma
}

1 Project Description

In this Part 3, we build a model for personalized product recommendation, following the RFM-based customer analysis and segmentation in Part 2. The project is divided into three parts plus a dashboard:

  1. Part 1 : Data Preparation and Exploratory Analysis
  2. Part 2 : Customer Analysis and Segmentation
  3. Part 3 : Product Personalization
  4. Shiny Dashboard : CUSTMARKETS


I strongly recommend reading the parts in order, because each part builds on the previous one.



2 Data Preparation

In Part 3, our focus is building a model for personalized product recommendation. Here are the first 10 rows of the data:

3 Modeling : Development

To build this model we will test several algorithms, namely Association Rules, Item-based CF, and User-based CF, using Pearson correlation, cosine similarity, and Jaccard distance as the similarity measures.
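The three similarity measures named above can be illustrated on two small purchase vectors. This is a minimal sketch in base R, not the implementation used by recommenderlab; the vectors `a` and `b` and the helper names are made up for illustration. Note that Jaccard distance is simply 1 minus the Jaccard similarity computed here.

```r
# Two hypothetical purchase vectors over the same five products (1 = bought).
a <- c(1, 1, 0, 1, 0)
b <- c(1, 0, 0, 1, 1)

# Jaccard similarity: products bought by both / products bought by either.
jaccard_sim <- function(x, y) sum(x & y) / sum(x | y)

# Cosine similarity: dot product divided by the product of the vector norms.
cosine_sim <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

# Pearson correlation, using base R's cor().
pearson_sim <- function(x, y) cor(x, y, method = "pearson")

jaccard_sim(a, b)      # 2 shared purchases out of 4 distinct products
1 - jaccard_sim(a, b)  # the corresponding Jaccard distance
cosine_sim(a, b)
pearson_sim(a, b)
```

On binary data the measures can rank neighbours differently, because Pearson also rewards products that both parties did not buy, while Jaccard ignores them.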

3.1 Generate Matrix

Our data is a transaction history, so we can use a binaryRatingMatrix: an entry is set to 1 if the product was purchased and 0 if it was not. Because most entries are 0, a matrix of this kind is stored as a sparse matrix. We will try matrices built from Customer-Product data and from Invoice-Product data. The recommender system has one requirement: the customer must have made at least one transaction, or the product must have been purchased at least once.
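The 0/1 encoding described above can be sketched in base R. The `transactions` data frame below is hypothetical (the column names follow the output shown later in this post), and the original code that built the real matrices is not shown here, so treat this only as an illustration of the idea.

```r
# Hypothetical transaction log: one row per purchased line item.
transactions <- data.frame(
  customer_id = c("C1", "C1", "C2", "C3", "C3", "C3"),
  stock_code  = c("A",  "B",  "A",  "B",  "C",  "B"),
  stringsAsFactors = FALSE
)

# Cross-tabulate customers against products, then collapse counts to 0/1.
# C3 bought product B twice, but the binary matrix only records "bought".
counts <- table(transactions$customer_id, transactions$stock_code)
binary_matrix <- (unclass(counts) > 0) * 1L

binary_matrix
```

With recommenderlab loaded, a 0/1 matrix like this can typically be coerced with `as(binary_matrix, "binaryRatingMatrix")`, which stores it in sparse form.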

3.1.1 Matrix of Customer-Product

The customer-product matrix holds the transaction history of every product purchased by each customer. It can be used to give recommendations that are not tied to a specific moment. For example, in an m-commerce application, once a user has logged in and landed on the first page, the user receives recommendations immediately.

#> customer_id  stock_code 
#>        4293        3624
#> 4293 x 3624 rating matrix of class 'binaryRatingMatrix' with 257561 ratings.

3.1.2 Matrix of Invoice-Product without Drop NA Rows

The invoice-product matrix is commonly used for market basket analysis. It does not care who the user is; it only cares which products were bought together in a transaction. For example, in m-commerce, when a user adds a product to the shopping cart, the system recommends products that are similar to those already in the cart. For more detail, see Cross Selling and Market Basket Analysis.

#> invoice_no stock_code 
#>      18957       3768
#> 18957 x 3768 rating matrix of class 'binaryRatingMatrix' with 492627 ratings.

3.2 Model Validation with K-Fold

Modeling uses K-Fold Cross Validation with K = 5, so each fold trains on 80% of the data and tests on the remaining 20%.
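To see why K = 5 gives an 80/20 split in every fold, here is a minimal base R sketch of the fold assignment. It is only an illustration of the idea, not how recommenderlab builds its evaluation scheme; `n_rows` and the seed are arbitrary assumptions.

```r
set.seed(42)    # arbitrary seed, only to make the sketch reproducible

n_rows <- 100   # hypothetical number of customers (or invoices)
k      <- 5

# Assign each row to one of k equally sized folds, in random order.
fold_id <- sample(rep(seq_len(k), length.out = n_rows))

# For each fold, that fold's rows form the test set; all other rows train.
splits <- lapply(seq_len(k), function(fold) {
  list(train = which(fold_id != fold), test = which(fold_id == fold))
})

# Share of rows used for training in each fold.
train_share <- sapply(splits, function(s) length(s$train) / n_rows)
train_share
```

Every row lands in the test set of exactly one fold, so each model is scored on data it did not see during training.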

algorithms_binary <- list(
  "association_rules" = list(name  = "AR", param = list(support = 0.05, confidence = 0.75)),
  "popular"           = list(name  = "POPULAR", param = NULL),
  "random"            = list(name  = "RANDOM",  param = NULL),
  
  "ibcf_jaccard_5"    = list(name  = "IBCF", param = list(k = 5)),
  #"ibcf_jaccard_30"   = list(name  = "IBCF", param = list(k = 30)),
  #"ibcf_jaccard_50"   = list(name  = "IBCF", param = list(k = 50)),
  #"ibcf_jaccard_100"  = list(name  = "IBCF", param = list(k = 100)),
  #"ibcf_jaccard_200"  = list(name  = "IBCF", param = list(k = 200)),
  
  "ibcf_pearson_5"    = list(name  = "IBCF", param = list(method = "Pearson", k = 5)),
  #"ibcf_pearson_30"   = list(name  = "IBCF", param = list(method = "Pearson", k = 30)),
  #"ibcf_pearson_50"   = list(name  = "IBCF", param = list(method = "Pearson", k = 50)),
  #"ibcf_pearson_200"  = list(name  = "IBCF", param = list(method = "Pearson", k = 200)),
  
  "ibcf_cosine_100"     = list(name  = "IBCF", param = list(method = "Cosine", k = 100)),
  #"ibcf_cosine_200"    = list(name  = "IBCF", param = list(method = "Cosine", k = 200)),
  
  #"ubcf_jaccard_25"   = list(name  = "UBCF", param = list(nn = 25)),
  #"ubcf_jaccard_50"   = list(name  = "UBCF", param = list(nn = 50)),
  "ubcf_jaccard_100"  = list(name  = "UBCF", param = list(nn = 100)),
  #"ubcf_jaccard_200"  = list(name  = "UBCF", param = list(nn = 200)),
  #"ubcf_jaccard_300"  = list(name  = "UBCF", param = list(nn = 300)),

  #"ubcf_pearson_25"   = list(name  = "UBCF", param = list(method = "Pearson", nn = 25)),
  #"ubcf_pearson_50"   = list(name  = "UBCF", param = list(method = "Pearson", nn = 50)),
  "ubcf_pearson_100"  = list(name  = "UBCF", param = list(method = "Pearson", nn = 100)),
  #"ubcf_pearson_200"  = list(name  = "UBCF", param = list(method = "Pearson", nn = 200)),
  #"ubcf_pearson_300"  = list(name  = "UBCF", param = list(method = "Pearson", nn = 300)),

  #"ubcf_cosine_25"    = list(name  = "UBCF", param = list(method = "Cosine", nn = 25)),
  #"ubcf_cosine_50"    = list(name  = "UBCF", param = list(method = "Cosine", nn = 50)),
  "ubcf_cosine_100"   = list(name  = "UBCF", param = list(method = "Cosine", nn = 100))
  #"ubcf_cosine_200"   = list(name  = "UBCF", param = list(method = "Cosine", nn = 200)),
  #"ubcf_cosine_300"   = list(name  = "UBCF", param = list(method = "Cosine", nn = 300))
  )

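# memory.limit() is Windows-only and is no longer supported as of R 4.2,
# where this call has no effect.
memory.limit(size=56000)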
start <- Sys.time()
results_custprod <- recommenderlab::evaluate(scheme_custprod, 
                                    algorithms_binary, 
                                    type  = "topNList", 
                                    n     = c(1, 3, 5, 10, 15, 20))

results_invprod <- recommenderlab::evaluate(scheme_invprod, 
                                    algorithms_binary, 
                                    type  = "topNList", 
                                    n     = c(1, 3, 5, 10, 15, 20))

results_invprod_drop <- recommenderlab::evaluate(scheme_invprod_drop, 
                                    algorithms_binary, 
                                    type  = "topNList", 
                                    n     = c(1, 3, 5, 10, 15, 20))


end <- Sys.time()
# format() keeps the time unit (secs, mins, hours) that cat() alone would drop
cat("runtime:", format(end - start), "\n")

# Save each evaluation result under <working dir>/modeling_recommendation/development/
out_dir <- file.path(getwd(), "modeling_recommendation", "development")

saveRDS(object = results_custprod, file = file.path(out_dir, "results_custprod_fx.rds"))

saveRDS(object = results_invprod, file = file.path(out_dir, "results_invprod_fx.rds"))

saveRDS(object = results_invprod_drop, file = file.path(out_dir, "results_invprod_drop_fx.rds"))

results_custprod
results_invprod
results_invprod_drop

3.3 Model Evaluation

4 Modeling : Production

Based on the evaluation above, Item-based Collaborative Filtering gives better results on all three matrices. On the invoice-product matrix, if only one product is recommended, User-based CF is the better choice, because its curve drops sharply once more than one recommendation is given. The production environment will therefore use Item-based Collaborative Filtering.

4.1 Rating Matrix Customer-Product

#> Observations: 3,624
#> Variables: 2
#> $ stock_code  <chr> "10002", "10080", "10120", "10123C", "10124A", "10124G"...
#> $ description <chr> "INFLATABLE POLITICAL GLOBE", "GROOVY CACTUS INFLATABLE...
#> customer_id  stock_code 
#>        4293        3624
#> 4293 x 3624 rating matrix of class 'binaryRatingMatrix' with 257561 ratings.

4.2 Generate Recommendation Model

#>                      Length   Class     Mode     
#> description                 1 -none-    character
#> sim                  13133376 dgCMatrix S4       
#> k                           1 -none-    numeric  
#> method                      1 -none-    character
#> normalize_sim_matrix        1 -none-    logical  
#> alpha                       1 -none-    numeric  
#> verbose                     1 -none-    logical
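The summary above shows that the model stores an item-to-item similarity matrix (`sim`) and a neighbourhood size (`k`). The base R sketch below illustrates how such a model can turn a purchase history into a top-N list. It is a simplified illustration under assumed data, not recommenderlab's internal code; the `purchases` matrix, `k = 2`, and `recommend_top_n` are hypothetical.

```r
# Hypothetical binary customer-product matrix (rows = customers, cols = products).
purchases <- matrix(
  c(1, 1, 0, 0,
    1, 1, 1, 0,
    0, 1, 1, 0,
    0, 0, 1, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("cust", 1:4), c("A", "B", "C", "D"))
)

# Item-item Jaccard similarity: co-purchases / customers who bought either item.
co_bought <- crossprod(purchases)           # t(P) %*% P
item_freq <- diag(co_bought)
item_sim  <- co_bought / (outer(item_freq, item_freq, "+") - co_bought)
diag(item_sim) <- 0                          # an item is not its own neighbour

# Keep only the k most similar neighbours per item (k is illustrative).
k <- 2
item_sim_k <- apply(item_sim, 2, function(col) {
  col[rank(-col, ties.method = "first") > k] <- 0
  col
})

# Score products for one customer by summing similarity to owned products.
recommend_top_n <- function(customer, n = 2) {
  owned  <- purchases[customer, ] == 1
  scores <- rowSums(item_sim_k[, owned, drop = FALSE])
  scores[owned] <- NA                        # never recommend owned products
  names(sort(scores, decreasing = TRUE))[seq_len(min(n, sum(!owned)))]
}

recommend_top_n("cust1", n = 1)
```

Keeping only `k` neighbours per item is what keeps the stored similarity matrix sparse, at the cost of ignoring weaker relationships between products.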

4.3 Predict Recommendation

4.3.1 Get History Product

The recommendations are generated from the history of products purchased by the customer. As a simulation, take customer_id == 17850. The purchase history for this customer is as follows:

4.3.3 Predict Recommendation

Here are the top products that can be recommended to customer_id == 17850: