Pre-Start

Load the required packages. I use the apriori algorithm, so I need the arules library, plus arulesViz for visualization.

Background

In this exercise we will build association rules from a dataset taken from the Weka datasets. The dataset is available at data/supermarket.csv. supermarket.csv lists the items purchased in each transaction.

Data Preparation

Read the supermarket data.
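A minimal sketch of this step, assuming the file has one row per (transaction, item) pair with the columns TID and name that appear in the summary further down. The temporary file and its four rows are a hypothetical stand-in so the snippet runs on its own; in the real analysis the path would be data/supermarket.csv.

```r
# Hypothetical stand-in for data/supermarket.csv (same two columns, toy rows)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("TID,name",
             "1,bread and cake",
             "1,milk cream",
             "2,fruit",
             "2,bread and cake"),
           csv_path)

# Read item names as a factor so summary() reports per-item counts
supermarket <- read.csv(csv_path, stringsAsFactors = TRUE)

str(supermarket)
summary(supermarket)
```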

Data Overview

Summarize the data and look at it as a whole.

##       TID                    name      
##  Min.   :   1   bread and cake : 3330  
##  1st Qu.:1162   fruit          : 2962  
##  Median :2324   vegetables     : 2961  
##  Mean   :2317   milk cream     : 2939  
##  3rd Qu.:3476   baking needs   : 2795  
##  Max.   :4627   frozen foods   : 2717  
##                 (Other)        :61922

Exploratory Analysis

I will start by showing the items that are bought most often in this supermarket.
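One way to get such a ranking with base R is to count the rows per item and sort the counts. The data frame below is a small hypothetical example with the same TID and name columns, not the real supermarket data.

```r
# Toy (transaction, item) pairs, hypothetical values
supermarket <- data.frame(
  TID  = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5),
  name = c("bread and cake", "milk cream", "fruit",
           "bread and cake", "fruit",
           "bread and cake", "milk cream", "vegetables",
           "milk cream", "vegetables",
           "bread and cake")
)

# Count how many rows mention each item, most frequent first
item_counts <- sort(table(supermarket$name), decreasing = TRUE)
head(item_counts, 3)

# Optional bar chart of the most frequent items
barplot(head(item_counts, 3), las = 2, ylab = "Number of purchases")
```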

Build Rules

Next, I build rules with the apriori algorithm. First, convert the data frame into a list.

## $`1`
##  [1] baby needs         bread and cake     baking needs       juice sat cord ms 
##  [5] biscuits           canned vegetables  cleaners polishers coffee            
##  [9] sauces gravy pkle  confectionary      dishcloths scour   frozen foods      
## [13] razor blades       party snack foods  tissues paper prd  wrapping          
## [17] mens toiletries    cheese             milk cream         margarine         
## [21] small goods        fruit              vegetables         750ml white nz    
## 100 Levels: 750ml red imp  750ml red nz  750ml white imp  ... wrapping
## 
## $`2`
##  [1] canned fish meat   canned fruit       canned vegetables  sauces gravy pkle 
##  [5] deod disinfectant  frozen foods       pet foods          laundry needs     
##  [9] tissues paper prd  deodorants soap    haircare           milk cream        
## [13] fruit              vegetables        
## 100 Levels: 750ml red imp  750ml red nz  750ml white imp  ... wrapping
## 
## $`3`
##  [1] bread and cake     baking needs       juice sat cord ms  biscuits          
##  [5] canned fruit       sauces gravy pkle  puddings deserts   wrapping          
##  [9] health food other  small goods        dairy foods        beef              
## [13] lamb               fruit              vegetables         stationary        
## 100 Levels: 750ml red imp  750ml red nz  750ml white imp  ... wrapping
## 
## $`4`
##  [1] bread and cake     baking needs       juice sat cord ms  biscuits          
##  [5] canned vegetables  breakfast food     cleaners polishers frozen foods      
##  [9] jams spreads       pet foods          party snack foods  tissues paper prd 
## [13] deodorants soap    mens toiletries    cheese             margarine         
## [17] dairy foods        beef               stationary         prepared meals    
## 100 Levels: 750ml red imp  750ml red nz  750ml white imp  ... wrapping
## 
## $`5`
##  [1] bread and cake     baking needs       juice sat cord ms  tea               
##  [5] cleaners polishers coffee             sauces gravy pkle  frozen foods      
##  [9] jams spreads       laundry needs      wrapping           deodorants soap   
## [13] haircare           dental needs       meat misc          milk cream        
## [17] margarine          beef               poultry            potatoes          
## [21] vegetables         condiments         small goods2      
## 100 Levels: 750ml red imp  750ml red nz  750ml white imp  ... wrapping
## 
## $`6`
##  [1] bread and cake     baking needs       juice sat cord ms  tea               
##  [5] biscuits           canned vegetables  breakfast food     confectionary     
##  [9] frozen foods       spices             party snack foods  tissues paper prd 
## [13] deodorants soap    margarine          dairy foods        fruit             
## [17] potatoes           vegetables         stationary         bake off products 
## 100 Levels: 750ml red imp  750ml red nz  750ml white imp  ... wrapping
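A list shaped like the one above, with one element per transaction ID, can be obtained with split(). This is a sketch on a hypothetical toy data frame; the name item_list is my own choice for illustration.

```r
# Toy (transaction, item) pairs, hypothetical values
supermarket <- data.frame(
  TID  = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
  name = c("bread and cake", "milk cream", "fruit",
           "bread and cake", "fruit",
           "bread and cake", "milk cream", "vegetables",
           "milk cream", "vegetables")
)

# Group the item names by transaction ID: one vector of items per TID
item_list <- split(supermarket$name, supermarket$TID)
head(item_list)
```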

Then, convert the list into the transactions class. This makes it easier to run the apriori algorithm.

##     items                transactionID
## [1] {750ml white nz ,                 
##      baby needs ,                     
##      baking needs ,                   
##      biscuits,                        
##      bread and cake ,                 
##      canned vegetables ,              
##      cheese,                          
##      cleaners polishers,              
##      coffee,                          
##      confectionary,                   
##      dishcloths scour,                
##      frozen foods ,                   
##      fruit,                           
##      juice sat cord ms,               
##      margarine,                       
##      mens toiletries ,                
##      milk cream,                      
##      party snack foods ,              
##      razor blades ,                   
##      sauces gravy pkle,               
##      small goods ,                    
##      tissues paper prd ,              
##      vegetables,                      
##      wrapping}                       1
## [2] {canned fish meat ,               
##      canned fruit ,                   
##      canned vegetables ,              
##      deod disinfectant,               
##      deodorants soap,                 
##      frozen foods ,                   
##      fruit,                           
##      haircare,                        
##      laundry needs ,                  
##      milk cream,                      
##      pet foods ,                      
##      sauces gravy pkle,               
##      tissues paper prd ,              
##      vegetables}                     2
## [3] {baking needs ,                   
##      beef,                            
##      biscuits,                        
##      bread and cake ,                 
##      canned fruit ,                   
##      dairy foods ,                    
##      fruit,                           
##      health food other ,              
##      juice sat cord ms,               
##      lamb,                            
##      puddings deserts,                
##      sauces gravy pkle,               
##      small goods ,                    
##      stationary,                      
##      vegetables,                      
##      wrapping}                       3
## [1] 4601  100
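The conversion can be sketched with the coercion method that arules provides for lists. The four toy baskets are hypothetical; the object name sup_transaction is taken from the mining info printed later in this page.

```r
library(arules)

# Toy baskets keyed by transaction ID, hypothetical values
item_list <- list(
  "1" = c("bread and cake", "milk cream", "fruit"),
  "2" = c("bread and cake", "fruit"),
  "3" = c("bread and cake", "milk cream", "vegetables"),
  "4" = c("milk cream", "vegetables")
)

# Coerce the list into a sparse transactions object
sup_transaction <- as(item_list, "transactions")

inspect(head(sup_transaction, 3))
dim(sup_transaction)  # number of transactions, number of distinct items
```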

Build the rules using the apriori function with the parameters supp = 0.1 and conf = 0.75.

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##        0.75    0.1    1 none FALSE            TRUE       5     0.1      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 460 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[100 item(s), 4601 transaction(s)] done [0.00s].
## sorting and recoding items ... [47 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 done [0.05s].
## writing ... [9958 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
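A call with the thresholds reported above might look like the sketch below. It runs on four hypothetical toy baskets instead of the supermarket data, and it adds minlen = 2, which is my own addition to skip rules with an empty left-hand side on such a small example; the log above shows the original run used minlen = 1.

```r
library(arules)

# Toy baskets, hypothetical values
item_list <- list(
  "1" = c("bread and cake", "milk cream", "fruit"),
  "2" = c("bread and cake", "fruit"),
  "3" = c("bread and cake", "milk cream", "vegetables"),
  "4" = c("milk cream", "vegetables")
)
sup_transaction <- as(item_list, "transactions")

# Mine rules with minimum support 0.1 and minimum confidence 0.75
rules <- apriori(
  sup_transaction,
  parameter = list(supp = 0.1, conf = 0.75, minlen = 2),
  control   = list(verbose = FALSE)  # suppress the progress log
)

summary(rules)
```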

Summarize the rules.

## set of 9958 rules
## 
## rule length distribution (lhs + rhs):sizes
##    2    3    4    5    6    7 
##   41  720 3440 4367 1320   70 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   4.000   5.000   4.644   5.000   7.000 
## 
## summary of quality measures:
##     support         confidence        coverage           lift      
##  Min.   :0.1002   Min.   :0.7500   Min.   :0.1098   Min.   :1.042  
##  1st Qu.:0.1067   1st Qu.:0.7716   1st Qu.:0.1324   1st Qu.:1.196  
##  Median :0.1161   Median :0.7995   Median :0.1452   Median :1.252  
##  Mean   :0.1259   Mean   :0.8080   Mean   :0.1563   Mean   :1.260  
##  3rd Qu.:0.1332   3rd Qu.:0.8408   3rd Qu.:0.1674   3rd Qu.:1.313  
##  Max.   :0.5079   Max.   :0.9205   Max.   :0.6438   Max.   :1.594  
##      count       
##  Min.   : 461.0  
##  1st Qu.: 491.0  
##  Median : 534.0  
##  Mean   : 579.2  
##  3rd Qu.: 613.0  
##  Max.   :2337.0  
## 
## mining info:
##             data ntransactions support confidence
##  sup_transaction          4601     0.1       0.75

Rules Interpretation

Compare the rules by confidence, support, and lift.

##     lhs                     rhs                 support confidence  coverage     lift count
## [1] {biscuits,                                                                             
##      frozen foods ,                                                                        
##      milk cream,                                                                           
##      pet foods ,                                                                           
##      vegetables}         => {bread and cake } 0.1032384  0.9205426 0.1121495 1.271897   475
## [2] {baking needs ,                                                                        
##      biscuits,                                                                             
##      fruit,                                                                                
##      margarine,                                                                            
##      milk cream,                                                                           
##      vegetables}         => {bread and cake } 0.1008476  0.9188119 0.1097587 1.269506   464
## [3] {biscuits,                                                                             
##      frozen foods ,                                                                        
##      margarine,                                                                            
##      milk cream,                                                                           
##      vegetables}         => {bread and cake } 0.1167138  0.9179487 0.1271463 1.268313   537
## [4] {biscuits,                                                                             
##      canned vegetables ,                                                                   
##      frozen foods ,                                                                        
##      fruit,                                                                                
##      vegetables}         => {bread and cake } 0.1069333  0.9179104 0.1164964 1.268260   492
## [5] {baking needs ,                                                                        
##      frozen foods ,                                                                        
##      fruit,                                                                                
##      margarine,                                                                            
##      milk cream,                                                                           
##      vegetables}         => {bread and cake } 0.1030211  0.9168279 0.1123669 1.266764   474

A high confidence value indicates a high probability that item B is bought given that item A has already been bought. In the table above, every one of the highest-confidence rules has bread and cake on the right-hand side. This means that many different combinations of purchased items go together with a purchase of bread and cake, about 92% of the time for the top rules. Note that bread and cake is already the most frequently bought item, which is why the lift of these rules is only about 1.27.
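Rankings like the ones in this section can be produced by sorting the rule set on a quality measure and inspecting the first few rules. The sketch below does this on hypothetical toy baskets; minlen = 2 is my own addition for the toy data.

```r
library(arules)

# Toy baskets, hypothetical values
item_list <- list(
  "1" = c("bread and cake", "milk cream", "fruit"),
  "2" = c("bread and cake", "fruit"),
  "3" = c("bread and cake", "milk cream", "vegetables"),
  "4" = c("milk cream", "vegetables")
)
rules <- apriori(
  as(item_list, "transactions"),
  parameter = list(supp = 0.1, conf = 0.75, minlen = 2),
  control   = list(verbose = FALSE)
)

# sort() orders rules from highest to lowest value of the chosen measure
by_conf <- sort(rules, by = "confidence")
by_supp <- sort(rules, by = "support")
by_lift <- sort(rules, by = "lift")

# Show the top five rules for one of the rankings
inspect(head(by_lift, 5))
```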

##     lhs                rhs               support   confidence coverage 
## [1] {milk cream}    => {bread and cake } 0.5079331 0.7951684  0.6387742
## [2] {fruit}         => {bread and cake } 0.5053249 0.7849426  0.6437731
## [3] {vegetables}    => {bread and cake } 0.4994566 0.7760892  0.6435557
## [4] {baking needs } => {bread and cake } 0.4762008 0.7838998  0.6074766
## [5] {frozen foods } => {bread and cake } 0.4627255 0.7835848  0.5905238
##     lift     count
## [1] 1.098670 2337 
## [2] 1.084541 2325 
## [3] 1.072308 2298 
## [4] 1.083100 2191 
## [5] 1.082665 2129

The support value is the probability that a transaction contains an item or a combination of items. In other words, support is the ratio of transactions containing those items to the total number of transactions. In this dataset, milk cream => bread and cake has the highest support of all rules, about 0.51, meaning roughly half of all transactions contain both milk cream and bread and cake. The matching confidence of about 0.80 says that roughly 80% of the customers who buy milk cream also buy bread and cake.
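The support and confidence of the first rule can be checked by hand from the counts printed on this page. This is a small arithmetic sketch; it assumes that each item appears at most once per transaction, so the 2939 rows for milk cream in the data summary correspond to 2939 transactions.

```r
n_transactions <- 4601  # number of transactions, from the dim() output
count_both     <- 2337  # transactions with milk cream AND bread and cake
count_lhs      <- 2939  # transactions with milk cream (assumption, see above)

# support: share of all transactions that contain both items
support_rule <- count_both / n_transactions

# coverage: share of all transactions that contain the left-hand side
coverage_lhs <- count_lhs / n_transactions

# confidence: share of milk cream baskets that also contain bread and cake
confidence_rule <- support_rule / coverage_lhs

round(c(support = support_rule,
        coverage = coverage_lhs,
        confidence = confidence_rule), 4)
```

The three values agree with the support, coverage, and confidence columns of rule [1] in the table above.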

##     lhs                     rhs                    support confidence  coverage     lift count
## [1] {baking needs ,                                                                           
##      biscuits,                                                                                
##      bread and cake ,                                                                         
##      juice sat cord ms,                                                                       
##      sauces gravy pkle}  => {party snack foods } 0.1010650  0.8072917 0.1251902 1.594141   465
## [2] {laundry needs ,                                                                          
##      wrapping}           => {tissues paper prd } 0.1038905  0.7697262 0.1349707 1.576106   478
## [3] {biscuits,                                                                                
##      bread and cake ,                                                                         
##      frozen foods ,                                                                           
##      juice sat cord ms,                                                                       
##      sauces gravy pkle}  => {party snack foods } 0.1056292  0.7928222 0.1332319 1.565569   486
## [4] {biscuits,                                                                                
##      margarine,                                                                               
##      wrapping}           => {tissues paper prd } 0.1006303  0.7565359 0.1330146 1.549097   463
## [5] {baking needs ,                                                                           
##      biscuits,                                                                                
##      cheese,                                                                                  
##      tissues paper prd } => {margarine}          0.1019344  0.7701149 0.1323625 1.548645   469

Lift measures how much one event raises the likelihood of another. Here, it is how much buying one set of items increases the probability that another item is bought. For example, rule 1 in the table above has a lift of about 1.59. A lift > 1 means that buying baking needs, biscuits, bread and cake, juice sat cord ms, and sauces gravy pkle increases the probability of buying party snack foods.
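Lift is the confidence of a rule divided by the support of its right-hand side. The sketch below first checks this on the milk cream => bread and cake rule, because the count for bread and cake (3330) is available in the data summary, and then backs out the baseline share of party snack foods that rule 1 above implies. It assumes each item appears at most once per transaction; the baseline share is a derived value, not one printed in the output.

```r
n_transactions <- 4601

# Rule: {milk cream} => {bread and cake}
confidence_rule <- 2337 / 2939           # baskets with both / baskets with milk cream
support_rhs     <- 3330 / n_transactions # share of baskets with bread and cake
lift_rule       <- confidence_rule / support_rhs
round(lift_rule, 4)

# Rule 1 in the lift table: confidence 0.8072917, lift 1.594141.
# Dividing one by the other gives the implied baseline share of
# baskets that contain party snack foods.
baseline_party_snacks <- 0.8072917 / 1.594141
round(baseline_party_snacks, 4)
```

Read this way, a lift of 1.59 says that party snack foods show up in about 81% of the baskets matching the left-hand side, against roughly 51% of all baskets.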

Visualization

Visualize the rules that were built as a graph so they are easier to analyze.

The graph shows how the purchase of one item relates to the purchase of other items. Arrows point from the items on the left-hand side of a rule toward the item on its right-hand side. Here, we can see which items are bought most often and are linked to purchases of other items. This method works well in the retail industry because it can help with stocking (supply chain) and with increasing monthly revenue. Knowing which items sell together, a supplier can identify the items with the most potential to be sold next.
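A graph of this kind can be drawn with the plot method from arulesViz. The sketch below is an assumption about how such a figure could be made, not the exact call used for this page: it runs on hypothetical toy baskets and keeps only the ten rules with the highest lift, since a graph of all rules would be unreadable.

```r
library(arules)
library(arulesViz)

# Toy baskets, hypothetical values
item_list <- list(
  "1" = c("bread and cake", "milk cream", "fruit"),
  "2" = c("bread and cake", "fruit"),
  "3" = c("bread and cake", "milk cream", "vegetables"),
  "4" = c("milk cream", "vegetables")
)
rules <- apriori(
  as(item_list, "transactions"),
  parameter = list(supp = 0.1, conf = 0.75, minlen = 2),
  control   = list(verbose = FALSE)
)

# Keep at most the ten strongest rules by lift to keep the graph legible
top_rules <- head(sort(rules, by = "lift"), 10)

# Items and rules become nodes; edges run from lhs items to the rule
# and from the rule to its rhs item
plot(top_rules, method = "graph")
```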

Ending

So, that’s all for the process of market basket analysis and visualization with the apriori algorithm using packages in the R programming language. I hope this page helps you understand the problem and the solution behind it.

See you in the other page!

Author,
Alfado Sembiring

Notes :
In case you want to look up my profile, click the link below :
Jump To My Profile