1 Market Basket Analysis

  Market basket analysis is a data mining technique used to uncover purchasing patterns in retail. Its goal is to understand consumer behavior by identifying associations between items that are bought together.

2 Source Code

> summary(Data)
transactions as itemMatrix in sparse format with
 300 rows (elements/itemsets/transactions) and
 140 columns (items) and a density of 0.0352381 

most frequent items:
      whole milk other vegetables             soda       rolls/buns 
              83               61               59               56 
   shopping bags          (Other) 
              47             1174 

element (itemset/transaction) length distribution:
sizes
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 32 
61 43 31 45 30 15 14  6 12 11  7  7  3  4  3  1  3  1  2  1 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   2.000   4.000   4.933   6.250  32.000 

includes extended item information - examples:
            labels
1 abrasive cleaner
2   baby cosmetics
3    baking powder

The data set contains 300 transactions and 140 distinct items sold at the store. A transaction holds 4.93 items on average, with a median of 4 and a maximum of 32.
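The density reported by summary() ties these figures together: it is the share of filled cells in the 300 x 140 transaction-by-item matrix. The base R sketch below re-derives the mean basket size and the total number of purchased items from the printed values only. The variable names are illustrative and are not part of the original session.

```r
# Values copied from the summary(Data) output above
n_transactions <- 300
n_items        <- 140
density        <- 0.0352381

# density = filled cells / all cells, so density * n_items is the
# average number of items per transaction (the reported Mean)
mean_basket_size <- density * n_items

# Total number of item occurrences over all transactions
total_items <- density * n_items * n_transactions

# The same total follows from the "most frequent items" table
total_from_table <- sum(c(83, 61, 59, 56, 47, 1174))

print(round(mean_basket_size, 3))
print(round(total_items))
print(total_from_table)
```

Both routes should give the same total, which is a quick consistency check on the summary.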

> apply(Data@data[,1:6], 2, function(r)
+   paste(Data@itemInfo[r,"labels"], collapse=","))
[1] "frozen vegetables,fruit/vegetable juice,kitchen towels,margarine,oil,other vegetables,tropical fruit,UHT-milk,whipped/sour cream"                                       
[2] "baking powder,bottled water,butter,frankfurter,frozen vegetables,long life bakery product,misc. beverages,newspapers,rolls/buns,sugar,tropical fruit,whipped/sour cream"
[3] "coffee"                                                                                                                                                                 
[4] "bottled beer,chewing gum,ham,rolls/buns"                                                                                                                                
[5] "bottled water,curd,other vegetables,pickled vegetables,yogurt"                                                                                                          
[6] "butter milk,curd,sausage,whipped/sour cream,whole milk"                                                                                                                 

The output above lists the items in the first six transactions.
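The apply() call works because the transaction data is stored as an incidence matrix with items in rows and transactions in columns, so each column selects the labels of one basket. The following sketch reproduces that idea on a hypothetical toy matrix in base R. The item names and the three toy transactions are made up for illustration, and an ordinary logical matrix stands in for the sparse matrix used by arules.

```r
# Hypothetical toy data: 4 items (rows) x 3 transactions (columns)
labels <- c("butter", "coffee", "curd", "whole milk")
incidence <- matrix(
  c(TRUE,  FALSE, FALSE, TRUE,    # transaction 1
    FALSE, TRUE,  FALSE, FALSE,   # transaction 2
    FALSE, FALSE, TRUE,  TRUE),   # transaction 3
  nrow = 4, dimnames = list(labels, NULL)
)

# For every column, keep the labels whose cell is TRUE and join them
baskets <- apply(incidence, 2, function(r) paste(labels[r], collapse = ","))
print(baskets)
```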

2.1 Frequent Itemsets

With a minimum support of 0.02, an itemset must appear in at least 6 of the 300 transactions to count as frequent:

> itemsets1 <- apriori(Data, parameter=list(minlen=1, maxlen=1,support=0.02,target="frequent itemsets"))
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
         NA    0.1    1 none FALSE            TRUE       5    0.02      1
 maxlen            target  ext
      1 frequent itemsets TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 6 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[140 item(s), 300 transaction(s)] done [0.00s].
sorting and recoding items ... [67 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 done [0.00s].
sorting transactions ... done [0.00s].
writing ... [67 set(s)] done [0.00s].
creating S4 object  ... done [0.00s].
> 
> inspect(head(sort(itemsets1,by="support"),10))
     items                support    count
[1]  {whole milk}         0.27666667 83   
[2]  {other vegetables}   0.20333333 61   
[3]  {soda}               0.19666667 59   
[4]  {rolls/buns}         0.18666667 56   
[5]  {shopping bags}      0.15666667 47   
[6]  {yogurt}             0.15000000 45   
[7]  {root vegetables}    0.13000000 39   
[8]  {tropical fruit}     0.10666667 32   
[9]  {bottled water}      0.10333333 31   
[10] {whipped/sour cream} 0.09666667 29   

There are 67 frequent 1-itemsets, that is, 67 of the 140 items have a support of at least 0.02.
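Support is simply the count of transactions containing an itemset divided by the total number of transactions. As a minimal sketch, the base R code below recomputes the support column of the top ten items from their counts and checks them against the threshold. All numbers are copied from the output above and the variable names are illustrative.

```r
# Figures copied from the output above
n_transactions <- 300
min_support    <- 0.02

# Absolute counts of the ten most frequent single items
counts <- c("whole milk" = 83, "other vegetables" = 61, "soda" = 59,
            "rolls/buns" = 56, "shopping bags" = 47, "yogurt" = 45,
            "root vegetables" = 39, "tropical fruit" = 32,
            "bottled water" = 31, "whipped/sour cream" = 29)

# Relative support = count / number of transactions
support <- counts / n_transactions

# Smallest count that still reaches the threshold
min_count <- round(min_support * n_transactions)

print(round(support, 4))
print(min_count)
print(all(counts >= min_count))
```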

> itemsets2<- apriori(Data, parameter=list(minlen=2, maxlen=2, support=0.02,target="frequent itemsets"))
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
         NA    0.1    1 none FALSE            TRUE       5    0.02      2
 maxlen            target  ext
      2 frequent itemsets TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 6 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[140 item(s), 300 transaction(s)] done [0.00s].
sorting and recoding items ... [67 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 done [0.00s].
sorting transactions ... done [0.00s].
writing ... [148 set(s)] done [0.00s].
creating S4 object  ... done [0.00s].
> 
> inspect(head(sort(itemsets2,by="support"),10))
     items                               support    count
[1]  {other vegetables, whole milk}      0.09333333 28   
[2]  {root vegetables, whole milk}       0.08000000 24   
[3]  {tropical fruit, whole milk}        0.06333333 19   
[4]  {soda, whole milk}                  0.06333333 19   
[5]  {rolls/buns, whole milk}            0.06000000 18   
[6]  {whole milk, yogurt}                0.05666667 17   
[7]  {other vegetables, rolls/buns}      0.05666667 17   
[8]  {whipped/sour cream, whole milk}    0.05333333 16   
[9]  {other vegetables, shopping bags}   0.05333333 16   
[10] {other vegetables, root vegetables} 0.05333333 16   

There are 148 frequent 2-itemsets, that is, 148 pairs of items have a support of at least 0.02.
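A frequent pair can only be built from two items that are frequent themselves, so the 67 frequent items bound the number of pairs that can possibly qualify. The short sketch below puts the 148 frequent pairs in relation to that bound, using only counts taken from the log above. The variable names are illustrative.

```r
# Counts taken from the apriori log output above
n_frequent_items <- 67    # frequent 1-itemsets
n_frequent_pairs <- 148   # frequent 2-itemsets

# Number of distinct pairs that can be formed from the frequent items
possible_pairs <- choose(n_frequent_items, 2)

# Share of those possible pairs that actually reach the support threshold
share_frequent <- n_frequent_pairs / possible_pairs

print(possible_pairs)
print(round(share_frequent, 4))
```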

> itemsets3<- apriori(Data, parameter=list(minlen=3, maxlen=3, support=0.02,target="frequent itemsets"))
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
         NA    0.1    1 none FALSE            TRUE       5    0.02      3
 maxlen            target  ext
      3 frequent itemsets TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 6 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[140 item(s), 300 transaction(s)] done [0.00s].
sorting and recoding items ... [67 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 done [0.00s].
sorting transactions ... done [0.00s].
writing ... [33 set(s)] done [0.00s].
creating S4 object  ... done [0.00s].
> 
> inspect(head(sort(itemsets3,by="support"),10))
     items                                             support    count
[1]  {other vegetables, root vegetables, whole milk}   0.03666667 11   
[2]  {root vegetables, whole milk, yogurt}             0.03333333 10   
[3]  {butter, other vegetables, whole milk}            0.03000000  9   
[4]  {butter, root vegetables, whole milk}             0.02666667  8   
[5]  {root vegetables, whipped/sour cream, whole milk} 0.02666667  8   
[6]  {other vegetables, tropical fruit, whole milk}    0.02666667  8   
[7]  {rolls/buns, root vegetables, whole milk}         0.02666667  8   
[8]  {rolls/buns, whole milk, yogurt}                  0.02666667  8   
[9]  {other vegetables, pork, whole milk}              0.02333333  7   
[10] {citrus fruit, other vegetables, whole milk}      0.02333333  7   

There are 33 frequent 3-itemsets, that is, 33 combinations of three items have a support of at least 0.02.
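The drop from 148 pairs to 33 triples reflects the property that Apriori relies on: an itemset can never be more frequent than any of its subsets. As a small check, the sketch below compares the top 3-itemset with its three 2-item subsets, using counts copied from the inspect() outputs above. The variable names are illustrative.

```r
# Counts copied from the inspect() outputs above
triple_count <- 11   # {other vegetables, root vegetables, whole milk}
pair_counts  <- c("other vegetables, whole milk"      = 28,
                  "root vegetables, whole milk"       = 24,
                  "other vegetables, root vegetables" = 16)

# A superset can never occur more often than any of its subsets
closure_holds <- all(triple_count <= pair_counts)

# The rarest 2-item subset gives an upper bound on the triple's count
upper_bound <- min(pair_counts)

print(closure_holds)
print(upper_bound)
```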

2.2 Confidence

> rules<-apriori(Data, parameter=list(supp=0.02))
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.8    0.1    1 none FALSE            TRUE       5    0.02      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 6 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[140 item(s), 300 transaction(s)] done [0.00s].
sorting and recoding items ... [67 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 done [0.00s].
writing ... [13 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
> 
> inspect(head(sort(rules,by="confidence"),5))
    lhs                                      rhs          support    confidence
[1] {butter, fruit/vegetable juice}       => {whole milk} 0.02000000 1.0000000 
[2] {butter, other vegetables}            => {whole milk} 0.03000000 1.0000000 
[3] {rolls/buns, root vegetables, yogurt} => {whole milk} 0.02000000 1.0000000 
[4] {root vegetables, whipped/sour cream} => {whole milk} 0.02666667 0.8888889 
[5] {butter, whipped/sour cream}          => {whole milk} 0.02333333 0.8750000 
    coverage   lift     count
[1] 0.02000000 3.614458 6    
[2] 0.03000000 3.614458 9    
[3] 0.02000000 3.614458 6    
[4] 0.03000000 3.212851 8    
[5] 0.02666667 3.162651 7    

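The output above lists the five rules with the highest confidence. Because the call does not set a confidence threshold, apriori uses its default minimum confidence of 0.8, which leaves 13 rules. Rule [1] has a confidence of 1.0: all 6 transactions that contain both butter and fruit/vegetable juice also contain whole milk.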
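Confidence is the number of transactions containing both sides of a rule divided by the number containing its left-hand side, and coverage is the support of the left-hand side. The sketch below recomputes the confidence column from the count and coverage columns printed above. The data frame and its column names are illustrative and are not part of the original session.

```r
# count and coverage copied from the inspect() output above
n_transactions <- 300
top_rules <- data.frame(
  lhs = c("butter, fruit/vegetable juice",
          "butter, other vegetables",
          "rolls/buns, root vegetables, yogurt",
          "root vegetables, whipped/sour cream",
          "butter, whipped/sour cream"),
  count    = c(6, 9, 6, 8, 7),
  coverage = c(0.02, 0.03, 0.02, 0.03, 0.02666667)
)

# Transactions containing the left-hand side = coverage * n
top_rules$lhs_count <- round(top_rules$coverage * n_transactions)

# confidence = count(lhs and rhs) / count(lhs)
top_rules$confidence <- top_rules$count / top_rules$lhs_count

print(top_rules[, c("lhs", "count", "lhs_count", "confidence")])
```

For rule [4], for instance, 8 of the 9 transactions with root vegetables and whipped/sour cream also contain whole milk, which gives the printed confidence of 0.8889.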

2.3 Lift

> inspect(head(sort(rules,by="lift"),5))
    lhs                                      rhs               support
[1] {onions, whole milk}                  => {root vegetables} 0.02   
[2] {hard cheese}                         => {rolls/buns}      0.02   
[3] {butter, fruit/vegetable juice}       => {whole milk}      0.02   
[4] {butter, other vegetables}            => {whole milk}      0.03   
[5] {rolls/buns, root vegetables, yogurt} => {whole milk}      0.02   
    confidence coverage   lift     count
[1] 0.8571429  0.02333333 6.593407 6    
[2] 0.8571429  0.02333333 4.591837 6    
[3] 1.0000000  0.02000000 3.614458 6    
[4] 1.0000000  0.03000000 3.614458 9    
[5] 1.0000000  0.02000000 3.614458 6    

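The output above lists the five rules with the highest lift. Rule [1], {onions, whole milk} => {root vegetables}, has a lift of 6.59: transactions that contain onions and whole milk include root vegetables about 6.6 times as often as transactions in general (confidence 0.857 against a root vegetables support of 0.13).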
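Lift divides the confidence of a rule by the support of its right-hand side, so a value above 1 means the right-hand side appears more often than expected once the left-hand side is present. The sketch below recomputes the lift column from figures printed earlier in this report. The data frame and its column names are illustrative.

```r
n_transactions <- 300

# Counts of the right-hand-side items, from the frequent 1-itemsets output
rhs_count <- c("root vegetables" = 39, "rolls/buns" = 56, "whole milk" = 83)

# Confidence as a fraction: 0.8571429 corresponds to 6 out of 7 transactions
lift_rules <- data.frame(
  lhs = c("onions, whole milk",
          "hard cheese",
          "butter, fruit/vegetable juice",
          "butter, other vegetables",
          "rolls/buns, root vegetables, yogurt"),
  rhs = c("root vegetables", "rolls/buns",
          "whole milk", "whole milk", "whole milk"),
  confidence = c(6 / 7, 6 / 7, 1, 1, 1),
  stringsAsFactors = FALSE
)

# lift = confidence / support(rhs)
rhs_support     <- rhs_count[lift_rules$rhs] / n_transactions
lift_rules$lift <- unname(lift_rules$confidence / rhs_support)

print(lift_rules)
```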

2.4 Plot

> highLiftRules<-head(sort(rules,by="lift"),5)
> plot(highLiftRules,method="graph",
+      control=list(type="items",
+                   edges = ggraph::geom_edge_link(
+                     end_cap = ggraph::circle(4, "mm"),
+                     start_cap = ggraph::circle(4, "mm"),
+                     color = "black",
+                     arrow = grid::arrow(length = grid::unit(2, "mm"), angle = 20, type = "closed"),
+                     alpha = 0.3)))
Available control parameters (with default values):
layout   =  stress
circular     =  FALSE
ggraphdots   =  NULL
edges    =  <environment>
nodes    =  <environment>
nodetext     =  <environment>
colors   =  c("#EE0000FF", "#EEEEEEFF")
engine   =  ggplot2
max  =  100
verbose  =  FALSE

The parameter listing above appears to be printed because the control list contains an entry that the ggplot2 engine does not recognize, most likely type, which is not among the available parameters. The graph is still drawn with the remaining settings.

In the graph, the size of each rule circle encodes support and its color encodes lift. The color scale is relative to the five plotted rules, so grey marks the lowest lift among them and not a weak association in absolute terms: every plotted rule has a lift above 3.

About 2% of all transactions contain both hard cheese and rolls/buns (support 0.02, shown by the circle size). The association is strong (lift 4.59), which is why the circle is red.

About 2% of all transactions contain rolls/buns, yogurt, root vegetables, and whole milk together. With a lift of 3.61, the rule is among the weakest of the five, so its circle is grey.

About 2% of all transactions contain fruit/vegetable juice, butter, and whole milk together. Its lift is also 3.61, so its circle is grey as well.

About 3% of all transactions contain butter, other vegetables, and whole milk together. Its lift is again 3.61, shown by a grey circle.

About 2% of all transactions contain whole milk, onions, and root vegetables together. This rule has the strongest association of the five (lift 6.59), shown by the dark red circle.