Market Basket Analysis (Association Analysis) I

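Market basket analysis, also called association analysis, mines large volumes of transaction data for hidden business rules that link items to one another; the classic example is beer and diapers. The walkthrough below works through a simple example in R.

First, install and load the arules package.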

# install.packages("arules", repos="http://cran.us.r-project.org")
library ( arules )

## Warning: package 'arules' was built under R version 4.2.3

## Loading required package: Matrix

## Warning: package 'Matrix' was built under R version 4.2.3

## 
## Attaching package: 'arules'

## The following objects are masked from 'package:base':
## 
##     abbreviate, write

Next, load the Groceries data set that ships with arules and print its summary.

data("Groceries")                                
summary(Groceries)

## transactions as itemMatrix in sparse format with
##  9835 rows (elements/itemsets/transactions) and
##  169 columns (items) and a density of 0.02609146 
## 
## most frequent items:
##       whole milk other vegetables       rolls/buns             soda 
##             2513             1903             1809             1715 
##           yogurt          (Other) 
##             1372            34055 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
## 2159 1643 1299 1005  855  645  545  438  350  246  182  117   78   77   55   46 
##   17   18   19   20   21   22   23   24   26   27   28   29   32 
##   29   14   14    9   11    4    6    1    1    1    1    3    1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   4.409   6.000  32.000 
## 
## includes extended item information - examples:
##        labels  level2           level1
## 1 frankfurter sausage meat and sausage
## 2     sausage sausage meat and sausage
## 3  liver loaf sausage meat and sausage

Display the first eight transactions in Groceries:

inspect(Groceries[1:8])

##     items                     
## [1] {citrus fruit,            
##      semi-finished bread,     
##      margarine,               
##      ready soups}             
## [2] {tropical fruit,          
##      yogurt,                  
##      coffee}                  
## [3] {whole milk}              
## [4] {pip fruit,               
##      yogurt,                  
##      cream cheese ,           
##      meat spreads}            
## [5] {other vegetables,        
##      whole milk,              
##      condensed milk,          
##      long life bakery product}
## [6] {whole milk,              
##      butter,                  
##      yogurt,                  
##      rice,                    
##      abrasive cleaner}        
## [7] {rolls/buns}              
## [8] {other vegetables,        
##      UHT-milk,                
##      rolls/buns,              
##      bottled beer,            
##      liquor (appetizer)}

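Each transaction lists the item categories in one customer's basket. The first customer, for example, bought four items: citrus fruit, semi-finished bread, margarine, and ready soups.

Besides the items themselves, the size function reports how many distinct items each transaction contains.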

size(Groceries[1:8])

## [1] 4 3 1 4 4 5 1 5

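The itemFrequency function then returns each item's relative frequency (the share of transactions that contain it), which also shows which items are bought most often.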

itemFrequency(Groceries)

##               frankfurter                   sausage                liver loaf 
##              0.0589730554              0.0939501779              0.0050838841 
##                       ham                      meat         finished products 
##              0.0260294865              0.0258261312              0.0065073716 
##           organic sausage                   chicken                    turkey 
##              0.0022369090              0.0429079817              0.0081342145 
##                      pork                      beef            hamburger meat 
##              0.0576512456              0.0524656838              0.0332486019 
##                      fish              citrus fruit            tropical fruit 
##              0.0029486528              0.0827656329              0.1049313676 
##                 pip fruit                    grapes                   berries 
##              0.0756481952              0.0223690900              0.0332486019 
##               nuts/prunes           root vegetables                    onions 
##              0.0033553635              0.1089984748              0.0310116929 
##                     herbs          other vegetables packaged fruit/vegetables 
##              0.0162684291              0.1934926284              0.0130147433 
##                whole milk                    butter                      curd 
##              0.2555160142              0.0554143366              0.0532791052 
##                   dessert               butter milk                    yogurt 
##              0.0371123538              0.0279613625              0.1395017794 
##        whipped/sour cream                 beverages                  UHT-milk 
##              0.0716827656              0.0260294865              0.0334519573 
##            condensed milk                     cream               soft cheese 
##              0.0102694459              0.0013218099              0.0170818505 
##             sliced cheese               hard cheese             cream cheese  
##              0.0245043213              0.0245043213              0.0396542959 
##          processed cheese             spread cheese               curd cheese 
##              0.0165734621              0.0111845450              0.0050838841 
##          specialty cheese                mayonnaise            salad dressing 
##              0.0085409253              0.0091509914              0.0008134215 
##                   tidbits         frozen vegetables             frozen fruits 
##              0.0023385867              0.0480935435              0.0012201322 
##              frozen meals               frozen fish            frozen chicken 
##              0.0283680732              0.0116929334              0.0006100661 
##                 ice cream            frozen dessert    frozen potato products 
##              0.0250127097              0.0107778343              0.0084392476 
##             domestic eggs                rolls/buns               white bread 
##              0.0634468734              0.1839349263              0.0420945602 
##               brown bread                    pastry            roll products  
##              0.0648703610              0.0889679715              0.0102694459 
##       semi-finished bread                  zwieback           potato products 
##              0.0176919166              0.0069140824              0.0028469751 
##                     flour                      salt                      rice 
##              0.0173868836              0.0107778343              0.0076258261 
##                     pasta                   vinegar                       oil 
##              0.0150482969              0.0065073716              0.0280630402 
##                 margarine             specialty fat                     sugar 
##              0.0585663447              0.0036603965              0.0338586680 
##          artif. sweetener                     honey                   mustard 
##              0.0032536858              0.0015251652              0.0119979664 
##                   ketchup                    spices                     soups 
##              0.0042704626              0.0051855618              0.0068124047 
##               ready soups     Instant food products                    sauces 
##              0.0018301983              0.0080325369              0.0054905948 
##                   cereals          organic products             baking powder 
##              0.0056939502              0.0016268429              0.0176919166 
##     preservation products            pudding powder         canned vegetables 
##              0.0002033554              0.0023385867              0.0107778343 
##              canned fruit        pickled vegetables      specialty vegetables 
##              0.0032536858              0.0178952720              0.0017285206 
##                       jam             sweet spreads              meat spreads 
##              0.0053889171              0.0090493137              0.0042704626 
##               canned fish                  dog food                  cat food 
##              0.0150482969              0.0085409253              0.0232841891 
##                  pet care                 baby food                    coffee 
##              0.0094560244              0.0001016777              0.0580579563 
##            instant coffee                       tea              cocoa drinks 
##              0.0074224708              0.0038637519              0.0022369090 
##             bottled water                      soda           misc. beverages 
##              0.1105236401              0.1743772242              0.0283680732 
##     fruit/vegetable juice                     syrup              bottled beer 
##              0.0722928317              0.0032536858              0.0805287239 
##               canned beer                    brandy                    whisky 
##              0.0776817489              0.0041687850              0.0008134215 
##                    liquor                       rum                   liqueur 
##              0.0110828673              0.0044738180              0.0009150991 
##        liquor (appetizer)                white wine            red/blush wine 
##              0.0079308592              0.0190137265              0.0192170819 
##                  prosecco            sparkling wine               salty snack 
##              0.0020335536              0.0055922725              0.0378240976 
##                   popcorn                 nut snack            snack products 
##              0.0072191154              0.0031520081              0.0030503305 
##  long life bakery product                   waffles                  cake bar 
##              0.0374173869              0.0384341637              0.0132180986 
##               chewing gum                 chocolate         cooking chocolate 
##              0.0210472801              0.0496187087              0.0025419420 
##       specialty chocolate             specialty bar     chocolate marshmallow 
##              0.0304016268              0.0273512964              0.0090493137 
##                     candy         seasonal products                 detergent 
##              0.0298932384              0.0142348754              0.0192170819 
##                  softener               decalcifier              dish cleaner 
##              0.0054905948              0.0015251652              0.0104728012 
##          abrasive cleaner                   cleaner            toilet cleaner 
##              0.0035587189              0.0050838841              0.0007117438 
##          bathroom cleaner                hair spray               dental care 
##              0.0027452974              0.0011184545              0.0057956279 
##            male cosmetics           make up remover                 skin care 
##              0.0045754957              0.0008134215              0.0035587189 
##  female sanitary products            baby cosmetics                      soap 
##              0.0061006609              0.0006100661              0.0026436197 
##           rubbing alcohol          hygiene articles                   napkins 
##              0.0010167768              0.0329435689              0.0523640061 
##                    dishes                  cookware           kitchen utensil 
##              0.0175902389              0.0027452974              0.0004067107 
##           cling film/bags            kitchen towels    house keeping products 
##              0.0113879004              0.0059989832              0.0083375699 
##                   candles               light bulbs      sound storage medium 
##              0.0089476360              0.0041687850              0.0001016777 
##                newspapers                photo/film                pot plants 
##              0.0798169802              0.0092526690              0.0172852059 
##    flower soil/fertilizer            flower (seeds)             shopping bags 
##              0.0019318760              0.0103711235              0.0985256736 
##                      bags 
##              0.0004067107

With 169 items the listing is too long to read at a glance, so itemFrequencyPlot is used instead to plot the top 10 items.

itemFrequencyPlot(Groceries,topN = 10)

itemFrequencyPlot(Groceries,topN = 10,type = "absolute")

itemFrequencyPlot(Groceries,topN = 10,horiz = TRUE,
 main = "Item Frequency",xlab = "Relative Frequency")

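Plot only the items whose relative frequency is at least 0.1. The support argument sets this minimum support. It is unset by default in itemFrequencyPlot (0.1 is the default minimum support of apriori), and if neither support nor topN is given, every item is drawn and the plot becomes unreadable.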

itemFrequencyPlot(Groceries,support = 0.1,
 main = "Item Frequency with S = 0.1",ylab = "Relative Frequency")
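As a numeric cross-check of the plot, the same cutoff can be applied directly to the vector returned by itemFrequency. This is a minimal sketch that assumes arules is installed; the names freq and frequent_items are introduced here only for illustration.

```r
library(arules)
data("Groceries")

# Relative frequency (support) of every single item
freq <- itemFrequency(Groceries)

# Keep the items at or above the 0.1 cutoff, most frequent first
frequent_items <- sort(freq[freq >= 0.1], decreasing = TRUE)
print(round(frequent_items, 4))
```

Judging from the itemFrequency listing above, eight items clear the cutoff, from whole milk down to tropical fruit.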

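Next, we apply the apriori function (the Apriori algorithm) and the eclat function (Equivalence CLAss Transformation, Eclat) to see whether the data yield any interesting findings.

In broad terms, apriori is driven by two thresholds, minimum support and minimum confidence; the rules that pass are then judged on two further measures, lift and coverage: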

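Support: how widespread the rule is in the data, that is, the probability that A and B appear together in a transaction. \(support(A \longrightarrow B) = P(A,B)\)
Confidence: how reliable the rule is, that is, the conditional probability that B is bought given that A is bought. \(confidence(A \longrightarrow B) = P(B|A) = P(A,B)/P(A)\)
Lift: how much the presence of A raises the chance of B, that is, the conditional probability \(P(B|A)\) relative to the baseline probability \(P(B)\). A lift above 1 means A and B occur together more often than they would if they were independent. \(lift(A \longrightarrow B) = P(B|A)/P(B) = confidence(A \longrightarrow B) / P(B)\)
Coverage: how often the rule can be applied, also called LHS-support; it is the share of transactions that contain A. \(coverage(A \longrightarrow B) = P(A)\)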
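The four definitions can be checked by hand from raw counts. The sketch below uses base R only and the rule {honey} => {whole milk}, with counts derived from the Groceries outputs in this walkthrough (9835 transactions, 2513 with whole milk, 15 with honey, 11 with both). The helper rule_measures is a hypothetical name introduced for illustration, not part of arules.

```r
# Counts for the rule {honey} => {whole milk}
n      <- 9835   # total transactions
n_lhs  <- 15     # transactions containing honey (0.0015251652 * 9835)
n_rhs  <- 2513   # transactions containing whole milk
n_both <- 11     # transactions containing both items

# Hypothetical helper: the four measures from raw counts
rule_measures <- function(n, n_lhs, n_rhs, n_both) {
  support    <- n_both / n             # P(A, B)
  coverage   <- n_lhs / n              # P(A)
  confidence <- n_both / n_lhs         # P(B | A) = support / coverage
  lift       <- confidence / (n_rhs / n)  # P(B | A) / P(B)
  c(support = support, confidence = confidence,
    coverage = coverage, lift = lift)
}

m <- rule_measures(n, n_lhs, n_rhs, n_both)
print(signif(m, 7))
```

The results agree with what inspect reports for this rule (confidence 0.7333333, lift 2.870009), which is a quick way to confirm how each column is computed.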

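As a first attempt, place as few restrictions on apriori as possible and adjust based on the result. The minimum support is set to 0.001 and the minimum confidence to 0.5; all other parameters keep their defaults. The resulting rules are stored as rules0: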

rules0=apriori(Groceries,parameter=list(support=0.001,confidence=0.5))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 9 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [157 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.01s].
## writing ... [5668 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

The output shows the parameter specification (including the minimum support and confidence), the algorithmic control settings used during the run, and some execution details.

rules0         # show the number of rules in rules0
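The "checking subsets of size 1 2 3 ..." line in the log appears to reflect the level-wise search that Apriori performs: itemsets of size k+1 are only built from items that were frequent at size k, because an itemset cannot be frequent if one of its subsets is not. The base R sketch below illustrates the first two levels on a hypothetical toy data set (the baskets, min_count, and helper names are all assumptions made for illustration; this is not how arules is implemented internally).

```r
# Hypothetical toy transactions (not taken from Groceries)
baskets <- list(
  c("beer", "diapers", "milk"),
  c("beer", "diapers"),
  c("milk", "bread", "eggs"),
  c("beer", "diapers", "bread"),
  c("milk", "bread", "diapers")
)
min_count <- 2  # minimum absolute support count

# Number of baskets that contain every item of an itemset
count_of <- function(itemset) {
  sum(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

# Level 1: frequent single items
items <- sort(unique(unlist(baskets)))
item_counts <- vapply(items, count_of, numeric(1))
frequent_items <- items[item_counts >= min_count]

# Level 2: candidate pairs are built only from frequent single items,
# so any pair involving the infrequent item "eggs" is never counted
candidates <- combn(frequent_items, 2, simplify = FALSE)
pair_counts <- vapply(candidates, count_of, numeric(1))
frequent_pairs <- candidates[pair_counts >= min_count]

print(frequent_items)
print(vapply(frequent_pairs, paste, character(1), collapse = " + "))
```

Dropping "eggs" at level 1 reduces the candidate pairs from 10 to 6, and the same pruning repeats at every later level.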

## set of 5668 rules

inspect(rules0[1:10])       # show the first 10 rules

##      lhs                    rhs                support     confidence
## [1]  {honey}             => {whole milk}       0.001118454 0.7333333 
## [2]  {tidbits}           => {rolls/buns}       0.001220132 0.5217391 
## [3]  {cocoa drinks}      => {whole milk}       0.001321810 0.5909091 
## [4]  {pudding powder}    => {whole milk}       0.001321810 0.5652174 
## [5]  {cooking chocolate} => {whole milk}       0.001321810 0.5200000 
## [6]  {cereals}           => {whole milk}       0.003660397 0.6428571 
## [7]  {jam}               => {whole milk}       0.002948653 0.5471698 
## [8]  {specialty cheese}  => {other vegetables} 0.004270463 0.5000000 
## [9]  {rice}              => {other vegetables} 0.003965430 0.5200000 
## [10] {rice}              => {whole milk}       0.004677173 0.6133333 
##      coverage    lift     count
## [1]  0.001525165 2.870009 11   
## [2]  0.002338587 2.836542 12   
## [3]  0.002236909 2.312611 13   
## [4]  0.002338587 2.212062 13   
## [5]  0.002541942 2.035097 13   
## [6]  0.005693950 2.515917 36   
## [7]  0.005388917 2.141431 29   
## [8]  0.008540925 2.584078 42   
## [9]  0.007625826 2.687441 39   
## [10] 0.007625826 2.400371 46

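rules0 contains 5668 rules, and listing all of them would not be informative. The first 10 rules also show that the order in which rules are returned bears no obvious relation to the values of the four measures (support, confidence, lift, coverage).

Faced with this much output, a better approach is to tighten the thresholds so that only a handful of strong rules are generated. The following attempts adjust the parameters.

Raise the minimum support to 0.005 and keep the minimum confidence at 0.5. The result, rules01, contains 120 rules.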

rules01=apriori(Groceries,parameter=list(support=0.005,confidence=0.5))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5   0.005      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 49 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [120 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [120 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

rules01

## set of 120 rules

Keep the minimum support at 0.005 and raise the minimum confidence to 0.6. The result, rules02, contains 22 rules.

rules02=apriori(Groceries,parameter=list(support=0.005,confidence=0.6))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.6    0.1    1 none FALSE            TRUE       5   0.005      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 49 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [120 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.01s].
## writing ... [22 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

rules02

## set of 22 rules

Keep the minimum support at 0.005 and raise the minimum confidence to 0.64. The result, rules03, contains 4 rules.

rules03=apriori(Groceries,parameter=list(support=0.005,confidence=0.64))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##        0.64    0.1    1 none FALSE            TRUE       5   0.005      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 49 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [120 item(s)] done [0.00s].
## creating transaction tree ... done [0.01s].
## checking subsets of size 1 2 3 4 done [0.01s].
## writing ... [4 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

rules03

## set of 4 rules

Inspect the rules in rules03 in detail:

inspect(rules03)

##     lhs                     rhs              support confidence    coverage     lift count
## [1] {butter,                                                                              
##      whipped/sour cream} => {whole milk} 0.006710727  0.6600000 0.010167768 2.583008    66
## [2] {pip fruit,                                                                           
##      whipped/sour cream} => {whole milk} 0.005998983  0.6483516 0.009252669 2.537421    59
## [3] {pip fruit,                                                                           
##      root vegetables,                                                                     
##      other vegetables}   => {whole milk} 0.005490595  0.6750000 0.008134215 2.641713    54
## [4] {tropical fruit,                                                                      
##      root vegetables,                                                                     
##      yogurt}             => {whole milk} 0.005693950  0.7000000 0.008134215 2.739554    56

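Alternatively, take an existing rule set (for example rules0, which was mined with a minimum confidence of 0.5) and rank it by one of the measures to pick out the strongest rules. Below, rules0 is sorted by support, confidence, and lift in turn, stored as rules.sorted_supp, rules.sorted_conf, and rules.sorted_lift, and the top 6 rules of each ranking are shown: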

rules.sorted_supp = sort ( rules0, by="support" )   
inspect ( rules.sorted_supp [1:6] )

##     lhs                                       rhs                support   
## [1] {other vegetables, yogurt}             => {whole milk}       0.02226741
## [2] {tropical fruit, yogurt}               => {whole milk}       0.01514997
## [3] {other vegetables, whipped/sour cream} => {whole milk}       0.01464159
## [4] {root vegetables, yogurt}              => {whole milk}       0.01453991
## [5] {pip fruit, other vegetables}          => {whole milk}       0.01352313
## [6] {root vegetables, yogurt}              => {other vegetables} 0.01291307
##     confidence coverage   lift     count
## [1] 0.5128806  0.04341637 2.007235 219  
## [2] 0.5173611  0.02928317 2.024770 149  
## [3] 0.5070423  0.02887646 1.984385 144  
## [4] 0.5629921  0.02582613 2.203354 143  
## [5] 0.5175097  0.02613116 2.025351 133  
## [6] 0.5000000  0.02582613 2.584078 127

rules.sorted_conf = sort ( rules0, by="confidence" )   
inspect ( rules.sorted_conf [1:6] )

##     lhs                      rhs                    support confidence    coverage     lift count
## [1] {rice,                                                                                       
##      sugar}               => {whole milk}       0.001220132          1 0.001220132 3.913649    12
## [2] {canned fish,                                                                                
##      hygiene articles}    => {whole milk}       0.001118454          1 0.001118454 3.913649    11
## [3] {root vegetables,                                                                            
##      butter,                                                                                     
##      rice}                => {whole milk}       0.001016777          1 0.001016777 3.913649    10
## [4] {root vegetables,                                                                            
##      whipped/sour cream,                                                                         
##      flour}               => {whole milk}       0.001728521          1 0.001728521 3.913649    17
## [5] {butter,                                                                                     
##      soft cheese,                                                                                
##      domestic eggs}       => {whole milk}       0.001016777          1 0.001016777 3.913649    10
## [6] {citrus fruit,                                                                               
##      root vegetables,                                                                            
##      soft cheese}         => {other vegetables} 0.001016777          1 0.001016777 5.168156    10

rules.sorted_lift = sort ( rules0, by="lift" )   
inspect ( rules.sorted_lift [1:6] )

##     lhs                         rhs                  support confidence    coverage     lift count
## [1] {Instant food products,                                                                       
##      soda}                   => {hamburger meat} 0.001220132  0.6315789 0.001931876 18.99565    12
## [2] {soda,                                                                                        
##      popcorn}                => {salty snack}    0.001220132  0.6315789 0.001931876 16.69779    12
## [3] {flour,                                                                                       
##      baking powder}          => {sugar}          0.001016777  0.5555556 0.001830198 16.40807    10
## [4] {ham,                                                                                         
##      processed cheese}       => {white bread}    0.001931876  0.6333333 0.003050330 15.04549    19
## [5] {whole milk,                                                                                  
##      Instant food products}  => {hamburger meat} 0.001525165  0.5000000 0.003050330 15.03823    15
## [6] {other vegetables,                                                                            
##      curd,                                                                                        
##      yogurt,                                                                                      
##      whipped/sour cream}     => {cream cheese }  0.001016777  0.5882353 0.001728521 14.83409    10

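From the definitions above, lift is often the most informative measure for screening rules, because it corrects for how common the right-hand side is on its own, and its top-ranked rules tend to be interesting and suggestive. Here the strongest rule is {Instant food products, soda} \(\longrightarrow\) {hamburger meat}, followed by {soda, popcorn} \(\longrightarrow\) {salty snack}. Each of these rests on only 12 transactions, so the lift values should be read with some caution. One can still speculate about the shopping situation behind them: shoppers who have had a busy week and do not feel like cooking properly plan to unwind over the weekend with cola and other sodas, potato chips, popcorn, and similar junk food, frying a hamburger patty or simply making instant noodles when they get hungry.

Next, an example of how association rules are applied in practice. Stores often sell two products bundled together, perhaps because the retailer wants to promote one of them. Suppose a store wants to promote a less popular item, mustard. Setting rhs to "mustard" in the appearance argument of apriori restricts the search to rules whose right-hand side is exactly mustard, which efficiently finds the items most strongly associated with it and gives a basis for choosing a bundling partner. Setting maxlen to 2 limits the lhs to a single item, since the goal is to bundle two products rather than a whole set.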
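Because the highest-lift rules in rules0 rest on very few transactions, one option is to combine a lift cutoff with a minimum count before ranking. The sketch below assumes arules is installed and rebuilds rules0 so that it runs on its own; the cutoffs (lift above 3, at least 30 transactions) and the name robust are arbitrary choices made for illustration.

```r
library(arules)
data("Groceries")

rules0 <- apriori(Groceries,
                  parameter = list(support = 0.001, confidence = 0.5),
                  control = list(verbose = FALSE))

# Keep rules with high lift that also rest on at least 30 transactions
# (both cutoffs are illustrative, not recommendations)
robust <- subset(rules0, lift > 3 & count >= 30)

# Show up to five of the remaining rules, strongest lift first
inspect(head(sort(robust, by = "lift"), 5))
```

Raising the count floor trades surprising but fragile rules for ones backed by more baskets; where to draw that line depends on the business question.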

rules04=apriori(Groceries,parameter=list(maxlen=2,supp=0.001,conf=0.05),appearance=list(rhs="mustard",default="lhs"))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##        0.05    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen target  ext
##       2  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 9 
## 
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [157 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2

## Warning in apriori(Groceries, parameter = list(maxlen = 2, supp = 0.001, :
## Mining stopped (maxlen reached). Only patterns up to a length of 2 returned!

##  done [0.00s].
## writing ... [4 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

inspect ( rules04 )

##     lhs                     rhs       support     confidence coverage   
## [1] {mayonnaise}         => {mustard} 0.001423488 0.15555556 0.009150991
## [2] {canned fish}        => {mustard} 0.001016777 0.06756757 0.015048297
## [3] {pickled vegetables} => {mustard} 0.001016777 0.05681818 0.017895272
## [4] {oil}                => {mustard} 0.001423488 0.05072464 0.028063040
##     lift      count
## [1] 12.965160 14   
## [2]  5.631585 10   
## [3]  4.735651 10   
## [4]  4.227770 14

rules05=apriori(Groceries,parameter=list(maxlen=2,supp=0.001,conf=0.1),appearance=list(rhs="mustard",default="lhs"))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.1    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen target  ext
##       2  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 9 
## 
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [157 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2

## Warning in apriori(Groceries, parameter = list(maxlen = 2, supp = 0.001, :
## Mining stopped (maxlen reached). Only patterns up to a length of 2 returned!

##  done [0.00s].
## writing ... [1 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

inspect ( rules05 )

##     lhs             rhs       support     confidence coverage    lift     count
## [1] {mayonnaise} => {mustard} 0.001423488 0.1555556  0.009150991 12.96516 14

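The results show that mayonnaise is the item most strongly associated with mustard: it has the highest lift (about 12.97) and is the only rule left once the minimum confidence is raised to 0.1. Bundling these two products is therefore worth considering.

Both apriori and eclat can also return other kinds of results on request, such as frequent itemsets. For example, to find this month's most frequently purchased items, or the item pairs for which a bundling strategy would matter most, it is enough to output the frequent itemsets that satisfy the given conditions.

The following shows the result of setting the target parameter to "frequent itemsets":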
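Another way to read the lift of {mayonnaise} => {mustard} is as observed co-purchases divided by the number expected if the two items were bought independently. The base R sketch below uses counts derived from the outputs above (90 baskets with mayonnaise, 118 with mustard, 14 with both, out of 9835); the variable names are introduced here for illustration.

```r
n         <- 9835  # total transactions
n_mayo    <- 90    # baskets with mayonnaise (coverage 0.009150991 * 9835)
n_mustard <- 118   # baskets with mustard (item frequency 0.0119979664 * 9835)
n_both    <- 14    # baskets with both (the rule's count)

# Co-purchases expected under independence: n * P(mayonnaise) * P(mustard)
expected_both <- n_mayo * n_mustard / n

# Lift as the ratio of observed to expected co-purchases
lift <- n_both / expected_both

print(round(expected_both, 2))
print(round(lift, 5))
```

Roughly one joint purchase would be expected by chance, against 14 observed, which is where the lift of about 12.97 comes from.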

itemsets_apr = apriori ( Groceries, parameter = list (supp=0.001,target = "frequent itemsets"),control=list(sort=-1))  # set the apriori target to frequent itemsets

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##          NA    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen            target  ext
##      10 frequent itemsets TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE   -1    TRUE
## 
## Absolute minimum support count: 9 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [157 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.02s].
## sorting transactions ... done [0.00s].
## writing ... [13492 set(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

itemsets_apr                # show the number of frequent itemsets generated

## set of 13492 itemsets

inspect(itemsets_apr[1:5])  # inspect the first 5 frequent itemsets

##     items              support   count
## [1] {whole milk}       0.2555160 2513 
## [2] {other vegetables} 0.1934926 1903 
## [3] {rolls/buns}       0.1839349 1809 
## [4] {soda}             0.1743772 1715 
## [5] {yogurt}           0.1395018 1372

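As shown above, with the sort control parameter set to -1 the first five itemsets returned are the five most frequently purchased items: whole milk, other vegetables, rolls/buns, soda, and yogurt. The counts are numbers of transactions containing each item, not units sold. The sort control setting governs how the algorithm orders items internally, so an explicit sort by support is the safer way to rank the result.

Next, eclat is used to look for item combinations suited to bundling or to being shelved next to each other. Note that the first five itemsets printed below, such as {whole milk, honey} and {whole milk, cocoa drinks}, are not the most frequently co-purchased pairs: their counts (10 to 13) are barely above the minimum support count of 9, and the listing is not ordered by support. Because of minlen=1 and maxlen=3, the full result also mixes itemsets of one to three items. To find the pairs that co-occur most often, the result has to be restricted to itemsets of size 2 and sorted by support before candidates for bundling or adjacent placement are chosen.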
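To rank the frequent itemsets explicitly rather than rely on the order in which they are returned, the result can be passed through sort. This sketch assumes arules is installed and rebuilds the itemsets so that it runs on its own; top5 is a name introduced here for illustration.

```r
library(arules)
data("Groceries")

itemsets_apr <- apriori(Groceries,
                        parameter = list(supp = 0.001,
                                         target = "frequent itemsets"),
                        control = list(verbose = FALSE))

# Rank by support in decreasing order (the default for sort on itemsets)
top5 <- sort(itemsets_apr, by = "support")[1:5]
inspect(top5)
```

Since no itemset can be more frequent than its most frequent member, the top of this ranking is always made up of single items unless minlen is raised.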

itemsets_ecl = eclat( Groceries, parameter = list ( minlen=1, maxlen=3,supp=0.001, target = "frequent itemsets"),control=list(sort=-1))

## Eclat
## 
## parameter specification:
##  tidLists support minlen maxlen            target  ext
##     FALSE   0.001      1      3 frequent itemsets TRUE
## 
## algorithmic control:
##  sparse sort verbose
##       7   -1    TRUE
## 
## Absolute minimum support count: 9 
## 
## create itemset ... 
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [157 item(s)] done [0.00s].
## creating sparse bit matrix ... [157 row(s), 9835 column(s)] done [0.00s].
## writing  ... [9969 set(s)] done [0.02s].
## Creating S4 object  ... done [0.00s].

itemsets_ecl

## set of 9969 itemsets

inspect(itemsets_ecl[1:5])

##     items                        support     count
## [1] {whole milk, honey}          0.001118454 11   
## [2] {whole milk, cocoa drinks}   0.001321810 13   
## [3] {whole milk, pudding powder} 0.001321810 13   
## [4] {tidbits, rolls/buns}        0.001220132 12   
## [5] {tidbits, soda}              0.001016777 10
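The five itemsets above all have counts close to the minimum, so they are not the most frequent pairs. A minimal sketch of how the most frequently co-purchased pairs could be extracted instead, assuming arules is installed: restrict eclat to itemsets of exactly two items and sort by support. The name top_pairs is introduced here for illustration.

```r
library(arules)
data("Groceries")

# Mine only itemsets of exactly two items
pairs_ecl <- eclat(Groceries,
                   parameter = list(minlen = 2, maxlen = 2, supp = 0.001,
                                    target = "frequent itemsets"),
                   control = list(verbose = FALSE))

# Rank the pairs by how often they are bought together
top_pairs <- sort(pairs_ecl, by = "support")[1:5]
inspect(top_pairs)
```

High support alone favors pairs of individually popular items, so it may be worth cross-checking candidates against lift before deciding on a bundle.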


gcchen

2023-02-15