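Market basket analysis, also called association analysis, mines large volumes of transaction data for hidden business rules that link items to one another; the classic example is beer and diapers. The following uses R to walk through a simple example on in-store sales data from a Tainan beef soup restaurant.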

First, install and load the arules package:

# install.packages("arules", repos="http://cran.us.r-project.org")
library("arules")  
## Warning: package 'arules' was built under R version 4.2.3
## Loading required package: Matrix
## Warning: package 'Matrix' was built under R version 4.2.3
## 
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
## 
##     abbreviate, write

Next, read the store's sales data and use summary to get an overview of it.

cow <- read.transactions("cow.csv")                                
summary(cow)     
## transactions as itemMatrix in sparse format with
##  578 rows (elements/itemsets/transactions) and
##  39 columns (items) and a density of 0.07563659 
## 
## most frequent items:
##    溫體牛肉湯      溫體牛肉   溫體牛肉3+1    芥蘭炒牛肉 牛肉貢丸(7粒) 
##           194           163           142           128           111 
##       (Other) 
##           967 
## 
## element (itemset/transaction) length distribution:
## sizes
##   1   2   3   4   5   6   7   8   9 
##  98 178 127  80  51  22  16   3   3 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    2.00    3.00    2.95    4.00    9.00 
## 
## includes extended item information - examples:
##       labels
## 1 三菇炒牛肉
## 2   牛三層肉
## 3 牛三層肉湯
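
The figures in this summary can be cross-checked against each other. The sketch below uses only numbers copied from the output above (no access to cow.csv is assumed): it rebuilds the transaction count, the total number of item occurrences, the density and the mean basket size from the length distribution.

```r
# Values copied from the summary(cow) output above.
n_items  <- 39                                       # columns (items)
sizes    <- 1:9                                      # basket sizes
n_orders <- c(98, 178, 127, 80, 51, 22, 16, 3, 3)   # orders of each size

n_trans     <- sum(n_orders)          # total number of transactions
total_cells <- sum(sizes * n_orders)  # item occurrences across all orders

# Density is the share of filled cells in the transactions-by-items matrix.
density   <- total_cells / (n_trans * n_items)
mean_size <- total_cells / n_trans

print(c(n_trans = n_trans, total_cells = total_cells))
print(round(density, 8))
print(round(mean_size, 2))
```

The total of 1705 item occurrences also equals the sum of the "most frequent items" counts, 194 + 163 + 142 + 128 + 111 + 967.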

Show the first eight transactions in the order data:

inspect(cow[1:8])     
##     items                                                               
## [1] {青椒炒牛肉, 綜合湯}                                                
## [2] {牛肉貢丸(7粒), 加肉, 玉米, 娃娃菜, 高麗菜, 溫體牛肉3+1, 滑蛋炒牛肉}
## [3] {芥蘭炒牛肉, 溫體牛肉湯, 滑蛋炒牛肉}                                
## [4] {溫體牛肉湯}                                                        
## [5] {溫體牛肉湯}                                                        
## [6] {三菇炒牛肉, 溫體牛肉湯}                                            
## [7] {溫體牛肉湯, 滑蛋炒牛肉}                                            
## [8] {溫體牛肉湯, 滑蛋炒牛肉, 綜合湯}

Besides the items themselves, the size function reports how many distinct items each order contains.

size(cow[1:8])   
## [1] 2 7 3 1 1 2 2 3

Next, itemFrequency lists the share of transactions that contain each item, which also shows which items appear most often.

itemFrequency(cow)
##    三菇炒牛肉      牛三層肉    牛三層肉湯          牛心        牛心湯 
##   0.027681661   0.031141869   0.055363322   0.008650519   0.005190311 
## 牛肉貢丸(7粒)        牛骨隨          牛筋        牛筋湯          牛雜 
##   0.192041522   0.019031142   0.036332180   0.008650519   0.034602076 
##        牛雜湯          加肉          玉米    油菜炒牛肉      炒牛肉麵 
##   0.031141869   0.089965398   0.057093426   0.015570934   0.107266436 
##        炒牛雜      炒牛雜麵    芥蘭炒牛肉  金沙苦瓜牛肉        青江菜 
##   0.058823529   0.024221453   0.221453287   0.024221453   0.041522491 
##    青椒炒牛肉    青蔥炒牛肉        娃娃菜    洋蔥炒牛肉          茼蒿 
##   0.060553633   0.051903114   0.117647059   0.024221453   0.001730104 
##        高麗菜  高麗菜炒牛肉      麻油牛心      麻油牛腦      溫體牛肉 
##   0.166089965   0.089965398   0.015570934   0.001730104   0.282006920 
##   溫體牛肉3+1    溫體牛肉湯    滑蛋炒牛肉        綜合湯        綜合菇 
##   0.245674740   0.335640138   0.164359862   0.067474048   0.105536332 
##    蒜頭炒牛肉      蔥炒牛肉    薑絲炒牛肉    薑絲炒牛肚 
##   0.058823529   0.001730104   0.019031142   0.050173010

Use itemFrequencyPlot to plot the top 10 items, first by relative frequency, then by absolute count, then as a horizontal bar chart.

itemFrequencyPlot(cow,topN = 10)

itemFrequencyPlot(cow,topN = 10,type = "absolute")

itemFrequencyPlot(cow,topN = 10,horiz = TRUE,
 main = "Item Frequency",xlab = "Relative Frequency")

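Plot the relative frequency of the items whose share is at least 0.1. The support argument sets the minimum support (relative frequency) an item needs in order to be drawn. If neither support nor topN is given, every item is plotted and the chart becomes cluttered.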

itemFrequencyPlot(cow,support = 0.1,
 main = "Item Frequency with S = 0.1",ylab = "Relative Frequency")

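Next, we turn to the mining functions in arules, apriori (the Apriori algorithm) and eclat (Equivalence CLAss Transformation, Eclat), to see whether the data holds anything interesting. The walkthrough below uses apriori.

Apriori works roughly as follows: you first set two thresholds, support and confidence, and then examine two further measures, lift and coverage: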

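  1. Support: how common the rule is in the data, that is, the probability that A and B appear together. \(support(A \longrightarrow B) = P(A,B)\)

  2. Confidence: how reliable the rule is, that is, the conditional probability that B is bought given that A is bought. \(confidence(A \longrightarrow B) = P(B|A)=P(A,B)/P(A)\)

  3. Lift: how much the presence of A raises the chance of B, that is, the conditional probability \(P(B|A)\) (how often B appears given A) relative to the baseline probability \(P(B)\). A lift above 1 means B is more likely when A is present. \(lift(A \longrightarrow B) = P(B|A)/P(B) = confidence(A \longrightarrow B) / P(B)\)

  4. Coverage: how often the rule can be applied, which determines how widely it is usable; also called LHS-support. It is the share of transactions that contain A. \(coverage(A \longrightarrow B) = P(A)\)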
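
The four definitions above can be checked by hand on a tiny example. This is a minimal base R sketch: the `orders` list and the `prob` helper are hypothetical and made up for illustration, not taken from cow.csv or from arules.

```r
# Toy orders (hypothetical data, not from cow.csv).
orders <- list(
  c("soup", "beef"),
  c("soup", "beef", "greens"),
  c("soup"),
  c("beef", "greens"),
  c("soup", "beef")
)

# Share of orders that contain every item in `items`.
prob <- function(items) {
  mean(vapply(orders, function(o) all(items %in% o), logical(1)))
}

A <- "greens"   # left-hand side of the rule
B <- "beef"     # right-hand side of the rule

support    <- prob(c(A, B))         # P(A,B)
coverage   <- prob(A)               # P(A)
confidence <- support / coverage    # P(B|A) = P(A,B) / P(A)
lift       <- confidence / prob(B)  # P(B|A) / P(B)

print(c(support = support, confidence = confidence,
        lift = lift, coverage = coverage))
```

Here every order with greens also has beef, so the confidence is 1, and since beef appears in 4 of 5 orders the lift is 1 / 0.8 = 1.25.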

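First, we run apriori with loose restrictions and decide from the result how to adjust them. The minimum support is provisionally set to 0.001 and the minimum confidence to 0.5; all other parameters keep their defaults, and the resulting rules are named rules0: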

rules0=apriori(cow,parameter=list(support=0.001,confidence=0.5))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 0 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[39 item(s), 578 transaction(s)] done [0.00s].
## sorting and recoding items ... [39 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 8 9 done [0.00s].
## writing ... [13620 rule(s)] done [0.00s].
## creating S4 object  ... done [0.01s].

The output shows the parameter specification (including the minimum support and confidence), the algorithmic control settings recorded for the run, and some execution details.

rules0      # number of association rules in rules0
## set of 13620 rules

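rules0 contains 13620 association rules, and printing all of them would not be very informative. Looking at the first 10 rules, we also see that the order in which rules are listed bears no obvious relation to the values of the four measures (support, confidence, lift, coverage).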

inspect(rules0[1:10])   # first 10 rules in rules0
##      lhs           rhs          support     confidence coverage    lift     
## [1]  {蔥炒牛肉} => {溫體牛肉湯} 0.001730104 1.0000000  0.001730104  2.979381
## [2]  {麻油牛腦} => {牛骨隨}     0.001730104 1.0000000  0.001730104 52.545455
## [3]  {麻油牛腦} => {炒牛肉麵}   0.001730104 1.0000000  0.001730104  9.322581
## [4]  {茼蒿}     => {加肉}       0.001730104 1.0000000  0.001730104 11.115385
## [5]  {茼蒿}     => {滑蛋炒牛肉} 0.001730104 1.0000000  0.001730104  6.084211
## [6]  {茼蒿}     => {溫體牛肉}   0.001730104 1.0000000  0.001730104  3.546012
## [7]  {牛心湯}   => {溫體牛肉湯} 0.003460208 0.6666667  0.005190311  1.986254
## [8]  {牛筋湯}   => {芥蘭炒牛肉} 0.005190311 0.6000000  0.008650519  2.709375
## [9]  {牛心}     => {青江菜}     0.005190311 0.6000000  0.008650519 14.450000
## [10] {牛心}     => {娃娃菜}     0.005190311 0.6000000  0.008650519  5.100000
##      count
## [1]  1    
## [2]  1    
## [3]  1    
## [4]  1    
## [5]  1    
## [6]  1    
## [7]  2    
## [8]  3    
## [9]  3    
## [10] 3

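Because order data is still being accumulated and the number of transactions is limited, more than half of the first 10 rules occur in only one transaction. With coverage this low, they are clearly not ideal association rules.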
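
The very large lift of rule [2] follows directly from the definition of lift. Using the relative frequency of 牛骨隨 from the itemFrequency output (0.019031142, about 11 of 578 orders) and the rule's confidence of 1:

\[
lift(\{\text{麻油牛腦}\} \longrightarrow \{\text{牛骨隨}\}) = \frac{confidence}{P(\{\text{牛骨隨}\})} = \frac{1.0000000}{0.019031142} \approx 52.545
\]

A rare right-hand side combined with a left-hand side seen in a single order is enough to produce a lift above 50, which is why lift should be read together with count.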

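Faced with this much noisy output, a better approach is to control rule strength so that only a few strongly associated rules are produced. The following attempts adjust the parameters: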

  1. Raise the support to 0.005 and keep the confidence at 0.5, naming the result rules01. This yields 475 association rules.
rules01=apriori(cow,parameter=list(support=0.005,confidence=0.5))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5   0.005      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 2 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[39 item(s), 578 transaction(s)] done [0.00s].
## sorting and recoding items ... [36 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 done [0.00s].
## writing ... [475 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
rules01         # number of association rules in rules01
## set of 475 rules
inspect(rules01[1:10])       # first 10 rules
##      lhs             rhs             support     confidence coverage   
## [1]  {牛筋湯}     => {芥蘭炒牛肉}    0.005190311 0.6000000  0.008650519
## [2]  {牛心}       => {青江菜}        0.005190311 0.6000000  0.008650519
## [3]  {牛心}       => {娃娃菜}        0.005190311 0.6000000  0.008650519
## [4]  {牛心}       => {高麗菜}        0.005190311 0.6000000  0.008650519
## [5]  {牛心}       => {溫體牛肉}      0.005190311 0.6000000  0.008650519
## [6]  {油菜炒牛肉} => {高麗菜}        0.008650519 0.5555556  0.015570934
## [7]  {油菜炒牛肉} => {牛肉貢丸(7粒)} 0.010380623 0.6666667  0.015570934
## [8]  {油菜炒牛肉} => {溫體牛肉}      0.010380623 0.6666667  0.015570934
## [9]  {洋蔥炒牛肉} => {溫體牛肉湯}    0.012110727 0.5000000  0.024221453
## [10] {麻油牛心}   => {溫體牛肉3+1}   0.013840830 0.8888889  0.015570934
##      lift      count
## [1]   2.709375 3    
## [2]  14.450000 3    
## [3]   5.100000 3    
## [4]   3.612500 3    
## [5]   2.127607 3    
## [6]   3.344907 5    
## [7]   3.471471 6    
## [8]   2.364008 6    
## [9]   1.489691 7    
## [10]  3.618153 8
  2. Keep the support at 0.005 and raise the confidence to 0.7, naming the result rules02. This yields 174 association rules.
rules02=apriori(cow,parameter=list(support=0.005,confidence=0.7))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.7    0.1    1 none FALSE            TRUE       5   0.005      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 2 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[39 item(s), 578 transaction(s)] done [0.00s].
## sorting and recoding items ... [36 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 done [0.00s].
## writing ... [174 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
rules02         # number of association rules in rules02
## set of 174 rules
inspect(rules02[1:10])       # first 10 rules
##      lhs                          rhs             support     confidence
## [1]  {麻油牛心}                => {溫體牛肉3+1}   0.013840830 0.8888889 
## [2]  {牛雜}                    => {溫體牛肉}      0.029411765 0.8500000 
## [3]  {牛筋}                    => {溫體牛肉}      0.025951557 0.7142857 
## [4]  {加肉}                    => {溫體牛肉}      0.074394464 0.8269231 
## [5]  {牛心, 娃娃菜}            => {溫體牛肉}      0.005190311 1.0000000 
## [6]  {牛心, 溫體牛肉}          => {娃娃菜}        0.005190311 1.0000000 
## [7]  {牛肉貢丸(7粒), 炒牛雜麵} => {溫體牛肉}      0.005190311 1.0000000 
## [8]  {炒牛雜麵, 溫體牛肉}      => {牛肉貢丸(7粒)} 0.005190311 0.7500000 
## [9]  {牛筋, 油菜炒牛肉}        => {高麗菜}        0.005190311 1.0000000 
## [10] {牛筋, 油菜炒牛肉}        => {牛肉貢丸(7粒)} 0.005190311 1.0000000 
##      coverage    lift     count
## [1]  0.015570934 3.618153  8   
## [2]  0.034602076 3.014110 17   
## [3]  0.036332180 2.532866 15   
## [4]  0.089965398 2.932279 43   
## [5]  0.005190311 3.546012  3   
## [6]  0.005190311 8.500000  3   
## [7]  0.005190311 3.546012  3   
## [8]  0.006920415 3.905405  3   
## [9]  0.005190311 6.020833  3   
## [10] 0.005190311 5.207207  3
  3. Raise the support to 0.01 and the confidence to 0.8, naming the result rules03. This yields 14 association rules.
rules03=apriori(cow,parameter=list(support=0.01,confidence=0.8))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5    0.01      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 5 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[39 item(s), 578 transaction(s)] done [0.00s].
## sorting and recoding items ... [33 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [14 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
rules03         # number of association rules in rules03
## set of 14 rules
inspect(rules03)         # inspect all of them
##      lhs                                   rhs             support   
## [1]  {麻油牛心}                         => {溫體牛肉3+1}   0.01384083
## [2]  {牛雜}                             => {溫體牛肉}      0.02941176
## [3]  {加肉}                             => {溫體牛肉}      0.07439446
## [4]  {牛筋, 芥蘭炒牛肉}                 => {溫體牛肉}      0.01038062
## [5]  {牛肉貢丸(7粒), 青江菜}            => {綜合菇}        0.01557093
## [6]  {加肉, 蒜頭炒牛肉}                 => {溫體牛肉}      0.01211073
## [7]  {高麗菜, 蒜頭炒牛肉}               => {牛肉貢丸(7粒)} 0.01384083
## [8]  {加肉, 玉米}                       => {高麗菜}        0.01038062
## [9]  {炒牛肉麵, 滑蛋炒牛肉}             => {溫體牛肉湯}    0.01038062
## [10] {牛肉貢丸(7粒), 加肉}              => {溫體牛肉}      0.03460208
## [11] {青江菜, 溫體牛肉, 綜合菇}         => {牛肉貢丸(7粒)} 0.01038062
## [12] {牛肉貢丸(7粒), 青江菜, 溫體牛肉}  => {綜合菇}        0.01038062
## [13] {牛肉貢丸(7粒), 玉米, 高麗菜}      => {溫體牛肉3+1}   0.01384083
## [14] {牛肉貢丸(7粒), 玉米, 溫體牛肉3+1} => {高麗菜}        0.01384083
##      confidence coverage   lift     count
## [1]  0.8888889  0.01557093 3.618153  8   
## [2]  0.8500000  0.03460208 3.014110 17   
## [3]  0.8269231  0.08996540 2.932279 43   
## [4]  1.0000000  0.01038062 3.546012  6   
## [5]  0.9000000  0.01730104 8.527869  9   
## [6]  1.0000000  0.01211073 3.546012  7   
## [7]  0.8888889  0.01557093 4.628629  8   
## [8]  0.8571429  0.01211073 5.160714  6   
## [9]  0.8571429  0.01211073 2.553756  6   
## [10] 0.8000000  0.04325260 2.836810 20   
## [11] 0.8571429  0.01211073 4.463320  6   
## [12] 1.0000000  0.01038062 9.475410  6   
## [13] 0.8000000  0.01730104 3.256338  8   
## [14] 0.8000000  0.01730104 4.816667  8
  4. Raise the support to 0.015 and keep the confidence at 0.8, naming the result rules04. This yields 4 association rules.
rules04=apriori(cow,parameter=list(support=0.015,confidence=0.8))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5   0.015      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 8 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[39 item(s), 578 transaction(s)] done [0.00s].
## sorting and recoding items ... [33 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [4 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
rules04         # number of association rules in rules04
## set of 4 rules
inspect(rules04)         # inspect all of them
##     lhs                        rhs        support    confidence coverage  
## [1] {牛雜}                  => {溫體牛肉} 0.02941176 0.8500000  0.03460208
## [2] {加肉}                  => {溫體牛肉} 0.07439446 0.8269231  0.08996540
## [3] {牛肉貢丸(7粒), 青江菜} => {綜合菇}   0.01557093 0.9000000  0.01730104
## [4] {牛肉貢丸(7粒), 加肉}   => {溫體牛肉} 0.03460208 0.8000000  0.04325260
##     lift     count
## [1] 3.014110 17   
## [2] 2.932279 43   
## [3] 8.527869  9   
## [4] 2.836810 20

This leaves four association rules, all with coverage of at least 0.017.
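
It helps to translate each support threshold into a number of orders. The sketch below is an observation inferred from the logs in this walkthrough, not a statement from the arules documentation: the "Absolute minimum support count" lines match the floor of support × 578, while the smallest count that actually appears among the returned rules matches the ceiling.

```r
n_trans  <- 578                                        # transactions in cow
supports <- c(0.001, 0.002, 0.005, 0.01, 0.015, 0.02)  # thresholds used in this walkthrough

# Value that matches the "Absolute minimum support count" log lines.
printed_count <- floor(supports * n_trans)

# Smallest count for which count / n_trans >= support holds.
needed_count <- ceiling(supports * n_trans)

print(data.frame(support = supports, printed_count, needed_count))
```

For example, at support 0.005 the log prints 2, but every rule in rules01 has a count of at least 3, because 2 / 578 is below 0.005.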

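In addition, we can take an existing rule set (for example rules0), hold one measure at a fixed threshold, and pick the strongest rules by another measure. rules0 was mined with a confidence threshold of 0.5; sorting it by support, confidence and lift in turn gives rules.sorted_supp, rules.sorted_conf and rules.sorted_lift, and from each we take the top 6 rules: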

rules.sorted_supp = sort ( rules0, by="support" )   
inspect ( rules.sorted_supp [1:6] )    
##     lhs                        rhs             support    confidence coverage  
## [1] {牛肉貢丸(7粒)}         => {溫體牛肉}      0.09688581 0.5045045  0.19204152
## [2] {高麗菜}                => {牛肉貢丸(7粒)} 0.08477509 0.5104167  0.16608997
## [3] {加肉}                  => {溫體牛肉}      0.07439446 0.8269231  0.08996540
## [4] {娃娃菜}                => {溫體牛肉}      0.06228374 0.5294118  0.11764706
## [5] {綜合菇}                => {溫體牛肉}      0.05363322 0.5081967  0.10553633
## [6] {牛肉貢丸(7粒), 高麗菜} => {溫體牛肉3+1}   0.04325260 0.5102041  0.08477509
##     lift     count
## [1] 1.788979 56   
## [2] 2.657845 49   
## [3] 2.932279 43   
## [4] 1.877301 36   
## [5] 1.802072 31   
## [6] 2.076746 25
rules.sorted_conf = sort ( rules0, by="confidence" )   
inspect ( rules.sorted_conf [1:6] )    
##     lhs           rhs          support     confidence coverage    lift     
## [1] {蔥炒牛肉} => {溫體牛肉湯} 0.001730104 1          0.001730104  2.979381
## [2] {麻油牛腦} => {牛骨隨}     0.001730104 1          0.001730104 52.545455
## [3] {麻油牛腦} => {炒牛肉麵}   0.001730104 1          0.001730104  9.322581
## [4] {茼蒿}     => {加肉}       0.001730104 1          0.001730104 11.115385
## [5] {茼蒿}     => {滑蛋炒牛肉} 0.001730104 1          0.001730104  6.084211
## [6] {茼蒿}     => {溫體牛肉}   0.001730104 1          0.001730104  3.546012
##     count
## [1] 1    
## [2] 1    
## [3] 1    
## [4] 1    
## [5] 1    
## [6] 1
rules.sorted_lift = sort ( rules0, by="lift" )   
inspect ( rules.sorted_lift [1:6] ) 
##     lhs                       rhs        support     confidence coverage   
## [1] {牛骨隨, 炒牛肉麵}     => {麻油牛腦} 0.001730104 1          0.001730104
## [2] {牛骨隨, 麻油牛心}     => {牛筋湯}   0.001730104 1          0.001730104
## [3] {牛骨隨, 炒牛雜}       => {牛筋湯}   0.001730104 1          0.001730104
## [4] {牛骨隨, 芥蘭炒牛肉}   => {牛筋湯}   0.001730104 1          0.001730104
## [5] {青椒炒牛肉, 麻油牛心} => {牛筋湯}   0.001730104 1          0.001730104
## [6] {炒牛雜, 麻油牛心}     => {牛筋湯}   0.001730104 1          0.001730104
##     lift  count
## [1] 578.0 1    
## [2] 115.6 1    
## [3] 115.6 1    
## [4] 115.6 1    
## [5] 115.6 1    
## [6] 115.6 1

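With the confidence threshold at 0.5, only the top 6 rules selected by support seem practically meaningful: each is backed by at least 25 orders (count), whereas the top rules by confidence and by lift each rest on a single order.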
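
One way to avoid rules that rest on a single order is to filter on count before ranking by lift. The following is a minimal sketch on hypothetical toy orders (not cow.csv); the threshold of 3 is an arbitrary choice for the toy data, and it assumes the installed arules version reports a count column in quality(), as the output above does.

```r
library(arules)

# Toy orders (hypothetical data, not from cow.csv).
orders <- list(
  c("soup", "beef"), c("soup", "beef", "greens"), c("soup"),
  c("beef", "greens"), c("soup", "beef"), c("soup", "beef")
)
trans <- as(orders, "transactions")

rules <- apriori(trans,
                 parameter = list(support = 0.1, confidence = 0.5, minlen = 2),
                 control   = list(verbose = FALSE))

# Keep only rules backed by at least 3 orders, then rank them by lift.
strong <- sort(rules[quality(rules)$count >= 3], by = "lift")
inspect(strong)
```

On the restaurant data the same pattern could be applied to rules0 with a larger cut-off, for instance a count of 20.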

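Next, an example of how association rules are applied in practice. Stores often sell two products as a bundle, perhaps because the retailer wants to promote one of them. Suppose the restaurant now wants to promote a less popular item, 高麗菜炒牛肉: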

rules05=apriori(cow,parameter=list(maxlen=2,supp=0.005,conf=0.05),appearance=list(rhs="高麗菜炒牛肉",default="lhs"))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##        0.05    0.1    1 none FALSE            TRUE       5   0.005      1
##  maxlen target  ext
##       2  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 2 
## 
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[39 item(s), 578 transaction(s)] done [0.00s].
## sorting and recoding items ... [36 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2
## Warning in apriori(cow, parameter = list(maxlen = 2, supp = 0.005, conf =
## 0.05), : Mining stopped (maxlen reached). Only patterns up to a length of 2
## returned!
##  done [0.00s].
## writing ... [8 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

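The eight rules found are listed below. To narrow them down, the call that follows (rules06) raises the minimum support to 0.02 and the minimum confidence to 0.1, which leaves a single rule.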

inspect(rules05)
##     lhs              rhs            support     confidence coverage   lift     
## [1] {}            => {高麗菜炒牛肉} 0.089965398 0.08996540 1.00000000 1.0000000
## [2] {牛三層肉湯}  => {高麗菜炒牛肉} 0.005190311 0.09375000 0.05536332 1.0420673
## [3] {牛三層肉}    => {高麗菜炒牛肉} 0.005190311 0.16666667 0.03114187 1.8525641
## [4] {薑絲炒牛肚}  => {高麗菜炒牛肉} 0.008650519 0.17241379 0.05017301 1.9164456
## [5] {滑蛋炒牛肉}  => {高麗菜炒牛肉} 0.012110727 0.07368421 0.16435986 0.8190283
## [6] {高麗菜}      => {高麗菜炒牛肉} 0.008650519 0.05208333 0.16608997 0.5789263
## [7] {溫體牛肉湯}  => {高麗菜炒牛肉} 0.043252595 0.12886598 0.33564014 1.4323949
## [8] {溫體牛肉3+1} => {高麗菜炒牛肉} 0.022491349 0.09154930 0.24567474 1.0176056
##     count
## [1] 52   
## [2]  3   
## [3]  3   
## [4]  5   
## [5]  7   
## [6]  5   
## [7] 25   
## [8] 13
rules06=apriori(cow,parameter=list(maxlen=2,supp=0.02,conf=0.1),appearance=list(rhs="高麗菜炒牛肉",default="lhs"))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.1    0.1    1 none FALSE            TRUE       5    0.02      1
##  maxlen target  ext
##       2  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 11 
## 
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[39 item(s), 578 transaction(s)] done [0.00s].
## sorting and recoding items ... [29 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2
## Warning in apriori(cow, parameter = list(maxlen = 2, supp = 0.02, conf = 0.1),
## : Mining stopped (maxlen reached). Only patterns up to a length of 2 returned!
##  done [0.00s].
## writing ... [1 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
inspect(rules06)
##     lhs             rhs            support   confidence coverage  lift    
## [1] {溫體牛肉湯} => {高麗菜炒牛肉} 0.0432526 0.128866   0.3356401 1.432395
##     count
## [1] 25

The result shows that “溫體牛肉湯” is the (most) strongly associated item for “高麗菜炒牛肉” under these thresholds, so the two could be considered for a bundle.
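
To see which threshold does the narrowing from rules05 to rules06, the single-item rules can be filtered by hand. The sketch below copies the values from the inspect(rules05) output into a data frame; `rules05_tbl` is a hypothetical name used only for this illustration.

```r
# Single-item rules for rhs = {高麗菜炒牛肉}, copied from inspect(rules05) above.
rules05_tbl <- data.frame(
  lhs        = c("牛三層肉湯", "牛三層肉", "薑絲炒牛肚", "滑蛋炒牛肉",
                 "高麗菜", "溫體牛肉湯", "溫體牛肉3+1"),
  confidence = c(0.09375000, 0.16666667, 0.17241379, 0.07368421,
                 0.05208333, 0.12886598, 0.09154930),
  lift       = c(1.0420673, 1.8525641, 1.9164456, 0.8190283,
                 0.5789263, 1.4323949, 1.0176056),
  count      = c(3, 3, 5, 7, 5, 25, 13)
)

# Confidence >= 0.1 alone still leaves three candidate rules.
by_conf <- rules05_tbl[rules05_tbl$confidence >= 0.1, ]

# Adding support >= 0.02 (count / 578) leaves one.
by_both <- by_conf[by_conf$count / 578 >= 0.02, ]

print(by_conf$lhs)
print(by_both$lhs)
```

The two rules dropped by the support threshold have higher lift but rest on only 3 and 5 orders.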

Suppose the restaurant now wants to promote another less popular item, 青蔥炒牛肉:

rules07=apriori(cow,parameter=list(maxlen=2,supp=0.005,conf=0.05),appearance=list(rhs="青蔥炒牛肉",default="lhs"))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##        0.05    0.1    1 none FALSE            TRUE       5   0.005      1
##  maxlen target  ext
##       2  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 2 
## 
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[39 item(s), 578 transaction(s)] done [0.00s].
## sorting and recoding items ... [36 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2
## Warning in apriori(cow, parameter = list(maxlen = 2, supp = 0.005, conf =
## 0.05), : Mining stopped (maxlen reached). Only patterns up to a length of 2
## returned!
##  done [0.00s].
## writing ... [8 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
inspect(rules07)
##     lhs             rhs          support     confidence coverage   lift    
## [1] {}           => {青蔥炒牛肉} 0.051903114 0.05190311 1.00000000 1.000000
## [2] {薑絲炒牛肚} => {青蔥炒牛肉} 0.006920415 0.13793103 0.05017301 2.657471
## [3] {炒牛雜}     => {青蔥炒牛肉} 0.005190311 0.08823529 0.05882353 1.700000
## [4] {炒牛肉麵}   => {青蔥炒牛肉} 0.008650519 0.08064516 0.10726644 1.553763
## [5] {加肉}       => {青蔥炒牛肉} 0.008650519 0.09615385 0.08996540 1.852564
## [6] {娃娃菜}     => {青蔥炒牛肉} 0.006920415 0.05882353 0.11764706 1.133333
## [7] {高麗菜}     => {青蔥炒牛肉} 0.008650519 0.05208333 0.16608997 1.003472
## [8] {溫體牛肉湯} => {青蔥炒牛肉} 0.020761246 0.06185567 0.33564014 1.191753
##     count
## [1] 30   
## [2]  4   
## [3]  3   
## [4]  5   
## [5]  5   
## [6]  4   
## [7]  5   
## [8] 12

Raising the minimum confidence to 0.1 narrows the rules down to a single one.

rules08=apriori(cow,parameter=list(maxlen=2,supp=0.005,conf=0.10),appearance=list(rhs="青蔥炒牛肉",default="lhs"))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.1    0.1    1 none FALSE            TRUE       5   0.005      1
##  maxlen target  ext
##       2  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 2 
## 
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[39 item(s), 578 transaction(s)] done [0.00s].
## sorting and recoding items ... [36 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2
## Warning in apriori(cow, parameter = list(maxlen = 2, supp = 0.005, conf = 0.1),
## : Mining stopped (maxlen reached). Only patterns up to a length of 2 returned!
##  done [0.00s].
## writing ... [1 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
inspect(rules08)
##     lhs             rhs          support     confidence coverage   lift    
## [1] {薑絲炒牛肚} => {青蔥炒牛肉} 0.006920415 0.137931   0.05017301 2.657471
##     count
## [1] 4

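The result shows that “薑絲炒牛肚” is the (most) strongly associated item for “青蔥炒牛肉”, so the two could be considered for a bundle, although the rule rests on only 4 orders.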
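
The walkthrough so far used apriori only. For comparison, here is a minimal sketch of the eclat route mentioned earlier, run on hypothetical toy orders (not cow.csv). eclat returns frequent itemsets rather than rules, so a second step with ruleInduction is assumed here to turn them into rules; the thresholds are arbitrary.

```r
library(arules)

# Toy orders (hypothetical data, not from cow.csv).
orders <- list(
  c("soup", "beef"), c("soup", "beef", "greens"), c("soup"),
  c("beef", "greens"), c("soup", "beef")
)
trans <- as(orders, "transactions")

# Step 1: eclat mines frequent itemsets only (no rules yet).
itemsets <- eclat(trans,
                  parameter = list(support = 0.3),
                  control   = list(verbose = FALSE))
inspect(sort(itemsets, by = "support"))

# Step 2: derive rules from those itemsets at a chosen minimum confidence.
rules <- ruleInduction(itemsets, trans, confidence = 0.6)
inspect(rules)
```

Because both functions apply the same support definition, the choice between them is mostly about whether itemsets or rules are wanted as the first result.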

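Visualizing association rules

Plots give a more intuitive view of the results of association analysis. This requires the R extension package arulesViz.

Some simple uses of the package are introduced here.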

# install.packages("arulesViz", repos="http://cran.us.r-project.org")
library (arulesViz)  
## Warning: package 'arulesViz' was built under R version 4.2.3
# cow was already loaded with read.transactions(), so no data() call is needed

Next, set the support to 0.002 and the confidence to 0.5, and name the resulting rules rules09.

rules09 = apriori (cow, parameter = list ( support=0.002, confidence=0.5 ) )  
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5   0.002      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 1 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[39 item(s), 578 transaction(s)] done [0.00s].
## sorting and recoding items ... [36 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.00s].
## writing ... [1270 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
rules09
## set of 1270 rules

Next, draw a scatter plot of rules09:

plot(rules09)
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

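Each point in the plot is one rule, placed by its support (horizontal axis) and confidence (vertical axis), and shaded according to its lift. The variables mapped to the axes and to the shading can be changed through arguments, for example: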

plot(rules09, measure=c("support", "lift"), shading ="confidence")
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

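The plot alone does not tell us which items an interesting rule refers to. This can be overcome by turning on interaction. The interactive part cannot be rendered on a web page; run it in R and select rules with the mouse. The call is commented out with # in the code below.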

# plot(rules09, interactive=TRUE)

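In addition, setting the shading argument to “order” draws a special scatter plot, the “two-key plot”, in which the shading encodes the number of items in the rule (its order): the more items, the darker the point.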

plot(rules09, shading="order", control=list(main = "Two-key plot"))
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

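Next, we change the plot method to “grouped” to produce a “grouped matrix” plot. Judged by lift, the two most strongly associated groups (the darkest circles) are those pairing 薑絲炒牛肚 and 炒牛雜, among others, with 油菜炒牛肉. Circle size shows support, so the largest circles mark the combinations with the highest support.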

plot(rules09, method= "grouped")

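The method argument can also be set to “matrix”, “matrix3D”, “paracoord”, and so on:

The following plot uses the “matrix” method. For the first 50 rules of rules09 it lays out the 31 distinct LHS (Left Hand Side) itemsets against the 12 distinct RHS (Right Hand Side) itemsets, and the shading of each cell represents the lift of that rule.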

plot(rules09[1:50], method="matrix", measure="lift")
## Itemsets in Antecedent (LHS)
##  [1] "{油菜炒牛肉,高麗菜}"        "{牛肉貢丸(7粒),油菜炒牛肉}"
##  [3] "{牛心,溫體牛肉}"            "{牛心,娃娃菜}"             
##  [5] "{油菜炒牛肉,炒牛雜}"        "{牛心,高麗菜}"             
##  [7] "{油菜炒牛肉,薑絲炒牛肚}"    "{牛心}"                    
##  [9] "{牛心,溫體牛肉3+1}"         "{牛筋,油菜炒牛肉}"         
## [11] "{牛筋湯,溫體牛肉湯}"        "{牛心,青江菜}"             
## [13] "{炒牛雜麵,溫體牛肉}"        "{麻油牛心}"                
## [15] "{青江菜}"                   "{牛肉貢丸(7粒),炒牛雜麵}"  
## [17] "{油菜炒牛肉}"               "{牛雜}"                    
## [19] "{加肉}"                     "{玉米}"                    
## [21] "{牛筋湯}"                   "{高麗菜}"                  
## [23] "{牛筋}"                     "{牛心湯}"                  
## [25] "{牛筋湯,芥蘭炒牛肉}"        "{牛三層肉}"                
## [27] "{娃娃菜}"                   "{綜合菇}"                  
## [29] "{牛肉貢丸(7粒)}"            "{洋蔥炒牛肉}"              
## [31] "{三菇炒牛肉}"              
## Itemsets in Consequent (RHS)
##  [1] "{溫體牛肉湯}"    "{溫體牛肉}"      "{溫體牛肉3+1}"   "{芥蘭炒牛肉}"   
##  [5] "{牛肉貢丸(7粒)}" "{高麗菜}"        "{綜合菇}"        "{娃娃菜}"       
##  [9] "{牛筋}"          "{青江菜}"        "{炒牛雜}"        "{薑絲炒牛肚}"

The first 50 rules of rules09 are listed in full below:

inspect(rules09[1:50])
##      lhs                            rhs             support     confidence
## [1]  {牛心湯}                    => {溫體牛肉湯}    0.003460208 0.6666667 
## [2]  {牛筋湯}                    => {芥蘭炒牛肉}    0.005190311 0.6000000 
## [3]  {牛心}                      => {青江菜}        0.005190311 0.6000000 
## [4]  {牛心}                      => {娃娃菜}        0.005190311 0.6000000 
## [5]  {牛心}                      => {高麗菜}        0.005190311 0.6000000 
## [6]  {牛心}                      => {溫體牛肉}      0.005190311 0.6000000 
## [7]  {油菜炒牛肉}                => {高麗菜}        0.008650519 0.5555556 
## [8]  {油菜炒牛肉}                => {牛肉貢丸(7粒)} 0.010380623 0.6666667 
## [9]  {油菜炒牛肉}                => {溫體牛肉}      0.010380623 0.6666667 
## [10] {洋蔥炒牛肉}                => {溫體牛肉湯}    0.012110727 0.5000000 
## [11] {麻油牛心}                  => {溫體牛肉3+1}   0.013840830 0.8888889 
## [12] {三菇炒牛肉}                => {溫體牛肉湯}    0.013840830 0.5000000 
## [13] {牛雜}                      => {溫體牛肉}      0.029411765 0.8500000 
## [14] {牛三層肉}                  => {溫體牛肉}      0.017301038 0.5555556 
## [15] {牛筋}                      => {溫體牛肉}      0.025951557 0.7142857 
## [16] {青江菜}                    => {綜合菇}        0.022491349 0.5416667 
## [17] {青江菜}                    => {溫體牛肉}      0.024221453 0.5833333 
## [18] {玉米}                      => {高麗菜}        0.031141869 0.5454545 
## [19] {玉米}                      => {牛肉貢丸(7粒)} 0.029411765 0.5151515 
## [20] {玉米}                      => {溫體牛肉3+1}   0.032871972 0.5757576 
## [21] {加肉}                      => {溫體牛肉}      0.074394464 0.8269231 
## [22] {綜合菇}                    => {溫體牛肉}      0.053633218 0.5081967 
## [23] {娃娃菜}                    => {溫體牛肉}      0.062283737 0.5294118 
## [24] {高麗菜}                    => {牛肉貢丸(7粒)} 0.084775087 0.5104167 
## [25] {牛肉貢丸(7粒)}             => {溫體牛肉}      0.096885813 0.5045045 
## [26] {牛筋湯, 芥蘭炒牛肉}        => {溫體牛肉湯}    0.003460208 0.6666667 
## [27] {牛筋湯, 溫體牛肉湯}        => {芥蘭炒牛肉}    0.003460208 1.0000000 
## [28] {牛心, 青江菜}              => {娃娃菜}        0.003460208 0.6666667 
## [29] {牛心, 娃娃菜}              => {青江菜}        0.003460208 0.6666667 
## [30] {牛心, 青江菜}              => {高麗菜}        0.003460208 0.6666667 
## [31] {牛心, 高麗菜}              => {青江菜}        0.003460208 0.6666667 
## [32] {牛心, 青江菜}              => {溫體牛肉}      0.003460208 0.6666667 
## [33] {牛心, 溫體牛肉}            => {青江菜}        0.003460208 0.6666667 
## [34] {牛心, 娃娃菜}              => {溫體牛肉}      0.005190311 1.0000000 
## [35] {牛心, 溫體牛肉}            => {娃娃菜}        0.005190311 1.0000000 
## [36] {牛心, 高麗菜}              => {溫體牛肉3+1}   0.003460208 0.6666667 
## [37] {牛心, 溫體牛肉3+1}         => {高麗菜}        0.003460208 1.0000000 
## [38] {牛肉貢丸(7粒), 炒牛雜麵}   => {溫體牛肉}      0.005190311 1.0000000 
## [39] {炒牛雜麵, 溫體牛肉}        => {牛肉貢丸(7粒)} 0.005190311 0.7500000 
## [40] {牛筋, 油菜炒牛肉}          => {高麗菜}        0.005190311 1.0000000 
## [41] {油菜炒牛肉, 高麗菜}        => {牛筋}          0.005190311 0.6000000 
## [42] {牛筋, 油菜炒牛肉}          => {牛肉貢丸(7粒)} 0.005190311 1.0000000 
## [43] {牛肉貢丸(7粒), 油菜炒牛肉} => {牛筋}          0.005190311 0.5000000 
## [44] {牛筋, 油菜炒牛肉}          => {溫體牛肉}      0.003460208 0.6666667 
## [45] {油菜炒牛肉, 薑絲炒牛肚}    => {炒牛雜}        0.003460208 1.0000000 
## [46] {油菜炒牛肉, 炒牛雜}        => {薑絲炒牛肚}    0.003460208 1.0000000 
## [47] {油菜炒牛肉, 薑絲炒牛肚}    => {牛肉貢丸(7粒)} 0.003460208 1.0000000 
## [48] {油菜炒牛肉, 薑絲炒牛肚}    => {溫體牛肉}      0.003460208 1.0000000 
## [49] {油菜炒牛肉, 炒牛雜}        => {牛肉貢丸(7粒)} 0.003460208 1.0000000 
## [50] {油菜炒牛肉, 炒牛雜}        => {溫體牛肉}      0.003460208 1.0000000 
##      coverage    lift      count
## [1]  0.005190311  1.986254  2   
## [2]  0.008650519  2.709375  3   
## [3]  0.008650519 14.450000  3   
## [4]  0.008650519  5.100000  3   
## [5]  0.008650519  3.612500  3   
## [6]  0.008650519  2.127607  3   
## [7]  0.015570934  3.344907  5   
## [8]  0.015570934  3.471471  6   
## [9]  0.015570934  2.364008  6   
## [10] 0.024221453  1.489691  7   
## [11] 0.015570934  3.618153  8   
## [12] 0.027681661  1.489691  8   
## [13] 0.034602076  3.014110 17   
## [14] 0.031141869  1.970007 10   
## [15] 0.036332180  2.532866 15   
## [16] 0.041522491  5.132514 13   
## [17] 0.041522491  2.068507 14   
## [18] 0.057093426  3.284091 18   
## [19] 0.057093426  2.682501 17   
## [20] 0.057093426  2.343577 19   
## [21] 0.089965398  2.932279 43   
## [22] 0.105536332  1.802072 31   
## [23] 0.117647059  1.877301 36   
## [24] 0.166089965  2.657845 49   
## [25] 0.192041522  1.788979 56   
## [26] 0.005190311  1.986254  2   
## [27] 0.003460208  4.515625  2   
## [28] 0.005190311  5.666667  2   
## [29] 0.005190311 16.055556  2   
## [30] 0.005190311  4.013889  2   
## [31] 0.005190311 16.055556  2   
## [32] 0.005190311  2.364008  2   
## [33] 0.005190311 16.055556  2   
## [34] 0.005190311  3.546012  3   
## [35] 0.005190311  8.500000  3   
## [36] 0.005190311  2.713615  2   
## [37] 0.003460208  6.020833  2   
## [38] 0.005190311  3.546012  3   
## [39] 0.006920415  3.905405  3   
## [40] 0.005190311  6.020833  3   
## [41] 0.008650519 16.514286  3   
## [42] 0.005190311  5.207207  3   
## [43] 0.010380623 13.761905  3   
## [44] 0.005190311  2.364008  2   
## [45] 0.003460208 17.000000  2   
## [46] 0.003460208 19.931034  2   
## [47] 0.003460208  5.207207  2   
## [48] 0.003460208  3.546012  2   
## [49] 0.003460208  5.207207  2   
## [50] 0.003460208  3.546012  2